The 'source' directory compiles, but the repl and tests are almost
untouched so far. There's no guarantee that the code in 'source' is
correct, so I'm branching this for a short time, until I'm confident the
whole project passes the CI again.
I'm adjusting the concepts of routines and bytecode to make them more
consistent, and tweaking the VM so it loads from an instance of
'Toy_Module'.
* 'Toy_ModuleBuilder' (formally 'Toy_Routine')
This is where the AST is compiled, producing a chunk of memory that can
be read by the VM. This will eventually operate on individual
user-defined functions as well.
* 'Toy_ModuleBundle' (formally 'Toy_Bytecode')
This collects one or more otherwise unrelated modules into one chunk of
memory, stored in sequence. It is also preprended with the version data for
Toy's reference implementation:
For each byte in the bytecode:
0th: TOY_VERSION_MAJOR
1st: TOY_VERSION_MINOR
2nd: TOY_VERSION_PATCH
3rd: (the number of modules in the bundle)
4th and onwards: TOY_VERSION_BUILD
TOY_VERSION_BUILD has always been a null terminated C-string, but from
here on, it begins at the word-alignment, and continues until the first
word-alignment after the null terminator.
As for the 3rd byte listed, since having more than 256 modules in one
bundle seems unlikely, I'm storing the count here, as it was otherwise
unused. This is a bit janky, but it works for now.
* 'Toy_Module'
This new structure represents a single complete unit of operation, such
as a single source file, or a user-defined function. It is divided into
three main sections, with various sub-sections.
HEADER (all members are unsigned ints):
total module size in bytes
jumps count
param count
data count
subs count
code addr
jumps addr (if jumps count > 0)
param addr (if param count > 0)
data addr (if data count > 0)
subs addr (if subs count > 0)
BODY:
<raw opcodes, etc.>
DATA:
jumps table
uint array, pointing to addresses in 'data' or 'subs'
param table
uint array, pointing to addresses in 'data'
data
heterogeneous data, including strings
subs
an array of modules, using recursive logic
The reference implementation as a whole uses a lot of recursion, so this
makes sense.
The goal of this rework is so 'Toy_Module' can be added as a member of
'Toy_Value', as a simple and logical way to handle functions. I'll
probably use the union pattern, similarly to Toy_String, so functions
can be written in C and Toy, and used without needing to worry which is
which.
By leaving 'null' on the stack, it won't cause stack underflows in a
bunch of erroneous situations. This will allow the repl (and other
situations) to continue if they want to.
I've also fixed some error messages in toy_table.c, which were formatted
badly.
Closes#162
I've also added some support for compiler errors in general, but these
will get expanded on later.
I've also quickly added a valgrind option to the tests and found a few
leaks. I'll deal with these later.
Summary of changes:
* Clarified the lifetime of the bytecode in memory
* Erroneous routines exit without compiling
* Empty VMs don't run
* Added a check for malformed assignments
* Renamed "routine" to "module" within the VM
* VM no longer tries to free the bytecode - must be done manually
* Started experimenting with valgrind, not yet ready
I've brought the tests up to scratch, except for compounds im the
parser, because I'm too damn tired to do that over SSH. It looks like
collections are right-recursive, whixh was unintended but still works
just fine.
I've also added the '--verbose' flag to the repl to control the
debugging output.
Several obscure bugs have been fixed, and comments have been tweaked.
Mustfail tests are still needed, but that's a low priority. See #142.
Fixed#151
Strings, due to their potentially large size, are stored outside of a
routine's code section, in the data section. To access the correct
string, you must read the jump index, then the real address from the
jump table - and extra layer of indirection will result in more flexible
data down the road, I hope.
Other changes include:
* Added string concat operator ..
* Added TOY_STRING_MAX_LENGTH
* Strings can't be created or concatenated longer than the max length
* The parser will display a warning if the bucket is too small for a
string at max length, but it will continue
* Added TOY_BUCKET_IDEAL to correspend with max string length
* The bucket now allocates an address that is 4-byte aligned
* Fixed missing entries in the parser rule table
* Corrected some failing TOY_BITNESS tests
The following structures are now more independant:
- Toy_Array
- Toy_Stack
- Toy_Bucket
- Toy_String
I reworked a lot of the memory allocation, so now there are more direct
calls to malloc() or realloc(), rather than relying on the macros from
toy_memory.h.
I've also split toy_memory into proper array and bucket files, because
it makes more sense this way, rather than having them both jammed into
one file. This means the eventual hashtable structure can also stand on
its own.
Toy_Array is a new wrapper around raw array pointers, and all of the
structures have their metadata embedded into their allocated memory now,
using variable length array members.
A lot of 'capacity' and 'count' variables were changed to 'size_t'
types, but this doesn't seem to be a problem anywhere.
If the workflow fails, then I'll leave it for tonight - I'm too tired,
and I don't want to overdo myself.
Strings are needed for the handling of identifiers in the key/value
variable storage, so I've got them working first. I used the rope
pattern, which seems to be quite an interesting approach.
I'll add comparison checks later.
Adjusted how buckets are handled in all tests, could've been an issue
down the line.
Added the build instructions to README.md.
At this point, only a minimal number of operations are working, and
after running any kind of source code, the 'result' is simply left on
the VM's stack. Still, it's awesome to see it reach this point.