WIP, adjusting architecture, read more

The 'source' directory compiles, but the repl and tests are almost
untouched so far. There's no guarantee that the code in 'source' is
correct, so I'm branching this for a short time, until I'm confident the
whole project passes the CI again.

I'm adjusting the concepts of routines and bytecode to make them more
consistent, and tweaking the VM so it loads from an instance of
'Toy_Module'.

* 'Toy_ModuleBuilder' (formerly 'Toy_Routine')

This is where the AST is compiled, producing a chunk of memory that can
be read by the VM. This will eventually operate on individual
user-defined functions as well.

* 'Toy_ModuleBundle' (formerly 'Toy_Bytecode')

This collects one or more otherwise unrelated modules into one chunk of
memory, stored in sequence. It is also prepended with the version data for
Toy's reference implementation:

The leading bytes of the bundle are laid out as follows:

    0th: TOY_VERSION_MAJOR
    1st: TOY_VERSION_MINOR
    2nd: TOY_VERSION_PATCH
    3rd: (the number of modules in the bundle)
    4th and onwards: TOY_VERSION_BUILD

TOY_VERSION_BUILD has always been a null-terminated C-string, but from
here on, it begins on a word boundary, and is padded out to the first
word boundary after the null terminator.

As for the 3rd byte listed, since having more than 255 modules in one
bundle seems unlikely, I'm storing the count here, as it was otherwise
unused. This is a bit janky, but it works for now.
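
As a rough sketch of the layout above (not the actual 'Toy_ModuleBundle'
code), the header could be written like this. The 'example_' names and the
4-byte word size are assumptions made for illustration only:

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_WORD_SIZE 4 /* assumption: words are 4 bytes wide */

/* Hypothetical helper: total size of the version header in bytes.
   The build string starts at byte 4 and is padded up to the first
   word boundary after its null terminator. */
static size_t example_headerSize(const char* build) {
    size_t len = strlen(build) + 1; /* include the null terminator */
    size_t padded = (len + EXAMPLE_WORD_SIZE - 1) / EXAMPLE_WORD_SIZE * EXAMPLE_WORD_SIZE;
    return 4 + padded;
}

/* Hypothetical helper: writes the header into 'out'.
   Returns the number of bytes written, or 0 if 'out' is too small. */
static size_t example_writeHeader(unsigned char* out, size_t capacity,
        unsigned char major, unsigned char minor, unsigned char patch,
        unsigned char moduleCount, const char* build) {
    size_t size = example_headerSize(build);
    if (capacity < size) {
        return 0;
    }

    memset(out, 0, size);      /* zero-fill, so the padding is all nulls */
    out[0] = major;            /* TOY_VERSION_MAJOR */
    out[1] = minor;            /* TOY_VERSION_MINOR */
    out[2] = patch;            /* TOY_VERSION_PATCH */
    out[3] = moduleCount;      /* number of modules, at most 255 */
    memcpy(out + 4, build, strlen(build)); /* TOY_VERSION_BUILD */

    return size;
}
```

With these assumptions, a build string of "dev" gives an 8-byte header, so
the first module would begin at byte 8.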

* 'Toy_Module'

This new structure represents a single complete unit of operation, such
as a single source file, or a user-defined function. It is divided into
three main sections, with various sub-sections.

    HEADER (all members are unsigned ints):
        total module size in bytes
        jumps count
        param count
        data count
        subs count
        code addr
        jumps addr (if jumps count > 0)
        param addr (if param count > 0)
        data addr (if data count > 0)
        subs addr (if subs count > 0)
    BODY:
        <raw opcodes, etc.>
    DATA:
        jumps table
            uint array, pointing to addresses in 'data' or 'subs'
        param table
            uint array, pointing to addresses in 'data'
        data
            heterogeneous data, including strings
        subs
            an array of modules, using recursive logic

The reference implementation as a whole uses a lot of recursion, so this
makes sense.
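
To illustrate the variable-length header, here is a minimal sketch of how
it might be read back. 'Example_ModuleHeader' and the 'example_' functions
are hypothetical, and this assumes the fields are stored as native
'unsigned int' values in host byte order, with absent addresses left as 0:

```c
#include <string.h>

/* Hypothetical mirror of the HEADER section described above */
typedef struct Example_ModuleHeader {
    unsigned int moduleSize;
    unsigned int jumpsCount;
    unsigned int paramCount;
    unsigned int dataCount;
    unsigned int subsCount;
    unsigned int codeAddr;
    unsigned int jumpsAddr; /* only stored if jumpsCount > 0 */
    unsigned int paramAddr; /* only stored if paramCount > 0 */
    unsigned int dataAddr;  /* only stored if dataCount > 0 */
    unsigned int subsAddr;  /* only stored if subsCount > 0 */
} Example_ModuleHeader;

/* Reads one unsigned int at '*offset', then advances the offset */
static unsigned int example_readUint(const unsigned char* ptr, unsigned int* offset) {
    unsigned int value;
    memcpy(&value, ptr + *offset, sizeof(value)); /* avoids unaligned reads */
    *offset += (unsigned int)sizeof(value);
    return value;
}

/* Fills 'header' from the start of a module, and returns the header's
   size in bytes. No bounds checking is done in this sketch. */
static unsigned int example_readHeader(const unsigned char* ptr, Example_ModuleHeader* header) {
    unsigned int offset = 0;
    memset(header, 0, sizeof(*header));

    header->moduleSize = example_readUint(ptr, &offset);
    header->jumpsCount = example_readUint(ptr, &offset);
    header->paramCount = example_readUint(ptr, &offset);
    header->dataCount = example_readUint(ptr, &offset);
    header->subsCount = example_readUint(ptr, &offset);
    header->codeAddr = example_readUint(ptr, &offset);

    /* the remaining addresses are only present when their count is non-zero */
    if (header->jumpsCount > 0) header->jumpsAddr = example_readUint(ptr, &offset);
    if (header->paramCount > 0) header->paramAddr = example_readUint(ptr, &offset);
    if (header->dataCount > 0) header->dataAddr = example_readUint(ptr, &offset);
    if (header->subsCount > 0) header->subsAddr = example_readUint(ptr, &offset);

    return offset;
}
```

Since each entry in 'subs' is itself a module, the same reader could be
applied recursively at each sub-module's address.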

The goal of this rework is so 'Toy_Module' can be added as a member of
'Toy_Value', as a simple and logical way to handle functions. I'll
probably use the union pattern, similarly to Toy_String, so functions
can be written in C and Toy, and used without needing to worry which is
which.
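
A minimal sketch of what that union pattern could look like, purely as an
illustration. None of these 'Example_' types exist in the codebase, and
the native function signature is an assumption:

```c
/* Hypothetical tag, saying which member of the union is active */
typedef enum Example_FunctionType {
    EXAMPLE_FN_MODULE, /* written in Toy, stored as a compiled module */
    EXAMPLE_FN_NATIVE  /* written in C, stored as a function pointer */
} Example_FunctionType;

/* Assumed signature for a native function, simplified to ints */
typedef int (*Example_NativeFn)(int arg);

typedef struct Example_Function {
    Example_FunctionType type;
    union {
        const unsigned char* module; /* points at the module's memory */
        Example_NativeFn native;
    } as;
} Example_Function;

/* A sample native function, for demonstration */
static int example_double(int arg) {
    return arg * 2;
}

/* Callers use one entry point, without caring which kind they hold.
   Returns 1 on success, 0 if the function could not be run. */
static int example_call(const Example_Function* fn, int arg, int* result) {
    switch (fn->type) {
        case EXAMPLE_FN_NATIVE:
            *result = fn->as.native(arg);
            return 1;

        case EXAMPLE_FN_MODULE:
            /* a real implementation would bind a VM to the module and
               run it here; that is omitted from this sketch */
            return 0;
    }
    return 0;
}
```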
2025-01-21 13:59:04 +11:00
parent a1f6f147c5
commit 002651f95d
17 changed files with 1554 additions and 1459 deletions


@@ -2,9 +2,9 @@
#include "toy_common.h"
#include "toy_bytecode.h"
#include "toy_bucket.h"
#include "toy_scope.h"
#include "toy_module.h"
#include "toy_value.h"
#include "toy_string.h"
@@ -14,40 +14,42 @@
typedef struct Toy_VM {
//raw instructions to be executed
unsigned char* module; //URGENT: rename to 'code'
unsigned int moduleSize;
unsigned char* code;
unsigned int paramSize;
unsigned int jumpsSize;
unsigned int dataSize;
unsigned int subsSize;
//metadata
unsigned int jumpsCount;
unsigned int paramCount;
unsigned int dataCount;
unsigned int subsCount;
unsigned int paramAddr;
unsigned int codeAddr;
unsigned int jumpsAddr;
unsigned int paramAddr;
unsigned int dataAddr;
unsigned int subsAddr;
//execution utils
unsigned int programCounter;
//stack - immediate-level values only
Toy_Stack* stack;
//scope - block-level key/value pairs
Toy_Scope* scope;
//stack - immediate-level values only
Toy_Stack* stack;
//easy access to memory
Toy_Bucket* stringBucket; //stores the string literals
Toy_Bucket* scopeBucket; //stores the scopes
Toy_Bucket* scopeBucket; //stores the scope instances TODO: is this separation needed?
} Toy_VM;
TOY_API void Toy_initVM(Toy_VM* vm);
TOY_API void Toy_bindVM(Toy_VM* vm, struct Toy_Bytecode* bc); //process the version data
TOY_API void Toy_bindVMToModule(Toy_VM* vm, unsigned char* module); //process the module only
TOY_API void Toy_resetVM(Toy_VM* vm); //persists memory
TOY_API void Toy_initVM(Toy_VM* vm); //creates memory
TOY_API void Toy_inheritVM(Toy_VM* vm, Toy_VM* parent); //inherits memory
TOY_API void Toy_bindVMToModule(Toy_VM* vm, Toy_Module* module);
TOY_API void Toy_runVM(Toy_VM* vm);
TOY_API void Toy_freeVM(Toy_VM* vm);
TOY_API void Toy_resetVM(Toy_VM* vm); //prepares for another run without deleting stack, scope and memory
//TODO: inject extra data (hook system for external libraries)