mirror of
https://github.com/krgamestudios/Toy.git
synced 2026-04-15 23:04:08 +10:00
The 'source' directory compiles, but the repl and tests are almost
untouched so far. There's no guarantee that the code in 'source' is
correct, so I'm branching this for a short time, until I'm confident the
whole project passes the CI again.
I'm adjusting the concepts of routines and bytecode to make them more
consistent, and tweaking the VM so it loads from an instance of
'Toy_Module'.
* 'Toy_ModuleBuilder' (formerly 'Toy_Routine')
This is where the AST is compiled, producing a chunk of memory that can
be read by the VM. This will eventually operate on individual
user-defined functions as well.
* 'Toy_ModuleBundle' (formerly 'Toy_Bytecode')
This collects one or more otherwise unrelated modules into one chunk of
memory, stored in sequence. It is also prepended with the version data for
Toy's reference implementation:
For each byte in the bytecode:
0th: TOY_VERSION_MAJOR
1st: TOY_VERSION_MINOR
2nd: TOY_VERSION_PATCH
3rd: (the number of modules in the bundle)
4th and onwards: TOY_VERSION_BUILD
TOY_VERSION_BUILD has always been a null terminated C-string, but from
here on, it begins at the word-alignment, and continues until the first
word-alignment after the null terminator.
As for the 3rd byte listed, since having more than 255 modules in one
bundle seems unlikely, I'm storing the count here, as it was otherwise
unused. This is a bit janky, but it works for now.
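As a rough illustration of the bundle prefix described above, here is a minimal sketch of a reader for it. The names 'BundleHeader' and 'parseBundleHeader' are hypothetical and not taken from the Toy source, and the sketch assumes a word is 4 bytes and that the modules begin at the first word boundary after the build string's null terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the bundle prefix; not the real Toy API. */
typedef struct {
    unsigned char major;        /* byte 0: TOY_VERSION_MAJOR */
    unsigned char minor;        /* byte 1: TOY_VERSION_MINOR */
    unsigned char patch;        /* byte 2: TOY_VERSION_PATCH */
    unsigned char moduleCount;  /* byte 3: number of modules in the bundle */
    const char* build;          /* byte 4 onwards: TOY_VERSION_BUILD */
    size_t modulesOffset;       /* assumed: first 4-byte boundary after the terminator */
} BundleHeader;

/* Returns false if the buffer is too short or the build string is unterminated. */
bool parseBundleHeader(const unsigned char* buf, size_t len, BundleHeader* out) {
    if (buf == NULL || out == NULL || len < 5) {
        return false;
    }

    /* find the null terminator of the build string, which starts at byte 4 */
    const unsigned char* nul = (const unsigned char*)memchr(buf + 4, '\0', len - 4);
    if (nul == NULL) {
        return false;
    }

    /* round the prefix size up to the next multiple of 4 */
    size_t size = (size_t)(nul - buf) + 1;
    size = (size + 3) & ~(size_t)3;
    if (size > len) {
        return false;
    }

    out->major = buf[0];
    out->minor = buf[1];
    out->patch = buf[2];
    out->moduleCount = buf[3];
    out->build = (const char*)(buf + 4);
    out->modulesOffset = size;
    return true;
}
```

With a build string of "dev", the terminator lands on byte 7, so under these assumptions the first module would start at byte 8.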
* 'Toy_Module'
This new structure represents a single complete unit of operation, such
as a single source file, or a user-defined function. It is divided into
three main sections, with various sub-sections.
HEADER (all members are unsigned ints):
total module size in bytes
jumps count
param count
data count
subs count
code addr
jumps addr (if jumps count > 0)
param addr (if param count > 0)
data addr (if data count > 0)
subs addr (if subs count > 0)
BODY:
<raw opcodes, etc.>
DATA:
jumps table
uint array, pointing to addresses in 'data' or 'subs'
param table
uint array, pointing to addresses in 'data'
data
heterogeneous data, including strings
subs
an array of modules, using recursive logic
The reference implementation as a whole uses a lot of recursion, so this
makes sense.
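To make the variable-length header concrete, here is a minimal sketch of how it could be read. 'ModuleHeader' and 'readModuleHeader' are hypothetical names, not the reference implementation; the sketch assumes the header is a packed run of unsigned ints in the order listed above, and it uses 0 as a stand-in for an address that was omitted.

```c
#include <stddef.h>

/* Hypothetical decoded header; an address of 0 here means "omitted". */
typedef struct {
    unsigned int totalSize;
    unsigned int jumpsCount;
    unsigned int paramCount;
    unsigned int dataCount;
    unsigned int subsCount;
    unsigned int codeAddr;
    unsigned int jumpsAddr;
    unsigned int paramAddr;
    unsigned int dataAddr;
    unsigned int subsAddr;
    size_t headerWords; /* how many unsigned ints the header occupied */
} ModuleHeader;

ModuleHeader readModuleHeader(const unsigned int* words) {
    ModuleHeader h = {0};
    size_t i = 0;

    /* the five counts and the code address are always present */
    h.totalSize = words[i++];
    h.jumpsCount = words[i++];
    h.paramCount = words[i++];
    h.dataCount = words[i++];
    h.subsCount = words[i++];
    h.codeAddr = words[i++];

    /* each remaining address is only stored when its count is non-zero */
    if (h.jumpsCount > 0) h.jumpsAddr = words[i++];
    if (h.paramCount > 0) h.paramAddr = words[i++];
    if (h.dataCount > 0) h.dataAddr = words[i++];
    if (h.subsCount > 0) h.subsAddr = words[i++];

    h.headerWords = i;
    return h;
}
```

A module with no jumps, params, data or subs would have a six-word header under this reading; each non-zero count adds one more word.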
The goal of this rework is so 'Toy_Module' can be added as a member of
'Toy_Value', as a simple and logical way to handle functions. I'll
probably use the union pattern, similarly to Toy_String, so functions
can be written in C and Toy, and used without needing to worry which is
which.
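One possible reading of the "union pattern" for functions is a tagged union, sketched below. Everything here is hypothetical: 'ModuleStub' stands in for 'Toy_Module', 'FunctionValue' is not a real Toy type, and the module branch simply defers to a caller-supplied runner, since the VM interface isn't described here.

```c
#include <stddef.h>

/* Hypothetical stand-in for Toy_Module; the real structure is a block of bytecode. */
typedef struct {
    int placeholder;
} ModuleStub;

typedef int (*NativeFn)(int arg);
typedef int (*ModuleRunner)(const ModuleStub* module, int arg);

typedef enum { FN_NATIVE, FN_MODULE } FunctionKind;

/* A function value that is either a C function or a compiled module. */
typedef struct {
    FunctionKind kind;
    union {
        NativeFn native;
        const ModuleStub* module;
    } as;
} FunctionValue;

/* The caller doesn't need to know which kind of function it holds. */
int callFunction(const FunctionValue* fn, ModuleRunner run, int arg) {
    switch (fn->kind) {
        case FN_NATIVE:
            return fn->as.native(arg);
        case FN_MODULE:
            return run(fn->as.module, arg);
    }
    return 0;
}

/* Example native function and a fake runner, for demonstration only. */
int doubleIt(int x) {
    return x * 2;
}

int runStub(const ModuleStub* module, int arg) {
    return module->placeholder + arg;
}
```

The point of the sketch is only that both kinds of function are called through the same entry point, 'callFunction'.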
74 lines
2.9 KiB
Plaintext
The bytecode format

===

There are four components in the bytecode header:

TOY_VERSION_MAJOR
TOY_VERSION_MINOR
TOY_VERSION_PATCH
TOY_VERSION_BUILD

The first three are each one unsigned byte, and the fourth is a null-terminated C-string.

* Under no circumstance should you ever run bytecode whose major version is different from the interpreter’s major version
* Under no circumstance should you ever run bytecode whose minor version is above the interpreter’s minor version
* You may, at your own risk, attempt to run bytecode whose patch version is different from the interpreter’s patch version
* You may, at your own risk, attempt to run bytecode whose build version is different from the interpreter’s build version
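The four rules above can be sketched as a single check. The type and function names below ('Version', 'VersionCheck', 'checkVersion') are hypothetical and not part of the Toy source; the sketch also assumes that bytecode with a lower minor version than the interpreter is acceptable, since the rules only forbid a higher one.

```c
#include <string.h>

/* Hypothetical result codes; not taken from the Toy source. */
typedef enum {
    VERSION_OK,       /* nothing in the rules objects to running this */
    VERSION_AT_RISK,  /* patch or build differs: run at your own risk */
    VERSION_REJECTED  /* major differs, or bytecode minor is too new */
} VersionCheck;

typedef struct {
    unsigned char major;
    unsigned char minor;
    unsigned char patch;
    const char* build; /* null-terminated build string */
} Version;

VersionCheck checkVersion(Version bytecode, Version interpreter) {
    /* a different major version must never be run */
    if (bytecode.major != interpreter.major) {
        return VERSION_REJECTED;
    }

    /* a minor version above the interpreter's must never be run */
    if (bytecode.minor > interpreter.minor) {
        return VERSION_REJECTED;
    }

    /* patch and build mismatches are allowed, but only at the user's own risk */
    if (bytecode.patch != interpreter.patch) {
        return VERSION_AT_RISK;
    }
    if (strcmp(bytecode.build, interpreter.build) != 0) {
        return VERSION_AT_RISK;
    }

    return VERSION_OK;
}
```

A loader could refuse on VERSION_REJECTED and print a warning on VERSION_AT_RISK before continuing.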

An additional note: The contents of the build string may be anything, such as:

* the compilation date and time of the interpreter
* a marker identifying the current fork and/or branch
* identification information, such as the developer's copyright
* a link to Rick Astley's "Never Gonna Give You Up" on YouTube

Please note that in the final bytecode, if the null terminator of TOY_VERSION_BUILD is not 4-byte aligned, extra space will be allocated to round out the header's size to a multiple of 4. The contents of the extra bytes are undefined.
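The padding rule can be sketched as a small size calculation. 'paddedHeaderSize' is a hypothetical helper, and it assumes the layout described in this file: three one-byte version fields followed immediately by the build string.

```c
#include <stddef.h>
#include <string.h>

/* Size of the header in bytes, rounded up to a multiple of 4. */
size_t paddedHeaderSize(const char* build) {
    /* three version bytes, the build string, and its null terminator */
    size_t raw = 3 + strlen(build) + 1;

    /* round up to the next multiple of 4; any padding bytes are undefined */
    return (raw + 3) & ~(size_t)3;
}
```

For example, a one-character build string gives 5 raw bytes, which rounds up to 8, leaving 3 undefined padding bytes.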

===

At this time, a 'module' consists of a single 'routine', which acts as its global scope.

Additional information may be added later, or multiple 'modules' listed sequentially may be a possibility.

===

# the routine structure, which is potentially recursive

# symbol shorthand : 'module::identifier'
# where 'module' can be omitted if it's local to this module ('identifier' within the symbols is calculated at the module level; it's always unique)

.header:
    N total size       # size of this routine, including all data and subroutines
    N .jumps count     # the number of entries in the jump table (should be data count + routine count)
    N .param count     # the number of parameter fields expected
    N .data count      # the number of data fields expected
    N .routine count   # the number of routines present
    .code start        # absolute address of .code; mandatory
    .param start       # absolute address of .param; omitted if not needed
    .jumptable start   # absolute address of .jumptable; omitted if not needed
    .data start        # absolute address of .data; omitted if not needed
    .routine start     # absolute address of .routines; omitted if not needed
    # additional metadata fields can be added later

.code:
    # instructions read and 'executed' by the interpreter
    READ 0
    LOAD 0
    ASSERT

.param:
    # a list of symbols to be used as keys in the environment

.jumptable:
    # a 'symbol -> pointer' jumptable for quickly looking up values in .data and .routines
    0 -> {string, 0x00}
    1 -> {fn, 0xFF}

.data:
    # data that can't really be embedded into .code
    <STRING>,"Hello world"

.routines:
    # inner routines, each of which conforms to this spec
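
The '.jumptable' lookup shown in the listing above could work roughly as sketched below. This is only an illustration: 'JumpEntry', 'lookupEntry' and 'readString' are hypothetical names, and the sketch assumes that a string entry's address is a byte offset into '.data' pointing directly at a null-terminated string, which the listing does not confirm.

```c
#include <stddef.h>

/* Hypothetical entry kinds, mirroring {string, ...} and {fn, ...} in the listing. */
typedef enum { ENTRY_STRING, ENTRY_FN } EntryType;

typedef struct {
    EntryType type;
    unsigned int addr; /* assumed: an offset into .data or .routines */
} JumpEntry;

/* Resolve a symbol index to its jump table entry, or NULL if out of range. */
const JumpEntry* lookupEntry(const JumpEntry* table, unsigned int count, unsigned int symbol) {
    if (symbol >= count) {
        return NULL;
    }
    return &table[symbol];
}

/* Fetch the string a symbol refers to, or NULL if it isn't a valid string entry. */
const char* readString(const JumpEntry* table, unsigned int count,
                       const char* data, size_t dataSize, unsigned int symbol) {
    const JumpEntry* entry = lookupEntry(table, count, symbol);
    if (entry == NULL || entry->type != ENTRY_STRING || entry->addr >= dataSize) {
        return NULL;
    }
    return data + entry->addr;
}
```

With the two entries from the listing, symbol 0 would resolve to "Hello world" at offset 0x00 of '.data', while symbol 1 refers to an inner routine rather than a string.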