Developing Toy
Here you'll find some of the implementation details.
Bytecode Header Format
The bytecode header format must not change.
Every instance of Toy bytecode will be divided up into several sections, by necessity - however the first one to be read is the header. This section is used to define what version of Toy is currently running, as well as to prevent any future version/fork clashes.
The header consists of four values:
- TOY_VERSION_MAJOR
- TOY_VERSION_MINOR
- TOY_VERSION_PATCH
- TOY_VERSION_BUILD
The first three are single unsigned bytes, embedded at the beginning of the bytecode in sequence. These represent the major, minor and patch versions of the language. The fourth value is a null-terminated string of unspecified data, which is intended, but not required, to specify the time that the language's compiler was itself compiled. The build string can hold arbitrary data, such as the current maintainer's name, current fork of the language, or other versioning info.
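As a rough illustration of the layout described above, here is a minimal sketch of reading the four values from the front of a bytecode buffer. The names HeaderSketch and readHeaderSketch are hypothetical and do not come from the Toy codebase; the only things taken from the text are the three leading unsigned bytes and the null-terminated build string that follows them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four header values (not a Toy type). */
typedef struct {
    unsigned char major;
    unsigned char minor;
    unsigned char patch;
    const char* build; /* points into the bytecode buffer; nothing is copied */
} HeaderSketch;

/* Reads the header from the start of `bytecode`.
 * Returns the number of bytes consumed, or 0 if the buffer is too short
 * or the build string has no terminator within `length` bytes. */
static size_t readHeaderSketch(const unsigned char* bytecode, size_t length, HeaderSketch* out) {
    assert(bytecode != NULL && out != NULL);

    /* three version bytes, plus at least the build string's terminator */
    if (length < 4) {
        return 0;
    }

    /* find the terminator without reading past the end of the buffer */
    const unsigned char* terminator = memchr(bytecode + 3, '\0', length - 3);
    if (terminator == NULL) {
        return 0;
    }

    out->major = bytecode[0];
    out->minor = bytecode[1];
    out->patch = bytecode[2];
    out->build = (const char*)(bytecode + 3);

    /* the terminator is part of the header, so count it as consumed */
    return (size_t)(terminator - bytecode) + 1;
}
```

The return value doubles as the offset of whatever follows the header, so a caller could continue reading the next section from bytecode + consumed.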
There are some strict rules when interpreting these values (mimicking, but not conforming to semver.org):
- Under no circumstances should you ever run bytecode whose major version differs from the interpreter's - there are definitely broken APIs involved.
- Under no circumstances should you ever run bytecode whose minor version is above the interpreter's minor version - the bytecode could potentially use unimplemented features.
- You may, at your own risk, attempt to run bytecode whose patch version is different.
- You may, at your own risk, attempt to run bytecode whose build version is different.
All interpreter implementations retain the right to reject any bytecode whose header data does not conform to the above specification.
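The rules above can be sketched as a single comparison function. This is only an illustration: VersionSketch, CompatSketch and checkVersionSketch are made-up names, and treating a lower minor version as acceptable is an assumption, since the rules only forbid a higher one. The "at your own risk" cases are reported separately so that the caller can decide whether to proceed or to reject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of comparing two headers (not a Toy type). */
typedef enum {
    COMPAT_REJECT,      /* must not run */
    COMPAT_AT_OWN_RISK, /* patch or build differs; the caller decides */
    COMPAT_OK           /* nothing in the rules objects */
} CompatSketch;

/* Hypothetical holder for the four header values. */
typedef struct {
    unsigned char major;
    unsigned char minor;
    unsigned char patch;
    const char* build;
} VersionSketch;

static CompatSketch checkVersionSketch(const VersionSketch* interpreter, const VersionSketch* bytecode) {
    assert(interpreter != NULL && bytecode != NULL);

    /* a different major version means broken APIs */
    if (bytecode->major != interpreter->major) {
        return COMPAT_REJECT;
    }

    /* a newer minor version may rely on unimplemented features */
    if (bytecode->minor > interpreter->minor) {
        return COMPAT_REJECT;
    }

    /* patch and build mismatches are allowed, but flagged */
    if (bytecode->patch != interpreter->patch || strcmp(bytecode->build, interpreter->build) != 0) {
        return COMPAT_AT_OWN_RISK;
    }

    return COMPAT_OK;
}
```

A strict implementation could simply treat COMPAT_AT_OWN_RISK the same as COMPAT_REJECT, which the final rule explicitly permits.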
The latest version information can be found in toy_common.h.
Parser Structure and Operations
TODO
Compiler Structure and Operations
No.
Interpreter Structure and Operations
The Toy interpreter is, at its core, just a big loop that reads bytes from memory and acts on them. Here, I'll break down exactly how it works, from a top-down perspective.
Running the Interpreter
There are four main functions for running the interpreter:
- Toy_initInterpreter
- Toy_runInterpreter
- Toy_resetInterpreter
- Toy_freeInterpreter
First, init zeroes out the interpreter, sets up the printing functions, and delegates to reset, which in turn sets up the program's scope (and injects the default global functions). The initialization function is split into two this way so that reset can be used independently on a "dirty" interpreter to ready it for another script (or another run of the same script). reset is usually not needed and may be removed in future.
free simply frees the interpreter after execution.
Interestingly, run doesn't jump straight into execution. Instead, it first does its own bit of setup, before reading out the bytecode's header. If the header indicates an incompatible version, then the interpreter will refuse to run, to prevent mistakes from ruining the program.
run will also delegate to a function called readInterpreterSections(), which reads and reconstructs the "literalCache" - a collection of all values within the program (variable identifiers, variable values, function bytecode, etc.)
Next, run will pass to a function called execInterpreter(), which contains the program's loop.
Finally, run will automatically free the bytecode and associated literalCache (this may change at some point).
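To make the init / reset / run / free split easier to picture, here is a self-contained mock of that lifecycle. It is not the real Toy API: every name beginning with sketch or Sketch is hypothetical, the "scope" is a single integer, the version check looks at one byte only, and unlike the real run it does not free the bytecode it is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_VERSION_MAJOR 1 /* placeholder value, not Toy's real version */

typedef void (*SketchPrintFn)(const char*);

/* Hypothetical stand-in for the interpreter state. */
typedef struct {
    int* scope;          /* stand-in for the program's scope */
    SketchPrintFn print; /* stand-in for the printing functions */
} SketchInterpreter;

static void sketchDefaultPrint(const char* message) {
    puts(message);
}

/* Readies a fresh or "dirty" interpreter for another run. */
static void sketchReset(SketchInterpreter* interp) {
    assert(interp != NULL);
    free(interp->scope);                    /* free(NULL) is a no-op */
    interp->scope = calloc(1, sizeof(int)); /* new, empty scope */
}

/* Zeroes everything, installs the print function, then delegates to reset. */
static void sketchInit(SketchInterpreter* interp) {
    assert(interp != NULL);
    *interp = (SketchInterpreter){0};
    interp->print = sketchDefaultPrint;
    sketchReset(interp);
}

/* Checks the header first, and refuses to run incompatible bytecode. */
static bool sketchRun(SketchInterpreter* interp, const unsigned char* bytecode, size_t length) {
    assert(interp != NULL && bytecode != NULL);
    if (interp->scope == NULL) {
        return false; /* allocation failed during reset */
    }
    if (length < 1 || bytecode[0] != SKETCH_VERSION_MAJOR) {
        interp->print("incompatible bytecode version");
        return false;
    }
    /* reading the sections and the execution loop would go here;
     * the increment simulates a script leaving state behind */
    (*interp->scope)++;
    return true;
}

static void sketchFree(SketchInterpreter* interp) {
    assert(interp != NULL);
    free(interp->scope);
    interp->scope = NULL;
}
```

In this mock, calling sketchReset between two calls to sketchRun discards the state left behind by the first run, which is the "dirty interpreter" case described above.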
Bytecode Layout
I don't know.
To put it bluntly, the layout of the compressed bytecode was very ad hoc, and as such it was not documented at the time. This was partially because I (wrongly) believed that the layout didn't matter much, only the final execution.
I can say a few things about it though -
- Literal compounds are stored as arrays of integers which reference previously declared literals
- Functions are stored after the literal cache, in their own section and are referenced in the literal cache by index
- Functions are structured very similarly to the program as a whole, and store their argument and return arrays within their own literalCaches
I will document this one day, but not any time soon.
Executing the Interpreter
Opcodes within the bytecode are 1 byte in length, and each specifies a single action to take. Each possible action is defined within the interpreter in a function that begins with exec, and these functions are called from within a big looping switch statement. If any of these exec functions encounters an error, it can simply return false to break the loop.
The interpreter is stack-based; most, if not all, of the actions are performed on literals within a specially designated array called stack. For example:
case TOY_OP_PRINT:
    if (!execPrint(interpreter)) {
        return;
    }
    break;
When the opcode TOY_OP_PRINT is encountered, the top literal within the stack is popped off, and printed (more info on literals below).
static bool execPrint(Toy_Interpreter* interpreter) {
    //get the top literal
    Toy_Literal lit = Toy_popLiteralArray(&interpreter->stack);

    //if the top literal is an identifier, get its value
    Toy_Literal idn = lit;
    if (TOY_IS_IDENTIFIER(lit) && Toy_parseIdentifierToValue(interpreter, &lit)) {
        Toy_freeLiteral(idn);
    }

    //print as a string to the current print method
    Toy_printLiteralCustom(lit, interpreter->printOutput);

    //free the literal
    Toy_freeLiteral(lit);

    //continue the loop
    return true;
}
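For readers who want to see the whole shape in one place, here is a tiny self-contained stack machine built the same way: one-byte opcodes, one exec function per opcode returning false on error, and a looping switch that stops as soon as one of them fails. The opcodes, the SkMachine type and the integer-only stack are all hypothetical simplifications, not Toy's real opcodes or literal type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical opcodes, each one byte wide. */
enum { SK_OP_EOF, SK_OP_PUSH, SK_OP_ADD, SK_OP_PRINT };

#define SK_STACK_MAX 16

typedef struct {
    const unsigned char* bytecode;
    size_t length;
    size_t count;            /* read position within the bytecode */
    int stack[SK_STACK_MAX]; /* plain ints stand in for literals */
    size_t top;              /* number of values on the stack */
    int lastPrinted;         /* kept only so the result can be inspected */
} SkMachine;

static bool skPush(SkMachine* m, int value) {
    if (m->top >= SK_STACK_MAX) return false; /* overflow */
    m->stack[m->top++] = value;
    return true;
}

static bool skPop(SkMachine* m, int* out) {
    if (m->top == 0) return false; /* underflow */
    *out = m->stack[--m->top];
    return true;
}

/* PUSH is followed by a one-byte operand. */
static bool execPushSk(SkMachine* m) {
    if (m->count >= m->length) return false;
    return skPush(m, m->bytecode[m->count++]);
}

static bool execAddSk(SkMachine* m) {
    int rhs, lhs;
    if (!skPop(m, &rhs) || !skPop(m, &lhs)) return false;
    return skPush(m, lhs + rhs);
}

static bool execPrintSk(SkMachine* m) {
    int value;
    if (!skPop(m, &value)) return false;
    printf("%d\n", value);
    m->lastPrinted = value;
    return true;
}

/* Returns true only if SK_OP_EOF was reached without any exec failing. */
static bool skRun(SkMachine* m) {
    assert(m != NULL && m->bytecode != NULL);
    while (m->count < m->length) {
        unsigned char opcode = m->bytecode[m->count++];
        switch (opcode) {
            case SK_OP_PUSH:
                if (!execPushSk(m)) return false;
                break;
            case SK_OP_ADD:
                if (!execAddSk(m)) return false;
                break;
            case SK_OP_PRINT:
                if (!execPrintSk(m)) return false;
                break;
            case SK_OP_EOF:
                return true;
            default:
                return false; /* unknown opcode */
        }
    }
    return false; /* ran off the end without an EOF opcode */
}
```

Feeding it the bytes PUSH 2, PUSH 3, ADD, PRINT, EOF pushes two values, replaces them with their sum, then pops and prints that sum, leaving the stack empty.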
Identity Crisis
As in most programming languages, variables can be represented by names specified by the programmer; in Toy, these are called "identifiers". These identifiers can be passed around in place of their actual values, but can't be used directly. To retrieve a value, you must first "parse" it, like so:
Toy_Literal idn = literal; //cache the literal, just in case it's an identifier
if (TOY_IS_IDENTIFIER(literal) && Toy_parseIdentifierToValue(interpreter, &literal)) { //if it is an identifier, parse it...
    Toy_freeLiteral(idn); //always remember to free the original identifier, otherwise you'll have a memory leak!
}
You will often see this pattern throughout the codebase.
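The reason the cached copy has to be freed is easier to see with a mock. The sketch below assumes that parsing overwrites the literal in place without releasing the identifier's own memory, which is what the caller-side free in the pattern suggests; the real internals may differ. All Sk-prefixed names are hypothetical, and the "scope" is a single hard-coded variable called answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SK_LIT_NULL, SK_LIT_INT, SK_LIT_IDENTIFIER } SkLitType;

/* Hypothetical literal: identifiers own a heap-allocated name. */
typedef struct {
    SkLitType type;
    int integer;
    char* name;
} SkLiteral;

static SkLiteral skMakeIdentifier(const char* name) {
    SkLiteral lit = { SK_LIT_IDENTIFIER, 0, NULL };
    lit.name = malloc(strlen(name) + 1);
    if (lit.name != NULL) {
        strcpy(lit.name, name);
    }
    return lit;
}

static void skFreeLiteral(SkLiteral lit) {
    if (lit.type == SK_LIT_IDENTIFIER) {
        free(lit.name);
    }
}

/* Stand-in scope lookup: only "answer" exists, with the value 42.
 * On success the literal is overwritten in place; the name buffer is NOT
 * freed here, so only the caller's cached copy can still reach it. */
static bool skParseIdentifierToValue(SkLiteral* lit) {
    assert(lit != NULL);
    if (lit->type != SK_LIT_IDENTIFIER || lit->name == NULL || strcmp(lit->name, "answer") != 0) {
        return false;
    }
    SkLiteral value = { SK_LIT_INT, 42, NULL };
    *lit = value;
    return true;
}

/* Takes ownership of `lit`, resolves it if needed, and returns its integer. */
static int skResolveToInt(SkLiteral lit, bool* ok) {
    SkLiteral idn = lit; /* cache the literal, in case it's an identifier */
    if (lit.type == SK_LIT_IDENTIFIER && skParseIdentifierToValue(&lit)) {
        skFreeLiteral(idn); /* the identifier is no longer reachable via lit */
    }

    *ok = (lit.type == SK_LIT_INT);
    int result = *ok ? lit.integer : 0;

    /* if parsing failed, lit is still the identifier and is freed exactly once here */
    skFreeLiteral(lit);
    return result;
}
```

Under this assumption, skipping the free of idn after a successful parse would leak the identifier's name, while freeing it after a failed parse would lead to a double free once lit itself is released.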
Other Utility Functions
Other functions are available at the top of the interpreter source file:
- printing utilities
- injection utilities
- parsing utilities
- bytecode utilities
- function utilities (these are at the very bottom of the source file)
Literals
TODO
Arrays & Dictionaries
TODO