mirror of
https://github.com/krgamestudios/Toy.git
synced 2026-04-15 14:54:07 +10:00
docs: spelling mistakes correction
@@ -4,7 +4,7 @@ Here you'll find some of the implementation details.

# Bytecode

-The output of Toy's compiler, and the input of the interpreter, is known as "bytecode". Here, I've attempted to fully document the layout of the canonical bytecode's structure, but since this was written after most of this was implemented, there may be small discrepencies present.
+The output of Toy's compiler, and the input of the interpreter, is known as "bytecode". Here, I've attempted to fully document the layout of the canonical bytecode's structure, but since this was written after most of this was implemented, there may be small discrepancies present.

There are four main sections of the bytecode:

@@ -26,7 +26,7 @@ The header consists of four values:
* TOY_VERSION_PATCH
* TOY_VERSION_BUILD

-The first three are single unsigned bytes, embedded at the beginning of the bytecode in sequence. These represent the major, minor and patch versions of the language. The fourth value is a null-terminated c-string of unspecified data, which is *intended* but not required to specify the time that the langauge's compiler was itself compiled. The build string can hold arbitrary data, such as the current maintainer's name, current fork of the language, or other versioning info.
+The first three are single unsigned bytes, embedded at the beginning of the bytecode in sequence. These represent the major, minor and patch versions of the language. The fourth value is a null-terminated c-string of unspecified data, which is *intended* but not required to specify the time that the language's compiler was itself compiled. The build string can hold arbitrary data, such as the current maintainer's name, current fork of the language, or other versioning info.
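
The header layout described above (three version bytes followed by a null-terminated build string) can be sketched roughly as follows. This is only an illustration, not Toy's actual loader: `HeaderSketch` and `readHeaderSketch` are hypothetical names invented for this example, and the bounds checks are an assumption about what a careful reader would want.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the four header values; not part of Toy's API. */
typedef struct {
	uint8_t major;
	uint8_t minor;
	uint8_t patch;
	const char* build; /* points into the bytecode buffer, not a copy */
} HeaderSketch;

/* Reads the header from the start of `bytecode`.
 * Returns the number of bytes consumed, or 0 if the buffer is too short
 * or the build string is not null-terminated within `length` bytes. */
static size_t readHeaderSketch(const unsigned char* bytecode, size_t length, HeaderSketch* out) {
	if (length < 4) {
		return 0; /* three version bytes plus at least the terminator */
	}

	out->major = bytecode[0];
	out->minor = bytecode[1];
	out->patch = bytecode[2];

	/* the build string starts at offset 3 and runs up to the first '\0' */
	const unsigned char* terminator = memchr(bytecode + 3, '\0', length - 3);
	if (terminator == NULL) {
		return 0;
	}

	out->build = (const char*)(bytecode + 3);
	return (size_t)(terminator - bytecode) + 1; /* include the terminator */
}
```

The returned byte count tells the caller where the next section of the bytecode would begin.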

There are some strict rules when interpreting these values (mimicking, but not conforming to [semver.org](https://semver.org/)):

@@ -43,33 +43,34 @@ The latest version information can be found in [toy_common.h](https://github.com

In Toy, a "Literal" is a value of some kind, be it an integer, or a dictionary, or even a variable name. Rather than embedding the same literal (potentially) many times within the bytecode, the "Literal Cache" was devised to act as an immutable, indexable repository of any literals needed. When bytecode is first loaded into the interpreter, the first thing that happens (after the header is parsed) is the reconstruction of the literal cache. The internal function `readInterpreterSections()` is responsible for this step.

-The first `unsigned short` to be read from this section is `literalCount`, which defines the number of literals which are to be read. Once all literals have been read out of this section, the opcode `TOY_OP_SECTION_END` is expected to be consumed. Some preprocessor macros can also enable or disable debug printing functionality within the repl.
+The first `unsigned short` to be read from this section is `literalCount`, which defines the number of literals which are to be read. Once all literals have been read out of this section, the opcode `TOY_OP_SECTION_END` is expected to be consumed. Some preprocessor macros can also enable or disable debug printing functionality within the REPL.

The list of valid literal types are:

+
### TOY_LITERAL_NULL

This literal is simply inserted into the literal cache when encountered.

### TOY_LITERAL_BOOLEAN

-This literal specifies that the next byte is it's value, either true or false.
+This literal specifies that the next byte is its value, either true or false.

### TOY_LITERAL_INTEGER

-This literal specifies that the next 4 bytes are it's value, interpreted as a 32-bit integer.
+This literal specifies that the next 4 bytes are its value, interpreted as a 32-bit integer.

### TOY_LITERAL_FLOAT

-This literal specifies that the next 4 bytes are it's value, interpreted as a 32-bit floating point integer.
+This literal specifies that the next 4 bytes are its value, interpreted as a 32-bit floating-point number.

### TOY_LITERAL_STRING

-This literal specifies that the next collection of null terminated bytes are it's value, interpreted as a null-terminated string.
+This literal specifies that the next collection of null terminated bytes are its value, interpreted as a null-terminated string.

### TOY_LITERAL_ARRAY_INTERMEDIATE

-`TOY_LITERAL_ARRAY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralArray`. A "flattened" compound literal does not actually store it's contents, only references to it's contents' positions within the literal cache.
+`TOY_LITERAL_ARRAY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralArray`. A "flattened" compound literal does not actually store its contents, only references to its contents' positions within the literal cache.

To read this array, you must first read an `unsigned short` which specifies the size, then read that many additional `unsigned shorts`, which are indices. Finally, the original `LiteralArray` can be reconstructed using those indices, in order.

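The flattened-array read described above (a size, then that many cache indices, resolved in order) might look something like the sketch below. Everything here is a stand-in: plain `int` values replace real literals, `readShortSketch` and `readFlattenedArraySketch` are hypothetical helpers, and the code assumes the `unsigned short` values are stored in the host's byte order, which the text above does not specify.

```c
#include <stddef.h>
#include <string.h>

/* Reads one unsigned short at *offset and advances the offset.
 * Assumption: values are stored in the host's byte order. */
static unsigned short readShortSketch(const unsigned char* bytecode, size_t* offset) {
	unsigned short value;
	memcpy(&value, bytecode + *offset, sizeof(value));
	*offset += sizeof(value);
	return value;
}

/* Rebuilds a flattened array from cache indices.
 * `cache` stands in for the literal cache; ints are used instead of real literals.
 * Returns the element count, or -1 if the array does not fit in `out`
 * or an index points outside the cache.
 * Bounds checking of the bytecode buffer itself is omitted for brevity. */
static int readFlattenedArraySketch(const unsigned char* bytecode, size_t* offset,
		const int* cache, size_t cacheCount, int* out, size_t capacity) {
	unsigned short size = readShortSketch(bytecode, offset);
	if (size > capacity) {
		return -1;
	}

	for (unsigned short i = 0; i < size; i++) {
		unsigned short index = readShortSketch(bytecode, offset);
		if (index >= cacheCount) {
			return -1;
		}
		out[i] = cache[index]; /* resolve the index, preserving order */
	}

	return (int)size;
}
```

Because the indices can only refer to entries already present in the cache, a compound literal has to appear after the literals it contains.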
@@ -77,7 +78,7 @@ As the final step, the newly reconstructed `LiteralArray` is added to the litera

### TOY_LITERAL_DICTIONARY_INTERMEDIATE

-`TOY_LITERAL_DICTIONARY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralDictionary`. A "flattened" compound literal does not actually store it's contents, only references to it's contents' positions within the literal cache.
+`TOY_LITERAL_DICTIONARY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralDictionary`. A "flattened" compound literal does not actually store its contents, only references to its contents' positions within the literal cache.

To read this dictionary, you must first read an `unsigned short` which specifies the size (both keys and values), then read that many additional `unsigned shorts`, which are indices of keys and values. Finally, the original `LiteralDictionary` can be reconstructed using those key and value indices.

@@ -85,13 +86,13 @@ As the final step, the newly reconstructed `LiteralDictionary` is added to the l

### TOY_LITERAL_FUNCTION

-When a `TOY_LITERAL_FUNCTION` is encountered, the next `unsigned short` to be read (the function index) should be converted into an integer literal, before having it's type manually changed to `TOY_LITERAL_FUNCTION_INTERMEDIATE` for storage within the literal cache.
+When a `TOY_LITERAL_FUNCTION` is encountered, the next `unsigned short` to be read (the function index) should be converted into an integer literal, before having its type manually changed to `TOY_LITERAL_FUNCTION_INTERMEDIATE` for storage within the literal cache.

Functions will be processed properly in a later step - so this literal is added to the cache as a placeholder until that point.

### TOY_LITERAL_IDENTIFIER

-This literal specifies that the next collection of null terminated bytes are it's value, interpreted as a null-terminated string.
+This literal specifies that the next collection of null terminated bytes are its value, interpreted as a null-terminated string.

### TOY_LITERAL_TYPE

@@ -103,7 +104,7 @@ This literal specifies that the next byte is the type of a literal, and the foll

This literal specifies that the next byte is the type of a literal, and the following byte is a boolean specifying const-ness.

-Then if the type is `TOY_LITERAL_ARRAY`, the following `unsigned short` is an index within the cache, representing the type of the contents.
+Then, if the type is `TOY_LITERAL_ARRAY`, the following `unsigned short` is an index within the cache, representing the type of the contents.

Otherwise, if the type is `TOY_LITERAL_DICTIONARY`, the following two `unsigned short`s are indices within the cache, representing the types of the keys and values.

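As a rough illustration of the type-literal layout above (a type byte, a const-ness byte, then zero, one or two cache indices), here is a minimal sketch. The `SKETCH_TYPE_*` tags are placeholder values chosen for this example and are not Toy's real enum values; `TypeSketch` and `readTypeSketch` are likewise hypothetical, and host byte order is assumed for the `unsigned short` indices.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder type tags; the real values live in Toy's headers and are NOT reproduced here. */
enum { SKETCH_TYPE_INTEGER = 1, SKETCH_TYPE_ARRAY = 2, SKETCH_TYPE_DICTIONARY = 3 };

/* Hypothetical decoded form of a type literal. */
typedef struct {
	unsigned char type;
	bool constant;
	unsigned short subtypes[2]; /* cache indices of the content types */
	int subtypeCount;           /* 0 for simple types, 1 for arrays, 2 for dictionaries */
} TypeSketch;

/* Decodes one type literal starting at `bytecode` and returns the bytes consumed.
 * Bounds checking of the buffer is omitted for brevity. */
static size_t readTypeSketch(const unsigned char* bytecode, TypeSketch* out) {
	size_t offset = 0;

	out->type = bytecode[offset++];          /* first byte: the literal type */
	out->constant = bytecode[offset++] != 0; /* second byte: const-ness */

	out->subtypeCount = 0;
	if (out->type == SKETCH_TYPE_ARRAY) {
		out->subtypeCount = 1; /* type of the contents */
	}
	else if (out->type == SKETCH_TYPE_DICTIONARY) {
		out->subtypeCount = 2; /* types of the keys and values */
	}

	for (int i = 0; i < out->subtypeCount; i++) {
		memcpy(&out->subtypes[i], bytecode + offset, sizeof(unsigned short));
		offset += sizeof(unsigned short);
	}

	return offset;
}
```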
@@ -188,6 +189,7 @@ TODO: finish these
|TOY_OP_PREFIX|256|Used internally.|
|TOY_OP_POSTFIX|257|Used internally.|

+
\*If this literal is an identifier, it is instead replaced with the correct given value from the current scope.
\*\*On failure, the script will print an error message to the error output and exit.

@@ -216,11 +218,11 @@ There are four main functions for running the interpreter:
* `Toy_resetInterpreter`
* `Toy_freeInterpreter`

-First, `init` zeroes out the interpreter, sets up the printing functions, and delegates to `reset`, which in turn sets up the program's scope (and injects the default global functions). The initialization function is split into two this way so that `reset` can be used independantly on a "dirty" interpreter to ready it for another script (or another run of the same script). `reset` is usually not needed and may be removed in future.
+First, `init` zeroes out the interpreter, sets up the printing functions, and delegates to `reset`, which in turn sets up the program's scope (and injects the default global functions). The initialization function is split into two this way so that `reset` can be used independently on a "dirty" interpreter to ready it for another script (or another run of the same script). `reset` is usually not needed and may be removed in future.

`free` simply frees the interpreter after execution.

-Interestingly, `run` doesn't jump straight into exection. Instead, it first does it's own bit of setup, before reading out the bytecode's header. If the header indicates an incompatible version, then the interpreter will refuse to run, to prevent mistakes from ruining the program.
+Interestingly, `run` doesn't jump straight into execution. Instead, it first does its own bit of setup, before reading out the bytecode's header. If the header indicates an incompatible version, then the interpreter will refuse to run, to prevent mistakes from ruining the program.

`run` will also delegate to a function called `readInterpreterSections()`, which reads and reconstructs the "literalCache" - a collection of all values within the program (variable identifiers, variable values, function bytecode, etc.)

@@ -230,39 +232,39 @@ Finally, `run` will automatically free the bytecode and associated literalCache

## Executing the Interpreter

-Opcodes within the bytecode are 1 byte in length, and specify a single action to take. Each possible action is definied within the interpreter in a function that begins with `exec`, and are called from within a big looping switch statement. If any of these `exec` functions encounters an error, they can simply return false to break the loop.
+Opcodes within the bytecode are 1 byte in length, and specify a single action to take. Each possible action is defined within the interpreter in a function that begins with `exec`, and these functions are called from within a big looping switch statement. If any of these `exec` functions encounters an error, it can simply return false to break the loop.

-The interpeter is stack-based; most, if not all of the actions are preformed on literals within a specially designated array called `stack`. for example:
+The interpreter is stack-based; most, if not all, of the actions are performed on literals within a specially designated array called `stack`. For example:

```c
case TOY_OP_PRINT:
	if (!execPrint(interpreter)) {
		return;
	}
	break;
```

-When a the opcode `TOY_OP_PRINT` is encountered, the top literal within the stack is popped off, and printed (more info on literals below).
+When the opcode `TOY_OP_PRINT` is encountered, the top literal within the stack is popped off, and printed (more info on literals below).

```c
static bool execPrint(Toy_Interpreter* interpreter) {
	//get the top literal
	Toy_Literal lit = Toy_popLiteralArray(&interpreter->stack);

	//if the top literal is an identifier, get its value
	Toy_Literal idn = lit;
	if (TOY_IS_IDENTIFIER(lit) && Toy_parseIdentifierToValue(interpreter, &lit)) {
		Toy_freeLiteral(idn);
	}

	//print as a string to the current print method
	Toy_printLiteralCustom(lit, interpreter->printOutput);

	//free the literal
	Toy_freeLiteral(lit);

	//continue the loop
	return true;
}
```

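To show the overall shape of the "big looping switch statement" in isolation, here is a deliberately tiny stack machine. It is not Toy's interpreter: the opcodes, `MachineSketch`, and the `exec*Sketch` functions are all invented for this example, and the stack holds plain `int` values rather than literals. The only part meant to mirror the text above is the pattern of one `exec` function per opcode, each returning `false` to break the loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes for this sketch only; each opcode is one byte. */
enum { SKETCH_OP_EOF = 0, SKETCH_OP_PUSH = 1, SKETCH_OP_ADD = 2 };

#define SKETCH_STACK_SIZE 64

typedef struct {
	int stack[SKETCH_STACK_SIZE];
	size_t count;
} MachineSketch;

/* Pushes a value; fails if the stack is full. */
static bool execPushSketch(MachineSketch* machine, int value) {
	if (machine->count >= SKETCH_STACK_SIZE) {
		return false;
	}
	machine->stack[machine->count++] = value;
	return true;
}

/* Pops two values and pushes their sum; fails on stack underflow. */
static bool execAddSketch(MachineSketch* machine) {
	if (machine->count < 2) {
		return false;
	}
	int rhs = machine->stack[--machine->count];
	int lhs = machine->stack[--machine->count];
	machine->stack[machine->count++] = lhs + rhs;
	return true;
}

/* Runs the program; returns false as soon as any exec function fails. */
static bool runSketch(MachineSketch* machine, const unsigned char* program, size_t length) {
	size_t pc = 0;
	while (pc < length) {
		unsigned char opcode = program[pc++];
		switch (opcode) {
			case SKETCH_OP_PUSH:
				/* the operand is the single byte following the opcode */
				if (pc >= length || !execPushSketch(machine, program[pc++])) {
					return false;
				}
				break;

			case SKETCH_OP_ADD:
				if (!execAddSketch(machine)) {
					return false;
				}
				break;

			case SKETCH_OP_EOF:
				return true;

			default:
				return false; /* unknown opcode */
		}
	}
	return true;
}
```

Running the byte sequence `PUSH 2, PUSH 3, ADD, EOF` through `runSketch` leaves a single value, 5, on the stack.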
@@ -273,7 +275,7 @@ As in most programming languages, variables can be represented by names specifie
```c
Toy_Literal idn = literal; //cache the literal, just in case it's an identifier
if (TOY_IS_IDENTIFIER(literal) && Toy_parseIdentifierToValue(interpreter, &literal)) { //if it is an identifier, parse it...
	Toy_freeLiteral(idn); //always remember to free the original identifier, otherwise you'll have a memory leak!
}
```

@@ -287,7 +289,7 @@ Other functions are available at the top of the interpreter source file:
* injection utilities
* parsing utilities
* bytecode utilities
-* function utilities (these ones is at the very bottom of the source file)
+* function utilities (these are at the very bottom of the source file)

# Literals
