docs: spelling mistakes correction

Julien Higginson
2025-01-11 14:22:55 +01:00
parent 575003433f
commit f0e8bfeb02
30 changed files with 201 additions and 175 deletions


@@ -1,3 +1,4 @@
# Building Toy
This tutorial assumes you're using git, GCC, and make.
@@ -10,22 +11,22 @@ Toy's makefile uses the exported variable `TOY_OUTDIR` to define where the outpu
export TOY_OUTDIR = out
```
Next, you'll want to run make the from within Toy's `source`, assuming the output directory has been created. There are two options for building Toy - `library` (default) or `static`; the former will create a shared library (and a .dll file on windows), while the latter will create a static library.
Next, you'll want to run make from within Toy's `source` directory, assuming the output directory has been created. There are two options for building Toy - `library` (default) or `static`; the former will create a shared library (and a .dll file on Windows), while the latter will create a static library.
```make
toy: $(OUTDIR)
$(MAKE) -C Toy/source
$(MAKE) -C Toy/source
$(OUTDIR):
mkdir $(OUTDIR)
mkdir $(OUTDIR)
```
Finally, link against the outputted library, with the source directory as the location of the header files.
```make
all: $(OBJ) toy
$(CC) $(CFLAGS) -o $(OUT) $(OBJ) -L$(TOY_OUTDIR) -ltoy
$(CC) $(CFLAGS) -o $(OUT) $(OBJ) -L$(TOY_OUTDIR) -ltoy
```
These snippets of makefile are only an example - the repository has a more fully featured set of makefiles which can also produce a usable repl program.
These snippets of makefile are only an example - the repository has a more fully featured set of makefiles which can also produce a usable REPL program.


@@ -1,6 +1,7 @@
# Compiling Toy
This tutorial is a sub-section of [Embedding Toy](deep-dive/embedding-toy) that has been spun off into it's own page for the sake of brevity/sanity. It's recommended that you read the main article first.
This tutorial is a subsection of [Embedding Toy](deep-dive/embedding-toy) that has been spun off into its own page for the sake of brevity/sanity. It's recommended that you read the main article first.
The exact phases outlined here are entirely implementation-dependent - that is, they aren't required, and are simply how the canonical implementation of Toy works.
@@ -69,7 +70,7 @@ Toy_freeCompiler(&compiler);
The writing step is the process in which AST nodes are compressed into bytecode instructions, while literal values are extracted and placed aside in a cache (usually in a compressed, intermediate state).
The collation phase, however is when the bytecode instructions, along with the now flattened intermediate literals and function bodies are combined. The bytecode header specified in [Developing Toy](deep-dive/developing-toy) is placed at the beginning of this blob of bytes during this step.
The collation phase, however, is when the bytecode instructions, along with the now-flattened intermediate literals and function bodies, are combined. The bytecode header specified in [Developing Toy](deep-dive/developing-toy) is placed at the beginning of this blob of bytes during this step.
The Toy bytecode (abbreviated to `tb`), along with the `size` variable indicating the size of the bytecode, are the result of the compilation. This bytecode can be saved into a file for later consumption by the host at runtime - you must ensure that any bytecode files have the `.tb` extension.
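As a rough illustration of saving the result for later, here is a minimal sketch in plain C. The helpers `has_tb_extension` and `save_bytecode` are hypothetical names made up for this example (they are not part of Toy's API), and the sketch assumes the bytecode is held as an `unsigned char` buffer with its length in `size`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

//hypothetical helper (not part of Toy's API): true if `path` ends in ".tb"
static bool has_tb_extension(const char* path) {
    size_t len = strlen(path);
    return len > 3 && strcmp(path + len - 3, ".tb") == 0;
}

//hypothetical helper: writes `size` bytes of bytecode to `path`
//returns false if the name lacks the ".tb" extension or the write fails
static bool save_bytecode(const char* path, const unsigned char* tb, size_t size) {
    assert(path != NULL && tb != NULL);

    if (!has_tb_extension(path)) {
        return false;
    }

    FILE* file = fopen(path, "wb"); //binary mode, so bytes are written untouched
    if (file == NULL) {
        return false;
    }

    size_t written = fwrite(tb, 1, size, file);

    //always close the file, then report whether every byte was written
    return (fclose(file) == 0) && (written == size);
}
```

Refusing any other extension up front keeps the `.tb` rule in one place, rather than relying on every caller to remember it.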


@@ -4,7 +4,7 @@ Here you'll find some of the implementation details.
# Bytecode
The output of Toy's compiler, and the input of the interpreter, is known as "bytecode". Here, I've attempted to fully document the layout of the canonical bytecode's structure, but since this was written after most of this was implemented, there may be small discrepencies present.
The output of Toy's compiler, and the input of the interpreter, is known as "bytecode". Here, I've attempted to fully document the layout of the canonical bytecode's structure, but since this was written after most of this was implemented, there may be small discrepancies present.
There are four main sections of the bytecode:
@@ -26,7 +26,7 @@ The header consists of four values:
* TOY_VERSION_PATCH
* TOY_VERSION_BUILD
The first three are single unsigned bytes, embedded at the beginning of the bytecode in sequence. These represent the major, minor and patch versions of the language. The fourth value is a null-terminated c-string of unspecified data, which is *intended* but not required to specify the time that the langauge's compiler was itself compiled. The build string can hold arbitrary data, such as the current maintainer's name, current fork of the language, or other versioning info.
The first three are single unsigned bytes, embedded at the beginning of the bytecode in sequence. These represent the major, minor and patch versions of the language. The fourth value is a null-terminated c-string of unspecified data, which is *intended* but not required to specify the time that the language's compiler was itself compiled. The build string can hold arbitrary data, such as the current maintainer's name, current fork of the language, or other versioning info.
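The header layout described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `BytecodeHeader` and `read_header` are hypothetical names invented for this example, not types or functions from Toy's headers, and the real interpreter may read the header differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

//hypothetical struct for illustration; not a type from Toy's headers
typedef struct BytecodeHeader {
    unsigned char major;
    unsigned char minor;
    unsigned char patch;
    const char* build; //points into the bytecode buffer, not a copy
    size_t length;     //total header size in bytes, including the terminator
} BytecodeHeader;

//reads the header from the front of `tb`
//returns 0 on success, or -1 if the buffer is too short or the build string is unterminated
static int read_header(const unsigned char* tb, size_t size, BytecodeHeader* out) {
    assert(tb != NULL && out != NULL);

    //three version bytes, plus at least the build string's terminator
    if (size < 4) {
        return -1;
    }

    //find the end of the build string without running past the buffer
    const unsigned char* end = memchr(tb + 3, '\0', size - 3);
    if (end == NULL) {
        return -1;
    }

    out->major = tb[0];
    out->minor = tb[1];
    out->patch = tb[2];
    out->build = (const char*)(tb + 3);
    out->length = (size_t)(end - tb) + 1;

    return 0;
}
```

The `length` field tells the caller where the next section of the bytecode begins.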
There are some strict rules when interpreting these values (mimicking, but not conforming to [semver.org](https://semver.org/)):
@@ -43,33 +43,34 @@ The latest version information can be found in [toy_common.h](https://github.com
In Toy, a "Literal" is a value of some kind, be it an integer, or a dictionary, or even a variable name. Rather than embedding the same literal (potentially) many times within the bytecode, the "Literal Cache" was devised to act as an immutable, indexable repository of any literals needed. When bytecode is first loaded into the interpreter, the first thing that happens (after the header is parsed) is the reconstruction of the literal cache. The internal function `readInterpreterSections()` is responsible for this step.
The first `unsigned short` to be read from this section is `literalCount`, which defines the number of literals which are to be read. Once all literals have been read out of this section, the opcode `TOY_OP_SECTION_END` is expected to be consumed. Some preprocessor macros can also enable or disable debug printing functionality within the repl.
The first `unsigned short` to be read from this section is `literalCount`, which defines the number of literals which are to be read. Once all literals have been read out of this section, the opcode `TOY_OP_SECTION_END` is expected to be consumed. Some preprocessor macros can also enable or disable debug printing functionality within the REPL.
The valid literal types are:
### TOY_LITERAL_NULL
This literal is simply inserted into the literal cache when encountered.
### TOY_LITERAL_BOOLEAN
This literal specifies that the next byte is it's value, either true or false.
This literal specifies that the next byte is its value, either true or false.
### TOY_LITERAL_INTEGER
This literal specifies that the next 4 bytes are it's value, interpreted as a 32-bit integer.
This literal specifies that the next 4 bytes are its value, interpreted as a 32-bit integer.
### TOY_LITERAL_FLOAT
This literal specifies that the next 4 bytes are it's value, interpreted as a 32-bit floating point integer.
This literal specifies that the next 4 bytes are its value, interpreted as a 32-bit floating-point number.
### TOY_LITERAL_STRING
This literal specifies that the next collection of null terminated bytes are it's value, interpreted as a null-terminated string.
This literal specifies that the next collection of null-terminated bytes is its value, interpreted as a null-terminated string.
### TOY_LITERAL_ARRAY_INTERMEDIATE
`TOY_LITERAL_ARRAY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralArray`. A "flattened" compound literal does not actually store it's contents, only references to it's contents' positions within the literal cache.
`TOY_LITERAL_ARRAY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralArray`. A "flattened" compound literal does not actually store its contents, only references to its contents' positions within the literal cache.
To read this array, you must first read an `unsigned short` which specifies the size, then read that many additional `unsigned shorts`, which are indices. Finally, the original `LiteralArray` can be reconstructed using those indices, in order.
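The reading steps above can be sketched as follows. This is a simplified illustration, not the interpreter's actual code: plain `int` values stand in for `Toy_Literal`, the names `read_short` and `read_flat_array` are hypothetical, and the sketch assumes each `unsigned short` is stored in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

//hypothetical reader: copies one unsigned short out of the buffer and advances the cursor
//assumes the value was written in the host's byte order
static unsigned short read_short(const unsigned char* tb, size_t* cursor) {
    assert(tb != NULL && cursor != NULL);

    unsigned short value;
    memcpy(&value, tb + *cursor, sizeof value);
    *cursor += sizeof value;
    return value;
}

//rebuilds a flattened array from indices into `cache` (ints stand in for Toy_Literal)
//returns the number of elements written, or -1 if an index is out of range or `out` is too small
static int read_flat_array(const unsigned char* tb, size_t* cursor,
                           const int* cache, size_t cacheCount,
                           int* out, size_t outCapacity) {
    //the first short is the number of indices that follow
    unsigned short count = read_short(tb, cursor);
    if (count > outCapacity) {
        return -1;
    }

    //each following short is a position within the literal cache
    for (unsigned short i = 0; i < count; i++) {
        unsigned short index = read_short(tb, cursor);
        if (index >= cacheCount) {
            return -1;
        }
        out[i] = cache[index]; //resolve the reference, preserving order
    }

    return (int)count;
}
```

Because only indices are stored, every element must already be in the cache before the array that refers to it is read.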
@@ -77,7 +78,7 @@ As the final step, the newly reconstructed `LiteralArray` is added to the litera
### TOY_LITERAL_DICTIONARY_INTERMEDIATE
`TOY_LITERAL_DICTIONARY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralDictionary`. A "flattened" compound literal does not actually store it's contents, only references to it's contents' positions within the literal cache.
`TOY_LITERAL_DICTIONARY_INTERMEDIATE` specifies that the literal to be read is a flattened `LiteralDictionary`. A "flattened" compound literal does not actually store its contents, only references to its contents' positions within the literal cache.
To read this dictionary, you must first read an `unsigned short` which specifies the size (both keys and values), then read that many additional `unsigned shorts`, which are indices of keys and values. Finally, the original `LiteralDictionary` can be reconstructed using those key and value indices.
@@ -85,13 +86,13 @@ As the final step, the newly reconstructed `LiteralDictionary` is added to the l
### TOY_LITERAL_FUNCTION
When a `TOY_LITERAL_FUNCTION` is encountered, the next `unsigned short` to be read (the function index) should be converted into an integer literal, before having it's type manually changed to `TOY_LITERAL_FUNCTION_INTERMEDIATE` for storage within the literal cache.
When a `TOY_LITERAL_FUNCTION` is encountered, the next `unsigned short` to be read (the function index) should be converted into an integer literal, before having its type manually changed to `TOY_LITERAL_FUNCTION_INTERMEDIATE` for storage within the literal cache.
Functions will be processed properly in a later step - so this literal is added to the cache as a placeholder until that point.
### TOY_LITERAL_IDENTIFIER
This literal specifies that the next collection of null terminated bytes are it's value, interpreted as a null-terminated string.
This literal specifies that the next collection of null-terminated bytes is its value, interpreted as a null-terminated string.
### TOY_LITERAL_TYPE
@@ -103,7 +104,7 @@ This literal specifies that the next byte is the type of a literal, and the foll
This literal specifies that the next byte is the type of a literal, and the following byte is a boolean specifying const-ness.
Then if the type is `TOY_LITERAL_ARRAY`, the following `unsigned short` is an index within the cache, representing the type of the contents.
Then, if the type is `TOY_LITERAL_ARRAY`, the following `unsigned short` is an index within the cache, representing the type of the contents.
Otherwise, if the type is `TOY_LITERAL_DICTIONARY`, the following two `unsigned short`s are indices within the cache, representing the types of the keys and values.
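The layout of a type literal, as described above, can be sketched like this. The `KIND_*` tags use placeholder values chosen for this example (the real values are defined in Toy's source), and `TypeInfo`, `read_short` and `read_type` are hypothetical names; the sketch also assumes host byte order for each `unsigned short`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

//placeholder tags for illustration only; these are not Toy's real values
enum { KIND_INTEGER = 2, KIND_ARRAY = 5, KIND_DICTIONARY = 6 };

//hypothetical struct for illustration; not a type from Toy's headers
typedef struct TypeInfo {
    unsigned char kind;
    bool constant;
    unsigned short subtypes[2]; //cache indices: [0] is contents or keys, [1] is values
    int subtypeCount;
} TypeInfo;

//copies one unsigned short out of the buffer and advances the cursor (host byte order assumed)
static unsigned short read_short(const unsigned char* tb, size_t* cursor) {
    unsigned short value;
    memcpy(&value, tb + *cursor, sizeof value);
    *cursor += sizeof value;
    return value;
}

static TypeInfo read_type(const unsigned char* tb, size_t* cursor) {
    assert(tb != NULL && cursor != NULL);

    TypeInfo info = {0};

    //one byte for the type, one byte for const-ness
    info.kind = tb[(*cursor)++];
    info.constant = tb[(*cursor)++] != 0;

    //compound types are followed by cache indices describing their contents
    if (info.kind == KIND_ARRAY) {
        info.subtypeCount = 1;
    }
    else if (info.kind == KIND_DICTIONARY) {
        info.subtypeCount = 2;
    }

    for (int i = 0; i < info.subtypeCount; i++) {
        info.subtypes[i] = read_short(tb, cursor);
    }

    return info;
}
```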
@@ -188,6 +189,7 @@ TODO: finish these
|TOY_OP_PREFIX|256|Used internally.|
|TOY_OP_POSTFIX|257|Used internally.|
\*If this literal is an identifier, it is instead replaced with the correct given value from the current scope.
\*\*On failure, the script will print an error message to the error output and exit.
@@ -216,11 +218,11 @@ There are four main functions for running the interpreter:
* `Toy_resetInterpreter`
* `Toy_freeInterpreter`
First, `init` zeroes out the interpreter, sets up the printing functions, and delegates to `reset`, which in turn sets up the program's scope (and injects the default global functions). The initialization function is split into two this way so that `reset` can be used independantly on a "dirty" interpreter to ready it for another script (or another run of the same script). `reset` is usually not needed and may be removed in future.
First, `init` zeroes out the interpreter, sets up the printing functions, and delegates to `reset`, which in turn sets up the program's scope (and injects the default global functions). The initialization function is split into two this way so that `reset` can be used independently on a "dirty" interpreter to ready it for another script (or another run of the same script). `reset` is usually not needed and may be removed in future.
`free` simply frees the interpreter after execution.
Interestingly, `run` doesn't jump straight into exection. Instead, it first does it's own bit of setup, before reading out the bytecode's header. If the header indicates an incompatible version, then the interpreter will refuse to run, to prevent mistakes from ruining the program.
Interestingly, `run` doesn't jump straight into execution. Instead, it first does its own bit of setup, before reading out the bytecode's header. If the header indicates an incompatible version, then the interpreter will refuse to run, to prevent mistakes from ruining the program.
`run` will also delegate to a function called `readInterpreterSections()`, which reads and reconstructs the "literalCache" - a collection of all values within the program (variable identifiers, variable values, function bytecode, etc.)
@@ -230,39 +232,39 @@ Finally, `run` will automatically free the bytecode and associated literalCache
## Executing the Interpreter
Opcodes within the bytecode are 1 byte in length, and specify a single action to take. Each possible action is definied within the interpreter in a function that begins with `exec`, and are called from within a big looping switch statement. If any of these `exec` functions encounters an error, they can simply return false to break the loop.
Opcodes within the bytecode are 1 byte in length, and specify a single action to take. Each possible action is defined within the interpreter in a function whose name begins with `exec`, and these functions are called from within a big looping switch statement. If any of these `exec` functions encounters an error, it can simply return false to break the loop.
The interpeter is stack-based; most, if not all of the actions are preformed on literals within a specially designated array called `stack`. for example:
The interpreter is stack-based; most, if not all, of the actions are performed on literals within a specially designated array called `stack`. For example:
```c
case TOY_OP_PRINT:
if (!execPrint(interpreter)) {
return;
}
break;
case TOY_OP_PRINT:
if (!execPrint(interpreter)) {
return;
}
break;
```
When a the opcode `TOY_OP_PRINT` is encountered, the top literal within the stack is popped off, and printed (more info on literals below).
When the opcode `TOY_OP_PRINT` is encountered, the top literal within the stack is popped off, and printed (more info on literals below).
```c
static bool execPrint(Toy_Interpreter* interpreter) {
//get the top literal
Toy_Literal lit = Toy_popLiteralArray(&interpreter->stack);
//get the top literal
Toy_Literal lit = Toy_popLiteralArray(&interpreter->stack);
//if the top literal is an identifier, get it's value
Toy_Literal idn = lit;
if (TOY_IS_IDENTIFIER(lit) && Toy_parseIdentifierToValue(interpreter, &lit)) {
Toy_freeLiteral(idn);
}
//if the top literal is an identifier, get its value
Toy_Literal idn = lit;
if (TOY_IS_IDENTIFIER(lit) && Toy_parseIdentifierToValue(interpreter, &lit)) {
Toy_freeLiteral(idn);
}
//print as a string to the current print method
Toy_printLiteralCustom(lit, interpreter->printOutput);
//print as a string to the current print method
Toy_printLiteralCustom(lit, interpreter->printOutput);
//free the literal
Toy_freeLiteral(lit);
//free the literal
Toy_freeLiteral(lit);
//continue the loop
return true;
//continue the loop
return true;
}
```
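To show the overall shape of a stack-based dispatch loop in isolation, here is a tiny self-contained sketch. It is not Toy's interpreter: the `OP_*` opcodes, the `MiniVM` struct and its `exec` functions are all hypothetical and exist only for this example, with plain `int` values standing in for literals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

//hypothetical opcodes for illustration only; these are not Toy's real opcode values
enum { OP_EOF = 0, OP_PUSH = 1, OP_ADD = 2 };

typedef struct MiniVM {
    int stack[16];
    int count;
} MiniVM;

//pushes a value, failing if the stack is full
static bool execPush(MiniVM* vm, int value) {
    if (vm->count >= 16) {
        return false;
    }
    vm->stack[vm->count++] = value;
    return true;
}

//pops two values and pushes their sum, failing if fewer than two are available
static bool execAdd(MiniVM* vm) {
    if (vm->count < 2) {
        return false;
    }
    int rhs = vm->stack[--vm->count];
    int lhs = vm->stack[--vm->count];
    return execPush(vm, lhs + rhs);
}

//runs until OP_EOF, the end of the buffer, or the first failing exec function
//returns true only if OP_EOF was reached
static bool run(MiniVM* vm, const unsigned char* code, size_t size) {
    assert(vm != NULL && code != NULL);

    size_t pc = 0;
    while (pc < size) {
        unsigned char opcode = code[pc++];

        switch (opcode) {
            case OP_PUSH:
                //the operand is the byte following the opcode
                if (pc >= size || !execPush(vm, code[pc++])) {
                    return false;
                }
                break;

            case OP_ADD:
                if (!execAdd(vm)) {
                    return false;
                }
                break;

            case OP_EOF:
                return true;

            default:
                return false; //unknown opcode
        }
    }

    return false; //ran off the end without seeing OP_EOF
}
```

Each `exec` function reports failure through its return value, so the loop itself stays a flat list of cases.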
@@ -273,7 +275,7 @@ As in most programming languages, variables can be represented by names specifie
```c
Toy_Literal idn = literal; //cache the literal, just in case it's an identifier
if (TOY_IS_IDENTIFIER(literal) && Toy_parseIdentifierToValue(interpreter, &literal)) { //if it is an identifier, parse it...
Toy_freeLiteral(idn); //always remember to free the original identifier, otherwise you'll have a memory leak!
Toy_freeLiteral(idn); //always remember to free the original identifier, otherwise you'll have a memory leak!
}
```
@@ -287,7 +289,7 @@ Other functions are available at the top of the interpreter source file:
* injection utilities
* parsing utilities
* bytecode utilities
* function utilities (these ones is at the very bottom of the source file)
* function utilities (these are at the very bottom of the source file)
# Literals


@@ -1,3 +1,4 @@
# Embedding Toy
This tutorial assumes that you've managed to embed Toy into your program by following the tutorial [Building Toy](deep-dive/building-toy).
@@ -18,7 +19,7 @@ The functions intended for usage by the API are prepended with the C macro `TOY_
## Structures Used Throughout Toy
The main unit of data within Toy's internals is `Toy_Literal`, which can contain any value that can exist within the Toy langauge - even identifiers. The exact implementation of `Toy_Literal` may change or evolve as time goes on, so it's recommended that you only interact with literals directly by using the macros and functions outlined [above](#embedded-api-macros). See the [types](getting-started/types) page for information on exactly what datatypes exist in Toy.
The main unit of data within Toy's internals is `Toy_Literal`, which can contain any value that can exist within the Toy language - even identifiers. The exact implementation of `Toy_Literal` may change or evolve as time goes on, so it's recommended that you only interact with literals directly by using the macros and functions outlined [above](#embedded-api-macros). See the [types](getting-started/types) page for information on exactly what data types exist in Toy.
There are two main "compound structures" used within Toy's internals - the `Toy_LiteralArray` and `Toy_LiteralDictionary`. The former is an array of `Toy_Literal` instances stored sequentially in memory for fast lookups, while the latter is a key-value hashmap designed for efficient lookups based on a `Toy_Literal` key. These are both accessible via the language as well.
@@ -61,23 +62,23 @@ Hooks can simply inject native functions into the current scope, or they can do
```c
//a utility structure for storing the native C functions
typedef struct Natives {
char* name;
Toy_NativeFn fn;
char* name;
Toy_NativeFn fn;
} Natives;
int Toy_hookStandard(Toy_Interpreter* interpreter, Toy_Literal identifier, Toy_Literal alias) {
//the list of available native C functions that can be called from Toy
Natives natives[] = {
{"clock", nativeClock},
{NULL, NULL}
};
//the list of available native C functions that can be called from Toy
Natives natives[] = {
{"clock", nativeClock},
{NULL, NULL}
};
//inject each native C functions into the current scope
for (int i = 0; natives[i].name; i++) {
Toy_injectNativeFn(interpreter, natives[i].name, natives[i].fn);
}
//inject each native C function into the current scope
for (int i = 0; natives[i].name; i++) {
Toy_injectNativeFn(interpreter, natives[i].name, natives[i].fn);
}
return 0;
return 0;
}
```
@@ -90,7 +91,7 @@ TOY_API bool Toy_callLiteralFn(Toy_Interpreter* interpreter, Toy_Literal func, T
TOY_API bool Toy_callFn (Toy_Interpreter* interpreter, char* name, Toy_LiteralArray* arguments, Toy_LiteralArray* returns);
```
The first argument must be an interpreter. The third argument is a pointer to a `Toy_LiteralArray` containing a list of arguments to pass to the function, and the fourth is a pointer to a `Toy_LiteralArray` where the return values can be stored (an array is used here for a potential future feature). The contents of the argument array is consumed and left in an indeterminate state (but is safe to free), while the returns array always has one value - if the function did not return a value, then it contains a `null` literal.
The first argument must be an interpreter. The third argument is a pointer to a `Toy_LiteralArray` containing a list of arguments to pass to the function, and the fourth is a pointer to a `Toy_LiteralArray` where the return values can be stored (an array is used here for a potential future feature). The contents of the argument array are consumed and left in an indeterminate state (but are safe to free), while the returns array always has one value - if the function did not return a value, then it contains a `null` literal.
The second arguments to these functions are either the function to be called as a `Toy_Literal`, or the name of the function within the interpreter's scope. The latter API simply finds the specified `Toy_Literal` if it exists and calls the former. As with most APIs, these return `false` if something went wrong.


@@ -1,8 +1,9 @@
# Roadmapping Toy
## Game And Game Engine
The Toy programming langauge was designed from the beginning as though it was supposed to be embedded into an imaginary game engine. Development on said engine and an associated game have proceeded smoothly so far.
The Toy programming language was designed from the beginning as though it was supposed to be embedded into an imaginary game engine. Development on said engine and an associated game has proceeded smoothly so far.
## Microprocessor Support
@@ -22,7 +23,7 @@ Some of these have always been planned, but were sidelined or are incomplete for
## Nope Features
Some things which simply will not be added in the foreseeable future are:
Some things that simply will not be added in the foreseeable future are:
* Classes & Structures
* Do-while loops


@@ -1,5 +1,5 @@
# Testing Toy
Toy uses GitHub CI/CD for comprehensive automated testing - however, all of the tests are under `test/`, and can be executed by running `make test`. Doing so on linux will attempt to use valgrind; to disable using valgrind, pass in `DISABLE_VALGRIND=true` as an environment variable. GitHub CI also has access to the option `make test-sanitized` which attempts to use memory sanitation.
Toy uses GitHub CI/CD for comprehensive automated testing - however, all the tests are under `test/`, and can be executed by running `make test`. Doing so on Linux will attempt to use valgrind; to disable using valgrind, pass in `DISABLE_VALGRIND=true` as an environment variable. GitHub CI also has access to the option `make test-sanitized` which attempts to use memory sanitization.
The tests consist of a number of different situations and edge cases which have been discovered, and should probably be thoroughly tested one way or another. There are also several "-bugfix.toy" scripts which explicitly test a bug that has been encountered in one way or another, to prevent regressions. The libs that are stored in `repl/` are also tested - their tests are under `/tests/scripts/lib`; some error cases are also checked by the mustfail tests in `/test/scripts/mustfail`.
The tests consist of a number of different situations and edge cases that have been discovered, and should probably be thoroughly tested one way or another. There are also several "-bugfix.toy" scripts that explicitly test a bug that has been encountered in one way or another to prevent regressions. The libs that are stored in `repl/` are also tested - their tests are under `/tests/scripts/lib`; some error cases are also checked by the mustfail tests in `/test/scripts/mustfail`.


@@ -1,24 +1,23 @@
# Theorizing Toy
Sooner or later, every coder will try to create their own programming language. In my case, it took me over a decade and a half to realize that was even an option, but once I did I read through a fantastic book called [Crafting Interpreters](https://craftinginterpreters.com/). This sent me down the rabbit hole, so to speak.
Sooner or later, every coder will try to create their own programming language. In my case, it took me over a decade and a half to realize that was even an option, but once I did, I read through a fantastic book called [Crafting Interpreters](https://craftinginterpreters.com/). This sent me down the rabbit hole, so to speak.
The main driving idea behind the Toy programming langauge has remained the same from the very beginning - I wanted a scripting language that could be embedded into a larger host program, to allow for easy modification by the end user. Specifically, I wanted to enable easy modding of video games made in an imaginary game engine.
The main driving idea behind the Toy programming language has remained the same from the very beginning - I wanted a scripting language that could be embedded into a larger host program to allow for easy modification by the end user. Specifically, I wanted to enable easy modding of video games made in an imaginary game engine.
At the time of writing, I've started working on said engine, building it around Toy, and adjusting Toy to fit the engine as needed. I've also begun working on a game within that engine, as I believe the best way to build an engine is to build a game with it first. The engine has been dubbed "Box", and the game is called "Skylands".
But this post isn't about the engine, it's about Toy - I want to explain, in some detail, my thought processes when developing it. Let's start at the beginning.
But this post isn't about the engine; it's about Toy - I want to explain, in some detail, my thought processes when developing it. Let's start at the beginning.
```toy
print "Hello world";
```
I've drawn the `print` keyword from Crafting Interpreter's Lox language, for much the same reason as explained in the book - it's a simple and easy way to debug issues. You'll be able to print out any kind of value or variable from this statement - but it loses some context like function implementations, and the values of `opaque` literals.
I've drawn the `print` keyword from Crafting Interpreters' Lox language, for much the same reason as explained in the book - it's a simple and easy way to debug issues. You'll be able to print out any kind of value or variable from this statement - but it loses some context, like function implementations and the values of `opaque` literals.
Let's touch on variables quickly - There's about a dozen variable types that can be used, depending on how you count them. They include `bool`, `int`, `float`, `string` and a couple of compound types - but strict typing in Toy is completely optional (`any` is used by default). There are also functions, which are reusable chunks of code, and a pretty standard set of operators with their traditional precedences.
Let's touch on variables quickly. There are about a dozen variable types that can be used, depending on how you count them. They include `bool`, `int`, `float`, `string` and a couple of compound types - but strict typing in Toy is completely optional (`any` is used by default). There are also functions, which are reusable chunks of code, and a pretty standard set of operators with their traditional precedences.
One way in which Toy stands out is the bytecode compilation step. Before execution, the source code must be compiled into an intermediate bytecode format (a trait also inherited from Lox) before it can be executed by the interpreter. The exact specifications of the bytecode formatting are not currently documented (yet). The intermediate bytecode stage, and the independance of the interpreter from the compiler, also allow unique features such as the possiblity of operating on a microcontroller.
One way in which Toy stands out is the bytecode compilation step. The source code must be compiled into an intermediate bytecode format (a trait also inherited from Lox) before it can be executed by the interpreter. The exact specifications of the bytecode formatting are not yet documented. The intermediate bytecode stage and the independence of the interpreter from the compiler also allow unique features, such as the possibility of operating on a microcontroller.
One major native feature which is missing from Toy is an input system, such as from stdin. Instead, Toy is intended to receive its instructions from the host program, including any input needed. One such example would be a game controller library - something which takes in button presses, and calls certain Toy functions to move a character around the game world. Toy is almost infinitely extensible via the C API's hook injection system.
I would like to keep the core language nice and simple, as much as possible - something you can explain with just the quickstart page. However, feedback and criticism are always welcome.
One major native feature that is missing from Toy is an input system, such as from stdin. Instead, Toy is intended to receive its instructions from the host program, including any input needed. One such example would be a game controller library - something that takes in button presses and calls certain Toy functions to move a character around the game world. Toy is almost infinitely extensible via the C API's hook injection system.
I would like to keep the core language nice and simple, as much as possible - something you can explain with just the quick-start page. However, feedback and criticism are always welcome.