From 9e09279b8799977b752ae47dfcf99612001bb9e7 Mon Sep 17 00:00:00 2001
From: Kayne Ruse
Date: Sun, 2 Oct 2022 10:42:35 +1100
Subject: [PATCH] Wrote some tutorials

---
 README.md         |  5 ++-
 compiling-toy.md  | 98 +++++++++++++++++++++++++++++++++++++++++++++++
 developing-toy.md | 14 -------
 using-toy.md      | 96 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 198 insertions(+), 15 deletions(-)
 create mode 100644 compiling-toy.md
 create mode 100644 using-toy.md

diff --git a/README.md b/README.md
index 651b6dd..d3916e4 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,11 @@ The host will provide all of the extensions needed on a case-by-case basis. Scri
 
 * Simple C-like syntax
 * Bytecode intermediate compilation
-* `import` and `export` variables from the host program
 * Optional, but robust type system
 * functions and types are first-class citizens
+* `import` and `export` variables from the host program
 * Fancy slice notation for strings, arrays and dictionaries
+* Can re-direct output, error and assertion failure messages
 * Open source under the zlib license
 
 # Getting Started
@@ -23,6 +24,8 @@ The host will provide all of the extensions needed on a case-by-case basis. Scri
 * [Quick Start Guide](quick-start-guide)
 * Tutorials
   * [Embedding Toy](embedding-toy)
+  * [Compiling Toy](compiling-toy)
+  * [Using Toy](using-toy)
 * ~~[Standard Libary](standard-library)~~
 * [Types](types)
 * [Developing Toy](developing-toy)
diff --git a/compiling-toy.md b/compiling-toy.md
new file mode 100644
index 0000000..dfa898a
--- /dev/null
+++ b/compiling-toy.md
@@ -0,0 +1,98 @@
+# Compiling Toy
+
+This tutorial is a sub-section of [Using-Toy](using-toy) that has been spun off into its own page for the sake of brevity/sanity. It's recommended that you read the main article first.
+
+The exact phases outlined here are entirely implementation-dependent - that is, they aren't required, and are simply how the canonical version of Toy works.
+
+## How the Compilation works
+
+There are four main phases to running a Toy source file. These are:
+
+```
+lexing -> parsing -> compiling -> interpreting
+```
+
+Each phase has a dedicated set of functions and structures, and there are intermediate structures between these stages that carry the information from one set to another.
+
+```
+source   -> lexer       -> token
+token    -> parser      -> AST
+AST      -> compiler    -> bytecode
+bytecode -> interpreter -> result
+```
+
+## Lexer
+
+Exactly how the source code is loaded into memory is left up to the user; however, once it's loaded, it can be bound to a `Lexer` structure.
+
+```c
+Lexer lexer;
+initLexer(&lexer, source);
+```
+
+The lexer, when invoked, will break the string of characters down into individual `Tokens`.
+
+The lexer does not need to be freed after use; however, the source code does.
+
+## Parser
+
+The `Parser` structure takes a `Lexer` as an argument when initialized.
+
+```c
+Parser parser;
+initParser(&parser, &lexer);
+
+ASTNode* node = scanParser(&parser);
+
+freeParser(&parser);
+```
+
+The parser takes tokens, one at a time, and converts them into structures called Abstract Syntax Trees, or ASTs for short. Each AST represents a single top-level statement within the Toy script. You'll know the parser is finished when `scanParser()` begins returning `NULL` pointers.
+
+The AST Nodes produced by `scanParser()` must be freed manually, and the parser itself should not be used again.
+
+## Compiler
+
+The actual compilation phase has two steps - instruction writing and collation.
+
+```c
+size_t size;
+Compiler compiler;
+
+initCompiler(&compiler);
+writeCompiler(&compiler, node);
+
+unsigned char* tb = collateCompiler(&compiler, &size);
+
+freeCompiler(&compiler);
+```
+
+The writing step is the process in which AST nodes are compressed into bytecode instructions, while literal values are extracted and placed aside in a cache (usually in an intermediate state).
+
+The collation phase, however, is when the bytecode instructions, along with the now flattened intermediate literals and function bodies, are combined. The bytecode header specified in [Developing Toy](developing-toy) is placed at the beginning of this blob of bytes during this step.
+
+The Toy bytecode (abbreviated to `tb`), along with the `size` variable indicating the size of the bytecode, are the result of the compilation.
+
+This bytecode can be saved into a file for later consumption by the host at runtime - ensure that the file has the `.tb` extension.
+
+The bytecode loaded in memory is consumed and freed by `runInterpreter()`.
+
+## Interpreter
+
+The interpreter acts based on the contents of the bytecode given to it.
+
+```c
+Interpreter interpreter;
+initInterpreter(&interpreter);
+runInterpreter(&interpreter, tb, size);
+freeInterpreter(&interpreter);
+```
+
+Exactly how it accomplishes this task is up to it - as long as the result matches expectations.
+
+## REPL
+
+An example program, called `toyrepl`, is provided alongside Toy's core. This program can handle many things, such as loading, compiling and executing Toy scripts; it's capable of compiling any valid Toy program for later use, even those that rely on non-standard libraries.
+
+To get a list of options, run `toyrepl -h`.
+
diff --git a/developing-toy.md b/developing-toy.md
index 022abac..51fdc92 100644
--- a/developing-toy.md
+++ b/developing-toy.md
@@ -27,17 +27,3 @@ There are some strict rules when interpreting these values (mimicking, but not c
 All interpreter implementations retain the right to reject any bytecode whose header data does not conform to the above specification.
 
 The latest version information can be found in [common.h](https://github.com/Ratstail91/Toy/blob/0.6.0/source/common.h#L7-L10)
-
-## Embedded API
-
-The functions intended for usage by the API are prepended with the C macro `TOY_API`. The exact value of this macro can vary by platform, or even be empty.
-
-In addition, the macros defined in [literal.h](https://github.com/Ratstail91/Toy/blob/0.6.0/source/literal.h) are available for use when manipulating literals. These include:
-
-* `IS_*` - check if a literal is a specific type
-* `AS_*` - use the literal as a specific type
-* `TO_*` - create a literal of a specific type
-* `IS_TRUTHY` - check if a literal is truthy
-* `MAX_STRING_LENGTH` - the maximum length of a string in Toy (can be altered if needed)
-
-When you create a new Literal object, be sure to call `freeLiteral()` on it afterwards! If you don't, your program will leak memory as Toy has no internal tracker for such things.
diff --git a/using-toy.md b/using-toy.md
new file mode 100644
index 0000000..e890143
--- /dev/null
+++ b/using-toy.md
@@ -0,0 +1,96 @@
+# Using Toy
+
+This tutorial assumes that you've managed to embed Toy into your program by following the tutorial [Embedding Toy](embedding-toy).
+
+Here, we'll look at some ways in which you can utilize Toy's C API within your host program.
+
+Be aware that when you create a new Literal object, you must call `freeLiteral()` on it afterwards! If you don't, your program will leak memory as Toy has no internal tracker for such things.
+
+## Embedded API Macros
+
+The functions intended for usage by the API are prepended with the C macro `TOY_API`. The exact value of this macro can vary by platform, or even be empty. In addition, the macros defined in [literal.h](https://github.com/Ratstail91/Toy/blob/0.6.0/source/literal.h) are available for use when manipulating literals. These include:
+
+* `IS_*` - check if a literal is a specific type
+* `AS_*` - cast the literal to a specific type
+* `TO_*` - create a literal of a specific type
+* `IS_TRUTHY` - check if a literal is truthy
+* `MAX_STRING_LENGTH` - the maximum length of a string in Toy (can be altered if needed)
+
+## Structures Used Throughout Toy
+
+The main unit of data within Toy's internals is `Literal`, which can contain any value that can exist within the Toy language. The exact implementation of `Literal` may change or evolve as time goes on, so it's recommended that you only interact with literals directly by using the macros and functions outlined [above](#embedded-api-macros). See the [types](types) page for information on what datatypes exist in Toy.
+
+There are two main "compound structures" used within Toy's internals - the `LiteralArray` and `LiteralDictionary`. The former is an array of `Literal` instances stored sequentially in memory for fast lookups, while the latter is a key-value hashmap designed for efficient lookups based on a `Literal` key. These are both accessible via the language as well.
+
+These compound structures hold **copies** of literals given to them, rather than taking ownership of existing literals.
+
+## Compiling Toy Scripts
+
+Please see [Compiling Toy](compiling-toy) for more information on the process of turning scripts into bytecode.
+
+## Interpreting Toy
+
+The `Interpreter` structure is the beating heart of Toy - you'll usually only need one interpreter, as it can be reset as needed.
+
+The four basic functions are used as follows:
+
+```c
+//assume "tb" and "size" are the results of compilation
+Interpreter interpreter;
+
+initInterpreter(&interpreter);
+runInterpreter(&interpreter, tb, size);
+resetInterpreter(&interpreter); //You usually want to reset between runs
+freeInterpreter(&interpreter);
+```
+
+In addition to this, you might also wish to "inject" a series of usable libraries into the interpreter, which can be `import`-ed within the language itself. This process only needs to be done once, after initialization, but before the first run.
+
+```c
+injectNativeHook(&interpreter, "standard", hookStandard);
+```
+
+A "hook" is a callback function which is invoked when the given library is imported. `standard` is the most commonly used library available.
+
+```
+import standard;
+```
+
+Hooks can simply inject native functions into the current scope, or they can do other, more esoteric things (though this is not recommended).
+
+```c
+//a utility structure for storing the native C functions
+typedef struct Natives {
+    char* name;
+    NativeFn fn;
+} Natives;
+
+int hookStandard(Interpreter* interpreter, Literal identifier, Literal alias) {
+    //the list of available native C functions that can be called from Toy
+    Natives natives[] = {
+        {"clock", nativeClock},
+        {NULL, NULL}
+    };
+
+    //inject each native C function into the current scope
+    for (int i = 0; natives[i].name; i++) {
+        injectNativeFn(interpreter, natives[i].name, natives[i].fn);
+    }
+
+    return 0;
+}
+```
+
+## Calling Toy from C
+
+In some situations, you may find it convenient to call a function written in Toy from the host program. For this, a pair of utility functions have been provided:
+
+```c
+TOY_API bool callLiteralFn(Interpreter* interpreter, Literal func, LiteralArray* arguments, LiteralArray* returns);
+TOY_API bool callFn       (Interpreter* interpreter, char* name, LiteralArray* arguments, LiteralArray* returns);
+```
+
+The first argument must be an interpreter. The third argument is a pointer to a `LiteralArray` containing a list of arguments to pass to the function, and the fourth is a pointer to a `LiteralArray` where the return values can be stored (an array is used here for a potential future feature). The contents of the argument array are consumed, leaving the array in an indeterminate state (though it is safe to free), while the returns array always has one value - if the function did not return a value, then it contains a `null` literal.
+
+The second argument is either the function to be called as a `Literal` (for `callLiteralFn()`), or the name of the function within the interpreter's scope (for `callFn()`). The latter API simply finds the specified `Literal` if it exists and calls the former. As with most APIs, these return `false` if something went wrong.
+
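The calling pattern described under "Calling Toy from C" can be sketched as follows. This sits outside the patch hunks and is only an illustration: the `Literal`, `LiteralArray` and `Interpreter` types, the three array helpers, and the body of `callFn()` are simplified stand-ins (assumptions, not Toy's real definitions) so that the snippet compiles on its own, and the stub pretends the script defines a function named `add`. `callToyAdd()` is a hypothetical helper name. A real host would include Toy's headers instead and build its arguments with the `TO_*` macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-ins (assumptions) so the sketch compiles without Toy ---- */
typedef struct Literal { bool isNull; int integer; } Literal;
typedef struct LiteralArray { Literal* literals; int count; int capacity; } LiteralArray;
typedef struct Interpreter { int unused; } Interpreter;

static void initLiteralArray(LiteralArray* array) {
    array->literals = NULL;
    array->count = 0;
    array->capacity = 0;
}

static void pushLiteralArray(LiteralArray* array, Literal literal) {
    if (array->count == array->capacity) {
        int capacity = array->capacity ? array->capacity * 2 : 8;
        Literal* grown = realloc(array->literals, (size_t)capacity * sizeof(Literal));
        if (grown == NULL) abort();
        array->literals = grown;
        array->capacity = capacity;
    }
    array->literals[array->count++] = literal; /* stored by value, i.e. a copy */
}

static void freeLiteralArray(LiteralArray* array) {
    free(array->literals);
    initLiteralArray(array);
}

/* Stub with the documented shape of callFn(); it only knows a fake "add". */
static bool callFn(Interpreter* interpreter, char* name, LiteralArray* arguments, LiteralArray* returns) {
    (void)interpreter;
    if (strcmp(name, "add") != 0 || arguments->count != 2) return false;
    Literal result = { false, arguments->literals[0].integer + arguments->literals[1].integer };
    arguments->count = 0;              /* the arguments are consumed */
    pushLiteralArray(returns, result); /* exactly one return value */
    return true;
}

/* ---- The calling pattern itself ---- */
bool callToyAdd(Interpreter* interpreter, int a, int b, int* out) {
    assert(out != NULL);

    LiteralArray arguments, returns;
    initLiteralArray(&arguments);
    initLiteralArray(&returns);

    /* with the real API these would be created via TO_* and then released
       with freeLiteral(), since the array keeps its own copies */
    pushLiteralArray(&arguments, (Literal){ false, a });
    pushLiteralArray(&arguments, (Literal){ false, b });

    bool ok = callFn(interpreter, "add", &arguments, &returns);
    if (ok && returns.count > 0 && !returns.literals[0].isNull) {
        *out = returns.literals[0].integer;
    } else {
        ok = false; /* call failed, or the function returned null */
    }

    /* the argument array is indeterminate after the call, but safe to free */
    freeLiteralArray(&arguments);
    freeLiteralArray(&returns);
    return ok;
}
```

Against the stub, `callToyAdd(&interpreter, 2, 3, &sum)` returns `true` and stores 5 in `sum`. The point of the sketch is the order of operations: initialize both arrays, push the arguments, check the boolean result before reading the single return value, then free both arrays regardless of the outcome.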