Updated Deep Dive Document

2023-07-25 21:44:34 +10:00
committed by GitHub
parent 744b74aed2
commit 7a46c9fc9a
9 changed files with 77 additions and 137 deletions


@@ -2,7 +2,7 @@
This tutorial is a sub-section of [Embedding Toy](deep-dive/embedding-toy) that has been spun off into its own page for the sake of brevity/sanity. It's recommended that you read the main article first.
The exact phases outlined here are entirely implementation-dependent - that is, they aren't required, and are simply how the canonical version of Toy works.
The exact phases outlined here are entirely implementation-dependent - that is, they aren't required, and are simply how the canonical implementation of Toy works.
## How the Compilation works
@@ -12,7 +12,7 @@ There are four main phases to running a Toy source file. These are:
lexing -> parsing -> compiling -> interpreting
```
Each phase has a dedicated set of functions and structures, and there are intermediate structures between these stages that carry the information from one set to another.
Each phase has a dedicated set of functions and structures, as well as intermediate structures between these that carry information.
```
source -> lexer -> token
@@ -23,7 +23,7 @@ bytecode -> interpreter -> result
## Lexer
Exactly how the source code is loaded into memory is left up to the user, however once it's loaded, it can be bound to a `Lexer` structure.
Exactly how the source code is loaded into a C-string is left up to the user; however, once it's loaded, it can be bound to a `Toy_Lexer` structure.
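Since loading the source is left to the host, here is a minimal sketch of one way to read a script file into a null-terminated C-string using only the C standard library. The helper name `readSourceFile` is hypothetical and is not part of Toy's API.

```c
#include <stdio.h>
#include <stdlib.h>

//hypothetical helper (not part of Toy): reads a whole file into a
//null-terminated buffer; the caller owns the result and must free() it
char* readSourceFile(const char* path) {
	FILE* file = fopen(path, "rb");
	if (file == NULL) {
		return NULL;
	}

	//determine the file's length
	if (fseek(file, 0L, SEEK_END) != 0) {
		fclose(file);
		return NULL;
	}
	long length = ftell(file);
	if (length < 0) {
		fclose(file);
		return NULL;
	}
	rewind(file);

	//one extra byte for the terminator
	char* buffer = malloc((size_t)length + 1);
	if (buffer == NULL) {
		fclose(file);
		return NULL;
	}

	size_t bytesRead = fread(buffer, 1, (size_t)length, file);
	fclose(file);

	if (bytesRead != (size_t)length) {
		free(buffer);
		return NULL;
	}

	buffer[bytesRead] = '\0';
	return buffer;
}
```

The returned buffer can then be bound to the lexer; freeing it remains the host's responsibility.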
```c
Toy_Lexer lexer;
@@ -47,7 +47,7 @@ Toy_ASTNode* node = Toy_scanParser(&parser);
Toy_freeParser(&parser);
```
The parser takes tokens, one at a time, and converts them into structures called Abstract Syntax Trees, or ASTs for short. Each AST represents a single top-level statement within the Toy script. You'll know when the parser is finished when `Toy_scanParser()` begins returning `NULL` pointers.
The parser pumps the lexer for tokens, one at a time, and converts them into structures called Abstract Syntax Trees (or ASTs for short). Each AST represents a single top-level statement within the Toy script. You'll know the parser is finished with the lexer's source when `Toy_scanParser()` begins returning `NULL` pointers.
The AST Nodes produced by `Toy_scanParser()` must be freed manually, and the parser itself should not be used again.
@@ -60,22 +60,20 @@ size_t size;
Toy_Compiler compiler;
Toy_initCompiler(&compiler);
Toy_writeCompiler(&compiler, node);
Toy_writeCompiler(&compiler, node); //node is a Toy_ASTNode
unsigned char* tb = Toy_collateCompiler(&compiler, &size);
Toy_freeCompiler(&compiler);
```
The writing step is the process in which AST nodes are compressed into bytecode instructions, while literal values are extracted and placed aside in a cache (usually in an intermediate state).
The writing step is the process in which AST nodes are compressed into bytecode instructions, while literal values are extracted and placed aside in a cache (usually in a compressed, intermediate state).
The collation phase, however is when the bytecode instructions, along with the now flattened intermediate literals and function bodies are combined. The bytecode header specified in [Developing Toy](developing-toy) is placed at the beginning of this blob of bytes during this step.
The collation phase, however, is when the bytecode instructions, along with the now flattened intermediate literals and function bodies, are combined. The bytecode header specified in [Developing Toy](deep-dive/developing-toy) is placed at the beginning of this blob of bytes during this step.
The Toy bytecode (abbreviated to `tb`), along with the `size` variable indicating the size of the bytecode, are the result of the compilation.
The Toy bytecode (abbreviated to `tb`), along with the `size` variable indicating the size of the bytecode, are the result of the compilation. This bytecode can be saved into a file for later consumption by the host at runtime - you must ensure that any bytecode files have the `.tb` extension.
This bytecode can be saved into a file for later consumption by the host at runtime - ensure that the file has the `.tb` extension.
The bytecode loaded in memory is consumed and freed by `Toy_runInterpreter()`.
Alternatively, the bytecode in memory can be passed directly to the interpreter.
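As a rough illustration of saving bytecode for later use, the sketch below writes a buffer and its size to a file, refusing any path that lacks the `.tb` extension. The helper name `saveBytecodeFile` and its return convention are assumptions made for this example; they are not part of Toy's API.

```c
#include <stdio.h>
#include <string.h>

//hypothetical helper (not part of Toy): writes `size` bytes of bytecode
//to `path`; returns 0 on success, -1 on failure
int saveBytecodeFile(const char* path, const unsigned char* tb, size_t size) {
	//refuse paths that don't end in ".tb"
	size_t length = strlen(path);
	if (length < 3 || strcmp(path + length - 3, ".tb") != 0) {
		return -1;
	}

	FILE* file = fopen(path, "wb");
	if (file == NULL) {
		return -1;
	}

	size_t written = fwrite(tb, 1, size, file);

	//a failed close can also mean the data wasn't flushed
	if (fclose(file) != 0 || written != size) {
		return -1;
	}

	return 0;
}
```

Because the text above states that `Toy_runInterpreter()` consumes and frees the bytecode in memory, a call such as `saveBytecodeFile("script.tb", tb, size)` would need to happen before the buffer is handed to the interpreter.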
## Interpreter
@@ -88,11 +86,11 @@ Toy_runInterpreter(&interpreter, tb, size);
Toy_freeInterpreter(&interpreter);
```
Exactly how it accomplishes this task is up to it - as long as the result matches expectations.
Exactly how it accomplishes this task is implementation-dependent - as long as the results match expectations.
## REPL
An example program, called `toyrepl`, is provided alongside Toy's core. This program can handle many things, such as loading, compiling and executing Toy scripts; it's capable of compiling any valid Toy program for later use, even those that rely on non-standard libraries.
An example program, called `toyrepl`, is provided alongside Toy's core. This program can handle many things, such as loading, compiling and executing Toy scripts; it's capable of compiling any valid Toy program for later use, even those that rely on non-standard libraries. It also provides a number of commonly needed libraries.
To get a list of options, run `toyrepl -h`.