Wednesday, 27 July 2016

Slider

Download link of Zip file (link removed - file contained errors)

This programming language, named Slider (after its garbage collector, which slides objects along in memory), does garbage collection by crawling the web of object references, starting at the roots (variables and parameters of running function invocations), building up a list of pointers to live (used) objects, then sorting that list, then iterating through it and moving each live object into the next available space in the heap. An object contains: first, a metadata header, then any references, then any value bytes. The compiler, which is written in Yeti and outputs C source code, reorders struct fields as needed, placing references before value fields and giving the same field order to structs having the same field names with compatible types. The header contains: first, the object's type number, then its relocation pointer (used when moving objects, and updating references to them to point to their new locations). String literals (which can't be empty, and are between " and ") become byte arrays, whose type is 'bytes'. There are inbuilt functions that handle arrays, threads, logic and arithmetic, and writing to the command line. Variable and parameter names must be encased in [ and ], and unique within a function. Expressions are between { and }, and long number literals are between ( and ). The 'd' command defines a variable, e.g. d [myVar]="a string"; and can contain the special values 'true','false', and 'null'. A type must be written for number values of types other than long, e.g d [myNum]:byte=123;. Commands end in ;. Functions start with !, then their name, then | and a space or their parameters (separated by commas), then | and their commands. The 'main' function has no parameters, can't be invoked and can't contain a 'return' command; other functions must end with one that returns something, and need at least 1 parameter; functions referred to by function values must have exactly 1. Functions are invoked with : and their parameters (separated by commas) e.g. d [result]=myFunc:(123);. The 's' command can set struct fields, or invoke inbuilt functions with no result or ignoring their result. Function values are made with _, then the function name, then : e.g. d [funcVal]=_myFunc: ;. Structs are made with $, then their field names (separated by _), then : and their field values (separated by commas). Variants are made with the case name, then $: and the content. The 'getTag' command destructures a variant, contains 'case' commands which include the case name then a space and content variable, and ends with an 'endGet' command. Condition blocks are encased in < and >, and include the condition, then @ and the commands. Struct fields are accessed with ., e.g [myStruct].myField.

Reserved words include 'ref','refs','bytes','TaggedObj','true','false', 'null','size','gettype','tag','main' and the value types (bool, byte, short, int and long). Fractions and unsigned numbers are not supported. Names can't be numbers, and  can contain letters and digits. Some instruction sets forbid unaligned memory access; currently there may be issues running Slider code on them. There are inbuilt 'get' and 'set' functions for the numeric value types in byte arrays, and for references in reference arrays. The send_msg function appends a byte array to a thread's message queue, and the getmsg function retrieves this thread's message queue's content as a byte array and clears that queue, or returns the byte array that was passed into it if the queue is already empty. Threads are identified by numbers; the thread_num function returns this thread's number. The second command line argument to the compiler sets the number of threads, and the first sets the file to compile. The bytescopy and refscopy functions copy a range of array items, and the byteslen and refslen functions return an array's length. Because Slider implements parametric polymorphism via type erasure rather than multi-instantiation, it doesn't apply to value types.  Some objects have a footer.

An example of a thread error: program 1 has opened file A, and is obtaining a lock on file B, but program 2 has already opened file B, and won't close it until opening file A. Broadly there are 2 ways to prevent thread errors: message passing and atomic transactions. Message passing makes many threads running on 1 machine behave like many machines, which interact by transmitting data to each other over a network instead of sharing the same storage device: rather than sharing data between threads, threads append data to each other's message queues. The heap is the section of memory that stores objects. Erlang, a language used in high-volume communication systems, provides message passing with a separate garbage-collected heap for each thread, as does Slider.

Transactions are often used in databases. A transaction is a series of operations that behaves like 1 operation: either they all complete, or none of them does, and no other threads can access the data they apply to until they all complete. An example: a database with an Accounts table backs a bank website, and a money transfer is a transaction with 2 operations: adding money to an account, and subtracting it from another.

The stack is the section of memory that stores the identifiers, variables, and parameters of all running function invocations within a particular thread. Each thread has its own stack. The entry of a function invocation in the stack is called a stack frame. If a function's last action is to invoke a function and return its result, the invoking function's stack frame can be replaced by the invoked function's instead of making another stack frame, saving stack space: this is called tail call elimination, and can be implemented using continuation-passing style, which Slider does. Variants are implemented using the TaggedObj struct.

Type inference is done by building up a list of facts about types or requirements they must fulfil, first per each function, then between functions by looking at their invocations of each other, e.g if a function adds 2 variables, they must be numbers, and if they come from struct fields, those fields must be numbers too. Slider is inspired partly by row-typed programming languages like Yeti, PureScript etc.

The popular OSs of modern times (apart from some embedded systems) are derived not from the small, simple, single-user, single-tasking systems (e.g. DOS and classic Macintosh) but the large-scale time-sharing server systems (e.g. Unix and VMS) of olden times. They group running application programs (threads) into 'processes' each of which is confined to its own memory address space and disallowed raw access to external hardware devices. They make each physical processor/core switch rapidly between running threads. Return Infinity's BareMetal OS is different. Slider code runs on either Unix/Linux (using the pthreads library) or BareMetal.

This is the first  time I've written a compiler, and it may still contain errors which, if you find, you can report in comments on this blog. The compiler and runtime system are still under construction.

Yeti can be obtained here. If the yeti.jar file is in the same folder the Zip file is unzipped into, and Java and Clang are installed, running the following commands in that folder can test the Slider compiler:
java -jar yeti.jar -d . miscm.yeti
java -jar yeti.jar -cp . compile.yeti testth.txt 2 > o.c
clang -fsanitize=integer -Werror=int-conversion -pthread -o o o.c

Then the resulting program file "o" can be run.

Option types - a special kind of variant type that represents a reference that might be null - may be added to Slider in future.

Slider can be adapted to particular use cases and situations. For example:

  • Currently Slider runs on 64-bit systems - a few small changes would probably be sufficient to make it run on 32-bit or 16-bit systems also.
  • Slider programs can't do much now other than write text to standard output. Wrappers around OS functions can be added, providing more capabilities.
  • Slider uses 4-byte characters (UTF-32) - this could be changed to 2-byte (UTF-16) or 1-byte (UTF-8).
  • Functional programming means that after an object is created, its content is not changed. If the web of object references is arranged mostly in accordance with functional programming, at the time garbage collection happens, then I suspect that, in many cases, Slider can be made faster by using an adaptive sorting algorithm, such as Blocksort, instead of Heapsort which it now uses, sorting the pointer list into reverse order (it must then be looped through in reverse order), and looping through the roots in reverse order. An adaptive sorting algorithm is one that tends to be faster if the items are already partly sorted. Also, using some kind of Radixsort/Bucketsort would make  the sorting faster, if the locations of live objects are mostly spread out evenly across the heap.
  • Object-oriented programming means that the same object can behave both as a record and as a variant. 
Some of the above changes may be introduced in the next version of Slider.

No comments:

Post a Comment