Saturday 15 August 2020

Slider 2

Download link of Zip file
After developing a new Slider version, and making further adjustments to it, I've published it in this Zip file, along with a PDF file documenting it. The Zip file includes the PDF, compiler and runtime system, which I expect are ready to use.

As stated in the first version's description, this programming language (named after its garbage collector, which slides objects along in memory) does garbage collection by crawling the web of object references, starting at the roots (variables and parameters of running function invocations), building up a list of pointers to live (used) objects, then sorting that list, then iterating through it and moving each live object into the next available space in the heap.

The compiler is written in Yeti and outputs C source code. Slider uses message-passing concurrency with a separate garbage-collected heap for each thread, and has tail calls, a static type system with type inference, and polymorphic records similar to Yeti. The Slider compiler can infer the types of record fields and variant cases, including recursive types. Record and variant types need not be defined manually.


The Slider 2 project has consisted of developing a new version of Slider, correcting the old errors and adding much additional functionality and a far more clear and expressive new syntax (shown and described in the tutorials and language reference contained in the PDF) similar to Java.
 
The following features have been added to Slider, as of version 2:  

1. A fast sorting process/algorithm for the pointer list, which must be sorted so that live objects can be moved without damaging each other. Moving existing live objects enables new objects to be created quickly.
2. Object-oriented programming, which enables the same object to behave both as a record and as a variant.
3. A clear and conventional syntax. As can be seen in the example code (in the tutorials), Slider looks similar to Java, the most striking difference being the absence of manual type annotations.
4. Option types - a special kind of variant type that represents a reference that might be null.
5. Wrappers around more C functions have been added - the ability to call various C functions provides more capabilities, e.g. interacting with the file system, drawing graphics, writing network services, interacting with databases, or serving dynamic Web content.
 
The Slider compiler has been tested running on Java 8, and generates C code that has been tested by compiling with GCC and Clang (the two most popular open source C compilers) into both 32-bit and 64-bit code. As shown in the tutorials, Slider programs are now able to run as FastCGI application servers (which enables writing dynamic Web content in Slider), and to call APIs such as OpenGL ES and the standard C file API (among others). 

The following features are planned for future version(s) of Slider:
1. Optional region-based memory management with region inference, similar to SML (MLKit). This has the potential to reduce pause times without necessarily reducing overall performance - indeed, it often increases performance, and may enable Slider to compete with hand-written high-performance C or C++ on raw speed in the general case.
2. Integration with functions that parse and write various binary formats and are auto-generated from descriptions of those formats (similar to protobuf).
3. Inner classes that are defined in a method and record variables from it, similar to Java's anonymous inner classes.
4. General method overloading. Currently Slider methods are overloaded only based on the number of parameters and the "this" parameter (in Slider, there is no sharp distinction between overloading and overriding).
5. Named lists of classes that can be used in a "case" command.
6. Extension methods, i.e. methods of a class that are declared outside the class.

7. Integration with tools that prove properties of programs, such as absence of runtime errors (e.g. out-of-bounds access to arrays or strings).

8. Various other enhancements in performance and in the type system's flexibility.

9. Ability of Slider threads to suspend, resume, or yield to each other, passing data to each other on the C stack as they do.

Wednesday 27 July 2016

Slider

Download link of Zip file (link removed - file contained errors)

This programming language, named Slider (after its garbage collector, which slides objects along in memory), does garbage collection by crawling the web of object references, starting at the roots (variables and parameters of running function invocations), building up a list of pointers to live (used) objects, then sorting that list, then iterating through it and moving each live object into the next available space in the heap. An object contains: first, a metadata header, then any references, then any value bytes. The compiler, which is written in Yeti and outputs C source code, reorders struct fields as needed, placing references before value fields and giving the same field order to structs having the same field names with compatible types. The header contains: first, the object's type number, then its relocation pointer (used when moving objects, and updating references to them to point to their new locations). String literals (which can't be empty, and are between " and ") become byte arrays, whose type is 'bytes'. There are inbuilt functions that handle arrays, threads, logic and arithmetic, and writing to the command line. Variable and parameter names must be encased in [ and ], and unique within a function. Expressions are between { and }, and long number literals are between ( and ). The 'd' command defines a variable, e.g. d [myVar]="a string"; and can contain the special values 'true','false', and 'null'. A type must be written for number values of types other than long, e.g d [myNum]:byte=123;. Commands end in ;. Functions start with !, then their name, then | and a space or their parameters (separated by commas), then | and their commands. The 'main' function has no parameters, can't be invoked and can't contain a 'return' command; other functions must end with one that returns something, and need at least 1 parameter; functions referred to by function values must have exactly 1. Functions are invoked with : and their parameters (separated by commas) e.g. d [result]=myFunc:(123);. The 's' command can set struct fields, or invoke inbuilt functions with no result or ignoring their result. Function values are made with _, then the function name, then : e.g. d [funcVal]=_myFunc: ;. Structs are made with $, then their field names (separated by _), then : and their field values (separated by commas). Variants are made with the case name, then $: and the content. The 'getTag' command destructures a variant, contains 'case' commands which include the case name then a space and content variable, and ends with an 'endGet' command. Condition blocks are encased in < and >, and include the condition, then @ and the commands. Struct fields are accessed with ., e.g [myStruct].myField.

Reserved words include 'ref','refs','bytes','TaggedObj','true','false', 'null','size','gettype','tag','main' and the value types (bool, byte, short, int and long). Fractions and unsigned numbers are not supported. Names can't be numbers, and  can contain letters and digits. Some instruction sets forbid unaligned memory access; currently there may be issues running Slider code on them. There are inbuilt 'get' and 'set' functions for the numeric value types in byte arrays, and for references in reference arrays. The send_msg function appends a byte array to a thread's message queue, and the getmsg function retrieves this thread's message queue's content as a byte array and clears that queue, or returns the byte array that was passed into it if the queue is already empty. Threads are identified by numbers; the thread_num function returns this thread's number. The second command line argument to the compiler sets the number of threads, and the first sets the file to compile. The bytescopy and refscopy functions copy a range of array items, and the byteslen and refslen functions return an array's length. Because Slider implements parametric polymorphism via type erasure rather than multi-instantiation, it doesn't apply to value types.  Some objects have a footer.

An example of a thread error: program 1 has opened file A, and is obtaining a lock on file B, but program 2 has already opened file B, and won't close it until opening file A. Broadly there are 2 ways to prevent thread errors: message passing and atomic transactions. Message passing makes many threads running on 1 machine behave like many machines, which interact by transmitting data to each other over a network instead of sharing the same storage device: rather than sharing data between threads, threads append data to each other's message queues. The heap is the section of memory that stores objects. Erlang, a language used in high-volume communication systems, provides message passing with a separate garbage-collected heap for each thread, as does Slider.

Transactions are often used in databases. A transaction is a series of operations that behaves like 1 operation: either they all complete, or none of them does, and no other threads can access the data they apply to until they all complete. An example: a database with an Accounts table backs a bank website, and a money transfer is a transaction with 2 operations: adding money to an account, and subtracting it from another.

The stack is the section of memory that stores the identifiers, variables, and parameters of all running function invocations within a particular thread. Each thread has its own stack. The entry of a function invocation in the stack is called a stack frame. If a function's last action is to invoke a function and return its result, the invoking function's stack frame can be replaced by the invoked function's instead of making another stack frame, saving stack space: this is called tail call elimination, and can be implemented using continuation-passing style, which Slider does. Variants are implemented using the TaggedObj struct.

Type inference is done by building up a list of facts about types or requirements they must fulfil, first per each function, then between functions by looking at their invocations of each other, e.g if a function adds 2 variables, they must be numbers, and if they come from struct fields, those fields must be numbers too. Slider is inspired partly by row-typed programming languages like Yeti, PureScript etc.

The popular OSs of modern times (apart from some embedded systems) are derived not from the small, simple, single-user, single-tasking systems (e.g. DOS and classic Macintosh) but the large-scale time-sharing server systems (e.g. Unix and VMS) of olden times. They group running application programs (threads) into 'processes' each of which is confined to its own memory address space and disallowed raw access to external hardware devices. They make each physical processor/core switch rapidly between running threads. Return Infinity's BareMetal OS is different. Slider code runs on either Unix/Linux (using the pthreads library) or BareMetal.

This is the first  time I've written a compiler, and it may still contain errors which, if you find, you can report in comments on this blog. The compiler and runtime system are still under construction.

Yeti can be obtained here. If the yeti.jar file is in the same folder the Zip file is unzipped into, and Java and Clang are installed, running the following commands in that folder can test the Slider compiler:
java -jar yeti.jar -d . miscm.yeti
java -jar yeti.jar -cp . compile.yeti testth.txt 2 > o.c
clang -fsanitize=integer -Werror=int-conversion -pthread -o o o.c

Then the resulting program file "o" can be run.

Option types - a special kind of variant type that represents a reference that might be null - may be added to Slider in future.

Slider can be adapted to particular use cases and situations. For example:

  • Currently Slider runs on 64-bit systems - a few small changes would probably be sufficient to make it run on 32-bit or 16-bit systems also.
  • Slider programs can't do much now other than write text to standard output. Wrappers around OS functions can be added, providing more capabilities.
  • Slider uses 4-byte characters (UTF-32) - this could be changed to 2-byte (UTF-16) or 1-byte (UTF-8).
  • Functional programming means that after an object is created, its content is not changed. If the web of object references is arranged mostly in accordance with functional programming, at the time garbage collection happens, then I suspect that, in many cases, Slider can be made faster by using an adaptive sorting algorithm, such as Blocksort, instead of Heapsort which it now uses, sorting the pointer list into reverse order (it must then be looped through in reverse order), and looping through the roots in reverse order. An adaptive sorting algorithm is one that tends to be faster if the items are already partly sorted. Also, using some kind of Radixsort/Bucketsort would make  the sorting faster, if the locations of live objects are mostly spread out evenly across the heap.
  • Object-oriented programming means that the same object can behave both as a record and as a variant. 
Some of the above changes may be introduced in the next version of Slider.

Thursday 14 July 2016

Some software I wrote in Java

Download link of Zip file

The program NG2.java is a sort of minimal word processor, and has been tested on Windows. Grid cell 'fusion' is done by showing another cell on top of the fused cells. The page, or the items it contains (grids, rows, columns, images, and text areas), can be altered by right-clicking on them. Items can be 'unsnapped' into windows, moved elsewhere, and 'snapped' into tabs or items; when an item's window is closed, it is snapped into or beside the item previously under the mouse; this somewhat resembles the windows in Integrated Development Environments (IDEs) like NetBeans.  If a saved page is reopened, its windows won't snap. There are other, smaller programs included in the Zip file with it; the 'stest' program shows minimal windows on screen. These programs are written in Java.

In the program SQLTest2.java, the Refresh button refreshes the combo box of tables. The Retrieve button runs an SQL query that returns a resultset, and the Make Change button runs a query that doesn't. The Insert button writes an Insert query, and the Create button writes a Create Table query. The Edit Rows button writes an Update query, using the column selected in the combo box near it in the Where clause. When a resultset is shown, it appears in a text box, and in a separate window containing a grid. When a cell in that grid loses focus, a change in its text is reflected in that text box, which the query-writing buttons use. Columns are assumed to be textual. The NG button shows the Nested Grids window (NG2.java), and after it's clicked the 'csv' menu item adds content from that text box into a grid in the page (cells must not contain commas, quotes, escape characters or newline characters).