sexp-language


Intro

One of the greatest historic advantages of the Lisp family over other languages is its use of s-expressions (sexps), which can be read and written directly and can thus transport arbitrary structured data. In other languages such data was typically represented as raw text or binary, using custom serializers, parsers and utility libraries, though in modern scripting languages JSON and similar formats have become popular.

We call mini-languages built on sexps "sexp languages" and contrast them with "string languages", which require their own parsers. Sexp languages are distinct from DSLs (either macro- or combinator-based) in that they are data-driven: a program in the language is an ordinary value that can be built, inspected and passed around at runtime. The advantages of using sexps are:

  1. They come with a portable parser already provided
  2. They are easier to learn
  3. They integrate easily with existing Scheme tools which handle sexps

A simple example is a sexp equivalent of the familiar URI string syntax, such as:

  (scheme host port path query fragment)

where the scheme is a symbol, the port an integer, the query an alist of (cons symbol string) pairs, and the other arguments strings.
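
As a rough illustration of that layout, the sketch below builds one such URI and reads fields back with ordinary list operations. The accessor names (uri-scheme, uri-port, uri-query-ref) and the sample values are assumptions made for this example, not part of any existing library.

```scheme
(import (scheme base) (scheme write))

;; A URI in the six-field layout described above:
;; (scheme host port path query fragment)
(define example-uri
  '(https "example.com" 8080 "/search"
          ((q . "sexp") (lang . "en"))
          "results"))

;; Hypothetical accessors: plain list operations are enough.
(define (uri-scheme uri) (list-ref uri 0))
(define (uri-host uri) (list-ref uri 1))
(define (uri-port uri) (list-ref uri 2))
(define (uri-query uri) (list-ref uri 4))

;; Look up one query parameter; returns #f when the key is absent.
(define (uri-query-ref uri key)
  (let ((entry (assq key (uri-query uri))))
    (and entry (cdr entry))))

(write (uri-query-ref example-uri 'lang))  ; writes "en"
(newline)
```

No parser had to be written: the standard reader produces the list, and assq does the query lookup.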

SREs

A classic example of a string language that has grown to unwieldy extremes is PCRE syntax, which includes such constructs as "(?<!...)", "(?(condition)yes-pattern|no-pattern)" and "(*THEN:NAME)". The brevity of the original operators was convenient and no doubt helped regular expressions gain popularity, but it left no room for extension. SRE syntax, by contrast, uses sexps to provide almost the same brevity, re-using the basic "*" and "?" operators while leaving room for extension and for readable names for less commonly used operators. SREs were first introduced by SCSH and later extended by IrRegex, which was the first to support them as its native underlying representation.

In SCSH, SREs were written with the rx macro, which implicitly quasiquoted its argument. IrRegex treats SREs as pure data and requires an explicit quasiquote. In both, the expression is fundamentally dynamic: a pattern such as "`(or ,@(read))" generates an alternation at runtime. The rx macro, however, was able to pre-compile fully static patterns.
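
A minimal sketch of the "SRE as pure data" idea follows. It only constructs the pattern with an explicit quasiquote and does not call any regex library, so it runs in plain R7RS. The procedure name keywords->sre is made up for this example.

```scheme
(import (scheme base) (scheme write))

;; Build an SRE that matches any one of WORDS as a whole word.
;; WORDS is a list of strings that may only be known at runtime.
(define (keywords->sre words)
  `(seq bow (or ,@words) eow))

(define sre (keywords->sre '("define" "lambda" "let")))

(write sre)  ; writes (seq bow (or "define" "lambda" "let") eow)
(newline)
```

With IrRegex loaded, the resulting list could presumably be handed to a procedure such as irregex-search in place of a pattern string. That step is left out here to keep the example free of dependencies.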

Pipelines

Another example from the same original source, SCSH, is the SCSH process notation. Again, the original notation allowed for both convenience macros and a procedural interface, but the underlying language is data-based. Apart from certain symbols reserved to the language, a symbol in operator position is a program name, which cannot be resolved until runtime, at the point at which it is called.
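
To make the data-based view concrete, here is a small sketch that walks a quoted process form and collects the program names it would have to resolve at runtime. It is not SCSH's implementation: the reserved-symbol list is an assumption for this example, and pipe stands in for SCSH's "|" operator, since a bare "|" is not a portable R7RS symbol.

```scheme
(import (scheme base) (scheme write))

;; Operators treated as reserved in this sketch (an assumed, partial set).
(define reserved-operators '(pipe begin epf))

;; Collect the program names that a process form refers to.
(define (program-names form)
  (cond
    ((not (pair? form)) '())
    ;; A pipeline: gather the names from each stage in order.
    ((eq? (car form) 'pipe)
     (apply append (map program-names (cdr form))))
    ;; Other reserved forms are not program invocations.
    ((memq (car form) reserved-operators) '())
    ;; Any other symbol in operator position names a program.
    ((symbol? (car form)) (list (car form)))
    (else '())))

(define pipeline '(pipe (ls "-l") (grep "scm") (wc "-l")))

(write (program-names pipeline))  ; writes (ls grep wc)
(newline)
```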

Configs and Alists

Another common sexp language, reinvented time and again, is the configuration language. The simplest formats are just key+value string pairs. Probably the most common of the more sophisticated config formats in other languages is the .ini format, but even this is limited to a single level of grouping. Sexps lend themselves naturally to trees of settings, as in (chibi config).
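
The sketch below shows one way such a tree of settings might look, with a helper that follows a path of keys. The layout, the sample settings, and the names find-entry and config-ref are all assumptions for illustration. This is not the (chibi config) API.

```scheme
(import (scheme base) (scheme write))

;; A nested settings tree: each entry is (key value) or (key entry ...).
(define config
  '((server
     (host "localhost")
     (port 8080)
     (tls
      (enabled #t)
      (cert "/etc/ssl/server.pem")))
    (log-level info)))

;; Find the entry whose car is KEY, ignoring non-pair elements.
(define (find-entry key entries)
  (cond
    ((not (pair? entries)) #f)
    ((and (pair? (car entries)) (eq? (caar entries) key)) (car entries))
    (else (find-entry key (cdr entries)))))

;; Follow a non-empty PATH of keys; return DEFAULT if any key is missing.
(define (config-ref config path default)
  (let loop ((entries config) (path path))
    (let ((entry (find-entry (car path) entries)))
      (cond
        ((not entry) default)
        ((null? (cdr path))
         (if (pair? (cdr entry)) (cadr entry) default))
        (else (loop (cdr entry) (cdr path)))))))

(write (config-ref config '(server tls cert) #f))  ; writes "/etc/ssl/server.pem"
(newline)
```

The two levels of nesting under server are exactly what a flat .ini file cannot express without naming tricks.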

Simple key+value configs can still be supported, and these are just alists. Alists and plists can also serve as named function parameters, which are a special case of a config: a small set of keyed settings passed to a single call.
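
A minimal sketch of an alist used as named parameters is shown below. The procedure make-greeting, the helper option-ref, and the option names are hypothetical, chosen only to show the pattern.

```scheme
(import (scheme base) (scheme write))

;; Fetch a named option from an alist, falling back to a default.
(define (option-ref options key default)
  (let ((entry (assq key options)))
    (if entry (cdr entry) default)))

;; A procedure that takes its optional settings as an alist.
(define (make-greeting name options)
  (let ((greeting (option-ref options 'greeting "Hello"))
        (punct (option-ref options 'punctuation "!")))
    (string-append greeting ", " name punct)))

(write (make-greeting "world" '()))  ; writes "Hello, world!"
(newline)
(write (make-greeting "world" '((greeting . "Goodbye") (punctuation . "."))))
; writes "Goodbye, world."
(newline)
```

Because the options are plain data, the same alist could just as well be read from a config file and passed through unchanged.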

SXML

Perhaps the ultimate sexp language is SXML, a nicer representation of XML, which was itself designed as a universal data language and as the basis for many mini-languages. [S]XML addresses a fundamental flaw in the Unix philosophy of processing files with line- and record-oriented parsing tools: much data is structured, and does not fit flat delimited lines and records.
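
As a small illustration, the sketch below writes a fragment of markup as SXML and extracts its text with a short recursive procedure. The procedure name sxml-text is an assumption for this example. Real SXML libraries provide richer tools such as SXPath.

```scheme
(import (scheme base) (scheme write))

;; SXML for: <p class="note">Sexps are <em>structured</em> data.</p>
(define doc
  '(p (@ (class "note"))
      "Sexps are " (em "structured") " data."))

;; Concatenate all text inside an SXML node, skipping attribute lists.
(define (sxml-text node)
  (cond
    ((string? node) node)
    ((and (pair? node) (eq? (car node) '@)) "")
    ((pair? node)
     (apply string-append (map sxml-text (cdr node))))
    (else "")))

(write (sxml-text doc))  ; writes "Sexps are structured data."
(newline)
```

The nesting of em inside p survives as list structure, which is what line-oriented tools would lose.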