when-not-to-use-auxiliary-syntax


In addition to generating identifiers, macros often want to match specific identifiers, and in a hygienic macro system this matching must, in general, also be done hygienically.

The canonical example is cond. In the following code:

 (let ((else #f)) 
   (cond (else 'ok))) 

the result is not 'ok but is rather unspecified. This is because the else in the cond must obey lexical scope and refer to the local variable, not the keyword recognized by the cond macro. Some people find this disconcerting, but it is unavoidable because the cond syntax is ambiguous: the first position of a clause is an ordinary expression, so (else 'ok) can just as well be read as a test of a variable named else.
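The same behaviour can be reproduced with a cut-down cond written in syntax-rules. This is only a sketch: my-cond, unshadowed, shadowed and the no-clause-matched marker are names invented for the illustration, and the marker stands in for the unspecified value so the two cases can be told apart.

```scheme
;; my-cond is a hypothetical, simplified cond used only for illustration.
(define-syntax my-cond
  (syntax-rules (else)
    ;; No clauses left: return a marker instead of an unspecified value.
    ((_) 'no-clause-matched)
    ;; Taken only when the user's else has the same binding as the else
    ;; visible where my-cond was defined.
    ((_ (else body ...)) (begin body ...))
    ;; Otherwise the first element is an ordinary test expression.
    ((_ (test body ...) clause ...)
     (if test (begin body ...) (my-cond clause ...)))))

;; else is not shadowed here, so the literal matches.
(define unshadowed (my-cond (else 'ok)))

;; else is a local variable bound to #f, so the clause is an ordinary test.
(define shadowed (let ((else #f)) (my-cond (else 'ok))))
```

On an expander that matches syntax-rules literals hygienically, as the standards require, unshadowed should be ok and shadowed should be no-clause-matched.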

On the other hand, the case syntax is unambiguous. Given:

 (let ((else #f)) 
   (case 'anything (else 'ok))) 

the head of a case clause is a list of data, never an expression, so there's no reason to match the else hygienically. We can just treat it as a raw symbol name and return 'ok here. The standard suggests otherwise, but matching the raw symbol is a legal extension.
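As a rough sketch of what matching the raw symbol looks like, here is a simplified case written with R6RS-style syntax-case. It assumes an implementation that provides syntax-case, identifier? and syntax->datum at the top level; my-case, case-result and the no-clause-matched marker are hypothetical names, and this is not how any particular implementation defines case.

```scheme
;; my-case is a hypothetical, simplified case that compares the else
;; keyword by symbol name only, ignoring any lexical binding it may have.
(define-syntax my-case
  (lambda (stx)
    (syntax-case stx ()
      ;; No clauses left: evaluate the key, then return a marker.
      ((_ key) #'(begin key 'no-clause-matched))
      ;; Final clause whose head is an identifier spelled "else".
      ((_ key (kw body0 body ...))
       (and (identifier? #'kw)
            (eq? (syntax->datum #'kw) 'else))
       #'(begin key body0 body ...))
      ;; Ordinary clause: the head is a list of data, never an expression.
      ((_ key ((datum ...) body0 body ...) clause ...)
       #'(let ((k key))
           (if (memv k '(datum ...))
               (begin body0 body ...)
               (my-case k clause ...)))))))

;; The local binding of else does not affect the match.
(define case-result
  (let ((else #f))
    (my-case 'anything ((a b) 'letters) (else 'ok))))
```

Under those assumptions case-result should be ok, even though else is bound to #f where the macro is used.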

If you use only defmacro this all sounds silly, since there is no case where the else wouldn't match. At the other extreme, some people consider it a sin to even write macros that work on data rather than abstract syntax objects. As usual, most people fall in between and it is a matter of degree: when and why should you use literal matching as opposed to hygienic matching? This page is here to discuss those situations.

Data languages

If you are writing a macro that attempts to understand parts or all of a sexp-language, then by definition you have no choice but to match unhygienically. Usually the whole point of a sexp-language is that you don't need macros, but it can be useful to layer macros on top of one. Macros can provide optimizations, fill in boilerplate, or provide local bindings from the language.

It's still essential that the macro itself be hygienic, for instance when there is a body associated with or embedded in the data. For example, if the macro is just performing an optimization on static data, the data may nonetheless be generated by a free-form expression. Static parts of the data are recognized either by the expression being a quote, or by the expression being implicitly quasiquoted, with dynamic portions inserted with unquote or unquote-splicing. In this case not only must the hygiene of the dynamic expressions be preserved, but the quote and unquotes must themselves be matched hygienically. That is, given:

 (my-macro (quote (my-sexp-lang ...)) body ...) 

we can only perform our static analysis if quote is the quote we think it is, even though we strip hygiene from the quoted expression.
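A minimal sketch of this, assuming a syntax-rules expander with hygienic literal matching: with-data is a hypothetical macro that tags its data as static only when it is wrapped in the real quote, and otherwise treats it as an arbitrary expression. The names static-result and dynamic-result and the static and dynamic tags are likewise invented for the example.

```scheme
;; with-data is a hypothetical macro used only for illustration.
(define-syntax with-data
  (syntax-rules (quote)
    ;; Static path: taken only if quote is the quote we think it is.
    ;; The data is then handled as raw, unhygienic structure.
    ((_ (quote data) body0 body ...)
     (list 'static 'data (begin body0 body ...)))
    ;; Dynamic path: the first argument is an ordinary expression.
    ((_ expr body0 body ...)
     (list 'dynamic expr (begin body0 body ...)))))

;; quote has its usual meaning, so the data is known at expansion time.
(define static-result (with-data 'v 42))

;; quote is shadowed by a procedure, so 'v is really a call (quote v)
;; and must be left alone as a dynamic expression.
(define dynamic-result
  (let ((quote (lambda (x) x))
        (v 5))
    (with-data 'v (+ v 1))))
```

With hygienic matching, static-result should be the list (static v 42) and dynamic-result the list (dynamic 5 6), because in the second use the quote form is an ordinary procedure call.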

Unambiguous keywords

Sometimes we use keywords essentially as named placeholders for readability (as in case above), or for their visual effect as in => used in many macros. In many cases these are unambiguous - they do not occur in an expression context, and therefore could never interact with lexical scope. In these cases it's perfectly safe, and can be considered a nice feature, to match unhygienically so that we don't need to import or export the auxiliary syntax or worry about conflicts or renaming.

As mentioned, case is one such example. Another is the R6RS define-record-type syntax:

 (define-record-type Rectangle 
   (parent Shape) 
   (fields 
    (immutable width width-of) 
    (immutable height height-of)) 
   (sealed #t)) 

This example illustrates only some of the syntax. In R6RS parent, fields, immutable and sealed are all auxiliary syntax. However, they are really just named parameters: this is a single unified macro, and there can be no ambiguity or interaction between these keywords and lexical scope even if you wanted one. Thus it's perfectly safe to just match these literally.

The alternative, to treat these as auxiliary syntax, means you may have to rename in some situations to something like:

 (define-record-type Rectangle 
   (record-parent Shape) 
   (record-fields 
    (record-immutable width width-of) 
    (record-immutable height height-of)) 
   (record-sealed #t)) 

Some people like the ability to rename. One possible use would be to provide localized versions of macros where not only the macros are renamed but their auxiliary syntax is as well. Others consider this just a cause of obfuscation.

Note that it is also possible to accept a match if either the raw symbol or the hygienic identifier matches, which is possibly the best of both worlds.
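One way this combined test might look, again assuming R6RS-style syntax-case with free-identifier=? and syntax->datum available. The macro else-clause? is a hypothetical helper that only reports whether a clause would be treated as an else clause; a real macro would use the same fender on its else pattern.

```scheme
;; else-clause? is a hypothetical helper used only for illustration.
;; It expands to #t if the clause head is else by binding or by name.
(define-syntax else-clause?
  (lambda (stx)
    (syntax-case stx ()
      ((_ (kw . rest))
       (and (identifier? #'kw)
            (or (free-identifier=? #'kw #'else)        ; hygienic match
                (eq? (syntax->datum #'kw) 'else)))     ; raw symbol match
       #'#t)
      ((_ other) #'#f))))

;; Matches either way when else is not shadowed.
(define plain-match (else-clause? (else 1)))

;; The binding differs here, but the symbol name still matches.
(define shadowed-match (let ((else #f)) (else-clause? (else 1))))

;; A different identifier matches neither test.
(define no-match (else-clause? (elsewhere 1)))
```

The hygienic branch is what would accept an else that was renamed on import, which is not shown here because it needs a library with a renaming import. The symbol branch is what accepts the shadowed use.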