Efficient compilation of pattern matching is not exactly an open problem in computer science in the way that implementing, say, type systems might be, but there is still a lot of mysticism surrounding it.
In this post I hope to clear up some misconceptions regarding the implementation of pattern matching by demonstrating one such implementation. Do note that our pattern matching engine is strictly linear, in that pattern variables may only appear once in the match head. This is unlike other languages, such as Prolog, in which variables appearing more than once in the pattern are unified together.
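To make the linearity restriction concrete, here is a minimal Python sketch (not Urn, and not the library's actual check). It assumes a hypothetical encoding in which pattern variables are `Var` objects, list patterns are Python lists, and everything else is an atom; `pattern_variables` and `repeated_variables` are names invented for this illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a pattern variable."""
    name: str

def pattern_variables(pat):
    """Yield every variable name in a (possibly nested) list pattern."""
    if isinstance(pat, Var):
        yield pat.name
    elif isinstance(pat, list):
        for sub in pat:
            yield from pattern_variables(sub)

def repeated_variables(pat):
    """Return the names that occur more than once, i.e. linearity violations."""
    counts = Counter(pattern_variables(pat))
    return sorted(name for name, n in counts.items() if n > 1)

linear = [Var("x"), [Var("y"), 1]]
nonlinear = [Var("x"), [Var("x"), 1]]
print(repeated_variables(linear))     # []
print(repeated_variables(nonlinear))  # ['x']
```

A linear engine would reject the second pattern, whereas a unifying one (as in Prolog) would instead require both occurrences of `x` to match equal values.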
Structure of a Pattern Match
Pattern matching always involves a pattern (the match head, as we call it) and a value to be compared against that pattern, the matchee. Sometimes, however, a pattern match will also include a body, to be evaluated in case the pattern does match.
(case 'some-value              ; matchee
  [some-pattern                ; match head
   (print! "some body")])      ; match body
As a side note, keep in mind that case has linear lookup of match bodies. Though logarithmic or constant-time lookup might be possible, it is left as an exercise for the reader.
To simplify the task of compiling patterns to an intermediate form without them, we divide their compilation into two big steps: compiling the pattern’s test and compiling the pattern’s bindings. We do so inductively: there are a few elementary pattern forms on which the more complicated ones are built.
Most of these elementary forms are very simple, but two are the simplest: atomic forms and pattern variables. An atomic form is the pattern correspondent of a self-evaluating form in Lisp: a string, an integer, a symbol. We compare these for pointer equality. Pattern variables represent unknowns in the structure of the data, and a way to capture these unknowns.
|Pattern|Test|Bindings|
|---|---|---|
|Atomic form|Equality with the matchee|Nothing|
|Pattern variable|Nothing|The matchee|
All compilation forms take as input the pattern to compile along with a symbol representing the matchee. Patterns which involve other patterns (for instance, lists, conses) will call the appropriate compilation forms with the symbol modified to refer to the appropriate component of the matchee.
Let’s quickly have a look at compiling these elementary patterns before looking at the more interesting ones.
(defun atomic-pattern-test (pat sym)
  `(= ,pat ,sym))
(defun atomic-pattern-bindings (pat sym)
  '())
Atomic forms are the simplest to compile: we merely test that the symbol’s value is equal to the pattern, and emit no bindings. The comparison uses =, which compares identities, rather than eq?, which checks for equivalence. More complicated checks, such as list equality, need not be handled by the equality function, as we handle them in the pattern matching library itself.
(defun variable-pattern-test (pat sym)
  `true)
(defun variable-pattern-bindings (pat sym)
  (list `(,pat ,sym)))
The converse is true for pattern variables, which have no test and bind themselves. The returned bindings are in association list format, and the top-level macro that users invoke will collect these and then bind them with let*, as the case macro later in this post does.
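For readers who do not speak Lisp, the same two elementary forms can be sketched in Python. This is an illustration under stated assumptions, not the Urn implementation: the "generated code" is a Python expression string instead of a quoted form, `Var` is a hypothetical class standing in for a pattern variable, and Python's `==` stands in for Urn's =.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a pattern variable."""
    name: str

def atomic_pattern_test(pat, sym):
    # Emit source text comparing the matchee expression against the literal.
    return f"({sym} == {pat!r})"

def atomic_pattern_bindings(pat, sym):
    return []  # atoms capture nothing

def variable_pattern_test(pat, sym):
    return "True"  # a variable matches any value

def variable_pattern_bindings(pat, sym):
    # Association list: variable name paired with the expression it is bound to.
    return [(pat.name, sym)]

print(atomic_pattern_test(42, "m"))              # (m == 42)
print(variable_pattern_bindings(Var("x"), "m"))  # [('x', 'm')]
```

Note that both functions return code to be evaluated later, mirroring how the Lisp versions return quoted forms instead of performing the comparison themselves.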
Composite forms are a bit more interesting: These include list patterns and cons patterns, for instance, and we’ll look at implementing both. Let’s start with list patterns.
To determine if a list matches a pattern we need to test for several things:
- First, we need to test if it actually is a list at all!
- The length of the list is also tested, to see if it matches the number of elements stated in the pattern
- We check every element of the list against the corresponding elements of the pattern
With the requirements down, here’s the implementation.
(defun list-pattern-test (pat sym)
  `(and (list? ,sym) ; 1
        (= (n ,sym) ,(n pat)) ; 2
        ,@(map (lambda (index) ; 3
                 (pattern-test (nth pat index) `(nth ,sym ,index)))
               (range :from 1 :to (n pat)))))
To test for the third requirement, we call a generic dispatch function, pattern-test (which is trivial, and so is not shown here), to compile each pattern in the list against the element at the same index of the actual list.
List pattern bindings are similarly easy:
(defun list-pattern-bindings (pat sym)
  (flat-map (lambda (index)
              (pattern-bindings (nth pat index) `(nth ,sym ,index)))
            (range :from 1 :to (n pat))))
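The three list checks and the recursive bindings can be sketched in Python as follows. This is a hedged analogue, not the Urn code: patterns are encoded as Python lists, `Var` is a hypothetical class for pattern variables, the generated test is a Python expression string, and indices are zero-based where Urn's are one-based.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a pattern variable."""
    name: str

def pattern_test(pat, sym):
    """Dispatch on the pattern's shape and return a Python expression string."""
    if isinstance(pat, Var):
        return "True"
    if isinstance(pat, list):
        checks = [f"isinstance({sym}, list)",      # 1: is it a list at all?
                  f"len({sym}) == {len(pat)}"]     # 2: does the length match?
        checks += [pattern_test(p, f"{sym}[{i}]")  # 3: element-wise tests
                   for i, p in enumerate(pat)]
        return "(" + " and ".join(checks) + ")"
    return f"({sym} == {pat!r})"

def pattern_bindings(pat, sym):
    """Collect (name, expression) pairs, flattening nested results."""
    if isinstance(pat, Var):
        return [(pat.name, sym)]
    if isinstance(pat, list):
        return [b for i, p in enumerate(pat)
                for b in pattern_bindings(p, f"{sym}[{i}]")]
    return []

pat = [1, Var("x"), [Var("y"), "z"]]
test = pattern_test(pat, "m")
binds = pattern_bindings(pat, "m")
print(binds)  # [('x', 'm[1]'), ('y', 'm[2][0]')]
```

The order of the checks matters: because `and` short-circuits, the element tests are only reached once the value is known to be a list of the right length, so indexing never goes out of bounds.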
Compiling cons patterns is similarly easy if your Lisp is proper: We only need to check that the matchee is a cons (or, less generally, a list), then match the given patterns against the car and the cdr.
(defun cons-pattern-test (pat sym)
  `(and (list? ,sym)
        ,(pattern-test (cadr pat) `(car ,sym))
        ,(pattern-test (caddr pat) `(cdr ,sym))))
(defun cons-pattern-bindings (pat sym)
  (append (pattern-bindings (cadr pat) `(car ,sym))
          (pattern-bindings (caddr pat) `(cdr ,sym))))
Note that, in Urn, cons patterns have the more general form (pats* . pat) (where the asterisk has its usual meaning of zero or more repetitions), and can match any number of elements in the head. They are also less efficient than expected, because cdr copies the list’s tail. (Our lists are not linked; rather, they are implemented over Lua arrays, and as such, removing the first element is rather inefficient.)
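A cons pattern with a single head element can be sketched in Python along the same lines. The encoding is an assumption made for this example: `Cons` and `Var` are hypothetical classes, `m[0]` plays the role of car and `m[1:]` the role of cdr. The extra non-empty check is not in the Lisp version above; it is added here only because indexing an empty Python list raises an error.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a pattern variable."""
    name: str

@dataclass(frozen=True)
class Cons:
    """Hypothetical cons pattern: one head pattern and one tail pattern."""
    head: object
    tail: object

def pattern_test(pat, sym):
    if isinstance(pat, Var):
        return "True"
    if isinstance(pat, Cons):
        head_test = pattern_test(pat.head, f"{sym}[0]")   # car
        tail_test = pattern_test(pat.tail, f"{sym}[1:]")  # cdr
        # Non-empty check added for Python, where [][0] raises IndexError.
        return (f"(isinstance({sym}, list) and len({sym}) >= 1"
                f" and {head_test} and {tail_test})")
    return f"({sym} == {pat!r})"

def pattern_bindings(pat, sym):
    if isinstance(pat, Var):
        return [(pat.name, sym)]
    if isinstance(pat, Cons):
        return (pattern_bindings(pat.head, f"{sym}[0]")
                + pattern_bindings(pat.tail, f"{sym}[1:]"))
    return []

pat = Cons(1, Cons(Var("second"), Var("rest")))
scope = {"m": [1, 2, 3, 4]}
matched = eval(pattern_test(pat, "m"), scope)
bound = {name: eval(expr, scope) for name, expr in pattern_bindings(pat, "m")}
print(matched, bound)  # True {'second': 2, 'rest': [3, 4]}
```

Python's `m[1:]` also copies the tail of the list, so this sketch has the same cost profile as the array-backed cdr described above: every nested cons pattern slices again.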
Now that we can compile a wide assortment of patterns, we need a way to actually use them to scrutinize data. For this, we implement two forms: an improved version of destructuring-bind, and case.
Implementing destructuring-bind is simple: We only have a single pattern to test against, and thus no search is necessary. We simply generate the pattern test and the appropriate bindings, and generate an error if the pattern does not match. Generating a friendly error message is similarly left as an exercise for the reader.
Note that, as a well-behaved macro, destructuring-bind will not evaluate the given variable more than once. It does this by binding it to a temporary name and scrutinizing that name instead.
(defmacro destructuring-bind (pat var &body)
  (let* [(variable (gensym 'var))
         (test (pattern-test pat variable))
         (bindings (pattern-bindings pat variable))]
    `(with (,variable ,var)
       (if ,test
         (let* ,bindings ,@body) ; bind the pattern variables around the body
         (error! "pattern matching failure")))))
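As a rough Python analogue of destructuring-bind, the sketch below evaluates the generated test against a temporary name and, on success, passes the captured values to a body function. Everything here is an assumption for illustration: `Var`, the `_tmp` name (a fixed stand-in for a gensym), the use of `eval` in place of macro expansion, and `ValueError` in place of error!.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a pattern variable."""
    name: str

def pattern_test(pat, sym):
    if isinstance(pat, Var):
        return "True"
    if isinstance(pat, list):
        parts = [f"isinstance({sym}, list)", f"len({sym}) == {len(pat)}"]
        parts += [pattern_test(p, f"{sym}[{i}]") for i, p in enumerate(pat)]
        return "(" + " and ".join(parts) + ")"
    return f"({sym} == {pat!r})"

def pattern_bindings(pat, sym):
    if isinstance(pat, Var):
        return [(pat.name, sym)]
    if isinstance(pat, list):
        return [b for i, p in enumerate(pat)
                for b in pattern_bindings(p, f"{sym}[{i}]")]
    return []

def destructuring_bind(pat, value, body):
    """Match `value` against `pat`; call `body` with the captured variables."""
    scope = {"_tmp": value}  # the matchee, bound once to a temporary name
    if not eval(pattern_test(pat, "_tmp"), scope):
        raise ValueError("pattern matching failure")
    bound = {name: eval(expr, scope)
             for name, expr in pattern_bindings(pat, "_tmp")}
    return body(**bound)

total = destructuring_bind([Var("a"), Var("b"), 3], [1, 2, 3],
                           lambda a, b: a + b)
print(total)  # 3
```

Passing the bindings as keyword arguments plays the role of wrapping the body in let*: the body only ever sees the names the pattern introduced.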
Implementing case is a bit more difficult in a language without cond, since the linear structure of a pattern-matching case statement would have to be transformed into a tree of if-else combinations. Fortunately, this is not our case (pun intended, definitely).
(defmacro case (var &cases)
  (let* [(variable (gensym 'variable))]
    `(with (,variable ,var)
       (cond ,@(map (lambda (c)
                      `(,(pattern-test (car c) variable)
                        (let* ,(pattern-bindings (car c) variable)
                          ,@(cdr c))))
                    cases)))))
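The shape of this macro, a linear walk over the cases with the matchee bound to a temporary, can be imitated in Python. The sketch below is an assumption-laden illustration, not the Urn expansion: `run_case`, `Var`, and the `bind_once` switch are hypothetical, the matchee is passed as source text so that re-embedding it in every test can be simulated, and only atoms and variables are supported to keep it short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a pattern variable."""
    name: str

def pattern_test(pat, sym):
    return "True" if isinstance(pat, Var) else f"({sym} == {pat!r})"

def pattern_bindings(pat, sym):
    return [(pat.name, sym)] if isinstance(pat, Var) else []

def run_case(matchee_src, cases, env, bind_once=True):
    """Interpret a case form; `cases` is a list of (pattern, body) pairs."""
    scope = dict(env)
    if bind_once:
        scope["_tmp"] = eval(matchee_src, scope)  # evaluate the matchee once
        sym = "_tmp"
    else:
        sym = f"({matchee_src})"                  # naive: re-embed the source
    for pat, body in cases:                       # linear search over cases
        if eval(pattern_test(pat, sym), scope):
            bound = {name: eval(expr, scope)
                     for name, expr in pattern_bindings(pat, sym)}
            return body(**bound)
    raise ValueError("pattern matching failure")

calls = []
def noisy():
    calls.append("foo")  # the side effect we do not want duplicated
    return 123

cases = [(1, lambda: "it is one"),
         (2, lambda: "it is two"),
         (Var("other"), lambda other: f"it is {other}")]

result = run_case("noisy()", cases, {"noisy": noisy})
once = len(calls)
calls.clear()
run_case("noisy()", cases, {"noisy": noisy}, bind_once=False)
naive = len(calls)
print(result, once, naive)  # it is 123 1 3
```

With the temporary binding the side effect happens once; in the naive variant it happens for each failed test and again for the final binding.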
Again, we prevent reevaluation of the matchee by binding it to a temporary symbol. This is especially important in an impure, expression-oriented language as evaluating the matchee might have side effects! Consider the following contrived example:
(case (progn (print! "foo")
             123)
  [1 (print! "it is one")]
  [2 (print! "it is two")]
  [_ (print! "it is neither")]) ; _ represents a wild card pattern.
If the matchee wasn’t bound to a temporary value, "foo" would be printed thrice in this example. Both the toy implementation presented here and the implementation in the Urn standard library will only evaluate matchees once, thus preventing effect duplication.
Unlike previous blog posts, this one isn’t runnable Urn. If you’re interested, I recommend checking out the actual implementation. It gets a bit hairy at times, particularly with handling of structure patterns (which match Lua tables), but it’s similar enough to the above that this post should serve as a vague map of how to read it.
In a bit of a meta-statement I want to point out that this is the first (second, technically!) of a series of posts detailing the interesting internals of the Urn standard library: It fixes two things in the sorely lacking category: content in this blag, and standard library documentation.
Hopefully this series is as nice to read as it is for me to write, and here’s hoping I don’t forget about this blag for a year again.