Matching regular expressions

Regular expressions naturally generalize from strings to trees and allow to test whether a given tree matches a given pattern. TeXmacs implements the primitives match? and match for this purpose, which also provide support for wildcards, user-defined grammars and more.

(match? expr pattern)
(check whether a scheme expression satisfies a pattern)

This function determines whether a scheme expression expr satisfies a given pattern. It will be detailed below how to form valid patterns. The matching routines recursively understand that native trees match their scheme counterparts. For instance, (match? (tree "x") "x") will return #t.

(match l pattern bindings)
(solutions to a given pattern under bindings)

Given a list l of scheme expressions, a pattern with free variables and an association list of bindings, this routine determines all substitutions of free variables by values (extending the given bindings), for which l matches the pattern.

(define-grammar rules*)
(user defined matching grammars)

Given a list of rules of the form (:var pattern-1 ... pattern-n), this instruction defines a new terminal symbol :var for each such rule, which matches the disjunction of the patterns pattern-1 until pattern-n. This terminal symbol can then be used as an abbreviation in matching patterns. Grammar rules may be interdependent.

Valid patterns are formed in the following ways:

leaf
(symbols, strings, etc.)

A leaf is only matched against itself.

(pattern-1 ... pattern-n)
(lists)

In the case when lists l-1 until l-n match pattern-1 until pattern-n, their concatenation matches the pattern (pattern-1 ... pattern-n).

:#1, :#2, :#3 ..., :*
(wildcards)

The wildcard :#n, where n is a number matches any list of length n. The wildcard :* matches any list, including the empty list.

'var
(variables)

This pattern attempts to bind the variable var against the expression. If var is used only once, then it essentially behaves as a wildcard. More generally, it can be used to form patterns with identical subexpressions. For instance, the pattern (frac 'x 'x) will match all fractions

x
x
.

:var
(user-provided grammar rules)

In the case when :var is a user-provided terminal symbol (see define-grammar above), this pattern matches the corresponding grammar.

:pred?
(arbitrary Scheme predicates)

Given a Scheme predicate pred?, such as string?, this pattern matches any scheme expression which satisfies the predicate.

(:not pattern)

(:or pattern-1 ... pattern-n)

(:and pattern-1 ... pattern-n)
(logical operations)

Negation, disjunction and conjunction of patterns.

(:repeat pattern)
(repetition)

Given lists l-1 until l-n which match pattern, their concatenation matches the repetition (:repeat pattern). In particular, the empty list is matched.

(:group pattern-1 ... pattern-n)
(grouping)

Groups a concatenation of patterns into a new list patterns. For instance, all lists of the form (a b a b ... a b) are matched by (:repeat (:group a b)), whereas (:repeat (a b)) rather matches all lists of the form ((a b) (a b) ... (a b)).

(:quote expr)
(quotation)

Only matches a given expression expr.

Example 1. The tree

(define t '(foo (bar "x") (bar "y") (option "z")))

matches the pattern (foo (:repeat (bar :#1)) :*), but not (foo (:repeat (bar 'x)) :*). The call (match t '(foo 'x 'y :*)) will return (((x . (bar "x")) (y . (bar "y")))).

Example 2. Consider the grammar

(define-grammar

(:a a b c)

(:b (:repeat :a)))

Then the list (a b x y c a a) matches the pattern (:b :#2 :b).

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".