Composable Scala Matchers

Introduction

When people ask me what are the most important differences between functional programming and other paradigms of programming, I say functional composition, pattern matching and non-strict (lazy) evaluation. Lots of languages have lambdas, and functional languages have lots of other good stuff. But the three features I mention are, to me at least, the essence of functional programming.

Pattern matching is a truly awesome feature and I like to describe patterns, jokingly, as the “dark” side of expressions. The yin of the expression’s yang. But, however powerful and convenient such patterns are, they are not easily composed or, at best, can give rise to complex, nested match clauses that get ugly. So, having a specific application in mind for a more composable set of matchers, I set out to create a library of composable matchers which you can find at Matchers.

Of course I modeled my design on the excellent-but-slightly-quirky Scala Parser-combinators (SPC) library. The Parsers class became Matchers, and the ParseResult class became MatchResult. Many of the methods, particularly, on the MatchResult class are just like those of the parsers. And some arose either from an exercise in algebraic design — or from an application use case.

You might want to skip much of the following detail and go straight to the section on usage.

The obvious place to start, then, is with MatchResult shown, for brevity, without any scaladoc (hoping that the names are sufficiently helpful — there is scaladoc in the source of course):

sealed trait MatchResult[+R] {
def successful: Boolean
def get: R
def success[S](s: => S): MatchResult[S] = MatchResult(s)
def flatMap[S](f: R => MatchResult[S]): MatchResult[S]
def getOrElse[S >: R](s: => S): S
def andThen[S](sm: => MatchResult[S]): MatchResult[R ~ S]
def orElse[S >: R](sm: => MatchResult[S]): MatchResult[S]
def foreach(f: R => Unit): Unit
def guard[S](sm: => MatchResult[S]): MatchResult[S]
def |[S >: R](m: => Matcher[Any, S]): MatchResult[S]
def &[S >: R, T](m: => Matcher[S, T]): MatchResult[T]
}

Of course, MatchResult is a monad, with the success method taking the place of the usual unit method. Thus, map is defined thus:

def map[S](f: R => S): MatchResult[S] = flatMap(r => success(f(r)))

isEmpty is a synonym for !successful; ~ is a synonym for andThen; || is a synonym for orElse; and && is a synonym for guard. && and guard differ from ~/andThen in that the value of this MatchResult is discarded (only its sense — success or otherwise — is used).

There are three case classes which extend MatchResult: Match, Miss, and Error. Their definitions are as follows:

case class Match[+R](r: R) extends MatchResult[R]abstract class Unsuccessful[+R, X](x: X) extends MatchResult[R]case class Miss[T, +R](msg: String, t: T) extends Unsuccessful[R, String](msg)case class Error[+R](e: Throwable) extends Unsuccessful[R, Throwable](e)

Additionally, there is an apply method defined in the MatchResult object which evaluates its call-by-name R parameter inside Try and then yields the appropriate result:

def apply[R](r: => R): MatchResult[R] = Try(r) match {
case Success(x) => Match(x)
case Failure(e) => Error(e)
}

Next, we take a look at Matcher:

trait Matcher[-T, +R] extends (T => MatchResult[R]) {...}

The big difference between Matcher and Parser is that the latter only has one parametric type defined in the class definition. Its equivalent of T is called Input and is defined using the type alias mechanism in trait Parsers. That mechanism works well for the SPC because almost all parsers read from the same input: namely a CharSequence.

But, for our purposes, I needed to define two parametric types for every Matcher so that the input and result types could be independent. In practice, we often compose matchers by chaining them and the intermediate result types can vary considerably.

So, again, we’d like the trait Matcher to be monadic, so we define unit, map, and flatMap:

def unit[S](s: S): Matcher[T, S] = _ => Match(s)
def map[U <: T, S](f: R => S): Matcher[U, S] = this (_) map f
def flatMap[U <: T, S](f: R => MatchResult[S]): Matcher[U, S] = this (_) flatMap f

The ^^ method, which tends to be used a lot in Parsers rather than map, is just a named version of map.

For the remaining methods in the Matcher trait, we have !, ?, |(Matcher), &(Matcher), ~(Matcher), ~>(Matcher), <~(Matcher), and chain(Matcher).

As in the SPC code, we need to be able to construct a Matcher[T, R] from a function T => MatchResult[R] and so we define a method (in trait Matchers) called Matcher (with an upper-case M) to do just that. There is a similar method called lift which constructs a Matcher[T, R] from a function of type T => R.

Additional methods defined in trait Matchers include: success(R), fail(String), error(Throwable), always, filter(R => Boolean), maybe(Boolean), matches(T)., as well as various types of valve. Additionally, there are methods which take one or two Matcher parameters, such as not(Matcher, R), opt(Matcher), *(Matcher), and **(Matcher). These latter two methods, together with many methods with names including select, are available for disentangling tuples and other Product types (including case classes) which arise quite commonly with the various methods.

Because the ~ (concatenation) operator is so useful in both expressions and pattern-matching, we define a special “tilde” case class (just as the SPC does) called ~, together with a companion object of sorts called Tilde (I wasn’t able to use “~” for the object).

case class ~[+L, +R](l: L, r: R) {
def flip: ~[R, L] = Tilde(r, l)
}
object Tilde {
def apply[L, R](l: L, r: R): ~[L, R] = new ~(l, r)
}

Usage

This article would be as dry as dust if I didn’t give you an actual use case (the one for which I created Matchers). Well, maybe it’s as dry as dust anyway. But I press on. The application is a numeric library in Scala called Number. Why would we need another numeric library? Well, I will discuss it in more detail in my next article. But let’s just say that the numbers which it deals with are almost always exact. If a number cannot be exact, perhaps because of some trigonometric approximation or because a number is based on an observation with a range of measurement errors, then we represent it as a fuzzy number.

Yet, it would be a shame to have an expression for a number such as (√3 + 1)(√3 — 1) come out with fuzziness when we know just by looking at it that it’s value is actually 2 (exactly). Thus, it is desirable to try to simplify such expressions. This in turn requires the ability to match expressions and numbers.

The simplifier for an Expression is defined thus:

def simplifier: Transformer = simpler(biFunctionSimplifier | functionSimplifier :| "simplifier")

def biFunctionSimplifier: Transformer = matchBiFunction & biFunctionMatcher

Transformer is simply a type alias for Matcher[Expression, Expression]. Suffice it to say that there are several sub-types of Expression, including numbers, bi-functions (i.e. dyadic functions), and functions (monadic functions). The code depicted here shows that we try to match the expression as a bi-function (using the matcher matchBiFunction) and, if that’s successful, we invoke the biFunctionMatcher. Not shown here (and not really important for our current purpose) is the fact that the result of matchBiFunction and so the input to biFunctionMatcher is another type (actually, it’s an ExpressionBiFunction ~ Expression ~ Expression). If the expression is not a bi-function, then we will try to simplify it as a function. I will be writing much more about the Number package in an upcoming article.

Debugging these matchers can be tricky, just like debugging SPC-based code. For that reason, there is an implicit class defined for logging as follows:

implicit class MatcherOps[T, R](p: Matcher[T, R]) {
def :|(name: => String)(implicit ll: LogLevel, logger: MatchLogger): Matcher[T, R] = log(p.named(name))
}

We don’t need to go into details of MatchLogger, LogLevel, etc. for now. The :| method is particularly useful at the end of a set of alternative matchers such as the following actual code from Number:

def matchSimplifySum: Matcher[DyadicTriple, Expression] = (matchDyadicBranches(Sum) & (gatherSum | *(matchOffsetsOrIdentities))) | matchSimplifyPlusIdentity :| "matchSimplifySum"

The logging output will show that it is trying matchSimplifySum and then will show whether it succeeded in making a match (or instead a miss).

Easily the most frequently used matcher methods in any application are likely to be & and | on the Matchers type. Here are their definitions:

def &[S](m: Matcher[R, S]): Matcher[T, S] = t => this (t) & m
def |[U <: T, S >: R](m: Matcher[U, S]): Matcher[U, S] = t => match2Any(this, m)(t ~ t)

Here, the & method in the body is that of the MatchResult type.

A slightly more complex (and , hopefully, interesting) method from Matchers is the following:

def *[T, R](m: Matcher[T ~ T, R], flip: Boolean = true): Matcher[T ~ T, R] = m | (maybe[T ~ T](flip) & swap & m)

Note that the * method was used in matchSimplifySum (above).

The idea here is that an expression such as a + b is commutative, i.e. its value is the same as b + a. Therefore, when we try to match a pair of Ts (defined using the ~ class), we usually want to try each of them in turn using the same logic. In this case, the matching logic is defined by m where m tries to match on the pattern x ~ y. Thus, if m succeeds, all is well and the result is returned. If m does not succeed then, if flip is true, we swap the order of the pair and retry m, effectively matching now on y ~ x. Passing the optional parameter flip as false takes care of the situation where an expression is not commutative.

Regular Expressions

Matchers is in no way intended as a replacement for SPC. Nevertheless, it is useful to be able to perform simple string parsing without having to depend on SPC.

First, we define a type alias as follows:

type Parser[R] = Matcher[String, R]

We then define an implicit class ParserOps(s: String) which includes several methods to yield instances of Parser:

def m: Parser[String]
def regex: Parser[String]
def regexGroups: Parser[List[String]]

The m method simply matches s literally. regex treats s as a regular expression and matches accordingly. regexGroups treats s as a regular expression with groups defined by parentheses — and matches accordingly, resulting in a List[String]. Unlike with the SPC, there is no provision for making partial matches — the entire string must match.

Here is an example (used to help with parsing the IMDB movie database):

case class Rating(code: String, age: Option[Int])
val m = new Matchers {}
val p = m.parser2("""(\w+)(-(\d+))?""", 1, 3)(m.always, m.opt(m.parserInt))(Rating)
p("PG-13") // successfully results in a Match[Rating]
p("R") // ditto

In this case, we want to be able to parse the rating of a movie (defined by case class Rating). The parser2 method is defined for trait Matchers. This invocation selects the first and third groups matched by the regular expression and passes those to the always and opt matchers, respectively. In this case, we want to optionally match an Int. The result is passed to the apply method of Rating (in a manner very similar to the way any Json reader works.

As you can see, for an entire String, this parsing mechanism is quite powerful. There is a parser for ~ so that we can parse a chain of one or more strings. But, the flexibility of SPC is far superior for the more general parsing of a free-form document.

Conclusion

I hope you found this interesting. I tried to make the description live and breathe with actual examples of usage. However, I’m all too aware that it’s basically just a list of method definitions. I do believe that Matchers is an important and useful library for Scala, as an alternative to or in addition to the more usual Scala Parser Combinators. I wasn’t able to find anything that did anything similar.

The versions used for this article are: Matchers: 1.0.3 and Numbers: 1.0.8.

I would love to hear from you if you’d like to contribute to this project. There’s lots more to be done, for example building and testing with Scala 3.

I’ve been addicted to programming for over 50 years now. I currently teach two classes to graduates at Northeastern University, Boston, USA.