Write you an interpreter

Tweet
Posted on December 30, 2016 by Kwang Yul Seo
Tags: interpreter

Writing an interpreter for a functional language is a good exercise in Haskell. There are several tutorials on this topic.

Implementation techniques used in these tutorials are similar even though their source languages are distinct. They all compile the source language into a small core language based on lambda calculus, and evaluate the program with a context (or an environment).

In this post, I am not going to revisit this common technique. Instead, I will show you how to compile a program to a finite, fixed set of combinators (SKI), and then evaluate these combinators as normal Haskell function. This technique was introduced in Matthew Naylor’s Evaluating Haskell in Haskell.

The source code is available here.

Poly

We are going to borrow the parser and type checker from Stephen Diehls’s Poly, a simple ML dialect with definitions, let polymorphism and a fixpoint operator.

An example of Poly:

let rec factorial n = if (n == 0) then 1 else (n * (factorial (n-1)));

The core language of Poly is a variant of lambda calculus. Let, If, Fix and Op are added as additional constructs.

type Name = String

data Expr
  = Var Name
  | App Expr Expr
  | Lam Name Expr
  | Let Name Expr Expr
  | Lit Lit
  | If Expr Expr Expr
  | Fix Expr
  | Op Binop Expr Expr
  deriving (Show, Eq, Ord)

data Lit
  = LInt Integer
  | LBool Bool
  deriving (Show, Eq, Ord)

data Binop = Add | Sub | Mul | Eql
  deriving (Eq, Ord, Show)

Desugar

Our first task is to desugar Let, If, Fix and Op to simplify the later stage of compilation.

desugar :: Expr -> Expr
desugar (App fun arg) = App (desugar fun) (desugar arg)
desugar (Lam x body) = Lam x (desugar body)
desugar (Let x e body) = App (Lam x (desugar body)) (desugar e)
desugar (If cond tr fl) = foldl App (Var "$IF") args
   where args = map desugar [cond, tr, fl]
desugar (Fix e) = App (Var "$FIX") (desugar e)
desugar (Op op a b) = foldl App (Var n) args
  where
    args = map desugar [a, b]
    n = case op of
      Add -> "$ADD"
      Sub -> "$SUB"
      Mul -> "$MUL"
      Eql -> "$EQL"
desugar e = e

desugar function converts let x = e in body into (\x -> body) e. If, Fix are Op are desugared into function applications. $IF, $FIX, $ADD, $SUB, $MUL, $EQL will be provided as primitive functions. (Note that $IF can be a function because we piggy back on the lazy evaluation of the host language, Haskell.)

Compilation to SKI combinators

The next step is to compile expressions into a fixed, finite combinators. The key idea is to replace Lam and Ap constructors with Haskell’s built-in lambda and application constructs. The original interpreter of Poly is slow because it emulates beta reduction on top of Haskell, but our implementation avoids this overhead by utilizing the host system’s support for beta-reduction.

For example,

Lam "f" (Lam "a" (Lam "b" (App (App (Var "f") (Var "b") (Var "a")))

is compiled to

CLam (\f -> CLam (\a -> CLam (\b -> ap (ap f b) a)))

Here’s the definition of CExpr. You can see that CLam contains a Haskell function CExpr -> CExpr. No variable in the lambda abstraction is necessary.

data CExpr
  = CVar Name
  | CApp CExpr CExpr
  | CLam (CExpr -> CExpr)
  | CBool Bool
  | CInt Integer

compile transforms a lambda calculus expression into an expression involving only S, K, I and constants. The SK compilation algorithm is well described in Simon Peyton Jones’s The Implementation of Functional Programming Languages.

compile :: Expr -> CExpr
compile (Var n) = CVar n
compile (App fun arg) = CApp (compile fun) (compile arg)
compile (Lam x body) = abstract x (compile body)
compile (Lit (LInt k)) = CInt k
compile (Lit (LBool k)) = CBool k

abstract :: Name -> CExpr -> CExpr
abstract x (CApp fun arg) = combS (abstract x fun) (abstract x arg)
abstract x (CVar n) | x == n = combI
abstract _ k = combK k

combS :: CExpr -> CExpr -> CExpr
combS f = CApp (CApp (CVar "$S") f)

combK :: CExpr -> CExpr
combK = CApp (CVar "$K")

combI :: CExpr
combI = CVar "$I"

For example, (\x -> + x x) 5 is transformed as follows:

S --> S (\x -> + x) (\x -> x) 5
S --> S (S (\x -> +) (\x -> x)) (\x -> x) 5
I --> S (S (\x -> +) I) (\x -> x) 5
I --> S (S (\x -> +) I) I 5
K --> S (S (K +) I) I 5

Primitives

Here’s the definition of our primitive functions:

infixl 0 !
(!) :: CExpr -> CExpr -> CExpr
(CLam f) ! x = f x

primitives :: [(String, CExpr)]
primitives =
  [ ("$I", CLam $ \x -> x)
  , ("$K", CLam $ \x -> CLam $ \_ -> x)
  , ("$S", CLam $ \f -> CLam $ \g -> CLam $ \x -> f!x!(g!x))
  , ("$IF", CLam $ \(CBool cond) -> CLam $ \tr -> CLam $ \fl -> if cond then tr else fl)
  , ("$FIX", CLam $ \(CLam f) -> fix f)
  , ("$ADD", arith (+))
  , ("$SUB", arith (-))
  , ("$MUL", arith (*))
  , ("$EQL", logical (==))
  ]

arith :: (Integer -> Integer -> Integer) -> CExpr
arith op = CLam $ \(CInt a) -> CLam $ \(CInt b) -> CInt (op a b)

logical :: (Integer -> Integer -> Bool) -> CExpr
logical op = CLam $ \(CInt a) -> CLam $ \(CInt b) -> if op a b then true else false

true, false :: CExpr
true = CBool True
false = CBool False

Link

The final step is link our compiled program with other functions and primitives in the environment. link traverses the structure of CExpr and replaces CVar node with the actual function definition.

type TermEnv = Map.Map String CExpr

emptyTmenv :: TermEnv
emptyTmenv = Map.fromList primitives

link :: TermEnv -> CExpr -> CExpr
link bs (CApp fun arg) = link bs fun ! link bs arg
link bs (CVar n) = fromJust (Map.lookup n bs)
link _ e = e

Eval

Finally, eval is just a composition of desugar, compile and link env.

eval :: TermEnv -> Expr -> CExpr
eval env = link env . compile . desugar

runEval :: TermEnv -> String -> Expr -> (CExpr, TermEnv)
runEval env nm ex =
  let res = eval env ex in
  (res, Map.insert nm res env)

Optimization

The basic compilation algorithm shown above tends to produce large combinator expressions. New combinators such as B, C, S', B' and C' can optimize both execution speed and program size.