samhuri.net: A Scheme parser in Haskell: Part 1

3rd May, 2007

From Write Yourself a Scheme in 48 hours:

Basically, a monad is a way of saying "there's some extra information attached to this value, which most functions don't need to worry about". In this example, the "extra information" is the fact that this action performs IO, and the basic value is nothing, represented as "()". Monadic values are often called "actions", because the easiest way to think about the IO monad is a sequencing of actions that each might affect the outside world.

I really like this tutorial. I'm only on part 3.3 of 12, parsing, but I'm new to Haskell, so I'm learning left, right & centre. The exercises are taking me hours of reading and experimenting, and it's lots of fun! GHC's errors are usually quite helpful, and of course GHCi is a big help as well.

I'm going to explain one of the exercises because converting between the various syntaxes for dealing with monads wasn't obvious to me. Perhaps I wasn't paying enough attention to the docs I read. In any case, if you're interested in Haskell at all, I recommend the tutorial, and if you're stuck on exercise 3.3.1 like I was, come on back here. Whether or not you're following the tutorial, this post should stand on its own for anyone with a basic knowledge of Haskell.

Last night I rewrote parseNumber using do and >>= (bind) notations (ex. 3.3.1). Here's parseNumber using the liftM method given in the tutorial:

parseNumber :: Parser LispVal
parseNumber = liftM (Number . read) $ many1 digit

Okay, that's pretty simple, right? Let's break it down, looking first at the right-hand side of the $ operator and then at the left.

many1 digit parses one or more decimal digits, consuming as many as it can, and yields them as a String. It fails if there isn't at least one digit.
Number . read is function composition, just like we're used to in math: it applies read to its argument (turning the String into an Integer), then applies Number to that result.
liftM is concisely and effectively defined elsewhere, and I'll borrow their description:

liftM f m lets a non-monadic function f operate on the contents of monad m

liftM's type is also quite telling: liftM :: (Monad m) => (a -> b) -> (m a -> m b)

In a nutshell, liftM turns a function from a to b into a function from a monad containing a to a monad containing b.
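To see liftM on its own, away from Parsec, here is a small sketch using Maybe and lists, both of which are monads. The names doubleRead, liftedMaybe and liftedList are made up for illustration; doubleRead stands in for Number . read, since LispVal isn't defined here.

```haskell
import Control.Monad (liftM)

-- An ordinary, non-monadic function from String to Int.
-- (Hypothetical stand-in for Number . read.)
doubleRead :: String -> Int
doubleRead = (* 2) . read

-- liftM turns (String -> Int) into (m String -> m Int) for any monad m.
-- Here m is Maybe...
liftedMaybe :: Maybe String -> Maybe Int
liftedMaybe = liftM doubleRead

-- ...and here m is the list monad.
liftedList :: [String] -> [Int]
liftedList = liftM doubleRead
```

In GHCi, liftedMaybe (Just "21") gives Just 42, liftedMaybe Nothing gives Nothing, and liftedList ["1", "2"] gives [2,4]: the same ordinary function, lifted into two different monads.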

Applying liftM to (Number . read) therefore gives us a function, sitting on the left-hand side of $, that both takes and returns a monadic value. The content of the input is a String (it's a Parser String). The content of the output is a LispVal (defined earlier in the tutorial), specifically a Number, so the output is a Parser LispVal.
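Here's a sketch that spells out those types with explicit signatures, assuming Parsec's Text.ParserCombinators.Parsec module (the one the tutorial imports). The LispVal below is cut down to the single Number constructor, and liftedNumber is a hypothetical name for the left-hand side of $.

```haskell
import Control.Monad (liftM)
import Text.ParserCombinators.Parsec

-- Cut down to the one constructor parseNumber produces; the tutorial's
-- LispVal has several more.
data LispVal = Number Integer
  deriving (Show, Eq)

-- The left-hand side of $ on its own (hypothetical name): a function
-- from a parser of Strings to a parser of LispVals.
liftedNumber :: Parser String -> Parser LispVal
liftedNumber = liftM (Number . read)

-- Applying it to the right-hand side gives the tutorial's parseNumber.
parseNumber :: Parser LispVal
parseNumber = liftedNumber $ many1 digit
```

Running parse parseNumber "" "42" should give Right (Number 42), while an input with no leading digit, such as "abc", gives a Left with a parse error, because many1 needs at least one digit.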

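The $ operator is just function application: f $ x means f x. Because it has the lowest possible precedence, everything on its right is evaluated as one expression and passed to the function on its left, and because it's right associative, f $ g $ x means f (g x). It's a bit like a pipe in $FAVOURITE_SHELL, except that data flows from right to left. So liftM (Number . read) $ many1 digit is exactly the same as (liftM (Number . read)) (many1 digit); it just looks cleaner. If you know Lisp or Scheme (sadly I do not), it's loosely analogous to the apply function, although $ passes a single argument rather than a list of them.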
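A small sketch of $ using only the Prelude; the list and the functions applied to it are arbitrary examples. Both definitions evaluate to 5.

```haskell
-- Without $, each argument needs its own parentheses.
withParens :: Int
withParens = length (filter even (map (* 3) [1 .. 10]))

-- With $, everything to the right of each $ is grouped first, and since
-- $ is right associative, f $ g $ x means f (g x). Data flows right to
-- left: map, then filter, then length.
withDollar :: Int
withDollar = length $ filter even $ map (* 3) [1 .. 10]
```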

So how does a Haskell newbie go about re-writing that using other notations which haven't even been explained in the tutorial? Clearly one must search the web and read as much as they can until they understand enough to figure it out (which is one thing I like about the tutorial). If you're lazy like me, here are 3 equivalent pieces of code for you to chew on. parseNumber's type is Parser LispVal (Parser is a monad).

Familiar liftM method:

parseNumber = liftM (Number . read) $ many1 digit

Using do notation:

parseNumber = do digits <- many1 digit
                 return $ (Number . read) digits

If you're thinking "Hey, a return, I know that one!" then the devious masterminds behind Haskell are certainly laughing evilly right now. return doesn't exit anything the way it does in C or Java; it simply wraps its argument in a monad, in this case the Parser monad. The return part may seem strange at first: since many1 digit already yields a monadic value, why do we need to wrap anything? The answer is that <- binds digits to the plain String inside the monadic value that many1 digit produced. Because we're no longer using liftM to lift (Number . read) into the monad, we apply it to the bare String and then need return to wrap the result back up in the Parser monad.

In other words, liftM saves us from explicitly re-wrapping the result, which is necessary when using do.
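One way to see the relationship is to write liftM yourself with do notation. The sketch below uses a hypothetical liftM' and compares it against liftM from Control.Monad; as far as I can tell, base defines liftM in essentially this way.

```haskell
import Control.Monad (liftM)

-- A hand-rolled liftM (hypothetical name liftM') written with do notation.
-- <- pulls the plain value out of m; return has to wrap f's result back up.
liftM' :: Monad m => (a -> b) -> m a -> m b
liftM' f m = do
  x <- m
  return (f x)

-- True when the hand-rolled version agrees with Control.Monad.liftM
-- for a given function and monadic value.
agreesWithBase :: (Monad m, Eq (m b)) => (a -> b) -> m a -> Bool
agreesWithBase f m = liftM' f m == liftM f m
```

For example, liftM' length (Just "abc") gives Just 3 and liftM' (+ 1) [1, 2, 3] gives [2,3,4], the same as liftM.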

Finally, using >>= (bind) notation:

parseNumber = many1 digit >>= \digits ->
              return $ (Number . read) digits

At this point I don't think this warrants much explanation: do notation is syntactic sugar for exactly this chain of >>= and lambdas. Just in case it's not obvious, >>= takes the value inside its left argument (a monadic value) and passes it to the function on its right, here the lambda that names it digits. Once again, return is needed to wrap up the result and send it on its way.
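To convince yourself the three versions really are equivalent, here's a self-contained sketch that defines all three side by side and runs them. Like the tutorial, it assumes Parsec's Text.ParserCombinators.Parsec module; LispVal is cut down to Number, and the names parseNumberLiftM, parseNumberDo, parseNumberBind and runOn are made up for the comparison.

```haskell
import Control.Monad (liftM)
import Text.ParserCombinators.Parsec

-- Cut down to the one constructor parseNumber needs; the tutorial's
-- LispVal has several more.
data LispVal = Number Integer
  deriving (Show, Eq)

-- 1. liftM: lift (Number . read) over the parser's result.
parseNumberLiftM :: Parser LispVal
parseNumberLiftM = liftM (Number . read) $ many1 digit

-- 2. do notation: unwrap with <-, then re-wrap with return.
parseNumberDo :: Parser LispVal
parseNumberDo = do
  digits <- many1 digit
  return $ (Number . read) digits

-- 3. Explicit >>= and a lambda: what the do block desugars to.
parseNumberBind :: Parser LispVal
parseNumberBind = many1 digit >>= \digits ->
  return $ (Number . read) digits

-- Run a parser on an input string, discarding error details.
runOn :: Parser LispVal -> String -> Maybe LispVal
runOn p input = either (const Nothing) Just (parse p "" input)
```

Each of the three parsers should give Just (Number 12345) for runOn p "12345" and Nothing for runOn p "x".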

When I first read about Haskell, I was overwhelmed by not knowing anything and by not being able to apply my previous programming knowledge to any of it. One piece of syntax at a time, I'm slowly coming to understand more of the Haskell found in the wild.

I'm currently working on ex. 3.3.4, which is parsing R5RS-compliant numbers (e.g. #o12345670, #xff, #d987). I'll probably write something about that once I figure it out, but in the meantime if you have any hints, I'm all ears.
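For anyone else stuck on the same exercise, here's one possible hint rather than the tutorial's answer: a sketch that handles #x, #o and #d prefixes using Parsec's hexDigit and octDigit together with readHex, readOct and readDec from the standard Numeric module. The names parseRadixNumber and fromBase are hypothetical, #b (binary) is left out, and since #t and #f also start with # in the full tutorial parser, this would need to be merged with the boolean case (or wrapped in something like try) before it fits in.

```haskell
import Numeric (readDec, readHex, readOct)
import Text.ParserCombinators.Parsec

-- Cut down to the one constructor this sketch needs.
data LispVal = Number Integer
  deriving (Show, Eq)

-- Convert a digit string with one of Numeric's ReadS readers.
-- The digit parsers only accept valid, non-empty input, so the
-- reader always succeeds and head is safe here.
fromBase :: ReadS Integer -> String -> Integer
fromBase reader = fst . head . reader

-- Hypothetical parser for #x (hex), #o (octal), #d (decimal) and
-- plain decimal numbers. #b (binary) is not handled.
parseRadixNumber :: Parser LispVal
parseRadixNumber = prefixed <|> plain
  where
    plain = Number . fromBase readDec <$> many1 digit
    prefixed = do
      _ <- char '#'
      radix <- oneOf "xod"
      case radix of
        'x' -> Number . fromBase readHex <$> many1 hexDigit
        'o' -> Number . fromBase readOct <$> many1 octDigit
        _   -> Number . fromBase readDec <$> many1 digit
```

With this sketch, "#xff" should parse to Number 255, "#o17" to Number 15 and "#d987" to Number 987.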

Update #1: I should do more proof-reading if I'm going to try and explain things. I made some changes in wording.