“The article is about my experience with Haskell and the process of working with it. I will describe which features of Haskell I used to implement particular aspects of my interpreter, how they’re different from the object oriented world, why they helped me get things done faster, and how I had to change my program whenever my initial efforts took me to a dead end. While the jury’s still out how well Haskell performs in other domains (I’m just starting my web application project) I hope this article sparks your interest in this beautiful language and explains some things programmers new to Haskell often find confusing.” More here.
Interestingly, the first somewhat complete Perl6 implementation is written in Haskell.
http://www.pugscode.org/
This is just a fanboy calling out. Haskell is so übergross! Once you’ve wrapped your head around monads, you will never look back. I guess Haskell and other functional languages will become more prominent with the increasing spread of multi-core systems; they just handle multiple cores so much better than their imperative siblings.
Whatever, I never did any web-based stuff with Haskell, so I will closely monitor this blog. Thanks for bringing it to my attention.
My personal experience was: once I wrapped my head around monads, I realized which problems they solve better than imperative languages do, and why they are absolute hell to do practical things with. In the end, Java is still better for practical programming.
Most importantly, however, they gave me a lot of understanding about imperative languages. So while a practical programmer would just use Java/C#/(insert your favorite language here), someone interested in the theory behind programming languages should absolutely know about monads.
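To give a concrete taste of what I mean, here is a minimal sketch (assuming the mtl package; tick is a made-up name):

import Control.Monad.State

-- Imperative-looking sequencing, but State is an ordinary library
-- type, not a built-in language feature.
tick :: State Int Int
tick = do
  n <- get       -- read the current counter
  put (n + 1)    -- "assign" a new value
  return n

main :: IO ()
main = print (evalState (tick >> tick >> tick) 0)  -- prints 2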
And BTW, Haskell is a functional language, unlike Lisp, which just claims to be one. Lisp is no more functional than Java.
“Lisp is no more functional than Java”
Lisp has many things in it, but Lisp is written in Lisp. The basis of Lisp is a purely functional language, upon which objects and other object-oriented or aspect-oriented concepts are built. If you are not convinced, just look at Scheme, which is exactly this basic-level language, with some renaming of the function names to make the syntax easier to learn. Or look at the design of a Lisp machine.
“The basis of Lisp is a purely functional language”
How is a language with setq purely functional?
I hate when people get nitpicky about this sort of thing.
Can’t we all just agree that the guy that compared Java to Lisp is an idiot?
> Lisp has many things in it, but Lisp is written in Lisp
Lisp is a language, so it isn’t written in anything. If you mean that Lisp implementations can be and are written in Lisp – same for Java (look at Joeq or IBM Jikes, IIRC). This is true for any sufficiently advanced language, so we can really leave this argument alone.
> If you are not convinced, just look at Scheme, which is exactly this
> basic-level language, with some renaming of the function names to
> make the syntax easier to learn.
Please explain the effect of set! in a purely functional language. To start, explain what set! is called in Haskell.
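(For the record, the nearest analogue Haskell offers is a mutable reference in IO; a sketch, not a claim of equivalence:)

import Data.IORef

-- Mutation exists in Haskell, but only behind an explicit type.
main :: IO ()
main = do
  r <- newIORef (0 :: Int)  -- allocate a mutable cell
  writeIORef r 42           -- roughly what (set! r 42) does
  readIORef r >>= print     -- prints 42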
“Lisp is no more functional than Java.”
Oh awesome! Java allows you to pass functions as variables, return functions, and now has map/filter/anonymous functions!!! Sweet deal. Can’t wait till Java adds macros.
> Oh awesome! Java allows you to pass functions as variables, return
> functions, and now has map/filter/anonymous functions!!! Sweet deal.
> Can’t wait till Java adds macros.
I don’t think you really know what functional programming means.
“I don’t think you really know what functional programming means.”
hahahahahaha.
Sorry, just found that extremely funny. I think it is very obvious you do not. I don’t care for flame wars. But if you compare Java to Lisp and defend it by saying I don’t know what functional programming is, it makes me wonder if I should reply or just go pray for you.
Please do tell what functional programming really is. You can leave all the pesky mathematics out and just describe its relation to Java if that is easier for you.
[kungfooguru]
> But if you compare Java to Lisp and defend it by saying I don’t know
> what functional programming is, it makes me wonder if I should reply
> or just go pray for you.
I admit I should have been a bit more precise in my reply. The features you mentioned can be found in Smalltalk, which few people would call a functional language. In fact, many call it the godfather of OOP languages. Thus, the features you mentioned do not make a language a functional language. If you disagree with this, you simply have a different understanding of functional programming than most of the world, and we just have to agree to disagree.
[rayiner]
> 1) Semantics based directly on the lambda calculus.
> 2) Lexical closures.
> 3) The use of map/filter/reduce as a general rule.
> 4) The avoidance of mutable state as a general rule.
>
> Scheme fits all of these criteria, as does ML, which was its European
> contemporary in the 1970s. Both were the original languages of the
> functional programming community, and are the archetypal
> functional languages. Both allow imperative constructs. It wasn’t
> until much later that the class of “pure” functional languages entered
> the mainstream of the FP community, adding the distinction that
> they did not support mutation. However, Scheme and ML didn’t
> suddenly become imperative languages as a result!
Thanks for an educated reply. I think you contradict yourself here. Scheme simply *doesn’t* avoid mutable state, nor does Common Lisp. They thereby violate one of the most fundamental principles of FP. You can still do pure FP in them, but you aren’t forced to, and thus you lose the advantages (you cannot assume anymore that some anonymous function passed to your code is purely functional). The same applies to Java, Smalltalk, … I used Java as an example here, not to claim it is the one and only true language.
The term functional means “like a mathematical function”. Those don’t have side effects.
If you want to bring historical arguments about what Lisp originally was: the first Lisp version, IIRC, did not have lexical closures but used dynamic name binding instead.
Lexical closures exist in many other languages too, especially Smalltalk, but to a lesser extent even Java.
map/filter/reduce are specialized functions for *list* processing, but also an application of higher-order functions/procedures. They work analogously in functional and imperative languages and have a lot more to do with Lisp’s strong support for list processing than with functional programming.
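For what it’s worth, they read much the same in Haskell (a trivial sketch; evensSquaredSum is a made-up name):

-- map/filter/fold are ordinary higher-order functions over lists.
evensSquaredSum :: [Int] -> Int
evensSquaredSum = foldr (+) 0 . map (^ 2) . filter even

-- evensSquaredSum [1..5] == 4 + 16 == 20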
“The features you mentioned can be found in Smalltalk, which few people would call a functional language.”
It is the first time I am hearing that Smalltalk is derived from the lambda calculus, or that it tries to avoid mutable state… Forbidding mutable state is a feature of pure functional programming languages, as you said earlier, but it is certainly not a requirement for a functional programming language.
Saying that Lisp is more similar to Java than to ML is specious, to say the least. In every CS class I took, it was taught as a functional language, and by people who do research in programming languages; I think it is really unlikely that I only ever met the supposedly small portion who think LISP is an FP language. I can tell you, however, that I’ve never heard Java being called an FP language.
> It is the first time I am hearing that Smalltalk is derived from the
> lambda calculus, or that it tries to avoid mutable state…
Misunderstanding: I was talking about first-class functions, anonymous functions, and lexical closures.
> Forbidding mutable state is a feature of pure functional
> programming languages, as you said earlier, but it is certainly not a
> requirement for a functional programming language.
The distinction between FP and ‘pure’ FP is useless, since ‘non-pure’ languages lose all advantages of functional programming. Maybe I should have used Smalltalk as an example instead of Java and said: Lisp is no more a functional programming language than Smalltalk – because I really cannot find any essential differences between those two languages. Java is indeed lacking some features, but those have nothing to do with functional programming.
Saying that Lisp is a functional programming language, just an impure one, is like saying that you’re still a vegetarian if you eat steaks, just an impure one. Or like saying Smalltalk is a functional language, just an impure one.
> Saying that Lisp is more similar to Java than to ML is specious, to
> say the least. In every CS class I took, it was taught as a functional
> language
I know this. I was taught this too. The very same people were unable to explain how Lisp was more of a functional PL than, e.g., Java.
With Scheme, the situation is even more obvious: in the Scheme standard, the language is called an ‘algorithmic’ language, not a ‘functional’ language. I think that says a lot.
> I can tell however that I’ve never heard
> Java being called a FP language.
For good reason, I’d say.
“The distinction between FP and ‘pure’ FP is useless, since ‘non-pure’ languages lose all advantages of functional programming.”
Well, that is your opinion, but it is certainly not what “most of the world thinks”. E.g., there was a conference series called “LISP and Functional Programming”; I have never heard of “Smalltalk and Functional Programming”, etc… You are arguing that the gap between pure FP and impure FP is bigger than the gap between impure FP and, say, imperative languages. Again, this is an opinion, and certainly not the widespread one. I have almost always read and heard that the key to FP is its link with the lambda calculus.
ARRRRRRRGHHHH
Smalltalk is a form of functional language!
You are so stuck in your definition of “functional language”. A wrong definition. You are using “pure functional” for “functional”. If you want to be able to speak with other human beings, you must first remove the stick.
Most of us here know how functional languages work, and the mathematical concepts behind them. You are not the only one who has both extensively used these languages and studied the theory behind them (I assume you have… otherwise stop fighting back). Yet you push your version of FP because you believe it to be “more correct”. Please, please, just go back to your basement.
> I don’t think you really know what functional programming means.
Are you talking to yourself?
I think you’re using the Java trick of claiming something isn’t of a particular type because it doesn’t fit your (completely made up) definition of that type.
There is no absolute definition of functional language. However, there are a few general criteria (which derive from historical observation):
1) Semantics based directly on the lambda calculus.
2) Lexical closures.
3) The use of map/filter/reduce as a general rule.
4) The avoidance of mutable state as a general rule.
Scheme fits all of these criteria, as does ML, which was its European contemporary in the 1970s. Both were the original languages of the functional programming community, and are the archetypal functional languages. Both allow imperative constructs. It wasn’t until much later that the class of “pure” functional languages entered the mainstream of the FP community, adding the distinction that they did not support mutation. However, Scheme and ML didn’t suddenly become imperative languages as a result!
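For concreteness, here are criteria 2 and 4 in a few lines of Haskell (an illustration only; makeAdder is a made-up name):

-- Criterion 2: a lexical closure; the returned function captures x.
makeAdder :: Int -> (Int -> Int)
makeAdder x = \y -> x + y

-- Criterion 4: no mutation; "updating" a list builds a new one.
bumped :: [Int]
bumped = map (makeAdder 1) [1, 2, 3]   -- [2,3,4]; the input is untouched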
“Lisp is no more functional than Java.”
That sounds a lot like what Java brats say when they talk about how Java is “so much more OO” than other languages. Lisp’s semantics reduce directly to the lambda calculus. That makes it almost by definition a functional language. What it isn’t is a type-oriented functional language, like Haskell and ML. Those have become the darlings of the FP community, but they are just one branch of the FP language family.
Thank you
My thoughts exactly. If Morin can’t admit that a language that was based on the lambda calculus is a functional language, he has no understanding of FP and its history.
When I used Haskell at uni back in the early ’90s, it was actually a much more productive language than any of the more mainstream languages I’ve used before or since. A doddle to write, easy to understand, more verifiable than the imperative languages, not as unreadable as Lisp. Fabulous language, really…
…until you write significant amounts of code, to make a useful program.
Type inference works wonders, and I love the strictness and the safety of the language, but in the end, when you are calling a function you wrote months ago, the parameters you have to feed it are not that crisp in your head anymore.
Type inference is great on paper and makes for terseness, granted. However, terseness is good for writing programs, not so much for reading code.
Mandatory parameter types serve as documentation.
This concern also holds for Objective Caml (think of it as a more practical, not-so-pure Haskell).
I still think highly of this family of languages, and I am sure an IDE that completes function names and shows the expected parameter types would alleviate the problem.
Java is practical (if inelegant) but I despise the barrage of typecasts needed to use collections and the like, even with the recently introduced generics.
My fallback for the development of large, robust software: Ada.
hashnet: Type inference is great on paper and makes for terseness, granted. However, terseness is good for writing programs, not so much for reading code.
Mandatory parameter types serve as documentation.
By all means, declare your types. The fun part is that Haskell actually does that for you. Load GHC in interpreter mode, enter :t myFunction, then copy the inferred type declaration and paste it into your source. Beyond that, I really suggest you add inline type annotations and comment to your heart’s content.
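A sketch of that workflow (myFunction is a made-up example):

-- Written without a signature first; GHCi's ":t myFunction" prints
-- the line below, ready to paste back in above the definition.
myFunction :: Num a => [a] -> a
myFunction xs = sum (map (* 2) xs)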
> By all means, declare your types. The fun part is that Haskell
> actually does that for you. Load GHC in interpreter mode, enter
> :t myFunction, then copy the inferred type declaration and paste it
> into your source.
It would be nice if Haskell had better tool support. In something like Eclipse, you’d “generate type declaration” with two keystrokes…
“I still think highly of this family of language, and I am sure some IDE that completes the functions names and shows the expected parameter types would alleviate the problem.”
You might want to check out F# with MSVS.
http://research.microsoft.com/fsharp/fsharp.aspx
Monads, which this article skirted:
http://sigfpe.blogspot.com/2006/08/you-could-have-invented-monads-a…
A difficult tool to wrap your head around, because doing so means unlearning a great deal, but worth it if you can stick it out.
I don’t know… I’ve never used Haskell, but it is many things, and *simple* is not one of them.
C is simple (just above assembly language), Java is simple.
Each time I read something about Haskell, it is usually an unreadable explanation of why A doesn’t work but B does, with A and B looking very similar.
And you know what?
Using a language which makes you feel like the compiler is doing ‘magic’ is *not* a good thing: ask any C++ developer who has tried to use templates.
In what way is C “simpler” than Haskell? Its execution semantics are hardly simpler. Try doing a little exercise: write an interpreter for the lambda-calculus heart of Haskell, and then do the same for whatever the hell is at the heart of C. Look at what ends up shorter.
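To make the exercise concrete, here is a minimal sketch of such an interpreter (untyped lambda calculus, call-by-value for brevity; real Haskell is lazy and typed, so this is just the flavor of it):

-- A tiny interpreter for the untyped lambda calculus.
data Term = Var String | Lam String Term | App Term Term

data Value = Closure String Term Env
type Env = [(String, Value)]

eval :: Env -> Term -> Value
eval env (Var x)   = maybe (error ("unbound: " ++ x)) id (lookup x env)
eval env (Lam x b) = Closure x b env          -- a function is a closure
eval env (App f a) = case eval env f of
  Closure x b cenv -> eval ((x, eval env a) : cenv) b

-- e.g. eval [] (App (Lam "x" (Var "x")) (Lam "y" (Var "y")))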
And it’s not like C is fundamentally close to the machine. It hasn’t been since the PDP-11, if it was even then. C has a conceptual virtual machine, just like Haskell or Lisp. The major difference is that hardware and software vendors go to great lengths to make superscalar/memory-protected/multi-threaded modern hardware look like a “C machine”.
Read the sources of a modern “C software VM” (compiler, static and dynamic linker, OS virtual memory subsystem). There are hundreds of thousands of lines of code between a C program and assembly.
I am not sure I am following your argument here, rayiner. Static and dynamic linkers have nothing to do with C per se. I won’t argue about hardware vendors making hardware look like C; I know much less than you on this. But I don’t think the number of tools between C code and assembly is a fair argument for the (lack of) closeness between assembly and C. If you compare C and assembly code, at least for trivial examples, they look really similar. Besides syntax for functions and types, C doesn’t give you that much. Languages like LISP are much easier to parse than C, but the code generation is not really simpler. Tools like GCC are huge beasts, sure, but that’s because they implement a hell of a lot of features. A compiler like TinyCC is about 1-2 MB of source uncompressed; that really is not a lot of code (around 100,000 lines at 20 characters per line, and it implements ISO C99).
First, let me point out that there is a distinction to be made between abstract and concrete systems, one that I failed to make originally.
In the abstract, we can consider a C implementation that exposes just the essence of the language, and we can imagine a Lisp implementation that does the same*. What do these two systems have to abstract, relative to the machine code?
Binary encoding: Both C and Lisp must abstract the encoding of a program into binary form. This includes not only abstracting the format of machine instructions, the layout of code and data, and the tracking of labels, but also completely mundane things that even assembly programmers take for granted (thanks to high-level assemblers). Both have to emit symbol tables, constant tables, etc. Something as simple as “i = $constant” may involve allocating an entry in a constant section, entering its offset into the symbol table, and generating code to load that value into a register.
Instruction selection/Operation lowering: Both C and Lisp have to lower from abstract VM operations to machine-level operations. “a + b” and “(+ a b)” are each a single abstract operation in the C and Lisp virtual machines, respectively. However, no hardware has an “add two numbers” operation. PowerPC has at least 9 instructions for the addition of integers alone. Both systems have to figure out exactly what “a” and “b” are, and emit the appropriate machine code. They have to figure out the types of the variables, whether they are constants, whether those constants fit into constant fields in the machine instructions, etc., then emit addi, or addc, or fadd, or any of a dozen other instructions. Lisp and C differ in where they put the runtime/compile-time split for such operations, but they end up doing largely the same work.
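A toy version of that lowering step, sketched in Haskell since that’s the language under discussion (the instruction names are invented, loosely PowerPC-flavored):

-- One abstract "add" becomes different machine instructions
-- depending on what the operands turn out to be.
data Expr  = IntLit Int | Reg String | Add Expr Expr

data Instr = AddImm String Int     -- addi: register + small immediate
           | AddReg String String  -- add:  register + register

lower :: Expr -> Instr
lower (Add (Reg r) (IntLit n))
  | abs n < 32768 = AddImm r n     -- constant fits the immediate field
lower (Add (Reg r1) (Reg r2)) = AddReg r1 r2
lower _ = error "toy example: materialize operands into registers first"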
Register allocation: Machines don’t deal with variables, they deal with registers. Both systems have to map from variables to registers, inserting spill/load code as necessary to fit the large variable set into the small register set. On many architectures (notably RISCs), this can get complicated. What may seem like a simple global integer variable reference will often invoke a non-trivial symbol-tracking mechanism in the compiler to figure out exactly what the offset of that variable is in the generated binary. The compiler also has to consider addressability issues. This means dealing with things like addressing modes in the processor, the limited size of immediate offsets in load instructions, etc.
Conditional abstraction: Both C and Lisp use the ‘if’ abstraction. x86 and PowerPC don’t have an “if” instruction. They have test instructions and condition codes. The compiler not only has to map “if” to the appropriate instructions, but has to deal with the resulting code-layout issue, laying out the basic blocks in memory so that the execution follows the right path.
Function abstraction: Both Lisp and C expose the function abstraction, and this is non-trivial too. The compiler has to define a calling convention, handle argument passing (deconflicting that with register allocation), handle stack management, etc. It also involves code-layout issues, such as where to put functions, how to find the target offset of a call, etc. On many machines, this involves things like considering the maximum byte distance of immediate jumps, etc.
Loops: C offers additional abstractions like loops. With the exception of x86, most modern machines have no ‘loop’ instruction. The compiler must lower this to variable incrementation, limit testing, and jumps. The core of Lisp doesn’t need to do this, because it doesn’t have loops. It has tail-calls to procedures, which are a minimal abstraction over jumps.
Structures: C compilers have to do structure layout, accounting for variable sizes and inserting padding to account for architectural limitations like lack of support for unaligned loads. If the Lisp in question has structures, it has to do the same thing, but in practice its job is easier because Lisp’s semantics allow structures to be simply arrays of fixed-size pointers to fields. C’s memory model does not allow such a simplification.
Memory allocation: C has malloc(), Lisp has a GC. GC is a little bit more complicated than malloc(), but the simplest conceivable GC isn’t much bigger than the simplest conceivable malloc().
So what does the fundamental Lisp compiler have to do that the fundamental C compiler doesn’t? Closures are one thing, but dumb closures are really easy to implement, being just a pointer to a function bundled with a pointer to a small block of memory on the heap.
So even at the abstract level, both languages are pretty far from the machine, even if you neglect the (big) issue that the machine code is itself a virtual machine, with semantics very different from what actually happens inside the processor. Certainly, both languages are closer to each other than either is to the machine code.
And that’s just at the abstract level. When you talk about real implementations, things get even more complicated. While a fundamental C compiler doesn’t need a complicated linker, or dlopen(), or virtual memory, or memory protection, using C code on real machines is a bitch without one. Meanwhile, on most Lisp implementations, the fundamental compiler mechanisms mentioned above (symbol tracking machinery, type and bounds checking machinery, code generation and loading, etc) can be re-used to serve the same purposes. On the other hand, Lisp performance can be pretty painful without a big complicated optimizer, and a big, fast, GC.
The relative distances of the two languages from the machines can be seen in the sizes of the implementations. Movitz is a small but non-trivial Lisp implementation, with a native-code compiler, simple optimizer, and substantial ANSI CL support (including OOP). Its base image is under 900KB, which compares well to TinyCC. SBCL is a full-blown, highly optimizing native-code Lisp implementation, and weighs in at about 20MB, which compares pretty well to GCC.
*) I won’t argue for Haskell here, because I know jack about implementing Haskell.
“First, let me point out that there is a distinction to be made between abstract and concrete systems, one that I failed to make originally.
In the abstract, we can consider a C implementation that exposes just the essence of the language, and we can imagine a Lisp implementation that does the same*. What do these two systems have to abstract, relative to the machine code? ”
Ok, I understand your point better now. But I still think it is not really fair, especially syntax-wise, and I think syntax matters… a lot. For example, if I compile a simple C function, say int sum(int *in, int size) { int i, acc = 0; for (i = 0; i < size; ++i) { acc += in[i]; } return acc; }, I know basically what the output will be in assembler. When you are programming in C, you have a pretty good idea of what is expensive and what is not, because, well, OS interfaces are programmed in C, in part, and also because you have a pretty good mapping between C statements and assembler instructions. malloc is not C, but brk is, if I remember correctly; I agree that there is no big difference between malloc and a GC.
But honestly, with basic C, you can have a pretty good idea what the ASM output would be for simple programs; for LISP, I don’t think you can. And that’s because C syntax maps closely onto the machine’s operations. The whole memory-allocation question is not part of the language itself, and the points about registers and the like are a bit OT in my opinion too. I don’t think I am just playing semantics (even if what we are doing here is just that): you can program DSPs in C, and many other CPUs which do not have an MMU, or complex addressing modes like recent x86 chips. It would be darn hard to do that in LISP, I think.
>In what way is C “simpler” than Haskell?
In its interface for the programmer: its syntax, being less advanced, is more predictable, and therefore simpler.
>There are hundreds of thousands of lines of code between a C program and assembly.
These layers are not necessarily C-specific: high-end assembly ‘compilers’ also need linkers, etc.
And most of the time they’re well hidden from the programmer, so these layers don’t really matter.
That said, I agree that comparing C and Haskell isn’t interesting: their purposes are too different. Comparing D vs. Haskell is more interesting…
I’d pick D myself: it’s ‘less magic’.
renox: Using a language which makes you feel like the compiler is doing ‘magic’ is *not* a good thing: ask any C++ developer who has tried to use templates.
What magic? Haskell has a very well-documented specification in EBNF which is almost on par with ALGOL’s. Moreover, you can actually prove the correctness of whole programs in an easy mathematical way, thanks to the side-effect-free nature of the language. In imperative languages, OTOH, you need the Hoare calculus to prove subroutines correct, and that is already a bitch to do. Thus, Haskell is really not magical at all, just well documented and thought out.
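A small illustration of the kind of proof purity buys you (standard Prelude definitions, ordinary induction; check is a made-up name):

-- Because definitions are side-effect-free equations, we may
-- substitute them freely. Proof that length (map f xs) == length xs:
--
--   length (map f [])     = length []                -- def. of map
--
--   length (map f (x:xs)) = length (f x : map f xs)  -- def. of map
--                         = 1 + length (map f xs)    -- def. of length
--                         = 1 + length xs            -- induction hypothesis
--                         = length (x:xs)            -- def. of length

-- A quick empirical spot-check of the law:
check :: Bool
check = length (map (* 2) [1, 2, 3 :: Int]) == length [1, 2, 3 :: Int]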
But I’m happy that you like D; go ahead, use it. I don’t use Haskell for everything I do; who does? But Haskell is still the shiznit.
I see that most of the discussion so far has centered on the history and implementation perspective…
By that count, even Ruby and Python must be in the fray!
How would it look from a different perspective?
Forget about how a language is implemented, or what history it has gone through.
Assume Lisp, Haskell, and Smalltalk are brand-new languages. How can we classify them based only on what we can and cannot do with them? Why is one better than another for a particular problem type?
-Ram