An interpreter is just a
program. As input, it takes a specification of a program in some language. As
output, it produces the output of the input program. By designing a new
interpreter, we can invent a new language.
Languages are powerful
tools for thinking. Different languages encourage dif-ferent ways of thinking
and lead to different thoughts. Hence, inventing new languages is a powerful
way for solving problems. We can solve a problem by de-signing a language in
which it is easy to express a solution, and then implementing an interpreter
for that language.
In this chapter, we explore
how to implement an interpreter. We will also introduce the Python programming
language, and describe an interpreter that implements a subset of the Scheme
language. Implementing an interpreter further blurs the line between data and
programs, that we first crossed in Chapter 3 by passing procedures as
parameters and returning new procedures as results. Now that we are
implementing an interpreter, all programs as just data input for our
interpreter program. The meaning of the program is determined by the
interpreter.
12-1
12.1 Building Languages
To implement an interpreter for a given target
language we need to:
• Implement a parser that
takes as input a string representation of a program in the target language and
produces a structural parse of the input program. The parser should break the
input string into its language components, and form a parse tree data structure
that represents the input text in a structural way.
• Implement an evaluator that
takes as input a structural parse of an input program, and evaluates that
program. The evaluator should implement the target language’s evaluation rules.
Section 12.2 describes our
parser implementation. Section 12.3 describes the im-plementation of our
evaluator. The next subsection introduces the target language for our
interpreter. The following subsection introduces Python, the language we will
use to implement our interpreter, and for most of the rest of the programs in
this book.
12.1.1 Charme
Our target language is a simple subset of Scheme
we will call Charme.1
The Charme language we will
implement is simple, but powerful enough to ex-press all computations (that is,
it is a universal programming language). Its eval-uation rules are a subset of
the Scheme evaluation rules. It includes the applica-tion expression,
conditional expression, lambda expression, name expression, and definitions. It
supports integral numbers, and provides the basic arithmetic and comparison
primitives with the same meanings as they have in Scheme.

1 The original name of
Scheme was “Schemer”, a successor to the languages “Planner” and “Conniver”.
Because the computer on which “Schemer” was implemented only allowed six-letter
file names, its name was shortened to “Scheme”. In that spirit, we name our
snake-charming language, “Charmer” and shorten it to Charme. Depending on the
programmer’s state of mind, the language name can be pronounced either “charm”
or “char me”.
|
|
12.1.2 Python
We could implement our Charme interpreter using
Scheme, but Python2 is a pop-ular programming language initially
designed by Guido van Rossum in 1991. Python is widely used to develop dynamic
web applications and as a scripting language for applications (including the
game Civilization IV). Python was used to manage special effects production for
Star Wars: Episode II, and is used exten-sively at Google, reddit.com, and
NASA.3
Like Scheme, Python is a
universal programming language. We will define this more formally in Chapter
??, but for now, think of it as meaning that both Python and Scheme and express
all computations. There is no Scheme program that does not have an equivalent
Python program, and every Python program has an equiv-alent Scheme program. One
piece of evidence that every Scheme program has an equivalent Python program is
the interpreter we describe in this chapter. Since we can implement a Scheme
interpreter in Python, we know we can express ev-ery computation that can be expressed
by a Scheme program with an equivalent Python program (that is, the Scheme
interpreter implemented in Python with the original Scheme program as input).
The grammar for Python is
quite different from the Scheme grammar, so Python programs look very different
from Scheme programs. In most respects, however, the evaluation rules are quite
similar to the evaluation rules for Scheme. This sec-tion does not describe the
entire Python language, but instead introduces what we need as we need it. See
the Python guide for a more comprehensive introduction to the Python
programming language.
12.2 Parser
The parser takes as input a
Charme program string, and produces as output a Python list that encodes the
structure of the input program. Charme’s syntax makes implementing the parser
fairly simple, since the grammar rules for Scheme are very simple. It is not
necessary to build a complex parser, since we can break an expression into its
components just by using the parentheses and whitespace.

2 The
name Python alludes to Monty Python’s Flying Circus.
3 See
http://python.org/Quotes.html for descriptions of Python uses.
The parser needs to balance
the open and close parentheses that enclose expres-sions. The parsed program
can be represented as a list.
For example, if the input
string is “(define square (lambda (x) (* x x)))” the output of the parser
should be:
[’define’,
’square’,
[ ’lambda’,
[’x’],
[’*’, ’x’, ’x’]
]
]
Python provides a list
datatype similar to the Scheme list datatype. Lists are de-noted using square
brackets, [ and ]. Hence, the output parse structure is a list containing three elements,
the first is the keyword ’define’, the second is the name ’square’, and the third is a list
containing three elements, [’lambda’, [’x’], [’*’,
’x’, ’x’]],
the third of which is itself a list containing three elements.
We divide the job of
parsing into two procedures that are combined to solve the problem of
transforming an input string into a list describing the input program’s
structure. The first part is the scanner. A scanner tokenizes an input string.
Its input is the input string in the target programming language, and its
output is a list of the tokens in that string.
A token is an indivisible
syntactic unit. For example, in the example above the tokens are (, define, square, (, lambda, (, x, ), (, *, x, x, ), ), and ). The tokens are separated
by whitespace (spaces, tabs, and newlines). The left and right parentheses are
tokens by themselves.
12.2.1 Tokenizing
The tokenize procedure below takes as
input a string s in the Charme target language, and produces as output a list of the
tokens in s. We describe it in detail below, but attempt to understand it on your
own first.
|
|
def tokenize(s): current = ’’ tokens = [] for c in s:
if c.isspace():
if len(current) > 0: tokens.append(current) current = ’’
elif c in
’()’:
if len(current) > 0: tokens.append(current) current = ’’
tokens.append(c)
else:
current =
current + c
if
len(current) > 0: tokens.append(current)
return tokens
Unlike in Scheme, the
whitespace (such as new lines) has meaning in Python. Statements cannot be
separated into multiple lines, and only one statement may appear on a single
line. Indentation within a line also matters. Instead of using parentheses to
provide code structure, Python uses the indentation to group state-ments into
blocks. The Python interpreter will report an error if the indentation of the
code does not match its structure.
12.2.2 Statements and Expressions
Whereas Scheme programs
were composed of expressions and definitions, Python programs are mostly
sequences of statements. Unlike expressions which (mostly) evaluate to values,
a statement does not have a value. The emphasis on state-ments reflects (and
impacts) the style of programming used with Python. It is more imperative than
that used with Scheme: instead of composing expressions in ways that pass the
result of one expression as an operand to the next expres-sion, Python programs
consist of a sequence of statements, each of which alters the state in some way
towards reaching the goal state. Nevertheless, it is possible
(but not recommended) to program in Scheme
using an imperative style (empha-sizing begin and set! expressions), and it is
possible (but not recommended) to program in Python using a functional style
(emphasizing procedure applications and eschewing assignment statement).
Defining a procedure in
Python is similar to defining a procedure in Scheme, ex-cept the grammar rule
is different:
ProcedureDefinition
|
::⇒
|
def Name ( Parameters ) : Block
|
Parameters
|
::⇒
|
|
Parameters
|
::⇒
|
SomeParameters
|
SomeParameters
|
::⇒
|
Name
|
SomeParameters
|
::⇒
|
Name , SomeParameters
|
Block
|
::⇒
|
<newline> indented(Statements)
|
Statements
|
::⇒
|
Statement <newline>
MoreStatements
|
MoreStatements
|
::⇒
|
Statement <newline>
MoreStatements
|
MoreStatements
|
::⇒
|
|
Since whitespace matters in
Python, we include newlines (<newline>) and in-dentation in our grammar.
We use
indented(elements)
to indicate that the
elements are indented. For example, the rule for Block is a newline, followed
by one or more statements. The statements are all indented one level inside the
block’s indentation. This means it is clear when the block’s statements end
because the next line is not indented to the same level.
The evaluation rule for a
procedure definition is similar to the rule for the Scheme procedure
definition. It defines the given name as a procedure that takes the pa-rameters
as it inputs and has the body given by the statement block.
So, our procedure definition
def
tokenize(s):
...
12.2. PARSER
|
12-7
|
defines a procedure named tokenize that takes a single parameter, s.
The body of the
procedure uses several different types of Python statements. Fol-lowing
Python’s more imperative style, five of the 12 statements are assignment
statements:
Statement
|
::⇒
|
AssignmentStatement
|
AssignmentStatement
|
::⇒
|
Target = Expression
|
Target
|
::⇒
|
Name
|
For now, we use
only a Name as the left side of an assignment, but since other constructs can
appear on the left side of an assignment statement, we introduce the nonterminal
Target for which additional rules can be defined to encompass other possible
assignees.
The evaluation
rule for an assignment statement is similar to Scheme’s evalua-tion rule for
set expressions: the meaning of x = e in Python is similar to the meaning of (set!
x e) in Scheme,
except that the target Name need not exist before the assignment.
Evaluation
Rule: Assignment. To evaluate an assignment statement, evaluate the expression,
and assign the value of the expression to the place identified by the target.
If no such place exists, create a new place with that name.
Like Scheme, Python supports many different
kinds of expressions. The tokenize procedure uses literals, primitives,
applications, arithmetic, and comparison ex-pressions.
Since Python
does not use parentheses to group expressions, the grammar pro-vides the
grouping by breaking down expression in several steps. This defines an order of
precedence for parsing expressions. If a complex expression includes many
expressions, the grammar specifies how they will be grouped. For example,
consider the expression 3+4*5. In Scheme, the
expressions (+ 3 (* 4 5)) and (* (+
3 4) 5)
are clearly different and the parentheses group the subex-pressions. The
meaning of the Python expression 3+4*5 is (+
3 (* 4 5)), that is, it evaluates to 23. The expression 4*5+3 also evaluates to 23.
12-8 CHAPTER 12. INTERPRETERS
This makes the
Python grammar rules more complex since they must deal with * and + differently, but it makes
the meaning of Python expressions match our familiar mathematical
interpretation, without needing all the parentheses we need in Scheme
expressions. The way this is done is by defining the grammar rules so an
AddExpression can contain a MultExpression as one of its subexpressions, but a
MultExpression cannot contain an AddExpression. This makes the multiplica-tion
operator have higher precedence than the addition operator. If an expression
contains both + and * operators, the * operator attaches to its
operands first. The replacement rules that happen first have lower precedence,
since their components must be built from the remaining pieces.
Here is a
subset of the grammar rules for Python expressions for comparison,
multiplication, and addition expressions that achieves this:
Expression
|
::⇒
|
ComparisonExpr
|
ComparisonExpr
|
::⇒
|
AddExpression
|
ComparisonExpr
|
::⇒
|
ComparisonExpr Comparator
ComparisonExpr
|
Comparator
|
::⇒< |
> | == | <= | >=
|
|
AddExpression
|
::⇒
|
MultExpression
|
AddExpression
|
::⇒
|
AddExpression + MultExpression
|
AddExpression
|
::⇒
|
AddExpression - MultExpression
|
MultExpression
|
::⇒
|
MultExpression * PrimaryExpression
|
MultExpression
|
::⇒
|
PrimaryExpression
|
PrimaryExpression
|
::⇒
|
Literal
|
PrimaryExpression
|
::⇒
|
Name
|
PrimaryExpression
|
::⇒
|
( Expression )
|
Note that the
last rule allows parentheses to be used to group expressions. For example, (3
+ 4) * 5 would be parsed as the
primary expression, (3 + 4), times 5, so it evaluates to 35.
A Literal can
be a numerical constant. Numbers in Python are similar (but not identical) to
numbers in Scheme. In the example program, we use the integer literal 0.
A
PrimaryExpression can also be a name, similar to names in Scheme. The
evalu-ation rule for a name in Python is similar to the stateful rule for
evaluating a name
12.2. PARSER
|
12-9
|
in Scheme4.
Exercise 12.1.
Do comparison expressions have higher or lower precedence than addition
expressions? Explain why using the grammar rules. ♦
Exercise 12.2. What do the following Python
expressions evaluate to:
a. 1 + 2 + 3 * 4
b. 3 > 2 + 2
c. 3
* 6 >= 15 == 12
d. (3
* 6 >= 15) == True
♦
12.2.3 Lists
Python provides
a list datatype similar to lists in Scheme, except instead of build-ing list
from simpler parts (that is, using cons pairs in Scheme), the Python list type is
provided as a built-in datatype. Lists are denoted in Python using square
brackets. For example, [] denotes an empty list, and [1, 2] denotes a list con-taining two elements. In the
tokenize
procedure, we use tokens = [] to initialize tokens to an empty list.
Elements can be selected from a list using the
list subscription expression:
PrimaryExpression
|
::⇒
|
SubscriptExpression
|
SubscriptExpression
|
::⇒
|
PrimaryExpression [
Expression ]
|
If the first primary
expression evaluates to a list, the subscript expression selects the element
indexed by value of the inner expression from the list. For example,

4 There
are some subtle differences and complexities (see Section 4.1 of the Python
Reference Manual, however, which we do not go into here.
12-10 CHAPTER 12. INTERPRETERS
>>> a = [1, 2, 3]
>>> a[0]
1
>>> a[1+1]
3
>>> a[3]
IndexError: list index out of range
>>> [1,2][1]
2
So, the expression p[0] in Python is analogous to (car p) in Scheme.
We can also use
negative selection indexes to select elements from the back of the list. The
expression p[-1] selects the last element in the list p.
A subscript expression can also select a range of elements from the
list:
SubscriptExpression
|
::⇒
|
PrimaryExpression [ BoundLow : BoundH igh ]
|
Bound
|
::⇒
|
Expression |
|
The subscript
expression evaluates to a list containing the elements between the low bound
and the high bound. If the low bound is missing, the low bound is the beginning
of the list. If the high bound is missing, the high bound is the end of the
list. For example,
>>> a
= [1, 2, 3]
>>> a[:1]
[1]
>>> a[1:]
[2, 3]
>>> a[4-2:3]
[3]
>>> a[:]
[1, 2, 3]
12.2. PARSER
|
12-11
|
So, the expression p[1:] in Python is analogous to (cdr p) in Scheme.
Python lists are mutable
(the value of a list can change after it is created). We can use list
subscripts as the targets for an assignment expression:
Target ::⇒ SubscriptExpression
For example,
>>> a = [1, 2, 3]
>>> a[0]
= 7
>>> a
[7, 2, 3]
>>> a[1:4]
= [4, 5, 6]
>>> a
[7, 4, 5, 6]
>>> a[1:] = [6]
>>> a
[7, 6]
Note that assignments can
not only be used to change the values of elements in the list, but also to
change the length of the list.