Department of Business Studies: INTER PRETER

An interpreter is just a program. As input, it takes a specification of a program in some language. As output, it produces the output of the input program. By designing a new interpreter, we can invent a new language.

Languages are powerful tools for thinking. Different languages encourage dif-ferent ways of thinking and lead to different thoughts. Hence, inventing new languages is a powerful way for solving problems. We can solve a problem by de-signing a language in which it is easy to express a solution, and then implementing an interpreter for that language.

In this chapter, we explore how to implement an interpreter. We will also introduce the Python programming language, and describe an interpreter that implements a subset of the Scheme language. Implementing an interpreter further blurs the line between data and programs, that we first crossed in Chapter 3 by passing procedures as parameters and returning new procedures as results. Now that we are implementing an interpreter, all programs as just data input for our interpreter program. The meaning of the program is determined by the interpreter.

12-1

12.1 Building Languages

To implement an interpreter for a given target language we need to:

• Implement a parser that takes as input a string representation of a program in the target language and produces a structural parse of the input program. The parser should break the input string into its language components, and form a parse tree data structure that represents the input text in a structural way.

• Implement an evaluator that takes as input a structural parse of an input program, and evaluates that program. The evaluator should implement the target language’s evaluation rules.

Section 12.2 describes our parser implementation. Section 12.3 describes the im-plementation of our evaluator. The next subsection introduces the target language for our interpreter. The following subsection introduces Python, the language we will use to implement our interpreter, and for most of the rest of the programs in this book.

12.1.1 Charme

Our target language is a simple subset of Scheme we will call Charme.¹

The Charme language we will implement is simple, but powerful enough to ex-press all computations (that is, it is a universal programming language). Its eval-uation rules are a subset of the Scheme evaluation rules. It includes the applica-tion expression, conditional expression, lambda expression, name expression, and definitions. It supports integral numbers, and provides the basic arithmetic and comparison primitives with the same meanings as they have in Scheme.

¹ The original name of Scheme was “Schemer”, a successor to the languages “Planner” and “Conniver”. Because the computer on which “Schemer” was implemented only allowed six-letter file names, its name was shortened to “Scheme”. In that spirit, we name our snake-charming language, “Charmer” and shorten it to Charme. Depending on the programmer’s state of mind, the language name can be pronounced either “charm” or “char me”.

12.1.2 Python

We could implement our Charme interpreter using Scheme, but Python² is a pop-ular programming language initially designed by Guido van Rossum in 1991. Python is widely used to develop dynamic web applications and as a scripting language for applications (including the game Civilization IV). Python was used to manage special effects production for Star Wars: Episode II, and is used exten-sively at Google, reddit.com, and NASA.³

Like Scheme, Python is a universal programming language. We will define this more formally in Chapter ??, but for now, think of it as meaning that both Python and Scheme and express all computations. There is no Scheme program that does not have an equivalent Python program, and every Python program has an equiv-alent Scheme program. One piece of evidence that every Scheme program has an equivalent Python program is the interpreter we describe in this chapter. Since we can implement a Scheme interpreter in Python, we know we can express ev-ery computation that can be expressed by a Scheme program with an equivalent Python program (that is, the Scheme interpreter implemented in Python with the original Scheme program as input).

The grammar for Python is quite different from the Scheme grammar, so Python programs look very different from Scheme programs. In most respects, however, the evaluation rules are quite similar to the evaluation rules for Scheme. This sec-tion does not describe the entire Python language, but instead introduces what we need as we need it. See the Python guide for a more comprehensive introduction to the Python programming language.

12.2 Parser

The parser takes as input a Charme program string, and produces as output a Python list that encodes the structure of the input program. Charme’s syntax makes implementing the parser fairly simple, since the grammar rules for Scheme are very simple. It is not necessary to build a complex parser, since we can break an expression into its components just by using the parentheses and whitespace.

² The name Python alludes to Monty Python’s Flying Circus.

³ See http://python.org/Quotes.html for descriptions of Python uses.

The parser needs to balance the open and close parentheses that enclose expres-sions. The parsed program can be represented as a list.

For example, if the input string is “(define square (lambda (x) (* x x)))” the output of the parser should be:

[’define’,

’square’,

[ ’lambda’,

[’x’],

[’_*’, ’x’, ’x’]

]

Python provides a list datatype similar to the Scheme list datatype. Lists are de-noted using square brackets, [ and ]. Hence, the output parse structure is a list containing three elements, the first is the keyword ’define’, the second is the name ’square’, and the third is a list containing three elements, [’lambda’, [’x’], [’_*’, ’x’, ’x’]], the third of which is itself a list containing three elements.

We divide the job of parsing into two procedures that are combined to solve the problem of transforming an input string into a list describing the input program’s structure. The first part is the scanner. A scanner tokenizes an input string. Its input is the input string in the target programming language, and its output is a list of the tokens in that string.

A token is an indivisible syntactic unit. For example, in the example above the tokens are (, define, square, (, lambda, (, x, ), (, _*, x, x, ), ), and ). The tokens are separated by whitespace (spaces, tabs, and newlines). The left and right parentheses are tokens by themselves.

12.2.1 Tokenizing

The tokenize procedure below takes as input a string s in the Charme target language, and produces as output a list of the tokens in s. We describe it in detail below, but attempt to understand it on your own first.

def tokenize(s): current = ’’ tokens = [] for c in s:

if c.isspace():

if len(current) > 0: tokens.append(current) current = ’’

elif c in ’()’:

if len(current) > 0: tokens.append(current) current = ’’

tokens.append(c)

else:

current = current + c

if len(current) > 0: tokens.append(current)

return tokens

Unlike in Scheme, the whitespace (such as new lines) has meaning in Python. Statements cannot be separated into multiple lines, and only one statement may appear on a single line. Indentation within a line also matters. Instead of using parentheses to provide code structure, Python uses the indentation to group state-ments into blocks. The Python interpreter will report an error if the indentation of the code does not match its structure.

12.2.2 Statements and Expressions

Whereas Scheme programs were composed of expressions and definitions, Python programs are mostly sequences of statements. Unlike expressions which (mostly) evaluate to values, a statement does not have a value. The emphasis on state-ments reflects (and impacts) the style of programming used with Python. It is more imperative than that used with Scheme: instead of composing expressions in ways that pass the result of one expression as an operand to the next expres-sion, Python programs consist of a sequence of statements, each of which alters the state in some way towards reaching the goal state. Nevertheless, it is possible

(but not recommended) to program in Scheme using an imperative style (empha-sizing begin and set! expressions), and it is possible (but not recommended) to program in Python using a functional style (emphasizing procedure applications and eschewing assignment statement).

Defining a procedure in Python is similar to defining a procedure in Scheme, ex-cept the grammar rule is different:

ProcedureDefinition	::⇒	def Name ( Parameters ) : Block
Parameters	::⇒
Parameters	::⇒	SomeParameters
SomeParameters	::⇒	Name
SomeParameters	::⇒	Name , SomeParameters
Block	::⇒	<newline> indented(Statements)
Statements	::⇒	Statement <newline> MoreStatements
MoreStatements	::⇒	Statement <newline> MoreStatements
MoreStatements	::⇒

Since whitespace matters in Python, we include newlines (<newline>) and in-dentation in our grammar. We use

indented(elements)

to indicate that the elements are indented. For example, the rule for Block is a newline, followed by one or more statements. The statements are all indented one level inside the block’s indentation. This means it is clear when the block’s statements end because the next line is not indented to the same level.

The evaluation rule for a procedure definition is similar to the rule for the Scheme procedure definition. It defines the given name as a procedure that takes the pa-rameters as it inputs and has the body given by the statement block.

So, our procedure definition

def tokenize(s):

...

12.2. PARSER

12-7

defines a procedure named tokenize that takes a single parameter, s.

The body of the procedure uses several different types of Python statements. Fol-lowing Python’s more imperative style, five of the 12 statements are assignment statements:

Statement	::⇒	AssignmentStatement
AssignmentStatement	::⇒	Target = Expression
Target	::⇒	Name

For now, we use only a Name as the left side of an assignment, but since other constructs can appear on the left side of an assignment statement, we introduce the nonterminal Target for which additional rules can be defined to encompass other possible assignees.

The evaluation rule for an assignment statement is similar to Scheme’s evalua-tion rule for set expressions: the meaning of x = e in Python is similar to the meaning of (set! x e) in Scheme, except that the target Name need not exist before the assignment.

Evaluation Rule: Assignment. To evaluate an assignment statement, evaluate the expression, and assign the value of the expression to the place identified by the target. If no such place exists, create a new place with that name.

Like Scheme, Python supports many different kinds of expressions. The tokenize procedure uses literals, primitives, applications, arithmetic, and comparison ex-pressions.

Since Python does not use parentheses to group expressions, the grammar pro-vides the grouping by breaking down expression in several steps. This defines an order of precedence for parsing expressions. If a complex expression includes many expressions, the grammar specifies how they will be grouped. For example, consider the expression 3+4_*5. In Scheme, the expressions (+ 3 (_* 4 5)) and (_* (+ 3 4) 5) are clearly different and the parentheses group the subex-pressions. The meaning of the Python expression 3+4_*5 is (+ 3 (_* 4 5)), that is, it evaluates to 23. The expression 4_*5+3 also evaluates to 23.

12-8 CHAPTER 12. INTERPRETERS

This makes the Python grammar rules more complex since they must deal with _* and + differently, but it makes the meaning of Python expressions match our familiar mathematical interpretation, without needing all the parentheses we need in Scheme expressions. The way this is done is by defining the grammar rules so an AddExpression can contain a MultExpression as one of its subexpressions, but a MultExpression cannot contain an AddExpression. This makes the multiplica-tion operator have higher precedence than the addition operator. If an expression contains both + and _* operators, the _* operator attaches to its operands first. The replacement rules that happen first have lower precedence, since their components must be built from the remaining pieces.

Here is a subset of the grammar rules for Python expressions for comparison, multiplication, and addition expressions that achieves this:

Expression	::⇒	ComparisonExpr
ComparisonExpr	::⇒	AddExpression
ComparisonExpr	::⇒	ComparisonExpr Comparator ComparisonExpr
Comparator	::⇒< \| > \| == \| <= \| >=
AddExpression	::⇒	MultExpression
AddExpression	::⇒	AddExpression + MultExpression
AddExpression	::⇒	AddExpression - MultExpression
MultExpression	::⇒	MultExpression * PrimaryExpression
MultExpression	::⇒	PrimaryExpression
PrimaryExpression	::⇒	Literal
PrimaryExpression	::⇒	Name
PrimaryExpression	::⇒	( Expression )

Note that the last rule allows parentheses to be used to group expressions. For example, (3 + 4) _* 5 would be parsed as the primary expression, (3 + 4), times 5, so it evaluates to 35.

A Literal can be a numerical constant. Numbers in Python are similar (but not identical) to numbers in Scheme. In the example program, we use the integer literal 0.

A PrimaryExpression can also be a name, similar to names in Scheme. The evalu-ation rule for a name in Python is similar to the stateful rule for evaluating a name

12.2. PARSER

12-9

in Scheme⁴.

Exercise 12.1. Do comparison expressions have higher or lower precedence than addition expressions? Explain why using the grammar rules. ♦

Exercise 12.2. What do the following Python expressions evaluate to:

a. 1 + 2 + 3 _* 4

b. 3 > 2 + 2

c. 3 _* 6 >= 15 == 12

d. (3 _* 6 >= 15) == True

♦

12.2.3 Lists

Python provides a list datatype similar to lists in Scheme, except instead of build-ing list from simpler parts (that is, using cons pairs in Scheme), the Python list type is provided as a built-in datatype. Lists are denoted in Python using square brackets. For example, [] denotes an empty list, and [1, 2] denotes a list con-taining two elements. In the tokenize procedure, we use tokens = [] to initialize tokens to an empty list.

Elements can be selected from a list using the list subscription expression:

PrimaryExpression	::⇒	SubscriptExpression
SubscriptExpression	::⇒	PrimaryExpression [ Expression ]

If the first primary expression evaluates to a list, the subscript expression selects the element indexed by value of the inner expression from the list. For example,

⁴ There are some subtle differences and complexities (see Section 4.1 of the Python Reference Manual, however, which we do not go into here.

12-10 CHAPTER 12. INTERPRETERS

>>> a = [1, 2, 3]

>>> a[0]

>>> a[1+1]

>>> a[3]

IndexError: list index out of range

>>> [1,2][1]

So, the expression p[0] in Python is analogous to (car p) in Scheme.

We can also use negative selection indexes to select elements from the back of the list. The expression p[-1] selects the last element in the list p.

A subscript expression can also select a range of elements from the list:

SubscriptExpression	::⇒	PrimaryExpression [ Bound_Low : Bound_{H igh} ]
Bound	::⇒	Expression \|

The subscript expression evaluates to a list containing the elements between the low bound and the high bound. If the low bound is missing, the low bound is the beginning of the list. If the high bound is missing, the high bound is the end of the list. For example,

>>> a = [1, 2, 3]

>>> a[:1]

[1]

>>> a[1:]

[2, 3]

>>> a[4-2:3]

[3]

>>> a[:]

[1, 2, 3]

12.2. PARSER

12-11

So, the expression p[1:] in Python is analogous to (cdr p) in Scheme.

Python lists are mutable (the value of a list can change after it is created). We can use list subscripts as the targets for an assignment expression:

Target ::⇒ SubscriptExpression

For example,

>>> a = [1, 2, 3]

>>> a[0] = 7

>>> a

[7, 2, 3]

>>> a[1:4] = [4, 5, 6]

>>> a

[7, 4, 5, 6]

>>> a[1:] = [6]

>>> a

[7, 6]

Note that assignments can not only be used to change the values of elements in the list, but also to change the length of the list.

Pages

Sunday, 25 September 2016

INTER PRETER