Saturday, January 12, 2008

Simple Python Program, Part 1 (revised)

The elder teen was talking about learning Python, and I was trying to figure out what a simple problem might be that would help her pick it up. We chatted about a few things, and I thought, how about a calculator program? It could read expressions, one per line, evaluate them, and print the value. Being a little rusty on Python syntax and such, I decided to just write a sort of "echo" program. Here's what I came up with:

1 #!/usr/bin/python -tt
2 # vim:et:ai:sw=4
3 # $Header: /Users/collin/pycalc/RCS/calc.py,v 0.1 2008/01/10 01:11:34 collin Exp $
4 #
5 # simple calculator. Read and process an expression per line.
6 import sys
7
8 def do_one_expression(expr):
9 # For now, let us just see if we can print a line
10 print(aline)
11 return
12
13 if __name__ == '__main__':
14 for aline in sys.stdin:
15 do_one_expression(aline)
16 sys.exit(0)
17
18 # $Log: calc.py,v $
19 # Revision 0.1 2008/01/10 01:11:34 collin
20 # initial revision. this one double-spaces the input.
21 #
22 # $EndLog$

OK, let me go through this. You can safely ignore lines 1-5.

I want to read all the lines from "stdin" -- i.e., the sort of usual input for the program -- which I remembered in Python is called "sys.stdin"; hence I have to type "import sys".

Then I write the sort of workhorse routine, "do_one_expression" -- for this first version, this routine will simply print its input parameter and return. Remember, I'm just re-learning the syntax.

Finally, lines 13-16 will send "aline" to the workhorse routine, until we run out of input, at which point we exit.

What is that "__name__" business on line 13? Not needed here, strictly speaking, but this is useful if you ever want to run the Python debugger on the code. Line 14 is Python's way of iterating over all the lines in the standard input.

Line 16 is probably superfluous, but it's my habit and it doesn't seem to hurt anything.

So I ran this program, feeding it a file that looked like this:
line1
two
number three
and it gave me this:
line1

two

number three
That's right, it was double-spaced.

I remembered then that when you read from sys.stdin the way I do here, you get the line with its terminating newline character. Since the 'print" statement adds its own newline, I was seeing double-spaced output. So rather than calling "do_one_expression(aline)", I tried saying "do_one_expression(aline.rstrip())" but nothing changed.

What happened? Take a look at line 10; it prints the (global) variable "aline" -- not the parameter "expr"! Having fixed that, the program now looks like this:

1 #!/usr/bin/python -tt
2 # vim:et:ai:sw=4
3 # $Header: /Users/collin/pycalc/RCS/calc.py,v 0.2 2008/01/10 01:17:25 collin Exp $
4 #
5 # simple calculator. Read and process an expression per line.
6 import string
7 import sys
8
9 def do_one_expression(expr):
10 # For now, let us just see if we can print a line
11 print(expr)
12 return
13
14 if __name__ == '__main__':
15 for aline in sys.stdin:
16 do_one_expression(aline.rstrip())
17 sys.exit(0)
18
19 # $Log: calc.py,v $
20 # Revision 0.2 2008/01/10 01:17:25 collin
21 # Solved two problems: First, the subroutine do_one_expression was
22 # printing aline, not its parameter! Second, we now kill trailing
23 # blanks from the string read.
24 #

25 # Revision 0.1 2008/01/10 01:11:34 collin
26 # initial revision. this one double-spaces the input.
27 #
28 # $EndLog$

By the way, the lines with the yellow background are lines that were added or changed from the previous version.

Okay -- now it can read lines and print them out without double-spacing them. How about doing something useful? Let's see if we can break the line into tokens. Since we are not going to do anything very fancy, maybe something like the string routine "split" will work. If you're not sure (as I wasn't) whether "split" would handle multiple spaces or tabs, you could try this:

% python
Python 2.3.5 (#1, Aug 22 2005, 22:13:23)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> " oneword after_tab after_spcs 17 21 ".split()
['oneword', 'after_tab', 'after_spcs', '17', '21']
>>>

Cool. It even strips leading and trailing whitespace! So if I put in a string like
123 456   +
then we should be able to "split" it and get an array containing "123", "456", and "+".

We could do something like "atoi(3)" on each of the numeric strings (to convert strings like "123" into a number, i.e., 123), and look at the operator ("+" in this case) to decide what to do with the numbers.

I thought I'd use a sort of RPN syntax, so that I could put only numbers on the stack (rather than numbers and operators), and I wouldn't have to deal with parentheses. So the flow for do_one_expression would be something like this:

split the expression into 'tokens'
for all tokens in the expression (the line):
if it's numeric
transform into a number
push it onto the stack
if it's "+"
pop 2 numbers off the stack
add them
push the sum onto the stack
if it's "-"
do the analogous operations...
When tokens exhausted, print the top item from stack.

I looked in the Python library reference manual (in your distribution or google it) for "atoi" and found:
Deprecated since release 2.0. Use the int() built-in function.
So that's what I'll use for converting strings to numbers. Here's the simple integer calculator:

1 #!/usr/bin/python -tt
2 # vim:et:ai:sw=4
3 # $Header: /Users/collin/pycalc/RCS/calc.py,v 0.3 2008/01/10 01:35:50 collin Exp $
4 #
5 # simple calculator. Read and process an expression per line.
6 # Make that an RPN calculator. No (parens)
7
8 import string
9 import sys
10
11 def do_one_expression(expr):
12 tokens = expr.split()
13 nstk = []
14 for atoken in tokens:
15 if atoken.isdigit():
16 nstk.append(int(atoken))
17 continue
18 if atoken == '+':
19 second = nstk.pop()
20 first = nstk.pop()
21 nstk.append(first + second)
22 continue
23 if atoken == '-':
24 second = nstk.pop()
25 first = nstk.pop()
26 nstk.append(first - second)
27 continue
28 # If we get here, unsupported operator.
29 print "** Unknown operator: '" + atoken + "'; ignored."
30 continue
31 # Done with tokens.
32 print "Problem:" , expr
33 print " answer:", nstk.pop()

34 return
35
36 if __name__ == '__main__':
37 for aline in sys.stdin:
38 do_one_expression(aline.rstrip().strip())
39 sys.exit(0)
40
41 # $Log: calc.py,v $
42 # Revision 0.3 2008/01/10 01:35:50 collin
43 # this one does addition and subtraction, integers only
44 #

45 # Revision 0.2 2008/01/10 01:17:25 collin
46 # Solved two problems: First, the subroutine do_one_expression was
47 # printing aline, not its parameter! Second, we now kill trailing
48 # blanks from the string read.
49 #
50 # Revision 0.1 2008/01/10 01:11:34 collin
51 # initial revision. this one double-spaces the input.
52 #
53 # $EndLog$

As you can see, lines 12-33 form the bulk of the changes. Line 12 splits the line into tokens, as described earlier. The stack is kept in a Python "list", which is defined and initialized at line 13. (It gets newly minted every time do_one_expression gets called.)

Line 15-17 handle the case where the "token" is a number. I determined (by experiment) that the string function "isdigit" (line 15) returns True when the entire string consists only of digits. So if atoken is all-numeric, it represents a number, and so int(atoken) is its numeric value. We push it onto the end of the list by calling nstk.append() (line 16). The continue statement at line 17 says that's it for this pass through the loop, so we go back to the top of the loop.

Lines 18-22 process the addition operation. If we see that the present token is a '+', then we pop two elements off the stack (lines 19-20), and push (or append) their sum onto the stack in line 21.

Now in 19-20 it doesn't matter which order I pop the elements off the stack, but in lines 24-25 it does. Imagine how this code handles the expression "7 3 -":
  • we push 7 onto the stack;
  • we push 3 onto the stack;
  • when we see the "-" we pop 3... we set
    second = 3
    and likewise set
    first = 7
  • then, executing line 26, we append (or "push") first-second (i.e., 7-3→4) onto the stack.
So although I could have replaced lines 19-21 with
nstk.append(nstk.pop() + nstk.pop())
it would not be safe to replace lines 24-26 with such a line.

Line 29 handles the case where neither a number nor a known operator was supplied. Rather than completely giving up, though, the program continues. Hence if the user types "7 4 should_be_3 -" we'll still eventually print the answer.

Next time I'll describe how to make the program handle floating-point numbers, and also how to make it a little more interactive.

No comments: