collin park: Simple Python Program, Part 1 (revised)

The elder teen was talking about learning Python, and I was trying to figure out what a simple problem might be that would help her pick it up. We chatted about a few things, and I thought, how about a calculator program? It could read expressions, one per line, evaluate them, and print the value. Being a little rusty on Python syntax and such, I decided to just write a sort of "echo" program. Here's what I came up with:


    1   #!/usr/bin/python -tt
    2   # vim:et:ai:sw=4
    3   # $Header: /Users/collin/pycalc/RCS/calc.py,v 0.1 2008/01/10 01:11:34 collin Exp $
    4   #
    5   # simple calculator.  Read and process an expression per line.
    6   import sys
    7
    8   def do_one_expression(expr):
    9       # For now, let us just see if we can print a line
   10       print(aline)
   11       return
   12
   13   if __name__ == '__main__':
   14       for aline in sys.stdin:
   15           do_one_expression(aline)
   16       sys.exit(0)
   17
   18   # $Log: calc.py,v $
   19   # Revision 0.1  2008/01/10 01:11:34  collin
   20   # initial revision.  this one double-spaces the input.
   21   #
   22   # $EndLog$

OK, let me go through this. You can safely ignore lines 1-5.

I want to read all the lines from "stdin" -- i.e., the sort of usual input for the program -- which I remembered in Python is called "sys.stdin"; hence I have to type "import sys".

Then I write the sort of workhorse routine, "do_one_expression" -- for this first version, this routine will simply print its input parameter and return. Remember, I'm just re-learning the syntax.

Finally, lines 13-16 will send "aline" to the workhorse routine, until we run out of input, at which point we exit.

What is that "__name__" business on line 13? Not needed here, strictly speaking, but this is useful if you ever want to run the Python debugger on the code. Line 14 is Python's way of iterating over all the lines in the standard input.

Line 16 is probably superfluous, but it's my habit and it doesn't seem to hurt anything.

So I ran this program, feeding it a file that looked like this:

line1 two number three

and it gave me this:

line1 two number three

That's right, it was double-spaced.

I remembered then that when you read from sys.stdin the way I do here, you get the line with its terminating newline character. Since the 'print" statement adds its own newline, I was seeing double-spaced output. So rather than calling "do_one_expression(aline)", I tried saying "do_one_expression(aline.rstrip())" but nothing changed.

What happened? Take a look at line 10; it prints the (global) variable "aline" -- not the parameter "expr"! Having fixed that, the program now looks like this:


    1   #!/usr/bin/python -tt
    2   # vim:et:ai:sw=4
    3   # $Header: /Users/collin/pycalc/RCS/calc.py,v 0.2 2008/01/10 01:17:25 collin Exp $
    4   #
    5   # simple calculator.  Read and process an expression per line.
    6   import string 
    7   import sys
    8
    9   def do_one_expression(expr):
   10       # For now, let us just see if we can print a line
   11       print(expr) 
   12       return
   13
   14   if __name__ == '__main__':
   15       for aline in sys.stdin:
   16           do_one_expression(aline.rstrip()) 
   17       sys.exit(0)
   18
   19   # $Log: calc.py,v $
   20   # Revision 0.2  2008/01/10 01:17:25  collin
   21   # Solved two problems:  First, the subroutine do_one_expression was
   22   # printing aline, not its parameter!  Second, we now kill trailing
   23   # blanks from the string read.
   24   # 
   25   # Revision 0.1  2008/01/10 01:11:34  collin
   26   # initial revision.  this one double-spaces the input.
   27   #
   28   # $EndLog$

By the way, the lines with the yellow background are lines that were added or changed from the previous version.

Okay -- now it can read lines and print them out without double-spacing them. How about doing something useful? Let's see if we can break the line into tokens. Since we are not going to do anything very fancy, maybe something like the string routine "split" will work. If you're not sure (as I wasn't) whether "split" would handle multiple spaces or tabs, you could try this:


% python
Python 2.3.5 (#1, Aug 22 2005, 22:13:23) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "  oneword  after_tab   after_spcs  17 21  ".split()
['oneword', 'after_tab', 'after_spcs', '17', '21']
>>>

Cool. It even strips leading and trailing whitespace! So if I put in a string like

123 456   +

then we should be able to "split" it and get an array containing "123", "456", and "+".

We could do something like "atoi(3)" on each of the numeric strings (to convert strings like "123" into a number, i.e., 123), and look at the operator ("+" in this case) to decide what to do with the numbers.

I thought I'd use a sort of RPN syntax, so that I could put only numbers on the stack (rather than numbers and operators), and I wouldn't have to deal with parentheses. So the flow for do_one_expression would be something like this:


   split the expression into 'tokens'
   for all tokens in the expression (the line):
      if it's numeric
         transform into a number
         push it onto the stack
      if it's "+"
         pop 2 numbers off the stack
         add them
         push the sum onto the stack
      if it's "-"
         do the analogous operations...
   When tokens exhausted, print the top item from stack.

I looked in the Python library reference manual (in your distribution or google it) for "atoi" and found:

Deprecated since release 2.0. Use the int() built-in function.

So that's what I'll use for converting strings to numbers. Here's the simple integer calculator:


    1   #!/usr/bin/python -tt
    2   # vim:et:ai:sw=4
    3   # $Header: /Users/collin/pycalc/RCS/calc.py,v 0.3 2008/01/10 01:35:50 collin Exp $
    4   #
    5   # simple calculator.  Read and process an expression per line.
    6   # Make that an RPN calculator.  No (parens)
    7
    8   import string
    9   import sys
   10
   11   def do_one_expression(expr):
   12       tokens = expr.split()
   13       nstk = []
   14       for atoken in tokens:
   15           if atoken.isdigit():
   16               nstk.append(int(atoken))
   17               continue
   18           if atoken == '+':
   19               second = nstk.pop()
   20               first = nstk.pop()
   21               nstk.append(first + second)
   22               continue
   23           if atoken == '-':
   24               second = nstk.pop()
   25               first = nstk.pop()
   26               nstk.append(first - second)
   27               continue
   28           # If we get here, unsupported operator.
   29           print "** Unknown operator: '" + atoken + "'; ignored."
   30           continue
   31       # Done with tokens.  
   32       print "Problem:" , expr
   33       print "    answer:", nstk.pop()
   34       return
   35
   36   if __name__ == '__main__':
   37       for aline in sys.stdin:
   38           do_one_expression(aline.rstrip().strip()) 
   39       sys.exit(0)
   40
   41   # $Log: calc.py,v $
   42   # Revision 0.3  2008/01/10 01:35:50  collin
   43   # this one does addition and subtraction, integers only
   44   #
   45   # Revision 0.2  2008/01/10 01:17:25  collin
   46   # Solved two problems:  First, the subroutine do_one_expression was
   47   # printing aline, not its parameter!  Second, we now kill trailing
   48   # blanks from the string read.
   49   #
   50   # Revision 0.1  2008/01/10 01:11:34  collin
   51   # initial revision.  this one double-spaces the input.
   52   #
   53   # $EndLog$

As you can see, lines 12-33 form the bulk of the changes. Line 12 splits the line into tokens, as described earlier. The stack is kept in a Python "list", which is defined and initialized at line 13. (It gets newly minted every time do_one_expression gets called.)

Line 15-17 handle the case where the "token" is a number. I determined (by experiment) that the string function "isdigit" (line 15) returns True when the entire string consists only of digits. So if atoken is all-numeric, it represents a number, and so int(atoken) is its numeric value. We push it onto the end of the list by calling nstk.append() (line 16). The continue statement at line 17 says that's it for this pass through the loop, so we go back to the top of the loop.

Lines 18-22 process the addition operation. If we see that the present token is a '+', then we pop two elements off the stack (lines 19-20), and push (or append) their sum onto the stack in line 21.

Now in 19-20 it doesn't matter which order I pop the elements off the stack, but in lines 24-25 it does. Imagine how this code handles the expression "7 3 -":

we push 7 onto the stack;
we push 3 onto the stack;
when we see the "-" we pop 3... we set
second = 3
and likewise set
first = 7
then, executing line 26, we append (or "push") first-second (i.e., 7-3→4) onto the stack.

So although I could have replaced lines 19-21 with
nstk.append(nstk.pop() + nstk.pop())
it would not be safe to replace lines 24-26 with such a line.

Line 29 handles the case where neither a number nor a known operator was supplied. Rather than completely giving up, though, the program continues. Hence if the user types "7 4 should_be_3 -" we'll still eventually print the answer.

Next time I'll describe how to make the program handle floating-point numbers, and also how to make it a little more interactive.

collin park

Saturday, January 12, 2008

Simple Python Program, Part 1 (revised)

No comments:

more selected postings

Who is this guy?

Links

Blog Archive