Saturday, April 21, 2012

Why “#!/usr/bin/python -utt”?

From a colleague's questions I see that my explanation in this post from last year wasn't quite clear, so I'll try to make this one more complete.

First, let's talk about tabs. The Python style guide (also known as "PEP 8" [link]) recommends using spaces only, particularly for new projects:

Tabs or Spaces?

    Never mix tabs and spaces.

    The most popular way of indenting Python is with spaces only.  The
    second-most popular way is with tabs only.  Code indented with a mixture
    of tabs and spaces should be converted to using spaces exclusively.  When
    invoking the Python command line interpreter with the -t option, it issues
    warnings about code that illegally mixes tabs and spaces.  When using -tt
    these warnings become errors.  These options are highly recommended!

    For new projects, spaces-only are strongly recommended over tabs.  Most
    editors have features that make this easy to do.
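You can see how this works in a very short script like the one below, where the second line is indented with eight spaces and <tab> stands for a single literal tab character: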
if 1+1==2:
        print "I'm still here"
<tab>print "and math still works"
If that's saved as two.py, you can run it with "python two.py" and you should see both strings:
$ python two.py
I'm still here
and math still works
With "-t" we get a warning about tabs and spaces:
$ python -t two.py
two.py: inconsistent use of tabs and spaces in indentation
I'm still here
and math still works
With "-tt" (two "t"s) the warning becomes an error:
$ python -tt two.py
  File "two.py", line 3
    print "and math still works"
                               ^
TabError: inconsistent use of tabs and spaces in indentation
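
The same check can be triggered from inside Python, without a separate file. Here's a minimal sketch, assuming Python 3 (where, as far as I know, the interpreter rejects this kind of inconsistent indentation even without "-tt"); the helper name mixes_tabs_and_spaces is made up for illustration and isn't part of any library:

```python
# Sketch for Python 3; the helper name is hypothetical.
SOURCE = (
    "if 1+1==2:\n"
    "        print('still here')\n"        # indented with eight spaces
    "\tprint('and math still works')\n"    # indented with one tab
)


def mixes_tabs_and_spaces(source, filename="two.py"):
    """Return True if compiling the source raises TabError."""
    try:
        compile(source, filename, "exec")
    except TabError:
        return True
    return False


print(mixes_tabs_and_spaces(SOURCE))
```

Since compile() only parses the text, nothing in SOURCE is actually executed.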

How about that “-u”?

This one is a little more complicated. In a lot of computer programs, when you "print" something, the characters don't appear on the screen right away. Instead, the program saves them up in an output buffer; it buffers them. Usually this is a good thing, because the program can't display characters all by itself; it has to call upon the system to do it, and it's cheaper to call the system once to display an entire line than to call it, say, 42 times (once for each character on the line).

But there are at least two situations where this buffering isn't desirable. One case is when you want to leave the cursor at the end of the line for a while. Here's a silly example:

from time import sleep
print "waiting five seconds...",
sleep(5)
print "done."
If you run this simply as "python sleep.py" then nothing happens for about five seconds, then the line
waiting five seconds... done.
appears all at once. If it's run as "python -u sleep.py" then the first thing you see is the "waiting five seconds..." part. Then about 5 seconds later, "done." appears.
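
If you'd rather not change how the whole script is run, a similar effect for one particular line can be had by flushing the buffer by hand. This is only a sketch of that alternative, not something "-u" does for you; the function name wait_with_progress and its out parameter are my own inventions for the example:

```python
import sys
from time import sleep


def wait_with_progress(seconds, out=None):
    """Show a progress message, pause, then finish the line."""
    if out is None:
        out = sys.stdout
    out.write("waiting %s seconds... " % seconds)
    out.flush()              # push the partial line out before sleeping
    sleep(seconds)
    out.write("done.\n")


wait_with_progress(1)
```

The explicit flush() is what makes the first half of the line show up before the pause, whether or not the script was started with "-u".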

Another case where buffered output isn't so nice is if your process fork()s (copies the current process, creating a "child" process) with un-displayed characters left in the buffer. Here's an example of weird behavior:

from time import sleep
from os import fork
print "after sleeping, let's try a fork...",
sleep(5)
pid = fork()
if pid == 0:
    print "in child process"
else:
    print "in parent process; child is pid", pid
When we run it as "python sleep.py", we get this:
$ python sleep.py
after sleeping, let's try a fork... in parent process; child is pid 7370
after sleeping, let's try a fork... in child process
Why? Because at the time we call fork(), the output buffer holds "after sleeping, let's try a fork...", which hasn't been displayed in the terminal window yet. fork() copies the whole process, buffer included, so the parent and the child each end up with their own copy of those pending characters. When the parent adds "in parent process; child is pid 7370" and ends the line with a newline character (the default for print), its output buffer is flushed: the contents are written to your display. But that only flushes the buffer of the parent process; the child's buffer still has those characters.

Thus, when the child process runs, an extra copy of "after sleeping, let's try a fork..." appears on the screen.

What if we run it as "python -u sleep.py"?

$ python -u sleep.py
after sleeping, let's try a fork... in parent process; child is pid 7367
 in child process
If you run this yourself, you'll see "after sleeping, let's try a fork..." appear almost immediately (i.e., you don't have to wait 5 seconds for it). When the process executes fork(), the buffer has already been flushed; that's why we don't see two copies of the "after sleeping..." part.
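
The difference between the buffered and unbuffered runs can be reproduced in a way that doesn't depend on a terminal. The sketch below makes several assumptions of its own: it's written for Python 3 on a POSIX system (os.fork() isn't available on Windows), it writes to a pipe instead of the screen, and it uses an explicit flush() before the fork to stand in for what "-u" achieves. The name fork_demo is hypothetical:

```python
import os


def fork_demo(flush_before_fork):
    """Return everything a parent and its forked child write to a pipe."""
    read_fd, write_fd = os.pipe()
    out = os.fdopen(write_fd, "w")      # buffered text stream on the pipe
    out.write("before fork... ")        # sits in the buffer for now
    if flush_before_fork:
        out.flush()                     # empty the buffer before copying it
    pid = os.fork()
    if pid == 0:
        # Child: it inherited a copy of whatever was still buffered.
        try:
            out.write("in child\n")
            out.flush()
        finally:
            os._exit(0)                 # leave without running parent cleanup
    os.waitpid(pid, 0)                  # let the child finish first
    out.write("in parent\n")
    out.close()                         # flushes the parent's buffer
    with os.fdopen(read_fd) as reader:
        return reader.read()


print(fork_demo(flush_before_fork=False).count("before fork..."))
print(fork_demo(flush_before_fork=True).count("before fork..."))
```

Without the flush, the pending text is counted twice, once from each process's copy of the buffer; with it, only once.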

Why is the "child process" line indented by one space? Well, we didn't retain "after sleeping..." in the buffer when we created the child process, but we did retain the knowledge that we weren't at the start of a new line. Thus, the child process thinks we're not at the beginning of a new line, so it prepends a space to "in child process" before displaying it, just as it would if we were truly in the middle of a line. This is just the way Python's print statement works.

I hope that made sense. And since my Python scripts don't have a whole lot of output to the display, it doesn't hurt much to use "-u" all the time. Combining that with Guido's recommended "-tt" option... that's why I pretty much always have "-utt" in my Python scripts.
