Saturday, May 30, 2009

pydoc for friendly up-to-date documentation

Suppose you have a bunch of Python scripts that lots of people will use. You make them easy to use by providing a help message. For example, if they want help with my little script that "solves" the Plext® game, they type
$ ./plext1.py -h
or they just type the script name without parameters, and the script tells them they have to provide this or that parameter. Here's a screen-scrape:
$ ./plext1.py -help
./plext1.py: Play the 'plext'(tm) game.
Parameters: patterns (e.g., "ivngmarlbkstvl") or the verbose flag ("-v").
You can put more than one pattern if you like.
Note that the patterns MUST BE all lowercase.
The verbose flag applies to all subsequent patterns.
$
If you type plext1.py --help or similar, you'll get the same thing.

Now suppose you want to provide a website (a wiki, say) with the "help" messages for a bunch of these scripts. You could run each script, snarf'n'barf the help message, and put that onto your wiki. This might be OK if you just have a couple of scripts and you never (well, hardly ever) change them. But even with just a few scripts, you've got denormalized (redundant) data -- data that can easily get out of date.

That is, whenever you change one of the scripts, you have to snarf'n'barf the help message again if you want the website to stay up to date. That's a waste of time, and all too easy to forget....

How about having a CGI that runs the script and displays the help message? This is such a bad idea that I don't even know where to begin....

But what if you had a CGI that would run pydoc(1) on the script? It might be able to print something like this:
$ pydoc /home/collin/plext1.py
Help on module plext1:

NAME
    plext1

FILE
    /home/collin/plext1.py

DESCRIPTION
    Play the 'plext'(tm) game.
    Parameters: patterns (e.g., "ivngmarlbkstvl") or the verbose flag ("-v").
    You can put more than one pattern if you like.
    Note that the patterns MUST BE all lowercase.
    The verbose flag applies to all subsequent patterns.

FUNCTIONS
    find_longest_match(an_arg, legal_words)

    main(args)
        The parameter is a list; we expect sys.argv[1:] --
        i.e. just the words, not the script name.

    printwords(a_list)

    show_answer(an_arg, legal_words)
        an_arg is a long string like "ivngmarlbkstvl" (MUST BE all lowercase);
        legal_words is a list of words from the dictionary.

    usage()

DATA
    verbose = False
    words = '/usr/share/dict/words'
Even better, the CGI could run pydoc -w and then spit the result onto your page, something like what you see at plext1.html. Now how cool is that?
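
Here's a rough sketch of what such a CGI might look like. It's written for Python 3, and the function names (pydoc_text, pydoc_html, cgi_response) are made up for illustration; this is not the actual code behind plext1.html. It runs pydoc under the same interpreter as the CGI, so it assumes the script being documented can be imported safely by that interpreter (pydoc imports the file in order to document it).

```python
import os
import subprocess
import sys
import tempfile


def pydoc_text(script_path):
    """Return the plain-text pydoc page for the Python file at script_path."""
    # pydoc only treats its argument as a file if it contains a path
    # separator, so always hand it an absolute path.
    result = subprocess.run(
        [sys.executable, "-m", "pydoc", os.path.abspath(script_path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pydoc_html(script_path):
    """Return the HTML page that `pydoc -w` writes for script_path."""
    script_path = os.path.abspath(script_path)
    module_name = os.path.splitext(os.path.basename(script_path))[0]
    # `pydoc -w` writes <module>.html into the current directory, so run it
    # in a scratch directory and read the file back.
    with tempfile.TemporaryDirectory() as scratch:
        subprocess.run(
            [sys.executable, "-m", "pydoc", "-w", script_path],
            cwd=scratch, capture_output=True, text=True, check=True,
        )
        html_file = os.path.join(scratch, module_name + ".html")
        with open(html_file, encoding="utf-8") as page:
            return page.read()


def cgi_response(script_path, html=False):
    """Build a complete CGI response: header, blank line, then the body."""
    if html:
        return "Content-Type: text/html\r\n\r\n" + pydoc_html(script_path)
    return "Content-Type: text/plain\r\n\r\n" + pydoc_text(script_path)
```

A CGI script would then just write cgi_response(path, html=True) to standard output. Since the page is rebuilt from the docstrings on every request, there's no second copy of the help text to go stale.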


Anyway, if you want to see the source of the program, here it is. Note that the entire content is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
#!/usr/bin/python -tt
# vim:sw=4:et
"""Play the 'plext'(tm) game.
Parameters: patterns (e.g., "ivngmarlbkstvl") or the verbose flag ("-v").
You can put more than one pattern if you like.
Note that the patterns MUST BE all lowercase.
The verbose flag applies to all subsequent patterns."""

import re
import sys

words = "/usr/share/dict/words"
verbose = False


def main(args):
    """The parameter is a list; we expect sys.argv[1:] --
    i.e. just the words, not the script name."""
    global verbose
    legal_words = []
    all_lower = re.compile('[a-z]*$')
    w = open(words, 'r')
    for a_word in w:
        if all_lower.match(a_word):
            legal_words.append(a_word.rstrip())
    w.close()

    got_a_word = False
    for an_arg in args:
        if an_arg == "-v":
            verbose = True
            continue
        # Not a flag; it's a plext puzzle
        if all_lower.match(an_arg):
            show_answer(an_arg.lower(), legal_words)
            got_a_word = True
        else:
            usage()

    if not got_a_word:
        print "Didn't get any words."
        usage()

    sys.exit(0)

def usage():
    print sys.argv[0] + ":", __doc__
    sys.exit(1)

def show_answer(an_arg, legal_words):
    """an_arg is a long string like "ivngmarlbkstvl" (MUST BE all lowercase);
    legal_words is a list of words from the dictionary."""
    num_words = 0 # none so far
    while len(an_arg) > 0:
        (match_len, some_words) = find_longest_match(an_arg, legal_words)
        num_words = num_words + 1
        print `num_words` + ": matched '" + an_arg[:match_len] + "':",
        printwords(some_words)
        an_arg = an_arg[match_len:]
    print "my best answer is:", num_words, "words"

def find_longest_match(an_arg, legal_words):
    len_matched = 0
    pat = ""
    the_list = legal_words
    # loop entry:
    # the_list -> words matching len_matched bytes of an_arg
    # pat -> pattern showing len_matched bytes
    while len(the_list) and len_matched < len(an_arg):
        if verbose:
            print "trying to match:", an_arg[:len_matched+1]
        old_list = the_list
        pat = pat + '.*' + an_arg[len_matched]
        pat_re = re.compile(pat)
        the_list = filter(lambda aword: pat_re.match(aword), old_list)
        len_matched = len_matched + 1
    # At this point: Either we matched all of an_arg, or...
    if len(the_list):
        # Matched the whole thing
        if verbose:
            print "matched all of '" + an_arg + "': e.g.",
            printwords(the_list)
        return (len_matched, the_list)
    else:
        # Here, the_list is empty. So we matched len_matched-1 bytes.
        if verbose:
            print "matched", len_matched-1, "bytes of",
            print "'" + an_arg + "': e.g.",
            printwords(old_list)
        return (len_matched-1, old_list)

def printwords(a_list):
    if len(a_list) == 1:
        print a_list[0]
        return
    # else
    print a_list[0], "or", a_list[len(a_list)-1]

if __name__ == '__main__':
    main(sys.argv[1:])
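
In case the loop in find_longest_match looks opaque: it grows a pattern one letter at a time, putting '.*' in front of each letter, so a word matches when it contains those letters in that order. Below is a small Python 3 sketch of that idea on a toy word list. The names subsequence_pattern and longest_prefix_match and the word list are mine, chosen for illustration, and the sketch leaves out the verbose output.

```python
import re


def subsequence_pattern(letters):
    """Build the kind of pattern plext1.py builds: '.*' before each letter."""
    return "".join(".*" + re.escape(ch) for ch in letters)


def longest_prefix_match(puzzle, words):
    """Return (length, candidates) for the longest prefix of puzzle whose
    letters appear, in order, inside at least one of the words."""
    matched, candidates = 0, list(words)
    while matched < len(puzzle):
        pat = re.compile(subsequence_pattern(puzzle[:matched + 1]))
        narrowed = [w for w in candidates if pat.match(w)]
        if not narrowed:
            break  # no word extends the match; keep the previous candidates
        matched, candidates = matched + 1, narrowed
    return matched, candidates


toy_words = ["invent", "given", "marble", "basket"]
print(subsequence_pattern("ivn"))
print(longest_prefix_match("ivngm", toy_words))
```

With this toy list, "i", "iv" and "ivn" are each found in order inside both "invent" and "given", but no word also has a "g" after that, so the match stops at three letters.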
Did that sound too much like a commercial? Too bad :)
