Saturday, March 19, 2011

Sort of Python

Sorry for the title; I couldn't resist. My nephew and I were talking about sorting in Java, a language which I can't even spell. But I remembered, when talking about the idea of passing a function to another function, that this concept takes some getting used to. I decided to write a little example this morning, using a language I actually write code in, viz., Python. Here's the first part:
#!/usr/bin/python -utt
# vim:et
'''Program to sort a list of words using various criteria.'''

def show(listA):
    print "\tlistA:", listA
    print

def main():
    '''Create a list of words, then sort using various criteria.
    Display the contents of the list after each sort.'''

    # Easier to type than ['In','the','beginning' (etc)
    listA = 'In the beginning God created'.split()

    print "Before sorting:"
    show(listA)

    listA.sort()
    print "after default sort:"
    show(listA)

    return 0

if __name__ == '__main__':
    main()
Let me explain briefly. "show" is just a helper that displays the list in a certain way -- i.e., with a tab in front and an extra newline after the word list. I wrote "show" so that, in case I wanted to do something different (e.g., put the extra newline before, rather than after, the list; or use some spaces rather than a tab character) I would just change the layout once, in show(), rather than changing a bunch of print statements throughout.

"main" creates the list, which I creatively call listA, by taking a string (a sentence or phrase will be fine) and splitting it based on whitespace (this is like PERL's "qw"). Then it calls "show" to display the list's original contents.

Finally, it calls the list's sort function, passing no parameters. This sorts according to the default ordering, which may depend on your $LANG or $LC_ALL environment variable. Here's what happens when you run it:

% ./sort1a.py 
Before sorting:
        listA: ['In', 'the', 'beginning', 'God', 'created']

after default sort:
        listA: ['God', 'In', 'beginning', 'created', 'the']

% 
Now what if you want to do something different with the sort -- rather than sorting based on the natural ordering of these words, what if you wanted to put longest words last?

Well, sort() can still help you, but you need a comparator function -- one that, given two words, tells whether the first is shorter (or "less than", or "comes before") the second word in the desired outcome. Let's add that in like this:

#!/usr/bin/python -utt
# vim:et
'''Program to sort a list of words using various criteria.'''

def show(listA):
    print "\tlistA:", listA
    print

def compLen(x,y): '''Compare words for length''' return cmp(len(x), len(y))
def main(): '''Create a list of words, then sort using various criteria. Display the contents of the list after each sort.''' # Easier to type than ['In','the','beginning' (etc) listA = 'In the beginning God created'.split() print "Before sorting:" show(listA) listA.sort() print "after default sort:" show(listA)
# Sort for length listA.sort(compLen) print "after sort by length:" show(listA)
return 0 if __name__ == '__main__': main()
See how that works? I created a routine compLen, which returns -1, not when the first word would precede the second in natural ordering, but when the first word is shorter than the second word. By the way, Python's cmp does this comparison/return thing on whatever we pass it -- hence by just typing cmp(len(x), len(y)) I've created a routine that does what we need to sort by length.

The second change, within main(), calls sort(), passing in this comparator routine. When sort() does its thing, it will put them in ascending order, as defined by compLen().

Does that make sense? When sort() wants to decide if, what's currently say, element#2 should go after what's currently element#3 (i.e., if 2 and 3 are in the correct order), it calls a function. If you just say someList.sort(), then sort() will call a function that returns -1 if element#2 is "less than" element#3, whatever that means. But if you pass in a function like compLen, then sort() will call compLen, which will say whether element#2 is (in this case) shorter than element#3, and so you'll end up with a list sorted in order of increasing length. Here's how it looks when run:

% ./sort1b.py 
Before sorting:
        listA: ['In', 'the', 'beginning', 'God', 'created']

after default sort:
        listA: ['God', 'In', 'beginning', 'created', 'the']

after sort by length: listA: ['In', 'God', 'the', 'created', 'beginning']
%
Did that all make sense? More examples in part 2

No comments: