Monday, March 21, 2011

Sort of Python - part deux

Continuing from part 1...
Now suppose you wanted to do something a little more useful, like a case-ignore sort. Well, you'll need, of course, a case-ignore comparator function. Fortunately Python strings have a method called "lower()", which returns a copy of the string converted to lower-case. So rather than comparing the original strings using cmp(), we could pass stringX.lower() and stringY.lower() to cmp() and use that return value, as shown below in compIgnoreCase:
#!/usr/bin/python -utt
# vim:et
'''Program to sort a list of words using various criteria.'''

def show(listA):
    print "\tlistA:", listA
    print

def compLen(x,y):
    '''Compare words for length'''
    return cmp(len(x), len(y))

def compIgnoreCase(x,y):
    '''ignore case in sort'''
    return cmp(x.lower(), y.lower())

def main():
    '''Create a list of words, then sort using various criteria.
    Display the contents of the list after each sort.'''
    # Easier to type than ['In','the','beginning' (etc)
    listA = 'In the beginning God created'.split()
    print "Before sorting:"
    show(listA)
    listA.sort()
    print "after default sort:"
    show(listA)

    # Sort for length
    listA.sort(compLen)
    print "after sort by length:"
    show(listA)

    # Sort ignoring case
    listA.sort(compIgnoreCase)
    print "after case-ignore sort"
    show(listA)

    return 0

if __name__ == '__main__':
    main()
Note the added code: compIgnoreCase and the case-ignore sort at the end of main(). We compare the lower()'d versions of the strings; the new results are the last entry in the output below:
% ./sort1c.py 
Before sorting:
        listA: ['In', 'the', 'beginning', 'God', 'created']

after default sort:
        listA: ['God', 'In', 'beginning', 'created', 'the']

after sort by length:
        listA: ['In', 'God', 'the', 'created', 'beginning']

after case-ignore sort
        listA: ['beginning', 'created', 'God', 'In', 'the']

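A note for anyone reading this on Python 3, where cmp() and the comparator argument to sort() are gone: the same three sorts can be sketched with key functions instead. This is a translation under that assumption, not the listing above; the names words, by_length, ignore_case and comp_ignore_case are made up for illustration, and functools.cmp_to_key is shown only as one way to keep a comparator-style function around.

```python
from functools import cmp_to_key

words = 'In the beginning God created'.split()

# Default sort: uppercase letters sort before lowercase ones.
default_sorted = sorted(words)

# Sort by length: key=len plays the role of compLen.
by_length = sorted(default_sorted, key=len)

# Case-ignore sort: key=str.lower plays the role of compIgnoreCase.
ignore_case = sorted(by_length, key=str.lower)

# A comparator can still be used by wrapping it with cmp_to_key.
def comp_ignore_case(x, y):
    '''ignore case in sort (hypothetical Python 3 stand-in for cmp())'''
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)

also_ignore_case = sorted(by_length, key=cmp_to_key(comp_ignore_case))

print(default_sorted)
print(by_length)
print(ignore_case)
```

The key form calls lower() once per word rather than twice per comparison, which is the usual reason to prefer it when it fits.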
Now what if we wanted to put the words with the highest proportion of consonants at the end, and the more vowel-heavy words at the beginning? I'd write a routine to calculate consonant density, and, like the other comparators, include it in a call to sort(). The added code is compConsonants, consonantDensity, and the final sort in main():
#!/usr/bin/python -utt
# vim:et
'''Program to sort a list of words using various criteria.'''

def show(listA):
    print "\tlistA:", listA
    print

def compLen(x,y):
    '''Compare words for length'''
    return cmp(len(x), len(y))

def compIgnoreCase(x,y):
    '''ignore case in sort'''
    return cmp(x.lower(), y.lower())

def compConsonants(x,y):
    '''compare words for consonant density'''
    return cmp(consonantDensity(x), consonantDensity(y))

def consonantDensity(astring):
    '''how many consonants in astring?'''
    ret = 0.0
    for abyte in astring.lower():
        if abyte in ('a','e','i','o','u'):
            continue
        ret += 1.0
    return (ret / len(astring))

def main():
    '''Create a list of words, then sort using various criteria.
    Display the contents of the list after each sort.'''
    # Easier to type than ['In','the','beginning' (etc)
    listA = 'In the beginning God created'.split()
    print "Before sorting:"
    show(listA)
    listA.sort()
    print "after default sort:"
    show(listA)

    # Sort for length
    listA.sort(compLen)
    print "after sort by length:"
    show(listA)

    # Sort ignoring case
    listA.sort(compIgnoreCase)
    print "after case-ignore sort"
    show(listA)

    # Sort by increasing consonant density
    listA.sort(compConsonants)
    print "in order of increasing consonant density"
    show(listA)

    return 0

if __name__ == '__main__':
    main()
So consonantDensity calculates what proportion of each word's letters are consonants, and compConsonants(x,y) compares the consonant proportions of the two strings passed. The results look like this:
% ./sort1d.py 
Before sorting:
        listA: ['In', 'the', 'beginning', 'God', 'created']

after default sort:
        listA: ['God', 'In', 'beginning', 'created', 'the']

after sort by length:
        listA: ['In', 'God', 'the', 'created', 'beginning']

after case-ignore sort
        listA: ['beginning', 'created', 'God', 'In', 'the']

in order of increasing consonant density
        listA: ['In', 'created', 'beginning', 'God', 'the']

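That looks about right: "In" is 50% consonants; "created" is 4/7 or about 57% consonants; "beginning", "God", and "the" are 2/3 (about 67%) consonants. Python's sort is stable, so those three tied words stay in the order the case-ignore sort left them in.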
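To check those percentages, here is a rough Python 3 sketch of the same calculation written as a key function. The name consonant_density and the empty-string guard are my assumptions, not part of the listing above; like the original, it counts every character that isn't a, e, i, o, or u as a consonant, so digits and punctuation would count too.

```python
def consonant_density(word):
    '''Fraction of characters in word that are not vowels.'''
    if not word:
        # Assumption: treat the empty string as density 0 rather than
        # dividing by zero.
        return 0.0
    vowels = 'aeiou'
    consonants = sum(1 for ch in word.lower() if ch not in vowels)
    return consonants / len(word)

# Start from the case-ignore order so that ties fall out the same way.
words = ['beginning', 'created', 'God', 'In', 'the']
by_density = sorted(words, key=consonant_density)

print(by_density)
for w in by_density:
    print(w, round(consonant_density(w), 2))
```

The three 2/3 words compare equal, so sorted() leaves them in their incoming order.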

Finally, I wanted to mention "pydoc", a terrific documentation aid. When run on the above code, it produces this:

% pydoc sort1d
Help on module sort1d:

NAME
    sort1d - Program to sort a list of words using various criteria.

FILE
    /Users/collin/tmp/sorting/sort1d.py

FUNCTIONS
    compConsonants(x, y)
        compare words for consonant density
    
    compIgnoreCase(x, y)
        ignore case in sort
    
    compLen(x, y)
        Compare words for length
    
    consonantDensity(astring)
        how many consonants in astring?
    
    main()
        Create a list of words, then sort using various criteria.
        Display the contents of the list after each sort.
    
    show(listA)
Pretty cool, huh? Just put the "documentation strings" in the function declarations, and voila -- instant manpage!
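The same documentation strings are also available from inside Python, which is where pydoc finds them. A small sketch, assuming Python 3; comp_len here is just a stand-in with the same docstring as compLen:

```python
import inspect

def comp_len(x, y):
    '''Compare words for length'''
    # Python 3 has no cmp(), so build the -1/0/1 result by hand.
    return (len(x) > len(y)) - (len(x) < len(y))

# The raw docstring is stored on the function object itself.
print(comp_len.__doc__)

# inspect.getdoc() returns the same text with indentation cleaned up.
print(inspect.getdoc(comp_len))
```

Calling help(comp_len) at the interactive prompt shows the same text in pydoc's layout.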
