Tuesday, April 03, 2012

De-google-izing search results

This keeps happening to me so I wrote a script to deal with it:
  • Google search on something, say "measuring software modularity"
  • See some PDF files in the results
  • "Open in new tab" one of those PDF results
  • The new tab's address is a long Google redirect URL that carries the real address, percent-encoded, in a url= parameter
  • A "what to do with this file?" dialog box appears, suggesting
    • Save the file;
    • Open with something (Acroread, Preview, okular, kpdf, xpdf, etc.)
And what you're left with when this is done is the ridiculous URL above.

What I want instead of the above monstrosity is simply http://rise.cs.drexel.edu/~sunny/papers/acom08_drh.pdf.

Here's some code that'll do that. I have python2.3.something on this Mac OS X 10.4 powerbook (from 2006) and...

#!/usr/bin/python -utt
# vim:et
'''Given a google-ized URL on stdin return the URL of interest.
Example input:

Example output:

'''

import sys
import urllib

USTART = '&url='
UEND = '&'

def main(infile):
    for aline in infile:
        aline = aline.rstrip()
        ustart_at = aline.find(USTART)
        if ustart_at < 0:
            print "Can't find '%s'; ignoring" % USTART
            continue  # nothing to extract from this line; move on to the next
        url_start = ustart_at + len(USTART)
        url_end = aline.find(UEND, url_start)
        if url_end == -1:
            url_end = None
        print urllib.unquote(aline[url_start:url_end])

if __name__ == '__main__':
    main(sys.stdin)

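For anyone on a newer Python, here is a rough sketch of the same idea in Python 3, where unquote has moved into urllib.parse. Rather than searching for '&url=' by hand, it lets parse_qs split the query string and decode the values. The function name extract_target and the wrapped URL in the example are made up for illustration; the example is not the URL from this post.

```python
# Python 3 sketch; the script above targets Python 2.3.
from urllib.parse import urlsplit, parse_qs


def extract_target(wrapped_url):
    """Return the value of the 'url' query parameter, or None if absent."""
    query = urlsplit(wrapped_url).query
    # parse_qs percent-decodes values and maps each key to a list of values.
    values = parse_qs(query).get('url')
    if values:
        return values[0]
    return None


# Hypothetical wrapped URL, invented for this example.
example = ('http://www.google.com/url?sa=t&source=web'
           '&url=http%3A%2F%2Fexample.org%2Fpapers%2Fdemo.pdf&ei=abc')
print(extract_target(example))  # prints http://example.org/papers/demo.pdf
```

One difference from the script above: this version also copes with url= being the first parameter (right after the '?'), which the '&url=' search would miss. Note that parse_qs turns '+' into a space, which plain unquote does not.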