- Google search on something, say "measuring software modularity"
- See some PDF files in the results
- "Open in new tab" one of those PDF results
- The new tab has something that looks like
  http://www.google.com/url?sa=t&rct=j&q=measuring%20software%20modularity&source=web&cd=6&ved=0CGEQFjAF&url=http%3A%2F%2Frise.cs.drexel.edu%2F~sunny%2Fpapers%2Facom08_drh.pdf&ei=CC57T-XGBYb20gGJ7eSvBg&usg=AFQjCNF4WWHzd2WFkx-6_YupSut7Z4XVNA&cad=rja
- A "what to do with this file?" dialog box appears, suggesting
  - Save the file;
  - Open with something (Acroread, Preview, okular, kpdf, xpdf, etc.)
What I want instead of the above monstrosity is simply http://rise.cs.drexel.edu/~sunny/papers/acom08_drh.pdf.
Here's some code that does that. I have Python 2.3.something on this Mac OS X 10.4 PowerBook (from 2006) and...
#!/usr/bin/python -utt
# vim:et
'''Given a google-ized URL on stdin return the URL of interest.

Example input:
http://www.google.com/url?sa=t&rct=j&q=measuring%20software%20modularity&source=web&cd=2&ved=0CEUQFjAB&url=http%3A%2F%2Fwww2.dbd.puc-rio.br%2Fpergamum%2Ftesesabertas%2F0410867_08_cap_02.pdf&ei=CC57T-XGBYb20gGJ7eSvBg&usg=AFQjCNEEhsr8h5IOUWtKMoIMk7eMSdi41A&cad=rja

Example output:
http://www2.dbd.puc-rio.br/pergamum/tesesabertas/0410867_08_cap_02.pdf'''

import sys
import urllib

USTART = '&url='   # marks the start of the embedded target URL
UEND = '&'         # the next parameter separator ends it

def main(infile):
    for aline in infile:
        aline = aline.rstrip()
        ustart_at = aline.find(USTART)
        if ustart_at < 0:
            print "Can't find '%s'; ignoring" % USTART
            continue
        url_start = ustart_at + len(USTART)
        url_end = aline.find(UEND, url_start)
        if url_end == -1:
            # No further parameters: the URL runs to the end of the line.
            url_end = None
        # Undo the percent-encoding (%3A -> ':', %2F -> '/', ...).
        print urllib.unquote(aline[url_start:url_end])

if __name__ == '__main__':
    main(sys.stdin)
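For anyone on a newer interpreter, here is a minimal sketch of the same idea, assuming Python 3 is available (the script above targets Python 2.3). Instead of searching for the literal string `&url=`, it lets the standard library parse the query string, so it should also cope with `url` being the first parameter. The function name `extract_target_url` is made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs


def extract_target_url(google_url):
    """Return the decoded 'url' query parameter, or None if it is absent.

    Hypothetical helper for illustration; assumes Python 3.
    """
    # Split off the query string (everything after '?').
    query = urlsplit(google_url).query
    # parse_qs maps each parameter name to a list of percent-decoded values.
    values = parse_qs(query).get("url")
    return values[0] if values else None
```

Calling `extract_target_url()` on the redirect URL from the steps above should give back the plain `http://rise.cs.drexel.edu/~sunny/papers/acom08_drh.pdf` address, and a URL with no `url` parameter gives `None` rather than an error message.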