Wednesday, July 25, 2012

Sorting QIF records... a Python3 script

As somewhat of a modern Luddite, I use Quicken® 2002 basic for my home accounting; fortunately, I can download ".qif" reports for my accounts.

But one credit-card account doesn't give me the records in an order I can understand. Usually, I want to know the date range of last week's download, so I type something like
$ grep "^D" visa.QIF | sort
and based on the output, I'll search for transactions dated the following day or later. This way, I won't confuse myself by seeing the same transactions I saw last week (this is where déjà vu is for real).

Tonight I was confused the other way: I ended up with a .QIF file that was missing some transactions. As usual, the transactions were listed in some apparently-random order, and I didn't trust the bank to display transactions on-screen in the same order. Fortunately, the website can be told to display the transactions in some nice order (by transaction date, for example, or post date—ascending or descending). I like this concept, but it's too bad I haven't figured out how to download those transactions in a nice order. It may not be possible.

I wanted to have the on-screen stuff and my QIF file in the same order so I could see what was missing more easily, and feel confident that I had caught everything. How to sort the "QIF" file? For those who don't know, (but why are you reading this?), a QIF file looks like this:


!Type:Bank
D07/16/2012
T-31.30
C*
N
PWITHDRAWAL SMITH'S DEPT STORE -PAYMENT
^
D07/12/2012
T-3,300.00
C*
N
PWITHDRAWAL MASTER CARD -ONLINE PMT
^
…etc.
A few things to note:
  • Entries consist of 6 lines, at least they do here; the 6th line begins with ^ and indicates the end of the entry
  • Dollar amounts are preceded by "T" and use commas to separate thousands
  • Dates are preceded by "D" and are in the American format, sort of: mm/dd/yyyy
Since entries have multiple lines, there's not an easy way to just sort them via the "sort" command. So naturally I wrote some Python. If it weren't so late I'd explain it all in some detail. I at least want to share it with you though:
#!/usr/bin/python3 -utt
# vim:et:sw=4
'''Program to sort records from a quicken 'qif' file.

Parameters: filename'''

import os
import sys
from time import localtime, mktime, strftime, strptime


def main(args):
    '''Read quicken records from a qif file, creating a dict for each one.
    At end of file, sort them by US-style date and print them.'''
    if len(args) != 1:
        print('Expected exactly one parameter, viz., filename; got',
                args, file=sys.stderr)
        sys.exit(1)
    if not os.path.exists(args[0]):
        print("Can't find inputfile %s" % args[0], file=sys.stderr)
        sys.exit(1)
    xactions = list()
    curr = dict()
    for aline in open(args[0], 'r'):
        aline = aline.strip()
        if not aline:
            continue

        akey = aline[0]
        aval = aline[1:]
        # end of record?
        if akey == '^':
            if curr:
                xactions.append(curr)
                curr = dict()
        elif akey == 'T':
            curr[akey] = float(aval.replace(',', ''))   # 1,234 => 1234
        elif akey == 'D':
            curr[akey] = mktime(strptime(aval, '%m/%d/%Y'))
        else:
            curr[akey] = aval

    xactions.sort(key=lambda X: X['D'])
    for one in xactions:
        print('%s $%8.2f %4s %s' % (strftime('%Y-%m-%d', localtime(one['D'])),
                                        one.get('T', 99999.99),
                                        one.get('N', '#?')[:4],
                                        one.get('P', '<payee unspecified>')))

def date_cmp(a, b):
    '''Compare dates of two qif-based entries.'''
    return cmp(a['D'], b['D'])

if __name__ == '__main__':
    main(sys.argv[1:])
Oh, the date_cmp() function is struck out, because it's not needed in this program. Why did I write it then? Because I was thinking of the way sorting works in python2.x, where you pass a comparison function into sort() if you want to use a non-default one, This post from last year shows how that one works. We don't need it here, though, and actually can't use it, because the sort of Python3 uses a "key" function, rather than a "cmp" function.

With that out of the way, here's how it works, basically: we read the input file ("Activity.QIF" or "Quicken.qif" or whatever) one line at a time. when we see a '^' then we're done with one entry and stash it in a list.

Each entry, by the way, is a python "dict" (like a hash for you Perl guys) -- the key is the first character in each line, and the value is everything after that. For some reason, I wanted to convert the dollar amounts into floating-point numbers (what was I thinking?) and in order to do that, I needed to kill off any ','; that's why the
float(aval.replace(',', ''))
construct above.

For the dates, I wanted to be able to sort them easily, and I thought the easy way to do that was to convert them into scalar values corresponding to "number of seconds since the epoch", so that's what the mktime/strptime stuff is about. By the way, I usually like to code like "import os" and then later "os.path.exists(...)." This avoids embarrassing name collisions, and besides makes it clear where each thing came from. But with all those "time" routines, is there any doubt where they came from? I mean, localtime, mktime, str*time.

When we get to the end of the file, we've got all the records, so we sort them. A sanity-check (to make sure we don't have data left over in curr) might be nice, but that's "an exercise for the reader."

As I mentioned above, the sorting uses "key" rather than "cmp", and the "key" function just takes the value in each entry corresponding to the 'D' element—a less-than-one-liner, so I used a lambda to say, "hey, map X to X['D']"; it's not worth a whole "def funcName(foo):"

We then take the sorted list and print one element at a time: date, dollar amount, "check number" (that's what the "N" is), and payee. (Is it something else for deposits? I dunno.)

If it weren't past my bedtime, I'd explain further. But it is, so I won't. Except to point out that in python3, "print" is a function, not a statement. I may add more explanation later.

No comments: