Sunday, February 08, 2009

zgrep: my new friend and timesaver

Short version: In case I'm not the last Linux user in the world to hear about zgrep, I introduce it here in narrative form.

Recently I've been getting a flood of spam, mostly bounces caused by a forged return address beginning with "dcqowad" (sometimes with more characters after it). My friend procmail helped me toss these, but I wanted to get a feel for how many of them I'd been getting over the past few days.

No problem -- I have verbose logging on in procmail, and it all goes into a file vlog. Every day I rename "vlog"→"vlog.XX" where XX is the day of the month. So I blithely typed
$ grep -l 'procmail: Match on.*dcqowad' vlog*
vlog
$
Wha....? Turns out I had forgotten my renaming scheme. The files were getting rather large, so after renaming "vlog"→"vlog.XX", I compress it with gzip. Hence what I've got in the directory is:
$ ls vlog*
vlog        vlog.07.gz  vlog.14.gz  vlog.21.gz  vlog.28.gz
vlog.01.gz  vlog.08.gz  vlog.15.gz  vlog.22.gz  vlog.29.gz
vlog.02.gz  vlog.09.gz  vlog.16.gz  vlog.23.gz  vlog.30.gz
vlog.03.gz  vlog.10.gz  vlog.17.gz  vlog.24.gz  vlog.31.gz
vlog.04.gz  vlog.11.gz  vlog.18.gz  vlog.25.gz
vlog.05.gz  vlog.12.gz  vlog.19.gz  vlog.26.gz
vlog.06.gz  vlog.13.gz  vlog.20.gz  vlog.27.gz
$
The way I've handled this in the past is with something kind of verbose like the following (btw my shell is bash):
$ for f in vlog*; do
    if [[ $f == "${f%gz}" ]]; then
      grep -l 'procmail: Match on.*dcqowad' "$f"
    elif zcat "$f" | grep -q 'procmail: Match on.*dcqowad'; then
      echo "$f"
    fi
  done
vlog
vlog.01.gz
vlog.02.gz
vlog.03.gz
vlog.04.gz
vlog.05.gz
vlog.06.gz
vlog.07.gz
vlog.08.gz
vlog.31.gz
$
What a pain! See what it does? It looks for filenames ending with 'gz' and runs zcat on them, and greps the result -- then, because grep is looking at stdin, I have to check the return value and announce it (oh, but I could have used "grep --label=$f" there). Anyway, it's a pain, especially since I have to type the pattern in twice.

Now there might be a better way to do the loop, something like
PROG=cat
if [[ $f != "${f%.gz}" ]] || [[ $f != "${f%.Z}" ]]; then PROG=zcat
elif [[ $f != "${f%.bz2}" ]]; then PROG=bzcat
fi
$PROG "$f" | grep --label="$f" -l 'procmail: Match on.*dcqowad'
maybe, but that's still somewhat of a pain.
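For what it's worth, the suffix test reads a little more easily as a case statement wrapped in a function, so the pattern only has to be typed once. This is just a sketch, not something from my actual setup: the function name list_matches is made up, and it assumes GNU grep (for --label) plus zcat and bzcat on the PATH.

```shell
# Hypothetical helper: print the names of the files whose (decompressed)
# contents match the pattern given as the first argument.
list_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        # Pick a decompressor from the file suffix; plain files go through cat.
        case $f in
            *.gz|*.Z) prog=zcat ;;
            *.bz2)    prog=bzcat ;;
            *)        prog=cat ;;
        esac
        # grep reads stdin here, so --label supplies the name that -l prints
        # (otherwise it would print "(standard input)").
        "$prog" "$f" | grep --label="$f" -l -e "$pattern"
    done
}
```

It would be called as: list_matches 'procmail: Match on.*dcqowad' vlog*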

So I thought to myself, why not have a little fun and write a script that would look for suffixes like that and just do the right thing? I'd call it zgrep. But before I got very far, I had the feeling that somebody else had certainly run into this before, so I typed:
$ type zgrep
zgrep is /usr/bin/zgrep
$
Whoa -- I was about to waste a bunch of time reinventing the wheel! To see what this was about, I typed:
$ man zgrep
ZGREP(1)                                         ZGREP(1)

NAME
zgrep - search possibly compressed files for a
regular expression

SYNOPSIS
zgrep [ grep_options ] [ -e ] pattern filename...

DESCRIPTION
Zgrep is used to invoke the grep on compress'ed or
gzip'ed files. All options specified are passed
directly to grep. If no file is specified, then
the standard input is decompressed if necessary
and fed to grep. Otherwise the given files are
uncompressed if necessary and fed to grep.

If zgrep is invoked as zegrep or zfgrep then egrep
or fgrep is used instead of grep. If the GREP
environment variable is set, zgrep uses it as the
grep program to be invoked. For example:

for sh: GREP=fgrep zgrep string files
...elided...
Well, another case where the programming virtue of laziness paid off.
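With zgrep, the whole loop should collapse to a single command: the same pattern and the same -l option, with compressed and plain files mixed freely on the command line. Here is a minimal sketch, assuming the zgrep that ships with gzip (it handles -l itself, as far as I know). The scratch directory and the sample log lines are invented for the demonstration; only the pattern and the vlog naming come from the story above.

```shell
# Build a throwaway directory that mimics the layout: one plain log
# and two gzip'ed ones (contents are made up for illustration).
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'procmail: Match on "dcqowad"\n' > vlog
printf 'procmail: Match on "dcqowad"\n' > vlog.01
printf 'procmail: No match on "something"\n' > vlog.02
gzip vlog.01 vlog.02

# One command, one pattern, no suffix checks.
matches=$(zgrep -l 'procmail: Match on.*dcqowad' vlog*)
printf '%s\n' "$matches"
# prints:
#   vlog
#   vlog.01.gz
```

In the real log directory that would simply be: zgrep -l 'procmail: Match on.*dcqowad' vlog*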
