You can guess what happened: photos and music (especially photos) tend to expand to fill the available space. It doesn't help that we have multiple copies of stuff. So I figured a Perl or Python script could help me find those copies.
Since I've become a Python partisan, I went that route. A web search turned up some helpful hints on Stack Overflow and, in particular, a post on endlesslycurious.com. The Mac mini has Python 2.6, so I made a few modifications; you can see the whole thing at http://cpwriter.net/dup2/.
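For readers who want something self-contained, here is a minimal Python 3 sketch of the size-then-hash approach that duplicate finders commonly use. It is not the dup2 script (which targets Python 2.6 and may work differently), and the names `find_duplicates`, `file_digest` and `min_size` are made up for illustration. Only files that share a size get read and hashed, and hard links to one file are counted once. It prints one `[size, [paths]]` list per group, the same shape as the dups.out lines shown further down.

```python
import hashlib
import os
import stat
import sys
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """MD5 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(top, min_size=1):
    """Return [size, [paths]] groups of byte-identical files under top."""
    by_size = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or inaccessible
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices
            inode = (st.st_dev, st.st_ino)
            if inode in seen_inodes:
                continue  # another hard link to a file already counted
            seen_inodes.add(inode)
            if st.st_size >= min_size:  # the default skips empty files
                by_size[st.st_size].append(path)

    groups = []
    for size, paths in sorted(by_size.items()):
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                pass  # unreadable file: leave it out
        for same in by_hash.values():
            if len(same) > 1:
                groups.append([size, sorted(same)])
    return groups


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 dupfind.py /Users > dups.out
    for group in find_duplicates(sys.argv[1]):
        print(group)
```

MD5 is fine for spotting accidental duplicates, but it is not proof of identity; a byte-by-byte `cmp` before deleting anything is cheap insurance.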
I ran that on /Users on the lovely Carol's Mac mini, saving the results in dups.out.
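Going by the grep output further down, each dups.out line holds one group of identical files as a Python-style list: the size in bytes, then the paths. The ` \ ` breaks in those listings look like display wrapping, so this sketch assumes each group really sits on one physical line. Under that assumption, a few lines of Python can total the space that keeping just one copy per group would free. `reclaimable_bytes` is a hypothetical name, and if the finder listed hard links to one file as separate copies, the total would be overstated.

```python
import ast
import sys


def reclaimable_bytes(lines):
    """Total bytes freed by keeping one copy of each [size, [paths]] group."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # literal_eval parses the list safely, without executing anything.
        size, paths = ast.literal_eval(line)
        total += size * (len(paths) - 1)
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 tally.py dups.out
    with open(sys.argv[1]) as f:
        n = reclaimable_bytes(f)
    print(f"{n} bytes, about {n / 2**30:.1f} GiB")
```

Fed just the two "Best of Wedding" groups further down, it returns 1,611,804,549 bytes, about 1.5 GiB.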
mini1:~ collin$ wc -l dups.out
12489 dups.out
mini1:~ collin$

Yep, that's a lot of files. A couple of big offenders:
mini1:~ collin$ grep Best.*Wedding dups.out
[621850501, ['/Users/carol/from-macbook/Movies/Best of Wedding.mov', \
 '/Users/collin/from-pbook/Desktop/Best of Wedding.mov']]
[989954048, ['/Users/carol/from-macbook/Desktop/Redeemer Marriage Series/Best of Wedding-DVD.img', \
 '/Users/collin/from-pbook/Desktop/Best of Wedding-DVD.img']]
mini1:~ collin$

That's about 621 MB and 989 MB, so roughly 1.6 GB (1.5 GiB) freed up just like that. But I think we have a lot more. I found a lot of files under "archives" and "from-pbook" that are identical, like this:
mini1:~ collin$ grep archives dups.out|grep -m5 from-pbook
[1049177, ['/Users/collin/archives/collin-laptop/Pictures/iPhoto Library/2009/01/01/IMG_3001.JPG', \
 '/Users/collin/archives/data1/pix-dec08/img_3001.jpg', \
 '/Users/collin/from-pbook/Pictures/iPhoto Library/2009/01/01/IMG_3001.JPG', '/Users/collin/pix/2008/12/pix-dec08/img_3001.jpg']]
…

Wow, four copies of the same file. Hey, can I get rid of all those pix-dec08 paths? Yes, because:
- a "diff -r archives/data1/pix-dec08 pix/2008/12/pix-dec08" showed that the two directories are identical; and
- every "large" (not a thumbnail or slide) image file under pix/2008/12/pix-dec08/ appeared in dups.out, except the few smaller than 1024×1024 bytes (1 MiB):
mini1:~ collin$ for F in pix/2008/12/pix-dec08/*jpg; do if grep -qF $F dups.out; then : OK; else ls -l $F; fi; done
-rwxr-xr-x 1 collin _lpoperator 1002328 Dec 31 2008 pix/2008/12/pix-dec08/img_2961.jpg
-rwxr-xr-x 1 collin _lpoperator 858104 Jan 1 2009 pix/2008/12/pix-dec08/img_2988.jpg
-rwxr-xr-x 1 collin _lpoperator 863361 Jan 1 2009 pix/2008/12/pix-dec08/img_2994.jpg
-rwxr-xr-x 1 collin _lpoperator 865777 Jan 1 2009 pix/2008/12/pix-dec08/img_2995.jpg
-rwxr-xr-x 1 collin _lpoperator 994298 Jan 1 2009 pix/2008/12/pix-dec08/img_2996.jpg
-rwxr-xr-x 1 collin _lpoperator 811491 Jan 1 2009 pix/2008/12/pix-dec08/img_2997.jpg
mini1:~ collin$
mini1:~ collin$ for F in pix/2008/12/pix-dec08/*jpg; do if grep -qF $F dups.out; then : OK; else \
 Y=`basename $F|tr [:lower:] [:upper:]`; \
 Z=`/bin/ls from-pbook/Pictures/iPhoto\ Library/200*/*/*/$Y`; \
 echo $Z;cmp "$F" "$Z"; echo; fi; done
from-pbook/Pictures/iPhoto Library/2008/12/31/IMG_2961.JPG
from-pbook/Pictures/iPhoto Library/2009/01/01/IMG_2988.JPG
from-pbook/Pictures/iPhoto Library/2009/01/01/IMG_2994.JPG
from-pbook/Pictures/iPhoto Library/2009/01/01/IMG_2995.JPG
from-pbook/Pictures/iPhoto Library/2009/01/01/IMG_2996.JPG
from-pbook/Pictures/iPhoto Library/2009/01/01/IMG_2997.JPG
mini1:~ collin$

cmp printed nothing, so each of those small files has an identical copy in the iPhoto library. So we can kill off both pix-dec08 directories (archives/data1/pix-dec08 and pix/2008/12/pix-dec08). That might have saved another gigabyte or so.
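The two loops above can also be written as a small Python function, which may be easier to rerun on other directories. This is a sketch, not what was actually run; `unlisted_matches` is a made-up name, and like the shell version it assumes the iPhoto copy carries the same name in upper case, three levels below a `200*` year folder (year/month/day).

```python
import filecmp
import glob
import os


def unlisted_matches(src_dir, dups_file, library):
    """For each *jpg in src_dir that dups_file never mentions, look for an
    upper-cased namesake under library/200*/*/*/ and compare contents.

    Returns (path, counterpart or None, identical) tuples.
    """
    with open(dups_file) as f:
        listed = f.read()

    results = []
    for path in sorted(glob.glob(os.path.join(glob.escape(src_dir), "*jpg"))):
        if path in listed:  # like: grep -qF "$F" dups.out
            continue
        name = os.path.basename(path).upper()  # like: tr [:lower:] [:upper:]
        pattern = os.path.join(glob.escape(library), "200*", "*", "*",
                               glob.escape(name))
        candidates = sorted(glob.glob(pattern))
        if not candidates:
            results.append((path, None, False))
            continue
        other = candidates[0]
        # shallow=False compares bytes, as cmp(1) does, not just stat info.
        results.append((path, other, filecmp.cmp(path, other, shallow=False)))
    return results
```

Run from the home directory, `unlisted_matches("pix/2008/12/pix-dec08", "dups.out", "from-pbook/Pictures/iPhoto Library")` would return one tuple per small file; `True` in the last field corresponds to cmp staying silent in the transcript.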
Now, can we maybe hardlink the /Users/collin/archives/collin-laptop/Pictures/ stuff to/from the /Users/collin/from-pbook/Pictures/ stuff? And how much space might that save?
mini1:~ collin$ du -sh archives/collin-laptop/Pictures/iPhoto\ Library/ from-pbook/Pictures/iPhoto\ Library/
12G archives/collin-laptop/Pictures/iPhoto Library/
11G from-pbook/Pictures/iPhoto Library/
mini1:~ collin$

Quite a bit: up to about 11G, if every file in the smaller tree has an identical twin in the larger one. That, plus the roughly 1.5GB from earlier, would be a significant help here:
collin@p3:/mnt/home/collin> df -h .
Filesystem Size Used Avail Use% Mounted on
mini1:/Users 298G 257G 42G 87% /mnt/home
collin@p3:/mnt/home/collin> ssh mini1 df -h .
Filesystem Size Used Avail Capacity Mounted on
/dev/disk0s2 298Gi 256Gi 41Gi 87% /
collin@p3:/mnt/home/collin>

Not sure why the two differ, but there it is. Anyway, I wanted to hardlink one set of files to the other. (Why? Because the from-pbook directory may get rsync'd. If I delete the from-pbook directory, it may come back later. And if I delete the other directory and then decide to remove the files from the pbook, we'll lose the photos. With a hard link, both paths point at the same data, which survives until the last link is removed. So hardlinking is the way to go.) So I wrote this silly script:
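collin@p3:/mnt/home/collin> cat tmp/photos.sh
#!/bin/bash
# For each file under D1, find the file at the same relative path under D2.
# If both are identical but not yet hard-linked, print an ln command that
# keeps the older file and replaces the newer one with a link to it.
D2="archives/collin-laptop/Pictures/iPhoto Library"
D1="from-pbook/Pictures/iPhoto Library"
find "$D1" -type f | while IFS= read -r AFILE; do
    SUB=${AFILE#"$D1"/}
    #echo SUB=$SUB
    BFILE=$D2/$SUB
    # non-empty twin, not already the same inode, and identical contents
    if [[ -s $BFILE ]] && [[ ! "$AFILE" -ef "$BFILE" ]] && cmp -s "$AFILE" "$BFILE"; then
        # the '...' quoting breaks on names that contain a single quote
        if [[ $AFILE -ot $BFILE ]] ; then
            echo ln -f "'$AFILE'" "'$BFILE'"
        else
            echo ln -f "'$BFILE'" "'$AFILE'"
        fi
    fi
done
collin@p3:/mnt/home/collin> time tmp/photos.sh > foo.out
real 41m18.878s
user 0m11.009s
sys 0m33.146s
collin@p3:/mnt/home/collin>

I then ran it as you see above. A quick look through foo.out seemed reasonable. In hindsight I should have run it on mini1 rather than on the NFS client: 41 minutes of wall-clock time against about 44 seconds of CPU time (user plus sys) means it spent nearly all that time waiting, presumably on NFS reads. The same goes for this next step: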
collin@p3:/mnt/home/collin> df -h .; ./foo.out; df -h .
Filesystem Size Used Avail Use% Mounted on
mini1:/Users 298G 257G 42G 87% /mnt/home
-bash: ./foo.out: Permission denied
# D'oh! I didn't say "chmod +x"; well, let me fix it the easy way...
Filesystem Size Used Avail Use% Mounted on
mini1:/Users 298G 257G 42G 87% /mnt/home
collin@p3:/mnt/home/collin> df -h .; sh ./foo.out; df -h .
Filesystem Size Used Avail Use% Mounted on
mini1:/Users 298G 257G 42G 87% /mnt/home
Filesystem Size Used Avail Use% Mounted on
mini1:/Users 298G 246G 53G 83% /mnt/home
collin@p3:/mnt/home/collin>

Used space dropped from 257G to 246G, about 11G, in line with the du figure for from-pbook. OK, that's enough for now.
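For comparison, here is a Python sketch of what photos.sh does. It is not what was run; `plan_links` and `apply_link` are hypothetical names. It makes the same tests as the script (non-empty twin, not already the same inode, identical bytes) and keeps the older file. Two differences: `apply_link` links under a temporary name and then renames over the target with `os.replace`, a single atomic rename on POSIX, so the replaced name is never missing; and the dry run quotes names with `shlex.quote`, which copes with single quotes. Hard links only work within one filesystem, and running this on mini1 itself would avoid the NFS round trips.

```python
import filecmp
import os
import shlex
import sys


def plan_links(d1, d2):
    """Yield (keep, replace) pairs: a file under d1 and its twin at the same
    relative path under d2, when the twin is non-empty, byte-identical and
    not already the same inode. The older file (by mtime) is kept."""
    for dirpath, _dirnames, filenames in os.walk(d1):
        for name in filenames:
            a = os.path.join(dirpath, name)
            b = os.path.join(d2, os.path.relpath(a, d1))
            # Regular files only, on both sides (no symlinks).
            if os.path.islink(a) or os.path.islink(b):
                continue
            if not (os.path.isfile(a) and os.path.isfile(b)):
                continue
            # Same tests as photos.sh: -s, ! -ef, cmp -s.
            if os.path.getsize(b) == 0 or os.path.samefile(a, b):
                continue
            if not filecmp.cmp(a, b, shallow=False):
                continue
            # Like [[ $AFILE -ot $BFILE ]]: keep whichever is older.
            if os.path.getmtime(a) < os.path.getmtime(b):
                yield a, b
            else:
                yield b, a


def apply_link(keep, replace):
    """Make `replace` a hard link to `keep` via a temporary name plus an
    atomic rename, so `replace` never briefly disappears."""
    tmp = replace + ".linktmp"  # hypothetical suffix; must not already exist
    os.link(keep, tmp)
    os.replace(tmp, replace)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Dry run like photos.sh: print ln commands for review, e.g.
    #   python3 photos.py "from-pbook/Pictures/iPhoto Library" \
    #       "archives/collin-laptop/Pictures/iPhoto Library" > foo.out
    for keep, replace in plan_links(sys.argv[1], sys.argv[2]):
        print("ln -f", shlex.quote(keep), shlex.quote(replace))
```

Separating the plan from the action keeps the review step the shell version had: look over the printed commands first, then call `apply_link` on each pair (or run the printed file) once they look right.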