Mylos

Garbage-collecting your Mac OS X Address Book

After years of Outlook and Palm synchronization, when I fully switched over to the Mac, I moved my contacts database over to Mac OS X’s Address Book (painfully, due to Outlook’s roach-motel tendencies, but that’s another story). For the most part I am satisfied: its data model is far more powerful than Outlook’s, and most Mac apps feature excellent Address Book integration. I can call someone using Skype just by right-clicking on a phone number, for instance.

I just noticed, however, that my scripted Address Book backups were pushing the bounds of the reasonable: over 60MB zipped. For a mere 357 contacts, that seems a tad excessive. Upon further inspection, I realized my Address Book directory ~/Library/Application Support/AddressBook was pushing 152MB, the bulk of it in the Images subdirectory.
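
If you want to check how much space your own Address Book data is using, here is a quick sketch (your numbers will obviously differ):

# Report the size of the Address Book data directory and of its Images subdirectory
du -sh "$HOME/Library/Application Support/AddressBook" \
       "$HOME/Library/Application Support/AddressBook/Images"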

It turns out there are two causes for this problem:

  1. When you drag and drop an image for a contact, then crop it, Address Book keeps the full-size image around, presumably in case you want to change the crop later. In most cases this is unnecessary and wastes space.
  2. Address Book does not seem to remove the images for a contact when you delete it. Worse, those orphaned images are carried over into manual backups, so backing up, blowing away your Address Book directory and restoring from the backup will not get rid of the cruft.

I wrote the short shell script below to back up the AB directory, extract the list of contacts and delete any image that falls into either of the two categories above. This took me down to a much more reasonable 11MB. You can download the zipped script here.

Disclaimer: I tried my best to make this as generic as possible, but I cannot be held responsible if running this script causes you to lose data, so I would advise you to perform your own backup prior to running it.

#!/bin/sh
backup=$HOME/ab_clean_backup.$$
in_ab=$backup/in_ab
all=$backup/all
datadir="$HOME/Library/Application Support/AddressBook"
db=`echo "$datadir"/*.abcddb`

exit_ab() {
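  # force-quit Address Book so it does not modify the database or the Images directory while the script runs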
  echo killing AddressBook
  ps -u `whoami` | grep "Address Book" | grep /Applications | awk '{print $2}'|xargs -n 1 kill -9
}

backup_ab() {
  echo "Backing up address book directory $datadir to backup $backup"
  rm -rf  $backup > /dev/null 2>&1
  mkdir $backup
  ditto "$datadir" "$backup"
}

remove_old() {
  echo extract list of contacts in AB from SQLite
  sqlite3 "$db" "select zuniqueid from zabcdrecord"|cut -d: -f 1|sort > $in_ab

  cd "$datadir"
  cd Images

  # comment out the next two lines if you want to keep full-size originals
  echo removing full-scale images
  rm -f *.jpeg

  echo finding all the images
  ls -1|grep -v '\.jpeg$'|sort > $all

  echo removing images with no associated AB record
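  # image files are named after each contact's unique ID, so any name missing from the SQLite list has no matching record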
  comm -13 $in_ab $all | xargs rm
}

exit_ab
backup_ab
remove_old
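
Should anything go wrong, the backup directory the script creates can simply be copied back; a minimal sketch (substitute the actual ab_clean_backup.NNNN directory name the script printed):

#!/bin/sh
# Restore the Address Book directory from the backup made by the cleanup script.
# Quit Address Book first, then copy the backup over the live directory.
datadir="$HOME/Library/Application Support/AddressBook"
backup="$HOME/ab_clean_backup.12345"   # hypothetical name; use your actual backup directory
ditto "$backup" "$datadir"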

Nehalem Mac Pro first impressions

Some people use laptops as their primary computing environment. I am not one of them. Desktop replacement laptops like the MacBook Pro are heavy, and truly portable ones like my MacBook Air are too limited. Even the desktop replacement ones have limited expandability, slow drives, poor screens and lousy keyboards. My workhorse for the last 5 years was a dual 2GHz PowerMac G5. I am surprised I kept it so long, but I guess that says something about the durability of Macs and how you are not required to get on the hardware upgrade treadmill with each release of the OS. To paraphrase Borges, each increasingly bloated version of Windows makes you regret the previous one. I am also surprised at how much residual value the G5 has kept on eBay.

That said, the G5 was showing its age. Stitching panoramas made from 22MP Canon 5DmkII frames in AutoPano Pro is glacially slow, for instance. I was not willing to switch to the Mac Pro until today because the archaic shared bus on previous Intel chips is a severe bottleneck on multi-processor and multicore performance, unlike the switched interconnect used by PowerPC G5 and AMD Opteron processors, both of which claim descent from the DEC Alpha, the greatest CPU architecture ever designed. The new Xeon 3500 and 5500 Mac Pros use Intel’s new Nehalem microarchitecture, which finally does away with the shared bus in favor of a switched interconnect called QuickPath and on-chip memory controllers (i.e. Intel cribbed AMD’s Opteron innovations).

I splurged on the top of the line Mac Pro, with eight 2.93GHz cores, each capable of running two threads simultaneously, and 8GB of RAM. The standard hard drive options were completely lackluster, so I replaced the measly 660GB boot drive with an enterprise-class Intel X25-E SSD. Unfortunately, at 32GB it is just enough to host the OS and applications, so I complemented it with a quiet, power-efficient yet fast 1TB Samsung SpinPoint F1 drive (there is a WD 2TB drive, but it is a slow 5400rpm, and the 1.5TB Seagate drive has well-publicized reliability problems, even if Seagate did the honorable thing unlike IBM with its defective Deathstars).

I originally planned on using the build-to-order ATI Radeon HD 4870 video card upgrade, but found out the hard way it is incompatible with my HP LP3065 monitor (more below) and had to downgrade back to the nVidia GeForce GT 120. It would have been nice to use BootCamp for games and retire my gaming PC, but I guess that will have to wait. The GT120 is faster than the 8800GTS in the Windows box, in any case.

In no particular order, here are my first impressions:

  • The “cheese grater” case is the same size as the G5, but feels lighter.
  • The DVD-burner drive tray feels incredibly flimsy.
  • Boot times are ridiculously fast. Once you’ve experienced SSDs as I originally did with the MacBook Air, there is no going back to spinning rust.
  • I have plenty of Firewire 800 to 400 cables for my FW400 devices (Epson R1800, Nikon Super Coolscan 9000ED, Canon HV20 camcorder), so I will probably not miss the old FW400 ports, and may not even need a hub (Firewire 800 hubs are very hard to find).
  • The inside of the case is a dream to work with. The drive brackets are easy to swap, the PCIe slots have a spring-loaded retention bar that hooks under the back of the card, and the L brackets are held with thumbscrews, making swapping the cards trivial, with no risk of getting a marginal connection from a poorly seated card.
  • The drive mounting brackets have rubber grommets to dampen vibrations, a nice touch. There also seems to be some sort of contact sensor in the rear, purpose unknown.
  • There are only two PCIe power connectors, so you can only plug in a single ATI 4870 card even though there are two PCIe x16 slots. The GT 120 does not require PCIe power connectors so you would have to expand capacity with one of these. Considering the GT 120 is barely more expensive than the Mini DisplayPort to Dual-Link DVI adapter cable, it makes more sense to get the extra video card if you have two monitors.
  • The entire CPU and RAM assembly sits on a daughterboard that can be slid out. This will make upgrading RAM (when the modules stop being back-ordered at Crucial) a breeze.
  • Built-in Bluetooth means no more flaky USB dongles.
  • No extras. The G5 included OmniGraffle, OmniOutliner, Quickbooks, Comic Life and a bunch of other apps like Art Director’s Toolkit. No such frills on the Mac Pro even though it is significantly more expensive even in its base configuration.
  • The optical out is now 96kHz 24-bit capable, unlike the G5, which was limited to 44.1kHz 16-bit Red Book audio. I have some lossless 192kHz studio master recordings from Linn Records, so I will have to get a USB DAC to get full use out of them. I am not sure why Apple cheaped out on the audio circuitry in a professional workstation that is going to be heavily used by musicians.
  • The G5 was one of the first desktop machines to have a gigabit Ethernet port. Apple didn’t seize the opportunity to lead with 10G Ethernet.
  • The annoying Mini DisplayPort is just as proprietary as ADC, but without the redeeming usability benefits of using a single cable for power, video and USB. DisplayPort makes sense for professional use with high-end monitors like the HP Dreamcolor LP2480zx that can actually use 10-bit DACs for ultra-accurate color workflows. There is no mini to regular DisplayPort adapter, unfortunately. Well, the thinness of the cable is a redeeming feature. Apple has always paid attention to using premium ultra-flexible cables everywhere from the power cord to Firewire.
  • Transferring over 800GB of data from the old Mac is utterly tedious, even over Firewire 800 using Target Disk mode on the G5…
  • As could be expected, the Mac Pro wipes the floor with the G5, as measured by Xbench. A more interesting comparison is with the MacBook Air, which also uses an SSD, albeit a slowish one.

Note (2009-03-17):

The BTO upgrade ATI Radeon HD 4870 video card I initially ordered won’t recognize my HP LP3065 30″ monitor, at least not on the Dual-link DVI port, which essentially renders it useless for me.

Update (2009-03-18):

I went to the Hillsdale Apple Store. The tech was very helpful, but we managed to verify that the ATI Radeon HD 4870 card works fine on an Apple 30″ Cinema Display (via Dual-link DVI) and on a 24″ Cinema LED display (via mini-DisplayPort). The problem is clearly an incompatibility between the ATI Radeon HD 4870 and the HP LP3065.

I am not planning on switching monitors. The HP is probably the best you can get under $3000, and far superior in gamut, ergonomics (tilt/height adjustments) and connectivity (3 Dual-link DVI ports) to the current long-in-the-tooth Apple equivalent, for 2/3 the price. My only option is to downgrade the video card to an nVidia GeForce GT 120. I ordered one from the speedy and reliable folks at B&H and should get it tomorrow (Apple has it back-ordered for a week).

Update (2009-03-19):

I swapped the ATI 4870 for the nVidia GT120. The new card works with the monitor. Whew!

Update (2009-03-29):

I have just learned disturbing news about racial discrimination at B&H. For the reasons I give on RFF, I can no longer recommend shopping there.

Parallelizing the command-line

Single-thread processor performance has stalled for a few years now. Intel and AMD have tried to compensate by multiplying cores, but the software world has not risen to the challenge, mostly because the problem is a genuinely hard one.

Shell scripts are still usually serial, and increasingly at odds with the multi-core future of computing. Let’s take a simple task as an example, converting a large collection of images from TIFF to JPEG format using a tool like ImageMagick. One approach would be to spawn a convert process per input file as follows:

#!/bin/sh
for file in *.tif; do
  convert "$file" `echo "$file" | sed -e 's/\.tif$/.jpg/'` &
done

This naive approach does not work well. If you have many TIFF files to convert (what would be the point of parallelizing otherwise?), you will fork off too many processes, which will contend for CPU and disk I/O bandwidth, causing massive congestion and degrading performance. What you want is to have only as many concurrent processes as there are cores in your system (possibly a few more, because a tool like convert is not 100% efficient at using CPU power). This way you can tap into the full power of your system without overloading it.
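
The core count need not be hard-coded; here is a minimal sketch of detecting it at run time (assuming OS X’s sysctl or Linux’s getconf; on Solaris, psrinfo | wc -l does the same job):

#!/bin/sh
# Guess how many cores (logical CPUs) are available
CPUS=`sysctl -n hw.ncpu 2>/dev/null`                            # Mac OS X / BSD
[ -z "$CPUS" ] && CPUS=`getconf _NPROCESSORS_ONLN 2>/dev/null`  # Linux
[ -z "$CPUS" ] && CPUS=2                                        # safe fallback
echo "will run up to $CPUS concurrent processes"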

The GNU xargs utility gives you that power with its -P flag. xargs is a UNIX utility that was designed to work around the kernel’s limit on the maximum size of a command line (ARG_MAX, historically as small as a few kilobytes). Instead of supplying arguments on the command line, you supply them as the standard input of xargs, which then breaks them into manageable chunks and passes them to the utility you specify.
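
For instance, the classic non-parallel use of xargs to sidestep that limit in a large source tree (a trivial sketch):

# grep some_pattern */*.c can fail with "Argument list too long" in a big tree;
# xargs splits the file list into as many grep invocations as necessary.
find . -name '*.c' -print | xargs grep -l some_pattern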

The -P flag to GNU xargs specifies how many concurrent processes may run at once. Some other variants of xargs, such as OS X’s non-GNU (presumably BSD) xargs, also support -P, but Solaris’ does not. xargs is very easy to script and can provide a significant boost to batch performance. The previous script can be rewritten to use 4 parallel processes:

#!/bin/sh
CPUS=4
ls *.tif|sed -e 's/.tif$//g'|gxargs -P $CPUS -n 1 -I x convert x.tif x.jpg
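
One caveat: the ls | sed pipeline above breaks on file names containing whitespace. Here is a sketch of a whitespace-safe variant, using find -print0 (GNU or BSD find), GNU xargs’ -0 flag and a small sh -c wrapper to derive the output file name:

#!/bin/sh
CPUS=4
# NUL-delimited names survive spaces; the wrapper strips .tif and appends .jpg
find . -name '*.tif' -print0 | \
  gxargs -0 -n 1 -P $CPUS sh -c 'convert "$1" "${1%.tif}.jpg"' sh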

On my Sun Ultra 40 M2 (2x 1.8GHz AMD Opterons, single-core), I benchmarked the gxargs -P conversion against 920MB of TIFF files. As could be expected, going from 1 to 2 concurrent processes improved throughput dramatically, while going from 2 to 3 yielded marginal improvements (convert is pretty good at utilizing the CPU to the max). Going from 3 to 4 actually degraded performance, presumably due to the kernel overhead of managing the contention.

[Chart: throughput converting 920MB of TIFF files vs. number of concurrent convert processes]

Another utility that can be parallelized is GNU make, using its -j flag. I parallelize as many of my build procedures as possible, but for many open-source packages the usual configure step is not parallelized (because configure does not really understand the concept of dependencies). Unfortunately, there are also too many projects whose makefiles are missing dependencies, causing parallelized makes to fail. In this day and age of Moore’s law running out of steam as far as single-task performance is concerned, harnessing parallelism using gxargs -P or gmake -j is no longer a luxury but should be considered a necessity.
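
A typical invocation for a well-behaved autoconf-style package might look like this (a sketch; gmake is GNU make, and the makefile must have complete dependencies for -j to be safe):

#!/bin/sh
CPUS=`sysctl -n hw.ncpu 2>/dev/null || echo 4`   # core count, with a fallback
./configure                                      # configure itself remains serial
gmake -j $CPUS                                   # up to $CPUS compile jobs in parallel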

Large sensor compact cameras finally on the horizon

I have stated on the record that my dream camera is a digital Contax T3 with an APS-C size sensor (or larger). Sigma launched the DP1, the first large-sensor compact, this year, but it is a flawed camera: very sluggish, with a slow f/4 lens, and its Foveon sensor tops out at ISO 800, making it in practice a less capable low-light camera than my Fuji F31fd.

A few weeks ago, Olympus and Panasonic announced the Micro Four Thirds specification, which would allow for interchangeable-lens compact cameras with a larger sensor than the nasty, tiny and noisy ones used in most compacts. Unfortunately, it seems the whole misguided Four Thirds effort is destined to flounder, just as APS did compared to 35mm, despite the undeniable convenience. The 18×13.5mm sensor size has significantly less area than an APS-C sensor, and all Four Thirds cameras made so far have predictably poor low-light performance.

In a much more promising development, Samsung announced today that, since it is finding it very hard to dislodge Canon and Nikon from their top positions in DSLRs or even make a dent, it will create an entirely new segment of professional-quality compact cameras using the same APS-C sensors as its DSLRs, due in 2010. Samsung uses the Pentax lens mount for its DSLRs and has a long-established relationship with Schneider Kreuznach. Pentax makes some very nice pancake lenses that combine high optical quality with small size. The only other company making pancake lenses is Olympus, but its 25mm f/2 is saddled with the aforementioned Four Thirds sensor and all the limitations that entails.

At the same time, Thom Hogan has echoed rumors of an APS-C size Coolpix compact from Nikon. It looks like the big camera manufacturers can no longer afford to ignore the pent-up demand for this category, as demonstrated by the brisk sales of the DP1 (No. 49 on Amazon’s Digital SLR chart).

Update (2010-10-06):

There is now a wide variety of large-sensor compacts, including models with interchangeable lenses:

  • Sigma DP1, DP1s, DP2, DP2s and DP2x: wonderful optics, compact, great image quality, mediocre high-ISO performance, very slow AF and user interface
  • Olympus EP-1, EP-2 and E-PL: cute design, sensor stabilization, poor ISO performance, slow AF, so-so optics unworthy of the Zuiko legacy (but you can use Panasonic’s lenses on them)
  • Panasonic GF1: great design, solid but heavy, mediocre ISO performance, very fast AF, great optics
  • Leica X1: great optics, best high-ISO performance, excellent user interface, very compact and light, slow AF, no video, very expensive. The camera I carry with me every day in my jacket pocket.
  • Sony NEX-3 and NEX-5: great high-ISO performance, poor user interface, very compact, awkward 24mm-e focal length if you want a compact lens. Made by an evil company that should be boycotted.
  • Samsung NX100: disappointing high-ISO performance for an APS-C sensor, optical quality a question mark.
  • Fuji X100: bulky, innovative viewfinder design, questionable user interface in the prototype, potential for greatness, but we will have to wait for the final production models.

Canon and Nikon are late to the party, and risk being marginalized if they continue to ignore market demand.

Update (2013-01-28):

The range of worthwhile options has expanded even further:

  • Fuji X100s: builds on the X100 with fast AF and an even better sensor. I have been using the X100 as my “every day carry” camera for two years now and have the X100s on preorder.
  • Leica X2: mostly fixes the AF sloth of the X1 and adds an EVF option. If it weren’t for the X100s, it would be a very compelling camera.
  • Sigma DP1 and DP2 Merrill: still slow, much higher resolution sensor, at the cost of greater bulk. Outstanding image quality at low to moderate ISO that dwarfs all but the highest end DSLRs and medium format digital. Very poor battery life. Poor software workflow options (not supported by Lightroom).
  • Sony RX1: extremely expensive, but great sensor and build quality. So-so AF. Sharp lens, but with very high distortion; it can be corrected in software, at the cost of some resolution.
  • Sony RX100: very compact, fast AF, versatile zoom lens. Decent ISO performance due to its 1″ sensor (in reality 13.3mm x 8.8mm; the 1″ is deceptive vacuum-tube terminology that compact camera makers use to disguise just how tiny their sensors are).
  • Canon EOS-M with 40mm pancake lens: very compact, especially considering it has an interchangeable lens mount, excellent image quality, slow AF, questionable ergonomics.
  • Canon G-X: decent optical and sensor performance but clunky design that is neither fish nor fowl, neither really compact nor flexible like a compact system camera.
  • Olympus E-PM5: reportedly very good ISO performance and AF. Wide range of m43 lenses available.

The only manufacturer missing is Nikon, which for whatever reason does not have competitive models. Its tiny-sensor Coolpix line is undistinguished in the extreme, and its 1 System, while it has excellent AF, has mediocre low-light performance, is fairly bulky despite the compromised sensor, and is not competitive with the Sony RX100.

r n m restaurant

This content is obsolete and kept only for historical purposes

[Photo: r n m entrance]

I have just eaten what is hands-down my best meal of the year at r n m restaurant (their capitalization, not mine), on Haight & Steiner in the Duboce Park/Lower Haight district of San Francisco (not to be confused with the formerly raffish and now utterly commercialized Haight-Ashbury).

The restaurant is named after the chef-owner Justine Miner’s father, Robert Miner, a co-founder of Oracle. The food was so good I am almost ready to forgive Oracle for their sleazy extortion tactics…

I started with the Parisian-style tuna tartare with waffle chips, microgreens and a quail egg, a very classic dish (and one too often botched by careless chefs), given a little pep with a slight acidity. It was followed by an absolutely outstanding pan-roasted local halibut on ricotta gnocchi with asparagus and morel mushroom ragoût, meyer lemon vinaigrette and mâche. The halibut was crisp outside, flaky inside. The ragoût was simply wonderful, a deep, rich and tangy broth, also slightly acidulated, with a generous helping of precious black morels. To top it off, the dessert, a peach and cherry crisp with home-made blueberry gelato, brought together two of my favorite summer fruits in an unbeatable combination.

Be advised the parking situation in that neighborhood is particularly nightmarish, even by SF standards. If I had realized they offer valet parking, I wouldn’t have had to park half a mile away (after seeking a spot in vain for nearly half an hour).

Update (2012-09-05)

Unfortunately, it closed at least a year ago.