Mylos

A contrarian take on Delicious Library 2

On Friday I yielded to the hype, and after cursory testing, I purchased a copy of Delicious Library 2. The clincher was the new feature that allows you to inventory your physical posessions like electronics or cameras, and publish them in HTML format for insurance purposes.

Unfortunately, after some slightly less cursory use of the product, it is deeply unsuited to this purpose. To think I actually upgraded my home Powermac G5 from Tiger to Leopard just to use this software…

For starters, on my dual 2Ghz G5, when running in a window on my secondary 30″ monitor, the program is slow as molasses. With a library with only 3 items total, entering data fields is a one character per second tar pit. Moving the window back to the primary 23″ monitor helped only a bit.

Secondly, the data model is simplistic. For all practical purposes, gadgets are treated just like books, with some repurposing of fields. The all-important serial number can’t even be displayed in column view.

Third, even basic tasks are not handled properly. I have a Symbol CS1504 pocket scanner, which is one quarter the price or size of the Bluetooth scanner Delicious Monster sells, and has a 500 barcode memory, so you can actually use it away from your computer. Using my Python driver, I scanned some books’ bar codes, dumped a text file of ISBNs and imported it into DL2. The import mapper allows you to specify which fields of the tab-separated text file go into which field of the DL2 data model. You would expect it to retrieve book detail automatically, but it does not do so. Worse yet, the “retrieve book details from the Internet” menu is grayed out when you select one of the imported books.

There are also some fit-and-finish issues. Right-clicking to get the context menu and selecting the “View in Amazon” option does not do anything. Perhaps this is due to the fact Camino is set to be my default browser, but the other way to view book details on Amazon works (hovering the mouse over the book cover thumbnail, then clicking on the overlaid eye icon that appears when you hover).

On the plus side, the HTML export works quite well, and the loan manager probably does as well, but given the shortcomings of the current version, I would not advise using it unless all you want to manage is books, CDs and DVDs, and you can afford to buy their expensive Bluetooth scanner to use in wireless semi-tethered mode.

Christopher Elbow chocolates

A few months ago, a new chocolate shop opened in Hayes Valley. Christopher Elbow chocolates is based in Kansas City, not a place that immediately springs to mind when the Great American Chocolate Renaissance is discussed. I had bought some of their products from Cocoa Bella, however, and knew they were good, if pricey.

Christopher Elbow

They sell moderately expensive chocolate bars (the No. 10 41% milk chocolate with hazelnuts is pretty good), drinking chocolate, and bouchéees. The latter are a little too bleeding edge for my taste (spices do not belong in chocolate), but the Bourbon Pecan is to die for, a light and moist, pecan marzipan, almost creamy despite the deliberately roughly chopped texture, and topped with ganache. Not surprisingly, it is usually sold out at the other outlets..

The real draw, as far as I am concerned, is the hot chocolate. Dark, rich, creamy and thick, specially if you ask them to blend it with genuine praline, it is absolutely delicious. You can enjoy it in the twee little salon in the corner of the store before a concert at the nearby Symphony, or shopping in Hayes valley. If you are in the neighborhood, try also Miette Confiserie.

The value of over-the-counter service

My primary computer is a dual 2GHz PowerMac G5 until I can upgrade it with a Nehalem Mac Pro, most likely around the end of the year or early next year. I bought it in 2004, along with a 23″ Apple Cinema HD (the old pinstripe plastic bezel kind with an ADC connector). Unfortunately, about a year ago the CCFL backlight on the monitor started turning pink from old age, and thus unusable in a properly color-managed photographic workflow.

I used that as an excuse to splurge on a humongous (and agoraphobia-inducing) HP LP3065 30 inch LCD monitor after reading the glowing reviews. The two features that sold me were the enhanced color gamut (the only way to improve that would be to get a $6000 Samsung XL30, something I am not quite prepared to do), and the fact it has 3 built-in DVI ports, so it can easily be shared by multiple computers (assuming they support dual-link DVI, which unfortunately my basic spec Sun Ultra 40 M2 does not). The fact it was 25% less expensive than the Apple 30″ Cinema Display helped, of course.

About 6 months ago, I discovered there was a fine pink vertical line running across the entire height of the monitor, roughly 25 centimeters from the left. Since I primarily use that monitor for photo (the primary monitor for Mail, web browsing or terminals remains the Apple), at first I worried there was a defect with my camera. I managed to reproduce the problem with my MacBook Pro (they have dual-link DVI, unlike lesser laptops), and called HP support (the 3 year HP warranty was also an important consideration when I purchased).

My first support call in November 2007 went well, and the tech told me I would be contacted to arrange for an on-site exchange. This is a seriously heavy monitor and I did not relish the idea of lugging it back to FedEx, so getting premium support for a business-class monitor sounded an attractive proposition. Unfortunately, they never did call back, and as I had other pressing matters to attend to involving international travel, I just put it out of my mind (it is a very subtle flaw that is not even always visible).

I only got around to calling them back a few weeks ago. Unlike in November, I was given the run-around with various customer service reps in India until I was finally routed to a pleasant (and competent) tech in a suburb of Vancouver (the US dollar going in the direction it is, you have to wonder how much longer before HP outsources those call centers back to the US). The problem is not with Indian call centers, in any case, all but one of the CSRs were very polite (I suspect Indians learn more patience as they grow up than pampered Americans or Europeans would). The problem is poorly organized support processes and asinine scripts they are required to go through if they want to keep their jobs. In any case, the Canadian rep managed to find the FRU number and also told me someone would call to schedule an appointment. Someone did call this time, to let me know the part was back-ordered and they would call me when it becomes available.

This morning, as I was heading for the shower, my intercom buzzed. It was a DHL delivery man with the replacement monitor. I had to open the door to him in my bath robe. Naturally, nobody at HP bothered to notify me and had I left earlier, I would have missed him altogether.

One of the great things about Apple products is that if you live near an Apple store, you can just stop by their pretentiously-named Genius bars and get support for free (though not free repairs for out-of-warranty products, obviously). I now have a fully working HP monitor again, so I suppose I can’t complain too loudly, but the Apple monitor with the sterling support looks like the true bargain in hindsight.

Backing up is hard to do (right)

You can never overstate the importance of backups. Over the last year I have put quite a bit of effort in making sure my data is backed up properly. The purpose of this article is not to describe backup best practices (that is a vast subject, there are other, better resources available on the web, and in any case there is no one-size-fits-all solution). I am just documenting my setup, the requirements that drove it, and possibly give readers some ideas.

The first part in planning for backup is to do an inventory of the assets you are trying to protect. In my case, in order of priority:

  • 1.5GB of scans of important documents: birth certificates, diplomas, invoices, legal documents, bank statements, and so on. This data is very sensitive, and should be encrypted.
  • 150GB of digital photos and scans
  • My address book, which lives on my laptop
  • My source code repositories
  • My personal email, approximately .75GB
  • The contents of this website, about 5GB
  • 190GB of music (lossless rips of my CD collection)
  • My Temboz article database

Thus the total storage capacity required for a full backup is reaching the 400GB mark. This in itself precludes DVD-R or even tape backup (short of buying an expensive LTO-4 tape drive or an autoloader, that is).

The second step is to devise your threat model. In my case, by decreasing order of likelihood:

  1. Human error
  2. Hard drive failure
  3. Software failure (e.g. filesystem corruption)
  4. Silent data loss or corruption, e.g a defective disk
  5. Theft
  6. Fire, earthquake, natural disaster, etc.

Third, some general principles I believe in:

  • Do not use proprietary backup formats. The best format is plain files on a filesystem identical in structure to the original.
  • Do not rely on offline media for backups. The watched pot does not boil over, online data is much less likely to go bad without my noticing until it is too late.
  • A backup plan needs to be effortless to be successful. Plugging in external drives when backups are needed, or rotating drives between home and office is something I have tried, but not stuck to.
  • Backups should be verified — they should generate positive feedback, so that the absence of feedback can alert to problems
  • For all types of data, there should be one and only one reference machine that holds the authoritative copy. Multi-master synchronization and replication is possible using tools like Unison, but is much harder to manage and increases the risk of human error.

With these preliminaries out of the way, here is my system:

  • My primary backups reside on my home server, a Sun Ultra 40 M2 workstation, running Solaris 10. This machine is very quiet, so I can keep it running in the room next to my bedroom without disturbing my sleep. It is also relatively power-efficient at 160W with seven hard drives.
  • One of the seven drives is the 160GB boot drive, and the other six are 750GB Seagate drives configured in a 3TB ZFS RAID-Z2 storage pool.
  • With large SATA drives, reconstruction after a drive failure is long and the risk of another drive failing due to the stress of rebuilding is not negligible. RAID-Z2 can tolerate two drives failing, unlike RAID 5 which can only tolerate a single drive failure. This level of data protection is higher than RAID 1 since RAID 1 won’t protect you if two drives that are the mirror of one another fail. You can get the same level of protection in RAID 6 or RAID-DP.
  • I have scripts to take ZFS snapshots daily, equivalent to the auto-snapshot service. The daily snapshots are kept for the current month, then I keep only monthly snapshots. Snapshots are the primary line of defense against human error.
  • Snapshot technology consumes only as much disk space as required to store the differences between the snapshot and current versions of a file, and is much more efficient than schemes like Apple’s Time Machine where a single byte change to a multi-gigabyte file like a Parallels virtual disk image will cause the entire file to be duplicated, wasting storage. Because snapshots are taken near instantly and cost almost nothing, they are an extremely powerful feature of a storage subsystem.
  • I backup from my various machines to the Sun via rsync over ssh. An incremental backup of my PowerMac G5, which has most of the 400GB in my backup set, takes less than 5 minutes over Gigabit Ethernet, despite the ssh encryption.
  • ZFS is probably the best filesystem, bar none, but it is not perfect, as demonstrated by the Joyent outage and you still need another copy for backup in case of ZFS corruption.
  • Every night at 2AM a cron job on my old home server (2x400GB, ZFS RAID 0), that I now I keep at work, pulls updates from the Sun using rsync over ssh (the company firewall won’t let me push updates to it from the Sun). Another cron job at 8AM kills any leftover rsync processes, e.g. if there are more data changes to transfer than fit in the 1-2 GB that can be transferred in 6 hours over my relatively pokey 320-512kbps DSL uplink (no thanks to AT&T’s benighted refusal to upgrade its tired infrastructure).
  • My cron jobs use verbose output which generates an email sent back to me. I could suppress those messages, but then I would lose the ability to detect errors.
  • A last line of defense is to back up my server at work to a D-Link DNS-323 NAS box using rsync over NFS. This cute little unit holds two Western Digital Green Power 1TB drives in RAID 1, which slide right in, no tools required. It consumes next to no power or desk space. Since it runs Linux and is easy to extend using fun-plug, I could conceivably run the cron and rsync from there. As a bonus, the built-in mt-daapd server streams my entire music collection to iTunes over the LAN so I can listen to any of my CDs at work.
  • It can take a few days for this data bucket brigade to catch up with a particularly intense photo shoot, but it will eventually and is never too far behind. This provides me with near continuous data protection and disaster recovery.

Update (2009-10-07):

I made some changes. My office backup server is now an inexpensive Shuttle KPC 4500 running OpenSolaris 2009.06 and a 1TB drive. It in turn backs up to the DNS-323, although I need to qualify the recommendation – like many embedded Linux devices, the DNS-323 has a distressing tendency to get wedged every now and then, requiring a reboot, and is not reliable enough as primary offsite backup in my book. OpenSolaris, of course, is rock-stable, and the hardware is not much more expensive (I paid $400 for the KPC).

My backups are now much faster since I upgraded to 20Mbps symmetric Metro Ethernet service from Webpass a month ago.

Update (2014-01-09):

Since I moved to a semi-suburban house two years ago and had to revert to AT&T’s abysmally slow DSL service, remote backups over rsync are no longer a viable option and I have to use sneakernet. My current setup is:

  • A Time Machine backup onto a 4TB internal drive inside my Mac
  • hourly rsync backups onto a 2TB WD My Passport Studio. I actually have two of these and rotate them between home and office. They have a metal case (helps heat dissipation and increase drive lifetime and reliability) as well as hardware AES encryption

Push recruiting

As I was debugging why feedparser is mangling the GigaOM feed titles, I found this easter egg on the WordPress hosted site:

zephyr ~>telnet gigaom.com 80
Trying 72.232.101.40...
Connected to gigaom.com.
Escape character is '^]'.
GET /feed HTTP/1.0
Host: gigaom.com

HTTP/1.0 301 Moved Permanently
Vary: Cookie
X-hacker: If you're reading this, you should visit automattic.com/jobs and
apply to join the fun, mention this header.
Location: http://feeds.feedburner.com/ommalik
Content-type: text/html; charset=utf-8
Content-Length: 0
Date: Thu, 20 Mar 2008 23:36:17 GMT
Server: LiteSpeed
Connection: close

Connection closed by foreign host.

Knowing how to issue HTTP requests by hand is one of my litmus tests for a web developer, but I had never thought of using it in this creative way as a recruiting tool…