Digital photography used to combat domestic violence

There is an interesting article on how digital photography is used by police departments to document cases of domestic violence. Interestingly, even humble 1 megapixel cameras can show details like bruises that eluded the Polaroids the police used before.

Update (2002-09-02): An article in the New York Times (free registration required) on the same subject.

Digital imaging workflow matters

When you first start tinkering with digital imaging, you do things by the seat of the pants, and after a while you realize you need a more disciplined approach to have a manageable setup. The result is called a workflow.

Workflow phases

Each person’s workflow is slightly different, but the following rough steps are common to everyone:

  1. Acquisition: getting the pictures in, whether from a flatbed scanner, a slide/negative scanner, PhotoCD or digital cameras. This also encompasses automated primary cleanup done from within a scanner driver, e.g. Digital ICE3.
  2. Reviewing: deleting dud pictures, and if you have duplicates, selecting only the best one.
  3. Asset management: cataloging your pictures in a database, with categories, captions and all. Professional organizations like photo agencies go to a very high level of detail as this is the key to their business, but this is also essential for anyone contemplating building an image collection of more than 1000 pictures or so.
  4. Editing: you can go hog-wild with Photoshop or the GIMP, although since this is a very labor-intensive process, it is usually done to a small minority of pictures.
  5. Output: getting prints made, but also publishing to the Web.
  6. Backup: backing up in case of hardware failure or catastrophe.


Acquisition

What hardware you use for acquisition controls the final quality of your results, so:

  • Don’t skimp on the scanner: use slide or negative scanners rather than flatbed scans from prints.
  • Use digital cameras like the Canon D60 or Nikon D100 that have larger sensors with less thermal noise rather than point and shoots.

Using a slide/negative scanner is a very slow and laborious process, and a preferable option in many cases is to have scans made by a photo lab. Avoid the low quality Kodak PictureCD and opt instead for PhotoCD Master, which has higher resolution and scans made more carefully.

Reviewing (editing)

Getting rid of the chaff early is a major step in improving your productivity, but it is difficult to be objective about one’s own photos. This process is sometimes also known as editing, although this term lends itself to confusion with digital imaging. Here is a good introductory article on the subject: Give That Cat The Boot: Editing 101.

Asset management

For most of the other phases, the choice of software does not matter very much and will indeed change over time. It is essential to get asset management right up-front, however. The solution you use must be:

  1. scalable, to accommodate an expanding collection of photographs
  2. open: you don’t want to be locked into a proprietary database format, and at the very least you should be able to export the database to some kind of text format
  3. flexible, allowing you to enter as much or as little metadata as you require for any given photo
  4. powerful at retrieval: you should be able to run queries like “find all the photos of me and my grandma in front of the Golden Gate bridge”, or full-text caption searches (if you use captions, which is not very common because of the amount of work involved)
  5. standards-compliant, the key standards being EXIF (picture metadata like aperture and exposure) and IPTC (the press photographers’ standard for captions)

The best program I’ve found so far is IMatch (Windows only, I’m afraid), mostly because of its incredibly flexible category system, that works like set theory with multiple inclusion relationships and boolean operators. I have posted a more detailed overview of how to use IMatch for image category management. As I have switched to the Mac, my current asset management program is Kavasoft Shoebox, which has the same power as IMatch and a much better user interface to boot, but is not scriptable.


Editing

The most comprehensive description of a Photoshop editing workflow is available here on Michael Reichmann’s Luminous Landscape site.


Output

As I’ve mentioned elsewhere, my preferred output method used to be prints made on a Fuji Frontier digital minilab system. Unfortunately, most labs are clueless about color management and cropping, and I now use an Epson R1800 archival pigment ink printer. People who want to print digital black & white prints may opt for the R2400 instead.


Backup

Backup is essential if you do not want all your hard work above to go up in smoke in case of a hard drive failure.

Media failures are not the only kind of disaster that can destroy your digital images: fire, theft, flooding and earthquakes are also a consideration, depending on where you live. Most companies have a disaster recovery plan (at least on paper), and most individuals should have a simplified one for their personal effects as well. I am not just talking about photos: scanning property titles, diplomas and other vital documents is an inexpensive precaution.

Sticking to a diet is hard. So is sticking to a backup plan, for human factors and process-related reasons, not technical ones. If your chosen backup method is so cumbersome you don’t apply it regularly, it is not going to do you much good. You should focus on developing a process that fits your risk sensitivity as well as your time and budget, and if your current approach is not sustainable, reexamine your backup requirements to fit within what you can do on a regular basis. A weekly or monthly backup schedule should not be too onerous for most people.

The backup process should also involve periodic verification of the backups, so that media failure can be detected and corrected immediately. This implies redundancy in the backup, as well as diversification (use media of different types, or different manufacturers, to avoid simultaneous failure from systemic causes). If you wait 5 years until you actually need the backup, Murphy’s law will inevitably strike.
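One way to make periodic verification routine is to record a checksum for every original and recheck the backup against that record. The following is a minimal Python sketch of the idea, not a description of any particular backup tool; the function names and the manifest layout (relative path mapped to SHA-256 digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(manifest, backup_root):
    """Return the relative paths that are missing or altered in the backup."""
    backup_root = Path(backup_root)
    bad = []
    for rel, digest in manifest.items():
        copy = backup_root / rel
        if not copy.is_file() or file_digest(copy) != digest:
            bad.append(rel)
    return bad
```

Any path returned by `verify_backup` is a file to restore from the other, redundant copy while that copy is still good.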

CD-R and DVD-R media are the cheapest per megabyte, but I am not convinced of their archival characteristics (some published tests have shown CD-Rs can become unreadable in as little as 2 years). 70 or 80GB DLT tape cartridges (and other tape technologies like DAT DDS, 8mm, VXA or LTO) offer high capacity and are durable, but tape drives are very expensive, unreliable and usually available only in SCSI.

Just as the watched pot does not boil over, online data like that stored on external hard drives is harder to misplace than removable media. The solution I use is to make two backups onto two external 250GB firewire hard drives (under about $1 a gigabyte as of July 2005). I rotate them weekly between home and my office, so even if my apartment burns down, I will have lost at most a week’s worth of pictures.

If you prefer CD-R or DVD-R, be sure to use reliable brands like Mitsui Gold and follow the NIST guidelines for their care and handling (here is the PDF one-page summary).

For the backup software, I do not trust proprietary indexing formats and use a regular filesystem with incremental disk to disk copies using XXCOPY on Windows, LaCie’s free SilverKeeper utility on Mac OS X, and Rsync on UNIX.
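To illustrate what an incremental disk to disk copy onto a regular filesystem amounts to, here is a minimal Python sketch. It is not how XXCOPY, SilverKeeper or rsync are implemented; the function name and the rule for deciding that a file is unchanged (same size, destination at least as recent) are assumptions chosen for simplicity.

```python
import shutil
from pathlib import Path


def incremental_copy(src_root, dst_root):
    """Copy files that are new or changed since the last run.

    Returns the relative paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if dst.is_file():
            s, d = src.stat(), dst.stat()
            # Assumption: same size and a destination at least as recent
            # (to the second) means the file is unchanged.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        copied.append(rel.as_posix())
    return copied
```

The sketch never deletes anything from the destination, so a photo removed by mistake from the source survives in the backup. The result is an ordinary directory tree that any file browser can read.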

Format obsolescence is a factor, although the magnitude of the risk is often overblown. While JPEG and TIFF are likely to be supported well into the future, manufacturers’ proprietary RAW image formats (for digital cameras) are less likely to. When a format becomes obsolete, it should be converted to a more durable one, obviously before the OS and drivers for it have become nonfunctional.

Finally, we are all mortal. If you were to disappear tomorrow, would your loved ones know how to retrieve your photos? Making prints of the best ones is a low-tech but robust way of ensuring their passage over time, possibly even skipping generations.

Image category management with IMatch

I have discussed digital imaging workflows elsewhere. In this article, I would like to focus on the asset management using IMatch.

Category management

The main reason why I selected IMatch is for its advanced category management. This masquerades as a hierarchical system, but it is actually a full-blown set theoretical system with:

  1. Inclusion relationships: if a category is under a more general category, anytime you search for the more general category, the images under the more specific category will also appear, without having to assign them to the more general category explicitly as well.

  2. Multiple inclusion: a category can belong to multiple larger categories.

  3. Derived categories: you can have categories that are defined using boolean formulas of categories and file system folder location.
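The three properties above can be modeled with ordinary sets. The following Python sketch is a toy model only and says nothing about how IMatch stores its database; the class, the category names and the file names are all hypothetical, and it assumes the category graph has no cycles. Folder-based conditions are not modeled.

```python
class Catalog:
    """Toy category store: each category may have several parent categories."""

    def __init__(self):
        self.parents = {}   # category -> set of parent categories
        self.assigned = {}  # category -> set of images assigned directly

    def add_category(self, name, *parents):
        self.parents.setdefault(name, set()).update(parents)
        self.assigned.setdefault(name, set())
        for p in parents:
            self.add_category(p)

    def assign(self, image, category):
        self.assigned[category].add(image)

    def images(self, category):
        """Images assigned to the category or to any category below it."""
        result = set(self.assigned[category])
        for child, parents in self.parents.items():
            if category in parents:
                result |= self.images(child)
        return result


catalog = Catalog()
catalog.add_category("School", "Friends")
catalog.add_category("Work", "Friends")
catalog.add_category("Alice", "School", "Work")  # multiple inclusion
catalog.add_category("Places")
catalog.assign("img1.jpg", "Alice")
catalog.assign("img2.jpg", "Work")
catalog.assign("img1.jpg", "Places")

# Inclusion: searching a general category also returns images filed below it.
friends = catalog.images("Friends")

# Derived category: a boolean formula over other categories.
friends_on_location = catalog.images("Friends") & catalog.images("Places")
```

Here `img1.jpg` was only ever assigned to "Alice" and "Places", yet it shows up under "School", "Work" and "Friends" as well.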

Category tree

Category schema

To the left is a screenshot of the category window in IMatch.

It is very important to give some prior thought to how one’s categories will be organized, just as for librarians the choice of a catalog system like the Dewey decimal index is almost a matter of religion. A poorly designed category schema will almost inevitably entail at some point in time having to laboriously reassign categories.

The figure to the left shows my categories. I try to have a hierarchy where the most specific categories (the leaf categories in the tree) each have a manageable number of images. If there are too many (more than a couple hundred), this means I have to break down the category into smaller, more specific subcategories.

Obviously, pictures of friends and family make a substantial proportion of my image catalog, so it is important to have a manageable approach to those.

For family, I use an inverted family tree: a family tree centered on myself, and then expanding on my father’s and mother’s side of the family, and so on, so that the path I take to get to the category for a relative follows the chain of kinship from me to that relative. In the example on the left, I can get to the category for my cousin Hajera as follows:

  1. My family

  2. My mother’s side

  3. My mother’s sister Yasmeen

  4. Her daughter Hajera

This scheme is very simple and extensible, and avoids bunching all relatives together in a single disorganized mess.

In practice, however, I make some small adjustments that break this general approach. For instance, I have a separate category for my mother herself (“Naheed”, just under “Maman” – yes, I think in a mix of French and English), because otherwise whenever I would select the category for my mother, all her side of the family would appear as well, and the same thing for my grandmother. Andy Katayama describes a more systematic way to deal with these situations.

For friends, I use an approach where I organize them by how I met them (school, work, and so on). Sometimes, a single friend can belong to multiple categories, which is where IMatch’s multiple inclusion scheme comes in handy.

For instance, Bruno Chomel is an ex-colleague from three different companies, so I have a link category to him in each of the three companies’ categories, as shown in the figure to the right.

Bruno Chomel categories

I also have some special-purpose categories. “Concepts” is used for categories that encompass abstract concepts like humorous pictures. “Technical” is used for things like flagging pictures that are part of a panoramic set, or my best pictures.

Finally, I have a category “Places” that is used to indicate the geographic location for photos that have a distinguishing landmark in them, and this category is organized hierarchically by continent, then country, state, city and so on.

In addition to these categories I defined myself, I also use the standard “Universal” category schema supplied with IMatch for standard categories like “People & Relationships / Weddings” or “Culture & Communities / Holidays & Celebrations / Thanksgiving”.

Derived categories

As an example of a derived category, here is the property box for the “Uncategorized” category I use for all images that do not have any category assigned.

Category properties

This formula means “all pictures that are not categorized under Fazal’s user-defined category schema or the Universal category schema supplied with IMatch, and that are not part of the Photodisc folder”. The expression for the Photodisc folder looks intimidating, but I actually entered it using the second button above, which allows me to enter a specific folder from a pop-up menu.
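In set terms, the formula is a difference of sets. The short Python sketch below expresses the same logic with plain sets; it is not IMatch's formula syntax, and the function name, argument names and file names are hypothetical.

```python
def uncategorized(all_images, user_schema, universal_schema, photodisc_folder):
    """Images that are in neither category schema and not in the Photodisc folder."""
    return all_images - (user_schema | universal_schema) - photodisc_folder


all_images = {"a.jpg", "b.jpg", "c.jpg", "d.jpg", "stock1.jpg"}
user_schema = {"a.jpg"}               # categorized under my own schema
universal_schema = {"a.jpg", "b.jpg"}  # categorized under the Universal schema
photodisc_folder = {"stock1.jpg"}      # files located in the Photodisc folder

todo = uncategorized(all_images, user_schema, universal_schema, photodisc_folder)
```

The result, `c.jpg` and `d.jpg` in this example, is the list of images still waiting for a category.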

Assigning categories efficiently

For an asset management system to be viable, it shouldn’t take too much time. Systems that require you to enter captions are too burdensome for regular use, but categories strike the right balance, as long as they can be assigned to large numbers of images efficiently.

IMatch offers a number of time-saving features like splashers (a small drop-down menu that allows you to assign your most-frequently used categories to an image in two clicks), but in many cases this is not enough.

I import photos from my digital camera in batches of between 50 and 300, and I use the following algorithm to make assignments quickly. I have used this technique to categorize over a thousand images in less than an hour.

First, when importing the images into the database (by clicking on the rescan button in the filesystem view after copying the images to disk), check the option to bookmark new images. Then go to the bookmarked images selection to view all of them.

Bookmark new images dialog

Look at the first image, and pick a category among those that fit it. Control-click on all images that are also in that category, and then assign the category to all of them in one go using the category assignment dialog. Repeat if there are more categories to assign to the first image. Once all categories for the first image are exhausted, toggle the bookmark on it to make it disappear from the view. Usually, the next few images also have no categories remaining so you can take out a batch. You then repeat with the next image that has categories remaining to be assigned, and repeat the process until all categories have been assigned.
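The procedure above is carried out by hand in the IMatch user interface, but simulating it shows why it is fast: each category is assigned in a single batch. The Python sketch below is such a simulation under stated assumptions; the function name, the data layout and the sample categories are hypothetical, and it clears the bookmark on every finished image at once rather than only on the first few.

```python
def batch_assign(wanted):
    """Simulate the bookmark procedure.

    `wanted` maps each bookmarked image to the set of categories it should get.
    Returns one (category, images) pair per use of the assignment dialog.
    """
    remaining = {img: set(cats) for img, cats in wanted.items()}
    bookmarked = list(wanted)  # images in viewing order
    batches = []
    while bookmarked:
        first = bookmarked[0]
        while remaining[first]:
            category = min(remaining[first])  # pick any category that fits
            # Select every bookmarked image that also needs this category.
            selected = [img for img in bookmarked if category in remaining[img]]
            batches.append((category, selected))
            for img in selected:
                remaining[img].discard(category)
        # Clear the bookmark on every image with nothing left to assign.
        bookmarked = [img for img in bookmarked if remaining[img]]
    return batches


batches = batch_assign({
    "a.jpg": {"Family", "Paris"},
    "b.jpg": {"Family"},
    "c.jpg": {"Paris"},
    "d.jpg": {"Friends"},
})
```

In this example four images and five image-to-category assignments take three trips to the assignment dialog, one per distinct category.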

Category assignment

Born Free and Equal

For fans of Ansel Adams, the Library of Congress has an online exhibit based on his book “Born Free and Equal”. This book is a series of photographs taken at the Manzanar camp in California where Japanese-Americans were interned during World War II.

A book version has been reprinted.

Update (2002-09-16): This article in The Atlantic sheds some light on the background for the original exhibition.

Rules of thumb


Typography

The optimal line length for readability is around 10-12 words (source: The Thames and Hudson Manual of Typography by Ruari McLean).

Telecoms, networking and IT

The ratio of peak load to average load in a service with diurnal activity variations is approximately 3 to 1. Source: my own empirical observation from Wanadoo access logs and France Telecom telephone call usage logs.

Probability of a Web page having X incoming links referring to it: P(X) ∝ X^-2.1 (a power law). Source: [2]
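Reading this rule as a power law, in which the probability is proportional to X^-2.1, gives a feel for how quickly heavily linked pages become rare. The Python sketch below compares two link counts; the function name is hypothetical and the normalizing constant is deliberately left out, since it cancels in a ratio.

```python
def relative_frequency(links, exponent=2.1):
    """Unnormalized power-law weight for a page with `links` incoming links."""
    return links ** -exponent


# How much more common are pages with 10 incoming links than pages with 100?
ratio = relative_frequency(10) / relative_frequency(100)
```

The ratio is 10^2.1, so pages with 10 incoming links are roughly 126 times more common than pages with 100.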

When specifying computers, for balanced performance provision one gigabyte of RAM per gigahertz per core/thread.

Any standard making use of ASN.1 is a piece of junk.

You only get the benefits of statistical multiplexing or compression once, and it should only be done in one layer. Any other layers attempting to do the same only add cost, complexity, brittleness, overhead and latency.

When designing high-availability systems, fail-over is not the hard part, falling back is.


Photography

For most ordinary lenses, optimal sharpness is around f/8. For high-quality lenses, it is one or two stops below full aperture. Only the very best lenses are diffraction-limited and offer optimal performance at full aperture.

Camera light meters are calibrated for 12% gray. Common gray cards are 18% gray, so if you use one for metering, you should open up one half stop to compensate. Source: [3]
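The half stop figure follows from the fact that one stop is a factor of two in light. A minimal Python check, with a hypothetical function name:

```python
import math


def stops_between(reflectance_a, reflectance_b):
    """Exposure difference in stops between two reflectances (one stop = a factor of 2)."""
    return math.log2(reflectance_a / reflectance_b)


correction = stops_between(0.18, 0.12)  # log2(1.5), a little over half a stop
```

The exact value is log2(1.5), about 0.58 stops, which rounds to the half stop quoted above.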

The human eye is a 6-7 megapixel sensor. The monocular field of view is 180°, the binocular field of view is 120-140°, and the normal focus of attention spans a 45° field of view.

Avoid Kodak products like the plague. Those products they make that are actually decent (i.e. the engineers managed to sneak them past the bean counters) soon get adulterated (like Tri-X) or discontinued (like PhotoCD or their medium-format digital backs). Prefer Fuji, Agfa or Ilford.

Manual of Photography, Photographic and digital imaging, 9th edition

Ralph E Jacobson, Sidney F Ray, Geoffrey G Attridge, Norman R Axford

Focal Press, ISBN: 0240515749

This book is simply wonderful. It is a detailed and comprehensive treatise on the physical, optical, chemical and otherwise scientific theory behind photography (the authors all have a bevy of these wonderfully quaint British learned society titles, in addition to a hefty list of PhDs and graduate degrees). Also distinctive is that the first edition was published in 1890 and thus it spans three centuries!

That said, the coverage of the latest developments like digital photography is impressive, and this is one of the first photography textbooks that have been updated completely for the coming migration to digital, rather than treating it as an afterthought.

I’ve been looking for a long time for such a book, that explains the theory without patronizing a scientifically literate reader. For instance, the book explains how ISO ratings are defined for film and for electronic sensors, how depth of field is computed, the diffraction limit on sharpness at small apertures and so on. If you are afraid of equations, this is not the book for you.

Fuji Frontier digital prints are really high quality

Note (2004-04-17):

I am keeping the mention of Wal-Mart in this article for historical reasons. It has since come to my attention, however, that Wal-Mart has in many cases violated racial discrimination and immigration laws, locked in its night shift employees, potentially putting their health and life in danger in case of medical emergency, and repeatedly stolen from its employees by illegally withholding overtime pay and fraudulently altering computerized time sheets. The frequency of these reports suggests these are not isolated incidents, as the company asserts, but rather actively condoned or encouraged, and the result of a system of perverse incentives and pressure on middle management that can only be achieved by resorting to these criminal practices.

I do not believe it is morally permissible for me to patronize such an ethically dubious firm, and urge you not to either. In contrast, Costco is cheaper, and yet offers decent working conditions, pay and benefits to its employees.

I received some prints I made from Wal-Mart Photo Center by uploading digital photos taken with my Canon EOS D30. I made a mix of 4″x6″ and 8″x10″. The quality is very good, much better than that of traditional silver-halide photos I took with my old Nikon N6006, and when you look at them with a 10x loupe, they completely blow the supposedly 1200dpi inkjet prints from my HP Photosmart P1000 out of the water.

These prints are made on real photographic paper (Fuji Crystal Archive, rated at 25 years) by a Fuji Frontier laser photo printer, which exposes the photo paper by passing red, green and blue lasers over it. The print is then developed conventionally.

Wal-Mart has also improved the uploading process. When using IE on Windows, an ActiveX component allows simple drag-and-drop uploading of large numbers of images, as opposed to the laborious HTML file upload-based process that was limited to 5 images.

Update (2002-09-16):

For people who live in San Francisco (and probably other locations as well), Costco is a cheaper option. They use a Frontier 370 digital minilab in their SoMa location and they charge 20 cents for a 4×6, and $2 for an 8×10. Unfortunately, their on-line service uses the inferior Kodak process. When I went there last Saturday, they took my originals on CD and gave me back my prints in about 2 hours (although they printed my 8×10 as 4×6 by mistake, which added another 30 minutes, but the lady at the counter was very helpful). Sam’s Club apparently matches Costco pricing and is upgrading to Frontiers as well (including on-line).

Other Frontier locations I know of in San Francisco (much more expensive, unfortunately): F-1 Photo at 690 Market (@ Post), Ritz Camera (2185 Chestnut). Two good resources for Frontier enthusiasts: this Digital minilabs list is a directory of (among others) Frontier-equipped minilabs, and Dry Creek Photo offers a color profiling service for your local minilab to obtain optimum color accuracy (they will profile your minilab for free if it isn’t already listed in their database).

Update (2002-10-24):

There is an article on laser digital minilabs in the New York Times (free registration required). One interesting tidbit is that Fuji’s initial implementation of the Frontier was so sharp it revealed every skin blemish, and they had to add code to detect and smooth out curved areas of skin-tone color.

Update (2003-07-02):

Other good reviews:

Update (2003-07-28):

I visited the San Francisco Costco yesterday, and they have replaced their Fuji Frontier 370 with a Noritsu QSS-3101 (PDF). This generation of Noritsu digital minilab uses a laser rather than the MLVA (LED) technology used in earlier Noritsu minilabs, and it should have equivalent quality (I will know for sure this coming Thursday when I get my prints back – it seems the word is out and Costco now has quite a backlog).

The nice thing is they now have a self-service Noritsu CT-1 kiosk where you can upload your photos from flash cards or CD, albeit with a slightly clunky interface. They also support 8×12 rather than 8×10 now, and more interestingly larger sizes as well, up to 12×18.

Fortunately, the paper used is still Fuji Crystal Archive rather than the inferior Kodak alternatives Noritsu is usually associated with (Kodak resells Noritsu minilabs, and allegedly some Agfa minilab components as well).

Update (2003-09-07):

Another Fuji Frontier location in San Francisco. Walgreens’ Fisherman’s Wharf store (Jones between Jefferson and Beach) has a Frontier 370 with an Aladdin self-service kiosk front-end. To their credit, they resisted the temptation to gouge the tourists that will probably make the bulk of their custom. They advertise a package of $6.99 for 24 4×6 digital prints, which isn’t that much more than what you would get from Costco. They also promise 1 hour delivery, as long as the machine is operating below capacity. The operator was not able to give me answers on the price of 8×10 enlargements (he was from the night shift, as it was past 9PM), but he thinks it is in the vicinity of $4-$5.