Mylos

Image category management with IMatch

I have discussed digital imaging workflows elsewhere. In this article, I would like to focus on asset management using IMatch.

Category management

The main reason I selected IMatch is its advanced category management. This masquerades as a hierarchical system, but it is actually a full-blown set-theoretic system with:

  1. Inclusion relationships: if a category is under a more general category, anytime you search for the more general category, the images under the more specific category will also appear, without having to assign them to the more general category explicitly as well.

  2. Multiple inclusion: a category can belong to multiple larger categories.

  3. Derived categories: you can have categories that are defined using boolean formulas of categories and file system folder location.
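
As a rough illustration, these three properties can be modeled in a few lines of Python. This is only a sketch of the idea, not of how IMatch works internally: the Category class, its methods and the file names are all made up for the example.

```python
class Category:
    """A toy category: directly assigned images plus more specific children."""

    def __init__(self, name):
        self.name = name
        self.assigned = set()   # images assigned directly to this category
        self.children = []      # more specific categories

    def add_child(self, child):
        # The same child may be added under several parents (multiple inclusion).
        self.children.append(child)
        return child

    def images(self):
        # Inclusion: a category's images are its own plus its descendants'.
        result = set(self.assigned)
        for child in self.children:
            result |= child.images()
        return result


family = Category("Family")
friends = Category("Friends")
work = Category("Work")

hajera = family.add_child(Category("Hajera"))
bruno = Category("Bruno")
friends.add_child(bruno)
work.add_child(bruno)           # one category, two parents

hajera.assigned.add("img_001.jpg")
bruno.assigned.add("img_002.jpg")
family.assigned.add("img_003.jpg")

# Derived category: a boolean formula over other categories.
people_not_work = (family.images() | friends.images()) - work.images()
```

Searching "Family" returns the image assigned to "Hajera" without any explicit assignment to the parent, and the image assigned to "Bruno" shows up under both "Friends" and "Work".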

Category tree

Category schema

To the left is a screenshot of the category window in IMatch.

It is very important to give some prior thought to how one’s categories will be organized, just as for librarians the choice of a catalog system like the Dewey decimal index is almost a matter of religion. A poorly designed category schema will almost inevitably entail at some point in time having to laboriously reassign categories.

The figure to the left shows my categories. I try to keep a hierarchy where the most specific categories (the leaf categories in the tree) each hold a manageable number of images. If a category holds too many (more than a couple hundred), I break it down into smaller, more specific subcategories.

Obviously, pictures of friends and family make a substantial proportion of my image catalog, so it is important to have a manageable approach to those.

For family, I use an inverted family tree: a family tree centered on myself, and then expanding on my father’s and mother’s side of the family, and so on, so that the path I take to get to the category for a relative mirrors how we are related. In the example on the left, I can get to the category for my cousin Hajera as follows:

  1. My family

  2. My mother’s side

  3. My mother’s sister Yasmeen

  4. Her daughter Hajera

This scheme is very simple and extensible, and avoids bunching all relatives together in a single disorganized mess.

In practice, however, I make some small adjustments that break this general approach. For instance, I have a separate category for my mother herself (“Naheed”, just under “Maman” – yes, I think in a mix of French and English), because otherwise whenever I would select the category for my mother, all her side of the family would appear as well, and the same thing for my grandmother. Andy Katayama describes a more systematic way to deal with these situations.

For friends, I use an approach where I organize them by how I met them (school, work, and so on). Sometimes, a single friend can belong to multiple categories, which is where IMatch’s multiple inclusion scheme comes in handy.

For instance, Bruno Chomel is an ex-colleague from three different companies, so I have a link category to him in each of the three companies’ categories, as shown in the figure to the right.

Bruno Chomel categories

I also have some special-purpose categories. “Concepts” is used for categories that encompass abstract concepts like humorous pictures. “Technical” is used for things like flagging pictures that are part of a panoramic set, or my best pictures.

Finally, I have a category “Places” that is used to indicate the geographic location for photos that have a distinguishing landmark in them, and this category is organized hierarchically by continent, then country, state, city and so on.

In addition to these categories I defined myself, I also use the standard “Universal” category schema supplied with IMatch for standard categories like “People & Relationships / Weddings” or “Culture & Communities / Holidays & Celebrations / Thanksgiving”.

Derived categories

As an example of a derived category, here is the property box for the “Uncategorized” category I use for all images that do not have any category assigned.

Category properties

This formula means “all pictures that are not categorized under Fazal’s user-defined category schema or the Universal category schema supplied with IMatch, and that are not part of the Photodisc folder”. The expression for the Photodisc folder looks intimidating, but I actually entered it using the second button above, which allows me to enter a specific folder from a pop-up menu.
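
Read as set arithmetic, the formula is "NOT (Fazal OR Universal) AND NOT Photodisc folder". The sketch below spells that out in Python rather than in IMatch's own formula syntax, which I do not reproduce here. The catalog layout, file names and folder names are hypothetical.

```python
# Hypothetical catalog: image name -> (folder, top-level schemas it is filed under)
catalog = {
    "dsc_0001.jpg": ("Photos/2002", {"Fazal"}),
    "dsc_0002.jpg": ("Photos/2002", set()),
    "dsc_0003.jpg": ("Photos/2002", {"Universal"}),
    "pd_0417.jpg": ("Photodisc/Nature", set()),
}


def uncategorized(catalog):
    """Images in neither schema and outside the Photodisc folder."""
    return {
        name
        for name, (folder, schemas) in catalog.items()
        if not (schemas & {"Fazal", "Universal"})
        and not folder.startswith("Photodisc")
    }
```

With this sample data only dsc_0002.jpg qualifies: the Photodisc image has no categories either, but the folder clause excludes it.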

Assigning categories efficiently

For an asset management system to be viable, it shouldn’t take too much time. Systems that require you to enter captions are too burdensome for regular use, but categories strike the right balance, as long as they can be assigned to large numbers of images efficiently.

IMatch offers a number of time-saving features like splashers (a small drop-down menu that allows you to assign your most-frequently used categories to an image in two clicks), but in many cases this is not enough.

I import photos from my digital camera in batches of between 50 and 300, and I use the following algorithm to make assignments quickly. I have used this technique to categorize over a thousand images in less than an hour.

First, when adding the new images to the database (by clicking on the rescan button in the filesystem view once the files have been copied over), check the option to bookmark new images. Then go to the bookmarked images selection to view all of them.

Bookmark new images dialog

Look at the first image, and pick a category among those that fit it. Control-click on all images that are also in that category, and then assign the category to all of them in one go using the category assignment dialog. Repeat if there are more categories to assign to the first image. Once all categories for the first image are exhausted, toggle the bookmark on it to make it disappear from the view. Usually, the next few images also have no categories remaining so you can take out a batch. You then repeat with the next image that has categories remaining to be assigned, and repeat the process until all categories have been assigned.
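
The procedure can be simulated in Python to show why it saves time. This is a sketch under assumptions: the function name and the sample data are invented, and "pick a category" is simplified to taking the alphabetically first one.

```python
def assign_in_batches(todo):
    """todo maps each bookmarked image to the categories it still needs.

    Returns the list of (category, images) bulk assignments performed.
    """
    remaining = {img: set(cats) for img, cats in todo.items()}  # work on a copy
    operations = []
    for first in list(remaining):               # images in import order
        while remaining[first]:
            category = sorted(remaining[first])[0]   # pick any category that fits
            # "Control-click" every bookmarked image that shares this category.
            batch = [img for img, cats in remaining.items() if category in cats]
            operations.append((category, batch))
            for img in batch:
                remaining[img].discard(category)
        # Nothing left for this image: its bookmark would be toggled off here.
    return operations


todo = {
    "a.jpg": {"Hajera", "Places"},
    "b.jpg": {"Hajera"},
    "c.jpg": {"Places", "Panorama"},
    "d.jpg": {"Panorama"},
}
operations = assign_in_batches(todo)
```

Here six individual image-category assignments collapse into three trips to the assignment dialog. The saving grows with the size of the batch, since consecutive shots tend to share categories.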

Category assignment

Born Free and Equal

For fans of Ansel Adams, the Library of Congress has an online exhibit based on his book “Born Free and Equal”. This book is a series of photographs taken at the Manzanar camp in California where Japanese-Americans were interned during World War II.

A book version has been reprinted.

Update (2002-09-16): This article in The Atlantic sheds some light on the background for the original exhibition.

Peer-to-peer collaborative spam filtering

An interesting product from a young company called Cloudmark addresses the spam explosion.

It works as an Outlook add-in that allows you to flag a message as spam and uploads a signature to the network, and thus helps other Cloudmark users to block the spam, in effect acting as a distributed peer-to-peer Brightmail.

It remains to be seen whether this system will be resistant to denial of service or poisoning attacks.

Update (2002-07-10): I’ve been trying it out for three weeks, and so far it looks pretty good. Out of 353 spam messages I’ve received, it successfully blocked 233 (about 66%). It also gave 3 false positives from permission marketing companies (Art.com and MyPoints), which is not absurd as they have very poor opt-out management. But it also flagged an IEEE newsletter as spam, which seems a little bit excessive. So, use with caution.

The Thames and Hudson Manual of typography

Ruari McLean

Thames and Hudson, ISBN: 0500680221

This is a curious book, part tutorial, part cookbook, part personal war stories, including the author’s pet tools and techniques. It was obviously designed before computers were commonplace and many sections dealing with hot metal type or phototypesetting are completely obsolete nowadays.

The beginning has a decent introduction to the history of typography and typefaces.

The middle part concerns itself with working around the constraints of metal or copyediting before word processing systems became commonplace. If nothing else, it should give us a renewed appreciation of how much tedious labor computers save us, such as not having to count characters to find out how many pages will be required.

The final part on layout for stationery, books and magazines is pretty good, but not very systematic, and carries the same war story flavor as the section on recommended tools.

All in all, this book has some interesting information, but I would not recommend it to anyone who wants to learn how to produce beautiful documents with a desktop publishing setup. Robin Williams’s “The PC is not a typewriter”, Robert Bringhurst’s “The Elements of Typographic Style” or even Donald Knuth’s books on computers and typography are better choices in this respect.

Using Wake-on-LAN with Python

Most modern PCs and Macintoshes feature Wake-on-LAN. This feature, originally called “Magic packet” (PDF) by AMD, allows you to start a PC remotely by sending a specially formed “magic packet” to its Ethernet interface. On Macs running OS X, Wake-on-LAN seems to work only when the Mac is in sleep mode, not when it is completely turned off. The original intent was to allow administrators to boot PCs remotely to run backups, but with the spread of DSL, there are other uses.

For instance, I have a low-noise Solaris machine running 24/7 at my home (angband.majid.fm), and when I need to access my (noisy) home PC, I just log on to that machine via SSH, wake up the PC and then log on remotely using pcAnywhere. The same works with my iMac G4.

Here is a very simple Python script that starts a machine with a given MAC address:

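import socket
# UDP socket; the payload is 6 bytes of 0xFF followed by the MAC address 16 times
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# bytes literals (b'...') are required by sendto() on Python 3
s.sendto(b'\xff'*6 + b'\x00\x02\xb3\x07\xb6\xd1'*16, ('192.168.1.255', 80))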

It will start the machine with MAC address 00:02:B3:07:B6:D1 on the subnet 192.168.1/24 by sending a Wake-on-LAN magic packet to the subnet-directed broadcast IP address.
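
Typing the MAC address as escaped bytes is error-prone, so here is a small helper that builds the same payload from the usual colon-separated notation. The function name magic_packet is my own invention for this sketch. It only constructs the bytes; sending them is left to sendto() as in the script above.

```python
def magic_packet(mac):
    """Build the 102-byte payload: 6 bytes of 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 48-bit MAC address, got %r" % mac)
    # bytes.fromhex() raises ValueError if a character is not a hex digit.
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


packet = magic_packet("00:02:B3:07:B6:D1")
```

The resulting packet is byte-for-byte identical to the literal used in the script.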

Update (2003-12-05):

Now that you have woken your Mac, how do you send it back to sleep? Read this article to find out.

Update (2006-03-19):

On certain versions of Linux, you may get a “permission denied” error message because you are trying to send a packet to a broadcast address. The following code should work:

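import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Explicitly allow this socket to send to a broadcast address
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
# bytes literals (b'...') are required by sendto() on Python 3
s.sendto(b'\xff'*6 + b'\x00\x02\xb3\x07\xb6\xd1'*16, ('192.168.1.255', 80))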