Python

Street sweeping reminders in iCal

San Francisco sweeps streets twice a month in residential neighborhoods, and you will be fined if your car is parked on a street being swept. On my street, the schedule is the first and third Monday of each month, between 9am and 11am. I was trying to create reminders for myself in my calendar. Unfortunately, iCal does not have the ability to specify a recurring event with that definition.

No matter: Python to the rescue. The script below generates a year’s worth of events in iCal vCalendar format, each with a reminder 12 hours before the event. It does not correct for holidays; you will have to remove those yourself.

#!/usr/bin/python
"""Idora street sweeping calendar - 1st and 3rd Mondays of the month 9am-11am"""
import datetime
Monday = 0
one_day = datetime.timedelta(1)
today = datetime.date.today()
year = today.year
month = today.month

def output(day):
  print """
BEGIN:VEVENT
DTEND:%(end)s
SUMMARY:Idora street sweeping
DTSTART:%(start)s
BEGIN:VALARM
TRIGGER:-PT12H
ATTACH;VALUE=URI:Basso
ACTION:AUDIO
END:VALARM
END:VEVENT
""" % {
    'end': day.strftime('%Y%m%dT110000'),
    'start': day.strftime('%Y%m%dT090000')
    }

print """BEGIN:VCALENDAR
CALSCALE:GREGORIAN
VERSION:2.0"""

for i in range(12):
  day = datetime.date(year, month, 1)
  while day.weekday() != Monday:
    day += one_day
  output(day)
  output(day + 14 * one_day)
  month += 1
  if month > 12:
    month = 1
    year += 1

print "END:VCALENDAR"
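
The loop above finds the first Monday by stepping forward one day at a time, then adds 14 days to reach the third. As a rough sketch of the same idea in a more general form, the nth occurrence of a weekday can also be computed directly. This is written for Python 3, and `nth_weekday` is a hypothetical helper name of mine, not part of the original script:

```python
import calendar
import datetime

MONDAY = 0  # datetime.date.weekday() numbers Monday as 0


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday (n >= 1) in a month.

    Hypothetical helper for illustration. Raises ValueError when the
    month has no such day (for example a fifth Monday in a short month).
    """
    first = datetime.date(year, month, 1)
    # Days from the 1st to the first occurrence of `weekday` (0..6).
    offset = (weekday - first.weekday()) % 7
    day = 1 + offset + 7 * (n - 1)
    days_in_month = calendar.monthrange(year, month)[1]
    if n < 1 or day > days_in_month:
        raise ValueError("no such weekday in this month")
    return datetime.date(year, month, day)


if __name__ == "__main__":
    # First and third Mondays of January 2024.
    print(nth_weekday(2024, 1, MONDAY, 1))
    print(nth_weekday(2024, 1, MONDAY, 3))
```

Because the third Monday is always exactly 14 days after the first, the original script's `day + 14 * one_day` shortcut gives the same result for this particular schedule.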

Scanning your iTunes library for DRM-infested books

Tor, the leading publisher for Science Fiction and Fantasy books, announced they would be doing away with DRM in their eBooks. The product pages for their books on iBooks now mention “At the publisher’s request, this title is being sold without Digital Rights Management software (DRM) applied”. I figured it would be a good idea to uncripple the many Tor eBooks I have in my collection.

I wrote a quick little Python script to scan my growing iBooks library for books that could be updated. The procedure is to delete the book from both iTunes and iPads, then download it anew (restarting iTunes is also needed after deleting). Apple keeps track of your purchases and will not charge you again.

#!/usr/bin/env python
import sys, os.path, glob, zipfile, platform, xml.etree.ElementTree

# publishers who have forsaken DRM
good = ['Tom Doherty']

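# compare version numbers numerically: as strings, '10.10' sorts before '10.8'
if tuple(int(x) for x in platform.mac_ver()[0].split('.')[:2] if x.isdigit()) >= (10, 9):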
  bookdir = os.path.expanduser('~/Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks')
else:
  bookdir = os.path.expanduser('~/Music/iTunes/iTunes Music/Books')

os.chdir(bookdir)

ok =  '\033[1;32mDRM-free    \033[0m'
bad = '\033[1;31mDRM-infested\033[0m'

count = 0
salvageable = 0

def extract(meta):
  creator = ''
  title = ''
  pub = ''
  et = xml.etree.ElementTree.fromstring(meta)
  try:
    creator = et.findall('*{http://purl.org/dc/elements/1.1/}creator')
    creator = creator[0].text
    title = et.findall('*{http://purl.org/dc/elements/1.1/}title')
    title = title[0].text
  except:
    assert '!DOCTYPE plist' in meta
    next_tag = None
    for e in et[0].iter():
      if e.tag == 'key' and e.text in ('artistName', 'itemName'):
        next_tag = e.text
        continue
      if next_tag == 'artistName':
        creator = e.text
        next_tag = None
        continue
      elif next_tag == 'itemName':
        title = e.text
        next_tag = None
        continue
  pub = [x for x in good if x in meta]
  return creator, title, pub

def find_meta(file_list, opener):
  for m in file_list:
    if m.endswith('.opf') or m == 'iTunesMetadata.plist':
      meta = opener(m).read()
      return extract(meta)
  # no metadata file found: return empty placeholders so unpacking still works
  return '', '', []

for fn in glob.glob('*/*.epub'):
  status = ok
  suffix = ''
  if os.path.isdir(fn):
    suffix = '(directory)'
    if os.path.exists(fn + '/META-INF/encryption.xml'):
      status = bad
      count += 1
    meta = find_meta(os.listdir(fn), lambda x: open(fn + '/' + x))
  else:
    z = zipfile.ZipFile(fn)
    try:
      z.getinfo('META-INF/encryption.xml')
      status = bad
      count += 1
    except KeyError:
      pass
    meta = find_meta(z.namelist(), z.open)
    z.close()
  creator, title, pub = meta
  print status, fn, suffix
  print '\t', creator
  print '\t', title
  if status == bad and pub:
    print '\t\033[1;32mThis is published by', pub[0],
    print 'and could be re-downloaded DRM-free\033[0m'
    salvageable += 1

print count, 'books are DRM-infested'
print salvageable, 'could be cured'
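
The heart of the script is a single test: does the EPUB contain a `META-INF/encryption.xml` entry? A minimal Python 3 sketch of just that check, covering both zipped EPUBs and unpacked EPUB directories, might look like the following. The function name `has_encryption_manifest` is my own, and the check is only the heuristic the script relies on: the file can also be present for other reasons, such as font obfuscation, so a hit is a strong hint rather than proof.

```python
import os
import zipfile

ENCRYPTION_ENTRY = "META-INF/encryption.xml"


def has_encryption_manifest(path):
    """Return True if the EPUB at `path` contains META-INF/encryption.xml.

    Handles both zipped EPUB files and unpacked EPUB directories.
    """
    if os.path.isdir(path):
        return os.path.exists(os.path.join(path, "META-INF", "encryption.xml"))
    with zipfile.ZipFile(path) as z:
        # Zip member names always use forward slashes.
        return ENCRYPTION_ENTRY in z.namelist()
```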

Unfortunately, it seems the DRM-stripping is still a work in progress. Out of the Wheel of Time series, for instance, only the first book is now DRM-free on the iBooks store.

undr ~>drmbooks.py
DRM-free     Books/0083D0AEC37E08453347DD12B1C6F980.epub 
	Greg Bear
	Blood Music
DRM-free     Books/09178837756A4DFF8347EC377345A37B.epub 
	Heinz Wittenbrink
	RSS and Atom
DRM-free     Books/0AD752E995042C7E12F11917AB58C6B8.epub 
	Wes McKinney
	Python for Data Analysis
DRM-free     Books/14BDC66A99E878EC232FFAFA73B341EF.epub 
	Fritz Leiber
	Swords and Deviltry-Fafhrd and the Gray Mouser-Book1
DRM-free     Books/15A1D7FE9B7D815C6FBE1A9A77D7143E.epub 
	Glen Cook
	A Fortress in Shadow
DRM-free     Books/1793F9DE1319B96FDE7E36EB8A1BC961.epub 
	Scalzi, John
	Old Man's War
DRM-free     Books/1D08BE221E8BC8F2A371EFEDE55029AC.epub 
	Ben Fry
	Visualizing Data
DRM-free     Books/24D6EC36CDEA0C1E8612CC61A89EA098.epub 
	None
	Node Cookbook
DRM-free     Books/29DA285F0051C431BD8BA3D1AEC5EAA6.epub 
	Fritz Leiber
	The Swords of Lankhmar: Fafhrd and the Gray Mouser-Book 5
DRM-free     Books/2E88CD68DFD8408CD0E7C0ACB1E78714.epub 
	Glen Cook
	A Cruel Wind: A Chronicle of the Dread Empire
DRM-free     Books/32996A9995040064818BAE4DFB66E92F.epub 
	Kelly Link
	Magic for Beginners
DRM-free     Books/34D3CD13D47E5FEBC6DCF7EF011113BD.epub 
	David Drake
	Lord of the Isles
DRM-infested Books/357298432.epub 
	Iain M. Banks
	The Player of Games
DRM-infested Books/357311036.epub 
	Iain M. Banks
	Use of Weapons
DRM-infested Books/357377857.epub 
	Iain M. Banks
	Against a Dark Background
DRM-infested Books/357396585.epub 
	Brent Weeks
	Night Angel: The Complete Trilogy
DRM-infested Books/357657026.epub 
	Iain M. Banks
	Transition
DRM-infested Books/357658374.epub 
	Po Bronson
	NurtureShock: New Thinking About Children
DRM-infested Books/357662058.epub 
	Iain M. Banks
	Consider Phlebas
DRM-infested Books/357669769.epub 
	Iain M. Banks
	Matter
DRM-infested Books/357914731.epub 
	Herbert, Frank
	Dune Messiah
DRM-infested Books/357918110.epub 
	Dalrymple, William
	City of Djinns
DRM-infested Books/357923567.epub 
	Patrick Rothfuss
	The Name of the Wind
DRM-infested Books/357929995.epub 
	Herbert, Frank
	Dune (40th Anniversary Edition)
DRM-infested Books/357969577.epub 
	Herbert, Frank
	God Emperor of Dune
DRM-infested Books/357987322.epub 
	Herbert, Frank
	Children of Dune
DRM-infested Books/357994537.epub 
	Herbert, Frank
	Heretics of Dune
DRM-infested Books/357994652.epub 
	William Dalrymple
	White Mughals: Love and Betrayal in Eighteenth-Century India
DRM-infested Books/357996119.epub 
	Stross, Charles
	Wireless
DRM-infested Books/360601506.epub 
	Ursula K. Le Guin
	The Dispossessed
DRM-infested Books/360609519.epub 
	Greg Egan
	Schild's Ladder
DRM-infested Books/360627712.epub 
	Raymond E. Feist
	Rides a Dread Legion
DRM-infested Books/360627930.epub 
	Neal Stephenson
	Anathem
DRM-infested Books/360628773.epub 
	Raymond E. Feist
	At the Gates of Darkness
DRM-infested Books/360641088.epub 
	Mihaly Csikszentmihalyi
	Flow
DRM-free     Books/361491495.epub 
	Basil Hall Chamberlain
	Aino Folk-Tales
DRM-free     Books/361494664.epub 
	Poul William Anderson
	Industrial Revolution
DRM-free     Books/361523763.epub 
	Lafcadio Hearn
	The Romance of the Milky Way / And Other Studies & Stories
DRM-free     Books/361527545.epub 
	Saki
	When William Came
DRM-free     Books/361539032.epub 
	Saki
	The Chronicles of Clovis
DRM-free     Books/361557387.epub 
	Saki
	Reginald in Russia and other sketches
DRM-free     Books/361557834.epub 
	Sir Arthur Conan Doyle
	The Adventure of the Dying Detective
DRM-free     Books/361559391.epub 
	Sir Arthur Conan Doyle
	The Valley of Fear
DRM-free     Books/361560694.epub 
	Lafcadio Hearn
	Chita: a Memory of Last Island
DRM-free     Books/361561399.epub 
	Confucius
	The Analects of Confucius (from the Chinese Classics)
DRM-free     Books/361562678.epub 
	Saki
	The Toys of Peace, and other papers
DRM-free     Books/361562764.epub 
	Sir Arthur Conan Doyle
	The Memoirs of Sherlock Holmes
DRM-free     Books/361564075.epub 
	Poul William Anderson
	The Burning Bridge
DRM-free     Books/361564898.epub 
	Henry David Thoreau
	Walden
DRM-free     Books/361565201.epub 
	Saki
	Beasts and Super-Beasts
DRM-free     Books/361565806.epub 
	Isaac Asimov
	Youth
DRM-free     Books/361572327.epub 
	Lafcadio Hearn
	Kokoro / Japanese Inner Life Hints
DRM-free     Books/361573126.epub 
	Sir Arthur Conan Doyle
	Through the Magic Door
DRM-free     Books/361575882.epub 
	Sir Arthur Conan Doyle
	Tales of Terror and Mystery
DRM-free     Books/361578744.epub 
	Lafcadio Hearn
	In Ghostly Japan
DRM-free     Books/361588265.epub 
	Lafcadio Hearn
	Books and Habits from the Lectures of Lafcadio Hearn
DRM-free     Books/361673695.epub 
	E. C. Babbitt
	More Jataka Tales
DRM-free     Books/361686559.epub 
	Poul William Anderson
	Security
DRM-free     Books/361713863.epub 
	Sir Arthur Conan Doyle
	The Return of Sherlock Holmes
DRM-free     Books/361721797.epub 
	Sir Arthur Conan Doyle
	The Adventure of the Cardboard Box
DRM-free     Books/361725352.epub 
	Saki
	The Unbearable Bassington
DRM-free     Books/361725959.epub 
	Sir Arthur Conan Doyle
	The Adventure of Wisteria Lodge
DRM-free     Books/361726975.epub 
	Saki
	Reginald
DRM-free     Books/361727237.epub 
	Sir Arthur Conan Doyle
	The Adventure of the Red Circle
DRM-free     Books/361732286.epub 
	Poul William Anderson
	The Valor of Cappen Varra
DRM-free     Books/361736043.epub 
	Lafcadio Hearn
	Kwaidan: Stories and Studies of Strange Things
DRM-free     Books/361741563.epub 
	Lafcadio Hearn
	Japan: an Attempt at Interpretation
DRM-free     Books/361743007.epub 
	Ambrose Bierce
	The Devil's Dictionary
DRM-free     Books/361743953.epub 
	Sir Arthur Conan Doyle
	The Adventure of the Devil's Foot
DRM-free     Books/361744178.epub 
	Sir Arthur Conan Doyle
	His Last Bow
DRM-free     Books/361745602.epub 
	Poul William Anderson
	The Sensitive Man
DRM-infested Books/362435686.epub 
	Ansary, Tamim
	Destiny Disrupted
DRM-infested Books/366773380.epub 
	Esslemont, Ian C. C.
	Return of the Crimson Guard
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/373338999.epub 
	Jordan, Robert
	The Path of Daggers
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-free     Books/375554215.epub 
	Orson Scott Card
	The Lost Gate
DRM-infested Books/376217648.epub 
	Steven Erikson
	Reaper’s Gale
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/376227359.epub 
	Jordan, Robert
	Winter's Heart
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/376227401.epub 
	Jordan, Robert
	Crossroads of Twilight
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/376227406.epub 
	Jordan, Robert
	Knife of Dreams
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/376227409.epub 
	Sanderson, Brandon
	The Gathering Storm
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/376227423.epub 
	Jordan, Robert
	New Spring
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-free     Books/376231110.epub 
	Cook, Glen
	Surrender to the Will of the Night
DRM-infested Books/376231528.epub 
	Robert Jordan and Brandon Sanderson
	Towers of Midnight
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/378317076.epub 
	Jordan, Robert
	A Crown of Swords
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/378317808.epub 
	Robert Jordan
	Lord of Chaos
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-free     Books/379451459.epub 
	Le Guin, Ursula K.
	Word for World is Forest, The
DRM-free     Books/37EC6E895E6BD70BEB48D9F1553D608E.epub 
	Eben Hewitt
	Cassandra: The Definitive Guide
DRM-free     Books/380490608.epub 
	Jordan, Robert
	The Eye of the World
DRM-free     Books/380494444.epub 
	Asimov, Isaac
	The End of Eternity
DRM-infested Books/381497257.epub 
	Harold McGee
	On Food and Cooking, The Science and Lore of the Kitchen
DRM-infested Books/381622032.epub 
	IAIN M. BANKS
	Look to Windward
DRM-infested Books/381683084.epub 
	Ursula K. Le Guin
	Tehanu
DRM-infested Books/381935940.epub 
	Richard Adams
	Watership Down
DRM-infested Books/382674388.epub 
	Steven Erikson
	Dust of Dreams
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/383912791.epub 
	Steven Erikson
	Bauchelain and Korbal Broach
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385975662.epub 
	Jordan, Robert
	The Dragon Reborn
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385981104.epub 
	Steven Erikson
	The Bonehunters
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385981116.epub 
	Steven Erikson
	Midnight Tides
DRM-free     Books/385982966.epub 
	Brust, Steven
	To Reign in Hell
DRM-infested Books/385987858.epub 
	Steven Erikson
	Gardens of the Moon
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385989170.epub 
	Steven Erikson
	House of Chains
DRM-infested Books/385992628.epub 
	Jordan, Robert
	The Fires of Heaven
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385992927.epub 
	Steven Erikson
	Toll the Hounds
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385992930.epub 
	Jordan, Robert
	The Great Hunt
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/385998417.epub 
	Jordan, Robert
	The Shadow Rising
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/386016540.epub 
	Esslemont, Ian C. C.
	Night of Knives
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/388403394.epub 
	Steven Erikson
	The Crippled God
DRM-infested Books/389191300.epub 
	Iain M. Banks
	Surface Detail
DRM-infested Books/390877859.epub 
	Loewen, James W.
	Lies My Teacher Told Me
DRM-infested Books/393310992.epub 
	Erikson, Steven
	Memories of Ice
DRM-infested Books/394745271.epub 
	Steven Erikson
	Deadhouse Gates
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-free     Books/394745833.epub 
	Walton, Jo
	Among Others
DRM-free     Books/395536306.epub 
	Sir Arthur Conan Doyle
	The Adventures of Sherlock Holmes
DRM-free     Books/395537209.epub 
	Sir Arthur Conan Doyle
	The Sign of the Four
DRM-free     Books/395539542.epub 
	Sir Arthur Conan Doyle
	A Study in Scarlet
DRM-free     Books/395540660.epub 
	Sir Arthur Conan Doyle
	The Hound of the Baskervilles
DRM-free     Books/395686685.epub 
	Dante Alighieri
	Divine Comedy, Longfellow's Translation, Complete
DRM-free     Books/395688318.epub 
	Edgar Rice Burroughs
	A Princess of Mars
DRM-free     Books/395688375.epub 
	Lafcadio Hearn
	Glimpses of an Unfamiliar Japan / First Series
DRM-infested Books/395926792.epub 
	Fukuyama, Francis
	Origins of Political Order
DRM-infested Books/396269736.epub 
	Herbert, Frank
	Chapterhouse: Dune
DRM-infested Books/398283114.epub 
	Rothfuss, Patrick
	The Wise Man's Fear
DRM-free     Books/3A5FBC58E821CFDF15C8C4E85657481E.epub 
	Jon Hicks
	The Icon Handbook
DRM-free     Books/410943153.epub 
	Edwin A. Abbott (A Square)
	Flatland: A Romance of Many Dimensions
DRM-free     Books/413463878.epub 
	Brust, Steven
	Tiassa
DRM-free     Books/418293515.epub 
	Heinlein, Robert A.
	Glory Road
DRM-infested Books/419950945.epub 
	Isaac Asimov
	Foundation
DRM-infested Books/419950970.epub 
	Isaac Asimov
	Foundation and Empire
DRM-infested Books/419950976.epub 
	Isaac Asimov
	Second Foundation
DRM-infested Books/419968238.epub 
	Scott Lynch
	The lies of Locke Lamora
DRM-infested Books/419968784.epub 
	Scott Lynch
	Red Seas Under Red Skies
DRM-infested Books/420037362.epub 
	Kim Stanley Robinson
	The Years of Rice and Salt
DRM-infested Books/420281728.epub 
	Richard Wiseman
	59 Seconds: Think a Little, Change a Lot
DRM-infested Books/420445771.epub 
	Isaac Asimov
	Foundation’s Edge
DRM-infested Books/420446058.epub 
	Isaac Asimov
	Foundation and Earth
DRM-infested Books/420725428.epub 
	Mike Resnick
	Kirinyaga: A Fable of Utopia
DRM-infested Books/421025353.epub 
	William Dalrymple
	The Last Mughal
DRM-free     Books/421124117.epub 
	Brust, Steven
	The Desecrator
DRM-infested Books/422530144.epub 
	Max Barry
	Machine Man
DRM-free     Books/422718511.epub 
	Apple Inc.
	Mac Integration Basics
DRM-free     Books/426914658.epub 
	Brust, Steven
	Five Hundred Years After
DRM-free     Books/428235697.epub 
	Vinge, Vernor
	A Fire Upon The Deep
DRM-infested Books/429173089.epub 
	Ursula K. Le Guin
	The Other Wind
DRM-infested Books/429173713.epub 
	Ursula K. Le Guin
	Tales from Earthsea
DRM-infested Books/429699133.epub 
	Rajaniemi, Hannu
	The Quantum Thief
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/431617578.epub 
	Walter Isaacson
	Steve Jobs
DRM-infested Books/432519291.epub 
	Susan Weinschenk
	100 Things: Every Designer Needs to Know About People
DRM-infested Books/434509014.epub 
	Stross, Charles
	Rule 34
DRM-free     Books/434522188.epub 
	Larry Niven, Jerry Pournelle
	The Mote In God's Eye
DRM-free     Books/434811509.epub 
	Asher, Neal
	Cowl
DRM-infested Books/436646026.epub 
	Neal Stephenson
	Reamde
DRM-infested Books/436691174.epub 
	Julia Child
	Mastering the Art of French Cooking
DRM-infested Books/443149884.epub 
	Daniel Kahneman
	Thinking, Fast and Slow
DRM-infested Books/446155927.epub 
	Esslemont, Ian C. C.
	Stonewielder
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-free     Books/447591195.epub 
	Asher, Neal
	The Skinner
DRM-infested Books/454252718.epub 
	William B. Norton
	The Internet Peering Playbook: Connecting to the Core of the Internet
DRM-infested Books/455525627.epub 
	Amar Chitra Katha
	Birbal The Genius
DRM-infested Books/458461362.epub 
	Pamela Druckerman
	Bringing Up Bebe
DRM-free     Books/45B90418E467D479DCDDF23B932C648C.epub 
	Douglas Crockford
	JavaScript: The Good Parts
DRM-infested Books/460822066.epub (directory)
	Scott Lynch
	The Republic of Thieves
DRM-free     Books/465679A557523FDB836005CF4BB9380E.epub 
	Scott Berkun
	Mindfire
DRM-infested Books/479594044.epub 
	Saladin Ahmed
	Throne of the Crescent Moon
DRM-infested Books/479717436.epub 
	David Crist
	The Twilight War: The Secret History of America’s Thirty-Year Conflict with Iran
DRM-infested Books/479771801.epub 
	William Dalrymple
	In Xanadu
DRM-infested Books/489957500.epub 
	Bruce Schneier
	Liars and Outliers
DRM-infested Books/491186678.epub 
	James Blish
	Cities in Flight
DRM-infested Books/491668459.epub 
	Neal Asher
	Shadow of the Scorpion
DRM-infested Books/491669284.epub 
	Glen Cook
	A Matter of Time
DRM-infested Books/491669288.epub 
	Glen Cook
	Darkwar
DRM-free     Books/492199230.epub 
	Frederik Pohl
	The Tunnel Under the World
DRM-free     Books/492199569.epub 
	Frederik Pohl
	The Knights of Arthur
DRM-infested Books/494939678.epub 
	Esslemont, Ian C. C.
	Orb Sceptre Throne
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/498634992.epub 
	Charles Stross
	The Apocalypse Codex
DRM-infested Books/499392787.epub 
	Daniel Suarez
	Kill Decision
DRM-free     Books/4AE7DDA54BEEFCD65157927546B18063.epub 
	Roberto Ierusalimschy
	Programming in Lua 2ed
DRM-free     Books/4DCFF682728B765A1CE221F3D7C21536.epub 
	Glen Cook
	Starfishers Volume 3: Stars' End
DRM-free     Books/501278407.epub 
	Colette
	Chéri
DRM-free     Books/501758197.epub 
	David Brin
	Existence
DRM-infested Books/501758516.epub 
	Scalzi, John
	Redshirts
	This is published by Tom Doherty and could be re-downloaded DRM-free
DRM-infested Books/503019669.epub 
	J.R.R. Tolkien
	The Lord of the Rings
DRM-infested Books/503153300.epub 
	J.R.R. Tolkien
	Tales from the Perilous Realm
DRM-infested Books/503154129.epub 
	J. R. R. Tolkien and Christopher Tolkien
	The Book of Lost Tales, Part One
DRM-infested Books/503154991.epub 
	J. R. R. Tolkien and Christopher Tolkien
	The Book of Lost Tales, Part Two
DRM-infested Books/503155327.epub 
	J.R.R. Tolkien
	The Children of Húrin
DRM-infested Books/503163148.epub 
	J.R.R. Tolkien
	The Hobbit Deluxe
DRM-infested Books/503164678.epub 
	J.R.R. Tolkien
	The Silmarillion
DRM-infested Books/503167303.epub 
	J.R.R. Tolkien
	Unfinished Tales of Númenor and Middle-earth
DRM-infested Books/504209078.epub 
	Isaac Asimov
	Prelude to Foundation
DRM-infested Books/504371982.epub 
	Iain M. Banks
	The Hydrogen Sonata
DRM-free     Books/511060740.epub 
	Frederik Pohl
	The Hated
DRM-free     Books/511143617.epub 
	Frederik Pohl
	The Day of the Boomer Dukes
DRM-free     Books/511252896.epub 
	Frederik Pohl
	Pythias
DRM-free     Books/513357605.epub 
	Hannu Rajaniemi
	The Fractal Prince
DRM-free     Books/513868CF0AA46D293EE27F74BC399760.epub 
	Jerry Pournelle
	West of Honor
DRM-infested Books/520233773.epub 
	Nate Silver
	The Signal and the Noise
DRM-free     Books/520897403.epub 
	Hugh Howey
	Wool Omnibus
DRM-free     Books/525170910.epub 
	Cory Doctorow and Charles Stross
	The Rapture of the Nerds
DRM-infested Books/526136048.epub (directory)
	Hetty van de Rijt & Frans Plooij
	The Wonder Weeks
DRM-infested Books/529020127.epub 
	Neal Asher
	The Departure
DRM-infested Books/529424632.epub 
	Guy Gavriel Kay
	The Lions of Al-Rassan
DRM-infested Books/536312376.epub 
	Glen Cook
	Garrett for Hire
DRM-free     Books/537023027.epub 
	Erikson, Steven
	Forge of Darkness
DRM-infested Books/541673159.epub 
	Ursula K. Le Guin
	The Tombs of Atuan
DRM-infested Books/541673162.epub 
	Ursula K. Le Guin
	The Farthest Shore
DRM-infested Books/546126326.epub 
	Jan Morris
	HAV
DRM-infested Books/551241606.epub 
	Ursula K. Le Guin
	A Wizard of Earthsea
DRM-infested Books/551567785.epub 
	Murray R. Spiegel, PhD
	Schaum's Outline Mathematical Handbook of Formulas and Tables, Fourth Edition
DRM-infested Books/551747038.epub 
	Chris Hedges and Joe Sacco
	Days of Destruction Days of Revolt v2b
DRM-infested Books/552144691.epub 
	Tamim Ansary
	Games without Rules
DRM-free     Books/553878102.epub 
	Charles Stross
	A Tall Tail
DRM-infested Books/563408849.epub 
	Iain Banks
	Stonemouth
DRM-infested Books/568731449.epub 
	William Dalrymple
	Return of a King: The Battle for Afghanistan, 1839-42
DRM-infested Books/569232538.epub 
	Neil Gaiman
	The Ocean at the End of the Lane
DRM-free     Books/571678152.epub 
	Glen Cook
	The Return of the Black Company
DRM-free     Books/571678945.epub 
	Glen Cook
	The Many Deaths of the Black Company
DRM-free     Books/573656304.epub 
	Glen Cook
	Chronicles of the Black Company
DRM-free     Books/573656441.epub 
	Glen Cook
	The Books of the South
DRM-free     Books/576233114.epub 
	Jack Vance
	Demon Princes
DRM-infested Books/578851675.epub 
	Zilpha Keatley Snyder
	Below The Root
DRM-infested Books/580642602.epub 
	Max Barry
	Lexicon
DRM-infested Books/588794444.epub 
	Charles Stross
	Neptune's Brood
DRM-free     Books/5CCB889F91585202637A3C5FBFD8409F.epub 
	Glen Cook
	Starfishers-The Starfishers Trilogy Volume II
DRM-infested Books/600938002.epub 
	Gardner Dozois
	The Year’s Best Science Fiction: Thirtieth Annual Collection
DRM-free     Books/606232503.epub 
	Ian C. Esslemont
	Blood and Bone
DRM-free     Books/610920977.epub 
	Robert Jordan and Brandon Sanderson
	A Memory of Light
DRM-free     Books/6122FF30560866BD75257E6CCC264371.epub 
	Fritz Leiber
	Swords and Ice Magic-Fafhrd and the Gray Mouser-Book 6
DRM-infested Books/619483561.epub 
	Raymond E. Feist
	Magician's End
DRM-free     Books/622837311.epub 
	Le Comte De  Lautréamont
	Les chants de Maldoror
DRM-free     Books/62497334A4189B1D00E9BDFD95724E2E.epub 
	Fritz Leiber
	The Knight and Knave of Swords-Fafhrd and the Gray Mouser-Book 7
DRM-infested Books/645571245.epub 
	Iain Banks
	The Quarry
DRM-free     Books/647688922.epub 
	Steven Brust and Skyler White
	The Incrementalists
DRM-infested Books/651331715.epub 
	Susan Crawford
	Captive Audience
DRM-infested Books/654456347.epub 
	Iain Banks
	The Wasp Factory
DRM-infested Books/662310218.epub (directory)
	Various Authors
	Star Wars: Empire Volume 3 – The Imperial Perspective
DRM-infested Books/662310219.epub 
	Paul Gulacy
	Star Wars: Crimson Empire
DRM-infested Books/664297397.epub (directory)
	Various Authors
	Star Wars: Empire, Vol. 4: The Heart of the Rebellion
DRM-infested Books/664343525.epub (directory)
	Paul Chadwick, Doug Wheatley & Tomás Giorello
	Star Wars: Empire, Vol. 2: Darklighter
DRM-infested Books/664894567.epub 
	John Ostrander
	Star Wars: Dawn of the Jedi Volume 1—Force Storm
DRM-infested Books/664910993.epub (directory)
	Scott Allie, Ryan Benjamin & Brian Horton
	Star Wars: Empire Vol. 1
DRM-free     Books/666772E4C21E2017E71C10F1990840BC.epub 
	Pieter Hintjens
	ZeroMQ - Connecting your Code
DRM-infested Books/674224604.epub (directory)
	Various Authors
	Star Wars: Empire, Vol. 5: Allies and Adversaries
DRM-infested Books/674226603.epub (directory)
	Thomas Andrews, Scott Allie & Various Authors
	Star Wars: Empire, Vol. 6: In the Shadows of Their Fathers
DRM-free     Books/697938901.epub 
	Charles Stross
	Equoid: A Laundry Novella
DRM-free     Books/6BBD61A64A46FFFF4AE7C5FCEB9CFCEE.epub 
	David Drake
	Balefires
DRM-free     Books/71C79A253586282ABE73C2975237EB08.epub 
	Glen Cook
	Sung in Blood
DRM-free     Books/73BC5DD0E51611BDC359CBAB48CC203F.epub (directory)
	Crane, Stephen
	The Red Badge of Courage
DRM-free     Books/7A0D1AEE343638A8A3769DABB90CD4CB.epub 
	Clay A. Johnson
	The Information Diet
DRM-free     Books/7B46914B55481714DED3D711288978FA.epub 
	Glen Cook
	Shadowline-The Starfishers Trilogy I
DRM-free     Books/809DCC75FE60E749458CB27636EE6777.epub 
	Mercedes Lackey
	The Secret World Chronicle
DRM-free     Books/8350F5DFECCD6AA88714400FDF4F6831.epub 
	François de La Rochefoucauld
	Réflexions ou Sentences et Maximes Morales
DRM-free     Books/853BD1C65277D276BA09E04EBFEB73EF.epub 
	None
	PostgreSQL 9 Administration Cookbook
DRM-free     Books/85907063B2351E91C6E7F5052C090BFD.epub 
	None
	PostgreSQL 9.0 High Performance
DRM-free     Books/8663E7BF735C06318FF450532A67F1C2.epub 
	Glen Cook
	Passage at Arms
DRM-free     Books/8690D69995483C5D2DF6AD38BE53C1D7.epub 
	Paolo Bacigalupi
	The Windup Girl - Second Electronic Edition
DRM-free     Books/8993920EF1C0E8FE8CB47A20BD955F53.epub 
	Fritz Leiber
	Swords Against Death-Fafhrd and Gray Mouser-Book 2
DRM-free     Books/8DC6ED0A99161EBB516E370A60ED1121.epub 
	Jonathan Zdziarski
	Hacking and Securing iOS Applications
DRM-free     Books/8DD0D22CD05B0B8D52DF9DB93FC8616B.epub 
	Fritz Leiber
	Swords in the Mist-Fafhrd And the Gray Mouser-Book 3
DRM-free     Books/912CB8B110736684549EAC4FC36665AB.epub 
	Neil Gaiman and Dave McKean
	Signal to Noise
DRM-free     Books/93061FDFD6EEC01FD8CE4295A049C97C.epub 
	Cory Doctorow
	Homeland
DRM-free     Books/9592D5001632B94D1FEFC98B3A40E049.epub 
	Kelly Link
	Stranger Things Happen
DRM-free     Books/990CA5799084407151488A7C563DF269.epub 
	Tom Hughes-Croucher
	Node: Up and Running
DRM-free     Books/9CCCA9E217602948B84BE9A7A21C2753.epub 
	Mike Resnick
	Birthright: The Book of Man
DRM-free     Books/9DA2C8D7E941C37FA83192E5849AA850.epub 
	Q. Ethan McCallum
	Parallel R
DRM-free     Books/A7DB557515983DB67F74E2BE351FC319.epub 
	Glen Cook
	Reap the East Wind
DRM-free     Books/B18AB85267B815E540C7E370F8D97726.epub 
	Lars George
	HBase: The Definitive Guide
DRM-free     Books/B1A586D24AE5F2450010F8664F8E059D.epub 
	Cory Doctorow
	Pirate Cinema
DRM-free     Books/B562B5D74F8677C946B0A6F81EB34F1B.epub 
	Gotthold Ephraim Lessing
	Nathan the Wise; a dramatic poem in five acts
DRM-free     Books/BE42D426242836AA171539B7415732E6.epub 
	Glen Cook
	The Swordbearer
DRM-free     Books/BE80AD93781746E45CF95607A7BEE687.epub 
	Charlie Stross
	Bit Rot
DRM-free     Books/C39FBC59673713A57D404D62BE85C4DC.epub 
	Lauren Beukes
	Zoo City
DRM-free     Books/C81CBBD470EFDBC62359BDAD12FDF551.epub 
	Thomas Hobbes
	Leviathan
DRM-free     Books/D17DC1A83BD90AFB24055A01831BFAB4.epub 
	Glen Cook
	The Dragon Never Sleeps
DRM-free     Books/DA8FBAB386EC503A0EF21E481D214521.epub 
	David Flanagan
	JavaScript: The Definitive Guide
DRM-free     Books/DBAD0DE7A4305B90F6AC33C673FE68E8.epub 
	Glen Cook
	A Path to Coldness of Heart
DRM-free     Books/DBCCB54B140D1158250B24B7E3E81B63.epub 
	Scott Chacon
	Pro Git
DRM-free     Books/DF6C56F9A8719294B13BA91BED5E1667.epub 
	Mike Resnick
	Ivory
DRM-free     Books/E8D76FDD351E68D92AE4A3F3AEA7EC6A.epub 
	Paolo Bacigalupi
	Pump Six and Other Stories
DRM-free     Books/EEE65A68378C2510E18DF24CD767AC9A.epub 
	Fritz Leiber
	Swords Against Wizardry-Fafhrd and the Gray Mouser-Book 4
DRM-free     Books/F3BC14466A16C96CCD8FFE00DCCF8147.epub 
	Glen Cook
	An Empire Unacquainted With Defeat
DRM-free     Books/F534835595041374B814D151A633E69D.epub 
	Peter Watts
	Blindsight
DRM-free     Books/F8BF760284ADC800B290DCA6D8EA7EF2.epub 
	Ben Klemens
	21st Century C
DRM-free     Books/FE8C57863E40B98CB732FEE4BFDB60BB.epub 
	Glen Cook
	An Ill Fate Marshalling
8 books are DRM-infested
26 could be cured

Update (2013-11-06):

OS X 10.9 Mavericks and the new iBooks app changed the location of the iBooks directory, so I changed my script accordingly (and made it adjust depending on which OS version you have). Also, the file names have changed and no longer embed author and title, so I am extracting them from the XML metadata files.
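
As a rough illustration of the metadata extraction, here is a minimal Python 3 sketch that pulls the creator and title out of an OPF file's Dublin Core elements. The function name and the sample document are made up for the example; only the namespace URI is taken from the script above.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by OPF metadata (same URI as in the script above).
DC = "{http://purl.org/dc/elements/1.1/}"


def opf_creator_and_title(opf_xml):
    """Return (creator, title) from OPF XML text; None for a missing field."""
    root = ET.fromstring(opf_xml)
    creator = root.find(".//" + DC + "creator")
    title = root.find(".//" + DC + "title")
    return (
        creator.text if creator is not None else None,
        title.text if title is not None else None,
    )


# Made-up sample document, not copied from a real book.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Blood Music</dc:title>
    <dc:creator>Greg Bear</dc:creator>
  </metadata>
</package>"""

if __name__ == "__main__":
    print(opf_creator_and_title(SAMPLE_OPF))
```

Returning `None` for a missing element mirrors the `None` authors that show up for a few books in the listing above.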

Clearing custom crop aspect ratios in Lightroom

Lightroom’s crop tool allows you to constrain the aspect ratio to a proportion of your choice, e.g. to 4:3, defaulting to the same aspect ratio as the original. The last 5 or so custom crop aspect ratios are saved, but a minor annoyance is that you cannot clear the list.

Python on the Mac and SQLite to the rescue: this simple script (lraspect.zip) will reset them. If your Lightroom catalog has a non-default name or location, you will need to edit the script. To run it, quit Lightroom first. The script backs up your catalog for you, just in case.

Needless to say, I cannot be held liable if this script corrupts your catalog or eats your dog (who ate your homework); use at your own risk.

#!/usr/bin/python
import sys, os, sqlite3

# edit this to point to your LR3 catalog if you do not use the default location
lrcat = os.path.expanduser('~/Pictures/Lightroom/Lightroom 3 Catalog.lrcat')

os.system('cp -i "%s" "%s.bak"' % (lrcat, lrcat))
db = sqlite3.connect(lrcat)
c = db.cursor()
c.execute("""select value from Adobe_variablesTable
where name='Adobe_customCropAspects'""")
crops = c.fetchone()[0]
print 'aspect ratios:', crops
c.execute("""update Adobe_variablesTable
set value='{}'
where name='Adobe_customCropAspects'""")
db.commit()
print 'Custom crop aspect ratios reset successfully'
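
For readers on Python 3, the same steps can be sketched as a small function. This is an illustrative variant, not the script in lraspect.zip: the function name is mine, it uses `shutil.copy2` for the backup (which, unlike `cp -i`, overwrites an existing `.bak` without asking), and it passes values as query parameters. The table and variable names are the ones used in the script above.

```python
import shutil
import sqlite3


def reset_custom_crops(catalog_path):
    """Back up the catalog, clear the custom crop list, return the old value."""
    # Copy first so a failed update leaves an untouched backup behind.
    shutil.copy2(catalog_path, catalog_path + ".bak")
    db = sqlite3.connect(catalog_path)
    try:
        row = db.execute(
            "select value from Adobe_variablesTable where name = ?",
            ("Adobe_customCropAspects",),
        ).fetchone()
        db.execute(
            "update Adobe_variablesTable set value = ? where name = ?",
            ("{}", "Adobe_customCropAspects"),
        )
        db.commit()
    finally:
        db.close()
    return row[0] if row else None
```

As with the original, quit Lightroom before running anything like this against a real catalog.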

Just enough Weave

Note: I am keeping this code around for historical purposes, but it has not worked since Weave 1.0 RC2. I created this because Mozilla’s public sync servers were initially quite unreliable, but they have remedied the situation and performance problems are a thing of the past. I also learned the inner workings of Weave/Firefox Sync in the process, and am satisfied as to the security of the system. Since I no longer use Firefox myself, I do not expect to ever revive this project. Feel free to take it over; otherwise you are best served by using Mozilla’s cloud.

Like most of my readers, I use multiple computers: my Mac Pro at home, my MacBook Air when on the road, 3 desktop PCs at work, a number of virtual machines, and so on. I have Firefox installed on all of them. The Mozilla Weave extension allows me to sync bookmarks, passwords et al between them. Weave encrypts this data before uploading it to the server, but I do not like to rely on third-party web services for mission-critical functions (my Mozilla server was down last Monday, for instance, due to the surge of traffic from people returning to work and performing a full sync against 0.5). Through Weave 0.5, I ran my own instance of the Mozilla public Weave server version 0.3. Unfortunately, Weave 0.6 requires server version 0.5 and I had to upgrade.

The open-source Weave server is implemented in PHP. It doesn’t require Apache compiled with mod_dav as early versions did (I prefer to run nginx), but it is still a fairly gnarly piece of code that is anything but plug-and-play. Somehow I had managed to get version 0.3 running on my home server, but no amount of blundering around got me to a usable state with 0.5. I ended up deciding to implement a minimalist Weave server in Python, as it seemed less painful than continuing to struggle with the Mozilla spaghetti code, which confusingly features multiple pieces of code that appear to do exactly the same thing in three different places. Famous last words…

Three days of hacking later, I managed to get it working. 200 or so lines of Python code replaced approximately 12,000 lines of PHP. Of course, I am not trying to reproduce an entire public cloud infrastructure like Mozilla’s, just enough for my own needs, using the “simplest thing that works” principle. Interestingly, the Mozilla code includes a vestigial Python reference implementation of a Weave server for testing purposes. It does not seem to have been working for a while, though. I used it as a starting point but ended up rewriting almost everything. Here are the simplifying hypotheses:

  • My Weave server is meant for a single user (my wife prefers Safari)
  • It does not implement authentication, logging or SSL encryption; it is meant to be used behind an nginx (or Apache) reverse proxy that will perform these functions.
  • It has no configuration file. There are just three variables to set at the top of the source file.
  • It does not implement the full server protocol, just the parts that are actually used by the extension today.
  • More controversially, it does not even implement persistence, keeping all data in RAM instead. Python running on Solaris is very reliable, and the expected uptime of the server is likely months on end. If the server fails, the Firefoxes will just have to perform a full sync and reconciliation. Fortunately, that has been much improved in Weave 0.6, so the cost is minimal. This could even be construed as a security feature, since there is no data on disk to be misplaced. It would take catastrophically losing all my browsers simultaneously to risk data loss. Short of California falling into the ocean, that’s not going to happen, and if it does, I probably have more pressing concerns…

The code could be extended fairly easily to lift these hypotheses, e.g. adding persistence or multiple user support using SQLite, PostgreSQL or MySQL.
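
As a rough illustration of the persistence extension, here is a minimal sketch (Python 3) of a SQLite-backed store that could stand in for the in-memory dictionary of collections. The class name SqliteCollections, the records table and its column names are all hypothetical; nothing like this exists in the server below.

```python
import sqlite3

class SqliteCollections:
    """Hypothetical replacement for the in-memory dict of collections."""

    def __init__(self, path=':memory:'):
        self.db = sqlite3.connect(path)
        self.db.execute("""create table if not exists records (
            collection text, record_id text, payload text,
            primary key (collection, record_id))""")

    def put(self, collection, record_id, payload):
        # 'with self.db' commits on success and rolls back on error
        with self.db:
            self.db.execute("insert or replace into records values (?, ?, ?)",
                            (collection, record_id, payload))

    def get(self, collection, record_id):
        row = self.db.execute(
            "select payload from records where collection=? and record_id=?",
            (collection, record_id)).fetchone()
        if row is None:
            raise KeyError(record_id)  # same failure mode as a dict lookup
        return row[0]

    def delete(self, collection, record_id):
        with self.db:
            cur = self.db.execute(
                "delete from records where collection=? and record_id=?",
                (collection, record_id))
        if cur.rowcount == 0:
            raise KeyError(record_id)

    def values(self, collection):
        rows = self.db.execute(
            "select payload from records where collection=? order by record_id",
            (collection,))
        return [row[0] for row in rows]

store = SqliteCollections()  # pass a file name instead to survive restarts
store.put('bookmarks', 'abc', '{"id": "abc"}')
print(store.get('bookmarks', 'abc'))
```

Raising KeyError for missing records keeps the same contract as the dictionary, so handlers that translate KeyError into a 404 would not need to change.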

Here is the server itself, weave_server.py:

#!/usr/local/bin/python
"""
  Based on tools/scripts/weave_server.py from
  http://hg.mozilla.org/labs/weave/

  do the Simplest Thing That Can Work: just enough to get by with Weave 0.6
  - SSL, authentication and logging are done by nginx or other reverse proxy
  - no persistence, in case of process failure do a full resync
  - only one user. If you need more, create multiple instances on different
    ports and use rewrite rules to route traffic to the right one
"""

import sys, time, logging, socket, urlparse, httplib, pprint
try:
  import simplejson as json
except ImportError:
  import json
import wsgiref.simple_server

URL_BASE = 'https://your.server.name/'
#BIND_IP = ''
BIND_IP = '127.0.0.1'
DEFAULT_PORT = 8000

class HttpResponse:
  def __init__(self, code, content='', content_type='text/plain'):
    self.status = '%s %s' % (code, httplib.responses.get(code, ''))
    self.headers = [('Content-type', content_type),
                    ('X-Weave-Timestamp', str(timestamp()))]
    self.content = content or self.status

def JsonResponse(value):
  return HttpResponse(httplib.OK, value, content_type='application/json')

class HttpRequest:
  def __init__(self, environ):
    self.environ = environ
    content_length = environ.get('CONTENT_LENGTH')
    if content_length:
      stream = environ['wsgi.input']
      self.contents = stream.read(int(content_length))
    else:
      self.contents = ''

def timestamp():
  # Weave rounds to 2 digits and so must we, otherwise rounding errors will
  # influence the "newer" and "older" modifiers
  return round(time.time(), 2)

class WeaveApp():
  """WSGI app for the Weave server"""
  def __init__(self):
    self.collections = {}

  def url_base(self):
    """XXX should derive this automagically from self.request.environ"""
    return URL_BASE

  def ts_col(self, col):
    self.collections.setdefault('timestamps', {})[col] = str(timestamp())

  def parse_url(self, path):
    if not path.startswith('/0.5/') and not path.startswith('/1.0/'):
      return
    command, args = path.split('/', 4)[3:]
    return command, args

  def opts_test(self, opts):
    if 'older' in opts:
      return float(opts['older'][0]).__ge__
    elif 'newer' in opts:
      return float(opts['newer'][0]).__le__
    else:
      return lambda x: True

  # HTTP method handlers

  def _handle_PUT(self, path, environ):
    command, args = self.parse_url(path)
    col, key = args.split('/', 1)
    assert command == 'storage'
    val = self.request.contents
    if val[0] == '{':
      val = json.loads(val)
      val['modified'] = timestamp()
      val = json.dumps(val, sort_keys=True)
    self.collections.setdefault(col, {})[key] = val
    self.ts_col(col)
    return HttpResponse(httplib.OK)

  def _handle_POST(self, path, environ):
    try:
      status = httplib.NOT_FOUND
      if path.startswith('/0.5/') or path.startswith('/1.0/'):
        command, args = self.parse_url(path)
        col = args.split('/')[0]
        vals = json.loads(self.request.contents)
        for val in vals:
          val['modified'] = timestamp()
          self.collections.setdefault(col, {})[val['id']] = json.dumps(val)
        self.ts_col(col)
        status = httplib.OK
    finally:
      return HttpResponse(status)

  def _handle_DELETE(self, path, environ):
    assert path.startswith('/0.5/') or path.startswith('/1.0/')
    response = HttpResponse(httplib.OK)
    if path.endswith('/storage/0'):
      self.collections.clear()
    elif path.startswith('/0.5/') or path.startswith('/1.0/'):
      command, args = self.parse_url(path)
      col, key = args.split('/', 1)
      if not key:
        opts = urlparse.parse_qs(environ['QUERY_STRING'])
        test = self.opts_test(opts)
        col = self.collections.setdefault(col, {})
        for key in col.keys():
          if test(json.loads(col[key]).get('modified', 0)):
            logging.info('DELETE %s key %s' % (path, key))
            del col[key]
      else:
        try:
          del self.collections[col][key]
        except KeyError:
          return HttpResponse(httplib.NOT_FOUND)
    return response

  def _handle_GET(self, path, environ):
    if path.startswith('/0.5/') or path.startswith('/1.0/'):
      command, args = self.parse_url(path)
      return self.handle_storage(command, args, path, environ)
    elif path.startswith('/1/'):
      return HttpResponse(httplib.OK, self.url_base())
    elif path.startswith('/state'):
      return HttpResponse(httplib.OK, pprint.pformat(self.collections))
    else:
      return HttpResponse(httplib.NOT_FOUND)

  def handle_storage(self, command, args, path, environ):
    if command == 'info':
      if args == 'collections':
        return JsonResponse(json.dumps(self.collections.get('timestamps', {})))
    if command == 'storage':
      if '/' in args:
        col, key = args.split('/')
      else:
        col, key = args, None
      try:
        if not key: # list output requested
          opts = urlparse.parse_qs(environ['QUERY_STRING'])
          test = self.opts_test(opts)
          result = []
          for val in self.collections.setdefault(col, {}).itervalues():
            val = json.loads(val)
            if test(val.get('modified', 0)):
              result.append(val)
          result = sorted(result,
                          key=lambda val: (val.get('sortindex'),
                                           val.get('modified')),
                          reverse=True)
          if 'limit' in opts:
            result = result[:int(opts['limit'][0])]
          logging.info('result set len = %d' % len(result))
          if 'application/newlines' in environ.get('HTTP_ACCEPT', ''):
            value = '\n'.join(json.dumps(val) for val in result)
            return HttpResponse(httplib.OK, value,
                                content_type='application/text')
          else:
            return JsonResponse(json.dumps(result))
        else:
          return JsonResponse(self.collections.setdefault(col, {})[key])
      except KeyError:
        if not key: raise
        return HttpResponse(httplib.NOT_FOUND, '"record not found"',
                            content_type='application/json')

  def __process_handler(self, handler):
    path = self.request.environ['PATH_INFO']
    response = handler(path, self.request.environ)
    return response

  def __call__(self, environ, start_response):
    """Main WSGI application method"""

    self.request = HttpRequest(environ)
    method = '_handle_%s' % environ['REQUEST_METHOD']

    # See if we have a method called '_handle_METHOD', where
    # METHOD is the name of the HTTP method to call.  If we do,
    # then call it.
    if hasattr(self, method):
      handler = getattr(self, method)
      response = self.__process_handler(handler)
    else:
      response = HttpResponse(httplib.METHOD_NOT_ALLOWED,
                              'Method %s is not yet implemented.' % method)

    start_response(response.status, response.headers)
    return [response.content]

class NoLogging(wsgiref.simple_server.WSGIRequestHandler):
  def log_request(self, *args):
    pass

if __name__ == '__main__':
  socket.setdefaulttimeout(300)
  if '-v' in sys.argv:
    logging.basicConfig(level=logging.DEBUG)
    handler_class = wsgiref.simple_server.WSGIRequestHandler
  else:
    logging.basicConfig(level=logging.ERROR)
    handler_class = NoLogging
  logging.info('Serving on port %d.' % DEFAULT_PORT)
  app = WeaveApp()
  httpd = wsgiref.simple_server.make_server(BIND_IP, DEFAULT_PORT, app,
                                            handler_class=handler_class)
  httpd.serve_forever()
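
The opts_test method above is terse: it returns a bound comparison method of the threshold, so float('200.5').__le__ is a function answering "is 200.5 <= x?". The standalone sketch below (Python 3, where parse_qs lives in urllib.parse) shows the effect on a few made-up records; the record ids and timestamps are invented for illustration.

```python
from urllib.parse import parse_qs  # urlparse.parse_qs in Python 2

def opts_test(opts):
    """Return a predicate on a record's 'modified' timestamp."""
    if 'older' in opts:
        # threshold >= modified, i.e. the record is at least as old
        return float(opts['older'][0]).__ge__
    elif 'newer' in opts:
        # threshold <= modified, i.e. the record is at least as new
        return float(opts['newer'][0]).__le__
    return lambda modified: True

# Made-up records for illustration only.
records = [{'id': 'a', 'modified': 100.25},
           {'id': 'b', 'modified': 200.5},
           {'id': 'c', 'modified': 300.75}]

def select(query_string):
    test = opts_test(parse_qs(query_string))
    return [r['id'] for r in records if test(r['modified'])]

print(select('newer=200.5'))  # ['b', 'c']
print(select('older=200.5'))  # ['a', 'b']
print(select(''))             # ['a', 'b', 'c']
```

Both comparisons are inclusive, which is why the server rounds timestamps to two digits: a record stamped with exactly the threshold must compare equal rather than fall on one side because of floating-point noise.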

Here is the relevant fragment from my nginx configuration file:

# Mozilla Weave
location /0.5 {
  auth_basic            "Weave";
  auth_basic_user_file  /home/majid/web/conf/htpasswd.weave;
  proxy_pass            http://localhost:8000;
  proxy_set_header      Host $http_host;
}
location /1.0 {
  auth_basic            "Weave";
  auth_basic_user_file  /home/majid/web/conf/htpasswd.weave;
  proxy_pass            http://localhost:8000;
  proxy_set_header      Host $http_host;
}
location /1/ {
  auth_basic            "Weave";
  auth_basic_user_file  /home/majid/web/conf/htpasswd.weave;
  proxy_pass            http://localhost:8000;
  proxy_set_header      Host $http_host;
}

This code is hereby released into the public domain. You are welcome to use it as you wish. Just keep in mind that since it is reverse-engineered, it may well break with future releases of the Weave extension, or if Mozilla changes the server protocol.

Update (2009-10-03):

I implemented some minor changes for compatibility with Weave 0.7. The diff with the previous version is as follows:

--- weave_server.py~	Thu Sep  3 17:46:44 2009
+++ weave_server.py	Sat Oct  3 02:59:19 2009
@@ -65,8 +65,7 @@
     command, args = path.split('/', 4)[3:]
     return command, args

-  def opts_test(self, environ):
-    opts = urlparse.parse_qs(environ['QUERY_STRING'])
+  def opts_test(self, opts):
     if 'older' in opts:
       return float(opts['older'][0]).__ge__
     elif 'newer' in opts:
@@ -92,7 +91,7 @@
   def _handle_POST(self, path, environ):
     try:
       status = httplib.NOT_FOUND
-      if path.startswith('/0.5/') and path.endswith('/'):
+      if path.startswith('/0.5/'):
         command, args = self.parse_url(path)
         col = args.split('/')[0]
         vals = json.loads(self.request.contents)
@@ -113,7 +112,8 @@
       command, args = self.parse_url(path)
       col, key = args.split('/', 1)
       if not key:
-        test = self.opts_test(environ)
+        opts = urlparse.parse_qs(environ['QUERY_STRING'])
+        test = self.opts_test(opts)
         col = self.collections.setdefault(col, {})
         for key in col.keys():
           if test(json.loads(col[key]).get('modified', 0)):
@@ -142,10 +142,14 @@
       if args == 'collections':
         return JsonResponse(json.dumps(self.collections.get('timestamps', {})))
     if command == 'storage':
-      col, key = args.split('/')
+      if '/' in args:
+        col, key = args.split('/')
+      else:
+        col, key = args, None
       try:
         if not key: # list output requested
-          test = self.opts_test(environ)
+          opts = urlparse.parse_qs(environ['QUERY_STRING'])
+          test = self.opts_test(opts)
           result = []
           for val in self.collections.setdefault(col, {}).itervalues():
             val = json.loads(val)
@@ -155,6 +159,8 @@
                           key=lambda val: (val.get('sortindex'),
                                            val.get('modified')),
                           reverse=True)
+          if 'limit' in opts:
+            result = result[:int(opts['limit'][0])]
           logging.info('result set len = %d' % len(result))
           if 'application/newlines' in environ.get('HTTP_ACCEPT', ''):
             value = '\n'.join(json.dumps(val) for val in result)

Update (2009-11-17):

Weave 1.0b1 uses 1.0 as the protocol version string instead of 0.5 but is otherwise unchanged. I updated the script and nginx configuration accordingly.
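
To make the URL layout concrete, here is a small standalone sketch (Python 3) of the same splitting logic as parse_url, accepting either version string. It assumes the second path segment is the user name, which the server ignores; the VERSIONS tuple and the sample paths are made up for illustration. Unlike the original, it returns None for paths that are too short instead of raising.

```python
VERSIONS = ('0.5', '1.0')  # protocol version strings accepted by the server

def parse_url(path):
    """Split '/<version>/<user>/<command>/<args>' into (command, args)."""
    parts = path.split('/', 4)  # ['', version, user, command, args]
    if len(parts) < 5 or parts[0] != '' or parts[1] not in VERSIONS:
        return None
    return parts[3], parts[4]

print(parse_url('/1.0/someuser/storage/bookmarks/abc'))
# ('storage', 'bookmarks/abc')
print(parse_url('/0.5/someuser/info/collections'))
# ('info', 'collections')
```

Limiting the split to four separators keeps the collection and record key together in args, to be split again by the individual handlers.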

Inserting graphviz diagrams in a CVStrac wiki

CVStrac is an amazing productivity booster for any software development group. This simple tool, built around a SQLite database (indeed, by the author of SQLite) combines a bug-tracking database, a CVS browser and a wiki. The three components are fully cross-referenced and build off the strengths of each other. You can handle almost all aspects of the software development process in it, and since it is built on an open database with a radically simple schema, it is trivial to extend. I use CVStrac for Temboz to track bugs, but also to trace changes in the code base to requirements or to bugs, and last but not least, the wiki makes documentation a snap.

For historical reasons, my company uses TWiki for its wiki needs. We configured Apache with mod_rewrite so that the wiki links from CVStrac lead to the corresponding TWiki entry instead of the one in CVStrac itself, which is unused. TWiki is very messy (not surprising, as it is written in Perl), but it has a number of good features like excellent search (it even handles stemming) and a directed graph plug-in that makes it easy to design complex graphs using Bell Labs’ graphviz, without having to deal with the tedious pixel-pushing of GUI tools like Visio or OmniGraffle. The plug-in makes it easy to document UML or E-R graphs, document software dependencies, map process flows and the like.

CVStrac 2.0 introduced extensibility in the wiki syntax via external programs. This allowed me to implement similar functionality in the CVStrac native wiki. To use it, you need to:

  1. Download the Python script dot.py and install it somewhere in your path. The sole dependency is graphviz itself, as well as either pysqlite2 or the built-in version bundled with Python 2.5.
  2. Create a custom wiki markup in the CVStrac setup, of type “Program Block”, with the formatter command-line:
    path/dot.py --db CVStrac_database_file --name '%m'
  3. Insert the graphs using standard dot syntax, bracketed between CVStrac {dot} and {enddot} tags.

As an example of the plug-in at work, here is the graph corresponding to this markup:
{dot}
digraph sw_dependencies {
style=bold;
dpi=72;

temboz [fontcolor=white,style=filled,shape=box,fillcolor=red];
python [fontcolor=white,style=filled,fillcolor=blue];
cheetah [fontcolor=white,style=filled,fillcolor=blue];
sqlite [fontcolor=white,style=filled,fillcolor=blue];

temboz -> cheetah -> python;
temboz -> python -> sqlite -> gawk;
temboz -> cvstrac -> sqlite;
python -> readline;
python -> db4;
python -> openssl;
python -> tk -> tcl;

cvstrac -> "dot.py" -> graphviz -> tk;
"dot.py" -> python;
"dot.py" -> sqlite;
graphviz -> gdpng;
graphviz -> fontconfig -> freetype2;
fontconfig -> expat;
graphviz -> perl;
graphviz -> python;
gdpng -> libpng -> zlib;
gdpng -> freetype2;
}
{enddot}

Another useful plug-in for CVStrac I wrote is one that highlights source code in the CVS browser using the Pygments library. Simply download pygmentize.py and install it under Setup/Diff & Filter Programs/File Filter, using the string path_to/pygmentize.py %F. Here is an example of Pygments applied to pygmentize.py itself:

#!/usr/bin/env python
# $Log: pygmentize.py,v $
# Revision 1.3  2007/07/04 19:54:26  majid
# cope with Unicode characters in source
#
# Revision 1.2  2006/12/23 03:51:03  majid
# import pygments.lexers and pygments.formatters explicitly due to Pygments 0.6
#
# Revision 1.1  2006/12/05 20:19:57  majid
# Initial revision
#
"""
CVStrac plugin to Pygmentize source code
"""
import sys, pygments, pygments.lexers, pygments.formatters

def main():
  assert len(sys.argv) == 2
  block = sys.stdin.read()
  try:
    lexer = pygments.lexers.get_lexer_for_filename(sys.argv[1])
    block = pygments.highlight(
      block, lexer, pygments.formatters.HtmlFormatter(
      style='colorful', linenos=True, full=True))
  except ValueError:
    pass
  print unicode(block).encode('ascii', 'xmlcharrefreplace')

if __name__ == '__main__':
  main()

A Python driver for the Symbol CS 1504 bar code scanner

One of my cousins works for Symbol, the world’s largest bar code reader manufacturer. The fashionable action today is in RFID, but the humble bar code is relatively untapped at the consumer level. The unexpected success of Delicious Library shows people want to manage their collection of books, CDs and DVDs, and as with businesses, scanning bar codes is the fastest and least error-prone way to do so. Delicious Library supports scanning bar codes with an Apple iSight camera, but you have to wonder how reliable that is.

If you want something more reliable, you need a dedicated bar code scanner. They come in a bewildering array of sizes and shapes, from thin wands to pistol-like models or flat ones like those used at your supermarket checkout counter. For some reason, the bar code scanner world seems stuck in the era of serial ports (or worse, PS/2 keyboard wedges), but USB models are available, starting at $70 or so. They emulate a keyboard – when you scan a bar code, they will type in the code (as printed on the label), character by character so as to not overwhelm the application, and follow with a carriage return, which means they can work with almost anything from terminal-based applications to web pages. Ingeniously, most will allow you to program the reader’s settings using a booklet of special bar codes that perform changes like enabling or disabling ISBN decoding, and so on.

The problem with tethered bar code readers is, they are not very convenient if you are trying to catalog items on a bookshelf or read in UPC codes in a supermarket. Symbol has a unit buried deep inside its product catalog, the CS 1504 consumer scanner. This tiny unit (shown below with a canister of 35mm film for size comparison) can be worn on a key chain, although I would worry about damaging the plastic window. Most bar code readers are hulking beasts in comparison. It has a laser bar code scanner: just align the line it projects with the bar code and it will chirp once it has read and memorized the code. The memory capacity is up to 150 bar code scans with timestamps, or 300 without timestamps. The 4 silver button batteries (included) are rated for 5000 scans. AAA batteries would have been preferable, but I guess the unit wouldn’t be so compact. In any case, it is clear this scanner was not intended for heavy-duty commercial inventory tracking purposes.

I bought one to simplify the process of listing books with BookCrossing (even though their site is not optimized for bar code readers), but there are other interesting uses, like finding out more about your daily purchases such as nutritional information or whether the company behind them engages in objectionable business practices. I can also imagine sticking preprinted bar-coded asset tracking tags on inventory (e.g. computers in the case of an IT department), and keeping track of them with this gizmo. People who sell a lot of books or used records through Amazon.com can also benefit as Amazon has a bulk listing service to which you can upload a file with barcodes. An interesting related service is the free UPC database.

Symbol CS 1504
You can order the scanner in either serial ($100) or USB ($110) versions, significantly cheaper than the competition like Intelliscanner (and much smaller to boot). I highly recommend the USB version, even if you have a serial port today: serial ports seem to be going the way of the dodo and your next computer may not have one. The USB version costs slightly more, but that’s because they include a USB-Serial adapter, and you can’t get one retailing for a mere $10. The one shipped with my unit is the newer PN50 cable which uses a Prolific 2303 chipset rather than the older Digi adapter. Wonder of wonders, they even have a Mac OS X driver available.

The scanner ships without any software. Symbol mostly sells through integrators to corporations that buy hundreds or thousands of bar code scanners for inventory or point of sale purposes, and they are not really geared to be a direct to consumer business with all the customer support hassles that entails. There are a number of programs available, mostly for Windows, but they don’t seem to have that much by way of functionality to justify their high prices, often as expensive as the scanner itself.

Symbol does make available an SDK to access the scanner, including complete documentation of the protocol used for the device. While you do have to register, they do not make you jump through the ridiculous hoops required to access the Photoshop plug-in SDK or the Canon RAW decoding SDK. The supplied libraries are Windows-only, however, so I wrote a Python script that works on both Windows and Mac OS X (and probably most UNIX implementations as well, although you will have to use a serial port). The only dependency is the pySerial module.

By default, it will set the clock on the scanner, retrieve the recorded bar codes, correct the timestamps for any drift between the CS 1504’s internal clock and that of the host computer, and if successful clear the unit’s memory and dump the acquired bar codes in CSV format to standard output. The script will also decode ISBN codes (the CS 1504 does not appear to do this by itself in its default configuration). As it is written in Python, it can easily be extended, although it is probably easier to work off the CSV file.
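
For readers curious about the ISBN decoding step: books carry a "Bookland" EAN-13 bar code starting with 978, and the classic 10-digit ISBN can be derived from it by dropping the prefix and the EAN check digit, then recomputing the ISBN check digit. The sketch below (Python 3) shows that standard conversion; it is not taken from cs1504.py, the function names are made up, and the EAN used in the example was reconstructed from the ISBN in the sample session rather than read off a scan.

```python
def isbn10_check_digit(core):
    """Check digit for the first 9 digits of an ISBN-10 (weights 10 down to 2)."""
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return 'X' if check == 10 else str(check)

def ean13_to_isbn10(ean):
    """Convert a 978-prefixed EAN-13 to ISBN-10; return None for anything else."""
    if len(ean) != 13 or not ean.isdigit() or not ean.startswith('978'):
        return None
    core = ean[3:12]  # drop the 978 prefix and the EAN check digit
    return core + isbn10_check_digit(core)

print(ean13_to_isbn10('9781892391193'))  # 1892391198
```

Non-book codes such as the UPC-A in the sample session fall through and return None, so they can be passed along unchanged.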

The only configuration you have to do is set the serial port to use at the top of the script (it should do the right thing on a Mac using the Prolific driver, and the Windows driver seems to always use COM8 but I have no way of knowing if this is by design or coincidence). The program is still very rough, especially as concerns error recovery, and I appreciate any feedback.

A sample session follows:

ormag ~>python cs1504.py > barcodes.csv
Using device /dev/cu.usbserial...  connected
serial# 000100000003be95
SW version NBRIKAAE
reading clock for drift
clock drift 0:00:01.309451
resetting scanner clock... done
reading barcodes... done (2 read)
clearing barcodes... done
powering down... done

ormag ~>cat barcodes.csv
UPCA,034571575179,2006-03-27 01:08:48
ISBN,1892391198,2006-03-27 01:08:52
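
The drift correction and CSV output seen in this session can be sketched as follows (Python 3). This is an illustration of the idea rather than the actual code of cs1504.py: the function names are hypothetical, the drift is assumed to be host time minus scanner time, and the 2-second drift and raw timestamps are made up so that the result matches the sample output above.

```python
import csv
import io
from datetime import datetime

def correct_for_drift(scans, scanner_now, host_now):
    """Shift every scan timestamp by the scanner clock's offset from the host."""
    drift = host_now - scanner_now  # assumption: positive if the scanner runs slow
    return [(symbology, code, stamp + drift) for symbology, code, stamp in scans]

def to_csv(scans):
    """Format scans as symbology,barcode,timestamp lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for symbology, code, stamp in scans:
        writer.writerow([symbology, code, stamp.strftime('%Y-%m-%d %H:%M:%S')])
    return out.getvalue()

# Made-up raw readings from a scanner whose clock is 2 seconds slow.
raw = [('UPCA', '034571575179', datetime(2006, 3, 27, 1, 8, 46)),
       ('ISBN', '1892391198', datetime(2006, 3, 27, 1, 8, 50))]
corrected = correct_for_drift(raw,
                              scanner_now=datetime(2006, 3, 27, 1, 10, 0),
                              host_now=datetime(2006, 3, 27, 1, 10, 2))
print(to_csv(corrected), end='')
```

Bar codes are kept as strings throughout so that leading zeros, as in the UPC-A code, survive the round trip.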

Update (2006-07-21):

At the prompting of some Windows users, I made a slightly modified version, win_cs1504.py, that will copy the barcodes to the clipboard, and also insert the symbology, barcode and timestamp starting on the first free line in the active Excel spreadsheet (creating one if necessary).

Update (2007-01-20):

Just to make it clear: I hereby place this code in the public domain.

Update (2009-11-06):

For Windows users, I have put up videos describing how to install the Prolific USB to serial driver, Python and requisite extensions, and how to use the program itself.

Update (2012-07-05):

I moved the script over to GitHub. Please file bug reports and enhancement requests there. Fatherhood and a startup don’t leave me much time to maintain this, so I make no promises, but this should allow people who make fixes to contribute them back (or fork).

A reader-writer lock for Python

Python offers a number of useful synchronization primitives in the threading and Queue modules. One that is missing, however, is a simple reader-writer lock (RWLock). A RWLock allows improved concurrency over a simple mutex, and is useful for objects that have high read-to-write ratios like database caches.

Surprisingly, I haven’t been able to find any implementation of these semantics, so I rolled my own in a module rwlock.py to implement a RWLock class, along with lock promotion/demotion. Hopefully it can be added to the standard library threading module. This code is hereby placed in the public domain.

"""Simple reader-writer locks in Python
Many readers can hold the lock XOR one and only one writer"""
import threading

version = """$Id: 04-1.html,v 1.3 2006/12/05 17:45:12 majid Exp $"""

class RWLock:
  """
A simple reader-writer lock. Several readers can hold the lock
simultaneously, XOR one writer. Write locks have priority over reads to
prevent write starvation.
"""
  def __init__(self):
    self.rwlock = 0
    self.writers_waiting = 0
    self.monitor = threading.Lock()
    self.readers_ok = threading.Condition(self.monitor)
    self.writers_ok = threading.Condition(self.monitor)
  def acquire_read(self):
    """Acquire a read lock. Several threads can hold this type of lock.
It is exclusive with write locks."""
    self.monitor.acquire()
    while self.rwlock < 0 or self.writers_waiting:
      self.readers_ok.wait()
    self.rwlock += 1
    self.monitor.release()
  def acquire_write(self):
    """Acquire a write lock. Only one thread can hold this lock, and
only when no read locks are also held."""
    self.monitor.acquire()
    while self.rwlock != 0:
      self.writers_waiting += 1
      self.writers_ok.wait()
      self.writers_waiting -= 1
    self.rwlock = -1
    self.monitor.release()
  def promote(self):
    """Promote an already-acquired read lock to a write lock
    WARNING: it is very easy to deadlock with this method"""
    self.monitor.acquire()
    self.rwlock -= 1
    while self.rwlock != 0:
      self.writers_waiting += 1
      self.writers_ok.wait()
      self.writers_waiting -= 1
    self.rwlock = -1
    self.monitor.release()
  def demote(self):
    """Demote an already-acquired write lock to a read lock"""
    self.monitor.acquire()
    self.rwlock = 1
    self.readers_ok.notifyAll()
    self.monitor.release()
  def release(self):
    """Release a lock, whether read or write."""
    self.monitor.acquire()
    if self.rwlock < 0:
      self.rwlock = 0
    else:
      self.rwlock -= 1
    wake_writers = self.writers_waiting and self.rwlock == 0
    wake_readers = self.writers_waiting == 0
    self.monitor.release()
    if wake_writers:
      self.writers_ok.acquire()
      self.writers_ok.notify()
      self.writers_ok.release()
    elif wake_readers:
      self.readers_ok.acquire()
      self.readers_ok.notifyAll()
      self.readers_ok.release()

if __name__ == '__main__':
  import time
  rwl = RWLock()
  class Reader(threading.Thread):
    def run(self):
      print self, 'start'
      rwl.acquire_read()
      print self, 'acquired'
      time.sleep(5)
      print self, 'stop'
      rwl.release()
  class Writer(threading.Thread):
    def run(self):
      print self, 'start'
      rwl.acquire_write()
      print self, 'acquired'
      time.sleep(10)
      print self, 'stop'
      rwl.release()
  class ReaderWriter(threading.Thread):
    def run(self):
      print self, 'start'
      rwl.acquire_read()
      print self, 'acquired'
      time.sleep(5)
      rwl.promote()
      print self, 'promoted'
      time.sleep(5)
      print self, 'stop'
      rwl.release()
  class WriterReader(threading.Thread):
    def run(self):
      print self, 'start'
      rwl.acquire_write()
      print self, 'acquired'
      time.sleep(10)
      print self, 'demoted'
      rwl.demote()
      time.sleep(10)
      print self, 'stop'
      rwl.release()
  Reader().start()
  time.sleep(1)
  Reader().start()
  time.sleep(1)
  ReaderWriter().start()
  time.sleep(1)
  WriterReader().start()
  time.sleep(1)
  Reader().start()
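
For comparison, here is a compact sketch of the same semantics (many readers XOR one writer, with waiting writers taking priority) in Python 3 syntax, using a single condition variable and context managers so that a lock cannot be left held by accident. It is not a drop-in replacement for rwlock.py: the class and method names are hypothetical, and promotion and demotion are left out.

```python
import threading
from contextlib import contextmanager

class SketchRWLock:
    """Many readers XOR one writer; waiting writers block new readers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # number of read locks currently held
        self._writer = False       # True while a write lock is held
        self._writers_waiting = 0  # writers queued, used for write priority

    @contextmanager
    def read_locked(self):
        with self._cond:
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # a waiting writer may proceed

    @contextmanager
    def write_locked(self):
        with self._cond:
            self._writers_waiting += 1
            try:
                while self._writer or self._readers:
                    self._cond.wait()
            finally:
                self._writers_waiting -= 1
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()  # wake both readers and writers

lock = SketchRWLock()
cache = {}
with lock.write_locked():
    cache['answer'] = 42
with lock.read_locked():
    print(cache['answer'])
```

Using one condition variable with notify_all is simpler than separate reader and writer conditions, at the cost of some spurious wake-ups when many threads are waiting.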

Threadframe: multithreaded stack frame extraction for Python

Note: threadframe is obsolete. Python 2.5 and later include a function sys._current_frames() that does the same thing. Threadframe is only useful for Python 2.2 through 2.4.

Rationale

I was encountering deadlocks in a multi-threaded CORBA server (implemented using omniORB). Debugging using GDB gave me too low-level information, and what I needed was an equivalent of the GDB command “info threads”. There was no such facility available from within Python’s standard library, so I rolled my own.

David Beazley added advanced debugging functions to the Python interpreter, and they have been folded into the 2.2 release.

I used these hooks to build a debugging module that is useful when you are looking for deadlocks in a multithreaded application. It basically has a single function that will return a list of the stack frames for all Python interpreter threads in the process.

Guido van Rossum added in Python 2.3 the thread ID to the interpreter state structure, and this allows us to produce a dictionary mapping thread IDs to frames.

This functionality is now integrated in Python 2.5’s batteries-included sys._current_frames() function.
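
With sys._current_frames(), a rough equivalent of GDB's "info threads" takes only a few lines. The sketch below is written for Python 3; dump_thread_stacks is a hypothetical helper name, not part of threadframe or the standard library.

```python
import sys
import threading
import traceback

def dump_thread_stacks():
    """Return {thread_id: (thread_name, formatted_stack)} for every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = {}
    for thread_id, frame in sys._current_frames().items():
        stack = ''.join(traceback.format_stack(frame))
        report[thread_id] = (names.get(thread_id, 'unknown'), stack)
    return report

if __name__ == '__main__':
    # A worker parked on an event, standing in for a deadlocked thread.
    parked = threading.Event()
    worker = threading.Thread(target=parked.wait, name='parked-worker')
    worker.start()
    for thread_id, (name, stack) in dump_thread_stacks().items():
        print('Thread %s (%s)\n%s' % (thread_id, name, stack))
    parked.set()
    worker.join()
```

In a deadlocked program, the innermost line of each stack shows which lock acquisition or wait call each thread is stuck in.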

Of course, I disclaim any liability if this code should crash your system, erase your homework, eat your dog (who also ate your homework) or otherwise have any undesirable effect.

Building and installing

Python 2.2 or later is required. Thread ID to frame dictionary extraction is only available in Python 2.3 and later, and will generate a NotImplementedError if used from 2.2.

Download the source tarball threadframe-0.2.tar.gz. You can build it using the Makefile or directly with the setup.py script. I have built and tested this only on Solaris 8/x86 and Windows 2000, but the code should be pretty portable. There is a small test program test.py that illustrates how to use this module to dump stack frames of all the Python interpreter threads. A sample run is available for your perusal.

For Windows users, I have pre-compiled binaries available, built using Mingw32 and GCC 2.95.2. Just copy the file threadframe.pyd to any location in your Python path and you should be able to run the test script test.py.

Windows binaries:

Python version    Download
2.2.1             threadframe.pyd (/python/threadframe/win32/2.2/threadframe.pyd)
2.3.4             threadframe.pyd (/python/threadframe/win32/2.3/threadframe.pyd)
2.4.x             threadframe.pyd (/python/threadframe/win32/2.4/threadframe.pyd)

License

This code is licensed under the same terms as Python itself.

Change history

Release 0.2 (2004-06-10)

Distutils based setup.py contributed by Bob Ippolito. Bob also noticed that thread_id was added to the Python interpreter state, and contributed a patch to get a dictionary mapping thread_ids to frames instead of a list.

Release 0.1 (2002-10-11)

Initial release for Python 2.2: threadframe-0.1.tar.gz

The Temboz RSS aggregator

2013-03-14: Google’s announcement that their Reader service will be discontinued has spurred interest in Temboz. This software is not dead; in fact, I use it daily, but I have not made an official release in a long time. You should use the version from GitHub instead. There are currently a number of bugs that can cause Temboz to lock up and require a restart. I am planning to complete my long-overdue overhaul before Google’s July deadline.

Introduction

Temboz is an RSS aggregator. It is inspired by FeedOnFeeds (web-based personal aggregator), Google News (two-column layout) and TiVo (thumbs up and down). I have been using FeedOnFeeds for some time now, but that software seems to have stopped evolving, and I had a number of optimizations to the user experience I wanted to make.

Features

Already implemented:

  • Multithreaded, download feeds in parallel.
  • Built-in web server.
  • Two-column user interface for better readability and information density. Automatic reflow using CSS.
  • Ratings system for articles
  • Real-time hunter-gatherer user interface: items flagged with a “Thumbs down” disappear immediately off the screen (using Dynamic HTML), making room for new articles. No laborious flagging of items as in FeedOnFeeds.
  • Filtering entries (using Python syntax, e.g. 'Salon' in feed_title and title == "King Kaufman's Sports Daily", or simply by selecting keywords/phrases and hitting “Thumbs down”).
  • Ability to generate an RSS feed from “Thumbs Up” articles, which is why Temboz would be a true aggregator, not just a reader.
  • Ad filtering
  • Automatic garbage collection: every day between 3AM and 4AM, uninteresting articles (by default those older than 7 days) are purged of their contents (but not metadata such as titles, permalinks or timestamps) to keep the database size manageable. After 6 months (by default), they are deleted altogether.
  • Automatic database backups daily (immediately after garbage collection)
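
To illustrate the filtering feature listed above, here is a minimal sketch of how a rule written as a Python expression could be evaluated against an entry. This is not Temboz’s actual code: the function name matches and the dictionary layout are assumptions, and only the field names feed_title and title come from the example above. It is written for Python 3.

```python
def matches(rule, entry):
    """Evaluate a filter rule (a Python expression) against one entry.

    `entry` is a dict whose keys become the variables the rule can use.
    """
    # Hide the builtins so a rule only sees the entry's own fields.
    # This guards against accidents; it is not a security sandbox.
    return bool(eval(rule, {"__builtins__": {}}, dict(entry)))

# The rule quoted in the feature list, applied to a made-up entry
rule = "'Salon' in feed_title and title == \"King Kaufman's Sports Daily\""
entry = {"feed_title": "Salon", "title": "King Kaufman's Sports Daily"}
filtered = matches(rule, entry)
```

Because rules are ordinary expressions, anything Python can express over the entry fields (substring tests, comparisons, boolean combinations) is available without inventing a separate rule language.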

On the to do list:

  • Write better documentation
  • Handle permanent HTTP redirects for feed XML URLs
  • Automatic pacing of feed polling intervals using the average and standard deviation of observed feed item inter-arrival times, to reduce bandwidth usage and load for both client and server. Most feeds should be polled on a daily rather than hourly interval (e.g. my own, since I update once a week on average), but the mechanisms for a feed to indicate its polling rate preferences are quite inconsistent from one flavor of RSS/Atom to another.
  • “Survivor mode” – vote feeds that no longer perform off the aggregator based on relevance statistics.
  • Ability to cluster together articles (I tried a heuristic of looking for common URLs they are all pointing to, but this didn’t work well in practice).
  • Portability to Windows, distribution as a standalone package.
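
The polling-interval idea in the to-do list could be sketched as follows. Since the feature is not implemented, everything here is hypothetical: the function name, the floor and ceiling defaults, and the rule of polling one standard deviation sooner than the average gap are my assumptions, shown only to make the idea concrete (Python 3).

```python
import statistics

def polling_interval(arrival_times, floor=3600, ceiling=7 * 86400):
    """Suggest a polling interval in seconds from item arrival timestamps."""
    times = sorted(arrival_times)
    # Inter-arrival times between consecutive items
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return floor  # not enough history to estimate a spread
    mean = statistics.mean(gaps)
    spread = statistics.stdev(gaps)
    # Hypothetical rule: poll one standard deviation sooner than the
    # average gap, clamped between the floor and the ceiling.
    return max(floor, min(ceiling, mean - spread))
```

A feed that posts exactly once a day would be polled daily, while a bursty feed would be polled more often than its average gap suggests.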

History

I have been using it successfully for well over a year. It still has rough edges, with some administration functions only doable using the SQLite command-line utility. Here is a screen shot showing the reader user interface. The article highlighted in yellow was given a “Thumbs Up”. You can also see the user interface at work in a view of the last 50 articles I flagged as “thumbs up” among the feeds I read.

Screen shots

Click on a screen shot thumbnail for a full-sized version

The first screen shot shows the article reading interface, using a two-column layout. Clicking on the “Thumbs down” icon makes the article disappear, bringing a new one in its place (if available). Clicking on the “Thumbs up” icon highlights it in yellow and flags it as interesting in the database.

[Screen shot: view items]

The feed summary page shows statistics on feeds, starting with feeds with unread articles, then by alphabetical order. Feeds can be sorted based on other metrics. You have the option of “catching up” with a feed (marking all the articles as read). Feeds with errors are highlighted in red (not shown).

[Screen shot: view feeds]

Clicking on the “details” link for a feed brings up this page, which allows you to change the title or feed URL, and shows the RSS or Atom fields accessible for filtering.

[Screen shot: feed details]

Feeds can be filtered using Python expressions.

[Screen shot: filtering rules]

Known bugs

You can check outstanding bug reports, change requests and more at the public CVStrac site.

Credits

Temboz is written in Python, and leverages Mark Pilgrim’s Ultra-liberal feed parser, SQLite 2.x and Cheetah.

Download

You can download the current version: temboz-0.8.tar.gz. I welcome any feedback you may have, especially on improving installation.

The CVS version is far ahead of 0.8 in features. I have not yet had the time to test and document the migration procedure from 0.8 to 1.0, but if you are a new Temboz user I strongly advise you to get a nightly CVS snapshot instead (they are what I run on my own server): temboz-CVS.tar.gz or temboz-CVS.zip.

Updates

For news on Temboz, please subscribe to the RSS feed.

Temboz has a CVStrac where you can submit bug reports or change requests, and a Wiki, where all future documentation will ultimately reside.

Post scriptum

The name “Temboz” is a reference to Malima Temboz, “The mountain that walks”, an elephant whose tormented spirit is the object of Mike Resnick’s excellent SF novel, Ivory.

Data mining Outlook for fun and profit

For a few years now, I have owned the domain name majid.fm. Dot-fm stands for the Federated States of Micronesia, a micro-state in the Pacific Ocean, and they market their domain names to FM radio stations. Those are also my initials. Unfortunately, the registration fees are quite expensive ($200 every two years), and the domain is redundant now that I have acquired majid.info and majid.org (majid.com is reserved by a Malaysian cybersquatter who is demanding a couple thousand dollars for it – I may be vain, but not that vain). I have decided to let the domain lapse when it expires on April 1st.

I used the majid-dot-FM domain for my emails, and set it up so emails sent to anything @majid.fm would be sent to my primary mailbox fazal@majid.fm. For instance, if I registered with Dell, I would give them the email address dell@majid.fm. This was helpful in tracing where I got my email from, and blacklisting companies that started spamming me (they shall remain nameless to protect the guilty yet litigious).

Unfortunately, spammers and some worms attempt dictionary attacks by trying all possible combinations like jim@majid.fm, smith@majid.fm, and so on. My spam filter would catch some, but not all of them, and it would be a terrible hassle. I do not want to have an auto-responder send emails back to people who email me at the old address, as this would at best flood innocent people whose addresses spammers are impersonating, and at worst actually give my new address to the spammers.

My solution to this dilemma is a Python script that scans through all the emails in my Outlook personal folder (PST) files of archived emails and flags everyone who sent me an email. I then manually send them a change of address notification (or, in the case of websites and online stores, update my contact info online).

Simply using Outlook’s advanced search function will not work, as in many cases the To: header is set to something other than the address the email is delivered to, such as undisclosed-recipients, or the sender’s address when they send the email to multiple Bcc: recipients (the proper way to proceed when you want to send an email to multiple recipients without giving everyone in the list the email addresses of the other recipients). I actually have to sift through the raw message headers to see the envelope destination address.

Here is a simplified version of olmine.py, the script I used. It requires Python 2.x with the win32all extensions, and Outlook 2000 with the Collaboration Data Objects (CDO) option installed (this is not the default). CDO is required to access the full headers. Of course, this script can be useful for all sorts of social network analysis fun on your own Outlook files, or more prosaically to generate a whitelist of email addresses for your spam filter.

import re, win32com.client

srcs = {}
dsts = {}
pairs = {}

# regular expression that scans for valid email addresses in the headers
m_re = re.compile(r'[-A-Za-z0-9.,_]*@majid\.fm')
# regular expression that strips out headers that can cause false positives
strip_re = re.compile(r'(Message-Id:.*$|In-Reply-To:.*$|References:.*$)',
                      re.IGNORECASE | re.MULTILINE)

def dump_folder(folder):
  """Iterate recursively over the given folder and its subfolders"""
  print '-' * 72
  print folder.Name
  print '-' * 72
  for i in range(1, folder.Messages.Count + 1):
    try:
      # PR_SENDER_EMAIL_ADDRESS
      _from = folder.Messages[i].Fields[0x0C1F001F].Value
      # PR_TRANSPORT_MESSAGE_HEADERS
      headers = folder.Messages[i].Fields[0x7d001e].Value
    except:
      # ignore non-email objects like contacts or calendar entries
      continue
    stripped_headers = strip_re.sub('', headers)
    for _to in m_re.findall(stripped_headers):
      srcs[_from] = srcs.get(_from, 0) + 1
      dsts[_to] = dsts.get(_to, 0) + 1
      if (_from, _to) not in pairs:
        print _from, '->', _to
      pairs[_from, _to] = pairs.get((_from, _to), 0) + 1
  # recurse
  for i in range(1, folder.Folders.Count + 1):
    dump_folder(folder.Folders[i])

# connect to Outlook via CDO
cdo = win32com.client.Dispatch('MAPI.Session')
cdo.Logon()
# iterate over all the open PST files
for i in range(1, cdo.InfoStores.Count + 1):
  store = cdo.InfoStores[i]
  root = store.RootFolder
  m = root.Messages
  store.ID
  print '#' * 72
  print store.Name
  print '#' * 72
  dump_folder(root)
cdo.Logoff()
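
To see what the two regular expressions do without needing Outlook, here is a small standalone sketch that applies them to a made-up block of headers. The sample headers and the helper name delivered_to are invented for illustration; the expressions themselves are copied from the script above. It runs under Python 3.

```python
import re

# Same expressions as in the script above
m_re = re.compile(r'[-A-Za-z0-9.,_]*@majid\.fm')
strip_re = re.compile(r'(Message-Id:.*$|In-Reply-To:.*$|References:.*$)',
                      re.IGNORECASE | re.MULTILINE)

# Made-up headers: the To: line hides the real recipient, but the
# Received: line still carries the envelope destination address.
sample_headers = (
    "Received: from mail.example.com by mx.example.net\n"
    "\tfor <dell@majid.fm>; Mon, 1 Mar 2004 10:00:00 -0800\n"
    "Message-Id: <12345@majid.fm>\n"
    "To: undisclosed-recipients:;\n"
)

def delivered_to(headers):
    """Return the majid.fm addresses found once noisy headers are removed."""
    return m_re.findall(strip_re.sub('', headers))
```

Without the stripping step, the Message-Id would be reported as if it were a destination address, which is the false positive strip_re is there to prevent.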

Debugging DCOracle2 applications

DCOracle2 is the Oracle interface module for Python I use most often. It is advertised as “beta”, but quite suitable for production use, aside from a few minor rough edges. There are a few others, most notably cx_Oracle, but I can’t vouch for them.

Debugging applications that make use of DCOracle2 can be challenging, as with any database environment, especially in a multi-threaded server context. I have developed a small utility module to aid in development. When it is imported, it will automatically trace all database calls made through DCOracle2, including arguments such as bind variables. More interestingly, it will also automatically run EXPLAIN PLAN on queries taking longer than 2 seconds (by default), to aid in tuning SQL statements. As a side bonus, if run by itself, it provides a (very basic) SQL shell that does offer command-line history and editing, something Oracle hasn’t managed to provide in SQL*Plus in almost 30 years 🙂

This code works with Python 2.2 and DCOracle2 1.1 and 1.3 beta. It will not work with Python 2.1 and earlier.

The latest version of the module file can be downloaded here: debug_ora.py, as well as the RCS repository debug_ora.py,v for those who care about this kind of stuff.

An example run of the module:

% python debug_ora.py scott/tiger@repos
SQL> select ename, job, dname from emp, dept where emp.deptno=dept.deptno;
SQL: Oct-03-2003 17:32:39:897
select ename, job, dname from emp, dept where emp.deptno=dept.deptno
ARG: () {}
SQL: !!!!!!!!!!!!!!!! slow query, time = 0.0 sec
SQL: !!!!!!!!!!!!!!!! execution plan follows
000      SELECT STATEMENT Optimizer=CHOOSE
001        NESTED LOOPS
002 001      TABLE ACCESS (FULL) ON EMP
003 001      TABLE ACCESS (BY INDEX ROWID) ON DEPT
004 003        INDEX (UNIQUE SCAN) ON PK_DEPT

ENAME  JOB       DNAME
------ --------- ----------
SMITH  CLERK     RESEARCH
ALLEN  SALESMAN  SALES
WARD   SALESMAN  SALES
JONES  MANAGER   RESEARCH
MARTIN SALESMAN  SALES
BLAKE  MANAGER   SALES
CLARK  MANAGER   ACCOUNTING
SCOTT  ANALYST   RESEARCH
KING   PRESIDENT ACCOUNTING
TURNER SALESMAN  SALES
ADAMS  CLERK     RESEARCH
JAMES  CLERK     SALES
FORD   ANALYST   RESEARCH
MILLER CLERK     ACCOUNTING
SQL>
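
The tracing technique itself is not specific to Oracle. As a rough sketch (not the code of debug_ora.py), the wrapper below logs each statement and its bind variables and flags slow queries for any DB-API cursor. The class name TracingCursor and its parameters are my own, the automatic EXPLAIN PLAN step is left out because it is Oracle-specific, and the example is written for Python 3 so it can be tried with the standard sqlite3 module.

```python
import time

SLOW_THRESHOLD = 2.0  # seconds, mirroring the default mentioned above

class TracingCursor:
    """Wrap a DB-API cursor, logging each statement and its duration."""

    def __init__(self, cursor, log=print, threshold=SLOW_THRESHOLD):
        self._cursor = cursor
        self._log = log
        self._threshold = threshold

    def execute(self, sql, params=()):
        self._log("SQL: %s" % sql)
        self._log("ARG: %r" % (params,))
        start = time.perf_counter()
        result = self._cursor.execute(sql, params)
        elapsed = time.perf_counter() - start
        if elapsed >= self._threshold:
            self._log("SQL: slow query, time = %.1f sec" % elapsed)
        return result

    def __getattr__(self, name):
        # Delegate everything else (fetchall, close, ...) to the real cursor
        return getattr(self._cursor, name)
```

For instance, TracingCursor(sqlite3.connect(":memory:").cursor()) behaves like the original cursor, except that every execute() call is traced.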

Obtaining tracebacks on other threads than the current thread

Note: this entry has been superseded and is maintained only for historical purposes. Among other things, the restriction of not being able to find the stack frame for a specific thread has been lifted with changes in Python 2.3.

David Beazley added advanced debugging functions to the Python interpreter, and they have been folded into the 2.2 release.

I used these hooks to build a debugging module that is useful when you are looking for deadlocks in a multithreaded application. It basically has a single function that will return a list of the stack frames for all Python interpreter threads in the process.

Unfortunately, I was unable to find a way to get a stack frame for a specific thread (either by the thread ID or using threading Thread objects), as Python does not save the thread ID in its thread state.

Of course, I disclaim any liability if this code should crash your system, erase your homework, eat your dog (who also ate your homework) or otherwise have any undesirable effect.

Building and installing

Download threadframe-0.1.tar.gz. You can use the Makefile. I’ve built and tested this only on Solaris 8/x86 and Windows 2000, but the code should be pretty portable. There is a small test program test.py that illustrates how to use this module to dump stack frames of all the Python interpreter threads. A sample run is available for your perusal.

For Windows users, a pre-compiled binary for the standard Python 2.2.1 distribution is available: threadframe.pyd. Just copy this file to any location in your Python path and you should be able to run the test script test.py.

Objects are Aristotelian

One of the unquestioned assumptions behind object-oriented programming is that objects are instances of a class, and thus implicitly stay that way. This is akin to the philosophical concept of nature, as in an invariant quality of something, that cannot be changed:

But is there any one thus intended by nature to be a slave, and for whom such a condition is expedient and right, or rather is not all slavery a violation of nature?

There is no difficulty in answering this question, on grounds both of reason and of fact. For that some should rule and others be ruled is a thing not only necessary, but expedient; from the hour of their birth, some are marked out for subjection, others for rule.

Again, the male is by nature superior, and the female inferior; and the one rules, and the other is ruled; this principle, of necessity, extends to all mankind.

It is clear, then, that some men are by nature free, and others slaves, and that for these latter slavery is both expedient and right.

Aristotle, Politics I, 5 (emphasis mine)

Needless to say, this concept is reactionary. One may well object that given slavery’s omnipresence in antiquity, even a great philosopher such as Aristotle could not be entirely free of the prejudices of his time. This conveniently ignores the fact that Aristotle was a pupil of Plato, himself a disgruntled aristocrat who collaborated with Spartans when they overthrew Athenian democracy after the Peloponnesian War, and is arguably one of the theoretical founders of the totalitarian state. I would say it is rather the presumed greatness of Aristotle that should be reexamined, but I digress. For more on this subject, read Karl Popper’s The Open Society and its Enemies – Volume 1, The Spell of Plato.

Thus, OOP carries within it the conservatism of Plato and Aristotle, people who resented how the young Athenian democracy had usurped the aristocracy’s natural (in their eyes) right to rule over others. This is not just an academic consideration. Computer programmers influence society, especially those who work on governmental information systems, and if you consider the Sapir-Whorf hypothesis, the language they use affects the way they think.

This is why I like Python’s ability to morph an object from one class to another:

Python 2.2.1 (#1, Apr 18 2002, 13:06:27)
[GCC 2.95.3 20010315 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> class Slave:
...     def whip(self):
...             return 'Yes, master'
...
>>> class Freeman:
...     def whip(self):
...             return 'Die, fascist scum!'
...
>>> man = Slave()
>>> man.whip()
'Yes, master'
>>> man.__class__ = Freeman
>>> man.whip()
'Die, fascist scum!'

Using Wake-on-LAN with Python

Most modern PCs and Macintoshes feature Wake-on-LAN. This feature, originally called “Magic packet” (PDF) by AMD, allows you to start a PC remotely by sending a specially formed “magic packet” to its Ethernet interface. On Macs running OS X, Wake-on-LAN seems to work only when the Mac is in sleep mode, not when it is completely turned off. The original intent was to allow administrators to boot PCs remotely to run backups, but with the spread of DSL, there are other uses.

For instance, I have a low-noise Solaris machine running 24/7 at my home (angband.majid.fm), and when I need to access my (noisy) home PC, I just log on to that machine via SSH, wake up the PC and then log on remotely using pcAnywhere. The same works with my iMac G4.

Here is a very simple Python script that starts a machine with a given MAC address:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto('\xff'*6 + '\x00\x02\xb3\x07\xb6\xd1'*16, ('192.168.1.255', 80))

It will start the machine with MAC address 00:02:B3:07:B6:D1 on the subnet 192.168.1/24 by sending a Wake-on-LAN magic packet to the subnet-directed broadcast IP address.

Update (2003-12-05):

Now that you have woken your Mac, how do you send it back to sleep? Read this article to find out.

Update (2006-03-19):

On certain versions of Linux, you may get a “permission denied” error message because you are trying to send a packet to a broadcast address. The following code should work:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
s.sendto('\xff'*6 + '\x00\x02\xb3\x07\xb6\xd1'*16, ('192.168.1.255', 80))
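
The snippets above are Python 2 and hard-code the MAC address as an escaped string. As a sketch for Python 3, where sockets expect bytes, the packet can be built from a MAC address written in the usual notation. The function names magic_packet and wake are mine; the broadcast address and port defaults are simply the values used above.

```python
import socket

def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:02:B3:07:B6:D1'."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address: %r" % mac)
    # Six 0xFF bytes followed by the MAC address repeated 16 times
    return b"\xff" * 6 + raw * 16

def wake(mac, broadcast="192.168.1.255", port=80):
    """Send the magic packet to the subnet-directed broadcast address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Needed on systems that refuse broadcast sends by default
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
    finally:
        s.close()
```

Calling wake('00:02:B3:07:B6:D1') should then be equivalent to the script above.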

Python used to defend human rights

Patrick Ball is the author of a book called Making the Case describing how information technology and notably databases of human rights abuse reports can yield statistical evidence of wrongdoing by specific individuals (say, policemen).

He testified at Slobodan Milosevic’s trial in The Hague. Apparently, the processing was done in Python, see page 2 of this Wired article for more details.