For a long time I used a service called Shoeboxed to scan and organize my credit card receipts. Basically stuff your receipts in a US prepaid envelope, drop it in the mail, and they scan, OCR and shred them, as well as analyzing the text to extract the information. Unfortunately, since I moved to the UK the service leaves to be desired, and the price has also gone up over time.

I have a couple of Fujitsu ScanSnap document scanners, a S1500M, which is no longer supported on macOS Mojave (but fortunately is by the third-party app ExactScan), and an iX100 which still is supported, as well as a SV600 which is utterly unsuited to dealing with crumpled receipts. The new, dumbed-down ScanSnap Home app that ships with ScanSnaps has a receipt mode. Since I never use the iX100 to scan documents at home given I have a S1500M (it’s a handheld battery=powered simplex scanner that’s mostly intended for mobile use), I dedicated it and ScanSnap Home to scanning receipts.

The basic functionality of scanning, deskewing, OCR-ing and extracting date, amount, vendor and so on mostly works, but otherwise ScanSnap Receipts is an ergonomic disaster. For starters, it never recognizes the currency correctly and always identifies my transactions as being in dollars rather than pounds. Secondly, it inexplicably lacks the ability to batch-edit receipts, e.g. select the date range for my last trip to France and change all of them from dollars to euros. You need to edit them one by one, which is as incredibly tedious as you can imagine.

After a few weeks of this, I decided to take matters in my own hand. It turns out ScanSnap Home uses Core Data backed by an underlying SQLite database. SQLite is the world’s most widely deployed database (every single Android and iOS smartphone includes it, for starters), but the Core Data object-relational mapper above it does a terrific job of obfuscating it and reducing its performance. Nonetheless, after a little bit of digging, I wrote the following script to automate the most repetitive operations:

  1. Stop all ScanSnap auxiliary processes, as they have the DB opened even if you quit the app
  2. Set the currency for all transactions tagged as Unchecked (ScanSnap Home does this by default) to GBP
  3. Normalize all unchecked Waitrose vendor names to Waitrose
  4. Same for Tesco
  5. Rename M&S to Marks & Spencer
  6. Restart ScanSnap Home

One thing this script cannot do is fix dates. Because date formats are ambiguous (is 11/3/20 March 11 2020, or November 3 2020 or perversely March 20 2011?) and point-of-sale vendors are neither ISO 8601 nor even Y2100 compliant, parsing dates is a minefield.

#!/bin/sh
pkill -9 -f ScanSnap

sqlite3 "$HOME/Library/Application Support/PFU/ScanSnap Home/Managed/ScanSnapHome.sqlite" << EOF

.mode lines

UPDATE zcontent
SET zcurrencysign=(SELECT z_pk FROM zcurrencysign WHERE zvalue='GBP')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Waitrose')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Waitrose' AND zvalue LIKE '%waitrose%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Tesco')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Tesco' AND zvalue LIKE '%tesco%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Marks & Spencer')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue LIKE '%M&S%'
);

EOF

open /Applications/ScanSnapHomeMain.app

Note that the SQLite database is not used only by expenses (ZCONTENT.ZDOCTYPE=4) but also to store a summary of all documents scanned. Also, the ZCONTENT table has a column ZUNCHECKED that is not what you would expect, it is a constant 1 even if you remove the Unchecked tag from the transaction.

Now, all the usual disclaimers apply, modifying the database directly is not something supported by the app developer, and could have unintended consequences. If you use this script (or more likely modify it for your needs), I disclaim responsibility for any damages or data loss this may cause.