IT

How to ensure a cron job runs exclusively

TL:DR a simple but effective mutex for cron jobs

Often you need to run a job periodically, e.g. backing up files, but the job could take more time than the interval allotted between runs, and you do not want multiple instances of the process to be running at the same time. For instance, bad things happen when multiple rsync processes are trying to synchronize the same folders to the same destination. Thus you want a mutex, something that ensures only one copy of the process can run at any given time.

There are approaches using lock files, but if the computer reboots or the job crashes, the lockfile will not be deleted and all subsequent runs of the job will fail. Some advocate using flock() or fcntl(), but those calls are finicky with strange semantics, e.g. fcntl will release a lock if any related process closes the file.

My solution to deal with this is to bind an IPv6 localhost ::1 socket to a given port. Only one process can do this, and thus it’s a very effective mutex. No lock files to cause havoc, no dealing with the dark and buggy corners of advisory file locking.

For shell scripts, simply replace the #!/bin/sh with #!/somewhere/bin/lock 2048 where 2048 is the port number you will use to enforce the lock (greater than 1024 if you do not want to deal with the hassles of privileged ports). If you want the jobs to wait and not exit immediately if they fail to acquire the lock, just change the line to #!/somewhere/bin/lock w2048

The code is in lock.c. Just compile using:

gcc -O2 -o lock lock.c

or

clang -O2 -o lock lock.c.

#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <netinet/in.h>
#include <inttypes.h>
#include <sys/time.h>
#include <string.h>

extern char **environ;

int main(int argc, char **argv) {
  int sock, port, status, exit_on_fail;
  char *port_start, *port_end = NULL;
  struct sockaddr_in6 sin6;
  struct timeval timeout;

  if (argc < 3) {
    fprintf(
      stderr,
      "Usage:\n"
      "\t#!%s [w]<port:1-65535> (first line of script instead of #!/bin/sh)\n"
      "\t\tor\n"
      "\t%s [w]<port:1-65535> -c \"cmd [args...]\"\n\n"
      "\tw: wait if we could not get the port\n",
      argv[0], argv[0]);
    return -1;
  }
  
  exit_on_fail = 1;
  port_start = argv[1];
  if (port_start[0] == 'w') {
    exit_on_fail = 0;
    port_start++;
  }
  port = strtol(port_start, &port_end, 10);
  if (port_end != port_start + strlen(port_start)) {
    printf("port %s invalid format, must be integer between 1 and 65535\n",
           port_start);
    return -2;
  }
  if (port < 1 || port > 65535) {
    printf("port %d invalid, must be between 1 and 65535\n", port);
    return -3;
  }

  sock = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP);
  if (sock == -1) {
    perror("could not create socket");
    return -4;
  }

  sin6.sin6_family = AF_INET6;
  sin6.sin6_port = htons(port);
  sin6.sin6_addr = in6addr_loopback;

  status = -1;
  while (status < 0) {
    status = bind(sock, (const struct sockaddr *) &sin6, sizeof(sin6));
    if (status < 0) {
      if (exit_on_fail) {
        /* perror("could not bind socket"); */
        return -5;
      }
      timeout.tv_sec = 1;
      timeout.tv_usec = 0;
      /* fputs("sleeping...\n", stderr); */
      select(0, NULL, NULL, NULL, &timeout);
      
    }
  }
  /* default to /bin/sh if no args are supplied, so we can do something like:
     #!lock 2048
     instead of
     #!/bin/sh
  */
  argv[1] = "/bin/sh";
  execvp("/bin/sh", &argv[1]);
}

Automating Epson SSL/TLS certificate renewal

Network-capable Epson printers like my new ET-16600 have a web-based user interface that supports HTTPS. You can even upload publicly recognized certificates from Let’s Encrypt et al, unfortunately the only options they offer is a Windows management app (blech) or a manual form.

When you have to upload this every month (that’s when I automatically renew my Let’s Encrypt certificates), this gets old really fast, and strange errors happen if you forget to do so and end up with an expired certificate.

I wrote a quick Python script to automate this (and yes, I am aware of the XKCDs on the subject of runaway automation):

#!/usr/bin/env python3
import requests, html5lib, io

URL = 'https://myepson.example.com/'
USERNAME = 'majid'
PASSWORD = 'your-admin-UI-password-here'
KEYFILE = '/home/majid/web/acme-tiny/epson.key'
CERTFILE = '/home/majid/web/acme-tiny/epson.crt'

########################################################################
# step 1, authenticate
jar = requests.cookies.RequestsCookieJar()
set_url = URL + 'PRESENTATION/ADVANCED/PASSWORD/SET'
r = requests.post(set_url, cookies=jar,
                  data={
                    'INPUTT_USERNAME': USERNAME,
                    'access': 'https',
                    'INPUTT_PASSWORD': PASSWORD,
                    'INPUTT_ACCSESSMETHOD': 0,
                    'INPUTT_DUMMY': ''
                  })
assert r.status_code == 200
jar = r.cookies

########################################################################
# step 2, get the cert update form iframe and its token
form_url = URL + 'PRESENTATION/ADVANCED/NWS_CERT_SSLTLS/CA_IMPORT'
r = requests.get(form_url, cookies=jar)
tree = html5lib.parse(r.text, namespaceHTMLElements=False)
data = dict([(f.attrib['name'], f.attrib['value']) for f in
             tree.findall('.//input')])
assert 'INPUTT_SETUPTOKEN' in data

# step 3, upload key and certs
data['format'] = 'pem_der'
del data['cert0']
del data['cert1']
del data['cert2']
del data['key']

upload_url = URL + 'PRESENTATIONEX/CERT/IMPORT_CHAIN'

########################################################################
# Epson doesn't seem to like bundled certificates,
# so split it into its componens
f = open(CERTFILE, 'r')
full = f.readlines()
f.close()
certno = 0
certs = dict()
for line in full:
  if not line.strip(): continue
  certs[certno] = certs.get(certno, '') + line
  if 'END CERTIFICATE' in line:
    certno = certno + 1
files = {
  'key': open(KEYFILE, 'rb'),
}
for certno in certs:
  assert certno < 3
  files[f'cert{certno}'] = io.BytesIO(certs[certno].encode('utf-8'))

########################################################################
# step 3, submit the new cert
r = requests.post(upload_url, cookies=jar,
                  files=files,
                  data=data)

########################################################################
# step 4, verify the printer accepted the cert and is shutting down
if not 'Shutting down' in r.text:
  print(r.text)
assert 'Shutting down' in r.text
print('Epson certificate successfully uploaded to printer.')

Update (2020-12-29):

If you are having problems with the Scan to Email feature, with the singularly unhelpful message “Check your network or WiFi connection”, it may be the Epson does not recognize the new Let’s Encrypt R3 CA certificate. You can address this by importing it in the Web UI, under the “Network Security” tab, then “CA Certificate” menu item on the left. The errors I was seeing in my postfix logs were:

Dec 29 13:30:20 zulfiqar mail.info postfix/smtpd[13361]: connect from epson.majid.org[10.0.4.33]
Dec 29 13:30:20 zulfiqar mail.info postfix/smtpd[13361]: SSL_accept error from epson.majid.org[10.0.4.33]: -1
Dec 29 13:30:20 zulfiqar mail.warn postfix/smtpd[13361]: warning: TLS library problem: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca:ssl/record/rec_layer_s3.c:1543:SSL alert number 48:
Dec 29 13:30:20 zulfiqar mail.info postfix/smtpd[13361]: lost connection after STARTTLS from epson.majid.org[10.0.4.33]
Dec 29 13:30:20 zulfiqar mail.info postfix/smtpd[13361]: disconnect from epson.majid.org[10.0.4.33] ehlo=1 starttls=0/1 commands=1/2

Update (2021-08-01):

The script was broken due to changes in Let’s Encrypt’s trust path. Seemingly Epson’s software doesn’t like certificates incorporating 3 PEM files and shows the singularly unhelpful error “Invalid File”. I modified the script to split the certificate into its component parts. You may also need to upload the root certificates via the “CA Certificate” link above. I added these and also updated the built-in root certificates to version 02.03 and it seems to work:

  • lets-encrypt-r3-cross-signed.pem 40:01:75:04:83:14:a4:c8:21:8c:84:a9:0c:16:cd:df
  • isrgrootx1.pem 82:10:cf:b0:d2:40:e3:59:44:63:e0:bb:63:82:8b:00
  • lets-encrypt-r3.pem 91:2b:08:4a:cf:0c:18:a7:53:f6:d6:2e:25:a7:5f:5a

They are available from the Let’s Encrypt certificates page.

Edgewalker, a DIY VPN server

TL:DR Don’t trust VPN services, roll your own with this easy script.

Rationale

There are many reasons to use a Virtual Private Network. Perhaps you are on an unsecured WiFi network. Perhaps you don’t want your Internet Service Provider to snoop on your browsing history using Deep Packet Inspection and compile a marketing dossier on your. Perhaps like my daughter you want to access video content on Netflix that is not available in your country. Perhaps you want to bypass the nanny state content filters the British government mandates.

Most VPN services are untrustworthy. You depend on the VPN provider’s assurances to protect your privacy, which completely defeats the purpose of a VPN. The only way you can be sure is to run your own, but baroque network protocols engendering complex software makes it difficult to do so even for the technically savvy.

Streisand was one of the first efforts to automate the process, using cloud virtual servers as the hosts operating the VPN. Trail of Bits implemented Algo to simplify it and remove some questionable choices Streisand made (although, to be fair, the Streisand project seems to have jettisoned many of them and converged on WireGuard).

Edgewalker is similar, but awesomer:

  • It is based on OpenBSD, widely considered the most secure general-purpose OS, rather than Linux.
  • Like Algo, it implements IPsec/IKEv2/MOBIKE rather than OpenVPN (read the Algo announcement for the reasons why).
    • IPsec/IKEv2 works out of the box on iOS, iPadOS and macOS.
    • In theory on Windows as well, although I have no idea how to make it work or simplify setup, any help is welcome.
  • It also implements WireGuard (recommended for Linux and Android, along with travel VPN-capable routers like the GL.iNet Mango)
  • It uses QR codes to simplify installation as much as possible on the client devices.
  • It uses Let’s Encrypt so your IPsec certificates just work (WireGuard does not rely on PKI)
  • It uses its own Unbound DNS server with DNSSEC validation support, for better privacy
  • It has no dependencies on Ansible, Python or anything else exotic you need to add on your own machine, other than a SSH client.
  • It is just a shell script with little bits of Python thrown in like Acme-Tiny, and easily auditable.

While you can run the script again as your Let’s Encrypt certificates expire (although it generates new credentials each time), I recommend simply destroying the VM and creating a new one. Of course, if you are running on physical hardware, you will want to rerun the script. If using WireGuard only, you don’t need to rerun the script as WireGuard keys do not expire and there are no certificates.

Prerequisites

You need:

  • A Let’s Encrypt account and key (I’m working on setting this up automatically for you, in the meantime you can use Step 1 on this page to do that for you).
  • An OpenBSD machine reachable from the Internet (it can be a physical machine you own, or a cloud VM like Vultr).
  • The ability to add a DNS record for the machine’s IP address (IPv4 only for now).
  • The 80x25 OpenBSD console does not support UTF-8 and cannot display the QR code in a single screen. Use a different terminal, or enter the profile URL by hand.

If you have a firewall in front of the OpenBSD machine, it needs to allow the following inbound traffic (possibly using static port mappings if you use NAT):

  • SSH (TCP port 22) so you can actually log in to your machine.
  • HTTP (TCP port 80) and HTTPS (TCP port 443) to allow Let’s Encrypt certificate issual and allow you to get the Apple-format Profiles that will ease setup on your iDevice.
  • UDP ports 500 (IKE), 1701 (IPsec) and 4500 (IPsec NAT traversal).
  • Optionally IPsec protocols ESP (IP protocol number 50, hex 0x32)) and AH (decimal 51 hex 0x33) and ESP for maximum efficiency, although many firewalls won’t support this.
  • UDP port 51820 (WireGuard).

Instructions

  • Clone the Github repository into one of your own, or copy the file edgewalker.sh somewhere you can download it without it being tampered with in transit, in practice that means HTTPS.
  • Edit the first lines in the script edggewalker.sh (X509 and USERNAME). Not strictly necessary, but make it your own.
  • Log in as root on your OpenBSD machine, then:
    pkg_add wget
    wget -c https://raw.githubusercontent.com/YOUR_GITHUB_ACCOUNT_HERE/edgewalker/main/edgewalker.sh
    sh -e edgewalker.sh
    
  • The script will ask you for:
    • The DNS name of your OpenBSD machine.
    • To copy-paste your Let’s Encrypt account key in PEM format.
  • It will then obtain Let’s Encrypt certificates, generate a QR code that you can use to download the profile on your iDevice to set up the VPN.

Credits

  • The OpenBSD team, for making their wonderful security-focused OS.
  • Reyk Flöter for making OpenIKEd, a breath of fresh air in the unnecessarily convoluted world of VPN software.
  • Jason A. Donenfeld for inventing WireGuard.
  • Let’s Encrypt, for making certificates cheap and easy.
  • Daniel Roesler for the fantastic Acme-Tiny.

Demo

I created a fresh OpenBSD 6.8 VM vpn42.majid.org on Vultr, and here is what the experience looks like:

Here is how to install the VPN on an iPhone:

Here is how to create a suitable VM on Vultr:

DNP D820A review

A very solid and trouble-free printer that makes excellent prints, including spectacular panoramics, for a significant fixed price.

Despite striving for the paperless office, and believing photographic prints are mostly a relic, I have a substantial collection of printers (as my daughter points out, it’s 5 printers per person in my household):

  • HP OfficeJet Pro X551dw (extremely fast using PageWide fixed head technology, quite economical, huge paper tray capacity, very bulky)
  • Epson EcoTank ET-16600 (prints and scans A3, very economical, also very bulky but not considering the print size)
  • Brother QL-700, QL-820NWB, QL-1110NWB label printers (can make labels any length you want, the latter two are AirPrint compatible)
  • Rollo label printer (will take practically any label stock you can throw at it)
  • Fuji Instax SP100 instant photo printer (kids love them)
  • Canon Selphy QX10 portable dye-sub sticker printer for my daughter
  • two Dai-Nippon Printing DNP DS820A 8" dye-sub printers, one in storage
  • An Epson Stylus Photo R2400 in storage
  • a couple of Brother TZe label makers
  • a Dymo LabelWriter 450 Twin Turbo (unreliable garbage, at least on Mac, avoid)
  • A Selpic P1 on the way
  • A Prusa i3 MK3S 3D printer (not sure if that counts)

The DNP DS820A replaced my Epson R2400 for two reasons:

  • I print seldom enough that inks clogging in the nozzles was a big issue.
  • The Epson is a behemoth that is very hard to find a place for, even before I downsized.

The DNP uses dye-sublimation technology to make its prints. You may have encountered one at a drugstore self-service photo kiosk, or at photo events like Macy’s Santa Claus portrait sessions. These printers are designed specifically for these two use cases, and are built like tanks with a steel chassis. Since most events typically gang two or even four printers to maximize throughput, they are also very compact, with a footprint barely larger than an A3 sheet of paper, mine is on a lower shelf in my IKEA FREDDE computer desk.

Until the advent of fine-art photo printers with 6 or more color pigment inks, dye-sub was the top-end digital photo printing technology, thanks to the continous tones it can generate, like photographic processes (e.g. Fuji Frontier or Noritsu QSS digital minilabs, or large-format laser enlargers like the Cymbolic Sciences LightJet or Durst Lambda/Theta). Dye-subs have all but disappeared from the consumer market, however, apart from some Canon Selphy compact printers, and are now largely reserved for professional applications, with a price to match. The DNP DS820A used to cost $1100. They lowered the price to under $1000 a few years ago, but cheaped out by removing the print-catching basket that used to be included in the older package.

You pop off the front panel and install a roll of paper and a reel of dye ribbons in a tray above the paper inside the printer, then pop the front back in. Nothing protrudes and the media is protected from dust, which is really nice. There are two different sizes of media, 8x10 (130 prints) and 8x12 (110 prints). The size is mostly relevant for the dye ribbons that have CMY sections sized in increments of 10 or 12 inches, but a surprising consequence of this is that you cannot switch from 8x10 to 8x12 and vice versa (you can make smaller divisions and the printer will trim them to size using its built-in cutter). The cost per print is about $0.65 for 8x10, $0.80 for 8x12, $1.30 if you get the premium metallic paper. Since the paper and ribbon is consumed no matter the coverage, it’s a constant, unlike the variable costs of an inkjet printer.

The print quality is excellent, as can be expected, as is the color calibration out of the box. It may not quite have the tonal subtlety of an Epson, but there is no visible pixellation. Furthermore, the prints get a clear protective laminate, which makes them smudge-proof and very tough. You can even choose one of four different finishes applied by a roller so no media change required: glossy, luster, matte and fine matte.

One of the marquee features of the DS820A and its little 6" brother the DS620A is the ability to make panoramic prints. Each print is made by combining multiple pages together, with about 2" of overlap wastage, so if your printer is loaded with 8x12 media you can make 8x22 or 8x32 prints, with 8x10 media you can make 8x18 or 8x26. The 8x32 panoramic prints are absolutely spectacular, although finding a suitable frame for them is not a trivial undertaking, that not being a standard print size.

Unfortunately this functionality is not built into the printer driver, but you must use the DNP Hot Folder utility, and while it is available for both Mac and Windows, only the Windows version can make panoramic prints. DNP Hot Folder is meant to use for events where a single PC or server controls multiple printers. You drop the files into a directory per print size (hence the name “Hot Folder”) and the software will automatically print it on the next available printer loaded with the right media. Since the printers run in parallel, even if the print speed is not incredibly fast (about 30 to 60 seconds per print), aggregate throughput is sufficient for a busy event. I have mine on a USB switch (the printer has no network connectivity) to share it between my Mac and my gaming PC.

Making ScanSnap Receipts usable

For a long time I used a service called Shoeboxed to scan and organize my credit card receipts. Basically stuff your receipts in a US prepaid envelope, drop it in the mail, and they scan, OCR and shred them, as well as analyzing the text to extract the information. Unfortunately, since I moved to the UK the service leaves to be desired, and the price has also gone up over time.

I have a couple of Fujitsu ScanSnap document scanners, a S1500M, which is no longer supported on macOS Mojave (but fortunately is by the third-party app ExactScan), and an iX100 which still is supported, as well as a SV600 which is utterly unsuited to dealing with crumpled receipts. The new, dumbed-down ScanSnap Home app that ships with ScanSnaps has a receipt mode. Since I never use the iX100 to scan documents at home given I have a S1500M (it’s a handheld battery=powered simplex scanner that’s mostly intended for mobile use), I dedicated it and ScanSnap Home to scanning receipts.

The basic functionality of scanning, deskewing, OCR-ing and extracting date, amount, vendor and so on mostly works, but otherwise ScanSnap Receipts is an ergonomic disaster. For starters, it never recognizes the currency correctly and always identifies my transactions as being in dollars rather than pounds. Secondly, it inexplicably lacks the ability to batch-edit receipts, e.g. select the date range for my last trip to France and change all of them from dollars to euros. You need to edit them one by one, which is as incredibly tedious as you can imagine.

After a few weeks of this, I decided to take matters in my own hand. It turns out ScanSnap Home uses Core Data backed by an underlying SQLite database. SQLite is the world’s most widely deployed database (every single Android and iOS smartphone includes it, for starters), but the Core Data object-relational mapper above it does a terrific job of obfuscating it and reducing its performance. Nonetheless, after a little bit of digging, I wrote the following script to automate the most repetitive operations:

  1. Stop all ScanSnap auxiliary processes, as they have the DB opened even if you quit the app
  2. Set the currency for all transactions tagged as Unchecked (ScanSnap Home does this by default) to GBP
  3. Normalize all unchecked Waitrose vendor names to Waitrose
  4. Same for Tesco
  5. Rename M&S to Marks & Spencer
  6. Fixes for Superdrug and Sainsbury’s
  7. Set the amount of receipts tagged as duplicates to zero
  8. Attempt to fix little-endian European format DD/MM/YYYY dates parsed as middle-endian US format MM/DD/YYYY
  9. Attempt to fix Euro format DD/MM/YY dates parsed as US format YY/MM/DD
  10. Restart ScanSnap Home

The timestamps in the database are in a strange format that seems to be the number of seconds since an epoch of 2001-01-01T00:00:00 UTC. I hope no one tries to scan receipts older than that… You can convert to UNIX timestamps by adding 978307200 and from there to SQLite’s Julian Day format.

One major annoyance thing this script attempts to fix is dates. Because date formats are ambiguous (is 11/3/20 March 11 2020, or November 3 2020 or perversely March 20 2011?) and point-of-sale vendors are neither ISO 8601 nor even Y2100 compliant, parsing dates is a minefield. My assumption is that receipts will be scanned in a reasonably timely manner, and if there is ambiguity, the closest date should win.

#!/bin/sh
pkill -9 -f ScanSnap

sqlite3 "$HOME/Library/Application Support/PFU/ScanSnap Home/Managed/ScanSnapHome.sqlite" << EOF

.mode lines

UPDATE zcontent
SET zcurrencysign=(SELECT z_pk FROM zcurrencysign WHERE zvalue='GBP')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Waitrose')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Waitrose' AND zvalue LIKE '%waitrose%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Tesco')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Tesco' AND zvalue LIKE '%tesco%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Marks & Spencer')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue LIKE '%M&S%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Superdrug')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Superdrug' AND zvalue LIKE '%superdrug%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Sainsbury''s')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Sainsbury''s' AND zvalue LIKE '%Sainsbury%'
);

UPDATE zcontent
SET zamount=0.0
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Duplicate'
);

UPDATE zcontent
SET zreceiptdate = strftime('%s', 
  strftime('%Y-%d-%m', zreceiptdate+978307200, 'unixepoch', 'localtime'),
  'utc'
)-978307200
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
)
AND zreceiptdate IS NOT NULL
AND strftime('%s', 
  strftime('%Y-%d-%m', zreceiptdate+978307200, 'unixepoch', 'localtime'),
  'utc'
)-978307200
BETWEEN zreceiptdate AND strftime('%s', 'now')-978307200;

UPDATE zcontent
SET zreceiptdate = strftime('%s',
  strftime('20%d-%m-', zreceiptdate+978307200,
           'unixepoch', 'localtime') ||
  substr(strftime('%Y', zreceiptdate+978307200,
         'unixepoch', 'localtime'), 3),
  'utc'
)-978307200
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
)
AND zreceiptdate IS NOT NULL
AND strftime('%s',
  strftime('20%d-%m-', zreceiptdate+978307200,
           'unixepoch', 'localtime') ||
  substr(strftime('%Y', zreceiptdate+978307200,
         'unixepoch', 'localtime'), 3),
  'utc'
)-978307200
BETWEEN zreceiptdate AND strftime('%s', 'now')-978307200;

EOF

open /Applications/ScanSnapHomeMain.app

Note that the SQLite database is not used only by expenses (ZCONTENT.ZDOCTYPE=4) but also to store a summary of all documents scanned. Also, the ZCONTENT table has a column ZUNCHECKED that is not what you would expect, it is a constant 1 even if you remove the Unchecked tag from the transaction.

Now, all the usual disclaimers apply, modifying the database directly is not something supported by the app developer, and could have unintended consequences. If you use this script (or more likely modify it for your needs), I disclaim responsibility for any damages or data loss this may cause.

Update (2020-10-01):

Added:

  • setting duplicate receipt amounts to 0
  • fixing DD/MM/YYYY dates misparsed as MM/DD/YYYY
  • fixing DD/MM/YY dates misparsed as YY/MM/DD