Mac

Batch-converting HEIC images to JPEGs on the Mac

TL:DR working around Apple proprietary brain damage

I use Lightroom 6 to manage my photo collection, although it is falling victim to bit rot (e.g. the face recognition module no longer works, apparently due to a licensing logic time bomb in the code). Exploitative pay-forever software subscriptions are simply unacceptable so I will not yield to Adobe’s Creative Clout bondage, and since Lightroom will not work in newer versions of MacOS, that means I am working on migrating to Darktable, albeit very slowly.

My wife does all her photography on her iPhone, and while the image quality is poor, she does take a great many photos and videos of our daughter. I decided to integrate them in my workflow.

To do so, I installed the free and excellent Photobackup app on her iPhone. It allows backing up her photos and videos using rsync to my ZFS backup server, from which I rsync them to my Mac, and then use my linkonce tool to create a parallel file hierarchy that mirrors it, but so that when I delete a photo in Lightroom, it stays deleted. That way I can remove duds without having them pop back up in Lightroom every time I do a sync.

I just realized I was missing a large number of images because they are in Apple’s obnoxious HEIF format, that they switched to around the time they introduced the mostly useless Live Photos misfeature. Lightroom 6 does not recognize the format. While you can batch convert and export HEIC files to JPEG in Preview.app, it is still a manual process.

I investigated what command-line tools are available that could be run from a cron job and there are surprisingly few. GraphicsMagick sensibly refuses to support the format because of patent concerns. Most of the others require compiling an intimidating stack of dependencies first, and because HEIF is based on the H.265 HEVC video codec, an ostensibly open (in name only) ISO format that is heavily encumbered with patents, so is HEIF and it is probably illegal to use those tools like heic2jpeg.

I opted instead to write my own heic2jpeg (no relation to the previous tool). It is a very basic conversion utility using Apple’s CoreImage framework, to piggyback on Apple’s patent licenses, and as a side benefit, it will preserve the image metadata including geoloc. The flip side is that means the tool can only run on a Mac and not on Linux or Illumos, but I can live with that.

It is also my first ever Swift project. A nice expressive language in the vein of Python or Go (except with Apple’s grotesquely long API names), but I do not expect to use it much, as I have grown disillusioned with Apple’s policies and software quality, and have no intention of indenturing myself as a sharecropper in Tim Cook’s plantation any more than to Adobe’s.

The code is in heic2jpeg.swift

To build it, assuming Swift or Xcode is installed on your Mac, just run:

swiftc -O -o heic2jpeg heic2jpeg.swift

My sync script (part of my backup script) then runs something like:

find $HOME/Pictures -name \*.HEIC -print0 | xargs -0 -P 12 -t -n 10 heic2jpeg --delete

This will run 12 processes in parallel, consuming 10 files each until all HEIC are converted (or if already converted, left alone). I find the optimal setting to be 150% to 200% of the actual cores on your system (not including Intel’s fake hyperthreading cores, which do not count).

import Foundation
import CoreImage

var jpegQuality = 0.90
let context = CIContext(options: nil)
let options = NSDictionary(
    dictionary: [kCGImageDestinationLossyCompressionQuality:jpegQuality]
)
var delete:Bool
var filename:String?
delete = false
for i in 1..<Int(CommandLine.argc) {
    filename = CommandLine.arguments[i]
    if filename == "-delete" || filename == "--delete" {
        delete = true
        continue
    }
    let srcURL = URL(fileURLWithPath:filename!)
    let destURL = srcURL.deletingPathExtension().appendingPathExtension("jpg")
    var exists:Bool
    do {
        exists = try destURL.checkResourceIsReachable()
    } catch {
        exists = false
    }
    if exists {
        print("skipping \(filename ?? "???")")
    } else {
        print("converting \(filename ?? "???")")
        let image = CIImage(contentsOf: srcURL)
        try! context.writeJPEGRepresentation(
            of:image!,
            to:destURL,
            colorSpace: image!.colorSpace!,
            options:options as! [CIImageRepresentationOption : Any]
        )
        if delete {
            print("deleting \(filename ?? "???")")
            try! FileManager.default.removeItem(at:srcURL)
        }
    }
}

Apple iCalendar's buggy SNI

TL:DR If you use Apple’s calendar client software, do not run the server on an IP and port shared with any other SSL/TLS services.

I run my own CalDAV calendar server for my family and for myself. For a very long time I used DAViCal, but it’s always been a slight annoyance to set up on Apple devices because they don’t like DAViCal’s https://example.com/davical/caldav.php/majid URLs. What’s more, recent versions of iCalendar would pop up password prompts at random, and after re-entering the password a couple of times (once is not enough), would finally go on and work. The various devices would also all too often get out of sync, sometimes with the inscrutable error:

Server responded with “500” to operation CalDAVAccountRefreshQueueableOperation

requiring deleting the calendar account and recreating it by hand.

I tried replacing DAViCal with Radicale today, with the same flaky user experience, and I finally figured out why: Apple uses at least a couple of daemons to manage calendar and sync, including dataaccessd, accountsd and remindd (also CalendarAgent depending on your OS version). It seems some or all of them do not implement Server Name Indication (SNI) consistently. SNI is the mechanism by which a TLS client indicates what server it is trying to connect to during the TLS handshake, so multiple servers can share the same IP address and port, and is an absolutely vital part of the modern web. For example many servers use Amazon Web Services' Elastic Load Balancer or CloudFront services, which are used by multiple clients, if Amazon had to dedicate a separate IP address for each, it would break their business model1.

Sometimes, those daemons will not use SNI, which means they will get your default server. In my case, it’s password-protected with a different password than the CalDAV one, which is what triggers the “enter password” dialog. At other times, they will call your CalDAV server with dubious URLs like /.well-known/caldav, /principals/, /dav/principals/, /caldav/v2 and if your server has a different HTTP password for that and sends back a HTTP 401 status code instead of a 404 Not Found, well, that will also trigger a reauthentication prompt.

Big Sur running on my M1 MacBook Air seems to be more consistent about using SNI, but will still poke around on those URLs, triggering the reauthentication prompts.

In other words, the only way to get and Apple-compatible calendar server running reliably is to dedicate an IP and port to it that is not shared with anything else. I only have one IP address at home where the server runs, and I run other vital services behind HTTPS, so I can’t dedicate 443 to a CalDAV server. Fortunately, the configuration will accept the syntax example.org:8443 to use a non-standard port (make sure you use the Advanced option, not Automatic), but this is incredibly sloppy of Apple.


  1. Amazon does in fact have a Legacy Clients Support option, but they charge a $600/month fee for that, and if you need more than two, they will demand written justification before approving your request. ↩︎

Exporting secrets from the Lockdown 2FA app

The Lockdown app mentioned in this article was last updated in 2015, and if you don't already use it, I would not recommend your adopting it.

I am (very) slowly migrating away from the Mac to Ubuntu Linux as my main desktop operating system. The reasons why Apple has lost my confidence are:

  • The execrable software quality of recent releases like Catalina (I plan on sticking with Mojave until I have migrated, however long that takes).
  • Apple’s increasing locking down of macOS in ways antithetical to software freedom, e.g. SIP or the notarization requirements in Catalina with the denial of service implications
  • The fact they no longer even pretend not to price-gouge on the Mac Pro. My days of buying their professional workstations every 5 years have come to an end after 15 years (PowerMac G5, Nehalem Mac Pro, 2013 Mac Pro)
  • As the iPhone market is saturating, in their eagerness to come up with a replacement growth engine in “services”, they are pushing app developers towards the despicable and unacceptable subscription licensing model
  • The butterfly keyboard fiasco exemplifies the contempt in which the company seems to hold its most loyal customers

On the plus side, ThinkPads have decent keyboards, unlike all Apple laptops since at least 2008, the LG Gram 17 is both lightweight, powerful and its huge screen is a boon to my tired eyes, and I am favorably impressed with the deep level of hardware integration offered by Ubuntu (e.g. displaying the logo and boot status in the UEFI stages of boot), even if I am not enamored of the software bloat or systemd.

One of the tasks in my migration checklist is to find a replacement for my TOTP two-factor authentication solution, which is currently the Lockdown app on iOS, iPadOS and MacOS (based on this recommendation, not to be confused with the Lockdown firewall/VPN app). I don’t trust Authy, they have a record of security failures introduced by their attempts to extend standard TOTP with their proprietary garbage, but I digress…

Thus I need to export Lockdown secrets. The iOS app can print or email a PDF with QR codes as a backup, but that’s not a very usable format for migration.

As I had to add a new TOTP secret to the app recently, that was the impetus to do this as a weekend project. I implemented a small utility called ldexport in Go to decode Lockdown for Mac’s internal file into either JSON or HTML format. Here are some simulated samples:

[
    {
        "Service": "Amazon",
        "Login": "amazon@example.com",
        "Created": "2015-11-18T19:53:34.969532012Z",
        "Modified": "2015-11-18T19:53:34.969532012Z",
        "URL": "otpauth://totp/Amazon%3Aamazon%40example.com?secret=M7IoBWqA2WuzYG27ju82XTWsflPEha3xBafMQ3i9CgwKgp6RdBGh\u0026issuer=Amazon",
        "Favorite": true,
        "Archived": false
    },
    {
        "Service": "PayPal",
        "Login": "ebay@example.com",
        "Created": "2019-11-25T08:46:57.253684043Z",
        "Modified": "2019-11-25T08:46:57.253684043Z",
        "URL": "otpauth://totp/PayPal:ebay@example.com?secret=3gB0VWJFkaYcVIiD\u0026issuer=PayPal",
        "Favorite": false,
        "Archived": false
    },
    {
        "Service": "Reddit",
        "Login": "johndoe",
        "Created": "2020-08-07T19:58:37.930042982+01:00",
        "Modified": "2020-08-07T19:58:37.930042982+01:00",
        "URL": "otpauth://totp/Reddit:johndoe?secret=nDTxDMI6bEgVpHWCViZjDFhXKH1bysRa\u0026issuer=Reddit",
        "Favorite": true,
        "Archived": false
    },
    {
        "Service": "GitHub",
        "Login": "",
        "Created": "2016-05-04T19:04:12.495128989+01:00",
        "Modified": "2017-04-04T06:33:10.641680002+01:00",
        "URL": "otpauth://totp/github.com/johndoe?issuer=GitHub\u0026secret=bXh5qmeTMzcatKKz",
        "Favorite": false,
        "Archived": false
    },
    {
        "Service": "Google",
        "Login": "johndoe@gmail.com",
        "Created": "2015-11-13T05:06:07.103500008Z",
        "Modified": "2015-11-13T05:06:07.103500008Z",
        "URL": "otpauth://totp/Google%3Ajohndoe%40gmail.com?secret=o5MvqdWDt7ZEHHSTuH6rCAUr4M6ozGQD\u0026issuer=Google",
        "Favorite": false,
        "Archived": false
    }
]

Making ScanSnap Receipts usable

For a long time I used a service called Shoeboxed to scan and organize my credit card receipts. Basically stuff your receipts in a US prepaid envelope, drop it in the mail, and they scan, OCR and shred them, as well as analyzing the text to extract the information. Unfortunately, since I moved to the UK the service leaves to be desired, and the price has also gone up over time.

I have a couple of Fujitsu ScanSnap document scanners, a S1500M, which is no longer supported on macOS Mojave (but fortunately is by the third-party app ExactScan), and an iX100 which still is supported, as well as a SV600 which is utterly unsuited to dealing with crumpled receipts. The new, dumbed-down ScanSnap Home app that ships with ScanSnaps has a receipt mode. Since I never use the iX100 to scan documents at home given I have a S1500M (it’s a handheld battery=powered simplex scanner that’s mostly intended for mobile use), I dedicated it and ScanSnap Home to scanning receipts.

The basic functionality of scanning, deskewing, OCR-ing and extracting date, amount, vendor and so on mostly works, but otherwise ScanSnap Receipts is an ergonomic disaster. For starters, it never recognizes the currency correctly and always identifies my transactions as being in dollars rather than pounds. Secondly, it inexplicably lacks the ability to batch-edit receipts, e.g. select the date range for my last trip to France and change all of them from dollars to euros. You need to edit them one by one, which is as incredibly tedious as you can imagine.

After a few weeks of this, I decided to take matters in my own hand. It turns out ScanSnap Home uses Core Data backed by an underlying SQLite database. SQLite is the world’s most widely deployed database (every single Android and iOS smartphone includes it, for starters), but the Core Data object-relational mapper above it does a terrific job of obfuscating it and reducing its performance. Nonetheless, after a little bit of digging, I wrote the following script to automate the most repetitive operations:

  1. Stop all ScanSnap auxiliary processes, as they have the DB opened even if you quit the app
  2. Set the currency for all transactions tagged as Unchecked (ScanSnap Home does this by default) to GBP
  3. Normalize all unchecked Waitrose vendor names to Waitrose
  4. Same for Tesco
  5. Rename M&S to Marks & Spencer
  6. Fixes for Superdrug and Sainsbury’s
  7. Set the amount of receipts tagged as duplicates to zero
  8. Attempt to fix little-endian European format DD/MM/YYYY dates parsed as middle-endian US format MM/DD/YYYY
  9. Attempt to fix Euro format DD/MM/YY dates parsed as US format YY/MM/DD
  10. Restart ScanSnap Home

The timestamps in the database are in a strange format that seems to be the number of seconds since an epoch of 2001-01-01T00:00:00 UTC. I hope no one tries to scan receipts older than that… You can convert to UNIX timestamps by adding 978307200 and from there to SQLite’s Julian Day format.

One major annoyance thing this script attempts to fix is dates. Because date formats are ambiguous (is 11/3/20 March 11 2020, or November 3 2020 or perversely March 20 2011?) and point-of-sale vendors are neither ISO 8601 nor even Y2100 compliant, parsing dates is a minefield. My assumption is that receipts will be scanned in a reasonably timely manner, and if there is ambiguity, the closest date should win.

#!/bin/sh
pkill -9 -f ScanSnap

sqlite3 "$HOME/Library/Application Support/PFU/ScanSnap Home/Managed/ScanSnapHome.sqlite" << EOF

.mode lines

UPDATE zcontent
SET zcurrencysign=(SELECT z_pk FROM zcurrencysign WHERE zvalue='GBP')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Waitrose')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Waitrose' AND zvalue LIKE '%waitrose%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Tesco')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Tesco' AND zvalue LIKE '%tesco%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Marks & Spencer')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue LIKE '%M&S%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Superdrug')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Superdrug' AND zvalue LIKE '%superdrug%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Sainsbury''s')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Sainsbury''s' AND zvalue LIKE '%Sainsbury%'
);

UPDATE zcontent
SET zamount=0.0
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Duplicate'
);

UPDATE zcontent
SET zreceiptdate = strftime('%s', 
  strftime('%Y-%d-%m', zreceiptdate+978307200, 'unixepoch', 'localtime'),
  'utc'
)-978307200
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
)
AND zreceiptdate IS NOT NULL
AND strftime('%s', 
  strftime('%Y-%d-%m', zreceiptdate+978307200, 'unixepoch', 'localtime'),
  'utc'
)-978307200
BETWEEN zreceiptdate AND strftime('%s', 'now')-978307200;

UPDATE zcontent
SET zreceiptdate = strftime('%s',
  strftime('20%d-%m-', zreceiptdate+978307200,
           'unixepoch', 'localtime') ||
  substr(strftime('%Y', zreceiptdate+978307200,
         'unixepoch', 'localtime'), 3),
  'utc'
)-978307200
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
)
AND zreceiptdate IS NOT NULL
AND strftime('%s',
  strftime('20%d-%m-', zreceiptdate+978307200,
           'unixepoch', 'localtime') ||
  substr(strftime('%Y', zreceiptdate+978307200,
         'unixepoch', 'localtime'), 3),
  'utc'
)-978307200
BETWEEN zreceiptdate AND strftime('%s', 'now')-978307200;

EOF

open /Applications/ScanSnapHomeMain.app

Note that the SQLite database is not used only by expenses (ZCONTENT.ZDOCTYPE=4) but also to store a summary of all documents scanned. Also, the ZCONTENT table has a column ZUNCHECKED that is not what you would expect, it is a constant 1 even if you remove the Unchecked tag from the transaction.

Now, all the usual disclaimers apply, modifying the database directly is not something supported by the app developer, and could have unintended consequences. If you use this script (or more likely modify it for your needs), I disclaim responsibility for any damages or data loss this may cause.

Update (2020-10-01):

Added:

  • setting duplicate receipt amounts to 0
  • fixing DD/MM/YYYY dates misparsed as MM/DD/YYYY
  • fixing DD/MM/YY dates misparsed as YY/MM/DD

Scanner group test

TL:DR Avoid scanners with Contact Image Sensors if you care at all about color fidelity.

Vermeer it is not

After my abortive trial of the Colortrac SmartLF Scan, I did a comparative test of scanning one of my daughter’s A3-sized drawings on a number of scanners I had handy.

Scanner Sensor Scan
Colortrac SmartLF Scan CIS ScanLF.jpg
Epson Perfection Photo V500 Photo (manually stitched) CCD Epson_V500.jpeg
Epson Perfection V19 (manually stitched) CIS Epson_V19.jpg
Fujitsu ScanSnap S1500M (using a carrier sheet and the built-in stitching) CCD S1500M_carriersheet.jpg
Fujitsu ScanSnap SV600 CCD SV600.jpg
Fuji X-Pro2 with XF 35mm f/1.4 lens, mounted on a Kaiser RS2 XA copy stand with IKEA KVART 3-spot floor lamp (CCT 2800K, a mediocre 82 CRI as measured with my UPRtek CV600) CMOS X-Pro2.jpg

I was shocked by the wide variance in the results, as was my wife. This is most obvious in the orange flower on the right.

Comparison

I scanned a swatch of the orange using a Nix Pro Color Sensor (it’s the orange square in the upper right corner of each scan in the comparison above). When viewed on my freshly calibrated NEC PA302W SpectraView II monitor, the Epson V500 scan is closest, followed by the ScanSnap SV600.

The two scanners using Contact Image Sensor (CIS) technology yielded dismal results. CIS are used in low-end scanners, and they have the benefit of low power usage, which is why the only USB bus-powered scanners available are all CIS models. CIS sensors begat the CMOS sensors used by the vast majority of digital cameras today, superseding CCDs in that application, I would not have expected such a gap in quality.

The digital camera scan was also quite disappointing. I blame the poor quality of the LEDs in the IKEA KVART three-headed lamp I used (pro tip: avoid IKEA LEDs like the plague, they are uniformly horrendous).

I was pleasantly surprised by the excellent performance of the S1500M document scanner. It is meant to be used for scanning sheaves of documents, not artwork, but Fujitsu did not skimp and used a CCD sensor element, and it shows.

Pro tip: a piece of anti-reflective Museum Glass or equivalent can help with curled originals on the ScanSnap SV600. I got mine from scraps at a framing shop. I can’t see a trace of reflections on the scan, unlike on the copy stand.

Update (2018-10-14):

Even more of a pro tip: a Japanese company named Bird Electron makes a series of accessories for the ScanSnap line, including a dust cover for the SV600 and the hilariously Engrish-named PZ-BP600 Book Repressor, essentially a sheet of 3mm anti-reflection coated acrylic with convenient carry handles. They are readily available on eBay from Japanese sellers.