Fazal Majid's low-intensity blog

Sporadic pontification

Fazal

Please update to Temboz 4.4.0 or later

TL;DR: If you are using my Temboz feed reader, please update as soon as possible to version 4.4.0 or later.

This is somewhat related to the last security advisory for Temboz. Fields like article title, author or tags, or feed title or description, which are supposed to be plain text (not even HTML), were not being sanitized against XSS. The effect was demonstrated by this article in BoingBoing.
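
As a minimal sketch of the kind of fix involved (this is not Temboz's actual code), a field that is supposed to be plain text can be HTML-escaped before it is interpolated into a page. The helper name `escape_plain_field` is hypothetical; `html.escape` comes from the Python standard library.

```python
import html

def escape_plain_field(value):
    """Escape a plain-text feed field (title, author, tag...) for safe
    inclusion in HTML. Hypothetical helper, not taken from Temboz."""
    if value is None:
        return ""
    # quote=True also escapes " and ', so the result is safe inside
    # quoted attribute values as well as in element content
    return html.escape(str(value), quote=True)

malicious_title = '<script>alert("XSS")</script>Totally innocent title'
print(escape_plain_field(malicious_title))
# &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;Totally innocent title
```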

Once again I apologize for potentially exposing you to XSS attacks via malicious feeds, and I would recommend you subscribe to my RSS feed for it so you can get important announcements like this one in the future.

DNP DS820A review

A very solid and trouble-free printer that makes excellent prints, including spectacular panoramics, at a significant but fixed cost per print.

Despite striving for the paperless office, and believing photographic prints are mostly a relic, I have a substantial collection of printers (as my daughter points out, it’s 5 printers per person in my household):

  • HP OfficeJet Pro X551dw (extremely fast using PageWide fixed head technology, quite economical, huge paper tray capacity, very bulky)
  • Epson EcoTank ET-16600 (prints and scans A3, very economical, also very bulky, though not considering the print size)
  • Brother QL-700, QL-820NWB, QL-1110NWB label printers (can make labels any length you want, the latter two are AirPrint compatible)
  • Rollo label printer (will take practically any label stock you can throw at it)
  • Fuji Instax SP100 instant photo printer (kids love them)
  • Canon Selphy QX10 portable dye-sub sticker printer for my daughter
  • two Dai-Nippon Printing DNP DS820A 8" dye-sub printers, one in storage
  • An Epson Stylus Photo R2400 in storage
  • a couple of Brother TZe label makers
  • a Dymo LabelWriter 450 Twin Turbo (unreliable garbage, at least on Mac, avoid)
  • A Selpic P1 on the way
  • A Prusa i3 MK3S 3D printer (not sure if that counts)

The DNP DS820A replaced my Epson R2400 for two reasons:

  • I print seldom enough that inks clogging in the nozzles was a big issue.
  • The Epson is a behemoth that is very hard to find a place for, even before I downsized.

The DNP uses dye-sublimation technology to make its prints. You may have encountered one at a drugstore self-service photo kiosk, or at photo events like Macy’s Santa Claus portrait sessions. These printers are designed specifically for these two use cases, and are built like tanks with a steel chassis. Since most events typically gang two or even four printers to maximize throughput, they are also very compact, with a footprint barely larger than an A3 sheet of paper. Mine is on a lower shelf in my IKEA FREDDE computer desk.

Until the advent of fine-art photo printers with 6 or more color pigment inks, dye-sub was the top-end digital photo printing technology, thanks to the continuous tones it can generate, like photographic processes (e.g. Fuji Frontier or Noritsu QSS digital minilabs, or large-format laser enlargers like the Cymbolic Sciences LightJet or Durst Lambda/Theta). Dye-subs have all but disappeared from the consumer market, however, apart from some Canon Selphy compact printers, and are now largely reserved for professional applications, with a price to match. The DNP DS820A used to cost $1100. DNP lowered the price to under $1000 a few years ago, but cheaped out by removing the print-catching basket that used to be included in the older package.

You pop off the front panel and install a roll of paper and a reel of dye ribbons in a tray above the paper inside the printer, then pop the front back in. Nothing protrudes and the media is protected from dust, which is really nice. There are two different sizes of media, 8x10 (130 prints) and 8x12 (110 prints). The size is mostly relevant for the dye ribbons, which have CMY sections sized in increments of 10 or 12 inches, but a surprising consequence of this is that you cannot switch from 8x10 to 8x12 and vice versa (you can make smaller divisions and the printer will trim them to size using its built-in cutter). The cost per print is about $0.65 for 8x10, $0.80 for 8x12, $1.30 if you get the premium metallic paper. Since the paper and ribbon are consumed no matter the coverage, the cost is constant, unlike the variable costs of an inkjet printer.

The print quality is excellent, as can be expected, as is the color calibration out of the box. It may not quite have the tonal subtlety of an Epson, but there is no visible pixellation. Furthermore, the prints get a clear protective laminate, which makes them smudge-proof and very tough. You can even choose one of four different finishes, applied by a roller so no media change is required: glossy, luster, matte and fine matte.

One of the marquee features of the DS820A and its little 6" brother the DS620A is the ability to make panoramic prints. Each print is made by combining multiple pages together, with about 2" of overlap wastage, so if your printer is loaded with 8x12 media you can make 8x22 or 8x32 prints, with 8x10 media you can make 8x18 or 8x26. The 8x32 panoramic prints are absolutely spectacular, although finding a suitable frame for them is not a trivial undertaking, that not being a standard print size.
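
The panoramic sizes quoted above are consistent with each join between pages costing 2" of overlap. The sketch below is only that arithmetic; the exact 2" figure and the function name `panorama_length` are assumptions inferred from the sizes listed, not taken from DNP documentation.

```python
def panorama_length(page_length_in, pages, overlap_in=2):
    """Length in inches of a panorama stitched from `pages` pages,
    losing `overlap_in` inches at each of the (pages - 1) joins.
    The 2" default is an assumption inferred from the quoted sizes."""
    if pages < 1:
        raise ValueError("need at least one page")
    return pages * page_length_in - (pages - 1) * overlap_in

for media in (12, 10):
    sizes = [f"8x{panorama_length(media, n)}" for n in (2, 3)]
    print(f"8x{media} media: {', '.join(sizes)}")
# 8x12 media: 8x22, 8x32
# 8x10 media: 8x18, 8x26
```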

Unfortunately this functionality is not built into the printer driver. You must use the DNP Hot Folder utility, and while it is available for both Mac and Windows, only the Windows version can make panoramic prints. DNP Hot Folder is meant for use at events where a single PC or server controls multiple printers. You drop the files into a directory per print size (hence the name “Hot Folder”) and the software will automatically print them on the next available printer loaded with the right media. Since the printers run in parallel, even if the print speed is not incredibly fast (about 30 to 60 seconds per print), aggregate throughput is sufficient for a busy event. I have mine on a USB switch (the printer has no network connectivity) to share it between my Mac and my gaming PC.

Critical Temboz vulnerability, please update

TL;DR: If you are using my Temboz feed reader, please update as soon as possible to version 4.0 or later.

This is because Temboz depends on feedparser, Kurt McKee’s (originally Mark Pilgrim’s) ultra-liberal feed parsing library for Python. One of its responsibilities is to sanitize feed content to strip out potentially dangerous HTML like <script> tags. Unfortunately, I only just realized that on Python 3, because the sgmllib module that used to ship with Python 2 is absent, feedparser will silently fail and not sanitize the HTML instead of failing safe, e.g. by raising a NotImplementedError. Since this is such a fundamentally flawed approach, I decided no longer to trust feedparser with this responsibility and assign it to Mozilla’s bleach instead. Furthermore, Temboz will now perform a sanity check at startup and refuse to start if <script> tags are not being filtered.
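
A startup sanity check of this kind could look roughly like the sketch below. It is not Temboz's actual code: `check_sanitizer` and `SanitizerBroken` are hypothetical names, and `html.escape` stands in for the real sanitizer only so that the example needs nothing outside the standard library.

```python
import html

class SanitizerBroken(RuntimeError):
    """Raised when the sanitizer lets dangerous markup through."""

def check_sanitizer(sanitize):
    """Fail fast if `sanitize` leaves a <script> tag in its output."""
    probe = '<p>ok</p><script>alert("XSS")</script>'
    cleaned = sanitize(probe)
    if "<script" in cleaned.lower():
        raise SanitizerBroken(
            "sanitizer does not strip <script> tags, refusing to start")

# Stand-in sanitizer for illustration only: it escapes everything.
check_sanitizer(html.escape)  # passes silently

# A sanitizer that silently does nothing is caught at startup.
try:
    check_sanitizer(lambda s: s)
except SanitizerBroken as exc:
    print(f"startup aborted: {exc}")
```

The point of the design is to fail closed: a broken or missing sanitizer stops the program instead of quietly serving unsanitized feed content.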

I apologize for potentially exposing you to XSS attacks via malicious feeds. Unfortunately I have no way to reach out to all those who installed Temboz. If you are installing Temboz, I would recommend you subscribe to my RSS feed for it so you can get important announcements like this one in the future.

Making ScanSnap Receipts usable

For a long time I used a service called Shoeboxed to scan and organize my credit card receipts. Basically, you stuff your receipts in a US prepaid envelope, drop it in the mail, and they scan, OCR and shred them, as well as analyzing the text to extract the information. Unfortunately, since I moved to the UK the service leaves much to be desired, and the price has also gone up over time.

I have a couple of Fujitsu ScanSnap document scanners: an S1500M, which is no longer supported on macOS Mojave (but fortunately is by the third-party app ExactScan), and an iX100, which still is supported, as well as an SV600, which is utterly unsuited to dealing with crumpled receipts. The new, dumbed-down ScanSnap Home app that ships with ScanSnaps has a receipt mode. Since I never use the iX100 to scan documents at home given I have an S1500M (the iX100 is a handheld battery-powered simplex scanner that’s mostly intended for mobile use), I dedicated it and ScanSnap Home to scanning receipts.

The basic functionality of scanning, deskewing, OCR-ing and extracting date, amount, vendor and so on mostly works, but otherwise ScanSnap Receipts is an ergonomic disaster. For starters, it never recognizes the currency correctly and always identifies my transactions as being in dollars rather than pounds. Secondly, it inexplicably lacks the ability to batch-edit receipts, e.g. select the date range for my last trip to France and change all of them from dollars to euros. You need to edit them one by one, which is as incredibly tedious as you can imagine.

After a few weeks of this, I decided to take matters into my own hands. It turns out ScanSnap Home uses Core Data backed by an underlying SQLite database. SQLite is the world’s most widely deployed database (every single Android and iOS smartphone includes it, for starters), but the Core Data object-relational mapper above it does a terrific job of obfuscating it and reducing its performance. Nonetheless, after a little bit of digging, I wrote the following script to automate the most repetitive operations:

  1. Stop all ScanSnap auxiliary processes, as they have the DB opened even if you quit the app
  2. Set the currency for all transactions tagged as Unchecked (ScanSnap Home does this by default) to GBP
  3. Normalize all unchecked Waitrose vendor names to Waitrose
  4. Same for Tesco
  5. Rename M&S to Marks & Spencer
  6. Fixes for Superdrug and Sainsbury’s
  7. Set the amount of receipts tagged as duplicates to zero
  8. Attempt to fix little-endian European format DD/MM/YYYY dates parsed as middle-endian US format MM/DD/YYYY
  9. Attempt to fix European format DD/MM/YY dates misparsed as YY/MM/DD
  10. Restart ScanSnap Home

The timestamps in the database are in a strange format that seems to be the number of seconds since an epoch of 2001-01-01T00:00:00 UTC. I hope no one tries to scan receipts older than that… You can convert to UNIX timestamps by adding 978307200 and from there to SQLite’s Julian Day format.
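
The conversion can be sketched in Python as follows. The function names are made up for illustration; the only facts relied on are the 978307200 offset mentioned above and the Julian Day of the UNIX epoch (2440587.5).

```python
from datetime import datetime, timezone

# Seconds between the UNIX epoch (1970-01-01) and 2001-01-01T00:00:00 UTC
EPOCH_OFFSET = 978307200

def coredata_to_unix(ts):
    """Seconds since 2001-01-01 UTC -> seconds since 1970-01-01 UTC."""
    return ts + EPOCH_OFFSET

def unix_to_julian_day(unix_ts):
    """UNIX timestamp -> Julian Day (the UNIX epoch is JD 2440587.5)."""
    return unix_ts / 86400.0 + 2440587.5

def coredata_to_datetime(ts):
    """Database timestamp -> timezone-aware UTC datetime."""
    return datetime.fromtimestamp(coredata_to_unix(ts), tz=timezone.utc)

print(coredata_to_datetime(0).isoformat())       # 2001-01-01T00:00:00+00:00
print(unix_to_julian_day(coredata_to_unix(0)))   # 2451910.5
```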

One major annoyance this script attempts to fix is dates. Because date formats are ambiguous (is 11/3/20 March 11 2020, or November 3 2020 or perversely March 20 2011?) and point-of-sale vendors are neither ISO 8601 nor even Y2100 compliant, parsing dates is a minefield. My assumption is that receipts will be scanned in a reasonably timely manner, and if there is ambiguity, the closest date should win.
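
The "closest date wins" idea can be illustrated in Python, separately from the shell script. This is a sketch of the heuristic rather than a translation of the SQL: the function names are hypothetical, and it assumes two-digit years belong to the 2000s.

```python
from datetime import date

def candidates(parsed):
    """Alternative readings of a date the OCR may have misparsed."""
    yield parsed
    # DD/MM/YYYY read as MM/DD/YYYY: swap day and month
    try:
        yield date(parsed.year, parsed.day, parsed.month)
    except ValueError:
        pass  # e.g. day 25 cannot be a month
    # DD/MM/YY read as YY/MM/DD: swap day and two-digit year
    try:
        yield date(2000 + parsed.day, parsed.month, parsed.year % 100)
    except ValueError:
        pass

def closest_plausible(parsed, today):
    """Pick the reading closest to `today` that is not in the future."""
    past = [d for d in candidates(parsed) if d <= today]
    return max(past) if past else parsed

today = date(2020, 10, 1)
print(closest_plausible(date(2020, 3, 11), today))  # 2020-03-11 (Nov 3 is in the future)
print(closest_plausible(date(2020, 3, 9), today))   # 2020-09-03
print(closest_plausible(date(2011, 3, 20), today))  # 2020-03-11
```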

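#!/bin/sh
# Kill all ScanSnap helper processes: they keep the database open
# even after the app itself has quit.
pkill -9 -f ScanSnap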

sqlite3 "$HOME/Library/Application Support/PFU/ScanSnap Home/Managed/ScanSnapHome.sqlite" << EOF

.mode lines

-- Set the currency of all receipts (zdoctype=4) tagged Unchecked to GBP
UPDATE zcontent
SET zcurrencysign=(SELECT z_pk FROM zcurrencysign WHERE zvalue='GBP')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
);

-- Normalize vendor name variants on unchecked receipts, one UPDATE per vendor
UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Waitrose')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Waitrose' AND zvalue LIKE '%waitrose%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Tesco')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Tesco' AND zvalue LIKE '%tesco%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Marks & Spencer')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue LIKE '%M&S%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Superdrug')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Superdrug' AND zvalue LIKE '%superdrug%'
);

UPDATE zcontent
SET zvendor=(SELECT z_pk FROM zvendor WHERE zvalue='Sainsbury''s')
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
) AND zvendor IN (
  SELECT z_pk FROM zvendor
  WHERE zvalue<>'Sainsbury''s' AND zvalue LIKE '%Sainsbury%'
);

-- Zero out the amount of receipts tagged Duplicate
UPDATE zcontent
SET zamount=0.0
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Duplicate'
);

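-- Swap day and month where DD/MM/YYYY was misparsed as MM/DD/YYYY, but only
-- if the swapped date is later than the parsed one and not in the future
UPDATE zcontent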
SET zreceiptdate = strftime('%s', 
  strftime('%Y-%d-%m', zreceiptdate+978307200, 'unixepoch', 'localtime'),
  'utc'
)-978307200
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
)
AND zreceiptdate IS NOT NULL
AND strftime('%s', 
  strftime('%Y-%d-%m', zreceiptdate+978307200, 'unixepoch', 'localtime'),
  'utc'
)-978307200
BETWEEN zreceiptdate AND strftime('%s', 'now')-978307200;

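-- Swap day and two-digit year where DD/MM/YY was misparsed as YY/MM/DD,
-- under the same condition
UPDATE zcontent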
SET zreceiptdate = strftime('%s',
  strftime('20%d-%m-', zreceiptdate+978307200,
           'unixepoch', 'localtime') ||
  substr(strftime('%Y', zreceiptdate+978307200,
         'unixepoch', 'localtime'), 3),
  'utc'
)-978307200
WHERE zdoctype=4 AND z_pk IN (
  SELECT z_4contents
  FROM z_4labels
  JOIN zlabel ON z_15labels=zlabel.z_pk
  WHERE zlabel.zname='Unchecked'
)
AND zreceiptdate IS NOT NULL
AND strftime('%s',
  strftime('20%d-%m-', zreceiptdate+978307200,
           'unixepoch', 'localtime') ||
  substr(strftime('%Y', zreceiptdate+978307200,
         'unixepoch', 'localtime'), 3),
  'utc'
)-978307200
BETWEEN zreceiptdate AND strftime('%s', 'now')-978307200;

EOF

# Restart ScanSnap Home
open /Applications/ScanSnapHomeMain.app

Note that the SQLite database is not used only by expenses (ZCONTENT.ZDOCTYPE=4) but also to store a summary of all documents scanned. Also, the ZCONTENT table has a column ZUNCHECKED that is not what you would expect: it is a constant 1 even if you remove the Unchecked tag from the transaction.

Now, all the usual disclaimers apply: modifying the database directly is not something supported by the app developer, and could have unintended consequences. If you use this script (or more likely modify it for your needs), I disclaim responsibility for any damages or data loss this may cause.

Update (2020-10-01):

Added:

  • setting duplicate receipt amounts to 0
  • fixing DD/MM/YYYY dates misparsed as MM/DD/YYYY
  • fixing DD/MM/YY dates misparsed as YY/MM/DD

Virtual Reality for the people

I have been shooting stitched panoramas for almost 20 years. I have used manual panorama heads like the Kaidan Kiwi+ and more recently the pocketPANO Compact, robotic heads like the Gigapan EPIC 100 and the Benro Polaris, and four successive generations of the Ricoh Theta (Theta, Theta S, Theta V, Theta Z1).

Setting up and iterating a manual head is incredibly tedious. The Gigapan makes it less so, especially when using long lenses (my standard setup is a Leica M typ 240 or M10 with a 90mm f/2 Apo-Summicron-M ASPH). The Theta series was a major breakthrough in that it could produce nearly seamless 360° panoramas with no motion artifacts or ghosting. The Z1 with its large 1″ sensor finally yields image quality that I am happy with.

The viewing situation has also improved. In the early days you needed Java applets or dubious plug-ins. Nowadays, it can all be done in HTML5 with the aid of JavaScript libraries like Pannellum. The user experience is still one of scrolling an image through a rectangular viewport in a browser window. The experience on mobile is a bit better because it can use the accelerometer so you scroll by panning with your phone or tablet. It’s still not a fully immersive experience.

This Friday Facebook announced a price drop for its Oculus Go VR headset, the entry-level 32GB model being at a near-impulse purchase price of $150, and of course I yielded to the impulse. I had bought the original Oculus Rift to get a sense of what the potential of VR was, but tethered to a beefy PC, it made for impressive demos but not much more.

The Oculus Go changes this completely because it is standalone (it has the guts of a midrange smartphone circa 2018) and affordable. One of the ways they kept costs down is by removing positional tracking: it can detect angular motions of your head, but not when you are walking around. For purposes of viewing 360° panoramic stills and videos, however, that is not required.

One of my concerns was how deeply it would be tied to the Facebook privacy-mangling machine. My New Year’s resolution for 2019 was to delete my FB account (my 2020 resolution was to switch all my digital camera clocks to UTC and never again bother with the abomination that is Daylight Saving Time)—underpromise and overdeliver, that’s my motto… Any requirement to have a FB account would be a total deal-breaker for me.

The second concern was how much of a hassle it would be to set up and use with my own photos. Camera-makers are not known for outstanding software and Ricoh is no exception. There is an Oculus third-party app for Theta cameras, but it hasn’t been updated in ages and only lists Theta S compatibility.

I was pleasantly surprised at how smoothly it went. You can avoid the FB account by using an Oculus account (I used mine from the Rift), and no additional apps are required. Just install the Android File Transfer utility if you are on a Mac, and copy the files to the headset’s Pictures directory. I would recommend using subfolders because the built-in Gallery app is not smart about caching thumbnails and is very slow at regenerating the view if there are more than about 20 images in a folder.

The image quality is not exceptional. Mike Abrash, who worked on the ground-breaking 3D game Quake, and is now Chief Scientist at Oculus, says fully immersive VR requires resolution halfway between 4K and 8K in each eye (vs. 2.5K shared for both eyes on the Go), and is at least a decade away. The immersive nature of the Go does provide that elusive Wow! factor, however, and more than makes up for its designed-to-a-budget shortcomings. The 2560×1440 display with an apparent field of view of 100° yields 3.7MP in the FOV but spherical trigonometry calculations reveal the entire 360° sphere would require a 26MP image to cover it entirely, which is slightly more than the 23MP images the Theta Z1 delivers. Fully immersive VR requires very high resolutions!
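
The spherical trigonometry is not shown here, but one set of assumptions that lands close to the 26MP figure is sketched below: the 100° is taken as the horizontal field of view, the 2560×1440 panel is treated as a single flat 16:9 rectilinear view, and the sphere is covered at that view's average pixel density per steradian. These are guesses at the calculation, and the function name is hypothetical.

```python
import math

def sphere_megapixels(width_px, height_px, horizontal_fov_deg):
    """Megapixels needed to cover the full sphere at the display's
    average angular pixel density (flat rectilinear view assumed)."""
    # Half-angles of the rectangular view frustum
    half_h = math.radians(horizontal_fov_deg) / 2
    half_v = math.atan(math.tan(half_h) * height_px / width_px)
    # Solid angle of a rectangular pyramid: 4 * asin(sin(a/2) * sin(b/2))
    solid_angle = 4 * math.asin(math.sin(half_h) * math.sin(half_v))
    pixels_per_steradian = width_px * height_px / solid_angle
    # The full sphere subtends 4*pi steradians
    return pixels_per_steradian * 4 * math.pi / 1e6

print(round(2560 * 1440 / 1e6, 1))                # 3.7
print(round(sphere_megapixels(2560, 1440, 100)))  # 26
```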

It even handles video transparently (you do have to convert Theta videos from the native format to equirectangular projection video with Ricoh’s app, which is excruciatingly slow). Keep in mind that video sizes are large, and with a 32GB model, there are limits to how much you can store on the device. If you plan to view immersive videos, the 64GB model is highly recommended.

The Oculus Go also has a “Cast” feature that will stream what the person wearing the headset is seeing to the phone it is paired with. You can have a friend wear the headset and narrate what they are seeing. I tried this with my architect mother-in-law as I was showing her the sights in Jerusalem, much to her delight (her master’s thesis at SOAS was on the Dome of the Rock). The Go has a unique sound projector developed by Oculus that means the user doesn’t have to wear earbuds, and can hear you speak. I would recommend you change the default display sleep time from the ridiculously short 15 seconds to 3 to 5 minutes, so you can swap the headset without losing the cast session or resetting the app. Sadly, the battery life is nothing to write home about. I would guesstimate it at 1 to 2 hours, tops.

I still need to figure out how to share my 360° VR photos using WebVR so other people can view them from their own Oculus Go (or other headsets).

One essential accessory for the Z1 or another similar 360° camera is a selfie stick or similar implement, otherwise your hands will appear prominently in the final panorama. Ricoh sells three models.

The TM-1 is a very well designed tripod (rumored to be made by Velbon) with a magnetic quick-release mount. It’s easy to deploy with one click, unlike a conventional tripod, and fully extended the camera is at eye height for a natural perspective.

The TM-3 is a short telescopic stick. It’s long enough that your hands no longer appear in the picture but low-profile enough that the TM-3 itself is invisible. It is well-made, unlike most generic Chinese selfie sticks, unlocks and locks with a simple twist, and the TS-2 case for the Z1 has an opening at the bottom so you don’t need to detach it before putting the camera back in its case, a nice touch.

The TM-2 is a longer version of the TM-3 with an unnecessary swivel head. I haven’t tried it, but the swivel head would defeat the invisible factor.