Amazon wishlist optimizer

I wrote a script several months ago to go through an Amazon wish list and find the combination of items that will best fit within a given budget. Given that the Christmas holiday shopping season seems to have started before Thanksgiving, it seemed topical to release it.

It used the Amazon Web Services API, which is a complete crock (among other failings, it will consistently not return the Amazon.com price for an item, even when explicitly instructed to do so). It does not look like Amazon pays any particular attention to the bug reports I filed. I just gave up on the API and re-implemented it the old-fashioned way, by “scraping” Amazon’s regular (and most definitely not XML-compliant) HTML pages.

It is still very much work in progress, but already somewhat useful. You can use it directly by stuffing your wish list ID in the URL (or using the form below):

[Form: Wish list ID, Amount]

A better way is to drag and drop the highlighted Amazon optimizer bookmarklet link (version 6 as of 2007-05-08) to your browser’s toolbar. You can then browse through Amazon, and once you have found the wish list you are looking for, click on the bookmarklet to open the optimizer in a new window (or tab). By default, it will try and fit a budget of $100 (my decadent tastes are showing, are they not?), but you can change that amount and experiment with different budgets. Surprisingly often, it will find an exact fit. Otherwise, it will try to find the closest match under the budget with as little left over as possible.
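The post does not say which algorithm the script uses. As a minimal sketch of one way to do the fitting, here is a subset-sum search over prices expressed in cents, where each item can be picked at most once. The function name `best_fit` and the sample wish list are hypothetical.

```python
def best_fit(prices_cents, budget_cents):
    """Return (total, chosen_indices) for the subset of items whose total
    is as close as possible to budget_cents without exceeding it."""
    # reachable maps each achievable total to one list of item indices
    # that produces it. Each item is used at most once.
    reachable = {0: []}
    for index, price in enumerate(prices_cents):
        # Iterate over a snapshot so an item is never combined with itself.
        for total, chosen in list(reachable.items()):
            new_total = total + price
            if new_total <= budget_cents and new_total not in reachable:
                reachable[new_total] = chosen + [index]
    best = max(reachable)
    return best, reachable[best]


# Hypothetical wish list: (title, price in cents).
wishlist = [("Book", 1898), ("CD", 1298), ("DVD", 2499),
            ("Lens", 5603), ("Game", 3500)]
total, picked = best_fit([price for _, price in wishlist], 10000)
print(total / 100, [wishlist[i][0] for i in picked])
```

Working in integer cents avoids floating-point rounding when testing for an exact fit, and the table never holds more entries than the budget has cents, so the search stays cheap even for long wish lists.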

There are many caveats. The wishlist optimizer only works for public Amazon.com (US) wish lists. There does not seem to be an easy way to buy multiple items for somebody else’s wish list in one step, although I am working on it, so you will have to go through the wish list and add the items by hand. Shipping costs and wish list priorities are currently not taken into account. Sometimes Amazon will not show a price straight away but will instead require you to click on a link; the optimizer declines to play these marketers’ games and just skips those products.

Be patient – Amazon.com is rather slow right now, and it seems they did not learn the lessons of their poor performance towards the end of last year. One of my coworkers ran the optimizer through an acid test with his wife’s 13-page wish list, and it took well over a minute and a half to fetch the list, let alone optimize it. One can only imagine how bad it will get when the Christmas shopping season begins in earnest. To mitigate this somewhat, I have added caching – the script will only hit Amazon once per hour for any given wish list. As it works by scraping the web site rather than using the buggy and unreliable Amazon Web Services API, there is a real risk it will stop working if Amazon blocks my server’s IP or if they radically change their wish list UI (they would do better to add additional machines and load-balancers, but that would be too logical).
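How the script stores its cache is not described here. The once-per-hour rule can be sketched as a simple time-to-live cache keyed on the wish list ID; `cached_fetch`, the in-memory dictionary and the `fetch` callback are all assumptions made for illustration.

```python
import time

CACHE_TTL_SECONDS = 3600  # one hour per wish list

_cache = {}  # wish list ID -> (fetch time, page contents)


def cached_fetch(wishlist_id, fetch, now=time.time):
    """Return the page for wishlist_id, calling fetch(wishlist_id) only
    if there is no cached copy younger than CACHE_TTL_SECONDS."""
    entry = _cache.get(wishlist_id)
    if entry is not None and now() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    page = fetch(wishlist_id)
    _cache[wishlist_id] = (now(), page)
    return page
```

The `now` parameter exists only so the clock can be replaced in tests; in normal use it defaults to `time.time`.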

Update (2005-12-02):

Predictably, Amazon changed their form (they changed the form name from edit-items to editItems) and broke not only the wishlist optimizer, but also the bookmarklet. I fixed this and upgraded to the scraping module BeautifulSoup, but you will need to use the revised bookmarklet above to make it work again.

Update (2010-04-27):

The script has been broken for quite a while, but I fixed it and it should work again.

The Temboz RSS aggregator

2013-03-14: Google’s announcement that their Reader service will be discontinued has spurred interest in Temboz. This software is not dead; in fact I use it daily, but I have not made an official release in a long time. You should use the version from GitHub instead. There are currently a number of bugs which can lead to Temboz locking up and requiring a restart. I am planning on completing my long overdue overhaul before Google’s July deadline.

Introduction

Temboz is an RSS aggregator. It is inspired by FeedOnFeeds (web-based personal aggregator), Google News (two column layout) and TiVo (thumbs up and down). I have been using FeedOnFeeds for some time now, but that software seems to have stopped evolving, and I had a number of optimizations to the user experience I wanted to make.

Features

Already implemented:

  • Multithreaded: feeds are downloaded in parallel.
  • Built-in web server.
  • Two-column user interface for better readability and information density. Automatic reflow using CSS.
  • Ratings system for articles
  • Real-time hunter-gatherer user interface: items flagged with a “Thumbs down” disappear immediately off the screen (using Dynamic HTML), making room for new articles. No laborious flagging of items as in FeedOnFeeds.
  • Filtering entries (using Python syntax, e.g. 'Salon' in feed_title and title == "King Kaufman's Sports Daily", or simply by selecting keywords/phrases and hitting “Thumbs down”).
  • Ability to generate an RSS feed from “Thumbs Up” articles, which is what makes Temboz a true aggregator, not just a reader.
  • Ad filtering
  • Automatic garbage collection: every day between 3AM and 4AM, uninteresting articles (by default those older than 7 days) are purged of their contents (but not metadata such as titles, permalinks or timestamps) to keep the database size manageable. After 6 months (by default), they are deleted altogether
  • Automatic database backups daily (immediately after garbage collection)
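How Temboz evaluates filtering rules internally is not shown here. As a minimal sketch of the idea, a rule written as a Python expression can be evaluated with the entry’s fields exposed as variables. The function name `matches` and the dictionary layout are hypothetical; only the field names `feed_title` and `title` come from the example rule above.

```python
def matches(rule, entry):
    """Evaluate a filter rule (a Python expression) against one entry.

    The entry's fields become the variables visible to the expression.
    """
    # An empty __builtins__ limits what a rule can call, but eval is still
    # not safe for untrusted input; this suits a single-user aggregator.
    return bool(eval(rule, {"__builtins__": {}}, dict(entry)))


rule = "'Salon' in feed_title and title == \"King Kaufman's Sports Daily\""
entry = {"feed_title": "Salon", "title": "King Kaufman's Sports Daily"}
print(matches(rule, entry))
```

An entry for which the rule evaluates to true would then be treated as filtered out, in the same way as a manual “Thumbs down”.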

On the to do list:

  • Write better documentation
  • Handle permanent HTTP redirects for feed XML URLs
  • Automatic pacing of feed polling intervals using the average and standard deviation of observed feed item inter-arrival times, to reduce bandwidth usage and load for both client and server. Most feeds should be polled on a daily rather than hourly interval (e.g. my own, since I update once a week on average), but the mechanisms for a feed to indicate its polling rate preferences are quite inconsistent from one flavor of RSS/Atom to another.
  • “Survivor mode” – vote feeds that no longer perform off the aggregator based on relevance statistics.
  • Ability to cluster together articles (I tried a heuristic of looking for common URLs they are all pointing to, but this didn’t work well in practice).
  • Portability to Windows, distribution as a standalone package.
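The automatic pacing item is still on the to-do list, so no algorithm exists to quote. One possible heuristic, sketched below, polls slightly more often than the average gap between items when arrivals are irregular, clamped between hourly and daily. The function name, the clamping bounds and the “mean minus one standard deviation” rule are all assumptions.

```python
import statistics

MIN_INTERVAL = 3600   # never poll more often than hourly (assumed bound)
MAX_INTERVAL = 86400  # never poll less often than daily (assumed bound)


def polling_interval(timestamps):
    """Suggest a polling interval in seconds from item arrival times
    (given as Unix timestamps, in any order)."""
    if len(timestamps) < 3:
        return MIN_INTERVAL  # too little history to estimate anything
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean = statistics.mean(gaps)
    spread = statistics.stdev(gaps)
    # Poll a little ahead of the typical gap when arrivals are irregular.
    suggested = mean - spread
    return int(min(MAX_INTERVAL, max(MIN_INTERVAL, suggested)))
```

Under this sketch a feed updated once a week ends up polled daily, while a feed that posts every few minutes is still only polled hourly.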

History

I have been using it successfully for well over a year. It still has rough edges, with some administration functions only doable using the SQLite command-line utility. Here is a screen shot showing the reader user interface. The article highlighted in yellow was given a “Thumbs Up”. You can also see the user interface at work in a view of the last 50 articles I flagged as “thumbs up” among the feeds I read.

Screen shots

Click on a screen shot thumbnail for a full-sized version

The first screen shot shows the article reading interface, using a two-column layout. Clicking on the “Thumbs down” icon makes the article disappear, bringing a new one in its place (if available). Clicking on the “Thumbs up” icon highlights it in yellow and flags it as interesting in the database.

[Screen shot: view items]

The feed summary page shows statistics on feeds, starting with feeds with unread articles, then by alphabetical order. Feeds can be sorted based on other metrics. You have the option of “catching up” with a feed (marking all the articles as read). Feeds with errors are highlighted in red (not shown).

[Screen shot: view feeds]

Clicking on the “details” link for a feed brings up this page, which allows you to change the title or feed URL, and shows the RSS or Atom fields accessible for filtering.

[Screen shot: feed details]

Feeds can be filtered using Python expressions.

[Screen shot: filtering rules]

Known bugs

You can check outstanding bug reports, change requests and more at the public CVStrac site.

Credits

Temboz is written in Python, and leverages Mark Pilgrim’s Ultra-liberal feed parser, SQLite 2.x and Cheetah.

Download

You can download the current version: temboz-0.8.tar.gz. I welcome any feedback you may have, especially on improving installation.

The CVS version is far ahead of 0.8 in features. I have not yet had the time to test and document the migration procedure from 0.8 to 1.0, but if you are a new Temboz user I strongly advise you to get a nightly CVS snapshot instead (they are what I run on my own server): temboz-CVS.tar.gz or temboz-CVS.zip.

Updates

For news on Temboz, please subscribe to the RSS feed.

Temboz has a CVStrac where you can submit bug reports or change requests, and a Wiki, where all future documentation will ultimately reside.

Post scriptum

The name “Temboz” is a reference to Malima Temboz, “The mountain that walks”, an elephant whose tormented spirit is the object of Mike Resnick’s excellent SF novel, Ivory.

Mylos

I switched to WordPress at the end of 2009 for the reasons expressed elsewhere, then to Hugo in 2017, which is going back in the opposite direction, and this entry is here for historical purposes only.

Mylos is my home-grown weblog management software. I wrote my first web pages by hand in Emacs and RCS in 1993, but stopped maintaining them in 1996 or so. I only restarted one with Radio last year. After a year of weblogging, however, I find I am frustrated by the limitations of Radio as well as its web-based user interface (I am one of those rare people who prefer command-line user interfaces and non-WYSIWYG HTML editors). I guess I could have extended Radio using Frontier, the UserLand language it is implemented in, but I have no interest in learning yet another oddball scripting language.

I decided in April 2003 to roll my own system, implemented in Python. In my career at various ISPs, I had to kill home-grown content-management system (CMS) projects gone awry, and I was certainly aware that these projects have a tendency to go overboard. Still, it has taken me three months of (very) part-time work to get the system to a point where it generates usable pages and imports my legacy pages from Radio without a hitch.

The implemented requirements for Mylos are:

  • Migration of my existing Radio weblog entries and stories (done, but not in an entirely generic fashion: it is theme-dependent)
  • All pages are static HTML, no requirements for CGI scripts, PHP, databases or the like
  • Implemented and extensible in Python
  • Separation of content and presentation using themes (based on Webware Python Server Pages and CSS)
  • Support for navigational hierarchy
  • Articles are stored as regular files on the filesystem where they can be edited using conventional tools if necessary, no need for proprietary databases
  • Extensible article metadata
  • Atom 1.0 syndication, with separate feeds for subcategories
  • Use only relative URLs in hyperlinks to allow easy relocation
  • Automatic entry HTML cleanup for XHTML compliance
  • A CSS-based layout where the blogroll doesn’t wrap around short bodies (e.g. on permalink pages for short articles).
  • Reasonable defaults, e.g. don’t try to create a weblog entry for an image that is colocated with an article, just copy it
  • Built-in multithreaded external link validation.
  • Automatic URL remapping (/mylos/ becomes relative to the Mylos root, relative URLs in an entry are automatically prefixed in containers like home pages).
  • Ability to review an article before publishing
  • Lynx compliance
  • Automatically cache external images in weblog entries in case they disappear (but do not use them as such due to potential copyright issues)
  • Set robots meta tag so only permalinks are indexed and cached by search engines, for better relevance to search engine users (albeit at the cost of lower rankings for the home page).
  • Sophisticated image galleries fully integrated with the navigation
  • Automatic code fragment colorization using Pygments
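The relative-URL requirement in the list above can be illustrated with a short sketch. The post does not show how Mylos computes its links; the function name `relative_link` and the example paths are hypothetical, and both arguments are assumed to be paths from the site root.

```python
import posixpath


def relative_link(page_path, target_path):
    """Return the relative URL leading from page_path to target_path.

    Both arguments are paths from the site root, e.g. "2003/07/entry.html".
    """
    page_dir = posixpath.dirname(page_path)
    return posixpath.relpath(target_path, start=page_dir or ".")


print(relative_link("2003/07/entry.html", "css/style.css"))
```

Because no link mentions the site’s host name or its location on the server, the whole generated tree can be moved to another directory or domain without re-rendering.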

These features are planned but not yet implemented:

  • Keyword index.
  • Enhanced support for books via Allconsuming and Amazon.
  • Automated dependency tracking to re-render only the pages affected by a change (via SCons)
  • Multi-threaded rendering (via SCons)
  • Automatically add height, width and alt attributes to img tags
  • Auto abbreviation glossary as tooltip help using tags
  • Typographically clean results, as done by SmartyPants
  • Feedback loop via on-page comments
  • Notification of new comments by email
  • Ability to promote a weblog entry to a story if it reaches critical mass

These features are “blue-sky”, don’t hold your breath for them:

  • Updates by email
  • User-submitted ratings for articles
  • Support for multilingual weblogs

Features that are not planned at all (anti-requirements) include:

  • Synchronization or upload to server – rsync does this far better
  • Text editor – use $VISUAL or $EDITOR, whether Emacs, vi, or whatever
  • Web user interface – Radio’s web interface has very poor usability in my personal opinion, and this is due to the fact it is web-based, not any fault of UserLand’s
  • RSS 1.0 – RDF seems like an exercise in intellectual masturbation
  • Blogger API or similar – although someone else could certainly write a bridge in Python if needed

The software is currently not in a state where it can be used by anyone else. I am not sure if there is any demand for such a tool in any case. If there is, I would certainly consider documenting it better and making a SourceForge project out of it.

By the way, the system is named “Mylos” after a city in the magnificent illustrated series “Les Cités Obscures” by Belgian architects and writers Schuiten and Peeters, more specifically L’Enfant Penchée.

Cover for L'Enfant Penchée