Threepress Consulting blog

Threepress creates software for publishers, educators and authors.

Category: digitization

New ePub Zen Garden contribution: GBS

This is really funny if you get the joke, and still a nice style if you don’t. Thanks, Kirk Biglione!

A case study in converting image-based ebooks into XML

There’s a great deal of valuable information in this recently released white paper by the American Council of Learned Societies: ACLS Humanities E-Book XML Conversion Experiment: Report on Workflow, Costs, and User Preferences. Although the study was based on scholarly books, its findings would apply to many other digitization projects.
The Humanities E-Book (HEB) project took [...]

Slides from “What publishers need to know about digitization”

O’Reilly Media will be posting a complete recording of the presentation, but in the meantime I’ve posted the slides from the webcast, “What publishers need to know about digitization,” on SlideShare.
Thanks to everyone who attended and especially to those who asked so many excellent questions.

The analog hole, and a seminar on digitization

Over on Tools of Change there’s a post of mine discussing the so-called “analog hole” as it applies to digital books. It was a fun article to write, especially the hands-on part. I used Google’s OCRopus open-source OCR software, which was a little impenetrable to someone outside of the machine-learning community but did a good [...]

The real Internet Archive

My attention was caught by this quote from Clay Shirky on the excellent ReadWriteWeb blog:
“Back in 1974, when the Internet was a fraction of what it is now, the acorn to an oak, there were really only two applications,” said Shirky, “Telnet, and FTP.”
Surely he’s wrong, I thought. Those protocols aren’t that old.
But I was [...]

TEI + Python + lxml + Dutch = Corpus Toneelkritiek Interbellum

I was pleased to be able to assist with the Corpus Toneelkritiek Interbellum project, which allows reading, browsing and searching of early 20th-century Dutch theater reviews. I can’t read Dutch, but Google’s automated translation tells me that the review of Hamlet mentions a “long modern clown,” which sounds disturbing enough that I’ll leave the [...]