‘Tools’ section added, first tool is web-based epubcheck

Monday, May 26th, 2008

For one-off checks or use by non-developers, there is now a web front-end for the valuable epubcheck validation tool.

Uploaded files are run through the validator and any error messages are reported. The error report includes some notes to help decipher the sometimes-cryptic messages (notes are taken from the epubcheck wiki).

epub files are deleted immediately after validation, but take care not to upload any proprietary or sensitive documents. This tool provides no guarantees that any information it receives is secure.

The tool currently runs epubcheck version 0.9.5. The plan is for the front-end to track new releases of the epubcheck library.

Epub tools repository created

Monday, May 26th, 2008

I have split threepress into two different projects hosted on Google Code:

  1. threepress search, the web application running on threepress.org.
  2. epub-tools, a repository of standalone tools that can be used in other projects.

Most developers will be interested in epub-tools.  Experimental projects will start in the search application, receive feedback from the digital publishing community, and get packaged up for distribution in epub-tools.

The tools project currently contains only one application: tei2epub.

The current version of tei2epub includes these recent changes:

  1. The latest version of the validation utility epubcheck (0.9.5)
  2. The NCX files now validate against the NCX DTD as well as epubcheck’s schema (thanks to Jon Noring for testing)
  3. Some corrections were made to render TOCs more attractively in Adobe Digital Editions

Convert TEI to epub

Monday, May 12th, 2008

The most useful standalone tool in threepress right now is tei2epub, which the system uses to convert its internal source XML to the emerging e-book standard format epub.

TEI is the markup format of the Text Encoding Initiative, and is one of the most popular markup formats for printed works (especially in academia). All of the content on threepress was converted from the Gutenberg format to TEI upon ingestion into the site.

epub is the shorthand for the e-book format proposed by the International Digital Publishing Forum (IDPF), which uses XHTML and custom metadata formats. An e-book bundle is distributed in ZIP file format with its text and supplementary media “bound” together.
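To make the "bound together" idea concrete, here is a minimal sketch of how such a bundle can be assembled and sanity-checked with Python's standard zipfile module. It relies on two container conventions from the IDPF specification: the first entry is an uncompressed file named mimetype containing application/epub+zip, and META-INF/container.xml points at the package file. The function names and the OEBPS/content.opf path are assumptions made up for illustration, not part of tei2epub, and the placeholder content would not pass epubcheck.

```python
import io
import zipfile

MIMETYPE = "application/epub+zip"

# Points readers at the package file; the OEBPS/content.opf path is an
# illustrative assumption, not something tei2epub is known to use.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files):
    """Bundle a mapping of archive path -> bytes into an in-memory epub ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Text and supplementary media are added as ordinary compressed entries.
        for path, data in sorted(files.items()):
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def looks_like_epub(data):
    """Check only the container layout, not the validity of the content."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            return False
        info = archive.getinfo("mimetype")
        return (info.compress_type == zipfile.ZIP_STORED
                and archive.read("mimetype").decode("ascii") == MIMETYPE
                and "META-INF/container.xml" in names)
```

A call such as build_epub({"OEBPS/content.opf": b"...", "OEBPS/chapter1.xhtml": b"..."}) returns the bytes of the archive. A real book also needs a valid package file and NCX, which is what epubcheck verifies.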

tei2epub is written in Python with XSLT. It comes bundled with the latest version of epubcheck for validating its output. It is meant to be used by developers rather than end-users (unlike the recent BookGlutton epub converter), and because most of the functionality is in the XSLT, it should be easy to port to other languages. Like all threepress tools, it is released under the BSD license, which means it is free for all commercial and non-commercial use. You may download the ZIP version of the current release or get the latest version from svn at http://epub-tools.googlecode.com/svn/trunk/
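For developers who want to run the bundled validator on their own output, a rough sketch follows. It assumes epubcheck is invoked in the usual way for a Java jar (java -jar followed by the jar and the epub file), that java is on the PATH, and that the jar is named epubcheck-0.9.5.jar. The jar name and both function names are assumptions for illustration and may not match what tei2epub ships or does internally.

```python
import subprocess


def epubcheck_command(epub_path, jar_path="epubcheck-0.9.5.jar"):
    """Build the command line; the default jar name is an assumption."""
    return ["java", "-jar", jar_path, epub_path]


def run_epubcheck(epub_path, jar_path="epubcheck-0.9.5.jar"):
    """Run the validator and return everything it printed.

    Output is returned as text rather than interpreted, since the message
    format and exit codes can differ between epubcheck versions.
    """
    result = subprocess.run(
        epubcheck_command(epub_path, jar_path),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

Calling run_epubcheck("book.epub") returns the validator's report as a single string, which can then be printed or logged.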

Current limitations:

  1. tei2epub has not been tested on extensively marked-up TEI. It leverages the standard TEI-to-XHTML stylesheets distributed by TEI, but it is unknown whether epub readers will support all of the resulting markup.
  2. It accepts only a single source document (i.e. an entire TEI book).
  3. It does not handle images or other kinds of media.

Any of the above can be addressed once more complex TEI source books are available to develop and test against.

Edited May 22, 2008 to point resources at a new standalone repository.