Convert TEI to epub

by Liza Daly

The most useful standalone tool in threepress right now is tei2epub, which the system uses to convert its internal source XML to epub, the emerging standard e-book format.

TEI, the markup format maintained by the Text Encoding Initiative, is one of the most popular formats for encoding printed works, especially in academia. All of the content on threepress was converted from the Gutenberg format to TEI upon ingestion into the site.

epub is shorthand for the e-book format proposed by the International Digital Publishing Forum (IDPF). The format uses XHTML for content along with custom metadata formats. An e-book bundle is distributed as a ZIP file with its text and supplementary media “bound” together.
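To make the "ZIP bundle" idea concrete, here is a minimal sketch in Python of how an epub container is laid out. It is not tei2epub's own code. The function name build_epub_skeleton and the OEBPS/ paths are hypothetical, chosen for illustration. The result is only a skeleton: a real epub also needs a package (OPF) file and a table of contents before epubcheck will accept it.

```python
import io
import zipfile

# Points a reading system at the package file. The OEBPS/content.opf path
# is a common convention, used here as an assumption.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(files):
    """Return the bytes of a ZIP archive laid out like an epub container.

    `files` maps archive paths (for example "OEBPS/chapter1.html") to
    string content. Hypothetical helper, for illustration only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # The book's XHTML and any supplementary files follow.
        for path, content in files.items():
            archive.writestr(path, content,
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Calling build_epub_skeleton({"OEBPS/chapter1.html": "<html/>"}) returns bytes that can be written to a file with an .epub extension and inspected with any ZIP tool.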

tei2epub is written in Python with XSLT. It also comes bundled with the latest version of epubcheck for validating the output. It is meant to be used by developers rather than end users (unlike the recent BookGlutton epub converter), and because most of the functionality is in the XSLT, it should be easy to port to other languages. Like all threepress tools, it is released under the BSD license, which means it is free for all commercial and non-commercial use. You may download the ZIP version of the current release or get the latest version from svn at http://epub-tools.googlecode.com/svn/trunk/
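Because the conversion logic lives in the XSLT, the Python side can stay thin. The sketch below illustrates that division of labor. It is not the tei2epub source: it assumes the third-party lxml library (the post does not say which XSLT processor tei2epub uses), and the tiny stylesheet, the sample document, and the function name tei_to_xhtml are all hypothetical stand-ins for the real TEI stylesheets.

```python
from lxml import etree  # third-party; an assumption, not confirmed by tei2epub

# A toy stylesheet standing in for the much larger TEI to XHTML stylesheets.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">
  <xsl:template match="/">
    <html>
      <head><title><xsl:value-of select="//tei:titleStmt/tei:title"/></title></head>
      <body><xsl:apply-templates select="//tei:text//tei:p"/></body>
    </html>
  </xsl:template>
  <xsl:template match="tei:p">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
"""

# A minimal TEI document, invented for this example.
SAMPLE_TEI = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>Hello, epub.</p></body></text>
</TEI>
"""


def tei_to_xhtml(tei_bytes):
    """Apply the toy stylesheet to a TEI document and return XHTML text."""
    transform = etree.XSLT(etree.fromstring(STYLESHEET))
    result = transform(etree.fromstring(tei_bytes))
    return etree.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_xhtml(SAMPLE_TEI))
```

Porting this to another language would mostly mean swapping the few lines that load and apply the stylesheet. The stylesheet itself carries over unchanged.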

Current limitations:

  1. tei2epub has not been tested on extensively marked-up TEI. It relies on the standard TEI to XHTML stylesheets distributed by TEI, but it is unknown whether epub readers will support all of the resulting markup
  2. It accepts only a single source document (i.e. an entire TEI book)
  3. It does not handle images or other kinds of media

Any of the above could be addressed once more complex TEI source books are added.

Edited May 22, 2008 to point resources at a new standalone repository.