What I’d change about ePub

by Liza Daly

Obviously I’m a fan of the ePub format. It’s flexible enough to support advanced publications, but a simple text ebook can be put together with minimal effort.

But I don’t think it’s minimal enough. If I could go back in time and be involved with ePub and its predecessors, here are the choices I’d make:

Make the NCX optional

Many books and book-like publications have no chapters. In those cases, ebook authors are forced to create useless one-item NCX files and invent fake chapter titles, like “Pages.” Reading systems should be able to rely on the opf:spine alone to order OPS documents when there are no defined, named chapters.
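To show how little a reading system would need without an NCX, here is a minimal Python sketch that derives reading order from the OPF manifest and spine alone. It assumes an OPF 2.0 package document in the http://www.idpf.org/2007/opf namespace. The function name reading_order and the sample file names are made up for illustration, and the sketch ignores details such as the spine’s linear attribute and manifest fallbacks.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str) -> list[str]:
    """Return content-document hrefs in reading order, using only the OPF."""
    root = ET.fromstring(opf_xml)
    # Map every manifest item id to the file it names.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine's itemrefs are already in reading order; resolve each idref.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


# A package document trimmed to the relevant parts (no <metadata>, no NCX).
# The manifest order is deliberately reversed: the spine decides the order.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="p2" href="part2.xhtml" media-type="application/xhtml+xml"/>
    <item id="p1" href="part1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="p1"/>
    <itemref idref="p2"/>
  </spine>
</package>"""

print(reading_order(SAMPLE_OPF))  # ['part1.xhtml', 'part2.xhtml']
```

Nothing here needs chapter titles: the spine already carries the one piece of information a chapterless book has, which is the order of its files.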

Simplify the NCX

Good progress is being made in the EPUB Working Group towards clarifying and simplifying the NCX requirements. (Making playOrder optional is an especially useful step.)

But because the NCX is derived from the DAISY specification, there’s still some useless overlap, like the title, which appears in both the NCX and the OPF. The NCX is great for complex hierarchies, but I wish its features were simply part of the OPF file, leaving only one file with publication-specific metadata.

Drop container.xml and replace with a required name and location for the OPF

I don’t personally understand the choice that was made here: there’s a file with a required location and name (META-INF/container.xml) whose sole purpose is to point to another file which may be named arbitrarily. Perhaps this is a historical artifact, but why not simply require there to be a content.opf file at the top level of the publication and be done with it?

(When combined with my first change, this would result in ePub requiring half as many files, which to me is a good thing.)
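For comparison, here is a rough Python sketch of the two lookups: the current indirection through META-INF/container.xml, and the fixed name proposed above. It assumes the container namespace urn:oasis:names:tc:opendocument:xmlns:container and takes the first rootfile entry. The helper names, including make_sample_epub (which builds a tiny in-memory zip for demonstration, not a complete ePub), are hypothetical, and the content.opf rule is this post’s proposal, not anything in the spec.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_opf_today(epub: zipfile.ZipFile) -> str:
    """Current rule: open a fixed-name file just to learn the OPF's path."""
    root = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


def find_opf_proposed(epub: zipfile.ZipFile) -> str:
    """Proposed rule (hypothetical): the OPF is always top-level content.opf."""
    if "content.opf" not in epub.namelist():
        raise ValueError("no top-level content.opf")
    return "content.opf"


def make_sample_epub(opf_path: str) -> zipfile.ZipFile:
    """Build a tiny in-memory zip whose container.xml points at opf_path."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/epub+zip")
        z.writestr("META-INF/container.xml", CONTAINER_XML.format(path=opf_path))
        z.writestr(opf_path, "<package/>")
    buf.seek(0)
    return zipfile.ZipFile(buf)


print(find_opf_today(make_sample_epub("OEBPS/book.opf")))  # OEBPS/book.opf
print(find_opf_proposed(make_sample_epub("content.opf")))  # content.opf
```

The first function needs an XML parser and a namespace just to find a file name; the second needs only a directory listing.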

Support any valid form of XHTML

XHTML 1.1 was obviously a mistake: as far as I know, it’s used nowhere else, and it’s a dead end as far as web technologies go. Few automated HTML tools generate it, and the changes from XHTML 1.0 are simply annoying rather than useful.

I’d prefer that ePub support XHTML 1.0, which is simply HTML 4.01 reformulated as XML. HTML 4 is the dominant form of HTML on the web (and will remain so for some time), and common automated tools like Tidy can clean up “street” HTML 4 into XHTML 1.0 quite well. Tidy won’t, however, produce XHTML 1.1.

I also don’t want to put an upper bound on the XHTML supported: XHTML 5 should also be okay, and the rule for a reading system that doesn’t support newer tags should be the same “ignore and move on” approach that has worked well on the web.
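As a rough illustration of “ignore and move on,” this Python sketch flattens an XHTML document to text and treats unrecognized elements the way browsers do: it applies no special handling to the unknown tag but still renders its content instead of rejecting the document. The KNOWN_TAGS set stands in for a hypothetical older reading system, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Hypothetical: the elements an older reading system happens to understand.
KNOWN_TAGS = {"html", "body", "p", "em", "h1"}


def _walk(elem, out, unknown):
    tag = elem.tag.replace(XHTML_NS, "")
    if tag not in KNOWN_TAGS:
        # Unrecognized element: record it, apply no special handling,
        # but still render its content rather than rejecting the document.
        unknown.append(tag)
    if elem.text:
        out.append(elem.text)
    for child in elem:
        _walk(child, out, unknown)
        if child.tail:  # text after a child belongs to the current element
            out.append(child.tail)


def render_text(xhtml: str) -> tuple[str, list[str]]:
    """Flatten an XHTML document to text, skipping over tags it doesn't know."""
    out: list[str] = []
    unknown: list[str] = []
    _walk(ET.fromstring(xhtml), out, unknown)
    return "".join(out), unknown


# <mark> and <time> are newer (HTML5) elements missing from KNOWN_TAGS.
SAMPLE_XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>See <mark>this</mark> and <time>now</time>.</p>"
    "</body></html>"
)

print(render_text(SAMPLE_XHTML))  # ('See this and now.', ['mark', 'time'])
```

The reader loses whatever the newer tags would have added, but never the text itself.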

Minimize or eliminate any ePub-specific styles and markup

I dislike the existence of special style properties like the oeb-* styles where equivalent CSS3 properties exist. I know CSS3 is a mess, but I’d rather use the same vocabulary as will eventually be found on the web. (This was discussed but isn’t happening.)

Support MathML as a first-class document type

This will happen eventually, but who knows when. At least the fallback system means that MathML documents are allowed today; it’s just more of a hassle, and many people don’t realize it’s possible.

How would you change the spec if you could magically make it so?