Can an author create an EPUB file using normal tools? Part 1: Pages
by Liza Daly
Yes, but it may require a Mac.
The IDPF board met on the last day of the Digital Book 2011 conference at Book Expo America. One of our topics for discussion was what the IDPF as an organization should do to further the adoption of EPUB. I brought up an issue that’s been concerning me for some time: the lack of digital-native authoring tools aimed at authors, not publishers.
If publishers are struggling to produce high-quality EPUB files either via InDesign, XML workflows, or strategic outsourcing, authors are in an even worse place. This is especially true for authors with an ambition to self-publish, or to start a micro-publishing outfit, and yet still retain some creative control over the look of their digital product. InDesign (especially CS5.5) is a great solution for small- to medium-sized publishers who produce both print and digital books, but its feature set is inappropriate for digital-native publishing, and its price and complexity are unsuited for self-publishers.
I’m aware of two document creation tools right now that have native EPUB support and are available for my platform, Mac OS X: Pages, and Scrivener. (The only product I know of on the Windows side is Atlantis. Linux users have to make do with plugins for OpenOffice — judging from the comments in the issue tracker, EPUB export is not a priority, to say the least.)
This post will cover Apple Pages. A subsequent post covers outputting EPUB with Scrivener.
Apple’s Pages was the first major commercial word processor to include EPUB export. I reviewed the initial EPUB support in August 2010, but it’s been through some updates since then, and I wanted to dive into the semantics of the outputted code more closely.
The sample document
I started with the Apple-provided EPUB template (more on that later) and added a number of new elements and semantic tests. In particular, I added:
- Chapters and headings
- Emphasis and strong text (rendered in Pages as italics and bold)
- Numbered and unnumbered lists
- Hyperlinks both internal to the document and external to the web
- Inline images (by dragging and dropping)
- A cover page with an image
- All available metadata in the export pane
In all cases, I used only styles available in the style drawer; I did not change any font sizes or font weight via the toolbar buttons.
The EPUB output
As in my previous test, this produced a valid EPUB 2.0.1 document according to EPUBCheck 1.1. Hooray!
Headers and subheaders
The semantics are much improved from my first test. Paragraphs are now wrapped in <p> elements, for example, and headers are headers:
<body> <div class="body" style="white-space:pre-wrap"> <h3>Chapter Two: The Chaptering</h3> <p class="s2">This chapter has an introduction. Hello!</p> <h4>I’m a subchapter or section under that. </h4> <p class="s2">Don’t hold it against me. I just have a lot of things to say.</p> </div> </body>
The white-space:pre-wrap style is curious: the property specifies that whitespace inside the XHTML is significant, meaning that the ereader or web browser should retain it. That is emphatically not a best practice for general text; on the other hand, there was no whitespace in the output at all, so I’m unsure of its purpose. If I were post-processing this EPUB file, I would remove that style.
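A minimal sketch of that post-processing step, assuming the chapter XHTML has already been read into a string. The function name is my own, and the regular expression only handles double-quoted style attributes like the ones in the sample output; a real tool would use a proper XML parser.

```python
import re

# A double-quoted style attribute, including the whitespace before it.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')


def strip_pre_wrap(xhtml):
    """Remove white-space:pre-wrap declarations from inline styles."""

    def clean(match):
        declarations = [d.strip() for d in match.group(1).split(";") if d.strip()]
        kept = [d for d in declarations
                if d.replace(" ", "").lower() != "white-space:pre-wrap"]
        if not kept:
            return ""  # nothing left, so drop the attribute entirely
        return ' style="' + ";".join(kept) + '"'

    return STYLE_ATTR.sub(clean, xhtml)
```

Because a style attribute with no remaining declarations is dropped, this also removes attributes that were empty to begin with.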
I used the “Chapter” style to generate the chapter heading. This header should be an h1 rather than an h3, but at least the subheading is also a header and one step down.
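One way to repair the heading levels after export is to shift every heading in a chapter file so that the shallowest one becomes h1. This is only a sketch under that assumption: the function name is hypothetical, and a file that uses headings for other purposes (inside table cells, say) would need more care.

```python
import re

# Opening or closing heading tags, h1 through h6.
HEADING_TAG = re.compile(r"<(/?)h([1-6])\b")


def normalize_headings(xhtml):
    """Shift heading levels so the shallowest heading in the file is h1."""
    levels = [int(level) for _slash, level in HEADING_TAG.findall(xhtml)]
    if not levels:
        return xhtml
    shift = min(levels) - 1

    def renumber(match):
        return "<{}h{}".format(match.group(1), int(match.group(2)) - shift)

    return HEADING_TAG.sub(renumber, xhtml)
```

Computing the shift per file, rather than hard-coding "h3 becomes h1", leaves a chapter that already starts at h1 unchanged.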
I completely failed to find a way to output em rather than
I used the list styles provided in the template, but these are not the lists you’re looking for:
<h3>Chapter Three: Lists</h3> <p class="s2">Reasons why people love lists, in order.</p> <p class="s2 s3"><span class="c2">1.</span>Lists are neat.</p> <p class="s2 s3"><span class="c2">2.</span>It’s cool to let the computer fill in numbering.</p> <p class="s2 s3"><span class="c2">3.</span>Yessir.</p> <p class="s2">Other reasons that people like lists, in no particular order:</p> <p class="s2 s4"><span class="c3">•</span>Sometimes they have bullets</p> <p class="s2 s4"><span class="c3">•</span>Not real bullets.</p> <p class="s2 s4"><span class="c3">•</span>Those are scary.</p>
This must be fixed.
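Until Pages does this itself, the pseudo-lists can be rebuilt in post-processing. The sketch below assumes the exact shape seen in this export: a paragraph whose first child is a span holding either a number followed by a period or a bullet character. The function name is mine, and class names are deliberately ignored because they may differ between documents.

```python
import re
from itertools import groupby

# One pseudo-list paragraph: a marker span followed by the item text.
ITEM = re.compile(
    r'<p class="[^"]*">\s*<span class="[^"]*">(?P<marker>\d+\.|•)</span>'
    r'(?P<text>.*?)</p>\s*',
    re.S,
)
# A run of consecutive pseudo-list paragraphs.
RUN = re.compile("(?:" + ITEM.pattern + ")+", re.S)


def rebuild_lists(xhtml):
    """Replace runs of marker-prefixed paragraphs with real ol/ul lists."""

    def convert(run):
        lists = []
        items = ITEM.finditer(run.group(0))
        # Split the run wherever the marker type changes.
        for bulleted, group in groupby(items, key=lambda m: m.group("marker") == "•"):
            tag = "ul" if bulleted else "ol"
            body = "".join("<li>{}</li>".format(m.group("text").strip()) for m in group)
            lists.append("<{0}>{1}</{0}>".format(tag, body))
        return "".join(lists)

    return RUN.sub(convert, xhtml)
```

The original numbers are discarded, so a list that deliberately starts at something other than 1 would need a start attribute added by hand.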
The table markup is a little verbose, but basically fine:
<table class="s5" style="margin-left:0.0px;width:99.8%;border-collapse:collapse"> <col style="width:33.3%"/> <col style="width:33.3%"/> <col style="width:33.3%"/> <tr style="height:25.0%"> <td class="s8 s6 s7"> <h2 class="s9">Reasons why tables are nice</h2> </td> <td class="s8 s6 s7"> <h2 class="s9">Who feels this way</h2> </td> <td class="s8 s6 s7"> <h2 class="s9">I can’t think of a third thing.</h2> </td> </tr> ....
Images, covers, and links
Creating an image is as easy as dragging it in. I’m not sure if it’s possible to add alt text to the image; I believe document creation tools should prompt users to add descriptive text by default.
<p class="s2"> <img src="images/droppedImage.png" alt="droppedImage.png" style=""/> </p>
Only images styled as “inline” are exported; Pages warns you that an image was discarded if it had a floating or fixed style. I tried to select a page with an inline image as the cover page, but Pages warned me that it was being discarded. It then showed up in iBooks anyway.
It would be nice if the original filename were preserved (it was not “droppedImage.png”), and the empty style attribute should be discarded on output.
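Since the exported alt text is just the image's file name, a quick audit can at least list the images that still need a real description. This is a rough sketch with a hypothetical function name; it assumes double-quoted attributes and no ">" inside attribute values, which holds for the sample output above.

```python
import posixpath
import re

IMG_TAG = re.compile(r"<img\b[^>]*>")
ATTRIBUTE = re.compile(r'(\w[\w:-]*)="([^"]*)"')


def images_needing_alt(xhtml):
    """Return the src of each image whose alt text is missing or a file name."""
    flagged = []
    for tag in IMG_TAG.findall(xhtml):
        attrs = dict(ATTRIBUTE.findall(tag))
        src = attrs.get("src", "")
        alt = attrs.get("alt", "").strip()
        # An alt that merely repeats the file name describes nothing.
        if not alt or alt == posixpath.basename(src):
            flagged.append(src)
    return flagged
```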
<h1><span id="chapter-5-sh1"/>Chapter Five: Hyperlinks</h1> <p class="s2">This is an internal link to <a href="chapter-1.xhtml#b1"><span class="c1">chapter one</span></a>. This is an external link to <a href="http://placekitten.com/"><span class="c1">photos of kittens</span></a>.</p>
The span here is for the purpose of creating a back-link. A similar one was auto-added to Chapter One. Adding an internal hyperlink requires first creating a Pages “bookmark” and then linking to that bookmark, which was a little confusing; I should be able to target any point in the document using the hyperlink feature.
I didn’t test HTML5 video output, but I’ve been told that video can be successfully embedded and output such that the video will work in iBooks (it will use HTML5 tagging).
Both the OPF and the NCX were perfectly well-formed. The EPUB export dialog should optionally request richer metadata than the current list of author/title/subject, though.
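To see exactly which metadata an export wrote, the Dublin Core elements can be read back out of the OPF file. A small sketch, assuming the OPF contents have already been pulled out of the EPUB as a string; the function name is hypothetical, while the namespace URI is the standard Dublin Core one that EPUB 2 uses.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"


def dublin_core_fields(opf_xml):
    """Map each Dublin Core element name in an OPF document to its values."""
    root = ET.fromstring(opf_xml)
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC):
            name = element.tag[len(DC):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Values are collected in lists because elements such as creator and subject may legitimately appear more than once.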
The dreaded sample document
The EPUB export function is next to useless on large documents unless you start with the sample template, or import its styles later and tediously update yours to match. The EPUB styles are completely opaque: I have no idea why they have magical properties, or what I would do to my own styles to emulate them. I originally assumed the Pages native file format was binary, leaving nothing to inspect to reverse-engineer the styling. In fact the Pages file format is zipped XML, so it may be possible to inspect it directly (thanks, Steve!).
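If the Pages file really is an ordinary zip archive, listing its contents is a reasonable first step toward finding where the styles live. This sketch uses only the standard library; the function name is my own, and I am not assuming anything about which entries the archive contains.

```python
import zipfile


def list_package_entries(path_or_file):
    """Return the sorted entry names inside a zip-based document package."""
    with zipfile.ZipFile(path_or_file) as package:
        return sorted(info.filename for info in package.infolist())
```

Any XML entries that turn up can then be read with package.read(name) and searched for the style names used in the template.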
The native header/paragraph/list styles in the blank template should output useful semantics in the XHTML. It is unacceptable to force users to import an external document to produce a half-decent EPUB file. At the very least, an EPUB-friendly template should be one of the default choices available when creating a new document.
- The list styles should generate lists. They should be ordered or unordered as appropriate to the style.
- EPUBs should be importable as well as exportable. It’s understandable that they won’t magically re-constitute into the original Pages document, but a conversion pipeline is entirely reasonable.
- It should be possible to export chunked EPUBs (with multiple XHTML chapters) without having to use the sample template.
- It should be possible for a power user to understand how to create styles that will have specific behaviors.
- It should be possible to customize the XHTML serialization (“I want the style named ‘strong’ to output strong elements with the classname ‘foo’”).
- There should be much more metadata allowable in the OPF file.
- Images should require or at least prompt for alt text.
- Bold and italic buttons should output strong and em with the appropriate CSS styling in all cases. I would say this is actually true of any EPUB output tool; it’s unreasonable to ask users to create named styles (as in InDesign) when those tempting bold and italic buttons are available.
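As an illustration of what a customizable serialization could look like, here is a sketch that rewrites exported spans according to a user-supplied mapping. Everything in it is hypothetical: the class names c4 and c5 are invented for the example (I don't know which classes Pages assigns to italic or bold text), and the regular expression does not handle nested spans.

```python
import re

# Hypothetical mapping: exported span class -> (element name, optional class).
STYLE_MAP = {
    "c4": ("em", None),
    "c5": ("strong", "foo"),
}

SPAN = re.compile(r'<span class="([^"]+)">(.*?)</span>', re.S)


def apply_style_map(xhtml, style_map):
    """Rewrite spans whose class appears in style_map as semantic elements."""

    def swap(match):
        rule = style_map.get(match.group(1))
        if rule is None:
            return match.group(0)  # unknown class: leave the span alone
        tag, classname = rule
        attr = ' class="{}"'.format(classname) if classname else ""
        return "<{0}{1}>{2}</{0}>".format(tag, attr, match.group(2))

    return SPAN.sub(swap, xhtml)
```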
I don’t expect Windows/Linux versions of Pages to ever exist, which means that Pages will remain a marginal tool in the publishing ecosystem, but it’s perfectly adequate for an individual Mac-only user.