“Pages” in ePub: Adobe’s page-map versus NCX pageList
by Keith Fahlgren
The vast majority of ebooks today have print cousins, despite some recent digital-only publishing news. As a consequence, many people creating ePubs want to know how to tie references to the printed pages back into the ebook. My personal opinion is that this sort of print-centrism is unnecessary for the vast majority of titles¹, but there are times when mapping the ebook to the printed book pages does make a lot of sense. Unfortunately, there’s no perfect solution at this time, but there are two options.
Adobe was motivated to solve this problem in Digital Editions before other reading systems did, and came up with a mechanism called page-map. The Adobe EPUB Best Practices Guide describes the issue:
There is no inherent linear navigation indicator which could be used for the same purpose that page number is used in the printed document world.
There is no way for an eBook to incorporate page number information for the printed edition of the same book.
Adobe developed an extension called page-map, documented in the same Best Practices Guide, that provides a solution. To implement page-map, the creator includes a special page-map file in the ePub and references that file from the OPF package file. This file assigns a name (usually the printed page number) to a series of pointers, each targeting a specific location within the content.
Here’s an example (note that many of the content files in the front matter don’t span more than one page):
<page-map xmlns="http://www.idpf.org/2007/opf">
  <page name="" href="strandedwithaspy_cov.html"/>
  <page name="" href="strandedwithaspy_intro.html"/>
  <page name="1" href="strandedwithaspy_fm01.html"/>
  <page name="2" href="strandedwithaspy_fm02.html"/>
  <page name="3" href="strandedwithaspy_tp01.html"/>
  <page name="5" href="strandedwithaspy_adc01.html"/>
  <page name="5" href="strandedwithaspy_ata01.html"/>
  <page name="6" href="strandedwithaspy_ded01.html"/>
  <page name="" href="strandedwithaspy_con01.html"/>
  <page name="7" href="strandedwithaspy_fm03.html"/>
  <page name="8" href="strandedwithaspy_fm03.html#page8"/>
  <page name="9" href="strandedwithaspy_fm03.html#page9"/>
  <page name="10" href="strandedwithaspy_ch01.html"/>
  <page name="11" href="strandedwithaspy_ch01.html#page11"/>
  <page name="12" href="strandedwithaspy_ch01.html#page12"/>
  <!-- remaining pages omitted -->
</page-map>
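To make the structure concrete, here’s a minimal Python sketch that reads a page-map like the one above with the standard library’s ElementTree and looks up the location for a given printed page label. The function names are hypothetical (they aren’t part of any Adobe tool), and the sketch assumes the page-map uses the OPF namespace shown in the example. Labels aren’t guaranteed to be unique (page 5 appears twice above), so the lookup simply returns the first match.

```python
import xml.etree.ElementTree as ET

# Namespace used by the page-map example above (assumed to be constant).
OPF_NS = "http://www.idpf.org/2007/opf"

def read_page_map(xml_text):
    """Return (name, href) pairs from a page-map document, in document order.

    Entries with an empty name (e.g. a cover with no printed page number)
    are kept, so the list mirrors the file exactly.
    """
    root = ET.fromstring(xml_text)
    return [(page.get("name", ""), page.get("href"))
            for page in root.findall(f"{{{OPF_NS}}}page")]

def href_for_page(entries, name):
    """Return the first location labelled `name`, or None if it is absent."""
    for page_name, href in entries:
        if page_name == name:
            return href
    return None

SAMPLE = """<page-map xmlns="http://www.idpf.org/2007/opf">
  <page name="" href="strandedwithaspy_cov.html"/>
  <page name="5" href="strandedwithaspy_adc01.html"/>
  <page name="5" href="strandedwithaspy_ata01.html"/>
  <page name="8" href="strandedwithaspy_fm03.html#page8"/>
</page-map>"""

entries = read_page_map(SAMPLE)
print(href_for_page(entries, "8"))  # strandedwithaspy_fm03.html#page8
print(href_for_page(entries, "5"))  # strandedwithaspy_adc01.html (first of two)
print(href_for_page(entries, "4"))  # None
```

Keeping the entries as an ordered list rather than a dict preserves both the reading order and any duplicate labels, which a reading system would need in order to step forward and backward through pages.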
Pages in Action
What does page-map actually look like in Digital Editions?
Keep in mind that page-map is (intentionally) an extension to ePub, and adding a page-map file to your ePub will make it invalid. On top of that, unless the reading system is based on Adobe software (like Digital Editions and the Sony Reader), page-map will have no effect.
Pages in Digital Editions (without a page-map)
So why do you always see these pages in Digital Editions, regardless? An un/fortunate feature of Digital Editions is that it adds a page-map-like display of pages even when the ePub doesn’t include any page-map file. Here’s Adobe’s Best Practices Guide again, describing how it chunks any content into regular-sized pieces and then labels each chunk as a page:
When page map is not available in the document, Adobe Digital Editions will synthesize a page-map based on the document content. The approach used is the following:
Determine a compressed byte length of each resource which is referenced in the spine, subtracting any known encryption overhead (IV size)
Assume that there is a page for each 1024 bytes…
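The quoted steps can be approximated in a few lines. This is a rough Python sketch under stated assumptions, not Adobe’s implementation: the rest of Adobe’s procedure is elided above, so rounding partial chunks up, giving every spine item at least one page, and using zlib’s raw DEFLATE (the compression used inside ZIP containers such as ePub) at its default level are all guesses. The helper names and the `encryption_overhead` parameter are hypothetical.

```python
import math
import random
import zlib

BYTES_PER_PAGE = 1024  # "a page for each 1024 bytes", per Adobe's guide

def deflated_length(data: bytes) -> int:
    """Length of `data` as a raw DEFLATE stream (wbits=-15: no zlib header)."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return len(comp.compress(data) + comp.flush())

def synthesize_pages(spine, encryption_overhead=0):
    """Assign synthetic page ranges to spine items, in reading order.

    `spine` is a list of (href, raw_bytes). Returns (href, first_page,
    last_page) tuples. Rounding up and the one-page minimum are
    assumptions of this sketch, not documented Adobe behavior.
    """
    pages = []
    next_page = 1
    for href, data in spine:
        # Compressed size minus any known encryption overhead (e.g. IV size).
        size = max(deflated_length(data) - encryption_overhead, 0)
        count = max(1, math.ceil(size / BYTES_PER_PAGE))
        pages.append((href, next_page, next_page + count - 1))
        next_page += count
    return pages

rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4500))  # barely compressible
spine = [("ch01.html", b"a" * 100_000), ("ch02.html", noisy)]
for href, first, last in synthesize_pages(spine):
    print(href, first, last)
```

The example shows why synthesized pages can feel arbitrary: 100,000 bytes of highly repetitive text compresses to a single "page", while 4,500 bytes of incompressible data spans five.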
The NCX spec provides an alternative to the page-map extension: the pageList. This aptly named container for pagination information provides a mechanism for giving a label (navLabel) to a point in the ePub (the src of a content element), with each label and point wrapped together in a pageTarget.
Here’s an example from an Internet Archive ePub:
<pageList>
  <navLabel>
    <text>Pages</text>
  </navLabel>
  <pageTarget type="normal" id="pagetarget000006" value="6" playOrder="6">
    <navLabel>
      <text>6</text>
    </navLabel>
    <content src="part0000.html#page-6"/>
  </pageTarget>
  <pageTarget type="normal" id="pagetarget000007" value="7" playOrder="7">
    <navLabel>
      <text>7</text>
    </navLabel>
    <content src="part0000.html#page-7"/>
  </pageTarget>
  <!-- remaining pageTargets omitted -->
</pageList>
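Because page-map and pageList both boil down to "label plus location", one can be generated mechanically from the other. Here’s a minimal Python sketch that builds a pageList element from (label, href) pairs, such as those read out of a page-map file. The helper name is hypothetical; the id pattern copies the Internet Archive sample; non-numeric labels (roman-numeral front matter, say) get type="front" and no value attribute, which is my reading of the NCX spec rather than anything stated here. In a real NCX, playOrder also has to agree with the navMap, whereas this sketch simply numbers targets sequentially.

```python
import xml.etree.ElementTree as ET

def page_list_from_entries(entries, first_play_order=1):
    """Build an NCX <pageList> element from (label, href) pairs.

    Hypothetical helper: ids and playOrder values are numbered
    sequentially; a real NCX must keep playOrder consistent with its navMap.
    """
    page_list = ET.Element("pageList")
    heading = ET.SubElement(page_list, "navLabel")
    ET.SubElement(heading, "text").text = "Pages"
    play_order = first_play_order
    for label, href in entries:
        if not label:  # unnumbered locations (e.g. a cover) get no pageTarget
            continue
        target = ET.SubElement(page_list, "pageTarget", {
            "type": "normal" if label.isdigit() else "front",
            "id": f"pagetarget{play_order:06d}",
            "playOrder": str(play_order),
        })
        if label.isdigit():  # value is only set for numeric page labels
            target.set("value", label)
        nav_label = ET.SubElement(target, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(target, "content", {"src": href})
        play_order += 1
    return page_list

entries = [("", "cover.html"),
           ("6", "part0000.html#page-6"),
           ("7", "part0000.html#page-7")]
print(ET.tostring(page_list_from_entries(entries, first_play_order=6),
                  encoding="unicode"))
```

The output has the same shape as the Internet Archive sample; only the attribute order differs, which doesn’t matter in XML.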
The catch? While the OPF spec (part of ePub) says that Reading Systems must support NCX (and mentions pageList offhandedly), it’s not at all clear how much of NCX is supposed to be supported. This ambiguity has meant that no reading systems (to my knowledge) have implemented support for pageList, so its appeal is primarily aesthetic. Unlike page-map, though, adding a pageList won’t make your ePub invalid. The EPUB Standards Maintenance Working Group is trying to clarify the NCX issue.
1 What percentage of people actually have both media in front of them at the same time? Page numbers are unquestionably better when trying to tell someone on the phone about the hilarious double entendre on page 294, but I wonder how often this happens, too. Why not tell them to search for “would you like to check my figures?” instead? Finally, in STM & educational content, where this sort of thing might come up often, the headings are often numbered and serve as a better guidepost, because they don’t rely on the same trim and pagination across various international editions, and so on.