<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Threepress Consulting blog &#187; epub</title>
	<atom:link href="http://blog.threepress.org/category/epub/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.threepress.org</link>
	<description>Threepress creates software for publishers, educators and authors.</description>
	<lastBuildDate>Wed, 10 Mar 2010 20:40:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Three JavaScript ePub Readers</title>
		<link>http://blog.threepress.org/2010/02/06/three-javascript-epub-readers/</link>
		<comments>http://blog.threepress.org/2010/02/06/three-javascript-epub-readers/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 21:09:27 +0000</pubDate>
		<dc:creator>Keith Fahlgren</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>
		<category><![CDATA[epub zen garden]]></category>
		<category><![CDATA[epubjs]]></category>
		<category><![CDATA[html5]]></category>
		<category><![CDATA[ibisreader]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=1198</guid>
		<description><![CDATA[The last few weeks have seen a tremendous increase in interest about ePub. Many new blog posts have been written trying to explain the format. We’ve also seen a big jump in the number of publishers coming to Threepress for help with tricky ePub problems or just asking for guidance about the format. While I&#8217;d [...]]]></description>
			<content:encoded><![CDATA[<p>The last few weeks have seen a <a href="http://www.google.com/trends?q=epub">tremendous increase in interest</a> about <a href="http://www.openebook.org/">ePub</a>. Many new blog posts have been written trying to explain the format. We’ve also seen a big jump in the number of publishers coming to <a href="http://threepress.org">Threepress</a> for help with tricky ePub problems or just asking for guidance about the format. While I&#8217;d like to pretend that the growth is due, in part, to a long-anticipated awareness about the benefits of open standards among consumers, publishers, and suppliers, I think it&#8217;s more likely that it was Steve Jobs&#8217; explicit mention of ePub support in <a href="http://www.apple.com/ipad/features/">iBooks on the iPad</a> that drove most of the excitement. What makes <em>me</em> most excited about this groundswell is the sudden interest in ePub from a number of clever developers.</p>
<p>Just in the last few days, details emerged of two new JavaScript ePub readers, <a href="http://romeda.org/rePublish/">rePublish</a> from <a href="http://twitter.com/blaine/status/8733522914">Blaine Cook (@blaine)</a> and <a href="http://github.com/augustl/js-epub">JSEpub</a> (<a href="http://s3.amazonaws.com/augustl-stuff/epub-is-getting-there-95.png">screenshot</a>) from <a href="http://twitter.com/augustl/status/8700582603">August Lilleaas (@augustl)</a>. These two new readers join <a href="http://twitter.com/liza">@liza</a>’s <a href="http://blog.threepress.org/2009/02/09/introducing-epubjs/">epubjs</a>, which will be a year old on Tuesday. An improved version of epubjs powers the <a href="http://epubzengarden.com">ePub Zen Garden</a>, which helps “dispel the myth that digital books can&#8217;t also be crafted works of visual design.”</p>
<p>Why are JavaScript ePub readers interesting? They&#8217;re interesting to me for three reasons:</p>
<ul>
<li><a href="http://javascript.crockford.com/popular.html">JavaScript is the most popular programming language in the world</a> and it might be the best way to get more developers interested in creating and tweaking ePub readers.</li>
<li>JavaScript ePub readers challenge publishers, developers, and book readers to think about what’s most important in delivering a compelling reading experience in a browser. We’ve spent a lot of time thinking about these choices while developing <a href="http://ibisreader.com/about">Ibis Reader</a>, which will launch later this month, so I’m eager to see more opinions.</li>
<li>Building a pure-JavaScript ePub reader requires unzipping in JavaScript, which had no open source implementations until just recently. August has written about and open sourced his critical breakthrough for <a href="http://august.lilleaas.net/blog/unzipping-files-with-javascript">unzipping files in JavaScript</a>. <em>[Edit: Oops! I was wrong about this one. See the comments for more details.]</em></li>
</ul>
<p>Colin Hazlehurst has also published some impressive introductions, tutorials, and code for the .NET/C# crowd at his <a href="http://www.hazelhurst.net/InsideEpub/">InsideEpub</a> project and on his <a href="http://netkingcol.blogspot.com/2009/12/introduction-to-epub.html">blog</a>.</p>
<p>Do you know of other techies making waves with ePub? Please let us know!</p>
<p>(And if you&#8217;re one of those publishers who <em>is</em> looking for help, <a href="mailto:info@threepress.org">contact us</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2010/02/06/three-javascript-epub-readers/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Ibis Reader beta program opens</title>
		<link>http://blog.threepress.org/2010/01/25/ibis-reader-beta-program-opens/</link>
		<comments>http://blog.threepress.org/2010/01/25/ibis-reader-beta-program-opens/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 20:38:16 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>
		<category><![CDATA[html5]]></category>
		<category><![CDATA[ibisreader]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=1163</guid>
		<description><![CDATA[
We&#8217;re starting to share early betas of the Ibis Reader mobile UI for iPhones, Nexus Ones, and other Android devices with a limited group of testers. If you&#8217;re interested in joining the beta program and testing on other phones, tablets, and laptops, please email info@ibisreader.com.  You may be asked to sign a non-disclosure agreement.
More [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.threepress.org/2010/01/25/ibis-reader-beta-program-opens/logo-2/" rel="attachment wp-att-1165"><img src="http://blog.threepress.org/wp-content/uploads/2010/01/logo1.png" alt="logo" title="logo" width="280" height="81" class="alignright size-full wp-image-1165" /></a></p>
<p>We&#8217;re starting to share early betas of the <a href="http://ibisreader.com">Ibis Reader</a> mobile UI for iPhones, Nexus Ones, and other Android devices with a limited group of testers. If you&#8217;re interested in joining the beta program and testing on other phones, tablets, and laptops, please email <a href="mailto:info@ibisreader.com">info@ibisreader.com</a>.  You may be asked to sign a non-disclosure agreement.</p>
<p>More info on our upcoming ereader is available in our <a href="http://ibisreader.com/about/">announcement post</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2010/01/25/ibis-reader-beta-program-opens/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What I&#8217;d change about ePub</title>
		<link>http://blog.threepress.org/2009/11/28/what-i-would-change-about-epu/</link>
		<comments>http://blog.threepress.org/2009/11/28/what-i-would-change-about-epu/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 19:20:47 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=721</guid>
		<description><![CDATA[Obviously I&#8217;m a fan of the ePub format. It&#8217;s flexible enough to support advanced publications, but a simple text ebook can be put together with minimal effort.
But I don&#8217;t think it&#8217;s minimal enough.  If I could go back in time and be involved with ePub and its predecessors, here are the choices I&#8217;d make:
Make [...]]]></description>
			<content:encoded><![CDATA[<p>Obviously I&#8217;m a fan of the ePub format. It&#8217;s flexible enough to support advanced publications, but a simple text ebook can be put together with minimal effort.</p>
<p>But I don&#8217;t think it&#8217;s minimal enough.  If I could go back in time and be involved with ePub and its predecessors, here are the choices I&#8217;d make:</p>
<h3>Make the NCX optional</h3>
<p>Many books or book-like publications have no chapters. In this case ebook authors are forced to create useless one-item NCX files and invent fake chapter titles, like &#8220;Pages.&#8221;  Reading systems should be able to rely on the <code>opf:spine</code> alone to order OPS documents without defined, named chapters.</p>
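<p>To make the complaint concrete, here is a minimal sketch of the kind of one-item NCX that a chapterless book ends up carrying. The identifier, title, and file name are hypothetical placeholders, not taken from any real publication:</p>
<pre class="brush: xml;">
&lt;ncx xmlns=&quot;http://www.daisy.org/z3986/2005/ncx/&quot; version=&quot;2005-1&quot;&gt;
  &lt;head&gt;
    &lt;!-- hypothetical identifier; it should match the dc:identifier in the OPF --&gt;
    &lt;meta name=&quot;dtb:uid&quot; content=&quot;urn:uuid:00000000-0000-0000-0000-000000000000&quot;/&gt;
    &lt;meta name=&quot;dtb:depth&quot; content=&quot;1&quot;/&gt;
  &lt;/head&gt;
  &lt;docTitle&gt;
    &lt;text&gt;An Example Book&lt;/text&gt;
  &lt;/docTitle&gt;
  &lt;navMap&gt;
    &lt;!-- the single invented chapter that exists only to satisfy the NCX --&gt;
    &lt;navPoint id=&quot;navpoint-1&quot; playOrder=&quot;1&quot;&gt;
      &lt;navLabel&gt;
        &lt;text&gt;Pages&lt;/text&gt;
      &lt;/navLabel&gt;
      &lt;content src=&quot;pages.html&quot;/&gt;
    &lt;/navPoint&gt;
  &lt;/navMap&gt;
&lt;/ncx&gt;
</pre>
<p>In a file like this, the title repeats what the OPF metadata already says and the lone <code>navPoint</code> repeats what the spine already says.</p>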
<h3>Simplify the NCX</h3>
<p>Good progress is being made in the EPUB Working Group towards <a href="http://www.daisy.org/epub/issues/clarify-ncx-reading-system-requirements">clarifying</a> and simplifying the NCX requirements. (Making <a href="http://www.daisy.org/epub/issues/playorder-attribute-epub"><code>playOrder</code> optional</a> is an especially useful step.)</p>
<p>But because the NCX is derived from the DAISY specification, there&#8217;s still some useless overlap, like the duplicated title. NCX is great for complex hierarchies, but I wish its features were simply a part of the OPF file, leaving only one file with publication-specific metadata.</p>
<h3>Drop <code>container.xml</code> and replace with a required name and location for the OPF</h3>
<p>I don&#8217;t personally understand the choice that was made here: there&#8217;s a file with a required location and name (<code>META-INF/container.xml</code>) whose sole purpose is to point to another file which may be named arbitrarily.  Perhaps this is a historical artifact, but why not simply require there to be a <code>content.opf</code> file at the top level of the publication and be done with it?</p>
<p>(When combined with my first change, this would result in ePub requiring half as many files, which to me is a good thing.)</p>
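<p>For anyone who hasn&#8217;t opened one up, the indirection looks roughly like this. The container element and namespace are fixed by the spec; the <code>OEBPS/content.opf</code> path is only a common convention that I am assuming for illustration:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;container version=&quot;1.0&quot; xmlns=&quot;urn:oasis:names:tc:opendocument:xmlns:container&quot;&gt;
  &lt;rootfiles&gt;
    &lt;!-- full-path may point anywhere in the archive; this location is an assumption --&gt;
    &lt;rootfile full-path=&quot;OEBPS/content.opf&quot;
              media-type=&quot;application/oebps-package+xml&quot;/&gt;
  &lt;/rootfiles&gt;
&lt;/container&gt;
</pre>
<p>A reading system has to open and parse this file before it can even find the OPF.</p>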
<h3>Support any valid form of XHTML</h3>
<p>XHTML 1.1 was obviously a mistake, as it&#8217;s used (as far as I know) nowhere else, and is a dead-end as far as web technologies go.  Few automated HTML tools generate it, and the changes from XHTML 1.0 are simply annoying rather than useful. </p>
<p>I&#8217;d prefer that ePub support XHTML 1.0, which is simply HTML 4.01 with an XML vocabulary.  HTML 4 is the dominant form of HTML on the web (and will remain so for some time) and common automated tools like <a href="http://tidy.sourceforge.net/">Tidy</a> can clean up &#8220;street&#8221; HTML 4 into XHTML 1.0 quite well.  Tidy won&#8217;t, however, produce XHTML 1.1.</p>
<p>I also don&#8217;t want to put an upper bound on the XHTML supported: XHTML 5 should also be okay, and the rules for a reading system which doesn&#8217;t support later tags should be the same &#8220;ignore and move on&#8221; that has worked well on the web.</p>
<h3>Minimize or eliminate any ePub-specific styles and markup</h3>
<p>I dislike the existence of special style properties like the <a href="http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html#Section3.0">oeb-* styles</a> where equivalent CSS3 properties exist. I know CSS3 is a mess, but I&#8217;d rather use the same vocabulary as will eventually be found on the web. (<a href="http://www.daisy.org/epub/issues/oeb-page-head-oeb-page-foot">This was discussed but isn&#8217;t happening</a>.)</p>
<h3>Support MathML as a first-class document type</h3>
<p><a href="http://www.daisy.org/epub/issues/mathml-should-be-allowed-as-first-class-markup-ops">This will happen eventually</a>, but who knows the timeline.  At least the fallback system means that MathML documents are currently allowable; it&#8217;s just more of a hassle and many people don&#8217;t realize it&#8217;s possible.</p>
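<p>As I understand the OPS 2.0 inline fallback mechanism, a MathML island with a plain XHTML fallback could be sketched as below. The formula and the fallback markup are made up for illustration, and this is only one of the fallback routes the spec offers, so check it against the spec before relying on it:</p>
<pre class="brush: xml;">
&lt;ops:switch xmlns:ops=&quot;http://www.idpf.org/2007/ops&quot;&gt;
  &lt;!-- used by reading systems that understand the MathML namespace --&gt;
  &lt;ops:case required-namespace=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;
    &lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;
      &lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;
    &lt;/math&gt;
  &lt;/ops:case&gt;
  &lt;!-- used by everything else --&gt;
  &lt;ops:default&gt;
    &lt;p&gt;x&lt;sup&gt;2&lt;/sup&gt;&lt;/p&gt;
  &lt;/ops:default&gt;
&lt;/ops:switch&gt;
</pre>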
<p>How would you change the spec if you could magically make it so?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/28/what-i-would-change-about-epu/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Practical ePub metadata: Authorship</title>
		<link>http://blog.threepress.org/2009/11/27/practical-epub-metadata-authorship/</link>
		<comments>http://blog.threepress.org/2009/11/27/practical-epub-metadata-authorship/#comments</comments>
		<pubDate>Fri, 27 Nov 2009 14:31:49 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=1001</guid>
		<description><![CDATA[The ePub format allows for a fairly comprehensive set of book-friendly metadata, mostly drawn from the Dublin Core set of terms.  Knowing what metadata to use and how it will appear in today&#8217;s and tomorrow&#8217;s readers is key; here are some recommendations:
Authorship
Two elements describe authorship of a work: dc:creator and dc:contributor. As you might [...]]]></description>
			<content:encoded><![CDATA[<p>The ePub format allows for a fairly comprehensive set of book-friendly metadata, mostly drawn from the <a href="http://dublincore.org/documents/dcmi-terms/">Dublin Core set of terms</a>.  Knowing what metadata to use and how it will appear in today&#8217;s and tomorrow&#8217;s readers is key; here are some recommendations:</p>
<h3>Authorship</h3>
<p>Two elements describe authorship of a work: <code>dc:creator</code> and <code>dc:contributor</code>. As you might guess, <code>dc:creator</code> designates the primary authorship, though there can be more than one.</p>
<p><em>Each author should be in their own element.</em>  Don&#8217;t do this:</p>
<pre class="brush: xml;">
&lt;dc:creator&gt;James F. Smith, Thomas Jones&lt;/dc:creator&gt;
</pre>
<p>Instead use:</p>
<pre class="brush: xml;">
&lt;dc:creator&gt;James F. Smith&lt;/dc:creator&gt;
&lt;dc:creator&gt;Thomas Jones&lt;/dc:creator&gt;
</pre>
<p>We recommend that anyone who had any kind of significant authorship role be marked as a <code>dc:creator</code>.  Most reading systems use this element to discover or search for the author name, so ensure that individual authors in a multi-author work are so tagged.</p>
<p>A contributor designation is appropriate to describe individuals who had a non-primary role in the work.  For example, if a multi-author work had multiple editors, or you want to capture additional staff like proofreaders, illustrators or the like, then use <code>dc:contributor</code>.</p>
<pre class="brush: xml;">
&lt;dc:contributor&gt;Alice Proofreader&lt;/dc:contributor&gt;
</pre>
<h4>File maker</h4>
<p>ePub allows ebook creators to indicate how the values in these elements should be sorted. Name sorting, especially internationally, is a surprisingly difficult problem.  </p>
<p>We recommend listing the author names in display order (usually <em>First Middle Last</em>) in the element text, and then setting <code>opf:file-as</code> to <em>Last, First Middle</em>, or whatever is appropriate in your source language.</p>
<pre class="brush: xml;">
&lt;dc:creator opf:file-as=&quot;Smith, James F.&quot;&gt;James F. Smith&lt;/dc:creator&gt;
&lt;dc:creator opf:file-as=&quot;Jones, Thomas&quot;&gt;Thomas Jones&lt;/dc:creator&gt;
</pre>
<h4>Role playing</h4>
<p>ePub extends Dublin Core by providing the <code>opf:role</code> attribute.  This allows ebook creators to more precisely describe the contributions made by individuals listed in <code>dc:creator</code> and <code>dc:contributor</code>.  It&#8217;s always a good idea to fill this out, even if not all reading systems display it:</p>
<pre class="brush: xml;">
&lt;dc:creator opf:role=&quot;edt&quot; opf:file-as=&quot;Smith, James F.&quot;&gt;James F. Smith&lt;/dc:creator&gt;
&lt;dc:creator opf:role=&quot;edt&quot; opf:file-as=&quot;Jones, Thomas&quot;&gt;Thomas Jones&lt;/dc:creator&gt;
&lt;dc:creator opf:role=&quot;aut&quot; opf:file-as=&quot;MacNeil, Lisa&quot;&gt;Lisa MacNeil&lt;/dc:creator&gt;
</pre>
<p>The recommended list of values for <code>opf:role</code> comes from the <a href="http://www.loc.gov/marc/relators/">MARC code list</a>.</p>
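<p>The snippets above assume the <code>dc</code> and <code>opf</code> prefixes are already declared. A minimal sketch of how the pieces might sit together inside the OPF <code>metadata</code> element follows; the names are the same invented ones used above, and <code>pfr</code> is the MARC relator code for a proofreader:</p>
<pre class="brush: xml;">
&lt;metadata xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;
          xmlns:opf=&quot;http://www.idpf.org/2007/opf&quot;&gt;
  &lt;!-- primary author: display order in the text, sort order in opf:file-as --&gt;
  &lt;dc:creator opf:role=&quot;aut&quot; opf:file-as=&quot;MacNeil, Lisa&quot;&gt;Lisa MacNeil&lt;/dc:creator&gt;
  &lt;!-- non-primary role, so dc:contributor rather than dc:creator --&gt;
  &lt;dc:contributor opf:role=&quot;pfr&quot; opf:file-as=&quot;Proofreader, Alice&quot;&gt;Alice Proofreader&lt;/dc:contributor&gt;
&lt;/metadata&gt;
</pre>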
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/27/practical-epub-metadata-authorship/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>&#8220;Pages&#8221; in ePub: Adobe&#8217;s page-map versus NCX pageList</title>
		<link>http://blog.threepress.org/2009/11/26/adobe-page-map-versus-ncx-pagelist/</link>
		<comments>http://blog.threepress.org/2009/11/26/adobe-page-map-versus-ncx-pagelist/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 14:47:45 +0000</pubDate>
		<dc:creator>Keith Fahlgren</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=790</guid>
		<description><![CDATA[The vast majority of ebooks today have print cousins, despite some recent digital-only publishing news. As a consequence, many people creating ePubs want to know how to tie references to the printed pages back into the ebook. My personal opinion is that this sort of print-centrism is unnecessary for the vast majority of titles1, but [...]]]></description>
			<content:encoded><![CDATA[<p>The vast majority of ebooks today have print cousins, despite <a href="http://carinapress.com/faq/">some recent digital-only publishing news</a>. As a consequence, many people creating ePubs want to know how to tie references to the printed pages back into the ebook. My personal opinion is that this sort of print-centrism is unnecessary for the vast majority of titles<sup><a href="#fn1text" id="fn1anchor">1</a></sup>, but there are times when mapping the ebook to the printed book pages does make a lot of sense. Unfortunately, there&#8217;s no perfect solution at this time, but there are two options.</p>
<h3>page-map</h3>
<p>Adobe <a href="http://www.mobileread.com/forums/showpost.php?p=658073&#038;postcount=16">was motivated</a> to provide a solution to this problem in Digital Editions before other reading systems, and they came up with a mechanism called <code>page-map</code>. The <a href="http://www.adobe.com/devnet/digitalpublishing/epubs/EPUBBestPractices-1_0_3.epub">Adobe EPUB Best Practices Guide</a> describes the issue:</p>
<blockquote><p>There is no inherent linear navigation indicator which could be used for the same purpose that page number is used in the printed document world.</p>
<p>There is no way for an eBook to incorporate page number information for the printed edition of the same book.</p></blockquote>
<p>Adobe developed an extension called <code>page-map</code>, documented in the same <a href="http://www.adobe.com/devnet/digitalpublishing/epubs/EPUBBestPractices-1_0_3.epub">Best Practices Guide</a>, that provides a solution. To implement <code>page-map</code>, the creator includes a special <code>page-map</code> file in the ePub and references that file in the OPF metadata. This file gives a (page #) <code>name</code> to each pointer to a specific location within the content.</p>
<p>Here&#8217;s an example (note that many of the content files in the preface don&#8217;t span more than one page):</p>
<pre class="brush: xml;">
&lt;page-map xmlns=&quot;http://www.idpf.org/2007/opf&quot;&gt;
  &lt;page name=&quot;&quot;  href=&quot;strandedwithaspy_cov.html&quot;/&gt;
  &lt;page name=&quot;&quot; href=&quot;strandedwithaspy_intro.html&quot;/&gt;
  &lt;page name=&quot;1&quot; href=&quot;strandedwithaspy_fm01.html&quot;/&gt;
  &lt;page name=&quot;2&quot; href=&quot;strandedwithaspy_fm02.html&quot;/&gt;
  &lt;page name=&quot;3&quot; href=&quot;strandedwithaspy_tp01.html&quot;/&gt;
  &lt;page name=&quot;5&quot; href=&quot;strandedwithaspy_adc01.html&quot;/&gt;
  &lt;page name=&quot;5&quot; href=&quot;strandedwithaspy_ata01.html&quot;/&gt;
  &lt;page name=&quot;6&quot; href=&quot;strandedwithaspy_ded01.html&quot;/&gt;
  &lt;page name=&quot;&quot; href=&quot;strandedwithaspy_con01.html&quot;/&gt;
  &lt;page name=&quot;7&quot; href=&quot;strandedwithaspy_fm03.html&quot;/&gt;
  &lt;page name=&quot;8&quot; href=&quot;strandedwithaspy_fm03.html#page8&quot;/&gt;
  &lt;page name=&quot;9&quot; href=&quot;strandedwithaspy_fm03.html#page9&quot;/&gt;
  &lt;page name=&quot;10&quot; href=&quot;strandedwithaspy_ch01.html&quot;/&gt;
  &lt;page name=&quot;11&quot; href=&quot;strandedwithaspy_ch01.html#page11&quot;/&gt;
  &lt;page name=&quot;12&quot; href=&quot;strandedwithaspy_ch01.html#page12&quot;/&gt;
  &lt;!-- remaining pages omitted --&gt;
&lt;/page-map&gt;
</pre>
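<p>The OPF side of the wiring is not shown above. As best I recall from Adobe&#8217;s guide, the file is listed in the manifest and then pointed to from the spine, roughly as sketched here. The <code>id</code> values and the <code>page-map.xml</code> and <code>toc.ncx</code> file names are hypothetical, and the <code>page-map</code> attribute and media type should be checked against the guide itself rather than taken from me:</p>
<pre class="brush: xml;">
&lt;manifest&gt;
  &lt;item id=&quot;ncx&quot; href=&quot;toc.ncx&quot; media-type=&quot;application/x-dtbncx+xml&quot;/&gt;
  &lt;!-- the Adobe extension file; this is the part that breaks validation --&gt;
  &lt;item id=&quot;map&quot; href=&quot;page-map.xml&quot; media-type=&quot;application/oebps-page-map+xml&quot;/&gt;
  &lt;item id=&quot;cover&quot; href=&quot;strandedwithaspy_cov.html&quot; media-type=&quot;application/xhtml+xml&quot;/&gt;
  &lt;!-- remaining content documents omitted --&gt;
&lt;/manifest&gt;
&lt;!-- page-map refers to the manifest id of the page-map file --&gt;
&lt;spine toc=&quot;ncx&quot; page-map=&quot;map&quot;&gt;
  &lt;itemref idref=&quot;cover&quot;/&gt;
  &lt;!-- remaining itemrefs omitted --&gt;
&lt;/spine&gt;
</pre>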
<h3>Pages in Action</h3>
<p>How does <code>page-map</code> actually look in Digital Editions?</p>
<a href="http://blog.threepress.org/wp-content/uploads/2009/11/ade_page_map.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/ade_page_map.png" alt="An annotated screenshot of a real-world use of the Adobe page-map extension" title="Digital Edition&#039;s page-map support" width="578" height="542" class="size-full wp-image-1034" /></a>
<p>The catch? <code>page-map</code> is (intentionally) an extension to ePub and adding a <code>page-map</code> file to your ePub will make it invalid. On top of that, unless the reading system is based on Adobe software (like Digital Editions and the Sony Reader), <code>page-map</code> will have no effect.</p>
<h3>Pages in Digital Editions (without a page-map)</h3>
<p>So, why do you always see these pages in Digital Editions, regardless? An un/fortunate <q>feature</q> of Digital Editions is the addition of the <code>page-map</code>-like display of <q>pages</q>, even if the ePub doesn’t include any <code>page-map</code> file. Here&#8217;s Adobe’s Best Practices again, describing how it chunks any content into a regular size, then labels each chunk a <q>page</q>:</p>
<blockquote><p>When page map is not available in the document, Adobe Digital Editions will synthesize a page-map based on the document content. The approach used is the following:</p>
<p>Determine a compressed byte length of each resource which is referenced in the spine, subtracting any known encryption overhead (IV size)</p>
<p>Assume that there is a page for each 1024 bytes&#8230;</p></blockquote>
<p>Some people <a href="http://code.google.com/p/threepress/issues/detail?id=175&#038;sort=-id">want to see more of this</a>, while others wish it could be <a href="http://www.mobileread.com/forums/showpost.php?p=658764&#038;postcount=18">turned off</a>.</p>
<h3>pageList</h3>
<p>The <a href="http://www.niso.org/workrooms/daisy/Z39-86-2005.html">NCX</a> spec provides an alternative to the <code>page-map</code> extension, the <code>pageList</code>. This aptly named <q>container for pagination information</q> provides a mechanism for giving a label (<code>navLabel</code>) to a point in the ePub (the <code>pageTarget</code>).</p>
<p>Here’s an example from an <a href="http://www.archive.org/download/songssourdough00servuoft/songssourdough00servuoft.epub">Internet Archive ePub</a>:</p>
<pre class="brush: xml;">
  &lt;pageList&gt;
    &lt;navLabel&gt;
      &lt;text&gt;Pages&lt;/text&gt;
    &lt;/navLabel&gt;
    &lt;pageTarget type=&quot;normal&quot; id=&quot;pagetarget000006&quot; value=&quot;6&quot; playOrder=&quot;6&quot;&gt;
      &lt;navLabel&gt;
        &lt;text&gt;6&lt;/text&gt;
      &lt;/navLabel&gt;
      &lt;content src=&quot;part0000.html#page-6&quot;/&gt;
    &lt;/pageTarget&gt;
    &lt;pageTarget type=&quot;normal&quot; id=&quot;pagetarget000007&quot; value=&quot;7&quot; playOrder=&quot;7&quot;&gt;
      &lt;navLabel&gt;
        &lt;text&gt;7&lt;/text&gt;
      &lt;/navLabel&gt;
      &lt;content src=&quot;part0000.html#page-7&quot;/&gt;
    &lt;/pageTarget&gt;
    &lt;!-- remaining pageTargets omitted --&gt;
  &lt;/pageList&gt;
</pre>
<p>The catch? While the <a href="http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#Section2.4.1.2">OPF spec</a> (part of ePub) says <q>reading Systems must support NCX</q> (and mentions <code>pageList</code> offhandedly), it’s not at all clear how much of NCX is supposed to be supported. This ambiguity has meant that no reading systems (to my knowledge) have implemented support for <code>pageList</code>, so its appeal is <a href="http://groups.yahoo.com/group/epub-community/message/160">primarily aesthetic</a>. Unlike a <code>page-map</code>, a <code>pageList</code> won’t make your ePub invalid. The <a href="http://www.daisy.org/epub/issues/clarify-ncx-reading-system-requirements">EPUB Standards Maintenance Working Group</a> is trying to clarify the NCX issue.</p>
<hr />
<p><sup><a id="fn1text" href="#fn1anchor">1</a></sup> What percentage of people actually have both media in front of them at the same time? It’s unquestionably better when trying to tell someone on the phone about the <q>hilarious double-entendre on page 294,</q> but I wonder how often this happens as well. Why not tell them to search for <q>would you like to check my figures?</q> Finally, in STM &#038; educational content, where this sort of thing might come up often, the headings are often numbered and serve as a better guidepost (because they don’t rely on the same trim/pagination for various international editions, etc.).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/26/adobe-page-map-versus-ncx-pagelist/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Choosing InDesign ePub output options</title>
		<link>http://blog.threepress.org/2009/11/23/choosing-indesign-epub-output-options/</link>
		<comments>http://blog.threepress.org/2009/11/23/choosing-indesign-epub-output-options/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 13:51:39 +0000</pubDate>
		<dc:creator>Keith Fahlgren</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=963</guid>
		<description><![CDATA[InDesign CS4 is one of the most popular tools for creating ePubs, but the range of options it provides when exporting can confound many users. While I’m not a wizened InDesign expert, I have accumulated a set of choices for the various options that differ from the defaults and can help form the basis of [...]]]></description>
			<content:encoded><![CDATA[<p>InDesign CS4 is one of the most popular tools for creating ePubs, but the range of options it provides when exporting can confound many users. While I’m not a wizened InDesign expert, I have accumulated a set of choices for the various options that differ from the defaults and can help form the basis of high-quality ePub output from InDesign. As you become more comfortable working with InDesign’s particular quirks, you’ll probably want to adapt these instructions to your particular style of manuscript setup.</p>
<h3>0. Use InDesign CS4</h3>
<p>It’s critically important to use InDesign CS4 rather than an earlier version. CS4 included a number of updates and improvements to ePub output, so buy/upgrade at least one seat before starting to work on ePubs from InDesign rather than wasting your time on CS3 bugs that have been fixed. On top of that, make sure you’ve gotten all of the latest CS4 updates from Adobe (which have resolved some other ePub output bugs).</p>
<p>From your CS4 <code>.indb</code> file, select <q>Export Book for Digital Editions&#8230;</q>, an export filename, and start working through the three-paneled set of output options:</p>
<h3>1. General</h3>
<a href="http://blog.threepress.org/wp-content/uploads/2009/11/indesign_cs4_general_epub_output_option_panel.png"><img class="size-full wp-image-965 " title="indesign_cs4_general_epub_output_option_panel" src="http://blog.threepress.org/wp-content/uploads/2009/11/indesign_cs4_general_epub_output_option_panel.png" alt="General options panel in InDesign CS4 ePub output" width="437" height="330" /></a>
<p>The first section of General options has to do with the metadata that InDesign will include in the OPF. Because too many ePubs have <a href="http://blog.threepress.org/2009/11/18/whats-in-an-identifier/">too little metadata</a>, it&#8217;s worth always checking that option &amp; filling out some basic metadata. Adding the name of the publisher (mapped to, surprise, <code>&lt;dc:publisher&gt;</code>) is an obvious choice. In addition, it’s worth filling out some of the fields in File→File Info&#8230; on the book’s Style Source <code>.indd</code> file to provide InDesign with a bit more metadata to work from.</p>
<p>The next set of choices around CSS is more flexible. <q>Defined Styles</q> should be your default choice, especially for internal testing. It (sometimes) provides a reasonable set of CSS without being bloated with overly-specific styles. While it sounds tempting to let InDesign obsessively copy every element of your design from print to CSS, it often degrades the experience on ePub readers other than Adobe Digital Editions and on tiny screens.</p>
<p>If you&#8217;re working on a series of titles with similar styles, you may find it worthwhile to write your own master CSS file that modifies or overrides what InDesign suggests and then manually update the CSS in the ePub with your master. In that case, either start with the CSS generated by <q>Defined Styles</q> &amp; pick and choose what to override or choose <q>Style Names Only</q>, which will give you a CSS skeleton (all the <em>selectors</em> but no <em>declarations</em>).</p>
<p><q>Map to &#8230; Lists</q> under <q>Bullets and Numbers</q> will get you the widest interoperability in most cases. If you have complex numbered lists, it can be hard to get list numbering to continue after a break (in XHTML 1.1 in general), but try to avoid the <q>Convert to Text</q> options unless you’re really in a bind. <q>Map to Static Ordered Lists</q> isn&#8217;t a good choice because it produces invalid ePub.</p>
<p>I don’t recommend Adobe’s font embedding option because it (intentionally) creates invalid ePubs. Liza offered some advice on <a href="http://blog.threepress.org/2009/09/16/how-to-embed-fonts-in-epub-files/">how</a> and <a href="http://blog.threepress.org/2009/09/16/when-to-embed-fonts-in-epub-files/">when</a> to include embedded fonts.</p>
<p><q>View eBook after Exporting</q> just opens Adobe Digital Editions with the exported ePub file.</p>
<h3>2. Images</h3>
<a href="http://blog.threepress.org/wp-content/uploads/2009/11/indesign_cs4_images_epub_output_option_panel.png"><img class="size-full wp-image-966 " title="indesign_cs4_images_epub_output_option_panel" src="http://blog.threepress.org/wp-content/uploads/2009/11/indesign_cs4_images_epub_output_option_panel.png" alt="Images output options" width="436" height="329" /></a>
<p>The set of Images options probably deserves more testing because manuscript setups are so diverse. If you&#8217;ve set up your InDesign files with Links to web-ready images scaled to a reasonable size, choose <q>Copy Images: Original</q>. Because that usually isn&#8217;t the case, you&#8217;re probably better off choosing <q>Optimized</q> and experimenting with the <q>Image Quality</q> JPEG export options. I always choose <q>JPEG</q> over <q>GIF</q> or <q>Automatic</q> as the <q>Image Conversion</q><sup><a href="#fn1text" id="fn1anchor">1</a></sup>. Choose the highest <q>Image Quality</q> you can without making your <a href="http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/">ePub unreasonably large in file size</a>.</p>
<h3>3. Contents</h3>
<a href="http://blog.threepress.org/wp-content/uploads/2009/11/indesign_cs4_contents_epub_output_option_panel.png"><img class="size-full wp-image-964 " title="indesign_cs4_contents_epub_output_option_panel" src="http://blog.threepress.org/wp-content/uploads/2009/11/indesign_cs4_contents_epub_output_option_panel.png" alt="Contents output options" width="437" height="328" /></a>
<p>The final set of options controls the <a href="http://blog.threepress.org/2009/11/19/three-useful-xml-schemas-in-publishing/">schema</a> of the OPS documents that InDesign will output (which will be either XHTML or DTBook). The controls also provide some options for the declarative table of contents (NCX). While there are times when DTBook output is encouraged or required, XHTML-based ePubs are much more common and will work well in more ePub readers.</p>
<p>The choices surrounding the <q>Table of Contents</q> depend on the way you&#8217;ve set up your InDesign files, but I&#8217;ve had the best luck with <q>Include InDesign TOC entries</q>, which tries to map the heading hierarchy from your files into the ePub, and <q>Suppress Automatic Entries for Documents</q>. As ePub readers evolve, I’m finding they include increasingly attractive displays of rich NCX data, so it’s worth the time to test out this option.</p>
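<p>For a sense of what that heading hierarchy turns into, here is a rough sketch of a nested NCX <code>navMap</code>. It is hand-written for illustration rather than copied from InDesign output, and the <code>id</code> values, labels, and file names are placeholders:</p>
<pre class="brush: xml;">
&lt;navMap&gt;
  &lt;navPoint id=&quot;chapter-1&quot; playOrder=&quot;1&quot;&gt;
    &lt;navLabel&gt;&lt;text&gt;Chapter 1&lt;/text&gt;&lt;/navLabel&gt;
    &lt;content src=&quot;chapter1.html&quot;/&gt;
    &lt;!-- A nested navPoint represents a subheading inside the chapter --&gt;
    &lt;navPoint id=&quot;chapter-1-section-1&quot; playOrder=&quot;2&quot;&gt;
      &lt;navLabel&gt;&lt;text&gt;A Section Heading&lt;/text&gt;&lt;/navLabel&gt;
      &lt;content src=&quot;chapter1.html#section-1&quot;/&gt;
    &lt;/navPoint&gt;
  &lt;/navPoint&gt;
&lt;/navMap&gt;
</pre>
<p>Reading systems that display rich NCX data can present nested entries like these as an indented or collapsible table of contents.</p>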
<hr />
<p id="fn1text"><sup><a href="#fn1anchor">1</a></sup> Although I always choose <q>JPEG</q> instead of <q>Automatic</q>, I was never able to make InDesign output a GIF under the <q>Automatic</q> option, so it may be a non-issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/23/choosing-indesign-epub-output-options/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Vertical text in ePub/CSS: not there yet</title>
		<link>http://blog.threepress.org/2009/11/22/vertical-text-in-epubcss-not-there-yet/</link>
		<comments>http://blog.threepress.org/2009/11/22/vertical-text-in-epubcss-not-there-yet/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 18:45:01 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[bookworm]]></category>
		<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=1003</guid>
		<description><![CDATA[Languages aren&#8217;t just written right-to-left or left-to-right, of course. They can also be written top-to-bottom, as in Chinese. How can you indicate that a block of text should be rendered vertically rather than horizontally?
In ePub, you can&#8217;t.
I was surprised to discover that the subset of CSS supported by ePub only includes the rtl or ltr [...]]]></description>
			<content:encoded><![CDATA[<p>Languages aren&#8217;t just <a href="http://blog.threepress.org/2009/11/21/bidirectional-text-in-epub/">written right-to-left or left-to-right</a>, of course. They can also be written top-to-bottom, as in Chinese. How can you indicate that a block of text should be rendered vertically rather than horizontally?</p>
<p>In ePub, you can&#8217;t.</p>
<p>I was surprised to discover that the subset of CSS supported by ePub only includes the <code>rtl</code> or <code>ltr</code> values in the <a href="http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html#Section3.3">CSS 2 direction property</a>.  Rendering vertical text is covered in the <a href="http://dev.w3.org/csswg/css3-text-layout/">CSS3 Text Layout module</a>, which is still in the editor&#8217;s draft stage.  But it does have some unusual browser support already.</p>
<p>There are multiple methods to specify directionality, but the simplest is the shorthand <a href="http://dev.w3.org/csswg/css3-text-layout/#writing-mode"><code>writing-mode</code> property</a>:</p>
<pre class="brush: css;">
writing-mode: tb-rl; /* Chinese */
writing-mode: lr-tb; /* English */
</pre>
<p>Curiously, <code>writing-mode</code> is only supported by one browser: Internet Explorer.  I doubt any other ePub rendering systems besides Bookworm viewed with IE will actually display such content properly (nor are they required to, as <code>writing-mode</code> isn&#8217;t in the specification).</p>
<p>Here&#8217;s a modified Project Gutenberg book with the English boilerplate and Chinese content rendered with different <code>writing-mode</code> values, as shown above:</p>
<p><a href="http://blog.threepress.org/wp-content/uploads/2009/11/chinese.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/chinese-300x269.png" alt="chinese" title="chinese" width="300" height="269" class="aligncenter size-medium wp-image-1005" /></a></p>
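<p>A rough sketch of markup that would produce a result like this, with the two <code>writing-mode</code> values attached to classes. The class names <code>boilerplate</code> and <code>chinese</code> are made up for illustration and are not taken from the actual test file:</p>
<pre class="brush: xml;">
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
  &lt;head&gt;
    &lt;title&gt;Vertical text test&lt;/title&gt;
    &lt;style type=&quot;text/css&quot;&gt;
      .boilerplate { writing-mode: lr-tb; } /* English: horizontal lines, top to bottom */
      .chinese     { writing-mode: tb-rl; } /* Chinese: vertical lines, right to left */
    &lt;/style&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;div class=&quot;boilerplate&quot; xml:lang=&quot;en&quot;&gt;&lt;p&gt;The Project Gutenberg EBook of ...&lt;/p&gt;&lt;/div&gt;
    &lt;div class=&quot;chinese&quot; xml:lang=&quot;zh&quot;&gt;&lt;p&gt;...&lt;/p&gt;&lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>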
<p>I&#8217;d like to see future revisions of ePub specifically support <code>writing-mode</code>, even if the CSS Text Layout module isn&#8217;t fully ratified at that time. (I may add an issue to the <a href="http://www.daisy.org/epub/">EPUB Maintenance</a> list if that seems warranted.)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/22/vertical-text-in-epubcss-not-there-yet/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Bidirectional text in ePub</title>
		<link>http://blog.threepress.org/2009/11/21/bidirectional-text-in-epub/</link>
		<comments>http://blog.threepress.org/2009/11/21/bidirectional-text-in-epub/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 21:31:36 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=948</guid>
		<description><![CDATA[Languages such as Arabic and Hebrew are written right-to-left (RTL) rather than left-to-right (LTR), as in European languages. When dealing with only one of those scripts at a time, computers generally handle the directionality well by just falling back on the user&#8217;s general language setting. But what if you have to render text in multiple [...]]]></description>
			<content:encoded><![CDATA[<p>Languages such as Arabic and Hebrew are written right-to-left (RTL) rather than left-to-right (LTR), as in European languages. When dealing with only one of those scripts at a time, computers generally handle the directionality well by just falling back on the user&#8217;s general language setting. But what if you have to render text in multiple directions?</p>
<p>This <a href="http://www.gutenberg.org/etext/18291">Project Gutenberg edition of Hunger</a> by Knut Hamsun has text in Hebrew.  It also has text in English. Ideally we&#8217;d like it to run in the correct direction for each language, but Bookworm shows it all running LTR (note the ragged-right margin):</p>
<p><a href="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-51.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-51-300x106.png" alt="Picture 51" title="Picture 51" width="300" height="106" class="aligncenter size-medium wp-image-949" style="border: 1px solid gray"/></a></p>
<p>Maybe it&#8217;s a Bookworm bug; if I open the XHTML file directly in my browser, I get the Hebrew text RTL, but the English text is aligned right, which looks weird:</p>
<p><a href="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-52.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-52-300x105.png" alt="Picture 52" title="Picture 52" width="300" height="105" class="aligncenter size-medium wp-image-951" style="border: 1px solid gray" /></a></p>
<p>Adobe Digital Editions doesn&#8217;t have a Hebrew font, and the directionality is also wrong:</p>
<p><a href="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-50.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-50-300x147.png" alt="Picture 50" title="Picture 50" width="300" height="147" class="aligncenter size-medium wp-image-952" style="border: 1px solid gray" /></a></p>
<p>(This is a case where <a href="http://blog.threepress.org/2009/09/16/when-to-embed-fonts-in-epub-files/">font embedding</a> is appropriate.)</p>
<p>This particular ebook is coded with the directionality marked using the <code>dir</code> attribute in the <code>&lt;body&gt;</code>:</p>
<pre class="brush: xml;">
&lt;body dir=&quot;rtl&quot; xml:lang=&quot;he&quot;&gt;
</pre>
<p>But that&#8217;s not really accurate, since the first block of content is in English (it&#8217;s the Project Gutenberg boilerplate text).  We need to mark up the ebook to properly describe the directionality of all parts of the text.</p>
<p>CSS provides the <a href="http://www.w3.org/TR/CSS2/visuren.html#propdef-direction">direction</a> property for this. It&#8217;s pretty simple; you can have:</p>
<pre class="brush: css;">
direction: ltr; /* Left to right */
direction: rtl; /* Right to left */
</pre>
<p>So a better way to mark up <em>Hunger</em> might be to define the default as <code>rtl</code> and then mark up any English text with a <code>.english</code> class:</p>
<pre class="brush: css;">
body {
  direction:rtl;
}
.english {
  direction:ltr;
}
</pre>
<p>And add the <code>.english</code> class where necessary:</p>
<pre class="brush: xml;">
&lt;div class=&quot;pgheader english&quot; id=&quot;pgepubid00000&quot;&gt;&lt;p&gt;
&lt;br/&gt;
&lt;br/&gt;The Project Gutenberg EBook of Hunger, by Knut Hamsun
&lt;br/&gt;
...&lt;/p&gt;&lt;/div&gt;
</pre>
<p>Now Bookworm is happy:</p>
<p><a href="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-54.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-54-300x107.png" alt="Picture 54" title="Picture 54" width="300" height="107" class="aligncenter size-medium wp-image-953" style="border: 1px solid gray" /></a></p>
<p>And later on that same page&#8230;</p>
<p><a href="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-53.png"><img src="http://blog.threepress.org/wp-content/uploads/2009/11/Picture-53-300x99.png" alt="Picture 53" title="Picture 53" width="300" height="99" class="aligncenter size-medium wp-image-954" style="border: 1px solid gray"/></a></p>
<p>(The fact that Bookworm didn&#8217;t respect the initial <code>dir</code> attribute is a bug, which I opened <a href="http://code.google.com/p/threepress/issues/detail?id=176">with a test case</a>.)</p>
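<p>In a reading system that does honor <code>dir</code>, a markup-only alternative would be to keep <code>dir=&quot;rtl&quot;</code> on the <code>&lt;body&gt;</code> and override it on the English block. This is an untested sketch, not the markup of the actual ebook; the <code>dir</code> and <code>xml:lang</code> attributes on the <code>&lt;div&gt;</code> are additions for illustration:</p>
<pre class="brush: xml;">
&lt;body dir=&quot;rtl&quot; xml:lang=&quot;he&quot;&gt;
  &lt;!-- English boilerplate: override the inherited right-to-left direction --&gt;
  &lt;div class=&quot;pgheader&quot; id=&quot;pgepubid00000&quot; dir=&quot;ltr&quot; xml:lang=&quot;en&quot;&gt;
    &lt;p&gt;The Project Gutenberg EBook of Hunger, by Knut Hamsun ...&lt;/p&gt;
  &lt;/div&gt;
  &lt;!-- Hebrew content inherits the right-to-left direction from the body --&gt;
  ...
&lt;/body&gt;
</pre>
<p>Because support for <code>dir</code> clearly varies between reading systems, it is worth testing either approach in every reader you care about.</p>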
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/21/bidirectional-text-in-epub/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Best practices in ePub cover images</title>
		<link>http://blog.threepress.org/2009/11/20/best-practices-in-epub-cover-images/</link>
		<comments>http://blog.threepress.org/2009/11/20/best-practices-in-epub-cover-images/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 13:04:46 +0000</pubDate>
		<dc:creator>Keith Fahlgren</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=732</guid>
		<description><![CDATA[[Today's guest post is by Keith Fahlgren. - Liza]
The three ePub specifications (OPF, OPS, and OCF) include a lot of detail on ebook metadata and markup, but do not include a technique for describing covers. Despite this omission (a fix is being discussed), there are some widely adopted approaches for marking up covers that will [...]]]></description>
			<content:encoded><![CDATA[<p>[Today's guest post is by Keith Fahlgren. - <em>Liza</em>]</p>
<p>The three ePub specifications (OPF, OPS, and OCF) include a lot of detail on ebook metadata and markup, but do not include a technique for describing covers. Despite this omission (<a href="http://www.daisy.org/epub/issues/covers-metadata">a fix is being discussed</a>), there are some <a href="http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&#038;File=cover.htm#IDPF2">widely adopted</a> <a href="http://www.lexcycle.com/node/1462#comment-5188">approaches</a> for marking up covers that will make the cover display correctly in most (but <a href="http://code.google.com/p/threepress/issues/detail?id=75">not all</a>) ePub reading systems. </p>
<p>Ensuring the widest interoperability for your cover means including an XHTML cover page (as the first “flow”), a cover image, and bits of OPF metadata to refer to these items using consistent names.</p>
<h3>Cover as XHTML</h3>
<p>To get started, here&#8217;s an extremely minimal XHTML cover page that serves only to reference the cover image. This file should be named <code>cover.html</code>:</p>
<pre class="brush: xml;">
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.1//EN&quot; &quot;http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
  &lt;head&gt;
    &lt;title&gt;Cover&lt;/title&gt;
    &lt;style type=&quot;text/css&quot;&gt; img { max-width: 100%; } &lt;/style&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;div id=&quot;cover-image&quot;&gt;
      &lt;img src=&quot;the_cover.jpg&quot; alt=&quot;Title of this Thing, Second Edition&quot;/&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
<h3>Cover as image</h3>
<p>Next, make sure the cover image itself, probably in JPEG (it&#8217;s a photo) or PNG (it isn&#8217;t) format, is scaled to less than 1000px in width and height. Choosing the perfect size for ebook images is still an open question given the growing range of screen sizes, but shrinking to 1000px should ensure that you don&#8217;t trigger a (historical) bug that crashed a popular mobile reading system by giving it a cover too large in size.</p>
<h3>Covered in metadata</h3>
<p>The real fun<a href="http://www.slate.com/id/2111172/">¡</a> is in the OPF file. First, make sure you have references to both the image and XHTML covers in your OPF <code>manifest</code>. The <strong>id</strong> attributes below are important to note:</p>
<pre>
&lt;item <strong>id="cover"</strong>       href="cover.html"    media-type="application/xhtml+xml"/&gt;
&lt;item <strong>id="cover-image"</strong> href="the_cover.jpg" media-type="image/jpeg"/&gt;
</pre>
<p>Using the same value as the <code>id</code> attribute for the XHTML cover, add an initial <code>itemref</code> to the <code>spine</code> element that refers to the cover page but is not <code>linear</code>. This tells reading systems (like ADE) what position the XHTML cover should have in the document (must be first to trigger cover behavior), but asks that it not be displayed to the reader as part of the “linear reading order of the publication.”</p>
<pre>
&lt;spine toc="ncxtoc"&gt;
  &lt;itemref <strong>idref="cover"</strong> linear="no"/&gt;
  &lt;itemref idref="normal-first-content"/&gt;
</pre>
<p>The cover image also gets a special bit of metadata: a reference as a <code>meta</code> element inside the OPF metadata block. It&#8217;s critical that this use <code>name="cover"</code> as a signal to reading systems that would prefer to show covers as images rather than covers as XHTML:</p>
<pre>
&lt;meta name="cover" <strong>content="cover-image"</strong>/&gt;
</pre>
<p>Finally, it&#8217;s a good choice to include an OPF <code>guide</code> reference to your XHTML cover as a final hint to reading systems of what&#8217;s going on:</p>
<pre class="brush: xml;">
&lt;guide&gt;
  &lt;reference href=&quot;cover.html&quot; type=&quot;cover&quot; title=&quot;Cover&quot;/&gt;
</pre>
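<p>Putting the fragments above together, a complete OPF file might look roughly like the sketch below. The title, the identifier, <code>chapter1.html</code>, and <code>toc.ncx</code> are placeholders invented for this example; only the cover-related pieces come from the steps above:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;package xmlns=&quot;http://www.idpf.org/2007/opf&quot; version=&quot;2.0&quot; unique-identifier=&quot;bookid&quot;&gt;
  &lt;metadata xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;&gt;
    &lt;dc:title&gt;Title of this Thing, Second Edition&lt;/dc:title&gt;
    &lt;dc:language&gt;en&lt;/dc:language&gt;
    &lt;!-- Placeholder identifier; use your own --&gt;
    &lt;dc:identifier id=&quot;bookid&quot;&gt;urn:uuid:00000000-0000-0000-0000-000000000000&lt;/dc:identifier&gt;
    &lt;!-- content matches the id of the cover image in the manifest --&gt;
    &lt;meta name=&quot;cover&quot; content=&quot;cover-image&quot;/&gt;
  &lt;/metadata&gt;
  &lt;manifest&gt;
    &lt;item id=&quot;cover&quot;                href=&quot;cover.html&quot;    media-type=&quot;application/xhtml+xml&quot;/&gt;
    &lt;item id=&quot;cover-image&quot;          href=&quot;the_cover.jpg&quot; media-type=&quot;image/jpeg&quot;/&gt;
    &lt;item id=&quot;normal-first-content&quot; href=&quot;chapter1.html&quot; media-type=&quot;application/xhtml+xml&quot;/&gt;
    &lt;item id=&quot;ncxtoc&quot;               href=&quot;toc.ncx&quot;       media-type=&quot;application/x-dtbncx+xml&quot;/&gt;
  &lt;/manifest&gt;
  &lt;spine toc=&quot;ncxtoc&quot;&gt;
    &lt;!-- The XHTML cover comes first but is marked non-linear --&gt;
    &lt;itemref idref=&quot;cover&quot; linear=&quot;no&quot;/&gt;
    &lt;itemref idref=&quot;normal-first-content&quot;/&gt;
  &lt;/spine&gt;
  &lt;guide&gt;
    &lt;reference href=&quot;cover.html&quot; type=&quot;cover&quot; title=&quot;Cover&quot;/&gt;
  &lt;/guide&gt;
&lt;/package&gt;
</pre>
<p>The thing to check is that every name lines up: <code>idref=&quot;cover&quot;</code> in the spine and <code>content=&quot;cover-image&quot;</code> in the metadata must each match an <code>id</code> in the manifest.</p>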
<p>If you&#8217;ve gone to all this trouble to include a cover in your ebook, please don&#8217;t forget that <a href="http://dearauthor.com/wordpress/2009/09/27/digital-consumers-like-pictures-too/">readers expect it to be nice &#038; in color</a>.</p>
<h3>Other approaches</h3>
<p>Adobe&#8217;s <a href="http://www.adobe.com/devnet/digitalpublishing/epubs/EPUBBestPractices-1_0_3.epub">EPUB Best Practices</a> recommends an SVG-based approach to cover images. While it&#8217;s an interesting technique that can integrate well with Adobe Digital Editions, I haven&#8217;t found it to be widely supported.</p>
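<p>For comparison, SVG-based covers generally take a shape like the sketch below, where the cover image is wrapped in an inline <code>&lt;svg&gt;</code> element so that it scales to the page. This is a generic illustration and not a copy of Adobe&#8217;s recommended markup; the 600 by 800 dimensions are assumed and should match the pixel size of your own image:</p>
<pre class="brush: xml;">
&lt;body&gt;
  &lt;div id=&quot;cover-image&quot;&gt;
    &lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot;
         xmlns:xlink=&quot;http://www.w3.org/1999/xlink&quot;
         version=&quot;1.1&quot; width=&quot;100%&quot; height=&quot;100%&quot;
         viewBox=&quot;0 0 600 800&quot; preserveAspectRatio=&quot;xMidYMid meet&quot;&gt;
      &lt;!-- width and height repeat the viewBox so the image fills it exactly --&gt;
      &lt;image width=&quot;600&quot; height=&quot;800&quot; xlink:href=&quot;the_cover.jpg&quot;/&gt;
    &lt;/svg&gt;
  &lt;/div&gt;
&lt;/body&gt;
</pre>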
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/20/best-practices-in-epub-cover-images/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Three useful XML schemas in publishing</title>
		<link>http://blog.threepress.org/2009/11/19/three-useful-xml-schemas-in-publishing/</link>
		<comments>http://blog.threepress.org/2009/11/19/three-useful-xml-schemas-in-publishing/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 16:15:56 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[ebooks]]></category>
		<category><![CDATA[epub]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=931</guid>
		<description><![CDATA[If I say that a document is in &#8220;XML&#8221;, I&#8217;m not really saying anything very specific. All I&#8217;ve told you is that the document has some text wrapped in various angle-brackets, and that those angle-brackets are &#8220;well-formed.&#8221;  A well-formed XML document just means one in which the angle-brackets open and close in a predictable [...]]]></description>
			<content:encoded><![CDATA[<p>If I say that a document is in &#8220;XML&#8221;, I&#8217;m not really saying anything very specific. All I&#8217;ve told you is that the document has some text wrapped in various angle-brackets, and that those angle-brackets are &#8220;well-formed.&#8221;  A well-formed XML document just means one in which the angle-brackets open and close in a predictable way.</p>
<p>It doesn&#8217;t tell you anything about the information encoded in those angle-brackets (really called <em>elements</em>). If the element is called <code>&lt;i&gt;</code>, does that mean &#8220;put this text in italics&#8221;? Or &#8220;indent&#8221;? Or even, &#8220;The following text is about me&#8221;?</p>
<p>In order to know what an XML document actually means, you need to know its <strong>schema</strong>. A schema is a kind of dictionary that defines all the names of the elements and to some extent, what they mean.  It also describes the grammar of the document: for example, we might say that a <code>&lt;chapter&gt;</code> can be inside a <code>&lt;book&gt;</code> but not the other way around.  </p>
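<p>As a small illustration of that grammar rule, here is a sketch of a schema written in RELAX NG, one of several schema languages for XML. The choice of RELAX NG and the simplification that a chapter contains only text are assumptions made for this example:</p>
<pre class="brush: xml;">
&lt;grammar xmlns=&quot;http://relaxng.org/ns/structure/1.0&quot;&gt;
  &lt;start&gt;
    &lt;!-- A book contains one or more chapters... --&gt;
    &lt;element name=&quot;book&quot;&gt;
      &lt;oneOrMore&gt;
        &lt;!-- ...but a chapter contains only text, so it can never hold a book --&gt;
        &lt;element name=&quot;chapter&quot;&gt;
          &lt;text/&gt;
        &lt;/element&gt;
      &lt;/oneOrMore&gt;
    &lt;/element&gt;
  &lt;/start&gt;
&lt;/grammar&gt;
</pre>
<p>A validator given this schema would accept a <code>&lt;book&gt;</code> holding several <code>&lt;chapter&gt;</code> elements and reject a <code>&lt;chapter&gt;</code> that wraps a <code>&lt;book&gt;</code>.</p>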
<p>You can make up your own schema, and that&#8217;s often advisable when modeling a unique business practice. But books and other kinds of literature are well-understood, and there&#8217;s already been a huge amount of thought put into how to properly model them in XML.  If you&#8217;re in digital publishing, these are the three schemas you&#8217;re most likely to come across when modeling written works:</p>
<h3>DocBook</h3>
<p>Originally designed for technical books, <a href="http://docbook.org/">DocBook</a> has emerged as an excellent general-purpose book schema. Because it&#8217;s in wide use, there are a lot of modern tools that understand it (including the excellent <a href="http://www.oxygenxml.com/">oXygen XML editor</a>), and it&#8217;s trivial to generate other formats, including PDF and HTML, from a DocBook source.</p>
<p>Here&#8217;s a really simple DocBook document, in this case describing an article rather than a whole book:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;article xmlns=&quot;http://docbook.org/ns/docbook&quot; version=&quot;5.0&quot; xml:lang=&quot;en&quot;&gt;
  &lt;title&gt;Sample article&lt;/title&gt;
  &lt;para&gt;This is a very short article.&lt;/para&gt;
&lt;/article&gt;
</pre>
<h3>TEI</h3>
<p>The <a href="http://www.tei-c.org/index.xml">Text Encoding Initiative</a> is also used to model textual works, but supports methods to encode historical and academic texts.  TEI allows document authors to include revision history, extensive footnoting and cross-references, and provides a rich tagging mechanism for poetry, drama, and other forms of human literature.</p>
<p>TEI is frequently used in library digitization and archiving projects, and it can be used to encode texts that might seem otherwise impossible to render in XML.  This project from the University of Maryland really shows off TEI&#8217;s power: <a href="http://www.lib.umd.edu/digital/transition/">In Transition: Selected Poems by the Baroness Elsa von Freytag-Loringhoven</a>.</p>
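<p>For comparison with the DocBook sample, here is a sketch of a very small TEI document encoding two lines of verse. The text is invented for illustration, and real projects typically carry far more detail in the header:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;TEI xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;&gt;
  &lt;teiHeader&gt;
    &lt;fileDesc&gt;
      &lt;titleStmt&gt;&lt;title&gt;Sample poem&lt;/title&gt;&lt;/titleStmt&gt;
      &lt;publicationStmt&gt;&lt;p&gt;Unpublished sample.&lt;/p&gt;&lt;/publicationStmt&gt;
      &lt;sourceDesc&gt;&lt;p&gt;Written for this example.&lt;/p&gt;&lt;/sourceDesc&gt;
    &lt;/fileDesc&gt;
  &lt;/teiHeader&gt;
  &lt;text&gt;
    &lt;body&gt;
      &lt;!-- lg is a line group (here a stanza); l is a single line of verse --&gt;
      &lt;lg type=&quot;stanza&quot;&gt;
        &lt;l&gt;This is the first line of verse,&lt;/l&gt;
        &lt;l&gt;and this is the second.&lt;/l&gt;
      &lt;/lg&gt;
    &lt;/body&gt;
  &lt;/text&gt;
&lt;/TEI&gt;
</pre>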
<h3>XHTML</h3>
<p>In lots of ways, XHTML is wholly unsuited for use in book content. XHTML has almost no semantically-meaningful elements as applied to literature &#8212; there&#8217;s no built-in way to indicate a chapter, or footnotes, or dialogue versus description. </p>
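<p>In practice, the gap is usually papered over with <code>class</code> attributes. In the sketch below the class names <code>chapter</code>, <code>dialogue</code>, and <code>footnote</code> are conventions invented for this example; the XHTML schema attaches no meaning to them:</p>
<pre class="brush: xml;">
&lt;div class=&quot;chapter&quot; id=&quot;chapter-1&quot;&gt;
  &lt;h2&gt;Chapter 1&lt;/h2&gt;
  &lt;p&gt;A paragraph of description.&lt;a href=&quot;#fn-1&quot;&gt;1&lt;/a&gt;&lt;/p&gt;
  &lt;p class=&quot;dialogue&quot;&gt;A line of dialogue.&lt;/p&gt;
  &lt;!-- Only the class name says this is a footnote; to XHTML it is just a div --&gt;
  &lt;div class=&quot;footnote&quot; id=&quot;fn-1&quot;&gt;&lt;p&gt;1. The text of the footnote.&lt;/p&gt;&lt;/div&gt;
&lt;/div&gt;
</pre>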
<p>The advantage it does have is that it&#8217;s ubiquitous &#8212; thanks to the web &#8212; and many people who otherwise have no experience in XML or text encoding know at least a little HTML. Because of the web there are probably more works written in HTML today than in any other form in history.</p>
<p>By supplementing it with other forms of XML that <em>do</em> provide semantic structure, as in ePub, XHTML is demonstrably a useful and important commercial format. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/11/19/three-useful-xml-schemas-in-publishing/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>
