<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How big is the average ePub book?</title>
	<atom:link href="http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/</link>
	<description>Threepress creates software for publishers, educators and authors.</description>
	<lastBuildDate>Tue, 07 Feb 2012 13:57:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: bowerbird</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1224</link>
		<dc:creator>bowerbird</dc:creator>
		<pubDate>Tue, 17 Nov 2009 20:16:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1224</guid>
		<description>yes, 30 gigs is a big chunk, for sure, no question about it.

on the other hand, it&#039;s useful to know how big the text is,
as opposed to the images, because you have to manipulate
the text, whereas you merely have to display the pictures...

i asked the question because the .epub format combines
the text and pictures into the .zip file, thereby making it
difficult to easily compute the size of the text all by itself.
so the format you&#039;ve chosen complicates finding an answer.

but you will find that fact more palatable if _you_ say it
than if _i_ say it.

(the fact that it&#039;s all compressed is another complication,
but since most graphics are already at a compressed size,
and most text compresses at a fairly standard rate, this is
not a very big concern.)

your corpus is also dependent on who did the uploading
-- as evidenced by the comment up above that says &quot;hey,
most of the big books are probably ones we uploaded&quot; --
so a more useful answer to the question of the size of text
would probably be found in the project gutenberg corpus.

which is not to deny that the total size -- text and pictures,
not to mention audio and video in the days to come -- is
a very useful measure in and of itself, especially when you
consider the matter of &quot;cloud storage&quot; which you mention.

-bowerbird</description>
		<content:encoded><![CDATA[<p>yes, 30 gigs is a big chunk, for sure, no question about it.</p>
<p>on the other hand, it&#8217;s useful to know how big the text is,<br />
as opposed to the images, because you have to manipulate<br />
the text, whereas you merely have to display the pictures&#8230;</p>
<p>i asked the question because the .epub format combines<br />
the text and pictures into the .zip file, thereby making it<br />
difficult to easily compute the size of the text all by itself.<br />
so the format you&#8217;ve chosen complicates finding an answer.</p>
<p>but you will find that fact more palatable if _you_ say it<br />
than if _i_ say it.</p>
<p>(the fact that it&#8217;s all compressed is another complication,<br />
but since most graphics are already at a compressed size,<br />
and most text compresses at a fairly standard rate, this is<br />
not a very big concern.)</p>
<p>your corpus is also dependent on who did the uploading<br />
&#8211; as evidenced by the comment up above that says &#8220;hey,<br />
most of the big books are probably ones we uploaded&#8221; &#8211;<br />
so a more useful answer to the question of the size of text<br />
would probably be found in the project gutenberg corpus.</p>
<p>which is not to deny that the total size &#8212; text and pictures,<br />
not to mention audio and video in the days to come &#8212; is<br />
a very useful measure in and of itself, especially when you<br />
consider the matter of &#8220;cloud storage&#8221; which you mention.</p>
<p>-bowerbird</p>
]]></content:encoded>
	</item>
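	<!--
	Editorial sketch, not part of the original comment: the comment above notes that
	packing text and pictures into one .zip makes the text size hard to compute, and
	that compression is a further complication. A zip archive records both the
	uncompressed and the compressed size of every entry, so a rough split can be read
	from the archive directory without extracting anything. The extension groups and
	the function name below are assumptions made for illustration, not anything
	Bookworm is known to use.

	```python
	import zipfile
	from collections import defaultdict

	# Hypothetical extension groups; adjust them for the corpus at hand.
	TEXT_EXTS = (".xhtml", ".html", ".htm", ".xml", ".opf", ".ncx", ".css", ".txt")
	IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".svg")


	def epub_size_breakdown(path):
	    """Return per-category byte totals (uncompressed and compressed) for one epub."""
	    totals = defaultdict(lambda: {"uncompressed": 0, "compressed": 0})
	    with zipfile.ZipFile(path) as archive:
	        for info in archive.infolist():
	            name = info.filename.lower()
	            if name.endswith(TEXT_EXTS):
	                kind = "text"
	            elif name.endswith(IMAGE_EXTS):
	                kind = "image"
	            else:
	                kind = "other"
	            # Both sizes come from the zip directory; no entry is decompressed.
	            totals[kind]["uncompressed"] += info.file_size
	            totals[kind]["compressed"] += info.compress_size
	    return dict(totals)
	```

	Summing these totals over a whole collection would give a text versus picture
	figure, with the caveat that classifying entries by file extension is only an
	approximation.
	-->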
	<item>
		<title>By: liza</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1200</link>
		<dc:creator>liza</dc:creator>
		<pubDate>Mon, 16 Nov 2009 21:32:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1200</guid>
		<description>As you might imagine, doing any kind of real analysis on 30 gigabytes of data is pretty time-consuming.  Bytes are easy since the operating system has that information readily available.

I can probably come up with the total size of all the Bookworm images and subtract that from the epub total, but that isn&#039;t as useful as it could be since the text is compressed.</description>
		<content:encoded><![CDATA[<p>As you might imagine, doing any kind of real analysis on 30 gigabytes of data is pretty time-consuming.  Bytes are easy since the operating system has that information readily available.</p>
<p>I can probably come up with the total size of all the Bookworm images and subtract that from the epub total, but that isn&#8217;t as useful as it could be since the text is compressed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bowerbird</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1199</link>
		<dc:creator>bowerbird</dc:creator>
		<pubDate>Mon, 16 Nov 2009 21:16:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1199</guid>
		<description>how big is the text, versus the pictures?

-bowerbird</description>
		<content:encoded><![CDATA[<p>how big is the text, versus the pictures?</p>
<p>-bowerbird</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Cramer</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1195</link>
		<dc:creator>Dave Cramer</dc:creator>
		<pubDate>Mon, 16 Nov 2009 20:01:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1195</guid>
		<description>My biggest is 180MB; smallest is 40k, from a sample of around 3,000 from a major trade publisher. Average around 1.2MB, but the big books push that up. Two-thirds are between 300k and 1MB.

Dave</description>
		<content:encoded><![CDATA[<p>My biggest is 180MB; smallest is 40k, from a sample of around 3,000 from a major trade publisher. Average around 1.2MB, but the big books push that up. Two-thirds are between 300k and 1MB.</p>
<p>Dave</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Lease Morgan</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1194</link>
		<dc:creator>Eric Lease Morgan</dc:creator>
		<pubDate>Mon, 16 Nov 2009 19:44:01 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1194</guid>
		<description>While the size of a book in bytes is interesting in and of itself, I would advocate measuring the size of a book in terms of number of words, especially if you want to give an idea of how long a book is. I have begun to do this against content in my Alex Catalogue.

Furthermore, the field of digital humanities offers a great many other types of analysis. Number of action words. Counts of this type of sound or that kind of sound. The number of times a text mentions &quot;great ideas&quot; or &quot;big names&quot;. Calculating an item&#039;s &quot;readability index&quot;.

Fun with computers and books.</description>
		<content:encoded><![CDATA[<p>While the size of a book in bytes is interesting in and of itself, I would advocate measuring the size of a book in terms of number of words, especially if you want to give an idea of how long a book is. I have begun to do this against content in my Alex Catalogue.</p>
<p>Furthermore, the field of digital humanities offers a great many other types of analysis. Number of action words. Counts of this type of sound or that kind of sound. The number of times a text mentions &#8220;great ideas&#8221; or &#8220;big names&#8221;. Calculating an item&#8217;s &#8220;readability index&#8221;.</p>
<p>Fun with computers and books.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: stuart yeates</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1192</link>
		<dc:creator>stuart yeates</dc:creator>
		<pubDate>Mon, 16 Nov 2009 18:48:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1192</guid>
		<description>Some of those larger ePubs are probably from us. Some of our collections have _very_ large ePubs, particularly the Cyclopedia of New Zealand:

http://www.nzetc.org/tm/scholarly/tei-corpus-cyclopedia.html</description>
		<content:encoded><![CDATA[<p>Some of those larger ePubs are probably from us. Some of our collections have _very_ large ePubs, particularly the Cyclopedia of New Zealand:</p>
<p><a href="http://www.nzetc.org/tm/scholarly/tei-corpus-cyclopedia.html" rel="nofollow">http://www.nzetc.org/tm/scholarly/tei-corpus-cyclopedia.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: liza</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1191</link>
		<dc:creator>liza</dc:creator>
		<pubDate>Mon, 16 Nov 2009 18:33:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1191</guid>
		<description>Fran: Bookworm only accepts epub, so that&#039;s all I have.</description>
		<content:encoded><![CDATA[<p>Fran: Bookworm only accepts epub, so that&#8217;s all I have.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fran Toolan</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1190</link>
		<dc:creator>Fran Toolan</dc:creator>
		<pubDate>Mon, 16 Nov 2009 18:32:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1190</guid>
		<description>Hey Liza, 
do you have similar data for PDFs?  i&#039;d be curious to see how that correlates with what we are seeing.  thanks.</description>
		<content:encoded><![CDATA[<p>Hey Liza,<br />
do you have similar data for PDFs?  i&#8217;d be curious to see how that correlates with what we are seeing.  thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: India</title>
		<link>http://blog.threepress.org/2009/11/16/how-big-is-the-average-epub-book/comment-page-1/#comment-1189</link>
		<dc:creator>India</dc:creator>
		<pubDate>Mon, 16 Nov 2009 18:07:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.threepress.org/?p=868#comment-1189</guid>
		<description>Oh, the things you could do with that corpus! What fun.

I&#039;d also be interested to know which titles and genres of books have been uploaded most, what devices they&#039;re being downloaded to, how many users have uploaded those 30,000+ books, . . .

More, more, more said the baby.</description>
		<content:encoded><![CDATA[<p>Oh, the things you could do with that corpus! What fun.</p>
<p>I&#8217;d also be interested to know which titles and genres of books have been uploaded most, what devices they&#8217;re being downloaded to, how many users have uploaded those 30,000+ books, . . .</p>
<p>More, more, more said the baby.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

