<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Threepress Consulting blog &#187; google</title>
	<atom:link href="http://blog.threepress.org/tag/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.threepress.org</link>
	<description>Threepress creates software for publishers, educators and authors.</description>
	<lastBuildDate>Fri, 03 Sep 2010 23:28:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Python and XML (and Google!) in publishing applications</title>
		<link>http://blog.threepress.org/2008/10/28/python-and-xml-and-google-in-publishing-applications/</link>
		<comments>http://blog.threepress.org/2008/10/28/python-and-xml-and-google-in-publishing-applications/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 21:38:24 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[tools]]></category>
		<category><![CDATA[article]]></category>
		<category><![CDATA[digitization]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[lxml]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=156</guid>
		<description><![CDATA[IBM DeveloperWorks has just released an article of mine on High-Performance XML Parsing in Python.  Although there is nothing publishing-centric about the article itself, it was based on my own experience in dealing with large XML datasets in academic publishing.
Massive XML files are uncommon in the general web development world, where the primary roles of [...]]]></description>
			<content:encoded><![CDATA[<p>IBM DeveloperWorks has just released an article of mine on <a href="http://www.ibm.com/developerworks/library/x-hiperfparse/">High-Performance XML Parsing in Python</a>.  Although there is nothing publishing-centric about the article itself, it was based on my own experience in dealing with large XML datasets in academic publishing.</p>
<div align="center">
<a href="http://www.ibm.com/developerworks/library/x-hiperfparse/"><img class="alignnone size-medium wp-image-157" title="lxml article screenshot" src="http://blog.threepress.org/wp-content/uploads/2008/10/picture-12-300x226.png" alt="" width="300" height="226" style="float:none"/></a>
</div>
<p>Massive XML files are uncommon in the general web development world, where the primary roles of XML are either as configuration files, read only infrequently, or for interchange across the web, in which case the files are necessarily small.  It&#8217;s rare to encounter XML measured in gigabytes or more; data at that level is usually stored in a relational database.</p>
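<p>For files at that scale, the usual approach is to parse incrementally rather than load the whole tree into memory.  The following is a minimal sketch of that idea, not code from the article: it uses the standard library&#8217;s <code>xml.etree.ElementTree.iterparse</code> (lxml provides an <code>iterparse</code> with a similar interface), and the <code>records</code>/<code>record</code> element names and both helper functions are hypothetical.</p>

```python
import io
import xml.etree.ElementTree as ET


def build_sample(n):
    """Build a small stand-in for a large file ('records'/'record' are made-up names)."""
    root = ET.Element("records")
    for i in range(n):
        ET.SubElement(root, "record", {"id": str(i)}).text = "title %d" % i
    return io.BytesIO(ET.tostring(root))


def count_records(source):
    """Stream through the document, discarding each record once it is handled."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            count += 1
            root.clear()  # drop processed children so memory use stays flat
    return count


print(count_records(build_sample(1000)))  # prints 1000
```

<p>Clearing the root after each record is what keeps memory bounded; without it, the parser still builds the full tree behind the scenes.</p>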
<p>For that reason I find myself frustrated with many XML tools, even those ostensibly designed to handle large amounts of data.  Too often they don&#8217;t scale well, or at least not easily.  I don&#8217;t believe that scaling should be a black art that each developer has to work out independently.  Unfortunately, in commercial products ease of use is a key bullet point, and computationally difficult problems are hard to summarize in a user&#8217;s guide.</p>
<p>I tend to recommend open-source software most strongly in two scenarios: for <a href="http://blog.threepress.org/2008/10/14/corpus-toneelkritiek-interbellum/">small projects with limited budgets</a> and for large projects with unique challenges.  There simply isn&#8217;t going to be a one-size-fits-all application for most interesting publishing work.</p>
<p>This is one of many reasons I&#8217;m excited by Google&#8217;s willingness to <a href="http://tinyurl.com/6kc6hx">open its Google Books archive to researchers</a>:  Python is a first-class programming language in the Google ecosystem, and Google has a good track record of open-sourcing those internal tools with limited commercial value.  I expect a lot of interesting work to come out of that archive once it&#8217;s available.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2008/10/28/python-and-xml-and-google-in-publishing-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookworm library integration with Google Books Search</title>
		<link>http://blog.threepress.org/2008/09/24/bookworm-library-integration-with-google-books-search/</link>
		<comments>http://blog.threepress.org/2008/09/24/bookworm-library-integration-with-google-books-search/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 14:11:09 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[bookworm]]></category>
		<category><![CDATA[dublin core]]></category>
		<category><![CDATA[epub]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google books]]></category>
		<category><![CDATA[identifier]]></category>
		<category><![CDATA[isbn]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=45</guid>
		<description><![CDATA[On September 22nd Google Books announced its expanded Google Book Search API, which includes the ability to preview and search Google Books content from other web sites.
Bookworm now has integration with one part of this API.  The Book Information page (available from the table of contents for each Bookworm book) displays results from the [...]]]></description>
			<content:encoded><![CDATA[<p>On September 22nd Google Books announced its expanded <a href="http://booksearch.blogspot.com/2008/09/book-search-everywhere-with-new.html">Google Book Search</a> API, which includes the ability to preview and search Google Books content from other web sites.</p>
<p><a href="http://bookworm.threepress.org/">Bookworm</a> now has integration with one part of this API.  The <strong>Book Information</strong> page (available from the table of contents for each Bookworm book) displays results from the Google Book Search service for that title and author.</p>
<div align="center">
<a href="http://blog.threepress.org/wp-content/uploads/2008/09/picture-11.png"><img class="size-medium wp-image-46" title="Google Books Search screenshot" src="http://blog.threepress.org/wp-content/uploads/2008/09/picture-11-300x185.png" alt="Anne of Green Gables results from Google Book Search" width="300" height="185" /></a>
</div>
<h2>How good are the results?</h2>
<p>Frankly, I&#8217;m disappointed.  The metadata is often sloppy: description fields are sometimes nonsensical, there are numerous spacing errors in which words run together, and far more data is available when you click through to the Google Books page than the API returns.</p>
<p>Nevertheless, I have decided to include the data in this single place per book, to help Bookworm users find print editions of their ebooks (especially for public domain books).</p>
<h2>The identifier problem</h2>
<p>This latest API is not the first that Google Books released, but it is the first that allows arbitrary search queries (such as for title and author name).  The previous version only allowed searches by ISBN.</p>
<p>The ePub standard <a href="http://www.openebook.org/2007/opf/OPF_2.0_final_spec.html#Section2.2.10">requires that ebooks be tagged with a unique identifier</a> but does not specify what kind of identifier to use.  Obviously, public domain works and non-book publications don&#8217;t have ISBNs.  Some publishers are assigning an <a href="http://www.teleread.org/blog/2008/08/21/isbn-bloat-vs-money-for-small-e-publishers-writers-and-editors/">ISBN as the ePub identifier</a>, but they use a separate ISBN for each digital edition.  It would be nice if I could uniquely tie the ePub version of a book on Bookworm to its print counterpart (and leverage powerful Google features like searching that book&#8217;s content), but that&#8217;s not going to be possible when the editions have different ISBNs.  Similarly, it would be difficult to encourage users to buy a print version from Amazon or other retailers without running the risk of pointing to an older edition or one by a different publisher.</p>
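<p>To make the identifier problem concrete, here is a minimal sketch of how the unique identifier can be read from an OPF 2.0 package document: the <code>unique-identifier</code> attribute on <code>package</code> names the <code>id</code> of one <code>dc:identifier</code> element.  This is an illustration using Python&#8217;s standard-library <code>xml.etree.ElementTree</code>, not Bookworm&#8217;s actual code; the helper names and both ISBN values are made up.</p>

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_sample_opf(identifier, scheme):
    """Build a tiny, hypothetical OPF package document for illustration."""
    package = ET.Element("{%s}package" % OPF_NS,
                         {"unique-identifier": "BookId", "version": "2.0"})
    metadata = ET.SubElement(package, "{%s}metadata" % OPF_NS)
    ident = ET.SubElement(metadata, "{%s}identifier" % DC_NS,
                          {"id": "BookId", "{%s}scheme" % OPF_NS: scheme})
    ident.text = identifier
    return ET.tostring(package)


def unique_identifier(opf_bytes):
    """Return (scheme, value) for the identifier named by unique-identifier."""
    package = ET.fromstring(opf_bytes)
    wanted_id = package.get("unique-identifier")
    for ident in package.iter("{%s}identifier" % DC_NS):
        if ident.get("id") == wanted_id:
            return ident.get("{%s}scheme" % OPF_NS), (ident.text or "").strip()
    return None  # no dc:identifier carries the referenced id


opf = build_sample_opf("978-0-00-000001-9", "ISBN")  # made-up ebook ISBN
print_isbn = "978-0-00-000000-2"                     # made-up print ISBN
scheme, value = unique_identifier(opf)
print(scheme, value == print_isbn)  # prints: ISBN False
```

<p>Even when the scheme is an ISBN, a direct comparison fails whenever the digital edition carries its own number, which is exactly why matching an ePub to its print counterpart needs something beyond the identifier.</p>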
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2008/09/24/bookworm-library-integration-with-google-books-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
