<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Threepress Consulting blog &#187; python</title>
	<atom:link href="http://blog.threepress.org/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.threepress.org</link>
	<description>Threepress creates software for publishers, educators and authors.</description>
	<lastBuildDate>Mon, 09 Jan 2012 13:02:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>On IBM Developerworks: Doing More with the Django Admin</title>
		<link>http://blog.threepress.org/2009/06/03/on-ibm-developerworks-doing-more-with-the-django-admin/</link>
		<comments>http://blog.threepress.org/2009/06/03/on-ibm-developerworks-doing-more-with-the-django-admin/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 15:01:42 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[developerworks]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=507</guid>
		<description><![CDATA[The second of two articles I wrote for IBM Developerworks: Doing More with the Django Admin (the first was Better Django Models).

The &#8220;Admin&#8221; is the administrative console that&#8217;s built into the Django application.  It provides an easy way to administer content on any Django site.  For developers who are tired of writing the [...]]]></description>
			<content:encoded><![CDATA[<p>The second of two articles I wrote for IBM Developerworks: <a href="http://www.ibm.com/developerworks/opensource/library/os-django-admin/index.html">Doing More with the Django Admin</a> (the first was <a href="http://blog.threepress.org/2009/06/02/on-ibm-developerworks-better-django-models/">Better Django Models</a>).</p>
<p><a href="http://www.ibm.com/developerworks/opensource/library/os-django-admin/index.html"><img src="http://blog.threepress.org/wp-content/uploads/2009/06/picture-26-289x300.png" alt="picture-26" title="picture-26" width="289" height="300" class="aligncenter size-medium wp-image-508" /></a></p>
<p>The &#8220;Admin&#8221; is the administrative console that&#8217;s built into the Django application.  It provides an easy way to administer content on any Django site.  For developers who are tired of writing the same admin functionality again and again (&#8220;Add/Edit/Delete page&#8221;) it&#8217;s a huge win.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/06/03/on-ibm-developerworks-doing-more-with-the-django-admin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On IBM Developerworks: Better Django Models</title>
		<link>http://blog.threepress.org/2009/06/02/on-ibm-developerworks-better-django-models/</link>
		<comments>http://blog.threepress.org/2009/06/02/on-ibm-developerworks-better-django-models/#comments</comments>
		<pubDate>Tue, 02 Jun 2009 14:48:28 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[bookworm]]></category>
		<category><![CDATA[developerworks]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=502</guid>
		<description><![CDATA[The first of two articles I wrote about the Python web framework Django has been published by IBM Developerworks: Better Django Models.

Neither one is publishing-specific, but Django (like Rails and Drupal) is a great way to deploy a web application quickly and easily. (Bookworm runs on Django, as do many of my smaller projects.)
]]></description>
			<content:encoded><![CDATA[<p>The first of two articles I wrote about the Python web framework Django has been published by IBM Developerworks: <a href="http://www.ibm.com/developerworks/opensource/library/os-django-models/index.html">Better Django Models</a>.</p>
<p><a href="http://www.ibm.com/developerworks/opensource/library/os-django-models/index.html"><img src="http://blog.threepress.org/wp-content/uploads/2009/06/picture-25-300x290.png" alt="picture-25" title="picture-25" width="300" height="290" class="aligncenter size-medium wp-image-503" /></a></p>
<p>Neither one is publishing-specific, but Django (like Rails and Drupal) is a great way to deploy a web application quickly and easily. (<a href="http://bookworm.oreilly.com/">Bookworm</a> runs on Django, as do many of my smaller projects.)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2009/06/02/on-ibm-developerworks-better-django-models/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Python and XML (and Google!) in publishing applications</title>
		<link>http://blog.threepress.org/2008/10/28/python-and-xml-and-google-in-publishing-applications/</link>
		<comments>http://blog.threepress.org/2008/10/28/python-and-xml-and-google-in-publishing-applications/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 21:38:24 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[tools]]></category>
		<category><![CDATA[article]]></category>
		<category><![CDATA[digitization]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[lxml]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=156</guid>
		<description><![CDATA[IBM DeveloperWorks has just released an article of mine on High-Performance XML Parsing in Python.  Although there is nothing publishing-centric about the article itself, it was based on my own experience in dealing with large XML datasets in academic publishing.



Massive XML files are uncommon in the general web development world, where the primary roles of [...]]]></description>
			<content:encoded><![CDATA[<p>IBM DeveloperWorks has just released an article of mine on <a href="http://www.ibm.com/developerworks/library/x-hiperfparse/">High-Performance XML Parsing in Python</a>.  Although there is nothing publishing-centric about the article itself, it was based on my own experience in dealing with large XML datasets in academic publishing.</p>
<div align="center">
<a href="http://www.ibm.com/developerworks/library/x-hiperfparse/"><img class="alignnone size-medium wp-image-157" title="lxml article screenshot" src="http://blog.threepress.org/wp-content/uploads/2008/10/picture-12-300x226.png" alt="" width="300" height="226" style="float:none"/></a>
</div>
<p>Massive XML files are uncommon in the general web development world, where the primary roles of XML are either as configuration files, read only infrequently, or for interchange across the web, in which case the files are necessarily small.  It&#8217;s rare to encounter XML measured in gigabytes or more; data at that level is usually stored in a relational database.</p>
<p>For that reason I find myself frustrated with many XML tools, even those ostensibly designed to handle large amounts of data.  Too often they don&#8217;t scale well, or at least not easily.  I don&#8217;t believe that scaling should be a black art that each individual developer has to master independently.  Unfortunately, in commercial products ease of use is a key bullet point, and computationally difficult problems are hard to summarize in a user&#8217;s guide.</p>
<p>I tend to recommend open-source software most strongly in two scenarios: for <a href="http://blog.threepress.org/2008/10/14/corpus-toneelkritiek-interbellum/">small projects with limited budgets</a> and for large projects with unique challenges.  There simply isn&#8217;t going to be a one-size-fits-all application for most interesting publishing work.</p>
<p>This is one of many reasons I&#8217;m excited by Google&#8217;s willingness to <a href="http://tinyurl.com/6kc6hx">open its Google Books archive to researchers</a>:  Python is a first-class programming language in the Google ecosystem, and Google has a good track record of open-sourcing those internal tools with limited commercial value.  I expect a lot of interesting work to come out of that archive once it&#8217;s available.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2008/10/28/python-and-xml-and-google-in-publishing-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TEI + Python + lxml + Dutch = Corpus Toneelkritiek Interbellum</title>
		<link>http://blog.threepress.org/2008/10/14/corpus-toneelkritiek-interbellum/</link>
		<comments>http://blog.threepress.org/2008/10/14/corpus-toneelkritiek-interbellum/#comments</comments>
		<pubDate>Wed, 15 Oct 2008 02:42:29 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[digitization]]></category>
		<category><![CDATA[libraries]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[clowns]]></category>
		<category><![CDATA[dutch]]></category>
		<category><![CDATA[lxml]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[tei]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[xslt]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=103</guid>
		<description><![CDATA[I was pleased to be able to assist with the Corpus Toneelkritiek Interbellum project, which allows reading, browsing and searching of early 20th-century Dutch theater reviews.  I can&#8217;t read Dutch, but Google&#8217;s automated translation tells me that the review of Hamlet mentions a &#8220;long modern clown,&#8221; which sounds disturbing enough that I&#8217;ll leave the [...]]]></description>
			<content:encoded><![CDATA[<p>I was pleased to be able to assist with the <a href="http://webh01.ua.ac.be/theso/cti/index.html">Corpus Toneelkritiek Interbellum</a> project, which allows reading, browsing and searching of early 20th-century Dutch theater reviews.  I can&#8217;t read Dutch, but Google&#8217;s automated translation tells me that the review of <a href="http://webh01.ua.ac.be/theso/cti/1926-05-30_putman70.html">Hamlet</a> mentions a &#8220;long modern clown,&#8221; which sounds disturbing enough that I&#8217;ll leave the actual reading to someone else.
</p>
<div style="text-align:center;margin:auto;float:none">
<a href="http://webh01.ua.ac.be/theso/cti/index.html"><img style="float:none" src="http://blog.threepress.org/wp-content/uploads/2008/10/picture-6-300x253.png" alt="" title="picture-6" width="300" height="253"  align="right" /></a>
</div>
<p style="clear:both">
The source documents are encoded in <a href="http://www.tei-c.org/index.xml">TEI</a> XML and rendered to the browser using Python and <a href="http://codespeak.net/lxml/">lxml</a>, three of my favorite technologies.</p>
<p>
There are a few take-aways from this project that might benefit anyone working in a similar area and scale: </p>
<ul>
<li> Use a standard encoding format (in this case TEI, but choose an appropriate one based on the source content)</li>
<li> Use a modern programming language, even in a humanities context (e.g. Python)</li>
<li> Use modern XML parsing tools (e.g. lxml + XPath + XSLT)</li>
</ul>
<p>
The key advantage of libraries such as lxml in publishing and digitization projects is that they allow the developer to freely mix XML-native languages like XPath and XSLT with the expressive, procedural programming style of Python.  I&#8217;m still amazed by how many people are &#8220;parsing&#8221; XML using regular expressions (or worse), or using plain CGI/Perl scripts to serve up content. There are easier ways!</p>
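<p>As a rough illustration of that mix (a minimal sketch, not the project&#8217;s actual code): the element names, the sample data, and the <code>build_sample</code> and <code>review_titles</code> helpers below are all hypothetical, and the document is only loosely TEI-like. The sketch deliberately sticks to the subset of path syntax that the standard library&#8217;s ElementTree also understands, so it runs with or without lxml installed; with lxml you would additionally get full XPath through <code>xpath()</code> and XSLT through <code>etree.XSLT</code>.</p>
<pre>
```python
# Sketch only: element names and sample data are hypothetical, loosely TEI-like.
try:
    from lxml import etree  # used when lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, compatible for this subset


def build_sample():
    """Build a tiny in-memory document with two reviews and one notes section."""
    root = etree.Element("TEI")
    body = etree.SubElement(etree.SubElement(root, "text"), "body")
    for kind, title in [("review", "Hamlet"), ("review", "Faust"), ("notes", "Notes")]:
        div = etree.SubElement(body, "div", type=kind)
        etree.SubElement(div, "head").text = title
    return root


def review_titles(root):
    """Return the upper-cased head text of every div whose type is 'review'."""
    # The path expression selects the nodes ...
    reviews = root.findall(".//div[@type='review']")
    # ... and ordinary Python handles the rest.
    return [div.findtext("head").upper() for div in reviews]


titles = review_titles(build_sample())
print(titles)
```
</pre>
<p>The selection logic lives in one declarative path expression, while filtering, formatting and anything else procedural stays in plain Python. That division of labor is hard to get from regular expressions.</p>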
<p> &#8220;Free&#8221; doesn&#8217;t have to mean primitive. In fact I would argue that projects like <a href="http://pinaxproject.com/">Pinax</a> can jump-start library or digital archive sites into the 21st century with less work than a grad student will spend crafting a bespoke Perl script.
</p>
<p> Congratulations to Thomas Crombez and his team!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2008/10/14/corpus-toneelkritiek-interbellum/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Better technical book reviewing with Subversion</title>
		<link>http://blog.threepress.org/2008/08/29/better-technical-book-reviewing-with-subversion/</link>
		<comments>http://blog.threepress.org/2008/08/29/better-technical-book-reviewing-with-subversion/#comments</comments>
		<pubDate>Fri, 29 Aug 2008 21:37:00 +0000</pubDate>
		<dc:creator>Liza Daly</dc:creator>
				<category><![CDATA[tools]]></category>
		<category><![CDATA[editing]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[subversion]]></category>

		<guid isPermaLink="false">http://blog.threepress.org/?p=39</guid>
		<description><![CDATA[

I just received my copy of Python for Unix and Linux System Administration by Noah Gift and Jeremy Jones, for which I was a technical reviewer.  I&#8217;ve done several tech reviews for O&#8217;Reilly in the past, on both Python and CSS, and the least enjoyable part of the process has been the actual method [...]]]></description>
			<content:encoded><![CDATA[<p style="float:right;padding-left: 1em;"><a href="http://oreilly.com/catalog/9780596515829/"><img class="alignright size-full wp-image-40" title="python-for-sysadmins" src="http://blog.threepress.org/wp-content/uploads/2008/08/python-for-sysadmins.gif" alt="" width="180" height="236" /></a>
</p>
<p>I just received my copy of <a href="http://oreilly.com/catalog/9780596515829/">Python for Unix and Linux System Administration</a> by Noah Gift and Jeremy Jones, for which I was a technical reviewer.  I&#8217;ve done several tech reviews for O&#8217;Reilly in the past, on both Python and CSS, and the least enjoyable part of the process has been the actual method of providing feedback.
</p>
<p>
At my previous job we routinely used Word (or OpenOffice) with <em>Track Changes</em> for collaborative editing, and as Word-based tools go I felt that worked well.  For whatever reason, though, most of the pre-release books I&#8217;ve received have been in PDF, which is limiting in several ways:
</p>
<ol>
<li>Cut and paste from PDF, especially of source code, often does not work properly. To test the code, a technical reviewer needs to be sure they are reproducing exactly what is in the book.</li>
<li>There is no way to comment in-line on particular words or phrases.</li>
<li>The copy of the text I&#8217;m reading may be days or even weeks out of date, frozen at whenever the author last ran the PDF conversion.</li>
</ol>
<p>
<em>Python for Unix and Linux System Administration</em> was different: the authors elected to use the source code control system <a href="http://en.wikipedia.org/wiki/Subversion_(software)">Subversion</a> to manage the writing.  The text was composed in <a href="http://docbook.sourceforge.net/">DocBook XML</a> rather than Word or some other non-text format.  While I&#8217;m sure this was done entirely to facilitate collaboration between the authors, it had the downstream effect of making it supremely easy for me to review it:
</p>
<ol>
<li>Code samples were in plain text, and if they were formatted incorrectly, that was useful feedback to be able to give (especially in a language that is sensitive to whitespace, as Python is)</li>
<li>While I was told I would be able to &#8220;commit&#8221; my changes back to the authors inside of the source text, I still chose to use an external file to provide my comments.  I did this only because I wasn&#8217;t sure that the authors would be able to manage multiple commits coming in from technical reviewers, and because we hadn&#8217;t decided on a common tagging framework.  With more editorial guidance, being able to commit my comments directly into the source could be very useful (including the ability to potentially see other reviewers&#8217; comments, and avoid repeating myself).</li>
<li>Each time I went to work on the book, I was able to get a fresh copy of the text.  I didn&#8217;t go back and re-check old sections, but it did mean that any section I worked on was always up-to-date.</li>
</ol>
<p>
When used with friendly front-end software like <a href="http://tortoisesvn.tigris.org/">TortoiseSVN</a>, Subversion isn&#8217;t even very difficult.  It&#8217;s certainly no more arcane than many professional content management systems.  Although it works best when managing text content (which could be Office-supported XML formats), it would still provide value with binary formats.  It&#8217;s worth considering for any publisher that has to manage multiple, distributed editors or authors and wants to improve the process using entirely free software.
</p>
<p>
For more on the subject, Rachel Greenham has a nice <a href="http://strangenoises.org/subversion-for-writers/">tutorial explaining how authors can use Subversion with OS X</a>.  The definitive word is the <a href="http://svnbook.red-bean.com/">Subversion book</a>.
</p>
<p>
(I highly recommend <em>Python for Unix and Linux System Administration</em> as well, even for Python programmers who aren&#8217;t system administrators.  It collects an impressive breadth of information in one place and showed me how to automate processes I hadn&#8217;t even realized needed automating.)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.threepress.org/2008/08/29/better-technical-book-reviewing-with-subversion/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

