<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Kier&#039;s Blog &#187; Programming</title>
	<atom:link href="http://www.kierdugan.com/tag/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.kierdugan.com</link>
	<description>Damn right.</description>
	<lastBuildDate>Fri, 11 Mar 2011 23:36:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>SimpleHLST – Part 2: Lexical Analysis</title>
		<link>http://www.kierdugan.com/2011/03/11/simplehlst-part-2-lexical-analysis/</link>
		<comments>http://www.kierdugan.com/2011/03/11/simplehlst-part-2-lexical-analysis/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 17:30:22 +0000</pubDate>
		<dc:creator>Kier</dc:creator>
				<category><![CDATA[Digital]]></category>
		<category><![CDATA[Electronics]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Uni]]></category>
		<category><![CDATA[Finite Automata]]></category>
		<category><![CDATA[HDL]]></category>
		<category><![CDATA[Lexical Analysis]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SimpleHLST]]></category>
		<category><![CDATA[Synthesis]]></category>
		<category><![CDATA[Tokeniser]]></category>

		<guid isPermaLink="false">http://www.kierdugan.com/?p=159</guid>
		<description><![CDATA[SimpleHLST needs to read in some code, written by humans, to begin generating data structures that we can actually work<a href="http://www.kierdugan.com/2011/03/11/simplehlst-part-2-lexical-analysis/" class="searchmore">Read the Rest...</a><div class="clr"></div>]]></description>
			<content:encoded><![CDATA[<p><a title="SimpleHLST - Part 1: Introduction" href="http://www.kierdugan.com/2011/03/01/simplehlst-%E2%80%93-part-1-introduction/" target="_self">SimpleHLST</a> needs to read in some code, written by humans, to begin generating data structures that we can actually work with.  This can take up to three stages:</p>
<ul>
<li>
<div style="text-align: justify;">Lexical Analysis – convert the input text into a list of <em>tokens</em> that distinguishes language features from whitespace, comments, pre-processor commands, etc. (&#8220;Unknown character on line 2&#8221; errors are produced by this stage).</div>
</li>
<li>
<div style="text-align: justify;">Syntax Analysis – convert the list of tokens into a <em>syntax tree</em> which shows the entire file hierarchically as perfectly valid expressions (this is where &#8220;Missing semicolon on line 4&#8221; type errors come from).</div>
</li>
<li>
<div style="text-align: justify;">Semantic Analysis – process the syntax tree to make sure that it makes sense according to the rules of the language.  Type checking falls under this category, for instance, so this is where &#8220;warning, assigning &#8216;int&#8217; to &#8216;short&#8217; – possible loss of data&#8221; type errors come from.</div>
</li>
</ul>
<p>SimpleHLST only needs a very basic language and I&#8217;m not planning to have it support complex data, so I don&#8217;t think we&#8217;ll need a semantic analysis stage at the moment.</p>
<h2>An example</h2>
<p>Suppose we have the following mathematical expression, <em><span style="font-family: serif;">f(t) = 2t + k</span></em>, and we want to implement it in some program; we could write something like</p>
<p style="text-align: center;"><span style="font-family: Courier New;">f(t) = 2*t + k</span>.</p>
<p>Before any more advanced stages can take place, the compiler must extract the information from this expression.  This is exactly the same principle as we humans extracting information from text.  We have languages that each have a vocabulary of words with specific definitions which we can use to convey detailed information.  Each word only represents a small piece of information; the context and sentence structure contain the rest.</p>
<p style="text-align: left;">Programming languages are exactly the same and this is where <em>Lexical Analysis</em> (or, colloquially, <em>Tokenisation</em>) gets its name from.  Programming languages tend to use symbols, or <em>tokens</em>, instead of words for brevity.  The output of this stage would be an array that looks somewhat like the following table.</p>
<div align="center">
<table style="border-collapse: collapse; height: 140px;" border="0" width="346">
<colgroup>
<col style="width: 54px;"></col>
<col style="width: 66px;"></col>
<col style="width: 50px;"></col>
<col style="width: 54px;"></col>
<col style="width: 71px;"></col>
<col style="width: 50px;"></col>
</colgroup>
<tbody>
<tr>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-bottom: solid 1.5pt;">
<p style="text-align: center;"><strong>Index</strong></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-bottom: solid 1.5pt;">
<p style="text-align: center;"><strong>Type</strong></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-bottom: solid 1.5pt; border-right: solid 1.5pt;">
<p style="text-align: center;"><strong>Data</strong></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-left: none; border-bottom: solid 1.5pt;">
<p style="text-align: center;"><strong>Index</strong></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-bottom: solid 1.5pt;">
<p style="text-align: center;"><strong>Type</strong></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-bottom: solid 1.5pt;">
<p style="text-align: center;"><strong>Data</strong></p>
</td>
</tr>
<tr>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-top: none;">
<p style="text-align: center;">0</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-top: none;">Char</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-top: none; border-right: solid 1.5pt;">
<p style="text-align: center;"><span style="font-family: Courier New;">f</span></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-top: none; border-left: none;">
<p style="text-align: center;">5</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-top: none;">Number</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-top: none;">
<p style="text-align: center;"><span style="font-family: Courier New;">2</span></p>
</td>
</tr>
<tr>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;">1</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Symbol</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-right: solid 1.5pt;">
<p style="text-align: center;"><span style="font-family: Courier New;">(</span></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-left: none;">
<p style="text-align: center;">6</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Symbol</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;"><span style="font-family: Courier New;">*</span></p>
</td>
</tr>
<tr>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;">2</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Char</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-right: solid 1.5pt;">
<p style="text-align: center;"><span style="font-family: Courier New;">t</span></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-left: none;">
<p style="text-align: center;">7</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Char</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;"><span style="font-family: Courier New;">t</span></p>
</td>
</tr>
<tr>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;">3</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Symbol</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-right: solid 1.5pt;">
<p style="text-align: center;"><span style="font-family: Courier New;">)</span></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-left: none;">
<p style="text-align: center;">8</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Symbol</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;"><span style="font-family: Courier New;">+</span></p>
</td>
</tr>
<tr>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;">4</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Symbol</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-right: solid 1.5pt;">
<p style="text-align: center;"><span style="font-family: Courier New;">=</span></p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px; border-left: none;">
<p style="text-align: center;">9</p>
</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">Char</td>
<td style="padding-top: 4px; padding-left: 7px; padding-bottom: 4px; padding-right: 7px;">
<p style="text-align: center;"><span style="font-family: Courier New;">k</span></p>
</td>
</tr>
</tbody>
</table>
</div>
<p>Now the parser can operate on this set of tokens instead of having to wade through the text itself.  This example is, obviously, very simplistic, and having separate tokenising and parsing stages may even complicate things, but the split is incredibly useful for handling more complex languages such as C and Verilog.<span id="more-159"></span></p>
<h2>Finite Automata</h2>
<p>It&#8217;s very tempting to try and define all the jobs of the parser as regular expressions (regex) to begin with, but there&#8217;s a very commonly used construct that they cannot cope with: nested parentheses.  It&#8217;s easy to define a regex that can handle, say, &#8220;()&#8221; but that can&#8217;t adapt to &#8220;(())&#8221; and so on for arbitrary depth.  Matching nesting needs some form of memory, such as a counter or a stack, so that job is left to the parser; the tokeniser only has to recognise individual tokens.</p>
<p>Finite Automata (FA) are abstract state machines that read their input one symbol at a time.  Despite being very abstract, they describe exactly the same set of languages as regular expressions: any regex can be converted into an FA and vice versa.  Their advantage here is that they map naturally onto code.  Let&#8217;s consider a compiler reading in a keyword or a variable name.</p>
<p><a href="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/IdentifierStateDiagram.png"><img class="aligncenter size-full wp-image-165" title="State Diagram for Identifier DFA" src="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/IdentifierStateDiagram.png" alt="" width="312" height="203" /></a></p>
<p>This diagram shows a <em>Deterministic Finite Automaton</em> (DFA) that will transition into the terminal state when it encounters a letter and will stay in that state for every alphanumeric character or underscore it encounters thereafter.  Every state transition is clearly defined, therefore it is said to be deterministic, but this may not always be the case.</p>
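<p>As a rough illustration of the identifier DFA described above, here is a minimal Python sketch (not the C++ used by SimpleHLST).  The state names <span style="font-family: Courier New;">START</span>, <span style="font-family: Courier New;">IDENT</span> and <span style="font-family: Courier New;">REJECT</span> and both function names are assumptions made for this example, since the diagram does not label its states.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">
```python
import string

# Hypothetical state names; the diagram in the post does not label its states.
START, IDENT, REJECT = "start", "ident", "reject"

LETTERS = set(string.ascii_letters)
IDENT_CHARS = LETTERS | set(string.digits) | {"_"}


def step(state, ch):
    """Return the single next state for (state, ch): one outcome per input."""
    if state == START and ch in LETTERS:
        return IDENT
    if state == IDENT and ch in IDENT_CHARS:
        return IDENT
    # Anything else leaves the machine in a dead state it never escapes.
    return REJECT


def is_identifier(text):
    """Run the DFA over text and report whether it ends in the terminal state."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state == IDENT


print(is_identifier("in_1"), is_identifier("1in"))
```
</pre></div></div>

<p>Because <span style="font-family: Courier New;">step</span> returns exactly one state for every combination of state and character, the machine never has to guess, which is what makes it deterministic.  This sketch follows the description of the diagram and requires a leading letter.</p>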
<p><a href="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/StringStateDiagram.png"><img class="aligncenter size-full wp-image-166" title="State Diagram for String NFA" src="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/StringStateDiagram.png" alt="" width="266" height="177" /></a></p>
<p>One of the state transitions in this diagram is marked with <em>&epsilon;</em>, which means that it can be taken without consuming any input. This allows us to make very complex state transitions with optional states, possibly even bypassing large sections of the state diagram (much like the * and ? operators allow complex regular expressions). Unfortunately the next state is no longer uniquely defined by the current state and input, hence such machines are called Non-deterministic Finite Automata (NFA).</p>
<p><em>&epsilon;</em>-edges should be used with caution. It is perfectly acceptable (at least in theory) to have <em>several</em> <em>&epsilon;</em>-edges transitioning out of a single state, but this means that <em>all</em> of these edges must be followed at once. The first deterministic edge taken on any of these branches then defines the next state of the NFA, so all other branches must be discarded. In practice this makes NFAs unattractive for compiler front-ends because they potentially need large amounts of memory and computation. It is possible to convert NFAs into DFAs but I think I&#8217;ve rambled on long enough about the theory now.</p>
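<p>To make the cost of following several edges at once more concrete, here is a small Python sketch of one common way to simulate an NFA: keep a <em>set</em> of live states instead of a single one.  The machine below is a made-up example (an optional minus sign followed by digits), not the string NFA from the diagram, and every name in it is an assumption for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">
```python
# Hypothetical NFA: an optional minus sign followed by one or more digits.
EPSILON = None

TRANSITIONS = {
    (0, "-"): {1},
    (0, EPSILON): {1},   # the sign is optional, so state 0 can slip to state 1
    (1, "digit"): {2},
    (2, "digit"): {2},
}
ACCEPTING = {2}


def classify(ch):
    """Map a character onto the edge labels used in TRANSITIONS."""
    return "digit" if ch in "0123456789" else ch


def epsilon_closure(states):
    """Add every state reachable through epsilon edges alone."""
    closure = set(states)
    pending = list(states)
    while pending:
        state = pending.pop()
        for nxt in TRANSITIONS.get((state, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                pending.append(nxt)
    return closure


def accepts(text):
    """Track the whole set of live states, since several edges may apply."""
    current = epsilon_closure({0})
    for ch in text:
        label = classify(ch)
        moved = set()
        for state in current:
            moved.update(TRANSITIONS.get((state, label), set()))
        current = epsilon_closure(moved)
    return not current.isdisjoint(ACCEPTING)


print(accepts("-7"), accepts("42"), accepts("-"))
```
</pre></div></div>

<p>The set <span style="font-family: Courier New;">current</span> is where the extra memory and computation come from: in the worst case it can hold every state of the machine, and it has to be rebuilt for each input character.</p>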
<h2>From Theory to Code</h2>
<p>Theory is useful and worth understanding but it can sometimes be difficult to map onto actual code.  I learned a lot from <a title="Writing a parser: overview" href="http://blog.tcx.be/2007/05/writing-parser-overview.html" target="_blank">Tommy Carlier&#8217;s blog series</a> before I wrote my own lexer so it might be worth looking there as well to fill in any gaps I leave.  There are several ways to implement finite automata in code and the final decision will probably depend on the language you&#8217;re using.  I&#8217;ve decided to write SimpleHLST in C++ which might not be the best decision but it&#8217;s the language I feel most comfortable with.</p>
<p style="text-align: center;"><span style="font-family: Courier New;">output = 15*in_1 + 39*in_2<br />
</span></p>
<p>If we look at the above expression there are actually only three types of token (words, integers and symbols), so the finite automaton only needs three states.  The SimpleHLST tokeniser implements each <em>state</em> as a function call inside the main worker function, <span style="font-family: Courier New;">readNextToken</span>, as shown below.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> Tokeniser<span style="color: #008080;">::</span><span style="color: #007788;">readNextToken</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #666666;">// Skip over all whitespace</span>
    munchWhitespace <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Make sure we don't advance beyond the end of file</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>atEnd <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
        m_curTokType <span style="color: #000080;">=</span> T_EOF<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">else</span> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span><span style="color: #0000dd;">isalpha</span> <span style="color: #008000;">&#40;</span>m_char<span style="color: #008000;">&#41;</span> <span style="color: #000040;">||</span> m_char <span style="color: #000080;">==</span> <span style="color: #FF0000;">'_'</span><span style="color: #008000;">&#41;</span>
        readWordToken <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">else</span> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span><span style="color: #0000dd;">isdigit</span> <span style="color: #008000;">&#40;</span>m_char<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
        readNumericalToken <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">else</span> <span style="color: #666666;">// Assume it must be a symbol</span>
        readSymbolToken <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Return the token type</span>
    <span style="color: #0000ff;">return</span> m_curTokType<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>Before attempting to read a token, this function skips over all whitespace using the function shown below.  The obvious improvement this function needs is to be able to skip over comments.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> Tokeniser<span style="color: #008080;">::</span><span style="color: #007788;">munchWhitespace</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #666666;">// Just keep reading until there's no whitespace left.</span>
    <span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span><span style="color: #0000dd;">isspace</span> <span style="color: #008000;">&#40;</span>m_char<span style="color: #008000;">&#41;</span> <span style="color: #000040;">&amp;&amp;</span> <span style="color: #000040;">!</span>m_eof<span style="color: #008000;">&#41;</span>
        readNextChar <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
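<p>The comment-skipping improvement mentioned above could look something like the following Python sketch.  It assumes <span style="font-family: Courier New;">//</span> line comments, which is only a guess at what the SimpleHLST input language will use, and the function name is hypothetical; it works on a string index rather than the character stream used by the C++ class.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">
```python
def skip_whitespace_and_comments(text, pos=0):
    """Return the index of the first character that starts a real token."""
    while pos != len(text):
        if text[pos].isspace():
            pos += 1
        elif text.startswith("//", pos):
            # Assumed comment style: discard everything up to the newline.
            newline = text.find("\n", pos)
            pos = len(text) if newline == -1 else newline
        else:
            break
    return pos


source = "  // scale the input\n  output = 15*in_1"
start = skip_whitespace_and_comments(source)
print(source[start:])
```
</pre></div></div>

<p>Looping until neither whitespace nor a comment is found matters because the two can alternate any number of times before the next token.</p>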

<p>Finally, <span style="font-family: Courier New;">readWordToken</span> is listed below, which is capable of extracting &#8220;output&#8221; and &#8220;in_1&#8221; from the expression given earlier.  The <span style="font-family: Courier New;">pushBuffer</span> and <span style="font-family: Courier New;">extractBuffer</span> functions operate on a <span style="font-family: Courier New;">std::string</span> member variable that is used to store words and integers.  An accessor function, <span style="font-family: Courier New;">tokenData</span>, can then be used to find out what the token data is.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> Tokeniser<span style="color: #008080;">::</span><span style="color: #007788;">readWordToken</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #666666;">// Keep reading alphanumeric characters into the buffer.</span>
    <span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span><span style="color: #0000dd;">isalnum</span> <span style="color: #008000;">&#40;</span>m_char<span style="color: #008000;">&#41;</span> <span style="color: #000040;">||</span> m_char <span style="color: #000080;">==</span> <span style="color: #FF0000;">'_'</span><span style="color: #008000;">&#41;</span>
        pushBuffer <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Store the token details</span>
    extractBuffer <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    m_curTokType <span style="color: #000080;">=</span> T_WORD<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
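<p>To see the three token types working together, here is a compact Python sketch of the same dispatch idea applied to the expression given earlier.  It is not a port of the downloadable C++ source: <span style="font-family: Courier New;">T_INT</span>, <span style="font-family: Courier New;">T_SYMBOL</span> and the <span style="font-family: Courier New;">tokenise</span> function are names assumed for this example, and symbols are treated as single characters.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">
```python
# T_WORD mirrors the constant in the post; the other two names are assumed.
T_WORD, T_INT, T_SYMBOL = "word", "int", "symbol"


def tokenise(text):
    """Split text into (type, data) pairs: words, integers and symbols."""
    tokens = []
    pos = 0
    while pos != len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        start = pos
        if ch.isalpha() or ch == "_":
            # Word state: keep consuming alphanumerics and underscores.
            while pos != len(text) and (text[pos].isalnum() or text[pos] == "_"):
                pos += 1
            tokens.append((T_WORD, text[start:pos]))
        elif ch.isdigit():
            # Integer state: keep consuming digits.
            while pos != len(text) and text[pos].isdigit():
                pos += 1
            tokens.append((T_INT, text[start:pos]))
        else:
            # Symbol state: assume every symbol is one character long.
            pos += 1
            tokens.append((T_SYMBOL, ch))
    return tokens


for token in tokenise("output = 15*in_1 + 39*in_2"):
    print(token)
```
</pre></div></div>

<p>Multi-character symbols would need a longer symbol state, which this sketch leaves out.</p>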

<p><a href="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/SimpleHLST-part1-lexer.zip">You can download the source code for the tokeniser in the form of a Visual Studio 2010 solution</a>.  I haven&#8217;t used any features particular to VS2010 so it should be possible to build with any prior version or g++ but I can&#8217;t confirm this.  I&#8217;ve also included some sample code that I hope will become the HDL we&#8217;ll feed into SimpleHLST at a later stage.</p>
<p><a href="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/SimpleHLST-Lexer.png"><img class="aligncenter size-medium wp-image-160" title="SimpleHLST" src="http://www.kierdugan.com/wp/wp-content/uploads/2011/03/SimpleHLST-Lexer-300x188.png" alt="" width="300" height="188" /></a></p>
<p>To try it out: build the project and open a command prompt in the same directory as the SimpleHLST executable.  Then type <span style="font-family: Courier New;">SimpleHLST &lt;expression or filename&gt;</span> on the command line to see the output. The picture above shows some sample output.</p>
<h2>T_EOF</h2>
<p>Now that we can read a stream in terms of tokens instead of characters, we can get started on the parser! Next time, of course…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kierdugan.com/2011/03/11/simplehlst-part-2-lexical-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PowerPython&#8230; or PythonPoint&#8230; or something</title>
		<link>http://www.kierdugan.com/2010/07/08/powerpython-or-pythonpoint-or-something/</link>
		<comments>http://www.kierdugan.com/2010/07/08/powerpython-or-pythonpoint-or-something/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 23:17:44 +0000</pubDate>
		<dc:creator>Kier</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Uni]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[COM]]></category>
		<category><![CDATA[Email]]></category>
		<category><![CDATA[IMAP]]></category>
		<category><![CDATA[PowerPoint]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[pythoncom]]></category>
		<category><![CDATA[win32com]]></category>

		<guid isPermaLink="false">http://www.kierdugan.com/?p=96</guid>
		<description><![CDATA[I&#8217;ve been meaning to update this for a fair while now as an uncharacteristically large amount of stuff has happened.<a href="http://www.kierdugan.com/2010/07/08/powerpython-or-pythonpoint-or-something/" class="searchmore">Read the Rest...</a><div class="clr"></div>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been meaning to update this for a fair while now as an uncharacteristically large amount of stuff has happened. Since exams finished I&#8217;ve managed to get a job at <a title="Electronics and Computer Science" href="http://www.ecs.soton.ac.uk/" target="_blank">ECS</a> working for two of my lecturers on two separate projects, which is pretty good because it means my work is varied. Both are IC design projects though, so there is a similar vein running through them.</p>
<p>One of my minor duties on this dual-job is to assemble slides from about twelve people into a large presentation, with cover slides for each speaker, every Friday for a progress meeting we all have. Naturally the first Friday I just did it by hand by importing each one in turn into PowerPoint. However it is a fairly tedious job, and to paraphrase a certain member of staff: why do something by hand when I have a powerful computer under the desk?</p>
<p>So I began to investigate automating the process.</p>
<p>Turns out that <a title="Python" href="http://www.python.org" target="_blank">Python</a> has an <a title="imaplib" href="http://docs.python.org/library/imaplib.html" target="_blank">IMAP module</a> in its standard library, which isn&#8217;t <em>too </em>surprising I suppose as the Python standard library is <strong>enormous</strong>. After some playing I managed to write a program that logged into my university email account and downloaded the appropriate PowerPoint attachments.</p>
<p><span id="more-96"></span></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">imaplib</span> <span style="color: #ff7700;font-weight:bold;">import</span> IMAP4_SSL
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">email</span> <span style="color: #ff7700;font-weight:bold;">import</span> message_from_string
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">datetime</span> <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">datetime</span>, timedelta
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
&nbsp;
M = IMAP4_SSL <span style="color: black;">&#40;</span><span style="color: #483d8b;">'imapserver'</span><span style="color: black;">&#41;</span>
M.<span style="color: black;">login</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'user'</span>, <span style="color: #483d8b;">'pass'</span><span style="color: black;">&#41;</span>
M.<span style="color: #dc143c;">select</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
since = <span style="color: black;">&#40;</span><span style="color: #dc143c;">datetime</span>.<span style="color: black;">today</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> - timedelta <span style="color: black;">&#40;</span>days=<span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>.<span style="color: black;">strftime</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%d-%b-%Y&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> <span style="color: #dc143c;">user</span> <span style="color: #ff7700;font-weight:bold;">in</span> users:
    typ, data = M.<span style="color: black;">search</span> <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #483d8b;">'(HEADER FROM &quot;%s&quot; SINCE %s)'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>, since<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">for</span> <span style="color: #008000;">id</span> <span style="color: #ff7700;font-weight:bold;">in</span> data<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">split</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        typ, data = M.<span style="color: black;">fetch</span> <span style="color: black;">&#40;</span><span style="color: #008000;">id</span>, <span style="color: #483d8b;">'RFC822'</span><span style="color: black;">&#41;</span>
        msg = message_from_string <span style="color: black;">&#40;</span>data<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> msg.<span style="color: black;">is_multipart</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">continue</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">for</span> part <span style="color: #ff7700;font-weight:bold;">in</span> msg.<span style="color: black;">walk</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
            fn  = part.<span style="color: black;">get_filename</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            ext = <span style="color: #dc143c;">re</span>.<span style="color: black;">match</span> <span style="color: black;">&#40;</span>r<span style="color: #483d8b;">'.*<span style="color: #000099; font-weight: bold;">\.</span>(pptx?)$'</span>, fn <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> ext:
                <span style="color: #ff7700;font-weight:bold;">continue</span>
&nbsp;
            fp = <span style="color: #008000;">file</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'%s.%s.temp'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>, ext.<span style="color: black;">group</span> <span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>, <span style="color: #483d8b;">'w'</span><span style="color: black;">&#41;</span>
            fp.<span style="color: black;">write</span> <span style="color: black;">&#40;</span>part.<span style="color: black;">get_payload</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            fp.<span style="color: black;">close</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
M.<span style="color: black;">close</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This code would download any <code>.ppt</code> or <code>.pptx</code> attachment from anyone with an email address in the <code>users</code> list and save the file as <code>&lt;user&gt;.ppt.temp</code> or <code>&lt;user&gt;.pptx.temp</code>. It should be possible to decode the message part with <a title="email.message.Message.get_payload" href="http://docs.python.org/library/email.message.html#email.message.Message.get_payload" target="_blank"><code>msg.get_payload (decode=True)</code></a> but it seemed to introduce a fair number of errors into the file. I think this is because it seems to convert line-by-line instead of as one large block. So I used the following code to fix this.</p>

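<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># Needed at the top of the script for the decoders used below</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">base64</span> <span style="color: #ff7700;font-weight:bold;">import</span> b64decode
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">quopri</span>
&nbsp;
raw      = part.<span style="color: black;">get_payload</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>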
raw      = <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span> <span style="color: black;">&#40;</span>r<span style="color: #483d8b;">'[<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>]+'</span>, <span style="color: #483d8b;">''</span>, raw<span style="color: black;">&#41;</span>
encoding = part<span style="color: black;">&#91;</span><span style="color: #483d8b;">'Content-Transfer-Encoding'</span><span style="color: black;">&#93;</span>.<span style="color: black;">lower</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">if</span> encoding == <span style="color: #483d8b;">'base64'</span>:
    raw = b64decode <span style="color: black;">&#40;</span>raw<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">elif</span> encoding == <span style="color: #483d8b;">'quoted-printable'</span>:
    raw = <span style="color: #dc143c;">quopri</span>.<span style="color: black;">decodestring</span> <span style="color: black;">&#40;</span>raw<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">else</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;ERROR: Unknown coding strategy - ignoring.&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">continue</span></pre></div></div>

<p>Not particularly pretty, but it does the job.</p>
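<p>For anyone on Python 3, here is a minimal sketch of the same idea as a stand-alone function. The name <code>decode_payload</code> is my own invention for illustration, not part of the <code>email</code> package. One deliberate difference: it only strips line breaks for base64, because in quoted-printable the line breaks carry meaning.</p>

```python
import base64
import quopri
import re


def decode_payload(raw, encoding):
    """Decode a MIME part body given its Content-Transfer-Encoding.

    Hypothetical helper for illustration: `raw` is the text returned by
    part.get_payload() and `encoding` is the header value (may be None).
    """
    encoding = (encoding or "").lower()
    if encoding == "base64":
        # Join the wrapped transfer lines into one block before decoding.
        joined = re.sub(r"[\r\n]+", "", raw)
        return base64.b64decode(joined)
    if encoding == "quoted-printable":
        # Line breaks are significant here, so the text is left untouched.
        return quopri.decodestring(raw.encode("ascii"))
    raise ValueError("unknown transfer encoding: %r" % encoding)


# Build a wrapped base64 payload like the ones found in an email.
original = b"hello world" * 10
payload = base64.encodebytes(original).decode("ascii")
print(decode_payload(payload, "base64") == original)
```

<p>Raising an exception instead of printing and carrying on leaves it to the caller to decide whether an unknown encoding should skip the attachment or stop the run.</p>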
<p>So now I had all the attachments downloaded, but I still needed to merge them into a single file. At first I was thinking of writing a C++ program to drive the PowerPoint COM interface, but then I found <a title="Python for Windows Extensions" href="http://starship.python.net/crew/mhammond/win32/" target="_blank">Python for Windows Extensions</a>, which fully supports COM!</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">pythoncom.<span style="color: black;">CoInitializeEx</span> <span style="color: black;">&#40;</span>pythoncom.<span style="color: black;">COINIT_APARTMENTTHREADED</span><span style="color: black;">&#41;</span>
gencache.<span style="color: black;">EnsureModule</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'{2DF8D04C-5BFA-101B-BDE5-00AA0044DE52}'</span>, <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>
gencache.<span style="color: black;">EnsureDispatch</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;PowerPoint.Application.12&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Create an instance of PowerPoint and presentation</span>
pp = Dispatch <span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;PowerPoint.Application.12&quot;</span><span style="color: black;">&#41;</span>
pp.<span style="color: black;">Activate</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
pres = pp.<span style="color: black;">Presentations</span>.<span style="color: black;">Add</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Insert all the downloaded presentations</span>
count = <span style="color: #ff4500;">1</span>
<span style="color: #ff7700;font-weight:bold;">for</span> filename <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">listdir</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'presentations'</span><span style="color: black;">&#41;</span>:
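    pres.<span style="color: black;">Slides</span>.<span style="color: black;">InsertFromFile</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">realpath</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">join</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'presentations'</span>, filename<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>, count<span style="color: black;">&#41;</span>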
    count = pres.<span style="color: black;">Slides</span>.<span style="color: black;">Count</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Save and exit</span>
pres.<span style="color: black;">SaveAs</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'compiled.pptx'</span><span style="color: black;">&#41;</span>
pres.<span style="color: black;">Close</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
pp.<span style="color: black;">Quit</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
pythoncom.<span style="color: black;">CoUninitialize</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>I was quite shocked at how straightforward it was to automate PowerPoint from Python, but this solution wasn&#8217;t quite good enough yet. Using <a title="MSDN - InsertFromFile" href="http://msdn.microsoft.com/en-us/library/bb265418%28v=office.12%29.aspx" target="_blank"><code>InsertFromFile</code></a> means that the imported presentation acquires the formatting of <code>pres</code> which is not what I wanted. Also, there appears to be a bug in PowerPoint 2007 which causes image references to be broken when importing from a <code>.pptx</code> into a <code>.pptx</code> with the COM interface.</p>
<p>Searching for a solution to the <em>import with formatting</em> issue led me to <a title="CopyWithSourceFormating" href="http://skp.mvps.org/pptxp001.htm" target="_blank">this awesome VBA function</a> which has been referenced many, many times. I ported this to Python and it worked perfectly! There was still the weird image problem, but I used a really crude fix for that:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">for</span> filename <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#91;</span>f <span style="color: #ff7700;font-weight:bold;">for</span> f <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">listdir</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'presentations'</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> f<span style="color: black;">&#91;</span>-<span style="color: #ff4500;">4</span>:<span style="color: black;">&#93;</span> == <span style="color: #483d8b;">'pptx'</span><span style="color: black;">&#93;</span>:
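    pres = pp.<span style="color: black;">Presentations</span>.<span style="color: black;">Open</span> <span style="color: black;">&#40;</span>path.<span style="color: black;">realpath</span> <span style="color: black;">&#40;</span>path.<span style="color: black;">join</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'presentations'</span>, filename<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    pres.<span style="color: black;">SaveAs</span> <span style="color: black;">&#40;</span>path.<span style="color: black;">realpath</span> <span style="color: black;">&#40;</span>path.<span style="color: black;">join</span> <span style="color: black;">&#40;</span><span style="color: #483d8b;">'presentations'</span>, filename<span style="color: black;">&#91;</span>:-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>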
    pres.<span style="color: black;">Close</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Yes. I converted all the <code>.pptx</code> files into <code>.ppt</code> files. Nothing intelligent here. Finally, I wrote a function to replace all the fonts added by the import with Arial. Job done.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">fonts = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span> <span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, pres.<span style="color: black;">Fonts</span>.<span style="color: black;">Count</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>:
    fonts<span style="color: black;">&#91;</span>pres.<span style="color: black;">Fonts</span>.<span style="color: black;">Item</span> <span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>.<span style="color: black;">Name</span><span style="color: black;">&#93;</span> = <span style="color: #ff4500;">1</span>
fonts = fonts.<span style="color: black;">keys</span> <span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
fonts.<span style="color: black;">remove</span> <span style="color: black;">&#40;</span>u<span style="color: #483d8b;">'Arial'</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> font <span style="color: #ff7700;font-weight:bold;">in</span> fonts:
    pres.<span style="color: black;">Fonts</span>.<span style="color: black;">Replace</span> <span style="color: black;">&#40;</span>font, u<span style="color: #483d8b;">'Arial'</span><span style="color: black;">&#41;</span></pre></div></div>
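
<p>A small aside for Python 3 readers: there <code>dict.keys ()</code> returns a view with no <code>remove</code> method, and a list's <code>remove</code> raises <code>ValueError</code> if Arial happens not to be in it. The sketch below does the same de-duplication with a set. The function name <code>fonts_to_replace</code> and the sample font names are made up for illustration; in the real program the names would come from <code>pres.Fonts</code>.</p>

```python
def fonts_to_replace(font_names, keep="Arial"):
    """Return the unique font names that should be swapped for `keep`."""
    # A set drops duplicates; subtracting {keep} is safe even if it is absent.
    # Sorting only makes the order predictable.
    return sorted(set(font_names) - {keep})


# Hypothetical names, standing in for pres.Fonts.Item (i).Name
names = ["Calibri", "Arial", "Times New Roman", "Calibri"]
for font in fonts_to_replace(names):
    print("would replace %s with Arial" % font)
```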

<p>So after all that I have a Python program that logs into my email, downloads a load of PowerPoint files, converts them all to <code>.ppt</code> format, inserts them into a blank presentation, and then normalises the font to Arial. Not bad for a few hundred lines of code!</p>
<p>I did change the program a bit so that it would copy a template with title slides in and add the presentations to that instead of a blank file, and then set the date appropriately on the main title slide. But the key point here is that I&#8217;ve replaced my tedious Friday-morning activity with a single command: <code>makepres</code>.</p>
<p>Victory.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kierdugan.com/2010/07/08/powerpython-or-pythonpoint-or-something/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter Me Xerces!</title>
		<link>http://www.kierdugan.com/2010/03/25/twitter-me-xerces/</link>
		<comments>http://www.kierdugan.com/2010/03/25/twitter-me-xerces/#comments</comments>
		<pubDate>Thu, 25 Mar 2010 00:03:42 +0000</pubDate>
		<dc:creator>Kier</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Random]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[libcurl]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[Xerces]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.kierdugan.com/?p=58</guid>
		<description><![CDATA[Following in the spirit of yesterday&#8217;s post, little victories&#8230;
Yesterday I managed to download the front page of my website using<a href="http://www.kierdugan.com/2010/03/25/twitter-me-xerces/" class="searchmore">Read the Rest...</a><div class="clr"></div>]]></description>
			<content:encoded><![CDATA[<p>Following in the spirit of yesterday&#8217;s post, little victories&#8230;</p>
<p>Yesterday I managed to download the front page of my website using <a href="http://curl.haxx.se/libcurl/" target="_blank">libcurl</a>. As good as that was as a learning experience, it wasn&#8217;t interesting or useful in the slightest. Today, however, I decided to see if I could fetch my status updates from Twitter and display them in a program. So I had a look at the API documentation and it looked quite easy to use, with the exception of OAuth, which I&#8217;ve yet to get my head around. Thankfully, for now, basic authentication is still supported.</p>
<p>The Twitter API uses the REST (REpresentational State Transfer) paradigm, which means the server keeps no session <em>state</em> between requests; i.e. each transaction is considered separately. It is also built on HTTP, which is pretty simple to understand. Basically, in a REST protocol the URIs are objects in the system, and the HTTP verbs are how you interact with them. So a GET on <span style="font-family: Courier New;">http://server/article?name=REST</span> would download an <em>article</em> named <em>REST</em>. Simple, eh? Check <a href="http://www.codeproject.com/KB/architecture/RESTWebServicesPart2.aspx" target="_blank">this article</a> if you&#8217;re interested.</p>
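<p>To make the URI-plus-verb idea concrete, here is a minimal Python sketch that builds (but never sends) that request. The host <span style="font-family: Courier New;">server</span> is just the placeholder from the example above, and <span style="font-family: Courier New;">article_request</span> is a name I made up for illustration.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request


def article_request(name, method="GET"):
    """Build (but do not send) a request for the article resource."""
    # http://server/article is the placeholder URI from the text, not a real host.
    url = "http://server/article?" + urlencode({"name": name})
    return Request(url, method=method)


req = article_request("REST")
# The URI names the object; the verb says what to do with it.
print(req.get_method(), req.full_url)
```

<p>Swapping the verb (say, to DELETE) while leaving the URI alone is the whole trick: the same object, a different action.</p>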
<p>Anyway, onto the meat &#8216;n&#8217; taters. Data in a REST transaction is typically encoded as XML or JSON. I considered downloading <a href="http://pyyaml.org/wiki/LibYAML" target="_blank">LibYAML</a> (a YAML parser should cope with most JSON) and taking the JSON route, but a) I already had <a href="http://xerces.apache.org/xerces-c" target="_blank">Xerces</a>, b) I understand XML better than JSON, and c) I couldn&#8217;t be bothered to learn yet another new thing.</p>
<p><span id="more-58"></span>Xerces is incredibly well written. If you look at the class listings of Xerces or <a href="http://xml.apache.org/xalan-c/" target="_blank">Xalan</a> you&#8217;ll appreciate they&#8217;re both <strong>enormous</strong> and support basically everything. In fact, right out of the box Xerces supports fetching XML documents over the internet using HTTP GET. I chose not to use this purely because I wanted to use libcurl. Thankfully libcurl is surprisingly easy to use:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">static</span> <span style="color: #0000ff;">size_t</span> _CurlWriteCB <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #000040;">*</span> ptr, <span style="color: #0000ff;">size_t</span> nLen, <span style="color: #0000ff;">size_t</span> cbElem,
                            CMemFile<span style="color: #000040;">*</span> pFile<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">size_t</span> cbSizeAtStart<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">size_t</span> cbSizeAtEnd<span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Write data to file, but measure buffer size before and after.</span>
    cbSizeAtStart <span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">size_t</span><span style="color: #008000;">&#41;</span>pFile<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>GetLength <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    pFile<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>Write <span style="color: #008000;">&#40;</span>ptr, <span style="color: #008000;">&#40;</span>UINT<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#40;</span>nLen <span style="color: #000040;">*</span> cbElem<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    cbSizeAtEnd   <span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">size_t</span><span style="color: #008000;">&#41;</span>pFile<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>GetLength <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Return the difference in buffer size, i.e. number of bytes written.</span>
    <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span>cbSizeAtEnd <span style="color: #000040;">-</span> cbSizeAtStart<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
BYTE<span style="color: #000040;">*</span> GetStatusesFromTwitter <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> szUserName, UINT<span style="color: #000040;">&amp;</span> uiSize<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #666666;">// Attempt to initialise curl</span>
    CURL<span style="color: #000040;">*</span> curl <span style="color: #000080;">=</span> curl_easy_init <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>curl <span style="color: #000040;">!</span><span style="color: #000080;">=</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        <span style="color: #666666;">// Set up the http target</span>
        CString strFmt<span style="color: #008080;">;</span>
        strFmt.<span style="color: #007788;">Format</span> <span style="color: #008000;">&#40;</span>IDS_TWITTER_STATUS, szUserName<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_URL, strFmt.<span style="color: #007788;">GetString</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        <span style="color: #666666;">// Save the result into memory for now.</span>
        CMemFile buffer<span style="color: #008080;">;</span>
        curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_WRITEFUNCTION, _CurlWriteCB<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_WRITEDATA,     <span style="color: #000040;">&amp;</span>buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        <span style="color: #666666;">// Attempt to grab the data from Twitter.</span>
        CURLcode res <span style="color: #000080;">=</span> curl_easy_perform <span style="color: #008000;">&#40;</span>curl<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        curl_easy_cleanup <span style="color: #008000;">&#40;</span>curl<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        <span style="color: #666666;">// Return the data.</span>
        <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>res <span style="color: #000080;">==</span> <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
            uiSize <span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span>UINT<span style="color: #008000;">&#41;</span>buffer.<span style="color: #007788;">GetLength</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
            <span style="color: #0000ff;">return</span> buffer.<span style="color: #007788;">Detach</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #008000;">&#125;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">NULL</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>The above listing will download a user&#8217;s tweets and store them in a growable buffer (see <a href="http://msdn.microsoft.com/en-us/library/tzdxd4x0.aspx" target="_blank">CMemFile</a>). But now we have to present this to Xerces in a way that it will understand. Thankfully we can supply an arbitrary <a href="http://xerces.apache.org/xerces-c/apiDocs-2/classInputSource.html" target="_blank">InputSource</a> to a <a href="http://xerces.apache.org/xerces-c/apiDocs-2/classXercesDOMParser.html" target="_blank">DOMParser</a>, including one that will <a href="http://xerces.apache.org/xerces-c/apiDocs-2/classMemBufInputSource.html" target="_blank">wrap a piece of memory</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">bool</span> DoGetStatuses <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> szUserName<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #666666;">// Query Twitter</span>
    UINT  uiSize<span style="color: #008080;">;</span>
    BYTE<span style="color: #000040;">*</span> pbData <span style="color: #000080;">=</span> GetStatusesFromTwitter <span style="color: #008000;">&#40;</span>szUserName, uiSize<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>pbData <span style="color: #000080;">==</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span>
        <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">false</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Move the memory into an object Xerces understands.</span>
    MemBufInputSource<span style="color: #000040;">*</span> pDataSrc <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> MemBufInputSource
        <span style="color: #008000;">&#40;</span>pbData, uiSize, L<span style="color: #FF0000;">&quot;TwitterXML&quot;</span>, <span style="color: #0000ff;">true</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Parse the data</span>
    XercesDOMParser parser<span style="color: #008080;">;</span>
    parser.<span style="color: #007788;">setValidationScheme</span> <span style="color: #008000;">&#40;</span>XercesDOMParser<span style="color: #008080;">::</span><span style="color: #007788;">Val_Never</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    parser.<span style="color: #007788;">setDoNamespaces</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">false</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    parser.<span style="color: #007788;">setDoSchema</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">false</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    parser.<span style="color: #007788;">setDoValidation</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">false</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    parser.<span style="color: #007788;">parse</span> <span style="color: #008000;">&#40;</span><span style="color: #000040;">*</span><span style="color: #008000;">&#40;</span>InputSource<span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span>pDataSrc<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Get the root node</span>
    DOMDocument<span style="color: #000040;">*</span> pDoc <span style="color: #000080;">=</span> parser.<span style="color: #007788;">getDocument</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">//...</span>
&nbsp;
    <span style="color: #666666;">// Free memory.</span>
    <span style="color: #0000dd;">delete</span> pDataSrc<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">true</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>I was especially lazy in that last listing actually, because I told Xerces to <em>adopt</em> my buffer which means it&#8217;ll free it for me when it&#8217;s finished with it. Curiously though, even a memory object needs a <em>system id</em> which is the purpose <span style="font-family: Courier New;">L&quot;TwitterXML&quot;</span> serves. With a <a href="http://xerces.apache.org/xerces-c/apiDocs-2/classDOMDocument.html" target="_blank">DOMDocument</a> in memory it was trivial to add the statuses to a list box.</p>
<p><a href="http://www.kierdugan.com/wp/wp-content/uploads/2010/03/XercesTest.jpg"><img class="aligncenter size-medium wp-image-59" title="XercesTest" src="http://www.kierdugan.com/wp/wp-content/uploads/2010/03/XercesTest-300x185.jpg" alt="" width="300" height="185" /></a></p>
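<p>For comparison, the buffer-then-parse flow can be sketched in a few lines of Python using only the standard library. This is not the Xerces or libcurl API: <span style="font-family: Courier New;">write_callback</span> and <span style="font-family: Courier New;">status_texts</span> are hypothetical names, and the XML is a made-up stand-in for a real response.</p>

```python
import io
from xml.dom import minidom


def write_callback(chunk, buffer):
    """Append one downloaded chunk and report how many bytes were stored."""
    return buffer.write(chunk)


def status_texts(xml_bytes):
    """Parse an in-memory XML document and return the text of each <text> node."""
    doc = minidom.parseString(xml_bytes)
    return [node.firstChild.data
            for node in doc.getElementsByTagName("text")
            if node.firstChild is not None]


# Made-up response, delivered in two chunks as a transfer might do.
chunks = [b"<statuses><status><text>Hello</text></status>",
          b"<status><text>World</text></status></statuses>"]
buffer = io.BytesIO()
for chunk in chunks:
    write_callback(chunk, buffer)

# The parser only ever sees the finished block of memory.
print(status_texts(buffer.getvalue()))
```

<p>The shape is the same as in the C++ listings: a callback that grows a buffer and returns the byte count, then a parser that is handed the finished block of memory.</p>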
<p>I was quite surprised at how complex a task I&#8217;d achieved given the effort I&#8217;d put in; hats off to both Xerces and libcurl. Now that I&#8217;d managed to list my tweets, naturally the next step is to try and submit one! So I made a new dialog for the occasion:</p>
<p><a href="http://www.kierdugan.com/wp/wp-content/uploads/2010/03/XercesTestPost.jpg"><img class="aligncenter size-full wp-image-60" title="XercesTestPost" src="http://www.kierdugan.com/wp/wp-content/uploads/2010/03/XercesTestPost.jpg" alt="" width="210" height="226" /></a></p>
<p>Clicking OK causes some magic to happen:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> CPostStatusDlg<span style="color: #008080;">::</span><span style="color: #007788;">DoStatusUpdate</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> cszUrl <span style="color: #000080;">=</span>
        <span style="color: #FF0000;">&quot;http://api.twitter.com/1/statuses/update.xml&quot;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Attempt to initialise cURL.</span>
    CURL<span style="color: #000040;">*</span> curl <span style="color: #000080;">=</span> curl_easy_init <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>curl <span style="color: #000080;">==</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span>
        <span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Configure the authentication</span>
    CString strFmt<span style="color: #008080;">;</span>
    strFmt.<span style="color: #007788;">Format</span> <span style="color: #008000;">&#40;</span>_T<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;%s:%s&quot;</span><span style="color: #008000;">&#41;</span>, m_strUserName, m_strPassword<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_USERPWD,  <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span>strFmt.<span style="color: #007788;">GetString</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Format the entire string in C form.</span>
    <span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> szStatus <span style="color: #000080;">=</span> curl_easy_escape <span style="color: #008000;">&#40;</span>curl, m_strStatus.<span style="color: #007788;">GetString</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
        m_strStatus.<span style="color: #007788;">GetLength</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">char</span> szPostBody<span style="color: #008000;">&#91;</span><span style="color: #0000ff;">BUFSIZ</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    <span style="color: #0000dd;">sprintf</span> <span style="color: #008000;">&#40;</span>szPostBody, <span style="color: #FF0000;">&quot;status=%s&quot;</span>, szStatus<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_free <span style="color: #008000;">&#40;</span>szStatus<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Set up the HTTP connection and use the POST method</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_POST,           <span style="color: #0000dd;">1L</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_POSTFIELDSIZE,  <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">long</span><span style="color: #008000;">&#41;</span><span style="color: #0000dd;">strlen</span> <span style="color: #008000;">&#40;</span>szPostBody<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_POSTFIELDS,     szPostBody<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Finally, set the callback function and the URL.</span>
    CMemFile buffer<span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_WRITEFUNCTION, _CurlWriteCB<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_WRITEDATA,     <span style="color: #000040;">&amp;</span>buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_setopt <span style="color: #008000;">&#40;</span>curl, CURLOPT_URL,           cszUrl<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Now we can execute at last!</span>
    <span style="color: #0000ff;">long</span> nResponse <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
    CURLcode res <span style="color: #000080;">=</span> curl_easy_perform <span style="color: #008000;">&#40;</span>curl<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_getinfo <span style="color: #008000;">&#40;</span>curl, CURLINFO_RESPONSE_CODE, <span style="color: #000040;">&amp;</span>nResponse<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    curl_easy_cleanup <span style="color: #008000;">&#40;</span>curl<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #666666;">// Return the HTTP response code (libcurl reports it as a long).</span>
    <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span><span style="color: #008000;">&#41;</span>nResponse<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>Boom! Tweet submitted!</p>
<p>In the above listing, the growable buffer and the callback are there mostly to eat the output from libcurl, because we don&#8217;t really care about it. CMemFile also frees the memory it allocated when the function returns, which saves hassle. I originally wrote all the code listings with Unicode in mind, which is why they might look a bit odd. libcurl is an ANSI C library, so you may need to convert your strings for it to work. Thankfully Xerces includes <a href="http://xerces.apache.org/xerces-c/apiDocs-2/classXMLString.html" target="_blank">some basic support</a> for that because it uses Unicode internally.</p>
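<p>For anyone wondering what such a callback looks like, here is a minimal sketch rather than the code I actually used. It has the signature libcurl expects for a write callback, but the name CollectBodyCB is made up for illustration and a std::string stands in for CMemFile, so treat both as assumptions.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the write callback in the listing above.
// libcurl calls a function of this shape once per chunk of response data.
// userdata is the pointer handed to CURLOPT_WRITEDATA; this sketch assumes
// it points at a std::string rather than a CMemFile.
static size_t CollectBodyCB(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    assert(userdata != nullptr);

    // The chunk is size * nmemb bytes long and is not null-terminated.
    const size_t nBytes = size * nmemb;

    // Append the chunk to the growable buffer.
    std::string *pBuffer = static_cast<std::string *>(userdata);
    pBuffer->append(ptr, nBytes);

    // Returning any other value tells libcurl that the write failed.
    return nBytes;
}
```

<p>It would be hooked up the same way as before: pass the function to CURLOPT_WRITEFUNCTION and the address of the buffer to CURLOPT_WRITEDATA. The buffer cleans up after itself when it goes out of scope, much like CMemFile does.</p>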
<p>Little victory indeed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kierdugan.com/2010/03/25/twitter-me-xerces/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I&#8217;ve written a Shell Extension!</title>
		<link>http://www.kierdugan.com/2009/08/27/shell-extension/</link>
		<comments>http://www.kierdugan.com/2009/08/27/shell-extension/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 18:51:53 +0000</pubDate>
		<dc:creator>Kier</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Explorer]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Shell Development]]></category>

		<guid isPermaLink="false">http://www.kierdugan.com/?p=17</guid>
		<description><![CDATA[Title pretty much says it all. I&#8217;ve written a Shell Extension!
I can&#8217;t remember what I was doing now, but for<a href="http://www.kierdugan.com/2009/08/27/shell-extension/" class="searchmore">Read the Rest...</a><div class="clr"></div>]]></description>
			<content:encoded><![CDATA[<p>Title pretty much says it all. <em>I&#8217;ve written a Shell Extension!</em></p>
<p>I can&#8217;t remember what I was doing now, but for some reason I needed to copy the full path of some file into some program to do some&#8230; thing. I was finding it increasingly annoying that I had to copy the path from the Explorer window, then either hand-transcribe (complete with mistakes) or rename, select all, copy the file name.</p>
<p>&#8220;Why can&#8217;t I just right click and select <em>Copy Filename</em> or something?&#8221; I said to myself, &#8220;I wonder&#8230;&#8221;</p>
<p>So I searched <a title="CodeProject" href="http://www.codeproject.com" target="_blank">CodeProject</a> for some information on how to write my own damn Shell Extension (with Blackjack and Hookers) and stumbled across <a title="The Complete Idiot's Guide to Writing Shell Extensions - Part I" href="http://www.codeproject.com/KB/shell/shellextguide1.aspx" target="_blank">this fantastic article</a> by Michael Dunn. In the space of an hour I had managed to learn enough to make half of my extension: I added a context menu item to Explorer!</p>
<p style="text-align: center;"><a href="http://www.kierdugan.com/wp/wp-content/uploads/2009/08/sillyscreen.jpg"><img class="alignnone size-medium wp-image-19 aligncenter" style="border: 0pt none;" title="CopyExt" src="http://www.kierdugan.com/wp/wp-content/uploads/2009/08/sillyscreen-300x225.jpg" alt="CopyExt working" width="300" height="225" /></a></p>
<p>Now over to MSDN to learn about the Clipboard and history was made. After around an hour and a half I&#8217;d gone from knowing nothing about writing Shell Extensions or using the Clipboard to having a working Shell Extension that used the Clipboard. I love the internet.</p>
<p>This is one of the smallest pieces of code I&#8217;ve ever written and, ironically, one of the few projects I consider myself to have finished. For more information, including how to download it, go to the <a title="CopyExt Homepage" href="http://www.kierdugan.com/programming/copyext" target="_self">CopyExt page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kierdugan.com/2009/08/27/shell-extension/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Term 2 Over</title>
		<link>http://www.kierdugan.com/2009/03/22/term-2-over/</link>
		<comments>http://www.kierdugan.com/2009/03/22/term-2-over/#comments</comments>
		<pubDate>Sun, 22 Mar 2009 20:05:20 +0000</pubDate>
		<dc:creator>Kier</dc:creator>
				<category><![CDATA[Electronics]]></category>
		<category><![CDATA[Rant]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Uni]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.kierdugan.com/?p=15</guid>
		<description><![CDATA[Wow, I&#8217;ve really been neglecting my website for a while. I had an amazing 1902 bullshit comments from spam bots<a href="http://www.kierdugan.com/2009/03/22/term-2-over/" class="searchmore">Read the Rest...</a><div class="clr"></div>]]></description>
			<content:encoded><![CDATA[<p>Wow, I&#8217;ve really been neglecting my website for a while. I had an amazing 1902 bullshit comments from spam bots or wankers or something. Luckily WordPress identified all but a hundred or so as completely lacking any worth and flagged them for manual verification. Fuck that. All 1902 pointless comments deleted by a single click of a button; thank you SQL.</p>
<p>An unfortunate side effect of this atrocity is that you must now register to post comments. Every cloud has a silver lining however: now you shameless, friendless people with an increased sense of self-worth can&#8217;t post plugs to your boring blog that would bring a sloth to tears.</p>
<p>End rant, moving on&#8230;</p>
<p>Finally, D4 (our big project of the year) is done and dusted! It was a pretty interesting couple of weeks, despite being filled with stress, aggravation and late nights. We had to design a fire detection system that could accurately measure temperature, detect smoke and the presence of people from up to four sensors and display that information on a web interface.<span id="more-15"></span></p>
<p>The basic design consisted of up to four Sensor nodes that contained a thermistor (measures temperature), smoke sensor from a butchered smoke alarm, and a PIR (Passive InfraRed) detector, and the Central Node which then connected to a PC. Each node had to have a memory to store data in the event of a power or communications failure, and the PC was used to control and monitor the whole system. A database (well, CSV file&#8230;) was used to store data from all sensors on the PC and that same file was used by the web interface to display tables and graphs of what was going down.</p>
<p>All in a week&#8230;</p>
<p>Regardless of the crazy timescale and annoying setbacks, we managed to get a fairly good prototype up and running. It was my job to build the Central Node and the software on the PC (named Monitor, see how imaginative I am?) which taught me a lot.</p>
<p>I spent the first day trying to get my unconventional approach to a <a title="AVR USB" href="http://www.obdev.at/products/avrusb/index.html" target="_blank">USB interface</a> working correctly. The documentation is a bit sparse, but fairly good nonetheless. I managed to use Control Endpoint 0 to control the state of some LEDs with relative ease, but I had a small amount of difficulty using Interrupt-In Endpoint 1 to send data to the PC when I flicked some switches. I got it down eventually though, so I was happy with my first day of work.</p>
<p>Shame this didn&#8217;t continue&#8230; I have no idea what happened to Tuesday and Wednesday. I just remember taking a particularly long walk home on Wednesday evening thinking &#8220;What have I actually achieved?&#8221; I was walking with my housemate, and the weather was good, so it was a good excuse to escape from stress and work.</p>
<p>Thursday was eventful as hell! We arrived in the morning with very small parts of the system working but nothing major going on. Matt and I joined forces. In a single day we had a fully working datapath that could accurately measure temperature to within half a degree Celsius! It also supported the necessary stuff to allow Smoke and PIR warnings to be sent to the computer. I had half written a piece of software to read the data from the Central Node and save it to a file, but it wasn&#8217;t good enough for the specification.</p>
<p>Thursday night was a long bastard. I completely rewrote my broken software into a multithreaded slice of moist cake that handled more than it needed to, just in case. I made it detect the Central Node being unplugged from the computer so it could ASK for lost data! I was, and still am, pretty pleased with it.</p>
<p>Friday, deadline day. First thing in the morning I tested my newly written software, and much to my surprise it worked fairly well. It needed a few changes and tweaks but it performed nicely. By this time we had most of the components we needed, so it was just a case of making the firmware a bit more capable, but we still needed to give all the nodes some memory. Our chosen device was a Philips I<sup>2</sup>C EEPROM because we were rocking the serial technologies in this lab. We used</p>
<ul>
<li><strong>USB</strong> as the interface between the Central Node and PC,</li>
<li><strong>UART</strong> to link the Central Node and Sensor Nodes,</li>
<li><strong>SPI</strong> for the Sensor Nodes to control an ADC, and</li>
<li><strong>I<sup>2</sup>C</strong> to give each node some memory.</li>
</ul>
<p>This turned out to be awesome: a grand total of seven pins used on a 28-pin AVR microcontroller!</p>
<p>Matt wrote a memory controller which we spent the morning trying to debug but, unfortunately, couldn&#8217;t get it to work despite using recommended code from Atmel themselves! By lunch it still wasn&#8217;t working so we decided to cut our losses and get a system together.</p>
<p>We had a few hiccups when getting the Sensor to send two bytes of data, but that was my sloppy programming. I tried to compensate for a single error in two places, resulting in the same byte being received twice. Never mind eh? After that crease was ironed out successfully we had a working data path from the Sensor (didn&#8217;t have time to make another to test our awesome bespoke communications protocol) to the PC software. The web guys brought their software over and the other guys brought the smoke and PIR sensors over and both interfaced nicely. A few small errors, but nothing major.</p>
<p>And that was it! We demonstrated our end-to-end system successfully&#8230; until Brad melted our thermistor! Only a minor setback however, a quick dash to stores by John sorted that out. Now the system was back to its (unfortunately memory lacking) glory and we got some nice ticks in boxes. Job done. How anticlimactic.</p>
<p>Well that was longer than I expected. Easter is upon me now, so I intend to get into the swing of updating this often. Bet that won&#8217;t last long.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kierdugan.com/2009/03/22/term-2-over/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

