Tag: Programming

SimpleHLST – Part 2: Lexical Analysis

SimpleHLST needs to read in some code, written by humans, to begin generating data structures that we can actually work with. This can take up to three stages:

  • Lexical Analysis – convert the input text into a list of tokens that distinguishes language features from whitespace, comments, pre-processor commands, etc. (“Unknown character on line 2” errors are produced by this stage).
  • Syntax Analysis – convert the list of tokens into a syntax tree which shows the entire file hierarchically as perfectly valid expressions (this is where “Missing semicolon on line 4” type errors come from).
  • Semantic Analysis – process the syntax tree to make sure that it makes sense according to the rules of the language. Type checking falls under this category, for instance, so this is where “warning, assigning ‘int’ to ‘short’ – possible loss of data” type messages come from.

SimpleHLST only needs a very basic language and I’m not planning to have it support complex data, so I don’t think we’ll need a semantic analysis stage at the moment.

An example

Suppose we have the following mathematical expression, f(t) = 2t + k, and we want to implement it in some program; we could write something like

f(t) = 2*t + k.

Before any more advanced stages can take place, the compiler must extract the information from this expression. This is exactly the same principle as we humans extracting information from text. We have languages that each have a vocabulary of words with specific definitions which we can use to convey detailed information. Each word only represents a small piece of information; the context and sentence structure contain the rest.

Programming languages are exactly the same, and this is where Lexical Analysis (or, colloquially, Tokenisation) gets its name. Programming languages tend to use symbols, or tokens, instead of words for brevity. The output of this stage would be an array that looks somewhat like the following table.

Index  Type    Data      Index  Type    Data
0      Char    f         5      Number  2
1      Symbol  (         6      Symbol  *
2      Char    t         7      Char    t
3      Symbol  )         8      Symbol  +
4      Symbol  =         9      Char    k

Now the parser can operate on this set of tokens instead of having to wade through the text itself. This example is, obviously, very simplistic, and having separate tokenising and parsing stages may even complicate things, but the split is incredibly useful for handling more complex languages such as C and Verilog. (continue reading…)
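
As a rough illustration of the token table above, here is a minimal tokeniser sketch in Python. It is not SimpleHLST's actual lexer: the names `TOKEN_PATTERN` and `tokenise` are made up for this example, and the three token types (Char, Number, Symbol) are simply taken from the table.

```python
import re

# One named group per token type from the table, plus whitespace to skip
# and a catch-all for anything the "language" does not recognise.
TOKEN_PATTERN = re.compile(r"""
    (?P<Number>\d+)
  | (?P<Char>[A-Za-z])
  | (?P<Symbol>[()=*+])
  | (?P<Skip>\s+)
  | (?P<Unknown>.)
""", re.VERBOSE)


def tokenise(text):
    """Return a list of (type, data) pairs for the given source text."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        kind = match.lastgroup
        if kind == "Skip":
            continue  # whitespace carries no information for the parser
        if kind == "Unknown":
            # Count the newlines before the match to get a 1-based line number.
            line = text.count("\n", 0, match.start()) + 1
            raise ValueError(f"Unknown character {match.group()!r} on line {line}")
        tokens.append((kind, match.group()))
    return tokens


for index, (kind, data) in enumerate(tokenise("f(t) = 2*t + k")):
    print(index, kind, data)
```

The loop at the end prints the ten tokens in the same order as the table, and a character outside the pattern raises the "Unknown character on line N" style of error mentioned earlier.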


PowerPython… or PythonPoint… or something

I’ve been meaning to update this for a fair while now as an uncharacteristically large amount of stuff has happened. Since exams finished I’ve managed to get a job at ECS working for two of my lecturers on two separate projects, which is pretty good because it means my work is varied. Both are IC design projects though, so there is a similar vein running through them.

One of my minor duties on this dual-job is to assemble slides from about twelve people into a large presentation, with cover slides for each speaker, every Friday for a progress meeting we all have. Naturally, the first Friday I just did it by hand by importing each one in turn into PowerPoint. However, it is a fairly tedious job, and to paraphrase a certain member of staff: why do something by hand when I have a powerful computer under the desk?

So I began to investigate automating the process.

Turns out that Python has an IMAP module in its standard library, which isn’t too surprising I suppose as the Python standard library is enormous. After some playing I managed to write a program that logged into my university email account and downloaded the appropriate PowerPoint attachments.
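
The original program isn't shown here, so the following is only a sketch of how such a download could look using the standard library's `imaplib` and `email` modules. The function names, the `SLIDE_EXTENSIONS` tuple, and the choice to scan every message in the inbox are all assumptions made for illustration.

```python
import email
import imaplib

# File extensions treated as PowerPoint attachments (an assumption).
SLIDE_EXTENSIONS = (".ppt", ".pptx")


def extract_slide_attachments(raw_message):
    """Return (filename, data) pairs for PowerPoint attachments in one raw email."""
    message = email.message_from_bytes(raw_message)
    attachments = []
    for part in message.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith(SLIDE_EXTENSIONS):
            # decode=True undoes the base64 transfer encoding.
            attachments.append((filename, part.get_payload(decode=True)))
    return attachments


def fetch_slide_attachments(host, user, password, mailbox="INBOX"):
    """Log in over IMAP and collect slide attachments from every message."""
    found = []
    with imaplib.IMAP4_SSL(host) as connection:
        connection.login(user, password)
        connection.select(mailbox, readonly=True)
        _status, data = connection.search(None, "ALL")
        for number in data[0].split():
            _status, parts = connection.fetch(number, "(RFC822)")
            # parts[0] is a (header, body) tuple; the body is the raw message.
            found.extend(extract_slide_attachments(parts[0][1]))
    return found
```

Keeping the attachment extraction separate from the network code means it can be tried on a saved message without logging in to a mail server.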

(continue reading…)


Twitter Me Xerces!

Following from the spirit of yesterday’s post, little victories…

Yesterday I managed to download the front page of my website using libcurl. As good as that was as a learning experience, it wasn’t interesting or useful in the slightest. Today however, I decided to see if I could fetch my status updates from Twitter and display them in a program. So I had a look at the API documentation and it looks quite easy to use, with the exception of OAuth which I’m yet to get my head around. Thankfully, for now, basic authentication is still supported.

The Twitter API uses the REST (REpresentational State Transfer) paradigm, which means the server keeps no session state between requests; i.e. each transaction is considered separately. In practice it also means that it uses HTTP, which is pretty simple to understand. Basically, in a REST protocol the URIs are objects in the system, and the HTTP verbs are how you interact with them. So a GET on a http://server/article?name=REST object would download an article named REST. Simple eh? Check this article if you’re interested.
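
To make the "URI is the object, verb is the action" idea concrete, here is a small Python sketch that builds the GET request from the example above without sending it. The host `server` is only the placeholder from the text, and `article_request` is a hypothetical helper, not part of any Twitter library.

```python
from urllib.parse import urlencode
from urllib.request import Request


def article_request(name, base="http://server/article"):
    """Build (but do not send) a GET request for the article called `name`."""
    url = f"{base}?{urlencode({'name': name})}"
    return Request(url, method="GET")


request = article_request("REST")
print(request.get_method(), request.full_url)
# prints: GET http://server/article?name=REST
```

Passing the request to `urllib.request.urlopen` would actually perform the transfer. Because the server keeps no state between requests, everything it needs is in the verb and the URI.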

Anyway, onto the meat ‘n’ taters. Data in a REST transaction is typically stored as XML or JSON. I considered downloading LibYAML and taking the JSON route but a) I already had Xerces, b) I understand XML better than JSON, and c) I couldn’t be bothered to learn yet another new thing.

(continue reading…)


I’ve written a Shell Extension!

Title pretty much says it all. I’ve written a Shell Extension!

I can’t remember what I was doing now, but for some reason I needed to copy the full path of some file into some program to do some… thing. I was finding it increasingly annoying that I had to copy the path from the Explorer window, then either hand-transcribe (complete with mistakes) or rename, select all, copy the file name.

“Why can’t I just right click and select Copy Filename or something?” I said to myself, “I wonder…”

So I searched CodeProject for some information on how to write my own damn Shell Extension (with Blackjack and Hookers) and stumbled across this fantastic article by Michael Dunn. In the space of an hour I had managed to learn enough to make half of my extension: I added a context menu item to Explorer!

CopyExt working

Now over to MSDN to learn about the Clipboard and history was made. After around an hour and a half I’d gone from knowing nothing about writing Shell Extensions or using the Clipboard to having a working Shell Extension that used the Clipboard. I love the internet.

This is one of the smallest pieces of code I’ve ever written and, ironically, one of the few projects I consider myself to have finished. For more information, including how to download it, go to the CopyExt page.


Term 2 Over

Wow, I’ve really been neglecting my website for a while. I had an amazing 1902 bullshit comments from spam bots or wankers or something. Luckily WordPress identified all but a hundred or so as completely lacking in any worth and flagged them for manual verification. Fuck that. All 1902 pointless comments deleted by a single click of a button; thank you SQL.

An unfortunate side effect of this atrocity is that you must now register to post comments. Every cloud has a silver lining however: now you shameless, friendless people with an increased sense of self-worth can’t post plugs to your boring blog that would bring a sloth to tears.

End rant, moving on…

Finally, D4 (our big project of the year) is done and dusted! It was a pretty interesting couple of weeks, despite being filled with stress, aggravation and late nights. We had to design a fire detection system that could accurately measure temperature, detect smoke and the presence of people from up to four sensors and display that information on a web interface. (continue reading…)



Copyright © 2008-2011 Kier Dugan. All rights reserved.