Rebooted

Wow, almost two years since I’ve posted here. Seems it’s time for my biennial attempt to maintain a “weblog”.

Things have been going swimmingly, thanks for asking. I’ve gotten older and wiser, I’ve moved to Agworld (a Ruby on Rails shop), and I’ve written a few technical blog pieces over there, at the Agworld Devblog (under the pseudonym of “Jason Hutchens”).

Apart from that, I’m still pretty interested in making games and working on other side-projects. And although I find myself surrounded by tempting triviality and mundanity, I try my best to avoid it.

I would like, I think, to write longer opinionated rants. I might do that at some time. We’ll see how it goes.

Just testing out syntax highlighting of some mundane Ruby code, folks. Nothing to see here.

TTM: Time To MegaHAL

It seems that my Time To MegaHAL, or TTM, is about six weeks.

I’ll explain. What generally happens is this:

  1. I start a new job, and meet a whole new bunch of people.
  2. After about six weeks, someone suddenly says “hey, you’re that MegaHAL guy!”

It happened just today. My new boss, at a job I started six weeks ago, suddenly made the connection when a good friend of his published an interesting post about cobe, an optimised and much improved reverse-engineered rewrite of MegaHAL, written in Python and using SQLite as a storage back-end. A nice piece of work, although I’m a bit ashamed that someone had to wade through my 16-year-old vanilla C code to try to figure out what the bloody hell is going on :)

Of course, this will probably help to knock me out of my summertime hiatus. Things have been quiet on the blogging front lately, what with moving into our new house just prior to Christmas and starting a new job first thing in the New Year. Now that our new place is looking more like a home and less like a cardboard box wholesaler, it’s time to sit back, take stock and figure out what’s next.

For me, that’s as follows:

Although, really, the first item on the list should be “get more sleep, and exercise”.

Exciting times!

MegaHAL

One of my sabbatical projects is MegaHAL.10, an entirely new version of the mildly popular chatterbot that I wrote and put online fifteen years ago. I’ve been writing it in Google’s Go programming language, and I recently started getting some exciting results.

MegaHAL.10 generates sentences using a second-order Markov model. That means they tend to be random walks; you start out with a blank slate, and you never know where you’ll end up. The only guarantee is that any sequence of three symbols will have previously been observed by the model.

These kinds of random generations may be quite amusing, but they’re not too useful when you’re trying to simulate a conversation, which requires that each generation adheres to the current context. What would be much better is if you could force the model to generate a sentence that has the desired Markovian properties, but which contains pre-determined keywords.

With the original MegaHAL I did this by starting at the keyword and using two Markov models to generate the sentence, one proceeding from the keywords forwards to the end of the sentence, and the other going in the opposite direction. That was a bit of a hack, and could only guarantee that the sentence contains a single keyword.

For MegaHAL.10 I’ve been experimenting with other strategies, including a fractal slot-filling model and a family of Markov models, each of which generates towards a specified target symbol. Both of these took a template sentence as input, which specified the keywords and their order, and both suffered from several problems, such as the fact that it may be impossible to complete the template, and that the local grammaticalness condition may need to be violated to make the keywords join up.

Recently I had a brainwave: what if we generated with the standard Markov model, but weighted the generations based on another model that specified the distance to the desired keyword? We could then generate a directed random walk, similar to an A* pathfinding algorithm. We may not be able to guarantee that the resulting sentence contains the keywords we want, but chances are it will contain at least a few.

Some bugs still need to be ironed out (and some wrinkles squished), but my tests so far indicate that this method may be the missing link. I trained MegaHAL.10 on the entire collection of Sherlock Holmes stories, and then I asked it to generate a random sentence, specifying these ten keywords in advance:

adventure foot wild scandal street twenty death finger villain rock

This is the sentence I got (I’ve highlighted the keywords):

Twenty had changed their position, and the street by the death of the adventure of the ROCK OF GIBRALTAR. From the dead scientist.  His wild flight to escape from her finger into the garden.  Perhaps the villain was able to take counsel how far we can hardly put my foot over the case of this question. I saw that a very remarkable man, that was this William doing there is to avoid scandal.”

To me that’s completely amazing. “Give me a sentence containing these ten words”, I said, “making sure that each sequence of three words in your sentence appears somewhere in the collected Sherlock Holmes stories.” And MegaHAL.10 delivered.

Just to prove it wasn’t a fluke, here are a few more, using the same keywords:

Twenty or thirty, I have seen death in the street by a strong family resemblance about misdeeds, and if you know now and so I dipped my finger on the ledge above the other hand had dropped his mask of his foot than might at any time before he was wild, wayward, and to our young companion looked at the young diplomatist’s window. There was this dark business which had been in the young lady.”

Villain! Scoundrel!” cried the old quarters. At the foot of the death of the street. Holmes looked eagerly round for it was twenty before her in the dead scientist.  His wild eyes, stained cheeks and He put a triumphant conclusion.

Death from some little distance down the street by a very simple chain: 1. You had a wild thrill as this was at the foot of the rock on each side of the adventure as a great scandal threatened to raise a scandal may well be cleared along those lines. Mr.  Douglas has spent a large coil from under the three strips. No finger impressions–no! Well, he is not here as I was at the other night.”

I’m planning to spend next week improving this algorithm, fixing various bugs, and optimising for memory and speed. I’ll then set to work creating a web version of MegaHAL.10, which will hopefully be going live before the end of the month.

EA Spam

Please, EA, please stop spamming my email account!

Yes, I know that I signed up to receive updates about Spore. Amazingly, I did that on Friday the 20th of May 2005. Was it really that long ago?

It took almost three years for you to send me the first email about Spore. Since then, I’ve received the odd email now and again. I didn’t mind. After all, I signed up for it.

Then, in March, you sent me a non-Spore email. So I decided to unsubscribe. I really wasn’t interested in hearing any more about Spore (after all, that train’s left the station), and I certainly wasn’t interested in receiving advertising about anything else.

Seven months later and I received another email. I unsubscribed again, confirming that I was receiving these emails because I’d signed up for Spore updates.

Two weeks later, another one. Unsubscribed for a third time. This is getting annoying.

And now, six days later, yet another email. For this you must die. Metaphorically. So let’s see:

  • The Sims 3 is Available on Consoles Now! I do not care!
  • If you no longer want us to contact you, click here to be removed from our mailing list. Consider it done!
  • Confirm that I want to be removed from all EA communications. With pleasure!
  • By opting out I won’t receive updates about Spore. Am I sure? You bet I am!
  • Confirm unsubscription by logging in to my EA profile. Confirmed!

Come on, you scum, you just try sending me one more.

Anyone else getting spammed by EA after they’ve gone through the unsubscription process?

Virtual Reality

In 1992, a couple of years before the Web started to gain traction, Virtual Reality was the new hotness.

I was a second year Engineering student, and encouraged the head of one of the research groups on campus to allow me and two fellow students to work on a VR project over the summer break for credit.

We improvised a poor-man’s VR helmet (a stack-hat) with a poor-man’s stereoscopic HUD (two video camera viewfinders) and poor-man’s motion tracking (two POTs on the helmet for 2 DOF head tracking, and a Nintendo Power Glove). All of this was driven by two Amiga 500 computers, one for each eye, using custom software written in AMOS and AMOS 3D. One of the most challenging parts of the project was synchronising the displays via a null-modem link, and reading the POTs, which had to be timed off the vertical scan (something arcane like waiting until the electron beam of the display reached the 7th line, and then reading some register or other before it reached the 12th line).

We were excited, but, in truth, the project sucked balls.

While working on that project, we became involved with the Perth Virtual Reality Interest Group, which held meetings in Tech Park (Enterprise Building 3, funnily enough, where I returned 15 years later – ack – to work at Interzone). The SIG organised a special, private viewing of the Virtuality Arcade Machine (which also used the Amiga computer) when it made a brief appearance in the Perth Myer store.

I remember being very excited by the potential of VR at the time, in part due to a documentary that aired on TV that featured Marvin Minsky, Jaron Lanier, William Gibson and Timothy Leary. It’s funny and embarrassing in hindsight, but it was a strange time, with people absolutely convinced that VR would be the future of entertainment, medical imaging and stock market manipulation.

I wonder what the modern equivalent of VR is? Social gaming, perhaps?

A Cunning Plan

Hmmm… eight days since my last post. Looks like I’ve fallen off the wagon. For a while there I actually had a backlog of half-a-dozen posts ready to be published, and it was great. Writing is like exercising; it’s hard to start, but it feels fantastic once you’ve built up a rhythm. Take this, then, as an attempt to re-start.

D. and I have been watching BlackAdder. I received the “ultimate collection” box set for Father’s Day, and we’re working our way through the episodes, watching everything twice (to hear the commentary), and thoroughly enjoying it.

Ben Elton did a fantastic job of re-imagining BlackAdder when Richard Curtis invited him to join the writing team for the second season, suggesting that the characters of Edmund and Baldrick should be swapped. Each episode was filmed in under two hours in front of a studio audience (apart from, of course, location footage).

Channel 7 co-financed the first season, which was the most expensive to make, but pulled out from later seasons. Nine years later, after all four seasons of BlackAdder had been released, Channel 7 created an embarrassing BlackAdder rip-off called Bligh, which starred Michael Veitch from Fast Forward. I can’t find any footage of it online; I can only remember it being very, very bad.

I find it inspiring that it’s possible to create a long-lasting piece of television in such a short amount of time. It seems that the advice to follow is do what you love, don’t compromise on quality and just damn well get it done. Speaking of which…

Go

Yesterday I posted the following on Twitter:

Why is the #golang community so much more holier-than-thou than the #ruby community? Harumph to all sanctimonious hackers.

This got a concerned response from someone who works at Google as a “Go Gopher”, whatever that means. It also made me feel like a dirty troll, which wasn’t my intention at all: I was just venting after a frustrating day.

When Google’s Go Programming Language was announced late last year I was fascinated. I loved the decisions they’d made regarding code formatting (the language enforces a particular standard, making debates about the right way of formatting code a moot point), dependency management (it’s a compile-time error to include something that you don’t use; streamlining your dependencies to speed up builds in C++ is a nightmare by comparison), and language features (defer, iota, interfaces and goroutines are all very cool). I read through the documentation, watched the videos and toyed around with the language a bit, but I didn’t dive in and get my hands (really) dirty straight away.

Recently, I’ve started working in earnest on the back-end for MegaHAL.10, an online chatterbot that can learn to talk in any language by example. That project has a few requirements that make Go an ideal language for the server-side code, including:

  1. The need to be fast, and to manipulate a lot of data in memory. This makes C or C++ an obvious choice, and discounts languages such as Ruby or Python. Go is a lot closer to C++ in terms of efficiency, so it’s a candidate for the implementation language.
  2. The need to work with all human languages, meaning it’s important that the chatterbot engine supports UTF-8 end-to-end. Go source code and string literals are UTF-8 by design, and the standard library handles UTF-8 out of the box, whereas it’s fiddly to properly support UTF-8 in C++.
  3. The server will need to handle many simultaneous requests, and the concept of goroutines is ideally suited to satisfying this requirement.
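The second and third requirements can be illustrated with a small sketch, assuming a current Go toolchain (the standard library has changed since this was written). The `countSymbols` helper and the sample inputs are invented for the example; `utf8.RuneCountInString` and `sync.WaitGroup` are standard library.

```go
package main

import (
	"fmt"
	"sync"
	"unicode/utf8"
)

// countSymbols returns the number of Unicode code points in s, which can
// differ from len(s), the number of bytes in its UTF-8 encoding.
func countSymbols(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	inputs := []string{"hello", "héllo", "こんにちは"}
	counts := make([]int, len(inputs))

	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		// One goroutine per simulated request; each writes its own slot.
		go func(i int, in string) {
			defer wg.Done()
			counts[i] = countSymbols(in)
		}(i, in)
	}
	wg.Wait()

	for i, in := range inputs {
		fmt.Printf("%q: %d bytes, %d symbols\n", in, len(in), counts[i])
	}
}
```

Each request runs in its own goroutine and the `WaitGroup` simply waits for all of them to finish. A real server would more likely get this behaviour from its HTTP library than by hand.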

So I started coding things up, and eventually ran into some problems that I couldn’t answer by reading through the provided documentation.

For example, consider the provided documentation for the sort package. The description is a brief “the sort package provides primitives for sorting arrays and user-defined collections”. Just what I need! Unfortunately, the documentation for the Sort function is no more than its signature, “func Sort(data Interface)”. That is all.

You can click through to see the implementation of Sort, revealing that it just calls through to quickSort. Great! I’ll look at the documentation for that. Unfortunately, there is none; quickSort is private (because it starts with a lowercase character), meaning that it doesn’t appear in the documentation at all, although its implementation is right there in the source.

So they’re hiding stuff from you in the docs, while at the same time recommending that you treat the source as documentation.

Now, this is a contrived example. I didn’t really need to look up documentation for Sort, but I did have several problems of this category with other packages and functions that I wanted to use. What I’m trying to illustrate here is how difficult they’ve made it for a newcomer to the language to get acquainted with things. Contrast this, for example, with the documentation for sort in Ruby:

Returns a new array created by sorting self. Comparisons for the sort will be done using the <=> operator or using an optional code block. The block implements a comparison between a and b, returning -1, 0, or +1. See also Enumerable#sort_by.

It’s succinct and easy to understand, with clear examples and cross references to related functions. Exactly what you need to dive in and get started on something.

I’ve just finished working on FAQoverflow, a fun little project that was implemented in two weeks using Ruby, and I rarely needed to look outside of the provided documentation when writing a fairly complicated API spider. When I did, I found the community polite, welcoming and useful. Consider, for example, this question to ruby-talk, made soon after the first version of Ruby was released in the west:

Ok, I’m having trouble with an extremely simple class. Here’s my example:

This got an immediate reply from Matz, the creator of the language:

Example:

Now, that’s really helpful! Here, by contrast, is the first question I ever saw on golang-nuts (when searching to find out how the ternary operator worked):

How can you call your self a C-like language and NOT have the ternary operator? But seriously, why isn’t it in Go? This could be a deal-breaker for me, as it’s often more succinct and clear to use a ternary operator than an if/else, *especially* if you require curly braces even for single-line blocks.

And here’s the first answer:

Then “go” write your own language or use a language that makes use of your precious ternary operator.

Ouch! That answered my question, but it certainly rubbed me the wrong way.
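For what it’s worth, the answer a newcomer is actually after is that Go has no ternary operator, and the same choice is written with a plain if/else. A minimal sketch, with a made-up `plural` helper:

```go
package main

import "fmt"

// plural chooses a suffix with if/else, where a C programmer might write
// n == 1 ? "" : "s".
func plural(n int) string {
	if n == 1 {
		return ""
	}
	return "s"
}

func main() {
	for _, n := range []int{1, 3} {
		fmt.Printf("%d keyword%s\n", n, plural(n))
	}
}
```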

When you’re deeply focused on writing code, and jump to a browser to do a quick search, and the first thing you hit is a mean-spirited reply to a reasonable question, then it breaks your stride and leaves you with a foul taste in your mouth.

Of course, if this just happened a few times it wouldn’t be a problem. What concerned me is that throughout the day, whenever I searched for an answer to a question I had about Go, I always got a mean-spirited answer to a fair question. Sure, I admit I might have had an unlucky streak, and I admit that I usually read the first reply to each question due to my familiarity with Stack Overflow, whereby the first reply tends to be the best. But I just don’t encounter that with the Ruby community. And it was enough to make me stop looking to the forums for answers, which is a shame.

Some more examples of less-than-useful answers (paraphrased) that I encountered on the day I was working with Go:

“Having optional arguments to functions would be useful.”

“In the rare instances you need them, make wrapper functions and give them unique names.”

Coming from Python and Ruby, I’d have to say that optional arguments, or arguments with sensible defaults, are far from a rarity, and are often extremely useful. And, no, I don’t believe that it’s good practice to create eight differently-named wrappers for a function that you’d like to have three optional arguments.
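One workaround that avoids the explosion of wrappers is to pass a struct of options and treat zero values as “use the default”. This is only a sketch of that idea, not something the forum suggested; `GenerateOptions`, its fields, the default of 50 and `describe` are all hypothetical.

```go
package main

import "fmt"

// GenerateOptions is a hypothetical options struct. Any field left at its
// zero value is treated as "not specified".
type GenerateOptions struct {
	MaxWords int      // 0 means use the default of 50
	Keywords []string // nil means no keywords
	Seed     int64    // 0 means use seed 0
}

// describe fills in defaults and reports the settings it would use.
func describe(opts GenerateOptions) string {
	if opts.MaxWords == 0 {
		opts.MaxWords = 50
	}
	return fmt.Sprintf("maxWords=%d keywords=%d seed=%d",
		opts.MaxWords, len(opts.Keywords), opts.Seed)
}

func main() {
	fmt.Println(describe(GenerateOptions{}))
	fmt.Println(describe(GenerateOptions{MaxWords: 10, Keywords: []string{"rock"}}))
}
```

It works, but it is wordier at the call site than a Python or Ruby default argument, and it cannot distinguish “not specified” from an explicit zero without extra effort.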

“Why does strconv have functions to parse int and int64 but not int32?”

“Because making a function for every possible integer type is tedious and clutters the interface.”

Clutters the interface? Really? Do you mean that the generated documentation gets a bit longer, meaning you have to scroll through it? Do you mean that you really shouldn’t need to specifically convert int8 and int32 from their string representation, ever? Or that if you do you should write that functionality yourself?
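In current Go releases this is less painful than it was, because `strconv.ParseInt` takes a bit size and rejects values that do not fit. A minimal sketch, assuming a current toolchain; `parseInt32` is a made-up helper, not part of the standard library.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseInt32 is a hypothetical helper built on strconv.ParseInt.
func parseInt32(s string) (int32, error) {
	// Base 10, bit size 32: values outside the int32 range produce an error.
	n, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, err
	}
	return int32(n), nil
}

func main() {
	fmt.Println(parseInt32("2147483647")) // largest int32, parses cleanly
	fmt.Println(parseInt32("2147483648")) // one too large, returns an error
}
```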

“I was following the tutorial on the website, and it said… (a simple misunderstanding)?”

“I don’t get the point.”

Thanks for sharing, but why not correct the obvious misunderstanding or be more specific about what it is that you don’t get?

“The library code is … hard to learn from. I’d rather see short example programs.”

“Rule one of problem solving: break it down into smaller problems, see rule zero.”

Right, so rather than provide short, concise usage examples in the documentation, you want me to dive into the source and spend time understanding the implementation of the functions I want to use?

“I’m sure this is totally obvious, but I can’t seem to find it anywhere in the docs…”

“One way to get answers to questions is to search the golang-nuts mailing list.”

Hmmm… dare I ask any questions at all?

Now, yes, perhaps I’m being harsh. But when you’re struggling with something unfamiliar and new, it’s a godsend to find superb resources such as Programming Ruby or The Ruby User’s Guide or why’s (poignant) guide to Ruby or the Standard Library Documentation, and these resources exist because the community cared enough to want to help newbies to understand why they fell in love with the language. Guess what? It works.

I just don’t get that feeling with the Go community at all. At the moment, it feels defensive and argumentative. Now, it may well be that Go is suited to a different class of problems than Ruby, and that the language is therefore going to be niche, appealing to systems programmers only, and that it’s still immature and not really suited to production code. Fine. My problem is that it sure wasn’t marketed that way. The Go team claim that their language is an expressive, concise, clean, and efficient language designed to make programmers more productive. Sounds good to me! Now if only I could learn to use it as such without banging my head against defensive, unhelpful, critical answers on golang-nuts.

So, Go Community, where should I be looking?