Posted by admin on the 8th of February, 2009 at 9:43 am under Programming.    This post has Comments.

One of Clojure’s biggest strengths is that it is backed by the JVM, and has good interoperability with the Java libraries.  When I needed to implement a simulation with events that should be executed in order of their timestamp, I was immediately tempted to use Java’s PriorityBlockingQueue, since neither sorted-map nor sorted-set supported two events with the same time stamp (though I could have used some magic to make it work, but decided simplicity was more important).

I was generating events with several types, and under certain circumstances, events that mapped :type to :arrival needed to be removed because they were no longer valid given the state of the simulation.  Easy enough:

(filter #(not= (:type %) :arrival) event-queue)

But, PriorityBlockingQueue is mutable (oh, and I’d put it in a ref, to make things more interesting), so I’d need to clear it after I did the filter, before repopulating it with the filtered events.  This is a simpler version of the final code:

(let [elements (filter #(not= (:type %) :arrival) @event-queue)]
(.clear @event-queue)
(.addAll @event-queue elements))

So, when I ran this, I kept getting NullPointerException.  Clojure is not well known for its uber-informative stack traces, so I flailed with this for a few hours, looking at it on and off while working on other parts of the code.  Then it hit me: filter is lazy, so the filtering of the elements is not happening until I’m calling in the filter when I make the addAll call.  By that time, @event-queue is empty, so the list of results is nil, which will throw a NullPointerException when passed to the addAll call.  The easy way to fix this is to wrap the call in a doall function, which forces the computation:

(doall (filter #(not= (:type %) :arrival) event-queue))

The moral?  Even though Clojure has good support of Java collections, be wary when you use mutable data structures with the standard Clojure seq functions; many of them are lazy, and unless you know they won’t change, you can get subtle temporal errors creeping into your code.  I got lucky that addAll threw the exception.  If it hadn’t my simulation would have ended and I would have spent a lot more time trying to figure out why.

Posted by Rick on the 4th of February, 2009 at 10:59 pm under Programming and web.    This post has Comments.

Introduction

When I program, I sometimes am fortunate enough to see beauty emerge from what I do. I love programming because of this, but, enthusiastic as I am about it, I find that it is very hard to convey the essence of what I see to others, which often means that they cannot understand what I am so enthusiastic about!

So, I would like to discuss one of my favorite idioms in programming, but before I do so, I would like to discuss something you might be familiar with: organizing information. In the context of information organization, I think my example makes sense, and the beauty arises, and I don’t have to discuss programming at all. This is ideal, because the beauty in programming has nothing to do with programming at all: it has to do with patterns of thought. But I’ve already delayed too long; off we go!

Organizing Information: Hierarchies

In our struggle to store and retrieve information, we have developed systems to help us organize it all. Probably the most prevalent idea in these systems is the notion of a hierarchy. We see this everywhere, from filing cabinets, to file systems on a computer, to organization of library books, to notes taken in an outline form, to the system for classifying life on Earth, and even in this very essay. The idea is simple: establish a set of top-level categories. Each of these can contain either more categories, or items. Thus, any given item has a location in the hierarchy that is defined by all of its parent categories.

This idea of classifying information in a hierarchy is so pervasive that it comes completely naturally to our way of thinking. It has three main strengths:

  1. It allows us to locate information using a broad idea, and narrowing that idea until we finally locate the information we seek
  2. It allows us to easily add new information in the structure, because we have a pretty good idea of where things should go
  3. It allows us to reorganize the ideas, because we can take sub-categories and change their parents as necessary

But it has one major weakness: if a piece of information has multiple facets, it can really only exist in one place in the hierarchy, so users of the system are forced to choose. There is a rich history of developing add-on aspects to hierarchical systems to fix this weakness. In computer file systems used by Linux, Solaris, BSD or Apple’s OS X, for example, a user can establish links in the file hierarchy to allow a file to exist in multiple locations at once. This would be a bit like putting a yellow sticky note in a filing cabinet in the “Hobbies” section that says “Go look at the Automobiles section for info on my hobby sports car.” This works fairly well, but an unsuspecting user might remove the file on the sports car after it got totaled, only to find that little sticky later. Meanwhile, if your brother looked under “Hobbies” to find info on the sports car, he’d be confused when he couldn’t find it where the note indicated. So, it makes maintenance of the system a bit harder, and can confuse users.

Organizing Information: Tags

Nevertheless, hierarchical systems have such great advantages that they are still the most prevalent approach to solving the problem of information organization. Recently, however, another notion has come along: tagging. Tagging is a simple system: each item of data can be assigned a tag, or keyword, that describes some broader category to which it belongs. Each item can have an unlimited number of tags. Going back to our example, the information on the sports car could be tagged “Automobile” and “Hobby”.

This mechanism for organizing information is used in several well-known environments: Google uses it in their bookmarks implementation, Facebook uses it to organize photos and notes, Flickr uses it for photo organization, and many news sites (like Digg and Slashdot) tag news stories. The advantages are obvious: tags allow a fragment of information to exist in multiple places in the hierarchy. But wait, where is the hierarchy now?

The first question many newcomers to tagging ask is how to really organize things. As soon as you try to establish a tag hierarchy, you start to notice the problem. Suppose you had a tag “Cars” and then another for “Sports Cars”. If you wanted to add some information about the discontinuation of the Dodge Viper into your system, what tags would you add? Both “Cars” and “Sports Cars”? Or maybe just “Sports Cars”, since that implies “Cars”? But then, when someone looked for all information on “Cars”, the info on the Viper wouldn’t show up, which doesn’t seem right. Then again, is this system going to require that we add every possible tag when we add information? That seems cumbersome.

So, in exchange for solving a problem, tagging introduces another. We can now freely associate bits of information with tags, but the tags have no hierarchy. When we look back at the strengths of a hierarchical system, we see that we can still locate information using a broad idea. If we look at all information tagged “Cars”, we can then see that many of those items are also tagged “Sports Cars”, so we develop a sort of emergent hierarchy. This concept is often referred to as a “tag cloud“. So, at least tagging systems allow to locate information in much the same way as we could using a hierarchy, and they have the added bonus of allowing us to file data in multiple places.

We’ve said that hierarchical systems are good at adding data. Unfortunately, adding information to a tagged system can be difficult. For each item we want to add to the system, we have to identify all relevant tags when we insert the information. This means not only knowing all the relevant tags in the system, but also anticipating whether any new tags are needed. This could be even more work that creating the data we wanted to insert in the first place!

Finally, we’ve said that hierarchies are good because they allow us to reorganize information. This can also be difficult when we use tags. In addition to collecting cars, we also start collecting motorcycles. So, we’d like to introduce the notion of “Vehicles” to our system, and have it apply to all of our cars, along with the new motorcycles we’ll be adding. In a hierarchy, you’d simply create the “Vehicles” category and put “Cars” into it. With tags, we have to go back and add the tag “Vehicles” to all items tagged with “Cars”. Laborious, to say the least!

Are tags really that bad, or have we missed something? Do we need to build some tools to automate all the cumbersome aspects of using tags? Or is there some underlying truth here that we have failed to recognize?

“Simplicity does not precede complexity, but follows it.” -Alan Perlis

In many tag systems used on the internet, the problems I’ve outlined above remain. The main reason is that it is difficult to formulate elegant systems to address these problems. The primary job of a computer programmer (whether they know it or not!), is to take problems like the one I described above and try to find the underlying model that should be used. You might get an itch in the back of your mind that something is wrong with what I described: tags seems like they should work, but for some reason are difficult to use.

Your first instinct may be to write a program that will automate the more cumbersome aspects of using tags. For example, in the case of adding a “Vehicles” tag, you could just write a program that would grab all the items tagged “Cars” and add “Vehicles” to them. Seems simple enough, and saves work. But it isn’t beautiful. Over time, you’d write a lot of tools to help you perform tasks like that, and the system would become complex. We wanted a simple system when we started using tags, not something that would be hard to understand and use. The key question that is always on my mind is: How can we retain simplicity?

In the systems we’ve been discussing, we have defined two elements. In the hierarchical system, we defined categories and items. In the tagged system, we defined items and tags. What if we tried to define the system using only one element? What if, in the tagged system, we defined tags as items, and items as tags? How would this change its utility?

Such a change would mean that items could be tagged (as before), but also that tags could be tagged, since they are also items. Like many subtle changes, it is hard to see how this small modification would be useful. In fact, it allows us to enjoy the best aspects of both a hierarchical and a tagged approach to organization, and it means the entire system is only composed of one element: a data item. What we’ve done is change how these items relate to one another, and that enables new patterns of thought and organization.

Going back to the example of the sports car that is both a “Car” and a “Hobby”, we can see that this system would support both through the normal use of tagging. But what about our example of “Cars” and “Sports Cars”? Well, in our new system, as soon as we create the data on the Dodge Viper being discontinued, we can simply tag it with “Sports Cars”. Why don’t we have to worry about tagging it “Cars” as well? Well, as soon as we create the tag “Sports Cars”, we tag it with “Cars”. So, now we have embedded the knowledge that “Sports Cars” and “Cars” are related in the tags themselves, rather than in each item using those tags.

Using this idiom, it is easy to see that hierarchies naturally emerge from the structure, but can be more complex than simple trees, due to the nature of tagging. Now, tags can relate themselves to multiple other tags at different points in the tag hierarchy, as can items, because items are tags. Likewise, if you want to provide some notes on a tag, you simply add data to it, because the tag is a data item. Reorganization of the hierarchy is easy - if you want to introduce the notion of “Vehicles”, simply tag “Cars” and “Motorcycles” with “Vehicles”, and you’re done. All items with those tags (or tags tagged with those tags, etc.) will now be tied to the notion of “Vehicles”.

This idea, though simple, may be confusing at first.  Here’s another way to think about it: instead of tagging items with tags, we tag items with other items.  So, if we’re in a filing system, we tag files with other files.  If we organize photos, we tag photos with other photos.  In reality, we have reduced our system of organization to its very simplest form by allowing ourselves only one action: we can establish connections among various members of a homogeneous collection of data.  Now, each member of that collection will ultimately have both data, and connections to other pieces of data.

If this sounds familiar, that’s because it is: it is the exact methodology for organizing the internet.  Web pages sometimes contain data, and they sometimes contain references to other webpages.  Most webpages have both, even in walled gardens like Wikipedia.

Conclusion

By reducing the number of elements in our system, we increased its power, but retained its simplicity.  Our new system has all the benefits of the hierarchical system, as well as supporting all the benefits of tagging.

While I decided to write this and relate the ideas to programming, I got the idea of “data as tags” and “tags as data” from TiddlyWiki, which I used to compose this, as well. If you want to see the system I described in action, visit the TiddlyWiki homepage and download a copy and play with the tagging system: it is an elegant and powerful approach to managing lots of information.

Posted by Rick on the 22nd of January, 2009 at 12:41 am under howto and web.    This post has Comments.

Though perputually on the verge of total disorganization in the physical realm, I try to keep my digital world very organized.  Because of this, I habitually try to move more and more of my life into a digital representation, which allows easy backup, searching and reorganization.  Although I have tried many free and proprietary software products to act as a central hub for storing and retrieving random bits of useful information, even the very best (Emacs Org Mode) has failed to gain traction in my life on a long term basis because it is not immediately accessible to me when its underlying framework is not present (Emacs).  Of course, Org Mode files are just text, but much of the file’s appeal is lost in a pure text mode: Emacs is required to really wrangle all the data present in the text file.  I have used other tools, like DevonThink on OS X, Google Notebook, Google Calendar, iCal, various GTD systems, several Emacs-based packages, and Evernote.  All either store my data remotely, aren’t workable on Windows, OS X and GNU/Linux, or require a sizeable framework to be present whereever I want to view/alter my data.  Each of these shortcomings causes me to shy away from really “investing” in the system.

In the past few months, however, I have rediscovered TiddlyWiki, which, despite its name, is a very serious piece of software.  Though written only in Javascript and HTML (to run in a browser), it is a serious software engineering effort, and has a large community doing active development of many different variants.  The purpose of this post is to provide a look at what TiddlyWiki has to offer, so as to assist potential new users (you!) in determining if it might be useful to organize information in their lives.

TiddlyWiki is a Wiki

This means in is accessed in a web browser that you already have on your computer.  It works best in Firefox, but can be used in Safari, Internet Explorer, and Opera as well.  TiddlyWiki works an whatever operating sytem you have.

This means it has “pages”, as seen in other wikis (like Wikipedia), except instead of entire webpages, TiddlyWiki uses Tiddlers: small (or not-so-small) chunks of formatted text about a particular topic.

This also means it uses hypertext, so you get both internal (links between tiddlers) and external (links to other websites) linking, as well as formatted text: bold, underline, italic, monospace fonts, numbered and bulleted lists.  The linking feature is critical, because it allows you to link tiddlers to each other in the same way the internet links webpages.  You can use this mechanism to organize your data as you see fit - whether that be in a hierarchy (like the files on your computer) or in some web that more closely resembles Wikipedia.  You are in complete control.

TiddlyWiki is Just a File

This means it resides on your hard drive or USB stick just like any other file.  You can copy it, give it to others, post it online, or carry it around with you just like any other file.  To open it, you just double cilck on it and it will open in your browser.

TiddlyWiki Supports Metadata

All I mean by this is that it allows you to tag tiddlers (pages) with their topic.  If you’re a programmer, you might tag tiddlers with the language they relate to.  If you’re a journalist, you can tag a tiddler with the article it relates to, or the source it came from.  Tiddlers can have as many tags as you like.

TiddlyWiki Can Search

As you grow the content in your TiddlyWiki (by adding new Tiddlers), you may start to worry that while it is really easy to add information to the TiddlyWiki, retrieving it may be difficult.  Where is that darn tiddler that contained the dentist’s address and office hours?  Even if you can’t remember what you called it (was it under “Dentist” or “Medical”?) so long as you used the word “dentist” anywhere in the tiddler, you can simply type “dentist” into the global search box, and, as you type, TiddlyWiki will search every word you have stored in it and show you all the tiddlers that contain that keyword.  If all you can remember is that you had the words “office hours” in the tiddler, just type that and you’ll track your data down in no time.

You can try this on the TiddlyWiki homepage (which is itself a TiddlyWiki!).  Upon visiting the page, you have no idea if there is any information on whether TiddlyWiki works on the browser available for the Nintendo Wii.  Simply type in the keyword “wii” in the search box and you can find out if there is any information on the Wii anywhere in the TiddlyWiki.

TiddlyWiki is Organized

In addition to full text search of all your data, TiddlyWiki can also list all the tags you’ve ever created and allow you to view lists of all tiddlers that match those tags.  You can then open the tiddlers you want from that list, or open all tiddlers that match a given tag.  At home, I work on schoolwork for my current class (csc576 - Connection Oriented Neworks), so to bring up my notes for that class, I just go to the tag I made called “csc576″ and request all tiddlers matching that tag be opened.  Similarly, at work, if I’m working on a project relating to JBoss, I can open all my notes on JBoss by selecting the tag I created called “jboss”.

In addition to tagging, TiddlyWiki offers a series of tiddler lists that are organized in various useful ways.  In addition to the tag list view (mentioned above), TiddlyWiki also offers a timeline of tiddler edits, allowing you to see what has been changing in your TiddlyWiki at a glance.  Can’t remember the name of the tiddler you were editting on Monday?  Just look at the timeline and it will tell you what changed that day.  In addition to the timeline, you may also find that you create tiddlers that are not referenced from anywhere else.  They may contain useful information (the VIN number on your old 1991 Toyota Corolla), but you won’t find them just by “surfing” through the TiddlyWiki.  Well, luckily, TiddlyWiki keeps track of all these “orphaned” tiddlers and can give you a list, so that you won’t lose them.  Conversely, sometimes a tiddler has a link to it, but it doesn’t yet exist.  TiddlyWiki also tracks these cases, and provides a list if needed.

Conclusion

Overall, I have found that TiddlyWiki has provided a sort of “sweet spot” in the field of tools to organize your data.  It is extremely easy to pick up and start working with, but has an extensive feature set that gives advanced users  a lot to work with.  It is open source and extensible, allowing sties that collect extensions for TiddlyWiki to spring up all over the web.  If you’d like to keep your own TiddlyWiki on your local machine or USB stick to carry with you, you can head over to the official site and download it.  If you’d prefer to keep your data online so you don’t have to carry it with you, a site called TiddlySpot specializes in providing free online TiddlyWikis (public or private!).  Just supply a site ID and password and you’re on your way.

Posted by Rick on the 10th of December, 2008 at 8:33 am under Intellectual Property.    This post has Comments.

It can be really difficult to tell the difference between a skillful troll and someone who really is that clueless.  For that matter, it can be really difficult to tell the difference between parody and reality, especially when one predicts the other.

That said, I honestly have a really hard time figuring out whether the HeliOS Project has been the victim of a troll or not when I read this.  This is very similar to Poe’s Law, which I myself have fallen victim to on occasion, though instead of being unable to distinguish parodies of fundamentalism and real fundamentalists, I cannot tell whether or not there are people who really think this stuff.  Anyway, I wish the best to HeliOS, it looks like they have their work cut out for themselves.

Blog of helios: Linux - Stop holding our kids back

Posted by Rick on the 21st of November, 2008 at 1:48 am under Programming.    This post has Comments.

Just ran across (via Proggit) an interesting blog post about Core War, a game I played a bit with on Linux a few years back in which programs compete for dominance inside a virtual machine.  Very neat idea - you program an agent that will go and fight for you in a virtual world.  Cool.

What I didn’t know was that A.K. Dewdney wrote an article about Core War in Scientific American in 1984.  This is the same A.K. Dewdney that authored one of my very favorite science fiction/philosophy books ever, The Planiverse.  It turns out it is sort of a small world.

Posted by Rick on the 25th of October, 2008 at 11:21 pm under Programming.    This post has Comments.

I mentioned a couple of posts ago about a technique for scoping state I used when I programmed in Scheme more often than I do now.  As I’ve been picking up Clojure as my hacking language of preference, I was surprised to find that it doesn’t support the same idioms.  In Clojure, any variables referenced when  a function is defined in a local scope are unbound when the function is returned to an outer scope.  This means the code I wrote in Scheme doesn’t have a direct analogue in Clojure.  This was a deliberate decision made by Rich Hickey, Clojure’s designer/author.  As he says:

If locals were variables, i.e. mutable, then closures could close over mutable state, and, given that closures can escape (without some extra prohibition on same), the result would be thread-unsafe. And people would certainly do so, e.g. closure-based pseudo-objects. The result would be a huge hole in Clojure’s approach.

The end result is that when you first pick up Clojure, if you try to close a function definition over a variable, you’ll find it is no longer bound when you leave the enclosing scope, and if you try to close over let-bound value (local), you’ll find that is immutable, and can therefore not be used to store state.  So how do we accomplish the same result as we did in Scheme?

You use Clojure’s shared transactional memory and create a mutable ref that is modified in a transaction to preserve thread-safety:

(let [z (ref 0)] (defn add [] (dosync (ref-set z (inc @z)))))

We still make use of local binding, but we bind z to a ref, which we can then modify inside of a dosync call.  Naturally, an arbitrary number of functions could be defined inside of the let that all shared a reference to z.  Once outside of the let, z is no longer accessible to the casual programmer, but the functions still have full access to it, creating a sort of data hiding/state-preserving closure that protects the “private” variable z.

I don’t use this idiom often, but when I want it, it’s exactly what I want, and avoids the need for more complex constructs to simply have a pair of functions share a variable that isn’t modifiable in other parts of the program.

Posted by Rick on the 23rd of October, 2008 at 10:38 pm under ublog.    This post has Comments.

My Emacs startup time was stretching beyond 10 seconds, which seemed a bit offputting even for a text editor cum operating system. I do have a fair amount of elisp loading at startup, however, and though I have played with byte-code-cache, I determined there must be an easier way to compile all my elisp files in one go, rather than calling byte-compile-file for each file in my elisp directory.  It turns out there is:

C-0 M-x byte-recompile-directory

will recursively traverse a directory stucture and compile all .el files encountered.  My startup time is back to 4 seconds.

Posted by Rick on the 23rd of October, 2008 at 1:08 am under Programming.    This post has Comments.

In Java, there are four scopes: public, package friendly, protected and private.  These are useful, but rather coarse.  It would be nice to be able to specify a variable that was only visible to a select set of methods.  For example, suppose you wanted to develop a counter that could not be modified without being accessed through methods that modified it.  In Java, this would be accomplished with an inner class that had two methods.  It’s interface would have to be declared publicly, though.  In R5RS Scheme, this would look like:

(let ((z 0)) (list (lambda () (set! z (+ z 1))) (lambda () (set! z (- z 1)))))

This defines two functions that each close over the variable z such that you must call the first to increment z and you must call the second to decrement z.  Because z is not defined at the top level, there is no way to access it and modify it directly — you must go through the functions.  This allows the functions to be a bottleneck to the modification of z.  So, if, for example, you wanted to log every time z was changed, you could add a logging command to each of the functions.  This approach allows arbitrarily fine scoping to be defined.  If you want a variable to be available to only one or two functions, it is trivial to do so using the “let” construct.

I’ll take a look at how you can accomplish similar results in JVM-based functional languages like Scala and Clojure in my next post.

Posted by Rick on the 19th of October, 2008 at 7:19 am under ublog.    This post has Comments.

It took me 30 years, but I woke up this Sunday morning and realized that rational numbers can be expressed as the ratio of two integers.

Posted by Rick on the 18th of October, 2008 at 7:01 pm under ublog.    This post has Comments.

By far the most brain-bendingly elegant definition of all the Fibonacci numbers I have ever encountered:

(def fibs (lazy-cat '(1 2) (map + fibs (drop 1 fibs))))

Found over an the clojure-euler wiki.