Archive for the ‘Programming’ Category
Programming Style
I’ve spent a lot of time trying to figure out the “best” coding style, environment, language, abstraction. One of my mentors over the years once turned to me as we were working on some code and, after I’d delivered a particularly vehement critique of the code I’d been working on, simply said “Rick, there’s no such thing as ‘perfect code’.” And that was it. I never knew him to be a man of many words, but he knew the craft better than any I’d ever met. It struck me at the time, but the simple truth of it has resonated with increasing impact over the years.
Part of it is that, from all the evidence I’ve gathered, I have to to conclude that different systems and methods and mental models work for different people. Such a relativistic view is somehow disappointing to the part of me that seeks universal truth, but the longer I spend programming, the more I realize that the methods that work the best for me (dynamic languages, REPLs, experimentation with many prototypes) not only aren’t appealing to some, they simply aren’t as effective as an IDE with code completion and an incremental compiler for them. It really is relative. Various artists choose different tools. Just as as some writers pen their manuscripts by hand, some use typewriters, and yet others use Emacs, different programmers choose the tools of their craft with equal discretion.
Perhaps it’s not all that surprising, but some part of me still wishes there were some ultimate answer to the art of computer programming.
Org-Mode Powered Literate Programming
One of my first projects to push to GitHub was my Emacs initialization files. Emacs is precisely as powerful as Neal Stephenson suggests, but I find that people are unwilling to use it because they find it difficult to configure to do what they want. I thought that I might help by not only making my Emacs configuration public, but by using Org-Mode and Org-Babel to make my Emacs initialization a literate program — one that doesn’t just have comments, but that is itself an executable document meant to be read by humans.
I completed the port this weekend, posted it to GitHub, and am making a weave of the file available as a static page here on Etherplex. I’ll be updating it as it changes, and hope someone finds it to be useful.
Code Moving to GitHub
I’m a big fan of distributed version control for a lot of reasons, but sites like GitHub and BitBucket really strike at the heart of the issue: real coding should be a social phenomenon, if for no other reason than code is really about sharing ideas, and code should be addressed to human beings rather than to a computer.
The self-hosted Mercurial repositories I used to host at http://code.etherplex.org are now gone, and the subdomain is now a redirect to my GitHub profile, where I’ll be migrating all my active and psuedo-active coding efforts. While I tend to prefer Mercurial, the community at large has spoken, and I’m happy to use Git for the majority of my work if it helps engage the community. So, head on over to Code Etherplex and check out what I’ve posted. Not much yet, but I’ll be pushing a lot of my in-progress work to GitHub in the next couple of weeks.
Low Learning Curve
The role of a command shell is to give you more control over whatyour computer does for you. Not everyone needs this amount of control,and it does come at a cost: Learning the necessary script commands toexpress what you want done… But if you find yourself using yourcomputer frequently enough, it is more than worthwhile in the long run.Any tool you use often deserves the time spent learning to master it.
Emphasis mine. Optimizing for a beginner is a huge mistake, in my opinion. Sure, if you want to make a lot of money in this early stage of computer development, when adoption isn't very high yet, then it makes sense to optimize the entire experience for beginners. But in the long haul, we will develop a population of experienced users, not beginners. And once the average users has not years, but decades of experience with computers, shouldn't we admit that any tool you use that much deserves the time spent learning to master it? And, given that you are going to master a tool, it makes sense to choose a tool worthy of your time. That should be the analysis new users should be making, rather than seeing how much they can get done in the first 15 seconds using the system.
So-called “modern desktop environments” converge on total unusability, and present-day mainstream graphical user interfaces in general are far less usable than they are praised to be. Usability simply does not equal low learning curve, and hiding system details from the user, as the Official Truth seems to be these days.
He was right. It's too bad he left Linux to develop closed software for Windows.
Speed Problems: C vs. Gambit Scheme
My Gambit Scheme code is taking 500 times longer than the equivalent C program, and I don’t know why.
So, there’s a really horrible way to spend processor time to calculate pi that involves taking the unit square and inscribing a circle inside it with radius 1/2. You then partition the square into N partitions on each axis (the code below uses 12,000). This gives you a unit square and places it on a grid with 12,000 on each side, for a total number of grid squares of 12,000^2 = 144,000,000. The algorithm then iterates through the center of each square and checks to see if it is inside of the circle. It sums the number of points inside the circle, divides it my the total number of points, and multiplies by 4, which produces an approximation of pi.
For those already initiated, this is like a deterministic Monte Carlo approach.
Here’s the C implementation (from the professor of my Architecture of Parallel Computers course):
#include <stdio.h> int main(int argc, char*argv[]) { int numPartitions = 12000; int circleCount = 0; double interval = 0, pi = 0; int i = 0, j = 0; interval = 1.0/(double)numPartitions; for (i = 0; i < numPartitions; i++) { double a = (i + .5)*interval; for (j = 0; j < numPartitions; j++) { double b = (j + .5)*interval; if ((a*a + b*b) <= 1) circleCount++; } } pi = (double)(4*circleCount)/(numPartitions * numPartitions); printf("Estimate of pi is: %10.8lf\n", pi); return 0; }
Here’s the (mostly equivalent, I hope) Scheme implementation:
(define (main args) (let* ((partitions 12000) (increment (expt partitions -1)) (count 0)) (do ((y 0 (+ 1 y))) ((= partitions y) (* 4 (/ count (* partitions partitions)))) (let ((b (* increment (+ y .5)))) (do ((x 0 (+ 1 x))) ((= partitions x)) (let ((a (* increment (+ x .5)))) (if (< (+ (* a a) (* b b)) 1) (set! count (+ 1 count))))))))) (display (string-append (number->string (exact->inexact (main '()))) "\n"))
OK, so, running these on my three-year-old Intel Core 2 Duo T7200 (under Linux 2.6.31), the C version takes 1.03 seconds, consistently. The Scheme version, compiled with Gambit Scheme 4.6.0, takes about 8 minutes 19 seconds. I am using a complied binary of the Scheme code, and Gambit was built for the machine I’m running it on, i.e. –enable-single-host was supplied to the configure script.
So, the point of this post is not really that Scheme is slow — in fact, I vastly prefer it. Rather, I’m looking for some input from people more knowledgeable in Scheme as to what I’m doing wrong here. I assume do is well optimized, but I’m I suffering because of the let* or nested lets? What’s killing my performance?
I’ll try to profile it, but I’m not sure how to do that in a way that traces back to the Scheme source after it’s been compiled and linked. The is the first time I’ve used Gambit (I used PLT in the past, and I’ve done quite a bit of work in Clojure on the JVM), so I could use advice from anyone that knows the C or Gambit environments.
Static Typing and Functional Languages
With a title like that, it’s not really clear where I intend to go with this post, but bear with me for a moment.
A couple of years ago, I got into functional programming, initially with Common Lisp (only a tiny bit), and then Scheme and Scala, a pinch of OCaml, and finally Clojure. One thing I noticed about all the Lisps was that they weren’t statically typed in the way I was used to in Java. Then I noticed that all the functional languages that were statically typed (OCaml, Scala) were also more modern than the Lisps.
Anyway, I continued working in functional languages for months, growing more accustomed to the power they offered, and changing how I thought about solving problems as I went. Design patterns seemed much less useful, and languages that didn’t support higher order functions, TCO, and the basic map, reduce, and lambda functionality started to feel increasingly awkward.
So, as I worked in Java for much of my professional work, I was naturally drawn to investigate Functional Java, a set of libraries that offered at least a partial veneer of functional programming on top of the Java language. Part of that library was a testing framework inspired by QuickCheck called Reductio Test. Reductio was cool, and the examples looked simple and powerful. Then I tried to write my own generator for Reductio.
The code was awful. It was an explosion of objects attempting to replicate higher order functions, but constrained by Java’s type system. As such, the types of the higher order functions were generified on types of lower-arity higher order functions, and so on, until finally all the parameters had been consumed. I thought I must be doing it wrong, so I browsed the Functional Java source tree over on the Google code site where it is hosted. In Function.java, I saw the same problem emerge:
/**
* Curry a function of arity-5.
*
* @param f The function to curry.
* @return A curried form of the given function.
*/
public static <A, B, C, D, E, F$> F<A, F<B, F<C, F<D, F<E, F$>>>>> curry(final F5<A, B, C, D, E, F$> f) {
return new F<A, F<B, F<C, F<D, F<E, F$>>>>>() {
public F<B, F<C, F<D, F<E, F$>>>> f(final A a) {
return new F<B, F<C, F<D, F<E, F$>>>>() {
public F<C, F<D, F<E, F$>>> f(final B b) {
return new F<C, F<D, F<E, F$>>>() {
public F<D, F<E, F$>> f(final C c) {
return new F<D, F<E, F$>>() {
public F<E, F$> f(final D d) {
return new F<E, F$>() {
public F$ f(final E e) {
return f.f(a, b, c, d, e);
}
};
}
};
}
};
}
};
}
};
}
I was stunned. What had gone wrong? After a few minutes or hours, I realized the reason Lisp had used dynamic typing: having to explicity specify all the types for each of the higher order functions destroys the readability of the code! So, languages like Javascript, Lisps, Python and Ruby get too avoid all that. So how do OCaml and Scala work?
Well, a lot of work had to be done to support static typing with functional programming, and it required the development of a type inferencer to make code readable. The end result is a sophisticated compiler that has the ability to determine the types in the code without the programmer specifying them exhaustively. This functionality is advanced, however, and it took a long time to develop. That is why OCaml and Scala are much newer that Lisp, for example (though it is true that OCaml and Python are about the same age).
So, suddenly it all came together for me. I finally had a glimpse as to how the type systems interacted with the programming paradigms, and why languages made the decisions they did. Type inferencers can be tricky to write well, so there is a trade space language designers have to navigate to get the result they’re looking for. Dynamic typing offers simplicity, but also comes at the cost of rigor and the ability to catch entire classes of bugs at compile time, rather than runtime. Sometimes that cost is worth it, sometimes not. That is why, even today, there is room for dynamic languages like Python and Ruby alongside static languages like Scala and OCaml.
Re-thinking the Tabs Model in Modern Browsers
TL;DR version: Modern browsers need to learn a lesson from Emacs and keyboard launchers and provide an interface for tab-switching that is keyboard search-as-you-type based. Firefox is the best browser to implement this on, since it is very extensible.
The Extensibility of Firefox
Mozilla recently hosted a Summer Design Challenge for 2009 that focused on “Reinventing Tabs in the Browser”. Firefox is a fabulous platform to host such a challenge on, as it is the most extensible of all the browsers currently available. Let me digress into why I say this quickly before I get into my main point (which is about tabs!)
Mozilla itself has gone to great length to expose the browser internals by making the browser open source, which allowed for modification. Next, they provided a framework for developers to write add-ons for Firefox, which provided a robust architecture for extension. The primary distinction between modification and extension is that two modifications to a piece of software may well be mutually exclusive, while two extensions are designed to work side-by-side. These two aspects of Firefox’s feature set have been well known since its release, and the add-on (extension) community has been very active since Firefox’s release.
But, in recent months, Mozilla has released two extensions that make the process of extension itself even easier. One is called Ubiquity, which provides a command-based interface to the browser and web, and allows for it to be extended by writing Javascript in its configuration interface (in a browser window with the address “about:ubiquity”). This type of extension caught on quickly with fans of command-based systems like Launchy and QuickSilver, (and later, Google’s Quick Search Box, authored by the developer responsible for Quicksilver, also only available for Mac) but was otherwise relegated to a niche of the Firefox user base. One of its primary features was the novel mechanism by which Ubiquity could be modified: simply visiting its configuration interface (inside the browser itself), and typing into a text box. That was it. No saving, reloading or compiling. As soon as the code was typed, it was “live”, and immediately available within the Ubiquity launcher. The distinction between “development” and “use” was almost non-existent.
This feature was so compelling that it deserved its own extension, independent of the concept of a command-based launcher embedded inside Firefox. Whereas Ubiquity provided an API centered around the user interface of Ubiquity, it offered little to modify aspects of Firefox outside of the Ubiquity interface. To fill this more general need, JetPack was born, also from Mozilla Labs. JetPack had all the quick-development goodness of Ubiquity, but without the extra baggage of a command-based interface. JetPack was designed to allow extensions for Firefox to be written quickly and simply, and distributed on the web with equal ease. Though still young (JetPack has only been released for a couple of months), it provides a polished interface for development. Its internal browser-based editor for coding is based on Mozilla’s own Bespin, and provides all the immediacy of Ubiquity development with extra goodies like line numbering and syntax highlighting thrown in.
So, hopefully at this point you understand why I assert that Firefox is the world’s most customizable browser. My concluding thought on this topic is that this is all part of Mozilla’s effort to work towards a browser that is designed to be restarted only rarely, is extensible while it runs using a built-in interface, and allows users to change their working environment as easily as developers, because no special tools are required. This vision is not entirely new; if you are familiar with Lisp-based environments, the REPL, or GNU Emacs, all this will sound very familiar. So, it would seem Mozilla is learning well from successful projects in the past.
Why Tabs? Why Not “Buffers”?
Which brings me to my main point (sorry it took so long!) As Mozilla hosted the design challenge to reinvent tabs in the browser, many teams with a wide variety of approaches sought to provide meaningful improvements to the existing interface. Every team did research with users by watching users use Firefox, and/or interviewing users about their browsing techniques. Every team noticed one glaring deficiency in the existing tab interface: it doesn’t scale. Any user of Firefox that has tried to manage 50 tabs has had trouble. Even the interface has trouble: it tries to show many of them, and finally gives up, reverting to a scrolling tab bar. So, when asked to improve the existing interface, every team tried to improve the tab interface’s scalability.
What I found so surprising as I reviewed the entries was that, in a community that has learned so much from Lisp and Emacs, most entries completely ignored the notion that a keyboard-based navigation system scales much better than a mouse-based system. The trade-off, of course, is learning curve and prior knowledge. We can attribute the meteoric rise in keyboard launchers across all operating systems to “Start Menu Clutter” – the simple problem that, when you install a few hundred programs, each wanting its own start menu item, quick launcher, and desktop shortcut, you rapidly drown in a mosaic of brightly colored icons, all alike. But, if you know what you want (“Firefox”, “Word”, “Emacs”, or “iTunes”), you can use a keyboard launcher, type a few letters, and avoid the clutter entirely. Such systems prohibit “browsing” of available programs, but for most users most of the time, the desired program is known in advance, rather than discovered by searching the start menu.
One entry provided effective keyboard navigation (it was called “Collapsible Tab Groups“) and it provided good scalability. In particular, it provided two pieces of functionality that should be present in more software today, one of which comes straight from Emacs.
First, (this is the non-Emacs feature), it puts the UI elements on the side of the screen, rather that the top or bottom. Why is this so good? Most screens are widescreen. Heck, even the 1600×1200 screens are wider than they are tall, but even those are going out of style to favor wider aspects (I’m typing this on a 1680×1050 display). So, given the fact that displays are wide, but most material we read is tall (think word documents, most PDFs, and almost every web site), do developers insist on sticking every interface bar on the top or bottom of the screen, further constraining the already-limited vertical dimension? Well, collapsible tabs groups gets it right, and puts the tabs on the side.
Second (this is the Emacs feature), it provides “search as you type” for navigation. What does this mean? It’s the kind of behavior you get in desktop keyboard launchers. If you type “irefox” or even “ire” into Launchy or QuickSilver, you will get “Firefox” as the launch option because that substring matches the “ire” in “firefox”. This is the feature Emacs uses to manage buffers with ido-mode. You open the interface to change buffers (files, roughly) in Emacs and it presents you with a small interface that lists just a couple of the (possibly tens or hundreds of) buffers you can switch to. As you start typing, each character is added to the search criteria as you type, and the list is refined to reflect the relevant buffers.
This approach works perfectly in a browser. The problem with Mozilla’s challenge is that they asked people to reinvent tabs; the realization is that we don’t need tabs. “Tabs” implies a user interface, which should be customizable. What we really want is a model for navigation, and a set of pluggable user interface elements that attach to that model. One user interface might be tabs (for users that keep 4 or 5 websites open at a time), but another could be a search-as-you-type mode that doesn’t use any screen real-estate when it is not in use. There has been a widespread conflation of view (the tab interface) and model (the idea that we have multiple web sites open) in every browser.
The obvious answer is that I should get in gear and write a keyboard-based tab switching interface using JetPack (Ubiquity has one, but it is painfully slow to use for a variety of reasons). I don’t know…maybe I just convinced myself that I should.
Software as an Investment in an Ecosystem
Here’s the thing about using commercial (proprietary) software: you are not buying software, or even software-as-a-service. You’re buying into an ecosystem. Let me quickly provide examples, rather than pontificating endlessly.
Take the iPhone. I really like the iPhone. A good friend of mine just bought a shiny new iPhone 3G S. She was moving from Verizon, and had used their oh-so-handy Backup Assistant to backup her addresses, phone numbers, emails, etc. The problem was that Verizon doesn’t really have a way to get the data out of their system. So, it’s great so long as you stick with Verizon, but as soon as she wanted to switch to AT&T and grab her contacts, she had a problem. (For the curious, I actually extracted the data from her phone, a Motorola, with BitPim, exported from BitPim to vCards, and imported from vCards to Apple’s Address Book, at which point I recommended taking the same vCards file and importing it into Google Contacts as a nice central storage point that does allow you to export.)
The point here is that she invested the time to input roughly 250 contacts into a proprietary system but didn’t have 1) an exit strategy or 2) a plan to keep the system forever. This is common – when you invest a ton of time in a piece of technology (either by inputting data to it, or learning it) you need to have a plan for what happens to your data (or your skill) if that tool goes away for some reason. Google Contacts was a good replacement: it is accessible via the web and exports to three different (plain-text) formats that are widely used and can be backed up locally.
Second example: investment of time in learning a proprietary piece of software. Another good friend of mine is very interested in photography and image editing. So, he, being motivated, took a class to learn Photoshop and even got the educational discount on Photoshop when he took the class. The $500 or so isn’t that big of deal, but coupled with the hundreds of hours learning the tool, starts to become a major investment of the only two things we have in life: time and money.
Here’s the problem: the one thing you can be sure of is that you will get a new computer. People like me upgrade more often than most (every 18-24 months in my case, and I’m not even a gamer), and probably own more regularly-used computers than most (desktop, laptop and netbook). So the problem becomes apparent: If I invest a few hundred dollars and hundreds or thousands of hours in learning Photoshop (or Excel, or whatever), I have to worry about having that tool wherever I am (on my desktop at home, on my laptop at work, on my netbook at the airport) and whenever I am (today, a year from today, 10 years from today). And that is when the cost mounts: you are willing to spend lots of time to learn how to use the tool, but that investment represents a hook. Now, you have an enormous incentive to continue spending money to buy new versions of the software for all your machines that span the space and time you occupy.
So, when my good friend bought a new computer, he had the problem of getting a new copy of Photoshop for it, because he was now heavily invested in the ecosystem surrounding the tool. He knew how to use the tool, but he also relied on the vast pool of resources on the web related to Photoshop: tutorials, plugins, templates and interoperability. So, the investment isn’t just his time, his money, and his skill, but also an expectation that certain associated resources surrounding the product will be there for you, as a user. This is the product’s ecosystem.
I’m not saying you shouldn’t invest in proprietary software systems; I’m saying you should do so with your eyes open to the true costs of the investment – don’t just look at the ecosystem that the iPhone has and think “Everyone has one…and it has 100,000 apps, it’s the way to go!” Recognize that your are now necessarily going to be compelled to use all the associated iPhone tools, and you will invest yourself in learning how to use them. Many commercial software vendors recognize this effect: there is a good reason that Microsoft gives away most of their products freely to students. These companies realize that your willingness to spend the time to learn new tools will trump your desire to hang on to your hard-earned dollars. That is the strategy to keep the money flowing.
So, what can you do when you see this happening to you? You can change your expectations. You can treat the commercial software as a single point in a space of solutions to your needs, and keep an open mind about switching. Learn the tool from the perspective of approaches it offers to solving problems, and try out other (perhaps Free) tools with which the same approach can be applied. In the case of Verizon, you could look at Google Contacts, Plaxo, Thunderbird, or Apple’s Address book. In the case of Photoshop, you can, depending on your needs, look at The Gimp, Picasa, Paint.NET, iPhoto, Paint Shop Pro, or Aperture. But, when you pick a solution, recognize that your computer and your needs will change, and be prepared for it.
Use of Mutable Data Structures in Clojure
One of Clojure’s biggest strengths is that it is backed by the JVM, and has good interoperability with the Java libraries. When I needed to implement a simulation with events that should be executed in order of their timestamp, I was immediately tempted to use Java’s PriorityBlockingQueue, since neither sorted-map nor sorted-set supported two events with the same time stamp (though I could have used some magic to make it work, but decided simplicity was more important).
I was generating events with several types, and under certain circumstances, events that mapped :type to :arrival needed to be removed because they were no longer valid given the state of the simulation. Easy enough:
(filter #(not= (:type %) :arrival) event-queue)
But, PriorityBlockingQueue is mutable (oh, and I’d put it in a ref, to make things more interesting), so I’d need to clear it after I did the filter, before repopulating it with the filtered events. This is a simpler version of the final code:
(let [elements (filter #(not= (:type %) :arrival) @event-queue)]
(.clear @event-queue)
(.addAll @event-queue elements))
So, when I ran this, I kept getting NullPointerException. Clojure is not well known for its uber-informative stack traces, so I flailed with this for a few hours, looking at it on and off while working on other parts of the code. Then it hit me: filter is lazy, so the filtering of the elements is not happening until I’m calling in the filter when I make the addAll call. By that time, @event-queue is empty, so the list of results is nil, which will throw a NullPointerException when passed to the addAll call. The easy way to fix this is to wrap the call in a doall function, which forces the computation:
(doall (filter #(not= (:type %) :arrival) event-queue))
The moral? Even though Clojure has good support of Java collections, be wary when you use mutable data structures with the standard Clojure seq functions; many of them are lazy, and unless you know they won’t change, you can get subtle temporal errors creeping into your code. I got lucky that addAll threw the exception. If it hadn’t my simulation would have ended and I would have spent a lot more time trying to figure out why.
Beauty in Programming: Part I
Introduction
When I program, I sometimes am fortunate enough to see beauty emerge from what I do. I love programming because of this, but, enthusiastic as I am about it, I find that it is very hard to convey the essence of what I see to others, which often means that they cannot understand what I am so enthusiastic about!
So, I would like to discuss one of my favorite idioms in programming, but before I do so, I would like to discuss something you might be familiar with: organizing information. In the context of information organization, I think my example makes sense, and the beauty arises, and I don’t have to discuss programming at all. This is ideal, because the beauty in programming has nothing to do with programming at all: it has to do with patterns of thought. But I’ve already delayed too long; off we go!
Organizing Information: Hierarchies
In our struggle to store and retrieve information, we have developed systems to help us organize it all. Probably the most prevalent idea in these systems is the notion of a hierarchy. We see this everywhere, from filing cabinets, to file systems on a computer, to organization of library books, to notes taken in an outline form, to the system for classifying life on Earth, and even in this very essay. The idea is simple: establish a set of top-level categories. Each of these can contain either more categories, or items. Thus, any given item has a location in the hierarchy that is defined by all of its parent categories.
This idea of classifying information in a hierarchy is so pervasive that it comes completely naturally to our way of thinking. It has three main strengths:
- It allows us to locate information using a broad idea, and narrowing that idea until we finally locate the information we seek
- It allows us to easily add new information in the structure, because we have a pretty good idea of where things should go
- It allows us to reorganize the ideas, because we can take sub-categories and change their parents as necessary
But it has one major weakness: if a piece of information has multiple facets, it can really only exist in one place in the hierarchy, so users of the system are forced to choose. There is a rich history of developing add-on aspects to hierarchical systems to fix this weakness. In computer file systems used by Linux, Solaris, BSD or Apple’s OS X, for example, a user can establish links in the file hierarchy to allow a file to exist in multiple locations at once. This would be a bit like putting a yellow sticky note in a filing cabinet in the “Hobbies” section that says “Go look at the Automobiles section for info on my hobby sports car.” This works fairly well, but an unsuspecting user might remove the file on the sports car after it got totaled, only to find that little sticky later. Meanwhile, if your brother looked under “Hobbies” to find info on the sports car, he’d be confused when he couldn’t find it where the note indicated. So, it makes maintenance of the system a bit harder, and can confuse users.
Organizing Information: Tags
Nevertheless, hierarchical systems have such great advantages that they are still the most prevalent approach to solving the problem of information organization. Recently, however, another notion has come along: tagging. Tagging is a simple system: each item of data can be assigned a tag, or keyword, that describes some broader category to which it belongs. Each item can have an unlimited number of tags. Going back to our example, the information on the sports car could be tagged “Automobile” and “Hobby”.
This mechanism for organizing information is used in several well-known environments: Google uses it in their bookmarks implementation, Facebook uses it to organize photos and notes, Flickr uses it for photo organization, and many news sites (like Digg and Slashdot) tag news stories. The advantages are obvious: tags allow a fragment of information to exist in multiple places in the hierarchy. But wait, where is the hierarchy now?
The first question many newcomers to tagging ask is how to really organize things. As soon as you try to establish a tag hierarchy, you start to notice the problem. Suppose you had a tag “Cars” and then another for “Sports Cars”. If you wanted to add some information about the discontinuation of the Dodge Viper into your system, what tags would you add? Both “Cars” and “Sports Cars”? Or maybe just “Sports Cars”, since that implies “Cars”? But then, when someone looked for all information on “Cars”, the info on the Viper wouldn’t show up, which doesn’t seem right. Then again, is this system going to require that we add every possible tag when we add information? That seems cumbersome.
So, in exchange for solving a problem, tagging introduces another. We can now freely associate bits of information with tags, but the tags have no hierarchy. When we look back at the strengths of a hierarchical system, we see that we can still locate information using a broad idea. If we look at all information tagged “Cars”, we can then see that many of those items are also tagged “Sports Cars”, so we develop a sort of emergent hierarchy. This concept is often referred to as a “tag cloud“. So, at least tagging systems allow to locate information in much the same way as we could using a hierarchy, and they have the added bonus of allowing us to file data in multiple places.
We’ve said that hierarchical systems are good at adding data. Unfortunately, adding information to a tagged system can be difficult. For each item we want to add to the system, we have to identify all relevant tags when we insert the information. This means not only knowing all the relevant tags in the system, but also anticipating whether any new tags are needed. This could be even more work that creating the data we wanted to insert in the first place!
Finally, we’ve said that hierarchies are good because they allow us to reorganize information. This can also be difficult when we use tags. In addition to collecting cars, we also start collecting motorcycles. So, we’d like to introduce the notion of “Vehicles” to our system, and have it apply to all of our cars, along with the new motorcycles we’ll be adding. In a hierarchy, you’d simply create the “Vehicles” category and put “Cars” into it. With tags, we have to go back and add the tag “Vehicles” to all items tagged with “Cars”. Laborious, to say the least!
Are tags really that bad, or have we missed something? Do we need to build some tools to automate all the cumbersome aspects of using tags? Or is there some underlying truth here that we have failed to recognize?
“Simplicity does not precede complexity, but follows it.” -Alan Perlis
In many tag systems used on the internet, the problems I’ve outlined above remain. The main reason is that it is difficult to formulate elegant systems to address these problems. The primary job of a computer programmer (whether they know it or not!), is to take problems like the one I described above and try to find the underlying model that should be used. You might get an itch in the back of your mind that something is wrong with what I described: tags seems like they should work, but for some reason are difficult to use.
Your first instinct may be to write a program that will automate the more cumbersome aspects of using tags. For example, in the case of adding a “Vehicles” tag, you could just write a program that would grab all the items tagged “Cars” and add “Vehicles” to them. Seems simple enough, and saves work. But it isn’t beautiful. Over time, you’d write a lot of tools to help you perform tasks like that, and the system would become complex. We wanted a simple system when we started using tags, not something that would be hard to understand and use. The key question that is always on my mind is: How can we retain simplicity?
In the systems we’ve been discussing, we have defined two elements. In the hierarchical system, we defined categories and items. In the tagged system, we defined items and tags. What if we tried to define the system using only one element? What if, in the tagged system, we defined tags as items, and items as tags? How would this change its utility?
Such a change would mean that items could be tagged (as before), but also that tags could be tagged, since they are also items. Like many subtle changes, it is hard to see how this small modification would be useful. In fact, it allows us to enjoy the best aspects of both a hierarchical and a tagged approach to organization, and it means the entire system is only composed of one element: a data item. What we’ve done is change how these items relate to one another, and that enables new patterns of thought and organization.
Going back to the example of the sports car that is both a “Car” and a “Hobby”, we can see that this system would support both through the normal use of tagging. But what about our example of “Cars” and “Sports Cars”? Well, in our new system, as soon as we create the data on the Dodge Viper being discontinued, we can simply tag it with “Sports Cars”. Why don’t we have to worry about tagging it “Cars” as well? Well, as soon as we create the tag “Sports Cars”, we tag it with “Cars”. So, now we have embedded the knowledge that “Sports Cars” and “Cars” are related in the tags themselves, rather than in each item using those tags.
Using this idiom, it is easy to see that hierarchies naturally emerge from the structure, but can be more complex than simple trees, due to the nature of tagging. Now, tags can relate themselves to multiple other tags at different points in the tag hierarchy, as can items, because items are tags. Likewise, if you want to provide some notes on a tag, you simply add data to it, because the tag is a data item. Reorganization of the hierarchy is easy – if you want to introduce the notion of “Vehicles”, simply tag “Cars” and “Motorcycles” with “Vehicles”, and you’re done. All items with those tags (or tags tagged with those tags, etc.) will now be tied to the notion of “Vehicles”.
This idea, though simple, may be confusing at first. Here’s another way to think about it: instead of tagging items with tags, we tag items with other items. So, if we’re in a filing system, we tag files with other files. If we organize photos, we tag photos with other photos. In reality, we have reduced our system of organization to its very simplest form by allowing ourselves only one action: we can establish connections among various members of a homogeneous collection of data. Now, each member of that collection will ultimately have both data, and connections to other pieces of data.
If this sounds familiar, that’s because it is: it is the exact methodology for organizing the internet. Web pages sometimes contain data, and they sometimes contain references to other webpages. Most webpages have both, even in walled gardens like Wikipedia.
Conclusion
By reducing the number of elements in our system, we increased its power, but retained its simplicity. Our new system has all the benefits of the hierarchical system, as well as supporting all the benefits of tagging.
While I decided to write this and relate the ideas to programming, I got the idea of “data as tags” and “tags as data” from TiddlyWiki, which I used to compose this, as well. If you want to see the system I described in action, visit the TiddlyWiki homepage and download a copy and play with the tagging system: it is an elegant and powerful approach to managing lots of information.