A few weeks ago, my wife and I visited the National Library of Spain on a guided tour. It was quite boring, which didn't come as a big surprise. But one thing caught my attention: it was fascinating to learn how they organize their vast collection. One would expect them to order the books and other media by release date, genre or author. But because space in the historic building is limited, what they actually do is sort them by physical size!
Real-world file systems
The library then uses paper catalogs to find a book by author, date or other categories. Library staff locate the desired item on a shelf or in a drawer identified by a unique number. Of course, most of this file system is already computerized.
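In software terms, the library separates storage order from lookup: the shelves are sorted by size, and the catalog is a set of secondary indexes pointing to shelf numbers. Here is a minimal Python sketch of that idea, assuming a toy collection where a book's shelf number is simply its position after sorting by height. The `Book` fields, the helper names and the heights are all made up for illustration; only the titles and years are real.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    year: int
    height_cm: float  # physical size: the only thing that decides placement


def shelve(books):
    """Store books by physical size; a book's shelf number is its position."""
    return sorted(books, key=lambda b: b.height_cm)


def build_index(shelves, key):
    """Build a catalog (a secondary index) from a key to shelf numbers."""
    index = defaultdict(list)
    for shelf_number, book in enumerate(shelves):
        index[key(book)].append(shelf_number)
    return dict(index)


# Real titles and years, made-up heights.
books = [
    Book("Don Quixote", "Cervantes", 1605, 30.0),
    Book("Novelas ejemplares", "Cervantes", 1613, 18.5),
    Book("La Celestina", "Fernando de Rojas", 1499, 22.0),
]
shelves = shelve(books)
by_author = build_index(shelves, key=lambda b: b.author)
by_year = build_index(shelves, key=lambda b: b.year)

# The catalog, not the shelf order, answers "where are Cervantes' books?"
for number in by_author["Cervantes"]:
    print(number, shelves[number].title)  # 0 Novelas ejemplares, then 2 Don Quixote
print(shelves[by_year[1499][0]].title)    # La Celestina
```

Adding another way to search (say, by publisher) only means building one more index; no book has to move.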
In many other libraries, however, classification and the media's physical location are intertwined. Usually there are designated bookshelves or departments for specific genres (e.g. biographies, sci-fi, …), and within these shelves the books are ordered by the authors' surnames. This system has an obvious downside: a given book can only be in one place at a time. What if a book combines two genres?
Some computer programs are even more old-fashioned
A similar problem occurs in traditional computer applications. In Microsoft Outlook, for example, emails are organized in folders. Besides the default folders like "Inbox", "Sent" and "Outbox", you can define your own. However, one email can't be in "Inbox", "Things to do" and a folder named "Related to my blog" at the same time.
One way to overcome this problem is the concept of tagging. Instead of dividing the data into folders, all items reside in the same big pool, just like in the National Library. You then organize the data items by attaching indexed tags to them. Gmail is one example of this concept. Google's web-based alternative to Outlook keeps the familiar concepts of "Inbox" and "Outbox", but those are just more tags, or "labels" as Google calls them. You can easily list all emails with a given tag. The crucial difference between folders and tags is that an email can have as many tags as you want, but can only be in one folder.
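The difference between folders and tags is really a difference in data structure: a folder is a one-to-one mapping from item to location, while tags form a many-to-many relation. Here is a minimal Python sketch of a tag index, assuming made-up message IDs. The `TagIndex` class and its methods are hypothetical and say nothing about how Gmail actually stores labels. Finding items that carry several tags at once is simply a set intersection.

```python
from collections import defaultdict


class TagIndex:
    """All items live in one pool; tags are just an index on top of it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)  # tag -> ids of tagged items
        self._tags_by_item = defaultdict(set)  # item id -> its tags

    def tag(self, item, *tags):
        """Attach any number of tags to an item (a folder allows only one)."""
        for t in tags:
            self._items_by_tag[t].add(item)
            self._tags_by_item[item].add(t)

    def untag(self, item, tag):
        """Remove one tag; the item itself stays in the pool."""
        self._items_by_tag[tag].discard(item)
        self._tags_by_item[item].discard(tag)

    def items_with(self, *tags):
        """Return the items carrying *all* given tags (a set intersection)."""
        if not tags:
            return set()
        result = set(self._items_by_tag.get(tags[0], set()))
        for t in tags[1:]:
            result &= self._items_by_tag.get(t, set())
        return result

    def tags_of(self, item):
        return set(self._tags_by_item.get(item, set()))


mail = TagIndex()
mail.tag("msg-1", "Inbox", "Things to do", "Related to my blog")
mail.tag("msg-2", "Inbox")
mail.tag("msg-3", "Sent", "Related to my blog")

print(sorted(mail.items_with("Inbox")))                        # ['msg-1', 'msg-2']
print(sorted(mail.items_with("Inbox", "Related to my blog")))  # ['msg-1']

# "Archiving" a message is just removing its Inbox tag.
mail.untag("msg-2", "Inbox")
print(sorted(mail.items_with("Inbox")))                        # ['msg-1']
```

Keeping both directions of the mapping makes "which items have this tag?" and "which tags does this item have?" equally cheap, at the cost of updating two dictionaries on every change.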
Many modern desktop and web applications follow the same concept. Apple's iTunes and iPhoto, for example, hide the physical location of the data records they manage (songs, videos and photos) from the user and organize them in playlists or galleries. This blog itself is at the forefront of modern web development, too! The tabs you see at the top ("Jazz", "Technology"…) work like tags: if I came up with a post that covered the intersection of both topics (and I surely will), it would be listed under both.
Taking it a step further: tag-based filesystems?
Here is something I find worth thinking about: are all common filesystems outdated because they use rigid hierarchical folder structures? I know, Unix-like operating systems have hard links, which can serve as a workaround for having one file at two different directory locations. But that's kind of clunky, isn't it? Will there ever be a real tag-based filesystem, where files can have multiple tags instead of being trapped in a folder?
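For the curious, here is roughly what the hard-link workaround looks like, as a small Python sketch assuming a Unix-like system (Linux or macOS). It creates one file in a "Jazz" directory and a second name for the very same file in a "Technology" directory via `os.link`. The directory and file names are made up for the example.

```python
import os
import tempfile


def demo_hard_link():
    """Put one file into two 'folders' using a hard link (Unix-like systems)."""
    with tempfile.TemporaryDirectory() as root:
        jazz = os.path.join(root, "Jazz")
        tech = os.path.join(root, "Technology")
        os.mkdir(jazz)
        os.mkdir(tech)

        original = os.path.join(jazz, "post.txt")
        with open(original, "w") as f:
            f.write("A post about jazz and technology\n")

        # A hard link is a second directory entry for the same inode.
        alias = os.path.join(tech, "post.txt")
        os.link(original, alias)

        link_count = os.stat(original).st_nlink
        same_file = os.path.samefile(original, alias)

        # Appending through one name is visible through the other.
        with open(alias, "a") as f:
            f.write("edited via the Technology folder\n")
        with open(original) as f:
            line_count = len(f.read().splitlines())

        return link_count, same_file, line_count


print(demo_hard_link())  # (2, True, 2) on Linux or macOS
```

It works, but it shows why it feels clunky: hard links cannot cross filesystem boundaries, and there is no direct way to ask "in which folders is this file?" short of scanning the whole tree for matching inode numbers. A tag index answers that question with a single lookup.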
Such a filesystem could supersede programs like iTunes and Windows Media Player, each of which reinvents the way files are organized in its own fashion.
What do you think? Would that make sense?
Image source: Antonio Garro


What I also found quite cool is that the library archives all published "sound documents", which means LPs, cassettes, CDs, CD-ROMs, videos and so on.
I've just found a related discussion on an Ubuntu forum:
http://brainstorm.ubuntu.com/idea/9560/
and even a work-in-progress project that implements a tag-based filesystem:
http://www.tagsistant.net/