Organizing information: from rigid structures to the lack of information organization

Arseni Mourzenko

Founder and lead developer

April 29, 2015

Tags: tagging, data-structures

When organizing information such as files, two problems arise every time: uncertainty about whether the current structure is appropriate, and entries which fall outside the structure.

Those two problems are closely related and lead to similar results, such as the “Miscellaneous” directory which contains everything that didn't find its place elsewhere. Some of those entries are truly outside any conceivable structure given their specific nature, their particular content, or their origin, but when the initial structure is badly designed, valid files can end up in the “Miscellaneous” directory as well.

There are numerous examples of that. When sorting photos, one can try to group them by events, such as “Trip to London” and “Visit to the Louvre museum.” Quickly, the person will find that many photos, while valuable, are outside any particular event, such as a single photo of a strange bird noticed on the balcony the other day. The person can instead try to group the photos by date: those were taken on the 15th of July, while these were taken on the 18th of July. Such an organization is not only of little use, but also creates a distinction where none should exist. For example, it doesn't make sense to split photos of a celebration which started in the evening and continued through the night into two directories.

Bad structures always lead to many entries which are put into the “Miscellaneous” directory, or into a directory where they feel out of place. Tree-based structures lead to a high number of such cases nearly every time (actually, I've never seen a case where a tree-based structure worked well to sort uniform resources). This is caused by the mutually exclusive nature the tree structure imposes: the fact that a leaf or a branch belongs to one and only one branch is a major obstacle to the organization of information. The worst example I have in mind is the password manager we use internally, where most passwords ended up in the “Miscellaneous” folder, but every other usage of a tree was harmful too.

Tagging helps. By removing the mutually exclusive nature of a tree, tagging solves the tree's biggest problem: the situation where a leaf may find its place in several branches at once. Actually, there are two underlying cases which are problematic with a tree: the case where a leaf could be placed in any of several branches, none being more appropriate than the others, and the case where a leaf should be in several branches at once.
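The difference between the two models can be sketched in a few lines of Python. This is only an illustration: the file names, directory names, and tags are hypothetical, and the dictionaries stand in for whatever storage a real system would use. In the tree, each file maps to exactly one directory. With tags, each file maps to a set of labels, so a photo of a pigeon taken in London can be both a "bird" photo and a "London" photo.

```python
# Tree: each file lives in exactly one directory (hypothetical names).
directory_of = {
    "bird.jpg": "Miscellaneous",
    "big_ben.jpg": "Trip to London",
    "pigeon_trafalgar.jpg": "Trip to London",  # or should it go with the birds?
}

# Tags: each file can carry any number of labels (hypothetical tags).
tags_by_file = {
    "bird.jpg": {"birds", "balcony"},
    "big_ben.jpg": {"london", "trips"},
    "pigeon_trafalgar.jpg": {"london", "trips", "birds"},
}


def files_with_tag(tag):
    """Return the sorted names of all files carrying the given tag."""
    return sorted(name for name, tags in tags_by_file.items() if tag in tags)


print(files_with_tag("birds"))   # ['bird.jpg', 'pigeon_trafalgar.jpg']
print(files_with_tag("london"))  # ['big_ben.jpg', 'pigeon_trafalgar.jpg']
```

The pigeon photo shows up under both tags, which the tree cannot express without duplicating the file.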

Unfortunately, tagging doesn't solve all the problems. There are still entries which are completely outside the system. For instance, this blog article, by its nature, couldn't find its way into the preexisting set of tags. By creating an additional tag, I hid the problem, but the fact remains that the article is too different to fit into tags intended for software production articles.

When an entry is too special, either the entry should be shaped in a form which makes it possible to fit into the preexisting system, or the system itself should change.

This leads to the primary recommendation of this article: always design both the organization structure and the structure framework so that changes are as painless as possible. This means that:

A culture of change should be present. The system shouldn't be perceived as rigid, which often means that the system shouldn't be designed by the community leader, but by several non-lead members of the community. Stack Exchange did it well with its tagging system, where anyone (given enough reputation) can create tags. Removal of tags is a different story.
The framework should facilitate changes: renaming or removing tags, merging tags, or mass-assigning a tag to entries should be possible and, more importantly, easy.
The structure should be at the service of the users and the actual content, not the other way around. Unfortunately, I've seen many cases where information is forced into a rigid schema, and users spend too much time trying to figure out where they should put a specific entry.
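As a rough illustration of what "easy" could mean for the framework, here is a minimal sketch of the maintenance operations listed above. The `TagStore` class, its method names, and the sample entries are all hypothetical; a real system would sit on top of a database, but the operations would stay just as small.

```python
class TagStore:
    """Hypothetical in-memory store mapping entry ids to sets of tags."""

    def __init__(self):
        self.tags_by_entry = {}

    def assign(self, tag, entries):
        """Mass-assign one tag to many entries at once."""
        for entry in entries:
            self.tags_by_entry.setdefault(entry, set()).add(tag)

    def remove(self, tag):
        """Remove a tag from every entry that carries it."""
        for tags in self.tags_by_entry.values():
            tags.discard(tag)

    def merge(self, source, target):
        """Replace `source` with `target` on every entry carrying `source`."""
        for tags in self.tags_by_entry.values():
            if source in tags:
                tags.discard(source)
                tags.add(target)

    # Renaming a tag is the same operation as merging it into a new name.
    rename = merge


store = TagStore()
store.assign("js", ["post-1", "post-2"])
store.assign("javascript", ["post-3"])
store.merge("js", "javascript")
print(store.tags_by_entry)
```

After the merge, all three posts carry only the `javascript` tag. Because each operation is a single pass over the entries, changing one's mind about the structure costs very little.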

Ultimately, users shouldn't even be aware of the way information is organized. It should simply work, auto-magically. A promising field in this domain is full-text search and search based on metadata. I've already discussed how search capability is superior to tagging in the case of this blog, and the case study applies equally to music hosting websites or password managers. The fact that basic indexing engines coupled with highly complex search systems produce better search capability than manual tagging is not that surprising (it is explained mostly by the inability of a human brain to categorize entries in an obvious way which makes sense to everyone). What is more surprising is that even elementary full-text or metadata-based search (with basic indexing) is still more capable than tagging or categorizing within a tree. Less work leads to better results: not bad.
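To give an idea of how little "elementary full-text search with basic indexing" requires, here is a minimal sketch of an inverted index. It is not the search engine used on this blog; the function names and the sample photo descriptions are hypothetical, and a real engine would add stemming, ranking, and so on. Nobody has to decide on categories up front: the words already present in the content become the index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical descriptions attached to photos.
documents = {
    "bird.jpg": "Strange bird on the balcony",
    "big_ben.jpg": "Big Ben during the trip to London",
    "pigeon.jpg": "A bird in London near Trafalgar Square",
}
index = build_index(documents)
print(search(index, "bird london"))  # {'pigeon.jpg'}
```

The query combines two criteria that, in a tree, would have required choosing between a "birds" directory and a "London" directory.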