Corante

Many-to-Many

January 24, 2005

Tags != folksonomies && Tags != Flat name spaces

Posted by Clay Shirky

Grrrr - I hate not having the time to write the post I want to write on this, but here goes…

Tags are labels attached to things. This procedure is absolutely orthogonal to whether professionals or amateurs are doing the tagging.

Professionals often think tags are coextensive with folksonomies because their minds have been poisoned by the false dream of ontology, but also because tagging looks too easy (in the same way the Web looked too easy to theoreticians of hypertext). Not only are tags amenable to being used as controlled vocabularies, it’s happening today: groups are agreeing on how to tag things so as to produce streams of, e.g., business research.

More importantly, tags are not the same as flat name spaces. The LiveJournal interests list, the first large-scale folksonomy I became aware of (though before the label existed), is flat. The interest list has one meaning: Person X has Interest Y, included as part of List L. All of L is attached to X, and all Y’s are equivalent in L.

Tags don’t work that way at all. Tags are multi-dimensional, and only look flat, in the way Venn diagrams look flat. When I tag something ‘socialsoftware drupal’, I enable searches of the form “socialsoftware & drupal”, “socialsoftware &! (and not) drupal”, “drupal &! socialsoftware”, and so on.
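The search forms above can be sketched in a few lines of Python, assuming each item simply carries a set of tags. The item names and the `search` helper are hypothetical, invented for illustration; this is not how any particular tagging service is implemented.

```python
# Hypothetical items, each carrying a set of tags (names invented for illustration).
items = {
    "post-1": {"socialsoftware", "drupal"},
    "post-2": {"socialsoftware"},
    "post-3": {"drupal"},
    "post-4": {"socialsoftware", "wiki"},
}


def search(items, required=(), excluded=()):
    """Return the names of items that have every required tag and no excluded tag."""
    required, excluded = set(required), set(excluded)
    return sorted(
        name
        for name, tags in items.items()
        if required <= tags and not (excluded & tags)
    )


# socialsoftware & drupal
both = search(items, required=["socialsoftware", "drupal"])
# socialsoftware &! drupal
social_not_drupal = search(items, required=["socialsoftware"], excluded=["drupal"])
# drupal &! socialsoftware
drupal_not_social = search(items, required=["drupal"], excluded=["socialsoftware"])
```

Each query is just set membership plus a little boolean logic; nothing about the stored tags has to change to support a new combination.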

Hierarchy is a degenerate case of tags. If hierarchy floats your boat, by all means tag hierarchically. If I tag so that A &! B returns no results, and a search on A alone returns the same items as A & B, then A is a subset of B at the moment.
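As a rough sketch of that subset test, assuming tags are stored as a set per item (the item names and the `with_tag` helper are hypothetical):

```python
# Hypothetical tagged items; "A" and "B" stand in for any two tags.
tagged = {
    "item-1": {"A", "B"},
    "item-2": {"A", "B"},
    "item-3": {"B"},
}


def with_tag(tagged, tag):
    """Return the set of item names that carry the given tag."""
    return {name for name, tags in tagged.items() if tag in tags}


a_items = with_tag(tagged, "A")
b_items = with_tag(tagged, "B")

a_and_not_b = a_items - b_items  # the search "A &! B"
a_and_b = a_items & b_items      # the search "A & B"

# A is a subset of B, at the moment, when "A &! B" is empty
# and a search on A alone returns the same items as "A & B".
a_is_subset_of_b = (not a_and_not_b) and (a_items == a_and_b)
```

The hierarchy here is never declared anywhere; it is only an observation about the current data, and it stops holding the moment an item is tagged A without B.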

This last point is key — the number one fucked up thing about ontology (in its AI-flavored form - don’t get me started, the suckiness of ontology is going to be my ETech talk this year…), but, as I say, the number one thing, out of a rich list of such things, is the need to declare today what contains what as a prediction about the future. Let’s say I have a bunch of books on art and creativity, and no other books on creativity. Books about creativity are, for the moment, a subset of art books, which are a subset of all books.

Then I get a book about creativity in engineering. Ruh roh. I either break my ontology, or I have to separate the books on creativity, because when I did the earlier nesting, I didn’t know there would be books on creativity in engineering. A system that requires you to predict the future up front is guaranteed to get worse over time.

And the reason ontology has been even a moderately good idea for the last few hundred years is that the physical fact of books forces you to predict the future. You have to put a book somewhere when you get it, and as you get more books, you can neither reshelve constantly, nor buy enough copies of any given book to file it on all dimensions you might want to search for it on later.

Ontology is a good way to organize objects, in other words, but it is a terrible way to organize ideas, and in the period between the invention of the printing press and the invention of the symlink, we were forced to optimize for the storage and retrieval of objects, not ideas. Now, though, we can scrap the stupid hack of modeling our worldview on the dictates of shelf space. One day the concept of creativity can be a subset of a larger category, and the next day it can become a slice that cuts across several categories. In hierarchy land, this is a crisis; in tag land, it’s an operation so simple it hardly merits comment.
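The creativity example can be played out as a small sketch, again assuming a set of tags per book. The book names and the `tagged_with` helper are hypothetical placeholders.

```python
def tagged_with(books, tag):
    """Return the set of book names that carry the given tag."""
    return {title for title, tags in books.items() if tag in tags}


# Hypothetical shelf: every creativity book so far is also an art book.
books = {
    "book-a": {"art", "creativity"},
    "book-b": {"art", "creativity"},
    "book-c": {"art"},
}

subset_before = tagged_with(books, "creativity") <= tagged_with(books, "art")

# A book about creativity in engineering arrives.
# Nothing is reshelved and no category is redefined; it just gets its tags.
books["book-d"] = {"engineering", "creativity"}

subset_after = tagged_with(books, "creativity") <= tagged_with(books, "art")
creative_engineering = tagged_with(books, "creativity") & tagged_with(books, "engineering")
```

Before the new book, creativity is a subset of art; afterwards it is a slice cutting across art and engineering, and the only operation performed was adding one tagged item.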

The move here is from graph theory (arrange everything in a tree graph, so that graph traversal becomes the organizing principle) to set theory (sets have members, and the overlap or non-overlap of those memberships becomes the organizing principle.) This is analogous to the change in how we handle digital data. The file system started out as a tree graph. Then we added symlinks (aliases, shortcuts), which said “You can organize things differently than you store them, and you can provide more than one mode of access.”
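A minimal sketch of the symlink point, assuming a POSIX-like system where the process is allowed to create symbolic links. The directory layout and file names are invented for illustration.

```python
import os
import tempfile

# Scratch directory so the sketch does not touch real files.
root = tempfile.mkdtemp()

# The file is stored in exactly one place...
store = os.path.join(root, "store")
os.makedirs(store)
stored_path = os.path.join(store, "doc-0001.txt")
with open(stored_path, "w") as f:
    f.write("notes on creativity in engineering\n")

# ...but is organized under two categories via symlinks.
for category in ("creativity", "engineering"):
    view_dir = os.path.join(root, "by-topic", category)
    os.makedirs(view_dir)
    os.symlink(stored_path, os.path.join(view_dir, "notes.txt"))

# Both access paths lead to the same stored content.
creativity_link = os.path.join(root, "by-topic", "creativity", "notes.txt")
engineering_link = os.path.join(root, "by-topic", "engineering", "notes.txt")
with open(creativity_link) as f:
    via_creativity = f.read()
with open(engineering_link) as f:
    via_engineering = f.read()
```

The storage tree still exists, but it no longer dictates how the file is found: each link is one more mode of access to the same thing.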

The URI goes all the way in that direction. The URI says “Not only does it not matter where something is stored, it doesn’t matter whether it’s stored. A URI that generates the results on the fly is as valid as one that points to a disk.” And once something is no longer dependent on tree graph traversals to find it, you can dispense with hierarchical assumptions about categorizing it too.

Comments (13) + TrackBacks (1) | Category: social software


COMMENTS

1. Bill Seitz on January 24, 2005 5:04 PM writes...

This reminds me of an old Altavista feature where you would start with a multi-word search, and it would return a bunch of co-occurring words for each word in your search, and you could include/require or exclude individual words to generate a much nicer boolean search.

2. Jay Fienberg on January 24, 2005 6:57 PM writes...

Unfortunately, none of the folks (myself included) who are thinking / talking about Clay's writings seems to be working around (or, even towards) much of a common definition for these terms.

Especially complicating matters: in some ways, these terms can be used to describe either or any of design processes, means of human organization, system/tool implementation, specific kinds of data, and, as well, resulting "documents", databases, and/or information spaces.

So, terms that I think we're collectively getting tripped up on include:

tags
tagging
controlled vocabulary
folksonomy
taxonomy (when it's used like "not-folksonomy")
classification
flat structure
tree structure
graph structure
finding things / findability

Also, tip of the hat to set theory: from Codd's relational theory on, many have argued against hierarchical and for set-based approaches to data structures. There is a lot to say about how / why hierarchy is so ingrained in so many computer and info system designs, but it's worth noting that the movement against the hierarchical approach is well grounded as far back as the 1960s, even if it's not yet quite caught on.

3. Rick Thomas on January 24, 2005 10:08 PM writes...

Clay wrongly conflates "ontology" and "hierarchy". There are unlimited numbers of ontologies - some flat, some shallowly hierarchical, some deeply hierarchical. Some look like naive tag sets, others describe massive business operations. Ontologies describing different related domains - say one for kitchen ingredients, one for recipes, and one for restaurant operations - are combined in practice, and the structure of these combinations is more useful than a big pile of tags.

This stuff has been systematized in database design over a generation. Now there's a lot more systemizing to do to make the web-wide database work. Folksonomies are one type of ontology. Folksonomies will coexist and interoperate with the most rigid hierarchies as needed.

4. alex wright on January 24, 2005 11:15 PM writes...

While Clay is right to assert that library cataloging systems are deeply flawed, it is an enormous stretch to conclude that all ontologies are therefore flawed. This argument seems to stem from a basic misconception about library cataloging: confusing call number classification with subject cataloging. Most catalogers would tell you that assigning call numbers is by far the least important part of what they do; subject headings (which we might call "tags") allow users to overcome the physical limitations of the shelf by providing multi-dimensional access to physical objects (meaning that, yes, it is entirely possible to catalog a book as being about both "art and creativity" and "engineering and creativity"). In this sense, library catalogers actually anticipated the notion of tagging by about 100 years.

5. alex wright on January 24, 2005 11:45 PM writes...

I should clarify the above to say that I meant it is possible in principle to catalog a book with multiple orthogonal subject headings; but if one were following the strict LC subject classification, one would be constrained from using a term like "engineering and creativity" - but for reasons that have nothing to do with shelving books (or, fundamentally, with ontologies).

6. Mark Ranford on January 25, 2005 3:06 AM writes...

Great piece. Thank you again, Clay, for bringing new insights. There is lots I would like to say, including a response to the clearly determined and deterministic ontology supporters, but for now I'll just add a bit on a thought your piece inspired in me. In your last paragraph you said:

The URI goes all the way in that direction. The URI says “Not only does it not matter where something is stored, it doesn’t matter whether it’s stored. A URI that generates the results on the fly is as valid as one that points to a disk.” And once something is no longer dependent on tree graph traversals to find it, you can dispense with hierarchical assumptions about categorizing it too.

Absolutely, and you raise in my mind the really interesting analogy with physics, the nature of nature, and also the nature of knowledge. Reductionistic science has been forever trying to particulate nature, so we continually search for the ultimate particles that the universe is constructed from. Likewise, in KM many find it hard to particulate knowledge.

First, the points about not caring WHERE something is stored remind me of the different approaches in AI towards storing info: distributed versus distinct representations. Distributed is proving to be more likely how the brain does it, and also more advantageous for many reasons. Basically a "thing" is represented between a number of connected elements rather than in one. The representation emerges as one result of a specific configuration/stable state among the connected elements. Having any number of labels, or sets of overlapping labels, that can recall a thing is certainly powerful from our own perspective of searching and managing information on a daily basis. But what does this analogy help lead us towards? Is there a possibility that, taking current efforts forward, we might eventually be building towards something that could act much like our own associative memory, or a collective associative memory? It's a nice thought, or is it just a nice dream :-)

Second, your point about not caring WHETHER something is stored is also fascinating. Again there are the ramifications of the analogy with distributed representations of things in both Artificial and Natural Intelligence, as alluded to above. But there's also good mileage to be gained from the analogy with physics and the questions over the particulate nature of reality. Consider your words - URIs constructed on the fly - and consider how multiple quantum states collapse into a single state under observation, waves becoming particles. Are URIs created on the fly the web equivalent of these observed particles? Is knowledge of the same nature, actually not a physical or digital object at all until we choose to perceive an object, and in doing so have it constructed? I've bitten off more than I can chew, as I haven't even started to explore Buddhist terminology :-) But anyway, just to let you know what kind of weird, totally off-the-track ideas you generated with your very astute words. Getting more grounded, in essence I see the importance you are attaching to URIs and agree wholeheartedly. I also see your important point as to the valuable differences tags have from folksonomies and flat name spaces.


7. oedipa on January 25, 2005 4:21 AM writes...

"the suckiness of ontology is going to be my ETech talk this year"

...and O'Reilly starts putting up posters in philosophy departments across the land

8. Julien Boyreau on January 25, 2005 8:47 AM writes...

Shirky made a GREAT confusion between Ontologies and Hierarchies !!! Hierarchies are just a degenerate case of Ontologies&Data.
I advise everyone to read this piece
(See: http://citeseer.ist.psu.edu/cache/papers/cs/31314/http:zSzzSzwww.csd.abdn.ac.ukzSz~ggrimneszSzpubszSzLearningFOAFDesc.pdf/learning-meta-descriptions-of.pdf) to distinguish between Semantic Forests AND the Semantic Web, where YOU COULD HAVE multiple super-classes for your class !!

It is very funny to see that the same basic problem repeats day after day: one of the interesting aspects of WinFS was to use some kind of tagging to allow multiple super-classes...

I have known from the first day that Shirky is a MAJOR OPPONENT of the Semantic Web; as I see it, his move against it is just continuing.

BTW, discover Haystack: it is an early attempt to put SemWeb at the core of information, a kind of "tagging on steroids" experience.

9. Chris L on January 25, 2005 11:44 AM writes...

At what point did 'folksonomy' pick up the definition you ascribe to it? As far as I can tell, the most popular usage seems to be that folksonomy is equivalent to "system in which tags are used". Tags are certainly not the same as flat name spaces-- but folksonomy doesn't appear to refer to flat name spaces either, at least not very often.

10. nick sweeney on January 25, 2005 7:05 PM writes...

Now, though, we can scrap the stupid hack of modeling our worldview on the dictates of shelf space.

If only that were so easy, Clay. The Lakoffian cognitive schemas which map the manipulation of ideas and objects to the manipulation of objects and containers in a mental cabinet of curiosities... well, they ain't going away any time soon.

And I suppose that's the problem we face: trying gently to remap both internal and external models of knowledge and ideas -- that is, the ones 'in here' and 'out there' away from the shelf, the bookpress, the cabinet, the filing cabinet...

Are we headed to a model in which the pursuit of knowledge becomes analogous to scrying? You know, now I think of it, gazing into the crystal ball or reading the tea leaves aren't too distant from this flattened-out process.

11. phil jones on January 26, 2005 12:37 AM writes...

But surely the *other* purpose of ontologies is to guide future classification in certain directions.

For example, maybe I really want all examples of programming books to be put in the same set. And if people are free to choose a bottom-up classification scheme they might classify some as Python books, others as Lisp books, others as C books etc. If I've forced them to use a single category called "programming", yep, I've lost flexibility and accuracy. But I've bought a certain kind of reliability: I (and other users) will be able to find *all* these books with the known tag "programming", and we are not stuck trying to guess how the set we are interested in has been tagged.

Once you accept that there are times when this trade-off is worth making, then it seems you've accepted the need for controlled vocabularies. In which case, the other comments here seem to be right. Ontology and controlled vocabularies don't imply hierarchical classification. These are only contingently connected.

12. scottxyz on January 26, 2005 7:56 PM writes...

This reminds me of sheaves (from category theory) or open neighborhoods (from topology) or evolving sets (from Heyting algebra, and applied to cosmology by Fotini Markopoulou).

These logics talk about sets which evolve over time. And these logics lead to highly usable applications in "real-world" mathematics and physics and computing. (When cosmologists forget to look at the universe "from the inside", they tend to get tangled up in 10-dimensional wormholes.)

And these logics are a bit different from "classical" logics of math and physics and computing. (Heyting algebra is VERY DIFFERENT from Boolean algebra. But it seems much more resource-conscious - like Girard's Linear Logic.)

Although non-classical, these logics work well. I would check them out, if you have any interest in Constructive Mathematics or Intuitionism. It might turn out that Constructivism could solve the Ontology Anomaly.

Fotini Markopoulou is here:
http://cgpg.gravity.psu.edu/online/Html/Seminars/Fall1998/Markopoulou/Slides/s01.html

http://xxx.lanl.gov/abs/gr-qc/9811053

"Sheaves" can be googled by people versed in Category Theory.

You will see that Lawvere's Sheaves and Fotini Markopoulou's Sets Evolving Through Time deal with the same problem Shirky is talking about here: not knowing all the members of a set (its extension) at the time you define it.

13. Otis on January 27, 2005 10:29 PM writes...

The Set vs. Tree analogy is a good one, and I hope people are finally seeing how one can apply simple boolean logic to tags and create dynamic data views.

Coincidentally, this morning I posted a comment about just that - required, excluded and optional tags that let a person build a nice tag-based boolean query.

Here is the post, which describes how Simpy ( http://www.simpy.com/ ) does this:

http://www.ppdd.net/mt/mt-comments.cgi?entry_id=32

TRACKBACKS

Listed below are links to weblogs that reference Tags != folksonomies && Tags != Flat name spaces:

Here are some excerpts from a quite interesting post on tagging versus ontologies -> see http://www.corante.com/many/archives/2005/01/24/tags_folksonomies_tags_flat_name_spaces.php From my point of view, the author takes an extreme position - but it st... [Read More]

Tracked on December 31, 2005 7:04 AM

WebTalk: Tags != folksonomies && Tags != Flat name spaces. Many-to-Many:
