Corante

Many-to-Many

January 30, 2005

del.icio.us Tag Stemming

Posted by Clay Shirky

Matt Biddulph has put up a del.icio.us tag stemmer, which will take your username (or indeed any username) and point out possible inconsistencies based on word stemming (tag/tags/tagging, etc.). It will also take a URL, scan all users who tagged it, and look for the same thing.
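
To make the idea concrete, here is a minimal sketch in Python of what "inconsistencies based on word stemming" could look like. This is not Biddulph's code: `naive_stem` is a deliberately crude suffix-stripper I made up for illustration (a real tool would more likely use something like the Porter stemmer), and `find_inconsistencies` is a hypothetical name. It assumes you already have a user's tags as a plain list of strings.

```python
from collections import defaultdict


def naive_stem(tag):
    """Crude illustrative stemmer (an assumption, not the Porter algorithm)."""
    word = tag.lower()
    # Strip at most one common suffix, keeping at least three letters.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant: "tagg" -> "tag".
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def find_inconsistencies(tags):
    """Group tags by stem and keep only stems with more than one variant."""
    groups = defaultdict(set)
    for tag in tags:
        groups[naive_stem(tag)].add(tag)
    return {
        stem: sorted(variants)
        for stem, variants in groups.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    my_tags = ["tag", "tags", "tagging", "python", "blog", "blogs", "css"]
    for stem, variants in find_inconsistencies(my_tags).items():
        print(stem, "->", ", ".join(variants))
```

A stemmer this simple will produce false matches (it would happily fold "news" into "new"), which is one reason the output is best read as "possible" inconsistencies rather than errors.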

What it will not (yet) do is return the full list of tags sorted by frequency, listing both tags with alternate stems and those without, but I assume this is simply a matter of time.
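
The missing report is not hard to imagine. The sketch below, again in Python and again built on assumptions, takes every tag use on a URL (one list entry per use), folds variants together by stem, and sorts the result by total frequency, so tags with alternate stems and tags without them end up in the same list. The `naive_stem` helper is a made-up stand-in for a real stemmer, and `tag_report` is a hypothetical name, not part of any del.icio.us service.

```python
from collections import Counter, defaultdict


def naive_stem(tag):
    """Crude illustrative stemmer (an assumption, not the Porter algorithm)."""
    word = tag.lower()
    # Strip at most one common suffix, keeping at least three letters.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant: "tagg" -> "tag".
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def tag_report(tag_uses):
    """Return (stem, total_uses, {variant: uses}) rows, most frequent first."""
    counts = Counter(tag_uses)
    groups = defaultdict(dict)
    for tag, uses in counts.items():
        groups[naive_stem(tag)][tag] = uses
    report = [
        (stem, sum(variants.values()), variants)
        for stem, variants in groups.items()
    ]
    # Sort by total uses descending, breaking ties alphabetically by stem.
    report.sort(key=lambda row: (-row[1], row[0]))
    return report


if __name__ == "__main__":
    uses = ["tags", "tag", "tagging", "tags", "css",
            "python", "python", "python"]
    for stem, total, variants in tag_report(uses):
        print(stem, total, variants)
```

Stems with a single variant simply show up as one-entry rows, which is the "both with alternate stems and without" listing described above.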

This is part of why I think tags are such a big deal — they are annotations for the only native unit of accounting the Web has, namely the URL; the annotations are themselves URLs that can be further annotated; and they are simple enough in both concept and technical design that third-party services like ‘stemtags’ can easily be built on top of the system.

Category: social software


COMMENTS

1. Michal Migurski on January 30, 2005 9:06 PM writes...

This is step one towards del.icio.us latent semantic indexing, a language-analysis technique that has somehow hit my reblog inbox through three or four different sources over the past few weeks, in articles related to writing & research (Tools For Thought, NYT), information visualization, search, and other areas relevant to sucking meaning out of mountains of free-form data.

At the very least, it will begin to address issues around synonyms and similarity that del.icio.us still leaves unanswered.

2. Bud Gibson on January 31, 2005 7:52 AM writes...

Mirroring the previous comment, my personal issue is not so much using different variants of the same word but consistently using tags over time. What might help me is a tool to indicate similar tag clusters.

I suspect this would be hard to do for a single user but could possibly work over a group of users where you have multiple people classifying the same thing. How do they classify the same corpus? There needs to be some visible way to see this.

I suspect the ESP game that Liz Lawley mentioned a few weeks ago might be a way to make this work. She thinks of it as vulgarization. I think of it as agreement.

Again, I would like to start by just agreeing with myself over time and think of using the social space to help me do that.

3. nick sweeney on January 31, 2005 1:41 PM writes...

What's important here, I think, isn't to regard this as a way to improve 'consistency' in tagging and limit one's use of tags (as one of the trackback links suggests) but rather to identify patterns of use associated with particular variants.

It's certainly possible to use a tag stemmer as a coarse filtering tool -- a grammatical filtering tool, if you like -- but the value of folksonomy is going to come from analysing the nuances, and working out whether the choice of variants has any bearing on the subject matter, or reflects more upon the tagger. Again, it's a vector.

4. Jakob Lodwick on January 31, 2005 5:26 PM writes...

We're so close to a major breakthrough in the way people understand language (and the brain, and learning, and more) that it's not even funny.
