January 30, 2005
del.icio.us Tag Stemming
Posted by Clay Shirky
Matt Biddulph has put up a del.icio.us tag stemmer, which will take your username (or indeed any username) and point out possible inconsistencies based on word stemming (tag/tags/tagging, etc.). It will also take a URL, scan all users who tagged it, and look for the same thing.
What it will not (yet) do is return the full list of tags sorted by frequency, listing both tags with alternate stems and those without, but I assume this is simply a matter of time.
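To make the idea concrete, here is a minimal sketch of both halves: grouping tag variants by stem, and the frequency-sorted list described above. It is not Matt's code. The `naive_stem` rule is a deliberately crude stand-in for a real stemming algorithm, and the function names and sample tags are made up for illustration.

```python
from collections import Counter, defaultdict


def naive_stem(tag):
    """Crude, hypothetical stemmer: strips '-ing' and plural '-s' only."""
    word = tag.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling left behind by '-ing': "tagg" -> "tag"
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def stem_report(tags):
    """Return (stem, total_uses, {variant: uses}) rows, most used first."""
    counts = Counter(t.lower() for t in tags)
    groups = defaultdict(dict)
    for tag, uses in counts.items():
        groups[naive_stem(tag)][tag] = uses
    rows = [(stem, sum(v.values()), v) for stem, v in groups.items()]
    # Sort by total frequency (descending), then alphabetically by stem.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def inconsistent_stems(rows):
    """Stems that were written in more than one variant."""
    return [stem for stem, _, variants in rows if len(variants) > 1]


if __name__ == "__main__":
    # Made-up sample data, standing in for one user's tag history.
    sample = ["tag", "tags", "tagging", "tags", "blog", "blogs", "python", "css"]
    for stem, total, variants in stem_report(sample):
        flag = "  <- alternate stems" if len(variants) > 1 else ""
        print(f"{stem}: {total} {variants}{flag}")
```

The report keeps tags with alternate stems and tags without them in the same list, and only flags the former. A real stemmer would handle far more cases than these two suffix rules.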
This is part of why I think tags are such a big deal — they are annotations for the only native unit of accounting the Web has, namely the URL; the annotations are themselves URLs that can be further annotated; and they are simple enough in both concept and technical design that third-party services like ‘stemtags’ can easily be built on top of the system.
Category: social software
Comments (4)
1. Michal Migurski on January 30, 2005 9:06 PM writes...
This is step one towards del.icio.us latent semantic indexing, a language-analysis technique that has somehow hit my reblog inbox through three or four different sources over the past few weeks, in articles related to writing & research (Tools For Thought, NYT), information visualization, search, and other areas relevant to sucking meaning out of mountains of free-form data.
At the very least, it will begin to address issues around synonyms and similarity that del.icio.us still leaves unanswered.
2. Bud Gibson on January 31, 2005 7:52 AM writes...
Mirroring the previous comment, my personal issue is not so much using different variants of the same word but consistently using tags over time. What might help me is a tool to indicate similar tag clusters.
I suspect this would be hard to do for a single user but could possibly work over a group of users where you have multiple people classifying the same thing. How do they classify the same corpus? There needs to be some visible way to see this.
I suspect the ESP game that Liz Lawley mentioned a few weeks ago might be a way to make this work. She thinks of it as vulgarization. I think of it as agreement.
Again, I would like to start by just agreeing with myself over time and think of using the social space to help me do that.
3. nick sweeney on January 31, 2005 1:41 PM writes...
What's important here, I think, isn't to regard this as a way to improve 'consistency' in tagging and limit one's use of tags (as one of the trackback links suggests) but rather to identify patterns of use associated with particular variants.
It's certainly possible to use a tag stemmer as a coarse filtering tool -- a grammatical filtering tool, if you like -- but the value of folksonomy is going to come from analysing the nuances, and working out whether the choice of variants has any bearing on the subject matter, or reflects more upon the tagger. Again, it's a vector.
4. Jakob Lodwick on January 31, 2005 5:26 PM writes...
We're so close to a major breakthrough in the way people understand language (and the brain, and learning, and more) that it's not even funny.