Corante

Many-to-Many

February 1, 2005

Folksonomy: The Soylent Green of the 21st Century

Posted by Clay Shirky

In What Do Tags Mean, Tim Bray says “There is no cheap metadata” (quoting himself from the earlier On Search.) He’s right, of course, in both the mathematical sense (metadata, like all entropy-fighting moves, requires energy) and in the human sense — in On Search, he talks about the difficulties of getting users to enter metadata.

And yet I keep having this feeling that folksonomy, and particularly amateur tagging, is profound in a way that the ‘no cheap metadata’ dictum doesn’t cover.

Imagine a world where there was really no cheap metadata. In that world, let’s say you head on down to the local Winn-Dixie to do your weekly grocery accrual. In that world, once you pilot your cart abreast of the checkout clerk, the bargaining begins.

You tell her what you think a 28 oz. bottle of Heinz ketchup should cost. She tells you there’s a premium for the squeezable bottle, and if you’re penny-pinching, you should get the Del Monte. You counter by saying you could shop elsewhere. And so on, until you arrive at a price for the ketchup. Next out of your cart, the Mrs. Paul’s fish sticks…

Meanwhile, back in the real world, you don’t have to do anything of the kind. When you get to the store, you find that, mirabile dictu, the metadata you need is already there, attached to the shelves in advance of your arrival!

Consider what goes into pricing a bottle of Heinz: the profit margin of the tomato grower, the price of a barrel of oil, local commercial rents, average disposable incomes in your area, and the cost of providing soap in the employee bathrooms. Yet all those inputs have already been calculated, and the resulting price then listed on handy little stickers right there on the shelves. And you didn’t have to do any work to produce that metadata.

Except, of course, you did. Every time you pick between the Heinz and the Del Monte, it’s like clicking a link, the simplest possible informative transaction. Your choice says “The Heinz, at $2.25 per 28 oz., is a better buy than the Del Monte at $1.89.” This is so simple it doesn’t seem like you’re producing metadata at all; you’re just getting ketchup for your fish sticks. But in aggregate, those choices tell Del Monte and Heinz how to capture the business of the price-sensitive and premium-tropic, respectively.

That looks like cheap metadata to me. And the secret is that that metadata is created through aggregate interaction. We know how much more Heinz ketchup should cost than Del Monte because Heinz Inc. has watched what customers do when it raises or lowers its prices, and those millions of tiny, self-interested transactions have created the metadata that you take for granted. And when you buy ketchup, you add your little bit of preference data to the mix.

So this is my Get Out of Jail Free card to Tim’s conundrum. Cheap metadata is metadata made by someone else, or rather by many someone elses. Or, put another way, the most important ingredient in folksonomy is people.

I think cheap metadata has (at least) these characteristics:

1. It’s made by someone else
2. Its creation requires very few learned rules
3. It’s produced out of self-interest (Corollary: it is guilt-free)
4. Its value grows with aggregation
5. It does not break when there is incomplete or degenerate data

And this is what’s special about tagging. Lots of people tag links on del.icio.us, so I get lots of other people’s metadata for free. There is no long list of rules for tagging things ‘well,’ so there are few deflecting effects from transaction cost. People tag things for themselves, so there are no motivation issues. The more tags the better, because with more tags, I can better see both communal judgment and the full range of opinion. And no one cares, for example, that when I tag things ‘loc’ I mean the Library of Congress: the system doesn’t break with tags that are opaque to other users.
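
To make the aggregation point concrete, here is a minimal sketch in Python of how individually chosen tags might roll up into a communal view. Everything in it is an assumption for illustration: the sample users, URLs, and the function names `aggregate_tags` and `communal_tags` are hypothetical, and none of it reflects how del.icio.us actually stores or counts tags.

```python
from collections import Counter

# Hypothetical sample data: (user, url, tag) triples, roughly what a
# bookmarking service might record. Not taken from any real service.
BOOKMARKS = [
    ("alice", "http://example.com/post", "folksonomy"),
    ("alice", "http://example.com/post", "metadata"),
    ("bob", "http://example.com/post", "folksonomy"),
    ("carol", "http://example.com/post", "loc"),
    ("carol", "http://example.com/post", "folksonomy"),
    ("dave", "http://example.com/other", "ketchup"),
]


def aggregate_tags(bookmarks, url):
    """Count how many distinct users applied each tag to one URL."""
    # A set removes duplicates, so a user repeating a tag counts once.
    seen = {(user, tag) for user, u, tag in bookmarks if u == url}
    return Counter(tag for _user, tag in seen)


def communal_tags(bookmarks, url, min_users=2):
    """Tags that at least min_users people chose, most popular first."""
    counts = aggregate_tags(bookmarks, url)
    return [tag for tag, n in counts.most_common() if n >= min_users]


if __name__ == "__main__":
    url = "http://example.com/post"
    print(aggregate_tags(BOOKMARKS, url))  # full range of opinion
    print(communal_tags(BOOKMARKS, url))   # communal judgment
```

In this sketch an opaque tag like ‘loc’ simply sits in the long tail with a count of one. It still works for the person who wrote it, and it never needs to be rejected or cleaned up for the popular tags to emerge.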

This is what’s missing in the “Users don’t tag their own blog posts!” hand wringing — they’re not supposed to. Tagging is done by other people. As Cory has pointed out, people are not good at producing metadata about their own stuff, for a variety of reasons.

But other people will tag your posts if they need to group them, find them later, or classify them for any other reason. And out of this welter of tiny transactions comes something useful for someone else. And because the added value from the aggregate tags is simply the product of self-interest + ease of use + processor time, the resulting metadata is cheap. It’s not free, of course, but it is cheap.

Comments (5) + TrackBacks (0) | Category: social software


COMMENTS

1. Joshua Porter on February 1, 2005 2:10 PM writes...

Just the act of bookmarking a site in del.icio.us is producing metadata. In fact, it's the best way to judge the popularity of a post: look at how many people have bookmarked it. Tags are second-level metadata that tell us something different than bookmarking: like how many people relate a post to a certain word, for instance.

We can do this with any explicit, aggregatable behavior, not just bookmarking or tagging. Bookmarks are so interesting because they cover the whole of the Web. But lately I've been asked over and over how to apply this stuff in other domains: "How can I use folksonomy?".

My response is that it's not tagging that is driving this phenomenon. What's driving it is the fact that both parties are benefitting.

And so I think you're right on, Clay. It's the cheap, win-win metadata that's the key here. If it doesn't benefit the user it won't be done. If it doesn't benefit the site it isn't worth keeping.


2. Janne on February 1, 2005 2:16 PM writes...

There's a serious problem that needs to be tackled, and that is one of languages. As much as some would like, most people in the world don't speak English.

I wrote about this at length: http://www.ecyrd.com/ButtUgly/Wiki.jsp?page=Main_blogentry_250105_1


3. Joshua Allen on February 1, 2005 2:44 PM writes...

Right, metadata is a lot cheaper than food, and the supply and demand curves for metadata are very strong. Most of the problem now is in inefficient production and distribution methods. Sharing metadata with your consumers these days is a lot like packing furs on your back across the snow to get them to the trading post.


4. lindon on February 1, 2005 6:43 PM writes...

"But other people will tag your posts if they need to group them, find them later, or classify them for any other reason. And out of this welter of tiny interventions comes something useful for someone else."

Nope, still not buying it. I think you're missing the point. We don't "need" tags (they would have arisen earlier if we did) so this "need" doesn't exist in enough volume to make tagging work well enough for everyone and everything (or a smaller but useful set of these). The "welter of tiny interventions" isn't going to happen in enough volume to leaven the "bad tag" issue.

We have a classic catch-22 issue: it needs to be done (by enough people) to be useful but won't get done because it's not *currently* useful.

Still, I once told Marc Andreessen the image tag in Mosaic wasn't a good idea...

lp


5. Phil Wolff on February 8, 2005 2:34 AM writes...

OK, this post had 767 words in total, 347 of them unique, and 113 used two or more times. If you pull out stop-words, the most popular are "metadata", "cheap", "heinz", "ketchup", "tag", "tags", "tagging", "people". Others that are unusual: "post", "aggregate", "course", "users", "transactions", "folksonomy" (really unusual), "winn-dixie", "welter".

Offer this list to the author during post preview as proposed tags.

Go further. Find synonyms. Look at tags and words from posts linked to and from this post. Give extra weight to terms/tags used in previous posts. Give extra weight to terms suddenly popular among feeds I read. Add Bayesian filtering or genetic algorithms and Ouija boards to improve the quality of guessing. Then analyze the comments too.

Make great recommendations, to slash a blogger's or photographer's cognitive burden. Even offer the option of doing this in the background, assigning the 8 most likely tags automatically.

45 the
29 of
19 and
19 is
19 you
17 metadata
15 to
14 in
12 that
11 for
10 a
9 your
8 cheap
8 there
7 heinz
7 on
6 at
6 del
6 i
6 no
6 people
6 so
6 this
5 by
5 do
5 ketchup
5 monte
5 or
5 other
5 out
5 tag
5 tags
5 when
4 are
4 cost
4 have
4 it
4 it’s
4 like
4 not
4 someone
4 tagging
4 what
4 world
3 about
3 aggregate
3 all
3 because
3 better
3 but
3 course
3 doesn’t
3 else
3 free
3 from
3 get
3 has
3 more
3 price
3 should
3 their
3 them
3 things
3 those
3 users
3 with
3 you’re
2 already
2 any
2 both
2 bottle
2 break
2 buy
2 cart
2 created
2 data
2 don’t
2 few
2 find
2 fish
2 folksonomy
2 getting
2 how
2 if
2 its
2 little
2 local
2 lots
2 made
2 mean
2 need
2 own
2 oz
2 producing
2 requires
2 resulting
2 right
2 rules
2 says
2 search
2 self-interest
2 sense
2 shelves
2 tell
2 than
2 they
2 tiny
2 transaction
2 transactions
2 value
2 way
2 what’s
2 yet
1 abreast
1 accrual
1 add
1 added
1 advance
1 aggregation
1 amateur
1 another
1 anything
1 area
1 arrival
1 arrive
1 as
1 attached
1 average
1 back
1 bargaining
1 barrel
1 bathrooms
1 been
1 begins
1 between
1 bit
1 blog
1 bray
1 business
1 calculated
1 can
1 capture
1 card
1 cares
1 characteristics:
1 checkout
1 choice
1 choices
1 classify
1 clerk
1 clicking
1 comes
1 commercial
1 communal
1 congress
1 consider
1 conundrum
1 corrolary:
1 cory
1 could
1 counter
1 cover
1 creation
1 customers
1 deflecting
1 degenerate
1 dictu
1 dictum
1 did
1 didn’t
1 difficulties
1 disposable
1 does
1 done
1 down
1 earlier
1 ease
1 effects
1 elses
1 elsewhere
1 employee
1 energy
1 enter
1 entropy-fighting
1 everytime
1 example
1 except
1 feeling
1 full
1 gets
1 goes
1 good
1 granted
1 grocery
1 group
1 grower
1 grows
1 guilt-free
1 hand
1 handy
1 having
1 he
1 he’s
1 head
1 her
1 himself
1 how
1 human
1 icio
1 imagine
1 important
1 inc
1 incomes
1 incomplete
1 informative
1 ingredient
1 inputs
1 interaction
1 into
1 issues
1 jail
1 judgment
1 just
1 keep
1 kind
1 know
1 later
1 learned
1 least
1 let’s
1 library
1 link
1 links
1 list
1 listed
1 loc
1 long
1 looks
1 lower
1 many
1 margin
1 mathematical
1 me
1 meanwhile
1 millions
1 mirabile
1 missing
1 mix
1 most
1 motivation
1 moves
1 mrs
1 much
1 my
1 next
1 oil
1 once
1 one
1 opaque
1 opinion
1 particularly
1 paul’s
1 penny-pinching
1 people’s
1 per
1 pick
1 pilot
1 pointed
1 possible
1 posts
1 posts!
1 preference
1 premium
1 premium-tropic
1 prices
1 price-sensitive
1 pricing
1 processor
1 produce
1 produced
1 product
1 profit
1 profound
1 providing
1 put
1 quoting
1 raise
1 range
1 rather
1 real
1 really
1 reason
1 reasons
1 rents
1 respectively
1 say
1 saying
1 secret
1 see
1 seem
1 self-interested
1 she
1 shop
1 simple
1 simplest
1 simply
1 soap
1 something
1 special
1 squeezable
1 stickers
1 sticks
1 sticks…
1 store
1 stuff
1 supposed
1 system
1 take
1 talks
1 tells
1 themselves
1 then
1 there’s
1 these
1 they’re
1 through
1 tim
1 tim’s
1 time
1 tomato
1 until
1 us
1 use
1 useful
1 variety
1 very
1 was
1 watched
1 we
1 weekly
1 well
1 welter
1 where
1 will
1 winn-dixie
1 work
1 wringing
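
The counting step behind a list like the one above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the stop-word set is a tiny hypothetical sample, the function name `suggest_tags` is invented here, and the default of 8 suggestions just mirrors the "8 most likely tags" mentioned in the comment. It does none of the synonym, link, or feed weighting proposed above.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption); a usable one
# would be far longer.
STOP_WORDS = {
    "the", "of", "and", "is", "you", "to", "in", "that", "for", "a",
    "your", "there", "on", "at", "i", "no", "so", "this", "by", "do",
    "or", "it", "are", "not", "with", "but",
}


def suggest_tags(text, limit=8, stop_words=STOP_WORDS):
    """Return up to `limit` of the most frequent non-stop-words in text."""
    # Lowercase, then pull out words; hyphens and apostrophes are kept
    # inside a word so "self-interest" and "doesn't" stay whole.
    words = re.findall(r"[a-z0-9]+(?:[-'’][a-z0-9]+)*", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return [word for word, _count in counts.most_common(limit)]


if __name__ == "__main__":
    sample = (
        "Cheap metadata is metadata made by someone else. "
        "Cheap metadata grows with aggregation."
    )
    # Candidate tags that could be offered to an author at post preview.
    print(suggest_tags(sample, limit=3))
```

Raw frequency alone treats "tag", "tags", and "tagging" as three separate words, as the list above shows, so a fuller version would probably want stemming before counting.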

