Guest Authors
January 20, 2005

social consequences of social tagging

Posted by Liz Lawley

So, if my inbox is any indication, the blogosphere has been abuzz lately with opinions and commentary on “folksonomy.” It’s interesting stuff, no doubt, especially for those of us who come to social computing from a library and information science background.

Unfortunately, too many of the paeans to tagging that I’ve read have completely ignored some of the key social and cultural issues associated with public and collaborative labeling of content, opting instead for a level of technology-driven optimism that I see as overly naive. I think folksonomy has incredible value—the two web sites that I use most heavily right now are Flickr and And I understand that this is something that can’t be stuffed back into the bottle. Nonetheless, I don’t think that means we have to accept it with an uncritical eye, or adopt every new implementation of tagging without consideration.

I’ve been happy, however, to see some exceptions to this rule—recent posts by Lou Rosenfeld, Rebecca Blood, Anil Dash, and Foe Romeo have all addressed the darker side of bottom-up classification.

After Technorati unveiled their new tagging implementation, Rebecca Blood wrote this:

It’s certain that some people will try to game the system, deliberately tagging their photos to misdirect people, make a political statement, or otherwise promote their own interests. It seemed to me that Technorati would want to start thinking about that now: to Design for Evil, as Bruce Sterling has said.

The issue of inevitable systematic disruptive behavior has been missing from a lot of these discussions, and I hope Rebecca’s post will spur more discussion on this aspect of tagging.

Foe Romeo followed up with a wonderful example of exactly how decontextualized content, repurposed by Technorati, shapes the perception of content. She linked to the “teens” tag on Technorati, and pointed out the juxtaposition of blog posts referencing pornography with Flickr photos of kids hanging out with each other. (Her screenshot is quite different from what I get now when I try that search, however. I don’t know if that’s a regional issue, a new censorship implementation in the Technorati database, or some other factor.)

One of the topics that’s started coming up in these discussions is the extent to which any given individual’s tagging behavior is (or can be, or should be, or shouldn’t be) influenced by the tags others have assigned. In a recent post on the delicious-discuss mailing list, Saul Albert wrote:

So I’m proposing a kind of tag-brokerage system. A system by which people can form epistomology gangs who decide to share tags, and declare a concensually [sic] decided-upon meaning and remit for them. That’s when tags can start to become categories, grouped, separated, weeded, updated, expanded etc..

I’ve been mulling that over for a bit. On the one hand, as a librarian, I understand completely the value of controlled vocabularies and taxonomies. I don’t want to have to look in six different places for information on a given topic—I want some level of confidence that the things I want are grouped together. On the other hand, I don’t share the optimism that so many of my colleagues in this field seem to have that the collective “wisdom of crowds” will always yield accurate and useful descriptors. Describing things well is hard, and often context-specific.

Last night, I discovered the perfect example to illustrate my concerns. The ESP Game is a site developed by researchers at CMU, intended to create a set of descriptors for images indexed by Google.

Here’s the abstract from a paper they presented at last year’s CHI conference:

We introduce a new interactive system: a game that is fun and can be used to create valuable output. When people play the game they help determine the contents of images by providing meaningful labels for them. If the game is played as much as popular online games, we estimate that most images on the Web can be labeled in a few months. Having proper labels associated with each image on the Web would allow for more accurate image search, improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and help users block inappropriate images. Our system makes a significant contribution because of its valuable output and because of the way it addresses the image-labeling problem. Rather than using computer vision techniques, which don’t work well enough, we encourage people to do the work by taking advantage of their desire to be entertained.

Noble aims, and a brilliant way to attract and reward input. But the unintended consequences of this approach are non-trivial, as I found when I spent a few hours playing with it yesterday and today.

The way the site works is that you register as a game player, then launch the Java applet. You’re paired at random with another player, and presented with the first in a series of images.

ESP Game Screen Shot

The clock in the top left corner shows how much time is left. The thermometer along the bottom shows how many matches you’ve made, and how many images are left to describe. The “taboo words” are words you can’t use—if a color is included there, all colors are barred, as are parts of existing words (thus if china is banned, so is chin). Not all pictures include taboo words—I suspect these appear once certain words have already been regularly associated with an image, a problem I’ll come back to.

You start typing words associated with the image, one at a time. When you and your partner have both typed the same word, you’re told it’s a match and you move on to the next item. The goal is to come to agreement on a word for every picture before time runs out.
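As a rough sketch of the matching rule just described, the Python below replays two players' guesses and returns the first word both have typed. It is not the game's actual code: the function names are hypothetical, guesses are assumed to alternate between players rather than arrive in real time, and only the word-fragment part of the taboo rule is modeled (the "one color bars all colors" rule is left out).

```python
def normalize(word):
    """Lowercase and trim a guess so 'Woman ' and 'woman' compare equal."""
    return word.strip().lower()


def is_taboo(guess, taboo_words):
    """Return True if the guess is a taboo word or a fragment of one.

    Fragment rule as described in the post: if 'china' is taboo, so is 'chin'.
    """
    guess = normalize(guess)
    return any(guess in normalize(taboo) for taboo in taboo_words)


def first_match(guesses_a, guesses_b, taboo_words=()):
    """Replay both players' guesses and return the first agreed word, or None.

    Assumption: guesses alternate A, B, A, B, ...; the real game is timed
    and both players type concurrently.
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, theirs in ((guesses_a, seen_a, seen_b),
                                      (guesses_b, seen_b, seen_a)):
            if i >= len(guesses):
                continue
            word = normalize(guesses[i])
            if not word or is_taboo(word, taboo_words):
                continue  # empty or barred guesses never count
            if word in theirs:
                return word  # both players have now typed this word
            mine.add(word)
    return None
```

For example, `first_match(["pageant", "contestant", "woman"], ["woman", "girl"])` agrees on "woman" only once the first player gives up on the context-specific labels.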

When I started playing the game, my scores were very low. I kept trying to assess the context and content of the image, and choose descriptors based on that. So if I saw a woman in a bathing suit walking down a runway wearing a sash and a crown, I’d type pageant, or contestant. But it turns out that’s a lousy strategy for winning the game, because it’s unlikely that you’ll be matched with someone doing the same thing. If the picture is of a female, regardless of clothing or context, woman is always the most likely match. And if woman is listed as a taboo word, girl will almost always work. Unsurprisingly, with pictures of men, it is not the case that “boy” is the next best choice. Instead, the best match if “man” is taboo is typically race (“black”), hair (“bald”, “gray”), or clothing (“tie”, “suit”, or even “glasses”).

Maximizing your scores in this game means sacrificing a lot of valuable semantics. Colors are great for matching, but often are not the most critical or valuable aspects of the image. Shapes are good. Easiest to match are any images that have text, with typical “stop words” being the best matches—“the”, “of”, etc.
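A back-of-the-envelope way to see why generic words win: if two players choose their opening word independently, the chance that both open with the same word is the square of that word's popularity. The sketch below assumes that independence, and the probabilities are made up purely for illustration; they are not measurements from the game.

```python
def match_chance(word_probs):
    """Chance that two independent players both open with each word.

    word_probs maps a candidate word to the (assumed) probability that a
    random player types it first. Squaring gives the chance both do.
    """
    return {word: p * p for word, p in word_probs.items()}


# Hypothetical numbers for the pageant photo, for illustration only.
pageant_photo = {"woman": 0.60, "girl": 0.25, "crown": 0.10, "pageant": 0.05}
chances = match_chance(pageant_photo)

# Rank words from most to least likely to produce an immediate match.
ranking = sorted(chances, key=chances.get, reverse=True)
```

Under these invented numbers, "woman" is more than a hundred times as likely as "pageant" to produce an immediate match, even though it is only twelve times as popular as a first guess. Squaring punishes the rarer, more specific label.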

The game developers attempted to push people to richer semantic labels by the use of taboo words. According to their CHI paper:

Taboo words are obtained from the game itself. The first time an image is used in the game, it will have no taboo words. If the image is ever used again, it will have one taboo word: the word that resulted from the previous agreement. The next time the image is used, it will have two taboo words, and so on. (The current implementation of the game displays up to six different taboo words.)
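The bookkeeping the paper describes can be sketched in a few lines of Python. The class and method names here are hypothetical, and the paper excerpt does not say which six words are displayed once an image has more than six agreed labels; this sketch assumes the earliest six.

```python
MAX_TABOO_SHOWN = 6  # the paper says up to six taboo words are displayed


class ImageLabels:
    """Tracks the labels agreed on for one image across repeated plays."""

    def __init__(self):
        self.agreed = []  # agreed labels, in the order they were matched

    def taboo_words(self):
        """Words barred the next time this image is shown.

        Assumption: when more than six labels exist, the earliest six are
        the ones displayed. The excerpt does not specify this.
        """
        return self.agreed[:MAX_TABOO_SHOWN]

    def record_agreement(self, word):
        """Store a newly agreed label; it becomes taboo in later plays."""
        if word not in self.agreed:
            self.agreed.append(word)


labels = ImageLabels()
first_play = labels.taboo_words()   # a new image has no taboo words
labels.record_agreement("woman")
second_play = labels.taboo_words()  # the previous agreement is now barred
```

Each play can therefore only add a label that differs from the ones already on the list, which is the mechanism the developers rely on to push players toward less obvious words.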

In my experience with the game, however, taboo words also serve to influence player word choice. Looking at the list of words encourages you to find synonyms, rather than analyzing the image itself. If a taboo word shown is “round,” then “circle” turns out to be a very likely match, even if there may be aspects of the image that have far more semantic meaning. If one word in the image is shown, any other words, or word fragments, or letters, are likely to be typed in. In many cases, the list of five or six “taboo” words being shown completely misses key aspects of the image. One image I saw was of a Greek coin. There was no inclusion of greece or greek anywhere in the taboo words, though, nor could I get a match with my partner by typing those. Coin was there, but the other words had to do with obvious physical characteristics rather than inferred or non-explicit information.

There’s another problem that I encountered with the list of “taboo” words, one that’s even more troubling for me. One of the pictures I was shown last night was of a young black woman. The first word in the list of taboo terms was “nigga.” According to the game description, that means that two people, randomly selected, agreed upon that word as the best descriptor for the image.

The paper goes on to say:

We use only words that players agree on to ensure the quality of the labels: agreement by a pair of independent players implies that the label is probably meaningful. In fact, since these labels come from different people, they have the potential of being more robust and descriptive than labels that an individual indexer would have assigned.

Beyond the obviously disturbing example I just provided, there are other problems with this conclusion. The labels chosen by people trying to maximize their matches with an anonymous partner are not necessarily the most “robust and descriptive” labels. They’re the easiest labels, the most superficial labels, the labels that maximize the speed of a match rather than the quality of the descriptor. In addition, they’re words that are devoid of context or depth of knowledge. (Yes, increasing the number of people assigning tags, as in Flickr or, helps with this particular problem.)

I think, however, that the same factors that influence players of the ESP Game to try to maximize agreement rather than depth are also at work in the new folksonomic playgrounds. Increasingly, people are changing the way they label their links or photos because of how they see other people labeling them. Knowing that your descriptors will change how people can access your content can’t help but change the way you use the tags, just as knowing that people will read your blog influences the way you write. Tagging for your own retrieval is different than tagging for retrieval by people you know (say, searching for posts on your blog) and even more different than tagging for retrieval in a completely uncontextualized environment like Technorati. (Anil does a good job of thinking about this impact.)

Another weakness of this approach is that the people who are likely to have the most time to play these games and provide the content are not necessarily those with the broadest range of knowledge and expertise. (Yes, yes, I know…that’s a horribly elitist thing to say.) In fact, when I played the game with my 8-year-old sitting next to me, I did much better—his very simple suggestions were typically better than my more nuanced descriptors. Which is fine, if you’re trying to maximize search results for other eight-year-olds. But what if you want to maximize results for people who need finer granularity?

Clay argues that detractors from Wikipedia and folksonomy are ignoring the compelling economic argument in favor of their widespread use and adoption. Perhaps. But I’m arguing that it’s just as problematic to ignore the compelling social, cultural, and academic arguments against lowest-common-denominator classification. I don’t want to toss out folksonomies. But I also don’t want to toss out controlled vocabularies, or expert assignment of categories. I just don’t believe that all expertise can be replicated through repeated and amplified non-expert input.

Comments (11) + TrackBacks (2) | Category: social software


1. Kartik Agaram on January 20, 2005 12:32 AM writes...

"There’s another problem that I encountered with the list of 'taboo' words..The first word in the list of taboo terms was 'nigga.'"

Seems to me this is a problem with the society we live in rather than any particular game. Where does free speech end and distributed editing begin?


2. Kevin Marks on January 20, 2005 7:37 AM writes...

This is a great discussion of the issues, Liz. We have done some degree of designing for evil (hence the disappearance of that porn spammer) but we will need to continue to evolve responses to the bad actors over time.
Your ESP game experience sounds like the 'priming' that Malcolm Gladwell describes in 'Blink', where exposure to word or image stimuli beforehand can radically affect subsequent perceptions, especially when under time stress as in the ESP game.


3. Liz Lawley on January 20, 2005 8:06 AM writes...

Kartik, I agree that it points to societal issues. But there's also the question of who gets to describe and label things for global retrieval. Do I want the lowest-common-denominator descriptors? Do I want my content labelled by other people? (Trolls, for example?) And how is distributed editing different from (or worse than) distributed labelling?

My main point here, however, was to point out that the game developers' description seemed a bit naive--the most easily agreed-upon descriptors aren't necessarily the "best," or the "most robust."


4. Joshua Porter on January 20, 2005 12:30 PM writes...

Hi Liz. I wrote my paean A Self-Referential Demonstration of the Power of the Folksonomy just yesterday about the power of

Let me tell you why I'm so this point. Right now, we're seeing in a new reference tool built by people for people that (for the most part) bubbles up the best stuff to the top. It aggregates the individual behavior of intelligent people into a sort of relevancy framework, so that others can find the best content on their topic of choice. (If you'll have a look right now, your article is everywhere on that site). Not bad, eh?

The social implications you speak of are certainly important to consider and discuss. I believe your concern, though, pales in comparison to the mishmash of other architectures that we currently have outside of behavior-based search engines and folksonomies. Examples include the static, near-useless architecture of web sites that are based on self-serving parts of an organization: marketing needs *this* on the site, while the CEO needs *this*. I've seen too many sites go up this way (and too many users fail on them) to not embrace what I'm seeing with folksonomies.

If a system can be gamed, it will be. This is true with, and it's true with search engines, RSS feeds, comment spam, etc. None of these difficulties are new, just different.

More generally, think about the way that we communicate with words. As an example, what you might call social tagging, I might call a folksonomy, and Lou might call a metadata ecology. Of course there are slight distinctions here. But as humans our greatest gift is the capability to adapt to each other and each other's words, and we certainly get along without a single, officially defined way to talk about ... *anything*. I think folksonomies, because they are based on real human behavior, are afflicted with this same (problem).


5. Luciano Evaristo Guerche on January 20, 2005 3:15 PM writes...

I must confess I am also becoming a Flick and addicted :-)


6. TuringTest on January 20, 2005 4:12 PM writes...

Knowing that your descriptors will change how people can access your content can’t help but change the way you use the tags, just as knowing that people will read your blog influences the way you write. Tagging for your own retrieval is different than tagging for retrieval by people you know

To address this problem I'd suggest removing the source of interference: there should be added granular visibility. The same way that you can choose to hide your posts as private (at least does this), you should be able to add private tags for your own use. As there is no upper limit to the number of tags, you could have both tags for social goals and for your own use. Aggregation of knowledge could use both types (for example when counting the most popular tags).


7. joe on January 21, 2005 9:19 AM writes...

Social tagging = Ebonics?


8. Greg on January 21, 2005 11:20 AM writes...

I suppose this should go without saying, but I haven't seen it mentioned yet... Folksonomies should define a class of categorization artifacts well beyond tags, social tags, or any other keyword-type artifact for web pages. The most obvious examples I see are the "common names" of plants, as opposed to the Latin taxonomy. Has anyone done research on how that works, and problem / issue areas there?

The subset that's being discussed here might more properly be called tagsonomy.



10. Alan Levine on January 25, 2005 8:16 AM writes...

Your criticisms of the ESP Game are spot on, but I see an important distinction: in that environment your goals are "winning" a game and the information is content that does not mean much to you.

When I tag my own photos or tag the web sites I found valuable, I am doing this for my own benefit as much as (or more than) for the "crowd", and this is a very different intrinsic motivation than getting a higher "score". My actions follow my intents.

Also, why is this seen as an either/or issue? Why is it "controlled vocabs" vs the folksonomists? Isn't there a space for both to run together?


11. Boris Anthony on January 26, 2005 12:24 AM writes...

Open up a wiki-like (wikipedia-like?) interface to the tag databases. Ask Flickr and to invest in their own futures and develop a shared resource or an API for sharing and synching tags. Have somebody create desktop tag managers, for personal and public use.





