Corante

Many-to-Many

August 8, 2005

the biases of links

Posted by danah boyd

I have a hard time respecting anyone who believes that science or technology is neutral. Unfortunately, even when people consciously know that they are not, they give credence to the biased outputs without questioning the underlying assumptions. This is why i’m an academic - nothing gives me greater joy than to think about what biases go into the creation of a particular system.

After reminding folks at Blogher that there are gender differences in networking habits, i decided to do some investigation into the network structures of blogs. Kevin Marks of Technorati kindly gave me a random sample of 500 blogs to play with. I began coding them based on gender (which is surprisingly easy to do given the amount of personal information people put about themselves) and looking for patterns in links and blogrolls.

I decided to do the same for non-group blogs in the Technorati Top 100. I hadn’t looked at the Top 100 in a while and was floored to realize that most of those blogs are group blogs and/or professional blogs (with “editors” and clear financial backing). Most are covered in advertisements and other things meant to make them money. It’s very clear that their creators have worked hard to reach many eyes (for fame, power or money?).

Here are some of the patterns that i saw*:

Blogrolls:
  • All MSNSpaces users have a list of “Updated Spaces” that looks like a blogroll. It’s not. It’s a random list of 10 blogs on MSNSpaces that have been recently updated. As a result, without special code (like in Technorati), search engines get to see MSNSpace bloggers as connecting to lots of other blogs. This would create the impression of high network density between MSNSpaces which is inaccurate.
  • Few LiveJournals have a blogroll but almost all have a list of friends one click away. This is not considered by search tools that look only at the front page.
  • Bloggers who use hosting services tend to link to only others on the same hosting service (from the blogrolls on Xanga and Rakuten to the friend links on LJ). The blogroll structure on these is often set up to only accept lists of blogs from that service.
  • Blogrolls seem to be very common on politically-oriented blogs and always connect to blogs with similar political views (or to mainstream media).
  • Blogrolls by group blogging companies (like Weblogs, Inc.) always link to other blogs in the domain, using collective link power to help all.
  • A fraction of the Top 100 have blogrolls of blogs. Some have blogrolls that are a link away (like Crooked Timber). Quite a few use that space to advertise or link to mainstream media or companies.
  • Male bloggers who write about technology (particularly social software) seem to be the most likely to keep blogrolls. Their blogrolls tend to be dominantly male, even when few of the blogs they link to are about technology. I haven’t found one with >25% female bloggers (and most seem to be closer to 10%).
  • On LJ (even though it doesn’t count) and Xanga, there’s a gender division in blogrolls whereby female bloggers have mostly female “friends” and vice versa.
  • I was also fascinated that most of the mommy bloggers that i met at Blogher link to Dooce (in Top 100) but Dooce links to no one. This seems to be true of a lot of topical sites - there’s a consensus on who is in the “top” and everyone links to them but they link to no one.
  • I also get the impression that blogrolls are not frequently updated (although i have to imagine that the blogs one reads are). I wonder how static blogrolls are.
Linking patterns:
  • The Top 100 tend to link to mainstream media, companies or websites (like Wikipedia, IMDB) more than to other blogs (Boing Boing is an exception).
  • Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends’ list). It looks like there’s a gender split in tool use; Mena said that LJ is like 75% female, while Typepad and Movable Type have far fewer women.
  • Bloggers often talk about other people without linking to their blog (as though the audience would know the blog based on the person). For example, a blogger might talk about Halley Suitt’s presence or comments at Blogher but never link to her. This is much rarer in the Top 100 who tend to link to people when they reference them.
  • Content type is correlated with link structure (personal blogs contain few links, politics blogs contain lots of links). There’s a gender split in content type.
  • When bloggers link to another blog, it is more likely to be same gender.

I began this investigation curious about gender differences. There are a few things that we know in social networks. First, our social networks are frequently split by gender (from childhood on). Second, men tend to have large numbers of weak ties and women tend to have fewer, but stronger ties. This means that in traditional social networks, men tend to know far more people but not nearly as intimately as those women know. (This is a huge advantage for men in professional spheres but tends to wreak havoc when social support becomes more necessary, and it is often cited as a factor in depression later in life.)

While blog linking tends to be gender-dependent, the number of links seems to be primarily correlated with content type and service. Of course, since content type and service are correlated with gender, gender is likely a secondary effect.

Interestingly, there are distinct clusters of norms wrt linking in blogging, not a coherent and consistent one. The search engines (and the Technorati 100 and PubSub’s Daily 100 Top Links) are validating one of those clusters, regardless of whether or not that is what searchers are looking for. The Top 100 is a list of blogs who either fit into those norms or have adopted those norms in their patterns (most commonly the companies).

I also want to point out a few other issues in link biases that are relevant here:
  • All links are created equal. All relationships are not. Treating everything like a consistent weak tie is quantity over quality and in social networks, that means male over female.
  • When the data being measured has inconsistent structure rules, any ranking metric is inherently flawed. In blogs, there’s no consistency for what a link means, no consistent social norms for blogrolls, no agreed-upon links norms. Metrics inherently squish out this nuance and force all of the square pegs into the round holes.
  • Links indicate no weight, no valence, no attributes. I know Technorati has asked folks to indicate positive/negative in their links or to use nofollow, but few do this. And even if people did, that kind of articulation is a social disaster (::cough:: think Friendster).
  • Traditionally, there is power in keeping your black book shut; one’s position in a network can be quite powerful. You get kudos by helping two unconnected people. You can limit information flow and acquire credit when you take something from one group to another. (This is the basis for some interesting work on creativity - creativity is when bridges connect information from disparate worlds.) While some think that transparency is good, some hide their network to maintain power. For example, if as a blogger, you provide “cool links,” you want others to read you, not the collection of people you read. Of course, a reasonable counter argument is that this person is no longer needed as a bridge, but as a curator. Still, some people hide so that they must be asked for recommendations directly and thus can control who they send people to. (Note: this is a particular kind of power move; transparency can also be a power move through gifting.)
  • There are social consequences to linking structures and those who have a lot of eyes on them are probably more aware of the consequences of their linking habits. This is another reason why people with a lot of eyes may get rid of blogrolls. Having to negotiate lots of requests for links can be a real turn-off.
  • People will try to manipulate any ranking if there is an advantage to being up top. Static measurement algorithms cause harm to the entire community that is being measured. Web search engines know this, but it’s equally critical for blog search.
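
To make the "all links are created equal" point concrete, here is a minimal Python sketch of a naive inbound-link ranking. The blog names, the link list, and the `service_generated` flag are all hypothetical, and this is not how Technorati or any real service computes rank. It only illustrates that counting every link as equal lets links inserted by a hosting service (like an "Updated Spaces" list) change who ends up on top.

```python
from collections import Counter

# Hypothetical links: (source, target, service_generated).
# service_generated marks links inserted by the hosting service
# rather than chosen by the blogger.
links = [
    ("alice", "bob", False),
    ("carol", "bob", False),
    ("alice", "dave", False),
    ("space1", "space2", True),
    ("space3", "space2", True),
    ("space4", "space2", True),
]

def inbound_counts(links, include_service_links=True):
    """Count inbound links per blog, treating every counted link as equal."""
    counts = Counter()
    for source, target, service_generated in links:
        if service_generated and not include_service_links:
            continue
        counts[target] += 1
    return counts

naive = inbound_counts(links)
filtered = inbound_counts(links, include_service_links=False)

# The blog with the most inbound links under each counting rule.
naive_top = naive.most_common(1)[0][0]
filtered_top = filtered.most_common(1)[0][0]
print(naive_top, filtered_top)
```

Neither count is the "right" one. Dropping service-generated links is itself a choice about what a link should mean, which is the point: the ranking depends on the rule that someone picked.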

These services are definitely measuring something but what they’re measuring is what their algorithms are designed to do, not necessarily influence or prestige or anything else. They’re very effectively measuring the available link structure. The difficulty is that there is nothing consistent whatsoever with that link structure. There are disparate norms, varied uses of links and linking artifacts controlled by external sources (like the hosting company). There is power in defining the norms, but one should question whether companies or collectives should define them. By squishing everyone into the same rule set so that something can be measured, the people behind an algorithm are exerting authority and power, not of the collective, but of their biased view of what should be. This is inherently why there’s nothing neutral about an algorithm.

While i’ve been looking into the linking patterns, Mary Hodder has been thinking through new metrics for measurement. These are very important but not because one is better than the other. In fact, if we all switched to any of her metrics, we’d have just as many biases as we have now. And many of the Top blogs would try to figure out how to get rank in that system. The significance lies in the ability to offer choice.

Of course, choice is difficult. Lots of people want to know what the “best” one is and don’t want to think about the metrics behind it (yes, these are the “neutral” people). Unfortunately, many of those types have a lot of power that motivates people to want their attention. The press want a list of the best and many bloggers want the attention of the press and thus want to be listed among the best. Breaking this cycle is virtually impossible, but it is how power maintains power. And in our current system, we are doing a damn fine job of replicating the power structures that pervade everyday life under the auspices of creating a new system that usurps power. Ah, what fun.

Still, i think it’s critical to work on new metrics so that we can at least start showing alternate ways of organizing information if for no other reason than to push back against the conception of neutrality. And thus, i’m stoked to help Mary out and i would encourage everyone else interested in altering the power structure to do so as well.

At the least, i do think we need to really think about what is at stake and what we’re inadvertently supporting through our current systems. Are these the power structures that we want to maintain? Because there’s nothing neutral about our technological choices.

* Note: these are patterns, not findings. The methodology used here is not solid enough for findings. I am not offering quantitative data because i want it to be clear that these are trends based on tracking patterns. Think of them as guesstimated hypotheses (and i’d be ecstatic if someone would compute them).

(Also posted at apophenia)

Comments (29) + TrackBacks (0) | Category: social software


COMMENTS

1. Emma Duke-Williams on August 8, 2005 6:18 AM writes...

Just found this (via Stephen Downes' Edu_RSS), and have found it & the linked articles very interesting. It's something that I think needs far more research ...

2. Andrea Learned on August 8, 2005 5:17 PM writes...

It was refreshing to read of your guesstimates! I am a blogger who has, and always will have, trouble playing the link/blogroll game. I don't understand all the ins and outs, and I don't make any money one way or the other. I try as hard as I can to just stick with writing about what I know/am interested in - and hope that I've got readers who gain from it. It is easier for my brain to do what I love and simply trust that it is benefiting my career in some way. (I guess I am the classic female social networker...)

Your work is fascinating. I look forward to hearing more about what you and Mary discover over time.

3. TW on August 8, 2005 9:38 PM writes...

A lot of fuel for thought. Thanks for stuff to ponder during my day...and probably quite some time to come.

4. Duncan Riley on August 9, 2005 4:21 AM writes...

Your post has given me a headache just by the various things you go into, there are so many conflicting findings from the various portions of the blogosphere that I feel confused afterwards. I think you sum it up best here:
" if we all switched to any of her metrics, we’d have just as many biases as we have now"

What is best or better will always be held to be subjective and will differ from person to person. Whilst there is some merit in understanding such concepts, has the ongoing obsession with the Top 100/A List of late not gone too far? Will we end up with 100 variations of a Top 100 list that divides the blogosphere even further?

5. Bill Seitz on August 9, 2005 8:20 AM writes...

Technorati is kind of like the business-metrics folks - they measure what they can measure easily, then forget that it's just a starting point.

So if you think you can define a better measure, you need to
* define that metric (hopefully it's scalable)
* figure out what each *writer* and blog-engine developer has to do to support it (are you willing to switch engines for this feature?)
* maybe you start out with a semi-manual hack to your engine of choice - e.g. you have to manually edit your FOAF or OPML file rather than having TypePad do it for you..
* get whatever blog-engine support you can at first
* get writers to start hacking with that engine to support the new feature/metric
* put up a server that does the necessary scraping, calculating, presenting.

Then work on getting increasing-returns...

6. Bill Seitz on August 9, 2005 8:23 AM writes...

A route-around example: the WomenBloggers WebRing.

http://ringsaround.net/womenbloggers/

7. Mark Bernstein on August 9, 2005 10:38 AM writes...

You begin with:

> I have a hard time respecting anyone who believes
> that science or technology is neutral.

This starting point puzzles me. You're studying the intersection of link behavior and link measurement; these are clearly interesting and important things to study.

The only reason I see for talking about your difficulties in respecting your colleagues, some of whom might believe things you don't, would be if that difficulty were so extreme that it clouded your judgment or distorted your observation. Surely that's not the case here.

A. B. Clump, boy scientist, might be a scoundrel and a cad and someone I wouldn't invite to dinner. He might believe a dozen foolish things before breakfast. Still, if Clump has found a new ocelot, it's the ocelot that matters -- not Clump's personal failings.

8. sneJ on August 9, 2005 10:54 AM writes...

"Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends’ list..."

I don't believe this is true of LiveJournal, where it's nearly universal to refer to another person on LJ by using the special "<lj user=...>" tag that creates a link to that user's journal, complete with the little head icon. (Example.) The icon makes those links stand out visually, so it's easy to see that many posts are liberally peppered with them.

The few LJ'ers I know who don't use that mechanism, do so deliberately because they don't want people just skimming their posts looking for links to themselves.

9. Bill Seitz on August 9, 2005 2:39 PM writes...

Has anyone suggested to Technorati that they base their Top100 on non-root links (e.g. links to specific postings)?
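
One rough way to picture that suggestion is the Python sketch below. It treats a link as "non-root" when its URL path is anything other than empty or `/`. The URLs and function names are hypothetical, and nothing here reflects how Technorati actually counts links. A path test like this would also misclassify hosted blogs whose front page lives under a path, so it is only a starting point.

```python
from urllib.parse import urlparse

def is_root_link(url):
    """True when the URL points at a site's front page rather than a specific post."""
    path = urlparse(url).path
    return path in ("", "/")

def count_post_links(urls):
    """Count only the links that point at something below the site root."""
    return sum(1 for url in urls if not is_root_link(url))

# Hypothetical inbound links to one blog.
inbound = [
    "http://blog.example.com/",
    "http://blog.example.com",
    "http://blog.example.com/2005/08/some-post.html",
    "http://blog.example.com/archives/another-post.html",
]

post_links = count_post_links(inbound)
print(post_links)
```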

10. Bill Seitz on August 9, 2005 2:50 PM writes...

Another idea: each blogger has a top-5 sub-blogroll. (If they include more than 5, just the first 5 are counted.)

This changes people's thinking process about who to include, and perhaps results in (fewer) "deeper" thinking/reading relationships being expressed and aggregated.
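
A minimal Python sketch of that counting rule, assuming each blogroll is an ordered list and only its first five entries count. The blog names and the `capped_inbound_counts` helper are hypothetical, for illustration only.

```python
from collections import Counter

TOP_N = 5  # only the first five blogroll entries are counted

def capped_inbound_counts(blogrolls, top_n=TOP_N):
    """Count inbound blogroll links, ignoring entries past the first top_n."""
    counts = Counter()
    for owner, roll in blogrolls.items():
        for target in roll[:top_n]:
            counts[target] += 1
    return counts

# Hypothetical blogrolls, in the order each blogger listed them.
blogrolls = {
    "alice": ["b1", "b2", "b3", "b4", "b5", "b6", "b7"],
    "bob": ["b6", "b1"],
}

capped = capped_inbound_counts(blogrolls)
print(capped)
```

Here alice's sixth and seventh entries are ignored, so b6 is credited only through bob's blogroll and b7 gets nothing.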

11. zephoria on August 9, 2005 7:49 PM writes...

sneJ - my apologies. I had originally written something about the LJ user structure but apparently left it out of my post. Yes, LJ users do frequently talk about their friends using that structure in their posts, although there seems to be a gap between those who do and those who don't and it seems to be network-driven. For example, if your friends use that paradigm, you will too. But if they don't, you won't. I don't know what separates those clusters from each other though. I meant to mark that as a special case because it is so internally dense but i apparently forgot. Thanks for catching this!

12. Rob on August 9, 2005 9:13 PM writes...

Strange, a quick look shows that my own blog roll is weighed toward male bloggers, although I clearly break the 25% rule as a male blogger: Of 46 links, one's to someone's second blog, one is to a group blog I'm developing, 18 are to women (or primarily women with occasional male guests) and one transexual female. Of the remaining blogs, many are group blogs that feature both male and female bloggers.

Why? There's a large number of Mommy blogs. Events in my own life (wife and I are trying for children) make them interesting and useful reading. The unusual thing is, in the real world, I tend to make friends with women much more easily than men -- the question for me may be why there are so many men on the list, not why there are so many women on my blog roll.

There are two dead blogs on my blog rolls, one is a sports blog inactivated by illness and one that has been quiet for a month. I'm keeping the dead ones there as I try to contact the authors and find out where they've moved to. It's a mnemonic device. After a while, either a blog's new location is found or they're cut.

I am also reluctant to give up on folks. Pulling a blog too soon feels like I'm saying "I don't care." I hope the continued presence says to their authors "I still hope to see you again." With the quiet one and the inactive because of illness one, this is especially true.

I myself quit blogging for a while. A single link that was still there on a blog site encouraged me to start again.

13. vijay on August 10, 2005 2:36 AM writes...

As an Eljayer, I might agree with the few blogrolls finding. But it's not that we don't care for rolls. LJ doesn't allow more than five links on the main journal/blog page.

I am not sure if most on LJ are above 'mutual backpatting'.

14. Ryan Shaw on August 10, 2005 2:25 PM writes...

(via kottke.org)

More on the politics of search engines in a paper by Lucas D. Introna and Helen Nissenbaum, Shaping the Web: Why the politics of search engines matters (2000): "Our study of search engines suggests that they systematically exclude (in some cases by design and in some accidentally) certain sites, and certain types of sites, in favor of others, systematically give prominence to some at the expense of others. We argue that such biases, which would lead to a narrowing of the Web's functioning in society, run counter to the basic architecture of the Web as well as the values and ideals that have fueled widespread support for its growth and development."

15. Jason Davis on August 10, 2005 4:17 PM writes...

Jason Calacanis on his blog (weblogsinc) was talking about the same thing. He has offered a pretty generous prize to anyone who can come up with a better top 500 blog list.

16. Jim on August 10, 2005 5:17 PM writes...

yea I dont think that anyone will come out with a better top 500 list of blogs anytime soon.

17. Tim on August 10, 2005 7:37 PM writes...

> I have a hard time respecting anyone who
> believes that science or technology is neutral.

I have a hard time respecting anyone who thinks science is just another social enterprise, whose myopic results are purely the result of existing power structures and biases.

Technology is definitely not neutral, but science *is*, given a long enough time scale. On the small scale of months, years, or even a few decades, science does indeed have its prejudices and errors. But on the long scale of many decades, or centuries, science is self-correcting, and prejudice and error are eventually removed.

The law of gravity is neutral; it is not an artifact of gender bias, or political ideology. Neither is plate tectonics. Given enough time, biases in science become apparent and get questioned, heretical notions get researched. Slowly, science methodically uncovers an ever closer description of the truth, as it is, without regard to what you, I or anyone else, might desire that truth to look like.

18. zephoria on August 10, 2005 7:57 PM writes...

Tim - there's a huge difference between physics and algorithms meant to turn social behavior into quantifiable elements for computation. But i also agree with you - science in the local is inherently biased. We are talking about the science in practice not the axioms. Nowhere was this more painfully clear to me than when i was trying to study depth cue prioritization and realized that 60 years of work in perception was flawed due to the exact same subject limitation.

19. Tim on August 11, 2005 4:35 AM writes...

Zephoria -

> there's a huge difference between physics
> and algorithms meant to turn social behavior
> into quantifiable elements for computation

The author made no such distinctions; she indicted science as a whole. Rephrased, her first sentence is essentially "Science is not neutral". But she is simply wrong - given enough time, science is neutral.

> i was trying to study depth cue prioritization
> and realized that 60 years of work in perception
> was flawed

But this is exactly my point. Error and bias in scientific research are temporary. They are eventually removed by people, like you, who notice the flaws, who question the assumptions. You found what you perceive to be flaws in prior research. So, you'll research the issue further, and correct the flaws (or someone else will). Through you, science corrects itself, and proceeds ever closer to the unbiased truth. The system works!

Furthermore, the author gave no evidence that any of the metrics at issue were developed with any scientific rigor. Most likely, they were developed by business types, or programmers (who are generally clever, but are usually not scientists) working for business types. How can she turn that into an indictment of science? Her own (unfounded) biases are showing.

20. zephoria on August 11, 2005 12:26 PM writes...

Tim -

"she" = me (the author). I still disagree with you that given enough time science is neutral. There are huge procedural biases in the scientific process that limit many things from being understood. People are rarely aware of the biases of their research and we are continuously realizing that what we thought to be "truth" wasn't quite true.

Certain sciences are more affected by bias. While physics can make great strides without extensive biases, most things involving humans are heavily heavily biased. Hell, there are a ton of experiments that we simply cannot do. Time will not allow us to suddenly do them.

Technology is worse than science because technology is a set of tools built to meet the needs of humans. It's entirely wrapped up in human biases.

My indictment of science is based on my research in neuro-psych. My indictment of technology is based on years in that field. I am definitely biased - we ALL are. I've spent my life questioning every step that is made without a critical eye. I was indoctrinated into the "science is everything" religion when i was a kid and doing research in the sciences destroyed most of my respect for the processes. There are good folks who really do want to get to the bottom of things. But there are a hell of a lot who want fame or money, regardless of what it takes.

21. Dennis Howlett on August 11, 2005 9:02 PM writes...

This is breathtaking. This will have an extraordinary impact on how blog stats are viewed.

22. David Gibbons on August 11, 2005 11:11 PM writes...

Thank you very much - both broad & deep.

23. Savanna Slave on August 12, 2005 3:49 PM writes...

that was a really interesting read, and you're right on the money. =)

*but* here's the thing: your argument is also based on an assumption, and it's not a proper assumption. the assumption is that you assume people rely on the top 100 (or 500 or whatever) listings to read or get to the blogs they find on the net.

but they don't.

i think most people get to blogs they like via other blog links in articles and the like, and not necessarily through blogrolls. *sometimes* people click on blogrolls, but it's not always the case. most people seem like they follow links on articles, entries, sent via email or word of mouth or chat systems. then, if they like the blog, they bookmark it and share it with others.

the big blogs are read cuz they are consistent but also because they are reliable for what people want. political bloggers look like they totally keep in that sphere and whatever sphere you're in (right, left, center, whatever), you know the major ones even if you don't click on blogrolls.

i run a porn blog but i also read a lot of political blogs. some of the biggest political blogs don't link to other smaller blogs all that often, but more to news stories on mainstream media sites. =) (kinda funny cuz both sides hate the MSM as being too liberal or conservative etc...) i think in those cases, blogrolls have little to do with it.

if people relied mostly on blogrolls, i think your point is really great, but i don't see that happening. blogrolling looks like it's mostly a courtesy rather than a serious resource of information. it can be, but it just doesn't look that way. =)

24. zephoria on August 12, 2005 11:16 PM writes...

Savanna - actually, it was not based on that assumption. It *was* based on the discussion that occurred at Blogher about how mainstream media and other power players (like folks at the RNC/DNC) judge top blogs and thus seek people's voice as "authorities" on the subject. There are many who do read these lists as authoritative and that was the concern at Blogher.

27. Counsel on August 17, 2005 3:17 PM writes...

I would be interested in knowing how the blogger's spouse/significant other (collectively known as SOs) may affect the links on a given blog. It could be there are some Active SOs or bloggers who worry about what their SOs see on the blog.

Any comments/studies?

C

28. zephoria on August 17, 2005 6:17 PM writes...

Counsel - this is something the mommy blogs have discussed extensively but not something that i can evaluate simply through looking at the end results.

29. Paul on August 23, 2005 5:22 AM writes...

Nice blog.I like this.
Paul
http://www.sify.com
