Intro: I was part of a group of people asked by Beth Noveck to advise the Community Patent review project about the design of a reputation and ranking system, to allow the widest possible input while keeping system gaming to a minimum. This was my reply, edited slightly for posting here.

We’ve all gone to school on the moderation and reputation systems of Slashdot and eBay. In both cases, growing popularity in the period after launch led to a tragedy of the commons, where open access plus incentives invited nearly constant attack by people wanting to game the system, whether to gain attention for themselves or their point of view in the case of Slashdot, or to defraud other users, as with eBay.

The traditional response to these problems would have been to hire editors or other functionaries to police the system for abuse, in order to stem the damage and to assure ordinary users you were working on their behalf. That strategy, however, would fail at the scale and degree of openness at which those services function. The Slashdot FAQ tells the story of trying to police the comments with moderators chosen from among the userbase, first 25 of them and later 400. Like the Charge of the Light Brigade, however, even hundreds of committed individuals were just cannon fodder, given the size of the problem. The very presence of effective moderators made the problem worse over time. In a process analogous to more roads creating more traffic, the improved moderation saved the site from drowning in noise, so more users joined, but this increase actually made policing the site harder, eventually breaking the very system that made the growth possible in the first place.

eBay faced similar, ugly feedback loops; any linear expenditure of energy required for policing, however small the increment, would ultimately make the service unsustainable. As a result, the only opportunity for low-cost policing of such systems is to make them largely self-policing. From these examples and others we can surmise that large social systems will need ways to highlight good behavior or suppress negative behavior or both. If the guardians are to guard themselves, oversight must be largely replaced by something we might call intrasight, designed in such a way that imbalances become self-correcting.

The obvious conclusion to draw is that, when contemplating a new service with these characteristics, the need for some user-harnessed reputation or ranking system can be regarded as a foregone conclusion, and that these systems should be carefully planned so that tragedy of the commons problems can be avoided from launch. I believe that this conclusion is wrong, and that where it is acted on, its effects are likely to be at least harmful, if not fatal, to the service adopting it.

There is an alternate reading of the Slashdot and eBay stories, one that I believe better describes those successes, and better positions Community Patent to take advantage of similar processes. That reading concentrates not on outcome but on process; the history of Slashdot’s reputation system should teach us not “End as they began — build your reputation system in advance” but rather “Begin as they began — ship with a simple set of features, watch and learn, and implement reputation and ranking only after you understand the problems you are taking on.” In this telling, constituting users’ relations as a set of bargains developed incrementally and post hoc is more predictive of eventual success than simply adopting the residue of previous successes.

As David Weinberger noted in his talk The Unspoken of Groups, clarity is violence in social settings. You don’t get 1789 without living through 1788; successful constitutions, which necessarily create clarity, are typically ratified only after a group has come to a degree of informal cohesion, and is thus able to absorb some of the violence of clarity, in order to get its benefits. The desire to participate in a system that constrains freedom of action in support of group goals typically requires that the participants have at least seen, and possibly lived through, the difficulties of unfettered systems, while at the same time building up their sense of membership or shared goals in the group as a whole. Otherwise, adoption of a system whose goal is precisely to constrain its participants can seem too onerous to be worthwhile. (Again, contrast the US Constitution with the Articles of Confederation.)

Most current reputation systems have been fit to their situation only after that situation has moved from theoretical to actual; both eBay and Slashdot moved from a high degree of uncertainty to largely stable systems after a period of early experimentation. Perhaps surprisingly, this has not committed them to continual redesign. In those cases, systems designed after launch, but early in the process of user adoption, have survived to this day with only relatively minor subsequent adjustments.

Digg is the important counter-example, the most successful service to date to design a reputation system in advance. Digg differs from the community patent review process in that the designers of Digg had an enormous amount of prior art directly in their domain (Slashdot, Kuro5hin, Metafilter, et al.), and still ended up with serious re-design issues. More speculatively, Digg seems to have suffered more from both system gaming and public concern over its methods, possibly because the lack of organic growth of those methods prevented them from becoming legitimized over time in the eyes of its users. Instead, users were asked to take it or leave it (never a choice users have been known to relish).

Though more reputation design work may become Digg-like over time, in that designers can launch with systems more complete than eBay’s or Slashdot’s, the ability to survey significantly similar prior art, and the ability to adopt a fairly high-handed attitude towards users who dislike the service, are not luxuries the community patent review process currently enjoys.

The Argument in Two Pictures

The argument I’m advancing can be illustrated with two imaginary graphs. The first concerns plasticity, the ease with which any piece of software can be modified.

Plasticity generally decays with time. It is highest in the early parts of the design phase, when a project is in its most formative stages. It is easier to change a list of potential features than a set of partially implemented features, and it is easier to change partially implemented features than fully implemented ones. Especially significant is the drop in plasticity at launch; even for web-based services, which exist only in a single instantiation and can be updated frequently and for all users at once, the addition of users creates both inertia, in the direction of not breaking their mental model of the service, and caution in upgrading, so as not to introduce bugs or create downtime in a working service. As the userbase grows, the expectations of the early adopters harden still further, while the expectations of new users follow the norms set up by those adopters; this is particularly true of any service with a social component.

An obvious concern with reputation systems is that, as with any feature, they are easier to implement when plasticity is high. Other things being equal, one would prefer to design the system as early as possible, and certainly before launch. In the current case, however, other things are not equal. In particular, the specificity of information the designers have about the service and how it behaves in the hands of real users moves counter to plasticity over time.

When you are working to understand the ideal design for a particular piece of software, the specificity of your knowledge increases with time. During the design phase, the increasing concreteness of the work provides concomitant gains in specificity, but nothing like the gains that come at launch. No software, however perfect, survives first contact with the users unscathed, and given the unparalleled opportunities web-based services offer to observe user behavior — individually and in bulk, in the moment and over time — the period after launch increases specificity enormously, after which it continues to rise, albeit at a less torrid pace.

There is a tension between knowing and doing; in the absence of the ideal scenario where you know just what needs to be done while enjoying complete freedom to do it (and a pony), the essential tradeoff is in understanding which features benefit most from increased specificity of knowledge. Two characteristics tend to push the ideal implementation window to post-launch: the set of possible features is very large while the set ultimately required is small, and culling the small number of required features from the set of all possible features can only be done by observing actual users. I believe that both conditions apply a fortiori to reputation and ranking.

Costs of Acting In Advance of Knowing

Consider the costs of designing a reputation system in advance. In addition to the well-known problems of feature-creep (“Let’s make it possible to rank reputation rankings!”) and Theory of Everything technologies (“Let’s make it Semantic Web-compliant!”), reputation systems create an astonishing perimeter defense problem. The number of possible threats you can imagine in advance is typically much larger than the number that manifest themselves in functioning communities. Even worse, however large the list of imagined threats, it will not be complete. Social systems are degenerate, which is to say that there are multiple alternate paths to similar goals — someone who wants to act out and is thwarted along one path can readily find others.

As you will not know which of these ills you will face, the perimeter you end up defending will be very large and, critically, hard to maintain. The likeliest outcome of such an a priori design effort is inertness; a system designed in advance to prevent all negative behavior will typically have the side effect of deflecting almost all behavior, period, as users simply turn away from adoption.

Working social systems are both complex and homeostatic; as a result, any given strategy for mediating social relations can only be analyzed in the context of the other strategies in use, including strategies adopted by the users themselves. Since the user strategies cannot, by definition, be perfectly predicted in advance, and since the only ungameable social system is the one that doesn’t ship, every social system will have some weakness. A system designed in advance is likely to be overdefended while still having serious weaknesses unknown to the designer, because the discovery and exploitation of that class of weakness can only occur in working, which is to say user-populated, systems. (As with many observations about the design of social systems, these are precedents first illustrated in Lessons from Lucasfilm’s Habitat, in the sections “Don’t Trust Anybody” and “Detailed Central Planning Is Impossible; Don’t Even Try”.)

The worst outcome of such a system would be collapse (the Communitree scenario), but even the best outcome would still require post hoc design to fix the system with regard to observed user behavior. You could save effort while improving the possibility of success by letting yourself not know what you don’t know, and then learning as you go.

In Favor of Instrumentation Plus Attention

The N-squared problem is only a problem when N is large; in most social systems the users are the most important N, and the userbase grows large only gradually, even for successful systems. (Indeed, this gradual scaling typically gives a core group, once it has self-identified, the ability to inculcate new users a bit at a time, using moral suasion as its principal tool.) As a result, in the early days of a system, the designers occupy a valuable point of transition, after user behavior is observable, but before scale and culture defeat significant intervention.
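To put rough numbers on that claim, here is a back-of-the-envelope sketch (in Python, purely illustrative) of how the count of potential pairwise relations grows:

```python
# Potential pairwise relations among N users grow as N * (N - 1) / 2,
# which is why hand-policing works early on and fails at scale.
for n in (10, 100, 10_000, 1_000_000):
    pairs = n * (n - 1) // 2
    print(f"{n:>9,} users -> {pairs:>15,} potential pairs")
# 10 users yield 45 pairs; a million users yield ~500 billion.
```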

To take advantage of this designable moment, I believe that what Community Patent needs, at launch, is only this: metadata, instrumentation, and attention.

Metadata: There are, I believe, three primitive types of metadata required for Community Patent — people, patents, and interjections. Each of these will need some namespace to exist in — identity for the people, and named data for the patents themselves and for various forms of interjection, from simple annotation to complex conversation. In addition, two abstract types are needed — links and labels. A link is any unique pair of primitives — this user made that comment, this comment is attached to that conversation, this conversation is about those patents. All links should be readily observable and extractable from the system, even if they are not exposed in the interface the user sees. Finally, following Schachter’s intuition from del.icio.us, all links should be labelable. (Another way to view the same problem is to see labels as another type of interjection, attached to links.) I believe that this will be enough, at launch, to maximize the specificity of observation while minimizing the loss of plasticity.
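To make that inventory concrete, here is a minimal sketch of the three primitives and two abstract types, assuming Python dataclasses; every name and field below is an illustrative guess at the shape, not a proposed schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of the three primitive types plus the two
# abstract types. Names and fields are illustrative assumptions.

@dataclass(frozen=True)
class Person:
    person_id: str          # identity namespace

@dataclass(frozen=True)
class Patent:
    patent_id: str          # named data for the patent itself

@dataclass(frozen=True)
class Interjection:
    interjection_id: str
    kind: str               # "annotation", "comment", "conversation", ...

@dataclass(frozen=True)
class Link:
    source_id: str          # any unique pair of primitives:
    target_id: str          # user -> comment, comment -> conversation, ...

@dataclass
class Label:
    link: Link              # following the del.icio.us intuition,
    text: str               # every link can carry free-form labels

# The store keeps every link and label observable and extractable,
# even when the user-facing interface never exposes them.
links: list[Link] = []
labels: list[Label] = []
```

The important property is the last one: links and labels live in a store that can be queried wholesale, whether or not the interface ever surfaces them.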

Instrumentation: As we know from collaborative filtering algorithms from Ringo to PageRank, it is not necessary to ask users to rank things in order to derive their rankings. The second necessary element will be the automated delivery to the system designers of as many reports as can productively be imagined, and, at least as essential, a good system for quickly running ad hoc queries and automating their production should they prove fruitful. This will help identify both the kinds of productive interactions on the site that need to be defended and the kinds of unproductive interactions they need to be defended from.
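As a sketch of what such an ad hoc query might look like (the link type and tuple layout here are assumptions, not part of any spec), note that simply counting links already yields an implicit ranking, in the Ringo-to-PageRank spirit of deriving rankings from behavior rather than from votes:

```python
from collections import Counter

# Hypothetical ad hoc query over the extracted link log: derive an
# implicit ranking of patents by examiner activity, without asking
# anyone to vote.

def rank_patents_by_activity(links):
    """links: iterable of (link_type, source_id, target_id) tuples."""
    activity = Counter(
        target for link_type, _, target in links
        if link_type == "comment_on_patent"
    )
    return activity.most_common()  # patents with the most action first

# Example: three comments on patent P1, one on P2.
log = [
    ("comment_on_patent", "user_a", "P1"),
    ("comment_on_patent", "user_b", "P1"),
    ("comment_on_patent", "user_a", "P2"),
    ("comment_on_patent", "user_c", "P1"),
]
print(rank_patents_by_activity(log))  # [('P1', 3), ('P2', 1)]
```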

Designer Attention: This is the key — it will be far better to invest in smart people watching the social aspects of the system at launch than in smart algorithms guiding those aspects. If we imagine the moment when the system has grown to an average of 10 unique examiners per patent and 10 comments per examiner, then a system with even a thousand patents will be relatively observable without complex ranking or reputation systems, as both the users and the comments will almost certainly exhibit power-law distributions. In a system with as few as ten thousand users and a hundred thousand comments, it will still be fairly apparent where the action is, allowing you the time between Patent #1 and Patent #1000 to work out what sorts of reputation and ranking systems need to be put in place.
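The observability claim can be sanity-checked with a small simulation; assuming a Zipf-like (power-law) distribution of comments over those ten thousand users, a handful of accounts dominate, which is what makes eyeballing the system feasible:

```python
import random
from collections import Counter

# Simulate the assumed scale: ~10,000 users, ~100,000 comments, with
# Zipf-like weights (weight ~ 1/rank) governing who comments how often.
random.seed(1)
n_users, n_comments = 10_000, 100_000
weights = [1.0 / rank for rank in range(1, n_users + 1)]
authors = random.choices(range(n_users), weights=weights, k=n_comments)

counts = sorted(Counter(authors).values(), reverse=True)
share = sum(counts[:100]) / n_comments
print(f"top 100 users wrote {share:.0%} of all comments")
# With 1/rank weights this lands near 50%, so a human watching the
# system can see where the action is without any ranking machinery.
```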

This is a simplification, of course, as each of the categories listed above presents its own challenges — how should people record their identity? What’s the right balance between closed and open lists of labels? And so on. I do not mean to minimize those challenges. I do however mean to say that the central design challenge of user governance — self-correcting systems that do not impose crushing participation burdens on the users or crushing policing burdens on the hosts — is so hard to design in advance that, provided you have the system primitives right, the Boyd strategy of OODA — Observe, Orient, Decide, Act — will be superior to any amount of advance design work.

