Biology in the News Explained

The “wisdom of the crowd” effect is a statistical artifact

Proceedings of the National Academy of Sciences (PNAS) is considered a top-tier journal, just slightly below Science and Nature in prestige (in fact, it is known among scientists as “Post Nature And Science” because many of its papers were first submitted to one of those two journals without success). Papers published in PNAS are thus generally considered to be important contributions to science.

Why is this paper in PNAS?
There are two puzzling questions about the publication “How social influence can undermine the wisdom of crowd effect” (Lorenz J, Rauhut H, Schweitzer F, Helbing D., 2011. 108(22):9020-5) published last May in PNAS. The first is why a scientific journal has published a sociological paper in the first place. Does having a few statistics and a discussion of logarithmic distributions make a study science?

The second question is simply this: since when is a basic statistical truth, taught to all first-year ecology students but dressed up in meaningless sociological jargon, considered cutting-edge science? It is truly baffling.

The irrelevance of this paper truly cannot be overstated. Here is the abstract:

Social groups can be remarkably smart and knowledgeable when their averaged judgements are compared with the judgements of individuals. Already Galton [Galton F (1907) Nature 75:7] found evidence that the median estimate of a group can be more accurate than estimates of experts. This wisdom of crowd effect was recently supported by examples from stock markets, political elections, and quiz shows [Surowiecki J (2004) The Wisdom of Crowds]. In contrast, we demonstrate by experimental evidence (N = 144) that even mild social influence can undermine the wisdom of crowd effect in simple estimation tasks. In the experiment, subjects could reconsider their response to factual questions after having received average or full information of the responses of other subjects. We compare subjects’ convergence of estimates and improvements in accuracy over five consecutive estimation periods with a control condition, in which no information about others’ responses was provided. Although groups are initially “wise,” knowledge about estimates of others narrows the diversity of opinions to such an extent that it undermines the wisdom of crowd effect in three different ways. The “social influence effect” diminishes the diversity of the crowd without improvements of its collective error. The “range reduction effect” moves the position of the truth to peripheral regions of the range of estimates so that the crowd becomes less reliable in providing expertise for external observers. The “confidence effect” boosts individuals’ confidence after convergence of their estimates despite lack of improved accuracy. Examples of the revealed mechanism range from misled elites to the recent global financial crisis.

Why is this paper irrelevant?
In this case, the authors are standing on the shoulders of midgets. The entire concept of “wisdom of the crowd” is meaningless in the first place, for two reasons. First, the questions asked have answers for which estimates will naturally fall in a log-normal distribution centered on roughly the right order of magnitude. Second, if those estimates are not independent, they are less accurate.

Estimates of nearly everything are log-normal
As it turns out, the arithmetic means Lorenz et al. calculated are nowhere close to the true values (Table 1, below):

Table 1, Lorenz et al. 2011

They do recognize that the nature of the data is such that estimates will fall in a log-normal distribution, which perhaps is why the study appears to be “scientific”:

… the estimates of our type of questions are not normally distributed but right-skewed. In other words, the majority of estimates are low and a minority of estimates are scattered in a fat right tail, as it is the case for log-normal distributions.

This distribution occurs because respondents know that the real answer is bounded below by zero but unbounded in the other direction. Some people who guess wrong will guess too low, but the range of wrong answers below the correct one is much smaller than the range of wrong answers above it, and a few people will guess a number that is far too high, such as 10,000 km.
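The skew described above is easy to reproduce with simulated numbers. The following is a minimal sketch, not the authors' data or method: the function `simulate_guesses`, the center of 1000, the log-scale spread of 1.0, and the sample size are all assumptions chosen purely for illustration.

```python
import math
import random
import statistics


def simulate_guesses(center, sigma, n, seed=0):
    """Draw n hypothetical guesses whose logarithms are normally
    distributed around log(center) with standard deviation sigma.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(center), sigma) for _ in range(n)]


guesses = simulate_guesses(center=1000.0, sigma=1.0, n=10_000)

mean_guess = statistics.fmean(guesses)
median_guess = statistics.median(guesses)
# Fraction of guesses that fall below the arithmetic mean.
share_below_mean = sum(g < mean_guess for g in guesses) / len(guesses)

print(f"smallest guess:   {min(guesses):.1f}")
print(f"median guess:     {median_guess:.1f}")
print(f"arithmetic mean:  {mean_guess:.1f}")
print(f"largest guess:    {max(guesses):.1f}")
print(f"share below mean: {share_below_mean:.2f}")
```

Under these assumptions, every guess is positive, most guesses sit below the arithmetic mean, and a handful of very large guesses in the right tail pull that mean well above the median.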

The authors switch from the arithmetic mean to the geometric mean to account for this skew. But then, inexplicably, they use a linear measure (percentage) to claim that the geometric mean is actually closer to the truth. But it is no closer to the truth than the arithmetic mean, because the data are the same; they are simply transformed to fall within a smaller-looking range, and it only looks smaller because it is geometric! This is a mind-bogglingly basic mistake, which apparently only demonstrates that sociologists (both the authors and the reviewers who found this acceptable) are all completely clueless about statistics.
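For reference, the two averages being compared can be written out explicitly. The notation here (n positive estimates x_1, ..., x_n, with A for the arithmetic mean and G for the geometric mean) is assumed for illustration and is not taken from Lorenz et al.:

```latex
A = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
G = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
  = \exp\!\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Bigr),
\qquad
G \le A .
```

In other words, the geometric mean is the arithmetic mean of the logged estimates, transformed back to the original scale, and it can never exceed the arithmetic mean of the same data.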

The “wisdom of the crowd” means a rough guess within an order of magnitude
What sociologists touting the “wisdom of the crowd” have actually demonstrated is not an insight into human collectivism, but merely a statistical artifact, for the reason above. What the mean guess approaches is an answer within roughly the right order of magnitude. It turns out that human brains are built to think in terms of orders of magnitude (when dealing with numbers above about 10 or 20), so the test becomes a self-fulfilling prophecy. But the same pattern exists all over nature, which is why scientists are continually exhorted to make sure their data points are actually independent: the most accurate estimate of a true mean is only possible if they are.

“Social influence” is sociological jargon for nonindependence
“Social influence” as defined in this paper is nothing more than the authors making sure that the data points are not independent, a condition that is nearly guaranteed to produce the wrong answer. There is no earth-shattering effect here that has a sociological basis; it is all simply statistical artifact, and could be reproduced any number of ways without even involving humans. For example, the number of leaves on a tree branch is bounded below by zero and above by a potentially very large number. If you wanted to estimate the mean number of leaves per branch, you would likely identify lots of branches of the tree species of interest and count the leaves on each. This would give you an estimate of the true mean leaf number. Importantly, you would not take all your data from a single tree, because your counts would not be independent (branches originating from the same trunk are all influenced by that trunk), and you would likely get an estimate farther away from the true mean.

Thus, trees show the same “wisdom of the crowd” effect, and are biased in the same way when counts are influenced (or “known”) by being related to each other.
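The tree example can be sketched as a small simulation. This is an illustrative model only: the species-wide mean of 100 leaves per branch, the tree-to-tree and branch-to-branch spreads, the normal distributions (a simplification that ignores the zero bound), and the helper names `survey_error` and `rmse` are all assumptions, not measurements.

```python
import random
import statistics


def survey_error(n_branches, n_trees, rng,
                 grand_mean=100.0, tree_sd=20.0, branch_sd=10.0):
    """Simulate one survey that spreads n_branches evenly over n_trees
    and return (estimated mean - true mean). Branches on the same tree
    share that tree's own mean, so they are not independent."""
    per_tree = n_branches // n_trees
    counts = []
    for _ in range(n_trees):
        tree_mean = rng.gauss(grand_mean, tree_sd)  # shared by the whole tree
        counts.extend(rng.gauss(tree_mean, branch_sd) for _ in range(per_tree))
    return statistics.fmean(counts) - grand_mean


def rmse(n_trees, trials=2000, n_branches=50, seed=1):
    """Root-mean-square error of the survey estimate over many trials."""
    rng = random.Random(seed)
    errors = [survey_error(n_branches, n_trees, rng) for _ in range(trials)]
    return (sum(e * e for e in errors) / trials) ** 0.5


rmse_one_tree = rmse(n_trees=1)     # 50 branches, all from a single tree
rmse_many_trees = rmse(n_trees=50)  # 50 branches, one from each of 50 trees

print(f"typical error, one tree:   {rmse_one_tree:.2f}")
print(f"typical error, many trees: {rmse_many_trees:.2f}")
```

With the same number of branches counted in both cases, the single-tree survey typically lands much farther from the true mean under these assumptions, because every count inherits the quirks of that one tree.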

So, the authors have succeeded in demonstrating one of the most basic statistical concepts (at least to anyone who actually understands statistics) and, by dressing up the concept in sociological jargon, have convinced themselves (and, apparently, the editors of PNAS) that they have instead demonstrated an important social phenomenon. Papers like this are a big reason why we as a society struggle with scientific literacy. If authors and editors at a major scientific journal are this oblivious, is it any wonder that the general public is too?
