
# On average, averages are the exception, not the rule

April 3, 2008

A nice article on the pitfalls of statistics, published today on Knowledge@Wharton (The Use -- and Misuse -- of Statistics: How and Why Numbers Are So Easily Manipulated - http://knowledge.wharton.upenn.edu/article.cfm?articleid=1928), offers an interesting discussion of how tricky statistics actually is. Nice, but it doesn’t go far enough.

How do we usually go about studying a field or phenomenon that shows a lot of apparent or real heterogeneity? Well, we are trained to look for simple explanations, to infer the existence of laws from patterns and regularities (when in doubt, apply Occam’s razor), and to expect our units of analysis (whether they are cities, people, firms, ecologies or economic transactions; let’s call them agents) to conform to those laws with some individual variation. Assuming the existence of a representative agent, we also expect to be able to rank our agents by how distant they are from it.

Now imagine what the world looked like before this type of reasoning was introduced: unbounded variability, endless forms, capricious behaviours, permanent amazement at the diversity of natural and social phenomena. It is no surprise that Plato introduced the allegory of the cave to try to establish some order in the messiness of reality (for Plato, all earthly forms were flawed reflections of the ideal type, which didn’t exist on earth).

Then came Quetelet, De Moivre, Gauss, Pearson and others, and it must have been intellectual nirvana. Using simple concepts such as averages and variances, they could explain the amazing diversity of reality. It worked everywhere, from atoms to voters, from societies to natural systems. The promise of statistics must have seemed unbounded. Quetelet thought, in a typically Platonic or pre-communist fashion, that the mean was the embodiment of the ideal form. Variance was evil, and extreme variances indicated pathological behaviours. Order lay in homogeneity, and the mean represented the signature of the ‘right’ value. Perfection rested with the average person, and consequently the role of politics was to create the average society. Many sciences followed. Substitute equilibrium for the mean, throw in the invisible hand, and you get today’s market fundamentalism. In today’s FT, George Soros writes:
“for the past 25 years or so the financial authorities and institutions they regulate have been guided by market fundamentalism: the belief that markets tend toward equilibrium and that deviations from it occur in random manner. All the innovations – risk management, trading techniques, the alphabet soup of derivatives and synthetic financial instruments were based on that belief. The innovations remained unregulated because authorities believe markets are self-correcting”

As often happens in the history of ideas, we forget the assumptions on which theories are built. For Gaussian statistics (and linear science at large), there are basically two: independence and randomness. Now, how many instances do you know of in the social sciences in which phenomena, or data points, are truly independent of each other and random? Yet take any sample of articles in the social sciences, especially in economics and management, and you will see that Gaussian statistics rules uncontested. Even worse, alternative methods and the underlying Weltanschauung (world view) are actively resisted.
As Mandelbrot (the inventor of fractal geometry) puts it:

“The most diverse attempts continue to be made, to discredit in advance all evidence based on the use of doubly logarithmic graphs. But I think this method would have remained uncontroversial, were it not for the nature of the conclusion to which it leads. Unfortunately, a straight doubly logarithmic graph indicates a distribution that flies in the face of the Gaussian dogma, which long ruled uncontested. The failure of applied statisticians and social scientists to heed Zipf helps account for the striking backwardness of their fields.” (Mandelbrot, 1983, p. 404)
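
To make the “straight doubly logarithmic graph” concrete, here is a minimal Python sketch. It is an illustration, not Mandelbrot’s own procedure. It assumes a Pareto distribution with tail P(X > x) = (x_min / x)^alpha, draws samples by inverse-transform sampling, and fits a least-squares line to log P(X >= x) against log x separately on the lower and upper halves of the sorted sample. For a power law both halves should give roughly the same slope, close to -alpha (a straight line). For a thin-tailed comparison such as 1 + Exp(1), the upper half comes out clearly steeper (a curve). The function names, alpha = 1.5, the seed and the sample size are all hypothetical choices made for the example.

```python
import math
import random


def pareto_samples(alpha, x_min, n, seed=0):
    """Inverse-transform sampling: if U ~ Uniform(0, 1], then
    x_min * U ** (-1 / alpha) has tail P(X > x) = (x_min / x) ** alpha."""
    rng = random.Random(seed)
    # 1.0 - rng.random() lies in (0, 1], so we never raise 0 to a negative power
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def shifted_exponential_samples(n, seed=0):
    """Thin-tailed comparison: 1 + Exp(1), kept >= 1 so log(x) is defined."""
    rng = random.Random(seed)
    return [1.0 + rng.expovariate(1.0) for _ in range(n)]


def loglog_slope(samples, lo=0.0, hi=1.0):
    """Least-squares slope of log P(X >= x) against log x, using only the
    points whose rank fraction i/n falls in [lo, hi) of the sorted sample."""
    xs = sorted(samples)
    n = len(xs)
    # (n - i) / n is the empirical fraction of points >= xs[i]; it is never 0
    pts = [(math.log(x), math.log((n - i) / n))
           for i, x in enumerate(xs) if lo <= i / n < hi]
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return sxy / sxx


if __name__ == "__main__":
    n = 20_000
    for name, data in [("Pareto(alpha=1.5)", pareto_samples(1.5, 1.0, n)),
                       ("1 + Exp(1)", shifted_exponential_samples(n))]:
        # A straight log-log line means the slope is the same in both halves
        lower = loglog_slope(data, 0.0, 0.5)
        upper = loglog_slope(data, 0.5, 1.0)
        print(f"{name:18s} lower-half slope {lower:6.2f}  upper-half slope {upper:6.2f}")
```

Fitting a line by eye or by least squares on a log-log plot is only a rough diagnostic. It shows what a straight doubly logarithmic graph means, but it is not a rigorous test that the data follow a power law.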

With a colleague from UCLA, Bill McKelvey, I have been studying the misuse of Gaussian statistics and exploring the potential of what is known as Paretian science (after Pareto, the Italian economist and sociologist). We find that almost anywhere you look, you find distributions that carry the unmistakable signature of Pareto distributions (also known as Zipf or power-law distributions: long-tailed distributions whose mean and variance, depending on the exponent, are unstable or don’t exist), which scream for a non-Gaussian interpretation.
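
A quick way to see what unstable or nonexistent moments mean in practice is to compute the same statistics on several independent batches of data. The Python sketch below is an illustration under stated assumptions, not a method from our work. It draws Pareto samples with x_min = 1 by inverse transform, using alpha = 1.5 (finite mean, infinite variance) and alpha = 0.9 (infinite mean), and compares them with a standard Gaussian. The function names, batch count, batch size and seed are hypothetical choices. For the Gaussian, the batch means and variances agree closely. For the Pareto cases, the batch variances (alpha = 1.5) or even the batch means (alpha = 0.9) differ wildly from batch to batch, because a handful of extreme draws dominates each sum.

```python
import random


def pareto(alpha, n, rng):
    """Pareto samples with x_min = 1 via inverse transform: U ** (-1 / alpha)."""
    # 1.0 - rng.random() lies in (0, 1], so the power is always defined
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def gaussian(n, rng):
    """Standard normal samples, for comparison."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def batch_stats(draw, batches=10, size=10_000, seed=1):
    """Return one (mean, variance) pair per independent batch. If the
    population mean and variance are finite, the pairs should agree closely."""
    rng = random.Random(seed)
    return [mean_and_var(draw(size, rng)) for _ in range(batches)]


def spread(values):
    """Ratio of the largest to the smallest value (all values assumed > 0)."""
    return max(values) / min(values)


if __name__ == "__main__":
    cases = {
        "Gaussian": gaussian,
        "Pareto alpha=1.5 (infinite variance)": lambda n, rng: pareto(1.5, n, rng),
        "Pareto alpha=0.9 (infinite mean)": lambda n, rng: pareto(0.9, n, rng),
    }
    for name, draw in cases.items():
        stats = batch_stats(draw)
        means = [m for m, _ in stats]
        variances = [v for _, v in stats]
        print(f"{name}: batch means {min(means):.3g}..{max(means):.3g}, "
              f"batch variances {min(variances):.3g}..{max(variances):.3g}")
```

Splitting the data into batches, rather than tracking a single running average, makes the instability visible at a glance: if a statistic means anything for the population, independent batches of the same size should roughly agree on it.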

More on this in my next post.
