Sunday, March 25, 2007

The demographics of Wikipedia

Reading Geoff Burling's recent post on "Age and Wikipedia" got me thinking about age, and about other demographics, in the Wikipedia communities. However, I think the behavior Geoff is writing about is not so much a symptom of age as of psychological characteristics that Wikipedia selects for, combined with the simple fact that there are a lot of teenagers (boys, mainly) with scads of free time to burn on the Internet. That said, many of the people exhibiting the behavior in question certainly are teenaged boys, as I've commented on before.

A few days later, I saw this post from Language Log about a college freshman who credits Wikipedia for his passion for linguistics. Wikipedia probably does attract significant numbers of people who first discover it while scratching some knowledge itch. I seem to recall that my own first encounter with Wikipedia was due to an interest in mathematics. Of course, far from everyone who uses Wikipedia as a reference source goes on to become even a casual editor, let alone a dedicated one, and I doubt that anyone has anything better than a wild guess as to the conversion rates there.

And this brings me to the major annoyance in discussing the demographics of Wikipedians: there are no meaningful demographics about Wikipedians. About the only subgroup of Wikipedians for which there is even a hope of meaningful demographics is the group that attends Wikimania. The culture of anonymity on Wikipedia is so strong that many admins, and quite probably a majority of editors, do not reveal even basic demographic information about themselves. There are a number of voluntary surveys (e.g. the "list of Wikimedians by age" on Meta), but any statistician knows that self-selected surveys are problematic at best, and the response rate on these surveys is generally too low to be useful for any meaningful purpose. Even estimating the number of distinct editors on Wikimedia projects is hard, because of anonymous editing (which makes it difficult to tell multiple people apart) and sockpuppets (which make a single person appear to be multiple people). The English Wikipedia has millions of registered accounts, but a rather large percentage of them were created for the sole purpose of vandalism. The number of genuine, non-anonymous editors is simply not known, and is likely unknowable as well. Nobody can really agree on the best method for producing reasonable estimates, either; most proposed methods count only editors who make more than some number of edits within some fixed length of time.

So, while it is "common knowledge" that "Wikipedia is run by high schoolers", there really is no objective basis for this statement. At best, it's an intuitive guess extrapolated from very limited information. Certainly there are high school students involved with Wikipedia, and I think the above-linked article about the passionate linguist illustrates why this can be a good thing. Extrapolating from a few instances to the general case, however, is fallacious. I would love to see real, meaningful statistical data on the demographics of Wikipedia readers, contributors, and community members, instead of the current mishmash of wild guesses, extrapolations, and outright hyperbole that sadly passes for fact in such discussions.