Friday, April 20, 2007

Does vandalism make Wikipedia less of an encyclopedia?

This is, of course, old news; it's from March 12th, and the fact that I'm just getting around to blogging about it is a reflection of how busy I've been. (March 12th was three days after we closed on the new house, and so I was understandably busy with more important things than amusing y'all about the insanity that is Wikipedia.)

By now everyone -- including Sabre Publishing Ltd., which, amusingly enough, does not have a Wikipedia article -- knows that anything you find on Wikipedia might contain random nonsense that, if used without inspection, could prove quite embarrassing. And certainly this might have had something to do with this; the raw possibility that a Wikipedia page might contain anything will certainly put many people off.

This is both Wikipedia's problem, and not Wikipedia's problem.

It is Wikipedia's problem insofar as people are putting garbage into articles; the Wikipedia community needs to find better ways to deal with this than it has to date. The much-talked-about but yet-to-be-seen "stable versions" would likely help a great deal, but I've been saying that since last summer, and Wikipedia still doesn't have them; as far as I know, there isn't even a timetable for when it will. Apparently the problem has something to do with page moves; I looked at this briefly after talking to someone on IRC and recognized that, yes, that would be difficult with the current database schema. Then again, I already knew that MediaWiki's database schema had issues. Even setting aside stable versions as a solution, there are other possibilities: coordinated vandalism patrolling tools, for example. A coordinated vandalism management tool would ensure that each edit was examined by only one or two patrollers. This can be done by putting each new edit into a queue and having patrollers pop edits off the queue and review them at a pace driven not by the march of the recent changes (RC) stream but by their own availability. The queue might back up if not enough patrollers are on duty at any time, but the software can probably deal with this as well by merging multiple edits to the same article (and dropping them when a subsequent edit is from a trustworthy user). This system could also implement quality control and metric collection, both areas that are currently not addressed in the vandalism management arena.
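The queue idea is easier to judge with something concrete in hand, so here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Edit`, `ReviewItem`, `PatrolQueue`, `submit`, and `next_for_review` are hypothetical, none of this is MediaWiki code, and "trustworthy" is reduced to membership in a fixed set of usernames. It shows only the three behaviors described above: edits wait in a queue until a patroller asks for one, pending edits to the same article are merged into a single review item, and a pending item is dropped when a trusted user subsequently edits that article. Two counters stand in for the metric collection.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Edit:
    """One edit as the patrol tool sees it (hypothetical fields)."""
    article: str
    editor: str
    revision_id: int


@dataclass
class ReviewItem:
    """All pending, unreviewed edits to one article, merged together."""
    article: str
    edits: List[Edit] = field(default_factory=list)


class PatrolQueue:
    """Hands each pending article to exactly one patroller, oldest first."""

    def __init__(self, trusted_editors: Set[str]):
        # Assumption: trust is a simple allow-list of usernames.
        self.trusted = set(trusted_editors)
        # Keyed by article title; insertion order is the review order.
        self._pending: "OrderedDict[str, ReviewItem]" = OrderedDict()
        # Crude metrics: how many edits were reviewed or dropped.
        self.reviewed_count = 0
        self.dropped_count = 0

    def submit(self, edit: Edit) -> None:
        """Record a new edit coming off the recent changes stream."""
        if edit.editor in self.trusted:
            # A trusted user has since edited the article, so the
            # pending edits to it are dropped from the queue.
            dropped = self._pending.pop(edit.article, None)
            if dropped is not None:
                self.dropped_count += len(dropped.edits)
            return
        item = self._pending.get(edit.article)
        if item is None:
            item = ReviewItem(article=edit.article)
            self._pending[edit.article] = item
        # Merge with any edits already waiting for this article.
        item.edits.append(edit)

    def next_for_review(self) -> Optional[ReviewItem]:
        """Pop the oldest pending article, or None if nothing is waiting."""
        if not self._pending:
            return None
        _, item = self._pending.popitem(last=False)
        self.reviewed_count += len(item.edits)
        return item


# Usage: two edits to one article are merged into a single review item.
queue = PatrolQueue(trusted_editors={"TrustedUser"})
queue.submit(Edit("Article A", "anon1", 101))
queue.submit(Edit("Article A", "anon2", 102))
item = queue.next_for_review()
print(item.article, len(item.edits))
```

Because an item leaves the queue as soon as one patroller takes it, nobody else sees the same edits, which is the point of the exercise. A real tool would also need timeouts for patrollers who take an item and wander off, and a better notion of trust than a fixed list.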

It is not Wikipedia's problem insofar as anyone reading Wikipedia, or in fact any reference source, is expected to use their common sense in evaluating what they find, and not merely accept it unquestioned. So, for example, even if Wikipedia says that Sioux Lookout is "full of drunks and a dirty little town", or that chickens can "fly into magic dragon helecopters", even a relatively undiscriminating reader would likely take a moment to consider whether these remarkable facts are in fact true. However, it's somewhat more difficult with respect to the Jesuit problem cited by Noam Cohen's NYT article (mentioned above). The average person is not likely to have any reason to believe that the statement "the rebels themselves were backed by the foreign power of the Jesuits and the Roman Catholic Church" (as found in this edition of the Wikipedia article in question) is wrong. One would have to be independently knowledgeable of the Shimabara Rebellion (which I do not recall even hearing of prior to today despite taking several Asian history classes in high school and college) and of the Jesuits' role in it. Errors like this can be due to honest mistakes of fact by editors, to deliberate bias by editors, or to so-called "sneaky vandalism", where an editor deliberately introduces false but plausible information to degrade the quality of Wikipedia. I can't say which of the three is responsible for this particular error; the point, however, is that "caveat lector" clearly applies here. One ought not to rely on anything in Wikipedia for anything more than casual purposes without verifying it independently, lest one find oneself accidentally claiming that artificial poultry incubation was invented by George W. Bush.

Clearly the Sioux Lookout incident was primarily Sabre Publications' fault: they were abjectly negligent to send copy to press without looking it over even superficially. At the same time, the Wikipedia community was negligent for letting the article say such unpleasantly nasty things about Sioux Lookout for nearly two days back in January (the vandalism was in place from January 27 at 2310Z until January 29 at 1515Z). A more comprehensive, more responsible activity monitoring system would have discovered this edit more promptly. (Tellingly, the vandalism was reverted by an anonymous editor, not by one of Wikipedia's vaunted vandal patrollers. One wonders how often this happens.)

The Jesuits in Japan incident, though, is more the result of a structural defect in Wikipedia, combined with a perceptual problem with Wikipedia unfairly calling itself an "encyclopedia". People expect an encyclopedia to be accurate. Wikipedia is not systematically fact-checked, which means that its accuracy falls well short of what most people probably expect, including, apparently, Professor Waters' students. Wikipedia could do a lot by implementing systematic fact-checking methods, but in my experience attempts to suggest this to the Wikipedia community are generally met with "That's too much bother, people won't do it". And that attitude, quite frankly, is why I really question the commitment of the Wikipedia community to writing an encyclopedia.