Friday, April 20, 2007

Does vandalism make Wikipedia less of an encyclopedia?

This is, of course, old news; it's from March 12th, and the fact that I'm just getting around to blogging about it is a reflection of how busy I've been. (March 12th was three days after we closed on the new house, and so I was understandably busy with more important things than amusing y'all about the insanity that is Wikipedia.)

By now everyone -- including Sabre Publishing Ltd., which, amusingly enough, does not have a Wikipedia article -- knows that anything you find on Wikipedia might contain random nonsense that, if used without inspection, could prove quite embarrassing. That reputation surely plays a part here; the raw possibility that a Wikipedia page might contain anything will certainly put many people off.

This is both Wikipedia's problem, and not Wikipedia's problem.

It is Wikipedia's problem insofar as people are putting garbage into articles; the Wikipedia community needs to find better ways to deal with this than it has to date. The much-talked-about but yet-to-be-seen "stable versions" would likely help a great deal, but I've been saying that since last summer, Wikipedia still doesn't have them, and (as far as I know) there isn't even a timetable for when it will. Apparently the problem has something to do with page moves; I looked at this briefly after talking to someone on IRC and recognized that, yes, that would be difficult with the current database schema. But then, I already knew that MediaWiki's database schema had issues.

Even setting stable versions aside, there are other possibilities: coordinated vandalism patrolling tools, for example. A coordinated vandalism management tool would ensure that each edit is examined by only one or two patrollers. This can be done by putting each new edit into a queue and having patrollers pop edits off the queue and review them, driven not by the march of the RC stream but by their own availability. The queue might back up if not enough patrollers are on duty at any given time, but the software can probably deal with that as well, by merging multiple edits to the same article (and dropping them entirely when a subsequent edit comes from a trustworthy user). Such a system could also implement quality control and metric collection, two areas that current vandalism management efforts do not address.
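
To make that concrete, here's a rough sketch of how such a queue might work. This is purely my own illustration in Python -- MediaWiki implements nothing of the sort, and every name in it (Edit, EditQueue, the trusted-users set) is invented:

    from collections import OrderedDict
    from dataclasses import dataclass

    @dataclass
    class Edit:
        page: str     # article title
        rev_id: int   # revision identifier
        user: str     # username or IP of the editor

    class EditQueue:
        """A hypothetical coordinated patrol queue: each edit is enqueued
        once, edits to the same page are merged into one review item, and
        pending items are dropped when a trusted user edits the page (on
        the assumption that a trusted editor would have fixed any obvious
        vandalism along the way)."""

        def __init__(self, trusted_users):
            self.trusted = set(trusted_users)
            self.pending = OrderedDict()  # page title -> unreviewed edits

        def push(self, edit):
            if edit.user in self.trusted:
                # A trusted user touched the page; assume the earlier
                # pending edits no longer need separate review.
                self.pending.pop(edit.page, None)
            else:
                self.pending.setdefault(edit.page, []).append(edit)

        def pop(self):
            """Called when a patroller is ready for more work -- driven
            by patroller availability, not by the march of the RC stream."""
            if not self.pending:
                return None
            return self.pending.popitem(last=False)  # oldest page first

Run centrally, with patrol clients calling pop(), this gets you the each-edit-seen-by-only-one-or-two-patrollers property more or less for free; a central server is also the obvious place to hang quality control and metric collection.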

It is not Wikipedia's problem insofar as anyone reading Wikipedia, or in fact any reference source, is expected to use common sense in evaluating what they find, and not merely accept it unquestioned. So, for example, even if Wikipedia says that Sioux Lookout is "full of drunks and a dirty little town", or that chickens can "fly into magic dragon helecopters", even a relatively undiscriminating reader would likely take a moment to consider whether these remarkable facts are in fact true.

However, it's somewhat more difficult with respect to the Jesuit problem cited in Noam Cohen's March 12th New York Times article (mentioned above). The average person has no reason to believe that the statement "the rebels themselves were backed by the foreign power of the Jesuits and the Roman Catholic Church" (as found in this edition of the Wikipedia article in question) is wrong. One would have to be independently knowledgeable of the Shimabara Rebellion (which I do not recall even hearing of prior to today, despite taking several Asian history classes in high school and college) and of the Jesuits' role in it. Errors like this can be due to honest mistakes of fact by editors, to deliberate bias by editors, or to so-called "sneaky vandalism", where an editor deliberately introduces false but plausible information to degrade the quality of Wikipedia. I can't say which of the three is responsible for this particular error; the point, however, is that "caveat lector" clearly applies. One ought not to rely on anything in Wikipedia for anything more than casual purposes without verifying it independently, lest one find oneself accidentally claiming that artificial poultry incubation was invented by George W. Bush.

Clearly the Sioux Lookout incident was primarily Sabre Publishing's fault: they were abjectly negligent to send copy to press without looking it over even superficially. At the same time, the Wikipedia community was negligent for letting the article say such unpleasantly nasty things about Sioux Lookout for nearly two days back in January (the vandalism was in place from January 27 at 2310Z until January 29 at 1515Z). A more comprehensive, more responsible activity monitoring system would have discovered this edit more promptly. (Tellingly, the vandalism was reverted by an anonymous editor, not by one of Wikipedia's vaunted vandal patrollers. One wonders how often this happens.)
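
For the record, that window works out to just over 40 hours; a quick check, taking the year as 2007 from the timeline above:

    from datetime import datetime, timezone

    placed   = datetime(2007, 1, 27, 23, 10, tzinfo=timezone.utc)
    reverted = datetime(2007, 1, 29, 15, 15, tzinfo=timezone.utc)
    print(reverted - placed)   # 1 day, 16:05:00 -- about 40 hours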

The Jesuits-in-Japan incident, though, is more the result of a structural defect in Wikipedia, combined with a perceptual problem: Wikipedia unfairly calls itself an "encyclopedia", and people expect an encyclopedia to be accurate. Wikipedia is not systematically fact-checked, which means that its accuracy falls well short of what most people probably expect -- including, apparently, Professor Waters' students. Wikipedia could do a lot by implementing systematic fact-checking methods, but in my experience attempts to suggest this to the Wikipedia community are generally met with "That's too much bother; people won't do it". And that attitude, quite frankly, is why I really question the Wikipedia community's commitment to writing an encyclopedia.

3 comments:

  1. Erik said stable versions are 4-8 weeks off in a recent email.

  2. Interesting stuff about "vandalism management".

    Short of the actual coordination bit, I and others have been doing pretty much all of that stuff for some time - and you are quite right in saying that it is effective.

    My software uses a queue which prioritizes likely vandalism. It merges multiple edits to the same article, and it drops them after an edit by a trustworthy user. It goes a couple of steps beyond that by 'learning' which users are trustworthy, and by tracking the number of warnings a user receives (and using this to prioritize edits - I've sketched the rough shape of the scoring at the end of this comment). More importantly (in terms of efficiency), it reduces the whole process of reverting an edit, warning the user responsible and bringing up another diff to a single keystroke or mouse click.

    While I haven't done any vandalism patrolling for a month or so, the experience I do have suggests this is an effective way of going about things.

    Other tools, some of which have been around for about two years, do at least some of this.

    What's missing from this, of course, is the 'coordination' - such a system would presumably build up an edit queue in the same way but have it in a centralized place, and distribute requests among patrollers.

    Metric collection is certainly possible; I collected some general statistics from the RC feed back in November, and more specific things would be fairly simple to do - this doesn't require any kind of 'coordination', just an RC reader.

    I'm not sure what you mean by quality control, but if you mean checking reverts to make sure they were appropriate, while that's easy to implement I'm not convinced it would be a good use of time; effort would be more profitably diverted into finding more vandalism or improving the encyclopedia.
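
    For illustration, the prioritization works roughly like this; the features and weights below are invented for the example rather than being the tool's real ones:

        import heapq
        import itertools
        from dataclasses import dataclass

        @dataclass
        class PatrolItem:
            diff_url: str
            user_trusted: bool    # the 'learned' trust mentioned above
            prior_warnings: int   # warnings this user has received
            bytes_removed: int    # an extra signal, made up for the example

        def priority(item):
            """Higher score = review sooner. Weights are illustrative."""
            score = 1.0
            score += 0.5 * item.prior_warnings   # warned users are riskier
            if item.user_trusted:
                score -= 5.0                     # trusted users sink to the bottom
            if item.bytes_removed > 500:         # large removals often mean blanking
                score += 2.0
            return score

        queue = []                    # min-heap of (-score, tiebreak, item)
        _tiebreak = itertools.count()

        def enqueue(item):
            heapq.heappush(queue, (-priority(item), next(_tiebreak), item))

        def next_diff():
            return heapq.heappop(queue)[2] if queue else None

    The single-keystroke revert/warn/next flow is then just interface on top of next_diff().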

  3. qxzzxq, could you get in touch with me? I'd be interested in possibly adapting some of your techniques to a collaborative application.
