Tuesday, February 06, 2007

Deletion of vandalism revisions, and authorship in general on Wikipedia

I've long argued that Wikipedia should delete vandalism revisions entirely rather than leaving them in the history. Leaving them there gives the vandal at least some form of recognition, and they certainly serve no legitimate purpose except to provide statistics on how much Wikipedia is vandalized, which could be gathered some other way.

The presence of vandalism revisions in an article's history also makes identifying the article's authors far more difficult. Neither the vandal nor the vandal patroller who reverts the vandalism is an "author" of the article, and neither should be listed as an author in the same sense as someone who actually contributes real content. This ties into a separate and more complex issue: identifying the authors of a Wikipedia article.

Not everyone who edits an article is an author of it. Editors who vandalized the article or who reverted that vandalism are obviously not authors: the first because their contributions, such as they were, have been removed, and the second because their actions added no creative content (or indeed any content at all). In general, maintenance actions of any sort do not create an authorship interest. An author is someone who either creates substantial new content or substantially transforms existing content, and only editors whose edits do one of those things are properly authors of the article. Clearly, not every editor will qualify, and an article might also have authors who are not editors, if it derives from content from outside the wiki or from another article that was later merged into it.

The problem, then, becomes one of distinguishing authors from editors. There's no way the MediaWiki software can make this determination; it can't tell someone who makes a dozen spelling corrections, or who replaces a lot of HTML markup with MediaWiki markup, from someone who completely rewrites the article. So this has to be a human-mediated process. What I recommend is a separate "authors" tab associated with each article. Any editor who feels that they are an author of the article may list themselves as one. A bot would go through edits to author pages and check that every person listed as an author is actually an editor of the article; any instance where someone who is not an editor is listed as an author would be flagged for review. A false claim of authorship would be a serious community offense.
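The bot's check described above is simple enough to sketch. This is a minimal illustration, not anything MediaWiki actually provides: it assumes the bot already has the list of names claimed on an article's hypothetical "authors" tab and the list of usernames from the article's edit history, and the function name `flag_unverified_authors` is made up for the example. Claimants who never edited the article are flagged for human review rather than rejected, since (as noted earlier) someone whose off-wiki or merged content was incorporated can be a legitimate author without being an editor.

```python
from typing import Iterable, List


def flag_unverified_authors(claimed_authors: Iterable[str],
                            history_editors: Iterable[str]) -> List[str]:
    """Return claimed authors who never edited the article, for human review.

    Hypothetical helper: both inputs are assumed to be plain username lists
    already fetched by the bot. Order of first appearance is preserved and
    duplicate claims are reported only once.
    """
    editors = set(history_editors)
    flagged: List[str] = []
    seen = set()
    for name in claimed_authors:
        # Anyone listed as an author but absent from the edit history is
        # suspicious, but may still be legitimate (e.g. merged content).
        if name not in editors and name not in seen:
            flagged.append(name)
            seen.add(name)
    return flagged
```

For example, if the authors tab lists Alice, Bob, and Mallory but only Alice, Bob, and Carol appear in the history, only Mallory is referred to a human.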

This reflects the approach I'd take to maintenance operations on Wikipedia generally, many of which are currently performed by admins. Most of these activities could be substantially automated, but there is great reluctance to do so because of the long tail problem: there will always be rare, odd cases that no set of rules anticipates. The bot would handle the clear-cut cases automatically and refer the unclear ones to its human minders for a decision. Places where this could be implemented include most speedy deletion situations, requested moves, protection and unprotection, and even blocking and unblocking.

The thing is, Wikipedia's culture won't allow most of this to happen. Deleting vandalism revisions won't happen because (a) someone might make a mistake and label a non-vandalism revision as vandalism, and (b) it would reduce the edit counts of vandalism patrollers so much that they could never "level up". The former problem can be dealt with on a "good-enough" basis: say, if a revision is tagged as vandalism and not untagged within a certain time, a bot deletes both it and the edit that reverted it, which leaves time to untag. The latter problem is harder, though, because it stems from the mind-numbingly bizarre expectations of the Wikipedia community. There is also a very strong prejudice against bots that can perform administrative actions ("fear of SkyNet syndrome"), although I have yet to figure that one out, either. I think it's probably related to the way Wikipedia satisfies many people's desire to Be In Charge, or at least to Wield Power. If being an admin merely meant that the bots gave your recommendations more weight, what would be the fun in that? That problem won't fix itself until Wikipedia's community wakes up and remembers that it's trying to write an encyclopedia.
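The "good-enough" grace-period rule for vandalism revisions could work roughly like this. It's a sketch under stated assumptions, not a real MediaWiki feature: the `Revision` record, its fields, and the 24-hour `GRACE_PERIOD` are all hypothetical choices for illustration. The function only decides which revision IDs are due for deletion (a tagged revision whose tag has survived the grace period, plus the edit that reverted it); actually deleting them is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Hypothetical window during which a mistaken vandalism tag can be removed.
GRACE_PERIOD = timedelta(hours=24)


@dataclass
class Revision:
    rev_id: int
    # When the revision was tagged as vandalism; None if untagged
    # (never tagged, or the tag was removed during the grace period).
    tagged_vandalism_at: Optional[datetime]
    # rev_id of the edit that reverted this revision, if any.
    reverted_by: Optional[int]


def revisions_to_delete(revisions: Iterable[Revision], now: datetime,
                        grace: timedelta = GRACE_PERIOD) -> List[int]:
    """Return IDs of expired vandalism revisions and their reverting edits."""
    due: List[int] = []
    for rev in revisions:
        if rev.tagged_vandalism_at is None:
            continue  # not (or no longer) tagged as vandalism
        if now - rev.tagged_vandalism_at < grace:
            continue  # still inside the window for untagging
        due.append(rev.rev_id)
        if rev.reverted_by is not None:
            # The revert adds no content, so it goes along with the vandalism.
            due.append(rev.reverted_by)
    return due
```

A revision tagged 30 hours ago and reverted by revision 3 would be selected along with revision 3, while one tagged an hour ago would be left alone until its window expires.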