Wednesday, September 06, 2006

Aaron Swartz: Why is he getting so much attention?

I'm sure we've all seen Aaron's blog entry about contributors to the Alan Alda article. For those of us who went to Wikimania, this is old news: Seth's study reported on this, and in more detail (as Ross Mayfield has finally noticed on his blog).

The community has long known that edit count is a poor measure of contributions, although the community is also quite addicted to edit counting. Aaron's letter-counting metric also incidentally heavily rewards people who revert pageblanking vandalism (which is actually quite common on high-traffic articles). Aaron's metric is better than edit count, but it isn't that good.
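To make that concern concrete, here is a minimal sketch (my own illustration, not Aaron's code) of the kind of naive per-edit character-delta count the worry assumes; under it, a blank-and-revert sequence hands the reverter as much credit as the original author:

```python
# Naive per-edit "letters added" metric: a sketch of the concern above,
# not Aaron's actual algorithm.

revisions = [
    ("Alice",  "Alan Alda is an American actor, director, and writer."),
    ("Vandal", ""),                                                       # page blanked
    ("Bob",    "Alan Alda is an American actor, director, and writer."),  # revert
]

credit = {}
prev_text = ""
for editor, text in revisions:
    added = max(len(text) - len(prev_text), 0)      # characters added by this edit
    credit[editor] = credit.get(editor, 0) + added
    prev_text = text

print(credit)  # Bob, who only reverted, gets exactly as much credit as Alice
```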

Quite simply, we need a way to classify edits. This isn't that easy; if it were, we'd have the software do it automatically and reject those that fall into the classifications of "edits we don't want". That would be really nice, but it requires natural language processing that doesn't exist and won't for a long time. Greg and I discussed (after Seth's presentation) the idea of a research portal that would present edits to research assistants trained to classify them, and would store the classifications in its database. Quality-checking mechanisms could be added to ensure classification is being done correctly. The portal could then be made available to researchers interested in this sort of thing. The problem is that it takes 5 to 30 seconds to classify an edit, and there are currently 76,732,244 of them on the English Wikipedia alone. Classifying all of them would take over 50 years of full-time labor -- or in dollar terms, about three quarters of a million dollars' worth (and that assumes 5 seconds per edit and no allowance for quality issues). Furthermore, classifying edits is boring: it won't be easy to motivate people to do it, and certainly not in large volume.
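For what it's worth, the arithmetic behind those figures is easy to check. Here is the back-of-the-envelope version; the roughly 2,000 working hours per person-year and the ~$7/hour labor rate are my assumptions, back-fitted to the dollar figure above rather than taken from the post:

```python
# Back-of-the-envelope check of the labor estimate above. Assumptions
# (mine): ~2,000 working hours per person-year, labor valued at ~$7/hour.

edits = 76_732_244        # edits on the English Wikipedia at the time
seconds_per_edit = 5      # the optimistic end of the 5-30 second range

hours = edits * seconds_per_edit / 3600
person_years = hours / 2000
labor_value = hours * 7

print(f"{hours:,.0f} hours ~ {person_years:.0f} person-years ~ ${labor_value:,.0f}")
# -> 106,573 hours ~ 53 person-years ~ $746,008
```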

I appreciate Aaron bringing this issue up -- again -- but I think he needs to work more on talking to the people who are already in the field instead of trying to use his unoriginal discovery as a justification for his own board candidacy -- which is quite clearly the real reason for his blog post. I'd be very curious to know how many more votes he got after his blog story got picked up by the media. I can't imagine it's none....

6 comments:

  1. "Aaron's letter-counting metric also incidentially heavily rewards people who revert pageblanking vandalism (which is actually quite common on high-traffic articles)."

    Not true. Someone who reverts pageblanking vandalism gets no credit under my metric, which counts who first adds a piece of text. I'm not classifying edits at all; I'm looking at the history of each piece of text in the final version. (See the rough sketch after the comments.)

    I'm sorry I somehow missed Seth's presentation at Wikimania, but I hope it's somewhat helpful to get independent evidence on this question.

    But am I right in understanding that you agree that most of Wikipedia is written by casual editors?

  2. "who first adds a piece of text" ... Gee, that sounds a lot like historyflow to me... Seems a bit honest that you failed to mention history flow. If thats all you're doing, why have you not run this test on a significant number of articles?

    Perhaps the results wouldn't support your position?

  3. I assume you mean "seems a bit dishonest". I'm not sure why. I talked to the history flow folks but didn't end up using their algorithm because it didn't work particularly well for this task (as Martin notes, it doesn't handle pageblanking vandalism well). Should I simply have mentioned every other study on the subject? (There's also Denise Anthony's work and Tom Cross's, neither of which I had read before I published mine.) That doesn't seem like usual practice for an essay, though I will of course refer to all of them (and Seth) when I put up more details about my work.

    As for significance, I have run the test on a significant number of articles -- according to statistical sampling techniques, the 500 I ran it on is statistically significant with a reasonable margin of error. Still, I'm eager to run it on more, but the process is actually quite slow and Wikipedia is very large. Thankfully, some people have offered more compute time. I hope to work out a deal with one of them and run the algorithm on all the pages.

  4. What is history flow? Where can I look at it?

  5. As I understood it, Aaron's method measures how much of the final article was written by which user. If an article was blanked by vandalism along the way, then restored, it's still (mostly) the same groups of letters, which were contributed (as I think he demonstrated quite convincingly) by the casual contributors.

  6. I wonder if Bayesian analysis, such as is used in many spam-filtering programs these days, would help with edit classification? If a sufficiently large sample of edits were manually classified by humans, and a program then analyzed which mechanically determinable characteristics were present in each category, it might be able to categorize other edits automatically (with humans spot-checking and moving incorrectly classified edits to fine-tune the algorithm). (See the toy sketch after the comments.)

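For anyone who wants to see what the "credit whoever first added each piece of text" idea from comments 1 and 5 might look like, here is a rough word-level sketch. It's a simplification for illustration only, not Aaron's actual algorithm (which presumably does real text diffing rather than bag-of-words matching):

```python
# Rough sketch of "credit whoever first added each piece of text"
# (comments 1 and 5). Word-level and simplified; not Aaron's actual algorithm.

def first_adder_credit(revisions):
    """revisions: list of (editor, text) in chronological order.
    Returns {editor: number of final-version words that editor first introduced}."""
    first_added_by = {}                       # word -> editor who first introduced it
    for editor, text in revisions:
        for word in set(text.split()):
            first_added_by.setdefault(word, editor)

    credit = {}
    for word in revisions[-1][1].split():     # walk the final version
        editor = first_added_by[word]
        credit[editor] = credit.get(editor, 0) + 1
    return credit

revisions = [
    ("Alice",  "Alan Alda is an American actor and director."),
    ("Vandal", ""),                                                  # page blanking
    ("Bob",    "Alan Alda is an American actor and director."),      # revert
    ("Carol",  "Alan Alda is an American actor, director, and writer."),
]
print(first_adder_credit(revisions))
# Alice keeps credit for the words she originally wrote; Bob's revert earns nothing.
```

Even this toy version shows why a blank-and-revert sequence doesn't inflate anyone's numbers the way the per-edit delta count sketched earlier does.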
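And here is a toy sketch of the Bayes-filter idea from comment 6. The categories, features, and training examples are all invented for illustration; a real classifier would need far richer features and a much larger hand-labeled sample:

```python
# Toy Bayes-style edit classifier in the spirit of comment 6. Everything
# here (labels, feature names, training data) is invented for illustration.
from collections import defaultdict
import math

def features(edit):
    """Mechanically determinable characteristics of an edit (hypothetical fields)."""
    return {
        "blanks_page": edit["new_size"] == 0,
        "large_removal": edit["new_size"] < edit["old_size"] * 0.2,
        "anonymous": edit["anonymous"],
        "has_summary": bool(edit["summary"]),
    }

def train(labeled_edits):
    """labeled_edits: iterable of (edit, label). Returns (label_counts, feature_counts)."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for edit, label in labeled_edits:
        label_counts[label] += 1
        for name, present in features(edit).items():
            if present:
                feature_counts[label][name] += 1
    return label_counts, feature_counts

def classify(edit, label_counts, feature_counts):
    """Score each label by its prior plus Laplace-smoothed log-likelihoods of the
    features present (a simple, spam-filter-style approximation)."""
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)                 # prior
        for name, present in features(edit).items():
            if present:
                score += math.log((feature_counts[label][name] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Two hand-labeled examples stand in for the "sufficiently large sample".
labeled = [
    ({"old_size": 5000, "new_size": 0,    "anonymous": True,  "summary": ""},         "vandalism"),
    ({"old_size": 5000, "new_size": 5400, "anonymous": False, "summary": "copyedit"}, "good edit"),
]
label_counts, feature_counts = train(labeled)
print(classify({"old_size": 4000, "new_size": 0, "anonymous": True, "summary": ""},
               label_counts, feature_counts))   # -> "vandalism"
```

The human spot-checking loop the commenter describes would simply feed corrected labels back into train(), exactly as spam filters retrain on user-flagged messages.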