Monday, March 10, 2008

Wikipedia's quality

So, people are constantly finding new and interesting ways to evaluate Wikipedia's quality. These often rely on random pagewalks, which is a really poor way to choose articles for evaluation. Finally, though, we have a good basis for choosing articles for evaluation: some dude named Henrik has, presumably in cooperation with someone on the Wikimedia developers' team, come up with stats on the most frequently viewed articles. This is by far the best way to choose articles for quality evaluation: the articles that people actually look at will show, better than anything else, how well Wikipedia's readership is being served by Wikipedia content.

Based on a sample window of 23 days in February 2008, there were 9956 distinct page names (not all of which correspond to articles) that were viewed at least once per minute on average over that timeframe. I don't have time to evaluate them all, but I will be looking at some of the top-ranked ones and making some comments in the near future. Just glancing down the list indicates that politics, popular culture, and sex dominate the topics. I admit being perplexed at the prominence of "canine reproduction", however. (Andrew Gray has some good comments on his first impressions of the top 9956 in his LiveJournal; I shall not repeat them here.)
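For a sense of scale, the once-per-minute cutoff translates into a minimum view count per page over the window. A back-of-the-envelope sketch follows; only the 23-day window and the once-per-minute rate come from the stats above, and the function and constant names are purely illustrative.

```python
import math

# Only the 23-day window and the one-view-per-minute rate come from the
# post; the names below are illustrative, not part of Henrik's tool.
WINDOW_DAYS = 23
MINUTES_PER_DAY = 24 * 60


def min_views(window_days: int = WINDOW_DAYS, views_per_minute: float = 1.0) -> int:
    """Smallest whole view count that meets the average rate over the window."""
    return math.ceil(window_days * MINUTES_PER_DAY * views_per_minute)


print(min_views())  # 33120
```

In other words, each of the 9956 page names drew at least about 33,000 views during the sample window.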

I actually expect Wikipedia will acquit itself better here than in many of the other evaluatory metrics people use. These high-traffic articles tend to be watched closely and many of them are semiprotected (and in fact my early observation of this led me to wonder what percentage of pageviews are of protected content, a statistic I would love to see collected, or at least estimated). Their content is likely to be at least decent, if not actually good. It'll be interesting to see to what degree this is the case, and how far down one has to get before one gets to a really bad article.
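A minimal sketch of how the protected-pageview share could be estimated, assuming one already has a per-title view count and a set of protected titles. Both inputs are hypothetical here: the function name, the data shapes, and the numbers in the example are made up for illustration and are not real Wikipedia statistics.

```python
def protected_view_share(views: dict[str, int], protected: set[str]) -> float:
    """Fraction of all pageviews that landed on protected titles.

    `views` maps page title -> view count; `protected` is the set of
    titles that are protected or semiprotected. Both are assumed inputs.
    """
    total = sum(views.values())
    if total == 0:
        return 0.0  # no traffic at all, so no share to report
    protected_views = sum(n for title, n in views.items() if title in protected)
    return protected_views / total


# Made-up placeholder numbers, purely to show the call.
example_views = {"Page A": 600, "Page B": 300, "Page C": 100}
example_protected = {"Page A"}
print(protected_view_share(example_views, example_protected))  # 0.6
```

Weighting by views rather than counting protected pages is the point: a small number of protected pages can account for a large share of what readers actually see.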

Anyway. Look for this to be the focus of at least the next several posts. Hopefully this will be a pleasant change from the less pleasant discussions of the past few days.

11 comments:

  1. As I understand it, you propose to evaluate the best articles on Wikipedia. You can't use this article sample to compare Wikipedia with other sources.

    It's only usable for internal purposes to figure out how participation works.

  2. I have no intention of evaluating the best articles on Wikipedia; instead, I intend to evaluate the most frequently viewed articles on Wikipedia. Big difference.

  3. Interesting. Removing all non-articles from the list and looking at the top 25 remaining... 1 stubbish, 2 barebones, 18 reasonable (fairly broad and well-written with some references), 6 good/featured articles.

  4. @kelly martin

    The most viewed articles are the most reviewed articles - am I wrong? According to some studies, the quality of articles increases with collaboration.

    Ergo: the most viewed articles are probably part of the best articles on Wikipedia.

  5. It's fun to check out the popularity of deleted articles too - we still get a lot of hits for, say, Brian Peppers. :p

  6. Torsten,

    I just posted my first article review, of the most viewed article, "wiki". I must say that if that's one of "the best", Wikipedia has a long way to go. Hopefully the next article, "Valentine's Day", will be better.

  7. Torsten commits so many errors of logic, I don't even know where to begin refuting them. Not to (re-)mention that some of these pages are protected, so how does collaboration continue on those?

    I'll look forward to the university study that concludes "2 Girls 1 Cup" is one of Wikipedia's "best articles".

  8. Kelly Martin: Sorry, I misinterpreted your plans. I don't see a problem as long as you do not take this sample of articles as representative of Wikipedia as a whole.


    Gregory Kohs: protection is the result of very intense collaboration :-)

    Joking aside: when I find an error in a protected article, I write it on the discussion page. Or I talk to an admin. Collaboration does not stop totally.

  9. They are representative of what Wikipedia's consumers are getting when they use Wikipedia. I would think that would matter to someone, but apparently not to you, whoever you are. Admittedly, my experience with the "compulsive author" type of editor is that they really don't care if anyone actually reads their articles, or if anyone else finds them useful.

  10. There was a paper at last year's Wikisym that took similar stats (running off a data dump) and ran some analysis on them; it was an interesting take on Google ranking as an indicator of quality, making something of the same argument as Torsten, above. They actually argued that "topics of high interest or quality" were brought to the forefront. Anyway: http://ws2007.wikisym.org/space/WilkinsonHubermanPaper

    I argued that they missed the disconnect between quality & page views in highly viewed topics, but it was still a thought-provoking paper.

  11. I've wondered about the question of pageviews to protected content too, so I hacked up some stats over here.
