Thursday, March 15, 2007

Notability, maintainability, and quality

Sage Ross reports (in his blog, at "Wikipedia and Notability") that the community is unhappy with the current definition of notability. I've touched on notability before, in the limited context of webcomics (see "Webcomics and Wikipedia" and "On Webcomics, again"). As Sage notes, "notability" has always been a contentious issue in Wikipedia, and there is currently a real dispute over what, if anything, "notability" should mean.

In the interest of disclosure, I will reveal that I am an eventualist inclusionist mergist. (I do not consider the latter two mutually contradictory.) My experience, in the somewhat more than two years that I've been involved with Wikipedia, is that the scope of what constitutes "acceptable content" for the encyclopedia has consistently broadened over time, although in some areas (such as webcomics) there has certainly been pushback. High schools are a good example of this trend. When I first started at Wikipedia, in late 2004, very few high schools had articles, and most attempts to create one were met with a rather quick deletion on the grounds of being "not notable". By 2006, it was generally accepted that high school articles were not subject to deletion for being "insufficiently notable", and today nobody (except the most hardcore deletionist) contemplates deleting a high school article for very long. Similar trends have brought individual articles on every Pokémon, articles on individual episodes of various television shows, and all sorts of other content that would likely have been summarily deleted in 2004 to general acceptance as appropriate content in 2007.

I believe this is largely because the people who feel the urge to remove what they consider meritless content are simply outnumbered by the people who create such content. In most cases there has not been any conscious decision by the Wikipedia community (if that entity is in fact capable of making decisions, which I rather doubt) that articles on individual episodes of The Simpsons are appropriate for inclusion. Rather, the articles were created by dedicated Simpsons fans, nobody with an eye for trimming the encyclopedia got to them quickly enough to resist their presence effectively, and so they became part of the accepted corpus by default. I see no reason why this trend should stop, so I expect the margins of notability to keep being pushed further and further back. I don't think the margins will ever be pushed out so far that (e.g.) the serial numbers of the dollar bills in my purse will merit their own articles (although even that is not entirely out of the question, as many of them are already catalogued online), but I think there's still a great deal of room for expansion, and I expect to see Wikipedia expand into that space over the long haul.

The ongoing battle over webcomics seems to be the current exception to this trend, and I don't expect the exception to last. Assuming they don't give up, the webcomics fans will eventually win, because they simply outnumber the notability pruners. At the moment, the pruners are organized against webcomics and are assiduously defending that territory. However, the pruners are more subject to attrition than the webcomics fans, and it is probably inevitable that too many of their faction will leave Wikipedia or be drawn off into some other battle (say, amateur sports leagues, or radio towers, or some other equally borderline area), and the resulting loss of focus will let the webcomics fans win out. In most cases, it's far easier to recruit people in favor of keeping content than to recruit people opposed to it.

So, rather than spending a lot of time refining the definition of notability, I would advise discarding it entirely. Notability is, in practice, a proxy for a large number of largely personal beliefs about what should be in an encyclopedia, beliefs on which there is no consensus within the Wikipedia community. Furthermore, those beliefs shift over time, and I believe the shift will tend toward broader inclusion. The problem with broad inclusionism is that it will inevitably lead to more articles than the Wikipedia community can effectively maintain. (It is difficult to deny that this has already happened.)

The problem with discarding notability is that people will immediately scream "But then we will have articles about what you had for breakfast yesterday!" Well, no, we won't. (Although it might be interesting to have that data; I'm sure there will be people in 2150 who will want to know about the dietary habits of early 21st-century IT professionals. There are probably people in 2007 with that interest, for that matter.) I am not advocating having no standards at all; that would be irrational. Instead, the standards must treat maintainability as the main consideration. A record of my breakfast yesterday (for the record, two glazed Dunkin' Donuts and a bottle of Aquafina) is unverifiable, and thus unmaintainable, and thus unfit for inclusion in Wikipedia. Verifiability isn't sufficient for maintainability, but it is certainly a minimum requirement.

This seems to be the general direction of the discussion Sage refers to, although its participants are framing it in terms of attributability rather than maintainability. I don't think attributability is enough. One of Wikipedia's largest problems right now is that it's larger than its community can effectively tend. Wikipedia needs to limit its growth aggressively, at least in the short term, to give its community time to organize itself to handle the content it has now, to say nothing of the content it will acquire in the future. The problem with adopting attributability (or verifiability) as a minimum criterion for inclusion is that someone is going to have to check the cited sources for accuracy. Nobody is doing that now, except on a haphazard basis. Wikipedia currently has no process for any sort of organized maintenance of the encyclopedia; even vandalism management is done haphazardly.

Quite frankly, I think it would be appropriate for Wikipedia to disable new page creation (except for admins, to deal with special cases) for an entire month and spend that month developing the infrastructure to better maintain both the articles it currently has and the new articles it will gain once new page creation is re-enabled. New page review needs to be systematic, not haphazard, and there need to be systems to ensure that every new page is looked at promptly after creation by at least one, and preferably several, experienced editors. Those editors should both categorize it properly (the stub sorters already sorta do this, but in a far less useful way than they could) and evaluate what action the community needs to take with respect to it. And then the community needs to actually take that action.

There are currently 21,598 articles tagged as needing cleanup and 55,928 tagged as lacking sources, and I suspect those tags cover only about 20% of the articles that actually belong in the respective categories. These numbers are not falling over time; they are growing (a month ago, only 49,607 articles were tagged as lacking sources). These backlogs reflect the rapidly declining overall quality of Wikipedia. The situation may already be out of control; if it is not yet, it likely will be soon. The problem is that the community largely seems not to care, and that really bothers me.

Dealing with the unverified articles would be a good start: not deleting them all at once, but a deliberate, systematic process to either source or delete those 55,928 articles. Proper use of automation is critical to this, and I really think that's where Wikipedia needs to concentrate its efforts over the next year. It would be great if the Foundation would help recruit the volunteers needed for this effort; the problem with the current community is that there don't seem to be enough people interested in this sort of work to get it done.