Tuesday, January 09, 2007

Why Commons sucks

The Wikimedia Commons sucks. Let's not mince any words about this. It just sucks.

No, not the concept. The idea of having a repository of free content is just great. Wonderful idea. Really spot on great idea. It's the implementation that sucks.

There's a term in the media industry for what Commons is, or at least should be: Digital Asset Management, or DAM. (Wikipedia has an article on it, but frankly it reads like sales literature to me.) Allow me to detail point by point why MediaWiki is simply the wrong software to be used as a DAM -- and especially as a DAM for open content.

A good DAM would provide thorough cataloging and indexing to facilitate retrieval of desired content. For example, it should be very easy to obtain "all images that portray a kitten in color that are at least 3 inches by 5 inches at 100 dpi". MediaWiki cannot do this. MediaWiki can't even give you all images that portray a kitten unless there is a "kitten" category (fortunately for us, there is). But that depends on the uploader putting the image in the "kitten" category, MediaWiki's categorization system has never been good to begin with, and the category tree used on the Commons has a lot of curious issues. And what if you only want ginger tabby kittens? If they're in a ginger tabby kitten category, sure (but there isn't one, alas). If they're merely in both the ginger tabby category and the kitten category, well, you can't ask for the intersection. You could try to do something with MediaWiki search, but we all know how bad MediaWiki search is. And there's no hope of specifying format criteria, unless someone has bothered to create a category that happens to match your needs and uploaders have happened to manually assign it. Hm. There's a lot of "manual" in this... not good.
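To make the kitten query concrete, here is a minimal sketch of what structured retrieval could look like. It is not how any real DAM or MediaWiki works: the `Asset` record, its fields, and `find_assets` are all hypothetical names invented for illustration, and the size check assumes an image may be rotated, so either orientation satisfies "3 by 5 inches".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Hypothetical catalog record; a real DAM would store far more metadata."""
    name: str
    width_px: int
    height_px: int
    is_color: bool
    tags: frozenset = field(default_factory=frozenset)


def find_assets(assets, required_tags=(), color=None, min_inches=None, dpi=100):
    """Return the names of assets that satisfy every criterion given."""
    required = frozenset(required_tags)
    results = []
    for asset in assets:
        # The asset must carry ALL requested tags (a tag intersection).
        if not required <= asset.tags:
            continue
        if color is not None and asset.is_color != color:
            continue
        if min_inches is not None:
            # Convert the print size to pixels at the requested resolution.
            need_short, need_long = sorted(side * dpi for side in min_inches)
            have_short, have_long = sorted((asset.width_px, asset.height_px))
            if have_short < need_short or have_long < need_long:
                continue
        results.append(asset.name)
    return results


catalog = [
    Asset("kitten_big.jpg", 1600, 1200, True, frozenset({"kitten", "ginger tabby"})),
    Asset("kitten_small.jpg", 200, 150, True, frozenset({"kitten"})),
    Asset("kitten_bw.jpg", 1600, 1200, False, frozenset({"kitten"})),
    Asset("old_tabby.jpg", 1600, 1200, True, frozenset({"ginger tabby"})),
]

color_kittens = find_assets(catalog, ["kitten"], color=True, min_inches=(3, 5))
ginger_kittens = find_assets(catalog, ["kitten", "ginger tabby"])
```

The point of the sketch is that dimensions and color are facts the software can read from the file itself, and that combining two tags is a set operation rather than a third, hand-maintained category.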

A good DAM would have some way to detect probable duplicate images. There are some decent algorithms for this out there. Commons has no way to use them. There are lots of duplicated images on Commons, and they'll only get fixed if an administrator happens to notice one of them. Tagging duplicates for deletion doesn't seem to help: I accidentally uploaded a duplicate once; it took six months and three separate naggings of various Commons admins before it was finally deleted.
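As one illustration of the kind of algorithm meant here, the sketch below uses an "average hash", a simple perceptual hash. That choice is an assumption made for this example, not a claim about which algorithm a DAM should use. It also assumes each image has already been reduced to a tiny grayscale grid (a list of rows of 0-255 values); the function names are hypothetical.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def probable_duplicates(hashes, max_distance=1):
    """Return name pairs whose hashes differ in at most max_distance bits."""
    names = sorted(hashes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Toy 4x4 thumbnails: the second is the first with slight compression noise.
original = [[10, 10, 200, 200], [10, 10, 200, 200],
            [200, 200, 10, 10], [200, 200, 10, 10]]
recompressed = [[12, 9, 198, 203], [11, 13, 201, 197],
                [199, 202, 8, 12], [204, 196, 11, 9]]
different = [[200, 10, 200, 10]] * 4

hashes = {
    "cat.jpg": average_hash(original),
    "cat_copy.jpg": average_hash(recompressed),
    "dog.jpg": average_hash(different),
}
suspects = probable_duplicates(hashes)
```

Comparing every pair is quadratic, so a repository of any real size would need an index over the hashes; the sketch only shows that flagging candidates for a human to review does not have to wait for an administrator to stumble over them.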

A good DAM will provide some way to collect related content together. By "related", I mean content that is all derived from the same original work. I recently took a photograph uploaded to Commons by another user, and (at her request) did some Photoshop magic on it to improve the quality of the image, and reuploaded it. My upload is clearly a derivative -- one might even say a version -- of the original. With MediaWiki I have two choices: upload it as a new version of the original -- overwriting the original -- or else upload it under another name, which will then not be associated with the original in the software in any way. I can add content to the image description pages saying "oh, by the way, see over here for another version", but that's a manual process and is likely not to be performed on a regular basis.
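A minimal sketch of the missing association, assuming each upload could record the single asset it was derived from. The `derived_from` mapping, the file names, and both functions are hypothetical; nothing like this exists in MediaWiki.

```python
def root_of(name, derived_from):
    """Follow derived_from links upward until reaching an original work."""
    seen = set()
    while name in derived_from and name not in seen:
        seen.add(name)  # guards against an accidental cycle in the links
        name = derived_from[name]
    return name


def related_versions(name, derived_from, all_names):
    """Return every asset that shares an original with the given asset."""
    root = root_of(name, derived_from)
    return sorted(n for n in all_names if root_of(n, derived_from) == root)


# Each derivative points at the asset it was made from.
derived_from = {
    "portrait_retouched.jpg": "portrait.jpg",
    "portrait_cropped.jpg": "portrait_retouched.jpg",
}
all_names = [
    "portrait.jpg",
    "portrait_retouched.jpg",
    "portrait_cropped.jpg",
    "unrelated.jpg",
]

family = related_versions("portrait_cropped.jpg", derived_from, all_names)
```

With one link recorded at upload time, the original and every derivative can find each other without anyone editing description pages by hand, and nothing has to be overwritten.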

A good DAM will provide an application-independent means to access its content. Commons has no API for access to its database or content except the generic MediaWiki interface, which is not really suitable for use as a content repository. The current methodology of having Commons "stand behind" each project's Image: and Media: spaces is very much a hack, and is not very extensible. MediaWiki is not a content repository, and it doesn't play well as one.

A good DAM will have mechanisms for controlling compliance with licensing requirements. This may not seem like such a big deal since Commons is supposed to be entirely free content. However, there are lots of media on Wikimedia projects that are not free. Why should the same nonfree media be uploaded to both enwiki and some other project simply because both want it? Current policy requires nonfree media that fall within the "fair use" policies of multiple projects to exist in duplicate merely because Commons arbitrarily refuses to host them. To me, it makes sense to store all of the media in a single system and use copyright control mechanisms within the system to restrict its presentation to locations where its display is consistent with the usage allowed by its license (or lack thereof). Also, in the sense of providing a reusable tool for use outside Wikimedia, in projects where copyright policies may not so strongly favor free content, having copyright controls becomes much more useful. Some MediaWiki-based wikis have "all rights reserved" as a copyright policy, after all.
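The idea of one shared store with per-project display rules can be sketched as follows. Everything here is an assumption for illustration: the license labels, the `fair_use_rationales` field, the project name "otherwiki", and the rule that a nonfree file needs both a project that accepts fair use and a rationale written for that project.

```python
# Hypothetical license labels; a real system would need a much richer model.
FREE_LICENSES = {"public-domain", "cc-by", "cc-by-sa", "gfdl"}


def may_display(asset, project, projects_allowing_fair_use):
    """Decide whether a project may show an asset held in the shared store."""
    if asset["license"] in FREE_LICENSES:
        return True  # free content is usable everywhere
    if asset["license"] == "fair-use":
        # Nonfree media: the project must accept fair use AND the asset
        # must carry a rationale covering that specific project.
        return (project in projects_allowing_fair_use
                and project in asset["fair_use_rationales"])
    return False  # unknown or all-rights-reserved: never display


projects_allowing_fair_use = {"enwiki"}

free_photo = {"license": "cc-by-sa", "fair_use_rationales": set()}
album_cover = {"license": "fair-use", "fair_use_rationales": {"enwiki"}}

shown_on_enwiki = may_display(album_cover, "enwiki", projects_allowing_fair_use)
shown_elsewhere = may_display(album_cover, "otherwiki", projects_allowing_fair_use)
```

Under these assumptions the nonfree file is stored once, and the policy decision moves out of "which wiki was it uploaded to" and into a check the software performs at display time.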

In short, Commons was a good idea implemented in nearly the worst possible way by using a completely wrong tool. MediaWiki is, fundamentally, a text content engine: it is intended to support collaborative editing of text content, with versioning and all that jazz. (It has issues doing that, too, but that's not grist for this post.) MediaWiki image management has never been good (although it is much better than it used to be) and it's simply not the right tool for the job. The implementation of the Commons in MediaWiki is, as far as I can tell, an example of "if all you have is a hammer, everything looks like a nail". (Don't ask me who I blame for this.) I'm impressed by the lengths of complexity that people will go to in order to make MediaWiki behave like something it is not, simply because "all the world is a MediaWiki". Software should serve users, not the other way around.

I'm not entirely in agreement with Amgine's comments on the Commons, but his post definitely spurred me to post on this issue. Commons should be entirely replaced by a proper DAM. I'm not the person to write it (if I were to write it, it would be written in Java, and I'm reasonably confident the Wikimedia Foundation won't even contemplate hosting it if I write it in Java, and I don't happen to have a place to host over 1TB of digital media content) but I really would like it if someone would step up to the plate and do something about the total suckiness of the project where I make the most contributions these days.