Friday, November 17, 2006

Wikimedia needs a TechCom

One of the things I've noticed in the past year and a half or so of watching Wikimedia operate is the way that MediaWiki development (which is at least partially paid for by the Foundation) seems to lurch about almost at random, with development driven as much by what the devs feel like doing as by what Wikimedia needs. Now, I realize that MediaWiki has customers other than Wikimedia. However, Wikimedia is the only customer that is paying the developers, so it should get a much larger say in what gets developed.

On top of that, there is relatively poor communication between the communities (who want technical changes) and the developers (who are in a position to implement them). Community-driven changes only take place when someone in the community manages to find a developer and convince that developer that the change is a good idea. This forces developers to be judges of consensus in communities that they are likely not even members of. There is no established mechanism for a community to reach consensus on a change that requires either technical assistance (such as changing a configuration setting for the software) or software development, and, having reached that consensus, to request that the developers make the change. This isn't to say that such changes don't happen; it's just that there is no established mechanism for them. Whether a community can get a change made comes down to whether it can convince a developer that the change is worth doing, which is a battle entirely independent of winning the community's consensus for the change.

Another problem is that many developers do double duty as system administrators. As a former developer turned system administrator, I can testify that this is one of the worst possible ways to run a development operation. There are endless reasons for this; I am not going to get into all of them, nor do I suggest that Wikimedia's operations team is guilty of all of them. However, I am a strong believer in a clear separation of responsibility between developers and administrators, especially on production hardware. (To be fair, Brion has done a decent job of managing this, although there have certainly been some egregious exceptions.)

On top of that, Brion is currently managed by Brad. Brad, while more technical than most CEOs I've run across, is neither sufficiently technical to direct Brion effectively nor has the time to do so on top of everything else he has to do. Nor is there any guarantee that the permanent ED will be even as technical as Brad. Brion is not a sufficiently skilled (or, more pertinently, experienced) CTO to keep Brad appropriately informed on technical matters or, from what I've seen, to manage Wikimedia's technical assets and contracts in a fiscally prudent manner. As a result, the Foundation wastes money on poorly considered purchases and contracts (especially on its strategy of "throwing hardware at the problem" of its grossly inefficient software), and doesn't get a whole lot of value out of funding MediaWiki development. It's clear to me that Wikimedia needs to shake up its technical side somewhat.

My recommendation is threefold. First, appoint a true CTO: someone who has the technical skills to manage both developers and operations, without actually having to be either a developer or a system administrator, and also the managerial skills to interface effectively with the nontechnical people in the ED's office, the CFO's office, and the Board. The role of the CTO (a title which Brion currently holds, but he does not really perform the duties of the office) is to direct operations, infrastructure investment, and development to ensure that the goals of the Foundation are being met by those activities, to keep the leadership of the Foundation informed on technical developments in a manner that is comprehensible to them, and to ensure that the directives that are set by the leadership are met in a timely manner by the technical staff and volunteers who report to the CTO.

Second, appoint a Technical Committee (or TechCom). The purpose of the TechCom, which would be a committee operating under the auspices of the Board, is to determine the technical needs of the Foundation and of the communities and convert those into directives to be given to the CTO for implementation. They would do so in consultation with the Board, with representatives of the communities, and with the CTO and other technical personnel. The CTO would probably be ex officio a member of the TechCom. The TechCom would be the entity to establish the mechanism by which a project requests a technical change; once the TechCom has evaluated the request and prioritized it, the CTO then decides how to make the request happen and assigns it to the appropriate teams for implementation.

Third, separate the technical staff into development and operations teams. The development team, led by a Senior Developer (a role for which Brion is probably most appropriate), would develop MediaWiki and other software required to meet the objectives and directives determined by the TechCom and the Board, and would report up to the CTO. The operations team, led by the Director of Technical Operations, would be responsible for maintaining the servers, hosting arrangements, and everything else required to keep the day-to-day technical operation of one of the Internet's more complicated sites running.

The Senior Developer will likely have to do a lot of volunteer coordination, since most of the MediaWiki developers are volunteers. However, it would likely make sense to allocate some budget to contracting out code development, hiring programmers, or both, especially where such development could increase the efficiency of the systems used by Wikimedia. The current management structure gives the developers no real incentive to improve efficiency, because they control both operations and the hardware acquisition budget; they can therefore simply solve performance problems by throwing more hardware at them. This has left the Foundation significantly overinvested in server hardware. On top of that, having Brion do so many different tasks prevents him from doing any of them as well as he could. Divesting him of his operational and hardware acquisition responsibilities would free him to actually develop code and give him time to recruit and manage volunteers for the project, which would hopefully lead to a better completion rate on outstanding projects. We've been promised SUL for what, over a year now, and stable versions on dewiki are now overdue as well. I cannot help but imagine that this is partly due to Brion being stretched too thin, but I also suspect it is due to inadequate supervision of Brion and the other technical resources.