Thursday, September 27, 2007

More on my MediaWiki port

So, some people have inquired about my port of MediaWiki to Java.

One commenter implied that this is a foolish or wasted venture because it's "already done". JAMWiki is a pure Java wiki that resembles MediaWiki, but it is not a port; rather, it is a from-scratch implementation that uses MediaWiki's markup syntax and mimics its behavior. However, it does not use MediaWiki's database schema (a situation that they attribute to licensing issues). I have nothing but respect for what they're doing, but what they're doing is not what I'm doing. There are a number of other Java-implemented wikis out there, but I'm not attempting to compete with any of them. And, given my motivations (see below), even if it had already been done I'd still likely want to do it.

Another intriguing project that has been brought to my attention is Quercus, a native Java implementation of PHP. The Quercus people claim that Quercus runs PHP code significantly faster than the standard mod_php interpreter and is on a par with PHP accelerated by APC. One option would certainly be to run MediaWiki under Quercus and then port portions of it to pure Java incrementally. It has occurred to me that the Wikimedia Foundation might benefit from doing this (if nothing else, it would likely simplify their interface with Lucene considerably, which at the moment is done with a really grotty .NET interface), but of course it's not my place to advise Brion and company how to run their show. At the moment, though, I don't want to take the time to immerse myself in another product. It might be something to look at down the road.

But, fundamentally, neither of these really serves my interests. As I've noted before, more than once, I don't like PHP very much. On the other hand, I would like to better understand the MediaWiki code. What better way to understand a body of code than to port it to another language? At the end of this, I will have a better understanding of MediaWiki's codebase (I've already submitted three bug reports, all for admittedly minor items) and even more reasons to hate PHP. Hopefully I will also have a drop-in replacement for MediaWiki that outperforms it and is easier to modify, to boot.

As to why Java: at the moment, Java is my favorite language for large applications. I actually prefer C#'s generics to Java's, but I do not trust Microsoft enough to feel comfortable committing my labors to a language and runtime whose intellectual property status is dubious. Java is far more open than C#, and so I'm much more comfortable (philosophically) developing for that environment. Furthermore, there is already an open source Web engine for Java (Tomcat), as well as well-known, compatible, high-performance enterprise products from Sun and IBM for anyone who might want to use this product in an enterprise environment. I haven't worked much with Java in quite a while (I was a Java programmer for a time back in 2001, but have done little since), so this is also a chance to resharpen my skills.

Syntactically, PHP and Java are actually pretty close. I can mechanically convert PHP code to pretty-close-to-Java with just a handful of Perl scripts. I'm probably about halfway done with the first pass of rough code conversion, although admittedly the parser has not yet been done, and it will likely demand considerably more attention than the modules I've already converted. Certainly it will be quite a long time before an actually executable product comes out of this, but I'm not on any timeline here. (Fortunately, MediaWiki makes very little use of the parts of PHP that are especially hard to port.)
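
To give a feel for what "mechanical conversion" means here, the following is a minimal sketch written in Java rather than Perl. It is not one of the actual conversion scripts: the class name PhpToJavaSketch, the method convertLine, and the particular rewrite rules are all assumptions chosen for illustration. It works purely on text with regular expressions, so it will happily mangle string literals and comments, and it knows nothing about types or declarations. That is exactly why the output is only pretty-close-to-Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhpToJavaSketch {
    // Hypothetical rewrite rules (regex -> replacement), applied in insertion order.
    // Order matters: concatenation must be handled before "->" turns into ".".
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("\\s\\.\\s", " + ");                    // a . b      -> a + b
        RULES.put("\\$this->", "this.");                  // $this->x   -> this.x
        RULES.put("->", ".");                             // $a->b      -> $a.b
        RULES.put("::", ".");                             // Foo::bar() -> Foo.bar()
        RULES.put("\\$([A-Za-z_][A-Za-z0-9_]*)", "$1");   // $name      -> name
    }

    // Applies every rule to a single line of PHP source text.
    public static String convertLine(String php) {
        String out = php;
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            out = out.replaceAll(rule.getKey(), rule.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String php = "$title = $this->mTitle->getText() . Namespace::suffix($ns);";
        // prints: title = this.mTitle.getText() + Namespace.suffix(ns);
        System.out.println(convertLine(php));
    }
}
```

The result still lacks variable declarations and types, which is the part that needs a human (or a real parser) on the second pass.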

The curious may observe my progress via SVN. I chose the name "Myrtle" for the project because I am fond of wood (my other main hobby these days is woodworking) and because my full name anagrams to "Kill Nanny Myrtle".

Once I've finished this project, I am very likely to work on automated PHP-to-Java porting tools. That's going to require writing a PHP parser, but that can't be terribly hard, now, can it?