Wednesday, October 25, 2006

I said in my last post that I didn't get any work done on my Java port project tonight. Well, I lied. My headache got better, and not only did I finish, but I got WebSVN set up on my SVN server, too. This means that, for example, you can now see my last commit with ease. If there is actually anyone out there watching me do this silly project, well, I just made it easier for you. I've also nearly decided on a name, but that won't show up until I finish the first pass (since I don't want to try to do the refactor by hand).

I've also been thinking about how to deal with quotes in the parser. MediaWiki's current approach makes the language require arbitrary lookahead/lookbehind, which is bad. However, I think I can avoid that at the cost of making certain nonstandard markups parse "differently": mainly, sequences of exactly four quotes or of more than five quotes. I would make two quotes one lexeme (call it 2Q), three quotes another lexeme (3Q), five quotes yet another (5Q), and sequences of four, or of six or more, would probably be syntax errors. I might also allow four quotes and six quotes as lexemes (4Q and 6Q) that are basically treated as null strings, since they can conceivably occur during template expansion, and when they do the user's intent was probably to output nothing. There is no excuse for seven or more quotes; that should be a syntax error.
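To make the lexing idea concrete, here is a minimal sketch in Java. It is not the code from the port: the class and method names (QuoteLexer, classify, tokenize) are made up for illustration, and it assumes that a "quote" is the ASCII apostrophe, that a lone apostrophe is ordinary text, and that the 4Q/6Q variant is the one chosen. Each run of quotes is read whole and mapped to a lexeme by its length, so the lexer never has to look past the end of the run.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a quote-run lexer; every name here is hypothetical. */
class QuoteLexer {

    /** Maps the length of a run of apostrophes to a lexeme name. */
    static String classify(int runLength) {
        switch (runLength) {
            case 1: return "TEXT(')"; // assumption: a lone apostrophe is plain text
            case 2: return "2Q";
            case 3: return "3Q";
            case 4: return "4Q";      // assumption: later treated as a null string
            case 5: return "5Q";
            case 6: return "6Q";      // assumption: later treated as a null string
            default:
                // Seven or more quotes in a row is rejected outright.
                throw new IllegalArgumentException(
                        "syntax error: run of " + runLength + " quotes");
        }
    }

    /** Splits input into quote lexemes and TEXT(...) chunks in one left-to-right pass. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (input.charAt(i) == '\'') {
                // Consume the whole run of apostrophes, then classify by length.
                while (i < input.length() && input.charAt(i) == '\'') {
                    i++;
                }
                tokens.add(classify(i - start));
            } else {
                // Consume everything up to the next apostrophe as plain text.
                while (i < input.length() && input.charAt(i) != '\'') {
                    i++;
                }
                tokens.add("TEXT(" + input.substring(start, i) + ")");
            }
        }
        return tokens;
    }
}
```

With this sketch, QuoteLexer.tokenize("''italic'' and '''bold'''") produces the tokens 2Q, TEXT(italic), 2Q, TEXT( and ), 3Q, TEXT(bold), 3Q. Whether those lexemes pair up sensibly is deliberately not checked here.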

The downside is that I don't think I can easily enforce pairing at the grammar level; it will have to be enforced at the conversion layer instead. I'm not sure this is really a bad thing. The problem is that 2Q ... 3Q ... 5Q is semantically valid, as are 2Q ... 3Q ... 2Q ... 3Q and 5Q ... 3Q ... 5Q ... 2Q. But 3Q ... 5Q is not. It might be possible to enumerate all the valid combinations in a way that doesn't break LL(1)/LR(1), but I'm not sure about this yet.