Monday, April 9, 2007

TAS Must Die, Chapter 9

All the bugs that I know of are gone now. I haven't taken the 'bootstrap' plunge yet. That being said, it's been a few days since I had to rebuild a base lexer using the L1Gen lexer program.

I'm now lexing a large part of the TAS iscript input files, which puts me at a crossroads. The options:

1. Writing a parser for the tokens so I can convert iscript into something else. That's ultimately what this project is about. I previously hand-coded "P1", a recursive-descent parser that produces DFAs from the lexical rules. It was surprisingly easy. I'll probably do the same for the iscript parser. Easy Peasy.

2. Writing a parser generator. Then I could describe iscript's structure using BNF or something and have the parser generated for me. This is another favorite topic of mine. Recent experiments in turning BNF grammars into Greibach Normal Form have been very successful.

3. Finding a parser generator for Java, similar to yacc. Where's the fun in that? If I were doing this project as part of my job, I'd be all over it. But I'm not, so I'm not.

4. Cleaning up what I have, because it will probably bite me anyway. For example, I still don't handle newlines and other control characters well. The subtlety of this surprises me. If I started all over, I'd probably start with control (and escaped) characters first.

5. Stopping. I've accomplished my first goal and learned a lot.
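
For readers who haven't seen the hand-coded approach in option 1, here is a minimal sketch of a recursive-descent parser in Java. It is not P1 and it does not parse iscript: the grammar, the ToyParser class, and the pre-split token list are all made up for illustration. The point is only the shape, with one method per grammar rule.

```java
import java.util.List;

// Toy recursive-descent parser: one method per grammar rule.
// The grammar is hypothetical and is NOT iscript's:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
final class ToyParser {
    private final List<String> tokens; // assumed to be already lexed
    private int pos = 0;               // index of the next unread token

    ToyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // Parses the whole token list and returns the value of the expression.
    static int parse(List<String> tokens) {
        ToyParser p = new ToyParser(tokens);
        int value = p.expr();
        if (p.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        }
        return value;
    }

    // Looks at the next token without consuming it; null at end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consumes and returns the next token.
    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private int expr() {
        int value = term();
        while ("+".equals(peek())) {
            next();
            value += term();
        }
        return value;
    }

    private int term() {
        int value = factor();
        while ("*".equals(peek())) {
            next();
            value *= factor();
        }
        return value;
    }

    private int factor() {
        String t = next();
        if ("(".equals(t)) {
            int value = expr();
            if (!")".equals(next())) {
                throw new IllegalArgumentException("expected ')'");
            }
            return value;
        }
        // Throws NumberFormatException (an IllegalArgumentException) if t is not a number.
        return Integer.parseInt(t);
    }
}
```

A real iscript parser would presumably build some intermediate form rather than compute an integer, but the rule-per-method structure would stay the same.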

I'll probably bounce between #1 and #4, addressing weaknesses of the lexer generator as needed. And after a few visual observations, it's no good to have some manner of p-code being produced without doing anything with it, so I'll probably start building a servlet to interpret said p-code.
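
To make the escaped-character problem from #4 concrete, here is a rough sketch of the kind of decoding step a lexer generator needs when it reads a rule literal. The Escapes class and the set of escapes it accepts (\n, \t, \r, \\, \") are assumptions for illustration, not what L1Gen actually does.

```java
// Hypothetical helper: turns backslash escapes in a rule literal into the
// characters they stand for. The supported escape set is an assumption.
final class Escapes {
    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary character, copy as-is
                continue;
            }
            if (++i >= raw.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = raw.charAt(i);
            switch (e) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case '\\': out.append('\\'); break;
                case '"':  out.append('"');  break;
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }
}
```

Rejecting unknown escapes outright, rather than passing them through, is a design choice in this sketch. It surfaces mistakes in the rule file early, at the cost of being strict.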

This has been quite a learning experience. I need time to breathe now.