All the bugs that I know of are gone now. I haven't taken the 'bootstrap' plunge yet. That being said, it's been a few days since I had to rebuild a base lexer using the L1Gen lexer program.
I'm lexing a large part of the TAS Iscript input files now. Now I'm at the crossroads of
1. Writing a parser for the tokens so I can convert iscript into something else. That's ultimately what this project us about. I previously hand-coded "P1", a recursive-descent parser that produces DFAs from the lexical rules. It was surprisingly easy. I'll probably do the same for the iscript parser. Easy Peasy.
2. Writing a parser generator. Then I could describe iscript's structure using BNF or something and have the parser be generated. This is another favorite topic for me. Recent experiments of turning BNF into a Greibach Normal Form have been very successful.
3. Finding a parser generator for Java, similar to yacc. Where's the fun in that? If I was doing this project as part of my job, I'd be all over it. But I'm not so I'm not.
4. Cleaning up what I have, because it will probably bite me anyway. For example, I still don't handle newlines and other control characters well. The subtelty of this surprises me. If I started all over, I'd probably start with control (and escaped) characters first.
5. Stopping. I've accomplished my first goal and learned a lot.
I'll probably bounce between #1 and #4, addressing weaknesses of the lexer generator as needed. And after a few visual observations, it's no good to have the some manner of p-code being produced without doing anything with it, so I'll probably start building a servlet to interpret said p-code.
This has been quite a learning experience, I need time to breathe now.