Saturday, March 31, 2007

TAS Must Die, Chapter 4

Ok, here's what I'm working towards:

The 'hand-coded lexer generator' is a Java program that, using no real configuration, produces a lexer, which in my world is a set of data structures plus the bare-bones code to traverse them. In my case, I generate a lexer that can hopefully recognize everything you'd see in a file of lexical rules.
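
To make "data structures plus bare-bones traversal code" concrete, here is a minimal sketch of that shape. Everything in it is an assumption for illustration: the class name TableLexer, the three character classes, and the two token kinds are made up, and the real generated tables would be much larger.

```java
import java.util.ArrayList;
import java.util.List;

public class TableLexer {
    // Hypothetical character classes: 0 = letter, 1 = digit, 2 = anything else.
    static int classify(char c) {
        if (Character.isLetter(c)) return 0;
        if (Character.isDigit(c)) return 1;
        return 2;
    }

    // The "data": NEXT[state][charClass] is the next state, -1 means "no move".
    // State 0 = start, 1 = inside an identifier, 2 = inside a number.
    static final int[][] NEXT = {
        {  1,  2, -1 },
        {  1,  1, -1 },
        { -1,  2, -1 },
    };

    // Token name accepted in each state; null means the state is not accepting.
    static final String[] ACCEPT = { null, "IDENT", "NUMBER" };

    // The "bare-bones code": walk the table until no move is possible, then emit a token.
    public static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { pos++; continue; }
            int state = 0;
            int start = pos;
            while (pos < input.length()) {
                int next = NEXT[state][classify(input.charAt(pos))];
                if (next < 0) break;
                state = next;
                pos++;
            }
            if (ACCEPT[state] == null) {
                throw new IllegalArgumentException("Unexpected character at " + start);
            }
            tokens.add(ACCEPT[state] + ":" + input.substring(start, pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("abc 42 x9")); // prints [IDENT:abc, NUMBER:42, IDENT:x9]
    }
}
```

The appeal of this layout is that the loop never changes; a generator only has to emit different tables.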

'A lexer' is the lexer produced by that generator. Further down I call it 'A lexer #1'.

'input that a lexer is meant to process' is a list of lexical rules. It's the input I'd like the parser to consume, using the lexer to take it apart.

'Hand-coded parser' is a recursive-descent parser (with no backtracking) that consumes the file, using 'A lexer' to break the input into pieces.
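
As a rough illustration of recursive descent with no backtracking, here is a small sketch. The rule syntax it accepts (NAME = pattern | pattern ;) is purely hypothetical, as are the class and method names, and the token list stands in for whatever 'A lexer' would actually hand over. The parser looks at one token, commits, and fails immediately if the next token is not what the grammar requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RuleParser {
    private final List<String> tokens; // assumed to come from the lexer
    private int pos = 0;

    public RuleParser(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead; null at end of input.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // Consume the expected token or fail right away: there is no backtracking.
    private void expect(String wanted) {
        if (!wanted.equals(peek())) {
            throw new IllegalStateException(
                "Expected " + wanted + " but saw " + peek() + " at token " + pos);
        }
        pos++;
    }

    // rules := rule*
    public Map<String, List<String>> parseRules() {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        while (peek() != null) parseRule(rules);
        return rules;
    }

    // rule := NAME '=' alternative ('|' alternative)* ';'
    private void parseRule(Map<String, List<String>> rules) {
        String name = tokens.get(pos++);
        expect("=");
        List<String> alternatives = new ArrayList<>();
        alternatives.add(parseAlternative());
        while ("|".equals(peek())) {
            pos++; // consume '|'
            alternatives.add(parseAlternative());
        }
        expect(";");
        rules.put(name, alternatives);
    }

    // alternative := any single token that is not punctuation
    private String parseAlternative() {
        String t = peek();
        if (t == null || t.equals("|") || t.equals(";") || t.equals("=")) {
            throw new IllegalStateException("Expected a pattern at token " + pos);
        }
        pos++;
        return t;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("DIGIT", "=", "0", "|", "1", ";");
        System.out.println(new RuleParser(input).parseRules()); // prints {DIGIT=[0, 1]}
    }
}
```

Each grammar rule becomes one method, which is what makes this style easy to write by hand.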

'A lexer #2' is the output of the parser. It is a lexical analyzer that should recognize the rules in 'input that a lexer...' The point is that I'll be able to use this to lex an arbitrary file.

Here's the implementation plan:

To accomplish my true goal, I would write a lexical description of TAS's Iscript language and use 'a lexer #1' to break the input into tokens, which a new program, an 'Iscript Compiler', would ingest.

Here's how I'll use it once it all works:
Then the pcode will be compiled and interpreted, which is a whole new thing. I'm sure I'll write several chapters about it.

Of course, being unable to just do the job at hand, I'm going to sidetrack a little. In the first diagram, you see the 'hand-coded lexer.' This is odd, since I'm writing a parser that should be able to produce that very same lexer. That is, with proper diligence, 'A lexer #1' and 'A lexer #2' should be functionally identical. Once they are, the hand-coded lexer can be retired. It's called 'bootstrapping' and it makes my head hurt. It's a chicken-and-egg situation: I'm writing a lexer using the lexer.

A practical effect of bootstrapping the lexer is that I'll have two allegedly identical lexers of different origin. I should be able to put large amounts of text through each, and they should respond exactly the same. If they don't, I have work to do.
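
That comparison could be automated with something like the sketch below. It assumes each lexer can be wrapped as a function from input text to a list of token strings; that interface, the class name LexerDiff, and the two stand-in lexers in main are all hypothetical, not the real 'A lexer #1' and 'A lexer #2'.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class LexerDiff {
    // Returns the index of the first token where the two lexers disagree,
    // or -1 if both produce exactly the same token stream.
    public static int firstMismatch(Function<String, List<String>> lexerOne,
                                    Function<String, List<String>> lexerTwo,
                                    String input) {
        List<String> a = lexerOne.apply(input);
        List<String> b = lexerTwo.apply(input);
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            if (!a.get(i).equals(b.get(i))) return i;
        }
        // Same prefix: they agree only if neither stream has leftover tokens.
        return a.size() == b.size() ? -1 : shared;
    }

    public static void main(String[] args) {
        // Stand-in lexers that differ only in how they treat tabs.
        Function<String, List<String>> handCoded = s -> Arrays.asList(s.trim().split("\\s+"));
        Function<String, List<String>> generated = s -> Arrays.asList(s.trim().split(" +"));

        System.out.println(firstMismatch(handCoded, generated, "a b  c")); // prints -1
        System.out.println(firstMismatch(handCoded, generated, "a\tb"));   // prints 0
    }
}
```

Reporting the first mismatching index, instead of a plain yes or no, points straight at the spot where the two lexers start to diverge.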
