'A lexer' is the lexer produced by the lexer generator.
'input that a lexer is meant to process' is a list of lexical rules. It's the input I'd like the parser to consume, using the lexer to take it apart.
'Hand-coded parser' is a recursive-descent parser (with no backtracking) that consumes that input, using 'A lexer' to break it into tokens.
'A lexer #2' is the output of the parser. It is a lexer that should recognize the tokens defined by the rules in 'input that a lexer...' The point is that, given the right rules, I'll be able to use this to lex an arbitrary file.
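The rule-file syntax isn't spelled out here, so the sketch below assumes a hypothetical grammar in which each rule is written `NAME = "regex" ;`. It shows the shape of a recursive-descent parser with no backtracking: every decision is made by peeking at a single token, and a mismatch is an immediate error rather than a cue to retry. The lexer is assumed to hand over `(kind, text)` pairs ending in an `EOF` token; `Rule`, `RuleParser` and the token kinds are illustrative names, not part of the original design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str     # token name, e.g. "NUM"
    pattern: str  # regular expression body, e.g. "[0-9]+"


class RuleParser:
    """Recursive-descent parser with one token of lookahead and no backtracking.

    Hypothetical grammar:
        rules := rule* EOF
        rule  := IDENT '=' PATTERN ';'
    Tokens are (kind, text) pairs; the list is assumed to end with ("EOF", "").
    """

    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        # Kind of the current token; this is the only lookahead used.
        return self.tokens[self.pos][0]

    def expect(self, kind: str) -> str:
        # Consume one token of the given kind or fail; never rewind.
        tok_kind, text = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind} {text!r}")
        self.pos += 1
        return text

    def parse_rules(self) -> list[Rule]:
        rules = []
        while self.peek() != "EOF":  # one token decides: another rule or the end
            rules.append(self.parse_rule())
        self.expect("EOF")
        return rules

    def parse_rule(self) -> Rule:
        name = self.expect("IDENT")
        self.expect("EQUALS")
        pattern = self.expect("PATTERN")
        self.expect("SEMI")
        return Rule(name, pattern)
```

Each grammar rule maps to one method, which is what makes recursive descent easy to write by hand; a missing `;`, for instance, is reported the moment `expect("SEMI")` sees the wrong token.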
Here's the implementation plan:
To accomplish my true goal, I would write a lexical description of TAS's Iscript language and use 'a lexer #1' to break the input into tokens, which a new program, an 'Iscript Compiler,' would then ingest.
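One way a generated lexer can work, sketched under assumptions: the lexical description is a list of `(name, regex)` pairs, and Python's `re` module does the matching by combining every rule into a single alternation of named groups. This is a stand-in technique, not necessarily what the generator here emits; `build_lexer` and the example rule names are hypothetical, and they are not real Iscript syntax.

```python
import re
from typing import Callable, Iterator


def build_lexer(
    rules: list[tuple[str, str]], skip: frozenset[str] = frozenset()
) -> Callable[[str], Iterator[tuple[str, str]]]:
    """Turn hypothetical (name, regex) rules into a lexer function.

    Rule names must be valid regex group names and patterns should not
    contain named groups of their own. Tokens whose rule name is in `skip`
    (e.g. whitespace) are matched but not emitted.
    """
    # One master pattern: (?P<NAME1>pat1)|(?P<NAME2>pat2)|...
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in rules))

    def lex(text: str) -> Iterator[tuple[str, str]]:
        pos = 0
        while pos < len(text):
            m = master.match(text, pos)  # anchored at pos
            if m is None or m.end() == pos:  # no match, or an empty match that would loop
                raise SyntaxError(f"no rule matches at offset {pos}: {text[pos:pos + 10]!r}")
            if m.lastgroup not in skip:
                yield (m.lastgroup, m.group())  # lastgroup = name of the rule that matched
            pos = m.end()

    return lex
```

For example, `build_lexer([("NUM", r"[0-9]+"), ("IDENT", r"[A-Za-z_]\w*"), ("WS", r"\s+")], skip=frozenset({"WS"}))` yields `("IDENT", "abc")` and `("NUM", "42")` for `"abc 42"`. Because `re` alternation is leftmost-first rather than longest-match, rule order matters: a hypothetical keyword rule `("KW", "if")` listed ahead of `IDENT` would split `iffy` into `if` and `fy`. Lexer generators such as flex prefer the longest match instead, breaking ties by rule order.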
Here's how I'll use it once it all works:
Of course, being unable to just do the job at hand, I'm going to get sidetracked a little. In the first diagram, you see the 'hand-coded lexer.' This is odd, since I'm writing a parser that should be able to produce that very same lexer. That is, with proper diligence, 'A lexer #1' and 'A lexer #2' should be functionally identical, and at that point the hand-coded lexer can be retired. This is called 'bootstrapping,' and it makes my head hurt. It's a chicken-and-egg situation: I'm writing a lexer using the lexer.
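For a sense of what a hand-coded lexer for the rule file involves, here is a sketch of a plain character-by-character loop for the same hypothetical format (`NAME = "regex" ;`, with `#` comments to end of line). It is not the author's lexer: `lex_rules`, `Token` and the token kinds are assumed names, and pattern bodies are returned with their backslash escapes intact so they can be passed straight to a regex engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # "IDENT", "EQUALS", "PATTERN", "SEMI" or "EOF" (hypothetical kinds)
    text: str
    line: int


def lex_rules(src: str) -> list[Token]:
    """Hand-coded lexer for a hypothetical rule format: NAME = "regex" ;"""
    tokens: list[Token] = []
    i, line = 0, 1
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            line += 1
            i += 1
        elif ch.isspace():
            i += 1
        elif ch == "#":  # comment: skip to end of line
            while i < len(src) and src[i] != "\n":
                i += 1
        elif ch.isalpha() or ch == "_":  # rule name
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(Token("IDENT", src[start:i], line))
        elif ch == "=":
            tokens.append(Token("EQUALS", ch, line))
            i += 1
        elif ch == ";":
            tokens.append(Token("SEMI", ch, line))
            i += 1
        elif ch == '"':  # quoted regex; a backslash protects the next character
            i += 1
            start = i
            while i < len(src) and src[i] != '"':
                if src[i] == "\\":
                    i += 1
                i += 1
            if i >= len(src):
                raise SyntaxError(f"line {line}: unterminated pattern")
            tokens.append(Token("PATTERN", src[start:i], line))
            i += 1  # consume the closing quote
        else:
            raise SyntaxError(f"line {line}: unexpected character {ch!r}")
    tokens.append(Token("EOF", "", line))
    return tokens
```

This is the kind of code bootstrapping aims to retire: once the rule language can describe its own tokens, a generated lexer should reproduce this output exactly.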
A practical effect of bootstrapping the lexer is that I'll have two allegedly identical lexers of different origin. I should be able to put large amounts of text through each, and they should respond in exactly the same way. If they don't, I have work to do.
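A minimal sketch of that comparison, assuming each lexer is a function from text to a sequence of `(kind, text)` tokens: run both over the same input and report the first position where the streams diverge. `first_difference` and the two toy lexers are hypothetical stand-ins for the hand-coded and generated lexers, chosen only to show the harness catching a real disagreement.

```python
import re
from itertools import zip_longest
from typing import Callable, Iterable, Optional

Token = tuple[str, str]
Lexer = Callable[[str], Iterable[Token]]


def first_difference(
    lex_a: Lexer, lex_b: Lexer, text: str
) -> Optional[tuple[int, Optional[Token], Optional[Token]]]:
    """Return (index, token_a, token_b) at the first disagreement, or None.

    zip_longest pads the shorter stream with None, so one lexer producing
    extra tokens also counts as a difference.
    """
    for i, (a, b) in enumerate(zip_longest(lex_a(text), lex_b(text))):
        if a != b:
            return (i, a, b)
    return None


def hand_lexer(text: str) -> list[Token]:
    # Toy hand-coded lexer: digit runs are NUM, letter runs are WORD.
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isdigit():
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUM", text[start:i]))
        elif ch.isalpha():
            while i < len(text) and text[i].isalpha():
                i += 1
            tokens.append(("WORD", text[start:i]))
        else:
            raise SyntaxError(f"unexpected {ch!r}")
    return tokens


def generated_lexer(text: str) -> list[Token]:
    # Toy "generated" lexer: one ASCII-only regex; unnamed whitespace matches are dropped.
    pattern = r"(?P<NUM>[0-9]+)|(?P<WORD>[A-Za-z]+)|\s+"
    return [(m.lastgroup, m.group()) for m in re.finditer(pattern, text) if m.lastgroup]
```

The two toy lexers agree on `"abc 42 x"` but not on `"café 42"`: `str.isalpha` accepts `é`, while the ASCII regex matches only `caf` and silently skips the rest. That is exactly the kind of divergence that pushing large amounts of text through both lexers is meant to surface.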