While I'm procrastinating on making the final changes to the lexer, I thought I'd post the lexer's architecture. I wrote it in a frenzy the other day and I almost forgot how it works.
To the left is a variable named 'currentLexer'. This points to the start state of the currently active lexer. When we call the lexer for a token, 'currentLexer' tells us the state to begin with.
'currentLexer' indexes directly into the four parallel arrays in the center of the diagram: LexerJump, TokenString, BegEdge, and EndEdge. Any row across them represents a single state in the lexer. Upon successfully lexing a token, 'LexerJump' changes the current lexer. This facility lets lexer designers create additional lexers to help resolve lexical conflicts. If we can't make any further progress while scanning the input, TokenString tells us whether we're in a stop state and, if so, which token should be returned. BegEdge and EndEdge identify the rows in the 'edge' tables that belong to the current state. If the current state has transitions on a, b, and c, it has 3 rows in the 'edge' tables; BegEdge points to the first of them and EndEdge points to the last.
To the right are two more parallel arrays, EdgeInputs and EdgeDests. A single row across them pairs an input character with the state we advance to on seeing it: EdgeInputs holds the character, and the matching EdgeDests row holds the destination state.
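To make the edge lookup concrete, here is a minimal sketch in Python of how a state's edge rows might be searched. The table contents, the snake_case names (standing in for BegEdge, EndEdge, EdgeInputs, and EdgeDests), the NO_EDGES sentinel, and the linear scan are all assumptions made for illustration, not the actual implementation.

```python
# Hypothetical tables for a tiny DFA; every value here is made up.
# State 0 has transitions on 'a', 'b', and 'c'; states 1-3 have none.
NO_EDGES = -1  # assumed sentinel meaning "this state owns no edge rows"

beg_edge = [0, NO_EDGES, NO_EDGES, NO_EDGES]  # BegEdge: first edge row per state
end_edge = [2, NO_EDGES, NO_EDGES, NO_EDGES]  # EndEdge: last edge row (inclusive)

edge_inputs = ['a', 'b', 'c']  # EdgeInputs: character on each edge row
edge_dests = [1, 2, 3]         # EdgeDests: state reached on that character


def next_state(state, ch):
    """Return the state reached from `state` on `ch`, or None if no edge matches."""
    first = beg_edge[state]
    if first == NO_EDGES:
        return None
    # Scan only the rows that belong to this state.
    for row in range(first, end_edge[state] + 1):
        if edge_inputs[row] == ch:
            return edge_dests[row]
    return None
```

With these tables, next_state(0, 'b') lands on state 2, while next_state(0, 'z') and next_state(1, 'a') both report that no progress can be made.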
Finally, the Tokens array at the bottom holds all the token classes that the lexer can return. The TokenString array refers to it.
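One way the TokenString-to-Tokens relationship could work is sketched below. It assumes TokenString stores an index into Tokens for each stop state and a sentinel for every other state. The token names, the NOT_ACCEPTING sentinel, and the token_for helper are hypothetical.

```python
NOT_ACCEPTING = -1  # assumed sentinel for states that are not stop states

tokens = ["IDENT", "NUMBER"]           # Tokens: hypothetical token classes
token_string = [NOT_ACCEPTING, 0, 1]   # TokenString: one entry per state


def token_for(state):
    """Return the token class for a stop state, or None for any other state."""
    idx = token_string[state]
    return None if idx == NOT_ACCEPTING else tokens[idx]
```

Here state 0 is not a stop state, state 1 yields IDENT, and state 2 yields NUMBER.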
I've referred a few times to 'multiple lexers' and such. Apart from 'currentLexer' and 'LexerJump', the architecture really doesn't know about lexers at all. There are only states, their inputs, and the tokens they generate, so stacking several DFAs/lexers into these arrays works fine. The only thing we have to add is a mechanism for pointing to the first state of any DFA/lexer, and that's what 'currentLexer' and 'LexerJump' do. Previously, instead of a 'currentLexer' variable, I just started at state 0 and there was no concept of more than one lexer.
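As a rough illustration of stacking, the Python sketch below packs two small DFAs into one set of tables. The first lexer starts at state 0 and the second, a "string" lexer, starts at state 3. Everything in it is an assumption for the sake of example: the table contents, the token names, the sentinels, the greedy scan loop, and the guess that LexerJump stores the start state of the lexer to switch to.

```python
NO_EDGES = -1       # assumed sentinel: state owns no edge rows
NOT_ACCEPTING = -1  # assumed sentinel: state is not a stop state

tokens = ["WORD", "QUOTE", "TEXT"]  # Tokens (hypothetical classes)

# One row per state. States 0-2 form the default lexer, states 3-5 the string lexer.
lexer_jump   = [0, 0, 3, 3, 3, 0]                             # LexerJump
token_string = [NOT_ACCEPTING, 0, 1, NOT_ACCEPTING, 2, 1]     # TokenString
beg_edge     = [0, 2, NO_EDGES, 3, 5, NO_EDGES]               # BegEdge
end_edge     = [1, 2, NO_EDGES, 4, 5, NO_EDGES]               # EndEdge (inclusive)

edge_inputs = ['a', '"', 'a', 'a', '"', 'a']  # EdgeInputs
edge_dests  = [1,   2,   1,   4,   5,   4]    # EdgeDests


def find_edge(state, ch):
    """Return the destination state for `ch`, or None if the state has no such edge."""
    first = beg_edge[state]
    if first == NO_EDGES:
        return None
    for row in range(first, end_edge[state] + 1):
        if edge_inputs[row] == ch:
            return edge_dests[row]
    return None


class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.current_lexer = 0  # currentLexer: start state of the active lexer

    def next_token(self):
        state = self.current_lexer
        start = self.pos
        # Advance until no edge matches the next character.
        while self.pos < len(self.text):
            dest = find_edge(state, self.text[self.pos])
            if dest is None:
                break
            state = dest
            self.pos += 1
        idx = token_string[state]
        if idx == NOT_ACCEPTING:
            raise ValueError(f"no token at position {start}")
        # A token was lexed, so LexerJump picks the lexer for the next call.
        self.current_lexer = lexer_jump[state]
        return tokens[idx], self.text[start:self.pos]


def tokenize(text):
    lexer = Lexer(text)
    result = []
    while lexer.pos < len(text):
        result.append(lexer.next_token())
    return result
```

Calling tokenize('aa"aa"a') returns WORD 'aa', QUOTE '"', TEXT 'aa', QUOTE '"', WORD 'a'. The same run of 'a' characters is a WORD outside the quotes and TEXT inside them, and the only thing that changes between the two cases is the start state held in current_lexer.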