But we sort of cheated. If we scanned an 'a', we'd be stuck. We wouldn't know whether to go to state 1, 5, 9, or 12. If we implemented this, we'd have to be prepared to try them all. Ladies and germs, this sucks to do. It's hard to code and easy to get burned by ambiguities.
I'm going to talk about some transformations to convert this NFA into an equivalent NFA that's easier to implement.
We can solve this problem using the following rule: If state x has an epsilon transition to state y, copy all of y's transitions to x and remove x's epsilon to y. Further, if state y is a stop state, make state x a stop state.
Applying this rule to state 0's epsilon to state 1, we create the following NFA.
0's epsilon to 1 is gone, and 0 has assumed 1's 'a' transition to 2.
Now we apply the rule to 0's other epsilon transitions to arrive at this: 's6'
Our NFA is now free of epsilons, so we're closer to having a recognizer that is easy to implement.
However, while removing the epsilons, we've created two new problems:
States 1, 5, 9, and 12 have no inputs. There is no way to get to them. Unreachable states are totally useless and should be removed. This is easy on paper, not always so easy in the computer. Depending on your implementation language, you have to act with varying amounts of vigor. In C, you might keep track of the data structures and free() them if they become unused. I use Java, so once an instance has no references the garbage collector reclaims it. I like not having to worry about it, but then again I can't control the Java's garbage collector. I'm ok with the tradeoff.
For an ambiguous input x connecting A x T1, A x T2,... AxTn, create a new state B and connect it to A using x, so AxB. Connect B to all the destination states Tn with epsilons. Yes, we're inserting epsilons back in and thus we'll have to re-run the epsilon-removing code. I'm ok with that. It's certainly possible to do something fancy and create B without epsilons. My take is that code correctness is king. Using the epsilon remover requires less code and thus probably causes fewer bugs. So we're going to create state number 16, and let 0->a->16, and then connect 16 to 2, 6, 10, and 13 using epsilons.
abd: 0->a->16->b->17->d->4, and 4's a stop state. Check.
acd: 0->a->16->c->7->d->8. 8's a stop state. Check.
ab: 0->a->16->b->17, and 17's a stop state (because we inherited that status from 11.)
abf: 0->a->16->b->17->f->15, stop. check.
The last transformation I employ is an identical state remover. If you have two states that have identical transitions and have the same stop states, those states are identical. One can be removed from the NFA with references to this state being replaced by references to the other. I invented a cool algorithm I'll share soon. (Yes, I realize some white-haired mathematician guy with frightening eyebrows probably invented it 50 years ago. But it is new to me!)
My current implementation features a method in the Nfa class where the epsilon remover, disambigufier, and identical-state combiner are run in a loop until the NFA quits changing. At this point the NFA has become a DFA and is easy to implement.
No comments:
Post a Comment