It's pretty easy to see that states 1 and 4, 2 and 5, and 3 and 6 are functionally identical. It gets less easy when you generate an NFA that has 200 states in it.
So how do we recognize when states are functionally identical?
My first stab was equivalent to trying to reassemble a broken ceramic sculpture by trying all the pieces and parts to see if they fit. It occured to me that testing a universe of disparate states for sameness was not going to be effective. If you have A->x->Z and B->x->Q, you can't tell if A and B are identical because at some stage in the process you may or may not have tested Z and Q for sameness.
It occured to me that it's easier to prove states are different than it is to prove they're the same.
The algorithm below starts off with all the states in a large group. We iterator over the states in the group, querying each for a 'signature' of it's transitions and such. We create new groups and put all states with the same signature in the same group. The signatures are not dependent upon only the states' transitions, but also on the destination group of those transitions. That is, S1->x->s10 and S2 -> x -> S9 look different. But if s10 and s9 are both in group 6, then signature(S1) could be something like "x->G7" and signature(S2) could be "x->G7". That's a match. S1 and S2 could be identical.
make a group G
for each state S
G.add(s), where this method includes
s.groupMember = G
while(the number of groups keeps changing)
for each group g
Regroup the states in g into g1, g2, ... gn
based upon signature(state). Whenever
adding a state to a group, we must
be sure to set s.groupMember = owningGroup
// All groups now contain only redundent states.
For each group G
make a new uber state S as a copy of any
associate S with G
// Change the references to the old obsolete
// states with the shiny new uber-states.
For each group G
for each of G's uberstate's transitions T
T.getDestination = T.getDestinationState.
startState = startState.getGroupMember.getUberstate
make a comparable representation of the transitions
and the group number of which the transition's
destination state is a member. If some state S has
transitions x->10, y->11, and z->12, and the groups
are grp#0 = (10 11) and grp#1 = (12), then the result
could be the (well delimited) "~x->0~~y->0~~z->1~"
Also add to the string if the state is stop state,
or any other identifying fact, such as the token
class to be produced by the state. For example
if two states are identical except one produces
a token class of A and the other produces a
token class of B, they aren't identical.
Two states that produce a different result can't
possibly be identical. Either they have different
inputs or the destinations fundamentally different.
Two states that do seem to be identical now could be
proven to be non-identical in a different iteration
of the loop.