14. Fanta Revisited
- Now that we know how the DFA and NFA engines work, let's revisit the
example we had before, and see exactly how we arrived at that strange
match.
- Let's look at the DFA engine first.
- For this example, I've introduced a new symbol in the DFA engine, "#".
This symbol is for marking the potential completed matches that the DFA
engine is keeping track of in the regex as it scans through the text
(remember how the DFA engine was keeping track of not only all the
potential matches, but also all the potential completed matches?)
- We were using the regex:
fanta(.*)n
on the target:
This isn't fantastic, let's drink fanta!
- Here's how the DFA engine would work:
Position in string | Match in regex
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
^ |
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
^ |
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
^ |
------------------------------------------------------------
etc.
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
^ | ^
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
*^ | ^
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
**^ | ^
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
***^ | ^
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
****^ | ^
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
*****^ | ^
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
******^ | ^
------------------------------------------------------------
etc.
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
********************^ | ^ ^
********************# | #
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
*********************^ | ^
********************# | #
------------------------------------------------------------
etc.
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
*************************^ | ^ ^
*************************# | #
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
**************************^ | ^
*************************# | #
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
***************************^ | ^
*************************# | #
------------------------------------------------------------
This isn't fantastic, let's drink fanta! | fanta(.*)n
****************************^ | ^
*************************# | #
------------------------------------------------------------
- See how as the engine scans through the text, it kees track of the positions in
the regex that are currently matching the text, including the last known "good"
match. This way, when it gets to a point in the text that no longer matches (in
this case, the end of the string), it can go back to the last known good match
and either carry on from there, or, as in this case, declare this point as the
match! (Of course, it could also report that there were no matches if this was
the case.)
|