21. Another Traditional NFA problem
- Another problen with the Traditional NFA engine other than non-greedy
alternation is when quantifiers are insufficiently greedy.
- For example, say you want to find some C code, like*:
#define print_it(name,index) \
fprintf(stderr,"%s bytes per sec = %12.2f (%5.1fuS)\n",name, \
tm[index]*8,1.0e6/tm[index]);
- First up, you might decide to search for:
\#define.*
but this would only get you the first line. So, it's tempting to
amend this to be:
\#define.*(\\\n.*)*
- In other words, look for the "#define", followed by zero or more of
anything, and then, zero or more occurrences of a literal backslash, then
newline, then zero or more of anything.
- This would give us all of the continuing lines as indicated by the
"\" at the end of each line. (Okay, we could fix it to cope with spaces after
the backslash, but just ignore that for now.)
- However, we know by now that the Traditional NFA engine will match that
first ".*" right up to the end of the line, including any backslashes
that may be there, and then nothing is going to make that star give anything
up, because everything inside the brackets is optional! So a Traditional
NFA engine won't find the longest match - it's insufficiently greedy.
- As an excercise, feel free to work through this with the DFA engine to see
how the DFA engine would end up matching the lot. A POSIX NFA engine
would also find the longest leftmost match.
*From FreeS/WAN 1.8
|