Previous

Next


21. Another Traditional NFA problem

  • Another problen with the Traditional NFA engine other than non-greedy alternation is when quantifiers are insufficiently greedy.
  • For example, say you want to find some C code, like*:
    #define print_it(name,index) \
            fprintf(stderr,"%s bytes per sec = %12.2f (%5.1fuS)\n",name, \
                    tm[index]*8,1.0e6/tm[index]);
    
  • First up, you might decide to search for:
      \#define.*
    
    but this would only get you the first line. So, it's tempting to amend this to be:
      \#define.*(\\\n.*)*
    
  • In other words, look for the "#define", followed by zero or more of anything, and then, zero or more occurrences of a literal backslash, then newline, then zero or more of anything.
  • This would give us all of the continuing lines as indicated by the "\" at the end of each line. (Okay, we could fix it to cope with spaces after the backslash, but just ignore that for now.)
  • However, we know by now that the Traditional NFA engine will match that first ".*" right up to the end of the line, including any backslashes that may be there, and then nothing is going to make that star give anything up, because everything inside the brackets is optional! So a Traditional NFA engine won't find the longest match - it's insufficiently greedy.
  • As an excercise, feel free to work through this with the DFA engine to see how the DFA engine would end up matching the lot. A POSIX NFA engine would also find the longest leftmost match.
*From FreeS/WAN 1.8

Previous

Next

Andrew Hill

For LinuxSA Meeting, 17 April 2001