20. Greediness and alternation
- Remember how I said there was a Traditional NFA and a POSIX NFA engine?
- Remember also our "earliest match wins" rule?
- Well, that rule isn't quite right - it should also state that not only
should the earliest match win, but of any potential matches at that
point, it should be the longest one that wins - which is clear enough,
given all the time we've spent looking at greediness.
- So, if you like, we can re-name this rule to be the "longest leftmost"
rule - we want the longest match that occurs the farthest to the left.
- This "longest leftmost" rule is a requirement for POSIX regex engines -
and Traditional NFA engines aren't POSIX NFA engines. A classic example
of a Traditional NFA engine's non-POSIXness is with greediness and
alternation.
- Consider the following target:
It's bad, isn't it?
and use the following regex:
(is|isn't)
- Here's what would happen:
Position in the regex | Position in the string
----------------------------------------------
(is|isn't) | It's bad, isn't it?
^ |^
----------------------------------------------
(is|isn't) | It's bad, isn't it?
^ | ^
----------------------------------------------
(is|isn't) | It's bad, isn't it?
^ | ^
----------------------------------------------
etc.
----------------------------------------------
(is|isn't) | It's bad, isn't it?
^ | ^
----------------------------------------------
(is|isn't) | It's bad, isn't it?
^^ | ^^
----------------------------------------------
- Which is a match, and a Traditional NFA engine will stop at this point, even though
a longer match is possible.
- Sure, by re-writing the regex to be:
(isn't|is)
you would get the longest leftmost, but you can see that it would be
pretty easy to slip up, and not get that result.
|