Previous

Next


13. The NFA Engine - An Example

  • Consider this regex:
      to(morrow|mmorrow|mmy)
    
    Searching this string with tomorrow mis-spelled:
      I hope this makes sense tommorrow.
    
  • Let's work through the example - in this case, I've used just the "^" character to show what is being matched on both sides. As with before, the regex engine is starting at the start of line position:
      Position in the regex   | Position in the string
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |^    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      | ^   
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |  ^  
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |   ^ 
      ------------------------------------------------------------
    
      etc.
    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |        ^     
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^                     |        ^^    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |        ^     
      ------------------------------------------------------------
    
  • Notice the engine has backtracked in the regular expression when it has found that the next character does not match. The engine goes back to the last place where a partial successful match occurred, and tries other options from there.
  • In this case, it goes back to the place where just the "t" matched, and the only possible thing it can do from there is to bump along to the next character.
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |         ^    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |          ^   
      ------------------------------------------------------------
    
      etc.
    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^                      |                         ^     
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^                     |                         ^^    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^ ^                   |                         ^^^   
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^ ^^                  |                         ^^^^  
      ------------------------------------------------------------
    
  • Again, when the next character in the string doesn't match, the regex engine backtracks to the last place in the regex where it was able to match, and carries on from there.
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^                     |                         ^^    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^        ^            |                         ^^^   
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^        ^^           |                         ^^^^  
      ------------------------------------------------------------
    
       etc.  
    
      ------------------------------------------------------------
       to(morrow|mmorrow|mmy) | I hope this makes sense tommorrow.
       ^^        ^^^^^^^      |                         ^^^^^^^^^
      ------------------------------------------------------------
    
  • Great! The NFA engine also matches!

Previous

Next

Andrew Hill

For LinuxSA Meeting, 17 April 2001