Previous

Next


16. Greediness and backtracking

  • Okay, now you know how the NFA engine works by trying the regex at the start of the target string, and bumping along the target as it needs to, and backtracking through the regex when it reaches a dead end.
  • However, you will need to be careful with backtracking and greediness - don't get caught out thinking that backtracking is going to happen in places where it won't. Let's think about this example. We want to search this string:
      Some 76294 number
    
    with this regex:
      ([0-9]+)
    
  • What do we get in $1?
      Position in the regex | Position in the string
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^                |^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^                | ^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^                |  ^
      ----------------------------------------------
    
      etc.
    
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^                |      ^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^ ^              |      ^^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^ ^              |      ^^^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^ ^              |      ^^^^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^ ^              |      ^^^^^
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^ ^              |      ^^^^^^
      ----------------------------------------------
    
  • Whoops! That's not right, so the NFA engine backtracks to give our final result of:
      ----------------------------------------------
       ([0-9]+)             | Some 76294 number
         ^^^ ^              |      ^^^^^
      ----------------------------------------------
    
    with "76294" in $1, just like we'd expect.
  • So what happens if we searched with
      ([0-9]*)
    
    instead?

Previous

Next

Andrew Hill

For LinuxSA Meeting, 17 April 2001