19. Greediness always favours matching - continued

  • 'Fraid not.
  • While our new regex, in this script:
      #!/usr/bin/perl
      $number = "2.3451245678";
      $number =~ s/(\.\d\d[1-9]?)\d+/$1/;
      print "$number\n";
      $number = "5.190653417532";
      $number =~ s/(\.\d\d[1-9]?)\d+/$1/;
      print "$number\n";
    
    still gives the same results:
      2.345
      5.19
    
    we now have a problem with:
      #!/usr/bin/perl
      $number = "2.562";
      $number =~ s/(\.\d\d[1-9]?)\d+/$1/;
      print "$number\n";
    
    which gives:
      2.56
    
  • Yikes! "2.562" has a non-zero third decimal place and nothing after it, so it should have come through unchanged. Where did that third decimal place go?
  • Let's work the example to see why:
      Position in the regex | Position in the string
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^                  |^
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^                  | ^
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^                  |  ^
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^                |  ^^
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^ ^              |  ^^^
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^ ^ ^^^          |  ^^^^
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^ ^ ^^^    ^     |  ^^^^
      ----------------------------------------------
    
  • Nope, that's no good. There's nothing left for "\d+" to match, and "+" means one or more times, so at least one digit is required. The engine therefore backtracks to the most recent saved state, the point where "[1-9]?" chose to match the "2":
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^ ^ ^^^          |  ^^^^
      ----------------------------------------------
    
    where it can try the only other option it saved: letting the optional "[1-9]?" match nothing, which gives the "2" back:
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^ ^              |  ^^^
      ----------------------------------------------
    
    and then it carries on from there, and this time "\d+" matches the "2":
      ----------------------------------------------
       (\.\d\d[1-9]?)\d+    | 2.562
         ^ ^ ^        ^     |  ^^^^
      ----------------------------------------------
    
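  • See how greediness always favours a match? "[1-9]?" grabs the "2" if it can, but gives it back when that is the only way for the overall match to succeed. The whole match is ".562", but "$1" captured only ".56", so the substitution replaces ".562" with ".56".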
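To check the walkthrough from inside Perl, here is a small sketch. The helper name trace_truncate is our own illustration, not from the talk. For each of the three numbers above it reports what the whole pattern matched ("$&"), what group 1 captured ("$1"), and the result of the substitution. It also tries a variant with the third digit made compulsory ("[1-9]" instead of "[1-9]?") to confirm that, for "2.562", that path leaves nothing for "\d+".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the talk): returns what the whole
# pattern matched ($&), what group 1 captured ($1), and the result
# of applying the substitution to a copy of the number.
sub trace_truncate {
    my ($number) = @_;
    my ($whole, $kept) = ('', '');
    if ($number =~ /(\.\d\d[1-9]?)\d+/) {
        ($whole, $kept) = ($&, $1);
    }
    (my $result = $number) =~ s/(\.\d\d[1-9]?)\d+/$1/;
    return ($whole, $kept, $result);
}

for my $number ("2.3451245678", "5.190653417532", "2.562") {
    my ($whole, $kept, $result) = trace_truncate($number);
    printf "%s: matched %s, kept %s -> %s\n", $number, $whole, $kept, $result;
}

# With the third digit compulsory there is no digit left over for \d+,
# so this pattern cannot match "2.562" at all.
print '2.562 =~ /(\.\d\d[1-9])\d+/: ',
      ("2.562" =~ /(\.\d\d[1-9])\d+/ ? "match" : "no match"), "\n";
```

If the walkthrough is right, "2.562" shows a whole match of ".562" but a captured "$1" of only ".56", and the compulsory-digit variant reports no match.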

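Perl can also produce its own version of this trace. The standard "re" pragma's 'debug' mode prints the compiled pattern and each matching step, including the backtracking, to STDERR. The exact layout of that output differs between Perl versions, so treat it as a companion to the table above rather than a copy of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re 'debug';   # dump regex compilation and matching steps to STDERR

my $number = "2.562";
$number =~ s/(\.\d\d[1-9]?)\d+/$1/;   # same substitution as in the slide
print "$number\n";                    # the truncated result goes to STDOUT
```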
Andrew Hill

For LinuxSA Meeting, 17 April 2001