
29. Metacharacters - Word Boundaries

  • Word boundaries are another commonly provided metasequence.
  • egrep, vi and GNU Emacs provide:
      \<
      \>
            
    for the start and end of words, respectively.
  • Perl and GNU Emacs provide:
      \b
      \B
            
    for word boundaries and non-word boundaries, respectively.
  • Hang on! Remember that a regular expression has no concept of sentences and words! So how can it recognise the start and end of a word?
  • Well, it can't. But it can recognise the start and end of a series of characters that are considered to be what make up a word. With most tools, this is all the alphabetic and numeric characters, or:
      [a-zA-Z0-9]
            
    Some will also include "_".
  • In the following example, the string "This is an example of Andrew's #%*&@ darn-good sen.ance." has been padded out with underscores, so that arrows pointing down to the start of each word and arrows pointing up to the end of each word can be inserted in the right places:
    
      v         v     v     v               v     v             v  
      _T_h_i_s_ _i_s_ _a_n_ _e_x_a_m_p_l_e_ _o_f_ _A_n_d_r_e_w_'_s_
              ^     ^     ^               ^     ^             ^   ^
    
                  v         v         v       v 
      _#_%_*_&_@_ _d_a_r_n_-_g_o_o_d_ _s_e_n_._a_n_c_e_._
                          ^         ^       ^         ^
            
  • These are like ^ and $ in a way: they do not match a specific character, but rather a position.
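
The word starts and ends marked in the diagram can be reproduced with a short Python sketch. This is only an illustration under a few assumptions: Python's re module is used here as a stand-in for the tools named above, it provides \b and \B but not \< and \>, and its \w counts "_" (and, for text patterns, non-ASCII letters and digits) as word characters. The lookarounds (?=\w) and (?<=\w) are a way of approximating \< and \>, not something those tools use themselves.

```python
import re

text = "This is an example of Andrew's #%*&@ darn-good sen.ance."

# A boundary followed by a word character is the start of a word (like \<).
starts = [m.start() for m in re.finditer(r"\b(?=\w)", text)]
# A boundary preceded by a word character is the end of a word (like \>).
ends = [m.start() for m in re.finditer(r"\b(?<=\w)", text)]

# Pair each start with its end to recover the "words" the regex engine sees.
words = [text[s:e] for s, e in zip(starts, ends)]
print(words)
# ['This', 'is', 'an', 'example', 'of', 'Andrew', 's',
#  'darn', 'good', 'sen', 'ance']

# \b keeps "is" from matching inside "This"; \B does the opposite.
whole_word = [m.start() for m in re.finditer(r"\bis\b", text)]
inside_word = [m.start() for m in re.finditer(r"\Bis\b", text)]
print(whole_word, inside_word)  # [5] [2]
```

Note that "Andrew's", "darn-good" and "sen.ance" each split into two words, exactly as the arrows show, because the apostrophe, hyphen and full stop are not word characters.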

Andrew Hill

For LinuxSA Meeting, 21 November 2000