Improving on the brute force string macthing algorithm

  • Consider the following state in a string matching operation:

      T = abadaabaccabacabaabb
      P = abacab
          ^  ^
          |  |        P[j] != T[s+j]
        s=0  j=3 

  • The brute force string matching algorithm will try the next character position:

      T = abadaabaccabacabaabb
      P =  abacab
           ^
           |
          s=s+1  <--- Can we do better than +1 ?
          j=0  

Using a partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ?????????a??????????
       P =      aaabxxyz       
                ^   ^
                |   | First mismatch occurs here
                s   j

  • We can know for a fact that ....

       T = ?????????a??????????
       P =      aaabxxyz  
                ^   ^
                |   |
                s   j

Using a partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ?????????a??????????
       P =      aaabxxyz       
                ^   ^
                |   | First mismatch occurs here
                s   j

  • All characters in the prefix before the mismatched character are same:

       T = ?????aaaba??????????
       P =      aaabxxyz  
                ^   ^
                |   |
                s   j

Using a partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ?????????a??????????
       P =      aaabxxyz       
                ^   ^
                |   | First mismatch occurs here
                s   j

  • The prefix aaab will cause a shift of 1 position to fail:

       T = ?????aaaba??????????
       P =       aaabxxyz  
                 ^    
                 |        because  aab != aaa
                s=s+1    

Using a partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ?????????a??????????
       P =      aaabxxyz       
                ^   ^
                |   | First mismatch occurs here
                s   j

  • The prefix aaab will cause a shift of 2 positions to fail:

       T = ?????aaaba??????????
       P =        aaabxxyz  
                  ^    
                  |       because   ab != aa
                 s=s+2    

Using a partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ?????????a??????????
       P =      aaabxxyz       
                ^   ^
                |   | First mismatch occurs here
                s   j

  • The prefix aaab will cause a shift of 3 positions to fail:

       T = ?????aaaba??????????
       P =         aaabxxyz  
                   ^    
                   |       because    b != a 
                  s=s+3    

Using a partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ?????????a??????????
       P =      aaabxxyz       
                ^   ^
                |   | First mismatch occurs here
                s   j

  • The earliest possible match must start here:

       T = ?????aaaba??????????
       P =          aaabxxyz  
                    ^    
                    | The prefix aaab will cause a shift of +4 !
                   s=s+4    

Example 2 on using partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ????????????X??????????
       P =     aaaaabaaxaaxyz 
               ^       ^
               |       | First mismatch occurs here
               s       j

  • We can know for a fact that ....

       T = ????????????X??????????
       P =     aaaaabaaxaaxyz 
               ^       ^
               |       |
               s       j

Example 2 on using partially matched prefix to slide the pattern

  • Consider the following state in a string matching operation:

       T = ????????????X??????????
       P =     aaaaabaaxaaxyz 
               ^       ^
               |       | First mismatch occurs here
               s       j

  • All characters in the prefix before the mismatched character are same:

       T = ????aaaaabaaX??????????
       P =     aaaaabaaxaaxyz 
               ^       ^
               |       |
               s       j

Example 2 on using partially matched prefix to slide the pattern

We can know for sure that no match can occur at these positions:

   T = ????aaaaabaaX??????????
   P =      aaaaabaaxaaxyz 

   T = ????aaaaabaaX??????????
   P =       aaaaabaaxaaxyz 

   T = ????aaaaabaaX??????????
   P =        aaaaabaaxaaxyz

   T = ????aaaaabaaX??????????
   P =         aaaaabaaxaaxyz 

   T = ????aaaaabaaX??????????
   P =          aaaaabaaxaaxyz  

The earliest possible match starts here:

   T = ????aaaaabaaX??????????
   P =           aaaaabaaxaaxyz