Notations used in string matching algorithms


     T = CGTGCCTACTTACTTACTTACTTACGCGAA         len(T) = n
     P =       CTTACTTAC                        len(P) = m
               ^   ^
	       |   |
               s   j


  T = text (where you want to find the pattern)
  P = pattern

  ASIZE = size of the alphabet
          (All characters from P and T must belong to the alphabet)

  n = length (# chars) of T
  m = length (# chars) of P

  s = shift of P relative to T
  j = charactor position to the next character in the matching 

The brute force pattern matching algorithm

int bruteForce(string T, string P, int s)  // s = start position
{
    int n = T.length(), m = P.length();

    int j = 0;  // Start match with character pos = 0

    while ( s+(m-1) < n )  // Last char pos in P is valid character in T
    {
        if ( P[j] == T[s+j] )  
        { 
            j++;              // Continue with next character
            if ( j == m )
                return(s);    // Pattern found at pos s
        }
        else
        {
            s += 1;     // Use next shift position
            j = 0;      // reset to 1st char in pattern to match
        }
    }
    return -1;          // Pattern not found
}

Test program for the brute force pattern matching algorithm

#include <iostream>
#include <string>

using namespace std;

int main()
{
    string T = "CGTGCCTACTTACTTACTTACTTACGCGAA";
    string P = "CTTACTTAC";

    int s = -1;
    while ( (s = bruteForce(T, P, s+1)) != -1 )
    {
        cout << P << " found at position " << s << endl;
    }
}