Notations used in string matching algorithms
T = CGTGCCTACTTACTTACTTACTTACGCGAA len(T) = n
P = CTTACTTAC len(P) = m
^ ^
| |
s j
T = text (where you want to find the pattern)
P = pattern
ASIZE = size of the alphabet
(All characters from P and T must belong to the alphabet)
n = length (# chars) of T
m = length (# chars) of P
s = shift of P relative to T
j = charactor position to the next character in the matching
|
The
brute force
pattern matching algorithm
int bruteForce(string T, string P, int s) // s = start position
{
int n = T.length(), m = P.length();
int j = 0; // Start match with character pos = 0
while ( s+(m-1) < n ) // Last char pos in P is valid character in T
{
if ( P[j] == T[s+j] )
{
j++; // Continue with next character
if ( j == m )
return(s); // Pattern found at pos s
}
else
{
s += 1; // Use next shift position
j = 0; // reset to 1st char in pattern to match
}
}
return -1; // Pattern not found
}
|
Test program for the
brute force
pattern matching algorithm
#include <iostream>
#include <string>
using namespace std;
int main()
{
string T = "CGTGCCTACTTACTTACTTACTTACGCGAA";
string P = "CTTACTTAC";
int s = -1;
while ( (s = bruteForce(T, P, s+1)) != -1 )
{
cout << P << " found at position " << s << endl;
}
}
|
❮
❯