0

I am writing a function to search in the binary array(bitmap file) a pattern of bits. The size of the pattern is from 5 to 8 bit long. I have implemented this function as testing for each bit in the array and in the pattern. However, it is not as efficient as it should be.

First of all I want to implement this code in C.



Point* FindPattern(imgInfo* pImg, int pSize, int* ptrn, Point* pDst, int* fCnt)
{
    int i, j, k, l;
    int mask;
    int rx = pSize >> 16;
    int ry = pSize & 0xFFFF;

    *fCnt = 0;
    for (i=0; i < pImg->height - ry; ++i)
        for (j=0; j < pImg->width - rx; ++j)
        {
            // for a rectangle with upper lefr corner in (i,j)
            // check if there is pattern in image
            for (k=0; k < ry; ++k)
            {
                mask = 1 << (rx - 1);
                for (l=0; l < rx; ++l, mask >>= 1)
                    if (GetPixel(pImg, j+l, i+k) != ((ptrn[k] & mask) != 0))
                        break;
                if (l < rx) // pattern not found
                    break;
            }
            if (k >= ry) //pattern found
            {
                pDst[*fCnt].x = j;
                pDst[*fCnt].y = i;
                ++(*fCnt);
            }
        }

For example I have such binary string: 1111 1111 1010 0000 0111 1111 1111 1111

and I am looking for pattern: 0100 0000

so what is the most efficient way to detect such a pattern in a string? By shifting the bits of the pattern and the string and than perfrom XOR on them?

1 Answer 1

0

Finding a given pattern can be done with a number of steps equal to the number of bits in the pattern. I do not know if it is most efficient way to detect a pattern, but for not too large patterns, it may be competitive.

As an example, consider your pattern 0100 0000.
It must be found in bitstring bs and we call bs[i] the ith bit of bs.
The pattern is matched at a given position i iff

bs[i] is false (0)
and bs[i+1] is false (0)
and bs[i+2] is false (0)
and bs[i+3] is false (0)
and bs[i+4] is false (0)
and bs[i+5] is false (0)
and bs[i+6] is true (1)
and bs[i+7] is false (0)

This can be turned to the logical expression
~ bs[i] & ~ bs[i+1] & ~ bs[i+2] & ~ bs[i+3] & ~ bs[i+4] & ~ bs[i+5] & bs[i+6] & ~ bs[i+7]
where there is logical complement ~ when there is a 0 in the pattern.

This can be rewritten with right bit shift to get the expression :
~ bs[i] & ~ ((bs>>)[i]) & ~ ((bs>>2)[i]) & ~ ((bs>>3)[i]) & ~ ((bs>>4)[i]) & ~ ((bs>>5)[i]) & ((bs>>6)[i]) & ~ ((bs>>7)[i])
where (bs>>k)[i] is the ith bit of bs shifted k steps rightwards.

From this we can deduce the following C code

#include <stdio.h>

unsigned int findpattern(unsigned int bitstring, unsigned int pattern, 
                         unsigned int patternsize) {
  unsigned int match=~0;
  for(int i=0; i<patternsize; i++){
    match &= ((pattern&0x1)-1) ^ (bitstring);
    pattern >>=1;
    bitstring >>=1;
  }

  return match;
}

int main() {
  unsigned int bitstring=0xffa07fff;
  unsigned int pattern=0x40;

  unsigned int match=findpattern(bitstring,pattern,8);

  if (! match) 
    printf("No match for %x in %x\n",pattern, bitstring);
  else 
    printf("Matches found for %x in %x : %.8x\n", pattern, bitstring, match);
}

The function findpattern returns an int where the ith bit is set if the pattern has been found at position i. In no pattern is found match is zero.

The idea is just to scan the successive bits of pattern by right shifting pattern. At any time if lsb is set, we AND the result with a properly shifted version of bitstring and if lsb of pattern is unset we AND it with the complement of the shifted bit string.
The complementation is done by xoring with (pattern&1)-1. If lsb is set, it is a xor with 1-1=0 (identity), otherwise a xor with -1 (equivalent of ~).

Note that there may be false matches in the higher bits as the bitstring is somehow artificially extended by (patternsize-1) zeros on the left and this may create problems with some combination of bitstring/pattern. That is the reason of the final mask that clears (pattersize-1) leftmost bits of match as it is impossible to find a match beyond bit 32-patternsize. For this reason match is anded with (2^(32-(patternsize-1)) that is a number formed of ones, with patternsize-1 zeros at leftmost positions.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.