Pattern recognition

Pattern Matching

Normalized grayscale correlation is widely used in industry for pattern matching applications. Although in many cases you do not need to know how the search operation is performed, an understanding of the algorithm can sometimes help you pick an optimal search strategy.

Normalized Correlation

The correlation operation can be seen as a form of convolution, where the pattern matching model is analogous to the convolution kernel (see Chapter 5: Image manipulation). In fact, ordinary (un-normalized) correlation is exactly the same as a convolution:

In other words, for each result, the N pixels of the model are multiplied by the N underlying image pixels, and these products are summed. Note, the model doesn’t have to be rectangular, because it can contain "don’t care" pixels that are completely ignored during the calculation. When the correlation function is evaluated at every pixel in the target image, the locations where the result is largest are those where the surrounding image is most similar to the model. The search algorithm then has to locate these peaks in the correlation result, and return their positions.
Unfortunately, with ordinary correlation, the result increases if the image gets brighter. In fact, the function reaches a maximum when the image is uniformly white, even though at this point it no longer looks like the model. The solution is to use a more complex, normalized version of the correlation function (the subscripts have been removed for clarity, but the summation is still over the N model pixels that are not "don’t cares"):

With this expression, the result is unaffected by linear changes (constant gain and offset) in the image or model pixel values. The result reaches its maximum value of 1 where the image and model match exactly, gives 0 where the model and image are uncorrelated, and is negative where the similarity is less than might be expected by chance.
In our case, we are not interested in negative values, so results are clipped to 0. In addition, we use r 2 instead of r to avoid the slow square-root operation. Finally, the result is converted to a percentage, where 100% represents a perfect match. So, the match score returned by MpatGetResult() is actually:
Score = max (r, 0)2 x 100%
Note, some of the terms in the normalized correlation function depend only on the model, and hence can be evaluated once and for all when the model is defined. The only terms that need to be calculated during the search are:

This amounts to two multiplications and three additions for each model pixel.
On a typical PC the multiplications alone account for most of the computation time. A typical application might need to find a 128x128-pixel model in a 512x512-pixel image. In such a case, the total number of multiplications needed for an exhaustive search is 2x512 2 x128 2 , or over 8 billion. On a 66MHz 80486 processor, this would take approximately 30 minutes, much more than the 100 milli-seconds or so the search actually takes. Clearly, MpatFindModel() does much more than simply evaluate the correlation function at every pixel in the search area and return the location of the peak scores.

Hierarchical Search

A reliable method of reducing the number of computations is to perform a so-called hierarchical search. Basically, a series of smaller, lower-resolution versions of both the image and the model are produced, and the search begins on a much-reduced scale. This series of sub-sampled images is sometimes called a resolution pyramid, because of the shape formed when the different resolution levels are stacked on top of each other.

Each level of the pyramid is half the size of the previous one, and is produced by applying a low-pass filter before sub-sampling. If level 0 (the original image or model size) is 512x512, then level 1 is 256x256, level 2 is 128x128, and so on. Therefore, the higher the level in the pyramid, the lower the resolution of the image and model.

The search starts at low resolution to quickly find likely match candidates. It proceeds to higher and higher resolution to refine the positional accuracy and make sure that the matches found at low resolution actually were occurrences of the model. Because the position is already known from the previous level (to within a pixel or so), the correlation function need be evaluated only at a very small number of locations.

Since each higher level in the pyramid reduces the number of computations by a factor of 16, it is usually desirable to start at as high a level as possible. However, the search algorithm must trade off the reduction in search time against the increased chance of not finding the pattern at very low resolution. Therefore, it chooses a starting level according to the size of the model and the characteristics of the pattern. In the application described earlier (128x128 model and 512x512 image), it might start the search at level 4, which would mean
using an 8x8 version of the model and a 32x32 version of the target image. You can, if desired, force a specific starting level, using MpatSetSearchParameter() .

The last (lowest) level used is usually determined by the specified positional accuracy; however, you can set this explicitly, using MpatSetSearchParameter().

The logic of a hierarchical search accounts for a seemingly counter-intuitive characteristic of MpatFindModel(): large models tend to be found faster than small ones. This is because a small model cannot be sub-sampled to a large extent without losing all detail. Therefore, the search must begin at fairly high resolution (low level), where the relatively large search area results in a longer search time. Thus, small models can only be found quickly in fairly small search areas.

Search Heuristics

Even though performed at very low resolution, the initial search still accounts for most of the computation time if the correlation is performed at every pixel in the search area. On most models, match peaks (pixel locations where the surrounding image is most similar to the model and correlation results are largest) are several pixels wide. These can be found without evaluating the correlation function everywhere. MpatPreprocModel() analyses the shape of the match peak produced by the model, and determines if it is safe to try to find peaks faster. If the pattern produces a very narrow match peak (or the model was not pre-processed), an exhaustive initial search is performed. The search algorithm tends to be conservative; if necessary, force fast peak finding, using
MpatSetSearchParameter().

At the last (high-resolution) stage of the search, the model is large, so this stage can take a significant amount of time, even though the correlation function is evaluated at only a very few points. To save time, you can select high search speed, using MpatSetSpeed(). Only every second model pixel will be used. For most models, this has little effect on the score or accuracy, but does increase speed.

Sub-Pixel Accuracy

The highest match score occurs at only one pixel position, and drops off around this peak. The exact (sub-pixel) position of the model can be estimated from the match scores found on either side of the peak. A curve is fitted to the match scores around the peak and, from the equation of the curve, the exact peak position is calculated. The curve is also used to improve the estimate of the match score itself, which should be slightly higher at the true peak position than the actual measured value at the nearest whole pixel.

The actual accuracy that can be obtained depends on several factors, including the noise in the image and the particular pattern of the model. However, if these factors are ignored, the absolute limit on accuracy, imposed by the algorithm itself and by the number of bits and precision used to hold the correlation
result, is about 0.05 pixels. This is the worst-case error measured in X or Y when an image is artificially shifted by fractions of a pixel. In a real application, accuracy better than 0.1 pixel might be achieved for low-noise images; however, it is better not to rely on more than a 0.125 pixel accuracy. These numbers apply if you select high-search accuracy, using MpatSetAccuracy(), in which case the search always proceeds to resolution level 0.
If you select medium accuracy (the default), the search may stop at resolution level 1, and hence the accuracy is half of what can be attained at level 0. Selecting low accuracy may cause the search to stop at level 2, so the accuracy is reduced by an additional factor of two (to about 0.5 pixel).