tools

Tools

Certain tools are central to the processing of digital images. These include mathematical tools such as convolution, Fourier analysis, and statistical descriptions, and manipulative tools such as chain codes and run codes. We will present these tools without any specific motivation. The motivation will follow in later sections.

Convolution

There are several possible notations to indicate the convolution of two (multi-dimensional) signals to produce an output signal. The most common are:

We shall use the first form, , with the following formal definitions.

In 2D continuous space:

In 2D discrete space:

Properties of Convolution

There are a number of important mathematical properties associated with convolution.

* Convolution is commutative.

* Convolution is associative.

* Convolution is distributive.

where a, b, c, and d are all images, either continuous or discrete.

Fourier Transforms

The Fourier transform produces another representation of a signal, specifically a representation as a weighted sum of complex exponentials. Because of Euler's formula:

where , we can say that the Fourier transform produces a representation of a (2D) signal as a weighted sum of sines and cosines. The defining formulas for the forward Fourier and the inverse Fourier transforms are as follows. Given an image a and its Fourier transform A, then the forward transform goes from the spatial domain (either continuous or discrete) to the frequency domain which is always continuous.

Forward -

The inverse Fourier transform goes from the frequency domain back to the spatial domain.

Inverse -

The Fourier transform is a unique and invertible operation so that:

The specific formulas for transforming back and forth between the spatial domain and the frequency domain are given below.

In 2D continuous space:

Forward -

Inverse -

In 2D discrete space:

Forward -

Inverse -

Properties of Fourier Transforms

There are a variety of properties associated with the Fourier transform and the inverse Fourier transform. The following are some of the most relevant for digital image processing.

* The Fourier transform is, in general, a complex function of the real frequency variables. As such the transform can be written in terms of its magnitude and phase.

* A 2D signal can also be complex and thus written in terms of its magnitude and phase.

* If a 2D signal is real, then the Fourier transform has certain symmetries.

The symbol (*) indicates complex conjugation. For real signals eq. leads directly to:

* If a 2D signal is real and even, then the Fourier transform is real and even.

* The Fourier and the inverse Fourier transforms are linear operations.

where a and b are 2D signals (images) and w₁ and w₂ are arbitrary, complex constants.

* The Fourier transform in discrete space, A( , ), is periodic in both and . Both periods are 2 .

* The energy, E, in a signal can be measured either in the spatial domain or the frequency domain. For a signal with finite energy:

Parseval's theorem (2D continuous space):

Parseval's theorem (2D discrete space):

This "signal energy" is not to be confused with the physical energy in the phenomenon that produced the signal. If, for example, the value a[m,n] represents a photon count, then the physical energy is proportional to the amplitude, a, and not the square of the amplitude. This is generally the case in video imaging.

* Given three, multi-dimensional signals a, b, and c and their Fourier transforms A, B, and C:

In words, convolution in the spatial domain is equivalent to multiplication in the Fourier (frequency) domain and vice-versa. This is a central result which provides not only a methodology for the implementation of a convolution but also insight into how two signals interact with each other--under convolution--to produce a third signal. We shall make extensive use of this result later.

* If a two-dimensional signal a(x,y) is scaled in its spatial coordinates then:

* If a two-dimensional signal a(x,y) has Fourier spectrum A(u,v) then:

Importance of phase and magnitude

Equation indicates that the Fourier transform of an image can be complex. This is illustrated below in Figures 4a-c. Figure 4a shows the original image a[m,n], Figure 4b the magnitude in a scaled form as log(|A( , )|), and Figure 4c the phase ( , ).

Figure 4a Figure 4b Figure 4c Original log(|A( , )|) ( , )

Both the magnitude and the phase functions are necessary for the complete reconstruction of an image from its Fourier transform. Figure 5a shows what happens when Figure 4a is restored solely on the basis of the magnitude information and Figure 5b shows what happens when Figure 4a is restored solely on the basis of the phase information.

Figure 5a Figure 5b ( , ) = 0 |A( , )| = constant

Neither the magnitude information nor the phase information is sufficient to restore the image. The magnitude-only image (Figure 5a) is unrecognizable and has severe dynamic range problems. The phase-only image (Figure 5b) is barely recognizable, that is, severely degraded in quality.

Circularly symmetric signals

An arbitrary 2D signal a(x,y) can always be written in a polar coordinate system as a(r, ). When the 2D signal exhibits a circular symmetry this means that:

where r² = x² + y² and tan = y/x. As a number of physical systems such as lenses exhibit circular symmetry, it is useful to be able to compute an appropriate Fourier representation.

The Fourier transform A(u, v) can be written in polar coordinates A( _r, ) and then, for a circularly symmetric signal, rewritten as a ankel transform:

where and J_o(*) is a Bessel function of the first kind of order zero.

The inverse ankel transform is given by:

The Fourier transform of a circularly symmetric 2D signal is a function of only the radial frequency, _r. The dependence on the angular frequency, , has vanished. Further, if a(x,y) = a(r) is real, then it is automatically even due to the circular symmetry. According to equation , A( _r) will then be real and even.

Examples of 2D signals and transforms

Table 4 shows some basic and useful signals and their 2D Fourier transforms. In using the table entries in the remainder of this chapter we will refer to a spatial domain term as the point spread function (PSF) or the 2D impulse response and its Fourier transforms as the optical transfer function (OTF) or simply transfer function. Two standard signals used in this table are u(*), the unit step function, and J₁(*), the Bessel function of the first kind. Circularly symmetric signals are treated as functions of r as in eq. .

Statistics

In image processing it is quite common to use simple statistical descriptions of images and sub-images. The notion of a statistic is intimately connected to the concept of a probability distribution, generally the distribution of signal amplitudes. For a given region--which could conceivably be an entire image--we can define the probability distribution function of the brightnesses in that region and the probability density function of the brightnesses in that region. We will assume in the discussion that follows that we are dealing with a digitized image a[m,n].

Probability distribution function of the brightnesses

The probability distribution function, P(a), is the probability that a brightness chosen from the region is less than or equal to a given brightness value a. As a increases from - to + , P(a) increases from 0 to 1. P(a) is monotonic, non-decreasing in a and thus dP/da >= 0.

Probability density function of the brightnesses

The probability that a brightness in a region falls between a and a+ a, given the probability distribution function P(a), can be expressed as p(a) a where p(a) is the probability density function:


T.1 Rectangle
	picture 1	picture 2


T.2 Pyramid
	picture 1	picture 2


T.3 Pill Box
	picture 1	picture 2


T.4 Cone
	picture 1	picture 2


T.5 Airy PSF
	picture 1	picture 2


T.6 Gaussian
	picture 1	picture 2


T.7 Peak
	picture 1	picture 2

T.8 Exponential

Decay

picture 1

picture 2

Table 4: 2D Images and their Fourier Transforms

Because of the monotonic, non-decreasing character of P(a) we have that:

For an image with quantized (integer) brightness amplitudes, the interpretation of a is the width of a brightness interval. We assume constant width intervals. The brightness probability density function is frequently estimated by counting the number of times that each brightness occurs in the region to generate a histogram, h[a]. The histogram can then be normalized so that the total area under the histogram is 1 (eq. ). Said another way, the p[a] for a region is the normalized count of the number of pixels, , in a region that have quantized brightness a:

The brightness probability distribution function for the image shown in Figure 4a is shown in Figure 6a. The (unnormalized) brightness histogram of Figure 4a which is proportional to the estimated brightness probability density function is shown in Figure 6b. The height in this histogram corresponds to the number of pixels with a given brightness.

(a) (b)

Figure 6: (a) Brightness distribution function of Figure 4a with minimum, median, and maximum indicated. See text for explanation. (b) Brightness histogram of Figure 4a.

Both the distribution function and the histogram as measured from a region are a statistical description of that region. It must be emphasized that both P[a] and p[a] should be viewed as estimates of true distributions when they are computed from a specific region. That is, we view an image and a specific region as one realization of the various random processes involved in the formation of that image and that region. In the same context, the statistics defined below must be viewed as estimates of the underlying parameters.

Average

The average brightness of a region is defined as the sample mean of the pixel brightnesses within that region. The average, m_a, of the brightnesses over the pixels within a region ( ) is given by:

Alternatively, we can use a formulation based upon the (unnormalized) brightness histogram, h(a) = *p(a), with discrete brightness values a. This gives:

The average brightness, m_a, is an estimate of the mean brightness, u_a, of the underlying brightness probability distribution.

Standard deviation

The unbiased estimate of the standard deviation, s_a, of the brightnesses within a region ( ) with pixels is called the sample standard deviation and is given by:

Using the histogram formulation gives:

The standard deviation, s_a, is an estimate of _a of the underlying brightness probability distribution.

Coefficient-of-variation

The dimensionless coefficient-of-variation, CV, is defined as:

Percentiles

The percentile, p%, of an unquantized brightness distribution is defined as that value of the brightness a such that:

P(a) = p%

or equivalently

Three special cases are frequently used in digital image processing.

* 0% the minimum value in the region

* 50% the median value in the region

* 100% the maximum value in the region

All three of these values can be determined from Figure 6a.

Mode

The mode of the distribution is the most frequent brightness value. There is no guarantee that a mode exists or that it is unique.

SignaltoNoise ratio

The signal-to-noise ratio, SNR, can have several definitions. The noise is characterized by its standard deviation, s_n. The characterization of the signal can differ. If the signal is known to lie between two boundaries, a_min <= a <= a_max, then the SNR is defined as:

Bounded signal -

If the signal is not bounded but has a statistical distribution then two other definitions are known:

Stochastic signal - S & N inter-dependent

S & N independent

where m_a and s_a are defined above.

The various statistics are given in Table 5 for the image and the region shown in Figure 7.

Figure 7 Table 5 Region is the interior of the circle. Statistics from Figure 7

A SNR calculation for the entire image based on eq. is not directly available. The variations in the image brightnesses that lead to the large value of s (=49.5) are not, in general, due to noise but to the variation in local information. With the help of the region there is a way to estimate the SNR. We can use the s (=4.0) and the dynamic range, a_max - a_min, for the image (=241-56) to calculate a global SNR (=33.3 dB). The underlying assumptions are that 1) the signal is approximately constant in that region and the variation in the region is therefore due to noise, and, 2) that the noise is the same over the entire image with a standard deviation given by s_n = s.

Contour Representations

When dealing with a region or object, several compact representations are available that can facilitate manipulation of and measurements on the object. In each case we assume that we begin with an image representation of the object as shown in Figure 8a,b. Several techniques exist to represent the region or object by describing its contour.

Chain code

This representation is based upon the work of Freeman . We follow the contour in a clockwise manner and keep track of the directions as we go from one contour pixel to the next. For the standard implementation of the chain code we consider a contour pixel to be an object pixel that has a background (non-object) pixel as one or more of its 4-connected neighbors. See Figures 3a and 8c.

The codes associated with eight possible directions are the chain codes and, with x as the current contour pixel position, the codes are generally defined as:

Figure 8: Region (shaded) as it is transformed from (a) continuous to (b) discrete form and then considered as a (c) contour or (d) run lengths illustrated in alternating colors.

Chain code properties

* Even codes {0,2,4,6} correspond to horizontal and vertical directions; odd codes {1,3,5,7} correspond to the diagonal directions.

* Each code can be considered as the angular direction, in multiples of 45deg., that we must move to go from one contour pixel to the next.

* The absolute coordinates [m,n] of the first contour pixel (e.g. top, leftmost) together with the chain code of the contour represent a complete description of the discrete region contour.

* When there is a change between two consecutive chain codes, then the contour has changed direction. This point is defined as a corner.

Crack code

An alternative to the chain code for contour encoding is to use neither the contour pixels associated with the object nor the contour pixels associated with background but rather the line, the "crack", in between. This is illustrated with an enlargement of a portion of Figure 8 in Figure 9.

The "crack" code can be viewed as a chain code with four possible directions instead of eight.

(a) (b)

Figure 9: (a) Object including part to be studied. (b) Contour pixels as used in the chain code are diagonally shaded. The "crack" is shown with the thick black line.

The chain code for the enlarged section of Figure 9b, from top to bottom, is {5,6,7,7,0}. The crack code is {3,2,3,3,0,3,0,0}.

Run codes

A third representation is based on coding the consecutive pixels along a row--a run--that belong to an object by giving the starting position of the run and the ending position of the run. Such runs are illustrated in Figure 8d. There are a number of alternatives for the precise definition of the positions. Which alternative should be used depends upon the application and thus will not be discussed here.