CHAPTER 2

Algorithm Simplification Strategies

2.1 INTRODUCTION

An algorithm is simply a set of prescribed rules or procedures used to solve a given problem [103, 130]. Although different algorithms may exist for solving a given image/video processing problem, when transitioning to a real-time implementation, having efficient algorithms takes higher precedence. Efficiency implies low computational complexity as well as low memory and power requirements. Due to the vast amounts of data associated with digital images and video, developing algorithms that can deal with such amounts of data in a computation-, memory-, and power-efficient manner is a challenging task, especially when the algorithms are meant for real-time deployment on resource-constrained embedded platforms.

Since algorithms are usually prototyped in development environments not suffering from resource constraints, they often have to be “optimized” for achieving real-time performance on a given hardware platform. While special hardware and software optimization techniques can be used to realize a real-time version of an algorithm, in general, greater gains in performance are obtained through simplifications at the algorithmic level [1, 124]. Such modifications or simplifications performed at the algorithmic level help to streamline the algorithm down to its core functionality, which not only leads to a lower computational complexity but also to lower memory and power requirements.

Thus, the very first step in transitioning an algorithm from a research environment to a real-time environment involves applying simplification strategies to the algorithm. It is more effective to perform these simplifications while still working in the research development environment, which offers greater design flexibility than the implementation environment.

As the first step toward transitioning an algorithm to a real-time implementation, this chapter presents strategies for achieving algorithmic simplification, along with relevant examples from the literature that exhibit successful applications of these strategies.

2.2 CORE SIMPLIFICATION CONCEPTS

Examining the literature on real-time image and video processing reveals three major concepts addressing algorithmic simplifications. These strategies include the following:

reduction in number of operations;

reduction in amount of data to be processed; and

utilization of simple or simplified algorithms.

Reductions in operations and in amount of data to be processed are the most common simplification strategies for transitioning an algorithm to a real-time implementation. While there are many different manifestations of the reduction strategies, the discussion here provides a general classification of the strategies that are scattered throughout the literature. Each of these concepts will be expanded upon in the following subsections.

2.2.1 Reduction in Number of Operations

Due to the enormous amount of data involved in image/video processing, every operation counts, especially the time-consuming operations at the lower levels of the processing hierarchy. Thus, reduction in the number of operations plays a major role in achieving real-time performance.

2.2.1.1 Pure Operation Reduction

The strategy of pure operation reduction involves applying a transformation to reduce the number of operations, which does not change the numerical outcome. If the numerical outcome is changed, then the approach is referred to as either an approximation or a suboptimal/alternative solution as discussed in the next subsection. Any operation that has inherent symmetries or redundancies in its constituent computations is a candidate for applying this strategy.

Application of this strategy manifests itself in uncovering hidden symmetries or redundancies within the computations, which can often be discovered by expanding the computations by hand and carefully noting any mathematical identities or properties [1].

An example of the reduction in number of operations through inherent computation symmetries or redundancies can be found in sliding window operations. Oftentimes, the extraction of local statistics via sliding windows involves a heavy amount of redundant computations that can be reduced by formulating a recursive approach. A common technique is to keep a running accumulator, where new results are obtained by subtracting the results of the leftmost column of the sliding window from the accumulator, sliding the window to the right to the next position, and adding the result of the new set of pixels in the rightmost column of the window to the accumulator. Another example of reduction in the number of operations arises in linear filtering utilizing a specific filter kernel. Symmetries in kernel coefficients can be used to reduce the amount of computations without changing the numerical outcome. Reduction of computations can also be achieved by rearranging or regrouping the computations such that the occurrence of the more time-consuming operations such as multiplication and division is reduced.
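
As a rough illustration of this running-accumulator technique, the following sketch (a hypothetical NumPy example, not drawn from any particular cited work) computes local sums over a sliding window by subtracting the outgoing column sum and adding the incoming column sum at each shift, instead of re-summing the whole window.

```python
import numpy as np

def sliding_window_sums(image, w):
    """Local sums over a w x w window, updated recursively along each row.

    At each new window position, the leftmost column sum is subtracted
    from the accumulator and the incoming rightmost column sum is added,
    avoiding a full re-summation of the window.
    """
    h, width = image.shape
    out = np.zeros((h - w + 1, width - w + 1), dtype=np.int64)
    # Vertical (column) sums of height w for every column; these could
    # themselves be updated recursively from row to row.
    col_sums = np.array([image[r:r + w, :].sum(axis=0)
                         for r in range(h - w + 1)])
    for r in range(h - w + 1):
        acc = col_sums[r, :w].sum()          # full sum for the first window
        out[r, 0] = acc
        for c in range(1, width - w + 1):
            acc += col_sums[r, c + w - 1] - col_sums[r, c - 1]
            out[r, c] = acc
    return out

frame = np.random.randint(0, 256, (64, 64))
print(sliding_window_sums(frame, 5).shape)   # (60, 60)
```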

Throughout the literature, time-consuming operations including multiplication and division as well as other so-called “expensive” operations are considered the main targets for reduction. These operations are regarded as “expensive” due to the fact that they usually require more time to execute than other operations such as addition or bit-shifting. Thus, they present obstacles to achieving real-time performance. Researchers often seek to restructure an algorithm in such a way to reduce the amount of multiplications or divisions, sometimes even replacing these operations with simple bit-shifting to carry out multiplications or divisions by powers of 2.

The use of computationally simple operations is one of the key steps toward achieving real-time performance. Other times, a calculation can be cleverly factored, causing a reduction in the number of operations [12].
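
As a simple illustration of these two points, namely replacing multiplications and divisions by powers of 2 with bit-shifts and factoring a calculation, consider the following hypothetical snippet; the pixel values and constants are arbitrary.

```python
import numpy as np

# Hypothetical illustration: scaling integer pixel values by powers of 2
# using bit-shifts instead of multiplication and division.
pixels = np.array([10, 52, 130, 255], dtype=np.uint16)

doubled = pixels << 1       # same result as pixels * 2
quarter = pixels >> 2       # same result as pixels // 4 (integer division)

# Fixed-point scaling by 3/4 with shifts and one addition, avoiding a
# floating-point multiplication: (2x + x) / 4.
three_quarters = ((pixels << 1) + pixels) >> 2

# Factoring also reduces operation counts: a*b + a*c uses two
# multiplications, whereas the factored form a*(b + c) uses one.
a, b, c = 7, 12, 30
factored = a * (b + c)

print(doubled, quarter, three_quarters, factored)
```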

2.2.1.2 Operation Reduction Through Approximations

The strategy of approximations is similar to the strategy of reduction in computations in that approximations involve applying transformations to reduce the number of operations, but it differs from pure computational reduction due to the presence of approximation errors. The main objective of this strategy is to minimize errors as much as possible, thus obtaining suboptimal results within an acceptable interval of time. Consequently, in this strategy, one seeks a trade-off between the accuracy of the outcome and the speed at which it is obtained. In fact, a real-time performance is often obtained by trading off accuracy of processing with speed of processing, where an “optimal” working point is found through experimentation.

Examples of this strategy include the use of lookup tables for computationally complex calculations, the use of a suboptimal search technique over a time-consuming, exhaustive search technique, the use of sums of absolute values over time-consuming squaring operations, and many others.
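
To make these ideas concrete, the following hypothetical NumPy sketch illustrates two of them: a 256-entry lookup table that replaces a per-pixel power-law (gamma) computation, and a sum of absolute differences used in place of a sum of squared differences. The gamma value and image sizes are arbitrary.

```python
import numpy as np

# Replace a per-pixel power-law (gamma) computation with a 256-entry lookup
# table, so the expensive pow() is evaluated once per possible 8-bit value
# instead of once per pixel.
gamma = 2.2
lut = np.array([round(255.0 * (v / 255.0) ** (1.0 / gamma))
                for v in range(256)], dtype=np.uint8)

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
corrected = lut[frame]          # one table access per pixel, no pow() calls

# A sum of absolute differences can stand in for a sum of squared
# differences when comparing two image blocks:
a = frame[:16, :16].astype(np.int32)
b = frame[16:32, 16:32].astype(np.int32)
sad = np.abs(a - b).sum()       # avoids the per-pixel squaring operation
```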

2.2.1.3 Alternative Method

Choosing an alternative method is a strategy that involves searching for a new algorithm in order to attain faster processing while at the same time maintaining a required level of accuracy.

This strategy is similar to the strategy of approximations, except that it is used primarily when the algorithm is too computationally complex for any approximations or reductions to yield an acceptable real-time performance. Basically, this strategy involves abandoning the computationally complex algorithm in favor of one that is less computationally complex. A simpler algorithm is often developed by careful consideration of data to be processed and properties of the objects being sought out.

2.2.2 Reduction in Amount of Data

In addition to the reduction of operations, the reduction of data to be processed also plays a major role toward having real-time performance. This is simply because such systems require the processing of large amounts of data in relatively short intervals of time, and thus any reduction in the amount of data can lead to a reduction in the processing time. This strategy involves applying a transformation to the underlying image data for the purpose of deriving a compact representation, which in turn speeds up subsequent stages of processing. Reduction of data takes many forms in real-time image/video processing systems including spatial or temporal down-sampling, spatial block partitioning, region of interest or selective processing, formulating the algorithm in a multiresolution or coarse-to-fine processing framework, appropriate feature extraction, etc. In all these cases, a certain subset of pixels from an image frame is processed.

2.2.2.1 Spatial Down-Sampling, Selective Processing, and Coarse-to-fine Frameworks

Spatial down-sampling involves skipping the processing of a certain amount of pixels within a frame, while temporal down-sampling involves skipping the processing of an entire frame within a video sequence. It should be noted that such down-sampling may not be applicable in certain applications. Spatial block partitioning involves dividing up an image frame into nonoverlapping or overlapping blocks, and separately processing each block. Region of interest or selective processing involves applying complex computations only to a desired subset of the image data, providing a speedup in processing by narrowing down the region of interest. A multiresolution or coarse-to-fine framework involves formulating an algorithm where a rough estimate is first obtained at a coarse resolution level and then subsequently refined at increasing levels of resolution to allow a rapid convergence to the solution.
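
The following hypothetical NumPy snippet sketches the simplest of these data reduction forms; the frame size, subsampling factors, and region coordinates are all made up for illustration.

```python
import numpy as np

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

# Spatial down-sampling: keep every other pixel in each direction
# (a low-pass prefilter would normally be applied first to limit aliasing).
small = frame[::2, ::2]                      # 240 x 320

# Region-of-interest processing: run the expensive stage only on a window
# around a previously detected location (coordinates are made up here).
roi = frame[100:180, 200:320]

# Temporal down-sampling: process only every 3rd frame of a sequence.
for frame_index in range(30):
    if frame_index % 3 != 0:
        continue              # skip two out of every three frames
    _ = small.mean()          # stand-in for the expensive per-frame processing
```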

2.2.2.2 Compact Representation Through Appropriate Features

Selection of appropriate features to represent objects of interest helps one to cut down the amount of extraneous information, providing a succinct representation of information which in turn simplifies higher level operations such as Kalman filter tracking. Effective features depend on the application and the objects involved. For instance, in face detection applications, color has been found to be an effective feature for real-time deployment. Of course, there is a computational burden associated with the extraction of features from image/video sequences. Hence, another important aspect involves not only finding effective features but also fast mechanisms to compute them.

Automatic algorithms that require less human intervention and fewer or no user-defined thresholds or parameters are also preferred because such algorithms can adapt to different situations automatically, leading to a truly stand-alone real-time system, which is the ultimate goal of most practical designs [6]. Clearly, the adaptation schemes should be kept computationally simple and not burden the rest of the algorithm. In addition to this, since real-time image/video processing systems are usually employed in real-world environments, in many cases, data preprocessing is performed to remove unwanted disturbances such as noise to ensure generating a reliable outcome. Similar to any other component, the preprocessing should also be as computationally efficient as possible.

2.2.3 Simplified Algorithms

In general, the fundamental idea behind real-time image/video processing systems is the utilization of simple or simplified algorithms. A rule of thumb when transitioning to a real-time implementation is to keep things as simple as possible, that is to say, look for simple solutions using simple operations and computationally simple algorithms as opposed to complex, computationally intensive algorithms, which may be optimal from a mathematical viewpoint, but are not practical from a real-time point of view. With embedded devices now being outfitted with vision capabilities, such as camera-equipped cell phones and digital still/video cameras, the deployment of those algorithms which are not only computationally efficient but also memory efficient is expected to grow. Often, algorithms are carefully analyzed in terms of their number of operations and computational complexity, but equally important are their storage requirements.

In essence, simple algorithms provide a means of meeting real-time performance goals by lowering computational burdens, memory requirements, and indirectly, power requirements as well.

2.3 EXAMPLES OF SIMPLIFICATIONS

To help illustrate the three stated algorithm simplification strategies toward achieving a real-time implementation, the following subsections present representative examples from the literature, which exhibit successful applications of the strategies.

2.3.1 Reduction in Number of Operations

2.3.1.1 Pure Reduction

The strategy of pure operation reduction has been used by many researchers to simplify algorithms for real-time implementation. This strategy has been primarily used for algorithms with repetitive, well-structured computations in low-level operations, such as filtering, transforms, matrix–vector operations, and local statistics extraction. This subsection includes several examples illustrating the strategy of pure operation reduction.

Exploiting any available symmetry in the computations involved can often lead to pure operation reduction. For instance, in [87], the symmetry of the coefficients of linear-phase filters in the biorthogonal wavelet transform allowed streamlining the computation of the forward and inverse transforms into a single architecture while reducing the number of expensive multiplication operations. This produced a more efficient transform computation. Other common techniques used for reducing the number of multiplication operations in linear filtering include making use of the separability of the kernel involved, or eliminating multiplications by ones or zeros [99].
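
As a concrete illustration of exploiting kernel separability, the following sketch (a hypothetical example using NumPy and SciPy, not taken from [87] or [99]) applies a 5 × 5 binomial kernel as two 1D passes, reducing the multiplications per pixel from 25 to 10.

```python
import numpy as np
from scipy import ndimage

# A separable 5x5 binomial (Gaussian-like) kernel applied as two 1D passes.
# A direct 5x5 convolution needs 25 multiplications per pixel; the separable
# form needs 5 + 5 = 10.
k1d = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0
image = np.random.rand(480, 640)

# Two 1D passes (rows, then columns) ...
tmp = ndimage.convolve1d(image, k1d, axis=1, mode='constant')
sep = ndimage.convolve1d(tmp, k1d, axis=0, mode='constant')

# ... give the same result as the full 2D convolution with the outer product
# (interior compared to sidestep border-handling differences).
k2d = np.outer(k1d, k1d)
full = ndimage.convolve(image, k2d, mode='constant')
assert np.allclose(sep[2:-2, 2:-2], full[2:-2, 2:-2])
```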

Computations can often be cleverly rearranged or factored to reduce the number of operations. One example of this can be found in [106], where the symmetry in the elements of the discrete cosine transform (DCT) matrix allowed rearranging the computations, reducing the number of expensive multiplication as well as addition operations. As a result, a more efficient transform was achieved through this operation reduction.

Another possible way to achieve pure operation reduction in matrix computations encountered in image/video processing algorithms is to seek out encoding schemes that can transform a matrix into a sparse matrix. For example, in [48], an exact two-dimensional (2D) polynomial expansion in terms of integer coefficients provided a sparse representation of the elements of the DCT matrix. This allowed a more efficient computation by replacing expensive multiplication operations with simple bit-shift and addition operations and reducing the number of multiplications as well as additions. Another popular technique to reduce the amount of operations in matrix computations involves exploiting matrix properties. For example, in [74], a rearrangement of the equations for a 2.5D affine motion parameter estimation allowed an efficient solution via an orthogonal matrix factorization using a Householder transformation, thus reducing the computation operations over that of a 2D affine estimation.

Many times, an efficient computational structure derived from digital signal processing theory can be utilized to achieve a reduction in the number of operations. For example, in [94], a one-dimensional (1D) infinite impulse response filter provided a reduction in the number of expensive multiplication operations per pixel over that of a 1D finite impulse response filter in addition to saving memory space via using a lower order filter. These changes led to an efficient scan-line-based image enhancement at video rates. Another example of this approach can be found in [88], where the relationship between the computation structure of discrete, geometric moments and that of all-pole digital filters was exploited, allowing the computation of any order geometric moment using a series of accumulators. This resulted in a significant reduction in the number of multiplications and thus allowed real-time computation of geometric moments.

In addition to multiplication operations, reduction in the number of comparison operations may also be useful. For instance, in [57], it was reported that the comparison operations had a greater computational cost than that of the addition and subtraction operations in the sum-of-absolute difference (SAD) computation for determining motion vectors as part of a block-matching method. In order to reduce the number of comparisons to determine the minimum of the SAD error surface, the order in which the SADs were calculated was modified from the standard scan-line order to a circular, spiraling inward order. This allowed comparing the SAD values only twice per spiral, providing up to a 14% decrease in comparison operations per frame with virtually no noticeable loss in image quality. Thus, changing the order by which a computation is carried out may lead to a reduction in the number of operations.

Exploiting recursion in a computation can be quite useful toward reducing the number of operations in certain data-intensive, low-level image processing tasks. This technique capitalizes on the redundancies present in adjacent computations. For example, in [84], a recursion in the computation of local variances was used to reduce the redundant computations by adding the values of a right strip and subtracting the values of a left strip. This provided a significant speedup from 112 ms to 3.28 ms, thus achieving the required real-time performance.

Similarly, in [7], a recursion in the correlation computation as part of a similarity measure for stereo image matching was used to reduce the amount of redundant computations for adjacent pixels in overlapping windows. A noteworthy discussion detailing the generalization of this approach to higher dimensions can be found in [137], where the 2D sliding window recursion equation was generalized to N-dimensional images. This recursion reduced the redundant operations in computing local statistics and provided a speedup that was independent of the size of the window. In [122], the strategy of recursion was applied to the computation of histogram statistics, where the computation of local histogram statistics was performed recursively and then generalized to N-dimensional images. This provided a remarkable speedup, 63 ms as compared to 2 min, leading to real-time color histogram matching.
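
One common way to realize such window-size-independent recursions is through 2D cumulative sums (integral images); the following hypothetical NumPy sketch computes local variances in this manner and is meant only to illustrate the idea, not the exact formulations used in [7, 84, 122, 137].

```python
import numpy as np

def local_variance(image, w):
    """Local variance over w x w windows using 2D cumulative sums.

    Each window statistic is obtained from four array accesses, so the
    cost per window is independent of the window size, mirroring the
    sliding-window recursions described in the text.
    """
    img = image.astype(np.float64)
    # Zero-padded integral images of x and x^2.
    S  = np.pad(img,      ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    S2 = np.pad(img ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    n = w * w
    # Window sums via the inclusion-exclusion rule on the integral images.
    box  = S[w:, w:]  - S[:-w, w:]  - S[w:, :-w]  + S[:-w, :-w]
    box2 = S2[w:, w:] - S2[:-w, w:] - S2[w:, :-w] + S2[:-w, :-w]
    return box2 / n - (box / n) ** 2     # E[x^2] - (E[x])^2

frame = np.random.rand(120, 160)
print(local_variance(frame, 7).shape)    # (114, 154)
```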

2.3.1.2 Reduction via Approximations

When it comes to real-time implementation, sometimes sacrifices in accuracy have to be made in order to achieve the required performance. In fact, most algorithms that are transitioned to a real-time environment are simplified using various approximations.

Approximations are often used for reducing the computations in transform operations. For instance, in [33], approximating the DCT computations by using up to three biorthogonal matrices led to replacing expensive multiplication operations with bit-shifting and addition operations. The approximation could be varied between two or three matrices, producing tradeoffs between speed and accuracy. A high-performance version was able to process one 8 × 8 image block using less than one processor clock cycle per pixel.

Approximating computations by utilizing simple operations is often used to reduce the amount of processing time. In [131], simplifying the computations for computing the gradient image via a nonlinear filter led to the speedup needed to achieve an efficient real-time implementation of the noise-robust image enhancement procedure. Two key algorithmic simplifications that were made included approximating the quadratic filter using a normalized squared-gradient computation and replacing the normalization of the filter output by a power of 2 division. The loss in accuracy due to the division operation was kept low by considering the maximum value of the filter output for different images in the test set.

Another example that utilized this approach can be found in [63], where approximating 2D optical flow with two, simpler 1D flow computations provided a significant reduction in the processing time which allowed generating the motion model parameters in real-time. In order to account for the loss in accuracy due to the approximation, a fast iterative least-squares formulation was used to subsequently refine the approximation within ten iterations, providing a real-time throughput around 10–20 fps. Also, in [66], a subpixel edge detector was simplified by approximating a critical but expensive division operation via a simple iterative minimization procedure using integer arithmetic.

In some cases, the processing resources are so scarce that a desired computation cannot be fully realized without some simplifying approximation. For example, in [152], the computations required for applying a large 13 × 13 filter kernel had to be approximated by using two sequential passes of a smaller 7 × 7 filter kernel due to a lack of processing resources supporting larger kernels. The approximation did not have any detrimental effect on the outcome of the object tracking system under consideration.

Discarding computations to reduce the number of expensive operations is often used as a simple means of achieving algorithmic simplification through approximation. For instance, in [120], the computations involved in three-dimensional (3D) nonlinear filtering operations for the suppression of impulsive noise were simplified by using only a certain subset of the volume elements in the 3D filter kernel. This led to an acceptable processing speed while maintaining sufficient noise suppression characteristics. In [73], another application of discarding computations consisting of a partial decoding of an encoded image or video sequence was discussed.

In order to provide a real-time performance for viewing MPEG-2-encoded DVD movies on low-resolution mobile handsets, the expensive full-decode operation was approximated by keeping only the DC and a certain number of AC coefficients, and discarding the rest. The number of discarded coefficients could be varied to achieve a desired speed versus quality trade-off. As an example, one of the presented simplifications used three AC coefficients and required only 7 multiplications and 10 additions versus 4096 multiplications and 4032 additions for the full decoding of one block.
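
The following hypothetical sketch, using SciPy's DCT routines, illustrates the flavor of such partial decoding by reconstructing an 8 × 8 block from its DC coefficient and three low-frequency AC coefficients only; the exact coefficient selection and arithmetic in [73] may differ.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Reconstruct an 8x8 block from its DC coefficient and a few low-frequency
# AC coefficients only, discarding the rest.
block = np.random.rand(8, 8) * 255.0
coeffs = dctn(block, norm='ortho')          # stand-in for the encoded block

keep = [(0, 0), (0, 1), (1, 0), (2, 0)]     # DC plus three AC coefficients
truncated = np.zeros_like(coeffs)
for (u, v) in keep:
    truncated[u, v] = coeffs[u, v]

approx = idctn(truncated, norm='ortho')     # cheap, low-quality reconstruction
full   = idctn(coeffs,    norm='ortho')     # full decode for comparison
print(np.abs(approx - full).mean())
```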

Oftentimes, a function evaluation is approximated as a lookup table, essentially trading off the time spent in computations with the time spent in accessing values from the memory space. This can help one to achieve a real-time performance. One good example of this lookup table approach involves the computation of the entropy function. In [23], in order to reduce the error in the approximation, a multiresolution lookup table was considered using a higher resolution for the portion of the function that incurred a larger error.
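
A minimal sketch of such a multiresolution lookup table is given below for the entropy term h(p) = -p log2(p), using a fine table near p = 0, where the function changes rapidly, and a coarse table elsewhere; the table sizes and the split point are illustrative and not those used in [23].

```python
import numpy as np

def h(p):
    """Entropy term h(p) = -p * log2(p), with h(0) defined as 0."""
    return np.where(p > 0, -p * np.log2(np.maximum(p, 1e-12)), 0.0)

split = 0.1
fine   = h(np.linspace(0.0,  split, 1024))   # high resolution on [0, 0.1]
coarse = h(np.linspace(split, 1.0,  256))    # low resolution on [0.1, 1]

def entropy_lut(p):
    # Table access replaces the log evaluation at run time.
    if p < split:
        return fine[int(p / split * 1023)]
    return coarse[int((p - split) / (1.0 - split) * 255)]

print(entropy_lut(0.03), h(0.03))            # LUT value vs exact value
```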

Formulating an algorithm in an evolutionary manner may also provide different degrees of approximation. For example, in [133], in order to have a real-time tracking algorithm for an automobile obstacle avoidance warning system, the matching stage was approximated by formulating it in an evolutionary manner where the matching improved by increasing the processing time. Formulating an algorithm in this manner allows the simplification in computations to be performed according to the time made available for processing and thus ensures an acceptable outcome while meeting hard real-time constraints.

2.3.1.3 Reduction via Alternative Methods

Sometimes an algorithm may be too computationally complex to warrant exploration of computational reduction strategies. In such cases, it is often desired to develop a computationally simple algorithm that can reduce the complexity while at the same time maintain the high level of accuracy of the original algorithm. In [29], one good example was presented that involved the development of an efficient line detection algorithm. In this work, it was decided to abandon the use of computationally expensive Hough transform techniques in favor of a simpler algorithm by exploiting the underlying geometry of the objects under consideration. This alternative, simple algorithm was based on the idea that the area formed by three collinear points is zero. This geometric property allowed utilizing a computationally simple means of detecting lines in images, which led to a real-time deployment. Another example of using an alternative, simple algorithm that achieved a real-time performance appears in [111]. In this algorithm, the operation of fitting an ellipse to extract contour points for a human gesture tracking system was found to be the bottleneck. It was decided to replace the original algorithm, which required an expensive Levenberg–Marquardt-based fitting procedure, with an alternative moments-based algorithm. This replacement produced a significant speedup in the ellipse fitting procedure.
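
The underlying geometric test is simple enough to state in a few lines: twice the area of the triangle formed by three points is a 2D cross product, which vanishes when the points are collinear. The following hypothetical snippet shows this test only; it is not the full detection algorithm of [29].

```python
def collinear(p1, p2, p3, tol=1e-6):
    """True if the three 2D points are (approximately) on one line.

    Twice the signed area of the triangle they form is the 2D cross
    product of (p2 - p1) and (p3 - p1); it vanishes for collinear points.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    twice_area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    return abs(twice_area) < tol

print(collinear((0, 0), (1, 1), (3, 3)))     # True
print(collinear((0, 0), (1, 1), (3, 2)))     # False
```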

Thus, sometimes it is beneficial to consider alternative formulations of an algorithm to address its real-time performance bottlenecks.

2.3.2 Reduction of Data

Reduction in the amount of data to be processed plays a prominent role for bringing image/video processing algorithms into real-time, and there are many examples given in the literature showing how to use this strategy to simplify an algorithm toward reaching a real-time performance.

Reducing the amount of data upfront allows one to speed up subsequent stages of processing. Also, it indirectly allows one to use more complex processing during subsequent stages of processing.

2.3.2.1 Subsampling

One of the simplest and most effective approaches involves applying spatial or temporal subsampling of the incoming image frame or video sequence. The objective here is to reduce the amount of data to be processed and thus to obtain a speedup for subsequent stages of processing.

There are many examples involving spatial subsampling. For example, in [25], subimages were reduced in size to 20 × 20 pixels before being subjected to classification processing in order to meet the hard real-time performance demands of an industrial inspection system. In another application involving obstacle detection, the input image was subsampled to reduce the size by a factor of 2, after having been spatially low-pass filtered for reducing the effects of aliasing.

Spatial subsampling has also been found to be useful for speeding up face recognition systems on embedded platforms. In [146], to meet the real-time performance requirements, input images of size 320 × 240 were downscaled to 80 × 60 to help speed up the discussed face recognition system. Similarly, in [11], input images of size 640 × 480 were downsized to 128 × 128 to help speed up the subsequent processing.

Another benefit of spatial subsampling involves reducing the number of points to search for object detection in industrial inspection systems. In such applications, it has been shown that subsampling can provide a speedup of up to 25 times [34]. Naturally, temporal subsampling can be applied to the processing of video sequences. For example, in [142], it was suggested that a complex range or scene depth computation could be performed for every nth incoming frame instead of every frame in order to meet the real-time requirement of a gesture tracking algorithm. Of course, it should be pointed out that such methods do impart a certain loss in accuracy due to the discarding of some of the original data. Hence, it is best to explore several different subsampling intervals to determine the right balance between speed and accuracy. Also of note here is that in some cases, spatial subsampling might not be appropriate. In such cases, one may formulate the problem as a scalable algorithm [62]. This approach involves creating a quality control module to manage the processing for achieving a certain level of output quality according to the availability of system resources.

2.3.2.2 Partitioning

Another simple method of reducing the amount of data to be processed involves partitioning an image frame into smaller subimages that can each be processed at a faster speed than the entire image frame. This is similar to the divide-and-conquer strategy where the problem is divided into several smaller problems that are easier to solve [103, 130]. The most popular partitioning schemes include row-wise, column-wise, and block-wise. A good example appearing in [144] covers the computation of the discrete biorthogonal wavelet transform (DBWT) for high-definition TV compression. Due to the large data set involved, real-time processing was not feasible without partitioning a frame into nonoverlapping, square subimages and calculating the DBWT separately on each subimage.

In another application discussed in [47], the image frame was partitioned into vertical strips in order to process the image in parallel on multiple processors. This generated a real-time implementation of the à trous wavelet transform. In the face detection algorithm covered in [112] for an embedded device, the input image was partitioned into nonoverlapping subimages, the size of which was chosen to balance the gain in speed versus the accuracy of detection. In another example mentioned in [8], it was found that partitioning the input image into subimages enabled the real-time implementation of the most time-consuming portion of a color-quantization algorithm. Since edge artifacts could be present between adjacent subimages, an overlapping partitioning scheme can be employed, thus producing a trade-off between speed and artifact suppression [35]. In [31], an overlapped block-partitioning scheme was used to correct for processing across partition borders in a wavelet transform computation.
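
A minimal sketch of block-wise partitioning is given below as a hypothetical NumPy helper; the block size, the assumption that the frame dimensions are multiples of it, and the per-block operation are all arbitrary.

```python
import numpy as np

def process_in_blocks(frame, block_size, process):
    """Apply `process` independently to nonoverlapping square blocks.

    The frame dimensions are assumed to be multiples of block_size
    for simplicity.
    """
    out = np.empty_like(frame, dtype=np.float64)
    h, w = frame.shape
    for r in range(0, h, block_size):
        for c in range(0, w, block_size):
            out[r:r + block_size, c:c + block_size] = \
                process(frame[r:r + block_size, c:c + block_size])
    return out

frame = np.random.rand(512, 512)
# Example stage: per-block mean removal (stand-in for a per-block transform).
result = process_in_blocks(frame, 64, lambda b: b - b.mean())
```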

2.3.2.3 Selective Processing

Selective processing is another popular data reduction method that involves narrowing down the region of interest before applying any subsequent processing. As a result, only a certain subset of the entire image is processed, hence the name selective processing. For example, in [125], in locating an object of interest to subpixel accuracy, instead of applying subpixel calculations on the entire image first and then locating the object of interest, the location was first narrowed down to a specific area, after which appropriate computations were performed to refine this location to subpixel precision. In this case, narrowing down the area of interest helped to reduce the amount of data to be processed by the subpixel refinement stage, thus generating a real-time subpixel accurate object detection. Similarly, in another application involving the use of range data for gesture tracking [142], the range data was calculated only for selected regions of interest and not for the entire image. Therefore, if computationally complex processing cannot be avoided to save processing time, it is best to apply such processing only to the areas of interest and not to the entire image.
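
The following hypothetical NumPy sketch illustrates this narrowing-down pattern: a cheap global search provides a rough location, and the more expensive subpixel (centroid) refinement is computed only inside a small window around it; the window size and synthetic data are arbitrary.

```python
import numpy as np

# Locate a bright object roughly with a cheap argmax over the whole image,
# then run the subpixel (centroid) computation only inside a small ROI.
frame = np.zeros((480, 640))
frame[240:243, 320:324] = [1.0, 3.0, 2.0, 0.5]      # synthetic blob

ry, rx = np.unravel_index(np.argmax(frame), frame.shape)   # rough location
win = frame[ry - 8:ry + 9, rx - 8:rx + 9]                  # 17x17 ROI only
ys, xs = np.mgrid[ry - 8:ry + 9, rx - 8:rx + 9]
cy = (ys * win).sum() / win.sum()                          # subpixel centroid
cx = (xs * win).sum() / win.sum()
print(cy, cx)
```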

2.3.2.4 Determining and Using Appropriate Features

Determining and using appropriate image features instead of the images themselves provides an effective way to reduce the dimensionality of the image data to be processed, thus speeding up subsequent processing stages. Here, 11 examples exhibiting this approach are mentioned, each of which played a crucial role in achieving real-time performance in its respective application.

In [40], coarse motion features were preferred over dense features since it was found that dense features provided unnecessary tracking accuracy for the video surveillance application under consideration while demanding too much processing time.

In [102], marker features, which were physically placed on the objects to be tracked, were found to be appropriate features to obtain a real-time performance in the discussed constrained object tracking application. Such features allowed an object to be quickly identified by its unique marker features.

In [93], contour features were seen to provide a more compact representation over mesh features, achieving a real-time performance in calculating deformation parameters for a 3D face modeling application.

In [97], Gabor wavelet features were shown to provide a better baseline feature set over those from an eigen-decomposition method for a face detection application. The use of such features allowed the elimination of time-consuming pre- and postprocessing stages, cutting down on the amount of processing and thus leading to real-time performance.

In [151], the head contour was found to be an appropriate feature to be used for tracking human heads that turned around, considering a lack of texture features for utilizing a texture-based head tracking. The use of such features allowed the construction of a fast algorithm for tracking the displacement of the contours between subsequent frames.

This in turn led to real-time throughput.

In [9], motion and head shape features were found to be appropriate for a head detection and tracking algorithm. The use of the circle shape to represent the shape of the head allowed utilization of a fast Hough transform algorithm for real-time performance.

In [155], features reflecting human visual perception, that is, features that humans can use to distinguish between a moving object and its shadows, such as color- and texture-based features, were found to be the most appropriate for the object tracking application under consideration.

In [68], contour features were used to enable a real-time object tracking via capturing the essential shape information. It was shown that contour features could be efficiently represented using a radial-based representation, which required a small amount of computations to update.

In [105], distance-independent features were considered to be the right kind of features for a real-time recognition application with objects placed at varying distances. These features allowed objects at varying distances to be represented by a fixed amount of pixels, which in turn helped to speed up the subsequent matching algorithm and to achieve real-time recognition.

In [121], DCT coefficients and motion vectors were utilized to achieve a real-time video object segmentation using MPEG-encoded video sequences. The use of these features allowed the construction of a feature vector that succinctly captured the frequency-temporal characteristics of the image blocks, providing a 320:1 reduction in the amount of data for a typical 352 × 288 color video sequence. This in turn helped to construct an efficient algorithm for segmenting video objects without resorting to the computationally expensive full decoding.

Color features have been found to be useful in many human detection and tracking applications over those obtained from gray-level images. For example, in [75], color-based features were preferred over gray-level-based features due to their robust tracking performance in the face of poor conditions such as sudden changes in lighting. Also, in [92] and [28], skin color features were found to be the most appropriate for face detection applications. To balance out the increase in the amount of data to be processed due to the use of color information, it has been recommended to subsample color images to allow for faster processing [75].

2.3.2.5 Dimensionality Reduction

After having determined appropriate features to use, it is often helpful to further reduce the amount of data by applying dimensionality reduction techniques, such as principal component analysis (PCA), linear discriminant analysis (LDA), or a Kohonen self-organizing map (SOM).

There are many examples in the literature that employ such techniques. For instance, in [27], the Kohonen SOM neural network was used to reduce the dimension of a feature vector to help speed up the subsequent processing, while in [76], PCA was employed for the same purpose.
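
As a reminder of what such a projection involves, the following hypothetical NumPy sketch reduces 64-dimensional feature vectors to 8 dimensions via PCA computed from an SVD; the dimensions and the data are arbitrary.

```python
import numpy as np

# PCA-based dimensionality reduction: project 64-dimensional feature
# vectors onto their first 8 principal components.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64))        # stand-in feature vectors

mean = features.mean(axis=0)
centered = features - mean
# Principal directions from the SVD of the centered data matrix.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:8]                           # top 8 principal directions

reduced = centered @ components.T             # 1000 x 8 instead of 1000 x 64
print(reduced.shape)
```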

In [139], a combination of PCA and LDA was used to reduce the amount of data, providing a better separation of classes and thus easing the task of the subsequent classifier. Also, in [109], the dimensionality of chrominance feature vectors extracted from a 2D chrominance histogram was reduced by modeling the histogram as a multivariable Gaussian distribution and then applying PCA to remove the correlation in the two components. This allowed the use of 1D marginal statistics of each component instead of 2D statistics.

Dimensionality reduction has also been used when dealing with color images. Researchers have often made use of normalized 2D color spaces as opposed to 3D color spaces to achieve real-time implementations. For instance, in the face recognition application discussed in [28], the 2D normalized r–g color space was used, saving computations by reducing the 3D color space by one dimension. Also, reducing a three-channel color image to a one-channel image can sometimes serve as a means of achieving a real-time performance through dimensionality reduction. For instance, in [141], a moment-preserving threshold scheme was used to reduce a three-channel color image to a one-channel image by taking into account the correlation amongst the three color channels.
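
A minimal sketch of the normalized r–g reduction is shown below, assuming 8-bit RGB input; the small constant added to the denominator is only to avoid division by zero.

```python
import numpy as np

# Reduce the 3D RGB color space to the 2D normalized r-g chromaticity space,
# as used for instance in [28].
rgb = np.random.randint(0, 256, (480, 640, 3)).astype(np.float64)
total = rgb.sum(axis=2) + 1e-6               # avoid division by zero
r = rgb[..., 0] / total                      # normalized red
g = rgb[..., 1] / total                      # normalized green
# Subsequent processing operates on the two channels (r, g) only;
# the intensity dimension is discarded.
```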

2.3.2.6 Multiresolution or Coarse-to-fine Processing

The technique of casting a complex problem into an appropriate multiresolution or coarse-to-fine processing framework can often provide a reduction in data and computation through quick determination of a rough or low-resolution solution and its subsequent refinement at higher resolutions.

Optimization involving the global minimization of a parameter over some error surface arises often in image/video processing algorithms. To overcome difficulties in reaching the global minimum, the optimization can be performed using a multiresolution framework. The advantage of this framework is that at lower resolutions, the error surface is smoother and contains fewer local minima, allowing the use of fast methods to quickly find a rough solution.

This solution can then be propagated to a higher resolution with a reduced search range around the location of the rough solution. Such a scheme reduces the amount of data used to reach a solution. While it can provide a fast means of reaching a reasonable solution, it does not guarantee convergence to the true global solution. Thus, this framework can be considered to provide a fast, but suboptimal solution—which can be regarded as another example of the speed versus accuracy trade-off often encountered in real-time image/video processing systems. One successful example of the multiresolution framework applied to the optimization of a functional can be found in [129], where such a framework was utilized to generate panoramic ultrasound images in real-time.
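
The following hypothetical NumPy sketch illustrates a two-level version of this idea for template matching: an exhaustive SAD search at half resolution provides a rough location, which is then refined at full resolution within a small range around the propagated position; the pyramid depth, search range, and data are arbitrary.

```python
import numpy as np

def sad(a, b):
    return np.abs(a.astype(np.float64) - b).sum()

def coarse_to_fine_match(image, template):
    """Two-level coarse-to-fine template search (illustrative sketch)."""
    # Half-resolution versions by 2x2 block averaging (even sizes assumed).
    img_c = (image[::2, ::2] + image[1::2, ::2]
             + image[::2, 1::2] + image[1::2, 1::2]) / 4.0
    tpl_c = (template[::2, ::2] + template[1::2, ::2]
             + template[::2, 1::2] + template[1::2, 1::2]) / 4.0
    th, tw = tpl_c.shape
    # Coarse exhaustive search over the smoother half-resolution surface.
    best, by, bx = np.inf, 0, 0
    for y in range(img_c.shape[0] - th + 1):
        for x in range(img_c.shape[1] - tw + 1):
            s = sad(img_c[y:y + th, x:x + tw], tpl_c)
            if s < best:
                best, by, bx = s, y, x
    # Refine at full resolution within +/-2 pixels of the propagated location.
    th, tw = template.shape
    best, fy, fx = np.inf, 2 * by, 2 * bx
    for y in range(max(0, 2 * by - 2), min(image.shape[0] - th, 2 * by + 2) + 1):
        for x in range(max(0, 2 * bx - 2), min(image.shape[1] - tw, 2 * bx + 2) + 1):
            s = sad(image[y:y + th, x:x + tw], template)
            if s < best:
                best, fy, fx = s, y, x
    return fy, fx

image = np.random.rand(128, 128)
template = image[40:56, 60:76].copy()
print(coarse_to_fine_match(image, template))   # expected near (40, 60)
```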

Similar to a multiresolution framework, coarse-to-fine processing involves formulating a problem by quickly determining a rough solution during a first processing pass and then subsequently refining it during a second processing pass. In [72], such a framework was developed for an object tracking application, where a Kalman filter was used to roughly determine and subsequently refine the position of an object of interest. The predicted location was used as a rough estimate, and the prediction uncertainty was used to limit the search area for refinement processing. This reduced the amount of data to be searched in every frame and thus led to real-time performance.

2.3.3 Simple Algorithms

In order to meet real-time requirements, many researchers divide problems into stages composed of computationally simple operations. Simple operations include binary operations, morphological operations, frame differencing, and various other computationally efficient operations. Due to the fact that such algorithms are often employed in real-world situations, they often have to be made robust to noise sources or disturbances in the scene such as changes in lighting conditions or low-contrast scenes. The algorithm used to achieve a robust performance must also be computationally simple, and preferably fully automatic with little human intervention in the setting of thresholds or parameters. In the formulation of such an algorithm, a trade-off analysis between speed and accuracy needs to be performed in order to find a balance between achieving real-time performance and robustness to disturbances in the scene. Of course, the use of appropriate features that are robust to disturbances can help. It should be noted that the construction of simple or simplified algorithms involves both the computational and data reduction simplification strategies mentioned in the previous sections. Four examples are presented next to illustrate the practical usefulness of such simple algorithms.

The tracking of objects within sequences of images has long been an important problem in image/video processing. For example, in [61], a simple algorithm for tracking an object based on template matching was presented. To provide a more robust operation to changes in object shape, a rather simple extended snake algorithm was employed. First, the object was extracted by using simple frame differencing and applying simple morphological closing on the result to merge homogeneous regions. Then, a data reduction procedure was carried out by using an appropriate feature to succinctly represent the object and provide control points for the snake algorithm.

The use of simple algorithms during each stage allowed achieving robust real-time tracking.

The increase in the practical applications of face detection and recognition has increased the interest in their real-time implementations. Such implementations are possible via the use of simple algorithms. For instance, in [92], a simple algorithm for face detection based on the use of skin color features was discussed. In order to make the detection algorithm robust to regions in the background with skin-like colors, a simplified method was developed. First, motion detection via a simple frame differencing operation was used to distinguish the human face from the background. Then, to reduce the noise in the difference image, a simple morphological opening operation was used. Finally, a simple labeling operation was used to determine the region where the face was located. Again, the use of simple algorithms during the various stages of the image processing chain allowed achieving robust real-time face detection.

Industrial inspection systems are known for their strict hard real-time deadlines, forcing developers to devise algorithms with simple operations. For example, in [147], a simple algorithm for the extraction of paper watermarks as part of a defect inspection system was introduced. Due to the real-time constraints involved, a simple morphological white top-hat operation was used to extract the watermarks. This simple operation was found to be not only fast, but also robust to disturbances such as uneven illumination and low-contrast watermarks. In this case, the morphological operations provided a means of obtaining real-time performance.

Similarly, in [135], simple defect location and feature extraction algorithms were employed to meet the hard real-time constraints in an industrial inspection system. In order to speed up the location detection process, the image was first binarized to find a rough estimate of the location of the defect. This rough location was then propagated to the full-resolution image, and a region around this location was selected for feature extraction. A simple gray-level difference method was used to extract texture features from the narrowed down defect region. The use of relatively simple operations enabled the real-time detection of defects and extraction of features for the subsequent classification.

2.4 SUMMARY

In this chapter, the three major strategies for achieving algorithmic simplification were covered: reduction of computations, reduction of data, and simple algorithm design. In addition, a wide range of examples from the literature was mentioned in order to further illustrate the use of such strategies in actual real-time image/video processing systems. When it comes to developing an efficient algorithm for real-time deployment, one should examine each stage of the algorithm according to the required real-time performance constraints, carefully checking whether any of the principles or techniques presented in this chapter can be applied to gain an improved performance. Ultimately, it is up to the developer to determine the best balance between speed and accuracy for the application under consideration. In summary, this chapter has provided insights into the available algorithmic simplification choices that should be considered at the beginning of the real-time implementation process.