
CHAPTER 1 Real-Time Image and Video Processing Concepts

1.1 INTRODUCTION

The multidisciplinary field of real-time image and video processing has experienced tremendous growth over the past decade, as evidenced by the large number of real-time-related articles that have appeared in various journals, conference proceedings, and books. Our goal in writing this book has been to compile in one place the guidelines one needs to know in order to take an algorithm from a research environment into an actual real-time-constrained implementation.

Real-time image and video processing has long played a key role in industrial inspection systems and will continue to do so as its domain expands into multimedia-based consumer electronics products, such as digital and cell-phone cameras, and intelligent video surveillance systems [20, 55, 150]. Of course, to understand such complex systems and the tools required to implement their algorithms, it is necessary to start with the basics.

Let us begin by examining the underlying concepts that form the foundation of such real-time systems. Starting with an analysis of the basic types of operations that are commonly encountered in image and video processing algorithms, it is argued that the real-time processing needs can be met through exploitation of various types of parallelism inherent in such algorithms.

In what follows, the concept of “real-time” as it pertains to image and video processing systems is discussed and followed by an overview of the history of these systems and a glance at some of the emerging applications along with the common types of implementation trade-off decisions. This introductory chapter ends with a brief overview of the other chapters.

1.2 PARALLELISM IN IMAGE/VIDEO PROCESSING OPERATIONS

Real-time image and video processing systems involve processing vast amounts of image data in a timely manner for the purpose of extracting useful information, which could mean anything from obtaining an enhanced image to intelligent scene analysis. Digital images and video are essentially multidimensional signals and are thus quite data intensive, requiring a significant amount of computation and memory resources for their processing [15]. For example, take a typical N × M digital image frame with P bits of precision. Such an image contains N × M × P bits of data. Normally, each pixel can be sufficiently represented as 1 byte or 8 bits, the exception being in medical or scientific applications where 12 or more bits of precision may be needed for higher levels of accuracy. The amount of data increases if color is also considered.

Furthermore, the time dimension of digital video demands processing massive amounts of data per second. One of the keys to real-time algorithm development is the exploitation of the information available in each dimension. For digital images, only the spatial information can be exploited, but for digital videos, the temporal information between image frames in a sequence can be exploited in addition to the spatial information.

A common theme in real-time image/video processing systems is how to deal with their vast amounts of data and computations. For example, a typical digital video camera capturing VGA-resolution (640 × 480) color video at 30 fps requires performing several stages of processing, known as the image pipeline, at a rate of 27 million pixels per second. Consider that in the near future, as high-definition TV (HDTV) quality digital video cameras come into the market, approximately 83 million pixels per second must be processed for 1280 × 720 HDTV quality video at 30 fps. With the trend toward higher resolutions and faster frame rates, the amount of data that needs to be processed in a short amount of time will continue to increase dramatically.
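As a quick sanity check, the throughput figures quoted above can be reproduced with a few lines of Python. This is only a sketch; it assumes the quoted figures count the three color components of each pixel separately, which appears to be how the 27 and 83 million numbers arise.

```python
def pixel_rate(width, height, fps, components=3):
    """Color components that must be processed per second."""
    return width * height * fps * components

vga = pixel_rate(640, 480, 30)    # 27,648,000 -> the ~27 million figure
hdtv = pixel_rate(1280, 720, 30)  # 82,944,000 -> the ~83 million figure
print(f"VGA @ 30 fps:  {vga / 1e6:.1f} M components/s")
print(f"HDTV @ 30 fps: {hdtv / 1e6:.1f} M components/s")
```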

The key to coping with this issue is the concept of parallel processing, a concept well known to those working in the computer architecture area who deal with computations on large data sets. In fact, much of what goes into implementing an efficient image/video processing system centers on how well the implementation, both hardware and software, exploits different forms of parallelism in an algorithm, namely data level parallelism (DLP) and/or instruction level parallelism (ILP) [41, 65, 134]. DLP manifests itself in the application of the same operation on different sets of data, while ILP manifests itself in scheduling the simultaneous execution of multiple independent operations in a pipelined fashion.

To see how the concept of parallelism arises in typical image and video processing algorithms, let us have a closer look at the operations involved in the processing of image and video data. Traditionally, image/video processing operations have been classified into three main levels, namely low, intermediate, and high, where each successive level differs in its input/output data relationship [41, 43, 89, 134]. Low-level operators take an image as their input and produce an image as their output; intermediate-level operators take an image as their input and generate image attributes as their output; and, finally, high-level operators take image attributes as their inputs and interpret the attributes, usually producing some kind of knowledge-based control at their output.

FIGURE 1.1: Image processing operations pyramid (pixels in and out at the low level, pixels in and features out at the intermediate level, features in and controls out at the high level)

As illustrated in Figure 1.1, this hierarchical classification can be depicted as a pyramid with the pixel data intensive operations at the bottom level and the more control-intensive, knowledge-based operations at the top level, with feature extraction operations in between at the intermediate level. Each level of the pyramid is briefly explained here, revealing the inherent DLP in many image/video processing operations.

1.2.1 Low-Level Operations

Low-level operations transform image data to image data. This means that such operators deal directly with image matrix data at the pixel level. Examples of such operations include color transformations, gamma correction, linear or nonlinear filtering, noise reduction, sharpness enhancement, frequency domain transformations, etc. The ultimate goal of such operations is either to enhance image data, possibly emphasizing certain key features in preparation for viewing by humans, or to extract features for processing at the intermediate level.

These operations can be further classified into point, neighborhood (local), and global operations [56, 89, 134]. Point operations are the simplest of the low-level operations since a given input pixel is transformed into an output pixel, where the transformation does not depend on any of the pixels surrounding the input pixel. Such operations include arithmetic operations, logical operations, table lookups, threshold operations, etc. The inherent DLP in such operations is obvious, as depicted in Figure 1.2(a), where the point operation on the pixel shown in black needs to be performed across all the pixels in the input image.
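To make the inherent DLP concrete, here is a minimal NumPy sketch of two point operations, thresholding and a gamma-correction table lookup, applied to a toy 3 × 3 grayscale image (the image values and gamma are illustrative only):

```python
import numpy as np

# A point operation transforms each pixel independently of its
# neighbors, so the same transform can be applied to every pixel in
# parallel -- the essence of DLP. Toy 3x3 8-bit grayscale "image":
img = np.array([[ 12, 200,  90],
                [255,  17, 130],
                [ 64,  64, 250]], dtype=np.uint8)

# Thresholding: pixels above 128 map to 255, the rest to 0.
binary = np.where(img > 128, 255, 0).astype(np.uint8)

# Gamma correction via a 256-entry lookup table (another point op).
gamma = 2.2
lut = np.array([round(255 * (i / 255) ** (1 / gamma)) for i in range(256)],
               dtype=np.uint8)
corrected = lut[img]  # one independent table lookup per pixel
```

Both operations apply an identical transformation at every pixel with no inter-pixel dependencies, so they parallelize trivially across the image.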

FIGURE 1.2: Parallelism in low-level (a) point, (b) neighborhood, and (c) global image/video processing operations

Local neighborhood operations are more complex than point operations in that the transformation from an input pixel to an output pixel depends on a neighborhood of the input pixel. Such operations include two-dimensional spatial convolution and filtering, smoothing, sharpening, image enhancement, etc. Since each output pixel is some function of the input pixel and its neighbors, these operations require a large amount of computation. The inherent parallelism in such operations is illustrated in Figure 1.2(b), where the local neighborhood operation on the pixel shown in black needs to be performed across all the pixels in the input image.
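A minimal sketch of one such neighborhood operation, a 3 × 3 averaging (box) filter with replicated borders, written in NumPy for illustration:

```python
import numpy as np

# A neighborhood operation: each output pixel is a function of the
# input pixel and its 3x3 neighborhood. Every window can be processed
# independently, which is what makes such filters data parallel.
def box_filter_3x3(img):
    """3x3 averaging filter; borders handled by replicating edge pixels."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    h, w = img.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy: 1 + dy + h, 1 + dx: 1 + dx + w]
    return (out / 9).astype(img.dtype)

flat = np.full((4, 4), 100, dtype=np.uint8)
smoothed = box_filter_3x3(flat)  # a constant image stays constant
```

Each output pixel applies the same window function to its own neighborhood, so all windows could in principle be computed in parallel.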

Finally, global operations build upon neighborhood operations in that a single output pixel depends on every pixel in the input image [see Figure 1.2(c)]. A prominent example of such an operation is the discrete Fourier transform, in which every output value depends on the entire input image. These operations are quite data intensive as well.
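The global nature of the discrete Fourier transform can be seen directly in NumPy; this toy example uses an arbitrary 8 × 8 array as the "image," and any change to any input pixel alters the DC coefficient:

```python
import numpy as np

# A global operation: every output coefficient of the 2-D discrete
# Fourier transform depends on every pixel of the input image.
img = np.arange(64, dtype=np.float64).reshape(8, 8)  # toy 8x8 "image"
spectrum = np.fft.fft2(img)

# The DC coefficient equals the sum of all pixels -- a direct
# illustration that a single output depends on the whole input.
dc = spectrum[0, 0].real
```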

All low-level operations involve nested looping through all the pixels in an input image with the innermost loop applying a point, neighborhood, or global operator to obtain the pixels forming an output image. As such, these are fairly data-intensive operations, with highly structured and predictable processing, requiring a high bandwidth for accessing image data. In general, low-level operations are excellent candidates for exploiting DLP.

1.2.2 Intermediate-Level Operations

Intermediate-level operations transform image data to a slightly more abstract form of information by extracting certain attributes or features of interest from an image. This means that such operations also deal with the image at the pixel level, but a key difference is that the transformations involved cause a reduction in the amount of data from input to output. Intermediate operations primarily include segmenting an image into regions/objects of interest, extracting edges, lines, contours, or other image attributes of interest such as statistical features. The goal in carrying out these operations is to reduce the amount of data to form a set of features suitable for further high-level processing. Some intermediate-level operations are also data intensive with a regular processing structure, thus making them suitable candidates for exploiting DLP.

1.2.3 High-Level Operations

High-level operations interpret the abstract data from the intermediate level, performing high-level knowledge-based scene analysis on a reduced amount of data. Such operations include classification/recognition of objects or a control decision based on some extracted features. These types of operations are usually characterized by control or branch-intensive operations. Thus, they are less data intensive and more inherently sequential rather than parallel. Due to their irregular structure and low-bandwidth requirements, such operations are suitable candidates for exploiting ILP [20], although their data-intensive portions usually include some form of matrix–vector operations that are suitable for exploiting DLP.

1.2.4 Matrix–Vector Operations

It is important to note that in addition to the operations discussed, another set of operations is also quite prominent in image and video processing, namely matrix–vector operations. Linear algebra is used extensively in image and video processing, and most algorithms require at least some form of matrix or vector operations, even in the high-level operations of the processing chain. Thus, matrix–vector operations are prime candidates for exploiting DLP due to the structure and regularity found in such operations.
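As an illustrative sketch of such a matrix–vector operation, RGB-to-luma conversion applies the same 3-element dot product at every pixel; the ITU-R BT.601 luma weights are used here purely for illustration:

```python
import numpy as np

# Many image/video operations reduce to matrix-vector products applied
# identically at every pixel -- a natural fit for DLP.
weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights

# Toy 2x2 RGB image: red, green, blue, and white pixels.
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.float64)

gray = rgb @ weights  # one dot product per pixel, all independent
```

Because the weights sum to 1, a white pixel maps to full luma (255), and each pixel's result is independent of every other pixel's.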

1.3 DIVERSITY OF OPERATIONS IN IMAGE/VIDEO PROCESSING

From the above discussion, one can see that there is a wide range of diversity in image and video processing operations, starting from regular, high data rate operations at the front end and proceeding toward irregular, low data rate, control-intensive operations at the back end [1].

A typical image/video processing chain combines the three levels of operations into a complete system, as shown in Figure 1.3, where row (a) shows the image/video processing chain, and row (b) shows the decrease in the amount of data from the start of the chain to the end for an N × N image with P bits of precision [126].


FIGURE 1.3: Diversity of operations in image/video processing: (a) typical processing chain, (b) decrease in amount of data across processing chain.

Given the diverse types of operations involved in an image/video processing system, a single processor might not be suitable for implementing a real-time image/video processing algorithm. A more appropriate solution would thus involve a highly data parallel front end coupled with a fast general-purpose back end [1].

1.4 DEFINITION OF “REAL-TIME”

Considering the need for real-time image/video processing and how this need can be met by exploiting the inherent parallelism in an algorithm, it becomes important to discuss what exactly is meant by the term “real-time,” an elusive term that is often used to describe a wide variety of image/video processing systems and algorithms. From the literature, it can be derived that there are three main interpretations of the concept of “real-time,” namely real-time in the perceptual sense, real-time in the software engineering sense, and real-time in the signal processing sense.

1.4.1 Real-time in Perceptual Sense

Real-time in the perceptual sense is used mainly to describe the interaction between a human and a computer device, in which the device responds near instantaneously to an input from the human user.

For instance, Bovik [15] defines the concept of “real-time” in the context of video processing, describing that “the result of processing appears effectively instantaneously (usually in a perceptual sense) once the input becomes available.” Also, Guy [60] defines the concept of “real-time image processing” as the “digital processing of an image which occurs seemingly immediately; without a user-perceivable calculation delay.” An important item to observe here is that “real-time” involves the interaction between humans and computers, in which the use of the words “appears” and “perceivable” appeals to the ability of a human to sense delays. Note that “real-time” connotes the idea of a maximum tolerable delay based on human perception of delay, which is essentially some sort of application-dependent bounded response time.

For instance, the updating of an automatic white balance (AWB) algorithm running on a digital camera need not operate every 33 ms at the maximum frame rate of 30 fps. Instead, updating at approximately 100 ms is sufficient for the processing to seem imperceptible to a human user when white balance gains require adjustment to reflect the surrounding lighting conditions. Thus, as long as the algorithm takes no longer than 100 ms to complete whatever image processing the algorithm entails, it can be considered to be “real-time.” It should be noted that in this example, in certain instances, for example low-light conditions, it might be perfectly valid to relax the “real-time” constraint and allow for extra processing in order to achieve better image quality. The key question is whether an end user would accept the trade-off between slower update rates and higher image quality. From this discussion, one can see that the definition of “real-time” is loose because the maximum tolerable delay is entirely application dependent and in some cases the system would not be deemed a complete failure if the processing happened to miss the “real-time” deadline.
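A minimal sketch of such a perceptual deadline check, assuming a hypothetical awb_update() routine and the roughly 100 ms tolerance discussed above:

```python
import time

# Maximum tolerable delay for an AWB update, per the discussion above.
AWB_BUDGET_S = 0.100

def awb_update():
    """Placeholder for the actual white-balance gain computation."""
    return None

start = time.perf_counter()
awb_update()
elapsed = time.perf_counter() - start

# Missing this bound degrades the experience but is not a hard failure,
# which is what makes the constraint perceptual rather than strict.
met_deadline = elapsed <= AWB_BUDGET_S
```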

1.4.2 Real-time in Software Engineering Sense

Real-time in the software engineering sense is also based on the concept of a bounded response time as in the perceptual sense. Dougherty and Laplante [42] point out that a “real-time system is one that must satisfy explicit bounded response time constraints to avoid failure,” further explaining that “a real-time system is one whose logical correctness is based both on the correctness of the outputs and their timeliness.” Indeed, while any result of processing that is not logically correct is useless, the important distinction for “real-time” status is the all-important time constraint placed on obtaining the logically correct results.

In software engineering, the concept of “real-time” is further classified, based on the strictness attached to the maximum bounded response time, into what is known as hard real-time, firm real-time, and soft real-time. Hard real-time refers to the case where missing a real-time deadline is deemed to be a complete failure. Firm real-time refers to the case in which a certain amount of missed real-time deadlines is acceptable and does not constitute failure.

Finally, soft real-time refers to the case where missed real-time deadlines result in performance degradation rather than failure. In order to manage the priorities of the different tasks of a system, real-time operating systems have been utilized to ensure that deadlines, whether hard, firm, or soft, are met. From a software engineer's point of view, the issue of real-time is more about predictable performance rather than just fast processing [90].

1.4.3 Real-time in Signal Processing Sense

Real-time in the signal processing sense is based on the idea of completing processing in the time available between successive input samples. For example, in [81], “real-time” is defined as “completing the processing within the allowable or available time between samples,” and it is stated that a real-time algorithm is one whose total instruction count is “less than the number of instructions that can be executed between two consecutive samples.” In [1], “real-time processing” is defined as the computation of “a certain number of operations upon a required amount of input data within a specified interval of time, set by the period over which the data arrived.” In addition to the time required for processing, the times required for transferring image data and for other memory-related operations pose additional bottlenecks in most practical systems, and thus they must be taken into consideration [124].
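This notion can be captured in a rough feasibility check. The sketch below assumes a hypothetical processor sustaining a given number of operations per second and, for simplicity, ignores the memory-transfer overheads just mentioned:

```python
# Real-time in the signal processing sense: the per-frame workload must
# fit within the inter-frame interval.
def fits_between_frames(ops_per_frame, fps, ops_per_second):
    """True if the workload fits in the time between consecutive frames."""
    return ops_per_frame <= ops_per_second / fps

# E.g. a 3x3 filter over a 640 x 480 frame: ~9 multiply-accumulates per
# pixel, against a processor sustaining one billion operations a second.
ops = 640 * 480 * 9  # 2,764,800 operations per frame
feasible = fits_between_frames(ops, fps=30, ops_per_second=1e9)
```

Here roughly 33 million operations are available per 33 ms frame interval, so the 2.8 million-operation workload fits comfortably.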

An important item of note here is that one way to gauge the “real-time” status of an algorithm is to determine some measure of the amount of time it takes for the algorithm to complete all requisite transferring and processing of image data, and then to make sure that it is less than the allotted time for processing. For example, in multimedia display devices, screen updates need to occur at 30 fps for humans to perceive continuous motion, and thus any picture enhancement or other type of image/video processing must occur within the 33 ms time frame.

It should be pointed out that, in image/video processing systems, it is not always the case that the processing must be completed within the time afforded by the inverse frame rate, as was seen in the above AWB update example.

1.4.4 Misinterpretation of Concept of Real-time

A common misunderstanding regarding the concept of “real-time” is that since hardware is getting faster and more powerful each year, “real-time” constraints can be met simply by using the latest, fastest, most powerful hardware, thus rendering “real-time” a nonissue. The problem with this argument is that such a solution is often not a viable one, especially for consumer electronics embedded systems that have constraints on their total system cost and power consumption. For instance, it does not make sense to bundle the engineering workstation used to develop an image processing algorithm into a digital camera just for the purpose of running the algorithm in real-time.


1.4.5 Challenges in Real-time Image/Video Processing

Bearing in mind the above argument, developing a real-time image/video processing system can be quite a challenge. The solution often ends up as some combination of hardware and software approaches. From the hardware point of view, the challenge is to determine what kind of hardware platform is best suited for a given image/video processing task among the myriad of available hardware choices. From the algorithmic and/or software point of view, the challenge involves being able to guarantee that “real-time” deadlines are met, which could involve making choices between different algorithms based on computational complexity, using a real-time operating system, and extracting accurate timing measurements from the entire system by profiling the developed algorithm.

1.5 HISTORICAL PERSPECTIVE

The development of digital computers, electronic image sensors coupled with analog-to-digital converters, along with the theoretical developments in the field of multidimensional signal processing have all led to the creation of the field of real-time image and video processing.

Here, an overview of the history of image processing is presented in order to gain some perspective on where this field stands today.

1.5.1 History of Image/Video Processing Hardware Platforms

The earliest known digital image processing, the processing of image data in digital form by a digital computer, occurred in 1957 with the first picture scanner attached to the National Bureau of Standards Electronic Automatic Computer (SEAC), built and designed by the scientists at the United States National Bureau of Standards, now known as the National Institute of Standards and Technology [86]. This scanner was used to convert an analog image into discrete pixels, which could be stored in the memory of the SEAC. The SEAC was used for early experiments in image enhancement utilizing edge enhancement filters. These developments, stimulated by the search for innovative uses of the ever-increasing computation power of computers, eventually led to the creation of the field of digital image processing as it is known today.

Around the same time frame, in the 1960s, developments at NASA’s Jet Propulsion Laboratory led to the beginning of electronic imaging using monochrome charge-coupled device enabled electronic still cameras [56]. The need for obtaining clear images from space exploration was the driving force behind the uses of digital cameras and digital image processing by NASA scientists.

With such technology at hand, new applications for image processing were quickly developed, most notably, among others, industrial inspection and medical imaging. Of course, due to the inherent parallelism in the commonly used low-level and intermediate-level operations, architectures for image processing were built to be massively parallel in order to cope with the vast amounts of data that needed to be processed. While the earliest computers used for digital processing of images consisted of large, parallel mainframes, the drive for miniaturization and advancements in very large scale integration (VLSI) technology led to the arrival of small, power-efficient, cost-effective, high-performance processor solutions, eventually bringing the processing power necessary for real-time image/video processing into a device that could fit in the palm of one’s hand and go into a pocket.

It used to be that when an image/video system design required real-time throughput, multiple boards with multiple processors working in parallel were used, especially in military and medical applications where in many cases cost was not a limiting factor. With the development of programmable digital signal processor (DSP) technology in the 1980s, though, this way of thinking was about to change. The following decade saw the introduction of the first commercially available DSPs, which were created to accelerate the computations necessary for signal processing algorithms. DSPs helped to usher in the age of portable embedded computing.

The mid-1980s also saw the introduction of programmable logic devices such as the field programmable gate array (FPGA), a technology that aimed to unite the flexibility of software through programmable logic with the speed of dedicated hardware such as application-specific integrated circuits. In the 1990s, there was further growth in both DSP performance, through increased use of parallel processing techniques, and FPGA performance to meet the needs of multimedia devices, along with a push toward the concept of system-on-chip (SoC), which sought to bring all necessary processing power for an entire system onto a single chip. The trend toward SoC design continues today [71].

In addition to these developments, a recent trend in the research community has been to harness the massive parallel computation power of the graphics processing units (GPUs) found in most modern PCs and laptops for performing compute-intensive image/video processing algorithms [110]. Currently, GPUs are used only in desktops or laptops, but they are soon expected to be found in embedded devices as well. Another recent development that started in the late 1990s and early 2000s is the idea of a portable multimedia supercomputer that combines the high-performance parallel processing power needed by low-level and intermediate-level image/video operations with the high energy efficiency demanded by portable embedded devices [54].

1.5.2 Growth in Applications of Real-time Image/Video Processing

Alongside the developments in hardware architectures for image/video processing, there have also been many notable developments in the application of real-time image/video processing.

Lately, digital video surveillance systems have become a high-priority topic of research worldwide [6, 16, 36, 37, 40, 45, 69, 98, 149, 155]. Relevant technologies include automatic, robust face recognition [11, 28, 92, 112, 146], gesture recognition [111, 142], tracking of human or object movement [9, 40, 61, 68, 76, 92, 102, 151], distributed or networked video surveillance with multiple cameras [17, 37, 53, 75], etc. Such systems can be categorized as hard real-time systems and require one to address some difficult problems when deployed in real-world environments with varying lighting conditions. Along similar lines, one can mention the development of smart camera systems [20], which have many useful applications such as lane change detection warning systems in automobiles [133], monitoring driver alertness [72], or intelligent camera systems that can accurately adjust for focus [52, 79, 115, 116], exposure [13, 78, 108], and white balance [30, 78, 108] in response to a changing scene. Other interesting areas of research include developing fast, efficient algorithms to support the image/video coding standards set forth by the standards committees [22, 26, 31, 33, 48, 57, 70, 73, 82, 87, 106, 144]. In the never-ending quest for a perfect picture, research in developing fast, high-quality algorithms for processing pictures/videos captured by consumer digital cameras or cell-phone cameras [80] is expected to continue well into the future. Of course, the developments in industrial inspection [25, 34, 67, 135, 147] and medical imaging systems [18, 23, 24, 44, 136, 143, 145] will continue to progress. The use of color image data [8, 85, 107, 109], or in some cases, multispectral image data [139] in real-time image/video processing systems is also becoming an important area of research.

It is worth mentioning that the main sources of inspiration for all the efforts in the applications of real-time image/video processing are biological vision systems, most notably the human visual system. As Davies [35] puts it, “if the eye can do it, so can the machine.” This requires using our knowledge along with the available algorithmic, hardware, and software tools to properly transition algorithms from research to reality.

1.6 TRADEOFF DECISIONS

Designing real-time image/video processing systems is a challenging task indeed. Given a fixed amount of hardware, certain design trade-offs will most certainly have to be made during the course of transitioning an algorithm from a research development environment to an actual real-time operation on some hardware platform. Practical issues of speed, accuracy, robustness, adaptability, flexibility, and total system cost are important aspects of a design and in practice, one usually has to trade one aspect for another [35]. In real-time image/video processing systems, speed is critical and thus trade-offs such as speed versus accuracy are commonly encountered.

Since the design parameters depend on each other, the trade-off analysis can be viewed as a system optimization problem in a multidimensional space with various constraint curves and surfaces [35]. The problem with such an analysis is that, from a mathematical viewpoint, methods to determine optimal working points are generally unknown, although some progress is being made [62]. As a result, one is usually forced to proceed in an ad hoc manner.

1.7 CHAPTER BREAKDOWN

It could be argued that we are at a crossroads in the development of real-time image/video processing systems. Although high-performance hardware platforms are available, it is often difficult to easily transition an algorithm onto such platforms.

The advancements in integrated circuit technology have brought us to the point where it is now feasible to put into practical use the rich theoretical results obtained by the image processing community. The value of an algorithm hinges upon the ease with which it can be placed into practical use. While the goal of implementing image/video processing algorithms in real-time is a practical one, the implementation challenges involved have often discouraged researchers from pursuing the idea further, leaving it to someone else to discover the algorithm, explore its trade-offs, and implement a practical version in real-time. The purpose of the following chapters is to ease the burden of this task by providing a broad overview of the tools commonly used in practice for developing real-time image/video processing systems. The rest of the book is organized as follows:

Chapter 2: Algorithm Simplification Strategies

In this chapter, the algorithmic approaches for implementing real-time image/video processing algorithms are presented. It includes guidelines on how to speed up commonly used image/video processing operations. These guidelines are gathered from the recent literature spanning the past five years.

Chapter 3: Hardware Platforms for Real-Time Image and Video Processing

In this chapter, the hardware tools available for implementing real-time image/video processing systems are presented, starting from a discussion on what kind of hardware is needed for a real-time system and proceeding through a discussion on the processor options available today such as DSPs, FPGAs, media-processor SoCs, general-purpose processors, and GPUs, with references to the recent literature discussing such hardware platforms.

Chapter 4: Software Methods for Real-Time Image and Video Processing

This chapter covers the software methods to be deployed when implementing real-time image/video processing algorithms. Topics include a discussion on software architecture designs followed by a discussion on memory and code optimization techniques.

Chapter 5: The Road Map

The book culminates with a suggested methodology or road map for the entire process of transitioning an algorithm from a research development environment to a real-time implementation on a target hardware platform using the tools and resources mentioned throughout the previous three chapters.