Rendering Multispectral data as useful 'Super-Visual' images

Matthew Gypson


Introduction

To fulfill a desire to better evaluate and understand the world around us, we often need tools that reach beyond the capabilities of human perception. Our visual system provides one means of perception, but it has a limited window on the world: wavelengths of roughly 400-700nm in the electromagnetic spectrum. Ancient documents such as the Dead Sea Scrolls sometimes contain important text that is visible only outside this range. To evaluate this information, it is necessary to have a means of visually inspecting it. Image information is typically captured through a bandpass filter that isolates the spectral range of interest and is recorded as grayscale values corresponding to the intensity of the scene. To render information over a wide range of wavelengths, multiple grayscale images are captured through different filters. These images are traditionally compared with each other to look for object details. This process is time consuming and requires an eye trained to look for important details. Labor-intensive preprocessing is required to extract useful information band by band, and a method may have to be developed that is specific to the particular subject. It would be advantageous to have an algorithm that is fast and robust and that generates useful results from many different data sets and types of subjects.

A color image can be generated by overlaying images captured with red, green and blue filters. These are the three basic components or groups of color for human perception. The spectral information we perceive is limited to combinations of these three color bands. Much like the transmissive filters placed in front of a digital camera, the cones in our retina isolate specific spectral regions. Though we cannot manipulate the biology of the eye, we can manipulate multispectral image data to construct an image that conveys information that the unaided eye cannot see. All bands must be manipulated to generate a perceptible "super-visual" image. It is useful to avoid image-specific preprocessing as much as possible, to speed up the rendering of an image for display. The method aims to compress the most important details of wide spectra into the confines of human perception. The task is to implement data and spectral compression.

Background

Several methods are possible for converting four or more images into the three RGB images. This research explored only the one deemed most likely to give useful results. The first conception of this project was the idea of compressing a wide spectral range into a narrower one. This is a type of "data fusion". Munechika (1990) distinguishes three classes of fusion algorithms.

1. "Fusion for visual display" is intended to produce images that look good to a human interpreter. Histogram manipulation, contrast stretching, and scaling transformations are all examples of such algorithms. This is the class of method used to develop the algorithm implemented in this project.

2. "Fusion by separate manipulation of the spatial information" is a "component-substitution" (COS) algorithm. It is based on the fact that areas that are bright or dark in one band tend to be bright or dark in at least some of the other bands, according to Schott (1997). The digital counts of two bands are plotted on a 2-D histogram. The data are distributed in an elongated band; in other words, digital numbers tend to increase together in both bands. If we know the gray level in one band, we can predict the approximate value in the adjacent band. This correlation means there is redundancy in the information in a multispectral data set. COS algorithms create a new coordinate system by creating linear combinations of pixels from the original coordinate system. This method can be used for multispectral data sets with any number of bands.

"Component Substitution"
Fig. 1: Gray level in one band can predict approximate value in adjacent band

3. "Fusion for radiometric integrity" creates ratio images by dividing the digital count (DC) in one band by the corresponding DC value in another band for each pixel. The resulting ratios are plotted as an image as demonstrated by Chavez (1991). In a ratio image, the black and white extremes of the gray scale represent pixels having the greatest difference in reflectivity between the two spectral bands. The darkest signatures are areas where the denominator of the ratio is greater than the numerator. Conversely, the numerator is greater than the denominator for the bright signatures. Where the denominator and numerator are equal, there is no difference between the two bands and a midgray is created.
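As an illustration of the ratio image described in item 3, the following is a minimal sketch and not the method of Chavez (1991). The function name `ratio_image`, the small offset `eps`, and the logarithmic scaling (chosen here so that equal digital counts land on mid-gray) are assumptions made for this example.

```python
import math

def ratio_image(band_a, band_b, eps=1.0):
    """Divide band_a by band_b pixel by pixel and map the result to 0..255.

    Assumption: the log of the ratio is scaled symmetrically so that a
    ratio of one (equal digital counts) is displayed as mid-gray.
    eps avoids division by zero for black pixels.
    """
    logs = [[math.log((a + eps) / (b + eps)) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(band_a, band_b)]
    peak = max(abs(v) for row in logs for v in row) or 1.0
    return [[round(127.5 + 127.5 * v / peak) for v in row] for row in logs]

# Hypothetical 2x2 digital counts for two bands of the same scene.
band_a = [[10, 100], [200, 50]]
band_b = [[100, 100], [50, 50]]
result = ratio_image(band_a, band_b)
```

Pixels where the denominator band is larger come out dark, pixels where the numerator band is larger come out bright, and pixels with equal counts sit near the middle of the gray scale.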

 

 

Theory

The human visual system has a limited range of spectral sensitivity (wavelengths from 400nm to 700nm) and three receptors that roughly correspond to three spectral bands: red, green, and blue. These receptors (cones) sample the spectral range with sensitivities that peak at three different wavelengths. In order to be perceived, the data of a color scene must be transformed for display into a tricolor system. If you were to look closely at a computer monitor or television screen, you would see that each picture element, or pixel, contains three colors. The combination of the red areas makes up the "red image"; the same is true for the green and blue. Each color image consists of three spectral samples of the object/scene at each pixel. Blue, green, and red values are integrated over bands of wavelengths centered at approximately 400nm, 555nm, and 700nm, respectively. This transformation is a routine procedure for an image taken within the visual spectrum, but this is not the case in this system.

Imaging systems that sample spectral bands outside the visual range require more complex transformations. These "multiple spectra" images include separate wavelength intervals, or bands. A multispectral image differs from a conventional color photograph, which combines three overlapping spectral ranges in the visual region. Multispectral images contain more than three bands, and several of those bands are usually outside the visual range. Only three images can be displayed as RGB on a monitor or television, so additional processing is required to convert the multispectral images for display. Spectral reflectance curves, or reflectance spectra, record the percentage of incident energy that is reflected by a material as a function of the wavelength of the energy. Dips in the spectral reflectance curve are called absorption features because they represent absorption of incident energy; peaks represent reflection of incident energy. These spectral features are clues for recognizing materials and details in images.

Gross (1996) has developed an image-fusion algorithm for remote sensing that combines images with low spatial and high spectral resolution with images that have good spatial resolution but poor spectral resolution. It uses the spectral signature to estimate the percentage of each material within each low-resolution pixel. To achieve high spectral resolution, a narrow-band filter is used to restrict the range of wavelengths. However, the detector size must increase to compensate for the reduced irradiance, thus decreasing the spatial resolution. A filter with a wider bandwidth allows more light to pass onto the detector but also decreases the spectral resolution. Gross's algorithm combines an image made with high spatial resolution with wide-spectrum data from an image cube with low spatial resolution. This algorithm requires images from two sensor arrays. Most, if not all, remote sensing applications benefit from high-resolution images. To obtain high spatial resolution, spectral data is sacrificed unless an object is imaged twice, in which case a fusion algorithm is used to combine the two images. Most image fusion methods are designed for multispectral remote sensing data. Their goals are to distinguish between different materials within a scene and to assign a false color to each. For example, carbon materials have a different spectral reflectance than copper materials and would be assigned a different false color. The 'super-visual' algorithm has a fixed high resolution. Gross's algorithm approaches the goal of a super-visual image in that it distinguishes certain features from others by interpreting features within the wide spectral range. There is no need to use two resolutions, since high resolution is possible both spatially and spectrally with the data used for super-visual images.

 

 

Methods

Super-visual images may be used to evaluate the presence of information across a wide spectrum. When compared to a color image, any additional details in a super-visual image would indicate information present in bands outside the visible spectrum. The developed compression algorithm is designed for six input images. The test subjects used for the algorithm development contained important image details at ultraviolet (UV) and infrared (IR) wavelengths. Six filters (UV, Blue, Green, Red, 2 IR) were used to give a spectral range from 300nm to 1000nm, which is thought to contain many important image details. Not all subjects contain important information in all six bands, but this would not be known until after each band was evaluated. It was known that important information is present in the ultraviolet portion of the spectrum for images of ancient documents. Six bands were chosen for consistency and to avoid further need for image capture. The algorithm is easily adaptable to incorporate more or fewer bands if necessary.

Fig. 2: Transmittance of filters used with test data

six bands captured with six filters (UV, Blue, Green, Red, 2 Infrared)
Fig. 3: Six images of the same subject through six different filters

There are four free parameters to these digital color images: spatial resolution and the three available image bands. The same scene is captured with a SENSYS camera (Photometrics, Munich, Germany) through six filters (or bands) so that the same pixel location in all six images corresponds to the same location in the scene imaged. Each pixel has six associated gray values, one for each of the six filters used to capture a different portion of the spectrum. To generate a spectrum in the visible range that approximates a wide spectrum, an interpolation function is created. The original spectrum cannot be fully recovered because it is sampled; the ideal interpolation function would exactly duplicate the spectrum emitted from the subject. The interpolation function used is only a good estimate, based on the amplitude integrated over each filter's transmittance range. The value of the amplitude is assigned at the peak wavelength of each band. The amplitude of the spectrum between the band values is unknown. To estimate the unknown values, an approximation of the original spectrum is created by using the bands as guides. The number of values to be estimated in the spectrum determines the size of a new array. I chose to use sixty values. The pixel filter values are separated in the array by the same incremental spacing. For a faster calculation, the interpolation function array is converted into frequency space. The spatial points are treated as discrete delta functions. To generate a smooth continuous curve, the delta functions are convolved with Gaussian functions. The width of the Gaussian was set to the spacing between bands in order to avoid excess dip or rise between bands.
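The interpolation step can be sketched as follows. This is an illustration under stated assumptions, not the original IDL code: the band positions (centered in equal cells of the sixty-sample array), the Gaussian form exp(-pi (x/width)^2), the direct summation in the spatial domain instead of frequency space, and all names are choices made for this example.

```python
import math

N_SAMPLES = 60                     # length of the interpolated spectrum (from the text)
N_BANDS = 6                        # UV, Blue, Green, Red, IR, IR
SPACING = N_SAMPLES // N_BANDS     # 10 samples between neighboring bands

def band_positions():
    # Assumed placement: each band sits at the center of its own 10-sample cell.
    return [SPACING // 2 + k * SPACING for k in range(N_BANDS)]

def gaussian(x, width):
    # Assumed Gaussian form; the text does not define "width" precisely.
    return math.exp(-math.pi * (x / width) ** 2)

def interpolate_spectrum(pixel_values, width=SPACING):
    """Convolve six delta functions (one per band) with a Gaussian.

    Convolving a weighted delta with a Gaussian places a scaled copy of the
    Gaussian at the band position, so the result is a sum of six Gaussians.
    """
    positions = band_positions()
    return [sum(v * gaussian(i - p, width) for v, p in zip(pixel_values, positions))
            for i in range(N_SAMPLES)]

# Hypothetical gray values of one pixel in the six bands.
pixel = [40, 80, 120, 200, 90, 30]
spectrum = interpolate_spectrum(pixel)
```

With the width equal to the band spacing, each Gaussian has fallen to a few percent of its peak by the time it reaches the neighboring band, so the curve passes close to the measured values without a deep dip between them.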

Interpolation function

The interpolation created for each pixel is converted to color by three weighting functions, one for each of the display channels. These functions weight the interpolated spectrum by different amounts. They were designed to represent the entire spectrum equally, i.e., the weights sum to unity at every sample. For calculation, the weighting functions must have the same number of samples as the interpolated spectrum. Each pixel in the super-visual image contains information from three channels. The value of each channel is the sum over all samples of the corresponding weighting function (the red weighting function for the red channel) multiplied by the interpolation function; the interpolation function is likewise multiplied by the green and blue weighting functions. The sum is sometimes out of scale for an 8-bit image, so all three channels are scaled so that their minimum and maximum are 0 and 255.
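A minimal sketch of the weighting and scaling steps follows. The centers and width of the Gaussians, the explicit normalization that forces the weights to sum to exactly one, the ordering of the array from ultraviolet (sample 0) to infrared (sample 59), and the function names are all assumptions made for illustration.

```python
import math

N_SAMPLES = 60

def weighting_functions(n=N_SAMPLES):
    """Return (red, green, blue) weights that sum to 1 at every sample.

    Assumed layout: blue covers the low (ultraviolet) indices and red the
    high (infrared) indices.
    """
    centers = {"blue": n / 6, "green": n / 2, "red": 5 * n / 6}
    width = n / 3   # hypothetical width, equal to the spacing between centers
    raw = {c: [math.exp(-math.pi * ((i - mu) / width) ** 2) for i in range(n)]
           for c, mu in centers.items()}
    totals = [raw["red"][i] + raw["green"][i] + raw["blue"][i] for i in range(n)]
    # Normalizing is an added step that makes the sum exactly one.
    return tuple([raw[c][i] / totals[i] for i in range(n)]
                 for c in ("red", "green", "blue"))

def pixel_rgb(spectrum, weights):
    # Each channel is the sum over samples of weight times interpolated spectrum.
    return tuple(sum(w * s for w, s in zip(channel, spectrum)) for channel in weights)

def scale_to_8bit(values):
    # Linear stretch so the smallest value maps to 0 and the largest to 255.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

weights = weighting_functions()
flat = [100.0] * N_SAMPLES                              # equal amplitude everywhere
infrared_heavy = [float(i) for i in range(N_SAMPLES)]   # rises toward the IR end
rgb_flat = pixel_rgb(flat, weights)
rgb_ir = pixel_rgb(infrared_heavy, weights)
red_channel_8bit = scale_to_8bit([rgb_flat[0], rgb_ir[0]])
```

Because the weights sum to one at every sample, the three channel values of a pixel add up to the total of its interpolated spectrum. In a full image, `scale_to_8bit` would be applied to the values of a channel over all pixels.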

Fig. 4: Example of the interpolation function with the three color weighting functions

Fig. 5: Gaussian weighting functions used in calculation to generate RGB image

Fig. 6: Each RGB color image is calculated separately and then combined

Flat fielding

Not all input data is in a form that yields a useful output. This was evident in the scroll test data, which had two main problems. Algorithms for flat fielding and translation were developed to correct these problems. Some images range from very dark to very light portions of the scene, and the 'super-visual' algorithm was not useful for perceiving details contained in the darkest portions. Flat fielding eliminates slowly varying changes in brightness. To achieve a "uniform" scene, the original image is blurred to reveal only the coarse variation in brightness. Calculating the blurred image in the spatial domain is time consuming; frequency-domain processing is much quicker. The lowpass filter kernel is an array that is zero-valued everywhere except for a small area in the center (for example, 5x5) that is equal to one. Depending on the subject and the effect of the lighting, the center area could be 3x3 or 5x5, with a 3x3 area producing the most blurred image. The Fourier transform is displayed with the low (slowly varying) frequencies in the center, so when it is multiplied by the lowpass filter only the low frequencies are passed.
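The flat-fielding idea can be sketched on a tiny synthetic image. This example makes several assumptions that are not stated in the text: the correction divides the original image by its blurred version, a naive discrete Fourier transform stands in for a fast transform, the zero frequency is left at index 0 rather than shifted to the center (so the 3x3 block wraps around the corners of the array), and the function names are hypothetical.

```python
import cmath
import math

def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a small square array."""
    n = len(img)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = sign * 2 * cmath.pi * (u * y + v * x) / n
                    total += img[y][x] * cmath.exp(1j * angle)
            out[u][v] = total / (n * n) if inverse else total
    return out

def lowpass_blur(img, half_width=1):
    """Keep frequencies within half_width of zero (a 3x3 block for half_width=1)."""
    n = len(img)
    spectrum = dft2(img)
    for u in range(n):
        for v in range(n):
            fu = min(u, n - u)   # distance from zero frequency, with wrap-around
            fv = min(v, n - v)
            if fu > half_width or fv > half_width:
                spectrum[u][v] = 0j
    return [[value.real for value in row] for row in dft2(spectrum, inverse=True)]

def flat_field(img, half_width=1, eps=1e-6):
    # Assumed correction: divide the original by the coarse brightness variation.
    blurred = lowpass_blur(img, half_width)
    return [[p / max(b, eps) for p, b in zip(row_p, row_b)]
            for row_p, row_b in zip(img, blurred)]

# Synthetic 8x8 scene: slow illumination change times a fine checkerboard detail.
n = 8
image = [[(100 + 50 * math.cos(2 * math.pi * x / n)) * (1 + 0.1 * (-1) ** (x + y))
          for x in range(n)] for y in range(n)]
flat = flat_field(image)
```

In this constructed case the illumination contains only the lowest frequencies, so the blurred image recovers it and the division leaves just the checkerboard detail, with values near 0.9 and 1.1 across the whole image.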

Flat Fielding
Fig. 7: Images of test data without Flat Fielding (left), and with Flat Fielding (right)

Translation

Sometimes the six input images are not all registered, i.e., the pixels do not "line up". This is usually due to movement of the camera while filters are changed, which means that the camera had a different field of view in at least one image. The result is that an unregistered or "ghost" image can be seen in the 'super-visual' image. Many techniques have been developed to register images automatically, and the speed of the algorithm usually depends on the desired accuracy. It is much faster to process a translation in frequency space. The shifted image can be thought of as the original image convolved with a delta function shifted from the origin (center of image); the translation formulas are written out below. If the images were identical, we could simply divide the transform of the shifted image by that of the unshifted image to leave the transform of the delta function. All six images have the same main features but are certainly not identical. When two different images are divided in frequency space, the result looks like noise and has no distinguishable delta function: the differences between the images are highlighted along with the shift, and the two cannot be separated, so division is ineffective for the purpose of this project. The effective technique treats the first image as a "reference" and compares the other five to it. Multiplying the transform of the reference image by the complex conjugate of the transform of the shifted image yields a magnitude and a phase. The phase contains the distance and direction of the shift. Since both images are similar, their main difference comes from the phase. The shift does not appear as a "delta" but rather as the peak of a function with finite width. In the product, the magnitude terms multiply, the image phases almost cancel, and the remaining delta function shifts the result. This is not entirely accurate but is an effective indicator of the shift and is enough to get a useful result.

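Steps required to translate image #2 to closely match the position of reference image #1

Task: Find $x_0$ such that $g_2[x] = f_2[x] * \delta[x - x_0]$, where $g_2[x]$ is image #2 as captured and $f_2[x]$ most closely approximates $f_1[x]$ (reference image #1). Here $*$ denotes convolution and capital letters denote Fourier transforms.

Ideal Case: The two images are identical but for unknown translation $x_0$

$$g_2[x] = f_1[x] * \delta[x - x_0]$$

then

$$G_2[\xi] = F_1[\xi] \, e^{-2\pi i \xi x_0}$$

So that

$$\frac{G_2[\xi]}{F_1[\xi]} = e^{-2\pi i \xi x_0}$$

Essentially: $\mathcal{F}^{-1}\{ G_2[\xi] / F_1[\xi] \} = \delta[x - x_0]$, an unknown translation

Realistic Case: The image to be registered differs from the reference image, so that:

$$g_2[x] = f_2[x] * \delta[x - x_0], \qquad f_2[x] \neq f_1[x]$$

So that

$$F_1[\xi] \, G_2^{*}[\xi] = F_1[\xi] \; e^{+2\pi i \xi x_0} \; F_2^{*}[\xi] \quad\Longleftrightarrow\quad f_1[x] \star g_2[x] = \left( f_1[x] \star f_2[x] \right) * \delta[x + x_0]$$

* The star ($\star$) indicates a correlation, which is nearly an auto-correlation because the two images are similar *

The end terms combine and the middle term indicates the shifted position of $g_2[x]$ (image #2) compared with $f_1[x]$ (the reference image #1)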
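The registration idea can be illustrated in one dimension with a short sketch. It assumes circular (wrap-around) shifts and uses a naive discrete Fourier transform in place of a fast one; the function names and the test signals are hypothetical and are not taken from the original code.

```python
import cmath

def dft(signal, inverse=False):
    """Naive 1-D discrete Fourier transform."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[x] * cmath.exp(sign * 2j * cmath.pi * k * x / n)
                    for x in range(n))
        out.append(total / n if inverse else total)
    return out

def estimate_shift(reference, shifted):
    """Return x0 such that shifted[x] is approximately reference[x - x0]."""
    n = len(reference)
    f1 = dft(reference)
    g2 = dft(shifted)
    # Reference transform times the complex conjugate of the shifted transform.
    product = [a * b.conjugate() for a, b in zip(f1, g2)]
    correlation = [value.real for value in dft(product, inverse=True)]
    peak = correlation.index(max(correlation))
    # With this sign convention the correlation peak sits at -x0 (modulo n).
    x0 = (-peak) % n
    return x0 if x0 <= n // 2 else x0 - n

n = 32
reference = [0.0] * n
reference[5], reference[6], reference[7], reference[20] = 1.0, 3.0, 1.0, 2.0

# Image #2 shares the main features but is not identical to the reference.
second = [0.8 * v + 0.1 for v in reference]
second[12] += 0.5
shifted = [second[(x - 3) % n] for x in range(n)]       # moved by x0 = 3 samples

x0 = estimate_shift(reference, shifted)
registered = [shifted[(x + x0) % n] for x in range(n)]  # undo the translation
```

The peak of the correlation has finite width because the two signals differ, but its position still gives the translation, which is then undone by shifting image #2 back by the estimated amount.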

 

 

Results

This research succeeded in presenting important spectral details out of the visual range in the final image. This was the first and most important step toward useful results. Success of this first goal indicates that uses of the final image can be explored. The scroll test data contains high contrast text as well as subtler (but equally important) image features. This test data worked well to demonstrate where details are present in the final image. The nature of the test data made evaluation of the result fast and easy. It was easy to see if important details were visible from all 6 bands.

The first algorithm was written in the space domain using loops. This required the program to process multiple time-consuming computations at each pixel individually. It took approximately 50 minutes to process a series of six 1200 x 1200 images. Without changing the algorithm, I rewrote similar code to handle everything in the frequency domain, which reduced the computation time to about 15 minutes or less for images of the same size.

Details from bands outside of the visual range were visible in the final image. Future modifications may help to show which algorithms are most effective in creating super-visual images. The weighting functions were created to separate the spectral range into three distinct groups. It was thought to be important to have each wavelength equally represented, so that the sum of the three weighting functions is equal across the spectral range. The sum of the Gaussian functions used in the weighting function algorithm closely approximated this result. There may be advantages to an unequal sum, but they were not explored. Linear weighting functions were also applied to the algorithm. There was little to no perceptible difference between 'super-visual' images that used Gaussian or linear weighting functions. Upon closer examination and with quantitative image measurements, this should prove not to be entirely true. The construction of the weighting functions determines the output, so whether or not two different weighting functions are perceptibly different, they do produce different values at each pixel. These differences would be perceived if they were sufficiently large.

The 'super-visual' algorithm is not designed to be radiometrically accurate. Each spectral band is spaced equally in the algorithm and does not correspond to its real spectral spacing. If radiometric accuracy required bands to be spaced in proportion to the true spectrum, the 'super-visual' image would look different. To maintain proportionality, the interpolation function would be needed. Potentially more accurate interpolators can be found in Castleman (1996) and could be tested, but they would most likely take longer to process. Gaussian functions with different widths would be required to create the interpolation function. The weighting functions could stay the same. Without an interpolation function to fill in the gaps between bands, the weighting functions would be biased towards certain bands. This means that the weighting functions would incorporate an unequal number of bands. Those bands spaced further from the others would be weighted more and thus displayed with greater amplitude. Since this algorithm is designed for human perception, and we are not adapted to perceiving or understanding a scene with a wide spectral range, radiometric or spectral accuracy should not be an important issue for most applications.

Visual versus 'Super-Visual' image

Fig. 8: Image of 'visual' color image (left), and 'Super-Visual' color image (right) with added details from additional bands

 

 

Discussion

The success of a 'super-visual' image is determined by its usefulness, which is based on the perception of details otherwise imperceptible in a typical RGB color image. If the algorithm displays those details in a way that is easy to interpret, then the 'super-visual' image is useful. To test its usefulness, the 'super-visual' image is compared to each individual band to confirm that important detail was not lost during processing. Fortunately, a useful 'super-visual' image doesn't need to be compared with any bands; it contains all the information (details) contained in the processed bands. The ease of interpretation is subject dependent and relies on knowledge of the subject. Since the 'super-visual' image isn't radiometrically accurate, its colors don't give much of an indication of the wide-band spectrum. Without knowledge of the subject, the colors generated may seem distracting. False color in the 'super-visual' image may give some indication of band amplitude, though. In the case of the test data, red indicated higher infrared amplitude. Comparing the 'super-visual' image with an RGB image of a subject gave a helpful indication of which spectral regions contained certain information.

 

 

Conclusions

This research addressed a need for data compression. The desired result was to be a faster and easier evaluation of a subject containing important details in a wide spectral range. The effectiveness of the compression method is demonstrated by the visibility of important characters in the Scroll test data from bands outside of the visible spectral range. The constructed algorithms successfully compressed sampled wide-band electromagnetic radiation data within the visible range. Important image details were present within a final color image indicating that this method generally yields useful results. Other methods may yield useful results as well and further development and testing of the compression method developed in this project may enhance the results.

 
