Characterization of the Effects of Spatial Translations on JPEG-compressed Images
Keith Jacoby
The Compression Standard Established by the Joint Photographic Experts Group
What happens when a digital image is compressed using the JPEG scheme? How is JPEG able to reduce the size of images so substantially? Why is the JPEG standard so widely used in the digital imaging field? Once some familiarity is gained with the JPEG compression scheme, these questions can be easily answered. This report assumes a cursory knowledge of digital imaging; e.g. the reader should be familiar with pixels, resolution, sampling, spatial frequency, and raw gray scale digital image structure in general.
JPEG compression is lossy. In other words, when a digital image is compressed using JPEG, operations within the JPEG compression scheme throw away data. The discarded data is in the form of higher frequency spatial information in the image, which is less visible. When this compressed data is stored as a file and then re-opened, the image exhibits error, or JPEG artifacts(1). If the compression ratio is large, these artifacts show up as clearly visible blocking, with patterns sometimes seen in the blocks. The figures below exhibit the blocking that occurs with a typical greyscale image when compressed using the JPEG scheme. The image in Figure 1 is a greyscale photographic image that exhibits no loss, because it is not compressed using the JPEG compression scheme. Figure 2, on the other hand, is compressed rather heavily using a JPEG quality factor of 20.
Figure 1: "Lossless" image
Figure 2: JPEG image (QF = 20)
The JPEG Blocking Scheme: further detail (2)
The baseline JPEG algorithm partitions the image into 8 x 8-pixel blocks. The pixel data in each block are processed using a discrete cosine transform (DCT), which generates a frequency representation of the 8 x 8 block. The DCT coefficients are quantized, and the resulting integer coefficients are Huffman coded. The quantization eliminates DCT coefficients with small amplitudes, thus discarding the higher frequency data that these coefficients typically represent. Quantization is the lossy step: together with the lossless Huffman coding it is responsible for the high compression ratios that the JPEG scheme is capable of, but it is also responsible for varying degrees of loss of image information.
Reconstruction of the image involves decoding of the Huffman-coded DCT data and computation of the inverse DCT to retrieve a facsimile of the original block (3). Since high-frequency data are lost in the quantization step of the JPEG compression, the reconstruction is missing this high spatial frequency information in the reconstructed 8 x 8 block. Compared to the original image data, there is significant loss of information; hence artifacts which appear as blocking in the reconstructed image.
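The compression and reconstruction steps described above can be sketched in a few lines of Python. This is only an illustrative toy, not the JPEG codec itself: it assumes a single uniform quantization step (real JPEG uses an 8 x 8 quantization table derived from the quality factor), it omits the Huffman coding entirely because that stage is lossless, and the function names are hypothetical. The subtraction of 128 before the transform is the level shift that baseline JPEG applies to 8-bit data.

```python
import math

N = 8  # baseline JPEG block size


def dct_matrix(n=N):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
            for k in range(n)]


C = dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose of C


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dct2(block):
    """Forward 2-D DCT of one 8 x 8 block: C . block . C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT: C^T . coeffs . C."""
    return matmul(matmul(CT, coeffs), C)


def quantize(coeffs, step):
    """Assumed uniform quantizer: small coefficients round to zero."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[v * step for v in row] for row in levels]


def roundtrip(block, step):
    """Level-shift, transform, quantize, then reconstruct 8-bit pixel values."""
    centred = [[p - 128 for p in row] for row in block]
    rec = idct2(dequantize(quantize(dct2(centred), step), step))
    return [[min(255, max(0, round(v + 128))) for v in row] for row in rec]


# A fine checkerboard texture of +/-1 gray level around a mean of 102.
textured = [[101 + 2 * ((r + c) % 2) for c in range(N)] for r in range(N)]
flattened = roundtrip(textured, step=16)
```

With a step of 16, every non-DC coefficient of this checkerboard is smaller than half the step, so all of them quantize to zero and `flattened` is a uniform block of gray value 102. The fine texture has been discarded, which is the loss of high-frequency information described above.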
JPEG is inherently lossy because it discards high-frequency data. These discarded data result in error in the reconstructed image, and appear as artifacts. This error depends upon several factors, including image size, compression factor, the relative pixel-to-pixel correlation of the gray values of the image, and the quantization that occurs in the JPEG scheme.
Principal Interest of Research
A JPEG-compressed image may be recompressed with little additional loss of data (1). The fact that a recompression of a JPEG image using JPEG compression at the same quality factor (QF) results in negligible further loss brings up an interesting scenario: what if a JPEG image is cropped, translated, or shifted, so that the image data no longer lines up perfectly with the 8x8 blocks of pixels? If the reconstructed image (Figure 2) is shifted by a small number of pixels, and then recompressed, further degradation may be introduced into the image. This degradation appears as additional blockiness in the reconstruction of the shifted, recompressed JPEG image. Because of the blocking scheme in JPEG compression, even a single pixel shift will introduce additional artifacts. The blocking scheme doesn't shift with the translation of the image data. The JPEG blocking scheme is not smart--it cannot detect the shifted image data. The result is additional error in the recompressed image that shows up as artifacts in the reconstruction.
It seems reasonable to think that there will be additional error in the form of artifacts when recompressing cropped, translated, or shifted JPEG images. Furthermore, it is hypothesized that there is a definite relationship between the displacement of the image data relative to the original JPEG compression's blocking pattern and the additional error that appears in the reconstruction. Put more simply, there is an inverse relationship between the amount of shift of the JPEG image and the integrity of the image data. Figure 3 shows a flowchart of the scenario just described.
JPEG is a compression scheme that operates in the frequency domain; i.e., the image data is transformed from a 2-dimensional spatial domain to a 2-dimensional frequency domain via a DCT operation (3). Since this transform from the spatial to the frequency domain has little to do with color spaces, this research was conducted using 8-bit gray scale images only (256 gray levels). For a color image, each of the color channels is in effect a gray scale image, and JPEG treats each color channel as such, albeit with a bit more complexity involving RGB conversion to a colorspace consisting of a luminance axis (Y) and two chrominance axes (C1 and C2) (3). Even though color transforms occur in JPEG when compressing color images, the luminance and chrominance channels are compressed in the same way as a gray scale image, only with different quantization for the luminance and chrominance. Because JPEG treats each of the 3 channels of a YCC image as an 8-bit gray scale image with moderately different quantization (4), it was decided to simplify the research by investigating only 8-bit monochrome images.
Due to the 8 x 8 blocks of the DCT, there exist certain restrictions upon the types of manipulations that can be done on JPEG images. For example, suppose a JPEG image is cropped, translated, or otherwise shifted so that the image data are moved relative to their position when the image was first JPEG-compressed. A subsequent (second) JPEG compression would block-transform many 8 x 8 blocks of pixels that now exhibit the previous JPEG compression's shifted block artifacts. The edges of the original blocks will not line up with the blocks of the subsequent compression. Because there is now an edge inside the 8 x 8 block, the DCT operation interprets this edge as a high frequency component, and thus the DCT matrix of coefficients changes. Of course, when the recompressed file is recomposed into an image, the reconstructed 8 x 8 block of pixels now differs to some degree from the original JPEG image. The degree of difference depends on several factors. One factor that affects the difference is obviously the QF given to the JPEG compression routine. Another is the frequency content of the image itself. Finally, there is the subject of this research--the amount of shift of image data.
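The claim that an old block edge inside a new block shows up as high-frequency content can be illustrated with a one-dimensional toy case. The sketch below is not part of the original experiments: it takes a single row of 8 pixels containing a step between two assumed gray levels (100 and 108), places the step at each possible offset, and sums the squared non-DC coefficients of the row's DCT. The helper names are hypothetical.

```python
import math

N = 8  # samples per block row


def dct_1d(row):
    """Orthonormal 1-D DCT-II of an 8-sample row."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(x * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                               for n, x in enumerate(row)))
    return out


def ac_energy(row):
    """Sum of squared non-DC coefficients, i.e. energy outside the block mean."""
    return sum(c * c for c in dct_1d(row)[1:])


def row_with_edge(offset, left=100, right=108):
    """Eight samples with a step from `left` to `right` located `offset` pixels in."""
    return [left] * offset + [right] * (N - offset)


# Offset 0 means the old edge coincides with the new block boundary.
energies = [round(ac_energy(row_with_edge(s)), 6) for s in range(N)]
print(energies)  # → [0.0, 56.0, 96.0, 120.0, 128.0, 120.0, 96.0, 56.0]
```

In this toy case the non-DC energy is zero when the edge sits on the block boundary, largest when the edge falls in the middle of the block, and identical for offsets s and 8 - s. A real image is far more complicated, so this only suggests why the amount of shift could matter; it is not a model of the measured error.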
Several experiments were performed to investigate the effects of translation and recompression on JPEG images. Before describing the experiments in detail, it is important to give an overview of some standard practices observed in all experiments that were vital to preserving the integrity of the data.
All of the images exist as lossless gray scale images. These digital images were resampled to 272 x 272 pixels. The reason for this size will become evident shortly. After the resizing and resampling, the images were saved in TIFF format. These TIFF images provided a baseline from which a test suite of images with varying degrees of compression was composed. For all of the experiments, each baseline TIFF image was compressed as a JPEG image with incremental QFs ranging from 10 to 100, in steps of 10. These 10 images of varying compression composed a sample population on which the experiments were performed.
In each of the experiments, a relationship between the amount of shift and the error due to the shift was sought. Because the experiments are concerned with the blocking artifacts due only to shifts in the image, strictly controlling the JPEG compression applied to the images was essential. Each instance of a (JPEG) image shift and subsequent recompression was then converted to a TIFF file. By converting the recompressed, shifted images to TIFF files, the preservation of any and all JPEG artifacts present was assured. This being said, these images will always be referred to as JPEG images, even though they were converted to TIFFs.
An example of the composition of a test suite of images is appropriate at this point. Starting with a digital image, in this case a digital photograph saved as a TIFF file, the image is opened in Adobe Photoshop (r) and converted to an 8-bit gray scale image. This image is then resized/cropped and resampled to an image size of 272 by 272 pixels. This new version of the image is saved as a TIFF file for a baseline reference. The next step is to create the test suite of JPEG images at various QFs (QF = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100). The resized TIFF image is opened, this time using GraphicConverter (5). This piece of shareware can convert among many image file formats. Its JPEG support is better than Photoshop's, and it allowed precise control of the QF. The opened TIFF image is saved as a baseline JPEG with QF set at 10. This procedure is repeated using GraphicConverter to save JPEG versions of the TIFF image for QFs ranging from 10 to 100, in increments of 10. The 10 resulting JPEG images make up the test suite for that particular image.
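The bookkeeping behind the test suite can be written down as a short sketch. The image size and QF list come from the text above; the shift levels of 1 through 7 pixels are the ones reported later in the Results and Discussion. The variable names are hypothetical, and no files are read or written here.

```python
from itertools import product

IMAGE_SIDE = 272                            # pixels per side after resampling
BLOCK = 8                                   # JPEG block size
QUALITY_FACTORS = list(range(10, 101, 10))  # QF = 10, 20, ..., 100
SHIFTS = list(range(1, 8))                  # shift levels of 1..7 pixels

# The image side divides evenly into 8-pixel blocks.
blocks_per_side, leftover = divmod(IMAGE_SIDE, BLOCK)
print(blocks_per_side, leftover)            # → 34 0
print(blocks_per_side ** 2)                 # → 1156

# One treatment per (shift, QF) pair.
treatments = list(product(SHIFTS, QUALITY_FACTORS))
print(len(treatments))                      # → 70
```

A side length that is a multiple of 8 means that the first compression never has to deal with partial blocks at the image border.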
Histogram Analysis of Error Images
One very effective way to determine the difference between two images is to do just that--compose a difference image by taking the absolute value of the difference of corresponding pixel values of the two images in question. This is easily achieved using Photoshop's Difference calculation (Image>Calculations: Difference). The pixel gray values of this error image show how much an image has changed due to an operation. This is one extremely useful tool employed in quantifying the error due to a shift of a JPEG image.
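For readers without Photoshop, the same difference image and its histogram statistics can be sketched in plain Python. The images are assumed to be equal-sized lists of rows of 8-bit gray values, and the function names are hypothetical. The standard deviation here is the population form (dividing by the pixel count); whether Photoshop's histogram uses that form is an assumption.

```python
import math


def difference_image(a, b):
    """Absolute difference of corresponding pixel values of two images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def histogram(img, levels=256):
    """Count of pixels at each gray value."""
    counts = [0] * levels
    for row in img:
        for p in row:
            counts[p] += 1
    return counts


def histogram_stats(counts):
    """Mean and (population) standard deviation of gray values from a histogram."""
    total = sum(counts)
    mean = sum(g * c for g, c in enumerate(counts)) / total
    var = sum(c * (g - mean) ** 2 for g, c in enumerate(counts)) / total
    return mean, math.sqrt(var)


# Tiny made-up example: two 2 x 2 "images".
u = [[10, 20], [30, 40]]
s = [[12, 20], [27, 40]]
diff = difference_image(u, s)
mean, std = histogram_stats(histogram(diff))
print(diff)  # → [[2, 0], [3, 0]]
print(mean)  # → 1.25
```

The mean summarizes the average change per pixel, and the standard deviation summarizes how widely the changes are spread.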
The experimental method for "Histogram Analysis of Error Images" is rather tedious, and involved a considerable investment in time for each JPEG image put through the experiment. The steps are as follows:
At this point, we have two JPEG images that can be compared. The first is the original (unshifted) JPEG image, the second is the shifted, resaved JPEG image. For brevity, these two images will hereafter be referred to as U.JPEG and S.JPEG, for Unshifted and Shifted versions. Comparing the two images via differencing and looking at the histogram of the difference image is described:
The above steps were performed for a sample space that contained 70 images across two factor levels: quality factor, and amount of shift (pixels). The standard deviation and mean pixel value of the difference images were recorded, as stated, from the histogram of the error image, and can be seen in the Results section of the thesis.
The problem of characterizing the effects of shifts on JPEG images required an environment in which JPEG image files could be read into arrays that are easy to operate on. For this purpose, the Interactive Data Language (IDL) was considered, along with some alternatives. Since a part of this research involves composing algorithms and working in this environment, it was important to find a powerful, yet simple, language to create tools and manipulate images.
Several professors at RIT advised that IDL would be a good choice for coding some simple routines. The process of learning the IDL environment began with reading the SV book and working through some tutorials. It soon became clear that IDL, while easier than C or C++, was still a cryptic coding language and development environment that required strict memory management and a high level of coding expertise to achieve the goals of this research. Further reading of the on-line documentation included on the IDL CD-ROM revealed several more distinct differences between IDL and the SV. IDL's web site and documentation touted its ease of use and strong image processing capability, along with a very easy-to-use interface called Insight. However, the SV did not come with these components, and basically included just the language and a meager book. A week-long license for IDL was offered at no charge, which gave a chance to evaluate the full version of IDL along with another image-processing-savvy environment: MATLAB.
MATLAB (6) is a mathematical development environment that is similar to IDL in that it is a programming language in and of itself, yet the interface that is used to construct routines and programs is much easier and handles large array sizes (image files) with ease. It is particularly image file-friendly, and a host of relevant image processing routines exist in an "Image Processing Toolbox" library.
Reading and writing image data into MATLAB's data elements, or matrices, was extremely simple. The language has built-in functions that allow several image file formats to be read into 2-dimensional matrices. MATLAB provides dozens of processing functions, several of which were combined and used to analyze the images.
Of key importance to this research is the characterization of shifts of JPEG image data, and the subsequent recompression. The shifted image essentially goes through a second JPEG compression, only the second time the artifacts from the first JPEG compression (before the shift) are worsened. The 8 x 8 blocking DCT routine of the JPEG scheme is now operating on 8 x 8 blocks of pixels that do not line up with the artifact "blocks" that exist before the second compression. Since these artifact blocks are skewed, or shifted, relative to the 8 x 8 blocking DCT that is applied to this shifted image, additional artifacts result from the second compression. These artifacts noticeably degrade the quality of the image.
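The shift-and-recompress scenario can be simulated with a toy block codec. The following sketch makes several assumptions that are not taken from the experiments: a uniform quantization step stands in for the QF, the shift is horizontal only and is implemented as a crop, and the error is measured as the mean absolute difference between the shifted image before and after the second compression. All names are hypothetical.

```python
import math

N = 8  # block size used by baseline JPEG
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]
CT = [list(col) for col in zip(*C)]


def _mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def codec_block(block, step):
    """Level-shift, DCT, uniform quantization, then reconstruct one 8 x 8 block."""
    centred = [[p - 128 for p in row] for row in block]
    coeffs = _mul(_mul(C, centred), CT)
    kept = [[round(c / step) * step for c in row] for row in coeffs]
    pixels = _mul(_mul(CT, kept), C)
    return [[min(255, max(0, round(v + 128))) for v in row] for row in pixels]


def compress(img, step):
    """Apply codec_block on a fixed grid anchored at the top-left corner."""
    h, w = len(img), len(img[0])
    if h % N or w % N:
        raise ValueError("image sides must be multiples of 8")
    out = [[0] * w for _ in range(h)]
    for r in range(0, h, N):
        for c in range(0, w, N):
            rec = codec_block([row[c:c + N] for row in img[r:r + N]], step)
            for i in range(N):
                out[r + i][c:c + N] = rec[i]
    return out


def shift_error(original, shift, step, width):
    """Mean absolute change when a crop offset by `shift` columns is recompressed."""
    first = compress(original, step)                    # first-generation image
    u = [row[shift:shift + width] for row in first]     # shifted view of it
    s = compress(u, step)                               # second pass, new grid
    total = sum(abs(a - b) for ra, rb in zip(u, s) for a, b in zip(ra, rb))
    return total / (len(u) * width)


# Smooth synthetic 16 x 24 test image (an assumption, not one of the study images).
demo = [[128 + round(40 * math.sin(r / 3.0) * math.cos(c / 5.0)) for c in range(24)]
        for r in range(16)]
errors = [shift_error(demo, s, step=16, width=16) for s in range(9)]
```

In `errors`, the entries for shifts of 0 and 8 pixels are exactly zero, because the crop then lands on the original block grid and the toy codec reproduces its own output. The entries for shifts of 1 through 7 measure the additional change introduced by the misaligned second pass for this particular synthetic image.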
Two experiments designed to help measure artifacts due to shift were developed. One of the experiments involves a simple image-to-image correlation done in the MATLAB environment. The second involves transforming shifted and unshifted images through the 8 x 8 blocking DCT algorithm, also done in MATLAB.
The DCT Image Analysis experiment performed identical 8x8 DCTs on U.JPEG and S.JPEG. The transform image, or DCT image, is then saved as a TIFF file for further analysis. The images are saved as TIFFs for a very good reason. After the images are shifted by n pixels and then resaved as JPEG, they are then opened again and converted to TIFF files so the shift artifacts are preserved without loss.
This experiment computes the 2-D crosscorrelation of an unshifted JPEG image ("U.JPEG") and the corresponding shifted image ("S.JPEG"). The correlation routine (written in MATLAB) simply returns a "correlation coefficient" that is a measure of the "degree of sameness" of the two images. This experiment has been used in a few shift cases to see what kind of R values are returned. The autocorrelation of an image generated the expected R value of unity. For comparison, an image with a shift of one pixel was correlated with the unshifted segment, and a correlation coefficient in the neighborhood of R = 0.9 resulted. This value of R obviously depends greatly on image content.
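The MATLAB routine itself is not reproduced in this report. As a rough stand-in, the sketch below assumes that the "correlation coefficient" is the ordinary Pearson coefficient computed over all corresponding pixels of two equal-sized images; the function name is hypothetical.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation R between two equal-sized gray scale images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("images must be non-empty and the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined for a constant image")
    return sxy / math.sqrt(sxx * syy)


# Made-up 2 x 3 image, compared with itself and with its negative.
img = [[10, 50, 90], [200, 120, 30]]
negative = [[255 - p for p in row] for row in img]
r_self = correlation_coefficient(img, img)
r_negative = correlation_coefficient(img, negative)
```

Here `r_self` comes out at 1 and `r_negative` at -1 up to floating-point rounding, matching the expectation that an image correlated with itself gives an R value of unity.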
There are no results to report for the Correlation Coefficient experiment; however, the correlation routine may have some relevant application in future experiments. The majority of time available was invested in the first two experimental procedures--the Histogram Analysis of Error Images, and the DCT Image Analysis.
Some operations were easier using a specific application. For example, the shifting operations and compression of the original TIFF files to JPEG formats were handled best by GraphicConverter. Photoshop was better suited for resizing/resampling of the image after compression and for composing the difference images. Time was spent transferring raw difference images into TIFFs, because Photoshop will not save a difference image in any other format. Because so much time was being wasted with file conversions and operations, a registered copy of the GraphicConverter software was purchased for $35. A registered copy has batch processing capability, and image file format transfers became a smooth, easy process. Photoshop 4.0 was also purchased, which has a built-in scripting language that was used to automate several operations.
Histogram Analysis of Error Images Results
The matrix of difference images was created across two factors--the degree of shift (in number of pixels), and quality factor of the JPEG compression (see Table 1 in Experimental section). Histograms of each of the images show the mean, standard deviation, and percentage value for a selected bin (pixel gray value). The mean and standard deviation were recorded and trends in the data were scrutinized.
The mean of the histogram shows the average gray value of the difference image. This value indicates the average change in pixel value as the U.JPEG image underwent a shift and recompression to form the S.JPEG image. Similarly, the standard deviation shows the spread of pixel values that changed. The median gray value of the difference image did not correlate with the QF and shift. The mean and standard deviation data were plotted individually on 3-D wire plots created by Minitab, and are shown in Figures 4 and 5:
Figure 4 Figure 5


The vertical-axis units are simply pixel gray values. For comparison, the plots from a synthetic image that underwent the same process as the photographic image are shown below in Figures 6 and 7:
Figure 6 Figure 7


Clearly, the results differ and will be discussed.
Single-Factor Analysis of Variance (Table 1: ANOVA) was performed on the data to verify that there indeed is a trend in the data that supports the hypothesis of additional artifacts due to image shifts in JPEGs.
Table 1: ANOVA output from Histogram Analysis of Error Images
MTB > Oneway 'Mean' 'shift'.
One-Way Analysis of Variance
Analysis of Variance on Mean
Source    DF      SS      MS      F      p
shift      6   1.297   0.216   0.45  0.839
Error     63  29.946   0.475
Total     69  31.243
Level      N    Mean   StDev
1         10  2.0230  0.5013
2         10  2.1970  0.6896
3         10  2.3040  0.7665
4         10  2.4090  0.8592
5         10  2.2990  0.7656
6         10  2.1590  0.6678
7         10  2.0200  0.4926
Pooled StDev = 0.6894
MTB > Oneway 'Mean' 'Cfactor'.
One-Way Analysis of Variance
Analysis of Variance on Mean
Source    DF       SS      MS      F      p
Cfactor    9  28.6188  3.1799  72.70  0.000
Error     60   2.6243  0.0437
Total     69  31.2431
Level      N    Mean   StDev
10         7  3.4514  0.5316
20         7  2.9200  0.2650
30         7  2.6014  0.1707
40         7  2.4443  0.1531
50         7  2.2329  0.1083
60         7  2.0629  0.0864
70         7  1.8943  0.0772
80         7  1.6800  0.0630
90         7  1.3286  0.0540
100        7  1.4000  0.0000
Pooled StDev = 0.2091
MTB > Twoway 'Mean' 'shift' 'Cfactor';
SUBC> Means 'shift' 'Cfactor'.
Two-way Analysis of Variance
Analysis of Variance for Mean
Source    DF       SS      MS
shift      6   1.2970  0.2162
Cfactor    9  28.6188  3.1799
Error     54   1.3273  0.0246
Total     69  31.2431
shift     Mean
1        2.023
2        2.197
3        2.304
4        2.409
5        2.299
6        2.159
7        2.020
Cfactor   Mean
10       3.451
20       2.920
30       2.601
40       2.444
50       2.233
60       2.063
70       1.894
80       1.680
90       1.329
100      1.400
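As a check on how the entries of Table 1 relate to one another, the F ratios of the one-way analyses can be recomputed from the sums of squares (SS) and degrees of freedom (DF): each mean square is SS/DF, and F is the factor mean square divided by the error mean square. The sketch below only re-derives numbers already printed in the table; the function names are hypothetical.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """MS = SS / DF."""
    return sum_of_squares / degrees_of_freedom


def f_ratio(ss_factor, df_factor, ss_error, df_error):
    """F = MS(factor) / MS(error)."""
    return mean_square(ss_factor, df_factor) / mean_square(ss_error, df_error)


# One-way rows copied from Table 1: SS and DF for the factor and for Error.
f_shift = f_ratio(1.297, 6, 29.946, 63)
f_cfactor = f_ratio(28.6188, 9, 2.6243, 60)
print(round(f_shift, 2))    # → 0.45
print(round(f_cfactor, 2))  # → 72.7

# In the two-way table, the Error SS is what remains of the Total SS
# after the shift and Cfactor sums of squares are removed.
two_way_error_ss = 31.2431 - 1.2970 - 28.6188
print(round(two_way_error_ss, 4))  # → 1.3273
```

The one-way error term for shift (SS = 29.946) still contains all of the variation due to QF, whereas the two-way error term (SS = 1.3273) does not. The same `f_ratio` function could be applied to the two-way rows, for which Minitab printed no F column.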
The DCT Image Analysis proved to be the most interesting experiment. Initially, it was hoped that the DCT images of the shifted JPEGs would exhibit a detectable phase difference in the frequency coefficients that corresponded to the degree of shift. Instead, there seemed to be no discernible patterns in any of the DCTs of the S.JPEG images. Taking this idea a little further, it was decided to look at the DCTs of the actual difference images, and see if there was any periodicity. Again, no discernible patterns in these DCT images were seen from one level of shift to the next. Of course, there were differences in the DCTs as the QF varied from low through high. The DCT images became busier as the QF increased, indicating what was already known--that higher QFs preserve the high spatial frequencies.
Finally, the last attempt made at obtaining useful information from the DCT images was the employment of an "ensemble averaging" routine at the end of the 8 x 8 blocking DCT. It was hoped that a pattern related to the shift would be apparent in the average of all 8 x 8 DCT blocks of the difference image. This pattern might be some sort of periodicity in the coefficients, or perhaps the coefficients would go negative and then positive in some fashion. No such pattern was observed for any of the difference images--the results of this particular experiment were inconclusive.
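The ensemble averaging idea can be sketched as follows. The original MATLAB routine is not given in this report, so this is an assumed reading of it: the image is cut into non-overlapping 8 x 8 blocks, each block is transformed with a 2-D DCT, and the signed coefficients at each of the 64 positions are averaged over all blocks. The function names are hypothetical.

```python
import math

N = 8  # block size
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]


def dct2(block):
    """2-D DCT of one 8 x 8 block, computed as C . block . C^T."""
    tmp = [[sum(C[k][n] * block[n][j] for n in range(N)) for j in range(N)]
           for k in range(N)]
    return [[sum(tmp[k][n] * C[l][n] for n in range(N)) for l in range(N)]
            for k in range(N)]


def ensemble_average_dct(img):
    """Average the signed DCT coefficients over every 8 x 8 block of `img`."""
    h, w = len(img), len(img[0])
    if h % N or w % N:
        raise ValueError("image sides must be multiples of 8")
    acc = [[0.0] * N for _ in range(N)]
    count = 0
    for r in range(0, h, N):
        for c in range(0, w, N):
            coeffs = dct2([row[c:c + N] for row in img[r:r + N]])
            for k in range(N):
                for l in range(N):
                    acc[k][l] += coeffs[k][l]
            count += 1
    return [[v / count for v in row] for row in acc]


# Sanity check on a flat 16 x 16 image: only the DC term should survive.
flat = [[5] * 16 for _ in range(16)]
avg = ensemble_average_dct(flat)
```

For the flat image, `avg[0][0]` is 40 (the block mean of 5 times 8) and every other entry is zero up to rounding. Averaging signed values lets coefficients of opposite sign cancel across blocks; averaging magnitudes instead would be a different design choice.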
Histogram Analysis of Error Images
The experiments performed were first attempts to learn what happens to the image data of a JPEG image when it is shifted and recompressed. The phenomenon is fully visible under the "wrong" conditions, and less visible or invisible under certain conditions. For example, when working with an image intended to be posted on the Internet, and only viewed on a computer screen, the low resolution (typically 72 dpi) tends to magnify any JPEG artifacts that might be present in the image. If downloaded, edited, and incorporated into another web document, the image would most likely not be cropped along the original JPEG blocks. When this image is then resaved on the web page, the artifacts in question are introduced, and are quite visible in areas of the image, depending on image content. If, on the other hand, a high-quality, high resolution JPEG image (300+ dpi) were put through the same cropping regimen, the artifacts would be less noticeable and masked by the higher spatial frequencies that were preserved by the initial JPEG compression. Of course, zooming in so that the individual blocking artifacts become visible will also likely reveal the phenomenon in question. This was the case when attempting to characterize the shift effect--though the phenomenon appears to be unavoidable, the severity/visibility of the artifacts due to shift tend to be dependent upon factors other than QF and shift.
Histogram analysis of error images provided the most significant statistical relationship of the experimental factors. The wire plots of the mean and standard deviation of the error images show a clear trend. As the amount of shift increases from 1 pixel to 4 pixels, the mean pixel value of the error image increases and is at its maximum at 4 pixels. Beyond 4 pixels (5, 6, and 7 pixel shifts), the mean error decreases, virtually mirroring the behavior of a decreasing shift from 4 to 1 pixels. This makes sense, because a shift of 1 pixel in one direction is the same, relative to the 8 by 8 blocking scheme of the DCT in JPEG, as a shift of 7 pixels in the opposite direction. The same can be said of a shift of 2 pixels and an opposite shift of 6 pixels, and of 3 and 5. This increasing/decreasing trend was hypothesized initially, but had not been verified, so it was decided to look at pixel shifts of 1 through 7 pixels to be safe. Any further experimentation will likely omit shifts of beyond 4 pixels in any one direction. The standard deviation, interestingly, does not show any definite trend with respect to the amount of shift (Figures 4 and 6) for either of the images.
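The mirror argument above amounts to modular arithmetic on the block size, which a few lines of Python make explicit. The shift means are copied from Table 1; the function names are hypothetical.

```python
BLOCK = 8  # JPEG block size


def grid_offset(shift, block=BLOCK):
    """Position of an old block boundary inside the new 8-pixel grid."""
    return shift % block


def distance_from_alignment(shift, block=BLOCK):
    """Smallest shift, in either direction, that would restore alignment."""
    r = shift % block
    return min(r, block - r)


print([distance_from_alignment(s) for s in range(1, 8)])  # → [1, 2, 3, 4, 3, 2, 1]
print(grid_offset(1) == grid_offset(-7))                  # → True

# Mean error per shift level, copied from Table 1.
table_means = {1: 2.0230, 2: 2.1970, 3: 2.3040, 4: 2.4090,
               5: 2.2990, 6: 2.1590, 7: 2.0200}
# Difference between each shift s and its mirror 8 - s.
mirror_gaps = {s: round(abs(table_means[s] - table_means[BLOCK - s]), 4)
               for s in (1, 2, 3)}
print(mirror_gaps)  # → {1: 0.003, 2: 0.038, 3: 0.005}
```

The mirrored pairs differ by at most 0.038 gray levels, while the means themselves range from 2.020 to 2.409, which is consistent with the symmetry described above.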
The Analysis of Variance done on the data set does not support the above assertion that there is a relationship between shift and error. This ANOVA study was conducted as a two-factor analysis, with QF and shift being the two factors, and the mean pixel value of the difference image the response. There were 7 levels of shift and 10 levels of QF, making a total of 70 treatments in this two-factor study. It is not known whether this study was an observational or an experimental one because, even though control was exercised in assigning the factor levels to the experimental units, there was only 1 sample per treatment; i.e. there was only 1 image--70 versions of which were being evaluated. Also, there was probably some additional interaction between factor level effects because of the fact that the difference between mean responses (the mean pixel value difference) was very similar for all levels of the shift factor.
DCT Image Analysis and Correlation Coefficient
For the Correlation Coefficient experiment and the DCT Image Analysis experiment, any inferences into what was seen would be premature. The experiments' level of maturity was very low, and though valuable insight and knowledge was gained in setting up these two initial experiments, they need to be revamped completely using a more statistically sound approach. In fact, this also can be said of the ANOVA for the first experiment.
Much of the work done in this research resulted in a better knowledge of what happens to JPEG images when shifted and recompressed. Though the experiments seemed promising at first, it became painfully clear that the phenomenon being dealt with is very difficult to characterize. Knowing the quality factor of a JPEG image is difficult, and involves obtaining the QF from the JPEG header. Further, once an image has been shifted and recompressed, it is too late to do anything about it.
An obvious shortcoming of the research was the large amount of time necessary to learn about the JPEG scheme and the source of artifacts in JPEG images. Another shortcoming was the amount of statistical background needed to determine whether the relationship between the shifts and the error can be considered statistically significant. More knowledge on how to conduct statistically sound experiments and obtain significant results is needed. For example, there are many types of Analyses of Variance, and the appropriate ANOVA was either not used here or not correctly interpreted. Any future experimentation will certainly require taking into account multiple factors that influence the response variable. Further, the response variable (metric) that is analyzed must show a consistent trend across each predictor factor for many different randomly selected images that make up a random sample. In other words, several different images should be put through the same multi-factor study and the mean resulting metric should be evaluated in a multi-factor ANOVA. This would determine if there exists a statistically significant relationship between the amount of shift and the artifacts. If there is no statistically significant relationship, then we cannot say with confidence that there is any effect on the image due to shifts. This certainly does not appear to be the case, and that simple observation justifies further investigation into this phenomenon.
There seems to be a definite trend present; however, the results of the experimentation are too sketchy and cannot be relied upon to characterize any effect due to shift. There are too many other factors at work. One factor, the QF, was under control of the experiment, but another factor that would be random and thus add complication to the experiment would be a group of many different images, all with widely varying spatial frequency content. While this may make the experiment much more complicated and time consuming, there would be added statistical merit to the experiment.
If eventually it is determined that there is a statistically significant effect due to shift, it would be helpful to achieve some way of having the 8 x 8 blocking DCT scheme in JPEG recognize the blocking due to a previous JPEG compression. Teaching JPEG to pad the image accordingly or adjust the blocking scheme appropriately to compensate for any shifts or crops would then be a research project for a computer programmer. This is the ultimate goal of this research. By further analyzing what happens--in both the spatial domain and the frequency domain--to images that are shifted, this problem of additional error due to shift may be eliminated. To do this, better design of experiments and isolating a specific metric is necessary. This research, at the very least, highlights some possible techniques that may be useful in future experimentation and observational studies of JPEG images and artifacts due to shift.