


Methods

Color space and human perception

      A digital color image can be viewed as a three-valued (three-channel) positive function I=I(x,y) defined on a plane; its algebraic representation is an N by M by 3 matrix A. Each entry of A is therefore a three-component integer vector (the pixel color) expressing the intensity at the discrete location (x,y) with a precision of p bits per channel. Let the RGB (red-green-blue) color space be the input space of the image. Each layer can be viewed as a single-channel image which, under particular conditions, can be analyzed independently of the others. This is not the case for the RGB space: if two channels are fixed, human visual perception remains very sensitive even to weak changes in the value of the remaining channel. The key step in lossy data compression is the quantization phase, which reduces the data on the basis of their low information content. For three layers, however, quantization may eliminate some small coefficients of one channel at a certain spatial location even though the corresponding coefficients in the other layers, carrying high information content, are kept. When the image is reconstructed at that location, a strong visual distortion is introduced. The assumption that the three layers can be analyzed separately is therefore valid only if they are uncorrelated with respect to visual appearance. Accordingly, following the visual perception of the human eye, the analysis is performed in the YIQ (luminance-chrominance, NTSC) space, obtained from the RGB space through a linear transformation. In the YIQ space the luminance is carried by the Y layer, whereas the chrominance is carried by the I-Q plane. Encoding is more efficient in this space because luminance is decorrelated from chrominance, so quantization can be applied independently to Y and to I-Q. Moreover, because the human eye is more sensitive to luminance than to chrominance, much more low-content information, especially at high frequencies, can be discarded in the I-Q plane than along the Y direction without deteriorating the main properties of the image.
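      As an illustration of this change of space, the sketch below applies one common form of the NTSC RGB-to-YIQ matrix to a numpy image array; the coefficient values and the function names are not taken from the paper and are given only as an example of the linear mapping and its inverse.

```python
import numpy as np

# One common form of the NTSC RGB -> YIQ matrix; the paper does not list
# the coefficients, so these standard values are an assumption.
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],   # Y: luminance
    [0.596, -0.274, -0.322],   # I: first chrominance axis
    [0.211, -0.523,  0.312],   # Q: second chrominance axis
])

def rgb_to_yiq(image):
    """Map an (N, M, 3) RGB image with values in [0, 1] to YIQ."""
    return image @ RGB_TO_YIQ.T

def yiq_to_rgb(image):
    """Invert the linear mapping to return to RGB."""
    return image @ np.linalg.inv(RGB_TO_YIQ).T
```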

Linear wavelet transform

      As stated above, the first step is to linearly transform the entries of the matrix A into a more suitable input space. Because the variables are separable, a linear transformation applied to this matrix can be performed first on its rows and then on its columns. The short mathematical summary of wavelets that follows is therefore focused on 1D signals. Let f(x) be a 1D signal s^j composed of N=2^n evenly spaced samples, with j the current level of resolution (frequency content or description level). Suppose that the amount of signal energy at low frequencies is much higher than at high frequencies; this assumption is not far from the typical distribution of the information content of images. The basic idea of the discrete wavelet transform is to decompose f(x) by filtering into a coarse approximation and a detail approximation. The detail approximation can be thought of as the additional information that must be supplied to generate a finer-scale approximation from a coarse one. According to multiresolution analysis, the decomposition can be applied again to the coarse approximation and iterated, so that the original signal is spread into a hierarchical tree of detail functions plus the coarsest approximation. In order to maintain the same description of the original signal, in terms of number of samples, at each level of decomposition the detail and coarse functions are subsampled by a factor of two, i.e. the scale of the low-resolution signal and of the detail is halved. No information of the original signal is lost in this step. Given a signal of length N=2^n, n subbands can thus be obtained. In figure 2 a dyadic multiresolution decomposition is shown for a 16-sample signal. In this approach, the decorrelating transform operated by the wavelets analyzes the signal progressively, starting from features at high resolution and fine scale and moving towards coarser resolutions and scales. This leads to narrower subbands at low frequencies and wider subbands at high frequencies. As stated above, the wavelet transform is carried out by projecting the signal onto two sub-spaces: the scale and detail spaces. The two projections are scalar products of the signal with two real filters {c_i} and {d_i}. By imposing orthogonality between the two projection spaces, the wavelet filter {d_i} is obtained from the scaling filter {c_i} as d_i = (-1)^i c_{1-i}. The synthesis filters can be obtained directly from the analysis filters and allow the original function to be reconstructed from its multiresolution description.

     Given a scaling filter {g_i} and a wavelet filter {h_i} of length M, the smooth (s^{j-1}) and detail (d^{j-1}) filtered functions, corresponding to low and high frequencies at resolution and scale j-1, are expressed by the following equations:

s^{j-1}_i = sum_{k=1}^{M} g_k s^j_{2i+k-2}    (1)
d^{j-1}_i = sum_{k=1}^{M} h_k s^j_{2i+k-2}    (2)
with i = 1, ..., 2^{j-1}.
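
     A direct, if naive, reading of equations (1) and (2) is sketched below; the filters g and h, the periodic boundary handling and the indexing convention are assumptions made for this example, not a specification taken from the paper.

```python
import numpy as np

def analyze_one_level(s, g, h):
    """One analysis step in the spirit of Eqs. (1)-(2): filter the signal s
    with the scaling filter g and the wavelet filter h, keeping one output
    sample out of two.  Periodic ('wrap') boundary handling is an assumption
    made for this sketch."""
    s = np.asarray(s, dtype=float)
    n, M = len(s), len(g)
    smooth = np.zeros(n // 2)
    detail = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(M):
            idx = (2 * i + k) % n        # periodic extension of the signal
            smooth[i] += g[k] * s[idx]   # low-pass branch, Eq. (1)
            detail[i] += h[k] * s[idx]   # high-pass branch, Eq. (2)
    return smooth, detail
```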

     As stated above, at each step of the decomposition the scale is halved along with the resolution. The simplest wavelet filter is the Haar transform, whose scaling filter has two equal coefficients. At each level of decomposition, given s^j, it generates two sequences s^{j-1} (the coarse version at resolution j-1) and d^{j-1} (the detail at resolution j-1) of half the length (2^{n-1} at the first level). The first is obtained by averaging pairs of consecutive samples of s^j, and d^{j-1} is obtained from the differences of pairs of consecutive samples of s^j. If the original sequence s^j presents some local correlation, the low-frequency energy of the signal will be concentrated in the coarse version, and the detail, containing the high frequencies, will be small, so that few bits are needed to encode it. The process can be iterated by applying the same procedure to the sequence s^{j-1}, until s^j is described exactly by a set of detail sequences (d^{j-1}, d^{j-2}, d^{j-3}, ..., d^0) and the coarsest version s^0, a scalar value which represents the mean value of s^j. The Haar transform, however, is not very efficient at decorrelating the input signal: it captures exactly only signals which are constant, i.e. it removes only the zeroth-order moment, the mean value of the signal, from the computed coefficients. More sophisticated wavelet basis functions with higher-order vanishing moments can be applied to this analysis, decorrelating the input signal values more effectively. Other design criteria, such as smoothness, accuracy of approximation, size of support and frequency localization, also play a significant role in the choice of the optimal wavelet basis.
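
     The Haar scheme just described can be sketched in a few lines; the averaging/differencing normalization chosen here keeps s^0 equal to the mean of the input, as stated above, although other normalizations (e.g. the orthonormal 1/sqrt(2) form) are equally common.

```python
import numpy as np

def haar_pyramid(s):
    """Iterate the Haar step described above on a length-2**n signal.
    Coarse samples are pairwise averages, details are pairwise half
    differences, so the final s0 is exactly the mean of the input.
    Returns s0 and the list of detail sequences (d_{j-1}, ..., d_0)."""
    s = np.asarray(s, dtype=float)
    details = []
    while len(s) > 1:
        even, odd = s[0::2], s[1::2]
        coarse = (even + odd) / 2.0      # averages of sample pairs
        detail = (even - odd) / 2.0      # differences of sample pairs
        details.append(detail)
        s = coarse
    return s[0], details
```

     For example, haar_pyramid(np.arange(16.0)) returns 7.5 (the mean) together with detail sequences of length 8, 4, 2 and 1, matching the dyadic decomposition of figure 2.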

      Let us return to the original image stored in the N-rows-by-M-columns multidimensional matrix A. Each of the three color channels can be managed independently of the others, so in the following we deal with a single-channel image. Because the variables are separable, the 1-D wavelet transform is applied first to each row, giving N smooth-detail pairs of length M/2. The wavelet transform is then applied to all M/2 columns, of length N, of L (the smooth version of the rows) and of H (the detail version of the rows); the result is a four-subband partition of the image into horizontal low-pass/vertical low-pass (LL1), horizontal high-pass/vertical low-pass (HL1), horizontal low-pass/vertical high-pass (LH1) and horizontal high-pass/vertical high-pass (HH1) sub-images. The LL1 image represents the smooth version, whereas the other three images are the details. This procedure was iterated on LL1 up to the third level of decomposition, as can be seen in figure 3 and in figure 4 for the VHD slice a_wm1480. A pyramid of images was thus obtained, one for each channel.
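
      The row/column separable step can be illustrated as follows; Haar filters are used only to keep the sketch short, and the subband naming follows the convention above. Iterating decompose_2d on the returned LL subband yields the three-level pyramid of figure 3.

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step along the last axis: pairwise averages and
    half differences."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / 2.0, (even - odd) / 2.0

def decompose_2d(channel):
    """One level of the separable 2-D transform: rows first (L, H), then
    columns, giving the four subbands LL, HL, LH and HH."""
    channel = np.asarray(channel, dtype=float)
    L, H = haar_step(channel)        # horizontal (row) filtering
    LL, LH = haar_step(L.T)          # vertical filtering of the smooth rows
    HL, HH = haar_step(H.T)          # vertical filtering of the detail rows
    return LL.T, HL.T, LH.T, HH.T
```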

Quantization of the wavelet coefficients and embedded encoding

      Let us now turn to the problem of quantizing the computed coefficients in each subband. Two different approaches have been developed. With scalar quantization (SQ) a value is represented by a fixed subset of representative values; for example, a 16-bit pixel can be cast to its 8 most significant bits, and some loss in resolution has to be paid. In vector quantization (VQ) a set of values is coded as a single value by exploiting some redundancy among the values. Vector quantization has shown superior encoding capability with respect to scalar quantization, but drawbacks such as high memory and computational requirements still limit its effective use in image compression. Moreover, some assumption about the probability density distribution of the coefficients has to be made a priori, and it may not reflect the true one. If the multiresolution decomposition operated a perfect decorrelation, the obtained coefficients would be distributed as a Gaussian random process and vector quantization would be applicable. In reality this is not true: some relevant structures are still present in the sub-bands and cannot be further decomposed. High-frequency features such as edges are captured by the wavelet coefficients in the detail subbands and are propagated in the tree at coarser resolutions. Entropy coders based on scalar quantization have been proposed in order to capture this residual redundancy and reject unimportant information. Based on particular assumptions, these techniques recover the correlation between corresponding wavelet coefficients at successive levels of decomposition (recall that a wavelet coefficient at resolution/scale j has four children at resolution/scale j+1). Zerotree coders [3] follow this approach and exploit the self-similarity across the bands of the wavelet decomposition. This property relies on the observation that if a wavelet coefficient in a sub-band is lower than a threshold T, i.e. it is not significant, then its children on the corresponding wavelet tree, located at the same spatial location, are with high probability not significant either (see figure 5). If this condition is verified, only the father coefficient needs to be coded (as a zero-tree root, ZTR), i.e. its children are predictable from their father and only one symbol is transmitted to the coder. For the sake of clarity, a coefficient is not significant for a given threshold T if its absolute value is lower than T. If the coefficient is significant, it is coded as POS, if positive, or NEG, if negative. If a father coefficient is not significant at threshold T but one of its descendants is significant, the father coefficient is coded as an isolated zero (IZ). This technique was proposed by Shapiro [5] and is named 'Embedded Zerotree Wavelet'. In figure 6 this mapping procedure, referred to as the building of a significance map, is shown.
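
      The symbol assignment of the significance map can be summarized by the small sketch below; how the descendants of a coefficient are collected from the wavelet tree is assumed to be handled elsewhere.

```python
def classify(coeff, descendants, T):
    """Assign one of the four symbols described above to a wavelet
    coefficient, given the values of its descendants in the tree."""
    if abs(coeff) >= T:
        return "POS" if coeff >= 0 else "NEG"
    if all(abs(d) < T for d in descendants):
        return "ZTR"    # zero-tree root: children are predictable from it
    return "IZ"         # insignificant itself, but some descendant is not
```

      For instance, classify(3, [1, -2, 0, 1], 8) returns 'ZTR', whereas classify(-10, [1, -2, 0, 1], 8) returns 'NEG'.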

      Let us show the procedure applied to a single layer.
        1. In the first step all the wavelet coefficients are sorted into a one-dimensional vector V in descending order according to their values. This sorting places the low-frequency bands before the high-frequency ones, since the low frequencies carry more information content than the higher ones.
        2. An initial threshold T=T0 is chosen (for instance, half of the maximum coefficient).
        3. The significance map is built in order to classify each wavelet coefficient in V according to the ZTR, IZ, POS and NEG symbols.
        4. From V a bit-slice B1 is obtained with the coefficients that are significant for T.
        5. The significance map and the corresponding wavelet values are quantized: when coding B1, only the significant, zero-tree and isolated-zero symbols are sent to the encoder, and all the remaining zeros are not coded.
        6. The coefficients that are significant for the threshold T, already coded, are eliminated from V.
        7. Steps 3-6 are repeated for T=T0/2, T=T0/4, T=T0/8, ..., and the bit-slices B2, B3, B4, ... are obtained and coded until some stop condition is achieved (a toy sketch of this loop is given after this list). Here a refinement of the Shapiro algorithm has been applied according to [4].
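
     The threshold-halving loop of steps 2-7 can be sketched as follows; zero-tree construction and entropy coding are omitted, so the fragment only illustrates how the successive bit-slices are formed.

```python
import numpy as np

def ezw_passes(V, n_passes=4):
    """Toy version of steps 2-7: halve the threshold at each pass and record
    which still-uncoded coefficients become significant."""
    V = np.asarray(V, dtype=float)
    T = np.max(np.abs(V)) / 2.0              # step 2: initial threshold T0
    coded = np.zeros(len(V), dtype=bool)
    slices = []
    for _ in range(n_passes):
        significant = (~coded) & (np.abs(V) >= T)
        slices.append(np.where(significant)[0])   # indices coded in this slice
        coded |= significant                      # step 6: remove them from V
        T /= 2.0                                  # step 7: T0/2, T0/4, ...
    return slices
```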

     This approach codes successive bit-plane slices, from a coarse version followed by successive refinement details, and an embedded code is built (see figure 7).

     In order to reconstruct the image, a decoder is adopted which performs the inverse transform from the wavelet-coefficient domain back to the image domain.
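
     For the Haar filters used in the earlier sketches, one synthesis (inverse) step simply recombines averages and differences, as shown below; a full decoder would iterate this step over all levels and channels.

```python
import numpy as np

def haar_inverse_step(coarse, detail):
    """Invert one Haar analysis step: recover the sample pairs from their
    averages and half differences."""
    coarse = np.asarray(coarse, dtype=float)
    detail = np.asarray(detail, dtype=float)
    out = np.empty(2 * len(coarse))
    out[0::2] = coarse + detail     # even-indexed samples
    out[1::2] = coarse - detail     # odd-indexed samples
    return out
```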
 

