Scale Mixtures of Gaussians and the Statistics of Natural Images Martin J. Wainwright Stochastic Systems Group Electrical Engineering & CS MIT, Building 35-425 Cambridge, MA 02139 mjwain@mit. edu Eero P. Simoncelli Ctr. for Neural Science, and Courant Inst. of Mathematical Sciences New York University New York, NY 10012 eero. simoncel li @nyu. edu Abstract The statistics of photographic images, when represented using multiscale (wavelet) bases, exhibit two striking types of non- Gaussian behavior. First, the marginal densities of the coefficients have extended heavy tails. Second, the joint densities exhibit vari- ance dependencies not captured by second-order models. We ex- amine properties of the class of Gaussian scale mixtures, and show that these densities can accurately characterize both the marginal and joint distributions of natural image wavelet coefficients. This class of model suggests a Markov structure, in which wavelet coeffi- cients are linked by hidden scaling variables corresponding to local image structure. We derive an estimator for these hidden variables, and show that a nonlinear "normalization" procedure can be used to Gaussianize the coefficients. Recent years have witnessed a surge of interest in modeling the statistics of natural images. Such models are important for applications in image processing and com- puter vision, where many techniques rely (either implicitly or explicitly) on a prior density. A number of empirical studies have demonstrated that the power spectra of natural images follow a 1If v law in radial frequency, where the exponent  is typically close to two [e.g., 1]. Such second-order characterization is inadequate, however, because images usually exhibit highly non-Gaussian behavior. For in- stance, the marginals of wavelet coefficients typically have much heavier tails than a Gaussian [2]. Furthermore, despite being approximately decorrelated (as sug- gested by theoretical analysis of 1If processes [3]), orthonormal wavelet coefficients exhibit striking forms of statistical dependency [4, 5]. In particular, the standard deviation of a wavelet coefficient typically scales with the absolute values of its neighbors [5]. A number of researchers have modeled the marginal distributions of wavelet coef- ficients with generalized Laplacians, p¾(y) c exp(-ly/AI p) [e.g. 6, 7, 8]. Special cases include the Gaussian (p - 2) and the Laplacian (p - 1), but appropriate ex- Research supported by NSERC 1969 fellowship 160833 to MJW, and NSF CAREER grant MIP-9796040 to EPS. 856 M. J. Wainwright and E. P Sirnoncelli Mixing density Positive, V/ - stable No explicit form GSM density GSM char. function symmetrized Gamma Student: , 1 [1/(), + > a-stable generalized Laplacian: exp(-[y/XlP), pe (0,2] t2 (l+2-X-  , >0 No explicit form exp (-Itla), a e (0,2] No explicit form Table 1. Example densities from the class of Gaussian scale mixtures. Z(7 ) de- notes a positive gamma variable, with density p(z)= [1/F(y)]z *- exp(-z). The characteristic function of a random variable x is defined as opt(t) -- f_øøoo p(x) exp (jxt) dx. ponents for natural images are typically less than one. Simoncelli [5, 9] has modeled the variance dependencies of pairs of wavelet coefficients. Romberg et al. [10] have modeled wavelet densities using two-component mixtures of Gaussians. Huang and Mumford [11] have modeled marginal densities and cross-sections of joint densities with multi-dimensional generalized Laplacians. In the following sections, we explore the semi-parametric class of Gaussian scale mixtures. We show that members of this class satisfy the dual requirements of being heavy-tailed, and exhibiting multiplicative scaling between coefficients. We also show that a particular member of this class, in which the multiplier variables are distributed according to a gamma density, captures the range of joint statistical behaviors seen in wavelet coefficients of natural images. We derive an estimator for the multipliers, and show that a nonlinear "normalization" procedure can be used to Gaussianize the wavelet coefficients. Lastly, we form random cascades by linking the multipliers on a multiresolution tree. I Scale Mixtures of Gaussians d d A random vector Y is a Gaussian scale mixture (GSM) if Y - zU, where = denotes equality in distribution; z > 0 is a scalar random variable; U ,- iV'(0, Q) is a Gaussian random vector; and z and U are independent. As a consequence, any GSM variable has a density given by an integral: /_ 1 ( YTo-Y'  PY(Y) = oo Iz2QI1/2 exp  / c)(z)dz. where bz is the probability density of the mixing variable z (henceforth the mul- tiplier). A special case of a GSM is a finite mixture of Gaussians, where z is a discrete random variable. More generally, it is straightforward to provide condi- tions on either the density [12] or characteristic function of X that ensure it is a GSM, but these conditions do not necessarily provide an explicit form of bz. Nev- ertheless, a number of well-known distributions may be written as Gaussian scale mixtures. For the scalar case, a few of these densities, along with their associated characteristic functions, are listed in Table 1. Each variable is characterized by a scale parameter A, and a tail parameter. All of the GSM models listed in Table 1 produce heavy-tailed marginal and variance-scaling joint densities. Scale Mixtures of Gaussians and the Statistics of Natural Images 857 baboon boats flower frog -500 0 5430 _2[ -500 0 500 1000 -1000 -500 0 500 ['7, ,'k2] = [0.97, 15.04] AH/H = 0.00079 [0.45, 13.77] [0.78, 26.83] [0.80, 15.39] 0.0030 0.0030 0.0076 Figure 1. GSMs (dashed lines) fitted to empirical histograms (solid lines). Below each plot are the parameter values, and the relative entropy between the histogram (with 256 bins) and the model, as a fraction of the histogram entropy. 2 Modeling Natural Images As mentioned in the introduction, natural images exhibit striking non-Gaussian behavior, both in their marginal and joint statistics. In this section, we show that this behavior is consistent with a GSM, using the first of the densities given in Table I for illustration. 2.1 Marginal distributions We begin by examining the symmetrized Gamma class as a model for marginal distributions of wavelet coefficients. Figure I shows empirical histograms of a par- ticular wavelet subband 1 for four different natural images, along with the best fitting instance of the symmetrized Gamma distribution. Fitting was performed by min- imizing the relative entropy (i.e., the Kullback-Leibler divergence, denoted AH) between empirical and theoretical histograms. In general, the fits are quite good: the fourth plot shows one of the worst fits in our data set. 2.2 Normalized components For a GSM random vector Y __.a zU, the normalized variable Y/z formed by component-wise division is Gaussian-distributed. In order to test this behavior empirically, we model a given wavelet coefficient Y0 and a collection of neighbors {Yl,.- ß , YN) as a GSM vector. For our examples, we use a neighborhood of N = 11 coefficients corresponding to basis functions at 4 adjacent positions, 5 orientations, and 2 scales. Although the multiplier z is unknown, we can estimate it by max- imizing the log likelihood of the observed coefficients:  _a argmaxz  logp(Y]z)). Under reasonable conditions, the normalized quantity Y/ should converge in dis- tribution to a Gaussian as the number of neighbors increases. The estimate  is simple to derive:  = argmax{logp(Y[z)} z = argmin {Nlog(z) + YrQ-1y/2z2} z = v/YrQ-1y/N, 1We use the steerable pyramid, an overcomplete multiscale representation described in [13]. The marginal and joint statistics of other multiscale oriented representations are similar. 858 M. d. Wainwright and E. P. Simoncelli baboon boats flowers frog o AH/H = 0.00035 0.00041 o 0.00042 0.00043 Figure 2. Marginal log histograms (solid lines) of the normalized coefficient v for a single subband of four natural images. Each shape is close to an inverted parabola, in agreement with Gaussians (dashed lines) of equivalent empirical variance. Below each plot is the relative entropy between the histogram (with 256 bins) and a variance-matched Gaussian, as a fraction of the total histogram entropy. where Q __a E [UU T] is the positive definite covariance matrix of the underlying Gaussian vector U. Given the estimate , we then compute the normalized coefficient y  Yo/. This is a generalization of the variance normalization proposed by Ruderman and Bialek[1], and the weighted sum of squares normalization procedure used by Simoncelli [5, 14]. Figure 2 shows the marginal histograms (in the log domain) of this normalized coefficient for four natural images, along with Gaussians of equal empirical variance. In contrast to histograms of the raw coefficients (shown in Figure 1), the histograms of normalized coefficients are nearly Gaussian. The GSM model makes a stronger prediction: that normalized quantities corre- sponding to nearby wavelet pairs should be jointly Gaussian. Specifically, a pair of normalized coefficients should be either correlated or uncorrelated Gaussians, depending on whether the underlying Gaussians U = Jul u2] T are correlated or un- correlated. We examine this prediction by collecting joint conditional histograms of normalized coefficients. The top row of Figure 3 shows joint conditional histograms for raw wavelet coefficients (taken from the same four natural images as Figure 2). The first two columns correspond to adjacent spatial scales; though decorrelated, they exhibit the familiar form of multiplicative scaling. The latter two columns cor- respond to adjacent orientations; in addition to being correlated, they also exhibit the multiplicative form of dependency. The bottom row shows the same joint conditional histograms, after the coefficients have been normalized. Whereas Figure 2 demonstrates that normalized coefficients are close to marginally Gaussian, Figure 3 demonstrates that they are also approx- imately jointly Gaussian. These observations support the use of a Gaussian scale mixture for modeling natural images. 2.3 Joint distributions The GSM model is a reasonable approximation for groups of nearby wavelet coef- ficients. However, the components of GSM vectors are highly dependent, whereas the dependency between wavelet coefficients decreases as (for example) their spa- tial separation increases. Consequently, the simple GSM model is inadequate for global modeling of coefficients. We are thus led to use a graphical model (such as tree) that specifies probabilistic relations between the multipliers. The wavelet co- efficients themselves are considered observations, and are linked indirectly by their shared dependency on the (hidden) multipliers. Scale Mixtures of Gaussians and the Statists'cs of Natural Images 859 baboon boats flowers frog -500 0 500 -500 0 500 -500 0 500 -500 0 500 -25 0 25 -25 0 25 -25 0 25 -25 0 AH/H = 0.0643 0.0743 0.0572 0.0836 25 Figure 3. Top row: joint conditional histograms of raw wavelet coefficients for four natural images. Bottom row: joint conditional histograms of normalized pairs of coefficients. Below each plot is the relative entropy between the joint histogram (with 256 x 256 bins) and a covaxiance-matched Gaussian, as a fraction of the total histogram entropy. For concreteness, we model the wavelet coefficient at node s as y(s) where x(s) is Gaussian, so that z - ]]xl] is the square root of a gamma variable of index 0.5. For illustration, we assume that the multipliers are linked by a multiscale autoregressive (MAR) process [15] on a tree: x(8) = x(p½)) + x/T- where p(s) is the parent of node s. Two wavelet coefficients y(s) and y(t) are linked through the multiplier at their common ancestral node denoted s A t. In particular, the joint distributions are given by - *^*) x(, A ,) + (*)11 u(,) where v, v2 are independent white noise processes; and d(, ) denotes the distance between a node d one of its ancestors on the tree (e.g., d(s,p(s)) = 1). For nodes s and t at the same scale and oriemation but spatially separated by a dis- tance of A(s, t), the distance between s and the coon ancestor s A t ows d(s,s  t)  gog2(A(s, t)) + 1]. The first row of Figure 4 shows the range of behaviors seen in joint distributions taken from a wavelet subband of a particul natural image, compared to simulated GSM gamma distributions with  = 0.92. The first column corresponds to a pair of wavelet filters in quadrature phe (i.e., related by a Hilbert transform). Note that for this pair of coefficients, the contours are nearly circular, an observation that h been previously made by Zetzsche [4]. Nevertheless, these two coefficients e dependent,  shown by the multiplicative scaling in the conditional histo,am of the third row. This type of scaling dependency h been extensivdy documemed by Simoncelli [5, 9]. Analogous plots for the simulated Gamma model, with zero spatial separation are shown in rows 2 and 4. As in the image data, the contours of the joint density are very close to circular, and the conditional distribution shows a string variance dependency. 860 M. J. Wainwright and E. P. Simoncelli image data simulated model image data quad. pair overlapping near distant simulated model Figure 4. Examples of empirically observed distributions of wavelet coefficients, compared with simulated distributions from the GSM gamma model. First row: Empirical joint histograms for the "mountain" image, for four pairs of wavelet coef- ficients, corresponding to basis functions with spatial separations A -- (0, 4, 8, 128). Second row: Simulated joint distributions for Gamma variables with p -- 0.92 and the same spatial separations. Contour lines are drawn at equal intervals of log probability. Third row: Empirical conditional histograms for the "mountain" im- age. Fourth row: Simulated conditional histograms for Gamma variables. For these conditional distributions, intensity corresponds to probability, except that each column has been independently rescaled to fill the full range of intensities. The remaining three columns of figure 4 show pairs of coefficients drawn from iden- tical wavelet filters at spatial displacements A = (4, 8, 128), corresponding to a pair of overlapping filters, a pair of nearby filters, and a distant pair. Note the pro- gression in the contour shapes from off-circular, to a diamond shape, to a concave "star" shape. The model distributions behave similarly, and show the same range of contours for simulated pairs of coefficients. Thus, consistent with empirical obser- vations, a GSM model can produce a range of dependency between pairs of wavelet coefficients. Again, the marginal histograms retain the same form throughout this range. 3 Conclusions We have proposed the class of Gaussian scale mixtures for modeling natural images. Models in this class typically exhibit heavy-tailed marginals, as well as multiplicative scaling between adjacent coefficients. We have demonstrated that a particular GSM (the symmetrized Gamma family) accounts well for both the marginal and joint distributions of wavelet coefficients from natural images. More importantly, this model suggests a hidden Markov structure for natural images, in which wavelet coefficients are linked by hidden multipliers. Romberg et al. [10] have made a related proposal using two-state discrete multipliers, corresponding to a finite mixture of Gaussians. Scale Mixtures of Gaussians and the Statistics of Natural Images 861 We have demonstrated that the hidden multipliers can be locally estimated from measurements of wavelet coefficients. Thus, by conditioning on fixed values of the multipliers, estimation problems may be reduced to the classical Gaussian case. Moreover, we described how to link the multipliers on a multiresolution tree, and showed that such a random cascade model accounts well for the drop-off in de- pendence of spatially separated coefficients. We are currently exploring EM-like algorithms for the problem of dual parameter and state estimation. Acknowledgements We thank Bill Freeman, David Mumford, Mike Schneider, Ilya Pollak, and Alan Willsky for helpful discussions. References [1] D. L. Ruderman and W. Bialek. Statistics of natural images: Scaling in the woods. Phys. Rev. Letters, 73(6):814-817, 1994. [2] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379-2394, 1987. [3] A. H. Tewfik and M. Kim. Correlation structure of the discrete wavelet coefficients of fractional Brownian motion. IEEE Trans. Info. Theory, 38:904-909, Mar. 1992. [4] C. Zetzsche, B. Wegmann, and E. Barth. Nonlinear aspects of primary vision: Entropy reduction beyond decorrelation. In Int 'l Syrup. Soc. for Info. Display, volume 24, pages 933-936, 1993. [5] E. P. Simoncelli. Statistical models for images: Compression, restoration and synthe- sis. In 31st Asilomar Conf., pages 673-678, Nov. 1997. [6] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet repre- sentation. IEEE Pat. Anal. Mach. Intell., 11:674-693, July 1989. [7] E. P. Simoncelli and E. H. Adelson. Noise removal via Bayesian wavelet coring. In Proc. IEEE ICIP, volume I, pages 379-382, September 1996. [8] P. Moulin and J. Liu. Analysis of multiresolution image denoising schemes using a generalized Gaussian and complexity priors. IEEE Trans. Info. Theory, 45:909-919, Apr. 1999. [9] R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical char- acterization in the wavelet domain. IEEE Trans. Image. Proc., 8(12):1688-1701, Dec. 1999. [10] J.K. Romberg, H. Choi, and R.G. Baraniuk. Bayesian wavelet domain image modeling using hidden Markov trees. In Proc. IEEE ICIP, Kobe, Japan, Oct. 1999. [11] J. Huang and D. Mumford. Statistics of natural images and models. In CVPR, paper 216, 1999. [12] D.F. Andrews and C.L. Mallows. Scale mixtures of normal distributions. J. Royal Star. Soc., 36:99-102, 1974. [13] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In Proc. IEEE ICIP, volume III, pages 444- 447, Oct. 1995. [14] E. P. Simoncelli and O. Schwartz. Image statistics and cortical normalization models. In M. S. Kearns, S. A. Solla, and D. A. Cobh, editors, Adv. Neural Information Processing Systems, volume 11, pages 153-159, Cambridge, MA, May 1999. [15] K. Chou, A. Willsky, and R. Nikoukhah. Multiscale systems, Kalman filters, and Riccati equations. IEEE Trans. Automatic Control, 39(3):479-492, Mar. 1994.