One of the most striking features of the Cosmic Microwave Background (CMB) is that it is incredibly compressible from an information-content point of view. The Planck satellite produced maps with of order a billion pixels, whose information could be compressed almost perfectly into a power spectrum of order one thousand real numbers.

This is already a massive compression. But most of this information can be compressed further into just six parameters of the standard model, yielding a total compression of about a billion to one. This is both remarkable and annoying, because we want to be surprised and find things that we can’t explain. And if there are things we can’t explain, we want clear signals of them in the data, not just vague hints of their existence.

Anyway, to illustrate just how efficient the compression is, I took the binned WMAP 9 TT power spectrum data (I refused to use the Planck power spectra because they are available only in a FITS file, which is like keeping a fire extinguisher in a safe) and ran some symbolic regressions with the cool Eureqa tool to try to find relatively simple analytic functions that fit the data. After some reasonably extensive searching, involving a few million generations with multiple restarts, I was able to get some “reasonable” fits. One was:

$D_{\ell} = 627.95 + 5.16\,\ell + 4248.78 \exp\left[-(0.0092\,\ell - 1.92)^2\right] - 0.004\,\ell^2 - 0.64\,\ell \cos\left[\cos(\cos(5.16\,\ell)) - \sqrt{\ell}\right]$

where $D_\ell \equiv \ell(\ell+1)C_\ell/2\pi$ and $C_\ell$ is the usual set of Legendre polynomial coefficients of the angular two-point correlation function.

This contains eight parameters, and it isn’t often you see $\cos(\cos(\cos(\cdot)))$ being used. The fit is shown against the WMAP data below.

This wasn’t a very exhaustive search, but it illustrates how non-trivial it is to fit the CMB power spectrum as beautifully as the standard model does with just six free parameters, especially when you consider that those six parameters are filtered through the Einstein equations, thermodynamics, the Boltzmann equation and about fourteen billion years of slow cooling and massaging.
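For anyone who wants to reproduce the curve, here is a minimal Python sketch (standard library only) of the first fit and of the $D_\ell$/$C_\ell$ scaling. The coefficients are copied from the binned-data fit; the function names are hypothetical, and $D_\ell$ is assumed to be in μK², which is how the WMAP spectra are usually quoted.

```python
import math

def dl_from_cl(ell, cl):
    """Scale C_ell to D_ell = ell (ell + 1) C_ell / (2 pi)."""
    return ell * (ell + 1) * cl / (2.0 * math.pi)

def cl_from_dl(ell, dl):
    """Inverse scaling C_ell = 2 pi D_ell / (ell (ell + 1)); valid for ell >= 1."""
    return 2.0 * math.pi * dl / (ell * (ell + 1))

def fit_binned(ell):
    """First symbolic fit to the binned WMAP 9 TT spectrum; returns D_ell (assumed muK^2)."""
    trend = 627.95 + 5.16 * ell - 0.004 * ell**2
    # Gaussian bump centred at ell = 1.92 / 0.0092 ~ 209, close to the first acoustic peak.
    bump = 4248.78 * math.exp(-(0.0092 * ell - 1.92) ** 2)
    # Nested-cosine term: the outer cosine acts on cos(cos(5.16 ell)) - sqrt(ell).
    wiggle = -0.64 * ell * math.cos(math.cos(math.cos(5.16 * ell)) - math.sqrt(ell))
    return trend + bump + wiggle

ells = range(2, 1201)
dl = [fit_binned(l) for l in ells]
cl = [cl_from_dl(l, d) for l, d in zip(ells, dl)]
print(f"D_220 from the fit: {fit_binned(220):.1f}")
```

Working in $D_\ell$ rather than $C_\ell$ keeps the numbers of comparable size across the whole range of $\ell$, which is presumably why the fits are expressed that way.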

As an aside, it seems to me a good sign when a theory matches the data much better than any simple analytic formula or parametrization. When I was looking at the fit, I was initially a little surprised by the jagged appearance of the symbolic fit, shown by the blue line in the first figure. This, I realised, was because the plot was just drawing straight lines between the function values evaluated at the $\ell$ values of the 50 or so binned WMAP data points. So instead of evaluating the fit only at the central $\ell$ values of the WMAP bins, I plotted it at every relevant $\ell$ and zoomed in on the data, and wow… suddenly that cos(cos(cos( ))) really pops out…
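The effect is easy to reproduce with a short Python sketch (standard library only). It isolates the nested-cosine term of the first fit, evaluates it at every integer $\ell$, and measures both the spread of its values inside hypothetical bins of width 20 (WMAP’s real bins are irregular) and its largest discrete curvature. Because the inner argument advances by 5.16 radians (about −1.12 radians modulo $2\pi$) per multipole, the term oscillates with a period of only a few multipoles, so a curve drawn through bin centres alone cannot show it.

```python
import math

def nested_cos_term(ell):
    """Oscillatory piece of the first fit: -0.64 ell cos(cos(cos(5.16 ell)) - sqrt(ell))."""
    return -0.64 * ell * math.cos(math.cos(math.cos(5.16 * ell)) - math.sqrt(ell))

def second_differences(values):
    """Discrete curvature f[i+1] - 2 f[i] + f[i-1]; stays small for a smooth curve."""
    return [values[i + 1] - 2 * values[i] + values[i - 1] for i in range(1, len(values) - 1)]

# Evaluate at every multipole, as in the zoomed plot.
dense = [nested_cos_term(l) for l in range(2, 1201)]

# Hypothetical bins of width 20 (an assumption; not WMAP's actual binning).
WIDTH = 20
spreads = []
for lo in range(2, 1201 - WIDTH + 1, WIDTH):
    vals = [nested_cos_term(l) for l in range(lo, lo + WIDTH)]
    spreads.append(max(vals) - min(vals))  # range of values hidden behind one centre value

max_curv = max(abs(x) for x in second_differences(dense))
print(f"largest within-bin spread: {max(spreads):.0f}")
print(f"largest |second difference|: {max_curv:.0f}")
```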

The jagged fit is actually a salutary lesson: it shows the potential danger of binning data before fitting models to it. Binning isn’t model-independent, because it opens up a large region of high-frequency model phase space that would not actually be a good fit to the full dataset. Rerunning the symbolic regression on the full set of roughly 1100 unbinned WMAP data points instead gave the following good fit:

$D_{\ell} = 3861.82 + 3.02\,\ell \sin(1.88 - 0.015\,\ell) - 2.39\,\ell - 2304.25 \sin(1.88 - 0.015\,\ell) - 740.99 \sin(1.88 - 0.015\,\ell) \cos(5.29 + 0.006\,\ell)$

which has none of the offending high-frequency terms, at the expense of 11 free parameters; it is shown below. None of these symbolic fits is particularly amazing, which illustrates the elegant minimalism of the theoretical predictions, especially when you consider that the theoretical model also fits the polarization spectra (TE and EE) with the same parameters. Now if only we understood the dark matter and dark energy that go into these predictions!
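Two quick numerical checks, sketched in Python with only the standard library. The first evaluates the unbinned fit at every $\ell$ and measures its largest discrete curvature, which stays of order unity because every term varies on scales of hundreds of multipoles (periods $2\pi/0.015 \approx 419$ and $2\pi/0.006 \approx 1047$). The second is a hypothetical toy of the binning loophole: with bin centres every 20 multipoles (an assumption; WMAP’s real bins are irregular), a term like $100\sin(\pi(\ell-12)/20)$ is invisible to a comparison made only at the centres but large at the multipoles in between. The helper names are illustrative.

```python
import math

def fit_unbinned(ell):
    """Second symbolic fit (from the unbinned WMAP 9 TT points); returns D_ell."""
    s = math.sin(1.88 - 0.015 * ell)
    return (3861.82 + 3.02 * ell * s - 2.39 * ell - 2304.25 * s
            - 740.99 * s * math.cos(5.29 + 0.006 * ell))

def max_abs_second_difference(values):
    """Largest |f[i+1] - 2 f[i] + f[i-1]|: a crude measure of high-frequency structure."""
    return max(abs(values[i + 1] - 2 * values[i] + values[i - 1])
               for i in range(1, len(values) - 1))

every_ell = list(range(2, 1201))
curv = max_abs_second_difference([fit_unbinned(l) for l in every_ell])
print(f"unbinned fit, largest |second difference|: {curv:.2f}")

# Toy binning loophole (hypothetical numbers): with bin centres every 20 multipoles,
# this term is zero at every centre but swings between -100 and +100 in between.
centres = list(range(12, 1201, 20))

def hidden_term(ell):
    return 100.0 * math.sin(math.pi * (ell - 12) / 20)

at_centres = max(abs(hidden_term(c)) for c in centres)
per_ell_ms = sum(hidden_term(l) ** 2 for l in every_ell) / len(every_ell)
print(f"hidden term at centres: {at_centres:.1e}; mean square over all ell: {per_ell_ms:.0f}")
```

A fit scored only at the centres pays no penalty for `hidden_term`, while a fit scored at every multipole pays a mean-square cost of about 5000, which is the sense in which binning widens the space of acceptable models.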

Reblogged this on In the Dark and commented:

I just came across this post, one of a series from the African Institute for Mathematical Sciences in Muizenberg near Cape Town, South Africa, in which Bruce Bassett explains just how much of the information we get from measurements of the Cosmic Microwave Background can be squeezed into precise estimates of just a few parameters. The only point I would add is that this assumes at the outset that all relevant information is contained within the angular power spectrum; that’s not necessarily the case, but we don’t have any compelling evidence that it’s a wrong assumption for the CMB; see

https://telescoper.wordpress.com/2009/01/06/power-isnt-everything/

for a previous discussion of this.