User:Schancee/sandbox

Efficient coding
Introduction

Between the late 1990s and the beginning of the 21st century, Bruno Olshausen and Michael Lewicki studied how natural images and natural sounds, respectively, are encoded by the brain, and each tried to create a model that would replicate this process as accurately as possible. It was found that the processing of both input signals could be modeled with very similar methods. The goal of efficient coding theory is to convey a maximal amount of information about a stimulus by using a set of statistically independent features. Efficient coding of natural images gives rise to a population of localized, oriented, Gabor wavelet-like filters. Gammatone filters are their equivalent in the auditory system. The most important feature for distinguishing shapes in an image is edge detection, which is achieved with Gabor filters. In sound processing, sound onsets or 'acoustic edges' can be encoded by a pool of filters similar to a gammatone filterbank.

Vision

In 1996, Bruno Olshausen and his team were the first to show that a learning algorithm which seeks sparse linear codes for natural images, that is, one which maximizes sparseness, develops a set of localized, oriented, bandpass receptive fields analogous to those found in the primary visual cortex.

Assume that an image $$I(x,y)$$ can be represented as a linear superposition of basis functions $$\phi_i (x,y)$$:

$$I(x,y) = \sum_{i} a_i \phi_i (x,y) $$

The image code depends on the choice of basis functions $$\phi_i (x,y)$$. The coefficients $$a_i$$ differ from image to image. The objective of efficient coding is to find a family of $$\phi_i (x,y)$$ that spans the image space and yields coefficients $$a_i$$ that are as statistically independent as possible.

Natural scenes contain many higher-order forms of statistical structure which are non-Gaussian. Principal component analysis, which only removes pairwise linear correlations, is therefore unsuitable for attaining these two objectives. Statistical dependences among a set of coefficients are present whenever the joint entropy is less than the sum of the individual entropies:

$$H(a_1, a_2,...,a_n) < \sum_{i} H(a_i)$$
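As a small illustration of this criterion, the sketch below compares the joint entropy of two discrete variables with the sum of their marginal entropies. The two probability tables and the function names are made up for the example; real image coefficients are continuous and would need density estimates instead.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def marginals(joint):
    """Row and column marginals of a 2-D joint probability table."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols

def redundancy(joint):
    """Sum of marginal entropies minus joint entropy.

    The result is zero when the two variables are independent and
    positive when they are statistically dependent.
    """
    rows, cols = marginals(joint)
    joint_entropy = entropy([p for row in joint for p in row])
    return entropy(rows) + entropy(cols) - joint_entropy

# Hypothetical joint distributions of two binary coefficients.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]

print("independent:", redundancy(independent))
print("dependent:  ", redundancy(dependent))
```

A positive value means the joint entropy is strictly smaller than the sum of the individual entropies, which is the signature of dependence described above.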

It is assumed that natural images have a 'sparse structure', meaning that any given image can be expressed in terms of a small number of features drawn from a larger set. The objective is therefore to look for a low-entropy code in which the probability distribution of each coefficient is unimodal and peaked around zero. This can be formulated as an optimization problem:

$$E = - [\text{preserve information}] - \lambda[\text{sparseness of } a_i]$$

where $$\lambda$$ is a positive weight coefficient. The first quantity measures the mean square error between the natural image and the reconstructed image:

$$[\text{preserve information}] = - \sum_{x,y} [ I(x,y) - \sum_{i}a_i\phi_i (x,y) ]^2$$

The second quantity assigns a higher cost when, for a given image, activity is spread over many coefficients rather than concentrated in a few. It is calculated by summing each coefficient's activity passed through a nonlinear function $$S(x)$$:

$$[\text{sparseness of } a_i] = - \sum_{i} S\left ( \frac{a_i}{\sigma} \right )$$

where $$\sigma$$ is a scaling constant. Suitable choices for $$S(x)$$ are functions that, among activity states with equal variance, favor those with the fewest non-zero coefficients (e.g. $$-e^{-x^2}$$, $$\log(1+x^2)$$, $$\left\vert x \right\vert$$).
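The cost can be made concrete with a toy example. The sketch below evaluates $$E$$ for a two-pixel "image" and an overcomplete set of three basis vectors, using $$S(x) = \log(1+x^2)$$. The image, the basis, the two candidate codes and the values of `lam` and `sigma` are all assumptions chosen for illustration, not values from the original work.

```python
import math

def reconstruction(coeffs, basis):
    """Linear superposition sum_i a_i * phi_i over flattened pixels."""
    n_pixels = len(basis[0])
    return [sum(a * phi[p] for a, phi in zip(coeffs, basis))
            for p in range(n_pixels)]

def sparse_cost(image, coeffs, basis, lam=0.1, sigma=1.0):
    """E = squared reconstruction error + lam * sum_i S(a_i / sigma),
    with S(x) = log(1 + x^2)."""
    recon = reconstruction(coeffs, basis)
    squared_error = sum((i - r) ** 2 for i, r in zip(image, recon))
    sparsity_penalty = sum(math.log(1.0 + (a / sigma) ** 2) for a in coeffs)
    return squared_error + lam * sparsity_penalty

# Hypothetical overcomplete basis: three unit vectors for two pixels.
r = 1.0 / math.sqrt(2.0)
basis = [[1.0, 0.0], [0.0, 1.0], [r, r]]
image = [1.0, 1.0]

# Two codes that reconstruct the image exactly and have equal variance.
dense_code = [1.0, 1.0, 0.0]
sparse_code = [0.0, 0.0, math.sqrt(2.0)]

print("dense code cost: ", sparse_cost(image, dense_code, basis))
print("sparse code cost:", sparse_cost(image, sparse_code, basis))
```

Both codes preserve the information equally well, so the difference in cost comes entirely from the sparseness term, which favors the code with a single active coefficient.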

Learning is achieved by minimizing the total cost $$E$$. For each image, $$E$$ is first minimized over the $$a_i$$. The $$\phi_i$$ then converge by gradient descent on $$E$$ averaged over many images. The algorithm allows the set of basis functions to be overcomplete and non-orthogonal without reducing the degree of sparseness.
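The two nested minimizations can be sketched as follows. This is a minimal toy version under several assumptions that are not taken from the source: plain gradient descent for the coefficients, $$S(x) = \log(1+x^2)$$, fixed step sizes (`step`, `eta`), and a simple renormalization of each basis vector to unit length after every update.

```python
import math
import random

def reconstruct(coeffs, basis):
    return [sum(a * phi[p] for a, phi in zip(coeffs, basis))
            for p in range(len(basis[0]))]

def cost(image, coeffs, basis, lam, sigma):
    residual = [i - r for i, r in zip(image, reconstruct(coeffs, basis))]
    return (sum(e * e for e in residual)
            + lam * sum(math.log(1.0 + (a / sigma) ** 2) for a in coeffs))

def infer_coefficients(image, basis, lam=0.1, sigma=1.0, step=0.05, iters=200):
    """Inner loop: gradient descent on E over the a_i for one image."""
    coeffs = [0.0] * len(basis)
    for _ in range(iters):
        residual = [i - r for i, r in zip(image, reconstruct(coeffs, basis))]
        new_coeffs = []
        for a, phi in zip(coeffs, basis):
            # dE/da_i: reconstruction term plus derivative of lam * S(a_i / sigma).
            grad_fit = -2.0 * sum(e * v for e, v in zip(residual, phi))
            x = a / sigma
            grad_sparse = (lam / sigma) * 2.0 * x / (1.0 + x * x)
            new_coeffs.append(a - step * (grad_fit + grad_sparse))
        coeffs = new_coeffs
    return coeffs

def normalize(phi):
    norm = math.sqrt(sum(v * v for v in phi))
    return [v / norm for v in phi]

def learn_basis(images, n_basis, epochs=20, eta=0.05, seed=0):
    """Outer loop: gradient step on the phi_i, averaged over all images."""
    rng = random.Random(seed)
    n_pixels = len(images[0])
    basis = [normalize([rng.gauss(0.0, 1.0) for _ in range(n_pixels)])
             for _ in range(n_basis)]
    for _ in range(epochs):
        grads = [[0.0] * n_pixels for _ in range(n_basis)]
        for image in images:
            coeffs = infer_coefficients(image, basis)
            residual = [i - r for i, r in zip(image, reconstruct(coeffs, basis))]
            for k, a in enumerate(coeffs):
                for p, e in enumerate(residual):
                    # dE/dphi_k(p) = -2 * a_k * residual(p), averaged over images.
                    grads[k][p] += -2.0 * a * e / len(images)
        # Assumed simplification: keep every basis vector at unit length.
        basis = [normalize([v - eta * g for v, g in zip(phi, grad)])
                 for phi, grad in zip(basis, grads)]
    return basis

# Hypothetical two-pixel "images", coded with an overcomplete basis of three.
toy_images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
learned = learn_basis(toy_images, n_basis=3, epochs=5)
print(learned)
```

The step sizes are chosen small enough for this toy setting; a realistic implementation on image patches would need many more images and a more careful optimizer.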

The algorithm was tested on artificial datasets, confirming that it is suited to detecting sparse structure in the data. When trained on natural images, the resulting basis functions are well localized, oriented and selective to different spatial scales. Mapping the response of each $$a_i$$ to spots at every position established a similarity between the receptive fields and the basis functions. Together, the basis functions form a complete image code spanning the joint space of spatial position, orientation and scale in a manner similar to wavelet codes.
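For comparison with such localized, oriented, bandpass functions, the sketch below evaluates a two-dimensional Gabor function in its commonly used form: a Gaussian envelope multiplied by a cosine carrier. The parameter names (`wavelength`, `theta`, `sigma`, `gamma`, `psi`) and the example values are assumptions for illustration and are not taken from the learned basis functions themselves.

```python
import math

def gabor(x, y, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Gabor function: Gaussian envelope times an oriented cosine carrier.

    wavelength sets the spatial scale, theta the orientation (radians),
    sigma the envelope width, gamma the aspect ratio, psi the phase.
    """
    x_rot = x * math.cos(theta) + y * math.sin(theta)
    y_rot = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * x_rot / wavelength + psi)
    return envelope * carrier

def gabor_patch(half_width, **params):
    """Sample the Gabor function on a (2 * half_width + 1)^2 pixel grid."""
    coords = range(-half_width, half_width + 1)
    return [[gabor(x, y, **params) for x in coords] for y in coords]

# Hypothetical 9 x 9 patch oriented at 45 degrees.
patch = gabor_patch(4, wavelength=4.0, theta=math.pi / 4, sigma=2.0)
for row in patch:
    print(" ".join(f"{v:+.2f}" for v in row))
```

The envelope makes the function localized, `theta` makes it oriented, and the carrier wavelength restricts it to a band of spatial frequencies.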

To conclude, the results of Olshausen's team show that two objectives are sufficient for the emergence of localized, oriented, bandpass receptive fields: that information be preserved and that the representation be sparse.

Audition



Lewicki published his findings in 2002, after Olshausen. Inspired by the earlier paper, he tested efficient coding theory by deriving efficient codes for different classes of natural sounds: animal vocalizations, environmental sounds and human speech.

The method used is independent component analysis (ICA), which extracts a linear decomposition of signals that minimizes both correlations and higher-order statistical dependencies. For each data set, this learning algorithm yields a set of filters, each of which can be interpreted as a time-frequency window. The filter shapes are determined by the statistical structure of the ensemble.
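For reference, the standard linear ICA model can be written as below. The symbols $$x$$, $$A$$, $$s$$ and $$W$$ are notation assumed here for illustration, and the square, invertible case is assumed for simplicity. A sound segment $$x$$ is modeled as a weighted sum of basis functions (the columns $$\phi_i$$ of $$A$$), and the coefficients are recovered by a set of filters (the rows of $$W$$):

$$x = A s = \sum_{i} s_i \phi_i, \qquad \hat{s} = W x, \qquad W = A^{-1}$$

ICA looks for the $$W$$ that makes the recovered coefficients as independent as possible, which can be expressed as minimizing the gap between the sum of individual entropies and the joint entropy:

$$\sum_{i} H(\hat{s}_i) - H(\hat{s}_1, \hat{s}_2, ..., \hat{s}_n) \ge 0$$

The gap is zero exactly when the coefficients are statistically independent.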

When applied to the different sound samples, the method yielded, for environmental sounds, filters with time-frequency windows similar to those of a wavelet transform, localized in both time and frequency (Fig. 1c). For animal vocalizations, it yielded a tiling pattern similar to that of a Fourier transform, localized in frequency but not in time (Fig. 1d). Speech gives a mixture of the two, corresponding to a 2:1 weighting of environmental to animal sounds (Fig. 1e). This is because speech is composed of harmonic vowels and non-harmonic consonants. These patterns had previously been observed experimentally in animals and humans.

To identify the core differences between these three types of sound, Lewicki's team analyzed bandwidth, filter sharpness, and the temporal envelope. Bandwidth increases as a function of center frequency for environmental sounds, whereas it stays constant for animal vocalizations. For speech, bandwidth also increases, but less than for environmental sounds. Because of the time/frequency trade-off, the temporal envelope curves behave similarly. Comparing sharpness as a function of center frequency between physiological measurements and the filters derived from speech data and from the combined sound ensembles confirmed that the two are consistent.
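The time/frequency trade-off mentioned here can be stated with the Gabor limit. As a hedged reminder (the symbols are introduced here for illustration only), let $$\sigma_t$$ and $$\sigma_f$$ be the spreads of a filter's energy in time and in frequency (in Hz):

$$\sigma_t \, \sigma_f \ge \frac{1}{4\pi}$$

A filter with a narrow bandwidth must therefore have a long temporal envelope, and vice versa. Filter sharpness is commonly quantified by a quality factor, the ratio of center frequency $$f_c$$ to bandwidth $$\Delta f$$:

$$Q = \frac{f_c}{\Delta f}$$

Under this definition, a bandwidth that grows in proportion to center frequency gives a constant $$Q$$, as in a wavelet-like code, whereas a constant bandwidth gives a $$Q$$ that grows with center frequency, as in a Fourier-like code.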

It must be noted that several approximations were necessary to conduct this analysis. The analysis did not take variations in sound intensity into account. The auditory system obeys certain intensity thresholds according to which frequencies are selected. However, the physiological measurements with which the model is compared were made using isolated pure tones, which in turn limits the range of application of this model but does not discredit it. Moreover, the filters' symmetry in time does not match the physiologically characterized gammatone filters. It is possible to modify the algorithm to be causal, in which case the filters' temporal envelopes would become asymmetric, similar to those of gammatone filters.
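The asymmetry of a gammatone filter can be illustrated with its commonly used impulse response, $$t^{n-1} e^{-2\pi b t} \cos(2\pi f t + \varphi)$$ for $$t \ge 0$$. The sketch below measures how long the envelope takes to rise from half of its maximum to the peak and to decay back to half of its maximum. The order, bandwidth, center frequency and sampling settings are assumptions chosen for the example, not values from the study.

```python
import math

def gammatone_envelope(t, bandwidth_hz, order=4):
    """Gamma-shaped envelope; zero before t = 0, so the filter is causal."""
    if t < 0.0:
        return 0.0
    return t ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth_hz * t)

def gammatone(t, center_hz, bandwidth_hz, order=4, phase=0.0):
    """Gammatone impulse response: envelope times a cosine carrier."""
    return gammatone_envelope(t, bandwidth_hz, order) * math.cos(
        2.0 * math.pi * center_hz * t + phase)

def envelope_shape(bandwidth_hz, order=4, sample_rate=100_000, duration=0.05):
    """Return (peak time, rise time, decay time) of the envelope in seconds.

    Rise and decay are measured between the half-maximum points and the peak.
    """
    times = [n / sample_rate for n in range(int(duration * sample_rate))]
    env = [gammatone_envelope(t, bandwidth_hz, order) for t in times]
    peak_index = max(range(len(env)), key=env.__getitem__)
    half = env[peak_index] / 2.0
    above = [n for n, v in enumerate(env) if v >= half]
    rise = times[peak_index] - times[above[0]]
    decay = times[above[-1]] - times[peak_index]
    return times[peak_index], rise, decay

# Hypothetical 4th-order filter with a 100 Hz bandwidth parameter.
peak_time, rise, decay = envelope_shape(bandwidth_hz=100.0)
print(f"peak at {peak_time * 1e3:.2f} ms, "
      f"rise {rise * 1e3:.2f} ms, decay {decay * 1e3:.2f} ms")
print("sample at 5 ms, 1 kHz carrier:", gammatone(0.005, 1000.0, 100.0))
```

The envelope rises faster than it decays, which is the temporal asymmetry that a filter symmetric in time cannot reproduce.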

Conclusion

An analogy emerges between these two systems. Neurons in the visual cortex encode the location and spatial frequency of visual stimuli. The trade-off between these two variables is similar to that between timing and frequency in auditory coding.

Another interesting aspect of this parallel is the question of why ICA explains the neural response properties at the earlier stages of analysis in the auditory system, whereas it explains the response properties of cortical neurons in the visual system. The neuronal anatomy of the two systems differs. In the visual system a bottleneck occurs where information from 100 million photoreceptors is condensed into 1 million optic nerve fibers. The information is then expanded by a factor of 50 in the cortex. In the auditory system no bottleneck occurs, and information from 3,000 cochlear inner hair cells is projected directly onto 30,000 auditory nerve fibers. In both systems, ICA thus applies at the point of expansion in the representation.