Digital Music Composition/Understanding Sound

What Is Sound?
Sound is a pressure wave: a travelling variation in pressure. It propagates through a medium (any material object) at a speed that depends on the properties of that medium. Roughly, the stiffer (less compressible) the material is relative to its density, the faster sound travels through it. It can propagate through any state of matter, whether solid, liquid or gas. The only place it cannot travel is where there is no matter, in a vacuum such as that of outer space. It can pass from one material to another where they make contact. Thus our ears, which contain both solid parts (the ossicles) and liquid-filled parts (the cochlea), normally pick up sound from the gaseous air, but they can also work underwater.
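For gases and liquids, the rule of thumb above has a simple form: the speed is $$c = \sqrt{K / \rho}$$, where $$K$$ is the bulk modulus (stiffness against compression) and $$\rho$$ is the density. Solids need other elastic moduli, so they are left out here. The sketch below plugs in approximate textbook values near 20 °C; these numbers are assumptions for illustration, not measurements from this book. Water is about 830 times denser than air but roughly 15,000 times stiffer, so sound travels about 4.3 times faster in it.

```python
import math

def speed_of_sound(bulk_modulus_pa, density_kg_m3):
    """Speed of a pressure wave in a fluid: c = sqrt(K / rho)."""
    return math.sqrt(bulk_modulus_pa / density_kg_m3)

# Approximate values near 20 degrees C (assumed for illustration):
# air:   bulk modulus for fast (adiabatic) compression ~ 1.4 * 101325 Pa, density ~ 1.204 kg/m^3
# water: bulk modulus ~ 2.2e9 Pa, density ~ 1000 kg/m^3
air = speed_of_sound(1.4 * 101325, 1.204)
water = speed_of_sound(2.2e9, 1000.0)

print(f"air:   {air:.0f} m/s")    # air:   343 m/s
print(f"water: {water:.0f} m/s")  # water: 1483 m/s
```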

The process by which those pressure waves are converted into nerve signals and interpreted by our brains is remarkably complex, and its subtleties are still being worked out; the study of how we perceive sound is called psychoacoustics. Suffice it to say that some of the things you think you can hear, in fact you can’t, and some of the things you think you cannot hear, in fact you can.

Sound Channels
Sound can basically be represented by a scalar quantity, representing a pressure at a point of measurement, varying over time—a single sound channel. Since we have two ears, we could usefully have two such quantities, simultaneously representing the pressure at each ear—two independent sound channels, otherwise known as stereo sound. In fact, nowadays we commonly have more than two channels, to produce surround sound.

But before we get into such complications, let us start by considering the characteristics of a single sound channel.

Time Domain Versus Frequency Domain
Some sounds seem to us more melodious and musical, while others seem harsh and cacophonic. If we plot the pressure wave as a curve over time, we will notice that the more musical sounds tend to have a shape that repeats very regularly over time, while the less musical ones tend to look more irregular.

The rate at which the shape repeats is called the frequency, and this translates directly into the pitch of a musical note. In practice, though, the sound that a musical instrument makes usually consists of a whole range of frequencies, not just one.

So another way to look at the sound is, instead of plotting the pressure amplitude against time (the time-domain representation of the sound), to decompose the sound into its frequency components and plot their strengths against frequency (the frequency-domain representation). Such a plot is known as a spectrum; a spectrogram, by contrast, shows how the spectrum changes over time. There is a mathematical transformation, called the Fourier transform, that lets us move easily back and forth between these two domains. You don’t need to understand its mathematical details, other than to appreciate that there is a 1:1 mapping between time-domain and frequency-domain representations: the Fourier transform is invertible.
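To make the round trip concrete, here is a minimal sketch using the discrete Fourier transform (DFT), the version of the Fourier transform that applies to sampled digital audio. It is written out naively (O(N²) operations) for clarity; real audio software uses fast FFT routines instead. The function names `dft` and `idft` and the test signal are illustrative choices. The spectrum shows peaks at the two frequencies that were mixed in, and transforming back recovers the original samples up to floating-point rounding.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform: time domain -> frequency domain."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT: frequency domain -> time domain."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Test signal: a sine at 2 cycles per window plus a weaker sine at 5 cycles per window.
N = 32
signal = [math.sin(2 * math.pi * 2 * t / N) + 0.3 * math.sin(2 * math.pi * 5 * t / N)
          for t in range(N)]

spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum[: N // 2]]
print([round(m, 2) for m in magnitudes])  # peaks at bins 2 and 5, near zero elsewhere

recovered = idft(spectrum)
error = max(abs(r.real - s) for r, s in zip(recovered, signal))
print(f"round-trip error: {error:.1e}")   # tiny: only floating-point rounding
```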

Harmonics
If a sound signal repeats exactly after a time interval $$t$$, it is called periodic, and the time interval $$t$$ is called the period. An important characteristic of a waveform that is periodic in the time domain is that its frequency-domain representation is discrete—it is concentrated at certain specific frequencies: the fundamental frequency $$f = {1 \over t}$$ (which specifies the number of repeats of the complete waveform you get per unit time), and integer multiples of this: $$2f$$, $$3f$$ etc. These frequencies are called harmonics, and while the fundamental (also known as the first harmonic) tells us the pitch of the note, the higher ones add “colour” or timbre to the sound, which is how we can tell a note played on a violin from the same note played on a clarinet.
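A small sketch of the relationship between period and harmonics: from a period $$t$$ it lists the harmonic frequencies $$f, 2f, 3f, \ldots$$, then builds a wave by adding sine waves at those frequencies and checks that the result repeats after exactly one period. The harmonic weights are hypothetical; changing them changes the timbre but not the pitch. The helper names are illustrative, not from any library.

```python
import math

def harmonic_frequencies(period_s, count):
    """The fundamental f = 1/t and its integer multiples 2f, 3f, ... (in Hz)."""
    f = 1.0 / period_s
    return [n * f for n in range(1, count + 1)]

def periodic_wave(t, period_s, amplitudes):
    """Sum of sine-wave harmonics; amplitudes[0] scales the fundamental, amplitudes[1] the 2nd harmonic, ..."""
    freqs = harmonic_frequencies(period_s, len(amplitudes))
    return sum(a * math.sin(2 * math.pi * fr * t) for a, fr in zip(amplitudes, freqs))

period = 0.01                       # a 10 ms period gives a 100 Hz fundamental
weights = [1.0, 0.5, 0.25, 0.125]   # hypothetical harmonic strengths ("timbre")

print(harmonic_frequencies(period, len(weights)))  # [100.0, 200.0, 300.0, 400.0]

# The combined wave takes the same value one full period later.
for t in (0.0013, 0.0042, 0.0077):
    print(periodic_wave(t, period, weights), periodic_wave(t + period, period, weights))
```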

If the time duration $$t$$ of the waveform period is measured in seconds, then frequency $$f$$ is in units of 1/seconds, measuring “cycles per second” or hertz (abbreviated “Hz”).

The diagram shows, in the chart on top, 3 cycles of a sawtooth wave with a frequency of 1 Hz (too low to hear, but it makes the numbers easier to understand), and in the chart below, the amplitudes of the first few frequency components from the Fourier transform. In this case the strongest component is the fundamental $$f = 1\,\text{Hz}$$ (note also the component at 0 Hz, which corresponds to the average value, or constant offset, of the waveform), with higher harmonics getting progressively weaker but never quite disappearing. Other periodic waveforms can have quite different harmonic distributions.
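The harmonic amplitudes of a sawtooth can be checked numerically. This sketch assumes a sawtooth that ramps from -1 to +1 once per second (the diagram's wave may be scaled or offset differently) and samples one period at an assumed resolution of 1024 points. For an ideal sawtooth of this shape, harmonic $$k$$ has amplitude $$2 / (\pi k)$$, so each harmonic is weaker than the last but never reaches zero, and the 0 Hz term is essentially zero because the wave averages out. The function `harmonic_amplitude` is a hypothetical helper that evaluates a single DFT coefficient.

```python
import cmath
import math

N = 1024  # samples in one 1-second period (assumed resolution)
# One period of a sawtooth that ramps from -1 up to (almost) +1, then jumps back.
saw = [2 * n / N - 1 for n in range(N)]

def harmonic_amplitude(x, k):
    """Amplitude of harmonic k in one period of samples x; k = 0 gives the 0 Hz (average) term."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)) / n
    return abs(coeff) if k == 0 else 2 * abs(coeff)

for k in range(6):
    # For an ideal sawtooth of this shape, harmonic k has amplitude 2 / (pi * k).
    expected = 2 / (math.pi * k) if k else 0.0
    print(f"{k} Hz: {harmonic_amplitude(saw, k):.4f} (ideal {expected:.4f})")
```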

Frequency Versus Pitch
The range of frequencies that the human ear can hear is generally considered to be around 20 Hz to 20,000 Hz (also written as 20 kHz). In music we have the concept of the pitch of a note, which increases with frequency, but not proportionally. Instead, pitch increases linearly with the logarithm of the frequency: if one note is exactly an octave higher than another in pitch, then the frequency of the former is exactly twice that of the latter. Thus, the range of human hearing covers $$\log_2{20000 \over 20} = \log_2 1000 \approx 9.97$$ octaves, slightly less than 10.
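Because each octave doubles the frequency, the number of octaves between two frequencies is the base-2 logarithm of their ratio. A one-line helper (the name is illustrative) reproduces the figure for the range of hearing:

```python
import math

def octaves_between(f_low, f_high):
    """Number of octaves between two frequencies: each octave doubles the frequency."""
    return math.log2(f_high / f_low)

print(round(octaves_between(20, 20000), 2))  # 9.97: the range of human hearing
print(octaves_between(220, 880))             # 2.0: two octaves
```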

Consider the most common tuning used in current Western music, which is called concert pitch. In this tuning, the A above middle C (and closest to it) is assigned the frequency 440 Hz. So the A one octave down is 220 Hz, the A one octave up is 880 Hz, and so on. The frequencies of the other notes depend on what temperament you use: if you use equal temperament (a common case), then each semitone interval corresponds to a frequency ratio of $$2^{1 \over 12} \approx 1.0595$$. Thus, for example, the pitch of A♭ (or G♯) one semitone down from reference A is $${440 \div {2^{1 \over 12}}} \approx 415.30$$ Hz, while the pitch of A♯ (or B♭) one semitone up is $$440 \times 2^{1 \over 12} \approx 466.16$$ Hz. And the pitch of middle C is 9 semitones down from the reference A, so its frequency is $${440 \over {2^{9 \over 12}}} \approx 261.63$$ Hz.
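In equal temperament, the calculations above all follow one formula: a note $$n$$ semitones above the reference A (negative $$n$$ for notes below it) has frequency $$440 \times 2^{n / 12}$$ Hz. A minimal sketch, with an illustrative function name:

```python
A4 = 440.0  # concert pitch: the A closest to middle C

def equal_tempered(semitones_from_a):
    """Equal-temperament frequency of the note this many semitones above (or, if negative, below) reference A."""
    return A4 * 2 ** (semitones_from_a / 12)

for name, offset in [("A flat / G sharp", -1), ("A sharp / B flat", 1),
                     ("middle C", -9), ("A, one octave up", 12)]:
    print(f"{name}: {equal_tempered(offset):.2f} Hz")
```

This prints 415.30, 466.16, 261.63 and 880.00 Hz respectively, matching the figures worked out by hand.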

Acoustics
Acoustics is the science of sound. Among other things, it studies the spaces (whether indoor or outdoor) in which we perform and listen to sound. Why do certain concert halls sound better than others? How can we even tell that a sound was recorded in a large hall as opposed to a small room?

For example, consider this simple organ riff generated with ZynAddSubFX:

That doesn’t sound very exciting.

Now let us apply an “echo” effect to the sound:

That adds a bit more interest (reminiscent of sound bouncing back to us from the other end of a long room or a large canyon), but it still sounds a bit artificial.

Now compare this version:

The effect is very subtle, but does the sound seem more “alive” to you? Like it was performed in an actual room? That is because of an effect known as reverberation (usually shortened to reverb). It is a mixture of a whole lot of little overlapping echoes from different parts of the room, some closer to the sound source and some further away, some reflecting mainly the lower frequencies and others the higher ones. This spreads out the sound over time in a way that adds interest and realism, because it represents the way we normally hear sounds in the real world.
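The difference between an echo and reverb can be sketched on a list of samples. An echo adds one delayed, quieter copy of the signal; a very crude stand-in for reverb adds many copies at scattered delays, with later ones generally weaker. This is a toy illustration under those assumptions only: it ignores the frequency-dependent reflections mentioned above, and production reverbs typically use networks of delay lines and filters, or convolution with a recorded room response. The function names and parameter values (`n_taps`, `max_delay_s`, the gain factor 0.3) are hypothetical.

```python
import random

def echo(dry, delay, gain):
    """Single echo: the dry signal plus one copy delayed by `delay` samples and scaled by `gain`."""
    out = list(dry) + [0.0] * delay
    for i, s in enumerate(dry):
        out[i + delay] += gain * s
    return out

def crude_reverb(dry, sample_rate, n_taps=200, max_delay_s=0.5, seed=1):
    """Toy reverb: many overlapping echoes at random delays, weaker the later they arrive."""
    rng = random.Random(seed)                 # fixed seed so the result is repeatable
    max_delay = int(max_delay_s * sample_rate)
    out = list(dry) + [0.0] * max_delay
    for _ in range(n_taps):
        d = rng.randint(1, max_delay)                     # delay of this reflection, in samples
        g = 0.3 * (1 - d / max_delay) * rng.random()      # later reflections tend to be quieter
        for i, s in enumerate(dry):
            out[i + d] += g * s
    return out

impulse = [1.0] + [0.0] * 99          # a single click
wet = echo(impulse, delay=50, gain=0.5)
print(wet[0], wet[50])                # 1.0 0.5
tail = crude_reverb(impulse, sample_rate=1000)
print(len(tail))                      # 600: the click followed by a smeared-out tail
```

Feeding in a single click makes the difference visible: the echo produces exactly one repeat, while the reverb sketch spreads the click into a dense, decaying tail.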

Here is a reverb effect applied that corresponds to a larger room:

And finally here is what it would sound like in a very large room, namely a cathedral:

A bit of reverb is usually essential in any musical recording, just so long as you don’t overdo it.

Analog Versus Digital
An analog signal is a physical quantity whose variations mirror those of another quantity. For example, a microphone can pick up the sound pressure variation and convert it to an electrical voltage variation. In the old days of analog audio processing, this voltage might (after suitable processing, mixing, etc.) be converted to a corresponding varying magnetic field recorded on a tape, or a wobbling groove on a vinyl record. Then during the playback process, the magnetic field or groove wobble would be converted back to a varying voltage, which ultimately ends up controlling a power amplifier to drive speakers to recreate (something close to) the original sound pressure variation.

In other words, in analog processing, a varying physical quantity is converted to another varying physical quantity. Notionally, these quantities are continuous, and can take on any value within a particular range. But in practice, there are all kinds of inaccuracies in analog equipment, which lead to a loss of fidelity at every stage of the processing.

In digital processing, the physical quantity is measured at regular time intervals and converted to a stream of numbers, using an analog-to-digital converter; the number of measurements taken per second is called the sampling rate. Each number can only be measured to a certain accuracy. But once it is measured, it can be recorded and copied exactly. Then the playback equipment just has to put the numbers through a digital-to-analog converter to turn them back into a physical quantity, producing sound that we can actually hear. So the analog losses are confined to the two ends of the chain, and eliminated from the stages in between.
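The conversion step can be simulated. In this sketch a Python function of time stands in for the continuous analog quantity; it is measured at a fixed rate and each measurement is rounded to a whole number of steps, as a 16-bit converter would. The 44,100 Hz rate and the 440 Hz test tone are example values, and `adc` is a hypothetical name, not a real library call. The rounding is the only loss, and it never exceeds half a step; once the integers exist, they can be stored and copied without any further change.

```python
import math

def adc(signal, duration_s, sample_rate, bits):
    """Simulated analog-to-digital conversion.

    `signal` stands in for the analog quantity: a function of time returning values
    between -1 and +1. It is measured every 1/sample_rate seconds and each
    measurement is rounded to the nearest integer step.
    """
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16-bit samples
    n_samples = int(duration_s * sample_rate)
    return [round(signal(n / sample_rate) * full_scale) for n in range(n_samples)]

def tone(t):
    """A 440 Hz sine wave at half of full scale (the 'analog' input)."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

SAMPLE_RATE = 44_100  # measurements per second (the CD rate, used as an example)
BITS = 16             # sample size

samples = adc(tone, 0.01, SAMPLE_RATE, BITS)
full_scale = 2 ** (BITS - 1) - 1

# Rounding to whole steps limits accuracy: the error is at most half a step.
worst = max(abs(s / full_scale - tone(n / SAMPLE_RATE)) for n, s in enumerate(samples))
print(len(samples), "samples; worst error:", worst)
```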

Computer processing of such numbers does lead to rounding errors, but in practice this causes less loss than the corresponding analog processing, and it makes possible new kinds of manipulations that would be simply infeasible with analog equipment.

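Sampling converts the (notionally) continuous analog quantity to a discrete digital representation. This representation has a finite resolution in time, limited by the sampling rate, and each sample has a finite accuracy of measurement, determined by the sample size, which is measured in bits. But with present-day technology, both can be made high enough to faithfully represent any sound that the human ear can hear. The sampling rate must be more than twice the highest frequency to be captured, so the 44.1 kHz rate used on CDs covers the full 20 kHz range of hearing; and each bit of sample size adds about 6 dB of dynamic range, so 16-bit samples span about 96 dB.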
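The sketch below puts numbers on the two limits of sampling. The minimum sampling rate follows the Nyquist criterion (more than twice the highest frequency), and the dynamic range of an ideal converter is the ratio of full scale to one step, $$20 \log_{10} 2^{b}$$ dB for $$b$$ bits. It also shows what goes wrong when a frequency is too high for the sampling rate: at 44.1 kHz, a 30 kHz tone yields exactly the same samples as a 14.1 kHz tone, an effect called aliasing. The function names are illustrative.

```python
import math

def nyquist_rate(highest_freq_hz):
    """The sampling rate must exceed this value (twice the highest frequency) to capture the signal."""
    return 2 * highest_freq_hz

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in decibels (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

print(nyquist_rate(20_000))            # 40000: CD audio uses 44100
print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5

# Aliasing: at 44.1 kHz, a 30 kHz tone (above the 22.05 kHz limit) gives the
# same samples as a 14.1 kHz tone, because 30000 = 44100 - 14100.
fs = 44_100
hi = [math.cos(2 * math.pi * 30_000 * n / fs) for n in range(100)]
alias = [math.cos(2 * math.pi * 14_100 * n / fs) for n in range(100)]
print(max(abs(a - b) for a, b in zip(hi, alias)) < 1e-9)  # True
```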

Analog “Character”
A lot of musicians and audio engineers have a fondness for the imperfections of analog processing. They talk about the “warmth” or “character” that analog distortion adds to audio, and describe digital audio as often being “cold” and “clinical”.

There is nothing wrong with this point of view. What matters is that digital audio offers you a choice: you can still bring analog distortion into the processing chain, but only as and where you want it. You are not forced into having analog “character” imprinted onto your creation at every single stage of the processing.