1. Physics
Sound is a pressure wave: pressure fluctuations travel through space, but each air particle moves only a small distance.
A microphone captures sound by measuring air pressure, so the samples in waveform files (.wav/.mp3…) take both positive and negative values, corresponding to pressure above and below the ambient level.
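To make the signed samples concrete, here is a minimal sketch that writes a synthetic tone into an in-memory WAV file and reads it back. The 440 Hz tone, the 8 kHz sample rate, and the helper name `sine_pcm16` are assumptions chosen for the demo, not values from these notes.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # Hz, assumed for the demo
FREQ = 440.0        # Hz, assumed test tone
N = 800             # number of samples (0.1 s)

def sine_pcm16(freq, sample_rate, n, amplitude=0.5):
    """Return n samples of a sine wave as signed 16-bit integers."""
    peak = int(amplitude * 32767)
    return [int(round(peak * math.sin(2 * math.pi * freq * i / sample_rate)))
            for i in range(n)]

samples = sine_pcm16(FREQ, SAMPLE_RATE, N)

# Write the samples as mono 16-bit PCM into an in-memory WAV container.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = signed 16-bit
    w.setframerate(SAMPLE_RATE)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read the samples back and decode them as little-endian signed shorts.
buf.seek(0)
with wave.open(buf, "rb") as r:
    raw = r.readframes(r.getnframes())
decoded = list(struct.unpack("<%dh" % (len(raw) // 2), raw))

print(min(decoded), max(decoded))  # one negative and one positive extreme
```

The decoded values swing around zero, which plays the role of the ambient pressure level.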
2. Spectrums
The spectrum of a sound plays a central role in determining its quality.
We can represent the sound as a frequency plot:
- the quality of a vowel depends on the shape of its spectrum
- the peaks are called formants
- the quality depends primarily on the first three formants
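The idea of reading peaks off a frequency plot can be sketched as follows. This is a simplified illustration: the signal is a sum of three pure sinusoids at assumed frequencies (700, 1200, 2600 Hz), whereas real formants are broad resonance peaks in the spectral envelope of speech. The function names `magnitude_spectrum` and `top_peaks` are hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for the demo
N = 400             # samples -> 20 Hz frequency resolution

def magnitude_spectrum(x, sample_rate):
    """Naive DFT: return (frequencies, magnitudes) for bins 0..N/2."""
    n = len(x)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * sample_rate / n)
        mags.append(abs(s) / n)
    return freqs, mags

def top_peaks(freqs, mags, count):
    """Return the frequencies of the `count` largest local maxima, ascending."""
    peaks = [(mags[i], freqs[i]) for i in range(1, len(mags) - 1)
             if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]]
    peaks.sort(reverse=True)
    return sorted(f for _, f in peaks[:count])

# Assumed (frequency in Hz, amplitude) pairs standing in for three formants.
components = [(700, 1.0), (1200, 0.6), (2600, 0.3)]
signal = [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, a in components)
          for t in range(N)]

freqs, mags = magnitude_spectrum(signal, SAMPLE_RATE)
peaks = top_peaks(freqs, mags, 3)
print(peaks)
```

For recorded speech, a windowed FFT followed by envelope smoothing (for example via linear prediction) would be a more realistic way to locate formants.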
3. Spectrogram
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams.
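In other words, a spectrogram is a time-frequency plot: time on one axis, frequency on the other, and intensity shown by color or brightness.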
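A spectrogram can be sketched by cutting the signal into short frames and computing one spectrum per frame. The example below is a bare-bones version under stated assumptions: non-overlapping frames, no window function, a naive DFT, and a synthetic signal that switches from 500 Hz to 1500 Hz halfway through. The names `dft_magnitudes` and `spectrogram` are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for the demo
FRAME_LEN = 160     # samples per frame -> 50 Hz frequency resolution

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (bins 0..N/2) via a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len):
    """Return a list of per-frame magnitude spectra (rows = time, columns = frequency)."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [dft_magnitudes(signal[i:i + frame_len]) for i in starts]

# 0.1 s of a 500 Hz tone followed by 0.1 s of a 1500 Hz tone.
signal = ([math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(800)] +
          [math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE) for t in range(800, 1600)])

spec = spectrogram(signal, FRAME_LEN)

# Strongest frequency in each frame, read off the time-frequency grid.
dominant = [max(range(len(row)), key=row.__getitem__) * SAMPLE_RATE / FRAME_LEN
            for row in spec]
print(dominant)
```

Practical spectrograms usually use overlapping, windowed frames and an FFT; those refinements change the resolution and smoothness, not the basic time-frequency layout.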
4. Auditory
4.1. Loudness
Loudness depends on the amplitude of the sound wave $\to$ amplitude is usually measured as the root mean square (RMS) over some time window.
Perceived loudness is more closely related to intensity, which is proportional to the square of the amplitude. Relative intensity in dB = $20\log_{10}(\frac{x}{r})$, where $x$ is the amplitude and $r$ is a reference amplitude.
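As a small worked example, the sketch below computes an RMS amplitude and a level in dB. The function names are hypothetical. It also checks that $20\log_{10}$ of an amplitude ratio equals $10\log_{10}$ of the corresponding intensity ratio, since intensity is proportional to amplitude squared.

```python
import math

def rms(samples):
    """Root-mean-square amplitude over the given window of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def amplitude_ratio_db(x, ref):
    """Level of amplitude x relative to reference amplitude ref, in dB."""
    return 20 * math.log10(x / ref)

def intensity_ratio_db(x, ref):
    """Same level computed from intensities, taken as amplitude squared."""
    return 10 * math.log10((x * x) / (ref * ref))

# One full cycle of a unit-amplitude sine: RMS is 1/sqrt(2).
cycle = [math.sin(2 * math.pi * t / 100) for t in range(100)]
print(rms(cycle))

# A tenfold increase in amplitude corresponds to +20 dB.
print(amplitude_ratio_db(10.0, 1.0))
```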
Loudness of a pure tone (>40 dB) in sones, where $dB$ is the level of the tone in decibels:
$$
N=2^{\frac{(dB-40)}{10}}
$$
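The formula says that every 10 dB increase doubles the loudness in sones. A minimal sketch, with a hypothetical function name and the assumption that levels below 40 dB are simply rejected because the formula is not meant for them:

```python
def sones_from_db(level_db):
    """Loudness in sones of a pure tone at level_db, using N = 2^((dB - 40) / 10)."""
    if level_db < 40:
        raise ValueError("formula is only intended for levels of 40 dB and above")
    return 2 ** ((level_db - 40) / 10)

for level in (40, 50, 60, 70):
    print(level, sones_from_db(level))
```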
5. Sampling
What sampling rate should we use? By the Nyquist theorem, the sampling rate must be at least twice the highest frequency we want to capture.
Since the highest frequency the human ear can perceive is about $20kHz$, we must sample at no less than $2\times20kHz=40kHz$ to cover the full audible range.
However, almost all of the information relevant to speech sounds lies below $10kHz$, so a $20kHz$ sampling rate is enough for speech.
In practice, we use a sampling rate of $44.1kHz$.
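The arithmetic above, and what goes wrong when the rate is too low, can be sketched as follows. The function names are hypothetical. `alias_frequency` uses the standard folding rule for a pure tone: a frequency above half the sampling rate shows up at its distance from the nearest multiple of the sampling rate.

```python
def min_sampling_rate(max_freq_hz):
    """Smallest sampling rate (Hz) satisfying the Nyquist criterion for max_freq_hz."""
    return 2 * max_freq_hz

def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone at freq_hz after sampling at sample_rate_hz."""
    nearest_multiple = sample_rate_hz * round(freq_hz / sample_rate_hz)
    return abs(freq_hz - nearest_multiple)

print(min_sampling_rate(20_000))        # full audible range
print(min_sampling_rate(10_000))        # speech-relevant range
print(alias_frequency(1_000, 44_100))   # below Nyquist: unchanged
print(alias_frequency(30_000, 44_100))  # above Nyquist: folds down
```

A tone below half the sampling rate is reported at its true frequency, while the 30 kHz tone is indistinguishable from a lower-frequency tone once sampled at 44.1 kHz.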