Audio Processing in Python Part II: Exploring Windowing, Sound Pressure Levels, and A-Weighting Using an iPhone X

September 17, 2018 Joshua Hrisko

In the previous tutorial, I discussed sampling and the Nyquist frequency. From here on, I will assume that our sample rate is high enough that every frequency of interest lies below the Nyquist frequency (half the sample rate). With that in mind, I want to discuss another limitation of recording data that is related to sampling but is not part of the Nyquist theorem. When sampling data, we often want to record over short periods so that our recording device, computer memory, and processing power don’t get overwhelmed. This short-time sampling is called windowing. Windowing is done for a multitude of reasons, most of them having to do with saving resources and reducing measurement error. I will not cover windowing in great detail here, but I will discuss its consequences for signal processing. In our case, we will only be using a simple rectangular window, which means that we will take chunks of data and calculate the frequency spectrum directly from the raw samples. Other types of windows (Hanning, Hamming, Welch, Blackman, etc.) taper the edges of each chunk to reduce spectral leakage, the main artifact of cutting data into chunks, so they can be incredibly powerful.

The most significant artifact of windowing is its limitation on resolving natural frequencies. When windowing a signal, the length of the window determines the minimum frequency resolvable in the system. For example, if we are sampling a 100 Hz sine wave, the minimum window length needed to resolve it is 0.01 s (1/100 s), which is the period of the wave. This can be difficult, especially when analyzing lower frequencies, because long recordings are sometimes not possible. In audio, we are typically interested in the 20 Hz - 20 kHz range, so the minimum window period is 50 ms (one period of a 20 Hz wave).
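As a quick check on these numbers, the sketch below computes the shortest window that holds a given number of cycles (T = cycles / f) and the FFT bin spacing that such a window produces (Δf = 1 / T). The helper names `min_window_s` and `bin_spacing_hz` are hypothetical; they only encode those two relations.

```python
def min_window_s(f_low_hz: float, cycles: float = 1.0) -> float:
    """Shortest window (seconds) that contains `cycles` periods of f_low_hz."""
    return cycles / f_low_hz


def bin_spacing_hz(window_s: float) -> float:
    """Spacing between FFT bins (Hz) for a window of window_s seconds."""
    return 1.0 / window_s


for f in (100.0, 20.0):
    t = min_window_s(f)
    print(f"{f:g} Hz: window >= {t * 1e3:.1f} ms, bin spacing {bin_spacing_hz(t):.1f} Hz")
```

With a one-period window the bin spacing equals the tone's own frequency, so a 100 Hz tone sits in the first bin above DC with nothing finer around it.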


Figure 1: Window of a 100 Hz sine wave (on the left: time series, on the right: frequency spectrum). The Fourier spectrum shows that a window only one period of the lowest natural frequency long is too short to resolve that frequency cleanly.

Figure 2: Window of a 100 Hz sine wave for five cycles (on the left: time series, on the right: frequency spectrum). In this figure, the Fourier spectrum doesn’t have trouble resolving the 100 Hz peak - five cycles is often enough to resolve the signal (assuming high signal amplitude compared to noise). In high-noise cases, longer periods may be necessary.
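The contrast between Figures 1 and 2 can be reproduced with a synthetic signal. This sketch assumes a 44.1 kHz sample rate and an ideal, noise-free 100 Hz sine, applies a rectangular window of one and of five cycles, and reports the bin spacing and peak frequency from NumPy's real FFT. `rect_window_peak` is a hypothetical helper name.

```python
import numpy as np

FS = 44100.0   # assumed sample rate (Hz)
F0 = 100.0     # test tone (Hz)


def rect_window_peak(cycles: int, fs: float = FS, f0: float = F0):
    """Return (bin spacing, peak frequency) for a rectangular window of `cycles` periods."""
    n = int(round(cycles * fs / f0))            # samples in the window
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t)              # no taper: rectangular window
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    peak = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
    return freqs[1] - freqs[0], peak


for cycles in (1, 5):
    spacing, peak = rect_window_peak(cycles)
    print(f"{cycles} cycle(s): bin spacing {spacing:.1f} Hz, peak at {peak:.1f} Hz")
```

In this ideal case both windows place the peak at 100 Hz. The difference is resolution: one cycle gives 100 Hz bins, so the only neighbors of the peak are DC and 200 Hz, while five cycles give 20 Hz bins on either side of it.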

Windowing technique is important when sampling data because of its relation to the natural frequencies of the system and to noise reduction. During our analysis, I will ensure that our window period is at least 5x the period of the lowest predicted frequency. In acoustics, the lowest frequency can be assumed to be 20 Hz (sometimes 10 Hz), so we can assume a minimum window period of 250 ms (5 × 50 ms). For the human voice, much shorter windows (25-50 ms) are common: the lowest fundamental frequency of a human voice is roughly 80-180 Hz, and vocal analysis typically assumes an average low frequency of about 200 Hz to emulate the time and frequency response of the human ear. In our case, I’ll be using 8192 points at 44.1 kHz, which gives a window period of about 186 ms. That is long enough to capture five cycles of frequencies down to about 27 Hz, covering the low audible range, while also respecting the computational resources of the Raspberry Pi.
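The window figures for this setup follow directly from the number of points N and the sample rate. A short sketch using the 8192-point, 44.1 kHz values from the text and the five-cycle rule of thumb:

```python
FS = 44100      # sample rate (Hz)
N = 8192        # samples per window
CYCLES = 5      # rule of thumb: at least five periods of the lowest frequency

window_s = N / FS                       # window duration (s), about 0.186 s
lowest_f = CYCLES / window_s            # lowest frequency that fits CYCLES periods
required_for_20hz_s = CYCLES / 20.0     # window needed for five cycles of 20 Hz

print(f"window = {window_s * 1e3:.1f} ms, lowest frequency = {lowest_f:.1f} Hz")
print(f"five cycles of 20 Hz need {required_for_20hz_s * 1e3:.0f} ms")
```

The 8192-point window falls a little short of the 250 ms needed for 20 Hz, which is why the usable low end sits near 27 Hz.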


Signal Gen App on the iTunes App Store

Below is the routine for recording audio (taken from the Recording Audio on the Raspberry Pi tutorial) and taking the FFT of the signal (taken from the Audio Processing in Python Part I). I used an iPhone to generate known sine wave frequencies so that I could check the accuracy of the FFT algorithm. The app I used is the ‘Signal Gen’ app on the iTunes App Store. The app allows you to choose the amplitude and frequency of the generated sine wave. It is a powerful tool for testing and validating the FFT windowing accuracy.


The routine records about 186 ms of data (8192 samples at 44.1 kHz) via pyaudio and analyzes the signal using the Fast Fourier Transform. There is also a microphone sensitivity correction to go from volts to pascals using the manufacturer specification of my microphone. In my case, the sensitivity was -47 dBV/Pa. If the notation dB is unclear, it stands for decibel, a common logarithmic unit for acoustic and electric signals. The decibel, in this case, can be defined as:

$$\mathrm{dB} = 20\log_{10}\left(\frac{y}{y_{\mathrm{ref}}}\right), \qquad y_{\mathrm{ref}} = 1\ \mathrm{V}\ \text{(for dBV)}$$

where y is a voltage. Using the microphone sensitivity, we can convert our voltage readings to the more meaningful pascal, the unit of pressure used in acoustic measurements. A sensitivity given in dBV/Pa converts to volts per pascal as follows:

$$S_{\mathrm{V/Pa}} = 10^{S_{\mathrm{dBV/Pa}}/20}$$

In the case of my microphone, the sensitivity is -47 dBV/Pa, so using the equation above, my sensitivity becomes:

$$S_{\mathrm{V/Pa}} = 10^{-47/20} \approx 4.47 \times 10^{-3}\ \mathrm{V/Pa}$$

To get the measurement in pascals, we take the measured voltage (after converting the 16-bit reading to volts) and divide it by this sensitivity, which is the same as multiplying by its reciprocal, 10^(47/20) ≈ 223.9 Pa/V. Since we are recording from a USB port, I’ll be using 5.25 volts as the peak voltage reading, which gets us closer to a realistic response of the microphone. The multiplication is easiest to see in decibels: subtracting a sensitivity of -47 dBV/Pa from a level in dBV adds 47 dB, and a 47 dB gain is a factor of 10^(47/20).
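The conversion chain can be sketched in a few lines of NumPy. This assumes signed 16-bit samples, the 5.25 V full-scale value used above, and the -47 dBV/Pa sensitivity. `counts_to_pascals` is a hypothetical helper; it divides volts by the sensitivity in V/Pa, which leaves pascals.

```python
import numpy as np

FULL_SCALE_V = 5.25        # assumed peak voltage of a full-scale 16-bit reading (USB supply)
SENS_DBV_PER_PA = -47.0    # microphone sensitivity from the datasheet (dBV/Pa)


def counts_to_pascals(counts, full_scale_v=FULL_SCALE_V, sens_dbv=SENS_DBV_PER_PA):
    """Convert signed 16-bit ADC counts to sound pressure in pascals."""
    volts = np.asarray(counts, dtype=np.float64) / 2**15 * full_scale_v  # counts -> V
    sens_v_per_pa = 10.0 ** (sens_dbv / 20.0)                             # dBV/Pa -> V/Pa
    return volts / sens_v_per_pa                                          # V / (V/Pa) = Pa


# Example: int16 samples as they might come out of np.frombuffer(raw_bytes, dtype=np.int16)
samples = np.array([0, 64, -64], dtype=np.int16)
print(counts_to_pascals(samples))
```

Keeping the full-scale voltage and sensitivity as parameters makes it easy to swap in a different microphone or supply voltage.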

Below is the output from the code above using a 1 kHz test sine wave generated from a smartphone.

Figure 3: Spectral plot of a 1 kHz sine wave signal. The maximum frequency error is 2.7 Hz (half of the 5.4 Hz bin spacing), meaning that at 1 kHz we expect an error of at most 0.27%. In this specific case, we measured 1001.3 Hz, or 0.13% error. The amplitude was measured to be just under 3 mPa. An iPhone was used to generate the 1 kHz sine wave.
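The 1001.3 Hz reading is where the FFT bin grid falls rather than a flaw in the phone's tone. This sketch assumes an ideal 1 kHz sine in place of the recorded signal, takes a single-sided amplitude spectrum of one 8192-sample rectangular window at 44.1 kHz, and reports the peak bin.

```python
import numpy as np

FS = 44100     # sample rate (Hz)
N = 8192       # window length (samples)
F0 = 1000.0    # synthetic test tone (Hz), a stand-in for the iPhone signal

t = np.arange(N) / FS
x = np.sin(2 * np.pi * F0 * t)            # rectangular window: no taper applied

# Single-sided amplitude spectrum: scale by 1/N, then double every bin except
# DC and Nyquist (N is even, so the last rfft bin is the Nyquist bin).
amp = np.abs(np.fft.rfft(x)) / N
amp[1:-1] *= 2
freqs = np.fft.rfftfreq(N, d=1.0 / FS)

peak_hz = freqs[np.argmax(amp)]
print(f"peak at {peak_hz:.1f} Hz")
```

The nearest bin to 1000 Hz is bin 186 (186 × 44100 / 8192 ≈ 1001.3 Hz), so even a perfect tone reads slightly high with this window.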

Δf_max is the maximum error of a frequency estimate: a spectral peak can be off by at most half of one FFT bin. In our case, we measured 1001.3 Hz for a known input frequency of 1 kHz, an error of 0.13%. If we did not know the input frequency, we could still say that the measurement should be within 0.27% ((1002.7 - 1000.0)/1000.0 = 0.0027). Δf_max is calculated by dividing the sample rate by twice the window size (in samples).

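$$\Delta f_{\max} = \frac{f_s}{2N} = \frac{44100\ \mathrm{Hz}}{2 \times 8192} \approx 2.7\ \mathrm{Hz}$$

where f_s is the sample rate and N is the window size in samples.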
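The same numbers as a short worked calculation, using the 44.1 kHz, 8192-point setup and the 1001.3 Hz reading from Figure 3:

```python
FS = 44100      # sample rate (Hz)
N = 8192        # window length (samples)

df_max = FS / (2 * N)                     # half the bin spacing (Hz)
print(f"max frequency error: {df_max:.2f} Hz")

# Relative error bound at 1 kHz, and the actual error of the 1001.3 Hz reading
f_true, f_meas = 1000.0, 1001.3
print(f"bound at 1 kHz: {100 * df_max / f_true:.2f} %")
print(f"measured error: {100 * (f_meas - f_true) / f_true:.2f} %")
```

Because the bound scales as 1/f, the same 2.7 Hz error is 0.27% at 1 kHz but over 10% at 20 Hz.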

I also wanted to validate the microphone's pressure readings. I did this by playing white noise from a generator on my smartphone at its peak volume. One website used a sound level meter to test the maximum volume of a series of iPhones, so I used my microphone to record white noise at my smartphone’s loudest volume to see whether I would approach the same sound pressure level recorded on that site (see the study here). I used a 1 second recording period to emulate a slow-response sound level meter (SLM). I also had to apply A-weighting to the frequency spectrum to emulate the human ear’s response. This was done using the A-weighting function (IEC 61672-1):

iPhone X image courtesy of William Hook (Flickr)

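$$R_A(f) = \frac{12194^2 f^4}{\left(f^2 + 20.6^2\right)\sqrt{\left(f^2 + 107.7^2\right)\left(f^2 + 737.9^2\right)}\left(f^2 + 12194^2\right)}$$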

The A-weighting function in the decibel domain takes on the following form:

$$A(f) = 20\log_{10}\left(R_A(f)\right) + 2.00\ \mathrm{dB}$$
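A direct NumPy translation of the analytic IEC 61672-1 A-weighting curve, offered as a sketch rather than a certified implementation. `a_weighting_db` is a hypothetical helper name, and frequencies must be positive, since the weighting goes to minus infinity at DC.

```python
import numpy as np


def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (> 0), analytic IEC 61672-1 form."""
    f2 = np.asarray(f_hz, dtype=np.float64) ** 2
    r_a = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * np.log10(r_a) + 2.00   # +2.00 dB makes A(1 kHz) come out near 0 dB


for f in (20, 100, 1000, 10000):
    print(f"{f:>6} Hz: {float(a_weighting_db(f)):6.1f} dB")
```

Comparing against familiar tabulated values (about -19.1 dB at 100 Hz and 0 dB at 1 kHz) is a quick sanity check of the constants.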

When plotted, the significance of the A-weighting becomes more obvious:

Figure 4: A-weighting function in decibels, used to weight sound measurements to approximate the frequency response of the human auditory system.

When using the weighting function, we add the A-weighting values in decibels at each frequency to the measured levels in decibels, convert the weighted levels back to pascals, sum the contributions across all frequency bins, and then convert that single total back to decibels to get the A-weighted sound pressure level, expressed in dBA.
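The steps above can be sketched as follows. Two choices here are my assumptions rather than details from the text: the per-bin values are treated as RMS pressures, and the bins are combined as an energy (mean-square) sum, the usual way to add incoherent frequency components. `a_weighted_spl` is a hypothetical name, and the analytic IEC 61672-1 A-weighting formula is repeated inside so the block runs on its own.

```python
import numpy as np

P_REF = 20e-6   # reference pressure (Pa)


def a_weighting_db(f_hz):
    """A-weighting gain in dB (analytic IEC 61672-1 form); f_hz must be > 0."""
    f2 = np.asarray(f_hz, dtype=np.float64) ** 2
    r_a = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * np.log10(r_a) + 2.00


def a_weighted_spl(freqs_hz, p_rms_pa):
    """Overall A-weighted level (dBA) from per-bin RMS pressures in pascals."""
    freqs_hz = np.asarray(freqs_hz, dtype=np.float64)
    p_rms_pa = np.asarray(p_rms_pa, dtype=np.float64)
    keep = (freqs_hz > 0) & (p_rms_pa > 0)   # skip DC and empty bins
    level_db = 20 * np.log10(p_rms_pa[keep] / P_REF)           # per-bin dB SPL
    weighted_db = level_db + a_weighting_db(freqs_hz[keep])    # add weights in dB
    p_weighted = P_REF * 10 ** (weighted_db / 20)              # back to pascals
    p_total = np.sqrt(np.sum(p_weighted**2))                   # energy sum over bins
    return 20 * np.log10(p_total / P_REF)                      # back to decibels


print(f"{a_weighted_spl([1000.0], [1.0]):.2f} dBA")   # a single 1 Pa tone at 1 kHz
```

Summing squared pressures means two equal bins add about 3 dB rather than 6 dB; summing raw pressure magnitudes instead would overstate the level of broadband signals such as white noise.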


One widely used term in acoustics, specifically with regard to pressure measurement, is the decibel. Acoustically speaking, the sound pressure level in decibels is a logarithmic measure of pressure relative to a reference pressure, usually 20 micropascals (µPa), roughly the threshold of human hearing. This can be written as follows:

$$L_p = 20\log_{10}\left(\frac{P_1}{P_{\mathrm{ref}}}\right)\ \mathrm{dB}, \qquad P_{\mathrm{ref}} = 20\ \mu\mathrm{Pa}$$

Here P₁ is the pressure converted from the microphone voltage. Once we convert the pressure to decibels using the equation above, we can compute the A-weighted distribution and finally calculate the average A-weighted sound pressure level, a quantity used to define noise pollution and harmful acoustic environments.
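The sound pressure level equation and its inverse take only a couple of lines. This sketch assumes the pressure is an RMS value (a sine's peak amplitude would first be divided by √2); `spl_db` and `pressure_from_spl` are hypothetical helper names.

```python
import math

P_REF = 20e-6  # reference pressure: 20 micropascals


def spl_db(p_pa: float, p_ref: float = P_REF) -> float:
    """Sound pressure level in dB re p_ref for an RMS pressure p_pa (Pa)."""
    return 20.0 * math.log10(p_pa / p_ref)


def pressure_from_spl(level_db: float, p_ref: float = P_REF) -> float:
    """RMS pressure (Pa) corresponding to a level in dB re p_ref."""
    return p_ref * 10.0 ** (level_db / 20.0)


for p in (20e-6, 1.0, 2.0):
    print(f"{p:g} Pa -> {spl_db(p):.1f} dB")
print(f"102 dB -> {pressure_from_spl(102.0):.2f} Pa")
```

Running the inverse on the 102 dBA white-noise result shows it corresponds to roughly 2.5 Pa RMS of A-weighted pressure at the microphone.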

I recorded one second of white noise from a smartphone (iPhone X) to check the accuracy of the microphone and my Python sound pressure level (SPL) calculations. The study mentioned above measured the iPhone 6 at about 105 dBA. With the iPhone X at its highest volume playing white noise, held within inches of the microphone, I calculated 102 dBA (reference pressure 20 µPa). The recorded white noise is below:

White Noise Example Recording

In this entry in the acoustic signal processing series, I discussed in depth the importance of sampling windows and how to interpret real data using a microphone and its specifications. I used a cheap USB microphone to analyze known signals (a 1 kHz sine wave and white noise) to understand how the Fast Fourier Transform processes acoustic signals. Finally, I introduced weighting functions and the decibel, along with their significance to the human auditory system. At this point in the series, the reader should be capable of recording and analyzing acoustic signals with some physical significance, limited primarily by the resolution of the microphone and recording system (Raspberry Pi and Python). That being said, under similar test conditions we were able to closely replicate the peak volume reported in a previous study (102 dBA for the iPhone X here versus about 105 dBA for the iPhone 6 in that study). In the next entry of the series, I will discuss updating periodograms and frequency spectrum plots for changing acoustic systems.
