Speech-shaped noise (SSN) is noise whose long-term average spectrum is similar to that of speech. It is primarily used as a masker in studies of speech perception (e.g., Nelson et al., 2003; Qin & Oxenham, 2003). Recently, I had to generate some SSN in Python. After some googling, it seemed that the most straightforward approach was to create white noise, then filter it using the spectrum (FFT) of a speech signal. Here is the code snippet I used to create the SSN:
The script requires two additional files to run. The first is mix.wav, a mixture of a few hundred German sentences spoken by several different male speakers. The second is fftfilt.py. To avoid edge effects caused by the filtering, I generated an SSN waveform longer than needed, then trimmed it to the desired duration. The spectrogram of the resulting SSN looks pretty good:
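To make the steps above concrete, here is a minimal sketch of the same idea: white noise, filtered to follow the long-term spectrum of a speech recording, then trimmed to remove edge effects. It is an illustration under my own assumptions rather than the script itself. In particular, it builds the filter with SciPy's `welch` and `firwin2` instead of fftfilt.py, and the names `make_ssn`, `numtaps`, and `seed` are hypothetical.

```python
import numpy as np
from scipy.signal import welch, firwin2, fftconvolve


def make_ssn(speech, fs, duration, numtaps=1025, seed=0):
    """Return `duration` seconds of noise shaped like the long-term spectrum of `speech`.

    `numtaps` must be odd so the FIR filter may have non-zero gain at Nyquist.
    """
    rng = np.random.default_rng(seed)

    # Long-term power spectrum of the speech, averaged over many short frames.
    freqs, psd = welch(speech, fs=fs, nperseg=1024)

    # Convert power to an amplitude gain and normalize its peak to 1.
    gain = np.sqrt(psd)
    gain /= gain.max()

    # Linear-phase FIR filter whose magnitude response follows that gain curve.
    fir = firwin2(numtaps, freqs, gain, fs=fs)

    # Generate extra white noise so the filtered edges can be thrown away.
    n_out = int(round(duration * fs))
    noise = rng.standard_normal(n_out + 2 * numtaps)
    shaped = fftconvolve(noise, fir, mode="same")

    # Drop `numtaps` samples at each end (more than the edge-affected region),
    # then scale the peak amplitude to 1.
    ssn = shaped[numtaps:numtaps + n_out]
    return ssn / np.max(np.abs(ssn))
```

With the speech mixture loaded as a floating-point array (for example via `scipy.io.wavfile.read`, which returns the sample rate and the data), `make_ssn(speech, fs, 5.0)` would give five seconds of noise. The filter length trades spectral detail against the amount of padding that has to be discarded.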
Next I compared the long-term spectrum of the SSN to that of the speech mixture:
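One way to compute such a long-term spectrum is to average the power spectrum over many short frames (Welch's method). The sketch below assumes SciPy is available; the helper name `long_term_spectrum_db` and the frame length are my own choices for illustration.

```python
import numpy as np
from scipy.signal import welch


def long_term_spectrum_db(x, fs, nperseg=1024):
    """Welch-averaged power spectrum of `x` in dB (arbitrary reference level)."""
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)
    # A tiny floor avoids taking the log of zero in empty bins.
    return freqs, 10 * np.log10(psd + 1e-20)
```

Calling this on both the SSN and the speech mixture with the same `fs` and `nperseg` puts the two curves on the same frequency grid, so they can be overlaid directly.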
As you can see, the two spectra are not identical. It looks like the SSN spectrum is offset by a constant amount relative to the mixture spectrum. To test this, I computed the difference in amplitude between the two spectra across frequencies:
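A rough sketch of that difference calculation, again using Welch averaging from SciPy. The function name `spectrum_offset_db` is hypothetical, and both signals are assumed to share the same sample rate.

```python
import numpy as np
from scipy.signal import welch


def spectrum_offset_db(a, b, fs, nperseg=1024):
    """Per-frequency level difference (a minus b) in dB between two signals."""
    freqs, psd_a = welch(a, fs=fs, nperseg=nperseg)
    _, psd_b = welch(b, fs=fs, nperseg=nperseg)
    # welch removes each segment's mean by default, so the DC bin is not
    # informative; skip it.
    diff = 10 * np.log10(psd_a[1:] / psd_b[1:])
    return freqs[1:], diff
```

If the two spectra differ only by a constant gain, `diff` should scatter around a single value: its mean estimates the offset, and its standard deviation indicates how far the match is from exact.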
The difference curve is noisy, indicating that the spectra are not an exact match. However, it appears to be roughly flat, on average, across frequencies, which is what we want: a flat difference means the two spectra have the same shape and differ only in overall level. You can listen to this particular SSN sample below:
Banner image is Number 1 (Lavender Mist) by Jackson Pollock.
- Qin, M. K., & Oxenham, A. J. (2003). Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. Journal of the Acoustical Society of America, 114, 446–454.
- Nelson, P. B., Jin, S.-H., Carney, A. E., & Nelson, D. A. (2003). Understanding speech in modulated interference: Cochlear implant users and normal-hearing listeners. Journal of the Acoustical Society of America, 113, 961–968.