Librosa generated waveplots are flat for certain audio sounds - python-3.x

Certain wave plots generated by librosa’s display module are just flat lines that fill the entire axes.
I used native sampling rates to load some WAV files into librosa, and my dataset is a mix of stereo and mono files. I know the wave plots are incorrect because they look nothing like the waveform (amplitude vs. time) view of the same files in Audacity.
I've tried adjusting the figure width, height, and DPI, but the generated waveplots do not improve. Below is the waveplot generated by librosa for one of these audio files, followed by the expected waveform in Audacity.
Librosa Waveplot
Audacity Waveplot
The code used to generate the plot is derived from the librosa documentation:
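import librosa
import librosa.display
import matplotlib.pyplot
import numpy
sound, sr = librosa.load(input_dir, sr=None)  # sr=None keeps the native rate; the default mono=True downmixes stereo files
matplotlib.pyplot.figure(figsize=(width, height), dpi=dpi)
librosa.display.waveplot(numpy.array(sound), sr=sr)  # deprecated in librosa 0.8.1, removed in 0.10 (use librosa.display.waveshow there)
matplotlib.pyplot.tight_layout()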
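One way to tell whether the problem lies in librosa.display or in the loaded data is to plot the raw samples directly with matplotlib and compare the result against Audacity. This is a minimal sketch. It assumes the signal has already been loaded, for example with librosa.load(path, sr=None, mono=False), which returns an array of shape (channels, samples) for stereo files. The helper name plot_raw_waveform is hypothetical.

```python
import numpy as np
from matplotlib.figure import Figure


def plot_raw_waveform(y, sr, ax=None):
    """Plot every channel of y against time in seconds (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[np.newaxis, :]  # treat a mono signal as a single channel
    t = np.arange(y.shape[-1]) / sr  # sample index -> seconds
    if ax is None:
        ax = Figure(figsize=(10, 3)).subplots()
    for channel in y:
        ax.plot(t, channel, linewidth=0.5)  # no downsampling: every sample is drawn
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    return ax
```

In a notebook, pass ax=matplotlib.pyplot.gca() so the plot is displayed inline.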

Related

Spectrogram PNG to scipy.signal.spectrogram

I currently have a PNG of a spectrogram.
I don't have the original audio file, but I am wondering whether there is a way to convert this image into the arrays that scipy.signal.spectrogram returns. I thought about converting the image to an audio file first, but there don't seem to be many packages that reconstruct audio from a spectrogram, since so much data (notably the phase) has already been lost.
Any ideas and suggestions would be appreciated!
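The lost information cannot be recovered. If all you need is arrays shaped like the (f, t, Sxx) tuple that scipy.signal.spectrogram returns, though, you can read the pixels and attach axes to them yourself. This is a rough sketch under strong assumptions:

- the image has been cropped to the plot area (no axes, labels, or colorbar);
- brightness increases with power (grayscale);
- low frequencies are at the bottom;
- both axes are linear;
- you know f_max and the duration from the original plot's labels.

The function name and parameters are hypothetical. Colormapped images would first need the colormap inverted, which is not covered here. The resulting Sxx values are relative pixel intensities, not physical power.

```python
import numpy as np


def image_to_spectrogram(pixels, f_max, duration):
    """Turn a cropped grayscale spectrogram image into (f, t, Sxx) arrays.

    pixels: 2D array (rows x columns), e.g. from matplotlib.pyplot.imread(...);
            row 0 is the top of the image.
    """
    img = np.asarray(pixels, dtype=float)
    if img.ndim == 3:
        img = img[..., :3].mean(axis=-1)  # crude RGB(A) -> gray; assumes a gray colormap
    sxx = np.flipud(img)  # lowest frequency in row 0, as scipy orders it
    n_freqs, n_times = sxx.shape
    f = np.linspace(0.0, f_max, n_freqs)  # assumes a linear frequency axis
    t = (np.arange(n_times) + 0.5) * duration / n_times  # centre of each pixel column
    return f, t, sxx
```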

Wav audio level is too large

I have a mono WAV file of a 'glass breaking' sound. When I plot its levels in Python with the librosa library, the amplitudes span a very large range, about +/- 20000 instead of +/- 1. When I open the same WAV file in Audacity, the levels are between +/- 1.
My question is what generates this difference in displayed amplitude levels and how can I correct it in Python? MinMax scaling will distort the sound and I want to avoid it if possible.
The code is:
%matplotlib inline
import matplotlib.pyplot as plt
import librosa.display
from scipy.io import wavfile
fs1, glass_break_data = wavfile.read('test_break_glass_normalized.wav')
x = glass_break_data.astype('float')
plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=fs1)  # use the rate read from the file instead of a hard-coded 44100
These are the images from the notebook and Audacity:
WAV files usually store individual samples as integers, not floats. So what you see in the librosa plot is accurate for a 16-bit audio file, whose samples range from -32768 to 32767.
Programs like VLC show the format, including the bit depth per sample, in their info dialog, so you can easily check it.
Another way to check the format is with soxi or ffmpeg.
Audacity scales everything to floats in the range -1 to 1; it does not show you the original format.
The same is true for librosa.load(): it also scales to [-1, 1]. wavfile.read(), on the other hand, does not normalize. For more info on ways to read WAV audio, see, for example, this answer.
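If you use librosa.load instead of wavfile.read, it will normalize the range to [-1, 1]. Note that librosa.load resamples to 22050 Hz by default; pass sr=None to keep the file's native sampling rate:
glass_break_data, fs1 = librosa.load('test_break_glass_normalized.wav', sr=None)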
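If you want to keep wavfile.read, you can get the same [-1, 1] range without MinMax scaling. Divide by the full-scale value of the sample format instead. That is a single constant gain, so the shape of the waveform is unchanged.

This is a minimal sketch. The helper name pcm_to_float is hypothetical. It covers the dtypes wavfile.read returns for 8-, 16- and 32-bit integer PCM (uint8, int16, int32), plus float files.

```python
import numpy as np


def pcm_to_float(data):
    """Scale integer PCM samples to floats in [-1, 1) by the format's full scale."""
    data = np.asarray(data)
    if data.dtype == np.uint8:
        # 8-bit WAV is unsigned and centred on 128
        return (data.astype(np.float32) - 128.0) / 128.0
    if np.issubdtype(data.dtype, np.integer):
        full_scale = float(-np.iinfo(data.dtype).min)  # 32768 for int16, 2**31 for int32
        return data.astype(np.float32) / full_scale
    return data.astype(np.float32)  # float WAV data is usually already in [-1, 1]
```

With the question's code, x = pcm_to_float(glass_break_data) would replace the astype('float') line.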

How do I display a spectrogram from a wav file in C++?

I am doing a project in which I want to embed images into a .wav file so that when one sees the spectrogram using certain parameters, they will see the hidden image. My question is, in C++, how can I use the data in a wav file to display a spectrogram without using any signal processing libraries?
An explanation of the math (especially the Hanning window) would also be a great help; I am fairly new to signal processing. Also, since this is a very broad question, detailed steps are preferable to actual code.
Example: the output spectrogram (above) and the input audio waveform from the .wav file (below).
Some of the steps (write C code for each):
Convert the data into a numeric sample array.
Chop the sample array into chunks of some fixed size, (usually) overlapping.
(usually) Window with some window function.
FFT each chunk.
Take the magnitude.
(usually) Take the log.
Assemble all the 1D FFT result vectors into a 2D matrix.
Scale.
Color the matrix.
Render the 2D bitmap.
(optional) Optimize by rolling some of the above into a loop.
Add plot decorations (scale, grid marks, etc.)
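The steps above can be sketched in a few lines of Python with NumPy, which may serve as a reference before writing the C/C++ version. The sketch implements the Hann ("Hanning") window by hand to show the math. The only library routine it uses is numpy.fft.rfft, which a from-scratch version would replace with its own FFT.

The frame size, hop, and 80 dB display range are illustrative choices, not requirements, and the function names are hypothetical. Coloring, rendering, and plot decorations are left out.

```python
import numpy as np


def hann(n):
    # Hann window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1))), k = 0 .. n-1.
    # It tapers each chunk to zero at both ends, which reduces spectral leakage.
    k = np.arange(n)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * k / (n - 1)))


def spectrogram_db(samples, frame_size=1024, hop=512):
    """Return log magnitudes in dB, shape (frame_size // 2 + 1, n_frames)."""
    samples = np.asarray(samples, dtype=float)
    window = hann(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop  # assumes len(samples) >= frame_size
    columns = []
    for i in range(n_frames):
        chunk = samples[i * hop : i * hop + frame_size]  # chunks overlap because hop < frame_size
        spectrum = np.fft.rfft(chunk * window)  # window, then FFT
        magnitude = np.abs(spectrum)  # magnitude
        columns.append(20.0 * np.log10(magnitude + 1e-12))  # log; epsilon avoids log(0)
    return np.stack(columns, axis=1)  # 1D vectors -> 2D matrix (frequency x time)


def to_gray(db, dynamic_range=80.0):
    """Scale to 0..255 grey levels, with low frequencies in the bottom row."""
    top = db.max()
    clipped = np.clip(db, top - dynamic_range, top)
    gray = (clipped - (top - dynamic_range)) / dynamic_range * 255.0
    return np.round(np.flipud(gray)).astype(np.uint8)
```

For a 1000 Hz sine sampled at 8000 Hz, the peak of every column falls in bin 1000 * 1024 / 8000 = 128.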

Draw an image as a waveform?

I want to export a series of images (a movie) and draw them as a waveform, the way this video does:
https://www.youtube.com/watch?v=M9xMuPWAZW8&feature=youtu.be&t=328
http://oscilloscopemusic.com/ offers a program for loading 3D .obj files but I am currently working in 2D.
I tried opening a .WAV downloaded from http://www.wavtones.com/functiongenerator.php in vim and my terminal program crashed.
I tried running cat Downloads/wavTones.com.unregistred.sin_1000Hz_-6dBFS_3s.wav | pbcopy and pasting the output into a text editor, which showed only RIFFæ.
What is an algorithm for converting a series of images into a .wav? Ideally I'd like to make many images and string them together to make a movie like Oscilloscope Music does.
The Aphex Twin example uses more than two colors, and I'm not sure what's going on there.
Oscilloscope music and the Aphex Twin trick are very different.
Basically:
Oscilloscope Music:
This is an XY plot, also called a parametric plot or 2D plot.
Visual Side: The idea here is that you have one point (x/y) and you move that dot around (x and y change with t) so quickly that it appears as a line.
Acoustic Side: You separate the dot into the two axes. The movement of the x coordinate becomes the audio signal of the left channel, the movement of the y coordinate becomes the audio signal of the right channel.
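The XY idea can be sketched with Python's standard wave module and NumPy. Given a closed path of points, write x to the left channel and y to the right channel. Repeat the path so the shape stays on screen long enough to be seen. Here a circle stands in for the outline traced from one image.

Turning an arbitrary image into such a path (edge detection, ordering the points) is not covered. The function name, the 48 kHz rate, and the repeat count are assumptions. For a movie, you would concatenate the samples for each frame's path.

```python
import io
import wave

import numpy as np


def xy_to_stereo_wav(xs, ys, out, sample_rate=48000, repeats=100):
    """Write a 16-bit stereo WAV: left channel = x(t), right channel = y(t).

    xs, ys: one pass over the path, values in [-1, 1].
    out: a filename or a writable binary file object.
    """
    x = np.tile(np.asarray(xs, dtype=float), repeats)  # redraw the shape many times
    y = np.tile(np.asarray(ys, dtype=float), repeats)
    stereo = np.column_stack([x, y])  # rows are frames; bytes come out as L R L R ...
    pcm = np.round(np.clip(stereo, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(out, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())


# Example: a circle with 400 points per loop -> 48000 / 400 = 120 loops per second
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
buffer = io.BytesIO()
xy_to_stereo_wav(np.cos(theta), np.sin(theta), buffer)
```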
Aphex Twin Spectrogram Trick:
A spectrogram displays how strongly each frequency is present over time, i.e., the audio signal is decomposed into a weighted sum of sine waves of different frequencies. The weight (how much of each frequency is present) is the color, the y coordinate is the pitch (the frequency of that sine wave), and the x coordinate is the time.
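The spectrogram trick works in the opposite direction. Each image row gets its own sine wave, and the pixel brightness in each column sets that sine's amplitude during that time slice (additive synthesis).

This is a rough sketch, assuming a grayscale image with values in [0, 1], a linear frequency axis from f_min to f_max (kept below half the sample rate), and a fixed total duration. All names and defaults are hypothetical. The amplitudes switch abruptly at column boundaries, which can cause audible clicks; a real tool would smooth or interpolate them.

```python
import numpy as np


def image_to_audio(image, sample_rate=44100, duration=2.0, f_min=200.0, f_max=8000.0):
    """Additive synthesis: rows -> frequencies, columns -> time, brightness -> amplitude."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    n = int(sample_rate * duration)
    t = np.arange(n) / sample_rate
    freqs = np.linspace(f_max, f_min, rows)  # top row = highest pitch
    col_of_sample = np.minimum(np.arange(n) * cols // n, cols - 1)  # time slice per sample
    out = np.zeros(n)
    for r in range(rows):
        amplitude = image[r, col_of_sample]  # brightness of row r over time
        if amplitude.any():
            out += amplitude * np.sin(2.0 * np.pi * freqs[r] * t)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out  # normalise to [-1, 1]
```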
I hope this helps clear things up.

Reducing the size of pdf figure file in matplotlib

In matplotlib, I am using LineCollection to draw and color the countries, where the boundaries of the countries are given. When I save the figure as a PDF file:
fig.savefig('filename.pdf',dpi=300)
the files are quite large. However, when I save them as PNG files:
fig.savefig('filename.png',dpi=300)
and then convert them to PDF using the Linux convert command, the files are small. I tried reducing the dpi, but that does not change the PDF file size. Is there a way to save the figures directly from matplotlib as smaller PDF files?
The PDF is larger, since it contains all the vector information. By saving a PNG, you produce a rasterized image. It seems that in your case, you can produce a smaller PDF by rasterizing the plot directly:
plt.plot(x, y, 'r-', rasterized=True)
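Here, x and y are the plot coordinates. The key is the additional keyword argument rasterized=True. It is a property of every matplotlib artist, so it works for a LineCollection too. The dpi passed to savefig then sets the resolution of the rasterized parts.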
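For the LineCollection in the question, the same keyword can be passed when creating the collection. This minimal sketch compares PDF sizes in memory. The random segments stand in for the country boundaries, and the helper name and figure parameters are assumptions. Only the collection is rasterized; the axes, ticks, and labels stay vector.

```python
import io

import numpy as np
from matplotlib.collections import LineCollection
from matplotlib.figure import Figure


def pdf_size(rasterized, n_segments=20000, dpi=72, seed=0):
    """Return the size in bytes of a PDF containing n_segments random line segments."""
    rng = np.random.default_rng(seed)
    segments = rng.random((n_segments, 2, 2))  # stand-in for boundary polylines
    fig = Figure(figsize=(6, 4))
    ax = fig.subplots()
    ax.add_collection(LineCollection(segments, linewidths=0.3, rasterized=rasterized))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", dpi=dpi)  # dpi only affects the rasterized collection
    return buf.getbuffer().nbytes
```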
Using rasterized=True effectively embeds that part of the figure as a bitmap, as a PNG would, so when you zoom in you will see blurry pixels.
If you want high-quality vector figures, my suggestion is to subsample the data before plotting: the PDF file size is roughly proportional to the number of data points it has to store.
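The subsampling idea can be sketched as keeping every k-th vertex of each boundary, while always keeping the last point so closed outlines stay closed. This simple decimation is an assumption, not necessarily what the answerer had in mind. Line-simplification algorithms such as Ramer-Douglas-Peucker preserve shape better for the same number of points. The function name is hypothetical.

```python
import numpy as np


def decimate_polyline(points, step):
    """Keep every `step`-th vertex of an (N, 2) polyline, plus its last vertex."""
    points = np.asarray(points)
    if step <= 1 or len(points) <= 2:
        return points
    kept = points[::step]
    if not np.array_equal(kept[-1], points[-1]):
        kept = np.vstack([kept, points[-1]])  # keep the endpoint so outlines stay closed
    return kept
```

With boundaries as a hypothetical list of (N, 2) arrays, LineCollection([decimate_polyline(b, 5) for b in boundaries]) would then draw about a fifth of the vertices.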
