How to extract the envelope from a sound file?

I need to extract the sound wave envelope from a set of audio files in a batch job on a headless Linux host.
I think this can be achieved using the Nyquist programming language, a Lisp variant dedicated to sound processing and notably embedded in Audacity. However, while I have some familiarity with Lisp, I don't have any previous experience with Nyquist.
Is there some primitive in Nyquist to achieve that directly? Or how can I write a short Nyquist program to extract the envelope of a sound?
I tried using the snd-avg function, but it does not seem to produce a smooth envelope (upper half in the screenshot):
(snd-avg
(aref *track* 0) ; first sound
(truncate (/ *sound-srate* 50)) ; 50 Hz to samples
1 op-peak) ; follow peak values

I obtained decently good results by applying a low-pass filter after snd-avg in order to remove the staircase ripples:
(lp
(snd-avg
(aref *track* 0)
(truncate (/ *sound-srate* 1000)) ; 1kHz
1 op-peak)
50) ; 50 Hz low-pass
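
If the batch job does not have to stay inside Nyquist, the same idea (a block-wise peak follower smoothed by a low-pass filter) can be sketched in a few lines of Python, assuming NumPy/SciPy are available; the function name, block rate and cutoff below are placeholders mirroring the values above, not part of the original answer:

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def envelope(path, block_hz=1000, cutoff_hz=50):
    # Block-wise peak follower (like snd-avg with op-peak) smoothed by a low-pass.
    fs, x = wavfile.read(path)
    if x.ndim > 1:
        x = x[:, 0]                            # first channel only
    x = np.abs(x.astype(float))
    block = max(1, int(fs / block_hz))         # ~1 kHz envelope rate, as above
    n_blocks = len(x) // block
    peaks = x[:n_blocks * block].reshape(n_blocks, block).max(axis=1)
    env_fs = fs / block                        # sample rate of the peak track
    sos = butter(4, cutoff_hz, fs=env_fs, output="sos")   # 50 Hz low-pass
    return sosfiltfilt(sos, peaks), env_fs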

Related

How to implement a Single Sideband Suppressed Carrier Modulator with an audio file as input?

I have been given an audio signal which I imported into Octave using audioread. I have obtained fs and can naturally plot the time-domain signal. After an FFT the frequency domain can easily be plotted.
My question is: how do I take this signal as input and modulate it using SSB-SC modulation in Octave? I believe I first have to create a DSB signal and then filter out one of the sidebands. I may be able to create the filter, but I am unsure how to create the DSB signal. Any suggestions will be greatly appreciated.
There are several ways to implement SSB-SC modulation. See for instance Single-sideband modulation - Practical implementations on Wikipedia. For more detail, there's a nice tutorial about SSB at
http://www.eng.auburn.edu/~roppeth/courses/TIMS-manuals-r5/TIMS%20Experiment%20Manuals/Student_Text/Vol-A2/A2-03.pdf
Octave/Matlab has these building blocks useful for implementing SSB modulation techniques (a short sketch combining them follows the list):
x .* exp((2j * pi * f / sample_rate) * (1:length(x))) to shift a signal in frequency, where x is an array of samples in the time domain and the index vector has the same orientation as x (modulation / frequency-shifting property).
filter to apply an FIR or IIR filter. To design a filter, a couple of options are firls or fir1, among others in the signal package.
hilbert for the Hilbert transform (analytic extension) of a real-valued signal.
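
To make that concrete, here is a minimal sketch of the phasing (Hilbert-transform) method, written in Python/SciPy rather than Octave for brevity; it maps one-to-one onto the hilbert call and the complex-exponential multiply listed above, and the function name and arguments are illustrative rather than part of the original answer:

import numpy as np
from scipy.signal import hilbert

def ssb_sc(x, fs, fc, upper=True):
    # Phasing method: form the analytic signal x + j*H{x}, shift it up by the
    # carrier frequency fc, and keep the real part. Conjugating the analytic
    # signal before the shift selects the lower sideband instead.
    analytic = hilbert(x)                          # x + j * Hilbert{x}
    n = np.arange(len(x))
    carrier = np.exp(2j * np.pi * fc * n / fs)
    z = analytic if upper else np.conj(analytic)
    return np.real(z * carrier)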

The Sound of Hydrogen using the NIST Spectral Database

In the video The Sound of Hydrogen (original here), the sound is created by taking data from the NIST Atomic Spectra Database and importing this edited data into Mathematica to modulate a sine wave. I was wondering how he turned the data from the website into the values shown in the video (3:47, top of the page), because it is nothing like what is initially seen on the website.
Short answer: It's different because in the tutorial the sampling rate is 8 kHz while it's probably higher in the original video.
Long answer:
First of all, note how the Rydberg formula provides the resonance frequencies of hydrogen as $\nu_{nm} = c R \left(\frac1{n^2}-\frac1{m^2}\right)$, where $c$ is the speed of light and $R$ the Rydberg constant. The highest frequency is $\nu_{1\infty}\approx 3000$ THz, while for $n,m\to\infty$ there is basically no lower limit, though if you restrict yourself to the Lyman series ($n=1$) and the Balmer series ($n=2$), the lower limit is $\nu_{23}\approx 400$ THz. These are electromagnetic frequencies corresponding to light, not all of it in the visible spectrum (which ranges from about 430–790 THz); there is some IR and a lot of UV in there which you cannot see. "minutephysics" now simply treats these frequencies as sound frequencies that are remapped to the human hearing range (roughly 20 Hz–20 kHz).
But as the video stated, not all these frequencies resonate with the same strength, and the data at http://nist.gov/pml/data/asd.cfm also includes the amplitudes. For the frequency $\nu_{nm}$, let's call the corresponding intensity $I_{nm}$ (intensity is amplitude squared; I wonder if the video treated that correctly). Then your signal is simply
$f(t) = \sum\limits_{n=1}^N \sum\limits_{m=n+1}^M I_{nm}\sin(\alpha(\nu_{nm})t+\phi_{nm})$
where $\alpha$ denotes the frequency rescaling (probably something linear like $\alpha(\nu) = (20 + (\nu-400\cdot10^{12})\cdot\frac{20000-20}{(3000-400)\cdot 10^{12}})$ Hz) and the optional phase $\phi_{nm}$ is probably equal to zero.
Why does it sound slightly different? Probably because the actual video used a higher sampling rate than the 8 kHz used in the tutorial video.
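
For illustration, here is a minimal sketch of that sum in Python/NumPy, assuming you have already extracted line frequencies (in THz) and relative intensities from the NIST table; the linear remap follows the guessed $\alpha(\nu)$ above, all phases $\phi_{nm}$ are set to zero, and rendering at a low sampling rate will alias the highest remapped lines, which is the difference discussed above:

import numpy as np

def hydrogen_tone(freqs_thz, intensities, duration=5.0, fs=44100):
    # f(t) = sum_k I_k * sin(2*pi*alpha(nu_k)*t), with alpha a linear remap of
    # 400-3000 THz onto 20 Hz-20 kHz and all phases set to zero.
    t = np.arange(int(duration * fs)) / fs
    alpha = 20 + (np.asarray(freqs_thz, float) - 400) * (20000 - 20) / (3000 - 400)
    y = np.zeros_like(t)
    for inten, f in zip(intensities, alpha):
        y += inten * np.sin(2 * np.pi * f * t)
    return y / np.max(np.abs(y))               # normalize to [-1, 1]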

Signal Processing and Audio Beat Detection

I am trying to do some work with basic beat detection (in C and/or Java) by following the guide from GameDev.net. I understand the logic behind the implementation of the algorithms, but I am confused as to how one would get the "sound amplitude" data for the left and right channels of a song (i.e. an mp3 or wav).
For example, he starts with the following assumption:
In this model we will detect sound energy variations by computing the average sound energy of the signal and comparing it to the instant sound energy. Let's say we are working in stereo mode with two lists of values: (a_n) and (b_n). (a_n) contains the list of sound amplitude values captured every Te seconds for the left channel, (b_n) the list of sound amplitude values captured every Te seconds for the right channel.
He then proceeds to manipulate a_n and b_n using the algorithms that follow. I am wondering how one would do the signal processing necessary to get a_n and b_n every Te seconds for both channels, so that I can begin to follow his guide and mess around with some simple beat detection in songs.
An uncompressed audio file (a .wav or .aiff, for example) is for the most part a long array of samples. Each sample consists of the amplitude at a given point in time. When music is recorded, many of these amplitude samples are taken each second.
For stereo (2-channel) audio files, the samples in the array usually alternate channels: [sample1 left, sample1 right, sample2 left, sample2 right, etc...].
Most audio parsing libraries will already have a way of returning the samples separately for each channel.
Once you have the sample array for each channel, it is easy to find the samples for a particular second, as long as you know the sample rate, or number of samples per second. For example, if the sample rate for your file is 44100 samples per second and you want the samples in the nth second, you would use the part of your array between (n * 44100) and ((n + 1) * 44100).
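
As a concrete sketch, here is how the channel split and a per-block energy might look in Python with SciPy; the file name, the 1024-sample block size, and the function name are placeholders, not values from the article:

import numpy as np
from scipy.io import wavfile

fs, data = wavfile.read("song.wav")            # "song.wav" is a placeholder
left = data[:, 0].astype(float)                # (a_n): left-channel samples
right = data[:, 1].astype(float)               # (b_n): right-channel samples

def block_energy(channel, n, block=1024):
    # Instant sound energy of the n-th block of `block` samples; compare it
    # against the average energy over the blocks of the previous second.
    window = channel[n * block:(n + 1) * block]
    return np.sum(window ** 2)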

Correctly decoding/encoding raw PCM data

I'm writing my WAVE decoder/encoder in C++. I've managed to correctly convert between different sample sizes (8, 16 and 32), but I need some help with the channels and the frequency.
Channels:
If I want to convert from stereo to mono:
do I just take the data from one channel (which one? 1 or 2?)?
or do I take the average of channels 1 and 2 for the mono channel?
If I want to convert from mono to stereo:
(I know this is not very scientific)
can I simply copy the samples from the single mono channel into both stereo channels?
is there a more scientific method to do this (e.g. interpolation)?
Sample rate:
How do I change the sample rate (resample), e.g. from 44100 Hz to 22050 Hz?
do I simply take the average of 2 sequential samples for the new (lower frequency) value?
Any more scientific algorithms for this?
Stereo to mono - take the mean of the left and right samples, i.e. M = (L + R) / 2 - this works for the vast majority of stereo content, but note that there are some rare cases where you can get left/right cancellation.
Mono to stereo - put the mono sample in both left and right channels, i.e. L = R = M - this gives a sound image which is centered when played back in stereo.
Resampling - for a simple integer ratio downsampling as in your example above, the process is:
low pass filter to accommodate new Nyquist frequency, e.g. 10 kHz LPF for 22.05 kHz sample rate
decimate by required ratio (i.e. drop alternate samples for your 2x downsampling example)
Note that there are third-party libraries such as libsamplerate which can handle resampling for you in the general case, so if you have more than one ratio to support, or some tricky non-integer ratio, this might be a better approach. A short sketch of the three basic operations above follows.
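
A minimal sketch of those operations in Python/NumPy; the array shapes and function names are assumptions, and scipy.signal.decimate bundles the anti-aliasing low-pass with the sample dropping for the integer-ratio case:

import numpy as np
from scipy.signal import decimate

def stereo_to_mono(stereo):                    # stereo: array of shape (N, 2)
    return stereo.mean(axis=1)                 # M = (L + R) / 2

def mono_to_stereo(mono):
    return np.column_stack((mono, mono))       # L = R = M

def downsample_2x(x, fs):
    # decimate applies an anti-aliasing low-pass before dropping alternate samples
    return decimate(x, 2), fs // 2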

Free Wavetable Synthesizer?

I need to implement a wavetable synthesizer in an ARM Cortex-M3 core. I'm looking for any code or tools to help me get started.
I'm aware of this AVR implementation. I actually converted it to a PIC a while back. Now I am looking for something similar, but a little better sounding.
ANSI C code would be great. Any code snippets (C or C++), samples, tools, or just general information would be greatly appreciated.
Thanks.
The Synthesis Toolkit (STK) is excellent, but it is C++ only:
http://ccrma.stanford.edu/software/stk/
You may be able to extract the wavetable synthesizer code from the STK though.
Two open-source wavetable synthesizers are FluidSynth and TiMidity.
Any good ARM synth can be changed into a wavetable/wavescanner synth in less than a day. Scanning a wave from a file or generating it mathematically is nearly the same thing audio-wise; a wavetable provides massive banks of waveforms at almost zero processing cost. You need the waves themselves, but the WT oscillator code itself is about 20 lines: change your waveform knob from 3 positions to 100 to select which WAV you are reading, and use a ramp/counter to read through the WAV files loaded as arrays.
From 7 years of synth experience, I'd recommend changing the ~20 lines of the oscillator function of your favorite synth so that it reads wave arrays. The wavetable only needs those 20 lines of logic; the rest of the synthesizer is more important: LFOs, filters, input parameters, preset memory... Take your favorite synth, find a WT wave library as WAV files and folders, and replace its oscillators with WT functions; it will sound almost the same, only with lower processing cost.
A synth normally uses sine, square, saw and antialiased oscillator functions for the wave.
A wavetable synth uses about 20 lines of code at its base, plus tens or hundreds of waves, each ideally sampled at every octave. If you can get a wavetable sound library, the synth just loops and pitch-shifts the sounds, and pro synths can also keep recordings at multiple octaves and mix between them.
WT function:
load the WAV files into N arrays
change waveform = select a waveform array from the WAV list
read the selected waveform array at the desired pitch (Hz)
Wavescanner function:
crossfade between two waves, with the crossfade amount assigned to an LFO (e.g. a sine).
The envelope, filter, amplitude and all other functions are independent of the wave-generation function in all synths.
Remember that the most powerful psychoacoustic tool for synthesizers is deviation from the exact digital pitch of the notes, called unison detune; the sonic character of synthesizers comes mostly from chorus and unison detune.
WTs are either single periods of waves or, in more advanced synths, longer sections. The single-period case is very easy to write into code. The advanced WTs are sampled per octave, with waves lasting N periods or even 2-3 seconds (e.g. a piano), which means their sound quality changes across the octaves, so the complex WTs are crossfaded every octave between multiple octave recordings.
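
For clarity, here is a minimal single-cycle wavetable oscillator sketch in Python: it is the ramp/counter (phase accumulator) idea described above, with linear interpolation between table entries, and the same few lines translate readily to fixed-point ANSI C on a Cortex-M3. The names and the interpolation choice are illustrative, not taken from any of the projects mentioned above:

import numpy as np

def wavetable_osc(table, freq_hz, fs, n_samples):
    # Phase accumulator steps through one stored period of the waveform at the
    # requested pitch; linear interpolation smooths between adjacent entries.
    out = np.empty(n_samples)
    phase = 0.0
    step = freq_hz * len(table) / fs      # table positions advanced per sample
    for i in range(n_samples):
        idx = int(phase)
        frac = phase - idx
        nxt = (idx + 1) % len(table)
        out[i] = (1.0 - frac) * table[idx] + frac * table[nxt]
        phase = (phase + step) % len(table)
    return out

# Example: one second of a 440 Hz tone from a 2048-point sine table at 44.1 kHz
table = np.sin(2 * np.pi * np.arange(2048) / 2048)
tone = wavetable_osc(table, 440.0, 44100, 44100)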
