An algorithm for producing fake audio visualizer data

Does anybody know an algorithm for generating a random series of numbers (say, 100 Java bytes in the range -127 to 127) which, when drawn as a bar chart, would look similar to a regular audio spectrum, like those SoundCloud ones?
I'm trying to write one. It uses multiple random and sine calculations, but the result is very ugly - it's something between a sine wave and an old toothbrush. I would be very thankful if you could direct me to one that is aesthetically convincing.
An algorithm with an explanation (and/or picture) is fine. Pseudocode would be very nice of you. Actual Java code is a bonus. :D
Edit:
This is the code I'm using right now. It's convoluted, but I'm basically adding a random deviation to a sine wave with a random amplitude (which I'm not sure was a good idea).
private static final int FREQ = 7;
private static final double DEG_TO_RAD = Math.PI / 180;
private static final int MAX_AMPLITUDE = 127;
private static final float DEVIATION = 0.1f; // 10 percent is maximum deviation
private void makeSinusoidRandomBytes() {
    byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
    for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
        int amplitude = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE / 2;
        byte dev = (byte) (random.nextInt((int) Math.max(Math.abs(2 * DEVIATION * amplitude), 1))
                - Math.abs(DEVIATION * amplitude));
        bytes[i] = (byte) (Math.sin(i * FREQ * DEG_TO_RAD) * amplitude - dev);
    }
    this.bytes = bytes;
}

A real sound wave is actually a combination of sine waves of different frequencies and amplitudes added together, not random deviations from a sine wave. The difficult part will be choosing a combination of wave amplitudes and frequencies that produces output you subjectively like! However, most sound waves have a base frequency and then a number of overtones which "fit into" that wavelength - for example, an overtone at 3/2 of the base frequency and at 2/3 of the base amplitude. By combining these overtones and scaling the resulting waveform to the -127 to +127 range, you'll get an actual sound wave.
The following code is C#, but close enough to Java to give you an idea. It's from a game, where I needed to combine many sine waves together to create various types of oscillating effects:
/// <summary>
/// Return a value between 0 and 1 based on a sine-wave oscillating with a given combination of periods at a given point in time
/// </summary>
/// <param name="time">time to get wave value at</param>
/// <param name="periods">lengths of waves</param>
/// <returns>height of wave</returns>
public static float MultiPulse(float time, params float[] periods)
{
    float c = 0;
    foreach (float p in periods)
    {
        float cp = (MathHelper.Pi / p) * time;
        float s = ((float)Math.Sin(cp) + 1) / 2;
        c += s / periods.Length;
    }
    return c;
}
You probably want to modify that to allow you to specify different amplitudes as well as periods for the waves you are combining.
By combining many widely varying amplitudes and periods (frequencies), you should be able to get something convincing by trial and error.
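As a rough Java sketch of that modification (the method and parameter names here are illustrative, not from the original game code), each component wave gets its own period and amplitude, and the sum is normalised so it can be scaled straight into the -127 to +127 byte range:
// Sketch: sum several sine waves, each with its own period and amplitude,
// then normalise by the total amplitude so the result stays within [-1, 1].
public static float multiPulse(float time, float[] periods, float[] amplitudes) {
    float sum = 0f;
    float totalAmplitude = 0f;
    for (int k = 0; k < periods.length; k++) {
        sum += amplitudes[k] * (float) Math.sin((Math.PI / periods[k]) * time);
        totalAmplitude += Math.abs(amplitudes[k]);
    }
    // Guard against an empty or all-zero amplitude list, then normalise.
    return totalAmplitude == 0f ? 0f : sum / totalAmplitude;
}
Multiplying the result by 127 then gives a value that fits in a Java byte.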

Based on the idea see sharper gave me, this is the code I'm using right now:
int mainAmp = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE / 2;
int overtoneAmp = random.nextInt(MAX_AMPLITUDE * 2 / 3) - MAX_AMPLITUDE / 3;
int overtone2Amp = random.nextInt(MAX_AMPLITUDE * 4 / 7) - MAX_AMPLITUDE * 2 / 7; // centred around zero, like the other amplitudes
int mainFreq = random.nextInt(7) + 7;
int overtoneFreq = mainFreq * 3 / 2;
int overtone2Freq = mainFreq * 7 / 4;
byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
    bytes[i] = (byte) (Math.sin(i * mainFreq * DEG_TO_RAD) * mainAmp
            + Math.sin(i * overtoneFreq * DEG_TO_RAD) * overtoneAmp
            + Math.sin(i * overtone2Freq * DEG_TO_RAD) * overtone2Amp);
}
With the constants above, the main frequency ends up between 7 and 13 in my app. You can play with those values. The other two overtones I'm using are (2 - 1/2)x and (2 - 1/4)x of the main frequency. You can add more, like (2 - 1/8)x etc., or use another series of frequencies. I also randomize the amplitudes to get a unique wave each time.
These are some waves I'm drawing using this code:


Can I do random writes from a kernel without worrying about synchronization issues?

Consider a simple depth-of-field filter (my actual use case is similar). It loops over the image and scatters every pixel over a circular neighborhood of it. The radius of the neighborhood depends on the depth of the pixel - the closer it is to the focal plane, the smaller the radius.
Note that I said "scatters" and not "gathers". In simpler image-processing applications, you normally use the "gather" technique to perform a uniform Gaussian blur. In other words, you loop over the neighborhood of each pixel and "gather" the nearby values into a weighted average. This works fine in that case, but if you make the blur kernel vary between pixels while still using "gathering", you'll get a somewhat unrealistic effect. Such "space-variant filtering" scenarios are where "scattering" differs from "gathering".
To be clear, the scatter algorithm is something like this:
init resultImage to black
loop over sourceImage
    var c = fetch current pixel from sourceImage
    var toAdd = c * weight // weight < 1
    loop over circular neighbourhood of current source pixel
        add toAdd to current neighbor in resultImage
My question is: if I do a direct translation of this pseudocode to OpenCL, will there be synchronization issues due to different work-items simultaneously writing to the same output pixel?
Does the answer vary depending on whether I'm using Buffers or Images?
The course I'm reading suggests that there will be synchronization issues. But OTOH I read the source of Mandelbulber 1.21-2, which does a straightforward OpenCL DOF just like my above pseudocode, and it seems to work fine.
(the relevant code is in mandelbulber-opencl-1.21-2.orig/usr/share/cl/cl_DOF.cl and it's as follows)
//*********************************************************
// MANDELBULBER
// kernel for DOF effect
//
//
// author: Krzysztof Marczak
// contact: buddhi1980@gmail.com
// licence: GNU GPL v3.0
//
//*********************************************************
typedef struct
{
    int width;
    int height;
    float focus;
    float radius;
} sParamsDOF;

typedef struct
{
    float z;
    int i;
} sSortZ;

//------------------ MAIN RENDER FUNCTION --------------------
kernel void DOF(__global ushort4 *in_image, __global ushort4 *out_image, __global sSortZ *zBuffer, sParamsDOF p)
{
    const unsigned int i = get_global_id(0);
    uint index = p.height * p.width - i - 1;
    int ii = zBuffer[index].i;
    int2 scr = (int2){ii % p.width, ii / p.width};
    float z = zBuffer[index].z;
    float blur = fabs(z - p.focus) / z * p.radius;
    blur = min(blur, 500.0f);
    float4 center = convert_float4(in_image[scr.x + scr.y * p.width]);
    float factor = blur * blur * sqrt(blur) * M_PI_F / 3.0f;
    int blurInt = (int)blur;
    int2 scr2;
    int2 start = (int2){scr.x - blurInt, scr.y - blurInt};
    start = max(start, 0);
    int2 end = (int2){scr.x + blurInt, scr.y + blurInt};
    end = min(end, (int2){p.width - 1, p.height - 1});
    for (scr2.y = start.y; scr2.y <= end.y; scr2.y++)
    {
        for (scr2.x = start.x; scr2.x <= end.x; scr2.x++)
        {
            float2 d = scr - scr2;
            float r = length(d);
            float op = (blur - r) / factor;
            op = clamp(op, 0.0f, 1.0f);
            float opN = 1.0f - op;
            uint address = scr2.x + scr2.y * p.width;
            float4 old = convert_float4(out_image[address]);
            out_image[address] = convert_ushort4(opN * old + op * center);
        }
    }
}
No, you can't do this without worrying about synchronization. If two work-items scatter to the same location without synchronization, you have a race condition and won't get correct results. The same applies to both buffers and images. With buffers you could use atomics, but they can slow down your code, especially when there is contention (and even when there isn't). AFAIK, read/write images don't have atomic operations.

Processing fft crash

Weird thing: I keep getting Processing or Java to crash with this code, which is based on sample code from the Processing website.
On PC it doesn't work at all; on one Mac it works for about 5 seconds until it crashes, and on another Mac it just crashes and gives me this:
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: RtApiCore::probeDeviceOpen: the device (2) does not support the requested channel count.
Could not run the sketch (Target VM failed to initialize).
Do you think it's a problem with the library or with the code?
If it's a problem with the library, could you recommend the best sound library to do something like this?
Thank you :)
import processing.sound.*;

FFT fft;
AudioIn in;
int bands = 512;
float[] spectrum = new float[bands];

void setup() {
  size(900, 600);
  background(255);
  // Create an Input stream which is routed into the Amplitude analyzer
  fft = new FFT(this, bands);
  in = new AudioIn(this, 0);
  // start the Audio Input
  in.start();
  // patch the AudioIn
  fft.input(in);
}

void draw() {
  background(255);
  int midPointW = width/2;
  int midPointH = height/2;
  float angle = 1;
  fft.analyze(spectrum);
  //float radius = 200;
  for (int i = 0; i < bands; i++) {
    // create the actions for placing points on a circle
    float radius = spectrum[i]*height*10;
    //float radius = 10;
    float endX = midPointW+sin(angle) * radius*10;
    float endY = midPointH+cos(angle) * radius*10;
    float startX = midPointW+sin(angle) * radius*5;
    float startY = midPointH+cos(angle) * radius*5;
    // The result of the FFT is normalized
    // draw the line for frequency band i scaling it up by 5 to get more amplitude.
    line(startX, startY, endX, endY);
    angle = angle + angle;
    println(endX, "", endY);
    // if(angle > 360){
    //   angle = 0;
    // }
  }
}
If you print the values you use, such as angle and the start/end x, y coordinates, you'll notice that:
the start/end x, y values become NaN (not a number - invalid)
angle quickly goes to Infinity (but not beyond)
One of the main issues is this line:
angle = angle + angle;
You're exponentially increasing this value, which you probably don't want.
Additionally, bear in mind that trigonometric functions such as sin() and cos() use radians, not degrees, so values tend to be small. You can constrain the values to 360 degrees or TWO_PI radians using the modulo operator (%) or the constrain() function:
angle = (angle + 0.01) % TWO_PI;
You were very close though, as your commented-out angle > 360 check shows. Not sure why you left it commented out.
Here's your code with the tweak and comments
import processing.sound.*;

FFT fft;
AudioIn in;
int bands = 512;
float[] spectrum = new float[bands];

void setup() {
  size(900, 600);
  background(255);
  // Create an Input stream which is routed into the Amplitude analyzer
  fft = new FFT(this, bands);
  in = new AudioIn(this, 0);
  // start the Audio Input
  in.start();
  // patch the AudioIn
  fft.input(in);
}

void draw() {
  background(255);
  int midPointW = width/2;
  int midPointH = height/2;
  float angle = 1;
  fft.analyze(spectrum);
  //float radius = 200;
  for (int i = 0; i < bands; i++) {
    // create the actions for placing points on a circle
    float radius = spectrum[i] * height * 10;
    //float radius = 10;
    float endX = midPointW + (sin(angle) * radius * 10);
    float endY = midPointH + (cos(angle) * radius * 10);
    float startX = midPointW + (sin(angle) * radius * 5);
    float startY = midPointH + (cos(angle) * radius * 5);
    // The result of the FFT is normalized
    // draw the line for frequency band i scaling it up by 5 to get more amplitude.
    line(startX, startY, endX, endY);
    //angle = angle + angle;
    angle = (angle + 0.01) % TWO_PI; //linearly increase the angle and constrain it to 360 degrees (2 * PI)
  }
}

void exit() {
  in.stop(); //try to cleanly stop the audio input
  super.exit();
}
The sketch ran for more than 5 minutes, but when closing it I still encountered JVM crashes on OSX.
I haven't used this sound library much and haven't looked into its internals, but it might be a bug.
If this is still causing problems, for pragmatic reasons I'd recommend installing a different Processing library for FFT sound analysis via the Contribution Manager.
Here are a couple of libraries:
Minim - provides some nice linear and logarithmic averaging functions that can help in visualisations
Beads - feature-rich, but with a more Java-like syntax. There's also a free book on it: Sonifying Processing
Both libraries provide FFT examples.
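For example, a minimal Minim sketch doing roughly the same input-to-FFT setup would look something like the following (sketched from Minim's standard FFT example; double-check the exact signatures against the library's documentation):
import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
AudioInput in;
FFT fft;

void setup() {
  size(900, 600);
  minim = new Minim(this);
  // monitor the default audio input with a 1024-sample buffer
  in = minim.getLineIn(Minim.STEREO, 1024);
  fft = new FFT(in.bufferSize(), in.sampleRate());
}

void draw() {
  background(255);
  // run the FFT on the mixed (left + right) input buffer
  fft.forward(in.mix);
  for (int i = 0; i < fft.specSize(); i++) {
    // draw one vertical line per frequency band
    line(i, height, i, height - fft.getBand(i) * 8);
  }
}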

Why does this programmatically generated musical chord not sound correct?

I have the following class which generates a buffer containing sound data:
package musicbox.example;
import javax.sound.sampled.LineUnavailableException;
import musicbox.engine.SoundPlayer;
public class CChordTest {

    private static final int SAMPLE_RATE = 1024 * 64;
    private static final double PI2 = 2 * Math.PI;

    /*
     * Note frequencies in Hz.
     */
    private static final double C4 = 261.626;
    private static final double E4 = 329.628;
    private static final double G4 = 391.995;

    /**
     * Returns buffer containing audio information representing the C chord
     * played for the specified duration.
     *
     * @param duration The duration in milliseconds.
     * @return Array of bytes representing the audio information.
     */
    private static byte[] generateSoundBuffer(int duration) {
        double durationInSeconds = duration / 1000.0;
        int samples = (int) durationInSeconds * SAMPLE_RATE;
        byte[] out = new byte[samples];

        for (int i = 0; i < samples; i++) {
            double value = 0.0;
            double t = (i * durationInSeconds) / samples;
            value += Math.sin(t * C4 * PI2); // C note
            value += Math.sin(t * E4 * PI2); // E note
            value += Math.sin(t * G4 * PI2); // G note
            out[i] = (byte) (value * Byte.MAX_VALUE);
        }
        return out;
    }

    public static void main(String... args) throws LineUnavailableException {
        SoundPlayer player = new SoundPlayer(SAMPLE_RATE);
        player.play(generateSoundBuffer(1000));
    }
}
Perhaps I'm misunderstanding some physics or math here, but it seems like each sinusoid ought to represent the sound of each note (C, E, and G), and by summing the three sinusoids, I should hear something similar to when I play those three notes simultaneously on the keyboard. What I'm hearing, however, is not even close to that.
For what it's worth, if I comment out any two of the sinusoids and keep the third, I do hear the (correct) note corresponding to that sinusoid.
Can somebody spot what I'm doing wrong?
To combine audio signals you need to average their samples, not sum them.
Divide the value by 3 before converting to byte.
You don't say in what way it sounds incorrect. Adding three sin values like that, you are going to get a signal that ranges from -3.0 to 3.0, so it is going to clip when you apply your * Byte.MAX_VALUE; this is why averaging probably worked for you. Adding is correct, it's just that you need to scale the result afterwards to prevent clipping, and dividing by the number of sine waves is the easiest way to do this. But if you start changing the number of sine waves dynamically and try to use the same strategy, you won't get the result you expect; you have to scale the signal for when your signal is at its loudest. Remember that real audio is not going to be at maximum amplitude, so you don't have to worry about it too much if your synthesised audio isn't. Also, the way we perceive sound volume is logarithmic, so halving the amplitude is only about a 6 dB drop, which is not a dramatic change in perceived loudness.
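For example, a minimal rework of the question's generateSoundBuffer along those lines (constants as in the question) might look like this:
    private static byte[] generateSoundBuffer(int duration) {
        double durationInSeconds = duration / 1000.0;
        int samples = (int) (durationInSeconds * SAMPLE_RATE);
        byte[] out = new byte[samples];
        double[] freqs = {C4, E4, G4};
        for (int i = 0; i < samples; i++) {
            double t = (i * durationInSeconds) / samples;
            double value = 0.0;
            for (double f : freqs) {
                value += Math.sin(t * f * PI2);
            }
            // Divide by the number of sine waves so the sum stays within [-1.0, 1.0]
            // before scaling to the byte range; this prevents clipping.
            out[i] = (byte) ((value / freqs.length) * Byte.MAX_VALUE);
        }
        return out;
    }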

Finding the local maxima/peaks and minima/valleys of histograms

Ok, so I have a histogram (represented by an array of ints), and I'm looking for the best way to find local maxima and minima. Each histogram should have 3 peaks, one of them (the first one) probably much higher than the others.
I want to do several things:
1. Find the first "valley" following the first peak (in order to get rid of the first peak altogether in the picture).
2. Find the optimum "valley" value in between the remaining two peaks to separate the picture.
3. In case the valley in between the two remaining peaks is not low enough, give a warning.
I already know how to do step 2 by implementing a variant of Otsu, but I'm struggling with step 1.
Also, the image is quite clean, with little noise to account for.
What would be brute-force algorithms for steps 1 and 3? I could find a way to implement Otsu, but the brute force is escaping me, math-wise. As it turns out, there is more documentation on methods like Otsu and less on simply finding peaks and valleys. I am not looking for anything more than whatever gets the job done (i.e. it's a temporary solution; it just has to be implementable in a reasonable timeframe, until I can spend more time on it).
I am doing all this in C#.
Any help on which steps to take would be appreciated!
Thank you so much!
EDIT: some more data:
Most histograms are likely to be like the first one, with the first peak representing the background.
Use the peakiness test. It's a method that finds all the possible peaks between two local minima and measures the peakiness of each with a formula. If the peakiness is higher than a threshold, the peak is accepted.
Source: UCF CV CAP5415 lecture 9 slides
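For reference, the peakiness measure as implemented in the code below is (with $v_A$ and $v_B$ the histogram values at the valleys on either side of the peak, $P$ the peak value, $W$ the width of the span, and $N$ the sum of histogram values across it):

$$\text{peakiness} = \left(1 - \frac{v_A + v_B}{2P}\right)\left(1 - \frac{N}{W \cdot P}\right)$$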
Below is my code:
public static List<int> PeakinessTest(int[] histogram, double peakinessThres)
{
    int j = 0;
    List<int> valleys = new List<int>();
    //The start of the valley
    int vA = histogram[j];
    int P = vA;
    //The end of the valley
    int vB = 0;
    //The width of the valley, default width is 1
    int W = 1;
    //The sum of the pixels between vA and vB
    int N = 0;
    //The measure of the peaks peakiness
    double peakiness = 0.0;
    int peak = 0;
    bool l = false;
    try
    {
        while (j < 254)
        {
            l = false;
            vA = histogram[j];
            P = vA;
            W = 1;
            N = vA;
            int i = j + 1;
            //To find the peak
            while (P < histogram[i])
            {
                P = histogram[i];
                W++;
                N += histogram[i];
                i++;
            }
            //To find the border of the valley other side
            peak = i - 1;
            vB = histogram[i];
            N += histogram[i];
            i++;
            W++;
            l = true;
            while (vB >= histogram[i])
            {
                vB = histogram[i];
                W++;
                N += histogram[i];
                i++;
            }
            //Calculate peakiness
            peakiness = (1 - (double)((vA + vB) / (2.0 * P))) * (1 - ((double)N / (double)(W * P)));
            if (peakiness > peakinessThres & !valleys.Contains(j))
            {
                //peaks.Add(peak);
                valleys.Add(j);
                valleys.Add(i - 1);
            }
            j = i - 1;
        }
    }
    catch (Exception)
    {
        if (l)
        {
            vB = histogram[255];
            peakiness = (1 - (double)((vA + vB) / (2.0 * P))) * (1 - ((double)N / (double)(W * P)));
            if (peakiness > peakinessThres)
                valleys.Add(255);
            //peaks.Add(255);
            return valleys;
        }
    }
    //if(!valleys.Contains(255))
    //    valleys.Add(255);
    return valleys;
}

What exactly does a Sample Rate of 44100 sample?

I'm using the FMOD library to extract PCM from an MP3. I get the whole 2-channel, 16-bit thing, and I also get that a sample rate of 44100 Hz means 44,100 samples of "sound" in 1 second. What I don't get is what exactly the 16-bit value represents. I know how to plot coordinates on an xy axis, but what am I plotting? The y axis represents time; the x axis represents what? Sound level? Is that the same as amplitude? How do I determine the different sounds that compose this value? I mean, how do I get a spectrum from a 16-bit number?
This may be a separate question, but it's actually what I really need answered: how do I get the amplitude at every 25 milliseconds? Do I take the 44,100 values and divide them into 40 groups (40 * 0.025 seconds = 1 sec)? That gives 1102.5 samples per group; so would I feed 1102 values into a black box that gives me the amplitude for that moment in time?
Edit: I've added the code I plan to test soon to the original post (note: I changed the frame length from 25 ms to 40 ms).
// 44100 / 25 frames = 1764 samples per frame -> 1764 * 2 channels * 2 bytes [16 bit sample] = 7056 bytes
private const int CHUNKSIZE = 7056;

uint bytesread = 0;
var squares = new double[CHUNKSIZE / 4];
const double scale = 1.0d / 32768.0d;
do
{
    result = sound.readData(data, CHUNKSIZE, ref read);
    Marshal.Copy(data, buffer, 0, CHUNKSIZE);
    //PCM samples are 16 bit little endian
    Array.Reverse(buffer);
    for (var i = 0; i < buffer.Length; i += 4)
    {
        var avg = scale * (Math.Abs((double)BitConverter.ToInt16(buffer, i)) + Math.Abs((double)BitConverter.ToInt16(buffer, i + 2))) / 2.0d;
        squares[i >> 2] = avg * avg;
    }
    var rmsAmplitude = ((int)(Math.Floor(Math.Sqrt(squares.Average()) * 32768.0d))).ToString("X2");
    fs.Write(buffer, 0, (int)read);
    bytesread += read;
    statusBar.Text = "writing " + bytesread + " bytes of " + length + " to output.raw";
} while (result == FMOD.RESULT.OK && read == CHUNKSIZE);
After loading an MP3, it seems my rmsAmplitude is in the range 3C00 to 4900. Have I done something wrong? I was expecting a wider spread.
Yes, a sample represents amplitude (at that point in time).
To get a spectrum, you typically convert it from the time domain to the frequency domain.
For the last question: multiple approaches are used - you may want the RMS.
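A minimal Java sketch of the RMS idea (assuming mono, signed 16-bit samples already decoded into a short[]; a 1764-sample window corresponds to one 40 ms frame at 44,100 Hz, as in the question's comment):
// Compute the RMS amplitude (0.0 .. 1.0) of one window of 16-bit samples.
public static double rms(short[] samples, int offset, int length) {
    double sumOfSquares = 0.0;
    for (int i = offset; i < offset + length; i++) {
        double s = samples[i] / 32768.0; // normalise to -1.0 .. 1.0
        sumOfSquares += s * s;
    }
    return Math.sqrt(sumOfSquares / length);
}
Calling this once per 1764-sample window gives one amplitude value per 40 ms frame.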
Generally, the x axis is the time value and the y axis is the amplitude. To get the frequency, you need to take the Fourier transform of the data (most likely using the Fast Fourier Transform [FFT] algorithm).
To use one of the simplest "sounds", let's assume you have a single-frequency tone with frequency f. In the amplitude/time domain this is represented as y = sin(2 * pi * f * x), where x is time.
If you convert that into the frequency domain, you just end up with Frequency = f.
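As a toy Java illustration of that time-to-frequency conversion, the magnitude of a single frequency bin can be computed directly from the samples with a naive DFT (a real application would use an FFT library instead):
// Magnitude of frequency bin k for n time-domain samples x[0..n-1].
// Bin k corresponds to the frequency k * sampleRate / n.
public static double binMagnitude(double[] x, int k) {
    int n = x.length;
    double re = 0.0;
    double im = 0.0;
    for (int t = 0; t < n; t++) {
        double angle = 2.0 * Math.PI * k * t / n;
        re += x[t] * Math.cos(angle);
        im -= x[t] * Math.sin(angle);
    }
    return Math.sqrt(re * re + im * im) / n;
}
For a pure sine wave at bin k's frequency, that bin dominates and the others stay near zero - which is what "Frequency = f" looks like numerically.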
Each sample represents the voltage of the analog signal at a given time.
