I'm using FMOD library to extract PCM from an MP3. I get the whole 2 channel - 16 bit thing, and I also get that a sample rate of 44100hz is 44,100 samples of "sound" in 1 second. What I don't get is, what exactly does the 16 bit value represent. I know how to plot coordinates on an xy axis, but what am I plotting? The y axis represents time, the x axis represents what? Sound level? Is that the same as amplitude? How do I determine the different sounds that compose this value. I mean, how do I get a spectrum from a 16 bit number.
This may be a separate question, but it's actually what I really need answered: How do I get the amplitude at every 25 milliseconds? Do I take 44,100 values, divide by 40 (40 * 0.025 seconds = 1 sec) ? That gives 1102.5 samples; so would I feed 1102 values into a blackbox that gives me the amplitude for that moment in time?
Edited original post to add code I plan to test soon: (note, I changed the frame rate from 25 ms to 40 ms)
// 44100 / 25 frames = 1764 samples per frame -> 1764 * 2 channels * 2 bytes [16 bit sample] = 7056 bytes
private const int CHUNKSIZE = 7056;
uint bytesread = 0;
var squares = new double[CHUNKSIZE / 4];
const double scale = 1.0d / 32768.0d;
do
{
result = sound.readData(data, CHUNKSIZE, ref read);
Marshal.Copy(data, buffer, 0, CHUNKSIZE);
//PCM samples are 16 bit little endian
Array.Reverse(buffer);
for (var i = 0; i < buffer.Length; i += 4)
{
var avg = scale * (Math.Abs((double)BitConverter.ToInt16(buffer, i)) + Math.Abs((double)BitConverter.ToInt16(buffer, i + 2))) / 2.0d;
squares[i >> 2] = avg * avg;
}
var rmsAmplitude = ((int)(Math.Floor(Math.Sqrt(squares.Average()) * 32768.0d))).ToString("X2");
fs.Write(buffer, 0, (int) read);
bytesread += read;
statusBar.Text = "writing " + bytesread + " bytes of " + length + " to output.raw";
} while (result == FMOD.RESULT.OK && read == CHUNKSIZE);
After loading mp3, seems my rmsAmplitude is in the range 3C00 to 4900. Have I done something wrong? I was expecting a wider spread.
Yes, a sample represents amplitude (at that point in time).
To get a spectrum, you typically convert it from the time domain to the frequency domain.
Last Q: Multiple approaches are used - You may want the RMS.
Generally, the x axis is the time value and y axis is the amplitude. To get the frequency, you need to take the Fourier transform of the data (most likely using the Fast Fourier Transform [fft] algorithm).
To use one of the simplest "sounds", let's assume you have a single frequency noise with frequency f. This is represented (in the amplitude/time domain) as y = sin(2 * pi * x / f).
If you convert that into the frequency domain, you just end up with Frequency = f.
Each sample represents the voltage of the analog signal at a given time.
Related
I'm using node.js for a project im doing.
The project is to convert words into numbers and then to take those numbers and create an audio output.
The audio output should play the numbers as frequencies. for example, I have an array of numbers [913, 250,352] now I want to play those numbers as frequencies.
I know I can play them in the browser with audio API or any other third package that allows me to do so.
The thing is that I want to create some audio file, I tried to convert those numbers into notes and then save it as Midi file, I succeed but the problem is that the midi file takes the frequencies, convert them into the closest note (example: 913 will convert into 932.33HZ - which is note number 81),
// add a track
var array = gematriaArray
var count = 0
var track = midi.addTrack()
var note
for (var i = 0; i < array.length; i++) {
note = array[i]
track = track.addNote({
//here im converting the freq -> midi note.
midi: ftom(parseInt(note)),
time: count,
duration: 3
})
count++
}
// write the output
fs.writeFileSync('./public/sounds/' + name + random + '.mid', new Buffer.from(midi.toArray()))
I searched the internet but I couldn't find anything that can help.
I really want to have a file that the user can download with those numbers as frequencies, someone knows what can be done to get this result?
Thanks in advance for the helpers.
this function will populate a buffer with floating point values which represent the height of the raw audio curve for the given frequency
var pop_audio_buffer_custom = function (number_of_samples, given_freq, samples_per_second) {
var number_of_samples = Math.round(number_of_samples);
var audio_obj = {};
var source_buffer = new Float32Array(number_of_samples);
audio_obj.buffer = source_buffer;
var incr_theta = (2.0 * Math.PI * given_freq) / samples_per_second;
var theta = 0.0;
for (var curr_sample = 0; curr_sample < number_of_samples; curr_sample++) {
audio_obj.buffer[curr_sample] = Math.sin(theta);
console.log(audio_obj.buffer[curr_sample] , "theta ", theta);
theta += incr_theta;
}
return audio_obj;
}; // pop_audio_buffer_custom
var number_of_samples = 10000; // long enough to be audible
var given_freq = 300;
var samples_per_second = 44100; // CD quality sample rate
var wav_output_filename = "/tmp/wav_output_filename.wav"
var synthesized_obj = {};
synthesized_obj.buffer = pop_audio_buffer_custom(number_of_samples, given_freq, samples_per_second);
the world of digital audio is non trivial ... the next step once you have an audio buffer is to translate the floating point representation into something which can be stored in bytes ( typically 16 bit integers dependent on your choice of bit depth ) ... then that 16 bit integer buffer needs to get written out as a WAV file
audio is a wave sometimes called a time series ... when you pound your fist onto the table the table wobbles up and down which pushes tiny air molecules in unison with that wobble ... this wobbling of air propagates across the room and reaches a microphone diaphragm or maybe your eardrum which in turn wobbles in resonance with this wave ... if you glued a pencil onto the diaphragm so it wobbled along with the diaphragm and you slowly slid a strip of paper along the lead tip of the pencil you would see a curve being written onto that paper strip ... this is the audio curve ... an audio sample is just the height of that curve at an instant of time ... if you repeatedly wrote down this curve height value X times per second at a constant rate you will have a list of data points of raw audio ( this is what above function creates ) ... so a given audio sample is simply the value of the audio curve height at a given instant in time ... since computers are not continuous instead are discrete they cannot handle the entire pencil drawn curve so only care about this list of instantaneously measured curve height values ... those are audio samples
above 32 bit floating point buffer can be fed into following function to return a 16 bit integer buffer
var convert_32_bit_float_into_signed_16_bit_int_lossy = function(input_32_bit_buffer) {
// this method is LOSSY - intended as preliminary step when saving audio into WAV format files
// output is a byte array where the 16 bit output format
// is spread across two bytes in little endian ordering
var size_source_buffer = input_32_bit_buffer.length;
var buffer_byte_array = new Int16Array(size_source_buffer * 2); // Int8Array 8-bit twos complement signed integer
var value_16_bit_signed_int;
var index_byte = 0;
console.log("size_source_buffer", size_source_buffer);
for (var index = 0; index < size_source_buffer; index++) {
value_16_bit_signed_int = ~~((0 < input_32_bit_buffer[index]) ? input_32_bit_buffer[index] * 0x7FFF :
input_32_bit_buffer[index] * 0x8000);
buffer_byte_array[index_byte] = value_16_bit_signed_int & 0xFF; // bitwise AND operation to pluck out only the least significant byte
var byte_two_of_two = (value_16_bit_signed_int >> 8); // bit shift down to access the most significant byte
buffer_byte_array[index_byte + 1] = byte_two_of_two;
index_byte += 2;
};
// ---
return buffer_byte_array;
};
next step is to persist above 16 bit int buffer into a wav file ... I suggest you use one of the many nodejs libraries for that ( or even better write your own as its only two pages of code ;-)))
Does anybody knows an algorithm for making a random series of numbers (like 100 java-byte (>=-127 & <= 127) ) which when are drawn as a bar chart, would be similar to a regular audio spectrum, like those SoundCloud ones?
I'm trying to write one, it has multiple Random and Sinus calculations, but the result is very ugly, it's something between a sinus wave and an old toothbrush. I would be very thankful if you code direct me to a one which is aesthetically convincing
An algorithm with an explanation (and/or picture) is fine. A pseudocode would be very nice of you. An actual JAVA code is bonus. :D
Edit:
This is the code I'm using right now. It's convoluted but I'm basically adding a random deviation to a sinus wave with random amplitude (which I'm not sure if it was a good idea).
private static final int FREQ = 7;
private static final double DEG_TO_RAD = Math.PI / 180;
private static final int MAX_AMPLITUDE = 127;
private static final float DEVIATION = 0.1f; // 10 percent is maximum deviation
private void makeSinusoidRandomBytes() {
byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
int amplitude = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE/2;
byte dev = (byte) (random.nextInt((int) Math.max(Math.abs(2 * DEVIATION * amplitude), 1))
- Math.abs(DEVIATION * amplitude));
bytes[i] = (byte) (Math.sin(i * FREQ * DEG_TO_RAD) * amplitude - dev);
}
this.bytes = bytes;
}
A real soundwave is actually a combination of sine waves of different frequencies and amplitudes added together, not random deviations from a sine wave. The difficult part will be to choose a combination of wave amplitudes and frequencies that will produce the output that you will subjectively like! However, most sound waves have a base frequency and then a number of overtones which "fit into" that wavelength - for example it might have an overtone at 3/2 * the base frequency and at amplitude of 2/3 the base frequency. By combining these overtones and scaling the resulting waveform to the -127 - +127 range, you'll get an actual soundwave.
The following code is C#, but close enough to Java to give you an idea. It's from a game, where I needed to combine many sine waves together to create various types of oscillating effects:
/// <summary>
/// Return a value between 0 and 1 based on a sine-wave oscillating with a given combination of periods at a given point in time
/// </summary>
/// <param name="time">time to get wave value at</param>
/// <param name="periods">lengths of waves</param>
/// <returns>height of wave</returns>
public static float MultiPulse(float time, params float[] periods)
{
float c = 0;
foreach (float p in periods)
{
float cp = (MathHelper.Pi / p) * time;
float s = ((float)Math.Sin(cp) + 1) / 2;
c += s / periods.Length;
}
return c;
}
You probably want to modify that to allow you to specify different amplitudes as well as periods for the waves you are combining.
By combining many widely varying amplitudes and periods (frequencies) you should by trial and error be able to get something convincing.
Based on the idea see sharper gave me, this is the code I'm using right now:
int mainAmp = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE / 2;
int overtoneAmp = random.nextInt(MAX_AMPLITUDE * 2 / 3) - MAX_AMPLITUDE / 3;
int overtone2Amp = random.nextInt(MAX_AMPLITUDE * 4 / 7) - MAX_AMPLITUDE / 2 * 7;
int mainFreq = random.nextInt(7) + 7;
int overtoneFreq = mainFreq * 3 / 2;
int overtone2Freq = mainFreq * 7 / 4;
byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
bytes[i] = (byte) (Math.sin(i * mainFreq * DEG_TO_RAD) * mainAmp
+ Math.sin(i * overtoneFreq * DEG_TO_RAD) * overtoneAmp
+ Math.sin(i * overtone2Freq * DEG_TO_RAD) * overtone2Amp);
}
Main frequency is between 8 and 15 for my app. You can play with those. The other two overtones I'm using are (2 - 1/2)x & (2 - 1/4)x of main frequency. You can add more like (2 - 1/8)x etc. Or use another series of frequencies. I also randomize the amplitude to get a unique wave each time.
These are some waves I'm drawing using this code:
Ok, so basically, I am implementing the following algorithm:
1) Slice signal of size 256 with an overlap of 128
2) Multiply each chunk with the Hanning window
3) Get DFT
4) Compute the abs value sqrt(re*re+im*im)
Plotting these values, as a imshow I get the following result:
This looks ok, it's clearly showing some difference, i.e. the spike where the signal has most amplitude shows. However, in Python I get this result:
I know that I'm doing something right, but, also doing something wrong. I just can't seem to find out where which is making me not think I have done it correctly.
Any rough ideas to where I could be going wrong here? I mean, is plotting the abs value the right way here or not?
Thanks
EDIT:
Result after clamping..
UPDATE:
Code:
for(unsigned j=0; (j < stft_temp[i].size()/2); j++)
{
double v = 10 * log10(stft_temp[i][j].re * stft_temp[i][j].re + stft_temp[i][j].im * stft_temp[i][j].im);
double pixe = 1.5 * (v + 100);
STFT[i][j] = (int) pixe;
}
Typically you might want to use a log magnitude and then scale to the required range, which would usually be 0..255. In pseudo-code:
mag_dB = 10 * log10(re * re + im * im); // get log magnitude (dB)
pixel_intensity = 1.5 * (mag_dB + 100); // offset and scale
pixel_intensity = min(pixel_intensity, 255); // clamp to 0..255
pixel_intensity = max(pixel_intensity, 0);
when observing compressed data, I expect an almost uniformely distributed byte stream. When using the chi square test for measure the distribution, I get this result e.g. for ZIP-files and other compressed data, but not for JPG-files. Last days I spent with finding reasons for this, but I cannot find any.
When calculating the entropy of JPGs, I get a high result (e.g. 7,95 Bits/Byte). I thought there must be a connection between the entropy and the distribution: the entropy is hight, when every byte appears with almost the same probability. But when using chi square, a get a p-value which is about 4,5e-5...
I just want to understand how different distributions influence the test results... I thought I can measure the same property with both tests, but obviously I can not.
Thank you very much for any hint!
tom
Distribution in jpeg-files
Ignoring the meta-information and the jpeg-header-data, the payload of a jpeg consists of blocks describing huffmann-tables or encoded MCUs (Minimum-Coded-Units, square blocks of the size 16x16). There may be others but this are the most frequent ones.
Those blocks are delimited by 0xFF 0xSS, where 0xSS is a specific startcode. Here is the first problem: 0xFF is a bit more frequent as twalberg mentioned in the comments.
It may happen, that 0xFF occur in an encoded MCU. To distinguish between this normal payload and the start of a new block, 0xFF 0x00 is inserted. If the distribution of unstuffed payload is perfectly uniform, 0x00 will be twice as often in the stuffed data. To make bad things worse, every MCU is filled up with binary ones to get byte-alignment (a slight bias to larger values) and we might need stuffing again.
There may be also some other factors I'm not aware of. If you need more information you have to provide the jpeg-file.
And about your basic assumption:
for rand_data:
dd if=/dev/urandom of=rand_data count=4096 bs=256
for rand_pseudo (python):
s = "".join(chr(i) for i in range(256))
with file("rand_pseudo", "wb") as f:
for i in range(4096):
f.write(s)
Both should be uniform regarding byte-values, shouldn't they? ;)
$ ll rand_*
-rw-r--r-- 1 apuch apuch 1048576 2012-12-04 20:11 rand_data
-rw-r--r-- 1 apuch apuch 1048967 2012-12-04 20:13 rand_data.tar.gz
-rw-r--r-- 1 apuch apuch 1048576 2012-12-04 20:14 rand_pseudo
-rw-r--r-- 1 apuch apuch 4538 2012-12-04 20:15 rand_pseudo.tar.gz
A uniform distribution might indicate a high entropy but its not a guarantee. Also, rand_data might consists out of 1MB of 0x00. Its extremely unlikely, but possible.
Here you can find two files: the first one is random data, generated with dev/unrandom (about 46MB), the second one is a normal JPG file (about 9MB). It is obvious that the symbols of the JPG-file are not as equally distributed as in dev/urandom.
If I compare both files:
Entropy:
JPG: 7,969247 Bits/Byte
RND: 7,999996 Bits/Byte
P-Value of chi-square test:
JPG: 0
RND: 0,3621
How can the entropy lead to such a high result?!?
Here is my java code
public static double getShannonEntropy_Image(BufferedImage actualImage){
List<String> values= new ArrayList<String>();
int n = 0;
Map<Integer, Integer> occ = new HashMap<>();
for(int i=0;i<actualImage.getHeight();i++){
for(int j=0;j<actualImage.getWidth();j++){
int pixel = actualImage.getRGB(j, i);
int alpha = (pixel >> 24) & 0xff;
int red = (pixel >> 16) & 0xff;
int green = (pixel >> 8) & 0xff;
int blue = (pixel) & 0xff;
//0.2989 * R + 0.5870 * G + 0.1140 * B greyscale conversion
//System.out.println("i="+i+" j="+j+" argb: " + alpha + ", " + red + ", " + green + ", " + blue);
int d= (int)Math.round(0.2989 * red + 0.5870 * green + 0.1140 * blue);
if(!values.contains(String.valueOf(d)))
values.add(String.valueOf(d));
if (occ.containsKey(d)) {
occ.put(d, occ.get(d) + 1);
} else {
occ.put(d, 1);
}
++n;
}
}
double e = 0.0;
for (Map.Entry<Integer, Integer> entry : occ.entrySet()) {
int cx = entry.getKey();
double p = (double) entry.getValue() / n;
e += p * log2(p);
}
return -e;
}
I've written the following code for frequency modulation of an audio signal. The audio itself is 1 sec long, sampled at 8000 Hz. I want to apply FM to this audio signal by using a sine wave with a frequency of 50 Hz (expressed as a fraction of the sampling frequency). The modulating signal has a modulation index of 0.25 so as to create only one pair of sidebands.
for (i = 0; i < 7999; i++) {
phi_delta = 8000 - 8000 * (1 + 0.25 * sin(2* pi * mf * i));
f_phi_accum += phi_delta; //this can have a negative value
/*keep only the integer part that'll be used as an index into the input array*/
i_phi_accum = f_phi_accum;
/*keep only the fractional part that'll be used to interpolate between samples*/
r_phi_accum = f_phi_accum - i_phi_accum;
//If I'm getting negative values should I convert them to positive
//r_phi_accum = fabs(f_phi_accum - i_phi_accum);
i_phi_accum = abs(i_phi_accum);
/*since i_phi_accum often exceeds 7999 I have to add this if statement so as to prevent out of bounds errors */
if (i_phi_accum < 7999)
output[i] = ((input[i_phi_accum] + input[i_phi_accum + 1])/2) * r_phi_accum;
}
Your calculation of phi_delta is off by a factor of 8000 and an offset - it should be 1 +/- a small value, i.e.
phi_delta = 1.0 + 0.25 * sin(2.0 * pi * mf * i));
which will result in phi_delta having a range of 0.75 to 1.25.