Creating an audio file based on frequencies - node.js

I'm using Node.js for a project I'm doing.
The project converts words into numbers and then takes those numbers and creates an audio output.
The audio output should play the numbers as frequencies. For example, I have an array of numbers [913, 250, 352] and I want to play those numbers as frequencies.
I know I can play them in the browser with the Web Audio API or any other third-party package that allows me to do so.
The thing is that I want to create an audio file. I tried to convert those numbers into notes and then save them as a MIDI file. I succeeded, but the problem is that the MIDI file takes each frequency and converts it into the closest note (for example, 913 Hz is converted into 932.33 Hz, which is MIDI note number 81):
// add a track
var array = gematriaArray
var count = 0
var track = midi.addTrack()
var note
for (var i = 0; i < array.length; i++) {
  note = array[i]
  // here I'm converting the frequency -> MIDI note number
  track.addNote({
    midi: ftom(parseInt(note)),
    time: count,
    duration: 3
  })
  count++
}
// write the output
fs.writeFileSync('./public/sounds/' + name + random + '.mid', Buffer.from(midi.toArray()))
I searched the internet but couldn't find anything that helps.
I really want a file that the user can download, with those numbers played as exact frequencies. Does anyone know what can be done to get this result?
Thanks in advance to the helpers.

This function populates a buffer with floating-point values which represent the height of the raw audio curve for the given frequency:
var pop_audio_buffer_custom = function (number_of_samples, given_freq, samples_per_second) {
    number_of_samples = Math.round(number_of_samples);
    var audio_obj = {};
    var source_buffer = new Float32Array(number_of_samples);
    audio_obj.buffer = source_buffer;
    var incr_theta = (2.0 * Math.PI * given_freq) / samples_per_second;
    var theta = 0.0;
    for (var curr_sample = 0; curr_sample < number_of_samples; curr_sample++) {
        audio_obj.buffer[curr_sample] = Math.sin(theta);
        console.log(audio_obj.buffer[curr_sample], "theta ", theta);
        theta += incr_theta;
    }
    return audio_obj;
}; // pop_audio_buffer_custom
var number_of_samples = 10000; // long enough to be audible
var given_freq = 300;
var samples_per_second = 44100; // CD quality sample rate
var wav_output_filename = "/tmp/wav_output_filename.wav";
var synthesized_obj = {};
synthesized_obj.buffer = pop_audio_buffer_custom(number_of_samples, given_freq, samples_per_second).buffer;
The world of digital audio is non-trivial. Once you have an audio buffer, the next step is to translate the floating-point representation into something that can be stored in bytes (typically 16-bit integers, depending on your choice of bit depth). That 16-bit integer buffer then needs to be written out as a WAV file.
Audio is a wave, sometimes called a time series. When you pound your fist onto the table, the table wobbles up and down, which pushes tiny air molecules in unison with that wobble. This wobbling of air propagates across the room and reaches a microphone diaphragm, or maybe your eardrum, which in turn wobbles in resonance with the wave. If you glued a pencil onto the diaphragm so it wobbled along with it, and you slowly slid a strip of paper along the lead tip of the pencil, you would see a curve being written onto that paper strip. This is the audio curve. An audio sample is just the height of that curve at an instant of time. If you repeatedly wrote down this curve-height value X times per second at a constant rate, you would have a list of data points of raw audio (this is what the function above creates). So a given audio sample is simply the value of the audio curve height at a given instant in time. Since computers are discrete rather than continuous, they cannot handle the entire pencil-drawn curve, so they only keep this list of instantaneously measured curve-height values. Those are audio samples.
The 32-bit floating-point buffer above can be fed into the following function to return a 16-bit integer buffer:
var convert_32_bit_float_into_signed_16_bit_int_lossy = function (input_32_bit_buffer) {
    // this method is LOSSY - intended as a preliminary step when saving audio into WAV format files
    // output is a byte array where each 16 bit sample
    // is spread across two bytes in little endian ordering
    var size_source_buffer = input_32_bit_buffer.length;
    var buffer_byte_array = new Uint8Array(size_source_buffer * 2); // two bytes per 16 bit sample
    var value_16_bit_signed_int;
    var index_byte = 0;
    console.log("size_source_buffer", size_source_buffer);
    for (var index = 0; index < size_source_buffer; index++) {
        value_16_bit_signed_int = ~~((0 < input_32_bit_buffer[index]) ? input_32_bit_buffer[index] * 0x7FFF :
                                                                        input_32_bit_buffer[index] * 0x8000);
        buffer_byte_array[index_byte] = value_16_bit_signed_int & 0xFF;            // least significant byte
        buffer_byte_array[index_byte + 1] = (value_16_bit_signed_int >> 8) & 0xFF; // most significant byte
        index_byte += 2;
    }
    // ---
    return buffer_byte_array;
};
The next step is to persist the above 16-bit int buffer into a WAV file. I suggest you use one of the many Node.js libraries for that (or, even better, write your own, as it's only two pages of code ;-). A rough sketch of the idea follows.
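To make that concrete, here is a sketch of hand-rolling the canonical 44-byte PCM WAV header in plain Node.js and writing it in front of the 16-bit sample bytes. It reuses pop_audio_buffer_custom, convert_32_bit_float_into_signed_16_bit_int_lossy, samples_per_second and wav_output_filename from the snippets above; write_wav_file and frequencies are names made up for this example, and a real project may well prefer an existing WAV library.
var fs = require("fs");

// minimal sketch: build a 44 byte canonical WAV header for 16 bit mono PCM
// and write it followed by the raw little-endian sample bytes
var write_wav_file = function (filename, byte_samples, sample_rate) {
    var num_channels = 1;
    var bits_per_sample = 16;
    var byte_rate = sample_rate * num_channels * bits_per_sample / 8;
    var block_align = num_channels * bits_per_sample / 8;
    var data_size = byte_samples.length;

    var header = Buffer.alloc(44);
    header.write("RIFF", 0);                      // ChunkID
    header.writeUInt32LE(36 + data_size, 4);      // ChunkSize
    header.write("WAVE", 8);                      // Format
    header.write("fmt ", 12);                     // Subchunk1ID
    header.writeUInt32LE(16, 16);                 // Subchunk1Size (16 for PCM)
    header.writeUInt16LE(1, 20);                  // AudioFormat (1 = linear PCM)
    header.writeUInt16LE(num_channels, 22);       // NumChannels
    header.writeUInt32LE(sample_rate, 24);        // SampleRate
    header.writeUInt32LE(byte_rate, 28);          // ByteRate
    header.writeUInt16LE(block_align, 32);        // BlockAlign
    header.writeUInt16LE(bits_per_sample, 34);    // BitsPerSample
    header.write("data", 36);                     // Subchunk2ID
    header.writeUInt32LE(data_size, 40);          // Subchunk2Size

    fs.writeFileSync(filename, Buffer.concat([header, Buffer.from(byte_samples)]));
};

// stitch the original frequency array into one downloadable file, one second per tone
var frequencies = [913, 250, 352];
var chunks = frequencies.map(function (freq) {
    var floats = pop_audio_buffer_custom(samples_per_second, freq, samples_per_second).buffer;
    return Buffer.from(convert_32_bit_float_into_signed_16_bit_int_lossy(floats));
});
write_wav_file(wav_output_filename, Buffer.concat(chunks), samples_per_second);
Unlike the MIDI route, nothing here snaps 913 Hz to the nearest note; the sine generator uses the exact frequency.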

Related

Polyline encode gets wrong lat/lng after decoding

We are using Google's Polyline decoding algorithm to decode our coordinates, but in our case most of the coordinates are wrong after decoding. We have also tested the process with higher precision.
This is our code and also our logs to test that the coordinates are wrong:
let coordinates = [ [lat, lng], [...], ...];
console.log(coordinates[13347]); // Output: [ 13.44668, 52.47429 ]
let encoded = Polyline.encode(coordinates);
let decoded = Polyline.decode(encoded);
console.log(decoded[13347]); // Output: [ 13.44671, 52.47445 ]
console.log(coordinates.length == decoded.length); // true
In this case the distance is 20 meters, which is a lot. Other points are off by 150 meters or even more.
My coordinates array contains around 250,000 coordinates which we want to encode and decode.
Am I missing something that makes the encode/decode process fail this hard?
TL;DR
Add the following lines after the declaration of the coordinates variable:
coordinates = coordinates.map(
    pair => { return [pair[0].toFixed(5), pair[1].toFixed(5)]; }
);
Full answer
It looks like you're dealing with floating-point rounding errors. The library you use probably has an incorrect implementation of the polyline encoding algorithm.
In the description of the algorithm we read that the encoded string generated by the algorithm stores the differences between consecutive coordinates using fixed-precision numbers (with 5 decimal places). Therefore it is important to round the latitude and longitude to 5 decimal places before computing the differences. Without that step, the rounding errors may accumulate. In the worst case error may increase by about 0.000005 deg for each subsequent item in the encoded list.
The official implementation of the algorithm does not introduce accumulated rounding errors. However, the implementation found in NPM (package polyline) gives incorrect results that indicate the invalid rounding of numbers.
Please look at the examples below:
Example 1. Encoding a polyline using official implementation of the algorithm
(using google.maps.geometry.encoding.encodePath from the Google Maps JavaScript API)
originalList = [];
for (var i = 0; i < 100; ++i)
    originalList.push(
        new google.maps.LatLng(6 * i / 1000000, 0)
    );
// originalList looks like: [[0.000000,0],[0.000006,0],[0.000012,0],[0.000018,0], ..., [0.000594,0]];
// (but with LatLng objects instead of 2-element arrays)
console.log(originalList[99].lat()) // 0.000594
var encodedList = google.maps.geometry.encoding.encodePath(originalList)
var decodedList = google.maps.geometry.encoding.decodePath(encodedList)
console.log(decodedList[99].lat()) // 0.00059
Example 2. Encoding a polyline using package polyline from NPM
let Polyline = require('polyline');
var originalList = [];
for (var i = 0; i < 100; ++i)
    originalList.push(
        [6 * i / 1000000, 0]
    );
// again: originalList == [[0.000000,0],[0.000006,0],[0.000012,0],[0.000018,0], ..., [0.000594,0]];
console.log(originalList[99][0]) // 0.000594
var encodedList = Polyline.encode(originalList);
var decodedList = Polyline.decode(encodedList);
console.log(decodedList[99][0]) // 0.00099
Invalid result: the values 0.000594 and 0.00099 differ by more than 0.000005.
Possible fix
The library that you're using probably doesn't round the coordinates before computing the differences.
For example when two consecutive points have latitudes 0.000000 and 0.000006, the difference is 0.000006 and it is rounded to 0.00001 giving error of 0.000004.
You may want to round the coordinates manually before passing them to Polyline.encode(), e.g. using .toFixed(5):
let Polyline = require('polyline');
var originalList = [];
for (var i = 0; i < 100; ++i)
    originalList.push(
        [(6 * i / 1000000).toFixed(5), 0]
    );
// before rounding: [[ 0.000000,0],[ 0.000006,0],[ 0.000012,0],[ 0.000018,0], ..., [ 0.000594,0]];
// after rounding:  [['0.00000',0],['0.00001',0],['0.00001',0],['0.00002',0], ..., ['0.00059',0]];
console.log(originalList[99][0]) // 0.00059
var encodedList = Polyline.encode(originalList);
var decodedList = Polyline.decode(encodedList);
console.log(decodedList[99][0]) // 0.00059
Polyline encoding is lossy:
https://developers.google.com/maps/documentation/utilities/polylinealgorithm ("Polyline encoding is a *lossy* compression algorithm that allows you to store a series of coordinates as a single string.")
How about using your own encoding scheme? The page above also shows the encoding scheme used by Google. Perhaps you can look for a trade-off between space and accuracy.
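To sketch that idea: the accumulation problem disappears entirely if each coordinate is snapped to the 1e-5 grid before the deltas are taken, regardless of which character encoding you use afterwards. The helpers below (toIntegerDeltas and fromIntegerDeltas are made-up names, not part of the polyline package) only illustrate the round-first, difference-second order of operations.
// round-first delta encoding sketch: work in integer 1e-5 units so errors cannot accumulate
function toIntegerDeltas(coordinates) {
    let prevLat = 0, prevLng = 0;
    const deltas = [];
    for (const [lat, lng] of coordinates) {
        const latE5 = Math.round(lat * 1e5);   // snap to the storage grid first
        const lngE5 = Math.round(lng * 1e5);
        deltas.push([latE5 - prevLat, lngE5 - prevLng]); // then difference exactly, in integers
        prevLat = latE5;
        prevLng = lngE5;
    }
    return deltas;
}

function fromIntegerDeltas(deltas) {
    let lat = 0, lng = 0;
    return deltas.map(([dLat, dLng]) => {
        lat += dLat;
        lng += dLng;
        return [lat / 1e5, lng / 1e5]; // worst-case error is a single rounding, never a sum of them
    });
}

// round-trips [13.44668, 52.47429] back to exactly [13.44668, 52.47429]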

How do I swap stereo channels in raw PCM audio data on OS X?

I'm writing audio from an external decoding library on OS X to an AIFF file, and I am able to swap the endianness of the data with OSSwapInt32().
The resulting AIFF file (16-bit PCM stereo) does play, but the left and right channels are swapped.
Would there be any way to swap the channels as I am writing each buffer?
Here is the relevant loop:
do
{
    xmp_get_frame_info(writer_context, &writer_info);
    if (writer_info.loop_count > 0)
        break;

    writeModBuffer.mBuffers[0].mDataByteSize = writer_info.buffer_size;
    writeModBuffer.mBuffers[0].mNumberChannels = inputFormat.mChannelsPerFrame;

    // Set up our buffer to do the endianness swap
    void *new_buffer;
    new_buffer = malloc((writer_info.buffer_size) * inputFormat.mBytesPerFrame);
    int *ourBuffer = writer_info.buffer;
    int *ourNewBuffer = new_buffer;
    memset(new_buffer, 0, writer_info.buffer_size);

    int i;
    for (i = 0; i < writer_info.buffer_size; i++)
    {
        ourNewBuffer[i] = OSSwapInt32(ourBuffer[i]);
    }

    writeModBuffer.mBuffers[0].mData = ourNewBuffer;
    frame_size = writer_info.buffer_size / inputFormat.mBytesPerFrame;
    err = ExtAudioFileWrite(writeModRef, frame_size, &writeModBuffer);
} while (xmp_play_frame(writer_context) == 0);
This solution is very specific to 2-channel audio. I chose to do it in the same loop where you change the byte ordering, to avoid an extra pass. The loop runs half as many times and processes two samples per iteration. The samples are interleaved, so I copy odd sample indexes into even sample indexes and vice versa.
for (i = 0; i < writer_info.buffer_size / 2; i++)
{
    ourNewBuffer[i*2]     = OSSwapInt32(ourBuffer[i*2 + 1]);
    ourNewBuffer[i*2 + 1] = OSSwapInt32(ourBuffer[i*2]);
}
An alternative is to use a table lookup for channel mapping.
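To sketch the table-lookup idea (shown in JavaScript to keep one language across this page; remapChannels and channelMap are made-up names): each output channel index reads from the input channel given by a small map, which generalizes the hard-coded left/right swap to any interleaved channel count.
// generic remap for interleaved 16 bit samples; channelMap[outCh] = input channel to read from
// e.g. [1, 0] swaps left/right in a stereo stream, [0, 1] is a plain copy
function remapChannels(samples, numChannels, channelMap) {
    const out = new Int16Array(samples.length);
    const frames = samples.length / numChannels;
    for (let frame = 0; frame < frames; frame++) {
        for (let outCh = 0; outCh < numChannels; outCh++) {
            out[frame * numChannels + outCh] = samples[frame * numChannels + channelMap[outCh]];
        }
    }
    return out;
}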

When reading a WAV file, dataID is printed as "fact" and not "data"

I'm new to audio playback and have spent the day reading over the WAV file specification. I wrote a simple program to extract the header of a file, but right now my program always returns false, as the dataID keeps coming back as "fact" instead of "data".
There are a few reasons I believe this could be happening.
The file I am reading has a format chunk size of 18, whereas this resource states a valid PCM file should have a format size of 16.
The format code of the file I am reading is 6, meaning it has probably been compressed.
The value of dataSize is far too small (only 4), even though the file has 30 seconds of playback when run through VLC or Windows Media Player.
The code I am using is as follows:
using (var reader = new BinaryReader(File.Open(wavFile, FileMode.Open)))
{
    // Read all descriptor info into variables to be passed
    // to an ASWAVFile instance.
    var chunkID = reader.ReadBytes(4);       // Should contain "RIFF"
    var chunkSize = reader.ReadBytes(4);
    var format = reader.ReadBytes(4);        // Should contain "WAVE"
    var formatID = reader.ReadBytes(4);      // Should contain "fmt"
    var formatSize = reader.ReadBytes(4);    // 16 for PCM format.
    var formatCode = reader.ReadBytes(2);    // Determines linear quantization - 1 = PCM, else it has been compressed
    var channels = reader.ReadBytes(2);      // mono = 1, stereo = 2
    var sampleRate = reader.ReadBytes(4);    // 8000, 44100 etc.
    var byteRate = reader.ReadBytes(4);      // SampleRate * Channels * BitsPerSample / 8
    var blockAlign = reader.ReadBytes(2);    // Channels * BitsPerSample / 8
    var bitsPerSample = reader.ReadBytes(2); // e.g. 8 or 16
    var padding = byteToInt(formatSize);

    // Read any extra values so we can jump to the data chunk - extra padding should only be set here
    // if formatSize is 18
    byte[] fmtExtraSize = new byte[2];
    if (padding == 18)
    {
        fmtExtraSize = reader.ReadBytes(2);
    }

    // Read the final header information in, we can then set
    // other
    var dataID = reader.ReadBytes(4);   // Should contain "data"
    var dataSize = reader.ReadBytes(4); // Calculated by Samples * Channels * BitsPerSample / 8

    // Check if the file is in the correct format
    if (
        System.Text.ASCIIEncoding.Default.GetString(chunkID) != "RIFF" ||
        System.Text.ASCIIEncoding.Default.GetString(format) != "WAVE" ||
        System.Text.ASCIIEncoding.Default.GetString(formatID) != "fmt" ||
        System.Text.ASCIIEncoding.Default.GetString(dataID) != "data"
    )
    {
        return false;
    }

    //file = new ASWAVFile();
}
If I dump the values of chunkID, format, formatID and dataID I get:
RIFF, WAVE, fmt, fact
Causing the method to return false. Why is this happening?
The RIFF specification doesn't require the 'data' chunk to follow the 'fmt' chunk. You may see some files that write a 'pad' chunk after the 'fmt' chunk to ensure page alignment for better streaming.
http://en.wikipedia.org/wiki/WAV
Also, the format code indicates the audio compression type, as you noted. Valid format codes are listed in mmreg.h (on Windows); format 6 is aLaw, indeed a compression type.
http://www-mmsp.ece.mcgill.ca/documents/audioformats/wave/Docs/MMREG.H
Your best bet is to write code that reads each chunk header, checks whether it is the type you want, and skips past it to the next chunk if it isn't. Something along these lines:
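The question's reader is C#, but the chunk-walking idea is the same in any language; here is a rough Node.js sketch (scanWavChunks is a made-up name) that walks the RIFF chunks, records where each one starts, and therefore finds "data" no matter how many "fact", "LIST" or padding chunks sit in front of it.
const fs = require("fs");

// walk the RIFF chunks of a WAV file and return the offset/size of each chunk body
function scanWavChunks(path) {
    const buf = fs.readFileSync(path);
    if (buf.toString("ascii", 0, 4) !== "RIFF" || buf.toString("ascii", 8, 12) !== "WAVE") {
        throw new Error("not a RIFF/WAVE file");
    }
    const chunks = {};
    let offset = 12; // first chunk starts right after the 12-byte RIFF header
    while (offset + 8 <= buf.length) {
        const id = buf.toString("ascii", offset, offset + 4); // e.g. "fmt ", "fact", "data"
        const size = buf.readUInt32LE(offset + 4);            // chunk body size in bytes
        chunks[id] = { offset: offset + 8, size: size };
        offset += 8 + size + (size % 2);                      // chunk bodies are word-aligned
    }
    return chunks;
}

// usage: the "data" chunk is wherever it is, not necessarily right after "fmt "
// const chunks = scanWavChunks("input.wav");
// console.log(Object.keys(chunks)); // e.g. [ 'fmt ', 'fact', 'data' ]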

How to detect string tone from FFT

I've got a spectrum from a Fourier transformation. It looks like this:
Police was just passing nearby
Color represents intensity.
X axis is time.
Y axis is frequency - where 0 is at top.
While whistling or a police siren leaves only one trace, many other tones seem to contain a lot of harmonic frequencies.
Electric guitar plugged directly into microphone (standard tuning)
The really bad thing is that, as you can see, there is no clearly dominant intensity - there are 2-3 frequencies that are almost equal.
I have written a peak detection algorithm to highlight the most significant peak:
function findPeaks(data, look_range, minimal_val) {
    if (look_range == null)
        look_range = 10;
    if (minimal_val == null)
        minimal_val = 20;

    // Array of peaks
    var peaks = [];
    // Currently the max value (that might or might not end up in the peaks array)
    var max_value = 0;
    var max_value_pos = 0;
    // How many values did we check without changing the max value
    var smaller_values = 0;
    // Tmp variable for performance
    var val;
    var lastval = Math.round(data.averageValues(0, 4));
    //console.log(lastval);

    for (var i = 0, l = data.length; i < l; i++) {
        // Remember the value for performance and readability
        val = data[i];
        // If the last max value is larger than the current one, proceed and remember
        if (max_value > val) {
            // Count the values that are smaller than our champion
            smaller_values++;
            // If there have been enough smaller values, we take this one as a confirmed peak
            if (smaller_values > look_range) {
                // Remember the peak
                peaks.push(max_value_pos);
                // Reset other variables
                max_value = 0;
                max_value_pos = 0;
                smaller_values = 0;
            }
        }
        // Only take values when the difference is positive (the next value is larger)
        // Also only take values that are larger than the minimum threshold
        else if (val > lastval && val > minimal_val) {
            // Remember this as our new champion
            max_value = val;
            max_value_pos = i;
            smaller_values = 0;
            //console.log("Max value: ", max_value);
        }
        // Remember this value for the next iteration
        lastval = val;
    }

    // Sort peaks so that the largest one is first
    peaks.sort(function (a, b) { return data[b] - data[a]; });
    //if (peaks.length > 0)
    //    console.log(peaks);

    // Return the array
    return peaks;
}
The idea is that I walk through the data and remember a value that is larger than the threshold minimal_val. If the next look_range values are smaller than the chosen value, it's considered a peak. This algorithm is not very smart but it's very easy to implement.
However, it can't tell which is the major frequency of the string, much like I anticipated:
The red dots highlight the strongest peak
Here's a jsFiddle to see how it really works (or rather doesn't work).
What you see in the spectrum of a string tone is the set of harmonics at
f0, 2*f0, 3*f0, ...
with f0 being the fundamental frequency or pitch of your string tone.
To estimate f0 from the spectrum (the output of the FFT, absolute value, probably logarithmic) you should not look for the strongest component, but at the distance between all these harmonics.
One very nice method to do so is a second (inverse) FFT of the (abs, real) spectrum. This produces a strong line at t0 == 1/f0.
The sequence fft -> abs() -> inverse fft is equivalent to calculating the auto-correlation function (ACF), thanks to the Wiener–Khinchin theorem.
The precision of this approach depends on the length of the FFT (or ACF) and your sampling rate. You can improve precision a lot if you interpolate the "real" max between the sampling points of the result using a sinc function.
For even better results you could correct the intermediate spectrum: most sounds have an average pink spectrum. If you amplify the higher frequencies (according to an inverse pink spectrum) before the inverse FFT, the ACF will be "better" (it takes the higher harmonics more into account, improving accuracy).
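Because of that Wiener–Khinchin equivalence, the ACF can also be computed directly in the time domain, which keeps a sketch free of FFT library details (at the cost of speed). A rough illustration (estimatePitchACF is a made-up name) of picking f0 as the lag with the strongest autocorrelation:
// estimate the fundamental frequency of mono time-domain samples by brute-force autocorrelation
// the lag search range roughly covers guitar pitches
function estimatePitchACF(samples, sampleRate, minFreq = 60, maxFreq = 1000) {
    const minLag = Math.floor(sampleRate / maxFreq);
    const maxLag = Math.floor(sampleRate / minFreq);
    let bestLag = minLag;
    let bestCorr = -Infinity;
    for (let lag = minLag; lag <= maxLag; lag++) {
        let corr = 0;
        for (let i = 0; i + lag < samples.length; i++) {
            corr += samples[i] * samples[i + lag]; // autocorrelation at this lag
        }
        if (corr > bestCorr) {
            bestCorr = corr;
            bestLag = lag; // strongest self-similarity corresponds to one period of the tone
        }
    }
    return sampleRate / bestLag; // f0 = 1 / t0
}
In practice the fft -> abs() -> inverse fft route described above is much faster for long buffers, and interpolating around the winning lag (e.g. with the sinc approach mentioned above) refines the estimate.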

What exactly does a Sample Rate of 44100 sample?

I'm using the FMOD library to extract PCM from an MP3. I get the whole 2-channel, 16-bit thing, and I also get that a sample rate of 44100 Hz means 44,100 samples of "sound" in 1 second. What I don't get is what exactly the 16-bit value represents. I know how to plot coordinates on an xy axis, but what am I plotting? The y axis represents time; what does the x axis represent? Sound level? Is that the same as amplitude? How do I determine the different sounds that compose this value? I mean, how do I get a spectrum from a 16-bit number?
This may be a separate question, but it's actually what I really need answered: how do I get the amplitude at every 25 milliseconds? Do I take 44,100 values and divide by 40 (40 * 0.025 seconds = 1 sec)? That gives 1102.5 samples; so would I feed 1102 values into a black box that gives me the amplitude for that moment in time?
I edited the original post to add code I plan to test soon (note: I changed the frame rate from 25 ms to 40 ms):
// 44100 / 25 frames = 1764 samples per frame -> 1764 * 2 channels * 2 bytes [16 bit sample] = 7056 bytes
private const int CHUNKSIZE = 7056;

uint bytesread = 0;
var squares = new double[CHUNKSIZE / 4];
const double scale = 1.0d / 32768.0d;
do
{
    result = sound.readData(data, CHUNKSIZE, ref read);
    Marshal.Copy(data, buffer, 0, CHUNKSIZE);

    // PCM samples are 16 bit little endian
    Array.Reverse(buffer);
    for (var i = 0; i < buffer.Length; i += 4)
    {
        var avg = scale * (Math.Abs((double)BitConverter.ToInt16(buffer, i)) + Math.Abs((double)BitConverter.ToInt16(buffer, i + 2))) / 2.0d;
        squares[i >> 2] = avg * avg;
    }
    var rmsAmplitude = ((int)(Math.Floor(Math.Sqrt(squares.Average()) * 32768.0d))).ToString("X2");

    fs.Write(buffer, 0, (int)read);
    bytesread += read;
    statusBar.Text = "writing " + bytesread + " bytes of " + length + " to output.raw";
} while (result == FMOD.RESULT.OK && read == CHUNKSIZE);
After loading the MP3, it seems my rmsAmplitude is in the range 3C00 to 4900. Have I done something wrong? I was expecting a wider spread.
Yes, a sample represents amplitude (at that point in time).
To get a spectrum, you typically convert from the time domain to the frequency domain.
For the last question: multiple approaches are used - you may want the RMS (see the sketch below).
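To illustrate that last point (a sketch in JavaScript rather than the question's C#, to keep one language on this page; rmsPerFrame is a made-up name): square each sample in a fixed-size window, average the squares, and take the square root, giving one amplitude number per frame, e.g. one per 25 ms at 44.1 kHz.
// compute one RMS amplitude per frame of mono samples normalized to the range [-1, 1]
// 25 ms at 44100 Hz => 1102 samples per frame (the fractional remainder is simply left over)
function rmsPerFrame(samples, frameLength = 1102) {
    const frames = [];
    for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
        let sumOfSquares = 0;
        for (let i = start; i < start + frameLength; i++) {
            sumOfSquares += samples[i] * samples[i];
        }
        frames.push(Math.sqrt(sumOfSquares / frameLength)); // root of the mean of the squares
    }
    return frames;
}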
Generally, the x axis is the time value and the y axis is the amplitude. To get the frequency content, you need to take the Fourier transform of the data (most likely using the Fast Fourier Transform [FFT] algorithm).
To use one of the simplest "sounds", let's assume you have a single-frequency tone with frequency f. This is represented (in the amplitude/time domain) as y = sin(2 * pi * f * x).
If you convert that into the frequency domain, you just end up with Frequency = f.
Each sample represents the voltage of the analog signal at a given time.
