I have a vectorized wav file with values between -1 and 1, 88,200 samples, 44.1 kHz sampling rate to hear the audio within two seconds. I'd like to send the audio through bluetooth to a bluetooth module, arduino, DAC, and 3.5mm breakout board with earbuds.
I am getting crackly audio when I receive it at the end. I tried to recreate this is MATLAB and it turns out to be a combination of the scaling (multiplying + shifting the values over 0) and the sampling rate change due to the receivers. Of course, I could be completely recking the sampling frequency with inefficient Arduino code, but since a factor is also the initial scaling my guess is that I am misunderstanding something fundamental to audio processing.
What is the proper way to format and or scale the values between 0-4095 (which are needed for the DAC input) so that the audio itself is not distorted upon listening due to the scaling factor, sampling rate retention aside? OR is there something else I am missing in the big picture of this?
Clarification: Currently I am using the python sockets library to send an audio string array char by char into an Arduino array and reading them as an integers, then inputting into the DAC. Not sure if python sockets is the best way to go, there should be something better or a more robust implementation of sockets to send the data
UPDATE: I realized that the HC-05 uses SPP bluetooth protocol, which seems to be waaay too low resolution to send reliable audio. I will see if I can send a more compressed audio file, store it in the arduino, then output to the DAC. That could provide more reliable audio.
Have you tried setting in and out values in your samples? I know that video that includes audio, that could be one thing being overlooked, anyhow, that can cause issues for uploading to YouTube. It seems similar to this, because it might not know where to begin and end and it can affect audio too.
Another issue may be the format of the samples, against Bluetooth technology. AAC should probably be the format, but confirm this because I am not 100% sure what all it will accept.
The library has an example for bandwidth:
https://www.arduino.cc/en/Reference/AudioFrequencyMeter
But there are other functions for begin() and end(). You could declare them as variable to your start and end times within the samples, such that one will be the active track at a given time. You could also declare your frequency() as a constant value of 44.1, but you might have to escape the period for that. (It otherwise reads 60 to 1500.)
Related
I have a .MP3 file stored on my server, and I'd like to modify it to be a bit lower in pitch. I know this can be achieved by increasing the length of the audio, however, I don't know of any libraries in node that can do this.
I've tried using the node web audio api, and soundbank-pitch-shift, but the former doesn't seem to have the capabilities of pitch shifting (AFAIK), and the latter seems designed toward client
I need the solution within the realm of node ONLY- that means no external programs, etc., and it needs to be automated as well, so I can't manually pitch shift.
An ideal solution would be a function that takes a file/filepath as an input, and then creates (or overwrites) another MP3 file but with the pitch shifted by x amount, but really, any solution that produces something with a lower pitch than the original, works.
I'm totally lost here. Please help.
An audio file is basically a list of numbers. Those numbers are read one at a time at a particular speed called the 'sample rate'. The sample rate is otherwise defined as the number of audio samples read every second e.g. if an audio files sample rate is 44100, then there are 44100 samples (or numbers) read every second.
If you are with me so far, the simplest way to lower the pitch of an audio file is to play the file back at a lower sample rate (which is normally fixed in place). In most cases you wont be able to do this, so you need to achieve the same effect by resampling the file i.e adding new samples to the file in between the old samples to make it literally longer. For this you would need to understand interpolation.
The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed.
You may also have problems doing this using MP3 files. In this case you may have to uncompress the data in the MP3 file before you can operate on it in such a way that changes the pitch of the file. WAV files are more ideal in audio processing. In any case, you essentially need to turn the file into a list of floating point numbers, and change those numbers to be effectively read back at a slower rate.
Other methods of pitch shifting would probably need to involve the use of ffts, and would be a more complicated affair to say the least.
I am not familiar with nodejs I'm afraid.
I managed to get it working with help from Ollie M's answer and node-lame.
I hadn't known previously that sample rate could affect the speed, but thanks to Ollie, suddenly this problem became a lot more simple.
Using node-lame, all I did was take one of the examples (mp32wav.js), and make it so that I change the parameter sampleRate of the format object, so that it is lower than the base sample rate, which in my application was always a static 24,000. I could also make it dynamic since node-lame can grab the parameters of the input file in the format object.
Ollie, however perfectly describes the drawback with this method
The drawback to this technique in either case is that the sound will
also play back at a slower speed, as well as at a lower pitch. If it
is a problem that the sound has slowed down as well as lowered in
pitch as a result of your processing, then you will also have to use a
timestretching algorithm to fix the playback speed.
I don't have a particular need to implement a time stretching algorithm at the moment (thankfully, because that's a whole other can of worms), since I have the ability to change the initial speed of the file, but others may in the future.
See https://www.npmjs.com/package/audio-decode, https://github.com/audiojs/audio-buffer, and related linked at bottom of audio-buffer readme.
Is there an open standard for transmitting M2M data via audio?
Use case example: I want to broadcast a public PGP key via some sort of audio output.
You want to use "Frequency Shift Keying". It works by encoding bits into different frequencies of tones.
If you're doing that on Linux, use Mini Modem (http://www.whence.com/minimodem/).
If you are trying to accomplish this on windows, there are a number of amateur radio tools that transmit FSK signals over a soundcard. Try googling for "Sound card FSK modem"
Just for fun, here's a link to the "Kansas City Standard", which was intend for storing data on audio cassete: https://www.wikiwand.com/en/Kansas_City_standard
I hope someone can provide a simpler solution but so far I've found that a process called modulation was used to transmit data over audio in the days of dial-up:
Data to audio and back. Modulation / demodulation with source code
If time is not critical then DTMF - up to roughly 10 signs or 40 bits per second.
I have a situation where I have a video capture of HD content via HDMI with audio from a sound board that goes through a impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively I can capture the audio via USB from the soundboard which is probably the best plan, but carries with it the same issue.
The point is that the line in or usb capture will be much higher quality than the one on HDMI because the line out -> impedance change -> mic in path generates inferior quality in that simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera captured sound and load each into
audacity and pretty quickly use the timeshift toot to perfectly fit
the good audio to the questionable audio from the HDMI capture and
cut the good audio to the exact size of the video. Then I can use
ffmpeg or other video editing software to replace the questionable
audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect but have yet to confirm that the system timestamp of the start time may be recorded in both audio captured with something like Audacity, or the USB capture tool from the sound board as well as the HDMI mpeg-2 video. I tried ffprobe on a couple audacity captured .wav files but didn't see anything in the results about such a time code, but perhaps other audio formats or other probing tools may include this info. Can anyone advise if this is common with any particular capture tools or file formats?
if so, I think I could get best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably directly from the two sources in one ffmpeg call. This is all theoretical for me right now-- I've never tried either of these filters yet-- just trying to optimize against blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherant delays. Possibly these delays will be found to be nearly constant and the approach can work with a built-in constant anticipation delay but sounds messy and less reliable than idea 1. Still, I'd take it, if it turns out reasonably reliable
Are there any ffmpeg or general digital audio experts out there that know of particular filters that can be employed on the actual data to look for similarities like normalizing the peak amplitudes or normalizing the amplification of the two to some RMS value and then stepping through a short 10 second snippet of audio, moving one time stream .01s left against the other repeatedly and subtracting the two and looking for a minimum? Sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. But I have only rudimentary knowledge of audio streams and perhaps what I suggest is just not plausible-- but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore if I can make it work.
any hardware level suggestions to take a line level output down to a mic level input and not have the problems I am seeing using a simple in-line impedance drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestinons!
I am developing a software which can auto record and extract every words in my voice. I used portaudio library to solve it. But I am stuck on detecting the sound: I set the silence's value is zero so if there is a sample which is zero, it must be a start or end point of a sound. But when I ran it, the program created many words. I think because the value I read by portaudio is raw data, so it can't be processed like that. Am I right? How can I fix it? By the way, I am coding in C++ :D
To detect the presence of a signal in a PCM stream you be able to detect it. As dprogramz put said, the noise floor of your soundcard is probably not perfect and so there will be some noise signal recorded (even with no mic connected).
The solution is to use a VOX or VAD algorithm to detect the presence of your voice. VOX can be tricky, since in most consumer grade electronics the noise floor is just low enough to be "silence" to the human ear, relative to the signal. This means that the difference on amplitude between the noise floor and signal may be slight. If your sound card has AGC turned on this can make it even more difficult, since the noise floor may move. Having said that, VOX can be implemented successfully on consumer grade equipment. It just takes more effort to establish the threshold. When done best the threshold is calculated periodically while the stream is active.
If I were doing this I'd implement a VAD algorithm. Since your objective is to detect your voice this should provide a reliable result regardless of the equipment you use.
I don't think it's because it is a RAW value. RAW sound files are a bitstream of frequency and volume information.
However, the value will rarely (if ever) be zero. You have to take into account there is a small amount of electrical noise that is made by the mic. Figure out the "idle" dB of your mic (just test the level when you aren't talking into it). You Then need to set a silence threshold (below a certain dB level for a certain number of samples) to detect the beginning/end. Attempting to detect a zero value is gonna be near impossible.
I am wondering on how to estimate where I am currently in an audio with regards to time, by using the data.
For example, I read data by byte[8192] blocks. How can I know how much byte[8192] is equivalent to in time?
If this is some sort of raw-ish encoding, like PCM, this is simple. The length in time is a function of the sample rate, bit depth, and number of channels. 30 seconds of 16-bit audio at 44.1kHz in mono is 2.5MB. However, you also need to factor in headers and container format crapola. WAV files for example can have a lot of other stuff in them.
Compressed formats are much more tricky. You can never be sure where you are without playing through the file to get to where you are. Of course you can always guesstimate based on the percentage of the file length, if that is good enough for your case.
I think this is not what he was asking.
First you have to tell us what kind of data you are using. WAV? MP3? Usually without knowing where that block came from - so you know if you have some kind of frame information and where to find it - you are not able to determine that block's position.
If you have the full stream and this data then you can do a search