I want to convert bits of data to sound, for example 1010 to beep-nobeep-beep-nobeep. How can I compress more bits into it?
You can encode a lot of information into sound by using a Fourier transform.
You can use patterns: assume a long beep marks the beginning of a pattern ID, then beep the pattern ID, and end it with another long beep.
(If I understood your question...)
or - you can compress the data using a compression library and beep the results...
and then uncompress...
Speech coding does exactly what you are referring to in old fax machines and TTY terminals,
http://en.wikipedia.org/wiki/Telecommunications_device_for_the_deaf and 3GPP has a standard which describes some of the most efficient methods: http://www.qtc.jp/3GPP/Specs/22226-700.pdf
In simple terms, you want the encoded message to be played or recorded with ordinary audio equipment, so the signal produced by encoding a sequence such as 0,0,1,1,0,1 must be compatible with that equipment. That is what these encoders/decoders do.
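One way to fit more bits into each beep, sketched here under stated assumptions, is to give every bit position its own tone and play the tones at the same time; a Fourier-style correlation then recovers which tones were present. The sample rate, slot length, tone frequencies, and function names below are hypothetical choices for illustration, not part of any standard.

```python
import math

SAMPLE_RATE = 8000                    # samples per second (assumed)
SLOT_SECONDS = 0.05                   # duration of one beep slot (assumed)
TONES_HZ = [600, 1000, 1400, 1800]    # one hypothetical tone per bit position


def encode_nibble(bits):
    """Return one slot of samples; bit i == '1' switches tone i on."""
    n = int(SAMPLE_RATE * SLOT_SECONDS)
    active = [f for f, b in zip(TONES_HZ, bits) if b == "1"]
    return [
        sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in active)
        / len(TONES_HZ)               # scale so the sum stays within -1..1
        for t in range(n)
    ]


def tone_amplitude(samples, freq):
    """Estimate the amplitude of one frequency (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


def decode_nibble(samples, threshold=0.1):
    """A tone counts as present when its amplitude exceeds the threshold."""
    return "".join(
        "1" if tone_amplitude(samples, f) > threshold else "0"
        for f in TONES_HZ
    )


if __name__ == "__main__":
    print(decode_nibble(encode_nibble("1010")))
```

With four tones, one slot carries four bits where a single beep/no-beep slot carries one. Over a real speaker and microphone the threshold and tone spacing would need tuning.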
I have a vectorized wav file with values between -1 and 1, 88,200 samples, 44.1 kHz sampling rate to hear the audio within two seconds. I'd like to send the audio through bluetooth to a bluetooth module, arduino, DAC, and 3.5mm breakout board with earbuds.
I am getting crackly audio when I receive it at the end. I tried to recreate this in MATLAB and it turns out to be a combination of the scaling (multiplying + shifting the values over 0) and the sampling rate change due to the receivers. Of course, I could be completely wrecking the sampling frequency with inefficient Arduino code, but since a factor is also the initial scaling, my guess is that I am misunderstanding something fundamental to audio processing.
What is the proper way to format and or scale the values between 0-4095 (which are needed for the DAC input) so that the audio itself is not distorted upon listening due to the scaling factor, sampling rate retention aside? OR is there something else I am missing in the big picture of this?
Clarification: Currently I am using the Python sockets library to send an audio string array char by char into an Arduino array and reading the chars as integers, then feeding them into the DAC. I am not sure Python sockets is the best way to go; there should be something better, or a more robust implementation of sockets, to send the data.
UPDATE: I realized that the HC-05 uses SPP bluetooth protocol, which seems to be waaay too low resolution to send reliable audio. I will see if I can send a more compressed audio file, store it in the arduino, then output to the DAC. That could provide more reliable audio.
Have you tried setting in and out values in your samples? I know that video that includes audio, that could be one thing being overlooked, anyhow, that can cause issues for uploading to YouTube. It seems similar to this, because it might not know where to begin and end and it can affect audio too.
Another issue may be the format of the samples, against Bluetooth technology. AAC should probably be the format, but confirm this because I am not 100% sure what all it will accept.
The library has an example for bandwidth:
https://www.arduino.cc/en/Reference/AudioFrequencyMeter
But there are other functions for begin() and end(). You could declare them as variable to your start and end times within the samples, such that one will be the active track at a given time. You could also declare your frequency() as a constant value of 44.1, but you might have to escape the period for that. (It otherwise reads 60 to 1500.)
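On the scaling part of the question, a minimal sketch of one common mapping: clamp each sample to -1..1, shift it to 0..2, and scale linearly to the 12-bit range 0..4095. The function names are hypothetical, and sending each code as two little-endian bytes is an assumption about the receiving sketch, not something stated in the question.

```python
import struct


def to_dac_codes(samples, bits=12):
    """Map floats in -1..1 linearly onto unsigned DAC codes (0..4095 for 12 bits)."""
    full_scale = (1 << bits) - 1
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))            # clamp so out-of-range input cannot wrap
        codes.append(int(round((x + 1.0) / 2.0 * full_scale)))
    return codes


def pack_codes(codes):
    """Pack the codes as unsigned 16-bit little-endian values (assumed wire format)."""
    return struct.pack("<%dH" % len(codes), *codes)


if __name__ == "__main__":
    codes = to_dac_codes([-1.0, 0.0, 1.0])
    print(codes, pack_codes(codes).hex())
```

Sending fixed-width binary values avoids parsing characters on the Arduino side. At 44.1 kHz this framing needs 88,200 bytes per second, so the link has to sustain that rate for real-time playback.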
I am posed with the task of mixing raw data from audio files. I am currently struggling to get a clean sound from mixing the data, I keep getting distortion or white noise.
Let's say that I have two byte arrays of data from two AudioInputStreams. The AIS is used to stream a byte array from a given audio file. Here I can play back single audio files using SourceDataLine's write method. I want to play two audio files simultaneously, therefore I am aware that I need to perform some sort of PCM addition.
Can anyone recommend whether this addition should be done with float values or byte values? Also, when it comes to adding 3, 4 or more audio files, I am guessing my problem will be even harder! Do I need to divide by a certain amount to avoid overflow? Let's say I am adding two 16-bit audio files (min -32,768, max 32,767).
I admit, I have had some advice on this before but can't seem to get it working! I have code of what I have tried but not with me!
Any advice would be great.
Thanks
First off, I question whether you are actually working with fully decoded PCM data values. If you are directly adding bytes, that would only make sense if the sound was recorded at 8-bit resolution, which is done less and less. These days, audio is recorded more commonly as 16-bit values, or more. I think there are some situations that don't require as much frequency content, but with current systems, the CPU savings aren't as critical so people opt to keep at least "CD Quality" (16-bit resolution, stereo, 44,100 fps).
So step one, you have to make sure that you are properly converting the byte streams to valid PCM. For example, if 16-bit encoding, the two bytes have to be appended in the correct order (may be either big-endian or little-endian), and the resulting value used.
Once that is properly handled, it is usually sufficient to simply add the values and maybe impose a min and max filter to ensure the signal doesn't go beyond the defined range. I can think of two reasons why this works: (a) audio is usually recorded at a low enough volume that summing will not cause overflow, (b) the signals are random enough, with both positive and negative values, that moments where all the contributors line up in either the positive or negative direction are rare and short-lived.
Using a min and max will "clip" the signals, and can introduce some audible distortion, but it is a much less horrible sound than overflow! If your sources are routinely hitting the min and max, you can simply multiply a volume factor (within the range 0 to 1) to one or more of the contributing signals as a whole, to bring the audio values down.
For 16-bit data, it works to perform operations directly on the signed integers that result from appending the two bytes together (-32768 to 32767). But it is a more common practice to "normalize" the values, i.e., convert the 16-bit integers to floats ranging from -1 to 1, perform operations at that level, and then convert back to integers in the range -32768 to 32767 and break those integers into byte pairs.
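The steps above (decode byte pairs, normalize, add, clip, convert back) can be sketched as follows. This is an illustration in Python rather than Java, assuming mono, signed, little-endian 16-bit PCM; the function names are hypothetical.

```python
import struct


def pcm16le_to_floats(data):
    """Decode signed 16-bit little-endian PCM bytes to floats in -1..1."""
    count = len(data) // 2
    ints = struct.unpack("<%dh" % count, data[: count * 2])
    return [v / 32768.0 for v in ints]


def mix(tracks, gain=1.0):
    """Add the tracks sample by sample, applying an overall volume factor."""
    length = min(len(t) for t in tracks)
    return [gain * sum(t[i] for t in tracks) for i in range(length)]


def floats_to_pcm16le(samples):
    """Clip to -1..1 (the min/max filter), then encode back to byte pairs."""
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, x))
        ints.append(int(round(x * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


if __name__ == "__main__":
    a = struct.pack("<2h", 16384, -16384)
    b = struct.pack("<2h", 16384, 16384)
    mixed = floats_to_pcm16le(mix([pcm16le_to_floats(a), pcm16le_to_floats(b)]))
    print(struct.unpack("<2h", mixed))
```

If the sum clips often, lowering `gain` (for example to 1 divided by the number of tracks) trades loudness for headroom.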
There is a free book on digital signal processing that is well worth reading: Steven Smith's "The Scientists and Engineers Guide to Digital Signal Processing." It will give much more detail and background.
I'm trying to find out if there's a way to determine if an AAC-encoded audio track is encoded with Dolby Pro Logic II data. Is there a way of examining the file such that you can see this information? I have for example encoded a media file in Handbrake with (truncated to audio options) -E av_aac -B 320 --mixdown dpl2 and this is the audio track output that mediainfo shows:
Audio #1
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 2h 5mn
Bit rate mode : Variable
Bit rate : 321 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 288 MiB (3%)
Title : Stereo / Stereo
Language : English
Encoded date : UTC 2017-04-11 22:21:41
Tagged date : UTC 2017-04-11 22:21:41
but I can't tell if there's anything in this output that would suggest that it's encoded with DPL2 data.
tl;dr: it's probably possible; it may be easier if you're a programmer.
Because the information encoded is just a stereo analog pair, there is no guaranteed way of detecting a Dolby Pro Logic II (DPL2) signal therein, unless you specifically store your own metadata saying "this is a DPL2 file." But you can probably make a pretty good guess.
All of the old analog Dolby Surround formats, including DPL2, store surround information in two channels by inverting the phase of the surround or surrounds and then mixing them into the original left and right channels. Dolby Surround type decoders, including DPL2, attempt to recover this information by inverting the phase of one of the two channels and then looking for similarities in these signal pairs. This is either done trivially, as in Dolby Surround, or else these similarities are artificially biased to be pushed much further to the left or right, or the left or right surround, as in DPL2.
So the trick is to detect whether important data is being stored in the surround channel(s). I'll sketch out for you a method that might work, and I'll try to express it without writing code, but it's up to you to implement and refine it to your liking.
Crop the first N seconds or so of program content into a stereo file, where N is between one and thirty. Call this file Input.
Mix down the Input stereo channels to a new mono file at -3dB per channel. Call this file Center.
Split the left and right channels of Input into separate files. Call these Left and Right.
Invert the right channel. Call this file RightInvert.
Mix down the Left and RightInvert channels to a new mono file at -3dB per channel. Call this file Surround.
Determine the RMS and peak dB of the Surround file.
If the RMS or peak dB of the Surround file are below "a tolerance", stop; the original file is either mono or center-panned and hence contains no surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -30 dB or so.
Invert the Center file into a new file. Call this file CenterInvert.
Mix the CenterInvert file into the Surround file at 0 dB (both CenterInvert and Surround should be mono). Call this new file SurroundInvert.
Determine the RMS and peak dB of the SurroundInvert file.
If either the RMS or the peak dB of SurroundInvert is below "a tolerance," stop; your original source contains panned left or right front information, not surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear -- I'm guessing around -35 dB or so.
If you've gotten this far, your original Input probably contains surround information, and hence is probably a member of the Dolby Surround family of encodings.
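For programmers, the steps above can be sketched in a few lines. This is a literal transcription of the amplitude-domain method, assuming the left and right channels are already decoded to lists of floats in -1..1; the function names and the default tolerances (the guessed -30 dB and -35 dB) are placeholders to be tuned.

```python
import math

MINUS_3_DB = 10 ** (-3 / 20)   # linear gain for -3 dB, about 0.708


def rms_db(x):
    """RMS level in dB relative to full scale (-inf for silence)."""
    mean_sq = sum(v * v for v in x) / len(x) if x else 0.0
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def peak_db(x):
    """Peak level in dB relative to full scale (-inf for silence)."""
    peak = max((abs(v) for v in x), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def classify(left, right, surround_tol_db=-30.0, invert_tol_db=-35.0):
    # Center: both channels mixed to mono at -3 dB each.
    center = [MINUS_3_DB * (l + r) for l, r in zip(left, right)]
    # Surround: left plus inverted right, mixed at -3 dB each.
    surround = [MINUS_3_DB * (l - r) for l, r in zip(left, right)]
    if rms_db(surround) < surround_tol_db or peak_db(surround) < surround_tol_db:
        return "mono or center-panned"
    # SurroundInvert: inverted Center mixed into Surround at 0 dB.
    surround_invert = [s - c for s, c in zip(surround, center)]
    if (rms_db(surround_invert) < invert_tol_db
            or peak_db(surround_invert) < invert_tol_db):
        return "panned front content"
    return "possible surround content"


if __name__ == "__main__":
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 48000) for t in range(4800)]
    print(classify(tone, tone))
    print(classify(tone, [-v for v in tone]))
```

The result is only a guess about the content, in the same sense as the procedure it transcribes.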
I've written this algorithm out such that you can do each of these steps with a specific command in sox. If you want to be fancier, instead of doing the RMS/peak value step in sox, you could run an ebur128 program and check your levels in LUFS against a tolerance. If you want to be even fancier, after you create the Surround and Center files, you could filter out all frequencies higher than 7kHz and do de-emphasis on them, just like a real DPL2 decoder would.
To keep this algorithm simple, I've sketched it out entirely in the amplitude domain. The calculation of the SurroundLevel file would probably be a lot more accurately done in the frequency domain, if you know how to calculate the magnitude and angle of FFT bins and you use windows of 30 to 100 ms. But this cheapo version above should get you started.
One last caution. AAC is a modern psychoacoustic codec, which means that it likes to play games with stereo phasing and imaging to achieve its compression. So I consider it likely that the mere act of encapsulating DPL2 into an AAC stream will hose some of the imaging present in DPL2. To be candid, neither DPL2 nor AAC belongs anywhere in this pipeline. If you must store an analog stream originally encoded with DPL2, do it in a lossless format like WAV or FLAC, not AAC.
As of this writing, operational concepts behind Dolby Pro Logic (I) are here. These basic concepts still apply to DPL2; operational concepts for DPL2 are here.
If the file has more than one channel, you can with some certainty assume that they are used for surround purposes, although they could be just multiple tracks.
In this case it falls on a playing system to do with channels as it "thinks" best. (if file header doesn't say what to do)
But your file is stereo. If you want to know whether it is a virtual surround file then you look in header for an encoder field to see which encoder was used.
This may help somewhat, although not much. First, the encoder field is mostly left empty; second, the encoder doesn't have to be the same as the recoder that mixed down the surround data.
I.e. the recoder will first create raw PCM data, then feed it to some encoder to produce compressed file. (AAC or whatever)
Also, there are many applications and versions vary, so might the encoder field, so tracking all of them would be nasty work.
However, you can, with over 60% certainty, deduce whether something is virtual surround or not by examining the data.
This would be advanced DSP and, for speed, even machine learning may be involved.
You would have to find out whether the stereo signals contain certain features of HRTF (head related transfer function).
This may be achieved by examining intensity difference and delay features between same sounds appearing in time domain and harmonic features (characteristic frequency changes) in frequency domain.
You would have to do both, because one without the other may just tell you that something is a very good stereo recording, not a virtual surround.
I don't know whether there are HRTF specific features mapped somewhere already, or you would need to do it by yourself.
It's a very complicated solution that takes a lot of time to make properly. Also, its performance would be problematic.
With this method you can also break the stereo mixdown to the nearly original surround channels.
But for stereo to surround conversion other methods are used and they sound well.
If you are determined to perform such a detection, dedicate half a year or more of hard work if no HRTF features are mapped, few weeks if they are,
brace yourself for big stress and I wish you luck. I have done something similar. It is a killer.
If you want an out of the box solution, then the answer to your question is no, unless header provides you with encoder field and the encoder is distinctive and known to be used only for doing surround to stereo conversion.
I do not think anyone did this from actual data as I described, or if they did it is a part of commercial product. Doing what you want is not usually needed, but it can be done.
Oh, BTW, try googling HRTF inversion, it might give some help.
Is there an open standard for transmitting M2M data via audio?
Use case example: I want to broadcast a public PGP key via some sort of audio output.
You want to use "Frequency Shift Keying". It works by encoding bits into different frequencies of tones.
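To make the idea concrete, here is a minimal sketch of binary FSK, assuming one tone for a 0 bit and another for a 1 bit. The sample rate, baud rate, frequencies, and function names are hypothetical values chosen for the example, and the decoder assumes it already knows where each bit starts.

```python
import math

SAMPLE_RATE = 8000     # samples per second (assumed)
BAUD = 100             # bits per second (assumed), so 80 samples per bit
FREQ_ZERO_HZ = 1200    # hypothetical tone for a 0 bit
FREQ_ONE_HZ = 2200     # hypothetical tone for a 1 bit


def fsk_encode(bits):
    """Turn a string of '0'/'1' into samples, keeping the phase continuous."""
    samples_per_bit = SAMPLE_RATE // BAUD
    samples, phase = [], 0.0
    for b in bits:
        freq = FREQ_ONE_HZ if b == "1" else FREQ_ZERO_HZ
        step = 2 * math.pi * freq / SAMPLE_RATE
        for _ in range(samples_per_bit):
            samples.append(math.sin(phase))
            phase += step
    return samples


def tone_energy(chunk, freq):
    """Energy of the chunk at one frequency (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, s in enumerate(chunk))
    im = sum(s * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, s in enumerate(chunk))
    return re * re + im * im


def fsk_decode(samples):
    """Decide each bit by comparing the energy at the two frequencies."""
    samples_per_bit = SAMPLE_RATE // BAUD
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        chunk = samples[start:start + samples_per_bit]
        one = tone_energy(chunk, FREQ_ONE_HZ)
        zero = tone_energy(chunk, FREQ_ZERO_HZ)
        bits.append("1" if one > zero else "0")
    return "".join(bits)


if __name__ == "__main__":
    print(fsk_decode(fsk_encode("1011001110")))
```

A real modem also has to find the bit boundaries and cope with noise, which this sketch leaves out.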
If you're doing that on Linux, use Mini Modem (http://www.whence.com/minimodem/).
If you are trying to accomplish this on windows, there are a number of amateur radio tools that transmit FSK signals over a soundcard. Try googling for "Sound card FSK modem"
Just for fun, here's a link to the "Kansas City Standard", which was intended for storing data on audio cassette: https://www.wikiwand.com/en/Kansas_City_standard
I hope someone can provide a simpler solution but so far I've found that a process called modulation was used to transmit data over audio in the days of dial-up:
Data to audio and back. Modulation / demodulation with source code
If time is not critical, then DTMF is an option: up to roughly 10 symbols, or 40 bits, per second.
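The 40 bits per second figure follows from DTMF having 16 symbols, so each one can carry 4 bits. A short sketch of that arithmetic, using the standard DTMF row and column frequencies; the nibble-to-key ordering and the function names are arbitrary choices made for this example.

```python
import math

DTMF_ROWS_HZ = (697, 770, 852, 941)
DTMF_COLS_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Hypothetical ordering used to map a 4-bit value (0..15) to a key.
NIBBLE_KEYS = "0123456789ABCD*#"


def dtmf_pair(key):
    """Return the (row, column) frequency pair in Hz for one keypad symbol."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c >= 0:
                return DTMF_ROWS_HZ[r], DTMF_COLS_HZ[c]
    raise ValueError("not a DTMF key: %r" % (key,))


def bit_rate(symbols_per_second):
    """Bits per second when every DTMF symbol carries log2(16) = 4 bits."""
    bits_per_symbol = int(math.log2(len(NIBBLE_KEYS)))
    return symbols_per_second * bits_per_symbol


if __name__ == "__main__":
    print(dtmf_pair(NIBBLE_KEYS[0b1010]), bit_rate(10))
```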
I am wondering on how to estimate where I am currently in an audio with regards to time, by using the data.
For example, I read data by byte[8192] blocks. How can I know how much byte[8192] is equivalent to in time?
If this is some sort of raw-ish encoding, like PCM, this is simple. The length in time is a function of the sample rate, bit depth, and number of channels. 30 seconds of 16-bit audio at 44.1kHz in mono is about 2.6 MB (2.5 MiB). However, you also need to factor in headers and container format crapola. WAV files for example can have a lot of other stuff in them.
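For raw PCM, the relationship can be written out directly. A minimal sketch, assuming headerless data and a known format; the function name is hypothetical.

```python
def block_duration_seconds(num_bytes, sample_rate, bits_per_sample, channels):
    """Playback time of num_bytes of raw PCM in the given format."""
    bytes_per_frame = channels * bits_per_sample // 8   # one sample per channel
    return num_bytes / (sample_rate * bytes_per_frame)


if __name__ == "__main__":
    # One byte[8192] block of 16-bit stereo audio at 44.1 kHz.
    print(block_duration_seconds(8192, 44100, 16, 2))
```

The same function gives the current position if you pass the total number of audio bytes read so far instead of one block's size.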
Compressed formats are much more tricky. You can never be sure where you are without playing through the file to get to where you are. Of course you can always guesstimate based on the percentage of the file length, if that is good enough for your case.
I think this is not what he was asking.
First you have to tell us what kind of data you are using. WAV? MP3? Usually without knowing where that block came from - so you know if you have some kind of frame information and where to find it - you are not able to determine that block's position.
If you have the full stream and this data then you can do a search