If I have a text file of sample amplitudes (0-26522), how can I create a playable audio file from them?
I have a vague recollection of tinkering with .pcm files and 8-bit samples way back in the nineties.
Is there any software that can automatically create an audio file (PCM or another format) from my samples? I found SoX, but even after looking at the documentation I can't figure out whether it can do what I want, and if so, how...
The GUI audio workstation Audacity lets you do this:
File -> Import -> Raw Data
Encoding: Signed 16-bit PCM // your ints are unsigned, but since your maximum (26522) fits within the signed 16-bit range this still works
Byte order: little-endian
Channels: 1 Channel (Mono)
then just hit Import
to confirm this works, in a text editor I did a cut-and-paste (select all, then paste, paste, paste, paste) of the below list of ints about 10 times to generate several thousand ints in a vertical column ... this is my toy input file ... after the above Import, just save by doing
File -> Export Audio
where you choose the output format (mp3, aac, PCM, ...) ... once I did this, the output mp3 was playable ... using my toy input file I heard a sine tone
3
305
20294
11029
585
3
305
20294
11029
585
3
305
20294
11029
585
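If you would rather script this instead of clicking through Audacity, here is a minimal sketch using only Python's standard library ... it assumes one integer per line in a hypothetical samples.txt and a 44100 Hz sample rate (both file names and the rate are placeholders, so adjust to your data):

import wave
import struct

# read one integer sample per line from the text file
with open("samples.txt") as f:
    samples = [int(line) for line in f if line.strip()]

# 0-26522 fits in the signed 16-bit range, so the ints can be packed directly
with wave.open("out.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    w.setframerate(44100)   # assumed sample rate
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))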
I have a Canon PowerShot S100 and I would like to be able to use it to play videos. The only format it supports is an H.264 MOV file with Linear PCM audio (16-bit little-endian signed integer, 48000 Hz) at 1080p 24 FPS; any other format is shown as "Unrecognized Image". I have a LOT of mp4 files I would like to convert to that specific format, however I cannot find anything online about converting mp4s to MOVs with that specific encoding.
I tried looking up online tools to achieve the same result, but none of them could produce that specific audio format along with the video, and removing the audio from the MOV also makes the camera not recognize the file.
It would be ideal if I could make a python script do the converting for me, as there are hundreds of mp4 files in a folder on my desktop.
If anyone could help, that would be greatly appreciated!
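One way to sketch this is to drive ffmpeg from Python. This is an untested sketch assuming ffmpeg is installed and on your PATH, and the folder path is a placeholder; the flags request H.264 video with 16-bit little-endian signed PCM at 48000 Hz, 1080p, 24 FPS, though the camera may be picky about details beyond these:

import subprocess
from pathlib import Path

src = Path.home() / "Desktop" / "videos"  # placeholder folder name

for mp4 in src.glob("*.mp4"):
    subprocess.run([
        "ffmpeg", "-i", str(mp4),
        "-c:v", "libx264",     # H.264 video
        "-s", "1920x1080",     # 1080p
        "-r", "24",            # 24 FPS
        "-c:a", "pcm_s16le",   # 16-bit little-endian signed PCM audio
        "-ar", "48000",        # 48000 Hz
        str(mp4.with_suffix(".mov")),
    ], check=True)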
I am having a problem trying to get a binary speech/no-speech result using webrtcvad on a WAV audio file. I am using librosa to load the audio file in .wav format. Can anyone tell me how to use librosa along with webrtcvad to get a binary output of whether the audio contains speech or not?
The webrtcvad module works correctly with the wave module.
The above link helped me a lot, but I am still confused: it contains a good explanation, yet a lot of errors come up during implementation.
py-webrtcvad expects the audio data to be 16-bit PCM little-endian, which is the most common storage format in WAV files.
librosa and its underlying I/O library pysoundfile, however, always return floating-point arrays in the range [-1.0, 1.0]. To convert this to bytes containing 16-bit PCM, you can use the following float_to_pcm16 function.
I have tested the read_pcm16 function as a direct replacement for read_wave in the official py-webrtcvad example, while also allowing you to open any audio file supported by soundfile (WAV, FLAC, OGG, etc.).
import numpy
import soundfile

def float_to_pcm16(audio):
    # scale floats in [-1.0, 1.0] up to the signed 16-bit integer range,
    # force little-endian byte order, and serialize to raw bytes
    ints = (audio * 32767).astype(numpy.int16)
    little_endian = ints.astype('<i2')
    return little_endian.tobytes()  # tostring() is deprecated in numpy

def read_pcm16(path):
    # read any format soundfile supports (WAV, FLAC, OGG, ...) and
    # return 16-bit little-endian PCM bytes plus the sample rate
    audio, sample_rate = soundfile.read(path)
    assert sample_rate in (8000, 16000, 32000, 48000)  # rates webrtcvad accepts
    pcm_data = float_to_pcm16(audio)
    return pcm_data, sample_rate
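For completeness, here is a usage sketch feeding the result to webrtcvad, following the frame-slicing pattern from the py-webrtcvad example ... the file name is a placeholder and the audio is assumed to be mono, since webrtcvad only accepts mono 16-bit PCM:

import webrtcvad

pcm_data, sample_rate = read_pcm16("speech.wav")  # placeholder file name
vad = webrtcvad.Vad(3)  # aggressiveness mode, 0 (least) to 3 (most)

# webrtcvad accepts frames of 10, 20 or 30 ms; use 30 ms here
frame_bytes = int(sample_rate * 0.030) * 2  # 2 bytes per 16-bit mono sample
for offset in range(0, len(pcm_data) - frame_bytes + 1, frame_bytes):
    frame = pcm_data[offset:offset + frame_bytes]
    print(vad.is_speech(frame, sample_rate))  # True if the frame has speech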
I currently have the idea to code a small audio converter application (e.g. FLAC to MP3 or M4A) in C# or Python, but my problem is that I do not know at all how audio conversion works.
After some research, I heard about Analog-to-Digital / Digital-to-Analog converters, but I guess this would be Digital-to-Digital or something like that, wouldn't it?
If someone could precisely explain how it works, it would be greatly appreciated.
Thanks.
digital audio is called PCM, which is the raw audio format fundamental to any audio processing system ... it's uncompressed ... just a series of integers representing the height of the audio curve at each sample point (the Y axis, where time is the X axis along the curve)
... this PCM audio can be compressed using some codec, then bundled inside a container, often together with video or metadata channels ... so to convert audio from A to B you first need to understand the container spec as well as the compressed audio codec, so you can decompress audio A into PCM format ... then do the reverse: compress the PCM into the codec of B and bundle it into the container of B
Before venturing further into this I suggest you master the art of WAVE audio files ... the beauty of WAVE is that it's just a 44-byte header followed by the uncompressed integers of the audio curve ... write some code to read a WAVE file, then parse the header (identify bit depth, sample rate, channel count, endianness) so you can iterate across each audio sample for each channel ... prove that it's working by sending your bytes into an output WAVE file, then diff the input WAVE against the output WAVE, as they should be identical ... once mastered, you are ready to venture into your stated goal ... do not skip over grokking the notion of interleaving stereo audio, nor of spreading a single 16-bit audio sample across two bytes of storage and the reverse, namely stitching multiple bytes back into a single integer with a bit depth of 16, 24 or even 32 bits while keeping endianness squared away ... this may sound scary at first, but all the necessary details are on the net, as this is how I taught myself this level of detail
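as a concrete starting point for that exercise, here is a minimal sketch in Python ... it assumes the simplest possible WAVE file, a canonical 44-byte header with no extra chunks, and the file names are placeholders:

import struct

with open("in.wav", "rb") as f:
    header = f.read(44)
    (riff, riff_size, wave_id,
     fmt_id, fmt_size, audio_format, channels, sample_rate,
     byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header)
    pcm_bytes = f.read(data_size)  # raw little-endian integer samples

assert riff == b"RIFF" and wave_id == b"WAVE"
assert audio_format == 1  # 1 means uncompressed PCM

# round trip: write header plus samples back out, then diff the two files
with open("out.wav", "wb") as f:
    f.write(header)
    f.write(pcm_bytes)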
modern audio compression algorithms leverage knowledge of how people perceive sound to discard information which is indiscernible (lossy), as opposed to lossless algorithms, which retain all the information of the source ... opus (http://opus-codec.org/) is a current favorite codec, untainted by patents and open source
I want to analyze my music collection, which is all CD audio data (stereo 16-bit PCM, 44.1kHz). What I want to do is programmatically determine if the bass is mixed (panned) only to one channel. Ideally, I'd like to be able to run a program like this
mono-bass-checker music.wav
And have it output something like "bass is not panned" or "bass is mixed primarily to channel 0".
I have a rudimentary start on this, sketched here in Python with numpy and soundfile:
import numpy as np
import soundfile as sf

binsize = 2**10  # define a window or FFT bin size as a power of 2

audio, rate = sf.read("music.wav")  # stereo file: shape is (frames, 2)
bass_power = np.zeros(2)
for start in range(0, len(audio) - binsize + 1, binsize):
    for ch in (0, 1):  # channels arrive de-interleaved, one per column
        fft_result = np.fft.rfft(audio[start:start + binsize, ch])
        frequency_bin = np.arange(len(fft_result)) * rate / binsize
        # define bass as below 150 Hz (and above 30 Hz, since I can't hear it)
        bass = (frequency_bin >= 30) & (frequency_bin <= 150)
        magnitude = np.sqrt(fft_result[bass].real**2 + fft_result[bass].imag**2)
        bass_power[ch] += magnitude.sum()
I'm not really sure where to go from here. Some concepts I've read about but are still too nebulous to me:
Window function. I'm currently not using one, just naively reading from the audio file 0 to 1023, 1024 to 2047, etc. (for example with binsize=1024). Is this something that would be useful to me? And if so, how does it get integrated into the program?
Normalizing and/or scaling of the magnitude. Lots of people do this for the purpose of making pretty spectrograms, but do I need to do that in my case? I understand human hearing roughly works on a log scale, so perhaps I need to massage the magnitude result in some way to filter out what I wouldn't be able to hear anyway? Is something like A-weighting relevant here?
binsize. I understand that a bigger binsize gets me more frequency bins... but I can't decide if that helps or hurts in this case.
I can generate a "mono bass song" using sox like this:
sox -t null /dev/null --encoding signed-integer --bits 16 --rate 44100 --channels 1 sine40hz_mono.wav synth 5.0 sine 40.0
sox -t null /dev/null --encoding signed-integer --bits 16 --rate 44100 --channels 1 sine329hz_mono.wav synth 5.0 sine 329.6
sox -M sine40hz_mono.wav sine329hz_mono.wav sine_merged.wav
In the resulting "sine_merged.wav" file, one channel is pure bass (40Hz) and one is non-bass (329 Hz). When I compute the magnitude of bass frequencies for each channel of that file, I do see a significant difference. But what's curious is that the 329Hz channel has non-zero sub-150Hz magnitude. I would expect it to be zero.
Even then, with this trivial sox-generated file, I don't really know how to interpret the data I'm generating. And obviously, I don't know how I'd generalize to my actual music collection.
FWIW, I'm trying to do this with libsndfile and fftw3 in C, based on help from these other posts:
WAV-file analysis C (libsndfile, fftw3)
Converting an FFT to a spectogram
How do I obtain the frequencies of each value in an FFT?
Not using a window function (the same as using a rectangular window) will splatter some of the high frequency content (anything not exactly periodic in your FFT length) into all other frequency bins of an FFT result, including low frequency bins. (Sometimes this is called spectral "leakage".)
To minimize this, try applying a window function (von Hann, etc.) before the FFT, and expect to have to use some threshold level, instead of expecting zero content in any bins.
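For example, with numpy (the frame here is a stand-in for one binsize-length chunk from the question's loop):

import numpy as np

binsize = 1024
frame = np.random.randn(binsize)        # stand-in for one frame of samples
window = np.hanning(binsize)            # von Hann window of the FFT length
spectrum = np.fft.rfft(frame * window)  # taper the frame, then transform
magnitude = np.abs(spectrum)            # leakage into other bins is reduced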
Also note that the bass notes from many musical instruments can generate some very powerful high frequency overtones or harmonics that will show up in the upper bins on an FFT, so you can't preclude a strong bass mix from the presence of a lot of high frequency content.
I was reading THIS TUTORIAL on wav files and I have a few points of confusion.
Suppose I use PCM_16_BIT as my encoding format. This should mean each of my sound samples needs 16 bits to represent it, shouldn't it?
But in this tutorial, the second figure shows 4 bytes as one sample. Why is that? I suppose it is because it is showing the format of a stereo recorded wav file, but what if I have a mono recorded wav file? Are the left and right channel values equal in that case, or is one of the channel values 0? How does it work?
Yes, for 16-bit stereo you need 4 bytes per sample frame: 2 bytes for the left channel followed by 2 bytes for the right. For mono you just need two bytes per frame, a single 16-bit PCM sample; there is no second channel value at all, neither duplicated nor zeroed. Check this out:
http://www.codeproject.com/Articles/501521/How-to-convert-between-most-audio-formats-in-NET
Also read here:
http://wiki.multimedia.cx/index.php?title=PCM
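To make the byte layout concrete, here is a small illustrative sketch in Python (the byte values are made up):

import struct

stereo_frame = b"\x01\x00\x02\x00"           # 4 bytes = one 16-bit stereo frame
left, right = struct.unpack("<hh", stereo_frame)

mono_frame = b"\x01\x00"                     # 2 bytes = one 16-bit mono frame
(sample,) = struct.unpack("<h", mono_frame)  # no second channel value exists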