I was reading THIS TUTORIAL on WAV files and I am confused about a few things.
Suppose I use PCM_16_BIT as my encoding format. This should mean each of my sound samples needs 16 bits to represent it, shouldn't it?
But in this tutorial, the second figure shows 4 bytes as one sample. Why is that? I suppose it is because it is showing the format of a stereo recorded WAV file, but what if I have a mono recorded WAV file? Are the left and right channel values equal in that case, or is one of the channel values 0? How does it work?
Yes, for 16-bit stereo you need 4 bytes per sample frame (2 bytes for the left channel followed by 2 bytes for the right). For mono you just need 2 bytes per sample for 16-bit PCM; a mono file stores only one channel, so there is no duplicated or zeroed second channel at all. Check this out:
http://www.codeproject.com/Articles/501521/How-to-convert-between-most-audio-formats-in-NET
Also read here:
http://wiki.multimedia.cx/index.php?title=PCM
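To make the byte layout concrete, here is a small sketch (my own illustration, not taken from the linked pages) that unpacks raw 16-bit PCM bytes as either stereo or mono frames:

```python
import struct

# 16-bit PCM: every sample is 2 bytes (little-endian signed short).
# Stereo interleaves the channels, so one "frame" is L then R = 4 bytes.
# Mono has one channel, so one frame is just 2 bytes.

def split_frames(raw_bytes, channels):
    """Yield one tuple of sample values per frame from raw 16-bit PCM data."""
    bytes_per_frame = 2 * channels          # 4 for stereo, 2 for mono
    fmt = "<" + "h" * channels              # little-endian signed 16-bit
    for i in range(0, len(raw_bytes), bytes_per_frame):
        yield struct.unpack(fmt, raw_bytes[i:i + bytes_per_frame])

# Example: the same 8 bytes read as stereo frames, then as mono samples.
data = struct.pack("<4h", 1000, -1000, 2000, -2000)
print(list(split_frames(data, channels=2)))  # [(1000, -1000), (2000, -2000)]
print(list(split_frames(data, channels=1)))  # [(1000,), (-1000,), (2000,), (-2000,)]
```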
Related
I have a Canon Powershot S100 and I would like to be able to use it to play videos. The only format it supports is H.264 video with Linear PCM audio (16-bit little-endian signed integer, 48000 Hz) in a MOV file at 1080p, 24 FPS; any other format is reported as "Unrecognized Image". I have a LOT of mp4 files I would like to convert to that specific format, however I cannot find anything online related to converting mp4s to MOVs with that specific encoding.
I tried looking for online tools to achieve the same result, but none of them could produce that specific audio format along with the video, and removing the audio from the MOV also makes the camera not recognize it.
It would be ideal if I could make a python script do the converting for me, as there are hundreds of mp4 files in a folder on my desktop.
If anyone could help, that would be greatly appreciated!
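In case it helps anyone with a similar batch job, here is a rough sketch of one possible approach; it assumes ffmpeg is installed and on the PATH, the folder path is hypothetical, and the exact parameters the camera accepts may need tweaking:

```python
import pathlib
import subprocess

SRC = pathlib.Path("~/Desktop/videos").expanduser()  # hypothetical folder of .mp4 files

for mp4 in SRC.glob("*.mp4"):
    out = mp4.with_suffix(".mov")
    subprocess.run([
        "ffmpeg", "-i", str(mp4),
        "-c:v", "libx264",            # H.264 video
        "-vf", "scale=1920:1080",     # 1080p
        "-r", "24",                   # 24 fps
        "-c:a", "pcm_s16le",          # 16-bit little-endian signed PCM audio
        "-ar", "48000",               # 48 kHz sample rate
        str(out),
    ], check=True)
```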
Each WAV file has a Sampling Rate and a Bit Depth. The former governs how many samples are played per second, and the latter governs how many possible values there are for each sample.
If the sampling rate is, for example, 1000 Hz and the bit depth is 8, then every 1/1000 of a second the audio device plays one of $2^8$ possible values.
Hence the bulk of the WAV file is a sequence of 8-bit numbers. There is also a header which contains the Sampling Rate, Bit Depth and other specifics of how the data should be read. This can be seen by running xxd on a WAV file to view it in hexadecimal in the terminal: the first column is just the byte offset counting up in hexadecimal, and the last column seems to show where the header ends and the data begins.
Each of those 8-bit numbers is a sample. So the device reads left to right and converts the samples, in order, into sound. But how, in principle, can each number correspond to a sound? I would think each sample should somehow encode an amplitude and a pitch, with each coming from a finite range, but I cannot find any reference to, for example, the first half of the bits being an amplitude and the second half a pitch.
I have found references to the numbers encoding "signal strength" but I do not know what this means. Can anyone explain in principle how the data is read and converted to audio?
In your example, over the course of a second, 1000 values are sent to a DAC (digital-to-analog converter), where the discrete values are smoothed out into a continuous waveform. Each value is simply the amplitude, the "signal strength", of that waveform at one instant. The pitch is determined by the rate and pattern with which the stream of values (which get smoothed out into a wave) rises and falls.
Steve W. Smith gives some good diagrams and explanations in his chapter on ADC and DAC in his very helpful book The Scientist and Engineer's Guide to Digital Signal Processing.
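To make that concrete, here is a small sketch (my own illustration, not from the book) that writes one second of 8-bit samples tracing a 440 Hz sine wave; played back at the stated sample rate, the rise and fall of those numbers is what comes out of the DAC as an audible pitch:

```python
import math
import wave

SAMPLE_RATE = 8000   # samples per second
FREQ = 440           # pitch of the tone in Hz
DURATION = 1.0       # seconds

# 8-bit WAV samples are unsigned: silence is 128, full scale swings from 0 to 255.
samples = bytes(
    int(128 + 127 * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE))
    for n in range(int(SAMPLE_RATE * DURATION))
)

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)           # mono
    w.setsampwidth(1)           # 1 byte per sample = 8-bit depth
    w.setframerate(SAMPLE_RATE)
    w.writeframes(samples)
```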
I used SciPy to apply a Butterworth low-pass filter, removing content above a certain frequency from an audio file. The SciPy package is fast and easy to use but is, unfortunately, lacking options for specifying the codec of the output.
My original audio files were PCM s16le (16 bits per sample). The output audio files are little-endian 64-bit floats (64 bits per sample). Will the change in format have an appreciable impact on the way the audio files sound? Would I be able to keep the sound quality similar if I converted the output back to its original format?
Yes, converting the audio back to the original 16-bit integer format should not cause audible quality loss.
The higher-precision format can be useful as an intermediate format for processing, but converting back to 16-bit integers does not add any audible noise.
See https://people.xiph.org/~xiphmont/demo/neil-young.html for further explanations on the matter. A few relevant quotes:
16 bits is enough to store all we can hear, and will be enough forever.
[...]
When does 24 bit matter?
Professionals use 24 bit samples in recording and production for headroom, noise floor, and convenience reasons.
16 bits is enough to span the real hearing range with room to spare. [...]
[...] Once the music is ready to distribute, there's no reason to keep more than 16 bits.
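As a rough sketch of the conversion back (assuming the filtered data is a float array scaled to roughly ±1.0, which is how float WAV data is conventionally stored; the file names are hypothetical):

```python
import numpy as np
from scipy.io import wavfile

# Read the filtered file (float64 samples, nominally in the range -1.0 .. 1.0).
rate, data = wavfile.read("filtered_float64.wav")

# Scale to the 16-bit integer range, clipping first to avoid wrap-around on overshoot.
pcm16 = np.clip(data, -1.0, 1.0)
pcm16 = (pcm16 * 32767.0).astype(np.int16)

# Writing an int16 array produces a standard PCM s16le WAV again.
wavfile.write("filtered_int16.wav", rate, pcm16)
```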
I currently have the idea to code a small audio converter application (e.g. FLAC to MP3 or M4A) in C# or Python, but my problem is that I do not know at all how audio conversion works.
From my research I have heard about analog-to-digital / digital-to-analog converters, but I guess what I need here would be digital-to-digital, or something like that, wouldn't it?
If someone could precisely explain how it works, it would be greatly appreciated.
Thanks.
digital audio in its raw form is called PCM, which is the fundamental format for any audio processing system ... it's uncompressed ... just a series of integers representing the height of the audio curve at each sample of the curve (the Y axis, where time is the X axis along this curve)
... this PCM audio can be compressed using some codec then bundled inside a container, often together with video or metadata channels ... so to convert audio from A to B you first need to understand the container spec as well as the compressed audio codec so you can decompress audio A into PCM format ... then do the reverse ... compress the PCM with the codec of B then bundle it into the container of B
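As an illustration of that decode-then-re-encode pipeline (a sketch, assuming the pydub package is installed and can find an ffmpeg binary; the file names are hypothetical):

```python
from pydub import AudioSegment  # pydub shells out to ffmpeg under the hood

# Step 1: decode the source container/codec into raw PCM held in memory.
audio = AudioSegment.from_file("song.flac", format="flac")

# Step 2: re-encode that PCM with the target codec and wrap it in the target container.
audio.export("song.mp3", format="mp3", bitrate="192k")
```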
Before venturing further into this I suggest you master the art of WAVE audio files ... the beauty of WAVE is that it's just a 44 byte header followed by the uncompressed integers of the audio curve ... write some code to read a WAVE file then parse the header (identify bit depth, sample rate, channel count, endianness) to enable you to iterate across each audio sample for each channel ... prove that it's working by sending your bytes into an output WAVE file ... diff input WAVE against output WAVE as they should be identical ... once mastered you are ready to venture into your above stated goal ... do not skip over grokking the notion of interleaving stereo audio as well as spreading out a single audio sample which has a bit depth of 16 bits across two bytes of storage, and the reverse, namely stitching together multiple bytes into a single integer with a bit depth of 16, 24 or even 32 bits while keeping endianness squared away ... this may sound scary at first, however all the necessary details are on the net, as this is how I taught myself this level of detail
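A possible starting point for that exercise (a sketch of my own that assumes the simple canonical 44-byte header with the data chunk immediately after the fmt chunk and 16-bit samples; real-world files can carry extra chunks, and the file names are hypothetical):

```python
import struct

def read_wav(path):
    """Parse a canonical 44-byte WAV header and return (params, interleaved samples)."""
    with open(path, "rb") as f:
        header = f.read(44)
        # fields of interest inside the RIFF/fmt chunk (all little-endian)
        channels    = struct.unpack_from("<H", header, 22)[0]
        sample_rate = struct.unpack_from("<I", header, 24)[0]
        bit_depth   = struct.unpack_from("<H", header, 34)[0]
        data = f.read()

    assert bit_depth == 16, "this sketch only handles 16-bit samples"
    # stitch every 2 bytes back into one signed 16-bit integer
    samples = struct.unpack("<" + "h" * (len(data) // 2), data)
    return (channels, sample_rate, bit_depth), samples

def write_wav(path, params, samples):
    """Write the samples back out with a fresh 44-byte header."""
    channels, sample_rate, bit_depth = params
    data = struct.pack("<" + "h" * len(samples), *samples)
    byte_rate = sample_rate * channels * bit_depth // 8
    block_align = channels * bit_depth // 8
    with open(path, "wb") as f:
        f.write(b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE")
        f.write(b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels,
                                      sample_rate, byte_rate, block_align, bit_depth))
        f.write(b"data" + struct.pack("<I", len(data)) + data)

# round-trip: the output should be byte-identical to a canonical 44-byte-header input
params, samples = read_wav("in.wav")
write_wav("out.wav", params, samples)
```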
modern audio compression algorithms leverage knowledge of how people perceive sound to discard information which is indiscernible (lossy), as opposed to lossless algorithms which retain all the informational load of the source ... Opus (http://opus-codec.org/) is a current favorite codec, unencumbered by patents and open source
An MP3 file header only contains the sample rate and bit rate, so the decoder can't figure out the bit depth from the header. Maybe it can only guess from the bit rate? But the bit rate varies from frame to frame.
Here is another way to ask this question: if I encode a 24-bit WAV to MP3, how is the 24-bit information stored in the MP3?
When the source WAV is compressed, the original bit depth information is "thrown away". This is by design in any lossy audio codec, since the whole point is to use the fewest bits possible to store the "same" audio.
Internally, MP3 uses Huffman symbols to store the processed audio data. As such, there's no real "bit depth" to report.
During the encoding process, the samples are quantized, so the original bit depth information is lost.
MP3 decoders either choose a bit depth to operate at, or allow the end user/application to dictate it. The bit depth is determined during "re-quantization".
Have a read of http://blog.bjrn.se/2008/10/lets-build-mp3-decoder.html, which is rather enlightening.
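To see a decoder being handed the output bit depth in practice, one option (a sketch; the MP3 file name is hypothetical and ffmpeg is assumed to be installed) is to decode the same MP3 twice with different output codecs:

```python
import subprocess

# The same MP3 decoded twice; the decoder is told which bit depth to produce,
# because the MP3 itself no longer carries one.
subprocess.run(["ffmpeg", "-i", "song.mp3", "-c:a", "pcm_s16le", "out16.wav"], check=True)
subprocess.run(["ffmpeg", "-i", "song.mp3", "-c:a", "pcm_s24le", "out24.wav"], check=True)
```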