I'm looking to convert some audio files into spectrograms. I'm wondering what the difference is between an m4a and wav file. If I have two of the same audio recording, one saved as wav and the other as m4a, will there be a difference in the spectrogram representations of both?
Both WAV and M4A are container formats, with options for how exactly the audio data is encoded and represented inside the file. A WAV file has one audio track and supports a variety of encodings, including some that are also possible in the M4A format. Most often, though, "WAV" refers to a file with uncompressed audio inside, where the data is stored in PCM format.
M4A files are MP4 (MPEG-4 Part 14) files with the implication that there is a single audio track inside. There are far fewer encoding options, though they still include both compressed and uncompressed ones. Most often the audio in an M4A is encoded with AAC, which is a lossy codec. Depending on how much information was lost during that encoding, your spectrogram could differ from one built on the original data.
The M4A format uses a lossy compression algorithm, so there may be differences, depending on the compression level and on the resolution and depth of the spectrogram. The WAV format can also lose information, through quantization of the sound by an A/D converter or through any sample format/rate conversions. So the difference may show up in the noise floor, or in portions of the sound's spectrum that are usually inaudible to humans (due to masking effects and the like).
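To see such differences in practice, you can compute a spectrogram for each decoded signal and compare them bin by bin. Here is a minimal sketch using only NumPy; the two PCM arrays are synthetic stand-ins (a pure tone, and the same tone with a slightly raised noise floor to mimic lossy-codec artifacts), since in practice you would first decode both files to raw PCM, e.g. with ffmpeg:

```python
import numpy as np

def spectrogram_db(samples, frame=1024, hop=512):
    """Hann-windowed STFT magnitude in dB for a mono PCM signal."""
    window = np.hanning(frame)
    n_frames = 1 + (len(samples) - frame) // hop
    frames = np.stack([samples[i * hop : i * hop + frame] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-12)   # small epsilon avoids log(0)

# Synthetic stand-ins for decoded PCM (hypothetical data, not real files).
rng = np.random.default_rng(0)
sr = 44100
t = np.arange(sr) / sr
wav_pcm = np.sin(2 * np.pi * 1000 * t)               # "lossless" original
m4a_pcm = wav_pcm + 1e-4 * rng.standard_normal(sr)   # raised noise floor

s_wav = spectrogram_db(wav_pcm)
s_m4a = spectrogram_db(m4a_pcm)

# The largest differences appear far from the 1 kHz tone, in the quiet
# parts of the spectrum, which matches the "noise floor" point above.
max_diff_db = np.abs(s_wav - s_m4a).max()
```

With real files, the comparison would be the same: decode both to PCM at the same sample rate, compute both spectrograms, and inspect where the dB values diverge.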
I am recording audio in HTML and it is getting stored in .webm format.
I am feeding that audio to the Google Speech API to get a transcript from it.
I found out that .flac is lossless, so I converted it from WebM to FLAC using FFmpeg.
But I have one doubt: converting the audio from WebM to FLAC increases the file size, but if the audio is already lossy in WebM format, converting it to FLAC will still be lossy, because the information is already lost.
Am I wrong with this assumption?
No. FLAC conversion will only preserve the data in the source file. Any data lost during the original encoding to the WebM codec (Opus/Vorbis) is gone.
I currently have the idea to write a small audio converter application (e.g. FLAC to MP3 or the M4A format) in C# or Python, but my problem is that I do not know at all how audio conversion works.
After some research, I came across analog-to-digital / digital-to-analog converters, but I guess this would be a digital-to-digital conversion or something like that, wouldn't it?
If someone could precisely explain how it works, it would be greatly appreciated.
Thanks.
Digital audio is stored as PCM, the raw audio format fundamental to any audio processing system. It is uncompressed: just a series of integers representing the height of the audio curve at each sample point (the Y axis, where time is the X axis along the curve).
This PCM audio can be compressed using some codec and then bundled inside a container, often together with video or metadata channels. So to convert audio from A to B, you first need to understand the container spec as well as the compressed audio codec, so you can decompress audio A into PCM format; then do the reverse: compress the PCM with codec B and bundle it into container B.
Before venturing further into this, I suggest you master the art of WAVE audio files. The beauty of WAVE is that it is just a 44-byte header followed by the uncompressed integers of the audio curve. Write some code to read a WAVE file, then parse the header (identifying bit depth, sample rate, channel count, and endianness) so you can iterate across each audio sample of each channel. Prove that it works by sending your bytes into an output WAVE file, then diffing the input WAVE against the output WAVE; they should be identical. Once you have mastered that, you are ready for your stated goal. Do not skip over grokking the notion of interleaved stereo audio, nor of spreading a single 16-bit audio sample across two bytes of storage and the reverse, stitching multiple bytes back into a single integer with a bit depth of 16, 24, or even 32 bits while keeping endianness squared away. This may sound scary at first, but all the necessary details are on the net; it is how I taught myself this level of detail.
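The header-parsing exercise above can be sketched in a few lines of Python. This writes a small 16-bit mono test file with the stdlib `wave` module, then parses the canonical 44-byte RIFF/WAVE header by hand with `struct` and unpacks the little-endian samples (field offsets follow the standard PCM WAVE layout):

```python
import math
import struct
import wave

# Write a small 16-bit mono test file: one second of a 440 Hz tone at 8 kHz.
with wave.open("test.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 2 bytes = 16-bit samples
    w.setframerate(8000)
    w.writeframes(b"".join(
        struct.pack("<h", int(32767 * math.sin(2 * math.pi * 440 * n / 8000)))
        for n in range(8000)))

# Parse the 44-byte header by hand instead of using the wave module.
with open("test.wav", "rb") as f:
    header = f.read(44)
    assert header[:4] == b"RIFF" and header[8:12] == b"WAVE"
    (audio_format, channels, sample_rate, byte_rate,
     block_align, bits_per_sample) = struct.unpack("<HHIIHH", header[20:36])
    data = f.read()         # little-endian PCM samples follow the header

# Stitch pairs of bytes back into signed 16-bit integers.
samples = struct.unpack("<%dh" % (len(data) // 2), data)
```

For the round-trip exercise, you would then write `samples` back out through the same packing logic and diff the two files byte for byte.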
Modern audio compression algorithms leverage knowledge of how people perceive sound to discard information that is indiscernible (lossy), as opposed to lossless algorithms, which retain all the informational load of the source. Opus (http://opus-codec.org/) is a current favorite codec, untainted by patents and open source.
I need the audio files for the YUV test video sequences like foreman.yuv, akiyo.yuv, etc. None of the YUV sequences online come with the audio.
Any other YUV sequences with audio files suitable for encoder analysis would also work.
I think there is no audio available for the sequences you mention.
But have a look at https://media.xiph.org/video/derf/; at the bottom of the page there are full sequences with FLAC audio.
An MP3 file header only contains the sample rate and bit rate, so the decoder can't figure out the bit depth from the header. Maybe it can only guess from the bit rate? But the bit rate varies from frame to frame.
Here is another way to ask this question: if I encode a 24-bit WAV to MP3, how is the 24-bit information stored in the MP3?
When the source WAV is compressed, the original bit depth information is "thrown away". This is by design in any compressed audio codec since the whole point is to use the least bits possible to store the "same" audio.
Internally, MP3 uses Huffman symbols to store the processed audio data. As such, there's no real "bit depth" to report.
During the encoding process, the samples are quantized, so the original bit depth information is lost.
MP3 decoders either choose a bit depth they operate at, or allow the end user/application to dictate it. The bit depth is determined during "re-quantization".
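As a toy illustration of why quantization is one-way: MP3 actually quantizes frequency-domain coefficients rather than raw samples, but the principle that quantization discards low-order information is the same. The values below are hypothetical:

```python
# Truncating a 24-bit PCM sample to 16 bits and "restoring" it.
sample_24 = 0x12345F           # a hypothetical 24-bit sample value
sample_16 = sample_24 >> 8     # quantize: keep only the top 16 bits
restored = sample_16 << 8      # a decoder re-quantizing back to 24 bits
error = sample_24 - restored   # the low byte is gone for good
```

No matter what output bit depth the decoder picks, the discarded low-order bits cannot be recovered, which is why the MP3 carries no meaningful "original bit depth" at all.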
Have a read of http://blog.bjrn.se/2008/10/lets-build-mp3-decoder.html, which is rather enlightening.
I am writing encoding software and dealing with uncompressed WAV and FLAC formats. My question is: FLAC is supposed to be a lossless format, similar to WAV but compressed. However, certain software such as JRiver's Media Center offers a 'quality' setting for encoding FLAC files. Does that mean they are offering to reduce quality pre-encoding, or am I missing something in the FLAC standard?
The quality parameter for FLAC refers to the quality of compression, not of the audio. The audio stays lossless, but you get better compression with a higher quality setting. Higher quality will take more time to compress, however.
See docs http://wiki.jriver.com/index.php/Encoding_Settings
Free Lossless Audio Codec (FLAC): FLAC is a popular lossless, freely available open source encoder. [2] Quality Settings: 0 - 8. Sets the quality of compression (and not sound, which is lossless), 8 meaning most compressed/time/effort.