How is the WAV signal stored?

It is necessary to record sound in WAV format, forming a waveform according to a certain law. I figured out how the WAV header works, but I don't understand how the sound itself is recorded. I chose 32 bits per sample; here is an example of a sample from a real file:
55 73 0A 0D 32 33 30 3D
In theory, each sample should be a fractional number from -1.0 to 1.0. My questions: how is the sign stored here, and where is the decimal point?
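If the file's fmt chunk reports format code 3 (IEEE float) — an assumption, since the header isn't shown — each 4-byte group is one little-endian IEEE 754 float. The sign is the top bit of the last byte of each group, and there is no stored decimal point: the biased exponent field plays that role. A sketch decoding the bytes above:

```python
import struct

# Assumes WAV format code 3 (IEEE float), little endian:
# each sample is one 32-bit IEEE 754 float in [-1.0, 1.0].
raw = bytes.fromhex("55730A0D3233303D")   # the 8 bytes from the question
samples = struct.unpack("<2f", raw)       # '<' = little endian, 'f' = 32-bit float
for s in samples:
    print(s)
```

If the fmt chunk instead reports format code 1 (integer PCM), the same bytes would be two signed 32-bit integers, and there would be no fractional representation at all.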

Related

Will this change in audio codec result in appreciable difference?

I used SciPy to run a Butterworth low-pass filter, removing frequencies above a certain cutoff from an audio file. The SciPy package is fast and easy to use but, unfortunately, lacks options for specifying the codec to be used in the output.
My original audio files were PCM s16LE (16 bits per sample). The output audio files are 64-bit float LE (64 bits per sample). Will the change in codec have an appreciable impact on the way the audio files sound? Would I be able to keep the sound quality similar if I were to convert the output audio back to its original format?
Yes, converting the audio back to the original 16-bit integer format should not cause audible quality loss.
The higher-precision format can be useful as an intermediate format for processing, but converting back to the 16-bit integer format does not incur any extra audible noise.
See https://people.xiph.org/~xiphmont/demo/neil-young.html for further explanations on the matter. A few relevant quotes:
16 bits is enough to store all we can hear, and will be enough forever.
[...]
When does 24 bit matter?
Professionals use 24 bit samples in recording and production for headroom, noise floor, and convenience reasons.
16 bits is enough to span the real hearing range with room to spare. [...]
[...] Once the music is ready to distribute, there's no reason to keep more than 16 bits.
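As a pure-Python sketch of the conversion back to 16-bit integers that the answer describes (clip, scale, round; the function name is mine, not from any library):

```python
def float_to_int16(x):
    """Clip a float sample to [-1.0, 1.0] and scale it to a 16-bit integer."""
    x = max(-1.0, min(1.0, x))   # clip out-of-range values instead of wrapping
    return int(round(x * 32767))

print([float_to_int16(v) for v in (0.0, 0.5, -1.0, 2.0)])
# -> [0, 16384, -32767, 32767]
```

In practice a library such as NumPy would do this vectorized, and a production pipeline might add dither before the rounding step, but the arithmetic is the same.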

How can I convert audio (.wav) to a satellite image?

I need to create a piece of software that can capture sound (from a NOAA satellite with an RTL-SDR). The problem is not capturing the sound; the problem is how to convert the audio waveform into an image. I have read about many things (the Fast Fourier Transform, the Hilbert transform, etc.) but I don't know how to proceed.
If you can give me an idea it would be fantastic. Thank you!
Over the past year I have been writing code which makes FFT calls and have amassed 15 pages of notes, so the topic is vast; however, I can boil it down.
Open up your WAV file ... parse the 44 byte header and note the given bit depth and endianness attributes ... then read across the payload which is everything after that header ... understand the notions of bit depth and endianness ... typically a WAV file has a bit depth of 16 bits, so each point on the audio curve is stored across two bytes ... typically a WAV file is little endian, not big endian ... knowing what that means, you take the next two bytes, bit shift the second (most significant) byte one byte to the left (if little endian), then bit OR the pair into an integer ... for 16 bit PCM that integer is signed, varying from -32768 to 32767, so divide by 32768.0 to get its floating point equivalent and your audio curve points now vary from -1 to +1 ... do that conversion for each pair of bytes across your payload buffer
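The byte-pair walk just described can be sketched in Python (a minimal version assuming mono 16-bit little-endian PCM, with the function name being mine):

```python
import struct

def pcm16_to_floats(payload):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(payload) // 2
    ints = struct.unpack("<%dh" % count, payload[:count * 2])  # 'h' = signed 16-bit
    return [s / 32768.0 for s in ints]

# 0x0000 -> 0.0, 0x7FFF (max) -> ~0.99997, 0x8000 (min) -> -1.0
print(pcm16_to_floats(bytes([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80])))
```

The `struct` format string does the shift-and-OR plus the signed reinterpretation in one step; doing it by hand with `(hi << 8) | lo` would also need an explicit two's-complement adjustment for values at or above 0x8000.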
Once you have the WAV audio curve as a buffer of floats, which is called raw audio or PCM audio, then perform your FFT api call ... all languages have such libraries ... output of the FFT call will be a set of complex numbers ... pay attention to the notion of the Nyquist Limit ... this will influence how you make use of the output of your FFT call
Now you have a collection of complex numbers ... the index from 0 to N of that collection corresponds to frequency bins ... the size of your PCM buffer determines how granular your frequency bins are ... look up this equation ... in general, the more samples in the PCM buffer you send to the FFT api call, the finer the granularity of the output frequency bins ... essentially this means as you walk across this collection of complex numbers, each index increments the frequency assigned to that index
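The equation being alluded to is the standard bin-spacing relation, delta_f = sample_rate / N:

```python
def bin_spacing_hz(sample_rate, n_samples):
    """Frequency resolution of an N-point FFT: each bin spans fs / N hertz."""
    return sample_rate / n_samples

# 4096 samples at 44.1 kHz -> roughly 10.77 Hz per bin; doubling N halves it
print(bin_spacing_hz(44100, 4096))
print(bin_spacing_hz(44100, 8192))
```

The trade-off: a longer buffer gives finer frequency bins but coarser time resolution, which is why spectrogram tools let you pick the window size.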
To visualize this just feed this into a 2D plot where X axis is frequency and Y axis is magnitude ... calculate this magnitude for each complex number using
curr_mag = 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples
For simplicity we will sweep under the carpet the phase shift information available to you in your complex number buffer
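Putting the magnitude formula to work, here is a naive DFT sketch in pure Python (a real application would call an FFT library; this only demonstrates the scaling and the frequency-bin indexing):

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive DFT returning one magnitude per frequency bin up to the Nyquist limit."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(2.0 * abs(acc) / n)   # same scaling as curr_mag above
    return mags

# A sine completing exactly 8 cycles across 64 samples lands in bin 8
n = 64
sig = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
mags = dft_magnitudes(sig)
print(max(range(len(mags)), key=lambda k: mags[k]))  # -> 8
```

With the 2/N scaling, a full-scale sine shows up as magnitude 1.0 in its bin, which makes the 2D plot directly interpretable as per-frequency amplitude.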
This only scratches the surface of what you need to master to properly render a WAV file into a 2D plot of its frequency domain representation ... there are libraries which perform parts or all of this however now you can appreciate some of the magic involved when the rubber hits the road
A great explanation of trade offs between frequency resolution and number of audio samples fed into your call to an FFT api https://electronics.stackexchange.com/questions/12407/what-is-the-relation-between-fft-length-and-frequency-resolution
Do yourself a favor and check out https://www.sonicvisualiser.org/ which is one of many audio workstations which can perform what I described above. Just hit menu File -> Open -> choose a local WAV file -> Layer -> Add Spectrogram ... and it will render the visual representation of the Fourier Transform of your input audio file.

How does an audio converter work?

I currently have the idea to code a small audio converter (e.g. FLAC to MP3 or M4A format) application in C# or Python, but my problem is that I do not know at all how audio conversion works.
After some research, I came across analog-to-digital / digital-to-analog converters, but I guess this would be digital-to-digital or something like that, wouldn't it?
If someone could explain precisely how it works, it would be greatly appreciated.
Thanks.
Digital audio in raw form is called PCM, which is the fundamental format of any audio processing system ... it is uncompressed ... just a series of integers representing the height of the audio curve at each sample of the curve (the Y axis, where time is the X axis along this curve)
... this PCM audio can be compressed using some codec then bundled inside a container often together with video or meta data channels ... so to convert audio from A to B you would first need to understand the container spec as well as the compressed audio codec so you can decompress audio A into PCM format ... then do the reverse ... compress the PCM into codec of B then bundle it into the container of B
Before venturing further into this I suggest you master the art of WAVE audio files ... the beauty of WAVE is that it's just a 44 byte header followed by the uncompressed integers of the audio curve ... write some code to read a WAVE file then parse the header (identify bit depth, sample rate, channel count, endianness) to enable you to iterate across each audio sample for each channel ... prove that it's working by sending your bytes into an output WAVE file ... diff the input WAVE against the output WAVE as they should be identical ... once mastered you are ready to venture into your above stated goal ... do not skip over grokking the notion of interleaving stereo audio, as well as spreading a single audio sample with a bit depth of 16 bits across two bytes of storage and the reverse, namely stitching together multiple bytes into a single integer with a bit depth of 16, 24 or even 32 bits while keeping endianness squared away ... this may sound scary at first, however all necessary details are on the net, as it's how I taught myself this level of detail
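A minimal header-parsing sketch along those lines (it assumes the canonical 44-byte PCM layout with a single fmt chunk and no extension chunks, which not every WAV file in the wild honors):

```python
import struct

def parse_wav_header(header):
    """Unpack the canonical 44-byte little-endian PCM WAV header."""
    (riff, _size, wave, _fmt, _fmt_size, audio_format, channels,
     sample_rate, _byte_rate, block_align, bits_per_sample,
     _data, _data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])
    assert riff == b"RIFF" and wave == b"WAVE" and audio_format == 1  # 1 = PCM
    return {"channels": channels, "sample_rate": sample_rate,
            "bits_per_sample": bits_per_sample, "block_align": block_align}

# Build a 16-bit stereo 44.1 kHz header by hand to prove the round trip
hdr = struct.pack("<4sI4s4sIHHIIHH4sI", b"RIFF", 36, b"WAVE", b"fmt ",
                  16, 1, 2, 44100, 176400, 4, 16, b"data", 0)
print(parse_wav_header(hdr))
```

A robust reader would instead walk the RIFF chunks by id and size rather than hard-coding offsets, but the fixed 44-byte layout is a fine place to start.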
modern audio compression algorithms leverage knowledge of how people perceive sound to discard information which is indiscernible ( lossy ) as opposed to lossless algorithms which retain all the informational load of the source ... opus (http://opus-codec.org/) is a current favorite codec untainted by patents and is open source

Decoding lossless jpeg in nema.org DICOM files

I wrote a jpeg compressor/decompressor years ago, which can handle lossless and lossy jpeg files. It works well, but doesn't always decode jpeg streams in DICOM files correctly.
I know jpeg well, but I know little about DICOM. Lossless jpeg in DICOM can't possibly be compliant with the jpeg ISO standard. There must be some modification, either hard coded, or modified by a parameter somewhere in a DICOM file outside of the jpeg file stream.
My code fails on most of the sample DICOM files (compsamples_jpeg.tar) at:
ftp://medical.nema.org/MEDICAL/Dicom/DataSets/WG04/
Here's what happens when I decode the first lossless jpeg (IMAGES\JPLL\CT1_JPLL) in this set:
dicom decoded image
The left image is rendered from my code, the right was rendered by an online DICOM reader:
http://www.ofoct.com/viewer/dicom-viewer-online.html
(x)MedCon, an open source DICOM reader, fails at the exact same pixel as my code, so I'm not the only one who has this problem.
http://xmedcon.sourceforge.net
I have read this jpeg stream byte by byte, drawn the huffman tree, and calculated the huffman codes with pencil and paper, and my code does exactly what it is supposed to do. Here are the huffman codes:
0 00
4 01
3 100
5 101
1 1100
2 1101
6 1110
7 11110
8 111110
9 1111110
12 11111110
11 111111110
10 1111111110
15 11111111110
Here is the compressed data after the SOS marker:
ff 00 de 0c 00 (00 after ff is stuff byte)
11111111 11011110 00001100 00000000
11111111110 si=15
111100000110000 diff=30768
The online viewer says the first pixel value is -3024. If this is correct, the first diff value should be -3024, but it is not.
After this, my code correctly decodes about 2/5 of the image, but then decodes a wildly inaccurate diff value:
d2 a1 fe ff 00 e0 (00 after ff is stuff byte)
1010111 10100001 11111110 11111111 11100000
101 si=5
01111 diff=-16
01 si=4
0000 diff=-15
111111110 si=11 ????
11111111111 diff=2047
If you look at the image decoded by the online viewer, there is no radical change in pixel intensity at this location, so the si=11 value can't be correct.
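For reference, the SSSS category plus additional bits seen in these traces map to a signed difference via the EXTEND procedure of ITU T.81; a sketch that reproduces all three values decoded above:

```python
def extend(bits, ssss):
    """Map SSSS additional bits to a signed difference (ITU T.81 EXTEND)."""
    if ssss == 0:
        return 0
    value = int(bits, 2)
    if value < (1 << (ssss - 1)):      # a leading 0 bit means a negative value
        value -= (1 << ssss) - 1
    return value

print(extend("111100000110000", 15))  # -> 30768
print(extend("01111", 5))             # -> -16
print(extend("0000", 4))              # -> -15
```

Since this mapping matches the trace, the decoding logic itself is sound; the disagreement must come from somewhere else in the pipeline.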
I am sure I have a good understanding of jpeg, but jpeg streams in DICOM don't seem to follow the jpeg standard. What extensions/changes are made to jpeg streams when they are embedded in DICOM files?
DICOM specifies the use of ISO 10918 just as it is written, so there is nothing magic about the use of lossless JPEG in DICOM images. The only wrinkles are reinterpreting the always-unsigned output of the decoded bitstream as signed (depending on Pixel Representation) and applying the Rescale Slope and Intercept to convert the decoded "stored pixel values" into whatever "values" a viewer might report (e.g., as Hounsfield Units), as Paolo describes. Or to put it another way: do not rely on the "pixel values" reported by a viewer being the same as the direct output of the decoded bitstream.
For reference, here are the sections in DICOM that address the use of 10918 in general:
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_8.2.html#sect_8.2.1
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_A.4.html#sect_A.4.1
DICOM encoders may split individual compressed frames into separate fragments, as in the case of this sample, which deliberately uses fragmentation to test decoding capability. I expect you know that and have taken care of reassembling the compressed bit stream across fragment boundaries (i.e., removing the fixed length Item tags between fragments):
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_A.4.html
Though some encoders may be buggy, I don't think that is the case for IMAGES\JPLL\CT1_JPLL in the NEMA sample dataset, which I created many years ago using the Stanford PVRG codec.
My own decoder (minimal as it is) at http://www.dclunie.com/pixelmed/software/codec/ has no problem with it. The source is available, so if you want to recompile it with some of the debugging messages turned on to track each decoded value, predictor input value, restart at the beginning of each row, etc., to compare with your own logic, feel free.
Finally, since JPEG lossless is used rarely outside DICOM, you may find it hard to obtain other samples to test with. One such source that comes to mind is the USF digitized mammography collection (medical, but not DICOM), at http://marathon.csee.usf.edu/Mammography/Database.html.
David
PS. I did check which codec XMedCon is using at https://sourceforge.net/projects/xmedcon/ and it seems to use some copy of the Cornell lossless code; so it may be vulnerable to the same bug described in the post that BitBank referred to (https://groups.google.com/forum/#!topic/comp.protocols.dicom/Yl5GkZ8ggOE) or some other error. I didn't try to decipher the source code to see.
The first pixel's value is indeed -3024 as the online dicom viewer says:
You correctly decode the first amplitude as 30768, but the first pixel is predicted as 2^(P-1) = 32768, and therefore its real value is 32768 + 30768 = 63536. This is an unsigned value.
Now, the pixel representation tag says that the file values are in two's complement (signed), therefore when we use the most significant bit as a sign the value becomes 63536 - 65536 = -2000.
When we apply the value in the rescale intercept tag (-1024), the value of the first pixel becomes -3024.
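Following that arithmetic end to end (the function name and defaults are mine; the slope and intercept come from the Rescale tags of this particular file):

```python
def first_pixel_value(diff, precision=16, slope=1, intercept=-1024):
    """Chain described above: lossless-JPEG first-pixel prediction,
    two's-complement reinterpretation, then DICOM rescale."""
    predicted = 1 << (precision - 1)                      # 2^(P-1) = 32768
    stored = (predicted + diff) & ((1 << precision) - 1)  # unsigned decoder output
    if stored >= 1 << (precision - 1):                    # reinterpret as signed
        stored -= 1 << precision
    return slope * stored + intercept

print(first_pixel_value(30768))  # 32768 + 30768 = 63536 -> -2000 -> -3024
```

So the decoder output and the viewer's reported -3024 are both correct; they just sit at different ends of this chain.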
However, my codec doesn't find any amplitude 2047 near the row 179, so maybe your codec is going out of sync somehow: the loss of sync is also visible in the subsequent rows (they are all shifted to the right).

need help understanding data section of wav file

I was reading THIS TUTORIAL on wav files and I have some confusion.
Suppose I use PCM_16_BIT as my encoding format. This should mean each of my sound samples needs 16 bits to represent it, shouldn't it?
But in this tutorial, the second figure shows 4 bytes as one sample. Why is that? I suppose it is because it shows the format for a stereo WAV file, but what about a mono WAV file? Are the left and right channel values equal in that case, or is one of the channel values 0? How does it work?
Yes, for 16-bit stereo each sample frame takes 4 bytes (2 bytes per channel). For mono 16-bit PCM, each sample takes just two bytes; there is no second channel to fill or zero out. Check this out:
http://www.codeproject.com/Articles/501521/How-to-convert-between-most-audio-formats-in-NET
Also read here:
http://wiki.multimedia.cx/index.php?title=PCM
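To make the layout concrete, here is a small sketch (function name is mine): mono 16-bit PCM is just consecutive 2-byte samples, while stereo interleaves one left and one right sample into each 4-byte frame.

```python
import struct

def interleave_stereo(left, right):
    """Pack per-channel 16-bit samples into L R L R ... 4-byte frames."""
    out = bytearray()
    for l, r in zip(left, right):
        out += struct.pack("<hh", l, r)   # little-endian signed 16-bit pair
    return bytes(out)

frames = interleave_stereo([100, -100], [200, -200])
print(len(frames))  # -> 8 (two frames, 4 bytes each)
```

Reading is the reverse: step through the data chunk in block_align-sized frames and split each frame into its per-channel samples.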