Finding number of samples in .wav file and Hex Editor - audio

Need help with Hex Editor and audio files.I am having trouble figuring out the formula to get the number of samples in my .wav files.
I downloaded StripWav which tells me the number of samples in the .waves,but still cannot figure out the formula.
Can you please download these two .wavs,open them in a hex editor and tell me the formula to get the number of samples.
If you so kindly do this for me,pleas tell me the number of samples for each .wav so I can make sure the formula is correct.
http://sinewavemultimedia.com/1wav.wav
http://sinewavemultimedia.com/2wav.wav
Here is a problem I have two programs,
One reads the wav data and the other shows the numsamples
here is the data
RIFF 'WAVE' (wave file)
<fmt > (format description)
PCM format
2 channel
44100 frames per sec
176400 bytes per sec
4 bytes per frame
16 bits per sample
<data> (waveform data - 92252 bytes)
But the other program says NumSamples is
23,063 samples
/*******UPDATE*********/
One more thing I did the calculation with 2 files
This one is correct
92,296 bytes and num samples is 23,063`
But this other one is not coming out correctly it is over 2 megs i just subracted 44 bytes and I doing it wrong here? here is the filesize
2,473,696 bytes
But the correct numsamples is
617,400

WAVE format
You must read the fmt header to determine the number of channels and bits per sample, then read the size of the data chunk to determine how many bytes of data are in the audio. Then:
NumSamples = NumBytes / (NumChannels * BitsPerSample / 8)

There is no simple formula for determining the number of samples in a WAV file. A so-called "canonical" WAV file consists of a 44-byte header followed by the actual sample data. So, if you know that the file uses 2 bytes per sample, then the number of samples is equal to the size of the file in bytes, minus 44 (for the header), and then divided by 2 (since there are 2 bytes per sample).
Unfortunately, not all WAV files are "canonical" like this. A WAV file uses the RIFF format, so the proper way to parse a WAV file is to search through the file and locate the various chunks.
Here is a sample (not sure what language you need to do this in):
http://msdn.microsoft.com/en-us/library/ms712835

A WAVE's format chunk (fmt) has the 'bytes per sample frame' specified as wBlockAlign.
So: framesTotal = data.ck_size / fmt.wBlockAlign;
and samplesTotal = framesTotal * wChannels;
Thus, samplesTotal===FramesTotal IIF wChannels === 1!!
Note how the above answer elegantly avoided to explain that key-equations the spec (and answers based on them) are WRONG:
consider flor example a 2 channel 12 bits per second wave..
The spec explains we put each 12bps sample in a word:
note: t=point in time, chan = channel
+---------------------------+---------------------------+-----
| frame 1 | frame 2 | etc
+-------------+-------------+-------------+-------------+-----
| chan 1 # t1 | chan 2 # t1 | chan 1 # t2 | chan 2 # t2 | etc
+------+------+------+------+------+------+------+------+-----
| byte | byte | byte | byte | byte | byte | byte | byte | etc
+------+------+------+------+------+------+------+------+-----
So.. how many bytes does the sample-frame (BlockAlign) for a 2ch 12bps wave have according to spec?
<sarcasm> CEIL(wChannels * bps / 8) = 3 bytes.. </sarcasm>
Obviously the correct equation is: wBlockAlign=wChannels*CEIL(bps/8)

Related

What does "type=1" mean in a .wav file?

If you unpack the data in a wav file between bytes 20 and 35 inclusive (I think) you get different values for type, channels, samplerate, bytespersec, alignment and bits.
[type] => 1
[channels] => 1
[samplerate] => 8000
[bytespersec] => 16000
[alignment] => 2
[bits] => 16
What does type=1 mean? Is there a type = 2? If there is, is that still a wav file? I'm trying to ask Google and I keep getting results like "What is a wav file?" that don't mention anything I'm asking here.
1 seems to imply the data is in the PCM format. Other values would indicate some other format.
http://soundfile.sapp.org/doc/WaveFormat/
20 2 AudioFormat PCM = 1 (i.e. Linear quantization)
Values other than 1 indicate some
form of compression.

Working out sample rate and bit depth of aiff audio from file size

I need some help with Maths/logic here. Working with aif files.
I have written the following:
LnByte = FileLen(ToCheck) 'Returns Filesize in Bytes
LnBit = LnByte * 8 'Get filesize in Bits
Chan = 1 'Channels in audio: mono = 1
BDpth = 24 'Bit Detph
SRate = 48000 'Sample Rate
BRate = 1152000 'Expected Bit Rate
Time_Secs = LnBit / Chan / BDpth / SRate 'Size in Bits / Channels / Bit Depth / Sample Rate
FSize = (BRate / 8) * Time_Secs '(Bitrate / 8) * Length of file in seconds
ToCheck is the current file when looping through a folder of files.
So I'm finding the length of audio based on the file size in bits / channels / bit depth / sample rate. This assumes that the bit depth and sample rate are correct (I need the files to be 24-bit/48kHz).
Time_Secs = Length of the file in seconds.
FSize = File size based on 24/48kHz using the Time_Secs
Probably because the FSize uses Time_Secs, I can't work out how to, from this, work out if the file sample rate and/or bit depth are indeed correct...
Assuming 24/48k should give 144,000 Bytes per second
Assuming 16/48k should give 96,000 Bytes per second
If I check a file that is 16-bit/48 kHz using the above code it gives the incorrect time in secs (naturally) but the correct file size... even though the Bit Rate is 1,152,000 should be wrong.
-- It would seem that the difference in time is making up for the difference in Bit Rate - or I'm looking at it wrong.
How would I adapt my formula, or do the maths to work out if the sample rate/bit depth of a file is actually 48,000 Hz /24-bit? Or is there a different way entirely? Remembering that they are aif files, not wavs.
Hope that makes sense.
Many Thanks in advance!

GStreamer - Generate audio waveform from MP4 file

I have a two-part question,
1) I have an MP4 file and want to generate it's audio waveform.
2) I have another MP4 file which has audio at channel [0] and channel [1] and a video track too, I want to generate waveforms for both channels as separate images.
How can I achieve both of the above by using GSteamer?
Is the raw audio in 16-bit or 32-bit format? What is the sample rate (44100 hz) and what is time duration?
Anyways assuming 44.1khz at 10 second duration... since you can't draw 44 thousand samples as pixel width so choose a final display size (eg: width = 800px and height = 600px) and do math :
//# is (samplerate / duration) / width...
(44100 / 10) / 800 = 551;
After reading first 2 values, you will jump ahead by 551 bytes and repeat until total of (.
So in your raw data starting from pos = 0;...
1) Check this and next sample, then multiply their values together (sample[pos] x sample[pos+1]).
2) Take that result and divide by 65335 (maximum value of 16-bits or 2 bytes). That's the final value of your first sample or point.
3) Draw a line to fit according to image height (eg: 600px) so if sample = 0.83 then:
line_height = (600 x 0.83); //# gives 498 as line height
line_count += 1; //# add plus 1 to line count (stop when it reaches 800)
4) Skip ahead from position [pos] by +551 bytes and repeat step (1) until line_count == 800;

Base64: What is the worst possible increase in space usage?

If a server received a base64 string and wanted to check it's length before converting,, say it wanted to always permit the final byte array to be 16KB. How big could a 16KB byte array possibly become when converted to a Base64 string (assuming one byte per character)?
Base64 encodes each set of three bytes into four bytes. In addition the output is padded to always be a multiple of four.
This means that the size of the base-64 representation of a string of size n is:
ceil(n / 3) * 4
So, for a 16kB array, the base-64 representation will be ceil(16*1024/3)*4 = 21848 bytes long ~= 21.8kB.
A rough approximation would be that the size of the data is increased to 4/3 of the original.
From Wikipedia
Note that given an input of n bytes,
the output will be (n + 2 - ((n + 2) %
3)) / 3 * 4 bytes long, so that the
number of output bytes per input byte
converges to 4 / 3 or 1.33333 for
large n.
So 16kb * 4 / 3 gives very little over 21.3' kb, or 21848 bytes, to be exact.
Hope this helps
16kb is 131,072 bits. Base64 packs 24-bit buffers into four 6-bit characters apiece, so you would have 5,462 * 4 = 21,848 bytes.
Since the question was about the worst possible increase, I must add that there are usually line breaks at around each 80 characters. This means that if you are saving base64 encoded data into a text file on Windows it will add 2 bytes, on Linux 1 byte for each line.
The increase from the actual encoding has been described above.
This is a future reference for myself. Since the question is on worst case, we should take line breaks into account. While RFC 1421 defines maximum line length to be 64 char, RFC 2045 (MIME) states there'd be 76 char in one line at most.
The latter is what C# library has implemented. So in Windows environment where a line break is 2 chars (\r\n), we get this: Length = Floor(Ceiling(N/3) * 4 * 78 / 76)
Note: Flooring is because during my test with C#, if the last line ends at exactly 76 chars, no line-break follows.
I can prove it by running the following code:
byte[] bytes = new byte[16 * 1024];
Console.WriteLine(Convert.ToBase64String(bytes, Base64FormattingOptions.InsertLineBreaks).Length);
The answer for 16 kBytes encoded to base64 with 76-char lines: 22422 chars
Assume in Linux it'd be Length = Floor(Ceiling(N/3) * 4 * 77 / 76) but I didn't get around to test it on my .NET core yet.
Also it would depend on actual character encoding, i.e. if we encode to UTF-32 string, each base64 character would consume 3 additional bytes (4 byte per char).

Libsox encoding

Why do i get distorted output if I convert a wav file using libsox to:
&in->encoding.encoding = SOX_ENCODING_UNSIGNED;
&in->encoding.bits_per_sample = 8;
using the above code?
The input file has bits_per_sample = 16.
So you're saying that you tell SOX to read a 16 bit sample WAV file as an 8 bit sample file? Knowing nothing about SOX, I would expect it to read each 16 bit sample as two 8 bit samples... the high order byte and the low order byte like this: ...HLHLHLHLHL...
For simplicity, we'll call high order byte samples 'A' samples. 'A' samples carry the original sound with less dynamic range, because the low order byte with the extra precision has been chopped off.
We'll call the low order byte samples "B samples." These will be roughly random and encode noise.
So, as a result we'll have the original sound, the 'A' samples, shifted down in frequency by a half. This is because there's a 'B' sample between every 'A' sample which halves the rate of the 'A' samples. The 'B' samples add noise to the original sound. So we'll have the original sound, shifted down by a half, with noise.
Is that what you're hearing?
Edit Guest commented that the goal is to downconvert a WAV to 8 bit audio. Reading the manpage for SoX, it looks like SoX always uses 32 bit audio in memory as a result of sox_read(). Passing it a format will only make it attempt to read from that format.
To downconvert in memory, use SOX_SAMPLE_TO_SIGNED_8BIT or SOX_SAMPLE_TO_UNSIGNED_8BIT from sox.h, ie:
sox_format_t ft = sox_open_read("/file/blah.wav", NULL, NULL);
if( ft ) {
sox_ssample_t buffer[100];
sox_size_t amt = sox_read(ft, buffer, sizeof(buffer));
char 8bitsample = SOX_SAMPLE_TO_SIGNED_8BIT(buffer[0], ft->clips);
}
to output a downconverted file, use the 8 bit format when writing instead of when reading.

Resources