I am working on a small project in Squeak and have run into a problem: I can't get WAV files to decode correctly.
Here are the two methods I am using to decode them at the moment:
convert4bitUnsignedTo16Bit: anArray
"Convert the given array of samples--assumed to be 4-bit unsigned, linear data--into 16-bit signed samples. Return an array containing the resulting samples. I only think it is unsigned; I don't really know."
| n samples s |
n _ anArray size.
samples _ SoundBuffer newStereoSampleCount: (n * 2).
1 to: n do: [:i |
s _ anArray at: i.
samples at: (i * 2) put: (self imaDecode: s).
samples at: ((i * 2) - 1) put: (self imaDecode: s)].
^ samples
.
imaDecode: number
| n |
n _ number.
n >= 128 ifTrue: [n _ n - 256].
^ (n) * 16
It gives me a sound at the correct rate, and if I listen closely, I can hear the original sound. But it is very staticky.
I am wondering if anyone can spot what is wrong with my code and help me figure out why the sound is so staticky. (BTW: I would call the convert4bitUnsignedTo16Bit: method from the readFrom: method in SampledSound with the data variable as the argument).
-TheCompModder
The input to your decoder method is a ByteArray. Each 8-bit byte stores two samples in a 4-bit encoding. Assuming this is a stereo track then the left/right channel would be stored in each byte's upper/lower 4 bits. Your imaDecode: method does not extract those bits. I think it should look more like this (obviously untested):
1 to: n do: [:i |
byte := anArray at: i.
left := byte bitAnd: 15. "lower 4 bits"
right := (byte >> 4) bitAnd: 15. "upper 4 bits"
samples at: (i * 2) put: (left - 8) << 12.
samples at: ((i * 2) - 1) put: (right - 8) << 12].
This would put the 4-bit values into left and right, bias them by -8 (assuming they're actually signed) and expand by 12 bits to be a full 16 bit signed sample.
Btw I think your buffer size is too large, its stereoSampleCount should be n not n * 2.
Also, you might want to post a sample file to the Squeak developers mailing list if you need further help.
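For readers who don't have Squeak at hand, the nibble unpacking described above can be sketched in Python. This is only an illustration under the same assumptions as the answer: two 4-bit samples per byte, upper nibble first in the output, values biased by 8. The function name unpack_4bit_stereo is made up for this example, and none of it has been checked against a real file.

```python
def unpack_4bit_stereo(data):
    """Expand packed 4-bit samples into interleaved 16-bit signed samples.

    Assumption (not confirmed by the original post): each byte holds two
    samples, and each 4-bit value is offset by 8.
    """
    samples = []
    for byte in data:
        low = byte & 0x0F           # lower 4 bits
        high = (byte >> 4) & 0x0F   # upper 4 bits
        # Remove the assumed bias of 8, then scale the 4-bit range to 16 bits.
        samples.append((high - 8) << 12)
        samples.append((low - 8) << 12)
    return samples
```

Each input byte yields exactly two output samples, which is also why the buffer only needs n stereo frames for n input bytes.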
So basically, I thought the formula for computing pcm file size went as follows:
fileSize(in bits) = samples_per_sec x seconds x number_of_channels
And it worked just fine for me since I was exclusively dealing with pcm files which had 8 bit depth.
When I started to deal with 16 bit depth files, the formula didn't produce accurate results.
Through some googling I found out that my aforementioned formula was wrong; actually, you have to adhere to this one:
fileSize(in bits) = samples_per_sec x seconds x number_of_channel x bit_depth/8
It explains why I was getting correct results with the incorrect formula since, you know, 8 / 8 = 1.
The thing that I don't get is this: why do you have to divide bit depth by eight?
In order to get bits as a result of your calculations, you have to get them on the right side of your formula as well:
bits = samples/seconds x seconds x num_of_channels(dimensionless) x bits/sample = bits
which is fine. So, it should work without division by eight. But it doesn't.
Where am I wrong?
In your notation style:
samples_per_sec x seconds x number_of_channels is total number of samples
samples_per_sec x seconds x number_of_channel x bit_depth is total number of bits
samples_per_sec x seconds x number_of_channel x bit_depth / 8 is total number of bytes
samples/seconds x seconds x num_of_channels(dimensionless) x bits/sample is sample_rate x duration_in_seconds x num_of_channels x bit_depth, which is again total number of bits
The main confusion is likely from bits and bytes. Audio sample size is typically described in bit depth not byte depth. File size / memory is described typically in bytes. To go from bits to bytes you simply divide by 8.
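To make the bits-versus-bytes distinction concrete, here is a small Python sketch of the formulas above. The function name pcm_size is invented for this example, and it only counts raw sample data (no file header).

```python
def pcm_size(sample_rate, seconds, channels, bit_depth):
    """Return (total_bits, total_bytes) of raw PCM sample data."""
    total_bits = sample_rate * seconds * channels * bit_depth
    total_bytes = total_bits // 8   # 8 bits per byte
    return total_bits, total_bytes

# One second of 44.1 kHz, stereo, 16-bit audio:
bits, size_in_bytes = pcm_size(44100, 1, 2, 16)
```

With a bit depth of 8, the byte count equals the sample count, which is why the shorter formula happened to work for 8-bit files.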
I have been working on string encoding schemes, and while examining how UTF-16 works, I have a question: why use complex surrogate pairs to represent a 21-bit code point? Why not simply store the bits in the first code unit and the remaining bits in the second code unit? Am I missing something? Is there a problem with storing the bits directly, as is done in UTF-8?
Example of what I am thinking of:
The character '🙃'
Corresponding code point: 128579 (Decimal)
The binary form: 1 1111 0110 0100 0011 (17 bits)
It's a 17-bit code point.
Based on UTF-8 schemes, it will be represented as:
240 : 11110 000
159 : 10 011111
153 : 10 011001
131 : 10 000011
In UTF-16, why not do something that looks like this rather than using surrogate pairs:
49159 : 110 0 0000 0000 0111
30275 : 01 11 0110 0100 0011
Proposed alternative to UTF-16
I think you're proposing an alternative format using 16-bit code units analogous to the UTF-8 code scheme; let's designate it UTF-EMF-16.
In your UTF-EMF-16 scheme, code points from U+0000 to U+7FFF would be encoded as a single 16-bit unit with the MSB (most significant bit) always zero. Then, you'd reserve 16-bit units with the 2 most significant bits set to 10 as 'continuation units', with 14 bits of payload data. And then you'd encode code points from U+8000 to U+10FFFF (the current maximum Unicode code point) in 16-bit units with the three most significant bits set to 110 and up to 13 bits of payload data. With Unicode as currently defined (U+0000 .. U+10FFFF), you'd never need more than 7 of the 13 bits set.
U+0000 .. U+7FFF - One 16-bit unit: values 0x0000 .. 0x7FFF
U+8000 .. U+10FFFF - Two 16-bit units:
1. First unit 0xC000 .. 0xC043
2. Second unit 0x8000 .. 0xBFFF
For your example code point, U+1F643 (binary: 1 1111 0110 0100 0011):
First unit: 1100 0000 0000 0111 = 0xC007
Second unit: 1011 0110 0100 0011 = 0xB643
The second unit differs from your example in reversing the two most significant bits, from 01 in your example to 10 in mine.
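The bit layout described above can be sketched in Python as follows. Keep in mind that UTF-EMF-16 is a hypothetical scheme invented for this answer, and the function names encode_emf16 and decode_emf16 are likewise made up for illustration; nothing here is part of any standard.

```python
def encode_emf16(cp):
    """Encode a code point into hypothetical UTF-EMF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp <= 0x7FFF:
        return [cp]                      # single unit, MSB is 0
    lead = 0xC000 | (cp >> 14)           # 110 prefix + upper bits
    cont = 0x8000 | (cp & 0x3FFF)        # 10 prefix + lower 14 bits
    return [lead, cont]

def decode_emf16(units):
    """Decode one code point from hypothetical UTF-EMF-16 code units."""
    if len(units) == 1:
        return units[0]
    lead, cont = units
    return ((lead & 0x1FFF) << 14) | (cont & 0x3FFF)
```

For the example code point 128579 this gives the units 0xC007 and 0xB643, and the largest Unicode code point, U+10FFFF, gets the lead unit 0xC043.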
Why wasn't such a scheme used in UTF-16
Such a scheme could be made to work. It is unambiguous. It could accommodate many more characters than Unicode currently allows. UTF-8 could be modified to become UTF-EMF-8 so that it could handle the same extended range, with some characters needing 5 bytes instead of the current maximum of 4 bytes. UTF-EMF-8 with 5 bytes would encode up to 26 bits; UTF-EMF-16 could encode 27 bits, but should be limited to 26 bits (roughly 64 million code points, instead of just over 1 million). So, why wasn't it, or something very similar, adopted?
The answer is the very common one: history (plus backwards compatibility).
When Unicode was first defined, it was hoped or believed that a 16-bit code set would be sufficient. The UCS2 encoding was developed using 16-bit values, and many values in the range 0x8000 .. 0xFFFF were given meanings. For example, U+FEFF is the byte order mark.
When the Unicode scheme had to be extended to make Unicode into a bigger code set, there were many defined characters with the 10 and 110 bit patterns in the most significant bits, so backwards compatibility meant that the UTF-EMF-16 scheme outlined above could not be used for UTF-16 without breaking compatibility with UCS2, which would have been a serious problem.
Consequently, the standardizers chose an alternative scheme, where there are high surrogates and low surrogates.
0xD800 .. 0xDBFF High surrogates (most significant bits of 21-bit value)
0xDC00 .. 0xDFFF Low surrogates (less significant bits of 21-bit value)
The low surrogates range provides storage for 10 bits of data: the prefix 1101 11 uses 6 of 16 bits. The high surrogates range also provides storage for 10 bits of data: the prefix 1101 10 also uses 6 of 16 bits. But because the BMP (Basic Multilingual Plane, U+0000 .. U+FFFF) doesn't need to be encoded with two 16-bit units, the UTF-16 encoding subtracts 1 from the high order data, and can therefore be used to encode U+10000 .. U+10FFFF. (Note that although Unicode is a 21-bit encoding, not all 21-bit (unsigned) numbers are valid Unicode code points. Values from 0x110000 .. 0x1FFFFF are 21-bit numbers but are not a part of Unicode.)
From the Unicode FAQ (UTF-8, UTF-16, UTF-32 & BOM):
Q: What's the algorithm to convert from UTF-16 to character codes?
A: The Unicode Standard used to contain a short algorithm, now there is just a bit distribution table. Here are three short code snippets that translate the information from the bit distribution table into C code that will convert to and from UTF-16.
Using the following type definitions
typedef unsigned int16 UTF16;
typedef unsigned int32 UTF32;
the first snippet calculates the high (or leading) surrogate from a character code C.
const UTF16 HI_SURROGATE_START = 0xD800
UTF16 X = (UTF16) C;
UTF32 U = (C >> 16) & ((1 << 5) - 1);
UTF16 W = (UTF16) U - 1;
UTF16 HiSurrogate = HI_SURROGATE_START | (W << 6) | X >> 10;
where X, U and W correspond to the labels used in Table 3-5 UTF-16 Bit Distribution. The next snippet does the same for the low surrogate.
const UTF16 LO_SURROGATE_START = 0xDC00
UTF16 X = (UTF16) C;
UTF16 LoSurrogate = (UTF16) (LO_SURROGATE_START | X & ((1 << 10) - 1));
Finally, the reverse, where hi and lo are the high and low surrogate, and C the resulting character
UTF32 X = (hi & ((1 << 6) -1)) << 10 | lo & ((1 << 10) -1);
UTF32 W = (hi >> 6) & ((1 << 5) - 1);
UTF32 U = W + 1;
UTF32 C = U << 16 | X;
A caller would need to ensure that C, hi, and lo are in the appropriate ranges.
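As a cross-check of the FAQ snippets, here is a rough Python sketch of the same conversion. It uses the equivalent "subtract 0x10000, then split into two 10-bit halves" formulation rather than the X/U/W labels of the bit distribution table, and the function names are invented for this example.

```python
def to_surrogates(cp):
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000                      # 20-bit value
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

def from_surrogates(hi, lo):
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (((hi & 0x3FF) << 10) | (lo & 0x3FF))

hi, lo = to_surrogates(128579)            # the question's example code point
```

The result can be compared against Python's own encoder with chr(128579).encode("utf-16-be").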
I am trying to implement a ping server in Python, and I am going through Pyping's source code as a reference: https://github.com/Akhavi/pyping/blob/master/pyping/core.py
I am not able to understand the calculate_checksum function that has been implemented to calculate the checksum of the ICMP echo request. It has been implemented as follows:
def calculate_checksum(source_string):
    countTo = (int(len(source_string) / 2)) * 2
    sum = 0
    count = 0
    # Handle bytes in pairs (decoding as short ints)
    loByte = 0
    hiByte = 0
    while count < countTo:
        if (sys.byteorder == "little"):
            loByte = source_string[count]
            hiByte = source_string[count + 1]
        else:
            loByte = source_string[count + 1]
            hiByte = source_string[count]
        sum = sum + (ord(hiByte) * 256 + ord(loByte))
        count += 2
    # Handle last byte if applicable (odd-number of bytes)
    # Endianness should be irrelevant in this case
    if countTo < len(source_string): # Check for odd length
        loByte = source_string[len(source_string) - 1]
        sum += ord(loByte)
    sum &= 0xffffffff # Truncate sum to 32 bits (a variance from ping.c, which
                      # uses signed ints, but overflow is unlikely in ping)
    sum = (sum >> 16) + (sum & 0xffff) # Add high 16 bits to low 16 bits
    sum += (sum >> 16) # Add carry from above (if any)
    answer = ~sum & 0xffff # Invert and truncate to 16 bits
    answer = socket.htons(answer)
    return answer
sum &= 0xffffffff is used for truncating the sum to 32 bits. However, what happens to the extra bit (the 33rd bit)? Shouldn't that be added to the sum as a carry? Also, I am not able to understand the code after this.
I read the RFC1071 documentation (http://www.faqs.org/rfcs/rfc1071.html) that explains how to implement the checksum, but I haven't been able to understand much.
Any help would be appreciated. Thanks!
I was finally able to figure out the working of the calculate_checksum function, and I have tried to explain it below.
The checksum calculation is as follows (as per RFC1071):
Adjacent octets in the source_string are paired to form 16-bit
integers, and the 1's complement sum of these integers is
calculated. In case of odd number of octets, pairs are created out of the n-1 octets and added, and the remaining octet is added to the sum.
The resulting sum is truncated to 16 bits (carry bits are to be taken care of) and the checksum is calculated by taking its 1's complement. The final checksum should be 16 bits long.
Let's take an example.
If the checksum is to be computed over the sequence of octets [A, B, C, D, E], the pairs created would be [A, B] and [C, D], with the remaining octet E. The pairs [a, b] can be computed as follows:
a*256+b where a and b are the octets
Say if a is 11001010 and b is 00010001, a*256+b = 1100101000010001 thus giving us the concatenated results of the octets.
The 1's complement sum is thus computed as follows:
sum = [A,B] +' [C,D] +' E, where +' represents 1's complement addition
Now coming back to the code, everything before the line sum &= 0xffffffff calculates the 1's complement sum that we have calculated before.
sum &= 0xffffffff
is used for truncating the sum to 32 bits, although the sum exceeding 32 bits is unlikely in ping, as the size of the source_string is not very large
(source_string = header(8 bytes) + payload (variable length)).
sum = (sum >> 16) + (sum & 0xffff)
This piece of code is implemented for the case when the sum is greater than 16-bits. The sum is broken down into 2 parts:
(sum >> 16): the higher order 16-bits
(sum & 0xffff): the lower order 16-bits
and then these two parts are added. The final result can be 16 bits or greater than 16 bits.
sum += (sum >> 16)
This line is used in case the resulting sum from the previous computation is longer than 16-bits and is used to take care of the carry, similar to the previous line.
Finally, the 1's complement is calculated and truncated to 16 bits. The socket.htons() function converts the 16-bit result from host byte order to network byte order (big endian), so the bytes are sent in the correct order whether your device is little endian or big endian.
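For comparison, here is a compact Python 3 sketch of the same RFC 1071 checksum. It is not Pyping's code: it reads the 16-bit words directly in network (big-endian) order, so it needs neither the sys.byteorder branch nor socket.htons(), and it pads an odd trailing octet with a zero on the right as the RFC describes. The function name internet_checksum is made up for this example.

```python
def internet_checksum(data):
    """RFC 1071 checksum of a bytes object, as a 16-bit integer
    in network byte order."""
    total = 0
    # Sum adjacent octets as 16-bit big-endian words.
    for i in range(0, len(data) - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    # Odd length: the last octet is padded with a zero octet on the right.
    if len(data) % 2:
        total += data[-1] << 8
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    # The checksum is the 1's complement of the folded sum.
    return ~total & 0xFFFF
```

Python integers do not overflow, so no 32-bit truncation is needed here; the fold loop handles every carry, however large the sum gets. A receiver can verify a packet by running the same function over the data including the checksum field, which should give 0.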
Say I have a Float. I want the first 32 bits of the fractional part of this Float. Specifically, I am looking at getting this part of the SHA-256 pseudo-code working (from Wikipedia):
# Note 1: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating
# Note 2: All constants in this pseudo code are in big endian
# Initialize variables
# (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):
h[0..7] := 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
I naively tried doing floor (((sqrt 2) - 1) * 2^32), and then converting the returned integer to a Word32. This does not at all appear to be the right answer. I figured that by multiplying by 2^32, I was effectively left shifting it 32 places (after the floor). Obviously, not the case. Anyway, the long and the short of it is: how do I generate h[0..7]?
The best way to get h[0..7] is to copy the hex constants from the Wikipedia page. That way you know you will have the correct ones.
But if you really want to compute them:
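import Text.Printf (printf)

-- First 32 bits of the fractional part of sqrt x, as an Integer.
scaledFrac :: Integer -> Integer
scaledFrac x =
  let s = sqrt (fromIntegral x) :: Double
  in floor ((s - fromInteger (floor s)) * 2^32)

-- In GHCi (the String annotation fixes printf's return type):
[ printf "%x" (scaledFrac i) :: String | i <- [2,3,5,7,11,13,17,19] ]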
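If you would rather avoid floating point entirely, the same constants can be derived with exact integer arithmetic. The following Python sketch is an alternative to the Double-based version above, not a translation of it: it relies on math.isqrt (Python 3.8+), and the function name sqrt_frac_bits32 is invented for this example.

```python
import math

def sqrt_frac_bits32(n):
    """First 32 bits of the fractional part of sqrt(n), computed exactly.

    isqrt(n * 2**64) equals floor(sqrt(n) * 2**32); masking off the low
    32 bits discards the integer part of the square root.
    """
    return math.isqrt(n << 64) & 0xFFFFFFFF

primes = [2, 3, 5, 7, 11, 13, 17, 19]
h = [sqrt_frac_bits32(p) for p in primes]
print([format(v, "08x") for v in h])
```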
Need help with a hex editor and audio files. I am having trouble figuring out the formula to get the number of samples in my .wav files.
I downloaded StripWav, which tells me the number of samples in the .wav files, but I still cannot figure out the formula.
Can you please download these two .wav files, open them in a hex editor, and tell me the formula to get the number of samples?
If you would kindly do this for me, please tell me the number of samples for each .wav so I can make sure the formula is correct.
http://sinewavemultimedia.com/1wav.wav
http://sinewavemultimedia.com/2wav.wav
Here is the problem: I have two programs.
One reads the WAV data and the other shows the number of samples.
Here is the data:
RIFF 'WAVE' (wave file)
<fmt > (format description)
PCM format
2 channel
44100 frames per sec
176400 bytes per sec
4 bytes per frame
16 bits per sample
<data> (waveform data - 92252 bytes)
But the other program says NumSamples is
23,063 samples
/*******UPDATE*********/
One more thing: I did the calculation with 2 files.
This one is correct:
92,296 bytes and num samples is 23,063
But this other one is not coming out correctly. It is over 2 MB. I just subtracted 44 bytes; am I doing it wrong here? Here is the file size:
2,473,696 bytes
But the correct numsamples is
617,400
WAVE format
You must read the fmt header to determine the number of channels and bits per sample, then read the size of the data chunk to determine how many bytes of data are in the audio. Then:
NumSamples = NumBytes / (NumChannels * BitsPerSample / 8)
There is no simple formula for determining the number of samples in a WAV file. A so-called "canonical" WAV file consists of a 44-byte header followed by the actual sample data. So, if you know that the file uses 2 bytes per sample, then the number of samples is equal to the size of the file in bytes, minus 44 (for the header), and then divided by 2 (since there are 2 bytes per sample).
Unfortunately, not all WAV files are "canonical" like this. A WAV file uses the RIFF format, so the proper way to parse a WAV file is to search through the file and locate the various chunks.
Here is a sample (not sure what language you need to do this in):
http://msdn.microsoft.com/en-us/library/ms712835
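As a rough illustration of the chunk-walking approach, here is a Python sketch that locates the fmt and data chunks and divides the data size by the block alignment. It assumes a plain PCM file already loaded into memory, and the function name wav_frame_count is made up for this example. The value it returns is the count of sample frames, which is what the NumSamples formula above computes.

```python
import struct

def wav_frame_count(data):
    """Count sample frames in a RIFF/WAVE file given as bytes."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12                       # first chunk follows the 12-byte RIFF header
    block_align = None
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits
            fields = struct.unpack_from("<HHIIHH", data, body)
            block_align = fields[4]
        elif chunk_id == b"data":
            if block_align is None:
                raise ValueError("data chunk found before fmt chunk")
            return size // block_align
        pos = body + size + (size & 1)   # chunks are padded to even length
    raise ValueError("no data chunk found")
```

Because it skips over any chunk it does not recognize, this keeps working for files that carry extra chunks and therefore have more than 44 bytes of non-audio data, which is where the "file size minus 44" shortcut goes wrong.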
A WAVE's format chunk (fmt) has the 'bytes per sample frame' specified as wBlockAlign.
So: framesTotal = data.ck_size / fmt.wBlockAlign;
and samplesTotal = framesTotal * wChannels;
Thus, samplesTotal === framesTotal if and only if wChannels === 1!
Note how the above answer elegantly avoided explaining that the key equations in the spec (and answers based on them) are WRONG:
Consider, for example, a 2-channel wave with 12 bits per sample (12bps).
The spec explains we put each 12bps sample in a word:
note: t=point in time, chan = channel
+---------------------------+---------------------------+-----
| frame 1 | frame 2 | etc
+-------------+-------------+-------------+-------------+-----
| chan 1 # t1 | chan 2 # t1 | chan 1 # t2 | chan 2 # t2 | etc
+------+------+------+------+------+------+------+------+-----
| byte | byte | byte | byte | byte | byte | byte | byte | etc
+------+------+------+------+------+------+------+------+-----
So.. how many bytes does the sample-frame (BlockAlign) for a 2ch 12bps wave have according to spec?
<sarcasm> CEIL(wChannels * bps / 8) = 3 bytes.. </sarcasm>
Obviously the correct equation is: wBlockAlign=wChannels*CEIL(bps/8)
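The difference between the two equations is easy to check numerically. The following Python sketch simply evaluates both; the function names are invented for this example, and it assumes, as argued above, that each sample is padded to a whole number of bytes.

```python
import math

def block_align(channels, bits_per_sample):
    """Bytes per sample frame when each sample is padded to whole bytes."""
    return channels * math.ceil(bits_per_sample / 8)

def naive_block_align(channels, bits_per_sample):
    """The rounded-once variant criticized above, shown for comparison."""
    return math.ceil(channels * bits_per_sample / 8)

# 2 channels at 12 bits per sample: 4 bytes per frame versus 3.
print(block_align(2, 12), naive_block_align(2, 12))
```

For bit depths that are already a multiple of 8 the two functions agree, so the discrepancy only shows up with depths such as 12 or 20 bits.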