How to change the volume of a PCM data stream (failed experiment) - audio

My code was never before used for processing signed values and as such bytes -> short conversion was incorrectly handling the sign bit. Doing that properly solved the issue.
The question was...
I'm trying to change the volume of a PCM data stream. I can extract single channel data from a stereo file, do various silly experimental effects with the samples by skipping/duplicating them/inserting zeros/etc but I can't seem to find a way to modify actual sample values in any way and get a sensible output.
My attempts are really simple:
1. source audio data
2. values - 10000
3. values + 10000
4. values * 0.9
5. values * 1.1
(value = -value works fine -- reverses the wave and it sounds the same)
The code to do this is equally simple. The I/O originally used unsigned values in the range 0-65535; that was the problem, and reading properly signed values solved the issue.
Original (broken) version:
int sample = unsigned 16 bit value from a stream...
sample -= 32768;
sample = (int)(sample * 0.9f);
sample += 32768;
...write unsigned 16 bit value to a stream...
Fixed version:
int sample = *signed* 16 bit value from a stream...
sample = (int)(sample * 0.9f);
...write 16 bit value to a stream...
I'm trying to make the sample quieter. I'd imagine making the amplitude smaller (sample * 0.9) would result in a quieter file, but both 4. and 5. above are clearly invalid. There is a similar question on SO where MusiGenesis says he got correct results with 'sample *= 0.75' type of code (yes, I did experiment with other values besides 0.9 and 1.1).
The question is: am I doing something stupid or is the whole idea of multiplying by a constant wrong? I'd like the end result to be something like this:

Your 4th attempt is definitely the correct approach. Assuming your sample range is centered around 0, multiplying each sample by a constant is how you change the volume or gain of a signal.
In this case though, I'd guess something funny is happening behind the scenes when you multiply an int by a float and cast back to int. Hard to say without knowing what language you're using, but that might be what's causing the problem.
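As an illustration of the fix described in the question, here is a minimal C sketch. It assumes the stream holds little-endian signed 16-bit samples (as in a typical WAV file); the function names decode_s16le and apply_gain are hypothetical, and rounding plus clamping are design choices of this sketch rather than something the question specifies.

```c
#include <stdint.h>

/* Decode one little-endian signed 16-bit sample from two raw bytes.
   Assumption: low byte first, as in WAV PCM data. */
int16_t decode_s16le(uint8_t lo, uint8_t hi)
{
    int v = lo | (hi << 8);      /* unsigned view: 0..65535 */
    if (v >= 32768) {
        v -= 65536;              /* reinterpret the top bit as the sign */
    }
    return (int16_t)v;
}

/* Scale a signed sample by gain, rounding to nearest and clamping
   to the 16-bit range so that gains above 1.0 cannot wrap around. */
int16_t apply_gain(int16_t sample, float gain)
{
    float scaled = (float)sample * gain;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled > 32767.0f) return 32767;
    if (scaled < -32768.0f) return -32768;
    return (int16_t)scaled;
}
```

With a correctly signed sample, apply_gain(sample, 0.9f) makes the signal quieter; the clamp only matters for gains above 1.0.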


Vulkan - strange mapping of float shader color values to uchar values in a read buffer

I knew that the range of float color values in a shader, [0..1], is mapped to the range [0..255] in a UCHAR buffer.
Accordingly, I was expecting a step of 1/255 in the shader color value for each change in the UCHAR buffer.
But the results were surprisingly different. Here is for the first two steps:
Red float value in Shader -> UCHAR value in a read Buffer
0.000000 -> 0
0.002197 -> 0
0.002198 -> 1
0.006102 -> 1
0.006105 -> 2
The first two steps are around 0.002197 and 0.006102 which are different than the expected steps: 0.00392 and 0.00784.
So what is the mapping formula?
Unsigned integer normalization is based on the formula f = i/INT_MAX, where f is the floating point value (after clamping to [0, 1]), i is the integer value, and INT_MAX is the maximum integer value for the integer's bit depth (255 in this case).
So if you have a float, and want the unsigned, normalized integer value of it, you use i = f * INT_MAX. Of course... integers do not have the same precision as floats. So if the result of f * INT_MAX is 0.5, what is the integer value of that? It could be 0, or it could be 1, depending on how things are rounded.
Implementations are permitted to round integer values in any way they prefer. They are encouraged to use nearest rounding (the post-conversion 0.49 would become 0, and 0.5 would become 1), but that is not a requirement. The only requirements are that it must pick one of the two nearest values (it can't turn 0.5 into 3) and that the exact floating-point values of 0.0 and 1.0 (which includes any values clamped to them) must be exactly represented as integer 0 and INT_MAX.
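To make the "nearest rounding" case concrete, here is a small C sketch of an 8-bit UNORM conversion. The function name float_to_unorm8 is hypothetical, and this models only the encouraged round-to-nearest behaviour, not what any particular driver does.

```c
#include <stdint.h>

/* Convert a float to an 8-bit unsigned normalized value using
   round-to-nearest. Inputs are clamped to [0, 1] first, so 0.0 maps
   exactly to 0 and 1.0 maps exactly to 255. */
uint8_t float_to_unorm8(float f)
{
    if (!(f > 0.0f)) return 0;     /* negative values, zero and NaN */
    if (f >= 1.0f) return 255;
    return (uint8_t)(f * 255.0f + 0.5f);
}
```

Under this rule the transition from k to k+1 falls at (k + 0.5)/255 rather than at multiples of 1/255; an implementation that rounds differently will place its steps elsewhere.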
If you have an explicit need to have direct rounding, you can always do the normalization yourself. In fact, GLSL has specific functions to help you. The following assumes that you are trying to write to a texture with the Vulkan format R8G8B8A8_UNORM, and we're assuming you're writing to a storage image, not via outputs from the fragment shader (you can do that too, but you lose blending).
So, step 1 is to change your layout format to be r32ui. That is, you are now writing an unsigned 32-bit value, rather than 4 unsigned 8-bit normalized values. That's perfectly valid.
Step 2 is to employ the packUnorm4x8 function. This function does float-to-integer normalization, and the specification explicitly defines it to round to the nearest integer. Use the return value of that function in your imageStore function, and you're fine.
If you want to write to a fragment shader output, that's a bit more complex. There, you will need to use a different image view, one that uses the R32_UINT format. So you're creating a 32-bit unsigned integer view of a 4x8-bit normalized texture. That has to become a render target, so you're going to have to do subpass surgery. From there, just write the result of packUnorm4x8.
Of course, you immediately lose blending and similar operations, since you're writing integer values. And since you had to do that subpass surgery, it's likely that any shader writing to it will need to do this too.
Also, note that in both cases, you will likely need to adjust the order of the components of the value you write. packUnorm4x8 is explicitly defined to be little endian, whereas (I believe?) R8G8B8A8 is specified to be in that order, most-significant to least. So you'll probably need to essentially do endian swapping with packUnorm4x8(value.abgr).
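The bit layout that GLSL's packing function produces can be modelled on the CPU. The following C sketch is an emulation for illustration only; pack_unorm4x8 and to_unorm8 are hypothetical names, and ties are rounded up here, which is one of the choices the shader function is allowed to make.

```c
#include <stdint.h>

/* Clamp to [0, 1], scale by 255 and round to nearest. */
uint32_t to_unorm8(float f)
{
    if (!(f > 0.0f)) return 0;
    if (f >= 1.0f) return 255;
    return (uint32_t)(f * 255.0f + 0.5f);
}

/* CPU model of the packing: the first component lands in the least
   significant 8 bits, the last component in the most significant 8 bits. */
uint32_t pack_unorm4x8(float x, float y, float z, float w)
{
    return to_unorm8(x)
         | (to_unorm8(y) << 8)
         | (to_unorm8(z) << 16)
         | (to_unorm8(w) << 24);
}
```

Comparing the packed word with the byte order your image format expects is a quick way to decide whether a swizzle is needed before packing.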

How does Audacity mix audio samples?

So let's say I want to mix these 2 audio tracks:
In Audacity, I can use the "Mix and Render" option to mix them together, and I'll get this:
However, when I try to write my own code to mix, I get this:
This is essentially how I mix the samples:
private function mixSamples(sample1:UInt, sample2:UInt):UInt {
    return (sample1 + sample2) & 0xFF;
}
(The syntax is Haxe but it should be easy to follow if you don't know it.)
These are 8-bit sample audio files, and I want the product to be 8-bit as well, hence the & 0xFF.
I do understand that by simply adding the samples, I should expect clipping. My issue is that mixing in Audacity doesn't cause clipping (at least not to the extent that my code does), and by looking at the "tail" of the second (longer) track, it doesn't seem to reduce the amplitude. It doesn't sound any softer either.
So basically, my question is this: what's Audacity doing that I'm not? I want to mix tracks to sound exactly as if they're being played on top of one another, but I (obviously) don't want this horrendous clipping.
Here is what I get if I sign the values before I add, then unsign the sum value, as suggested by Radiodef:
As you can see it's much better than before, but is still quite distorted and noisy compared to the result Audacity produces. So my problem still stands, Audacity must be doing something differently.
I mixed the first track on itself, both with my code and Audacity, and compared the points where distortion occurs. This is Audacity's result:
And this is my result:
I think what is happening is you are summing them as unsigned. A typical sound wave is both positive and negative which is why they add together the way they do (some parts cancel). If you have some 8-bit sample that is -96 and another that is 96 and you sum them you will get 0. If what you have is unsigned audio you will instead have the samples 32 and 224 summed = 256 (offset and overflow).
What you need to do is sign them before summing. To sign 8-bit samples convert them to a signed int type and subtract 128 from all of them. I assume what you have are WAV files and you will need to unsign them again after the sum.
Audacity probably does floating point processing. I've heard some real dubious claims about floating point like that it has "infinite dynamic range" and garbage like that but it doesn't clip in the same determinate and obvious way as integers do. Floating point has a finite range of values same as integers but the largest and smallest values are much farther apart. (That's about the simplest way to put it.) Floating point can allow much greater amplitude changes in the audio but the catch is the overall signal to noise ratio is lower than integers.
With the weird distortion my best guess is it is from the mask you are doing with & 0xFF. If you want to actually clip instead of getting overflow you will need to do so yourself.
for (int i = 0; i < samplesLength; i++) {
    if (samples[i] > 127) {
        samples[i] = 127;
    } else if (samples[i] < -128) {
        samples[i] = -128;
    }
}
Otherwise say you have two samples that are 125, summing gets you 250 (11111010). Then you unsign (add 128) and get 378 (101111010). An & will get you 1111010 which is 122. Other numbers might get you results that are effectively negative or close to 0.
If you want to clip at something other than 8-bit, full scale for a bit depth n will be positive (2 ^ (n - 1)) - 1 and negative 2 ^ (n - 1) so for example 32767 and -32768 for 16-bit.
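Putting the steps above together (sign, sum, clip, unsign), a minimal C sketch for unsigned 8-bit samples could look like this. The function name mix_u8 is hypothetical; it assumes the usual 8-bit WAV convention where 128 is silence.

```c
#include <stdint.h>

/* Mix two unsigned 8-bit samples: convert to signed, sum, clip to the
   signed 8-bit range, then convert back to unsigned. */
uint8_t mix_u8(uint8_t a, uint8_t b)
{
    int sum = ((int)a - 128) + ((int)b - 128);   /* sign, then sum */
    if (sum > 127) {
        sum = 127;                               /* clip instead of wrapping */
    } else if (sum < -128) {
        sum = -128;
    }
    return (uint8_t)(sum + 128);                 /* unsign */
}
```

For the example in the text, mixing 253 with 253 (two signed samples of 125) yields 255 (clipped full scale) instead of the wrapped value 122 that the & 0xFF mask produces.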
Another thing you can do instead of clipping is to search for clipping and normalize. Something like:
double[] normalize(double[] samples, int length, int destBits) {
    double fsNeg = -Math.pow(2, destBits - 1);
    double fsPos = -fsNeg - 1;
    double peak = 0;
    double norm = 1;
    for (int i = 0; i < length; i++) {
        // find the highest clip if there is one
        if (samples[i] < fsNeg || samples[i] > fsPos) {
            norm = Math.abs(samples[i]);
            if (norm > peak) {
                peak = norm;
            }
        }
    }
    if (peak != 0) {
        // ratio to reduce to where there is not a clip;
        // fsPos is used so that a positive peak also fits
        norm = fsPos / peak;
        for (int i = 0; i < length; i++) {
            samples[i] *= norm;
        }
    }
    return samples;
}
It's a lot simpler than you think; although your original files are 8-bit, Audacity handles them internally as 32-bit floating point. You can see this in the screenshot, in the information panel to the left of each track. This means that adding 2 tracks together means adding two floating point samples at each point, and will simply yield sample values from -2.0 to +2.0, which are then clamped to the -1 to +1 range. By comparison, adding two 8-bit integers together will yield another 8-bit number where the value overflows and wraps around. (This can apply whether you use signed or unsigned values.)
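The floating-point route described above can be sketched in C as follows. This is not Audacity's actual code; u8_to_float, mix_float and float_to_u8 are hypothetical names, and the scaling by 128 is an assumption about how 8-bit samples map to the -1 to +1 range.

```c
#include <stdint.h>

/* Map an unsigned 8-bit sample (128 = silence) to roughly [-1, +1). */
float u8_to_float(uint8_t s)
{
    return ((float)s - 128.0f) / 128.0f;
}

/* Add two float samples and clamp the sum to [-1, +1]. */
float mix_float(float a, float b)
{
    float sum = a + b;
    if (sum > 1.0f) return 1.0f;
    if (sum < -1.0f) return -1.0f;
    return sum;
}

/* Map a float sample back to unsigned 8-bit, clamping at full scale. */
uint8_t float_to_u8(float f)
{
    float v = f * 128.0f + 128.0f;
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}
```

Because the sum is clamped rather than wrapped, loud passages flatten at full scale instead of jumping to the opposite end of the range.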

Apple's heart rate monitoring example and byte order of bluetooth heart rate measurement characteristics

On the heart rate measurement characteristics:
Link is now at
and look for "heart rate measurement".
They no longer offer an XML viewer, but instead you need to view XML directly.
Also for services it's on this page.
I want to make sure I'm reading it correctly. Does that actually say 5 fields? The mandatory, C1, C2, C3, and C4? And the mandatory is at the first byte, and C4 is at the last two bytes, C1 and C2 are 8-bit fields, and C3 to C4 are 16-bit each. That's a total of 8 bytes. Am I reading this document correctly?
I'm informed that if a bit in the mandatory flags field is 0, the corresponding field is simply not there. For example, if the first bit is 0, C1 is the next field; if it is 1, C2 follows instead.
In Apple's OSX heart rate monitor example:
- (void) updateWithHRMData:(NSData *)data
{
    const uint8_t *reportData = [data bytes];
    uint16_t bpm = 0;
    if ((reportData[0] & 0x01) == 0) {
        /* uint8 bpm */
        bpm = reportData[1];
    } else {
        /* uint16 bpm */
        bpm = CFSwapInt16LittleToHost(*(uint16_t *)(&reportData[1]));
    }
    ... // I ignore rest of the code for simplicity
}
It checks the first bit as zero, and if it isn't, it's changing the little endianness to whatever the host byte order is, by applying CFSwapInt16LittleToHost to reportData[1].
How does that bit checking work? I'm not entirely certain of endianness. Is it saying that whether it's little or big, the first byte is always the mandatory field, the second byte is the C1, etc? And since reportData is an 8-bit pointer (typedef to unsigned char), it's checking either bit 0 or bit 8 of the mandatory field.
If that bit is bit 8, the bit is reserved for future use, why is it reading in there?
If that bit is 0, it's little-endian and no transformation is required? But if it's little-endian, the first bit could be 1 according to the spec, 1 means "Heart Rate Value Format is set to UINT16. Units: beats per minute (bpm)", couldn't that be mis-read?
I don't understand how it does the checking.
I kept on saying there was C5, that was a blunder. It's up to C4 only and I edited above.
Am I reading this document correctly?
IMHO, you are reading it a little wrong.
C1 to C4 should be read as Conditional 1 to Conditional 4. And in the table for org.bluetooth.characteristic.heart_rate_measurement, if the lowest bit of the flag byte is 0, then C1 is met, otherwise, C2 is.
You can think of it as a run-time configurable union type in the C programming language, where the active member is determined by the flag. (Beware that this is not the whole picture, because the situation is complicated by C3 and C4.)
// Note: this struct is only for you to better understand a simplified case.
// You should still stick to the profile documentations to implement.
typedef struct {
    uint8_t flag;
    union {
        uint8_t bpm1;
        uint16_t bpm2;
    };
} hrm_data_t; // type name is illustrative only
How does that bit checking work?
if ((reportData[0] & 0x01) == 0) effectively checks the lowest bit of the flag byte with the bitwise AND operator. Consult an introductory C/C++ book if in doubt.
The first byte is always the flag, in this case. The value of the flag dynamically determines how the rest of the bytes should be dealt with. C3 and C4 are both optional, and can be omitted if the corresponding bits in the flag are set to zero. C1 and C2 are mutually exclusive.
There is no endianness ambiguity in the Bluetooth standard, as it clearly states that little-endian should be used all the time. You should always assume that those uint16_t fields are transferred as little endian. Apple's precaution is just to ensure maximum portability of the code, since they would not guarantee the endianness of architectures used in their future products.
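The same parsing logic can be written in portable C without relying on the host byte order. This is a minimal sketch under the assumptions stated above (flags in byte 0, bit 0 selecting the 8-bit or 16-bit little-endian heart rate value); the function name parse_heart_rate_bpm and the -1 error value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract the heart rate value from a Heart Rate Measurement payload.
   Returns the bpm value, or -1 if the buffer is too short. */
int parse_heart_rate_bpm(const uint8_t *data, size_t len)
{
    if (data == NULL || len < 2) {
        return -1;
    }
    if ((data[0] & 0x01) == 0) {
        return data[1];                  /* uint8 bpm */
    }
    if (len < 3) {
        return -1;
    }
    /* uint16 bpm, always little-endian on the wire: assembling it
       byte by byte gives the right value on any host. */
    return data[1] | (data[2] << 8);
}
```

Assembling the 16-bit value from individual bytes plays the same role as CFSwapInt16LittleToHost: on a little-endian host it is a no-op, on a big-endian host it swaps.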
I see how it goes. It's not testing for Endianness. Rather, it's testing for whether the field is 8 bit or 16 bit, and in the case of 16 bit, it'll convert from little endianness to host order. But I see that before conversion and after conversion it's the same number. So I guess the system is little endian to begin with so I don't know what's the point.

How to assign a float to a int64_t

To upload a score to the game center, they require you to have a value which is of type int64_t.
Is there a way to simply convert my float to int64_t?
I have built my whole game around the score and I need an easy solution. Any ideas?
I'll take a stab at this. Someone else's answer (from )
Every score has to be submitted as an int64. So you need to convert your float to match the setting of your leaderboard. So with a fixed 3
dp you need to multiply the float by 1000 to get the 3rd dp into the
int - then submit.
int64_t gcScore = (int64_t)(score * 1000.0f);
gkscore.value = gcScore;
With some rounding coming into play it is important to make sure what
gets submitted is what has also been shown to the player - we had some
problems of being 1 out on the GC and in game display at times - just
had to go through every display & conversion of the score values to
make sure they took care to display properly.
All the changes to the leaderboard settings take a while it seems, and
submitted scores can often not show up for a while too. The Game
Center sandbox is pretty awful to be honest. Once you go live it is
better in responding to new scores, but you can't make any changes to
the leaderboard format so you have to persevere in the Sandbox to get
it right.
If your floats are limited in size (less than 9,223,372,036,854,775,807), the conversion is
int64_t myInt = (int64_t) myFloat;
If you want to "save" decimals, you can scale results (multiply the float with 10 or 100):
int64_t myInt_scaled = (int64_t) (myFloat * 100.0);
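You will need to check for overflow and then cast the float to an int64_t. The cast truncates the number, i.e. 5.655 becomes 5. Note that in C and Objective-C an out-of-range cast does not throw an exception (the behavior is undefined), so check the range before casting:
if (floatScore >= -9223372036854775808.0f && floatScore < 9223372036854775808.0f) {
    score = (int64_t)floatScore;
} else {
    // Print error
}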
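Combining the scaling and the range check from the answers above, a minimal C sketch might look like this. The function name score_to_int64 is hypothetical; rounding to nearest (instead of truncating) and saturating on overflow are design choices of this sketch, not something Game Center requires.

```c
#include <stdint.h>

/* Convert a float score to a fixed-point int64_t with the given number
   of decimal places. Rounds to nearest; saturates at the int64_t limits
   and returns 0 for NaN. */
int64_t score_to_int64(float score, int decimals)
{
    double scaled = (double)score;
    for (int i = 0; i < decimals; i++) {
        scaled *= 10.0;                       /* shift decimals into the integer part */
    }
    if (scaled != scaled) {
        return 0;                             /* NaN */
    }
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;   /* round to nearest */
    if (scaled >= 9223372036854775808.0) {
        return INT64_MAX;
    }
    if (scaled < -9223372036854775808.0) {
        return INT64_MIN;
    }
    return (int64_t)scaled;
}
```

Whatever rounding rule you pick, use the same one for the in-game display so that the submitted value matches what the player saw.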

Libsox encoding

Why do I get distorted output if I convert a wav file using libsox with the following code?
in->encoding.encoding = SOX_ENCODING_UNSIGNED;
in->encoding.bits_per_sample = 8;
The input file has bits_per_sample = 16.
So you're saying that you tell SOX to read a 16 bit sample WAV file as an 8 bit sample file? Knowing nothing about SOX, I would expect it to read each 16 bit sample as two 8 bit samples... the high order byte and the low order byte like this: ...HLHLHLHLHL...
For simplicity, we'll call high order byte samples 'A' samples. 'A' samples carry the original sound with less dynamic range, because the low order byte with the extra precision has been chopped off.
We'll call the low order byte samples "B samples." These will be roughly random and encode noise.
So, as a result we'll have the original sound, the 'A' samples, shifted down in frequency by a half. This is because there's a 'B' sample between every 'A' sample which halves the rate of the 'A' samples. The 'B' samples add noise to the original sound. So we'll have the original sound, shifted down by a half, with noise.
Is that what you're hearing?
Edit: Guest commented that the goal is to downconvert a WAV to 8 bit audio. Reading the manpage for SoX, it looks like SoX always uses 32 bit audio in memory as a result of sox_read(). Passing it a format will only make it attempt to read from that format.
To downconvert in memory, use SOX_SAMPLE_TO_SIGNED_8BIT or SOX_SAMPLE_TO_UNSIGNED_8BIT from sox.h, i.e.:
sox_format_t *ft = sox_open_read("/file/blah.wav", NULL, NULL);
if (ft) {
    sox_ssample_t buffer[100];
    /* sox_read takes the number of samples, not the number of bytes */
    sox_size_t amt = sox_read(ft, buffer, sizeof(buffer) / sizeof(buffer[0]));
    if (amt > 0) {
        char sample8 = SOX_SAMPLE_TO_SIGNED_8BIT(buffer[0], ft->clips);
    }
}
To output a downconverted file, use the 8 bit format when writing instead of when reading.
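For readers who want to see what such a downconversion does, here is a plain-C illustration that does not use libsox at all. It assumes full-scale signed 32-bit samples, as the answer describes for SoX's in-memory format; sample32_to_s8 and sample32_to_u8 are hypothetical names, and the rounding and clip counting only approximate what the library macros are described as doing.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a signed 32-bit sample to signed 8-bit by keeping the top
   8 bits, rounding to nearest. If rounding would exceed +127 the value
   is clipped and *clips (if not NULL) is incremented. */
int8_t sample32_to_s8(int32_t s, unsigned *clips)
{
    int64_t step = (int64_t)1 << 24;          /* one 8-bit step in 32-bit units */
    int64_t r = (int64_t)s + (1 << 23);       /* add half a step for rounding */
    /* floor division that is well defined for negative values */
    int64_t q = (r >= 0) ? r / step : -((-r + step - 1) / step);
    if (q > 127) {
        q = 127;
        if (clips != NULL) {
            (*clips)++;
        }
    }
    return (int8_t)q;
}

/* Unsigned 8-bit variant: same value shifted so that 128 is silence. */
uint8_t sample32_to_u8(int32_t s, unsigned *clips)
{
    return (uint8_t)(sample32_to_s8(s, clips) + 128);
}
```

The key point matches the answer: the 16-bit file is decoded to full-precision samples first, and only then reduced to 8 bits, rather than being reinterpreted byte by byte.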
