Mixing multiple sound clips - audio

I'm trying to mix six sound clips together.
Imagine each clip is a single guitar string pluck sound and I want to mix them to produce a guitar chord.
Here, a clip is an array of real numbers in the range [-1,1], where each number is a mono sample.
double mixed_sample = mix(double sample1, ..., double sample6);
Please implement mix!

You have got to be kidding.
Mixing is simple addition of signals.
double mix(double s1, double s2, double s3, double s4, double s5, double s6)
{
    return (s1 + s2 + s3 + s4 + s5 + s6);
}
The next step is to provide individual channel gains.
double variable_mix(double s1, double s2, double s3, double s4, double s5, double s6,
                    double g1, double g2, double g3, double g4, double g5, double g6)
{
    return (s1*g1 + s2*g2 + s3*g3 + s4*g4 + s5*g5 + s6*g6);
}
Of course, this is kind of a pain in the ass to code, and the parameter-passing overhead will eat you alive, but this is basically what you have to do.
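The same idea, sketched in Python for an arbitrary number of channels (the list-based mix helper is my own invention, not from the answer above):

```python
def mix(samples, gains=None):
    """Mix one sample from each clip: a plain (optionally weighted) sum."""
    if gains is None:
        gains = [1.0] * len(samples)
    return sum(s * g for s, g in zip(samples, gains))

# six simultaneous string samples; note the sum can leave [-1, 1],
# so gains may need to be scaled down before writing the output
chord_sample = mix([0.1, -0.2, 0.3, 0.05, -0.1, 0.15])
```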

Robust linear interpolation

Given two segment endpoints A and B (in two dimensions), I would like to perform linear interpolation based on a value t, i.e.:
C = A + t(B-A)
In the ideal world, A, B and C should be collinear. However, we are operating with limited floating-point here, so there will be small deviations. To work around numerical issues with other operations I am using robust adaptive routines originally created by Jonathan Shewchuk. In particular, Shewchuk implements an orientation function orient2d that uses adaptive precision to exactly test the orientation of three points.
Here is my question: is there a known procedure by which the interpolation can be computed using floating-point math, so that the result lies exactly on the line between A and B? I care less about the accuracy of the interpolation itself and more about the resulting collinearity. In other words, it's OK if C is shifted around a bit, as long as collinearity is satisfied.
The bad news
The request can't be satisfied. There are values of A and B for which, other than t = 0 and t = 1, NO value of t makes lerp(A, B, t) exactly representable as a float.
A trivial example in single precision is x1 = 12345678.f and x2 = 12345679.f. Regardless of the values of y1 and y2, the required result must have an x component between 12345678.f and 12345679.f, and there's no single-precision float between these two.
The (sorta) good news
The exact interpolated value, however, can be represented as the sum of 5 floating-point values (vectors in the case of 2D): one for the formula's result, one for the error in each operation [1] and one for multiplying the error by t. I'm not sure if that will be useful to you. Here's a 1D C version of the algorithm in single precision that uses fused multiply-add to calculate the product error, for simplicity:
#include <math.h>

float exact_sum(float a, float b, float *err)
{
    float sum = a + b;
    float z = sum - a;
    *err = a - (sum - z) + (b - z);
    return sum;
}

float exact_mul(float a, float b, float *err)
{
    float prod = a * b;
    *err = fmaf(a, b, -prod);
    return prod;
}

float exact_lerp(float A, float B, float t,
                 float *err1, float *err2, float *err3, float *err4)
{
    float diff = exact_sum(B, -A, err1);
    float prod = exact_mul(diff, t, err2);
    *err1 = exact_mul(*err1, t, err4);
    return exact_sum(A, prod, err3);
}
In order for this algorithm to work, operations need to conform to IEEE-754 semantics in round-to-nearest mode. That's not guaranteed by the C standard, but the GNU gcc compiler can be instructed to do so, at least in processors supporting SSE2 [2][3].
It is guaranteed that the arithmetic addition of (result + err1 + err2 + err3 + err4) will be equal to the desired result; however, there is no guarantee that the floating-point addition of these quantities will be exact.
To use the above example, exact_lerp(12345678.f, 12345679.f, 0.300000011920928955078125f, &err1, &err2, &err3, &err4) returns a result of 12345678.f and err1, err2, err3 and err4 are 0.0f, 0.0f, 0.300000011920928955078125f and 0.0f respectively. Indeed, the correct result is 12345678.300000011920928955078125 which can't be represented as a single-precision float.
A more convoluted example: exact_lerp(0.23456789553165435791015625f, 7.345678806304931640625f, 0.300000011920928955078125f, &err1, &err2, &err3, &err4) returns 2.3679010868072509765625f and the errors are 6.7055225372314453125e-08f, 8.4771045294473879039287567138671875e-08f, 1.490116119384765625e-08f and 2.66453525910037569701671600341796875e-15f. These numbers add up to the exact result, which is 2.36790125353468550173374751466326415538787841796875 and can't be exactly stored in a single-precision float.
All numbers in the examples above are written using their exact values, rather than a number that approximates to them. For example, 0.3 can't be represented exactly as a single-precision float; the closest one has an exact value of 0.300000011920928955078125 which is the one I've used.
It might be possible that if you calculate err1 + err2 + err3 + err4 + result (in that order), you get an approximation that is considered collinear in your use case. Perhaps worth a try.
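As a sanity check, the same scheme can be reproduced in Python (which uses doubles rather than floats), with Fraction standing in for fmaf to obtain the exact product error; this assumes no overflow or underflow. The identity result + err1 + err2 + err3 + err4 = A + t*(B - A) then holds exactly:

```python
from fractions import Fraction

def exact_sum(a, b):
    # TwoSum: exact under IEEE-754 round-to-nearest
    s = a + b
    z = s - a
    return s, (a - (s - z)) + (b - z)

def exact_mul(a, b):
    # Exact product error via rational arithmetic (stands in for fmaf)
    p = a * b
    return p, float(Fraction(a) * Fraction(b) - Fraction(p))

def exact_lerp(A, B, t):
    diff, e1 = exact_sum(B, -A)
    prod, e2 = exact_mul(diff, t)
    e1, e4 = exact_mul(e1, t)
    res, e3 = exact_sum(A, prod)
    return res, e1, e2, e3, e4

res, e1, e2, e3, e4 = exact_lerp(0.2, 7.3, 0.3)
exact = Fraction(0.2) + Fraction(0.3) * (Fraction(7.3) - Fraction(0.2))
assert sum(map(Fraction, (res, e1, e2, e3, e4))) == exact
```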
References
[1] Graillat, Stef (2007). Accurate Floating Point Product and Exponentiation.
[2] Enabling strict floating point mode in GCC
[3] Semantics of Floating Point Math in GCC

Spectrogram of two audio files (Added together)

Assume for a moment I have two input signals f1 and f2. I could add these signals to produce a third signal f3 = f1 + f2. I would then compute the spectrogram of f3 as log(|stft(f3)|^2).
Unfortunately I don't have the original signals f1 and f2. I have, however, their spectrograms A = log(|stft(f1)|^2) and B = log(|stft(f2)|^2). What I'm looking for is a way to approximate log(|stft(f3)|^2) as closely as possible using A and B. If we do some math we can derive:
log(|stft(f1 + f2)|^2) = log(|stft(f1) + stft(f2)|^2)
expressing stft(f1) = x1 + i * y1 and stft(f2) = x2 + i * y2, we can write
... = log(|x1 + i * y1 + x2 + i * y2|^2)
... = log((x1 + x2)^2 + (y1 + y2)^2)
... = log(x1^2 + x2^2 + y1^2 + y2^2 + 2 * (x1 * x2 + y1 * y2))
... = log(|stft(f1)|^2 + |stft(f2)|^2 + 2 * (x1 * x2 + y1 * y2))
So at this point I could use the approximation:
log(|stft(f3)|^2) ~ log(exp(A) + exp(B))
but that would ignore the cross term 2 * (x1 * x2 + y1 * y2). So my question is: is there a better approximation for this?
Any ideas? Thanks.
I'm not 100% sure I understand your notation, but I'll give it a shot.

Addition in the time domain corresponds to addition in the frequency domain. Adding two time-domain signals x1 and x2 produces a third time-domain signal x3. x1, x2 and x3 all have frequency-domain spectra, F(x1), F(x2) and F(x3). F(x3) is equal to F(x1) + F(x2), where the addition is performed by adding the real parts of F(x1) to the real parts of F(x2) and the imaginary parts of F(x1) to the imaginary parts of F(x2). So if x1[0] is 1+0j and x2[0] is 0.5+0.5j, the sum is 1.5+0.5j, whose magnitude is sqrt(1.5*1.5 + 0.5*0.5) = sqrt(2.5).

Judging from your notation, you are trying to add the magnitudes, which with this example would be |1+0j| + |0.5+0.5j| = sqrt(1*1) + sqrt(0.5*0.5 + 0.5*0.5) = 1 + sqrt(0.5). Obviously not the same thing. I think you want something like this:
log(|stft(a) + stft(b)|^2) ~ log(|stft(a)|^2 + |stft(b)|^2) = log(exp(A) + exp(B))
Take the exp() of the two log magnitudes, add them, then take the log of the sum.
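To make the cross term concrete, here is a single-bin toy example in Python (the bin values are made up; no actual STFT is computed):

```python
import math

z1, z2 = complex(1.0, 0.0), complex(0.5, 0.5)    # one STFT bin of f1 and f2
A = math.log(abs(z1) ** 2)
B = math.log(abs(z2) ** 2)

exact = math.log(abs(z1 + z2) ** 2)              # log(|stft(f3)|^2) for this bin
approx = math.log(math.exp(A) + math.exp(B))     # drops the cross term
cross = 2 * (z1.real * z2.real + z1.imag * z2.imag)
# exact == log(|z1|^2 + |z2|^2 + cross); approx omits cross
```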
Stepping back from the math for a minute, we can see that at a fundamental level, this just isn't possible.
Consider a 1st signal f1 that is a pure tone at frequency F and amplitude A.
Consider a 2nd signal f2 that is a pure tone at frequency F and amplitude A, but perfectly out of phase with f1.
In this case, the spectrograms of f1 & f2 are identical.
Now consider two possible combined signals.
f1 added to itself is a pure tone at frequency F and amplitude 2A.
f1 added to f2 is complete silence.
From the spectrograms of f1 and f2 alone (which are identical), you've no way to know which of these very different situations you're in. And this doesn't just hold for pure tones. Any signal and its reflection about the axis suffer the same problem. Generalizing even further, there's just no way to know how much your underlying signals cancel and how much they reinforce each other. That said, there are limits. If, for a particular frequency, your underlying signals had amplitudes A1 and A2, the biggest possible amplitude is A1+A2 and the smallest possible is abs(A1-A2).
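The limits in the last sentence are easy to check numerically; in Python, sweeping the (unknown in practice) relative phase for a single bin:

```python
import cmath
import math

A1, A2 = 1.0, 0.7                       # per-bin magnitudes of the two signals
for phase in (0.0, math.pi / 3, math.pi / 2, math.pi):
    z1 = complex(A1, 0.0)
    z2 = A2 * cmath.exp(1j * phase)     # same magnitude, shifted phase
    combined = abs(z1 + z2)
    # the combined magnitude always lies between |A1 - A2| and A1 + A2
    assert abs(A1 - A2) - 1e-12 <= combined <= A1 + A2 + 1e-12
```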

Lua: color fading function

I'm trying to create a function that takes two RGB colors and a percentage, then returns a color in between the two based on the percentage.
I found the Dec2Hex function somewhere online and figured it would be useful.
Right now I have tried:
function Dec2Hex(nValue) -- http://www.indigorose.com/forums/threads/10192-Convert-Hexadecimal-to-Decimal
    if type(nValue) == "string" then
        nValue = String.ToNumber(nValue);
    end
    nHexVal = string.format("%X", nValue); -- %X returns uppercase hex, %x gives lowercase letters
    sHexVal = nHexVal.."";
    if nValue < 16 then
        return "0"..tostring(sHexVal)
    else
        return sHexVal
    end
end
function fade_RGB(colour1, colour2, percentage)
    r1, g1, b1 = string.match(colour1, "#([0-9A-F][0-9A-F])([0-9A-F][0-9A-F])([0-9A-F][0-9A-F])")
    r2, g2, b2 = string.match(colour2, "#([0-9A-F][0-9A-F])([0-9A-F][0-9A-F])([0-9A-F][0-9A-F])")
    r3 = (tonumber(r1, 16)/tonumber(r2, 16))*(percentage)
    g3 = (tonumber(g1, 16)/tonumber(g2, 16))*(percentage)
    b3 = (tonumber(b1, 16)/tonumber(b2, 16))*(percentage)
    return "#"..Dec2Hex(r3)..Dec2Hex(g3)..Dec2Hex(b3)
end
I think I'm headed in the right direction but the math isn't right and I can't figure out how to fix it. Thanks in advance!
No Name's answer is almost right, but he's not merging the two colors based on the percentage.
What you want instead is a linear interpolation of the two values (note that, in terms of human vision and the physics of light, this isn't how color interpolation really works, but many libraries do it this way because it is easy and good enough for simple cases).
r3 = tonumber(r1, 16)*(100-percentage)/100.0 + tonumber(r2, 16)*(percentage)/100.0
As you may notice multiplying and dividing the percentages by 100 is kind of tedious, so you may want to pass it in already divided.
If I'm right, the line
r3 = (tonumber(r1, 16)/tonumber(r2, 16))*(percentage)
should be
r3 = math.abs(tonumber(r1, 16) - tonumber(r2, 16))*(percentage/100)
The other similar lines follow the same concept.
EDIT:
r3 = tonumber(r1, 16) +
     (tonumber(r2, 16) - tonumber(r1, 16)) * (percentage/100)
should yield red for fade_RGB("#FF0000", "#0000FF", 0) and blue for fade_RGB("#FF0000", "#0000FF", 100).
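For comparison, the whole fade can be sketched in Python with the standard per-channel lerp (the function name and the rounding choice are mine):

```python
def fade_rgb(colour1, colour2, percentage):
    """Linearly interpolate two '#RRGGBB' colours; percentage in [0, 100]."""
    t = percentage / 100.0
    c1 = [int(colour1[i:i + 2], 16) for i in (1, 3, 5)]   # parse R, G, B
    c2 = [int(colour2[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(a + (b - a) * t) for a, b in zip(c1, c2)]
    return "#%02X%02X%02X" % tuple(mixed)
```

At 0 it returns the first colour unchanged, and at 100 the second.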

Un/pack additional set of UV coordinates into a 32bit RGBA field

I'm modding a game called Mount&Blade, currently trying to implement lightmapping through custom shaders.
As the in-game format doesn't allow more than one UV map per model, and I need to carry the info of a second, non-overlapping parametrization somewhere, a field of four uints (RGBA, used for per-vertex coloring) is my only possibility.
At first I thought about just using U,V = R,G but the precision isn't good enough.
Now I'm trying to encode them with the maximum precision available, using two fields (16bit) per coordinate. Snip of my Python exporter:
def decompose(a):
    a = int(a*0xffff)    # fill the entire range to get the maximum precision
    aa = (a&0xff00)>>8   # decompose the first half and save it as an 8bit uint
    ab = (a&0x00ff)      # decompose the second half
    return aa, ab

def compose(na, nb):
    return (na<<8|nb)/0xffff
I'd like to know how to do the second part (composing, or unpacking it) in HLSL (DX9, shader model 2.0). Here's my try; it's close, but doesn't work:
//compose UV from n=(na<<8|nb)/0xffff
float2 thingie = float2(
    float( ((In.Color.r*255.f)*256.f) +
           (In.Color.g*255.f) ) / 65535.f,
    float( ((In.Color.b*255.f)*256.f) +
           (In.Color.w*255.f) ) / 65535.f
);

//sample the lightmap at that position
Output.RGBColor = tex2D(MeshTextureSamplerHQ, thingie);
Any suggestion or ingenious alternative is welcome.
Remember to normalize aa and ab after you decompose a.
Something like this:
(u1, u2) = decompose(u)
(v1, v2) = decompose(v)
color.r = float(u1) / 255.f
color.g = float(u2) / 255.f
color.b = float(v1) / 255.f
color.a = float(v2) / 255.f
The pixel shader:
float2 texc;
texc.x = (In.Color.r * 256.f + In.Color.g) / 257.f;
texc.y = (In.Color.b * 256.f + In.Color.a) / 257.f;
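The whole round trip (pack in Python, quantize to 8-bit channels, then apply the shader-side arithmetic) can be checked outside the engine; the reconstruction error stays below one part in 65535:

```python
def pack(u):
    """Quantize u in [0, 1] to 16 bits and split into two 8-bit channels."""
    a = int(u * 0xffff)
    return (a >> 8) & 0xff, a & 0xff

def unpack(hi, lo):
    """Shader-side math: channels arrive normalized as hi/255 and lo/255."""
    r, g = hi / 255.0, lo / 255.0
    return (r * 256.0 + g) / 257.0   # == (hi*256 + lo)/65535, since 255*257 == 65535

u = 0.123456
hi, lo = pack(u)
err = abs(unpack(hi, lo) - u)        # bounded by 1/65535
```

The divide by 257 works because 255 * 257 = 65535, which folds the per-channel normalization back into the 16-bit range.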

Optimizing parameter input for functions

I have a function that has a large number of individual input parameters, and the function is also run hundreds of millions of times during the program.
If I wanted to optimize this function, should I create a new data structure that holds all the input parameters and pass it to the function by reference, instead of passing each parameter individually? Or would it not matter, because the compiler is smart enough to deal with this in an even more efficient manner?
In general, it's much better to pass a data structure that contains your variables. This isn't pretty to look at or use:
void f(int a, int b, int c, int d, int e, int f)
{
    // do stuff
}
This is much nicer:
void f(Params p)
{
    // do stuff with p
}
You may want to pass by reference, so the compiler can just pass a reference to the object and not copy the entire data structure.
As a real example:
double distance(double x1, double y1, double z1, double x2, double y2, double z2)
{
    double dx = x1 - x2;
    double dy = y1 - y2;
    double dz = z1 - z2;
    return sqrt(dx*dx + dy*dy + dz*dz);
}
It would be better if we encapsulated our (x, y, z) into a data structure, though:
struct Point
{
    double x;
    double y;
    double z;
};

double distance(const Point &p1, const Point &p2)
{
    double dx = p1.x - p2.x;
    double dy = p1.y - p2.y;
    double dz = p1.z - p2.z;
    return sqrt(dx*dx + dy*dy + dz*dz);
}
Much cleaner code, and you get the added bonus that it could* perform better (*Depending on how smart your compiler is at optimizing either version).
Obviously this could vary wildly depending on what you're actually trying to accomplish, but if you have several (4+) variables that have a similar usage in a certain context, it may be better to just pass them in a data structure.
Are the arguments mostly constant, or do most of them change on every call? You don't want to evaluate arguments many times if you could only do them once.
Keep in mind what the compiler does with arguments.
It evaluates each one and pushes it on the stack. Then the function is entered, and it refers to those arguments by their offset in the stack. So it is basically the same as if you put the arguments into a block and passed the block. However, if you build the block yourself, you may be able to re-use old values and only evaluate the ones you know have changed.
In any case, you really have to look at how much work goes on inside the function relative to the time spent passing arguments to it. It's irrelevant that you're calling it 10^8 times without knowing over what overall time span. That could be 10ns per call, or 10ms per call. If the latter, nearly all the time is spent inside the function, so it probably doesn't make much difference how you call it.
