How to implement an instrument/score pattern in SuperCollider?

I've been through a few of the tutorials, but none of them seem to get at what, in my opinion, is a sensible architecture:
There are one or more Instrument instances,
There is a Score which defines a set of Note objects,
A Player class (maybe function) that routes the Note instances from the score to the instruments so that music is produced.
What I see in this pattern, but haven't seen in the examples I've read so far, is (a) the total separation between the score and the instruments and (b) explicit definition (in the form of a class and/or API) of the Note objects that tell the instruments what to do.
Are there built-in utilities that support this type of operating pattern?
Is this an un-Smalltalk-y way of thinking about the problem?

I'm not sure exactly what you want, given that you've looked at the examples. The odd bit is the "total separation" requirement; usually a score needs to make some assumptions about what parameters are relevant to what instruments - although there are enough introspective methods in SynthDef that a program could make educated guesses.
But the basic schematic is pretty standard: SynthDef defines instruments, Collection and its subclasses store data, Routine and other classes can interpret data structures in scheduled time to make music.
At the bottom I'm pasting some boilerplate code for a very simple C-like approach to such a structure, using SynthDef, Routine, and Array. Which instrument to use is arbitrarily chosen at note generation time, and the "score" is instrument-agnostic.
However, the idiomatic approach in SC is to use Patterns and Events, and the Pbind class in particular. Personally I find these a little restrictive and verbose, but they'll certainly do what you ask. Check out the "Streams-Patterns-Events" series of helpfiles.
And various people have written third-party extensions like Instr and Voicer to accommodate their own flavors of the score-instrument model. Check out the Quarks listing or consider rolling your own?
s = Server.local.boot;
s.waitForBoot{ Routine {
    /// in a "real" patch, I'd make these local variables,
    /// but in testing it's convenient to use environment variables.
    // var inst, tclock, score, playr, switchr;

    // the current instrument
    ~inst = \ding;
    // a fast TempoClock
    ~tclock = TempoClock.new(8);

    // two instruments that take the same arguments
    SynthDef.new(\ding, {
        arg dur=0.2, hz=880, out=0, level=0.25, pan=0.0;
        var snd;
        var amp = EnvGen.ar(Env.perc, doneAction:2, timeScale:dur);
        snd = SinOsc.ar(hz) * amp * level;
        Out.ar(out, Pan2.ar(snd, pan));
    }).send(s);

    SynthDef.new(\tick, {
        arg dur=0.1, hz=880, out=0, level=0.25, pan=0.0;
        var snd;
        var amp = EnvGen.ar(Env.perc, doneAction:2, timeScale:dur);
        snd = LPF.ar(WhiteNoise.ar, hz) * amp * level;
        Out.ar(out, Pan2.ar(snd, pan));
    }).send(s);

    s.sync;

    // the "score" is just a nested array of argument values;
    // there are also many kinds of associative collections in SC if you prefer
    ~score = [
        // each entry:
        // midi note offset, note duration in seconds, wait time in beats
        [0, 0.4, 2],
        [0, 0.4, 1],
        [7, 0.2, 1],
        [0, 0.2, 1],
        [7, 0.15, 1],
        [10, 0.5, 2],
        [7, 0.1, 1],
        [2, 0.3, 1]
    ];

    // a routine that plays the score, not knowing which instrument is the target
    ~playr = Routine { var note, hz; inf.do({ arg i;
        // get the next note
        note = ~score.wrapAt(i);
        // interpret scale degree as MIDI note plus offset
        hz = (note[0] + 60).midicps;
        // play the note
        Synth.new(~inst, [\hz, hz, \dur, note[1]], s);
        // wait
        note[2].wait;
    }); }.play(~tclock);

    // a routine that randomly switches instruments
    ~switchr = Routine { inf.do({
        if(0.2.coin, {
            ~inst = if(~inst == \ding, { \tick }, { \ding });
            ~inst.postln;
        });
        // wait
        1.wait;
    }); }.play(~tclock);
}.play; };

I would also add that there is a set of extensions (a "Quark") called Ctk, which wraps the SynthDef (into CtkSynthDef), the concept of a note (into CtkNote), and the score (into CtkScore), facilitating work in both real time and non-real time. I feel that the examples provided with its help files (mostly) follow the architecture suggested by the OP.
To install it, run Quarks.install("Ctk") in SuperCollider.

Deno template matching using OpenCV gives no results

I'm trying to use https://deno.land/x/opencv@v4.3.0-10 to get template matching to work in Deno. I heavily based my code on the node example provided, but can't seem to work it out just yet.
By following the source code I first stumbled upon error: Uncaught (in promise) TypeError: Cannot convert "undefined" to int while calling cv.matFromImageData(imageSource).
After experimenting and searching I figured the function expects {data: Uint8ClampedArray, height: number, width: number}. This is based on this SO post and might be incorrect, hence posting it here.
The issue I'm currently faced with is that I don't seem to get proper matches from my template. Only when I set the threshold to 0.1 or lower do I get a match, but it is not correct: { xStart: 0, yStart: 0, xEnd: 29, yEnd: 25 }.
I used the images provided by the templateMatching example here.
Haystack
Needle
Any input/thoughts on this are appreciated.
import { cv } from 'https://deno.land/x/opencv@v4.3.0-10/mod.ts';

export const match = (imagePath: string, templatePath: string) => {
    const imageSource = Deno.readFileSync(imagePath);
    const imageTemplate = Deno.readFileSync(templatePath);
    const src = cv.matFromImageData({ data: imageSource, width: 640, height: 640 });
    const templ = cv.matFromImageData({ data: imageTemplate, width: 29, height: 25 });
    const processedImage = new cv.Mat();
    const logResult = new cv.Mat();
    const mask = new cv.Mat();
    cv.matchTemplate(src, templ, processedImage, cv.TM_SQDIFF, mask);
    cv.log(processedImage, logResult);
    console.log(logResult.empty());
};
UPDATE
Using @ChristophRackwitz's answer and digging into the opencv(js) docs, I managed to get close to my goal.
I decided to step down from taking multiple matches into account, and focused on a single (best) match of my needle in the haystack, since ultimately that is my use-case anyway.
Going through the code provided in this example and comparing data with the data in my code, I figured something was off with the binary image data which I supplied to cv.matFromImageData. I solved this by properly decoding the PNG and passing that decoded image's bitmap to cv.matFromImageData.
I used TM_SQDIFF as suggested, and got some great results.
Haystack
Needle
Result
I achieved this in the following way.
import { cv } from 'https://deno.land/x/opencv@v4.3.0-10/mod.ts';
import { Image } from 'https://deno.land/x/imagescript@v1.2.14/mod.ts';

export type Match = false | {
    x: number;
    y: number;
    width: number;
    height: number;
    center?: {
        x: number;
        y: number;
    };
};

export const match = async (haystackPath: string, needlePath: string, drawOutput = false): Promise<Match> => {
    const perfStart = performance.now();
    const haystack = await Image.decode(Deno.readFileSync(haystackPath));
    const needle = await Image.decode(Deno.readFileSync(needlePath));
    const haystackMat = cv.matFromImageData({
        data: haystack.bitmap,
        width: haystack.width,
        height: haystack.height,
    });
    const needleMat = cv.matFromImageData({
        data: needle.bitmap,
        width: needle.width,
        height: needle.height,
    });
    const dest = new cv.Mat();
    const mask = new cv.Mat();
    cv.matchTemplate(haystackMat, needleMat, dest, cv.TM_SQDIFF, mask);
    const result = cv.minMaxLoc(dest, mask);
    const match: Match = {
        x: result.minLoc.x,
        y: result.minLoc.y,
        width: needleMat.cols,
        height: needleMat.rows,
    };
    match.center = {
        x: match.x + (match.width * 0.5),
        y: match.y + (match.height * 0.5),
    };
    if (drawOutput) {
        haystack.drawBox(
            match.x,
            match.y,
            match.width,
            match.height,
            Image.rgbaToColor(255, 0, 0, 255),
        );
        Deno.writeFileSync(`${haystackPath.replace('.png', '-result.png')}`, await haystack.encode(0));
    }
    const perfEnd = performance.now();
    console.log(`Match took ${perfEnd - perfStart}ms`);
    return match.x > 0 || match.y > 0 ? match : false;
};
ISSUE
The remaining issue is that I also get a false match when it should not match anything.
Based on what I know so far, I should be able to solve this using a threshold like so:
cv.threshold(dest, dest, 0.9, 1, cv.THRESH_BINARY);
Adding this line after matchTemplate does indeed stop the false matches when I don't expect any, but I also no longer get a match when I DO expect one.
Obviously I am missing something on how to work with the cv threshold. Any advice on that?
UPDATE 2
After experimenting and reading some more I managed to get it to work with normalised values like so:
cv.matchTemplate(haystackMat, needleMat, dest, cv.TM_SQDIFF_NORMED, mask);
cv.threshold(dest, dest, 0.01, 1, cv.THRESH_BINARY);
Other than being normalised, it seems to do the trick consistently for me. However, I would still like to know why I can't get it to work without normalised values, so any input is still appreciated. I will mark this post as solved in a few days to give people the chance to discuss the topic some more while it's still relevant.
The TM_* methods of matchTemplate are treacherous. And the docs throw formulas at you that would make anyone feel dumb, because they're code, not explanation.
Consider the calculation of one correlation: one particular position of the template/"needle" on the "haystack".
All the CCORR modes will simply multiply elementwise. Your data uses white as "background", which is a "DC offset". The signal, the difference to white of anything not-white, will drown in the large "DC offset" of your data. The calculated correlation coefficients will vary mostly with the DC offset and hardly at all with the actual signal/difference.
This is what that looks like, roughly. The result of running with TM_CCOEFF_NORMED, overlaid on top of the haystack (with some padding). You're getting big fat responses for all instances of all shapes, no matter their specific shape.
You want to use differences instead. The SQDIFF modes will handle that. Squared differences are a measure of dissimilarity, i.e. a perfect match will give you 0.
Let's look at some values...
import cv2 as cv
import numpy as np

# placeholder file names; load the haystack and needle shown above
haystack = cv.imread("haystack.png")
needle = cv.imread("needle.png")

(hh, hw) = haystack.shape[:2]
(nh, nw) = needle.shape[:2]
scores = cv.matchTemplate(image=haystack, templ=needle, method=cv.TM_SQDIFF)
(sh, sw) = scores.shape # will be shaped like haystack - needle
scores = np.log10(1+scores) # any log will do
maxscore = 255**2 * (nh * nw * 3)
# maximum conceivable SQDIFF score, 3-channel data, any needle
# for a specific needle:
#maxscore = (np.maximum(needle, 255-needle, dtype=np.float32)**2).sum()
# map range linearly, from [0 .. ~8] to [1 .. 0] (white to black)
(smin, smax) = (0.0, np.log10(1+maxscore))
(omin, omax) = (1.0, 0.0)
print("mapping from", (smin, smax), "to", (omin, omax))
out = (scores - smin) / (smax - smin) * (omax - omin) + omin
You'll see gray peaks, but some are actually (close to) white while others aren't. Those are truly instances of the needle image. The other instances differ more from the needle, so they're just some reddish shapes that kinda look like the needle.
Now you can find local extrema. There are many ways to do that. You'll want to do two things: filter by absolute value (threshold) and suppress non-maxima (scores above threshold that are dominated by better nearby score). I'll just do the filtering and pretend there aren't any nearby non-maxima because the resulting peaks fall off strongly enough. If that happens to not be the case, you'd see double drawing in the picture below, boxes becoming "bold" because they're drawn twice onto adjacent pixel positions.
I'm picking a threshold of 2.0 because that represents a difference of 100, i.e. one color value in one pixel may have differed by 10 (10*10 = 100), or two values may have differed by 7 (7*7 = 49, twice makes 98), ... so it's still a very tiny, imperceptible difference. A threshold of 6 would mean a sum of squared differences of up to a million, allowing for a lot more difference.
(i,j) = (scores <= 2.0).nonzero() # threshold "empirically decided"
instances = np.transpose([j,i]) # list of (x,y) points
That's giving me 16 instances.
canvas = haystack.copy()
for pt in instances:
    (j, i) = pt
    score = scores[i, j]
    cv.rectangle(canvas,
        pt1=(pt - (1, 1)).astype(int), pt2=(pt + (nw, nh)).astype(int),
        color=(255, 0, 0), thickness=1)
    cv.putText(canvas,
        text=f"{score:.2f}",
        org=(pt + [0, -5]).astype(int),
        fontFace=cv.FONT_HERSHEY_SIMPLEX, fontScale=0.4,
        color=(255, 0, 0), thickness=1)
That's drawing a box around each, with the logarithm of the score above it.
One simple approach to get candidates for Non-Maxima Suppression (NMS) is to cv.dilate the scores and equality-compare, to gain a mask of candidates. Those scores that are local maxima, will compare equal to themselves (the dilated array), and every surrounding score will be less. This alone will have some corner cases you will need to handle. Note: at this stage, those are local maxima of any value. You need to combine (logical and) that with a mask from thresholding the values.
NMS commonly is required to handle immediate neighbors being above the threshold, and merge them or pick the better one. You can do that by simply running connectedComponents(WithStats) and taking the blob centers. I think that's clearly better than trying to find contours.
The dilate-and-compare approach will not suppress neighbors if they have the same score. If you did the connectedComponents step, you only have non-immediate neighbors to deal with here. What to do is up to you. It's not clear cut anyway.
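For reference, here is a minimal Python sketch of that dilate-and-compare pipeline (the function name is mine; scores is the raw TM_SQDIFF map, so lower is better):
import cv2 as cv
import numpy as np

def nms_candidates(scores, threshold):
    response = -scores                            # SQDIFF: low = good, so negate
    dilated = cv.dilate(response, np.ones((3, 3), np.uint8))
    local_max = (response == dilated)             # local maxima of any value
    good_enough = (scores <= threshold)           # absolute-value filter
    mask = (local_max & good_enough).astype(np.uint8)
    # merge touching candidates and take blob centers
    count, labels, stats, centroids = cv.connectedComponentsWithStats(mask)
    return centroids[1:]                          # row 0 is the background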

Can I apply softmax only on specific output neurons?

I am building an Actor-Critic neural network model in pytorch in order to train an agent to play the game of Quoridor (hopefully). For this reason, I have a neural network with two heads, one for the actor output which does a softmax on all the possible moves and one for the critic output which is just one neuron (for regressing the value of the input state).
Now, in Quoridor, most of the time not all moves will be legal, and as such I am wondering if I can exclude output neurons on the actor's head that correspond to illegal moves for the input state, e.g. by passing a list of indices of all the neurons that correspond to legal moves. Thus, I want to not sum these outputs in the denominator of the softmax.
Is there a functionality like this in pytorch (because I cannot find one)? Should I attempt to implement such a softmax myself (kinda scared to, pytorch probably knows best; I've been advised to use LogSoftmax as well)?
Furthermore, do you think this approach of dealing with illegal moves is good? Or should I just let the agent guess illegal moves and penalize it (negative reward) in the hopes that eventually it will not pick illegal moves?
Or should I let the softmax be over all the outputs and then just set illegal ones to zero? The rest won't sum to 1 but maybe I can solve that by plain normalization (i.e. dividing by the L2 norm)?
An easy solution would be to mask out illegal moves with a large negative value, this will practically force very low (log)softmax values (example below).
# 3 dummy actions for a batch size of 2
>>> actions = torch.rand(2, 3)
>>> actions
tensor([[0.9357, 0.2386, 0.3264],
[0.0179, 0.8989, 0.9156]])
# dummy mask assigning 0 to valid actions and 1 to invalid ones
>>> mask = torch.randint(low=0, high=2, size=(2, 3))
>>> mask
tensor([[1, 0, 0],
[0, 0, 0]])
# set actions marked as invalid to very large negative value
>>> actions = actions.masked_fill_(mask.eq(1), value=-1e10)
>>> actions
tensor([[-1.0000e+10, 2.3862e-01, 3.2636e-01],
[ 1.7921e-02, 8.9890e-01, 9.1564e-01]])
# softmax assigns no probability mass to illegal actions
>>> actions.softmax(dim=-1)
tensor([[0.0000, 0.4781, 0.5219],
[0.1704, 0.4113, 0.4183]])
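If you end up training with LogSoftmax, as you mention, the same mask composes with it unchanged; continuing the example above:
# masked entries come out around -1e10, i.e. effectively log(0), while the
# legal entries are the logs of the softmax values above; this is the form
# you would feed to NLLLoss during training
>>> log_probs = actions.log_softmax(dim=-1)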
I'm not qualified to say if this is a good idea, but I had the same one and ended up implementing it.
The code is using rust's bindings for pytorch, so it should be directly translatable to python based pytorch.
/// As log_softmax(dim=1) on a 2d tensor, but takes a {0, 1} `filter` of the same shape as `xs`
/// and has the softmax only look at values where filter[idx] = 1.
///
/// The output is 0 where the filter is 0.
pub fn filtered_log_softmax(xs: &Tensor, filter: &Tensor) -> Tensor {
    // We are calculating `log_softmax(xs)` except that we only want to consider
    // the values of xs where the corresponding `filter` bit is set to 1.
    //
    // log_softmax on one element of the batch = for_each_i log(e^xs[i] / sum_j e^xs[j])
    //
    // To filter that we need to remove (zero out) elements that are being filtered both after the log is
    // taken, and before summing into the denominator. We can do this with two multiplications:
    //
    // filtered_log_softmax = for_each_i filter[i] * log(e^xs[i] / sum_j filter[j] * e^xs[j])
    //
    // This is mathematically correct, but it turns out there's a numeric stability trick we need to do,
    // without it we're seeing NaNs. Sourcing the trick from: https://stackoverflow.com/a/52132033
    //
    // We can do the same transformation here, and come out with the following expression:
    //
    // let xs_max = max_i xs[i]
    // for_each_i filter[i] * (xs[i] - xs_max - log(sum_j filter[j] * e^(xs[j] - xs_max)))
    //
    // Keep in mind that the actual implementation below is further vectorized over an initial batch dimension.
    let (xs_max, _) = xs.max_dim(1, true);
    let xs_offset = xs - xs_max;
    // TODO: Replace with Tensor::linalg_vecdot(&filter, &xs_offset.exp(), 1).log();
    // when we update tch-rs (linalg_vecdot is new in pytorch 1.13)
    let constant_sub = (filter * &xs_offset.exp()).sum_to_size(&[xs.size()[0], 1]).log();
    filter * (&xs_offset - constant_sub)
}
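For anyone who wants this in Python directly, a line-for-line translation of the function above might look like this (a sketch, same math, untested):
import torch

def filtered_log_softmax(xs: torch.Tensor, filter: torch.Tensor) -> torch.Tensor:
    # xs and filter are 2-D with the same shape; filter holds {0, 1}
    xs_max, _ = xs.max(dim=1, keepdim=True)   # row-wise max, for stability
    xs_offset = xs - xs_max
    # log of the filtered denominator: sum_j filter[j] * e^(xs[j] - xs_max)
    constant_sub = (filter * xs_offset.exp()).sum(dim=1, keepdim=True).log()
    # zero out the filtered entries after the log, as in the Rust version
    return filter * (xs_offset - constant_sub)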

Node: Generate a 6-digit random number using crypto.randomBytes

What is the correct way to generate an exact value from 0 to 999999 randomly, since 1000000 is not a power of 2?
This is my approach:
use crypto.randomBytes to generate 3 bytes and convert to hex
use the first 5 characters to convert to integer (max is fffff == 1048575 > 999999)
if the result > 999999, start from step 1 again
This effectively creates a recursive function. Is it logically correct, and will it cause any performance concerns?
There are several ways to extract random numbers in a range from random bits. Some common ones are described in NIST Special Publication 800-90A revision 1: Recommendation for Random Number Generation Using Deterministic Random Bit Generators.
Although this standard is about deterministic random bit generation, there is a helpful appendix called A.5 Converting Random Bits into a Random Number which describes three useful methods.
The methods described are:
A.5.1 The Simple Discard Method
A.5.2 The Complex Discard Method
A.5.3 The Simple Modular Method
The first two of them are not deterministic with regards to running time but generate a number with no bias at all. They are based on rejection sampling.
The complex discard method discusses a more optimal scheme for generating large quantities of random numbers in a range. I think it is too complex for almost any normal use; I would look at the Optimized Simple Discard method described below if you require additional efficiency instead.
The Simple Modular Method runs in constant time and is deterministic, but has a non-zero (though negligible) bias. It requires a relatively large amount of additional randomness to achieve that negligible bias: reducing a uniform b-bit value mod N gives each residue either floor(2^b / N) or ceil(2^b / N) preimages, so to push the relative bias below one in 2^128 you need about 128 bits on top of the bit size of the range required. This is probably not the method to choose for smaller numbers.
Your algorithm is clearly a version of the Simple Discard Method (more generally called "rejection sampling"), so it is fine.
I've myself thought of a very efficient algorithm based on the Simple Discard Method called the "Optimized Simple Discard Method" or RNG-BC where "BC" stands for "binary compare". It is based on the observation that comparison only looks at the most significant bits, which means that the least significant bits should still be considered random and can therefore be reused. Beware that this method has not been officially peer reviewed; I do present an informal proof of equivalence with the Simple Discard Method.
Of course you should rather use a generic method that is efficient given any value of N. In that case the Complex Discard Method or Simple Modular Method should be considered over the Simple Discard Method. There are other, much more complex algorithms that are even more efficient, but generally you're fine when using either of these two.
Note that it is often beneficial to first check if N is a power of two when generating a random in the range [0, N). If N is a power of two then there is no need to use any of these possibly expensive computations; just use the bits you need from the random bit or byte generator.
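For illustration, a minimal sketch of the Simple Discard Method (Python for brevity; the function name is mine, and the same logic maps directly onto crypto.randomBytes):
import os

def random_below(n):
    # draw just enough bits to cover [0, n) and reject anything >= n;
    # the accepted values are exactly uniform
    bits = (n - 1).bit_length()
    nbytes = (bits + 7) // 8
    mask = (1 << bits) - 1
    while True:
        candidate = int.from_bytes(os.urandom(nbytes), "big") & mask
        if candidate < n:
            return candidate

print(random_below(1000000))  # uniform over 0..999999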
It's a correct algorithm (https://en.wikipedia.org/wiki/Rejection_sampling), though you could consider using bitwise operations instead of converting to hex. It can run forever if the random number generator is malfunctioning -- you could consider trying a fixed number of times and then throwing an exception instead of looping forever.
The main possible performance problem is that on some platforms, crypto.randomBytes can block if it runs out of entropy. So you don't want to waste any randomness if you're using it.
Therefore, instead of your string comparison, I would use the following integer operation on the 3 bytes read as a single integer in [0, 16777215]:
if (random_bytes < 16000000) {
    return random_bytes % 1000000;
}
This accepts about 95.4% of draws (16000000 / 16777216) from the first 3 bytes, avoids the hex-string round trip entirely, and introduces no bias, because the accepted range is an exact multiple of 1000000.
I would suggest the following approach:
private generateCode(): string {
    let code: string = "";
    do {
        code += randomBytes(3).readUIntBE(0, 3);
        // code += Number.parseInt(randomBytes(3).toString("hex"), 16);
    } while (code.length < 6);
    return code.slice(0, 6);
}
This returns the numeric code as string, but if it is necessary to get it as a number, then change to return Number.parseInt(code.slice(0, 6))
I call it the random_6d algo. Worst case just a single additional loop.
var random_6d = function(n2){
    var n1 = crypto.randomBytes(3).readUIntLE(0, 3) >>> 4;
    if(n1 < 1000000)
        return n1;
    if(typeof n2 === 'undefined')
        return random_6d(n1);
    return Math.abs(n1 - n2);
};
loop version:
var random_6d = function(){
    var n1, n2;
    while(true){
        n1 = crypto.randomBytes(3).readUIntLE(0, 3) >>> 4;
        if(n1 < 1000000)
            return n1;
        if(typeof n2 === 'undefined')
            n2 = n1;
        else
            return Math.abs(n1 - n2);
    }
};

Alloy model for hydrocarbons

I need to model hydrocarbon structure using Alloy.
Basically I need to design alkane, alkene and alkyne groups.
I have created the following signatures (alkene example):
sig Hydrogen{}
sig Carbon{}
sig alkenegrp{
    c: one Carbon,
    h: set Hydrogen,
    doublebond: lone alkenegrp
}
sig alkene{
    unit: set alkenegrp
}
fact{
    all a: alkenegrp | a not in a.doublebond.*doublebond
    all a: alkenegrp | #a.h = mul[#(a.c), 2]
}
pred show_alkene{
    #alkene > 1
}
run show_alkene
This works for alkene, but whenever I try to design the same for alkane or alkyne by changing the fact, e.g. all a:alkynegrp | #a.h = minus[mul[#(a.c), 2], 2], it doesn't work.
Can anyone suggest how I can implement it?
My problem statement is
In organic chemistry, saturated hydrocarbons are organic compounds composed entirely of single bonds and are saturated with hydrogen. The general formula for saturated hydrocarbons is CnH2n+2 (assuming non-cyclic structures); they are also called alkanes. Unsaturated hydrocarbons have one or more double or triple bonds between carbon atoms. Those with one double bond are called alkenes and have the formula CnH2n (assuming non-cyclic structures). Those containing triple bonds are called alkynes, with general formula CnH2n-2.
Model hydrocarbons and give predicates to generate instances of alkane, alkene and alkyne.
We have tried:
sig Hydrogen{}
sig Carbon{}
sig alkane{
    c: one Carbon,
    h: set Hydrogen,
    n: lone alkane
}
fact{
    //(#h) = add[mul[(#c), 2], 2]
    //all a: alkane | a not in a.*n
    all a: alkane | #a.h = mul[#(a.c), 2]
}
pred show_alkane(){}
run show_alkane
The general formula for alkane is CnH2n+2. For multiplication we can use the built-in mul function, but we cannot work out how to write the addition, since we have to express 2n+2. What should we write so that it works for alkane?
I understand alkanes, alkenes, and alkynes a little better now, but I still don't understand why you think your Alloy model doesn't work.
To express the CnH2n-2 constraint, you can certainly write what you suggested
all a:alkynegrp |
#a.h = minus[mul[#(a.c), 2], 2]
The problem is only that in your alkane sig declaration you said c: one Carbon, which is going to fix the number of carbon atoms to exactly 1, so minus[mul[#(a.c), 2], 2] is always going to evaluate to exactly 0. I assume you want to allow any number of carbons (since Cn), so you should change c: one Carbon to c: set Carbon. If you then run the show_alkane predicate, you should get some instances where the number of carbons is greater than 1 and thus the number of hydrogens is greater than 0.
Also, for the alkane formula
all a: alkane |
    #a.h = plus[mul[#(a.c), 2], 2]
the default scope of 3 will not suffice, because you will need at least 4 atoms of hydrogen when a.c is non-empty, but you can fix that by explicitly giving a scope
run show_alkane for 8
If this wasn't the problem you were talking about, please be more specific about why you think "it doesn't work", i.e., what is it that you expect Alloy to do and what is it that Alloy actually does.

Is it possible to do an algebraic curve fit with just a single pass of the sample data?

I would like to do an algebraic curve fit of 2D data points, but for various reasons - it isn't really possible to have much of the sample data in memory at once, and iterating through all of it is an expensive process.
(The reason for this is that actually I need to fit thousands of curves simultaneously based on gigabytes of data which I'm reading off disk, and which is therefore sloooooow).
Note that the number of polynomial coefficients will be limited (perhaps 5-10), so an exact fit will be extremely unlikely, but this is ok as I'm trying to find an underlying pattern in data with a lot of random noise.
I understand how one can use a genetic algorithm to fit a curve to a dataset, but this requires many passes through the sample data, and thus isn't practical for my application.
Is there a way to fit a curve with a single pass of the data, where the state that must be maintained from sample to sample is minimal?
I should add that the nature of the data is that the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
So, in Java, I'm looking for a class with the following interface:
public interface CurveFit {
    public void addData(double x, double y);
    public List<Double> getBestFit(); // Returns the polynomial coefficients
}
The class that implements this must not need to keep much data in its instance fields, no more than a kilobyte even for millions of data points. This means that you can't just store the data as you get it to do multiple passes through it later.
edit: Some have suggested that finding an optimal curve in a single pass may be impossible, however an optimal fit is not required, just as close as we can get it in a single pass.
The bare bones of an approach might be if we have a way to start with a curve, and then a way to modify it to get it slightly closer to new data points as they come in - effectively a form of gradient descent. It is hoped that with sufficient data (and the data will be plentiful), we get a pretty good curve. Perhaps this inspires someone to a solution.
Yes, it is a projection. For
y = X beta + error
where lowercased terms are vectors, and X is a matrix, you have the solution vector
\hat{beta} = inverse(X'X) X' y
as per the OLS page. You almost never want to compute this directly but rather use LR, QR or SVD decompositions. References are plentiful in the statistics literature.
If your problem has only one parameter (and x is hence a vector as well) then this reduces to just summation of cross-products between y and x.
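To make the one-pass nature of the normal-equations route concrete, here is a rough numpy sketch (names chosen to mirror the CurveFit interface above; the stability caveat applies unchanged):
import numpy as np

class StreamingPolyFit:
    def __init__(self, degree):
        k = degree + 1
        self.xtx = np.zeros((k, k))   # accumulates X'X
        self.xty = np.zeros(k)        # accumulates X'y

    def add_data(self, x, y):
        row = x ** np.arange(len(self.xty))   # [1, x, x^2, ...]
        self.xtx += np.outer(row, row)
        self.xty += row * y

    def get_best_fit(self):
        # solve (X'X) beta = X'y; lstsq degrades more gracefully than
        # a plain matrix inverse when X'X is close to singular
        return np.linalg.lstsq(self.xtx, self.xty, rcond=None)[0]
For k coefficients the state is a k-by-k matrix plus a k-vector, well under a kilobyte for k around 10.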
If you don't mind that you'll get a straight line "curve", then you only need six variables for any amount of data. Here's the source code that's going into my upcoming book; I'm sure that you can figure out how the DataPoint class works:
Interpolation.h:
#ifndef __INTERPOLATION_H
#define __INTERPOLATION_H

#include "DataPoint.h"

class Interpolation
{
private:
    int m_count;
    double m_sumX;
    double m_sumXX; /* sum of X*X */
    double m_sumXY; /* sum of X*Y */
    double m_sumY;
    double m_sumYY; /* sum of Y*Y */

public:
    Interpolation();
    void addData(const DataPoint& dp);
    double slope() const;
    double intercept() const;
    double interpolate(double x) const;
    double correlate() const;
};

#endif // __INTERPOLATION_H
Interpolation.cpp:
#include <cmath>
#include "Interpolation.h"

Interpolation::Interpolation()
{
    m_count = 0;
    m_sumX = 0.0;
    m_sumXX = 0.0;
    m_sumXY = 0.0;
    m_sumY = 0.0;
    m_sumYY = 0.0;
}

void Interpolation::addData(const DataPoint& dp)
{
    m_count++;
    m_sumX += dp.getX();
    m_sumXX += dp.getX() * dp.getX();
    m_sumXY += dp.getX() * dp.getY();
    m_sumY += dp.getY();
    m_sumYY += dp.getY() * dp.getY();
}

double Interpolation::slope() const
{
    return (m_sumXY - (m_sumX * m_sumY / m_count)) /
           (m_sumXX - (m_sumX * m_sumX / m_count));
}

double Interpolation::intercept() const
{
    return (m_sumY / m_count) - slope() * (m_sumX / m_count);
}

double Interpolation::interpolate(double X) const
{
    return intercept() + slope() * X;
}

double Interpolation::correlate() const
{
    // Pearson correlation: the sums must be centered, otherwise any
    // offset in the data swamps the result
    double cov  = m_sumXY - (m_sumX * m_sumY / m_count);
    double varX = m_sumXX - (m_sumX * m_sumX / m_count);
    double varY = m_sumYY - (m_sumY * m_sumY / m_count);
    return cov / sqrt(varX * varY);
}
Why not use a ring buffer of some fixed size (say, the last 1000 points) and do a standard QR decomposition-based least squares fit to the buffered data? Once the buffer fills, each time you get a new point you replace the oldest and re-fit. That way you have a bounded working set that still has some data locality, without all the challenges of live stream (memoryless) processing.
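A quick sketch of that idea (numpy for brevity; np.polyfit performs the least-squares solve over the buffered window):
from collections import deque
import numpy as np

DEGREE = 5
window = deque(maxlen=1000)        # the oldest point drops out automatically

def add_data(x, y):
    window.append((x, y))

def get_best_fit():
    if len(window) <= DEGREE:      # need at least degree+1 points
        return None
    xs, ys = np.array(window).T
    return np.polyfit(xs, ys, deg=DEGREE)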
Are you limiting the number of polynomial coefficients (i.e. fitting to a max power of x in your polynomial)?
If not, then you don't need a "best fit" algorithm - you can always fit N data points EXACTLY to a polynomial of N coefficients.
Just use matrices to solve N simultaneous equations for N unknowns (the N coefficients of the polynomial).
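For example (numpy, toy data): three points determine a quadratic exactly.
import numpy as np

xs = np.array([0.0, 0.5, 1.0])
ys = np.array([1.0, 0.0, 1.0])
V = np.vander(xs, increasing=True)   # rows are [1, x, x^2]
coeffs = np.linalg.solve(V, ys)      # [1, -4, 4], i.e. y = 1 - 4x + 4x^2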
If you are limiting to a max number of coefficients, what is your max?
Following your comments and edit:
What you want is a low-pass filter to filter out noise, not fit a polynomial to the noise.
Given the nature of your data:
the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
Then you don't need even a single pass, as these two lines will pass exactly through every point:
X = [0.0 ... 1.0], Y = 0.0
X = [0.0 ... 1.0], Y = 1.0
Two short line segments, unit length, and every point falls on one line or the other.
Admittedly, an algorithm to find a good curve fit for arbitrary points in a single pass is interesting, but (based on your question), that's not what you need.
Assuming that you don't know which point should belong to which curve, something like a Hough Transform might provide what you need.
The Hough Transform is a technique that allows you to identify structure within a data set. One use is for computer vision, where it allows easy identification of lines and borders within the field of sight.
Advantages for this situation:
Each point need be considered only once
You don't need to keep a data structure for each candidate line, just one (complex, multi-dimensional) structure
Processing of each line is simple
You can stop at any point and output a set of good matches
You never discard any data, so it's not reliant on any accidental locality of references
You can trade off between accuracy and memory requirements
Isn't limited to exact matches, but will highlight partial matches too.
An approach
To find cubic fits, you'd construct a 4-dimensional Hough space, into which you'd project each of your data-points. Hotspots within Hough space would give you the parameters for the cubic through those points.
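Here is a toy sketch of the voting idea for a straight line y = m*x + b, so the accumulator stays 2-D and easy to inspect (ranges and bin counts are arbitrary choices of mine); a cubic works the same way with a 4-D accumulator:
import numpy as np

M_BINS, B_BINS = 200, 200
m_values = np.linspace(-5.0, 5.0, M_BINS)   # candidate slopes
b_lo, b_hi = -5.0, 5.0                      # intercept range
accumulator = np.zeros((M_BINS, B_BINS), dtype=np.int32)

def add_data(x, y):
    b = y - m_values * x                    # intercept implied by each slope bin
    b_idx = np.round((b - b_lo) / (b_hi - b_lo) * (B_BINS - 1)).astype(int)
    ok = (b_idx >= 0) & (b_idx < B_BINS)    # drop votes that fall off the grid
    accumulator[np.nonzero(ok)[0], b_idx[ok]] += 1

def best_line():
    i, j = np.unravel_index(accumulator.argmax(), accumulator.shape)
    return m_values[i], b_lo + j * (b_hi - b_lo) / (B_BINS - 1)
Each data point costs one vectorized vote, and you can read off the current best fit at any time, which matches the streaming constraints in the question.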
You need the solution to an overdetermined linear system. The popular methods are Normal Equations (not usually recommended), QR factorization, and singular value decomposition (SVD). Wikipedia has decent explanations, Trefethen and Bau is very good. Your options:
1. Out-of-core implementation via the normal equations. This requires the product A'A where A has many more rows than columns (so the result is very small). The matrix A is completely defined by the sample locations, so you don't have to store it, thus computing A'A is reasonably cheap (very cheap if you don't need to hit memory for the node locations). Once A'A is computed, you get the solution in one pass through your input data, but the method can be unstable.
2. Implement an out-of-core QR factorization. Classical Gram-Schmidt will be fastest, but you have to be careful about stability.
3. Do it in-core with distributed memory (if you have the hardware available). Libraries like PLAPACK and ScaLAPACK can do this; the performance should be much better than option 1. The parallel scalability is not fantastic, but will be fine if it's a problem size that you would even think about doing in serial.
4. Use iterative methods to compute an SVD. Depending on the spectral properties of your system (maybe after preconditioning) this could converge very fast, and it does not require storage for the matrix (which in your case has 5-10 columns, each the size of your input data). A good library for this is SLEPc; you only have to form the product of the Vandermonde matrix with a vector (so you only need to store the sample locations). This is very scalable in parallel.
I believe I found the answer to my own question based on a modified version of this code. For those interested, my Java code is here.
