As a beginner to differential privacy, I would like to know why the variance of noise mechanisms needs to be calibrated to the sensitivity. What is the purpose of that? What happens if we don't calibrate it and just add noise with an arbitrary variance?
Example scenario: in Laplace noise, why is the scale parameter calibrated?
One way you can understand this intuitively is by imagining a function that returns either of two values, say 0 and a for some real a.
Suppose further that we have an additive noise mechanism, so that we end up with two probability distributions on the real line, as in the image from your attached link (this is an example of the setup above, with a=1):
In pure DP, we are interested in computing the maximum of the ratio of these distributions over the entire real line. As the calculation in your link shows, this ratio is bounded everywhere by e to the power of epsilon.
Now, imagine moving the centers of these distributions further apart, say by shifting the red distribution further to the right (i.e., increasing a). Clearly this will place less probability mass from the red distribution on the value 0, which is where the maximum of this ratio will be achieved. Therefore the ratio between these distributions at 0 will increase: a constant (the mass the blue distribution places on 0) is divided by a smaller number.
One way we could move the ratio back down would be to "fatten" the distributions out. This would correspond pictorially to moving the peaks of the distributions lower and spreading the mass out over a wider area (since they have to integrate to 1, these two things are necessarily coupled for a distribution like the Laplace). Mathematically we would accomplish this by increasing the variance of the Laplace distribution (increasing b in the parameterization here), which has the effect of lowering the peak of the blue distribution at 0 and raising the mass the red distribution places at 0, thereby reducing the ratio between them (a smaller numerator and a larger denominator).
If you perform the calculations, you will find that the relationship between the scale parameter b and the sensitivity Δf of the function f is in fact linear; that is, setting b = Δf/ε fixes the maximum of this ratio at e^ε, which is precisely the definition of pure differential privacy.
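For concreteness, here is a minimal Python sketch of this calibration; the counting-query example and the function name are illustrative, not from the question:

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon):
        """Release true_value with Laplace noise whose scale is calibrated to sensitivity/epsilon."""
        b = sensitivity / epsilon   # scale b grows with the sensitivity and shrinks with epsilon
        return true_value + np.random.laplace(loc=0.0, scale=b)

    # A counting query has sensitivity 1: adding or removing one record changes the count by at most 1.
    noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)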
If you add arbitrary amounts of random noise, you simply end up with random data. Sure, it preserves privacy, but it also destroys any real value in the data. The noise you add needs to be calibrated to how much one individual's record can change the query result (the sensitivity), so that it preserves privacy without destroying the value of the data. That's what the calibration step does.
I am learning statistics, and have some basic yet core questions on SD:
s = the sample (the collection of observations)
n = total number of observations
xi = ith observation
μ = arithmetic mean of all observations
σ = the usual definition of SD, i.e. ((1/(n-1))*sum([(xi-μ)**2 for xi in s]))**(1/2) in Python lingo
f = frequency of an observation value
I do understand that (1/n)*sum([xi-μ for xi in s]) would be useless (= 0), but would not (1/n)*sum([abs(xi-μ) for xi in s]) have been a measure of variation?
Why stop at power of 1 or 2? Would ((1/(n-1))*sum([abs((xi-μ)**3) for xi in s]))**(1/3) or ((1/(n-1))*sum([(xi-μ)**4 for xi in s]))**(1/4) and so on have made any sense?
My notion of squaring is that it 'amplifies' the measure of variation from the arithmetic mean while the simple absolute difference is somewhat a linear scale notionally. Would it not amplify it even more if I cubed it (and made absolute value of course) or quad it?
I do agree computationally cubes and quads would have been more expensive. But with the same argument, the absolute values would have been less expensive... So why squares?
Why is the Normal Distribution like it is, i.e. f = (1/(σ*math.sqrt(2*math.pi)))*math.exp((-1/2)*((xi-μ)/σ)**2)?
What impact would it have on the normal distribution formula above if I calculated SD as described in (1) and (2) above?
Is it only a matter of our 'getting used to the squares', it could well have been linear, cubed or quad, and we would have trained our minds likewise?
(I may not have been 100% accurate in my number of opening and closing brackets above, but you will get the idea.)
So, if you are looking for an index of dispersion, you actually don't have to use the standard deviation. You can indeed report the mean absolute deviation, the summary statistic you suggested. You merely need to be aware of how each summary statistic behaves; for example, the SD assigns more weight to outlying observations. You should also consider how each one can be interpreted. For example, with a normal distribution, we know how much of the distribution lies within ±2 SD of the mean. For some discussion of the mean absolute deviation (and other measures of average absolute deviation, such as the median absolute deviation) and their uses, see here.
Beyond its use as a measure of spread though, SD is related to variance and this is related to some of the other reasons it's popular, because the variance has some nice mathematical properties. A mathematician or statistician would be able to provide a more informed answer here, but squared difference is a smooth function and is differentiable everywhere, allowing one to analytically identify a minimum, which helps when fitting functions to data using least squares estimation. For more detail and for a comparison with least absolute deviations see here. Another major area where variance shines is that it can be easily decomposed and summed, which is useful for example in ANOVA and regression models generally. See here for a discussion.
As to your questions about raising to higher powers, they actually do have uses in statistics! In general, the mean (the first power), the variance (the second power, related to the standard deviation), skewness (related to the third power) and kurtosis (related to the fourth power) are all related to the moments of a distribution. Taking differences raised to those powers and standardizing them provides useful information about the shape of a distribution. The video I linked provides some easy intuition.
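As a rough illustration (not part of the original answer), here is how those quantities can be computed with numpy; the sample data is made up:

    import numpy as np

    x = np.random.normal(loc=10.0, scale=2.0, size=1000)    # any sample will do
    mu = x.mean()

    sd   = np.sqrt(np.sum((x - mu)**2) / (len(x) - 1))      # standard deviation (2nd power)
    mad  = np.mean(np.abs(x - mu))                          # mean absolute deviation (1st power)
    skew = np.mean(((x - mu) / x.std())**3)                 # standardized 3rd moment: asymmetry
    kurt = np.mean(((x - mu) / x.std())**4)                 # standardized 4th moment: tail weight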
For some other answers and a larger discussion of why the SD is so popular, see here.
Regarding the relationship of sigma and the normal distribution: sigma is simply a parameter that stretches the standard normal distribution, just like the mean changes its location. This is simply a result of the way the standard normal distribution (a normal distribution with mean = 0 and SD = variance = 1) is mathematically defined, and note that all normal distributions can be derived from the standard normal distribution. This answer illustrates this. Now, you can parameterize a normal distribution in other ways as well, but you do need to supply some measure of spread along with the mean, whether as the SD, the variance, or the precision. (Even the mean absolute deviation would do: for a normal distribution it equals σ*sqrt(2/π), so it pins down sigma.) Now, a deeper question is why normal distributions are so incredibly useful in representing widely different phenomena and crop up everywhere. I think this is related to the Central Limit Theorem, but I do not understand the proofs of the theorem well enough to comment further.
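As a tiny numerical check of the stretch-and-shift idea (made-up numbers, numpy only):

    import numpy as np

    z = np.random.standard_normal(100_000)   # standard normal: mean 0, SD 1
    mu, sigma = 5.0, 3.0
    x = mu + sigma * z                       # shift by the mean, stretch by sigma

    print(x.mean(), x.std())                 # approximately 5.0 and 3.0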
I am trying to implement a Microfacet BRDF shading model (similar to the Cook-Torrance model) and I am having some trouble with the Beckmann Distribution defined in this paper: https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
The distribution in question is D(M) = exp(-tan^2(θm)/αb^2) / (π * αb^2 * cos^4(θm)), where θm is the angle between M and N, M is a microfacet normal, N is the macrofacet normal and αb is a "hardness" parameter between [0, 1].
My issue is that this distribution often returns obscenely large values, especially when ab is very small.
For instance, the Beckmann distribution is used to calculate the probability of generating a microfacet normal M per the sampling equation in the paper.
A probability has to be between the range [0,1], so how is it possible to get a value within this range using the function above if the Beckmann distribution gives me values that are 1000000000+ in size?
So is there a proper way to clamp the distribution? Or am I misunderstanding it, or the probability function? I tried simply clamping the value to 1 if it exceeded 1, but this didn't really give me the results I was looking for.
I was having the same question you did.
If you read
http://blog.selfshadow.com/publications/s2012-shading-course/hoffman/s2012_pbs_physics_math_notes.pdf
and
http://blog.selfshadow.com/publications/s2012-shading-course/hoffman/s2012_pbs_physics_math_notebook.pdf
You'll notice it's perfectly normal. To quote from the links:
"The Beckmann Αb parameter is equal to the RMS (root mean square) microfacet slope. Therefore its valid range is from 0 (non-inclusive –0 corresponds to a perfect mirror or Dirac delta and causes divide by 0 errors in the Beckmann formulation) and up to arbitrarily high values. There is no special significance to a value of 1 –this just means that the RMS slope is 1/1 or 45°.(...)"
Also another quote:
"The statistical distribution of microfacet orientations is defined via the microfacet normal distribution function D(m). Unlike F (), the value of D() is not restricted to lie between 0 and 1—although values must be non-negative, they can be arbitrarily large (indicating a very high concentration of microfacets with normals pointing in a particular direction). (...)"
You should google for Self Shadow's Physically Based Shading courses, which are full of useful material (there is one blog post for each year: 2010, 2011, 2012 & 2013).
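To see concretely that D() is a density and not a probability, here is a small Python sketch of the isotropic Beckmann D as written in the Walter et al. paper; the function name and the test value are illustrative only:

    import math

    def beckmann_D(cos_theta_m, alpha_b):
        """Isotropic Beckmann normal distribution function (Walter et al. 2007)."""
        if cos_theta_m <= 0.0:
            return 0.0                       # the chi+ factor: back-facing microfacets contribute nothing
        cos2 = cos_theta_m * cos_theta_m
        tan2 = (1.0 - cos2) / cos2
        return math.exp(-tan2 / (alpha_b * alpha_b)) / (math.pi * alpha_b * alpha_b * cos2 * cos2)

    # For a very smooth surface (small alpha_b), D at M == N is huge -- and that is fine:
    print(beckmann_D(1.0, 0.001))            # ~3.2e5; D is a density, not a probability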
My knowledge of statistics is minuscule, sorry. I have a large volume of measured amplitudes. In the absence of a signal, the noise is assumed to have a normal distribution. When a signal is present with higher amplitude than the surrounding noise, the distribution has a heavier tail on the positive side. I was thinking of using skewness for detection of the signal. But the area of higher amplitude (cells in the volume) is rather small compared to the volume itself; we are talking about on the order of hundreds of cells out of a total of some thousands. If the skewness is zero for a normal distribution, how can I extract those cells in my volume which contribute to the non-zero skewness? If, say, my skewness value is 0.5, is there a way to drop all cells and keep only those which raised the skewness value? Perhaps I sound unclear, but that just shows how little I understand of the topic.
Thanks in advance.
It seems to me that the problem might best be modeled as a mixture model: we have a Gaussian background
B ~ N(0, sigma)
and a signal, about which the poster has not specified a particular model.
If we can assume that the signal also takes the form of one (or possibly a mixture of several) Gaussian(s), then Gaussian mixture modelling with the EM algorithm may be a good way to solve it (see Wikipedia).
A good paper in the context of segmentation is this here:
http://www.fil.ion.ucl.ac.uk/~karl/Unified%20segmentation.pdf
If we cannot make such an assumption, I would use a robust regression method to estimate the parameters of the Gaussian noise, where the signal is treated as an outlier, e.g. Least trimmed squares (again see Wikipedia).
The outlier cells can then be found via (Bonferroni-corrected) hypothesis testing, as described e.g. in this paper:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2900857/
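If the signal cells can be treated as a second Gaussian component, a sketch with scikit-learn might look like the following; the two-component assumption, the 0.5 posterior cutoff and the synthetic data are illustrative choices, not something from the post:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # amplitudes: a flattened 1-D array of all cell amplitudes in the volume (placeholder data here)
    amplitudes = np.concatenate([np.random.normal(0, 1, 9500),     # background noise
                                 np.random.normal(4, 1, 500)])     # small signal population

    gmm = GaussianMixture(n_components=2).fit(amplitudes.reshape(-1, 1))
    signal_comp = np.argmax(gmm.means_.ravel())                    # the component with the larger mean
    post = gmm.predict_proba(amplitudes.reshape(-1, 1))[:, signal_comp]

    signal_cells = np.where(post > 0.5)[0]                         # indices of cells assigned to the signal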
The luminance of pixels on a computer screen is not usually linearly related to the digital RGB triplet values of a pixel. The nonlinear response of early CRTs required a compensating nonlinear encoding, and we continue to use such encodings today.
Usually we produce images on a computer screen and consume them there as well, so it all works fine. But when we antialias, the nonlinearity — called gamma — means that we can't just add an alpha value of 0.5 to a 50% covered pixel and expect it to look right. An alpha value of 0.5 is only 0.5^2.2=22% as bright as an alpha of 1.0 with a typical gamma of 2.2.
Is there any widely established best practice for antialiasing gamma compensation? Do you have a pet method you use from day to day? Has anyone seen any studies of the results and human perceptions of the quality of the graphic output with different techniques?
I've thought of doing standard X^(1/2.2) compensation but that is pretty computationally intense. Maybe I can make it faster with a 256 entry lookup table, though.
Lookup tables are used quite often for work like that. They're small and fast.
But whether look-up or some formula, if the end result is an image file, and the format permits, it's best to save a color profile or at least the gamma value in the file for later viewing, rather than try adjusting RGB values yourself.
The reason: for typical byte-valued R, G, B channels, you have 256 unique values in each channel at each pixel. That's almost good enough to look good to the human eye (I wish "byte" had been defined as nine bits!) Any kind of math, aside from trivial value inversion, would map many-to-one for some of those values. The output won't have 256 values to pick from for each pixel for R, G, or B, but far fewer. That can lead to contouring, jaggies, color noise and other badness.
Precision issues aside, if any kind of decent quality is desired, all compositing, mixing, blending, color correction, fake lens flare addition, chroma-keying and whatever should be done in linear RGB space, where the values of R, G and B are in proportion to physical light intensity. The image math then mimics physical light math. But where ultimate speed is vital, there are ways to cheat.
Jim Blinn's "Dirty Pixels" book outlines a fast and good compositing calculation using 16-bit math plus lookup tables to accurately convert back and forth to linear color space. This guy worked on NASA's visualisations; he knows his stuff.
I'm trying to answer the actual questions, though mainly for reference now:
First, there are the recommendations from ITU (http://www.itu.int/rec/T-REC-H.272-200701-I/en) which can be applied to programming (but you have to know your stuff).
Jim Blinn's "Notation, Notation, Notation", Chapter 9, has a very detailed mathematical and perceptual error analysis, although he only covers compositing (many other graphics tasks are affected too).
The notation he establishes can also be used to derive a way of dealing with gamma, or to check whether a given way of doing so is actually correct. Very handy; it's my pet method (mainly because I discovered it independently but later found his book).
When generating images, one typically works in a linear color space (like linear RGB or one of the CIE color spaces) and then converts to a non-linear RGB space at the end. That conversion can be accelerated in hardware or via lookup tables or even through tricky math. (See the other answers' references.)
When performing an alpha blend (e.g., render this icon onto this background), this kind of precision is often elided in favor of speed. The results are computed directly in the non-linear RGB-space by lerping with the alpha as the parameter. This is not "correct", but it's good enough in most cases. Especially for things like icons on desktops.
If you're trying to do more correct blending, you treat it like an original render. Work in linear space (which may require an initial conversion) and then convert to your non-linear display space at the end.
A lot of graphics nowadays use sRGB as the non-linear display color space. If I recall correctly, sRGB is very similar to a gamma of 2.2, but there are adjustments made to values at the low end.
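As a rough sketch of "work in linear space, then convert at the end" for a simple alpha blend; this assumes a pure 2.2 power law rather than the exact sRGB curve, and uses a 256-entry decode table like the one mentioned in the question:

    import numpy as np

    GAMMA = 2.2   # simple power-law approximation; real sRGB has a linear toe at the low end

    def to_linear(c):      # encoded (display) value in [0, 1] -> linear light
        return c ** GAMMA

    def to_encoded(c):     # linear light -> encoded value
        return c ** (1.0 / GAMMA)

    def blend(fg, bg, alpha):
        """Alpha-blend encoded values correctly: convert to linear, lerp, convert back."""
        lin = alpha * to_linear(fg) + (1.0 - alpha) * to_linear(bg)
        return to_encoded(lin)

    # A 256-entry lookup table makes the decode step cheap for 8-bit channels.
    decode_lut = np.array([to_linear(i / 255.0) for i in range(256)])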
Without any user interaction, how would a program identify what type of waveform is present in a recording from an ADC?
For the sake of this question: triangle, square, sine, half-sine, or sawtooth waves of constant frequency. Level and frequency are arbitrary, and they will have noise, small amounts of distortion, and other imperfections.
I'll propose a few (naive) ideas, too, and you can vote them up or down.
You definitely want to start by taking an autocorrelation to find the fundamental.
With that, take one period (approximately) of the waveform.
Now take a DFT of that signal, and immediately compensate for the phase shift of the first bin (the first bin being the fundamental, your task will be simpler if all phases are relative).
Now normalise all the bins so that the fundamental has unity gain.
Now compare and contrast the rest of the bins (representing the harmonics) against a set of pre-stored waveshapes that you're interested in testing for. Accept the closest, and reject overall if it fails to meet some accuracy threshold determined by measurements of the noise floor.
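A rough Python sketch of this recipe, simplified to compare harmonic magnitudes only (so it skips the phase-compensation step) and assuming the work buffer holds exactly one period; the template ratios are the idealized ones for each shape:

    import numpy as np

    def harmonic_fingerprint(one_period, n_harmonics=8):
        """Magnitudes of the first few harmonics, normalized so the fundamental is 1."""
        spectrum = np.abs(np.fft.rfft(one_period))
        fund = spectrum[1]              # bin 1 is the fundamental when the buffer is exactly one period
        return spectrum[2:2 + n_harmonics] / fund

    # Idealized reference fingerprints (harmonic numbers 2..9, magnitudes only):
    n = np.arange(2, 10)
    templates = {
        "sine":     np.zeros(8),
        "sawtooth": 1.0 / n,                               # every harmonic, amplitude 1/n
        "square":   np.where(n % 2 == 1, 1.0 / n, 0.0),    # odd harmonics only, amplitude 1/n
        "triangle": np.where(n % 2 == 1, 1.0 / n**2, 0.0), # odd harmonics only, amplitude 1/n^2
    }

    def classify(one_period):
        fp = harmonic_fingerprint(one_period)
        return min(templates, key=lambda k: np.linalg.norm(fp - templates[k]))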
Do an FFT, find the odd and even harmonic peaks, and compare the rate at which they decrease to a library of common waveform peak ratios.
Perform an autocorrelation to find the fundamental frequency, measure the RMS level, find the first zero-crossing, and then try subtracting common waveforms at that frequency, phase, and level. Whichever cancels out the best (and more than some threshold) wins.
This answer presumes no noise and that this is a simple academic exercise.
In the time domain, take the sample-by-sample difference of the waveform. Histogram the results. If the distribution has a sharply defined peak (mode) at zero, it is a square wave. If the distribution has a sharply defined peak at a positive value, it is a sawtooth. If the distribution has two sharply defined peaks, one negative and one positive, it is a triangle. If the distribution is broad and peaked at either end, it is a sine wave.
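A rough sketch of this test; instead of building an explicit histogram it checks the same signatures with summary statistics of the differences, and it assumes a clean, constant-frequency waveform with many samples per cycle:

    import numpy as np

    def classify_by_differences(x):
        """Classify a clean waveform from the distribution of its sample-to-sample differences."""
        d = np.diff(x)
        ad = np.abs(d)

        if np.median(ad) < 1e-6 * ad.max():
            return "square"     # slope is zero almost everywhere, with rare large jumps
        if np.mean(d > 0) > 0.75:
            return "sawtooth"   # slope is a constant positive value, with rare negative resets
        if np.std(ad) < 0.1 * np.median(ad):
            return "triangle"   # |slope| is constant, half the time up and half the time down
        return "sine"           # |slope| varies smoothly between zero and a maximum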
arm yourself with more information...
I am assuming that you already know that a theoretically perfect sine wave has no harmonic partials (i.e. only a fundamental)... but since you are going through an ADC you can throw the idea of a theoretically perfect sine wave out the window... you have to fight against aliasing and determine what are "real" partials and what are artifacts... good luck.
The following information comes from this link about Csound.
(*) A sawtooth wave contains (theoretically) an infinite number of harmonic partials, each in the ratio of the reciprocal of the partial number. Thus, the fundamental (1) has an amplitude of 1, the second partial 1/2, the third 1/3, and the nth 1/n.
(**) A square wave contains (theoretically) an infinite number of harmonic partials, but only odd-numbered harmonics (1, 3, 5, 7, ...). The amplitudes are in the ratio of the reciprocal of the partial number, just as with sawtooth waves. Thus, the fundamental (1) has an amplitude of 1, the third partial 1/3, the fifth 1/5, and the nth 1/n.
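As a quick illustration of those ratios, summing a handful of partials with those amplitudes already produces recognizable sawtooth and square shapes (the number of partials here is arbitrary):

    import numpy as np

    t = np.linspace(0, 1, 1000, endpoint=False)      # one period

    def additive(harmonics, t):
        """Sum of sine partials with amplitude 1/n for each harmonic number n."""
        return sum(np.sin(2 * np.pi * n * t) / n for n in harmonics)

    saw_approx    = additive(range(1, 20), t)        # all harmonics: 1, 1/2, 1/3, ...
    square_approx = additive(range(1, 20, 2), t)     # odd harmonics only: 1, 1/3, 1/5, ...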
I think that all of these answers so far are quite bad (including my own previous...)
After having thought the problem through a bit more, I would suggest the following (a rough sketch of the first steps appears after the list):
1) Take a 1-second sample of the input signal (it doesn't need to be that big, but it simplifies a few things).
2) Over that entire second, count the zero-crossings. At this point you have the cps (cycles per second) and know the frequency of the oscillator (in case that's something you wanted to know).
3) Now take a smaller segment of the sample to work with: take precisely 7 zero-crossings' worth. (Your work buffer, if visualized, should now look like one of the graphical representations you posted with the original question.) Use this small work buffer to perform the following tests. (Normalizing the work buffer at this point can make life easier.)
4) Test for a square wave: at the zero crossings of a square wave the signal jumps, so look for a large sample-to-sample delta followed by little to no movement until the next zero crossing.
5) Test for a sawtooth wave: similar to the square wave, but the large delta at the crossing is followed by a constant, linear slope.
6) Test for a triangle wave: small, constant sample-to-sample deltas throughout. Find the peaks, divide by the distance between them to calculate what the ideal triangle wave should look like, and then test the actual signal for deviation from it. Set a deviation tolerance threshold and you can determine whether you are looking at a triangle or a sine (or something parabolic).
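Here is the rough sketch promised above; it covers steps 1-3 and the square-wave test from step 4, assumes the buffer contains several cycles, and uses arbitrary illustrative thresholds:

    import numpy as np

    def analyze(x, sample_rate):
        """Estimate the frequency and run the square-wave test on a small work buffer."""
        # Step 2: count zero-crossings over the whole buffer to estimate the frequency.
        sign = np.signbit(x)
        crossings = np.where(sign[:-1] != sign[1:])[0]
        freq = (len(crossings) / 2) * sample_rate / len(x)    # two crossings per cycle

        # Step 3: cut a work buffer spanning 7 zero-crossings and normalize it.
        work = x[crossings[0]:crossings[7] + 1]
        work = work / np.max(np.abs(work))

        # Step 4: square-wave test -- big jumps at the crossings, almost flat in between.
        d = np.abs(np.diff(work))
        looks_square = np.median(d) < 0.01 and d.max() > 1.0
        return freq, looks_square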
First find the fundamental frequency and the phase; you can do that with an FFT. Normalize the sample. Then subtract from each sample the corresponding sample of the waveform you want to test (same frequency and same phase). Square the differences, add them all up, and divide by the number of samples. The candidate with the smallest result is the waveform you seek.
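A compact sketch of this subtract-and-compare idea, assuming the frequency and phase have already been estimated from the FFT and normalizing by peak amplitude (a simplification); scipy.signal supplies the candidate waveforms:

    import numpy as np
    from scipy.signal import sawtooth, square

    def best_match(x, freq, phase, sample_rate):
        """Compare the normalized input against candidate waveforms and return the closest one."""
        t = np.arange(len(x)) / sample_rate
        arg = 2 * np.pi * freq * t + phase
        candidates = {
            "sine":     np.sin(arg),
            "square":   square(arg),
            "sawtooth": sawtooth(arg),
            "triangle": sawtooth(arg, width=0.5),
        }
        x = x / np.max(np.abs(x))                   # normalize the sample as described above
        scores = {name: np.mean((x - w) ** 2) for name, w in candidates.items()}
        return min(scores, key=scores.get)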