We have a skew normal distribution with location = 0, scale = 1 and shape = 0; then it is the same as a standard normal distribution with mean 0 and variance 1. But if we change the shape parameter, say to shape = 5, then the mean and variance also change. How can we keep the mean and variance fixed for different values of the shape parameter?
Just look at how the mean and variance of a skew normal distribution are computed and you have your answer. The mean looks like:

mean = xi + omega * delta * sqrt(2/pi)

and

delta = alpha / sqrt(1 + alpha^2)
You can see that with xi = 0 (location), omega = 1 (scale) and alpha = 0 (shape) you really get a standard normal distribution (with mean = 0, standard deviation = 1): delta = 0, so the mean is xi = 0 and the variance is omega^2 = 1.
If you only change the alpha (shape) to 5, you can expect the mean to differ a lot, and it will be positive. If you want to hold the mean around zero with a higher alpha (shape), you will have to decrease other parameters, e.g. the omega (scale). The most obvious solution could be to set it to zero instead of 1. See:

mean = 0 + 0 * delta * sqrt(2/pi) = 0
The mean is set; now we have to get a variance equal to one with omega set to zero and shape set to 5. The formula is known:

variance = omega^2 * (1 - 2 * delta^2 / pi)
With our known parameters (delta = 5 / sqrt(26) ≈ 0.98058):

variance = 0^2 * (1 - 2 * 0.96154 / pi) = 0, which should equal 1.
Which is insane :) That cannot be done this way. You may also go back and alter the value of xi instead of omega to get a mean equal to zero. But that way you first have to compute the only possible value of omega from the formula of the variance:

1 = omega^2 * (1 - 2 * (25/26) / pi), so omega^2 ≈ 2.578211
Then the omega should be around 1.605681 (negative or positive).
Getting back to the mean:

0 = xi + 1.605681 * (5 / sqrt(26)) * sqrt(2/pi), so xi ≈ -1.256269
So, with the following parameters you should get the distribution you intended:

location = -1.256269, scale = 1.605681 (taking the conventional positive scale) and shape = 5.
Please, someone test it, as I might have miscalculated somewhere in the given example.
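As a quick test of those numbers, here is a minimal sketch (my addition, not part of the original answer) using scipy.stats.skewnorm, assuming its a/loc/scale arguments correspond to shape/location/scale:

# hypothetical check of the proposed parameters with scipy's skew normal
from scipy.stats import skewnorm

dist = skewnorm(a=5, loc=-1.256269, scale=1.605681)
print(dist.mean())  # expected to be approximately 0
print(dist.var())   # expected to be approximately 1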
I am using Keras to setup neural networks.
As input data, I use vectors in which each coordinate can be either 0 (feature not present or not measured) or a value that can range for instance between 5000 and 10000.
So my input value distribution is a kind of Gaussian centered, let us say, around 7500, plus a very thin peak at 0.
I cannot remove the vectors with 0 in some of their coordinates because almost all of them will have some 0s at some locations.
So my question is: how do I best normalize the input vectors? I see two possibilities:
Just subtract the mean and divide by the standard deviation. The problem then is that the mean is biased by the high number of meaningless 0s, and the std is overestimated, which erases the fine changes in the meaningful measurements.
Compute the mean and standard deviation on the non-zero coordinates only, which is more meaningful. But then all the 0 values that correspond to non-measured data come out as large (negative) values, which gives some importance to meaningless data...
Does someone have an advice on how to proceed ?
Thanks !
Instead, represent your features as 2 dimensions:
The first one is the normalised value of the feature if it is non-zero (where the normalisation is computed over the non-zero elements only); otherwise it is 0.
The second is 1 iff the feature was 0, and otherwise it is 0. This makes sure that a 0 in the first dimension, which could come either from a raw 0 or from a normalised 0, can be discriminated.
You can think of this as encoding an extra feature saying "the other feature is missing". This way the scale of each feature is normalised, and all information is preserved.
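A minimal numpy sketch of this encoding (my own illustration; the function name and the (samples, features) batch layout are assumptions):

import numpy as np

def encode_with_missing_flag(X):
    # X: (samples, features) array where 0 marks a missing measurement
    X = np.asarray(X, dtype=float)
    present = X != 0
    out = np.zeros_like(X)
    for j in range(X.shape[1]):
        vals = X[present[:, j], j]              # non-zero entries of feature j
        if vals.size:
            out[present[:, j], j] = (vals - vals.mean()) / (vals.std() + 1e-12)
    missing = (~present).astype(float)          # 1 iff the feature was missing
    return np.concatenate([out, missing], axis=1)

X = np.array([[7500.0, 0.0, 8200.0],
              [0.0, 6900.0, 7400.0],
              [7100.0, 7300.0, 0.0]])
print(encode_with_missing_flag(X))              # shape (3, 6): normalised values, then flags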
I am trying to implement a Microfacet BRDF shading model (similar to the Cook-Torrance model) and I am having some trouble with the Beckmann Distribution defined in this paper: https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
The distribution is D(M) = exp(-tan^2(theta_m) / ab^2) / (pi * ab^2 * cos^4(theta_m)), where M is a microfacet normal, theta_m is the angle between M and N, N is the macrofacet normal and ab is a "hardness" parameter between [0, 1].
My issue is that this distribution often returns obscenely large values, especially when ab is very small.
For instance, the Beckmann distribution is used to calculate the probability of generating a microfacet normal M per this equation: p(M) = D(M) * |M . N|.
A probability has to be in the range [0, 1], so how is it possible to get a value within this range using the function above if the Beckmann distribution gives me values that are 1000000000+ in size?
So is there a proper way to clamp the distribution? Or am I misunderstanding it or the probability function? I tried simply clamping it to 1 if the value exceeded 1, but this didn't really give me the results I was looking for.
I was having the same question you did.
If you read
http://blog.selfshadow.com/publications/s2012-shading-course/hoffman/s2012_pbs_physics_math_notes.pdf
and
http://blog.selfshadow.com/publications/s2012-shading-course/hoffman/s2012_pbs_physics_math_notebook.pdf
you'll notice it's perfectly normal. To quote from the links:
"The Beckmann Αb parameter is equal to the RMS (root mean square) microfacet slope. Therefore its valid range is from 0 (non-inclusive –0 corresponds to a perfect mirror or Dirac delta and causes divide by 0 errors in the Beckmann formulation) and up to arbitrarily high values. There is no special significance to a value of 1 –this just means that the RMS slope is 1/1 or 45°.(...)"
Also another quote:
"The statistical distribution of microfacet orientations is defined via the microfacet normal distribution function D(m). Unlike F (), the value of D() is not restricted to lie between 0 and 1—although values must be non-negative, they can be arbitrarily large (indicating a very high concentration of microfacets with normals pointing in a particular direction). (...)"
You should google for Self Shadow's Physically Based Shading courses, which are full of useful material (there is one blog post for each year: 2010, 2011, 2012 & 2013).
I did a PCA/FA analysis with and without standardization and ended up with different results. For the standardization, I just divided each input variable by its corresponding standard deviation. However, I have not subtracted the mean (as in the case of Z-scores). My question is: how important is it to subtract the mean in the case of PCA/FA?
I found on another blog that dividing by std dev is another way of standardizing the data-set. Is this superior to z-scores in any sense? Thanks.
By definition, principal components try to capture the highest variation in the data. The important point is that variation here is defined via the 2nd norm, not the variance and not the standard deviation.
For example, the first principal component is the linear combination of the data in the direction given by:

w1 = argmax of ||X w|| over all w with ||w|| = 1
This matters a lot because
Unlike the variance, the 2nd norm is sensitive to location; in other words, if you add a constant to a vector, the variance will not change but the 2nd norm will;
The 2nd norm is also sensitive to scale (which dividing by the standard deviation would remove); i.e. if a vector is multiplied by a constant factor, its 2nd norm scales by that factor;
There are at least two problems if an analysis is impacted by location and scale of explanatory factors:
In reality, observations represent different phenomena, so they have different and incomparable scales and averages; for example, the variation and average of incomes are not comparable with the variation and average of ages in a sample population;
You do not want the model results to change conceptually if, for example, incomes are quoted in cents as opposed to dollars, or measurements are done in inches and feet as opposed to meters;
But plain PCA is sensitive to scale and location. For example, here is a PCA analysis on two-dimensional standard normal variables with correlation 0.4:
The red lines represent the directions of the loading vectors. Obviously the first principal component is capturing the highest variation in the joint data, and correctly gives equal shares to each vector.
But things change dramatically if we move the population 2 units to the right (equivalent to increasing the average of the first vector by 2 units):
Technically we have the same data as before, but now the first principal component is basically capturing the fact that the first vector has a non-zero mean.
Similarly, if the first vector is scaled by a factor of 2:
As can be seen, the first vector gets 4 times more weight than the second vector, driven simply by the fact that it has higher variance.
This shows the importance of normalizing the scale and removing the mean from the data before doing PCA.
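The effect is easy to reproduce; here is a small numpy sketch (my own illustration, not the original figures) that computes the first principal direction of the raw data via the SVD:

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=2000)

def first_direction(data):
    # first right singular vector of the raw matrix: the unit w maximizing ||data @ w||
    return np.linalg.svd(data, full_matrices=False)[2][0]

shifted = X + [2, 0]             # add 2 to the mean of the first variable
scaled = X * [2, 1]              # multiply the first variable by 2

print(first_direction(X))        # roughly equal weights on both variables
print(first_direction(shifted))  # dominated by the non-zero mean of variable 1
print(first_direction(scaled))   # dominated by the larger scale of variable 1

standardized = (scaled - scaled.mean(axis=0)) / scaled.std(axis=0)
print(first_direction(standardized))  # balanced again after centering and scaling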
That said, one can still come up with situations in which the relative location and scale of the explanatory factors carry useful information for the analysis and should not be wiped out of the data.
How do I compute the generalized mean for extreme values of p (very close to 0, or very large) with reasonable computational error?
As per your link, the limit for p going to 0 is the geometric mean, for which bounds are derived.
The limit for p going to infinity is the maximum.
I have been struggling with the same problem. Here is how I handled this:
Let gmean_p(x1,...,xn) be the generalized mean, where p is real but not 0 and x1,...,xn are nonnegative. For M > 0, we have gmean_p(x1,...,xn) = M * gmean_p(x1/M,...,xn/M), and the latter form can be exploited to reduce the computational error. For large p, I use M = max(x1,...,xn), and for p close to 0, I use M = mean(x1,...,xn). In case M = 0, just add a small positive constant to it. This did the job for me.
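A minimal sketch of that rescaling trick (the function name and the switch-over point between the two choices of M are my own assumptions):

import numpy as np

def gmean(x, p):
    x = np.asarray(x, dtype=float)
    # rescale by M so that (x / M) ** p stays well away from overflow/underflow
    M = x.max() if abs(p) > 1 else x.mean()
    if M == 0:
        M = 1e-300   # small positive constant, as suggested above
    return M * np.mean((x / M) ** p) ** (1.0 / p)

x = np.random.uniform(0.2, 2.0, 100)
print(gmean(x, 5000.0))  # close to max(x); computing x ** 5000 directly would overflow
print(gmean(x, 1e-6))    # close to the geometric mean of x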
I suspect if you're interested in very large or small values of p, it may be best to do some form of algebraic manipulation of the generalized-mean formula before putting in numerical values.
For example, in the small-p limit, one can show that the generalized mean tends to the n'th root of the product x_1*x_2*...*x_n (the geometric mean). The higher-order terms in p involve sums and products of log(x_i), which should also be relatively numerically stable to compute. In fact, I believe the first-order expansion in p has a simple relationship to the variance of log(x_i):

gmean_p(x) ≈ exp( mean(log(x_i)) + (p/2) * var(log(x_i)) )
If one applies this formula to a set of 100 random numbers drawn uniformly from the range [0.2, 2], the asymptotic formula becomes pretty accurate for p less than about 0.3, and the simple formula only fails when p is less than about 1e-10.
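A short sketch comparing the two (my own illustration; the sample data is just uniform noise as described above):

import numpy as np

x = np.random.uniform(0.2, 2.0, 100)

def gmean_direct(x, p):
    return np.mean(x ** p) ** (1.0 / p)

def gmean_small_p(x, p):
    # geometric mean with the first-order correction in p
    logx = np.log(x)
    return np.exp(logx.mean() + 0.5 * p * logx.var())

for p in (0.3, 1e-3, 1e-12):
    print(p, gmean_direct(x, p), gmean_small_p(x, p))

# for the smallest p the direct formula loses precision, while the
# asymptotic formula stays stable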
The case of large p is dominated by the x_i which has the largest magnitude (let's call that index i_max). One can rearrange the generalized mean formula into the following form, which has less pathological behaviour for large p:

gmean_p(x) = x_imax * exp( ( log1p( sum over i != i_max of (x_i / x_imax)^p ) - log(n) ) / p )
If this is applied (using standard numpy routines including numpy.log1p) to another 100 uniformly distributed samples over [0.2, 2.0], one finds that the rearranged formula agrees essentially exactly with the simple formula, but remains valid for much larger values of p for which the simple formula overflows when computing powers of x_i.
(In the original plots, the blue curve for the simple formula is shifted up by 0.1 so that one can see where it ends due to overflows. For p less than about 1000, the two curves would otherwise be indistinguishable.)
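Here is a sketch of that rearrangement (my own; the handling of ties on the maximum is an assumption not spelled out above):

import numpy as np

def gmean_large_p(x, p):
    x = np.asarray(x, dtype=float)
    xmax = x.max()
    ratios = x[x < xmax] / xmax                    # every term except the maxima
    n_max = np.sum(x == xmax)                      # number of elements equal to the max
    # sum_i (x_i / xmax)^p = 1 + (n_max - 1) + sum(ratios ** p), hence log1p
    s = np.log1p((n_max - 1) + np.sum(ratios ** p))
    return xmax * np.exp((s - np.log(x.size)) / p)

x = np.random.uniform(0.2, 2.0, 100)
print(gmean_large_p(x, 10.0))     # agrees with the simple formula
print(gmean_large_p(x, 5000.0))   # still finite where x ** p would overflow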
I think the answer here should be to use a recursive solution. In the same way that mean(1,2,3,4) = mean(mean(1,2), mean(3,4)), you can do this kind of recursion for generalized means, as long as you split into equal-sized groups. What this buys you is that you won't need to do as many sums of really large numbers, so you decrease the likelihood of creating an overflow. The other danger when working with floating-point numbers is adding numbers of very different magnitudes (or subtracting numbers of very similar magnitudes), so to avoid these kinds of rounding errors it might help to sort your data before you calculate the generalized mean.
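A rough sketch of that idea (my own; it assumes the number of values is a power of two so every split is into equal halves, which is what makes the recursion exact):

import numpy as np

def gmean_pairwise(x, p):
    x = np.sort(np.asarray(x, dtype=float))   # sort so similar magnitudes are combined first
    return _recurse(x, p)

def _recurse(x, p):
    if x.size == 1:
        return x[0]
    half = x.size // 2
    a, b = _recurse(x[:half], p), _recurse(x[half:], p)
    return ((a ** p + b ** p) / 2.0) ** (1.0 / p)   # generalized mean of the two halves

x = np.random.uniform(0.2, 2.0, 128)
print(gmean_pairwise(x, 3.0))
print(np.mean(x ** 3.0) ** (1.0 / 3.0))   # direct formula, for comparison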
Here's a hunch:
First convert all your numbers into a representation in base p. Now, to raise to a power of 1/p or p, you just have to shift them, so you can very easily do all powers without losing precision.
Work out your mean in base p, then convert the result back to base two.
If that doesn't work, an even less practical hunch:
Try working out the discrete Fourier transform, and relating that to the discrete Fourier transform of the input vector.
I have a reference set of n points, and another set which 'approximates' each of those points. How do I find the absolute/percentage error between the approximation and my reference set?
Put another way, I have a canned animation and a simulation. How do I express the 'drift' between the two as a single number? That is, how well is the simulation approximating the vertices compared to those of the animation?
I currently do something like this for all vertices: |actual - reference| / |actual|, and then average the errors by dividing by the number of verts. Is this correct at all?
Does this measurement really have to be a percentage value? I'm guessing you have one reference set, and then several sets that approximate this set and you want to pick the one that is "the best" in some sense.
I'd add the squared distances between the actual and the reference:
avgSquareDrift = sum(1..n, |actual - reference|^2) / numvertices
The main advantage of this approach is that we don't need to apply the square root, which is a costly operation.
If you sum the formula you have over all vertices (and then divide by the number of verts) you will have calculated the average percentage error in position for all vertices.
However, this percentage error is probably not quite what you want, because vertices closer to the origin will have a greater "percentage error" for the same displacement because their magnitude is smaller.
If you don't divide by anything at all, you will have the average drift in world units, which may be exactly what you want:
average_drift = sum(1->numvertices, |actual - reference|) / numvertices
You may want to divide by something more appropriate to your particular situation to get a meaningful unitless number. If you divide average_drift by the height of your model, you will have the error as a percentage of the model size, which could be useful.
If individual vertices are likely to have more error if they are a long distance from a vertex 'parented' to them, as could be the case if they are vertices of a jointed model, you could divide each error by the length of their parent joint to get the average error normalised for joint orientation -- i.e. what the average drift would be if each joint were of unit length:
orientation_drift = sum(1->numvertices, |actual - reference| / jointlength) / numvertices
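For reference, a small numpy sketch putting these measures together (my own; actual and reference are assumed to be (numvertices, 3) arrays of positions, and jointlength a per-vertex array of parent-joint lengths):

import numpy as np

def drift_metrics(actual, reference, model_height=None, jointlength=None):
    diff = actual - reference
    sq = np.sum(diff * diff, axis=1)        # squared per-vertex drift (no square roots)
    d = np.sqrt(sq)                         # per-vertex drift in world units
    metrics = {
        "avgSquareDrift": np.mean(sq),
        "average_drift": np.mean(d),
    }
    if model_height is not None:
        metrics["relative_drift"] = np.mean(d) / model_height   # fraction of model size
    if jointlength is not None:
        metrics["orientation_drift"] = np.mean(d / jointlength)
    return metrics

actual = np.random.rand(10, 3)
reference = actual + 0.01 * np.random.randn(10, 3)
print(drift_metrics(actual, reference, model_height=1.8))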