What is the Zipf factor and why is its value 0.89? - statistics

Can someone tell me where I can use it? Does it have something to do with weights and distributions? I am trying to study some code that uses a Zipf value of 0.89 with no mention of why.

Perhaps it's the exponent parameter for a distribution following Zipf's law.
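If so, a quick sketch may make it concrete. This assumes the "Zipf factor" in the code is the exponent s of a (truncated) Zipf distribution; with s < 1 the support must be finite, since the infinite series sum(1/k**s) diverges:

```python
import numpy as np

# A minimal sketch of a (truncated) Zipf distribution with exponent s = 0.89.
# Treating the "Zipf factor" as this exponent is an assumption.
def zipf_pmf(n_ranks, s=0.89):
    ranks = np.arange(1, n_ranks + 1)
    weights = ranks ** (-s)            # unnormalized Zipf weights
    return weights / weights.sum()     # normalize to a probability mass function

pmf = zipf_pmf(1000)
samples = np.random.default_rng(0).choice(np.arange(1, 1001), size=10, p=pmf)
print(pmf[:5])   # probabilities of the top 5 ranks
print(samples)   # low ranks dominate the draws
```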

Related

Why is Standard Deviation based on the squared difference of an observation from the mean?

I am learning statistics, and have some basic yet core questions on SD:
s = the sample (the collection of observations iterated over below)
n = total number of observations
xi = ith observation
μ = arithmetic mean of all observations
σ = the usual definition of SD, i.e. ((1/(n-1))*sum([(xi-μ)**2 for xi in s]))**(1/2) in Python lingo
f = frequency of an observation value
I do understand that (1/n)*sum([xi-μ for xi in s]) would be useless (= 0), but would not (1/n)*sum([abs(xi-μ) for xi in s]) have been a measure of variation?
Why stop at a power of 1 or 2? Would ((1/(n-1))*sum([abs(xi-μ)**3 for xi in s]))**(1/3) or ((1/(n-1))*sum([(xi-μ)**4 for xi in s]))**(1/4) and so on have made any sense?
My notion of squaring is that it 'amplifies' the measure of variation from the arithmetic mean while the simple absolute difference is somewhat a linear scale notionally. Would it not amplify it even more if I cubed it (and made absolute value of course) or quad it?
I do agree computationally cubes and quads would have been more expensive. But with the same argument, the absolute values would have been less expensive... So why squares?
Why is the Normal Distribution like it is, i.e. f = (1/(σ*math.sqrt(2*pi)))*e**((-1/2)*((xi-μ)/σ)**2)?
What impact would it have on the normal distribution formula above if I calculated SD as described in (1) and (2) above?
Is it only a matter of our 'getting used to the squares', it could well have been linear, cubed or quad, and we would have trained our minds likewise?
(I may not have been 100% accurate in my number of opening and closing brackets above, but you will get the idea.)
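For concreteness, here is a runnable version of the quantities defined above (the sample s is a made-up example):

```python
# Runnable versions of the quantities defined in the question,
# for a small made-up sample s.
s = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(s)                                                  # number of observations
mu = sum(s) / n                                             # arithmetic mean
sd = ((1/(n-1)) * sum((xi - mu)**2 for xi in s)) ** (1/2)   # sample SD
mad = (1/n) * sum(abs(xi - mu) for xi in s)                 # mean absolute deviation
print(mu, sd, mad)
```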
So, if you are looking for an index of dispersion, you actually don't have to use the standard deviation. You can indeed report the mean absolute deviation, the summary statistic you suggested. You merely need to be aware of how each summary statistic behaves; for example, the SD assigns more weight to outlying observations. You should also consider how each one can be interpreted: for example, with a normal distribution, we know how much of the distribution lies within ±2 SD of the mean (about 95%). For some discussion of mean absolute deviation (and other measures of average absolute deviation, such as the median absolute deviation) and their uses, see here.
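A small sketch of that behavioural difference (the data are made up; note how the single outlier inflates the SD far more than the mean absolute deviation):

```python
import numpy as np

# Squaring weights large deviations more heavily, so the SD reacts
# to the outlier (100) much more than the mean absolute deviation does.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
sd = x.std(ddof=1)                      # sample standard deviation
mad = np.mean(np.abs(x - x.mean()))     # mean absolute deviation
print(sd, mad)                          # SD ≈ 43.6, MAD ≈ 31.2
```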
Beyond its use as a measure of spread, though, the SD is related to the variance, and this is connected to some of the other reasons it's popular: the variance has some nice mathematical properties. A mathematician or statistician would be able to provide a more informed answer here, but the squared difference is a smooth function, differentiable everywhere, which allows one to analytically identify a minimum; this helps when fitting functions to data using least squares estimation. For more detail, and for a comparison with least absolute deviations, see here. Another major area where the variance shines is that it can be easily decomposed and summed, which is useful, for example, in ANOVA and in regression models generally. See here for a discussion.
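One concrete way to see the least-squares point: the mean minimizes the sum of squared deviations, while the median minimizes the sum of absolute deviations. A rough numerical check over a grid of candidate centres (made-up data):

```python
import numpy as np

# The mean minimizes the sum of squared deviations; the median
# minimizes the sum of absolute deviations.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0.0, 110.0, 1101)          # candidate centres, step 0.1
sq_loss = ((x[:, None] - grid) ** 2).sum(axis=0)
abs_loss = np.abs(x[:, None] - grid).sum(axis=0)
print(grid[sq_loss.argmin()], x.mean())       # both 22.0
print(grid[abs_loss.argmin()], np.median(x))  # both 3.0
```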
As to your questions about raising to higher powers: they actually do have uses in statistics! In general, the mean (the first moment), the variance (the second central moment, related to the standard deviation), skewness (related to the third power) and kurtosis (related to the fourth power) are all moments of a distribution. Taking differences raised to those powers and standardizing them provides useful information about the shape of a distribution. The video I linked provides some easy intuition.
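A minimal sketch of the moment idea, computing the standardized third and fourth powers directly (these are the simple "population" formulas; library routines may add small-sample corrections):

```python
import numpy as np

# Standardized higher powers of the deviations give shape information.
x = np.random.default_rng(0).exponential(size=10_000)  # a right-skewed sample
d = x - x.mean()
variance = np.mean(d**2)
skewness = np.mean(d**3) / variance**1.5   # ≈ 2 for an exponential
kurtosis = np.mean(d**4) / variance**2     # ≈ 9 for an exponential, 3 for a normal
print(variance, skewness, kurtosis)
```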
For some other answers and a larger discussion of why the SD is so popular, see here.
Regarding the relationship of sigma and the normal distribution: sigma is simply a parameter that stretches the standard normal distribution, just like the mean shifts its location. This is a result of the way the standard normal distribution (a normal distribution with mean = 0 and SD = variance = 1) is mathematically defined, and note that all normal distributions can be derived from the standard normal distribution. This answer illustrates this. Now, you can parameterize a normal distribution in other ways as well (for example with the precision, 1/σ², in place of the SD), but you always need a scale parameter in some form. In fact, the mean absolute deviation of a normal distribution is proportional to σ (it equals σ√(2/π)), so the mean together with the mean absolute deviation would pin down the distribution too. Now, a deeper question is why normal distributions are so incredibly useful in representing widely different phenomena and crop up everywhere. I think this is related to the Central Limit Theorem, but I do not understand the proofs of the theorem well enough to comment further.
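The stretching-and-shifting point can be checked directly: if Z is standard normal, then X = μ + σZ is normal with mean μ and SD σ. A quick simulation (the values of μ and σ are arbitrary):

```python
import numpy as np

# Any normal variable is a shifted, stretched standard normal one.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # Z ~ N(0, 1)
mu, sigma = 5.0, 2.0
x = mu + sigma * z                 # X ~ N(5, 2**2)
print(x.mean(), x.std())           # ≈ 5.0 and ≈ 2.0
```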

Is the definition of the hyperparameter C in SVR opposite to the corresponding C in SVM?

I just realized that support vector machines can be used for regression, thanks to a nice article. However, I am quite confused by the definition of the hyperparameter C.
I am well aware of the slack variables \xi_i associated with each data point and the hyperparameter C in classification SVM. There, the objective function is
\min_{w, b} \frac{\|w\|^2}{2} + C\sum_{i=1}^N \xi_i, such that
y_i (w \cdot x_i + b) \ge 1 - \xi_i and \xi_i \ge 0.
In SVM, the larger C is, the larger the penalty on the slack, and hence soft-margin SVM reduces to hard-margin SVM as C goes to infinity. (Sorry for the raw LaTeX code; I remember LaTeX being supported, but that seems not to be the case here.)
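As a sanity check on this classification convention, here is a quick experiment with scikit-learn's SVC (my own sketch, not from the article; the dataset is synthetic): as C grows the margin hardens and the number of support vectors typically shrinks.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data: larger C penalizes slack more heavily.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_.sum())   # total number of support vectors
```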
From the linked article, the objective function and the constraints are as follows:
\min_{w, b} \frac{\|w\|^2}{2} + C\sum_{i=1}^N (\xi_i + \xi_i^*), such that
y_i - (w \cdot x_i + b) \le \epsilon + \xi_i, (w \cdot x_i + b) - y_i \le \epsilon + \xi_i^*, and \xi_i, \xi_i^* \ge 0.
I think the equations also imply that the larger C is, the larger the penalty. However, the author of the article claims the opposite.
I noticed that someone asked the author the same question at the end of the article, but there has been no response.
I guessed there might be a typo in the equation, so I looked for support from other references, and I found that the SVR implementation in Python uses the same convention, namely that the strength of regularization is inversely proportional to C. I tried to check the source code of SVR, but I couldn't find the formula. Can someone help resolve this? Thanks!
In some formulations C is used as a regularization parameter, i.e. it is placed in front of the ‖w‖² term rather than in front of the slack (relaxation) terms. In that convention the parameter plays the role of 1/C in the formulation you wrote, which would explain the reversed interpretation.
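To see scikit-learn's convention empirically (its documentation states that the strength of the regularization is inversely proportional to C), here is a small sketch with synthetic data: a tiny C keeps the model heavily regularized and flat, while a large C lets it bend to the training points.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: training fit improves as C grows,
# i.e. larger C means weaker regularization in scikit-learn's SVR.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, 60))[:, None]
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)
for C in (0.001, 1.0, 1000.0):
    model = SVR(kernel='rbf', C=C).fit(X, y)
    print(C, model.score(X, y))   # training R^2 rises with C
```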

Find an unbiased estimator for (1+λ)e^{-λ} in a Poisson distribution

My Attempt:
I tried using the MLE of λ, which I found to be the sample mean X̄. Then, using the invariance property, it follows that (1+X̄)e^{-X̄} will be the MLE of (1+λ)e^{-λ}, but I'm not sure whether it is also unbiased.
As this is probably a homework assignment, let me just help you build on the step you already took.
Given the form of (1+λ)e^{-λ}, you may indeed use the invariance property to get the MLE, as you mentioned.
To check whether it is unbiased, you need to prove that E[(1+X̄)e^{-X̄}] = (1+λ)e^{-λ}, i.e., that the expected value of the estimator equals the quantity being estimated. You may require the mathematical series expression of the exponential function at some point.
Try to do this step yourself, and remember to use the Law of the Unconscious Statistician (LOTUS).
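While you work through the algebra, a quick Monte Carlo simulation (not a proof; the values of λ and n are arbitrary) can hint at whether the MLE is biased:

```python
import numpy as np

# Monte Carlo check of whether the MLE (1 + Xbar) * exp(-Xbar)
# is unbiased for (1 + lam) * exp(-lam).
rng = np.random.default_rng(0)
lam, n, reps = 2.0, 10, 200_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
estimates = (1 + xbar) * np.exp(-xbar)
print(estimates.mean())            # Monte Carlo average of the estimator
print((1 + lam) * np.exp(-lam))    # the target quantity
```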

Normal Approximation versus MAP in pymc

Can anyone explain what Normal Approximation does over and above what MAP does, in simple words?
I have read http://pymc-devs.github.io/pymc/modelfitting.html#normal-approximations, but it is too complicated for me.
An example showing the difference would be very helpful.
MAP simply returns the posterior mode, while NormalApproximation uses a quadratic Taylor series approximation to the log-posterior around the mode, and so can return both the expected value and the covariance matrix. Of course, it uses a normal distribution to approximate the posterior, which may not be appropriate.
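A minimal sketch of the difference using the PyMC2 API described on the linked page (the model is a made-up toy example):

```python
import pymc

# Toy model for illustration: a Poisson rate with an exponential prior.
lam = pymc.Exponential('lam', beta=1.0)
data = pymc.Poisson('data', mu=lam, value=[2, 3, 1, 4], observed=True)
model = pymc.Model([lam, data])

m = pymc.MAP(model)
m.fit()                       # optimization: returns the posterior mode only
print(lam.value)

n = pymc.NormApprox(model)
n.fit()                       # mode plus a Gaussian approximation around it
print(n.mu[lam], n.C[lam])    # approximate posterior mean and covariance
```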

What's the correct term for "number of std deviations" away from a mean

I've computed the mean and variance of a set of values, and I want to pass along, for each number in the set, the value that represents its number of standard deviations from the mean. Is there a better term for this, or should I just call it num_of_std_devs_from_mean?
Some suggestions here:
Standard score (z-value, z-score, normal score)
but "sigma" or "stdev_distance" would probably be clearer
The standard deviation is usually denoted by the letter σ (sigma). Personally, I think more people will understand what you mean if you say "number of standard deviations".
As for a variable name, as long as you comment the declaration you could shorten it to std_devs.
sigma is what you want, I think.
That is normalizing your values. You could just refer to it as the normalized value. Maybe norm_val would be more appropriate.
I've always heard it called the "number of standard deviations".
Deviation may be what you're after. A deviation is the difference between a data point and the mean.
