What does taking the logarithm of a variable mean? [closed] - statistics

A question about taking the logarithm of a variable (a statistics question).
Say you have a bar graph displaying data for an example "Cost of Computer Orders by the Population" and you are trying to analyze the data and find a distribution. The raw values do not suggest any particular distribution, so you take the logarithm of the variable and the graph then resembles a normal distribution. I know roughly what the normal distribution represents, but what does taking the logarithm of the data indicate?

It seems that you are describing the lognormal distribution: a random variable is said to come from a lognormal distribution if its logarithm is distributed normally.
In practice, this can describe processes where the value cannot go below zero and most of the population sits toward the left of the range (right skewness). For example: salaries, home prices, bone fractures, and number of girlfriends could all reasonably be modeled with a lognormal distribution.
For example: say that on average young adults have had 2.5 girlfriends. A few have never had one; you cannot have a "negative" number of girlfriends, and a few bastards have had 25. However, most young adults will have had between, say, one and three.
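As an illustration (not from the original answer), here is a minimal Python/NumPy sketch with made-up lognormal "salary" data: the raw values are right-skewed, but their logarithm is roughly symmetric, which is exactly the pattern described in the question.

    import numpy as np

    rng = np.random.default_rng(0)
    # Right-skewed, strictly positive data drawn from a lognormal distribution.
    salaries = rng.lognormal(mean=10.5, sigma=0.6, size=10_000)
    log_salaries = np.log(salaries)

    # Raw data: mean well above the median (right skew).
    print(salaries.mean(), np.median(salaries))
    # Log-transformed data: mean and median nearly coincide (roughly normal).
    print(log_salaries.mean(), np.median(log_salaries))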

If you display the values of x as log(x), the line in the diagram becomes straight when the values grow exponentially. This is a statistical trick for a quick check of whether values grow exponentially.
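To make that check concrete, a quick sketch (Python, the values are invented): for exponential growth, log(y) increases by the same amount at every step, so it plots as a straight line.

    import numpy as np

    x = np.arange(20)
    y = 3.0 * 1.5 ** x          # exponential growth
    log_y = np.log(y)

    # Successive differences of log(y) are constant (log 1.5 ~ 0.405),
    # which is what a straight line on a log scale means.
    print(np.diff(log_y))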

Related

Best way to store "percentages" while programming? [closed]

I have a financial application that deals with "percentages" quite frequently. We are using "decimal" types to avoid rounding errors. My question is:
If I have a quantity representing 76%, is it better to store it as 0.76 or 76?
Furthermore, what are the advantages and/or disadvantages from a code maintainability point of view? Are there conventions for this?
If percentage is only a small part of the problem domain, I would probably stick with a primitive number.
Per cent literally means "per hundred", and is a decimal number; e.g. 76 % is equal to 0.76. Thus, in the absence of units-of-measure support in the programming language itself, I would represent a percentage as a decimal number.
This also means that you don't need to perform special arithmetic in order to calculate with percentages, but you will need to multiply by 100 if you need to display the number in percent.
If percentage is a very central part of the problem domain, you should consider discarding Primitive Obsession and instead introduce a proper Value Object.
Still, even if you introduce a Percent Value Object that contains conversions to and from primitive numbers, you're still exposed to potential programmer errors, because in itself the number 0.9 could mean either 90 % or 0.9 %, depending on how you choose to interpret it.
In the end, my best advice is to cover your code base with appropriate unit tests, so that you lock the conversion code down.
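For illustration, a minimal sketch of such a Percent value object in Python, using the standard decimal module as a stand-in for the "decimal" type mentioned in the question (the class and method names are my own, not an established API):

    from decimal import Decimal

    class Percent:
        """Stores the fraction (e.g. 0.76) and converts to/from percent points."""

        def __init__(self, fraction):
            self.fraction = Decimal(fraction)               # canonical form: 0.76

        @classmethod
        def from_points(cls, points):
            return cls(Decimal(points) / Decimal(100))      # 76 -> 0.76

        def of(self, amount):
            return Decimal(amount) * self.fraction          # no special arithmetic

        def __str__(self):
            return f"{self.fraction * 100} %"               # display form: "76.00 %"

    fee = Percent.from_points(76)
    print(fee.of(Decimal("200.00")))    # 152.0000
    print(fee)                          # 76.00 %

The object's only job is to make the 0.76-versus-76 convention explicit at the boundaries; internally all arithmetic stays on the fraction.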

Graphic to check for complete separation [closed]

I need to check for complete separation. I am using SPSS and need to know what steps I have to take to get the graphic on this site. Can someone help me?
SPSS does not provide that probability curve (SAS and Stata can do that). However, plotting the 1/0 outcome against the continuous predictor and observing how the two horizontal bands of data overlap may be enough to give you some hint.
If you have enough data, you can also first separate your data into groups (for example, 10 equal-sized groups split by your continuous predictor), then compute each group's mean outcome (i.e. the probability of "yes"), and join the points. That line should approximate the curve in the illustration you provided.
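This is not SPSS, but the binning idea is easy to sketch in Python/pandas (the column names "predictor" and "outcome" and the simulated data are just placeholders for your own variables):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"predictor": rng.normal(size=500)})
    # Simulated 1/0 outcome whose probability rises with the predictor.
    df["outcome"] = (rng.random(500) < 1 / (1 + np.exp(-2 * df["predictor"]))).astype(int)

    # 10 equal-sized groups of the predictor; each group's mean outcome is the
    # empirical probability of "yes" in that group.
    df["bin"] = pd.qcut(df["predictor"], q=10)
    grouped = df.groupby("bin", observed=True).agg(
        p_yes=("outcome", "mean"), x_mid=("predictor", "mean")
    )

    plt.plot(grouped["x_mid"], grouped["p_yes"], marker="o")
    plt.xlabel("predictor (group mean)")
    plt.ylabel("proportion of outcome = 1")
    plt.show()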

covariance of orthogonal variables [closed]

For uncorrelated variables, the covariance should be zero. But take the two variables x = (0, 1) and y = (1, 0). Clearly they are orthogonal, and so they are uncorrelated. But the covariance is
[(0 - 0.5)(1 - 0.5) + (1 - 0.5)(0 - 0.5)] / 2 = -0.25, not zero. What is wrong then? Sorry for the stupid question.
"Clearly they are orthogonal and so they are uncorrelated" - vectors can be orthogonal, this has nothing to do with random processes...
They are negatively correlated (one thing goes up when another goes down), everything is all right...
From Wikipedia:
Covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative.
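For what it's worth, the arithmetic above is easy to verify in Python (bias=True asks NumPy for the population covariance, i.e. division by n rather than n - 1):

    import numpy as np

    x = np.array([0, 1])
    y = np.array([1, 0])

    print(np.cov(x, y, bias=True)[0, 1])   # -0.25: the covariance is negative
    print(np.corrcoef(x, y)[0, 1])         # -1.0: perfectly negatively correlated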

Why does cos(2^27) fail? [closed]

It seems Excel knows how to calculate =cos(2^27-1) but fails to calculate =cos(2^27), which returns #NUM!. Does anyone know why?
I have no idea what sort of arithmetic Excel uses internally, but at some point, with a large enough argument, the error left after a mod 2*pi reduction becomes too substantial to produce a reliable answer. Presumably they picked 2^27 as their cutoff.
This behavior is not well documented. The SIN function documentation indicates that the argument is a Double, and the specified limits in the documentation indicate that the Double type is stored as a 64-bit number ranging from 4.94E-324 to 1.797E308 (for positive numbers).
I suspect it is not coincidental that 2^27 (134,217,728) bytes is precisely 128 megabytes, and it seems likely that there is an internal limitation for some trig functions (e.g. COS, SIN and TAN, but interestingly, NOT for TANH, etc.). This is not to say that this amount of memory consumption would be required - it's just that a programmer's implementation could have some (potentially unnecessary) limits on these types of inputs internally.
To get around this silly limit, simply use the following:
=COS(MOD(2^27, 2*PI()))
This works because the limitation does not exist for other operations, and is nowhere to be seen in the Excel Specifications and Limits. :-)
It would be good if the linked documentation provided a description of these limits, but unfortunately it does not.
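The same range-reduction idea can be sketched outside Excel, for example in Python, where math.cos has no such input limit; the comparison also shows why very large arguments are problematic in the first place:

    import math

    x = 2 ** 27
    print(math.cos(x))                            # Python accepts the argument
    print(math.cos(math.fmod(x, 2 * math.pi)))    # reduced argument: close, but
    # not identical, because the reduction inherits the rounding error of the
    # floating-point value of 2*pi - which is exactly the reliability problem
    # trig functions face for very large inputs.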

what is the advantage of using a spline to represent a curve? [closed]

I often hear about curves being modeled using splines. What's the advantage of using a spline?
Spline data consists of control points and weights that are related to each other (a point on a spline depends on the coordinates and weights of several neighboring control points). Curve data would be either a large set of closely spaced points approximating the curve (expensive to store, where spline data is sparse) or an equation that might take a lot of horsepower to solve for y given x. Splines can be cheaply computed and subdivided/interpolated to achieve the desired precision, whereas a curve stored as explicit points loses precision without the weight information.
Splines are also really useful in vector art (think Flash or Adobe Illustrator) and 3D graphics, because you can intuitively drag a few control points around to get exactly the curve you want instead of having to move a ton of individual curve points.
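To make the storage/precision point concrete, a small sketch using SciPy (one convenient spline API among several; the sine data is just an example):

    import numpy as np
    from scipy.interpolate import make_interp_spline

    x_ctrl = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # sparse control data
    y_ctrl = np.sin(x_ctrl)

    spline = make_interp_spline(x_ctrl, y_ctrl, k=3)  # cubic B-spline through them

    x_dense = np.linspace(0, 4, 200)                  # evaluate at whatever density
    y_dense = spline(x_dense)                         # you need, cheaply

    # Five stored points reproduce the smooth curve with small error.
    print(np.max(np.abs(y_dense - np.sin(x_dense))))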
