For what sample mean would the p-value be equal to 0.05 - statistics

It's a homework question. I am not looking for the exact answer, just a direction. I have the following question:
H0: µ = 30
HA: µ != 30
We know that the sample standard deviation is 10 and the sample size is 70. For what sample mean would the p-value be equal to 0.05? Assume that all conditions necessary for inference are satisfied.
I am solving it as follows:
Since H0 is stated with an equals sign, our test is two-sided, so we need to check both smaller and larger values.
A p-value of 0.05 in the probability table corresponds to a z-score of 1.65.
We need to calculate the standard error (S.E.) and then find the mean using the Z formula:
S.D = 10
n = 70
se <- S.D/sqrt(n)
# Z= (xbar- µ)/ S.E => xbar = Z * S.E + µ
xbar = (1.65*se)+30
So, in this way I get one mean value, but our test is two-sided, so I need another mean value. I don't see how to solve it. Any suggestion or idea would be appreciated.
Thanks
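As a direction rather than a definitive answer: in a two-sided test the 0.05 is split between both tails, so each tail holds 0.025 and there is one boundary mean below 30 and one above it. A minimal Python sketch, assuming the normal (z) approximation is acceptable for n = 70 (a t distribution with 69 degrees of freedom would give slightly wider bounds); all variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

mu0 = 30.0     # mean under H0
s = 10.0       # sample standard deviation
n = 70         # sample size
alpha = 0.05   # target two-sided p-value

se = s / sqrt(n)  # standard error of the sample mean

# Two-sided test: alpha/2 sits in each tail, so take the 1 - alpha/2 quantile.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# The p-value equals alpha when xbar lies exactly z_crit standard errors from mu0.
xbar_low = mu0 - z_crit * se
xbar_high = mu0 + z_crit * se

print(round(z_crit, 2), round(xbar_low, 2), round(xbar_high, 2))
```

The two boundary means are symmetric around 30, which is where the "other" mean value comes from.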

Related

What's the difference between these two methods for calculating a weighted median?

I'm trying to calculate a weighted median, but don't understand the difference between the following two methods. The answer I get from weighted.median() is different from with(df, median(rep(value, count))), but I don't understand why. Are there many ways to get a weighted median? Is one preferable to the other?
df = read.table(text="row count value
1 1. 25.
2 2. 26.
3 3. 30.
4 2. 32.
5 1. 39.", header=TRUE)
# weighted median
with(df, median(rep(value, count)))
# [1] 30
library(spatstat)
weighted.median(df$value, df$count)
# [1] 28
Note that with(df, median(rep(value, count))) only makes sense for weights which are positive integers (rep will accept float values for count but will coerce them to integers). This approach is thus not a fully general approach to computing weighted medians.
?weighted.median shows that what the function tries to do is to compute a value m such that the total weight of the data below m is 50% of the total weight. In the case of your sample, there is no such m that works exactly: 33.3% of the total weight of the data is <= 26 and 66.7% is <= 30. In a case like this, by default ("type 2") it averages these 2 values to get the 28 that is returned. There are two other types; weighted.median(df$value, df$count, type = 1) returns 30. I am not completely sure if this type will always agree with your other approach.
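To see where the two results part ways, here is a minimal Python sketch. It is not spatstat's implementation; the function names are made up for illustration. It expands the values by their counts and also lists the cumulative share of weight at each value for the sample data in the question.

```python
from statistics import median

values = [25.0, 26.0, 30.0, 32.0, 39.0]
counts = [1, 2, 3, 2, 1]

def expanded_median(values, counts):
    """Median after repeating each value `count` times (integer counts only)."""
    expanded = [v for v, c in zip(values, counts) for _ in range(c)]
    return median(expanded)

def cumulative_shares(values, weights):
    """Pairs of (value, share of the total weight at or below that value)."""
    total = sum(weights)
    running = 0
    shares = []
    for v, w in sorted(zip(values, weights)):
        running += w
        shares.append((v, running / total))
    return shares

print(expanded_median(values, counts))
for v, share in cumulative_shares(values, counts):
    print(v, round(share, 3))
```

No value carries exactly half of the weight at or below it, so some rule for the gap between 26 and 30 has to be chosen (take the upper value, or average the two neighbours), and different rules give different medians.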

Calculating contrast values on Excel

I am currently studying experimental designs in statistics and I am calculating values pertaining to 2^3 factorial designs.
The question that I have is particularly with the calculations of the "contrasts".
My goal of this question is to learn how to use the table "Coded Factors" and "Total" in order to get the values "Contrast" using the IF THEN function in Excel.
For example, Contrast A is calculated as : x - y . Where
x = sum of the values in the Total, where the Coded Factor A is + .
And y= sum of the values in the Total, where the Coded Factor A is - .
This would be rather simple, but for the interactions it is a bit more complex.
For example, contrast AC is obtained as : x - y . Where
x = sum of the values in the Total, where the product of Coded Factor A and that of C becomes + .
And y = sum of the values in the Total, where the product of Coded Factor A and that of C becomes - .
I would really appreciate your help.
Edited:
Considering how IF statements work, I thought that it might be a good idea to convert the + into 1 and - into -1 to make the calculation straightforward.
Convert all +/- to 1/-1 and use some cells as helpers.
Put in these formulas :
J2 --> =LEFT(J1)
K2 --> =MID(J1,2,1)
L2 --> =MID(J1,3,1)
Put
J3 --> =IF(J$2="",1,INDEX($B3:$D3,MATCH(J$2,$B$2:$D$2,0)))
and drag to L10. Then
M3 --> =J3*K3*L3*G3
and drag to M10. Lastly,
M1 --> =SUM(M3:M10)
How to use: enter the factor combination (for example AC) in cell J1, and the result will be in M1.
Idea : separate the factor text > load the multiplier > multiply Total values with multiplier > get sum.
Hope it helps.
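The same idea (multiply each Total by the product of the relevant ±1 codes, then sum) can be sketched outside Excel. This is a minimal Python illustration; the totals are hypothetical numbers, not the values from the question's table, and the run order is assumed to be the standard order (1), a, b, ab, c, ac, bc, abc.

```python
from math import prod

# Coded factors for a 2^3 design in standard order: (1), a, b, ab, c, ac, bc, abc.
runs = [
    {"A": -1, "B": -1, "C": -1},
    {"A": 1, "B": -1, "C": -1},
    {"A": -1, "B": 1, "C": -1},
    {"A": 1, "B": 1, "C": -1},
    {"A": -1, "B": -1, "C": 1},
    {"A": 1, "B": -1, "C": 1},
    {"A": -1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
]

# Hypothetical "Total" column, one value per run (made up for this example).
totals = [10, 20, 30, 40, 50, 60, 70, 80]

def contrast(effect, runs, totals):
    """Sum of totals weighted by the product of the coded factors in `effect`."""
    return sum(
        total * prod(run[factor] for factor in effect)
        for run, total in zip(runs, totals)
    )

for effect in ("A", "B", "C", "AC", "ABC"):
    print(effect, contrast(effect, runs, totals))
```

Because the codes are 1 and -1, the product is 1 exactly where the interaction sign is + and -1 where it is -, so a single weighted sum replaces the separate x and y sums.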

why divide sample standard deviation by sqrt(sample size) when calculating z-score

I have been following Khan Academy videos to gain understanding of hypothesis testing, and I must confess that all my understanding thus far is based on that source.
Now, the following videos talk about z-score/hypothesis testing:
Hypothesis Testing
Z-statistic vs T-statistic
Now, coming to my doubts, which is all about the denominator in the z-score:
For the z-score formula which is: z = (x – μ) / σ,
we use this directly when the standard deviation of the population (σ) is known.
But when it's unknown, and we use a sampling distribution,
then we have z = (x – μ) / (σ / √n); and we estimate σ with σs, where σs is the standard deviation of the sample and n is the sample size.
Then z score = (x – μ) / (σs / √n). Why are we dividing by √n when σs is already known?
Even in the video, Hypothesis Testing - Sal divides the sample's standard deviation by √n. Why are we doing this, when σs is directly given?
Please help me understand.
I tried applying this on the following question, and faced the problems below:
Question : Yardley designed new perfumes. Yardley company claimed that an average new
perfume bottle lasts 300 days. Another company randomly selects 35 new perfume bottles from
Yardley for testing. The sampled bottles last an average of 190 days, with a
standard deviation of 50 days. If the Yardley's claim were true,
what is the probability that 35 randomly selected bottles would have an average
life of no more than 190 days ?
So, for the above question, when I do the following:
z = (190-300)/(50/√35), we get z = -13.02, which is not a possible score, since
z score should be between ±3.
And when I do z = (190-300)/50, or rather z = (x – μ) / σ, I seem to be getting an acceptable answer.
Please help me figure out what I am missing.
I think the origin of the 1/√n is simply whether you're calculating the standard deviation of the lifetime of a single bottle, or the standard deviation of the (sample) mean of a set of bottles.
The question indicates that 50 days is the standard deviation of the lifetimes of the set of 35 bottles. That implies that the estimated mean lifetime (190 days) will have a margin of error of about 50/√35 days. Assuming that a similar margin of error applies to the claimed 300-day lifetime, one can calculate the probability that a set of 35 bottles would be measured to have a mean of 190 days or less, using the complementary error function.
Your z = -13.02 looks about right, implying that it is extremely unlikely that the claimed 300-day lifetime is consistent with that seen in the 35-bottle experiment.
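For anyone who wants to check the arithmetic, here is a small Python sketch of the calculation described above. It assumes the normal approximation and uses the sample standard deviation in place of the unknown population value; the variable names are illustrative.

```python
from math import erfc, sqrt

mu0 = 300.0   # claimed mean lifetime in days
xbar = 190.0  # observed sample mean in days
s = 50.0      # sample standard deviation in days
n = 35        # number of bottles sampled

# Standard deviation of the sample mean, not of a single bottle.
se = s / sqrt(n)

# How many standard errors the observed mean lies from the claimed mean.
z = (xbar - mu0) / se

# Lower-tail probability P(Z <= z) via the complementary error function.
p_lower = 0.5 * erfc(-z / sqrt(2))

print(round(se, 2), round(z, 2), p_lower)
```

A z-score far outside ±3 is not impossible; it just means the observed mean would be extraordinarily unlikely if the claim were true, which is what the tiny tail probability expresses.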

How to calculate growth with a positive and negative number?

I am trying to calculate percentage growth in Excel with a positive and negative number.
This Year's value: 2434
Last Year's value: -2
The formula I'm using is:
(This_Year - Last_Year) / Last_Year
=(2434 - -2) / -2
The problem is I get a negative result. Can an approximate growth number be calculated and if so how?
You could try shifting the number space upward so they both become positive.
To calculate a gain between any two positive or negative numbers, you're going to have to keep one foot in the magnitude-growth world and the other foot in the volume-growth world. You can lean to one side or the other depending on how you want the result gains to appear, and there are consequences to each choice.
Strategy
Create a shift equation that generates a positive number relative to the old and new numbers.
Add the custom shift to the old and new numbers to get new_shifted and old_shifted.
Compute (new_shifted - old_shifted) / old_shifted to get the gain.
For example:
old -> new
-50 -> 30 //Calculate a shift like (2*(50 + 30)) = 160
shifted_old -> shifted_new
110 -> 190
= (new-old)/old
= (190-110)/110 = 72.73%
How to choose a shift function
If your shift function shifts the numbers too far upward, like for example adding 10000 to each number, you always get a tiny growth/decline. But if the shift is just big enough to get both numbers into positive territory, you'll get wild swings in the growth/decline on edge cases. You'll need to dial in the shift function so it makes sense for your particular application. There is no totally correct solution to this problem, you must take the bitter with the sweet.
Add this to your Excel sheet to see how the numbers and gains move about:
shift function: 2*(abs(old)+abs(new))
old   new    abs_old  abs_new  shift  shifted_old  shifted_new  gain
-50   30     50       30       160    110          190          72.73%
-50   40     50       40       180    130          220          69.23%
10    20     10       20       60     70           80           14.29%
10    30     10       30       80     90           110          22.22%
1     10     1        10       22     23           32           39.13%
1     20     1        20       42     43           62           44.19%
-10   10     10       10       40     30           50           66.67%
-10   20     10       20       60     50           80           60.00%
1     100    1        100      202    203          302          48.77%
1     1000   1        1000     2002   2003         3002         49.88%
The gain percentage is affected by the magnitude of the numbers. The numbers above are a bad example and result from a primitive shift function.
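If you would rather experiment outside a spreadsheet, the table can be reproduced with a few lines of Python. This is only a sketch, taking the shift to be 2*(|old| + |new|), which is what the example values correspond to (a shift of 160 for -50 and 30); the function name is made up.

```python
def shifted_gain(old, new):
    """Gain after shifting both numbers upward by 2 * (|old| + |new|)."""
    shift = 2 * (abs(old) + abs(new))
    shifted_old = old + shift
    shifted_new = new + shift
    # Same (new - old) / old calculation, applied to the shifted numbers.
    return (shifted_new - shifted_old) / shifted_old

for old, new in [(-50, 30), (10, 20), (-10, 10), (1, 1000)]:
    print(old, new, f"{shifted_gain(old, new):.2%}")
```

Note that with this shift the gain can never reach 50% when old is a small positive number, however large new gets, which is one way to see why the table calls it a primitive shift function. The sketch divides by zero when both inputs are 0.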
You have to ask yourself which critter has the most productive gain:
Evaluate the growth of critters A, B, C, and D:
A used to consume 0.01 units of energy and now consumes 10 units.
B used to consume 500 units and now consumes 700 units.
C used to consume -50 units (Producing units!) and now consumes 30 units.
D used to consume -0.01 units (Producing) and now consumes -30 units (producing).
In some ways arguments can be made that each critter is the biggest grower in their own way. Some people say B is best grower, others will say D is a bigger gain. You have to decide for yourself which is better.
The question becomes, can we map this intuitive feel of what we label as growth into a continuous function that tells us what humans tend to regard as "awesome growth" vs "mediocre growth".
Growth is a mysterious thing
You then have to take into account that Critter B may have had a far more difficult time than critter D. Critter D may have far more prospects for it in the future than the others. It had an advantage! How do you measure the opportunity, difficulty, velocity and acceleration of growth? To be able to predict the future, you need to have an intuitive feel for what constitutes a "major home run" and a "lame advance in productivity".
The first and second derivatives of a function will give you the "velocity of growth" and "acceleration of growth". Learn about those in calculus, they are super important.
Which is growing more? A critter that is accelerating its growth minute by minute, or a critter that is decelerating its growth? What about high and low velocity and high/low rate of change? What about the notion of exhausting opportunities for growth. Cost benefit analysis and ability/inability to capitalize on opportunity. What about adversarial systems (where your success comes from another person's failure) and zero sum games?
There is exponential growth, linear growth, and unsustainable growth. Cost benefit analysis and fitting a curve to the data. The world is far queerer than we can suppose. Plotting a perfect line to the data does not tell you which data point comes next because of the black swan effect. I suggest all humans listen to this lecture on growth, the University of Colorado At Boulder gave a fantastic talk on growth, what it is, what it isn't, and how humans completely misunderstand it. http://www.youtube.com/watch?v=u5iFESMAU58
Fit a line to the temperature of heated water; once you think you've fit a curve, a black swan happens and the water boils. This effect happens all throughout our universe, and your primitive function (new-old)/old is not going to help you.
Here is Java code that accomplishes most of the above notions in a neat package that suits my needs:
Critter growth - (a critter can be "radio waves", "beetles", "oil temperature", "stock options", anything).
public double evaluate_critter_growth_return_a_gain_percentage(
        double old_value, double new_value) throws Exception {
    double abs_old = Math.abs(old_value);
    double abs_new = Math.abs(new_value);
    // This is your shift function; fool around with it and see how
    // it changes. Have a full battery of unit tests before you fiddle, though.
    double biggest_absolute_value = (Math.max(abs_old, abs_new) + 1) * 2;
    if (new_value <= 0 || old_value <= 0) {
        new_value = new_value + (biggest_absolute_value + 1);
        old_value = old_value + (biggest_absolute_value + 1);
    }
    if (old_value == 0 || new_value == 0) {
        old_value += 1;
        new_value += 1;
    }
    if (old_value <= 0)
        throw new Exception("This should never happen.");
    if (new_value <= 0)
        throw new Exception("This should never happen.");
    return (new_value - old_value) / old_value;
}
Result
It behaves kind-of sort-of like humans have an instinctual feel for critter growth. When our bank account goes from -9000 to -3000, we say that is better growth than when the account goes from 1000 to 2000.
1->2 (1.0) should be bigger than 1->1 (0.0)
1->2 (1.0) should be smaller than 1->4 (3.0)
0->1 (0.2) should be smaller than 1->3 (2.0)
-5-> -3 (0.25) should be smaller than -5->-1 (0.5)
-5->1 (0.75) should be smaller than -5->5 (1.25)
100->200 (1.0) should be the same as 10->20 (1.0)
-10->1 (0.84) should be smaller than -20->1 (0.91)
-10->10 (1.53) should be smaller than -20->20 (1.73)
-200->200 should not be in outer space (say more than 500%):(1.97)
handle edge case 1-> -4: (-0.41)
1-> -4: (-0.41) should be bigger than 1-> -9: (-0.45)
Simplest solution is the following:
=(NEW/OLD-1)*SIGN(OLD)
The SIGN() function will result in -1 if the value is negative and 1 if the value is positive. So multiplying by that will conditionally invert the result if the previous value is negative.
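A quick way to convince yourself what this formula does is to compare it with dividing by the absolute value of the old number. The Python sketch below is only an illustration with made-up function names; copysign stands in for Excel's SIGN and, like the formula itself, it breaks down when OLD is 0.

```python
from math import copysign

def growth_sign(old, new):
    """(NEW/OLD - 1) * SIGN(OLD), with copysign standing in for SIGN."""
    return (new / old - 1) * copysign(1.0, old)

def growth_abs(old, new):
    """(NEW - OLD) / ABS(OLD)."""
    return (new - old) / abs(old)

# The numbers from the question: last year -2, this year 2434.
print(growth_sign(-2, 2434), growth_abs(-2, 2434))

# An ordinary all-positive case for comparison.
print(growth_sign(25, 50), growth_abs(25, 50))
```

Algebraically the two are the same expression, so for the question's numbers both report a positive change of 1218 times the base, that is 121800%.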
Percentage growth is not a meaningful measure when the base is less than 0 and the current figure is greater than 0:
Yr 1   Yr 2   % Change (abs val base)
-1     10     1100%
-10    10     200%
The above calculation reveals the weakness in this measure: if the base year is negative and the current year is positive, the result is N/A.
It is true that this calculation does not make sense in a strict mathematical perspective, however if we are checking financial data it is still a useful metric. The formula could be the following:
if(lastyear>0,(thisyear/lastyear-1),((thisyear+abs(lastyear))/abs(lastyear)))
Let's verify the formula empirically with simple numbers:
thisyear=50 lastyear=25 growth=100% makes sense
thisyear=25 lastyear=50 growth=-50% makes sense
thisyear=-25 lastyear=25 growth=-200% makes sense
thisyear=50 lastyear=-25 growth=300% makes sense
thisyear=-50 lastyear=-25 growth=-100% makes sense
thisyear=-25 lastyear=-50 growth=50% makes sense
again, it might not be mathematically correct, but if you need meaningful numbers (maybe to plug them in graphs or other formulas) it's a good alternative to N/A, especially when using N/A could screw all subsequent calculations.
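The six checks above are easy to replay in code. Here is a minimal Python sketch, reading the formula as IF(lastyear>0, thisyear/lastyear-1, (thisyear+ABS(lastyear))/ABS(lastyear)); the function name is made up, and lastyear = 0 is not handled (it divides by zero, just as the spreadsheet formula would).

```python
def financial_growth(this_year, last_year):
    """Piecewise growth: ordinary ratio for a positive base, shifted ratio otherwise."""
    if last_year > 0:
        return this_year / last_year - 1
    # Negative base: measure this year's distance above the base,
    # relative to the size of the base.
    return (this_year + abs(last_year)) / abs(last_year)

cases = [(50, 25), (25, 50), (-25, 25), (50, -25), (-50, -25), (-25, -50)]
for this_year, last_year in cases:
    print(this_year, last_year, f"{financial_growth(this_year, last_year):.0%}")
```

The printed percentages can be compared line by line with the list above.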
You should be getting a negative result - you are dividing by a negative number. If last year was negative, then you had negative growth. You can avoid this anomaly by dividing by Abs(Last Year)
Let me draw the scenario.
From: -303 To 183, what is the percentage change?
-303, -100% 0 183, 60.396% 303, 100%
|_________________ ||||||||||||||||||||||||________|
(183 - -303) / |-303| * 100 = 160.396%
Total Percent Change is approximately 160%
Note: No matter how negative the value is, it is treated as -100%.
The best way to solve this issue is using the formula to calculate a slope:
(y1-y2)/(x1-x2)
Define x1 as the first moment, so the value will be "C4=1".
Define x2 as the second moment, so the value will be "C5=2".
In order to get the correct percentage growth we can use:
=(((B4-B5)/(C4-C5))/ABS(B4))*100
Works perfectly!
Simplest method is the one I would use.
=(ThisYear - LastYear)/(ABS(LastYear))
However it only works in certain situations. With certain values the results will be inverted.
It really does not make sense to shift both into the positive if you want a growth value that is comparable with the normal growth you get from two positive numbers. If I want to see the growth of 2 positive numbers, I don't want the shifting.
It does make sense, however, to invert the growth for 2 negative numbers. -1 to -2 is mathematically a growth of 100%, which sounds like something positive, but in fact the result is a decline.
So I have the following function, which allows inverting the growth for 2 negative numbers:
void setGrowth(Quantity q1, Quantity q2, boolean fromPositiveBase) {
    if (q1.getValue().equals(q2.getValue()))
        setValue(0.0F);
    else if (q1.getValue() <= 0 ^ q2.getValue() <= 0) // growth makes no sense
        setNaN();
    else if (q1.getValue() < 0 && q2.getValue() < 0) // both negative, option to invert
        setValue((q2.getValue() - q1.getValue()) / ((fromPositiveBase ? -1 : 1) * q1.getValue()));
    else // both positive
        setValue((q2.getValue() - q1.getValue()) / q1.getValue());
}
These answers address the question of "how should I?" without considering the question "should I?" A change in the value of a variable that takes positive and negative values is fairly meaningless, statistically speaking. The suggestion to "shift" might work well for some variables (e.g. temperature, which can be shifted to the Kelvin scale or something to take care of the problem) but very poorly for others, where negativity has a precise implication for direction, for example net income or losses. Operating at a loss (negative income) has a precise meaning in this context, and moving from -50 to 30 is not in any way the same for this context as moving from 110 to 190, as a previous post suggests. These percentage changes should most likely be reported as "NA".
Just change the divisor to an absolute number, i.e.
     A          B        C        D
1    25,000     50,000   75,000   200%
2    (25,000)   50,000   25,000   200%
The formula in D2 is =(C2-A2)/ABS(A2). Compared with the all-positive row, the result is the same (when the absolute base number is the same). Without the ABS in the formula the result would be -200%.
Franco
Use this code:
=IFERROR((This Year/Last Year)-1,IF(AND(D2=0,E2=0),0,1))
The first part of this formula, IFERROR, gets rid of the N/A issues when there is a negative or a 0 value. It does this by looking at the values in E2 and D2 and making sure they are not both 0. If they are both 0, it will place a 0%. If only one of the cells is 0, it will place 100% or -100% depending on where the 0 value falls. The second part of this formula, (E2/D2)-1, is the same as (This Year - Last Year)/Last Year.
I was fumbling for answers today, and think this would work...
=IF(C5=0, B5/1, IF(C5<0, (B5+ABS(C5)/1), IF(C5>0, (B5/C5)-1)))
C5 = Last Year, B5 = This Year
We have 3 IF statements in the cell.
IF Last Year is 0, then This Year divided by 1
IF Last Year is less than 0, then This Year + ABSolute value of Last Year divided by 1
IF Last Year is greater than 0, then This Year divided by Last Year minus 1
Use this formula:
=100% + (Year 2/Year 1)
The logic is that you recover 100% of the negative in year 1 (hence the initial 100%) plus any excess will be a ratio against year 1.
Short one:
=IF(D2>C2, ABS((D2-C2)/C2), -1*ABS((D2-C2)/C2))
or confusing one (my first attempt):
=IF(D2>C2, IF(C2>0, (D2-C2)/C2, (D2-C2)/ABS(C2)), IF(OR(D2>0,C2>0), (D2-C2)/C2, IF(AND(D2<0, C2<0), (D2-C2)/ABS(C2), 0)))
D2 is this year, C2 is last year.
Formula should be this one:
=(thisYear+IF(LastYear<0,ABS(LastYear),0))/ABS(LastYear)-100%
If LastYear is less than 0, the IF adds its absolute value to your ThisYear value to generate the real difference.
If LastYear is greater than 0, the IF adds 0.
Seems to work in different scenarios checked
This article offers a detailed explanation for why the (b - a)/ABS(a) formula makes sense. It is counter-intuitive at first, but once you play with the underlying arithmetic, it starts to make sense. As you get used to it eventually, it changes the way you look at percentages.
The aim is to get the rate of increase.
The idea is the following:
First, calculate the value of the absolute increase.
Then add the absolute increase to both this year's and last year's values, and calculate the rate of increase based on the new values.
For example:
LastYear | ThisYear | AbsoluteIncrease | LastYear01    | ThisYear01   | Rate
-10      | 20       | 30 = (10+20)     | 20 = (-10+30) | 50 = (20+30) | 2.5 = 50/20
-20      | 20       | 40 = (20+20)     | 20 = (-20+40) | 60 = (20+40) | 3 = 60/20
=(This Year - Last Year) / (ABS(Last Year))
This only works reliably if this year and last year are always positive numbers.
For example last_year=-50 this_year = -1. You get -100% growth when in fact the numbers have improved a great deal.

Need Hint for ProjectEuler Problem

What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
I could easily brute force the solution in an imperative programming language with loops. But I want to do this in Haskell and not having loops makes it much harder. I was thinking of doing something like this:
[n | n <- [1..], d <- [1..20], n `mod` d == 0] !! 0
But I know that won't work because "d" will make the condition equal True at d = 1. I need a hint on how to make it so that n mod d is calculated for [1..20] and can be verified for all 20 numbers.
Again, please don't give me a solution. Thanks.
As with many of the Project Euler problems, this is at least as much about math as it is about programming.
What you're looking for is the least common multiple of a set of numbers, which happen to be in a sequence starting at 1.
A likely tactic in a functional language is trying to make it recursive based on figuring out the relation between the smallest number divisible by all of [1..n] and the smallest number divisible by all of [1..n+1]. Play with this with some smaller numbers than 20 and try to understand the mathematical relation or perhaps discern a pattern.
Instead of a search until you find such a number, consider instead a constructive algorithm, where, given a set of numbers, you construct the smallest (or least) positive number that is evenly divisible by (aka "is a common multiple of") all those numbers. Look at the algorithms there, and consider how Euclid's algorithm (which they mention) might apply.
Can you think of any relationship between two numbers in terms of their greatest common divisor and their least common multiple? How about among a set of numbers?
If you look at it, it seems to be a list filtering operation: an infinite list of numbers, to be filtered based on whether each number is divisible by all numbers from 1 to 20.
So what we need is a function which takes a list of integers and an integer and tells whether the integer is divisible by all the numbers in the list
isDivisible :: [Int] -> Int -> Bool
and then use this in List filter as
filter (isDivisible [1..20]) [1..]
Now as Haskell is a lazy language, you just need to take the required number of items (in your case you need just one hence List.head method sounds good) from the above filter result.
I hope this helps you. This is a simple solution and there will be many other single line solutions for this too :)
Alternative answer: You can just take advantage of the lcm function provided in the Prelude.
To solve this efficiently, go with Don Roby's answer. If you just want a little hint on the brute force approach, translate what you wrote back into English and see how it differs from the problem description.
You wrote something like "filter the product of the positive naturals and the positive naturals from 1 to 20"
what you want is more like "filter the positive naturals by some function of the positive naturals from 1 to 20"
You have to get mathy in this case. You are going to do a foldl through [1..20], starting with an accumulator n = 1. For each number p of that list, you only proceed if p is a prime. For that prime p, you want to find the largest integer q such that p^q <= 20, and multiply n by p^q. Once the foldl finishes, n is the number you want.
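For readers who want to see the shape of that fold before writing it in Haskell, here is a rough Python sketch of the same idea, with functools.reduce playing the role of foldl. The helper names are made up for illustration, and the gcd-based version is included only as a cross-check.

```python
from functools import reduce
from math import gcd

def is_prime(p):
    """Trial division; fine for numbers as small as 20."""
    return p >= 2 and all(p % d for d in range(2, int(p ** 0.5) + 1))

def largest_power_at_most(p, limit):
    """Largest p**q (q >= 1) that does not exceed limit."""
    power = p
    while power * p <= limit:
        power *= p
    return power

def smallest_multiple(limit):
    """Fold over 1..limit, multiplying in the top power of each prime."""
    return reduce(
        lambda n, p: n * largest_power_at_most(p, limit) if is_prime(p) else n,
        range(1, limit + 1),
        1,
    )

def lcm_by_gcd(limit):
    """Cross-check: fold lcm(a, b) = a * b // gcd(a, b) over 1..limit."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, limit + 1), 1)

print(smallest_multiple(10), lcm_by_gcd(10))
print(smallest_multiple(20) == lcm_by_gcd(20))
```

Non-primes are skipped because their prime factors are already covered by the highest prime powers, which is the mathematical point of the fold.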
A possible brute force implementation would be
head [n|n <- [1..], all ((==0).(n `mod`)) [1..20]]
but in this case it would take way too long. The all function tests if a predicate holds for all elements of a list. The composed section ((==0).(n `mod`)) is short for the lambda (\d -> mod n d == 0).
So how could you speed up the calculation? Let's factorize our divisors in prime factors, and search for the highest power of every prime factor:
2 = 2
3 = 3
4 = 2^2
5 = 5
6 = 2 * 3
7 = 7
8 = 2^3
9 = 3^2
10 = 2 * 5
11 = 11
12 = 2^2*3
13 = 13
14 = 2 *7
15 = 3 * 5
16 = 2^4
17 = 17
18 = 2 * 3^2
19 = 19
20 = 2^2 * 5
--------------------------------
max= 2^4*3^2*5*7*11*13*17*19
Using this number we have:
all ((==0).(2^4*3^2*5*7*11*13*17*19 `mod`)) [1..20]
--True
Hey, it is divisible by all numbers from 1 to 20. Not very surprising. E.g. it is divisible by 15 because it "contains" the factors 3 and 5, and it is divisible by 16, because it "contains" the factor 2^4. But is it the smallest possible number? Think about it...
