Settling Rank ties in Excel for a Mann-Whitney U-test - excel

I have some sorted data that looks something like this in an Excel spreadsheet.
1.3
1.3
1.3
1.4
1.6
1.6
1.7
1.8
1.9
2
2
2.3
2.3
2.3
2.4
2.7
3.1
3.3
3.3
3.4
3.4
4
4.2
4.5
4.7
4.9
5.8
6.1
6.7
I'm looking to make a calculator for the Mann-Whitney U-test, and for that I need to rank these samples. That is simple enough using the =RANK() function in Excel, but I need to settle ties in the ranks for the test. The Mann-Whitney method involves taking the average of the tied ranks. For example, my first 3 values are 1.3, so I need Excel to assign all 3 of these values the rank (1+2+3)/3 = 2. At the moment the =RANK() function just ranks all 3 as 1.
I've seen some similar questions here solved using the IF function but have had trouble applying them to my data.
Any help would be greatly appreciated.
Thanks,
Sam

Which version of Excel are you using? If you have Excel 2010 or later, there is a specific function for this, RANK.AVG. For example, if your data is in A2:A30, use this formula in B2 and copy it down to rank as required:
=RANK.AVG(A2,A$2:A$30,1)
In earlier versions of Excel you can use this formula to give you the same results
=RANK(A2,A$2:A$30,1)+(COUNTIF(A$2:A$30,A2)-1)/2
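As a rough cross-check outside Excel, the same arithmetic can be sketched in Python. The function name average_ranks is made up for this illustration; the logic simply mirrors RANK(...,1) plus (COUNTIF - 1)/2 on the first few values from the question.

```python
def average_ranks(values):
    """Ascending ranks where tied values share the mean of their positions."""
    ranks = []
    for v in values:
        smaller = sum(1 for x in values if x < v)  # RANK(v, range, 1) is smaller + 1
        equal = sum(1 for x in values if x == v)   # what COUNTIF(range, v) returns
        ranks.append(smaller + 1 + (equal - 1) / 2)
    return ranks


data = [1.3, 1.3, 1.3, 1.4, 1.6, 1.6, 1.7]
print(average_ranks(data))  # [2.0, 2.0, 2.0, 4.0, 5.5, 5.5, 7.0]
```

The three 1.3 values all get rank 2, as the question asks, and the ranks still sum to n(n+1)/2.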

You can cheat a little to get duplicates with different ranks. You will need a helper column.
I'm going to assume the list of numbers starts at A1.
In B1, enter the formula =A1+ROW()/10000. This adds a small amount to the number: not enough to push it into the next value, but enough to distinguish between equal values.
In C1, enter the formula =RANK(B1,$B$1:$B$29,1) to get the new rankings, with no duplicates.
Then copy B1 and C1 down to complete the table. You can hide column B if you don't want the intermediate cells shown.
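To see what the helper column produces, here is a small Python sketch of the same idea. The name tie_broken_ranks and the first_row parameter are hypothetical, chosen only for this example. Note that the ranks come out distinct (1, 2, 3) rather than averaged, so this suits cases where unique ranks are wanted.

```python
def tie_broken_ranks(values, first_row=1):
    """Add row/10000 to each value, then rank ascending, as in the helper column."""
    helper = [v + (first_row + i) / 10000 for i, v in enumerate(values)]
    # Rank of each helper value: one plus the number of strictly smaller entries.
    return [1 + sum(1 for h in helper if h < x) for x in helper]


print(tie_broken_ranks([1.3, 1.3, 1.3, 1.4]))  # [1, 2, 3, 4]
```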

Related

EXCEL PERCENTILE result is wrong compared to a textbook?

I am helping my son with his math homework, specifically statistics and this is the dataset:
1 2 3 4 5 6 7 8 9 10
I have 10 numbers from 1 to 10.
15 percentile:
In Excel I use the PERCENTILE or PERCENTILE.INC function with .15 and the result is 2.35. Why?
The book way: .15*10 = 1.5th number. There is no 1.5th number, so round up to the 2nd number, which is 2.
20 percentile:
In excel I get 2.8.
Book version: .2*10 = 2 (exact) so take average of 2nd and 3rd value for 2.5
50 percentile or median:
In excel I get 5.5.
Book version .5*10 = 5 (exact) so take average of 5th and 6th value for 5.5 (only match)
75 percentile:
In Excel I get 7.75.
Book: .75*10 = 7.5, so round up to the 8th value, which is 8.
80 percentile:
Excel I get 8.2
Book: .8*10 = 8 (exact), so take the average of the 8th and 9th values for 8.5.
Obviously Excel is doing more advanced math and additional smoothing, but I have not been able to find the exact math it uses in order to replicate it, so from my point of view it is wrong. Other programs and statistical packages match Excel, so it is correct, but not useful for what I need.
How can I get Excel to give me the book version of the answers, or at least replicate the Excel answers with paper and a basic calculator?
Most importantly I need to find a way to explain to my son that it is OK that the results don't match that he should do it the book way, at least for now or in school.
EDIT: After posting, SO found a similar question: Different results for percentiles in SAS and Excel. It seems SAS gives the same results as the book version. The answer there is that Excel and most packages use different interpolation methods. However, I need a better explanation for my son, and maybe a way to create a proper percentile function for him, hopefully without VBA.
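The two methods can be compared side by side with a short Python sketch. It assumes PERCENTILE.INC interpolates linearly at position 1 + p*(n-1) in the sorted list, which reproduces every Excel figure quoted above, and it encodes the book rule exactly as the question describes it. Both function names are invented for this example.

```python
import math


def percentile_inc(sorted_values, p):
    """Linear interpolation at position 1 + p*(n-1), matching the Excel results above."""
    n = len(sorted_values)
    pos = p * (n - 1)              # zero-based fractional position
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def percentile_book(sorted_values, p):
    """Book rule from the question: k = p*n; average k-th and (k+1)-th if k is whole, else round up."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n = len(sorted_values)
    k = p * n
    if abs(k - round(k)) < 1e-9:   # k is a whole number, up to float noise
        i = int(round(k))
        return (sorted_values[i - 1] + sorted_values[min(i, n - 1)]) / 2
    return sorted_values[math.ceil(k) - 1]


data = list(range(1, 11))
for p in (0.15, 0.2, 0.5, 0.75, 0.8):
    print(p, round(percentile_inc(data, p), 2), percentile_book(data, p))
```

With pencil and paper, the Excel figure for the 15th percentile is 1 + 0.15*9 = 2.35, i.e. 35% of the way from the 2nd value to the 3rd.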

What is the formula for the drag-down operation of Excel's range

What is the formula for the drop-down (or drag-down) operation of Excel's range?
Input 2, 5, 7, 9 then drag down: shows 11.5, 13.8, 16.1, 18.4, ... (step 2.3)
Input 5, 10, 20 then drag down: shows 26.66667, 34.166667, 41.66667, ... (step 7.5)
Input 1, 2, 3, 5 then drag down: shows 6, 7.3, 8.6, 9.9, 11.2, ... (step 1.3)
The step is given by the LINEST function:
The LINEST function calculates the statistics for a line by using the "least squares" method to calculate a straight line that best fits your data.
The 'step' in each of your examples is the slope of that fitted line: 2.3, 7.5 and 1.3 respectively.
You might also check out the FORECAST function, which will predict the series you see when you drag down. With your third example I added an index (1, 2, 3, ...), which is required for the regression calculation.
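The least-squares calculation can be reproduced by hand or with a short Python sketch. It assumes the x values are simply the index 1, 2, 3, ... of each seed cell, as in the answer; the function name linear_fill is made up for this illustration.

```python
def linear_fill(seed, extra):
    """Fit y = intercept + slope*x to the seed (x = 1..n) and extend by `extra` points."""
    n = len(seed)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(seed) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, seed))
    slope = sxy / sxx                      # the 'step' seen when dragging down
    intercept = mean_y - slope * mean_x
    filled = [intercept + slope * x for x in range(n + 1, n + extra + 1)]
    return slope, filled


slope, filled = linear_fill([2, 5, 7, 9], 4)
print(round(slope, 4), [round(v, 4) for v in filled])  # 2.3 [11.5, 13.8, 16.1, 18.4]
```

The seed needs at least two values, otherwise there is no line to fit.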

Calculate average of based on weekly dates EXCEL

I am completely new to Excel and I have an assignment involving about 12k rows. Basically, I have to calculate the average of all the values that share the same date. The dates form an arithmetic sequence with a difference of 7, so they look like 2/2/52; 2/9/52; 2/16/52; 2/23/52 etc. I know how to find the average of a specific group of values, but selecting one group at a time will take forever because there must be about 5k different dates. Therefore, I am looking for an automated way to find the averages without selecting the values by hand every single time. The following is an example of the spreadsheet that I am dealing with:
DATE------------------VALUE
2/2/52----------------3.5
2/2/52----------------3.4
2/2/52----------------2.5
2/9/52----------------4.5
2/9/52----------------3.6
2/16/52---------------2.4
2/16/52---------------4.1
2/16/52---------------3.1
2/16/52---------------4.2
2/16/52---------------2.34
Also, please note that the dates do not change in a pattern, meaning dates do not change every n rows.
This is a perfect candidate for a pivot table.
Here is your data.
DATE VALUE
2/2/1952 3.5
2/2/1952 3.4
2/2/1952 2.5
2/9/1952 4.5
2/9/1952 3.6
2/16/1952 2.4
2/16/1952 4.1
2/16/1952 3.1
2/16/1952 4.2
2/16/1952 2.34
Select the data and insert a pivot table.
Drag DATE into Rows.
Drag VALUE into Values.
Open the drop-down on the Values field, choose Value Field Settings,
and select Average.
Row Labels Average of VALUE
2/2/1952 3.133333333
2/9/1952 4.05
2/16/1952 3.228
Grand Total 3.364
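For anyone who wants to double-check the pivot table figures, the grouping can be sketched in a few lines of Python. The name average_by_date is hypothetical and the rows are the sample data from the question.

```python
from collections import defaultdict


def average_by_date(rows):
    """Group (date, value) pairs by date and return the mean value per date."""
    totals = defaultdict(lambda: [0.0, 0])  # date -> [running sum, count]
    for date, value in rows:
        totals[date][0] += value
        totals[date][1] += 1
    return {date: s / c for date, (s, c) in totals.items()}


rows = [
    ("2/2/1952", 3.5), ("2/2/1952", 3.4), ("2/2/1952", 2.5),
    ("2/9/1952", 4.5), ("2/9/1952", 3.6),
    ("2/16/1952", 2.4), ("2/16/1952", 4.1), ("2/16/1952", 3.1),
    ("2/16/1952", 4.2), ("2/16/1952", 2.34),
]
for date, avg in average_by_date(rows).items():
    print(date, round(avg, 9))
```

The dates do not need to change every n rows, since each row is assigned to its group by the date value itself.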

Populating values in Excel based on another column of values

I'm not the best with technical descriptions but please bear with me.
The task before me:
A large spreadsheet with a column of values that need to be reviewed and a different number input beside each one. Unfortunately the number is not simply "value, less 15%", though it's close. I will need a list of specific "find/replace" pairs for my formula.
Example:
3.02
6.65
1.54
3.02
And I need to format it such that it says:
3.02 2.80
6.65 5.60
1.54 1.40
3.02 2.80
My idea was something along the lines of =if(A1=3.02,2.80,=if(A1=6.65,5.60,=if(A1....
Then I'd be able to just paste this formula and drag down the entire spreadsheet.
Unfortunately that didn't work and so I come to you all for help.
Please save me tons of time and figure out how I can make this spreadsheet generate its own values!
Thanks,
Mike
I would make a little lookup table of the specific replacements:
A B
1 lookup result
2 3.02 2.8
3 6.65 5.6
4 1.54 1.4
5 3.02 2.8
And then you could set up a formula like this:
9 value result
10 3.02 =VLOOKUP(A10,A$2:B$5,2,FALSE)
11 6.65
12 1.54
13 3.02
14 1.54
And you can drag that formula down the rest of the table.
Beware of the A$2:B$5 - if your lookup table is differently-sized, it will need to change.
OR, to keep more along the lines of what you have, try a formula like this:
=if(A1=3.02,2.80,if(A1=6.65,5.60,if(A1....
The change I made was to remove the extra = before the inner IF function calls.
It is hard to guess how many different values you have. Based on your sample data, you can use the XLOOKUP() function as below if you have Excel 365.
=XLOOKUP(A1,{5.6,2.8,1.4},{5.6,2.8,1.4},"",-1)
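The two lookup styles above behave differently, which a small Python sketch can make concrete. The function names are invented for this example. exact_lookup mimics VLOOKUP with FALSE (exact match only), while next_smaller_lookup mimics XLOOKUP with match mode -1 (exact match or the next smaller key).

```python
def exact_lookup(table, value, default=None):
    """Exact-match lookup, like VLOOKUP(value, table, 2, FALSE)."""
    return table.get(value, default)


def next_smaller_lookup(keys, results, value, default=""):
    """Return the result for the largest key <= value, like XLOOKUP match mode -1."""
    best = None
    for i, key in enumerate(keys):
        if key <= value and (best is None or key > keys[best]):
            best = i
    return default if best is None else results[best]


table = {3.02: 2.8, 6.65: 5.6, 1.54: 1.4}
values = [3.02, 6.65, 1.54, 3.02, 1.54]
print([exact_lookup(table, v) for v in values])
print([next_smaller_lookup([5.6, 2.8, 1.4], [5.6, 2.8, 1.4], v) for v in values])
```

Both lines print [2.8, 5.6, 1.4, 2.8, 1.4] for the sample data. The exact version needs one table row per distinct input, whereas the next-smaller version only needs the list of allowed results, provided every input sits just above its intended result.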

Horizontal look up with multiple criteria

I am trying to develop a formula in Excel to look up multiple criteria. Specifically, how do I pull in the value for Apples in NYC for location (B)?
         Northeast                       NYC
         (A)     (B)     (C)     (D)     (A)      (B)      (C)      (D)
Grapes   3,000   2,073   751     2,000   4,253    3,500    1,832    2,500
Apples   400     3,076   2,298   900     27,250   19,000   14,250   9,000
Oranges  6.0     3.1     3.9     5.0     28.4     20.0     13.8     10.0
I have gotten the following formula to work with your data, with one minor issue: you will need to fill in the region header (Northeast/NYC) above every column so that the match can work.
=VLOOKUP(A9,A1:I5,MATCH((B9&C9),(A1:I1&A2:I2),0),FALSE)
NB: Because the formula uses an array operation, you will need to enter it with Ctrl-Shift-Enter :)
here is a screenshot to let you decipher the references :)
I hope this is helpful
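The idea behind the MATCH on concatenated headers can be sketched in Python. This assumes that A9, B9 and C9 in the formula hold the item, region and location being searched for, which is a guess since the screenshot is not available; the function name two_header_lookup is also made up for the illustration.

```python
# Region header filled across every column, as the answer requires.
regions = ["Northeast"] * 4 + ["NYC"] * 4
locations = ["(A)", "(B)", "(C)", "(D)"] * 2
rows = {
    "Grapes": [3000, 2073, 751, 2000, 4253, 3500, 1832, 2500],
    "Apples": [400, 3076, 2298, 900, 27250, 19000, 14250, 9000],
    "Oranges": [6.0, 3.1, 3.9, 5.0, 28.4, 20.0, 13.8, 10.0],
}


def two_header_lookup(item, region, location):
    """Find the column whose joined headers equal region+location, then read the item's row."""
    keys = [r + l for r, l in zip(regions, locations)]  # like A1:I1 & A2:I2
    col = keys.index(region + location)                 # like MATCH(B9 & C9, ..., 0)
    return rows[item][col]                              # like VLOOKUP on the item name


print(two_header_lookup("Apples", "NYC", "(B)"))  # 19000
```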
