Statistical significance test for ranked data

I have a list of rankings in the following format:
Item | Score | Rank
item1 | 0.97 | 6
item2 | 0.53 | 4
item3 | 0.05 | 1
item4 | 0.68 | 5
item5 | 0.10 | 2
item6 | 0.29 | 3
I want to determine whether the difference between each pair of ranked items is significant, given the scores. Which statistical test should I use? Thank you!

Related

rolling average and aggregate more than one column in pandas

How do I also aggregate the 'reviewer' lists together with the average of 'quantity'?
For a data frame like the one below, I can successfully calculate the average of the quantities per group over every 3 periods. How do I add an extra column that aggregates the values of the 'reviewer' column for every period as well? For example, for company 'A' and year 1993, the column would be [[p1,p2],[p3,p2],[p4]].
import pandas as pd

df = pd.DataFrame(data=[
    ['A', 1990, 2, ['p1', 'p2']],
    ['A', 1991, 3, ['p3', 'p2']],
    ['A', 1993, 5, ['p4']],
    ['A', 2000, 4, ['p1', 'p5', 'p7']],
    ['B', 2000, 1, ['p3']],
    ['B', 2001, 2, ['p6', 'p9']],
    ['B', 2002, 3, ['p10', 'p1']]], columns=['company', 'year', 'quantity', 'reviewer'])
# Rolling mean of 'quantity' over 3 rows within each company
df['rolling_average'] = (df.groupby('company')['quantity']
                         .rolling(3).mean().reset_index(level=0, drop=True))
The output currently looks like:
| index | company | year | quantity | reviewer | rolling_average |
| :---- | :------ | :--- | :------- | :------- | :-------------- |
| 0 | A | 1990 | 2 | [p1, p2] | NaN |
| 1 | A | 1991 | 3 | [p3, p2] | NaN |
| 2 | A | 1993 | 5 | [p4] | 3.33 |
| 3 | A | 2000 | 4 | [p1, p5, p7] | 4.00 |
| 4 | B | 2000 | 1 | [p3] | NaN |
| 5 | B | 2001 | 2 | [p6, p9] | NaN |
| 6 | B | 2002 | 3 | [p10, p1] | 2.00 |

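Since rolling cannot handle non-numeric data, we need to define the rolling window ourselves:
import numpy as np

n = 3
# For each row, collect the last n 'reviewer' lists within the company (NaN until n rows exist)
df['new'] = (df.groupby('company')['reviewer']
             .apply(lambda x: [x.iloc[y-n:y].tolist() if y >= n else np.nan for y in range(1, len(x)+1)])
             .explode().values)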
df
company year quantity reviewer new
0 A 1990 2 [p1, p2] NaN
1 A 1991 3 [p3, p2] NaN
2 A 1993 5 [p4] [[p1, p2], [p3, p2], [p4]]
3 A 2000 4 [p1, p5, p7] [[p3, p2], [p4], [p1, p5, p7]]
4 B 2000 1 [p3] NaN
5 B 2001 2 [p6, p9] NaN
6 B 2002 3 [p10, p1] [[p3], [p6, p9], [p10, p1]]
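
The same idea can be written so that the result is assigned by index label rather than by position, which avoids relying on the rows already being ordered by company. This is only a sketch under that assumption; the helper name rolling_lists and the column name reviewer_window are made up for illustration and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ["A", 1990, 2, ["p1", "p2"]],
        ["A", 1991, 3, ["p3", "p2"]],
        ["A", 1993, 5, ["p4"]],
        ["A", 2000, 4, ["p1", "p5", "p7"]],
        ["B", 2000, 1, ["p3"]],
        ["B", 2001, 2, ["p6", "p9"]],
        ["B", 2002, 3, ["p10", "p1"]],
    ],
    columns=["company", "year", "quantity", "reviewer"],
)


def rolling_lists(values, n):
    """Hypothetical helper: for each position, the last n items (None until n exist)."""
    items = list(values)
    return [items[i - n + 1 : i + 1] if i + 1 >= n else None for i in range(len(items))]


n = 3

# Rolling mean per company, aligned back to the original index by transform.
df["rolling_average"] = df.groupby("company")["quantity"].transform(
    lambda s: s.rolling(n).mean()
)

# Rolling windows of reviewer lists, keyed by the original index labels.
windows = {}
for _, group in df.groupby("company"):
    for idx, window in zip(group.index, rolling_lists(group["reviewer"], n)):
        windows[idx] = window
df["reviewer_window"] = pd.Series(windows, dtype=object)

print(df[["company", "year", "rolling_average", "reviewer_window"]])
```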

SUM values in one column if criteria exists at least once per value in another column

| A B C D | E F | G H
----|----------------------------------------------------|-----------------------|-------------------
1 | | |
2 | Products date quantity | |
----|----------------------------------------------------|-----------------------|-------------------
3 | Product_A 2020-01-08 0 | From 2020-01-01 | Result: 800
4 | Product_A 2020-12-15 0 | to 2020-10-31 |
5 | Product_A 2020-12-23 0 | |
6 | Product_A 500 | |
----|----------------------------------------------------|-----------------------|------------------
7 | Product_B 2020-11-09 0 | |
8 | Product_B 2021-03-14 0 | |
9 | Product_B 700 | |
----|----------------------------------------------------|-----------------------|------------------
10 | Product_C 2020-02-05 0 | |
11 | Product_C 2020-07-19 0 | |
12 | Product_C 2020-09-18 0 | |
13 | Product_C 2020-09-25 0 | |
14 | Product_C 300 | |
14 | | |
15 | | |
In the table I have listed different products with multiple dates per product.
Below each product there is a row in which a quantity is displayed.
Now in cell H3 I want to get the sum of the quantities of all products that have at least one date between the dates in cell F3 and cell F4. In the example this applies to Product_A and Product_C, so the sum is 500+300=800.
I have no clue what kind of formula I need to achieve this.
I guess it must be something like this:
SUMIFS(Date in Cell F3 OR in Cell F4 exists for Product in Column C THEN SUM over Column D)
Do you have an idea what this formula should look like?

One way would be with SUMPRODUCT() combined with COUNTIFS():
=SUMPRODUCT((COUNTIFS(B3:B14,B3:B14,C3:C14,">="&F3,C3:C14,"<="&F4)>0)*D3:D14)
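
To make the logic of the formula easier to follow, here is a rough Python equivalent using the data from the question. It is only an illustration of the two steps (find the products with at least one date in range, then sum every quantity belonging to those products); the variable names are made up, and rows without a date are represented as None.

```python
from datetime import date

# (product, date, quantity); the quantity row of each product has no date.
rows = [
    ("Product_A", date(2020, 1, 8), 0),
    ("Product_A", date(2020, 12, 15), 0),
    ("Product_A", date(2020, 12, 23), 0),
    ("Product_A", None, 500),
    ("Product_B", date(2020, 11, 9), 0),
    ("Product_B", date(2021, 3, 14), 0),
    ("Product_B", None, 700),
    ("Product_C", date(2020, 2, 5), 0),
    ("Product_C", date(2020, 7, 19), 0),
    ("Product_C", date(2020, 9, 18), 0),
    ("Product_C", date(2020, 9, 25), 0),
    ("Product_C", None, 300),
]
start, end = date(2020, 1, 1), date(2020, 10, 31)  # cells F3 and F4

# Step 1 (the COUNTIFS(...) > 0 part): products with at least one date in range.
products_in_range = {
    product for product, d, _ in rows if d is not None and start <= d <= end
}

# Step 2 (the SUMPRODUCT part): add up the quantities of those products.
total = sum(quantity for product, _, quantity in rows if product in products_in_range)

print(sorted(products_in_range), total)
```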

Get count of days from last date over time? (non-VBA)

I have two columns like so:
Item | Date
Item1 | 1/1/20
Item2 | 1/2/20
Item1 | 1/3/20
Item2 | 1/4/20
Item1 | 1/6/20
Item2 | 1/8/20
I want to be able to get a count of days passed since any item showed from its last date, like so:
Item | Date | Days passed
Item1 | 1/1/20 | 0
Item2 | 1/2/20 | 0
Item1 | 1/3/20 | 2
Item2 | 1/4/20 | 2
Item1 | 1/6/20 | 3
Item2 | 1/8/20 | 4
Any ideas that are a non-VBA solution?

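Assuming the data starts in row 4 and the formula is entered in row 10, LOOKUP(2,1/(...)) returns the most recent earlier date for the same item, which is then subtracted from the current date. For the first occurrence of an item there is no earlier date and the formula returns an error, so wrap it in IFERROR(...,0) if you want 0 there:
=B10-LOOKUP(2,1/($A$4:A9=A10),$B$4:B9)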
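
For comparison only (the question asks for a worksheet formula), the same "days since this item last appeared" calculation can be sketched in pandas. This assumes the dates are month/day/year and that the rows are already in date order, as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Item": ["Item1", "Item2", "Item1", "Item2", "Item1", "Item2"],
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03",
             "2020-01-04", "2020-01-06", "2020-01-08"]
        ),
    }
)

# Difference to the previous date of the same item; the first occurrence
# has no previous date, so it is filled with 0.
df["Days passed"] = (
    df.groupby("Item")["Date"].diff().dt.days.fillna(0).astype(int)
)

print(df)
```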

Looking to create weighted average of partitioned columns in Excel

Horrible title, but I couldn't find a way to describe what I'm trying to do concisely. This question was posed to me by a friend, and I'm usually competent in Excel, but in this case I am totally stumped.
Suppose I have the following data:
| A | B | C | D | E | F | G | H |
---------------------------------------------------------------------
1 | 0.50 | 0.50 | 1 | | | 0.30 | 0.30 | |
2 | 0.25 | 0.75 | 2 | | | 0.40 | 0.70 | |
3 | 1.00 | 1.75 | 8 | | | 0.30 | 1.00 | |
4 | 0.75 | 2.50 | 2 | | | 0.50 | 1.50 | |
5 | 1.25 | 3.75 | 3 | | | 1.75 | 3.25 | |
6 | 0.50 | 4.25 | 1 | | | 0.25 | 3.50 | |
7 | 1.00 | 5.25 | 0 | | | 0.50 | 4.00 | |
8 | 0.25 | 5.50 | 2 | | | 0.30 | 4.30 | |
9 | 0.25 | 5.75 | 9 | | | 0.25 | 4.55 | |
10 | 0.75 | 6.50 | 4 | | | 0.70 | 5.25 | |
11 | | | | | | 1.00 | 6.25 | |
12 | | | | | | 0.25 | 0.25 | |
Column A represents the distance traveled while the measurement in column C was collected. Column B represents the total distance traveled so far. So C1 represents some value produced during the process from distance 0 to 0.5. C2 represents the value from distance 0.5 to 0.75, and C3 represents the value from 0.75 to 1.75, etc.
Column F represents a PLANNED second iteration of the same process, but with different measurement intervals. What I need is a way to PREDICT column H, based on a WEIGHTED AVERAGE of values from column C, weighted by where the intervals in column F intersect with the intervals in column A. For example, since F2 represents the measurement taken from distance 0.30 to 0.70 (an interval of 0.4, split 50/50 across the measurements in C1 and C2), H2 would be equal to C1*0.5 + C2*0.5 = 1.5.
Another example: H3 represents the expected measurement from an interval between 0.7 and 1.0, which is split between C2 (from 0.7 to 0.75 = 0.05) and C3 (from 0.75 to 1.0 = 0.25). So H3 = 16.6%*C2 + 83.3%*C3 = 0.332+6.664 = 6.996.
I'm looking for a way to do this in an Excel spreadsheet without using VBA or breaking it down into something like a Python script to process externally, but so far I'm not finding any way to do it.
Any ideas for accomplishing this entirely within Excel, without any special add-ins or scripts installed?

It's not pretty, but I think the following should work for all except H1 (which would need an added zero row):
=(MAX(0,INDEX(B:B,MATCH(G2,B:B,1))-G1)*INDEX(C:C,MATCH(G2,B:B,1)) +
(G2-INDEX(B:B,MATCH(G2,B:B,1)))*INDEX(C:C,MATCH(G2,B:B,1)+1)) /
MAX(G2-G1,G2-INDEX(B:B,MATCH(G2,B:B,1)))
It matches the values in B and C and weights them accordingly.
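
As a way to check worksheet results against the examples in the question, the overlap weighting can be sketched in Python. This is not the formula above, just an illustration of the weighted average the question describes; the function name is made up, and it assumes the planned interval lies entirely inside the measured range. It also covers planned intervals that span more than two of the original intervals.

```python
def overlap_weighted_average(edges, values, start, end):
    """Average of `values` over [start, end], weighting each source interval
    (edges[i], edges[i+1]) by the length of its overlap with [start, end]."""
    total = 0.0
    for lo, hi, value in zip(edges[:-1], edges[1:], values):
        overlap = max(0.0, min(hi, end) - max(lo, start))
        total += overlap * value
    return total / (end - start)


# Column B with a leading 0 (interval boundaries) and column C (measurements).
b_edges = [0, 0.50, 0.75, 1.75, 2.50, 3.75, 4.25, 5.25, 5.50, 5.75, 6.50]
c_values = [1, 2, 8, 2, 3, 1, 0, 2, 9, 4]

h2 = overlap_weighted_average(b_edges, c_values, 0.30, 0.70)  # C1 and C2, 50/50
h3 = overlap_weighted_average(b_edges, c_values, 0.70, 1.00)  # 1/6 of C2, 5/6 of C3

print(round(h2, 3), round(h3, 3))
```

The value for H3 comes out at about 7.0; the 6.996 in the question results from rounding the weights to 16.6% and 83.3%.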

powerpivot using a calculated value in another calculation

I have the following tables
Orders:
OrderID|Cost|Quarter|User
-------------------------
1 | 10 | 1 | 1
2 | 15 | 1 | 2
3 | 3 | 2 | 1
4 | 5 | 3 | 3
5 | 8 | 4 | 2
6 | 9 | 2 | 3
7 | 6 | 3 | 3
Goals:
UserID|Goal|Quarter
-------------------
1 | 20 | 1
1 | 15 | 2
2 | 12 | 2
2 | 15 | 3
3 | 5 | 3
3 | 7 | 4
Users:
UserID|Name
-----------
1 | John
2 | Bob
3 | Homer
What I'm trying to do is sum up all orders that one user had and divide that by the sum of his goals; then sum up all orders, divide the result by the sum of all goals, and add this overall result to the per-user result for every user.
The result should be:
UserID|Name |Goal|CostSum|Percentage|Sum all
---------------------------------------------------
1 |John | 35 | 13 | 0.37 |
2 |Bob | 27 | 23 | 0.85 |
3 |Homer| 12 | 20 | 1.67 |
The calculation is as follows:
CostSum: 10+3=13
Goal: 20+15=35
Percentage: CostSum/Goal=13/35=0.37
Sum all: 10+15+3+5+8+9+6=56
Goal all: 20+15+12+15+5+7=74
percentage all= Sum_all/Goal_all=56/74=0.76
Result: percentage+percentage_all=0.37+0.76=1.13 for John
1.61 for Bob
2.43 for Homer
My main problem is the last step. I can't get it to add the overall percentage. It always applies the row filter to that part too, which makes the result wrong.

To do this you're going to need to create some measures.
(I will assume you've already set your pivot table to be in tabular layout with subtotals switched off - this allows you to set UserID and Name next to each other in the row labels section.)
This is what our output will look like.
First let's be sure you've set up your relationships correctly - it should be like this:
I believe you already have the first 5 columns set up in your pivot table, so we need to create measures for CostSumAll, GoalSumAll, PercentageAll and Result.
The key to making this work is to ensure PowerPivot ignores the row label filter for your CostSumAll and GoalSumAll measures. The ALL() function acts as an override filter when used in CALCULATE() - you just have to specify which filters you want to ignore. In this case, UserID and Name.
CostSumAll:
=CALCULATE(SUM(Orders[Cost]),ALL(Users[UserID]),ALL(Users[Name]))
GoalSumAll:
=CALCULATE(SUM(Goals[Goal]),ALL(Users[UserID]),ALL(Users[Name]))
PercentageAll:
=Orders[CostSumAll]/Orders[GoalSumAll]
Result:
=Orders[Percentage]+Orders[PercentageAll]
Download - Example file available for download here. (Don't actually read it in Google Docs - it won't be able to handle the PowerPivot stuff. Save locally to view.)
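
To sanity-check the numbers the measures should produce, the same arithmetic can be reproduced outside PowerPivot. The following Python sketch uses the tables from the question; the variable names are made up for illustration and it is not a substitute for the DAX measures.

```python
# (OrderID, Cost, Quarter, User)
orders = [
    (1, 10, 1, 1), (2, 15, 1, 2), (3, 3, 2, 1), (4, 5, 3, 3),
    (5, 8, 4, 2), (6, 9, 2, 3), (7, 6, 3, 3),
]
# (UserID, Goal, Quarter)
goals = [(1, 20, 1), (1, 15, 2), (2, 12, 2), (2, 15, 3), (3, 5, 3), (3, 7, 4)]
users = {1: "John", 2: "Bob", 3: "Homer"}

# Equivalent of CostSumAll / GoalSumAll: totals that ignore the user filter.
cost_sum_all = sum(cost for _, cost, _, _ in orders)
goal_sum_all = sum(goal for _, goal, _ in goals)
percentage_all = cost_sum_all / goal_sum_all

result = {}
for user_id, name in users.items():
    # Per-user sums, i.e. what the pivot row filter produces.
    cost_sum = sum(cost for _, cost, _, user in orders if user == user_id)
    goal_sum = sum(goal for user, goal, _ in goals if user == user_id)
    result[name] = round(cost_sum / goal_sum + percentage_all, 2)

print(cost_sum_all, goal_sum_all, round(percentage_all, 2), result)
```

Computed without intermediate rounding, Homer comes out at 2.42; the 2.43 in the question is the sum of the already rounded values 1.67 and 0.76.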
