How to convert repeated measures from rows to columns in Excel - excel

I have a data file of approximately 5000 repeated measures organized with rows containing IDs and repeated measures on weight, BMI, etc for children. I would like to find the maximum value of one variable (BMI) for each individual (out of up to 9 records). How can I do a lookup on multiple rows for each ID and return the max of a value for each person?
A very abbreviated example is as follows:
HAVE:
ID Date BMI
1 1 20
1 2 18
1 3 24
2 4 23
2 5 19
2 6 17
3 7 25
3 8 18
3 9 21
WANT
ID Highest BMI Corresponding date
1 24 3
2 23 4
3 25 7
Alternatively if there is a way to do this in SPSS or JMP (I don't have access to SAS now), please let me know.
Thanks!
Melissa

You can do this easily in Excel in two parts:
A PivotTable to extract the maximum BMI for each ID
Matching the maximum BMI per ID to a date
Part 1 - PivotTable
Create a PivotTable with
A Row Label of ID
Values as Max of BMI
Part 2 - matching the date
In the cell to the right of the first BMI maximum, put this formula:
=SUMPRODUCT(--($A$2:$A$10=B14),--($C$2:$C$10=C14),$B$2:$B$10)/SUMPRODUCT(--($A$2:$A$10=B14),--($C$2:$C$10=C14))
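(Make sure you re-map the ranges if yours differ from this example.)
This formula returns the date of the record that matches both the ID and the maximum BMI. If several records for an ID tie for the maximum, it returns the average of their dates.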
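For readers who want to check the expected output outside Excel, here is a minimal Python sketch of the same lookup, using the sample rows from the question. The function name `max_bmi_per_id` is made up for illustration. Note one difference from the formula above: on a tie the sketch keeps the first matching date rather than averaging the dates.

```python
# Sample rows from the question: (ID, Date, BMI)
records = [
    (1, 1, 20), (1, 2, 18), (1, 3, 24),
    (2, 4, 23), (2, 5, 19), (2, 6, 17),
    (3, 7, 25), (3, 8, 18), (3, 9, 21),
]


def max_bmi_per_id(rows):
    """Return {id: (highest BMI, date of the first record with that BMI)}."""
    best = {}
    for id_, date, bmi in rows:
        # Replace the stored record only when a strictly higher BMI appears,
        # so the earliest record wins a tie (hypothetical tie rule).
        if id_ not in best or bmi > best[id_][0]:
            best[id_] = (bmi, date)
    return best


result = max_bmi_per_id(records)
for id_, (bmi, date) in sorted(result.items()):
    print(id_, bmi, date)
```

The printed rows should line up with the WANT table in the question.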

Related

How to combine SUMPRODUCT with an INDEX and MATCH formula?

Note, I have edited my original question to clarify my problem:
As the title suggests, I am looking for a way to combine the SUMPRODUCT functionalities with an INDEX and MATCH formula, but if a better approach exists to help solve the problem below I am also open to it.
In the below example, imagine that the tables are on different sheets. I have a report that has the sales of each ID in the rows and each month in the columns (first table). Unfortunately, the report only has IDs and not the region they belong to, but I do have a look up table which labels each ID with their respective region (second table):
     A     B         C          D
1    ID    January   February   March
2    1     10        5          20
3    3     5         5          10
4    7     0         10         5
5    14    10        25         5
6    25    5         10         10
7    27    10        10         10
8    44    5         5          5
     A     B
1    ID    Region
2    1     East
3    3     East
4    7     Central
5    14    Central
6    25    Central
7    27    West
8    44    West
My goal is to be able to aggregate the sales by region as per the result below. However I would only like to show sales data that belong to the month that is shown in cell D2.
Goal:
     A         B       C    D
1    Region    Sales        February
2    East      10
3    Central   45
4    West      15
I have used the INDEX and MATCH combination to return a single value, but not sure how I can return multiple values with it and aggregate them at the same time. Any insight would be appreciated!
You may just use:
=SUMPRODUCT((Sheet1!B$1:D$1=D$1)*(Sheet1!H$2:H$8=A2),Sheet1!B$2:D$8)
Keep in mind that SUMPRODUCT() can be slow on large data sets, so combining it with INDEX() and MATCH() is not a bad idea. Let's do it the other way around, though, and nest those two inside SUMPRODUCT(), so that only the matching month's column is multiplied:
=SUMPRODUCT(INDEX(Sheet1!B$2:D$8,0,MATCH(D$2,Sheet1!B$1:D$1,0))*(Sheet1!H$2:H$8=A2))
Another option is to use SUMIF with INDEX and MATCH.
In "Sheet2" B2, copied down:
=SUMIF(Sheet1!H:H,A2,INDEX(Sheet1!B$1:D$1,MATCH(D$2,Sheet1!B$1:D$1,0)))
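To make the logic of these formulas easier to follow, here is a rough Python sketch of the same aggregation using the two tables from the question. The names `sales`, `region_of` and `sales_by_region` are illustrative only. Finding the month's position plays the role of MATCH, and adding up the rows that share a region plays the role of the SUMPRODUCT/SUMIF condition.

```python
months = ["January", "February", "March"]

# Report table: ID -> sales per month, in the order of `months`
sales = {
    1: [10, 5, 20],
    3: [5, 5, 10],
    7: [0, 10, 5],
    14: [10, 25, 5],
    25: [5, 10, 10],
    27: [10, 10, 10],
    44: [5, 5, 5],
}

# Lookup table: ID -> region
region_of = {
    1: "East", 3: "East",
    7: "Central", 14: "Central", 25: "Central",
    27: "West", 44: "West",
}


def sales_by_region(month):
    """Sum the chosen month's sales for every region."""
    col = months.index(month)  # position of the month, like MATCH
    totals = {}
    for id_, row in sales.items():
        region = region_of[id_]
        totals[region] = totals.get(region, 0) + row[col]
    return totals


print(sales_by_region("February"))
```

For February this should reproduce the figures in the Goal table.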

Excel ranking based on grouping priorities

Hi everyone, I have an Excel question about ranking: first by an overall rank, and then, as a second priority, by group. The formula is written in column 'Final_Rank', and I hid a bunch of rows to show a clear example. The column Rank is just a normal RANK function. I want Rank to take priority first, but then the next ranks should go to the remaining items of the same group. So if you look at group HYP, its other items take ranks 3 and 4, and then 5 is given to the next group.
I hope this is a clear explanation, thanks.
Group Rank Final_Rank_Manual
TAM 1 1
HYP 2 2
GAB 3 5
HYO 4 8
ALO 5 9
HYP 7 3
ACO 8 12
IBU 9 13
ACO 11 14
ALO 18 10
GAB 44 6
IBU 53 15
IBU 123 16
GAB 167 7
HYP 199 4
You can do this with an extra helper column. Assuming your table currently occupies columns A-C, with one header row, put the following in C2:
=SMALL(IF($A$2:$A$6=A2,$B$2:$B$6,9999999999),1)+(B2*0.000000001)
You'll need to enter this as an array formula by using Ctrl+Shift+Enter. Copy it down throughout the whole column, extending the $A$2:$A$6 and $B$2:$B$6 ranges so that they cover all of your rows. This gives you the group's ranking, and it adds a tiny decimal indicating the individual value's position within each group (e.g. the third "HYP" value is converted to something like 2.000000199, because out of all the available values the second lowest belongs to "HYP", and this specific "HYP" value is 199).
Next, enter the following in D2 and copy it down throughout the column:
=RANK(C2,$C$2:$C$6,1)
This will give you the "Final" rankings. There won't be any ties because of the tiny decimals we added in the previous formula. The results end up looking just like your sample.
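The idea behind the helper column can be sketched in Python as follows. This is only an illustration of the technique: it sorts on the pair (best rank in the group, own rank) instead of using the decimal trick, and `final_ranks` is a made-up name. It uses only the visible rows from the question, so positions that depend on the hidden rows need not match the manual column.

```python
# Visible sample rows from the question: (Group, Rank)
rows = [
    ("TAM", 1), ("HYP", 2), ("GAB", 3), ("HYO", 4), ("ALO", 5),
    ("HYP", 7), ("ACO", 8), ("IBU", 9), ("ACO", 11), ("ALO", 18),
    ("GAB", 44), ("IBU", 53), ("IBU", 123), ("GAB", 167), ("HYP", 199),
]


def final_ranks(rows):
    """Rank rows by their group's best rank first, then by their own rank."""
    # Best (lowest) rank per group, like SMALL(IF(group matches, rank), 1)
    group_min = {}
    for group, rank in rows:
        group_min[group] = min(rank, group_min.get(group, rank))

    # Order row indexes by (group's best rank, own rank)
    order = sorted(
        range(len(rows)),
        key=lambda i: (group_min[rows[i][0]], rows[i][1]),
    )

    # Convert the sorted order into a 1-based rank for each original row
    final = [0] * len(rows)
    for position, i in enumerate(order, start=1):
        final[i] = position
    return final


for (group, rank), final in zip(rows, final_ranks(rows)):
    print(group, rank, final)
```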

SUMPRODUCT INDEX MATCH

With the following data:
     A          B          C
1    CUMULATIVE PERCENTAGE OF ITEMS PRODUCED PER MONTH ("COMPLETION_TABLE")
2    Type       Month 1    Month 2
3    KITTENS    0          10
4    FISH       0          20
5    BANANAS    2          5
6    APPLES     0          0
7    PEARS      0          5
8    KITTENS    0          5
9
10
11   PRICES TABLE ("PRICES_TABLE")
12   Type       Value
13   APPLES     1000
14   BANANAS    5000
15   PEARS      3000
16   FISH       4000
17   KITTENS    2000
I'm attempting to use the SUMPRODUCT function to calculate the percentage change in each month and use that value as a multiple of the prices table to provide a total price per month across all types that have been produced.
I can calculate the movement as:
=SUMPRODUCT((COMPLETION_TABLE[Month 2]-COMPLETION_TABLE[Month 1]))
... but I then need to calculate the portion of the individual movement values against the price for that type and sum the resulting products together. I have been using various INDEX / MATCH combinations without much luck.
As an example: BANANAS which should =(5-2)*5000.
Written as expanded arrays I would like to do
({10;20;5;0;5;5}-{0;0;2;0;0;0})*{2000;4000;5000;1000;3000;2000}.
Use of SUMPRODUCT implies you want a single figure result. You can use SUMIF as a "pseudo lookup" within SUMPRODUCT to get the prices, e.g.
=SUMPRODUCT(C3:C8-B3:B8,SUMIF(A13:A17,A3:A8,B13:B17))
That would get you a result of 140,000 for your example
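As a cross-check of that figure, the same calculation can be written as a short Python sketch over the sample tables. The names are illustrative, and a missing type is priced at 0, which mirrors what SUMIF returns when nothing matches.

```python
# Completion table rows: (Type, Month 1, Month 2)
completion = [
    ("KITTENS", 0, 10),
    ("FISH", 0, 20),
    ("BANANAS", 2, 5),
    ("APPLES", 0, 0),
    ("PEARS", 0, 5),
    ("KITTENS", 0, 5),
]

# Prices table: Type -> Value
prices = {
    "APPLES": 1000,
    "BANANAS": 5000,
    "PEARS": 3000,
    "FISH": 4000,
    "KITTENS": 2000,
}


def total_value(completion, prices):
    """Sum (Month 2 - Month 1) * price over every completion row."""
    return sum(
        (month2 - month1) * prices.get(type_, 0)  # 0 if the type has no price
        for type_, month1, month2 in completion
    )


print(total_value(completion, prices))
```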
From your question I understand the result you want is an array. This is what you get with this formula:
=INDEX($B$13:$B$17,MATCH($A3:$A8,$A$13:$A$17,0))*($C3:$C8-$B3:$B8)
entered as an array formula using Ctrl+Shift+Enter.
I am sure there is something simpler. I am assuming that the Types in the Prices Table are unique:
{=(SUM((A13=$A$3:$A$8)*$C$3:$C$8)-SUM((A13=$A$3:$A$8)*$B$3:$B$8))*SUM((A13=$A$13:$A$17)*$B$13:$B$17)}

cognos: Pick up the initial value for every row of a crosstab

I have a requirement in which I have to pick up the initial value of each row in a crosstab.
My crosstab looks like this:
value 1960 1970 2010 2011
aus 10 5 11 6
eng 5 2
bra 11 4
ind 8 11
I have to add another column that picks up the initial value for every row based on the year,
so the result should look like this:
value 1960 1970 2010 2011 initialValue
aus 10 5 11 6 10
eng 5 2 5
bra 11 4 11
ind 8 11 8
You should be able to use the minimum() function to determine the lowest value for year and then return the value corresponding to that. The expression for the initialValue data item would be something like:
total(
  CASE
    WHEN [Year] = minimum([Year] for [Language])
    THEN [Value]
    ELSE 0
  END
for [Language])
We get the lowest year for the specific language in the data set using the minimum() function using the for clause to define the aggregation level. If the year of the row matches this number, we output the value, otherwise we output 0. We then total everything up for each language which should give us the value for the lowest year.
This solution assumes that the numbers displayed in your crosstab are totals of lower-level row detail. If the aggregate is something different, such as average or count, the wrapping summary function should be changed accordingly.
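The logic of that expression can be mimicked in a few lines of Python, purely as an illustration; this is not Cognos syntax. The "aus" row comes from the question. The year placement for "eng" is hypothetical, because the blank cells of the crosstab did not survive in the question text.

```python
# (row key, year, value); the years for "eng" are an assumption
rows = [
    ("aus", 1960, 10), ("aus", 1970, 5), ("aus", 2010, 11), ("aus", 2011, 6),
    ("eng", 1970, 5), ("eng", 2010, 2),
]


def initial_values(rows):
    """Total each key's values, counting only rows from its earliest year."""
    # Equivalent of minimum([Year] for [Language])
    first_year = {}
    for key, year, _ in rows:
        first_year[key] = min(year, first_year.get(key, year))

    # Equivalent of total(CASE ... END for [Language])
    totals = {}
    for key, year, value in rows:
        contribution = value if year == first_year[key] else 0
        totals[key] = totals.get(key, 0) + contribution
    return totals


print(initial_values(rows))
```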

Extracting the upper quartile data from an array and placing it in new column

Hi and thanks for any help with this in advance
Below is a hypothetical data set; abundance = count data; mud% = the mud content in which the animals were found; mud bin = bins I've made up depending on the mud%; and UQ = upper quartile of the abundance data from its corresponding mud bin (i.e. the upper quartile for the abundance data in mud bin 1 is 17.25 etc).
Problem:
In Excel, for the abundance data in each of the four mud bins, I want to extract any values in the abundance column that are >= the upper quartile value for that particular mud bin and place these in a new column on the same sheet (with no gaps between rows from values that didn't meet the criteria), along with their corresponding mud% value in the neighboring cell. I've added the new columns to the sheet below to give you an idea of what I'm after.
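abundance | mud%  | mud bin | UQ    |  | New column | Mud%
18        | 10.9  | 1       |       |  | 18         | 10.9 (mud bin 1)
15        | 6.5   | 1       |       |  | 44         | 38.9 (mud bin 1)
6         | 13.4  | 1       |       |  | 45         | 38 (mud bin 2)
13        | 42.1  | 1       |       |  | 37         | 37.8 (mud bin 2)
15        | 36.4  | 1       |       |  | etc        |
44        | 38.9  | 1       | 17.25 |  | etc        |
22        | 46    | 2       |       |  |            |
30        | 36.4  | 2       |       |  |            |
45        | 38    | 2       |       |  |            |
29        | 35.3  | 2       |       |  |            |
37        | 37.8  | 2       |       |  |            |
29        | 41.8  | 2       | 35.25 |  |            |
11        | 44.4  | 3       |       |  |            |
17        | 47.8  | 3       |       |  |            |
21        | 40.7  | 3       |       |  |            |
15        | 13.9  | 3       |       |  |            |
35        | 13.9  | 3       |       |  |            |
14        | 13.9  | 3       |       |  |            |
15        | 13.9  | 3       | 19    |  |            |
19        | 12    | 4       |       |  |            |
14        | 12    | 4       |       |  |            |
10        | 12    | 4       |       |  |            |
12        | 12    | 4       |       |  |            |
14        | 12    | 4       |       |  |            |
13        | 12    | 4       |       |  |            |
45        | 9.525 | 4       |       |  |            |
66        | 9.525 | 4       |       |  |            |
78        | 9.525 | 4       | 45    |  |            |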
The reality is I have a rather large dataset containing abundance data for a number of species, all on the same Excel sheet, and would greatly appreciate any insight into how I might achieve this in the most efficient manner.
For starters, to make this explanation simpler, I will assume that the last row of data is in row 100.
Populate Upper Quartile values for all line items
First you'll need to use the QUARTILE function; however, since you want to find the upper quartile within a bin, you'll have to use an array formula. Put this formula in your UQ column (place it in cell D2 and drag down). When entering the formula, be sure to press Ctrl+Shift+Enter rather than just Enter:
=QUARTILE(IF($C$2:$C$100=C2,$A$2:$A$100,""),3)
The first part of this formula, $C$2:$C$100=C2, is your condition. Everywhere this condition is met, you will get the corresponding value in $A$2:$A$100; otherwise, you'll get a blank value. This gives you an array of the abundance values that match the indicated mudbin, C2. Now that you have your subset of data, the QUARTILE function returns the third-quartile value (17.25 for mudbin 1, which will be placed next to every row that has a mudbin of 1).
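To see where figures such as 17.25 come from, here is a small Python sketch that computes the third quartile per bin by linear interpolation between the sorted values. It assumes this is the method QUARTILE applies, which is consistent with the UQ values given in the question. The names `bins` and `quartile3` are illustrative.

```python
# Abundance values from the question, grouped by mud bin
bins = {
    1: [18, 15, 6, 13, 15, 44],
    2: [22, 30, 45, 29, 37, 29],
    3: [11, 17, 21, 15, 35, 14, 15],
    4: [19, 14, 10, 12, 14, 13, 45, 66, 78],
}


def quartile3(values):
    """Third quartile, interpolating linearly between sorted values."""
    ordered = sorted(values)
    position = 0.75 * (len(ordered) - 1)  # fractional index into the sorted list
    lower = int(position)
    fraction = position - lower
    if lower + 1 < len(ordered):
        step = ordered[lower + 1] - ordered[lower]
        return ordered[lower] + fraction * step
    return ordered[lower]


upper_quartiles = {bin_: quartile3(values) for bin_, values in bins.items()}
print(upper_quartiles)
```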
Now that we have all the quartiles, we can get all the abundance values that are greater than or equal to the UQ for their mudbin. This is done in two parts.
Get abundance values greater than or equal to the mudbin UQ
First, select one column of cells that has the same number of rows as your data (for example, cells F2:F100).
Enter the following formula into the formula bar (while F2:F100 is highlighted) and press Ctrl+Shift+Enter:
=IF($A$2:$A$100>=$D$2:$D$100,$A$2:$A$100,"")
Similar to the IF statement used before, this formula finds all the abundance values that are greater than or equal to their corresponding UQ value. Column F will now have an abundance number where the value meets its UQ, and a blank where it does not. Now on to the final step.
Populate abundance values that meet the UQ value, without the blanks
Select G2:G100 (your "New Column" in your sample data)
Enter the following formula into the formula bar (while G2:G100 is highlighted) and press Ctrl+Shift+Enter:
=INDEX(F2:F100,SMALL(IF(F2:F100<>"",ROW(F2:F100)-1),ROW()-ROW($F$1)))
Looking at the IF statement again, this finds every value in F2:F100 that is not blank, but instead of grabbing the values, it keeps track of the row number of each non-blank value (done by ROW(F2:F100)-1). Now that we have the row numbers of all the non-blank values, we can grab the non-blank values in order and populate them in G2:G100. ROW()-ROW($F$1) is a counter, and SMALL uses the counter to determine the nth smallest number to return. Once we have the row number of the non-blank value, INDEX returns that value.
Finally, to populate the Mud%, you'll need to use the row number of the non blank values to get the mud% and the mud bin (You have the formula already to get the row number of the non blank value).
It's not a simple answer, but at least you won't have to use VBA.
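For comparison, the whole filter-and-compact step can be sketched in a few lines of Python. This is only an illustration using the first two mud bins and the UQ values from the question, and it applies the >= condition the question asks for. The names `rows`, `uq` and `selected` are hypothetical.

```python
# (abundance, mud%, mud bin) for the first two bins of the sample data
rows = [
    (18, 10.9, 1), (15, 6.5, 1), (6, 13.4, 1),
    (13, 42.1, 1), (15, 36.4, 1), (44, 38.9, 1),
    (22, 46, 2), (30, 36.4, 2), (45, 38, 2),
    (29, 35.3, 2), (37, 37.8, 2), (29, 41.8, 2),
]

# Upper quartile per mud bin, taken from the question
uq = {1: 17.25, 2: 35.25}

# Keep rows at or above their bin's UQ; the list has no gaps by construction
selected = [
    (abundance, mud)
    for abundance, mud, mud_bin in rows
    if abundance >= uq[mud_bin]
]

for abundance, mud in selected:
    print(abundance, mud)
```

The output should match the "New column" and "Mud%" pairs shown in the question's sample.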
