Matrix with boolean values from a list of paired observations - excel

In the below spreadsheet, the cell values represent an ID for a person. The person in column A likes the person in column B, but it may not be mutual. So, in the first row with data, person 1 likes 2. In the second row with data person 1 likes 3.
A B
1 2
1 3
2 1
2 4
3 4
4 1
I'm looking for a way to build a 4 x 4 matrix with an entry of 1 in (i,j) to indicate that person i likes person j and an entry of 0 to indicate that they don't. The example above should look like this after performing the task:
  1 2 3 4
1 0 1 1 0
2 1 0 0 1
3 0 0 0 1
4 1 0 0 0
So, reading the first row of the matrix we would interpret it like this: person 1 does not like person 1 (cell value = 0), person 1 likes person 2 (cell value = 1), person 1 likes person 3 (cell value =1), person 1 does not like person 4 (cell value = 0)
Note that the order of the pairing matters, so [4 2] does not equal [2 4].
How could this be done?

Assuming your existing data is in A1:B6, then in A10 enter:
=COUNTIFS($A$1:$A$6, ROW()-9,$B$1:$B$6, COLUMN())
This returns 1 or 0 depending on whether person 1 likes person 1. They don't, so you get 0. In A10, ROW()-9 evaluates to 1 and COLUMN() evaluates to 1, and those two values are the pair COUNTIFS() looks for.
Copy this formula across 4 columns and down 4 rows. In each cell, ROW()-9 and COLUMN() supply the appropriate person IDs to COUNTIFS(), which looks for the matching pair.
Personally, if this were something I had to do and my matrix was of indeterminate size, I would probably put these formulas on a second tab, starting at A1, and use ROW() without having to adjust it by 9. But for a one-off on the same tab, to help check the results, the above is fine.
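For anyone who wants to cross-check the COUNTIFS() result outside Excel, here is a minimal Python sketch of the same idea: count how often each ordered pair (i, j) appears in the list. The names `pairs`, `like_matrix` and `matrix` are made up for illustration and are not part of the answer above.

```python
# Ordered (liker, liked) pairs taken from columns A and B of the question.
pairs = [(1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (4, 1)]


def like_matrix(pairs, n):
    """Return an n x n list of lists; entry [i-1][j-1] counts the pair (i, j)."""
    matrix = [[0] * n for _ in range(n)]
    for liker, liked in pairs:
        # Same role as COUNTIFS matching column A and column B at once.
        matrix[liker - 1][liked - 1] += 1
    return matrix


matrix = like_matrix(pairs, 4)
for row in matrix:
    print(*row)
```

As with COUNTIFS(), a pair that is listed twice would produce a 2 rather than a 1, so the result is only strictly boolean when the pairs are unique.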

Related

Excel - assign values based on the first unique item

I have got an excel question that I can not answer. Here is my table:
ID Key  Count Unique  Available  Text    Results
1       0                        Text-1  Dupe-Y
2       1             Y          Text-1  Y
3       0                        Text-1  Dupe-Y
4       0                        Text-1  Dupe-Y
5       1             N          Text-2  N
6       1             Y          Text-3  Y
7       0                        Text-2  Dupe-N
8       0             Duplicate  Text-2  Dupe-N
9       0             Duplicate  Text-2  Dupe-N
10      0             Y          Text-2  Dupe-N
ID Key is just a unique key.
Count Unique picks up the first time each value in column Text appears. Available can be Y, N or Duplicate, and Text is the main column I need to analyze my table. Results works as follows: for the first time each value in Text appears (Count Unique = 1), if there is a value in Available, then that is the value I need; if Count Unique is 0, the result is either Dupe-Y or Dupe-N, depending on the value in Available.
I tried a formula like this one but got stuck after some initial progress: =IF(B2=0,"",IFERROR(IF(COUNTIF(D:D,D2)>1,IF(COUNTIF($D:$D,D2)=1,"",C2),1),1))
Note that the column Results is the one I need to populate with a formula that is not affected by sorting or lack of it.
I guess you got all those values and you just need a formula for column Results.
My formula will work only if the data is sorted as in your example. If the sort order changes, the formula will fail.
My formula is:
=IF(B2=1;D2;"Dupe-"&RIGHT(G1;1))
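Since the question asks for a result that does not depend on sorting, here is a hedged Python sketch of one way to get that. It swaps the previous-row reference in the formula above for a lookup keyed on Text. It assumes every Text value has exactly one row with Count Unique = 1, and that the Dupe- suffix comes from the Available value of that row, which is how the sample table reads. The names `rows` and `results` are illustrative only.

```python
# Rows as (id_key, count_unique, available, text); blank cells are empty strings.
rows = [
    (1, 0, "", "Text-1"),
    (2, 1, "Y", "Text-1"),
    (3, 0, "", "Text-1"),
    (4, 0, "", "Text-1"),
    (5, 1, "N", "Text-2"),
    (6, 1, "Y", "Text-3"),
    (7, 0, "", "Text-2"),
    (8, 0, "Duplicate", "Text-2"),
    (9, 0, "Duplicate", "Text-2"),
    (10, 0, "Y", "Text-2"),
]


def results(rows):
    # Pass 1: remember the Available value of the flagged row for each Text.
    flagged = {text: avail for _, count, avail, text in rows if count == 1}
    # Pass 2: flagged rows keep their own value, all others become "Dupe-<value>".
    return [
        avail if count == 1 else "Dupe-" + flagged[text]
        for _, count, avail, text in rows
    ]


for row, result in zip(rows, results(rows)):
    print(row[0], result)
```

Because the lookup is built from the whole table before any row is labelled, reordering the rows does not change the label each ID Key receives.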

Sum values of one column depending on the ID of another

I have a table with many values as such (this is an oversimplified example):
IDx Namex Pricex
1 a 5
2 b 2
1 a2 5
3 c 3
2 b2 9
and another table with only the ID, in which I'd like to add a column that shows the addition of all the values that match that ID, in this example:
IDy Totaly
1 10
2 11
3 3
I'm guessing this is a combination of vlookup with sum or sumif, I've tried so far:
=SUM(VLOOKUP(IDy1,$IDx$1:$IDx$5,$Pricex$1:$Pricex$5),// don't know how to proceed here
Try this, where B5:B9 holds IDx, D5:D9 holds Pricex, and B16 holds the IDy you want the total for:
=SUMIF(B$5:B$9,B16,D$5:D$9)
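As a rough cross-check of what SUMIF() does here, the following Python sketch adds up Pricex for each IDx. The names `rows`, `sum_by_id` and `totals` are illustrative and not from the original post.

```python
from collections import defaultdict

# (IDx, Namex, Pricex) rows from the question's first table.
rows = [(1, "a", 5), (2, "b", 2), (1, "a2", 5), (3, "c", 3), (2, "b2", 9)]


def sum_by_id(rows):
    """Total the price of every row sharing the same ID, like SUMIF per ID."""
    totals = defaultdict(int)
    for id_, _name, price in rows:
        totals[id_] += price
    return dict(totals)


totals = sum_by_id(rows)
print(totals)
```

The totals agree with the Totaly column in the question: 10, 11 and 3.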

excel extendable formula for conditional sums over multiple columns

I have an arbitrary number of columns, one for each period a course is offered, in chronological order, and an arbitrary number of rows, one for each unique participant. The values are '1' for participation in that period, '0' for non-participation.
Fall2019 Spring2019 Fall2018 Spring2018 Fall2017
1 1 0 1 0
0 1 1 1 1
0 1 1 1 1
0 1 1 1 1
0 1 0 1 1
0 1 1 0 0
0 1 1 0 0
0 1 1 0 0
1 0 0 0 0
I would like to take a sum, at the bottom of each column, for how many participants were first time attendees that period, i.e. the sum of '1's where all values in the row to the right of that '1', are '0'.
In the given example set, Spring2018 should sum to 1, Fall2018 should sum to 3.
Something like the formula below will work for 'Spring2018' when there is just one previous column to compare:
=SUMPRODUCT((D2:D9)*(E2:E9=0))
But this formula cannot be 'autofilled' or extended across multiple columns... i.e. none of these variations work:
=SUMPRODUCT((C2:C9)*(D2:$E9=0))
=SUMPRODUCT((C2:C9)*(SUM(D2:$E9)=0))
=SUMPRODUCT((C2:C9)*(SUMIF(D2:$E9,"0")))
And while it will work, I do NOT want to have to manually create extended versions of this formula e.g.
=SUMPRODUCT((C2:C9)*(D2:D9+E2:E9=0))
=SUMPRODUCT((B2:B9)*(C2:C9+D2:D9+E2:E9=0))
... and so on
I have tried several variations on arrayformula, sumproduct, and sumif, but I'm really stuck. Any assistance is appreciated.
Use this array formula:
=SUM(A2:A10*(MMULT(--(B$2:$E$10=1),TRANSPOSE(COLUMN(B$2:$E$10)^0))=0))
Being an array formula it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
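The formula works because COLUMN(...)^0 yields a row of ones, TRANSPOSE turns it into a column, and MMULT then returns one row sum per participant for the columns to the right. A minimal Python sketch of that logic follows, using the question's sample data. The names `periods`, `attendance` and `first_time_counts` are illustrative. Treating every 1 in the last column as a first-time attendance is an assumption of this sketch, since nothing lies to its right.

```python
# Columns in the same left-to-right order as the question.
periods = ["Fall2019", "Spring2019", "Fall2018", "Spring2018", "Fall2017"]
attendance = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
]


def first_time_counts(attendance):
    n_cols = len(attendance[0])
    counts = [0] * n_cols
    for row in attendance:
        for col in range(n_cols):
            # sum(row[col + 1:]) is the per-row sum that MMULT computes in Excel.
            if row[col] == 1 and sum(row[col + 1:]) == 0:
                counts[col] += 1
    return counts


counts = first_time_counts(attendance)
for period, count in zip(periods, counts):
    print(period, count)
```

For the sample data this gives 1 for Spring2018 and 3 for Fall2018, matching the values the question expects.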

Pandas groupby value and return observation count to dataset

I have a dataset like the following:
id value
a 0
a 0
a 0
a 0
a 1
a 2
a 2
a 2
b 0
b 0
b 1
b 2
b 2
I want to groupby the "id" column and grab the number of observations in the "value" column, and return a new column in the original dataset that counts the number of times the "value" observation occurs within each id.
An example of the output I'm looking for is represented in column "output":
id value output
a 0 4
a 0 4
a 0 4
a 0 4
a 1 1
a 2 3
a 2 3
a 2 3
b 0 2
b 0 2
b 1 1
b 2 2
b 2 2
When grouping on id "a", there are 4 observations of 0, which is provided in the column "output" for each row that contains id of "a" and value of 0.
I have tried applications of groupby and apply, to no avail. Any suggestions would be very helpful. Thank you.
Update: I figured out a solution for anyone who also faces this problem, and it works well.
grouped = df.groupby(['id','value'])
df['output'] = grouped['value'].transform('count')
This will return the count of observations under each bucket and return that count to each observation that meets that criteria, as shown in the "output" column above.
Group by id and value, then count:
data.groupby(['id' , 'value'])['id'].transform('count')
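To sanity-check what transform('count') should return, here is a small pandas-free sketch using only the standard library. It counts each (id, value) pair and writes that count back to every row. The names `ids`, `values`, `pair_counts` and `output` are illustrative.

```python
from collections import Counter

# The sample dataset from the question, one list per column.
ids = ["a"] * 8 + ["b"] * 5
values = [0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 1, 2, 2]

# Count how many times each (id, value) pair occurs.
pair_counts = Counter(zip(ids, values))

# Broadcast the pair's count back to every row, like transform('count').
output = [pair_counts[(i, v)] for i, v in zip(ids, values)]

for i, v, n in zip(ids, values, output):
    print(i, v, n)
```

The printed third column should agree with the "output" column shown in the question.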

Indexing questions with array formulas in Excel

I'm new to array formulas in Excel and my brain has been trained in R for too long, so I'm sorry if this question is simple or too specific. I have data that looks like this:
ID Iteration Value Group1 Group2
2 1 100 0 0
2 2 85 1 0
2 3 28 0 0
3 1 94 1 0
5 1 83 0 1
5 2 50 1 1
6 1 94 0 0
6 2 28 1 0
I want to use array formulas to query the data in a few different ways. I want to:
1. For each ID, find the first iteration that has Group1 = 1.
2. For each ID, find the maximum value when Group1 = 1.
3. For each ID, find how many iterations of Group1 = 1 it took to get to the maximum value when Group1 = 1.
I've figured out how to specify the maximum for each ID via: {=MAX(IF(A:A=A2,C:C))}
Any assistance would be appreciated. I've gone through a few quick tutorials so far, and I'm willing to look through any other good resources you may know of.
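Have a look at this and tell me what you think, particularly for question 3.
My setup looks like this (the screenshot is not reproduced here; judging from the formulas, the data occupies A2:E9 and the IDs being queried sit in A14, F14 and J14):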
All formulas drag down and they are as follows:
Formula in B14 (Question 1)
{=INDEX($B$2:$B$9,MATCH(1,($A$2:$A$9=A14)*($D$2:$D$9=1),0))}
Formula in G14 (Question 2)
{=MAX(($A$2:$A$9=$F14)*($D$2:$D$9=1)*$C$2:$C$9)}
Formula in K14 (Question 3)
{=SUM(($A$2:$A$9=J14)*($C$2:$C$9=G14)*$B$2:$B$9)}
Update
If you want to know how many times Group1 = 1 occurred for ID = 2 before reaching the maximum found for ID = 2 in question 2, then I'd proceed like this:
Add another column to your data (I labeled it Group1 Passes). Place this in the new column, starting at F2, and drag down:
=COUNTIFS($A$2:A2,A2,$D$2:D2,1)
You can then use the following in K14
=SUM(($A$2:$A$9=J14)*($C$2:$C$9=G14)*($D$2:$D$9=1)*$F$2:$F$9)
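For readers coming from R or Python, the three queries can be sketched as follows. This is a minimal illustration and not a translation of the exact formulas: it assumes the rows are in sheet order, and for question 3 it reports the position of the maximum among the Group1 = 1 rows, which is what the running COUNTIFS() column tracks. The names `rows` and `summarize` are illustrative.

```python
# (ID, Iteration, Value, Group1) rows from the question, in sheet order.
rows = [
    (2, 1, 100, 0), (2, 2, 85, 1), (2, 3, 28, 0),
    (3, 1, 94, 1),
    (5, 1, 83, 0), (5, 2, 50, 1),
    (6, 1, 94, 0), (6, 2, 28, 1),
]


def summarize(rows):
    """Map each ID to (first Group1 iteration, max Group1 value, passes to max)."""
    summary = {}
    for id_ in sorted({r[0] for r in rows}):
        hits = [r for r in rows if r[0] == id_ and r[3] == 1]  # Group1 = 1 only
        if not hits:
            continue  # this ID never has Group1 = 1
        first_iter = hits[0][1]              # question 1: first match, like MATCH
        max_value = max(r[2] for r in hits)  # question 2
        # Question 3: 1-based position of the first row reaching the maximum.
        passes = next(i for i, r in enumerate(hits, start=1) if r[2] == max_value)
        summary[id_] = (first_iter, max_value, passes)
    return summary


summary = summarize(rows)
for id_, answer in summary.items():
    print(id_, *answer)
```

In the sample data every ID has exactly one Group1 = 1 row, so question 3 comes out as 1 for each ID.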
