Excel formula: how to build a similarity matrix

I would like to create a formula that will create a similarity matrix from a table of data. Here is an example of the data followed by the desired output. The rows of data represent the result of a modularity class algorithm. I need my output to calculate the number of months two countries share the same class value (It can be any value as long as they are equal) divided by the number of total months.
Input modularity class
jan feb mar apr may
USA 0 1 2 4 3
UK 0 3 2 3 3
AU 0 2 2 2 3
CH 1 0 1 1 2
EG 2 3 0 0 1
Output similarity matrix
USA UK AU CH EG
USA 1
UK 0.6 1
AU 0.6 0.6 1
CH 0 0 0 1
EG 0 0.2 0 0 1
I have not tried any formula because I don't understand where to begin. I have read about COUNTIFS and MMULT, but I don't know which is most appropriate.

Right, there is a way to do this with SUM(), IF() and COUNT() in an array formula.
The basic formula that you could use is (assuming your top array is A1:F6, including the headers, so the country labels are in A2:A6 and the class values are in B2:F6):
{=SUM(IF($B$2:$F$2=$B2:$F2,1,0))/COUNT($B$2:$F$2)}
Using array formulas, you can make the IF() function return 1 for a match, and 0 for a mismatch, iterating through each element in a row. SUM() adds the matches up, and then dividing by the COUNT() of the number of cells processed gives you your similarity index.
The example above is for the top USA/USA cell in your example. You can fill down with it, but each new column needs adjusting at its diagonal cell to change the fixed row number to the new row. So the top of the UK column would be:
{=SUM(IF($B$3:$F$3=$B3:$F3,1,0))/COUNT($B$3:$F$3)}
The COUNT() could be replaced with a constant if you know in advance how many month columns there are.
Note: You do not type the curly braces. When you have finished entering the formula you have to press Ctrl-Shift-Enter whilst editing, to get those to appear and for the formula to be treated as an array formula. These formulas are often called CSE formulas for this reason (Ctrl-Shift-Enter).
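For readers who want to check the arithmetic outside Excel, here is a minimal Python sketch of what the array formula computes for one pair of rows. The dictionary `classes` and the function name `similarity` are made up for illustration; only the data comes from the question.

```python
# Modularity classes per country for jan..may (data from the question).
classes = {
    "USA": [0, 1, 2, 4, 3],
    "UK":  [0, 3, 2, 3, 3],
    "AU":  [0, 2, 2, 2, 3],
    "CH":  [1, 0, 1, 1, 2],
    "EG":  [2, 3, 0, 0, 1],
}


def similarity(a, b):
    """Share of months in which two rows carry the same class value."""
    # Equivalent of SUM(IF(rowA=rowB,1,0)): one flag per month, then add them up.
    matches = sum(1 if x == y else 0 for x, y in zip(a, b))
    # Equivalent of dividing by COUNT(rowA): the number of months compared.
    return matches / len(a)


print(similarity(classes["USA"], classes["UK"]))
print(similarity(classes["EG"], classes["UK"]))
```

USA and UK agree in jan, mar and may, so the first call gives 3/5; EG and UK agree only in feb, giving 1/5.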
Update:
You can do it with a single formula, filled over the cells, using INDIRECT() and COLUMN() as well.
{=SUM(IF(INDIRECT("$B$"&COLUMN(B2)):INDIRECT("$F$"&COLUMN(B2))=$B2:$F2,1,0))/COUNT(INDIRECT("$B$"&COLUMN(B2)):INDIRECT("$F$"&COLUMN(B2)))}
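This works because the data starts at B2: the column number of B (2) is the same as the row number of the first data row (2), so COLUMN() of the cell's column gives the row of the country to compare against, which provides the transposition.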
Update 2:
Actually, the COUNT() can be eliminated and SUM() replaced with AVERAGE(). Because every match is 1 and every mismatch is 0, the mean of those values is exactly the share of matching months.
So this works for all cells:
{=AVERAGE(IF(INDIRECT("$B$"&COLUMN(B2)):INDIRECT("$F$"&COLUMN(B2))=$B2:$F2,1,0))}
Update 3:
If you desperately need nothing to appear above the diagonal, making a lower triangular block, then you can wrap the above formula in an IF() that checks whether the column is greater than the row and makes the cell blank in that case. You can then fill the entire block with the formula and it will look right, with no need for deleting.
{=IF(COLUMN(B2)>ROW(B2),"",AVERAGE(IF(INDIRECT("$B$"&COLUMN(B2)):INDIRECT("$F$"&COLUMN(B2))=$B2:$F2,1,0)))}
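As a cross-check of the final formula, the following Python sketch fills the whole block the same way: blank above the diagonal, mean of the 0/1 match flags elsewhere. The names `rows` and `lower_triangle` are illustrative assumptions, not anything from Excel.

```python
# Rows in the order USA, UK, AU, CH, EG; columns are jan..may.
rows = [
    [0, 1, 2, 4, 3],
    [0, 3, 2, 3, 3],
    [0, 2, 2, 2, 3],
    [1, 0, 1, 1, 2],
    [2, 3, 0, 0, 1],
]


def lower_triangle(rows):
    """Similarity matrix with empty strings above the diagonal."""
    n = len(rows)
    matrix = []
    for r in range(n):
        line = []
        for c in range(n):
            if c > r:
                # Mirrors IF(COLUMN()>ROW(),"",...): leave the cell blank.
                line.append("")
            else:
                # Mirrors AVERAGE(IF(rowC=rowR,1,0)): mean of the match flags.
                flags = [1 if x == y else 0 for x, y in zip(rows[r], rows[c])]
                line.append(sum(flags) / len(flags))
        matrix.append(line)
    return matrix


for line in lower_triangle(rows):
    print(line)
```

The printed rows should line up with the desired output in the question (1 on the diagonal, 0.6 for the USA/UK/AU pairs, 0.2 for EG/UK).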

I'm going to provide a nudge in the right direction, rather than the whole solution. I think you'll be able to generalize the solution from the formulas given.
I placed your input matrix in A1:F6. Then I set up an intermediate matrix. A9 is simply "Matches in column 1."
===edit: Using month names in columns ===
Then I copied the month names jan .. may in B10:F10. Also USA .. EG in A11 .. A15.
Then in B12, the fun starts. I put a formula, =IF(B3=B2,1,0). What this does is it places a 1 in the cell if B3=B2, or a 0 otherwise. Then I fill right from B12 to F12 and I get the following row calculated:
1 0 1 0 1
You will note that these are the months when UK = USA.
If you take B12 and fill down to B15, and then fill B12 .. B15 right to F15, you will get the matches between each country and the country in the row directly above it (the formula uses relative references). My matrix turned out to be:
jan feb mar apr may
USA
UK 1 0 1 0 1
AU 1 0 1 0 1
CH 0 0 0 0 0
EG 0 0 0 0 0
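If you divide the sum of each row by the number of months (5), you will get 3/5, 3/5, 0/5, 0/5. These correspond to the first column of your output matrix: 0.6, 0.6, 0, 0. (For this data, comparing with the row above happens to give the same flags as comparing with USA. To compare every country with USA explicitly, anchor the reference row, for example =IF(B3=B$2,1,0).)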
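To see what the helper matrix holds, here is a small Python sketch that builds the match flags two ways: against the row directly above (what a relative-reference fill produces) and against USA (an anchored variant, which is my assumption about what the first output column needs). All names are illustrative.

```python
months = ["jan", "feb", "mar", "apr", "may"]
data = {
    "USA": [0, 1, 2, 4, 3],
    "UK":  [0, 3, 2, 3, 3],
    "AU":  [0, 2, 2, 2, 3],
    "CH":  [1, 0, 1, 1, 2],
    "EG":  [2, 3, 0, 0, 1],
}
names = list(data)


def match_flags(row, reference):
    """1 where the two rows agree in a month, 0 otherwise (like =IF(B3=B2,1,0))."""
    return [1 if x == y else 0 for x, y in zip(row, reference)]


# Relative fill: each country compared with the country one row above.
adjacent = {
    names[i]: match_flags(data[names[i]], data[names[i - 1]])
    for i in range(1, len(names))
}

# Anchored fill (assumed variant): every country compared with USA.
anchored = {name: match_flags(data[name], data["USA"]) for name in names[1:]}

# Row sums divided by the number of months give the first output column.
first_column = {name: sum(flags) / len(months) for name, flags in anchored.items()}

print(adjacent)
print(anchored)
print(first_column)
```

For this particular data set the two sets of flags coincide, which is why the helper matrix reproduces the USA column; with other data they would generally differ.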

Related

Excel counting taking into account adjacent cells (summing along a column, but looking across a row)

I have a table that tells me whether a value is found in a source:
Value Source1 Source2 Source3
alpha 1 0 1
beta 0 1 1
gamma 1 0 0
delta 1 1 1
epsilon 0 1 0
zeta 0 1 0
What I'd like to do is count the number of times that each source uniquely finds a given value. For this example, there are:
one value unique to Source1 (gamma)
two values unique to Source2 (epsilon and zeta)
zero values unique to Source3
In practice, this calculation will be used on ~10 columns and 1000s of rows, so I need some formula help.
I've tried various combinations of sumifs, countifs, sumproducts, and array formulas, but I am stumped by the fact that the sum needs to look perpendicularly to the column.
Any help is much appreciated!
With Excel365 you can try below formula-
=SUM(--(MMULT($B$2:$D$7,SEQUENCE(COLUMNS($B$2:$D$2),,,0))*(B$2:B$7)=1))
For non-365 versions of Excel you can try the array (CTRL+SHIFT+ENTER) formula below. In this case the constant array must contain as many 1s as there are source columns.
=SUM(--(MMULT($B$2:$D$7,TRANSPOSE({1,1,1}))*(B$2:B$7)=1))
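The logic of the MMULT formula may be easier to follow in a short Python sketch: the row totals play the role of the MMULT result, and a value is unique to a source when its row total is 1 and that source's flag is 1. The names `table` and `unique_count` are assumptions for illustration.

```python
sources = ["Source1", "Source2", "Source3"]
table = {
    "alpha":   [1, 0, 1],
    "beta":    [0, 1, 1],
    "gamma":   [1, 0, 0],
    "delta":   [1, 1, 1],
    "epsilon": [0, 1, 0],
    "zeta":    [0, 1, 0],
}


def unique_count(rows, col):
    """Count rows found by source `col` and by no other source."""
    # sum(row) is the row total (what MMULT with a column of 1s returns);
    # multiplying by the source's own flag leaves exactly 1 only when the
    # row total is 1 and this source is the one that found the value.
    return sum(1 for row in rows if sum(row) * row[col] == 1)


rows = list(table.values())
counts = {name: unique_count(rows, i) for i, name in enumerate(sources)}
print(counts)
```

This should report one unique value for Source1, two for Source2 and none for Source3, matching the expected counts in the question.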

Excel counting cells based on complex criteria

Hate asking question about something as simple as an Excel formula, but seem to really need and would appreciate the help.
I have a table where the row headings contain names and the column headings contain week numbers. Within this table I have different numbers, both positive and negative.
I want to count each cell where the row heading matches a specific name and then each cell that has a positive value with a week number less than or equal to a certain week.
I have tried to get it to work with at least some function (without it caring about positive and negative values) but haven't even gotten that to work.
I've tried with:
=SUMPRODUCT((Data!F3:F28=I1)*(Data!I2:BI2="<="&A1)*(Data!I3:BI28))
=SUMIFS(Data!I3:BI28;Data!F3:F28;I1;Data!I2:BI2;"<="&A1)
1 2 3 4 5
name1 -1 4 3 1 1
name2 0 0 0 0 0
I want a formula that takes, for example, every column whose header is less than or equal to 4, but excludes negatives (and vice versa). So for the example above, the result for name1 should be 8: the sum of weeks 2, 3 and 4.
For your current example:
Formula in H3:
=SUMPRODUCT((B2:F3>=0)*(B1:F1<=4)*(A2:A3="name1"),B2:F3)
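A rough Python equivalent of the SUMPRODUCT may help show how the three conditions combine: each condition is a 0/1 mask, the masks are multiplied, and the product is multiplied by the cell value. The function name and variable names below are hypothetical.

```python
weeks = [1, 2, 3, 4, 5]
table = {
    "name1": [-1, 4, 3, 1, 1],
    "name2": [0, 0, 0, 0, 0],
}


def sum_non_negative_up_to(table, weeks, name, max_week):
    """Sum the non-negative cells of `name` in weeks <= max_week."""
    total = 0
    for row_name, values in table.items():
        for week, value in zip(weeks, values):
            # Booleans multiply like 0/1, as in (B2:F3>=0)*(B1:F1<=4)*(A2:A3="name1").
            mask = (value >= 0) * (week <= max_week) * (row_name == name)
            total += mask * value
    return total


print(sum_non_negative_up_to(table, weeks, "name1", 4))
```

For name1 with a cut-off of week 4, the -1 in week 1 is masked out and week 5 falls outside the range, leaving 4 + 3 + 1.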

Excel formula that returns specific lists of values?

Help is appreciated!
I have two columns of data. The first is a participant ID column. The second, adjacent column holds the respective binary-coded responses (example below). I am interested in all of the 1's.
Does anyone know a formula which can extract a list of all of the participants who have scored a 1?
E.g., (Participant ID is the value on the left, and the participant's score is to the right).
1 0
2 1
3 1
4 0
5 0
6 1
FILTER() formula will do that. Try-
=FILTER(A2:A7,B2:B7=1)
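For comparison, the same selection can be sketched in Python as a filter over paired lists. The variable names are illustrative; the data is the sample from the question.

```python
participant_ids = [1, 2, 3, 4, 5, 6]
scores = [0, 1, 1, 0, 0, 1]

# Keep the ID wherever the score in the same row equals 1,
# which is what FILTER(A2:A7, B2:B7=1) does with the two columns.
scored_one = [pid for pid, score in zip(participant_ids, scores) if score == 1]

print(scored_one)
```

With the sample data this keeps participants 2, 3 and 6.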

Getting the next lower value with VLOOKUP

I have the following Excel spreadsheet:
A B C D
1 15.000 Product 1 7.500 Product 2
2 5.000 Product 2
3 1.000 Product 3
4 0 Product 4
5
In Cell C1 I type a random number (in this case 7.500). Now I want Cell D1 to show the product that corresponds to the value in Cell C1.
Since 7.500 does not exist in Column A the next lower value should be used.
In this case 5.000 which belongs to Product 2.
I tried to go with the following formulas in Cell D2 but instead of getting Product 2 I always get Product 4 as a result.
=VLOOKUP(F16,$A$1:$B$4,2,TRUE)
=VLOOKUP(F16,$A$1:$B$4,2)
Do you have any idea how to solve this issue?
For unsorted data you can use below formula:
=INDEX(B1:B4,MATCH(LARGE($A$1:$A$4,COUNTIF($A$1:$A$4,">"&C1)+1),A1:A4,0))
I would try it with an additional helper column that holds the difference between the given value (7.5) and column A, like this: =IF($C$1-A1<0,999999,$C$1-A1). Then get the minimum value with =MIN(E1:E4), and finally look up the corresponding value in column B.
If the order is not guaranteed to be ascending then you can use an array formula (use ctrl+shift+enter instead of just enter to create an array formula)
=INDEX(B2:B5,MATCH((C2-MIN(IF((C2-A2:A5)<0,MAX(A2:A5),C2-A2:A5))),A2:A5,0))
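Both ideas for unsorted data (ranking with LARGE/COUNTIF, and taking the smallest non-negative difference) can be sketched in Python to confirm they agree. The function names are made up for illustration, and the numbers are written exactly as they appear in the question; whether the dots are decimal points or thousands separators does not change the logic.

```python
values = [15.000, 5.000, 1.000, 0]
products = ["Product 1", "Product 2", "Product 3", "Product 4"]


def next_lower_by_rank(values, labels, target):
    """LARGE/COUNTIF idea: skip the k values above target, take the next largest."""
    k = sum(1 for v in values if v > target)      # COUNTIF(range, ">" & target)
    if k >= len(values):
        return None                               # nothing at or below target
    wanted = sorted(values, reverse=True)[k]      # LARGE(range, k + 1)
    return labels[values.index(wanted)]           # INDEX(..., MATCH(wanted, range, 0))


def next_lower_by_difference(values, labels, target):
    """Helper-column idea: smallest non-negative difference target - value."""
    candidates = [i for i, v in enumerate(values) if target - v >= 0]
    if not candidates:
        return None
    # Picking the index directly avoids reconstructing the value from the difference.
    best = min(candidates, key=lambda i: target - values[i])
    return labels[best]


print(next_lower_by_rank(values, products, 7.500))
print(next_lower_by_difference(values, products, 7.500))
```

Neither function relies on the list being sorted, which is the point of both formulas. Unlike the formulas, the sketch returns None when no value is at or below the target; that handling is my own addition.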

Excel sum one column based on another

I have data that is divided into columns as follows:
Runs RunsAfter Switch New
0 2 1
0 2 0
1 2 0
0 1 0
1 1 0
0 0 0
0 0 0
0 0 1
0 1 0
1 1 0
0 0 0
I want excel to sum the Runs column by taking each cell and summing down the remainder of the column until there is a 1 in the Switch column. It should then start calculating again until another Switch. All of this output should be put in the New column. The result should look like the RunsAfter column, which I am currently calculating by hand. I would keep doing this, but the dataset is going to get too big to continue doing this by hand.
I've checked for questions similar to this, but haven't been able to find quite what I'm looking for. If I've missed an answer elsewhere, please let me know.
If I am understanding correctly, I think you want to use a combination of the match formula to find the next "switch" and the indirect formula to define the range to sum. I can't think of a simple way to do this.
This is assuming that your headers are in row 1 and your columns are Runs (A), RunsAfter (B), Switch (C), and results in D. I've used 100 for your last row, but you will need to change that if you have more rows. This is what I did in D2:
To find the next row for which Switch is 1:
MATCH(1,C3:$C$100,0)+ROW()
I also included iferror to make it not break for rows after the last switch:
IFERROR(MATCH(1,C3:$C$100,0)+ROW()-1,100)
I included this as part of the indirect formula to tell it what range to sum up:
D2=SUM( INDIRECT("A"&ROW()&":A$"&IFERROR(MATCH(1,C3:$C$100,0)+ROW()-1,100)))
So for D2, it's summing from A2 to the row before the next switch, in this case row 8. You should be able to drag it down from D2 with those anchors.
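The rule the formula implements (sum from the current row down to the row before the next switch, or to the end if there is none) can be sketched in Python as follows. The function name is hypothetical and the lists are the Runs and Switch columns from the question.

```python
runs   = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0]
switch = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]


def runs_until_next_switch(runs, switch):
    """For each row, sum Runs from that row to the row before the next Switch = 1."""
    n = len(runs)
    result = []
    for i in range(n):
        end = n  # default: no later switch, sum to the last row (the IFERROR branch)
        for j in range(i + 1, n):
            if switch[j] == 1:   # MATCH(1, rows below the current one, 0)
                end = j          # stop just before the switch row
                break
        result.append(sum(runs[i:end]))
    return result


print(runs_until_next_switch(runs, switch))
```

This follows the formula as written, not necessarily the hand-calculated RunsAfter column: for the second row where Switch is 1, the sketch (like the formula) gives 1, while the question's table lists 0, so the intended boundary rule may need checking against your own data.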

Resources