1 / 0 Sparse Matrix in Excel - excel

I have a data of 1000 rows and 2 cols. I want to convert it to a matrix of one's and zero's such that the 1st column is Column A, 1st row is Column B and whenever a certain value of A matches B, it returns 1 or 0. I did this for a column but while trying to copy, the formula breaks.
Is there an easy solution for this?
Is there any other easy way to do this in Excel or any other language like Python or R.
Excel Screenshot:

You're almost there. Your formula is almost correct, but the first reference should be C$1. To populate the rest of the matrix, select the full range starting with C2 and enter the formula with Ctrl+Enter.

=(COUNTIF($C$1:INDEX($1:$1,MAX(($1:$1<>"")*(COLUMN($1:$1)))),$B2)>0)*1
Counts if the text in B2 occurs in the indexed first row starting from column 3 to the last non empty column. If only one match is possible, you can narrow it down to
=COUNTIF($C$1:INDEX($1:$1,MAX(($1:$1<>"")*(COLUMN($1:$1)))),$B2)

Related

Increment by 2 for n rows, increment by 4 once and repeat when referencing data from one sheet to another

thank you for taking the time to look at this question.
I'm looking for an equation that can easily take the numerical values from Sheet 1 (the first picture) which has 2 blank cells in between values for four values and then has 4 blank cells and then the other four values. I'm not sure if I am making sense but hopefully the picture I have attached helps.
Notice 2 blank rows between first 4 rows with values (Rows 2-11) and same between rows 16 and 25.
Also notice the 4 blank rows between the two sets of values.
For me, this is repeated for 700 values, same set up of 2 blank rows for 4 sets of values and then 4 blank rows and then four sets of values with 2 blank rows. I'm sure there is an easier way to do this.
I'm trying to recreate Sheet 2 from Sheet 1 using an equation. Is this possible?
Apologies in advance, English isn't my first language.
If the numbers are going to start in B2 and the intervals and offset staggers are static then,
=INDEX(B:B, 2+(ROW(1:1)-1)*3+INT((ROW(1:1)-1)/4)*2)
If the first number is in S6 then,
=INDEX(S:S, 6+(ROW(1:1)-1)*3+INT((ROW(1:1)-1)/4)*2)
Put this in D2:
=IFERROR(INDEX(Sheet1!B:B,AGGREGATE(15,6,ROW(Sheet1!$B$2:INDEX(Sheet1!B:B,MATCH("ZZZ",Sheet1!A:A)))/(Sheet1!$B$2:INDEX(Sheet1!B:B,MATCH("ZZZ",Sheet1!A:A))<>""),ROW(1:1))),"")
And copy down till you get blanks.
This will return the numbers in order that they appear on sheet 1.
The Sheet1!$B$2:INDEX(Sheet1!B:B,MATCH("ZZZ",Sheet1!A:A)) set the data set bounds. This being an array type formula it needs to reference the smallest possible data set. This part finds the last cell in Column A and sets that as the extent of the data set so we do not do unnecessary iterations.
The MATCH part will return the last row that has text in it, if Column A has numbers then we need to change the "ZZZ" to 1E+99 to get the last row in column A with a number.
The AGGREGATE is working like a small in that it will create an array of row numbers and Errors. It will return ROW Numbers where (Sheet1!$B$2:INDEX(Sheet1!B:B,MATCH("ZZZ",Sheet1!A:A))<>"") return true. And an Error where it returns FALSE.
The second criterion 6 in Aggregate tells it to ignore the errors, so it is only looking at the returned row numbers.
The ROW(1:1) is a counter. As the formula is dragged down it will iterate to 2 then 3 and so on. This tells the Aggregate that you want the 1st then the 2nd then the 3rd and so on.
The chosen row number is then passed to the INDEX and the correct value is returned.
If your numbers are in order (smallest to largest like your example) or you want the output in order(smallest to largest) then you can use this simple equation in D2:
=IFERROR(SMALL(Sheet1!B:B,ROW(1:1)),"")
Then copy down till you get blanks.
Here is another formula you might use.
=INDIRECT(ADDRESS((INT((ROW()-ROW($A$2))/4)*14+ROW(A$2))+(MOD(ROW()-ROW($A$2),4)*3),COLUMN($A$2),1,1,"Sheet1"))
You can paste it to the first cell where you want the result and copy down.
Note that $A$2 is the cell from where all the counting starts. If your data start from A3 you can change the references accordingly. Note further that ROW($A$2) is long for 2. I chose this syntax to enable you to identify the meaning.
COLUMN($A$2), on the other hand, just identifies Column A as the source of the data to be lifted. Row 2 in this formula is insignificant. It's the A that counts. However, COLUMN($A$2) is long for just 1, meaning column No. 1, meaning A. Once you get your bearing in the formula you can replace COLUMN($A$2) with 1.

Excel Index & Match two columns while pulling data for a third

I am trying to create a formula which compares numerical values of two columns and when a match is found return data from a different column.
Here is the problem I am having - the formula I have come up with seems to go Row by Row - for example, if column A & B are the two columns I want to compare, with column C having the data I want if a match is made. If I have a value of 1 in row A1 and a value of 1 in B1 - it will successfuly return the data in column C.
The problem is that my numbers are jumping, the row's do not match, for example column A is 1,2,3 however column B is 1,3,2. The end result here is that I get data for the value 1, but on the mismatch for the second row I get no value.
Basically the formula I made seems to do a hard comparison based just off the two rows of each column - meaning it will only compare A1 to B1. What I really need is it to compare the ENTIRE column and disregard the rows completely
Here is the formula I have been fooling around with - this formula works if A1 and B1 match
=INDEX(M:M,MATCH($L:L,$V:V,0))
In this formula M has the data I need, while columns L and V have numerical values, I need it to not 'hard check' row by row and instead evaluate the entire column and when a match is found return the result (so if both columns have a '2' return that value REGARDLESS of the fact that the '2' may be in rows A2 and B9)
Hopefully I explained my issue well, and I appreciate all the help I can get
EDIT
Sorry for failing to explain it properly on my end -- I will base my explanation off the picture link below.
I need data from column B to show up in column D. What is happening is row 2 matches so the data is successfully retrieved for number 1 - however on row 3 where column A switches the numbering it compared the number '3' to the number '2' and recognizes it is not a match and returns NA -- even though there IS a match in columns A4 and C2 -- in this situation for C3 I would need the data from B4 to showup in D3
http://i.stack.imgur.com/nrKJp.png
Two formulas that will each give you the return you want, Put one of the following in D2 and copy down:
=VLOOKUP(C2,A:B,2,FALSE)
OR
=INDEX(B:B,MATCH(C2,A:A,0))

Find the difference between data in 2 columns in Excel

I have 2 sets of data. I put it in Excel e.g. column A and column B. Now I want to know which data from B is part of column A. I run this formula =IF(COUNTIF($A$1:$A$327238,B1)>0,"Exist", "Nope")
Then I 'filter it and look only 'Exist'. Based on that I know that all data in B that has label 'Exist' is part of column A
Now I want to know opposite i.e. which data from A are part of B. For that reason I use the same formula but I replace the data in columns i.e. data from B now in A and vice versa.
Then I randomly verify results.
For case 1 it looks it works fine but for second case it looks it's not accurate.
My assumption: should it work in case 2 as well ( maybe I just was not very accurate in some way ) and I should expect it to work?
Thanks
In cell C1 (assuming your data starts from 1st row) type the following =IF(A2=B2,"equal","no"), and then populate the same formula to the last row where there is still data, so that for row N, your formula in column C is =IF(AN=BN,"equal","no"). After that you will just need to count the cells with value "no" to know the differences. Sorry if I didn't get the question correctly.
Ok, assuming that the two sets of data are in columns A and B (they might be of different sizes), and the last rows of data are L and M respectively, click on D1 and type the following: =IFNA(INDEX(B$1:B$5,MATCH(A1,B$1:B$5,0),1),"Unique"). Drag down to apply this formula on D1 - DL. That's it, you have the duplicate elements. Since the duplicate elements are the same in both columns - A and B, you don't need to repeat this for column B. Note, that for all the unique elements the corresponding rows of column D have the word "Unique", so if you want the unique elements, you can just get the elements from A with the mentioned row numbers:
Just select any column's first row cell and type the following formula: =IF(D1="Unique",INDEX(A$1:A$L,ROW(D1)),"Duplicate").

Excel - Find a value in an array and return the contents of the corresponding column

I am trying to find a value within an array and then return the value in a specific row in the corresponding column.
In the example below, I need to know which bay the Chevrolet is in:
Column A Column C Column D Column E
Chevrolet Bay 1 Bay 2 Bay 3
Toyota Ford Saturn
Honda Chevrolet Jaguar
Ferrari Subaru Lexus
Mitsubishi Hundai BMW
I am looking for Chevrolet in the array C2:E5. Once it determines that the Chevrolet is in Column D, I need for it to return the value in D1. If it was in column E, I need it to return the value in E1.
Any help would be greatly appreciated. Thank you so much in advance.
Try this Array Formula:
=INDEX($C$1:$E$5,1,SMALL(IF(NOT(ISERROR(SEARCH(A1,$C$1:$E$5))),COLUMN($A:$C),99^99),1))
or if you are sure that each column contains exactly what's being searched it can be written like this:
=INDEX($C$1:$E$5,1,SMALL(IF($C$1:$E$5=A1,COLUMN($A:$C),99^99),1))
Enter formula in any cell by pressing Ctrl+Shitf+Enter.
How does it work?
Our ultimate goal is to find the Column that contains the match:
First we did the search for the match using this formula: SEARCH(A1,$C$1:$E$5). It just checks if any of the entries matched A1. Actually, it can be simplified to $C$1:$E$5=A1 but I'm not sure if all entries in each column match exactly what's in A1.
That formula will produce an array of values when entered as array formula. Something like: {SEARCH(A1,C1), SEARCH(A1,D1), SEARCH(A1,E1);... SEARCH(A1,E5)}. The result will be array of number(s) and error (if non was found). But we don't want that, else we will be returning error everytime.
We then use IF(NOT(ISERROR(SEARCH(A1,$C$1:$E$5))),COLUMN($A:$C),99^99). This formula returns the Column Number if there is a match and a relatively huge number 99^99 otherwise. Result would be: {99^99, 99^99, 99^99, 2, ..., 99^99}.
And we are close to what we need since we already have an array of Column and huge number. We just use SMALL to return the smallest number which in my opinion is the lowest Column Number where a match is found. So SMALL(IF(NOT(ISERROR(SEARCH(A1,$C$1:$E$5))),COLUMN($A:$C),99^99),1) would return 2. Which is the column where Chevrolet is referenced at $C$1:$E$1.
Since we already have the column number we simply use INDEX Function which is: INDEX($C$1:$E$5,1,2).
Note: 99^99 can be any relatively large number. Not necessarily 99^99. Actual 16385(max column number in Excel 2007 and up + 1) can be used.
Result:
Another quick and dirty answer is to put a "dummy" row above the entire data set and then determine which placeholder column returned the correct result.
In B1, you can put the equation
=MATCH($A$2,B3:B6,0)
And then in C1, you can put
=MATCH($A$2,C3:C6,0)
And so on until you've covered all the rows and columns. Change the B3:B6 & C3:C6 to reflect the actual rows of data in the given column.
Now, the fun array formula which will actually return the bay. I have this array formula in cell A1 and it is looking from B1:D1, but you can move cell A1 anywhere you want and the B1:D1 range should be all the dummy columns you made above. Also, this is assuming that the bays you want are in row 2 (if they are in a different row, change R2C to R#C where # is the row number). In order to properly enter, enter the formula and then press CTRL+SHIFT+ENTER.
=INDIRECT("R2C"&SUM(IF(ISERROR(B1:D1),FALSE,B1:D1))+1,FALSE)
If the formula is entered properly, it will show curly brackets { } around the equation when you single click on the cell.
This formula will do it, assuming that the lookup value is in A1:
="Bay "&SUMPRODUCT((B2:D5=A1)*(COLUMN(B2:D5)))-1
You could easily adjust it to add more rows and columns.
The formula returns the column number that contains the lookup value and concatenates it with the word Bay to return the exact result you want.
The -1 at the end adjusts for the fact that the Bay 2 column is actually the third column in the worksheet, so you might need to adjust that offset as well.
The SUMPRODUCT function is much undervalued. More on it here:
http://fiveminutelessons.com/learn-microsoft-excel/multiply-two-columns-and-add-results-using-sumproduct
The SUMPRODUCT formula above is perfect for what I want.
E.g. to find the location - COLUMN and ROW - of a given value at A1 in a 2D Range A2:J44
use: =SUMPRODUCT((A2:J44=A!)*(COLUMN(A2:J44))) gives the Absolute* COL#
and : =SUMPRODUCT((A2:J44=A1)*(ROW(A2:J44))) gives the Absolute* ROW#
(not where it is in the range)
These can then be used in e.g. INDEX() or CELL() etc functions.

How to sort table using function/equation only

How can I sort a table in sheet 1 like
A B C D E
3 7 3 6 5
into another table in sheet 2
A C E D B
3 3 5 6 7
by using function only?
One really easy way to do it would be to just have a rank index and then use HLOOKUP to find the corresponding values:
=RANK(A4,$A$4:$E$4,1)
=IF(COUNTIF($A$1:A$1,A1)>1,RANK(A4,$A$4:$E$4,1)+COUNTIF($A$1:A$1,A1)-1,RANK(A4,$A$4:$E$4,1))
=HLOOKUP(COLUMN(),$A$2:$E$4,2,FALSE)
=HLOOKUP(COLUMN(),$A$2:$E$4,3,FALSE)
Okay, here's the "one formula does it all" solution without additional temporary columns:
Formula in A6:
=INDEX($A$2:$E$2,MATCH(SMALL($A$3:$E$3+COLUMN($A$3:$E$3)/100000000,COLUMN()),$A$3:$E$3+COLUMN($A$3:$E$3)/100000000,0))
Enter it as an array formula, i.e. press Ctrl-Shift-Enter. Then copy it to the adjacent columns.
To get also the number, use this formula in A7 (again as array formula):
=ROUND(SMALL($A$3:$E$3+COLUMN($A$3:$E$3)/100000000,COLUMN()),6)
Both formulas are a bit bloated as they have to also handle the potential duplicates. The solution is to simply add a very tiny fraction of the column before applying the sorting (SMALL function) - and then to remove it again...
I think an easier solution is to use the "Large" function:
The Column functions are just an easy way to count from 5 down to 1, but a helper column may be even easier.
You can have a similar answer using the "Small" function as well.
Although I have used the "add a small number technique", I think this is the most elegant for a helper row/column:
=RANK(B4,$B$4:$F$4,0) + COUNTIF($B$4:$F$4,B4)-1
(copy across the row column) RANK gets you close and the COUNTIF portion handles duplicates by counting the number of duplicates of the cell to that point. Since there is always match (itself) you subtract 1 for the final rank. This "sorts" ties in the order that they appear. Note that an empty cell will generate #N/A and character data as #Values.
It's doable! :-)
Here is the sample file.
Explanation
I assume that your data is in row 1 of Sheet1 and that you want the data sorted starting in column 1 of another sheet - if not, adjust accordingly.
Place this formula in column 1 of the target sheet, say in row 2:
=ROUND(SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),6)
You need to enter the formula as an array formula, i.e. press Ctrl-Shift-Enter.
In case you place it in another column than column one, you need to replace COLUMN() with COLUMN()-COLUMN($A$2)+1 where $A$2must be a reference to the cell itself.
This formula will return you the lowest number in your range - if you copy it the next 4 columns, you'll get your order list of numbers.
To translate this back to the column number, we need to perform 2 steps:
Figure out the column number:This can be done with with this formula:
=MATCH(SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,0)
- again, to be entered as array formula. If the source data does not start in column A, you need to add +COLUMN(Sheet1!$A$1)-1 where $A$1 needs to be replaced with the leftmost cell of the source data.
Convert the column number to a letter:Assuming your column number from step 1 is in cell A6, this formula will do the job:
=LEFT(ADDRESS(1,A6,2),SEARCH("$",ADDRESS(1,A6,2))-1)
Of course, you can also combine step 1&2, this will result in this megaformula:
=LEFT(
ADDRESS(
1,
MATCH(
SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),
Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,
0),
2),
SEARCH(
"$",
ADDRESS(
1,
MATCH(
SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),
Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,0),
2)
)-1
)
Again, to be entered as array formula.

Resources