Find values occurring in multiple columns in excel - excel

I have sets of gene probes that are upregulated when put under different chemical stresses. Each column contains all of the upregulated gene probes. I have 12 columns, how do I get a list of gene probes that appear in all 12 columns?
I've been able to find similarities between two columns using the formula
=IF(ISERROR(MATCH(A2,$C$2:$C$21473,0)),"",A2)
but cant work out how to adapt it to include 12 columns
G.Ac G.As G.At G.Ac.At G.As.Ac G.As.At G.Cd G.Cu G.Ni
G.Cd.Cu G.Cd.Ni G.Ni.Cu
GENE:JGI_V11_3346220103 GENE:JGI_V11_2653050203 GENE:JGI_V11_3299790103
GENE:JGI_V11_359040103 GENE:JGI_V11_2228010103 GENE:JGI_V11_2662750203
GENE:JGI_V11_1926920303 GENE:JGI_V11_3134270303 GENE:JGI_V11_3119540303
GENE:JGI_V11_3134270203 GENE:JGI_V11_1926920303 GENE:JGI_V11_3134270303
GENE:JGI_V11_3164760203 GENE:JGI_V11_565470303 GENE:JGI_V11_2296170203
GENE:JGI_V11_2045300203 GENE:JGI_V11_2421620203 GENE:JGI_V11_2228010303
GENE:JGI_V11_2196580303 GENE:JGI_V11_3134270203 GENE:JGI_V11_3119540203
GENE:JGI_V11_1926920103 GENE:JGI_V11_1926920103 GENE:JGI_V11_1014720202
GENE:JGI_V11_478830203 GENE:JGI_V11_3168730303 GENE:JGI_V11_3311070202
GENE:JGI_V11_3216620102 GENE:JGI_V11_2653050303 GENE:JGI_V11_3300140202
GENE:JGI_V11_2653050303 GENE:JGI_V11_1159220202 GENE:JGI_V11_2024180303
GENE:JGI_V11_1926920303 GENE:JGI_V11_2196580303 GENE:JGI_V11_1159220202
GENE:JGI_V11_3164760303 GENE:JGI_V11_2228010203 GENE:JGI_V11_2341670203
GENE:JGI_V11_1938910303 GENE:JGI_V11_3026230203 GENE:JGI_V11_2449230203
GENE:JGI_V11_3134270303 GENE:JGI_V11_2235750203 GENE:JGI_V11_1981410203
GENE:JGI_V11_3251310202 GENE:JGI_V11_977750103 GENE:JGI_V11_954070203
GENE:JGI_V11_2267320203 GENE:JGI_V11_2268000303 GENE:JGI_V11_2226270101
GENE:JGI_V11_3003640303 GENE:JGI_V11_223520203 GENE:JGI_V11_2662750103
GENE:JGI_V11_2228010103 GENE:JGI_V11_3251310202 GENE:JGI_V11_3198630203
GENE:JGI_V11_3134270303 GENE:JGI_V11_1926920203 GENE:JGI_V11_287750103
GENE:JGI_V11_465160203 GENE:JGI_V11_2268000203 GENE:JGI_V11_2473230303
GENE:JGI_V11_3192220102 GENE:JGI_V11_3026230303 GENE:JGI_V11_3039310303
GENE:JGI_V11_1926920103 GENE:JGI_V11_1159220102 GENE:JGI_V11_3052790202
GENE:JGI_V11_3075830303 GENE:JGI_V11_2196580203 GENE:JGI_V11_3134280203
GENE:JGI_V11_3142970303 GENE:JGI_V11_503720303 GENE:JGI_V11_2236410103
GENE:JGI_V11_3042230103 GENE:JGI_V11_2228010203 GENE:JGI_V11_3028210101
GENE:JGI_V11_2105710303 GENE:JGI_V11_1926920303 GENE:JGI_V11_2131620103
GENE:JGI_V11_1002840203 GENE:JGI_V11_2088480203 GENE:JGI_V11_3196120102
Heres the first 8 rows of the 12 columns. There are 21473 rows in total.
Thanks

You could use an array formula like this to count how many columns a particular gene probe occurs in
=SUM(--(MMULT(TRANSPOSE(ROW(A$2:L$10000)^0),N(A$2:L$10000=A2))>0))
This is a standard way of getting column totals for a 2D array - in this case an array of true/false values corresponding to instances of an array element being equal/unequal to A2.
It is rather a brute force approach - it needs ~120K multiplications for each row. If you copy the formula down for ~10K rows, there is a delay of ~100 seconds on my computer while Excel works out the results.
Must be entered as an array formula using CtrlShiftEnter
In this dummy data C is the only value that occurs in all 12 columns.

Related

Excel AVERAGEIFS looking up ONE of the criteria columns

I have built a large data set and I need to see the average results given many different criteria. I've done this with the AVERAGEIFS function and it works just fine, however the more and more I add its getting really time intensive.
I'm wondering if there is a way to nest a vlookup or index match or anything like that in the AVERAGEIFS that read the criteria column heading and criteria in a cell (or 2 if they need to be separated) to be added to the AVERAGEIFS.
Here is an example of my spreadsheet:
The first 3 sets of criteria I want to stay locked.
I want it to read what the 4th criteria column and criteria should be by referencing the I11 cell. The highlighted portion in the formula bar is the part that I want to reference I11 so it reads it and knows that the 4th criteria is the 'code' column and the criteria is '>7'. I can separate this into 2 separate cells if need be.
I've tried a few combinations of VLOOKUP and INDEX MATCH but cannot get it to work.
Data as Text:
Price,Type,sub cat,Time,code,amount,Result,,
,,,,,,,,
9.95,t2,d,ac,2.18," 22,780,893 ",0.73,,T2 and D and AC
118.94,u2,d,bo,2.78," 172,110,893 ",4.07,,
57.63,t1,u,ac,7.09," 128,419,877 ",-2.16,,code
8.88,t2,d,ac,1.50," 62,634,868 ",12.72,,amount < 100 000 000
11.61,u1,u,ac,2.14," 146,982,736 ",1.07,,price >10
13.46,u3,u,ac,0.93," 17,513,672 ",-13.93,,
31.53,t1,u,ac,0.89," 47,170,877 ",1.39,,
16.34,t3,d,bo,1.07," 1,914,767,076 ",-1.42,,
111.59,u1,d,bo,0.62," 2,283,546,000 ",0.67,,
72.4,u3,d,bo,10.37," 951,541,514 ",1.13,,
34.55,u3,d,bo,0.77," 951,541,514 ",-2.52,,
42.25,t1,d,bo,1.05," 63,748,352 ",8.88,,
17.18,u3,u,ac,2.64," 140,217,257 ",4.35,,
97.66,t1,d,bo,3.45," 1,070,383,954 ",1.33,,
58.49,t2,u,bo,8.64," 151,876,559 ",-0.92,,
64.48,t2,d,ac,2.35," 291,967,334 ",3.03,,
38.4,t1,u,ac,17.05," 83,478,472 ",-4.31,,
20.87,u3,d,ac,28.92," 214,080,937 ",-2.16,,
36.53,t1,d,ac,1.43," 73,438,589 ",-2.07,,
89.16,t3,u,ac,1.41," 26,786,958 ",-1.75,,
15.84,t1,u,bo,2.90," 133,560,818 ",1.76,,
3.2,u3,u,bo,2.95," 215,677,667 ",-1.06,,
25.46,t1,d,bo,3.92," 57,148,431 ",1.89,,
40,t2,d,ac,8.00," 65,274,903 ",0.61,,
27.72,t1,u,ac,2.50," 381,400,886 ",6.46,,
29.07,u3,u,ac,2.32," 52,632,107 ",-0.78,,
173.31,t1,d,ac,3.58," 31,547,380 ",-4.92,,
18.22,u3,d,ac,0.58," 292,669,493 ",4.06,,
9.59,t1,d,bo,3.60," 266,883,020 ",3.16,,
115.22,t2,d,bo,4.51," 132,376,476 ",0.78,,
64.48,u3,d,ac,3.03," 338,360,104 ",-0.95,,
41.74,t1,u,bo,25.65," 245,766,436 ",-3.42,,
5.99,t3,u,bo,2.15," 175,054,713 ",-4.37,,
Use INDEX/MATCH to return the correct column. This will require that you separate the column name and the criteria:
=AVERAGEIFS(G:G,B:B,"T2",C:C,"D",D:D,"AC",INDEX(A:F,0,MATCH(I11,$A$7:$G$7,0)),J11)
An idea:
I10 - "Write down the limitation. (You have to use <,>,=,<> AND the value, for e.g.: <5)"
I11 - The user can use relations and values.
In J11, you can reference to I11 ;) It works for me.

Excel sum based on matrix condition and multiple criteria

Following from the example here I'm trying to add additional conditions to a sum formula. I've represented an example below:
The output that I'm looking for for example for Jan 2017 is
2017
1
UP A 1
UP B 6
UP C 6
DOWN A 1
DOWN B 8
DOWN C 7
I tried with the following formula:
=MMULT(--($B$17:$C$17="X"),MATCH(1,($A23=$C$2:$C$14)*(C$21=$A$2:$A$14)*(C$22=$B$2:$B$14)*($E$2:$E$14=$D$2:$D$14),0))
but I get a N/A value.
Does anyone know it if is possible to do it?
In your first example the number of rows in array1 and number of columns in array2 were equal, five. Here you have two columns and 13 rows. That they are unequal here is part (all) of the reason why you are having an issue.
Also your match function is returning a Boolean not an array
I have a way to do this using matrix condition and multiple criteria but had to change problem up a bit, see photo for example:
{=MMULT(--(D18:P18="x"),E$2:E$14*(--(A$2:A$14=$C$21)*--(B$2:B$14=$C$22)*--(C$2:C$14=A24)))"
https://i.stack.imgur.com/FEvgR.png
You can create a formula to fill the second matrix with X's see below
=IF(OR(INDIRECT("D"&VALUE(D20))=$A$18,INDIRECT("D"&VALUE(D20))=$B$18),"X","")
https://i.stack.imgur.com/4rS4L.png
That being said I don't think this is particularly efficient as you are treating the one of the matrixes as a all 1's so you basically just adding an extra criteria / Boolean with added complexity....that being said u asked for this specifically and I believe that I have delivered that LOL
Just add two SUMIFS together.
=SUMIFS($E$2:$E$14, $A$2:$A$14, C$21, $B$2:$B$14, C$22, $C$2:$C$14, $A23, $D$2:$D$14, IF(INDEX($B$17:$C$19, MATCH($B23, $A$17:$A$19, 0), 1)="x", $B$16))+
SUMIFS($E$2:$E$14, $A$2:$A$14, C$21, $B$2:$B$14, C$22, $C$2:$C$14, $A23, $D$2:$D$14, IF(INDEX($B$17:$C$19, MATCH($B23, $A$17:$A$19, 0), 2)="x", $C$16))

Excel, get data from matrix

I have a matrix table in excel, from which I need to get data, with two parameters.
Here's my matrix table:
Now, I want to enter manually Parameter 1 and Parameter 2 as numbers in two fields:
With the values 900 and 2000 it should get the number 283 from the matrix.
How can I do that? I tried at least 5 different variations (german excel) with INDEX and VERGLEICH (MATCH), or with SVERWEIS (VLOOKUP) with VERGLEICH (MATCH).
Here's my latest one:
=SVERWEIS(B55;'table_2'!B41:AO41;VERGLEICH(C55;'table_2'!A5:A40;0);FALSCH)
In this code, B55 is Parameter 1 and C55 is Parameter 2.
Use this formula:
=SVERWEIS(C55;'table_2'!A5:AO40;VERGLEICH(B55;'table_2'!B41:AO41;0)+1;FALSCH)

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")

Return a row number that matches multiple criteria in vbs excel

I need to be able to search my whole table for a row that matches multiple criteria. We use a program that outputs data in the form of a .csv file. It has rows that separate sets of data, each of these headers don't have any columns that are unique in of them self but if i searched the table for multiple values i should be able to pinpoint each header row. I know i can use Application.WorksheetFunction.Match to return a row on a single criteria but i need to search on two three or four criteria.
In pseudo-code it would be something like this:
Return row number were column A = bill & column B = Woods & column C = some other data
We need to work with arrays:
There are 2 kinds of arrays:
numeric {1,0,1,1,1,0,0,1}
boolean {TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE}
to convert between them we can use:
MATCH function
MATCH(1,{1,0,1,1,1,0,0,1},0) -> will result {TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE}
simple multiplication
{TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE}*{TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE} -> will result {1,0,1,1,1,0,0,1}
you can can check an array in the match function, entering it like in the picture below, be warned that MATCH function WILL TREAT AN ARRAY AS AN "OR" FUNCTION (one match will result in true
ie:
MATCH(1,{1,0,1,1,1,0,0,1},0)=TRUE
, YOU MUST CTR+SHIFT+ENTER !!! FOR IT TO GIVE AN ARRAY BACK!!!
in the example below i show that i want to sum the hours of all the employees except the admin per case
we have 2 options, the long simple way, the complicated fast way:
long simple way
D2=SUMPRODUCT(C2:C9,(A2=A2:A9)*("admin"<>B2:B9)) <<- SUMPRODUCT makes a multiplication
basically A1={2,3,11,3,2,4,5,6}*{0,1,1,0,0,0,0,0} (IT MUST BE A NUMERIC ARRAY TO THE RIGHT IN SUMPRODUCT!!!)
ie: A1=2*0+3*1+11*1+3*0+2*0+4*0+5*0+6*0
this causes a problem because if you drag the cell to autocomplete the rest of the cells, it will edit the lower and higher values of
ie: D9=SUMPRODUCT(C9:C16,(A9=A9:A16)*("admin"<>B9:B16)), which is out of bounds
same as the above if you have a table and want to view the results in a diferent order
the fast complicated way
D3=SUMPRODUCT(INDIRECT("c2:c9"),(A3=INDIRECT("a2:a9"))*("admin"<>INDIRECT("b2:b9")))
it's the same, except that INDIRECT was used on the cells that we want not be modified when autocompleting or table reorderings
be warned that INDIRECT sometimes give VOLATILE ERROR,i recommend not using it on a single cell or using it only once in an array
f* c* i cant post pictures :(
table is:
case emplyee hours totalHoursPerCaseWithoutAdmin
1 admin 2 14
1 him 3 14
1 her 11 14
2 him 3 5
2 her 2 5
3 you 4 10
3 admin 5 10
3 her 6 10
and for the functions to check the arrays, open the insert function button (it looks like and fx) then doubleclick MATCH and then if you enter inside the Lookup_array a value like
A2=A2:A9 for our example it will give {TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE} that is because only the first 3 lines are from case=1
Something like this?
Assuming that you data in in A1:C20
I am looking for "Bill" in A, "Woods" in B and "some other data" in C
Change as applicable
=IF(INDEX(A1:A20,MATCH("Bill",A1:A20,0),1)="Bill",IF(INDEX(B1:B20,MATCH("Woods",B1:B20,0),1)="Woods",IF(INDEX(C1:C20,MATCH("some other data",C1:C20,0),1)="some other data",MATCH("Bill",A1:A20,0),"Not Found")))
SNAPSHOT
I would use this array* formula (for three criteria):
=MATCH(1,((Range1=Criterion1)*(Range2=Criterion2)*(Range3=Criterion3)),0)
*commit with Ctrl+Shift+Enter

Resources