Related
Here is an example of the data I'm trying to organize:
I'm looking for a way to automatically see the top 3 categories (column) for each Name# (row). The size of the category is determined by the number below the category.
Ideally, I'd also like to see a percentage breakdown (from the total) for each category. For example, in row "Name3" 2 categories make up a significantly larger portion of the total values. However, without this percentage breakdown, the 3 top values would seem to be comparable, when they are in fact, not.
Interested to see how this would all work with duplicate numbers, too.
I've tried Excel's rank function, but this doesn't tell me the categories that have the 3 largest sizes, just the 3 highest values.
With Office 365:
=FILTER(SORTBY($B$1:$H$1,B2:H2,-1),SORT(B2:H2,1,-1,TRUE)>=LARGE(B2:H2,3))
And copy down.
If there are ties it will expand the results to include it. It finds the third highest value and returns everything that is equal to or greater than it.
This approach spills all the results at once (array version). In cell J2, you can put the following formula:
=LET(D, A1:H5, A, TAKE(D,,1), DROP(REDUCE("", DROP(A,1), LAMBDA(ac,aa,
VSTACK(ac, TAKE(SORT(DROP(FILTER(D, (A=aa) + (A="")),,1),2,-1,1),1,3)))),1))
It assumes as per input data the cell A1 is empty (if not it can be adjusted accordingly). Here is the output:
An alternative that doesn't require previous assumption (but it is not really a hard one) is the following:
=LET(names, A2:A5, Data, B2:H5, colors, B1:H1, DROP(REDUCE("", names,
LAMBDA(ac,n, VSTACK(ac, TAKE(SORT(VSTACK(colors, INDEX(Data, XMATCH(n,names),0))
,2,-1,TRUE),1,3)))),1))
The non-array version can be obtained from previous approach, and expand it down:
=TAKE(SORT(VSTACK($B$1:$H$1,INDEX($B$2:$H$5, XMATCH(A2,$A$2:$A$5),0)),2,-1,TRUE),1,3)
Explanation
To spill the entire solution it uses DROP/REDUCE/VSTACK pattern. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length.
For the first formula we filter for a given element of A name (aa) via FILTER the input data (D) to select rows where the name is empty (to consider the header) OR (plus (+) condition) the name is equal to aa. We remove via DROP the first column of the filter result (names column). Next we SORT by the second row (the first rows are the colors) in descending order (-1) by column (last input parameter of SORT we can use TRUE or 1). Finally, we use TAKE to take the first three columns and the first row.
For the second approach, we select the values for a given row (names equals n) and use INDEX to select the entire row (column index 0), then we form an array via VSTACK to add as first row the colors and use the similar logic as in previous approach for sorting and select the corresponding rows and column (colors).
Notes:
If you don't have VSTACK function available, then you can replace it as follow: CHOOSE({1;2}, arr1,arr2) and substitute arr1, arr2, wit the corresponding arrays.
In the second formula instead of INDEX/XMATCH you can use: DROP(FILTER(Data, names=n),,1), it is a matter of personal preference.
I have a huge amount of data to process in which 4 points with a related prefix needs to be subtracted from each other.
Data consists of ID and x value
Example
ID = 290.12, 290.03, 290.06, 290.09, 300.12, 300.03, 300.06, 300.09, 301.12, 301.03, 301.06, 301.09
(let's call prefix a "ring number" and suffix time on the clock)
X value = any numerical value for each ID assigned
What I'm hoping to do is to search for the first number before the dot i.e. 300 and then subtract the value of 300.06-300.12 in one cell and in another cell 300.03-300.09.
(The subtraction is just an example, how I need to manipulate with the numbers is slightly more complicated, but I got this one under control)
This is my actual Data and what I need to produce is to the right of the raw data. At the moment, I'm doing it manually for each set of "rings"
Anyone knows how to approach this? I'm thinking vlookup, but I'm not very proficient in excel.
New Excel
I tried vlookup, but I don't know how to construct the formula and I run out of ideas.
Edit:
I found out that REDUCE is no requirement in this case, so it can be shortened to:
=SQRT(SUM(((INDEX(B:D,XMATCH(I3+0.09,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(I3+0.03,A:A),SEQUENCE(1,3)))^2)))
You could change +0.09 and +0.03 to your needs and may reference them using LET() for easy maintaining:
=LET(id,I3,
_id1,0.09,
_id2,0.03,
SQRT(SUM(((INDEX(B:D,XMATCH(id+_id1,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(id+_id2,A:A),SEQUENCE(1,3)))^2))))
Previous answer:
=LET(
id,I3,
_id1,0.09,
_id2,0.03,
SQRT(
REDUCE(0, SEQUENCE(1,3),
LAMBDA(x, y,
x+((INDEX(B:D,XMATCH(id+_id1,A:A),y)
-INDEX(B:D,XMATCH(id+_id2,A:A),y))
^2)))))
This formula looks for the matching value of the id value I3 + _id1 minus the matching value of id value + _id2 for columns B to D and adds the ^2 results per column. Then it calculates it's square root.
You can change _id1 and _id2 to your needs.
To calculate the Delta (as shown) at once you could use:
=LET(id,I3,
_id1,0.09,
_id2,0.03,
_id3,0.12,
_id4,0.06,
x,SQRT(SUM((INDEX(B:D,XMATCH(id+_id1,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(id+_id2,A:A),SEQUENCE(1,3)))^2)),
y,SQRT(SUM((INDEX(B:D,XMATCH(id+_id3,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(id+_id4,A:A),SEQUENCE(1,3)))^2)),
(x-y)*1000)
You can have a column of unique values of the integers and a new column where you reference these values as id and drag down the formula to get your row by row result
In another column you can refer to these columns and sort per the second column using SORTBY()
I'm creating a set of formulas to analyze different sets of json data. I would like to show the uniqueness for each field in the dataset and the top 3 values per field. The json data is pasted on one of the sheets, and the results of my analyses are shown on a different sheet.
An example of some arbitrary raw data:
For this dataset I can create the following formulas (all similar coloured cells are matrix formulas):
Cell A1 contains a formula that dynamically returns all headers (yellow). If the pasted data contains more fields, this list expands automatically. The pink area also grows or shrinks based on the amount of records and fields in the raw data.
What I would like to know is how to setup the following formulas:
Row 2: Return if the values are either all unique, or how many variations are there within each column. I allready have the formula for a single column, but I would like a matrix formula so that it automatically grows or shrinks as well.
Row 3 to 5: Return the top 3 of values within each column.
An example of the header formula (yellow):
=LET(SUB,INDIRECT("A8:"&ADDRESS(8,number_of_fields)),SUBSTITUTE(SUBSTRING(SUB,1,FIND(":",SUB)-1),"""","")
(formula translated from dutch syntax)
I know how to manually copy the formulas over, but I'm sure it's possible to convert this into a matrix formula. For example, is there a function like Repeat, but for formulas repeating for x amount of cells?
Edit after answer: Getting close! The top 3 is almost working as intended. The answer below creates the following result on a more complex dataset:
It sometimes leaves a cell empty in the top 3 for that column. Preferably the top 3 values bubble up to the top, where it populates row 2 and 3 if the column only contains 2 variations.
Maybe a little too literal, but the following formula will spill the top 3 and the splitted data as shown in the picture
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
IFERROR(VSTACK(header,_top3,"","",split),""))
split is all data (below),
_top3 is the top 3 of the frequency of the text per column.
You may only need the _top3 data though..
If I'm not mistaken, this would be the Dutch variant:
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
(I'm Dutch, but I'm not familiar with the Dutch equivalents of the newer functions, since I work with English version and support is contradicting in some times:
NEMEN might be TAKE, since it's listed as NEMEN here https://support.microsoft.com/nl-nl/office/excel-functies-alfabetisch-b3944572-255d-4efb-bb96-c6d90033e188#bm14, but if you click for it, it shows explanation for TAKE in Dutch (https://support.microsoft.com/nl-nl/office/take-functie-25382ff1-5da1-4f78-ab43-f33bd2e4e003) ).
Edit:
To "drop" the trailing boolean column you can add another condition to DROP (WEGLATEN):
WEGLATEN([data],1,-1) this means dropping the first row of the data (condition 1) and it's last column (condition -1):
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1;-1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
And to cope with columns where there's less than 3 top ranked values:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
_top3minus,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,FILTER(INDEX(_top3,,b),INDEX(_top3,,b)<>"")))),""),,1),
IFERROR(VSTACK(header,_top3minus,"","",split),""))
I am working with a data table that contains two columns ( Max Room Height, # of bulbs). Intent is to create a formula which provides the output on # of bulbs to use after the user inputs room height. Here’s the trick, room height number that user inputs can be a random number, and could lie between two max room heights. For example , room height data is available as 10 feet, 12 feet, 14 feet, 16 feet and the and user inputs room height as 15 feet, formula should be able to pick up the # of bulbs corresponding to 16 feet height.
You can try using some of the built-in Excel functions to determine the number of bulbs to use based on the room height. Here's an example using INDEX(...), MATCH(...), and MIN(...):
I do not have access to Excel at the moment, but this worked in LibreOffice Calc v5.1.6.2. Excel seems to have comparable functions.
Long Winded Explanation
Read on, in case a picture is not worth a thousand words!
Table/Data Configuration
Create a table with two columns (Columns E and F in the example image).
The first column (Column E) represents the upper height boundaries in descending order. Note the upper bound of 10000.
The second column (Column F) represents the number of bulbs to use if a given room height falls within the height boundary.
So for this example, values between 10000 (inclusive) and 16 (exclusive) should state to forget light bulbs and just use the sun. Values between 16 (inclusive) and 14 (exclusive) should state to use 4 bulbs. Values between 14 (inclusive) and 12 (exclusive) should state to use 3 bulbs... etc.
Create a second table that has two columns for the data and results (Columns A and B in the example below; Column C is for illustrative purposes only).
The first column (Column A) will contain the "user input" (i.e. variable Room Heights whose number of bulbs should be looked up).
The second column (Column B) will contain the formula to calculate the number of bulbs to use based on the "user input" and the height boundaries defined in the first table that we created.
Formula/Calculation
Let's break down the formula that will go into the Column B cells. I left the text for the formula in Column C. You'll notice that only the first parameter of the MIN(...) function changes for each row. The rest of the formula is the same for each row.
Using row 2 as an example, we make use of 3 functions, nested together:
MIN(A2,E2) - We want to make sure the room height falls within our handled range. This works in conjunction with the the arbitrary upper bound of 10000 that was added to Column E. If we didn't force the data to fit within the upper boundary, we'd probably see some kind of error if the user exceeded the largest value specified in Column E.
MATCH(MIN(A2,E2),E2:E6,-1) - Essentially, this function finds the height range boundary under which the user entered data falls. This function has three parameters. The first is the user entered data (or the arbitrary upper bound)... MIN(A2,E2) for this row. The second is the height range boundaries (in descending order)... E2:E6. The third is the matching type... -1. The matching type of -1 means "search a descending list of values and stop when your given value (i.e. the first parameter) is equal to or less than a value in the descending list". If the first item in the descending list meets the criteria, the MATCH(...) function returns an index of 1. If the second item in the descending list meets the criteria, the function returns an index of 2... etc.
INDEX(F2:F6,MATCH(MIN(A2,E2),E2:E6,-1)) - This function essentially looks up our "answer" for the user's input. We found the "index" or "list position" of the height range for the user's input using the MATCH(...) function and we created our table such that the bulb-count for each height range is in the same row (i.e. it has the same "index" or "list position"). The INDEX(...) function accepts two parameters. The first is the range of cells containing the "answers"... F2:F6. The second parameter is the index or list position from the answer cell range that we want to return (i.e. the results of our MATCH(...) function). So if our MATCH(...) function call returns "1", the first cell from the F2:F6 range will be returned (i.e. F2 - Use the Sun!). If our MATCH(...) function call returns "2", the second cell from the F2:F6 range will be returned (i.e. F3 - 4 bulbs)... etc.
There might be better solutions out there depending on the version of Excel you are using. According to the Office documentation as of this writing, the functions used here should be valid for Excel 2007 through 2016.
I am attempting to count instances of resistance to 1 or 1+ antibiotics under certain conditions. Here is an example of what what my spreadsheet looks like:
For each drug "1" indicates resistance and "0" indicates sensitivity.
If I wanted to determine the number of bacteria in Group A that are resistant to only one antibiotic how would I do this? Or if I wanted to find how many bacteria in Group A are resistant to 1 or more antibiotics?
I've been struggling with this one for awhile so if anyone could point me in the right direction I would sure appreciate it.
Ideally my output would look like this
These are array formulas. You MUST use Ctrl-Shift-Enter to enter these formulas, and Excel will magically insert curly braces (you cannot insert them yourself).
For exactly 1 resistance, enter into I2 as I have the setup in the pic below :
=SUM(($B$2:$B$11=$H2)*IF(MMULT($C$2:$E$11, TRANSPOSE(COLUMN($C$1:$E$1)^0))=1, 1, 0)) for typeA.
Drag the formula down to get type B as in my set up.
=SUM(($B$2:$B$11=$H3)*IF(MMULT($C$2:$E$11, TRANSPOSE(COLUMN($C$1:$E$1)^0))=1, 1, 0))
For more than 1 resistance, in J2, use:
=SUM(($B$2:$B$11=$H2)*IF(MMULT($C$2:$E$11, TRANSPOSE(COLUMN($C$1:$E$1)^0))>1, 1, 0)*(MMULT(IF($C$2:$E$11=9, 0, 1), TRANSPOSE(COLUMN($C$1:$E$1)^0))= COLUMNS($C$1:$E$1)))
Again, drag the formula down to get typeB...
Note that there are only ever three cell ranges/areas which you need to input into the formulas (albeit in several spots).
The data range for the type column
The data area for the resistance test results
The column headers for the drugs(should be the same width as the test results area).
If you set up named ranges for these areas, then you could put the named range into the formulas, and never have to amend the formula range arguments when the data ranges change size. But that's another point...
Edit for explanation:
If you evaluate formula parts, you might see something like this(for the image I provided)
=SUM(({TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE})*IF(MMULT({1,0,0;1,9,1;0,1,1;9,1,0;1,0,1;1,1,0;9,1,1;0,1,9;0,1,1;1,1,0},{1;1;1})>1, 1, 0)*(MMULT({1,1,1;1,0,1;1,1,1;0,1,1;1,1,1;1,1,1;0,1,1;1,1,0;1,1,1;1,1,1},{1;1;1})= 3))
True or false evaluates as 1 or 0 when multiplied.
The effect of the '^0' is to turn each non-zero number in the resultant arrays into a '1'. This allows matrix multiplication to take place on the datarange to spit out column vectors.
TRANSPOSE({3,4,5}^0) becomes: TRANSPOSE({1,1,1}), which then becomes this: {1;1;1}. Notice the difference between commas and semicolons - this means a 1x3 vector is transposed into a 3x1 vector. Then you can use that as the second parameter into matrix multiplication with a 10x3 matrix(thats the dataset in our case).
The output of that MMULT is a 10x1 column vector representing some output.
The first MMULT is used to test if the row Total of any bacteria is > 1. The second MMULT is used to see if all the row entries for any bacteria are <> 9.
A similar thing is done for the case for Exactly One. The MMULT function is used once to determine if the row total for a bacterium is exactly 1.
Create a new column for "Resistance Count" and use =COUNTIF(B2:D2,">=1") for cell E2 and fill down. Then you can filter the table by Type or Resistance Count. Use SUBTOTAL to count the filtered rows.
Here's what I would do:
1) Create an additional column (named "Count") to the right of your table with the count of 1s in each bacteria row. The formula is: =COUNTIF(C2:E5, "1")
2) Create another column (named "Resistance")to the right of your table with a nested IF statement. The formula is: =IF(F2=1,"Resistant to 1",IF(F2=0,"Resistant to 0","Resistant to More than 1"))
3) Create a Pivot Table with this Data. Put "Resistance" in the Columns Field, "Type" in the Rows Field, and "Sum of Count" in the Values field.
This should give you exactly what you want.