Excel, convert data in one column to multiple columns - excel

As seen in the picture I have 5 sets of 2's in one column.
I would like it so that each set is in its own column.
Is there a way to do that?
I tried text to columns, but it did not work.

General solution
Imagine I have a vertical array starting in cell B2, which I want to separate into N stacked columns. I will place these columns from cell E4, as the picture indicates.
The code which achieves what I want is:
+OFFSET($B$2,(ROW()-ROW($E$4))*N+(COLUMN()-COLUMN($E$4)),0)
Replace N with your desired number (and the origin and destination cell with your particular values, B2 and E4 in this example), and expand the formula vertically and horizontally to form your desired matrix of N columns. For the case of N=3, you get:
(PS: if your array is horizontal, use transpose to transform to vertical. You can then transpose the resulting matrix, to get the final result.)
Explanation
The logic is simple. The function OFFSET has three compulsory inputs. The first one is the first point of your array you want to transform (in the example above, $B$2. The point you select has an index of 0, the one below an index of 1, etc. So, what you want is to put these ordered index into a matrix form, as shown below (for the case of N=3):
The rule to move these indexes is given in the second entry of the OFFSET function. This is basically a formula that calculates a sequence 0, 1, 2, 3 ... using some fixed values (the number of the row and columns of the first cell where you are putting the result, ROW($E$4) and COLUMN($E$4), which are equal to 4 and 5 respectively), and the variable values of the cell where you are placing the number (ROW() and COLUMN()). The formula computes the difference between actual row and reference row number, scale it by N, and adds any difference between actual and reference column. This formula gives the desired series 0, 1, 2, 3... for our desired output matrix.
Finally, the last item of OFFSET is equal to zero, since we are transforming with a vertical column of data, so no horizontal offset is needed.

You can do it with e.g. formula; enter this to C1 and fill down and right:
=OFFSET($A$1,ROW()-1+(COLUMN()-3)*6,0)

Take the total cells, dived it by 3 and cut and paste. I wasted a 30 mins trying all the solutions offered out there.
I gave up and now my project is complete. Only took about 15 seconds.

To split one column into multiple columns with column first order, in other words, without transpose, we can modify the formula as shown in https://www.extendoffice.com/documents/excel/3132-excel-convert-vector-to-matrix.html, which is the solution for row first order, i.e., with transpose, exchange the roles of ROW() and COLUMN(), example code:
=OFFSET($A$1:$A$10494,ROW()-ROW($B$1)+((COLUMN()-COLUMN($B$1))*(ROWS($A$1:$A$10494)/18)),0,1,1)
Here $a1:$a$10494 is source, $b$1 is destination, 18 is columns numbers to split into.
This can be used to get back the table structure of %debug print output in pdb, for example, which will split the output into narrow bands.

Related

How to find the 3 highest values and respective category for a cell

Here is an example of the data I'm trying to organize:
I'm looking for a way to automatically see the top 3 categories (column) for each Name# (row). The size of the category is determined by the number below the category.
Ideally, I'd also like to see a percentage breakdown (from the total) for each category. For example, in row "Name3" 2 categories make up a significantly larger portion of the total values. However, without this percentage breakdown, the 3 top values would seem to be comparable, when they are in fact, not.
Interested to see how this would all work with duplicate numbers, too.
I've tried Excel's rank function, but this doesn't tell me the categories that have the 3 largest sizes, just the 3 highest values.
With Office 365:
=FILTER(SORTBY($B$1:$H$1,B2:H2,-1),SORT(B2:H2,1,-1,TRUE)>=LARGE(B2:H2,3))
And copy down.
If there are ties it will expand the results to include it. It finds the third highest value and returns everything that is equal to or greater than it.
This approach spills all the results at once (array version). In cell J2, you can put the following formula:
=LET(D, A1:H5, A, TAKE(D,,1), DROP(REDUCE("", DROP(A,1), LAMBDA(ac,aa,
VSTACK(ac, TAKE(SORT(DROP(FILTER(D, (A=aa) + (A="")),,1),2,-1,1),1,3)))),1))
It assumes as per input data the cell A1 is empty (if not it can be adjusted accordingly). Here is the output:
An alternative that doesn't require previous assumption (but it is not really a hard one) is the following:
=LET(names, A2:A5, Data, B2:H5, colors, B1:H1, DROP(REDUCE("", names,
LAMBDA(ac,n, VSTACK(ac, TAKE(SORT(VSTACK(colors, INDEX(Data, XMATCH(n,names),0))
,2,-1,TRUE),1,3)))),1))
The non-array version can be obtained from previous approach, and expand it down:
=TAKE(SORT(VSTACK($B$1:$H$1,INDEX($B$2:$H$5, XMATCH(A2,$A$2:$A$5),0)),2,-1,TRUE),1,3)
Explanation
To spill the entire solution it uses DROP/REDUCE/VSTACK pattern. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length.
For the first formula we filter for a given element of A name (aa) via FILTER the input data (D) to select rows where the name is empty (to consider the header) OR (plus (+) condition) the name is equal to aa. We remove via DROP the first column of the filter result (names column). Next we SORT by the second row (the first rows are the colors) in descending order (-1) by column (last input parameter of SORT we can use TRUE or 1). Finally, we use TAKE to take the first three columns and the first row.
For the second approach, we select the values for a given row (names equals n) and use INDEX to select the entire row (column index 0), then we form an array via VSTACK to add as first row the colors and use the similar logic as in previous approach for sorting and select the corresponding rows and column (colors).
Notes:
If you don't have VSTACK function available, then you can replace it as follow: CHOOSE({1;2}, arr1,arr2) and substitute arr1, arr2, wit the corresponding arrays.
In the second formula instead of INDEX/XMATCH you can use: DROP(FILTER(Data, names=n),,1), it is a matter of personal preference.

Repeat formula based on dynamic range or matrix formula

I'm creating a set of formulas to analyze different sets of json data. I would like to show the uniqueness for each field in the dataset and the top 3 values per field. The json data is pasted on one of the sheets, and the results of my analyses are shown on a different sheet.
An example of some arbitrary raw data:
For this dataset I can create the following formulas (all similar coloured cells are matrix formulas):
Cell A1 contains a formula that dynamically returns all headers (yellow). If the pasted data contains more fields, this list expands automatically. The pink area also grows or shrinks based on the amount of records and fields in the raw data.
What I would like to know is how to setup the following formulas:
Row 2: Return if the values are either all unique, or how many variations are there within each column. I allready have the formula for a single column, but I would like a matrix formula so that it automatically grows or shrinks as well.
Row 3 to 5: Return the top 3 of values within each column.
An example of the header formula (yellow):
=LET(SUB,INDIRECT("A8:"&ADDRESS(8,number_of_fields)),SUBSTITUTE(SUBSTRING(SUB,1,FIND(":",SUB)-1),"""","")
(formula translated from dutch syntax)
I know how to manually copy the formulas over, but I'm sure it's possible to convert this into a matrix formula. For example, is there a function like Repeat, but for formulas repeating for x amount of cells?
Edit after answer: Getting close! The top 3 is almost working as intended. The answer below creates the following result on a more complex dataset:
It sometimes leaves a cell empty in the top 3 for that column. Preferably the top 3 values bubble up to the top, where it populates row 2 and 3 if the column only contains 2 variations.
Maybe a little too literal, but the following formula will spill the top 3 and the splitted data as shown in the picture
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
IFERROR(VSTACK(header,_top3,"","",split),""))
split is all data (below),
_top3 is the top 3 of the frequency of the text per column.
You may only need the _top3 data though..
If I'm not mistaken, this would be the Dutch variant:
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
(I'm Dutch, but I'm not familiar with the Dutch equivalents of the newer functions, since I work with English version and support is contradicting in some times:
NEMEN might be TAKE, since it's listed as NEMEN here https://support.microsoft.com/nl-nl/office/excel-functies-alfabetisch-b3944572-255d-4efb-bb96-c6d90033e188#bm14, but if you click for it, it shows explanation for TAKE in Dutch (https://support.microsoft.com/nl-nl/office/take-functie-25382ff1-5da1-4f78-ab43-f33bd2e4e003) ).
Edit:
To "drop" the trailing boolean column you can add another condition to DROP (WEGLATEN):
WEGLATEN([data],1,-1) this means dropping the first row of the data (condition 1) and it's last column (condition -1):
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1;-1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
And to cope with columns where there's less than 3 top ranked values:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
_top3minus,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,FILTER(INDEX(_top3,,b),INDEX(_top3,,b)<>"")))),""),,1),
IFERROR(VSTACK(header,_top3minus,"","",split),""))

Excel Index to look up multiple values

I have a small data set of 2 columns and several rows (columns A and B)
I want to return each instance of codeblk 3 in a formula that is elsewhere in my sheet, (so a vlookup is out as it only shows the first instance) if it does not appear then a message to say its not there should come up.
I have the formula partially working but i cant see the reason why its not displaying the values.
My formula is as below:
This is an array
{=IF(ISERROR(INDEX($A$55:$B$70,SMALL(IF($B$55:$B$70=3,ROW($B$55:$B$70)),ROW(1:1))-1,1)),"No value's produced",INDEX($A$2:$C$7,SMALL(IF($B$55:$B$70=3,ROW($B$55:$B$70)),ROW(1:1))-1,1))}
The result that shows up is only "No values produced" but it should reflect statement B, C and D in 3 separate cells (when changing ROW(1:1), ROW(2:2) etc)
{=SMALL(IF($B$56:$B$69=4,ROW($B$56:$B$69)),ROW(1:1))} - This produces the result 68 which is the correct row.
Any ideas?
Thanks,
This is an array formula - Validate the formula with Ctrl+Shift+Enter while still in the formula bar
=IFERROR(INDEX($A$55:$B$70,SMALL(IF($B$55:$B$70=3,ROW($B$55:$B$70)-54),ROW(1:1)),1),"No value's produced")
The issue you are facing is that your index starts it's first row on $B$55, you need to offset the row numbers in the array to reflect this. For example, the INDEX contains 16 rows but if you had a match on the first row you are asking for the 55th row from that INDEX(), it just can't fulfil that.
EDIT
The offset was out of sync as your original formula included another -1 outside of the IF(), I also left an additional bracket in play (the formula above has now been edited)
The ROW() function will essentially translate $B$55:$B$70 into ROW(55:70) which will produce the array {55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70} so the offset is needed to translate those row numbers in to the position they represent in the indexed data of INDEX().
The other IF() statement then produces and array of {FALSE;2;3;4;FALSE etc.
You can see these results by highlighting parts of the formula in the formula bar and hitting F9 to calculate.

How to calculate how many bacteria in each group are resistant to an antibiotic

I am attempting to count instances of resistance to 1 or 1+ antibiotics under certain conditions. Here is an example of what what my spreadsheet looks like:
For each drug "1" indicates resistance and "0" indicates sensitivity.
If I wanted to determine the number of bacteria in Group A that are resistant to only one antibiotic how would I do this? Or if I wanted to find how many bacteria in Group A are resistant to 1 or more antibiotics?
I've been struggling with this one for awhile so if anyone could point me in the right direction I would sure appreciate it.
Ideally my output would look like this
These are array formulas. You MUST use Ctrl-Shift-Enter to enter these formulas, and Excel will magically insert curly braces (you cannot insert them yourself).
For exactly 1 resistance, enter into I2 as I have the setup in the pic below :
=SUM(($B$2:$B$11=$H2)*IF(MMULT($C$2:$E$11, TRANSPOSE(COLUMN($C$1:$E$1)^0))=1, 1, 0)) for typeA.
Drag the formula down to get type B as in my set up.
=SUM(($B$2:$B$11=$H3)*IF(MMULT($C$2:$E$11, TRANSPOSE(COLUMN($C$1:$E$1)^0))=1, 1, 0))
For more than 1 resistance, in J2, use:
=SUM(($B$2:$B$11=$H2)*IF(MMULT($C$2:$E$11, TRANSPOSE(COLUMN($C$1:$E$1)^0))>1, 1, 0)*(MMULT(IF($C$2:$E$11=9, 0, 1), TRANSPOSE(COLUMN($C$1:$E$1)^0))= COLUMNS($C$1:$E$1)))
Again, drag the formula down to get typeB...
Note that there are only ever three cell ranges/areas which you need to input into the formulas (albeit in several spots).
The data range for the type column
The data area for the resistance test results
The column headers for the drugs(should be the same width as the test results area).
If you set up named ranges for these areas, then you could put the named range into the formulas, and never have to amend the formula range arguments when the data ranges change size. But that's another point...
Edit for explanation:
If you evaluate formula parts, you might see something like this(for the image I provided)
=SUM(({TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE})*IF(MMULT({1,0,0;1,9,1;0,1,1;9,1,0;1,0,1;1,1,0;9,1,1;0,1,9;0,1,1;1,1,0},{1;1;1})>1, 1, 0)*(MMULT({1,1,1;1,0,1;1,1,1;0,1,1;1,1,1;1,1,1;0,1,1;1,1,0;1,1,1;1,1,1},{1;1;1})= 3))
True or false evaluates as 1 or 0 when multiplied.
The effect of the '^0' is to turn each non-zero number in the resultant arrays into a '1'. This allows matrix multiplication to take place on the datarange to spit out column vectors.
TRANSPOSE({3,4,5}^0) becomes: TRANSPOSE({1,1,1}), which then becomes this: {1;1;1}. Notice the difference between commas and semicolons - this means a 1x3 vector is transposed into a 3x1 vector. Then you can use that as the second parameter into matrix multiplication with a 10x3 matrix(thats the dataset in our case).
The output of that MMULT is a 10x1 column vector representing some output.
The first MMULT is used to test if the row Total of any bacteria is > 1. The second MMULT is used to see if all the row entries for any bacteria are <> 9.
A similar thing is done for the case for Exactly One. The MMULT function is used once to determine if the row total for a bacterium is exactly 1.
Create a new column for "Resistance Count" and use =COUNTIF(B2:D2,">=1") for cell E2 and fill down. Then you can filter the table by Type or Resistance Count. Use SUBTOTAL to count the filtered rows.
Here's what I would do:
1) Create an additional column (named "Count") to the right of your table with the count of 1s in each bacteria row. The formula is: =COUNTIF(C2:E5, "1")
2) Create another column (named "Resistance")to the right of your table with a nested IF statement. The formula is: =IF(F2=1,"Resistant to 1",IF(F2=0,"Resistant to 0","Resistant to More than 1"))
3) Create a Pivot Table with this Data. Put "Resistance" in the Columns Field, "Type" in the Rows Field, and "Sum of Count" in the Values field.
This should give you exactly what you want.

Sort Order formula to alphabetise in Excel

I am currently drawing up a spreadsheet that will automatically remove duplicates and alphabetize a list:
I am using the COUNTIF() function in column G to create a sort order and then VLOOKUP() to find the sort in column J.
The problem I am having is that I can't seem to get my SortOrder column to function properly. At the moment it creates an index for two number 1's meaning the cell highlighted in yellow is missed out and the last entry in the sorted list is null:
If anyone can find and rectify this mistake for me I'll be very grateful as it has been driving me insane all day! Many thanks.
I'll provide my usual method for doing an automatic pulling-in of raw data into a sorted, duplicate-removed list:
Assume raw data is in column A. In column B, use this formula to increase the counter each time the row shows a non-duplicate item in column A. Hardcord B2 to be "1", and use this formula in B3 and drag down.
=if(iserror(match(A3,$A$2:A2,0)),B2+1,B2)
This takes advantage of the fact that when we refer to this row counter in our revised list, we will use the match function, which only checks for the first matching number. Then say you want your new list of data on column D (usually I do this for display purposes, so either 'group-out' [hide] columns that form the formulas, or do this on another tab). You can avoid this step, but if you are already using helper columns I usually do each step in a different column - easier to document. In column C, starting in C3 [C2 hardcoded to 1] and drag down, just have a simple counter, which error-checks to the stop at the end of your list:
=if(C2<max(B:B),C2+1," ")
Then in column D, starting at D2 and dragged down:
=iferror(index(A:A,match(C2,B:B,0)),"")
The index function is like half of the vlookup function - it pulls the result out of a given array, when you provide it with a row number. The match function is like the other half of the vlookup function - it provides you with the row number where an item appears in a given array.
Hope this helps you in the future as well.
The actual reason that this is going wrong as implied by Jeeped's comment is that you can't meaningfully compare a string to a number unless you do a conversion because they are stored differently. So COUNTIF counts numbers and text separately.
20212 will give a count of 1 because it is the only (or lowest) number.
CS10Z002 will give a count of 1 because it is the first text string in alphabetical order.
Another approach is to add the count of numbers to the count if the current cell contains text:-
=COUNTIF(INDIRECT("$D$2:$D$"&$F$3),"<="&D2)+ISTEXT(D2)*COUNT(INDIRECT("$D$2:$D$"&$F$3))
It's easier to show the result of three different conversions with some test data:-
(0) No conversion - just use COUNTIF
=COUNTIF(D$2:D$7,"<="&D2)
"999"<"abc"<"def", 999<1000
(1) Count everything as text
=SUMPRODUCT(--(D$2:D$7&""<=D2&""))
"1000"<"999"
(2) Count numbers before text
=COUNTIF(D$2:D$7,"<="&D2)+ISTEXT(D2)*COUNT(D$2:D$7)
999<1000<"999"
(3) Count everything as text but convert numbers with leading zeroes
=SUMPRODUCT(--(TEXT(D$2:D$7,"000000")<=TEXT(D2,"000000")))
"000999" = "000999", "000999"<"001000"

Resources