Repeat formula based on dynamic range or matrix formula - excel-formula

I'm creating a set of formulas to analyze different sets of json data. I would like to show the uniqueness for each field in the dataset and the top 3 values per field. The json data is pasted on one of the sheets, and the results of my analyses are shown on a different sheet.
An example of some arbitrary raw data:
For this dataset I can create the following formulas (all similar coloured cells are matrix formulas):
Cell A1 contains a formula that dynamically returns all headers (yellow). If the pasted data contains more fields, this list expands automatically. The pink area also grows or shrinks based on the amount of records and fields in the raw data.
What I would like to know is how to set up the following formulas:
Row 2: Return whether the values are all unique or, if not, how many variations there are within each column. I already have the formula for a single column, but I would like a matrix formula so that it automatically grows or shrinks as well.
Row 3 to 5: Return the top 3 of values within each column.
An example of the header formula (yellow):
=LET(SUB,INDIRECT("A8:"&ADDRESS(8,number_of_fields)),SUBSTITUTE(MID(SUB,1,FIND(":",SUB)-1),"""",""))
(formula translated from dutch syntax)
I know how to manually copy the formulas over, but I'm sure it's possible to convert this into a matrix formula. For example, is there a function like Repeat, but for formulas repeating for x amount of cells?
Edit after answer: Getting close! The top 3 is almost working as intended. The answer below creates the following result on a more complex dataset:
It sometimes leaves a cell empty in the top 3 for that column. Preferably the top 3 values bubble up to the top, where it populates row 2 and 3 if the column only contains 2 variations.

Maybe a little too literal, but the following formula will spill the top 3 and the split data as shown in the picture:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
IFERROR(VSTACK(header,_top3,"","",split),""))
split is all data (below),
_top3 is the top 3 of the frequency of the text per column.
You may only need the _top3 data, though.
If I'm not mistaken, this would be the Dutch variant:
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
(I'm Dutch, but I'm not familiar with the Dutch equivalents of the newer functions, since I work with the English version, and the support pages sometimes contradict each other:
NEMEN might be TAKE, since it's listed as NEMEN here https://support.microsoft.com/nl-nl/office/excel-functies-alfabetisch-b3944572-255d-4efb-bb96-c6d90033e188#bm14, but if you click for it, it shows explanation for TAKE in Dutch (https://support.microsoft.com/nl-nl/office/take-functie-25382ff1-5da1-4f78-ab43-f33bd2e4e003) ).
Edit:
To "drop" the trailing boolean column you can add another argument to DROP (WEGLATEN):
WEGLATEN([data];1;-1) means dropping the first row of the data (argument 1) and its last column (argument -1):
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1;-1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
And to cope with columns where there are fewer than 3 top-ranked values:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
_top3minus,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,FILTER(INDEX(_top3,,b),INDEX(_top3,,b)<>"")))),""),,1),
IFERROR(VSTACK(header,_top3minus,"","",split),""))
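For readers who want to check the expected output outside Excel, the top-3 logic (count each value per column, sort by count descending, keep at most three, and let shorter lists bubble up to the top) can be sketched in Python. This is only an illustration of the logic, not a translation of the LET formula; the function name top3_per_column and the sample records are made up for the example.

```python
from collections import Counter

def top3_per_column(records):
    """Return {field: ["value (count)", ...]} with at most three entries per field.

    `records` is a hypothetical list of dicts, one dict per parsed JSON record.
    """
    # Collect field names in first-seen order, like a dynamic header row.
    fields = []
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)

    result = {}
    for field in fields:
        counts = Counter(rec[field] for rec in records if field in rec)
        # most_common(3) sorts by count, descending; a column with fewer than
        # three distinct values simply yields a shorter list (no empty gaps).
        result[field] = [f"{value} ({n})" for value, n in counts.most_common(3)]
    return result

sample = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "S"},
    {"color": "red", "size": "S"},
]
print(top3_per_column(sample))
```

Because each column's list is built independently, a column with only two variations fills the first two result rows and leaves the third one empty.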


How to find the 3 highest values and respective category for a cell

Here is an example of the data I'm trying to organize:
I'm looking for a way to automatically see the top 3 categories (column) for each Name# (row). The size of the category is determined by the number below the category.
Ideally, I'd also like to see a percentage breakdown (from the total) for each category. For example, in row "Name3" 2 categories make up a significantly larger portion of the total values. However, without this percentage breakdown, the 3 top values would seem to be comparable, when they are in fact, not.
Interested to see how this would all work with duplicate numbers, too.
I've tried Excel's rank function, but this doesn't tell me the categories that have the 3 largest sizes, just the 3 highest values.
With Office 365:
=FILTER(SORTBY($B$1:$H$1,B2:H2,-1),SORT(B2:H2,1,-1,TRUE)>=LARGE(B2:H2,3))
And copy down.
If there are ties it will expand the results to include it. It finds the third highest value and returns everything that is equal to or greater than it.
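The tie behaviour described above can be sketched in Python to make the rule explicit: find the third-largest value and keep every category whose value is at least that large. The function name top3_with_ties and the sample data are hypothetical, and the sketch assumes the row has at least three values.

```python
def top3_with_ties(categories, values):
    """Return the categories whose value is >= the third-largest value.

    Assumes at least three values; results are ordered from largest to smallest.
    """
    threshold = sorted(values, reverse=True)[2]  # same role as LARGE(range, 3)
    paired = sorted(zip(categories, values), key=lambda cv: cv[1], reverse=True)
    return [category for category, value in paired if value >= threshold]

# Two categories tie for third place, so four names come back.
print(top3_with_ties(["A", "B", "C", "D", "E"], [5, 9, 5, 7, 1]))
```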
This approach spills all the results at once (array version). In cell J2, you can put the following formula:
=LET(D, A1:H5, A, TAKE(D,,1), DROP(REDUCE("", DROP(A,1), LAMBDA(ac,aa,
VSTACK(ac, TAKE(SORT(DROP(FILTER(D, (A=aa) + (A="")),,1),2,-1,1),1,3)))),1))
It assumes that cell A1 is empty, as in the input data (if not, the formula can be adjusted accordingly). Here is the output:
An alternative that doesn't rely on that assumption (although it is not a very restrictive one) is the following:
=LET(names, A2:A5, Data, B2:H5, colors, B1:H1, DROP(REDUCE("", names,
LAMBDA(ac,n, VSTACK(ac, TAKE(SORT(VSTACK(colors, INDEX(Data, XMATCH(n,names),0))
,2,-1,TRUE),1,3)))),1))
The non-array version can be derived from the previous approach; enter it in the first row and fill it down:
=TAKE(SORT(VSTACK($B$1:$H$1,INDEX($B$2:$H$5, XMATCH(A2,$A$2:$A$5),0)),2,-1,TRUE),1,3)
Explanation
To spill the entire solution at once, it uses the DROP/REDUCE/VSTACK pattern. See my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length.
For the first formula, for each name (aa) in column A we use FILTER on the input data (D) to keep the rows where the name is empty (the header row) OR (the plus (+) condition) the name equals aa. DROP then removes the first column of the filtered result (the names column). Next, SORT orders the columns by the second row (the first row holds the colors) in descending order (-1); the last argument of SORT (TRUE or 1) makes it sort by column. Finally, TAKE keeps the first row and the first three columns.
For the second approach, we locate the row for a given name (names equals n) and use INDEX with column index 0 to return the entire row. VSTACK then adds the colors on top as the first row, and the same sort-and-take logic as in the first approach selects the top three colors.
Notes:
If you don't have the VSTACK function available, you can replace it as follows: CHOOSE({1;2}, arr1,arr2), substituting arr1 and arr2 with the corresponding arrays.
In the second formula, instead of INDEX/XMATCH you can use FILTER(Data, names=n); it is a matter of personal preference. (Data does not include the names column, so nothing needs to be dropped from the result.)
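The question also asks for a percentage breakdown, which the formulas above do not return. As a rough sketch of the same sort-and-take logic with the share of the row total added, here is a small Python example. The names top3_with_share, colors and table are invented for illustration, and the sketch assumes every row has a non-zero total.

```python
def top3_with_share(categories, row_values):
    """Return up to three (category, value, percent_of_row_total) tuples.

    Assumes the row total is not zero.
    """
    total = sum(row_values)
    # Sort the (category, value) pairs by value, largest first, and keep three.
    ranked = sorted(zip(categories, row_values), key=lambda cv: cv[1], reverse=True)[:3]
    return [(c, v, round(100 * v / total, 1)) for c, v in ranked]

colors = ["Red", "Blue", "Green", "Yellow"]
table = {
    "Name1": [10, 40, 30, 20],
    "Name2": [5, 5, 80, 10],
}
for name, values in table.items():
    print(name, top3_with_share(colors, values))
```

In the second row one category carries most of the total, which is the kind of imbalance that the top-3 names alone would hide.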

SumIF Using Table/Named Range Instead of Single Cell Criteria

I have 2 sheets in a workbook (Sheet1, Sheet2).
Sheet 2 contains a table (Named Table1) with 5 columns:
Takeaways
Household
Clothing
Fuel
Groceries
On sheet one, I have 2 columns:
Expense Name
Expense Total
Now, what I am trying to do is:
Set the range for the Expense Name (Range 1)
Set the range for the Expense Total (Range 2)
Compare Range 1 with the respective column in the table and only add up the values for matches
For example, in Range 1 (B6:B16):
BP
Caltex
McDonalds
KFC
In Range 2 (C6:C16):
300
400
200
150
Now, all I want to do is add up the values for the Takeaways (McDonalds, KFC) and exclude anything that DOES NOT match the criteria.
So my sum total will be all occurrences of Takeaways - provided they are listed in my table - 350 in this case.
But I cannot seem to get the formula to work.
I used these sources:
https://exceljet.net/excel-functions/excel-sumifs-function
Selecting a Specific Column of a Named Range for the SUMIF Function
and ended up with this formula:
=SUMIF($B$6:$B$16;Table1[Takeaways];C6:C16)
This source:
https://excelchamps.com/blog/sumif-sumifs-or-logic/
and ended up with this formula:
=SUM(SUMIFS(C6:C16;B6:B16;Table1[Takeaways]))
Both formulae return 0.
BUT, with BOTH of them, if I change Table1[Takeaways] to "McDonalds", then it correctly identifies every occurrence of the word "McDonalds" in Range 1.
EDIT:
I have updated the formulae above to match the images below.
This is the table that contains the references:
This table contains the data:
Formula:
Cell C4 (Next to Takeaways): =SUMIF($B$6:B$16;Table1[Takeaways];C6:C16)
Cell C5 (Next to Fuel): =SUM(SUMIFS(C6:C16;B6:B16;Table1[Fuel]))
It appears that ONLY BP is being detected in the formula.
This is an output table when I use the formulae with a single cell reference and not a table or named range:
Formula:
Cell F4 (Next to BP): =SUMIF($B$6:B$16;"BP";C6:C16)
Cell F5 (Next to Caltex): =SUM(SUMIFS(C6:C16;B6:B16;"Caltex"))
Cell F6 (Next to McDonalds): =SUMIF($B$6:B$16;"McDonalds";C6:C16)
Cell F7 (Next to KFC): =SUM(SUMIFS(C6:C16;B6:B16;"KFC"))
If I understand correctly what you're trying to achieve, I think your setup is not right conceptually.
It looks like you're trying to track expenses, and each expense (or payee) is allocated to a category ("Takeaways", "Household" etc.). From a relational-model point of view, your second table (which defines the category for each expense/payee) should only have two columns (or variables): Expense Name and Expense Category.
The table you set up ('Sheet 2') uses the categories (i.e., possible values) as different columns (i.e., variables). But there's only one variable, namely the "Expense Category", and the categories themselves are its possible values.
If you set it up like that, the problem changes: you can add a dependent column to your first table that shows the category for each payee (or "Expense Name"), using a VLOOKUP() from the second table.
You can then sum the expenses for all payees matching that category.
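The two-column lookup idea can be sketched in Python as follows. The mapping mirrors the payees from the question; assigning BP and Caltex to "Fuel" is my assumption, and the names category_of, expenses and totals_by_category are made up for the example.

```python
from collections import defaultdict

# Lookup table with two columns: Expense Name -> Expense Category.
# The Fuel assignments are assumed for illustration.
category_of = {
    "McDonalds": "Takeaways",
    "KFC": "Takeaways",
    "BP": "Fuel",
    "Caltex": "Fuel",
}

# Data table: (Expense Name, Expense Total).
expenses = [("BP", 300), ("Caltex", 400), ("McDonalds", 200), ("KFC", 150)]

def totals_by_category(expenses, category_of):
    """Look up each payee's category (the VLOOKUP step), then sum per category."""
    totals = defaultdict(int)
    for payee, amount in expenses:
        category = category_of.get(payee)
        if category is not None:  # payees missing from the lookup are skipped
            totals[category] += amount
    return dict(totals)

print(totals_by_category(expenses, category_of))
```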
Note: I've created the illustration using LibreOffice Calc, so there might be some small differences, but the logic is the same.
Without seeing the data in L and K I can't give you a full answer - but likely it's to do with the way you're pulling your Array
Try something similar to this
=SUMPRODUCT(SUMIFS($L$11:$L$43,$K$11:$K$43,CHOOSE({1,2},Takeaways,"anything else you wanted to sum")))
Remember SUMIFS is for multiple criteria, so if you're only calculating one, you'll need =SUMPRODUCT(SUMIF(
The way the above works is with vertical vectors only, but changing your named ranges so the table of 2 columns is 2 named ranges instead should be okay - unless it's part of your requirements
Table 2 would become expense_Name and expense_Total etc
I was about to close this as a duplicate of my own question here but there is a bit of a difference in using a named range I think. However the logic behind this follows more or less the same approach.
Working further on my partial solution below I derived the following formula:
=SUMPRODUCT(COUNTIF(Table1[Takeaways];Range1)*Range2)
The COUNTIF() part counts how many times each value of Range1 occurs in your table column, so make sure there are no duplicates in your table. If the value is present in the table the result of COUNTIF() will be 1; otherwise it will be 0. This way we create a matrix of 1's and 0's. By multiplying it with Range2 inside SUMPRODUCT() we force Excel to perform matrix calculations and return the correct result.
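To see why duplicates in the table matter, the COUNTIF-times-amount idea can be mimicked in Python. This is a sketch of the arithmetic only: sum_if_in_list is a hypothetical name, and unlike COUNTIF the comparison here is case-sensitive.

```python
def sum_if_in_list(names, amounts, allowed):
    """Multiply each amount by how often its name occurs in `allowed`, then sum."""
    counts = [allowed.count(name) for name in names]  # 0 when absent, 1 when listed once
    return sum(count * amount for count, amount in zip(counts, amounts))

names = ["BP", "Caltex", "McDonalds", "KFC"]
amounts = [300, 400, 200, 150]

print(sum_if_in_list(names, amounts, ["McDonalds", "KFC"]))         # 350
print(sum_if_in_list(names, amounts, ["McDonalds", "KFC", "KFC"]))  # 500: KFC counted twice
```

The second call shows the double counting that a duplicated table entry would cause.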
Partial solution
I used the following formula:
=SUMPRODUCT(ISNUMBER(MATCH(Range1;Table1[Takeaways]))*Range2)
The formula does the following:
The MATCH() function checks if the value in Range1 is present in your table and returns the position of the matching value in your table.
The ISNUMBER() function checks if a match is found by checking if the MATCH() function returned a number.
Multiplying this with Range2 forces matrix calculation, using the SUMPRODUCT() function
EDIT:
This worked for a really limited sample. As soon as I added the fourth row to my data the formula stopped working as intended. See screenshot:
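It took the first two values into the sum correctly, but the fourth is not taken into account. A likely cause is that MATCH() is called here without its third argument, so it performs an approximate match that expects sorted data; passing 0 as the third argument requests an exact match.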

Excel, convert data in one column to multiple columns

As seen in the picture I have 5 sets of 2's in one column.
I would like it so that each set is in its own column.
Is there a way to do that?
I tried text to columns, but it did not work.
General solution
Imagine I have a vertical array starting in cell B2, which I want to separate into N stacked columns. I will place these columns from cell E4, as the picture indicates.
The code which achieves what I want is:
=OFFSET($B$2,(ROW()-ROW($E$4))*N+(COLUMN()-COLUMN($E$4)),0)
Replace N with your desired number (and the origin and destination cell with your particular values, B2 and E4 in this example), and expand the formula vertically and horizontally to form your desired matrix of N columns. For the case of N=3, you get:
(PS: if your array is horizontal, use transpose to transform to vertical. You can then transpose the resulting matrix, to get the final result.)
Explanation
The logic is simple. The function OFFSET has three compulsory inputs. The first one is the first point of the array you want to transform (in the example above, $B$2). The point you select has an index of 0, the one below an index of 1, etc. So, what you want is to put these ordered indexes into matrix form, as shown below (for the case of N=3):
The rule to move through these indexes is given in the second argument of the OFFSET function. This is basically a formula that calculates the sequence 0, 1, 2, 3 ... using some fixed values (the row and column numbers of the first cell where you are putting the result, ROW($E$4) and COLUMN($E$4), which are equal to 4 and 5 respectively) and the variable values of the cell where the formula sits (ROW() and COLUMN()). The formula computes the difference between the actual row and the reference row, scales it by N, and adds the difference between the actual and reference columns. This gives the desired series 0, 1, 2, 3... across the output matrix.
Finally, the last item of OFFSET is equal to zero, since we are transforming with a vertical column of data, so no horizontal offset is needed.
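The index arithmetic can be checked with a few lines of Python. This is a sketch of the calculation only; offset_index and reshape_row_first are hypothetical names, and cells past the end of the data are shown as None rather than as empty cells.

```python
def offset_index(row, col, first_row, first_col, n):
    """Row offset into the source column for the cell at (row, col)."""
    return (row - first_row) * n + (col - first_col)

def reshape_row_first(values, n):
    """Lay a vertical list out in rows of n cells, filling each row left to right."""
    rows = -(-len(values) // n)  # ceiling division
    grid = []
    for r in range(rows):
        line = []
        for c in range(n):
            i = offset_index(r, c, 0, 0, n)
            line.append(values[i] if i < len(values) else None)
        grid.append(line)
    return grid

# Destination starts at E4 (row 4, column 5) with N = 3:
print(offset_index(4, 5, 4, 5, 3))  # E4 -> 0
print(offset_index(5, 6, 4, 5, 3))  # F5 -> 4
print(reshape_row_first(list(range(7)), 3))
```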
You can do it with e.g. formula; enter this to C1 and fill down and right:
=OFFSET($A$1,ROW()-1+(COLUMN()-3)*6,0)
Take the total number of cells, divide it by 3, and cut and paste. I wasted 30 minutes trying all the solutions offered out there.
I gave up and now my project is complete. Only took about 15 seconds.
To split one column into multiple columns in column-first order (in other words, without transposing), we can modify the formula shown in https://www.extendoffice.com/documents/excel/3132-excel-convert-vector-to-matrix.html. That formula is the solution for row-first order (i.e., with transposing); exchanging the roles of ROW() and COLUMN() gives the column-first version. Example code:
=OFFSET($A$1:$A$10494,ROW()-ROW($B$1)+((COLUMN()-COLUMN($B$1))*(ROWS($A$1:$A$10494)/18)),0,1,1)
Here $A$1:$A$10494 is the source, $B$1 is the destination, and 18 is the number of columns to split into.
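The column-first layout can be sketched in Python in the same way: the row offset into the source is the destination row plus the destination column times the number of rows per column. The name reshape_column_first is made up, and the sketch assumes the number of values divides evenly by the number of columns.

```python
def reshape_column_first(values, columns):
    """Split a vertical list into `columns` columns, filling each column top to bottom.

    Assumes len(values) is an exact multiple of `columns`.
    """
    rows_per_column = len(values) // columns
    return [
        [values[r + c * rows_per_column] for c in range(columns)]
        for r in range(rows_per_column)
    ]

# Ten values split into five columns of two, as in the question's picture.
print(reshape_column_first(list(range(10)), 5))
```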
This can be used to get back the table structure of %debug print output in pdb, for example, which will split the output into narrow bands.

Convert an Array Formula's Text Results into a Usable Format

When the results of an Array Formula are numbers, I find it generally easy to find an appropriate method to collapse the array into a single result. However when the results of an Array Formula are text, I find it difficult to manipulate the formula in a way which provides a single desired result. In short, is there a method of manipulating an Array of text results which I have overlooked? See the bottom of this question for the final desired formula which doesn't work, and request for solutions.
*Edit - after reading through this again, I can alternately summarize my question as: is there a way to access multiple text elements from a 'Formula Array result', without individually selecting (eg: with INDEX)?
Examples where Array Formulas work, where the Result Array is number values
(1) Example 1: Assume column A rows 1-500 is a list of product ID's in the format of xyz123, and column B rows 1-500 shows total sales for that product. If I want to find the sales for the product with the highest sales, where the last 3 digits of an ID are above 400, I could use an Array Formula like so (confirmed with CTRL + SHIFT + ENTER instead of just ENTER):
=MAX(IF(VALUE(RIGHT(A1:A500,3))>400,B1:B500,""))
(2) Example 2 Now assume that column B contains product names, instead of Sales. I now want to simply return the first name which matches criteria of the last 3 digits of the product ID being > 400. This could be done as follows:
=INDEX(B1:B500,MIN(IF(VALUE(RIGHT(A1:A500,3))>400,ROW(A1:A500),"")))
Here, I have done a little manipulation, so that the actual Array part of the formula [IF(RIGHT(A1:A500,3...] returns a value result [the ROWs of the cells A1:A500 where the last 3 digits are above 400]; I can therefore use MIN to show only the first ROW # which matches, and then I can use that collapsed result in a regular INDEX function.
(3) Example 3 For a final example, see the discussion on a similar question here [Goes more in-depth than my summarized example below, in a way not directly relevant to this question]: https://stackoverflow.com/a/31325935/5090027
Assume now that you want a list of all product names, where the last 3 digits of the product ID >400. To my knowledge, this cannot really be done in a single Cell, it would have to be done by placing each individual result on a subsequent cell. The following formula could be placed, for example, in C1 and dragged down 10 rows, and would then show the first 10 product names with the product ID's having last 3 digits > 400.
=INDEX($B$1:$B$500,SMALL(IF(VALUE(RIGHT($A$1:$A$500,3))>400,ROW($A$1:$A$500),""),ROW()))
Example where Array Formulas will not work, where the result array is text values
Now assume that I want to take the results in Example 3, and perform some text manipulation on them. For example, assume I want to concatenate them all into a single string of text. The below doesn't work, because concatenate won't take an array of results like this as acceptable arguments.
=CONCATENATE((IF(VALUE(RIGHT($A$1:$A$500,3))>400,ROW($B$1:$B$500),"")))
So the question is: does anyone know how to get this last formula to work? Or, how to get a formula to work which takes an array of text results, and either converts it into a 'usable range' [so it can be plugged into Concatenate above], or can be manipulated with text arguments immediately [such as mid, search, substitute, etc.]? Right now the only method I can see would be using example 3 above, and then going further and saying, for example, Concatenate(C1,C2,C3...C10).
As stated previously, there is no native function which can do what you want in a single cell. If you absolutely cannot use VBA, then you could use a helper column (can hide the column if preferred) and then have the cell where you want the result simply show the last cell of the helper column.
Example:
Produce Name    Type
Apple           Fruit
Broccoli        Vegetable
Carrot          Vegetable
Orange          Fruit
Say you want a single cell to show all Fruit results. You could use another column to host this formula. You will be hiding the column later, so let's use one out of the way, like column Z. We also want to easily be able to change what you're looking for, so we'll put the condition in cell D2. In cell Z2 and copied down, you would use this formula:
=IF(B2=$D$2,IF(Z1="",A2,Z1&", "&A2),IF(Z1="","",Z1))
That would result in the following:
Produce Name    Type         Search For    (other columns until you get to Z)
Apple           Fruit        Fruit         Apple
Broccoli        Vegetable                  Apple
Carrot          Vegetable                  Apple
Orange          Fruit                      Apple, Orange
Then in wherever you want your result cell, we'll say D3, simply use this formula to get the last result from your helper column, and then hide the helper column.
=Z5
Which results in the following:
Produce Name    Type         Search For
Apple           Fruit        Fruit
Broccoli        Vegetable    Apple, Orange
Carrot          Vegetable
Orange          Fruit
You could use a dynamic named range instead of simply =Z5 to make sure you're always getting the last cell in your helper column so that your data can grow or shrink and you'll still get the correct result. But now you can change the contents of cell D2 from Fruit to be Vegetable and now the result cell will show Broccoli, Carrot. Hopefully something like this can be adapted to your needs.
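The helper-column logic is a running accumulation, which can be sketched in Python like this. The function name running_concat is hypothetical; each list element corresponds to one cell of the helper column, from Z2 downward.

```python
def running_concat(names, types, wanted):
    """Return the helper column: each cell carries the matches found so far."""
    helper = []
    previous = ""  # plays the role of the cell above (Z1 for the first row)
    for name, kind in zip(names, types):
        if kind == wanted:
            previous = name if previous == "" else previous + ", " + name
        helper.append(previous)
    return helper

names = ["Apple", "Broccoli", "Carrot", "Orange"]
types = ["Fruit", "Vegetable", "Vegetable", "Fruit"]

print(running_concat(names, types, "Fruit"))
print(running_concat(names, types, "Vegetable")[-1])  # last cell is the final result
```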
To reiterate other responses, I did not find a way to use the concatenate function on an array. However, I did find a way to concatenate the "product names" using only one array function and no so-called "helper column." Although it is rather long and tedious, I think this may add to the discussion. For one, if you are actually going to use a formula like this for some valid purpose or to overcome a specific barrier, it can be easily used via copying and pasting of the formula (that is, it is actually relatively adaptable). On the other hand, if your interest is more a curiosity, my answer may be more banal than you might like.
In my simulation of your problem, I also had two columns, but shortened the row count to 40. The leftmost column ("C") contains sequences of three letters and three numbers, while the right column ("D") contains random sequences of letters and numbers that simulate your "product names."
I used a combination of nested replace and concatenate functions. The function below is chopped to focus on the "base unit" of the agglomerated function.
Base Unit
REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),1)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),1))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),1)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),2)))=TRUE,"",
The above formula essentially looks at the first product name with a corresponding product ID with numerical sequence > 400, then replaces it with a concatenation, given that there exists another product meeting the same product ID criteria. This can be thought of as an "accumulating" concatenation, starting at the innermost parentheses. This "base unit" of the formula can be repeated to an arbitrary extent. That is, if you believe that there are anywhere from 200 to 280 products in the list meeting the product ID criteria you set, you can repeat this base code 280 times. As you see, if the formula attempts to concatenate product names that do not exist (you have 280 formula base units and only 275 products meeting the criteria), the formula self-terminates...in a sense. It actually begins to concatenate nothing over and over again until all base units are enacted. The result will be all desired product names concatenated in one cell, with a period after each one.
Only one number changes from base-block to base-block, and that is the kth element of the SMALL array. These variables will obviously step by one in each base unit. For my test, I used 14 base units.
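As I read the base unit, each level emits its own product name plus a period and then either the next level or an empty string. A small recursive Python sketch of that nesting follows; nested_concat and its arguments are hypothetical names, and the list of matches is assumed to be non-empty.

```python
def nested_concat(matches, max_units):
    """Mimic the nesting: level k yields name k, a period, then level k+1 if it exists.

    `matches` is the list of product names meeting the criteria (assumed non-empty);
    `max_units` is the k used by the innermost plain INDEX term.
    """
    def build(k):
        if k == max_units:
            # Innermost term: a plain INDEX with no trailing period.
            return matches[k - 1]
        text = matches[k - 1] + "."
        if k < len(matches):  # ISERR(...) is FALSE: a (k+1)-th match exists
            text += build(k + 1)
        return text
    return build(1)

print(nested_concat(["a", "b", "c"], 14))  # fewer matches than units
print(nested_concat(["a", "b", "c"], 2))   # more matches than units: the rest is cut off
```

When there are more matches than levels, the extra names are silently dropped, which is why the number of base units should be at least the expected number of matches.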
Complete Formula with 14 Base Units
=REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),1)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),1))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),1)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),2)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),2)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),2))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),2)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),3)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),3)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),3))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),3)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),4)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),4)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),4))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),4)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),5)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),5)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),5))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),5)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),6)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),6)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($
C$1:$C$40,3))>400,ROW($D$1:$D$40),""),6))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),6)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),7)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),7)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),7))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),7)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),8)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),8)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),8))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),8)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),9)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),9)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),9))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),9)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),10)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),10)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),10))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),10)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),11)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),11)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),11))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),
""),11)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),12)))=TRUE,””,**REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),12)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),12))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),12)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),13)))=TRUE,””,REPLACE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),13)),1,LEN(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),13))),CONCATENATE(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),13)),".",IF(ISERR(INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),14)))=TRUE,””,INDEX($D$1:$D$40,SMALL(IF(VALUE(RIGHT($C$1:$C$40,3))>400,ROW($D$1:$D$40),""),14)))))))))))))))))))))))))))))))))))))))))
Obviously, if you look at the entire formula, it is pretty indecipherable. But, looking at it in terms of base units, you may see how it can be easily constructed then copied and pasted (after writing the initial base unit, it took about 2 minutes to put it all together).
This is a VBA-free solution using Get&Transform in Excel 2016 or the Power Query Add-In for versions before:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
ExtractLast3Digits = Table.AddColumn(Source, "Value", each Text.End([ProductID],3)),
ChangeToNumber = Table.TransformColumnTypes(ExtractLast3Digits,{{"Value", type number}}),
FilterAbove400 = Table.SelectRows(ChangeToNumber, each [Value] > 400),
Concatenate = Text.Combine(FilterAbove400[ProductName])
in
Concatenate
You can perform all sorts of text manipulation on the “array-output” (step “FilterAbove400”). In this example I’ve just concatenated without separators, as I understood your request.
The query expects your input data to be in table form and named “Table1”; it is read in the first step (Source).
Link to file with solution: https://www.dropbox.com/s/utsraj0bec5ewqk/SE_ConvertArrayFormulasTextResult.xlsx?dl=0
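For readers who do not use Power Query, the same steps (take the last three characters, convert them to a number, keep values above 400, join the names) can be sketched in Python. This is only an illustration: the function name and the sample rows are made up, and only the column names ProductID and ProductName come from the query above.

```python
def concat_names_above(rows, threshold=400):
    """Join ProductName for rows whose ProductID ends in a number above threshold."""
    selected = []
    for row in rows:
        value = int(row["ProductID"][-3:])   # like Text.End([ProductID], 3), then type number
        if value > threshold:                # like Table.SelectRows(..., each [Value] > 400)
            selected.append(row["ProductName"])
    return "".join(selected)                 # like Text.Combine with no separator


# Hypothetical sample data, not taken from the linked workbook
sample = [
    {"ProductID": "AB-399", "ProductName": "Bolt"},
    {"ProductID": "AB-401", "ProductName": "Nut"},
    {"ProductID": "CD-950", "ProductName": "Washer"},
]
result = concat_names_above(sample)
```

A separator could be added by joining with ", " instead of an empty string, which corresponds to passing a second argument to Text.Combine.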
You can create your own aggregate function to handle the results of a formula array. It does require a little VBA... but it's not difficult. This will allow you to do all kinds of string manipulation or numerical analysis on arrays of values.
To do your concatenation function, open up a VBA code window and create a new module by right clicking on the project -> insert -> new module. Double click the new module and insert this code to create the function that will concatenate an array into one large string:
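Function ConcatenateArray(ParamArray Nums() As Variant) As Variant
    ' Concatenates the first column of the array passed as the first argument
    Dim BigString As String
    Dim N As Long
    Dim A() As Variant
    ' Nums(0) holds the rows-by-1 array produced by the array formula
    A = Nums(0)
    BigString = ""
    For N = LBound(A) To UBound(A)
        BigString = BigString & A(N, 1)
    Next N
    ConcatenateArray = BigString
End Function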
Then change your array formula in the cell to:
=ConcatenateArray(IF(VALUE(RIGHT($A$1:$A$500,3))>400,$A$1:$A$500,""))
Of course you have to hit CTRL + SHIFT + ENTER instead of just ENTER to confirm the cell as an array formula.
I will try to address the several questions raised in this post:
how to get a formula to work which takes an array of text results, and
either converts it into a 'usable range' [so it can be plugged into
Concatenate above],
Even if the first part of this question is feasible, the last part (i.e. "[so it can be plugged into Concatenate above]") is not possible, as the CONCATENATE function does not take ranges as arguments.
or can be manipulated with text arguments immediately [such as mid,
search, substitute, etc.]? Right now the only method I can see would
be using example 3 above, and then going further and saying, for
example, Concatenate(C1,C2,C3...C10).
That’s certainly one method, but please try this instead:
Let's start from this:
Now assume that I want to take the results in Example 3, and perform
some text manipulation on them. For example, assume I want to
concatenate them all into a single string of text.
But first let’s assume the following:
- Data range is located at D10:F510 and includes fields: Product, Product, Sales and Product Name (Selection)*
*used to list results from formula in example 3
- Data contains 23 records complying with the criteria defined in example 1 (see Fig. 1)
- Value 400 is entered in cell E4, so the criterion can be changed there instead of being hard-coded in the formulas (see Fig. 3).
Fig. 1
Now, in order to generate an array with the concatenated results and post it in a usable range, let’s apply a minor modification to the formula in example 3. Enter this array formula in G11 and copy it down to the last record (not just 10 lines):
=TRIM(CONCATENATE(
IF(ROW(G11)-ROW(G$11)+1=1,"",G10)," ",
IFERROR(INDEX($E$11:$E$510,
SMALL(IF(VALUE(RIGHT($D$11:$D$510,3))>$E$4,ROW($D$11:$D$510)-ROW($D$11)+1,""),
ROW(G11)-ROW(G$11)+1)),"")))
Fig. 2
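The logic of the running concatenation can be sketched in Python as follows. Each output row takes the previous row's text and appends the k-th matching product, so the last row holds the full list. The function name and sample values are hypothetical; the whitespace handling imitates TRIM.

```python
def running_concat(names, ids, threshold):
    """Return one string per row; row k holds the first k+1 matches joined by spaces."""
    # Names whose ID ends in a 3-digit number above the threshold, in sheet order
    matches = [n for n, i in zip(names, ids) if int(i[-3:]) > threshold]
    out = []
    prev = ""                                   # the first row has no previous cell
    for k in range(len(names)):
        item = matches[k] if k < len(matches) else ""   # like IFERROR(INDEX(...), "")
        prev = " ".join((prev + " " + item).split())    # like TRIM(CONCATENATE(prev, " ", item))
        out.append(prev)
    return out


# Hypothetical sample data
rows = running_concat(["Bolt", "Nut", "Washer"], ["AB-399", "AB-401", "CD-950"], 400)
```

Once the matches run out, the remaining rows simply repeat the complete string, which is why reading the last row of the helper column gives the final result.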
Then, in the Summary section located at D4:E8, we have the results from examples 1 & 2 and the concatenated list of selected products (see Fig. 3). Enter this formula in E8 (I suggest increasing the row height to the maximum of 409 and turning Wrap Text on):
=INDEX($M$11:$M$510,1+MAX(ROW($M$11:$M$510))-ROW($D$11))
Fig. 3
As regards this question:
Is there a way to access multiple text elements from a 'Formula Array
result', without individually selecting (eg: with INDEX)?
In this particular case (i.e. concatenation of array elements) I would take a different approach: generate the array of concatenated results and then pick the needed element, even if that requires INDEX.
Lastly, I would like to make a minor note about these formulas:
Example 2:
=INDEX(B1:B500,MIN(IF(VALUE(RIGHT(A1:A500,3))>400,ROW(A1:A500),"")))
If the data range does not start at Row 1 use this formula instead:
=INDEX($E$11:$E$510,MIN(IF(VALUE(RIGHT($D$11:$D$510,3))>400,
1+ROW($D$11:$D$510)-ROW($D$11),"")))
Example 3:
=INDEX($B$1:$B$500,SMALL(IF(VALUE(RIGHT($A$1:$A$500,3))>400,ROW($A$1:$A$500),""),ROW()))
If the data range does not start at Row 1 use this formula instead:
=IFERROR(INDEX($E$11:$E$510,
SMALL(IF(VALUE(RIGHT($D$11:$D$510,3))>$E$4,
1+ROW($D$11:$D$510)-ROW($D$11),""),
1+ROW()-ROW($K$11))),"")
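The reason for the 1+ROW(...)-ROW(first cell) adjustment is that INDEX expects a position inside the range, while ROW returns the absolute sheet row. A minimal Python sketch of the conversion, with a made-up layout (data in rows 11 to 15) and a hypothetical function name:

```python
def relative_positions(first_row, last_row, keep):
    """Return 1-based positions within the range for the sheet rows where keep(row) is true."""
    return [1 + row - first_row for row in range(first_row, last_row + 1) if keep(row)]


# Hypothetical layout: data in sheet rows 11..15, rows 12 and 14 meet the criterion
data = ["a", "b", "c", "d", "e"]             # contents of rows 11..15
positions = relative_positions(11, 15, lambda row: row in (12, 14))
picked = [data[p - 1] for p in positions]    # INDEX is 1-based, Python lists are 0-based
```

Using the absolute rows 12 and 14 as positions would point past the end of this five-row range, which is the failure the adjusted formulas avoid.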

Excel: Formulas for converting data among column / row / matrix

Are there formulas to convert data in a column to a matrix or to a row?
And to convert from/to other combinations?
What about an even more complex case: reshape a matrix of width W to width N*W?
There are a few similar or related questions.
I have answered some of them, marked with *.
I keep updating this list, as new similar (or equal) questions are added:
Formatting Data: Columns to Rows *
Move content from 1 column to 3 columns *
how to split one column into two columns base on conditions in EXCEL *
writing a macro to transpose 3 columns into 1 row
Excel VBA transpose with characters
Mathematical transpose in excel
How do transform a "matrix"-table to one line for each entry in excel
Convert columns with multiple rows of data to rows with multiple columns in Excel.
How to use VBA to reshape data in excel *
Sorting three columns into six, sorted horizontally by surname using excel *
divide data in one column into more column in excel
Move data from multiple columns into single row *
Some of the answers appear to be "upgradeable" to something more encompassing.
Is that possible?
Sample formats to convert from/to are:
Column
1
2
3
4
5
6
7
...
Row
1 2 3 4 5 6 7 ...
Matrix (with a span of 4 columns here)
1 2 3 4
5 6 7 8
...
The idea is to give here something that can likely be used with minor adaptations to the questions listed above, which may also serve as a reference for future related questions.
The essential functions to be used are INDEX or OFFSET. The pros and cons of each one will be given after explicit examples, with reference to the figure. It shows several ranges with their defined names (in italics in the following).
All defined names can be replaced by direct absolute references to the corresponding cells.
1. Column to matrix
The span (in C1) gives the number of columns. Then matrix_data_top_left (D1 here) contains
=INDEX(col_data,(ROW()-ROW(matrix_data_top_left))*span+(COLUMN()-COLUMN(matrix_data_top_left)+1),1)
which is then copied into the rest of matrix_data.
Note that copying also into D5 gives an error, since the resulting formula refers to a cell outside col_data (A1:A16).
The same result is obtained in matrix_data2_top_left (I1) with
=OFFSET(col_data_top,(ROW()-ROW(matrix_data2_top_left))*span+(COLUMN()-COLUMN(matrix_data2_top_left)),0)
and copying similarly into matrix_data2.
Note that copying also into I5 returns 0, not an error.
OFFSET has the advantage of requiring only one cell as a base reference (col_data_top). Extending the source data range with further data therefore does not require redefining the source range in the formula; one only has to copy-paste into an extended target range.
On the other hand, extending the source data range using INDEX requires first updating it in the formula (changing the range if used explicitly), and then copy-pasting into an extended target range. Using a defined name is more versatile for this purpose, as redefining col_data suffices here (and it can be done after extending the target range).
Due to this same property, INDEX provides a kind of automatic bounds checking on the source range, which OFFSET does not.
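The index arithmetic behind both formulas can be checked with a short Python sketch. It uses 0-based offsets, so it mirrors the OFFSET version directly; the function name is made up, and None stands in for a target cell that falls outside the source column (an error with INDEX, 0 with OFFSET).

```python
def column_to_matrix(col, span):
    """Lay out a flat column row by row into a matrix with `span` columns."""
    n_rows = -(-len(col) // span)            # ceiling division
    matrix = []
    for r in range(n_rows):                  # r = ROW() - ROW(top_left)
        row = []
        for c in range(span):                # c = COLUMN() - COLUMN(top_left)
            k = r * span + c                 # offset into the source column
            row.append(col[k] if k < len(col) else None)
        matrix.append(row)
    return matrix


example = column_to_matrix([1, 2, 3, 4, 5, 6, 7], 4)
```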
2. Matrix to column
col_data2_top contains
=INDEX(matrix_data2,INT((ROW()-ROW(col_data2_top))/span)+1,MOD(ROW()-ROW(col_data2_top),span)+1)
and col_data3_top
=OFFSET(matrix_data2_top_left,INT((ROW()-ROW(col_data3_top))/span),MOD(ROW()-ROW(col_data3_top),span))
Both formulas are copied downwards.
The same differences between INDEX and OFFSET exist.
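The inverse mapping uses integer division and remainder, as in the INT and MOD terms above. A minimal Python sketch with a hypothetical function name and 0-based offsets:

```python
def matrix_to_column(matrix, span):
    """Flatten a matrix with `span` columns into a single column, row by row."""
    total = len(matrix) * span
    # k = ROW() - ROW(col_top); k // span is the source row, k % span the source column
    return [matrix[k // span][k % span] for k in range(total)]


flat = matrix_to_column([[1, 2, 3, 4], [5, 6, 7, 8]], 4)
```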
3. Matrix to row
Since OFFSET does not give errors, the remaining formulas will use it. Adapting for INDEX along the lines shown above is easy.
row_data_left contains
=OFFSET(matrix_data_top_left,INT((COLUMN()-COLUMN(row_data_left))/span),MOD(COLUMN()-COLUMN(row_data_left),span))
then copied to the right.
4. Column to row
row_data2_left contains
=OFFSET(col_data_top,COLUMN()-COLUMN(row_data2_left),0)
again copied to the right.
PS: The formula =TRANSPOSE(... works for this case, and it should be entered as an array formula (with Ctrl+Shift+Enter). Nevertheless, it might be desirable to avoid array formulas.
5/6. Row to column/matrix
These are easy to obtain along the same lines.
E.g., col_data_top contains
=OFFSET(row_data_left,0,ROW()-ROW(col_data_top))
and copy down.
7. Matrix transpose
To get in matrix_data3 (not shown in the fig.) the transpose of matrix_data2, one only needs to use matrix_data3_top_left, with the formula
=OFFSET(matrix_data2_top_left,COLUMN()-COLUMN(matrix_data3_top_left),ROW()-ROW(matrix_data3_top_left))
and copied to a suitable target range.
8. Matrix reshape
We want to reshape a matrix into a wider one:
matrix_data4, with N4 rows and M4 columns (width4), into
matrix_data5, with N5=N4/R rows and M5=M4xR columns (width5), with R (rep5) the number of repeats
(matrices not shown in the fig.) Then use
=OFFSET(matrix_data4_top_left,(ROW()-ROW(matrix_data5_top_left))*rep5+INT((COLUMN()-COLUMN(matrix_data5_top_left))/width4),MOD((COLUMN()-COLUMN(matrix_data5_top_left)),width4))
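As a check of the offsets in the widening formula, here is a small Python sketch. The function name is hypothetical, rep plays the role of rep5, and the source width is read from the matrix itself instead of a defined name.

```python
def reshape_wider(matrix, rep):
    """Place every `rep` consecutive source rows side by side in one target row."""
    width = len(matrix[0])                   # width4 in the formula
    n_rows = len(matrix) // rep              # N5 = N4 / R
    return [
        # source row: r*rep + INT(c/width), source column: MOD(c, width)
        [matrix[r * rep + c // width][c % width] for c in range(width * rep)]
        for r in range(n_rows)
    ]


wide = reshape_wider([[1, 2], [3, 4], [5, 6], [7, 8]], 2)
```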
Now we want to reshape a matrix into a narrower one:
matrix_data4, with N4 rows and M4 columns (width4), into
matrix_data6, with N6=N4xS rows and M6=M4/S columns (width6), with S (split6) the number of splits
(matrices not shown in the fig.) Then use
=OFFSET(matrix_data4_top_left,INT((ROW()-ROW(matrix_data6_top_left))/split6),MOD((ROW()-ROW(matrix_data6_top_left)),split6)*width6+(COLUMN()-COLUMN(matrix_data6_top_left)))
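The narrowing case can be sketched the same way in Python. The function name is hypothetical and split plays the role of split6. In this sketch the column offset inside a source row is the block index times the target width (M4/S), since each source row is cut into S blocks of that width.

```python
def reshape_narrower(matrix, split):
    """Cut every source row into `split` equal blocks and stack the blocks as rows."""
    width_in = len(matrix[0])                # M4
    width_out = width_in // split            # M6 = M4 / S
    return [
        # source row: INT(r/split), source column: MOD(r, split)*width_out + c
        [matrix[r // split][(r % split) * width_out + c] for c in range(width_out)]
        for r in range(len(matrix) * split)
    ]


narrow = reshape_narrower([[1, 2, 3, 4], [5, 6, 7, 8]], 2)
```

Applying this to the result of a widening with the same factor returns the original matrix, which is a quick way to sanity-check both sets of offsets.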

Resources