I'm creating a set of formulas to analyze different sets of json data. I would like to show the uniqueness for each field in the dataset and the top 3 values per field. The json data is pasted on one of the sheets, and the results of my analyses are shown on a different sheet.
An example of some arbitrary raw data:
For this dataset I can create the following formulas (all similar coloured cells are matrix formulas):
Cell A1 contains a formula that dynamically returns all headers (yellow). If the pasted data contains more fields, this list expands automatically. The pink area also grows or shrinks based on the amount of records and fields in the raw data.
What I would like to know is how to set up the following formulas:
Row 2: Return whether the values in each column are all unique or, if not, how many distinct values the column contains. I already have the formula for a single column, but I would like a matrix formula so that it grows or shrinks automatically as well.
Row 3 to 5: Return the top 3 of values within each column.
An example of the header formula (yellow):
=LET(SUB,INDIRECT("A8:"&ADDRESS(8,number_of_fields)),SUBSTITUTE(MID(SUB,1,FIND(":",SUB)-1),"""",""))
(formula translated from Dutch syntax)
I know how to manually copy the formulas over, but I'm sure it's possible to convert this into a matrix formula. For example, is there a function like Repeat, but for formulas repeating for x amount of cells?
Edit after answer: Getting close! The top 3 is almost working as intended. The answer below creates the following result on a more complex dataset:
It sometimes leaves a cell empty in the top 3 for that column. Preferably the top 3 values bubble up to the top, so that only rows 2 and 3 are populated if the column contains just 2 variations.
Maybe a little too literal, but the following formula will spill the top 3 and the split data as shown in the picture:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
IFERROR(VSTACK(header,_top3,"","",split),""))
split is all data (below),
_top3 is the top 3 of the frequency of the text per column.
You may only need the _top3 data though..
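For readers who want to check what _top3 should contain, the same idea can be sketched outside Excel. This is only an illustration in Python, not a translation of the formula; the field names and sample values are made up, and ties are resolved in first-seen order, which may differ from how SORTBY orders them.

```python
from collections import Counter

def top_values(column, n=3):
    """Return up to n 'value (count)' labels, most frequent first."""
    counts = Counter(column)
    # most_common sorts by count, descending; equal counts keep first-seen order
    return [f"{value} ({count})" for value, count in counts.most_common(n)]

# Hypothetical sample: one list per field, one entry per record
columns = {
    "colour": ["red", "blue", "red", "green", "red", "blue"],
    "size": ["S", "S", "S", "S", "S", "S"],
}
top3 = {field: top_values(values) for field, values in columns.items()}
```

A field with fewer than three distinct values simply yields a shorter list here, which is the "bubble up" behaviour asked for in the question.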
If I'm not mistaken, this would be the Dutch variant:
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
(I'm Dutch, but I'm not familiar with the Dutch equivalents of the newer functions, since I work with the English version and the support pages sometimes contradict each other:
NEMEN might be TAKE, since it's listed as NEMEN here https://support.microsoft.com/nl-nl/office/excel-functies-alfabetisch-b3944572-255d-4efb-bb96-c6d90033e188#bm14, but if you click through, it shows the explanation for TAKE in Dutch (https://support.microsoft.com/nl-nl/office/take-functie-25382ff1-5da1-4f78-ab43-f33bd2e4e003).)
Edit:
To "drop" the trailing boolean column you can add another argument to DROP (WEGLATEN):
WEGLATEN([data],1,-1) drops the first row of the data (argument 1) and its last column (argument -1):
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1;-1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
And to cope with columns where there are fewer than 3 top-ranked values:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
_top3minus,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,FILTER(INDEX(_top3,,b),INDEX(_top3,,b)<>"")))),""),,1),
IFERROR(VSTACK(header,_top3minus,"","",split),""))
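The extra _top3minus step filters the blanks out of each column so the remaining values move up. As a rough sketch of that single step (Python, with a hypothetical helper name and made-up cell values):

```python
def bubble_up(cells, height=3):
    """Move non-empty cells to the top and pad the rest with empty strings."""
    filled = [cell for cell in cells if cell != ""]
    # Pad to the fixed height, then cut off anything beyond it
    return (filled + [""] * height)[:height]

# Hypothetical column of top-3 labels with a gap in the middle
compacted = bubble_up(["a (2)", "", "b (1)"])
```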
I have a data set which I need to convert to longform, so that I can use it in a data analysis program (R). The format is standardised for each table, so I'm wondering if there is a way to have Excel transpose the data for me.
Thanks in advance for the help.
Data set
Longform
If you have to do this regularly for a lot of data, writing a macro to loop through everything would be best. A manual workaround that is still quicker for a lot of data is to create a set of formulas that converts all data from one point at one person's place into 8 lines of longform data. Then by changing a reference you can re-use these formulas for every point at every person's place:
Your first 4 columns are manual: Location, Point, Quarter, Type. They have fixed values for every 8 rows. Enter them manually for one data point, they'll all get copied later.
Then have a 5th working column that records the location of an anchor point for every set of data at a point at a person's place. For this example, I'm assuming you have a "NW" value in cell B3 on a sheet called "Data". In your 5th table column, in the first row only (cell E2), put in the text "Data!B3" without an equal sign.
The remaining columns for all 8 rows all refer to this anchor point using the OFFSET and INDIRECT functions. For each column in your data for the first 8 rows, refer to each value in the data set based on its relative position from the anchor point:
The first data column is the NW Shrub Distance value, which is offset by 1 row and 1 column:
=OFFSET(INDIRECT(E2),1,1)
The second data column is NW Shrub Height, which is offset by 1 row and 2 columns:
=OFFSET(INDIRECT(E2),1,2)
Continue through the rest of the columns on that row. Then go to the next row in your table. The first data column there is the NE Shrub Distance, which is offset by 7 rows and 1 column from the anchor NW cell:
=OFFSET(INDIRECT(E2),7,1)
Then the second data column in the 2nd row is the NE Shrub Height, which is offset by 7 rows and 2 columns from the anchor NW cell:
=OFFSET(INDIRECT(E2),7,2)
Prepare these formulas for all columns for all 8 rows. It will take a little while, but after you're done, you can then just copy the entire chunk and paste it below the first chunk. Update the one anchor value for the whole chunk from Data!B3 to the NW location in the next data chunk, eg Data!H3, and all formulas will now pull the values from all cells relative to new anchor point.
Repeat this for every data chunk and you'll have it in longform fairly quickly.
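If you do end up scripting it, the anchor-plus-offset idea carries over directly. The following is a minimal Python sketch under stated assumptions: the grid contents are invented, positions are 0-based, and only the NW and NE offsets mentioned above are filled in; the other quarters and columns would need their own offsets.

```python
def offset(grid, anchor, rows, cols):
    """Mimic OFFSET(anchor, rows, cols) on a list-of-lists grid (0-based)."""
    r, c = anchor
    return grid[r + rows][c + cols]

# Hypothetical data chunk with the "NW" anchor cell at position (0, 0)
grid = [["" for _ in range(3)] for _ in range(9)]
grid[0][0] = "NW"
grid[1][1], grid[1][2] = 2.5, 0.4   # NW shrub distance, NW shrub height
grid[7][1], grid[7][2] = 3.1, 0.6   # NE shrub distance, NE shrub height

anchor = (0, 0)
# (row offset, column offset) per value, matching the OFFSET formulas above
layout = {"NW": [(1, 1), (1, 2)], "NE": [(7, 1), (7, 2)]}
longform = [[quarter] + [offset(grid, anchor, dr, dc) for dr, dc in cells]
            for quarter, cells in layout.items()]
```

Moving to the next chunk only means changing anchor, just as changing the text in E2 does in the worksheet.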
As seen in the picture I have 5 sets of 2's in one column.
I would like it so that each set is in its own column.
Is there a way to do that?
I tried text to columns, but it did not work.
General solution
Imagine I have a vertical array starting in cell B2, which I want to separate into N stacked columns. I will place these columns from cell E4, as the picture indicates.
The code which achieves what I want is:
=OFFSET($B$2,(ROW()-ROW($E$4))*N+(COLUMN()-COLUMN($E$4)),0)
Replace N with your desired number (and the origin and destination cell with your particular values, B2 and E4 in this example), and expand the formula vertically and horizontally to form your desired matrix of N columns. For the case of N=3, you get:
(PS: if your array is horizontal, use transpose to transform to vertical. You can then transpose the resulting matrix, to get the final result.)
Explanation
The logic is simple. The function OFFSET has three compulsory inputs. The first one is the first point of the array you want to transform (in the example above, $B$2). The point you select has an index of 0, the one below it an index of 1, etc. So, what you want is to put these ordered indexes into matrix form, as shown below (for the case of N=3):
The rule to move these indexes is given in the second entry of the OFFSET function. This is basically a formula that calculates a sequence 0, 1, 2, 3 ... using some fixed values (the row and column numbers of the first cell where you are putting the result, ROW($E$4) and COLUMN($E$4), which are equal to 4 and 5 respectively), and the variable values of the cell where you are placing the number (ROW() and COLUMN()). The formula computes the difference between the actual row and the reference row number, scales it by N, and adds the difference between the actual and reference column. This gives the desired series 0, 1, 2, 3... for our output matrix.
Finally, the last argument of OFFSET is zero: since we are transforming a vertical column of data, no horizontal offset is needed.
You can do it with a formula, for example; enter this in C1 and fill down and right:
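The index arithmetic described above can be checked outside Excel. Here is a small Python sketch of it (the function name is mine; cells past the end of the data are returned as None, whereas OFFSET would show 0 for blank cells):

```python
def to_matrix(values, n):
    """Lay out a flat list in rows of n, using (row diff) * n + (column diff)."""
    rows = -(-len(values) // n)  # ceiling division: enough rows for every value
    matrix = []
    for r in range(rows):            # r plays the role of ROW() - ROW($E$4)
        row = []
        for c in range(n):           # c plays the role of COLUMN() - COLUMN($E$4)
            index = r * n + c        # same arithmetic as the second OFFSET argument
            row.append(values[index] if index < len(values) else None)
        matrix.append(row)
    return matrix

stacked = to_matrix([1, 2, 3, 4, 5, 6, 7], 3)
```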
=OFFSET($A$1,ROW()-1+(COLUMN()-3)*6,0)
Take the total number of cells, divide it by 3, and cut and paste. I wasted 30 minutes trying all the solutions offered out there.
I gave up and now my project is complete. Only took about 15 seconds.
To split one column into multiple columns in column-first order (in other words, without transposing), we can modify the formula shown at https://www.extendoffice.com/documents/excel/3132-excel-convert-vector-to-matrix.html, which is the solution for row-first order (i.e., with transposing), by exchanging the roles of ROW() and COLUMN(). Example code:
=OFFSET($A$1:$A$10494,ROW()-ROW($B$1)+((COLUMN()-COLUMN($B$1))*(ROWS($A$1:$A$10494)/18)),0,1,1)
Here $A$1:$A$10494 is the source, $B$1 is the destination, and 18 is the number of columns to split into.
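To see the column-first ordering in isolation, here is a short Python sketch of the same index rule. It assumes the source length is an exact multiple of the number of columns, as the ROWS(...)/18 term in the formula does; the function name is only illustrative.

```python
def to_columns(values, columns):
    """Split a flat list into the given number of columns, filling column by column."""
    per_column = len(values) // columns  # assumes an exact multiple
    # Cell (r, c) reads source index r + c * per_column
    return [[values[r + c * per_column] for c in range(columns)]
            for r in range(per_column)]

bands = to_columns(list(range(1, 13)), 3)
```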
This can be used to get back the table structure of %debug print output in pdb, for example, which will split the output into narrow bands.
Are there formulas to convert data in a column to a matrix or to a row?
And to convert from/to other combinations?
What about an even more complex case: reshape a matrix of width W to width N*W?
There are a few similar or related questions.
I have answered some of them, marked with *.
I keep updating this list, as new similar (or equal) questions are added:
Formatting Data: Columns to Rows *
Move content from 1 column to 3 columns *
how to split one column into two columns base on conditions in EXCEL *
writing a macro to transpose 3 columns into 1 row
Excel VBA transpose with characters
Mathematical transpose in excel
How do transform a "matrix"-table to one line for each entry in excel
Convert columns with multiple rows of data to rows with multiple columns in Excel.
How to use VBA to reshape data in excel *
Sorting three columns into six, sorted horizontally by surname using excel *
divide data in one column into more column in excel
Move data from multiple columns into single row *
Some of the answers appear to be "upgradeable" to something more encompassing.
Is that possible?
Sample formats to convert from/to are:
Column
1
2
3
4
5
6
7
...
Row
1 2 3 4 5 6 7 ...
Matrix (with a span of 4 columns here)
1 2 3 4
5 6 7 8
...
The idea is to give here something that can likely be used with minor adaptations to the questions listed above, which may also serve as a reference for future related questions.
The essential functions to be used are INDEX or OFFSET. The pros and cons of each one will be given after explicit examples, with reference to the figure. It shows several ranges with their defined names (in italics in the following).
All defined names can be replaced by direct absolute references to the corresponding cells.
1. Column to matrix
The span (in C1) gives the number of columns. Then matrix_data_top_left (D1 here) contains
=INDEX(col_data,(ROW()-ROW(matrix_data_top_left))*span+(COLUMN()-COLUMN(matrix_data_top_left)+1),1)
which is then copied into the rest of matrix_data.
Note that copying also into D5 gives an error, since the resulting formula refers to a cell outside col_data (A1:A16).
The same result is obtained in matrix_data2_top_left (I1) with
=OFFSET(col_data_top,(ROW()-ROW(matrix_data2_top_left))*span+(COLUMN()-COLUMN(matrix_data2_top_left)),0)
and copying similarly into matrix_data2.
Note that copying also into I5 returns 0, not an error.
OFFSET has the advantage of requiring only one cell to be used as a base reference (col_data_top), so extending the source data range with further data does not need redefining the source data range in the formula, one has only to copy-paste into an extended target range.
On the other hand, extending the source data range using INDEX requires first updating it in the formula (changing the range if used explicitly), and then copy-paste into an extended target range. Using a defined name is more versatile for this purpose, as redefining col_data suffices here (and it can be done after extending the target range).
Due to this same property, INDEX provides a kind of automatic bounds checking on the source range, which OFFSET does not.
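The difference in bounds behaviour can be mimicked in a few lines. This is a rough Python analogy, not Excel semantics in full: col_data stands for a declared range A1:A16, sheet_column for the whole worksheet column, and both helper names are hypothetical.

```python
def index_like(col_data, position):
    """1-based lookup that fails outside the declared range, like INDEX."""
    if not 1 <= position <= len(col_data):
        raise IndexError("#REF!")
    return col_data[position - 1]

def offset_like(sheet_column, top_row, shift):
    """Read relative to a single top cell; blank cells past the data come back as 0."""
    value = sheet_column.get(top_row + shift)
    return 0 if value is None else value

col_data = list(range(1, 17))                    # declared range A1:A16
sheet_column = {row: row for row in range(1, 17)}  # same values, keyed by row number
span = 4

# The fifth matrix row, first column, asks for the 17th value
position = 4 * span + 0 + 1
beyond = offset_like(sheet_column, 1, 4 * span + 0)
```

index_like(col_data, position) raises here, while offset_like quietly returns 0, which is why an INDEX-based layout flags an overrun and an OFFSET-based one does not.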
2. Matrix to column
col_data2_top contains
=INDEX(matrix_data2,INT((ROW()-ROW(col_data2_top))/span)+1,MOD(ROW()-ROW(col_data2_top),span)+1)
and col_data3_top
=OFFSET(matrix_data2_top_left,INT((ROW()-ROW(col_data3_top))/span),MOD(ROW()-ROW(col_data3_top),span))
Both formulas are copied downwards.
The same differences between INDEX and OFFSET exist.
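The INT/MOD pair is just integer division and remainder on the running cell number. A minimal Python sketch of the mapping, with an illustrative function name and sample matrix:

```python
def matrix_to_column(matrix, span):
    """Flatten a matrix row by row: cell k comes from row k // span, column k % span."""
    total = len(matrix) * span
    return [matrix[k // span][k % span] for k in range(total)]

flat = matrix_to_column([[1, 2, 3, 4], [5, 6, 7, 8]], 4)
```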
3. Matrix to row
Since OFFSET does not give errors, the remaining formulas will use it. Adapting for INDEX along the lines shown above is easy.
row_data_left contains
=OFFSET(matrix_data_top_left,INT((COLUMN()-COLUMN(row_data_left))/span),MOD(COLUMN()-COLUMN(row_data_left),span))
then copied to the right.
4. Column to row
row_data2_left contains
=OFFSET(col_data_top,COLUMN()-COLUMN(row_data2_left),0)
again copied to the right.
PS: The formula =TRANSPOSE(... works for this case, and it should be entered as an array formula (with ctrl+shift+enter). Nevertheless, it might be desirable to avoid array formulas.
5/6. Row to column/matrix
It is very easy to obtain along these lines.
E.g., col_data_top contains
=OFFSET(row_data_left,0,ROW()-ROW(col_data_top))
and copy down.
7. Matrix transpose
To get in matrix_data3 (not shown in the fig.) the transpose of matrix_data2, one only needs to use matrix_data3_top_left, with the formula
=OFFSET(matrix_data2_top_left,COLUMN()-COLUMN(matrix_data3_top_left),ROW()-ROW(matrix_data3_top_left))
and copied to a suitable target range.
8. Matrix reshape
We want to reshape a matrix into a wider one:
matrix_data4, with N4 rows and M4 columns (width4), into
matrix_data5, with N5=N4/R rows and M5=M4xR columns (width5), with R (rep5) the number of repeats
(matrices not shown in the fig.) Then use
=OFFSET(matrix_data4_top_left,(ROW()-ROW(matrix_data5_top_left))*rep5+INT((COLUMN()-COLUMN(matrix_data5_top_left))/width4),MOD((COLUMN()-COLUMN(matrix_data5_top_left)),width4))
Now we want to reshape a matrix into a narrower one:
matrix_data4, with N4 rows and M4 columns (width4), into
matrix_data6, with N6=N4xS rows and M6=M4/S columns (width6), with S (split6) the number of splits
(matrices not shown in the fig.) Then use
=OFFSET(matrix_data4_top_left,INT((ROW()-ROW(matrix_data6_top_left))/split6),MOD((ROW()-ROW(matrix_data6_top_left)),split6)*width6+(COLUMN()-COLUMN(matrix_data6_top_left)))
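Both reshapes are index maps from a target cell to a source cell, so they are easy to test with small data. The Python sketch below assumes that, when narrowing, each source row is cut into S consecutive pieces as wide as the target matrix; the function names are only illustrative.

```python
def reshape_wider(matrix, rep):
    """Join every rep consecutive rows into one wider row."""
    width = len(matrix[0])
    rows = len(matrix) // rep
    # Target (r, c) reads source row r*rep + c // width, column c % width
    return [[matrix[r * rep + c // width][c % width] for c in range(width * rep)]
            for r in range(rows)]

def reshape_narrower(matrix, split):
    """Cut every row into split consecutive pieces, one piece per target row."""
    width_out = len(matrix[0]) // split
    # Target (r, c) reads source row r // split, column (r % split)*width_out + c
    return [[matrix[r // split][(r % split) * width_out + c] for c in range(width_out)]
            for r in range(len(matrix) * split)]

narrow = [[1, 2], [3, 4], [5, 6], [7, 8]]
wide = reshape_wider(narrow, 2)
```

Applying reshape_narrower(wide, 2) gives back the original four rows, which is a quick sanity check that the two maps are inverses.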