I'm creating a set of formulas to analyze different sets of json data. I would like to show the uniqueness for each field in the dataset and the top 3 values per field. The json data is pasted on one of the sheets, and the results of my analyses are shown on a different sheet.
An example of some arbitrary raw data:
For this dataset I can create the following formulas (all similar coloured cells are matrix formulas):
Cell A1 contains a formula that dynamically returns all headers (yellow). If the pasted data contains more fields, this list expands automatically. The pink area also grows or shrinks based on the amount of records and fields in the raw data.
What I would like to know is how to setup the following formulas:
Row 2: Return if the values are either all unique, or how many variations are there within each column. I allready have the formula for a single column, but I would like a matrix formula so that it automatically grows or shrinks as well.
Row 3 to 5: Return the top 3 of values within each column.
An example of the header formula (yellow):
=LET(SUB,INDIRECT("A8:"&ADDRESS(8,number_of_fields)),SUBSTITUTE(SUBSTRING(SUB,1,FIND(":",SUB)-1),"""","")
(formula translated from dutch syntax)
I know how to manually copy the formulas over, but I'm sure it's possible to convert this into a matrix formula. For example, is there a function like Repeat, but for formulas repeating for x amount of cells?
Edit after answer: Getting close! The top 3 is almost working as intended. The answer below creates the following result on a more complex dataset:
It sometimes leaves a cell empty in the top 3 for that column. Preferably the top 3 values bubble up to the top, where it populates row 2 and 3 if the column only contains 2 variations.
Maybe a little too literal, but the following formula will spill the top 3 and the splitted data as shown in the picture
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
IFERROR(VSTACK(header,_top3,"","",split),""))
split is all data (below),
_top3 is the top 3 of the frequency of the text per column.
You may only need the _top3 data though..
If I'm not mistaken, this would be the Dutch variant:
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
(I'm Dutch, but I'm not familiar with the Dutch equivalents of the newer functions, since I work with English version and support is contradicting in some times:
NEMEN might be TAKE, since it's listed as NEMEN here https://support.microsoft.com/nl-nl/office/excel-functies-alfabetisch-b3944572-255d-4efb-bb96-c6d90033e188#bm14, but if you click for it, it shows explanation for TAKE in Dutch (https://support.microsoft.com/nl-nl/office/take-functie-25382ff1-5da1-4f78-ab43-f33bd2e4e003) ).
Edit:
To "drop" the trailing boolean column you can add another condition to DROP (WEGLATEN):
WEGLATEN([data],1,-1) this means dropping the first row of the data (condition 1) and it's last column (condition -1):
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1;-1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
And to cope with columns where there's less than 3 top ranked values:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
_top3minus,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,FILTER(INDEX(_top3,,b),INDEX(_top3,,b)<>"")))),""),,1),
IFERROR(VSTACK(header,_top3minus,"","",split),""))
I'm trying to use this formula:
=SUMPRODUCT(VLOOKUP(B$4778,$D$4:$DC$4623,{4,5},0))
It works fine but I'd like to try to use a variable for the {4,5} portion of the formula (columns in the array to be summed) as the formula needs to change based on sheet inputs before this formula.
I have cells on the sheet that are to be used to set the initial and final columns to be searched (likely 10 columns, but the 10 columns would have to be selected from 90 some columns available).The columns are populations related to each age. So, if I need population of those aged 10 through 15, I'd need to sum up 5 columns. If 20-25, need to sum up 5 different columns.
I tried to use the Columns function but it didn't seem to work for me.
The columns are selected by users entering in cells the upper and lower limits of the search range and then I convert those values to the corresponding numerical column value.
So if they select 5 as lower and 10 as upper limit, I know I have to add 7 to get the correct data column on the data page (column 12) and likewise for upper (column 17).
The entire possible area to search is $D$4:$DC$4623. So, in the formula, if I wrote it out long way it would be:
=SUMPRODUCT(VLOOKUP(B$4778,$D$4:$DC$4623,{12,13,14,15,16,17},0))
I'd prefer to write it out using variables, something like this:
=SUMPRODUCT(VLOOKUP(B$4778,$D$4:$DC$4623,{L:U},0))
Where variable L would be 12 and variable U would be 17.
Can anyone suggest a way to write the formula?
use this array formula:
=SUM(VLOOKUP(B$4778,$D$4:$DC$4623,ROW(INDIRECT(D5 & ":" & E5)),FALSE))
Where D5 is the Lower and E5 would be the upper.
Being an array formula it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then excel will put {} around the formula.
Or better yet use this non array formula:
=SUM(INDEX($D$4:$DC$4623,MATCH(B$4778,$D$4:$D$4623,0),D5):INDEX($D$4:$DC$4623,MATCH(B$4778,$D$4:$D$4623,0),E5))
I have a sheet with rows of data, with many columns. I am looking for help on a formula that will extract the sum of the smallest 3 numbers in a row based on the last 5 values entered. Note that not all the rows will have values for each column, so the first value found on each row will may be found in a different column.
To determine the sum of the smallest 3 I am using the formula =SUM(SMALL(B3:R3,{1,2,3})), Unfortunately, that formula is looking at the entire range. Again, I am looking for help that with a formula that will select only the last 5 values posted.
Here is simple example. The results for each line show the totals that should be determined. Again, it needs to look for the sum of the smallest 3 based on the last 5 posted (in the example below the range would be col. 1 thru 10, with col 10 having the latest postings).
Ex.
1.....2.....3......4......5.....6.....7.....8......9.....10...... Result
31.........44....51....36..........44...34....36....38.......106 (34+36+34)
35..31...44...40.....38...52..........42....37...............115 (37+38+40)
Hope this is understandable. I am looking for a formula solution vs a VBA macro solution because of my users. Thanks for any help!!
Now that you clarified the question, I have an answer for you. This is fairly ugly but it gets the job done. You might want to hide the columns with the intermediate results - or you could get adventurous and "nest" the expressions. This makes it really hard to understand / debug though. If there's a smarter way to do this I am always open to learning.
Assuming you have the data in columns A through J, starting in row 2, put the following in cell L2:P2:
=MATCH(9999, A2:J2,1)
=MATCH(9999,OFFSET($A2,0,0, 1, L2-1)) ... copy this by dragging right to the next 2 columns
=MATCH(9999,OFFSET($A2,0,0, 1, M2-1))
=MATCH(9999,OFFSET($A2,0,0, 1, N2-1))
=MATCH(9999,OFFSET($A2,0,0, 1, O2-1))
The first line finds the last cell with data in it; the next ones find the last cell "not including the last cell", and so they work backwards. The result is a number corresponding to the columns with data. For your example, this gives
10 9 8 7 5
9 8 6 5 4
Now we want to find the sum of the smallest 3 of these: put the following equation in cell Q2:
=SUM(SMALL(INDIRECT("RC["&P2-17&"]:RC["&L2-17&"]",FALSE),{1,2,3}))
Working from the inside out:
RC["&P2-17"] results in RC[-12], which is "the cell 12 to the left of this one".
That is the first of the "last five cells with data", cell E2
RC["&L2-17"] results in RC[-7], the last cell with data in this row
FALSE use "RC" rather than "A1" indexing
INDIRECT turn string into an address (in this case a range)
SMALL find the 3 smallest values in this range
SUM and add them together.
This formula did indeed give me 106, 115 for the example you provided.
I would hide columns L through P so you only see the result (and not the intermediate stuff).
I am analysing library statistics relating to loans made by particular user categories. The loan data forms the named range LoansToApril2013. Excel 2007 is quite happy for me to use an index range as the sum range in a SUMIF:
=SUMIF(INDEX(LoansToApril2013,0,3),10,INDEX(LoansToApril2013,0,4):INDEX(LoansToApril2013,0,6))
Here 10 indicates a specific user category, and this sums loans made to that group from three columns. By "index range" I'm referring to the
INDEX(LoansToApril2013,0,4):INDEX(LoansToApril2013,0,6)
sum_range value.
However, if I switch to using a SUMIFS to add further criteria, Excel returns a #VALUE error if an index range is used. It will only accept a single index.
=SUMIFS(INDEX(LoansToApril2013,0,4),INDEX(LoansToApril2013,0,3),1,INDEX(LoansToApril2013,0,1),"PTFBL")
works fine
=SUMIFS(INDEX(LoansToApril2013,0,4):INDEX(LoansToApril2013,0,6),INDEX(LoansToApril2013,0,3),1,INDEX(LoansToApril2013,0,1),"PTFBL")
returns #value, and I'm not sure why.
Interestingly,
=SUMIFS(INDEX(LoansToApril2013,0,4):INDEX(LoansToApril2013,0,4),INDEX(LoansToApril2013,0,3),1,INDEX(LoansToApril2013,0,1),"PTFBL")
is also accepted and returns the same as the first one with a single index.
I haven't been able to find any documentation or comments relating to this. Does anyone know if there is an alternative structure that would allow SUMIFS to conditionally sum index values from three columns? I'd rather not use three separate formulae and add them together, though it's possible.
The sumifs formula is modelled after an array formula and comparisons in the sumifs need to be the same size, the last one mimics a single column in the LoansToApril2013 array column 4:4 is column 4.
The second to bottom one is 3 columns wide and the comparison columns are 1 column wide causing the error.
sumifs can't do that, but sumproduct can
Example:
X 1 1 1
Y 2 2 2
Z 3 3 3
starting in A1
the formula =SUMPRODUCT((A1:A3="X")*B1:D3) gives the answer 3, and altering the value X in the formula to Y or Z changes the returned value to the appropriate sum of the lines.
Note that this will not work if you have text in the area - it will return #VALUE!
If you can't avoid the text, then you need an array formula. Using the same example, the formula would be =SUM(IF(A1:A3="X",B1:D3)), and to enter it as an array formula, you need to use CTRL+SHIFT+ENTER to enter the formula - you should notice that excel puts { } around the formula. It treats any text as zero, so it will successfully add up the numbers it finds even if you have text in one of the boxes (e.g. change one of the 1's in the example to be blah and the total will be 2 - the formula will add the two remaining 1s in the line)
The two answers above and a bit of searching allowed me to find a formula that worked. I'll put it here for posterity, because questions with no final outcome are a pain for future readers.
=SUMPRODUCT( (INDEX(LoansToApril2013,0,3)=C4) * (INDEX(LoansToApril2013,0,1)="PTFBL") * INDEX(LoansToApril2013,0,4):INDEX(LoansToApril2013,0,6))
This totals up values in columns 4-6 of the LoansToApril2013 range, where the value in column 3 equals the value in C4 (a.k.a. "the cell to the left of this one with the formula") AND the value in column 1 is "PTFBL".
Despite appearances, it isn't multiplying anything by anything else. I found an explanation on this page, but basically the asterisks are adding criteria to the function. Note that criteria are enclosed in their own brackets, while the range isn't.
If you want to use names ranges you need to use INDIRECT for the Index commands.
I used that formula to check for conditions in two columns, and then SUM the results in a table which has 12 columns for the months (the column is chosen by a helper cell which is 1 to 12 [L4]).
So you can do if:
Dept (1 column name range [C6]) = Sales [D6];
Region (1 column name range [C3]) = USA [D3];
SUM figures in the 12 column monthly named range table [E7] for that 1 single month [L4] for those people/products/line item
Just copy the formula across your report page which has columns 1-12 for the months and you get a monthly summary report with 2 conditions.
=SUMPRODUCT( (INDEX(INDIRECT($C$6),0,1)=$D$6) * (INDEX(INDIRECT($C$3),0,1)=$D$3) * INDEX(INDIRECT($E7),0,L$4))
What is the best way to find cells whose formulas refer to blank cells in Excel VBA. I'd like to delete any cell that references a blank cell.
More concretely, I have a two sheets: One sheet contains the actual values:
Product Month1 Month2 Month3
Sample1 1 3 5
Sample2 5 7 6
Sample3 3 8 2
The other is a summary view, with formulas to sum up the values, with the following formulas:
Product Month1
=values!A2 =SUM(values!B2:D2)
=values!A3 =SUM(values!B3:D3)
=values!A4 =SUM(values!B4:D4)
=values!A5 =SUM(values!B5:D5)
TOTAL =SUM(values!B:D)
Now in the previous example, the last raw refers to a blank row, namely the fifth row. Excel will show those cells as "0". Is there a mechanism to delete those cells within VBA?
Please note I prefer deleting the rows, to keep the TOTAL row close to the actual last value. Otherwise, the Total row might be distant from the rest of the values. Also, having blank cells with formulas may lead to a large Excel file.
EDIT: Clarified the question to role out the keeping the cells blank.
Is it always the last row, that could evaluate to 0?
If so u Could use a IF statement like:
=IF(SUM(values!B2:D2) > 0 ,values!A2,"") =IF(SUM(values!B2:D2),SUM(values!B2:D2),"")
No VBA needed...
I think Autofilter is the way to go here. If there's a zero in column A, I'm guessing you want to hide that whole row. You say delete, but I wonder if hide is a better way.
Put a filter on the range and for column A select everything except 0.
You could do this without having to write code to delete rows. I would use a variant of Arnoldiuss' solution.
For the Month total use:
=IF(LEN(values!A2)>0,SUM(B2:D2),"")
In this way, you can simply fill-down all the formulas and not have to worry if you reference a non-existent product.
Based on your edit i guess a pivot table fits your needs.
Add the products to the rowlabels and add the following calculated field to the values
=SUM(Month1,Month2,Month3)
Then add a value filter > 0
I would not recommend deleting rows in the "formula worksheet", for future use the series would wrecked, because of the missing references.