I have two sets of data in Excel: set 1 is the raw data, and set 2 is a bridge table. The desired output is also shown. How should I build the formula for this?
set 1:
set 2:
output expected:
Here is a solution that assumes a variable number of headers and no specific pattern in the column names. I assume there are no Excel version constraints, as per the tags listed in the question. In cell H1, put the following formula, which spills the entire result at once:
=LET(in, A1:F5, lk, A8:B12, header, DROP(TAKE(in,1),,1), A, TAKE(lk,,1),
B, DROP(lk,,1), data, DROP(in,1,1), REDUCE(TAKE(in,,1), UNIQUE(B),
LAMBDA(ac,bb, LET(f, FILTER(A, B=bb),values, CHOOSECOLS(data,XMATCH(f, header)),
sum, MMULT(values, SEQUENCE(ROWS(f),,1,0)), HSTACK(ac, VSTACK(bb, sum))))))
Here is the output:
We use the LET function with only two input ranges, in and lk; all the other names are derived from them. This makes the formula easy to maintain and to adapt to your real scenario.
Using DROP and TAKE we extract each portion of the input ranges: header, data, A, B (the columns of the second table). We use the REDUCE/HSTACK pattern to append one column of the result on each iteration. For more information, check my answer to the question: how to transform a table in Excel from vertical to horizontal but with different length.
We iterate over the unique values of B, and for each value (bb) we select the matching column A values (f). We use XMATCH to find the corresponding column indexes in header (which doesn't include the date column), and CHOOSECOLS to select those columns from data (values). Now we need to sum the selected columns row by row, and we use MMULT with a column of ones for that. The result is stored in the name sum. Finally, we use HSTACK to append the new column on each iteration, with the unique value from B as its header.
Note: Instead of the MMULT function, you can use the following array function; it is a matter of personal preference:
BYROW(values, LAMBDA(x, SUM(x)))
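For readers who want to check the grouping logic outside Excel, here is a minimal Python sketch of the same idea: map each header to its group through the bridge table, then sum the matching columns row by row. The function name group_sums and the sample values are made up for illustration and are not taken from the question; the top-left cell is simply left empty here.

```python
def group_sums(header, rows, bridge):
    """Sum the columns of each row by the group assigned in the bridge table.

    header: column names of the data (without the first/date column)
    rows:   list of (label, values) pairs, values aligned with header
    bridge: list of (column_name, group) pairs, like the lookup range lk
    """
    # Unique groups in order of first appearance, like UNIQUE(B)
    groups = list(dict.fromkeys(group for _, group in bridge))
    result = [["", *groups]]
    for label, values in rows:
        by_name = dict(zip(header, values))
        # For each group, add up the columns mapped to it
        # (the FILTER + XMATCH + CHOOSECOLS + MMULT steps of the formula)
        sums = [
            sum(by_name[name] for name, g in bridge if g == group)
            for group in groups
        ]
        result.append([label, *sums])
    return result


# Hypothetical sample data, for illustration only
header = ["A1", "A2", "B1", "C1", "C2"]
rows = [("d1", [1, 2, 3, 4, 5]), ("d2", [10, 20, 30, 40, 50])]
bridge = [("A1", "A"), ("A2", "A"), ("B1", "B"), ("C1", "C"), ("C2", "C")]

table = group_sums(header, rows, bridge)
```

Each inner list of table is one output row, with one summed column per group.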
You could try SUMIFS with a wildcard character for each row. For example, for the first output column, put the following formula and drag it down.
=SUMIFS($B2:$F2,$B$1:$F$1,"=A*")
Then do the same thing for the other columns, e.g. for column B:
Here is an example of the data I'm trying to organize:
I'm looking for a way to automatically see the top 3 categories (column) for each Name# (row). The size of the category is determined by the number below the category.
Ideally, I'd also like to see a percentage breakdown (from the total) for each category. For example, in row "Name3" 2 categories make up a significantly larger portion of the total values. However, without this percentage breakdown, the 3 top values would seem to be comparable, when they are in fact, not.
Interested to see how this would all work with duplicate numbers, too.
I've tried Excel's rank function, but this doesn't tell me the categories that have the 3 largest sizes, just the 3 highest values.
With Office 365:
=FILTER(SORTBY($B$1:$H$1,B2:H2,-1),SORT(B2:H2,1,-1,TRUE)>=LARGE(B2:H2,3))
And copy down.
If there are ties, it will expand the results to include them: it finds the third-highest value and returns everything that is equal to or greater than it.
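The tie behaviour can be sketched in Python as follows. This is only an illustration of the threshold logic (take the third-largest value, keep everything at or above it); the function name and the sample categories are assumptions, not data from the question.

```python
def top_n_with_ties(categories, values, n=3):
    """Return categories whose value is >= the n-th largest value, largest first."""
    # n-th largest value, counting duplicates, like LARGE(range, n)
    threshold = sorted(values, reverse=True)[n - 1]
    # Sort categories by value, descending, like SORTBY(headers, values, -1)
    ranked = sorted(zip(categories, values), key=lambda cv: cv[1], reverse=True)
    # Keep everything at or above the threshold, like the FILTER condition
    return [category for category, value in ranked if value >= threshold]


# Hypothetical row with a tie for third place
categories = ["Red", "Blue", "Green", "Black"]
values = [5, 9, 5, 7]
top = top_n_with_ties(categories, values)
```

With the tie on 5, four categories come back instead of three, which is the expansion described above.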
This approach spills all the results at once (array version). In cell J2, you can put the following formula:
=LET(D, A1:H5, A, TAKE(D,,1), DROP(REDUCE("", DROP(A,1), LAMBDA(ac,aa,
VSTACK(ac, TAKE(SORT(DROP(FILTER(D, (A=aa) + (A="")),,1),2,-1,1),1,3)))),1))
It assumes, as in the input data, that cell A1 is empty (if not, the formula can be adjusted accordingly). Here is the output:
An alternative that doesn't rely on the previous assumption (which is not a strong one anyway) is the following:
=LET(names, A2:A5, Data, B2:H5, colors, B1:H1, DROP(REDUCE("", names,
LAMBDA(ac,n, VSTACK(ac, TAKE(SORT(VSTACK(colors, INDEX(Data, XMATCH(n,names),0))
,2,-1,TRUE),1,3)))),1))
The non-array version can be derived from the previous approach; enter it in the first row and fill it down:
=TAKE(SORT(VSTACK($B$1:$H$1,INDEX($B$2:$H$5, XMATCH(A2,$A$2:$A$5),0)),2,-1,TRUE),1,3)
Explanation
To spill the entire solution, it uses the DROP/REDUCE/VSTACK pattern. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length.
For the first formula, for each name in A (aa) we use FILTER on the input data (D) to select the rows where the name is empty (to keep the header) OR (the plus (+) condition) the name is equal to aa. We remove the first column of the filter result (the names column) via DROP. Next, we SORT by the second row (the first row holds the colors) in descending order (-1) by column (for the last input argument of SORT we can use TRUE or 1). Finally, we use TAKE to keep the first row and the first three columns.
For the second approach, we locate the row for a given name (names equals n) and use INDEX to select the entire row (column index 0). Then we build an array via VSTACK with the colors as the first row, and apply the same logic as in the previous approach to sort and select the corresponding row and columns (colors).
Notes:
If you don't have the VSTACK function available, you can replace it as follows: CHOOSE({1;2}, arr1,arr2), substituting arr1 and arr2 with the corresponding arrays.
In the second formula, instead of INDEX/XMATCH you can use FILTER(Data, names=n); it is a matter of personal preference. (No DROP is needed here, because Data in that formula does not include the names column.)
I have been searching for a way to append two lists in Excel to use in a formula. The lists do not exist in columns; they are created using a formula. I want to combine the two lists into a single one, not to show the values but to use the new list in a formula. I am using Excel 365 (UNIQUE function). Let me replace my initial text with a small real case.
I have an Excel file with 3 worksheets. Sheet1 is:
Sheet2 is:
Now I want to run some analysis in Sheet3. In my example I want to count how many unique values from column A have column B containing one of the letters 'a', 'b', 'c', or 'd'. For instance, in Sheet1, the letter 'a' appears in all rows. Column A has 3 unique values. So my result for 'a' is 3. The letter 'b' does not appear for the case where column A is '3'. Therefore the result for 'b' is '2'.
So I create a Sheet3 to show my results. The first column contains a list of letters {a, b, c, d}. I then use the formula:
=COUNT(UNIQUE(FILTER(Sheet1!$A$1:$A$100, ISNUMBER(SEARCH(A1, Sheet1!$B$1:$B$100)))))
From the inside out: the SEARCH function looks in cells B1 to B100 (I can live with specifying a larger range) for the position of the value specified in column A (of the current sheet). If the value is found, SEARCH returns a number. I check whether the return value is a number (ISNUMBER) and use this to filter the values in column A of Sheet1. I then apply the UNIQUE function to these values and finally count them.
Then I do the same with values in Sheet2. And it works. This is the output:
Column B is the number of unique values (as specified above) from Sheet1 and Column C the same from Sheet2.
So far so good. But now I want to have the counting of unique values globally. Not for each Sheet. One cannot just add the values from column B and C, as there might be an overlap. For example, the result for 'a' should be 3, not 5.
The solution here would be to grab the two unique lists (one from Sheet1 and the other from Sheet2), join them, UNIQUE this new list, and count. How do I join them? That is my question.
Note that this 'counting of unique values' is just an example. I might want to find the maximum, or sort them, or find only prime numbers, or the average, or the median, or something else. So I need a general approach to join the results.
I got options close to a workable thing when all the data is in the same worksheet.
Finally, note that the data size I have is not huge, but it is large (thousands of lines at the most).
Here is something you could try:
=LET(x,{"A";"B";"C"},y,{"D";"E"},z,CHOOSE({1,2},x,y),cnt,MAX(COUNTA(x),COUNTA(y)),seq,SEQUENCE(cnt*2),final,INDEX(z,MOD(seq-1,cnt)+1,CEILING(seq/cnt,1)),FILTER(final,NOT(ISERROR(final))))
Here both 'x' and 'y' variables are placeholders for your two (vertical) arrays. In this case I used: {"A";"B";"C"} and {"D";"E"} (semicolons separate rows, so both are vertical). Assuming you just want to place the 2nd array directly under the 1st one, the above suggestion does just that:
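To make the index arithmetic easier to follow, here is a small Python sketch that walks the same sequence: seq runs from 1 to 2*cnt, MOD gives the row, CEILING gives the column (1 for x, 2 for y), and positions past the end of the shorter list are skipped, which is what the final FILTER does with the errors. The function name stack and the numeric lists are assumptions for illustration only.

```python
def stack(x, y):
    """Place list y directly under list x, following the formula's index arithmetic."""
    cnt = max(len(x), len(y))
    out = []
    for seq in range(1, cnt * 2 + 1):
        row = (seq - 1) % cnt + 1      # MOD(seq-1,cnt)+1
        col = -(-seq // cnt)           # CEILING(seq/cnt,1)
        source = x if col == 1 else y
        if row <= len(source):         # out-of-range positions are the errors FILTER drops
            out.append(source[row - 1])
    return out


joined = stack(["A", "B", "C"], ["D", "E"])

# Hypothetical per-sheet unique lists: once joined, the overlap is counted only once
global_count = len(set(stack([1, 2, 3], [2, 3])))
```

Once the two lists are joined, any further step (unique count, maximum, median, and so on) can be applied to the single list.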
I have two tables. This one is the initial table, which contains raw data (on Sheet 2):
And the second table (on Sheet 1) contains a formula based on data from the first table:
I use this formula to calculate the data, but as we can see in the picture, it doesn't produce the right result. Could you please help me to modify the formula?
=IFERROR(INDEX(Sheet2!$E$2:$E$12,MATCH(Sheet1!$B$1&Sheet1!B$2&Sheet1!$A3,Sheet2!$C$2:$C$12&Sheet2!$B$2:$B$12&Sheet2!$D$2:$D$12,0)),"")
First, add the auxiliary column, using the concatenation operator &:
Then the formula would be:
=VLOOKUP(B$2&$E$1&$A3;Sheet2!$A:$G;6;0)
Change 6 to 7 if you want the description instead of Activity.
Please try this formula. It should go into cell Sheet1!B3, where it must be confirmed with Ctrl+Shift+Enter because it's an array formula.
=IFERROR(INDEX(Table,MATCH(1,(INDEX(Table,,3)=$A$1)*(INDEX(Table,,2)=B$2)*(INDEX(Table,,4)=$A3),0),5),"")
For this formula to work, you first need to set up a named range called "Table" which comprises Sheet2!A2:Fxx. It is better to set this range up dynamically so that it expands as you add more data, but you can also declare it as Sheet2!A2:F1000, where 1000 is a number of rows you expect never to need.
This table has 6 columns, A:F. I intentionally included column A, which you don't need, so that range columns and sheet columns are identical. Table,,3 simply refers to the 3rd column. You can replace it with Sheet2!$C$2:$C$1000. If you do, make sure that all your ranges have identical sizes.
The 5 near the end of the formula, at ,0),5),"") identifies the 5th column of the range Table from which the result is returned if the 3 criteria match. Change this number to 6 to return the result from column F or to 1 if you ever need the value from column A.
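The MATCH(1, (...)*(...)*(...), 0) construct returns the first row where all three tests are true at once. A rough Python equivalent, with made-up rows and a hypothetical helper name lookup, looks like this (column numbers are 1-based to mirror the formula):

```python
def lookup(table, criteria, result_col):
    """Return the value in result_col of the first row matching all criteria, else ""."""
    for row in table:
        # Every (column, wanted value) test must hold,
        # like multiplying the boolean arrays in the formula
        if all(row[col - 1] == wanted for col, wanted in criteria.items()):
            return row[result_col - 1]
    return ""  # same fallback as IFERROR(..., "")


# Hypothetical 6-column table, for illustration only
table = [
    [1, "Mon", "North", "r1", "Paint", "Paint the wall"],
    [2, "Tue", "North", "r1", "Clean", "Clean the floor"],
    [3, "Mon", "South", "r2", "Fix", "Fix the door"],
]

activity = lookup(table, {3: "North", 2: "Tue", 4: "r1"}, 5)
missing = lookup(table, {3: "South", 2: "Tue", 4: "r1"}, 5)
```

Changing result_col from 5 to 6 returns the last column instead, just like changing the 5 in the formula.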
I am not into Excel and I have this problem: I am trying to sum the values of 2 different columns and put the result into a cell.
So basically I have column D containing 2 values (only 2 at the moment, but the number will grow without a specific limit, and I have to sum all the values in this column). These values are decimal values (in my example: 0,3136322400 and 0,1000000000).
Then I have column I containing the same type of value (only one at the moment, but the values in this column can also grow without a specific limit; in my example I currently have the value −0,335305).
Then I have cell K3, where I have to put the sum of all the values in column D and all the values in column I (following my example, it will contain the result of this sum: 0,3136322400 + 0,1000000000 − 0,335305).
Following a tutorial, I tried to set this simple formula in cell K3:
=SUM(A:I)
The problem is that this cell does not show the expected result (0.07832724); instead I get this value: 129236,1636322400.
It is very strange... I think it may be because columns D and I don't contain only numbers: both have a textual "header" (the string "QUANTITY" in both cells). So maybe it is also adding the numeric conversion of this string (but I am absolutely not sure about this).
So how can I handle this type of situation?
Can I do one of these 2 things:
1) Add the column values starting from a specific cell in the column (for example: sum all the values under a cell without specifying a lower limit).
2) Exclude the "header" cells from my sum in some way, so the textual values are not considered.
What could be a smart solution for my problem? How can I fix this issue?
The SUM function can take several arguments.
=SUM(D2:D10000, I2:I10000)
This leaves the headers out of the calculation and sums only columns D and I. Add further column ranges as extra arguments if needed.
If you turn your data into an Excel Table (Insert > Table), you can use structured referencing to address a table column, excluding the header.
=SUM(Table1[This Header],Table1[That Header])
Then you don't need to reference whole columns. If you add new data to the table, the formula will take that into account.
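As a quick sanity check of the expected total, the same "skip the header, add both columns" idea can be sketched in Python. The lists below reuse the three values quoted in the question; the function name is an assumption.

```python
def sum_below_header(*columns):
    """Add up every cell below the first (header) cell of each column."""
    return sum(value for column in columns for value in column[1:])


# Columns D and I from the question, each with its "QUANTITY" header
col_d = ["QUANTITY", 0.3136322400, 0.1000000000]
col_i = ["QUANTITY", -0.335305]

total = sum_below_header(col_d, col_i)
print(round(total, 8))
```

The rounded total should agree with the 0.07832724 the question expects, up to floating-point precision.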
I have two data sets which I need to compare. There is a column that is the common identifier between the two, but the 2nd data set, which is updated, has more rows than the 1st data set.
Here is how I extracted the data sets that I need:
What I'm trying to do is use columns D/I as the key, then see if columns C/H match. If they do not match I want that data returned or just highlighted.
I'm not very familiar with Excel, but the issue I see, in addition to what I described above, is that since the 2nd data set has more rows, it will return those as highlighted, which it doesn't need to.
Any help would be great!
If I understood your problem correctly, you may try
=C2=INDEX(H:H,MATCH(D2,I:I,0))
and extend / drag this formula down to check more values in column D.
The formula gives a result like this:
This formula looks up each value in D in column I, then compares the corresponding C and H values; it returns TRUE when they match and FALSE otherwise.
In other words: this formula checks whether a pair Cx-Dx exactly matches a pair Hy-Iy, where x and y are not necessarily equal.
E.g. (refer to the screenshot above)
C2-D2 matches with H2-I2
C3-D3 matches with H4-I4
C4-D4 matches with H3-I3
and C5-D5 matches with no pair in H:I range.
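The same key-based comparison can be sketched in Python: build a lookup from the key column of the second set, then compare values row by row for the first set. The data and the name compare_by_key are invented for illustration; a key that is missing from the second set is reported as None here, where the Excel formula would show #N/A.

```python
def compare_by_key(old, new):
    """For each (value, key) pair in old, report whether new holds the same value under that key."""
    new_by_key = {}
    for value, key in new:
        # Keep the first occurrence of each key, like MATCH(..., 0)
        new_by_key.setdefault(key, value)
    return [
        (key, new_by_key[key] == value if key in new_by_key else None)
        for value, key in old
    ]


# Hypothetical (value, key) pairs: columns C/D for old, H/I for new
old = [("x", 1), ("y", 2), ("z", 3), ("w", 4)]
new = [("x", 1), ("z", 3), ("y", 2), ("q", 4), ("p", 5)]

report = compare_by_key(old, new)
```

Extra rows in the second set (key 5 here) are never reported, because the loop only walks the first set.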
You can also use COUNTIFS, either in a separate column or in conditional formatting:
=COUNTIFS($I:$I,$D2,$H:$H,"<>"&$C2)
to highlight the first two columns and
=COUNTIFS($D:$D,$I2,$C:$C,"<>"&$H2)
to highlight the second two columns.