Appending two lists in excel - excel

I have been trying and searching how to append two lists in excel to use in a formula. The lists do not exist in columns, they are created using a formula. I want to combine the two lists in a single one, not to show the values but to use the new list in a formula. I am using excel 365 (UNIQUE function). Let me replace my initial text by a real small case.
I have an excel file with 3 work sheets. Sheet1 is:
Sheet2 is:
Now I want to run some analysis in Sheet3. In my example I want to count how many unique values from column A have column B containing one of the letters 'a', 'b, 'c', or 'd'. For instance, in Sheet1, the letter 'a' appears in all rows. Column A has 3 unique values. So my result for 'a' is 3. The letter 'b' does not appear for the case where column A is '3'. Therefore the result for 'b' is '2'.
So I create a Sheet3 to show my results. The first column contains a list of letters {a, b, c, d}. I then use the formula:
=COUNT(UNIQUE(FILTER(Sheet1!$A$1:$A$100, ISNUMBER(SEARCH(A1, Sheet1!$B$1:$B$100)))))
From inside out: the SEARCH function looks in cells B1 to B100 (I can live with specifying a larger range) where is the position of the value specified in column A (of the current sheet). If it does, then SEARCH returns a number. I check if the return value is a number (ISNUMBER) and use this to filter values in column A of Sheet1. I then apply the UNIQUE function to these values and finally count them.
Then I do the same with values in Sheet2. And it works. This is the output:
Column B is the number of unique values (as specified above) from Sheet1 and Column C the same from Sheet2.
So far so good. But now I want to have the counting of unique values globally. Not for each Sheet. One cannot just add the values from column B and C, as there might be an overlap. For example, the result for 'a' should be 3, not 5.
The solution here would be to grab the two unique lists (one from Sheet1 and the other from Sheet2), join them, UNIQUE this new list, and count. How do I join them ? That is my question.
Note that this 'counting of unique values' is just an example. I might want to find the maximum, or sort them, or find only prime numbers, or the average, or the median, or something else. So I need a general approach to join the results.
I got options close to a workable thing when all the data is in the same worksheet.
Finally, note that the data size I have is not huge, but it is large (thousands of lines at the most).

Here is something you could try:
=LET(x,{"A","B","C"},y,{"D","E"},z,CHOOSE({1,2},x,y),cnt,MAX(COUNTA(x),COUNTA(y)),seq,SEQUENCE(cnt*2),final,INDEX(z,MOD(seq-1,cnt)+1,CEILING(seq/cnt,1)),FILTER(final,NOT(ISERROR(final))))
Here both 'x' and 'y' variables are placeholders for your two (vertical) arrays. In this case I used: {"A","B","C"} and {"D","E"}. Assuming you just want to place the 2nd array directly under the 1st one, the above suggestion does just that:

Related

how to sum columns using column headers and bridge tables in excel

I have two sets of data in excel, set 1 is the raw data, and set 2 is a bridge table. The desired output is also added. How should I prepare for this formula.
set 1:
set 2:
output expected:
Here, a solution that assumes a variable number of headers and no specific pattern in the column names. Assumed no Excel version constraints as per tags listed in the question. In cell H1, put the following formula which spills the entire result all at once:
=LET(in, A1:F5, lk, A8:B12, header, DROP(TAKE(in,1),,1), A, TAKE(lk,,1),
B, DROP(lk,,1), data, DROP(in,1,1), REDUCE(TAKE(in,,1), UNIQUE(B),
LAMBDA(ac,bb, LET(f, FILTER(A, B=bb),values, CHOOSECOLS(data,XMATCH(f, header)),
sum, MMULT(values, SEQUENCE(ROWS(f),,1,0)), HSTACK(ac, VSTACK(bb, sum))))))
Here it the output:
We use LET function with two input ranges only: in, lk, so the rest of the names defined depend on such range names. It makes the formula easy to maintain and to adapt to your real scenario.
Using DROP and TAKE we extract each portion of the input ranges: header, data, A, B (columns from the second table). We use REDUCE/HSTACK pattern to concatenate the column of the result on each iteration. Check my answer from the question: how to transform a table in Excel from vertical to horizontal but with different length for more information.
We iterate by unique values of B and for each value (bb) we select the column A values (f). We use XMATCH to select the corresponding index columns from header (it doesn't include the date column). We use CHOOSECOOLS to select the corresponding columns from data (values). Now we need to sum by column, and we use MMULT for that. The result is in sum name. Finally, we use HSTACK to concatenate the selected columns one each iteration, including as header the unique values from B.
Note: Instead of MMULT function, you can use the following array function, it is a matter of personal preferences:
BYROW(values, LAMBDA(x, sum(x)))
You could try SUMIFS with the wild card character for each row. For example, for the first column, put the following formula and drag it down.
=SUMIFS($B2:$F2,$B$1:$F$1,"=A*")
Then do the same thing for the other columns, e.g. for column B:

Condensing nested if-statements with multiple criteria

The blue columns is the data given and the red columns is what is being calculated. Then the table to the right is what I am referencing. So, F2 will be calculated by the following steps:
Look at the Machinery column (D), if the cell contains LF, select column K, otherwise select column L
Look at the Grade column (E), if the cell contains RG, select rows 4:8, otherwise select rows 9:12.
Look at the Species column (A), if the cell contains MS, select rows 5 and 10, otherwise.......
Where every the most selected cell is in columns K and L, copy into column F.
Multiply column F by column C.
I don't want to make another column for my final result. I did in the picture to show the two steps separately. So column F should be the final answer (F2 = 107.33). The reference table can be formatted differently as well.
At first, I tried using nested-if statements, but realized that I would have like 20+ if statements for all the different outcomes. I think I would want to use the SEARCH function to find weather of not the cell contains a specific piece of information. Then I would probably use some sort of combination of match, if, v-lookup, index, search, but I am not sure how to condense these.
Any suggestion?
SUMPRODUCT is the function you need. I quickly created some test data on the lines of what you shared like this:
Then I entered the below formula in cell F2
=SUMPRODUCT(($I$4:$I$9=E2)*($J$4:$J$9=LEFT(A2,FIND(" ",A2)-1))*IF(ISERROR(FIND("LF",D2,1)),$L$4:$L$9,$K$4:$K$9))
The formula may look a little scary but is indeed very simple as each sub formula checks for a condition that you would want to evaluate. So, for example,
($I$4:$I$9=E2)
is looking for rows that match GRADE of the current row in range $I$4:$I$9 and so on. The * ensures that the arrays thus returned are multiplied and only the value where all conditions are true remains.
Since some of your conditions require looking for partial content like in Species and Machine, I have used Left and Find functions within Sumproduct
This formula simply returns the value from either column K or L based on the matching conditions and you may easily extend it or add more conditions.

Trying to specify multiple wildcards in a countifs function for Excel

I am working on a data sheet that has almost 300,000 rows by about 40 columns.
I have a countifs function to count the number of rows that have an entry ranging from "A1" through "A5" for each letter A-G in a particular column.
I have broken out analysis on separate sheets to pull data for each row for each separate letter A-G using countifs(range,"other data","F?") (I know its simplified).
I need to create a new sheet that excludes any row with an A value in it.
I tried countifs(range,"other data", range,{"B?","C?","D?","E?","F?","G?"}) and it only returns the count for the outside values (B and G), how do I get Excel to count all of those other values as well? I would like to keep this format because to create the sheets for B-G, I just used the find and replace to replace "A?" with "B?" and so on for the other sheets.
I would like to just replace "B?" with whatever works to count the number of rows that have B-G in that particular column.
You countifs formula, with an array constant for criteria, returns an array of values. But what you want is the SUM of that array. So:
sum(countifs(range,"other data", range,{"B?","C?","D?","E?","F?","G?"}))
Without the sum function, you will only see the value of the first element of that array.
I have a feeling this is the wrong answer, but I'll say it anyway. Why can't you use
=COUNTIFS(Range,"<>A?")
Or are there other possible values that you want to exclude?
In which case you should be able to use this for A
=COUNTIFS(Range,">=B1",Range,"<=G5")
and for B1-B5
=COUNTIFS(Range,">=A1",Range,"<=A5")+COUNTIFS(Range,">=C1",Range,"<=G5")
which can be modified for C, D, E and F
and this for G
=COUNTIFS(Range,">=G1",Range,"<=G5")

Sort Order formula to alphabetise in Excel

I am currently drawing up a spreadsheet that will automatically remove duplicates and alphabetize a list:
I am using the COUNTIF() function in column G to create a sort order and then VLOOKUP() to find the sort in column J.
The problem I am having is that I can't seem to get my SortOrder column to function properly. At the moment it creates an index for two number 1's meaning the cell highlighted in yellow is missed out and the last entry in the sorted list is null:
If anyone can find and rectify this mistake for me I'll be very grateful as it has been driving me insane all day! Many thanks.
I'll provide my usual method for doing an automatic pulling-in of raw data into a sorted, duplicate-removed list:
Assume raw data is in column A. In column B, use this formula to increase the counter each time the row shows a non-duplicate item in column A. Hardcord B2 to be "1", and use this formula in B3 and drag down.
=if(iserror(match(A3,$A$2:A2,0)),B2+1,B2)
This takes advantage of the fact that when we refer to this row counter in our revised list, we will use the match function, which only checks for the first matching number. Then say you want your new list of data on column D (usually I do this for display purposes, so either 'group-out' [hide] columns that form the formulas, or do this on another tab). You can avoid this step, but if you are already using helper columns I usually do each step in a different column - easier to document. In column C, starting in C3 [C2 hardcoded to 1] and drag down, just have a simple counter, which error-checks to the stop at the end of your list:
=if(C2<max(B:B),C2+1," ")
Then in column D, starting at D2 and dragged down:
=iferror(index(A:A,match(C2,B:B,0)),"")
The index function is like half of the vlookup function - it pulls the result out of a given array, when you provide it with a row number. The match function is like the other half of the vlookup function - it provides you with the row number where an item appears in a given array.
Hope this helps you in the future as well.
The actual reason that this is going wrong as implied by Jeeped's comment is that you can't meaningfully compare a string to a number unless you do a conversion because they are stored differently. So COUNTIF counts numbers and text separately.
20212 will give a count of 1 because it is the only (or lowest) number.
CS10Z002 will give a count of 1 because it is the first text string in alphabetical order.
Another approach is to add the count of numbers to the count if the current cell contains text:-
=COUNTIF(INDIRECT("$D$2:$D$"&$F$3),"<="&D2)+ISTEXT(D2)*COUNT(INDIRECT("$D$2:$D$"&$F$3))
It's easier to show the result of three different conversions with some test data:-
(0) No conversion - just use COUNTIF
=COUNTIF(D$2:D$7,"<="&D2)
"999"<"abc"<"def", 999<1000
(1) Count everything as text
=SUMPRODUCT(--(D$2:D$7&""<=D2&""))
"1000"<"999"
(2) Count numbers before text
=COUNTIF(D$2:D$7,"<="&D2)+ISTEXT(D2)*COUNT(D$2:D$7)
999<1000<"999"
(3) Count everything as text but convert numbers with leading zeroes
=SUMPRODUCT(--(TEXT(D$2:D$7,"000000")<=TEXT(D2,"000000")))
"000999" = "000999", "000999"<"001000"

comparing two columns in excel sheet (text/string) and return the matched element in third column

I have two columns in excel which I am trying to compare with one another and result the common element in third column. For example my sheet looks like
How do I compare Column D with E and if there is a matching string it will be printed in column F.
Edit 1: What function should I use to compare both case sensitive and non-sensitive strings.
This is kind of crude, but will tell you if the word in the first cell is in the 2nd, you can vary the left with mid, or right depending on your values.
=FIND(LEFT(C199,FIND(" ",C199,1)),O199,1)
In cell F1 place this formula:
=IF(ISERROR(MATCH(D1,$E$1:$E$100,0)),"",D1)
Then copy it down. This will show all non-case sensitive matches (for a column two list that has 100 values. Change the 100 to however long your column two list really is.)
To do the case-sensitive comparison try this:
=IF(EXACT(D1,LOOKUP(D1,$E$1:$E$100)),D1,"")

Resources