Not sure how to formulate my question so I'll start with an example.
I have a dataset that looks like this (the whole dataset has ~150,000 rows); I highlighted some of the values for easier viewing:
What I want to do is count how many times "Johnson, Jim, 10" and "Gordon, Tom, 15" appear in the same SUBDATA set where the first column cells have the same value.
I counted how many times the respective values appear in the whole data set separately using COUNTIFS:
But what I want is to somehow count how many times both records ("Johnson, Jim, 10" and "Gordon, Tom, 15") appear IF the first column value is the same (the value is a DATE in the format YYYYMMDD).
I don't want to give the formula a specific date, e.g. 20221005, and check whether those 2 records appear together, because I have hundreds of dates. I just want to somehow tell the formula to CHECK, whenever it finds either of those two records, whether there is an instance where the DATE is the same, and count it.
For example, if "Johnson, Jim, 10" and "Gordon, Tom, 15" both have the same value (20221010) in their respective A cells, count it, and find how many times this happens in the whole dataset.
I would like to know if it's possible to do this using only formulas, without using Macros.
PS: Sorry if I didn't make myself clear enough, I will answer every question you have.
Here is a long way round:
Formula in K2:
=LET(x,FILTER(A1:A12,BYROW(B1:D12,LAMBDA(a,SUM(--(MMULT(--(a=H1:J2),{1;1;1})=3))))),COUNT(UNIQUE(FILTER(x,MAP(x,LAMBDA(y,SUM(--(x=y))))=2))))
Assuming there are no duplicated values within each date group (if there are, the formula can be adjusted), you can use the following formula in I3:
=LET(dates, A1:A12, values, B1:B12&C1:C12&D1:D12, lkUp, F2:F3&G2:G3&H2:H3,
REDUCE(0, UNIQUE(dates), LAMBDA(acc,ux, LET(
set, FILTER(values, dates=ux), match, IF(COUNT(XMATCH(lkUp, set))=2, 1, 0),
acc + match)))
)
and here is the output:
What it does: for each unique date, it filters the rows for that date and uses the concatenated values with XMATCH to find the positions of the lookup values. If both lookup values (lkUp) are found, the date is counted as 1, otherwise 0. REDUCE sums all the matches.
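For readers who want to check the logic outside Excel, here is a rough Python sketch of the same per-date test. It assumes each row is a (date, surname, forename, number) tuple mirroring columns A:D, and the sample rows are made up for illustration. Using a set per date means a record repeated on the same date is only counted once, which sidesteps the no-duplicates assumption above.

```python
from collections import defaultdict


def count_shared_dates(rows, record_a, record_b):
    """Count the dates on which both records appear at least once.

    rows: iterable of (date, surname, forename, number) tuples (assumed layout).
    """
    # Group the (surname, forename, number) triples by date, like
    # UNIQUE(dates) followed by FILTER(values, dates=ux) in the formula.
    by_date = defaultdict(set)
    for date, *record in rows:
        by_date[date].add(tuple(record))
    # A date contributes 1 when both lookup records are present, otherwise 0.
    return sum(1 for records in by_date.values()
               if record_a in records and record_b in records)


rows = [
    (20221005, "Johnson", "Jim", 10),
    (20221005, "Smith", "Ann", 12),
    (20221010, "Johnson", "Jim", 10),
    (20221010, "Gordon", "Tom", 15),
    (20221012, "Gordon", "Tom", 15),
    (20221015, "Gordon", "Tom", 15),
    (20221015, "Johnson", "Jim", 10),
]
print(count_shared_dates(rows, ("Johnson", "Jim", 10), ("Gordon", "Tom", 15)))
```

With these sample rows the dates 20221010 and 20221015 qualify, so the count is 2.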
I am trying to make a formula that calculates the maximum sum of any number of consecutive days that I specify in a cell. Here is the dataset and the formula:
Dataset
The formula that calculates the maximum sum of three consecutive days:
=MAX(IFERROR(INDEX(
INDEX(E2:AI2,0)+
INDEX(F2:AI2,0)+
INDEX(G2:AI2,0),
0),""))
As you can see, the number of days is determined by the number of INDEX rows in the formula. The only difference between these rows is the starting column letter (E, F, G). Is there any way I could reference a cell containing the number of days, instead of adding more rows to the formula?
Another approach, avoiding the use of OFFSET, is to use SCAN to generate an array of running totals, then subtract totals that are N elements apart (where N is the number of consecutive cells to be added):
=LET(range,E2:AI2,
length,A1,
runningTotal,SCAN(0,range,LAMBDA(a,b,a+b)),
sequence1,SEQUENCE(1,COLUMNS(range)-length+1,length),
sequence2,SEQUENCE(1,COLUMNS(range)-length+1,0),
difference,INDEX(runningTotal,sequence1)-IF(sequence2,INDEX(runningTotal,sequence2),0),
MAX(difference))
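A minimal Python sketch of the same running-total idea, assuming the row is a plain list of numbers (blank cells entered as 0, which is how SCAN's addition treats them). A leading 0 is prepended so the first window needs no special case, playing the role of the IF(sequence2, ..., 0) branch.

```python
from itertools import accumulate


def max_window_sum(values, length):
    """Largest sum of `length` consecutive values, via running totals."""
    if not 1 <= length <= len(values):
        raise ValueError("length must be between 1 and len(values)")
    # Like SCAN(0, range, LAMBDA(a, b, a + b)), with a 0 prepended.
    running = [0, *accumulate(values)]
    # sum(values[i:i + length]) == running[i + length] - running[i]
    return max(running[i + length] - running[i]
               for i in range(len(values) - length + 1))


print(max_window_sum([1, 5, 2, 8, 3], 3))
```

Here the windows sum to 8, 15 and 13, so the result is 15.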
The answer here was posted by another user on another website, so I will repost it here:
One way to achieve this without relying on a VBA solution would be to use the BYCOL() function (available for Excel for Microsoft 365):
=BYCOL(array, [function])
The array specifies the range to which you want to apply your function, and the function itself is specified in a lambda statement. In the end, you want to get the minimum value of the sum of x consecutive days. Assuming that your data is stored in the range E2:AI2 and the number of consecutive days is stored in cell A1, the function looks like this:
=MIN(BYCOL(E2:AI2,LAMBDA(col,SUM(OFFSET(col,,,,A1)))))
The MIN() part ensures that only the smallest of the sums (all sums of x consecutive values) is returned. The array is simply the range in which your data is stored; BYCOL passes each of its columns to the lambda under the name col, which is then used by that name. In your case, you want to apply the sum function for, e.g., x = 4 consecutive days (where 4 is stored in cell A1).
However, with this simple specification, you run into the problem of offsetting beyond the cells with values toward the right end of the data. This means that the last sum you get would be 81.8 (the value on 31 Jan) + 3 times 0, because the cells are empty. To avoid this, you can wrap the sum in an IF() statement that returns an empty string instead whenever the window contains at least one empty cell. The adjusted formula looks like this:
=MIN(BYCOL(E2:AI2,
LAMBDA(col,IF(COUNTIF(OFFSET(col,,,,A1),"")>0,"",SUM(OFFSET(col,,,,A1))))))
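The windowing logic of the BYCOL/OFFSET formula can be sketched in Python as follows, assuming None stands in for an empty cell. Windows that contain an empty cell, or run past the end of the data, are skipped, much as the "" results are ignored by MIN.

```python
def min_window_sum(cells, width):
    """Smallest sum of `width` consecutive cells, skipping incomplete windows.

    cells: list of numbers, with None standing in for an empty cell.
    Returns None if no complete window exists.
    """
    sums = []
    for start in range(len(cells)):
        # OFFSET(col,,,,A1): a window `width` cells wide starting at this column.
        window = cells[start:start + width]
        # COUNTIF(...,"")>0 -> "": a short window (ran past the data) or one
        # containing a blank is dropped instead of being summed as zeros.
        if len(window) < width or None in window:
            continue
        sums.append(sum(window))
    return min(sums) if sums else None


print(min_window_sum([4, 1, 3, 2, None, 5], 2))
```

The complete windows here sum to 5, 4 and 5, so the result is 4.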
If you do not have the Microsoft 365 version, there are two approaches that would also work. However, they are a bit more tedious, especially for larger numbers of days, because the number of days cannot really be set automatically (except perhaps by constructing the ranges with a combination of ADDRESS() and INDIRECT()). I would still argue they are a bit neater than your current specification:
=MIN(INDEX(E2:AF2+F2:AG2+G2:AH2+H2:AI2,0))
=SUMPRODUCT(MIN(E2:AF2+F2:AG2+G2:AH2+H2:AI2))
The idea regarding the ranges is the same in both scenarios, with a shift in the start and end of the range by 1 for each additional day.
Another approach getting to the same result:
=LET(range,E2:AI2,
cons,4,
repeat,COLUMNS(range)-cons+1,
MAX(
BYROW(SEQUENCE(repeat,cons,,1)-INT(SEQUENCE(repeat,cons,0,1/cons))*(cons-1),
LAMBDA(x,SUM(INDEX(range,1,x))))))
This avoids OFFSET (which is volatile and slows your file down), and the repeat value, the number of consecutive days and/or the range are easy to change.
Hope it helps (I answered for the max sum, as stated in the title). Change MAX to MIN to get the min sum result.
Edit:
I changed the repeat part in the formula to be dynamic (max number of consecutive columns in range), but you can replace it by a number or a cell reference.
The cons part can also be linked to a cell reference.
Also found a bug in my formula, which is now fixed.
This is what I am trying to figure out:
IF date in cell matches dates in range
and
If name in cell matches names in range
then
count/sum the number of unique ID#s
This is the formula I have:
=IF(Data!A:A=E10,(IF(Data!D:D=D11,(IF(Data!D:D=D11,SUM(IF(FREQUENCY(Data!C:C,Data!C:C)>0,1)),"ERROR3")),"ERROR2")),"ERROR1")
It does not output the correct info. It either counts all the unique IDs or it errors out when it should have a result.
I hope I am on the right track, thank you for any help.
Sample dataset:
Try it as,
=SUMPRODUCT(SIGN((B$2:B$10>=E2)*(B$2:B$10<=F2))/
(COUNTIFS(B$2:B$10, ">="&E2, B$2:B$10, "<="&F2, A$2:A$10, A$2:A$10)+(B$2:B$10<E2)+(B$2:B$10>F2)))
First let me say that the question was pretty confusing before you posted an image of the data, as it appears that the term "dates in range" was completely misleading. In fact you are trying to match exact dates, not "ranges of date".
FREQUENCY is useful to detect the first appearance of an item in a column, but unfortunately, this "artificial trick" is not flexible enough to be mixed easily with other criteria, and most importantly FREQUENCY is not array friendly.
There's another method to achieve your goal:
=SUMPRODUCT(((Data!$A$1:$A$24=E$10)*(Data!$C$1:$C$24=$D11))/
COUNTIFS(Data!$A$1:$A$24,Data!$A$1:$A$24,Data!$B$1:$B$24,Data!$B$1:$B$24,Data!$C$1:$C$24,Data!$C$1:$C$24))
You can enter this formula in E11 in your sample image and copy/paste in the whole matrix.
The denominator of the formula (the second line) generates an array that holds, for each row, the number of rows identical to it (its duplicate count).
The numerator sets the criteria. A matching row that appears n times contributes 1 in the numerator n times, and each of those is divided by n in the denominator, so each distinct matching row is counted exactly once in total.
As a result, we obtain the number of "unique rows" that match the criteria.
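Here is a Python sketch of the same 1/COUNTIFS idea, assuming each row is a (date, id, name) tuple where the date and name play the roles of the Data!A and Data!C columns compared in the numerator; the layout and sample rows are assumptions. Fractions are used so the partial shares add up exactly.

```python
from collections import Counter
from fractions import Fraction


def count_unique_matching(rows, date, name):
    """Distinct rows matching date and name, via SUMPRODUCT(criteria / COUNTIFS)."""
    # Denominator: for each row, how many rows are identical to it.
    duplicates = Counter(rows)
    total = Fraction(0)
    for row in rows:
        matches = row[0] == date and row[2] == name  # numerator criteria
        total += Fraction(int(matches), duplicates[row])
    return int(total)


rows = [
    ("2023-01-05", 101, "Ann"),
    ("2023-01-05", 101, "Ann"),  # duplicate: each copy contributes 1/2
    ("2023-01-05", 102, "Ann"),
    ("2023-01-05", 103, "Bob"),
    ("2023-01-06", 101, "Ann"),
]
print(count_unique_matching(rows, "2023-01-05", "Ann"))
```

The two copies of the first row add 1/2 each and the third row adds 1, giving 2.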
The formula should not use complete columns such as A:A; make the effort to limit the ranges to a reasonable number of rows, say A1:A999 or so. Complex formulas involving arrays should avoid entire columns as much as possible.
EDIT: I have revised the source data to remove the ambiguity of my last screenshots
I am trying to transpose spreadsheet data where there are many rows where the customer name may be duplicated but each row contains a different product.
For instance
revised original data source
to
revised proposed data format
I would like to do it with formulae if possible as I struggle with VB
Thank you for any help
I realise this is a huge answer, apologies but I wanted to be clear. If you need anything from me, drop me a comment and I'll help out.
Here's the output from my formula:
EDITED ANSWER - Named ranges used for ease of understanding:
These are just a few examples of the named ranges I have used; you can reference the ranges directly or name them yourself (the simplest way is to highlight the data, then type the name into the Name Box next to the formula bar [top left]).
Be wary that, as we will be using array formulas for AccNum and AccType, you will not want to select the entire column; instead, opt for either the exact data length or overshoot it by 100 or so. Large array formulas tend to slow down calculation, as they evaluate every cell individually regardless of whether it is empty.
First formula
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Account Number ",LEFT((COLUMN(A:A)+1)/2,1)),"")
This formula is identical to the one in the original answer apart from the adjusted heading title.
=IF(Condition,True,False) - There are so many uses for the IF logic; it is the best formula in Excel in my opinion. I have used IF with COUNTIF to check whether there are more than 0 cells that are "greater than" blank (""). This is just a trick to avoid ISBLANK() or other blank identifiers, which get confused when a formula is present.
If the result is TRUE, I use CONCATENATE(Text1,Text2,etc.) to build a text string for the column header. ROW(1:1) or COLUMN(A:A) is commonly used to generate an automatically increasing integer, depending on whether the count should increase vertically or horizontally. I add 1 to this increasing integer and divide it by 2 so that the value increases by 0.5 for each column (1 > 1.5 > 2 > 2.5). I then use the LEFT function to take just the first digit of this decimal result, so the number increases only once every 2 columns.
If the result is FALSE, the cell is left blank (the final ,""). Standard stuff here, no explanation needed.
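To see the header-numbering trick in action, the following Python sketch mimics LEFT((COLUMN()+1)/2,1) and the ROUNDDOWN variant used in the third formula (the function names are illustrative). It also makes one limitation visible: because LEFT keeps only the first character, from the 19th column onwards (result 10) the LEFT version shows 1 again, while ROUNDDOWN keeps counting.

```python
def header_number_left(column):
    """LEFT((COLUMN()+1)/2, 1): first character of the number shown as text."""
    value = (column + 1) / 2
    # Excel shows 1.5 as "1.5" and 10 as "10"; mimic that text conversion.
    text = str(int(value)) if value.is_integer() else str(value)
    return int(text[0])


def header_number_rounddown(column):
    """ROUNDDOWN((COLUMN()+1)/2, 0)."""
    return (column + 1) // 2


print([header_number_left(c) for c in range(1, 9)])
print(header_number_left(19), header_number_rounddown(19))
```

The first line prints [1, 1, 2, 2, 3, 3, 4, 4]; the second prints 1 10.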
Second Formula
=CONCATENATE(INDEX(Forename,MATCH(Sheet4!$A2,Reference,0)))
=CONCATENATE(INDEX(Surname,MATCH(Sheet4!$A2,Reference,0)))
CONCATENATE has only been used here to force blank cells to remain blank when pulled by INDEX. INDEX will read blank cells as values and therefore 0's whereas CONCATENATE will read them as text and therefore "".
INDEX(Range,Row,Column): This is a lookup formula that, combined with MATCH, is more flexible than VLOOKUP or HLOOKUP and not limited in the way that they are.
The range I have used is the expected output range: Forename or Surname.
The row is then calculated using MATCH(Criteria,Range,Match Type). Match will look through a range and return the position as an integer where a match occurs. For this I have set the criteria to the unique reference number in column A for that row, the range to the named range Reference and the match type as 0 (1 Less than, 0 Exact Match, -1 Greater than).
I did not define a column number for INDEX as it defaults to the first column and I am only giving it one column of data to output from anyway.
Third Formula
Remember these need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter)
=IFERROR(INDEX(AccNum,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0))),"")
=IFERROR(INDEX(AccType,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))),"")
As you can see, one of these is used for AccNum and the other for AccType.
IFERROR(Value,Value_if_error): This has been used because we are not expecting the formula to always return something. When the formula cannot return anything, or SMALL has run out of matches to go through, an error will occur (usually #VALUE! or #NUM!), so I use ,"") to force a blank result instead (again, standard stuff).
I have already explained the INDEX formula above, so let's dive into how I have worked out the rows that match what we are looking for:
SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))
The IF statement here is fairly self-explanatory, but because we have entered it as an array formula, it compares every cell in the named range Reference individually against Sheet4!$A2, the unique reference. In your mock data this returns {FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} for the first entry (I included titles in the range, hence the initial FALSE). IF will do my row calculation* for every TRUE but leave the FALSEs as they are.
This leaves a result of {FALSE;2;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} that SMALL(array,k) will use. SMALL only works on numeric values and returns the k-th smallest one. The column trick has been used again, but to cover more ground I used another method: ROUNDDOWN(Number,Digits) instead of LEFT(). Digits here means decimal places, so I used 0 to round down to a whole integer for the same result. As this copies across the columns like so: 1, 1, 2, 2, 3, 3, SMALL will alternately (as the formulas alternate) grab the 1st AccNum, then the 1st AccType, before grabbing the 2nd AccNum and AccType, and so forth.
*(The row number of the match minus the first row number of the range, plus 1; this is a fairly common, foolproof way to always get the correct position regardless of where the data starts. Actually, as your data starts on row 1 we could just use ROW(Reference), but I left it as is in case your data is in a different layout.)
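A Python sketch of what the staggered SMALL(IF(...)) formulas produce for one unique reference. The lists stand in for the named ranges Reference, AccNum and AccType (with a title in the first position, as in the mock data), and the sample values and function names are made up for illustration.

```python
def kth_match_position(reference, key, k):
    """SMALL(IF(Reference=key, ROW(...)-ROW(first)+1), k), 1-based.

    Returns None where Excel would return an error (shown blank by IFERROR).
    """
    positions = [i + 1 for i, value in enumerate(reference) if value == key]
    return positions[k - 1] if 1 <= k <= len(positions) else None


def spread_row(reference, acc_num, acc_type, key, n_cols):
    out = []
    for col in range(1, n_cols + 1):
        k = (col + 1) // 2  # ROUNDDOWN((COLUMN(A:A)+1)/2, 0)
        pos = kth_match_position(reference, key, k)
        source = acc_num if col % 2 == 1 else acc_type  # odd: AccNum, even: AccType
        out.append("" if pos is None else source[pos - 1])
    return out


reference = ["Reference", "A1", "B2", "A1"]
acc_num = ["AccNum", "111", "222", "333"]
acc_type = ["AccType", "Cheque", "Savings", "ISA"]
print(spread_row(reference, acc_num, acc_type, "A1", 6))
```

For reference "A1" this prints ['111', 'Cheque', '333', 'ISA', '', ''].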
ORIGINAL ANSWER - Same logic as above
Here's your solution in 3 parts
Part 1 is a trick for auto-completing the titles so that they hide when not used (in case you just copy and paste the whole lot as values to speed things up again).
=IF(COUNTIF(C2:C11,">""")>0,CONCATENATE("Product ",LEFT((COLUMN(A:A)+1)/2,1)),"") in C
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Prod code ",LEFT((COLUMN(B:B)+1)/2,1)),"") in D
Highlight both of the cells and drag across to stagger the outputs "Product " and "Prod code "
Part 2 would be inputting the unique IDs to the new sheet, I would suggest copying your entire column A across to a new sheet and using DATA > REMOVE DUPLICATES > Continue with current selection to trim out the multiple occurrences of unique IDs.
In column B use =INDEX(Sheet2!$B$1:$B$7,MATCH(Sheet4!$A2,Sheet2!$A$1:$A$7,0)) to get the names pulled across.
Part 3, the INDEX
Once again, we are doing a staggered input here before copying the formula across the page to cover the entirety of the data.
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0)),1),"") in C
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0)),2),"") in D
The formulas of Part 3 will need to be entered as an array (when in the formula bar, hit Ctrl+Shift+Enter). This will need to be done before copying the formulas across.
These formulas can now be dragged / copied in all directions and will feed off of the unique ID in column A.
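The overall reshaping (unique IDs in column A, the name lookup, then product/code pairs spread across the columns) can be sketched in Python like this, assuming long-format rows of (customer_id, name, product, prod_code); the field names and data are hypothetical.

```python
def to_wide(rows):
    """Long rows -> one row per customer: [id, name, product 1, code 1, ...]."""
    wide = {}  # dicts keep insertion order, like REMOVE DUPLICATES keeping first occurrences
    for cust_id, name, product, code in rows:
        # Part 2: the first occurrence defines the row and the name (INDEX/MATCH).
        entry = wide.setdefault(cust_id, [cust_id, name])
        # Part 3: each further match adds another product/code pair.
        entry.extend([product, code])
    width = max((len(e) for e in wide.values()), default=0)
    # Pad short rows with "" like the IFERROR(..., "") blanks.
    return [e + [""] * (width - len(e)) for e in wide.values()]


rows = [
    (1, "Ann", "Widget", "W1"),
    (2, "Bob", "Gadget", "G1"),
    (1, "Ann", "Gizmo", "Z9"),
]
for line in to_wide(rows):
    print(line)
```

Customer 1 ends up with two product/code pairs and customer 2 with one pair plus two blanks.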
My answer is already rather long, so I haven't gone on to break the formula down. If you have any trouble understanding how this works, let me know and I will be happy to write up a quick guide, breaking it down chunk by chunk for you.
I have a (large) array of data in Excel of which I need to compute the average value of certain values in one column, based on the values of another column. For example, here's a snippet of my data:
So specifically, I want to take the average of the F635 mean values corresponding with Row values of 1. To take it a step further, I want this to continue to Row values of 2, Row values of 3 etc.
I'm not familiar with how to run code in Excel but have attempted to solve this by using the following:
=IF($C = "1", AVERAGE($D:$D), "")
which (to my understanding) can be interpreted as "if the values (anywhere) in column C are equal to 1, then take the average of the corresponding values in column D."
Of course, as I try this I get a formula error from Excel.
Any guidance would be incredibly appreciated. Thanks in advance.
For more complicated cases, I would use an array-formula. This one is simple enough for the AVERAGEIF formula. For instance =AVERAGEIF(A1:A23;1;B1:B23)
Array formulas allow for more elaborate ifs. To replicate the above, you could do =SUM(IF($A$1:$A$23=1;$B$1:$B$23;0))/COUNT(IF($A$1:$A$23=1;$B$1:$B$23)). Leave out the ;0 inside COUNT: non-matching rows then return FALSE, which COUNT ignores, whereas a 0 would be counted.
It looks like more work, but you can create extremely elaborate if-statements. Instead of hitting ENTER, press CTRL+SHIFT+ENTER when entering the formula. Use * between criteria to replicate AND, or + for OR. Example: SUM(IF(($A$1:$A$23="apple")*($B$1:$B$23="green");$C$1:$C$23;0)) tallies values for green apples in C1:C23.
Your sample data includes three columns with potential ifs so my guess is that you're going to need array formulas at some point.
Excel already has a built-in function for exactly this use: AVERAGEIF().
=AVERAGEIF(C:C,1,D:D)
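Both suggestions compute a per-group average. A Python sketch of the same thing for every Row value at once, assuming the data is a list of (row_value, f635_mean) pairs (the numbers are invented). Note that, as in the corrected array formula, only matching rows are counted in each denominator.

```python
from collections import defaultdict


def averages_by_row(pairs):
    """Return {row_value: average}, i.e. AVERAGEIF(C:C, r, D:D) for each distinct r."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row_value, f635 in pairs:
        sums[row_value] += f635
        counts[row_value] += 1
    return {r: sums[r] / counts[r] for r in sums}


print(averages_by_row([(1, 10.0), (1, 20.0), (2, 5.0), (3, 1.0), (3, 2.0)]))
```

This prints {1: 15.0, 2: 5.0, 3: 1.5}.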
I am having no luck trying to count my non-blank cells in Excel. I have tried multiple formulas and I keep getting inaccurate data. Here is the situation:
I am creating a Preventative Care list for my physicians (I have 4). I created a list of all their patients who have received a letter re: Prev. Care, and I am inputting who needs a second letter and who has results. Some results are negative, some are positive. I have the list set up in alphabetical order by patient last name, and I have their physician's initials in another column. I want to see what percentage of each doctor's patients have done their prev. care, calculated separately for each doctor. Unfortunately, the cells are not in order. I have tried everything to my knowledge.
Help!
This will give you how many blank cells you have. You can deduct this from the total number of cells in your column, or you could use this directly to compute your percentage as (1 - x) where x is percentage of blank cells.
=COUNTBLANK(<your column>)
E.g:
=COUNTBLANK(A1:A10)
If it's not immediately obvious how to count the total number of cells in a range, this formula should help: it counts the non-blank cells directly, answering the original question fully. It works with ranges that span more than one column, too.
=ROWS(range)*COLUMNS(range)-COUNTBLANK(range)
You might try something like:
=IF(LEN(A1) > 0, 1, 0)
You can then sum that column or do whatever other calculation you need.
=COUNTIF(range,"<>"&"") will count all cells that do not have a value equivalent to "", so, basically anything that is not blank.
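Tying these together for the original question (a percentage per physician), here is a Python sketch assuming each row is a (physician_initials, result) pair with None or "" for a blank result; the initials and results are invented.

```python
from collections import defaultdict


def completion_by_physician(rows):
    """Return {initials: percentage of that physician's patients with a result}."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for initials, result in rows:
        totals[initials] += 1
        # Same idea as COUNTIF(range, "<>") or LEN(cell) > 0: anything non-empty counts.
        if result not in (None, ""):
            filled[initials] += 1
    return {doc: 100 * filled[doc] / totals[doc] for doc in totals}


rows = [
    ("AB", "Negative"),
    ("CD", None),
    ("AB", ""),
    ("CD", "Positive"),
    ("AB", "Positive"),
    ("EF", ""),
]
print(completion_by_physician(rows))
```

AB has 2 of 3 patients with results (about 66.7%), CD has 1 of 2 (50%), and EF has none (0%).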