Count values in column ignoring duplicates

Not really an Excel user, but what seemed simple has turned out to be very difficult for me. I am in trouble as I can't come up with a nice and clean (or any) way to get it working.
What I have here:
I need to create a new column that shows the number of employees in each occupation, ignoring the duplicates (highlighted).
The formula that counts names already works, so maybe it can be used? Or maybe it's just in the way and should be removed.
It's just:
=COUNTIFS(A:A;A2)
I searched for quite a while but did not find anything suitable. Any help or advice would be much appreciated. I hope I explained it clearly.
Thank you

Without helper columns:
Two options, D2:
{=SUM(--(FREQUENCY(IF($B$2:$B$9=C2,MATCH($A$2:$A$9,$A$2:$A$9,0)),ROW($A$2:$A$9)-ROW($A$2)+1)>0))}
Or put in E2:
{=SUMPRODUCT((($B$2:$B$9=C2))/COUNTIFS($B$2:$B$9,$B$2:$B$9&"",$A$2:$A$9,$A$2:$A$9&""))}
Notice both are array formulas and should be entered with Ctrl+Shift+Enter.
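For anyone who finds the FREQUENCY/MATCH formula hard to read, here is a rough Python sketch of the same idea. It is only an illustration of the logic, not a replacement for the formula; the function name and the sample data are made up.

```python
def distinct_names_in_occupation(names, occupations, target):
    # MATCH(name, names, 0): position of the first row that holds each name
    first_pos = {}
    for i, name in enumerate(names):
        first_pos.setdefault(name, i)
    # IF(occupation = target, first position): keep positions of matching rows only
    positions = {first_pos[n] for n, o in zip(names, occupations) if o == target}
    # SUM(--(FREQUENCY(...) > 0)): how many distinct first positions were hit
    return len(positions)


# Hypothetical sample data (names in column A, occupations in column B)
names = ["Ann", "Ann", "Bob", "Cy", "Cy", "Cy", "Dee", "Bob"]
occupations = ["Tech", "Tech", "Tech", "Worker", "Worker", "Worker", "Worker", "Tech"]

print(distinct_names_in_occupation(names, occupations, "Tech"))
print(distinct_names_in_occupation(names, occupations, "Worker"))
```

Each name is reduced to the row where it first appears, so repeated rows collapse onto one position before counting.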

SUMPRODUCT 'Deals' in Arrays 3
You might have employees with the same name (David, Michael) in different occupations (Tech & Worker, Tech & Economy). To distinguish those from each other, in B2 you can use:
=SUMPRODUCT((A$2:A$21=A2)*(C$2:C$21=C2))
In D2 you can use:
=SUMPRODUCT((1/B$2:B$21)*(C$2:C$21=C2))

Distinct Employees Occupation Count
with a helper column
Unique and Distinct values are tricky. Using a helper column is beneficial for identifying either one of these when coupled with an expanding range:
=SUMPRODUCT((A2=$A$1:$A1)*(C2=$C$1:$C1))
Relative rows: the final 1 in $A$1:$A1 and in $C$1:$C1.
Paste to cell E2.
Copy or drag the formula down from where it was pasted.
The relative row numbers identified above will increase as the formula is copied down. This creates a larger and larger range for comparison: an expanding range.
In this case the range that is expanding is the range of already checked values. Many times the result range is expanded and tested against to eliminate posting duplicates of already posted results in subsequent rows of the results list.
The helper column's value is how many times the name and occupation pair has previously appeared. Zero previous appearances tells us this is the first occurrence. We will only count the zeros (first occurrences) in the main formula.
The main formula for counting distinct employees in each occupation:
=COUNTIFS( $C$2:$C$9, C2, $E$2:$E$9, 0)
Paste to cell D2.
Copy or drag the formula down from where it was pasted.
Here we count the rows whose occupation matches this row's occupation and whose helper column value is zero.
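The helper-column approach can be sketched in Python as follows. This is a minimal illustration under assumed sample data; the function names are hypothetical and only mirror what the two formulas compute.

```python
def prior_appearances(names, occupations):
    # Helper column: for each row, how many earlier rows hold the same
    # (name, occupation) pair. This is the "expanding range" comparison.
    helper = []
    for i in range(len(names)):
        count = sum(
            1
            for j in range(i)
            if names[j] == names[i] and occupations[j] == occupations[i]
        )
        helper.append(count)
    return helper


def distinct_count(occupations, helper, target):
    # Main formula: COUNTIFS(occupation = target, helper = 0)
    return sum(1 for o, h in zip(occupations, helper) if o == target and h == 0)


# Hypothetical sample data
names = ["David", "David", "Michael", "David", "Sara", "Michael"]
occupations = ["Tech", "Tech", "Tech", "Worker", "Worker", "Tech"]

helper = prior_appearances(names, occupations)
print(helper)
print(distinct_count(occupations, helper, "Tech"))
```

Note that David counts once under Tech and once under Worker, because the helper compares the name and occupation pair rather than the name alone.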

Add a final column which is the concatenation of the prior 3 columns then use
=SUMPRODUCT(1/COUNTIF(D2:D9,D2:D9))
There is a good explanation of this formula here. Basically, values that appear once will count as 1. Values that appear more than once will appear as fractions of their total occurrence count and be summed to 1.
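To see why the fractions add up, here is a small Python sketch of the 1/COUNTIF idea. It uses exact fractions to avoid rounding noise; the function name and the sample keys are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def distinct_by_weights(keys):
    # COUNTIF(range, range): how often each row's value occurs in the range
    counts = Counter(keys)
    # SUMPRODUCT(1/COUNTIF(...)): every row contributes 1/n, so the n rows
    # sharing a value contribute exactly 1 in total
    return sum(Fraction(1, counts[k]) for k in keys)


# Hypothetical concatenated keys: "a" twice, "b" once, "c" three times
keys = ["a", "a", "b", "c", "c", "c"]
print(distinct_by_weights(keys))
```

Here the weights are 1/2 + 1/2 + 1 + 1/3 + 1/3 + 1/3, which is one unit per distinct value.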
If you convert your data to an Excel table by selecting a populated cell in the range and pressing Ctrl+T, then formulas will auto-populate down last column. You can then reference the table columns in the formula and you won't need to amend the formula as you add rows.

Related

Excel formula to find last non blank cell in a column starting on a certain row

I've got a column of data in which I want to count the non-blank cells after a certain row.
Here's an example of what I have:
So, in this example, I would like to start counting non blank cells in column A starting on row 13 (which would be a total of 4). If you look at the formula I have entered into cell D12 I can get the value I'm looking for with this formula:
=COUNTA(A:A)-11
I could use this formula:
=COUNTA(A13:A16)
but the point is the last cell with data in it can change due to entering different amounts of data in the column.
But I'm wondering if there is a different formula I could use that would count non blank cells from a certain row down regardless of the amount of data I enter into the column from a certain row down using an open ended range, kind of like this:
=COUNTA(A13:A)
This formula doesn't work but it kind of illustrates what I tried to do that didn't work.

from my comment above:
Well, you could always get the last used row dynamically and incorporate that, not sure what your benefit is over using the last row:
=COUNTA(A13:INDEX(A:A,LOOKUP(2,1/(A:A<>""),ROW(A:A))))
This makes it somewhat "open-ended", I guess. Unfortunately this isn't Google Sheets =)
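As a rough illustration of what the formula does, here is a Python sketch that finds the last used row and then counts the non-blank cells from a starting row. The list layout (index 0 is row 1, empty string for a blank cell) and the function name are assumptions made for the example.

```python
def count_from_row(column, start_row):
    # LOOKUP(2, 1/(A:A<>""), ROW(A:A)): row number of the last non-blank cell
    last_row = max(
        (row for row, value in enumerate(column, start=1) if value != ""),
        default=0,
    )
    # COUNTA(A13:<last row>): non-blank cells from start_row to the last used row
    return sum(1 for value in column[start_row - 1:last_row] if value != "")


# Hypothetical column A: rows 1-11 filled, row 12 blank, rows 13-16 filled
column = ["x"] * 11 + [""] + [10, 20, 30, 40] + [""] * 5
print(count_from_row(column, 13))
```

Adding more values below row 16 changes the result without any edit to the call, which is the open-ended behaviour the question asks for.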

Excel: Narrow entries from a sheet to those that appear on another sheet

I'm a relatively novice Excel user trying to streamline the following task:
I've got two sheets of product information. Sheet1 has around 3000 entries and Sheet2 has around 1300 entries. Every SKU in the product number column on Sheet2 appears on Sheet1, but some are formatted differently: some cells in Sheet1's SKU column occasionally contain multiple comma-separated entries (Example: PDB2S2FW, PDB2S2V, PDB2S2WH), whereas all Sheet2 SKUs are listed in their own cells.
My goal is to identify the items on Sheet 1 that appear on Sheet2 (with a filter or a helper column) so that I can narrow down Sheet1 to include only the items on Sheet2.
I've been experimenting with a few formulas to attempt this task, but haven't been able to solve for the multiple entry/single cell issue.
Here's my current formula:
=IF(ISNA(MATCH(BJ9,Sheet2!B:B,0)),"Not found","Found")
[Column BJ on Sheet1 and Column B on Sheet2 hold product numbers.]
Any ideas? Thanks!

This answer uses an array formula. If you're new to Excel, you may not know how to do an array formula, so I'll add a quick tutorial and a link at the end of this answer:
This formula will give you a zero if there is not a match and a positive number if there is a match. Enter the formula in a column next to BJ on Sheet1, then drag fill to the bottom. You can then filter out all positive numbers and see those values which are not found on Sheet2.
Assumption: Values start in Row 1. If they don't, change $BJ1 to match the row where values start.
{=LARGE(IFERROR(FIND($BJ1,Sheet2!$B:$B),0),1)}
Note: It might be more efficient if you change Sheet2!$B:$B to reference only the cells that actually have data, instead of the entire column. For example: Sheet2!$B$1:$B$3000.
Array Formulas
To enter the array formula,
Select and copy the above formula excluding the curly braces. The curly braces are there just to show that it is an array formula.
Paste the formula into a cell on your spreadsheet.
Press Ctrl + Shift + Enter
This page has more info about array formulas.
Caveat
The IFERROR function is only available in later versions of Excel. If you're using an older version, you may need to work with ISERR instead.
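For comparison, here is a Python sketch of a different technique from the FIND formula above: it splits each Sheet1 cell on commas and checks every piece for an exact match in the Sheet2 list. The function name and sample SKUs (apart from the ones quoted in the question) are hypothetical.

```python
def found_on_sheet2(sku_cell, sheet2_skus):
    # Split a cell such as "PDB2S2FW, PDB2S2V, PDB2S2WH" into single SKUs
    parts = [part.strip() for part in sku_cell.split(",")]
    # The row counts as found if any non-empty piece is listed on Sheet2
    return any(part in sheet2_skus for part in parts if part)


# Hypothetical Sheet2 product numbers
sheet2_skus = {"PDB2S2V", "XYZ1"}

print(found_on_sheet2("PDB2S2FW, PDB2S2V, PDB2S2WH", sheet2_skus))
print(found_on_sheet2("PDB2S2", sheet2_skus))
```

Exact matching on the split pieces avoids flagging a SKU that is only a prefix of another one, which a plain substring search would not distinguish.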

Need excel average formula for a dynamic range

This is what my table looks like:
Please note that I cannot change the position of any data here. This is a table that will continuously expand as I add new columns to the right and new rows at the bottom.
I need a formula in Column A that will calculate the average of all data in the same row as where the formula is and the formula has to autoupdate whenever I add new columns to the right of the last column. For example, in cell A64 is the formula that will average C64 to E64. when I add new data in F64, I want A64 to autoupdate to include that new cell in the computation.
I tried
=AVERAGE(INDIRECT("C64:"&ADDRESS(ROW(),COLUMN()+4)))
but it did not autoupdate when I added new data in F64. I am not an excel expert and I mostly learn by googling, but this one is taking me forever. Please help.

This is where OFFSET and COUNTA are your friends. In A2 and fill down:
=AGGREGATE(1,6,OFFSET(C2,,,1,COUNTA(C2:XFD2)))
I have used AGGREGATE function with argument 1 for Average and argument 6 to ignore error values in the range. COUNTA resizes the array from C2 to the end of the populated area (allowing for error values).
You can also use INDEX with COUNTA
=AGGREGATE(1,6,C2:INDEX(C2:XFD2,COUNTA(C2:XFD2)))
Or INDEX with MATCH. In the example below, I have reduced the column end point to AA, rather than XFD (which is the last column in 2016). If you know a realistic number of columns that will ever be filled you can use that as your end point reference to reduce the amount of work your dynamic formulas are doing.
=AGGREGATE(1,6,C2:INDEX(C2:AA2,MATCH(99^99,C2:AA2)))
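The behaviour these formulas aim for can be sketched in Python like this. It is only a model under assumptions: a row is a list of the cells from column C rightward, None stands for a blank cell, and strings such as "#DIV/0!" stand in for Excel error values. The function name is made up.

```python
def row_average(cells):
    # Keep only numeric cells; blanks (None) and error placeholders
    # (strings) are skipped, similar to AGGREGATE with option 6
    numbers = [
        c for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    ]
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


# Hypothetical row 64: C64:E64 first, then a new value appears in F64
print(row_average([1, 2, 3]))
print(row_average([1, 2, 3, 6]))
```

Because the function receives whatever cells are currently populated, adding a value to the right changes the average without touching the calculation itself.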

Expanding an Excel formula without referencing the previous cell

I am attempting to use an IF statement to check whether the sum of two cells from another Excel sheet is greater than or equal to 1.
For a sheet called Test1 with the values of interest in column C, this is what I have so far, which works fine:
=IF((Test1!C1+Test1!C2>=1),1,0)
In column B on a second sheet that I'll call Test2, I want to copy this formula down 200,000 rows. However, if the aforementioned formula is in cell B1, for the formula in B2 I would like the formula to read:
=IF((Test1!C3+Test1!C4>=1),1,0)
I want to copy the formula down the column so that the second cell reference in the formula in the first row does not become the first cell reference in the formula in the second row (eg. it would go C1+C2, then C3+C4, C5+C6, etc.).
I have tried manually entering the formula for a few rows, highlighting those, and copying them down but can't get the desired cell reference pattern. If I highlight and drag these first three formulae down another three rows, C4 and C5 are repeated and not in the correct pair.
=IF((Test1!C1+Test1!C2>=1),1,0)
=IF((Test1!C3+Test1!C4>=1),1,0)
=IF((Test1!C5+Test1!C6>=1),1,0)
=IF((Test1!C4+Test1!C5>=1),1,0)
=IF((Test1!C6+Test1!C7>=1),1,0)
=IF((Test1!C8+Test1!C9>=1),1,0)
I have tried using OFFSET() within this formula but couldn't get it to work. I am basically just wanting to add 1 to each of the cell references in the formula, as compared to the previous row (but not to actually add 1 to the value of that cell, as would happen with C1+1 for example).
Any insight would be greatly appreciated!

If you plan on copying this down 200K rows then you will want the absolute simplest formula that accomplishes the stagger. Avoid the volatile OFFSET function or be prepared to spend a lot of time waiting for random calculation cycles to complete. A volatile function will recalculate whenever anything in the workbook changes; not just when something changes that involved the formula in the cell.
=--(SUM(INDEX(Test1!C:C, (ROW(1:1)-1)*2+1), INDEX(Test1!C:C, (ROW(1:1)-1)*2+2))>=1)
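The row arithmetic in that formula can be checked with a short Python sketch. The function names and sample values are hypothetical; the list index 0 stands for Test1!C1.

```python
def paired_rows(n):
    # Formula row n reads source rows (n-1)*2+1 and (n-1)*2+2
    return (n - 1) * 2 + 1, (n - 1) * 2 + 2


def flag(values, n):
    # --(SUM(first cell, second cell) >= 1): 1 if the pair sums to at least 1
    first, second = paired_rows(n)
    return int(values[first - 1] + values[second - 1] >= 1)


# Hypothetical values for Test1!C1:C6
values = [0, 0, 1, 0, 0.5, 0.4]

for n in (1, 2, 3):
    print(n, paired_rows(n), flag(values, n))
```

Rows 1, 2 and 3 of the result map to the source pairs (1, 2), (3, 4) and (5, 6), which is the stagger the question describes.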

The following formula should do the trick:
=(SUM(INDIRECT("Test1!C"&ROW()*2-1);INDIRECT("Test1!C"&ROW()*2))>=1)*1
And this is the version using IF:
=IF(SUM(INDIRECT("Test1!C"&ROW()*2-1);INDIRECT("Test1!C"&ROW()*2))>=1;1;0)

You say "I am basically just wanting to add 1 to each of the cell references in the formula" but appear to be incrementing by 2, so I am confused. An option might be to apply your existing formula to 400,000 rows, together with =ISODD(ROW()) in another column, then filter on that other column to select and delete the rows showing FALSE.

Excel's autofill won't do the 2-cell shift that you're looking for. You can use the functionality that is there.
Put =IF((Test1!C1+Test1!C2>=1),1,0) in the top cell and drag a copy to the second row (it will be =IF((Test1!C2+Test1!C3>=1),1,0) but that's okay). Now, put 'A' and 'B' in the next column. Select all 4 cells and copy them down 400k rows.
Use filter to delete rows flagged with 'B' and delete the blank rows.
(Select blank rows with [F5] click Special and select Blanks, then right-click and delete)

Here is all you need. It's fast and nonvolatile.
=--(SUM(INDEX(Test1!C:C,ROW(1:1)*2-1):INDEX(Test1!C:C,ROW(2:2)*2-2))>=1)
Copy it down as far as you like.

Fill dates array and add dummy variables

I have a column with dates called "dates". This column contains dates from 01.01.2010 to 31.12.2010. It should have about 365 rows, but it actually has only 231 rows, because the data was not collected regularly. The others are missing, and I'd like to fill the gaps in time.
How can I fill the array of this column with the missing dates? I want to add 134 rows in the place of the missing ones, filling in the missing dates.

Create another sheet and put all the dates in column A in your new sheet.
Make sure your sheet with the data in it has the date column all the way on the left (important for how VLOOKUP works).
In your new sheet, starting in cell B1, put the numbers 1 through however many columns you have in your data sheet along that top row.
In your new sheet, use VLOOKUP to find all the rows where there is data:
=VLOOKUP($A2,DataSheet!$A$1:$C$20,B$1,FALSE)
Note that the lookup value ($A2) is locked to the column but not the row, the column number (B$1) is locked to the row but not the column, and the range you are looking up is locked in all directions. This will allow you to drag to the right and down and fill everything in.
Drag to the right then drag all the way down.
There will be #N/As where no match is found, which you can suppress with either an IF statement or conditional formatting. But now you have a row for every day, with blanks where there is no data!
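The same idea, a full list of dates with a lookup against the sparse data, can be sketched in Python. This is a minimal illustration with made-up values; the function name is hypothetical, and None plays the role of the #N/A result for a missing day.

```python
from datetime import date, timedelta


def fill_missing_dates(data, start, end):
    # Build one row per calendar day; look each day up in the sparse data,
    # leaving None where the original sheet had no entry for that day
    rows = []
    day = start
    while day <= end:
        rows.append((day, data.get(day)))
        day += timedelta(days=1)
    return rows


# Hypothetical sparse data: only two of the first four days were recorded
data = {date(2010, 1, 1): 5, date(2010, 1, 3): 7}

for row in fill_missing_dates(data, date(2010, 1, 1), date(2010, 1, 4)):
    print(row)
```

Running it over the whole of 2010 gives one row for each of the 365 days, however many of them actually carry data.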

I found a solution with a similar formula, but the result was the same.
First, I put the two columns of data, "date" and "values", in columns A and B of the worksheet. Each consisted of 231 rows. Then, I filled a new column D with the full list of 365 dates. Finally, I used this formula:
=VLOOKUP(D2;$A$2:$B$1056;2;FALSE)
in C2 and obtained only the values from the "values" column that corresponded to the new dates in column D.
Thanks to Brad's answer for pointing me to the VLOOKUP function.
