Listing duplicate emails in Excel mailing lists

I'm trying to create a list of values in 'sheet 3 column A' made up of all values that are duplicated across two other sheets.
The duplicates are to be found by looking through each value in 'sheet 1 column P' and checking whether that value also exists in 'sheet 2 column A'.
I've tried reading up on this and there seem to be a number of functions I could use, but I'm not sure which one I should use.

You need to use the VLOOKUP function, combined with IF. Together they are very powerful, and I really suggest you read up on them.
The following formula in Sheet 3, Column A (starting at row 2) will do what you want:
=IF(ISNA(VLOOKUP(Sheet1!P2,Sheet2!$A$2:$A$99,1,FALSE)),"",Sheet1!P2)
Copy that formula down from A2. I've assumed you have headings in row 1. If you have more than 98 rows of emails (values to check), change $A$99 to something like $A$9999.
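For anyone who wants to cross-check the result outside Excel, here is a minimal Python sketch of the same idea: keep each value from the first list only if it also occurs in the second. The function name and the sample addresses are made up for illustration, and casefold() is only an approximation of VLOOKUP's case-insensitive exact match.

```python
def shared_values(sheet1_col_p, sheet2_col_a):
    """Return values from the first list that also occur in the second, in order."""
    # Build the lookup set once; casefold() approximates VLOOKUP ignoring case.
    lookup = {value.casefold() for value in sheet2_col_a}
    return [value for value in sheet1_col_p if value.casefold() in lookup]


# Hypothetical sample data standing in for Sheet1!P and Sheet2!A.
sheet1 = ["ann@example.com", "bob@example.com", "cy@example.com"]
sheet2 = ["CY@example.com", "dee@example.com", "ann@example.com"]

print(shared_values(sheet1, sheet2))  # ['ann@example.com', 'cy@example.com']
```

Unlike the formula, this returns only the matches rather than leaving blank rows for the non-matches.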

So, let me get this straight. You have two workbook tabs. You want to get the intersection of the set (figure out where they are overlapping, duplicate, however you want to say it).
I would do one of two things, depending on how much you like Excel and moving your data around.
Option 1: Create a PivotTable of the data (assumes no duplicates within lists, only between lists)
Copy the data from the second list after the end of the first list (so both lists are now one list)
Insert a Pivot Table (on the ribbon), choosing your single column for the source
PivotTable options will pop up. Put the email address field in Row Labels and also in the Values box, summarized by Count.
Click on the count column of the pivot and sort largest to smallest.
All your duplicates will have Count > 1
Option 2: Use COUNTIF
This does not involve moving your data.
Go to Sheet 1. In the next column over (from your info, it would be Column Q), put the COUNTIF function:
=COUNTIF(Sheet2!A:A,P2)
Then you can sort descending on your new count column to find duplicates.
CountIf performs very well in Excel if your lists are very large.

This can be refined slightly using IFERROR, giving:
=IFERROR(VLOOKUP(Sheet1!P2,Sheet2!$A$2:$A$99,1,FALSE),"")
but it is essentially the same thing. IFERROR takes only two arguments, and the VLOOKUP already returns the matched value itself.

Related

How to count unique column data in an excel sheet

I am using an Excel sheet and I have a data column as shown below:
As we can see, some of the names are duplicated or appear twice. My question is: how can I count the unique name records, or the rows associated with each name, for a summary column?
The output I am looking for is shown below:
I'm not sure which formula to use, as COUNT counts all of the data, i.e. '7' in this case. How can I use COUNT or any other function to count unique records as shown above?
You can do what you're after with a pivot table.
Click the Insert tab then select "Recommended Pivot Tables".
A window will open up prompting you to select the data range. I recommend using a named range for your list and referencing that, but you can just highlight the list directly if you want.
Once the data range is selected, click "OK" and a new window will open with exactly what you want: a unique values list and a "Count of Column1". It is the default of the recommended pivot tables.
I outlined this because it's easy and fast, but it's important to understand you can make this pivot table yourself from scratch if you learn about pivot tables in general. Pivot tables are often overlooked in Excel as an option.
Lastly, you could get really advanced with Excel Power Queries. Just Google "Excel Power query" and you will be shown all kinds of information on them. They are a close second place in power to manipulate Excel data short of using VBA.
Good luck!
COUNTA(UNIQUE(D2:D8,,FALSE)) = 5 [COUNTA(UNIQUE(D2:D8)) is the same, as FALSE is the default.]
COUNTA(UNIQUE(D2:D8,,TRUE)) = 3 (values that occur once and only once)
Note: the UNIQUE function was released to Office 365 in late 2019. So if you want to use this, check your version: it is not present in 1908, but it is present in 2006.
Edit: It's actually in 2002, I just updated my 1908 machine.
HTH
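To make the difference between the two UNIQUE variants concrete, here is a small Python sketch that computes both numbers for a seven-row list. The names are invented (the original screenshot is not available), chosen only so that the counts come out as 5 and 3; note that Counter is case-sensitive, which may differ from how Excel compares text.

```python
from collections import Counter


def distinct_and_once(names):
    """Return (number of distinct names, number of names occurring exactly once)."""
    counts = Counter(names)
    distinct = len(counts)  # like COUNTA(UNIQUE(range))
    once_only = sum(1 for n in counts.values() if n == 1)  # like COUNTA(UNIQUE(range,,TRUE))
    return distinct, once_only


# Hypothetical stand-in for D2:D8.
names = ["Ann", "Bob", "Ann", "Cy", "Dee", "Bob", "Eve"]

print(distinct_and_once(names))  # (5, 3)
```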
If the duplicate names have already been removed, the following formula can be used: =COUNTIF(B:B,F2)
If duplicates must be removed by formula, the MATCH function (searches for a specified item in a range of cells and returns the relative position of that item in the range) and the SMALL function (returns the k-th smallest value in a data set) can be used as shown.
$C$1048576 references the last row of the worksheet, so the formula still works for a very long list.
formulas:
Column A, names sequence
Column B, names
Column C, formula =MATCH(B2,B:B,0)
Column D, formula =IF(COUNTIF(C2:$C$1048576,C2)=1,C2,"")
Column E, formula =SMALL(D:D,A2)
Column F, formula =VLOOKUP(E2,A:B,2,0)
Column G, formula =COUNTIF(B:B,F2)
For anyone like me without O365's lovely UNIQUE and FILTER functions, and who doesn't want to use a pivot table: there are many ways to do this, but I have just done it in normal Excel.
List of data in column H, formula in column O starting at O3. Drag down. It highlights your distinct and unique values from H.
=IF(COUNTIF(H:H,H28)=1,"U - "&COUNTIF(H:H,H28),IF(COUNTIF(H$1:H27,H28)=1,"U - "&COUNTIF(H:H,H28),"-"))
The formula is short. You can just do this and drag down. Apply the same principle to your worksheet data wherever it is.
=IF(COUNTIF(H:H,H3)=1,"U",IF(COUNTIF(H$1:H2,H3)=1,"U","-"))
Similarly, you can just use this formula here (credit goes to this source for this one):
=(COUNTIF($H$1:$H1,$H1)=1)+0
I'd like to point out that the above formula is a better formula than mine. It highlights with a "1" (or, with a tweak, the value of your choice) the first time any value is seen on any given list, whether duplicate or unique.
Mine, by contrast, is a bit "more random" when picking up the "unique and distinct" values.
Mine gets there in the end, but Extend Office's gets there first, as I think is proper (flagging the first time a unique distinct value occurs).
Formula in K5 =IF((COUNTIF($H$5:$H5,$H5)=1)+0=1,"UNIQUE DIST","") and drag down...
You could append a normal basic COUNTIF after the results to show how many times the given value actually appears, if you wanted:
=IF((COUNTIF($H$5:$H5,$H5)=1)+0=1,"UNIQUE DIST","")&" - "&COUNTIF(H:H,H5)
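The expanding-range COUNTIF trick above amounts to "flag a value the first time it is seen while walking down the list". A rough Python equivalent, with a made-up function name and sample data, might look like this:

```python
def first_occurrence_flags(values):
    """Return 1 for the first time each value appears, 0 for later repeats."""
    seen = set()
    flags = []
    for value in values:
        # Same test as COUNTIF($H$1:$H1,$H1)=1: has this value appeared before?
        flags.append(1 if value not in seen else 0)
        seen.add(value)
    return flags


print(first_occurrence_flags(["a", "b", "a", "c", "b"]))  # [1, 1, 0, 1, 0]
```

Summing the flags gives the number of distinct values, which is one way to sanity-check the helper column.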

Excel Structured Reference to filter Table and produce List

I want to filter a table in excel and return a different column to create a list for data validation.
My table contains a list of names and one of the columns is a Yes or No for being an admin.
I want to create a data validation list on another sheet and use the filter table to just show those names that have a Yes in their associated row in the table.
I recorded a macro to filter the table to show just the rows I need and now want those names to appear in the list.
ActiveSheet.ListObjects("Staff").Range.AutoFilter Field:=8, Criteria1:="<>"
Is this possible?
I had tried using the =FILTER() formula but it's not available in my version of Excel.
I'd prefer to do it with a formula in the validation settings rather than VBA.
This is something of a faff but I think it works, though by no means the best way of doing it.
Table of data on the left.
The "Yes" names are listed in D1 and down; the formula is an array formula (confirm it with Ctrl+Shift+Enter). I'm sure someone cleverer than me can shorten this.
=IF(ROWS(D$1:D1)<=COUNTIF(Table1[Admin],"Yes"),INDEX(Table1[Name],SMALL(IF(Table1[Admin]="Yes",ROW(Table1[Name])-ROW($A$2)+1),ROWS(D$1:D1)),1),"")
E1 is just the total of the names shown in D (another array formula):
=SUM(IF(LEN(D:D)>0,1,0))
The DV is in G1 and the formula there is
=OFFSET(D1,0,0,E1,1)
If you change e.g. Sarah to Yes, her name will appear in D and will be added to the DV list.
Once you've applied your filter to column 8, you can select the range that remains visible in a different column using:
ActiveSheet.ListObjects("Staff").Range.Columns(8).SpecialCells(xlCellTypeVisible).Offset(0, -2)
This would return a range consisting of column 6 (8-2) of your table. Adjust to suit your needs.
You could then cycle through that range one cell at a time and populate a new range from it, accordingly.

Finding Duplicates across a thousand lists

I have over 1,100 lists that each contain no more than 30 items in them. I am trying to see if there are any items within the lists that appear in all lists. I was initially thinking that I would need to compare the list in column A to the list in column B, store the duplicates, then compare the duplicates to the list in Column C, store the new duplicates, compare the new duplicates to the list in Column D, and so on until all the lists have been covered.
My questions are:
1.) Is this the correct way to approach this?
2.) If so, is there a simple VBA code that could be used to do this?
Deduplicate each list using Data > Remove Duplicates
Collate all the lists into one long list
Create a pivot table with the column of items as the Row dimension
Use the same column as the Value displayed in the pivot table, and aggregate using Count.
Sort the pivot table in descending order of that count.
The count shows the number of lists in which each item appears. If any have a count of 1100 then they must occur in every list.
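The pivot-table steps above boil down to: deduplicate each list, count how many lists each item appears in, and keep the items whose count equals the number of lists. If the lists can be exported from Excel, a minimal Python sketch of that logic (function name and sample lists are illustrative only) could be:

```python
from collections import Counter


def items_in_every_list(lists):
    """Return the items that appear in all of the given lists, sorted."""
    counts = Counter()
    for items in lists:
        # set() plays the role of Data > Remove Duplicates for one list.
        counts.update(set(items))
    # An item present in every list has a count equal to the number of lists.
    return sorted(item for item, n in counts.items() if n == len(lists))


# Hypothetical sample: three short lists instead of 1,100.
lists = [["x", "y", "z", "x"], ["y", "z"], ["z", "y", "w"]]

print(items_in_every_list(lists))  # ['y', 'z']
```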
Here's my non VBA solution to this fun problem. The plan is to search each item in any one list and compare to all the other lists in the table.
Start off by inserting a new "A" column to the left of your table. Copy any list and paste to A35.
If your goal is only to find items occurring in all lists, choose the smallest list.
If you would like to analyse further, choose the largest list or even multiple lists.
You could include all items by copying the entire table and pasting it transposed (Paste Special > Transpose) into a new sheet, so that you have fewer than 30 columns. Copy and paste each of those into one column, then remove the duplicates from this list with Data > Remove Duplicates.
Now you need to create a formula in cell B35 that searches for the string in A35 in the range B1:B30. You drag the formula all the way right and down.
=COUNTIF(B$1:B$30,$A35)
The results will be the count of each item found in each list. In order to see if any item is in all lists, then all columns within the specific row should count at least 1 item. To the right of the results, see what the minimum value in the row is with:
=MIN(B35:API35)
(assuming your table ends in column API)
If any of your rows have a minimum of 1, then the item is included in all lists.
You could then also sum up the line to see which items occur the most and you could use the "max" instead of "min" to see if any list has duplicates.
Please try the approach below.
If it does not work, I can help you with VBA macro code.
The logic is as follows:
1. Keep the 1st column as the base against which all the other columns are checked.
2. In a loop, check each of the 30 cells of the 1st column against the cells of every other column.
3. Stop the loop for a value if you don't find that value in an entire column.

Excel Instance Parsing

I have a list of data "instances" within one column within an excel sheet.
Each instance can have numerous copies. Here is an example:
abcsingleinstanceblah0001
cdemultipleinstanceexample0001
cdemultipleinstanceexample0002
cdemultipleinstanceexample0003
cdemultipleinstanceexample0004
....
Unfortunately the numbering scheme was not preserved across all of this data. So in some cases copies will have randomized numbers. However, the root instance name is always the same.
QUESTION: What would be a good strategy for creating a function that will parse a list of these instances and, in a new column, list all duplicates past the second copy? In relation to the example above, the new column would list:
cdemultipleinstanceexample0003
cdemultipleinstanceexample0004
I need to have the two duplicates with the lowest integer values preserved out of each set of duplicates, which is why in the example above 3 and 4 would have to go. So in the case of randomized numbers, the two instances with the lowest integer values.
What I have thought of
I was thinking to first organize the column by alphabetical order, which should automatically put duplicates in ascending order. I could then basically strip the number value from all instances, and find where there are more than 2 exact duplicates from the core instance name, which would give me the instances with more than 2 duplicates so that I could perform a function on the original data set... but I don't know if there is a better way of doing this or where to go from here.
I'm looking for formula-based solutions.
Assuming your sorted list is in Column A and that you have a row of headers you could use the following formulas in the neighboring columns.
In B:
=LEFT(A2,LEN(A2)-4)
In C (although not really necessary):
=RIGHT(A2,4)
In D starting with row 3:
=IF(AND(B3=B2,COUNTIF(B1:B3,B3)>2),"Del","Keep")
This formula doesn't work in row 2, but you can hard code the first result.
Then filter the list on Column D for "Del" and delete all the rows.
How's that?
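As a cross-check on the helper-column approach, here is a short Python sketch of the same rule: group the instances by root name, keep the two lowest-numbered copies in each group, and list the rest. It assumes, like the LEFT/RIGHT formulas above, that every name ends in exactly four digits; the function name and parameters are invented for illustration.

```python
from collections import defaultdict


def copies_to_remove(instances, keep=2, digits=4):
    """List every copy beyond the `keep` lowest-numbered ones for each root name."""
    groups = defaultdict(list)
    for name in instances:
        # Split "root0001" into the root and its integer suffix.
        root, number = name[:-digits], int(name[-digits:])
        groups[root].append((number, name))
    extras = []
    for root in sorted(groups):
        ranked = sorted(groups[root])  # lowest integer first
        extras.extend(name for _, name in ranked[keep:])
    return extras


instances = [
    "abcsingleinstanceblah0001",
    "cdemultipleinstanceexample0003",
    "cdemultipleinstanceexample0001",
    "cdemultipleinstanceexample0004",
    "cdemultipleinstanceexample0002",
]

print(copies_to_remove(instances))
# ['cdemultipleinstanceexample0003', 'cdemultipleinstanceexample0004']
```

Because each group is sorted by its numeric suffix, the input does not need to be sorted first, and randomized numbers are handled the same way as sequential ones.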
Sort your list in column A. You'll want column headings for later, so put those in row 1 (or leave it blank). In B2, type =LEFT(A2,LEN(A2)-4) and drag the formula down to strip the integers. In C3 type =VLOOKUP(B3,$B$2:$B2,1,0). Populate the formula in C3 right one cell and then down the length of the data. Now in D3 you'll have a list that has errors for any entry that has only 2 or fewer instances and will have the name for any that have more than 2. Filtering this list on column D for #N/A will allow you to delete all the rows with fewer than two entries.
Remove your filter. Then re-sort the list in column A in reverse order so the high numbers are first. Replace the contents of C2 and D2 with #N/A. Refilter the list on column D for everything but #N/A and delete all the entries that have an instance listed.

Sort one column to match another in excel

I have a spreadsheet and I need to match the two columns together. However, "Dove code" is 3600 rows and "code 2" is 1100. They all have the same codes, as you can see in the image, but you can also see where it starts changing, and I need to have the codes all line up so I can see the gaps. I have already arranged them all alphabetically, and it's the "code 2" column that would need to match up to "Dove code".
If the above solution would result in too much shunting and vba is not an option, there's another way. Copy the first column and use 'remove duplicates' on it. Now you have an index list, put numbers from 1 to x in the column on the right of it.
Insert a column between the two lists and right of the second one.
Assuming that the index list is in F and the numbers in G, put this formula in the cell right of the first cell in the larger list:
=VLOOKUP(A2,$F$2:$G$500,2,FALSE)
Adjust the range accordingly. Put the same formula in the cell right of the first cell in the shorter list, with of course C2 instead of A2. Copy both formulas to the end of the list.
Now both columns have an index on every row. You can match them using data sort, but for that you need to add dummies in the index columns.
Put this formula in the cell right of your basic index list: =countif(B:B,G2)
And this one in the cell right of that: =countif(D:D,G2)
Now you know how many times each record arises in both lists. Just add extra numbers manually so that both formulas turn up the same result. You should be able to do that really fast. If you have 200 records that are used 2 times in the first column and not in the second one, just copy the index of those 200 records and paste them twice. The countif's will automatically update.
You can use an extra column to calculate the difference between the two counts and use data sort on your basic index list to sort on the differences.
After that just use data sort.
If my directions are clear enough, this shouldn't cost you more than 10 minutes.
Edit:
Here's an example: http://img14.imageshack.us/img14/6366/k8pg.jpg
Without VBA I do this (for columns with a limited number of mismatches!) by adding a formula such as =INDIRECT("A"&ROW())<>INDIRECT("B"&ROW()) in a helper column. Working downwards, every time you see a TRUE shunt the appropriate column down to suit. But it may be only just about viable for 1100 rows!
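The "shunting" described above is essentially a merge of two sorted lists, leaving a blank wherever one side has no matching code. If the two columns can be exported, a minimal Python sketch of that alignment (the function name and the short sample codes are hypothetical) might be:

```python
def align_sorted(left, right):
    """Pair up two sorted lists row by row, using None where one side has a gap."""
    rows = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            rows.append((left[i], right[j]))  # codes line up
            i += 1
            j += 1
        elif left[i] < right[j]:
            rows.append((left[i], None))  # code missing from the right column
            i += 1
        else:
            rows.append((None, right[j]))  # code missing from the left column
            j += 1
    # Whatever remains on either side has no partner.
    rows.extend((value, None) for value in left[i:])
    rows.extend((None, value) for value in right[j:])
    return rows


for row in align_sorted(["A1", "A2", "A3", "A5"], ["A2", "A4", "A5"]):
    print(row)
```

Both inputs must already be sorted the same way, which matches the question's statement that the columns have been arranged alphabetically.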
