Excel: Find duplicates in one column, then remove rows based on the value in another column

I've been able to find a number of articles that seem to orbit my particular puzzle, but I'm having difficulty carving out the specific solution for it. Using the data below for reference:
Row  Name (A)  Company Name (B)
5    Dennis E  Lantz Boggio Architects, Pc
6    Director  Lantz Boggio Architects, Pc
7    Glenn D   Lantz Boggio Architects, Pc
8    Director  Ge Johnson Construction
9    Evan Da   GH Phipps Construction Companies
10   Paul Fog  GH Phipps Construction Companies
11   Todd W    GH Phipps Construction Companies
I have a mailing list that is organized so each unique contact is placed on an individual row. The list contains columns for Name (column A in my sheet) and Company Name (column B).
If the Name cell was originally empty, a default 'generic' title is entered (e.g. 'Director', as in rows 6 and 8 above).
In some cases, there are multiple contacts at the same company (e.g. rows 5-7, 9-11). Occasionally, one of those contacts has a 'generic' name (e.g. row 6).
What I'd like to do:
Search for duplicates in Column B
Then delete the row based on the value in Column A (with me defining the specific values to look for)
So in the example above, only row 6 would be deleted, because Column B contains a duplicate company name and Column A contains the value 'Director'.
Thank you!

Maybe, in C5 and copied down to suit:
=AND(COUNTIF(B:B,B5)>1,A5=C$1)
with Director in C1.
Then filter Column C to show TRUE and delete those rows.
COUNTIF(B:B,B5) searches the whole of Column B (the B:B) for the content of B5 and returns the number of instances. Because B5 is itself in Column B, the function always finds at least 1, and more than 1 for duplicates, so >1 detects that the company in the row in question (row 5, for example) is not the only instance. A5=C$1 then checks the name, and AND returns TRUE only when both conditions hold.
However, entries that are merely similar will not be counted, for example one that ends in a trailing space when the value in B5 does not.
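The same flag-then-delete logic can be sketched outside Excel. This is a minimal Python sketch, assuming the rows are held as (name, company) tuples and that you supply a set of generic titles (so more than one value, such as 'Director', can be sought). `find_generic_duplicates` and `remove_generic_duplicates` are hypothetical names. Like COUNTIF it only counts exact matches, so a trailing space makes two company names different; unlike COUNTIF it is also case-sensitive.

```python
from collections import Counter


def find_generic_duplicates(rows, generic_titles):
    """Return the indexes of rows whose company appears more than once
    and whose name is one of the generic titles.

    Mirrors =AND(COUNTIF(B:B,B5)>1,A5=C$1), but accepts a set of titles.
    """
    # Count every company once over the whole list, like COUNTIF over B:B.
    counts = Counter(company for _, company in rows)
    return [
        i
        for i, (name, company) in enumerate(rows)
        if counts[company] > 1 and name in generic_titles
    ]


def remove_generic_duplicates(rows, generic_titles):
    """Return a new list without the flagged rows (the 'filter TRUE and delete' step)."""
    flagged = set(find_generic_duplicates(rows, generic_titles))
    return [row for i, row in enumerate(rows) if i not in flagged]


rows = [
    ("Dennis E", "Lantz Boggio Architects, Pc"),
    ("Director", "Lantz Boggio Architects, Pc"),
    ("Glenn D", "Lantz Boggio Architects, Pc"),
    ("Director", "Ge Johnson Construction"),
    ("Evan Da", "GH Phipps Construction Companies"),
    ("Paul Fog", "GH Phipps Construction Companies"),
    ("Todd W", "GH Phipps Construction Companies"),
]

print(find_generic_duplicates(rows, {"Director"}))  # [1], i.e. sheet row 6
```

The 'Director' at Ge Johnson Construction is kept because that company appears only once.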

Related

Splitting addresses into columns in Excel

I need help splitting addresses into columns in Excel.
Addresses in COLUMN A are written like:
601 W Houston St Abbott, TX 76621 United States
13498 US 301 South Riverview, FL 33578 United States
COLUMN B is actually a helper column. It contains only the city names from COLUMN A. My idea was to somehow match COLUMN B against COLUMN A and then move all matches to another column. That would separate the city from the street address. For the state, ZIP and country I can use "Text to Columns", since a comma is the delimiter. But I need help splitting the street address and the city.
There is a comma right after the city name, but some cities have more than one word in their name.
What I need to do is split the addresses like it's highlighted in green in the image below.
What is the best way to do that in Excel? What would be the formula for that?
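We can use a quirk of LOOKUP to get this working.
=LOOKUP(1E+99,FIND(B$2:B$100,A2),B$2:B$100) in D2 will return the city based on searching for matches in column B. FIND returns a position for every city found in A2 and an error for every city that is not; because 1E+99 is larger than any position, LOOKUP skips the errors and returns the city belonging to the last number in the array. Note that every cell in the referenced range of column B must be filled: FIND treats an empty search string as a match at position 1, so blank cells in the range would match and could be returned instead of the city.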
Then we can put =LEFT(A2,FIND(D2,A2)-2) in C2 to get the first part of the address.
The rest is easy if we can assume that the state, ZIP and country are of constant length (if you've got any addresses outside the US then you'll need to alter this):
=LEFT(RIGHT(A2,22),2) in E2 (the last 22 characters are e.g. "TX 76621 United States", so the first two are the state)
=LEFT(RIGHT(A2,19),5) in F2
=RIGHT(A2,13) in G2
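For readers who want to check the logic, here is a minimal Python sketch of the same approach: take the last city in the list that occurs in the address, cut the street off before it, and read the state, ZIP and country from fixed positions at the end. It assumes the US-only fixed-length tail described above; `last_city_found` and `split_us_address` are hypothetical names. Like FIND, the substring test is case-sensitive.

```python
def last_city_found(address, cities):
    """Return the last city in `cities` that occurs anywhere in `address`.

    Mimics =LOOKUP(1E+99,FIND(B$2:B$100,A2),B$2:B$100): every city found
    gives a position, and LOOKUP returns the city for the last one.
    Empty entries are skipped here; in Excel a blank cell would match.
    """
    found = None
    for city in cities:
        if city and city in address:
            found = city  # a later match replaces an earlier one
    return found


def split_us_address(address, cities):
    """Split a US address, assuming a fixed-length 'ST ZIPCD United States' tail."""
    city = last_city_found(address, cities)
    if city is None:
        return None
    street = address[: address.find(city) - 1]  # text before the space preceding the city
    state = address[-22:][:2]                   # first 2 of the last 22 characters
    zip_code = address[-19:][:5]                # first 5 of the last 19 characters
    country = address[-13:]                     # last 13 characters: "United States"
    return street, city, state, zip_code, country


print(split_us_address("601 W Houston St Abbott, TX 76621 United States",
                       ["Abbott", "Riverview"]))
# ('601 W Houston St', 'Abbott', 'TX', '76621', 'United States')
```

Because any city occurring anywhere in the text counts, a city that also appears in a street name (a hypothetical "Houston" entry here) could be picked up instead; the order of the list decides which match wins.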
Since you already have the city in Col B, just remove the city from A:
D2 =SUBSTITUTE(A2,C2,"")
In Column C, use Paste Special > Values.
Split Column C using the comma.
Then split Column D using a space. Assuming all your records are in the US, you can add the country to all rows if required.
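The remove-then-split idea can be sketched in Python as follows. This is an illustration under the answer's assumption that every record is in the US; `split_after_removing_city` is a hypothetical name, and the city is passed in rather than looked up.

```python
def split_after_removing_city(address, city):
    """Remove the city, then split on the comma and on spaces.

    Follows the SUBSTITUTE / Text to Columns steps. Assumes every address
    is in the US, so the country is re-added as a single value.
    """
    without_city = address.replace(city, "")   # like =SUBSTITUTE(A2,C2,""): all occurrences
    street, rest = without_city.split(",", 1)  # Text to Columns on the comma
    state, zip_code = rest.split()[:2]         # split the remainder on spaces
    return street.strip(), city, state, zip_code, "United States"


print(split_after_removing_city("601 W Houston St Abbott, TX 76621 United States", "Abbott"))
# ('601 W Houston St', 'Abbott', 'TX', '76621', 'United States')
```

Splitting on spaces breaks "United States" into two pieces, which is why the country is simply added back afterwards.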
EDIT
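I missed that the city name in the row does not correspond to the address. To match the city from the master list, you can use this array formula:
C2 =INDEX(B:B,MATCH(1,MATCH("*"&$B:$B&"*",A2,0),0))
Each city is wrapped in * wildcards, so the inner MATCH returns 1 when A2 contains that city and #N/A otherwise; the outer MATCH then finds the first 1.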
Array formula must be confirmed with Ctrl-Shift-Enter.
However, this will find the first match. If you have both Foster and Foster City in your master list, Foster City will never be matched. So, sort the cities in descending order of length.
Once you have the City name matched you can follow the steps I gave earlier. Note that I have adjusted the formula to take into account the city name that has been matched by this new formula.
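To see why the sort order matters, here is a minimal Python sketch contrasting a first-match search with the same search over a list sorted longest first. The function names and the address are hypothetical; the comparison ignores case, as MATCH does.

```python
def first_city_match(address, cities):
    """Return the first city (in list order) contained in the address.

    Roughly what MATCH(1,MATCH("*"&city&"*",A2,0),0) does.
    """
    text = address.lower()
    for city in cities:
        if city and city.lower() in text:
            return city
    return None


def longest_city_match(address, cities):
    """The same search after sorting by descending length,
    so 'Foster City' is tried before 'Foster'."""
    return first_city_match(address, sorted(cities, key=len, reverse=True))


cities = ["Foster", "Foster City"]
address = "1 Main St Foster City, CA 00000 United States"  # made-up address
print(first_city_match(address, cities))    # Foster
print(longest_city_match(address, cities))  # Foster City
```

Sorting longest first works because a longer name that matches always contains the shorter one, so it must be tested first to win.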
So, it is possible to get it done with the formula. It may not be the best way, but I got what I needed.
I've added a new sheet and named it "cities"
I moved the city list from sheet 1 to COL A in sheet "cities"
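In sheet 1, B1: =INDEX(cities!$A$1:$A$10000;LOOKUP(99^99;MATCH(RIGHT(TRIM(LEFT(SUBSTITUTE(A2;",";REPT(" ";255));255));ROW($A$1:$A$100));cities!$A$1:$A$10000;0)))
(The semicolons are argument separators, as Excel uses them in locales where the comma is the decimal separator. The formula takes the text before the first comma, tries its last 1 to 100 characters against the city list, and LOOKUP(99^99;...) returns the position of the last, i.e. longest, match.)
Then I simply used SUBSTITUTE to remove the city names from column A in Sheet 1: C1 =SUBSTITUTE(A1, B1, "")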
And that's it!
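The longest-suffix idea behind that formula can be sketched in Python. This is an illustration, not a translation of every detail (for example, TRIM's collapsing of inner spaces is not reproduced); `city_from_suffix` is a hypothetical name and the 100-character limit mirrors ROW($A$1:$A$100).

```python
def city_from_suffix(address, cities, max_len=100):
    """Return the city that is the longest tail of the text before the first comma.

    Tries the last 1..max_len characters and keeps the longest one found
    in the city list. Case is ignored, as with MATCH(...,0).
    """
    before_comma = address.split(",", 1)[0].strip()  # text before the first comma
    by_lower = {c.lower(): c for c in cities}         # lower-case key -> list entry
    best = None
    for n in range(1, min(max_len, len(before_comma)) + 1):
        tail = before_comma[-n:]
        if tail.lower() in by_lower:
            best = by_lower[tail.lower()]  # longer tails overwrite shorter ones
    return best


address = "601 W Houston St Abbott, TX 76621 United States"
city = city_from_suffix(address, ["Houston", "Abbott"])
print(city)  # Abbott
without_city = address.replace(city, "")  # the SUBSTITUTE step
```

Because only tails of the pre-comma text are tested, a city name inside the street ("Houston" above) is not picked up, and "Foster City" beats "Foster" without any sorting.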

Excel look up value in array, return next value

I would like to look up a value in a range and return the value in the next row, but can't quite figure out how to do this. I would especially like to do this with formulas rather than VBA, and preferably with built-in functions rather than custom (VBA) functions, due to macro security issues.
I'm using Excel 2010. My workbook has two worksheets, "assessment" and "lookup". In lookup, I have lookup tables.
"lookup" looks something like:
Column A Column B Column C
1 Sales Engineering Manufacturing
2 Alice Bobbie Charlie
3 Dawn Edgar Frank
4 George Holly Isabel
In "assessment," I have some drop-downs from which users select one name from each column in "lookup." Based on some other criteria, I then rank these and create a new, sorted list (using INDEX() and MATCH()) that produces the selected name and its corresponding column name in a new sort order:
Column A Column B
10 Engineering Edgar
11 Sales Alice
What I'd like is to return the name from the next row.
Column C
10 Holly
11 Dawn
But I'm having real trouble figuring out how to get there.
Assuming the lookup table is located at B2:D5 (change as required) and the result data is at F2:H3 (change as required), enter this formula in cell H2, then copy down.
=INDEX(
INDEX($B$2:$D$5,0,MATCH($F2,$B$2:$D$2,0)),
1+MATCH($G2,
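INDEX($B$2:$D$5,0,MATCH($F2,$B$2:$D$2,0)),0))
The inner INDEX(...,0,MATCH($F2,...)) returns the whole column for the department in F2, MATCH($G2,...) finds the selected name in that column, and adding 1 makes the outer INDEX return the name one row below.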
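The nested INDEX/MATCH can be mirrored in a short Python sketch, assuming the lookup table is a list of rows with the headers first (like B2:D5). `next_in_column` is a hypothetical name.

```python
def next_in_column(table, department, name):
    """Return the name one row below `name` in the `department` column.

    `table` is a list of rows whose first row holds the headers, like B2:D5.
    list.index raises ValueError where MATCH would give #N/A, and None is
    returned where INDEX would run off the range (#REF! in Excel).
    """
    col = table[0].index(department)       # MATCH($F2,$B$2:$D$2,0)
    column = [row[col] for row in table]   # INDEX($B$2:$D$5,0,col)
    pos = column.index(name)               # MATCH($G2,column,0), 0-based here
    return column[pos + 1] if pos + 1 < len(column) else None  # the "1+" step


lookup = [
    ["Sales", "Engineering", "Manufacturing"],
    ["Alice", "Bobbie", "Charlie"],
    ["Dawn", "Edgar", "Frank"],
    ["George", "Holly", "Isabel"],
]
print(next_in_column(lookup, "Engineering", "Edgar"))  # Holly
print(next_in_column(lookup, "Sales", "Alice"))        # Dawn
```

Selecting the column first and only then searching it is what lets the same name appear in different departments without confusion.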

Finding companies appearing with different IDs in MS Excel

I have 2 columns in my data:
A - each company's unique ID.
B - the company name that corresponds to the respective ID.
This type of data extends to 13,000 rows. For instance:
Col A Col B
12 Google Inc
12 The Google
14 Google
18 Amazon
18 Amazon
21 Amazon INC
18 Amazon
...
As you can see from the example above, the issue is that the same company sometimes appears with a different ID. Furthermore, although the company is the same in all three cases, the different wording makes an exact match difficult.
My goal in this exercise is two-fold:
Find which companies have different IDs showing.
Identify the row at which this happens.
It would be cumbersome to go through all 13,000 rows. What Excel formulas would do the trick?
You could use pivot tables to count how many duplicates each name has.
I would also:
Order the list by column B.
Add a formula in column C that compares each row with the previous row.
For example consider a formula in row 5:
=IF(B4=B5,"Identical","Different")
You could build in more intelligence, for example by checking whether the first word of the name in row 5 appears in the row 4 name, e.g.:
=IF( iserror( find( LEFT(B5,FIND(" ",B5,1)-1) ,B4,1) )
,""
,"Similar")
You could combine the above two into a single formula, or use both in separate columns (which is easier).
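Both row-by-row checks can be combined in a small Python sketch, assuming the names have already been sorted as the answer requires. `compare_with_previous` is a hypothetical name; note that Excel's "=" ignores case while this sketch's equality test does not (FIND, like Python's `in`, is case-sensitive).

```python
def compare_with_previous(names):
    """Label each name in a sorted list against the name directly above it.

    'Identical' corresponds to =IF(B4=B5,...). 'Similar' corresponds to the
    FIND(LEFT(B5,FIND(" ",B5,1)-1),B4,1) check: the current name has a space
    and its first word occurs in the previous name.
    """
    labels = [""]  # the first row has nothing above it
    for prev, cur in zip(names, names[1:]):
        if prev == cur:
            labels.append("Identical")
        elif " " in cur and cur.split(" ", 1)[0] in prev:
            labels.append("Similar")
        else:
            labels.append("Different")
    return labels


names = sorted(["Google Inc", "The Google", "Google", "Amazon",
                "Amazon", "Amazon INC", "Amazon"])
for name, label in zip(names, compare_with_previous(names)):
    print(f"{name:<12}{label}")
```

"The Google" still comes out as Different, which shows the limit of a first-word heuristic.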
PART 2:
The data must be ordered by column B!
So, using the above logic to compare the IDs, you should add another column (column F) with this formula:
= find( LEFT(B5,FIND(" ",B5,1)-1) ,B4,1)
Then add another column (column G):
=IF(B4=B5
, B5
, IF( iserror(F5)
,""
, LEFT(B5,FIND(" ",B5,1)-1) )
)
This results in a value in column G which is either the identical company name or the first word of a company that has a matching name (F5 itself only holds the position where that word was found).
You can then add another column (column H) which compares the IDs of rows in the same company group:
=IF(G4=G5
, IF(A4<>A5, "Different IDs", "Ok IDs")
, "First row in company group"
)
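The end goal (which companies have several IDs, and on which rows) can also be sketched directly in Python. This is an illustration, assuming the data is a list of (id, name) pairs and, like column G above, using the first word of the name as a crude "same company" key; `first_word` and `companies_with_several_ids` are hypothetical names. Row numbers are 1-based positions in the list, so add 1 if your sheet has a header row.

```python
from collections import defaultdict


def first_word(name):
    """Group key used in this sketch: the first word of the company name."""
    return name.split(" ", 1)[0]


def companies_with_several_ids(rows):
    """Return {group_key: {id: [row_numbers]}} for groups with more than one ID.

    Grouping by first word will miss names such as 'The Google'.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for row_no, (company_id, name) in enumerate(rows, start=1):
        groups[first_word(name)][company_id].append(row_no)
    return {key: dict(ids) for key, ids in groups.items() if len(ids) > 1}


data = [
    (12, "Google Inc"), (12, "The Google"), (14, "Google"),
    (18, "Amazon"), (18, "Amazon"), (21, "Amazon INC"), (18, "Amazon"),
]
print(companies_with_several_ids(data))
```

Collecting the row numbers per ID answers both parts of the question at once, and it does not depend on the data being sorted.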

Identify duplicate values in column groupings and then add a text label to the last group

The first group contains companies identified as weekly. This includes only Company A, B, and C.
The second group contains companies identified as monthly. This includes only Company D, E, F.
The third group is a long list of companies that need to be labeled using the column A values from groups 1 and 2. This list will include Company A to Company J.
So, if a company from Group 1 appears in Group 3, I'd like the 'weekly' status to appear in column A for Group 3. For ex: Cell B2 contains Company A, and since it then appears in Group 3, Cell B14, I'd like the text from cell A2 to duplicate down to cell A14.
http://imgur.com/HE45TUV
As a formula version, maybe in A14 and copied down:
=IFERROR(INDEX(A$1:A$10,MATCH(B14,B$1:B$10,0)),"")
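The INDEX/MATCH/IFERROR formula amounts to a dictionary lookup with a blank default. A minimal Python sketch, assuming Groups 1 and 2 are given as (status, company) pairs; `label_from_groups` is a hypothetical name, and unlike MATCH the comparison is case-sensitive.

```python
def label_from_groups(status_rows, group3_companies):
    """Return a status for each Group 3 company, or '' if it is in neither group.

    `status_rows` holds (status, company) pairs from Groups 1 and 2, like
    A$1:B$10. As with MATCH(...,0), the first matching row wins.
    """
    status_by_company = {}
    for status, company in status_rows:
        status_by_company.setdefault(company, status)  # keep the first match
    return [status_by_company.get(c, "") for c in group3_companies]  # IFERROR(...,"")


status_rows = [
    ("weekly", "Company A"), ("weekly", "Company B"), ("weekly", "Company C"),
    ("monthly", "Company D"), ("monthly", "Company E"), ("monthly", "Company F"),
]
print(label_from_groups(status_rows, ["Company A", "Company D", "Company J"]))
# ['weekly', 'monthly', '']
```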

How can I remove non-matching values in two different columns and sort in Excel?

I have several columns of data in my Excel spreadsheet.
Originally, I had two different spreadsheets, as they were generated from reports in a software application.
One of the spreadsheets contains the names of individuals who have had transactions with us in the past year. The other spreadsheet contains the names and the phone numbers. I copied and pasted the columns with the names and phone numbers into my spreadsheet with just the names of people who have purchased something from us in the past year.
My ultimate goal is to extract the names and phone numbers of only the names that have purchased something in the past year.
My column for the past year contains 1,002 names, while my master customer list (with phone numbers) contains over 20,000 individuals. I need the phone numbers of all of the individuals that have purchased something from us in the past year, but I don't want to have to manually go through 1,000 names (and, essentially, 20,000+ to find the match).
If I can achieve my goal without having to use VBA, that would be great. If this is the only route I can take, then I will go that route, but I would like to avoid coding if possible. (This is simply due to time constraints.)
The VLOOKUP function is likely the best solution for you. From the Excel documentation, it:
Looks for a value in the leftmost column of a table, and then returns a value in the same row from a column you specify. By default, the table must be sorted in an ascending order.
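Note well the implication of that last sentence for approximate matching (Range_lookup TRUE or omitted): the column you're searching in (the leftmost column of the lookup table) must be sorted in ascending order for the function to produce correct results. With Range_lookup set to FALSE, as in the steps below, VLOOKUP looks for an exact match and the table does not need to be sorted.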
Taking a simple example, let's say you have Sheet1 in your Excel workbook with the following information:
A B C
1 Name Transactions Phone
2 Sally 3
3 Alice 5
4 Joe 2
5 Jon 1
You need to add their phone numbers to this sheet, from another workbook. Let's say your phone number information in the other workbook looks like this:
A B
1 Name Phone
2 Alice 2222222
3 Bill 3333333
4 Bob 4444444
5 Jim 5555555
6 Joe 6666666
7 Sally 7777777
8 Sue 8888888
9 Tom 9999999
Take the following steps to add the phone numbers to Sheet1 in the first workbook:
Copy the phone information into a blank sheet in the first workbook. Let's call this Sheet2 for this example.
Optionally, sort the phone information ascending by the Name column (A), the leftmost and thus the lookup column. Sorting is only required for approximate matching; with FALSE as the last argument, as below, the order does not matter.
In cell C2 of Sheet1 (the empty phone cell for Sally), enter: =VLOOKUP(A2, Sheet2!A$2:B$9, 2,FALSE).
Drag-copy this formula down to the remaining cells in the Phone column.
Result:
A B C
1 Name Transactions Phone
2 Sally 3 7777777
3 Alice 5 2222222
4 Joe 2 6666666
5 Jon 1 #N/A
Notes:
The second parameter (Table_array - the lookup data range) should not include the column headings. As you can see, it's Sheet2!A$2:B$9 so it includes the information from rows 2 to 9 in columns A and B.
The last parameter (Range_lookup) should be set to FALSE so you don't pick up the information from the closest match. Note how Jon has no matching phone number record, so his Phone is set to "#N/A"; otherwise he would have been assigned Joe's phone number, since that's the closest match to Jon.
Parameter documentation:
Lookup_value is the value to be found in the first column of the table, and can be a value, a reference, or a text string.
Table_array is a table of text, numbers, or logical values, in which data is retrieved. Table_array can be a reference to a range or a range name.
Col_index_num is the column number in Table_array from which the matching value should be returned. The first column of values in the table is column 1.
Range_lookup is a logical value: to find the closest match in the first column (sorted in ascending order) = TRUE or omitted; find an exact match = FALSE.
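The difference between the two Range_lookup modes, including why Jon would otherwise get Joe's number, can be sketched in Python. This is an illustration rather than Excel's implementation: `vlookup_exact` and `vlookup_approx` are hypothetical names, None stands in for #N/A, and unlike Excel the comparisons are case-sensitive.

```python
import bisect


def vlookup_exact(key, table):
    """Exact match, like VLOOKUP(...,FALSE). The table need not be sorted."""
    for name, value in table:
        if name == key:
            return value  # first matching row wins
    return None


def vlookup_approx(key, table):
    """Approximate match, like VLOOKUP(...,TRUE) on a table sorted ascending:
    returns the value for the largest name <= key, or None if key sorts first."""
    names = [name for name, _ in table]
    i = bisect.bisect_right(names, key)
    return table[i - 1][1] if i else None


phones = [
    ("Alice", "2222222"), ("Bill", "3333333"), ("Bob", "4444444"),
    ("Jim", "5555555"), ("Joe", "6666666"), ("Sally", "7777777"),
    ("Sue", "8888888"), ("Tom", "9999999"),
]
print(vlookup_exact("Jon", phones))   # None (Excel shows #N/A)
print(vlookup_approx("Jon", phones))  # 6666666, Joe's number
```

The approximate version relies on the sorted order (it uses a binary search), which is exactly why sorting matters only when Range_lookup is TRUE or omitted.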
