Excel multiple criteria to retain rows that prioritise during deduplication - VBA macro or other method - excel

I need a solution to the following problem. Could any kind person out there help a novice Excel user?
I have three columns. Column A contains 10,000 email addresses, Column B contains one of three values (Dr, Prof, Student), and Column C contains the title of a project.
The problem: the same people (judging by email address) appear in this spreadsheet more than once with different values in Column B (the same person appears as Prof, Dr and Student).
Solution required: create three columns that count how many times the same email address appears in Column A for each of the Column B values. I will then have a count of how many times an email address appears on the spreadsheet per Column B value.
Then deduplicate the spreadsheet by Columns A and B in a way that prioritises as follows: 1) Prof over Dr and Student, and 2) Dr over Student. In this way the rows retained will prioritise professors and doctors over students.

I think it should work if you do the following (please create a backup first):
Search and replace 'Prof' -> '_Prof' in column B, so that an ascending sort puts Prof ahead of Dr and Student
Sort the data by email, then by title (_Prof, Dr, Student). You should now see the groups of duplicate email addresses, where the record to keep is the top one
In a new column (touching the data range) enter True at row 1 (assuming you don't have headers)
Add a formula to the new column at row 2 (assuming the emails are in column A):
=A1<>A2
Extend the formula down
Copy the new column and paste it back as values
Sort the data by the new column
All the rows that have False in the new column can be deleted.
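For anyone who wants to cross-check the result outside Excel, here is a minimal Python sketch of the same idea: count each (email, title) pair, then keep one row per email according to the Prof > Dr > Student ranking from the question. The function name, the `PRIORITY` table and the sample rows are made up for illustration; this is not a VBA macro, just a sketch under those assumptions.

```python
from collections import Counter

# Lower number = higher priority (Prof > Dr > Student, as in the question).
PRIORITY = {"Prof": 0, "Dr": 1, "Student": 2}

def count_and_dedupe(rows):
    """rows: list of (email, title, project) tuples.

    Returns (counts, kept): counts maps (email, title) to its number of
    occurrences; kept holds one row per email, chosen by PRIORITY.
    """
    counts = Counter((email, title) for email, title, _ in rows)
    best = {}
    for row in rows:
        email, title, _ = row
        current = best.get(email)
        # Replace the stored row only when the new title ranks strictly higher.
        if current is None or PRIORITY[title] < PRIORITY[current[1]]:
            best[email] = row
    return counts, list(best.values())

# Hypothetical sample data, for illustration only.
sample_people = [
    ("a@example.com", "Student", "Project 1"),
    ("a@example.com", "Prof", "Project 2"),
    ("b@example.com", "Dr", "Project 3"),
    ("b@example.com", "Student", "Project 4"),
    ("a@example.com", "Student", "Project 5"),
]
counts, kept = count_and_dedupe(sample_people)
```

When an email has several rows with the same top-ranked title, this sketch keeps the first one it meets; the question does not say which of those should win.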

Related

EXCEL - Find if value exists in column B in the range of a value in column A

I have a list of companies with certain products. Now I want to find out whether a company has a certain product or not. For example, I want to find out which company has Product C and return a 1 in every row for that company:
Column 1
Company A
Company A
Company A
Company A
Company B
Company B
Column 2
Product A
Product B
Product C
Product A
Product B
Column 3 (Result):
1
1
1
0
0
This solution requires 2 additional columns. I'm assuming your first row is headers and the range is A1:B6, with data starting on row 2. I'll give a few options for how to do this. Where I put "Product C" you can also reference a cell. Whenever I use a binary flag like this it's usually to filter datasets, so there might be a better alternative for what you want than what's below.
In Column C, =IF(B2="Product C",1,0) or you can use =--(B2="Product C")
Sort by Column C in descending order, then in Column D use =VLOOKUP(A2,$A$2:$C$6,3,0). Copy and paste the result as values, because if you keep the formula and re-sort, it will break.
If Product C appears only once for any given company you can use SUMIFS too: =SUMIFS($C$2:$C$6,$A$2:$A$6,A2)
If you have Excel 2019 or Microsoft 365, you can also use =MAXIFS($C$2:$C$6,$A$2:$A$6,A2), which won't care how you sort the dataset.
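The flag logic (1 on every row of a company that has the product anywhere) can also be sketched in a few lines of Python, which may help when checking the Excel result. The function name and the five sample rows are assumptions for illustration; the rows only mimic the shape of the example in the question.

```python
def flag_companies(rows, product):
    """rows: list of (company, product) pairs. Returns one 0/1 flag per row."""
    # First pass: collect every company that has the product at least once.
    companies_with_product = {company for company, item in rows if item == product}
    # Second pass: flag each row by its company, regardless of sort order.
    return [1 if company in companies_with_product else 0 for company, _ in rows]

# Hypothetical sample rows, for illustration only.
sample_products = [
    ("Company A", "Product A"),
    ("Company A", "Product B"),
    ("Company A", "Product C"),
    ("Company B", "Product A"),
    ("Company B", "Product B"),
]
flags = flag_companies(sample_products, "Product C")
```

Like the MAXIFS option, this does not depend on how the data is sorted.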

Excel - how to look in a dynamically changing range of multiple rows and columns and retrieve data

I have 2 excel files. 1 is a workfile in which I work, the other is the output of a database. See pic 1 for my database output (simplified).
What we see here:
The purchase order number in column A
The row in the database in column B
The status of the row in the database in column C
The classification in column D, where W means a product we want to measure and P means delivery costs, administration costs etc. (we don't want to measure these)
The number of items ordered and the number of items delivered in column E
The company name and product info in column F
Now, what I want, is something like this:
I want this table to be filled automatically based on the database output. It works for column B, but I'm stuck on column C, D and E.
What I want from you!
I need help with column C, D and E.
Number of rows: it needs to calculate the rows only with W in column D. So for item 4410027708 it has to be 2 (only 2 rows with W) and for item 4410027709 it should be 1.
Items ordered: it needs to add-up all the values that are directly to the right of the W in column D. So, for 4410027708, it needs to add up 3 and 5. It must ignore all the rows with P!
Items to be delivered: You may already guess this, but it needs to add up all the values in column E that are on the same row as column C with To be delivered, but only for the W rows (not the P versions). So, for item 4410027708 this should be
I suggest this is easiest if column A is filled down first (including for the last entry). Then, assuming the database output sheet is called Sheet1, in:
C2: =COUNTIFS(Sheet1!A:A,A2,Sheet1!D:D,"W")
D2: =SUMIFS(Sheet1!E:E,Sheet1!A:A,A2,Sheet1!D:D,"W")
E2: =SUMIFS(Sheet1!E:E,Sheet1!A:A,A2,Sheet1!C:C,"To be delivered",Sheet1!D:D,"W")
copied down to suit.
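As a cross-check of what the question asks for (count and sum only the W rows per purchase order), here is a minimal Python sketch. The dictionary keys, the function name, the status values and the quantity for 4410027709 are all invented for illustration; only the two W quantities 3 and 5 for 4410027708 come from the question.

```python
def summarise_orders(rows):
    """rows: list of dicts with keys 'po', 'status', 'cls', 'qty'.

    Returns {po: {'rows', 'ordered', 'to_deliver'}} counting W rows only.
    """
    summary = {}
    for r in rows:
        if r["cls"] != "W":  # ignore P rows (delivery costs, administration, ...)
            continue
        entry = summary.setdefault(r["po"], {"rows": 0, "ordered": 0, "to_deliver": 0})
        entry["rows"] += 1
        entry["ordered"] += r["qty"]
        if r["status"] == "To be delivered":
            entry["to_deliver"] += r["qty"]
    return summary

# Hypothetical database output; statuses are made up for illustration.
sample_orders = [
    {"po": "4410027708", "status": "To be delivered", "cls": "W", "qty": 3},
    {"po": "4410027708", "status": "Delivered", "cls": "W", "qty": 5},
    {"po": "4410027708", "status": "To be delivered", "cls": "P", "qty": 1},
    {"po": "4410027709", "status": "Delivered", "cls": "W", "qty": 2},
]
order_summary = summarise_orders(sample_orders)
```

A purchase order that has only P rows does not appear in the result at all, whereas the Excel formulas would show 0 for it.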

Excel look up value in array, return next value

I would like to look up a value in a range and return the value in the next row, but can't quite figure out how to do this. I especially would like to do this with formulas rather than VBA, and preferably with built-in formulas than custom (VBA) formulas, due to macro security issues.
I'm using Excel 2010. My workbook has two worksheets, "assessment" and "lookup". In lookup, I have lookup tables.
"lookup" looks something like:
Column A Column B Column C
1 Sales Engineering Manufacturing
2 Alice Bobbie Charlie
3 Dawn Edgar Frank
4 George Holly Isabel
In "assessment," I have some drop-downs from which users select one name from each column in "lookup." Based on some other criteria, I then rank these and create a new, sorted list (using INDEX() and MATCH()) that produces the selected name and corresponding column name in a new sort order:
Column A Column B
10 Engineering Edgar
11 Sales Alice
What I'd like is to return the name from the next row.
Column C
10 Holly
11 Dawn
But I'm having real trouble figuring out how to get there.
Assuming the lookup table is located at B2:D5 (change as required) and the result data is at F2:H3 (change as required), enter this formula in cell H2, then copy down.
=INDEX(
INDEX($B$2:$D$5,0,MATCH($F2,$B$2:$D$2,0)),
1+MATCH($G2,
INDEX($B$2:$D$5,0,MATCH($F2,$B$2:$D$2,0)),0))
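The nested INDEX/MATCH does two things: it picks the column whose header matches, then finds the name in that column and steps one row down. A minimal Python sketch of that logic, using the lookup table from the question, might look like this (the function name is made up, and returning None for a name in the last row is my own assumption about the desired behaviour):

```python
def next_in_column(table, header, name):
    """table: list of rows, with the headers in the first row.

    Returns the value one row below `name` in the column titled `header`,
    or None when `name` sits in the last row.
    """
    col = table[0].index(header)           # which column to search
    column = [row[col] for row in table]   # that column, header included
    pos = column.index(name)               # raises ValueError if name is absent
    return column[pos + 1] if pos + 1 < len(column) else None

lookup = [
    ["Sales", "Engineering", "Manufacturing"],
    ["Alice", "Bobbie", "Charlie"],
    ["Dawn", "Edgar", "Frank"],
    ["George", "Holly", "Isabel"],
]
result_for_edgar = next_in_column(lookup, "Engineering", "Edgar")
result_for_alice = next_in_column(lookup, "Sales", "Alice")
```

In the Excel formula, a name in the last row pushes the row index past the end of the range and produces an error, so that case needs a decision either way.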

Excel 2010: Vlookup Name from one column & Count and return data from another column

Hoping someone can help me here. :)
I have two columns of data in Worksheet 1:
COLUMN A = NAME (EG. TOM)
COLUMN C = TYPE OF QUERY (FAX, TEL, EMAIL, MAIL)
I would like to have in Worksheet 2:
COLUMN A = NAME (EG TOM)
COLUMN B = A COUNT OF HOW MANY FAXES TOM HAS
COLUMN C = A COUNT OF HOW MANY TELEPHONES TOM HAS
COLUMN D = A COUNT OF HOW MANY EMAILS TOM HAS
COLUMN E = A COUNT OF HOW MANY MAILS TOM HAS
If anyone can help me that would be great.
Thanks guys
You can use a pivot table. In sheet 1, click into the data table, then click Insert > Pivot table.
Drag the Name field to the rows. Drag the query type field to the columns.
Drag the Name field again, this time to the Values area, where it will turn into a count.
Now you see a count of query types for each name in a matrix.
Use COUNTIFS instead if you really want to use a formula. A pivot table would be the best way to go though.
E.g. for column B, row 1 on sheet 2:
=COUNTIFS(Sheet1!A:A, A1, Sheet1!C:C, "FAX")
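Both the pivot table and COUNTIFS build the same name-by-query-type matrix of counts. A minimal Python sketch of that matrix, with invented sample rows (the name ANN and the function name are assumptions for illustration), is:

```python
from collections import Counter

# Column order for the output, taken from the query types in the question.
QUERY_TYPES = ["FAX", "TEL", "EMAIL", "MAIL"]

def count_queries(rows):
    """rows: list of (name, query_type) tuples.

    Returns {name: [fax_count, tel_count, email_count, mail_count]}.
    """
    counts = Counter(rows)  # counts each (name, query_type) pair
    names = sorted({name for name, _ in rows})
    return {name: [counts[(name, q)] for q in QUERY_TYPES] for name in names}

# Hypothetical sample rows, for illustration only.
sample_queries = [
    ("TOM", "FAX"),
    ("TOM", "TEL"),
    ("TOM", "FAX"),
    ("ANN", "EMAIL"),
]
query_matrix = count_queries(sample_queries)
```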

Finding companies appearing with different IDs in MS Excel

I have 2 columns in my data:
A - each company's unique ID.
B - the company name that corresponds to the respective ID.
This type of data extends to 13,000 rows. For instance:
Col A Col B
12 Google Inc
12 The Google
14 Google
18 Amazon
18 Amazon
21 Amazon INC
18 Amazon
...
As you can see from the example above, the issue is that sometimes the company has a different ID appearing. Furthermore, although in all 3 cases, the company is still the same, the fact that they've been worded differently makes it hard to do an exact match.
My goal in this exercise is two-fold:
Find which companies have different IDs showing.
Identify the row at which this happens.
It would be cumbersome to go through all 13,000 rows. What Excel formulas would do the trick?
You could use pivot tables to count how many duplicates each name has.
I would also:
Order the list by column B.
Add a formula in column c that compares the formula row with the previous row.
For example consider a formula in row 5:
=IF(B4=B5,"Identical","Different")
You could build in more intelligence, for example by checking whether the first word of the name in row 5 appears in the row 4 name, e.g.
=IF( ISERROR( FIND( LEFT(B5,FIND(" ",B5,1)-1) ,B4,1) )
,""
,"Similar")
You could combine the above two into a single function, or use both in different columns (which is easier).
PART 2:
The data must be ordered by column B!
So, using the above logic to compare the IDs, add another column (column F) with this formula
=FIND( LEFT(B5,FIND(" ",B5,1)-1) ,B4,1)
Then add another column (column G)
=IF(B4=B5
, B5
, IF( ISERROR(F5)
,""
, LEFT(B5,FIND(" ",B5,1)-1) )
)
This results in a value in column G which is either the identical company name or the first word of a company that has a matching name.
You can then add another column (column H) which compares the IDs of adjacent rows that share the same value in column G
=IF(G4=G5
, IF(A4<>A5, "Different IDs", "Ok IDs")
, "First row in company group"
)
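For comparison, here is a rough Python sketch of the same goal: group similar company names and report every group that shows up with more than one ID, together with the row numbers. Note that it swaps the first-word comparison for a different heuristic (lower-casing the name and dropping a small set of noise words), which is my own assumption and not part of the answer above; `NOISE_WORDS` and the function names are hypothetical, and the sample rows are the ones listed in the question.

```python
from collections import defaultdict

# Assumption: words to ignore when deciding whether two names match.
NOISE_WORDS = {"the", "inc"}

def company_key(name):
    """Normalise a company name into a rough grouping key."""
    words = [w for w in name.lower().replace(".", "").split() if w not in NOISE_WORDS]
    return " ".join(words)

def find_id_conflicts(rows):
    """rows: list of (company_id, name) pairs, in sheet order.

    Returns {key: {company_id: [row numbers]}} for keys seen with more than one ID.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for row_number, (company_id, name) in enumerate(rows, start=1):
        groups[company_key(name)][company_id].append(row_number)
    return {key: dict(ids) for key, ids in groups.items() if len(ids) > 1}

companies = [
    (12, "Google Inc"),
    (12, "The Google"),
    (14, "Google"),
    (18, "Amazon"),
    (18, "Amazon"),
    (21, "Amazon INC"),
    (18, "Amazon"),
]
conflicts = find_id_conflicts(companies)
```

Unlike the sorted-column approach, this does not need the data ordered by name, but the noise-word list would have to be extended by hand for real data.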
