Excel VBA for Data Cleaning - excel

I want to find a way to automate our data cleaning (de-duping) process using Excel VBA. Currently, our team gets a patient list from the clinics to de-dupe, since we find variations of duplicate records.
The reports we get from the clinics come as Excel spreadsheets. We only use specific columns from the spreadsheet, since we do not need everything in it. I've used multiple formulas to remove duplicates: the patient records are entered manually, so the same patient can appear in many different forms in the list.
It's okay if they have different addresses and insurance info, because people move and switch insurers. The condition we focus on is whether two records have the same date of birth, first name, and last name. We're not too strict on the last name either, because last names change; in that case, we make sure that the date of birth, first name, and address are the same. We need at least three matching elements to consider two records the same person; otherwise we treat them as different people and leave the records alone.
The list we work with consists of the following columns:
First name, last name, DOB, Address, City, State, Zip, Primary Insurance.
| First Name | Last Name | DOB | Address | City | State | Zip | Primary Insurance |
|------------|-----------|--------------|------------------|----------|------------|--------|--------------------|
| John | Smith | 01/01/1990 | 123 ABC Street | Denver | Colorado | 87880 | Humana Insurance |
| Jon | Smith | 01/01/1990 | 123 ABC Street | Denver | Colorado | 87880 | Humana Insurance |
| Anthony | Bowen | 02/02/1992 | 456 ABC Street | Austin | Texas | 78632 | Aetna |
| Tony | Bowen | 02/02/1992 | 456 ACC Str | Austin | Texas | 78632 | Aetna |
Currently I sort the entire sheet of data from DOB oldest to newest, first name a-z, last name a-z.
From there I apply a formula:
if(and(C2=C3, B2=B3, A2=A3),"Check","")
and fill it down through all the rows. I filter on the blank cells to clear the formulas from them, then un-filter so I can jump down to the next cell flagged with "Check" and review the two consecutive rows.
I spot-check to make sure the formula is picking up all the true duplicates, then filter to show all the "Check" rows and delete them at once.
Then I move on to applying another formula:
if(and(C2=C3,left(B2,2)=left(B3,2),left(A2,2)=left(A3,2)),"Check","")
This matches on the same DOB plus the first two letters of the first name and of the last name. The process is the same: I jump down to the next row flagged with "Check" and review the two consecutive rows.
Again, I spot-check to make sure they're true duplicates, then filter to show all the "Check" rows and mass delete.
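To make the two passes concrete, here is a minimal sketch of the same row-versus-next-row comparison, written in Python rather than VBA so the logic is easy to check. The function name `flag_adjacent` and the `prefix_len` parameter are made up for illustration, and the rows are assumed to be sorted already, as described above.

```python
from typing import List, Tuple

# Each record is (first_name, last_name, dob); rows come from the sample table.
Record = Tuple[str, str, str]

def flag_adjacent(rows: List[Record], prefix_len: int = 0) -> List[str]:
    """Return "Check" for each row that matches the row below it, else "".

    prefix_len=0 compares full names (like the first formula);
    prefix_len=2 compares only the first two letters (like the second).
    """
    def key(r: Record):
        first, last, dob = r
        if prefix_len:
            first, last = first[:prefix_len], last[:prefix_len]
        # Excel's "=" ignores case when comparing text, so lowercase here.
        return (dob, first.lower(), last.lower())

    flags = []
    for i, row in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        flags.append("Check" if nxt is not None and key(row) == key(nxt) else "")
    return flags

rows = [
    ("John", "Smith", "01/01/1990"),
    ("Jon", "Smith", "01/01/1990"),
    ("Anthony", "Bowen", "02/02/1992"),
    ("Tony", "Bowen", "02/02/1992"),
]
exact_flags = flag_adjacent(rows)                 # full-name pass
prefix_flags = flag_adjacent(rows, prefix_len=2)  # two-letter pass
```

On the sample rows the full-name pass flags nothing and the two-letter pass flags only John/Jon. Anthony/Tony slips through both passes, so nickname pairs like that would still need a manual look.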
I would like to do this with buttons that run macros: a button would move the flagged records to another worksheet tab and remove those duplicates from the original working sheet. That way they are gone from the original list, but the user can still go to the other tab to review the removed duplicates if they want to.
I'd appreciate any suggestions anyone can give.
Thank you!
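One possible shape for the button logic, sketched in Python rather than VBA so it can be checked quickly: walk the list once, keep the first record of each person, and collect later matches in a separate list (the "review" tab). The matching rule here is my reading of the description, not a confirmed spec: DOB must match, plus at least two of first name, last name and address. The names `is_same_person` and `split_duplicates` are hypothetical.

```python
from typing import Dict, List, Tuple

Patient = Dict[str, str]

def is_same_person(a: Patient, b: Patient) -> bool:
    # Assumed rule: DOB must match, plus at least two of first name,
    # last name and address (three matching elements in total).
    if a["dob"] != b["dob"]:
        return False
    others = ("first", "last", "address")
    matches = sum(a[f].strip().lower() == b[f].strip().lower() for f in others)
    return matches >= 2

def split_duplicates(patients: List[Patient]) -> Tuple[List[Patient], List[Patient]]:
    """Keep the first record of each person; collect later matches separately."""
    kept: List[Patient] = []
    removed: List[Patient] = []
    for p in patients:
        if any(is_same_person(p, k) for k in kept):
            removed.append(p)  # would be moved to the review tab
        else:
            kept.append(p)     # stays on the working sheet
    return kept, removed

patients = [
    {"first": "John", "last": "Smith", "dob": "01/01/1990", "address": "123 ABC Street"},
    {"first": "Jon", "last": "Smith", "dob": "01/01/1990", "address": "123 ABC Street"},
    {"first": "Anthony", "last": "Bowen", "dob": "02/02/1992", "address": "456 ABC Street"},
    {"first": "Tony", "last": "Bowen", "dob": "02/02/1992", "address": "456 ACC Str"},
]
kept, removed = split_duplicates(patients)
```

With the sample data, Jon Smith lands in `removed`, while Tony Bowen stays in `kept` because only the DOB and last name match. In VBA the same loop would copy each removed row to the second sheet before deleting it from the first.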

Related

How to extract multiple rows that meet a criteria which is given by 2 drop-down lists in EXCEL

I have a sheet that looks like this:
A | B | C | D | E | F
1 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
2 DROPDOWN_2 | move | NY, xy_street | Ann | 1 | ...
3 DROPDOWN_2 | fill | CA, yx_street | Rose | 3 | ...
...
100 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
101 DROPDOWN_1
102
103 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
104 DROPDOWN_1
INITIALLY:
In rows 1-99 you find the tasks with 1 column empty (NAME).
In rows 100+ you find "Tickets" which can be printed (2 rows for example 100-101)
THEN
1. The ORGANISER (me) makes tickets with names by copying and pasting the "ticket structure" and choosing a name from the DROPDOWN_1 list.
2. The organiser then assigns the tasks (rows 1-99) to people by choosing them from the DROPDOWN_2 list. (Note that both dropdown lists contain the same names.)
After this I would like Excel to fill in each ticket with the rows that contain the same name as the ticket. One person can be assigned several tasks, but one task can only be assigned to one person, so a ticket has one NAME but as many rows as that person has tasks in the 1-99 list.
I am asking you to help me make a formula or function for this "autofill" of tickets, because I have been searching for days and couldn't find a proper solution.
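To pin down what the "autofill" should return, here is a small Python sketch of the lookup, not an Excel solution. The assignee names ("Max", "Eve"), the third task row, and the function name `rows_for_ticket` are made up for illustration.

```python
from typing import Dict, List, Union

Task = Dict[str, Union[str, int]]

# Task list (rows 1-99 in the sheet); the "name" values and the third row
# are hypothetical, the other values follow the layout shown above.
tasks: List[Task] = [
    {"name": "Max", "task": "move", "address": "NY, xy_street", "order_giver": "Ann", "count": 1},
    {"name": "Eve", "task": "fill", "address": "CA, yx_street", "order_giver": "Rose", "count": 3},
    {"name": "Max", "task": "pack", "address": "NY, zz_street", "order_giver": "Ann", "count": 2},
]

def rows_for_ticket(all_tasks: List[Task], ticket_name: str) -> List[Task]:
    """Return every task row assigned to the person named on the ticket."""
    return [t for t in all_tasks if t["name"] == ticket_name]

max_ticket = rows_for_ticket(tasks, "Max")
empty_ticket = rows_for_ticket(tasks, "Nobody")
```

Because the filter is re-evaluated from the current names each time, changing a name in the task list simply changes which rows come back. That is the behaviour a fixed INDEX(MATCH()) pair does not give you.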
In the "Similar problems and solutions" section you can find two links with the closest answers. Unfortunately, neither of them involves dropdown lists. I tried to solve the problem with INDEX(MATCH()) functions, but that approach cannot handle changes to the names.
Thank you,
Max
Similar problems and solutions:
https://www.get-digital-help.com/2009/09/28/extract-all-rows-from-a-range-that-meet-criteria-in-one-column-in-excel/
Extracting all rows based on a value of cell without VBA
Select A101:F392 and enter this as an array formula (ctrl+shift+enter):
=IFERROR(INDEX(A1:F99,ROUND(MOD(SMALL(IFERROR(CHOOSE({1,2},SMALL(IFERROR(1/(1/MMULT(IF(SMALL(COUNTIF(A2:A99,"<="&A2:A99),ROW(INDIRECT("2:98")))=SMALL(COUNTIF(A2:A99,"<="&A2:A99),ROW(INDIRECT("1:97"))),0,ROW(A2:A98)),{1,1}))+{0.001,-0.001},FALSE),ROW(INDIRECT("1:196"))),COUNTIF(A2:A99,"<="&A2:A99)+ROW(A2:A99)/1000),FALSE),ROW(INDIRECT("1:292"))),1)*1000,0),{1,2,3,4,5,6}),"")

Count results based on a multiple VLOOKUPs?

I wasn't really sure how to title this one, but I'll try to explain it as best I can.
I've got a workbook, and on my Dashboard I have a table that lists the stats I want to provide. It's all about helpdesk tickets, so I want to know how many tickets each group has raised. In my data, though, each ticket is listed by the person's name, not the group.
What I have done is created a table in my 'Data' sheet where the name is next to the group. For example:
| A1 | A2 |
|:-----------------|------------:|
| John Smith | Helpdesk |
| Ben Jones | Helpdesk |
| Will Smith | Management |
What I want to do is say:
On my 'Tickets' sheet, give me the total number of rows where the name is equal to that of the group I want to check.
The results would be something like:
| Group | Tickets |
|:-----------------|------------:|
| Helpdesk | 5|
| Management | 2|
I'm a little unsure where to even start with this one.
"so I want to know how many tickets each group has raised."
Do you need the total count of tickets raised by group?
Try this formula to calculate the number of tickets per department. It goes in the second column of the result table (starting at B2, then copied down). The first column holds the department names, starting at A2.
=SUMPRODUCT(--(LOOKUP(Tickets!A1:A100,UserDeparment!A1:B20)=Results!A2))
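Tickets!A1:A100 holds the user for every ticket (the ticket data). UserDeparment!A1:B20 holds the department (column B) for each user (column A). Note that LOOKUP expects the user names in column A of that table to be sorted in ascending order; otherwise it can return the wrong department.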
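For anyone who wants to sanity-check the counts outside Excel, the same "map each ticket's name to a group, then count" idea can be sketched in Python. The ticket list below is invented so that it reproduces the 5 and 2 from the example; only the name-to-group table comes from the question.

```python
from collections import Counter

# Name -> group mapping, as in the 'Data' sheet table from the question.
user_group = {
    "John Smith": "Helpdesk",
    "Ben Jones": "Helpdesk",
    "Will Smith": "Management",
}

# One entry per ticket row (hypothetical data: who raised each ticket).
ticket_owners = [
    "John Smith", "Ben Jones", "John Smith", "Will Smith",
    "Ben Jones", "John Smith", "Will Smith",
]

# Translate each ticket's owner to a group, skipping unknown names, then count.
tickets_per_group = Counter(
    user_group[name] for name in ticket_owners if name in user_group
)
```

Unlike LOOKUP, the dictionary lookup here does not care about sort order, and names missing from the mapping are skipped rather than matched to a neighbour.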

Sorting three columns into six, sorted horizontally by surname using excel

I work at an event centre with a portrait screen on either side of the entrance doors, where we like to list people's names and table numbers. We often get the list in this format:
Surname | First Name | Table Number
======================================
Aadamson | Adam | 5
Bobson | Bob | 10
etc
What we'd love to do is take those three columns and use a script or something similar (we're a little over our heads) to sort them into two or three sets of columns (for example, two lots of surname, first name and table). Even being pointed in the right direction would help.
To something that goes like
Surname | First Name | Table | Surname | First Name | Table
===============================================================
Aadamson | Adam | 5 | Bobson | Bob | 30
Christon | Chris | 8 | Donaldson | Donald | 40
etc
If anyone could offer any help, that would be incredible!
Check this and this for a hint on how to use INDEX.
I will use the more general answer here with OFFSET (item 8).
Assuming:
Your source data lies in A2:C3 (you can extend this range as needed),
Cell D2 contains 3 (your source width)
Cell D3 contains 2 (the number of repeats)
Your target range starts at F2
then cell F2: =OFFSET($A$2,(ROW()-ROW($F$2))*$D$3+INT((COLUMN()-COLUMN($F$2))/$D$2),MOD((COLUMN()-COLUMN($F$2)),$D$2))
Copy this to the right (6 columns) and down as far as needed.
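The index arithmetic inside that OFFSET call is easier to see outside of Excel. Below is a small Python sketch of the same mapping, where `width` plays the role of D2 and `repeats` the role of D3. The function names are mine, and blank padding for a short last row is an assumption (OFFSET would show 0 for an empty cell instead).

```python
from typing import List, Tuple

def source_cell(target_row: int, target_col: int, width: int, repeats: int) -> Tuple[int, int]:
    """Map a 0-based target cell to the 0-based source cell, as the OFFSET formula does."""
    src_row = target_row * repeats + target_col // width  # rows down from A2
    src_col = target_col % width                          # columns right of A2
    return src_row, src_col

def reshape(rows: List[list], repeats: int = 2) -> List[list]:
    """Lay `repeats` source rows side by side on each output row."""
    width = len(rows[0])
    n_out = -(-len(rows) // repeats)  # ceiling division
    out = []
    for r in range(n_out):
        line = []
        for c in range(width * repeats):
            sr, sc = source_cell(r, c, width, repeats)
            line.append(rows[sr][sc] if sr < len(rows) else "")
        out.append(line)
    return out

guests = [
    ["Aadamson", "Adam", 5],
    ["Bobson", "Bob", 30],
    ["Christon", "Chris", 8],
    ["Donaldson", "Donald", 40],
]
wide = reshape(guests, repeats=2)
```

With two repeats, the first output row takes source rows 0 and 1 and the second takes rows 2 and 3, so the names read left to right and then down, as in the desired layout.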

In Excel, how can I use a formula to merge rows with common values and concatenate differences?

Using the example from this question:
Excel - Merge rows with common values and concatenate the differences in one column
How can I change:
Customer Name | NEW YORK | ALBANY
Customer Name | NEW YORK | CLINTON
Customer Name | NEW YORK | COLUMBIA
Customer Name | NEW YORK | DELAWARE
Customer Name | NEW YORK | DUTCHESS
Customer Name | VERMONT | BENNINGTON
Customer Name | VERMONT | CALEDONIA
Customer Name | VERMONT | CHITTENDEN
Customer Name | VERMONT | ESSEX
Customer Name | VERMONT | FRANKLIN
to this:
Customer Name | VERMONT | BENNINGTON,CALEDONIA,CHITTENDEN,ESSEX,FRANKLIN
Customer Name | NEW YORK | ALBANY,CLINTON,COLUMBIA,DELAWARE,DUTCHESS
where | denotes a cell. The answer given in the question above was for a macro. I need to create a manageable template and most people do not know how to manage macros. Thus, I need a formula to do this. Can anyone help out?
If your data is sorted by state, you can use a formula column and a helper column to keep it simple. In the first column, you build up the comma-separated list of column C values for each state.
In D2:
=if(B2=B1,D1&C2&",",C2&",")
In the second column, you can put a flag that tells you whether the list for that state is finished, which is the case on the last row of each state:
=if(B2<>B3,"State Complete","")
You can filter on the State Complete value and get your results.
If you're trying to go a lot more fancy than that, you'll need macros or user-defined functions.
Similar, but with a single formula (and a 'wheeze'), assuming ALBANY is in C2, etc.:
=IF(B1=B2,D1&","&C2,C2)&IF(B2<>B3,".","")
The full stop (period) identifies the last row of each set. This assumes the rows are sorted, that Column C has been TRIMmed if necessary, and that Column C contains no full stops.
A formula will not delete rows, so either filter to select the rows whose cells contain a full stop, or copy the column, Paste Special Values over the top, filter, and delete the rows that do not contain a full stop. I'd prefer the latter, as presumably Column C itself should be deleted.
The periods could be removed with Find and Replace.
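For comparison, here is the same group-and-concatenate logic as a short Python sketch. It is not a replacement for the formulas, just a way to check what the merged output should look like; the function name `merge_by_key` is made up.

```python
from typing import Dict, List, Tuple

def merge_by_key(rows: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Merge rows sharing (customer, state) and join their third column with commas."""
    merged: Dict[Tuple[str, str], List[str]] = {}
    for customer, state, county in rows:
        # dicts keep insertion order, so groups come out in first-seen order
        merged.setdefault((customer, state), []).append(county.strip())
    return [(c, s, ",".join(v)) for (c, s), v in merged.items()]

rows = [
    ("Customer Name", "NEW YORK", "ALBANY"),
    ("Customer Name", "NEW YORK", "CLINTON"),
    ("Customer Name", "NEW YORK", "COLUMBIA"),
    ("Customer Name", "NEW YORK", "DELAWARE"),
    ("Customer Name", "NEW YORK", "DUTCHESS"),
    ("Customer Name", "VERMONT", "BENNINGTON"),
    ("Customer Name", "VERMONT", "CALEDONIA"),
    ("Customer Name", "VERMONT", "CHITTENDEN"),
    ("Customer Name", "VERMONT", "ESSEX"),
    ("Customer Name", "VERMONT", "FRANKLIN"),
]
result = merge_by_key(rows)
```

Unlike the running-concatenation formulas, this grouping does not need the rows to be sorted by state first.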

Report generation based on multi lookup and dynamic columns

I am a little stuck with a report I am trying to generate in Excel and was hoping someone could help.
Here is a summary of what I am trying to do:
Table 1 has one column called people (it’s basically a list of employees)
Table 2 has one column called countries (it’s basically a list of relevant countries)
Table 3 has three columns called person, country and date.
There is one entry for every person each time they review a country.
So the data will look something like:
PERSON | COUNTRY | DATE
John | uk | 10/01/2013
Paul | uk | 15/01/2013
John | France | 15/01/2013
Bob | Spain | 16/01/2013
The report I need to produce is one which shows who has/hasn't checked each country.
So the columns will be ‘Person’, uk, France, Spain (and any other unique value from the country table).
There will then be one single row for each person with a Yes/No value in the relevant column if that person has reviewed that country i.e. Table 3 contains a value that matches that value for the person and country.
So to be clear the report should be similar to:
PERSON | UK | FRANCE | SPAIN
John | Yes | Yes | No
Paul | Yes | No | No
Bob | No | No | Yes
In summary I can split this into two problems:
How to generate a table that has a column for every unique value in another table (country in the explanation above)
How to do a double lookup i.e. IF EXISTS in TABLE 3 ‘person’=john & ‘country’=uk then return ‘Yes’, otherwise return ‘No’
I’m happy to keep this in Excel or make use of SQL reporting, i.e. move my data to SQL first.
It's kind of a wonky formula but =sumproduct() will do a dual lookup.
=IF(SUMPRODUCT(--($K$2:$K$5=$K13), --($L$2:$L$5=M$10)),"Yes","No")
The Person/Country/Date table is located in the range K1:M5, and the results table is located in the range K10:N13. I had a workbook open and put it in the corner. (Nobody puts sumproduct in the corner.)
The gist is that -- turns TRUE and FALSE into 1 and 0. SUMPRODUCT multiplies the two results line by line and adds them up. If both conditions are true on some row, you get 1 x 1, and that funnels back into the IF for a Yes or No. You'll have to be mindful of the $ signs in the formula.
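If it helps to see both halves of the problem together (unique countries as columns, then the double lookup), here is a minimal Python sketch using the sample data from the question. The variable names are mine, and it only illustrates the logic; it is not a drop-in for the workbook.

```python
# (person, country, date) rows from Table 3 in the question.
reviews = [
    ("John", "uk", "10/01/2013"),
    ("Paul", "uk", "15/01/2013"),
    ("John", "France", "15/01/2013"),
    ("Bob", "Spain", "16/01/2013"),
]
people = ["John", "Paul", "Bob"]

# Problem 1: one column per unique country, in first-seen order.
countries = list(dict.fromkeys(country for _, country, _ in reviews))

# Problem 2: the double lookup. A (person, country) pair either exists or not.
seen = {(person, country) for person, country, _ in reviews}
report = {
    person: ["Yes" if (person, c) in seen else "No" for c in countries]
    for person in people
}
```

The set membership test plays the same role as the SUMPRODUCT of the two comparisons: it is true exactly when some row matches both the person and the country.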
