Excel -- Cell Values Match in First Row, Return Values on One Row with Multiple Columns

Here's an explanation of what I'm having difficulty with:
Column A: Lists the Address Book Numbers for different companies (1234, 1235, 1236, etc.)
Column B: Lists the Phone Types (Cell, Fax, Home)
Column C: Lists Company Phone Numbers
Address Book Numbers from Column A repeat across multiple rows, because some companies have more than one phone number in Column C. So I'm looking to consolidate rows with the same Address Book Number into one row, with each phone number for that Address Book Number in additional columns within that row.
Current Excel table:
| AddressBookNumber | PhoneType | PhoneNumber |
| 1234 | CELL | (444)444-4444 |
| 1235 | FAX | (777)777-7777 |
| 1234 | OFFICE | (000)000-0000 |
| 1236 | FAX | (222)222-2222 |
| 1234 | HOME | (555)555-5555 |
| 1236 | OFFICE | (111)111-1111 |
I would like my Excel table to look like:
| AddressBookNumber | PhoneType1 | PhoneNumber1 | PhoneType2 | PhoneNumber2 | PhoneType3 | PhoneNumber3 |
| 1234 | CELL | (444)444-4444 | OFFICE | (000)000-0000 | HOME | (555)555-5555 |
| 1235 | FAX | (777)777-7777 | | | | |
| 1236 | FAX | (222)222-2222 | OFFICE | (111)111-1111 | | |
Essentially, I need all the phone numbers for one company in a single row. I'd appreciate any help with the formula I should use. Thanks!

To get the unique list, put this array formula in F2:
=IFERROR(INDEX($A$2:$A$7,MATCH(0,COUNTIF($F$1:F1,$A$2:$A$7),0)),"")
Confirm it with Ctrl+Shift+Enter instead of Enter when you exit edit mode. Then copy/drag down until you get blanks.
And to get the phone types and numbers, put this in G2:
=IFERROR(INDEX($B$2:$C$7,MATCH(1,(COUNTIFS($E$2:E2,$B$2:$B$7,$F$2:F2,$C$2:$C$7)=0)*($A$2:$A$7=$F2),0),MOD(COLUMN(A:A)-1,2)+1),"")
Confirm it with Ctrl+Shift+Enter instead of Enter when you exit edit mode. Then copy/drag down and across until you get blanks.
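You may want to sanity-check the target layout outside Excel. The following minimal Python sketch does the same reshaping. It is not a translation of the two array formulas; it only reproduces the end result they produce:

- IDs appear in first-seen order, like the F2 formula.
- Each ID's (type, number) pairs are laid out left to right, like G2 filled across.

The `consolidate` helper is a hypothetical name.

```python
# Sample rows copied from the question's table (columns A:C).
rows = [
    ("1234", "CELL", "(444)444-4444"),
    ("1235", "FAX", "(777)777-7777"),
    ("1234", "OFFICE", "(000)000-0000"),
    ("1236", "FAX", "(222)222-2222"),
    ("1234", "HOME", "(555)555-5555"),
    ("1236", "OFFICE", "(111)111-1111"),
]


def consolidate(rows):
    """Return one list per ID: [id, type1, number1, type2, number2, ...]."""
    grouped = {}  # dicts keep insertion order, so IDs stay in first-seen order
    for book_id, phone_type, number in rows:
        # Each later row for the same ID adds one more (type, number) pair.
        grouped.setdefault(book_id, []).extend([phone_type, number])
    width = max(len(pairs) for pairs in grouped.values())
    # Pad short rows with "" the way IFERROR(..., "") leaves blank cells.
    return [[book_id] + pairs + [""] * (width - len(pairs))
            for book_id, pairs in grouped.items()]


for line in consolidate(rows):
    print(" | ".join(line))
```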

Related

Excel VBA for Data Cleaning

I want to find a way to automate our data cleaning (de-duping) process using Excel VBA. Currently, our team gets a patient list from the clinics to de-dup, because it contains duplicate records entered in different variations.
The reports we get from the clinic come as an Excel spreadsheet. We only use specific columns from the spreadsheet we receive, since we do not need everything in it. I've used multiple functions to remove duplicates because the patient records are entered manually, so the same patient can appear in many different ways in the list.
It's okay if they have different addresses and insurance info, because people move and switch insurance. The conditions we focus on are whether someone has the same date of birth, first name and last name. We're not too strict on the last name either, because last names change; in that case we make sure that the date of birth, first name and address are the same. We need at least three matching elements to consider two records the same person; otherwise we treat them as different people and leave the records alone.
The list we work with consists of the following columns:
First name, last name, DOB, Address, City, State, Zip, Primary Insurance.
| First Name | Last Name | DOB | Address | City | State | Zip | Primary Insurance |
|------------|-----------|--------------|------------------|----------|------------|--------|--------------------|
| John | Smith | 01/01/1990 | 123 ABC Street | Denver | Colorado | 87880 | Humana Insurance |
| Jon | Smith | 01/01/1990 | 123 ABC Street | Denver | Colorado | 87880 | Humana Insurance |
| Anthony | Bowen | 02/02/1992 | 456 ABC Street | Austin | Texas | 78632 | Aetna |
| Tony | Bowen | 02/02/1992 | 456 ACC Str | Austin | Texas | 78632 | Aetna |
Currently I sort the entire sheet by DOB (oldest to newest), then first name (A-Z), then last name (A-Z).
From there I apply a formula:
=IF(AND(C2=C3,B2=B3,A2=A3),"Check","")
and fill it down through all the rows. I filter on the blank cells to clear out the formulas there, then unfilter so I can jump down to each row flagged "Check" and review it together with the row below it.
I spot-check that the formula is picking only true duplicates, then filter to show all the "Check" rows and delete them all at once.
Then I move on to applying another formula:
=IF(AND(C2=C3,LEFT(B2,2)=LEFT(B3,2),LEFT(A2,2)=LEFT(A3,2)),"Check","")
This matches on the same DOB plus the first two letters of both the first and last name. The process is the same: jump down to each row flagged "Check" and review the two consecutive rows.
Again, I spot-check that they're true duplicates, then filter to show all the "Check" rows and mass delete them.
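Both passes reduce to "sort, then compare each row with the one above it". The following hedged sketch shows one way to express that logic. It is written in Python rather than VBA, so it is not the macro being asked for, and it assumes DOB is stored as MM/DD/YYYY text. The helper names are hypothetical.

Unlike the worksheet flow, which flags and deletes the first row of a matching pair, this sketch keeps the first row. It moves the second row to a `removed` list, which plays the role of the review tab.

```python
# Sample rows from the question's table:
# (first, last, dob, address, city, state, zip, insurance)
rows = [
    ("John", "Smith", "01/01/1990", "123 ABC Street", "Denver", "Colorado", "87880", "Humana Insurance"),
    ("Jon", "Smith", "01/01/1990", "123 ABC Street", "Denver", "Colorado", "87880", "Humana Insurance"),
    ("Anthony", "Bowen", "02/02/1992", "456 ABC Street", "Austin", "Texas", "78632", "Aetna"),
    ("Tony", "Bowen", "02/02/1992", "456 ACC Str", "Austin", "Texas", "78632", "Aetna"),
]


def sort_key(row):
    # Assumes DOB is MM/DD/YYYY; sort by DOB, then first name, then last name.
    month, day, year = (int(part) for part in row[2].split("/"))
    return (year, month, day, row[0].casefold(), row[1].casefold())


def exact_rule(a, b):
    # Pass 1: same DOB, last name and first name (Excel's "=" ignores case).
    return (a[2] == b[2] and a[1].casefold() == b[1].casefold()
            and a[0].casefold() == b[0].casefold())


def prefix_rule(a, b):
    # Pass 2: same DOB and same first two letters of last and first name.
    return (a[2] == b[2] and a[1][:2].casefold() == b[1][:2].casefold()
            and a[0][:2].casefold() == b[0][:2].casefold())


def split_duplicates(rows, rule):
    """Sort, then move a row to `removed` when it matches the last kept row."""
    kept, removed = [], []
    for row in sorted(rows, key=sort_key):
        if kept and rule(kept[-1], row):
            removed.append(row)  # would go to the "review" tab
        else:
            kept.append(row)
    return kept, removed


kept, removed_pass1 = split_duplicates(rows, exact_rule)
kept, removed_pass2 = split_duplicates(kept, prefix_rule)
print([r[0] for r in kept])           # ['John', 'Anthony', 'Tony']
print([r[0] for r in removed_pass2])  # ['Jon']
```

As with the second formula, "Tony" and "Anthony" stay separate, because their first two letters differ.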
I would like to do this with macro buttons: a button would move the flagged records to another worksheet tab and remove those duplicate records from the original working sheet. That way, they're removed from the original list, but the user can still go to the other tab to review the removed duplicates if they want to.
I'd appreciate any suggestions anyone can give.
Thank you!

How to arrange non-sequential IDs in order

I have three Excel sheets. All contain the same IDs, but the phone numbers may differ. The IDs are alphanumeric, there are 10,000 records in total, and the IDs are not in the same order across the sheets.
For example:
If an ID is in row 2000 in sheet 1, the same ID may be in row 3200 in sheet 2 and in row 5200 in sheet 3.
Sheet 1
ID | Contact Number |
MP-XX-098 | 89652395 |
KJ-OP-98 | 3323241 |
OP-MK-09 | 9632211 |
UI-32-09 | 3234521 |
Sheet 2
ID | Contact Number |
KJ-OP-98 | 3323241 |
MP-XX-098 | 89652395 |
UI-32-09 | 3234521 |
OP-MK-09 | 9632211 |
I need to create a single sheet that shows which system has different records, for example:
ID | Contact Number(1) | Contact Number(2) | Contact Number(3) |
MP-XX-098 | 89652395 | 89652395 | 89652395 |
KJ-OP-98 | 3323241 | 3323241 | 3323241 |
OP-MK-09 | 9632211 | 9632211 | 9632211 |
UI-32-09 | 3234521 | 3234521 | 3234521 |
Please note I already tried sort A-Z but it is not working.
VLOOKUP: use VLOOKUP to find the corresponding value in each sheet:
ID | Contact Number(1) | Contact Number(2) | Contact Number(3) |
MP-XX-098 | =vlookup(a2,sheet1!$A$2:$b$100, 2, FALSE)| =vlookup(a2, sheet2!$A$2:$b$100, 2,FALSE)| 89652395 |
VLOOKUP searches the first column of a range for a value (in this case the ID) and returns the value from the nth column of the row where that value is found.
In this case the range to search is sheet1!$A$2:$b$100, the value to find in its first column is a2, and we need the value in its 2nd column: 2.
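To make the lookup semantics concrete, here is a rough Python equivalent of VLOOKUP with FALSE as the last argument, using Sheet 1 from the question. It is a sketch, not Excel's implementation:

- `vlookup` and `excel_equal` are hypothetical names.
- None stands in for #N/A.
- Wildcard characters in the lookup value are not handled.

```python
def excel_equal(a, b):
    # Excel's exact match ignores case when both values are text.
    if isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()
    return a == b


def vlookup(value, table, col_index):
    """Rough equivalent of VLOOKUP(value, table, col_index, FALSE).

    Searches the first column of `table` top to bottom and returns the entry
    in column `col_index` (1-based, counted within the table) of the first
    match. Returns None where Excel would show #N/A. Wildcards (*, ?) that
    VLOOKUP honours are not handled here.
    """
    for row in table:
        if excel_equal(row[0], value):
            return row[col_index - 1]
    return None


# Sheet 1 from the question: (ID, Contact Number)
sheet1 = [
    ("MP-XX-098", "89652395"),
    ("KJ-OP-98", "3323241"),
    ("OP-MK-09", "9632211"),
    ("UI-32-09", "3234521"),
]

print(vlookup("OP-MK-09", sheet1, 2))  # 9632211
```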
To clarify Ken's answer a bit...
What you'll likely want to do is copy the IDs to a new sheet in, say, column A. Then in columns B, C, and D, you'll put formulas such as Ken posted.
Note that Ken's formulas have a typo -- the search value comes first and then the search range. See this page at office.com for more info. So they really should be:
=vlookup($a2, sheet1!$A$2:$b$10001, 2, FALSE)
The first parm for vlookup is the cell address of the value you want to look up. That's the one in the current sheet, over in column A. If your first one is on row 2, then you'd use $A2 in your vlookup formulas. You want the $ before the A so that it always looks in column A, but not in front of the 2 because you want it to use the value on the same row as the formula. (So you can do this in cell B2 and copy it to C2 and D2, then use Fill Down to copy the formulas to all the rows.)
The second parm for vlookup is the search range -- that will be the range containing the ID and Contact Number in each of the other sheets. (e.g., if your ID is in column A and Contact Number in column B and they start on row 2 and there are 10k records, you'd use sheet1!$A$2:$B$10001 where "sheet1" is the name of the first worksheet.)
The third parm is the column in your search range from which you want to copy your value -- in this case, it's the contact number in the second column of the search range. (Note that this is the column of the search range, not of the worksheet.)
The last parm, FALSE, just says to use an exact match, rather than find the closest.
Then, if you want to flag those rows where there is a discrepancy (so you can just scan them to find the problems), use something like this in Column E:
=IF(OR($B5 <> $C5,$B5 <> $D5), "***", "")
This will put three asterisks (***) in column E for each row where one of the contact numbers differs from one of the others.
Hope this helps!
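Columns A:E amount to "look each ID up in every sheet, then flag rows whose numbers disagree". Here is a minimal Python sketch of that, assuming each sheet is a list of (ID, Contact Number) pairs. Sheet 3 below is made up, with one number changed so the flag has something to catch. The helper names are hypothetical.

```python
def first_match_lookup(sheet):
    """Map ID -> contact number, keeping the first occurrence like VLOOKUP."""
    lookup = {}
    for sheet_id, number in sheet:
        lookup.setdefault(sheet_id, number)
    return lookup


def compare_sheets(ids, *sheets):
    """Rows of [ID, number from each sheet..., flag], like columns A:E."""
    lookups = [first_match_lookup(sheet) for sheet in sheets]
    result = []
    for sheet_id in ids:
        # None stands in for #N/A. A missing ID gets flagged here,
        # whereas Excel's OR(...) would return #N/A instead.
        numbers = [lookup.get(sheet_id) for lookup in lookups]
        # OR($B2<>$C2,$B2<>$D2): any number differing from the first one.
        flag = "***" if any(n != numbers[0] for n in numbers[1:]) else ""
        result.append([sheet_id, *numbers, flag])
    return result


sheet1 = [("MP-XX-098", "89652395"), ("KJ-OP-98", "3323241"),
          ("OP-MK-09", "9632211"), ("UI-32-09", "3234521")]
sheet2 = [("KJ-OP-98", "3323241"), ("MP-XX-098", "89652395"),
          ("UI-32-09", "3234521"), ("OP-MK-09", "9632211")]
# Hypothetical sheet 3 with one changed number so that a flag appears.
sheet3 = [("UI-32-09", "3234521"), ("OP-MK-09", "9632299"),
          ("MP-XX-098", "89652395"), ("KJ-OP-98", "3323241")]

ids = [sheet_id for sheet_id, _ in sheet1]  # column A: IDs copied from sheet 1
for row in compare_sheets(ids, sheet1, sheet2, sheet3):
    print(row)
```

A dictionary per sheet avoids rescanning 10,000 rows for every ID, which is the same job VLOOKUP does cell by cell.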

Find value using multiple criteria

I've got a table with columns, each containing customer contact information. I've also got a formula that finds a phone number using multiple criteria: customer ID, type (mobile, home etc), and primary Y/N. The problem is this information can occur several times but with a different date, in which case the newest occurrence needs to be selected. The current CSE formula is:
=INDEX($C$6:$BZ$18;10;MATCH(<client_ID>;IF(($C$8:$BZ$8=<client_ID>)*($C$17:$BZ$17="home")*($C$18:$BZ$18="Y");$C$8:$BZ$8);0))
where
$C$6:$BZ$18 contains all data
$C$8:$BZ$8 contains all client IDs
$C$17:$BZ$17 contains the types of phone numbers
$C$18:$BZ$18 contains whether this number is the primary number of that type
$C$16:$BZ$16 contains the date a number was entered
The data looks like this:
| Row | B | C | D |
| 8 | CLIENTID | Client1 | Client1 |
| 9 | other | | |
| 10 | other | | |
| 11 | other | | |
| 12 | other | | |
| 13 | other | | |
| 14 | other | | |
| 15 | PHONE NUMBER | 9876543210 | 1234567890 |
| 16 | DATE | 2015-04-15 | 2015-04-16 |
| 17 | TYPE | Home | Home |
| 18 | Primary | Y | Y |
The above formula selects phone number 9876543210 but it needs to select 1234567890 because that is the latest entry.
Any ideas on how to proceed from here?
The underlying values of dates are numbers, so we can find the furthest qualifying date to the right in a row by having the MATCH function search for an impossibly high number (1E+99) without requiring an exact match.
The array formula in F6 is:
=INDEX($B$8:$BZ$18, MATCH(F$5, $B$8:$B$18, 0), MATCH(1E+99, IF($B$8:$BZ$8=$C6, IF($B$17:$BZ$17=$D6, IF($B$18:$BZ$18=$E6, $B$16:$BZ$16)))))
Array formulas need to be finalized with Ctrl+Shift+Enter↵.
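Here is why MATCH(1E+99, ...) picks the rightmost entry. The nested IFs return a date where all three criteria hold and FALSE elsewhere. An approximate MATCH for a number larger than any date then ends on the last number in that array.

The following Python sketch applies the same idea to the question's two sample columns, with dates converted to Excel-style serial numbers. All names here are hypothetical.

```python
from datetime import date


def serial(d):
    # Excel date serial: days since 1899-12-30 (valid for dates after Feb 1900).
    return (d - date(1899, 12, 30)).days


def same(a, b):
    return a.casefold() == b.casefold()  # Excel text comparison ignores case


# Rows 8 and 15-18, columns C:D, from the question's sample.
client_ids = ["Client1", "Client1"]
phones = ["9876543210", "1234567890"]
dates = [serial(date(2015, 4, 15)), serial(date(2015, 4, 16))]
types = ["Home", "Home"]
primary = ["Y", "Y"]


def rightmost_match(client, phone_type, is_primary):
    # Nested IF(...): the date where all criteria hold, otherwise False.
    candidates = [
        d if same(c, client) and same(t, phone_type) and same(p, is_primary) else False
        for c, t, p, d in zip(client_ids, types, primary, dates)
    ]
    # MATCH(1E+99, candidates) ends on the last numeric entry, i.e. the
    # rightmost qualifying column (the latest date only if dates ascend).
    numeric = [i for i, v in enumerate(candidates) if v is not False]
    return numeric[-1] if numeric else None


print(phones[rightmost_match("Client1", "home", "Y")])  # 1234567890
```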
If your dates are not in ascending order (left-to-right), then an exact match will have to be sought. A three-criteria pseudo-MAXIF formula can find the latest date and pass it into the original formula, modified to look for an exact match. If the maximum date is duplicated, the first one is returned.
=INDEX($C$8:$BZ$18, MATCH(F$5, $B$8:$B$18, 0), MATCH(MAX(INDEX($C$16:$BZ$16*($C$8:$BZ$8=$C6)*($C$17:$BZ$17=$D6)*($C$18:$BZ$18=$E6), , )), IF($C$8:$BZ$8=$C6, IF($C$17:$BZ$17=$D6, IF($C$18:$BZ$18=$E6, $C$16:$BZ$16))), 0))
To avoid #VALUE! errors from multiplying the text labels in column B, I've shifted the calculation ranges to C:BZ. Array formulas still need to be finalized with Ctrl+Shift+Enter↵.
By appropriately locking the row, the column, or both in each cell address, we can use the column header in row 5 to pick a different category from column B, as I have done with DATA LINE. The formula can then simply be filled right.
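The pseudo-MAXIF version can be sketched the same way: take the newest date among the qualifying columns, then return the first column holding that date. The columns below are hypothetical and deliberately out of date order. The latest entry (index 1) is therefore not the rightmost one, and a rightmost-match approach would pick index 2 instead.

```python
from datetime import date


def same(a, b):
    return a.casefold() == b.casefold()  # Excel text comparison ignores case


# Hypothetical columns whose dates are NOT in ascending order.
client_ids = ["Client1", "Client1", "Client1"]
phones = ["9876543210", "1234567890", "5555555555"]
dates = [date(2015, 4, 15), date(2015, 4, 16), date(2015, 4, 10)]
types = ["Home", "Home", "Home"]
primary = ["Y", "Y", "Y"]


def latest_match(client, phone_type, is_primary):
    qualifying = [
        i for i in range(len(dates))
        if same(client_ids[i], client) and same(types[i], phone_type)
        and same(primary[i], is_primary)
    ]
    if not qualifying:
        return None  # Excel would end up with #N/A here
    # Pseudo-MAXIF: the newest date among the qualifying columns.
    newest = max(dates[i] for i in qualifying)
    # MATCH(newest, IF(...), 0): first qualifying column holding that date.
    return next(i for i in qualifying if dates[i] == newest)


print(phones[latest_match("Client1", "Home", "Y")])  # 1234567890
```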

Transpose just some columns in Excel

I have a worksheet with columns similar to the below
name | id | contact | category | week 1 | week 2 | week 3 | ... |week 52
What I need to do is transpose the 'week' columns into rows, so I end up with:
name | id | contact | category | week
With one row per week in the spreadsheet, so it becomes a long list of rows, each carrying the value for one week.
Example of the current format:
jones | 12345 | simon | electronics | 100 | 120| 130| 110 | ..........150
Required format
jones | 12345 | simon | electronics | 100
jones | 12345 | simon | electronics | 120
jones | 12345 | simon | electronics | 130
jones | 12345 | simon | electronics | 110
...
jones | 12345 | simon | electronics | 150
I have tried the usual Excel transpose (via paste) but cannot get the first few columns to stay static whilst transposing the week columns.
Ideally I would like to achieve this within Excel, but I can import the data into a MySQL database and use that if the solution would be easier that way.
Hope this makes sense
I would do the work on a second sheet, which uses the INDIRECT function to do the lookups for you:
http://www.excelfunctions.net/Excel-Indirect-Function.html
Start by setting up some indexes on the new sheet - we will use these to indirectly look up into the original sheet and pull the data across.
I would count from 1 to 52 again and again in column A: put a 1 in A2, then put this formula in A3 and fill it down:
=if(A2=52,1,A2+1)
This would be my count of the weeks per person.
In column B, I would count my people: put a 1 in B2, then put this formula in B3 and fill it down:
=if(A3=1,B2+1,B2)
This gives me the row and column offsets to use in the INDIRECT function to fetch the data from your original sheet.
Now the fun part - matching these row and column offsets to your actual data.
Let's assume your original data is in a sheet called "original". This is where we need to look up the data.
We will map the original column A into the new sheet's column C. So C2 can hold this formula:
=indirect("original!R"&($B2+1)&"C1",false)
What you are doing there is looking in the row that you calculated in the B column (formula above), and looking in the first column of that row (i.e. column A) - this is where the Name is stored.
Similarly, the "id", "contact" and "category" columns get mapped to new sheet columns D, E, F, using modifications of that formula:
=indirect("original!R"&($B2+1)&"C2",false)
=indirect("original!R"&($B2+1)&"C3",false)
=indirect("original!R"&($B2+1)&"C4",false)
Only the column offset gets changed in these updates.
To pull the weekly data across, we use a similar formula; the difference is that now we get to use the newly calculated column A, where we counted up from 1 to 52 over and over.
So G2 becomes:
=indirect("original!R"&($B2+1)&"C"&(4+$A2),false)
Copy this all down as far as you need, and hide columns A and B.
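The index arithmetic above can be sketched in Python: column A cycling 1 to 52, column B stepping to the next person, and the R1C1 offsets passed to INDIRECT. This only illustrates the mapping, assuming the header is in row 1 and the four static columns come first. The second person and all weekly figures are made up.

```python
WEEKS = 52


def unpivot(original):
    """Turn [name, id, contact, category, week1..week52] rows into one row per week.

    original[0] is the header row, mirroring row 1 of the "original" sheet.
    """
    people = len(original) - 1
    result = []
    for n in range(people * WEEKS):
        week = n % WEEKS + 1       # column A: 1..52, then starts over at 1
        person = n // WEEKS + 1    # column B: +1 every time column A restarts
        row = original[person]     # "R"&($B2+1): index 0 is the header row
        # C1..C4 are the static columns. "C"&(4+$A2) is worksheet column
        # 4 + week, which is 0-based list index 3 + week.
        result.append(row[:4] + [row[3 + week]])
    return result


header = ["name", "id", "contact", "category"] + [f"week {i}" for i in range(1, WEEKS + 1)]
# Hypothetical weekly figures, just to have 52 values per person.
jones = ["jones", "12345", "simon", "electronics"] + [100 + i for i in range(WEEKS)]
smith = ["smith", "67890", "anna", "toys"] + [200 + i for i in range(WEEKS)]

for line in unpivot([header, jones, smith])[:3]:
    print(line)
```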

Counting the number of older siblings in an Excel spreadsheet

I have a longitudinal spreadsheet of adolescent growth.
ID | CollectionDate | DOB | MOTHER ID | Sex
1 | 1Aug03 | 3Apr90 | 12 | 1
1 | 4Sept04 | 3Apr90 | 12 | 1
1 | 1Sept05 | 3Apr90 | 12 | 1
2 | 1Aug03 | 21Dec91 | 12 | 0
2 | 4Sept04 | 21Dec91 | 12 | 0
2 | 1Sept05 | 21Dec91 | 12 | 0
3 | 1Aug03 | 30Jan89 | 23 | 0
3 | 4Sept04 | 30Jan89 | 23 | 0
This is a sample of how my data is formatted and some of the variables that I have. As you can see, since it is longitudinal, each individual has multiple measurements. In the actual database there are over 10 measurements per individual and over 250 individuals.
What I am wanting to do is input a value signifying the number of older brothers and older sisters each individual has. That is why I have included the Mother ID (because it represents genetic relatedness) and sex. These new variable columns would just say how many older siblings of each sex each individual has. Is there a formula that I could use to do this quickly?
=COUNTIFS($B:$B,"<>"&$B2,$H:$H,$H2,$AI:$AI,$AI2,$J:$J,"<"&$J2)
Create a column named Distinct with this formula:
=1/COUNTIF([ID],[@ID])
Then you can count the older siblings with Sex = 0 like this:
=SUMPRODUCT(([DOB]<[@DOB])*([MOTHER ID]=[@[MOTHER ID]])*([Sex]=0)*([Distinct]))
Note that I made the data a Table and used table notation. If you're not familiar with it, [COLUMNNAME] refers to the whole column and [@COLUMNNAME] refers to the value in that column on the current row (a name containing a space needs an extra pair of brackets, as in [@[MOTHER ID]]). It's similar to saying $A:$A and A2 if you're dealing with column A.
The first formula gives you a value to count that always adds up to 1 for a particular ID. So ID=1 has three lines and Distinct will be .33333 on each line; when you add up the three lines you get 1. This is similar to SELECT DISTINCT in SQL parlance.
The SUMPRODUCT formula sums [Distinct] for every row where the DOB is earlier than the current DOB (an older sibling), the Mother is the same as the current Mother, and the Sex is zero.
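Here is a Python sketch of the Distinct-weight idea on the question's sample rows, with CollectionDate left out. Each row carries 1/(number of rows for that ID), so summing the weights over matching rows counts people rather than measurements. The names are hypothetical; `older_siblings(i, sex)` plays the role of the SUMPRODUCT for row i.

```python
from collections import Counter
from datetime import date

# The question's sample (CollectionDate omitted): (ID, DOB, MOTHER ID, Sex)
rows = [
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (3, date(1989, 1, 30), 23, 0),
    (3, date(1989, 1, 30), 23, 0),
]

# Distinct column: =1/COUNTIF([ID],[@ID]), so each person's rows sum to 1.
id_counts = Counter(r[0] for r in rows)
distinct = [1 / id_counts[r[0]] for r in rows]


def older_siblings(i, sex):
    """SUMPRODUCT-style count of older siblings of the given sex for row i."""
    _, dob, mother, _ = rows[i]
    total = sum(
        weight
        for (_, other_dob, other_mother, other_sex), weight in zip(rows, distinct)
        if other_dob < dob and other_mother == mother and other_sex == sex
    )
    return round(total)  # the 1/n weights add up to whole people


print(older_siblings(3, 1), older_siblings(3, 0))  # 1 0
```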
I have a possible solution. It involves adding two columns -- One for "# older siblings" and one for "unique?". So here are all the headings I have currently:
A -- ID
B -- CollectionDate
C -- DOB
D -- MOTHER ID
E -- Sex
F -- # older siblings
G -- unique?
In G2, I added the following formula:
=IF(A2=A1,0,1)
And dragged down. As long as the data is sorted by ID, this will only display "1" once for each unique person.
In F2, I added the following formula:
=COUNTIFS(G:G,"=1",D:D,"="&D2,C:C,"<"&C2)
And dragged down. It seemed to work correctly for the sample data you provided.
The stipulations are:
You would need the two columns.
The data would need to be sorted by ID
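The same count can also be sketched in Python using the first-row flag instead of fractional weights. As in the answer, it assumes the rows are sorted by ID and counts older siblings of both sexes; an extra test on Sex would split them. The names are hypothetical.

```python
from datetime import date

# The question's sample, sorted by ID (CollectionDate omitted):
# (ID, DOB, MOTHER ID, Sex)
rows = [
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (3, date(1989, 1, 30), 23, 0),
    (3, date(1989, 1, 30), 23, 0),
]

# Column G: =IF(A2=A1,0,1) marks only the first row of each ID.
unique = [1 if i == 0 or rows[i][0] != rows[i - 1][0] else 0 for i in range(len(rows))]


def older_siblings(i):
    """=COUNTIFS(G:G,"=1",D:D,"="&D2,C:C,"<"&C2) for row i (both sexes)."""
    _, dob, mother, _ = rows[i]
    return sum(
        1
        for (_, other_dob, other_mother, _), flag in zip(rows, unique)
        if flag == 1 and other_mother == mother and other_dob < dob
    )


print([older_siblings(i) for i in range(len(rows))])  # [0, 0, 0, 1, 1, 1, 0, 0]
```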
I hope this helps.
You need a formula like this (for example, for row 2):
=COUNTIFS($A:$A,"<>"&$A2,$E:$E,$E2,$D:$D,$D2,$C:$C,"<"&$C2)
Assuming E:E is column for sex, D:D is column for mother ID and C:C is column for DOB.
Write this formula in H2 cell for example and drag it down.
