Cleaning data using macro - excel

If possible, I need help creating an Excel macro that will clean some fields in my spreadsheets.
I receive Excel spreadsheets with varying numbers of records. In each one I need to format fields such as First Name / Last Name / Job Title / City (when I did it manually I used Excel's PROPER() function). Most importantly, I also need to replace the Industry field with the standard values from another sheet, replace State (from short values such as TX to Texas), and replace Country values such as us, usa, or united states of america with "United States" (when I did this manually I used the VLOOKUP() function).
Example:
I have spreadsheet(s), like:
Sheet 1, Data:
FName  LName  Email          Title  City     ST  Phone        Industry  Country
John   sm     j#hotmail.com  it     dallas   TX  5556663344   mobile    us
jess   lee    jess#aol.com   ba     ny       ny  6667775656   art       usa
nick   Jahn   nick#aol.com   ba     raleigh  ny  444-3338888  tech      us
Sheet2, State:
ST ST_Full
TX Texas
NY New York
NC North Carolina
etc. -> all US / Canada states list
Sheet 3, Industry:
Industry Industry_Correct
Mobile Telecom
Art Other
Tech Technology
etc. -> the list of all possible variations correct/incorrect industries
Sheet 4, Country:
Country
Angola
Canada
Russian Federation
United States
For Sheet 4, I have an alphabetical list of over 200 countries, and I need to replace values such as "us" or "Russia" with the proper name from the list.
Result sheet (what I expect to get):
FName  LName  Title  City      ST        Phone         Industry    Country        Email
John   Sm     It     Dallas    Texas     555-666-3344  Telecom     United States  j#hotmail.com
Jess   Lee    Ba     New York  New York  666-777-5656  Other       United States  jess#aol.com
Nick   Jahn   Ba     Raleigh   New York  444-333-8888  Technology  United States  nick#aol.com
I tried recording a very simple macro, but my spreadsheets vary a lot in size, from 5 to 2,000 or 3,000 records.
The recorded macro only cleaned a FIXED number of records.

It sounds like your recorded macro works properly except that it operates on a fixed number of records. You need to make the range dynamic so that the macro works out automatically how many rows of data there are. Without seeing your code, it's hard to tell you exactly how to implement this. Generally, something like this will do the trick:
Dim sht As Worksheet
Dim LastRow As Long
Set sht = ThisWorkbook.Worksheets("[sheet name]")
'You can also reference the sheet by its code name, which is better in my opinion, e.g. Set sht = Sheet1 (or whatever number your sheet is).
'This will fetch the last row for you
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
You may find this website helpful: http://www.thespreadsheetguru.com/blog/2014/7/7/5-different-ways-to-find-the-last-row-or-last-column-using-vba
To respond to your comments:
You should implement this at the very beginning of your code. What we are doing is 1) setting the sheet for the macro, 2) determining the last row of data in that sheet, and 3) using that last row to set a Range for the macro to work on, or at least going through your macro and replacing row 1000 with LastRow. So, in addition to the code I provided, you may also have something like:
Dim myRange As Range: Set myRange = sht.Range("A2:Z" & LastRow)
Then you can simply reference myRange when you need your macro to do something in that range. Hope that helps. Instead of making a new post, you should be able to edit your original post to add some of your code if necessary.
As to your IFERROR question, it's hard to say which approach is most correct without more information. Generally speaking, I think it's better to use If/Else statements, but you can certainly use WorksheetFunction.IfError(...) if you need to replicate the IFERROR functionality that is built into Excel.
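The cleaning rules themselves (PROPER-style capitalisation plus lookup-table replacement) can be sketched independently of how the last row is found. The following is a minimal illustration in plain JavaScript rather than VBA, not the asker's macro: the names properCase, lookupOr, cleanRow and COUNTRY_ALIASES are assumptions made for this example, and the alias list is hypothetical because the question's Country sheet only holds the correct names.

```javascript
// Lookup tables mirroring the State and Industry sheets from the question.
// Keys are stored upper-case so lookups ignore case, as VLOOKUP does.
const STATES = new Map([["TX", "Texas"], ["NY", "New York"], ["NC", "North Carolina"]]);
const INDUSTRIES = new Map([["MOBILE", "Telecom"], ["ART", "Other"], ["TECH", "Technology"]]);
// Hypothetical alias list: the question's Country sheet holds only correct names.
const COUNTRY_ALIASES = new Map([
  ["US", "United States"],
  ["USA", "United States"],
  ["UNITED STATES OF AMERICA", "United States"],
]);

// Rough equivalent of Excel's PROPER(): capitalise each letter that follows a non-letter.
function properCase(text) {
  return text
    .toLowerCase()
    .replace(/(^|[^a-z])([a-z])/g, (_, before, letter) => before + letter.toUpperCase());
}

// Return the mapped value, or the original text when the table has no match.
function lookupOr(table, value) {
  const hit = table.get(value.trim().toUpperCase());
  return hit === undefined ? value : hit;
}

// Clean one record. Applied row by row, it works for any number of records.
function cleanRow(row) {
  return {
    ...row,
    FName: properCase(row.FName),
    LName: properCase(row.LName),
    Title: properCase(row.Title),
    City: properCase(row.City),
    ST: lookupOr(STATES, row.ST),
    Industry: lookupOr(INDUSTRIES, row.Industry),
    Country: lookupOr(COUNTRY_ALIASES, row.Country),
  };
}

const cleaned = cleanRow({
  FName: "John", LName: "sm", Title: "it", City: "dallas",
  ST: "TX", Industry: "mobile", Country: "us",
});
console.log(cleaned);
```

Falling back to the original value when a lookup misses keeps unknown entries visible for manual review instead of blanking them.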

Related

Excel Array formula IF function order of operations/priority

Context: I'm working on creating a dynamic tool in Excel to append existing IDs from our donor database onto incoming recurring gifts that will be imported.
I'm using an Array formula with multiple criteria to try and match IDs from our database based on incoming data of dubious quality (due to custom user input).
Sample Content:
By request, here are headings with a sample row below. First from one sheet, and then from the next.
WISE_TRANSACTIONS_2
Date | Type | Transaction Amount | Description | Payment Type | Payment Name | Account Code | Email | First Name | Name.1.2 | Last Name | Address 1 | Address 2 | City | State | Zip | Country
7/31/2018 23:27 | Sponsorship | $48.00 | Sponsorship for Beneficiary XX11 | Debit / MasterCard | Peter K Tular | IND-0019793 | petetee#icloud.com | Peter | K | Tular | 123 Fake St | (empty) | Los Angeles | Ca. | 90043 | US
GIFT_ID_1
First Name | Name.1.2 | Last Name | Gift Type | Nickname | Constituent ID | Gift ID | Gift Amount | Preferred Address Line 1 | Preferred City | Preferred State | Preferred ZIP | E-Mail Number | Fund ID
Peter | (empty) | Tular | Recurring Gift | (empty) | 81435 | 9777445 | $48.00 | 123 Fake St | Los Angeles | California | 90043 | petertee#me.com | Sponsorship
Problem: My formula seems to not prioritize the first IF statement in a series of nested IF statements. This first IF is based on their email since that's likely the most unique identifier available from the import.
Sorry the copied formula text is grotesque because of my weird excel formatting. Let me know if you'd like this redone in some way from my end.
Full formula - doesn't give e-mail priority, but I think it should
{=INDEX(GIFT_ID_1[#All],
MATCH(1,
(GIFT_ID_1[[#All],[Gift Amount]]=WISE_TRANSACTIONS_2[#[Transaction Amount]])*
IF((LEFT(GIFT_ID_1[[#All],[E-Mail Number]],6)=LEFT(WISE_TRANSACTIONS_2[#Email],6)),
(LEFT(GIFT_ID_1[[#All],[E-Mail Number]],6)=LEFT(WISE_TRANSACTIONS_2[#Email],6)),
(IF((GIFT_ID_1[[#All],[Last Name]]=WISE_TRANSACTIONS_2[#[Last Name]]),
(GIFT_ID_1[[#All],[Last Name]]=WISE_TRANSACTIONS_2[#[Last Name]]),
IF((LEFT(GIFT_ID_1[[#All],[First Name]],6)=LEFT(WISE_TRANSACTIONS_2[#[Last Name]],6)),
(LEFT(GIFT_ID_1[[#All],[First Name]],6)=LEFT(WISE_TRANSACTIONS_2[#[Last Name]],6)),
IF((LEFT(GIFT_ID_1[[#All],[First Name]],6)=LEFT(WISE_TRANSACTIONS_2[#[First Name]],6)),
(LEFT(GIFT_ID_1[[#All],[First Name]],6)=LEFT(WISE_TRANSACTIONS_2[#[First Name]],6)),
(GIFT_ID_1[[#All],[Nickname]]=WISE_TRANSACTIONS_2[#[First Name]]))))*
(IF((LEFT(GIFT_ID_1[[#All],[Preferred Address Line 1]],6)=LEFT(WISE_TRANSACTIONS_2[#[Address 1]],6)),
(LEFT(GIFT_ID_1[[#All],[Preferred Address Line 1]],6)=LEFT(WISE_TRANSACTIONS_2[#[Address 1]],6)),
IF((GIFT_ID_1[[#All],[Preferred ZIP]]=WISE_TRANSACTIONS_2[#Zip]),
(GIFT_ID_1[[#All],[Preferred ZIP]]=WISE_TRANSACTIONS_2[#Zip]),
(GIFT_ID_1[[#All],[Preferred City]]=WISE_TRANSACTIONS_2[#City])))))),
0),6)}
Instead it's using something less ideal, and several steps further down the line, like first name.
If I strip out the entire IF section of the formula, the email match works perfectly.
Reduced formula:
{=INDEX(GIFT_ID_1[#All],
MATCH(1,
(GIFT_ID_1[[#All],[Gift Amount]]=WISE_TRANSACTIONS_2[#[Transaction Amount]])*
(LEFT(GIFT_ID_1[[#All],[E-Mail Number]],5)=LEFT(WISE_TRANSACTIONS_2[#Email],5)),
0),6)}
What am I missing here? This is about as deep as I've dived into Excel up to now, and my first post on here ever. Any help is appreciated!
It's not perfect yet, but I've got it working for the problem cases I was looking at.
I realized the INDEX/MATCH was making its way down the array and simply returning the first entry it came across that matched anything it was allowed to match.
Therefore, even though I was searching using
Michael Hernando, $48.00, 477 Bowter Plant Dr, Apex, NC, 35779, hernandos#gmail.com
this (with a different last name and unrelated email)
Michael Johnson, $48.00, 555 Nebraska Ave, Apex, NC, 35779, tomatoboy#aol.com
was found before this
Michael Hernando, $48.00, 911 Maui Ct, Maggie Valley, NC, 35688, hernandos#gmail.com
because I allowed a worst-case scenario where only the Transaction Amount, First Name, and Zip or City all matched.
Basically, I needed to tighten up my requirements. A few changes I made:
E-mail now compares everything up to the # symbol. Because a few rows in my array's e-mail column are empty, I had to throw in IFERROR to deal with #N/A results.
I figured a matching street address for 8 characters is a good indicator, so I put that on par with e-mail.
Alternatively, some semblance of matching First, Last and location is the backup method.
I also simplified my use of IF conditions after realizing we're just trying to get 1s and 0s.
The formula I went with was this
{=INDEX(GIFT_ID_1[#All],
MATCH(1,
(GIFT_ID_1[[#All],[Gift Amount]]=WISE_TRANSACTIONS_2[#[Transaction Amount]]) *
IF(IFERROR((LEFT(GIFT_ID_1[[#All],[E-Mail Number]],FIND("#",GIFT_ID_1[[#All],[E-Mail Number]])-1)=LEFT(WISE_TRANSACTIONS_2[#Email],FIND("#",WISE_TRANSACTIONS_2[#Email])-1)),0) +
(LEFT(GIFT_ID_1[[#All],[Preferred Address Line 1]],8)=LEFT(WISE_TRANSACTIONS_2[#[Address 1]],8)) +
(IF(IF((GIFT_ID_1[[#All],[Last Name]]=WISE_TRANSACTIONS_2[#[Last Name]]) +
(LEFT(GIFT_ID_1[[#All],[First Name]],6)=LEFT(WISE_TRANSACTIONS_2[#[Last Name]],6))
>0,1,0) *
IF(((LEFT(GIFT_ID_1[[#All],[First Name]],6)=LEFT(WISE_TRANSACTIONS_2[#[First Name]],6)) +
(GIFT_ID_1[[#All],[Nickname]]=WISE_TRANSACTIONS_2[#[First Name]]))
>0,1,0) *
IF(((LEFT(GIFT_ID_1[[#All],[Preferred Address Line 1]],6)=LEFT(WISE_TRANSACTIONS_2[#[Address 1]],6)) +
(GIFT_ID_1[[#All],[Preferred ZIP]]=WISE_TRANSACTIONS_2[#Zip]) +
(GIFT_ID_1[[#All],[Preferred City]]=WISE_TRANSACTIONS_2[#City]))
>0,1,0)
>0,1,0))
>0,1,0),
0),6)}
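The tightened rule in that final formula can be read as: the amount must match, and then either the e-mail prefix, or the first 8 characters of the address, or the backup combination of last name, first name and location must match. Below is a minimal JavaScript sketch of that logic under stated assumptions. The field names (id, first, last, nickname and so on) and the IDs "J-1" and "H-1" are hypothetical; only the names, amounts and addresses come from the example above.

```javascript
// Case-insensitive comparisons, since Excel's "=" ignores case.
const same = (a, b) => a.toLowerCase() === b.toLowerCase();
const sameLeft = (a, b, n) => same(a.slice(0, n), b.slice(0, n));

// Part of an e-mail address before the separator, or null when there is none
// (the formula's IFERROR covers the same case). The sample data on this page
// writes the separator as "#", so both "#" and "@" are accepted here.
function localPart(email) {
  const i = email.search(/[#@]/);
  return i === -1 ? null : email.slice(0, i).toLowerCase();
}

function isMatch(gift, txn) {
  if (gift.amount !== txn.amount) return false; // the amount must always match
  const giftLocal = localPart(gift.email);
  const emailOk = giftLocal !== null && giftLocal === localPart(txn.email);
  const addressOk = sameLeft(gift.address, txn.address, 8);
  // Backup rule: some form of last name AND first name AND location.
  const lastOk = same(gift.last, txn.last) || sameLeft(gift.first, txn.last, 6);
  const firstOk = sameLeft(gift.first, txn.first, 6) || same(gift.nickname, txn.first);
  const placeOk =
    sameLeft(gift.address, txn.address, 6) || gift.zip === txn.zip || same(gift.city, txn.city);
  return emailOk || addressOk || (lastOk && firstOk && placeOk);
}

// Like MATCH(..., 0): return the ID of the first row that satisfies the rule.
function findConstituentId(gifts, txn) {
  const hit = gifts.find((gift) => isMatch(gift, txn));
  return hit ? hit.id : null;
}

// Rows in the order described above; the IDs are made up for illustration.
const gifts = [
  { id: "J-1", first: "Michael", last: "Johnson", nickname: "", amount: 48,
    address: "555 Nebraska Ave", city: "Apex", zip: "35779", email: "tomatoboy#aol.com" },
  { id: "H-1", first: "Michael", last: "Hernando", nickname: "", amount: 48,
    address: "911 Maui Ct", city: "Maggie Valley", zip: "35688", email: "hernandos#gmail.com" },
];
const txn = { first: "Michael", last: "Hernando", amount: 48,
  address: "477 Bowter Plant Dr", city: "Apex", zip: "35779", email: "hernandos#gmail.com" };

console.log(findConstituentId(gifts, txn));
```

With these rows, the Johnson record no longer qualifies because neither its e-mail, its address, nor its last name matches, so the search moves on to the Hernando record.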

Need to count certain text in Excel relative to today's date with large amounts of variable data in each cell

I have exported updates from my help desk ticket tracking software into Excel. Each cell contains all updates made on a single ticket over the life of the ticket. Here is an example of a typical cell:
IM1234567;"4/20/16 15:31:01 US/Eastern (Smith John ABC DEF GHI): Some text about the status of the ticket. 04/13/16 23:53:06 (Doe Jane ABC DEF GHI): Some more text about the status."
The goal is to count each technician's name in the ticket if they have made an update to the ticket in the last month. I have all the technicians broken out into individual columns with each ticket in its own row. Here is an example:
Ticket     Jane Doe  John Smith  James Adam  Etc
IM1234567  1         0           0           0
IM1234568  0         1           0           0
IM1234569  0         0           1           0
Given how varied each ticket is, I am not sure how to go about doing this. Some of the tickets are extremely long, and because they are free-text fields, punctuation and spelling are sometimes lacking.
Thanks for the support.
EDIT: Updated to handle 1- or 2-digit month and day.
EDIT2: Updated to automatically calculate how far ahead in the ticket to look for the name.
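If you're OK with the names in the table header being in "Last First" format (for example, Smith John), this will work. As written, it looks back three months (EDATE(TODAY(),-3)); change -3 to -1 for the last month. Enter as an array formula (Ctrl+Shift+Enter):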
SUM(IF(ISNUMBER(VALUE(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),1))),IF(ISNUMBER(VALUE(MID($A2,ROW(INDIRECT("1:"&LEN($A2)))-1,1))),0,IF(LEN(SUBSTITUTE(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),8),"/",""))=6,IF(IFERROR(DATEVALUE(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),8)),IFERROR(DATEVALUE(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),7)),IFERROR(DATEVALUE(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),6)),0)))>EDATE(TODAY(),-3),IF(IFERROR(FIND(" ("&C$1,MID($A2,ROW(INDIRECT("1:"&LEN($A2))),SEARCH(")",MID($A2,ROW(INDIRECT("1:"&LEN($A2)))+8,LEN($A2))))),0)>0,1,0),0),0)),0))
The formula =(LEN(<Insert Complete string ticket address here>)-LEN(SUBSTITUTE(<Insert Complete string ticket address here>,<Insert name or address here>,"")))/LEN(<Insert name or address here>) should do the trick.
Edit: To solve the lastname, firstname order issue, flip the names when you refer to them in your formula with: =MID(B1&" "&B1,FIND(" ",B1)+1,LEN(B1)).
So, in summary, enter this in B2:
=(LEN($A2)-LEN(SUBSTITUTE($A2,MID(B$1&" "&B$1,FIND(" ",B$1)+1,LEN(B$1)),"")))/LEN(MID(B$1&" "&B$1,FIND(" ",B$1)+1,LEN(B$1))) and drag it to the other cells.
Finally, to get the value you actually want (sorry for the late edit, I didn't read your question carefully enough), try the following in cell B2:
{=SUM(IF(IFERROR(FIND(TEXT(NOW()-31;"m/yy");(FILTERXML("<t><s>" & SUBSTITUTE($A2; "."; "</s><s>") & "</s></t>"; "//s")));0)<>0;LEN((FILTERXML("<t><s>" & SUBSTITUTE($A2; "."; "</s><s>") & "</s></t>"; "//s")))-LEN(SUBSTITUTE((FILTERXML("<t><s>" & SUBSTITUTE($A2; "."; "</s><s>") & "</s></t>"; "//s"));MID(B$1&" "&B$1;FIND(" ";B$1)+1;LEN(B$1));"")))/LEN(MID(B$1&" "&B$1;FIND(" ";B$1)+1;LEN(B$1))))}
Please make sure:
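First: enter the formula as an array formula (Ctrl+Shift+Enter) so that Excel adds the curly braces;
Second: check that the "m/yy" date format matches the one used in your country or Excel version. The formula above uses ; as the argument separator; replace it with , if your regional settings require that.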
I tested the formula on a similar file of my own and it worked.
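Outside of a worksheet formula, the same idea (find each dated update, read the name in parentheses, and count it only if the date is recent enough) is easier to follow. This is a minimal JavaScript sketch, not a translation of either formula above: the function name countUpdates, the "Last First" name argument and the explicit cutoff date are assumptions for illustration, and two-digit years are assumed to mean 20yy.

```javascript
// Matches "m/d/yy hh:mm:ss", an optional time-zone label, then "(Last First ...)".
const UPDATE = /(\d{1,2})\/(\d{1,2})\/(\d{2})\s+\d{1,2}:\d{2}:\d{2}[^()]*\(([^)]*)\)/g;

// Count the updates made by one technician (given as "Last First") on or after `since`.
function countUpdates(ticket, lastFirst, since) {
  let count = 0;
  for (const [, month, day, year, who] of ticket.matchAll(UPDATE)) {
    // Assumption: two-digit years belong to the 2000s.
    const when = new Date(2000 + Number(year), Number(month) - 1, Number(day));
    if (when >= since && (who + " ").startsWith(lastFirst + " ")) count += 1;
  }
  return count;
}

// The sample cell from the question.
const ticket =
  'IM1234567;"4/20/16 15:31:01 US/Eastern (Smith John ABC DEF GHI): Some text about ' +
  'the status of the ticket. 04/13/16 23:53:06 (Doe Jane ABC DEF GHI): Some more text about the status."';

const cutoff = new Date(2016, 3, 15); // 15 April 2016 (months are zero-based)
console.log(countUpdates(ticket, "Smith John", cutoff));
console.log(countUpdates(ticket, "Doe Jane", cutoff));
```

Passing the cutoff in as a parameter keeps the function independent of today's date; for "the last month" the caller would compute it from the current date.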

VLOOKUP with conditions

I have an issue at the moment which I'm not able to resolve even with multiple combinations of If and Vlookups. I'm not doing this right.
I have a sheet with product names and an empty column for the Sl Number. The Sl Number needs to be retrieved from Sheet 2 when the name there matches the value in the cell adjacent to the formula (I know this is possible with VLOOKUP). However, I also want to display the value when the match is not exact: if the product name contains everything in the Sheet 1 name but also has additional information in brackets, the value should still be displayed.
Sheet 1
Formula in A2 - A7 = "=VLOOKUP(B2, Sheet2!B:E, 2, 0)"
Sheet 2
The complete data
Is this possible?
Thanks in advance.
Apologies, I'm new here and not sure how this works. So trying to do the right thing but may take some time.
Thanks Frank and Tim. I have another extended question to this.
Is there a way to retrieve the value by ignoring text in brackets on the lookup cell itself?
For example:
Sheet 1
Sl Number  Name
123454     Cream SPF 30+ 50g
**NA**     Bar Chocolate 70g X 6 (Sample)
234256     Hand Wash 150ml
26786      Toothpaste - Whitening 110g
Sheet 2
ID  Name                   Sl number  Manufacturer  Quantity
8   Collagen Essence 10ml  456788     AL            87
9   Hand Wash 150ml        234256     AD            23
10  Bar Chocolate 70g X 6  835424     AU            234
Row 2 on Sheet 1 has the name that includes (Sample) and the same product on sheet 2 does not contain the (Sample) for that product. Is there a way I can use lookup in the above scenario?
Thank you
Tim's comment:
=VLOOKUP(B2 & "*", Sheet2!B:E, 2, 0) as long as the "Extra" info is tagged onto the end of the name, and none of your product names is a substring of another product name. – Tim Williams
will get what you are looking for. As for getting rid of the text between "(...)", use
=IFERROR(IF(FIND("(",A2),LEFT(A2,FIND("(",A2)-1),A2),A2)
to create a new column that cuts off everything from the opening parenthesis onward. This presumes that the "(...)" is always at the end of the entry, i.e. on the far right. Note that LEFT keeps the space in front of the "(", so wrap the result in TRIM(...) before using it as an exact-match lookup value.
As you are new, I presume you might be interested in an explanation. I'll explain what Tim and I did. If I am incorrect, anyone is free to edit.
Based on your question, it would appear that you are familiar with Excel but not with the site. That said, my understanding of the key difference between your attempt and Tim's =VLOOKUP(B2 & "*", Sheet2!B:E, 2, 0) is the & "*". This appends a wildcard to the lookup value. So if you typed "Bob" but the actual entry was "Bob's Burger", the "*" would allow the trailing "'s Burger" to be matched as well. Wildcards are only honored when VLOOKUP is in exact-match mode, which is what the final argument, 0, selects.
As for my part, IFERROR is effectively a catch-all for errors in the expression it wraps: if there is an error, it returns the fallback value instead. In this case, FIND throws an error when it does not find "(" in the cell, so IFERROR displays the original cell.
As for IF(FIND("(",A2),LEFT(A2,FIND("(",A2)-1),A2): it asks Excel to look for "(" in cell A2. If it finds one, LEFT keeps the characters up to, but not including, the first "(", thus removing the "(...)".
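The "cut at the first bracket, then look up exactly" idea can also be sketched outside Excel. The following JavaScript is an illustration only: the function names stripBrackets and findSlNumber are made up for this example, and the table holds just the three Sheet 2 rows shown in the question.

```javascript
// Sheet 2 from the question, keyed by product name (lower-cased, since VLOOKUP ignores case).
const slNumbers = new Map([
  ["collagen essence 10ml", 456788],
  ["hand wash 150ml", 234256],
  ["bar chocolate 70g x 6", 835424],
]);

// Keep the text before the first "(" and trim the space left in front of it.
function stripBrackets(name) {
  const i = name.indexOf("(");
  return (i === -1 ? name : name.slice(0, i)).trim();
}

// Exact-match lookup on the cleaned name; null when the product is not listed.
function findSlNumber(name) {
  const hit = slNumbers.get(stripBrackets(name).toLowerCase());
  return hit === undefined ? null : hit;
}

console.log(findSlNumber("Bar Chocolate 70g X 6 (Sample)"));
console.log(findSlNumber("Hand Wash 150ml"));
```

The trim step matters: without it the cleaned name would end in a space and an exact comparison would fail.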

VLOOKUP MULTIPLE RANGES

Columns A and B hold an item and a post code. Column B contains post codes for two countries, USA and UK, and the same part has been dispatched to both countries. I am trying to create a VLOOKUP formula that returns the post code from the corresponding country range, but it returns #N/A. Please help me.
Country code ranges;
USA Angeles10 Angeles20 Angeles30 Angeles40 Angeles50 Angeles60 Angeles70 Angeles80 Angeles90 Angeles100 Angeles110 Angeles120 Angeles130 Angeles140 Angeles150
UK London10 London20 London30 London40 London50 London60 London70 London80 London90 London100 London110 London120 London130 London140 London150
DATA
ITEM POST CODE
4 Angeles10
4 Angeles20
110489 Angeles30
110489 Angeles40
113388 Angeles50
113388 Angeles60
113636 Angeles70
113636 Angeles80
11363613001 Angeles90
11363613001 Angeles100
11363613002 Angeles110
11363613002 Angeles120
11363613003 Angeles130
11363613003 Angeles140
1136362001 Angeles150
4 London10
4 London20
110489 London30
110489 London40
113388 London50
113388 London60
113636 London70
113636 London80
11363613001 London90
11363613001 London100
11363613002 London110
11363613002 London120
11363613003 London130
11363613003 London140
1136362001 London150
DESIRED RESULT
ITEM USA UK
4 Los Angeles10 London10
I put the first table (the country code ranges) on a sheet named datasheet, starting in A1.
Then use a formula like this in E3:
=INDEX($B:$B,AGGREGATE(15,6,ROW($B$2:$B$31)/((ISNUMBER(MATCH($B$2:$B$31,INDEX(datasheet!$1:$1048576,MATCH(E$2,datasheet!$A:$A,0),0),0)))*($A$2:$A$31=$D3)),1))
Then copy/drag over and down.
Easiest Answer
If your data isn't changing and you know exactly where Angeles stops and London starts, you can just use a standard VLOOKUP formula. Just point the UK column's formula at the bottom part of the table. The empty last argument is treated as 0, i.e. an exact match.
E3: =VLOOKUP(D3,A$3:B$6,2,)
F3: =VLOOKUP(D3,A$7:B$10,2,)
A little more complicated
If you need to be able to add rows or locations, this solution will work better. Add helper columns for each of the locations you need and a helper column which combines the item ID with the location. You can then use VLOOKUP by searching for the combination of item ID and location.
B3: =A3&CONCAT(D3:E3) (can expand past E3 for extra locations)
D3: =IF(ISERR(SEARCH(D$2,$C3)),"",D$2)
E3: =IF(ISERR(SEARCH(E$2,$C3)),"",E$2) (can drag right for each extra location)
H3: =VLOOKUP($G3&H$2,$B$3:$C$10,2,)
I3: =VLOOKUP($G3&I$2,$B$3:$C$10,2,) (can drag right for each extra location)
My favorite Answer
Just use Scott Craner's approach! ☺
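For readers who want the underlying logic rather than a specific formula, each result cell is "the first post code for this item that belongs to this country's range". A minimal JavaScript sketch of that rule follows, assuming shortened copies of the two tables from the question; the function name firstCodePerCountry is made up for this example.

```javascript
// Post codes per country, as in the "Country code ranges" table (shortened here).
const ranges = {
  USA: ["Angeles10", "Angeles20", "Angeles30", "Angeles40"],
  UK: ["London10", "London20", "London30", "London40"],
};

// [item, post code] rows from the DATA table (shortened here).
const data = [
  ["4", "Angeles10"], ["4", "Angeles20"], ["110489", "Angeles30"], ["110489", "Angeles40"],
  ["4", "London10"], ["4", "London20"], ["110489", "London30"], ["110489", "London40"],
];

// For one item, return the first post code found in each country's range.
function firstCodePerCountry(item) {
  const result = {};
  for (const [country, codes] of Object.entries(ranges)) {
    const row = data.find(([rowItem, code]) => rowItem === item && codes.includes(code));
    result[country] = row ? row[1] : null; // null when the item has no code in that range
  }
  return result;
}

console.log(firstCodePerCountry("4"));
```

Because membership in a country's range is checked explicitly, the rows do not need to be grouped by country or kept in a fixed position.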

Google Sheets - Auto Find & Replace

I have a large spreadsheet that imports basketball statistics from various websites throughout the day and then index-matches them in other tables. My problem is that some websites use a name like "Lou Williams," while others use "Louis Williams." Each time I update one of the tabs with new data, I have to manually correct all the discrepancies between names. Is there a way to write a script that auto corrects the 10-12 names that I am constantly having to fix? I'd love to find some way to do this that is easier than doing a "Find & Replace" 10 different times.
I'd also love if these "spell checks" would happen only on 3 specific tabs, but would also be ok if it happened across the entire worksheet.
I've done some research for a solution, but I'm so new to scripting that it's hard for me to translate other people's code.
Here is a (current) list of the names I frequently have to correct
INCORRECT              CORRECT
Louis Williams         Lou Williams
Kelly Oubre            Kelly Oubre Jr.
Patrick Mills          Patty Mills
James Ennis            James Ennis III
Alex Abrines           Álex Abrines
Guillermo Hernangomez  Guillermo Hernangómez
Ishmael Smith          Ish Smith
Sergio Rodriguez       Sergio Rodríguez
Larry Nance            Larry Nance Jr
Luc Mbah a Moute       Luc Richard Mbah a Moute
Juan Hernangomez       Juancho Hernangómez
Glenn Robinson         Glenn Robinson III
There are other ways to do this, but this one uses a 'Name List' sheet holding the correct and incorrect names. Run 'Fix' from the custom menu and the names in column A will be changed to the correct ones. Here is the code and a sample spreadsheet you can copy and run.
function onOpen() {
  // Add a custom "Correct Names" menu with a single "Fix" entry.
  SpreadsheetApp.getActiveSpreadsheet().addMenu('Correct Names', [
    { name: 'Fix', functionName: 'changeNames' }
  ]);
}

function changeNames() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var targets = ['RG', 'NF', 'SWISH']; // Sheets to process
  // 'Name List' layout, as the comparison below implies:
  // column A = correct name, column B = incorrect name, headers in row 1.
  var list = ss.getSheetByName('Name List');
  var names = list.getRange(2, 1, list.getLastRow() - 1, 2).getValues();
  ss.getSheets().forEach(function (sheet) { // loop through all sheets
    if (targets.indexOf(sheet.getSheetName()) === -1) return;
    var numRows = sheet.getLastRow() - 1; // data starts on row 2
    if (numRows < 1) return; // nothing to fix on this sheet
    // Assumes names are in column A, starting on row 2. Adjust as needed.
    var range = sheet.getRange(2, 1, numRows, 1);
    var values = range.getValues();
    for (var i = 0; i < values.length; i++) {
      for (var j = 0; j < names.length; j++) {
        if (values[i][0] == names[j][1]) { // incorrect name found
          values[i][0] = names[j][0]; // replace it with the correct name
        }
      }
    }
    range.setValues(values); // write the corrected names back
  });
}
https://docs.google.com/spreadsheets/d/16DPTDTCqzhYTGVmUpb1wlNFI8ZK7xoJNZvobufXapvg/edit?usp=sharing
Changed to only run certain sheets.
