VBA Data Parsing from txt to excel - string

Okay I'll try to explain this the best way possible
- I am parsing .txt files which get imported into excel
- The .txt files which are parsed all have static fields (sometimes)
- In the future these fields could change and may not be static therefore the below code would parse pretty sloppy stuff
Questions:
1. Assuming in the future that these string values can change (ie "CUSTOMER NAME:" changes to DISTRIBUTOR NAME:)
a. Is there are way I can manipulate the data to where I only have to establish the name of the strings and skip having to establish the location of the strings within the text based on character length? (ie without using Mid(text, act + 6, 9))
If so, how would I go about doing this? Some lines within the text file have 1, 2 or three fields (ie SOME INDICATOR:,NO STATUS:,AUTO RENEW:.. has three field values, while the first line has one, and the second two. I am guessing one way to do this would be to establish the name of the strings so that every field between said string and the next string would be the string fields value ie:
ACCOUNT OPEN:(string 1) 05/10/15(string 1 value) ACT TYPE:(string 2) PREMIUM (string value 2)
In a nutshell I am trying to find a way that I can making this parser 'intelligent' so to speak. While its no big deal using the below method for the first part of a customers record, in reality each record has additional screens (mainframe). There are also additional screens pertaining to other things such as inventory, order numbers, etc.. So you could see how this could become a large and tedious undertaking. I don't mind spending a few months to map each field, it is just more work that I think is necessary, so any input would be greatly appreciated on how I can do this.
Also my below code does not work completely. It correctly imports the first record of text file but repeats that exact record for the remainder of rows in excel ( ie account: ABCDEF12 is repeated for 100 rows in excel rather than going down the list of records. Each new record starts 20 lines down from the previous). I am assuming that I have my loop structure wrong? Any thoughts? Any thoughts on how I can make the autofit portion of the below code more efficient? Currently it adds a lot more time to how long my code takes to parse, trim, clean and autofit, and ideas on how to shorten this?
Dim myFile As String
Dim text As String
Dim textline As String
Dim cstAct as integer
Dim actOpe as integer
Dim cusNam as integer
Dim act as integer
Dim reg as integer
myFile = "put file patch to text file here"
myFile = Application.GetOpenFilename()
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
cusAct = InStr(text, "ACCOUNT ")
actOpe = InStr(text, "ACCOUNT OPEN:")
reg = InStr(text, "REGION:")
cusNam = InStr(text, "CUSTOMER NAME:")
For i = 2 To ThisWorkbook.Worksheets("b2").Range("a65536").End(xlUp).Row
ThisWorkbook.Worksheets("name").Range("a" & i).Value = Mid(text, act + 6, 9)
ThisWorkbook.Worksheets("name").Range("b" & i).Value = Mid(text, cstAct + 6, 9)
ThisWorkbook.Worksheets("name").Range("c" & i).Value = Mid(text, actOpe + 13, 27)
ThisWorkbook.Worksheets("name").Range("d" & i).Value = Mid(text, cusNam + 20, 19)
next i
'here I format and autofit the cells
For x = 2 To ThisWorkbook.Worksheets("b2").Range("a65536").End(xlUp).Row
Range("a" & x).Value = Application.WorksheetFunction.Clean(trim(Range("a" & x)))
Range("b" & x).Value = Application.WorksheetFunction.Clean(trim(Range("b" & x)))
Range("c" & x).Value = Application.WorksheetFunction.Clean(trim(Range("c" & x)))
'etc etc
next x
'Text file which is parsed and then imported into excel
ACCOUNT ABCDEF12
ACCOUNT OPEN: 05/10/15 ACT TYPE: PREMIUM
CUSTOMER NAME: JOHN B. SMITH CSA REP: 154983
CUSTOMER ADDRESS: 123 SOMEWHERE DRIVE SOMETHING HERE:
LAST ORDER: 06/24/2011 COUNTRY CODE: UNITED STATES
INVOICE #: 123456789 STATE CODE: CALIFORNIA
LAST MAINTENANCE: 01/02/15 COUNTY CODE: UNCODED
SOME INDICATOR: NO COMPLAINTS: NO IPM IND: DATAPREP/PERF4
SOME INDICATOR: NO STATUS: NONE AUTO RENEW: YES
SOMETHING HERE: NO
SOMETHING HERE: ABC IND:
SOMETHING HERE: 2 ABC ASSET NO: T
.......REMAINDER OF TXT FILE REPEATS...JUST DIFFERENT RECORDS....
ACCOUNT ZXYFDG13
ACCOUNT OPEN: 05/10/15 ACT TYPE: PREMIUM
CUSTOMER NAME: JANE B. SMITH CSA REP: 154983
CUSTOMER ADDRESS: 123 SOMEWHERE DRIVE SOMETHING HERE:
LAST ORDER: 06/24/2011 COUNTRY CODE: UNITED STATES
INVOICE #: 123456789 STATE CODE: CALIFORNIA
LAST MAINTENANCE: 01/02/15 COUNTY CODE: UNCODED
SOME INDICATOR: NO COMPLAINTS: NO IPM IND: DATAPREP/PERF4
SOME INDICATOR: NO STATUS: NONE AUTO RENEW: YES
SOMETHING HERE: NO
SOMETHING HERE: ABC IND: NO
SOMETHING HERE: 2 REGION: NE
.....Records continue Indefinitely.........

Related

VBA Excel Formatting String with multiple periods

I'm working with government harmonized codes. They're formatted as 10 numbers with periods between the 4th and 5th characters, and six-seventh characters like this "1234.56.7890". I'm trying to do some validation work so when a user enters a number without the periods, which is often the case, it puts the periods in for them.
The harmonized code is a variable in this instance named dimmed tv as a string.
Here's the code I'm using:
tv = Format(tv, "####.##.####")
Problem is, when I input 1234567890, it converts it instead to 1234567890.. with the two periods at the end. Any idea how I can get it to convert it to 1234.56.7890 as my code implies it should?
I'd do it like this:
Sub tester()
Dim e
For Each e In Array("1234.56.7899", "123456.7899", "1234.567899", _
"1234567899", "123A567899", "123456789")
Debug.Print e, ValidTv(e)
Next e
End Sub
'check format and return normalized value if possible
' return empty string if valid value can't be created
Function ValidTv(ByVal tv As String) As String
If tv Like "####.##.####" Then
ValidTv = tv
Else
tv = Replace(tv, ".", "")
If tv Like "##########" Then
ValidTv = Left(tv, 4) & "." & Mid(tv, 5, 2) & "." & Right(tv, 4)
End If
End If
End Function

Schedule - User defined text that can reference array elements

I'm in the process of writting a macro for collecting debris. At each location, a series of tasks will be performed to recover the debris. The majority of these locations will require the same activities so im assuming they are all the same and the user can modify them as required after the macro is complete. Below is the current code:
Public Function additional_Lines(ByVal target_ID)
ActiveCell.offset(1, 0).Select
Dim offset As String
offset = " "
Dim additional_Text As Variant
additional_Text = Array(offset & "Relocate to " & target_ID, offset & "Recover " & target_ID, offset & "Recover basket")
For i = 0 To UBound(additional_Text)
ActiveCell.Value = additional_Text(i)
ActiveCell.offset(1, 0).Select
Next i
End Function
This generates the following output
OSP-040
Relocate to OSP-040
Recover OSP-040
Recover basket
ZDUN-THI-004
Relocate to ZDUN-THI-004
Recover ZDUN-THI-004
Recover basket
OSP-046
Relocate to OSP-046
Recover OSP-046
Recover basket
OSP-056
Relocate to OSP-056
Where OSP-056, OSP-046, ZDUN-THI-004, OSP-004 are the debris locations.
I'm wanting to allow the user to type in the text they would like to insert between the debris locations in one of the sheets ie:
Relocate to (Debris ID Number)
Recover (Debris ID Number)
Where Debris ID Number is stored in an array. Is it possible to read the 2 string above and then replace the (Debris ID Number) with a variable so i can update it from the array of debris IDs?
There is a replace function which can search a string for a specified string, and once it finds it, can replace it. Rather than replace it with a static string, i have referenced an array of the debris IDs.
For y = 1 To UBound(additional_Lines)
new_additional_Lines(y) = Replace(additional_Lines(y, 1), ID, " " & target_ID & " ")
Next y
Thanks again for everyones help.

regex for Excel to remove all but specific symbols after a specific symbol?

I have stings like this which are addresses, e.g.:
P.O. Box 422, E-commerce park<br>Vredenberg<br><br><br>Curaçao
Adelgatan 21<br>Malmö<br><br>211 22<br>Sweden
Läntinen Pitkäkatu 35 A 15<br>Turku<br><br>20100<br>Finland
I am interested in Country only. Country always comes last after a <br> tag.
Note, that there can be several such tags preceding this last value (e.g. 1st example string).
Is there a good way to do a formula may ve along those lines:
Identify end of string
Loop a character back until one reaches ">" character
Cut everything else (including the ">" encountered)
You don't need RegEx to do this if it's always the last part of the string.
You can get it with String modifiers doing
Sub Test()
Dim str As String, str1 As String, str2 As String
Dim Countries As String
str = "P.O. Box 422, E-commerce park<br>Vredenberg<br><br><br>Curaçao"
str1 = "Adelgatan 21<br>Malmö<br><br>211 22<br>Sweden"
str2 = "La¨ntinen Pitka¨katu 35 A 15<br>Turku<br><br>20100<br>Finland"
Countries = Right(str, Len(str) - InStrRev(str, "<br>") - 3)
Countries = Countries + vbNewLine + Right(str1, Len(str1) - InStrRev(str1, "<br>") - 3)
Countries = Countries + vbNewLine + Right(str2, Len(str2) - InStrRev(str2, "<br>") - 3)
MsgBox Countries
End Sub
Obviously this will need to be updated for how your data set is stored. You can loop through the dataset and use the string modifier on each line
A formula works too. If a string in A1, write in B1:
=TRIM(RIGHT(SUBSTITUTE(A1,"<br>",REPT(" ",100)),100))
Modified using an approach taken from here:
https://exceljet.net/formula/get-last-word

How can I pick specific string fragments out of an excel cell using a custom formula written in VBA

At work I am required to reformat incorrect Addresses on a weekly basis from records in our Salesforce instance. We gather the incorrectly formatted addresses using a Report and export them to an Excel file. My job is simply to manipulate the data in the file to format them properly then reinsert them into the database.
Typically the addresses are formatted as so:
5 Sesame Street, Anytown, Anyplace
Separating these can be done easily by hand, but I typically have to work with hundreds of addresses at a time, and using default excel formulas tends to require lots of wrangling multiple cells at once to break it up into fragments.
Thus I wrote a custom formula to run through the cell and return a specific fragment of the string based on the "Comma Number" given. So if I give a Comma Number of 1, I would get "5 Sesame Street", 2 would get me "Anytown", etc.
Here is my code so far:
Public Function fragmentAddress(address As String, numberofcommas As Integer) As String
seen = 1
lastComma = -1
Dim x As Long
Dim frag As Long
For x = 0 To Len(address)
If Mid(address, x, 1) = "," & numberofcommas = seen Then
Exit For
ElseIf Mid(address, x, 1) = "," & numberofcommas <> seen Then
seen = seen + 1
lastComma = x
End If
Next
frag = Mid(address, lastComma + 1, seen - lastComma)
fragmentAddress = frag
I have not implemented the ability to handle the final value yet, but it does not give me any outputs, only outputting a "#VALUE!" error when I attempt to give it the input
=fragmentAddress("3 Ashley Close, Charlton Kings",1)
I have some experience with programming, but this is my first time writing anything in VBA.
Any help would be appreciated, thank you.
Not exactly sure what your question is, but this is simpler:
Public Function GetAddressFragment(ByVal Address As String, ByVal Index As Integer) As String
Dim addr() As String
addr = Split(Address, ",")
On Error Resume Next
GetAddressFragment = Trim(addr(Index - 1))
End Function

Excel formula to rearrange LastName, Firstname MiddleInitial to FirstName Lastname

I need some help with a formula to rearrange names. I've found lots of formulas that work in some cases, but none that can handle all of the cases I encounter.
Briefly, here is the list of names, and the desired output:
Original names Desired Output
John, James J James John
Junior, Lake Lake Junior
Mitchel, Fields M Fields Mitchel
Rothschild Jr., Michael K Michael Rotschild
Sally, Sue L Sue Sally
Rinkel, Michael Michael Rinkel
Rivel, Nicholas L Nicholas Rivel
Hurwitz Sr., Susan Susan Hurwitz
The formula I have so far is: =TRIM(PROPER(TRIM(IF(ISERROR(FIND(",",A1,1)),A1,MID(A1,FIND(",",A1,1)+1,IF(ISERROR(FIND(" ",A1,FIND(",",A1,1)+2)),LEN(A1),FIND(" ",A1,FIND(",",A1,1)+2))-FIND(",",A1,1))))&" "&LEFT(A1,IF(ISERROR(FIND(",",A1,1)),LEN(A1),FIND(",",A1,1)-1))))
Its cobbled together from some other formulas I have found, and eliminates middle initials if they are present, but not the Sr. or Jr.'s if they are present.
There are so many nuances to parsing names that just as soon as you think you have covered all possibilities, there are new ones coming up. If you keep a User Defined Formula (aka UDF), you can quickly add new coding processes to meet new problems.
Public Function fcn_First_Last_Name(sNAM As String)
Dim sTMP As String, v As Long, vDELs As Variant, vNAMs As Variant
sTMP = Application.Trim(sNAM)
vDELs = Array(" Jr.", " Sr.", " III", " II")
sTMP = Replace(sTMP, Chr(160), Chr(32))
For v = LBound(vDELs) To UBound(vDELs)
sTMP = Replace(sTMP, vDELs(v), vbNullString, compare:=vbTextCompare)
Next v
If Asc(Mid(sTMP, Len(sTMP) - 1, 1)) = 32 Then sTMP = Trim(Left(sTMP, Len(sTMP) - 1))
vNAMs = Split(sTMP, Chr(44))
If CBool(UBound(vNAMs)) Then
fcn_First_Last_Name = vNAMs(UBound(vNAMs)) & Chr(32) & vNAMs(LBound(vNAMs))
Else
fcn_First_Last_Name = vNAMs(UBound(vNAMs))
End If
End Function
That should get you started. It should also prove to be a good learning experience as you add new routines to cover new difficulties.
Once you have that in a VBA code module sheet, use it just like any other worksheet formula. Example:
        
Another possibility, but very clunky:
=MID(A2,FIND(", ",A2,1)+2,IFERROR((FIND(" ",A2,FIND(", ",A2,1)+2)-4)-FIND(", ",A2,1)+2,LEN(A2)))&" "&IFERROR(LEFT(A2,FIND(" ",LEFT(A2,FIND(", ",A2,1)-1))-1),LEFT(A2,FIND(", ",A2,1)-1))

Resources