I'm working on cleaning a ton of data that has a common pattern like this:
REG#: 15082608 Date:15-JUN-15 BACKTRACK Cleared: Date:31-AUG-15 Recvd:13-MAY-15 Agency:OAKLAND
(and about 25 other data points for each record, all following this pattern). A raw PDF file with a bunch of records is here: http://abc.ca.gov/reports/Actions2015/ActionsFinal_09-23-15.pdf
I'm not a programmer, but I have tried Refine and a bunch of Excel tests and haven't found a way to do this for a large number of records (thousands, but I will start with dozens :). So, my question is:
Could a script identify the colon ':' and then go backwards to the first space before that colon eg 'Date:15-JUN-15 BACKTRACK Cleared: Date:31-AUG-15' and enter a new line for each instance? So the resulting output would be:
Date:15-JUN-15 BACKTRACK
Cleared:
Date:31-AUG-15
The other question: I can manually copy and paste each record (all 25+ data points) into a unique cell, but ideally I would save the PDF as a spreadsheet so that it builds a row for every row it finds. That means some cells would contain multiple colons, and I would need the script to bump the other rows down accordingly.
Once I get to that place I can do a text-to-column and then build my database from there.
Select the cells containing the data and run this short macro:
Sub FixData()
    Dim r As Range, v As String, vOut As String
    Dim ary As Variant, i As Long
    For Each r In Selection
        v = r.Text
        vOut = ""
        If v <> "" Then
            ary = Split(v, " ")
            For i = LBound(ary) To UBound(ary)
                If vOut = "" Then
                    'first word: no separator in front of it
                    vOut = ary(i)
                ElseIf InStr(1, ary(i), ":") > 0 Then
                    'a word containing a colon starts a new line
                    vOut = vOut & vbCrLf & ary(i)
                Else
                    vOut = vOut & " " & ary(i)
                End If
            Next i
            r.Value = vOut
        End If
    Next r
End Sub
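If each piece should end up in its own cell rather than on its own line inside one cell, the same idea can be sketched as below. This is only a sketch: SplitAtLabels and SplitToRows are made-up names, the output goes into the column to the right of the active cell, and anything already in those cells is overwritten.

```vba
'Returns v with a line feed inserted before every word that contains a colon.
Function SplitAtLabels(ByVal v As String) As String
    Dim words() As String, i As Long, out As String
    words = Split(v, " ")
    For i = LBound(words) To UBound(words)
        If out = "" Then
            out = words(i)                      'first word, no separator
        ElseIf InStr(1, words(i), ":") > 0 Then
            out = out & vbLf & words(i)         'word with a colon starts a new piece
        Else
            out = out & " " & words(i)          'ordinary word stays with its label
        End If
    Next i
    SplitAtLabels = out
End Function

'Writes each piece of the active cell into successive rows of the next column.
'Assumption: the cells below and to the right are free to be overwritten.
Sub SplitToRows()
    Dim pieces() As String, i As Long
    pieces = Split(SplitAtLabels(CStr(ActiveCell.Value)), vbLf)
    For i = LBound(pieces) To UBound(pieces)
        ActiveCell.Offset(i, 1).Value = pieces(i)
    Next i
End Sub
```

Select one cell holding a record and run SplitToRows. From there a text-to-columns on the colon should separate each label from its value.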
I am trying to create a Module that will format an excel spreadsheet for my team at work. There is one column that will contain the word "CPT" and various CPT codes with descriptions.
I need to delete all text (the CPT description) after the 5 digit CPT code but also keep the word CPT in other cells.
For example: Column S, Row 6 contains only the word "CPT" (not in quotations)
Then Column S, Row 7 contains the text "99217 Observation Care Discharge"
This setup repeats several times throughout Column S.
I would like Row 6 to stay as it is ("CPT"), but in Row 7 I only want to keep "99217".
Unfortunately, this is not possible to do by hand as there are several people who will need this macro and our spreadsheets can have this wording repeated hundreds of times in this column with different CPT codes and descriptions.
I have tried various If/Then and If/Then/Else statements:
Sub CPTcolumn()
Dim celltxt As String
celltxt = ActiveSheet.Range("S6" & Rows.Count).End(xlUp).Text
Dim LR As Long, i As Long
LR = Range("S6" & Rows.Count).End(xlUp).Row
For i = 1 To LR
If InStr(1, celltxt, "CPT") Then
Next i
Else
With Range("S6" & i)
.Value = Left(.Value, InStr(.Value, " "))
End With
Next i
End If
End Sub
When I try to run it, I get various "Compile Errors".
I would do this differently.
Given:
The cell to be modified will be the cell under a cell that contains CPT
In the algorithm below, we look for a cell whose entire contents are CPT in all caps. This is easily modified if that is not the case.
Since you write "a five digit code", we need only extract the first five characters.
If you might have some cells that contain CPT where the cell underneath does not contain a CPT code, then we'd also have to check the contents of the cell beneath to see if it looks like a CPT code.
So we just use the Range.Find method:
Sub CPT()
Dim WS As Worksheet, R As Range, C As Range
Dim sfirstAddress As String
Set WS = Worksheets("sheet4")
With WS.Cells
Set R = .Find(what:="CPT", LookIn:=xlValues, lookat:=xlWhole, _
MatchCase:=True)
If Not R Is Nothing Then
sfirstAddress = R.Address
Set C = R.Offset(1, 0)
C.Value = Left(C.Value, 5)
Do
Set R = .FindNext(R)
If Not R.Address = sfirstAddress Then
Set C = R.Offset(1, 0)
C.Value = Left(C.Value, 5)
End If
Loop Until R.Address = sfirstAddress
End If
End With
End Sub
If this sequence is guaranteed to only be in Column S, we can change
With WS.Cells
to
With WS.Columns(19).Cells
and that might speed things up a bit.
You may also speed things up by turning off ScreenUpdating and Calculation while this runs.
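Turning off ScreenUpdating and Calculation could look something like the sketch below. RunCPTFast is a made-up wrapper name, and it assumes the CPT macro above is in the same project. The previous calculation mode is saved and restored so the user's setting is not lost, even if the macro raises an error.

```vba
Sub RunCPTFast()
    Dim prevCalc As XlCalculation
    prevCalc = Application.Calculation          'remember the current mode

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    On Error GoTo CleanUp
    CPT                                         'the macro shown above

CleanUp:
    'always restore the settings, whether or not CPT failed
    Application.Calculation = prevCalc
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then MsgBox "CPT failed: " & Err.Description
End Sub
```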
Your first error will occur here:
ActiveSheet.Range("S6" & Rows.Count).End(xlUp).Text
Because you're trying to retrieve text from the last used range starting .End(xlUp) at Range("S61048576"), which is roughly 58 times the row limit in Excel. You might change Range("S6" & Rows.Count) to Range("S" & Rows.Count)
Your second error will occur here:
LR = Range("S6" & Rows.Count).End(xlUp).Row
Which will be the same error.
The third error will occur here:
For i = 1 To LR
If InStr(1, celltxt, "CPT") Then
Next i
You cannot nest half of an If-End If block in a For-Next loop, or vice-versa and you've done both. If you want to iterate and perform an If-End If each iteration, you need to contain the If-End If within the For-Next like
For i = 1 To LR
If InStr(1, celltxt, "CPT") Then
'Is the purpose here to do nothing???
Else
With Range("S" & i)
.Value = Left(.Value, InStr(.Value, " "))
End With
End If
Next i
EDIT:
For technical accuracy, your first error would actually be your broken up For-Next and If-End If, as you wouldn't even be able to compile to execute the code to run into the other two errors.
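Putting those corrections together, a working version of the original idea might look like the sketch below. It rests on a few assumptions that are not in the question: the data starts in row 6, every cell in column S that is not exactly "CPT" and contains a space should be cut at the first space, and the name CPTcolumnFixed is made up. Note that celltxt is re-read for every row; in the original it was read once before the loop and never changed.

```vba
Sub CPTcolumnFixed()
    Dim LR As Long, i As Long
    Dim celltxt As String, pos As Long

    LR = Range("S" & Rows.Count).End(xlUp).Row
    For i = 6 To LR                             'assumed first data row
        celltxt = CStr(Range("S" & i).Value)
        pos = InStr(1, celltxt, " ")
        'leave "CPT" cells and cells without a space untouched
        If Trim(celltxt) <> "CPT" And pos > 1 Then
            With Range("S" & i)
                .NumberFormat = "@"             'store as text so leading zeros survive
                .Value = Left(celltxt, pos - 1)
            End With
        End If
    Next i
End Sub
```

If column S also holds other text with spaces (titles, notes), those cells would be truncated too, so an extra check on the cell above or on the code pattern would be needed.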
You can simply use the MID function in the worksheet.
As I understand your question, you need to separate the numbers and put them in other cells. Is that right?
To do this, enter this formula in cell R6:
=Mid(S6,1,5)
Then press Enter and drag the formula down. Every cell that contains a code followed by text will then show only its first five characters, i.e. the code.
I need to parse out a list of tracking numbers from text in excel. The position in terms of characters will not always be the same. An example:
Location ID 987
Your package is arriving 01/01/2015
Fruit Snacks 706970554628
<http://www.fedex. com/Tracking?tracknumbers=706970554628>
Olive Oil 709970554631
<http://www.fedex. com/Tracking?tracknumbers=709970554631>
Sign 706970594642
<http://www.fedex .com/Tracking?tracknumbers=706970594642>
Thank you for shopping with us!
The chunk of text is located in one cell. I would like the results to either be 3 separate columns or rows looking like this:
706970554628 , 709970554631 , 706970594642
There will not always be the same number of tracking numbers. One cell might have six while another has one.
Thank you for any help!!
I think you'll need some VBA to do this, and it's not going to be super simple. @Gary's Student has a great example of grabbing numbers from a big string. If you need something more specific to your scenario, you'll have to parse the string word by word and have the code figure out when it encounters a tracking number in the URL.
Something like the following will do the trick:
Function getTrackingNumber(bigMessage As String, numberPosition As Integer) As String
    Dim intStrPos As Long
    Dim arrTrackNumbers() As String
    Dim trackNumbersFound As Long
    Dim foundTrackNumber As Boolean
    Dim strWord As String
    'create a variable to hold characters we'll use to identify words
    Dim strWordSeparators As String
    strWordSeparators = "()=/<>?. " & vbCrLf
    'iterate through each character in the big message
    '(one extra pass past the end so the final word is processed too)
    For intStrPos = 1 To Len(bigMessage) + 1
        'Identify distinct words
        If intStrPos > Len(bigMessage) Or InStr(1, strWordSeparators, Mid(bigMessage, intStrPos, 1)) > 0 Then 'we found the end of a word
            'if foundTrackNumber is true, then this word must be a tracking number. Add it to the array of tracking numbers
            If foundTrackNumber And strWord <> "" Then
                'keep track of how many we've found
                trackNumbersFound = trackNumbersFound + 1
                'redim the array in which we are holding the track numbers
                ReDim Preserve arrTrackNumbers(0 To trackNumbersFound - 1)
                'add the tracking number
                arrTrackNumbers(trackNumbersFound - 1) = strWord
            End If
            'Check to see if the word that we just grabbed is "tracknumbers"
            foundTrackNumber = (strWord = "tracknumbers")
            'set this back to nothing
            strWord = ""
        Else
            strWord = strWord & Mid(bigMessage, intStrPos, 1)
        End If
    Next intStrPos
    'return the requested tracking number if it exists
    If numberPosition < 1 Or numberPosition > trackNumbersFound Then
        getTrackingNumber = ""
    Else
        getTrackingNumber = arrTrackNumbers(numberPosition - 1)
    End If
End Function
This is a UDF, so you can use it in your worksheet as a formula with:
=getTrackingNumber(A1, 1)
This returns the first tracking number it encounters in cell A1. Likewise, the formula
=getTrackingNumber(A1, 2)
will return the second tracking number, and so on.
This is not going to be a speedy function though since it's parsing the big string character by character and making decisions as it goes. If you can wrangle Gary's Student's answer into something workable it'll be much faster and less CPU intensive on larger data. However, if you are getting too many results and need to go at this like a surgeon, then this should get you in the ballpark.
If the tracking number is always a 12 digit number, then select the cell and run this short macro:
Sub parser117()
Dim ary As Variant, a As Variant, i As Long
With ActiveCell
ary = Split(Replace(Replace(.Text, Chr(10), " "), Chr(13), " "), " ")
i = 1
For Each a In ary
If Len(a) = 12 And IsNumeric(a) Then
.Offset(0, i).Value = a
i = i + 1
End If
Next a
End With
End Sub
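Another way to go at the 12 digit assumption is a regular expression. The sketch below is a hypothetical UDF (the name TrackingNumbers is made up) that uses late-bound VBScript.RegExp, so it assumes Windows Excel. Because each number in the sample appears twice, once in the text and once in its URL, repeats are skipped.

```vba
'Returns every distinct 12 digit number found in txt, separated by commas.
Function TrackingNumbers(ByVal txt As String) As String
    Dim re As Object, m As Object, result As String

    Set re = CreateObject("VBScript.RegExp")    'late binding; Windows only
    re.Global = True
    re.Pattern = "\b\d{12}\b"                   'exactly 12 digits as a whole word

    For Each m In re.Execute(txt)
        'skip numbers that have already been collected
        If InStr(1, "," & result & ",", "," & m.Value & ",") = 0 Then
            If result <> "" Then result = result & ","
            result = result & m.Value
        End If
    Next m

    TrackingNumbers = result
End Function
```

Used as =TrackingNumbers(A1), it puts all the numbers in one cell, which can then be split with text-to-columns on the comma.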
I am looping through a file (junk data.xlsx) to capture data for (thisworkbook). For range C1 it is a direct reference to the data in the junk data file and works just great by using CStr to increment the row number in the junk data file. But when I use a formula for the cell contents in the junk data file to strip off the left three characters (=LEFT(C1,3)) for range A1, I get a syntax error message. Is something wrong with what I am trying to do?
Dim r As Integer 'for row count in junk data file
r = 1
Workbooks("junk data.xlsx").Sheets("sheet1").Activate
'loop through junk data file until an empty row is found
Do While Cells(r, 1) <> ""
ThisWorkbook.Activate
Range("A1").Select
ActiveCell.FormulaR1C1 = "=LEFT('[junk data.xlsx]Sheet1'!R" & CStr(r) &"C1",3)"
Range("C1").Select
ActiveCell.FormulaR1C1 = "='[junk data.xlsx]Sheet1'!R" & CStr(r) & "C8"
Workbooks("junk data.xlsx").Sheets("sheet1").Activate
r = r + 1
Loop
&"C1",3)"
has one quotation mark too many. It should be:
&"C1,3)"
Perhaps I misunderstand what you are trying to do, but Left(s,3) doesn't "strip off the left 3 characters". It does the opposite. It retains those characters and removes everything else. You seem to want Mid():
Sub test()
Dim s As String
s = "Hello World"
MsgBox Left(s, 3) 'displays "Hel"
MsgBox Mid(s, 4) 'displays "lo World"
End Sub
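For reference, the formula lines from the question with the stray quotation mark removed might look like the sketch below. It assumes junk data.xlsx is already open and writes to whichever sheet of ThisWorkbook is active; the Sub name WriteLeftFormula is made up. It also writes to the ranges directly, which avoids the Select and Activate calls.

```vba
Sub WriteLeftFormula()
    Dim r As Long
    r = 1                                       'row in the junk data file

    With ThisWorkbook.ActiveSheet
        'only one closing quote, placed after the final parenthesis
        .Range("A1").FormulaR1C1 = _
            "=LEFT('[junk data.xlsx]Sheet1'!R" & CStr(r) & "C1,3)"

        'to drop the first three characters instead of keeping them,
        'a MID formula could be used in place of the LEFT formula:
        '.Range("A1").FormulaR1C1 = _
        '    "=MID('[junk data.xlsx]Sheet1'!R" & CStr(r) & "C1,4," & _
        '    "LEN('[junk data.xlsx]Sheet1'!R" & CStr(r) & "C1))"

        .Range("C1").FormulaR1C1 = _
            "='[junk data.xlsx]Sheet1'!R" & CStr(r) & "C8"
    End With
End Sub
```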
Hello, I'm trying to delete all the rows where the members value in column B is over 1000.
I tried this step by step, first getting rid of all the unnecessary data in the B cells to leave just the line with the members.
I noticed there are 5 lines and the members line is the 6th one. I searched for hours and I still don't get HOW TO DELETE THE FIRST 5 LINES. Could you please lend me a hand? I'm sure it's easy, but I can't find it.
I have this:
Option Explicit
Sub Delete5TextLines()
Dim c As Range, s
Application.ScreenUpdating = False
For Each c In Range("B1", Range("B" & Rows.Count).End(xlUp))
**********
Next c
Application.ScreenUpdating = True
End Sub
this is the .csv file:
http://we.tl/vNcyfg9Wus
Alright, this is not very elegant, but the first thing that I came up with, that kinda works.
Use this formula to delete the last word in your bulk of text ("members").
Assuming your text is in A1:
=LEFT(A1,FIND("|",SUBSTITUTE(A1," ","|",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))-1)
This formula gets you the last word of a text, in this case the number of members (because we deleted the word "members").
Assuming you put the formula above in A2:
=IF(ISERR(FIND(" ",A2)),"",RIGHT(A2,LEN(A2)-FIND("*",SUBSTITUTE(A2," ","*",LEN(A2)-LEN(SUBSTITUTE(A2," ",""))))))
Now you should have extracted the number of members. If this value is <5000 you can delete the row with a vba loop that should look like this:
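Sub deleteRowsAfterMembers()
    'MemberCol is a placeholder: set it to the column holding the extracted member count
    Const MemberCol As Long = 3
    Dim i As Long 'Long, because the row count exceeds the Integer limit
    Dim v As Variant
    With ThisWorkbook.Sheets(1)
        'start at the last used row and work upwards
        i = .Cells(.Rows.Count, MemberCol).End(xlUp).Row
        Do While i > 0
            v = .Cells(i, MemberCol).Value
            If IsNumeric(v) And Not IsEmpty(v) Then
                If CDbl(v) < 5000 Then .Rows(i).Delete
            End If
            i = i - 1
        Loop
    End With
End Sub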
That'll (hopefully) do it.
Whenever you delete entire rows using a loop, you should start at the bottom of the range and work the loop upwards.
EDIT#1:
Assuming that there are at least five lines within a cell and the lines are separated by Chr(10) then this will remove the first 5 lines:
Sub marine()
ary = Split(ActiveCell.Value, Chr(10))
t = ""
For i = 5 To UBound(ary)
t = t & Chr(10) & ary(i)
Next i
If Len(t) > 1 Then
t = Mid(t, 2)
Else
t = ""
End If
ActiveCell.Value = t
End Sub
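Combining the two points above (work from the bottom up, and split the cell on Chr(10)), deleting the rows directly might be sketched as follows. This rests on assumptions about the data that the question does not confirm: the member count is on the 6th line of each cell in column B, and that line starts with the number written without thousands separators (for example "1234 members"). DeleteBigGroups is a made-up name.

```vba
Sub DeleteBigGroups()
    Dim i As Long, lines() As String

    'loop from the last used row upwards so deletions do not shift unvisited rows
    For i = Cells(Rows.Count, "B").End(xlUp).Row To 1 Step -1
        lines = Split(CStr(Cells(i, "B").Value), Chr(10))
        'index 5 is the 6th line; cells with fewer lines are left alone
        If UBound(lines) >= 5 Then
            'Val reads the leading number and ignores the text after it
            If Val(lines(5)) > 1000 Then Rows(i).Delete
        End If
    Next i
End Sub
```

Try it on a copy of the sheet first, since deleted rows cannot be recovered once the workbook is saved.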
Please be aware that I am working with a series of ~1000 line medical information databases. Due to the size of the databases, manual manipulation of the data is too time consuming. As such, I have attempted to learn VBA and code an Excel 2010 macro using VBA to help me accomplish parsing certain data. The desired output is to split certain characters from a provided string on each line of the database as follows:
99204 - OFFICE/OUTPATIENT VISIT, NEW
will need to be split into
Active Row, Active Column = 99204
Active Row, Active Column+3 = OFFICE/OUTPATIENT VISIT, NEW
I have researched this topic using Walkenbach's "Excel 2013: Power Programming with VBA" and a fair amount of web resources, including this awesome site, but have been unable to develop a fully-workable solution using VBA in Excel. The code for my current macro is:
Sub EasySplit()
Dim text As String
Dim a As Integer
Dim name As Variant
text = ActiveCell.Value
name = Split(text, "-", 2)
For a = 0 To 1
Cells(1, a + 3).Value = Trim(name(a))
Next a
End Sub
The code uses the "-" character as a delimiter to split the input string into two substrings (I have limited the output strings to 2, as there exists in some input strings multiple "-" characters). I have trimmed the second string output to remove leading spaces.
The trouble that I am having is that the output is being presented at the top of the activesheet, instead of on the activerow.
Thank you in advance for any help. I have been working on this for 2 days and although I have made some progress, I feel that I have reached an impasse. I think that the issue is somewhere in the
Cells(1, a + 3).Value = Trim(name(a))
code, specifically with "Cells()".
Thank you Conrad Frix!
Yah.. funny enough. Just after I post I have a brainstorm.. and modify the code to read:
Sub EasySplit()
Dim text As String
Dim a As Integer
Dim name As Variant
text = ActiveCell.Value
name = Split(text, "-", 2)
For a = 0 To 1
ActiveCell.Offset(0, 3 + a).Value = Trim(name(a))
Next a
End Sub
Not quite the column1,column4 output that I want (it outputs to column3,column4), but it will work for my purpose.
Now I need to incorporate a loop so that the code runs on each successive cell in the column (downwards, step 1) skipping all bolded cells, until it hits an empty cell.
Modified answer to modified request.
This will start on row 1 and continue until a blank cell is found in column A. If you would like to start on a different row, perhaps row 2 if you have headers, change the
i = 1
line to
i = 2
I added a check on the upper bound of our variant before doing the output writes, in case the macro is run again on already formatted cells. (Does nothing instead of erroring out)
Sub EasySplit()
Dim initialText As String
Dim i As Long
Dim name As Variant
i = 1
Do While Trim(Cells(i, 1)) <> ""
If Not Cells(i, 1).Font.Bold Then
initialText = Cells(i, 1).text
name = Split(initialText, "-", 2)
If Not UBound(name) < 1 Then
Cells(i, 1) = Trim(name(0))
Cells(i, 4) = Trim(name(1))
End If
End If
i = i + 1
Loop
End Sub
just add a variable to keep track of the active row and then use that in place of the constant 1.
e.g.
Dim iRow As Long
iRow = ActiveCell.Row
For a = 0 To 1
Cells(iRow , a + 3).Value = Trim(name(a))
Next a
Alternate method utilizing TextToColumns. This code also avoids using a loop, making it more efficient and much faster. Comments have been added to assist with understanding the code.
EDIT: I have expanded the code to make it more versatile by using a temp worksheet. You can then output the two columns to wherever you'd like. As stated in your original question, the output is now to columns 1 and 4.
Sub tgr()
Const DataCol As String = "A" 'Change to the correct column letter
Const HeaderRow As Long = 1 'Change to be the correct header row
Dim rngOriginal As Range 'Use this variable to capture your original data
'Capture the original data, starting in Data column and the header row + 1
Set rngOriginal = Range(DataCol & HeaderRow + 1, Cells(Rows.Count, DataCol).End(xlUp))
If rngOriginal.Row < HeaderRow + 1 Then Exit Sub 'No data
'We will be using a temp worksheet, and to avoid a prompt when we delete the temp worksheet we turn off alerts
'We also turn off screenupdating to prevent "screen flickering"
Application.DisplayAlerts = False
Application.ScreenUpdating = False
'Move the original data to a temp worksheet to perform the split
'To avoid having leading/trailing spaces, replace all instances of " - " with simply "-"
'Lastly, move the split data to desired locations and remove the temp worksheet
With Sheets.Add.Range("A1").Resize(rngOriginal.Rows.Count)
.Value = rngOriginal.Value
.Replace " - ", "-"
.TextToColumns .Cells, xlDelimited, Other:=True, OtherChar:="-"
rngOriginal.Value = .Value
rngOriginal.Offset(, 3).Value = .Offset(, 1).Value
.Worksheet.Delete
End With
'Now that all operations have completed, turn alerts and screenupdating back on
Application.DisplayAlerts = True
Application.ScreenUpdating = True
End Sub
You can do this in a single shot without looping by using the VBA equivalent of entering this formula and then keeping the values only.
As a formula:
=IF(NOT(ISERROR(FIND("-",A1))),RIGHT(A1,LEN(A1)-FIND("-",A1)-1),A1)
As code:
Sub Quicker()
Dim rng1 As Range
Set rng1 = Range([a1], Cells(Rows.Count, "A").End(xlUp))
With rng1.Offset(0, 3)
.FormulaR1C1 = "=IF(NOT(ISERROR(FIND(""-"",RC[-3]))),RIGHT(RC[-3],LEN(RC[-3])-FIND(""-"",RC[-3])-1 ),RC[-3])"
.Value = .Value
End With
End Sub