How do I extract values from a label-value string in Excel VBA?

I am trying to process a large amount of data in VBA (in Excel).
I have thousands of lines of strings that look like this:
LABEL_PERCENT XXX.XX% LABEL_DATE mm/dd/yy
I have used split to process line-by-line (so I am looking at an individual string as defined above). All of the lines have that exact formatting. For each line, I'd like to extract the percentage, and date, for populating a spreadsheet. How do I process the string in VBA, such that I can extract the values into two new variables?

You are already using Split(), and the same function can extract the four values if you split on the spaces:
Dim str As String
Dim splitted As Variant
str = "LABEL_PERCENT XXX.XX% LABEL_DATE mm/dd/yy"
splitted = Split(str, " ")
Debug.Print splitted(1) 'XXX.XX%
splitted(3) will give you the date. You then might want to parse the values as a percentage and date.
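A minimal sketch of that parsing step, assuming every line has exactly the four space-separated fields shown in the question. The names ParseLabelLine and DemoParseLabelLine and the sample values are illustrative, not from the original post. The "%" is removed before calling Val, and the date is built with DateSerial so that the mm/dd/yy order does not depend on the regional settings that CDate would use.

```vba
Option Explicit

' Hypothetical helper: pulls the percentage and the date out of one line
' shaped like "LABEL_PERCENT 12.34% LABEL_DATE 03/15/21".
Public Sub ParseLabelLine(ByVal lineText As String, _
                          ByRef pct As Double, ByRef dt As Date)
    Dim parts() As String
    Dim dateParts() As String

    parts = Split(lineText, " ")

    ' Strip the "%" sign, read the number with "." as the decimal separator,
    ' and divide by 100 so the value can be formatted as a percentage cell.
    pct = Val(Replace(parts(1), "%", "")) / 100

    ' Build the date explicitly from month/day/year.
    ' A two-digit year is expanded by DateSerial according to system settings.
    dateParts = Split(parts(3), "/")
    dt = DateSerial(CInt(dateParts(2)), CInt(dateParts(0)), CInt(dateParts(1)))
End Sub

Public Sub DemoParseLabelLine()
    Dim pct As Double
    Dim dt As Date

    ParseLabelLine "LABEL_PERCENT 12.34% LABEL_DATE 03/15/21", pct, dt
    Debug.Print pct
    Debug.Print Format$(dt, "yyyy-mm-dd")
End Sub
```

If some lines might be malformed, checking UBound(parts) = 3 before reading the fields would avoid a subscript error.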

Related

Extract substrings from irregular text in Excel cell

I am trying to solve this problem -
Suppose I have text like this in a single column in Excel:
#22-atr$$1 AM**01-May-2015&&
$21-atr#10-Jan-2007*6 PM&
&&56-atr#11 PM$$8-Jan-2016*
**4 PM#68-atr#21-Mar-2022&&
and I want to write functions to have separate columns as follows
Can someone help me do that please?
I am trying to solve this problem, and the only thing I was able to arrive at is extracting the month by using =MID(A1,FIND("-",A1)+1,3)
One option for formulae would be using new functions, currently available in the BETA-channel for insiders:
Formula in B1:
=LET(A,TEXTSPLIT(A1,{"#","$","&","*","#"},,1),B,SORTBY(A,IFERROR(MATCH(RIGHT(A),{"r","M"},0),3)),C,HSTACK(TAKE(B,,2),TEXTSPLIT(TEXT(--INDEX(B,3),"YYYY-Mmm-D"),"-")),IFERROR(--C,C))
The idea is to:
Use LET() throughout to store variables;
TEXTSPLIT() the value in column A using all available delimiters into columns and skip empty values in the resulting array;
Then SORTBY() the resulting three elements by their rightmost character, using MATCH(). The IFERROR() will catch the date string;
We can then HSTACK() the 1st and 2nd columns with the result of splitting the 3rd element, after formatting it as YYYY-MMM-D first;
Finally, the resulting array is coerced to numbers with a double unary. Where that fails, IFERROR() returns the original content from the previous variable.
Notes:
I formatted column C to hold time-value in AM/PM.
I changed the text to hold dutch month-names to have Excel recognize the dates for demonstration purposes. Should work the same with English names.
For fun an UDF using regular expressions:
Public Function GetPart(inp As String, prt As Long) As Variant
    Dim Pat As String
    ' Pick the capture pattern for the requested part.
    Select Case prt
        Case 0
            Pat = "(\d+-atr)"
        Case 1
            Pat = "(\d+\s*[AP]M)"
        Case 2
            Pat = "-(\d{4})"
        Case 3
            Pat = "-(\w+)-"
        Case 4
            Pat = "(\d+)-\w+-"
        Case Else
            Pat = ""
    End Select
    With CreateObject("vbscript.regexp")
        ' Lazy .*? in front so the capture keeps all of its leading digits
        ' (a greedy .* would leave only the last digit, e.g. "2-atr").
        .Pattern = "^.*?" & Pat & ".*$"
        GetPart = .Replace(inp, "$1")
    End With
End Function
Invoke through =GetPart(A1,0). Choices are 0-4, in the order of your column headers.
You can achieve what you want by applying a few simple transformations.
Replace #, $, * and & with one common character that is guaranteed not to appear in the data sections (e.g. #)
Replace every run of two or more # characters with a single #
Trim the # from the start and end of the string
Split the string into an array using # as the split character (VBA.Split)
Use For Each to loop over the array
In the loop, have a set of three tests
Test 1 tests the string for the occurrence of "-atr"
Test 2 tests the string for the occurrence of "-XXX-", where XXX is a three-letter month. You then split the date at the - to give an array with Day/Month/Year
Test 3 tests whether the string has ' AM' or ' PM'
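The steps above can be sketched roughly as follows. This is one possible reading of them, not the answerer's own code. The procedure name ClassifyParts and the printed labels are made up for illustration, and the Like patterns assume English three-letter month names and a space before AM/PM.

```vba
Option Explicit

' Hypothetical helper: splits text such as "#22-atr$$1 AM**01-May-2015&&"
' into its parts and prints what each part appears to be.
Public Sub ClassifyParts(ByVal rawText As String)
    Dim s As String
    Dim delim As Variant
    Dim part As Variant
    Dim dmy() As String

    ' Step 1: map every delimiter to "#".
    s = rawText
    For Each delim In Array("$", "*", "&")
        s = Replace(s, delim, "#")
    Next delim

    ' Step 2: collapse runs of "#" into a single "#".
    Do While InStr(s, "##") > 0
        s = Replace(s, "##", "#")
    Loop

    ' Step 3: trim "#" from both ends.
    If Left$(s, 1) = "#" Then s = Mid$(s, 2)
    If Right$(s, 1) = "#" Then s = Left$(s, Len(s) - 1)

    ' Steps 4-6: split on "#" and run the three tests on each part.
    For Each part In Split(s, "#")
        If InStr(part, "-atr") > 0 Then
            Debug.Print "Code: "; part
        ElseIf part Like "*-[A-Za-z][A-Za-z][A-Za-z]-*" Then
            dmy = Split(part, "-")
            Debug.Print "Day: "; dmy(0); " Month: "; dmy(1); " Year: "; dmy(2)
        ElseIf part Like "* [AP]M" Then
            Debug.Print "Time: "; part
        End If
    Next part
End Sub
```

Calling ClassifyParts on each cell value and writing the parts to columns instead of the Immediate window would give the layout asked for in the question.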

How to split by multiple delimiters in vba excel

I need to split some data on multiple delimiters (//, /, -). I use one cell (A3) as the data entry cell, and I need several delimiters so that the user has several options.
I also need to know whether the split results can be rearranged, for example moving results that contain certain text (*.com or *.net) to a particular column.
I tried some code to split the data, but it only works with one delimiter.
Make them the same.
Say we have a string that we want to parse by both / and $. Here is an example:
Sub multiparse()
    Dim s As String, s2 As String
    Dim arr As Variant
    s = "poiuy/tyuiop$7654$lkiop/"
    ' Turn every "/" into "$" so a single Split handles both delimiters.
    s2 = Replace(s, "/", "$")
    arr = Split(s2, "$")
End Sub
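The same idea extends to any number of delimiters. The sketch below is a hypothetical generalisation, not part of the answer above: SplitMulti and DemoSplitMulti are illustrative names, and Chr$(1) is assumed never to occur in the data. Multi-character delimiters such as // should be listed before their shorter prefixes, otherwise // is replaced as two separate / characters and yields an empty element.

```vba
Option Explicit

' Hypothetical helper: splits text on several delimiters at once by first
' rewriting each of them to a single placeholder character.
Public Function SplitMulti(ByVal source As String, ByVal delims As Variant) As String()
    Dim placeholder As String
    Dim d As Variant

    ' Assumption: Chr$(1) never appears in the data being split.
    placeholder = Chr$(1)

    ' Replace delimiters in the order given (longest first).
    For Each d In delims
        source = Replace(source, CStr(d), placeholder)
    Next d

    SplitMulti = Split(source, placeholder)
End Function

Public Sub DemoSplitMulti()
    Dim parts() As String
    Dim i As Long

    parts = SplitMulti("www.example.com//alpha/beta-gamma", Array("//", "/", "-"))
    For i = LBound(parts) To UBound(parts)
        Debug.Print parts(i)
    Next i
End Sub
```

To route results such as *.com or *.net to a particular column, each element can be tested inside the loop with the Like operator, for example parts(i) Like "*.com".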

Handle Large Data for Conversion of Hex Data

I have a Text/CSV File of more than 10,000,000 Rows and 3 Columns.
Columns Names: ClientName, CLientMobile, ClientData
ClientData is in Hex format.
Presently I am doing:
Splitting the File in multiple parts of 900,000 rows each
Opening Each File - Say File 1
Pasting the function below as a macro (HexToText)
Public Function HexToText(Text As Range) As String
Dim i As Integer
Dim DummyStr As String
For i = 1 To Len(Text) Step 2
DummyStr = DummyStr & Chr(Val("&H" & (Mid(Text, i, 2))))
DoEvents
Next i
HexToText = DummyStr
End Function
Converting each hex value in column "ClientData" to readable text using the above function, HexToText
Saving the Sheet.
Issues Faced:
I have to split all such big files in 900,000 row limit due to Excel limitations
It takes a lot of time for calculations to run when I copy and paste the HexToText formula into all 900,000 rows to convert the hex values in "ClientData"
Solution Desired:
Is there any other software that I can use for the same purpose, to avoid splitting the files and the huge amount of time wasted on Excel calculations for the HexToText conversion?
Any simple solutions or ideas are welcome.
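One option that stays within VBA is to skip the worksheet entirely and stream the file line by line, which sidesteps the row limit and the recalculation cost. The sketch below is only an illustration under several assumptions that the question does not confirm: the file is comma-separated with no quoted commas, it uses Windows (CRLF) line endings, the first line is a header, the hex data is the third field, and the decoded text contains no commas or line breaks. HexToString and ConvertHexColumn are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: decodes a string of hex byte pairs ("48656C6C6F") to text.
Private Function HexToString(ByVal hexText As String) As String
    Dim i As Long
    Dim result As String

    ' Pre-size the buffer once instead of growing the string in the loop.
    result = Space$(Len(hexText) \ 2)
    For i = 1 To Len(result)
        Mid$(result, i, 1) = Chr$(Val("&H" & Mid$(hexText, 2 * i - 1, 2)))
    Next i
    HexToString = result
End Function

' Streams inPath to outPath, decoding the third field of every data row.
' Assumes a header line, comma-separated fields, and CRLF line endings.
Public Sub ConvertHexColumn(ByVal inPath As String, ByVal outPath As String)
    Dim inFile As Integer, outFile As Integer
    Dim lineText As String
    Dim fields() As String
    Dim isHeader As Boolean

    inFile = FreeFile
    Open inPath For Input As #inFile
    outFile = FreeFile
    Open outPath For Output As #outFile

    isHeader = True
    Do Until EOF(inFile)
        Line Input #inFile, lineText
        If Not isHeader And Len(lineText) > 0 Then
            fields = Split(lineText, ",")
            fields(2) = HexToString(fields(2))
            lineText = Join(fields, ",")
        End If
        isHeader = False
        Print #outFile, lineText
    Loop

    Close #outFile
    Close #inFile
End Sub
```

Because only one line is held in memory at a time, the file does not need to be split into 900,000-row parts first.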

Excel Substrings

I have two unordered sets of data here:
blah blah:2020:50::7.1:45
movie blah:blahbah, The:1914:54:
I want to extract all the data to the left of the year (aka, 1915 and 1914).
What excel formula would I use for this?
I tried this formula
=IF(ISNUMBER(SEARCH(":",A1)),MID(A1,SEARCH(":",A1),300),A1)
these were the results below:
: blahblah, The:1914:54::7
:1915:50::7.1:45:
This is because there is a colon in the movie title.
The results I need consistently are:
:1914:54::7.9:17::
:1915:50::7.1:45::
Can someone help with this?
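You can use regular expressions. Make sure you add a reference to "Microsoft VBScript Regular Expressions 5.5" in the VBA editor (Tools > References), because the code below declares RegExp with early binding. The following UDF will do the job.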
Function ExtractNumber(cell As Range) As String
ExtractNumber = ""
Dim rex As New RegExp
rex.Pattern = "(:\d{4}:\d{2}::\d\.\d:\d{2}::\d:\d:\d:\d:\d:\d:\d)"
rex.Global = True
Dim mtch As Object, sbmtch As Object
For Each mtch In rex.Execute(cell.Value)
ExtractNumber = ExtractNumber & mtch.SubMatches(0)
Next mtch
End Function
Without VBA:
In reality you don't want to find the colon on its own. You want to find either :1 or :2, since the year will start with either 1 or 2. This formula should do it:
=MID(A1,MIN(IFERROR(FIND(":1",A1,1),9999),IFERROR(FIND(":2",A1),9999)),9999)
Look for a four digit string, in a certain range, bounded by colons.
For example:
=MID(A1,MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")),99)
Entered as an array formula (hold down Ctrl+Shift while pressing Enter), this matches only years in the range 1900 to 2100. Change those values as appropriate for your data. The 99 at the end represents the longest possible string. Again, that can be increased as required.
You can use the same approach to return just the left hand part, up to the colon preceding the year:
=LEFT(A1,-1+MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")))
Here is a screen shot, showing the original data in B1:B2, with the results of the first part in B4:B5, and the formula for B4 showing in the formula bar.
The results for the 2nd part are in B7:B9

MATLAB - Only First Letter of String is Printing

I am having an issue printing a string in MATLAB (2012a) using the fprintf command (and sprintf).
I have an array of 12 dates (numeric). I am converting them to strings using the following command:
months = datestr(data(:,1)-365,12); %Mar13 format
I obtain the following (and desired) output when I call the months variable:
Jan12
Feb12
Mar12
Apr12
etc..
The issue is when I call the fprintf or sprintf, say with the following code:
fprintf('%s', months(1))
I will only get the first letter of the month and not the full string. Any idea how to make it print the full string?
Thanks!
The resulting data type for your months variable is an NxM character array. You need to process it as a cell array of strings instead.
dates = num2cell(data(:,1)-365)
months = cellfun(@(x) datestr(x,12),dates,'UniformOutput',false)
fprintf('%s', months{1})
should get you what you want.
Simply change your call to
fprintf('%s', months(1, :))
datestr returns the string of each of the supplied dates on a separate row.
Alternatively you could use the cellstr function to convert the result to a cell array (this would also work with non fixed-length date formats like 'dddd')
months = cellstr(months);
fprintf('%s', months{1});
