I'm having trouble extracting this text, exactly as it appears, from a CSV. There are similar questions posted on SO but they don't match my requirements:
I want to extract "31 January 2017" from this row:
4,'31 January 2017','Funds Received/Credits',56,,401.45,
Currently, VBA considers it "31 Jan" without the year. I've tried applying .NumberFormat to the cell (general, text, date).
SOLUTION REQUIREMENTS:
No user action required -- Interact with the file only using VBA (not using File > Import > Wizard)
Compatible with VBA Excel 2003
Extract the full text regardless of Excel or operating system date settings
Thank you for your ideas
You can use the split function, using the comma as a delimiter like this:
sResult = Split("4,'31 January 2017','Funds Received/Credits',56,,401.45, ", ",")(1)
If you don't want the single quotes, add the Replace function like this:
sResult = Replace(Split("4,'31 January 2017','Funds Received/Credits',56,,401.45, ", ",")(1), "'", "")
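One caveat with Split: it cuts at every comma, including a comma that sits inside a quoted field. As a rough illustration outside VBA (Python here, purely as a sketch; the `second_field` helper and the "Smith, John" row are made up for this example), a CSV parser that is told the quote character is a single quote keeps each quoted field intact and drops the quotes:

```python
import csv

def second_field(line):
    """Return the second field of one CSV line whose text fields use single quotes."""
    # quotechar="'" tells the parser that '...' delimits a field, so the quotes
    # are stripped and commas inside them are not treated as separators.
    row = next(csv.reader([line], quotechar="'"))
    return row[1]

line = "4,'31 January 2017','Funds Received/Credits',56,,401.45,"
print(second_field(line))
print(second_field("1,'Smith, John',2"))  # hypothetical row with a comma inside the quotes
```

The date comes back as plain text, so no regional date setting gets a chance to reinterpret it.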
If you include the "Microsoft VBScript Regular Expressions 5.5" Reference, you can set up a pattern that will extract the whole date if it is found. For example:
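Dim tstring As String
Dim myregexp As RegExp
Dim StrMatch As Object
tstring = "4,'31 January 2017','Funds Received/Credits',56,,401.45," ' a line from the CSV, or the entire CSV as one string
Set myregexp = New RegExp
myregexp.Global = True      ' return every match, not just the first
myregexp.IgnoreCase = True  ' let [A-Z] also match the lowercase letters in month names
myregexp.Pattern = "\d{1,2} [A-Z]{3,9} \d{4}"
Set StrMatch = myregexp.Execute(tstring)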
The benefit of this method is that all the dates in the CSV are pulled out in one pass, which is usually faster than splitting line by line. The dates can then be accessed by using
DateStr = StrMatch.Item(index)
for the whole matched date, or capture groups can be set up to get specific parts of the string (such as day, month, year):
myregexp.Pattern = "(\d{1,2}) ([A-Z]{3,9}) (\d{4})"
Set StrMatch = myregexp.Execute(tstring)
DateStr = StrMatch.Item(index1).SubMatches(index2)
It is a very powerful tool, with a simple set of symbols for development of patterns. I highly suggest you familiarize yourself with it for manipulation of large strings.
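For comparison, here is the same idea as a small Python sketch (not VBA, and not part of the answer above). It uses the same pattern with three capture groups. The `find_dates` name and the second CSV row are assumptions added for illustration.

```python
import re

# Day, month name, four-digit year; IGNORECASE lets [A-Z] match "January".
DATE_RE = re.compile(r"(\d{1,2}) ([A-Z]{3,9}) (\d{4})", re.IGNORECASE)

def find_dates(text):
    """Return (full_match, day, month, year) for every date-like string in text."""
    return [(m.group(0),) + m.groups() for m in DATE_RE.finditer(text)]

# The first row is from the question; the second row is invented for the example.
csv_text = (
    "4,'31 January 2017','Funds Received/Credits',56,,401.45,\n"
    "5,'1 February 2017','Funds Received/Credits',57,,12.00,\n"
)
for full, day, month, year in find_dates(csv_text):
    print(full, "->", day, month, year)
```

Each tuple holds the full match followed by its sub-matches, which mirrors Item(index1) and SubMatches(index2) in the VBA version.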
I have a text field in a table where I need to substitute phone numbers where applicable.
For example the text field could have:
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
Sometimes a phone number will be in the text but not always and the phone number entered will always be different.
Is there a measure I can use to replace the phone numbers with no text?
Ideally the solution would be in Power BI, but it could also be done in the raw data using Excel or VBA.
A regular expression in VBA (Excel) or Python (Power BI) is a straightforward solution.
I have never used Power BI with Python before, but I managed to put together the following Python script.
In the Power BI transformation steps I created a new column that copies the [message] column and named it [noPhoneNumber]; the next step runs this Python script:
import re

def removePhone(x):
    # Replace any run of 10 or 11 digits with a placeholder
    return re.sub(r'\d{10,11}', "**number removed**", x)

# 'dataset' is the DataFrame that Power BI hands to the script
dataset["noPhoneNumber"] = dataset["noPhoneNumber"].apply(removePhone)
So the column "noPhoneNumber"
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
becomes
Call me on **number removed** immediately
Call me on **number removed**
I need assistance please contact me
Good service
In VBA, it is preferable to create a UDF (user-defined function) rather than a subroutine, which would be too error-prone for this kind of problem.
[Added]
If you need an Excel-based solution, you can create a UDF like so:
(remember to add a reference to "Microsoft VBScript Regular Expressions 5.5" in the VBA editor, so that VBScript_RegExp_55.RegExp is available with early binding)
Function removePhoneNumber(text As String, Optional replacement As String = "**number removed**") As String
Dim regex As New RegExp
regex.Global = True ' replace every match, not just the first
regex.Pattern = "\d{10,11}"
removePhoneNumber = regex.Replace(text, replacement)
End Function
...and then use the function in Excel like so:
=removePhoneNumber(A2)
=removePhoneNumber(A3)
and so on...
A simple VBA function alternative
Function removePhone(s As String) As String
Const DELIM As String = " "
Dim i As Long, tokens As Variant
tokens = Split(s, DELIM)
For i = LBound(tokens) To UBound(tokens)
If IsNumeric(tokens(i)) Then
tokens(i) = "*Removed*" ' << change to your needs
Exit For ' assuming a single phone number per string
End If
Next
removePhone = Join(tokens, DELIM)
End Function
You can do this in Power Query. Create a custom column with the code below. I have assumed the column name is comments; please adjust this to your column name.
if Text.Length(Text.Select([comments], {"0".."9"})) = 11
then
Text.Replace(
[comments],
Text.Select([comments], {"0".."9"}),
""
)
else [comments]
You can also replace phone numbers with other text, such as ####, to make them anonymous.
NOTE
This will only work if there is exactly one number in the string and its length is 11 (you can adjust the length in the code as required).
This will not work if there is more than one number in the string.
If there is one number in the string but its length is not 11, the whole string is kept as in the original.
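If the one-number limitation in the notes above is a problem, a regular expression avoids it. Below is a minimal Python sketch (not Power Query M). The `mask_phones` name, the "####" default and the 10-to-11-digit length are assumptions to adjust.

```python
import re

# A run of 10 or 11 digits that is not part of a longer number.
PHONE_RE = re.compile(r"(?<!\d)\d{10,11}(?!\d)")

def mask_phones(text, replacement="####"):
    """Replace every 10- or 11-digit number in text with the replacement."""
    return PHONE_RE.sub(replacement, text)

for comment in [
    "Call me on 08588812885 immediately",
    "Call 08588812885 or 07525812845 now",
    "Good service",
]:
    print(mask_phones(comment))
```

The lookarounds stop the pattern from cutting a piece out of a longer digit string, such as a 15-digit reference number.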
I'm currently trying to automate our accounting process. From the bank, I download a .csv file that I'd like to transform in a certain way. I'm also attempting to eliminate all IBAN and BIC numbers from the document as they're not necessary for the accounting process.
Now, every IBAN and BIC follows a certain pattern. How do I replace all strings with a certain pattern (i.e. XX00000000000000 and DEXXXXXXXXX) or at least how do I find them using Visual Basic? I'm familiar with the .replace method already, I just cannot manage to find the string.
Thank you so much in advance!
I think this should help you:
RegEx
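To make the regular-expression route concrete, here is a rough sketch in Python rather than Visual Basic; the patterns themselves should carry over to the VBScript RegExp object. The formats are assumptions, not taken from the question: a German IBAN is "DE" plus 20 digits, and a BIC has "DE" as its country code in positions 5 and 6. The `strip_bank_ids` name and the sample line are made up.

```python
import re

# Assumed formats (adjust to what your bank actually exports):
# German IBAN: "DE" + 20 digits, 22 characters in total.
IBAN_RE = re.compile(r"\bDE\d{20}\b")
# BIC with country code DE: 4 letters, "DE", 2 alphanumerics, optional 3-character branch code.
BIC_RE = re.compile(r"\b[A-Z]{4}DE[A-Z0-9]{2}(?:[A-Z0-9]{3})?\b")

def strip_bank_ids(line):
    """Remove IBAN- and BIC-shaped substrings from one line of text."""
    line = IBAN_RE.sub("", line)
    return BIC_RE.sub("", line)

# Hypothetical line from a bank export
print(strip_bank_ids("31.01.2017;DE89370400440532013000;COBADEFFXXX;401.45"))
```

The BIC pattern is loose: any uppercase word of 8 or 11 characters with "DE" in the fifth and sixth positions would match too, so it is worth checking the output against a real export first.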
Another way would be to load each text line of the .csv file into an array and loop through them.
Something like:
Dim Textline() As String 'array holding the lines of the .csv file (fill it before the loop)
Dim IBAN As String
Dim posIBAN As Long
Dim iban_length As Long
Dim textlinelength As Long
Dim i As Long
textlinelength = UBound(Textline)
iban_length = 22 'length of a German IBAN
For i = 0 To textlinelength
    posIBAN = InStr(Textline(i), "DE") 'find position of IBAN (0 if the line contains no "DE")
    If posIBAN > 0 Then
        IBAN = Mid(Textline(i), posIBAN, iban_length) 'get IBAN
        Textline(i) = Replace(Textline(i), IBAN, "") 'replace IBAN with ""
    End If
Next i
After that you could create a new file and write the array to it.
So you would have an IBAN-free text file.
PS: Is there a way to properly link other questions/answers?
I have a frustrating problem. I have a string containing other characters that are not in this list (check link). My string represents a SQL query.
This is an example of what my string can contain: INSERT INTO test (description) VALUES ('≤ ≥ >= <=')
When I check the database, the row is inserted successfully, but the characters "≤" and "≥" are replaced with "=" character.
In the database the string in description column looks like "= = >= <=".
For most characters I can get a character code. I googled for a character code for these two symbols, but I didn't find one. My goal is to check whether my string contains these two characters and then replace them with "<=" and ">=".
===Later Edit===
I have tried to check every character in a for loop;
tmp = Mid$(str, i, 1)
tmp has the value "=" when my for loop reaches the "≤" character, so Excel cannot read this "≤" character in a VB string; when I then check the character code, I get the code for "=" (Chr(61)).
Are you able to figure out what the character codes for both "≤" and "≥" are in your database character set? If so, then maybe try replacing both characters in your query string with ChrW(character_code).
I have just tested something along the lines of what you are trying to do using Excel as my database - and it looks to work fine.
Edit: assuming you are still stuck and looking for assistance here - could you confirm what database you are working with, and any type information setting for the "description" field you are looking to insert your string into?
Edit2: I am not familiar with SQL Server, but isn't your "description" field set up with a certain data type? If so, what is it, and does it support Unicode characters? nvarchar and nchar are examples of SQL Server data types that support Unicode.
It sounds like you may also want to try and add an "N" prefix to the value in your query string - see
Do I have use the prefix N in the "insert into" statement for unicode? &
how to insert unicode text to SQL Server from query window
Edit3: varchar won't store Unicode properly - see here: What is the difference between varchar and nvarchar? Can you switch to nvarchar? As mentioned above, you may also want to prefix the values in your query string with 'N' for full effect.
Edit4: I can't say much more about SQL Server, but what you are looking at here is how VBA displays the character, not how it actually stores it in memory, which is what matters. The VBA editor won't display "≤" properly since it cannot render Unicode characters outside the system code page. However, the string itself is stored correctly in memory.
As evidence of this, just paste the character back into another cell in Excel from VBA and you will retrieve the original character, or look at the binary representation in VBA:
Sub test()
Dim s As String
Dim B() As Byte
'8804 is the Unicode code point of the "≤" character
s = ChrW(8804)
'Assign memory representation of s to byte array B
B = s
'This loop prints "100" and "34", respectively the low and high bytes of s coding in memory
'representing binary value 0010 0010 0110 0100 ie 8804
For i = LBound(B) To UBound(B)
Debug.Print B(i)
Next i
'This prints "=" because the Immediate window cannot render character code 8804 properly
Debug.Print s
End Sub
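The byte values in that test can be cross-checked outside VBA. The Python sketch below (not part of the original answer) prints the code point and the UTF-16 little-endian bytes of "≤". It then does the replacement the question asks for; the `to_ascii_operators` helper name is made up.

```python
LESS_EQUAL = "\u2264"     # "≤", code point 8804
GREATER_EQUAL = "\u2265"  # "≥", code point 8805

def to_ascii_operators(sql):
    """Replace the single-character comparison signs with their two-character ASCII forms."""
    return sql.replace(LESS_EQUAL, "<=").replace(GREATER_EQUAL, ">=")

print(ord(LESS_EQUAL))                       # code point as a decimal number
print(list(LESS_EQUAL.encode("utf-16-le")))  # low byte first, then high byte
print(to_ascii_operators("INSERT INTO test (description) VALUES ('\u2264 \u2265 >= <=')"))
```

In VBA the same replacement can be written with Replace and ChrW(8804) / ChrW(8805), which avoids typing the characters into the editor at all.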
If I copy your text INSERT INTO test (description) VALUES ('≤ ≥ >= <=') and paste it into the VBA editor, it becomes INSERT INTO test (description) VALUES ('= = >= <=').
If I paste that text into a Excel cell or an Access table's text field, it pastes "correctly".
This seems to be a matter of character code supported, and I suggest you have a look at this SO question.
But where in your program does that string come from, since it cannot be typed in VBA?
Edit: I just gave it a try with the code below, and it works like a charm for transferring your exotic characters from the worksheet to a table!
Sub test1()
Dim db As Object, rs As Object, cn As Object
Set cn = CreateObject("DAO.DBEngine.120")
Set db = cn.OpenDatabase("P:\Database1.accdb")
Set rs = db.OpenRecordset("table1")
With rs
.AddNew
.Fields(0) = Range("d5").Value
.Update
End With
End Sub
This is really bugging me as it seems pretty illogical the way it's working.
I have a macro to format a cell as a currency using a bit of code to obtain the currency symbol.
Here is the code involved:
Dim sym As String
sym = reportConstants(ISOcode)
'Just use the ISO code if there isn't a symbol available
If sym = "" Then
sym = ISOcode
End If
With range(.Offset(0, 3), .Offset(3, 3))
.NumberFormat = sym & "#,##0;(" & sym & "#,##0)"
Debug.Print sym & "#,##0;(" & sym & "#,##0)"
End With
reportConstants is a dictionary object with currency symbols defined as strings. E.g. reportConstants("USD") = "$". This is defined earlier in the macro.
When the macro runs it gets the ISO code and should then format the cell with the corresponding currency symbol.
When I run it, in one instance the ISO code is "USD", so sym is defined as "$", but it still formats the cell with a pound sign (£). When I Debug.Print the number format string it shows $#,##0;($#,##0), so, as long as my syntax is correct, it should use a dollar sign in the cell. But it uses a £ sign instead. (I am running a UK version of Excel, so it may be defaulting to the £ sign, but why?)
Any help greatly appreciated.
I just recorded a macro to set the format to $xx.xx and it created this: [$$-409]#,##0.00. Looks like the -409 localises the currency to a particular country; it works without it - try changing yours to .NumberFormat = "[$" & sym & "]#,##0.00"
By the way, I think I only read your question properly after posting ;) Excel is heavily influenced by the regional settings of your computer for currency, language, dates... Using NumberFormat can force it to keep the sign you require. If it is a matter of rounding, you can try this: in Excel 2010, go to File - Options - Advanced, scroll down to "When calculating this workbook", tick "Set precision as displayed" and click OK.
Try this, assuming your values are numeric (integers or decimals):
Range("a2").Style = "Currency"
Or you can use Format:
Format(value, "Currency")
Format(Range("a2").Value, "Currency")
References:
http://www.mrexcel.com/forum/excel-questions/439331-displaying-currency-based-regional-settings.html
http://www.addictivetips.com/microsoft-office/excel-2010-currency-values/
(PS: I am on mobile, you may try these two links)
I am extracting a column of data from a range of filenames. All my filenames are strings in the form:
Temporary PSD Report 'Month' 2011.xls
I am using Replace to extract the month from each. At the moment I do it in two stages, which works but seems a bit clumsy. Is there a way to use some kind of AND for multiple replacements in the same string?
Dim strfilename As String
Dim mnth As String
Dim mnthshrt As String
mnth = Replace(strfilename, "Temporary PSD Report ", "")
mnthshrt = Replace(mnth, " 2011.xls", "")
I've tried using & and AND to reference both parts to be removed but it either has no effect on the original string or produces an error.
You could also split the string at each space character and take the 4th word (index starts at 0):
s = "Temporary PSD Report 'Month' 2011.xls"
mth = Split(s, " ")(3)
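To compare the options side by side, here is a small Python sketch (illustrative only). It shows two chained replacements, a single regular expression, and the split approach. "January" is an assumed example month and the function names are made up.

```python
import re

def month_by_replace(filename):
    """Strip the fixed prefix and suffix with two chained replacements."""
    return filename.replace("Temporary PSD Report ", "").replace(" 2011.xls", "")

def month_by_regex(filename):
    """Capture whatever sits between the fixed prefix and a four-digit year."""
    m = re.fullmatch(r"Temporary PSD Report (.+) \d{4}\.xls", filename)
    return m.group(1) if m else None

name = "Temporary PSD Report January 2011.xls"  # assumed example filename
print(month_by_replace(name), month_by_regex(name), name.split(" ")[3])
```

In VBA, the chained version corresponds to nesting the calls, Replace(Replace(strfilename, "Temporary PSD Report ", ""), " 2011.xls", ""), which does both replacements in one statement.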