Excel VBA process csv string into array - excel

I have a CSV string (UTF-8) obtained via an HTTP download.
The number of columns can differ from one download to the next, but within any single string every row has the same number of columns and the data is contiguous (rectangular).
The string could contain any number of rows.
The first row will always be the headings.
String fields will be enclosed in double quotes and could contain commas, quotes and newlines.
Double quotes inside a string are escaped by doubling (""), and single quotes arrive doubled as well ('').
In other words, this is well-formed CSV. Excel, through its standard file-open mechanism, has no problem reading this data.
However I want to avoid saving to a file and then opening the csv as I will need to process the output in some cases, or even merge with existing data on a worksheet.
(Added the following information via edit)
The Excel application will be distributed to various destinations and I want to avoid potential permissions issues if possible; writing nothing to disk seems a good way to do that.
I am thinking something like the following pseudo:
rows = split(csvString, vbCrLf) 'won't work due to newlines inside string fields?
FOREACH rows as row
fields = split(row, ',') 'won't work due to commas in string fields?
ENDFOR
Obviously that can't handle fields containing the special tokens.
What is a solid way of parsing this data?
Thanks
EDIT 13/10/2012 Data Samples
csv as it would appear in notepad (note not all line breaks will be \r\n some could be \n)
LanguageID,AssetID,String,TypeID,Gender
3,50820,"A string of natural language",3,0
3,50819,"Complex text, with comma, "", '' and new line
all being valid",3,0
3,50818,"Some more language",3,0
The same csv in Excel 2010 - opened from shell (double click - no extra options)

If you don't mind putting the data in your workbook: You could use a blank worksheet, add the data in 1 column, then call TextToColumns. Then if you want to get the data back as an array just load it from the UsedRange of the worksheet.
'Dim myArray 'Uncomment line if storing data to array.
'Assumes csvString is already defined
'Uses a sheet named Temp as scratch space for processing
With Sheets("Temp")
    .Cells.Delete
    .Cells(1, 1) = csvString
    'Destination is qualified with a leading dot so it refers to the Temp sheet,
    'not to whichever sheet happens to be active
    .Cells(1, 1).TextToColumns Destination:=.Cells(1, 1), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
        Semicolon:=False, Comma:=True, Space:=False, Other:=False
    'myArray = .UsedRange 'Uncomment line if storing data to array
End With

I can think of three possibilities:
1. Use Regular Expressions to process the text. There are plenty of examples available on SO and via Google for separating strings like this.
2. Use the power of Excel: save the text to a temp file, open it into a temp sheet and read the data off the sheet. Delete the file and sheet when done.
3. Use ADO to query the data. Save the string to a temp file and run a query on that to return the fields you want.
To offer any more specific advice I would need samples of input data and expected output.
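Since the question asks to avoid the disk and the worksheet, a fourth option is to walk the string one character at a time and track whether the current position is inside a quoted field. The following is only a minimal sketch of that idea, not code from either answer: ParseCsv and CsvToArray are hypothetical names, and it assumes comma delimiters, double-quote qualifiers with "" as the escape, and line breaks that may be \r\n, \n or \r.

```vba
Option Explicit

' Splits CSV text into a Collection of records; each record is a
' Collection of field strings. Commas and line breaks inside quoted
' fields are kept as data, and "" inside quotes becomes one ".
Public Function ParseCsv(ByVal csvText As String) As Collection
    Dim records As New Collection
    Dim rec As New Collection
    Dim fld As String, ch As String
    Dim inQuotes As Boolean
    Dim i As Long, n As Long

    n = Len(csvText)
    i = 1
    Do While i <= n
        ch = Mid$(csvText, i, 1)
        If inQuotes Then
            If ch = """" Then
                If Mid$(csvText, i + 1, 1) = """" Then
                    fld = fld & """"      ' doubled quote -> literal quote
                    i = i + 1
                Else
                    inQuotes = False      ' closing quote
                End If
            Else
                fld = fld & ch            ' delimiter characters are data here
            End If
        Else
            Select Case ch
                Case """"
                    inQuotes = True
                Case ","
                    rec.Add fld
                    fld = vbNullString
                Case vbCr, vbLf
                    ' Treat CR LF as a single line break
                    If ch = vbCr And Mid$(csvText, i + 1, 1) = vbLf Then i = i + 1
                    rec.Add fld
                    fld = vbNullString
                    records.Add rec
                    Set rec = New Collection
                Case Else
                    fld = fld & ch
            End Select
        End If
        i = i + 1
    Loop
    ' Flush the last record when the text does not end with a line break
    If Len(fld) > 0 Or rec.Count > 0 Then
        rec.Add fld
        records.Add rec
    End If
    Set ParseCsv = records
End Function

' Copies the parsed records into a 1-based 2D Variant array.
' Assumes every record has as many fields as the first one.
Public Function CsvToArray(ByVal csvText As String) As Variant
    Dim records As Collection
    Dim result() As Variant
    Dim r As Long, c As Long

    Set records = ParseCsv(csvText)
    If records.Count = 0 Then Exit Function   ' returns Empty
    ReDim result(1 To records.Count, 1 To records(1).Count)
    For r = 1 To records.Count
        For c = 1 To records(r).Count
            result(r, c) = records(r)(c)
        Next c
    Next r
    CsvToArray = result
End Function
```

With the sample data from the question, CsvToArray(csvString) should give a 4 x 5 array whose first row holds the headings. Building fields by repeated concatenation is simple but can be slow on very large inputs, and every value comes back as a string, so any numeric conversion is left to the caller.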

Related

How do I preserve the leading zeros in a user entered field with excel vba

I am a newbie working on my first Excel VBA application. One of my user forms has a text box that the user enters data into. The data is likely to be a number that has leading zeros. I am placing the input in a string and trying to format it as text, but neither of the two things I tried works. Any help would be appreciated.
Here are the two things I tried after searching online for how to format text in VBA code:
txtString.NumberFormat = "#"
txtString.Value = Format(txtString.Value,"'0")
Thanks for any help.
More detailed question:
My application has 15 user forms and a workbook with 19 sheets in it. The first 5 sheets are Excel worksheets that are used as databases: 2 worksheets are inventory databases (they account for 2 different types of inventory), one worksheet tracks orders, one tracks test results for products in inventory, and one tracks the label information that must go on an order.
When the order is generated the user enters a package tag, which is likely to be a number with leading zeros. The entry with leading zeros is stored in the orders database correctly. A different user form generates the label information that must go on the product. To do this the application displays orders that need labels; when the user selects the order they want to generate the label for, the application searches the orders database for the info to put on the label and places it in a variable within the module associated with the generate-label user form. It gets data in this fashion from each of the other databases to have all of the label information together. It then writes these variables to the database that holds the label info. When it does this the leading zeros get stripped off.
I have done several searches to find ways to do this and have tried many of them, but cannot seem to get any to work. I was hoping to fix this with the Format method because I have to use it with other things I pull from the database, like percentages. The stripping of the leading zeros occurs when I store the value in the worksheet that has the label info. It does not matter whether I set the cell in the label worksheet from a variable or directly from the orders worksheet; the leading zeros get stripped off either way.
Thanks!
This assumes your input is a string. It converts the string to a numeric value you can work with, and builds a number format with one zero per character so the right number of leading zeros is displayed even when the length is not consistent.
Sub PrecedingZeros()
    Dim strng As String
    Dim lng As Long
    Dim fmt As String
    With Selection
        strng = .Value
        lng = Len(strng)
        If lng = 0 Then Exit Sub
        'One "0" per character, e.g. "00000" for "00123"
        fmt = String(lng, "0")
        .NumberFormat = fmt
        'CDbl rather than CSng: a Single keeps only about 7 significant digits
        .Value = CDbl(strng)
    End With
End Sub
All
Thanks for your help. I ended up prepending a "'" to the text string every time I set my internal variable and that kept the leading zeros in place. This worked so I dropped the format idea.
Thanks again!
Bruce
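For reference, the two usual ways of storing a string in a cell without Excel converting it to a number can be sketched as follows. WriteAsText is a hypothetical helper name, and the sheet name and cell address in the usage line are placeholders rather than names from the application described above.

```vba
Option Explicit

' Writes a string to a cell so that leading zeros survive.
Sub WriteAsText(ByVal target As Range, ByVal textValue As String)
    ' Option 1: give the cell the Text number format before writing.
    target.NumberFormat = "@"
    target.Value = textValue

    ' Option 2 (what the poster ended up doing): prefix an apostrophe,
    ' which marks the entry as text and is not part of the cell's value.
    ' target.Value = "'" & textValue
End Sub
```

A call such as WriteAsText Worksheets("Labels").Range("B2"), "000123" would leave 000123 in the cell as text. Either way the cell holds a string, so it has to be converted back if it is ever needed as a number.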

How to get Excel to recognise dates without a manual find and replace "/" for "/"?

How do I get excel to recognize timestamps as timestamps rather than strings?
If I do a find and replace on "/" with "/" it fixes it on most files:
Cells.Replace What:="/", Replacement:="/", LookAt:=xlPart, SearchOrder :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
I have a chunk of code that checks whether a file is still in the wrong format by converting to "Comma" format and checking if the cell contains any "/" characters, then a break line that triggers in that instance to alert me that I need to manually do the find and replace on this file. If I stop the macro when it fails and run the replace manually (Ctrl+H, Enter), then it works and I can restart the macro to finish the standardisation. I need a way of automating this.
I have >2000 .csv files of a similar but not identical format. Each one contains ~350 variables, each with its own timestamp and data column. I've written some code that formats it into a usable format. The original csv has the timestamps in "DD/MM/YYYY hh:mm:ss" format, which is also my computer's and Excel's default.
Excel seemingly randomly decides it can't recognise the timestamps in around a quarter of the files and instead interprets them as strings. This can be corrected by clicking into the cell and then clicking out of the cell, after which Excel recognises it as a timestamp. I need the timestamps recognised so that I can interpolate values onto a standard sampling frequency, as Excel can't interpolate using values it interprets as strings.
There are often well over 100k timestamps per file, so doing this manually isn't an option.
I've tried using SendKeys. The problem with that seems to be that it opens the find and replace dialogue for the VBA script editor, not for the excel sheet.
I've tried shifting focus before by calling:
Windows(windowname).Activate
ActiveWorkbook.Application.SendKeys("^h")
I've also tried:
Windows(windowname).Application.SendKeys("^h")
Which both result in the find and replace being called on the VBA script editor.
I have no shortcut to start the Macro.
I've tried Matlab, but it can't deal with the header on the file or the columns populated with text. I'd like to retain all the data.
I have used the Macro recorder to record me doing the find and replace which results in:
Sub Fixer()
'
' Fixer Macro
'
'
Selection.Replace What:="/", Replacement:="/", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
End Sub
But this doesn't work when you run it on the same file.
I expect it to convert the string "DD/MM/YYYY hh:mm:ss" format into a date-time format that I can then convert into decimal format which I can then use for interpolating the values into a usable format. Instead I get no error message, and nothing happens.
An example line of date timestamps from the raw CSV is:
"31/03/2019 14:55:57,1.0000000149,31/03/2019 14:55:57,14.6,31/03/2019 14:55:57,57.86,31/03/2019 14:55:57,0.175000000000068"
So I want the timestamp "31/03/2019 14:55:57" converted to "43555.62218750000".
I could use a script to deconstruct the string, calculate the decimal equivalent, and overwrite the cell, but this will take a prohibitively long time.
First import the date/time field into Excel as Text. In my demo case I use column A. Then run:
Sub TextToValue()
Dim Kolumn As String, N As Long, i As Long
Dim d As Double, dt As Date, tm As Date
Dim v As Variant, arr() As String, brr() As String
Kolumn = "A"
N = Cells(Rows.Count, Kolumn).End(xlUp).Row
For i = 1 To N
v = Cells(i, Kolumn).Text
If InStr(1, v, "/") > 0 Then
arr = Split(v, " ")
brr = Split(arr(0), "/")
dt = DateSerial(brr(2), brr(1), brr(0))
tm = TimeValue(arr(1))
d = CDbl(dt + tm)
Cells(i, Kolumn).Clear
Cells(i, Kolumn).Value = d
End If
Next i
End Sub
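With well over 100k timestamps per file, writing cell by cell can be slow. A variation on the same idea, sketched here under the assumption that the text timestamps sit in column A of the active sheet, reads the column into a Variant array, converts in memory and writes the result back in one assignment. DmyTextToSerial and ConvertColumnInMemory are hypothetical names.

```vba
Option Explicit

' Converts "DD/MM/YYYY hh:mm:ss" text to an Excel date serial,
' independent of the regional date order.
Function DmyTextToSerial(ByVal s As String) As Double
    Dim parts() As String, dmy() As String
    parts = Split(Trim$(s), " ")
    dmy = Split(parts(0), "/")
    DmyTextToSerial = CDbl(DateSerial(CInt(dmy(2)), CInt(dmy(1)), CInt(dmy(0))))
    ' Add the time of day when a time part is present
    If UBound(parts) >= 1 Then
        DmyTextToSerial = DmyTextToSerial + CDbl(TimeValue(parts(1)))
    End If
End Function

' Converts every text timestamp in column A of the active sheet.
Sub ConvertColumnInMemory()
    Dim rng As Range
    Dim data As Variant
    Dim i As Long

    With ActiveSheet
        Set rng = .Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
    End With
    If rng.Cells.Count = 1 Then Exit Sub   ' a single cell does not return an array

    data = rng.Value                        ' one read from the sheet
    For i = 1 To UBound(data, 1)
        ' Only touch values that are still strings and contain a slash
        If VarType(data(i, 1)) = vbString Then
            If InStr(1, data(i, 1), "/") > 0 Then
                data(i, 1) = DmyTextToSerial(data(i, 1))
            End If
        End If
    Next i
    rng.NumberFormat = "General"            ' so the numbers are not stored as text
    rng.Value = data                        ' one write back to the sheet
End Sub
```

For the sample value, DmyTextToSerial("31/03/2019 14:55:57") evaluates to 43555.6221875. A header cell that happens to contain a "/" would raise an error in this sketch, so the loop may need to start below the header row.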
You need to back up a step.
Your problem is common and is caused by OPENing a csv file whose date format (DMY in this case) differs from the Windows Regional Settings on your computer.
So even the input that appears to convert properly will not be correct, as the day and month will be swapped relative to what you expect.
Arguably the "best" fix for this issue, assuming you cannot alter the csv file, will be to IMPORT the file instead.
Depending on how you do the IMPORT, and that is Excel version dependent, you will have the opportunity, at the time of import, to define the date format of the incoming data, so it will be properly converted by Excel.
In Power Query you can specify to have PQ do the date conversion according to a specified locale. In earlier versions of Excel, a Text-to-columns wizard will open allowing you to specify DMY for the format of the csv file. I would suggest using PQ as it can easily handle the time portion, whereas with the older wizard you'll need to split the two, and then add them back together.

OpenSchema(adSchemaColumns) for table with hyphens in name

I'm reading some Excel data using ADO, and want to acquire some OpenSchema column values.
My connection string (which successfully opens the connection) is:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:[my path].xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1;";
I can happily open the AdSchemaTables recordset and get the table name:
Set tablesRs = conn.OpenSchema(AD_SCHEMA_TABLES)
Do While Not tablesRs.EOF
tbl = tablesRs.Fields("TABLE_NAME")
/../
Loop
And for a table with a name like Sheet1$, I can also happily read my column data:
Set colsRs = conn.OpenSchema(AD_SCHEMA_COLUMNS, Array(Empty, Empty, tbl))
My problem is that the name of one of the sheets contains hyphens, eg "16-11-2018" and this seems to throw a 3251 error. I've tried with and without inverted commas "'16-11-2018'" and square brackets "[16-11-2018]", but the former throws 3251 and the latter returns an empty recordset.
I know the data is good because if I copy the sheet to a different workbook with a generic sheet name, my code works fine. So I'm assuming my problem is related to that sheet name.
Is there a way of dealing with this sheet name?
Enclose it in single quotes, so you are effectively looking to use:
Array(Empty, Empty, "'16-11-2018$'")
as the second argument.
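Put together, the call might look like the sketch below. It uses late binding, so no ADO reference is needed; ListSheetColumns is a hypothetical name, the connection string is whatever already opens the workbook, and 4 is the value of adSchemaColumns in ADO's SchemaEnum.

```vba
Option Explicit

Const AD_SCHEMA_COLUMNS As Long = 4   ' ADO SchemaEnum: adSchemaColumns

' Prints the column names of one sheet to the Immediate window.
' sheetName is the plain sheet name, e.g. 16-11-2018 (no $ or quotes).
Sub ListSheetColumns(ByVal connString As String, ByVal sheetName As String)
    Dim conn As Object, colsRs As Object

    Set conn = CreateObject("ADODB.Connection")
    conn.Open connString

    ' Restrictions are TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME;
    ' the table name is the sheet name plus $, wrapped in single quotes.
    Set colsRs = conn.OpenSchema(AD_SCHEMA_COLUMNS, _
                                 Array(Empty, Empty, "'" & sheetName & "$'"))
    Do While Not colsRs.EOF
        Debug.Print colsRs.Fields("COLUMN_NAME").Value
        colsRs.MoveNext
    Loop

    colsRs.Close
    conn.Close
End Sub
```

It would be called as ListSheetColumns connString, "16-11-2018". Whether plain names such as Sheet1 also accept the quoted form has not been checked here, so it may be safest to quote only names that contain characters like hyphens.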

vba powerpoint formatting % and $ [duplicate]

I am having a challenge with how MSOffice deals with number formats.
While I believe this is similar root cause to: Stop Excel from automatically converting certain text values to dates
It is different as this is not a date format and this involves both Excel and PowerPoint with VBA.
I have data that I am pulling out of a dB into CSV files and I am doing a .Replace on certain text markers (e.g. ##ReplaceText##) in a PPT template. (There is a good post on the site on how to do this I can't seem to locate now)
There is one field I need to deal with which is tracking a metric, this field is text in my dB, but it can contain special characters - specifically $ and %.
e.g. I could see the following values in the CSV file:
"increase market share","1234","$10","28%"
I want VBA to treat this all as text, so the % and $ characters are maintained...but... Excel reads the data as a number and keeps the $ or % sign. PowerPoint removes the $ or % sign and converts 28% to 0.28 and $10 to 10.
Per the above question, adding "=""28%""" to the .csv in Excel, will give me that exact literal text in PowerPoint.
Adding a preceding space or ' character works in forcing Excel to read the data as a text string, but PowerPoint ignores it and behaves the same as above, e.g. 28% becomes 0.28.
I tried using FORMAT as below, but because the data is variable, I don't know which case to apply.
sCurrentText = Format(sCurrentText, "$#")
or
sCurrentText = Format(sCurrentText, "0.0%")
If statements don't work because the $ or % are not present in what VBA sees (e.g the $ or % character is already gone)
If sCurrentText Like "*$*" Then or If sCurrentText Like "*%" Then
So my question is how do I force VBA to take what is in the CSV file as text and ignore processing $ or % as special characters and just maintain them in the CSV?
You didn't specify what exactly you want to do with the data in the CSV file, but I've assumed you're trying to open the file in VBA.
If you are opening the CSV file using OpenText (as below) then Excel will automatically parse the data in the format it sees fit. eg:
Workbooks.OpenText fileName:="directory", DataType:=xlDelimited, Comma:=True
You can use a different method to open the CSV file if you want VBA to handle the data as just text which you can use as you see fit.
Sub OpenCSVFile(ByVal FilePath As String)
    'FilePath: full path of the CSV file to read
    Dim ff As Long, iRow As Long, iCol As Long
    Dim FileBuffer As String 'Entire CSV file as one string
    Dim LineSeparatedFile() As String 'Array of data separated into lines
    Dim LineData() As String 'Array of comma separated values for that line
    ff = FreeFile
    Open FilePath For Binary Access Read As #ff
    FileBuffer = Space$(LOF(ff))
    Get #ff, , FileBuffer
    Close #ff
    LineSeparatedFile = Split(FileBuffer, vbCrLf)
    For iRow = 0 To UBound(LineSeparatedFile)
        LineData = Split(LineSeparatedFile(iRow), ",")
        For iCol = 0 To UBound(LineData)
            'Code to do something with each entry.
            'Eg. print to cell as text ("@" is the Text number format)
            'Note: a plain Split leaves the surrounding double quotes in each value
            ThisWorkbook.Sheets(1).Cells(iRow + 1, iCol + 1).NumberFormat = "@"
            ThisWorkbook.Sheets(1).Cells(iRow + 1, iCol + 1).Value = LineData(iCol)
        Next iCol
    Next iRow
End Sub

CSV files: excel hiding zeros

If I load a csv file into Excel, the value 123.320000 becomes 123.32.
I need to view all contents as they are. Is there any way to stop Excel from hiding trailing zeros?
Reading other posts, I found that doing something like "=""123.3200000""" could work, but that would mean running a regex on the file every time I want to view it, since it comes in xxxx|###|xxx format and I have no control over the generation part.
How exactly are you loading the CSV file?
If you import it as "Text" format then Excel will retain all formatting, including leading/trailing zeros.
In Excel 2010 you import from the "Data" tab and choose "From Text", find your CSV file then when prompted choose to format the data as "Text"
I'm assuming that once the imported values are in the sheet, you want to treat them as numbers and not as text, i.e. you want to be able to sum, multiply, etc. Loading the values as text will prevent you from doing this -- until you convert the values back to numbers, in which case you will lose the trailing zeros, which brings you back to your initial conundrum.
Keeping in mind that there is no difference between the values 123.32 and 123.3200000, what you want is just to change the display format such that the full precision of your value is shown explicitly. You could do this in VBA like so:
Dim strMyValue As String, strFormat As String
Dim i As Long
strMyValue = "123.3200000"
'Leading "0" so values below 1 still show a zero before the decimal point
strFormat = "0."
' Append a 0 to the format string for each figure after the decimal point.
For i = 1 To Len(strMyValue) - InStr(strMyValue, ".")
    strFormat = strFormat & "0"
Next i
With Range("A1")
    .Value = CDbl(strMyValue)
    .NumberFormat = strFormat
    'Value now shown with same precision as in strMyValue.
End With
