Importing CSV US formatted numbers in Excel with localisation? - excel

I have a .csv file with the following values:
1488201602.653, 8.304700E-04, 3.079498E-03
1488201603.107, 8.677357E-04, 2.856719E-03
1488201821.012, 7.071995E-04, 4.147542E-03
As the snippet shows, the numbers come in two formats: the first column holds a plain decimal number with a period . as the decimal separator. The second and third columns hold numbers in scientific notation, with a capital E and again a period as the decimal separator; none of the values has a thousands separator.
When I try to import this in a Danish localized version of Excel 2016, what I get is something like this:
So, I'm apparently getting a ton of thousand separators as periods . in the first column, however, if I select the first number, the formula field shows this:
... meaning, the number that was originally 1488201602.653 in the .csv file, now became interpreted as the integer 1488201602653, which is completely wrong.
For the second and third columns, if I select a number, then the formula field shows:
... meaning, the number that was originally 8.304700E-04 in the .csv file, then became 8,30E+02 in the cell, shown as 830,47 in the formula field, which is - again - completely wrong.
How can I persuade Excel to import the data in the .csv file, which is in the US or C locale, with its proper numeric values, so that they are shown properly under Danish localisation (that is, 1488201602,653 and 8,304700e-04)?

Well, I found a manual way to handle this issue, but it would still be nice to know if there is an automatic one.
First, get and install Notepad++ if you don't already have it.
Then, note that:
Under US (or "C" language) localization, there is no thousands separator (i.e. it is an empty string, "") - under Danish localization, the thousands separator is period "."
Under US (or "C" language) localization, the decimal separator is a period "." - under Danish localization, the decimal separator is comma ","
The Danish localization demands that the E-notation exponent is written as a lowercase letter e, not as a capital letter E
Then, open your .csv file in Notepad++, and possibly save it as a copy under a different filename. Then, do the following replacements in this order:
Search for comma , -> replace with semicolon ; (replace all)
Search for period . -> replace with comma , (replace all)
Search for capital E -> replace with lowercase e (replace all)
Then save the file, and import it in Excel. When importing in Excel, remember to specify the semicolon ; as a CSV field separator - and the numbers (at least as per the OP example) should be read-in and interpreted correctly.
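The three replacements above can also be sketched in VBA, for anyone who would rather not edit the file by hand. This is a minimal sketch, not a tested importer: the function name UsLineToDanish is made up for illustration, and like the manual steps it replaces characters blindly, so it is only safe for purely numeric files such as the one in the question.

```vba
' Hypothetical helper (name is an assumption, not from the answer above).
' Converts one line of a US/C-locale CSV into the form described in the steps.
' The order matters: commas must become semicolons BEFORE periods become commas.
Function UsLineToDanish(ByVal usLine As String) As String
    Dim s As String
    s = Replace(usLine, ",", ";")   ' field separator: comma -> semicolon
    s = Replace(s, ".", ",")        ' decimal separator: period -> comma
    s = Replace(s, "E", "e")        ' exponent marker: E -> e
    UsLineToDanish = s
End Function
```

Calling it on the first line of the example, UsLineToDanish("1488201602.653, 8.304700E-04, 3.079498E-03"), should give "1488201602,653; 8,304700e-04; 3,079498e-03".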

I would try something like this with VBA (not tested):
Sub ImportCSVFile()
    Dim xFileName As Variant
    xFileName = Application.GetOpenFilename("CSV File (*.csv), *.csv", , "Choose CSV", , False)
    If xFileName = False Then Exit Sub
    Dim wS As Worksheet
    Set wS = ThisWorkbook.Sheets.Add
    Dim rG As Range
    Set rG = wS.Range("A1")
    Dim QT As QueryTable
    With wS
        Set QT = .QueryTables.Add("TEXT;" & xFileName, rG)
        With QT
            '''Preserve initial format
            .PreserveFormatting = True
            '''Select the delimiter
            .TextFileParseType = xlDelimited
            .TextFileCommaDelimiter = True
            '''Choose refresh options
            .RefreshStyle = xlInsertDeleteCells
            .RefreshOnFileOpen = False
            .RefreshPeriod = 0
            .SaveData = True
            '''Import the data
            .Refresh BackgroundQuery:=False
        End With 'QT
        '''Force the formatting
        Call .Columns("1:3").Replace(".", ",")
    End With 'wS
End Sub
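A possible variation on the macro above is to tell the query table how the file itself writes numbers, instead of replacing characters after the import. The sketch below uses the QueryTable properties TextFileDecimalSeparator and TextFileThousandsSeparator; the path is a placeholder and the macro has not been run against the asker's file, so treat it as a starting point only.

```vba
Sub ImportUsCsvWithSeparators()
    ' Placeholder path (an assumption, not taken from the question)
    Const csvPath As String = "C:\Temp\data.csv"
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets.Add
    With ws.QueryTables.Add(Connection:="TEXT;" & csvPath, Destination:=ws.Range("A1"))
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True
        ' Describe the number format used IN THE FILE (US/C locale),
        ' independent of the separators Excel uses for display
        .TextFileDecimalSeparator = "."
        .TextFileThousandsSeparator = ","
        .Refresh BackgroundQuery:=False
    End With
End Sub
```

With these two properties set, the cells should hold real numbers, which Excel then displays with the Danish separators.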

Related

Location causes incorrect placement of separators in Excel during file import

I import large amounts of data into Excel. These are previously reduced in quantity from 100 Hz to 1 Hz by a third-party program to reduce work and load time. However, during this reduction process, decimal and thousands separators are swapped, probably because the software is designed in a different language.
Original (Example line):
009 090308.510 +2475.77145123 -0091.51682637 070.530 271.89 +0168.67 +0001.13 -8.485680E-04 0.000000 +4.625850E-04 +2.679440E+36 -2.544081E-29 +2.658468E+36
Processed by third party program:
009 090308,510 +2475,77145123 -0091,51682637 070,530 271.89 +00168,67 +001,130 0,000000 -8.485680E-04 +4.625850E-04 +2.679440E+36 -2.544081E-29 +2.658468E+36
As can be seen, some separators are swapped by the program, but others are not. If I now apply my import code to both formats, I get the following results:
Original:
9 90308.51 2475.771 -91.5168 70.53 271.89 168.67 1.13 -8.49E-04 0 4.63E-04 2.68E+36 -2.54E-29 2.66E+36
Processed:
9 90,308,510 247,577,145,123 -9,151,682,637 40,530 271.89 +00168,67 1,130 0,000000 -8.49E-04 4.63E-04 2.68E+36 -2.54E-29 2.66E+36
For understanding here the code for import:
Option Explicit
Public Sub fileImporter()
    Dim fDialog As FileDialog
    Dim fPath As Variant
    Dim FSO
    Dim Data
    Dim arr, tmp, output
    Dim file, fileName As String
    Dim x, y As Integer
    Dim newSht As Worksheet
    Application.ScreenUpdating = False
    Set fDialog = Application.FileDialog(msoFileDialogFilePicker)
    With fDialog
        .AllowMultiSelect = True
        .Title = "Please select files to import"
        .Filters.Clear
        .Filters.Add "VBO Files", "*.vbo" 'VBO Files are opened and handled like Text Files
        If .Show = True Then
            For Each fPath In .SelectedItems
                Set FSO = CreateObject("Scripting.FilesystemObject")
                fileName = FSO.GetFilename(fPath)
                Set Data = FSO.OpentextFile(fPath)
                file = Data.readall
                Data.Close
                arr = Split(file, vbCrLf)
                ReDim output(UBound(arr), 50)
                For x = 0 To UBound(arr)
                    tmp = Split(arr(x), " ")
                    For y = 0 To UBound(tmp)
                        output(x, y) = tmp(y)
                    Next
                Next
                Set newSht = ActiveWorkbook.Sheets.Add(after:=ActiveWorkbook.Worksheets(ActiveWorkbook.Worksheets.Count))
                newSht.Name = fileName
                Sheets(fileName).Range("A1").Resize(UBound(output) + 1, UBound(output, 2)) = output
            Next
        End If
    End With
    Application.ScreenUpdating = True
End Sub
The records of the processed file are not split even roughly correctly. I have also already tried using
With Application
    .DecimalSeparator = "."
    .ThousandsSeparator = ","
    .UseSystemSeparators = False
End With
but that did not work either. Or rather, it changed the separators, but the result stayed the same. The numbers were not separated at the correct places.
I found a similar question here (Importing CSV US formatted numbers in Excel with localisation), which seems to be the same problem. But since the import function in the answer is different from mine, I am not sure how to integrate it properly.
Does someone have an idea? Maybe a way to preserve the format while splitting? Or a better place to integrate the Application.DecimalSeparator setting in the given code?
Thanks for the help!
EDIT:
The problem could be solved by comparing system settings. Apparently, the computer was not provided with the default settings by IT and some settings of the previous owner were still present. These included a partial language change to German, as well as a permanent replacement of the decimal and thousands separators in Excel, instead of using the system separators. After correcting these settings, the program and import works without incorrect separator usage.
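For anyone who suspects the same kind of leftover configuration, a quick way to see what Excel is currently using is to print the relevant Application properties to the Immediate window. This is only a diagnostic sketch; the Sub name is made up, and it does not change any setting.

```vba
Sub ShowSeparatorSettings()
    ' Separators configured in Excel itself (Options > Advanced)
    Debug.Print "Excel decimal separator:   "; Application.DecimalSeparator
    Debug.Print "Excel thousands separator: "; Application.ThousandsSeparator
    ' False means Excel overrides the system separators with the two values above
    Debug.Print "Use system separators:     "; Application.UseSystemSeparators
    ' Separators as reported by Application.International
    Debug.Print "International decimal:     "; Application.International(xlDecimalSeparator)
    Debug.Print "International thousands:   "; Application.International(xlThousandsSeparator)
End Sub
```

If UseSystemSeparators prints False on a machine that is supposed to follow the Windows regional settings, that matches the situation described in the edit above.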
You may be able to obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Data => Get & Transform => From Text/Csv
When the initial dialog opens, ensure Space is selected as the Delimiter
Select Transform Data
Select Home=>Advanced Editor
Replace the code after your Source line with the code below (after Source)
Edit change in swapping algorithm
It appears by examination that
All columns are numeric
Separators are swapped on the columns where there is a comma
All other columns have the separators not swapped
If those assumptions are not correct, some changes may need to occur
If there are swapped and non-swapped values in a single column, I would suggest you develop another method to sample your data, else you would have to check every single cell
M Code edited to implement change in algorithm above
M Code edited to check first 200 table rows for commas instead of just first row
let
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\decimals.txt"),
        [Delimiter=" ", Columns=14, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    //Swapped separators = numbers with commas
    // check only the first 200 rows, for efficiency
    #"Check Table" = Table.Buffer(Table.FirstN(Source, List.Min({Table.RowCount(Source), 200}))),
    Swapped = List.Accumulate(Table.ColumnNames(#"Check Table"), {}, (state, current)=>
        if List.AnyTrue(List.Transform(Table.Column(#"Check Table", current), each Text.Contains(_,",")))
        then state & {current} else state),
    notSwapped = List.RemoveMatchingItems(Table.ColumnNames(#"Check Table"),Swapped),
    //Set the data types appropriately
    #"Type Not Swapped"= Table.TransformColumnTypes(Source,
        List.Transform(notSwapped, each {_, type number}),"en-US"),
    #"Type Swapped" = Table.TransformColumnTypes(#"Type Not Swapped",
        List.Transform(Swapped, each {_, type number}),"da-DK")
in
    #"Type Swapped"
Original Data
| 009 | 090308,510 | +2475,77145123 | -0091,51682637 | 070,530 | 271.89 | +00168,67 | +001,130 | 0,000000 | -8.485680E-04 | +4.625850E-04 | +2.679440E+36 | -2.544081E-29 | +2.658468E+36 |
Results

How to prevent excel from making any changes in a CSV file when opening it

I have multiple .csv files with data I generated on another program. I want to open each of these files, copy all data from it and paste it all on an existing workbook on excel. My problem is that when I open the csv file with excel it automatically divides the csv text in three or more columns so that, later, I can't convert the data using the Text to Columns data tool (or any other tool, as this auto-conversion from excel breaks some single values in two different numbers). Is there any way I can prevent excel from making ANY changes at all when I open a csv file?
You need to specify in the csv file that it is text. You do this by putting your number in quotes and preceding it with an equals sign, e.g.:
="001145",="55666",="02133"
The easiest way to do this would be to do a find-replace on , with ",=", replacing end of lines (you might need to use an advanced editor like Notepad++ for this) with "\r\n=" and doing the start and end of the file manually.
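If the files are produced or pre-processed by a macro anyway, the same wrapping can be sketched in VBA rather than done with find-replace. The function name ProtectCsvLine is made up for illustration, and the sketch assumes simple lines without quoted fields that themselves contain commas.

```vba
' Hypothetical helper: rewrites a plain CSV line so that every field
' takes the ="value" form shown above, which Excel then keeps as text.
Function ProtectCsvLine(ByVal csvLine As String) As String
    Dim parts() As String
    Dim i As Long
    parts = Split(csvLine, ",")
    For i = LBound(parts) To UBound(parts)
        ' Doubled quotes inside a VBA string literal produce one quote character
        parts(i) = "=""" & parts(i) & """"
    Next i
    ProtectCsvLine = Join(parts, ",")
End Function
```

For example, ProtectCsvLine("001145,55666,02133") should return the line shown above, ="001145",="55666",="02133".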
Avoid opening CSV directly into Excel by the double click from Windows Explorer and consider either of the two options:
Manual User Interface
Open Excel by itself, without any file. Under Open, browse to and select the csv file. You will then walk through the import wizard. Be sure to select 1) delimited (not fixed width), with comma as the delimiter; 2) headers are on the first row; 3) IMPORTANTLY: define each column as Text type.
You may want to set the text qualifier option, since commas placed within text strings may be confused with the delimiting comma. That is why, on an automated double click, Excel returned a new column at each comma. Better still, have the software that produces the csv files enclose in quotes any values that may contain commas.
Automated VBA Code
Import the csv into Excel with QueryTables, but only after formatting all cells as Text, which preserves the values as written, specifically by setting .NumberFormat to @:
Function ImportCSV()
    Dim qt As QueryTable
    Dim csvfile As String
    csvfile = "C:\Path\To\CSV\File.csv"
    ' FORMAT ALL CELLS AS TEXT ("@" is the Text number format)
    ActiveSheet.Cells.NumberFormat = "@"
    ' ADD QUERYTABLE
    With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & csvfile, _
        Destination:=Range("$A$1"))
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileConsecutiveDelimiter = False
        .TextFileTabDelimiter = True
        .TextFileSemicolonDelimiter = False
        .TextFileCommaDelimiter = True
        .TextFileSpaceDelimiter = False
        .Refresh BackgroundQuery:=False
    End With
    ' REMOVE QUERYTABLE (the imported data stays on the sheet)
    For Each qt In ActiveSheet.QueryTables
        qt.Delete
    Next qt
    Set qt = Nothing
End Function

Excel VBA extract date from CSV txt

I'm having trouble extracting this text, exactly as it appears, from a CSV. There are similar questions posted on SO but they don't match my requirements:
I want to extract "31 January 2017" from this row:
4,'31 January 2017','Funds Received/Credits',56,,401.45,
Currently, VBA considers it "31 Jan" without the year. I've tried applying .NumberFormat to the cell (general, text, date).
SOLUTION REQUIREMENTS:
No user action required -- Interact with the file only using VBA (not using File > Import > Wizard)
Compatible with VBA Excel 2003
Extract the full text regardless of Excel or operating system date settings
Thank you for your ideas
You can use the split function, using the comma as a delimiter like this:
sResult = Split("4,'31 January 2017','Funds Received/Credits',56,,401.45, ", ",")(1)
If you don't want the single quotes, then add the Replace function like this:
sResult = Replace(Split("4,'31 January 2017','Funds Received/Credits',56,,401.45, ", ",")(1), "'", "")
If you include the "Microsoft VBScript Regular Expressions 5.5" Reference, you can set up a pattern that will extract the whole date if it is found. For example:
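Dim tstring As String
Dim myregexp As RegExp
Dim StrMatch As Object
tstring = "4,'31 January 2017','Funds Received/Credits',56,,401.45," 'Line from the CSV, or entire CSV as one string
Set myregexp = New RegExp
myregexp.Global = True 'find every date, not just the first one
myregexp.IgnoreCase = True 'let [A-Z] match lowercase letters as well
myregexp.Pattern = "\d{1,2} [A-Z]{3,9} \d{4}"
Set StrMatch = myregexp.Execute(tstring)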
You get the benefit from this method that all the dates in the CSV will be pulled out at once, much faster than using a split line by line. Additionally, the dates may be accessed by using
DateStr = StrMatch.Item(index)
for the whole date string, or submatches can be set up to get specific parts of the string (such as month, day, year).
myregexp.Pattern = "(\d{1,2}) ([A-Z]{3,9}) (\d{4})"
Set StrMatch = myregexp.Execute(tstring)
DateStr = StrMatch.Item(index1).SubMatches(index2)
It is a very powerful tool, with a simple set of symbols for development of patterns. I highly suggest you familiarize yourself with it for manipulation of large strings.
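Putting the pieces together, the regular-expression approach can be wrapped in one function. The sketch below uses late binding via CreateObject("VBScript.RegExp"), so no reference has to be ticked; the function name ExtractDates is made up for illustration. The matches are returned as plain strings, so they never pass through Excel's date conversion.

```vba
' Hypothetical helper: returns every "d Month yyyy" style date found in csvText.
Function ExtractDates(ByVal csvText As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim result As New Collection
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True                           ' collect all matches
    re.IgnoreCase = True                       ' [A-Z] then also matches "anuary"
    re.Pattern = "\d{1,2} [A-Z]{3,9} \d{4}"
    For Each m In re.Execute(csvText)
        result.Add m.Value                     ' stored as text, not as a Date
    Next m
    Set ExtractDates = result
End Function
```

For the row in the question, ExtractDates("4,'31 January 2017','Funds Received/Credits',56,,401.45,").Item(1) should give the text 31 January 2017. When writing the value to a cell, format the cell as Text (NumberFormat "@") first so that Excel does not reinterpret it as a date.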

Decimal separator lost after conversion from csv to excel with vb-script

I have a CSV with semicolon separators that I would like to convert to a regular Excel sheet. I managed to do this with the code below, but I must have made a mistake, because numbers with decimals in the original file that don't start with a zero are shown in Excel as numbers without the decimal separator. When I open the CSV manually in Excel the result is fine, so it must be a side effect of doing it with a script.
For example:
In the CSV there is a line:
2013-03-10 17:00:15; idle; 2,272298;; 0,121860
In the Excel sheet this becomes:
2013-03-10 17:00 | idle | 2.272.298| | 0,121860
Opened manually in excel gives:
2013-03-10 17:00 | idle | 2,272298| | 0,121860
Could somebody please tell me what I could/should change to keep the decimals as decimals in Excel? Possibly a way to tell Excel which symbol represents the decimal separator or an argument to force it into using European formats?
Kind regards, Nico
This is the script I currently have, where csvFile is a string with the full path to the original file and excelFile is a string with the full path to the location where I want to store the new excel sheet.
Set objExcel = CreateObject("Excel.Application") 'use excel
objExcel.Visible = true 'visible
objExcel.displayalerts=false 'no warnings
objExcel.Workbooks.Open(csvFile) 'open the file
objExcel.ActiveWorkbook.SaveAs excelFile, -4143, , , False, False 'save as xls
objExcel.Quit 'close excel
Create a schema.ini file in the folder your csvFile lives in and describe it according to the rules given here.
Further reading: import, text files
There are several approaches possible, I will cover one that I favor:
Start Recording a macro
Create a new workbook
From that workbook go to Data > From Text and there you select the CSV file, then you can do all the required settings regarding Value separators, Decimal separators, Thousands separators. Also the specific data type can be selected for each column.
When the CSV content is added go to Data > Connections and Remove the connection. The data will stay in the worksheet, but there is no longer an active connection.
Save the workbook under the xls name
Stop the Recording
Now tweak the script a bit to your liking.
In general Excel honors the system's regional settings. The CSV import, however, sometimes has its own mind about the "correct" format, particularly when the imported file has the extension .csv.
I'd try the following. Rename the file to .txt or .tsv and import it like this:
objExcel.Workbooks.OpenText csvFile, , , 1, 1, False, False, True
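For comparison, here is how the same idea might look in VBA, where named arguments and the xl constants are available (plain VBScript has neither, so there the arguments have to be passed by position with numeric values, as in the line above). Both macros are sketches with placeholder paths. The first relies on the Local argument of Workbooks.Open, which asks Excel to read the file using its own regional settings rather than the US English conventions of VBA; the second passes the separators explicitly to Workbooks.OpenText.

```vba
Sub OpenCsvWithLocalSettings()
    Const csvFile As String = "C:\Temp\data.csv"   ' placeholder path (assumption)
    ' Local:=True: interpret the file with Excel's regional settings
    Workbooks.Open Filename:=csvFile, Local:=True
End Sub

Sub OpenTextWithExplicitSeparators()
    Const txtFile As String = "C:\Temp\data.txt"   ' placeholder path, renamed from .csv
    ' Semicolon-delimited file with a decimal comma, as in the question
    Workbooks.OpenText Filename:=txtFile, DataType:=xlDelimited, _
        Tab:=False, Semicolon:=True, Comma:=False, _
        DecimalSeparator:=",", ThousandsSeparator:="."
End Sub
```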
I made a workaround. I now create a copy of the CSV file in which every comma followed by a digit is replaced by a point. While not very efficient, it gives Excel what it wants, and it is simple enough for an inexperienced programmer like me to use.
While doing so, a colleague asked me to also remove leading white space and entries with duplicate values in the first column (the timestamp in this case).
The result was this script
'csvFile is a string with the full path to the file. e.g. "C:\\Program Files\\Program\\data.csv"
'tempFile is a string with the full path to the file. e.g. "C:\\Temp\\temp.csv"
'excelfile is a string with the full path to the file. e.g. "D:\\Data\\sheet.xls"
Set fs=CreateObject("Scripting.FileSystemObject")
Set writeFile = fs.CreateTextFile(tempFile,True)
Set readFile = fs.OpenTextFile(csvFile)
' regular expression to remove leading whitespaces
Set regular_expression = New RegExp
regular_expression.Pattern = "^\s*"
regular_expression.Multiline = False
' regular expression to change the decimal separator into a point
Set regular_expression2 = New RegExp
regular_expression2.Global = True
regular_expression2.Pattern = ",(?=\d)"
regular_expression2.Multiline = False
'copy the original file to the temp file and apply the changes
'(a line is written only once the next line shows a different timestamp,
' so of several lines with the same timestamp only the last one is kept)
Do Until readFile.AtEndOfStream
    strLine= readFile.ReadLine
    If (StrComp(current_timestamp,Mid(strLine, 1, InStr(strLine,";")),1)<>0) Then
        If (Len(previous_line) > 2) Then
            previous_line = regular_expression2.replace(previous_line,".")
            writeFile.Write regular_expression.Replace(previous_line, "") & vbCrLf
        End if
    End if
    current_timestamp = Mid(strLine, 1, InStr(strLine,";"))
    previous_line = strLine
Loop
'the loop never writes the line it read last, so write it here
If (Len(previous_line) > 2) Then
    previous_line = regular_expression2.replace(previous_line,".")
    writeFile.Write regular_expression.Replace(previous_line, "") & vbCrLf
End if
readFile.Close
writeFile.Close
Set objExcel = CreateObject("Excel.Application") ' use excel
objExcel.Visible = true ' visible
objExcel.displayalerts=false ' no warning pop-ups
objExcel.Workbooks.Open(tempFile) ' open the file
objExcel.ActiveWorkbook.SaveAs excelfile, -4143, , , False, False 'save as excelfile
fs.DeleteFile tempFile ' clean up the temp file
I hope this will also be useful for someone else.

Extract tables from pdf (to excel), pref. w/ vba

I am trying to extract tables from pdf files with vba and export them to excel. If everything works out the way it should, it should all run automatically. The problem is that the tables are not standardized.
This is what I have so far.
VBA (Excel) runs XPDF, and converts all .pdf files found in current folder to a text file.
VBA (Excel) reads through each text file line by line.
And the code:
With New Scripting.FileSystemObject
    With .OpenTextFile(strFileName, 1, False, 0)
        If Not .AtEndOfStream Then .SkipLine
        Do Until .AtEndOfStream
            'do something
        Loop
    End With
End With
This all works great. But now I am getting to the issue of extracting the tables from the text files.
What I am trying to do is VBA to find a string e.g. "Year's Income", and then output the data, after it, into columns. (Until the table ends.)
The first part is not very difficult (find a certain string), but how would I go about the second part? The text file will look like this Pastebin. The problem is that the text is not standardized. Thus, for example, some tables have 3-year columns (2010 2011 2012) and some only two (or 1), some tables have more spaces between the columns, and some do not include certain rows (such as Capital Asset, net).
I was thinking about doing something like this but not sure how to go about it in VBA.
Find user defined string. eg. "Table 1: Years' Return."
a. Next line find years; if there are two we will need three columns in output (titles +, 2x year), if there are three we will need four (titles +, 3x year).. etc
b. Create title column + column for each year.
When reaching end of line, go to next line
a. Read text -> output to column 1.
b. Recognize spaces (Are spaces > 3?) as start of column 2. Read numbers -> output to column 2.
c. (if column = 3) Recognize spaces as start of column 3. Read numbers -> output to column 3.
d. (if column = 4) Recognize spaces as start of column 4. Read numbers -> output to column 4.
Each line, loop 4.
Next line does not include any numbers - End table. (probably the easiest is just a user defined number: after 15 characters no number? end table)
I based my first version on Pdf to excel, but reading online people do not recommend OpenFile but rather FileSystemObject (even though it seems to be a lot slower).
Any pointers to get me started, mainly on step 2?
You have a number of ways to dissect a text file and depending on how complex it is might cause you to lean one way or another. I started this and it got a bit out of hand... enjoy.
Based on the sample you've provided and the additional comments, I noted the following. Some of these may work well for simple files but can get unwieldy with bigger, more complex files. Furthermore, there may be slightly more efficient methods or tricks than what I have used here, but this will definitely get you going and achieve the desired outcome. Hopefully this makes sense in conjunction with the code provided:
You can use booleans to help you determine what 'section' of the text file you are in. I.e. use InStr on the current line to determine you are in a Table by looking for the text 'Table', and then once you know you are in the 'Table' section of the file, start looking for the 'Assets' section, etc.
You can use a few methods to determine the number of years (or columns) you have. The Split function along with a loop will do the job.
If your files always have constant formatting, even only in certain parts, you can take advantage of this. For example, if you know the numbers on a line will always have a dollar sign in front of them, then you know this will define the column widths and you can use this on subsequent lines of text.
The following code will extract the Assets details from the text file; you can modify it to extract other sections. It should handle multiple rows. Hopefully I've commented it sufficiently. Have a look and I'll edit if need be to help out further.
Sub ReadInTextFile()
    Dim fs As Scripting.FileSystemObject, fsFile As Scripting.TextStream
    Dim sFileName As String, sLine As String, vYears As Variant
    Dim iNoColumns As Integer, ii As Integer, iCount As Integer
    Dim bIsTable As Boolean, bIsAssets As Boolean, bIsLiabilities As Boolean, bIsNetAssets As Boolean
    Set fs = CreateObject("Scripting.FileSystemObject")
    sFileName = "G:\Sample.txt"
    Set fsFile = fs.OpenTextFile(sFileName, 1, False)
    'Loop through the file as you've already done
    Do While fsFile.AtEndOfStream <> True
        'Determine flag positions in text file
        sLine = fsFile.Readline
        Debug.Print VBA.Len(sLine)
        'Always skip empty lines (including single spaces)
        If VBA.Len(sLine) > 1 Then
            'We've found a new table so we can reset the booleans
            If VBA.InStr(1, sLine, "Table") > 0 Then
                bIsTable = True
                bIsAssets = False
                bIsNetAssets = False
                bIsLiabilities = False
                iNoColumns = 0
            End If
            'Perhaps you want to also have some sort of way to designate that a table has finished. Like so
            If VBA.InStr(1, sLine, "Some text that designates the end of the table") > 0 Then
                bIsTable = False
            End If
            'If we're in the table section then we want to read in the data
            If bIsTable Then
                'Check for your different sections. You could make this constant if your text file allowed it.
                If VBA.InStr(1, sLine, "Assets") > 0 And VBA.InStr(1, sLine, "Net") = 0 Then bIsAssets = True: bIsLiabilities = False: bIsNetAssets = False
                If VBA.InStr(1, sLine, "Liabilities") > 0 Then bIsAssets = False: bIsLiabilities = True: bIsNetAssets = False
                If VBA.InStr(1, sLine, "Net Assets") > 0 Then bIsAssets = True: bIsLiabilities = False: bIsNetAssets = True
                'If we haven't triggered any of these booleans then we're at the column headings
                If Not bIsAssets And Not bIsLiabilities And Not bIsNetAssets And VBA.InStr(1, sLine, "Table") = 0 Then
                    'Trim the current line to remove leading and trailing spaces then use the split function to determine the number of years
                    vYears = VBA.Split(VBA.Trim$(sLine), " ")
                    For ii = LBound(vYears) To UBound(vYears)
                        If VBA.Len(vYears(ii)) > 0 Then iNoColumns = iNoColumns + 1
                    Next ii
                    'Now we can redefine some variables to hold the information (you'll want to redim after you've collected the info)
                    ReDim sAssets(1 To iNoColumns + 1, 1 To 100) As String
                    ReDim iColumns(1 To iNoColumns) As Integer
                Else
                    If bIsAssets Then
                        'Skip the heading line
                        If Not VBA.Trim$(sLine) = "Assets" Then
                            'Increment the counter
                            iCount = iCount + 1
                            'If iCount reaches its limit you'll have to ReDim Preserve your sAssets array (I'll leave this to you)
                            If iCount > 99 Then
                                'You'll find other posts on stackoverflow to do this
                            End If
                            'This will happen on the first row, it'll happen every time you
                            'hit a $ sign but you could code to only do so the first time
                            If VBA.InStr(1, sLine, "$") > 0 Then
                                iColumns(1) = VBA.InStr(1, sLine, "$")
                                For ii = 2 To iNoColumns
                                    'We need to start at the next character across
                                    iColumns(ii) = VBA.InStr(iColumns(ii - 1) + 1, sLine, "$")
                                Next ii
                            End If
                            'The first part (the name) is simply up to the $ sign (trimmed of spaces)
                            sAssets(1, iCount) = VBA.Trim$(VBA.Mid$(sLine, 1, iColumns(1) - 1))
                            For ii = 2 To iNoColumns
                                'Then we can loop around for the rest
                                sAssets(ii, iCount) = VBA.Trim$(VBA.Mid$(sLine, iColumns(ii) + 1, iColumns(ii) - iColumns(ii - 1)))
                            Next ii
                            'Now do the last column
                            If VBA.Len(sLine) > iColumns(iNoColumns) Then
                                sAssets(iNoColumns + 1, iCount) = VBA.Trim$(VBA.Right$(sLine, VBA.Len(sLine) - iColumns(iNoColumns)))
                            End If
                        Else
                            'Reset the counter
                            iCount = 0
                        End If
                    End If
                End If
            End If
        End If
    Loop
    'Clean up
    fsFile.Close
    Set fsFile = Nothing
    Set fs = Nothing
End Sub
I cannot examine the sample data as the PasteBin has been removed. Based on what I can glean from the problem description, it seems to me that using Regular Expressions would make parsing the data much easier.
Add a reference to the Scripting Runtime scrrun.dll for the FileSystemObject.
Add a reference to the Microsoft VBScript Regular Expressions 5.5. library for the RegExp object.
Instantiate a RegEx object with
Dim objRE As New RegExp
Set the Pattern property to "(\b\d{4}\b){1,3}"
The above pattern should match on lines containing strings like:
2010
2010 2011
2010 2011 2012
The number of spaces between the year strings is irrelevant, as long as there is at least one (since we're not expecting to encounter strings like 201020112012 for example)
Set the Global property to True
The captured groups will be found in the individual Match objects from the MatchCollection returned by the Execute method of the RegEx object objRE. So declare the appropriate objects:
Dim objMatches as MatchCollection
Dim objMatch as Match
Dim intMatchCount 'tells you how many year strings were found, if any
Assuming you've set up a FileSystemObject object and are scanning the text file, reading each line into a variable strLine
First test to see if the current line contains the pattern sought:
If objRE.Test(strLine) Then
    Set objMatches = objRE.Execute(strLine)
    intMatchCount = objMatches.Count
    For i = 0 To intMatchCount - 1
        'processing code such as writing the years as column headings in Excel
        Set objMatch = objMatches(i)
        'e.g. ActiveCell.Value = objMatch.Value
        'subsequent lines beneath the line containing the year strings should
        'have the amounts, which may be captured in a similar fashion using an
        'additional RegExp object and a Pattern such as "(\b\d+\b){1,3}" for
        'whole numbers or "(\b\d+\.\d+\b){1,3}" for floats. For currency, you
        'can use "(\b\$\d+\.\d{2}\b){1,3}"
    Next i
Else
    'skip over this line
End If
This is just a rough outline of how I would approach this challenge. I hope there is something in this code outline that will be of help to you.
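The year-detection step from the outline can be condensed into one small function. This is a sketch under a few assumptions: it uses late binding (CreateObject) instead of the references listed above, the function name YearsInLine is made up, and the simpler pattern \b\d{4}\b is applied with Global set to True so that each year comes back as its own match.

```vba
' Hypothetical helper: returns the four-digit numbers found in a line,
' e.g. the year headings of a table, in the order they appear.
Function YearsInLine(ByVal strLine As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim years As New Collection
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                ' one match per year
    re.Pattern = "\b\d{4}\b"        ' exactly four digits between word boundaries
    For Each m In re.Execute(strLine)
        years.Add m.Value
    Next m
    Set YearsInLine = years
End Function
```

YearsInLine("      2010     2011   2012").Count should be 3, so the output needs a title column plus three year columns. Note that the pattern cannot tell a year from any other four-digit number, so it is best applied only to the line directly after the table title.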
Another way to do this I have some success with is to use VBA to convert to a .doc or .docx file and then search for and pull tables from the Word file. They can be easily extracted into Excel sheets. The conversion seems to handle tables nicely. Note however that it works on a page by page basis so tables extending over a page end up as separate tables in the word doc.
