Extracting information from text file in VBA - excel

I'm currently trying to extract data from a text file as part of an analytics challenge in my place of work. The text file is a bunch of data in lines with each heading/entry separated by a comma.
I've looked at several examples of text extraction online but the furthest I've gotten is getting one line in a single cell and then Excel freezing. All others have just frozen Excel after I've put in my conditions.
My current attempts involve the following:
Do Until EOF #1, textLine
Line Input #1, textLine
Do Until Count = Len(text line) + 1
Text = Text & Mid(textLine, Count, Count)
If Right(text, 1) = "," Then
textImport = Left(text, Count - 1)
Cells(rowCount, column count) = textImport
Text = ""
columnCount = columnCount + 1
Loop
rowCount = rowCount + 1
Loop
Can anyone advise where I'm going wrong? I can't share any of the data or the text file due to the nature of the challenge and the data involved.

QueryTable Import
You can do this:
Sub QueryImport()
Const cSheet As Variant = "Sheet1" ' Worksheet Name/Index
Const cSource As String = "A1" ' Source Range
Dim vntFile As Variant ' Source Array
vntFile = Application.GetOpenFilename("Text Files (*.txt), *.txt")
If vntFile <> False Then
With ThisWorkbook.Worksheets(cSheet).QueryTables _
.Add(Connection:="TEXT;" & vntFile, _
Destination:=ThisWorkbook.Worksheets(cSheet).Range(cSource))
.Name = "Pets"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = xlWindows
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End If
End Sub
This opens a dialog where you can pick the file, which is then imported into Excel. Manipulating the data further is out of scope here for lack of information. Post part of the result in another question to get the desired result.

Try this if the file is not a CSV that Excel can open directly.
Sub readCSVLikeFile()
r = 1
Open "<path of the file>" For Input As #1
While Not EOF(1)
Line Input #1, txtline
v = Split(txtline, ",")
Range(Cells(r, 1), Cells(r, UBound(v) + 1)) = v
r = r + 1
Wend
Close #1
End Sub
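As for why the loop in the question freezes: as posted, Count is never incremented inside the inner Do Until, so that loop can never reach its exit condition. The third argument of Mid is also a length, not an end position, so Mid(textLine, Count, Count) returns Count characters rather than one. The sketch below is a slightly more defensive variant of the Split approach, not a definitive implementation. ImportCommaFile and its parameters are hypothetical names, and it assumes no field contains a comma inside quotes.

```vba
Option Explicit

' Hypothetical helper: writes each comma-separated line of filePath
' to one row of ws, starting at row 1, column 1.
' Assumption: fields never contain quoted commas.
Public Sub ImportCommaFile(ByVal filePath As String, ws As Worksheet)
    Dim fileNo As Integer
    Dim rowCount As Long
    Dim textLine As String
    Dim fields As Variant

    fileNo = FreeFile                 ' avoid hard-coding file number 1
    rowCount = 1
    Open filePath For Input As #fileNo
    Do Until EOF(fileNo)
        Line Input #fileNo, textLine
        If Len(textLine) > 0 Then     ' Split("") has no elements, so skip blank lines
            fields = Split(textLine, ",")
            ws.Cells(rowCount, 1).Resize(1, UBound(fields) + 1).Value = fields
        End If
        rowCount = rowCount + 1
    Loop
    Close #fileNo
End Sub
```

It could be called as ImportCommaFile "<path of the file>", ActiveSheet.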

Related

Search a string and return line number VBA

I am very new here and to Excel VBA programming. I have a text file with x, y, z coordinates for some points. It also has some unwanted information at the beginning. The data I need starts only at, say, line 20, after the string '1-xxx'. My code should open this file, go through it line by line, search for the string '1-xxx' and return the line number. This line number is then used as input to the recorded macro below, in the field '.TextFileStartRow'. I searched the internet and wrote something like this. The problem is that I get only the character position of the searched text and not the line number I expected. Where am I going wrong?
Sub Macro()
Dim Pos_rw As Integer, text As String, textline As String
Dim folder As String, StartingDir As String
Dim file
StartingDir = ThisWorkbook.Path
ChDir StartingDir
file = Application.GetOpenFilename("Text Files (*.txt), *.txt")
Open file For Input As #1
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
Pos_rw = InStr(text, "1-xxx")
With ActiveSheet.QueryTables _
.Add(Connection:="TEXT;" & file, Destination:=Range("$D$2:$F$26"))
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 1252
.TextFileStartRow = Pos_rw
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
.TextFileOtherDelimiter = ":"
.TextFileColumnDataTypes = Array(9, 9, 1, 1, 1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
The solution is to keep track of the line number in the loop, which should terminate as soon as you find the line that you are looking for:
Sub Macro()
Dim Pos_rw As Long
Dim textline As String
Dim folder As String, StartingDir As String
Dim found As Boolean
Dim file
StartingDir = ThisWorkbook.Path
ChDir StartingDir
file = Application.GetOpenFilename("Text Files (*.txt), *.txt")
Open file For Input As #1
Do Until EOF(1) or found
Pos_rw = Pos_rw + 1
Line Input #1, textline
If InStr(textline, "1-xxx") > 0 Then found = True
Loop
Close #1
'the rest of your logic, including some safety code if found = False here
End Sub
A small point: I changed Dim Pos_rw As Integer to Dim Pos_rw As Long. There is almost never a good reason to use Integer in modern VBA (unless you are somehow or other running it on a 16-bit processor). You are risking overflow for no good reason.
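The "safety code" mentioned in the comment could look something like the following sketch. FindLineNumber is a hypothetical helper name, not part of the original answer. It returns 0 when the marker is missing, so the caller can bail out before touching the QueryTable.

```vba
Option Explicit

' Hypothetical helper: returns the 1-based number of the first line
' that contains marker, or 0 if no line does.
Public Function FindLineNumber(ByVal filePath As String, ByVal marker As String) As Long
    Dim fileNo As Integer
    Dim textline As String
    Dim lineNo As Long

    fileNo = FreeFile
    Open filePath For Input As #fileNo
    Do Until EOF(fileNo)
        lineNo = lineNo + 1
        Line Input #fileNo, textline
        If InStr(textline, marker) > 0 Then
            FindLineNumber = lineNo   ' found: remember the line and stop reading
            Exit Do
        End If
    Loop
    Close #fileNo
End Function

' Example caller, assuming the file name comes from GetOpenFilename as in the question.
Public Sub DemoFindLine()
    Dim file As Variant
    Dim startRow As Long

    file = Application.GetOpenFilename("Text Files (*.txt), *.txt")
    If file = False Then Exit Sub     ' dialog was cancelled

    startRow = FindLineNumber(CStr(file), "1-xxx")
    If startRow = 0 Then
        MsgBox "Marker '1-xxx' not found in " & file
        Exit Sub
    End If
    Debug.Print "Use .TextFileStartRow = " & startRow
End Sub
```

If the data should start on the line after the marker rather than on the marker line itself, pass startRow + 1 to .TextFileStartRow.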

Unnecessary value concatenation and Type Errors in Excel VBA

My VBA code takes a .txt file output by a specific piece of software (Carlson Survey), does some calculations, and then converts it into a .CSV file. I am having issues with the calculation component: one of the columns of the text file (brought into Excel using comma separators) isn't doing the calculation I tell it to, and its values are seemingly being concatenated (everything after the decimal point is removed). My assumption is that this happens because I read these values into an array declared as String (any other type gave me type errors). I am at a loss as to why the calculation doesn't appear to run, though, as the program seemingly executes fine.
And the VBA script for quick reference (the specific section with the problem is the 'Do data conversion' section):
Private Sub Workbook_Open()
Sheets("Sheet1").Cells.ClearContents
'---------------------------------------------------------------------------------------
'Choose and open the .TXT file for conversion
Dim answer As Integer
answer = MsgBox("Do you want to process a .TXT file for use in InfoSWMM?", vbYesNo + vbQuestion, "Select .TXT File")
If answer = vbNo Then
Exit Sub
End If
Dim Ret
Ret = Application.GetOpenFilename("Text Files (*.txt),*.txt")
If Ret <> False Then
With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;" & Ret, Destination:=Range("$A$1") _
)
.Name = "Sample"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End If
'---------------------------------------------------------------------------------------
'Do data conversion, SECTION NEEDS UPDATING LACKING FEATURES, BUGS
Dim row As Integer
Dim col As Integer
Dim i As Integer
Dim tester(3) As String 'Bug[1] related, type error (see below). String type fixes type error, but causes undesired concatenation
Dim col_test As Integer
Dim rim As Integer
For row = 1 To ActiveSheet.UsedRange.Rows.Count
If IsEmpty(ActiveSheet.Cells(row, 1).Value) = True Then
Exit For
End If
'Change these values in case feature code library is changed in Carlson, also need to add extra fields
If ActiveSheet.Cells(row, 5).Value = "SD" Or ActiveSheet.Cells(row, 5).Value = "WQ" Then
col_test = 20
rim = ActiveSheet.Cells(row, 4).Value
For i = 0 To 3
tester(i) = ActiveSheet.Cells(row, col_test).Value 'Bug[1] here, type error if not a String.
col_test = col_test + 4
Next i
ActiveSheet.Cells(row, 37).Value = rim - Application.Max(tester) 'Bug[2] here, not performing calculation.
End If
Next row
'---------------------------------------------------------------------------------------
'Save converted file as .CSV
MsgBox "Choose the desired save location for the .CSV file."
Dim InitialName As String
Dim PathName As Variant
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
InitialName = "sfm_output"
PathName = Application.GetSaveAsFilename(InitialFileName:=InitialName, fileFilter:="CSV (Comma delimited) (*.csv), *.csv")
ws.Copy
ActiveWorkbook.SaveAs Filename:=PathName, _
FileFormat:=xlCSV, CreateBackup:=False
MsgBox "Process completed successfully." & vbNewLine & "File saved to:" & vbNewLine & PathName
'---------------------------------------------------------------------------------------
'Close all Workbooks
Application.DisplayAlerts = False
Application.Quit
End Sub
Any help is greatly appreciated. Thanks.
Have you tried the CStr or CInt conversion functions?
For example:
tester(i) = CStr(ActiveSheet.Cells(row, col_test).Value)
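Judging only from the declarations in the question, two other things look like likely contributors. rim is declared As Integer, so any decimal value assigned to it is rounded to a whole number. And Application.Max, like the worksheet MAX function, ignores text entries in an array, so with a String array it returns 0 and the subtraction just gives back rim. A minimal sketch using Double throughout follows. RimMinusMax is a hypothetical helper, and it assumes the rim cell always holds a number.

```vba
Option Explicit

' Hypothetical helper: returns the rim value minus the largest numeric value
' found in four cells of the row, starting at firstCol and stepping 4 columns.
' Assumption: the rim cell is numeric. Non-numeric test cells count as 0.
Public Function RimMinusMax(ws As Worksheet, ByVal rowNo As Long, _
                            ByVal rimCol As Long, ByVal firstCol As Long) As Double
    Dim tester(3) As Double           ' Double keeps the decimal part
    Dim rim As Double
    Dim i As Long
    Dim v As Variant

    rim = CDbl(ws.Cells(rowNo, rimCol).Value)
    For i = 0 To 3
        v = ws.Cells(rowNo, firstCol + 4 * i).Value
        If IsNumeric(v) Then          ' skips text and error values, avoiding the type error
            tester(i) = CDbl(v)
        End If
    Next i
    RimMinusMax = rim - Application.Max(tester)
End Function
```

Inside the loop from the question this would be used as ActiveSheet.Cells(row, 37).Value = RimMinusMax(ActiveSheet, row, 4, 20).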

Insert top x rows from text with VBA

I want to insert the top x rows from a text file. I can set the StartRow number, but is there a parameter where I can give an "EndRow" number, so that only the top x rows I want are inserted?
Sub insertTopX()
With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;C:\Users\HarrsionDavid\Desktop\AnswerForEveryQuestions" _
,Destination:=Cells(1,1))
.Name = "test_file.txt"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileConsecutiveDelimiter = True
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Sub
You can read the text file into a string and split it on line breaks. That gives you an array that is easy to work with:
Read the file and parse it into a Variant array;
Pass the startRow and the endRow.
Option Explicit
Public Sub TestMe()
Dim filePath As String
filePath = "C:\Users\user\User\nt.txt"
Dim myFile As String
myFile = ReadFileLineByLineToString(filePath)
Dim startRow As Long
Dim endRow As Long
Dim fixedFile As Variant
fixedFile = Split(myFile, vbCrLf)
startRow = 2
endRow = 3
Dim cnt As Long
For cnt = startRow To endRow
Debug.Print fixedFile(cnt - 1)
Next cnt
End Sub
With startRow = 2 and endRow = 3, this prints the second and third lines of the file to the Immediate window.
This is the ReadFileLineByLineToString:
Public Function ReadFileLineByLineToString(path As String) As String
Dim fileNo As Long
fileNo = FreeFile
Open path For Input As #fileNo
Do While Not EOF(fileNo)
Dim textRowInput As String
Line Input #fileNo, textRowInput
ReadFileLineByLineToString = ReadFileLineByLineToString & textRowInput
If Not EOF(fileNo) Then
ReadFileLineByLineToString = ReadFileLineByLineToString & vbCrLf
End If
Loop
Close #fileNo
End Function
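If only the top rows are needed, another option is to stop reading once endRow is reached and write the lines straight to the sheet. The following is a sketch under stated assumptions: ImportTopRows is a hypothetical name, and the file is taken to be tab-delimited because the question sets .TextFileTabDelimiter = True.

```vba
Option Explicit

' Hypothetical helper: writes the first endRow lines of a tab-delimited
' file to the worksheet, starting at the top-left cell of target.
Public Sub ImportTopRows(ByVal filePath As String, ByVal endRow As Long, target As Range)
    Dim fileNo As Integer
    Dim lineNo As Long
    Dim textRow As String
    Dim fields As Variant

    fileNo = FreeFile
    Open filePath For Input As #fileNo
    Do While lineNo < endRow
        If EOF(fileNo) Then Exit Do   ' file has fewer lines than endRow
        Line Input #fileNo, textRow
        lineNo = lineNo + 1
        If Len(textRow) > 0 Then      ' blank lines leave an empty row
            fields = Split(textRow, vbTab)
            target.Cells(lineNo, 1).Resize(1, UBound(fields) + 1).Value = fields
        End If
    Loop
    Close #fileNo
End Sub
```

For example, ImportTopRows filePath, 10, ActiveSheet.Cells(1, 1) would place the first ten lines starting at A1.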

Excel Date formatting with hash marks for some reason

The Problem
I've got a VBA program that exports data from one Excel file into a CSV. When it comes across a date, it formats it like #2016-06-14#. I'm assuming the hash marks (or octothorpe or pound sign or hashtag) are meant to indicate that the field is a date field. But, when I'm importing the CSV back into a different Workbook, the date will not come in no matter how I format the field. It still contains the # characters.
The Question
How can I get the date column to import as a YMD format date?
Appendix
Here's some code I'm using to export and import, for reference.
Export
Sub WriteCSV(writeRange As Range, fileName As String)
Dim myFile As String, rng As Range, cellValue As Variant, i As Integer, j As Integer
myFile = ActiveWorkbook.Path & "\Results\" & fileName & ".csv"
Debug.Print myFile
Open myFile For Output As #1
For i = 1 To writeRange.Rows.Count
For j = 1 To writeRange.Columns.Count
cellValue = writeRange.Cells(i, j).value
If j = writeRange.Columns.Count Then
Write #1, cellValue
Else
Write #1, cellValue,
End If
Next
Next
Close #1
End Sub
Import
Sub ReadCSV(targetCell As Range, filePath As String)
With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;" & filePath, Destination:=targetCell)
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 2
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
.Delete
End With
End Sub
Try outputting .Text instead of .Value.
Change this:
cellValue = writeRange.Cells(i, j).value
to this:
cellValue = writeRange.Cells(i, j).Text
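For background, the hash marks come from the Write # statement itself, which is documented to wrap Date values in # signs. An alternative to switching to .Text is to build each line yourself and emit it with Print #. The sketch below is one hedged way to do that. CsvField and WriteCsvPlain are hypothetical names, the yyyy-mm-dd output format is an assumption, and numbers are converted with CStr, which follows the machine's regional decimal separator.

```vba
Option Explicit

' Hypothetical helper: formats one cell value as a CSV field.
' Dates become yyyy-mm-dd, strings are quoted, everything else uses CStr.
Public Function CsvField(ByVal cellValue As Variant) As String
    If VarType(cellValue) = vbDate Then
        CsvField = Format$(cellValue, "yyyy-mm-dd")
    ElseIf VarType(cellValue) = vbString Then
        ' Double any embedded quotes, then wrap the field in quotes
        CsvField = """" & Replace(cellValue, """", """""") & """"
    Else
        CsvField = CStr(cellValue)
    End If
End Function

' Hypothetical replacement for the export loop, using Print # instead of Write #.
Public Sub WriteCsvPlain(writeRange As Range, ByVal filePath As String)
    Dim fileNo As Integer
    Dim i As Long, j As Long
    Dim lineText As String

    fileNo = FreeFile
    Open filePath For Output As #fileNo
    For i = 1 To writeRange.Rows.Count
        lineText = ""
        For j = 1 To writeRange.Columns.Count
            If j > 1 Then lineText = lineText & ","
            lineText = lineText & CsvField(writeRange.Cells(i, j).Value)
        Next j
        Print #fileNo, lineText       ' Print # adds no quotes or # delimiters
    Next i
    Close #fileNo
End Sub
```

On the import side, setting the date column's entry in .TextFileColumnDataTypes to xlYMDFormat (5) instead of 1 should then let Excel read the field as a year-month-day date.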
Discard those #.
When I manually save a file as CSV, Excel does NOT put those # around the dates. Instead it writes the dates as they are defined in my regional settings (for me, dd/mm/yyyy).
And when importing back, the recorder used Array(...4...) for the date column, and it was imported correctly.
Anyway, CSV really sucks if you are working in an international environment, since it behaves differently according to the local machine settings. Being poorly defined, it's the last format I would use.

Using a variable or string as a file name for importing data using query tables

I'm trying to import a lot of text files with numerical names into separate worksheets.
The loop to create the worksheets works fine
Dim i as integer 'initial file name
Dim k as integer 'final file name
i = Cells(3, 3).Value
k = Cells(5, 3).Value
Do while i <= k
Worksheets.Add.Name = i
i = i +5
Loop
and for importing specific individual files, this line also seems to work fine (when including the .FieldNames, .RowNumbers, .RefreshPeriod etc. commands):
With ActiveSheet.QueryTables.Add(Connection:="TEXT;C:\temp\load_excel\15.txt", Destination:=Range("$A$1"))
I would like to replace the "TEXT;C:\temp\load_excel\15.txt" with something more flexible that allows me to use two different variables to change the files being imported:
Dim Folder As String
Dim File As String
Dim DQ as String
DQ = """" 'double quotation marks
Folder = Cells(14, 2).Value 'cell which states C:\temp\load_excel\
File = DQ & "TEXT;" & Folder & i & ".txt" & DQ
'for i = 15 this gives "TEXT;C:\temp\load_excel\15.txt"
Is there a way to incorporate the two so I can have a loop like this?
Do while i <=k
Worksheets.Add.Name = i
ActiveSheet.QueryTables.Add(Connection:=File, Destination:=Range("$A$1"))
i = i +5
Loop
As far as I can see, this should work, but when I try and run it I get a Run-time error '1004': Application or object-defined error. If anyone could help, it would be greatly appreciated.
EDIT: here is exact code being used
Sub ImportPLEtextFiles()
Dim i As Integer ''initial file name
Dim k As Integer ''final file name
Dim DQ As String '' Double quotation marks
Dim Folder As String
Dim File As String
i = Cells(3, 3).Value
k = Cells(5, 3).Value
DQ = """"
Folder = Cells(14, 2).Value
File = DQ & Folder & i & ".txt" & DQ
Do While i <= k
Worksheets.Add.Name = i
File = DQ & "TEXT;" & Folder & i & ".txt" & DQ
With ActiveSheet.QueryTables.Add(Connection:=File _
, Destination:=Range("$A$1"))
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = False
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 850
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
i = i + 5
Loop
End Sub
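Put this inside your loop. The connection string must start with TEXT; and must not contain literal quotation marks, so drop DQ. Because i is an Integer, use CStr(i) so that the sheet is looked up by name rather than by position. Use the Folder variable read before the loop, since an unqualified Cells(14, 2) would point at the newly added sheet.
File = "TEXT;" & Folder & i & ".txt"
With Sheets(CStr(i)).QueryTables.Add(Connection:= _
File, Destination:=Sheets(CStr(i)).Range("$A$1"))
.Refresh BackgroundQuery:=False
End With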
