Stop words in Excel VBA

I'm working on a project in Excel where I read a text file and try to remove the stop words from its text, but I'm stuck on removing the stop words in Excel VBA. From my research it's possible in Java and PHP, but I haven't found a solution specifically for Excel VBA. Is there a function that will remove stop words in Excel VBA?

Const InputTxtFile As String = "C:\Temp\InTxt.txt"
Const OutputTxtFile As String = "C:\Temp\OutTxt.txt"
Const ListOfStopWords As String = ";CAT;DOG;FOX;"
Sub main()
    Dim DataLine As String
    Dim strTempLine As String
    Dim LineTab() As String
    Dim i As Long
    Open InputTxtFile For Input As #1 'Or use FreeFile()
    Open OutputTxtFile For Append As #2
    While Not EOF(1)
        Line Input #1, DataLine
        LineTab = Split(DataLine, " ") 'Split the line that was read on spaces
        If UBound(LineTab) >= 0 Then 'Skip empty lines only (UBound is -1 for "")
            For i = 0 To UBound(LineTab)
                'Keep the word only if it is not in the stop-word list (case-sensitive by default)
                If InStr(ListOfStopWords, ";" & LineTab(i) & ";") = 0 Then
                    strTempLine = strTempLine & LineTab(i) & " "
                End If
            Next
            Print #2, strTempLine 'Print to output file
            strTempLine = ""
        End If
    Wend
    Close #1
    Close #2
End Sub
'Ref: Read/Parse text file line by line in VBA
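The word filter in the answer above can also be pulled out into a reusable function. The following is only a sketch under assumptions: the function name RemoveStopWords is hypothetical, and it swaps the InStr lookup for a late-bound Scripting.Dictionary in text-compare mode so that "fox" and "FOX" are treated as the same stop word. Punctuation attached to a word (such as "fox,") is not handled.

```vba
' Hypothetical helper (not from the original answer): returns textLine
' with every word found in stopWordList removed, ignoring case.
' stopWordList uses the same ";"-separated format as ListOfStopWords.
Function RemoveStopWords(ByVal textLine As String, ByVal stopWordList As String) As String
    Dim stopWords As Object
    Dim word As Variant
    Dim result As String

    Set stopWords = CreateObject("Scripting.Dictionary")
    stopWords.CompareMode = vbTextCompare   ' must be set while the dictionary is empty
    For Each word In Split(stopWordList, ";")
        If Len(word) > 0 Then stopWords(word) = True
    Next word

    For Each word In Split(textLine, " ")
        If Len(word) > 0 Then               ' skip the gaps left by repeated spaces
            If Not stopWords.Exists(word) Then result = result & word & " "
        End If
    Next word

    RemoveStopWords = Trim$(result)
End Function
```

For example, RemoveStopWords("The quick fox jumps over the lazy Dog", ";CAT;DOG;FOX;") returns "The quick jumps over the lazy".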

Related

Special Characters from txt file to excel

I am trying to import special characters from a .txt file into Excel.
I've tried many things, but the characters break in Excel.
Example of my string:
in txt: Changjíhuízúzìzhìzhou
converts in excel to: Changjíhuízúzìzhìzhou
So I tried moving the values over bit by bit, but no luck.
Sub ImportTXTFile()
Dim file As Variant
Dim EXT As String
Dim Direct As String ' directory...
Direct = "C:\FilePath\Here\"
EXT = ".txt"
Dim COL As Long
Dim row As Long
COL = 1
row = 1
file = Dir(Direct)
Do While (file <> "") ' Cycle through files until no more files
If InStr(file, "Data.txt") > 0 Then
'
Open Direct & "Data.txt" For Input As #1
'
While Not EOF(1)
Line Input #1, DataLine ' Read in line
Do While DataLine <> ""
If InStr(DataLine, ",") = 0 Then ' Drop value into excel upto the first ,
Sheets("test").Cells(row, COL).Value = DataLine
DataLine = ""
Else
Sheets("test").Cells(row, COL).Value = Left(DataLine, InStr(DataLine, ",") - 1)
DataLine = Right(DataLine, Len(DataLine) - InStr(DataLine, ",")) ' rebuild array without data upto first ,
End If
COL = COL + 1 ' next column
Loop
COL = 1 ' reset column
row = row + 1 ' write to next row
Wend
'
Close #1 ' Close files straight away
End If
file = Dir
Loop
MsgBox "Data Updated"
End Sub
So I want to cry, because all this converting of UTF-8 to ASCII can be avoided simply by:
opening the txt file in Notepad++
going to the Encoding tab
clicking convert to ASCII
running my original code.
BLAM
Everything is perfect.
Thank you danieltakeshi for all your help!
Using the first link I gave you, here is some test code that I tested successfully, using the charset CdoISO_8859_1:
Dim objStream As Object
Dim strData As String
Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "iso-8859-1"
objStream.Open
objStream.LoadFromFile ("C:\Users\user_name\Desktop\test.txt")
strData = objStream.ReadText()
Debug.Print strData & " Compare to: Changjíhuízúzìzhìzhou"
EDIT:
Check the encoding of your .txt file and import it into Excel with the matching charset. For example, I changed test.txt to UTF-8 and imported it successfully with .Charset set to "utf-8".
You can also use Save As on your .txt file and choose the encoding.
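The charset advice above could be wrapped in a small function so that the encoding is passed in by the caller. This is a minimal sketch, assuming the late-bound ADODB.Stream object already used in the answer; the function name ReadTextWithCharset and its default of "utf-8" are hypothetical choices, not part of the original answer.

```vba
' Hypothetical helper: reads a whole text file using an explicit charset.
' charsetName must match the file's real encoding, e.g. "utf-8" or "iso-8859-1".
Function ReadTextWithCharset(ByVal filePath As String, _
                             Optional ByVal charsetName As String = "utf-8") As String
    Const adTypeText As Long = 2      ' StreamTypeEnum value for text data
    Dim stm As Object

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = charsetName         ' set before opening, as in the answer above
    stm.Open
    stm.LoadFromFile filePath
    ReadTextWithCharset = stm.ReadText   ' with no argument, reads to the end of the stream
    stm.Close
End Function
```

It could be called as Debug.Print ReadTextWithCharset("C:\Users\user_name\Desktop\test.txt"), and the returned string can then be split on commas and line breaks in the same way as the original import code.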

Ignoring blank lines and spaces in text files when reading

I have a text file with file addresses listed line by line.
Sometimes, however, the users go in there and accidentally add a space or a blank line between the addresses and that crashes the entire code.
How could I avoid this when reading the file using VBA?
This is the current block used to open the text file and read addresses line by line:
Set ActiveBook = Application.ActiveWorkbook
PathFile = ActiveWorkbook.Path & "\FilePaths.txt"
Open PathFile For Input As #1
Do Until EOF(1)
Line Input #1, SourceFile
Set Source = Workbooks.Open(SourceFile)
Add two lines that trim the spaces and skip blank lines, like this:
Line Input #1, SourceFile
SourceFile = Trim(SourceFile) '~~> This will trim all the leading and trailing spaces
If Not SourceFile = "" Then '~~> This will check whether the line is empty
Set Source = Workbooks.Open(SourceFile)
End If
I suggest you add further code to test whether the file actually exists and whether it is of a valid type for Excel to open:
Dim SourceFile As String
Dim PathFile As String
Dim FileExt As String
Set ActiveBook = Application.ActiveWorkbook
PathFile = ActiveWorkbook.Path & "\FilePaths.txt"
Open PathFile For Input As #1
Do Until EOF(1)
    Line Input #1, SourceFile
    SourceFile = Trim$(SourceFile)
    'Skip blank lines first: Dir on a bare folder path returns the first file in that folder
    If Len(SourceFile) > 0 Then
        If Len(Dir(ActiveWorkbook.Path & "\" & SourceFile)) > 0 Then
            'Select Case compares literally, so "xls*" would not act as a wildcard
            FileExt = LCase$(Mid$(SourceFile, InStrRev(SourceFile, ".") + 1))
            Select Case FileExt
                Case "xls", "xlsx", "xlsm", "xlsb"
                    Set Source = Workbooks.Open(ActiveWorkbook.Path & "\" & SourceFile)
                Case Else
                    Debug.Print "source not valid"
            End Select
        End If
    End If
Loop
Close #1
Thanks for the code.
I made some small changes so that I can reuse it in many different cases and call it from any point in the code, with up to 3 different arguments (you may add more if you wish), as in the example below.
Note: you may change "totalBananas,EN2003" to any text that cannot occur in your files. I used it this way because I am not sure how to declare the arguments as optional :-p I don't think they can really be optional anyway.
...
Call FixTextFile(file_name, "blabla", "0000", "")
...
Sub FixTextFile(inFile As Variant, fixArg1 As String, fixArg2 As String, fixArg3 As String)
Dim resArg1 As Long, resArg2 As Long, resArg3 As Long
Dim outFile As String
Dim data As String
If fixArg1 = "" Then fixArg1 = "totalBananas,EN2003"
If fixArg2 = "" Then fixArg2 = "totalBananas,EN2003"
If fixArg3 = "" Then fixArg3 = "totalBananas,EN2003"
Open inFile For Input As #1
outFile = inFile & ".alt"
Open outFile For Output As #2
Do Until EOF(1)
Line Input #1, data
resArg1 = InStr(1, data, fixArg1)
resArg2 = InStr(1, data, fixArg2)
resArg3 = InStr(1, data, fixArg3)
If Trim(data) <> "" And resArg1 < 1 And resArg2 < 1 And resArg3 < 1 Then
Print #2, data
End If
Loop
Close #1
Close #2
Kill inFile
Name outFile As inFile
MsgBox "File alteration completed!"
End Sub
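On the question of optional arguments: VBA does accept the Optional keyword in a procedure signature, so the sentinel text should not be needed. The following is a sketch under assumptions rather than a drop-in replacement: the names FixTextFileOptional and ContainsText are hypothetical, and file numbers come from FreeFile instead of the hard-coded #1 and #2.

```vba
' Hypothetical variant of FixTextFile: copies inFile without blank lines and
' without lines containing any supplied text, then replaces the original file.
Sub FixTextFileOptional(ByVal inFile As String, _
                        Optional ByVal fixArg1 As String = "", _
                        Optional ByVal fixArg2 As String = "", _
                        Optional ByVal fixArg3 As String = "")
    Dim inNum As Integer, outNum As Integer
    Dim outFile As String, data As String

    outFile = inFile & ".alt"
    inNum = FreeFile
    Open inFile For Input As #inNum
    outNum = FreeFile                 ' asked for after the first Open, so it differs from inNum
    Open outFile For Output As #outNum
    Do Until EOF(inNum)
        Line Input #inNum, data
        If Len(Trim$(data)) > 0 _
           And Not ContainsText(data, fixArg1) _
           And Not ContainsText(data, fixArg2) _
           And Not ContainsText(data, fixArg3) Then
            Print #outNum, data
        End If
    Loop
    Close #inNum
    Close #outNum
    Kill inFile
    Name outFile As inFile
End Sub

' InStr returns 1 for an empty search string, so an omitted argument
' must be treated explicitly as "no match".
Private Function ContainsText(ByVal haystack As String, ByVal needle As String) As Boolean
    If Len(needle) > 0 Then ContainsText = (InStr(1, haystack, needle) > 0)
End Function
```

With this signature the call from the example becomes FixTextFileOptional file_name, "blabla", "0000", with the third argument simply left out.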

How can I write a macro to import a text file into Excel where the text file to be selected is based on a variable in the spreadsheet

I need to select a text file to import into Excel where the name of the text file contains a string of text that matches a cell in the Excel spreadsheet.
For example, a cell has the value "D12345".
I need to import a text file into the sheet where the same string (i.e. "D12345") is contained in the name of the text file.
The selection needs to be made from a collection of text files. Only 1 file in the collection will contain the matching string.
Hope that makes sense.
Give this a try:
Sub SimpleFileListre()
Dim s As String, FileName As String, sFolder As String
Range("A:A").Clear
s = "C:\TestFolder\*.txt"
sFolder = "C:\TestFolder\"
FileName = Dir(s)
Do Until FileName = ""
If InStr(1, FileName, "D12345") > 0 Then
Call GetStuff(sFolder & FileName)
End If
FileName = Dir()
Loop
End Sub
Sub GetStuff(s)
Close #2
Open s For Input As #2
j = 1
Do While Not EOF(2)
Line Input #2, TextLine
Cells(j, 1) = TextLine
j = j + 1
Loop
Close #2
End Sub

Read/Parse text file line by line in VBA

I'm trying to parse a text document using VBA and return the path given in the text file. For example, the text file would look like:
*Blah blah instructions
*Blah blah instructions on line 2
G:\\Folder\...\data.xls
D:\\AnotherFolder\...\moredata.xls
I want the VBA to load 1 line at a time, and if it starts with a * then move to the next line (similar to that line being commented). For the lines with a file path, I want to write that path to cell, say A2 for the first path, B2 for the next, etc.
The main things I was hoping to have answered were:
What is the best/simple way to read through a text file using VBA?
How can I do that line by line?
For the most basic read of a text file, use Open.
Example:
Dim FileNum As Integer
Dim DataLine As String
FileNum = FreeFile()
Open "Filename" For Input As #FileNum
While Not EOF(FileNum)
Line Input #FileNum, DataLine ' read in data 1 line at a time
' decide what to do with dataline,
' depending on what processing you need to do for each case
Wend
#Author note - Please stop adding in close #FileNum - it's addressed in the comments, and it's not needed as an improvement to this answer
I find the FileSystemObject with a TextStream the easiest way to read files:
Dim fso As FileSystemObject: Set fso = New FileSystemObject
Set txtStream = fso.OpenTextFile(filePath, ForReading, False)
With this txtStream object you then have all sorts of tools that IntelliSense picks up (unlike with the FreeFile() method), so there is less guesswork. Plus, you don't have to obtain a file number from FreeFile and hope it is still free by the time you use it.
You can read a file like:
Do While Not txtStream.AtEndOfStream
txtStream.ReadLine
Loop
txtStream.Close
NOTE: This requires a reference to Microsoft Scripting Runtime.
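Putting the pieces of the question together, the TextStream loop could skip the lines that start with * and write each remaining path to A2, B2 and so on. This is only a sketch under assumptions: the procedure name and the file path are hypothetical, output goes to the active sheet, and late binding via CreateObject is used so that it runs without the Microsoft Scripting Runtime reference.

```vba
' Hypothetical example: reads a file line by line, treats lines starting
' with "*" as comments, and writes every other non-blank line to row 2.
Sub ListPathsFromFile()
    Const ForReading As Long = 1      ' IOMode value for read-only access
    Dim fso As Object, ts As Object
    Dim textLine As String
    Dim col As Long

    Set fso = CreateObject("Scripting.FileSystemObject")   ' late binding
    Set ts = fso.OpenTextFile("C:\Temp\paths.txt", ForReading, False)  ' assumed path
    col = 1
    Do While Not ts.AtEndOfStream
        textLine = Trim$(ts.ReadLine)
        If Len(textLine) > 0 And Left$(textLine, 1) <> "*" Then
            ActiveSheet.Cells(2, col).Value = textLine      ' A2, B2, C2, ...
            col = col + 1
        End If
    Loop
    ts.Close
End Sub
```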
For completeness, working with the data loaded into memory:
Dim hf As Integer: hf = FreeFile
Dim lines() As String, i As Long
Open "c:\bla\bla.bla" For Input As #hf
lines = Split(Input$(LOF(hf), #hf), vbNewLine)
Close #hf
For i = 0 To UBound(lines)
    Debug.Print "Line"; i; "="; lines(i)
Next
You can use this code to read a text file line by line. It also checks whether the first character is "*" and, if so, skips that line.
Public Sub Test()
Dim ReadData as String
Open "C:\satheesh\myfile\file.txt" For Input As #1
Do Until EOF(1)
Line Input #1, ReadData 'Adding Line to read the whole line, not only first 128 positions
If Not Left(ReadData, 1) = "*" then
'' you can write the variable ReadData into the database or file
End If
Loop
Close #1
End Sub
Below is my code for reading a text file into an Excel sheet.
Sub openteatfile()
Dim i As Long, j As Long
Dim filepath As String
filepath = "C:\Users\TarunReddyNuthula\Desktop\sample.ctxt"
ThisWorkbook.Worksheets("Sheet4").Range("A1:L20").ClearContents
Open filepath For Input As #1
i = 1
Do Until EOF(1)
Line Input #1, linefromfile
lineitems = Split(linefromfile, "|")
For j = LBound(lineitems) To UBound(lineitems)
ThisWorkbook.Worksheets("Sheet4").Cells(i, j + 1).value = lineitems(j)
Next j
i = i + 1
Loop
Close #1
End Sub

Loading linux text file into excel using VBA

I have a text file created on Linux. If I open it in WordPad, the file appears normally. However, when I open it in Notepad, or load it into Excel using the code below, it appears as a single line.
' Open the file
Open Filename For Input As #1
' Look for the Table Title
Do While Not (EOF(1) Or InStr(TextLine, TableTitle) > 0)
Line Input #1, TextLine
Loop
How can I split it into the original lines? Is there an end-of-line separator that VBA can use?
Linux uses a line feed (\n) to denote a new line rather than the carriage return + line feed (\r\n) used by Windows, so you can't use Line Input. Instead:
Open Filename For Input As #1
'//load all
buff = Input$(LOF(1), #1)
Close #1
'//*either* replace all lf -> crlf
buff = replace$(buff, vbLf, vbCrLf)
msgbox buff
'//*or* line by line
dim lines() As String: lines = split(buff, vbLf)
for i = 0 To UBound(lines)
msgbox lines(i)
next
The Function
Public Function GetLines(fpath$) As Variant
'REFERENCES:
'Microsoft Scripting Runtime // Scripting.FileSystemObject
'Microsoft VBScript Regular Expressions 5.5 // VBScript_RegExp_55.RegExp
Dim fso As New Scripting.FileSystemObject, RE As New VBScript_RegExp_55.RegExp
If fso.FileExists(fpath) = True Then
Dim mts As MatchCollection, mt As Match
Dim lines() As String
Dim content$: content = fso.OpenTextFile(fpath).ReadAll()
With RE
.Global = True
.Pattern = "[^\r\n]+" 'catch all characters except line feeds/carriage returns
If .test(content) = True Then
Set mts = .Execute(content)
ReDim lines(mts.Count - 1)
Dim pos&
For Each mt In mts
lines(pos) = mt.Value
pos = pos + 1
Next mt
Else
MsgBox "'" & Dir(fpath) & "' contains zero bytes!", vbExclamation
End If
End With
GetLines = lines
Else
MsgBox "File not found at:" & vbCrLf & fpath, vbCritical 'Dir(fpath) would be empty here
End If
End Function
It could be invoked like this (from the Immediate window):
?GetLines("C:\BOOT.INI")(2)
and the output
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
The above example can be used to get all lines from a text file created on any OS.
Hope this helps.
Open the Linux text file in Windows WordPad and save it. WordPad will convert the Linux line feed (\n) to carriage return + line feed (\r\n) as it saves the file. No coding is necessary.

Resources