How to extract specific words from text files into xls spreadsheet - excel

I'm new to VBA. Before posting my question here, I spent almost 3 days searching the Internet.
I have 300+ text files (text converted from PDF using OCR). From each text file I need to get all words that contain both letters and digits (for example KT315A, KT-315-a, etc.) along with the source reference (the txt file name).
What I need is to:
1. add a "smart filter" that will copy only words that contain both letters and digits
2. paste the copied data to column A
3. add the reference file name to column B
I have found code below that can copy all data from text files into excel spreadsheet.
text files look like
"line from 252A-552A to ddddd, ,,, #,#,rrrr, 22 , ....kt3443 , fff,,,etc"
final result in xls should be
A | B
252A-552A | file1
kt3443 | file1
Option Explicit
Const sPath = "C:\outp\" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab 'for TAB delimited text files
Sub ImportMultipleTextFiles()
Dim wb As Workbook
Dim sFile As String
Dim inputRow As Long
RefreshSheet
On Error Resume Next
sFile = Dir(sPath & "*.txt")
Do Until sFile = ""
inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
'open the text file
'format=6 denotes a text file
Set wb = Workbooks.Open(Filename:=sPath & sFile, _
Format:=6, _
Delimiter:=delim)
'copy and paste
wb.Sheets(1).Range("A1").CurrentRegion.Copy _
Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
wb.Close SaveChanges:=False
'get next text file
sFile = Dir()
Loop
Set wb = Nothing
End Sub
Sub RefreshSheet()
'delete old sheet and add a new one
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Temp").Delete
Application.DisplayAlerts = True
Worksheets.Add
ActiveSheet.Name = "Temp"
On Error GoTo 0
End Sub
thanks!

It's a little tough to tell exactly what constitutes a word from your example. It clearly can contain characters other than letters and numbers (eg the dash), but some of the items have dots preceding, so it cannot be defined as being delimited by a space.
I defined a "word" as a string that
Starts with a letter or digit and ends with a letter or digit
Contains both letters and digits
Might also contain any other non-space characters except a comma
To do this, I first replaced all the commas with spaces, and then applied an appropriate regular expression. However, this might accept undesired strings, so you might need to be more specific in defining exactly what is a word.
Also, instead of reading the entire file into an Excel workbook, by using the FileSystemObject we can process one line at a time, without reading 300 files into Excel. The base folder is set, as you did, by a constant in the VBA code.
But there are other ways to do this.
Be sure to set the references for early binding as noted in the code:
Option Explicit
'Set References to:
' Microsoft Scripting Runtime
' Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim WS As Worksheet, RW As Long
Const sPath As String = "C:\Users\Ron\Desktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
WS.Columns.Clear
Set RE = New RegExp
With RE
.Global = True
.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"
.IgnoreCase = True
End With
For Each FI In FO.Files
If FI.Name Like "*.txt" Then
Set TS = FI.OpenAsTextStream(ForReading)
Do Until TS.AtEndOfStream
'Change .ReadLine to .ReadAll *might* make this run faster
' but would need to be tested.
Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
If MC.Count > 0 Then
For Each M In MC
RW = RW + 1
WS.Cells(RW, 1) = M
WS.Cells(RW, 2) = FI.Name
Next M
End If
Loop
End If
Next FI
End Sub
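To try the pattern on a single line before running it across a whole folder, a small helper along these lines may be useful. This is only a sketch and not part of the code above: the names ExtractCodes and TestExtractCodes are made up for illustration, and it uses late binding (CreateObject), so no references need to be set.

```vba
Option Explicit

' Hypothetical helper: returns the "words" found in one line of text,
' using the same pattern and comma-to-space replacement as the code above.
Function ExtractCodes(ByVal textLine As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")   ' late bound, no reference required
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"

    For Each m In re.Execute(Replace(textLine, ",", " "))
        result.Add m.Value
    Next m
    Set ExtractCodes = result
End Function

Sub TestExtractCodes()
    Dim item As Variant
    ' Sample line taken from the question
    For Each item In ExtractCodes("line from 252A-552A to ddddd, ,,, #,#,rrrr, 22 , ....kt3443 , fff,,,etc")
        Debug.Print item
    Next item
End Sub
```

For the sample line from the question, this should print 252A-552A and kt3443 to the Immediate window, which is the result the question asks for. If other strings slip through on real OCR output, the pattern is the only place that needs tightening.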

Related

Modifying CSV files from a local folder-VBA

I am trying to rearrange the order of the columns in csv files in a folder on my local drive.
At the moment, following a tutorial, I have found a way to loop through the files. I want to cut a column and reinsert it at a different position. When I run this code, Excel crashes. It seems to be going through duplicate files.
I expected the columns to have moved in all the files in the folder, but they didn't move. Excel crashes, and it looks like it is duplicating the files when I hit CTRL + G and run the code.
Here's the code.
Option Explicit
Sub FleetMoveColumns()
Dim fileDirectory As String
Dim fileCriteria As String
Dim fileName As String
Dim fileToOpen As Workbook
Application.ScreenUpdating = False
fileDirectory = "C:\...\*csv"
fileName = Dir(fileDirectory)
Do While Len(fileName) > 0
Set fileToOpen = Workbooks.Open(fileDirectory & fileName)
Columns("R").Cut
Columns("AB").Insert
Debug.Print fileName
Loop
Application.ScreenUpdating = True
End Sub
Please help.
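You need to fully qualify your Columns object with a Worksheet object.
You need to place fileName = Dir within your Do While loop; without it, fileName never changes and the loop keeps opening the same file.
Modified code (Workbooks.Open still receives fileDirectory & fileName, so fileDirectory must hold only the folder path with its trailing backslash; pass the "*csv" pattern to Dir separately)
Dim Sht As Worksheet
Do While Len(fileName) > 0
Set fileToOpen = Workbooks.Open(fileDirectory & fileName)
' set the worksheet object
Set Sht = fileToOpen.Worksheets(1) ' <-- change the index, or use a sheet name, to target a different worksheet
With Sht
.Columns("R").Cut
.Columns("AB").Insert
End With
' clear objects
Set Sht = Nothing
Set fileToOpen = Nothing
Debug.Print fileName
' get the next matching file; Dir returns "" when there are none left
fileName = Dir
Loop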
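Putting these points together, a complete version might look like the sketch below. The folder constant FOLDER_PATH and the Sub name are placeholders invented for this example, and the sketch also saves and closes each workbook, which the snippet above does not do. It is worth testing on copies of the files first.

```vba
Option Explicit

Sub MoveColumnInCsvFiles()
    ' Assumption: FOLDER_PATH is a placeholder; point it at your own folder
    ' and keep the trailing backslash.
    Const FOLDER_PATH As String = "C:\Temp\csv\"
    Dim fileName As String
    Dim wb As Workbook

    Application.ScreenUpdating = False
    fileName = Dir(FOLDER_PATH & "*.csv")        ' the pattern goes to Dir only
    Do While Len(fileName) > 0
        Set wb = Workbooks.Open(FOLDER_PATH & fileName)
        With wb.Worksheets(1)
            .Columns("R").Cut
            .Columns("AB").Insert
        End With
        wb.Close SaveChanges:=True               ' save and close before the next file
        fileName = Dir                           ' next matching file, "" when done
    Loop
    Application.ScreenUpdating = True
End Sub
```

Keeping the folder and the search pattern apart means the same folder string can be reused for both Dir and Workbooks.Open.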

VBA - import n lines from multiple text files

I have a piece of code that imports multiple text files with some data I need. I'd like to change it a bit - I want it to stop reading the file after reaching line number 50 in the text file and import only those first 50 lines. Is there a way I could do this? I was thinking about a loop that goes line by line and executes the code until the line number is larger than 50. I figured out a way to write such a loop, however it doesn't split the line into columns and I need that. Also in the way I wrote it it imports only 1 file. I had a code that worked in terms of reading multiple files and dividing them into columns, but I couldn't make it to end after 50 lines. I used QueryTables for this. Maybe instead of doing that loop I could draw on that?
Here's what I have - it obviously doesn't work:
Sub RT()
Dim fso As Object
Dim xlsheet As Worksheet
Dim qt As QueryTable
Dim txtfilesToOpen As Variant, txtfile As Variant
Dim rec As String
Dim i As Long
Dim txtfilnumber As Integer
Dim FileNumber
Dim txtline As String
i = 0
Application.ScreenUpdating = False
txtfilesToOpen = Application.GetOpenFilename _
(FileFilter:="Text Files (*.txt), *.txt", _
MultiSelect:=True, Title:="Text Files to Open")
With ActiveSheet
.Cells.ClearContents
For Each txtfile In txtfilesToOpen
importrow = 2 + .Cells(.Rows.Count, 1).End(xlUp).Row
With CreateObject("Scripting.FileSystemObject").OpenTextFile(txtfile)
Do While Not .AtEndOfStream
If .line < 50 Then
Cells(.line, 1).Value = .ReadLine
Else: Exit Do
End If
Loop
End With
Next txtfile
For Each qt In .QueryTables
qt.Delete
Next qt
End With
Application.ScreenUpdating = True
MsgBox "Successfully imported text files!", vbInformation, "SUCCESSFUL IMPORT"
Set fso = Nothing
End Sub
Does anyone know how I can approach this? I'm really new at this and still very lost. I'm pretty much stabbing in the dark here. If you could give me a tip on what I can do or what function to use I'll be really thankful!
Your code imports more than one file; however, it always overwrites the content of the previously imported file. You need to add importrow to the cell address.
To split the text into several columns, you need to know how it is delimited. Do you have a field separator (tab, semicolon, comma)? Or fixed-length fields?
The following code splits the text into several cells, assuming a semicolon as separator. It may be a little slow, but you will get the idea.
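Dim tokens() As String, lineNo As Long 'txtline and i are already declared in your Sub
Do While Not .AtEndOfStream
If .Line > 50 Then Exit Do
lineNo = .Line 'remember the line number before ReadLine advances it
txtline = .ReadLine
tokens = Split(txtline, ";")
For i = 0 To UBound(tokens)
'inside the With block of the text stream, .Cells would refer to the stream, so name the sheet explicitly
ActiveSheet.Cells(importrow + lineNo - 1, i + 1).Value = tokens(i)
Next i
Loop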
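The same idea can be packaged as a reusable function so that the main loop only has to track the next free row. The following is a sketch under assumptions: the function name ImportFirstLines and its parameters are invented for this example, and the semicolon default is the same guess at the separator as above.

```vba
Option Explicit

' Hypothetical helper: copies at most maxLines lines of one text file to ws,
' starting at startRow, one field per column.
' Returns the number of lines actually written.
Function ImportFirstLines(ByVal filePath As String, ByVal ws As Worksheet, _
                          ByVal startRow As Long, ByVal maxLines As Long, _
                          Optional ByVal delim As String = ";") As Long
    Dim tokens() As String
    Dim lineCount As Long, i As Long
    Dim ts As Object

    ' 1 = ForReading
    Set ts = CreateObject("Scripting.FileSystemObject").OpenTextFile(filePath, 1)
    Do While Not ts.AtEndOfStream And lineCount < maxLines
        tokens = Split(ts.ReadLine, delim)
        For i = 0 To UBound(tokens)          ' an empty line gives UBound = -1, so nothing is written
            ws.Cells(startRow + lineCount, i + 1).Value = tokens(i)
        Next i
        lineCount = lineCount + 1
    Loop
    ts.Close
    ImportFirstLines = lineCount
End Function
```

Inside the For Each txtfile loop, it could then be called as importrow = importrow + ImportFirstLines(CStr(txtfile), ActiveSheet, importrow, 50), so each file is written below the previous one.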

VBA - Replace with find one word in a specific cell works, how do I find one word or another?

First I should apologize for my very limited VBA coding skills. So the code I have basically does what I want it to do: I have hundreds of Excel files I need to modify at a time repeatedly. If a specific cell ("B1") has the word string "draw" in it, nothing is to happen. If the cell doesn't have the word string "draw", the word "tank" is to be inserted before the word "prep" in the cell. The macro runs through all the files in a given folder, changes the format, outputs to a new folder, etc. This all works beautifully. But on occasion, the cell may contain the word string "pool" instead of "draw". In that case, I don't want to change the cell contents at all. So basically, if "pool" or "draw" is in the cell, do nothing. If they're both not present, add "Tank" before the word string "prep" in the cell. Here's the code I have:
Sub SIS_ALIMS()
Dim wbOpen As Workbook
Dim MyDir As String
MyDir = "C:\Processed data"
strExtension = Dir(MyDir & "\*.xls")
While strExtension <> vbNullString
Set wbOpen = Workbooks.Open(MyDir & "\" & strExtension)
With wbOpen
Set rgFound = Range("B1").Find("draw", MatchCase:=False)
If rgFound Is Nothing Then
Range("B1").replace What:="prep", Replacement:="Tank prep"
Else
End If
Dim SaveName As String
SaveName = ActiveSheet.Range("B8").Text
ActiveWorkbook.SaveAs fileName:="C:\Processed data\ALIMS data\" & _
SaveName & ".txt"
.Close SaveChanges:=False
End With
strExtension = Dir
Wend
Application.ScreenUpdating = True
End Sub
First an observation: Your code does not specify a worksheet in wbOpen, so you may run into problems if a workbook happens not to open on the worksheet you expect. Better to use something like With wbOpen.Sheets(1).
As for your question, instead of using Find you may find it easier to work with the cell value as a string variable:
Dim CellData As String
With wbOpen.Sheets(1)
CellData = .Range("B1").Value
If CellData = "draw" Or CellData = "pool" Then
'do nothing
ElseIf CellData = "prep" Then
.Range("B1").Value = "Tank prep"
Else
'add other conditionals as needed
End If
End With
Finally, if the VBA doesn't need to perform any action when the cell value is "draw" or "pool," then testing for those values is superfluous. The If ... End If block can be replaced with just the conditional that is of interest:
If .Range("B1").Value = "prep" Then .Range("B1").Value = "Tank prep"
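Note that the comparisons above test whether B1 is exactly equal to "draw", "pool" or "prep". If the cell holds longer text and these words only appear somewhere inside it, as the question describes, a substring test with InStr may fit better. A minimal sketch, using the made-up function name AddTankPrefix:

```vba
Option Explicit

' Hypothetical helper: returns the text unchanged when it contains "draw" or
' "pool" (case-insensitive); otherwise puts "Tank " before each "prep".
Function AddTankPrefix(ByVal cellText As String) As String
    If InStr(1, cellText, "draw", vbTextCompare) > 0 _
       Or InStr(1, cellText, "pool", vbTextCompare) > 0 Then
        AddTankPrefix = cellText
    Else
        AddTankPrefix = Replace(cellText, "prep", "Tank prep", 1, -1, vbTextCompare)
    End If
End Function
```

Within the With wbOpen.Sheets(1) block it could be used as .Range("B1").Value = AddTankPrefix(.Range("B1").Value). With a substring test, the check for "draw" and "pool" is no longer superfluous, because a cell such as "pool prep" contains "prep" but must stay unchanged.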

Problems with Worksheetfunction.Match in a closed workbook. Cannot work out why no match is found

I'm writing a code to delete a log entry in a .csv file. The code starts with opening the .csv file, using Application.Match to return the row number, and then deleting that and closing the file again. The problems I'm experiencing are I get a type mismatch (my error handling is activated) OR (and here it gets weird) it works (a match is found, the row is deleted) but then the logfile is messed up - all data is one string in column a with either ";" or "," delimiters (this varies somehow, relevant note: I use Dutch language excel). Of course, this makes it impossible for the macro to find a match in any case.
I found that the type mismatch problems I'm experiencing will most likely be caused by the code not finding a match, and this is what I don't understand since I checked and doublechecked the input and the data in the logfile - by all means it simply should find a match. And sometimes it does find a match, deletes the row and messes up formatting. (NOTE: Mostly it does NOT find a match.)
I check data in the .csv file before running the macro. I have tried running the macro with the .csv file already opened. I have tried to Set the matchArray from outside the With. I have tried both sweet talking my laptop and a more aggressive approach, to no avail.
Sub DeleteRowFromFile(ByVal matchValue As String)
Dim filePath As String
Dim fileName As String
Dim fileType As String
Dim matchArray As Range
Dim rowToDelete As Variant
'Naming variables for flexibility
filePath = "C:\Users\Maxim\Documents\Log\"
fileName = "TestRegister"
fileType = ".csv"
Application.ScreenUpdating = False
Workbooks.Open (filePath & fileName & fileType)
With Workbooks(fileName).Worksheets(1)
Set matchArray = .Range("A:A") 'set range in the logfile
'Type mismatch here:
rowToDelete = Application.Match(matchValue, matchArray, 0)
If Not IsError(rowToDelete) Then
Rows(rowToDelete).Delete
Else:
MsgBox "Orderno. " & matchValue & " not found.", vbOKOnly + vbExclamation, "Error"
End If
End With
'Closing the log file
Workbooks(fileName).Close SaveChanges:=True
Application.ScreenUpdating = True
End Sub
Sub MatchAndDelete()
Dim matchValueRange As String
matchValueRange = ActiveWorkbook.Worksheets(1).Range("A1").Value
DeleteRowFromFile (matchValueRange)
End Sub
Footnote:
I'm a struggling enthusiast, I have a lot to learn. Sorry in advance if I have left out any crucial information for you to be of help, and thanks a lot for any and all help.
When you open or save a csv file using a VBA macro, Excel will always use the standard (US English) delimiters, while if you do the same via the user interface it will use the separators defined in the Windows regional settings, which is probably ";" in your case.
You can check with ?Application.International(xlListSeparator) in the Immediate window of your VB Editor.
You can tell Excel to use a different separator by, e.g., adding sep=; as line 1 of your file. However, this entry is gone after opening the file. The following code, added before you open the csv file, will add it:
Dim oFSo As Object
Dim oTxtFile As Object
Dim strData As String
Set oFSo = CreateObject("Scripting.FileSystemObject")
strData = oFSo.OpenTextFile(filePath & fileName & fileType, 1).ReadAll
Set oTxtFile = oFSo.OpenTextFile(filePath & fileName & fileType, 2)
oTxtFile.writeline "sep=;"
oTxtFile.writeline strData
oTxtFile.Close
You can save your changed file by using the Excel user interface shortcuts via Application.SendKeys, thus achieving what you want:
Application.SendKeys ("^s") 'Save
Application.SendKeys ("^{F4}") 'Close
Don't run this code from the VBE Immediate window, as it will probably act on the wrong file!
The full code - just with an alternate way to make the requested change:
Sub DeleteRowFromFile(ByVal matchValue As String)
Dim filePath As String
Dim fileName As String
Dim fileType As String
Dim matchArray As Range
Dim rowToDelete As Variant
Dim oFSo As Object
Dim oTxtFile As Object
Dim strData As String
Dim content As Variant
Dim i As Long
'Naming variables for flexibility
filePath = "C:\Users\Maxim\Documents\Log\"
fileName = "TestRegister"
fileType = ".csv"
Application.ScreenUpdating = False
'Adding "sep=;" as line 1 of the log file
Set oFSo = CreateObject("Scripting.FileSystemObject")
strData = oFSo.OpenTextFile(filePath & fileName & fileType, 1).ReadAll
Set oTxtFile = oFSo.OpenTextFile(filePath & fileName & fileType, 2)
oTxtFile.writeline "sep=;"
oTxtFile.writeline strData
oTxtFile.Close
'Open logfile
Workbooks.Open (filePath & fileName & fileType)
'Make your changes
With Workbooks(fileName).Worksheets(1)
content = .UsedRange.Value
For i = UBound(content, 1) To 1 Step -1
If content(i, 1) = matchValue Then
.Rows(i).Delete
End If
Next i
End With
'Closing the log file via Sendkeys using excel shortcuts
Application.SendKeys ("^s") 'Save
Application.SendKeys ("^{F4}") 'Close
Application.ScreenUpdating = True
End Sub
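As an alternative to rewriting the file and sending keystrokes, Workbooks.Open and Workbook.SaveAs both accept a Local argument that asks Excel to follow the regional settings of the machine. The sketch below swaps that technique in; it is not the approach used above, the Sub name DeleteRowFromFileLocal is made up, and whether Local produces the expected separator for a given Excel version and locale should be verified on a copy of the log file.

```vba
Option Explicit

Sub DeleteRowFromFileLocal(ByVal matchValue As String)
    ' Path taken from the question
    Const LOG_FILE As String = "C:\Users\Maxim\Documents\Log\TestRegister.csv"
    Dim wb As Workbook
    Dim i As Long

    Application.ScreenUpdating = False
    ' Local:=True asks Excel to parse the file using the regional settings
    Set wb = Workbooks.Open(Filename:=LOG_FILE, Local:=True)
    With wb.Worksheets(1)
        ' Loop bottom-up so deleting a row does not shift rows still to be checked
        For i = .Cells(.Rows.Count, 1).End(xlUp).Row To 1 Step -1
            If CStr(.Cells(i, 1).Value) = matchValue Then .Rows(i).Delete
        Next i
    End With
    Application.DisplayAlerts = False   ' suppress the overwrite prompt
    wb.SaveAs Filename:=LOG_FILE, FileFormat:=xlCSV, Local:=True
    Application.DisplayAlerts = True
    wb.Close SaveChanges:=False
    Application.ScreenUpdating = True
End Sub
```

Holding the workbook in the wb variable also avoids looking it up by name, so it does not matter whether the workbook is listed as TestRegister or TestRegister.csv.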
I think Match is not required. Try this one.
Sub DeleteRowFromFile(ByVal matchValue As String)
Dim filePath As String
Dim fileName As String
Dim fileType As String
Dim matchArray As Range
Dim rowToDelete As Variant
'Naming variables for flexibility
filePath = "C:\Users\Maxim\Documents\Log\"
fileName = "TestRegister"
fileType = ".csv"
Application.ScreenUpdating = False
Workbooks.Open (filePath & fileName & fileType)
With Workbooks(fileName).Worksheets(1)
For i = .UsedRange.SpecialCells(xlCellTypeLastCell).Row To 1 Step -1
If .Cells(i, 1).Value2 = matchValue Then
.Cells(i, 1).EntireRow.Delete
End If
Next
End With
'Closing the log file
Workbooks(fileName & fileType).SaveAs Filename:= _
(filePath & fileName & fileType) _
, FileFormat:=xlCSVMSDOS, CreateBackup:=False 'Saving the file
Workbooks(fileName & fileType).Close 'Closing the file
Application.ScreenUpdating = True
End Sub
Hope it helps

How to export CSV file encoded with "Unicode"

Currently I am using VBA code to export range data to a CSV file:
Sub Fct_Export_CSV_Migration()
Dim Value As String
Dim size As Integer
Value = ThisWorkbook.Path & "\Export_Migration" & Sheets(1).range("B20").Value & ".csv"
chemincsv = Value
Worksheets("Correspondance Nv Arborescence").Select
Dim Plage As Object, oL As Object, oC As Object, Tmp As String, Sep$
Sep = ";"
size = Worksheets("Correspondance Nv Arborescence").range("B" & Rows.Count).End(xlUp).Row
Set Plage = ActiveSheet.range("A1:B" & size)
Open chemincsv For Output As #1
For Each oL In Plage.Rows
Tmp = ""
For Each oC In oL.Cells
Tmp = Tmp & CStr(oC.Text) & Sep
Next
'take one less than length of the string number of characters from left, that would eliminate the trailing semicolon
Tmp = Left(Tmp, Len(Tmp) - 1)
Print #1, Tmp
Next
Close
MsgBox "OK! Export to " & Value
End Sub
Now, I would like to export the CSV encoded as "Unicode". I think I need to use a VBA function like SaveAs(xlUnicodeText), but how do I use that?
Thx
Unicode CSVs are not one of the file formats supported by Excel out of the box. This means we cannot use the SaveAs method. The good news is that we can work around this restriction using VBA.
My approach uses the file system object. This incredibly handy object is great for interacting with the file system. Before you can use it you will need to add a reference:
From the VBA IDE click Tools.
Click References...
Select Windows Script Host Object Model from the list.
Press OK.
The code:
' Saves the active sheet as a Unicode CSV.
Sub SaveAsUnicodeCSV()
Dim fso As FileSystemObject ' Provides access to the file system.
Dim ts As TextStream ' Writes to your text file.
Dim r As Range ' Used to loop over all used rows.
Dim c As Range ' Used to loop over all used columns.
' Use the file system object to write to the file system.
' WARNING: This code will overwrite any existing file with the same name.
Set fso = New FileSystemObject
Set ts = fso.CreateTextFile("!!YOUR FILE PATH HERE.CSV!!", True, True)
' Read each used row.
For Each r In ActiveSheet.UsedRange.Rows
' Read each used column.
For Each c In r.Cells
' Write content to file.
ts.Write c.Value
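' Add a separator after every cell except the last one in the row.
If c.Column < r.Column + r.Columns.Count - 1 Then ts.Write ","
Next
' Add a line break, between rows (but not after the last used row).
If r.Row < ActiveSheet.UsedRange.Row + ActiveSheet.UsedRange.Rows.Count - 1 Then ts.Write vbCrLf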
Next
' Close the file.
ts.Close
' Release object variables before they leave scope, to reclaim memory and avoid leaks.
Set ts = Nothing
Set fso = Nothing
End Sub
This code loops over each used row in the active worksheet. Within each row, it loops over every column in use. The contents of each cell are appended to your text file. At the end of each row, a line break is added.
To use it, simply replace !!YOUR FILE PATH HERE.CSV!! with your file name.
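The code above writes each cell value as-is, so a value that itself contains a comma would be split into two fields when the file is read back. One common way to handle this is to wrap such values in double quotes. The helper below is a sketch of that convention; the name CsvField is made up for this example and is not part of the code above.

```vba
Option Explicit

' Hypothetical helper: quotes a value when it contains a comma, a double quote
' or a line break, doubling any embedded double quotes.
Function CsvField(ByVal fieldText As String) As String
    If InStr(fieldText, ",") > 0 Or InStr(fieldText, """") > 0 _
       Or InStr(fieldText, vbCr) > 0 Or InStr(fieldText, vbLf) > 0 Then
        CsvField = """" & Replace(fieldText, """", """""") & """"
    Else
        CsvField = fieldText
    End If
End Function
```

In the loop, ts.Write c.Value would then become ts.Write CsvField(CStr(c.Value)). Cells holding error values would need separate handling, since CStr raises a type mismatch for them.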

Resources