I have a piece of code that imports multiple text files containing data I need. I'd like to change it so that it stops reading each file after line 50 and imports only those first 50 lines. Is there a way to do this? I was thinking about a loop that goes line by line and runs until the line number is larger than 50. I figured out how to write such a loop, but it doesn't split each line into columns, which I need. Also, the way I wrote it, it imports only one file. I previously had code that read multiple files and split them into columns, but I couldn't make it stop after 50 lines. I used QueryTables for that. Maybe I could build on that instead of using the loop?
Here's what I have - it obviously doesn't work:
Sub RT()
Dim fso As Object
Dim xlsheet As Worksheet
Dim qt As QueryTable
Dim txtfilesToOpen As Variant, txtfile As Variant
Dim rec As String
Dim i As Long
Dim txtfilnumber As Integer
Dim FileNumber
Dim txtline As String
i = 0
Application.ScreenUpdating = False
txtfilesToOpen = Application.GetOpenFilename _
(FileFilter:="Text Files (*.txt), *.txt", _
MultiSelect:=True, Title:="Text Files to Open")
With ActiveSheet
.Cells.ClearContents
For Each txtfile In txtfilesToOpen
importrow = 2 + .Cells(.Rows.Count, 1).End(xlUp).Row
With CreateObject("Scripting.FileSystemObject").OpenTextFile(txtfile)
Do While Not .AtEndOfStream
If .line < 50 Then
Cells(.line, 1).Value = .ReadLine
Else: Exit Do
End If
Loop
End With
Next txtfile
For Each qt In .QueryTables
qt.Delete
Next qt
End With
Application.ScreenUpdating = True
MsgBox "Successfully imported text files!", vbInformation, "SUCCESSFUL IMPORT"
Set fso = Nothing
End Sub
Does anyone know how I can approach this? I'm really new at this and still very lost. I'm pretty much stabbing in the dark here. If you could give me a tip on what I can do or what function to use I'll be really thankful!
Your code imports more than one file; however, it always overwrites the content of the previously imported file. You need to add importrow to the row of the cell address.
When you want to split the text into several columns, you need to know how to split it. Do you have a field separator (Tab, Semicolon, comma)? Fixed length?
The following code will split the text into several cells assuming the semicolon as separator. It may be a little bit slow, but you will get the idea.
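' txtline and i are already declared in your Sub; these two are new
Dim tokens() As String, targetRow As Long
Do While Not .AtEndOfStream
    If .Line > 50 Then Exit Do
    ' .Line is the number of the line about to be read, so take it before ReadLine
    targetRow = importrow + .Line - 1
    txtline = .ReadLine
    tokens = Split(txtline, ";")
    For i = 0 To UBound(tokens)
        ' Inside this With block "." refers to the text stream, so name the sheet explicitly
        ActiveSheet.Cells(targetRow, i + 1).Value = tokens(i)
    Next i
Loop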
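Putting the pieces together, a complete routine could look like the sketch below. It is not the asker's original code: the names ImportFirst50Lines, MAX_LINES and SEP are made up for illustration, and it assumes semicolon-separated data and that one blank row between files is acceptable.

```vba
Option Explicit

' Illustrative constants (assumptions, adjust to your data)
Private Const MAX_LINES As Long = 50
Private Const SEP As String = ";"

Sub ImportFirst50Lines()
    Dim files As Variant, f As Variant
    Dim ws As Worksheet
    Dim fso As Object, ts As Object      ' late-bound FileSystemObject / TextStream
    Dim tokens() As String
    Dim nextRow As Long, linesRead As Long, i As Long

    files = Application.GetOpenFilename( _
        FileFilter:="Text Files (*.txt), *.txt", _
        MultiSelect:=True, Title:="Text Files to Open")
    If Not IsArray(files) Then Exit Sub  ' dialog was cancelled

    Set ws = ActiveSheet
    ws.Cells.ClearContents
    Set fso = CreateObject("Scripting.FileSystemObject")
    nextRow = 1

    For Each f In files
        Set ts = fso.OpenTextFile(f, 1)  ' 1 = ForReading
        linesRead = 0
        Do While Not ts.AtEndOfStream And linesRead < MAX_LINES
            tokens = Split(ts.ReadLine, SEP)
            For i = 0 To UBound(tokens)
                ws.Cells(nextRow, i + 1).Value = tokens(i)
            Next i
            nextRow = nextRow + 1
            linesRead = linesRead + 1
        Loop
        ts.Close
        nextRow = nextRow + 1            ' leave a blank row between files
    Next f
End Sub
```

Counting the lines in a separate variable instead of relying on the stream's Line property keeps the stop condition and the target row independent of each other.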
Related
What I am trying to do is select multiple .txt files, then copy each one into its predetermined cell of the current worksheet, split into columns. Every file has the same structure (10 by 10, for example) and needs to be placed at a certain cell (for example, file_1 at F14, file_2 at X14, etc.), with the same horizontal distance between consecutive files.
What I am missing is a way to copy-paste every one of the selected files into the desired positions.
I tried to work it out by researching online, but I couldn't figure it out.
Thank you in advance for any help.
Here is the code I was working on:
Sub ImportTXTFiles()
Dim fso As Object
Dim xlsheet As Worksheet
Dim qt As QueryTable
Dim txtfilesToOpen As Variant, txtfile As Variant
Application.ScreenUpdating = False
Set fso = CreateObject("Scripting.FileSystemObject")
txtfilesToOpen = Application.GetOpenFilename _
(FileFilter:="Text Files (*.txt), *.txt", _
MultiSelect:=True, Title:="Text Files to Open")
i = 14
j = 6
With ActiveSheet
For Each txtfile In txtfilesToOpen
' WHAT DO I PUT HERE????
j = j + 20
Next txtfile
End With
End Sub
I have this code
Dim FileToOpen As Variant
Dim OpenBook As Workbook
FileToOpen = Application.GetOpenFilename("Text Files (*.txt), *.txt")
If FileToOpen <> False Then
Set OpenBook = Application.Workbooks.Open(FileToOpen)
OpenBook.Sheets(1).UsedRange.Select
Selection.NumberFormat = "#"
OpenBook.Sheets(1).UsedRange.Copy
ThisWorkbook.Worksheets("BOM").Range("C1").PasteSpecial xlPasteValues
OpenBook.Close False
End If
Which is how I tried to automate manual actions of:
Opening a .txt file
Ctrl + a
Ctrl + c
Pasting it in my workbook via VBA code which is irrelevant in this case.
In the end I end up with this kind of table (main workbook in the image below has .NumberFormat = "#"):
https://i.stack.imgur.com/98tiC.png
But when I run it with the code above - I end up with:
https://i.stack.imgur.com/bJahk.png
Ignore the column titles in the row 1.
The problem is that the code above opens the .txt file in a temporary Excel workbook, where the leading zeros are already lost, and then copies the contents from there to my active workbook.
Is there a way around this? I want to automate the manual steps listed above in VBA: show the file-open dialog as it does now, let me choose a .txt file, and bring all of its contents into my active workbook with every leading zero intact. The number of zeros and the length of the strings vary, so padding the zeros back in afterwards is not what I'm looking for.
The issue is that as soon as Excel gets hold of the data, it converts it.
So read the file as plain text, split each line, and write the pieces directly to your target range, formatted as text. That stops Excel from parsing the strings as values; after that you can do whatever you want.
Option Explicit

Sub read_text()
    Dim FileToOpen As Variant
    FileToOpen = Application.GetOpenFilename("Text Files (*.txt), *.txt")

    Dim r_out As Range
    Set r_out = ThisWorkbook.Worksheets("BOM").Range("C1")

    Dim row_offset As Long
    row_offset = 0

    If FileToOpen <> False Then
        Dim fso As Object
        Dim file As Object
        Set fso = CreateObject("Scripting.FileSystemObject")
        Set file = fso.OpenTextFile(FileToOpen, 1)   ' 1 = ForReading

        Dim text_line As String
        Dim line_arr As Variant

        While Not file.AtEndOfStream
            text_line = file.ReadLine
            line_arr = Split(text_line, vbTab)
            If UBound(line_arr) >= 0 Then   ' Split("") returns an empty array, so skip blank lines
                With r_out.Offset(row_offset, 0).Resize(1, UBound(line_arr) + 1)
                    .NumberFormat = "@"   ' text format, so "007" is not turned into 7
                    .Value = line_arr
                End With
            End If
            row_offset = row_offset + 1
        Wend
        file.Close
    End If
End Sub
How to import a Sheet from an external Workbook AND use the Filename (WITHOUT the .datatype at the end) as the new Worksheet name?
By "WITHOUT the .datatype at the end" I mean the following: I could split the file name from the file path with UBound, but when I try to do the same to separate the file name from the file type at the end, it doesn't work and gives me an error. Perhaps I don't understand UBound well enough.
I found this Sub somewhere here on the forum.
But I don't want to import any sheet except the sheet which has the same name as the file itself. So I am not even sure if you need to specify the sheet name.
So I have this Excel file with VBA macros. Its only sheet is called Blank (since I can't have an Excel file without a sheet in it), and I have a UserForm button where I first browse for the file. The sheet in that file should then be imported into my Excel file, replacing the Blank sheet.
Also, it should import ANY sheet from the file path, because the names will always be different.
And also, how do I import the data as CSV?
I have been googling, but I can't see what exactly causes the data to be imported as CSV in other people's solutions.
Sub ImportSheet()
Dim sImportFile As String, sFile As String
Dim sThisBk As Workbook, wbBk As Workbook
Dim vfilename As Variant
Dim wsSht As Worksheet
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Set sThisBk = ActiveWorkbook
sImportFile = Application.GetOpenFilename( _
FileFilter:="Comma Separated Value, *.csv", Title:="Open Workbook")
If sImportFile = "False" Then
MsgBox "No File Selected!"
Exit Sub
Else
vfilename = Split(sImportFile, "\")
sFile = vfilename(UBound(vfilename))
Application.Workbooks.Open Filename:=sImportFile
Set wbBk = Workbooks(sFile)
With wbBk
If SheetExists("GaebTesten.g42_2") Then
Set wsSht = .Sheets("GaebTesten.g42_2")
wsSht.Copy Before:=sThisBk.Sheets("Start")
Else
MsgBox "There is no sheet with name :US in:" & vbCr & .Name
End If
wbBk.Close SaveChanges:=False
End With
End If
Application.ScreenUpdating = True
Application.DisplayAlerts = True
End Sub
Private Function SheetExists(sWSName As String) As Boolean
Dim ws As Worksheet
On Error Resume Next
Set ws = Worksheets(sWSName)
If Not ws Is Nothing Then SheetExists = True
End Function
this is my second post here on stack overflow, and my first question was very dumb, and when I asked my first question, it was my 2nd hour with vba.
I think I am at about 30 hours now and I've learned a lot.
Question: I am doing this Excel Macro in VBA with userform too now. But mostly I google how to do what and I try to implement it WHILE understanding it, I don't just copy and paste code. Often I just do line by line and test it out.
BUT... how do you guys remember all that?
If I had to program the same thing again right now, I won't know how to, because I know how a syntax works, but I wouldn't know which syntax and stuff to actually use to achieve the desired effect...
Does it come from repeating the same things = experience?
Or how do you acquire the abilities to code without googling almost every single thing? When watching youtubers live streaming how they code something, they never look it up on the internet....
Let me present you a different way than pure string manipulation:
Set a new reference to Microsoft Scripting Runtime. This will enable the Scripting namespace. With it you can do things like the following:
sImportFile = "C:\StackFolder\PrintMyName.xlsx"
With New Scripting.FileSystemObject
Debug.Print .GetBaseName(sImportFile)
' Outputs "PrintMyName"
Debug.Print .GetExtensionName(sImportFile)
' Outputs "xlsx"
Debug.Print .GetFileName(sImportFile)
' Outputs "PrintMyName.xlsx"
Debug.Print .GetDriveName(sImportFile)
' Outputs "C:"
Debug.Print .GetParentFolderName(sImportFile)
' Outputs "C:\StackFolder"
End With
You can build a little helper function to give you the part of the file name you need:
Public Function GetFilenameWithoutExtension(ByVal filename as String) as String
With New Scripting.FileSystemObject
GetFilenameWithoutExtension = .GetBaseName(filename)
End With
End Function
and call it: sFile = GetFilenameWithoutExtension(sImportFile)
Regarding the interesting use of UBound in your subroutine, you could even get the filename (without extension) that way - assuming it doesn't contain additional dots:
vfilename = Split(sImportFile, "\")
sFile = vfilename(UBound(vfilename))
SplitName = Split(sFile, ".")
FilenameWithoutExtension = SplitName(UBound(SplitName)-1)
Extension = SplitName(UBound(SplitName))
These are, however, purely academic considerations, and I wouldn't recommend doing it this way.
Here are two ways to extract the workbook name without the file extension, in this case .xlsx. If the extension is always the same, you can just hard-code it; if not, search for the last dot instead.
MsgBox Left(wbBk.Name, Len(wbBk.Name) - 5)
MsgBox Replace(wbBk.Name, ".xlsx", "")
You can refer to the sheet with the same name as the workbook by using something like
wbBk.Sheets(Left(wbBk.Name, Len(wbBk.Name) - 5)).Copy
wbBk.Sheets(Replace(wbBk.Name, ".xlsx", "")).Copy
You can use InStrRev. It is efficient because it searches from the end of the string, which is where the extension is located.
Left$(wbBk.Name, InStrRev(wbBk.Name, ".") - 1)
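The InStrRev idea can be wrapped in a small function that also copes with names that have no extension. This is only a sketch; BaseNameOf is a hypothetical name, not something from the question or from Excel.

```vba
' Hypothetical helper: returns the file name without its last extension.
Public Function BaseNameOf(ByVal fileName As String) As String
    Dim dotPos As Long
    dotPos = InStrRev(fileName, ".")     ' position of the last dot, 0 if none
    If dotPos > 1 Then
        BaseNameOf = Left$(fileName, dotPos - 1)
    Else
        BaseNameOf = fileName            ' no extension (or only a leading dot)
    End If
End Function
```

Because only the last dot is used, a name with additional dots such as GaebTesten.g42_2.csv keeps everything before the final extension. It could then be used as wbBk.Sheets(BaseNameOf(wbBk.Name)), assuming the sheet really carries that name.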
I'm new to VBA. Before posting my question here, I spent almost 3 days searching the Internet.
I have 300+ text files (text converted from PDF using OCR). From each text file I need to get all words that contain both letters and digits (for example KT315A, KT-315-a, etc.), along with the source reference (the .txt file name).
What I need is:
1. add a "smart filter" that copies only words that contain both letters and digits
2. paste the copied data into column A
3. add the source file name to column B
I have found code below that can copy all data from text files into excel spreadsheet.
text files look like
"line from 252A-552A to ddddd, ,,, #,#,rrrr, 22 , ....kt3443 , fff,,,etc"
final result in xls should be
A | B
252A-552A | file1
kt3443 | file1
Option Explicit
Const sPath = "C:\outp\" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab 'for TAB delimited text files
Sub ImportMultipleTextFiles()
Dim wb As Workbook
Dim sFile As String
Dim inputRow As Long
RefreshSheet
On Error Resume Next
sFile = Dir(sPath & "*.txt")
Do Until sFile = ""
inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
'open the text file
'format=6 denotes a text file
Set wb = Workbooks.Open(Filename:=sPath & sFile, _
Format:=6, _
Delimiter:=delim)
'copy and paste
wb.Sheets(1).Range("A1").CurrentRegion.Copy _
Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
wb.Close SaveChanges:=False
'get next text file
sFile = Dir()
Loop
Set wb = Nothing
End Sub
Sub RefreshSheet()
'delete old sheet and add a new one
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Temp").Delete
Application.DisplayAlerts = True
Worksheets.Add
ActiveSheet.Name = "Temp"
On Error GoTo 0
End Sub
thanks!
It's a little tough to tell exactly what constitutes a word from your example. It clearly can contain characters other than letters and numbers (eg the dash), but some of the items have dots preceding, so it cannot be defined as being delimited by a space.
I defined a "word" as a string that
Starts with a letter or digit and ends with a letter or digit
Contains both letters and digits
Might also contain any other non-space characters except a comma
To do this, I first replaced all the commas with spaces, and then applied an appropriate regular expression. However, this might accept undesired strings, so you might need to be more specific in defining exactly what is a word.
Also, instead of reading the entire file into an Excel workbook, by using the FileSystemObject we can process one line at a time, without reading 300 files into Excel. The base folder is set, as you did, by a constant in the VBA code.
But there are other ways to do this.
Be sure to set the references for early binding as noted in the code:
Option Explicit
'Set References to:
' Microsoft Scripting Runtime
' Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim WS As Worksheet, RW As Long
Const sPath As String = "C:\Users\Ron\Desktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
WS.Columns.Clear
Set RE = New RegExp
With RE
.Global = True
.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"
.IgnoreCase = True
End With
For Each FI In FO.Files
If FI.Name Like "*.txt" Then
Set TS = FI.OpenAsTextStream(ForReading)
Do Until TS.AtEndOfStream
'Change .ReadLine to .ReadAll *might* make this run faster
' but would need to be tested.
Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
If MC.Count > 0 Then
For Each M In MC
RW = RW + 1
WS.Cells(RW, 1) = M
WS.Cells(RW, 2) = FI.Name
Next M
End If
Loop
TS.Close
End If
Next FI
End Sub
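To experiment with the pattern used above without touching any files, it can be isolated in a small function. This is a sketch under assumptions: ExtractMixedWords is a made-up name, and late binding via CreateObject is used here (instead of the early binding in the answer) so that no reference has to be set.

```vba
' Sketch: returns the "words" in one line that contain both letters and digits.
Public Function ExtractMixedWords(ByVal textLine As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"

    ' Commas are turned into spaces first, as in the answer's code
    For Each m In re.Execute(Replace(textLine, ",", " "))
        result.Add m.Value
    Next m
    Set ExtractMixedWords = result
End Function
```

Calling it with the sample line from the question and printing each item with Debug.Print is a quick way to check whether the definition of a "word" matches what you expect (the question asks for 252A-552A and kt3443) before running it over 300 files.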
I am trying to extract the data from a PDF document into a worksheet. The PDFs show and text can be manually copied and pasted into the Excel document.
I am currently doing this through SendKeys and it is not working. I get an error when I try to paste the data from the PDF document. Why is my paste not working? If I paste after the macro has stopped running it pastes as normal.
Dim myPath As String, myExt As String
Dim ws As Worksheet
Dim openPDF As Object
'Dim pasteData As MSForms.DataObject
Dim fCell As Range
'Set pasteData = New MSForms.DataObject
Set ws = Sheets("DATA")
If ws.Cells(ws.Rows.Count, "A").End(xlUp).Row > 1 Then Range("A3:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row).ClearContents
myExt = "\*.pdf"
'When Scan Receipts Button Pressed Scan the selected folder/s for receipts
For Each fCell In Range(ws.Cells(1, 1), ws.Cells(1, ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column))
myPath = Dir(fCell.Value & myExt)
Do While myPath <> ""
myPath = fCell.Value & "\" & myPath
Set openPDF = CreateObject("Shell.Application")
openPDF.Open (myPath)
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^a"
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^c"
'Application.Wait Now + TimeValue("00:00:2")
ws.Select
ActiveSheet.Paste
'pasteData.GetFromClipboard
'ws.Cells(3, 1) = pasteData.GetText
Exit Sub
myPath = Dir
Loop
Next fCell
You can open the PDF file and extract its contents using the Adobe library (which I believe you can download from Adobe as part of the SDK, but it comes with certain versions of Acrobat as well)
Make sure to add the Library to your references too (On my machine it is the Adobe Acrobat 10.0 Type Library, but not sure if that is the newest version)
Even with the Adobe library it is not trivial (you'll need to add your own error-trapping etc):
Function getTextFromPDF(ByVal strFilename As String) As String
Dim objAVDoc As New AcroAVDoc
Dim objPDDoc As New AcroPDDoc
Dim objPage As AcroPDPage
Dim objSelection As AcroPDTextSelect
Dim objHighlight As AcroHiliteList
Dim pageNum As Long, tCount As Long
Dim strText As String
strText = ""
If objAVDoc.Open(strFilename, "") Then
Set objPDDoc = objAVDoc.GetPDDoc
For pageNum = 0 To objPDDoc.GetNumPages() - 1
Set objPage = objPDDoc.AcquirePage(pageNum)
Set objHighlight = New AcroHiliteList
objHighlight.Add 0, 10000 ' Adjust this up if it's not getting all the text on the page
Set objSelection = objPage.CreatePageHilite(objHighlight)
If Not objSelection Is Nothing Then
For tCount = 0 To objSelection.GetNumText - 1
strText = strText & objSelection.GetText(tCount)
Next tCount
End If
Next pageNum
objAVDoc.Close 1
End If
getTextFromPDF = strText
End Function
What this does is essentially the same thing you are trying to do - only using Adobe's own library. It's going through the PDF one page at a time, highlighting all of the text on the page, then dropping it (one text element at a time) into a string.
Keep in mind what you get from this could be full of all kinds of non-printing characters (line feeds, newlines, etc) that could even end up in the middle of what look like contiguous blocks of text, so you may need additional code to clean it up before you can use it.
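As a rough idea of what such clean-up code might look like, the sketch below replaces control characters with spaces and collapses repeated spaces. CleanExtractedText is a hypothetical helper, and treating every line break as a plain space is an assumption that may not suit text where line structure matters.

```vba
' Hypothetical helper: replaces control characters with spaces
' and collapses runs of spaces into one.
Public Function CleanExtractedText(ByVal raw As String) As String
    Dim i As Long, code As Long
    Dim ch As String, out As String

    For i = 1 To Len(raw)
        ch = Mid$(raw, i, 1)
        code = AscW(ch)
        If code >= 0 And code < 32 Then ch = " "   ' tab, CR, LF and other control codes
        out = out & ch
    Next i

    Do While InStr(out, "  ") > 0
        out = Replace(out, "  ", " ")
    Loop
    CleanExtractedText = Trim$(out)
End Function
```

It could be applied to the function's result, for example CleanExtractedText(getTextFromPDF(somePath)), where somePath stands for whatever file you are reading.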
Hope that helps!
I know this is an old issue but I just had to do this for a project at work, and I am very surprised that nobody has thought of this solution yet:
Just open the .pdf with Microsoft word.
The code is a lot easier to work with when you are trying to extract data from a .docx because it opens in Microsoft Word. Excel and Word play well together because they are both Microsoft programs. In my case, the file in question had to be a .pdf file. Here's the solution I came up with:
Set Microsoft Word as the default program for opening .pdf files.
The first time you open a .pdf file with Word, a dialog box pops up saying that Word will need to convert the .pdf into a .docx file. Tick the check box in the bottom left labeled "do not show this message again" and then click OK.
Create a macro that extracts data from a .docx file. I used MikeD's Code as a resource for this.
Tinker around with the MoveDown, MoveRight, and Find.Execute methods to fit the need of your task.
Yes you could just convert the .pdf file to a .docx file but this is a much simpler solution in my opinion.
Over time, I have found that extracting text from PDFs in a structured format is tough business. However if you are looking for an easy solution, you might want to consider XPDF tool pdftotext.
Pseudocode to extract the text would include:
Using SHELL VBA statement to extract the text from PDF to a temporary file using XPDF
Using sequential file read statements to read the temporary file contents into a string
Pasting the string into Excel
Simplified example below:
Sub ReadIntoExcel(PDFName As String)
    'Convert PDF to text (the path is quoted in case it contains spaces).
    'Note: Shell does not wait for pdftotext to finish, so for large PDFs
    'the temporary file may not be complete when it is opened below.
    Shell "C:\Utils\pdftotext.exe -layout """ & PDFName & """ tempfile.txt"
    'Read in the text file and write to Excel
    Dim TextLine As String
    Dim RowNumber As Long
    Dim F1 As Integer
    RowNumber = 1
    F1 = FreeFile()
    Open "tempfile.txt" For Input As #F1
    While Not EOF(F1)
        Line Input #F1, TextLine
        ThisWorkbook.Worksheets(1).Cells(RowNumber, 1).Value = TextLine
        RowNumber = RowNumber + 1
    Wend
    Close #F1
End Sub
Since I prefer not to rely on external libraries and/or other programs, I have extended your solution so that it works.
The actual change here is using the GetFromClipboard method of MSForms.DataObject instead of Paste, which is mainly used to paste a range of cells.
Of course, the downside is that the user must not change focus or intervene during the whole process.
Dim pathPDF As String, textPDF As String
Dim openPDF As Object
Dim objPDF As MsForms.DataObject
pathPDF = "C:\some\path\data.pdf"
Set openPDF = CreateObject("Shell.Application")
openPDF.Open (pathPDF)
'TIME TO WAIT BEFORE/AFTER COPY AND PASTE SENDKEYS
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^a"
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^c"
Application.Wait Now + TimeValue("00:00:1")
AppActivate ActiveWorkbook.Windows(1).Caption
objPDF.GetFromClipboard
textPDF = objPDF.GetText(1)
MsgBox textPDF
If you're interested see my project in github.
Copying and pasting by emulating user interaction can be unreliable (for example, a popup may appear and steal the focus). You may be interested in trying the commercial ByteScout PDF Extractor SDK, which is specifically designed to extract data from PDFs and works from VBA. It is also capable of extracting data from invoices and tables as CSV using VB code.
Here is the VBA code for Excel to extract text from given locations and save them into cells in the Sheet1:
Private Sub CommandButton1_Click()
' Create TextExtractor object
' Set extractor = CreateObject("Bytescout.PDFExtractor.TextExtractor")
Dim extractor As New Bytescout_PDFExtractor.TextExtractor
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"
' Load sample PDF document
extractor.LoadDocumentFromFile ("c:\sample1.pdf")
' Get page count
pageCount = extractor.GetPageCount()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Sheet1")
For i = 0 To pageCount - 1
RectLeft = 10
RectTop = 10
RectWidth = 100
RectHeight = 100
' check the same text is extracted from returned coordinates
extractor.SetExtractionArea RectLeft, RectTop, RectWidth, RectHeight
' extract text from given area
extractedText = extractor.GetTextFromPage(i)
' insert rows
' Rows(1).Insert shift:=xlShiftDown
' write cell value
Set TxtRng = ws.Range("A" & CStr(i + 2))
TxtRng.Value = extractedText
Next
Set extractor = Nothing
End Sub
Disclosure: I am related to ByteScout
Using Bytescout PDF Extractor SDK is a good option. It is cheap and gives plenty of PDF related functionality. One of the answers above points to the dead page Bytescout on GitHub. I am providing a relevant working sample to extract table from PDF. You may use it to export in any format.
Set extractor = CreateObject("Bytescout.PDFExtractor.StructuredExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"
' Load sample PDF document
extractor.LoadDocumentFromFile "../../sample3.pdf"
For ipage = 0 To extractor.GetPageCount() - 1
' starting extraction from page #"
extractor.PrepareStructure ipage
rowCount = extractor.GetRowCount(ipage)
For row = 0 To rowCount - 1
columnCount = extractor.GetColumnCount(ipage, row)
For col = 0 To columnCount-1
WScript.Echo "Cell at page #" +CStr(ipage) + ", row=" & CStr(row) & ", column=" & _
CStr(col) & vbCRLF & extractor.GetCellValue(ipage, row, col)
Next
Next
Next
Many more samples available here: https://github.com/bytescout/pdf-extractor-sdk-samples
To get Slinky Sloth's solution to work, I had to add this before the call to GetFromClipboard:
Set objPDF = New MSForms.DataObject
Sadly, it didn't work for a 10-page PDF.
This doesn't seem to work with the Adobe Type library. As soon as it gets to Open, I get a 429 error. Acrobat works fine though...