Excel - Replace Google Drive hyperlinks with local folder paths

I have an Excel document with Google Drive hyperlinks to photos. I want to change these to link to local copies of the photos that I downloaded into a folder. Is this possible without doing it manually?
Hyperlink : https://drive.google.com/open?id=1yCSptfKRkbkN39Lkbz2yXLM0CI332_DC
Image name : _storage_emulated_0_odk_instances_CASA_2018-06-22_15-29-52_1529678182622.jpg

It looks to me like you are using shareable links from Google drive - this means that the filename of the images is not visible in the link, and hence you need to uncover the filename by opening the link. We can do this in VBA by invoking a browser object, here, using Internet Explorer:
Sub GetFileName()
Dim ie As Object
Set ie = CreateObject("Internetexplorer.Application")
ie.Navigate "https://drive.google.com/open?id=1yCSptfKRkbkN39Lkbz2yXLM0CI332_DC"
While ie.busy = True 'Allow the website to load
Application.Wait (Now + TimeValue("0:00:01"))
Wend
Debug.Print (ie.Document.Title)
ie.Quit
End Sub
This gets us the filename /storage/emulated/0/odk/instances/CASA_2018-06-22_15-29-52/1529678182622.jpg for the link you have. As you have stated that the filename on your computer is _storage_emulated_0_odk_instances_CASA_2018-06-22_15-29-52_1529678182622.jpg, we replace / with _ using the Replace function. We will also need to remove the " - Google Disk" text from the end of the page title (this suffix may differ with your browser language, so check what Debug.Print shows on your machine):
Sub GetFileName()
Dim ie As Object
Dim fname As String 'Saving filename as string for later use
Set ie = CreateObject("Internetexplorer.Application")
ie.Navigate "https://drive.google.com/open?id=1yCSptfKRkbkN39Lkbz2yXLM0CI332_DC"
While ie.busy = True 'Allow the website to load the image (wait for 1 second if browser is busy)
Application.Wait (Now + TimeValue("0:00:01"))
Wend
fname = ie.Document.Title
ie.Quit
fname = Replace(fname, "/", "_") 'Changing filename to fit your local file
fname = Replace(fname, " - Google Disk", "") 'Removing the additional text from the filename
Debug.Print (fname)
End Sub
Now that we have that working, we can loop through the region in your excel sheet where you have the hyperlinks saved. We will also make sure that Excel recognises the paths to the local file as a hyperlink, using Hyperlinks.Add:
Sub GetFileName()
Dim ie As Object
Dim fname As String, wlink As String, lpath As String
Dim i As Long 'Row counter for the loop below
lpath = "C:\Users\LocalAccount\Downloads\" 'The folder where you have the images saved
Set ie = CreateObject("Internetexplorer.Application")
For i = 1 To 10 'Replace 1 and 10 with your first and last row of hyperlinks
wlink = Cells(i, 2).Value 'My links are in column B, hence "2". Change this to fit your sheet (1 for column A, 3 for Column C, etc.)
ie.Navigate wlink
While ie.busy = True 'Allow the website to load the image (wait for 1 second if browser is busy)
Application.Wait (Now + TimeValue("0:00:01"))
Wend
fname = ie.Document.Title
fname = Replace(fname, "/", "_")
fname = Replace(fname, " - Google Disk", "") 'Removing the additional text from the filename
fname = lpath + fname
Cells(i, 2).Value = fname 'Replaces the hyperlink with the local filename
Cells(i, 2).Hyperlinks.Add Cells(i, 2), Cells(i, 2).Value
Next i
ie.Quit
End Sub
This should solve your problem - let me know if you have any troubles.
PS: Remember to set the lpath variable to the folder path where you have the local images
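The two Replace steps can also be pulled into a small function, which makes them easy to try in the Immediate window before touching the sheet. This is only a sketch: TitleToLocalName and its suffix parameter are made-up names that do not appear in the answer above, and the default suffix is simply the " - Google Disk" text seen in this thread.

```vba
' Hypothetical helper (not part of the answer above): turns the page title
' returned by ie.Document.Title into the flattened local file name.
Function TitleToLocalName(ByVal pageTitle As String, _
                          Optional ByVal suffix As String = " - Google Disk") As String
    Dim s As String
    s = pageTitle
    ' Strip the suffix only when it really sits at the end of the title
    If Len(suffix) > 0 And Len(s) >= Len(suffix) Then
        If StrComp(Right$(s, Len(suffix)), suffix, vbTextCompare) = 0 Then
            s = Left$(s, Len(s) - Len(suffix))
        End If
    End If
    ' Drive shows the original path with "/", the downloaded file uses "_"
    TitleToLocalName = Replace(s, "/", "_")
End Function
```

In the loop, fname = lpath & TitleToLocalName(ie.Document.Title) would then replace the three fname lines. Unlike the plain Replace call, this version only removes the suffix at the end of the title, so a file name that happens to contain the same text elsewhere is left alone.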

Related

Open PDF and save specific pages as new PDF

I have a PDF that I need to mail out each day. I have a macro set up in my excel file that updates a table of data that goes in the body of the email and then it opens up this PDF file and saves 4 pages as a PDF and attaches it to the email that I send.
The problem is, sendkeys really isn't that reliable and I'd like to use something else or have it just silently open and save those specific pages as a new pdf in my temp folder. Any ideas would be appreciated!
Option Explicit
Public Sub Print_All_PDF_Files_in_Folder()
On Error Resume Next
Kill "C:\temp\S4 Region.pdf"
On Error GoTo 0
Dim folder As String
Dim PDFfilename As String
folder = "location of pdf" 'CHANGE AS REQUIRED
If Right(folder, 1) <> "\" Then folder = folder & "\"
PDFfilename = Dir(folder & "S4 Reg" & "*.pdf", vbNormal)
While Len(PDFfilename) <> 0
Print_PDF folder & PDFfilename
PDFfilename = Dir() ' Get next matching file
Wend
Call ClosePDF
End Sub
Private Sub Print_PDF(sPDFfile As String)
Shell "C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe /p " & Chr(34) & sPDFfile & Chr(34)
SendKeys "p"
SendKeys "%g"
SendKeys "{tab}"
SendKeys "5,9,14,15"
SendKeys "%r"
SendKeys "{down 2}"
Application.Wait DateAdd("s", 10, Now)
SendKeys "{enter}"
Application.Wait DateAdd("s", 15, Now)
SendKeys "{tab 6}"
SendKeys "{enter}"
SendKeys "C:\temp"
SendKeys "%s"
Application.Wait DateAdd("s", 10, Now)
End Sub
Sub ClosePDF()
Dim Process As Object, intError As Integer
For Each Process In GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2").ExecQuery("select * from win32_process where name='AcroRd32.exe'")
intError = Process.Terminate 'Terminates a process and all of its threads.
If intError <> 0 Then Exit For 'Return value is 0 for success. Any other number is an error.
Next
End Sub
"silently open and save those specific pages as a new pdf in my temp folder."
You need to use the Workbook.ExportAsFixedFormat method which is used to publish a workbook to either the PDF or XPS format. Pay attention to the following parameters:
From - The number of the page at which to start publishing. If this argument is omitted, publishing starts at the beginning.
To - The number of the last page to publish. If this argument is omitted, publishing ends with the last page.
OpenAfterPublish - If set to True, displays the file in the viewer after it is published. If set to False, the file is published but not displayed.
ActiveWorkbook.ExportAsFixedFormat Type:=xlTypePDF, Filename:="sales.pdf", Quality:=xlQualityStandard, From:=2, To:=4, OpenAfterPublish:=True

Can't embed movie posters next to movie names in a spreadsheet

I've written a script in VBA to download movie posters from a torrent site and embed them in a spreadsheet next to the corresponding movie names. My script can parse the movie names into a spreadsheet and download and save the images in a folder. What I can't do is place the downloaded images next to each movie name in the spreadsheet.
How can I place each movie poster in the cell next to its movie name?
My script so far:
Sub DownloadAndEmbedImages()
'customized directory location within double quotes
Const strPath$ = "C:\Users\WCS\Desktop\Test\"
Dim Http As New XMLHTTP60, Html As New HTMLDocument
Dim post As Object, imgArr As Variant, R&
'check out if the folder is empty. If not empty, delete them to download anew
If Dir(strPath & "*.*") <> "" Then Kill strPath & "*.*"
With Http
.Open "GET", "https://yts.am/browse-movies", False
.send
Html.body.innerHTML = .responseText
End With
For Each post In Html.getElementsByClassName("img-responsive")
R = R + 1: Sheets("Sheet1").Cells(R, 1) = post.alt
imgArr = Split(post.src, "/")
imgArr = imgArr(UBound(imgArr) - 1) & ".jpg"
Http.Open "GET", post.src, False
Http.send
With CreateObject("ADODB.Stream")
.Open
.Type = 1
.write Http.responseBody
.SaveToFile (strPath & imgArr)
.Close
End With
Next post
End Sub
References to add to run the above script:
Microsoft XML, v6.0
Microsoft HTML Object library
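One possible way to place a saved image next to its movie name is Excel's Shapes.AddPicture method, which takes a file path plus a position and size in points. The sketch below is an assumption about how that could be wired in, not a tested answer from this thread: EmbedImageInRow is a made-up name, column B and the row height of 120 points are arbitrary choices, and the picture is stretched to the cell size rather than keeping its aspect ratio.

```vba
' Hypothetical helper, not part of the script above.
' Places the image file at imgPath into column B of the given row,
' next to the movie name that the script writes into column A.
Sub EmbedImageInRow(ByVal ws As Worksheet, ByVal rowNum As Long, ByVal imgPath As String)
    Dim target As Range
    Set target = ws.Cells(rowNum, 2)
    ws.Rows(rowNum).RowHeight = 120      ' arbitrary height in points; adjust to taste
    ' LinkToFile:=msoFalse and SaveWithDocument:=msoTrue store a copy of the
    ' picture inside the workbook instead of linking to the file on disk
    ws.Shapes.AddPicture Filename:=imgPath, _
        LinkToFile:=msoFalse, SaveWithDocument:=msoTrue, _
        Left:=target.Left, Top:=target.Top, _
        Width:=target.Width, Height:=target.Height
End Sub
```

Inside the For Each loop it could be called right after the stream is closed, for example EmbedImageInRow Sheets("Sheet1"), R, strPath & imgArr.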

VBE macro that creates shortcuts which include the author of the linked file as a property

This is my first question, so I would love to improve my style. Just tell me if I am doing something completely wrong.
My question:
I am searching for files with specific extensions. All results get printed to Excel, and then shortcuts to each file are created and stored in a folder. This works perfectly fine for now, but I need the shortcut to include the author detail so that I can filter all entries (hundreds to thousands) by it.
The result should be a shortcut with the same properties that you get when using 'Create shortcut' from the right-click context menu.
I hope you can help me, since I have been trying to get this to work for a while now.
If you know a solution that does what I need but is written in a different language, that is fine for me as long as the user does not have to install runtimes/libraries (sorry, I am a complete beginner).
My code:
'This function searches for files with endings (ppt,pptx,pptm) and pastes the found entries into the excel sheet
Function Recurse(sPath As String) As String
Dim FSO As New FileSystemObject
Dim myFolder As Folder
Dim mySubFolder As Folder
Dim myFile As File
Set myFolder = FSO.GetFolder(sPath)
Set Extensions = CreateObject("Scripting.Dictionary")
Extensions.CompareMode = 1 ' make lookups case-insensitive
'Extensions.Add Range("C5").Value, True
Extensions.Add "pptx", True
Extensions.Add "ppt", True
Extensions.Add "pptm", True
For Each mySubFolder In myFolder.SubFolders
For Each myFile In mySubFolder.Files
'
i = Range("D4").Value
If Extensions.Exists(FSO.GetExtensionName(myFile)) Then
Cells(8 + i, 3).Value = myFile.Name
Cells(8 + i, 4).Value = myFile.Path
i = i + 1
Range("D4").Value = i 'storing number of entrys found
'Exit For
End If
Next
Recurse = Recurse(mySubFolder.Path)
Next
End Function
'This Function creates a folder with the name "A1" if it does not exist already
Function PathExist(ByVal vPfadName As String) As Boolean
scutPath = Application.ActiveWorkbook.Path & "\" & Range("A1").Value
On Error GoTo ErrorPathExist
ChDir (vPfadName)
PathExist = True
Exit Function
ErrorPathExist:
MkDir scutPath
End Function
'Main Sub that clears the table and uses the found entries to create shortcuts. Unfortunately the author is not included when doing it this way. The author is needed to filter through hundreds of results.
Sub TestR()
Range("B8:C999999") = ""
Range("D4").Value = 0
Call Recurse(Application.ActiveWorkbook.Path)
i = 1
scutPath = Application.ActiveWorkbook.Path & "\" & Range("A1").Value
Call PathExist(scutPath)
For i = 1 To 200 '(last line)
Set oWSH = CreateObject("WScript.Shell")
Set oShortcut = oWSH.CreateShortCut(scutPath & "\" & Cells(7 + i, 3).Value & ".lnk")
With oShortcut
.TargetPath = Cells(7 + i, 4).Value
.Save
End With
Set oWSH = Nothing
Next i
MsgBox "Done"
End Sub

Extract Data from PDF and Add to Worksheet

I am trying to extract the data from a PDF document into a worksheet. The PDFs show and text can be manually copied and pasted into the Excel document.
I am currently doing this through SendKeys and it is not working. I get an error when I try to paste the data from the PDF document. Why is my paste not working? If I paste after the macro has stopped running it pastes as normal.
Dim myPath As String, myExt As String
Dim ws As Worksheet
Dim openPDF As Object
'Dim pasteData As MSForms.DataObject
Dim fCell As Range
'Set pasteData = New MSForms.DataObject
Set ws = Sheets("DATA")
If ws.Cells(ws.Rows.Count, "A").End(xlUp).Row > 1 Then Range("A3:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row).ClearContents
myExt = "\*.pdf"
'When Scan Receipts Button Pressed Scan the selected folder/s for receipts
For Each fCell In Range(ws.Cells(1, 1), ws.Cells(1, ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column))
myPath = Dir(fCell.Value & myExt)
Do While myPath <> ""
myPath = fCell.Value & "\" & myPath
Set openPDF = CreateObject("Shell.Application")
openPDF.Open (myPath)
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^a"
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^c"
'Application.Wait Now + TimeValue("00:00:2")
ws.Select
ActiveSheet.Paste
'pasteData.GetFromClipboard
'ws.Cells(3, 1) = pasteData.GetText
Exit Sub
myPath = Dir
Loop
Next fCell
You can open the PDF file and extract its contents using the Adobe library (which I believe you can download from Adobe as part of the SDK, but it comes with certain versions of Acrobat as well)
Make sure to add the Library to your references too (On my machine it is the Adobe Acrobat 10.0 Type Library, but not sure if that is the newest version)
Even with the Adobe library it is not trivial (you'll need to add your own error-trapping etc):
Function getTextFromPDF(ByVal strFilename As String) As String
Dim objAVDoc As New AcroAVDoc
Dim objPDDoc As New AcroPDDoc
Dim objPage As AcroPDPage
Dim objSelection As AcroPDTextSelect
Dim objHighlight As AcroHiliteList
Dim pageNum As Long
Dim strText As String
strText = ""
If objAVDoc.Open(strFilename, "") Then
Set objPDDoc = objAVDoc.GetPDDoc
For pageNum = 0 To objPDDoc.GetNumPages() - 1
Set objPage = objPDDoc.AcquirePage(pageNum)
Set objHighlight = New AcroHiliteList
objHighlight.Add 0, 10000 ' Adjust this up if it's not getting all the text on the page
Set objSelection = objPage.CreatePageHilite(objHighlight)
If Not objSelection Is Nothing Then
For tCount = 0 To objSelection.GetNumText - 1
strText = strText & objSelection.GetText(tCount)
Next tCount
End If
Next pageNum
objAVDoc.Close 1
End If
getTextFromPDF = strText
End Function
What this does is essentially the same thing you are trying to do - only using Adobe's own library. It's going through the PDF one page at a time, highlighting all of the text on the page, then dropping it (one text element at a time) into a string.
Keep in mind what you get from this could be full of all kinds of non-printing characters (line feeds, newlines, etc) that could even end up in the middle of what look like contiguous blocks of text, so you may need additional code to clean it up before you can use it.
Hope that helps!
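As a rough illustration of the clean-up step mentioned above, the following sketch replaces control characters with spaces and collapses runs of spaces. CleanExtractedText is a made-up name, and treating every control character (including line breaks) as a plain space is an assumption; if you need to keep line breaks you would have to special-case them.

```vba
' Hypothetical clean-up helper for text returned by getTextFromPDF.
Function CleanExtractedText(ByVal raw As String) As String
    Dim i As Long, code As Long
    Dim ch As String, buf As String
    For i = 1 To Len(raw)
        ch = Mid$(raw, i, 1)
        code = AscW(ch)
        ' AscW returns negative values for characters above &H7FFF,
        ' so the lower bound is tested explicitly
        If (code >= 0 And code < 32) Or code = 127 Or code = 160 Then
            buf = buf & " "              ' control character or non-breaking space
        Else
            buf = buf & ch
        End If
    Next i
    ' Collapse runs of spaces left behind by the replacements
    Do While InStr(buf, "  ") > 0
        buf = Replace(buf, "  ", " ")
    Loop
    CleanExtractedText = Trim$(buf)
End Function
```

For example, CleanExtractedText("Total:" & vbCrLf & vbTab & "42 " & Chr(0) & "EUR") returns "Total: 42 EUR".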
I know this is an old issue but I just had to do this for a project at work, and I am very surprised that nobody has thought of this solution yet:
Just open the .pdf with Microsoft Word.
The code is a lot easier to work with when you are trying to extract data from a .docx because it opens in Microsoft Word. Excel and Word play well together because they are both Microsoft programs. In my case, the file in question had to be a .pdf file. Here's the solution I came up with:
Set Microsoft Word as the default program for opening .pdf files
The first time you open a .pdf file with Word, a dialog box pops up saying that Word will need to convert the .pdf into a .docx file. Tick the check box in the bottom left stating "do not show this message again" and then click OK.
Create a macro that extracts data from a .docx file. I used MikeD's Code as a resource for this.
Tinker around with the MoveDown, MoveRight, and Find.Execute methods to fit the need of your task.
Yes you could just convert the .pdf file to a .docx file but this is a much simpler solution in my opinion.
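The steps above can also be driven from Excel by automating Word directly instead of changing the default program. The sketch below is an assumption about how that might look, not the poster's code: GetPdfTextViaWord is a made-up name, it relies on a Word version that can open PDFs (Word 2013 or later), and whether the conversion prompt is fully suppressed may depend on your Word version and settings.

```vba
' Hypothetical helper: opens a PDF in a hidden Word instance and returns
' the converted document's text. Returns "" if Word cannot open the file.
Function GetPdfTextViaWord(ByVal pdfPath As String) As String
    Dim wordApp As Object, doc As Object
    Set wordApp = CreateObject("Word.Application")   ' late binding, no reference needed
    wordApp.Visible = False
    wordApp.DisplayAlerts = 0                        ' 0 = wdAlertsNone
    On Error GoTo CleanUp                            ' any failure falls through to clean-up
    Set doc = wordApp.Documents.Open(Filename:=pdfPath, _
                                     ConfirmConversions:=False, ReadOnly:=True)
    GetPdfTextViaWord = doc.Content.Text
CleanUp:
    If Not doc Is Nothing Then doc.Close SaveChanges:=False
    wordApp.Quit
End Function
```

The returned string can then be searched with InStr or split into lines, which avoids the MoveDown/MoveRight cursor navigation for simple cases.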
Over time, I have found that extracting text from PDFs in a structured format is tough business. However if you are looking for an easy solution, you might want to consider XPDF tool pdftotext.
Pseudocode to extract the text would include:
Using SHELL VBA statement to extract the text from PDF to a temporary file using XPDF
Using sequential file read statements to read the temporary file contents into a string
Pasting the string into Excel
Simplified example below:
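Sub ReadIntoExcel(PDFName As String)
'Convert PDF to text. WScript.Shell's Run with bWaitOnReturn = True blocks until
'pdftotext has finished; the built-in Shell function would return immediately
Dim Cmd As String
Cmd = """C:\Utils\pdftotext.exe"" -layout """ & PDFName & """ tempfile.txt"
CreateObject("WScript.Shell").Run Cmd, 0, True
'Read in the text file and write to Excel
Dim TextLine As String
Dim RowNumber As Long
Dim F1 As Integer
RowNumber = 1
F1 = FreeFile()
Open "tempfile.txt" For Input As #F1
While Not EOF(F1) 'EOF takes the file number without the # prefix
Line Input #F1, TextLine
ThisWorkbook.Worksheets(1).Cells(RowNumber, 1).Value = TextLine
RowNumber = RowNumber + 1
Wend
Close #F1
End Sub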
Since I do not prefer to rely on external libraries and/or other programs, I have extended your solution so that it works.
The actual change here is using the GetFromClipboard function instead of Paste which is mainly used to paste a range of cells.
Of course, the downside is that the user must not change focus or intervene during the whole process.
Dim pathPDF As String, textPDF As String
Dim openPDF As Object
Dim objPDF As MsForms.DataObject
pathPDF = "C:\some\path\data.pdf"
Set openPDF = CreateObject("Shell.Application")
openPDF.Open (pathPDF)
'TIME TO WAIT BEFORE/AFTER COPY AND PASTE SENDKEYS
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^a"
Application.Wait Now + TimeValue("00:00:2")
SendKeys "^c"
Application.Wait Now + TimeValue("00:00:1")
AppActivate ActiveWorkbook.Windows(1).Caption
objPDF.GetFromClipboard
textPDF = objPDF.GetText(1)
MsgBox textPDF
If you're interested see my project in github.
Copying and pasting by emulating user interaction can be unreliable (for example, a popup may appear and steal the focus). You may be interested in trying the commercial ByteScout PDF Extractor SDK, which is specifically designed to extract data from PDFs and works from VBA. It is also capable of extracting data from invoices and tables as CSV using VB code.
Here is the VBA code for Excel to extract text from given locations and save them into cells in the Sheet1:
Private Sub CommandButton1_Click()
' Create TextExtractor object
' Set extractor = CreateObject("Bytescout.PDFExtractor.TextExtractor")
Dim extractor As New Bytescout_PDFExtractor.TextExtractor
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"
' Load sample PDF document
extractor.LoadDocumentFromFile ("c:\sample1.pdf")
' Get page count
pageCount = extractor.GetPageCount()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Sheet1")
For i = 0 To pageCount - 1
RectLeft = 10
RectTop = 10
RectWidth = 100
RectHeight = 100
' check the same text is extracted from returned coordinates
extractor.SetExtractionArea RectLeft, RectTop, RectWidth, RectHeight
' extract text from given area
extractedText = extractor.GetTextFromPage(i)
' insert rows
' Rows(1).Insert shift:=xlShiftDown
' write cell value
Set TxtRng = ws.Range("A" & CStr(i + 2))
TxtRng.Value = extractedText
Next
Set extractor = Nothing
End Sub
Disclosure: I am related to ByteScout
Using Bytescout PDF Extractor SDK is a good option. It is cheap and gives plenty of PDF related functionality. One of the answers above points to the dead page Bytescout on GitHub. I am providing a relevant working sample to extract table from PDF. You may use it to export in any format.
Set extractor = CreateObject("Bytescout.PDFExtractor.StructuredExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"
' Load sample PDF document
extractor.LoadDocumentFromFile "../../sample3.pdf"
For ipage = 0 To extractor.GetPageCount() - 1
' starting extraction from page #"
extractor.PrepareStructure ipage
rowCount = extractor.GetRowCount(ipage)
For row = 0 To rowCount - 1
columnCount = extractor.GetColumnCount(ipage, row)
For col = 0 To columnCount-1
WScript.Echo "Cell at page #" +CStr(ipage) + ", row=" & CStr(row) & ", column=" & _
CStr(col) & vbCRLF & extractor.GetCellValue(ipage, row, col)
Next
Next
Next
Many more samples available here: https://github.com/bytescout/pdf-extractor-sdk-samples
To improve Slinky Sloth's solution, I had to add this before getting the text from the clipboard:
Set objPDF = New MSForms.DataObject
Sadly, it didn't work for a PDF of 10 pages.
This doesn't seem to work with the Adobe Type library. As soon as it gets to Open, I get a 429 error. Acrobat works fine though...

Copying data from multiple pdf files

I have pdf files from which I would like to copy all the data to a column in a spreadsheet.
Here is the code I have. All it does is open the pdf, use control-a, then control-c to copy then activates the workbook, finds an open column and pastes the data with a control-v Sendkey.
I have a range with path names it opens and copies data from all but only pastes the last one.
Sub StartAdobe1()
Dim AdobeApp As String
Dim AdobeFile As String
Dim StartAdobe
Dim fname As Variant
Dim iRow As Integer
Dim Filename As String
For Each fname In Range("path")
AdobeApp = "C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe"
StartAdobe = Shell("" & AdobeApp & " " & fname & "", 1)
Application.Wait Now + TimeValue("00:00:01")
SendKeys "^a", True
Application.Wait Now + TimeValue("00:00:01")
SendKeys "^c"
Application.Wait Now + TimeValue("00:00:01")
SendKeys ("%{F4}")
Windows("transfer (Autosaved).xlsm").Activate
Worksheets("new").Activate
ActiveSheet.Range("A1").Select
Selection.End(xlToRight).Offset(0, 1).Select
SendKeys "^v"
Application.Wait Now + TimeValue("00:00:2")
Next fname
Jeanno's right, if you have Acrobat then using its API library to work with the file directly is much better than the workarounds. I use this every day to convert pdf files into database entries.
Your code has a few problems, but I suspect the biggest issue is the use of SendKeys "^v" to paste into Excel. You're better off selecting the cell you want and then using ActiveSheet.Paste (the Range returned by Selection has no Paste method). Or even better, transfer the contents of the clipboard to a variable, then parse it out as needed on the backend before writing to your spreadsheet--but that adds a bunch of complexity and doesn't help you a lot in this case.
To use the code below, be sure to select your 'Acrobat x.x Type Library' under Tools>References.
Sub StartAdobe1()
Dim fName As Variant
Dim wbTransfer As Excel.Workbook
Dim wsNew As Excel.Worksheet
Dim dOpenCol As Double
Dim oPDFApp As AcroApp
Dim oAVDoc As AcroAVDoc
Dim oPDDoc As AcroPDDoc
'Define your spreadsheet
Set wbTransfer = Workbooks("transfer (Autosaved).xlsm")
Set wsNew = wbTransfer.Sheets("new")
'Find first open column
dOpenCol = wsNew.Cells(1, wsNew.Columns.Count).End(xlToLeft).Column + 1
'Instantiate Acrobat Objects
Set oPDFApp = CreateObject("AcroExch.App")
Set oAVDoc = CreateObject("AcroExch.AVDoc")
Set oPDDoc = CreateObject("AcroExch.PDDoc")
For Each fName In Range("path")
'Open the PDF file. The AcroAVDoc.Open function returns a true/false
'to tell you if it worked
If oAVDoc.Open(fName.Text, "") = True Then
Set oPDDoc = oAVDoc.GetPDDoc
Else
Debug.Assert False
End If
'Copy all using Acrobat menu
oPDFApp.MenuItemExecute ("SelectAll")
oPDFApp.MenuItemExecute ("Copy")
'Paste into open column
wbTransfer.Activate
wsNew.Cells(1, dOpenCol).Select
ActiveSheet.Paste
'Select next open column
dOpenCol = dOpenCol + 1
oAVDoc.Close (1) '(1)=Do not save changes
oPDDoc.Close
Next
'Clean up
Set wbTransfer = Nothing
Set wsNew = Nothing
Set oPDFApp = Nothing
Set oAVDoc = Nothing
Set oPDDoc = Nothing
End Sub
Note:
1-There is also a menu item oPDFApp.MenuItemExecute ("CopyFileToClipboard") that should do the select all and copy in one step, but I have had problems with it so I stick to the two-step method above.
2-A pdf file consists of two objects, the oAVDoc and the oPDDoc. Different aspects of the file are controlled by each. In this case you might only need the oAVDoc. Try commenting out the lines dealing with oPDDoc and see if it works without them.
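The alternative mentioned earlier in this answer, moving the clipboard into a variable before writing to the sheet, could look roughly like the sketch below. It assumes a reference to the Microsoft Forms 2.0 Object Library (for MSForms.DataObject, as used elsewhere in this thread); ClipboardText and WriteLinesToColumn are made-up names, and writing one line per row is just one possible way to parse the text.

```vba
' Requires a reference to "Microsoft Forms 2.0 Object Library".
' Hypothetical helper: returns the clipboard's text, or "" if it holds no text.
Function ClipboardText() As String
    Dim data As MSForms.DataObject
    Set data = New MSForms.DataObject
    data.GetFromClipboard
    If data.GetFormat(1) Then ClipboardText = data.GetText(1)   ' 1 = text format
End Function

' Hypothetical helper: writes one line of text per row into the given column.
Sub WriteLinesToColumn(ByVal ws As Worksheet, ByVal colNum As Long, ByVal content As String)
    Dim lines() As String, i As Long
    ' Normalise CRLF and lone CR to LF so that Split sees a single separator
    content = Replace(Replace(content, vbCrLf, vbLf), vbCr, vbLf)
    If Len(content) = 0 Then Exit Sub
    lines = Split(content, vbLf)
    For i = LBound(lines) To UBound(lines)
        ws.Cells(i + 1, colNum).Value = lines(i)     ' Split arrays are zero-based
    Next i
End Sub
```

In the loop above, WriteLinesToColumn wsNew, dOpenCol, ClipboardText() would then take the place of the Select and ActiveSheet.Paste lines, so the result no longer depends on which window is active at paste time.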
I can't quite get your code to work, but my guess is that it's copying all of the data, but overwriting it each time through the loop. To fix this try:
ActiveSheet.Cells(1, ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Column + 1).Select
instead of the two lines that begin activesheet.range("A1").Select and Selection.End....
Try this code; it might work:
Sub Shell_Copy_Paste()
Dim o As Variant
Dim wkSheet As Worksheet
Set wkSheet = ActiveSheet
o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\red.pdf", vbNormalFocus)
Application.Wait (Now + TimeSerial(0, 0, 2)) 'Wait for Acrobat to load
SendKeys "^a" 'Select All
SendKeys "^c" 'Copy
SendKeys "%{F4}" 'Close shell application
wkSheet.Range("B5").Select
SendKeys "^v" 'Paste
End Sub
The code below copies the data from the PDF, pastes it into Word, then copies it from Word and pastes it into Excel.
Why go through Word? Because I want the data in my Excel sheet in the same layout as in the PDF. If I copy directly from the PDF to Excel, the whole content is pasted into a single cell, even if the PDF has two columns or multiple rows. If I copy from Word to Excel, the original layout is retained, so two columns are pasted as two columns in Excel.
Private Sub CommandButton3_Click() '(load pdf)
Dim o As Variant
Set appWord = CreateObject("Word.Application")
o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf", vbNormalFocus) 'loading adobe reader & pdf file from their location
Application.Wait (Now + TimeSerial(0, 0, 2))
SendKeys ("^a")
SendKeys ("^c")
SendKeys "%{F4}"
Application.Wait Now + TimeValue("00:00:01")
Set appWord = CreateObject("Word.Application")
appWord.Visible = True
appWord.Documents.Add.Content.Paste
With appWord
.ActiveDocument.SaveAs Filename:=ThisWorkbook.Path & "\pdf" & ".docx", FileFormat:=12 '12 = wdFormatXMLDocument (.docx); the wd constants are not defined with late binding
.ActiveWindow.Close
.Quit
End With
MsgBox " pdf is loaded "
MsgBox " Paste to EXCEL "
Set appWord = CreateObject("Word.Application")
appWord.Visible = True
appWord.Documents.Open "C:\Users\saurabh.ad.sharma\Desktop\pdf.docx" 'opening word document
appWord.Selection.WholeStory
appWord.Selection.Copy
Set wkSheet = ActiveSheet
wkSheet.Range("A1").Select
wkSheet.Paste 'pasting to the excel file
End Sub
This is a modified version of my code above. It does not save any document; it keeps the data on the clipboard, so it runs faster.
Private Sub CommandButton3_Click() '(load pdf)
Dim o As Variant
Set appWord = CreateObject("Word.Application")
o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf2", vbNormalFocus)
Application.Wait (Now + TimeSerial(0, 0, 2))
SendKeys ("^a")
SendKeys ("^c")
SendKeys "%{F4}"
Application.Wait Now + TimeValue("00:00:01")
Set appWord = CreateObject("Word.Application")
appWord.Visible = False
appWord.Documents.Add.Content.Paste
With appWord
.Selection.WholeStory
.Selection.Copy
.ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
.Quit
End With
MsgBox " pdf is loaded "
MsgBox " Paste to EXCEL "
Set wkSheet = ActiveSheet
wkSheet.Range("A1").Select
wkSheet.Paste
End Sub
I had a similar problem. The best solution is, as mentioned before, to use the Adobe API. In my case that was impossible because the macro was intended for 100+ users without Adobe Pro on their PCs.
The solution I eventually developed was to build a converter in C# (for free, using Visual Studio and the iText library), install it on the end users' computers and run it from VBA whenever needed. Here are some links for more guidance:
How to develop pdf converter in C#: link
How to create Excel Addin in C#: link
How to run C# addin from VBA: link
Overall it's fairly complicated but once done works like a dream.
Another solution, as mentioned before, is to use SendKeys in VBA. My experience is that it requires some tuning to handle varying opening and copying times (depending on file size). Below is code that worked for me; however, it is nowhere near as fast and stable as the C# converter.
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) 'Initialize sleep function via Windows API (dwMilliseconds is a 32-bit DWORD)
Public Sub CopyToSheet(destinationSheet As Worksheet, pathToPdf as String)
'Copy data from PDF to worksheet
'Initialize timer
Dim StartTime As Double
StartTime = Timer
'Clear clipboard
Dim myData As DataObject
Set myData = New DataObject
myData.SetText text:=Empty
myData.PutInClipboard
Set myData = Nothing
'Build file paths
Dim pathToAdobe As String
pathToAdobe = """C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe"""
pathToPdf = """" & pathToPdf & """"
'Open the PDF and wait until it is open. If the file is already open it will just be activated
Dim pdfId As Long
pdfId = Shell(pathToAdobe & " " & pathToPdf, vbMaximizedFocus)
Do
Sleep (500)
If Round(Timer - StartTime, 2) > 9 Then MsgBox "Failed to open PDF": Exit Sub 'Safety check
Loop Until Me.IsPdfOpen(pathToPdf)
'Copy and wait until copying is completed
SendKeys "^a"
SendKeys "^c"
Do
Sleep (500)
If Round(Timer - StartTime, 2) > 18 Then MsgBox "Failed to copy data to clipboard": Exit Sub 'Safety check
Loop Until Me.GetClipboardStatus = "ClipboardHasData"
'Paste data into worksheet
destinationSheet.Activate
destinationSheet.Range("A1").Select
destinationSheet.Paste
'Close pdf
Call Shell("TaskKill /F /PID " & CStr(pdfId), vbHide)
'Clear clipboard
Set myData = New DataObject
myData.SetText text:=Empty
myData.PutInClipboard
Set myData = Nothing
End Sub
Function IsPdfOpen(pathToPdf) As Boolean
'Check if PDF is already opened
'Build window name (window name is name of the application on Windows task bar)
Dim windowName As String
windowName = pathToPdf
windowName = Mid(windowName, InStrRev(windowName, "\") + 1, Len(windowName) - InStrRev(windowName, "\") + 1)
windowName = windowName + " - Adobe Acrobat Reader DC"
'Try to activate application to check if is opened
On Error Resume Next
AppActivate windowName, True
Select Case Err.Number
Case 5: IsPdfOpen = False
Case 0: IsPdfOpen = True
Case Else: Debug.Assert False
End Select
On Error GoTo 0
End Function
Function GetClipboardStatus() As String
'Check if copying data to clipboard is completed
Dim tempString As String
Dim myData As DataObject
'Try to put data from clipboard to string to check if operations on clipboard are completed
On Error Resume Next
Set myData = New DataObject
myData.GetFromClipboard
tempString = myData.GetText(1)
If Err.Number = 0 Then
If tempString = "" Then
GetClipboardStatus = "ClipboardEmpty"
Else
GetClipboardStatus = "ClipboardHasData"
End If
Else
GetClipboardStatus = "ClipboardBusy"
End If
On Error GoTo 0
Set myData = Nothing
End Function

Resources