Count data on webpage from URL in Excel VBA - excel

is it possible to read a webpage from a hyperlink in excel and directly count the readability scores, ampersands and exclamations from the webpage without query the data back into excel by changing this VBA code? Also it is possible from a file path? This is all in one spreadsheet.
Option Compare Text
Sub Display_Stylometric_Scores()
Dim Words As String
Dim Characters As String
Dim Paragraphs As String
Dim Sentences As String
Dim Sentences_per_paragraph As String
Dim Words_per_sentence As String
Dim Characters_per_word As String
Dim Ratio_of_passive_sentences As String
Dim Flesch_Reading_Ease_score As String
Dim Flesch_Kincaid_Grade_Level_score As String
Dim Coleman_Liau_Readability_Score As String
Dim Ampersands As Long
Dim Exclamations As Long
Dim row As Integer
Dim column As Integer
Dim ActiveDocument As Object
Dim RS As Object
Dim txt As String
row = 3
Set ActiveDocument = CreateObject("Word.Document")
Do While Worksheets("Sample_Output_2").Cells(row, 1) <> ""
txt = Worksheets("Sample_Output_2").Cells(row, 2).Value
ActiveDocument.Content = txt
Set RS = ActiveDocument.Content.ReadabilityStatistics
Words = RS(1).Value
Characters = RS(2).Value
Paragraphs = RS(3).Value
Sentences = RS(4).Value
Sentences_per_paragraph = RS(5).Value
Words_per_sentence = RS(6).Value
Characters_per_word = RS(7).Value
Ratio_of_passive_sentences = RS(8).Value
Flesch_Reading_Ease_score = RS(9).Value
Flesch_Kincaid_Grade_Level_score = RS(10).Value
Ampersands = CountChar(txt, "&")
Exclamations = CountChar(txt, "!")
Worksheets("Sample_Output_2").Cells(row, 4).Resize(1, 12).Value =
Array(Words, Characters, Paragraphs, Sentences, Sentences_per_paragraph, _
Words_per_sentence, Characters_per_word, Ratio_of_passive_sentences, _
Flesch_Reading_Ease_score, Flesch_Kincaid_Grade_Level_score, _
Ampersands, Exclamations)
row = row + 1
Loop
End Sub
Function CountChar(txt As String, char As String) As Long
CountChar = Len(txt) - Len(Replace(txt, char, ""))
End Function

Yes, you an use MXSML to make an http request. Here's an example and a little refactoring of your existing code
Sub Main()
Dim vaWrite As Variant
Dim hDoc As MSHTML.HTMLDocument
Dim xHttp As MSXML2.XMLHTTP
'Set a reference to MSXML2
'Open a webpage using GET
Set xHttp = New MSXML2.XMLHTTP
xHttp.Open "GET", "http://stackoverflow.com/questions/15103048/count-data-on-webpage-from-url-in-excel-vba"
xHttp.send
'Wait for the web page to finish loading
Do Until xHttp.readyState = 4
DoEvents
Loop
'If the web page rendered properly
If xHttp.Status = 200 Then
'Create a new HTMLdocument
Set hDoc = New MSHTML.HTMLDocument
'Put the GET response into the doc's body
hDoc.body.innerHTML = xHttp.responseText
'Get an array back containing the readability scores
vaWrite = Display_Stylometric_Scores(hDoc.body.innerText)
'Write that array to a worksheet
Sheet1.Range("A2").Resize(UBound(vaWrite, 1), UBound(vaWrite, 2)).Value = vaWrite
End If
End Sub
Function Display_Stylometric_Scores(ByRef sText As String) As Variant
Dim aReadStats(1 To 1, 1 To 12) As Double
Dim wdDoc As Object
Dim wdRs As Object
Dim i As Long
Dim vaSpecial As Variant
Const lMAXIDX As Long = 10
vaSpecial = Array("&", "!")
Set wdDoc = CreateObject("Word.Document")
wdDoc.Content = sText
Set wdRs = wdDoc.Content.ReadabilityStatistics
For i = 1 To lMAXIDX
aReadStats(1, i) = wdRs(i).Value
Next i
For i = LBound(vaSpecial) To UBound(vaSpecial)
aReadStats(1, lMAXIDX + 1 + i) = CountChar(sText, vaSpecial(i))
Next i
Display_Stylometric_Scores = aReadStats
End Function
Function CountChar(ByRef sText As String, ByVal sChar As String) As Long
CountChar = Len(sText) - Len(Replace(sText, sChar, vbNullString))
End Function

Related

VBA - rename pdf per content

I need to develop a excel vba application to rename over hundred pdf file...
I have excel file, column A is content in pdf, column B is new name of pdf. if pdf content match with column A, then rename to new name in column B.
but there is a error - Method or data member not found (Error 461) in Function ExtractPDFContent(pdfFile As String) As String and highlighting 'GetText'
code below
Sub RenamePDF()
Dim wb As Workbook
Set wb = ThisWorkbook
Dim ws As Worksheet
Set ws = wb.Sheets("Sheet1")
Dim pdfPath As String
pdfPath = "F:\exceltest\"
Dim pdfFile As String
Dim pdfContent As String
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
Dim i As Long
For i = 1 To lastRow
pdfFile = pdfPath & ws.Cells(i, "A").Value & ".pdf"
If Dir(pdfFile) <> "" Then
pdfContent = ExtractPDFContent(pdfFile)
If pdfContent = ws.Cells(i, "A").Value Then
Name pdfFile As pdfPath & ws.Cells(i, "B").Value & ".pdf"
End If
End If
Next i
End Sub
**Function ExtractPDFContent(pdfFile As String) As String**
Dim pdfDoc As Acrobat.CAcroPDDoc
Set pdfDoc = CreateObject("Acrobat.PDDoc")
pdfDoc.Open (pdfFile)
Dim numPages As Long
numPages = pdfDoc.GetNumPages
Dim i As Long
Dim text As String
For i = 0 To numPages - 1
text = text & pdfDoc.**GetText**(i)
Next i
pdfDoc.Close
Set pdfDoc = Nothing
ExtractPDFContent = text
End Function
i asked ChatGPT before, it said missing Acrobat in reference, then check it all still not work

Opening Word Document in VBA Results in Empty Variable

I am trying to loop through a folder and open each word document one at a time in VBA. I had the code working, and then I added two more files to the folder. Now it won't open my first file (which I had opened previously. My code is as follows:
Sub readEmailsV2()
Dim oFSO As Object, oFolder As Object, oFile As Object
Dim i As Integer
Dim j As Integer
Dim pN As Integer
Dim sFileSmall As String, sFileYear As String, sFilePath As String
Dim wapp As Word.Application
Dim wdoc As Word.Document
Dim tabDest As Worksheet
Dim splitVals As Variant
Dim contentsVar As String
Dim jContent As String
Dim pageCount As Integer
Dim fpOpen As Variant
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' USER INPUT
sFileSmall = "C:\Users\rstrott\OneDrive - Research Triangle Institute\Desktop\VBApractice\Docket Index\filesToRead\"
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Get variable with filenames from folder (Only contains word docs)
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFolder = oFSO.getfolder(sFileSmall)
Set tabDest = ThisWorkbook.Sheets("FileContents")
Set wapp = GetObject(, "Word.Application")
If wapp Is Nothing Then
Set wapp = CreateObject("Word.Application")
End If
tabDest.Cells.Clear
tabDest.Range("a1:a1") = "File Title"
tabDest.Range("b1:b1") = "From:"
tabDest.Range("c1:c1") = "To:"
tabDest.Range("d1:d1") = "cc:"
tabDest.Range("e1:e1") = "Date Sent:"
tabDest.Range("f1:f1") = "Subject:"
tabDest.Range("g1:g1") = "Body:"
tabDest.Range("h1:h1") = "Page Count:"
i = 2
For Each oFile In oFolder.Files
' Assign variables
sFilePath = sFileSmall & oFile.Name
wapp.Visible = True
fpOpen = oFile.Path
Set wdoc = wapp.Documents.Open(sFilePath) ' <---- ERROR HERE: Output is 'Nothing'
pN = ActiveDocument.Paragraphs.Count
pageCount = ActiveDocument.ActiveWindow.ActivePane.Pages.Count
' Put paragraph contents in cells
tabDest.Cells(i, 1) = oFile.Name
tabDest.Cells(i, 2) = wdoc.Paragraphs(2)
tabDest.Cells(i, 3) = wdoc.Paragraphs(8)
tabDest.Cells(i, 4) = wdoc.Paragraphs(11)
tabDest.Cells(i, 5) = wdoc.Paragraphs(5)
tabDest.Cells(i, 6) = wdoc.Paragraphs(14)
Dim item As Variant
For j = 15 To pN
jContent = wdoc.Paragraphs(j).Range.Text
If j = 15 And Len(jContent) > 2 Then
contentsVar = wdoc.Paragraphs(j).Range.Text
ElseIf Len(jContent) > 2 Then
contentsVar = contentsVar & Chr(10) & wdoc.Paragraphs(j).Range.Text
End If
Next j
tabDest.Cells(i, 7) = contentsVar
tabDest.Cells(i, 8) = pageCount
' Close Word Doc
wdoc.Close _
SaveChanges:=wdDoNotSaveChanges
i = i + 1
Next oFile
End Sub
I've tried lots of different things to get it to work again, and I ran out of ideas. Any help would be greatly appreciated.

Not able to pull headers with data from Access to Excel using VBA,

How to modify my query so that the headers can also be pulled?
Sub AccessToExcel()
ControlFile = ActiveWorkbook.Name
Dim myrange As Range
Set myrange = ActiveWorkbook.Sheets("Control").Range("C9")
Dim i As Integer
i = 1
Do While Len(myrange.Offset(i, 0)) > 0
Dim terr_filter As String
terr_filter = myrange.Offset(i, 1).Value
Dim terr_name As String
terr_name = myrange.Offset(i, 0).Value
Dim file_path As String
file_path = myrange.Offset(i, 3).Value
Dim file_name As String
file_name = myrange.Offset(i, 2).Value
Dim j As Long, sPath_Access_DB As String
Dim oDAO As DAO.DBEngine, oDB As DAO.Database, oRS As DAO.Recordset
Dim strPath As String
sPath_Access_DB = Range("rng_Ctrl_Path").Value
'Exporting Component Summary to Access
If sPath_Access_DB = "" Then Exit Sub
Set oDAO = New DAO.DBEngine
Set oDB = oDAO.OpenDatabase(sPath_Access_DB, dbOpenDynaset)
Set oRS = oDB.OpenRecordset(terr_name)
Sheets.Add After:=ActiveSheet
Range("B2").CopyFromRecordset oRS
ActiveSheet.Name = terr_filter
oDB.Close
i = i + 1
Loop
End Sub
You need to loop and extract them yourself:
For j = 1 to oRS.Fields.Count
Cells(1, j + 1).Value = oRS.Fields(j - 1).Name
Next

Linking Hyperlinks in a Word Document to a corresponding document in a excel sheet

To keep it simple I have several hundred word documents for clients which list templates used for those clients. I need to hyperlink each mention of a template in every document to its corresponding template document, which are all stored in a template folder.
I have a excel spread sheet with 2 columns. The 1st being the name of the template, the 2nd being a hyperlink to that template in the relevant folder.
Below is the script I have created but I am having issues getting it to hyperlink the text, I have tried the code written here with some changed to search and replace with my variable but it makes them all the same hyperlink. https://superuser.com/a/1010293
I am struggling to find another way to do this based on my current knowledge of VBA.
Below is my current code which carries out the whole task.
Public strArray() As String
Public LinkArray() As String
Public TotalRows As Long
Sub Hyperlink()
Dim file
Dim path As String
Dim FilenameWaterMark As String
Call OpenExcelFile
i = 1
For i = 1 To TotalRows
'here I need the document to look through while searching for strarray(I)
'and make that string a hyperlink to linkarray(I)
Next
ActiveDocument.Save
End Sub
Sub OpenExcelFile()
'Variables
Dim i, x As Long
Dim oExcel As Excel.Application
Dim oWB As Workbook
i = 1
'Opening Excel Sheet
Set oExcel = New Excel.Application
Set oWB = oExcel.Workbooks.Open("H:\DCTEST\Templates\DOCS.xlsx")
oExcel.Visible = True
'Counts Number of Rows in Sheet
TotalRows = Rows(Rows.Count).End(xlUp).Row
ReDim strArray(1 To TotalRows)
ReDim LinkArray(1 To TotalRows)
'Assigns each cell in Column A to an Array
For i = 1 To TotalRows
strArray(i) = Cells(i, 1).Value
Next
'searches for hyperlink
For i = 1 To TotalRows
LinkArray(i) = Cells(i, 2).Value
Next
oExcel.Quit
End Sub
I got it working myself. Below is the full code.
Dim strArray() As String
Dim LinkArray() As String
Dim TotalRows As Long
Private Sub DOCUMENT_OPEN()
Dim file
Dim path As String
Dim FilenameWaterMark As String
Dim Rng As Range
Dim SearchString As String
Dim EndString As String
Dim Id As String
Dim Link As String
Call OpenExcelFile
i = 1
For i = 1 To TotalRows
Set Rng = ActiveDocument.Range
SearchString = strArray(i)
With Rng.Find
.MatchWildcards = False
Do While .Execute(findText:=SearchString, Forward:=False, MatchWholeWord:=True) = True
Rng.MoveStartUntil (strArray(i))
Rng.MoveEndUntil ("")
Link = LinkArray(i)
ActiveDocument.Hyperlinks.Add Anchor:=Rng, _
Address:=Link, _
SubAddress:="", ScreenTip:="", TextToDisplay:=Rng.Text
Rng.Collapse wdCollapseStart
Loop
End With
Next
ActiveDocument.Save
End Sub
Sub OpenExcelFile()
'Variables
Dim i, x As Long
Dim oExcel As Excel.Application
Dim oWB As Workbook
i = 1
'Opening Excel Sheet
Set oExcel = New Excel.Application
Set oWB = oExcel.Workbooks.Open("H:\DCTEST\Templates\DOCS.xlsx")
oExcel.Visible = False
'Counts Number of Rows in Sheet
TotalRows = Rows(Rows.Count).End(xlUp).Row
ReDim strArray(1 To TotalRows)
ReDim LinkArray(1 To TotalRows)
'Assigns each cell in Column A to an Array
For i = 1 To TotalRows
strArray(i) = Cells(i, 1).Value
Next
'searches for hyperlink
For i = 1 To TotalRows
LinkArray(i) = Cells(i, 2).Value
Next
oExcel.Quit
End Sub
This runs when the document is open and links all mentions of a template to its document in the template folder.

Looping Through PDF Files

I have a working VBA script that pulls specific form fields from a specified PDF file into a spreadsheet. However I have several hundred PDFs that I need to do this for, so I'd like to loop through files in a directory and perform the same action.
Conveniently I have an old VBA script that loops through Word files in a directory and imports the content of each just how I'd like.
I hardly know VBA but I've adapted scripts in several language including VBA to meet my needs. I thought this would take 10 minutes but its taken several hours. Can somebody please look at my script below and tell me where I'm going wrong? I assume it has something to do with the Word and Acrobat libraries having different requirements, but even my loop isn't displaying the test message.
PS I have Acrobat Pro installed.
My Script (Non-Working)
Private Sub CommandButton1_Click()
Dim f As String: f = "C:\temp\ocopy"
Dim s As String: s = Dir(f & "*.pdf")
Dim AcroApp As Acrobat.CAcroApp
Dim theForm As Acrobat.CAcroPDDoc
Dim jso As Object
Dim text1, text2 As String
Dim col As Integer: col = 1
Do Until s = ""
Set AcroApp = CreateObject("AcroExch.App")
Set theForm = CreateObject("AcroExch.PDDoc")
theForm.Open (f & s)
Set jso = theForm.GetJSObject
text1 = jso.getField("Name of serviceRow1").Value
text2 = jso.getField("Who are the key contacts?").Value
MsgBox text1
MsgBox "text1"
Sheet1.Cells(col, 1).Value = text1
Sheet1.Cells(col, 2).Value = text2
col = col + 1: s = Dir
theForm.Close
AcroApp.Exit
Set AcroApp = Nothing
Set theForm = Nothing
Loop
End Sub
Word Script - Works at Looping and Importing
Sub fromWordDocsToMultiCols()
Dim f As String: f = "C:\temp\Test\"
Dim s As String: s = Dir(f & "*.docx")
Dim wdApp As New Word.Application, wdDoc As Word.Document
Dim col As Integer: col = 1
On Error GoTo errHandler
Do Until s = ""
Set wdDoc = wdApp.Documents.Open(f & s)
wdDoc.Range.Copy
Sheet1.Cells(1, col).Value = s
Sheet1.Cells(2, col).PasteSpecial xlPasteValues
wdDoc.Close False: col = col + 1: s = Dir
Loop
errHandler:
If Err.Number <> 0 Then MsgBox Err.Description
If Not wdApp Is Nothing Then wdApp.Quit False
End Sub
Acrobat Script - Works as Importing One-by-One
Private Sub CommandButton1_Click()
Dim AcroApp As Acrobat.CAcroApp
Dim theForm As Acrobat.CAcroPDDoc
Dim jso As Object
Dim text1, text2 As String
Set AcroApp = CreateObject("AcroExch.App")
Set theForm = CreateObject("AcroExch.PDDoc")
theForm.Open ("C:\temp\ocopy\Minerals asset management.pdf")
Set jso = theForm.GetJSObject
' get the information from the form fiels Text1 and Text2
text1 = jso.getField("Name of serviceRow1").Value
text2 = jso.getField("Who are the key contacts within the team for this service? Please provide one contact per region").Value
Sheet1.Cells(1, 1).Value = text1
Sheet1.Cells(1, 2).Value = text2
theForm.Close
AcroApp.Exit
Set AcroApp = Nothing
Set theForm = Nothing
End Sub
Many thanks in advance.

Resources