VBA web scrape innertext - excel

I have now tried for quite some time to web scrape this innertext:
I want the value 0606 copied to an Excel sheet
<TABLE class="group">
<td width="100%" nowrap="" colspan="3">
<input name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityQualifier"
type="HIDDEN" value="CPR">CPR-nr:
<input name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue"
type="HIDDEN" value="0606">0606</td>
I have tried getAttribute, getElementsByClassName, .value and .innerText, but now I need some fresh eyes on it.
Does anyone have a good idea?

Something like this should work; however, without your code I don't know how you're obtaining your HTMLDocument:
Dim oHTMLDocument As Object
Dim ele As Object
Set oHTMLDocument = ... 'No code provided so I'm unsure how you obtained the HTMLDocument
For Each ele In oHTMLDocument.getElementsByTagName("input")
    If ele.Name = "pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue" Then
        'An <input> has no inner text; the 0606 is held in its value attribute
        Debug.Print ele.Value
        Exit For
    End If
Next ele
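To try this without a browser, here is a minimal sketch that parses a copy of the question's markup with a late-bound "htmlfile" document and writes the hidden input's value to the active sheet. It assumes Windows with the MSHTML COM object available; the Sub name DemoReadHiddenInput and the target cell A2 are only illustrative. The cell is formatted as text first, because Excel would otherwise store 0606 as the number 606.

```vba
Sub DemoReadHiddenInput()
    Dim doc As Object
    Dim ele As Object

    ' Parse a copy of the question's markup without opening a browser
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table class=""group""><tr><td>" & _
        "<input name=""pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue"" " & _
        "type=""hidden"" value=""0606"">0606</td></tr></table>"

    For Each ele In doc.getElementsByTagName("input")
        If ele.Name = "pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue" Then
            ' Format the cell as text so the leading zero is kept
            ActiveSheet.Range("A2").NumberFormat = "@"
            ' Read the value attribute; an <input> element has no inner text
            ActiveSheet.Range("A2").Value = ele.Value
            Exit For
        End If
    Next ele
End Sub
```

With a live page, the same loop would run against the browser's document object instead of the "htmlfile" stand-in.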

You can use a CSS selector and avoid the need for a loop.
input[name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue"]
VBA
.querySelector is called on the HTML document once your page has loaded (how you load it is not shown in your question). With Internet Explorer, for example:
IE.document.querySelector("input[name='pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue']").Value
Further info:
HTML DOM querySelector() Method

Related

Selecting a webpage radio-button

I am trying to do some webpage data extraction using VBA within Excel. I have managed to automate the login process, which takes me to a new page with a search form. This form has an input field for 'Search Value' and a series of radio buttons that specify which variable to search on, e.g. ID number or Name. I am trying to automate filling in the input field, selecting one of the radio buttons and then submitting the form.
The code I used to login does not seem to work on the next page where I need to do the search. This is the HTML from the search page containing the input field:
<table width="100%">
<tr>
<td style="font-weight: bold; width:10%;" nowrap="nowrap">Search Value</td>
<td style="width: 15%; text-align: left;">
<input name="txtSearch" type="text" id="txtSearch" onkeyup="return txtSearch_onkeyup()" style="background-color:White;border-color:#222244;border-width:1px;border-style:Solid;height:20px;width:125px;" />
Which I try to fill in using:
ieDoc.Document.getElementsByName("txtSearch")(0).Value = "Test String"
Or:
ieDoc.Document.all.txtSearch.Value = "Test String"
Or
ieDoc.Document.getElementByID("txtSearch").Value = "Test String"
...all of which give me the same object-defined error.
I have confirmed that ieDoc is referencing the correct page after the login (by checking the title in the Immediate window), and I have tried to rule out a timing issue with page loading.
I know the HTML methods in VBA tend to be temperamental, but I've run out of ideas. Is there any other way to access the input field on this page?
Success!
I was trying to reference an element on a sub-frame within the IE object, which I accessed with:
ieDoc.Document.frames(0).Document.all.txtSearch.Value = "Hello!"

Excel VBA: Working with iFrame via IE Automation

I have a project that I am working on where I am trying to automate a site's behavior via Excel's VBA. So far, I know how to initialize a web browser from VBA, navigate to a website, and perform a simple task such as clicking on an item using the getElementById function and click method. However, I wanted to know how can I go about working with an embedded object(s) that is inside of an iframe.
For example, here is an overview of what the tree structure looks like via HTML source code. Of course, there are a lot more tags, but at least you can get an idea of what it is that I am trying to do.
<html>
<body>
<div>
<iframe class="abs0 hw100" scrolling="no" allowtransparency="true" id="Ifrm1568521068980" src="xxxxx" title="mailboxes - Microsoft Exchange" ldhdlr="1" cf="t" frameborder="0"> <<< ----- This is where I am stuck
<div>
<tbody>
<tr>
etc.....
etc.....
etc.....
</tr>
<tbody>
<div>
----------- close tags
I guess the biggest problem for me is to learn how to manipulate an embedded object(s) that is enclosed inside of an iframe because all of this is still new to me and I am not an expert in writing programs in VBA. Any help or guidance in the right direction will help me out a lot. Also, if you need more information or clarification, please let me know.
Here the code that I have written so far:
Option Explicit
'Added: Microsoft HTML Object Library
'Added: Microsoft Internet Controls
'Added: Global Variable
Dim URL As String
Dim iE As InternetExplorer
Sub Main()
'Declaring local variables
Dim objCollection As Object
Dim objElement As Object
Dim counterClock As Long
Dim iframeDoc As MSHTML.HTMLDocument 'this variable is from stackoverflow (https://stackoverflow.com/questions/44902558/accessing-object-in-iframe-using-vba)
'Predefining URL
URL = ""
'Created InternetExplorer Object
Set iE = CreateObject("InternetExplorer.Application")
'Launching InternetExplorer and navigating to the URL.
With iE
.Visible = True
.Navigate URL
'Waiting for the site to load.
loadingSite
End With
'Navigating to the page I need help with that contains the iFrame structure.
iE.Document.getElementById("Menu_UsersGroups").Click
'Waiting for the site to load.
loadingSite
'Set iframeDoc = iE.frames("iframename").Document '<<-- this is where the error message happens: 438 - object doesn't support this property or method.
'The iFrame of the page does not have a name. Instead "Ifrm1" is the ID of the iFrame.
End Sub
'Created a function that will be used frequently.
Function loadingSite()
Application.StatusBar = URL & " is loading. Please wait..."
Do While iE.Busy = True Or iE.ReadyState <> 4: Debug.Print "loading": Loop
Application.StatusBar = URL & " Loaded."
End Function
Please note: My knowledge of programming in VBA is on an entry-level. So, please bear with me if I don't understand your answer the first time around. Plus, any nifty documentation or videos about this topic will help me a lot as well. Either way, I'm very determined to learn this language as it is becoming very fun and interesting to me especially when I can get a program to do exactly what it was designed to do. :)
You can try the following code to get elements from the iframe:
IE.Document.getElementsbyTagName("iframe")(0).contentDocument.getElementbyId("txtcontentinput").Value = "BBB"
IE.Document.getElementsbyTagName("iframe")(0).contentDocument.getElementbyId("btncontentSayHello").Click
Sample code is below.
Index page:
<input id="txtinput" type="text" /><br />
<input id="btnSayHello" type="button" value="Say Hello" onclick="document.getElementById('result').innerText = 'Hello ' + document.getElementById('txtinput').value" /><br />
<div id="result"></div><br />
<iframe width="500px" height="300px" src="vbaiframecontent.html">
</iframe>
vbaiframecontent.html:
<input id="txtcontentinput" type="text" /><br />
<input id="btncontentSayHello" type="button" value="Say Hello" onclick="document.getElementById('content_result').innerText = 'Hello ' + document.getElementById('txtcontentinput').value" /><br />
<div id="content_result"></div>
The VBA script is below:
Sub extractTablesData1()
    'Late-bound Internet Explorer instance
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .navigate "<your website url>"
        'Wait until the page has finished loading
        While .Busy Or .ReadyState <> 4
            DoEvents
        Wend
        'Fill in the text box on the main page and click its button
        .Document.getElementById("txtinput").Value = "AAA"
        .Document.getElementById("btnSayHello").Click
        'Do the same inside the first iframe via its contentDocument
        .Document.getElementsByTagName("iframe")(0).contentDocument.getElementById("txtcontentinput").Value = "BBB"
        .Document.getElementsByTagName("iframe")(0).contentDocument.getElementById("btncontentSayHello").Click
    End With
    Set IE = Nothing
End Sub
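If the iframe lookup is needed in several places, it can be wrapped in a small helper. The sketch below is only an illustration: GetFrameDocument and DemoFrameInput are hypothetical names, and it assumes an already-loaded Internet Explorer object is passed in. Reading contentDocument can fail, typically with an access-denied error, when the frame is served from a different origin than the parent page, so the helper returns Nothing in that case instead of raising.

```vba
' Returns the document inside the iframe at frameIndex (zero-based),
' or Nothing if the index is out of range or the frame cannot be read.
Function GetFrameDocument(ByVal ie As Object, ByVal frameIndex As Long) As Object
    Dim frameList As Object
    Set frameList = ie.document.getElementsByTagName("iframe")

    If frameIndex < 0 Or frameIndex >= frameList.Length Then Exit Function

    ' Reading contentDocument may raise an error for cross-origin frames
    On Error Resume Next
    Set GetFrameDocument = frameList(frameIndex).contentDocument
    On Error GoTo 0
End Function

' Example use with the sample pages above
Sub DemoFrameInput(ByVal ie As Object)
    Dim frameDoc As Object
    Set frameDoc = GetFrameDocument(ie, 0)

    If frameDoc Is Nothing Then
        Debug.Print "Could not read the first iframe"
    Else
        frameDoc.getElementById("txtcontentinput").Value = "BBB"
        frameDoc.getElementById("btncontentSayHello").Click
    End If
End Sub
```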

How to click Save button on a web form using VBA

I am working on automating a task. I want to click a Save button on a web form using VBA, but it's not working:
<input name="save" title="Save" class="btn" type="submit" value=" Save ">
<input name="save" tabindex="79" title="Save" class="btn" type="submit" value=" Save ">
I've tried ie.Document.all("save").Click, but it doesn't seem to work. What method do I need to click the button?
You can try going through all your "btn" class collection, and click the one with your save value:
Dim btnClassColl As Object, btn As Object
Set btnClassColl = ie.document.getElementsByClassName("btn")
For Each btn In btnClassColl
If LCase$(btn.Value) Like "*save*" Then 'Like is case-sensitive by default and the button's value is " Save "
btn.Click
Exit For
End If
Next
Also: make sure that your web page has FULLY loaded before trying to automate anything.
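One way to follow that advice is a wait loop with a timeout, so the macro does not hang forever on a page that never finishes loading. This is a minimal sketch, not a definitive implementation: WaitForPage is a hypothetical name, the 30-second default is arbitrary, and it assumes ie is an InternetExplorer.Application object.

```vba
' Blocks until the browser reports the page as complete (ReadyState 4),
' or raises an error after timeoutSeconds.
Sub WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Double = 30)
    Dim started As Double
    Dim elapsed As Double

    started = Timer
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents   ' keep Excel responsive while waiting

        elapsed = Timer - started
        ' Timer resets at midnight; correct for the wrap-around
        If elapsed < 0 Then elapsed = elapsed + 86400

        If elapsed > timeoutSeconds Then
            Err.Raise vbObjectError + 513, "WaitForPage", _
                      "Page did not finish loading in time"
        End If
    Loop
End Sub
```

Call it after each navigation or click that loads a new page, for example WaitForPage ie, before touching any element.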
Edit
In response to the comment:
This code is neither giving error nor its clicking on btn. Can this be because there are two buttons on web page with same name and function?
Alternatively, if you know the item's index in the collection, you can use that index directly and skip the loop. In this case the index is 1 (remember: the collection is zero-based).
Try this alternative:
Dim btn As Object
Set btn = ie.document.getElementsByName("save")(1) '1 is the second item (zero-based index)
btn.Click

Web scrape innertext vba

I've tried for a long time to scrape this value, but I've run into a problem.
I've tried reading the value with getAttribute and with getElementById, getElementsByName and getElementsByClassName, but nothing helps.
I need help to web scrape the innertext called '0606' from the following html-code:
<td width="100%" nowrap="" colspan="3">
<input name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityQualifier"
type="HIDDEN" value="CPR">CPR-nr:
<input name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue"
type="HIDDEN" value="0606">0606</td>
My code for now is:
Dim CPR As String
CPR = IE.Document.getElementById("pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue").innerText
Range("A2").Value = CPR
I also tried this, but this returns the very first input way above my wanted input, and no matter which value I change (1) to, it errors with 91:
CPR= Trim(Doc.getElementsByTagName("input")(1).getAttribute("value"))
Range("A2").Value = CPR
Can anybody help me?
Any suggestions for code would help me immensely.
Try to get the element by its name attribute, as below:
IE.document.getElementsByName("pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue")(0).Value
Try selecting the element and then parsing its outerHTML to retrieve the value.
Dim s As String
s = IE.document.querySelector("input[name=pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue]").outerHTML
Debug.Print Split(Split(s, "value=" & Chr$(34))(1), Chr$(34))(0)

how to tell if a url is for default blue image gravatar

Is there a way to tell if a url is for default blue gravatar image?
Here are two urls for the default Gravatar image:
http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27?s=120
http://www.gravatar.com/avatar/88a771dc1a611b2038c9a0ad0770b595?s=120
Here is a url that has an image:
http://www.gravatar.com/avatar/d8f98df8a6ed24a727b993ea01cc91f6?s=120
Is there something in these URLs that I can search for that the default blue Gravatar image URLs have and the non-default Gravatar URLs do not?
Edit:
What I am trying to do is:
I have an Excel sheet downloaded from an app that has a column of Gravatar URLs. I need to delete all the links that go to the default blue Gravatar image.
Gravatar image URLs that resolve to the default image return a 404 Not Found error when the parameter d=404 is set. For example, here are your example URLs with that parameter added:
http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27?s=120&d=404
http://www.gravatar.com/avatar/88a771dc1a611b2038c9a0ad0770b595?s=120&d=404
http://www.gravatar.com/avatar/d8f98df8a6ed24a727b993ea01cc91f6?s=120&d=404
Assuming you're detecting whether the images are the default using JavaScript, you can use either AJAX (without displaying the image) or an error handler (displaying the non-default images) to detect whether each image loaded successfully.
jQuery (AJAX)
// Image Exists
$.ajax({url:"http://www.gravatar.com/avatar/d8f98df8a6ed24a727b993ea01cc91f6?s=120&d=404",type:"GET",crossDomain:true,success:(function(){console.log("Custom Gravatar");}),error:(function(){console.log("Default Gravatar");})});
// Image Does Not Exist
$.ajax({url:"http://www.gravatar.com/avatar/88a771dc1a611b2038c9a0ad0770b595?s=120&d=404",type:"GET",crossDomain:true,success:(function(){console.log("Custom Gravatar");}),error:(function(){console.log("Default Gravatar");})});
Error Catching
You can use either the jQuery load and error event handlers, or the HTML onload and onerror attributes.
$("img").load(function(e) {
e.target.parentNode.parentNode.getElementsByClassName("stat")[0].innerHTML = e.type;
}).error(function(e) {
e.target.parentNode.parentNode.getElementsByClassName("stat")[0].innerHTML = e.type;
});
table,
td {
border: 1px solid black;
border-collapse: collapse;
}
table img {
width: 48px;
height: 48px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table>
<tbody>
<tr>
<td>Image</td>
<td>Expected Result</td>
<td>Actual Result</td>
</tr>
<tr>
<td>
<img id="good" src="http://www.gravatar.com/avatar/d8f98df8a6ed24a727b993ea01cc91f6?s=120&d=404" />
</td>
<td>load</td>
<td class="stat">Loading...</td>
</tr>
<tr>
<td>
<img id="bad" src="http://www.gravatar.com/avatar/88a771dc1a611b2038c9a0ad0770b595?s=120&d=404" />
</td>
<td>error</td>
<td class="stat">Loading...</td>
</tr>
</tbody>
</table>
EDIT: OP clarified what was being asked for
I wrote a small VBA script in this example file to iterate through the first column up until the first empty cell, creating WinHTTP requests with a modified URL then, as OP asked, deleting the contents of cells that contained a link to the default Gravatar avatar.
To run the code in the sample Excel file:
Excel 2003 and lower: Tools > Macro > Macros (Alt + F8) > checkGravatar
Excel 2007 and newer: Developer > Macros > checkGravatar
In order to run the VBA, you may also need to enable macros and reference MSXML.
Sub checkGravatar()
    'Late-bound WinHTTP request object
    Dim objHTTP As Object
    Dim URL As String
    Dim row As Long
    Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
    row = 1
    URL = Cells(row, 1).Value
    'Walk down column A until the first empty cell
    Do While Len(URL) > 0
        If InStr(URL, "gravatar.com/avatar/") > 0 Then
            'Append d=404 so that default avatars return 404 instead of an image
            If InStr(URL, "?") = 0 Then
                URL = URL & "?d=404"
            ElseIf InStr(URL, "&d=") = 0 Then
                URL = URL & "&d=404"
            End If
            objHTTP.Open "GET", URL, False
            objHTTP.send
            If objHTTP.Status = 404 Then
                'Default avatar: clear the cell
                Cells(row, 1).Value = ""
            ElseIf objHTTP.Status <> 200 Then
                MsgBox "GET request failed for row " & row
            End If
        End If
        row = row + 1
        URL = Cells(row, 1).Value
    Loop
End Sub
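The URL-rewriting step can also be pulled out into a function and checked on its own in the Immediate window, without sending any requests. This is a sketch under the same assumptions as the macro above; WithDefault404 and DemoWithDefault404 are hypothetical names. Unlike the macro, it also leaves the URL alone when d= is the first query parameter.

```vba
' Returns url with d=404 appended, unless a d= parameter is already present.
Function WithDefault404(ByVal url As String) As String
    If InStr(1, url, "?d=", vbTextCompare) > 0 Or _
       InStr(1, url, "&d=", vbTextCompare) > 0 Then
        WithDefault404 = url                 ' already has a default setting
    ElseIf InStr(url, "?") = 0 Then
        WithDefault404 = url & "?d=404"      ' no query string yet
    Else
        WithDefault404 = url & "&d=404"      ' extend the existing query string
    End If
End Function

Sub DemoWithDefault404()
    ' Prints http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27?s=120&d=404
    Debug.Print WithDefault404("http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27?s=120")
    ' Prints http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27?d=404
    Debug.Print WithDefault404("http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27")
End Sub
```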
You can try to pass a default image of your choice, and see what is returned.
URLs mapping to existing gravatars will ignore the default image parameter, returning an image.
URLs mapping to no image will use the default you provide (instead of the blue G).
If you pass something invalid (empty string), you'll get a 400.
Compare the existing one:
http://www.gravatar.com/avatar/d8f98df8a6ed24a727b993ea01cc91f6?d=%22%22
With the non-existing one:
http://www.gravatar.com/avatar/19b2990ba88512ab38abdbbca5701d27?d=%22%22
That said... It's quite rude to deliberately cause errors on someone else's API.
You could fire off some background worker that will GET the images and see what's returned.
