Scraping web [VBA] xmlhttprequest - excel

I am stuck with this. I am trying to learn how to use xmlhttprequest with VBA. My intention is to access the following url: "https://micuil.net/".
Once there, I can send values to the following fields, as seen in the image:
image1
After pressing the button, the page displays the following information with the data I want to extract:
image2 (return value)
I am able to complete what you see in image 1 by code, but I don't know how to get the result (image 2). Any help please?
Function CuitEstimado2(sDni As Variant, sSexo As String) As String
    Set oDoc = New HTMLDocument
    Set oHTTP = New XMLHTTP60
    sSexo = IIf(sSexo = "f", "Mujeres", "Varones")
    sUrl = "https://micuil.net/"
    oHTTP.Open "GET", sUrl, False
    oHTTP.send
    sRespuesta = oHTTP.responseText
    oDoc.body.innerHTML = sRespuesta
    oDoc.getElementById("dni").Value = sDni
    oDoc.getElementsByName("sexo")(0).setAttribute("checked") = True
    oDoc.getElementById("btn").parentElement.Click
End Function

I have not researched this particular website in depth, but the result you are looking for may well be contained in the response text (assuming I understand your problem correctly).
First, loop through the innerHTML of each element in the HTMLDocument. Inside the loop, use the InStr function to locate the result code as a string. It helps to store each element that contains the result code in a Collection for easy access after the loop.
It gets a bit more complicated from here, because the innerHTML of those elements may look different when pasted into Notepad than when used in the VBE. However, if you can identify unique characters (JavaScript or otherwise) that consistently mark the location of the result code in each response, you can use the Mid function to return the desired result into a string variable.
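The loop-then-extract idea above can be sketched roughly as follows. This is only a sketch under assumptions: the function names ElementsContaining and ExtractAfterMarker are invented for illustration, and the marker and terminator strings are placeholders, because the markup the site actually returns has not been inspected here.

```vba
Option Explicit

' Collect every element whose innerHTML contains sNeedle (case-insensitive).
' oDoc is expected to be an MSHTML document such as the oDoc in the question.
Function ElementsContaining(ByVal oDoc As Object, ByVal sNeedle As String) As Collection
    Dim oEl As Object
    Dim cHits As Collection
    Set cHits = New Collection
    For Each oEl In oDoc.all
        If InStr(1, oEl.innerHTML, sNeedle, vbTextCompare) > 0 Then cHits.Add oEl
    Next oEl
    Set ElementsContaining = cHits
End Function

' Return the text between sMarker and the next sTerminator.
' Returns an empty string when sMarker is not found.
' Both delimiters are hypothetical and must be read off the real response.
Function ExtractAfterMarker(ByVal sHtml As String, ByVal sMarker As String, _
                            ByVal sTerminator As String) As String
    Dim lStart As Long
    Dim lEnd As Long
    lStart = InStr(1, sHtml, sMarker, vbTextCompare)
    If lStart = 0 Then Exit Function
    lStart = lStart + Len(sMarker)
    lEnd = InStr(lStart, sHtml, sTerminator, vbTextCompare)
    If lEnd = 0 Then lEnd = Len(sHtml) + 1   ' no terminator: take the rest
    ExtractAfterMarker = Trim$(Mid$(sHtml, lStart, lEnd - lStart))
End Function
```

A call might look like sResult = ExtractAfterMarker(oHTTP.responseText, "CUIT:", "<"), where "CUIT:" is a guess at the label and not something confirmed from the page. If the result only appears after the form is submitted, the text to search is the response of that second request, not of the initial GET.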

Related

VBA function to get page's URL to extract gps coordinates

I have a small problem; thanks in advance for any help you can offer :)
My goal: I want to create a VBA function that receives an address and returns the GPS coordinates for that address. To do so, I want to open the address in OpenStreetMap (or Google Maps) and extract the page's URL, which contains the GPS coordinates.
Problem: my function only works halfway. It opens the address in the browser, but it won't return the URL.
My code:
Option Explicit
Function AdresseToCoordonnesGPS(URL As String)
    Dim navigateur As Object
    Set navigateur = CreateObject("Shell.Application").ShellExecute("microsoft-edge:" & URL)
    navigateur.Visible = True
    Do While navigateur.busy And navigateur.ReadyState <> 4
        DoEvents
    Loop
    pause (3)
    Dim redirection As String
    redirection = navigateur.locationUrl
    redirection = Right(redirection, Len(redirection) - InStr(redirection, "#"))
    redirection = Right(redirection, Len(redirection) - InStr(redirection, "/"))
    AdresseToCoordonnesGPS = redirection
End Function
Result: this function manages to open the link in the browser (Microsoft Edge), but it won't retrieve the page's URL, which I need in order to extract the GPS coordinates.
Here is what it returns:

Vba Scraping Twitter details

I have a list of Twitter URLs in column A from which I am trying to pull some information, but I am having a lot of trouble. I want to pull everything highlighted in yellow.
I am not sure whether this is due to using the wrong classes or due to the Twitter URLs not opening in Excel. If I double-click a URL in Excel and try to open it, I get this error message.
The links work fine when I copy and paste them into the browser. I have read on the web that a registry key (HKEY) on the PC may need changing LINK. The problem is that the person I am building this for is not PC literate and would struggle to apply any fix.
I have always used the code below for scraping and it has never failed me. When it tries to pull data off Twitter, I get an error message; see the image below, columns D and E. I assume it is making some contact with Twitter but cannot access the page to extract the data. I am NOT using IE, as it no longer works with Twitter; I am using MSXML2.ServerXMLHTTP.
This is what I am using to extract the data. It is the same for all the columns; only the class changes, and whether it is a span or a child.
''''Element 3 Column D
If doc.getElementsByClassName("css-1dbjc4n")(0) Is Nothing Then
    wsSheet.Cells(StartRow + myCounter, 4).Value = "-"
Else
    wsSheet.Cells(StartRow + myCounter, 4).Value = doc.getElementsByClassName("css-1dbjc4n")(0).getElementsByTagName("Span")(0).innerText
End If
Public Function NewHTMLDocument(strURL As String) As Object
    Dim objHTTP As Object, objHTML As Object, strTemp As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.setOption(2) = 13056
    objHTTP.Open "GET", strURL, False
    objHTTP.send
    If objHTTP.Status = 200 Then
        strTemp = objHTTP.responseText
        Set objHTML = CreateObject("htmlfile")
        objHTML.body.innerHTML = strTemp
        Set NewHTMLDocument = objHTML
    Else
        'There has been an error
    End If
End Function
QUESTION
Is the problem due to the URLs not opening in Excel, or is it because the data is dynamic and cannot be extracted?
Twitter Link 1
Twitter Link 2
As always, thanks for having a look, and my apologies in advance for NOT adding an HTML snippet. The site would not let me post: it stated that a URL had been shortened, but I could not find it, so I removed the whole HTML snippet in order to post.
UPDATE
I thought this link was in my post, but I must have removed it when I removed the HTML snippet. I found this on Stack Overflow but could not get it to work for me; nothing would extract. Link

Error entering data into web page with Excel

I want to enter data into a web page field.
There are 2 data entry fields on the web page.
I entered data in the first section.
However, I cannot enter data in the other field.
Information you need to review the site :
Site : http://splan.byethost7.com/mesaj_yaz.php?fno=1&kip=yeni
user :kurucu password :a11111
I entered the data in the "BAŞLIK" field.
However I am unable to write data to the field named "İÇERİK"
I want to enter data in this field using an Excel macro. But I can't enter data using the code:
Sub deneme()
    Dim URL As String
    On Error Resume Next
    URL = "http://splan.byethost7.com/mesaj_yaz.php?fno=1&kip=yeni"
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = 1
    For i = 1 To Range("A" & Rows.Count).End(3).row
        If Cells(i, 1) <> Empty Then
            ie.navigate URL
            Call bekle
            ie.Document.getElementById("mesaj_icerik").Value = "TEST"
            ie.document.getElementsByName("mesaj_baslik").Item(0).Value = Cells(i, 1)
            'IE.Document.getElementsByClassName("submitButton")(0).Click
            Call bekle
        End If
    Next i
    ' IE.Quit
    Set ie = Nothing
End Sub
Sub bekle()
    With ie
        Do Until .readyState = 4: DoEvents: Loop
        Do While .Busy: DoEvents: Loop
    End With
    Application.Wait (Now + TimeValue("00:00:02"))
End Sub
As I said in my comments, there are several issues with your code, although the overall effort is good.
Firstly, ie.document.getElementsid("mesaj_baslik") is not a valid method. If you want to access a single HTML element with a unique ID, the method you need is ie.Document.getElementById("the element's ID").
Assuming that is what you were trying to achieve, keep in mind that the .getElementById() method returns a single element, not a collection.
So ie.Document.getElementById("the element's ID").item(0) would give you an error saying:
Object doesn't support this property or method.
Even with all of the above corrected, I still don't see any element with an ID equal to "mesaj_baslik" in the HTML snippet you have provided. In fact, this string is nowhere to be found in the HTML.
So even with the correct method, ie.Document.getElementById("mesaj_baslik") would return Nothing.
Secondly, although your usage of ie.document.getElementsByName() is correct, there is no element with a Name attribute equal to "formlar_mesajyaz" in the HTML snippet you have provided.
In fact, this string seems to be a class name rather than anything else. In that case you would have to use ie.document.getElementsByClassName().
Now, from the info you have provided, the best I can do is assume that you want to enter some text in the textarea element. To do that, you can use the element's ID like so:
ie.Document.getElementById("mesaj_icerik").Value = "TEST"
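To make the ID-versus-Name distinction concrete, here is a minimal sketch of a helper that tries the ID first, falls back to the Name attribute, and reports whether anything was found. The name SetFieldValue is invented for this example. It assumes the page has finished loading before it is called.

```vba
Option Explicit

' Try to set the value of a form field, looking it up by ID first and by
' Name second. Returns True when a matching element was found.
' SetFieldValue is a hypothetical helper, not part of any library.
Function SetFieldValue(ByVal doc As Object, ByVal fieldKey As String, _
                       ByVal newValue As String) As Boolean
    Dim el As Object
    Dim byName As Object

    ' getElementById returns one element, or Nothing when there is no match.
    Set el = doc.getElementById(fieldKey)

    If el Is Nothing Then
        ' getElementsByName returns a collection, so an index is needed.
        Set byName = doc.getElementsByName(fieldKey)
        If byName.Length > 0 Then Set el = byName.Item(0)
    End If

    If el Is Nothing Then Exit Function   ' nothing matched: result stays False
    el.Value = newValue
    SetFieldValue = True
End Function
```

For example, If Not SetFieldValue(ie.Document, "mesaj_icerik", "TEST") Then MsgBox "Field not found" makes a missing element visible, whereas On Error Resume Next hides it.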

How to get my VBA scraper to find the above row?

I have some experience with VBA but I am very new to web scraping with VBA. However, I am very enthusiastic about it and have thought of a thousand ways I could use it to make my job easier. :)
My problem is that I have a website with two input fields and one button. I can write in the input fields (they have ID so I can easily find them)
My code for the input fields:
.Document.getElementById("header_keyword").Value = my_first
.Document.getElementById("header_location").Value = my_last
But I am really stuck with clicking the button.
Here is the html code for the buttons:
<span class="p2_button_outer p2_button_outer_big"><input class="p2_button_inner" type="submit" value="Keresés" /></span>
<span class="p2_button_outer p2_button_outer_big light hide_floating"><a id="tour_det_search" class="p2_button_inner" href="http://www.profession.hu/kereses">Részletes keresés</a></span>
As you can see there are two different buttons near each other, and they share the same class. I am looking for the first/upper one. My problem is that it has no ID, only class, type and value. But I was not able to find getelementsbytype or getelementsbyvalue method.
Is there any solution to find the button by type or value (or both)?
Sorry if I am asking something stupid but as I said previously I am new in scraping...:)
Thank you in advance and have a nice weekend!
Fortunately I have worked out the solution. :)
What I did is the following: I searched for the relevant class, then looped through the matching elements, used the getAttribute() method to look for the specific value, and clicked the element once I found it.
Below is the working code:
Set my_classes = .Document.getElementsByClassName("p2_button_inner")
For Each class In my_classes
    If class.getAttribute("value") = "Keresés" Then
        Range("c4") = "Clicked"
        class.Click
        Exit For
    End If
Next class
Thank you!
You can use the following function. It looks for the first HTML element with the given caption. You can also limit the search to a specific HTML tag.
(The code is compatible with IE <9, which doesn't have the getElementsByClassName method.)
Public Function FindElementByCaption(dom As Object, Caption As String, _
    Optional Tag As String, Optional Nested As Boolean = True) As Object
    Dim ControlsSet As Variant
    Dim Controls As Variant
    Dim Control As Object
    '------------------------------------------------------------------------------------
    Set ControlsSet = VBA.IIf(Nested, dom.all, dom.childNodes)
    If VBA.Len(Tag) Then
        Set Controls = ControlsSet.tags(VBA.LCase(Tag))
    Else
        Set Controls = ControlsSet
    End If
    For Each Control In Controls
        If VBA.StrComp(Control.InnerHtml, Caption, vbTextCompare) = 0 Then
            Set FindElementByCaption = Control
            Exit For
        End If
    Next Control
End Function
Here is how to apply it in your code:
Dim button As Object
Set button = FindElementByCaption(.Document, "Keresés", "INPUT", True)
If Not button Is Nothing Then
    Call button.Click
Else
    Call MsgBox("Button has not been found")
End If
CSS selector:
Use a CSS selector to target the element:
input[value='Keresés']
This selects an element with the input tag whose value attribute equals 'Keresés'.
CSS query:
VBA:
You apply the selector via the querySelector method of document.
ie.document.querySelector("input[value='Keresés']").Click
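The one-liner above raises a run-time error if nothing matches, because querySelector then returns Nothing. A minimal guarded variant could look like the sketch below. ClickFirstMatch is a name invented here. It assumes the page is rendered in a document mode that supports querySelector (IE8 standards mode or later).

```vba
Option Explicit

' Click the first element matching cssSelector, if there is one.
' Returns True when an element was found and clicked.
' ClickFirstMatch is a hypothetical helper written for this example.
Function ClickFirstMatch(ByVal doc As Object, ByVal cssSelector As String) As Boolean
    Dim target As Object
    Set target = doc.querySelector(cssSelector)   ' Nothing when no match
    If target Is Nothing Then Exit Function
    target.Click
    ClickFirstMatch = True
End Function
```

Usage would be along the lines of If Not ClickFirstMatch(ie.document, "input[value='Keresés']") Then MsgBox "Button not found".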

efficient search methods for getElementsByTagName("a")

I got 1000s of rows of URLs and for each row I am using:
For i = 2 To lastRow
    Set links = html.getElementsByTagName("a")
    For Each lnk In links
        If lnk.innerText = "something" Then
            ' do something
        End If
    Next lnk
Next i
This is a commonly used method, I guess, as shown for reference by Sid's code at How to access innerText of HTML tag inside a <TD> tag.
Is the For loop (the For Each lnk one) pretty much the only method in this scenario, or are there faster, more efficient methods?
MATCH is probably meant only for sheet ranges, but I tried it anyway. It runs without error, does nothing, and takes the same time as the For loop method. I think it does nothing due to the lack of appropriate addressing in:
If Not IsError(Application.Match("something", Range("A1:A100"), 0)) 'normally used for ranges
If Not IsError(Application.Match("something", links.innertext, 0)) 'what I tried
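A note on the second line: Application.Match works on worksheet ranges and arrays, and the collection returned by getElementsByTagName has no innerText property of its own (only its items do), so there is nothing for MATCH to search. One possible alternative, offered only as a sketch and not as the accepted way to do this, is to walk the links once and index them in a Scripting.Dictionary keyed by their text. IndexLinksByText is a name made up for this example.

```vba
Option Explicit

' Build a lookup from link text to the first <a> element carrying that text.
' The document is walked once; later lookups avoid re-scanning every link.
' IndexLinksByText is a hypothetical helper written for this example.
Function IndexLinksByText(ByVal html As Object) As Object
    Dim dict As Object
    Dim lnk As Object
    Dim linkText As String

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare        ' case-insensitive keys

    For Each lnk In html.getElementsByTagName("a")
        linkText = Trim$(lnk.innerText)
        If Len(linkText) > 0 Then
            If Not dict.Exists(linkText) Then dict.Add linkText, lnk
        End If
    Next lnk

    Set IndexLinksByText = dict
End Function
```

After Set idx = IndexLinksByText(html), a check such as If idx.Exists("something") Then Set lnk = idx("something") replaces the inner loop. This only pays off when the same document is searched for several different texts. If every row loads a different page and needs a single lookup, the index still costs one full pass, so a plain For Each with Exit For on the first hit is likely just as fast.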

Resources