I have a case where I am trying to scrape multiple pages, but I noticed that the desired part to scrape is different sometimes and this makes me to use IF statements to check for the existence of object like that
Set obj = html.querySelector("div > blockquote > p > span > strong")
If obj Is Nothing Then
Set obj = html.querySelector("div > blockquote > p > strong > span")
If obj Is Nothing Then
Set obj = html.querySelector("div > blockquote:nth-child(14) > p > strong")
If obj Is Nothing Then
Set obj = html.querySelector("div > blockquote:nth-child(13) > p > strong")
If obj Is Nothing Then
Set obj = html.querySelector("div > blockquote:nth-child(12) > p > strong")
End If
End If
End If
End If
Is there an alternative and more reliable way to solve such a problem?
You have to know there are more cases for the element
Without actual html to work with unsure if there are additional alternatives such as writing simpler/more transferable css selector lists.
That said, here are two options I would consider. Option 1: For very long css selector lists. Reduce the complexity of your code and have the one level of nesting. Option 2: For shorter css selector lists, use OR syntax to test for alternate patterns.
Having each alternate list on its own line, in one place, should aid with code maintenance over time.
Dim tests() As Variant, test As Long
tests = Array( _
"div > blockquote > p > span > strong", _
"div > blockquote > p > strong > span", _
"div > blockquote:nth-child(14) > p > strong", _
"div > blockquote:nth-child(13) > p > strong", _
"div > blockquote:nth-child(12) > p > strong")
'-------------------------
'Option 1: Single nested testing for longer selector list
For test = LBound(tests) To UBound(tests)
Set obj = HTML.querySelector(tests(test))
If Not obj Is Nothing Then Exit For
Next
'Option 2: CSS OR syntax for shorter selector list
Dim selectorList As String
selectorList = Join$(tests, ",")
Set obj = HTML.querySelector(selectorList)
'--------------- then continue -------
If Not obj Is Nothing Then
' do something
End If
If going with Option 1 I might then go on to consider using a flag boolean variable
Dim found As Boolean
For test = LBound(tests) To UBound(tests)
Set obj = html.querySelector(tests(test))
found = Not obj Is Nothing
If found Then Exit For
Next
If found Then
'do something
End If
Related
On the Sofascore.com site, I will automatically take the link in cell A1 at https://www.sofascore.com/tr/football/2022-11-16 or similar, and run it in the for next loop and print some information of the matches on my table. In order to print the information, I first have to click on each match one by one. I also do this with CSS.Click. But since I'm going to do this over and over, I definitely need a loop. CSS links are very similar to each other, only 1 number changes for each match. The first thing that comes to my mind is a next loop for CSSs that automatically picks up the next CSS code, but I don't have enough knowledge. I want to consult experts, thanks in advance.
I am open to all suggestions.
Sub sofascore()
Dim x As New Selenium.ChromeDriver, i, sonsat As Integer
x.SetProfile "C:\Users\Oğuzhan\AppData\Local\Google\Chrome\User Data"
x.AddArgument ("user-data-dir=C:\Users\xyz\AppData\Local\Google\Chrome\User Data\System Profile")
x.Start "chrome", "https://www.sofascore.com/"
x.Window.Maximize
Set ks = New Selenium.Keys
sonsat = Sheets("veri").Range("A10000").End(xlUp).Row
For i = 2 To sonsat
On Error Resume Next
x.Get Range("A" & i).Value
'CLICK THE SHOW ALL MATCHES ON THE SITE.
x.FindElementByCss("#__next > div > main > div.sc-
hLBbgP.dRtNhU.sc-cabffeca-0.QpfGa > div.sc- hLBbgP.sc-eDvSVe.gjJmZQ.fEHohf.sc-cabffeca-1.iITCqu > div.sc-hLBbgP.tYcjv.sc-cabffeca-2.loALSf > div > div.sc-hLBbgP.sc-eDvSVe.bdzsxu.hryjgv > button > div > span").Click
x.SendKeys ks.Home
x.Wait 200
'CLICK ON MATCH 1 ON THE LIST
x.FindElementByCss("#__next > div > main > div.sc-hLBbgP.dRtNhU.sc-cabffeca-0.QpfGa > div.sc-hLBbgP.sc-eDvSVe.gjJmZQ.fEHohf.sc-cabffeca-1.iITCqu > div.sc-hLBbgP.tYcjv.sc-cabffeca-2.loALSf > div > div:nth-child(2) > div > div > div:nth-child(2) > a > div > div > div.sc-hLBbgP.dRtNhU.sc-9199a964-1.kusmLq").Click
x.Wait 1500
'THE FOLLOWING CODES PRINT THE MATCH DATA INTO THE COLUMNS.
codes..
..
'THIS IS IMPORTANT NOW. FOR MATCH 2, THE LOOP STARTS AGAIN AND WITH THE EXACT SAME CODES. HERE I HAVE TO MANUALLY WRITE THE 2nd CSS CODE :(
x.FindElementByCss("#__next > div > main > div.sc-hLBbgP.dRtNhU.sc-cabffeca-0.QpfGa > div.sc-hLBbgP.sc-eDvSVe.gjJmZQ.fEHohf.sc-cabffeca-1.iITCqu > div.sc-hLBbgP.tYcjv.sc-cabffeca-2.loALSf > div > div:nth-child(2) > div > div > div:nth-child(3) > a > div > div > div.sc-hLBbgP.dRtNhU.sc-9199a964-1.kusmLq").Click
x.Wait 1500
'Same codes upper
....
...
...
Next
End Sub
I've tried many ways but probably nonsense. Expert comments will enlighten me.
I have through an API fetched my data as an XML, and I wish to cycle through nodes (there are several of the same type) and add them to certain fields/a table.
Example from the XML-file:
<HistRating
xmlns="">
<EndrAr>2020</EndrAr>
<EndrMnd>7</EndrMnd>
<Rating>A</Rating>
</HistRating>
<HistRating
xmlns="">
<EndrAr>2019</EndrAr>
<EndrMnd>6</EndrMnd>
<Rating>A</Rating>
</HistRating>
I have tried the following format (at this point the XML I need is in a string in xmlDoc xmlDoc = CreateObject("MSXML2.DOMDocument.6.0"). Fully aware that this is not a really "sexy" way to write it, but I'm new at this game:
Set nodeXML = xmlDoc.getElementsByTagName("EndrAr")
Range("G1").Value = nodeXML(1).Text
Range("H1").Value = nodeXML(2).Text
Range("I1").Value = nodeXML(3).Text
Set nodeXML = xmlDoc.getElementsByTagName("EndrMnd")
Range("G2").Value = nodeXML(1).Text
Range("H2").Value = nodeXML(2).Text
Range("I2").Value = nodeXML(3).Text
Set nodeXML = xmlDoc.getElementsByTagName("Rating")
Range("G3").Value = nodeXML(1).Text
Range("H3").Value = nodeXML(2).Text
Range("I3").Value = nodeXML(3).Text
This works great as long as all three items are there. Unfortunately that is not given. If it is a new company i.e. (3) wont exist (there is one line per year above), and I would like to either set the cell to Blank or No value or something.
The result from when I run the above code:
But if I try to add a line 4 to test what happens if value does not exists I get the following (for obvious reasons)
What I would love some help with is:
Can I by some "magic" add a ifmissing (tried it, but could not get it to work)?
Other ways to add a if variable is not found, input following into cell
Or are there a complete different way I should have solved this?
This is to add accounting data from last X available years (where X is ie 4, or less if not 4 is available) from 30 nodes.
You could use an Error trapping Function. Note in the code below we choose not to use the returned boolean.
Dim myTest as String
.
.
TryReadingXmlNode nodeXML,1, myText
Range("G1").Value = myText
.
.
Public Function TryReadingXmlNode(ByVal ipNode as object, ByVal ipIndex as Long, ByRef opText as string) as boolean
On Error Resume Next
opText=ipNode.Item(ipIndex).Text
TryReadingXmlNode=Len(opText)>0
If err.number>0 then opText="NoValue"
on Error Goto 0
End Function
Start by querying all of the HistRating elements, then loop over that collection:
Const MAX_YEARS As Long = 4
Dim ratings, rating, c As Range, i as Long
Set c= Range("A1")
Set ratings = xmlDoc.getElementsByTagName("HistRating")
For Each rating in ratings
c.offset(0, i) = rating.getElementsByTagName("EndrAr")(0).Text
c.offset(1, i) = rating.getElementsByTagName("EndrMnd")(0).Text
c.offset(2, i) = rating.getElementsByTagName("Rating")(0).Text
i = i + 1
If i >= MAX_YEARS Then Exit For 'exit if processed enough nodes
Next rating
My EXCEL VBA code builds up 1500+ instances of complex objects, always with good speed.
After all data have been processed my code cleans up, i.e. all objects and collections of them are getting set to NOTHING.
Sometimes this clean-up is fast (~10s), sometimes slow (>5 minutes).
I never start this code multiple times in the same session, i.e. I always close the workbook (all workbooks) and make sure Excel is closed.
There are no conditional formattings.
Screen updating is set to FALSE.
I use MS Office Professional Plus 2016.
Does anybody have an idea why it sometimes is fast and sometimes slow?
As some code is asked for here's the upper level clean-up loop for the superItems collection:
For idx = 1 To superItems.Count
Application.statusBar = thisFunction & ": " & superItems.Count - idx & " items left"
Set superItem = superItems(idx)
If Not (superItem Is Nothing) Then
superItem.clear
Set superItem = Nothing
End If
DoEvents
Next idx
Set superItems = Nothing
... and here's the method superItem.clear:
the superItem object holds an array (pArrTmxxItems) with pointers to item objects.
ub = UBound(pArrTmxxItems)
For idx = 0 To ub
Set item = pArrTmxxItems(idx)
If Not (item Is Nothing) Then
item.clear
Set item = Nothing
End If
Next idx
... item.clear looks like this:
the item object holds a collection pCAs of CA object instances, and (b) a collection pChildren of its own kind (i.e. aChild is of the same class as item)
If Not (pCAs Is Nothing) Then
For idx = 1 To pCAs.Count
Set CA = pCAs(idx)
If Not (CA Is Nothing) Then Set CA = Nothing 'jp171107
Next idx
Set pCAs = Nothing 'jp171107
End If
If Not (pChildren Is Nothing) Then
For idx = 1 To pChildren.Count
Set aChild = pChildren.item(idx)
aChild.clear
Set aChild = Nothing
Next idx
Set pChildren = Nothing 'jp171107
End If
Redeclare all the public variables. Every single one of them, except for the constants. Take care of the scope and do not pollute it.
Delete all the part of the code, setting stuff to nothing. The VBA Environment will do it for you. Automatically.
Objective
To search through a bunch of lines in a text file, and if a match is found populate that line in a Options list that is displayed in HTA.
Eg: If 'Setup' is found in 5 lines out of total 10, all the 5 lines need to be populated as 'Options'
Code
Set objFSO = CreateObject("Scripting.Filesystemobject")
Set objRegEx = New RegExp
With objRegEx
.Pattern = "(\b" & "setup" & "\b)"
.IgnoreCase = True
.Global = True
End With
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Contents = objOpen.ReadAll
Set objMatchAll = objRegEx.Execute( Contents )
If objMatchAll.count > 0 Then
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Do Until objOpen.AtEndOfStream
Line = objOpen.ReadLine
Set objMatchAny = objRegEx.Execute( Line )
If objMatchAny.count > 0 Then
Set objOption = Document.createElement("OPTION")
objOption.Text = Line
objOption.Value = Line
ValuesList.add objOption
'Matched = Matched & vbNewLine & Line
MatchCount = MatchCount + 1
End If
Loop
Else
MsgBox "No results"
End If
Explanation
The code looks for the term 'setup' (of course this is dynamically populated at the time of execution) in the file 'FileList.lst'. When results are found an 'Option' object is generated and added to the 'ValuesList' List which is in an HTML body using tags.
Note 1: The reason i generate an 'Options' object instead of just loading the line is so that we can populate the tag. The tag is used so we can select any one of the search result.
Note 2: The reason the 'Contents' variable is created so that incase if there are no matches at all, it need not go to each line to find a match, which would take longer to just display that message.
Problem
The code works fine, tested upto 150 results (outcome), but when there is a large number of matches my HTA freezes.
Question
Can the existing code be modified to perform better, like a different method to instead of creating the an 'Options' object, an alternate method to generate the 'ValuesList' ?
Instead of running two objRegEx search results, is there way to return the matched line from 'Contents' Varialable ?
Update
Ok, i ran my script without the objOption part which is not creating and adding options to my ValuesList, only regexp parsing through 58k lines, also resulting in 58k matches and the outcome was 3secs ... so looks like i need an alternative to populate my HTA options list ... its not able to handle that many options to select from ... any alternatives ? I used the same logic in a browser and the entire browser freezes ...
It seems like you really only care about whether or not the regex matches in a particular line or not. Since you don't need to know how many matches occurred, nor do you need the actual match text, you can use the Test method instead. This should be faster because it will stop after the first match, plus it doesn't have to construct the Matches collection. I'd also leave the Global property at its default value of False for pretty much the same reason, but if you're just using the Test method, I don't think the Global property matters.
Thanks to Cheran Shunmugavel i found out that the best way is to use DocumentFragments. I impleted that concept in my code and the results were great !
New Code
Set objFSO = CreateObject("Scripting.Filesystemobject")
Set objRegEx = New RegExp
With objRegEx
.Pattern = "(\b" & "setup" & "\b)"
.IgnoreCase = True
.Global = True
End With
Set objFragment = Document.createDocumentFragment()
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Contents = objOpen.ReadAll
Set objMatchAll = objRegEx.Execute( Contents )
If objMatchAll.count > 0 Then
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Do Until objOpen.AtEndOfStream
Line = objOpen.ReadLine
Set objMatchAny = objRegEx.Execute( Line )
If objMatchAny.count > 0 Then
Set objOption = Document.createElement("OPTION")
objOption.innerHTML = Line
objFragment.appendchild objOption
MatchCount = MatchCount + 1
End If
Loop
ViewList.appendChild objFragment.cloneNode(True)
Else
MsgBox "No results"
End If
Old Code : 53mins 23secs
New Code : 31secs
I am writing an Excel macro to fill out a form on a website. I have written the code that populate the text boxes easily enough, and found code to chose radio boxes, but I am having problems with choosing info from dropdown menus.
Example 'Gender':
The combo box has three options:
Select / Male / Female
I've tried a few variations on this:
doc.getElementsByName("xs_r_gender").Item(0).Value="Male"
...but with no luck.
This is the web source code:
<td> <select name="xs_r_gender" id="xs_r_gender">
<option value="" selected>Select</option>
<option value="male">Male</option>
<option value="female">Female</option> </select></td>
Thanks.
doc.getElementById("xs_r_gender").selectedindex=1
seems to do the trick. (Where 1 represents male)
Though it means I will need to do alot of lookups to determine what the value is for the items in my dropdown. (Easy enough for Sex, where there are only two options, but I have some comboboxes with up to 50 options). If anyone knows of a faster solution, that'd be great. In the meantime, Ill start doing up some tables!!!
thanks.
Try below code assuming doc = ie.document
doc.getElementById("xs_r_gender").value = "Male"
Use this in your code to call the function below.
xOffset = SetSelect(IE.Document.all.Item("shipToStateValue"), "Texas")
doc.getElementById("shipToStateValue").selectedindex = xOffset
Then use this for your function
Function SetSelect(xComboName, xComboValue) As Integer
'Finds an option in a combobox and selects it.
Dim x As Integer
For x = 0 To xComboName.options.Length - 1
If xComboName.options(x).Text = xComboValue Then
xComboName.selectedindex = x
Exit For
End If
Next x
SetSelect = x
End Function
Thanks Stack, works for me! My solution to operate an IE HTML combobox drop down turned out to be two parts.
Part 1 was to click the pull down, here's code:
Dim eUOM1 As MSHTML.HTMLHtmlElement
Set eUOM1 = ie.document.getElementsByTagName("input")(27).NextSibling
eUOM1.Focus
eUOM1.Click
Part 2 was to choose and click the value, like this (*actual element name changed):
Dim eUOM2 As MSHTML.HTMLHtmlElement
Set eUOM2 = ie.document.getElementsByName("[*PutNameHere]")(0)
eUOM2.Value = "EA"
eUOM2.Click
Here are references:refs
You can try the querySelector method of document to apply a CSS selector of option tag with attribute value = 'male':
doc.querySelector("option[value='male']").Click
or
doc.querySelector("option[value='male']").Selected = True
Function SetSelect(s, val) As Boolean
'Selects an item (val) from a combobox (s)
'Usage:
'If Not SetSelect(IE.Document.all.Item("tspan"), "Custom") Then
'something went wrong
'Else
'continue...
'End If
Dim x As Integer
Dim r As Boolean
r = False
For x = 0 To s.Options.Length - 1
If s.Options(x).Text = val Then
s.selectedIndex = x
r = True
Exit For
End If
Next x
SetSelect = r
End Function
Try this code :
doc.getElementById("xs_r_gender").value = "Male"
doc.getElementById("xs_r_gender").FireEvent("onchange")
You can do something like this:
doc.getElementsByName("xs_r_gender").Item(1).Selected=True
or
doc.getElementById("xs_r_gender").selectedindex = 1
Where 1 is the male option (in both cases).
If the dropbox needs to fire some event in order to aknowledge your choice, it is likely that it will be the "onchange" event. You can fire it like so:
doc.getElementById("xs_r_gender").FireEvent("onchange")
If you ever want to be able to select an option based on the option's text you can use the function given by Lansman (here) .
Based on the same answer, if you want to call the option by it's value property (instead of the text, you can just change the line If xComboName.Options(x).Text = xComboValue Then to If xComboName.Options(x).value = xComboValue Then).
This should cover all bases.
Copy from Here till last line:
Sub Filldata()
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For X = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
my_url = objShell.Windows(X).document.Location
my_title = objShell.Windows(X).document.Title
If my_title Like "***Write your page name***" Then
Set IE = objShell.Windows(X)
Exit For
Else
End If
Next
With IE.document.forms("***write your form name***")
' Assuming you r picking values from MS Excel Sheet1 cell A2
i=sheet1.range("A2").value
.all("xs_r_gender").Item(i).Selected = True
End with
End sub