Problems extracting links with web scraping - excel

I want to extract the links of the toys listed in this webpage:
https://cebra.com.ar/category/73/Juego-de-Construccion.html
I have an entire procedure (I'm not copying it here because it's very long and complicated), part of which contains the following code that doesn't work:
Cells(erow, 1) = html.getElementsByTagName("a").href
Any idea how to solve this?
Thanks a lot!

getElementsByTagName returns a collection and indeed you would need to index into it to get a particular element.
However, you don't want all a tags. That is inefficient and you would need an additional test to limit to those of interest. You want specifically the links for products so use an attribute = value css selector to get those:
Dim links As Object, i As Long
Set links = html.querySelectorAll("[href^=product]")
For i = 0 To links.Length - 1
    ActiveSheet.Cells(erow + i, 1) = links.item(i).href
Next
This:
[href^=product]
looks for href attributes whose value starts with (^) product.
If you look at the page HTML, you can see that each of your target links begins with that substring.
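The same prefix filter can be sketched outside VBA. The Python snippet below is only an illustration under stated assumptions: the `SAMPLE_HTML` fragment and its hrefs are invented, and `ProductLinkParser` and `extract_product_links` are hypothetical helper names. It collects the `href` of every `a` tag and keeps only those that start with `product`, which is the test that `[href^=product]` expresses.

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for the real page; the hrefs are invented.
SAMPLE_HTML = """
<a href="index.html">Home</a>
<a href="product/101/blocks.html">Blocks</a>
<a href="category/73/toys.html">Toys</a>
<a href="product/102/bricks.html">Bricks</a>
"""


class ProductLinkParser(HTMLParser):
    """Collects href values of <a> tags that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Same test as the CSS selector [href^=product]: prefix match on href.
        if href is not None and href.startswith(self.prefix):
            self.links.append(href)


def extract_product_links(html_text, prefix="product"):
    """Return the hrefs in html_text that begin with prefix, in page order."""
    parser = ProductLinkParser(prefix)
    parser.feed(html_text)
    return parser.links


product_links = extract_product_links(SAMPLE_HTML)
for row_offset, link in enumerate(product_links):
    # row_offset plays the role of i in Cells(erow + i, 1)
    print(row_offset, link)
```

Filtering on the attribute value while collecting avoids fetching every `a` element first and testing each one afterwards.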

The function getElementsByTagName() of the HTMLDocument object returns a collection, but you're trying to access the .href property as if it were a single element.
You should replace this:
Cells(erow, 1) = html.getElementsByTagName("a").href
with this
Cells(erow, 1) = html.getElementsByTagName("a")(yourIndex).href
... where yourIndex is a number representing the index into the collection (0, 1, ... n).
Of course you'll have to find the correct rule to get the right a element at the right place, as just getting all the elements of the document with tag a retrieves 278 elements on your page (including all the page headers, footers and other things I don't really think you need).

Related

(Python) Selenium list objects are not callable when looping through them (autocomplete don't work either)

I'm fairly new to Python and Selenium.
I'm trying to gather elements from a webpage using Python and Selenium in VS Code.
I've already done similar things in other webpages so I can confirm the setups and the drivers all work fine.
Here is the code which I'll try to explain line by line.
# Creating an empty list of names
Names = []
# Finding and saving the table I'm interested in
Table = driver.find_element_by_id("pokedex")
# Finding and saving in a list all the elements with a particular class name
NameCells = Table.find_elements_by_class_name("cell-name")
# Looping through the list
for NameCell in NameCells:
    # If it finds a child element with a particular class...
    if NameCell.find_elements(By.CLASS_NAME, "text-muted"):
        # ... append it to the list once transformed into text
        Names.append(NameCell.find_element(By.CLASS_NAME, "text-muted").text)
    # ... else...
    else:
        # ... append an element with another class to the list once transformed into text.
        Names.append(NameCell.find_element(By.CLASS_NAME, "ent-name").text)
# ... and print the list.
print(Names)
The problem is that while I can use functions like "find_element" in the second and third lines of code, I can't use them in the for loop, in the fifth line of code.
VS Code doesn't even show me the expected functions after typing the ".".
I tried to complete it myself hoping it would work, but of course it didn't.
Why does this happen?
Why can't I use WebElement functions at times?
I've noticed it happens mainly with lists of objects rather than single objects.

Access specific index of array with Pug

How can you display a specific item in an array with Pug? For example:
each answer in answers
  li!= answer.Response
This will display each item in the array. But say I wanted just the third item or, better yet, to pass a variable for a specific index to display. What is the syntax for this?

- const indexIwant = 2;
if answers && answers.length > indexIwant
  li= answers[indexIwant]
You need to ensure answers is not null and has at least the number of items to include the indexed item you want.
Another thing: don't use != unless you know exactly what data you are handling, because it writes the value into the page without escaping it.

The simplest way to access a specific index of an array in pug:
-const meals = ["breakfast", "lunch", "dinner"]
-const favoriteDishes = ["coffee & doughnut salad","cheese danish soup","red wine","banana split sandwich"]
-const sides = ["ranch dressing","chutney","ketchup","chocolate sauce"]
p I reckon I will fix myself a hefty helping of #{favoriteDishes[2]} for #{meals[0]} with a side of #{sides[2]}.
Considering that indentation and whitespace are everything in Jade / Pug, this works :))

Can't access dynamic element on webpage

I can't access a textbox on a webpage; it's a dynamic element. I've tried to filter it by many attributes in the XPath, but it seems that the number that changes in the id and name is the only unique part of the element's XPath. All the filters I try match at least 3 elements. I've been trying for 2 days and really need some help here.
from selenium import webdriver

# clicks on a button
def click_btn(submit_xpath):
    submit_box = driver.find_element_by_xpath(submit_xpath)
    submit_box.click()
    driver.implicitly_wait(7)
    return

# sends text to a text box
def send_text_to_box(box_xpath, text):
    box = driver.find_element_by_xpath(box_xpath)
    box.send_keys(text)
    driver.implicitly_wait(3)
    return

descr = "Can't send this text"
# the number here is the changeable part of the xpath
send_text_to_box('//*[@id="textfield-1285-inputEl"]', descr)
Edit: it works now with the following XPath: //input[contains(@id, 'textfield') and contains(@aria-readonly, 'false') and contains(@class, 'x-form-invalid-field-default')]. Luckily I found something specific to this element.

You can use a partial string to find the element instead of an exact match. That is, in place of
send_text_to_box('//*[@id="textfield-1285-inputEl"]', descr) please try send_text_to_box('//*[contains(@id,"inputEl")]', descr)
If multiple elements have the string 'inputEl' in their id, look for something else that remains constant (some other property, perhaps). Otherwise, try finding some other element and then traversing to the required input.
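A partial match can also pin down both stable ends of the id at once. The sketch below is an assumption-laden illustration, not the asker's code: `xpath_for_dynamic_id` and `id_matches` are hypothetical helpers, and it assumes that `textfield-` and `-inputEl` stay constant while only the number in between changes, as the question describes. XPath 1.0 offers `starts-with()` and `contains()` but no `ends-with()`, so the suffix is checked with `contains()`.

```python
def xpath_for_dynamic_id(tag, stable_prefix, stable_suffix):
    """Build an XPath 1.0 expression for ids shaped like <prefix><number><suffix>."""
    # XPath 1.0 has starts-with() and contains() but no ends-with(),
    # so the stable suffix is matched with contains().
    return (
        f'//{tag}[starts-with(@id, "{stable_prefix}") '
        f'and contains(@id, "{stable_suffix}")]'
    )


def id_matches(element_id, stable_prefix, stable_suffix):
    """Pure-Python equivalent of the predicate, for checking sample ids offline."""
    return element_id.startswith(stable_prefix) and stable_suffix in element_id


# Assumed stable parts of the id, taken from "textfield-1285-inputEl".
xpath = xpath_for_dynamic_id("input", "textfield-", "-inputEl")
print(xpath)

# With Selenium, this string would be passed to the question's helper, e.g.
# send_text_to_box(xpath, descr)
```

If several text fields share this id pattern, the expression still matches more than one element, so an extra condition on another attribute would be needed, as in the asker's edit.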

fetch html data from website into excel vba

I get the error "Object does not support this property or method."
The error occurs on this line:
For Each ele In objIE.document.getElementsByClassName("table info-table").getElementsByTagName("tr")

getElementsByClassName() returns a collection of matching elements (even if there's only one match), so you need something like (e.g.):
For Each ele In objIE.document.getElementsByClassName( _
"table info-table")(0).getElementsByTagName("tr")
which will loop over the tr elements in the first table with a matching class name.
If you need a different table, you'll need to adjust the (0).

Using load with data from cells

In my code I'm trying to use load with entries from a cell, but it is not working. The portion of my code below produces a 3 dimensional array of strings. The strings represent the paths to file names.
for i = 1:Something
    for j = 1:Something Different
        for k = 1: Yet Something Something Different
            DataPath{j,k,i} = 'F:\blah\blah\blah\fileijk'; % file changes based on i, j, and k
        end
    end
end
In the next part of the code I want to use load to open the files using the path names defined in the code above. I do this using the code below.
Dummy = DataPath{l,(k-1)*TSRRange+m};
Data = load(Dummy);
The idea is for Dummy to take the string content out of DataPath so I can use it in load. By doing this I thought that Dummy would be defined as a string and not a cell, but this is not the case. How do I pull the string out of DataPath so I can use it with load? Thanks.
I have to load the data this way because the data is located in multiple folders. I can post more of the code if needed, but it is complex.

Dummy is a cell because you built a 3D cell array but are indexing it with only two subscripts in Dummy = DataPath{1,(k-1)*TSRRange+m}.
I don't believe you can expect to access all cell elements this way. Instead, use three indices, just as you did when creating it.
