<a class="link__f5415c25" href="/profiles/people/1515754-andrea-jung" title="Andrea Jung">
I have the HTML element above and tried using
driver.find_elements_by_class_name('link__f5415c25')
and
driver.get_attribute('href')
but neither works. I expected to extract the value of href.
How can I do that? Thanks!
You have to first locate the element, then retrieve the attribute href, like so:
href = driver.find_element_by_class_name('link__f5415c25').get_attribute('href')
If there are multiple links associated with that class name, you can try something like:
eList = driver.find_elements_by_class_name('link__f5415c25')
hrefList = []
for e in eList:
    hrefList.append(e.get_attribute('href'))
for href in hrefList:
    print(href)
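Recent Selenium 4 releases removed the find_element_by_* helpers in favor of find_elements(by, value). A minimal sketch of the same loop in that style follows. The function name collect_hrefs is made up for illustration, and the locator is passed as the plain string "class name", which is the value of By.CLASS_NAME, so that the helper itself does not need to import Selenium.

```python
# Locator strategy string; identical to selenium.webdriver.common.by.By.CLASS_NAME.
CLASS_NAME = "class name"


def collect_hrefs(driver, class_name):
    """Return the href of every element with the given class name.

    `driver` is assumed to be a Selenium WebDriver (anything exposing
    find_elements(by, value) works). Elements without an href are skipped.
    """
    elements = driver.find_elements(CLASS_NAME, class_name)
    hrefs = [element.get_attribute("href") for element in elements]
    return [href for href in hrefs if href]
```

With a live driver this would be called as collect_hrefs(driver, 'link__f5415c25').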
I'm working with html loosely structured like this:
...
<div class='TL-dsdf2323...'>
<a href='/link1/'>
(more stuff)
</a>
<a href='/link2/'>
(more stuff)
</a>
</div>
...
I want to be able to return all of the hrefs contained within this particular div. So far it seems like I am able to locate the proper div
div = driver.find_elements_by_xpath("//div[starts-with(@class, 'TL')]")
This is where I'm hitting a wall though. I've gone through other posts and tried several options such as
links = div.find_elements_by_xpath("//a[starts-with(@href,'/link')]")
and
div.find_element_by_partial_link_text('/link')
but I keep returning empty lists. Any idea where I'm going wrong here?
Edit:
here's a picture of the actual html. I simplified the div class name from ThumbnailLayout to TL and the href /listing to /link
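As @mr_mooo_cow pointed out in a comment, a delay was needed in order to extract the links. Here is the final working code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# wait up to 10 seconds for the matching links to be present in the DOM
a_tags = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located( (By.XPATH, "//a[starts-with(@href,'/listing')]") ))
links = []
for link in a_tags:
    links.append(link.get_attribute('href'))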
Can you try something like this:
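links = driver.find_elements_by_xpath("//a[starts-with(@href,'/link') and ancestor::div[starts-with(@class, 'TL')]]")
./ refers to the current (context) node in XPath, so ./div would look for a div child of the link; ancestor::div tests the enclosing div instead. I haven't tested this so let me know if it doesn't work.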
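One detail that often causes confusion here: an XPath that starts with // searches the whole document even when it is called on an element, while a leading .// restricts the search to that element's descendants. The sketch below illustrates the scoped search with the standard library's xml.etree.ElementTree instead of Selenium, purely so that it runs without a browser. The markup follows the simplified HTML from the question; the /link0/ and /other/ anchors are made up for illustration. ElementTree has no starts-with(), so the prefix checks are done in Python.

```python
import xml.etree.ElementTree as ET

# Simplified markup modeled on the question; two anchors are invented
# to show what the scoped search leaves out.
markup = """
<body>
  <a href="/link0/">outside the div</a>
  <div class="TL-dsdf2323">
    <a href="/link1/">one</a>
    <a href="/link2/">two</a>
    <a href="/other/">three</a>
  </div>
</body>
"""

root = ET.fromstring(markup)

# Step 1: locate the divs whose class starts with "TL".
target_divs = [d for d in root.iter("div") if d.get("class", "").startswith("TL")]

# Step 2: search only inside each div (".//a"), then filter on the href prefix.
links = [
    a.get("href")
    for div in target_divs
    for a in div.findall(".//a")
    if a.get("href", "").startswith("/link")
]
print(links)  # ['/link1/', '/link2/']
```

In Selenium the equivalent scoped call would use the same relative form, for example ".//a[starts-with(@href,'/link')]", on a single element rather than on the list returned by find_elements.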
I am trying to get href attribute from an html list using Robot Framework keywords. For example suppose the html code
<ul class="my-list">
<li class="my-listitem"><a href="...">...</li>
...
<li class="my-listitem"><a href="...">...</li>
</ul>
I have tried to use the keywords Get WebElement and Get WebElements with a FOR loop, without success. How can I do it?
This is my MWE
*** Test Cases ***
@{a tags} =    Create List
@{href attr} =    Create List
@{li items} =    Get WebElements    class:my-listitem
FOR    ${li}    IN    @{li items}
    ${a tag} =    Get WebElement    tag:a
    Append To List    @{a tags}    ${a tag}
END
FOR    ${a tag}    IN    @{a tags}
    ${attr} =    Get Element Attribute    css:my-listitem    href
    Append To List    @{href attr}    ${attr}
END
Thanks in advance.
The href is an attribute of the a elements, not the li, thus you need to target them. Get a reference for all such elements, and then get their href in the loop:
${the a-s}=    Get WebElements    xpath=//li[@class='my-listitem']/a    # by targeting the correct element, the list is a reference to all such "a" elements
${all href}=    Create List
FOR    ${el}    IN    @{the a-s}    # loop over each of them
    ${value}=    Get Element Attribute    ${el}    href    # get the individual href
    Append To List    ${all href}    ${value}    # and store it in a result list
END
Log To Console    ${all href}
Here is a possible solution (not tested):
@{my_list}=    Get WebElements    xpath=//li[@class='my-listitem']/a
FOR    ${element}    IN    @{my_list}
    ${attr}=    Get Element Attribute    ${element}    href
    Log    ${attr}    html=True
END
allId=soup.find_all("tr","data-id")
I just want the values of data-id. How can I scrape them from these tags?
To fetch the value of data-id, try this:
allId=soup.find_all("tr",attrs={"data-id" : True})
for item in allId:
    print(item['data-id'])
You can also use a CSS selector:
allId=soup.select("tr[data-id]")
for item in allId:
    print(item['data-id'])
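If BeautifulSoup is not available, the same extraction can be sketched with the standard library's html.parser. This is an alternative technique, not what the answer above uses; the class name DataIdCollector and the sample table are made up for illustration.

```python
from html.parser import HTMLParser


class DataIdCollector(HTMLParser):
    """Collect the data-id value of every <tr> start tag that has one."""

    def __init__(self):
        super().__init__()
        self.data_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "tr":
            for name, value in attrs:
                if name == "data-id":
                    self.data_ids.append(value)


# Hypothetical sample: the middle row has no data-id and is skipped.
sample = (
    '<table>'
    '<tr data-id="101"><td>a</td></tr>'
    '<tr><td>b</td></tr>'
    '<tr data-id="102"><td>c</td></tr>'
    '</table>'
)

collector = DataIdCollector()
collector.feed(sample)
print(collector.data_ids)  # ['101', '102']
```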
I want to find all the elements that have a certain class name, but skip the ones that also have another class name besides the one I am searching for.
I have the element <div class="examplenameA"> and the element <div class="examplenameA examplenameB">
At the moment I am doing this to work around the problem:
items = driver.find_elements_by_class_name('examplenameA')
for item in items:
    cname = item.get_attribute('class')
    if 'examplenameB' in cname:
        pass
    else:
        ...  # rest of code
I only want the elements that have the class name examplenameA, and I want to skip the ones that also contain examplenameB.
To find all the elements with class attribute as examplenameA leaving out the ones with class attribute as examplenameB you can use the following solution:
css_selector:
items = driver.find_elements_by_css_selector("div.examplenameA:not(.examplenameB)")
xpath:
items = driver.find_elements_by_xpath("//div[contains(@class, 'examplenameA') and not(contains(@class, 'examplenameB'))]")
You can use XPath in this case. As per your example, you need something like driver.find_elements_by_xpath("//div[@class='examplenameA']"). This will give you only the elements whose class is exactly examplenameA.
XPath works like this: //tagname[@attribute='value']
Hence the class is considered as the attribute & xpath will try to match the exact given value, in this case examplenameA, so <div class="examplenameA examplenameB"> will be ignored
In case of find_elements_by_class_name method, it will try to match the element which has the class as examplenameA, so the <div class="examplenameA examplenameB"> will also be matched
Hope this helps
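The class attribute is a space-separated list of tokens, so a substring test such as 'examplenameB' in cname, or contains() in XPath, can also hit unrelated names that merely share a prefix. A minimal sketch of a token-based check, applied to the string returned by get_attribute('class'), might look like this; the helper names and the examplenameAB sample are made up for illustration.

```python
def has_class(class_attr, name):
    """True if `name` is one of the space-separated tokens in a class attribute."""
    return name in class_attr.split()


def is_wanted(class_attr):
    """Keep examplenameA elements, skip those that also carry examplenameB."""
    return has_class(class_attr, "examplenameA") and not has_class(class_attr, "examplenameB")


# The first two values come from the question; the third is a made-up
# look-alike that a substring test for 'examplenameA' would wrongly accept.
samples = ["examplenameA", "examplenameA examplenameB", "examplenameAB"]
print([c for c in samples if is_wanted(c)])  # ['examplenameA']
```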
Here is a way to get unique values. It doesn't work if I want to get unique attribute values.
For example:
<a href = '11111'>sometext</a>
<a href = '11121'>sometext2</a>
<a href = '11111'>sometext3</a>
I want to get the unique hrefs. I am restricted to XPath 1.0.
page_src.xpath( '(//a[not(.=preceding::a)] )')
page_src.xpath( '//a/@href[not(.=preceding::a/@href)]' )
return duplicates.
Is it possible to solve this without unique-values()?
UPD: it's not the function-style solution I wanted, but I wrote a Python function that iterates over the parent elements and checks whether adding the parent tag as a prefix filters the links down to the needed count.
Here is my example:
_x_item = (
    '//a[starts-with(@href, "%s")'
    'and (not(@href="%s"))'
    'and (not (starts-with(@href, "%s"))) ]'
    %(param1, param1, param2 ))
#rm double links
neededLinks = list(map(lambda vasa: vasa.get('href'), page_src.xpath(_x_item)))
if len(neededLinks)!=len(list(set(neededLinks))):
    uniqLength = len(list(set(neededLinks)))
    breakFlag = False
    for linkk in neededLinks:
        if neededLinks.count(linkk)>1:
            dupLinks = page_src.xpath('//a[@href="%s"]'%(linkk))
            dupLinkParents = list(map(lambda vasa: vasa.getparent(), dupLinks))
            for dupParent in dupLinkParents:
                tempLinks = page_src.xpath(_x_item.replace('//','//%s/'%(dupParent.tag)))
                tempLinks = list(map(lambda vasa: vasa.get('href'), tempLinks))
                if len(tempLinks)==len(set(neededLinks)):
                    breakFlag = True
                    _x_item = _x_item.replace('//','//%s/'%(dupParent.tag))
                    break
            if breakFlag:
                break
This WILL work if the duplicate links have different parents but the same @href value.
As a result I add a parent.tag prefix, like //div/my_prev_x_item
Plus, using Python, I can refine the result to //div[@key1="val1" and @key2="val2"]/my_prev_x_item by iterating over dupParent.items(). But this only works if the items are not located in the same parent object.
In the end I need only the XPath expression, so I can't just use list(set(myItems)).
I want an easier solution (like unique-values()), if one exists. Plus, my solution does not work if the links' parent is the same.
You can extract all the hrefs and then find the unique ones:
all_hrefs = page_src.xpath('//a/#href')
unique_hrefs = list(set(all_hrefs))
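Note that set() does not keep the hrefs in document order. If order matters, a small variation using dict.fromkeys keeps the first occurrence of each value (dicts preserve insertion order in Python 3.7+). The helper name unique_in_order is made up for illustration, and the sample list stands in for the result of page_src.xpath('//a/@href') on the question's example.

```python
def unique_in_order(values):
    """Drop duplicates while keeping the first occurrence of each value."""
    return list(dict.fromkeys(values))


# The three href values from the question's example markup.
all_hrefs = ["11111", "11121", "11111"]
print(unique_in_order(all_hrefs))  # ['11111', '11121']
```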