extract links from a page using selenium


I am new to Selenium and am learning how to extract what I need with it.
I want to extract hyperlinks from a webpage, but only those with a specific tag and class. The hyperlinks are all nested in the following structure:
<a title="Chris Frye" class="_32mo" href="https://www.facebook.com/CnMFrye"><span>Chris Frye</span></a>
However, when I select by the tag 'a' alone, other hyperlinks are scraped as well, so I believe I need to filter on both the tag 'a' and the class.
In this case, what is the right strategy? I can't seem to use driver.find_elements_by_tag_name, because it matches on the tag name only.
The page I want to scrape is: https://www.facebook.com/public/chris-frye

You can use a CSS selector like below:
elements = driver.find_elements_by_css_selector('a._32mo')
Or use XPath:
elements = driver.find_elements_by_xpath("//a[@class='_32mo']")
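
Since the question asks for the hyperlinks rather than the elements, here is a minimal sketch of reading the href from each match. The helper name collect_hrefs is hypothetical, and it assumes a driver object exposing find_elements(by, value), as Selenium's WebDriver does.

```python
def collect_hrefs(driver, css_selector="a._32mo"):
    """Return the href of every element matching css_selector, in page order."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR,
    # so this helper does not need to import selenium itself.
    elements = driver.find_elements("css selector", css_selector)
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip matches that carry no href attribute
            hrefs.append(href)
    return hrefs
```

Note that the CSS selector a._32mo matches any anchor whose class list includes _32mo, while the XPath predicate @class='_32mo' only matches when the whole class attribute equals that string.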

Related

How do I get the class name from HTML using the absolute xpath

In the following picture, the full xpath of the yellow highlighted bit of HTML is
/html/body/bx-site/ng-component/div/sp-sports-ui/div/main/div/section/main/sp-path-event/div/sp-next-events/div/div/div[2]/div[1]/sp-coupon[1]/sp-multi-markets/section/section/sp-outcomes/sp-two-way-vertical[2]/ul/li[1]/sp-outcome/button
I am using Selenium to scrape some data from a website. The text at that XPath is what I want, but I also need the class name of the highlighted element. The class name changes constantly, so I need a way to retrieve it along with the text. In this case the class name would be "bet-btn". I am using driver.find_element_by_xpath to get the text from the HTML, but can't figure out how to retrieve the class name. Given the XPath, is there a way in Selenium to retrieve the class name of the highlighted element?

I would advise against using an absolute XPath unless you really need to.
Try this instead:
elem = driver.find_element_by_xpath("//sp-outcome/button")
class_value = elem.get_attribute("class")
Note that this XPath assumes there are no other //sp-outcome/button paths on the page. If there are, you would need to make it more specific, but you still wouldn't need the entire absolute XPath. Absolute paths are generally pretty fragile.
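
To see why the short relative path is enough, here is a small offline sketch. It uses Python's standard-library ElementTree as a stand-in for the browser, not Selenium itself, and the markup and the "2.50" value are made up for illustration. With Selenium, the equivalents would be elem.text and elem.get_attribute("class").

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for the page's markup.
SAMPLE = """
<html><body><main><sp-outcomes><ul><li>
  <sp-outcome><button class="bet-btn">2.50</button></sp-outcome>
</li></ul></sp-outcomes></main></body></html>
"""

def text_and_class(root, path=".//sp-outcome/button"):
    """Return (text, class attribute) of the first element matching path."""
    button = root.find(path)
    if button is None:
        return None
    return button.text, button.get("class")

root = ET.fromstring(SAMPLE)
print(text_and_class(root))  # ('2.50', 'bet-btn')
```

The relative path keeps working if wrapper elements are added or removed above sp-outcome, which is exactly what breaks an absolute path.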

Selenium: failing to find element by XPATH Python

I am fairly new to programming, but Python really got me into it. I am trying to create a program that automatically checks for updates on a website. I've successfully implemented the necessary code to open the enrollment page, but there is one element that cannot be located. Since I have to do this for multiple courses and iterate through them, there is no fixed id. I've tried to find it by title, but this didn't work either.
Is there a way to locate the button with the title "enroll"?
I've tried
driver.find_element_by_xpath("//a[@title='enroll']").click()
but this didn't work and I always get a
NoSuchElement
error.
The XPath for the button is simply: //*[@id="id572"]
Here is the part of the HTML code:

From the screenshot of HTML code you provided, the element is <button>, not <a>.
Try this XPath expression: //button[@title='enroll']

This should do it if the button is not inside an iframe. Just grab the button whose title is enroll and click it. Your XPath targeted an a tag, and the id is probably generated dynamically.
driver.find_element_by_css_selector("button[title ='enroll']").click()
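
If the button may be missing for some courses, one option is to look it up with find_elements, which returns an empty list instead of raising NoSuchElementException. A minimal sketch, assuming a Selenium-style driver; the helper name click_first_match is hypothetical.

```python
def click_first_match(driver, css_selector):
    """Click the first element matching css_selector.

    Returns True if something was clicked, False if nothing matched.
    """
    # find_elements returns an empty list when nothing matches, whereas
    # find_element raises NoSuchElementException.
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    matches = driver.find_elements("css selector", css_selector)
    if not matches:
        return False
    matches[0].click()
    return True
```

Usage would look like click_first_match(driver, "button[title='enroll']"), so a loop over many courses can skip the ones without an enroll button.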

Node.js Selenium can't find element. No such element error

So I have this element:
and I just can't find it. I tried
driver.findElement(webdriver.By.partialLinkText('my_sites')).click();
but that just throws a no such element error.

I believe selecting by partial link text searches the visible text (the text between the opening and closing a tags) rather than the href. Since you have no text within the a tag, it is not finding it. You would have to find it using an XPath, something like "//a[contains(@href, 'my_sites')]"

In Selenium, link text is what you find between the opening and closing tags of the a element, for example:
<a href="...">LinkText</a>
You can try to select via a CSS selector instead:
driver.findElement(By.cssSelector("a[href*='my_sites']")).click();
Check this link for more info:
How to click a link whose href has a certain substring in Selenium?
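
The difference between matching on link text and matching on href can be checked offline. Below is a sketch in Python using the standard-library html.parser rather than Selenium. The markup is hypothetical, since the original element is not shown: it assumes a link that contains only an icon and no text.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Record (href, visible text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical markup: the first link has an icon but no text of its own.
SAMPLE = '<a href="/user/my_sites"><i class="icon"></i></a><a href="/help">Help</a>'

collector = AnchorCollector()
collector.feed(SAMPLE)
collector.close()

# Searching the visible text finds nothing; searching the href finds the link.
by_text = [href for href, text in collector.anchors if "my_sites" in text]
by_href = [href for href, text in collector.anchors if "my_sites" in href]
print(by_text, by_href)  # [] ['/user/my_sites']
```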

Can selenium find location of link and just not search for a specific link?

I'm building a program that looks over Google News and gets the links for the top ten stories, but I'm running into problems. I can't just tell it to find a specific link, because the links change every day. Can Selenium find the location where these links would be? If not, what can I do?

Try finding an element that uniquely identifies the URLs. I looked at the HTML and noticed that the links are all within an <h2> tag with class 'esc-lead-article-title'. So by simply using XPath I was able to fetch the URLs.
links = driver.find_elements_by_xpath("//h2[@class='esc-lead-article-title']/a")
for link in links:
    print(link.get_attribute("href"))

You can get all the URLs by finding every "a" tag and reading its "href" attribute.
Try the Selenium Java sample code below:
List<WebElement> links = mDriver.findElements(By.tagName("a"));
for (WebElement link : links) {
    System.out.println("Link= " + link.getAttribute("href"));
}

Scrape all Text on a Webpage that is buried within Tags in Python 3

I need to scrape a webpage (https://www304.americanexpress.com/credit-card/compare) but I am running into an issue -- the text that I need on the front page is absolutely buried within many different formatting tags.
I know how to scrape a regular page using Beautiful Soup but this is not giving me what I want (i.e. text is missing, some tags make it through...)
import requests
from bs4 import BeautifulSoup
from collections import Counter
urls = ['https://www304.americanexpress.com/credit-card/compare']
with open('thisisanew.txt', 'w', encoding='utf-8') as outfile:
    for url in urls:
        website = requests.get(url)
        soup = BeautifulSoup(website.content)
        text = [''.join(s.findAll(text=True)) for s in soup.findAll('p')]
        for item in text:
            print(''.join([element.text for element in soup.body.find_all(lambda tag: tag != 'script', recursive=False)]))
Is there a special way to scrape this particular webpage?

This is just a regular webpage. For instance, <span class="card-offer-des"> contains the text "after you use your new Card to make $1,000 in purchases within the first 3 months." I also tried turning off JavaScript in the browser. The text is still there, as it should be.
So I don't really see what the problem is. Also, I would suggest that you learn lxml and XPath. Once you know how they work, it's actually easier to get the text you want.
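
As a rough illustration of pulling text out of nested formatting tags, here is a sketch that uses only the standard-library html.parser (not BeautifulSoup or lxml). It collects every non-empty text node while skipping script and style content. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)

def visible_text(html):
    """Return the list of visible text fragments found in html."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

For the page in the question, visible_text(requests.get(url).text) would return the fragments in document order, however deeply they are nested.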

The code you should try in Python is:
if "what-have-you" not in StringPulledFromSite:
    continue
if "what-have-you" in StringPulledFromSite:
    [your code to save to the filesystem]
And the string you should aim for would be something like:
((<span class=\") && (/>))
You should try to find both delimiters (and be specific, so that you can easily tell them apart). Once you've found both, take the string between them, test it, and save the text.
