Issue with web scraping in Python - python-3.x

I have a problem with my code.
When I run the code below, I want the page I opened to stay open, but after a short period of time it closes.
Can you help me?
from selenium import webdriver
driver = webdriver.Chrome()
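One likely cause: the browser is closed when the Python process exits. A minimal sketch that keeps the window open, assuming a chromedriver recent enough to support the "detach" experimental option (the URL is a placeholder):
from selenium import webdriver

options = webdriver.ChromeOptions()
# "detach" keeps the Chrome window open after the script exits
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")  # placeholder URL
Alternatively, a plain input("Press Enter to quit...") at the end of the script keeps the process, and therefore the browser, alive until you dismiss it.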

Related

How can I find elements that are not in the page source using Selenium (Python)?

Currently I'm trying to scrape something from a website. For that I need the content of an email, so I use yopmail (https://yopmail.com).
In yopmail the mails are listed on the left side of the screen with each mail's subject under it. This text is the part I need.
[the mail view][1]
[the devtools code][2]
The problem is that this code is not available in the page source. From what I read online, this can be caused by JavaScript generation, although I'm not sure that is exactly the problem.
I've tried multiple solutions:
Attempt 1: using BeautifulSoup to locate the element (failed because it is not in the page source).
Attempt 2: locating the element by XPath with the Selenium driver (also unable to find it).
Attempt 3: getting the inner HTML of the body (the element is still not in that HTML):
driver.find_element_by_tag_name('body').get_attribute('innerHTML')
It feels like nothing works, and the other related posts here don't give me an answer that helps. Is there anyone who can help me with this?
[1]: https://i.stack.imgur.com/vTi0s.png
[2]: https://i.stack.imgur.com/nmBZ8.png
It seems the element you are trying to get is inside an iframe, which is why you are not able to locate it. You first have to switch to the iframe:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'ifinbox')))
element = driver.find_element(By.XPATH, "//div[@class='lms']")
print(element.text)
When you are done, you can switch back to the default content by using:
driver.switch_to.default_content()
NOTE: You need to import the following
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
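Putting the pieces together, a minimal end-to-end sketch (the starting URL is an assumption; you still have to open an inbox first):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://yopmail.com")  # assumed entry point

# switch into the inbox iframe, then read the subject element
WebDriverWait(driver, 10).until(
    EC.frame_to_be_available_and_switch_to_it((By.ID, 'ifinbox')))
element = driver.find_element(By.XPATH, "//div[@class='lms']")
print(element.text)

# back to the top-level document when done
driver.switch_to.default_content()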

Getting data from pages of a site

I'm trying to get a site's data with requests, and then I want to go to the next page and get the data again.
The simplest way would be to put page numbers at the end of the site's URL, but the problem is that the URL doesn't reload for the next page and doesn't contain a page number.
I can click on the next page button with Selenium, but I don't know how to get the data, because as far as I know the Selenium driver doesn't have .text or any other such functions.
What can I do?
This is the part of my code that tries to access the site's data:
from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path='/Users/payasystem1/w/samane_tadarokat/chromedriver')

# URL of website
url = "https://etend.setadiran.ir/etend/index.action"
driver.get(url)  # driver.get() returns None, so there is nothing useful to print from it
time.sleep(5)  # let the user actually see something!

# click the "next page" button
next_button = driver.find_element_by_id('next_tendersGrid_pager')
next_button.click()
time.sleep(5)
driver.quit()
As you see, I can access the site and the next page, but I don't know how to get the table data shown on the page into my program.
If you know, please help me!
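A note on the approach: the driver object itself has no .text, but every element you locate does, and driver.page_source returns the rendered HTML, which you can hand to BeautifulSoup after each click. A sketch along those lines (the plain table selector is an assumption; check the real markup in devtools):
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome(executable_path='/Users/payasystem1/w/samane_tadarokat/chromedriver')
driver.get("https://etend.setadiran.ir/etend/index.action")
time.sleep(5)

for page in range(3):  # scrape a few pages for illustration
    # parse the rendered HTML of the current page
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for row in soup.select("table tr"):  # assumed: the data sits in a plain <table>
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)
    # move to the next page and let it load
    driver.find_element_by_id('next_tendersGrid_pager').click()
    time.sleep(5)

driver.quit()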

Clicking on the latest date link using Python + Selenium gives me a None object

My aim is to click on the first link (latest by date) under the 'FIXED INCOME SECURITIES' tab on a website. For this, I am trying the code below:
import time
from selenium import webdriver
from selenium.webdriver import ActionChains
browser = webdriver.Chrome('chromedriver.exe')
browser.get('https://www.fbil.org.in/#/home')
browser.find_element_by_id('content-C').click()
link = browser.find_element_by_xpath('//*[@id="Gsec"]/tbody/tr[1]/td[2]/div/a')
link.click()
browser.quit()
With the above code, I am able to click on the 'FIXED INCOME SECURITIES' tab, and the links show up under the GSEC tab. But the code does not move on to click the first link (latest by date).
Can anybody please help me find out what I am doing wrong here?
After you click on the first link (latest by date) of the 'FIXED INCOME SECURITIES' tab, an Excel file starts downloading. The webdriver then shuts down the browser at browser.quit(), even though the download has not finished yet.
So, if you click the first link just to download the file, you should add a wait for the download.
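A sketch of such a wait: poll the download directory until Chrome has no partial .crdownload files left (the directory and timeout are assumptions for illustration):
import os
import time

def wait_for_downloads(download_dir, timeout=60):
    # Chrome writes in-progress downloads as *.crdownload files;
    # the download is finished once none of those remain.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not any(name.endswith(".crdownload") for name in os.listdir(download_dir)):
            return True
        time.sleep(1)
    return False

# in the question's code, between link.click() and browser.quit():
# wait_for_downloads(os.path.expanduser("~/Downloads"))  # assumed download dir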

Unable to click on element using Selenium

Newbie here. I've been reading the site for a while, as I'm still new to coding, but I'm hoping you can help.
I've worked my way through some tutorials/worked examples on web scraping and am looking at the website http://enfa.co.uk/
I am trying to open an instance of Chrome using chromedriver with selenium in Python and click on one of the sidebar options on the website called 'Clubs' - located on the left of the homepage.
I've navigated to the element that needs to be clicked and taken the XPath to use in my code (a simple use of 'inspect' in the Chrome dev tools when hovering over the 'Clubs' link, then copying the XPath). My code opens Chrome fine (so there are no issues with chromedriver or that part of the project), but I receive an error telling me the object has no click attribute.
I've tried returning the object, and it states my list has no elements (which seems to be the problem), but I am unsure why... Am I using an incorrect XPath, or do some websites react differently, i.e. not respond to a click request like this?
I have run my code on other sites to check that I'm using the click function correctly, and it seems to work fine, so I'm a little stumped by this one. Any help would be great!
Code:
from selenium import webdriver

chromedriver = "/Users/philthomas/Desktop/web/chromedriver"
driver = webdriver.Chrome(chromedriver)
driver.get("http://enfa.co.uk")
button = driver.find_elements_by_xpath("/html/body/table/tbody/tr[5]/td")
button.click()
Traceback (most recent call last):
File "sel.py", line 9, in
button.click()
AttributeError: 'list' object has no attribute 'click'
HTML of the link I am trying to click:
find_elements_by_xpath returns a list of all web elements that match the criteria. You have to use find_element_by_xpath instead of find_elements_by_xpath.
Also, iframes are present on your page, so you need to switch to the right frame before you perform any action inside it. Please find below a solution which is working fine for me.
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
driver.maximize_window()
driver.get("http://enfa.co.uk")
driver.switch_to.frame(2)
clubs_link = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.XPATH, "//span[contains(text(),'Clubs')]")))
clubs_link.click()
Helpful link for locating elements: https://selenium-python.readthedocs.io/locating-elements.html
Change button = driver.find_elements_by_xpath("/html/body/table/tbody/tr[5]/td") to button = driver.find_element_by_xpath("/html/body/table/tbody/tr[5]/td").
driver.find_elements_by_xpath returns a collection of elements, not a single one, hence you cannot click it.
You can find some details and examples here.
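If you do want to keep the plural find_elements form, index into the list before clicking:
buttons = driver.find_elements_by_xpath("/html/body/table/tbody/tr[5]/td")
if buttons:  # find_elements returns an empty list when nothing matches
    buttons[0].click()
Note that in this particular case the list stays empty until you switch into the iframe, as the first answer explains.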

Beautiful Soup or Selenium?

I am fairly new to programming and I need a technical explanation for the questions below.
First of all, while I humbly know my way around both Beautiful Soup and Selenium, I would like answers from experienced users, which are really hard to pull off the web or out of texts.
I am able to get data from a website by opening the page via Selenium and then feeding page_source to Beautiful Soup for parsing. Beautiful Soup on its own does not give me the HTML of the page; instead, it provides the source code of the whole website, which does not include the desired HTML of the particular page, even though the link points directly to that page!
1) Is there a way of getting the page_source without Selenium, using only Beautiful Soup?
2) Can I use Selenium without opening the page in question? (Is there an equivalent of .get('http..') that will not physically open up the link? I find this to be a nightmare when dealing with more than 300 links!)
3) Is there another, more efficient, Pythonic way of doing this?
The code I am currently working with:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:\chromedriver.exe')
url = "https.."
driver.get(url)
driver.implicitly_wait(10)
soup = BeautifulSoup(driver.page_source, "lxml")
print(soup.text)
Thank you all in advance.
The API approach, recommended in the comments above, is essentially to hijack the API calls being made by the web page. If you go through the Network tab of your browser and find the request that fetches the data you are looking for, you can mimic the same request in Python.
Curl converter is a simple tool with screenshots showing what I mean.
Once you know the request that is being made, you can mimic its headers to make the server think your script is the website making similar requests.
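A minimal sketch of that approach; the endpoint, headers, and pagination parameter here are all invented for illustration, so replace them with whatever you see in the Network tab:
import requests

# Hypothetical endpoint and headers copied from the browser's Network tab
url = "https://example.com/api/items"       # assumed API endpoint
headers = {
    "User-Agent": "Mozilla/5.0",            # mimic a real browser
    "Referer": "https://example.com/page",  # some servers check this
    "X-Requested-With": "XMLHttpRequest",   # common marker for AJAX calls
}
params = {"page": 1}                        # assumed pagination parameter

response = requests.get(url, headers=headers, params=params, timeout=10)
response.raise_for_status()
print(response.json())  # such endpoints usually return JSON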
