Login to a website then open it in browser - python-3.x

I am trying to write a Python 3 code that logins in to a website and then opens it in a web browser to be able to take a screenshot of it.
Looking online I found that I could do webbrowser.open('example.com')
This opens the website, but cannot login.
Then I found that it is possible to login to a website using the request library, or urllib.
But the problem with both it that they do not seem to provide the option of opening a web page.
So how is it possible to login to a web page then display it, so that a screenshot of that page could be taken
Thanks

Have you considered Selenium? It drives a browser natively as a user would, and its Python client is pretty easy to use.
Here is one of my latest works with Selenium. It is a script to scrape multiple pages from a certain website and save their data into a csv file:
import os
import time
import csv
from selenium import webdriver
cols = [
'ies', 'campus', 'curso', 'grau_turno', 'modalidade',
'classificacao', 'nome', 'inscricao', 'nota'
]
codigos = [
96518, 96519, 96520, 96521, 96522, 96523, 96524, 96525, 96527, 96528
]
if not os.path.exists('arquivos_csv'):
os.makedirs('arquivos_csv')
options = webdriver.ChromeOptions()
prefs = {
'profile.default_content_setting_values.automatic_downloads': 1,
'profile.managed_default_content_settings.images': 2
}
options.add_experimental_option('prefs', prefs)
# Here you choose a webdriver ("the browser")
browser = webdriver.Chrome('chromedriver', chrome_options=options)
for codigo in codigos:
time.sleep(0.1)
# Here is where I set the URL
browser.get(f'http://www.sisu.mec.gov.br/selecionados?co_oferta={codigo}')
with open(f'arquivos_csv/sisu_resultados_usp_final.csv', 'a') as file:
dw = csv.DictWriter(file, fieldnames=cols, lineterminator='\n')
dw.writeheader()
ies = browser.find_element_by_xpath('//div[#class ="nome_ies_p"]').text.strip()
campus = browser.find_element_by_xpath('//div[#class ="nome_campus_p"]').text.strip()
curso = browser.find_element_by_xpath('//div[#class ="nome_curso_p"]').text.strip()
grau_turno = browser.find_element_by_xpath('//div[#class = "grau_turno_p"]').text.strip()
tabelas = browser.find_elements_by_xpath('//table[#class = "resultado_selecionados"]')
for t in tabelas:
modalidade = t.find_element_by_xpath('tbody//tr//th[#colspan = "4"]').text.strip()
aprovados = t.find_elements_by_xpath('tbody//tr')
for a in aprovados[2:]:
linha = a.find_elements_by_class_name('no_candidato')
classificacao = linha[0].text.strip()
nome = linha[1].text.strip()
inscricao = linha[2].text.strip()
nota = linha[3].text.strip().replace(',', '.')
dw.writerow({
'ies': ies, 'campus': campus, 'curso': curso,
'grau_turno': grau_turno, 'modalidade': modalidade,
'classificacao': classificacao, 'nome': nome,
'inscricao': inscricao, 'nota': nota
})
browser.quit()
In short, you set preferences, choose a webdriver (I recommend Chrome), point to the URL and that's it. The browser is automatically opened and start executing your instructions.
I have tested using it to log in and it works fine, but never tried to take screenshot. It theoretically should do.

Related

Attempting to open browser with Specific Profile in Incognito Mode

Attempting to open browser with Specific Profile in Incognito Mode.
After the browser is opened, need to verify that the URL opened the proper page
I am using the following code
browserName = set()
profile_name = 'Person 1
rpa_string = 'https://www.google.ca'
path_br_ch = '/Google/Chrome/Application/chrome.exe'
browserName.add("chrome")
browser_profile = f"--profile-directory={profile_name}"
browser_path = path_br_ch
browser_privacy = "--incognito"
rpa_command = [browser_path, browser_privacy, browser_profile, rpa_string]
browser_process = subprocess.Popen(rpa_command)
# Next step need to verify that the google page is opened
# Please help

Selenium anticaptcha submission

I am trying to fill recaptcha using anticaptcha api.
But I am unable to figure out how to submit response.
Here is what I am trying to do:
driver.switch_to.frame(driver.find_element_by_xpath('//iframe'))
url_key = driver.find_element_by_xpath('//*[#id="captcha-submit"]/div/div/iframe').get_attribute('src')
#site_key = re.search('k=([^&]+)',url_key).group(1)
site_key = '6Ldd2doaAAAAAFhvJxqgQ0OKnYEld82b9FKDBnRE'
api_key = 'api_keys'
url = driver.current_url
client = AnticaptchaClient(api_key)
task = NoCaptchaTaskProxylessTask(url, site_key)
job = client.createTask(task)
job.join()
driver.execute_script("document.getElementById('g-recaptcha-response').innerHTML='{}';".format(job.get_solution_response()))
driver.refresh()
Above code snippet only refreshes the same page and not redirecting to input url.
Then I see that there is a variable in script on the same page and I tried to execute that variable too to submit form just like that
driver.execute_script("var captchaSubmitEl = document.getElementById('captcha-submit');")
driver.refresh()
Which also fails.The webpage is here.

How do you find a url from a input button (web scraping)

I'm webscraping a asp.net website, and there is a input button that links to a page I need. I'm wondering how I can get the url to the site without using automation like Selenium.
Note: I don't need to scrape the actual page, the url contains all the info I need.
This is the code I used to get to the website but I don't know where to start with scraping the button url:
select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()
session_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, session_payload, headers)
senate_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', senate_payload, headers)
page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find_all('input', value='Jones')
The html for the button is below:
<input type="button" value="Jones" onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$47')" style="background-color:Transparent;border-color:Silver;border-style:Outset;font-size:Small;height:30px;width:100px;">
How to find the inputs onclick?
You were close by but should replace your line with:
member_soup.find('input', {"value" : "Jones"})['onclick']
Example
import requests
from bs4 import BeautifulSoup
select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()
session_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, session_payload, headers)
senate_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', senate_payload, headers)
page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find('input', {"value" : "Jones"})['onclick']
member
Output
"javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$39')"
Edit
You may interested how to start with selenium ...
from selenium import webdriver
from time import sleep
browser = webdriver.Chrome('C:\Program Files\ChromeDriver\chromedriver.exe')
browser.get('http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx')
sleep(0.9)
browser.find_element_by_link_text('Regular Session 2019').click()
sleep(0.9)
browser.find_element_by_link_text('Prefiled Bills').click()
sleep(2)
browser.find_element_by_css_selector('input[value="Senate"]').click()
sleep(2)
browser.find_element_by_css_selector('input[value="Jones"]').click()
sleep(2)
print(browser.current_url)
browser.close()
Output
http://alisondb.legislature.state.al.us/Alison/Member.aspx?SPONSOR=Jones&BODY=1753&SPONSOR_OID=100453

Selenium to submit recaptcha using 2captcha Python

I am trying to submit Recaptcha on a search form using Python3, Selenium, and 2captcha.
Everything is working fine except submitting the Recaptcha after sending google-tokin in the text-area of Recaptcha.
Please guide me what am I missing?
When I look into my Selenium Webdriver window it shows Recaptcha text-area filled with google-tokin but I am not able to submit it to continue for search result.
Thankyou.
from selenium import webdriver
from time import sleep
from datetime import datetime
from twocaptcha import TwoCaptcha
import requests
## Launching webdriver
driverop = webdriver.ChromeOptions()
driverop.add_argument("--start-maximized")
driver = webdriver.Chrome("chromedriver/chromedriver",options=driverop)
url = "https://app.skipgenie.com/Account/Login"
sleep(randint(5,10))
email = "..."
password = ".."
input_data = pd.read_excel("input_data.xlsx")
user_Data = []
driver.get(url)
driver.find_element_by_id("Email").send_keys(email)
driver.find_element_by_id("Password").send_keys(password)
driver.find_element_by_class_name("btn-lg").click()
driver.find_element_by_id("firstName").send_keys(input_data.iloc[0][0])
driver.find_element_by_id("lastName").send_keys(input_data.iloc[0][1])
driver.find_element_by_id("street").send_keys(input_data.iloc[0][2])
driver.find_element_by_id("city").send_keys(input_data.iloc[0][3])
driver.find_element_by_id("state").send_keys(input_data.iloc[0][4])
driver.find_element_by_id("zip").send_keys(int(input_data.iloc[0][5]))
# 2Captcha service
service_key = 'ec.....' # 2captcha service key
google_site_key = '6LcxZtQZAAAAAA7gY9-aUIEkFTnRdPRob0Dl1k8a'
pageurl = 'https://app.skipgenie.com/Search/Search'
url = "http://2captcha.com/in.php?key=" + service_key + "&method=userrecaptcha&googlekey=" + google_site_key + "&pageurl=" + pageurl
resp = requests.get(url)
if resp.text[0:2] != 'OK':
quit('Service error. Error code:' + resp.text)
captcha_id = resp.text[3:]
fetch_url = "http://2captcha.com/res.php?key="+ service_key + "&action=get&id=" + captcha_id
for i in range(1, 10):
sleep(5) # wait 5 sec.
resp = requests.get(fetch_url)
if resp.text[0:2] == 'OK':
break
driver.execute_script('var element=document.getElementById("g-recaptcha-response"); element.style.display="";')
driver.execute_script("""
document.getElementById("g-recaptcha-response").innerHTML = arguments[0]
""", resp.text[3:])
Answering the question so the people who encounter situations like this could get help from this answer.
I was missing that after you get google token you need to display recaptcha text-area and send google-token to text-area like this
To display text-area of recaptcha.
driver.execute_script('var element=document.getElementById("g-recaptcha-response"); element.style.display="";')
after that send google token like this:
driver.execute_script("""
document.getElementById("g-recaptcha-response").innerHTML = arguments[0]
""", resp.text[3:])
then you need to make text-area display to none so the search button near repcatcha is clickable.
driver.execute_script('var element=document.getElementById("g-recaptcha-response"); element.style.display="none";')
then you need to click on the search button to get the search result.

Python 3.6.3 - Send MozillaCookieJar File and read HTML source code

I'm very fresh about python (i'm learning just about 1 day long).
I need to send cookies (i got them from my Google Chrome browser to a *.text file) and be redirected after login to my account page, to after read a source HTML code do what i wanna do. With much searches allong internet, i already have this piece of code:
import os
import time
import urllib.request
import http.cookiejar
while 1:
cj = http.cookiejar.MozillaCookieJar('cookies.txt')
cj.load()
print(len(cj)) # output: 9
print(cj) # output: <MozillaCookieJar[<Cookie .../>, <Cookie .../>, ... , <Cookie .../>]>
for cookie in cj:
cookie.expires = time.time() + 14 * 24 * 3600
cookieProcessor = urllib.request.HTTPCookieProcessor(cj)
opener = urllib.request.build_opener(cookieProcessor)
request = urllib.request.Request(url='https://.../')
response = opener.open(request, timeout=100)
s = str(response.read(), 'utf-8')
print(s)
if 'class' in s:
os.startfile('test.mp3')
time.sleep(5)
With this code i believe, hope i'm not be mistaken, have sending the cookies correctly. My main question is: How can i wait and catch the source HTML code after server redirect my login to personal page? I can't call again my Request with the same URL.
Thank you in advance.

Resources