Selenium anticaptcha submission - python-3.x

I am trying to solve a reCAPTCHA using the anti-captcha API, but I am unable to figure out how to submit the response.
Here is what I am trying to do:
from python_anticaptcha import AnticaptchaClient, NoCaptchaTaskProxylessTask  # assuming the python_anticaptcha package

driver.switch_to.frame(driver.find_element_by_xpath('//iframe'))
url_key = driver.find_element_by_xpath('//*[@id="captcha-submit"]/div/div/iframe').get_attribute('src')
#site_key = re.search('k=([^&]+)', url_key).group(1)
site_key = '6Ldd2doaAAAAAFhvJxqgQ0OKnYEld82b9FKDBnRE'
api_key = 'api_keys'
url = driver.current_url
client = AnticaptchaClient(api_key)
task = NoCaptchaTaskProxylessTask(url, site_key)
job = client.createTask(task)
job.join()  # block until the captcha is solved
driver.execute_script("document.getElementById('g-recaptcha-response').innerHTML='{}';".format(job.get_solution_response()))
driver.refresh()
The above code snippet only refreshes the same page and does not redirect to the target URL.
Then I saw that there is a variable in a script on the same page, and I tried executing that variable as well to submit the form, like this:
driver.execute_script("var captchaSubmitEl = document.getElementById('captcha-submit');")
driver.refresh()
Which also fails. The webpage is here.
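For reference, refreshing the page discards the injected token; the usual pattern after filling g-recaptcha-response is to trigger the site's own submit path instead. A minimal sketch, assuming the element with id captcha-submit from the snippet above sits inside the form to be submitted (the form lookup is an assumption, not something verified against the page):
token = job.get_solution_response()
driver.switch_to.default_content()  # return to the top-level document before touching the response field
driver.execute_script(
    "document.getElementById('g-recaptcha-response').innerHTML = arguments[0];", token)
# submit the enclosing form instead of refreshing; the element id is taken from the question
driver.execute_script(
    "var el = document.getElementById('captcha-submit');"
    "var form = el ? el.closest('form') : document.forms[0];"
    "if (form) { form.submit(); }")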

Related

Attempting to open browser with Specific Profile in Incognito Mode

Attempting to open a browser with a specific profile in incognito mode.
After the browser is opened, I need to verify that the URL opened the proper page.
I am using the following code:
import subprocess

browserName = set()
profile_name = 'Person 1'
rpa_string = 'https://www.google.ca'
path_br_ch = '/Google/Chrome/Application/chrome.exe'
browserName.add("chrome")
browser_profile = f"--profile-directory={profile_name}"
browser_path = path_br_ch
browser_privacy = "--incognito"
rpa_command = [browser_path, browser_privacy, browser_profile, rpa_string]
browser_process = subprocess.Popen(rpa_command)
# Next step need to verify that the google page is opened
# Please help
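One way to verify the page (not part of the original question) is to drive Chrome through Selenium instead of subprocess.Popen, since Popen gives no handle on what the browser actually loaded. A minimal sketch, assuming chromedriver is available on PATH and reusing the flags defined above:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--incognito")
options.add_argument(f"--profile-directory={profile_name}")  # profile_name as defined above

driver = webdriver.Chrome(options=options)  # assumes chromedriver is on PATH
driver.get(rpa_string)

# verify that the expected page was opened
assert "google" in driver.current_url.lower()
print("Opened:", driver.title)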

How do you find a URL from an input button (web scraping)

I'm web scraping an ASP.NET website, and there is an input button that links to a page I need. I'm wondering how I can get the URL to that page without using automation like Selenium.
Note: I don't need to scrape the actual page; the URL contains all the info I need.
This is the code I used to get to the website, but I don't know where to start with scraping the button URL:
select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()
session_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, session_payload, headers)
senate_payload = {"__EVENTTARGET":"ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', senate_payload, headers)
page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find_all('input', value='Jones')
The html for the button is below:
<input type="button" value="Jones" onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$47')" style="background-color:Transparent;border-color:Silver;border-style:Outset;font-size:Small;height:30px;width:100px;">
How do I find the input's onclick?
You were close, but you should replace your last line with:
member_soup.find('input', {"value" : "Jones"})['onclick']
Example
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # the original snippet assumed a headers dict defined elsewhere
select_session_url = 'http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx'
session = requests.Session()
session_payload = {"__EVENTTARGET": "ctl00$ContentPlaceHolder1$gvSessions", "__EVENTARGUMENT": "$3"}
session.post(select_session_url, data=session_payload, headers=headers)
senate_payload = {"__EVENTTARGET": "ctl00$ContentPlaceHolder1$btnSenate", "__EVENTARGUMENT": "Senate"}
session.post('http://alisondb.legislature.state.al.us/Alison/SessPrefiledBills.aspx', data=senate_payload, headers=headers)
page = session.get('http://alisondb.legislature.state.al.us/Alison/SESSBillsList.aspx?SELECTEDDAY=1:2019-03-05&BODY=1753&READINGTYPE=R1&READINGCODE=B&PREFILED=Y')
member_soup = BeautifulSoup(page.text, 'lxml')
member = member_soup.find('input', {"value": "Jones"})['onclick']
member
Output
"javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvBills','SponsorName$39')"
Edit
You may be interested in how to start with Selenium ...
from selenium import webdriver
from time import sleep
browser = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
browser.get('http://alisondb.legislature.state.al.us/Alison/SelectSession.aspx')
sleep(0.9)
browser.find_element_by_link_text('Regular Session 2019').click()
sleep(0.9)
browser.find_element_by_link_text('Prefiled Bills').click()
sleep(2)
browser.find_element_by_css_selector('input[value="Senate"]').click()
sleep(2)
browser.find_element_by_css_selector('input[value="Jones"]').click()
sleep(2)
print(browser.current_url)
browser.close()
Output
http://alisondb.legislature.state.al.us/Alison/Member.aspx?SPONSOR=Jones&BODY=1753&SPONSOR_OID=100453

Selenium to submit recaptcha using 2captcha Python

I am trying to submit a reCAPTCHA on a search form using Python 3, Selenium, and 2captcha.
Everything is working fine except submitting the reCAPTCHA after sending the Google token into the reCAPTCHA text area.
Please guide me on what I am missing.
When I look at my Selenium WebDriver window it shows the reCAPTCHA text area filled with the Google token, but I am not able to submit it to continue to the search results.
Thank you.
from selenium import webdriver
from time import sleep
from random import randint   # used below but missing from the original imports
from datetime import datetime
from twocaptcha import TwoCaptcha
import pandas as pd          # used below but missing from the original imports
import requests

## Launching webdriver
driverop = webdriver.ChromeOptions()
driverop.add_argument("--start-maximized")
driver = webdriver.Chrome("chromedriver/chromedriver", options=driverop)
url = "https://app.skipgenie.com/Account/Login"
sleep(randint(5, 10))
email = "..."
password = ".."
input_data = pd.read_excel("input_data.xlsx")
user_Data = []
driver.get(url)
driver.find_element_by_id("Email").send_keys(email)
driver.find_element_by_id("Password").send_keys(password)
driver.find_element_by_class_name("btn-lg").click()
driver.find_element_by_id("firstName").send_keys(input_data.iloc[0][0])
driver.find_element_by_id("lastName").send_keys(input_data.iloc[0][1])
driver.find_element_by_id("street").send_keys(input_data.iloc[0][2])
driver.find_element_by_id("city").send_keys(input_data.iloc[0][3])
driver.find_element_by_id("state").send_keys(input_data.iloc[0][4])
driver.find_element_by_id("zip").send_keys(int(input_data.iloc[0][5]))

# 2Captcha service
service_key = 'ec.....'  # 2captcha service key
google_site_key = '6LcxZtQZAAAAAA7gY9-aUIEkFTnRdPRob0Dl1k8a'
pageurl = 'https://app.skipgenie.com/Search/Search'

url = "http://2captcha.com/in.php?key=" + service_key + "&method=userrecaptcha&googlekey=" + google_site_key + "&pageurl=" + pageurl
resp = requests.get(url)
if resp.text[0:2] != 'OK':
    quit('Service error. Error code:' + resp.text)
captcha_id = resp.text[3:]

fetch_url = "http://2captcha.com/res.php?key=" + service_key + "&action=get&id=" + captcha_id
for i in range(1, 10):
    sleep(5)  # wait 5 sec.
    resp = requests.get(fetch_url)
    if resp.text[0:2] == 'OK':
        break

driver.execute_script('var element=document.getElementById("g-recaptcha-response"); element.style.display="";')
driver.execute_script("""
    document.getElementById("g-recaptcha-response").innerHTML = arguments[0]
""", resp.text[3:])
Answering my own question so that people who encounter situations like this can get help from this answer.
What I was missing is that after you get the Google token you need to display the reCAPTCHA text area and send the Google token to it, like this.
To display the text area of the reCAPTCHA:
driver.execute_script('var element=document.getElementById("g-recaptcha-response"); element.style.display="";')
After that, send the Google token like this:
driver.execute_script("""
document.getElementById("g-recaptcha-response").innerHTML = arguments[0]
""", resp.text[3:])
Then you need to set the text area's display back to none so the search button near the reCAPTCHA is clickable.
driver.execute_script('var element=document.getElementById("g-recaptcha-response"); element.style.display="none";')
Then you need to click on the search button to get the search results.
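The last step has no code in the answer; a minimal sketch, assuming the search form's submit button can be located by its type (the selector is a guess, not taken from the site):
search_button = driver.find_element_by_css_selector('button[type="submit"]')  # selector is an assumption
search_button.click()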

Login to a website then open it in browser

I am trying to write Python 3 code that logs in to a website and then opens it in a web browser to be able to take a screenshot of it.
Looking online, I found that I could do webbrowser.open('example.com').
This opens the website, but it cannot log in.
Then I found that it is possible to log in to a website using the requests library, or urllib.
But the problem with both is that they do not seem to provide the option of opening a web page.
So how is it possible to log in to a web page and then display it, so that a screenshot of that page can be taken?
Thanks
Have you considered Selenium? It drives a browser natively as a user would, and its Python client is pretty easy to use.
Here is one of my latest projects with Selenium: a script that scrapes multiple pages from a certain website and saves their data into a CSV file:
import os
import time
import csv
from selenium import webdriver

cols = [
    'ies', 'campus', 'curso', 'grau_turno', 'modalidade',
    'classificacao', 'nome', 'inscricao', 'nota'
]

codigos = [
    96518, 96519, 96520, 96521, 96522, 96523, 96524, 96525, 96527, 96528
]

if not os.path.exists('arquivos_csv'):
    os.makedirs('arquivos_csv')

options = webdriver.ChromeOptions()
prefs = {
    'profile.default_content_setting_values.automatic_downloads': 1,
    'profile.managed_default_content_settings.images': 2
}
options.add_experimental_option('prefs', prefs)

# Here you choose a webdriver ("the browser")
browser = webdriver.Chrome('chromedriver', chrome_options=options)

for codigo in codigos:
    time.sleep(0.1)

    # Here is where I set the URL
    browser.get(f'http://www.sisu.mec.gov.br/selecionados?co_oferta={codigo}')

    with open(f'arquivos_csv/sisu_resultados_usp_final.csv', 'a') as file:
        dw = csv.DictWriter(file, fieldnames=cols, lineterminator='\n')
        dw.writeheader()

        ies = browser.find_element_by_xpath('//div[@class ="nome_ies_p"]').text.strip()
        campus = browser.find_element_by_xpath('//div[@class ="nome_campus_p"]').text.strip()
        curso = browser.find_element_by_xpath('//div[@class ="nome_curso_p"]').text.strip()
        grau_turno = browser.find_element_by_xpath('//div[@class = "grau_turno_p"]').text.strip()

        tabelas = browser.find_elements_by_xpath('//table[@class = "resultado_selecionados"]')
        for t in tabelas:
            modalidade = t.find_element_by_xpath('tbody//tr//th[@colspan = "4"]').text.strip()
            aprovados = t.find_elements_by_xpath('tbody//tr')
            for a in aprovados[2:]:
                linha = a.find_elements_by_class_name('no_candidato')
                classificacao = linha[0].text.strip()
                nome = linha[1].text.strip()
                inscricao = linha[2].text.strip()
                nota = linha[3].text.strip().replace(',', '.')
                dw.writerow({
                    'ies': ies, 'campus': campus, 'curso': curso,
                    'grau_turno': grau_turno, 'modalidade': modalidade,
                    'classificacao': classificacao, 'nome': nome,
                    'inscricao': inscricao, 'nota': nota
                })

browser.quit()
In short, you set preferences, choose a webdriver (I recommend Chrome), point it to the URL, and that's it. The browser is opened automatically and starts executing your instructions.
I have tested using it to log in and it works fine, but I have never tried to take a screenshot. In theory it should work.
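For the screenshot part specifically (not covered above), Selenium can save the rendered page directly after logging in. A minimal sketch, where the login URL and the element ids of the login form are placeholders you would replace with the real ones:
from selenium import webdriver

browser = webdriver.Chrome()  # assumes chromedriver is on PATH

# log in first (placeholder URL and field ids)
browser.get('https://example.com/login')
browser.find_element_by_id('username').send_keys('my_user')
browser.find_element_by_id('password').send_keys('my_password')
browser.find_element_by_id('login-button').click()

# then open the page you want and capture it
browser.get('https://example.com/dashboard')
browser.save_screenshot('page.png')  # writes a PNG of the current viewport
browser.quit()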

scrapy + splash : not rendering full page javascript data

I am just exploring Scrapy with Splash, and I am trying to scrape all the product (pants) data, with product id, name and price, from the e-commerce site Gap, but I don't see all of the dynamically loaded product data when I view the page from the Splash web UI (only 16 items load for every request, and I have no clue why).
I tried the following options, but had no luck:
Increasing the wait time up to 20 sec
Starting the docker container with "--disable-private-mode"
Using a lua_script for page scrolling
Using the full viewport option splash:set_viewport_full()
lua_script2 = """ function main(splash)
local num_scrolls = 10
local scroll_delay = 2.0
local scroll_to = splash:jsfunc("window.scrollTo")
local get_body_height = splash:jsfunc(
"function() {return document.body.scrollHeight;}"
)
assert(splash:go(splash.args.url))
splash:wait(splash.args.wait)
for _ = 1, num_scrolls do
scroll_to(0, get_body_height())
splash:wait(scroll_delay)
end
return splash:html()
end"""
yield SplashRequest(
    url,
    self.parse_product_contents,
    endpoint='execute',
    args={
        'lua_source': lua_script2,
        'wait': 5,
    }
)
Can anyone please shed some light on this behavior?
P.S.: I am using the Scrapy framework and I am able to parse the product information (item id, name and price) from render.html (but render.html contains only 16 items).
I updated the script to the one below:
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 2.0
    splash:set_viewport_size(1980, 8020)
    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    -- splash:set_viewport_full()
    splash:wait(10)
    splash:runjs("jQuery('span.icon-x').click();")
    splash:wait(1)
    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    splash:wait(30)
    return {
        png = splash:png(),
        html = splash:html(),
        har = splash:har()
    }
end
I ran it in my local Splash; the PNG doesn't come out right, but the HTML does contain the last product.
The only issue was that when the email-subscribe popup is present the page won't scroll, so I added code to close it.
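One practical note when plugging the updated script back into the spider: because the Lua main now returns a table (png/html/har) instead of raw HTML, the callback receives a JSON response and the markup is read from response.data rather than response.text. A rough sketch of spider methods, reusing the request from the question; the spider name, start URL, and the assumption that lua_script2 now holds the updated script are illustrative only:
import scrapy
from scrapy_splash import SplashRequest

class GapSpider(scrapy.Spider):  # illustrative spider name
    name = 'gap'
    start_urls = ['https://www.gap.com/']  # placeholder; replace with the category page being scraped

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse_product_contents,
                endpoint='execute',
                args={'lua_source': lua_script2, 'wait': 5},  # lua_script2 assumed to hold the updated script above
            )

    def parse_product_contents(self, response):
        # because the Lua script returns a table, the rendered markup is under response.data['html']
        html = response.data['html']
        self.logger.info('rendered page length: %d', len(html))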
