scraping selenium protected site - python-3.x

I'm running into an issue automating a website (click here).
It appears that the site is protected in some way against chromedriver. When I visit the website normally I have no problem, but when Selenium attempts to automate the site, the URL redirects to some other home page.
Here is my sample code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
chrome_options = Options()
#chrome_options.add_argument("--headless")
EXE_PATH = 'chromedriver.exe'
driver = webdriver.Chrome(executable_path=EXE_PATH)#, options=chrome_options)
driver.get("<replace with url>")  # see the URL linked above
time.sleep(5)
print(driver.current_url)
driver.quit()
Please use the link in the hyperlinked text. I removed it from my code here.
Wondering if anyone has run into similar issues with websites picking up that the browser is being automated with selenium, and if there is any possible way around this. If not, maybe you have a suggestion that you could share to tackle from another angle.

A bit more about your use case, and why you felt that the site is protected, would have helped us analyze the issue further. However, to access the site through Selenium you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
#options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://publicindex.sccourts.org/horry/publicindex/")
WebDriverWait(driver, 10).until(EC.title_contains("Index"))
print(driver.current_url)
driver.quit()
Console Output:
https://publicindex.sccourts.org/horry/publicindex/
Outro
You can find a couple of relevant discussions in:
Chrome browser initiated through ChromeDriver gets detected
Selenium and non-headless browser keeps asking for Captcha
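To see whether options like the ones above change anything for a given Chrome version, one check is to read navigator.webdriver from inside the page. It is the standard property that is true when the browser is under WebDriver control. The helper name looks_automated below is hypothetical, and this is only a minimal sketch: it relies solely on Selenium's execute_script call and says nothing about the many other signals a site may inspect.

```python
def looks_automated(driver):
    """Return True if page JavaScript sees navigator.webdriver as truthy.

    `driver` is any Selenium WebDriver instance (hypothetical helper, not
    part of Selenium itself).
    """
    # execute_script runs the snippet in the current page and returns its value;
    # an undefined property comes back as None, which bool() maps to False.
    return bool(driver.execute_script("return navigator.webdriver"))
```

Calling looks_automated(driver) right after driver.get(...) with and without the experimental options shows whether they have any effect in your setup.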

Related

Selenium, how to hide webdriver's presence on site

I was just wondering, how can I hide the fact that I'm using Selenium to access the Epic Games Store site? It kicks me out even if I'm just browsing the site myself using webdriver. I saw similar posts about this topic on Stack Overflow, but they didn't help me. Here are the settings I have now. What's wrong with them?
from fake_useragent import UserAgent
from selenium import webdriver
ua = UserAgent()
us_ag = ua.random
options = webdriver.ChromeOptions()
options.add_argument(f'user-agent={us_ag}')
options.add_argument('--ignore-ssl-errors')
options.add_argument('window-size=1280,800')
options.add_argument('--ignore-certificate-errors-spki-list')
options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(options=options)
The site keeps telling me that I've done the captcha wrong. Even if I try another 10 times, it still tells me the same. There are no problems when I'm browsing with my regular browser.

Selenium using Chrome browser - Timeout during HTTP basic authentication [Python]

As part of an automation task, I am trying to log in to a locally hosted website using Selenium. The page serves an HTTP Basic authentication popup and I use the code below to send the credentials. However, stepping through the code in a debugger, I found that a TimeoutException is raised repeatedly (at the line marked with a comment).
Details
I tried on Chrome browser and its corresponding chrome WebDriver for all versions from 79.0 till the latest 84.0, but this exception seems to occur in all the cases.
OS - Windows Server W2k12 VM. [Tried on Windows 10 as well]
Python version 3.8
Code
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
driver = webdriver.Chrome(chrome_options=options)
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 10)
alert = wait.until(EC.alert_is_present()) # This line causes the time out exception
alert = driver.switch_to.alert
alert.send_keys(r'domain\username' + Keys.TAB + 'password')  # raw string: '\u' would otherwise be parsed as a unicode escape
alert.accept()
time.sleep(5)
Note:
Due to a bug in the internal page, I cannot send the credentials through the URL using the https://username:password@ipaddress:port format, so the Selenium method above is my only choice.
The corresponding code for firefox works well (for the same target internal website)
Probable Hunch
I wonder, if I am missing any packages on the newly created VM which is crucial for chrome WebDriver to work. For example, Firefox Gecko driver required Visual studio redistributables in order to work. Does chrome WebDriver require any such equivalent packages?
I don't believe that the Basic Auth popup is exposed as an "alert" in ChromeDriver, so AFAIK your only option is https://username:password@ipaddress:port. Interesting that you say that you can program to the popup in Firefox.
Until Chrome 78, the auth popup would display and block the test script, and you had to "manually" enter the credentials (or use a more general "desktop window manipulation" API), but I don't think that works anymore.
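For reference, the credentials-in-URL format mentioned above can be sketched as follows. The helper name basic_auth_url, the host, the port and the credentials are all placeholders. Percent-encoding matters here because a username such as domain\username contains characters that are not valid in the user-info part of a URL. Whether Chrome accepts embedded credentials for a given page is not guaranteed (the asker reports that it does not work for theirs).

```python
from urllib.parse import quote


def basic_auth_url(scheme, host, port, username, password):
    """Build scheme://username:password@host:port/ with escaped credentials."""
    # safe="" makes quote() escape every reserved character, including
    # "\", "@", ":" and "/", which would otherwise break the URL structure.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}/"


# Placeholder values only, not a real host or real credentials.
url = basic_auth_url("https", "192.0.2.10", 8443, r"domain\username", "p@ss:word")
print(url)
```

The resulting string would then be passed to driver.get(url) in place of the plain URL.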
What finally worked here was to drop Selenium's send_keys() approach and use other packages to type the credentials. The following two snippets worked.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import keyboard

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
driver = webdriver.Chrome(r"path_to_chromedriver.exe", options=options)
driver.maximize_window()
driver.get("<replace with url>")

# keyboard sends OS-level keystrokes, so the browser window must have focus
keyboard.write(r"<replace with username>")
keyboard.press_and_release("tab")
keyboard.write("<replace with password>")
keyboard.press_and_release("tab")
keyboard.press_and_release("enter")
Or this one (run pip install pywin32 first):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import win32com.client

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
driver = webdriver.Chrome(r"path_to_chromedriver.exe", options=options)
driver.maximize_window()
driver.get("<replace with url>")

# WScript.Shell.SendKeys types into whichever window currently has focus
shell = win32com.client.Dispatch("WScript.Shell")
shell.SendKeys(r"<replace with username>")
shell.SendKeys("{TAB}")
shell.SendKeys("<replace with password>")
shell.SendKeys("{TAB}")

Selenium - Log in with URL failed

I am new to Selenium and am having trouble authenticating to a website. I was provided a link that automatically authenticates me and redirects me to the homepage of the site. When I run the code below I am just redirected to the login page and it says authentication failed (I changed the link for obvious reasons). When I use the same link in my browser, it logs me in. Am I missing an option that I have to specify somewhere?
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
browser = webdriver.Chrome('/home/ec2-user/test/chromedriver',options=chrome_options)
browser.get('https://examplelink.com/#/login?token=aewoijfadsklnfwjeojwfqoj234ihg')
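One detail worth checking in the URL above: the token sits after the #, that is, in the URL fragment. Fragments are not sent to the server as part of the HTTP request, so the login is presumably carried out by JavaScript running in the page. The sketch below only shows how the example URL decomposes. The helper name token_from_fragment is hypothetical, and whether this is related to the failure is an assumption, not something the post confirms.

```python
from urllib.parse import parse_qs, urlsplit


def token_from_fragment(url):
    """Return the `token` value carried in the URL fragment, or None."""
    # For https://host/#/login?token=..., everything after "#" is the fragment.
    fragment = urlsplit(url).fragment
    # The fragment itself looks like a path plus query string, so split it again.
    query = urlsplit(fragment).query
    values = parse_qs(query).get("token", [])
    return values[0] if values else None


example = "https://examplelink.com/#/login?token=aewoijfadsklnfwjeojwfqoj234ihg"
print(urlsplit(example).query == "")   # the server-visible query string is empty
print(token_from_fragment(example))
```

If the page does handle the token client-side, waiting for the post-login page (for example with WebDriverWait) before reading browser.current_url may be worth trying.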

How to login into suntrust bank account using Selenium through Python

The goal is to log in to a SunTrust bank account and scrape checking account transaction data.
I have tried using the requests library and the Selenium library. I am currently using Selenium to see where the code fails.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
LOGIN_URL = 'https://login.onlinebanking.suntrust.com/olb/login'
userID = 'username'
password = 'password'
chrome_path= "path_to_chromedriver"
chrome_options=webdriver.ChromeOptions()
driver=webdriver.Chrome(chrome_path)
driver.get(LOGIN_URL)
time.sleep(5)
driver.get_cookies()
driver.find_element_by_id('userId').send_keys(userID)
driver.find_element_by_id('password').send_keys(password)
driver.find_element_by_class_name("suntrust-sign-on").click()
The program should successfully log the user in. However, I receive an error message saying ReasonCode = 6004.
I have modified your code a bit and tried to login as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://login.onlinebanking.suntrust.com/olb/login")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.suntrust-input-text.ng-pristine.ng-valid.ng-touched#userId"))).send_keys("username")
driver.find_element_by_css_selector("input.suntrust-input-text.ng-untouched.ng-pristine.ng-valid#password").send_keys("password")
driver.find_element_by_css_selector("button.suntrust-sign-on.suntrust-button-text>span").click()
But was still unable to login.
Now on inspecting the DOM Tree of SUNTRUST - Online Banking Sign On login page you will find the following tags within the <body> tag:
<script type="text/javascript" src="dist/runtime.7d6aba6a1596ee0b757c.js"></script>
<script type="text/javascript" src="dist/polyfills.65913a8531010587b6fe.js"></script>
<script type="text/javascript" src="dist/scripts.46e57c2d57ad1b3d210d.js"></script>
<script type="text/javascript" src="dist/vendor.43f2240dc35276d98b10.js"></script>
<script type="text/javascript" src="dist/main.5d227767baa37ef78819.js"></script>
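The presence of dist in these script paths is taken here as an indication that the website is protected by the bot management service provider Distil Networks, in which case navigation by ChromeDriver gets detected and subsequently blocked. Note that dist is also a conventional name for a build output directory, so this alone is not conclusive.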
Distil
As per the article There Really Is Something About Distil.it...:
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.
Further,
"One pattern with **Selenium** was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
Reference
You can find a couple of relevant discussions in:
Chrome browser initiated through ChromeDriver gets detected
Unable to use Selenium to automate Chase site login

Selenium web driver - login without hard-coding the password in the code

how can I login to an application using Selenium webdriver without hard-coding the password in the code?
Plus I want to hide the 'geckodriver.exe window' during execution.
import time
import getpass
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.maximize_window()
driver.get("https://example.com")
time.sleep(10)
username=driver.find_element_by_id("txtUserID")
username.clear()
username.send_keys("abc@xyz.com")
# ... the rest of the code continues ...
xvfb is a common way of doing this. Searching for "selenium xvfb" should find lots, such as:
How do I run Selenium in Xvfb?
Is it possible to run Selenium scripts without having an X server running, too?
These will help you for headless firefox.
Regarding "how can I login to an application using Selenium webdriver without hard-coding the password in the code?":
For this you need to create a GUI or a screen where the user enters the password; then read the value the user entered and use it as the password on the specific website.
Add the headless option for Firefox. Then the browser window will not be shown.
binary.add_command_line_options('-headless')
You will get the total procedure here
For not hardcoding the password you can create a config file and put the password or other configurations there. Then you can parse the password from that config file.
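If you want to move the browser window off-screen, apply the code below (Java bindings; in Python the equivalent call is driver.set_window_position(-2000, 0)):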
driver.manage().window().setPosition(new Point(-2000, 0));
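The suggestions above about a config file and about asking the user can be combined in one small helper. This is a minimal sketch using only the standard library. The function name get_password, the file name settings.ini, the [login] section and the APP_PASSWORD variable are all hypothetical names chosen for illustration.

```python
import configparser
import getpass
import os


def get_password(config_path="settings.ini", env_var="APP_PASSWORD"):
    """Look up the password without hard-coding it in the script."""
    # 1. An environment variable, if one is set.
    value = os.environ.get(env_var)
    if value:
        return value
    # 2. A config file kept out of version control, with a [login] section.
    #    interpolation=None keeps "%" characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    if parser.read(config_path) and parser.has_option("login", "password"):
        return parser.get("login", "password")
    # 3. Otherwise prompt on the terminal without echoing the input.
    return getpass.getpass("Password: ")
```

The returned value can then be passed to send_keys() on the password field instead of a string literal.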
Another option is to use the open-source Blinq.io, which copies the cookies of the login and then injects them into the automation test. Effectively, it lets you bypass the application's login when doing test automation. The code is on GitHub: https://github.com/blinq-io/selenium-session-manager/
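The general idea of reusing login cookies can be sketched with Selenium's own get_cookies() and add_cookie() calls. This is a generic illustration, not the implementation of the project linked above. The function names save_cookies and load_cookies and the JSON file are assumptions made for the example.

```python
import json


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file."""
    # get_cookies() returns a list of dicts, which serialises directly.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add previously saved cookies to the current session; return the count."""
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        # add_cookie() only accepts cookies for the domain currently loaded.
        driver.add_cookie(cookie)
    return len(cookies)
```

Typical use would be to call save_cookies(driver, "cookies.json") once after a manual login. In later runs, open any page on the same domain, call load_cookies(driver, "cookies.json") and then driver.refresh(). Treat the cookie file as a credential, since it grants access to the logged-in session.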

Resources