Multithreading with Selenium using Python and Telepot

I'm coding my first Telegram bot, but now I have to serve multiple users at the same time.
This code is just a small part, but it should help me use multithreading with Selenium:
import time
import telepot
from telepot.loop import MessageLoop
from telepot.delegate import pave_event_space, per_chat_id, create_open

class MessageCounter(telepot.helper.ChatHandler):
    def __init__(self, *args, **kwargs):
        super(MessageCounter, self).__init__(*args, **kwargs)

    def on_chat_message(self, msg):
        content_type, chat_type, chat_id = telepot.glance(msg)
        chat_id = str(chat_id)
        browser = browserSelenium.start_browser(chat_id)
        userIsLogged = igLogin.checkAlreadyLoggedIn(browser, chat_id)
        print(userIsLogged)

TOKEN = "***"

bot = telepot.DelegatorBot(TOKEN, [
    pave_event_space()(
        per_chat_id(), create_open, MessageCounter, timeout=10),
])
MessageLoop(bot).run_as_thread()

while 1:
    time.sleep(10)
When the bot receives any message, it starts a Selenium session by calling this function:
def start_browser(chat_id):
    global browser
    try:
        browser.get('https://www.google.com')
        #igLogin.checkAlreadyLoggedIn(browser)
        #links = telegram.getLinks(24)
        #instagramLikes(browser, links)
    except Exception as e:
        print("type error: " + str(e))
        print('No such session! Starting webdrivers!')
        sleep(3)
        # CLIENT CONNECTION !!
        chrome_options = Options()
        chrome_options.add_argument('user-data-dir=/home/ale/botTelegram/users/' + chat_id + '/cookies')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--lang=en')
        print("Starting WebDrivers")
        browser = webdriver.Chrome(options=chrome_options)
        start_browser(chat_id)
    return browser
and then this one checks whether the user is logged in:
def checkAlreadyLoggedIn(browser, chat_id):
    browser.get('https://www.instagram.com/instagram/')
    try:
        WebDriverWait(browser, 5).until(EC.element_to_be_clickable(
            (By.XPATH, instagramClicks.buttonGoToProfile))).click()
        print('User already Logged')
        return True
    except:
        print('User not Logged')
        userLogged = login(browser, chat_id)
        return userLogged
and if the user is not logged in, it tries to log the user in with username and password.
So, basically, when I write to the bot from one account everything works fine, but if I write to the bot from two different accounts it opens two browsers, yet it controls only one.
What I mean is that, for example, one window stays on the Google page while the other one receives the commands twice; so even when it has to type the username, it types the username two times.
How can I interact with multiple sessions?

WebDriver is not thread-safe. That said, if you can serialise access to the underlying driver instance, you can share a reference in more than one thread. This is not advisable, but you can always instantiate one WebDriver instance per thread.
Ideally, the issue of thread-safety isn't in your code but in the actual browser bindings. They all assume there will only be one command at a time (as with a real user). On the other hand, you can always instantiate one WebDriver instance per thread, which will launch multiple browsing tabs/windows. Up to this point your program seems fine.
Now, different threads can be run on the same WebDriver, but the results of the tests won't be what you expect. The reason is that when you use multithreading to run different tests on different tabs/windows, a bit of thread-safety coding is required, or else actions such as click() or send_keys() will go to whichever opened tab/window currently has focus, regardless of the thread you expect to be running. Essentially, all the tests will run simultaneously on the tab/window that has focus, not on the intended one.
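A minimal sketch of the one-driver-per-thread approach, using threading.local so each worker thread lazily creates its own instance. Here make_driver() is only a placeholder; in real code it would construct webdriver.Chrome with your per-chat options:

```python
import threading

_local = threading.local()  # per-thread storage

def make_driver():
    # Placeholder: in a real bot this would be
    # webdriver.Chrome(options=chrome_options), built inside the thread.
    return object()

def get_driver():
    # Lazily create one driver per thread; never share it across threads.
    if not hasattr(_local, "driver"):
        _local.driver = make_driver()
    return _local.driver
```

Each chat-handler thread that calls get_driver() then talks to its own browser, so commands cannot leak into another session's window.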
Reference
You can find a relevant detailed discussion in:
Chrome crashes after several hours while multiprocessing using Selenium through Python


What is best practice to interact with subprocesses in Python

I'm building an application which is intended to do bulk-job processing of data within another piece of software. To control the other software automatically I'm using pyautoit, and everything works fine, except for application errors caused by the external software, which occur from time to time.
To handle those cases, I built a watchdog:
It starts the script with the bulk job within a subprocess:
process = subprocess.Popen(['python', job_script, src_path], stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, shell=True)
It listens to the system event using the winevt.EventLog module:
EventLog.Subscribe('System', 'Event/System[Level<=2]', handle_event)
In case an error occurs, it shuts everything down and restarts the script.
OK, if a system error event occurs, it should be handled in a way that notifies the subprocess. This notification should then lead to the following action within the subprocess:
Within the subprocess there's an object controlling everything and continuously collecting generated data. In order to avoid having to start the whole job from the beginning after restarting the script, this object has to be dumped using pickle (which isn't the problem here!).
Listening to the system event from inside the subprocess didn't work. It results in a continuous loop when calling subprocess.Popen().
So, my question is how I can either subscribe to system events from inside a child process, or communicate between the parent and child process: that is, send a message like "hey, an error occurred", listen for it within the subprocess, and then create the dump?
I'm really sorry for not being allowed to post any code in this case, but I hope (and actually think) that my description is understandable. My question is just about which module to use to accomplish this in the best way.
I'd be really happy if somebody could point me in the right direction...
Br,
Mic
I believe the best answer may lie here: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.stdin
These attributes should allow for proper communication between the different processes fairly easily, and without any other dependencies.
Note that Popen.communicate() may suit better if other processes may cause issues.
EDIT to add example scripts:
main.py
from subprocess import *
import sys

def check_output(p):
    out = p.stdout.readline()
    return out

def send_data(p, data):
    p.stdin.write(bytes(f'{data}\r\n', 'utf8'))  # auto newline
    p.stdin.flush()

def initiate(p):
    #p.stdin.write(bytes('init\r\n', 'utf8'))  # function to send first communication
    #p.stdin.flush()
    send_data(p, 'init')
    return check_output(p)

def test(p, data):
    send_data(p, data)
    return check_output(p)

def main():
    exe_name = 'Doc2.py'
    p = Popen([sys.executable, exe_name], stdout=PIPE, stderr=STDOUT, stdin=PIPE)
    print(initiate(p))
    print(test(p, 'test'))
    print(test(p, 'test2'))  # testing responses
    print(test(p, 'test3'))

if __name__ == '__main__':
    main()
Doc2.py
import sys, time, random

def recv_data():
    return sys.stdin.readline()

def send_data(data):
    # flush so the parent's readline() isn't blocked by stdout buffering
    print(data, flush=True)

while 1:
    d = recv_data()
    #print(f'd: {d}')
    if d.strip() == 'test':
        send_data('return')
    elif d.strip() == 'init':
        send_data('Acknowledge')
    else:
        send_data('Failed')
This is the best method I could come up with for cross-process communication. Also make sure all requests and responses don't contain newlines, or the code will break.
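For one-shot exchanges, the Popen.communicate() call mentioned above is often simpler than a hand-rolled readline loop. A minimal, self-contained sketch (the child here is a throwaway python -c script, purely for illustration):

```python
import subprocess, sys

# Child: read everything from stdin, answer once, and exit.
child_code = "import sys; data = sys.stdin.read(); print('got:', data.strip())"

p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
out, _ = p.communicate('hello')  # sends stdin, waits for exit, collects output
print(out.strip())  # got: hello
```

Because communicate() waits for the child to exit, it only suits one-shot jobs; for the ongoing dialogue in the scripts above, the readline-based loop is still the right tool.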

Python Telegram bot that gets stats from Scrapy

I'd like to write a Telegram bot which can provide Scrapy stats on request.
My attempt mostly works; the only issue is that forcefully closing the spider (obviously) doesn't stop the bot.
So I have two questions :
is my general approach the correct one?
is it possible to close the bot even on forceful shutdown of the spider?
Here is the relevant class:
class TelegramBot(object):
    telegram_token = telegram_credentials.token

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def __init__(self, crawler):
        self.crawler = crawler
        cs = crawler.signals
        cs.connect(self._spider_closed, signal=signals.spider_closed)

        # Start the bot.
        # Create the Updater and pass it your bot's token.
        # Make sure to set use_context=True to use the new context based callbacks
        # Post version 12 this will no longer be necessary
        self.updater = Updater(self.telegram_token, use_context=True)

        # Get the dispatcher to register handlers
        dp = self.updater.dispatcher

        # on different commands - answer in Telegram
        dp.add_handler(CommandHandler("stats", self.stats))

        # Start the Bot
        self.updater.start_polling()

    def _spider_closed(self, spider, reason):
        # Stop the Bot
        self.updater.stop()

    def stats(self, update, context):
        # Send a message with the stats
        msg = (
            "Spider "
            + self.crawler.spider.name
            + " stats: "
            + str(self.crawler.stats.get_stats())
        )
        update.message.reply_text(msg)
Here you can find my full code inside the Scrapy tutorial quotes spider https://github.com/jtommi/scrapy_telegram-bot_example/blob/master/tutorial/tutorial/telegram-bot.py
My code is a combination of:
latencies extension from the "Learning Scrapy" book https://github.com/scalingexcellence/scrapybook/blob/master/ch09/properties/properties/latencies.py
echobot example from the python-telegram-bot library https://github.com/python-telegram-bot/python-telegram-bot/blob/master/examples/echobot.py
official scrapy documentation on stats collection https://docs.scrapy.org/en/latest/topics/stats.html
Calling updater.stop() will definitely stop the bot. From the docs of python-telegram-bot:
"""Stops the polling/webhook thread, the dispatcher and the job queue."""
Check whether updater.stop() is called after the spider closes. The bot might not stop immediately, but it will eventually.

Selenium - Logging Into Facebook Causes Web Driver to Stall Indefinitely

I have a simple program that logs into Facebook and gets 3 urls:
def setup_driver():
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    driver = webdriver.Chrome(executable_path="./chromedriver_linux",
                              chrome_options=chrome_options)
    return driver

def log_into_facebook(driver):
    driver.get("https://www.facebook.com/")
    email_field = driver.find_element_by_id("email")
    email_field.send_keys("<MY EMAIL ADDRESS>")
    password_field = driver.find_element_by_id("pass")
    password_field.send_keys("<MY FB PASSWORD>")
    driver.find_element_by_id("loginbutton").click()

if __name__ == "__main__":
    driver = setup_driver()
    log_into_facebook(driver)
    print("before getting url 1")
    driver.get('https://facebook.com/2172111592857876')
    print("before getting url 2")
    driver.get('https://www.facebook.com/beaverconfessions/posts/2265225733546461')
    print("before getting url 3")
    driver.get('https://www.facebook.com/beaverconfessions/posts/640487179353666')
    print("finished getting 3 urls")
On my local machine, this program runs fine. However, on my AWS EC2 instance it makes the instance unusable: the Python script hangs after "before getting url 2" is printed to the console, and while the script is hanging, the EC2 instance becomes so slow that other programs on it stop working properly. I need to forcefully close the program with Ctrl-C for the instance to become responsive again. However, if I comment out log_into_facebook(driver), the program runs fine.
I would try to get a stack trace, but the program doesn't actually crash; it just never reaches "before getting url 3".
It is worth noting that I was previously getting "invalid session id" errors with a similar program (it also logged into Facebook and then called driver.get several times).
Update: Removing the --no-sandbox option from the webdriver seemed to fix the problem. I'm not sure why. I originally had this option in place because I was previously getting an "unable to fix open pages" error, and I read that --no-sandbox would fix it.
chrome_options.add_argument('--no-sandbox')
Roymunson reports that the appropriate way to fix the hanging problem is:
Avoid specifying the --no-sandbox option in the webdriver.

Trouble getting the current URL in Selenium

I want to get the current url while running Selenium.
I looked at this Stack Overflow page: How do I get current URL in Selenium Webdriver 2 Python?
and tried the things posted, but it's not working. I am attaching my code below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# launch firefox
driver = webdriver.Firefox()

url1 = 'https://poshmark.com/search?'
# search in a window
driver.get(url1)

xpath = '//input[@id="user-search-box"]'
searchBox = driver.find_element_by_xpath(xpath)

brand = "freepeople"
style = "top"
searchBox.send_keys(' '.join([brand, "sequin", style]))

# equivalent of hitting the enter key
searchBox.send_keys(Keys.ENTER)
print(driver.current_url)
My code prints https://poshmark.com/search?, but it should print https://poshmark.com/search?query=freepeople+sequin+top&type=listings&department=Women, because that is where Selenium navigates.
The issue is that there is no lag between your searchBox.send_keys(Keys.ENTER) and print(driver.current_url).
There should be some time lag so that the statement can pick up the url change. If your code fires before the url has actually changed, it gives you the old url.
The workaround would be to add time.sleep(1) to wait for 1 second. A hard-coded sleep is not a good option though. You should do one of the following:
Keep polling the url and wait for the change to happen
Wait for an object that you know will appear when the new page loads
Instead of using Keys.ENTER, simulate the operation using a .click() on the search button if it is available
Usually when you use the click method in Selenium, it takes care of page changes, so you don't see such issues. Here you press a key using Selenium, which doesn't do any kind of waiting for page load. That is why you see the issue in the first place.
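The first option, polling until a condition holds, can be sketched with a small stdlib-only helper (the driver usage at the bottom is a hypothetical illustration, not runnable on its own):

```python
import time

def wait_for(predicate, timeout=10.0, poll=0.25):
    """Poll predicate() until it returns truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    return False

# Hypothetical usage with a Selenium driver:
# old_url = driver.current_url
# searchBox.send_keys(Keys.ENTER)
# if wait_for(lambda: driver.current_url != old_url, timeout=10):
#     print(driver.current_url)
```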
I had the same issue, and I came up with a solution that uses a custom explicit wait (see how explicit waits work in the documentation).
Here is my solution:
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait

class UrlHasChanged:
    def __init__(self, old_url):
        self.old_url = old_url

    def __call__(self, driver):
        return driver.current_url != self.old_url

@contextmanager
def url_change(driver):
    current_url = driver.current_url
    yield
    WebDriverWait(driver, 10).until(UrlHasChanged(current_url))
Explanation:
At first, I created my own wait condition (see here) that takes old_url as a parameter (the url from before the action was made) and checks whether the old url is the same as current_url after the action. It returns False when both urls are the same and True otherwise.
Then, I created a context manager to wrap the action I wanted to make; it saves the url before the action, and afterwards uses WebDriverWait with the wait condition created above.
Thanks to that solution, I can now reuse this function with any action that changes the url, and wait for the change like this:
with url_change(driver):
    login_panel.login_user(normal_user['username'], new_password)

assert driver.current_url == dashboard.url
It is safe because WebDriverWait(driver, 10).until(UrlHasChanged(current_url)) waits until the current url changes, and after 10 seconds it stops waiting by throwing an exception.
What do you think about this?
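As a side note, the condition class above is plain Python, so its logic can be checked without a browser by passing a stand-in driver object (FakeDriver below is purely illustrative; it only mimics the current_url attribute the condition reads):

```python
class UrlHasChanged:
    def __init__(self, old_url):
        self.old_url = old_url

    def __call__(self, driver):
        # True once the driver has navigated away from old_url
        return driver.current_url != self.old_url

class FakeDriver:
    # Stand-in exposing only current_url, which is all the condition needs
    def __init__(self, url):
        self.current_url = url

cond = UrlHasChanged('https://example.com/a')
print(cond(FakeDriver('https://example.com/a')))  # False
print(cond(FakeDriver('https://example.com/b')))  # True
```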
I fixed this problem by getting the button's href and then calling driver.get(hreflink). click() was not working for me!

Setting and Getting Variables within an imported Class

All, I'm implementing websockets using Flask/uWSGI. This is relegated to a module that's instantiated in the main application. Redacted code for the server and module:
main.py
from WSModule import WSModule

app = Flask(__name__)
wsmodule = WSModule()
websock = WebSocket(app)

@websock.route('/websocket')
def echo(ws):
    wsmodule.register(ws)
    print("websock clients", wsmodule.clients)
    while True:  # This while loop is related to the uWSGI websocket implementation
        msg = ws.receive()
        if msg is not None:
            ws.send(msg)
        else:
            return

@app.before_request
def before_request():
    print("app clients:", wsmodule.clients)
and WSModule.py:
class WSModule(object):
    def __init__(self):
        self.clients = list()

    def register(self, client):
        self.clients.append(client)
Problem: When a user connects via the '/websocket' route, wsmodule.register appends their connection socket. This works fine: the 'websock clients' printout shows the appended connection.
The issue is that I can't access those sockets from the main application, as seen in the 'app clients' printout, which never updates (the list stays empty). Something is clearly updating, but how do I access those changes?
It sounds like your program is being run with either threads or processes, and a wsmodule exists for each thread/process that is running.
So one wsmodule is being updated with the client info, while a different one is being asked for clients... but the one being asked is still empty.
If you are using threads, check out thread local storage.
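To make the failure mode concrete, here is a tiny model of it: each worker process imports main.py separately, so each ends up with its own WSModule, and a registration in one is invisible in the other. The two instances below stand in for two workers:

```python
class WSModule:
    def __init__(self):
        self.clients = list()

    def register(self, client):
        self.clients.append(client)

# Two instances model two worker processes, each of which built its own
# wsmodule at startup:
ws_in_worker_a = WSModule()
ws_in_worker_b = WSModule()

ws_in_worker_a.register("ws-1")  # the websocket route ran in worker A
print(ws_in_worker_a.clients)    # ['ws-1']
print(ws_in_worker_b.clients)    # [] ... worker B never sees the registration
```

The usual remedies are to pin websocket handling and the rest of the app to a single worker, or to move the shared state into an external store that all workers can query.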
