Pyppeteer behaving differently on Linux and Windows - python-3.x

I have pyppeteer code that browses around. Let's assume it only clicks on a tags.
It runs fine on my local Windows machine, but breaks whenever I run it remotely on a Linux server. Same conda env, same code.
The relevant part of my code, simplified, looks like:
async def act(self):
element = self.element
async def get_action():
tag_name = await self.page.evaluate(
'elem => { return elem.tagName.toLowerCase(); }',
element)
action = None
if tag_name == 'a':
action = element.click()
else:
action = async_pass()
return action
async def get_action_future():
# gather syntax based on:
# https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.click
action = await get_action()
future_action = asyncio.gather(
action,
asyncio.sleep(0.001), # dirty, dirty work-around, doesn't work nicely otherwise
)
waited_future = await asyncio.shield(future_action)
if waited_future[0] is None:
await self.page.waitForNavigation(self.wait_options)
return None
await get_action_future()
It runs fine on my Windows machine.
When I start it on a Linux machine, it starts off OK, whether there's navigation or not. Then, after a few navigation clicks, I'm getting a timeout, then another error:
Error encountered: Navigation Timeout Exceeded: 20000 ms exceeded.
# then I trigger the element selector and the act method again, wrapped in try/except
Error encountered: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
I'm stuck on this problem for a while and would appreciate any help!
My environment includes:
python=3.6, pyppeteer=0.0.25.
BTW:
I noticed that this question has a similar error. BUT, the error is different (Protocol error (Page.navigate): Target closed instead of Protocol Error (Runtime.callFunctionOn)), as well as the environment (node.js, Puppeteer, etc.).

It seems that Pypepeteer is no longer maintained. I was experiencing the same issue and migrated to Selenium which is working well.

Related

How to lock virtualbox to get a screenshot through SOAP API

I'm trying to use the SOAP interface of Virtualbox 6.1 from Python to get a screenshot of a machine. I can start the machine but get locking errors whenever I try to retrieve the screen layout.
This is the code:
import zeep
# helper to show the session lock status
def show_lock_state(session_id):
session_state = service.ISession_getState(session_id)
print('current session state:', session_state)
# connect
client = zeep.Client('http://127.0.0.1:18083?wsdl')
service = client.create_service("{http://www.virtualbox.org/}vboxBinding", 'http://127.0.0.1:18083?wsdl')
manager_id = service.IWebsessionManager_logon('fakeuser', 'fakepassword')
session_id = service.IWebsessionManager_getSessionObject(manager_id)
# get the machine id and start it
machine_id = service.IVirtualBox_findMachine(manager_id, 'Debian')
progress_id = service.IMachine_launchVMProcess(machine_id, session_id, 'gui')
service.IProgress_waitForCompletion(progress_id, -1)
print('Machine has been started!')
show_lock_state(session_id)
# unlock and then lock to be sure, doesn't have any effect apparently
service.ISession_unlockMachine(session_id)
service.IMachine_lockMachine(machine_id, session_id, 'Shared')
show_lock_state(session_id)
console_id = service.ISession_getConsole(session_id)
display_id = service.IConsole_getDisplay(console_id)
print(service.IDisplay_getGuestScreenLayout(display_id))
The machine is started properly but the last line gives the error VirtualBox error: rc=0x80004001 which from what I read around means locked session.
I tried to release and acquire the lock again, but even though it succeeds the error remains. I went through the documentation but cannot find other types of locks that I'm supposed to use, except the Write lock which is not usable here since the machine is running. I could not find any example in any language.
I found an Android app called VBoxManager with this SOAP screenshot capability.
Running it through a MITM proxy I reconstructed the calls it performs and wrote them as the Zeep equivalent. In case anyone is interested in the future, the last lines of the above script are now:
console_id = service.ISession_getConsole(session_id)
display_id = service.IConsole_getDisplay(console_id)
resolution = service.IDisplay_getScreenResolution(display_id, 0)
print(f'display data: {resolution}')
image_data = service.IDisplay_takeScreenShotToArray(
display_id,
0,
resolution['width'],
resolution['height'],
'PNG')
with open('screenshot.png', 'wb') as f:
f.write(base64.b64decode(image_data))

Bot executing the same command twice

I am making a discord bot with the rewrite, but when my command runs, it sends the message twice
There is 100% no other calls to send that message, and it is only the 1st message(Hold on, I'm gathering the data), that is sent twice.
Here is the command's code:
#bot.command()
async def testcmd(ctx):
print("called")
msgtemp = await ctx.message.channel.send("Hold on, I'm gathering the data")
print("sent")
time.sleep(3)
await msgtemp.delete()
with open("fileofthings.txt") as fl:
await ctx.send(fl.read())
I had the same issue with my bot sending responses twice, does this happen with this particular command or it happens with other commands as well.
My theory is that you are running 2 versions of the bot meaning you get 2 messages. I developed a shutdown command in case this happens to me again
This is my code for a shutdown command if you need it.
#commands.command()
async def shutdown(self,ctx):
if ctx.message.author.id == OWNERID: #replace OWNERID with your user id
print("shutdown")
try:
await self.bot.logout()
except:
print("EnvironmentError")
self.bot.clear()
else:
await ctx.send("You do not own this bot!")
Had the same issue and it literally drove me mad. The problem may be that either you are running the same bot file from multiple devices or you are running it multiple times with the same device. My problem was fixed by following this method:
Open Task Manager in your device
Click on 'More details'
In the processes tab, search for 'python 3.9 (or whatever version you are currently using)', and click on it and click 'End task'.
Hope this resolves your issue.

Selenium - Logging Into Facebook Causes Web Driver to Stall Indefinitely

I have a simple program that logs into Facebook and gets 3 urls:
def setup_driver():
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path="./chromedriver_linux",
chrome_options=chrome_options)
return driver
def log_into_facebook(driver):
driver.get("https://www.facebook.com/")
email_field = driver.find_element_by_id("email")
email_field.send_keys("<MY EMAIL ADDRESS>")
password_field = driver.find_element_by_id("pass")
password_field.send_keys("<MY FB PASSWORD>")
driver.find_element_by_id("loginbutton").click()
if __name__ == "__main__":
driver = setup_driver()
log_into_facebook(driver)
print("before getting url 1")
driver.get('https://facebook.com/2172111592857876')
print("before getting url 2")
#Stackoverflow is breaking indentation
driver.get('https://www.facebook.com/beaverconfessions/posts/2265225733546461')
print("before getting url 3")
driver.get('https://www.facebook.com/beaverconfessions/posts/640487179353666')
print("finished getting 3 urls")
On my local machine, this program runs fine. However, on my AWS EC2 instance, this program makes my instance unusable (the Python script will hang/stall after "before getting url 2" is printed to the console. While the script is hanging, the EC2 instance will become so slow that other programs on the instance will also stop working properly. I need to forcefully close the program with Ctrl-C in order for the instance to start being responsive again.). However, if I comment out log_into_facebook(driver), then the program runs fine.
I would try to get an stacktrace, but the program doesn't actually crash, rather it just never reaches "before getting url 3".
It is worth nothing, previously I was getting "invalid session id" errors with a program that was similar to this (it also logged into Facebook and then called driver.get several times).
Update: Removing the --no-sandbox option from the webdriver seemed to fix the problem. I'm not sure why. I originally had this option in place because I was previously having a "unable to fix open pages" error, and I read that "--no-sandbox" would fix the error.
chrome_options.add_argument('--no-sandbox')
Roymunson reports that the appropriate way to fix the hanging problem is:
Avoid specifying the --no-sandbox option in the webdriver.

Multithreading with Selenium using Python and Telpot

I'm coding my first telegram bot, but now I have to serve multiple user at the same time.
This code it's just a little part, but it should help me to use multithread with selenium
class MessageCounter(telepot.helper.ChatHandler):
def __init__(self, *args, **kwargs):
super(MessageCounter, self).__init__(*args, **kwargs)
def on_chat_message(self, msg):
content_type, chat_type, chat_id = telepot.glance(msg)
chat_id = str(chat_id)
browser = browserSelenium.start_browser(chat_id)
userIsLogged = igLogin.checkAlreadyLoggedIn(browser, chat_id)
print(userIsLogged)
TOKEN = "***"
bot = telepot.DelegatorBot(TOKEN, [
pave_event_space()(
per_chat_id(), create_open, MessageCounter, timeout=10),
])
MessageLoop(bot).run_as_thread()
while 1:
time.sleep(10)
when the bot recive any message it starts a selenium session calling this function:
def start_browser(chat_id):
global browser
try:
browser.get('https://www.google.com')
#igLogin.checkAlreadyLoggedIn(browser)
#links = telegram.getLinks(24)
#instagramLikes(browser, links)
except Exception as e:
print("type error: " + str(e))
print('No such session! starting webDivers!')
sleep(3)
# CLIENT CONNECTION !!
chrome_options = Options()
chrome_options.add_argument('user-data-dir=/home/ale/botTelegram/users/'+ chat_id +'/cookies')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--lang=en')
print("Starting WebDrivers")
browser = webdriver.Chrome(options=chrome_options)
start_browser(chat_id)
return browser
and then this one check if the user is logged:
def checkAlreadyLoggedIn(browser, chat_id):
browser.get('https://www.instagram.com/instagram/')
try:
WebDriverWait(browser, 5).until(EC.element_to_be_clickable(
(By.XPATH, instagramClicks.buttonGoToProfile))).click()
print('User already Logged')
return True
except:
print('User not Logged')
userLogged = login(browser, chat_id)
return userLogged
and if the user is not logged it try to log the user in whit username and password
so, basically, when I write at the bot with one account everithing works fine, but if I write to the bot from two different account it opens two browser, but it controll just one.
What I mean it's that for example, one window remain over the google page, and then the other one recive two times the comand, so, even when it has to write the username, it writes the username two times
How can I interract with multiple sessions?
WebDriver is not thread-safe. Having said that, if you can serialise access to the underlying driver instance, you can share a reference in more than one thread. This is not advisable. But you can always instantiate one WebDriver instance for each thread.
Ideally the issue of thread-safety isn't in your code but in the actual browser bindings. They all assume there will only be one command at a time (e.g. like a real user). But on the other hand you can always instantiate one WebDriver instance for each thread which will launch multiple browsing tabs/windows. Till this point it seems your program is perfect.
Now, different threads can be run on same Webdriver, but then the results of the tests would not be what you expect. The reason behind is, when you use multi-threading to run different tests on different tabs/windows a little bit of thread safety coding is required or else the actions you will perform like click() or send_keys() will go to the opened tab/window that is currently having the focus regardless of the thread you expect to be running. Which essentially means all the test will run simultaneously on the same tab/window that has focus but not on the intended tab/window.
Reference
You can find a relevant detailed discussion in:
Chrome crashes after several hours while multiprocessing using Selenium through Python

Download page with javascript executed

I want to download a page with javascript executed using python. QT is one of solutions and here is the code:
class Downloader(QApplication):
__event = threading.Event()
def __init__(self):
QApplication.__init__(self, [])
self.webView = QWebView()
self.webView.loadFinished.connect(self.loadFinished)
def load(self, url):
self.__event.clear()
self.webView.load(QUrl(url))
while not self.__event.wait(.05): self.processEvents()
return self.webView.page().mainFrame().documentElement() if self.__ok else None
def loadFinished(self, ok):
self.__ok = ok
self.__event.set()
downloader = Downloader()
page = downloader.load(url)
The problem is that sometimes downloader.load() return a page without javascript executed. Downloader.loadStarted() and Downloader.loadFinished() are called only once.
What is the proper way to wait for a complete page download?
EDIT
If add self.webView.page().networkAccessManager().finished.connect(request_ended) into __init__() and define
def request_ended(reply):
print(reply.error(), reply.url().toString())
then it turns out that sometimes reply.error()==QNetworkReply.UnknownNetworkError. This behaviour stands when unreliable proxy is used, that fails to download some of the resources (part of which are js files), hence some of js not being executed. When proxy is not used (== connection is stable), every reply.error()==QNetworkReply.NoError.
So, the updated question is:
Is it possible to retry getting reply.request() and apply it to the self.webView?
JavaScript requires a runtime to be executed with (python alone won't do) a popular one is PhantomJS these days.
Unfortuantely, PhantomJs has no python support anymore so you could resort to e.g. Ghost.py to do this job for you which allows you to selectively execute JS you want.
You should use Selenium
It provides different WebDriver, for example, PhantomJS, or other common browsers, like firefox.

Resources