AWS Lambda Python 3.7 Web Scraping - "Could not get version for Chrome with this command: google-chrome --version" - python-3.x

Via an S3 bucket, I've uploaded a lambda function along with its dependencies as a ZIP file. The lambda function is a web scraper with the following initial code to get the scraper started:
import json
import os
import pymysql
import boto3
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--window-size=1280x1696')
chrome_options.add_argument('--user-data-dir=/tmp/user-data')
chrome_options.add_argument('--hide-scrollbars')
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.add_argument('--v=99')
chrome_options.add_argument('--single-process')
chrome_options.add_argument('--data-path=/tmp/data-path')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--homedir=/tmp')
chrome_options.add_argument('--disk-cache-dir=/tmp/cache-dir')
chrome_options.binary_location = os.getcwd() + "/bin/headless-chromium"
browser = webdriver.Chrome(executable_path=ChromeDriverManager().install(), options=chrome_options)
When I try to test the lambda function, I get the following error in the console:
{
"errorMessage": "Could not get version for Chrome with this command: google-chrome --version",
"errorType": "ValueError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 67, in lambda_handler\n browser = webdriver.Chrome(executable_path=ChromeDriverManager().install(), options=chrome_options)\n",
" File \"/var/task/webdriver_manager/chrome.py\", line 24, in install\n driver_path = self.download_driver(self.driver)\n",
" File \"/var/task/webdriver_manager/manager.py\", line 32, in download_driver\n driver_version, is_latest = self.__get_version_to_download(driver)\n",
" File \"/var/task/webdriver_manager/manager.py\", line 23, in __get_version_to_download\n return self.__get_latest_driver_version(driver), True\n",
" File \"/var/task/webdriver_manager/manager.py\", line 17, in __get_latest_driver_version\n return driver.get_latest_release_version()\n",
" File \"/var/task/webdriver_manager/driver.py\", line 54, in get_latest_release_version\n self._latest_release_url + '_' + chrome_version())\n",
" File \"/var/task/webdriver_manager/utils.py\", line 98, in chrome_version\n .format(cmd)\n"
]
}
In response, I tried editing utils.py in the webdriver_manager dependency folder, replacing 'google-chrome --version' in the chrome_version() function with other commands such as 'chrome --version' and 'chromium-browser --version', but I got a similar error about not being able to get the Chrome version from the new command:
def chrome_version():
    pattern = r'\d+\.\d+\.\d+'
    cmd_mapping = {
        OSType.LINUX: 'google-chrome --version',
        OSType.MAC: r'/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version',
        OSType.WIN: r'reg query "HKEY_CURRENT_USER\Software\Google\Chrome\BLBeacon" /v version'
    }
    cmd = cmd_mapping[os_name()]
    stdout = os.popen(cmd).read()
    version = re.search(pattern, stdout)
    if not version:
        raise ValueError(
            'Could not get version for Chrome with this command: {}'
            .format(cmd)
        )
    return version.group(0)
Can anyone tell me what command I should be using instead of 'google-chrome --version'?

By default, Google Chrome does not exist in the container that runs Lambda functions.
I'm implementing a similar solution in JavaScript, and I solved it by using a lightweight browser (Chromium) through the following packages:
"chrome-aws-lambda": "^1.19.0",
"puppeteer-core": "^1.19.0"
For Python, here is a tutorial that might help in your situation.
https://robertorocha.info/setting-up-a-selenium-web-scraper-on-aws-lambda-with-python/
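The traceback in the question shows ChromeDriverManager().install() failing because it shells out to google-chrome --version, which is not available on Lambda. One way around that is to ship a chromedriver binary in the ZIP next to headless-chromium and point Selenium at both directly. The following is a minimal sketch under assumptions: the bin/chromedriver location and the helper names bundled_binary_paths and make_browser are hypothetical, and the Selenium 3 style executable_path argument is kept only because the question uses it. The bundled chromedriver has to match the bundled Chromium version.

```python
import os


def bundled_binary_paths(task_root=None):
    """Return (chrome_path, driver_path) for binaries shipped in the deployment ZIP."""
    # LAMBDA_TASK_ROOT is set by the Lambda runtime; fall back to the
    # working directory, as the question's own code does.
    root = task_root or os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    chrome_path = os.path.join(root, "bin", "headless-chromium")
    driver_path = os.path.join(root, "bin", "chromedriver")  # assumed location
    for path in (chrome_path, driver_path):
        if not os.path.isfile(path):
            raise FileNotFoundError("Expected bundled binary at " + path)
    return chrome_path, driver_path


def make_browser(chrome_options):
    """Start Chrome from the bundled binaries without calling ChromeDriverManager."""
    # Imported lazily so the path helper above works without Selenium installed.
    from selenium import webdriver

    chrome_path, driver_path = bundled_binary_paths()
    chrome_options.binary_location = chrome_path
    return webdriver.Chrome(executable_path=driver_path, options=chrome_options)
```

With this in place, the webdriver_manager import and the edits to its utils.py are no longer needed, since no version lookup is performed at all.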

Related

Selenium XVFB - Unable to receive message from renderer

Overview:
The Selenium scraper works perfectly in headless mode. Starting a virtual display with XVFB on its own also shows no errors:
from xvfbwrapper import Xvfb
vdisplay = Xvfb()
vdisplay.start()
vdisplay.stop()
But when I try to run them together, it errors out with:
[ERROR] SessionNotCreatedException: Message: session not created
from disconnected: Unable to receive message from renderer (Session info: chrome=96.0.4664.0)
Traceback:
Traceback (most recent call last):
File "/var/task/slack_main.py", line 34, in handler
scrape_price(asin_list)
File "/var/task/slack_main.py", line 58, in scrape_price
driver = webdriver.Chrome("/opt/chromedriver",options=options)
File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 70, in __init__
super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
RemoteWebDriver.__init__(
File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 275, in __init__
self.start_session(capabilities, browser_profile)
File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 365, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 430, in execute
self.error_handler.check_response(response)
File "/var/lang/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
Configuration:
Below is my complete Selenium and XVFB configuration:
from selenium import webdriver
from selenium_stealth import stealth
from xvfbwrapper import Xvfb
vdisplay = Xvfb()
vdisplay.start()
options = webdriver.ChromeOptions()
prefs = {"browser.downloads.dir": "//tmp//", "download.default_directory": "//tmp//", "directory_upgrade": True}
options.add_experimental_option("prefs", prefs)
options.binary_location = '/opt/chrome/chrome'
#options.add_argument('--headless') #toggled on and off when running with or without XVFB
options.add_argument('--no-sandbox')
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1280x1696")
options.add_argument("--single-process")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-dev-tools")
options.add_argument("--no-zygote")
options.set_capability('unhandledPromptBehavior', 'ignore')
options.add_argument("download.default_directory=/tmp")
driver = webdriver.Chrome("/opt/chromedriver",options=options)
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        fix_hairline=True,
        )
vdisplay.stop()
driver.close()
Why is it not connecting to the display? My guess is it has something to do with the '--headless' toggle?
Versions and Tools:
Selenium version 3.141.0
xvfbwrapper version 0.2.9
Docker is used to build and push to AWS Lambda, with the base image used as-is (the Dockerfile is the same with or without XVFB)
Edit:
Found a pull request for XVFB configuration in the GitHub repo of my base image. I even used the exact same code from the pull request and I still receive the exact same error. Maybe this has something to do with AWS offboard?

Unable to run Python Script from within an Ansible Playbook

I am trying to write an Ansible playbook that crawls a website and then stores its contents in a static file in an AWS S3 bucket. Here is the crawler code:
"""
Handling pages with the Next button
"""
import sys
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
url = "https://xyz.co.uk/"
file_name = "web_content.txt"
while True:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    raw_html = soup.prettify()
    file = open(file_name, 'wb')
    print('Collecting the website contents')
    file.write(raw_html.encode())
    file.close()
    print('Saved to %s' % file_name)
    #print(type(raw_html))
    # Finding next page
    next_page_element = soup.select_one('li.next > a')
    if next_page_element:
        next_page_url = next_page_element.get('href')
        url = urljoin(url, next_page_url)
    else:
        break
This is my ansible-playbook:
---
- name: create s3 bucket and upload static website content into it
  hosts: localhost
  connection: local
  tasks:
    - name: create a s3 bucket
      amazon.aws.aws_s3:
        bucket: testbucket393647914679149
        region: ap-south-1
        mode: create
    - name: create a folder in the bucket
      amazon.aws.aws_s3:
        bucket: testbucket393647914679149
        object: /my/directory/path
        mode: create
    - name: Upgrade pip
      pip:
        name: pip
        version: 21.1.3
    - name: install virtualenv via pip
      pip:
        requirements: /root/ansible/requirements.txt
        virtualenv: /root/ansible/myvenv
        virtualenv_python: python3.6
      environment:
        PATH: "{{ ansible_env.PATH }}:{{ ansible_user_dir }}/.local/bin"
    - name: Run script to crawl the website
      script: /root/ansible/beautiful_crawl.py
    - name: copy file into bucket folder
      amazon.aws.aws_s3:
        bucket: testbucket393647914679149
        object: /my/directory/path/web_content.text
        src: web_content.text
        mode: put
The problem is that when I run this, it runs fine up to the task "install virtualenv via pip" and then throws the following error while executing the task "Run script to crawl the website":
fatal: [localhost]: FAILED! => {"changed": true, "msg": "non-zero return code", "rc": 2, "stderr": "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 1: import: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 2: from: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 3: import: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 4: from: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 6: url: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 7: file_name: command not found\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 10: syntax error near unexpected token ('\n/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 10: response = requests.get(url)'\n", "stderr_lines": ["/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 1: import: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 2: from: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 3: import: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 4: from: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 6: url: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 7: file_name: command not found", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 10: syntax error near unexpected token ('", "/root/.ansible/tmp/ansible-tmp-1625137700.8854306-13026-97983643645466/beautiful_crawl.py: line 10: response = requests.get(url)'"], "stdout": "", "stdout_lines": []}
What am I doing wrong here?
You have multiple problems.
Check the documentation.
No. 1: The script module runs scripts through the shell by default, not through Python. If you want to run a Python script, you need to add a shebang like #!/usr/bin/env python3 as the first line of the script or use the executable parameter.
No. 2: You create a venv, so I assume you want to run the script in that venv. The script module can't do that out of the box, so you need a workaround.
This should work for you (you don't need the shebang, because the executable parameter tells the script module to run the script with the Python interpreter in the venv):
- name: Run script to crawl the website
  script: /root/ansible/beautiful_crawl.py
  args:
    executable: /root/ansible/myvenv/bin/python
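If you prefer the shebang route instead, the "import: command not found" lines in the error are what the shell prints when the file has no interpreter line. As a rough illustration, a small pre-flight check could look like the sketch below; the helper name has_python_shebang is hypothetical and not part of Ansible.

```python
def has_python_shebang(script_path):
    """Return True if the first line of the file is a shebang that names Python."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    # Accepts both "#!/usr/bin/env python3" and a direct interpreter path
    # such as "#!/root/ansible/myvenv/bin/python".
    return first_line.startswith("#!") and "python" in first_line
```

For the crawler in the question, this check returns False, because the file begins with a docstring rather than an interpreter line.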

Runtime error when importing a Python module

I am trying to import the function "mmenu" from another module correctly, but I get a runtime error:
Traceback (most recent call last):
File "path.../venv/src/routes.py", line 4, in <module>
from venv.src.main_menu import mmenu
ModuleNotFoundError: No module named 'venv.src'
Coming from Java, Python is a bit of a mystery to me :)
I have 2 files in the same "src" directory:
main_menu.py
def mmenu():
    menu = [{"name": "HOME", "url": "home"},
            {"name": "fooo", "url": "foo"},
            {"name": "bar", "url": "bar"},
            {"name": "CONTACT", "url": "contact"}]
    return menu
routes.py
from flask import Flask, render_template, request, flash, session, redirect, abort
from jinja2 import Template
from flask.helpers import url_for
from venv.src.main_menu import mmenu
app = Flask(__name__)
menu = mmenu
#app.route("/")
#app.route("/index")
def index():
    # return "index"
    print("loaded" + url_for('index'))
    return render_template('index.html', title="Index page", menu=menu)
# ...
if __name__ == "__main__":
    app.run(debug=True)
The import was generated by the IDE, like this:
from venv.src.parts.main_menu import mmenu
Your IDE (which IDE?) is likely misconfigured if it generates imports like that.
Neither the venv itself nor any src directory inside it should be on the import path.
You never mention you have a parts/ package, but from the import I'll assume you do, and your structure is something like
venv/
src/                    # source root
    routes.py
    parts/              # parts package
        __init__.py     # (empty) init for parts package
        main_menu.py
With this structure, main_menu is importable as parts.main_menu from routes.py assuming you start the program with python routes.py.
If you have __init__.py files in venv/ and/or src/, get rid of them; you don't want those directories to be assumed to be Python packages.
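To make the import rule concrete, here is a small self-contained sketch that rebuilds the assumed layout in a temporary directory and imports parts.main_menu with src as the source root. Putting src on sys.path by hand stands in for what python routes.py does automatically (the script's directory is placed first on sys.path). The function name demo_import and the one-entry menu are illustrative only.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_import():
    """Build src/parts/main_menu.py in a temp dir and import it as parts.main_menu."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        parts = src / "parts"
        parts.mkdir(parents=True)
        (parts / "__init__.py").write_text("")
        (parts / "main_menu.py").write_text(
            "def mmenu():\n"
            "    return [{'name': 'HOME', 'url': 'home'}]\n"
        )
        # Equivalent to starting the program from src/ with `python routes.py`.
        sys.path.insert(0, str(src))
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("parts.main_menu")
            return module.mmenu()
        finally:
            # Undo the path and module cache changes so the demo leaves no trace.
            sys.path.remove(str(src))
            sys.modules.pop("parts.main_menu", None)
            sys.modules.pop("parts", None)
```

In routes.py itself, the corresponding line would simply be from parts.main_menu import mmenu, with no venv or src prefix.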

cx_Freeze error module SSL not available Python 3.7 Windows 10

I wrote a program in Python 3 that provides a dashboard for a cryptocurrency bot. The program runs without errors on its own, but once frozen with cx_Freeze, a query to coinmarketcap fails with an error saying that the SSL module is missing.
import sys
from cx_Freeze import setup, Executable
import os
import requests.certs
packages = ["tkinter", "requests", "idna", "queue", "coinmarketcap", "requests_cache", "PIL", "urllib3", "OpenSSL", "ssl", "arrow", "tempfile", "json", "locale", "C:\\Users\\cavaud\\Desktop\\botTKinker\\config", "time", "sys", "MySQLdb", "urllib.request"]
includeFile = [requests.certs.where(), "cacert.pem", "ico24x24.ico" , "bas.png", "haut.png", "egal.png", "level.png", "logoBotV2H2.png", "orderNOK.gif", "orderOK.gif", "C:\\Users\\cavaud\\AppData\\Local\\Programs\\Python\\Python37-32\\DLLs\\tcl86t.dll", "C:\\Users\\cavaud\\AppData\\Local\\Programs\\Python\\Python37-32\\DLLs\\tk86t.dll"]
path = sys.path
os.environ['TCL_LIBRARY'] = "C:\\Users\\cavaud\\AppData\\Local\\Programs\\Python\\Python37-32\\tcl\\tcl8.6"
os.environ['TK_LIBRARY'] = "C:\\Users\\cavaud\\AppData\\Local\\Programs\\Python\\Python37-32\\tcl\\tk8.6"
os.environ['REQUESTS_CA_BUNDLE'] = "cacert.pem"
base = None
if sys.platform == "win32":
    base = "Win32GUI"
options = {
    "path": path,
    "includes": includeModule,
    "include_files": includeFile,
    "packages": packages,
    "silent": False
}
options["include_msvcr"] = True
cible_1 = Executable(
    script="botTK.py",
    base=base,
    icon="ico24x24.ico"
)
setup(
    name="BotTK",
    version="1.00",
    description="BOT TK",
    author="moi",
    options={"build_exe": options},
    executables=[cible_1]
)
Thank you
Try to modify your setup.py script as follows:
includeFile = [(requests.certs.where(), "cacert.pem"), "ico24x24.ico" , "bas.png", "haut.png", "egal.png", "level.png", "logoBotV2H2.png", "orderNOK.gif", "orderOK.gif", "C:\\Users\\cavaud\\AppData\\Local\\Programs\\Python\\Python37-32\\DLLs\\tcl86t.dll", "C:\\Users\\cavaud\\AppData\\Local\\Programs\\Python\\Python37-32\\DLLs\\tk86t.dll"]
(please note the parentheses around the first two entries!), and
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join(os.getcwd(), "cacert.pem")
See Requests library: missing SSL handshake certificates file after cx_Freeze
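Because os.getcwd() depends on where the program is launched from, a variation is to resolve cacert.pem relative to the frozen executable. The sketch below assumes that cacert.pem ends up next to the executable (which is what the include_files tuple above requests) and relies on the sys.frozen attribute that cx_Freeze sets in frozen applications; the helper name bundled_path is hypothetical.

```python
import os
import sys


def bundled_path(name):
    """Return an absolute path to a data file shipped alongside the application."""
    if getattr(sys, "frozen", False):
        # Frozen build: data files listed in include_files sit next to the executable.
        base = os.path.dirname(os.path.abspath(sys.executable))
    else:
        # Unfrozen run: fall back to the working directory, as in the answer.
        base = os.getcwd()
    return os.path.join(base, name)


# Tell requests where to find the CA bundle before making any HTTPS call.
os.environ["REQUESTS_CA_BUNDLE"] = bundled_path("cacert.pem")
```

This would go near the top of botTK.py (the application script), not in setup.py, since the variable has to be set at run time in the frozen program.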

Using cx_freeze in PyQt5, can't find PyQt5

I want to build a standalone binary for Windows (XP, 7, ...) from my Python 3 (+ PyQt5) script, and I have to use cx_Freeze because other freezing tools (such as py2exe and PyInstaller) do not work with Python 3.
I read the cx_Freeze docs and many Stack Overflow questions, and I use this config for the setup.py file:
import sys
from cx_Freeze import setup, Executable
path_platforms = ( "C:\Python33\Lib\site-packages\PyQt5\plugins\platforms\qwindows.dll", "platforms\qwindows.dll" )
includes = ["atexit","PyQt5.QtCore","PyQt5.QtGui", "PyQt5.QtWidgets"]
includefiles = [path_platforms]
excludes = [
    '_gtkagg', '_tkagg', 'bsddb', 'curses', 'email', 'pywin.debugger',
    'pywin.debugger.dbgcon', 'pywin.dialogs', 'tcl',
    'Tkconstants', 'Tkinter'
]
packages = ["os"]
path = []
# Dependencies are automatically detected, but it might need fine tuning.
build_exe_options = {
    "includes": includes,
    "include_files": includefiles,
    "excludes": excludes,
    "packages": packages,
    "path": path
}
# GUI applications require a different base on Windows (the default is for a
# console application).
base = None
exe = None
if sys.platform == "win32":
    exe = Executable(
        script="D:\\imi\\aptanaWorKPCworkspace\\azhtel\\tel.py",
        initScript = None,
        base="Win32GUI",
        targetDir = r"dist",
        targetName="tel.exe",
        compress = True,
        copyDependentFiles = True,
        appendScriptToExe = False,
        appendScriptToLibrary = False,
        icon = None
    )
setup(
    name = "telll",
    version = "0.1",
    author = 'me',
    description = "My GUI application!",
    options = {"build_exe": build_exe_options},
    executables = [exe]
)
run with:
python D:\imi\aptanaWorKPCworkspace\azhtel\setup.py build
This is my library that I used:
from PyQt5 import QtGui, QtCore, QtWidgets
import sys
from telGui import Ui_MainWindow
import mysql
import mysql.connector
from mysql.connector import errorcode
and these are the files in my workspace:
But this error happened (or other kinds of errors).
Why does this happen, and what setup.py config works for a PyQt5 app?
Thanks.
Python3.3, PyQt5, Mysqlconnector.
I solved this problem when I found another directory next to the dist directory, called build, which contains all the library files. I deleted the targetDir = r"dist" part of setup.py and everything works!
Try pyinstaller. It's much better than cxfreeze.
