I want to periodically scrape a website with Selenium and a headless PhantomJS driver.
My boss wants me to run it "in the cloud" for reasons, and a serverless Azure Function looks like it could be a useful way to do it, instead of having to run a VM or something.
I've got my VS.net code to do the scraping mostly done, but I just realized that I'm not sure I can actually deploy it as a function, since it looks like I need to include phantomjs.exe in my project for it to run, which may not work in an Azure Function...
Can I do what I wanted to do, or should I explore other options?
PhantomJS is a known unsupported framework in App Service, which is the same environment Azure Functions runs on.
You can find more information here: https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#unsupported-frameworks
Some context: Our Cloud Build process relies on manual triggers and about 8 substitutions to customize deploys to various Firebase projects, hosting sites, and preview channels. Previously we used a bash script and gcloud to automate the selection of these substitution options, the "updating" of the trigger (via gcloud beta builds triggers import; our needs require us to use a single trigger, it's a long story), and the "running" of the trigger.
This bash script was hard to work with and improve, and through the import-run shenanigans actually led to some faulty deploys that caused all kinds of chaos: not great.
However, recently I found a way to pass substitution variables as part of a manual trigger operation using the Node.js library for Cloud Build (runTrigger with subs passed as part of the request)!
Problem: So I'm converting our build utility to Node, which is great, but as far as I can tell there isn't a native way to stream build logs from a running build to the console (except maybe with exec, but that feels hacky).
Am I missing something? Or should I be looking at one of the logging libraries?
I've tried my best scanning Google's docs and APIs (Cloud Build REST, the Node client library, etc.) but to no avail.
I have written C# Selenium code that opens a browser and takes a screenshot. The code runs fine on my local laptop, but when I deploy it to Azure WebJobs it fails to run. This may be because my local system has the Chrome browser installed, whereas it is missing in the cloud. I have even included chrome.exe in my directory, but without success. I am stuck: I have tried many things but have not found a way to proceed.
Selenium is not currently supported by Azure Web Apps or WebJobs. It is listed under "Unsupported frameworks" in the sandbox documentation:
Azure Web App sandbox: Unsupported frameworks
Other scenarios that are not supported:
PhantomJS/Selenium: tries to connect to local address, and also uses GDI+.
I'm trying to get Selenium Grid and Jenkins working together in GKE.
I found the Selenium plugin (https://plugins.jenkins.io/selenium) for Jenkins, but I'm not sure it can be used to get what I want.
I stood Jenkins up by following the steps here:
https://github.com/GoogleCloudPlatform/kube-jenkins-imager
(I changed the image for the Jenkins node to use Jenkins 2.86.)
This creates an instance of Jenkins running in kubernetes that spawns slaves into the cluster as needed.
But I don't believe that this is compatible with the Selenium plug-in. What's the best way to take what I have and get it working with this instance of Jenkins?
I was also able to get an instance of Selenium up and going in the same cluster using this:
https://gist.github.com/elsonrodriguez/261e746cf369a60a5e2d
(I dropped the version 2.x from the instances to pull in the latest containers.)
I had to bump the k8s nodes up to n1-standard-2 (2 vCPUs, 7.5 GB memory) to get those containers to run.
For this proof of concept, the SE nodes don't need to be ephemeral. But I'm unsure what kind of permanent node container image I can deploy in k8s that would have the necessary SE drivers.
On the other hand, maybe it would be easier to just use the stand-alone SE containers that I found. If so, how do I use them with Jenkins2?
Has anyone else gone down this path?
Edit: I'm not interested in third-party selenium services at this time.
SauceLabs is a selenium grid in the cloud.
I wrote Saucery to make integrating from C# or Java with NUnit2, NUnit3 or JUnit 4 easy.
You can see the source code here, here and here or take a look at the Github Pages site here for more information.
Here is what I figured out.
I saw many indications that it was a hassle to run your own instance of Selenium grid. Enough time may have passed for this to be a little easier than it used to be. There seem to be a few ways to do it.
Jenkins itself has a plugin that is supposed to turn your Jenkins cluster into a Selenium 3 grid: https://plugins.jenkins.io/selenium . The problem I had with this is that I'm planning on hosting these instances in the cloud, and I wanted the Jenkins slaves to be ephemeral. I was unable to figure out how to get the plugin to work with ephemeral slaves.
I was trying to get this done as quickly as I could, so I only spent three days total on this project.
These are the forked repos that I'm using for the Jenkins solution:
https://github.com/jnorment-q2/kube-jenkins-imager
which basically implements this:
https://github.com/jnorment-q2/continuous-deployment-on-kubernetes
I'm pointing to my own repos to give a reference to exactly what I used in late October 2017 to get this working. Those repos are forked from the main repos, and it should be easy to compare the differences.
I contacted Google support with a question, and they responded that this link might be a bit clearer:
https://cloud.google.com/solutions/jenkins-on-container-engine-tutorial
From what I can tell, this is a manual version of the more automated scripts I referenced.
To stand up Selenium, I used this:
https://github.com/jnorment-q2/selenium-on-k8s
This is a project I built from a gist referenced in the Readme, which references a project maintained by SeleniumHQ.
The main trick here is that Selenium is resource hungry. I had to use the second tier of Google Compute Engine machine types (n1-standard-2, as mentioned in the question) in order for it to deploy in Kubernetes. I adapted the script I used to stand up Jenkins to deploy Selenium Grid in a similar fashion.
Also of note, there appear to be only Firefox and Chrome options in the project from SeleniumHQ. I have yet to determine if it is even possible to run an instance of Safari.
For now, this is what we're going to go with.
The piece left is how to make a call to the Selenium grid from Jenkins. It turns out that selenium can be pip-installed into ephemeral slaves, and webdriver.Remote can be used to make the call.
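For readers who want to see the shape of that call without opening the repo, here is a minimal sketch. It is not the demo script itself: the helper names grid_url and fetch_title are made up for illustration, the /wd/hub suffix is an assumption about how the grid hub is exposed, and the options= argument follows the Selenium 4 client (the 2017-era client took desired_capabilities instead).

```python
import os


def grid_url(env=os.environ):
    """Return the grid endpoint derived from the SE_GRID_SERVER parameter."""
    server = env.get("SE_GRID_SERVER", "").strip().rstrip("/")
    if not server:
        raise ValueError("SE_GRID_SERVER is not set")
    # Assumption: the hub serves WebDriver sessions under /wd/hub.
    if server.endswith("/wd/hub"):
        return server
    return server + "/wd/hub"


def fetch_title(page_url, env=os.environ):
    """Open page_url in a Chrome session on the grid and return its title."""
    # Imported lazily so grid_url() can be used without selenium installed.
    from selenium import webdriver

    driver = webdriver.Remote(
        command_executor=grid_url(env),
        options=webdriver.ChromeOptions(),  # Selenium 4 style (assumption)
    )
    try:
        driver.get(page_url)
        return driver.title
    finally:
        # Always release the browser slot on the grid, even on failure.
        driver.quit()
```

On a slave this would be run after pip install selenium, with SE_GRID_SERVER exported by the job, for example fetch_title("https://www.example.com").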
Here is the demo script that I wrote to prove that everything works:
https://github.com/jnorment-q2/demo-se-webdriver-pytest/blob/master/test/testmod.py
It has a Jenkinsfile, so it should work with a fresh instance of Jenkins. Just create a new pipeline, change definition to 'Pipeline script from SCM', Git, https://github.com/jnorment-q2/demo-se-webdriver-pytest, then scroll up and click 'run with parameters' and add the parameter SE_GRID_SERVER with the full url ( including port ) of the SE grid server.
It should run three tests and fail on the third. ( The third test requires additional parameters for TEST_URL and TEST_URL_TITLE )
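To make the parameter dependency concrete, here is a rough sketch of how a test module could read those three values and report which ones the third test is missing. It assumes the parameters reach the test process as environment variables; BuildParams, load_params and missing_for_title_test are hypothetical names, not taken from the demo repo.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class BuildParams:
    """Job parameters as seen by the test process (hypothetical container)."""
    grid_server: str
    test_url: Optional[str] = None
    test_url_title: Optional[str] = None


def load_params(env: Mapping[str, str] = os.environ) -> BuildParams:
    """Read the job parameters, treating empty strings as 'not provided'."""
    return BuildParams(
        grid_server=env.get("SE_GRID_SERVER", ""),
        test_url=env.get("TEST_URL") or None,
        test_url_title=env.get("TEST_URL_TITLE") or None,
    )


def missing_for_title_test(params: BuildParams) -> List[str]:
    """List the parameters the title-checking test still needs."""
    missing = []
    if not params.test_url:
        missing.append("TEST_URL")
    if not params.test_url_title:
        missing.append("TEST_URL_TITLE")
    return missing
```

With only SE_GRID_SERVER set, missing_for_title_test(load_params()) names both TEST_URL and TEST_URL_TITLE, which corresponds to the run where the third test fails.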
I recently opened an account with PythonAnywhere and learnt that it is an online IDE and web hosting service, but as a beginner in Python 3.4, what exactly can I do with it?
PythonAnywhere dev here,
You can use PythonAnywhere to do most of the things you can do on your own computer with Python:
start a Python interactive console (from the "Consoles" tab)
edit a python file and run it (from the "Files" tab)
The exception is that, if you want to do things with graphics, like use pygame, that won't work on PythonAnywhere. But most text-based console things will work.
You can also do some more funky things, like host a web application ("Web"), and schedule tasks to run at regular intervals ("Schedule"). If you upgrade to a premium account, you can also run "Jupyter Notebooks", which are popular in the scientific community.
If you need help with anything, drop us a line at support@pythonanywhere.com
PythonAnywhere is a cloud PaaS, which means you can focus on coding and leave the headaches of hosting, platform, and database management to PythonAnywhere. Anyone who tried to deploy a website before the cloud days can attest to how many more things developers had to worry about.
A good example to get started:
https://technovechno.com/free-website-creation-hosting-publishing-in-the-cloud-using-pythonanywhere/
I have an app using Selenium's ChromeDriver to click and retrieve a file from a website. I have decided to publish it as an Azure Job. Would I still be able to run the parts of the code that interface with ChromeDriver?
Also, I prefer not to use PhantomJS as sometimes it throws an error that the element is unclickable.
Many thanks in advance for your help.
Might not be possible on Azure App Service.
From https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#unsupported-frameworks
Unsupported Frameworks
[...]
PhantomJS/Selenium: tries to connect to local address, and also uses GDI+.
If you can convince Selenium not to bind to a socket on 127.0.0.1, and if you're not using anything that hooks into GDI+, then it MAY work, though it's a long shot.
Try Cloud Services with a Worker Role, or a VM (IaaS), instead.
I deployed a few functional tests (Phantom) in a Web Role (Cloud Services) and everything went without a hitch. In my particular case, I'm calling those tests from the build server over REST.