Selenium::WebDriver::Error with Firefox & Chrome - linux

Problems
1. Browser = Firefox (Non Geckodriver, Selenium v2.53.4)
(Works on one linux thin client but not on another...)
$ bundle exec rake parallel:spec
Selenium::WebDriver::Error::WebDriverError:
unable to bind to locking port 7054 within 120 seconds
2. Broswer = Firefox (Geckodriver v0.14.0, Selenium-webdriver v3.1.0)
$ bundle exec rake parallel:spec
Net::ReadTimeout:
Net::ReadTimeout
3. Browser = Chrome (Chromedriver v2.27, Selenium-webdriver v3.1.0)
$ bundle exec rake parallel:spec
Selenium::WebDriver::Error::NoSuchDriverError:
no such session
(Driver info: chromedriver=2.27.440175 ,platform=Linux 3.16.0-0.bpo.4-amd64 x86_64)
My Setup
Server with the following installed:
-Linux - Debian x86_64 Wheezy
-ruby 2.2.5p319 (2016-04-26 revision 54774)
-Firefox v46.0.3
-Chrome 56.0.2924.87 (64-bit)
-ChromeDriver 2.27.440175
-Xvfb (x11-xserver-utils v 7.7~3 through headless gem)
Gems
-Selenium v3.1.0 (was 2.53.4)
-parallel_tests v2.10.0
-capybara (2.7.1)
-rspec-activemodel-mocks (1.0.3)
-rspec-core (3.4.4)
-rspec-expectations (3.4.0)
-rspec-mocks (3.4.1)
-rspec-rails (3.4.2)
-rspec-support (3.4.1)
-headless (2.2.3) (Xvfb)
Multiple thin clients running their software off the mentioned server setup.
My computer is one of many...
Important: There is another computer that does NOT encounter the mentioned problem running the same software and same versions off the same server!
Things that are NOT the problem
It is not an incompatibility between my firefox browser version and Selenium.
Why not?
a) Firefox v46.0.3 and Selenium v2.53.4 is currently installed on our server and another client of this server successfully runs parallel_tests using the mentioned versions of Firefox & Selenium.
b) Which Firefox version is compatible with Selenium 2.53.0?
There are no zombie processes(still running firefox) causing firefox to lock port 7054.
This is specifically after each failure has occurred and prior to starting a new $ bundle exec rake parallel:spec run.
Why not?
refer to items 1 & 2 in 'Things I Have Tried'
Turned Out this was not the case, although it was not the cause of the problem
Databases were not always properly killed, See Update 5.
However the databases not being killed were an outcome of the problem.
They were not the cause of the problem, refer to solution section.
Side note
For those wishing to install the mentioned versions to get selenium / firefox working:
Installing a previous version doesn't fix most problems
Things I have tried
Removed any processes still running
$ killall ruby; killall rspec; killall firefox
Result: Failed...
Discovered that completing step 1 is not enough to kill all zombie processes.
After logging out into a different tty i discovered that there was still an rspec, ruby and firefox process running!
So I logged out of my user, logged into a new tty, killed all zombie processes using:
$ kill -9 process_id
I then rerun $ ps aux to ensure all processes are cleaned.
Result: Failed...
Gain insight into the problem.
Ran $ lsof -i TCP:7054 see what is holding that process.
Result: It was my firefox test, no suprise, no real insight gained.
Ensured parallel test databases were running correctly.
Dropped all databases, recreated databases, reloaded all schemas, reseeded (development), reprepared.
Result: Failed... I doubted this was the cause, but doing this certainly eliminated it.
Deleted firefox cache, all persisted setting, everything, for a clean start.
Result: Failed...
Try to eliminate any local environment variables obtained from the project.
Did this by copying the project directory from the working computer.
Then reran $ bundle exec rake parallel:spec.
Result: Failed...
Try to eliminate all local environment variables (project and linux).
Did this by creating a new linux user.
Then switched into the new user.
$ su new_user -l
Copied over the minimum zsh items needed.
Then ran $ bundle exec rake parallel:spec
Result: Failed...
Ensured that /etc/hosts contained:.
127.0.0.1 localhost
Result: Failed...
Running the tests in a single thread (not parallel).
$ rspec spec
Result: Successfully runs (does not hit the problem)
See Update 1
See Update 2
See Update 3
See Update 4
See Update 5
Partial Solution
See Update 6
Debugged Selenium & Parallel_tests gems
Result: Identified that the issue is NOT in Selenium
See Update 7
Result: Running tests in parallel worked. But why?
See Update 8
Result:
Discovered Selenium 3.1.0 changed the way files are automatically downloaded.
This caused tests to hang indefinitely whilst running parallel test.
Which caused the databases to be held open.
Things I am going to try (Updates)
Run tests with chromedriver in chrome browser and see if it passes after the fix.
Update 1
I replaced firefox for chrome.
When I run a single test, the test successfully completes with chrome.
It did so with firefox as well.
However running $ bundle exec rake parallel:spec
Result: Failed...
Selenium::WebDriver::Error::NoSuchDriverError:
no such session
(Driver info: chromedriver=2.27.440175 ,platform=Linux 3.16.0-0.bpo.4-amd64 x86_64)
Update 2
I updated selenium-webdriver gem to the latest gem (was v2.53.4 now 3.2.2)
Result: Failed...
Selenium::WebDriver::Error::NoSuchDriverError:
no such session
(Driver info: chromedriver=2.27.440175 ,platform=Linux 3.16.0-0.bpo.4-amd64 x86_64)
Update 3
Located lock file for parallel test (~.config/google-chrome).
Identified 3 persisting lock files.
Other users only had 1.
Deleted these and reran tests.
Result: Failed...
Selenium::WebDriver::Error::NoSuchDriverError:
no such session
(Driver info: chromedriver=2.27.440175 ,platform=Linux 3.16.0-0.bpo.4-amd64 x86_64)
Update 4
Upgraded selenium-webdriver to v3.1.0 (latest stable)
Upgraded parallel_tests to v2.13.0 (latest)
Installed Geckodriver v0.14.0 (latest)
Then ran $ bundle exec rake parallel:spec
Result: Failed...
Failure/Error: visit "#/login"
Net::ReadTimeout:
Net::ReadTimeout
Update 5
Whilst in the firefox (Geckodriver v0.14.0, Selenium-webdriver v3.1.0) branch.
I only realised when I had to drop all my parallel_test databases that some were still open.
#ltsp:~/ap$ bundle exec rake parallel:drop[32]
Couldn't drop ap_test_andre32 : #<ActiveRecord::StatementInvalid: PG::ObjectInUse: ERROR: database "ap_test_andre32" is being accessed by other users
DETAIL: There are 3 other sessions using the database.
: DROP DATABASE IF EXISTS "ap_test_andre32">
Couldn't drop ap_test_andre25 : #<ActiveRecord::StatementInvalid: PG::ObjectInUse: ERROR: database "ap_test_andre25" is being accessed by other users
DETAIL: There are 3 other sessions using the database.
: DROP DATABASE IF EXISTS "ap_test_andre25">
When rake parallel:spec does not complete (indefinetly hangs during ),
the process must be killed manually.
Doing so leaves databases locked to the parallel_tests that were using them at the time.
So they must be identified and cleaned up.
postgres 743 0.0 0.0 222364 33628 ? Ss 15:30 0:00 postgres: andre ap_test_andre32 [local] idle in transaction
andre 24581 0.0 0.0 7852 2028 pts/36 S+ 15:49 0:00 grep andre32
postgres 26822 0.0 0.0 220032 23400 ? Ss 15:35 0:00 postgres: andre ap_test_andre32 [local] ALTER TABLE waiting
postgres 29684 0.0 0.0 220032 24064 ? Ss 15:40 0:00 postgres: andre ap_test_andre32 [local] ALTER TABLE waiting
Update 5 Solution:
search for database processes & kill all of them
ps aux | grep test_andre
andre#ltsp:~/ap$ sudo kill -9 743 26822 29684
I was then able to drop my databases.
bundle exec rake parallel:drop[32]
Update 6
Whilst in the firefox (Geckodriver v0.14.0, Selenium-webdriver v3.1.0) branch.
Cloned parallel_tests & Selenium projects locally.
Replaced my gems with a path to the locally cloned projects.
Debugged starting with the error stack trace.
Results
Updated to selenium 3.1.0 and loaded geckodriver (marionette).
I discovered that my firefox profile was not setup correctly with Capybara.
This broke my local single thread tests.
Fixed this.
Discovered that geckodriver is not to be used for FF<48.
Also discovered that the capybara, selenium 3+ & FF48+ combo is not yet ready for use.
Some vital functions are not working. (Right clicking, window resizing...)
Refer here for full details
After investigating parallel_tests, was able to rule that out.
Continued to debug in the firefox test case.
Used the locking port error as my guide.
Ruled out Selenium as the cause of the error.
After debugging the stack trace, it was proving to be very likely that the error state was inherited.
This was just a strong hunch at the time.
It later proved to be correct...
So summary here was that firefox had processes that were being locked.
And they were not being locked by Selenium.
Update 7
Whilst in the firefox (Selenium-webdriver v2.53.4) branch.
Went back to the new linux user that was created.
In light of Update 5, I dropped cleaned up all running processes.
Dropped all databases.
$ bundle exec rake parallel:spec
Result: Parallel tests worked
But why?
The databases were not the cause of the issue.
There was something else.
Update 8
Whilst in the firefox (Geckodriver v0.14.0, Selenium-webdriver v3.1.0) branch.
Identified the reason why the tests were failing and idenfinitely hanging.
This caused the issues described in Update 5 & 6.
It was caused by a change in the way Selenium accepts firefox profile settings.
I identified that the integration tests that were failing were ones that launched a pdf download.
Previously, I had this automated so that the download modal would not appear.
Instead it would automatically download the file to a specified folder.
Updating to Selenium 3.1.0 broke this.
Tests hanged indefinitely.
Databases were held open.

The problems identified, in the updates were not the root cause.
The root cause was that firefox/chrome browser ports were not closed and held open.
After looking at htop, Polkitd was seen to be taking up 16.5gb of ram!
This was caused by a memory leak in Polkitd.
After checking the issues it was confirmed that Polkitd memory leak is a know issue.
The issue has been fixed but only in later distributions of linux debian and not for Wheezy.
After restarting Polkitd, and rerunning the tests in parallel they worked!
This explains why the first time I created a new linux user with a clean profile the parallel test issues were still occuring. - Memory leaks are unpredictable.
It also explains why another computer did not run into the issue.
And why the second time I created a new user the parallel tests worked!
Phew, that took a lot of effort!
Polkitd was uninstalled as it was not needed for any printers or other software that we run.
Overall, if anyone else has the locking issue, it would be helpful to follow some of the process detection that I have done as some of the issues are common to all OS.

Related

Packer failed when executed on Gitlab-runner

I have a packer file to deploy Centos 7 using vSphere-Iso builder that works Ok when executed directly on a linux server but when I try to run the same packer file using a gitlab-runner it fails as it does not wait until the OS is installed. It fails after waiting for 1 minute but if I run the packer command with -on-error=run-cleanup-provisioner the OS install finish succesuflly so clear the issue is that packer is just not waiting.
2021/07/20 12:02:40 packer.io plugin: [INFO] Waiting for IP, up to total timeout: 30m0s, settle timeout: 5m0s
==> vsphere-iso.autogenerated_1: Waiting for IP...
==> vsphere-iso.autogenerated_1: Clear boot order...
==> vsphere-iso.autogenerated_1: Power off VM...
==> vsphere-iso.autogenerated_1: Destroying VM...
2021/07/20 12:03:12 [INFO] (telemetry) ending
==> Wait completed after 1 minute 2 seconds
2021/07/20 12:03:12 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
My boot command is the following as I do not use DHCP.
boot_command = ["<up><tab> text inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/vmware-ks.cfg ip=10.118.12.117::10.118.12.1:255.255.255.0:{{ .Name }}.localhost:ens192:none<enter><wait>"]
I have tested using options like ssh_host, ip_wait_address, ip_settle_timeout, ssh_wait_timeout, pause_before_connecting but nothing seems to work.
As I said, the same packer pkr.hcl file works OK if run it manually on a regular linux but not on my gitlab-runner that is a runner installed directly on my Gitlab server (Yes, I know is not the best practice but I only use the runner for this task)
Packer versions 1.7.2 and 1.7.3 tested, gitlab-runner 14.0.0 and 14.0.1 tested.
Managed to make it work by changing the las wait on my boot command for wait5m. This will give the OS enough time to get installed and the VM rebooted.
New boot command boot_command = ["<up><tab> text inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/vmware-ks.cfg ip=10.118.12.117::10.118.12.1:255.255.255.0:{{ .Name }}.localhost:ens192:none<enter><wait5m>"]
All the other wait options from packer are no longer needed with this boot command.
Doing some test I managed to make it work as well by creating a Firewall drop rule for the VM just after the kickstar file was loaded and removing the FW rules once the OS was installed. Definitelly, packer is just ignoring all the wait machanism native to packer when running on the gitlab-runner
EDIT: After having the same issue with my Windows Templates y tested using a different gitlab-runner installed on a different server instead of the one in the same gitlab server and it worked perfectly with my initial contifiguration for both, windows and centos.

No browsers available, yet actions requested?

I'm getting this error while trying to capture a browser using the JsTestDriver:
java.lang.RuntimeException: No browsers available, yet actions [com.google.jstestdriver.RunTestsAction#5427ee05] requested. If running against a persistent server please capture browsers. Otherwise, ensure that browsers are defined.
at com.google.jstestdriver.browser.BrowserActionExecutorAction.run(BrowserActionExecutorAction.java:94)
at com.google.jstestdriver.ActionRunner.runActions(ActionRunner.java:81)
at com.google.jstestdriver.embedded.JsTestDriverImpl.runConfigurationWithFlags(JsTestDriverImpl.java:342)
at com.google.jstestdriver.embedded.JsTestDriverImpl.runConfiguration(JsTestDriverImpl.java:233)
at com.google.jstestdriver.Main.main(Main.java:70)
Basically, what I'm doing is the following:
Starting the JsTestDriver server with:
nohup java -jar JsTestDriver-1.3.5.jar --port 9876 > jstd.out 2> jstd.err < /dev/null &
Then I'm trying to capture a browser with:
nohup ./phantomjs phantomjs-jstd.js > phantomjs.out 2> phantomjs.err < /dev/null &
And finally I try to run the tests with:
java -jar JsTestDriver-1.3.5.jar --server http://localhost:9876 --config ../../jsTestDriver.conf --tests all
I have to say, that this is happening after I have updated the Ubuntu server I had, from 11.10 to 12.04. It could help to bring some light to the issue I'm experiencing here.
I have no idea about what's going on...
By the way, I have accessed the link http://localhost:9876 and I get this output HTML:
<html>
<head>
<title>JsTestDriver</title>
<script>
function getEl(id){return document.getElementById(id);}function toggle(id) {
if (getEl(id).style.display=='block') {getEl(id).style.display='none';} else {getEl(id).style.display='block';}}
</script>
</head>
<body>
Capture This Browser
<br/>
Capture This Browser in strict mode
<br/>
<p><strong>Captured Browsers: (0)</strong></p>
</body>
</html>
Which tells me that there is something wrong since there are no elegible browsers.
EDIT
While trying in a different machine (Fedora 21), I have successfully executed everything. The different output I had was when running the second command, the one that attempts to capture a browser. The output I got was this one:
Wed Apr 13 2016 12:46:33 GMT+0200 (CEST): Attempting (1) to load: http://localhost:9876/capture
Wed Apr 13 2016 12:46:34 GMT+0200 (CEST): Finished loading http://localhost:9876/capture with status: success
And when visiting the URL http://localhost:9876 now I see there is one Captured browser. Still no clue about what happens in the Ubuntu 12.04 server.
Well, I finally got the answer to my own question, after the whole day of trial and error. I have downloaded PhantomJS and replaced the previous binary with the new one. When running PhantomJS, the output was empty, no matter what I did with it. Even "phantomjs --version" made it exit unexpectedly.
The weird thing is that I used that 'corrupted' binary in Fedora 21, which worked as well in Ubuntu 11.10 prior to upgrading that Ubuntu to 12.04.
Problem solved!

Running protractor tests on IE 11

OK, so I have been searching a lot to get proper solution to the blocker I am facing right now. Let me give you a background of what I have done so far :
I want to run protractor tests (located on Linux machine) on IE 11 of Windows Server 2012 R2 (IP : 10.81.73.248). My protractorTest.conf.js has below :
exports.config = {
seleniumAddress: 'http://10.81.73.248:4444/wd/hub',
baseURL: 'http://10.81.78.137:80000/',
capabilities: {
browserName: 'internet explorer',
platform: 'ANY',
version: '11'
},
On my Windows Server 2012 R2 machine, I've downloaded IEDriverServer_Win32_2.47.0 and placed it under C:\Windows\System32, environment variable PATH has been updated with above location. Protected mode settings are same for all zones. Windows machine also has selenium-server-standalone-2.48.2.jar placed under C:\Users\Selenium.
On Windows machine, I am starting selenium server using below command :
java -jar selenium-server-standalone-2.48.2.jar -port 4444 -Dwebdriver.ie.driver="C:\Windows\System32\IEDriverServer_Win32_2.47.0\IEDriverServer.exe" , which starts selenium server fine.
With above settings, I run protractor tests from my Linux machine using grunt protractor_test, which launches IE browser on Windows machine, shows localhost:dynamic port and a message as : This is the initial page of webdriver server and within 2 seconds, closes the browser.
The exception I get on selenium server terminal is as below :
Session ID is null. Using WebDriver after calling quit() ?
This is where I am stuck at. I looked at various posts which describes similar issue (?) as mine along with the potential solution, but I am unable to resolve my issue here.
Is there anything I might be doing wrong to setup the connections ? or am I missing some steps to get me through ?
I would really appreciate if you guide me in resolving this long time pending blocker.
I think you are trying to run using old selenium version.It should be 2.53.x something.
Few basics things to check first regarding IE execution:
1).IE Setting for protractor(Selenium)
http://elgalu.github.io/2014/run-protractor-against-internet-explorer-vm/
2).Take IE driver of 32 bit(don't take 64 it has known slowness issues) and manually copy on the following path:
Root Folder\node_modules\protractor\node_modules\webdriver-manager\selenium\IEDriverServer_Win32_2.53.1
3). IE Driver can be downloaded from following path:
http://selenium-release.storage.googleapis.com/index.html?path=2.53/
**OR**
Please upgrade your protractor version to latest like 4.0.11 by changing the version in package.json file and do from command prompt(inside project root directory):
npm update
and then give update your selenium driver with following command from command prompt
webdriver-manager update --ie
it will update the selenium version of IE driver to latest and then try running your tests again.

Parallel-test Cucumber watir testing with phantomjs ECONREFUSED

I`m having issues with phantomjs with my parallel testing, firefox is running fine. I use parallel_tests, watir-webdriver, and Cucumber.
No connection could be made because the target machine actively refused it. - connect(2) for "127.0.0.1" port 8910 (Errno::ECONNREFUSED)
Tests are running via:
parallel_cucumber features/parallel_tests -n 3
After some debugging I figured the problem appears when first process is finished with testing, it somehow kills all phantomjs browser instances.
This is env.rb setup:
browser = Watir::Browser.new :phantomjs, args: %w(--ignore-ssl-errors=true)
Before do
#browser = browser
#browser.cookies.clear
end
at_exit do
browser.close
end
I also tried not closing browser at all, but with no luck, it is somehow done automatically.
I tried both Windows, and CentOS.
phantomjs -v
2.0.0
Using cucumber 1.3.19
Using selenium-webdriver 2.45.0
Using watir-webdriver 0.6.11
Using parallel 1.4.1
Using parallel_tests 1.3.9
I have a feeling it is a phantomjs/webdriver bug...
This is likely a race condition to port 8910 by your 3 phantom instances. Similar to this issue.
# env.rb
Before do
sleep ENV['TEST_ENV_NUMBER'].to_i
#browser = Watir::Browser.new :phantomjs, args: %w(--ignore-ssl-errors=true)
end
If I'm reading the ParallelTests source correctly, environment variable TEST_ENV_NUMBER is set to the process index for each process. So the first process has a TEST_ENV_NUMBER of 0. As long as that's the case, the Before hook above will sleep for that number of seconds before initializing Watir::Browser. That will stagger the the parallelization a bit, but it should remove the race condition.

Error on neo4j server start on arch linux

I have an arch linux setup and installed neo4j through the arch user repository (yaourt -S neo4j), and I'm able to run the web console fine (sudo neo4j console with seemingly normal output and full functionality), however when trying to start the server (sudo neo4j start), I encounter the following error message:
/usr/share/neo4j/bin/utils: line 345: [: -lt: unary operator expected
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=/etc/neo4j/neo4j-server.properties -Djava.util.logging.config.file=/etc/neo4j/logging.properties -Dlog4j.configuration=file:/etc/neo4j/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
Starting Neo4j Server...cat: /run/neo4j/neo4j-service.pid: No such file or directory
process []... waiting for server to be ready. Failed to start within 120 seconds.
Neo4j Server may have failed to start, please check the logs.
rm: cannot remove ‘/run/neo4j/neo4j-service.pid’: No such file or directory
There's no delay before the error message is printed, so it seems to be something other than the timeout. I'm quite new to neo4j (I worked through a fair bit of the user manual using the web console, but no development or server config experience), so I'm not really sure what else might be relevant. I tried looking through the utils script and the error appears to be where it attempts to su neo4j, but it also seems to proceed to attempt to start the server. I also tried changing the port it's starting on as in this question, but no change. The only log I can find just has this over and over (with appropriate timestamps):
Oct 15, 2014 1:33:49 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Any help at all would be appreciated!
EDIT:
The line 345 that it's failing on is the end of this snippet:
if [ $UID == 0 ] ; then
OPEN_FILES=`su $NEO4J_USER -c "ulimit -n"`
else
OPEN_FILES=`ulimit -n`
fi
if [ $OPEN_FILES -lt 40000 ]; then
From doing some echo debugging, it seems that su $NEO4J_USER is failing, probably because $NEO4J_USER is set to neo4j, a user that does not exist on my system. I tried setting that to root in one of the config files, but evidently that's not working properly. Arch is a continual learning experience for me, but I've not had to add a new user before to get software working.
The interesting line here is:
/usr/share/neo4j/bin/utils: line 345: [: -lt: unary operator expected
I assume that is caused by a wrong default shell for the neo4j user. What default is currently set for the neo4j system user? Try to switch that to bash. The startup scripts should work nicely with bash.

Resources