BitTorrent client: trouble downloading last few blocks from peers - python-3.x

The BitTorrent client I'm working on is almost working, except that it can't get the last few blocks from peers even though the requests have been sent. I receive no data from peers except keep-alive messages, and each peer closes its connection after sending a few keep-alives.
I know there is an end-game mode, in which the last few blocks tend to trickle in slowly. But in my case, all peers stop sending data when a few blocks are left and then close their connections one by one.
What could be causing this?
2016-07-19 15:05:27,131 - main.torrent_client - INFO - we have piece 69
2016-07-19 15:05:27,131 - main.torrent_client - INFO - downloaded: 71, total: 72
2016-07-19 15:05:27,132 - main.torrent_client - INFO - peer queue {70, 71}
2016-07-19 15:05:27,132 - main.torrent_client - INFO - 2 blocks left to request from 198.251.56.71
2016-07-19 15:05:27,132 - main.torrent_client - INFO - pop a block from piece 70
2016-07-19 15:05:27,132 - main.torrent_client - INFO - peer queue {71}
2016-07-19 15:05:27,132 - main.torrent_client - INFO - 1 blocks left to request from 198.251.56.71
2016-07-19 15:05:27,133 - main.torrent_client - INFO - pop a block from piece 71
2016-07-19 15:05:30,138 - main.torrent_client - INFO - requested {'index': 71, 'begin_offset': 0, 'request_length': 16384} from 198.251.56.71
2016-07-19 15:05:27,133 - main.pieces - INFO - Done requesting all the pieces!!!!!!!!!
2016-07-19 15:06:18,066 - main.torrent_client - INFO - just sent keep alive message to {'port': 57430, 'host': '198.251.56.71'}
2016-07-19 15:06:18,066 - main.torrent_client - INFO - just sent keep alive message to {'port': 29063, 'host': '38.76.93.8'}
2016-07-19 15:07:15,856 - main.torrent_client - DEBUG - Peer {'port': 29063, 'host': '38.76.93.8'} sent KEEP ALIVE message
2016-07-19 15:07:48,065 - main.torrent_client - INFO - just sent keep alive message to {'port': 57430, 'host': '198.251.56.71'}
2016-07-19 15:07:48,065 - main.torrent_client - INFO - just sent keep alive message to {'port': 29063, 'host': '38.76.93.8'}
2016-07-19 15:09:16,796 - main.torrent_client - DEBUG - Peer {'port': 29063, 'host': '38.76.93.8'} sent KEEP ALIVE message
2016-07-19 15:09:18,066 - main.torrent_client - INFO - just sent keep alive message to {'port': 57430, 'host': '198.251.56.71'}
2016-07-19 15:10:48,070 - main.torrent_client - INFO - just sent keep alive message to {'port': 29063, 'host': '38.76.93.8'}
2016-07-19 15:11:17,785 - main.torrent_client - DEBUG - Peer {'port': 29063, 'host': '38.76.93.8'} sent KEEP ALIVE message
2016-07-19 15:13:48,073 - main.torrent_client - DEBUG - connection closed by {'port': 57430, 'host': '198.251.56.71'}
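For reference, the end-game mode mentioned in the question is usually described as requesting every still-missing block from every peer that has it, then cancelling the duplicates once a copy arrives. The sketch below shows only that bookkeeping. The class and method names are hypothetical and are not taken from the client in the question, and whether a missing end-game step is the cause of the stall is not something the log can confirm.

```python
from collections import defaultdict


class EndGameTracker:
    """Tracks which peers were asked for each still-missing block."""

    def __init__(self, missing_blocks):
        # missing_blocks: iterable of (piece_index, begin_offset) tuples
        self.missing = set(missing_blocks)
        self.asked = defaultdict(set)  # block -> peers we sent a request to

    def requests_for(self, peer, peer_has_piece):
        """Return the blocks to request from `peer`, skipping ones already asked."""
        wanted = []
        for block in sorted(self.missing):
            piece_index, _ = block
            if peer_has_piece(piece_index) and peer not in self.asked[block]:
                self.asked[block].add(peer)
                wanted.append(block)
        return wanted

    def block_received(self, block, from_peer):
        """Mark a block as done and return the peers that should get a cancel."""
        if block not in self.missing:
            return set()  # duplicate delivery, nothing left to cancel
        self.missing.discard(block)
        return self.asked.pop(block, set()) - {from_peer}

    def peer_lost(self, peer):
        """Forget requests sent to a peer that choked us or disconnected."""
        for peers in self.asked.values():
            peers.discard(peer)
```

Calling peer_lost when a peer chokes matters because, in the base protocol, requests that are pending when a choke arrives should be treated as discarded by that peer, so those blocks have to be requested again later.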

How to parse stream of logs aggregated from multiple files with logstash?

I have logs from GitLab installed on Kubernetes. Among other pods, there is Sidekiq, which has a very peculiar log structure: it gathers multiple files that all then go to stdout (see the example at the end or the official documentation). I want to collect all these logs with Filebeat, send them to Logstash, process them in a sane way (parse the JSON, extract important data from the plain-text lines, and add information about the original file), and send the output to Elasticsearch.
However, I am struggling with how to do that. As a newcomer to Logstash, I am not sure how it works under the hood, and so far I have only come up with a grok pattern that matches the line containing the file name.
From one perspective it should be relatively easy: I just need some sort of state to mark which file is currently being processed in the log stream. But first, I am not sure whether Filebeat passes information about the stream to Logstash (which is important for distinguishing which pod the logs came from), and second, whether Logstash allows this kind of state-based processing of a log stream.
Is it possible to parse these logs and add the original file name as a field in this state-based way? Could you point me in the right direction?
filter {
  grok {
    match => {"message" => "\*\*\* %{PATH:file} \*\*\*"}
  }
  if [file] == "/var/log/gitlab/production_json.log" {
    json {
      match => { ... }
    }
  }
  else if [file] == "/var/log/gitlab/application_json.log" {
    grok {
      match => { ... }
    }
  }
}
Please note that even a single file may contain multiple types of log lines (see /var/log/gitlab/sidekiq_exporter.log below):
*** /var/log/gitlab/application.log ***
2020-11-18T10:08:28.568Z: Cannot obtain an exclusive lease for Namespace::AggregationSchedule. There must be another instance already in execution.
*** /var/log/gitlab/application_json.log ***
{"severity":"ERROR","time":"2020-11-18T10:08:28.568Z","correlation_id":"BsVuSTdkM45","message":"Cannot obtain an exclusive lease for Namespace::AggregationSchedule. There must be another instance already in execution."}
*** /var/log/gitlab/sidekiq_exporter.log ***
[2020-11-18T10:08:32.076+0000] 10.103.149.75 - - [18/Nov/2020:10:08:32 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:08:42.076+0000] 10.103.149.75 - - [18/Nov/2020:10:08:42 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:08:43.771+0000] 10.103.149.75 - - [18/Nov/2020:10:08:43 UTC] "GET /liveness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:08:52.076+0000] 10.103.149.75 - - [18/Nov/2020:10:08:52 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:02.076+0000] 10.103.149.75 - - [18/Nov/2020:10:09:02 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:12.076+0000] 10.103.149.75 - - [18/Nov/2020:10:09:12 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:22.076+0000] 10.103.149.75 - - [18/Nov/2020:10:09:22 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:32.076+0000] 10.103.149.75 - - [18/Nov/2020:10:09:32 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:42.076+0000] 10.103.149.75 - - [18/Nov/2020:10:09:42 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:43.771+0000] 10.103.149.75 - - [18/Nov/2020:10:09:43 UTC] "GET /liveness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:09:52.076+0000] 10.103.149.75 - - [18/Nov/2020:10:09:52 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:10:02.076+0000] 10.103.149.75 - - [18/Nov/2020:10:10:02 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:10:12.076+0000] 10.103.149.75 - - [18/Nov/2020:10:10:12 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
2020-11-18T10:10:15.783Z 10 TID-oslmgxbxm PagesDomainSslRenewalCronWorker JID-e4891c8d6d57d73f401da697 INFO: start
2020-11-18T10:10:15.807Z 10 TID-oslmgxbxm PagesDomainSslRenewalCronWorker JID-e4891c8d6d57d73f401da697 INFO: done: 0.024 sec
[2020-11-18T10:10:22.076+0000] 10.103.149.75 - - [18/Nov/2020:10:10:22 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:10:32.076+0000] 10.103.149.75 - - [18/Nov/2020:10:10:32 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:10:42.076+0000] 10.103.149.75 - - [18/Nov/2020:10:10:42 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:10:43.771+0000] 10.103.149.75 - - [18/Nov/2020:10:10:43 UTC] "GET /liveness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
*** /var/log/gitlab/application_json.log ***
{"severity":"ERROR","time":"2020-11-18T10:49:11.565Z","correlation_id":"H9wDObekY74","message":"Cannot obtain an exclusive lease for Ci::PipelineProcessing::AtomicProcessingService. There must be another instance already in execution."}
*** /var/log/gitlab/application.log ***
2020-11-18T10:49:11.564Z: Cannot obtain an exclusive lease for Ci::PipelineProcessing::AtomicProcessingService. There must be another instance already in execution.
2020-11-18T10:49:11.828Z 10 TID-gn2cjsz0a ProjectServiceWorker JID-ccb9b5b0f74ced684e15af75 INFO: done: 0.275 sec
2020-11-18T10:49:11.835Z 10 TID-gn2dwudy2 Namespaces::ScheduleAggregationWorker JID-7db9fe9200701bbc7dc7360c INFO: start
2020-11-18T10:49:11.844Z 10 TID-gn2dwudy2 Namespaces::ScheduleAggregationWorker JID-7db9fe9200701bbc7dc7360c INFO: done: 0.009 sec
2020-11-18T10:49:11.888Z 10 TID-oslmgxbxm ArchiveTraceWorker JID-999cc768143b644d051cfe82 INFO: done: 0.21 sec
*** /var/log/gitlab/sidekiq_exporter.log ***
[2020-11-18T10:49:12.076+0000] 10.103.149.75 - - [18/Nov/2020:10:49:12 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:49:22.076+0000] 10.103.149.75 - - [18/Nov/2020:10:49:22 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:49:32.076+0000] 10.103.149.75 - - [18/Nov/2020:10:49:32 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
[2020-11-18T10:49:42.076+0000] 10.103.149.75 - - [18/Nov/2020:10:49:42 UTC] "GET /readiness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
2020-11-18T10:49:43.216Z 10 TID-gn2cjsz0a Namespaces::RootStatisticsWorker JID-c277b38f3daa09648934d99f INFO: start
2020-11-18T10:49:43.243Z 10 TID-gn2cjsz0a Namespaces::RootStatisticsWorker JID-c277b38f3daa09648934d99f INFO: done: 0.027 sec
[2020-11-18T10:49:43.771+0000] 10.103.149.75 - - [18/Nov/2020:10:49:43 UTC] "GET /liveness HTTP/1.1" 200 15 "-" "kube-probe/1.17+"
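The state-based idea described in the question can be illustrated outside Logstash with a short Python sketch: remember the most recent "*** path ***" header and attach it to every following line. This is not a Logstash configuration, and the function and field names are hypothetical. It also assumes the lines arrive in order from a single stream, which may not hold once several pods or worker threads are involved.

```python
import json
import re

# Matches header lines such as "*** /var/log/gitlab/application.log ***".
HEADER = re.compile(r"^\*\*\* (?P<path>\S+) \*\*\*$")


def annotate(lines):
    """Yield one dict per log line, tagged with the file it came from."""
    current_file = None  # state carried over from the most recent header
    for raw in lines:
        line = raw.rstrip("\n")
        header = HEADER.match(line)
        if header:
            current_file = header.group("path")
            continue  # header lines only update the state
        event = {"file": current_file, "message": line}
        if line.startswith("{"):
            try:
                event["json"] = json.loads(line)
            except json.JSONDecodeError:
                pass  # keep the raw message if it is not valid JSON
        yield event
```

Because a single file can contain more than one line format, the sketch decides how to parse from the line itself (here, a leading brace) rather than from the file name alone.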
You can list all the log paths in filebeat.yml so that Filebeat reads the logs and sends them to Logstash.
Example filebeat.yml for GitLab:
###################### Filebeat Configuration Example #########################
#=========================== Filebeat inputs =============================
filebeat.inputs:
-
  paths:
    - /var/log/gitlab/gitlab-rails/application_json.log
  # Custom fields are given as a mapping, not a list.
  fields:
    type: gitlab-application-json
  fields_under_root: true
  encoding: utf-8
-
  paths:
    - /var/log/gitlab/sidekiq_exporter.log
  fields:
    type: gitlab-sidekiq-exporter
  fields_under_root: true
  encoding: utf-8
-
  paths:
    - /var/log/gitlab/gitlab-rails/api_json.log
  fields:
    type: gitlab-api-json
  fields_under_root: true
  encoding: utf-8
-
  paths:
    - /var/log/gitlab/gitlab-rails/application.log
  fields:
    type: gitlab-application
  fields_under_root: true
  encoding: utf-8
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.127.55.155:5066"]
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
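Now, in Logstash, you can create a separate grok pattern for each of these log types.
Here is a sample Logstash pipeline configuration (called logstash.yml here, although pipeline files are conventionally named with a .conf extension):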
input {
  beats {
    port => "5066"
  }
}
filter {
  if [type] == "gitlab-sidekiq-exporter" {
    grok {
      # Single-quoted so the pattern can contain literal double quotes;
      # the two time captures use distinct field names.
      match => { "message" => '\[%{TIMESTAMP_ISO8601:timestamp}\] %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[(?<request_time>%{MONTHDAY}/%{MONTH}/%{YEAR}\:%{TIME}) %{TZ:timezone}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}' }
      overwrite => [ "message" ]
    }
  }
}
filter {
  mutate {
    remove_tag => [
      "_grokparsefailure"
    ]
  }
}
output {
  # Filtered logs are indexed in Elasticsearch.
  elasticsearch {
    hosts => ["10.127.55.155:9200"]
    user => "elastic"
    password => "elastic"
    action => "index"
    index => "gitlab"
  }
  # Console output for debugging only; comment this out if it is not needed.
  stdout { codec => rubydebug }
}
Note: The beats input port in logstash.yml must match the port given under output.logstash in filebeat.yml.
You can extend logstash.yml with filters for application_json.log and application.log, similar to the one for sidekiq_exporter.log.
To create and validate grok patterns for filtering the logs, you can use an online Grok Debugger.
Here, I used the Grok Debugger to create a pattern for filtering sidekiq_exporter.log:
Pattern: %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}
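To see roughly which fields such a pattern pulls out of a sidekiq_exporter.log access line, here is an approximate Python regular expression for the same layout. It is a hand-written approximation rather than a translation of the grok definitions. The group name request_time and the function name are hypothetical. Unlike grok's QS, the referrer and agent groups do not include the surrounding quotes.

```python
import re

# Approximates the access-line layout seen in sidekiq_exporter.log.
ACCESS_LINE = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<request_time>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = ACCESS_LINE.match(line)
    return match.groupdict() if match else None
```

Lines in the same file that follow a different layout, such as the Sidekiq worker lines, return None here. That is the analogue of a _grokparsefailure tag and a reminder that those lines need their own pattern.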

Elastic beanstalk + socket.io sticky sessions

I am having a weird issue with socket.io on Elastic Beanstalk using the Application Load Balancer and connecting from Node (server-side). Currently I have two nodes, each behind its own nginx, and all behind an Application Load Balancer configured with sticky sessions.
The issue is that the upgrade from long polling to WebSocket works fine in the browser but fails from Node. The only way I can connect from Node is to manually set transports: ["websocket"], which is undesirable. Below are the logs for connecting to the API from Node with this code and DEBUG=*:
const clientSocket = io(URL);
Looking at the nginx access logs, nothing seems out of the ordinary regarding the sessions. Here's a snippet. As you can see, everything from my-ip is consistently routed to the same node:
172.31.8.41 - - [28/Feb/2019:19:51:42 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:51:42 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargPwH&b64=1&sid=7k33d9491Y8GWscOAABp HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:51:42 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=7k33d9491Y8GWscOAABp HTTP/1.1" 400 18 "-" "-" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:51:44 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargQGg&b64=1&sid=hy5Trvc1rOCOV-BrAABq HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:51:44 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=hy5Trvc1rOCOV-BrAABq HTTP/1.1" 400 18 "-" "-" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:51:44 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargI-p&b64=1&sid=4ZsFjp8X8MDznyr9AABE HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:51:45 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargQfk&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:51:51 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:51:57 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:52:06 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:52:11 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargW-E&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:52:12 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:52:15 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargQhr&b64=1&sid=uxJMGa3egW9Wb2jhAABG HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:52:21 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:52:27 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:52:36 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:52:36 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargX0N&b64=1&sid=bBINNyBJvvX5cP0RAABH HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:52:36 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MargdA9&b64=1&sid=bBINNyBJvvX5cP0RAABH HTTP/1.1" 200 2 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:52:37 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargdR3&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:52:38 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=gfTQu0aAJF6Z6Q5gAABI HTTP/1.1" 101 0 "-" "-" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:52:39 +0000] "GET /socket.io/?EIO=3&transport=polling&t=Margdm-&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:52:42 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:52:51 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:52:57 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:53:04 +0000] "GET /socket.io/?EIO=3&transport=polling&t=Margdpg&b64=1&sid=3nZgirr9cXOZNj_XAABJ HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:04 +0000] "POST /socket.io/?EIO=3&transport=polling&t=Margjz2&b64=1&sid=3nZgirr9cXOZNj_XAABJ HTTP/1.1" 200 2 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:05 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargkGk&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:06 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:53:12 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:53:21 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:53:27 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:53:31 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargkJN&b64=1&sid=rKuyOzJrHA7Fe41JAABK HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:31 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MargqSU&b64=1&sid=rKuyOzJrHA7Fe41JAABK HTTP/1.1" 200 2 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:32 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargqnP&b64=1&sid=vlqaqHh9VJNz5gXjAABr HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:33 +0000] "POST /socket.io/?EIO=3&transport=polling&t=Margqpb&b64=1&sid=vlqaqHh9VJNz5gXjAABr HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:34 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargrBM&b64=1&sid=oJJ-VBCV30wZRG_4AABs HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:34 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MargrDy&b64=1&sid=oJJ-VBCV30wZRG_4AABs HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:36 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MargrdM&b64=1&sid=UR3CqEnDl_-nFC5pAABt HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:36 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MargrfZ&b64=1&sid=UR3CqEnDl_-nFC5pAABt HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:53:36 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:53:37 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=04idFIicBmnKXJVyAABu HTTP/1.1" 400 18 "-" "-" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:53:42 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:53:51 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:53:57 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:54:06 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:54:12 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:54:21 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:54:27 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:54:27 +0000] "POST /socket.io/?EIO=3&transport=polling&t=Marh2FA&b64=1&sid=04idFIicBmnKXJVyAABu HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:27 +0000] "POST /socket.io/?EIO=3&transport=polling&t=Marh2Hf&b64=1&sid=04idFIicBmnKXJVyAABu HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:28 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=LE3bgCzvK0VAtXL_AABv HTTP/1.1" 400 18 "-" "-" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:54:36 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:54:42 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:54:51 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:54:54 +0000] "GET /socket.io/?EIO=3&transport=polling&t=Marh8em&b64=1&sid=LE3bgCzvK0VAtXL_AABv HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:54 +0000] "POST /socket.io/?EIO=3&transport=polling&t=Marh8hg&b64=1&sid=LE3bgCzvK0VAtXL_AABv HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:55 +0000] "GET /socket.io/?EIO=3&transport=polling&t=Marh928&b64=1&sid=F_VPIilCbjKlcTAfAABw HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:55 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=F_VPIilCbjKlcTAfAABw HTTP/1.1" 400 18 "-" "-" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:56 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=w2fGwjpj8ElojoJTAABx HTTP/1.1" 400 18 "-" "-" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:54:57 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:55:06 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:55:12 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:55:21 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MarhFRZ&b64=1&sid=w2fGwjpj8ElojoJTAABx HTTP/1.1" 400 52 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:55:21 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:55:22 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MarhFfr&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:55:27 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:55:36 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:55:42 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:55:47 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MarhFiU&b64=1&sid=RmuQkr9UirrYp6NGAABL HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:55:47 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MarhLpU&b64=1&sid=RmuQkr9UirrYp6NGAABL HTTP/1.1" 200 2 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:55:48 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MarhM45&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:55:49 +0000] "GET /socket.io/?EIO=3&transport=websocket&sid=fTWuT3g-SQ6Y-7DKAABM HTTP/1.1" 101 0 "-" "-" "my-ip"
172.31.8.41 - - [28/Feb/2019:19:55:50 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MarhMTu&b64=1 HTTP/1.1" 200 103 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:55:52 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:55:57 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.28.138 - - [28/Feb/2019:19:56:07 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:56:12 +0000] "GET / HTTP/1.1" 200 5 "-" "ELB-HealthChecker/2.0" "-"
172.31.8.41 - - [28/Feb/2019:19:56:16 +0000] "GET /socket.io/?EIO=3&transport=polling&t=MarhMWX&b64=1&sid=4UCaq5tvWPrIxr-RAABN HTTP/1.1" 200 3 "-" "node-XMLHttpRequest" "my-ip"
172.31.28.138 - - [28/Feb/2019:19:56:16 +0000] "POST /socket.io/?EIO=3&transport=polling&t=MarhSg5&b64=1&sid=4UCaq5tvWPrIxr-RAABN HTTP/1.1" 200 2 "-" "node-XMLHttpRequest" "my-ip"
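One way to read an access log like the one above is to group the requests by their sid query parameter and look at which sessions ever received a 4xx response. The Python sketch below does that. The function names are hypothetical, and it assumes the first column is nginx's remote address (behind a load balancer, that is typically the balancer's address rather than the client's).

```python
import re
from collections import defaultdict

# First column, request method, URL and status of a combined-style log line.
LINE = re.compile(
    r'^(?P<remote>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
)
SID = re.compile(r"[?&]sid=([^&\s]+)")


def statuses_by_sid(lines):
    """Map each sid to the (remote, method, status) tuples seen for it."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        sid = SID.search(match.group("url"))
        if sid:  # health checks and handshake requests carry no sid
            seen[sid.group(1)].append(
                (match.group("remote"), match.group("method"), int(match.group("status")))
            )
    return dict(seen)


def failed_sids(lines):
    """Return the sids that received at least one 4xx or 5xx response."""
    return sorted(
        sid
        for sid, hits in statuses_by_sid(lines).items()
        if any(status >= 400 for _, _, status in hits)
    )
```

Running the same grouping over the access log of each instance and comparing the results shows whether a sid that gets a 400 on one instance was handed out by the other. That pattern would be consistent with requests for one session reaching both instances, although the log alone does not prove it.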
socket.io logs from node:
julian@wilson:~/project/app-server$ tsc && node dist/src/order_spammer.js
socket.io-client:url parse https://env.api.app.bet +0ms
socket.io-client new io instance for https://env.api.app.bet +0ms
socket.io-client:manager readyState closed +0ms
socket.io-client:manager opening https://env.api.app.bet +1ms
engine.io-client:socket creating transport "polling" +0ms
engine.io-client:polling polling +0ms
engine.io-client:polling-xhr xhr poll +0ms
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard8kg&b64=1 +1ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket setting transport polling +12ms
socket.io-client:manager connect attempt will timeout after 20000 +13ms
socket.io-client:manager readyState opening +1ms
engine.io-client:polling polling got data 96:0{"sid":"xxj_pihkgmI4gJEhAABB","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40 +174ms
engine.io-client:socket socket receive: type "open", data "{"sid":"xxj_pihkgmI4gJEhAABB","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}" +164ms
engine.io-client:socket socket open +0ms
socket.io-client:manager open +162ms
socket.io-client:manager cleanup +0ms
socket.io-client:socket transport is open - connecting +0ms
engine.io-client:socket starting upgrade probes +1ms
engine.io-client:socket probing transport "websocket" +0ms
engine.io-client:socket creating transport "websocket" +0ms
engine.io-client:socket socket receive: type "message", data "0" +4ms
socket.io-parser decoded 0 as {"type":0,"nsp":"/"} +0ms
engine.io-client:polling polling +7ms
engine.io-client:polling-xhr xhr poll +180ms
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard8nV&b64=1&sid=xxj_pihkgmI4gJEhAABB +0ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket socket error {"type":"TransportError","description":400} +171ms
socket.io-client:manager error { Error: xhr poll error
at XHR.Transport.onError (/home/julian/project/app-server/node_modules/engine.io-client/lib/transport.js:64:13)
at Request.<anonymous> (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:128:10)
at Request.Emitter.emit (/home/julian/project/app-server/node_modules/component-emitter/index.js:133:20)
at Request.onError (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:309:8)
at Timeout._onTimeout (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:256:18)
at ontimeout (timers.js:427:11)
at tryOnTimeout (timers.js:289:5)
at listOnTimeout (timers.js:252:5)
at Timer.processTimers (timers.js:212:10) type: 'TransportError', description: 400 } +177ms
engine.io-client:socket socket close with reason: "transport error" +4ms
engine.io-client:polling transport open - closing +174ms
engine.io-client:polling writing close packet +0ms
engine.io-client:polling-xhr xhr open POST: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard8qD&b64=1&sid=xxj_pihkgmI4gJEhAABB +175ms
engine.io-client:polling-xhr xhr data 1:1 +0ms
socket.io-client:manager onclose +4ms
socket.io-client:manager cleanup +0ms
socket.io-client:socket close (transport error) +181ms
socket.io-client:manager will wait 661ms before reconnect attempt +1ms
engine.io-client:socket probe transport "websocket" failed because of error: socket closed +3ms
socket.io-client:socket emitting packet with ack id 0 +425ms
socket.io-client:manager attempting reconnect +661ms
socket.io-client:manager readyState closed +0ms
socket.io-client:manager opening https://env.api.app.bet +0ms
engine.io-client:socket creating transport "polling" +661ms
engine.io-client:polling polling +664ms
engine.io-client:polling-xhr xhr poll +663ms
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard8-b&b64=1 +1ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket setting transport polling +3ms
socket.io-client:manager connect attempt will timeout after 20000 +4ms
engine.io-client:polling polling got data 96:0{"sid":"HADNz_FR5SkW-zAdAAA0","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40 +140ms
engine.io-client:socket socket receive: type "open", data "{"sid":"HADNz_FR5SkW-zAdAAA0","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}" +137ms
engine.io-client:socket socket open +0ms
socket.io-client:manager open +137ms
socket.io-client:manager cleanup +1ms
socket.io-client:socket transport is open - connecting +378ms
socket.io-client:manager reconnect success +0ms
engine.io-client:socket starting upgrade probes +1ms
engine.io-client:socket probing transport "websocket" +0ms
engine.io-client:socket creating transport "websocket" +0ms
engine.io-client:socket socket receive: type "message", data "0" +3ms
socket.io-parser decoded 0 as {"type":0,"nsp":"/"} +983ms
socket.io-client:manager writing packet {"type":2,"data":["active_markets",null],"options":{"compress":true},"id":0,"nsp":"/"} +3ms
socket.io-parser encoding packet {"type":2,"data":["active_markets",null],"options":{"compress":true},"id":0,"nsp":"/"} +1ms
socket.io-parser encoded {"type":2,"data":["active_markets",null],"options":{"compress":true},"id":0,"nsp":"/"} as 20["active_markets",null] +0ms
engine.io-client:socket flushing 1 packets in socket +1ms
engine.io-client:polling-xhr xhr open POST: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard90s&b64=1&sid=HADNz_FR5SkW-zAdAAA0 +145ms
engine.io-client:polling-xhr xhr data 26:420["active_markets",null] +0ms
engine.io-client:polling polling +7ms
engine.io-client:polling-xhr xhr poll +1ms
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard90u&b64=1&sid=HADNz_FR5SkW-zAdAAA0 +1ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket probe transport "websocket" failed because of error: Error: websocket error +135ms
engine.io-client:socket socket error {"type":"TransportError","description":400} +5ms
socket.io-client:manager error { Error: xhr post error
at XHR.Transport.onError (/home/julian/project/app-server/node_modules/engine.io-client/lib/transport.js:64:13)
at Request.<anonymous> (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:109:10)
at Request.Emitter.emit (/home/julian/project/app-server/node_modules/component-emitter/index.js:133:20)
at Request.onError (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:309:8)
at Timeout._onTimeout (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:256:18)
at ontimeout (timers.js:427:11)
at tryOnTimeout (timers.js:289:5)
at listOnTimeout (timers.js:252:5)
at Timer.processTimers (timers.js:212:10) type: 'TransportError', description: 400 } +141ms
engine.io-client:socket socket close with reason: "transport error" +1ms
engine.io-client:polling transport open - closing +139ms
engine.io-client:polling writing close packet +0ms
engine.io-client:polling-xhr xhr open POST: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard933&b64=1&sid=HADNz_FR5SkW-zAdAAA0 +138ms
engine.io-client:polling-xhr xhr data 1:1 +0ms
socket.io-client:manager onclose +2ms
socket.io-client:manager cleanup +0ms
socket.io-client:socket close (transport error) +146ms
socket.io-client:manager will wait 760ms before reconnect attempt +1ms
engine.io-client:polling polling got data 1:6 +168ms
socket.io-client:manager attempting reconnect +760ms
socket.io-client:manager readyState closed +0ms
socket.io-client:manager opening https://env.api.app.bet +0ms
engine.io-client:socket creating transport "polling" +762ms
engine.io-client:polling polling +594ms
engine.io-client:polling-xhr xhr poll +763ms
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard9E-&b64=1 +0ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket setting transport polling +2ms
socket.io-client:manager connect attempt will timeout after 20000 +2ms
engine.io-client:polling polling got data 96:0{"sid":"WKWlwvuSqRP4z_kbAABC","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40 +197ms
engine.io-client:socket socket receive: type "open", data "{"sid":"WKWlwvuSqRP4z_kbAABC","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}" +195ms
engine.io-client:socket socket open +0ms
socket.io-client:manager open +195ms
socket.io-client:manager cleanup +0ms
socket.io-client:socket transport is open - connecting +958ms
socket.io-client:manager reconnect success +0ms
engine.io-client:socket starting upgrade probes +0ms
engine.io-client:socket probing transport "websocket" +0ms
engine.io-client:socket creating transport "websocket" +0ms
engine.io-client:socket socket receive: type "message", data "0" +1ms
socket.io-parser decoded 0 as {"type":0,"nsp":"/"} +1s
engine.io-client:polling polling +2ms
engine.io-client:polling-xhr xhr poll +198ms
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard9I4&b64=1&sid=WKWlwvuSqRP4z_kbAABC +0ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket socket error {"type":"TransportError","description":400} +135ms
socket.io-client:manager error { Error: xhr poll error
at XHR.Transport.onError (/home/julian/project/app-server/node_modules/engine.io-client/lib/transport.js:64:13)
at Request.<anonymous> (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:128:10)
at Request.Emitter.emit (/home/julian/project/app-server/node_modules/component-emitter/index.js:133:20)
at Request.onError (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:309:8)
at Timeout._onTimeout (/home/julian/project/app-server/node_modules/engine.io-client/lib/transports/polling-xhr.js:256:18)
at ontimeout (timers.js:427:11)
at tryOnTimeout (timers.js:289:5)
at listOnTimeout (timers.js:252:5)
at Timer.processTimers (timers.js:212:10) type: 'TransportError', description: 400 } +136ms
engine.io-client:socket socket close with reason: "transport error" +1ms
engine.io-client:polling transport open - closing +135ms
engine.io-client:polling writing close packet +0ms
engine.io-client:polling-xhr xhr open POST: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard9KB&b64=1&sid=WKWlwvuSqRP4z_kbAABC +135ms
engine.io-client:polling-xhr xhr data 1:1 +0ms
socket.io-client:manager onclose +3ms
socket.io-client:manager cleanup +0ms
socket.io-client:socket close (transport error) +139ms
socket.io-client:manager will wait 1076ms before reconnect attempt +0ms
engine.io-client:socket probe transport "websocket" failed because of error: socket closed +2ms
socket.io-client:manager attempting reconnect +1s
socket.io-client:manager readyState closed +0ms
socket.io-client:manager opening https://env.api.app.bet +0ms
engine.io-client:socket creating transport "polling" +1s
engine.io-client:polling polling +1s
engine.io-client:polling-xhr xhr poll +1s
engine.io-client:polling-xhr xhr open GET: https://env.api.app.bet/socket.io/?EIO=3&transport=polling&t=Mard9b3&b64=1 +0ms
engine.io-client:polling-xhr xhr data null +0ms
engine.io-client:socket setting transport polling +2ms
socket.io-client:manager connect attempt will timeout after 20000 +2ms
Any ideas how I can debug this?

Intermittent connectionTimeout errors in spark streaming job

I have a Spark (2.1) streaming job that writes processed data to Azure Blob Storage every batch (the batch interval is 1 min). Every now and then (once every couple of hours) I get a 'java.net.ConnectException' with a connection timeout message. The operation gets retried and eventually succeeds, but when the error occurs it delays the 1 min streaming batch, which then takes 2 to 3 min to finish.
Below is the executor log snippet with error message. I have spark.executor.cores=5.
Is there some kind of limit on the number of connections that might be causing this?
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Starting operation.}
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Starting operation with location 'PRIMARY' per location mode 'PRIMARY_ONLY'.}
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Starting request to 'http://<name>.blob.core.windows.net/rawData/2017/10/11/16/data1.json' at 'Wed, 11 Oct 2017 16:09:02 GMT'.}
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Waiting for response.}
..
17/10/11 16:11:09 WARN root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Retryable exception thrown. Class = 'java.net.ConnectException', Message = 'Connection timed out (Connection timed out)'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Checking if the operation should be retried. Retry count = '0', HTTP status code = '-1', Error Message = 'An unknown failure occurred : Connection timed out (Connection timed out)'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {The next location has been set to 'PRIMARY', per location mode 'PRIMARY_ONLY'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {The retry policy set the next location to 'PRIMARY' and updated the location mode to 'PRIMARY_ONLY'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Operation will be retried after '0'ms.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Retrying failed operation.}

Spark job cannot acquire resources from Mesos cluster

I am using Spark Job Server (SJS) to create context and submit jobs.
My cluster includes 4 servers.
master1: 10.197.0.3
master2: 10.197.0.4
master3: 10.197.0.5
master4: 10.197.0.6
But only master1 has a public IP.
First of all I set up ZooKeeper on master1, master2 and master3, with ZooKeeper ids 1 to 3.
I intend to use master1, master2 and master3 as the masters of the cluster.
That means I set quorum=2 for the 3 masters.
The zk connect string is zk://master1:2181,master2:2181,master3:2181/mesos
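As a quick check of the numbers above, assuming the quorum is meant to be a strict majority of the masters, 3 masters give a quorum of 2. The helper names `majorityQuorum` and `zkUrl` below are hypothetical and only illustrate the arithmetic and the shape of the connect string; they are not part of Mesos or ZooKeeper.

```javascript
// Hypothetical helper: strict majority of n masters, floor(n / 2) + 1.
function majorityQuorum(masterCount) {
  if (!Number.isInteger(masterCount) || masterCount < 1) {
    throw new RangeError('masterCount must be a positive integer');
  }
  return Math.floor(masterCount / 2) + 1;
}

// Hypothetical helper: build a zk:// URL from host names, a port and a znode path.
function zkUrl(hosts, port, path) {
  return 'zk://' + hosts.map((host) => host + ':' + port).join(',') + path;
}

const masters = ['master1', 'master2', 'master3'];
console.log(majorityQuorum(masters.length)); // 2
console.log(zkUrl(masters, 2181, '/mesos'));
// zk://master1:2181,master2:2181,master3:2181/mesos
```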
On each server I also start mesos-slave, so I have 4 slaves and 3 masters.
As you can see, all slaves are connected.
But the funny thing is that when I create a job to run, it cannot acquire any resources.
From the logs I saw that it keeps sending DECLINE for the offers. These are the logs from the master:
I0523 15:01:00.116981 32513 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4264 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4#10.197.0.3:28765
I0523 15:01:00.117086 32513 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4265 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4#10.197.0.3:28765
I0523 15:01:01.460502 32508 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (914)#127.0.0.1:5050
I0523 15:01:02.117753 32510 master.cpp:5324] Sending 1 offers to framework dc18c89f-d802-404b-9221-71f0f15b096f-0000 (sql_context) at scheduler-9b4637cf-4b27-4629-9a73-6019443ed30b#10.197.0.3:28765
I0523 15:01:02.118099 32510 master.cpp:5324] Sending 1 offers to framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4#10.197.0.3:28765
I0523 15:01:02.119299 32508 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4266 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0000 (sql_context) at scheduler-9b4637cf-4b27-4629-9a73-6019443ed30b#10.197.0.3:28765
I0523 15:01:02.119858 32515 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4267 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4#10.197.0.3:28765
I0523 15:01:02.900946 32509 http.cpp:312] HTTP GET for /master/state from 10.197.0.3:35778 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36' with X-Forwarded-For='113.161.38.181'
I0523 15:01:03.118147 32514 master.cpp:5324] Sending 1 offers to framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4#10.197.0.3:28765
On one of my slaves I see:
W0523 14:53:15.487599 32681 status_update_manager.cpp:475] Resending status update TASK_FAILED (UUID: 3c3a022c-2032-4da1-bbab-c367d46e07de) for task driver-20160523111535-0003 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0019
W0523 14:53:15.487773 32681 status_update_manager.cpp:475] Resending status update TASK_FAILED (UUID: cfb494b3-6484-4394-bd94-80abf2e11ee8) for task driver-20160523112724-0001 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0020
I0523 14:53:15.487820 32680 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 3c3a022c-2032-4da1-bbab-c367d46e07de) for task driver-20160523111535-0003 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0019 to master#10.197.0.3:5050
I0523 14:53:15.488008 32680 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: cfb494b3-6484-4394-bd94-80abf2e11ee8) for task driver-20160523112724-0001 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0020 to master#10.197.0.3:5050
I0523 15:02:24.120436 32680 http.cpp:190] HTTP GET for /slave(1)/state from 113.161.38.181:63097 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
W0523 15:02:24.165690 32685 slave.cpp:4979] Failed to get resource statistics for executor 'driver-20160523111535-0003' of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0019: Container 'cac7667c-3309-4380-9f95-07d9b888e44e' not found
W0523 15:02:24.165771 32685 slave.cpp:4979] Failed to get resource statistics for executor 'driver-20160523112724-0001' of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0020: Container '9c661311-bf7f-4ea6-9348-ce8c7f6cfbcb' not found
From SJS Logs
[2016-05-23 15:04:10,305] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4565 with attributes: Map() mem: 63403.0 cpu: 8
[2016-05-23 15:04:10,305] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4566 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:10,305] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4567 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:10,366] WARN cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[2016-05-23 15:04:10,505] DEBUG cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - parentName: , name: TaskSet_0, runningTasks: 0
[2016-05-23 15:04:11,306] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4568 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:11,306] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4569 with attributes: Map() mem: 63403.0 cpu: 8
[2016-05-23 15:04:11,505] DEBUG cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - parentName: , name: TaskSet_0, runningTasks: 0
[2016-05-23 15:04:12,308] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4570 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:12,505] DEBUG cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - parentName: , name: TaskSet_0, runningTasks: 0
In master2 logs
May 23 08:19:44 ants-vps mesos-master[1866]: E0523 08:19:44.273349 1902 process.cpp:1958] Failed to shutdown socket with fd 28: Transport endpoint is not connected
May 23 08:19:54 ants-vps mesos-master[1866]: I0523 08:19:54.274245 1899 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (1257)#127.0.0.1:5050
May 23 08:19:54 ants-vps mesos-master[1866]: E0523 08:19:54.274533 1902 process.cpp:1958] Failed to shutdown socket with fd 28: Transport endpoint is not connected
May 23 08:20:04 ants-vps mesos-master[1866]: I0523 08:20:04.275291 1897 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (1260)#127.0.0.1:5050
May 23 08:20:04 ants-vps mesos-master[1866]: E0523 08:20:04.275512 1902 process.cpp:1958] Failed to shutdown socket with fd 28: Transport endpoint is not connected
From master3:
May 23 08:21:05 ants-vps mesos-master[22023]: I0523 08:21:05.994082 22042 recover.cpp:193] Received a recover response from a replica in EMPTY status
May 23 08:21:15 ants-vps mesos-master[22023]: I0523 08:21:15.994051 22043 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
May 23 08:21:15 ants-vps mesos-master[22023]: I0523 08:21:15.994529 22036 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (1282)#127.0.0.1:5050
How can I find the cause of these issues and fix them?

Selenium Webdriver requires restarts to function consistently

My testing stack consists of the latest version of Selenium Server (2.33.0, aka selenium-server-standalone-2.33.0.jar), Mocha, Node.js, and PhantomJS.
My question regards the following code:
var webdriver = require('../../../lib/selenium/node_modules/selenium-webdriver/'),
    driver = new webdriver.Builder().
        withCapabilities({'browserName': 'phantomjs'}).
        build();
driver.manage().timeouts().implicitlyWait(15000);
describe('Wordpress', function() {
    it('should be able to log in', function(done) {
        driver.get('http://#### REDACTED ####/wp-login.php');
        driver.findElement(webdriver.By.css('#user_login')).sendKeys('#### REDACTED ####');
        driver.findElement(webdriver.By.css('#user_pass')).sendKeys('#### REDACTED ####');
        driver.findElement(webdriver.By.css('#wp-submit')).click();
        // #wpwrap is an element on the Wordpress dashboard that is displayed once
        // the user is logged in. By testing for its presence, we can determine
        // if the login attempt succeeded.
        driver.findElement(webdriver.By.css('#wpwrap')).then(function(v) {
            done();
        });
    });
});
On my local system, OS X, the test runs fine consistently. However, once the test is uploaded to our CentOS server (where we hope to do Continuous Integration testing), the test behaves extremely strangely.
After Selenium Server is started, the test runs successfully once. From that point on, the test succeeds only about one time in ten. Restarting Selenium Server guarantees that the next run succeeds. In fact, if Selenium Server is restarted before every run, the test succeeds every time.
How can I get this test to succeed without restarting Selenium Server every time?
Thank you so much for your help! :)
UPDATE: In addition to the error log below, I'm also occasionally getting the following error:
Exception in thread "Thread-21" java.lang.OutOfMemoryError: unable to create new native thread
Details on the error messages follow below:
A successful test yields the following output from Mocha:
[s5rich#host ~]$ mocha test/selenium/acceptance/simple.js
Wordpress
✓ should be able to log in (2604ms)
1 passing (3 seconds)
A successful test also yields the following output from Selenium Server:
23:21:50.517 INFO - Executing: [new session: {browserName=phantomjs}] at URL: /session)
23:21:50.527 INFO - Creating a new session for Capabilities [{browserName=phantomjs}]
23:21:50.547 INFO - executable: /usr/local/bin/phantomjs
23:21:50.547 INFO - port: 26515
23:21:50.547 INFO - arguments: [--webdriver=26515, --webdriver-logfile=/home/s5rich/phantomjsdriver.log]
23:21:50.547 INFO - environment: {}
PhantomJS is launching GhostDriver...
[INFO - 2013-07-24T05:21:50.923Z] GhostDriver - Main - running on port 26515
[INFO - 2013-07-24T05:21:51.435Z] Session [f235d040-f420-11e2-8d90-f50327bc3449] - CONSTRUCTOR - Desired Capabilities: {"browserName":"phantomjs"}
[INFO - 2013-07-24T05:21:51.435Z] Session [f235d040-f420-11e2-8d90-f50327bc3449] - CONSTRUCTOR - Negotiated Capabilities: {"browserName":"phantomjs","version":"1.9.1","driverName":"ghostdriver","driverVersion":"1.0.3","platform":"linux-unknown-32bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
[INFO - 2013-07-24T05:21:51.435Z] SessionManagerReqHand - _postNewSessionCommand - New Session Created: f235d040-f420-11e2-8d90-f50327bc3449
23:21:51.495 INFO - Done: /session
23:21:51.504 INFO - Executing: org.openqa.selenium.remote.server.handler.GetSessionCapabilities#46b78d at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2)
23:21:51.505 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2
23:21:51.520 INFO - Executing: [get: http://#### REDACTED ####/wp-login.php] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/url)
23:21:51.821 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/url
23:21:51.827 INFO - Executing: [find element: By.selector: #user_login] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element)
23:21:51.874 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element
23:21:51.883 INFO - Executing: [send keys: 0 org.openqa.selenium.support.events.EventFiringWebDriver$EventFiringWebElement#20788bf8, [#### REDACTED ####]] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element/0/value)
23:21:51.939 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element/0/value
23:21:51.948 INFO - Executing: [find element: By.selector: #user_pass] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element)
23:21:51.965 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element
23:21:52.001 INFO - Executing: [send keys: 1 org.openqa.selenium.support.events.EventFiringWebDriver$EventFiringWebElement#20788bf9, [#### REDACTED ####]] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element/1/value)
23:21:52.065 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element/1/value
23:21:52.074 INFO - Executing: [find element: By.selector: #wp-submit] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element)
23:21:52.099 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element
23:21:52.106 INFO - Executing: [click: 2 org.openqa.selenium.support.events.EventFiringWebDriver$EventFiringWebElement#20788bfa] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element/2/click)
23:21:52.842 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element/2/click
23:21:52.850 INFO - Executing: [find element: By.selector: #wpwrap] at URL: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element)
23:21:52.871 INFO - Done: /session/8750a6b4-ec7d-4313-a86d-04ac344b74f2/element
A failed test yields the following output from Mocha:
[s5rich#host ~]$ mocha test/selenium/acceptance/simple.js
Wordpress
1) should be able to log in
0 passing (2 seconds)
1 failing
1) Wordpress should be able to log in:
Uncaught UnknownError: Error Message => 'Unable to find element with css selector '#wpwrap''
caused by Request => {"headers":{"Accept":"application/json, image/png","Connection":"Keep-Alive","Content-Length":"42","Content-Type":"application/json; charset=utf-8","Host":"localhost:2897"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"css selector\",\"value\":\"#wpwrap\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/77a0a110-f421-11e2-a6fd-61cd002d7d02/element"}
Build info: version: '2.33.0', revision: '4e90c97', time: '2013-05-22 15:32:38'
System info: os.name: 'Linux', os.arch: 'i386', os.version: '2.6.32-042stab076.8', java.version: '1.7.0'
Driver info: driver.version: unknown
at new bot.Error (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/atoms/error.js:108:18)
at Object.bot.response.checkResponse (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/atoms/response.js:106:9)
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/webdriver.js:262:20
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/goog/base.js:1112:15
at webdriver.promise.ControlFlow.runInNewFrame_ (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:1431:20)
at notify (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:315:12)
at notifyAll (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:284:7)
at fulfill (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:389:7)
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:1298:10
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/goog/base.js:1112:15
at webdriver.promise.ControlFlow.runInNewFrame_ (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:1431:20)
at notify (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:315:12)
at notifyAll (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:284:7)
at fulfill (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:389:7)
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/goog/base.js:1112:15
at webdriver.promise.ControlFlow.runInNewFrame_ (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:1431:20)
at notify (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:315:12)
at notifyAll (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:284:7)
at fulfill (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:389:7)
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/promise.js:600:51
at /home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/http/http.js:96:5
at IncomingMessage.<anonymous> (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/http/index.js:113:7)
at IncomingMessage.EventEmitter.emit (events.js:117:20)
at _stream_readable.js:910:16
at process._tickCallback (node.js:415:13)
==== async task ====
WebDriver.findElement(By.cssSelector("#wpwrap"))
at webdriver.WebDriver.schedule (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/webdriver.js:246:15)
at webdriver.WebDriver.findElement (/home/s5rich/lib/selenium/node_modules_osx/selenium-webdriver/lib/webdriver/webdriver.js:685:17)
at Context.<anonymous> (/home/s5rich/test/selenium/acceptance/simple.js:12:10)
at Test.Runnable.run (/usr/local/lib/node_modules/mocha/lib/runnable.js:194:15)
at Runner.runTest (/usr/local/lib/node_modules/mocha/lib/runner.js:355:10)
at /usr/local/lib/node_modules/mocha/lib/runner.js:401:12
at next (/usr/local/lib/node_modules/mocha/lib/runner.js:281:14)
at /usr/local/lib/node_modules/mocha/lib/runner.js:290:7
at next (/usr/local/lib/node_modules/mocha/lib/runner.js:234:23)
at Object._onImmediate (/usr/local/lib/node_modules/mocha/lib/runner.js:258:5)
at processImmediate [as _immediateCallback] (timers.js:330:15)
A failed test also yields the following output from Selenium Server:
23:25:34.742 INFO - Executing: [new session: {browserName=phantomjs}] at URL: /session)
23:25:34.743 INFO - Creating a new session for Capabilities [{browserName=phantomjs}]
23:25:34.744 INFO - executable: /usr/local/bin/phantomjs
23:25:34.744 INFO - port: 2897
23:25:34.744 INFO - arguments: [--webdriver=2897, --webdriver-logfile=/home/s5rich/phantomjsdriver.log]
23:25:34.744 INFO - environment: {}
PhantomJS is launching GhostDriver...
[INFO - 2013-07-24T05:25:34.879Z] GhostDriver - Main - running on port 2897
[INFO - 2013-07-24T05:25:35.270Z] Session [77a0a110-f421-11e2-a6fd-61cd002d7d02] - CONSTRUCTOR - Desired Capabilities: {"browserName":"phantomjs"}
[INFO - 2013-07-24T05:25:35.270Z] Session [77a0a110-f421-11e2-a6fd-61cd002d7d02] - CONSTRUCTOR - Negotiated Capabilities: {"browserName":"phantomjs","version":"1.9.1","driverName":"ghostdriver","driverVersion":"1.0.3","platform":"linux-unknown-32bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
[INFO - 2013-07-24T05:25:35.270Z] SessionManagerReqHand - _postNewSessionCommand - New Session Created: 77a0a110-f421-11e2-a6fd-61cd002d7d02
23:25:35.275 INFO - Done: /session
23:25:35.283 INFO - Executing: org.openqa.selenium.remote.server.handler.GetSessionCapabilities#13b4ce4 at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee)
23:25:35.284 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee
23:25:35.297 INFO - Executing: [get: http://#### REDACTED ####/wp-login.php] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/url)
23:25:35.592 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/url
23:25:35.597 INFO - Executing: [find element: By.selector: #user_login] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element)
23:25:35.619 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element
23:25:35.631 INFO - Executing: [send keys: 0 org.openqa.selenium.support.events.EventFiringWebDriver$EventFiringWebElement#240035bc, [#### REDACTED ####]] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element/0/value)
23:25:35.683 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element/0/value
23:25:35.695 INFO - Executing: [find element: By.selector: #user_pass] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element)
23:25:35.712 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element
23:25:35.723 INFO - Executing: [send keys: 1 org.openqa.selenium.support.events.EventFiringWebDriver$EventFiringWebElement#240035bd, [#### REDACTED ####]] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element/1/value)
23:25:35.783 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element/1/value
23:25:35.800 INFO - Executing: [find element: By.selector: #wp-submit] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element)
23:25:35.822 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element
23:25:35.832 INFO - Executing: [click: 2 org.openqa.selenium.support.events.EventFiringWebDriver$EventFiringWebElement#240035be] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element/2/click)
23:25:36.105 INFO - Done: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element/2/click
23:25:36.121 INFO - Executing: [find element: By.selector: #wpwrap] at URL: /session/b62d0c67-2000-439c-a0bd-f2c100350dee/element)
e = java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
23:25:36.285 WARN - Exception thrown
org.openqa.selenium.NoSuchElementException: Error Message => 'Unable to find element with css selector '#wpwrap''
caused by Request => {"headers":{"Accept":"application/json, image/png","Connection":"Keep-Alive","Content-Length":"42","Content-Type":"application/json; charset=utf-8","Host":"localhost:2897"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"css selector\",\"value\":\"#wpwrap\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/77a0a110-f421-11e2-a6fd-61cd002d7d02/element"}
Command duration or timeout: 149 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '2.33.0', revision: '4e90c97', time: '2013-05-22 15:32:38'
System info: os.name: 'Linux', os.arch: 'i386', os.version: '2.6.32-042stab076.8', java.version: '1.7.0'
Session ID: 77a0a110-f421-11e2-a6fd-61cd002d7d02
Driver info: org.openqa.selenium.phantomjs.PhantomJSDriver
Capabilities [{platform=LINUX, acceptSslCerts=false, javascriptEnabled=true, browserName=phantomjs, rotatable=false, driverVersion=1.0.3, locationContextEnabled=false, version=1.9.1, databaseEnabled=false, cssSelectorsEnabled=true, handlesAlerts=false, browserConnectionEnabled=false, proxy={proxyType=direct}, webStorageEnabled=false, nativeEvents=true, driverName=ghostdriver, applicationCacheEnabled=false, takesScreenshot=true}]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:191)
at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:145)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:554)
at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:307)
at org.openqa.selenium.remote.RemoteWebDriver.findElementByCssSelector(RemoteWebDriver.java:396)
at org.openqa.selenium.By$ByCssSelector.findElement(By.java:407)
at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:299)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.openqa.selenium.support.events.EventFiringWebDriver$2.invoke(EventFiringWebDriver.java:101)
at $Proxy1.findElement(Unknown Source)
at org.openqa.selenium.support.events.EventFiringWebDriver.findElement(EventFiringWebDriver.java:180)
at org.openqa.selenium.remote.server.handler.FindElement.call(FindElement.java:47)
at org.openqa.selenium.remote.server.handler.FindElement.call(FindElement.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at org.openqa.selenium.remote.server.DefaultSession$1.run(DefaultSession.java:169)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.openqa.selenium.remote.ScreenshotException: Screen shot has been taken
Build info: version: '2.33.0', revision: '4e90c97', time: '2013-05-22 15:32:38'
System info: os.name: 'Linux', os.arch: 'i386', os.version: '2.6.32-042stab076.8', java.version: '1.7.0'
Driver info: driver.version: EventFiringWebDriver
at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:125)
... 20 more
Caused by: org.openqa.selenium.remote.ErrorHandler$UnknownServerException: Error Message => 'Unable to find element with css selector '#wpwrap''
caused by Request => {"headers":{"Accept":"application/json, image/png","Connection":"Keep-Alive","Content-Length":"42","Content-Type":"application/json; charset=utf-8","Host":"localhost:2897"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"css selector\",\"value\":\"#wpwrap\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/77a0a110-f421-11e2-a6fd-61cd002d7d02/element"}
Build info: version: '2.33.0', revision: '4e90c97', time: '2013-05-22 15:32:38'
System info: os.name: 'Linux', os.arch: 'i386', os.version: '2.6.32-042stab076.8', java.version: '1.7.0'
Driver info: driver.version: unknown
23:25:36.293 WARN - Exception: Error Message => 'Unable to find element with css selector '#wpwrap''
caused by Request => {"headers":{"Accept":"application/json, image/png","Connection":"Keep-Alive","Content-Length":"42","Content-Type":"application/json; charset=utf-8","Host":"localhost:2897"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"css selector\",\"value\":\"#wpwrap\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/77a0a110-f421-11e2-a6fd-61cd002d7d02/element"}
Build info: version: '2.33.0', revision: '4e90c97', time: '2013-05-22 15:32:38'
System info: os.name: 'Linux', os.arch: 'i386', os.version: '2.6.32-042stab076.8', java.version: '1.7.0'
Driver info: driver.version: unknown
From what I can see, this is a simple problem to solve. An unhandled exception, NoSuchElementException, is thrown when the test fails. I don't think you need to restart your grid node every time (assuming you are running your grid hub separately from your grid node instances). It may be that successive browser runs on your grid node use just enough additional memory to delay the page slightly, which breaks your script. All you need to do is handle the NoSuchElementException gracefully by wrapping the lookup in a retry loop. You can achieve a similar effect with FluentWait and its .ignoring method.
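A minimal sketch of the retry idea in JavaScript, to match the test in the question. The `retry` helper, its option names, and the `NoSuchElementError` name check in the usage comment are assumptions for illustration, not part of selenium-webdriver's API. The Mocha output in the question shows that client version reporting the failure as `UnknownError`, so `shouldRetry` would need to match whatever the client actually throws.

```javascript
// Hypothetical helper: run an async action, retrying while shouldRetry(err) is true.
async function retry(action, options = {}) {
  const attempts = options.attempts || 5;           // total tries, including the first
  const delayMs = options.delayMs || 500;           // pause between tries
  const shouldRetry = options.shouldRetry || (() => true);
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await action();
    } catch (err) {
      if (!shouldRetry(err)) {
        throw err; // unrelated failure: surface it immediately
      }
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed with a retryable error
}

// Possible usage inside the test (assumes findElement returns a promise-like value):
// retry(() => driver.findElement(webdriver.By.css('#wpwrap')), {
//   attempts: 5,
//   delayMs: 1000,
//   shouldRetry: (err) => err.name === 'NoSuchElementError', // assumed error name
// }).then(() => done(), done);
```

Passing `done` as the rejection handler lets Mocha report the last error if every attempt fails, instead of the test timing out.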

Resources