StormCrawler not retrieving all text content from web page

I'm attempting to use StormCrawler to crawl a set of pages on our website. While it is able to retrieve and index some of the text on a page, it misses a large amount of the other text on the same page.
I've installed Zookeeper, Apache Storm, and StormCrawler using the Ansible playbooks provided here (thank you a million for those!) on a server running Ubuntu 18.04, along with Elasticsearch and Kibana. For the most part I'm using the configuration defaults, but have made the following changes:
For the Elastic index mappings, I've enabled _source: true and turned on indexing and storing for all properties (content, host, title, url) (see the mapping sketch after this list)
In the crawler-conf.yaml configuration, I've commented out all textextractor.include.pattern and textextractor.exclude.tags settings, so that the whole page is captured
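For reference, a sketch of what that mapping change on the content index looks like (the field types here are assumptions; the point is only the _source, index and store settings):
{
  "mappings": {
    "_source": { "enabled": true },
    "properties": {
      "content": { "type": "text", "index": true, "store": true },
      "host": { "type": "keyword", "index": true, "store": true },
      "title": { "type": "text", "index": true, "store": true },
      "url": { "type": "keyword", "index": true, "store": true }
    }
  }
}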
After re-creating fresh ES indices, running mvn clean package, and then starting the crawler topology, StormCrawler begins crawling and content starts appearing in Elasticsearch. However, for many pages the content that's retrieved and indexed is only a subset of all the text on the page, and it usually excludes the main page text we are interested in.
For example, the text under the following element path is not returned/indexed:
<html> <body> <div#maincontentcontainer.container> <div#docs-container> <div> <div.row> <div.col-lg-9.col-md-8.col-sm-12.content-item> <div> <div> <p> (text)
While the text in this path is returned:
<html> <body> <div> <div.container> <div.row> <p> (text)
Are there any additional configuration changes that need to be made, beyond commenting out all of the specific tag include and exclude patterns? From my understanding of the documentation, the defaults for those options cause the whole page to be indexed.
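(For completeness: if it turned out that extraction had to be pointed at the missing container explicitly instead of being left unrestricted, the syntax would be the same as in the commented-out defaults in my config below; the selector here is only an assumption based on the element path above.)
textextractor.include.pattern:
  - DIV[id="maincontentcontainer"]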
I would greatly appreciate any help. Thank you for the excellent software.
Below are my configuration files:
crawler-conf.yaml
config:
  topology.workers: 3
  topology.message.timeout.secs: 1000
  topology.max.spout.pending: 100
  topology.debug: false
  fetcher.threads.number: 100
  # override the JVM parameters for the workers
  topology.worker.childopts: "-Xmx2g -Djava.net.preferIPv4Stack=true"
  # mandatory when using Flux
  topology.kryo.register:
    - com.digitalpebble.stormcrawler.Metadata
  # metadata to transfer to the outlinks
  # metadata.transfer:
  #  - customMetadataName
  # lists the metadata to persist to storage
  metadata.persist:
    - _redirTo
    - error.cause
    - error.source
    - isSitemap
    - isFeed
  http.agent.name: "My crawler"
  http.agent.version: "1.0"
  http.agent.description: ""
  http.agent.url: ""
  http.agent.email: ""
  # The maximum number of bytes for returned HTTP response bodies.
  http.content.limit: -1
  # FetcherBolt queue dump => comment out to activate
  # fetcherbolt.queue.debug.filepath: "/tmp/fetcher-dump-{port}"
  parsefilters.config.file: "parsefilters.json"
  urlfilters.config.file: "urlfilters.json"
  # revisit a page daily (value in minutes)
  fetchInterval.default: 1440
  # revisit a page with a fetch error after 2 hours (value in minutes)
  fetchInterval.fetch.error: 120
  # never revisit a page with an error (or set a value in minutes)
  fetchInterval.error: -1
  # text extraction for JSoupParserBolt
  # textextractor.include.pattern:
  #  - DIV[id="maincontent"]
  #  - DIV[itemprop="articleBody"]
  #  - ARTICLE
  # textextractor.exclude.tags:
  #  - STYLE
  #  - SCRIPT
  # configuration for the classes extending AbstractIndexerBolt
  # indexer.md.filter: "someKey=aValue"
  indexer.url.fieldname: "url"
  indexer.text.fieldname: "content"
  indexer.canonical.name: "canonical"
  indexer.md.mapping:
    - parse.title=title
    - parse.keywords=keywords
    - parse.description=description
    - domain=domain
  # Metrics consumers:
  topology.metrics.consumer.register:
    - class: "org.apache.storm.metric.LoggingMetricsConsumer"
      parallelism.hint: 1
  http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.selenium.RemoteDriverProtocol"
  https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.selenium.RemoteDriverProtocol"
  selenium.addresses: "http://localhost:9515"
es-conf.yaml
config:
  # ES indexer bolt
  es.indexer.addresses: "localhost"
  es.indexer.index.name: "content"
  # es.indexer.pipeline: "_PIPELINE_"
  es.indexer.create: false
  es.indexer.bulkActions: 100
  es.indexer.flushInterval: "2s"
  es.indexer.concurrentRequests: 1
  # ES metricsConsumer
  es.metrics.addresses: "http://localhost:9200"
  es.metrics.index.name: "metrics"
  # ES spout and persistence bolt
  es.status.addresses: "http://localhost:9200"
  es.status.index.name: "status"
  es.status.routing: true
  es.status.routing.fieldname: "key"
  es.status.bulkActions: 500
  es.status.flushInterval: "5s"
  es.status.concurrentRequests: 1
  # spout config #
  # positive or negative filters parsable by the Lucene Query Parser
  # es.status.filterQuery:
  #  - "-(key:stormcrawler.net)"
  #  - "-(key:digitalpebble.com)"
  # time in secs for which the URLs will be considered for fetching after a ack of fail
  spout.ttl.purgatory: 30
  # Min time (in msecs) to allow between 2 successive queries to ES
  spout.min.delay.queries: 2000
  # Delay since previous query date (in secs) after which the nextFetchDate value will be reset to the current time
  spout.reset.fetchdate.after: 120
  es.status.max.buckets: 50
  es.status.max.urls.per.bucket: 2
  # field to group the URLs into buckets
  es.status.bucket.field: "key"
  # fields to sort the URLs within a bucket
  es.status.bucket.sort.field:
    - "nextFetchDate"
    - "url"
  # field to sort the buckets
  es.status.global.sort.field: "nextFetchDate"
  # CollapsingSpout : limits the deep paging by resetting the start offset for the ES query
  es.status.max.start.offset: 500
  # AggregationSpout : sampling improves the performance on large crawls
  es.status.sample: false
  # max allowed duration of a query in sec
  es.status.query.timeout: -1
  # AggregationSpout (expert): adds this value in mins to the latest date returned in the results and
  # use it as nextFetchDate
  es.status.recentDate.increase: -1
  es.status.recentDate.min.gap: -1
  topology.metrics.consumer.register:
    - class: "com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer"
      parallelism.hint: 1
      #whitelist:
      #  - "fetcher_counter"
      #  - "fetcher_average.bytes_fetched"
      #blacklist:
      #  - "__receive.*"
es-crawler.flux
name: "crawler"

includes:
  - resource: true
    file: "/crawler-default.yaml"
    override: false
  - resource: false
    file: "crawler-conf.yaml"
    override: true
  - resource: false
    file: "es-conf.yaml"
    override: true

spouts:
  - id: "spout"
    className: "com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout"
    parallelism: 10
  - id: "filespout"
    className: "com.digitalpebble.stormcrawler.spout.FileSpout"
    parallelism: 1
    constructorArgs:
      - "."
      - "seeds.txt"
      - true

bolts:
  - id: "filter"
    className: "com.digitalpebble.stormcrawler.bolt.URLFilterBolt"
    parallelism: 3
  - id: "partitioner"
    className: "com.digitalpebble.stormcrawler.bolt.URLPartitionerBolt"
    parallelism: 3
  - id: "fetcher"
    className: "com.digitalpebble.stormcrawler.bolt.FetcherBolt"
    parallelism: 3
  - id: "sitemap"
    className: "com.digitalpebble.stormcrawler.bolt.SiteMapParserBolt"
    parallelism: 3
  - id: "parse"
    className: "com.digitalpebble.stormcrawler.bolt.JSoupParserBolt"
    parallelism: 12
  - id: "index"
    className: "com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt"
    parallelism: 3
  - id: "status"
    className: "com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt"
    parallelism: 3
  - id: "status_metrics"
    className: "com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt"
    parallelism: 3

streams:
  - from: "spout"
    to: "partitioner"
    grouping:
      type: SHUFFLE
  - from: "spout"
    to: "status_metrics"
    grouping:
      type: SHUFFLE
  - from: "partitioner"
    to: "fetcher"
    grouping:
      type: FIELDS
      args: ["key"]
  - from: "fetcher"
    to: "sitemap"
    grouping:
      type: LOCAL_OR_SHUFFLE
  - from: "sitemap"
    to: "parse"
    grouping:
      type: LOCAL_OR_SHUFFLE
  - from: "parse"
    to: "index"
    grouping:
      type: LOCAL_OR_SHUFFLE
  - from: "fetcher"
    to: "status"
    grouping:
      type: FIELDS
      args: ["url"]
      streamId: "status"
  - from: "sitemap"
    to: "status"
    grouping:
      type: FIELDS
      args: ["url"]
      streamId: "status"
  - from: "parse"
    to: "status"
    grouping:
      type: FIELDS
      args: ["url"]
      streamId: "status"
  - from: "index"
    to: "status"
    grouping:
      type: FIELDS
      args: ["url"]
      streamId: "status"
  - from: "filespout"
    to: "filter"
    grouping:
      type: FIELDS
      args: ["url"]
      streamId: "status"
  - from: "filter"
    to: "status"
    grouping:
      streamId: "status"
      type: CUSTOM
      customClass:
        className: "com.digitalpebble.stormcrawler.util.URLStreamGrouping"
        constructorArgs:
          - "byDomain"
parsefilters.json
{
  "com.digitalpebble.stormcrawler.parse.ParseFilters": [
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.XPathFilter",
      "name": "XPathFilter",
      "params": {
        "canonical": "//*[@rel=\"canonical\"]/@href",
        "parse.description": [
          "//*[@name=\"description\"]/@content",
          "//*[@name=\"Description\"]/@content"
        ],
        "parse.title": [
          "//TITLE",
          "//META[@name=\"title\"]/@content"
        ],
        "parse.keywords": "//META[@name=\"keywords\"]/@content"
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.LinkParseFilter",
      "name": "LinkParseFilter",
      "params": {
        "pattern": "//FRAME/@src"
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.DomainParseFilter",
      "name": "DomainParseFilter",
      "params": {
        "key": "domain",
        "byHost": false
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.CommaSeparatedToMultivaluedMetadata",
      "name": "CommaSeparatedToMultivaluedMetadata",
      "params": {
        "keys": ["parse.keywords"]
      }
    }
  ]
}
Attempting to use ChromeDriver
I installed the latest versions of ChromeDriver and Google Chrome for Ubuntu.
First I start ChromeDriver in headless mode on localhost:9515 as the stormcrawler user (via a separate Python shell, as shown below), and then I restart the StormCrawler topology (also as the stormcrawler user), but I end up with a stack of errors related to Chrome. The odd thing, however, is that I can confirm ChromeDriver is running OK within the Python shell directly, and I can confirm that both the driver and the browser are actively running via ps -ef. The same stack of errors also occurs when I simply start ChromeDriver from the command line (i.e., chromedriver --headless &).
Starting chromedriver in headless mode (in python3 shell)
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--window-size=1200x600')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-setuid-sandbox')
options.add_argument('--disable-extensions')
options.add_argument('--disable-infobars')
options.add_argument('--remote-debugging-port=9222')
options.add_argument('--user-data-dir=/home/stormcrawler/cache/google/chrome')
options.add_argument('--disable-gpu')
options.add_argument('--profile-directory=Default')
options.binary_location = '/usr/bin/google-chrome'
driver = webdriver.Chrome(chrome_options=options, port=9515, executable_path=r'/usr/bin/chromedriver')
Stack trace from starting stormcrawler topology
Run command: storm jar target/stormcrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local es-crawler.flux --sleep 60000
9486 [Thread-26-fetcher-executor[3 3]] ERROR o.a.s.util - Async loop died!
java.lang.RuntimeException: org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Build info: version: '4.0.0-alpha-6', revision: '5f43a29cfc'
System info: host: 'stormcrawler-dev', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.15.0-33-generic', java.version: '1.8.0_282'
Driver info: driver.version: RemoteWebDriver
remote stacktrace: #0 0x55d590b21e89 <unknown>
at com.digitalpebble.stormcrawler.protocol.selenium.RemoteDriverProtocol.configure(RemoteDriverProtocol.java:101) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at com.digitalpebble.stormcrawler.protocol.ProtocolFactory.<init>(ProtocolFactory.java:69) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at com.digitalpebble.stormcrawler.bolt.FetcherBolt.prepare(FetcherBolt.java:818) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.storm.daemon.executor$fn__10180$fn__10193.invoke(executor.clj:803) ~[storm-core-1.2.3.jar:1.2.3]
at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:482) [storm-core-1.2.3.jar:1.2.3]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
...
Confirming that chromedriver and chrome are both running and reachable
~/stormcrawler$ ps -ef | grep -i 'driver'
stormcr+ 18862 18857 0 14:28 pts/0 00:00:00 /usr/bin/chromedriver --port=9515
stormcr+ 18868 18862 0 14:28 pts/0 00:00:00 /usr/bin/google-chrome --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-gpu --disable-hang-monitor --disable-infobars --disable-popup-blocking --disable-prompt-on-repost --disable-setuid-sandbox --disable-sync --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --headless --log-level=0 --no-first-run --no-sandbox --no-service-autorun --password-store=basic --profile-directory=Default --remote-debugging-port=9222 --test-type=webdriver --use-mock-keychain --user-data-dir=/home/stormcrawler/cache/google/chrome --window-size=1200x600
stormcr+ 18899 18877 0 14:28 pts/0 00:00:00 /opt/google/chrome/chrome --type=renderer --no-sandbox --disable-dev-shm-usage --enable-automation --enable-logging --log-level=0 --remote-debugging-port=9222 --test-type=webdriver --allow-pre-commit-input --ozone-platform=headless --field-trial-handle=17069524199442920904,10206176048672570859,131072 --disable-gpu-compositing --enable-blink-features=ShadowDOMV0 --lang=en-US --headless --enable-crash-reporter --lang=en-US --num-raster-threads=1 --renderer-client-id=4 --shared-files=v8_context_snapshot_data:100
~/stormcrawler$ sudo netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 localhost:9222 0.0.0.0:* LISTEN 18026/google-chrome
tcp 0 0 localhost:9515 0.0.0.0:* LISTEN 18020/chromedriver

IIRC you need to set some additional config to work with ChromeDriver.
Alternatively (I haven't tried it yet), https://hub.docker.com/r/browserless/chrome would be a nice way of handling Chrome in a Docker container.
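For reference, a sketch of what that additional config might look like, under selenium.capabilities in crawler-conf.yaml (the exact set of Chrome arguments is an assumption, mirroring the flags the Python snippet above passes):
selenium.capabilities:
  goog:chromeOptions:
    args:
      - "--headless"
      - "--no-sandbox"
      - "--disable-gpu"
      - "--disable-dev-shm-usage"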

Related

capture journald properties with rsyslog

I am struggling with how to capture systemd-journald properties in rsyslog files.
My setup
Ubuntu inside Docker on ARM (Raspberry Pi): FROM arm64v8/ubuntu:20.04
Docker command (all subsequent actions are taken inside the running Docker container)
$ docker run --privileged -ti --cap-add SYS_ADMIN --security-opt seccomp=unconfined --cgroup-parent=docker.slice --cgroupns private --tmpfs /tmp --tmpfs /run --tmpfs /run/lock systemd:origin
rsyslog under $ systemctl status rsyslog
● rsyslog.service - System Logging Service
Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor prese>
Active: active (running)
...
[origin software="rsyslogd" swVersion="8.2001.0" x-pid="39758" x-info="https://www.rsyslog.com"] start
...
My plan
I have a small C program to put some information into the journal:
#include <systemd/sd-journal.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
    char buffer[50];
    sprintf(buffer, "%lu", (unsigned long) getpid());
    printf("writing to journal\n");
    sd_journal_print(LOG_WARNING, "%s", "a little journal test message");
    sd_journal_send("MESSAGE=%s", "there shoud be a text", "SYSLOG_PID=%s", buffer, "PRIORITY=%i", LOG_ERR, "DOCUMENTATION=%s", "any doc link", "MESSAGE_ID=%s", "e5e4132e441541f89bca0cc3e7be3381", "MEAS_VAL=%d", 1394, NULL);
    return 0;
}
Compile it: $ gcc joutest.c -lsystemd -o jt
Execute it: $ ./jt
This shows up in the journal ($ journalctl -r -o json-pretty) as:
{
        "_GID" : "0",
        "MESSAGE" : "there shoud be a text",
        "_HOSTNAME" : "f1aad951c039",
        "SYSLOG_IDENTIFIER" : "jt",
        "_TRANSPORT" : "journal",
        "CODE_FILE" : "joutest.c",
        "DOCUMENTATION" : "any doc link",
        "_BOOT_ID" : "06a36b314cee462591c65a2703c8b2ad",
        "CODE_LINE" : "14",
        "MESSAGE_ID" : "e5e4132e441541f89bca0cc3e7be3381",
        "_CAP_EFFECTIVE" : "3fffffffff",
        "__REALTIME_TIMESTAMP" : "1669373862349599",
        "_SYSTEMD_UNIT" : "init.scope",
        "CODE_FUNC" : "main",
        "_MACHINE_ID" : "5aba31746bf244bba6081297fe061445",
        "SYSLOG_PID" : "39740",
        "PRIORITY" : "3",
        "_COMM" : "jt",
        "_SYSTEMD_SLICE" : "-.slice",
        "MEAS_VAL" : "1394",
        "__MONOTONIC_TIMESTAMP" : "390853282189",
        "_PID" : "39740",
        "_SOURCE_REALTIME_TIMESTAMP" : "1669373862336503",
        "_UID" : "0",
        "_SYSTEMD_CGROUP" : "/init.scope",
        "__CURSOR" : "s=63a46a30bbbb4b8c9288a9b12c622b37;i=6cb;b=06a36b314cee46>
}
Now, as a test, I want to extract all properties from that journal entry via rsyslog. A "property", in rsyslog jargon, is essentially the name of a key in the formatted JSON entry. If a property (key name) matches, the whole dictionary item (key and value) shall be captured.
To start with this, I've configured rsyslog as:
module(load="imjournal")
module(load="mmjsonparse")
action(type="mmjsonparse")
if $programname == 'jt' and $syslogseverity == 3 then
action(type="omfile" file="/var/log/jt_err.log" template="RSYSLOG_DebugFormat")
This config is located in /etc/rsyslog.d/filter.conf and gets automatically included by /etc/rsyslog.conf:
# /etc/rsyslog.conf configuration file for rsyslog
#
# For more information install rsyslog-doc and see
# /usr/share/doc/rsyslog-doc/html/configuration/index.html
#
# Default logging rules can be found in /etc/rsyslog.d/50-default.conf
#################
#### MODULES ####
#################
#module(load="imuxsock") # provides support for local system logging
#module(load="immark") # provides --MARK-- message capability
# provides UDP syslog reception
#module(load="imudp")
#input(type="imudp" port="514")
# provides TCP syslog reception
#module(load="imtcp")
#input(type="imtcp" port="514")
# provides kernel logging support and enable non-kernel klog messages
module(load="imklog" permitnonkernelfacility="on")
###########################
#### GLOBAL DIRECTIVES ####
###########################
#
# Use traditional timestamp format.
# To enable high precision timestamps, comment out the following line.
#
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# Filter duplicated messages
$RepeatedMsgReduction on
#
# Set the default permissions for all log files.
#
$FileOwner syslog
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
#
# Where to place spool and state files
#
$WorkDirectory /var/spool/rsyslog
#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf
Applied this config: $ systemctl restart rsyslog
Which results in the following: $ cat /var/log/jt_err.log
Debug line with all properties:
FROMHOST: 'f1aad951c039', fromhost-ip: '127.0.0.1', HOSTNAME:
'f1aad951c039', PRI: 11,
syslogtag 'jt[39765]:', programname: 'jt', APP-NAME: 'jt', PROCID:
'39765', MSGID: '-',
TIMESTAMP: 'Nov 25 11:47:50', STRUCTURED-DATA: '-',
msg: ' there shoud be a text'
escaped msg: ' there shoud be a text'
inputname: imuxsock rawmsg: '<11>Nov 25 11:47:50 jt[39765]: there
shoud be a text'
$!:{ "msg": "there shoud be a text" }
$.:
$/:
My problem
Looking at the resulting rsyslog output, I am missing a majority, if not all, of the items originating from the journal entry.
There is really no property (key) matching. Shouldn't all properties be matched, since this is a debug output?
Specifically, I am concentrating on my custom property, MEAS_VAL, and it is not there.
The only property which does occur is "msg", and it is questionable whether that is even a match from the journal, since the originating property name attached to the content "there shoud be a text" is MESSAGE.
So it feels like I am not hitting the journal capturing mechanism at all. Why?
Can we be sure that imjournal gets loaded properly?
I would say yes because of systemd's startup messages:
Nov 28 16:27:38 f1aad951c039 rsyslogd[144703]: imjournal: Journal indicates no msgs when positioned at head. [v8.2212.0.master try https://www.rsyslog.com/e/0 ]
Nov 28 16:27:38 f1aad951c039 rsyslogd[144703]: imjournal: journal files changed, reloading... [v8.2212.0.master try https://www.rsyslog.com/e/0 ]
Nov 28 16:27:38 f1aad951c039 rsyslogd[144703]: imjournal: Journal indicates no msgs when positioned at head. [v8.2212.0.master try https://www.rsyslog.com/e/0 ]
Edit 2022-11-29
Meanwhile I've compiled my own version 8.2212.0.master. But the phenomenon persists.
You're missing most items originating from the journal because neither the RSYSLOG_DebugFormat nor the RSYSLOG_TraditionalFileFormat template contains the needed properties (see Reserved template names). RSYSLOG_DebugFormat, however, includes at least some fields, e.g. procid, msgid and structured-data, which can be seen in the output you've provided.
This means that if you want to include all the fields, you'll have to create your own template.
The journal fields are stored in key-value pairs. The imjournal module is able to parse these key-value pairs and generate the jsonf property,
which then can be used to access fields of the log message as if they were fields in a JSON object.
# load imjournal module
module(load="imjournal")
# specify journal as input source
input(type="imjournal")
template(name="journalTemplate" type="list") {
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    constant(value=": {")
    property(name="jsonf")
    constant(value="}")
}

if $programname == 'jt' and $syslogseverity == 3 then {
    action(type="omfile" file="/var/log/jt_err.log" template="journalTemplate")
    stop
}
The output of the provided log would then look something like the following:
YYYY-MM-DDTHH:mm:ss myHostname syslogtag: {"_GID" : "0", "MESSAGE" : "there shoud be a text", ... }
As seen in the log above, the output of the provided properties will be in JSON. By using the json property parser this can be prevented, as the output can be tailored as desired. If this is used, however, each property must be defined specifically.
template(name="journalTemplate" type="list") {
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    constant(value=": _GID=")
    property(name="$._GID" format="json")
    constant(value=" MESSAGE=")
    property(name="$.MESSAGE" format="json")
    constant(value=" _HOSTNAME=")
    property(name="$._HOSTNAME" format="json")
    ...
}

Ansible Molecule fails to create the docker instance

I am trying to get Molecule working with Docker on a CentOS 7 host.
The molecule create command fails to create the Docker instance and gives the error: 'Unsupported parameters for (community.docker.docker_container) module: ....'
Checking with docker ps -a confirms the instance was not created.
I can confirm that starting containers manually with docker run works fine; it's only Molecule that fails to start the container.
I am stuck with this, and your help is highly appreciated.
Here is the complete error message:
TASK [Create molecule instance(s)] *********************************************
changed: [localhost] => (item=centos7)
TASK [Wait for instance(s) creation to complete] *******************************
FAILED - RETRYING: Wait for instance(s) creation to complete (300 retries left).
failed: [localhost] (item={'started': 1, 'finished': 0, 'ansible_job_id': '353704502720.15356', 'results_file': '/home/guest/.ansible_async/353704502720.15356', 'changed': True, 'failed': False, 'item': {'image': 'test-runner', 'name': 'centos7', 'pre_build_image': True}, 'ansible_loop_var': 'item'}) => {"ansible_job_id": "353704502720.15356", "ansible_loop_var": "item", "attempts": 2, "changed": false, "finished": 1, "item": {"ansible_job_id": "353704502720.15356", "ansible_loop_var": "item", "changed": true, "failed": false, "finished": 0, "item": {"image": "test-runner", "name": "centos7", "pre_build_image": true}, "results_file": "/home/guest/.ansible_async/353704502720.15356", "started": 1}, "msg": "Unsupported parameters for (community.docker.docker_container) module: command_handling Supported parameters include: api_version, auto_remove, blkio_weight, ca_cert, cap_drop, capabilities, cgroup_parent, cleanup, client_cert, client_key, command, comparisons, container_default_behavior, cpu_period, cpu_quota, cpu_shares, cpus, cpuset_cpus, cpuset_mems, debug, default_host_ip, detach, device_read_bps, device_read_iops, device_requests, device_write_bps, device_write_iops, devices, dns_opts, dns_search_domains, dns_servers, docker_host, domainname, entrypoint, env, env_file, etc_hosts, exposed_ports, force_kill, groups, healthcheck, hostname, ignore_image, image, init, interactive, ipc_mode, keep_volumes, kernel_memory, kill_signal, labels, links, log_driver, log_options, mac_address, memory, memory_reservation, memory_swap, memory_swappiness, mounts, name, network_mode, networks, networks_cli_compatible, oom_killer, oom_score_adj, output_logs, paused, pid_mode, pids_limit, privileged, published_ports, pull, purge_networks, read_only, recreate, removal_wait_timeout, restart, restart_policy, restart_retries, runtime, security_opts, shm_size, ssl_version, state, stop_signal, stop_timeout, sysctls, timeout, tls, tls_hostname, tmpfs, tty, ulimits, user, userns_mode, uts, validate_certs, volume_driver, volumes, volumes_from, working_dir", "stderr": "/tmp/ansible_community.docker.docker_container_payload_1djctes1/ansible_community.docker.docker_container_payload.zip/ansible_collections/community/docker/plugins/modules/docker_container.py:1193: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n", "stderr_lines": ["/tmp/ansible_community.docker.docker_container_payload_1djctes1/ansible_community.docker.docker_container_payload.zip/ansible_collections/community/docker/plugins/modules/docker_container.py:1193: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead."]}
$ pip list | egrep "mole|docker|ansible"
ansible 2.10.7
ansible-base 2.10.17
ansible-compat 2.2.0
ansible-core 2.12.0
docker 6.0.0
molecule 4.0.1
molecule-docker 2.0.0
$ docker --version
Docker version 20.10.14, build a224086
$ ansible --version
ansible 2.10.17
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/guest/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/guest/mole3/mole3/lib/python3.8/site-packages/ansible
executable location = /home/guest/mole3/mole3/bin/ansible
python version = 3.8.11 (default, Sep 1 2021, 12:33:46) [GCC 9.3.1 20200408 (Red Hat 9.3.1-2)]
$ molecule --version
molecule 4.0.1 using python 3.8
ansible:2.10.17
delegated:4.0.1 from molecule
docker:2.0.0 from molecule_docker requiring collections: community.docker>=3.0.0-a2
$ cat molecule/default/molecule.yml
---
dependency:
  name: galaxy
  enabled: False
driver:
  name: docker
platforms:
  - name: centos7
    image: test-runner
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

StormCrawler DISCOVER and FETCH a website but nothing gets saved in docs

There is a website that I'm trying to crawl. The crawler DISCOVERs and FETCHes the URLs, but nothing ends up in the docs. The website is https://cactussara.ir. Where is the problem?
And this is the robots.txt of this website:
User-agent: *
Disallow: /
And this is my urlfilters.json:
{
  "com.digitalpebble.stormcrawler.filtering.URLFilters": [
    {
      "class": "com.digitalpebble.stormcrawler.filtering.basic.BasicURLFilter",
      "name": "BasicURLFilter",
      "params": {
        "maxPathRepetition": 8,
        "maxLength": 8192
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.filtering.depth.MaxDepthFilter",
      "name": "MaxDepthFilter",
      "params": {
        "maxDepth": -1
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.filtering.basic.BasicURLNormalizer",
      "name": "BasicURLNormalizer",
      "params": {
        "removeAnchorPart": true,
        "unmangleQueryString": true,
        "checkValidURI": true,
        "removeHashes": false
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.filtering.host.HostURLFilter",
      "name": "HostURLFilter",
      "params": {
        "ignoreOutsideHost": true,
        "ignoreOutsideDomain": false
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.filtering.regex.RegexURLNormalizer",
      "name": "RegexURLNormalizer",
      "params": {
        "regexNormalizerFile": "default-regex-normalizers.xml"
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.filtering.regex.RegexURLFilter",
      "name": "RegexURLFilter",
      "params": {
        "regexFilterFile": "default-regex-filters.txt"
      }
    }
  ]
}
And this is crawler-conf.yaml:
# Default configuration for StormCrawler
# This is used to make the default values explicit and list the most common configurations.
# Do not modify this file but instead provide a custom one with the parameter -conf
# when launching your extension of ConfigurableTopology.
config:
  fetcher.server.delay: 1.0
  # min. delay for multi-threaded queues
  fetcher.server.min.delay: 0.0
  fetcher.queue.mode: "byHost"
  fetcher.threads.per.queue: 1
  fetcher.threads.number: 10
  fetcher.max.urls.in.queues: -1
  fetcher.max.queue.size: -1
  # max. crawl-delay accepted in robots.txt (in seconds)
  fetcher.max.crawl.delay: 30
  # behavior of fetcher when the crawl-delay in the robots.txt
  # is larger than fetcher.max.crawl.delay:
  #  (if false)
  #    skip URLs from this queue to avoid that any overlong
  #    crawl-delay throttles the crawler
  #  (if true)
  #    set the delay to fetcher.max.crawl.delay,
  #    making fetcher more aggressive than requested
  fetcher.max.crawl.delay.force: false
  # behavior of fetcher when the crawl-delay in the robots.txt
  # is smaller (ev. less than one second) than the default delay:
  #  (if true)
  #    use the larger default delay (fetcher.server.delay)
  #    and ignore the shorter crawl-delay in the robots.txt
  #  (if false)
  #    use the delay specified in the robots.txt
  fetcher.server.delay.force: false
  # time bucket to use for the metrics sent by the Fetcher
  fetcher.metrics.time.bucket.secs: 10
  # SimpleFetcherBolt: if the delay required by the politeness
  # is above this value, the tuple is sent back to the Storm queue
  # for the bolt on the _throttle_ stream.
  fetcher.max.throttle.sleep: -1
  # alternative values are "byIP" and "byDomain"
  partition.url.mode: "byHost"
  # metadata to transfer to the outlinks
  # used by Fetcher for redirections, sitemapparser, etc...
  # these are also persisted for the parent document (see below)
  # metadata.transfer:
  #  - customMetadataName
  # lists the metadata to persist to storage
  # these are not transfered to the outlinks
  metadata.persist:
    - _redirTo
    - error.cause
    - error.source
    - isSitemap
    - isFeed
  metadata.track.path: true
  metadata.track.depth: true
  http.agent.name: "Anonymous Coward"
  http.agent.version: "1.0"
  http.agent.description: "built with StormCrawler ${version}"
  http.agent.url: "http://someorganization.com/"
  http.agent.email: "someone@someorganization.com"
  http.accept.language: "fa-IR,fa_IR,en-us,en-gb,en;q=0.7,*;q=0.3"
  http.accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
  http.content.limit: -1
  http.store.headers: false
  http.timeout: 10000
  http.skip.robots: true
  # store partial fetches as trimmed content (some content has been fetched,
  # but reading more data from socket failed, eg. because of a network timeout)
  http.content.partial.as.trimmed: false
  # for crawling through a proxy:
  # http.proxy.host:
  # http.proxy.port:
  # okhttp only, defaults to "HTTP"
  # http.proxy.type: "SOCKS"
  # for crawling through a proxy with Basic authentication:
  # http.proxy.user:
  # http.proxy.pass:
  http.robots.403.allow: true
  # should the URLs be removed when a page is marked as noFollow
  robots.noFollow.strict: false
  # Guava caches used for the robots.txt directives
  robots.cache.spec: "maximumSize=10000,expireAfterWrite=6h"
  robots.error.cache.spec: "maximumSize=10000,expireAfterWrite=1h"
  protocols: "http,https,file"
  http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol"
  https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol"
  file.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.file.FileProtocol"
  # navigationfilters.config.file: "navigationfilters.json"
  # selenium.addresses: "http://localhost:9515"
  selenium.implicitlyWait: 0
  selenium.pageLoadTimeout: -1
  selenium.setScriptTimeout: 0
  selenium.instances.num: 1
  selenium.capabilities:
    takesScreenshot: false
    loadImages: false
    javascriptEnabled: true
    # illustrates the use of the variable for user agent
    # phantomjs.page.settings.userAgent: "$userAgent"
    # ChromeDriver config
    # goog:chromeOptions:
    #   args:
    #     - "--headless"
    #     - "--disable-gpu"
    #     - "--mute-audio"
  # DelegatorRemoteDriverProtocol
  selenium.delegated.protocol: "com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol"
  # no url or parsefilters by default
  parsefilters.config.file: "parsefilters.json"
  urlfilters.config.file: "urlfilters.json"
  # JSoupParserBolt
  jsoup.treat.non.html.as.error: false
  parser.emitOutlinks: true
  parser.emitOutlinks.max.per.page: -1
  track.anchors: true
  detect.mimetype: true
  detect.charset.maxlength: 10000
  # filters URLs in sitemaps based on their modified Date (if any)
  sitemap.filter.hours.since.modified: -1
  # staggered scheduling of sitemaps
  sitemap.schedule.delay: -1
  # whether to add any sitemaps found in the robots.txt to the status stream
  # used by fetcher bolts
  sitemap.discovery: false
  # Default implementation of Scheduler
  scheduler.class: "com.digitalpebble.stormcrawler.persistence.DefaultScheduler"
  # revisit a page daily (value in minutes)
  # set it to -1 to never refetch a page
  fetchInterval.default: 1440
  # revisit a page with a fetch error after 2 hours (value in minutes)
  # set it to -1 to never refetch a page
  fetchInterval.fetch.error: 120
  # never revisit a page with an error (or set a value in minutes)
  fetchInterval.error: -1
  # custom fetch interval to be used when a document has the key/value in its metadata
  # and has been fetched succesfully (value in minutes)
  # fetchInterval.FETCH_ERROR.isFeed=true
  # fetchInterval.isFeed=true: 10
  # max number of successive fetch errors before changing status to ERROR
  max.fetch.errors: 3
  # Guava cache use by AbstractStatusUpdaterBolt for DISCOVERED URLs
  status.updater.use.cache: true
  status.updater.cache.spec: "maximumSize=10000,expireAfterAccess=1h"
  # Can also take "MINUTE" or "HOUR"
  status.updater.unit.round.date: "SECOND"
  # configuration for the classes extending AbstractIndexerBolt
  # indexer.md.filter: "someKey=aValue"
  indexer.url.fieldname: "url"
  indexer.text.fieldname: "content"
  indexer.text.maxlength: -1
  indexer.canonical.name: "canonical"
  indexer.md.mapping:
    - parse.title=title
    - parse.keywords=keywords
    - parse.description=description
Thanks in advance.
The pages contain
<meta name="robots" content="noindex,follow"/>
which is found by the parser and causes the indexer bolt to skip the page.
You can confirm this in the metrics, where the Filtered count should equal the number of pages fetched.
http.skip.robots does not apply to the directives set in the page itself.
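One quick way to double-check (assuming curl and grep are available on the machine running the crawl) is to fetch a page and look for that tag directly:
curl -s https://cactussara.ir/ | grep -i 'name="robots"'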

Exception while parsing logfile in logstash

I am trying to parse a Selenium log file using Logstash, but I am getting an exception while parsing.
The grok pattern used:
grok {
    match => { "message" => "%{LOGLEVEL:severity} *%{GREEDYDATA:msg}" }
}
and a sample log file is:
INFO Parental Page Loaded and Lock Content displayed
INFO Received PIN:1234
INFO The UI element mtfileTV.ParentalControls.Enabled [Enabled On] is displayed
INFO Parental Switch is already Enabled
ERROR Exception occoured while finding the given element name "mtfileTV.ParentalControls.Disabled" type "NAME" description " Parental lock is disabled text status" Value "Disabled Off"
ERROR An element could not be located on the page using the given search parameters. (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 382 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: 'unknown', revision: 'b526bd5', time: '2017-03-07 11:11:07 -0800'
System info: host: 'IN5033-01', ip: '192.168.86.127', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_151'
Driver info: io.appium.java_client.windows.WindowsDriver
Capabilities [{app=mtfileTVLLC.mtfileTV_vgszm6stshdqy!App, platformName=Windows, platform=ANY}]
Session ID: 9D664F38-2D99-4751-A535-60DF0466A08F
*** Element info: {Using=name, value=Disabled Off}
ERROR Exception occoured while finding the given element name "mtfileTV.ParentalControls.Disabled" type "NAME" description " Parental lock is disabled text status" Value "Disabled Off" The specified element mtfileTV.ParentalControls.Disabled Parental lock is disabled text status does not exist
ERROR The UI element mtfileTV.ParentalControls.Disabled is not displayed
INFO Parental control is already Enabled.Process ahead
INFO The UI element mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked] is displayed
INFO The UI element mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked] is displayed
INFO The text "R, NC-17, TV-MA Locked" got from UI element with name mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked]
INFO The UI element mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked] is displayed
INFO The text "R, NC-17, TV-MA Locked" got from UI element with name mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked]
INFO Cur status of the Requested Rating :R, NC-17, TV-MA Locked: is = R, NC-17, TV-MA Locked
INFO R, NC-17, TV-MA Locked is already locked.Do Nothing.
ERROR Exception occoured while finding the given element name "mtfileTV.ParentalControls.Disabled" type "NAME" description " Parental lock is disabled text status" Value "Disabled Off"
ERROR An element could not be located on the page using the given search parameters. (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 438 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: 'unknown', revision: 'b526bd5', time: '2017-03-07 11:11:07 -0800'
System info: host: 'IN5033-01', ip: '192.168.86.127', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_151'
Driver info: io.appium.java_client.windows.WindowsDriver
Capabilities [{app=mtfileTVLLC.mtfileTV_vgszm6stshdqy!App, platformName=Windows, platform=ANY}]
Please suggest a suitable grok pattern.
The problem is that you have messages consisting of multiple lines. Therefore, you need to use the multiline codec to join all lines which do not start with a LOGLEVEL.
We can write a sample configuration (call it selenium.conf):
input {
    stdin {
        codec => multiline {
            pattern => "^%{LOGLEVEL} "
            negate => true
            what => "previous"
        }
    }
}

filter {
    grok {
        match => { "message" => "%{LOGLEVEL:severity} %{GREEDYDATA:lines}" }
        remove_field => [ "message" ]
    }
}

output {
    stdout {
        codec => rubydebug
    }
}
If we assume that the sample log file you provided is stored in selenium.log, we can do
$ logstash -f selenium.conf < selenium.log
and see that it works :)
Sample output follows (note the multiline tag in the first entry):
{
          "host" => "d125q-PC",
    "@timestamp" => 2018-07-04T11:10:28.141Z,
         "lines" => "An element could not be located on the page using the given search parameters. (WARNING: The server did not provide any stacktrace information)\nCommand duration or timeout: 382 milliseconds\nFor documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html\nBuild info: version: 'unknown', revision: 'b526bd5', time: '2017-03-07 11:11:07 -0800'\nSystem info: host: 'IN5033-01', ip: '192.168.86.127', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_151'\nDriver info: io.appium.java_client.windows.WindowsDriver\nCapabilities [{app=mtfileTVLLC.mtfileTV_vgszm6stshdqy!App, platformName=Windows, platform=ANY}]\nSession ID: 9D664F38-2D99-4751-A535-60DF0466A08F\n*** Element info: {Using=name, value=Disabled Off}",
          "tags" => [
        [0] "multiline"
    ],
      "@version" => "1",
      "severity" => "ERROR"
}
{
          "host" => "d125q-PC",
    "@timestamp" => 2018-07-04T11:10:28.142Z,
         "lines" => "The UI element mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked] is displayed",
      "@version" => "1",
      "severity" => "INFO"
}
{
          "host" => "d125q-PC",
    "@timestamp" => 2018-07-04T11:10:28.144Z,
         "lines" => "The text \"R, NC-17, TV-MA Locked\" got from UI element with name mtfileTV.ParentalControls.Status.RNC17TVMA.Locked [R, NC-17, TV-MA Locked]",
      "@version" => "1",
      "severity" => "INFO"
}

Why does web-component-tester time out in flight mode?

I've got a basic web-component-tester project which works fine when I'm online.
If I switch to flight mode, it seems to fail to connect to Selenium, and instead gives a largely useless error message after about 60s delay: "Error: Unable to connect to selenium".
Edit 2: I've narrowed the problem down in the following question, but I'd still like to know how to avoid it with web-component-tester:
Why does NodeJS request() fail on localhost in flight mode, but not 127.0.0.1? (Windows 10)
Edit: After some digging, it's something to do with a DNS resolver somewhere beneath selenium-standalone failing while in flight mode, and not a lot to do with web-component-tester.
After inserting some debug logging into selenium-standalone, I tracked down the failure point to the check for whether Selenium is running. When online, this works fine, but when offline I get:
// check-started.js, logging the error inside the request() call:
Error: getaddrinfo ENOENT localhost:60435
at Object.exports._errnoException (util.js:1022:11)
at errnoException (dns.js:33:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:76:26)
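For what it's worth, the behaviour can be reproduced outside selenium-standalone with a few lines of Node (a sketch only; it exercises the dns.lookup path that the hostname resolution presumably goes through):
// lookup-test.js: compare resolving 'localhost' vs. a literal IP address
const dns = require('dns');

dns.lookup('localhost', (err, address, family) => {
  // in flight mode this is expected to fail with the same ENOENT as above
  console.log('localhost ->', err ? err.code : address + ' (IPv' + family + ')');
});

dns.lookup('127.0.0.1', (err, address, family) => {
  // a literal address does not need the resolver, so this should work offline too
  console.log('127.0.0.1 ->', err ? err.code : address + ' (IPv' + family + ')');
});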
The following seem to describe similar situations, but I don't see how to persuade selenium-standalone or web-component-tester to specify an IP address family in order to even try the suggested solutions:
https://github.com/nodejs/node/issues/4825
https://github.com/nodejs/node/issues/10290
node.js http.request and ipv6 vs ipv4
My original text is below.
The full error log and wct.conf.json are below. I can supply package.json and bower.json too if it would help.
I'm on Windows 10.
wct.conf.json:
{
  "verbose": true,
  "plugins": {
    "local": {
      "skipSeleniumInstall": true,
      "browsers": ["chrome"]
    },
    "sauce": {
      "disabled": true
    }
  }
}
error log:
> color-curve#0.0.1 test C:\Users\Dave\projects\infinity-components\color-curve
> standard "**/*.html" && wct -l chrome
step: loadPlugins
step: configure
hook: configure
Expanded local browsers: [ 'chrome' ] into capabilities: [ { browserName: 'chrome',
version: '60',
chromeOptions:
{ binary: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
args: [Object] } } ]
configuration: { suites: [ 'test/index.html' ],
verbose: true,
quiet: false,
expanded: false,
testTimeout: 90000,
persistent: false,
extraScripts: [],
clientOptions: { root: '/components/', verbose: true },
compile: 'auto',
activeBrowsers: [ { browserName: 'chrome', version: '60', chromeOptions: [Object] } ],
browserOptions: {},
plugins:
{ local:
{ disabled: false,
skipSeleniumInstall: true,
browsers: [Object],
seleniumArgs: [] },
sauce: { disabled: true } },
registerHooks: [Function: registerHooks],
enforceJsonConf: false,
webserver:
{ hostname: 'localhost',
_generatedIndexContent: '<!doctype html>\n<html>\n <head>\n <meta charset="utf-8">\n <script>WCT = {"root":"/components/","verbose":true};</script>\n <script>window.__generatedByWct = true;</script>\n <script src="../web-component-tester/browser.js"></script>\n\n <script src="../web-component-tester/data/a11ySuite.js"></script>\n</head>\n <body>\n <script>\n WCT.loadSuites(["test/index.html"]);\n </script>\n </body>\n</html>\n' },
root: 'C:\\Users\\Dave\\projects\\infinity-components\\color-curve',
_: [],
origSuites: [ 'test/' ] }
hook: prepare
hook: prepare:selenium
Starting Selenium server for local browsers
INFO - Selenium build info: version: '3.0.1', revision: '1969d75'
INFO - Launching a standalone Selenium Server
INFO::main: Logging initialized @222ms
INFO - Driver class not found: com.opera.core.systems.OperaDriver
INFO - Driver provider com.opera.core.systems.OperaDriver registration is skipped:
Unable to create new instances on this machine.
INFO - Driver class not found: com.opera.core.systems.OperaDriver
INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
registration capabilities Capabilities [{browserName=safari, version=, platform=MAC}] does not match the current platform WIN10
INFO:osjs.Server:main: jetty-9.2.15.v20160210
INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@100fc185{/,null,AVAILABLE}
INFO:osjs.ServerConnector:main: Started ServerConnector@2922e2bb{HTTP/1.1}{0.0.0.0:51126}
INFO:osjs.Server:main: Started @419ms
INFO - Selenium Server is up and running
Error: Unable to connect to selenium
