I have an Argo workflow with a single step where the following resources are defined:
resources:
limits:
cpu: "6"
requests:
cpu: "3"
memory: 120Mi
After the workflow executed and completed successfully, I retrieved the workflow status, where the duration of each pod and the resource duration for CPU and memory can be found:
status:
artifactRepositoryRef:
artifactRepository:
archiveLogs: true
configMap: artifact-repositories
key: default-v1
namespace: argo
conditions:
- status: "False"
type: PodRunning
- status: "True"
type: Completed
finishedAt: "2022-12-18T11:13:25Z"
nodes:
test-cpu-usageszprm:
children:
- test-cpu-usageszprm-4131026553
displayName: test-cpu-usageszprm
finishedAt: "2022-12-18T11:13:25Z"
id: test-cpu-usageszprm
name: test-cpu-usageszprm
phase: Succeeded
progress: 1/1
resourcesDuration:
cpu: 173
memory: 79
startedAt: "2022-12-18T11:12:30Z"
templateName: main
templateScope: local/test-cpu-usageszprm
type: Retry
test-cpu-usageszprm-1944147306:
boundaryID: test-cpu-usageszprm-4131026553
children:
- test-cpu-usageszprm-2680509841
displayName: step1
finishedAt: "2022-12-18T11:13:25Z"
id: test-cpu-usageszprm-1944147306
name: test-cpu-usageszprm(0).step1
outputs:
artifacts:
- name: main-logs
s3:
key: test-cpu-usageszprm/test-cpu-usageszprm-2680509841/main.log
exitCode: "0"
phase: Succeeded
progress: 1/1
resourcesDuration:
cpu: 173
memory: 79
startedAt: "2022-12-18T11:12:30Z"
templateName: step1
templateScope: local/test-cpu-usageszprm
type: Retry
test-cpu-usageszprm-2680509841:
boundaryID: test-cpu-usageszprm-4131026553
displayName: step1(0)
finishedAt: "2022-12-18T11:13:15Z"
hostNodeName: center03-blade06
id: test-cpu-usageszprm-2680509841
name: test-cpu-usageszprm(0).step1(0)
outputs:
artifacts:
- name: main-logs
s3:
key: test-cpu-usageszprm/test-cpu-usageszprm-2680509841/main.log
exitCode: "0"
phase: Succeeded
progress: 1/1
resourcesDuration:
cpu: 173
memory: 79
startedAt: "2022-12-18T11:12:30Z"
templateName: step1
templateScope: local/test-cpu-usageszprm
type: Pod
test-cpu-usageszprm-4131026553:
children:
- test-cpu-usageszprm-1944147306
displayName: test-cpu-usageszprm(0)
finishedAt: "2022-12-18T11:13:25Z"
id: test-cpu-usageszprm-4131026553
name: test-cpu-usageszprm(0)
outboundNodes:
- test-cpu-usageszprm-2680509841
phase: Succeeded
progress: 1/1
resourcesDuration:
cpu: 173
memory: 79
startedAt: "2022-12-18T11:12:30Z"
templateName: main
templateScope: local/test-cpu-usageszprm
type: DAG
phase: Succeeded
progress: 1/1
resourcesDuration:
cpu: 173
memory: 79
startedAt: "2022-12-18T11:12:30Z"
Knowing that CPU and memory resources are set for each pod, I would like to know how the resourcesDuration values for CPU (173) and memory (79) are calculated.
I found this document (https://argoproj.github.io/argo-workflows/resource-duration/#example), which describes how the resource duration for CPU and memory is calculated. However, I am not able to arrive at 173 and 79 for the CPU and memory resource durations based on the pods' duration and the resource limits and requests.
Does anyone know how resourcesDuration is calculated?
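For reference, here is my attempt to reproduce the numbers from the formula on that page (a rough sketch only; I am assuming the documented base amounts of 1 CPU core and 100Mi of memory, and I am using the start/finish window of the Pod node above, which may well not be the window Argo actually accounts for):
# Sketch of the calculation described in the resource-duration docs.
# Assumptions: base amounts of 1 CPU core and 100Mi of memory, requests
# (not limits) are what count, and only the main container over the Pod
# node's 45-second window is included -- which clearly does not give 173/79.
pod_seconds = 45              # 11:12:30 -> 11:13:15 for the Pod node

cpu_request_cores = 3         # requests.cpu: "3"
mem_request_mi = 120          # requests.memory: 120Mi

cpu_duration = pod_seconds * cpu_request_cores / 1     # = 135, not 173
mem_duration = pod_seconds * mem_request_mi / 100      # = 54,  not 79
print(cpu_duration, mem_duration)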
I'm attempting to use StormCrawler to crawl a set of pages on our website, and while it is able to retrieve and index some of the text on each page, it is not capturing a large amount of the other text on the page.
I've installed Zookeeper, Apache Storm, and Stormcrawler using the Ansible playbooks provided here (thank you a million for those!) on a server running Ubuntu 18.04, along with Elasticsearch and Kibana. For the most part, I'm using the configuration defaults, but have made the following changes:
For the Elastic index mappings, I've enabled _source: true, and turned on indexing and storing for all properties (content, host, title, url)
In the crawler-conf.yaml configuration, I've commented out all textextractor.include.pattern and textextractor.exclude.tags settings so that the whole page is captured
After re-creating fresh ES indices, running mvn clean package, and starting the crawler topology, StormCrawler begins crawling and content starts appearing in Elasticsearch. However, for many pages the content that is retrieved and indexed is only a subset of all the text on the page, and it usually excludes the main page text we are interested in.
For example, the text at the following element path is not returned/indexed:
<html> <body> <div#maincontentcontainer.container> <div#docs-container> <div> <div.row> <div.col-lg-9.col-md-8.col-sm-12.content-item> <div> <div> <p> (text)
While the text in this path is returned:
<html> <body> <div> <div.container> <div.row> <p> (text)
Are there any additional configuration changes that need to be made, beyond commenting out the specific tag include and exclude patterns? From my understanding of the documentation, the defaults for those options should cause the whole page to be indexed.
I would greatly appreciate any help. Thank you for the excellent software.
Below are my configuration files:
crawler-conf.yaml
config:
topology.workers: 3
topology.message.timeout.secs: 1000
topology.max.spout.pending: 100
topology.debug: false
fetcher.threads.number: 100
# override the JVM parameters for the workers
topology.worker.childopts: "-Xmx2g -Djava.net.preferIPv4Stack=true"
# mandatory when using Flux
topology.kryo.register:
- com.digitalpebble.stormcrawler.Metadata
# metadata to transfer to the outlinks
# metadata.transfer:
# - customMetadataName
# lists the metadata to persist to storage
metadata.persist:
- _redirTo
- error.cause
- error.source
- isSitemap
- isFeed
http.agent.name: "My crawler"
http.agent.version: "1.0"
http.agent.description: ""
http.agent.url: ""
http.agent.email: ""
# The maximum number of bytes for returned HTTP response bodies.
http.content.limit: -1
# FetcherBolt queue dump => comment out to activate
# fetcherbolt.queue.debug.filepath: "/tmp/fetcher-dump-{port}"
parsefilters.config.file: "parsefilters.json"
urlfilters.config.file: "urlfilters.json"
# revisit a page daily (value in minutes)
fetchInterval.default: 1440
# revisit a page with a fetch error after 2 hours (value in minutes)
fetchInterval.fetch.error: 120
# never revisit a page with an error (or set a value in minutes)
fetchInterval.error: -1
# text extraction for JSoupParserBolt
# textextractor.include.pattern:
# - DIV[id="maincontent"]
# - DIV[itemprop="articleBody"]
# - ARTICLE
# textextractor.exclude.tags:
# - STYLE
# - SCRIPT
# configuration for the classes extending AbstractIndexerBolt
# indexer.md.filter: "someKey=aValue"
indexer.url.fieldname: "url"
indexer.text.fieldname: "content"
indexer.canonical.name: "canonical"
indexer.md.mapping:
- parse.title=title
- parse.keywords=keywords
- parse.description=description
- domain=domain
# Metrics consumers:
topology.metrics.consumer.register:
- class: "org.apache.storm.metric.LoggingMetricsConsumer"
parallelism.hint: 1
http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.selenium.RemoteDriverProtocol"
https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.selenium.RemoteDriverProtocol"
selenium.addresses: "http://localhost:9515"
es-conf.yaml
config:
# ES indexer bolt
es.indexer.addresses: "localhost"
es.indexer.index.name: "content"
# es.indexer.pipeline: "_PIPELINE_"
es.indexer.create: false
es.indexer.bulkActions: 100
es.indexer.flushInterval: "2s"
es.indexer.concurrentRequests: 1
# ES metricsConsumer
es.metrics.addresses: "http://localhost:9200"
es.metrics.index.name: "metrics"
# ES spout and persistence bolt
es.status.addresses: "http://localhost:9200"
es.status.index.name: "status"
es.status.routing: true
es.status.routing.fieldname: "key"
es.status.bulkActions: 500
es.status.flushInterval: "5s"
es.status.concurrentRequests: 1
# spout config #
# positive or negative filters parsable by the Lucene Query Parser
# es.status.filterQuery:
# - "-(key:stormcrawler.net)"
# - "-(key:digitalpebble.com)"
# time in secs for which the URLs will be considered for fetching after a ack of fail
spout.ttl.purgatory: 30
# Min time (in msecs) to allow between 2 successive queries to ES
spout.min.delay.queries: 2000
# Delay since previous query date (in secs) after which the nextFetchDate value will be reset to the current time
spout.reset.fetchdate.after: 120
es.status.max.buckets: 50
es.status.max.urls.per.bucket: 2
# field to group the URLs into buckets
es.status.bucket.field: "key"
# fields to sort the URLs within a bucket
es.status.bucket.sort.field:
- "nextFetchDate"
- "url"
# field to sort the buckets
es.status.global.sort.field: "nextFetchDate"
# CollapsingSpout : limits the deep paging by resetting the start offset for the ES query
es.status.max.start.offset: 500
# AggregationSpout : sampling improves the performance on large crawls
es.status.sample: false
# max allowed duration of a query in sec
es.status.query.timeout: -1
# AggregationSpout (expert): adds this value in mins to the latest date returned in the results and
# use it as nextFetchDate
es.status.recentDate.increase: -1
es.status.recentDate.min.gap: -1
topology.metrics.consumer.register:
- class: "com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer"
parallelism.hint: 1
#whitelist:
# - "fetcher_counter"
# - "fetcher_average.bytes_fetched"
#blacklist:
# - "__receive.*"
es-crawler.flux
name: "crawler"
includes:
- resource: true
file: "/crawler-default.yaml"
override: false
- resource: false
file: "crawler-conf.yaml"
override: true
- resource: false
file: "es-conf.yaml"
override: true
spouts:
- id: "spout"
className: "com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout"
parallelism: 10
- id: "filespout"
className: "com.digitalpebble.stormcrawler.spout.FileSpout"
parallelism: 1
constructorArgs:
- "."
- "seeds.txt"
- true
bolts:
- id: "filter"
className: "com.digitalpebble.stormcrawler.bolt.URLFilterBolt"
parallelism: 3
- id: "partitioner"
className: "com.digitalpebble.stormcrawler.bolt.URLPartitionerBolt"
parallelism: 3
- id: "fetcher"
className: "com.digitalpebble.stormcrawler.bolt.FetcherBolt"
parallelism: 3
- id: "sitemap"
className: "com.digitalpebble.stormcrawler.bolt.SiteMapParserBolt"
parallelism: 3
- id: "parse"
className: "com.digitalpebble.stormcrawler.bolt.JSoupParserBolt"
parallelism: 12
- id: "index"
className: "com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt"
parallelism: 3
- id: "status"
className: "com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt"
parallelism: 3
- id: "status_metrics"
className: "com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt"
parallelism: 3
streams:
- from: "spout"
to: "partitioner"
grouping:
type: SHUFFLE
- from: "spout"
to: "status_metrics"
grouping:
type: SHUFFLE
- from: "partitioner"
to: "fetcher"
grouping:
type: FIELDS
args: ["key"]
- from: "fetcher"
to: "sitemap"
grouping:
type: LOCAL_OR_SHUFFLE
- from: "sitemap"
to: "parse"
grouping:
type: LOCAL_OR_SHUFFLE
- from: "parse"
to: "index"
grouping:
type: LOCAL_OR_SHUFFLE
- from: "fetcher"
to: "status"
grouping:
type: FIELDS
args: ["url"]
streamId: "status"
- from: "sitemap"
to: "status"
grouping:
type: FIELDS
args: ["url"]
streamId: "status"
- from: "parse"
to: "status"
grouping:
type: FIELDS
args: ["url"]
streamId: "status"
- from: "index"
to: "status"
grouping:
type: FIELDS
args: ["url"]
streamId: "status"
- from: "filespout"
to: "filter"
grouping:
type: FIELDS
args: ["url"]
streamId: "status"
- from: "filter"
to: "status"
grouping:
streamId: "status"
type: CUSTOM
customClass:
className: "com.digitalpebble.stormcrawler.util.URLStreamGrouping"
constructorArgs:
- "byDomain"
parsefilters.json
{
  "com.digitalpebble.stormcrawler.parse.ParseFilters": [
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.XPathFilter",
      "name": "XPathFilter",
      "params": {
        "canonical": "//*[@rel=\"canonical\"]/@href",
        "parse.description": [
          "//*[@name=\"description\"]/@content",
          "//*[@name=\"Description\"]/@content"
        ],
        "parse.title": [
          "//TITLE",
          "//META[@name=\"title\"]/@content"
        ],
        "parse.keywords": "//META[@name=\"keywords\"]/@content"
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.LinkParseFilter",
      "name": "LinkParseFilter",
      "params": {
        "pattern": "//FRAME/@src"
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.DomainParseFilter",
      "name": "DomainParseFilter",
      "params": {
        "key": "domain",
        "byHost": false
      }
    },
    {
      "class": "com.digitalpebble.stormcrawler.parse.filter.CommaSeparatedToMultivaluedMetadata",
      "name": "CommaSeparatedToMultivaluedMetadata",
      "params": {
        "keys": ["parse.keywords"]
      }
    }
  ]
}
Attempting to use Chromedriver
I installed the latest versions of Chromedriver and Google Chrome for Ubuntu.
First I start chromedriver in headless mode at localhost:9515 as the stormcrawler user (via a separate Python shell, as shown below), and then I restart the stormcrawler topology (also as the stormcrawler user), but I end up with a stack of errors related to Chrome. The odd thing, however, is that I can confirm chromedriver is running fine within the Python shell directly, and I can confirm via ps -ef that both the driver and the browser are actively running. The same stack of errors also occurs when I simply start chromedriver from the command line (i.e., chromedriver --headless &).
Starting chromedriver in headless mode (in python3 shell)
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--window-size=1200x600')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-setuid-sandbox')
options.add_argument('--disable-extensions')
options.add_argument('--disable-infobars')
options.add_argument('--remote-debugging-port=9222')
options.add_argument('--user-data-dir=/home/stormcrawler/cache/google/chrome')
options.add_argument('--disable-gpu')
options.add_argument('--profile-directory=Default')
options.binary_location = '/usr/bin/google-chrome'
driver = webdriver.Chrome(chrome_options=options, port=9515, executable_path=r'/usr/bin/chromedriver')
Stack trace from starting stormcrawler topology
Run command: storm jar target/stormcrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local es-crawler.flux --sleep 60000
9486 [Thread-26-fetcher-executor[3 3]] ERROR o.a.s.util - Async loop died!
java.lang.RuntimeException: org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Build info: version: '4.0.0-alpha-6', revision: '5f43a29cfc'
System info: host: 'stormcrawler-dev', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.15.0-33-generic', java.version: '1.8.0_282'
Driver info: driver.version: RemoteWebDriver
remote stacktrace: #0 0x55d590b21e89 <unknown>
at com.digitalpebble.stormcrawler.protocol.selenium.RemoteDriverProtocol.configure(RemoteDriverProtocol.java:101) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at com.digitalpebble.stormcrawler.protocol.ProtocolFactory.<init>(ProtocolFactory.java:69) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at com.digitalpebble.stormcrawler.bolt.FetcherBolt.prepare(FetcherBolt.java:818) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.storm.daemon.executor$fn__10180$fn__10193.invoke(executor.clj:803) ~[storm-core-1.2.3.jar:1.2.3]
at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:482) [storm-core-1.2.3.jar:1.2.3]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
...
Confirming that chromedriver and chrome are both running and reachable
~/stormcrawler$ ps -ef | grep -i 'driver'
stormcr+ 18862 18857 0 14:28 pts/0 00:00:00 /usr/bin/chromedriver --port=9515
stormcr+ 18868 18862 0 14:28 pts/0 00:00:00 /usr/bin/google-chrome --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-gpu --disable-hang-monitor --disable-infobars --disable-popup-blocking --disable-prompt-on-repost --disable-setuid-sandbox --disable-sync --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --headless --log-level=0 --no-first-run --no-sandbox --no-service-autorun --password-store=basic --profile-directory=Default --remote-debugging-port=9222 --test-type=webdriver --use-mock-keychain --user-data-dir=/home/stormcrawler/cache/google/chrome --window-size=1200x600
stormcr+ 18899 18877 0 14:28 pts/0 00:00:00 /opt/google/chrome/chrome --type=renderer --no-sandbox --disable-dev-shm-usage --enable-automation --enable-logging --log-level=0 --remote-debugging-port=9222 --test-type=webdriver --allow-pre-commit-input --ozone-platform=headless --field-trial-handle=17069524199442920904,10206176048672570859,131072 --disable-gpu-compositing --enable-blink-features=ShadowDOMV0 --lang=en-US --headless --enable-crash-reporter --lang=en-US --num-raster-threads=1 --renderer-client-id=4 --shared-files=v8_context_snapshot_data:100
~/stormcrawler$ sudo netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 localhost:9222 0.0.0.0:* LISTEN 18026/google-chrome
tcp 0 0 localhost:9515 0.0.0.0:* LISTEN 18020/chromedriver
IIRC you need to set some additional config to work with ChromeDriver.
Alternatively (I haven't tried it yet), https://hub.docker.com/r/browserless/chrome would be a nice way of handling Chrome in a Docker container.
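As a quick sanity check (a diagnostic sketch only, not a StormCrawler fix): as far as I understand, the RemoteDriverProtocol opens remote sessions against whatever selenium.addresses points to, so you can try opening a Remote session against the same endpoint from Python, passing the same Chrome flags that made your local webdriver.Chrome test work. If this also fails with the DevToolsActivePort error, the problem sits between ChromeDriver and the Chrome it spawns (often the missing --no-sandbox / --disable-dev-shm-usage flags), not in the topology itself.
# Diagnostic sketch: open a Remote session against the ChromeDriver endpoint
# configured in selenium.addresses, with the flags that worked locally.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Remote(command_executor='http://localhost:9515',
                          options=options)
try:
    driver.get('https://example.com')   # any reachable page will do
    print(driver.title)
finally:
    driver.quit()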
I have an error while running a pipeline in Jenkins using a Kubernetes cloud.
Everything works fine until the npm install step, where I get: Cannot contact nodejs-rn5f3: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#3b1e0041:nodejs-rn5f3": Remote call on nodejs-rn5f3 failed. The channel is closing down or has closed down
How can I fix this error?
Here are my logs:
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Still waiting to schedule task
‘nodejs-rn5f3’ is offline
Agent nodejs-rn5f3 is provisioned from template nodejs
---
apiVersion: "v1"
kind: "Pod"
metadata:
labels:
jenkins: "slave"
jenkins/label-digest: "XXXXXXXXXXXXXXXXXXXXXXXXXX"
jenkins/label: "nodejs"
name: "nodejs-rn5f3"
spec:
containers:
- args:
- "cat"
command:
- "/bin/sh"
- "-c"
image: "node:15.5.1-alpine3.10"
imagePullPolicy: "IfNotPresent"
name: "node"
resources:
limits: {}
requests: {}
tty: true
volumeMounts:
- mountPath: "/home/jenkins/agent"
name: "workspace-volume"
readOnly: false
workingDir: "/home/jenkins/agent"
- env:
- name: "JENKINS_SECRET"
value: "********"
- name: "JENKINS_AGENT_NAME"
value: "nodejs-rn5f3"
- name: "JENKINS_WEB_SOCKET"
value: "true"
- name: "JENKINS_NAME"
value: "nodejs-rn5f3"
- name: "JENKINS_AGENT_WORKDIR"
value: "/home/jenkins/agent"
- name: "JENKINS_URL"
value: "http://XX.XX.XX.XX/"
image: "jenkins/inbound-agent:4.3-4"
name: "jnlp"
resources:
requests:
cpu: "100m"
memory: "256Mi"
volumeMounts:
- mountPath: "/home/jenkins/agent"
name: "workspace-volume"
readOnly: false
hostNetwork: false
nodeSelector:
kubernetes.io/os: "linux"
restartPolicy: "Never"
volumes:
- emptyDir:
medium: ""
name: "workspace-volume"
Running on nodejs-rn5f3 in /home/jenkins/agent/workspace/something
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Test)
[Pipeline] checkout
Selected Git installation does not exist. Using Default
[... cloning repository]
[Pipeline] container
[Pipeline] {
[Pipeline] sh
+ ls -la
total 1240
drwxr-xr-x 5 node node 4096 Feb 26 07:33 .
drwxr-xr-x 4 node node 4096 Feb 26 07:33 ..
-rw-r--r-- 1 node node 1689 Feb 26 07:33 package.json
and some other files and folders
[Pipeline] sh
+ cat package.json
{
[...]
"dependencies": {
[blabla....]
},
"devDependencies": {
[blabla...]
}
}
[Pipeline] sh
+ npm install
Cannot contact nodejs-rn5f3: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#3b1e0041:nodejs-rn5f3": Remote call on nodejs-rn5f3 failed. The channel is closing down or has closed down
At this stage, here are the logs of the jnlp container in my pod nodejs-rnf5f3:
INFO: Connected
Feb 26, 2021 8:05:53 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Read side closed
Feb 26, 2021 8:05:53 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Feb 26, 2021 8:05:53 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1 onReconnect
INFO: Restarting agent via jenkins.slaves.restarter.UnixSlaveRestarter#1a39588e
Feb 26, 2021 8:05:55 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: nodejs-rnf5f3
Feb 26, 2021 8:05:55 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Feb 26, 2021 8:05:55 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Feb 26, 2021 8:05:55 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Feb 26, 2021 8:05:55 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Feb 26, 2021 8:05:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: WebSocket connection open
Feb 26, 2021 8:05:58 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
.... same as above
I don't know where this error comes from. Is it related to resource usage?
Here is the resource usage of my containers:
POD NAME CPU(cores) MEMORY(bytes)
jenkins-1-jenkins-0 jenkins-master 61m 674Mi
nodejs-rnf5f3 jnlp 468m 104Mi
nodejs-rnf5f3 node 1243m 1284Mi
My cluster is a GKE cluster with 2 e2-medium nodes.
If I had to bet (but it's just a wild guess), I'd say the pod was killed due to running out of memory (OOMKilled).
The ChannelClosedException is a symptom, not the problem.
It's kind of hard to debug because the agent pod is being deleted. You can try kubectl get events in the relevant namespace, but events are only kept for 1 hour by default.
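If you manage to catch the pod (or its events) before they are gone, the container's last termination state will tell you whether it was OOM killed. A sketch using the official kubernetes Python client; the pod name and namespace below are placeholders, adjust them to your setup:
# Sketch: check recent events and the last termination state of the agent
# pod's containers using the official `kubernetes` Python client.
from kubernetes import client, config

config.load_kube_config()                      # or load_incluster_config()
v1 = client.CoreV1Api()
namespace = "jenkins"                          # placeholder

# Same information as `kubectl get events -n jenkins` (kept ~1h by default).
for ev in v1.list_namespaced_event(namespace).items:
    print(ev.last_timestamp, ev.reason, ev.message)

# If the pod still exists, an OOM-killed container reports reason "OOMKilled".
pod = v1.read_namespaced_pod(name="nodejs-rnf5f3", namespace=namespace)
for cs in pod.status.container_statuses or []:
    term = cs.last_state.terminated
    if term is not None:
        print(cs.name, term.reason, term.exit_code)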
My test environment is as follows:
CPU: Intel L5640 2.26 GHz 6 cores * 2 EA
Memory: SAMSUNG PC3-10600R 4 GB * 4 EA
HDD: TOSHIBA SAS 10,000 RPM 300 GB * 6 EA
OS: CentOS release 6.6 (Final)
Logstash 2.3.4
I used the following configuration:
input {
kafka {
zk_connect => '1.2.3.4:2181'
topic_id => 'some-log'
consumer_threads => 1
}
}
filter {
metrics {
meter => "events"
add_tag => "metric"
}
}
output {
if "metric" in [tags] {
stdout {
codec => line {
format => "Count: %{[events][count]}"
}
}
}
}
I got the following result:
./bin/logstash -f some-log-kafka.conf
Settings: Default pipeline workers: 24
Pipeline main started
Count: 9614
Count: 23080
Count: 37087
Count: 50815
Count: 64517
Count: 78296
Count: 91977
Count: 105990
The default flush_interval is 5 seconds, so that is roughly 14K events per 5 seconds (2.8K per second).
With consumer_threads set to 10, I got the following result:
./bin/logstash -f impression-log-kafka.conf
Settings: Default pipeline workers: 24
Pipeline main started
Count: 9599
Count: 23254
Count: 37253
Count: 51029
Count: 64881
Count: 78868
Count: 92663
Count: 106267
It looks like increasing consumer_threads doesn't make much difference.
Based on my simple no-op consumer benchmark, I expected around 30K per second, and at least 10K, but this is only about a tenth of the expected performance.
How can I enhance its performance?
Additional comment:
With the Kafka client Java library I'm using bootstrap servers, whereas with the Logstash Kafka input plugin I'm using ZooKeeper (there's no option for bootstrap servers). I'm not sure whether this could account for such a huge difference.
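For reference, the no-op baseline I compared against looks roughly like the sketch below (a kafka-python approximation, not the actual Java client I used; the broker address and topic are placeholders):
# Rough no-op consumer baseline: count how many messages can be pulled per
# second with no processing at all. kafka-python sketch with placeholder
# broker/topic; the real baseline used the Kafka Java client.
import time
from kafka import KafkaConsumer

consumer = KafkaConsumer('some-log',
                         bootstrap_servers='1.2.3.4:9092',
                         auto_offset_reset='earliest',
                         enable_auto_commit=False)
count = 0
start = time.time()
for _ in consumer:
    count += 1
    elapsed = time.time() - start
    if elapsed >= 10:
        print('%d msgs in %.1fs (%.0f msg/s)' % (count, elapsed, count / elapsed))
        break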
I am trying to install 3 Cassandra nodes using a BOSH release. I am getting the following error:
java.lang.UnsupportedOperationException: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true
While searching the net, I found that we need to introduce some delay when a node joins the cluster. How can I introduce this delay? Is there an attribute for this?
- name: cassandra_seed
templates:
- name: cassandra
release: cassandra
- name: collectd
release: metrics
- name: logstash-shipper
release: cassandra
- name: consul
release: consul
instances: 1
resource_pool: service-net-medium
persistent_disk: 10240
networks:
- name: ccc-service-net
default: [dns, gateway]
properties:
collectd:
plugin_templates: [cassandra]
cassandra:
broadcast_address: 0.cassandra-seed.ccc-service-net.<%= $deployment_name %>.microbosh
consul:
bootstrap_expect: 0
join_hosts: ["0.vault-consul.ccc-service-net.<%= $deployment_name %>.microbosh"]
service:
name: cassandra
process:
name: ps -ef |grep cassandra |grep -v grep || exit 2
server: false
default_recursor: 8.8.8.8
update:
serial: false
Error
root@9e3c9ac3-1832-48cf-a58c-3ef25ee17869:/var/vcap/sys/log/cassandra# vim cassandra.stderr.log
java.lang.UnsupportedOperationException: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:584)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:855)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:725)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:625)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:366)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:581)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:710)