All HikariCP connections are in use - Slick

hikaricp: 2.3.7
mysql-connector-j: 5.1.30
slick: 3.1.0
HikariCP Config
autoCommit......................true
catalog.........................
connectionCustomizer............com.zaxxer.hikari.AbstractHikariConfig$1#37b70343
connectionCustomizerClassName...
connectionInitSql...............
connectionTestQuery.............
connectionTimeout...............30000
dataSource......................
dataSourceClassName.............
dataSourceJNDI..................
dataSourceProperties............{password=<masked>}
driverClassName.................com.mysql.jdbc.Driver
healthCheckProperties...........{}
healthCheckRegistry.............
idleTimeout.....................600000
initializationFailFast..........true
isolateInternalQueries..........false
jdbc4ConnectionTest.............false
jdbcUrl.........................jdbc:mysql://10.66.108.202/weixin_prod
leakDetectionThreshold..........0
maxLifetime.....................1800000
maximumPoolSize.................10
metricRegistry..................
minimumIdle.....................10
password........................<masked>
poolName........................HikariPool-0
readOnly........................false
registerMbeans..................false
threadFactory...................
transactionIsolation............
username........................weixin_prod
validationTimeout...............5000
HikariCP pool HikariPool-0 configuration:
allowPoolSuspension.............false
autoCommit......................true
catalog.........................
connectionCustomizer............com.zaxxer.hikari.AbstractHikariConfig$1#37b70343
connectionCustomizerClassName...
connectionInitSql...............
connectionTestQuery.............
connectionTimeout...............30000
dataSource......................
dataSourceClassName.............
dataSourceJNDI..................
dataSourceProperties............{password=<masked>}
driverClassName.................com.mysql.jdbc.Driver
healthCheckProperties...........{}
healthCheckRegistry.............
idleTimeout.....................600000
Logs
2015-10-23 10:25:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in Hikari Housekeeping Timer (pool HikariPool-0) - Before cleanup pool stats HikariPool-0 (total=10, inUse=0, avail=10, waiting=0)
2015-10-23 10:25:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in Hikari Housekeeping Timer (pool HikariPool-0) - After cleanup pool stats HikariPool-0 (total=10, inUse=0, avail=10, waiting=0)
2015-10-23 10:25:51 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in Hikari Housekeeping Timer (pool HikariPool-0) - Before cleanup pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=0)
2015-10-23 10:25:51 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in Hikari Housekeeping Timer (pool HikariPool-0) - After cleanup pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=0)
2015-10-23 10:25:51 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in HikariCP connection filler (pool HikariPool-0) - After fill pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=0)
2015-10-23 10:26:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in Hikari Housekeeping Timer (pool HikariPool-0) - Before cleanup pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=10)
2015-10-23 10:26:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in Hikari Housekeeping Timer (pool HikariPool-0) - After cleanup pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=10)
2015-10-23 10:26:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in HikariCP connection filler (pool HikariPool-0) - After fill pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=10)
2015-10-23 10:26:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in slick-async-executor-3 - Timeout failure pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=9)
2015-10-23 10:26:21 +0800 [DEBUG] from com.zaxxer.hikari.pool.HikariPool in slick-async-executor-8 - Timeout failure pool stats HikariPool-0 (total=10, inUse=10, avail=0, waiting=9)
View detailed log
All connections are in use after 2015-10-23 10:26:21, and from then on every operation waits and times out until the whole app is restarted.
What's wrong?
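One way to narrow this down: the dump shows leakDetectionThreshold=0, so HikariCP never reports connections that are checked out but never returned. A minimal sketch of enabling it, assuming the pool is configured through Slick's Database.forConfig and the usual Slick 3.1 HikariCP keys (the config name weixinDb and the numThreads/maxConnections values here are illustrative, adjust to your setup):

# application.conf (sketch)
weixinDb {
  driver         = "com.mysql.jdbc.Driver"
  url            = "jdbc:mysql://10.66.108.202/weixin_prod"
  user           = "weixin_prod"
  password       = "<masked>"
  numThreads     = 10      # Slick async executor threads
  maxConnections = 10      # keep in line with maximumPoolSize
  queueSize      = 1000
  # warn (with a stack trace) when a connection is held for more than 60 s
  leakDetectionThreshold = 60000
}

// Scala side
val db = slick.jdbc.JdbcBackend.Database.forConfig("weixinDb")

If the leak warnings then point at DBIO actions or transactions that never complete (or streaming results that are never fully consumed), that would explain why the pool stays at total=10, inUse=10 with nothing ever being returned.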

Related

Spring Cloud Kafka Streams terminating in Azure Kubernetes Service

I created a Kafka Streams application with Spring Cloud Stream which reads data from one topic and writes to another topic, and I'm trying to deploy and run the job in AKS with an ACR image, but the stream gets closed without any error after reading all the available messages (lag 0) in the topic. The strange thing is that it runs fine in IntelliJ.
Here are my AKS pod logs:
[2021-03-02 17:30:39,131] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.NetworkClient NetworkClient.java:840] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Received FETCH response from node 3 for request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, correlationId=62): org.apache.kafka.common.requests.FetchResponse#7b021a01
[2021-03-02 17:30:39,131] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.FetchSessionHandler FetchSessionHandler.java:463] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Node 0 sent an incremental fetch response with throttleTimeMs = 3 for session 614342128 with 0 response partition(s), 1 implied partition(s)
[2021-03-02 17:30:39,132] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.c.i.Fetcher Fetcher.java:1177] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Added READ_UNCOMMITTED fetch request for partition test.topic at position FetchPosition{offset=128, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[vm3.lab (id: 3 rack: 1)], epoch=1}} to node vm3.lab (id: 3 rack: 1)
[2021-03-02 17:30:39,132] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.FetchSessionHandler FetchSessionHandler.java:259] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Built incremental fetch (sessionId=614342128, epoch=49) for node 3. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s) out of 1 partition(s)
[2021-03-02 17:30:39,132] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.c.i.Fetcher Fetcher.java:261] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(test.topic)) to broker vm3.lab (id: 3 rack: 1)
[2021-03-02 17:30:39,132] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.NetworkClient NetworkClient.java:505] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, correlationId=63) and timeout 60000 to node 3: {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=614342128,session_epoch=49,topics=[],forgotten_topics_data=[],rack_id=}
[2021-03-02 17:30:39,636] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.NetworkClient NetworkClient.java:840] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Received FETCH response from node 3 for request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, correlationId=63): org.apache.kafka.common.requests.FetchResponse#50fb365c
[2021-03-02 17:30:39,636] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.FetchSessionHandler FetchSessionHandler.java:463] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Node 0 sent an incremental fetch response with throttleTimeMs = 3 for session 614342128 with 0 response partition(s), 1 implied partition(s)
[2021-03-02 17:30:39,637] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.c.i.Fetcher Fetcher.java:1177] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Added READ_UNCOMMITTED fetch request for partition test.topic at position FetchPosition{offset=128, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[vm3.lab (id: 3 rack: 1)], epoch=1}} to node vm3.lab (id: 3 rack: 1)
[2021-03-02 17:30:39,637] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.FetchSessionHandler FetchSessionHandler.java:259] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Built incremental fetch (sessionId=614342128, epoch=50) for node 3. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s) out of 1 partition(s)
[2021-03-02 17:30:39,637] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.c.i.Fetcher Fetcher.java:261] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(test.topic)) to broker vm3.lab (id: 3 rack: 1)
[2021-03-02 17:30:39,637] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.c.NetworkClient NetworkClient.java:505] [Consumer clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, groupId=latest] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1-consumer, correlationId=64) and timeout 60000 to node 3: {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=614342128,session_epoch=50,topics=[],forgotten_topics_data=[],rack_id=}
[2021-03-02 17:30:39,710] [DEBUG] [SpringContextShutdownHook] [o.s.c.a.AnnotationConfigApplicationContext AbstractApplicationContext.java:1006] Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#dc9876b, started on Tue Mar 02 17:29:08 GMT 2021
[2021-03-02 17:30:39,715] [DEBUG] [SpringContextShutdownHook] [o.s.c.a.AnnotationConfigApplicationContext AbstractApplicationContext.java:1006] Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#71391b3f, started on Tue Mar 02 17:29:12 GMT 2021, parent: org.springframework.context.annotation.AnnotationConfigApplicationContext#dc9876b
[2021-03-02 17:30:39,718] [DEBUG] [SpringContextShutdownHook] [o.s.c.s.DefaultLifecycleProcessor DefaultLifecycleProcessor.java:369] Stopping beans in phase 2147483547
[2021-03-02 17:30:39,718] [DEBUG] [SpringContextShutdownHook] [o.s.c.s.DefaultLifecycleProcessor DefaultLifecycleProcessor.java:242] Bean 'org.springframework.kafka.config.internalKafkaListenerEndpointRegistry' completed its stop procedure
[2021-03-02 17:30:39,719] [DEBUG] [SpringContextShutdownHook] [o.a.k.s.KafkaStreams KafkaStreams.java:1016] stream-client [latest-e07d649d-5178-4107-898b-08b8008d822e] Stopping Streams client with timeoutMillis = 10000 ms.
[2021-03-02 17:30:39,719] [INFO] [SpringContextShutdownHook] [o.a.k.s.KafkaStreams KafkaStreams.java:287] stream-client [latest-e07d649d-5178-4107-898b-08b8008d822e] State transition from RUNNING to PENDING_SHUTDOWN
[2021-03-02 17:30:39,729] [INFO] [kafka-streams-close-thread] [o.a.k.s.p.i.StreamThread StreamThread.java:1116] stream-thread [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] Informed to shut down
[2021-03-02 17:30:39,729] [INFO] [kafka-streams-close-thread] [o.a.k.s.p.i.StreamThread StreamThread.java:221] stream-thread [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
[2021-03-02 17:30:39,788] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.s.p.i.StreamThread StreamThread.java:772] stream-thread [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] State already transits to PENDING_SHUTDOWN, skipping the run once call after poll request
[2021-03-02 17:30:39,788] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.s.p.i.StreamThread StreamThread.java:206] stream-thread [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] Ignoring request to transit from PENDING_SHUTDOWN to PENDING_SHUTDOWN: only DEAD state is a valid next state
[2021-03-02 17:30:39,788] [INFO] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.s.p.i.StreamThread StreamThread.java:1130] stream-thread [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] Shutting down
[2021-03-02 17:30:39,788] [DEBUG] [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] [o.a.k.s.p.i.AssignedStreamsTasks AssignedStreamsTasks.java:529] stream-thread [latest-e07d649d-5178-4107-898b-08b8008d822e-StreamThread-1] Clean shutdown of all active tasks
Please advise.

gunicorn killing workers repeatedly with no traffic

I have a Flask app running on top of gunicorn which works fine on CentOS 8 and RHEL 8. I'm trying to bring the app up on an RHEL 8 derivative which is hardened for FIPS/military compliance. On this new OS, workers constantly time out even with no REST traffic offered.
The keepalive files are not having their timestamps updated by the worker threads, which leads the arbiter to kill the workers off. I've added enough debug code to know that much.
I've studied the code and see that gunicorn uses a notify mechanism which relies on updating the permissions on temporary files. I've been looking around for these files but can't seem to find them. I've looked in /tmp (which I believe to be the default) and even set --worker-tmp-dir /tmp/gunicorn with no luck. Both are empty.
Any advice on how to proceed here will be appreciated.
[root@host-172-23-14-219 ~]# ls -ltr /tmp/gunicorn/
total 0
[root@host-172-23-14-219 ~]# ls -latr /tmp/gunicorn/
total 0
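For what it's worth, an empty directory does not necessarily mean the mechanism is broken: gunicorn creates the heartbeat file and immediately unlinks it, keeping only the open file descriptor, so it never shows up in ls. A rough sketch of how I understand the mechanism (paraphrased, not the actual gunicorn source):

import os, tempfile, time

# worker side: create the heartbeat file in worker_tmp_dir and remove it right away
fd, name = tempfile.mkstemp(prefix="wgunicorn-", dir="/tmp/gunicorn")  # directory must exist
os.unlink(name)                       # unlinked immediately -- this is why ls shows nothing

def notify(spin=[0]):
    # called by the worker on every loop iteration; flipping the mode updates the inode's st_ctime
    spin[0] = (spin[0] + 1) % 2
    os.fchmod(fd, spin[0])

def last_update():
    # the arbiter reads the change time through the still-open descriptor
    return os.fstat(fd).st_ctime

notify()
if time.time() - last_update() > 30:  # cfg.timeout
    print("WORKER TIMEOUT")

So if the timestamps are not moving, one thing worth checking is whether fchmod/fstat ctime behaves as expected on that filesystem under the hardened configuration, rather than whether the files are visible.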
These are the messages I see in the debug output. They repeat every 30 seconds, which is the default value for timeout. If I change the timeout, the time before the workers are killed increases.
[2021-01-15 16:33:21 +0300] [9120] [CRITICAL] WORKER TIMEOUT (pid:14483)
[2021-01-15 16:33:21 +0300] [9120] [CRITICAL] WORKER TIMEOUT (pid:14485)
[2021-01-15 16:33:21 +0300] [9120] [CRITICAL] WORKER TIMEOUT (pid:14488)
[2021-01-15 16:33:21 +0300] [14483] [INFO] Worker exiting (pid: 14483)
[2021-01-15 16:33:21 +0300] [14485] [INFO] Worker exiting (pid: 14485)
[2021-01-15 16:33:21 +0300] [14488] [INFO] Worker exiting (pid: 14488)
[2021-01-15 16:33:21 +0300] [14782] [INFO] Booting worker with pid: 14782
[2021-01-15 16:33:21 +0300] [9120] [DEBUG] 1 workers
[2021-01-15 16:33:21 +0300] [14783] [INFO] Booting worker with pid: 14783
[2021-01-15 16:33:21 +0300] [14785] [INFO] Booting worker with pid: 14785
[2021-01-15 16:33:21 +0300] [9120] [DEBUG] 3 workers
Complete trace running gunicorn from the command line.
[root@dbaas1 restapi]# /usr/bin/python3 /usr/local/bin/gunicorn --log-level debug --log-syslog --workers 3 --bind 127.0.0.1:5000 -m 007 wsgi:app
[2021-01-18 22:11:23 +0300] [2942060] [DEBUG] Current configuration:
config: None
bind: ['127.0.0.1:5000']
backlog: 2048
workers: 3
worker_class: sync
threads: 1
worker_connections: 1000
max_requests: 0
max_requests_jitter: 0
timeout: 30
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /opt/rbbn/dbaas/src/serviceDiscovery/restapi
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 0
group: 0
umask: 7
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: None
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: -
loglevel: debug
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
syslog_addr: udp://localhost:514
syslog: True
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: wsgi:app
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0x7f13b046be18>
on_reload: <function OnReload.on_reload at 0x7f13b046bf28>
when_ready: <function WhenReady.when_ready at 0x7f13b04860d0>
pre_fork: <function Prefork.pre_fork at 0x7f13b04861e0>
post_fork: <function Postfork.post_fork at 0x7f13b04862f0>
post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f13b0486400>
worker_int: <function WorkerInt.worker_int at 0x7f13b0486510>
worker_abort: <function WorkerAbort.worker_abort at 0x7f13b0486620>
pre_exec: <function PreExec.pre_exec at 0x7f13b0486730>
pre_request: <function PreRequest.pre_request at 0x7f13b0486840>
post_request: <function PostRequest.post_request at 0x7f13b04868c8>
child_exit: <function ChildExit.child_exit at 0x7f13b04869d8>
worker_exit: <function WorkerExit.worker_exit at 0x7f13b0486ae8>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f13b0486bf8>
on_exit: <function OnExit.on_exit at 0x7f13b0486d08>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2021-01-18 22:11:23 +0300] [2942060] [INFO] Starting gunicorn 20.0.4
[2021-01-18 22:11:23 +0300] [2942060] [DEBUG] Arbiter booted
[2021-01-18 22:11:23 +0300] [2942060] [INFO] Listening at: http://127.0.0.1:5000 (2942060)
[2021-01-18 22:11:23 +0300] [2942060] [INFO] Using worker: sync
[2021-01-18 22:11:23 +0300] [2942066] [INFO] Booting worker with pid: 2942066
[2021-01-18 22:11:23 +0300] [2942068] [INFO] Booting worker with pid: 2942068
[2021-01-18 22:11:23 +0300] [2942069] [INFO] Booting worker with pid: 2942069
[2021-01-18 22:11:23 +0300] [2942060] [DEBUG] 3 workers
[2021-01-18 22:11:53 +0300] [2942060] [CRITICAL] WORKER TIMEOUT (pid:2942066)
[2021-01-18 22:11:53 +0300] [2942060] [CRITICAL] WORKER TIMEOUT (pid:2942068)
[2021-01-18 22:11:53 +0300] [2942060] [CRITICAL] WORKER TIMEOUT (pid:2942069)
[2021-01-18 22:11:53 +0300] [2942066] [INFO] Worker exiting (pid: 2942066)
[2021-01-18 22:11:53 +0300] [2942068] [INFO] Worker exiting (pid: 2942068)
[2021-01-18 22:11:53 +0300] [2942069] [INFO] Worker exiting (pid: 2942069)
[2021-01-18 22:11:53 +0300] [2942357] [INFO] Booting worker with pid: 2942357
[2021-01-18 22:11:53 +0300] [2942060] [DEBUG] 1 workers
[2021-01-18 22:11:53 +0300] [2942359] [INFO] Booting worker with pid: 2942359
[2021-01-18 22:11:53 +0300] [2942360] [INFO] Booting worker with pid: 2942360
[2021-01-18 22:11:53 +0300] [2942060] [DEBUG] 3 workers
I ran into a similar problem - everything works fine on Ubuntu 18.04, but fails consistently with similar errors/logging on Ubuntu 22.04.
Try increasing the timeout above 30 in the gunicorn config/settings.
Changing the timeout to 90 was sufficient for me to get things working again; my best guess is that the initial load of the Flask app takes longer on some machines, so it hits this limit.
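For example (both flags are standard gunicorn options; /dev/shm as the tmp dir is just a commonly suggested workaround for slow or unusual tmp filesystems, adjust to your environment):

gunicorn --timeout 90 --worker-tmp-dir /dev/shm --workers 3 --bind 127.0.0.1:5000 wsgi:app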

Hive on Spark infinite connections

I'm trying to run a simple query on a table with only 10 rows:
select MAX(Column3) from table;
However, the Spark application runs indefinitely, printing the following messages:
2017-05-10T16:23:40,397 DEBUG [IPC Parameter Sending Thread #0] ipc.Client: IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu sending #1841
2017-05-10T16:23:40,397 DEBUG [IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu] ipc.Client: IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu got value #1841
2017-05-10T16:23:40,397 DEBUG [main] ipc.ProtobufRpcEngine: Call: getApplicationReport took 0ms
2017-05-10T16:23:41,397 DEBUG [main] security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
2017-05-10T16:23:41,398 DEBUG [IPC Parameter Sending Thread #0] ipc.Client: IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu sending #1842
2017-05-10T16:23:41,398 DEBUG [IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu] ipc.Client: IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu got value #1842
2017-05-10T16:23:41,398 DEBUG [main] ipc.ProtobufRpcEngine: Call: getApplicationReport took 1ms
2017-05-10T16:23:41,399 DEBUG [main] security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
2017-05-10T16:23:41,399 DEBUG [IPC Parameter Sending Thread #0] ipc.Client: IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu sending #1843
2017-05-10T16:23:41,399 DEBUG [IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu] ipc.Client: IPC Client (1360312263) connection to /0.0.0.0:8032 from ubuntu got value #1843
2017-05-10T16:23:41,399 DEBUG [main] ipc.ProtobufRpcEngine: Call: getApplicationReport took 0ms
The issue was related to an unhealthy node, so YARN was not able to assign the task. The solution was to increase the maximum disk utilization percentage in yarn-site.xml, because my disk was at 97% usage:
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>99</value>
</property>
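If it helps, the node health and the actual disk usage can usually be confirmed before (or after) changing the threshold, e.g.:

yarn node -list -all    # node states and health reports; look for UNHEALTHY / "local-dirs are bad"
df -h                   # check how full the disks behind yarn.nodemanager.local-dirs really are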

ConfigurationException while launching Apache Cassandra DB: This node was decommissioned and will not rejoin the ring

This is a snippet from the system log while shutting down:
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:28:50,995 StorageService.java:3788 - Announcing that I have left the ring for 30000ms
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,995 ThriftServer.java:142 - Stop listening to thrift clients
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,997 Server.java:182 - Stop listening for CQL clients
WARN [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,997 Gossiper.java:1508 - No local state or state is in silent shutdown, not announcing shutdown
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,997 MessagingService.java:786 - Waiting for messaging service to quiesce
INFO [ACCEPT-sysengplayl0127.bio-iad.ea.com/10.72.194.229] 2016-07-27 22:29:20,998 MessagingService.java:1133 - MessagingService has terminated the accept() thread
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:21,022 StorageService.java:1411 - DECOMMISSIONED
INFO [main] 2016-07-27 22:32:17,534 YamlConfigurationLoader.java:89 - Configuration location: file:/opt/cassandra/product/apache-cassandra-3.7/conf/cassandra.yaml
And then while starting up:
INFO [main] 2016-07-27 22:32:20,316 StorageService.java:630 - Cassandra version: 3.7
INFO [main] 2016-07-27 22:32:20,316 StorageService.java:631 - Thrift API version: 20.1.0
INFO [main] 2016-07-27 22:32:20,316 StorageService.java:632 - CQL supported versions: 3.4.2 (default: 3.4.2)
INFO [main] 2016-07-27 22:32:20,351 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 397 MB and a resize interval of 60 minutes
ERROR [main] 2016-07-27 22:32:20,357 CassandraDaemon.java:731 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set, or all existing data is removed and the node is bootstrapped again
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:815) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:725) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:625) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:370) [apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:585) [apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:714) [apache-cassandra-3.7.jar:3.7]
WARN [StorageServiceShutdownHook] 2016-07-27 22:32:20,358 Gossiper.java:1508 - No local state or state is in silent shutdown, not announcing shutdown
INFO [StorageServiceShutdownHook] 2016-07-27 22:32:20,359 MessagingService.java:786 - Waiting for messaging service to quiesce
Is there something wrong with the configuration?
I had faced the same issue.
Posting the answer so that it might help others.
As the log suggests, the property cassandra.override_decommission should be set.
Start Cassandra with:
cassandra -Dcassandra.override_decommission=true
This should add the node back to the cluster.
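If Cassandra is started as a service rather than directly from the command line, the same system property can typically be appended to JVM_OPTS in conf/cassandra-env.sh (path and mechanism may differ per install); remember to remove it again once the node has rejoined:

JVM_OPTS="$JVM_OPTS -Dcassandra.override_decommission=true"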

Unable to Start Production Profile in JHipster Version 1.0.0

Hello everyone,
I have just updated JHipster to 1.0.0, but when I create a project without WebSocket (i.e. Atmosphere), it does not work with the production profile. I get the following output while trying to run the project with the production profile. Please help me.
[DEBUG] com.application.gom.config.AsyncConfiguration - Creating Async Task Executor
[DEBUG] com.application.gom.config.MetricsConfiguration - Registring JVM gauges
[INFO] com.application.gom.config.MetricsConfiguration - Initializing Metrics JMX reporting
[INFO] com.hazelcast.instance.DefaultAddressPicker - null [dev] [3.2.5] Prefer IPv4 stack is true.
[INFO] com.hazelcast.instance.DefaultAddressPicker - null [dev] [3.2.5] Picked Address[192.168.1.11]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
[INFO] com.hazelcast.system - [192.168.1.11]:5701 [dev] [3.2.5] Hazelcast 3.2.5 (20140814) starting at Address[192.168.1.11]:5701
[INFO] com.hazelcast.system - [192.168.1.11]:5701 [dev] [3.2.5] Copyright (c) 2008-2014 Hazelcast.com
[INFO] com.hazelcast.instance.Node - [192.168.1.11]:5701 [dev] [3.2.5] Creating MulticastJoiner
[INFO] com.hazelcast.core.LifecycleService - [192.168.1.11]:5701 [dev] [3.2.5] Address[192.168.1.11]:5701 is STARTING
[INFO] com.hazelcast.nio.SocketConnector - [192.168.1.11]:5701 [dev] [3.2.5] Connecting to /192.168.1.12:5701, timeout: 0, bind-any: true
[INFO] com.hazelcast.nio.TcpIpConnectionManager - [192.168.1.11]:5701 [dev] [3.2.5] 55994 accepted socket connection from /192.168.1.12:5701
[INFO] com.hazelcast.nio.TcpIpConnection - [192.168.1.11]:5701 [dev] [3.2.5] Connection [Address[192.168.1.12]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
[WARN] com.hazelcast.nio.ReadHandler - [192.168.1.11]:5701 [dev] [3.2.5] hz.gomapplication.IO.thread-in-0 Closing socket to endpoint Address[192.168.1.12]:5701, Cause:java.io.EOFException: Remote socket closed!
[INFO] com.hazelcast.nio.SocketConnector - [192.168.1.11]:5701 [dev] [3.2.5] Connecting to /192.168.1.12:5701, timeout: 0, bind-any: true
[INFO] com.hazelcast.nio.TcpIpConnectionManager - [192.168.1.11]:5701 [dev] [3.2.5] 55995 accepted socket connection from /192.168.1.12:5701
The error you have has nothing to do with Atmosphere, but with Hazelcast.
It looks like you selected the Hazelcast option and that it can't create its cluster because of an issue with host 192.168.1.12 -> this is probably a network error; do you have a firewall running?
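A quick way to rule out a firewall, assuming nc is available, is to check reachability of the Hazelcast port from the node that logs the EOFException (192.168.1.11 here) toward the other member:

nc -vz 192.168.1.12 5701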
