Frequent reports of JGroups dropped messages during a performance test (multicast)

JGroups version: 3.4.4
Config:
UDP(mcast_addr=228.6.7.8;mcast_port=22222;ip_ttl=8;mcast_send_buf_size=150000;mcast_recv_buf_size=80000):PING(timeout=2000;num_initial_members=3):MERGE2(min_interval=5000;max_interval=10000):FD_SOCK:VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800):UNICAST(timeout=5000):pbcast.STABLE(desired_avg_gossip=20000):FRAG(frag_size=4096;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false):pbcast.STATE_TRANSFER
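For context, this colon-separated stack string is the kind of value the application passes when it creates its channel. A minimal sketch of that usage follows; the cluster name is hypothetical, and it assumes JChannel(String) still accepts this legacy plain-string format in 3.4.x (if the real setup loads an XML file, that path would be passed instead):

import org.jgroups.JChannel;
import org.jgroups.Message;

public class ClusterNode {
    public static void main(String[] args) throws Exception {
        // the full stack string from the config above would go here
        String props = "UDP(mcast_addr=228.6.7.8;mcast_port=22222;ip_ttl=8):PING(timeout=2000;num_initial_members=3):...";
        JChannel channel = new JChannel(props);      // assumption: legacy property-string config is accepted
        channel.connect("perf-test-cluster");        // hypothetical cluster name
        channel.send(new Message(null, "hello"));    // null destination = multicast to the whole group
        channel.close();
    }
}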
During a performance test (four-node JGroups cluster) there are frequent reports (WARN severity) of dropped/failed messages from JGroups.
These reports also occur occasionally during normal traffic (and sometimes even when idle).
WARN messages:
WARN [tid=bpsp-XYZ01-150615142241421-618331378-0-2] [bpsp-XYZ01-4296] org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: bpsp-XYZ01-4296: dropped message 330 from non-member bpsp-XYZ03-31792 (view=[bpsp-XYZ02-58715|3260] (15) [bpsp-XYZ02-58715, bpsp-XYZ04-33866, bpsp-XYZ04-35133, bpsp-XYZ01-1551, bpsp-XYZ02-17088, bpsp-XYZ02-2701, bpsp-XYZ02-50162, bpsp-XYZ02-19697, bpsp-XYZ01-8027, bpsp-XYZ01-32523, bpsp-XYZ01-4296, bpsp-XYZ01-10112, bpsp-XYZ04-10116, bpsp-XYZ04-48624, bpsp-XYZ04-16847])
WARN [tid=bpsp-XYZ01-150615142241421-618331378-0-1b] [bpsp-XYZ01-28987] org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: bpsp-XYZ01-28987: dropped message 1447 from non-member bpsp-XYZ03-54278 (view=[bpsp-XYZ02-33248|3071] (3) [bpsp-XYZ02-33248, bpsp-XYZ01-28987, bpsp-XYZ04-38112])
WARN [tid=bpsp-XYZ01-150615142241421-618331378-0-2a] [bpsp-XYZ01-46462] org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: bpsp-XYZ01-46462: dropped message 4146 from non-member bpsp-XYZ03-39195 (view=[bpsp-XYZ04-59688|3045] (3) [bpsp-XYZ04-59688, bpsp-XYZ01-46462, bpsp-XYZ02-14036])
WARN [tid=bpsp-XYZ01-150615142241421-618331378-0-5] [XYZ01-34208] org.jgroups.protocols.pbcast.GMS - bpsp-XYZ01-34208: failed to collect all ACKs (expected=3) for view [bpsp-XYZ03-20005|3047] after 2000ms, missing 1 ACKs from bpsp-XYZ02-33303
WARN [tid=bpsp-XYZ01-150615142241421-618331378-0-12] [bpsp-XYZ01-36922] org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: bpsp-XYZ01-36922: dropped message 541 from non-member bpsp-XYZ02-55864 (view=[bpsp-XYZ04-32626|3097] (3) [bpsp-XYZ04-32626, bpsp-XYZ01-36922, bpsp-XYZ03-58090])
I would appreciate any ideas on this.
Thanks

Related

rampUsers method is getting stuck in Gatling 3.3

I am having issues using the rampUsers() method in my Gatling script. The run gets stuck after the following entry, which shows it had reached the halfway point.
Version: 3.3
================================================================================
2019-12-18 09:51:44 45s elapsed
---- Requests ------------------------------------------------------------------
> Global (OK=2 KO=0 )
> graphql / request_0 (OK=1 KO=0 )
> rest / request_0 (OK=1 KO=0 )
---- xxxSimulation ---------------------------------------------------
[##################################### ] 50%
waiting: 1 / active: 0 / done: 1
================================================================================
I am seeing the following in the log, which gets repeated forever and causes the log size to keep growing:
09:35:46.495 [GatlingSystem-akka.actor.default-dispatcher-2] DEBUG io.gatling.core.controller.inject.open.OpenWorkload - Injecting 0 users in scenario xxSimulation, continue=true
09:35:47.494 [GatlingSystem-akka.actor.default-dispatcher-6] DEBUG io.gatling.core.controller.inject.open.OpenWorkload - Injecting 0 users in scenario xxSimulation, continue=true
The above issue happens only with rampUsers and does not happen with:
atOnceUsers()
rampUsersPerSec()
rampConcurrentUsers()
constantConcurrentUsers()
constantUsersPerSec()
incrementUsersPerSec()
Is there a way to mimic rampUsers() some other way, or is there a solution for this?
My code is very minimal:
setUp(
  scenarioBuilder.inject(
    rampUsers(2).during(1 minutes)
  )
).protocols(protocolBuilder)
I have been stuck with this for some time; my earlier post with more information can be found here.
Can any of the Gatling experts help me with this?
Thanks for looking into it.
It seems you have slightly incorrect syntax for rampUsers. You should try removing the . before during.
I have this code in my own script and it works fine:
setUp(userScenario.inject(
  // atOnceUsers(4),
  rampUsers(24) during (1 seconds))
).protocols(httpProtocol)
Also, the example in the Gatling documentation (Open model) is likewise without a dot:
setUp(
  scn.inject(
    nothingFor(4 seconds), // 1
    atOnceUsers(10), // 2
    rampUsers(10) during (5 seconds), // HERE
    constantUsersPerSec(20) during (15 seconds), // 4
    constantUsersPerSec(20) during (15 seconds) randomized, // 5
    rampUsersPerSec(10) to 20 during (10 minutes), // 6
    rampUsersPerSec(10) to 20 during (10 minutes) randomized, // 7
    heavisideUsers(1000) during (20 seconds) // 8
  ).protocols(httpProtocol)
)
My guess is that the syntax can't be parsed, so 0 is substituted instead. (Here is an example of rounding; not directly applicable, but as a reference: gatling-user-injection-constantuserspersec.)
Also, you mentioned that the other methods work; could you paste the working code as well?

tmux crashing after refocusing on the tmux session running vim with mouse enabled

tmux crashes consistently after I return to the terminal running tmux. I can open files, move around using both keyboard and mouse, edit, etc. in vim. However, after I do some other work in other terminals/windows/browser and come back to the tmux terminal to continue working, it just crashes.
I suspect it has something to do with the mouse being enabled in vim. I use PuTTY (term=xterm). Below are the log files. I do not have a .tmux.conf file set up, so nothing from there. Any ideas?
My tmux details: I compiled from source using this link: http://jhshi.me/2016/07/08/installing-tmux-from-source-non-root/index.html#.XF977FUzbvs
tmux -V
tmux 2.8
Here is how I start tmux:
tmux -vvvv
tmux-server-34191.log:
1549962713.519987 #0 active pane not changed
<----- HERE I came back to the tmux terminal and the below were when it crashed ------->
1549962717.827661 client 0x1e224a0, status interval 15
1549962717.827675 cmdq_next <global>: empty
1549962717.827679 cmdq_next <0x1e224a0>: empty
1549962717.827689 screen_write_start: size 211x1, no pane
1549962717.827724 screen_write_collect_flush: flushed 0 items (0 bytes)
1549962717.827728 screen_write_stop: 211 cells (211 written, 0 skipped)
1549962717.827806 format '[#S] ' -> '[0] '
1549962717.827816 unref client 0x1e224a0 (2 references)
1549962717.827877 format ' "#{=21:pane_title}" 09:11 12-Feb-19' -> ' "PROJ_NAME" 09:11 12-Feb-19'
1549962717.827886 unref client 0x1e224a0 (2 references)
1549962717.827931 format '#{window_flags}' -> '*'
1549962717.827934 format '#I:#W#{?window_flags,#{window_flags}, }' -> '0:vim*'
1549962717.827942 unref client 0x1e224a0 (2 references)
1549962717.827947 screen_write_start: size 7x1, no pane
1549962717.827951 screen_write_collect_flush: flushed 0 items (0 bytes)
1549962717.827953 screen_write_stop: 7 cells (7 written, 0 skipped)
1549962717.827956 screen_write_start: size 211x1, no pane
1549962717.827962 screen_write_collect_flush: flushed 0 items (0 bytes)
1549962717.827965 screen_write_stop: 44 cells (40 written, 4 skipped)
1549962717.827978 #0 active pane not changed
1549962718.336657 /dev/pts/3: read 12 bytes (already 0)
1549962718.336665 /dev/pts/3: keys are 12 (\033[<0;137;24M)
1549962718.336668 /dev/pts/3: mouse input (SGR): \033[<0;137;24M
1549962718.336671 /dev/pts/3: complete key \033[<0;137;24M 0x10000005
1549962718.336676 session $0 0 activity 1549962718.336674 (last 1549962678.439002)
1549962718.336680 mouse 00 at 136,23 (last 0,0) (0)
1549962718.336683 down at 136,23
1549962718.336686 mouse at 136,23 is on pane %0
1549962718.336690 cmd_find_from_mouse: s=$0 0
1549962718.336692 cmd_find_from_mouse: wl=0 1 w=#0 vim
1549962718.336694 cmd_find_from_mouse: wp=%0
1549962718.336696 cmd_find_from_mouse: idx=none
1549962718.336700 writing key 0x1000000a (MouseDown1Pane) to %0
tmux-client-34189.log:
1549962672.798365 sending message 105 to peer 0x1def610 (29 bytes)
1549962672.798368 sending message 105 to peer 0x1def610 (35 bytes)
1549962672.798370 sending message 105 to peer 0x1def610 (53 bytes)
1549962672.798372 sending message 105 to peer 0x1def610 (88 bytes)
1549962672.798375 sending message 105 to peer 0x1def610 (107 bytes)
1549962672.798377 sending message 105 to peer 0x1def610 (75 bytes)
1549962672.798379 sending message 105 to peer 0x1def610 (67 bytes)
1549962672.798382 sending message 105 to peer 0x1def610 (77 bytes)
1549962672.798384 sending message 105 to peer 0x1def610 (38 bytes)
1549962672.798386 sending message 105 to peer 0x1def610 (32 bytes)
1549962672.798389 sending message 105 to peer 0x1def610 (71 bytes)
1549962672.798391 sending message 105 to peer 0x1def610 (21 bytes)
1549962672.798393 sending message 105 to peer 0x1def610 (16 bytes)
1549962672.798395 sending message 105 to peer 0x1def610 (42 bytes)
1549962672.798398 sending message 106 to peer 0x1def610 (0 bytes)
1549962672.798401 sending message 200 to peer 0x1def610 (4 bytes)
1549962672.798403 client loop enter
1549962672.815446 peer 0x1def610 message 207
1549962672.815456 sending message 208 to peer 0x1def610 (0 bytes)
1549962718.338531 client loop exit
Update 1: I just tried enabling the mouse in tmux by creating a .tmux.conf file containing the line below, but it still gave the same result:
set -g mouse on
Update 2: It stopped crashing when I removed the mouse support from vim and the tmux mouse support I added in "Update 1". So it is something to do with mouse support for tmux. Any ideas?
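For reference, the vim-side mouse configuration involved is typically something like the illustrative .vimrc fragment below. This is only a sketch: the exact options that were enabled were not posted, and sgr is assumed only because the server log above shows "mouse input (SGR)".

" hypothetical .vimrc fragment
set mouse=a        " enable the mouse in all modes
set ttymouse=sgr   " report mouse events with the SGR protocol; xterm2 is the older alternative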

Connect to Vehicle Using Telemetry on Linux

I am having problems connecting to the vehicle. First, I could not connect to the vehicle even over USB (I used the "/dev/ttyUSB0" connection string and got an error). Later I got it working with the connection string '/dev/serial/by-id/usb-3D_Robotics_PX4_FMU_v2.x_0-if00' and was able to send commands and receive responses. Now I want to test it with the telemetry block connected to the laptop's USB port. I tried the same way, with the connection string "/dev/serial/by-id/usb-Silicon_Labs_CP2102_USB_to_UART_Bridge_Controller_0001-if00-port0", but it gives a timeout message.
USB connection test output:
>>> PreArm: Check FS_THR_VALUE
>>> PreArm: Throttle below Failsafe
>>> APM:Copter V3.5.4 (284349c3)
>>> PX4: 0384802e NuttX: 1bcae90b
>>> Frame: QUAD
>>> PX4v3 0035003B 3136510A 34313630
Mode: STABILIZE
Autopilot Firmware version: APM:Copter-3.5.4
Autopilot capabilities (supports ftp): False
Global Location: LocationGlobal:lat=40.3985757,lon=49.8104986,alt=38.7
Global Location (relative altitude): LocationGlobalRelative:lat=40.3985757,lon=49.8104986,alt=38.7
Local Location: LocationLocal:north=None,east=None,down=None
Attitude: Attitude:pitch=-0.013171303086,yaw=0.0626983344555,roll=-0.0145587390289
Velocity: [-0.01, -0.01, 0.03]
GPS: GPSInfo:fix=3,num_sat=5
Groundspeed: 0.0168827120215
Airspeed: 0.263999998569
Gimbal status: Gimbal: pitch=None, roll=None, yaw=None
Battery: Battery:voltage=0.0,current=None,level=None
EKF OK?: False
Last Heartbeat: 0.967473479002
Rangefinder: Rangefinder: distance=None, voltage=None
Rangefinder distance: None
Rangefinder voltage: None
Heading: 3
Is Armable?: False
System status: STANDBY
Mode: STABILIZE
Armed: False
I am opening a connection like this:
vehicle = connect('/dev/serial/by-id/usb-Silicon_Labs_CP2102_USB_to_UART_Bridge_Controller_0001-if00-port0', wait_ready=True)
This results in the following traceback:
>>> Link timeout, no heartbeat in last 5 seconds
>>> No heartbeat in 30 seconds, aborting.
Traceback (most recent call last):
File "x.py", line 6, in <module>
vehicle = connect('/dev/serial/by-id/usb-Silicon_Labs_CP2102_USB_to_UART_Bridge_Controller_0001-if00-port0', wait_ready=True)
File "/home/seyid/.local/lib/python2.7/site-packages/dronekit/__init__.py", line 2845, in connect
vehicle.initialize(rate=rate, heartbeat_timeout=heartbeat_timeout)
File "/home/seyid/.local/lib/python2.7/site-packages/dronekit/__init__.py", line 2117, in initialize
raise APIException('Timeout in initializing connection.')
dronekit.APIException: Timeout in initializing connection.
The telemetry block works when using MAVProxy.
What is the problem here? Thank you.
There are a couple of problems that can cause dronekit to fail with a connection timeout:
Ensure you have the pyserial module installed.
Specify the baud rate for your connection explicitly, as in:
vehicle = connect('/dev/ttyUSB0',
                  wait_ready=True,
                  baud=57600,
)
If connections with MAVProxy to the same serial port work on your system, it is likely that the second item (the baud rate) is the culprit.
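For the first item, a quick way to confirm pyserial is available in the same Python environment as the traceback (illustrative shell commands; adjust pip/python to your setup):
pip install pyserial
python -c "import serial"   # fails with ImportError if pyserial is missing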

Cassandra File too Large Error on Start Up

When restarting a node in my cluster I sometimes get this error message:
INFO [IndexSummaryManager:1] 2016-04-12 19:32:53,574 IndexSummaryRedistribution.java:74 - Redistributing index summaries
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:15,636 CassandraDaemon.java:195 - Exception in thread Thread[HintsWriteExecutor:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: bin/../data/hints/389cb0d3-87b9-4221-8352-065e8ce50fdb-1460462523225-1.crc32: File too large
at org.apache.cassandra.hints.HintsWriter.writeChecksum(HintsWriter.java:116) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriter.close(HintsWriter.java:124) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsStore.closeWriter(HintsStore.java:201) ~[apache-cassandra-3.2.1.jar:3.2.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: java.nio.file.FileSystemException: bin/../data/hints/389cb0d3-87b9-4221-8352-065e8ce50fdb-1460462523225-1.crc32: File too large
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_72]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_72]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_72]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_72]
at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[na:1.8.0_72]
at java.nio.file.Files.newOutputStream(Files.java:216) ~[na:1.8.0_72]
at org.apache.cassandra.hints.HintsWriter.writeChecksum(HintsWriter.java:110) ~[apache-cassandra-3.2.1.jar:3.2.1]
... 7 common frames omitted
The node then starts shutting down, but gets stuck repeatedly throwing an exception and retrying every few seconds. This is the main problem for me because it stops the cluster from accepting connections:
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:15,638 StorageService.java:440 - Stopping gossiper
WARN [HintsWriteExecutor:1] 2016-04-12 20:02:15,651 StorageService.java:347 - Stopping gossip by operator request
INFO [HintsWriteExecutor:1] 2016-04-12 20:02:15,651 Gossiper.java:1455 - Announcing shutdown
DEBUG [HintsWriteExecutor:1] 2016-04-12 20:02:15,653 StorageService.java:1921 - Node /169.34.103.150 state shutdown, token [-1028691827956217809, -1257393635657191129, -1285475466194230655, -1398822673992383910, -1549844858878358481, -1638651369075180065, -1660825917518666149, -1802478872312866489, -1834618755337322564, -188187624477415935, -2034018930607672685, -210049110249018365, -2157250079133002505, -2171215058533514263, -2183510006393476006, -2193567329545672696, -2317710820662725097, -2319735333341559730, -2333531623390263516, -2458839661565177963, -2489690089103800827, -2710032230533922787, -2780200665893123668, -283628639049224915, -2886550293705646069, -293132189636842303, -2945647150702034785, -2965944925251907629, -2990231874502594267, -2991676811317743630, -2997538046339800243, -3176643432551515484, -3176889844544735478, -3201806929871841501, -3211631881211211792, -322057073400957538, -3242716520974847469, -3424682940569182570, -3441313897213257083, -3448874645237774640, -3452014929671888774, -3487048220426765500, -3523033168154067409, -3738270231064896111, -3792947624538231469, -3850123184653095411, -3859434367535677710, -3993763147657603241, -4010731345091378481, -4258687888114256086, -429860111391304244, -4318544125783476774, -4471769468265919226, -4588065176944445932, -4669765414071774677, -4670558952147294236, -4710259358376415554, -4907784900060021493, -4934593823248165235, -4934821923831720820, -5013288056003569837, -5110268421077583856, -5133973510660140774, -5159515181162633178, -5276029184678521021, -5286266972273013716, -540287937883749850, -5456649087226389873, -5495658378651725051, -5501165049612471047, -5535468008960837763, -5716046948204274477, -5721008906555397374, -5753456205099778029, -5770577886564351775, -5790919034460455792, -5929058167490034525, -5943865033771694477, -608562636816376813, -6109108822129963089, -6140834685397419488, -6170120179807852740, -6179956847809119210, -6245955388738336647, -6286189790411746933, -6299162407942815080, -6315904471665416400, -6364734987085439789, -6419018190136685454, -6451287738650323275, -6547213964231430849, -6548484474977763138, -6549052151069571925, -6698516302891374040, -6872407277556537836, -6901128430416607497, -6935384932230430038, -6937998036050345125, -7031528786091227188, -7106019277455303867, -7119774336125808637, -7191744312745689956, -7225558820114789693, -7349977359560186580, -7422626834116218143, -7431995410347149964, -7466585374358878727, -750799820874518113, -7610594360825096930, -7616154542884798259, -7629884030042898550, -7728553164832596613, -7789353727430662940, -779402220858888622, -7843332228444504745, -7854439386306622129, -815495344326874929, -8338520822777210140, -8649102261484559375, -8796027903791901112, -8898390484583881495, -8923261220379460832, -8943079358447105951, -9050583546904370510, -9080494386502531561, -9139630196350606101, -939306213156730751, -942614916980620152, 1049830730407134075, 1125127596836820990, 1133356003300268705, 1133623932124213230, 1247043876318218235, 1490023295198042772, 1497436537080113324, 1516791905674857253, 1603966065250122923, 1646125781869948326, 1740544126107535998, 1756218012030701589, 1804735370513211257, 1812139850525677114, 1819880350303805394, 1841691686666460445, 1888363141244474676, 2010883847009222978, 2016297526252235227, 2021110586668181290, 2084880932156441613, 2093427980091185166, 2112052724153374980, 2186638483475842552, 2195406825247987731, 2283720951686386464, 23875829161989945, 2521818329391092608, 2522645057607851918, 2524720168145693638, 
2541003400153964040, 2650684785592761012, 2723290502273715430, 2808119513098478236, 2821997019638778146, 2891379770529557184, 2907285214187020532, 2963307217336709534, 3061757915053031951, 3122571062025066142, 3128771694670016319, 3130206542424936603, 3197285318974197102, 3218987271686146429, 3329594065878248111, 3331926835266199716, 3526280986313508860, 3542343528340649978, 3589794725284000659, 3610364312437568329, 3701861372719378732, 373747767999916658, 3826422069022675393, 3856151860383170644, 3862031127704782057, 4049338078570571707, 4137865494092400430, 4241199357440741315, 4520402233521387342, 456519309520244643, 4715328215899051522, 4817677510120292180, 497627869146346949, 4995322204306807081, 5030633110404844305, 5038572404428039197, 5042627643214511398, 5281377762367584052, 5494577271219306513, 5530410713928998603, 5537215727145277166, 567120218785751902, 575743986375007756, 5784212620383428248, 5837914425280614947, 5977153680566647690, 5996674261833528410, 6083452088392300601, 6112178449036583235, 6264713703969393897, 6287772759341778176, 6314363909221383341, 6321343658409071604, 6475821468968027456, 6543311556613206558, 6912492987521221000, 6922280123185191829, 694545242943535806, 7183280296372529849, 7306070312091628992, 7412331756775823975, 7518294356359523088, 7567542433462808235, 7589810674525331548, 7637277610587157806, 773319528418720822, 7760484456189230502, 7816590204960057932, 7820991841796591957, 7836345109808448402, 7859570796753174, 8003409347394992259, 8012927612089894493, 8031750463661605171, 8051744553293723603, 8066222841813137181, 8073294271415597086, 8117861819974218900, 817982542709209563, 8198846486095494968, 8214665766962397555, 8277428606113435880, 8279100634451559360, 8316406004646641445, 8367052745804770548, 8373819718798220972, 8439087240414018142, 8444359473760446267, 8449256096936507263, 8717779586961798956, 8912188109780463904, 8920579922439529433, 8951968899880736480, 9043168611036813220, 9044578575232639242, 9045812874827336349, 9140634849238500115, 915715308827014103]
INFO [HintsWriteExecutor:1] 2016-04-12 20:02:15,653 StorageService.java:1924 - Node /169.34.103.150 state jump to shutdown
DEBUG [PendingRangeCalculator:1] 2016-04-12 20:02:15,655 PendingRangeCalculatorService.java:64 - finished calculation for 74 keyspaces in 0ms
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:17,655 StorageService.java:450 - Stopping native transport
INFO [HintsWriteExecutor:1] 2016-04-12 20:02:17,705 Server.java:182 - Stop listening for CQL clients
ERROR [HintsWriteExecutor:1] 2016-04-12 20:02:23,219 CassandraDaemon.java:195 - Exception in thread Thread[HintsWriteExecutor:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedChannelException
at org.apache.cassandra.hints.HintsWriter.newSession(HintsWriter.java:146) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.flushInternal(HintsWriteExecutor.java:221) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.flush(HintsWriteExecutor.java:203) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.lambda$flush$217(HintsWriteExecutor.java:196) ~[apache-cassandra-3.2.1.jar:3.2.1]
at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) ~[na:1.8.0_72]
at org.apache.cassandra.hints.HintsWriteExecutor.flush(HintsWriteExecutor.java:196) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor.access$000(HintsWriteExecutor.java:36) ~[apache-cassandra-3.2.1.jar:3.2.1]
at org.apache.cassandra.hints.HintsWriteExecutor$FlushBufferPoolTask.run(HintsWriteExecutor.java:155) ~[apache-cassandra-3.2.1.jar:3.2.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: java.nio.channels.ClosedChannelException: null
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) ~[na:1.8.0_72]
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300) ~[na:1.8.0_72]
at org.apache.cassandra.hints.HintsWriter.newSession(HintsWriter.java:142) ~[apache-cassandra-3.2.1.jar:3.2.1]
... 12 common frames omitted
I googled around a bit and think the initial exception is caused by the node trying to replay a hint that is very large and that it can't load.
I tried to find some parameters that could prevent this, but all I could find was turning off hinted handoff (hinted_handoff_enabled) or reducing the time hinted handoff runs for (max_hint_window_in_ms). I don't think I can live with an inconsistent cluster, and I was hoping for an option to split the hints into multiple files, but couldn't find one.
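For reference, the hint-related settings mentioned above live in cassandra.yaml; the values below are only the illustrative 3.x-era defaults, not a recommendation:
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000    # 3 hours
max_hints_file_size_in_mb: 128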
Has anyone seen this issue before? Is there a way to split hints into multiple files? How else might I deal with this?
EDIT: I hunted through my config and found this:
max_hints_file_size_in_mb: 128
That seems pretty conservative to me, given the machine I'm running on. If my hints are limited to 128 MB then I really don't understand why I get the above exception.
When run on this node, nodetool threw an exception. Other nodes were OK, but I only ran nodetool the next morning (12 hours after the exception).
The file the exception complains about is no longer there, but the file name should be fine because I have many other files with similar names (same length). It is interesting that the exception complains about a .crc32 file (not a .hint file).

Spring Integration task executor queue filled with more records than expected

I started to build a Spring Integration app in which the input gateway generates a fixed number (50) of records and then stops generating new records. There are basic filters/routers/transformers in the middle, and the final service activator and task executor config are as follows:
<int:service-activator input-channel="inChannel" output-channel="outChannel" ref="svcProcessor">
<int:poller fixed-rate="100" task-executor="myTaskExecutor"/>
</int:service-activator>
<task:executor id = "myTaskExecutor" pool-size="5" queue-capacity="100"/>
I tried to put some debug info at the beginning of the svcProcessor method:
#Qualifier(value="myTaskExecutor")
#Autowired
ThreadPoolTaskExecutor executor;
#ServiceActivator
public Order processOrder(Order order) {
log.debug("---- " + "executor size: " + executor.getActiveCount() +
" q: " + executor.getThreadPoolExecutor().getQueue().size() +
" r: " + executor.getThreadPoolExecutor().getQueue().remainingCapacity()+
" done: " + executor.getThreadPoolExecutor().getCompletedTaskCount() +
" task: " + executor.getThreadPoolExecutor().getTaskCount()
);
//
//process order takes up to 5 seconds.
//
return order;
}
After the program runs for some time, the log shows the queue has reached over 50, and it eventually gets a rejection exception:
23:38:31.096 DEBUG [myTaskExecutor-2] ---- executor size: 5 q: 44 r: 56 done: 11 task: 60
23:38:31.870 DEBUG [myTaskExecutor-5] ---- executor size: 5 q: 51 r: 49 done: 11 task: 67
23:38:33.600 DEBUG [myTaskExecutor-4] ---- executor size: 5 q: 69 r: 31 done: 11 task: 85
23:32:46.792 DEBUG [myTaskExecutor-1] ---- executor size: 5 q: 72 r: 28 done: 11 task: 88
It looks like the active count and the sum of queue size/remaining capacity are consistent with the config of 5 and 100, but I am not clear why there are more than 50 records in the queue, or why the taskCount is also larger than the limit of 50.
Am I looking at the wrong info from the executor and the queue?
Thanks
UPDATE:
(not sure if I should open another question)
I tried the XML version of the cafeDemo from spring-integration (branch SI3.0.x) and used the pool provided in the document, but with a 100 millisecond rate and an added queue capacity:
<int:service-activator input-channel="hotDrinks" ref="barista" method="prepareHotDrink" output-channel="preparedDrinks">
<int:poller task-executor="pool" fixed-rate="100"/>
</int:service-activator>
<task:executor id="pool" pool-size="5" queue-capacity="200"/>
After I ran it, it also got a rejection exception after around the 20th delivery:
org.springframework.core.task.TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor#6c31732b[Running, pool size = 5, active threads = 5, queued tasks = 200, completed tasks = 0]]
Only about 32 orders were placed before the exception, so I am not sure why queued tasks = 200 and completed tasks = 0.
THANKS
getTaskCount(): this method gives the total number of tasks that have been assigned to the executor since it started, so it will keep increasing over time.
The other values are approximate rather than exact, as per the Java documentation:
getCompletedTaskCount()
Returns the approximate total number of tasks that have completed execution.
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Ideally, getTaskCount() and getCompletedTaskCount() will increase steadily over time, as they include all of the tasks assigned since your code started executing. However, the active count should stay below 50, but being an approximate number it will sometimes go slightly beyond 50.
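As a standalone illustration of how these counters behave, here is a minimal sketch using a plain java.util.concurrent.ThreadPoolExecutor rather than the Spring-managed executor above; the pool size and queue capacity mirror the config in the question:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorCountersDemo {
    public static void main(String[] args) throws InterruptedException {
        // 5 threads and a bounded queue of 100, mirroring pool-size="5" queue-capacity="100"
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                5, 5, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(100));

        for (int i = 0; i < 50; i++) {
            executor.submit(() -> {
                try {
                    Thread.sleep(200); // simulate some work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // getTaskCount() counts every task ever submitted; getCompletedTaskCount() catches up over time
        for (int i = 0; i < 5; i++) {
            System.out.printf("active=%d queued=%d completed=%d total=%d%n",
                    executor.getActiveCount(),
                    executor.getQueue().size(),
                    executor.getCompletedTaskCount(),
                    executor.getTaskCount());
            Thread.sleep(500);
        }
        executor.shutdown();
    }
}

Running something like this shows the total and completed counts growing monotonically while the active count stays at or below the pool size.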
Refer to:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
