Cassandra 3.3 - cluster node - irregular behavior
I have a Cassandra cluster made up of the nodes
{ 192.168.120.57, 192.168.120.58, 192.168.120.59 }, with a replication factor of 2.
I noticed that node 192.168.120.59 was behaving irregularly:
-> The service was running (sudo service cassandra status)
-> nodetool also reported that the node was up
XXXX#cassandra-prod03:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.120.57 4.05 GB 256 ? 6d0f5961-a600-42c2-8ba5-5b6ebff52ceb rack1
UN 192.168.120.58 4.4 GB 256 ? 53bc77ff-50b3-4bf0-bd55-17e2bb7c8208 rack1
UN 192.168.120.59 2.77 GB 256 ? 03d8d7fd-1d86-4034-a537-9915adb0d4b3 rack1
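Since the output above was taken on the affected node itself, it may be worth running nodetool status on every node and comparing the views. The following is only a minimal sketch of how that comparison could be scripted; the function names are hypothetical, and it assumes the column layout shown above (two flag letters followed by the address).

```python
import re

# Matches data rows such as:
#   UN 192.168.120.59 2.77 GB 256 ? 03d8d7fd-... rack1
# First letter: U(p)/D(own). Second letter: N(ormal)/L(eaving)/J(oining)/M(oving).
ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)\s")


def parse_status(text):
    """Return {address: (status, state)} from `nodetool status` output."""
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            nodes[match.group(3)] = (match.group(1), match.group(2))
    return nodes


def not_up_normal(nodes):
    """Addresses whose flags are anything other than UN."""
    return sorted(addr for addr, flags in nodes.items() if flags != ("U", "N"))


SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.120.57 4.05 GB 256 ? 6d0f5961-a600-42c2-8ba5-5b6ebff52ceb rack1
UN 192.168.120.58 4.4 GB 256 ? 53bc77ff-50b3-4bf0-bd55-17e2bb7c8208 rack1
UN 192.168.120.59 2.77 GB 256 ? 03d8d7fd-1d86-4034-a537-9915adb0d4b3 rack1
"""

if __name__ == "__main__":
    print(not_up_normal(parse_status(SAMPLE)))
```

Feeding it the output captured on each of the three nodes would show whether the other two nodes agree that 192.168.120.59 is UN.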
BUT -> I couldn't connect with DevCenter (the last time I tried, I could, and cassandra.yaml hasn't changed)
I looked at the log of this node (192.168.120.59), and it showed:
ERROR [HintsDispatcher:1] 2016-06-23 16:57:01,827 HintsDispatchExecutor.java:224 - Failed to dispatch hints file 6d0f5961-a600-42c2-8ba5-5b6ebff52ceb-1465184315371-1.hints: file is corrupted ({})
ERROR [HintsDispatcher:1] 2016-06-23 16:57:01,830 CassandraDaemon.java:195 - Exception in thread Thread[HintsDispatcher:1,1,main]
ERROR [HintsDispatcher:1] 2016-06-23 16:57:01,834 StorageService.java:470 - Stopping gossiper
ERROR [HintsDispatcher:1] 2016-06-23 16:57:03,842 StorageService.java:480 - Stopping native transport
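The name of the corrupted hints file starts with 6d0f5961-a600-42c2-8ba5-5b6ebff52ceb, which is the Host ID that nodetool status lists for 192.168.120.57. A small sketch for pulling that out of the log line follows. It assumes the file name layout is <host id>-<timestamp in ms>-<version>.hints, which is inferred from the name in the log rather than confirmed anywhere in this thread; the function name and the HOST_IDS table are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <host-id>-<epoch millis>-<version>.hints
HINT_ERR = re.compile(
    r"Failed to dispatch hints file "
    r"([0-9a-f-]{36})-(\d+)-(\d+)\.hints"
)

# Host IDs copied from the nodetool status output in the question.
HOST_IDS = {
    "6d0f5961-a600-42c2-8ba5-5b6ebff52ceb": "192.168.120.57",
    "53bc77ff-50b3-4bf0-bd55-17e2bb7c8208": "192.168.120.58",
    "03d8d7fd-1d86-4034-a537-9915adb0d4b3": "192.168.120.59",
}

LINE = (
    "ERROR [HintsDispatcher:1] 2016-06-23 16:57:01,827 "
    "HintsDispatchExecutor.java:224 - Failed to dispatch hints file "
    "6d0f5961-a600-42c2-8ba5-5b6ebff52ceb-1465184315371-1.hints: "
    "file is corrupted ({})"
)


def parse_hint_error(line):
    """Extract host ID, target address and creation time, or None if no match."""
    match = HINT_ERR.search(line)
    if not match:
        return None
    host_id = match.group(1)
    created = datetime.fromtimestamp(int(match.group(2)) / 1000, tz=timezone.utc)
    return {
        "host_id": host_id,
        "address": HOST_IDS.get(host_id),  # None if the ID is unknown
        "created": created,
    }


if __name__ == "__main__":
    print(parse_hint_error(LINE))
```

If the layout assumption holds, the file holds hints destined for 192.168.120.57 and was created well before the day the errors appeared.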
On the other nodes, the log showed this line over and over:
DEBUG [GossipTasks:1] 2016-06-24 07:56:21,551 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
DEBUG [GossipTasks:1] 2016-06-24 07:56:22,552 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
DEBUG [GossipTasks:1] 2016-06-24 07:56:23,552 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
DEBUG [GossipTasks:1] 2016-06-24 07:56:24,553 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
DEBUG [GossipTasks:1] 2016-06-24 07:56:25,553 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
DEBUG [GossipTasks:1] 2016-06-24 07:56:26,553 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
DEBUG [GossipTasks:1] 2016-06-24 07:56:27,554 Gossiper.java:336 - Convicting /192.168.120.59 with status shutdown - alive false
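Because these lines repeat about once per second, counting them per address can be easier than reading them. This is a rough sketch that only assumes the message format shown above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches: "Convicting /192.168.120.59 with status shutdown - alive false"
CONVICT = re.compile(r"Convicting /(\S+) with status (\S+)")


def count_convictions(lines):
    """Count gossip conviction messages per (address, status) pair."""
    counts = Counter()
    for line in lines:
        match = CONVICT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE_LINES = [
    "DEBUG [GossipTasks:1] 2016-06-24 07:56:21,551 Gossiper.java:336 - "
    "Convicting /192.168.120.59 with status shutdown - alive false",
    "DEBUG [GossipTasks:1] 2016-06-24 07:56:22,552 Gossiper.java:336 - "
    "Convicting /192.168.120.59 with status shutdown - alive false",
    "INFO some unrelated line",
]

if __name__ == "__main__":
    for (address, status), n in count_convictions(SAMPLE_LINES).items():
        print(address, status, n)
```

In practice the lines would come from the debug log file of each healthy node, read with open() and passed in as an iterable.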
These are the things I tried, none of which worked:
1) reboot the machine.
2) nodetool repair.
XXXXs#cassandra-prod03:~$ nodetool repair
[2016-06-23 17:07:37,582] Starting repair command #1, repairing keyspace recommendersjobs with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 512)
Jun 23, 2016 5:53:57 PM ClientCommunicatorAdmin Checker-run
WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out
Exception occurred during clean-up. java.lang.reflect.UndeclaredThrowableException
error: [2016-06-23 18:06:22,229] JMX connection closed. You should check server log for repair status of keyspace recommendersjobs(Subsequent keyspaces are not going to be repaired).
-- StackTrace --
java.io.IOException: [2016-06-23 18:06:22,229] JMX connection closed. You should check server log for repair status of keyspace recommendersjobs(Subsequent keyspaces are not going to be repaired).
at org.apache.cassandra.tools.RepairRunner.handleConnectionFailed(RepairRunner.java:97)
at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:86)
at javax.management.NotificationBroadcasterSupport.handleNotification(NotificationBroadcasterSupport.java:275)
at javax.management.NotificationBroadcasterSupport$SendNotifJob.run(NotificationBroadcasterSupport.java:352)
at javax.management.NotificationBroadcasterSupport$1.execute(NotificationBroadcasterSupport.java:337)
at javax.management.NotificationBroadcasterSupport.sendNotification(NotificationBroadcasterSupport.java:248)
at javax.management.remote.rmi.RMIConnector.sendNotification(RMIConnector.java:441)
at javax.management.remote.rmi.RMIConnector.access$1200(RMIConnector.java:121)
at javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.gotIOException(RMIConnector.java:1531)
at javax.management.remote.rmi.RMIConnector$RMINotifClient.fetchNotifs(RMIConnector.java:1352)
3) nodetool scrub.
After this I followed the procedure on this page: https://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_backup_noderestart_t.html
On the node that showed the irregular behaviour:
1) Deleted all tables
2) Copied in the tables that were in the snapshot subdirectory
3) Restarted the Cassandra service (sudo service cassandra start)
Now DevCenter can connect.
4) Ran nodetool repair again, and it does not seem to finish. It has been running for more than 3 hours and is still at 0%. It has not thrown the earlier error so far, but I don't think it is working.
XXXX#cassandra-prod03:/var/lib/cassandra/data/recommendersjobs$ nodetool repair
[2016-06-24 12:24:13,901] Starting repair command #2, repairing keyspace recommendersjobs with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 512)
[2016-06-24 12:24:17,235] Repair session 7990e900-3a30-11e6-9861-e998758604bf for range [(-4211712190859502280,-4203992560184719114], (-8318576077833367495,-8293862912754480840], (6846385493332587078,6848188842228182830], (-6460899805527117383,-6451580681320322560], (-927549969226058129,-890279568362794135], (-6984061332646781425,-6971866923830433843], (4166814791483192953,4181831925595728309], (212986607214778070,288868023576418487], (656453115636318699,670358990995510048], (-3857111065907140708,-3844748631754416189], (-731643009820930522,-708535538994856284], (-3646398519882341977,-3630383383747491969], (-4215258070829115921,-4211712190859502280], (6972519766285293067,6983272857539313387], (8698322395242654268,8707318407090795417], (-1229710138608978352,-1142107267252652523], (-6043663680829718954,-6000947666680635781], (-2555062915045199657,-2518876915835081063], (700476075055800053,714305291929584762], (-9040157763677832405,-9035752029446223323], (-5508447172120753932,-5486202973508763691], (685259738113005224,700476075055800053], (-3498127715533695144,-3481009398659298196], (-6642818648542557724,-6636310914487378937], (8434600097698749124,8447388029675461333], (-9196939878235591876,-9176601682787902008], (-8392736270165548628,-8358469817741800124], (2668692603556301942,2674434559513729415], (-1868370051056061493,-1860700618558449042], (-6090102443758865946,-6081413019910708044], (-1234056662103299181,-1233526281479424892], (5500764053339166116,5524556889632227041], (-2343395791776959523,-2315566773401476532], (-1401161284995746969,-1393787270658712827], (-2204381944261735990,-2170477179223028079], (6158419946524545293,6171118529963906232], (-4713529768344258113,-4662557763507006574], (-6844075271953818452,-6821121360440227129], (-2945271950056514025,-2943444578534843676], (336935533896784347,357900664320554816], (-3228825867554770158,-3208435083108021414], (-4955223362385024239,-4912831044277064162], (4255028482041276438,4266791030997837989], 
(6476681916885657705,6500903072598398827], (-1233526281479424892,-1229710138608978352], (1478830670694875260,1516022971169730278], (5055515752742518226,5080231569691472194], (-8220031638992810484,-8180158125318738371], (-1644262187188908209,-1640818085920683524], (-4203992560184719114,-4201062514351678359], (-5222304299920694120,-5219839223573200521], (-8773724135260117843,-8764792082520695033], (-1355888553876954583,-1337212328034401607], (4736203535340725934,4798689233015535742], (1271210285313194340,1296150437356677365], (-3540891434049035738,-3498615337040509123], (-6469678775729021799,-6460899805527117383], (6720989190599823387,6725081429318018351], (-3498615337040509123,-3498127715533695144], (-3146863944770153663,-3146145879742664566], (1516022971169730278,1543205759172249070], (2435469497781312092,2442862581422161803], (-6451580681320322560,-6450897957558236116], (6401972194395279212,6422279168069398956], (-3185403021559141625,-3146863944770153663], (714305291929584762,737464206890275575], (-8358469817741800124,-8348341616724830280], (1175374807063094969,1214972878391460620], (-7604581392807650182,-7599009809842308024], (737464206890275575,753307058766478061], (6142712373291735013,6158419946524545293], (-4780493053770543287,-4715780574795279713], (-6475595045987094984,-6472782860721811368], (-6000947666680635781,-5984299660747839008], (1296150437356677365,1341203791801750057], (-708535538994856284,-698490751868803865], (7377888475905026189,7453466617724696329], (-6634208348348544983,-6608369916989089358], (6422279168069398956,6425564550565269465], (7864684753798709404,7865351483298068861], (-1142107267252652523,-1130763472498460650], (2661263629361374231,2668692603556301942], (5080231569691472194,5114813438584055434], (-3146145879742664566,-3133020507157500012], (-8847809439930946447,-8773724135260117843], (-7455563937756770941,-7393173741279261649], (-8001929233768101133,-7997359338916453120], (-3208435083108021414,-3185403021559141625], 
(-8504346858359312144,-8489394145547730314], (330577967974463458,336935533896784347], (7563807678381276197,7582113707662028389], (-8542082760751850432,-8504346858359312144], (-3554430985922725406,-3540891434049035738], (-5146821899868165131,-5132879094583817494], (-240050552784290962,-235772081518906038], (3192195406141461438,3232311876000191165], (4396701296202861119,4435447637968520477], (-6608369916989089358,-6589281199205137237], (-1981758794620356502,-1980394476029734464], (-3563262972235992519,-3554430985922725406], (-8237266126949227268,-8229088751363770412], (4603382773438618005,4650956890538380770], (7199369658829377994,7213887142216841811], (-4962133286795414966,-4960837410336448181], (5732306214090585629,5737923082195078719], (-1640818085920683524,-1638198095283336424], (2953772832389323874,2968606959459466094], (-1235736831018577452,-1234056662103299181], (-8489394145547730314,-8465866933095448418], (-8764792082520695033,-8742184848088246957], (8970234525288431416,8995803891393002810], (-8465866933095448418,-8446751642929552235], (-4836950976239959831,-4798814403290155024], (8821435912538079613,8871870554368996448], (-6636310914487378937,-6634208348348544983], (1127798841363210583,1175374807063094969], (1618461143046543312,1632362492578265392], (4994543089928849457,5055515752742518226], (1593785714944278507,1618461143046543312], (-8885929495137049667,-8847809439930946447], (7514316870399738740,7535648178040640648], (-7997359338916453120,-7884947734877215590], (6848188842228182830,6851091943952222998], (-7468624498429238352,-7467528674261593018], (-5420066870415394025,-5415525963610970555], (-1990341934536435876,-1981758794620356502], (-2119663304358780223,-2025913005077454264], (6951976975129490336,6972519766285293067], (4814028481954927072,4845532697727637101], (6905718428902118395,6951976975129490336], (2442862581422161803,2473486929810000384], (-7050886265796431130,-6984061332646781425], (899736981222314462,902246412094263205], 
(4181831925595728309,4186043100749193555], (5228334028316552580,5251807687845401107], (7453466617724696329,7514316870399738740], (-5313682991153179320,-5303078638925764028], (4435447637968520477,4466559239870893508], (-1326922053645316997,-1235736831018577452], (1401641821520873210,1444010990722687070], (1751424608310627366,1764837197082917798], (4501052626153108190,4556808478342919483], (4927485308468803128,4934126082298511287], (9034829150987061085,9125279489658634069], (8296589981550156872,8314630209670832253], (8448913759582599761,8531687930406385678], (-6946999157236744448,-6844075271953818452], (1444010990722687070,1478830670694875260], (8959600358987559211,8960859570137987260], (-1336371416389083059,-1326922053645316997], (-61295809418728622,-42017489632420450], (8969118191184771962,8970234525288431416], (1635093197288200819,1646180075666173338], (4934126082298511287,4936412559469748364], (1751396675459233161,1751424608310627366], (-8742184848088246957,-8729944250195454990], (-8229088751363770412,-8220031638992810484], (4598535296351629260,4603382773438618005], (-8174261525446719402,-8163507145438773646], (755491933174209966,772199007723251143], (-8624222028448607604,-8600591663798592212], (-7519020387526209046,-7508903488301733990], (753307058766478061,755491933174209966], (7582113707662028389,7704299093463733587], (8962972620592857243,8969118191184771962], (-3630383383747491969,-3629090531220364683], (-4468127887710219501,-4455300706501649674], (5852098730437702158,5901360113812253268], (-744327910454436920,-731643009820930522], (1911291563147302352,1937039485053092324], (6233932117267819795,6310871518339058610], (-890279568362794135,-887061681980670889], (-3679929070886276718,-3646398519882341977], (670358990995510048,685259738113005224], (4845532697727637101,4882596066849804097], (-6472782860721811368,-6469678775729021799], (-4715780574795279713,-4713529768344258113], (3078864489691764074,3115136329005690845], (9125279489658634069,9133717713662800417], 
(8960859570137987260,8962972620592857243], (9203390852073803003,-9222340996132828621], (4798689233015535742,4814028481954927072], (-6821121360440227129,-6777406043019946301], (6310871518339058610,6322362350826191135], (-4233493116436171657,-4215258070829115921], (3130822610289703870,3145528043810254358], (5114813438584055434,5123797488219630832], (-6081413019910708044,-6079621320236638978], (8447388029675461333,8448913759582599761], (-3795970094034238564,-3779568630549582632], (-8180158125318738371,-8174261525446719402], (-3844748631754416189,-3795970094034238564], (4113442287147808270,4131576074431163736], (4936412559469748364,4994543089928849457], (4466559239870893508,4501052626153108190], (-3263191959146785290,-3228825867554770158], (6766473197422691050,6766938926557171347], (-8899258035651035854,-8885929495137049667], (-5381097520433537861,-5313682991153179320], (6883419270850869410,6905718428902118395], (5303695175174121453,5306682210025644500], (-9176601682787902008,-9096402285624130055], (-5219839223573200521,-5146821899868165131], (7862240594123780934,7864684753798709404], (6425564550565269465,6447403433784268090], (8996665254881958342,9032628945801968053], (-4912831044277064162,-4902449072491610799], (-2025913005077454264,-1990341934536435876], (-7467528674261593018,-7455563937756770941], (-3293906917083373209,-3283303577982631091], (4186043100749193555,4255028482041276438], (-4902449072491610799,-4875492513338002087], (8531687930406385678,8562083491872293504], (-8600591663798592212,-8567258569010210224], (-6079621320236638978,-6077962914827619257], (-4960837410336448181,-4955223362385024239], (-7634134348451706917,-7604581392807650182], (1632362492578265392,1635093197288200819], (291067515209041750,330577967974463458], (7789905572870286790,7790287256445162065], (-8007287567032110044,-8001929233768101133], (-6971866923830433843,-6946999157236744448], (9032628945801968053,9034829150987061085], (-4798814403290155024,-4780493053770543287], 
(3905716352976079268,3920453505268467479], (8995803891393002810,8996665254881958342], (-1393787270658712827,-1355888553876954583], (6522490073023565012,6553480210988421066], (7162495952820208409,7199369658829377994], (4882596066849804097,4927485308468803128], (-5893410924582496350,-5858193870200257506], (8233960782654491894,8252708412304213777], (6725081429318018351,6766473197422691050], (-5303078638925764028,-5222304299920694120], (2376875294823216717,2379477119284337609], (-7393173741279261649,-7376038212607696986], (-1337212328034401607,-1336371416389083059], (902246412094263205,902703964241874282], (288868023576418487,291067515209041750], (1341203791801750057,1364316712425392743], (5199663539610295629,5228334028316552580], (6447403433784268090,6476681916885657705], (-3481009398659298196,-3391281017675748608], (4266791030997837989,4315496651682018963], (3115136329005690845,3130822610289703870], (-7599009809842308024,-7598746015885846943]] failed with error [repair #7990e900-3a30-11e6-9861-e998758604bf on recommendersjobs/postulationsbyuser, [(-4211712190859502280,-4203992560184719114], (-8318576077833367495,-8293862912754480840], (6846385493332587078,6848188842228182830], (-6460899805527117383,-6451580681320322560], (-927549969226058129,-890279568362794135], (-6984061332646781425,-6971866923830433843], (4166814791483192953,4181831925595728309], (212986607214778070,288868023576418487], (656453115636318699,670358990995510048], (-3857111065907140708,-3844748631754416189], (-731643009820930522,-708535538994856284], (-3646398519882341977,-3630383383747491969], (-4215258070829115921,-4211712190859502280], (6972519766285293067,6983272857539313387], (8698322395242654268,8707318407090795417], (-1229710138608978352,-1142107267252652523], (-6043663680829718954,-6000947666680635781], (-2555062915045199657,-2518876915835081063], (700476075055800053,714305291929584762], (-9040157763677832405,-9035752029446223323], (-5508447172120753932,-5486202973508763691], 
(685259738113005224,700476075055800053], (-3498127715533695144,-3481009398659298196], (-6642818648542557724,-6636310914487378937], (8434600097698749124,8447388029675461333], (-9196939878235591876,-9176601682787902008], (-8392736270165548628,-8358469817741800124], (2668692603556301942,2674434559513729415], (-1868370051056061493,-1860700618558449042], (-6090102443758865946,-6081413019910708044], (-1234056662103299181,-1233526281479424892], (5500764053339166116,5524556889632227041], (-2343395791776959523,-2315566773401476532], (-1401161284995746969,-1393787270658712827], (-2204381944261735990,-2170477179223028079], (6158419946524545293,6171118529963906232], (-4713529768344258113,-4662557763507006574], (-6844075271953818452,-6821121360440227129], (-2945271950056514025,-2943444578534843676], (336935533896784347,357900664320554816], (-3228825867554770158,-3208435083108021414], (-4955223362385024239,-4912831044277064162], (4255028482041276438,4266791030997837989], (6476681916885657705,6500903072598398827], (-1233526281479424892,-1229710138608978352], (1478830670694875260,1516022971169730278], (5055515752742518226,5080231569691472194], (-8220031638992810484,-8180158125318738371], (-1644262187188908209,-1640818085920683524], (-4203992560184719114,-4201062514351678359], (-5222304299920694120,-5219839223573200521], (-8773724135260117843,-8764792082520695033], (-1355888553876954583,-1337212328034401607], (4736203535340725934,4798689233015535742], (1271210285313194340,1296150437356677365], (-3540891434049035738,-3498615337040509123], (-6469678775729021799,-6460899805527117383], (6720989190599823387,6725081429318018351], (-3498615337040509123,-3498127715533695144], (-3146863944770153663,-3146145879742664566], (1516022971169730278,1543205759172249070], (2435469497781312092,2442862581422161803], (-6451580681320322560,-6450897957558236116], (6401972194395279212,6422279168069398956], (-3185403021559141625,-3146863944770153663], (714305291929584762,737464206890275575], 
(-8358469817741800124,-8348341616724830280], (1175374807063094969,1214972878391460620], (-7604581392807650182,-7599009809842308024], (737464206890275575,753307058766478061], (6142712373291735013,6158419946524545293], (-4780493053770543287,-4715780574795279713], (-6475595045987094984,-6472782860721811368], (-6000947666680635781,-5984299660747839008], (1296150437356677365,1341203791801750057], (-708535538994856284,-698490751868803865], (7377888475905026189,7453466617724696329], (-6634208348348544983,-6608369916989089358], (6422279168069398956,6425564550565269465], (7864684753798709404,7865351483298068861], (-1142107267252652523,-1130763472498460650], (2661263629361374231,2668692603556301942], (5080231569691472194,5114813438584055434], (-3146145879742664566,-3133020507157500012], (-8847809439930946447,-8773724135260117843], (-7455563937756770941,-7393173741279261649], (-8001929233768101133,-7997359338916453120], (-3208435083108021414,-3185403021559141625], (-8504346858359312144,-8489394145547730314], (330577967974463458,336935533896784347], (7563807678381276197,7582113707662028389], (-8542082760751850432,-8504346858359312144], (-3554430985922725406,-3540891434049035738], (-5146821899868165131,-5132879094583817494], (-240050552784290962,-235772081518906038], (3192195406141461438,3232311876000191165], (4396701296202861119,4435447637968520477], (-6608369916989089358,-6589281199205137237], (-1981758794620356502,-1980394476029734464], (-3563262972235992519,-3554430985922725406], (-8237266126949227268,-8229088751363770412], (4603382773438618005,4650956890538380770], (7199369658829377994,7213887142216841811], (-4962133286795414966,-4960837410336448181], (5732306214090585629,5737923082195078719], (-1640818085920683524,-1638198095283336424], (2953772832389323874,2968606959459466094], (-1235736831018577452,-1234056662103299181], (-8489394145547730314,-8465866933095448418], (-8764792082520695033,-8742184848088246957], (8970234525288431416,8995803891393002810], 
(-8465866933095448418,-8446751642929552235], (-4836950976239959831,-4798814403290155024], (8821435912538079613,8871870554368996448], (-6636310914487378937,-6634208348348544983], (1127798841363210583,1175374807063094969], (1618461143046543312,1632362492578265392], (4994543089928849457,5055515752742518226], (1593785714944278507,1618461143046543312], (-8885929495137049667,-8847809439930946447], (7514316870399738740,7535648178040640648], (-7997359338916453120,-7884947734877215590], (6848188842228182830,6851091943952222998], (-7468624498429238352,-7467528674261593018], (-5420066870415394025,-5415525963610970555], (-1990341934536435876,-1981758794620356502], (-2119663304358780223,-2025913005077454264], (6951976975129490336,6972519766285293067], (4814028481954927072,4845532697727637101], (6905718428902118395,6951976975129490336], (2442862581422161803,2473486929810000384], (-7050886265796431130,-6984061332646781425], (899736981222314462,902246412094263205], (4181831925595728309,4186043100749193555], (5228334028316552580,5251807687845401107], (7453466617724696329,7514316870399738740], (-5313682991153179320,-5303078638925764028], (4435447637968520477,4466559239870893508], (-1326922053645316997,-1235736831018577452], (1401641821520873210,1444010990722687070], (1751424608310627366,1764837197082917798], (4501052626153108190,4556808478342919483], (4927485308468803128,4934126082298511287], (9034829150987061085,9125279489658634069], (8296589981550156872,8314630209670832253], (8448913759582599761,8531687930406385678], (-6946999157236744448,-6844075271953818452], (1444010990722687070,1478830670694875260], (8959600358987559211,8960859570137987260], (-1336371416389083059,-1326922053645316997], (-61295809418728622,-42017489632420450], (8969118191184771962,8970234525288431416], (1635093197288200819,1646180075666173338], (4934126082298511287,4936412559469748364], (1751396675459233161,1751424608310627366], (-8742184848088246957,-8729944250195454990], 
(-8229088751363770412,-8220031638992810484], (4598535296351629260,4603382773438618005], (-8174261525446719402,-8163507145438773646], (755491933174209966,772199007723251143], (-8624222028448607604,-8600591663798592212], (-7519020387526209046,-7508903488301733990], (753307058766478061,755491933174209966], (7582113707662028389,7704299093463733587], (8962972620592857243,8969118191184771962], (-3630383383747491969,-3629090531220364683], (-4468127887710219501,-4455300706501649674], (5852098730437702158,5901360113812253268], (-744327910454436920,-731643009820930522], (1911291563147302352,1937039485053092324], (6233932117267819795,6310871518339058610], (-890279568362794135,-887061681980670889], (-3679929070886276718,-3646398519882341977], (670358990995510048,685259738113005224], (4845532697727637101,4882596066849804097], (-6472782860721811368,-6469678775729021799], (-4715780574795279713,-4713529768344258113], (3078864489691764074,3115136329005690845], (9125279489658634069,9133717713662800417], (8960859570137987260,8962972620592857243], (9203390852073803003,-9222340996132828621], (4798689233015535742,4814028481954927072], (-6821121360440227129,-6777406043019946301], (6310871518339058610,6322362350826191135], (-4233493116436171657,-4215258070829115921], (3130822610289703870,3145528043810254358], (5114813438584055434,5123797488219630832], (-6081413019910708044,-6079621320236638978], (8447388029675461333,8448913759582599761], (-3795970094034238564,-3779568630549582632], (-8180158125318738371,-8174261525446719402], (-3844748631754416189,-3795970094034238564], (4113442287147808270,4131576074431163736], (4936412559469748364,4994543089928849457], (4466559239870893508,4501052626153108190], (-3263191959146785290,-3228825867554770158], (6766473197422691050,6766938926557171347], (-8899258035651035854,-8885929495137049667], (-5381097520433537861,-5313682991153179320], (6883419270850869410,6905718428902118395], (5303695175174121453,5306682210025644500], 
(-9176601682787902008,-9096402285624130055], (-5219839223573200521,-5146821899868165131], (7862240594123780934,7864684753798709404], (6425564550565269465,6447403433784268090], (8996665254881958342,9032628945801968053], (-4912831044277064162,-4902449072491610799], (-2025913005077454264,-1990341934536435876], (-7467528674261593018,-7455563937756770941], (-3293906917083373209,-3283303577982631091], (4186043100749193555,4255028482041276438], (-4902449072491610799,-4875492513338002087], (8531687930406385678,8562083491872293504], (-8600591663798592212,-8567258569010210224], (-6079621320236638978,-6077962914827619257], (-4960837410336448181,-4955223362385024239], (-7634134348451706917,-7604581392807650182], (1632362492578265392,1635093197288200819], (291067515209041750,330577967974463458], (7789905572870286790,7790287256445162065], (-8007287567032110044,-8001929233768101133], (-6971866923830433843,-6946999157236744448], (9032628945801968053,9034829150987061085], (-4798814403290155024,-4780493053770543287], (3905716352976079268,3920453505268467479], (8995803891393002810,8996665254881958342], (-1393787270658712827,-1355888553876954583], (6522490073023565012,6553480210988421066], (7162495952820208409,7199369658829377994], (4882596066849804097,4927485308468803128], (-5893410924582496350,-5858193870200257506], (8233960782654491894,8252708412304213777], (6725081429318018351,6766473197422691050], (-5303078638925764028,-5222304299920694120], (2376875294823216717,2379477119284337609], (-7393173741279261649,-7376038212607696986], (-1337212328034401607,-1336371416389083059], (902246412094263205,902703964241874282], (288868023576418487,291067515209041750], (1341203791801750057,1364316712425392743], (5199663539610295629,5228334028316552580], (6447403433784268090,6476681916885657705], (-3481009398659298196,-3391281017675748608], (4266791030997837989,4315496651682018963], (3115136329005690845,3130822610289703870], (-7599009809842308024,-7598746015885846943]]] Validation failed in 
/192.168.120.58 (progress: 0%)
Two things stand out in this output:
1) failed with error [repair #7990e900-3a30-11e6-9861-e998758604bf on recommendersjobs/postulationsbyuser (near the middle)
2) Validation failed in /192.168.120.58 (progress: 0%) (at the end)
While this repair is running (I think it is still running), the following appears many times in the log of this node and of the other nodes in the cluster:
DEBUG [ReadRepairStage:4] 2016-06-24 14:08:18,318 ReadCallback.java:235 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-3297346077674417080, 42555f31313131303131333538) (d41d8cd98f00b204e9800998ecf8427e vs 9e7f1588ffcd53d13ab3f8bb2be0f05e)
at org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) ~[apache-cassandra-3.3.jar:3.3]
at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:226) ~[apache-cassandra-3.3.jar:3.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
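One detail in this message can be checked offline: d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of empty input. That would be consistent with one replica having no data at all for this key, although that reading is an assumption and not something the log states. The sketch below verifies the digest and decodes the hex key; the function name is hypothetical.

```python
import hashlib

# MD5 of zero bytes of input.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def describe_mismatch(key_hex, digest_a, digest_b):
    """Decode the partition key and count how many digests equal the empty MD5."""
    key = bytes.fromhex(key_hex).decode("utf-8", errors="replace")
    empty_count = sum(1 for d in (digest_a, digest_b) if d == EMPTY_MD5)
    return key, empty_count


if __name__ == "__main__":
    key, empty_count = describe_mismatch(
        "42555f31313131303131333538",
        "d41d8cd98f00b204e9800998ecf8427e",
        "9e7f1588ffcd53d13ab3f8bb2be0f05e",
    )
    print(key, empty_count)
```

Decoding the key this way also makes it possible to query that row directly on each replica and compare the results.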
NOTE 1: Cassandra was, and still is, serving requests thanks to the replication factor of 2.
NOTE 2: The Cassandra cluster is used in production, and applications are inserting and reading rows.
What can I do to correct this irregular behaviour?
I thought of creating a new node and swapping it in for the irregular node. What are the correct steps for doing this?
When a node does not respond / is malfunctioning
If you can create a new VM to replace the malfunctioning node:
See: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html
If you do not want to create a new VM and prefer to reuse the existing malfunctioning node:
IMPORTANT: YOU NEED A REPLICATION FACTOR THAT ALLOWS YOU TO DELETE ALL THE DATA ON THE AFFECTED NODE.
nodetool decommission (on that node)
See: https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsDecommission.html
sudo rm -rf /var/lib/cassandra/*
This clears/deletes all data on the node.
See: https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/referenceClearCpkgData.html
sudo service cassandra start
After some time, run:
nodetool repair
If nodetool repair FAILS:
For each table in the keyspace:
nodetool scrub <keyspace> <table>
on EACH of the nodes → watch the node's log and wait for it to complete
nodetool repair <keyspace> <table>
on only one node (any of them)
NOTE: it is not necessary to run this for each table; you can run:
nodetool scrub <keyspace>
on EACH of the nodes → watch the node's log and wait for it to complete
nodetool repair <keyspace>
I suggest trying with a single table first, so that the log covers only that table and it is easier to understand what is going on.
NOTE:
Set the logging level to INFO to see only the important messages:
nodetool setlogginglevel org.apache.cassandra INFO
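The scrub-then-repair order above can be written down as a dry-run plan before touching a production cluster. This is only a sketch: recovery_plan is a hypothetical helper that builds the command lists and executes nothing, and how each command is actually run on each node (ssh, a config tool, by hand) is left open.

```python
def recovery_plan(nodes, keyspace, table=None, repair_node=None):
    """Build (node, command) steps: scrub on every node, then repair on one.

    Nothing is executed here; the result is just a list to review or print.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    target = [keyspace] + ([table] if table else [])
    # Scrub runs on EACH node, one step per node.
    plan = [(node, ["nodetool", "scrub", *target]) for node in nodes]
    # Repair runs on a single node only (any of them).
    plan.append((repair_node or nodes[0], ["nodetool", "repair", *target]))
    return plan


if __name__ == "__main__":
    steps = recovery_plan(
        ["192.168.120.57", "192.168.120.58", "192.168.120.59"],
        "recommendersjobs",
        "postulationsbyuser",
    )
    for node, command in steps:
        print(node, " ".join(command))
```

Starting with one table, as suggested above, just means passing the table name; leaving it out produces the keyspace-wide variant.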
Related
inserting in table emp1234 but got an error in cassandra
This is what I tried:
COPY emp1234(user_id,age,city,country,height,mobile_no,name,sex,state,weight) FROM '/tmp/user.csv' WITH HEADER = FALSE;
But it returns the following error:
error - Failed to import 17 rows: WriteTimeout - Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}, will retry later, attempt 1 of 5
The most probable reason is that the DC name in the keyspace's replication settings differs from the DC name in your cluster. Get the DC name(s) via nodetool status and check the names in the keyspace settings (via DESCRIBE keyspace_name;). They must be identical, including character case. If they differ, change the replication settings via alter KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_name': X};
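The comparison described above is easy to get wrong by eye when the names differ only in case. As a rough sketch, assuming the Datacenter: header line that nodetool status prints and a replication map copied by hand from the DESCRIBE output, it could look like this (both function names and the sample DC names are hypothetical):

```python
import re


def datacenters_from_status(text):
    """Collect DC names from the 'Datacenter: <name>' lines of nodetool status."""
    return set(re.findall(r"^Datacenter:\s*(\S+)", text, flags=re.MULTILINE))


def unknown_datacenters(replication, cluster_dcs):
    """Return DC names used in the keyspace that the cluster does not know.

    Only meaningful for NetworkTopologyStrategy; the comparison is
    case-sensitive on purpose.
    """
    if not replication.get("class", "").endswith("NetworkTopologyStrategy"):
        return []
    return sorted(
        name for name in replication
        if name != "class" and name not in cluster_dcs
    )


if __name__ == "__main__":
    status = "Datacenter: dc1\n===============\n"
    replication = {"class": "NetworkTopologyStrategy", "DC1": "1"}
    print(unknown_datacenters(replication, datacenters_from_status(status)))
```

An empty result means every DC named in the keyspace exists in the cluster; anything listed is a name that no node belongs to.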
OverloadException in single node cassandra
I read through this page: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesHintedHandoff.html and learned that OverloadedException in Cassandra is thrown because "The coordinator tracks how many hints it is currently writing, and if the number increases too much, the coordinator refuses writes and throws the OverloadedException exception."

But I use a single node and still get the OverloadedException frequently. What could be the reason for an OverloadedException on a single node with consistency 1 and replication factor 1?

EDITED: Total hints in progress JMX

I checked in the code:

private static void checkHintOverload(InetAddressAndPort destination)
{
    // avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
    // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
    // a small number of nodes causing problems, so we should avoid shutting down writes completely to
    // healthy nodes. Any node with no hintsInProgress is considered healthy.
    if (StorageMetrics.totalHintsInProgress.getCount() > maxHintsInProgress
        && (getHintsInProgressFor(destination).get() > 0 && shouldHint(destination)))
    {
        throw new OverloadedException("Too many in flight hints: " + StorageMetrics.totalHintsInProgress.getCount() +
                                      " destination: " + destination +
                                      " destination hints: " + getHintsInProgressFor(destination).get());
    }
}

1) Is this the only way to get the OverloadedException?
2) Why am I getting the OverloadedException on a single node?
3) When is this checkHintOverload method called on a single node?

NOTE:
1) My keyspace is configured with NetworkTopologyStrategy. Could that be a reason?
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter_1': '1'} AND durable_writes = true;
2) hinted_handoff = enabled in cassandra.yaml. Would it still trigger hints on a single node, and if so, why?
3) The consistency level is QUORUM on this single node.

Could any of these three parameters be a reason for this?
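To make the condition in the quoted checkHintOverload method easier to reason about, here is a small Python restatement of just its boolean test. It is a sketch for experimenting with values, not Cassandra code: the function name `should_reject_write` and all the numbers in the example calls are made up for illustration.

```python
def should_reject_write(total_hints_in_progress,
                        max_hints_in_progress,
                        destination_hints_in_progress,
                        should_hint_destination):
    """Restate the boolean test from the quoted checkHintOverload method.

    All three parts must hold: the global hint count exceeds the limit,
    the destination itself has hints in flight, and hinting is allowed
    for that destination.
    """
    return (total_hints_in_progress > max_hints_in_progress
            and destination_hints_in_progress > 0
            and should_hint_destination)

# Hypothetical values, chosen only to exercise each branch of the condition.
print(should_reject_write(2000, 1024, 5, True))    # all three parts hold
print(should_reject_write(2000, 1024, 0, True))    # destination has no hints in flight
print(should_reject_write(500, 1024, 5, True))     # global count is under the limit
```

Read this way, the quoted check rejects a write only when the destination already has hints in flight, which is why the in-progress hint counters are the values to inspect.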
Cassandra issuing an error while selecting the data in table "NoHostAvailable:"
I have created a keyspace and a table on a Cassandra 3.0 server. I am using a 3-node architecture. All three servers are running and I am able to connect to the 3 nodes. However, when I insert or select data using CQL, it shows the error "NoHostAvailable:". Could anyone please explain the reason and a solution for this issue?

Topology

nodetool status output
UN 172.30.1.7 230.22 KB 256 ? 2103dcd3-f09b-47da-a187-bf28b42b918e rack1
DN 172.30.1.20 ? 256 ? 683db65d-0836-40e4-ab5b-fa0db20bae30 rack1
DN 172.30.1.2 ? 256 ? 2b1f15d1-2f92-41ef-a03e-0e5f5f578cf4 rack1

Schema

Keyspace
CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2};

Table
CREATE TABLE testrep(id INT PRIMARY KEY);
Note that, according to nodetool status, 2 of the 3 nodes in your cluster are down (DN). You might be inserting with a consistency level that cannot be satisfied. For example, on a single-node cluster with a keyspace that has replication_factor 2, an insert at consistency ONE succeeds but an insert at consistency TWO fails:

nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 237.31 MiB 256 ? 3c8a8d8d-992c-4b7c-a220-6951e37870c6 rack1

cassandra#cqlsh> create KEYSPACE qqq WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
cassandra#cqlsh> use qqq;
cassandra#cqlsh:qqq> CREATE TABLE testrep(id INT PRIMARY KEY);
cassandra#cqlsh:qqq> insert into testrep (id) VALUES ( 1);
cassandra#cqlsh:qqq> CONSISTENCY
Current consistency level is ONE.
cassandra#cqlsh:qqq> CONSISTENCY TWO ;
Consistency level set to TWO.
cassandra#cqlsh:qqq> insert into testrep (id) VALUES (2);
NoHostAvailable:
cassandra#cqlsh:qqq> exit
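The arithmetic behind "a consistency level that cannot be satisfied" can be sketched in Python. This is a simplified illustration under stated assumptions: it covers only the levels ONE, TWO, THREE, QUORUM and ALL in a single datacenter, and `required_replicas` and `can_satisfy` are hypothetical helper names, not driver or server APIs.

```python
def required_replicas(consistency, replication_factor):
    """Number of replica acknowledgements a consistency level needs.

    Simplified: datacenter-aware levels such as LOCAL_QUORUM are not handled.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency.upper()]

def can_satisfy(consistency, replication_factor, live_replicas):
    """True if enough replicas of the partition are alive for the request."""
    return live_replicas >= required_replicas(consistency, replication_factor)

# Single live node, keyspace with replication_factor 2 (the cqlsh session above).
for level in ("ONE", "TWO", "QUORUM", "ALL"):
    print(level, can_satisfy(level, replication_factor=2, live_replicas=1))
```

With one live replica and replication_factor 2, only ONE can be satisfied, because QUORUM for a replication factor of 2 is also 2. In a cluster where some partitions have all of their replicas on down nodes, even ONE fails for those partitions.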
DSE cassandra not starting
We are facing a problem: we have a cluster of 5 nodes, and after a restart DSE tries to start without success. The last record in system.log is below. I tried heap sizes of 48 and 64; the node has 128GB. Three of the nodes started, but these two cannot. There is no error in the log, just this record:

INFO [main] 2017-05-16 21:16:27,507 CassandraDaemon.java:487 - JVM Arguments: [-Ddse.server_process, -XX:+AlwaysPreTouch, -Dcassandra.disable_auth_caches_remote_configuration=false, -Dcassandra.force_default_indexing_page_size=false, -Dcassandra.join_ring=true, -Dcassandra.load_ring_state=true, -Dcassandra.write_survey=false, -XX:CMSInitiatingOccupancyFraction=75, -XX:CMSWaitDuration=10000, -ea, -XX:G1RSetUpdatingPauseTimePercent=5, -XX:+HeapDumpOnOutOfMemoryError, -Xms16G, -Djava.net.preferIPv4Stack=true, -XX:MaxGCPauseMillis=500, -Xmx16G, -XX:MaxTenuringThreshold=1, -Xss256k, -XX:+PerfDisableSharedMem, -XX:+ResizeTLAB, -XX:StringTableSize=1000003, -XX:SurvivorRatio=8, -XX:ThreadPriorityPolicy=42, -XX:+UseThreadPriorities, -XX:+UseTLAB, -XX:+UseG1GC, -Dcom.sun.management.jmxremote.authenticate=false, -Dcassandra.jmx.local.port=7199, -XX:CompileCommandFile=/etc/dse/cassandra/hotspot_compiler, -javaagent:/usr/share/dse/cassandra/lib/jamm-0.3.0.jar, -Djava.library.path=/usr/share/dse/hadoop2-client/lib/native:/usr/share/dse/cassandra/lib/sigar-bin:/usr/share/dse/hadoop2-client/lib/native:/usr/share/dse/cassandra/lib/sigar-bin:, -Dguice_include_stack_traces=OFF, -Ddse.system_memory_in_mb=128658, -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader, -Dguice_include_stack_traces=OFF, -Ddse.system_memory_in_mb=128658, -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/var/log/cassandra, -Dcassandra.storagedir=/usr/share/dse/data, -Dcassandra-pidfile=/var/run/dse/dse.pid, -Dgraph-enabled=true, -XX:HeapDumpPath=/var/lib/cassandra/java_1494958565.hprof,
-XX:ErrorFile=/var/lib/cassandra/hs_err_1494958565.log, -Dguice_include_stack_traces=OFF, -Ddse.system_memory_in_mb=128658, -Dcassandra.config.loader=com.datastax.bdp.config
How to debug why hints don't get processed after all nodes are up again
Today I did some extended maintenance on node d1r1n3 of a 14-node dsc 2.1.15 cluster, but finished well within the cluster's max hint window. After I brought the node back up, the hints on most other nodes disappeared within minutes, except on two nodes (d1r1n4 and d1r1n7), where only part of the hints went away. After a few hours of them still showing 1 active hinted handoff task, I restarted node d1r1n7, and then d1r1n4 quickly emptied its hint table.

How can I see which node the hints stored on d1r1n7 are destined for? And, if possible, how can I get the hints processed?

Update: I later found that d1r1n7's hints had vanished at a time corresponding to the end of the max hint window after node d1r1n3 was taken offline for maintenance. This left us unsure whether this was okay. Had the hints been processed, or had they somehow just expired at the end of the max hint window? If the latter, would we need to run a repair on node d1r1n3 after its maintenance (this takes quite some time and IO... :/)? What if we now read at [LOCAL]QUORUM instead of the current ONE, with one DC and RF=3: could this trigger read repairs on an as-needed basis and maybe spare us a full repair in this case?

Answer: it turned out hinted_handoff_throttle_in_kb was at the default of 1024 on these two nodes, while the rest of the cluster was at 65536 :)
In Cassandra 2.1.15, hints are stored in the system.hints table:

cqlsh> describe table system.hints;

CREATE TABLE system.hints (
    target_id uuid,
    hint_id timeuuid,
    message_version int,
    mutation blob,
    PRIMARY KEY (target_id, hint_id, message_version)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (hint_id ASC, message_version ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = 'hints awaiting delivery'
    AND compaction = {'enabled': 'false', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 3600000
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

The target_id correlates with the node's host ID. For example, in my sample 2-node cluster with RF=2:

nodetool status
Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 71.47 KB 256 100.0% d00c4b10-2997-4411-9fc9-f6d9f6077916 rack1
DN 127.0.0.2 75.4 KB 256 100.0% 1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa rack1

I executed the following while node2 was down:

cqlsh> insert into ks.cf (key,val) values (1,1);
cqlsh> select * from system.hints;

 target_id                            | hint_id                              | message_version | mutation
--------------------------------------+--------------------------------------+-----------------+----------
 1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa | e80a6230-ec8c-11e6-a1fd-d743d945c76e | 8               | 0x0004000000010000000101cfb4fba0ec8c11e6a1fdd743d945c76e7fffffff80000000000000000000000000000002000300000000000547df7ba68692000000000006000376616c0000000547df7ba686920000000400000001

(1 rows)

As can be seen, system.hints.target_id correlates with the host ID in nodetool status (1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa).
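The manual lookup above (match each target_id against the Host ID column) can be automated with a short Python sketch. It assumes the `nodetool status` output and the list of target_id values from system.hints have been copied into Python by hand; `host_ids` and `hints_per_node` are hypothetical helper names, and the regular expression is only an assumption about the row layout shown in this answer.

```python
import re
from collections import Counter

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

def host_ids(status_output):
    """Map host ID -> address using the node rows of `nodetool status`."""
    mapping = {}
    for line in status_output.splitlines():
        # Node rows start with a two-letter status/state code such as UN or DN.
        match = re.match(r"^[UD][NLJM]\s+(\S+)\s.*?(" + UUID + r")", line.strip())
        if match:
            mapping[match.group(2)] = match.group(1)
    return mapping

def hints_per_node(target_ids, status_output):
    """Count stored hints per destination address."""
    ids = host_ids(status_output)
    counts = Counter(target_ids)
    return {ids.get(target, "unknown host " + target): n for target, n in counts.items()}

status = """Datacenter: datacenter1
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 71.47 KB 256 100.0% d00c4b10-2997-4411-9fc9-f6d9f6077916 rack1
DN 127.0.0.2 75.4 KB 256 100.0% 1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa rack1
"""
targets = ["1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa"]  # target_id column of system.hints

print(hints_per_node(targets, status))
```

For the sample data this attributes the single stored hint to 127.0.0.2, the node that was down when the insert ran.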