Exceptions bringing down Cassandra 3.3 cluster

This is a rather long post covering several seemingly related issues; we've posted it to the Cassandra user list as well. We're using Cassandra 3.3, and our development team has been trying to track down some new issues on one of our pre-prod environments, where we've been seeing consistent failures (every day, or every 2-3 days), even when the load/number of transactions is very light.
We're running a 2 data center deployment with 3 nodes in each data center. Our tables are set up with replication factor = 2, and we have 16G dedicated to the heap, with G1GC for garbage collection. Our systems are AWS M4.2xlarge instances with 8 CPUs and 32GB of RAM, and each node has 2 general purpose EBS volumes of 500GB each.
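For reference, here is a small sketch of what a replication factor of 2 means for quorum reads and writes. It assumes RF = 2 in each of the two data centers (the repair sessions further down sync four endpoints per range, which would fit that reading), and the data center names `dc1` and `dc2` are made up for illustration. A quorum over n replicas is floor(n / 2) + 1.

```python
def quorum(replicas: int) -> int:
    """Number of replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Assumption: RF = 2 in each data center; "dc1"/"dc2" are hypothetical names.
replication = {"dc1": 2, "dc2": 2}

for dc, rf in replication.items():
    needed = quorum(rf)
    print(f"{dc}: LOCAL_QUORUM needs {needed} of {rf} replicas, "
          f"tolerates {rf - needed} replica(s) down")

total_rf = sum(replication.values())
needed = quorum(total_rf)
print(f"QUORUM needs {needed} of {total_rf} replicas, "
      f"tolerates {total_rf - needed} replica(s) down")
```

Under these assumptions, a LOCAL_QUORUM request needs both local replicas, so a single struggling node can be enough to fail requests for the ranges it owns. Whether that applies here depends on the consistency level the application uses, which is not stated above.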
Once we hit this, it seems the only way to recover is to shut down the cluster and restart it. Running repairs after the restart often results in failures, and we pretty much end up having to truncate the tables before starting up clean again. We are not sure if the two are inter-related. We see pretty much the same issue on all the nodes.
If anyone has any tips or suggestions on how to diagnose/debug this further, it would help a great deal! The issues are:
Issue 1: Once these errors occur, they repeat for a while and are then followed by the errors in Issue 2.
INFO [CompactionExecutor:165] 2017-01-08 08:32:39,915 AutoSavingCache.java:386 - Saved KeyCache (63 items) in 5 ms
INFO [IndexSummaryManager:1] 2017-01-08 08:32:41,438 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [HANDSHAKE-ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224] 2017-01-08 09:30:03,988 OutboundTcpConnection.java:505 - Handshaking version with ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224
INFO [IndexSummaryManager:1] 2017-01-08 09:32:41,440 IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN [SharedPool-Worker-9] 2017-01-08 10:30:00,116 BatchStatement.java:289 - Batch of prepared statements for [manhattan.rcmessages] is of size 9264, exceeding specified threshold of 5120 by 4144.
INFO [IndexSummaryManager:1] 2017-01-08 10:32:41,442 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [IndexSummaryManager:1] 2017-01-08 11:32:41,443 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [CompactionExecutor:162] 2017-01-08 12:32:39,914 AutoSavingCache.java:386 - Saved KeyCache (108 items) in 4 ms
INFO [IndexSummaryManager:1] 2017-01-08 12:32:41,446 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [IndexSummaryManager:1] 2017-01-08 13:32:41,448 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [IndexSummaryManager:1] 2017-01-08 14:32:41,450 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [IndexSummaryManager:1] 2017-01-08 15:32:41,451 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO [CompactionExecutor:170] 2017-01-08 16:32:39,915 AutoSavingCache.java:386 - Saved KeyCache (109 items) in 4 ms
INFO [IndexSummaryManager:1] 2017-01-08 16:32:41,453 IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN [SharedPool-Worker-4] 2017-01-08 17:30:45,048 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-4,5,main]: {}
java.lang.AssertionError: null
at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:88) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:83) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.purge(BufferCell.java:175) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.lambda$purge$107(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.transformAndFilter(ComplexColumnData.java:170) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:43) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BTreeRow.lambda$purge$102(BTreeRow.java:333) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BTreeRow.transformAndFilter(BTreeRow.java:338) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BTreeRow.purge(BTreeRow.java:333) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.partitions.PurgeFunction.applyToRow(PurgeFunction.java:88) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:116) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:133) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:294) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:292) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:50) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-3.3.0.jar:3.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Issue 2:
WARN [SharedPool-Worker-2] 2017-01-09 05:18:58,880 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461) ~[apache-cassandra-3.3.0.jar:3.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.NullPointerException: null
WARN [SharedPool-Worker-1] 2017-01-09 05:19:14,220 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461) ~[apache-cassandra-3.3.0.jar:3.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.NullPointerException: null
WARN [SharedPool-Worker-1] 2017-01-09 05:19:29,876 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.NullPointerException: null
WARN [SharedPool-Worker-1] 2017-01-09 05:19:45,217 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.NullPointerException: null
WARN [SharedPool-Worker-2] 2017-01-09 05:20:00,241 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
Issue 3: Periodic warnings about leaks (this was during a repair) - we had an OOM occurrence about a week ago:
INFO [Thread-9] 2017-01-09 15:01:11,828 RepairSession.java:237 - [repair #7593ad41-d67c-11e6-a76f-f1d6fd321f0d] new session: will sync ahldataslave1.bos.manhattan.aspect-cloud.net/10.184.8.151, /10.184.8.137, /10.184.8.219, /10.184.8.224 on range [(4815877001221178276,4825317823493984218], (3375739423664266689,3378070249246945856], (-7393599445401300009,-7393031105803218575], (-8130357422229977058,-8127934975152946543], (-4611683212167667055,-4597307445147354114], (4829191785668845783,4847928037092763493], (5522544388079333104,5528633388915935404], (1021118078158195547,1032730403399741255], (-20739833830226207,-14719054523073734], (-6205449048248449637,-6203730539323274527], (45435932670861622,69554714149031706], (148647262653106287,181904829526383862], (-854190757691833472,-854087983203765538], (-7714687510919982336,-7700463340346726648], (-360739232930366185,-339249720009847821], (524595904743716879,537371267474489752], (8185882494697143291,8187345181917158585], (181904829526383862,190096166859570216], (-1152328644109499972,-1134711681717758961], (-586016578220869954,-566687033455187734], (-4597307445147354114,-4541034774723552824], (-4488655272471923458,-4481233629800439836], (3111053603069111732,3121096557789915147], (-1167831376731724655,-1152328644109499972], (-6302582962631272172,-6290099341099600242], (8082429902489189721,8084233962135014968], (-2781993705200227829,-2777492754784519178], (5996816756578981045,6022011570976925279], (4404919629779433734,4422232637329318322], (-643317786131863042,-611582394998065454], (4514099138058135901,4514906323961126580], (7273471227296986256,7316794306200362675], (4462984376163219141,4464984714163480487], (-2390509810947844616,-2382282465250439153], (-7773729052257068390,-7772926086336061342], (-161569209320145017,-149914302996366392], (1392011488470148469,1403436343776848669], (546940383302067460,574197777751440495], (6624568645457217871,6629411218351717560], (-1134711681717758961,-1133755953039077472], 
(-7248281701751445296,-7243217407431962906], (2374653705052569734,2377396978942737768], (5619550734131192401,5628464353424177974], (-192925818286153751,-181902834109073167], (5896301416900157412,5911734690958039138], (4422232637329318322,4462984376163219141], (-4524564105581793254,-4504900851715014673], (-2115528876758613050,-2112322288682500569], (5504098260114685179,5515264579319185925], (35614519826275404,45435932670861622], (-855913784497695841,-854190757691833472], (5736245318761313405,5738445441111493493], (4807463771329802801,4815877001221178276], (-2884278504117308670,-2881329894351532736], (-207545241327995479,-207389332352346254], (-181902834109073167,-172819893538312319], (-7636862515759014032,-7626686102011136779], (-8140870857704067579,-8130357422229977058], (-2342122616740401439,-2335433185444036860], (-2881329894351532736,-2880349360092682175], (-611582394998065454,-587918354120388686], (-7320730625612121443,-7317644215584737598], (-13045270760172222,8260384715982984], (-8668935926714005039,-8657580917928283146], (8260384715982984,35614519826275404], (2861067483327829799,2864748323206840859], (3127114336475141934,3139163273099443478], (-7657420721193956821,-7636862515759014032], (1032730403399741255,1032951438204062821], (-1133755953039077472,-1130095656542041882], (5515264579319185925,5522544388079333104], (6826479291574328528,6828593995008227175], (658600138550738740,680809612853177633], (2364271158745468459,2374653705052569734], (2864748323206840859,2868871349367827459], (-7741910906325473118,-7738712583959975589], (2749562470526527976,2765502980561577063], (4569656624441412194,4581277061087905130], (-1130095656542041882,-1102822102812415509], (2357686856375407577,2364271158745468459], (2875146491882456097,2913694653712675567], (-2365721528619247416,-2362417204623405805], (-7700463340346726648,-7657420721193956821], (-1545400958003324469,-1527145992709891047], (6595468060656389384,6611122695943865425], (-2362417204623405805,-2360989147149959339], 
(680809612853177633,691564499046700936], (1048818468740031469,1051848236340749987], (-6290099341099600242,-6284139189426371408], (-430511209136180745,-417631748808875679], (-2127191227331961211,-2115528876758613050], (-587918354120388686,-586016578220869954], (5950659959556897324,5972536818860340742], (7228681112293176899,7273471227296986256], (-92115275177849392,-73078256610822906], (-2150182737057335813,-2131084968707606517], (-5931891237188071234,-5925084712702354100], (1854063092500381970,1894128417137542926], (3139163273099443478,3144353879892041737], (2795130016444690338,2849989872521310320], (69554714149031706,73259277905023853], (-198884270246085916,-192925818286153751], (2301867852127537258,2344138511230415627], (-2360989147149959339,-2342122616740401439], (2868871349367827459,2875146491882456097], (-8520120040026037727,-8484758828657908805], (1044175704807438060,1048818468740031469], (3658700192210846721,3709106004630729567], (-8715352510373593584,-8695484052773103051], (4825317823493984218,4829191785668845783], (5881177751245354064,5896301416900157412], (-1557733846751863547,-1545400958003324469], (3764340229927726801,3772352737530248438], (5723087139361841478,5736245318761313405], (-339249720009847821,-325063001525001753], (4581277061087905130,4585591969590606261], (2849989872521310320,2861067483327829799], (-201641945389990466,-198884270246085916], (3062119490734517386,3070339219117072282], (6611989977644669113,6624568645457217871], (-6303561532875621980,-6302582962631272172], (-2880349360092682175,-2815907193866317595], (-2210673969380244356,-2200655423201071981], (6920406973111913689,6944396390330370813], (3070339219117072282,3089918739923672617], (1018479337556218583,1021118078158195547], (-22373917471115570,-20802161777184740], (2349224556257120691,2357686856375407577], (1610291395978708348,1610350862682840167], (-7774308296105089774,-7773729052257068390], (-207389332352346254,-201641945389990466], (-4504900851715014673,-4488655272471923458], 
(-913362829471550684,-900550299179500948], (574197777751440495,575687259162456966], (-209852186312631406,-207545241327995479], (3961560378473489131,4029122901768525705], (1059898575782522398,1065913666232086022], (6611122695943865425,6611989977644669113], (-2815907193866317595,-2781993705200227829], (-8149621014270042242,-8140870857704067579], (-999997390934570392,-998678768267637866], (2377396978942737768,2384267095612056863], (-2267001694418802171,-2210871230482048364], (-2131084968707606517,-2127191227331961211], (3144353879892041737,3183343610940682057], (-2210871230482048364,-2210673969380244356], (4399306102077085676,4404919629779433734], (537371267474489752,546940383302067460], (-7722143244670988745,-7714687510919982336], (3754196018068729018,3763367677110000435], (-5925084712702354100,-5919412889821651249], (4585591969590606261,4590124192625756967], (-1189315862160445133,-1167831376731724655], (-14719054523073734,-13045270760172222], (3772352737530248438,3791853264732937441], (-149914302996366392,-138596249554837825], (-1459586834232833970,-1455782887662096513], (-7626686102011136779,-7613607518487725198], (-2370342707880863831,-2365721528619247416], (1051848236340749987,1059898575782522398], (575687259162456966,587747968172054843], (1846675642540253784,1854063092500381970], (5911734690958039138,5950659959556897324], (3709106004630729567,3754196018068729018], (2274687735800823626,2301867852127537258], (3763367677110000435,3764340229927726801], (2765502980561577063,2768302382460911776], (-4835698441475193016,-4821740368643417588], (-4840592682190099042,-4835698441475193016], (-900550299179500948,-898509802837897450], (5972536818860340742,5996816756578981045], (7316794306200362675,7316948326538779302], (-20802161777184740,-20739833830226207], (-8695484052773103051,-8693825425156218047], (-6228811049824135380,-6205449048248449637], (-8522744988679262888,-8520120040026037727], (-1527145992709891047,-1522580222898594256], 
(1065913666232086022,1070185844169898505], (3121096557789915147,3127114336475141934]] for manhattan.[config, messages, rcmessages, csseedmessages, metadata, rqmessages]
ERROR [Reference-Reaper:1] 2017-01-09 15:01:12,481 Ref.java:202 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#8520181) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#110489339:/data/manhattan/messages-5e11ec20d49411e6ac36cb64da3c5bcf/ma-1-big was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-01-09 15:01:12,481 Ref.java:202 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#2aa6af1) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#1202250944:/data/manhattan/rcmessages-5f65e360d49411e6ac36cb64da3c5bcf/ma-1-big was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-01-09 15:01:12,481 Ref.java:202 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#5feebbbd) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#1990550669:/data/manhattan/rcmessages-5f65e360d49411e6ac36cb64da3c5bcf/ma-2-big was not released before the reference was garbage collected
Issue 4: Failure on repair (cannot complete repair) - validation failures.
INFO [Thread-9] 2017-01-09 15:01:12,496 RepairSession.java:237 - [repair #75f99b00-d67c-11e6-a76f-f1d6fd321f0d] new session: will sync ahldataslave1.bos.manhattan.aspect-cloud.net/10.184.8.151, /10.184.8.136, /10.184.8.219, /10.184.8.206 on range [(7070473513877450523,7071785512108503562], (7540323476844137856,7551069853951512536], (6496851087781126402,6506947825241172690], (4956723654110177663,4997988272614120716], (6159728418399742238,6161785046042182181], (-6111201816165886930,-6109358652076682027], (6765434292326999735,6786216263996459971], (5670991529662923445,5678922736379606122], (-8368452819419163190,-8352559832808839636], (-3579114434972148656,-3552744642655983590], (7574005965138357735,7575976736757457093], (3443342880213529041,3448513378658973391], (-6374441519731158925,-6368083975098827965], (1506273267531550913,1508155162580364084], (-3841814855559153943,-3835938678769454001], (9170965416227969215,9181174655986386320], (-6589650497263617394,-6578994961053045565], (-7526495575912144496,-7518448902193774650], (3448513378658973391,3466857105667177973], (7553098986919382391,7574005965138357735], (-7927877550785715833,-7927648953253692861], (8525761079018902513,8533541215797136240], (-9020129600263662082,-9010501329825003246], (1090724998362773160,1097939939665235297], (6495099300248039972,6496851087781126402], (6161785046042182181,6230901034893815059], (-7463338857564269606,-7459175745770455573], (-5365113529531278290,-5358403398116793300], (2249898065901245108,2256752991177466019], (427458615961451939,456792456721828910], (-7518448902193774650,-7476577165771266744], (8496964666711176665,8503454084084785332], (1508155162580364084,1531943799366406071], (1623939448991457330,1628591069075100791], (2500969373198505595,2510256322219581447], (1487072216030226303,1506273267531550913], (-8842784826526224555,-8840216173643212475], (2233348668129647030,2247658708769712814], (-8289206614243730701,-8287074180401539509], (-7476577165771266744,-7470389682785988771], 
(3509709684061378867,3512938839959595162], (1296769759309759534,1302830049359124705], (7575976736757457093,7612135532866611803], (7539950122318008545,7540323476844137856], (-1860975172751451123,-1853676643512030698], (-8116786947316919605,-8107369675963369796], (-5576866678991811211,-5548824138592035245], (-9211881728677655622,-9209566963086044390], (7489367377277174970,7511259639253940249], (1302830049359124705,1310010318365119318], (-8750733220711433842,-8741408722769214425], (-6060376367842620341,-6049131555099770258], (2476529278549364684,2476659930520260878], (-4418742343319617797,-4384333951548488071], (1930660747072472501,1946687557062422024], (-3702380150896445622,-3691910222307497728], (-4419703203609450987,-4418742343319617797], (7551697213870628887,7553098986919382391], (-7909869864636951454,-7904683595914140744], (2492882420790179151,2500969373198505595], (759685512451005610,773693246847159219], (722689136592207691,730821328907957274], (5219025897237634110,5229566135848567231], (4149517035748854868,4162833917677490963], (1326132699179277905,1349642653396670633], (-3552744642655983590,-3549073980410969181], (-4456698363145934960,-4419703203609450987], (-5482989026069906323,-5481213659503324794], (279535406932511121,297415597645494493], (-9204266745720873498,-9161795161892156657], (6786216263996459971,6786883290959946615], (1635462812102810232,1638975594001298412], (1531943799366406071,1546279755200334429], (-3851094911406737954,-3841814855559153943], (8488130353439482734,8496964666711176665], (9146936609495586767,9148689706364420814], (-3674452659583777301,-3669778494255064100], (1546279755200334429,1554992050513055527], (2969501640373218313,2975603144417796179], (-3865994019611705472,-3851094911406737954], (1119818899797211952,1121114266048498658], (5555981883036126656,5559877922798642583], (-9161795161892156657,-9148239227447645822], (-1988480938545038453,-1972594091279542463], (-8457443005098123276,-8426466451426413266], 
(7527886006442742638,7539950122318008545], (1482724986864293552,1487072216030226303], (218814708185728764,222373457441562248], (-1844742172330357357,-1834186452017037646], (-7914786200548736052,-7909869864636951454], (1554992050513055527,1556471756149638584], (2247658708769712814,2249241162600622119], (2249241162600622119,2249898065901245108], (-6368083975098827965,-6357129727004451888], (1349642653396670633,1372390693080139703], (-5628093303198956105,-5602276528881256607], (-5602276528881256607,-5576866678991811211], (-1972594091279542463,-1932380490869749103], (6746187069375198686,6759453964579921534], (-9209566963086044390,-9204266745720873498], (2969169700471726709,2969501640373218313], (-1846725725884736711,-1844742172330357357], (8466845312658849111,8478632986527402947], (6786883290959946615,6786909234664957611], (-8105207391882034800,-8058995460560987960], (-5481213659503324794,-5480151738028046165], (6152943806705127166,6159728418399742238], (-7931416471770778308,-7927877550785715833], (-7444305315017293180,-7438698090368547807], (-5405765462772655927,-5365113529531278290], (7468218535666030613,7489367377277174970], (7370990959551251700,7384554095789173771], (9181174655986386320,9197356813903415489], (-7891619558235712999,-7886926665775314135], (4863660993721847898,4876110806493712229], (5546813895398001993,5555981883036126656], (6793672076817085638,6817463007248949484], (730821328907957274,741073317079317196], (1285297621466628116,1296769759309759534], (-6109358652076682027,-6097749005908966679], (-3160359268440132710,-3150255443388600113], (-6067028212437311851,-6060376367842620341], (3512938839959595162,3531669213417260709], (-1641724835017736571,-1630212592242213627], (-5631109618795517413,-5628093303198956105], (6759453964579921534,6765434292326999735], (8027648385044762960,8030142266857918084], (1264145341476479882,1285297621466628116], (1116504555019555851,1119818899797211952], (-3139664283152392978,-3138393885745606369], 
(-1652241560457928053,-1641724835017736571], (8503454084084785332,8525761079018902513], (-1853676643512030698,-1846725725884736711], (1628591069075100791,1635462812102810232], (-8840216173643212475,-8817712049837887184], (-6357129727004451888,-6355060010614218705], (-6097749005908966679,-6088616433680186631], (-7904683595914140744,-7891619558235712999], (-4952793941753266051,-4937049631932742604], (-3150255443388600113,-3139664283152392978], (2512801900471921714,2516325796899709313], (2510256322219581447,2512801900471921714], (-5638226176228799616,-5631109618795517413], (-7886926665775314135,-7882937045572193145], (6786909234664957611,6793672076817085638], (-3691910222307497728,-3674452659583777301], (-4867940821711541762,-4866946150328928181], (7071785512108503562,7081793743588959944], (4876110806493712229,4879643371134562502], (6138845154266790671,6152943806705127166], (-6355060010614218705,-6332373169802347251], (-6125214696594291714,-6111201816165886930], (-7470389682785988771,-7463338857564269606], (-3598204681868075295,-3579114434972148656], (-8107369675963369796,-8105207391882034800], (7612135532866611803,7641727005250566009], (9148689706364420814,9170965416227969215], (6506947825241172690,6515762919020447377], (-6088616433680186631,-6067028212437311851], (741073317079317196,759685512451005610], (-8352559832808839636,-8344010281020219217], (-6049131555099770258,-6043632332003790850], (3494099643169044463,3509709684061378867], (3466857105667177973,3494099643169044463], (7551069853951512536,7551697213870628887], (2476659930520260878,2492882420790179151], (-8842911646480810622,-8842784826526224555], (193808271419729511,198960416318160202], (-4873803694032119901,-4867940821711541762], (-7569775426219162591,-7569610743922997927], (4730391117872973381,4732570096640666262], (9137762516596245936,9146936609495586767], (-9148239227447645822,-9137147760354721369], (-8296203956052422690,-8289206614243730701], (198960416318160202,218814708185728764]] for 
manhattan.[config, messages, rcmessages, csseedmessages, metadata, rqmessages]
WARN [SharedPool-Worker-2] 2017-01-09 15:01:38,219 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461) ~[apache-cassandra-3.3.0.jar:3.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.NullPointerException: null
WARN [SharedPool-Worker-2] 2017-01-09 15:04:41,678 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461) ~[apache-cassandra-3.3.0.jar:3.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
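One way to get an overview of logs like the excerpts above is to count WARN/ERROR lines per source file and tally the exception types that follow them. The sketch below is only an illustration: the regular expressions are written against the line layout visible in this post, not against any official log format definition, and the sample lines are copied from the excerpts.

```python
import re
from collections import Counter

# Layout seen above, e.g.
# "WARN [SharedPool-Worker-2] 2017-01-09 05:18:58,880 Foo.java:169 - message"
LOG_LINE = re.compile(
    r"^(?P<level>INFO|WARN|ERROR)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+"
    r"(?P<source>\S+):\d+ - (?P<msg>.*)$"
)
# First line of a stack trace, e.g. "java.lang.AssertionError: null"
EXCEPTION_LINE = re.compile(
    r"^(?P<exc>(?:[a-z_]\w*\.)+[A-Z]\w*(?:Exception|Error))\b"
)


def summarize(lines):
    """Count WARN/ERROR lines per source file and top-level exception types."""
    by_level_source = Counter()
    exceptions = Counter()
    first_seen = {}
    for raw in lines:
        line = raw.strip()
        m = LOG_LINE.match(line)
        if m:
            if m["level"] in ("WARN", "ERROR"):
                key = (m["level"], m["source"])
                by_level_source[key] += 1
                first_seen.setdefault(key, m["ts"])  # keep earliest timestamp
            continue
        e = EXCEPTION_LINE.match(line)
        if e:  # "at ..." frames and "Caused by:" lines do not match
            exceptions[e["exc"]] += 1
    return by_level_source, exceptions, first_seen


sample = [
    "INFO [IndexSummaryManager:1] 2017-01-08 16:32:41,453 IndexSummaryRedistribution.java:74 - Redistributing index summaries",
    "WARN [SharedPool-Worker-4] 2017-01-08 17:30:45,048 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-4,5,main]: {}",
    "java.lang.AssertionError: null",
    "at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) ~[apache-cassandra-3.3.0.jar:3.3.0]",
    "WARN [SharedPool-Worker-2] 2017-01-09 05:18:58,880 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}",
    "java.lang.RuntimeException: java.lang.NullPointerException",
]

counts, exc_counts, first_seen = summarize(sample)
for (level, source), n in counts.most_common():
    print(level, source, n, "first seen", first_seen[(level, source)])
for exc, n in exc_counts.most_common():
    print(exc, n)
```

To run it against a real log, pass an open file object to `summarize` instead of `sample`. Comparing the first-seen timestamps across nodes may show whether the AssertionError in Issue 1 consistently precedes the NullPointerExceptions in Issue 2.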

Related

Multithreaded paging query against the database fails with java.sql.SQLException: GC overhead limit exceeded

1. First, I page through the order_id values from this table.
2. After getting the order_id values, I look up the matching data in this table.
The reason for this check is to make sure that the billing of each order is complete. But I ran into a problem when doing this, as follows:
2022-12-15 11:16:52.798 [,] [master housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - master - Thread starvation or clock leap detected (housekeeper delta=1m344ms).
Exception in thread "RiskOverdueBusiness-1" org.springframework.dao.TransientDataAccessResourceException:
### Error querying database. Cause: java.sql.SQLException: GC overhead limit exceeded
### The error may exist in qnvip/data/overview/mapper/risk/RiskOverdueBaseMapper.java (best guess)
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT id,renew_term,deleted,count_day,order_id,order_no,mini_type,repay_date,real_repay_time,platform,finance_type,term,risk_level,risk_strategy,audit_type,forced_conversion,discount_return_amt,rent_total,buyout_amt,act_buyout_amt,buyout_discount,overdue_fine,bond_amt,before_discount,total_discount,bond_rate,is_overdue,real_capital,capital,repay_status,overdue,real_repay_time_status,renew_total_rent,max_term,hit_value,renew_status,renew_day,surplus_bond_amt,actual_supply_price,is_settle,order_status,current_overdue_days,surplus_amt,overdue_day,term_overdue_days,renew_type,is_deleted,version,create_time,update_time FROM dataview_risk_overdue_base WHERE is_deleted=0 AND (overdue_day = ?)
### Cause: java.sql.SQLException: GC overhead limit exceeded
; GC overhead limit exceeded; nested exception is java.sql.SQLException: GC overhead limit exceeded
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:110)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:88)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:440)
at com.sun.proxy.$Proxy142.selectList(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:223)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.executeForMany(MybatisMapperMethod.java:173)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:78)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy$PlainMethodInvoker.invoke(MybatisMapperProxy.java:148)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:89)
at com.sun.proxy.$Proxy212.selectList(Unknown Source)
at com.baomidou.mybatisplus.extension.service.IService.list(IService.java:279)
at qnvip.data.overview.service.risk.impl.RiskOverdueBaseServiceImpl.getListByOrderId(RiskOverdueBaseServiceImpl.java:61)
at qnvip.data.overview.service.risk.impl.RiskOverdueBaseServiceImpl$$FastClassBySpringCGLIB$$31ab1dda.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:687)
at qnvip.data.overview.service.risk.impl.RiskOverdueBaseServiceImpl$$EnhancerBySpringCGLIB$$e7698a08.getListByOrderId(<generated>)
at qnvip.data.overview.business.risk.RiskOverdueBusinessNew.oldRepayTask(RiskOverdueBusinessNew.java:95)
at qnvip.data.overview.business.risk.RiskOverdueBusinessNew.lambda$execOldData$1(RiskOverdueBusinessNew.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: GC overhead limit exceeded
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:370)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:64)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy481.query(Unknown Source)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doQuery(MybatisSimpleExecutor.java:69)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:325)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
at com.baomidou.mybatisplus.core.executor.MybatisCachingExecutor.query(MybatisCachingExecutor.java:165)
at com.baomidou.mybatisplus.extension.plugins.MybatisPlusInterceptor.intercept(MybatisPlusInterceptor.java:81)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy480.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:147)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426)
... 18 more
2022-12-15 11:17:12.040 [,] [slave housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - slave - Thread starvation or clock leap detected (housekeeper delta=59s747ms).
Exception in thread "RiskOverdueBusiness-5" org.springframework.dao.TransientDataAccessResourceException:
### Error querying database. Cause: java.sql.SQLException: Can not read response from server. Expected to read 321 bytes, read 167 bytes before connection was unexpectedly lost.
### The error may exist in qnvip/data/overview/mapper/risk/RiskOverdueBaseMapper.java (best guess)
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### Cause: java.sql.SQLException: Can not read response from server. Expected to read 321 bytes, read 167 bytes before connection was unexpectedly lost.
; Can not read response from server. Expected to read 321 bytes, read 167 bytes before connection was unexpectedly lost.; nested exception is java.sql.SQLException: Can not read response from server. Expected to read 321 bytes, read 167 bytes before connection was unexpectedly lost.
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:110)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at com.sun.proxy.$Proxy142.selectList(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:223)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.executeForMany(MybatisMapperMethod.java:173)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:78)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy$PlainMethodInvoker.invoke(MybatisMapperProxy.java:148)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:89)
at com.sun.proxy.$Proxy212.selectList(Unknown Source)
at com.baomidou.mybatisplus.extension.service.IService.list(IService.java:279)
at qnvip.data.overview.service.risk.impl.RiskOverdueBaseServiceImpl.getListByOrderId(RiskOverdueBaseServiceImpl.java:61)
at qnvip.data.overview.service.risk.impl.RiskOverdueBaseServiceImpl$$FastClassBySpringCGLIB$$31ab1dda.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:687)
at qnvip.data.overview.service.risk.impl.RiskOverdueBaseServiceImpl$$EnhancerBySpringCGLIB$$e7698a08.getListByOrderId(<generated>)
at qnvip.data.overview.business.risk.RiskOverdueBusinessNew.oldRepayTask(RiskOverdueBusinessNew.java:95)
at qnvip.data.overview.business.risk.RiskOverdueBusinessNew.lambda$execOldData$1(RiskOverdueBusinessNew.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Can not read response from server. Expected to read 321 bytes, read 167 bytes before connection was unexpectedly lost.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:370)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:64)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
at sun.reflect.GeneratedMethodAccessor209.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy481.query(Unknown Source)
at com.baomidou.mybatisplus.core.executor.MybatisSimpleExecutor.doQuery(MybatisSimpleExecutor.java:69)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:325)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
at com.baomidou.mybatisplus.core.executor.MybatisCachingExecutor.query(MybatisCachingExecutor.java:165)
at com.baomidou.mybatisplus.extension.plugins.MybatisPlusInterceptor.intercept(MybatisPlusInterceptor.java:81)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy480.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:147)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140)
at sun.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426)
... 18 more
I thought the query was too large for its references to be released in time, but addressing that didn't seem to help. How do I deal with these exceptions to make sure that all the data gets queried?
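One approach that may help here, sketched under assumptions, is to read the table in fixed-size pages instead of materializing every matching row in one call, so that only one page is held on the heap at a time. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL and keeps only three of the columns; the function name `iter_rows_in_pages`, the page size, and the assumption that `id` is a positive, unique, indexed key are illustrative and not taken from the original code.

```python
import sqlite3


def iter_rows_in_pages(conn, overdue_day, page_size=1000):
    """Yield matching rows page by page, ordered by id (keyset pagination)."""
    last_id = 0  # assumes ids are positive integers
    while True:
        page = conn.execute(
            "SELECT id, order_id, overdue_day FROM dataview_risk_overdue_base "
            "WHERE is_deleted = 0 AND overdue_day = ? AND id > ? "
            "ORDER BY id LIMIT ?",
            (overdue_day, last_id, page_size),
        ).fetchall()
        if not page:
            return
        for row in page:
            yield row
        # Resume after the highest id seen so far.
        last_id = page[-1][0]


# Small in-memory table standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataview_risk_overdue_base ("
    "id INTEGER PRIMARY KEY, order_id INTEGER, overdue_day INTEGER, "
    "is_deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO dataview_risk_overdue_base (order_id, overdue_day) VALUES (?, ?)",
    [(n, n % 3) for n in range(1, 26)],
)

rows = list(iter_rows_in_pages(conn, overdue_day=1, page_size=4))
print(len(rows))
```

The same idea carries over to a MyBatis-Plus mapper by adding an `id > ?` condition, an `ORDER BY id`, and a `LIMIT` to the query, and processing each page before fetching the next.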

How to fix this error running Nutch 1.15 ERROR fetcher.Fetcher - Fetcher job did not succeed, job status:FAILED, reason: NA

When I'm starting a crawl using Nutch 1.15 with this:
/usr/local/nutch/bin/crawl --i -s urls/seed.txt crawldb 5
Then it starts to run and I get this error when it tries to fetch:
2019-02-10 15:29:32,021 INFO mapreduce.Job - Running job: job_local1267180618_0001
2019-02-10 15:29:32,145 INFO fetcher.FetchItemQueues - Using queue mode : byHost
2019-02-10 15:29:32,145 INFO fetcher.Fetcher - Fetcher: threads: 50
2019-02-10 15:29:32,145 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2019-02-10 15:29:32,149 INFO fetcher.QueueFeeder - QueueFeeder finished: total 1 records hit by time limit : 0
2019-02-10 15:29:32,234 WARN mapred.LocalJobRunner - job_local1267180618_0001
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
at org.apache.nutch.net.URLExemptionFilters.<init>(URLExemptionFilters.java:39)
at org.apache.nutch.fetcher.FetcherThread.<init>(FetcherThread.java:154)
at org.apache.nutch.fetcher.Fetcher$FetcherRun.run(Fetcher.java:222)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-02-10 15:29:33,023 INFO mapreduce.Job - Job job_local1267180618_0001 running in uber mode : false
2019-02-10 15:29:33,025 INFO mapreduce.Job - map 0% reduce 0%
2019-02-10 15:29:33,028 INFO mapreduce.Job - Job job_local1267180618_0001 failed with state FAILED due to: NA
2019-02-10 15:29:33,038 INFO mapreduce.Job - Counters: 0
2019-02-10 15:29:33,039 ERROR fetcher.Fetcher - Fetcher job did not succeed, job status:FAILED, reason: NA
2019-02-10 15:29:33,039 ERROR fetcher.Fetcher - Fetcher: java.lang.RuntimeException: Fetcher job did not succeed, job status:FAILED, reason: NA
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:503)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:543)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:517)
I also get this error in the console, showing the command it runs:
Error running:
/usr/local/nutch/bin/nutch fetch -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawlsites/segments/20190210152929 -noParsing -threads 50
I had to delete the Nutch folder and do a fresh install; it worked after that.

Spark ML ALS collaborative filtering always fails if there are more than 20 iterations [duplicate]

My data set is about 3 GB and has 380 million rows. Training always fails when I increase the number of iterations. Increasing memory, increasing or decreasing the number of blocks, and decreasing the checkpoint interval do not solve my problem.
Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method)
The suggested method of setting a small checkpoint interval does not solve my problem.
StackOverflow-error when applying pyspark ALS's "recommendProductsForUsers" (although cluster of >300GB Ram available)
This is the DataFrame for ALS training, which has about 380 million rows.
+---------+-------+------+
|  user_id|item_id|rating|
+---------+-------+------+
|154317644|  58866|     6|
| 69669214| 601866|     7|
|126094876| 909352|     3|
| 45246613|1484481|     3|
|123317968|2101977|     3|
|   375928|2681933|     1|
|136939309|3375806|     2|
|  3150751|4198976|     2|
| 87648646|1030196|     3|
| 57672425|5385142|     2|
+---------+-------+------+
This is the code to train ALS.
import org.apache.spark.ml.recommendation.ALS

// The set* values passed in below are my own configuration variables.
val als = new ALS()
  .setMaxIter(setMaxIter)
  .setRegParam(setRegParam)
  .setUserCol("user_id")
  .setItemCol("item_id")
  .setRatingCol("rating")
  .setImplicitPrefs(false)
  .setCheckpointInterval(setCheckpointInterval)
  .setRank(setRank)
  .setNumItemBlocks(setNumItemBlocks)
  .setNumUserBlocks(setNumUserBlocks)
val Array(training, test) = ratings.randomSplit(Array(0.9, 0.1))
val model = als.fit(training) // fails at this step
This is the ALS source code where the error happens.
val srcOut = srcOutBlocks.join(srcFactorBlocks).flatMap {
  case (srcBlockId, (srcOutBlock, srcFactors)) =>
    srcOutBlock.view.zipWithIndex.map { case (activeIndices, dstBlockId) =>
      (dstBlockId, (srcBlockId, activeIndices.map(idx => srcFactors(idx))))
    }
}
These are the exception and error logs.
18/08/23 15:05:43 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/08/23 15:13:35 WARN scheduler.TaskSetManager: Lost task 20.0 in stage 56.0 (TID 31322, 6.ai.bjs-datalake.p1staff.com, executor 9): java.lang.StackOverflowError
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2669)
at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:3170)
at java.io.ObjectInputStream.readHandle(ObjectInputStream.java:1678)
18/08/23 15:13:35 WARN server.TransportChannelHandler: Exception in connection from /10.191.161.108:23300
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
18/08/23 15:13:36 ERROR cluster.YarnClusterScheduler: Lost executor 15 on 2.ai.bjs-datalake.p1staff.com: Container marked as failed: container_e04_1533096025492_4001_01_000016 on host: 2.ai.bjs-datalake.p1staff.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e04_1533096025492_4001_01_000016
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
18/08/23 15:05:43 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/08/23 15:13:35 WARN scheduler.TaskSetManager: Lost task 20.0 in stage 56.0 (TID 31322, 6.ai.bjs-datalake.p1staff.com, executor 9): java.lang.StackOverflowError
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2669)
at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:3170)
at java.io.ObjectInputStream.readHandle(ObjectInputStream.java:1678)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1739)
18/08/23 15:13:36 ERROR cluster.YarnClusterScheduler: Lost executor 10 on 5.ai.bjs-datalake.p1staff.com: Container marked as failed: container_e04_1533096025492_4001_01_000011 on host: 5.ai.bjs-datalake.p1staff.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e04_1533096025492_4001_01_000011
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Has anybody encountered this error?
After setting the checkpoint directory, it works. Thanks, @eliasah.
spark.sparkContext.setCheckpointDir("hdfs://datalake/check_point_directory/als")
Checkpointing will not work if you do not set the directory.
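As a rough intuition for why this helps, here is a toy model in plain Python (not Spark code, and not how Spark is implemented): each iteration produces a result that remembers the step it was derived from, so the chain of dependencies grows with the iteration count unless a checkpoint periodically cuts it. The names `Step`, `lineage_depth`, and `run_iterations` are hypothetical and exist only for this illustration.

```python
class Step:
    """One derived dataset that remembers the step it was computed from."""

    def __init__(self, parent=None):
        self.parent = parent


def lineage_depth(step):
    """Count how many ancestors would have to be walked to rebuild this step."""
    depth = 0
    while step.parent is not None:
        depth += 1
        step = step.parent
    return depth


def run_iterations(num_iterations, checkpoint_interval=None):
    """Return the deepest lineage seen over the whole run."""
    current = Step()
    deepest = 0
    for i in range(1, num_iterations + 1):
        current = Step(parent=current)
        deepest = max(deepest, lineage_depth(current))
        if checkpoint_interval and i % checkpoint_interval == 0:
            # Checkpoint: keep the result but forget how it was derived.
            current = Step()
    return deepest


print(run_iterations(50))                          # no checkpointing
print(run_iterations(50, checkpoint_interval=10))  # chain cut every 10 steps
```

In this model the depth without checkpointing equals the number of iterations, while with an interval of 10 it never exceeds 10. That is consistent with the observation above: a checkpoint interval only has an effect once a checkpoint directory has been set.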

Cassandra:2.2.8:org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses

I'm experiencing node crashes, and the system.log file shows a bunch of ReadTimeoutException errors hitting 500 ms.
The cassandra.yaml file has the setting [read_request_timeout_in_ms: 10000].
Can you folks please share how I can address these timeouts? Thanks in advance!
Error stack:
ERROR [SharedPool-Worker-241] 2017-02-01 13:18:27,663 Message.java:611 - Unexpected exception during request; channel = [id: 0x5d8abf33, /172.18.30.62:47580 => /216.12.225.9:9042]
java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:497) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.auth.CassandraRoleManager.canLogin(CassandraRoleManager.java:306) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.ClientState.login(ClientState.java:269) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:79) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507) [apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401) [apache-cassandra-2.2.8.jar:2.2.8]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.2.8.jar:2.2.8]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:110) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:147) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1441) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1365) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1282) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:176) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:505) ~[apache-cassandra-2.2.8.jar:2.2.8]
at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:493) ~[apache-cassandra-2.2.8.jar:2.2.8]
... 13 common frames omitted
INFO [ScheduledTasks:1] 2017-02-01 13:18:27,682 MessagingService.java:946 - READ messages were dropped in last 5000 ms: 149 for internal timeout and 0 for cross node timeout
INFO [Service Thread] 2017-02-01 13:18:27,693 StatusLogger.java:106 - enterprise.t_sf_venue_test 0,0
INFO [ScheduledTasks:1] 2017-02-01 13:18:27,699 MessagingService.java:946 - REQUEST_RESPONSE messages were dropped in last 5000 ms: 7 for internal timeout and 0 for cross node timeout
INFO [Service Thread] 2017-02-01 13:18:27,699 StatusLogger.java:106 - enterprise.alestnstats 0,0
INFO [ScheduledTasks:1] 2017-02-01 13:18:27,699 MessagingService.java:946 - RANGE_SLICE messages were dropped in last 5000 ms: 116 for internal timeout and 0 for cross node timeout
As you can see in your logs, the failing query is not actually the one you are trying to execute.
The failing query is internal to Cassandra:
"SELECT * FROM system_auth.roles;"
These internal Cassandra queries (misc queries) do not use 'read_request_timeout_in_ms'. Instead, they use 'request_timeout_in_ms'.
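To check which values a node is actually configured with, a small sketch like the following can pull both settings out of the text of cassandra.yaml so they can be compared side by side. It assumes the settings appear as plain top-level `key: value` lines; the helper name `read_timeouts` is hypothetical, and the sample value for `request_timeout_in_ms` is a placeholder rather than a recommendation.

```python
import re

TIMEOUT_KEYS = ("read_request_timeout_in_ms", "request_timeout_in_ms")


def read_timeouts(yaml_text):
    """Return the timeout settings found as top-level `key: value` lines."""
    found = {}
    for key in TIMEOUT_KEYS:
        pattern = r"^%s:[ \t]*(\d+)[ \t]*(?:#.*)?$" % re.escape(key)
        match = re.search(pattern, yaml_text, re.MULTILINE)
        if match:
            found[key] = int(match.group(1))
    return found


# Hypothetical excerpt: the first value is the one quoted in the question,
# the second is a placeholder.
sample = """\
read_request_timeout_in_ms: 10000
request_timeout_in_ms: 12000   # placeholder value
"""
print(read_timeouts(sample))
```

In practice the text would come from reading the node's own cassandra.yaml file, whose location depends on the installation.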

Saving a dataframe using spark-csv package throws exceptions and crashes (pyspark)

I am running a script on spark 1.5.2 in standalone mode (using 8 cores), and at the end of the script I attempt to serialize a very large dataframe to disk, using the spark-csv package. The code snippet that throws the exception is:
import sys

numfileparts = 16
data = data.repartition(numfileparts)
# Save the files as a bunch of csv files
datadir = "~/tempdatadir.csv/"
try:
    (data
     .write
     .format('com.databricks.spark.csv')
     .save(datadir,
           mode="overwrite",
           codec="org.apache.hadoop.io.compress.GzipCodec"))
except Exception as exc:
    # Report the underlying error instead of hiding it behind a bare except
    sys.exit("Could not save files: {}".format(exc))
where data is a Spark DataFrame. At execution time, I get the following stack trace:
16/04/19 20:16:24 WARN QueuedThreadPool: 8 threads could not be stopped
16/04/19 20:16:24 ERROR TaskSchedulerImpl: Exception in statusUpdate
java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$2@70617ec1 rejected from java.util.concurrent.ThreadPoolExecutor@1bf5370e[Shutting down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 2859]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask(TaskResultGetter.scala:49)
at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:347)
at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:330)
at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalBackend.scala:65)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
This leads to a bunch of these:
16/04/19 20:16:24 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/38/temp_shuffle_b9886819-be46-4e28-b57f-e592ea37ab95
java.io.FileNotFoundException: /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/38/temp_shuffle_b9886819-be46-4e28-b57f-e592ea37ab95 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/04/19 20:16:24 ERROR BypassMergeSortShuffleWriter: Error while deleting file for block temp_shuffle_b9886819-be46-4e28-b57f-e592ea37ab95
16/04/19 20:16:24 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/29/temp_shuffle_e474bcb1-5ead-4d7c-a58f-5398f32892f2
java.io.FileNotFoundException: /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/29/temp_shuffle_e474bcb1-5ead-4d7c-a58f-5398f32892f2 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
...and so on (I have intentionally left out some of the last lines.)
I do understand (roughly) what is happening, but am very uncertain of what to do about it - is it a memory issue?
I seek advice on what to do - is there some setting I can change, add, etc.?
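One detail that may be worth ruling out, independent of any memory question: a leading `~` is normally expanded by the shell rather than by libraries, so the output path in the snippet above may not point at the home directory as intended. A minimal sketch, assuming the target is a local directory, is to expand the path explicitly before passing it to the writer; the helper name `resolve_output_dir` is hypothetical.

```python
import os


def resolve_output_dir(path):
    """Expand a leading ~ and return an absolute, normalized path."""
    return os.path.abspath(os.path.expanduser(path))


datadir = resolve_output_dir("~/tempdatadir.csv/")
print(datadir)
```

The resulting `datadir` can then be passed to `.save(...)` in place of the literal string.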

Resources