I tried running my Spark job on GKE (using spark-operator) and on Dataproc. In both environments the Hadoop GCS connector is able to list the files, but it gets stuck in a sleep-retry loop while trying to read them from GCS.
The service account has full access and I was able to fetch the file using gsutil on the same executor container using the same service account. This seems to rule out network or permission issues.
Using spark-operator version v2.4.0-v1beta1-latest
Logs:
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz:0+295144331
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz:0+305812437
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz:0+297933921
2019-07-12 11:33:12 INFO HadoopRDD:54 - Input split: gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz:0+309553279
2019-07-12 11:33:12 INFO TorrentBroadcast:54 - Started reading broadcast variable 0
2019-07-12 11:33:12 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.1 KB, free 3.3 GB)
2019-07-12 11:33:12 INFO TorrentBroadcast:54 - Reading broadcast variable 0 took 13 ms
2019-07-12 11:33:12 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 323.8 KB, free 3.3 GB)
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:33:14 INFO CodecPool:181 - Got brand-new decompressor [.gz]
2019-07-12 11:42:00 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:370)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:130)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:293)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:224)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:42:00 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:370)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:130)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:293)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:224)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:42:00 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-34-61-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:50:44 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:244)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:689)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:370)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:130)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:293)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:224)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
2019-07-12 11:50:44 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:50:44 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-33-112-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:06 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:370)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:130)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:293)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:224)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
2019-07-12 11:55:06 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:06 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-33-94-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:10 WARN GoogleCloudStorageReadChannel:76 - Failed read retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz'. Sleeping...
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:370)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:130)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:248)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:293)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:224)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
2019-07-12 11:55:10 INFO GoogleCloudStorageReadChannel:76 - Done sleeping before retry #1/10 for 'gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz'
2019-07-12 11:55:10 INFO GoogleCloudStorageReadChannel:76 - Success after 1 retries on reading 'gs://app-logs/2019/07/04/08/ip-10-1-34-63-app-json.log-2019-07-04-08-20.gz'
What could be causing this? I have relaxed the firewall rules as well.
Check whether your cluster and the GCS bucket you are reading from are in the same GCP region. Reads can be slow if they are cross-regional.
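As a rough illustration of that check, here is a minimal Python sketch that compares a bucket location string with a cluster zone. The helper names (`zone_to_region`, `is_colocated`) and the example values are hypothetical, and the multi-region handling is only a coarse heuristic: dual-region location codes are not handled and are reported as not co-located.

```python
# Coarse mapping from multi-region bucket locations to region-name prefixes.
# Assumption for illustration only; dual-region codes are not covered.
MULTI_REGION_PREFIX = {"us": "us-", "eu": "europe-", "asia": "asia-"}


def zone_to_region(zone: str) -> str:
    """Drop the trailing zone letter: 'us-central1-a' -> 'us-central1'."""
    return zone.strip().lower().rsplit("-", 1)[0]


def is_colocated(bucket_location: str, cluster_zone: str) -> bool:
    """Return True if the bucket location matches the cluster's region.

    For multi-region buckets only the continent-level prefix is compared.
    """
    location = bucket_location.strip().lower()
    region = zone_to_region(cluster_zone)
    prefix = MULTI_REGION_PREFIX.get(location)
    if prefix is not None:
        return region.startswith(prefix)
    return location == region


if __name__ == "__main__":
    # Hypothetical values; substitute the real bucket location and cluster zone.
    print(is_colocated("US-CENTRAL1", "us-central1-a"))
    print(is_colocated("EUROPE-WEST1", "us-central1-a"))
```

If this returns False for your bucket and cluster, cross-region latency is worth ruling out before looking elsewhere.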
Also, it seems that you are processing gzipped log files, which cannot be split and need to be re-read from the beginning on each failure. That could lead to long read times if there is some network flakiness (because of cross-regional reads, for example).
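To see why a gzipped file cannot simply be picked up at an arbitrary offset, here is a small self-contained Python sketch. It uses only the standard library and synthetic data as a stand-in for one of the log files. It is an illustration of the format's behaviour, not of how the GCS connector itself is implemented.

```python
import gzip
import zlib

# Synthetic stand-in for a gzipped log file (hypothetical content).
lines = b"".join(b"log line %d\n" % i for i in range(10000))
blob = gzip.compress(lines)


def can_decode_from(data: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start reading at `offset`."""
    try:
        gzip.decompress(data[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Pick an offset near the middle. Skip any byte equal to the first gzip
# magic byte (0x1f) so the outcome does not depend on the compressed bytes.
mid = next(i for i in range(len(blob) // 2, len(blob)) if blob[i] != 0x1F)

print(can_decode_from(blob, 0))    # decoding from the start of the file
print(can_decode_from(blob, mid))  # decoding from the middle of the file
```

Decoding from the start succeeds, while the decoder rejects the stream when started from the middle. This is why each `.gz` file ends up as a single input split covering the whole file, as in the `Input split: ...:0+295144331` lines in your log.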
I want to add nodes to, or remove nodes from, the cluster.
When I try to add or remove a node, I get LEAK DETECTED and "Stream failed" error messages.
If I drop the indexes (pushcapabilityindx and presetsearchval) before adding or removing nodes, the operation succeeds.
Rows in abc_db.sub that are not updated are deleted automatically after 24 hours (TTL of 86400 seconds).
Also, if I leave the cluster alone and do nothing (not even dropping the indexes), adding or removing nodes succeeds normally after 14 days.
Where should I start troubleshooting this error?
Ubuntu 16.04.3 LTS
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
CREATE TABLE abc_db.sub (
phoneno text,
deviceid text,
subid text,
callbackdata text,
corelator text,
duration int,
local_index bigint,
phase int,
presetsearchvalue text,
pushcapability list<text>,
pushtoken text,
pushtype text,
searchcriteria frozen<typesearchcriteria>,
PRIMARY KEY (phoneno, deviceid, subid)
) WITH CLUSTERING ORDER BY (deviceid ASC, subid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX pushcapabilityindx ON abc_db.sub (values(pushcapability));
CREATE INDEX presetsearchval ON abc_db.sub (presetsearchvalue);
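For reference, a small arithmetic sketch relating the 24-hour TTL to the `gc_grace_seconds` value in the table definition above. In Cassandra, TTL-expired data is treated as a tombstone and becomes eligible for purging by compaction only after `gc_grace_seconds` has passed since expiry. Whether this is connected to the streaming failure is an assumption, not something the logs confirm.

```python
from datetime import timedelta

# Values taken from the question: per-write TTL and the table's gc_grace_seconds.
ttl = timedelta(seconds=86_400)
gc_grace = timedelta(seconds=864_000)

# Earliest point, counted from the last write, at which expired data
# can be purged by compaction.
earliest_purge = ttl + gc_grace

print(f"TTL: {ttl.days} day(s)")
print(f"gc_grace_seconds: {gc_grace.days} day(s)")
print(f"Earliest purge after last write: {earliest_purge.days} day(s)")
```

That works out to 11 days, which is shorter than the 14 days after which adding or removing nodes succeeds on its own.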
INFO [main] 2019-05-09 07:57:15,741 StorageService.java:1435 - JOINING: waiting for schema information to complete
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: schema complete, ready to bootstrap
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: waiting for pending range calculation
INFO [main] 2019-05-09 07:57:16,497 StorageService.java:1435 - JOINING: calculation complete, ready to bootstrap
INFO [main] 2019-05-09 07:57:16,498 StorageService.java:1435 - JOINING: getting bootstrap token
INFO [main] 2019-05-09 07:57:16,531 StorageService.java:1435 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2019-05-09 07:57:46,532 StorageService.java:1435 - JOINING: Starting to bootstrap...
INFO [main] 2019-05-09 07:57:47,775 StreamResultFuture.java:90 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Executing streaming plan for Bootstrap
INFO [StreamConnectionEstablisher:1] 2019-05-09 07:57:47,783 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.20.10
INFO [StreamConnectionEstablisher:1] 2019-05-09 07:57:47,786 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.20.10
INFO [STREAM-IN-/172.50.20.10:5000] 2019-05-09 07:57:48,887 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 261 files(1012.328MiB), sending 0 files(0.000KiB)
INFO [StreamConnectionEstablisher:2] 2019-05-09 07:57:48,891 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.22.10
INFO [StreamConnectionEstablisher:2] 2019-05-09 07:57:48,893 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.22.10
INFO [STREAM-IN-/172.50.22.10:5000] 2019-05-09 07:57:50,020 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 254 files(1.286GiB), sending 0 files(0.000KiB)
INFO [StreamConnectionEstablisher:3] 2019-05-09 07:57:50,022 StreamSession.java:266 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Starting streaming to /172.50.21.10
INFO [StreamConnectionEstablisher:3] 2019-05-09 07:57:50,025 StreamCoordinator.java:264 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488, ID#0] Beginning stream session with /172.50.21.10
INFO [STREAM-IN-/172.50.21.10:5000] 2019-05-09 07:57:50,998 StreamResultFuture.java:173 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488 ID#0] Prepare completed. Receiving 114 files(1.085GiB), sending 0 files(0.000KiB)
INFO [StreamReceiveTask:1] 2019-05-09 07:58:02,509 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-2-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-3-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:02,519 SecondaryIndexManager.java:385 - Index build of pushcapabilityindx,presetsearchval complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:11,213 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-4-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-5-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-6-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-7-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-8-big-Data.db'),BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-9-big-Data.db')
ERROR [StreamReceiveTask:1] 2019-05-09 07:58:11,295 StreamSession.java:593 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Streaming error occurred on session with peer 172.50.22.10
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException
at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:51) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:393) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.buildIndexesBlocking(SecondaryIndexManager.java:382) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.buildAllIndexesBlocking(SecondaryIndexManager.java:269) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:215) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_191]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_191]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:386) ~[apache-cassandra-3.10.jar:3.10]
... 9 common frames omitted
Caused by: java.util.NoSuchElementException: null
at org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.SecondaryIndexManager.lambda$indexPartition$20(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_191]
at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:618) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:71) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1587) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
... 6 common frames omitted
ERROR [STREAM-IN-/172.50.22.10:5000] 2019-05-09 07:58:11,305 StreamSession.java:593 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Streaming error occurred on session with peer 172.50.22.10
java.lang.RuntimeException: Outgoing stream handler has been closed
at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:143) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:655) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:523) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
INFO [StreamReceiveTask:1] 2019-05-09 07:58:11,310 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.22.10 is complete
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#441b19c1) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy#1255997234:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#4ee372b1) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#199357705:Memory#[7f12a4ad4970..7f12a4ad4a10) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,115 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#172830a5) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#983595037:Memory#[7f12a4b3f670..7f12a4b3f990) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2019-05-09 07:58:19,116 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#6f83e302) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#1808665158:Memory#[7f12a41a56d0..7f12a41a56d4) was not released before the reference was garbage collected
INFO [StreamReceiveTask:1] 2019-05-09 07:58:40,657 SecondaryIndexManager.java:365 - Submitting index build of groupchatid_idx_giinfo for data in BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-1-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-2-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:40,714 SecondaryIndexManager.java:385 - Index build of groupchatid_idx_giinfo complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:41,494 SecondaryIndexManager.java:365 - Submitting index build of groupchatid_idx_giinfo for data in BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-3-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-4-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-5-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-6-big-Data.db'),BigTableReader(path='/cassandra/data/srib_storeserver_objectstore_db/groupinfoobjects-28134170627f11e9b62cc50700a5eaee/mc-7-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:41,537 SecondaryIndexManager.java:385 - Index build of groupchatid_idx_giinfo complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,175 SecondaryIndexManager.java:365 - Submitting index build of pushcapabilityindx,presetsearchval for data in BigTableReader(path='/cassandra/data/abc_db/sub-28f738d0627f11e9b62cc50700a5eaee/mc-11-big-Data.db')
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,209 SecondaryIndexManager.java:385 - Index build of pushcapabilityindx,presetsearchval complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:43,972 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.20.10 is complete
INFO [StreamReceiveTask:1] 2019-05-09 07:58:45,643 StreamResultFuture.java:187 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Session with /172.50.21.10 is complete
WARN [StreamReceiveTask:1] 2019-05-09 07:58:45,664 StreamResultFuture.java:214 - [Stream #21f71b70-7230-11e9-9495-ab5d6d9b9488] Stream failed
WARN [StreamReceiveTask:1] 2019-05-09 07:58:45,665 StorageService.java:1497 - Error during bootstrap.
org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:766) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:727) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:244) [apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
ERROR [main] 2019-05-09 07:58:45,665 StorageService.java:1507 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1502) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:962) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) [apache-cassandra-3.10.jar:3.10]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:766) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:727) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:244) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_191]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
WARN [main] 2019-05-09 07:58:45,676 StorageService.java:1013 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
INFO [main] 2019-05-09 07:58:45,677 CassandraDaemon.java:694 - Waiting for gossip to settle before accepting client requests...
INFO [main] 2019-05-09 07:58:53,678 CassandraDaemon.java:725 - No gossip backlog; proceeding
INFO [main] 2019-05-09 07:58:53,737 NativeTransportService.java:70 - Netty using native Epoll event loop
INFO [main] 2019-05-09 07:58:53,781 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO [main] 2019-05-09 07:58:53,781 Server.java:156 - Starting listening for CQL clients on /172.50.20.11:7042 (unencrypted)...
INFO [main] 2019-05-09 07:58:53,809 CassandraDaemon.java:528 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
I solved this issue myself. These are the combinations I tested:
No.  Linux OS                        Package  Version  Result
1    Ubuntu 16.04.3 LTS              deb      3.10     Stream Failed
2    Ubuntu 16.04.3 LTS              deb      3.11.4   Stream Failed
3    Ubuntu 16.04.3 LTS              tgz      3.11.4   Successful
4    Ubuntu 18.04.2 LTS              deb      3.10     Stream Failed
5    Ubuntu 18.04.2 LTS              deb      3.11.4   Stream Failed
6    Debian GNU/Linux 9.9 (stretch)  deb      3.10     Stream Failed
7    Debian GNU/Linux 9.9 (stretch)  deb      3.11.4   Stream Failed
8    Amazon Linux 2                  rpm      3.11.4   Successful
9    CentOS 7.6                      rpm      3.11.4   Successful
In the end, I am going to install from the binary tarball.
I am not a developer, so I cannot find the cause.
I hope someone finds the cause and fixes it on Ubuntu and Debian.
Regards.
Sungjae Yun.
I am running a Spark application on a YARN cluster (on AWS EMR). The application seems to be killed, and I want to find the cause. I am trying to understand the YARN information shown in the following screen.
The diagnostic line on the screen seems to show that YARN is killing the app because of the memory limit:
Diagnostics: Container [pid=1540,containerID=container_1488651686158_0012_02_000001] is running beyond physical memory limits. Current usage: 1.6 GB of 1.4 GB physical memory used; 3.6 GB of 6.9 GB virtual memory used. Killing container.
However, the appattempt log shows a completely different exception, something related to IO/network. My question is: should I trust the diagnostic on the screen or the appattempt log? Is the IO exception causing the kill, or is the out-of-memory condition causing the IO exception in the appattempt log? Is there another log or diagnostic I should look at? Thanks.
17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.Error: java.lang.InterruptedException
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
... 2 more
17/03/04 21:59:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/04 21:59:02 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-31-9-207.ec2.internal/172.31.9.207:38437 is closed
17/03/04 21:59:02 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
17/03/04 21:59:02 ERROR DiskBlockManager: Exception while deleting local spark dir: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
java.io.IOException: Failed to delete: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:169)
at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:165)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:165)
at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:160)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1361)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:89)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 INFO MemoryStore: MemoryStore cleared
17/03/04 21:59:02 INFO BlockManager: BlockManager stopped
17/03/04 21:59:02 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/04 21:59:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread Thread-3
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:138)
at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 WARN ShutdownHookManager: ShutdownHook '$anon$2' failed, java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:138)
at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
The information in your screenshot is the most relevant. Your ApplicationMaster container ran out of memory. You need to increase yarn.app.mapreduce.am.resource.mb which is set in mapred-site.xml. I recommend a value of 2000 since that will usually accommodate running Spark and MapReduce applications at scale.
The container was killed (its memory usage exceeded the physical memory limit), so any attempt to reach this container fails.
YARN is fine for an overall view of the process, but you should prefer the Spark history server to analyse your job better (check for unbalanced memory usage in the Spark history).
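To make the "1.6 GB of 1.4 GB physical memory" figure in the diagnostic easier to interpret, here is a rough sketch of how a Spark-on-YARN container request is sized. It assumes the Spark 2.x defaults as I understand them: the request is the JVM heap (spark.driver.memory in cluster mode, spark.yarn.am.memory in client mode) plus an overhead of max(384 MB, 10% of the heap), configurable through spark.yarn.driver.memoryOverhead (named spark.driver.memoryOverhead in later Spark releases). The function name and the rounding increment are illustrative, and the 1 GB heap is an assumption, since the question does not state the configured value.

```python
import math


def yarn_container_mb(heap_mb, overhead_factor=0.10, min_overhead_mb=384,
                      increment_mb=1):
    """Estimate the container size (in MB) requested from YARN.

    heap_mb         -- JVM heap, e.g. spark.driver.memory (assumed value)
    overhead_factor -- fraction of the heap added as off-heap overhead
    min_overhead_mb -- lower bound on that overhead
    increment_mb    -- allocation increment YARN rounds up to (assumed: none)
    """
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_factor))
    requested_mb = heap_mb + overhead_mb
    # YARN rounds requests up to a multiple of its allocation increment.
    return math.ceil(requested_mb / increment_mb) * increment_mb


if __name__ == "__main__":
    size_mb = yarn_container_mb(1024)  # hypothetical 1 GB driver heap
    print(size_mb, "MB =", round(size_mb / 1024, 1), "GB")  # 1408 MB = 1.4 GB
```

A 1 GB heap gives 1408 MB, which is consistent with the 1.4 GB limit in the diagnostic, although that does not prove this is how the job was configured. If it is, raising the heap or the overhead setting would raise the limit the container is checked against, so those Spark settings may be worth checking alongside the MapReduce property mentioned above.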
I am trying to stream some sstables to a Cassandra cluster using the sstableloader utility. I am getting a streaming error. Here is the stack trace.
Established connection to initial hosts
Opening sstables and calculating sections to stream
18:05:04.058 [main] DEBUG o.a.c.i.s.m.MetadataSerializer - Load metadata for /path/new/xyz/search/xyz-search-ka-1
18:05:04.073 [main] INFO o.a.c.io.sstable.SSTableReader - Opening /path/new/xyz/new/xyz_search/search/xyz_search-search-ka-1 (330768 bytes)
Streaming relevant part of /path/new/xyz/xyz_search/search/xyz_search-search-ka-1-Data.db to [/10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX]
18:05:04.411 [main] INFO o.a.c.streaming.StreamResultFuture - [Stream #ed3a0cd0-fd25-11e5-8509-63e9961cf787] Executing streaming plan for Bulk Load
Streaming relevant part of /path/xyz-search-ka-1-Data.db to [/10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX, /10.XXX.XXX.XXX]
17:22:44.175 [main] INFO o.a.c.streaming.StreamResultFuture - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Executing streaming plan for Bulk Load
17:22:44.177 [StreamConnectionEstablisher:1] INFO o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Starting streaming to /10.XX.XX.XX
17:22:44.177 [StreamConnectionEstablisher:1] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Sending stream init for incoming stream
17:22:44.183 [StreamConnectionEstablisher:2] INFO o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Starting streaming to /10.XX.XX.XX
17:22:44.183 [StreamConnectionEstablisher:2] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Sending stream init for incoming stream
17:23:47.191 [StreamConnectionEstablisher:2] ERROR o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Streaming error occurred
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:458) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:450) ~[na:1.8.0_45]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_45]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_45]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:236) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [cassandra-all-2.1.6.jar:2.1.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
17:23:47.202 [StreamConnectionEstablisher:2] DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Closing stream connection handler on /10.XXX.XXX.XXX
17:23:47.205 [StreamConnectionEstablisher:1] ERROR o.a.c.streaming.StreamSession - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Streaming error occurred
java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:458) ~[na:1.8.0_45]
at sun.nio.ch.Net.connect(Net.java:450) ~[na:1.8.0_45]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_45]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_45]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:62) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:236) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:79) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223) ~[cassandra-all-2.1.6.jar:2.1.6]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208) [cassandra-all-2.1.6.jar:2.1.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Also, the machine where I am running sstableloader is outside the Cassandra cluster.
Thanks
After debugging a little more, I found that sstableloader also uses port 7000 while streaming sstables to the Cassandra cluster. My local machine did not have access to port 7000 on the machines in the Cassandra cluster. That's why I was getting the connection timed out exception.
Anyone who encounters this: make sure that the machine you are running sstableloader from has access to ports 9160, 7000 and 9042 on all the Cassandra nodes you are trying to stream to.
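A quick way to verify this before running sstableloader is to try a plain TCP connection to each port from the loading machine. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the port list is simply the one given above (adjust it if your cluster uses non-default ports).

```python
import socket
import sys

# Ports named above: 9160 (Thrift), 7000 (inter-node/streaming), 9042 (CQL).
PORTS = (9160, 7000, 9042)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def unreachable(hosts, ports=PORTS, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(h, p) for h in hosts for p in ports
            if not port_open(h, p, timeout)]


if __name__ == "__main__":
    # Usage: python check_ports.py <node1> <node2> ...
    for host, port in unreachable(sys.argv[1:]):
        print(f"cannot reach {host}:{port}")
```

If the script prints nothing, every listed port accepted a connection. A blocked port 7000 would show up here as a timeout, matching the "Connection timed out" error in the question.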
DEBUG o.a.c.streaming.ConnectionHandler - [Stream #0327a9e0-fd20-11e5-b350-63e9961cf787] Closing stream connection handler on /10.XXX.XXX.XXX
Hint: I suspect the machine 10.xxx.xxx.xxx is under heavy load. It is worth checking the /var/log/cassandra/system.log file on this machine to narrow down the root cause.
When I run "nodetool -h 192.168.1.161 repair" on a node I get the following error. How should I resolve this? I'm running Cassandra 2.0.2.
ERROR [Thread-26] 2014-08-17 18:24:01,289 StorageService.java (line 2477) Repair session 8d2a1190-25aa-11e4-8a15-ff681618d551 for range (1844674407370955161,5534023222112865484] failed with error org.apache.cassandra.exceptions.RepairException: [repair #8d2a1190-25aa-11e4-8a15-ff681618d551 on WIDGET_CF/STATS, (1844674407370955161,5534023222112865484]] Validation failed in /192.168.1.164
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #8d2a1190-25aa-11e4-8a15-ff681618d551 on WIDGET_CF/STATS, (1844674407370955161,5534023222112865484]] Validation failed in /192.168.1.164
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2469)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #8d2a1190-25aa-11e4-8a15-ff681618d551 on WIDGET_CF/STATS, (1844674407370955161,5534023222112865484]] Validation failed in /192.168.1.164
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #8d2a1190-25aa-11e4-8a15-ff681618d551 on WIDGET_CF/STATS, (1844674407370955161,5534023222112865484]] Validation failed in /192.168.1.164
at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:152)
at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:188)
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:59)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
... 3 more