cassandra sstableloader hanging with receiving progress 100% - cassandra

I'm trying to load data with sstableloader in Cassandra 2.0.7.
The terminal shows progress at 100%. When I check the streams with nodetool netstats,
it shows:
Mode: NORMAL
Bulk Load 21d7d610-a5f2-11e5-baa7-8fc95be03ac4
/10.19.150.70
Receiving 4 files, 138895248 bytes total
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-8-Data.db 67039680/67039680 bytes(100%) received from /10.19.150.70
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-10-Data.db 3074549/3074549 bytes(100%) received from /10.19.150.70
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-9-Data.db 43581052/43581052 bytes(100%) received from /10.19.150.70
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-7-Data.db 25199967/25199967 bytes(100%) received from /10.19.150.70
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 0
Responses n/a 0 11671
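A listing like the one above can also be checked mechanically. The following is a minimal sketch, assuming the netstats text has been captured into a string; the function name incomplete_files and the regular expression are illustrative, not part of any Cassandra tool. An empty result only says that every listed file has received all of its bytes; it says nothing about work the node still has to do after streaming.

```python
import re

# Matches per-file lines such as:
#   /path/x-Data.db 67039680/67039680 bytes(100%) received from /10.19.150.70
FILE_LINE = re.compile(r"^(\S+) (\d+)/(\d+) bytes\((\d+)%\)")


def incomplete_files(netstats_output):
    """Return (path, received, total) for each file that has not reached its total."""
    pending = []
    for line in netstats_output.splitlines():
        match = FILE_LINE.match(line.strip())
        if match is None:
            continue  # header, statistics, or thread pool line
        path = match.group(1)
        received = int(match.group(2))
        total = int(match.group(3))
        if received < total:
            pending.append((path, received, total))
    return pending
```

Called on the output shown in the question, this should return an empty list, since every file line reports received bytes equal to the total.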
The sstableloader has been hanging for hours. I checked the log and found an error that may be related.
ERROR [CompactionExecutor:7] 2015-12-19 09:45:53,811 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:7,1,main]
java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:62)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:51)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:31)
at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:44)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:85)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:36)
at java.util.concurrent.ConcurrentSkipListMap.findNode(ConcurrentSkipListMap.java:804)
at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:215)
at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:107)
at org.apache.cassandra.db.index.SecondaryIndexManager.indexRow(SecondaryIndexManager.java:441)
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:407)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:833)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR [NonPeriodicTasks:1] 2015-12-19 09:45:53,812 CassandraDaemon.java (line 198) Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:413)
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:142)
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:113)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:409)
... 9 more
Caused by: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:62)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:51)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:31)
at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:44)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:85)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:36)
at java.util.concurrent.ConcurrentSkipListMap.findNode(ConcurrentSkipListMap.java:804)
at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:215)
at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:107)
at org.apache.cassandra.db.index.SecondaryIndexManager.indexRow(SecondaryIndexManager.java:441)
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:407)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:833)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
The schema of the table is as follows:
CREATE TABLE metadata (
userid timeuuid,
dirname text,
basename text,
ctime timestamp,
fileid timeuuid,
imagefileid timeuuid,
imagefilesize int,
mtime timestamp,
nodetype int,
showname text,
size bigint,
timelong text,
PRIMARY KEY (userid, dirname, basename, ctime)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX idx_fileid ON metadata (fileid);
CREATE INDEX idx_nodetype ON metadata (nodetype);
Can I kill the process of the sstableloader safely? Has this bulk load process finished?
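One possible reading of the trace: the exception is thrown in TimeUUIDType.compareTimestampBytes while SecondaryIndexBuilder.build runs from StreamReceiveTask$OnCompletionRunnable, that is, after the files were received and while a secondary index is being rebuilt. An IndexOutOfBoundsException from HeapByteBuffer.get at that point is consistent with an indexed timeuuid value that is shorter than the 16 bytes a timeuuid occupies (for example an empty fileid). This is an assumption, not something the log proves. A minimal sketch of a check that could be run on the raw values before the SSTables are generated (the function name is hypothetical):

```python
import uuid


def is_valid_timeuuid(raw):
    """True if raw is exactly 16 bytes and decodes to a version 1 (time-based) UUID."""
    if len(raw) != 16:
        return False  # empty or truncated values cannot be a timeuuid
    return uuid.UUID(bytes=raw).version == 1
```

Values that fail this check would be candidates for the rows that break the index build on idx_fileid.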

You should try increasing the heap size for sstableloader.
vim $(which sstableloader)
#########"$JAVA" $JAVA_AGENT -ea -cp "$CLASSPATH" $JVM_OPTS -Xmx$MAX_HEAP_SIZE \
"$JAVA" $JAVA_AGENT -ea -cp "$CLASSPATH" $JVM_OPTS -XX:+UseG1GC -Xmx10G -Xms10G -XX:+UseTLAB -XX:+ResizeTLAB \
-Dcassandra.storagedir="$cassandra_storagedir" \
-Dlogback.configurationFile=logback-tools.xml \
org.apache.cassandra.tools.BulkLoader "$@"
I hope that would solve your issue.

Your node may be running out of resources, perhaps because of heavy load or another process.
Try restarting Cassandra on the heavily loaded nodes and see if that helps.

From the above error, it seems you are hitting a resource crunch. You need to tune the memory settings, such as Xmx and Xms (the maximum and minimum heap size), in the cassandra-env.sh file for older versions of Cassandra (i.e. 2.x).
After tuning them, restart your node/cluster and try loading again.

Related

Cassandra : data not replicated on new node

I added a new node to my cassandra cluster (the new node is not a seed node). I now have 3 nodes on my cluster :
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN XXX.XXX.XXX.XXX 52.25 GB 256 100.0% XXX rack1
UN XXX.XXX.XXX.XXX 63.65 GB 256 100.0% XXX rack1
UN XXX.XXX.XXX.XXX 314.72 MB 256 100.0% XXX rack1
I have a replication factor of 3 :
DESCRIBE KEYSPACE mykeyspace
CREATE KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true;
but the data is not replicated on the new node (the node with 314 MB of data).
I tried to use nodetool rebuild :
ERROR [STREAM-IN-/XXX.XXX.XXX.XXX] 2016-11-11 08:28:42,765 StreamSession.java:520 - [Stream #0e7a0580-a81b-11e6-9a1c-6d75503d5d02] Streaming error occurred
java.lang.IllegalArgumentException: Unknown type 0
at org.apache.cassandra.streaming.messages.StreamMessage$Type.get(StreamMessage.java:97) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:58) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
ERROR [Thread-16] 2016-11-11 08:28:42,765 CassandraDaemon.java:195 - Exception in thread Thread[Thread-16,5,RMI Runtime]
java.lang.RuntimeException: java.lang.InterruptedException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-18.0.jar:na]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_74]
Caused by: java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) ~[na:1.8.0_74]
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) ~[na:1.8.0_74]
at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) ~[na:1.8.0_74]
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:184) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.1.1.jar:3.1.1]
... 1 common frames omitted
INFO [STREAM-IN-/XXX.XXX.XXX.XXX] 2016-11-11 08:28:42,805 StreamResultFuture.java:182 - [Stream #0e7a0580-a81b-11e6-9a1c-6d75503d5d02] Session with /XXX.XXX.XXX.XXX is complete
WARN [STREAM-IN-/XXX.XXX.XXX.XXX] 2016-11-11 08:28:42,807 StreamResultFuture.java:209 - [Stream #0e7a0580-a81b-11e6-9a1c-6d75503d5d02] Stream failed
ERROR [RMI TCP Connection(14)-127.0.0.1] 2016-11-11 08:28:42,808 StorageService.java:1128 - Error while rebuilding node
org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.1.1.jar:3.1.1]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:525) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:279) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_74]
I also tried to change the option but the data is still not copied to the new node :
auto_bootstrap: true
Could you please help me understand why the data is not replicated on the new node ?
Please let me know if you need further information from my configuration.
Thank you for your help
It appears (from https://issues.apache.org/jira/browse/CASSANDRA-10448) that this is due to CASSANDRA-10961. Applying that fix should address it.

Cassandra read calls failing with com.datastax.driver.core.exceptions.ReadFailureException

In one of our single-node Cassandra deployments, there's this table schema:
CREATE table CTS_SVC_PT_INT_READ (
svc_pt_id bigint,
meas_type_id bigint,
value double,
flags bigint,
read_time timestamp,
last_upd_time timestamp,
PRIMARY KEY (svc_pt_id, meas_type_id, read_time)
) WITH CLUSTERING ORDER BY (meas_type_id ASC, read_time DESC)
AND compaction = {
'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
'timestamp_resolution': 'MILLISECONDS',
'base_time_seconds': '3600',
'max_sstable_age_days': '365'
};
While querying select distinct svc_pt_id from cts.CTS_SVC_PT_INT_READ through the Java client, it's failing with the exception:
select distinct svc_pt_id from cts.CTS_SVC_PT_INT_READ com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed)
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed)
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at rx.internal.operators.OnSubscribeToObservableFuture$ToObservableFuture.call(OnSubscribeToObservableFuture.java:74)
at rx.internal.operators.OnSubscribeToObservableFuture$ToObservableFuture.call(OnSubscribeToObservableFuture.java:43)
at rx.Observable.unsafeSubscribe(Observable.java:8314)
at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed)
at com.datastax.driver.core.exceptions.ReadFailureException.copy(ReadFailureException.java:95)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:128)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:184)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:798)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:354)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
... 1 more
Caused by: com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:76)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:266)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:246)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
... 15 more
I see the same error if I issue this cql command through cqlsh. Is it due to a ReadTimeOut issue or something else?
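The text of the exception already carries three counts: how many responses were required, how many replicas responded, and how many failed. A small sketch that extracts them from a message of the shape shown above; the function name and the regular expression are illustrative and assume the wording stays as in this trace. As I read the message, a non-zero failed count means a replica reported an error rather than staying silent.

```python
import re

# Matches e.g. "(1 responses were required but only 0 replica responded, 1 failed)"
COUNTS = re.compile(
    r"(\d+) responses? (?:were|was) required but only "
    r"(\d+) replicas? responded(?:, (\d+) failed)?"
)


def parse_read_error_counts(message):
    """Return the required/received/failed counts from the message, or None if absent."""
    match = COUNTS.search(message)
    if match is None:
        return None
    failed = match.group(3)
    return {
        "required": int(match.group(1)),
        "received": int(match.group(2)),
        # The "N failed" part is optional in the pattern; None means it was not present.
        "failed": int(failed) if failed is not None else None,
    }
```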
"ReadFailureException: Cassandra failure ... 0 replica responded, 1 failed)" indicates a failure, not a timeout. You might learn something further by looking at the cassandra log on the server.

Saving a dataframe using spark-csv package throws exceptions and crashes (pyspark)

I am running a script on spark 1.5.2 in standalone mode (using 8 cores), and at the end of the script I attempt to serialize a very large dataframe to disk, using the spark-csv package. The code snippet that throws the exception is:
import sys

numfileparts = 16
data = data.repartition(numfileparts)
# Save the files as a bunch of csv files
datadir = "~/tempdatadir.csv/"
try:
    (data
     .write
     .format('com.databricks.spark.csv')
     .save(datadir,
           mode="overwrite",
           codec="org.apache.hadoop.io.compress.GzipCodec"))
except:
    sys.exit("Could not save files.")
where data is a Spark dataframe. At execution time, I get the following stack trace:
16/04/19 20:16:24 WARN QueuedThreadPool: 8 threads could not be stopped
16/04/19 20:16:24 ERROR TaskSchedulerImpl: Exception in statusUpdate
java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$2@70617ec1 rejected from java.util.concurrent.ThreadPoolExecutor@1bf5370e[Shutting down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 2859]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask(TaskResultGetter.scala:49)
at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:347)
at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:330)
at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalBackend.scala:65)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
This leads to a bunch of these:
16/04/19 20:16:24 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/38/temp_shuffle_b9886819-be46-4e28-b57f-e592ea37ab95
java.io.FileNotFoundException: /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/38/temp_shuffle_b9886819-be46-4e28-b57f-e592ea37ab95 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/04/19 20:16:24 ERROR BypassMergeSortShuffleWriter: Error while deleting file for block temp_shuffle_b9886819-be46-4e28-b57f-e592ea37ab95
16/04/19 20:16:24 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/29/temp_shuffle_e474bcb1-5ead-4d7c-a58f-5398f32892f2
java.io.FileNotFoundException: /tmp/blockmgr-84d7d0a6-a3e5-4f48-bde0-0f6610e44e16/29/temp_shuffle_e474bcb1-5ead-4d7c-a58f-5398f32892f2 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
...and so on (I have intentionally left out some of the last lines.)
I do understand (roughly) what is happening, but am very uncertain of what to do about it - is it a memory issue?
I seek advice on what to do - is there some setting I can change, add, etc.?
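Two details of the snippet may be worth ruling out before tuning memory. First, the bare except followed by sys.exit hides whatever exception the save actually raised, and the "Shutting down" messages in the log would fit a context being torn down while tasks are still running. Second, "~" is expanded by a shell, not by Python or by Hadoop path handling, so "~/tempdatadir.csv/" is probably not the home directory path it looks like. The sketch below is an assumption-laden illustration, not a confirmed fix: the helper names are made up, it assumes the output goes to the local filesystem, and it only reuses the write/format/save calls already present in the question.

```python
import os
import traceback


def resolve_output_dir(path):
    """Expand a leading '~' and return an absolute local path."""
    return os.path.abspath(os.path.expanduser(path))


def save_csv(data, path):
    """Write `data` with spark-csv; on failure, print the real exception and re-raise."""
    target = resolve_output_dir(path)
    try:
        (data.write
             .format("com.databricks.spark.csv")
             .save(target,
                   mode="overwrite",
                   codec="org.apache.hadoop.io.compress.GzipCodec"))
    except Exception:
        # Surface the underlying error instead of replacing it with a generic message.
        traceback.print_exc()
        raise
    return target
```

With this in place, save_csv(data, "~/tempdatadir.csv/") either returns the resolved directory or shows the original traceback, which should make it clearer whether memory is really the issue.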

hive on spark - ArrayIndexOutOfBoundsException

I want to run 'select count(*) from test' on Hive on Spark through beeline, but it crashes.
INFO client.SparkClientUtilities: Added jar[file:/home/.../hive/lib/zookeeper-3.4.6.jar] to classpath
INFO client.RemoteDriver: Failed to run job 41dc814c-deb7-4743-9b6a-b6cace2eae19
com.esotericsoftware.kryo.KryoException: java.lang.ArrayIndexOutOfBoundsException: 1
Serialization trace:
dummyOps (org.apache.hadoop.hive.ql.plan.ReduceWork)
left (org.apache.commons.lang3.tuple.ImmutablePair)
edgeProperties (org.apache.hadoop.hive.ql.plan.SparkWork)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:126)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
at org.apache.hadoop.hive.ql.exec.spark.KryoSerializer.deserialize(KryoSerializer.java:49)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:235)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at com.esotericsoftware.kryo.serializers.MapSerializer.setGenerics(MapSerializer.java:53)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:604)
... 19 more
The Hive on Spark configuration is:
set spark.home=/home/.../spark;
set hive.execution.engine=spark;
set spark.master=yarn;
set spark.eventLog.enabled=true;
set spark.eventLog.dir=hdfs://...:9000/user/.../spark-history-server-records;
set spark.executor.memory=1024m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
What's worse, when I change KryoSerializer to JavaSerializer, the issue still exists.
Thanks.

Why cassandra fails with OutOfMemoryError during sstables compaction

Hi, maybe this is a stupid question, but I could not find the answer via Google.
Here is what I have:
java 1.7
cassandra 1.2.8 running as a single node with -Xmx1G and -Xms1G, without any changes to the yaml file
I've created the following test column family:
CREATE COLUMN FAMILY TEST_HUGE_SF
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type;
Then I try to insert rows in this column family.
I use astyanax lib to access cassandra:
final long START = 1;
final long MAX_ROWS_COUNT = 1000000000; // 1 Billion
Keyspace keyspace = AstyanaxProvider.getAstyanaxContext().getClient();
ColumnFamily<String, String> cf = new ColumnFamily<>(
        "TEST_HUGE_SF",
        StringSerializer.get(),
        StringSerializer.get());
MutationBatch mb = keyspace.prepareMutationBatch()
        .withRetryPolicy(new BoundedExponentialBackoff(250, 5000, 20));
for (long i = START; i < MAX_ROWS_COUNT; i++) {
    long t = i % 1000;
    if (t == 0) {
        System.out.println("pushed: " + i);
        mb.execute();
        Thread.sleep(1);
        mb = keyspace.prepareMutationBatch()
                .withRetryPolicy(new BoundedExponentialBackoff(250, 5000, 20));
    }
    ColumnListMutation<String> clm = mb.withRow(cf, String.format("row_%012d", i));
    clm.putColumn("col1", i);
    clm.putColumn("col2", t);
}
mb.execute();
As you can see from the code, I try to insert 1 billion rows; each one contains two columns, and each column holds a simple long value.
After inserting ~122 million rows, Cassandra crashed with an OutOfMemoryError.
The logs show the following:
INFO [CompactionExecutor:1571] 2014-08-08 08:31:45,334 CompactionTask.java (line 263) Compacted 4 sstables to [\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2941,]. 865 252 169 bytes to 901 723 715 (~104% of original) in 922 963ms = 0,931728MB/s. 26 753 257 total rows, 26 753 257 unique. Row merge counts were {1:26753257, 2:0, 3:0, 4:0, }
INFO [CompactionExecutor:1571] 2014-08-08 08:31:45,337 CompactionTask.java (line 106) Compacting [SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2069-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-629-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2941-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-1328-Data.db')]
ERROR [CompactionExecutor:1571] 2014-08-08 08:31:46,167 CassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:1571,1,main]
java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at org.apache.cassandra.io.util.Memory.<init>(Memory.java:52)
at org.apache.cassandra.io.util.Memory.allocate(Memory.java:60)
at org.apache.cassandra.utils.obs.OffHeapBitSet.<init>(OffHeapBitSet.java:40)
at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:137)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:126)
at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:445)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:92)
at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:1958)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:144)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:59)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:191)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
INFO [CompactionExecutor:1570] 2014-08-08 08:31:46,994 CompactionTask.java (line 263) Compacted 4 sstables to [\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-3213,]. 34 773 524 bytes to 35 375 883 (~101% of original) in 44 162ms = 0,763939MB/s. 1 151 482 total rows, 1 151 482 unique. Row merge counts were {1:1151482, 2:0, 3:0, 4:0, }
INFO [CompactionExecutor:1570] 2014-08-08 08:31:47,105 CompactionTask.java (line 106) Compacting [SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2069-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-629-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2941-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-1328-Data.db')]
ERROR [CompactionExecutor:1570] 2014-08-08 08:31:47,110 CassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:1570,1,main]
java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at org.apache.cassandra.io.util.Memory.<init>(Memory.java:52)
at org.apache.cassandra.io.util.Memory.allocate(Memory.java:60)
at org.apache.cassandra.utils.obs.OffHeapBitSet.<init>(OffHeapBitSet.java:40)
at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:137)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:126)
at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:445)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:92)
at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:1958)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:144)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:59)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:191)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
As far as I can see, Cassandra crashes during SSTable compaction.
Does this mean that Cassandra needs more heap space to handle more rows?
I expected a lack of heap space to affect only performance. Can someone explain why my expectation is wrong?
As someone else noted, a 1GB heap is very small. For Cassandra 2.0, see this tuning guide for further information:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html
Another consideration is how garbage collection is being handled. The Cassandra log directory should also contain GC logs showing how often collections ran and how long they took. You can also monitor them live with jvisualvm.
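When sizing memory, it can also help to check where each OutOfMemoryError in the log was actually thrown. In the traces above, the first frame is `sun.misc.Unsafe.allocateMemory`, called while building an `OffHeapBitSet`, which is a native (off-heap) allocation rather than a Java heap allocation. The following is a minimal sketch, not a Cassandra tool: the function name `classify_oom` and the two category labels are made up for illustration, and the regular expression assumes the log header format shown in the question.

```python
import re

# Header format as it appears in the question's log, e.g.
# "ERROR [CompactionExecutor:1571] 2014-08-08 08:31:46,167 CassandraDaemon.java (line 132) ..."
HEADER = re.compile(
    r"^\s*(?:INFO|WARN|ERROR|DEBUG)\s+\[([^\]]+)\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)


def classify_oom(lines):
    """Return (timestamp, thread, kind) for every OutOfMemoryError in `lines`.

    `kind` is a hypothetical label: "native (off-heap)" when the first stack
    frame is Unsafe.allocateMemory, otherwise "java heap or other".
    """
    events = []
    last_header = None  # (timestamp, thread) of the most recent log header
    for i, line in enumerate(lines):
        match = HEADER.match(line)
        if match:
            last_header = (match.group(2), match.group(1))
            continue
        if last_header and line.strip().startswith("java.lang.OutOfMemoryError"):
            # The frame directly below the exception line shows the allocator.
            first_frame = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if "Unsafe.allocateMemory" in first_frame:
                kind = "native (off-heap)"
            else:
                kind = "java heap or other"
            events.append((last_header[0], last_header[1], kind))
    return events
```

To use it, read your own `system.log` into a list of lines (for example with `open(path).read().splitlines()`, where `path` is wherever your installation writes its log) and pass that list to `classify_oom`.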

Resources