We are running a Cassandra cluster. Initially, the cluster had only one node, and since that node was running out of space, we decided to add another node to the cluster.
Info on the cluster:
Keyspace with replication factor 1 using the SimpleStrategy class on a single datacenter
Node 1 - 256 tokens, almost no space available (1TB occupied by Cassandra data)
Node 2 - connected with 256 tokens, had 13TB available
First we added node 2 to the cluster and then realized that to stream the data to node 2, we'd have to decommission node 1.
So we decided to decommission, empty, and reconfigure node 1 (we wanted node 1 to hold only 32 tokens), then re-add it to the cluster's datacenter.
When we launched the decommission process, it created a stream of 29 files totaling almost 600GB. That stream copied successfully (we checked the logs and used nodetool netstats), and we expected a second stream to follow since node 1 held 1TB. But nothing else happened: node 1 reported itself as decommissioned and node 2 reported the data stream as complete.
The log from node 2 related to the copy stream:
INFO [STREAM-INIT-/10.131.155.200:48267] 2018-10-08 16:05:55,636 StreamResultFuture.java:116 - [Stream #a248d100-cb0b-11e8-a427-37a119a8af0a ID#0] Creating new streaming plan for Unbootstrap
INFO [STREAM-INIT-/10.131.155.200:48267] 2018-10-08 16:05:55,648 StreamResultFuture.java:123 - [Stream #a248d100-cb0b-11e8-a427-37a119a8af0a, ID#0] Received streaming plan for Unbootstrap
INFO [STREAM-INIT-/10.131.155.200:57298] 2018-10-08 16:05:55,648 StreamResultFuture.java:123 - [Stream #a248d100-cb0b-11e8-a427-37a119a8af0a, ID#0] Received streaming plan for Unbootstrap
INFO [STREAM-IN-/10.131.155.200:57298] 2018-10-08 16:05:55,663 StreamResultFuture.java:173 - [Stream #a248d100-cb0b-11e8-a427-37a119a8af0a ID#0] Prepare completed. Receiving 29 files(584.444GiB), sending 0 files(0.000KiB)
INFO [StreamReceiveTask:2] 2018-10-09 16:55:33,646 StreamResultFuture.java:187 - [Stream #a248d100-cb0b-11e8-a427-37a119a8af0a] Session with /10.131.155.200 is complete
INFO [StreamReceiveTask:2] 2018-10-09 16:55:33,709 StreamResultFuture.java:219 - [Stream #a248d100-cb0b-11e8-a427-37a119a8af0a] All sessions completed
After clearing the Cassandra data folder (we should've backed it up, we know), we started Cassandra again on node 1 and it successfully joined the cluster.
The cluster is functional with:
Node 1 - 32 tokens
Node 2 - 256 tokens
But we seem to have lost a lot of data, even though we were doing this as instructed in the Cassandra documentation.
We tried running nodetool repair on both nodes, but to no avail (both reported no data to recover).
What did we miss here? Is there a way to recover this lost data?
Thank you all!
Frequently, I encounter an issue where Dask randomly stalls on a couple of tasks, usually tied to a read of data from a different node on my network (more details about this below). This can happen after several hours of running the script with no issues. It will hang indefinitely in the form shown below (this loop otherwise takes a few seconds to complete):
In this case, I see that there are just a handful of stalled processes, and all are on one particular node (192.168.0.228):
Each worker on this node is stalled on a couple of read_parquet tasks:
This was called using the following code and is using fastparquet:
import dask.dataframe as dd

# file_path points at parquet data reached over the CIFS share described below
ddf = dd.read_parquet(file_path, columns=['col1', 'col2'], index=False, gather_statistics=False)
My cluster is running Ubuntu 19.04 and all the latest versions (as of 11/12) of Dask and Distributed, along with the required packages (e.g., tornado, fsspec, fastparquet).
The data that the .228 node is trying to access is located on another node in my cluster. The .228 node accesses the data through CIFS file sharing. I run the Dask scheduler on the same node on which I'm running the script (different from both the .228 node and the data storage node). The script connects the workers to the scheduler via SSH using Paramiko:
import paramiko

ssh_client = paramiko.SSHClient()
# (the ssh_client.connect(...) call to the remote worker host is omitted from this excerpt)
# launch a dask-worker on the remote machine and point it at the scheduler
stdin, stdout, stderr = ssh_client.exec_command('sudo dask-worker ' +
                                                ' --name ' + comp_name_decode +
                                                ' --nprocs ' + str(nproc_int) +
                                                ' --nthreads 10 ' +
                                                self.dask_scheduler_ip, get_pty=True)
The connectivity of the .228 node to the scheduler and to the data-storing node both look healthy. It is possible that the .228 node experienced some sort of brief connectivity issue while trying to process the read_parquet task, but if that occurred, the connectivity of the .228 node to the scheduler and the CIFS shares was not impacted beyond that brief moment. In any case, the logs do not show any issues. This is the whole log from the .228 node:
distributed.worker - INFO - Start worker at: tcp://192.168.0.228:42445
distributed.worker - INFO - Listening to: tcp://192.168.0.228:42445
distributed.worker - INFO - dashboard at: 192.168.0.228:37751
distributed.worker - INFO - Waiting to connect to: tcp://192.168.0.167:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 2
distributed.worker - INFO - Memory: 14.53 GB
distributed.worker - INFO - Local Directory: /home/dan/worker-50_838ig
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://192.168.0.167:8786
distributed.worker - INFO - -------------------------------------------------
Putting aside whether this is a bug in Dask or in my code/network, is it possible to set a general timeout for all tasks handled by the scheduler? Alternatively, is it possible to:
identify stalled tasks,
copy a stalled task and move it to another worker, and
cancel the stalled task?
is it possible to set a general timeout for all tasks handled by the scheduler?
As of 2019-11-13, unfortunately the answer is no.
If a task has properly failed, you can retry it with client.retry(...), but there is no automatic way to have a task fail itself after a certain time. This is something that you would have to write into your Python functions yourself. Unfortunately, it is hard to interrupt a Python function running in another thread, which is partially why this is not implemented.
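For illustration, here is a minimal sketch of what writing such a timeout into your own functions could look like, under the assumption that the work can be pushed into a child process (a process, unlike a thread, can be terminated). The names run_with_timeout and _call_and_enqueue are hypothetical helpers, not part of Dask or Distributed:

import multiprocessing as mp
from queue import Empty

def _call_and_enqueue(result_queue, func, args, kwargs):
    # runs in the child process and sends the result back to the parent
    result_queue.put(func(*args, **kwargs))

def run_with_timeout(func, *args, timeout=300, **kwargs):
    # run func(*args, **kwargs) in a child process; give up after `timeout` seconds
    result_queue = mp.Queue()
    proc = mp.Process(target=_call_and_enqueue,
                      args=(result_queue, func, args, kwargs))
    proc.start()
    try:
        # blocks until the child produces a result or the deadline passes
        # (a child that crashes without producing a result also ends up here)
        return result_queue.get(timeout=timeout)
    except Empty:
        raise TimeoutError(f"{func.__name__} did not finish within {timeout}s")
    finally:
        if proc.is_alive():
            proc.terminate()  # unlike a thread, the stalled process can be killed
        proc.join()

A task submitted to Dask would then wrap its slow call, e.g. run_with_timeout(read_one_partition, path) instead of read_one_partition(path), so a hung read raises TimeoutError (and can be retried) rather than stalling forever. Note that this costs a process spawn per call and requires the arguments and result to be picklable.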
If the worker goes down, then things will be retried elsewhere. However, from what you say it sounds like everything is healthy; it's just that the tasks themselves are likely to take forever. Unfortunately, it's hard to identify this as a failure case.
I am getting a transient exception when using Spark Streaming with Amazon Kinesis and storage level "MEMORY_AND_DISK_2". We are using Spark 2.2.0 with emr-5.9.0.
19/05/22 01:56:16 ERROR TransportRequestHandler: Error opening block StreamChunkId{streamId=438690479801, chunkIndex=0} for request from /10.1.100.56:38074
org.apache.spark.storage.BlockNotFoundException: Block broadcast_13287_piece0 not found
I have checked that there are no lost nodes in the EMR cluster, and HDFS utilization is at 35%.
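For context, a minimal sketch of how such a receiver is typically wired up with that storage level. The app name, stream name, endpoint, region and batch interval below are placeholders rather than values from our job, and the spark-streaming-kinesis-asl package must be available on the classpath:

from pyspark import SparkContext, StorageLevel
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext(appName="kinesis-memory-and-disk-2")   # placeholder app name
ssc = StreamingContext(sc, 10)                           # 10-second batches (placeholder)

stream = KinesisUtils.createStream(
    ssc,
    kinesisAppName="my-kinesis-app",                      # placeholder
    streamName="my-stream",                               # placeholder
    endpointUrl="https://kinesis.us-east-1.amazonaws.com",
    regionName="us-east-1",
    initialPositionInStream=InitialPositionInStream.LATEST,
    checkpointInterval=10,
    storageLevel=StorageLevel.MEMORY_AND_DISK_2,          # replicate received blocks to 2 executors
)

stream.count().pprint()
ssc.start()
ssc.awaitTermination()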
We have 3 datacenters with 12 Cassandra nodes in each. The current Cassandra version is 1.2.19, and we want to migrate to Cassandra 2.0.15. We cannot afford full downtime, so we need to do a rolling upgrade. As a preliminary check, we've done 2 experiments:
Experiment 1
Created a new 2.0.15 node and tried to bootstrap it into the cluster with 10% of the token interval of an already existing node.
The node was unable to join the cluster, failing with "java.lang.RuntimeException: Unable to gossip with any seeds":
CassandraDaemon.java (line 584) Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:437)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:656)
Experiment 2
Added one 1.2.19 node into the cluster with 10% of the token interval of an already existing node.
When the node was up, we stopped it, upgraded it to 2.0.15, and started it again (minor downtime). This time the node joined the cluster and started serving requests correctly.
To check how it behaves under heavier load, we tried to move the token to cover 15% of a normal node. Unfortunately, the move operation failed with the following exception:
INFO [RMI TCP Connection(1424)-192.168.1.100] 2015-07-10 11:37:05,235 StorageService.java (line 982) MOVING: fetching new ranges and streaming old ranges
INFO [RMI TCP Connection(1424)-192.168.1.100] 2015-07-10 11:37:05,262 StreamResultFuture.java (line 87) [Stream #fc3f4290-26f7-11e5-9988-afe392008597] Executing streaming plan for Moving
INFO [RMI TCP Connection(1424)-192.168.1.100] 2015-07-10 11:37:05,262 StreamResultFuture.java (line 91) [Stream #fc3f4290-26f7-11e5-9988-afe392008597] Beginning stream session with /192.168.1.101
INFO [StreamConnectionEstablisher:1] 2015-07-10 11:37:05,263 StreamSession.java (line 218) [Stream #fc3f4290-26f7-11e5-9988-afe392008597] Starting streaming to /192.168.1.101
INFO [StreamConnectionEstablisher:1] 2015-07-10 11:37:05,274 StreamResultFuture.java (line 173) [Stream #fc3f4290-26f7-11e5-9988-afe392008597] Prepare completed. Receiving 0 files(0 bytes), sending 112 files(6538607891 bytes)
ERROR [STREAM-IN-/192.168.1.101] 2015-07-10 11:37:05,303 StreamSession.java (line 467) [Stream #fc3f4290-26f7-11e5-9988-afe392008597] Streaming error occurred
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:239)
at java.lang.Thread.run(Thread.java:745)
ERROR [STREAM-OUT-/192.168.1.101] 2015-07-10 11:37:05,312 StreamSession.java (line 467) [Stream #fc3f4290-26f7-11e5-9988-afe392008597] Streaming error occurred
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDi
Questions
Q1. Is it normal for Cassandra 2.0.15 not to bootstrap into a 1.2.19 cluster, as in experiment 1? (I mean that it might not be supposed to work by design.)
Q2. Is the move-token operation supposed to work for a Cassandra 2.0.15 node operating in a 1.2.19 cluster?
Q3. Are there any workarounds or recommendations for doing a proper rolling upgrade in our case?
I'm having a difficult time adding a Hadoop node in DataStax Enterprise 4.5.1. I have an existing Cassandra virtual DC with two nodes, using vnodes. I am using OpsCenter, and I start up a Hadoop node, setting the initial_token value to 0. OpsCenter installs everything just fine (i.e., I get past the 5 green dots), but about a minute later, the node dies. The system.log file has this exception:
INFO [main] 2014-12-28 05:40:37,931 StorageService.java (line 1007) JOINING: Starting to bootstrap...
ERROR [main] 2014-12-28 05:40:37,998 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.IllegalStateException: No sources found for (-1,0]
at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithSourcesFor(RangeStreamer.java:159)
at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:117)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:72)
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1035)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:797)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:614)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:504)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:394)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:574)
INFO [StorageServiceShutdownHook] 2014-12-28 05:40:38,015 Gossiper.java (line 1279) Announcing shutdown
INFO [Thread-1] 2014-12-28 05:40:38,015 DseDaemon.java (line 477) DSE shutting down...
INFO [Thread-1] 2014-12-28 05:40:38,022 PluginManager.java (line 317) All plugins are stopped.
INFO [Thread-1] 2014-12-28 05:40:38,023 CassandraDaemon.java (line 463) Cassandra shutting down...
ERROR [Thread-1] 2014-12-28 05:40:38,023 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-1,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.CassandraDaemon.stop(CassandraDaemon.java:464)
at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:480)
at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:384)
INFO [StorageServiceShutdownHook] 2014-12-28 05:40:40,015 MessagingService.java (line 683) Waiting for messaging service to quiesce
INFO [ACCEPT-/172.31.19.81] 2014-12-28 05:40:40,017 MessagingService.java (line 923) MessagingService has terminated the accept() thread
I have a keyspace that looks like this:
CREATE KEYSPACE mykeyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'Analytics': '1',
'Cassandra': '1'
};
I'm wondering if it is because I am using vnodes in my Cassandra DC and not in the Analytics DC. The DataStax documentation mentions that this type of mixed architecture is OK. My snitch is set to DSEDelegateSnitch, which in turn uses DSESimpleSnitch, the default. I've run a node repair, but to no avail. One other detail is that in OpsCenter, I get a warning that I am using two different versions of DataStax Enterprise: 4.5.1 in my Cassandra DC and 2.0.8.39 in the Analytics DC. In addition, OpsCenter lists the Hadoop DC as "unknown." Any help at all would be appreciated.
In our dev cluster, which had been running smoothly before, the following failure occurs when we replace a node (which we have been doing regularly) and prevents the replacement node from joining.
The Cassandra version is 2.0.7.
What can be done about it?
ERROR [STREAM-IN-/10.128.---.---] 2014-11-19 12:35:58,007 StreamSession.java (line 420) [Stream #9cad81f0-6fe8-11e4-b575-4b49634010a9] Streaming error occurred
java.lang.AssertionError: Unknown keyspace system_traces
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:260)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:239)
at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:368)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:289)
at java.lang.Thread.run(Thread.java:745)
I got the same error while I was trying to set up my cluster. As I was experimenting with different switches in cassandra.yaml, I restarted the service multiple times and removed the system dir under the data directory (/var/lib/cassandra/data, as mentioned here).
I guess that for some reason Cassandra tries to load the system_traces keyspace (the other dir under /var/lib/cassandra/data) and fails, and nodetool throws this error. You can just remove both system and system_traces before starting the Cassandra service, or, even better, delete all content of the commitlog, data and saved_caches directories there.
Obviously, this only works if you don't have any data in the system yet.
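For what it's worth, a minimal Python sketch of that cleanup, assuming the default package-install layout under /var/lib/cassandra and that the Cassandra service is already stopped; it wipes all local data on the node, so run it only where that is the intent:

import shutil
from pathlib import Path

cassandra_root = Path("/var/lib/cassandra")  # adjust to your cassandra.yaml paths

for sub in ("commitlog", "data", "saved_caches"):
    directory = cassandra_root / sub
    if not directory.exists():
        continue
    for entry in directory.iterdir():
        # remove keyspace dirs, commit log segments, cache files, etc.,
        # but keep the top-level directories themselves
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()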