Less rows being inserted by sstableloader in ScyllaDB - cassandra

I'm trying to migrate data from Cassandra to ScyllaDB from snapshot using sstableloader and data in some tables gets loaded without any error but when verifying count using PySpark, it gives less rows in ScyllaDB than in Cassandra. Help needed!

I work at ScyllaDB
There are two tools that can be used to help find the differences:
https://github.com/scylladb/scylla-migrate (https://github.com/scylladb/scylla-migrate/blob/master/docs/scylla-migrate-user-guide.md) you can use the check mode to find the missing rows.
https://github.com/scylladb/scylla-migrator is a tool for migration from alive CQL clusters one to another (Cassandra --> Scylla) will work that also supports validation (https://github.com/scylladb/scylla-migrator#running-the-validator). There is a blog series on using this tool https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/.
Please post a bug on https://github.com/scylladb/scylla/issues if indeed there are missing rows.

Solved this problem by using nodetool repair on Cassandra keyspace, took snapshot and loaded the snapshot in ScyllaDB using sstableloader.

Related

How to flush data in all tables of keyspace in cassandra?

I am currently writing tests in golang and I want to get rid of all the data of tables after finishing tests. I was wondering if it is possible to flush the data of all tables in cassandra.
FYI: I am using 3.11 version of Cassandra.
The term "flush" is ambiguous in this case.
In Cassandra, "flush" is an operation where data is "flushed" from memory and written to disk as SSTables. Flushing can happen automatically based on certain triggers or can be done manually with the nodetool flush command.
However based on your description, what you want is to "truncate" the contents of tables. You can do this using the following CQL command:
cqlsh> TRUNCATE ks_name.table_name
You will need to iterate over each table in the keyspace. For more info, see the CQL TRUNCATE command. Cheers!

Cassandra drop keyspace, tables not working

I am starting with cassandra and have had some problems. I create keyspaces and tables to go playing, if I delete them and then run a describe keyspace they keep appearing to me. Other times I delete them and it tells me they don't exist, but I can't create them either because it says it exists.
Is there a way to clear that "cache" or something similar?
I would also like to know if through cqlsh I can execute a .cql file that is on my computer.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
This may be due to the eventually consistent nature of Cassandra. If you are on a small test cluster and just playing around you could try doing CONSISTENCY ALL in cqlsh which will force the nodes to become consistent.
You should always run delete or drop command manually with CONSISTENCY ALL so that it will reflect all the nodes and DCs. Also you need to wait for a moment to replicate into the cluster. Once replicated you will not get deleted data else you need to run a repair in the cluster.

Why read fails with cqlsh query when huge tombstones is present

I have a table with huge tombstones.So when i performed a spark job (which reads) on that specific table, it gave result without any issues. But i executed same query using cqlsh it gave me error because huge tombstones present in that table.
Cassandra failure during read query at consistency one(1 replica
needed but o replicas responded ,1 failed
I know tombstones should not be there, i can run repair to avoid them , but apart from that why spark succeeded and cqlsh failed. They both use same sessions and queries.
How spark-cassandra connector works? is it different than cqlsh?
Please let me know .
thank you.
The Spark Cassandra Connector is different to cqlsh in a few ways.
It uses the Java Driver and not the python Driver
It has significantly more lenient retry policies
It full table scans by breaking the request up into pieces
Any of these items could be contributing to why it would work in SCC and not in CQLSH.

Cassandra data Migration from 1.2 to 3.0.2

I know similar questions have been asked before but I think my use case is very specific for which I could not find any answer.
In Production we are using Cassandra 1.2 with ByteOrderPartitioner in a 6 Node cluster with Priam as seed management tool. We have recently upgraded all the dependencies and trying to migrate to Cassandra 3.0.2 with Murmur Partitioner and for backward compatibility we need to enable thrift on new cluster .Also we want to migrate away from Priam also.
I was able to setup new cluster but facing lot of issues during data migration. I tried 3 things:
1) Use Copy Command : Fails when number of rows is large
2) SSTable2Json : Cassandra 3.0.2 has stopped supporting SSTable2Json
3) SSTableloader: Failing I think because of different cassandra version of source and destination
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:233)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:119)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:67)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:225)
... 2 more
Right now I am kind of stuck,any help regarding this will be deeply appreciated. Please let me know if you need more details.
No, You cannot upgrade your sstables from 1.2 to 3.0.2 directly since the sstable will differ for different version. This link describes the steps for upgrading the cassandra versions. But it also does not helps for you, since you are having a change in the partitioner type.
Changing the partitioner type is not yet supported in cassandra as of
now (Link).
One of the solution I would prefer is,
Create a stand alone utility which is of cassandra 3.0.2 version to read all the data from you source cassandra and write to sstable
with the help of CQLSSTableWriter with the partition type of Murmur Partitioner (The trick is, you are writing
the sstable with the version 3.0.2, so this sstable will be easily
recognized by your new cluster). Then use SSTableLoader in your target cluster
But I am not sure about why you still require backward compatibility, while creating CQLSSTableWritter you can specify the column family schema with keyword
"WITH COMPACT STORAGE". But I didn't tried CQLSSTableWritter with "WITH COMPACT STORAGE", but without "WITH COMPACT STORAGE" I had tried, it will work for your case too.
Ok so if you try to migrate directly from 1.2 to 3.0.2, you're really looking for trouble.
The migration path should be
latest minor or 1.2
2.0 latest minor
2.1 latest minor
3.0.2
For each jump between version, read the https://github.com/apache/cassandra/blob/trunk/NEWS.txt file to know if you need special actions (upgrade sstable, ...)

Cassandra bulk insert solution

I have a java program run as service , this program must insert 50k rows/s (1 row have 25 column ) to cassandra cluster.
My cluster contain 3 nodes, 1 node have 4 cpu core (core i5 2.4 ghz) , 4 gb ram.
i used Hector api, multithread, bulk insert but the performance is too low as expect (about 25k rows /s ).
Any one have suggest another solution for that. Is there cassandra support an internal bulk insert (without use Thrift).
Astyanax is a high level Java client for Apache Cassandra. Apache Cassandra is a highly available column oriented database.
Astyanax is currently in use at Netflix. Issues generally are fixed as quickly as possbile and releases done frequently.
https://github.com/Netflix/astyanax
I've had good luck creating sstables and loading them directly. There is a sstableloader
tool included in the distribution as well as a JMX interface. You can create the sstables using the SSTableSimpleUnsortedWriter class.
Details here.
The fastest way to bulk-insert data into Cassandra is sstableloader an utility provided by Cassandra in 0.8 onwards. For that you have to create sstables first which is possible with SSTableSimpleUnsortedWriter more about this is described here
Another faster way is Cassandras BulkoutputFormat for hadoop.With this we can write Hadoop job to load data to cassandra.See more on this bulkload to cassandra with hadoo

Resources