Cassandra back using java code - cassandra

Is there a way that i can backup a single table in Apache Cassandra with java code. i want to run such code once every week using a scheduler. Can some one share links to such resources, if there any?

Have a look at this answer.
Fetch all rows in cassandra
It's just a matter of adding code to export let's say every row to csv or some similar format that would be fine for you.
You will also have to write script to load this data, but those are just simple inserts.

Related

What is the best way to export all of my data from a Cassandra Cluster?

I am very new to Cassandra and any help here would be appreciated. I have a cluster of 6 nodes that spans 2 datacenters (3 nodes to each cluster). My client has decided that they do not want to renew their Cassandra license with Datastax anymore and want their data exported into a format that can be easily imported into another Database in the future. I was thinking of exporting the data as a CSV file, but since the data is distributed between all the nodes, I am not sure what is the best way to export all the data.
One option - You should be able to use the CQL COPY command - which copies the data into a CSV format. The nice thing about copy is that you can run it from a single node (i.e. it is not a "node" level tool). Command would be (once in cqlsh):
CQL> COPY . to '/path/to/file'
If there is a LOT of data, or a lot of tables, this tool may not be a great fit. But for small number of tables that don't have HUGE rowcounts (< several million), this works well. Hope that helps.
-Jim
Since 2018 you can use DSBulk with DSE to export or import data to/from CSV (by default), or JSON. Since the end of 2019 it's possible to use it with open source Cassandra as well.
It could be as simple as:
dsbulk unload -k keyspace -t table -u user -p password -url filename
DSBulk is heavily optimized for fast data export, without putting too much load onto the coordinator node that happens when you just run select * from table.
You can control what columns to export, and even provide your own query, etc. DataStax blog has a series of blog posts about different aspects of using DSBulk:
Introduction and Loading
More Loading
Common Settings
Unloading
Counting
Examples for Loading From Other Locations
You can use CQL COPY command for exporting the data from Cassandra cluster. However it is performant for small set of data if you are having big size of data this command is not useful cause it will give some error or timeout issue. Also, you may use sstabledump and export your node-wise date into JSON format. Hope, this will useful for you.
I have implemented small script for this purpose. It isn't the best way, since it slow and, in my experience, produces connection errors on system tables. But it could be useful for inspecting Cassandra on small datasets: https://github.com/kirillt/cassandra-utils

Load data from one table to another every 10 mins - Cassandra

We have a stream of data coming to Table A every 10 mins. No history preserved. The existing data has to be flushed to a new table B every time data is loaded in Table A. Can this be done dynamically or automated in Cassandra?
I can think of loading the Table A into a CSV file and then loading back to Table B every time Table A is flushed. But i would like to have something done at the database level itself.
Any ideas or suggestions appreciated.
Thanks,
Arun
For smaller amounts of data you could put this into cron:
https://dba.stackexchange.com/questions/58901/what-is-a-good-way-to-copy-data-from-one-cassandra-columnfamily-to-another-on-th
If larger and running newer versions of cassandra (3.8+)
http://cassandra.apache.org/doc/latest/operating/cdc.html
https://issues.apache.org/jira/browse/CASSANDRA-8844
and then replay the data to the table that you need (by some sort of outside process, script, app etc ...).
Basically there are already some tools around like:
https://github.com/carloscm/cassandra-commitlog-extract
You could use the samples there to cover your use-case.
But for most use cases this is handled at the application level, writes are relatively cheap with cassandra.

How to migrate data between two tables in Cassandra properly

I have to change the schema of one of my tables in Cassandra. It's cannot be done by simply using ALTER TABLE command, because there are some changes in primary key.
So the question is: How to do such a migration in the best way?
Using COPY command in cql is not an option in here because dump file can be really huge.
Can I solve this problem by not creating some custom application?
Like Guillaume has suggested in the comment - you can't do this directly in cassandra. Schema altering operations are very limited here. You have to perform such migration manually using one of suggested there tools OR if you have very large tables you can leverage Spark.
Spark can efficiently read data from your nodes, transform them locally and save them back to db. Remember that such migration requires reading whole db content, so might take a while. It might be the most performant solution, however needs some bigger preparation - Spark cluster setup.

SQLyog Job Agent (SJA) for Linux: how to write something like 'all table but a single one'

I am trying to learn SQLyog Job Agent (SJA).
I am on a Linux machine, and use SJA from within a bash script by a line command: ./sja myschema.xml
I need to sync an almost 100 tables db and its local clone.
Since a single table stores a few config data, which I do not wish to sync, it seems I need to write a myschema.xml where I list all the remaining 99 tables.
Question is: is there a way to write to sync all the table but a single one?
I hope my question is clear. I appreciate your help.
If you are using the latest version of sqlyog:You are given the table below, and the option to generate an xml job file at the end of the database syncronisation wizard reflecting the operation you've opted to perform. This will in effect list the other 99 tables in the xml file itself for you, but it will give you what you are looking for, and I dont think you would be doing anything in particular with an individual table, since you are specifying all tables in a single element.

Cassandra Database Problem

I am using Cassandra database for large scale application. I am new to using Cassandra database. I have a database schema for a particular keyspace for which I have created columns using Cassandra Command Line Interface (CLI). Now when I copied dataset in the folder /var/lib/cassandra/data/, I was not able to access the values using the key of a particular column. I am getting message zero rows present. But the files are present. All these files are under extension, XXXX-Data.db, XXXX-Filter.db, XXXX-Index.db. Can anyone tell me how to access the columns for existing datasets.
(a) Cassandra doesn't expect you to move its data files around out from underneath it. You'll need to restart if you do any manual surgery like that.
(b) if you didn't also copy the schema definition it will ignore data files for unknown column families.
For what you are trying to achieve it may probably be better to export and import your SSTables.
You should have a look at bin/sstable2json and bin/json2sstable.
Documentation is there (near the end of the page): Cassandra Operations

Resources