Is it feasible to migrate from HBase to Cassandra?
If so, could you please help with the steps to migrate from HBase to Cassandra?
The simplest way to migrate data into a Cassandra database is by exporting the data from the source into CSV files.
Once you have the CSV files, you can bulk-load them into Cassandra tables using the DataStax Bulk Loader (DSBulk) tool.
Here are some references with examples to help you get started quickly:
Blog - DSBulk Intro + Loading data
Blog - More DSBulk Loading examples
Blog - Counting records with DSBulk count
Docs - Loading data examples
You can use the count command I linked above to check how many records were bulk-loaded as a way of verifying that it worked. DSBulk is open-source so it's free to use. Cheers!
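As a rough illustration of the export step, here is a minimal Python sketch that flattens rows into a CSV file with a header line. The row layout, the column mapping, the file name `users.csv`, and the keyspace/table names are all made up for the example; how you actually read the rows out of HBase is up to you. The two DSBulk commands at the end are only assembled as strings to show what you would run next, they are not executed here.

```python
import csv
import io

# Hypothetical rows as they might come out of an HBase scan:
# row key -> {"family:qualifier": value}. All names are invented for illustration.
hbase_rows = {
    "user1": {"info:name": "Ada", "info:city": "London"},
    "user2": {"info:name": "Linus"},  # sparse row: no city cell
}

# (Cassandra column name, HBase cell) pairs; both sides are assumptions.
columns = [("name", "info:name"), ("city", "info:city")]


def rows_to_csv(rows, columns, out):
    """Write one CSV line per row key, with a header naming the target columns."""
    writer = csv.writer(out)
    writer.writerow(["id"] + [name for name, _ in columns])
    for key in sorted(rows):
        cells = rows[key]
        # Missing cells become empty fields.
        writer.writerow([key] + [cells.get(cell, "") for _, cell in columns])


buffer = io.StringIO()
rows_to_csv(hbase_rows, columns, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# Commands you would run afterwards (keyspace, table and file name are placeholders).
load_cmd = ["dsbulk", "load", "-url", "users.csv",
            "-k", "my_keyspace", "-t", "users", "-header", "true"]
count_cmd = ["dsbulk", "count", "-k", "my_keyspace", "-t", "users"]
print(" ".join(load_cmd))
print(" ".join(count_cmd))
```

Keeping the CSV header identical to the Cassandra column names means DSBulk can map fields to columns by name, so you don't need a separate mapping for the simple case.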
Related
We are using DSE Cassandra v4.8.9. I need to find out whether the tables in Cassandra are being read, either with some metadata query or by analyzing the logs.
We could certainly find this out from the application, but that would require a code change and we don't have the cycles for it, so I am looking for a way to get these details using features that DSE/Cassandra already provides. Please advise.
You can obtain this information from JMX metrics: look at the table metrics. You can read these metrics with a JMX client, or use the nodetool cfstats command (it was renamed to tablestats in later versions) and look at the local read and write counts.
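If you want to scan the cfstats output for tables with no reads, a small script can do it. The sketch below is in Python and works on a hand-written excerpt; the sample text is an assumption about what the output looks like (the exact layout differs between versions, and the keyspace and table names are invented), so adjust the patterns to what your nodes actually print. Keep in mind that these are local, per-node counters, so you would need to check every node.

```python
import re

# Illustrative excerpt only, NOT real output. Layout and names are assumptions.
SAMPLE = """\
Keyspace: shop
\tRead Count: 42
\t\tTable: orders
\t\tLocal read count: 42
\t\tLocal write count: 310
\t\tTable: audit_log
\t\tLocal read count: 0
\t\tLocal write count: 1275
"""


def read_write_counts(text):
    """Return {(keyspace, table): {"read": n, "write": n}} parsed from cfstats-like text."""
    counts = {}
    keyspace = table = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Keyspace\s*:\s*(\S+)", line)
        if m:
            keyspace, table = m.group(1), None
            continue
        m = re.match(r"Table\s*:\s*(\S+)", line)
        if m:
            table = m.group(1)
            counts[(keyspace, table)] = {}
            continue
        m = re.match(r"Local (read|write) count\s*:\s*(\d+)", line)
        if m and table is not None:
            counts[(keyspace, table)][m.group(1)] = int(m.group(2))
    return counts


counts = read_write_counts(SAMPLE)
# Tables whose local read counter is zero on this node.
unread = [name for name, c in counts.items() if c.get("read") == 0]
print(unread)
```

In practice you would feed the function the captured output of `nodetool cfstats` from each node instead of `SAMPLE`.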
I am new to DSE Graph.
I have around 1000 records in a CSV file, each with around 20 attributes, that I want to load using Gremlin. Each record would form a separate vertex in the graph.
Is there a way to load all the records directly in one go? I found a link that used DynamoDB as the storage backend, but it didn't help. My backend is DSE Cassandra.
I also tried populating the Cassandra DB with the records and then forming the graph from that data, but it didn't work.
Please let me know if there is a way to do this. Thanks in advance.
The DSE Graph Loader is a standalone tool that can load CSV documents into DSE Graph. You would need to create a mapping script to use with the loader. You can find its documentation at https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/dgl/dglCSV.html
I have a use case where I need to analyze real-time data using Apache Spark, but I am still unsure which data store to choose for my application. The analysis mostly includes aggregation, KPI-based identity analysis, and machine learning tools to predict trends. Cassandra has good support, and large tech companies are already using it in production. But after some research I found that Druid is faster than Cassandra and is good for OLAP queries, although its results are inconsistent for queries like Count Distinct.
Any help with this would be appreciated. Thanks.
Since your use case is analyzing real-time data, I would suggest using Druid rather than Apache Cassandra. Because Apache Cassandra uses asynchronous masterless replication, a real-time analysis could miss recently updated data. Druid, on the other hand, is designed for real-time analysis.
Druid Details: http://druid.io/druid.html
Apache Cassandra Details: https://en.wikipedia.org/wiki/Apache_Cassandra
I am trying to get all inserts, updates, deletes to a normalized DB2 database (hosted on an IBM Mainframe) synced to a Cassandra database. I also need to denormalize these changes before I write them to Cassandra so that the data structure meets my Cassandra model.
I searched on Google, but the tools I found lack either processing support or streaming CDC support.
Is there any tool out there that can help me achieve the above?
It's likely that no stock tool exists. What's the format of the CDC stream coming out? What queries do you need to run? Like any other Cassandra data modeling question, start with the queries you need to run and work backwards to the table structure(s).
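To make "work backwards from the queries" concrete, here is a minimal Python sketch. It assumes a made-up change-event format (real DB2 CDC output will look different), two hypothetical source tables (`customer` and `orders`), and one target query, "all orders for a customer, with the customer name". It only produces a list of writes for a hypothetical `orders_by_customer` table; sending them to Cassandra is left out.

```python
# Hypothetical change events. The field names are invented for this example.
events = [
    {"op": "insert", "table": "customer", "key": 1, "row": {"name": "Ada"}},
    {"op": "insert", "table": "orders", "key": 10, "row": {"customer_id": 1, "total": 25}},
    {"op": "update", "table": "customer", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "table": "orders", "key": 10, "row": None},
]


def denormalize(events):
    """Turn normalized change events into writes for a hypothetical
    orders_by_customer table (customer_id, order_id, customer_name, total)."""
    customers, orders = {}, {}  # in-memory copy of the normalized state
    writes = []
    for ev in events:
        if ev["table"] == "customer":
            if ev["op"] == "delete":
                customers.pop(ev["key"], None)
                continue
            customers[ev["key"]] = ev["row"]
            # A customer change fans out to every denormalized order row.
            for order_id, order in orders.items():
                if order["customer_id"] == ev["key"]:
                    writes.append(("upsert", ev["key"], order_id,
                                   ev["row"]["name"], order["total"]))
        elif ev["table"] == "orders":
            if ev["op"] == "delete":
                old = orders.pop(ev["key"], None)
                if old is not None:
                    writes.append(("delete", old["customer_id"], ev["key"]))
            else:
                orders[ev["key"]] = ev["row"]
                customer = customers.get(ev["row"]["customer_id"], {})
                writes.append(("upsert", ev["row"]["customer_id"], ev["key"],
                               customer.get("name"), ev["row"]["total"]))
    return writes


writes = denormalize(events)
for w in writes:
    print(w)
```

The point of the sketch is the fan-out: one update on the normalized side can turn into several writes on the Cassandra side, which is why the shape of the CDC stream and the target queries need to be known first.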
We are using Apache Cassandra to store our data. Apart from Spark, what tools or technologies can perform data analytics after reading data from Cassandra? Spark is good, but it needs a programmer (Java/Scala/Python) to add or modify code for future requirements, which leads to high maintenance costs. What are the alternatives?
If you would rather not run Spark on top of Cassandra, many have accomplished good results with Cassandra, Hive, and Hadoop. Others have accomplished similar results using a mix of Cassandra, Hive, and Solr.
There is also a decent set of slides and a tutorial on analyzing Cassandra data with Hadoop. You will find a more in-depth explanation in the PDF download on the provided page.
If you're interested in continuing to pursue Spark, you can evaluate DataStax Enterprise, which takes the complexity out of it and lets you run Spark right on top of Cassandra.
To answer your question, you have a few industry-proven options, primarily Hadoop and Hive.