Change Capture from DB2 to Cassandra

I am trying to sync all inserts, updates, and deletes made to a normalized DB2 database (hosted on an IBM mainframe) to a Cassandra database. I also need to denormalize these changes before I write them to Cassandra, so that the data structure matches my Cassandra data model.
I searched on Google, but the tools I found either lack support for processing the changes or lack streaming CDC support.
Is there any tool out there that can help me achieve the above?

It's likely that no stock tool exists. What's the format of the CDC stream coming out? What queries do you need to run? Like any other Cassandra data modeling question, start with the queries you need to run and work backwards to the table structure(s).
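To make the query-first advice concrete, here is a minimal Python sketch of the processing step that would sit between the CDC stream and Cassandra. It assumes a hypothetical event format (a dict with table, op and row), two hypothetical normalized DB2 tables CUSTOMER and ORDERS, and a hypothetical query "orders for a customer, with the customer's name", modeled as a table orders_by_customer keyed by (cust_id, order_id). A plain dict stands in for that Cassandra table; in a real pipeline each change to it would become an INSERT or DELETE against Cassandra.

```python
from dataclasses import dataclass, field


@dataclass
class OrdersByCustomerProjection:
    # Lookup state the processor needs so it can resolve the join itself.
    customers: dict = field(default_factory=dict)  # cust_id -> name
    orders: dict = field(default_factory=dict)     # order_id -> last seen ORDERS row
    # Stand-in for the hypothetical Cassandra table orders_by_customer,
    # keyed by (partition key cust_id, clustering key order_id).
    table: dict = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        """Apply one CDC event of the assumed form {"table", "op", "row"}."""
        source, op, row = event["table"], event["op"], event["row"]
        if source == "CUSTOMER":
            self._apply_customer(op, row)
        elif source == "ORDERS":
            self._apply_order(op, row)
        else:
            raise ValueError(f"unexpected source table {source!r}")

    def _apply_customer(self, op: str, row: dict) -> None:
        cid = row["cust_id"]
        if op == "DELETE":
            self.customers.pop(cid, None)
            # Hypothetical policy: drop the customer's denormalized rows.
            for key in [k for k in self.table if k[0] == cid]:
                del self.table[key]
            return
        # INSERT or UPDATE: the name is copied into every order row, so fan out.
        self.customers[cid] = row["name"]
        for (key_cid, _), out in self.table.items():
            if key_cid == cid:
                out["customer_name"] = row["name"]

    def _apply_order(self, op: str, row: dict) -> None:
        oid = row["order_id"]
        old = self.orders.pop(oid, None)
        if old is not None:
            # Remove the previous row: an UPDATE may change cust_id, i.e. the partition key.
            self.table.pop((old["cust_id"], oid), None)
        if op == "DELETE":
            return
        self.orders[oid] = row
        self.table[(row["cust_id"], oid)] = {
            "cust_id": row["cust_id"],
            "order_id": oid,
            "customer_name": self.customers.get(row["cust_id"]),
            "total": row["total"],
        }


if __name__ == "__main__":
    projection = OrdersByCustomerProjection()
    for event in [
        {"table": "CUSTOMER", "op": "INSERT", "row": {"cust_id": 1, "name": "Ada"}},
        {"table": "ORDERS", "op": "INSERT", "row": {"order_id": 10, "cust_id": 1, "total": 5}},
        {"table": "CUSTOMER", "op": "UPDATE", "row": {"cust_id": 1, "name": "Ada L."}},
    ]:
        projection.apply(event)
    print(projection.table[(1, 10)])
```

The main design point is the fan-out: because the customer name is copied into every order row, one UPDATE on CUSTOMER turns into many writes on the Cassandra side, and the processor has to keep enough state (here, the customers dict) to resolve the join on its own.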

Related

What is the best way to migrate data from HBase to Cassandra for JanusGraph backend storage?

I am using HBase as the storage backend for JanusGraph and have to migrate to Cassandra as the backend. What is the best way to migrate the existing data?
One way to do it is to read the data from HBase and write it into Cassandra using Java code.
Migrating data out of JanusGraph is not well supported, so I would prefer to start from copies of the data that were made before it was ingested into JanusGraph. If that is not an option, your suggestion of using Java code to read from one graph and ingest into the other is the way to go.
Naturally, you want to parallelize this, because running millions of operations on a single thread in a single process takes too long to be practical. Although JanusGraph supports OLAP traversals for reading vertices and edges in parallel, JanusGraph OLAP has its own problems, and you are probably better off segmenting the data using a mixed index in JanusGraph and having each process/thread read its assigned segment using an OLTP traversal.
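The segmenting approach can be sketched as follows. This is a minimal Python illustration of the idea, not JanusGraph code: plain dicts stand in for the HBase-backed source graph and the Cassandra-backed target graph, and a hypothetical numeric property bucket stands in for whatever property is covered by the mixed index. Each worker copies one [lo, hi) range of that property, the way each process would run its own OLTP traversal over its own index range.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_segments(lo: int, hi: int, n: int) -> list:
    """Split the half-open range [lo, hi) of an indexed property into at most n contiguous segments."""
    step = max(1, -(-(hi - lo) // n))  # ceiling division
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def copy_segment(source: dict, target: dict, lock: threading.Lock, segment: tuple) -> int:
    lo, hi = segment
    # Stand-in for one OLTP traversal over one index range, e.g. a Gremlin
    # has("bucket", between(lo, hi)) step, which also selects lo <= x < hi.
    batch = {vid: props for vid, props in source.items() if lo <= props["bucket"] < hi}
    # Stand-in for a batched write and commit against the target graph.
    with lock:
        target.update(batch)
    return len(batch)


def parallel_copy(source: dict, target: dict, lo: int, hi: int, workers: int) -> int:
    """Copy every vertex whose bucket lies in [lo, hi), one segment per worker; return the count."""
    lock = threading.Lock()
    segments = make_segments(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda seg: copy_segment(source, target, lock, seg), segments)
        return sum(counts)
```

In a real migration you would typically copy all vertices first and edges second, keeping a mapping from old to new vertex IDs, because the target graph normally assigns its own IDs.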

How different and efficient is Alibaba Table Store when compared with Apache Cassandra?

How different and how efficient is Alibaba Table Store compared with Apache Cassandra? I understand both are NoSQL databases. Can anyone please explain where and when Alibaba Table Store is preferred over Apache Cassandra?
You can think of Alibaba Cloud Table Store as a counterpart to Apache Cassandra, because Table Store meets all of the requirements that Cassandra does.
As for the benefits of Table Store compared with Cassandra, when you use Table Store you do not need to worry about the following:
Scalability
Multi-datacenter replication
Distribution
MapReduce support
Fault tolerance
Note that Alibaba Cloud may not be using Cassandra on the backend; there is no mention of that.
In any scenario where Cassandra is used, you can replace it with Table Store. That said, I have not worked extensively with applications that use Apache Cassandra.
If you read the sample code for filtering, you will see how it differs from Cassandra. You will need to use a different data model in Table Store.

Cassandra vs Druid

I have a use case where I have to analyze real-time data using Apache Spark, but I am still unsure which data store to choose for my application. The analysis mostly includes aggregation, KPI-based identity analysis, and machine learning tools to predict trends. Cassandra has good support, and large tech companies already use it in production. But after some research I found that Druid is faster than Cassandra and is good for OLAP queries, although its results for queries like Count Distinct are inconsistent (approximate rather than exact).
Any help with this would be appreciated. Thanks.
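The Count Distinct behavior mentioned above comes from the fact that, by default, Druid answers distinct counts with sketch data structures (HyperLogLog and Theta sketches) instead of keeping every value. The block below is a simplified K-minimum-values (KMV) estimator, the idea Theta sketches build on, to show why such answers are close but generally not exact. It is an illustration only, not Druid's code; the hash choice and k are arbitrary assumptions.

```python
import hashlib
import heapq


def _unit_hash(value) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) using a stable hash."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_estimate(values, k: int = 256) -> float:
    """K-minimum-values estimate of the number of distinct values, in O(k) memory."""
    heap = []        # negated hashes: a max-heap of the k smallest hashes seen so far
    members = set()  # hashes currently kept, so duplicate values are ignored
    for v in values:
        h = _unit_hash(v)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heapreplace(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)           # fewer than k distinct hashes: the count is exact
    return (k - 1) / (-heap[0])    # the k-th smallest hash is about k / n
```

For example, kmv_estimate([i % 5000 for i in range(20000)]) returns a value near 5000 whose error shrinks as k grows, while memory stays bounded by k; an exact count would need to remember all 5000 values.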
Since your use case is analyzing real-time data, I would suggest using Druid rather than Apache Cassandra. With Apache Cassandra's asynchronous, masterless replication, your real-time analysis could miss recently updated data. Druid, on the other hand, is designed for real-time analytics.
Druid Details: http://druid.io/druid.html
Apache Cassandra Details: https://en.wikipedia.org/wiki/Apache_Cassandra
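The point about asynchronous, masterless replication can be illustrated with a toy model of N replicas: a write is acknowledged by W replicas and reaches the others later, and a read consults R replicas and keeps the value with the newest timestamp. This is a simplified model, not Cassandra's implementation. It shows that a stale read is possible exactly when R + W <= N; Cassandra's tunable consistency levels (for example QUORUM for both reads and writes) are the usual way to get R + W > N.

```python
import itertools


class Replica:
    def __init__(self):
        self.value, self.ts = None, 0


def write(replicas, acked, value, ts):
    """Apply a write only to the replicas that acknowledged it; the rest have
    not received it yet (asynchronous replication still in flight)."""
    for i in acked:
        replicas[i].value, replicas[i].ts = value, ts


def read(replicas, contacted):
    """Return the newest value among the contacted replicas (last write wins)."""
    newest = max((replicas[i] for i in contacted), key=lambda r: r.ts)
    return newest.value


def can_read_stale(n: int, w: int, r: int) -> bool:
    """True if some choice of W acknowledging and R read replicas misses the latest write."""
    for acked in itertools.combinations(range(n), w):
        for contacted in itertools.combinations(range(n), r):
            replicas = [Replica() for _ in range(n)]
            write(replicas, range(n), "old", 1)  # fully replicated older value
            write(replicas, acked, "new", 2)     # newer value, only partly replicated
            if read(replicas, contacted) != "new":
                return True
    return False
```

With three replicas, can_read_stale(3, 1, 1) is true (one-replica writes and reads can miss each other), while can_read_stale(3, 2, 2) is false because every read set overlaps every write set.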

Is Apache Ignite suitable for my use case (load Oracle tables into a cache, join these tables, and reflect changes to the Oracle data)?

I would like to know whether Ignite is suitable for my use case, which is:
Load all the data from the Oracle tables into the Ignite cache, and then run various SQL queries (aggregation/join/sub-query) against the cached data.
When new data is created in Oracle or existing data is updated, there should be some way to insert that data into the cache or update the corresponding cache entry.
When the cache goes down, there should be some way to restore the data from Oracle.
I am not sure whether Ignite SQLGrid fits this use case.
Also, I notice that IgniteRDD is mutable. Is IgniteRDD suitable for this use case? That is, could I first load the Oracle data into an IgniteRDD and then apply the data that is newly created or updated in Oracle to the IgniteRDD? However, it looks like IgniteRDD doesn't support complex SQL (aggregation/join/sub-query)?
This is one of the basic use cases supported by Ignite.
Data can be pre-loaded from Oracle using one of the methods covered in this documentation section.
If you plan to update the data in Ignite first and propagate it to Oracle afterwards (which is the preferred way), then it makes sense to use Oracle as a CacheStore in write-through/read-through mode. Ignite will keep the data in sync with the persistence layer. Moreover, it will be straightforward to pre-load the data from Oracle if the cluster is restarted.
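The write-through/read-through behavior described above can be modeled in a few lines. This is a minimal Python sketch of the caching pattern only, not Ignite's API: in Ignite the equivalent hooks live in a Java CacheStore implementation that you enable in the cache configuration. The BackingStore class is a hypothetical stand-in for the Oracle tables.

```python
class BackingStore:
    """Hypothetical stand-in for the Oracle tables: a dict keyed by primary key."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def load(self, key):
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)

    def load_all(self):
        return dict(self.rows)


class ReadWriteThroughCache:
    def __init__(self, store: BackingStore):
        self.store = store
        self.entries = {}

    def load_cache(self):
        """Pre-load every row from the store, e.g. after a cluster restart."""
        self.entries = self.store.load_all()

    def get(self, key):
        if key in self.entries:
            return self.entries[key]
        value = self.store.load(key)  # read-through on a cache miss
        if value is not None:
            self.entries[key] = value
        return value

    def put(self, key, value):
        self.store.write(key, value)  # write-through: persist first...
        self.entries[key] = value     # ...then update the cache

    def remove(self, key):
        self.store.delete(key)
        self.entries.pop(key, None)
```

The sketch persists to the store before updating the cache, so a failed store write does not leave the cache ahead of Oracle.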
Finally, you can use GridGain Web Console to connect to Oracle and map Oracle's schema to Ignite cache configurations and POJO objects.
As I mentioned, it's recommended to make all updates through Ignite first, which will then persist them to Oracle. But if Oracle is updated by other applications that are not aware of Ignite, you need to update the Ignite cluster yourself. Ignite doesn't have a feature that covers this use case. However, this can easily be implemented with GridGain, which is built on top of Ignite, using its Oracle GoldenGate integration.
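If Oracle is also updated by other applications and a log-based CDC integration is not available, one common stopgap is timestamp-based polling. The sketch below assumes a hypothetical last_modified column on each source row and uses a dict as the cache; it stands in for periodically running a query like SELECT ... WHERE last_modified > :watermark and upserting the results into Ignite.

```python
def poll_changes(source_rows, cache: dict, watermark):
    """Copy rows modified after `watermark` into `cache` and return the new watermark.

    source_rows: iterable of dicts with hypothetical 'id' and 'last_modified' columns,
    standing in for the result of the polling query against Oracle.
    """
    changed = sorted(
        (row for row in source_rows if row["last_modified"] > watermark),
        key=lambda row: row["last_modified"],
    )
    for row in changed:
        cache[row["id"]] = row  # upsert: insert new rows, overwrite updated ones
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    return changed[-1]["last_modified"] if changed else watermark
```

Polling has known gaps: it does not see deleted rows, and a row committed late with an older timestamp can be skipped. Those gaps are why log-based CDC, such as the GoldenGate integration mentioned above, is usually preferred.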
Once the data is in the Ignite cluster, use SQL Grid to query and/or update it. The SQL Grid engine is ANSI-99 compliant and doesn't impose limitations on such queries (aggregations, joins, sub-queries).
As for Ignite Shared RDD, it stores data in a distributed Ignite cache, which is why it is mutable, unlike Spark's native RDDs. Shared RDD's SQL capabilities are exactly the same as SQL Grid's: it is just one more API on top of SQL Grid.

Distributed Data Store - Hazelcast Vs Cassandra

We need to choose between Hazelcast and Cassandra as a distributed data store. I have worked with Cassandra but not with Hazelcast, and would like a comparative analysis of features such as:
Replication
Scalability
Availability
Data Distribution
Performance of reads/writes
Consistency
I would appreciate some help in making the right choice.
The following page, and the documents on it, might help with your decision: https://hazelcast.com/use-cases/nosql/apache-cassandra-replacement/
https://db-engines.com/en/system/Cassandra%3BHazelcast
