Apache Cassandra as a message data store for ActiveMQ

Can I use Apache Cassandra as a message data store for ActiveMQ?
The reason I am exploring this option is that we need to run our application in a cluster and provide scalability and failover:
a) KahaDB - can be clustered, but if we run out of disk space we cannot fail over.
b) MySQL / another RDBMS - the database is a single point of failure.
c) Cassandra is an in-memory database and also provides clustering.
Can someone help me understand whether my reasons are correct, and whether Cassandra can persist messages better than KahaDB or a database?
Thanks in advance.
Sandeep

Yes, it can be done. The QSandra project on GitHub has implemented an ActiveMQ message store on top of Cassandra.
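For context, plugging any custom message store into ActiveMQ boils down to handing the broker a PersistenceAdapter implementation. A minimal sketch of the wiring, assuming a hypothetical CassandraPersistenceAdapter (the class name and its constructor are illustrative, not QSandra's actual API):

```java
import org.apache.activemq.broker.BrokerService;

public class CassandraBackedBroker {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setBrokerName("cassandra-backed");
        // Hypothetical adapter: QSandra's real class and constructor may differ.
        broker.setPersistenceAdapter(new CassandraPersistenceAdapter("127.0.0.1", 9160));
        broker.addConnector("tcp://0.0.0.0:61616");
        broker.start();
        broker.waitUntilStopped();
    }
}
```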

Related

Best way to persist join table data in Cassandra

I have a scenario where a join is created on top of 10 tables. This works great when the join is done in the database. Now these tables are streaming data through Kafka topics (1:1 table-to-topic mapping). I need to create/update the join(s) as new messages arrive on the topics. So far, I've decided to store this data in a NoSQL DB like Cassandra and update the joined records as events keep coming. Here are my questions:
Is there a way to do this within Kafka itself?
If not in Kafka, what is the best way to do that?
Does the solution of persisting in Cassandra offer a better alternative?
Please note: I've read that Cassandra isn't the right solution for joins. If not Cassandra, what is recommended? Please don't shoot the question down as subjective, because if nothing else I expect to gain insights from the answers.
Is there a way to do this within Kafka itself?
Yes, using Kafka Streams or KSQL.
Kafka Streams details and example
KSQL details and example
As Justin Cameron noted, joins are limited to 2-way joins, so you would need to "daisy chain" your transformations. Each would write back to a staging Kafka topic, and the final joined result would also be a Kafka topic. From there, you can stream it to Cassandra using Kafka Connect (part of Apache Kafka).
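To make the daisy-chaining concrete, here is a minimal Kafka Streams sketch that joins three of the tables two at a time (topic names, types, and join logic are illustrative placeholders for your schemas; the source topics must be co-partitioned on the join key):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class DaisyChainJoin {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // One changelog topic per source table (1:1 mapping, as in the question).
        KTable<String, String> orders    = builder.table("orders");
        KTable<String, String> customers = builder.table("customers");
        KTable<String, String> payments  = builder.table("payments");

        // 2-way join #1: orders x customers.
        KTable<String, String> ordersWithCustomers =
                orders.join(customers, (order, customer) -> order + "|" + customer);

        // 2-way join #2: the intermediate result x payments.
        KTable<String, String> fullJoin =
                ordersWithCustomers.join(payments, (oc, payment) -> oc + "|" + payment);

        // The final joined result is itself a topic; Kafka Connect can sink it to Cassandra.
        fullJoin.toStream().to("orders-joined");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "daisy-chain-join");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}
```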
Disclaimer: I work for Confluent, the company behind the open-source KSQL project.

Is it right to access an external cache in Apache Spark applications?

We have many microservices (Java), and data is written to a Hazelcast cache for better performance. Now the same data needs to be made available to a Spark application for data analysis. I am not sure whether accessing an external cache from Apache Spark is the right design approach. I cannot make database calls to get the data, as the many database hits might affect the microservices (we currently don't have HTTP caching).
I thought about pushing the latest data into Kafka and reading it in Spark. However, each message might be big (sometimes > 1 MB), which is not ideal.
If it's OK to use an external cache in Apache Spark, is it better to use the Hazelcast client or to read the Hazelcast-cached data over a REST service? (A sketch of the client approach follows below.)
Also, please let me know if there is any other recommended way of sharing data between Apache Spark and microservices.
Please let me know your thoughts. Thanks in advance.
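For what it's worth, if you do go the external-cache route, the usual pattern with the Hazelcast client is to open one client per partition inside mapPartitions rather than one per record, so connection overhead is amortized. A rough sketch, with the cluster address and map name as placeholders:

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class HazelcastSparkReader {
    public static JavaRDD<String> enrich(JavaSparkContext sc, JavaRDD<String> keys) {
        return keys.mapPartitions((Iterator<String> partition) -> {
            // One client per partition/executor task, not one per record.
            ClientConfig config = new ClientConfig();
            config.getNetworkConfig().addAddress("hazelcast-host:5701"); // placeholder address
            HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
            Map<String, String> cache = client.getMap("my-cache"); // placeholder map name

            List<String> out = new ArrayList<>();
            while (partition.hasNext()) {
                out.add(cache.get(partition.next()));
            }
            client.shutdown();
            return out.iterator();
        });
    }
}
```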

Distributed Data Store - Hazelcast vs. Cassandra

We need to choose between Hazelcast and Cassandra as a distributed data store. I have worked with Cassandra but not with Hazelcast, and would like a comparative analysis of features such as:
Replication
Scalability
Availability
Data Distribution
Performance of reads/writes
Consistency
I would appreciate some help here in making the right choice.
The following pages and the documents on them might help with your decision: https://hazelcast.com/use-cases/nosql/apache-cassandra-replacement/
https://db-engines.com/en/system/Cassandra%3BHazelcast

Maximum parallel queries in Cassandra

I'm new to Cassandra, and I have a very trivial question: how many parallel queries can I run without compromising performance? The queries are going to be like
Select data from table where id='asdasdasd';
It's a server in a datacenter; should it work properly with 3000 read queries? Sorry for the sparse information, but it's all I have.
It all depends on the capacity of the servers where your Cassandra cluster is installed, and on how you have configured the nodes.
There is a configuration parameter in cassandra.yaml called concurrent_reads.
Tune it to get a better read rate.
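For example, in cassandra.yaml (the values below are the shipped defaults; the commonly cited sizing guideline is noted in the comments, but verify it against your version's documentation):

```yaml
# cassandra.yaml (per node)
concurrent_reads: 32    # rule of thumb: ~16 x number of data disks
concurrent_writes: 32   # rule of thumb: ~8 x number of CPU cores
```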

How to integrate Cassandra with ZooKeeper to support transactions

I have a Cassandra cluster and a ZooKeeper server installed. Now I want to support transactions in Cassandra using ZooKeeper. How do I do that?
ZooKeeper creates znodes to perform read and write operations, and data flows back and forth through znodes in ZooKeeper. I want to know how to support rollback and commit features in Cassandra using ZooKeeper. Is there any way to specify Cassandra configuration in ZooKeeper, or ZooKeeper configuration in Cassandra?
I know individually how data is read and written in Cassandra and in ZooKeeper, but I don't know how to integrate the two using Java.
How can we do transactions in Cassandra using ZooKeeper?
Thanks.
I have a Cassandra cluster and a ZooKeeper server installed. Now I want to support transactions in Cassandra using ZooKeeper. How do I do that?
With great difficulty. Cassandra does not work well as a transactional system. Writes to multiple rows are not atomic, there is no way to roll back writes if some of them fail, and there is no way to ensure readers see a consistent view when reading.
I want to know how to support rollback and commit features in Cassandra using ZooKeeper.
ZooKeeper won't help you with this, especially the commit feature. You may be able to write enough information to ZooKeeper to roll back in case of failure, but if you are doing that, you might as well store the rollback info in Cassandra.
ZooKeeper and Cassandra work well together when you use ZooKeeper as a locking service. Look at the Cages library, and use ZooKeeper to coordinate reads/writes to Cassandra, as in the sketch below.
Trying to use Cassandra as a transactional system with atomic commits to multiple rows and rollbacks is going to be very frustrating.
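To illustrate the locking-service pattern, here is a sketch using Apache Curator's InterProcessMutex rather than Cages itself (Cages offers similar lock classes, but its exact API is not shown here; the connection string and lock path are placeholders):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LockedCassandraWrite {
    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zookeeper-host:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        // One lock path per logical entity being updated (path is illustrative).
        InterProcessMutex lock = new InterProcessMutex(zk, "/locks/account-42");
        lock.acquire();
        try {
            // Read-modify-write against Cassandra goes here; other well-behaved
            // clients that acquire the same lock will wait their turn.
        } finally {
            lock.release();
        }
        zk.close();
    }
}
```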
There are ways to implement transactions in Cassandra without ZooKeeper.
Cassandra itself has a feature called lightweight transactions (LWT), which provides per-key linearizability and compare-and-set. With such primitives you can implement serializable transactions at the application level yourself.
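A minimal compare-and-set sketch using lightweight transactions through the DataStax Java driver (the keyspace, table, and column names are made up):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class LwtExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            // Linearizable insert: succeeds only if the row does not exist yet.
            session.execute(
                "INSERT INTO accounts (id, balance) VALUES ('acct-1', 100) IF NOT EXISTS");

            // Compare-and-set: apply the update only if the expected value still holds.
            ResultSet rs = session.execute(
                "UPDATE accounts SET balance = 90 WHERE id = 'acct-1' IF balance = 100");

            // wasApplied() reports whether the LWT's condition matched.
            System.out.println("applied: " + rs.wasApplied());
        }
    }
}
```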
Please see the Visualization of serializable cross-shard client-side transactions post for details and a step-by-step visualization.
Variants of this approach are used in Google's Percolator system and in CockroachDB.
By the way, if you're fine with the Read Committed isolation level, it makes sense to take a look at the RAMP transactions paper by Peter Bailis.
There is a BATCH feature in Cassandra's CQL3 (Cassandra 1.2 is the version that formally released CQL3), which can atomically apply all the updates in the BATCH as one all-or-nothing unit.
This does not mean you can roll back a successfully executed BATCH as an RDBMS could; that would have to be done manually.
Depending on the options you provide to the BATCH statement, the atomicity guarantee can be weakened: the UNLOGGED option skips the batch log and trades the all-or-nothing guarantee for performance.
http://www.datastax.com/docs/1.2/cql_cli/cql/BATCH
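To make the BATCH semantics concrete, a small sketch with the DataStax Java driver (keyspace and table names are made up):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BatchExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            // A LOGGED batch (the default) is atomic in the "all updates eventually
            // apply" sense; it is NOT an isolated transaction and cannot be rolled back.
            session.execute(
                "BEGIN BATCH " +
                "  INSERT INTO users (id, name) VALUES ('u1', 'Sandeep'); " +
                "  INSERT INTO users_by_name (name, id) VALUES ('Sandeep', 'u1'); " +
                "APPLY BATCH");
        }
    }
}
```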
Well, I'm not an expert at this (far from it, actually), but the way I see it, either you deploy some middleware of your own to guarantee the specific properties you are looking for, or you have Cassandra write data to auxiliary files and then publish them through the file system, since Java can perform a file move as an atomic operation.
I don't know the size of the data files you are considering, so I don't really know whether this is doable; however, there might be a way to apply this property to smaller pieces of information and then combine them as a whole.
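If you did go that route, the atomic step in Java is a move rather than a copy; a small sketch (paths are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicPublish {
    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get("/data/export.tmp");   // staging file
        Path live = Paths.get("/data/export.dat");  // file readers actually consume

        Files.write(tmp, "payload".getBytes());     // write the full content first...
        // ...then publish it in one step; readers never see a half-written file.
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```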
Just my 2 cents...
