I am trying to build a connector from Apache Cassandra to Apache Ignite. Basically, I want to write all the new incoming data from Cassandra to Ignite. Is there any connector or something which can be helpful?
N.B - Stream data from Cassandra to Ignite
Ignite provides such integration out of the box, see here: https://apacheignite-mix.readme.io/docs/ignite-with-apache-cassandra
Related
Is there an official Apache Spark Streaming connector for RabbitMQ? If not, is there an alternative way to do it without using external libraries?
There is a lot of information about kappa architecture on the internet, and after going through some of the conceptual aspects I am trying to drill down to something more concrete. As my main source I used this website.
Let's imagine you want to implement a kappa architecture involving the following tech stack:
Apache Kafka
Apache Spark
Apache Superset
Now imagine the application you want to do data analytics against has a PostgreSQL database. Of course, you can easily connect Apache Superset directly to the PostgreSQL database and create charts.
But now you want to see how you would do this with a kappa architecture, so you add Kafka and Spark.
You can emit events to Kafka and you can read those events in Apache Spark. Kafka will retain messages for topics for a certain period, as pointed out in the answers to this question. When I read about connecting Superset with Spark in the docs, it says Hive should be used as a connector (the project website also states the tool is unsupported, and if you look at this issue on PyHive, you find impyla could be an alternative). But Apache Hive is a completely different project for a storage system. So how would this connection work?
Assume you have Kafka nodes running (with ZooKeeper, obviously), you also have Spark running, and you then connect Apache Superset to Spark through this Hive connector.
How can you write queries against the data that is in Kafka (which is in fact the live data)?
On the Spark side itself you can easily write a Scala program that reads data from Kafka and does something with it, but how can you achieve this from Apache Superset?
Or is this not the intended way of connecting the things?
If I understood your question, you'd need to use Spark Structured Streaming to register a streaming SQL table in the Hive metastore, which Superset could then query through the Spark Thrift Server.
Hive itself doesn't store any of the data. Hive also has a built-in Kafka query handler, so Spark isn't completely necessary.
But, Hive/Spark isn't the only option. You could use Spark to write to HDFS/S3 and have Presto query that from Superset.
Or you can remove Spark and use Kafka Connect to write to any other store that a dashboarding tool (Tableau is another popular one) can support: a JDBC database (e.g. Postgres), Mongo, Cassandra, etc. Then you'd just refresh the panels to run a new query.
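To make the first option more concrete, here is a minimal PySpark sketch of a Structured Streaming job that reads a Kafka topic and appends it to a metastore table. The function names, topic, table name and checkpoint path are all made up for illustration. It also assumes PySpark 3.1 or later (for `toTable`) and that the `spark-sql-kafka` package is on the Spark classpath. Treat it as a starting point, not a tested production job.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="earliest"):
    """Build the options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_stream(bootstrap_servers, topic, table, checkpoint_dir):
    """Start a streaming query that appends Kafka records to a metastore table."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("kafka-to-table")
        .enableHiveSupport()  # register the table in the Hive metastore
        .getOrCreate()
    )

    events = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        # Kafka keys and values arrive as binary, so cast them to strings.
        .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )

    # The checkpoint location lets the query resume from its last offsets.
    return (
        events.writeStream
        .option("checkpointLocation", checkpoint_dir)
        .toTable(table)
    )
```

You would call something like `start_stream("localhost:9092", "events", "events_raw", "/tmp/checkpoints/events")` and then point Superset at the Spark Thrift Server to query `events_raw`. Whether the dashboard sees fresh rows depends on how often the panels re-run their queries.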
I need to connect to Apache Spark Stream where input will come from Kafka and processed data then go to Cassandra. I tried to find a Spark connector but didn't get any results.
Is there any custom connector available?
How can I use Apache Spark Stream in Mule?
I need to connect to Apache Spark Stream where input will come from
Kafka and processed data then go to Cassandra.
So what you need is not a Spark connector but the Kafka connector: https://docs.mulesoft.com/mule-user-guide/v/3.8/kafka-connector
Is there a way to do Apache Kafka ACL authentication using Cassandra? I have not seen any example of this so far.
Simple answer: there's no way. But I can't understand how you want to connect Kafka with Cassandra. You can't connect Kafka directly to Cassandra; you need an application between them, usually a stream processor like Spark or Flink. If you want to use ACLs in this application, you can use Kafka ACLs, which are stored in ZooKeeper. Here's the wiki entry about this: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorization+Command+Line+Interface
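As a rough illustration of the command-line interface described on that wiki page, the sketch below assembles a `kafka-acls.sh` invocation that grants one principal read access to one topic. The helper name, principal, topic and ZooKeeper address are invented for the example. The ZooKeeper-based `--authorizer-properties` form is the one from that wiki page, and newer Kafka releases may expect `--bootstrap-server` instead. The snippet only prints the command and does not run it.

```python
import shlex


def kafka_acls_command(zookeeper, principal, operation, topic,
                       script="kafka-acls.sh"):
    """Return the argv list for adding an allow-ACL on a topic (hypothetical helper)."""
    return [
        script,
        "--authorizer-properties", f"zookeeper.connect={zookeeper}",
        "--add",
        "--allow-principal", principal,
        "--operation", operation,
        "--topic", topic,
    ]


# Example values only; replace them with your own cluster details.
cmd = kafka_acls_command("localhost:2181", "User:alice", "Read", "events")
print(shlex.join(cmd))
```

The application that sits between Kafka and Cassandra would then authenticate to Kafka as that principal. Cassandra itself plays no part in the ACL check.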
I am working for a small concern and am very new to Apache Cassandra. I am studying Cassandra and performing some small analytics, such as a sum function, on a Cassandra DB to create reports. For this, Hive and Acunu are possible choices.
DataStax Enterprise provides a solution for Apache Cassandra and Hive integration. Is DataStax Enterprise the only solution for such an integration? Is there any other way to integrate Hive and Cassandra? If so, can I get links or documents about it? Is it possible to do the same on the Windows platform?
Is there any other solution for performing analytics on a Cassandra DB?
Thanks in advance.
I was trying to download DataStax Enterprise (DSE) for Windows but found there is no such option on their website. I suppose they do not support DSE for Windows.
Apache Cassandra does have built-in Hadoop support. You need to set up a standalone Hadoop cluster colocated with the Apache Cassandra nodes and then use ColumnFamilyInputFormat and ColumnFamilyOutputFormat to read data from and write data to Cassandra in your Hadoop jobs.