Apache Kafka ACL authentication using Cassandra

Is there a way to do Apache Kafka ACL authentication using Cassandra? I have not seen any example of this so far.

Simple answer: there's no way. But I don't understand how you want to connect Kafka with Cassandra. You can't connect Kafka directly to Cassandra; you need an application between Kafka and Cassandra, usually a stream processor like Spark or Flink. If you want to use ACLs in this application, you can use Kafka ACLs stored in ZooKeeper. Here's the wiki entry about this: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Authorization+Command+Line+Interface
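For reference, the ACLs that the linked wiki page manages with the kafka-acls.sh CLI can also be created programmatically through Kafka's AdminClient API. Below is a minimal Scala sketch, assuming Scala 2.13, the kafka-clients library on the classpath, and an authorizer enabled on the brokers; the broker address, topic name, and principal are placeholders:

```scala
import java.util.Properties

import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.common.acl.{AccessControlEntry, AclBinding, AclOperation, AclPermissionType}
import org.apache.kafka.common.resource.{PatternType, ResourcePattern, ResourceType}

object AddTopicAcl {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker address

    val admin = AdminClient.create(props)

    // Allow the (hypothetical) principal User:alice to read topic "events" from any host
    val binding = new AclBinding(
      new ResourcePattern(ResourceType.TOPIC, "events", PatternType.LITERAL),
      new AccessControlEntry("User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW)
    )

    admin.createAcls(List(binding).asJava).all().get()
    admin.close()
  }
}
```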

Related

Understanding kappa architecture with Apache Superset

There is a lot of information about kappa architecture on the internet, and after going through some of the conceptual aspects I am trying to drill down to something more concrete. As my main source I used this website.
Let's imagine you want to implement a kappa architecture involving the following tech stack:
Apache Kafka
Apache Spark
Apache Superset
Now imagine the application you want to run data analytics against has a PostgreSQL database. Of course you could easily connect Apache Superset directly to the PostgreSQL database and create charts.
But now you want to see how you would do this with a kappa architecture, so you add Kafka and Spark.
You can emit events to Kafka and you can read such events in Apache Spark. Kafka will retain messages for topics for a certain period, as pointed out in the answers to this question. When I read about connecting Superset with Spark in the docs, it says Hive should be used as a connector (the project website also states the tool is unsupported, and if you look at this issue on PyHive you find that Impyla could be an alternative). But Apache Hive is a completely different project, a storage system. So how would this connection work?
Assume you have Kafka nodes running (with ZooKeeper, obviously) and also have Spark running, and then you connect Apache Superset through this Hive connector to Spark.
How can you write queries against the data that is in Kafka (which is in fact the live data)?
On the Spark side itself you can easily write a Scala program that reads data from Kafka and does something with it, but how can you achieve this from Apache Superset?
Or is this not the intended way of connecting these things?
If I understood your question correctly, you'd need to use Spark Structured Streaming to register a streaming SQL table into the Hive metastore, which could then be queried from Superset through the Spark Thrift Server.
Hive itself doesn't store any of the data. Hive also has a built-in Kafka query handler, so Spark isn't completely necessary.
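To make the Structured Streaming option a bit more concrete, here is a minimal Scala sketch, assuming Spark 3.1+ with Hive support enabled and the spark-sql-kafka-0-10 connector on the classpath; the broker address, topic, table name, and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHiveTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hive-table")
      .enableHiveSupport() // register tables in the Hive metastore
      .getOrCreate()

    // Read the Kafka topic as a streaming DataFrame
    // (broker address and topic name are placeholders)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-host:9092")
      .option("subscribe", "app-events")
      .load()
      .selectExpr("CAST(key AS STRING) AS key",
                  "CAST(value AS STRING) AS value",
                  "timestamp")

    // Continuously append into a metastore-backed table that the
    // Spark Thrift Server (and therefore Superset) can query
    events.writeStream
      .option("checkpointLocation", "/tmp/checkpoints/app-events")
      .toTable("default.app_events")
      .awaitTermination()
  }
}
```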
But, Hive/Spark isn't the only option. You could use Spark to write to HDFS/S3 and have Presto query that from Superset.
Or you can remove Spark and use Kafka Connect to write to anything else that a dashboarding tool (Tableau is another popular one) can support - a JDBC database (e.g. Postgres), Mongo, Cassandra, etc. Then you'd just refresh the panels to run a new query.

How to load data from Cassandra to Druid

I am a newbie to Druid. I am able to ingest data into Druid from S3 and Kafka. Now, I want to load data from Cassandra hosted in an AWS private subnet.
Is it even possible? If yes, please share some resources.
No. There doesn't seem to be any support for direct ingestion from Cassandra. But you could set up Cassandra CDC to Kafka and use Kafka ingestion.

How to build Aggregations on Apache Solr with Spark

I have a requirement to build aggregations on the data that we receive in our Apache Kafka...
I am a little bit lost about which technological path to follow...
It seems people see the standard way as a constellation of Apache Kafka <-> Apache Spark <-> Solr
Bitnami Data Platform
I can't find concrete examples of how this actually works, but I am also asking myself whether a solution of
Apache Kafka <-> Kafka Connect Solr <-> Solr
would not do the trick, because Solr supports aggregations as well...
Solr Aggregation
but I saw some code snippets that aggregate the data in Spark and write it under a special index to Solr...
Also, aggregation with Kafka <-> Kafka Connect Solr <-> Solr will probably only work for a single topic from Kafka, so if I have to combine the data from 2 or more different topics and aggregate, then Kafka, Spark, Solr is the way to go... (or is this viable at all?)
So as you can tell, I am a little bit confused, so I'd like to ask here: how are you approaching this problem in your real-life solutions...
Thx for answers...
Spark can of course join multiple topics. So can Flink, or Kafka Streams/ksqlDB. Spark and Flink just happen to also be able to write their data to external sinks such as Solr, rather than exclusively back into a new Kafka topic. The "downside" is that you need to maintain a scheduler exclusively for those, compared to running a cluster of standalone Kafka Connect or Kafka Streams JAR applications. If you're using Kubernetes, that could be used for all of the above (maybe not Flink... haven't tried).
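To illustrate the multi-topic case, here is a rough Scala sketch of a stream-stream join in Spark Structured Streaming; the topic names, join key, broker address, and 10-minute time bound are assumptions, and the console sink stands in for a real Solr sink (e.g. via the spark-solr library):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object JoinTwoTopics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-two-topics").getOrCreate()

    // Helper: read one Kafka topic as a streaming DataFrame
    // (broker address and topic names are placeholders)
    def topic(name: String) = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-host:9092")
      .option("subscribe", name)
      .load()
      .selectExpr("CAST(key AS STRING) AS key",
                  "CAST(value AS STRING) AS value",
                  "timestamp")

    val orders    = topic("orders").withWatermark("timestamp", "10 minutes").as("o")
    val customers = topic("customers").withWatermark("timestamp", "10 minutes").as("c")

    // Stream-stream inner join on the record key, bounded in time so that
    // Spark can clean up old state
    val joined = orders.join(customers, expr(
      """o.key = c.key AND
         o.timestamp >= c.timestamp - interval 10 minutes AND
         o.timestamp <= c.timestamp + interval 10 minutes"""))

    joined.writeStream
      .format("console") // stand-in for a real Solr sink in this sketch
      .option("checkpointLocation", "/tmp/checkpoints/join-two-topics")
      .start()
      .awaitTermination()
  }
}
```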
Kafka Connect can consume multiple topics and, depending on the connector configuration, might write to one or many Solr collections.

Connector from Apache Cassandra to Ignite

I am trying to build a connector from Apache Cassandra to Apache Ignite. Basically, I want to write all the new incoming data from Cassandra to Ignite. Is there any connector or something which can be helpful?
N.B - Stream data from Cassandra to Ignite
Ignite provides such integration out of the box, see here: https://apacheignite-mix.readme.io/docs/ignite-with-apache-cassandra

How to use Spark Streaming from another VM with Kafka

I have Spark Streaming on a virtual machine, and I would like to connect it to another VM that contains Kafka. I want Spark to get the data from the Kafka machine.
Is it possible to do that ?
Thanks
Yes, it is definitely possible. In fact, this is the reason why we have distributed systems in place :)
When writing your Spark Streaming program, if you are using Kafka, you will have to create a Kafka config data structure (the syntax will vary depending on your programming language and client). In that config structure, you have to specify the Kafka broker's IP address. This would be the IP of your Kafka VM.
You then just need to run the Spark Streaming application on your Spark VM.
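As a concrete illustration, here is a minimal Scala sketch using the spark-streaming-kafka-0-10 integration; the IP address, port, topic, and group id are placeholders for your own setup:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object RemoteKafkaStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("remote-kafka-stream")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // bootstrap.servers points at the Kafka VM, not localhost
    // (IP, port, topic, and group id are placeholders)
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "192.168.1.20:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-streaming-group",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    // Do something with the records; here we just count them per micro-batch
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```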
It's possible and makes perfect sense to have them on separate VMs. That way there is a clear separation of roles.
