I have a Kafka cluster with ZooKeeper and Kafka Security Manager (KSM). KSM helps manage Kafka ACLs (https://github.com/conduktor/kafka-security-manager).
When a topic is created, it automatically becomes public if there is no permission entry for it in the Kafka ACLs.
How can I make a topic private when it gets created in Kafka?
Thank you
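For context, this default-public behaviour is typically governed by the broker-side authorizer settings rather than by KSM itself. A minimal sketch of the relevant server.properties entries, assuming the stock AclAuthorizer and an example super-user principal:

```properties
# Use Kafka's built-in ACL authorizer (Kafka 2.4+; older brokers use kafka.security.auth.SimpleAclAuthorizer)
authorizer.class.name=kafka.security.authorizer.AclAuthorizer

# When false, resources with no matching ACL entry are denied to everyone except super users,
# so a newly created topic is effectively private until explicit ACLs are added (e.g. via KSM).
allow.everyone.if.no.acl.found=false

# Example principal kept as an assumption; adjust to your environment.
super.users=User:admin
```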
I am writing a pub/sub implementation which uses Azure EventHub as the underlying event ingestion service. In my application, the publishers will publish events to a particular EventHub partition and the consumers who are subscribed to that particular partition will receive events. Usually a consumer will be assigned to a unique EventHub ConsumerGroup, and in some cases there can be multiple consumer assignments to the same ConsumerGroup.
Let's say I have two consumers (consumer-1, consumer-2) in the same ConsumerGroup (consumer-group-1) who are subscribed to events of a particular EventHub partition (partition '0' of event-hub-1).
When we send an event to partition '0' of 'event-hub-1', how does the message delivery happen?
Will both consumers (consumer-1, consumer-2) get the same message?
Or will the ConsumerGroup load-balance the messages among the consumers as in traditional Kafka and only one consumer gets the message ?
Sample Code: https://github.com/ballerina-platform/ballerina-standard-library/issues/3483#issuecomment-1272824977
Note:
The application is written in the Ballerina language, which internally uses the Kafka Java Client.
A consumer group is a "group of consumers", as the name suggests. Each consumer group gets a copy of the message, and within a group only one of its consumers receives it. So, regarding your scenario, either consumer-1 or consumer-2 will get the message, since they are in the same consumer group.
The Kafka Consumer supports two models for consuming messages from a topic:
Join a ConsumerGroup and subscribe to the topic.
If the consumer knows which partition(s) to read events from, assign itself to the relevant partition(s) of the topic.
The two models are mutually exclusive; a given Kafka Consumer should use only one of them to consume messages.
When a Kafka Consumer joins a particular ConsumerGroup, the consumer is assigned a set of partitions from the topic to which it has subscribed. Two consumers from the same ConsumerGroup are never assigned the same partition(s) of a given topic. As per the Kafka documentation, this is handled either by ZooKeeper or by the Kafka cluster itself.
But when we assign partitions to a consumer manually, the consumer does not use the group management functionality.
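To make the two models concrete, here is a minimal sketch with the Kafka Java client (which Ballerina wraps internally); the broker address, topic name, and group id are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SubscribeVsAssign {

    private static Properties baseConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        return props;
    }

    public static void main(String[] args) {
        // Model 1: join a ConsumerGroup and subscribe. The group coordinator assigns
        // partitions, and two members of consumer-group-1 never share a partition.
        Properties groupProps = baseConfig();
        groupProps.put("group.id", "consumer-group-1");
        try (KafkaConsumer<String, String> grouped = new KafkaConsumer<>(groupProps)) {
            grouped.subscribe(List.of("event-hub-1"));
            ConsumerRecords<String, String> records = grouped.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println("grouped: " + r.value()));
        }

        // Model 2: standalone consumer that assigns itself partition 0. No group
        // management is involved, so every consumer doing this receives every
        // message written to that partition.
        Properties standaloneProps = baseConfig();
        standaloneProps.put("enable.auto.commit", "false"); // no group id, so offsets are not committed here
        try (KafkaConsumer<String, String> standalone = new KafkaConsumer<>(standaloneProps)) {
            standalone.assign(List.of(new TopicPartition("event-hub-1", 0)));
            ConsumerRecords<String, String> records = standalone.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println("standalone: " + r.value()));
        }
    }
}
```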
In the above scenario I have manually assigned partitions to the consumers, so neither consumer uses the group management functionality. As a result, both consumers will get all the messages sent to that particular partition. This is explained in the EventHub documentation.
For more information about the inner workings of the Kafka Consumer, refer to the "Standalone Consumer: Why and How to Use a Consumer Without a Group" section of Kafka: The Definitive Guide, Chapter 4.
Is there a way to know the creation date of a Databricks interactive cluster?
I looked at the Configuration tab as well as the ARM JSON but couldn't find it.
Check the cluster event logs, which capture cluster lifecycle events like creation, termination, configuration edits, and so on.
The cluster event log displays important cluster lifecycle events that are triggered manually by user actions or automatically by Azure Databricks. Such events affect the operation of the cluster as a whole and the jobs running on it.
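If the event log in the UI has already rolled over, the earliest CREATING event can also be fetched programmatically. A rough sketch in Java against the Clusters API events endpoint; the workspace URL, token, cluster ID, and the exact request shape are assumptions to verify against the current API docs:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClusterCreationTime {
    public static void main(String[] args) throws Exception {
        String workspaceUrl = "https://adb-1234567890123456.7.azuredatabricks.net"; // placeholder workspace
        String token = System.getenv("DATABRICKS_TOKEN");                            // personal access token
        String clusterId = "0123-456789-abcde123";                                   // placeholder cluster id

        // Ask the cluster events endpoint for the earliest CREATING event;
        // its timestamp (epoch millis) is effectively the cluster creation time.
        String body = """
                {"cluster_id": "%s", "event_types": ["CREATING"], "order": "ASC", "limit": 1}
                """.formatted(clusterId);

        HttpRequest request = HttpRequest.newBuilder(URI.create(workspaceUrl + "/api/2.0/clusters/events"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // inspect events[0].timestamp in the JSON response
    }
}
```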
I was listening to this talk, where it is said that Event Hub is compatible with the Kafka protocol, and that if an app is writing to or reading from Kafka topics, it's possible to use an Event Hub broker in place of a Kafka broker.
But does that also mean that we can use Kafka connectors for Event Hub? For example, if I want to bring in data from a Postgres Database into a Kafka topic using a Postgres Kafka connector, can I simply change the broker address to that of an Event Hub broker to bring it into an Event Hub topic instead?
Yes, it is possible to use Kafka connectors with Azure Event Hubs endpoints. I am not familiar with the PostgreSQL connector configuration; however, I can point you to this Kafka Connect sample: https://github.com/Azure/azure-event-hubs-for-kafka/tree/master/tutorials/connect. The PostgreSQL connector is probably configured similarly to the connector in that sample.
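The change is mostly in the Kafka Connect worker configuration rather than in the PostgreSQL connector itself: point bootstrap.servers at the Event Hubs Kafka endpoint and authenticate with the namespace connection string over SASL/PLAIN. A minimal sketch, with the namespace name and connection string as placeholders:

```properties
# connect-distributed.properties (worker-level settings; the connector config itself stays the same)
bootstrap.servers=mynamespace.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$ConnectionString" \
  password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>";

# Connect's internal producer/consumer need the same three settings,
# prefixed with "producer." and "consumer." respectively.
```

Connect's internal topics (offsets, configs, status) will then live as Event Hubs in the same namespace.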
I have read about Event Hubs, HDInsight, and deploying Kafka on IaaS in an Availability Set.
I need to know the requirements for implementing Kafka on AKS.
How can I know how many nodes are needed? I also want to know how to calculate billing.
Finally, I want to compare the three options I mentioned vs. Kafka on AKS.
The number of nodes depends on your requirements in terms of load, throughput, and so on. If you are going to run Apache Kafka on AKS, I would suggest using the Strimzi project (https://strimzi.io/) to deploy it fairly easily. I also wrote a simple demo for a session I gave about this (https://github.com/ppatierno/strimzi-aks).
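To give a rough idea of the Strimzi route, here is a minimal sketch of a Kafka custom resource; the names, replica counts, and storage sizes are illustrative only and should be sized to your load:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3                 # broker pods; size to your throughput and availability needs
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

After installing the Strimzi cluster operator on the AKS cluster, applying this resource creates the broker and ZooKeeper pods, so the AKS node pool has to be sized to host them alongside your applications.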
We have been PoC-ing Falcon for our data ingestion workflow. We have a requirement to use Falcon to set up replication between two clusters (feed replication, not mirroring). The problem I have is that the user ID on cluster A is different from the ID on cluster B. Has anyone used Falcon with this setup? I can't seem to find a way to get this to work.
1) I am setting up a replication from Cluster A => Cluster B
2) I am defining the Falcon job on cluster A
At job setup time it looks like I can only define one user ID that owns the job. How do I set up a job where the ID on cluster A is different from the ID on cluster B? Any help would be awesome!
Apache Falcon uses an 'ACL owner', which should have write access on the target cluster to which the data is to be copied.
The source cluster should have WebHDFS enabled, through which the data will be accessed.
So, do not schedule the feed on the source cluster if the user does not have write access there, which is required for retention.
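For illustration, the ACL owner is declared in the feed entity definition itself. A hedged fragment of a feed XML, where the entity names, dates, owner, and group are placeholders:

```xml
<feed name="replicated-feed" description="feed replicated from cluster A to cluster B" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <cluster name="clusterA" type="source">
      <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
    </cluster>
    <cluster name="clusterB" type="target">
      <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(30)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/data/feed/${YEAR}-${MONTH}-${DAY}"/>
  </locations>
  <!-- The ACL owner must be a user with write access on the target cluster (clusterB here). -->
  <ACL owner="userB" group="hadoop" permission="0x755"/>
  <schema location="/none" provider="none"/>
</feed>
```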
Hope this helps.