Kafka Schema Registry fails to initialize after ZooKeeper and brokers are configured with SASL_PLAINTEXT security

We are using the Confluent Community edition for Kafka. We currently have a requirement to configure ACLs around the cluster, so we have configured the ZooKeeper and broker nodes such that clients require SASL_PLAINTEXT authentication (username/password) to publish/subscribe to the cluster. This works perfectly without Schema Registry; however, Schema Registry fails to initialize and throws the exception below, even though we have configured it to use a SASL_PLAINTEXT connection to the brokers/ZooKeeper nodes. Is there anything I'm missing? Please help.
Please note that we are using the allow.everyone.if.no.acl.found=true flag and currently have no ACLs defined, so I don't think we need to set up any ACLs for the _schemas topic that Schema Registry uses to initialize.
[2019-12-17 00:33:23,844] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:64)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:212)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:62)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:73)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:40)
at io.confluent.rest.Application.createServer(Application.java:201)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:42)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: Timed out trying to create or validate schema topic configuration
at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:172)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:114)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:210)
... 5 more
Caused by: java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:274)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:165)
... 7 more
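For context, "configured it to use SASL_PLAINTEXT" on the Schema Registry side usually means client-style SASL settings in schema-registry.properties. A minimal sketch, assuming SASL/PLAIN; the broker addresses, JAAS path and file names are illustrative and not taken from the post:

kafkastore.bootstrap.servers=SASL_PLAINTEXT://broker1:9092,SASL_PLAINTEXT://broker2:9092
kafkastore.security.protocol=SASL_PLAINTEXT
kafkastore.sasl.mechanism=PLAIN
# The JAAS login for the embedded Kafka clients is typically supplied via
# SCHEMA_REGISTRY_OPTS="-Djava.security.auth.login.config=/etc/schema-registry/jaas.conf"

If settings like these are in place and the timeout persists, checking the broker logs for failed authentication attempts from the registry host is a reasonable next step.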

Related

org.postgresql.util.PSQLException: SSL error: Received fatal alert: handshake_failure while writing from Azure Databricks to Azure Postgres Citus

I am trying to write pyspark dataframe to Azure Postgres Citus (Hyperscale).
I am using the latest Postgres JDBC driver and I tried writing on Databricks Runtime 7, 6, and 5.
df.write.format("jdbc").option("url","jdbc:postgresql://<HOST>:5432/citus?user=citus&password=<PWD>&sslmode=require" ).option("dbTable", table_name).mode(method).save()
This is what I get after running the above command
org.postgresql.util.PSQLException: SSL error: Received fatal alert: handshake_failure
I have already tried different parameters in the URL and under the options as well, but no luck so far.
However, I am able to connect to this instance from my local machine and from the Databricks driver/notebook using psycopg2.
Both the Azure Postgres Citus and Databricks are in the same region and Azure Postgres Citus is public.
It worked after overriding the Java security properties for the driver and executor:
spark.driver.extraJavaOptions -Djava.security.properties=
spark.executor.extraJavaOptions -Djava.security.properties=
Explanation:
What is happening in reality is that the JVM's security properties are by default read from the following file (/databricks/spark/dbconf/java/extra.security), and in this file some TLS algorithms are disabled by default. That means that if I edit this file and replace the disabled TLS ciphers that Postgres Citus needs with an empty string, that should also work.
When I set this option only on the executors (spark.executor.extraJavaOptions), it does not change the JVM's default security properties. The driver, however, does pick up the override, and so it starts to work.
Note: the file has to be edited before it is read by the JVM, so an init script is the only way of accomplishing that.
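If you go the init-script route instead, the idea (a sketch, assuming the file overrides the standard JVM property jdk.tls.disabledAlgorithms; the actual contents of extra.security may differ) is that the override should be blanked before the JVM starts:

# in /databricks/spark/dbconf/java/extra.security (illustrative)
jdk.tls.disabledAlgorithms=

Setting -Djava.security.properties= as shown above simply stops that extra file from being applied at all, which is why it works without touching the file.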

How to disable 'spark.security.credentials.${service}.enabled' in Structured Streaming while connecting to a Kafka cluster

I am trying to read data from a secured Kafka cluster using Spark Structured Streaming.
I am also using the library "spark-sql-kafka-0-10_2.12":"3.0.0-preview" to read the data, since it supports specifying a custom group id (instead of Spark setting its own).
Dependency used in code:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
  <version>3.0.0-preview</version>
</dependency>
I am getting the error below, even after specifying the required JAAS configuration in the Spark options.
Caused by: java.lang.IllegalArgumentException: requirement failed: Delegation token must exist for this connector.
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.kafka010.KafkaTokenUtil$.isConnectorUsingCurrentToken(KafkaTokenUtil.scala:299)
at org.apache.spark.sql.kafka010.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:533)
at org.apache.spark.sql.kafka010.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:275)
The following document specifies that we can disable the feature of obtaining a delegation token: https://spark.apache.org/docs/3.0.0-preview/structured-streaming-kafka-integration.html
I tried setting this property spark.security.credentials.kafka.enabled to false in spark config, but it is still failing with the same error.
Apparently there was a bug in the preview release, which has been fixed in the GA Spark 3.x release.
Reference:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-30495
Now we can specify a custom consumer group name while fetching data from Kafka (even though it's not recommended, and a warning message is logged when we do).
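Putting it together, a sketch of a Structured Streaming read against a SASL-secured cluster on Spark 3.0 GA; the broker addresses, topic, group id and credentials are placeholders, and SASL/PLAIN is assumed rather than taken from the question:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SecuredKafkaRead {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("secured-kafka-read").getOrCreate();

        // Options prefixed with "kafka." are passed straight through to the underlying consumer.
        Dataset<Row> df = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9093,broker2:9093")
                .option("subscribe", "input-topic")
                .option("kafka.group.id", "my-consumer-group")       // custom group id (3.x)
                .option("kafka.security.protocol", "SASL_PLAINTEXT")
                .option("kafka.sasl.mechanism", "PLAIN")
                .option("kafka.sasl.jaas.config",
                        "org.apache.kafka.common.security.plain.PlainLoginModule required "
                                + "username=\"user\" password=\"secret\";")
                .load();

        // Console sink just to verify the stream starts.
        df.writeStream().format("console").start().awaitTermination();
    }
}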

How to connect KairosDB with Azure Cosmos (Cassandra)

I use KairosDB on top of Cassandra for saving all our time series data. I am now trying to replicate the same KairosDB setup with Azure Cosmos DB (Cassandra API), but it throws this error:
16:59:08.364 [main] INFO [LZ4Compressor.java:52] - Using LZ4Factory:JNI
16:59:08.441 [main] INFO [NettyUtil.java:73] - Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16:59:08.842 [main] ERROR [CassandraModule.java:136] - Unable to setup cassandra schema
com.datastax.driver.core.exceptions.AuthenticationException: Authentication error on host ilenstsdb2.cassandra.cosmos.azure.com/40.65.106.154:10350: Cql request had unsupported headers Compression
at com.datastax.driver.core.Connection$8.apply(Connection.java:392)
at com.datastax.driver.core.Connection$8.apply(Connection.java:361)
I don't have a huge amount of exposure to the Cassandra API, but the error indicates that your CQL request included the Compression header while authenticating, and the server didn't like it.
Make sure you're not using withCompression() when starting your datastax driver:
cluster = Cluster.builder()
        .addContactPoint("X.X.X.X")
        .withCompression(ProtocolOptions.Compression.LZ4)
        .build();
...should be...
cluster = Cluster.builder()
        .addContactPoint("X.X.X.X")
        .build();
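For completeness, a sketch of a driver setup for the Cosmos DB Cassandra API (TLS on port 10350, account credentials, and no compression); the host, account name and key below are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CosmosCassandraConnect {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("<account>.cassandra.cosmos.azure.com")  // placeholder host
                .withPort(10350)                                          // Cosmos Cassandra API port
                .withCredentials("<account>", "<primary-key>")            // placeholder credentials
                .withSSL()                                                // Cosmos requires TLS
                .build();                                                 // note: no withCompression(...)
        try (Session session = cluster.connect()) {
            System.out.println("Connected to " + session.getCluster().getClusterName());
        }
        cluster.close();
    }
}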

Kafka ZooKeeper Security Authentication & Authorization (JAAS) Using SASL

Regarding Kafka-ZooKeeper security using DIGEST-MD5 authentication, I am trying to rotate/change the credentials (passwords) in both the server (ZooKeeper) and client (Kafka) JAAS config files.
We have a cluster of 3 ZooKeeper nodes and 3 Kafka broker nodes with the JAAS configuration files below.
kafka.conf
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="super"
password="password";
};
zookeeper.conf
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_super="password";
};
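For reference, these JAAS files are typically wired in through the login-config JVM flag; the paths below are illustrative, not taken from the post:

-Djava.security.auth.login.config=/etc/kafka/kafka.conf          # broker JVMs, e.g. via KAFKA_OPTS
-Djava.security.auth.login.config=/etc/zookeeper/zookeeper.conf  # ZooKeeper JVMs, e.g. via SERVER_JVMFLAGS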
To rotate, we do a rolling restart of the server (ZooKeeper) instances after updating the credential (password). Then, during a rolling restart of the clients (Kafka instances), after updating the same credential/password for the super user one at a time, we notice
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
these INFO-level messages in the server logs, which eventually result in an unclean shutdown and restart of the broker, impacting writes and reads for longer than expected. I have tried commenting out requireClientAuthScheme=sasl in ZooKeeper's zoo.cfg (https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication) to allow any client to authenticate to ZooKeeper, but with no success.
As an alternative approach, I also tried to update the credential/password in the JAAS config dynamically using sasl.jaas.config, and I get the same exception documented in this JIRA (reference: https://issues.apache.org/jira/browse/KAFKA-8010).
Does anyone have any suggestions? Thanks in advance.

What is zookeeper.broker.path?

I'm learning Spark and Kafka and came across the project kafka-spark-consumer, which seems to consume messages from Kafka efficiently. This project requires configuring a few Kafka and ZooKeeper properties, and that's where I'm struggling: what does the property zookeeper.broker.path mean? Sorry if it's a basic question.
I have configured Kafka on a single node with the following properties,
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
and ZooKeeper as,
zookeeper.connect=localhost:2181/brokers
zookeeper.connection.timeout.ms=6000
If I try to configure zookeeper.broker.path with /brokers, I get the following exception from the consumer:
Exception in thread "main" java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/<name>/partitions
at consumer.kafka.ReceiverLauncher.getNumPartitions(ReceiverLauncher.java:217)
at consumer.kafka.ReceiverLauncher.createStream(ReceiverLauncher.java:79)
at consumer.kafka.ReceiverLauncher.launch(ReceiverLauncher.java:51)
at com.ibm.spark.streaming.KafkaConsumer.run(KafkaConsumer.java:78)
at com.ibm.spark.streaming.KafkaConsumer.start(KafkaConsumer.java:43)
at com.ibm.spark.streaming.KafkaConsumer.main(KafkaConsumer.java:103)
Can you help me understand what the ZooKeeper broker path is here and how I can configure it?
EDIT
The above error was caused by a non-existent topic; the moment I created the topic, the error went away.
As answered by user007, the /brokers node is created in ZooKeeper by Kafka by default.
There is no need for '/brokers' in the zookeeper.connect property. It should be
zookeeper.connect=localhost:2181
I am not familiar with the kafka-spark-consumer project you mentioned, but /brokers is usually the default znode Kafka creates in ZooKeeper. I haven't seen any library asking the user to configure it.
/brokers is the znode path under which metadata such as topics is stored.
Go to the Kafka bin directory and invoke the ZooKeeper shell: ./zookeeper-shell.sh localhost:2181
Then run ls on /brokers. You should be able to see topics and other child nodes created there; see the sample session below.
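A typical session looks roughly like this; the exact child nodes and topics will vary with your cluster:

./zookeeper-shell.sh localhost:2181
ls /brokers
[ids, topics, seqid]
ls /brokers/topics
[<your-topics>]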
