I'm mainly looking for advice here around Kafka and disaster recovery failover.
Is there any way to use Kafka through CNAMEs/load balancer when using Kerberos?
When trying it, I get the SPN error below; this makes sense, and I would fully expect this behaviour.
The only way I could picture this working would be to include a CNAME resolution step in the Java client code before establishing a connection:
# Using the new consumer API, do the following on any new connection:
1) Provide the CNAME hostname in the config
2) Resolve the CNAME to the list of A records for the broker hosts
3) Pass these to the new consumer as the bootstrap servers
This should work, but it would require custom code (a sketch follows). The same concept applies to publishing to a topic.
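A minimal sketch of that resolution step, assuming a plain GSSAPI consumer setup; the port, group id, and security settings below are illustrative, not taken from a verified config:

import java.net.InetAddress;
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CnameBootstrapResolver {
    public static void main(String[] args) throws Exception {
        // Resolve the load-balancer CNAME to the broker A records, then reverse-resolve
        // each address so SASL/GSSAPI sees the real broker FQDNs (kafka/<fqdn>@REALM)
        // instead of the load balancer's name.
        String bootstrap = Arrays.stream(
                InetAddress.getAllByName("lb.cdh-poc-cluster.internal.cdhnetwork"))
            .map(addr -> addr.getCanonicalHostName() + ":9093")
            .collect(Collectors.joining(","));

        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("group.id", "smoke-test");  // illustrative group id
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SASL_SSL");          // as in the logs below
        props.put("sasl.kerberos.service.name", "kafka");
        // Remaining Kerberos settings (JAAS config, keytab) as in the existing setup.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("build_smoke_test"));
        }
    }
}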
Are there any ideas that might work without having to resort to this?
I am using CDH 5 with Cloudera-managed keytab distribution.
Consumer log
17/03/01 14:12:06 DEBUG consumer.KafkaConsumer: Subscribed to topic(s): build_smoke_test
17/03/01 14:12:06 DEBUG clients.NetworkClient: Initiating connection to node -1 at lb.cdh-poc-cluster.internal.cdhnetwork:9093.
17/03/01 14:12:06 DEBUG authenticator.SaslClientAuthenticator: Set SASL client state to SEND_HANDSHAKE_REQUEST
17/03/01 14:12:06 DEBUG authenticator.SaslClientAuthenticator: Creating SaslClient: client=alex@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK;service=kafka;serviceHostname=lb.cdh-poc-cluster.internal.cdhnetwork;mechs=[GSSAPI]
17/03/01 14:12:06 DEBUG network.Selector: Connection with lb.cdh-poc-cluster.internal.cdhnetwork/172.3.1.10 disconnected
java.io.EOFException
at org.apache.kafka.common.network.SslTransportLayer.read(SslTransportLayer.java:488)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:81)
Broker log
2017-03-01 14:12:08,330 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Set SASL server state to HANDSHAKE_REQUEST
2017-03-01 14:12:08,330 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Handle Kafka request SASL_HANDSHAKE
2017-03-01 14:12:08,330 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Using SASL mechanism 'GSSAPI' provided by client
2017-03-01 14:12:08,331 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Creating SaslServer for kafka/kf0.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK with mechanism GSSAPI
2017-03-01 14:12:08,331 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Set SASL server state to AUTHENTICATE
2017-03-01 14:12:08,334 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Set SASL server state to FAILED
2017-03-01 14:12:08,334 DEBUG org.apache.kafka.common.network.Selector: Connection with lb.cdh-poc-cluster.internal.cdhnetwork/172.3.1.10 disconnected
java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:243)
at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at kafka.network.Processor.poll(SocketServer.scala:472)
at kafka.network.Processor.run(SocketServer.scala:412)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:228)
... 6 more
Did you have an SPN registered against your CNAME (and if not, might that also be a solution)?
Related
We are trying to connect to two keyspaces of Cassandra (3.x) in the same application with the same Kerberos credentials. The application is able to connect to one keyspace but not the other. Access to both keyspaces has been verified.
Error on connection:
2022-08-22 13:15:10,972 [cluster-reconnection-0] DEBUG c.d.d.c.ControlConnection [--]- [Control connection] error on 169.24.167.109:9042 connection, trying next host
javax.security.auth.login.LoginException: No LoginModules configured for CassandraJavaClient
at javax.security.auth.login.LoginContext.init(LoginContext.java:264)
at javax.security.auth.login.LoginContext.<init>(LoginContext.java:417)
The JAAS configuration referencing the ticket cache is:
CassandraJavaClient {
com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true ticketCache="/var//krb5cc_userlogin";
};
The same ticket cache file is used by the first connection, which succeeds, while the second connection fails. I am not even sure how to debug it (I tried remote debugging, but since the initial control connection is an async call, I was unable to get to the actual error).
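One way to take the DataStax driver out of the picture is a standalone JAAS login check; a minimal sketch, assuming the config above is loaded via the java.security.auth.login.config system property (the path here is hypothetical):

import javax.security.auth.login.LoginContext;

public class JaasLoginCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; must point at the file containing the CassandraJavaClient entry.
        System.setProperty("java.security.auth.login.config", "/etc/cassandra/jaas.conf");

        // "No LoginModules configured for CassandraJavaClient" means this lookup
        // found no entry of that name, i.e. the JAAS file was never loaded in the
        // failing connection's JVM or classloader context.
        LoginContext ctx = new LoginContext("CassandraJavaClient");
        ctx.login();
        System.out.println("Login OK, subject: " + ctx.getSubject());
    }
}

If this succeeds in isolation, the question becomes why the second connection's context does not see the same JAAS configuration.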
We are using com.datastax.cassandra:cassandra-driver-core:jar:3.6.0
Any ideas or help to debug or resolve this would be highly appreciated.
I'm trying to make a connection to Elasticsearch from my Spark program.
My Elasticsearch host is HTTPS-only, and I found no connection property for that.
We are using the Spark Structured Streaming Java API, and the connection details are as follows:
SparkSession spark = SparkSession.builder()
        .config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, "username")
        .config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, "password")
        .config(ConfigurationOptions.ES_NODES, "my_host_url")
        .config(ConfigurationOptions.ES_PORT, "9200")
        .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_LOCATION, "C:\\certs\\elastic\\truststore.jks")
        .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_PASS, "my_password")
        .config(ConfigurationOptions.ES_NET_SSL_KEYSTORE_TYPE, "jks")
        .master("local[2]")
        .appName("spark_elastic")
        .getOrCreate();

spark.conf().set("spark.sql.shuffle.partitions", 2);
spark.conf().set("spark.default.parallelism", 2);
And I'm getting the following error
19/07/01 12:26:00 INFO HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server 10.xx.xxx.xxx failed to respond
19/07/01 12:26:00 INFO HttpMethodDirector: Retrying request
19/07/01 12:26:00 ERROR NetworkClient: Node [10.xx.xxx.xxx:9200] failed (The server 10.xx.xxx.xxx failed to respond); no other nodes left - aborting...
19/07/01 12:26:00 ERROR StpMain: Error
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:344)
Probably this is because it tries to initiate the connection over HTTP, but in my case I need an HTTPS connection, and I'm not sure how to configure that.
The error happened because Spark was not able to locate the truststore file. It seems we need to add a "file:\\" prefix for the path to be accepted.
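Putting both points together, a sketch of the adjusted builder might look like this; it assumes the connector's ES_NET_USE_SSL option ("es.net.ssl") and a file: URI for the truststore, and the exact URI form for Windows paths may vary:

SparkSession spark = SparkSession.builder()
        .config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, "username")
        .config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, "password")
        .config(ConfigurationOptions.ES_NODES, "my_host_url")
        .config(ConfigurationOptions.ES_PORT, "9200")
        // Speak HTTPS to the cluster instead of the default HTTP.
        .config(ConfigurationOptions.ES_NET_USE_SSL, "true")
        // Truststore given as a file: URI so Spark can locate it on the local filesystem.
        .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_LOCATION,
                "file:///C:/certs/elastic/truststore.jks")
        .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_PASS, "my_password")
        .master("local[2]")
        .appName("spark_elastic")
        .getOrCreate();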
I'm trying an exercise with Spark Streaming and Kafka. If I use the Kafka producer and consumer on the command line, I can publish and consume messages in Kafka. When I try to do it using PySpark in a Jupyter notebook, I get a ZooKeeper connection timeout error.
Client session timed out, have not heard from server in 6004ms for sessionid 0x0, closing socket connection and attempting reconnect
[2017-08-04 15:49:37,494] INFO Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@158da8e (org.apache.zookeeper.ZooKeeper)
[2017-08-04 15:49:37,524] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2017-08-04 15:49:37,527] INFO Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-08-04 15:49:37,533] WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
[2017-08-04 15:49:38,637] INFO Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-08-04 15:49:38,639] WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
ZooKeeper has issues when using localhost (127.0.0.1), as described in https://issues.apache.org/jira/browse/ZOOKEEPER-1661?focusedCommentId=13599352.
The comment there includes a little program that demonstrates the following:
1) ZooKeeper calls InetAddress.getAllByName (see StaticHostProvider, line 60) on the connect string "localhost:2181"; as a result it gets 3 different addresses for localhost, which then get shuffled (Collections.shuffle(this.serverAddresses), line 72).
2) Because of the (random) shuffling, the call to StaticHostProvider.next will sometimes return the fe80:0:0:0:0:0:0:1%1 address, which, as the small program shows, times out after 5s; this explains the randomness I am experiencing.
It really seems to me that what I am experiencing is a reverse DNS lookup issue with IPv6. Whether this reverse DNS lookup is actually useful and required by ZooKeeper, I do not know. It did not behave this way in 3.3.3.
Solution: specify your ZooKeeper address as an FQDN and make sure the reverse lookup works, or use 0.0.0.0 instead of localhost.
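For reference, a minimal probe along the lines of the program mentioned above (not the original, just a sketch showing what the resolver returns for localhost):

import java.net.InetAddress;

public class LocalhostProbe {
    public static void main(String[] args) throws Exception {
        // Print every address the resolver returns for "localhost"; on affected
        // machines this includes IPv4, ::1, and link-local IPv6 entries.
        for (InetAddress addr : InetAddress.getAllByName("localhost")) {
            System.out.println(addr + " (canonical: " + addr.getCanonicalHostName() + ")");
        }
    }
}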
We are running a MapReduce/Spark job to bulk load HBase data in one of the environments.
While running it, the connection to the HBase ZooKeeper cannot be initialized, and the following error is thrown.
16/05/10 06:36:10 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181 sessionTimeout=90000 watcher=hconnection-0x74b47a30, quorum=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181, baseZNode=/hbase
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Opening socket connection to server c321shu.int.westgroup.com/10.204.152.28:2181. Will not attempt to authenticate using SASL (unknown error)
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.204.24.16:35740, server: c321shu.int.westgroup.com/10.204.152.28:2181
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Session establishment complete on server c321shu.int.westgroup.com/10.204.152.28:2181, sessionid = 0x5534bebb441bd3f, negotiated timeout = 60000
16/05/10 06:36:11 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table ecpdevv1patents:NormNovusDemo
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue May 10 06:36:11 CDT 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller@3927df20, java.io.IOException: Call to c873gpv.int.westgroup.com/10.204.67.9:60020 failed on local exception: java.io.EOFException
We have executed the same job in Titan DEV too but are facing the same problem there. Please let us know if anyone has faced this problem before.
Details are,
• Earlier the job was failing to connect to localhost/127.0.0.1:2181, hence the property hbase.zookeeper.quorum has been set in the MapReduce code to c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com, which we got from hbase-site.xml (see the sketch after this list).
• We are using jars of cdh version 5.3.3.
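For reference, setting the quorum programmatically looks roughly like this; a sketch only, with the host names taken from the logs above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class QuorumConfig {
    public static Configuration create() {
        // Start from hbase-site.xml defaults, then pin the quorum explicitly
        // so the job does not fall back to localhost:2181.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum",
            "c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        return conf;
    }
}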
I'm using the latest jar files of the cassandra-mesos framework (by using this JSON file: https://teamcity.mesosphere.io/repository/download/Oss_Mesos_Cassandra_CassandraFramework/97399:id/marathon.json), but I am getting the following errors:
I0310 13:19:34.699774 16389 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0310 13:19:34.701026 16389 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I0310 13:19:34.701038 16389 sched.cpp:1625] Asked to abort the driver
I0310 13:19:34.701364 16389 sched.cpp:861] Aborting framework '20160309-183453-2497969674-5050-19271-0001'
I0310 13:19:34.719744 16373 sched.cpp:1591] Asked to stop the driver
I0310 13:19:34.719784 16389 sched.cpp:835] Stopping framework '20160309-183453-2497969674-5050-19271-0001'
Any idea?
The error says Completed framework attempted to re-register, which means the framework keeps its state somewhere (probably in ZooKeeper, but I cannot access your marathon.json URL to verify) and thus tries to start with the framework ID stored in that state. However, that framework ID has already been deregistered, and Mesos does not allow you to start a framework with the same ID again.
The solution to this would be either to pick a different znode for framework storage or remove the existing znode before starting the framework.
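A minimal sketch of the cleanup, assuming the znode path used later in this thread and ZooKeeper's ZKUtil helper (available in the 3.4.x client); rmr /cassandra-mesos/cassandra-mesos-fw from zkCli.sh does the same job interactively:

import org.apache.zookeeper.ZKUtil;
import org.apache.zookeeper.ZooKeeper;

public class ResetFrameworkState {
    public static void main(String[] args) throws Exception {
        // Connect to the ensemble and recursively delete the framework's state
        // znode so the framework registers with a fresh ID on the next start.
        ZooKeeper zk = new ZooKeeper("mesos-master-2:2181", 10000, event -> {});
        ZKUtil.deleteRecursive(zk, "/cassandra-mesos/cassandra-mesos-fw");
        zk.close();
    }
}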
Thanks a lot :-). It's working now. But when I tried to check ZooKeeper for cassandra-mesos, I got the following error running mesos-resolve zk://mesos-master-2:2181/cassandra-mesos/cassandra-mesos-fw:
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-slave-1
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@724: Client environment:os.arch=3.10.0-327.4.4.el7.x86_64
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Tue Jan 5 16:07:00 UTC 2016
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@733: Client environment:user.name=root
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@log_env@753: Client environment:user.dir=/ephemeral/cassandra-mesos
2016-03-13 12:46:22,428:26613(0x7fa4fa843700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=mesos-master-2:2181 sessionTimeout=10000 watcher=0x7fa5023200b0 sessionId=0 sessionPasswd=<null> context=0x7fa4d8001ec0 flags=0
2016-03-13 12:46:22,429:26613(0x7fa4f6628700):ZOO_INFO@check_events@1703: initiated connection to server [10.254.227.148:2181]
2016-03-13 12:46:22,434:26613(0x7fa4f6628700):ZOO_INFO@check_events@1750: session establishment complete on server [10.254.227.148:2181], sessionId=0x25364fb9f3a0020, negotiated timeout=10000
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0313 12:46:22.434587 26616 group.cpp:313] Group process (group(1)@10.254.235.46:56890) connected to ZooKeeper
I0313 12:46:22.434659 26616 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0313 12:46:22.434670 26616 group.cpp:385] Trying to create path '/cassandra-mesos/cassandra-mesos-fw' in ZooKeeper
Failed to detect master from 'zk://mesos-master-2:2181/cassandra-mesos/cassandra-mesos-fw' within 5secs