Failure to re-authenticate Kafka client SASL connection, should close connection - security

As per KIP-368 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-368), when 'connections.max.reauth.ms' is explicitly set to a positive number, the broker will disconnect any SASL connection that does not re-authenticate within that interval.
If the re-authentication attempt fails, the connection is closed by the broker; retries are not supported.
However, when my client fails to re-authenticate, it goes into an infinite retry loop:
INFO [kafka-producer-network-thread | producer-1] org.apache.kafka.common.network.Selector: [Producer clientId=producer-1][Producer clientId=producer-1] Failed authentication with 10.4.252.249/10.4.252.249 (Authentication failed during authentication due to invalid credentials with SASL mechanism)
ERROR [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient: [Producer clientId=producer-1] Connection to node 0 (10.4.252.249/10.4.252.249:9096) failed authentication due to: Authentication failed during authentication due to invalid credentials with SASL mechanism
INFO [kafka-producer-network-thread | producer-1] org.apache.kafka.common.network.Selector: [Producer clientId=producer-1][Producer clientId=producer-1] Failed authentication with 10.4.252.249/10.4.252.249 (Authentication failed during authentication due to invalid credentials with SASL mechanism)
ERROR [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient: [Producer clientId=producer-1] Connection to node 0 (10.4.252.249/10.4.252.249:9096) failed authentication due to: Authentication failed during authentication due to invalid credentials with SASL mechanism
I want the client to exit so I can bubble up the exception.
Any ideas how I can address this?

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed: Invalid username or password
Kafka throws a SaslAuthenticationException when authentication fails. You can surround the client code with a try/catch and close the client in the catch block.
Take the admin client as an example:
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

Properties properties = new Properties();
// ... init other properties (bootstrap servers, SASL config, ...)
// Set the request timeout; the default timeout takes too long to fail.
properties.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 3000);
// Get the client.
Admin admin = Admin.create(properties);
// Test the connection. If an error occurs, close the client; otherwise the
// admin client's metadata update thread keeps retrying the connection to Kafka.
try {
    admin.listTopics().names().get();
} catch (Exception e) {
    admin.close();
    throw new RuntimeException(e);
}
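The same pattern applies to the producer in the question; here is a minimal sketch, assuming the bootstrap, serializer, and SASL properties are already set (the topic name and record values are placeholders):

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.SaslAuthenticationException;

Properties props = new Properties();
// ... bootstrap.servers, key/value serializers, SASL properties ...
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
try {
    // A synchronous send surfaces the failure as an ExecutionException
    // wrapping the SaslAuthenticationException seen in the question.
    producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
} catch (ExecutionException e) {
    if (e.getCause() instanceof SaslAuthenticationException) {
        producer.close(); // stop the network thread's retry loop
        throw (SaslAuthenticationException) e.getCause();
    }
    throw new RuntimeException(e);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}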

Related

Why does Cassandra Kerberos Connection to second keyspace fail?

We are trying to connect to two keyspaces of Cassandra (3.x) in the same application with the same Kerberos credentials. The application is able to connect to one keyspace but not the other. Access to the keyspaces has been verified.
Error on connection:
2022-08-22 13:15:10,972 [cluster-reconnection-0] DEBUG c.d.d.c.ControlConnection [--]- [Control connection] error on 169.24.167.109:9042 connection, trying next host
javax.security.auth.login.LoginException: No LoginModules configured for CassandraJavaClient
at javax.security.auth.login.LoginContext.init(LoginContext.java:264)
at javax.security.auth.login.LoginContext.<init>(LoginContext.java:417)
The JAAS configuration referencing the ticket cache is:
CassandraJavaClient {
com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true ticketCache="/var//krb5cc_userlogin";
};
The same ticket cache file is used by the first connection, which succeeds, while the second connection fails. I am not even sure how to debug it (I tried remote debugging, but since the initial control connection is an async call, I was unable to get to the actual error).
We are using com.datastax.cassandra:cassandra-driver-core:jar:3.6.0
Any ideas/help to debug or resolve this will be highly appreciated.
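One thing worth checking (an assumption on my part, not a confirmed fix): a LoginException saying "No LoginModules configured for CassandraJavaClient" usually means the JVM opening that connection never loaded the JAAS file containing the CassandraJavaClient entry. A minimal sketch of wiring it in, where the path is a placeholder:

// Hypothetical: make sure the JVM knows where the JAAS file lives before
// the driver builds its control connection, either on the command line
//   java -Djava.security.auth.login.config=/path/to/jaas.conf ...
// or programmatically:
System.setProperty("java.security.auth.login.config", "/path/to/jaas.conf");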

zookeeper connection timing out, kafka-spark streaming

I'm trying some exercises with Spark Streaming and Kafka. If I use the Kafka producer and consumer on the command line, I can publish and consume messages in Kafka. When I try to do it using pyspark in a Jupyter notebook, I get a ZooKeeper connection timeout error.
Client session timed out, have not heard from server in 6004ms for sessionid 0x0, closing socket connection and attempting reconnect
[2017-08-04 15:49:37,494] INFO Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@158da8e (org.apache.zookeeper.ZooKeeper)
[2017-08-04 15:49:37,524] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2017-08-04 15:49:37,527] INFO Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-08-04 15:49:37,533] WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
[2017-08-04 15:49:38,637] INFO Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-08-04 15:49:38,639] WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
ZooKeeper has issues when using localhost (127.0.0.1), as described in https://issues.apache.org/jira/browse/ZOOKEEPER-1661?focusedCommentId=13599352:
This little program explains the following things:
ZooKeeper calls InetAddress.getAllByName (see StaticHostProvider:60) on the connect string "localhost:2181"; as a result it gets 3 different addresses for localhost, which then get shuffled (Collections.shuffle(this.serverAddresses), L72).
Because of the (random) shuffling, the call to StaticHostProvider.next will sometimes return the fe80:0:0:0:0:0:0:1%1 address, which, as you can see from this small program, times out after 5s; this explains the randomness I am experiencing.
It really seems to me that what I am experiencing is a reverse DNS lookup issue with IPv6. Whether this reverse DNS lookup is actually useful and required by ZooKeeper, I do not know. It did not behave this way in 3.3.3.
Solution: specify your ZooKeeper address as an FQDN and make sure the reverse lookup works, or use 0.0.0.0 instead of localhost.
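To see the resolution behaviour described above, a quick check along these lines (a sketch, not the exact program from the JIRA comment) prints every address that localhost resolves to:

import java.net.InetAddress;

// Mimics what StaticHostProvider does with the connect string: resolve
// the host and list every address. On an IPv6-enabled machine, localhost
// may map to 127.0.0.1, ::1, and a link-local fe80:: address, and
// ZooKeeper shuffles that list before picking a server.
public class ResolveLocalhost {
    public static void main(String[] args) throws Exception {
        for (InetAddress addr : InetAddress.getAllByName("localhost")) {
            System.out.println(addr);
        }
    }
}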

I can't get AMQP publish and subscribe to run with Node JS v6 and mqlight v2.0 from IBM MQ v9.0.1.0

I am trying to get the example snippet below to publish and subscribe, but I can't get it to run with Node JS 6 and mqlight v2.0:
https://www.npmjs.com/package/mqlight?cm_mc_uid=47189062138014548006442&cm_mc_sid_50200000=1490060435
// Receive:
var mqlight = require('mqlight');
var recvClient = mqlight.createClient({service: 'amqp://user:user@localhost:5672'});
recvClient.on('started', function() {
recvClient.subscribe('/TEST/#','sub1');
recvClient.on('message', function(data, delivery) {
console.log(data);
});
});
// Send:
var sendClient = mqlight.createClient({service: 'amqp://user:user@localhost:5672'});
sendClient.on('started', function() {
sendClient.send('TEST');
});
I run the sample code (mqlight 2.0) with Node JS v6:
$ node mqlight_sample.js
events.js:160
throw er; // Unhandled 'error' event
^
SecurityError: AMQXR0100E: A connection from 172.17.0.1 was not authorized.
at lookupError (/media/Data/mqlight/node_modules/mqlight/mqlight.js:1034:11)
at AMQPClient.<anonymous> (/media/anonim/Data/mqlight/node_modules/mqlight/mqlight.js:1925:13)
at emitOne (events.js:96:13)
at AMQPClient.emit (events.js:188:7)
at Connection.<anonymous> (/media/anonim/Data/mqlight/node_modules/amqp10/lib/amqp_client.js:388:10)
at emitOne (events.js:96:13)
at Connection.emit (events.js:188:7)
at Connection._processCloseFrame (/media/anonim/Data/mqlight/node_modules/amqp10/lib/connection.js:495:10)
at Connection._receiveAny (/media/anonim/Data/mqlight/node_modules/amqp10/lib/connection.js:423:12)
at Connection._receiveData (/media/anonim/Data/mqlight/node_modules/amqp10/lib/connection.js:357:8)
at NetTransport.<anonymous> (/media/anonim/Data/mqlight/node_modules/amqp10/lib/connection.js:515:38)
at emitOne (events.js:96:13)
at NetTransport.emit (events.js:188:7)
at Socket.<anonymous> (/media/anonim/Data/mqlight/node_modules/amqp10/lib/transport/net_transport.js:26:49)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
This is the error log from the MQ server:
# tail -100f /var/mqm/qmgrs/QM1/errors/amqp_0.log
3/31/17 19:14:44.115 AMQXR0041E: A connection was not authorized for channel SYSTEM.DEF.AMQP received from 172.17.0.1. MQRC 2035 MQRC_NOT_AUTHORIZED
3/31/17 19:14:45.142 AMQXR0041E: A connection was not authorized for channel SYSTEM.DEF.AMQP received from 172.17.0.1. MQRC 2035 MQRC_NOT_AUTHORIZED
Authentication for AMQP is enabled; if CONNAUTH and CHCKCLNT(REQUIRED) are changed to disabled, I can connect with Node JS 6.
START SERVICE(SYSTEM.AMQP.SERVICE)
SET CHLAUTH(SYSTEM.DEF.AMQP) TYPE(BLOCKUSER) USERLIST('nobody')
SET CHLAUTH(SYSTEM.DEF.AMQP) TYPE(ADDRESSMAP) ADDRESS(*) USERSRC(CHANNEL) CHCKCLNT(REQUIRED)
REFRESH SECURITY TYPE(CONNAUTH)
START CHANNEL(SYSTEM.DEF.AMQP)
DISPLAY CHSTATUS(SYSTEM.DEF.AMQP) CHLTYPE(AMQP)
Below is the error log from /var/mqm/qmgrs/QM1/errors/AMQERR01.LOG:
04/02/17 07:10:16 - Process(587.6) User(mqm) Program(java)
Host(770e29171038) Installation(Installation1)
VRMF(9.0.1.0) QMgr(QM1)
AMQ5534: User ID 'user' authentication failed
EXPLANATION:
The user ID and password supplied by the 'AMQP' program could not be
authenticated.
Additional information: 'N/A'.
ACTION:
Ensure that the correct user ID and password are provided by the application.
Ensure that the authentication repository is correctly configured. Look at
previous error messages for any additional information.
----- amqzfuca.c : 4486 -------------------------------------------------------
04/02/17 07:10:16 - Process(587.6) User(mqm) Program(java)
Host(770e29171038) Installation(Installation1)
VRMF(9.0.1.0) QMgr(QM1)
AMQ5542: The failed authentication check was caused by the queue manager
CONNAUTH CHCKCLNT(REQDADM) configuration.
EXPLANATION:
The user ID 'user' and its password were checked because the queue manager
connection authority (CONNAUTH) configuration refers to an authentication
information (AUTHINFO) object named 'USE.OS' with CHCKCLNT(REQDADM).
This message accompanies a previous error to clarify the reason for the user ID
and password check.
ACTION:
Refer to the previous error for more information.
Ensure that a password is specified by the client application and that the
password is correct for the user ID. The authentication configuration of the
queue manager connection determines the user ID repository. For example, the
local operating system user database or an LDAP server.
If the CHCKCLNT setting is OPTIONAL, the authentication check can be avoided by
not passing a user ID across the channel. For example, by omitting the MQCSP
structure from the client MQCONNX API call.
To avoid the authentication check, you can amend the authentication
configuration of the queue manager connection, but you should generally not
allow unauthenticated remote access.
-------------------------------------------------------------------------------
(The same AMQ5534 and AMQ5542 messages repeat at 07:10:17 for the next connection attempt.)
The SASL flow changed in the new Node JS client version, and the new flow is currently not supported by the IBM AMQP server. The AMQP server assumes that at a certain point it already has enough data to authenticate and authorize the client user; however, because of the change in the new Node JS client, the rest of the required data has not yet been sent when the server tries to authenticate the client. This is why the logs show that only the user 'mqm' has been set and no password was supplied to the queue manager, causing an authorization error (APAR IT20283).
In reviewing the error logs from the queue manager, it appears that MQ is not able to authenticate the user being passed to the AMQP channel via the mqlight_sample.js program.
Please try the following two commands and note the output:
echo 'goodpassword' | /opt/mqm/bin/security/amqoamax user ; echo $?
echo 'badpassword' | /opt/mqm/bin/security/amqoamax user ; echo $?
OP noted the output was 0 and 1 for the above commands. This means that MQ can properly authenticate the UserId "user" with a correct password, since it returns 0.
Next, please create a normal SVRCONN channel on the queue manager and try the following sample program; this would again rule out MQ and CONNAUTH as the issue.
echo 'goodpassword' | amqscnxc -x 'localhost(5672)' -c SVRCONN.CHANNEL -u user QM1; echo $?
If it succeeds, the output should look like this:
Sample AMQSCNXC start
Connecting to queue manager QM1
using the server connection channel SVRCONN.CHANNEL
on connection name localhost(5672).
Enter password: Connection established to queue manager QM1
Sample AMQSCNXC end
0
If it fails, the output should look like this:
Sample AMQSCNXC start
Connecting to queue manager QM1
using the server connection channel SVRCONN.CHANNEL
on connection name localhost(5672).
Enter password: MQCONNX ended with reason code 2035
243
If the above test is also successful, then please verify that mqlight_sample.js uses the same user and goodpassword values that worked in the two tests above.
If you find that the user ID and password are correct, then it would appear that the AMQP program is not passing the password correctly, and someone with more AMQP knowledge would need to help.
Update 2017-04-28
OP @dhaavhincy has posted a new answer: per IBM, the issue was a result of the SASL flow in Node JS v6 being changed in a way that is incompatible with IBM MQ AMQP. IBM has stated that this will be fixed via APAR IT20283, which has not yet been published to the web.
Update 2017-06-20
APAR IT20283 was published to the web around May 22nd.

Kafka through CNAMEs/load balancer when using Kerberos?

I'm mainly looking for advice here around Kafka and disaster recovery failover.
Is there any way to use Kafka through CNAMEs/load balancer when using Kerberos?
When trying it, I get the SPN error below. This makes sense and I would fully expect this behaviour.
The only way I could picture this working would be to include a CNAME resolver in the Java client code before establishing a connection:
# Using the New Consumer API
# On any new connections, do the following:
# 1) Provide the CNAME hostname in the config
# 2) Resolve the CNAME to a list of A records for the broker hosts
# 3) Pass these into the New Consumer as the bootstrap servers
This should work; however, it would involve custom code, along the lines of the sketch below.
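A minimal sketch of that idea, assuming the consumer's security properties are configured elsewhere (the CNAME and port are the ones from my logs; everything else is a placeholder):

import java.net.InetAddress;
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Resolve the CNAME to the broker addresses, reverse-resolve each address
// to its FQDN so the kafka/<broker-fqdn> SPN matches, and pass the result
// as the bootstrap servers.
String cname = "lb.cdh-poc-cluster.internal.cdhnetwork";
String bootstrap = Arrays.stream(InetAddress.getAllByName(cname))
        .map(InetAddress::getCanonicalHostName)
        .map(host -> host + ":9093")
        .collect(Collectors.joining(","));

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
// ... SASL_SSL / GSSAPI security properties as before ...
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);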
The same concept applies for publishing to a topic.
Are there any ideas that might work without having to resort to this?
I am using CDH 5 with Cloudera-managed keytab distribution.
Consumer log
17/03/01 14:12:06 DEBUG consumer.KafkaConsumer: Subscribed to topic(s): build_smoke_test
17/03/01 14:12:06 DEBUG clients.NetworkClient: Initiating connection to node -1 at lb.cdh-poc-cluster.internal.cdhnetwork:9093.
17/03/01 14:12:06 DEBUG authenticator.SaslClientAuthenticator: Set SASL client state to SEND_HANDSHAKE_REQUEST
17/03/01 14:12:06 DEBUG authenticator.SaslClientAuthenticator: Creating SaslClient: client=alex@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK;service=kafka;serviceHostname=lb.cdh-poc-cluster.internal.cdhnetwork;mechs=[GSSAPI]
17/03/01 14:12:06 DEBUG network.Selector: Connection with lb.cdh-poc-cluster.internal.cdhnetwork/172.3.1.10 disconnected
java.io.EOFException
at org.apache.kafka.common.network.SslTransportLayer.read(SslTransportLayer.java:488)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:81)
Broker log
2017-03-01 14:12:08,330 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Set SASL server state to HANDSHAKE_REQUEST
2017-03-01 14:12:08,330 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Handle Kafka request SASL_HANDSHAKE
2017-03-01 14:12:08,330 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Using SASL mechanism 'GSSAPI' provided by client
2017-03-01 14:12:08,331 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Creating SaslServer for kafka/kf0.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK with mechanism GSSAPI
2017-03-01 14:12:08,331 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Set SASL server state to AUTHENTICATE
2017-03-01 14:12:08,334 DEBUG org.apache.kafka.common.security.authenticator.SaslServerAuthenticator: Set SASL server state to FAILED
2017-03-01 14:12:08,334 DEBUG org.apache.kafka.common.network.Selector: Connection with lb.cdh-poc-cluster.internal.cdhnetwork/172.3.1.10 disconnected
java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:243)
at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at kafka.network.Processor.poll(SocketServer.scala:472)
at kafka.network.Processor.run(SocketServer.scala:412)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:228)
... 6 more
Did you have an SPN registered against your CNAME (and if not, might that also be a solution)?

We are running a map reduce/spark job to bulk load HBase data in one of the environments

We are running a map reduce/spark job to bulk load hbase data in one of the environments.
While running it, the connection to the HBase ZooKeeper cannot be initialized, and the following error is thrown.
16/05/10 06:36:10 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181 sessionTimeout=90000 watcher=hconnection-0x74b47a30, quorum=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181, baseZNode=/hbase
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Opening socket connection to server c321shu.int.westgroup.com/10.204.152.28:2181. Will not attempt to authenticate using SASL (unknown error)
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.204.24.16:35740, server: c321shu.int.westgroup.com/10.204.152.28:2181
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Session establishment complete on server c321shu.int.westgroup.com/10.204.152.28:2181, sessionid = 0x5534bebb441bd3f, negotiated timeout = 60000
16/05/10 06:36:11 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table ecpdevv1patents:NormNovusDemo
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue May 10 06:36:11 CDT 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller@3927df20, java.io.IOException: Call to c873gpv.int.westgroup.com/10.204.67.9:60020 failed on local exception: java.io.EOFException
We have executed the same job in Titan DEV too, but we face the same problem. Please let us know if anyone has faced the same problem before.
Details are,
• Earlier, the job was failing to connect to localhost/127.0.0.1:2181. Hence, the property hbase.zookeeper.quorum has been set in the map reduce code to c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com, which we got from hbase-site.xml (see the sketch after these bullets).
• We are using jars of cdh version 5.3.3.
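For reference, a minimal sketch of the quorum setting described in the first bullet (hostnames are the ones from the question; the client port is an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Point the HBase client at the cluster's ZooKeeper quorum instead of the
// default localhost:2181 before creating the connection/job.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum",
        "c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");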
