Configuration Error in Cassandra Kerberos Authentication with Java - cassandra

I am trying to connect to a Cassandra cluster using the Kerberos ticket cache, with the following configuration:
java -Dcassaandra.ip.address=<IPaddress> \
-Djava.security.auth.login.config=kerb-client.conf \
-Dsun.security.krb5.debug=true \
-Djavax.security.auth.useSubjectCredsOnly=false \
-Djava.security.krb5.conf=krb5.conf -jar test-kerberos.jar
kerb-client.conf:
CassandraJavaClient {
com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true ticketCache=cacheFile principal="abc@abc.net";
};
I am getting the following error:
Unexpected error during transport initialization (java.lang.SecurityException: java.io.IOException: Configuration Error:
Line 3: expected [option value])
at com.datastax.driver.core.Connection$2.apply(Connection.java:205)
at com.datastax.driver.core.Connection$2.apply(Connection.java:191)
It seems that some configuration is missing, but I am unable to identify the root cause.
Please advise on corrective steps.

The Java driver needs to have a custom authenticator configured in order to authenticate using Kerberos.
Instaclustr has an open-source Kerberos authenticator for Cassandra that works with the Java driver. Details and code are available here:
Cassandra Kerberos authenticator
Java driver Kerberos plugin
Cheers!

Thanks for your reply. I am already using com.datastax.cassandra:cassandra-driver-core:3.5.1. Below is my sample code snippet.
Cluster cluster = Cluster.builder()
.addContactPoints(ipAddress)
.withAuthProvider(KerberosAuthProvider.builder().withSaslProperties(saslProperties).build())
.withSSL()
.build();
cluster.connect();
The issue was fixed after changing the ticketCache file name; it seems I was referring to the wrong cache file. Now I am getting a different error, a "No valid credentials provided" exception.
But at least I have moved past the basic configuration error.
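The parser message "expected [option value]" suggests that the JAAS file reader could not read a value after an option name. One way to rule out quoting problems in kerb-client.conf is to generate the entry and quote every option value. The following is only a minimal sketch: the helper name render_jaas_entry and the cache path /tmp/krb5cc_1000 are hypothetical, and it does not check that the cache file actually exists or holds a valid ticket.

```python
def render_jaas_entry(name, module, flag, options):
    """Render one JAAS login entry with every option value double-quoted."""
    rendered = []
    for key, value in options.items():
        text = str(value)
        # Keep the sketch simple: refuse values that would need escaping.
        if '"' in text:
            raise ValueError(f"option {key} contains a double quote")
        rendered.append(f'{key}="{text}"')
    body = " ".join([module, flag] + rendered)
    return f"{name} {{\n  {body};\n}};\n"


entry = render_jaas_entry(
    "CassandraJavaClient",
    "com.sun.security.auth.module.Krb5LoginModule",
    "required",
    {
        "useTicketCache": "true",
        "ticketCache": "/tmp/krb5cc_1000",  # hypothetical cache path
        "principal": "abc@abc.net",
    },
)
print(entry)
```

Writing the result to kerb-client.conf and pointing -Djava.security.auth.login.config at it keeps the file syntactically regular, so any remaining failure is more likely about the ticket cache contents than about the file format.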

Related

Providing AWS_PROFILE when reading S3 files with Spark

I want my Spark app (Scala) to be able to read S3 files:
spark.read.parquet("s3://my-bucket-name/my-object-key")
On my dev machine, I can access S3 files using awscli with a pre-configured profile in ~/.aws/config or ~/.aws/credentials, like:
aws --profile my-profile s3 ls s3://my-bucket-name/my-object-key
But when trying to read those files from Spark, with the aws_profile provided as an env variable (AWS_PROFILE), I got the following error:
doesBucketExist on my-bucket-name: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
I also tried to provide the profile as a JVM option (-Daws.profile=my-profile), with no luck.
Thanks for reading.
The solution is to set the Spark property fs.s3a.aws.credentials.provider to com.amazonaws.auth.profile.ProfileCredentialsProvider.
If you can change the code that builds the Spark session, something like this works:
SparkSession
.builder()
.config("fs.s3a.aws.credentials.provider","com.amazonaws.auth.profile.ProfileCredentialsProvider")
.getOrCreate()
The other way is to provide the JVM option -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider. Note the spark.hadoop prefix.
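The spark.hadoop prefix matters because Spark copies any spark.hadoop.* property into the Hadoop configuration with the prefix stripped. As a rough sketch of how the same setting could be passed on a spark-submit command line, the helper below (to_spark_conf_args is a hypothetical name, not part of Spark) turns plain Hadoop properties into --conf arguments:

```python
def to_spark_conf_args(hadoop_props):
    """Turn Hadoop properties into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(hadoop_props.items()):
        # Spark forwards spark.hadoop.<key> to Hadoop as <key>.
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


args = to_spark_conf_args({
    "fs.s3a.aws.credentials.provider":
        "com.amazonaws.auth.profile.ProfileCredentialsProvider",
})
print(" ".join(args))
```

The printed arguments can be appended to a spark-submit or spark-shell invocation; AWS_PROFILE still has to be set in the environment of the process that reads from S3.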
If problems persist after setting fs.s3a.aws.credentials.provider to com.amazonaws.auth.profile.ProfileCredentialsProvider and correctly setting AWS_PROFILE, it might be because you are using Hadoop 2, for which the above configuration is not supported.
The only workaround I found was to upgrade to Hadoop 3.
Check this post and Hadoop docs for more information.

Kerberos: Spark UGI credentials are not getting passed down to Hive

I'm using Spark 2.4 on a Kerberos-enabled cluster, where I'm trying to run a query via the spark-sql shell.
The simplified setup looks like this: spark-sql shell running on one host in a YARN cluster -> external hive-metastore running on one host -> S3 to store table data.
When I launch the spark-sql shell with DEBUG logging enabled, this is what I see in the logs:
> bin/spark-sql --proxy-user proxy_user
...
DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user against hive/_HOST@REALM.COM at thrift://hive-metastore:9083
DEBUG UserGroupInformation: PrivilegedAction as:spark/spark_host@REALM.COM (auth:KERBEROS) from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130)
This means that Spark made a call to fetch the delegation token from the Hive metastore and then added it to the list of credentials for the UGI. This is the piece of code in Spark which does that. I also verified in the metastore logs that the get_delegation_token() call was being made.
Now when I run a simple query like create table test_table (id int) location "s3://some/prefix"; I get hit with an AWS credentials error. I modified the hive metastore code and added this right before the file system in Hadoop is initialized (org/apache/hadoop/hive/metastore/Warehouse.java):
public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
  ...
  try {
    // Get the user the metastore is currently acting as.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    LOG.info("UGI information: " + ugi);
    Collection<Token<? extends TokenIdentifier>> tokens = ugi.getCredentials().getAllTokens();
    // Print every token attached to that user's credentials.
    for (Token<? extends TokenIdentifier> token : tokens) {
      LOG.info("Token: " + token);
    }
  } catch (IOException e) {
    e.printStackTrace();
  }
  ...
}
In the metastore logs, this does print the correct UGI information:
UGI information: proxy_user (auth:PROXY) via hive/hive-metastore@REALM.COM (auth:KERBEROS)
but there are no tokens present in the UGI. Looks like Spark code adds it with the alias hive.server2.delegation.token but I don't see it in the UGI. This makes me suspect that somehow the UGI scope is isolated and not being shared between spark-sql and hive metastore. How do I go about solving this?
Spark does not pass your Kerberos identity down. Instead, it asks each filesystem to issue a "delegation token", which lets the caller interact with that service and that service alone. This is more restricted and therefore more secure.
The problem here is that Spark collects delegation tokens from every filesystem that can issue them, and since your S3 connector isn't issuing any, nothing is passed down.
Now, Apache Hadoop 3.3.0's S3A connector can be set to issue your AWS credentials inside a delegation token, or, for bonus security, ask AWS for session credentials and send only those over. But (a) you need a spark build with those dependencies, and (b) Hive needs to be using those credentials to talk to S3.
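As a rough illustration of the two options mentioned above, the sketch below maps "full credentials" and "session credentials" to S3A delegation token bindings. The property name and class names are quoted from memory of the Hadoop 3.3 S3A delegation token documentation, so treat them as assumptions and verify them against the Hadoop version you deploy; the helper s3a_delegation_conf is hypothetical.

```python
# Binding class names as recalled from the Hadoop 3.3 S3A delegation token
# docs (an assumption to verify, not taken from the question above).
BINDINGS = {
    # Ship the caller's full AWS credentials inside the token.
    "full": "org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding",
    # Ask AWS for session credentials and ship only those.
    "session": "org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding",
}


def s3a_delegation_conf(mode):
    """Return the Spark property selecting an S3A delegation token binding."""
    if mode not in BINDINGS:
        raise ValueError(f"unknown mode: {mode}")
    return {"spark.hadoop.fs.s3a.delegation.token.binding": BINDINGS[mode]}


print(s3a_delegation_conf("session"))
```

Even with such a setting on the Spark side, the metastore still has to run a Hadoop build that understands the token and uses it when talking to S3, which is point (b) above.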

Using Apache Spark with a local S3-compatible Object store

I am trying to run a simple Apache Spark (Cloudera) read operation against a local object store that is fully compatible with the S3 SDK/API. But I cannot figure out how to get Spark to understand that I am trying to access a local S3 bucket and not remote AWS S3.
Here's what I've tried...
pyspark2 --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/myusername/awskeyfile.jceks --conf fs.s3a.endpoint=https://myenvironment.domain.com
df = spark.read.parquet("s3a://mybucket/path1/")
Error message...
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to mybucket.s3.amazonaws.com:443 [mybucket.s3.amazonaws.com/12.345.678.90] failed: Connection refused (Connection refused)
I can list the local bucket contents without issue on the command line, so I know the access/secret key is correct, but I need to make Spark understand not to reach out to AWS to resolve the bucket URL.
Thanks.
Update / Resolution:
The fix to the issue was a missing prerequisite jar at maven coordinates: org.apache.hadoop:hadoop-aws:2.6.0
So the final pyspark call looked like:
pyspark2 --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/myusername/awskeyfile.jceks --conf fs.s3a.endpoint=https://myenvironment.domain.com --jars hadoop-aws-2.6.0.jar
df = spark.read.parquet("s3a://mybucket/path1/")
This is covered in HDP docs, Working with third party object stores.
Settings are the same for CDH.
It comes down to:
endpoint: fs.s3a.endpoint = hostname
disable the DNS-to-bucket mapping: fs.s3a.path.style.access = true
play with the signing options.
There are a few other switches you can turn for better compatibility; they're in those docs.
You might find the Cloudstore storediag command useful.
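To see why the endpoint and path-style settings matter, it can help to compare the two addressing schemes. The function below is only an illustration of how the request URL is formed in each case; request_url is a hypothetical helper, not the SDK's actual code, and the object key is made up.

```python
from urllib.parse import urlsplit


def request_url(endpoint, bucket, key, path_style):
    """Show where a request for bucket/key would be sent."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path-style: the bucket is the first path segment on the endpoint host.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a DNS label in front of the host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


endpoint = "https://myenvironment.domain.com"
print(request_url(endpoint, "mybucket", "path1/part-0.parquet", True))
print(request_url(endpoint, "mybucket", "path1/part-0.parquet", False))
```

The first form needs only the endpoint host to resolve, while the second needs a DNS entry per bucket. The host mybucket.s3.amazonaws.com in the error above is the virtual-hosted form against the default AWS endpoint, which suggests the custom endpoint was not being applied at that point.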

Access Openstack Swift from Spark - SwiftAuthenticationFailedException

I am attempting to access Openstack Swift from Spark 2.4 but I get an error.
org.apache.hadoop.fs.swift.exceptions.SwiftAuthenticationFailedException: Authenticate as tenant '78axxxxxxxxxxxxxxxxxxxxxxxxxxxx' PasswordCredentials{username='xxxxxxxxxxxx'}
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.auth.url", "https://auth.cloud.ovh.net/v3/")
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.tenant", "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.username", "xxxxxxxxxxxx")
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.password", "xxxxxxxxxxxxxxxxxxxx")
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.http.port", "8080")
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.region", "BHS3")
sc.hadoopConfiguration.set(s"fs.swift.service.ovh.public", "false")
I believe these credentials are correct, as they came directly from the OpenStack rc file and work fine with python-swiftclient. I have also tried the v2.0 endpoint without success.
Unfortunately, I always get this very generic error message, and it does not tell me which part is failing. Is there any way to debug this better?
I used the example below which I received from the OVH spark submit team.
An important note is to use the tenant name instead of the tenant id from the openstack.rc file.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.swift.impl","org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem")
hadoopConf.set("fs.swift.service.auth.endpoint.prefix","/AUTH_")
hadoopConf.set("fs.swift.service.abc.http.port","443")
hadoopConf.set("fs.swift.service.abc.auth.url","https://auth.cloud.ovh.net/v2.0/tokens")
hadoopConf.set("fs.swift.service.abc.tenant","<TENANT NAME> or <PROJECT NAME>")
hadoopConf.set("fs.swift.service.abc.region","<REGION NAME>")
hadoopConf.set("fs.swift.service.abc.useApikey","false")
hadoopConf.set("fs.swift.service.abc.username","<USER NAME>")
hadoopConf.set("fs.swift.service.abc.password","<PASSWORD>")
https://github.com/mojtabaimani/spark-wordcount-swift-scala/blob/master/src/main/scala/com/ovh/example/SparkScalaApp.scala
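The settings above all follow the pattern fs.swift.service.<service>.<option>, where the service name (abc here) is also the suffix of the host in swift:// URLs. A small sketch that builds the per-service properties and the matching URL is shown below; the helper names and the container name are hypothetical, and the tenant argument is the tenant (project) name rather than the tenant ID, as noted above.

```python
def swift_service_conf(service, auth_url, tenant_name, username, password, region):
    """Build the per-service fs.swift.* properties for one Swift provider."""
    prefix = f"fs.swift.service.{service}."
    return {
        prefix + "auth.url": auth_url,
        prefix + "tenant": tenant_name,  # tenant/project NAME, not the ID
        prefix + "username": username,
        prefix + "password": password,
        prefix + "region": region,
        prefix + "http.port": "443",
        prefix + "useApikey": "false",
    }


def swift_url(container, service, path):
    """Build a swift:// URL of the form swift://<container>.<service>/<path>."""
    return f"swift://{container}.{service}/{path.lstrip('/')}"


conf = swift_service_conf(
    "abc", "https://auth.cloud.ovh.net/v2.0/tokens",
    "<TENANT NAME>", "<USER NAME>", "<PASSWORD>", "<REGION NAME>",
)
for key, value in sorted(conf.items()):
    print(key, "=", value)
print(swift_url("mycontainer", "abc", "/data/input.txt"))
```

Each key/value pair can then be applied with hadoopConf.set(key, value), and the printed URL is the kind of path that would be passed to a Spark read.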

mesos-slave can not connect No credentials provided error

I am new to Mesos.
After starting mesos-master, I tried to connect a mesos-slave with the following command:
/usr/sbin/mesos-slave --ip=192.192.7.180 --master=192.192.7.19:5050 --work_dir=/tmp/mesos/work/int --no-systemd_enable_support
It does not connect to the master and prints the following message:
No credentials provided. Attempting to register without authentication
Thank you in advance.
"No credentials provided" means you are trying to start a slave on a different network that is not configured in the Mesos config files.
Once you register it, you can add it.
For example, try the following.
For the master:
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
For the slave:
./bin/mesos-slave.sh --master=127.0.0.1:5050 --work_dir=/tmp/mesos --no-systemd_enable_support
Then open a browser at:
localhost:5050
If all the steps were followed properly, you should see the Mesos dashboard.
