I have a Spark job that I launch from my shell scripts using the spark-submit command-line tool.
I need to pass the DB2 connection credentials (username and password) to the jar.
The requirement is that the username and password must not be readable.
Can anyone help with this?
I would try using the Hadoop credential API.
Hadoop has provided a credential keystore since version 2.6. You can pass the keystore path and alias to your Spark program and retrieve the password from the keystore at run time.
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/CommandsManual.html#credential
Here is a sample Java program:
https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/alias/JavaKeyStoreProvider.java
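For illustration, here is a minimal Scala sketch (not the Java program linked above) of how a Spark job could read the DB2 password from such a keystore at run time. The jceks path and the alias db2.password are hypothetical; create the credential first with the hadoop credential CLI and substitute your own names.

import org.apache.hadoop.conf.Configuration

object Db2Credentials {
  def main(args: Array[String]): Unit = {
    // One-time setup from a shell (hypothetical alias and keystore path):
    //   hadoop credential create db2.password -provider jceks://hdfs/user/me/db2.jceks
    val conf = new Configuration()
    // Point Hadoop at the keystore; this can also be supplied via
    // --conf spark.hadoop.hadoop.security.credential.provider.path=...
    conf.set("hadoop.security.credential.provider.path",
             "jceks://hdfs/user/me/db2.jceks")

    // getPassword returns the secret as a char array, or null if the alias is missing
    val password = Option(conf.getPassword("db2.password"))
      .map(new String(_))
      .getOrElse(sys.error("alias db2.password not found in keystore"))

    // ... use `password` when opening the DB2 JDBC connection ...
  }
}

This keeps the secret out of the spark-submit command line; only the keystore path and alias are visible.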
Related
I'm running a PySpark (Spark 3.1.1) application in cluster mode on a YARN cluster. It is supposed to process input data and send the appropriate Kafka messages to a given topic.
The data manipulation part is already covered; however, I'm struggling to use the kafka-python library to send the notifications. The problem is that it can't find a valid Kerberos ticket to authenticate to the Kafka cluster.
When executing spark3-submit I add the --principal and --keytab options (the equivalents of spark.kerberos.principal and spark.kerberos.keytab). Moreover, I am able to access HDFS and HBase resources.
Does Spark store the TGT in a ticket cache that I can reference by setting the KRB5CCNAME variable? I am not able to locate a valid Kerberos ticket while the app is running.
Is it common to issue kinit from a PySpark application to create a ticket for access to resources outside HDFS, etc.? I tried using the krbticket module to issue the kinit command from the app (using the keytab that I pass as a parameter to spark3-submit), but then the process hangs.
Is there a way to execute Spark code (packaged in a jar) from NiFi, using Livy?
I can see that in NiFi, using ExecuteSparkInteractive, we can submit custom code to be run on the Spark cluster via Livy. But what I want is to pass the name of the jar file and the main class in NiFi, which then connects to Spark via Livy.
I found the article below on this, but it seems an option like Session JARs is not available in a plain NiFi installation.
https://community.cloudera.com/t5/Community-Articles/Apache-Livy-Apache-NiFi-Apache-Spark-Executing-Scala-Classes/ta-p/247985
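One approach that does not depend on ExecuteSparkInteractive is to call Livy's batch REST API (POST /batches), passing the jar and main class in the request body; in NiFi that request could be sent with the InvokeHTTP processor. Below is a rough Scala sketch of the same call, where the Livy host, jar path, and class name are placeholders.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object LivyBatchSubmit {
  def main(args: Array[String]): Unit = {
    // Placeholder jar location and main class -- replace with your own
    val payload =
      """{
        |  "file": "hdfs:///jars/my-spark-job.jar",
        |  "className": "com.example.MySparkJob",
        |  "args": ["input", "output"]
        |}""".stripMargin

    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://livy-host:8998/batches"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()

    // Livy replies with the batch id and state, which can be polled at /batches/{id}
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}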
In order to create a DBLink from Oracle to Cassandra, I am trying to set up a username/password connection in Cassandra.
By default, the Cassandra installation does not ask for a username.
Searching here, I found a topic that describes the steps for that.
Unfortunately, when I modify the authenticator and authorizer parameters, the Cassandra CQL Shell opens and immediately closes.
Cassandra Installer version: datastax-ddc-64bit-3.9.0.msi
OS: Windows 7
Can someone tell me how to solve this, please?
Thank you,
To solve the CQL Shell error, edit the cqlshrc file.
The default location on Windows is C:\Users\USER\.cassandra, where USER is the Windows username.
In the cqlshrc file, edit the following:
[authentication]
;; If Cassandra has auth enabled, fill out these options
username = cassandra
password = cassandra
Note that cassandra is the default username and password, and it is also a superuser.
I have a Spark application that I am submitting to the Bluemix Spark cluster. It reads from a dashDB database and writes the results to Cloudant. The code accesses dashDB using both Spark and JDBC.
The user ID and password for the dashDB database are passed as arguments to the program. I can pass these parameters via spark-submit, but I don't think that would be secure. In the code I need to know the dashDB credentials because I am using JDBC to connect to various tables.
I am trying to find the best-practice way to pass credentials via spark-submit in a secure manner.
Thanks in advance - John
I think the JDBC driver will always need a username and password to connect to the database, so doing without them is out of the question, given that you are in a multi-tenant environment on Bluemix.
As for having spark-submit.sh read the arguments securely, that option is not available yet.
Thanks,
Charles.
Based on the answer here, my preference would be to pass a properties file that holds the credentials. Other tenants will not be able to read the properties file, but you will be able to read it from your Spark application, e.g. as a DataFrame from which you can access the parameters.
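A minimal Scala sketch of that approach, assuming the credentials live in a properties file shipped with --files (the file name db.properties and the key names are made up, and the file is read here with java.util.Properties rather than as a DataFrame):

import java.io.FileInputStream
import java.util.Properties
import org.apache.spark.SparkFiles

// db.properties contains lines such as:
//   dashdb.user=...
//   dashdb.password=...
val props = new Properties()
// SparkFiles.get resolves the local path of a file distributed with --files
props.load(new FileInputStream(SparkFiles.get("db.properties")))

val user = props.getProperty("dashdb.user")
val password = props.getProperty("dashdb.password")
// ... pass user/password to the JDBC connection for dashDB ...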
I am trying to write a DataFrame to SQL Server using Spark, via the write method of DataFrameWriter.
Using DriverManager.getConnection I am able to get a connection to SQL Server and write, but when using the jdbc method and passing the URI I get "No suitable driver found".
I have passed the jTDS jar via --jars in spark-shell.
Spark version: 1.4
The issue is that Spark is not finding the driver jar file. Download the jar, place it on all worker nodes of the Spark cluster at the same path, and add that path to SPARK_CLASSPATH in the spark-env.sh file,
as follows:
SPARK_CLASSPATH=/home/mysql-connector-java-5.1.6.jar
Hope it helps.
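In addition to (or instead of) the classpath change, explicitly naming the driver class in the JDBC connection properties often clears the "No suitable driver found" error. A minimal Scala sketch, assuming the jTDS driver and a made-up server, database, and table:

import java.util.Properties

val props = new Properties()
props.put("user", "myuser")
props.put("password", "mypassword")
// Tell Spark which JDBC driver class to load (jTDS in this case)
props.put("driver", "net.sourceforge.jtds.jdbc.Driver")

val url = "jdbc:jtds:sqlserver://dbhost:1433/mydb"
// df is the DataFrame being written; "mytable" is a placeholder table name
df.write.jdbc(url, "mytable", props)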