SparkSQL Hive Error : "HikariCP" not found in the CLASSPATH - apache-spark

I configured Hive with mySQL as my metastore. I can enter hive shell and create tables successfully.
Spark version: 2.4.0
Hive version: 3.1.1
When I try to run a SparkSQL program using spark submit, I'm getting the below error.
2019-03-02 15:43:41 WARN HiveMetaStore:622 - Retrying creating default database after error: Error creating transactional connection factory
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
......
......
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
......
......
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "HikariCP" plugin to create a ConnectionPool gave an error : The connection pool plugin of type "HikariCP" was not found in the CLASSPATH!
Please let me know if anyone can help me in this regard.

I don't know if you have already solved this problem. There is my advice.
the default database connection is HikariCP in the hive-site.xml. You can search for this in the hive-site.xml: datanucleus.connectionPoolingType. The value is HikariCP. So you need to change it to dbcp since you use Mysql as your metastore.
And at last, don't forget about adding the mysql-connector-java-5.x.x.jar to the path like
/home/hadoop/spark-2.3.0-bin-hadoop2.7/jars

Related

pyspark; Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'

I am trying to configure spark with hive meta store, to access hive tables from spark.
When i moved hive.site.xml to spark_home/conf i am getting errors. in spark when i tried to read data from hive tables. and the same error when i tried to execute simple select query in hive shell.
error i am getting is below.
for hive
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
for spark
pyspark.sql.utils.AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'

Spark Cluster mode issue to read Hive-Hbase table on Kerberized Environment

Error description
We are not able execute our Spark job in yarn-cluster or yarn-client mode, though it is working fine in the local mode.
This issue occurs when we try to read the Hive-HBase tables in a Kerberized cluster.
What we have tried so far
Passing all the HBASE jar in the –jar parameter in spark submi
--jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar
Passing Hbase site and hive site in file parameter in Spark submit
--files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml,/home/pasusr/pasusr.keytab
Doing Kerberos authentication inside the application. In the code we are explicitly passing the key tab
UserGroupInformation.setConfiguration(configuration)
val ugi: UserGroupInformation =
UserGroupInformation.loginUserFromKeytabAndReturnUGI(principle, keyTab)
UserGroupInformation.setLoginUser(ugi)
ConnectionFactory.createConnection(configuration)
return ugi.doAs(new PrivilegedExceptionActionConnection {
#throws[IOException]
def run: Connection = {
ConnectionFactory.createConnection(configuration) }
})
Passing key tab information in the Spark submit
Passing the HBASE jar in the spark.driver.extraClassPath and spark.executor.extraClassPath
Error Log
18/03/20 15:33:24 WARN TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
18/03/20 15:47:38 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 406, hadoopnode.server.name): java.lang.IllegalStateException: Error while configuring input job properties
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:444)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=50, exceptions:
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:679)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
I was able to resolve this by adding following configuration in the spark-env.sh
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar
And removing the spark.driver.extraClassPath and spark.executor.extraClassPath in which I was passing the above Jar from the Spark submit command.

Caused by:Ambari Upgradation java.sql.SQLSyntaxErrorException: Unknown table 'hostcomponentstate' in information_schema

I'm trying to upgrade Ambari 2.1 to 2.5.
Steps followed:
Stopped ambari-server
Took backup of ambari database from MySql database.
Ran ambari-server upgrade.
I'm getting this SQL Error,
Exception in thread "main" org.apache.ambari.server.AmbariException: Unknown table 'hostcomponentstate' in information_schema
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.executeUpgrade(SchemaUpgradeHelper.java:212)
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.main(SchemaUpgradeHelper.java:427)
Caused by: java.sql.SQLSyntaxErrorException: Unknown table 'hostcomponentstate' in information_schema
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:536)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:115)
at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1983)
at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1936)
at com.mysql.cj.jdbc.StatementImpl.executeQuery(StatementImpl.java:1422)
at com.mysql.cj.jdbc.DatabaseMetaData$7.forEach(DatabaseMetaData.java:3182)
at com.mysql.cj.jdbc.DatabaseMetaData$7.forEach(DatabaseMetaData.java:3170)
at com.mysql.cj.jdbc.IterateBlock.doForAll(IterateBlock.java:50)
at com.mysql.cj.jdbc.DatabaseMetaData.getPrimaryKeys(DatabaseMetaData.java:3223)
at org.apache.ambari.server.orm.DBAccessorImpl.tableHasPrimaryKey(DBAccessorImpl.java:1082)
at org.apache.ambari.server.upgrade.UpgradeCatalog211.executeHostComponentStateDDLUpdates(UpgradeCatalog211.java:204)
at org.apache.ambari.server.upgrade.UpgradeCatalog211.executeDDLUpdates(UpgradeCatalog211.java:108)
at org.apache.ambari.server.upgrade.AbstractUpgradeCatalog.upgradeSchema(AbstractUpgradeCatalog.java:925)
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.executeUpgrade(SchemaUpgradeHelper.java:209)
... 1 more
Am I missing any step or any configuration change is required?
You may try upgrading first to Ambari 2.2.x
Not sure that such a long upgrade path works flawlessly

insert data into Microsoft SQL server using Spark

I am trying to insert data into sql server using spark using the below Jdbc methods.
Option 1:
prop.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
dataf.write.mode(org.apache.spark.sql.SaveMode.Append).jdbc(url,table_name, prop)
Table is already created. Appending new data.Job Error-ed with the below exception
Exception in thread "main"
com.microsoft.sqlserver.jdbc.SQLServerException: CREATE TABLE
permission denied in database
Question is : Why create table permission is required for appending the data?
Option2:
prop.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(dataf, url, table_name, prop)
Above command working from spark-shell. when the same is used in scala code and packaged with dependencies giving below exception
Exception in thread "main" java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
I tried setting driver class-path and executor class-path and also --jars still no luck. Included sqljdbc4.jar in driver-classpath and --jars.
Copied sqljdbc4.jar to all worker nodes as well still no luck.
Any Ideas on this?
After Lot of searching and Testing, I found the answer. It might be useful for someone.
Option 1: This is because of bug in spark 1.5.X. the same was resolved
in 1.6.x and later. Because of the bug, It always try to create a new
table.
Option2: This causes because , driver name on classpath given
priority than properties we are passing as argument. Workaround for
this is to create connection and then invoke savetable.
workaround if you are using spark 1.5.x or lower.
JdbcUtils.createConnection(url, prop)
JdbcUtils.saveTable()

Hector Cassandra Connectivity Error

I am trying to connect to a local cassandra instance through a java client powered by Hector. I attempt to read rows after trying to connect. The code snippet is as follows
Cluster myCluster = HFactory.getOrCreateCluster("test" , "localhost:9160");
KeyspaceDefinition keySpaceDef = myCluster.describeKeyspace("testkeyspace");
.....
However the connectivity fails with this error
Exception in thread "main" java.lang.NoSuchFieldError: DEFAULT_MEMTABLE_OPERATIONS_IN_MILLIONS
at me.prettyprint.cassandra.service.ThriftCfDef.(ThriftCfDef.java:65)
at me.prettyprint.cassandra.service.ThriftCfDef.fromThriftList(ThriftCfDef.java:144)
at me.prettyprint.cassandra.service.ThriftKsDef.(ThriftKsDef.java:34)
at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
I have cassandra, thrift as dependencies in my pom.xml. Any clues as to what could be wrong?

Resources