When I load data into Cassandra using Databricks, I get the following error:
Caused by: java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
It is a simple saveToCassandra to a table.
I looked up this Twitter jsr166e jar in Maven; it is very old, added in 2013.
I don't know why this jar is not available with the Spark Cassandra connector.
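For reference, the write is nothing more than a saveToCassandra call, roughly like this (the keyspace, table, and column names are placeholders):
import com.datastax.spark.connector._

// sc is the SparkContext; the target table has matching id/value columns (placeholders)
val rdd = sc.parallelize(Seq((1, "a"), (2, "b")))
rdd.saveToCassandra("my_keyspace", "my_table", SomeColumns("id", "value"))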
That error indicates you are missing dependencies and/or the Spark Cassandra connector is not on the runtime classpath of the Spark application. I'm not sure how you installed the connector, but you should have used the packages method to ensure that dependencies are met and the connector is correctly configured.
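For example, with spark-submit the --packages flag pulls in the connector and its transitive dependencies (which, on the 1.x/2.x connector line, include the jsr166e jar that provides LongAdder). The coordinate below is only an example; pick the version that matches your Spark and Scala versions:
spark-submit \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2 \
  --conf spark.cassandra.connection.host=<cassandra-host> \
  your-app.jar
On Databricks, the equivalent is attaching the same Maven coordinate to the cluster as a library.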
Read more HERE
Hope that helps,
Pat
In our project, we are running Spark 2.3 with 7 nodes.
Recently, as part of a security scan, a Log4j vulnerability was reported by the security team.
We can see the Log4j 1.x jar in the Spark jars folder (/opt/spark/jars/log4j-1.2.17.jar).
We tried to replace the jar with the Log4j 2.17.1 version and ran Spark again, but Spark fails with a NoClassDefFoundError for the class org/apache/log4j/or/RendererMap.
Please help me to resolve this issue.
Try using log4j-1.2-api, version 2.17.1:
https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-1.2-api
You need to copy three jars (core, api, and the 1.2 bridge) from https://archive.apache.org/dist/logging/log4j/ and put them in the spark/jars folder.
Refer to this page for details:
https://logging.apache.org/log4j/2.x/manual/migration.html
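For reference, with version 2.17.1 the three jars mentioned above would be (the exact file names depend on the version you download):
log4j-api-2.17.1.jar
log4j-core-2.17.1.jar
log4j-1.2-api-2.17.1.jar
Place them in /opt/spark/jars in place of the old log4j-1.2.17.jar.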
I want to use the EMRFS S3-optimized committer locally, without an EMR cluster.
I have set "fs.s3a.impl" = "com.amazon.ws.emr.hadoop.fs.EmrFileSystem" instead of "org.apache.hadoop.fs.s3a.S3AFileSystem", and the following exception was raised:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
I tried to use the following packages from Maven, without any success:
com.amazonaws:aws-java-sdk:1.12.71
com.amazonaws:aws-java-sdk-emr:1.12.70
Sorry, but using EMRFS, including the S3-optimized committer, is not possible outside of EMR.
EMRFS is not an open-source package, nor is the library available in Maven Central. This is why the class is not found when you try to add aws-java-sdk-emr as a dependency; that package contains only the AWS Java SDK client used for interfacing with the EMR service (e.g., to create clusters).
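To illustrate the distinction, here is a minimal sketch of what aws-java-sdk-emr is actually for: calling the EMR service API, not providing a Hadoop FileSystem implementation. The region is a placeholder:
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder
import com.amazonaws.services.elasticmapreduce.model.ListClustersRequest
import scala.collection.JavaConverters._

// aws-java-sdk-emr exposes the EMR control plane (clusters, steps),
// not the EmrFileSystem class used to read and write S3 data on EMR nodes.
val emr = AmazonElasticMapReduceClientBuilder.standard().withRegion("us-east-1").build()
val clusters = emr.listClusters(new ListClustersRequest())
clusters.getClusters.asScala.foreach(c => println(c.getName))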
I am following this example to get data from Solr to my Scala Spark program. Below is my code:
import com.lucidworks.spark.rdd.SelectSolrRDD  // from the spark-solr library
import org.apache.solr.client.solrj.SolrQuery

val solrURL = "someurl"
val collectionName = "somecollection"
val solrRDD = new SelectSolrRDD(solrURL, collectionName, sc)
val solrQuery = new SolrQuery("somequery")
solrQuery.setTimeAllowed(0)
val solrDataRDD = solrRDD.query(solrQuery)
When I run this code on my local Spark cluster, I get the following exception at the new SelectSolrRDD line:
java.lang.NoSuchMethodError: org.apache.solr.client.solrj.impl.CloudSolrClient$Builder.withHttpClient(Lorg/apache/http/client/HttpClient;)Lorg/apache/solr/client/solrj/impl/SolrClientBuilder;
I looked at some other answers on StackOverflow but nothing worked.
The problem is with your packaging and deployment (your pom.xml, assuming you are using Maven). The issue is that the Solr client libraries are not being loaded when you run your Spark app. You need to package your app and any dependencies into an "uber jar" for deployment to a cluster.
Take a look at how spark-solr has it set up. They use the maven-shade-plugin to generate the uber jar.
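A minimal sketch of the kind of shade-plugin declaration involved (the plugin version here is only an example, and spark-solr's own pom adds relocations and resource transformers on top of this):
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- example version; use a current release -->
      <version>3.2.4</version>
      <executions>
        <execution>
          <!-- build the uber jar during mvn package -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>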
My cluster had jars of spark-solr already present which were conflicting with the jars I was using. After removing those jars, my code worked correctly.
I need to use Hive-specific features in Spark SQL; however, I have to work with an already deployed Apache Spark instance that, unfortunately, doesn't have Hive support compiled in.
What would I have to do to include Hive support for my job?
I tried using the spark.sql.hive.metastore.jars setting, but then I always get these exceptions:
DataNucleus.Persistence: Error creating validator of type org.datanucleus.properties.CorePropertyValidator
ClassLoaderResolver for class "" gave error on creation : {1}
and
org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
In the setting, I am providing a fat jar of spark-hive (with spark-core and spark-sql excluded) together with all of its optional Hadoop dependencies (CDH-specific versions of hadoop-archives, hadoop-common, hadoop-hdfs, hadoop-mapreduce-client-core, hadoop-yarn-api, hadoop-yarn-client and hadoop-yarn-common).
I am also specifying spark.sql.hive.metastore.version with the value 1.2.1
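For reference, I am passing these settings roughly like this (the paths and class name are placeholders):
spark-submit \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=/path/to/spark-hive-fatjar.jar \
  --class com.example.MyJob \
  my-job.jar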
I am using CDH5.3.1 (with Hadoop 2.5.0) and Spark 1.5.2 on Scala 2.10
I have a Hadoop cluster deployed using Hortonworks' HDP 2.2 (Spark 1.2.1 & Hive 0.14).
I have developed a simple Spark app that is supposed to retrieve the content of a Hive table, perform some actions, and write the output to a file. The Hive table was imported using Hive's built-in SerDe.
When I run the app on the cluster I get the following exception:
ERROR log: error in initSerDe: java.lang.ClassNotFoundException Class org.apache.hadoop.hive.serde2.OpenCSVSerde not found
java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.serde2.OpenCSVSerde not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1982)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:337)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:631)
at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1017)
...
Basically, Spark doesn't find Hive's SerDe (org.apache.hadoop.hive.serde2.OpenCSVSerde).
I couldn't find any jar to include at the app's execution, and no mention of a similar problem anywhere. I have no idea how to tell Spark where to find it.
Make a shaded JAR of your application which includes the hive-serde JAR. Refer to this.
Add the jar file in the Spark config spark.driver.extraClassPath.
Any external jar must be added here; the Spark environment will then load it automatically.
Or use the spark-shell --jars option.
Example:
spark.executor.extraClassPath /usr/lib/hadoop/lib/csv-serde-0.9.1.jar
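For the driver side mentioned above, the analogous setting would be, for example:
spark.driver.extraClassPath /usr/lib/hadoop/lib/csv-serde-0.9.1.jar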
The .jar was in Hive's lib folder; I just had to add it on launch with --jars and know where to look!
--jars /usr/hdp/XXX/hive/lib/hive-serde-XXX.jar