Apache Spark: Streaming without HDFS checkpoint - apache-spark

I'm implementing a Spark job that uses reduceByKeyAndWindow, so I need to add checkpointing.
From Spark's website I see that:
Checkpointing can be enabled by setting a directory in a fault-tolerant, reliable file system (e.g., HDFS, S3, etc.) to which the checkpoint information will be saved.
My application is just for academic purposes, so I don't want to set up HDFS for checkpointing, just a local directory. Doing so on macOS works fine (setting a temporary dir as the checkpoint dir); the problem comes on Windows, which throws a permissions exception.
I already tried starting Eclipse as administrator and creating the directory manually, setting setWritable, setReadable and setExecutable to true. Any hint on how to overcome the problem on Windows?
Thanks!
Update: Here's my code and the exception. Just to clarify again, it works fine on Mac but not on Windows.
SparkConf conf = new SparkConf().setAppName("testApp").setMaster("local[2]");
JavaSparkContext ctx = new JavaSparkContext(conf);
JavaStreamingContext jsc = new JavaStreamingContext(ctx, new Duration(1000));
// Files.createTempDir() comes from Guava; a local temp dir is used as the checkpoint dir
jsc.checkpoint(Files.createTempDir().getAbsolutePath());
Exception:
Exception in thread "pool-7-thread-3" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:783)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:772)
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:135)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Solved by adding the latest Hadoop libraries to my project.
If using Maven, the following set of dependencies does the trick.
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>
</dependencies>

On Windows, you can solve this problem as follows:
Download winutils.exe to a folder, say MY_UTILS/bin
Create an environment variable HADOOP_HOME and point it to MY_UTILS
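Alternatively, if you prefer not to set a system-wide environment variable, here is a minimal sketch (assuming winutils.exe was downloaded to C:\MY_UTILS\bin, the example folder above): set the hadoop.home.dir system property before creating the streaming context, since Hadoop's Shell utilities consult this property when locating winutils.exe.
// Assumption: winutils.exe lives in C:\MY_UTILS\bin (the MY_UTILS folder from the steps above)
System.setProperty("hadoop.home.dir", "C:\\MY_UTILS");
// ...then create the streaming context and checkpoint directory as before
JavaStreamingContext jsc = new JavaStreamingContext(ctx, new Duration(1000));
jsc.checkpoint(Files.createTempDir().getAbsolutePath());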

Related

Azure blob storage client in java intellij not initializing

Running into an error while initializing the context for an Azure blob container storage.
Would greatly appreciate any help.
org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [listeners.AzureBackupManagerContextListener]
java.lang.NoClassDefFoundError:
[....]
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.AnnotationIntrospector$XmlExtensions
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1412)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1220)
... 76 more
I also hit this issue and was able to solve it by adding the FasterXML Jackson dependencies:
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.13.3</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.13.3</version>
</dependency>

Cucumber: NoSuchMethodError: cucumber.runtime.formatter.Plugins

I am getting the exception in the subject line on IntelliJ and have no idea why.
This is the runner file:
@RunWith(Cucumber.class)
@CucumberOptions(
features = "src/main/resources/features",
glue = {"DemoDefinitions"},
tags = "@tests"
)
public class CucumberRunner {}
This is the definitions file:
import cucumber.api.java.en.Given;
public class DemoDefinitions {
#Given("Login to Azure Succeeded")
public void login_to_Azure_Succeeded() {
// Write code here that turns the phrase above into concrete actions
throw new cucumber.api.PendingException();
}
}
This is the feature file:
@tests
Feature: PoC Feature
Scenario: PoC Operations Scenario
Given Login to Azure Succeeded
And the maven dependencies are:
<dependencies>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-core</artifactId>
<version>4.4.0</version>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-java</artifactId>
<version>4.2.6</version>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit</artifactId>
<version>4.2.6</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
</dependencies>
When I execute the runner class I get:
java.lang.NoSuchMethodError:
cucumber.runtime.formatter.Plugins.<init>(Ljava/lang/ClassLoader;Lcucumber/runtime/formatter/PluginFactory;Lcucumber/api/event/EventPublisher;Lio/cucumber/core/options/PluginOptions;)V
When I run the feature file itself I get:
Undefined scenarios:
/C:/Users/talt/IdeaProjects/Poc/src/main/resources/features/poc.feature:5
PoC Operations Scenario
1 Scenarios (1 undefined) 1 Steps (1 undefined)
Can you please advise what the problem is?
You are mixing different versions of Cucumber. Take a careful look at the version numbers.
You are also including more dependencies than strictly needed. Merely using cucumber-java and cucumber-junit would be sufficient; both cucumber-core and junit are transitive dependencies.
After you fix your dependencies, make sure to reimport the Maven project.
<dependencies>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-java</artifactId>
<version>4.4.0</version>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit</artifactId>
<version>4.4.0</version>
</dependency>
</dependencies>
The solution was to set glue to empty string:
glue = {""}
We should not mix direct and transitive dependencies, especially their versions! Doing so can cause unpredictable outcomes. Below are a few errors reported by people due to the wrong use of dependencies.
The import cucumber.api.junit cannot be resolved
java.lang.NoClassDefFoundError: gherkin/IGherkinDialectProvider
import cucumber.api.DataTable; cannot be resolved
Solution: You should add the right set of Cucumber dependencies.
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit</artifactId>
<version>4.2.6</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-picocontainer</artifactId>
<version>4.2.6</version>
<scope>test</scope>
</dependency>
Second, you could say we did not have the correct path pointing to glue. But just making it empty should not be the solution, even though it worked. We should have the correct path here, not an empty string.
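For illustration only, a hypothetical layout: if DemoDefinitions lived in a package named stepdefinitions (instead of the default package), the glue entry would name that package rather than the class name or an empty string:
@RunWith(Cucumber.class)
@CucumberOptions(
features = "src/main/resources/features",
glue = {"stepdefinitions"}, // package containing DemoDefinitions, not the class name itself
tags = "@tests"
)
public class CucumberRunner {}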

NoClassDefFoundError: com/aventstack/extentreports/reporter/ExtentHtmlReporter

I am trying to obtain a report in my Maven project using Cucumber and ExtentReports. I am using Java 8. Here are my dependencies:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/info.cukes/cucumber-java -->
<dependency>
<groupId>info.cukes</groupId>
<artifactId>cucumber-java</artifactId>
<version>1.2.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/info.cukes/cucumber-junit -->
<dependency>
<groupId>info.cukes</groupId>
<artifactId>cucumber-junit</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/com.vimalselvam/cucumber-extentsreport -->
<dependency>
<groupId>com.vimalselvam</groupId>
<artifactId>cucumber-extentsreport</artifactId>
<version>3.1.1</version>
</dependency>
</dependencies>
and then I used the runner class below to run all my *.feature files:
import org.junit.runner.RunWith;
import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
@RunWith(Cucumber.class)
@CucumberOptions(
features = {"src/test/resources/features" },
monochrome = true,
plugin = {"com.vimalselvam.cucumber.listener.ExtentCucumberFormatter:output/report.html"},
glue = {"stepDefinitions"}
)
public class TestRunnerJUnit {
}
But when I run the test runner, it complains with:
cucumber.runtime.CucumberException: java.lang.NoClassDefFoundError: com/aventstack/extentreports/reporter/ExtentHtmlReporter
at cucumber.runtime.formatter.PluginFactory.instantiate(PluginFactory.java:114)
at cucumber.runtime.formatter.PluginFactory.create(PluginFactory.java:87)
at cucumber.runtime.RuntimeOptions.getPlugins(RuntimeOptions.java:245)
at cucumber.runtime.RuntimeOptions$1.invoke(RuntimeOptions.java:291)
at com.sun.proxy.$Proxy11.done(Unknown Source)
at cucumber.runtime.junit.JUnitReporter.done(JUnitReporter.java:227)
at cucumber.api.junit.Cucumber.run(Cucumber.java:101)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)
Caused by: java.lang.NoClassDefFoundError: com/aventstack/extentreports/reporter/ExtentHtmlReporter
at com.vimalselvam.cucumber.listener.ExtentCucumberFormatter.setExtentHtmlReport(ExtentCucumberFormatter.java:55)
at com.vimalselvam.cucumber.listener.ExtentCucumberFormatter.<init>(ExtentCucumberFormatter.java:38)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at cucumber.runtime.formatter.PluginFactory.instantiate(PluginFactory.java:107)
... 12 more
Update:
When I add the below dependency:
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.1.1</version>
</dependency>
I receive this new error:
java.lang.IllegalArgumentException: testName cannot be null or empty
at com.aventstack.extentreports.ExtentTest.<init>(ExtentTest.java:80)
at com.aventstack.extentreports.ExtentReports.createTest(ExtentReports.java:105)
at com.aventstack.extentreports.ExtentReports.createTest(ExtentReports.java:145)
at com.vimalselvam.cucumber.listener.ExtentCucumberFormatter.feature(ExtentCucumberFormatter.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at cucumber.runtime.Utils$1.call(Utils.java:40)
at cucumber.runtime.Timeout.timeout(Timeout.java:16)
at cucumber.runtime.Utils.invoke(Utils.java:34)
at cucumber.runtime.RuntimeOptions$1.invoke(RuntimeOptions.java:294)
at com.sun.proxy.$Proxy11.feature(Unknown Source)
at cucumber.runtime.junit.JUnitReporter.feature(JUnitReporter.java:184)
at cucumber.runtime.junit.FeatureRunner.run(FeatureRunner.java:69)
at cucumber.api.junit.Cucumber.runChild(Cucumber.java:95)
at cucumber.api.junit.Cucumber.runChild(Cucumber.java:38)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at cucumber.api.junit.Cucumber.run(Cucumber.java:100)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)
Try replacing this
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.1.1</version>
</dependency>
with this one
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.0.7</version>
</dependency>
the component "cucumber-extentsreport" and "extentreports" are different things.
I had the same issue and solved by using "extentreports" version 3.0.7 with "cucumber-extentsreport" version 3.1.1
Good luck!
I got the answer.
Everything was fine, but I needed to add a value to Feature in the *.feature file.
As you can see below, the Feature keyword must be followed by text on the same line; if you put the text on the next line, you will get the error.
Feature: Check Welcome-message
As a user, I should find a welcome-message in my mailbox, and my age in my profile.
Yes, this worked for me too.

Spark error with google/guava library: java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite

I have a simple Spark project in which the only dependencies in the pom.xml are the basic Scala, ScalaTest/JUnit, and Spark ones:
<dependencies>
<dependency>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>3.0.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
When attempting to run a basic Spark program, the SparkSession init fails on this line:
SparkSession.builder.master(master).appName("sparkApp").getOrCreate
Here is the output / error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/07 18:06:15 INFO SparkContext: Running Spark version 2.2.1
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
I have run Spark locally many dozens of times on other projects; what might be wrong with this simple one? Is there a dependency on the $HADOOP_HOME environment variable or similar?
Update: By downgrading the Spark version to 2.0.1 I was able to compile. That does not fix the problem (we need a newer version), but it helps point out the source of the problem.
Another update: In a different project the hack of downgrading to 2.0.1 does help, i.e. execution proceeds further, but then a similar exception happens when writing out to Parquet.
8/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
This error occurs due to a version mismatch between Google's Guava library and Spark.
Spark shades Guava, but many libraries use Guava. You can try shading the Guava dependencies as per this post.
Apache-Spark-User-List
Adding the shade plugin to your pom file and relocating the com.google package can resolve this issue.
More information can be found here and here.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>shade.com.google.common</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
If this also doesn't help, then adding the Guava library at version 15.0 works nicely. The reason this workaround works lies in dependencyManagement. The relevant SO answer is here.
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
</dependencies>
</dependencyManagement>
I am getting this error in Spring Boot: java.lang.TypeNotPresentException: Type com.google.common.cache.CacheBuilderSpec
com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache
The issue is due to the "com.google.guava:guava" artifact. In Spring Boot it comes in transitively through some other dependency, perhaps "spring-boot-starter-web" or "springfox-swagger2", so we need to first exclude the Guava artifact from that jar and then add an updated version of Guava.
Solution:
1. Add the Guava dependency at the top of all the dependencies so that Spring Boot picks up the latest version:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>19.0</version>
</dependency>
2. Find the Spring Boot dependency whose tree includes the "guava" artifactId, exclude the "guava" artifact from that dependency, and then add the Guava dependency as above (a sketch follows).
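A sketch of step 2, assuming springfox-swagger2 is the dependency that pulls in the old Guava (run mvn dependency:tree to see which artifact actually brings it in, and adjust accordingly):
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-swagger2</artifactId>
<version>2.9.2</version>
<!-- assumption: this artifact is the one dragging in an old Guava; exclude it here -->
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>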

spark kinesis failing on cloudera with java.lang.AbstractMethodError

Below is my POM file. I am writing a Spark streaming job with AWS Kinesis.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>amazon-kinesis-client</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
<version>1.6.0</version>
</dependency>
I am facing the below exception during a run of the Spark program on Cloudera 5.10:
17/04/27 05:34:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 58.0 (TID 179, hadoop1.local, executor 5): java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.log(KinesisCheckpointer.scala:39)
at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.logDebug(KinesisCheckpointer.scala:39)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.startCheckpointerThread(KinesisCheckpointer.scala:119)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.<init>(KinesisCheckpointer.scala:50)
at org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This runs perfectly fine on EMR 4.4; however, CDH fails. Any suggestions?
The underlying problem seems to be the use of org.apache.spark.Logging:
NOTE: DO NOT USE this class outside of Spark.
It is intended as an internal utility.
This will likely be changed or removed in future releases.
http://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/Logging.html
This is fixed in 2.0.0 as mentioned in https://issues.apache.org/jira/browse/SPARK-9307.
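If upgrading is an option, here is a hedged sketch of the corresponding Spark 2.x Kinesis artifacts (the 2.0.0 version below is an assumption and should be aligned with the Spark version shipped on your cluster):
<!-- assumption: Spark 2.0.0 with Scala 2.11; match your cluster's Spark build -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kinesis-asl_2.11</artifactId>
<version>2.0.0</version>
</dependency>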
