NoSuchMethodError? Code is OK in IDEA, but fails on the cluster - apache-spark

My code runs fine in 64-bit IntelliJ IDEA on Windows 7, but when I package it and run it on a YARN cluster, it throws an exception:
java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V
I checked the pom.xml and found that the relevant jars are already included.
My pom.xml is like this:
<properties>
<scala.version>2.10.6</scala.version>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
<version>2.4.2</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-configuration2</artifactId>
<version>2.2</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-beanutils</groupId>
<artifactId>commons-beanutils</artifactId>
<version>1.9.3</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>2.9.0</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.5</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0-cdh5.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.0-cdh5.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.0-cdh5.7.2</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.0-cdh5.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.0-cdh5.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.0-cdh5.7.2</version>
</dependency>
</dependencies>
The exact place where the error happens is line 43:
val config = ConfigurationUtil("config.properties").config
The stack trace is:
19/05/22 10:30:11 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
19/05/22 10:30:12 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1557905260816_0073_000002
19/05/22 10:30:12 INFO spark.SecurityManager: Changing view acls to: yarn,yizheng
19/05/22 10:30:12 INFO spark.SecurityManager: Changing modify acls to: yarn,yizheng
19/05/22 10:30:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, yizheng); users with modify permissions: Set(yarn, yizheng)
19/05/22 10:30:12 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
19/05/22 10:30:12 INFO yarn.ApplicationMaster: Waiting for spark context initialization
19/05/22 10:30:12 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
19/05/22 10:30:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V
java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V
at org.apache.commons.configuration2.beanutils.BeanHelper.initBeanUtilsBean(BeanHelper.java:631)
at org.apache.commons.configuration2.beanutils.BeanHelper.<clinit>(BeanHelper.java:89)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:259)
at com.sun.proxy.$Proxy11.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
at org.apache.commons.configuration2.builder.fluent.Parameters.createParametersProxy(Parameters.java:307)
at org.apache.commons.configuration2.builder.fluent.Parameters.properties(Parameters.java:246)
at com.sunwada.utils.ConfigurationUtil$.<init>(ConfigurationUtil.scala:16)
at com.sunwada.utils.ConfigurationUtil$.<clinit>(ConfigurationUtil.scala)
at com.sunwada.sparkStreaming.SparkStreamKafkaHbaseSaveOffsetGson$.main(SparkStreamKafkaHbaseSaveOffsetGson.scala:43)
at com.sunwada.sparkStreaming.SparkStreamKafkaHbaseSaveOffsetGson.main(SparkStreamKafkaHbaseSaveOffsetGson.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
19/05/22 10:30:13 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V)
19/05/22 10:30:22 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
19/05/22 10:30:22 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V)
19/05/22 10:30:22 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1557905260816_0073
19/05/22 10:30:23 INFO util.ShutdownHookManager: Shutdown hook called
What can I do now?

AFAIK this is the culprit.
<dependency>
<groupId>commons-beanutils</groupId>
<artifactId>commons-beanutils</artifactId>
<version>1.9.3</version>
<scope>compile</scope>
</dependency>
scope compile (the Maven default) means the jar is used for compilation and is also expected on the classpath at runtime, but on a YARN cluster the cluster's own classpath is loaded first and typically ships an older commons-beanutils; addBeanIntrospector only exists from 1.9.x onward, so the older copy shadows the 1.9.3 you compiled against.
You may need to change how the jar is resolved at runtime.
OR
Put simply, it is an inconsistency between the dependencies at compile time and at runtime.

A NoSuchMethodError is mostly caused by a version conflict. You can install a dependency-analysis plugin in your IDE (e.g. Maven Helper for IntelliJ) to resolve it; running mvn dependency:tree -Dincludes=commons-beanutils also shows exactly which artifacts pull in conflicting copies.
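If the conflicting copy comes from the cluster's classpath rather than from your own jar, one common workaround is to relocate commons-beanutils inside your fat jar with the maven-shade-plugin, the same technique used for Guava in the first related question below. This is a sketch that assumes you already build a shaded (fat) jar; verify the plugin version against your build:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<!-- move our beanutils 1.9.3 into a private package so the cluster's older copy can no longer shadow it -->
<pattern>org.apache.commons.beanutils</pattern>
<shadedPattern>shaded.org.apache.commons.beanutils</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
Alternatively, Spark 1.6 has the experimental spark.driver.userClassPathFirst and spark.executor.userClassPathFirst settings, which prefer your jar's classes over the cluster's; they can cause other conflicts, so the relocation is usually the safer choice.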

Related

Spark error with google/guava library: java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite

I have a simple Spark project, in which the pom.xml dependencies are only the basic Scala, ScalaTest/JUnit, and Spark artifacts:
<dependencies>
<dependency>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>3.0.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
When attempting to run a basic spark program the SparkSession init fails on this line:
SparkSession.builder.master(master).appName("sparkApp").getOrCreate
Here is the output / error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/07 18:06:15 INFO SparkContext: Running Spark version 2.2.1
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
I have run Spark locally many dozens of times on other projects; what might be wrong with this simple one? Is there a dependency on the $HADOOP_HOME environment variable or similar?
Update: By downgrading the Spark version to 2.0.1 I was able to compile. That does not fix the problem (we need a newer version), but it helps point out the source of the problem.
Another update: In a different project the hack of downgrading to 2.0.1 does help, i.e. execution proceeds further, but then a similar exception happens when writing out to Parquet:
18/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
This error occurs due to a version mismatch between Google's Guava library and Spark.
Spark shades Guava, but many libraries still use Guava. You can try shading the Guava dependencies as per this post:
Apache-Spark-User-List
Adding the shade plugin to your pom file and relocating the Google package can resolve this issue.
More information can be found here and here.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>shade.com.google.common</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
If this also doesn't help, then pinning the Guava library to version 15.0 works nicely. The reason this workaround works is dependencyManagement: it forces every transitive occurrence of Guava to resolve to the pinned version. The nice SO answer is here:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
</dependencies>
</dependencyManagement>
I was getting this error in Spring Boot: java.lang.TypeNotPresentException: Type com.google.common.cache.CacheBuilderSpec
com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache
The issue is due to the com.google.guava:guava artifact. In Spring Boot this artifact comes in transitively through some other dependency, possibly spring-boot-starter-web or springfox-swagger2, so we need to first exclude the Guava artifact from that jar and then add an updated version of Guava.
Solution:
1. Add the Guava dependency at the top of all the dependencies so that Spring Boot picks up the latest version:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>19.0</version>
</dependency>
2. Find out the Spring Boot dependency in which the "guava" artifactId is included, exclude the "guava" artifact from that dependency, and then add the Guava dependency as above; a sketch follows.
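For illustration, an exclusion on springfox-swagger2 could look like the following. The artifact and version here are assumptions; apply the exclusion to whichever dependency actually pulls in Guava in your build:
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-swagger2</artifactId>
<version>2.9.2</version>
<exclusions>
<!-- drop the transitive Guava so the explicit 19.0 above wins -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>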

spark kinesis failing on cloudera with java.lang.AbstractMethodError

Below is my POM file. I am writing a Spark Streaming application with AWS Kinesis:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>amazon-kinesis-client</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
<version>1.6.0</version>
</dependency>
I am facing the below exception during a run of the Spark program on Cloudera 5.10:
17/04/27 05:34:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 58.0 (TID 179, hadoop1.local, executor 5): java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.log(KinesisCheckpointer.scala:39)
at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.logDebug(KinesisCheckpointer.scala:39)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.startCheckpointerThread(KinesisCheckpointer.scala:119)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.<init>(KinesisCheckpointer.scala:50)
at org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This runs perfectly fine on EMR 4.4; however, it fails on CDH. Any suggestions?
The underlying problem seems to be the use of org.apache.spark.Logging. An AbstractMethodError generally means a class was compiled against one version of a trait or interface and run against a binary-incompatible one; the CDH Spark build evidently differs here from the upstream 1.6.0 the Kinesis connector was built against. The Logging docs warn:
NOTE: DO NOT USE this class outside of Spark.
It is intended as an internal utility.
This will likely be changed or removed in future releases.
http://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/Logging.html
This is fixed in 2.0.0 as mentioned in https://issues.apache.org/jira/browse/SPARK-9307.
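If upgrading to 2.0.0 is not an option, one workaround worth trying is to compile against the very same CDH Spark artifacts the cluster runs, so that Logging is binary-identical at compile time and at runtime. This is only a sketch: the CDH version string below is an assumption for CDH 5.10, so check your cluster's exact parcel version in the Cloudera repository first, and keep spark-streaming-kinesis-asl itself in your application jar, since CDH typically does not bundle it.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0-cdh5.10.0</version>
<!-- provided: the cluster supplies these jars at runtime -->
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.0-cdh5.10.0</version>
<scope>provided</scope>
</dependency>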

Spark-Cassandra Connector Always Defaulting to 127.0.1.1

The Spark connector for Cassandra keeps trying to connect to 127.0.1.1:9042, even though I am hardcoding the address.
Even hardcoding the address with
conf.set("cassandra.connection.host", "37.61.205.66") does not work. I do not want the Cassandra CQL port to run on 127.0.1.1. What are the solutions?
POM.xml:
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.5.0-RC1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.5.0-RC1</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.0.0-rc1</version>
</dependency>
</dependencies>
Error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:82)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:322)
at com.datastax.spark.connector.cql.Schema$.tableFromCassandra(Schema.scala:342)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:137)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:232)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.RDD$$anonfun$distinct$2.apply(RDD.scala:401)
at org.apache.spark.rdd.RDD$$anonfun$distinct$2.apply(RDD.scala:401)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.distinct(RDD.scala:400)
at com.access_company.twine.OnlineGatewayCount$.main(OnlineGatewayCount.scala:93)
at com.access_company.twine.OnlineGatewayCount.main(OnlineGatewayCount.scala)
... 6 more
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.1.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.1.1] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:403)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:156)
The proper setting is prefixed with 'spark'. Please see the docs.
conf.set("spark.cassandra.connection.host", cassandraHost)

Spring data cassandra - Error creating bean with name 'cassandraSession': Invocation of init method failed

I am trying to use Spring Data Cassandra and I am getting the following error, as shown in the stack trace.
Could you help with this error?
My pom.xml is like so:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.sap.icn</groupId>
<artifactId>sample-cassandra-data</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>sample-cassandra-data</name>
<properties>
<spring.framework.version>3.2.8.RELEASE</spring.framework.version>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-cassandra</artifactId>
<version>1.4.0.RC1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.12</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
</dependencies>
</project>
Stacktrace:
2016-05-20 20:04:42 INFO ClassPathXmlApplicationContext:578 - Refreshing org.springframework.context.support.ClassPathXmlApplicationContext#1324409e: startup date [Fri May 20 20:04:42 SGT 2016]; root of context hierarchy
2016-05-20 20:04:42 INFO XmlBeanDefinitionReader:317 - Loading XML bean definitions from class path resource [application-context.xml]
2016-05-20 20:04:43 INFO PropertySourcesPlaceholderConfigurer:172 - Loading properties file from class path resource [cassandra.properties]
2016-05-20 20:04:43 WARN ClassPathXmlApplicationContext:546 - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cassandraSession': Invocation of init method failed; nested exception is java.lang.NoClassDefFoundError: io/netty/util/concurrent/EventExecutor
2016-05-20 20:04:43 WARN DisposableBeanAdapter:271 - Invocation of destroy method failed on bean with name 'cassandraCluster': java.lang.NullPointerException
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.158 sec <<< FAILURE!
testTxnSave(com.sap.icn.yaas.recommender.data.TransactionTest) Time elapsed: 1.076 sec <<< ERROR!
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cassandraSession': Invocation of init method failed; nested exception is java.lang.NoClassDefFoundError: io/netty/util/concurrent/EventExecutor
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1578)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:545)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:753)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:839)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:538)
at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:83)
at com.sap.icn.yaas.recommender.data.TransactionTest.setUp(TransactionTest.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Caused by: java.lang.NoClassDefFoundError: io/netty/util/concurrent/EventExecutor
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1286)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connect(Cluster.java:249)
at com.datastax.driver.core.Cluster.connect(Cluster.java:282)
at org.springframework.cassandra.config.CassandraCqlSessionFactoryBean.afterPropertiesSet(CassandraCqlSessionFactoryBean.java:82)
at org.springframework.data.cassandra.config.CassandraSessionFactoryBean.afterPropertiesSet(CassandraSessionFactoryBean.java:43)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1637)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574)
... 42 more
Caused by: java.lang.ClassNotFoundException: io.netty.util.concurrent.EventExecutor
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 50 more
Change your Spring Data Cassandra dependency to...
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-cassandra</artifactId>
<version>1.4.1.RELEASE</version>
</dependency>
1.4.1.RELEASE is the most current version of Spring Data Cassandra at the time of this writing.
Also, you need to declare an explicit dependency on Netty, which is used by the DataStax Cassandra driver's I/O subsystem (when communicating to Cassandra asynchronously). The DataStax Cassandra driver is used by SD Cassandra.
The dependency is...
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.0.36.Final</version>
</dependency>
Finally, I am not certain your core Spring Framework version property is doing a whole lot in your Maven POM file...
<spring.framework.version>3.2.8.RELEASE</spring.framework.version>
...as it does not seem to be used anywhere, but you should be aware that SD Cassandra 1.4 is built on core Spring Framework 4.2.5.RELEASE.
You can determine this by following the SD Cassandra (parent) POM hierarchy starting here, then here (along with this). You can take a look at the core Spring Framework BOM file to see all that it pulls in.
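If you do want to pin the core Spring version explicitly, importing the Spring Framework BOM is one way to keep all core modules aligned with what SD Cassandra 1.4 expects. This is a sketch; match the version to whatever the SD Cassandra parent POM actually declares:
<dependencyManagement>
<dependencies>
<!-- aligns every core Spring module on a single version -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-framework-bom</artifactId>
<version>4.2.5.RELEASE</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>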
Hope this helps.
Cheers!

Apache Spark: Streaming without HDFS checkpoint

I'm implementing a Spark job which makes use of reduceByKeyAndWindow, so I need to add checkpointing.
From Spark's website I see that:
Checkpointing can be enabled by setting a directory in a fault-tolerant, reliable file system (e.g., HDFS, S3, etc.) to which the checkpoint information will be saved.
My application is just for academic purposes, so I don't want to set up HDFS for checkpointing, just a local file. Doing so on macOS works fine (setting a temporary dir as the checkpoint dir); the problem comes when doing it on Windows, which throws an exception about permissions.
I already tried starting Eclipse as administrator, and creating the directory manually with setWritable, setReadable and setExecutable set to true. Any hint on how to overcome the problem on Windows?
Thanks!
Update: Here's my code and the exception. Just to clarify again, it works fine on Mac but not on Windows.
SparkConf conf = new SparkConf().setAppName("testApp").setMaster("local[2]");
JavaSparkContext ctx = new JavaSparkContext(conf);
JavaStreamingContext jsc = new JavaStreamingContext(ctx, new Duration(1000));
jsc.checkpoint(Files.createTempDir().getAbsolutePath());
Exception:
Exception in thread "pool-7-thread-3" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:783)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:772)
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:135)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Solved by adding the latest Hadoop libraries to my project.
If using Maven, the following set of dependencies does the trick:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>
On Windows, you can solve this problem as follows:
Download winutils.exe to a folder, say MY_UTILS/bin.
Create an environment variable HADOOP_HOME and point it to MY_UTILS.
