SAP HANA VORA - Spark Controller issue - apache-spark

I am trying to install and start the SAP HANA Spark Controller on VORA 1.2 using Ambari.
However, when I start the Spark Controller, I get the exception below.
Kindly help here...
[hanaes@ip-172-30-2-218 bin]$ ./hanaes start
Starting HANA Spark Controller ...
Class path is /usr/sap/spark/controller/bin/../conf:/usr/hdp/2.3.4.7-4/hadoop/conf:/etc/hive/conf:../*:../lib/*:../lib/external/*:/usr/hdp/2.3.4.7-4/hadoop/*:/usr/hdp/2.3.4.7-4/hadoop/lib/*:/usr/hdp/2.3.4.7-4/hadoop-hdfs/*:/usr/hdp/2.3.4.7-4/hadoop-hdfs/lib/*
STARTED
[hanaes@ip-172-30-2-218 bin]$ clear
[hanaes@ip-172-30-2-218 bin]$ tail -1000f /var/log/hanaes/hana_controller.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.7-4/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/09 12:04:43 INFO HanaESConfig: Loaded HANA Extended Store Configuration Found Spark Libraries. Proceeding with Current Class Path
16/05/09 12:04:44 INFO Server: Starting Spark Controller
16/05/09 12:04:52 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:125)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:65)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at com.sap.hana.spark.network.CommandRouter.initializeHanaContext(CommandRouter.scala:125)
at com.sap.hana.spark.network.CommandRouter.<init>(CommandRouter.scala:38)
at com.sap.hana.spark.network.Server$$anonfun$1.apply(Server.scala:96)
at com.sap.hana.spark.network.Server$$anonfun$1.apply(Server.scala:96)
at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343)
at akka.actor.Props.newActor(Props.scala:252)
at akka.actor.ActorCell.newActor(ActorCell.scala:552)
at akka.actor.ActorCell.create(ActorCell.scala:578)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16/05/09 12:04:52 ERROR Utils: Uncaught exception in thread SAPHanaSpark-akka.actor.default-dispatcher-2
java.lang.NullPointerException
at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1228)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:100)
at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1749)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1748)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:593)
at com.sap.hana.spark.network.CommandRouter.initializeHanaContext(CommandRouter.scala:125)
at com.sap.hana.spark.network.CommandRouter.<init>(CommandRouter.scala:38)
at com.sap.hana.spark.network.Server$$anonfun$1.apply(Server.scala:96)
at com.sap.hana.spark.network.Server$$anonfun$1.apply(Server.scala:96)
at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343)
at akka.actor.Props.newActor(Props.scala:252)
at akka.actor.ActorCell.newActor(ActorCell.scala:552)
at akka.actor.ActorCell.create(ActorCell.scala:578)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The Spark Controller log indicates an issue with YARN, so you need to check the YARN log for the failed Spark Controller job:
Ambari -> YARN -> Quick Links -> Resource Manager UI -> find the failed Spark Controller job -> click the application ID on the left -> click 'Logs'
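If you prefer the command line, the same aggregated container logs can usually be pulled with the YARN CLI once you know the application ID from the Resource Manager UI (the application ID below is a placeholder, not a value from this log):
yarn logs -applicationId <application_id>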

Related

Start PySpark in Jupyter notebook on EMR 6.5

I am trying to start a PySpark job using the Amazon EMR JupyterHub feature, with the following code:
from pyspark.sql import SparkSession  # SparkSession lives in pyspark.sql, not in the pyspark top-level package

spark = SparkSession \
    .builder \
    .appName("My App") \
    .getOrCreate()
But at the end, I always got:
The code failed because of a fatal error:
Session 0 unexpectedly reached final status 'dead'. See logs:
stdout:
stderr:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/emr/emrfs/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/lib/spark/jars/spark-unsafe_2.12-3.1.2-amzn-1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/05/04 13:21:11 INFO RSCDriver: Connecting to: ip-10-42-255-42.eu-west-1.compute.internal:10000
22/05/04 13:21:11 INFO RSCDriver: Starting RPC server...
22/05/04 13:21:11 INFO RpcServer: Connected to the port 10001
22/05/04 13:21:11 WARN RSCConf: Your hostname, ip-10-42-255-42.eu-west-1.compute.internal, resolves to a loopback address, but we couldn't find any external IP address!
22/05/04 13:21:11 WARN RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
Exception in thread "main" java.lang.IncompatibleClassChangeError: Inconsistent constant pool data in classfile for class org/apache/livy/shaded/json4s/DefaultFormats. Method 'java.text.SimpleDateFormat $anonfun$df$1(org.apache.livy.shaded.json4s.DefaultFormats)' at index 156 is CONSTANT_MethodRef and should be CONSTANT_InterfaceMethodRef
at org.apache.livy.shaded.json4s.DefaultFormats.$init$(Formats.scala:318)
at org.apache.livy.shaded.json4s.DefaultFormats$.<init>(Formats.scala:296)
at org.apache.livy.shaded.json4s.DefaultFormats$.<clinit>(Formats.scala)
at org.apache.livy.repl.Session.<init>(Session.scala:66)
at org.apache.livy.repl.ReplDriver.initializeSparkEntries(ReplDriver.scala:43)
at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:337)
at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/05/04 13:21:11 INFO ShutdownHookManager: Shutdown hook called
22/05/04 13:21:11 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-2804f6ee-21f1-4773-98dc-8b3e3bd1924a
It seems a Livy version is clashing with the Livy version embedded in the Apache shaded jar, so I tried to override it using a fat jar that contains all the Spark jars I usually use, imported with the following config:
%%configure -f
{
    "conf": {
        "spark.jars": "s3://mybucket/myfatjar.jar"
    }
}
But without any effect.
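One variant that might be worth trying (a sketch, not a verified fix) is to also ask Spark to prefer the user-supplied jars over the cluster classpath, so the fat jar takes precedence over the shaded Livy/json4s classes. spark.driver.userClassPathFirst and spark.executor.userClassPathFirst are standard Spark properties, but whether they resolve this particular clash is an assumption; the %%configure body must stay pure JSON, so the hedging lives in this paragraph rather than in comments:
%%configure -f
{
    "conf": {
        "spark.jars": "s3://mybucket/myfatjar.jar",
        "spark.driver.userClassPathFirst": "true",
        "spark.executor.userClassPathFirst": "true"
    }
}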

java.lang.ClassCastException: org.apache.hadoop.conf.Configuration cannot be cast to org.apache.hadoop.yarn.conf.YarnConfiguration

I am running a Spark application using YARN in Cloudera.
Spark version: 2.1
I get the following error:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/filecache/13/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/14 22:20:57 INFO util.SignalUtils: Registered signal handler for TERM
18/04/14 22:20:57 INFO util.SignalUtils: Registered signal handler for HUP
18/04/14 22:20:57 INFO util.SignalUtils: Registered signal handler for INT
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.conf.Configuration cannot be cast to org.apache.hadoop.yarn.conf.YarnConfiguration
at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:60)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:763)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
I managed to solve it by verifying that the Spark version configured in the SPARK_HOME variable matches the Hadoop version installed in Cloudera.
From https://spark.apache.org/downloads.html you can download the Spark build suitable for your Hadoop version.
The Hadoop version in Cloudera can be found with:
$ hadoop version
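As a quick sanity check (a sketch; it assumes a Spark 2.x "with Hadoop" distribution, whose bundled Hadoop jars live under $SPARK_HOME/jars), you can compare that against what SPARK_HOME points to:
$ echo $SPARK_HOME
$ ls $SPARK_HOME/jars | grep hadoop-common
The hadoop-common jar version bundled with Spark should line up with the version reported by hadoop version above.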
I encountered the same issue while trying to start a Spark job using the YARN REST API.
The reason was that the environment variable SPARK_YARN_MODE was missing. After adding this env var, everything works fine:
export SPARK_YARN_MODE=true

Running Spark on Yarn

I am trying to run Spark on YARN in the Cloudera quickstart VM. It already has Spark 1.3 and Hadoop 2.6.0-cdh5.4.0 installed. (I am not using spark-submit since I want to run a different version of Spark.)
I am able to run Spark 1.3 on YARN but get the error below for Spark 1.4.
The log shows it is running Spark 1.4 but still fails with an error on a method that is present in 1.4 and not in 1.3, even though the fat jar contains the 1.4 class files.
As far as running on YARN goes, the installed Spark version shouldn't matter, yet it still seems to be picking up the other version.
Hadoop Version:
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
Error:
LogType:stderr
Log Upload Time:Tue Oct 20 21:58:56 -0700 2015
LogLength:2334
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/10/simple-yarn-app-1.1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/20 21:58:50 INFO spark.SparkContext: Running Spark version 1.4.0
15/10/20 21:58:53 INFO spark.SecurityManager: Changing view acls to: yarn
15/10/20 21:58:53 INFO spark.SecurityManager: Changing modify acls to: yarn
15/10/20 21:58:53 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn); users with modify permissions: Set(yarn)
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.timeStringAsSec(Ljava/lang/String;)J
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1027)
at org.apache.spark.SparkConf.getTimeAsSeconds(SparkConf.scala:194)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:68)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.hortonworks.simpleyarnapp.HelloWorld.main(HelloWorld.java:50)
15/10/20 21:58:53 INFO util.Utils: Shutdown hook called
Please help

SPARK_RPC_CLIENT_CONNECT_TIMEOUT in running Hive On Spark - YARN Cluster mode

I am using HDP 2.3 and trying to use Spark (1.3.1) as the execution engine for running Hive queries.
The spark-assembly jar is also available in the hive/lib folder.
I am able to run the query with spark.master=local, but I face the issue below when using spark.master=yarn-cluster.
Command run:
hive -e "set hive.execution.engine=spark; set spark.master=yarn-cluster; select count(*) from db_name.table_name;"
Output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hive/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/downloads/machine/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hive/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/downloads/machine/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in file:/etc/hive/2.3.0.0-2557/0/hive-log4j.properties
Query ID = root_20150909201120_a67d5ca3-36df-43fe-894a-3645585eec7a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
yarn log of the application,
15/09/09 19:42:27 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/09/09 19:42:27 INFO client.RemoteDriver: Connecting to: sandbox.hortonworks.com:59941
15/09/09 19:42:27 ERROR yarn.ApplicationMaster: User class threw exception: SPARK_RPC_CLIENT_CONNECT_TIMEOUT
java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT
at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:46)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:146)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
15/09/09 19:42:27 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: SPARK_RPC_CLIENT_CONNECT_TIMEOUT)
15/09/09 19:42:37 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
15/09/09 19:42:37 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: SPARK_RPC_CLIENT_CONNECT_TIMEOUT)
15/09/09 19:42:37 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1441817597849_0008
Any help on debugging the issue is much appreciated.
I don't think queries can be executed in yarn-cluster mode.
You can run interactive queries only in local and yarn-client mode.
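Under that assumption, the same query re-run in yarn-client mode would look like:
hive -e "set hive.execution.engine=spark; set spark.master=yarn-client; select count(*) from db_name.table_name;"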

Step by step running apache Nutch 2.2.1

I have configured plugin.folders in nutch-default.xml, but when I run Nutch via Eclipse & NetBeans with
Main class: org.apache.nutch.crawl.InjectorJob
Arguments: /MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1/urls
VM Options: -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
I get errors like the ones below:
cd /MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1; JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home "/Applications/NetBeans/NetBeans 7.3.app/Contents/Resources/NetBeans/java/maven/bin/mvn" "-Dexec.args=-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log -classpath %classpath org.apache.nutch.crawl.InjectorJob /MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1/urls" -Dexec.executable=/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/bin/java process-classes org.codehaus.mojo:exec-maven-plugin:1.2.1:exec
Scanning for projects...
------------------------------------------------------------------------
Building Apache Nutch 2.2.1
------------------------------------------------------------------------
[resources:resources]
[debug] execute contextualize
Using platform encoding (US-ASCII actually) to copy filtered resources, i.e. build is platform dependent!
skip non existing resourceDirectory /MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1/src/main/resources
[compiler:compile]
Nothing to compile - all classes are up to date
[exec:exec]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/hung/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/hung/.m2/repository/org/slf4j/slf4j-jdk14/1.6.1/slf4j-jdk14-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/hung/.m2/repository/org/slf4j/slf4j-simple/1.6.1/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/06 08:55:18 INFO crawl.InjectorJob: InjectorJob: starting at 2013-07-06 08:55:18
13/07/06 08:55:18 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: /MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1/urls
2013-07-06 08:55:18.420 java[1206:1c03] Unable to load realm info from SCDynamicStore
13/07/06 08:55:18 WARN store.DataStoreFactory: gora.properties not found, properties will be empty.
13/07/06 08:55:18 WARN store.DataStoreFactory: gora.properties not found, properties will be empty.
13/07/06 08:55:19 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
13/07/06 08:55:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/06 08:55:19 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/07/06 08:55:19 INFO input.FileInputFormat: Total input paths to process : 1
13/07/06 08:55:19 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/06 08:55:19 INFO mapred.JobClient: Running job: job_local226390157_0001
13/07/06 08:55:19 INFO mapred.LocalJobRunner: Waiting for map tasks
13/07/06 08:55:19 INFO mapred.LocalJobRunner: Starting task: attempt_local226390157_0001_m_000000_0
13/07/06 08:55:19 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/07/06 08:55:19 INFO mapred.MapTask: Processing split: file:/MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1/urls/seed.txt:0+20
13/07/06 08:55:19 WARN store.DataStoreFactory: gora.properties not found, properties will be empty.
13/07/06 08:55:19 INFO mapreduce.GoraRecordWriter: gora.buffer.write.limit = 10000
13/07/06 08:55:19 INFO mapred.LocalJobRunner: Map task executor complete.
13/07/06 08:55:19 WARN mapred.FileOutputCommitter: Output path is null in cleanup
13/07/06 08:55:19 WARN mapred.LocalJobRunner: job_local226390157_0001
java.lang.Exception: java.lang.IllegalArgumentException: plugin.folders is not defined
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:69)
at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:97)
at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
at org.apache.nutch.crawl.InjectorJob$UrlMapper.setup(InjectorJob.java:99)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
13/07/06 08:55:20 INFO mapred.JobClient: map 0% reduce 0%
13/07/06 08:55:20 INFO mapred.JobClient: Job complete: job_local226390157_0001
13/07/06 08:55:20 INFO mapred.JobClient: Counters: 0
13/07/06 08:55:20 ERROR crawl.InjectorJob: InjectorJob: java.lang.RuntimeException: job failed: name=inject /MY_DATA_SOURCE/HR_PROJECTS/JSearch/Apache_Nutch/RELEASE/release-2.2.1/urls, jobid=job_local226390157_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
------------------------------------------------------------------------
BUILD FAILURE
------------------------------------------------------------------------
Total time: 6.572s
Finished at: Sat Jul 06 08:55:20 ICT 2013
Final Memory: 11M/236M
------------------------------------------------------------------------
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default-cli) on project nutch: Command execution failed. Process exited with an error: 255 (Exit value: 255) -> [Help 1]
To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.
For more information about the errors and possible solutions, please read the following articles:
[Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
The error message clearly indicates the problem (and where to look for a solution):
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/hung/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/hung/.m2/repository/org/slf4j/slf4j-jdk14/1.6.1/slf4j-jdk14-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/hung/.m2/repository/org/slf4j/slf4j-simple/1.6.1/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
