zeppelin notebook "error: not found: value %" - apache-spark

according to reading csv in zeppelin I should be using %dep to load the csv jar, but i get error: not found: value % anyone knows what i'm missing?
%spark
val a = 1
%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")
a: Int = 1
<console>:28: error: not found: value %
%dep
^
in zeppelin logs i see:
INFO [2016-04-21 11:44:19,300] ({pool-2-thread-11} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1461228259278 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611
INFO [2016-04-21 11:44:19,678] ({pool-2-thread-4} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1461228259678 started by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611
INFO [2016-04-21 11:44:19,704] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1461228259678 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611
INFO [2016-04-21 11:44:36,968] ({pool-2-thread-12} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1461228276968 started by scheduler 1367682354
INFO [2016-04-21 11:44:36,969] ({pool-2-thread-12} RReplInterpreter.scala[liftedTree1$1]:41) - intrpreting %dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")
ERROR [2016-04-21 11:44:36,975] ({pool-2-thread-12} RClient.scala[eval]:79) - R Error .zreplout <- rzeppelin:::.z.valuate(.zreplin) <text>:1:1: unexpected input
1: %dep
^
INFO [2016-04-21 11:44:36,978] ({pool-2-thread-12} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1461228276968 finished by scheduler 1367682354
INFO [2016-04-21 11:45:22,157] ({pool-2-thread-8} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1461228322157 started by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611

Each cell can hold one type of interpreter. Thus is order to use %dep and %spark you should separate them into two cells starting with %dep after restarting the spark interpreter so it can be taken into consideration. e.g :
In the first cell :
%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")
Now that your dependencies are loaded, you can access spark interpreter in a different cell:
%spark
val a = 1
PS: By default, a cell runs with the spark interpreter so you don't need to explicitly use %spark.

Related

How to fix this fatal error while running spark jobs on HDIinsight cluster? Session 681 unexpectedly reached final status 'dead'. See logs:

I am running pyspark code on HDIcluster and getting this error:
The code failed because of a fatal error: Session 681 unexpectedly
reached final status 'dead'. See logs:
I don't have experience in YARN or Hadoop. I tried few links provided in stack overflow. But none of them helped. One strange thing is I was able to run the same code yesterday with out that error.
I just ran this import
from pyspark.sql import SparkSession
This is the error I am getting:
19/06/21 20:35:35 INFO Client:
client token: N/A
diagnostics: [Fri Jun 21 20:35:35 +0000 2019] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:819200, vCores:240> ; Queue's Absolute capacity = 50.0 % ; Queue's Absolute used capacity = 99.1875 % ; Queue's Absolute max capacity = 100.0 % ;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1561149335158
final status: UNDEFINED
tracking URL: https://mmsorderpredhdi.azurehdinsight.net/yarnui/hn/proxy/application_1560840076505_0062/
user: livy
19/06/21 20:35:35 INFO ShutdownHookManager: Shutdown hook called
19/06/21 20:35:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-bb63c5f0-7579-4456-b32a-0e643ca97ecc
YARN Diagnostics:
Application killed by user..
Question : Is there something to deal with Queue's absolute used capacity?
Could you please check the logs to find the exact issue?
Where do I find the log file?
On Azure HDInsight cluster, You may found the livy log by connecting to one of the Head nodes with SSH and downloading a file at this path.
hdfs dfs -ls /app-logs/livy/logs-ifile
For more details, refer "Access Apache Hadoop YARN application logs on Linux-based HDInsight"
And also, you may refer “How to start sparksession in pyspark”.
Hope this helps.

How to enable a Spark-Mesos job to be launched from inside a Docker container?

Summary:
Is it possible to submit a Spark job on Mesos from inside a Docker container with 1 Mesos master (no Zookeeper) and 1 Mesos agent also each running in separate Docker containers (on the same host for now)? The Mesos containerizer described at http://mesos.apache.org/documentation/latest/container-image/ seems to apply to the case where the Mesos application is simply encapsulated in a Docker container and run. My Docker application is more interactive with multiple PySpark Mesos jobs being instantiated at run-time based on user input. The driver program in the Docker container is not itself run as a Mesos app. Only the user-initiated job requests are handled as PySpark Mesos apps.
Specifics:
I have 3 Docker containers based on centos:7 linux, and running on the same host machine for now:
Container "Master" running a Mesos Master.
Container "Agent" running a Mesos Agent.
Container "Test" with Spark and Mesos installed where I run a bash shell and launch the following PySpark test program from the command line.
from pyspark import SparkContext, SparkConf
from operator import add
# Configure Spark
sp_conf = SparkConf()
sp_conf.setAppName("spark_test")
sp_conf.set("spark.scheduler.mode", "FAIR")
sp_conf.set("spark.dynamicAllocation.enabled", "false")
sp_conf.set("spark.driver.memory", "500m")
sp_conf.set("spark.executor.memory", "500m")
sp_conf.set("spark.executor.cores", 1)
sp_conf.set("spark.cores.max", 1)
sp_conf.set("spark.mesos.executor.home", "/usr/local/spark-2.1.0")
sp_conf.set("spark.executor.uri", "file://usr/local/spark-2.1.0-bin-without-hadoop.tgz")
sc = SparkContext(conf=sp_conf)
# Simple computation
x = [(1.5,100.),(1.5,200.),(1.5,300.),(2.5,150.)]
rdd = sc.parallelize(x,1)
tot = rdd.foldByKey(0,add).collect()
cnt = rdd.countByKey()
time = [t[0] for t in tot]
avg = [t[1]/cnt[t[0]] for t in tot]
print 'tot=', tot
print 'cnt=', cnt
print 't=', time
print 'avg=', avg
The relevant software versions I am using are as follows:
Hadoop: 2.7.3
Spark: 2.1.0
Mesos: 1.2.0
Docker: 17.03.1-ce, build c6d412e
The following works fine:
I can run the simple PySpark test program above from inside the Test container with Spark's MASTER=local[N] for N=1 or N=4.
I can see in the Mesos logs and in the Mesos user interface (UI) that the Mesos agent and master come up fine. The Mesos UI shows that the agent is connected with plenty of resources (cpu, memory, disk).
I can run the Mesos Python tests successfully from inside the Test container with /usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050. This seems to confirm that the Mesos containers can be accessed from within my Test container, but these tests are not using Spark.
This is the Failure:
With Spark's MASTER=mesos://127.0.0.1:5050, when I launch my PySpark test program from inside the Test container there is activity in the logs of both the Mesos Master and Agent, and in the couple seconds before failure, the Mesos UI shows resources assigned for the job that are well within what is available. However, the PySpark test program then fails with: WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The steps I followed are as follows.
Start Mesos Master:
docker run -it --net=host -p 5050:5050 the_master
Relevant excerpts from the master's log shows:
I0418 01:05:08.540192 27 master.cpp:383] Master 15b354eb-6a20-4bc9-a13b-6533b1e91bd2 (localhost) started on 127.0.0.1:5050
I0418 01:05:08.540210 27 master.cpp:385] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/mesos-1.2.0/build/../src/webui" --work_dir="/var/lib/mesos" --zk_session_timeout="10secs"
Start Mesos Agent:
docker run -it --net=host -e MESOS_AGENT_PORT=5051 the_agent
The agent's log shows:
I0418 01:42:00.234244 40 slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_mesos_image="spark-mesos-agent-test" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/usr/local/mesos-1.2.0/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" --systemd_enable_support="false" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
I get the following warning for both the Mesos Master and Agent, but ignore it because I am running everything on the same host for now:
Master/Agent bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
In fact, my tests with assigning a routable IP address instead of 127.0.0.1 failed to change any of the behavior I describe here.
Start Test Container (with bash shell for testing):
docker run -it --net=host the_test /bin/bash
Some relevant environment variables set inside all three container (Master, Agent, and Test):
HADOOP_HOME=/usr/local/hadoop-2.7.3
HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
SPARK_HOME=/usr/local/spark-2.1.0
SPARK_EXECUTOR_URI=file:////usr/local/spark-2.1.0-bin-without-hadoop.tgz
MASTER=mesos://127.0.0.1:5050
PYSPARK_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_DRIVER_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_SUBMIT_ARGS=--driver-memory=4g pyspark-shell
MESOS_PORT=5050
MESOS_IP=127.0.0.1
MESOS_WORKDIR=/var/lib/mesos
MESOS_HOME=/usr/local/mesos-1.2.0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
MESOS_MASTER=mesos://127.0.0.1:5050
PYTHONPATH=:/usr/local/spark-2.1.0/python:/usr/local/spark-2.1.0/python/lib/py4j-0.10.1-src.zip
Run Mesos (non-Spark) tests from inside the Test container:
/usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050
This produces the following log output (as expected I think):
I0417 21:28:36.912542 20 sched.cpp:232] Version: 1.2.0
I0417 21:28:36.920013 62 sched.cpp:336] New master detected at master#127.0.0.1:5050
I0417 21:28:36.920472 62 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:28:36.924165 62 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Registered with framework ID be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Received offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0 with cpus: 16.0 and mem: 119640.0
Launching task 0 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 1 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 2 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 3 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 4 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Task 0 is in state TASK_RUNNING
Task 1 is in state TASK_RUNNING
Task 2 is in state TASK_RUNNING
Task 3 is in state TASK_RUNNING
Task 4 is in state TASK_RUNNING
Task 0 is in state TASK_FINISHED
Task 1 is in state TASK_FINISHED
Task 2 is in state TASK_FINISHED
Task 3 is in state TASK_FINISHED
Task 4 is in state TASK_FINISHED
All tasks done, waiting for final framework message
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
All tasks done, and all messages received, exiting
Run PySpark test program from inside the Test container:
python spark_test.py
This produces the following log output:
17/04/17 21:29:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I0417 21:29:19.187747 205 sched.cpp:232] Version: 1.2.0
I0417 21:29:19.196535 188 sched.cpp:336] New master detected at master#127.0.0.1:5050
I0417 21:29:19.197453 188 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:29:19.201884 195 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0001
17/04/17 21:29:34 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I searched for this error on the internet but every page I found indicates that it is a common error caused by insufficient resources being allocated to the Mesos agent. As I mentioned, the Mesos UI indicates that there are sufficient resources. Please respond if you have any idea why my Spark job is not accepting resources from Mesos or if you have any suggestions of things I could try.
Thank you for your help.
This error is now resolved. In case anybody encounters a similar problem, I wanted to post that in my case it was caused by the HADOOP CLASSPATH not being set in the Mesos Master and Agent containers. Once set, everything works as expected.

Spark Submit Error Could Not Allocate Memory but Exit Code 0

I use Spark to do ETL and "could not allocate memory" error occasionaly appears.
The problem here is it returns exit code 0 even when it is failed.
I am using Airflow BashOperator and it uses bash exit code as success parameter. It makes job with mentioned error has false success.
Error log looks like below.
[2016-06-22 10:00:54,494] {models.py:974} INFO - Executing <Task(SparkSubmitPythonOperator): ingest_track> on 2016-02-12 01:00:00
[2016-06-22 10:00:54,548] {bash_operator.py:52} INFO - tmp dir root location:
/tmp
[2016-06-22 10:00:54,548] {bash_operator.py:61} INFO - Temporary script location :/tmp/airflowtmp7gl4hl//tmp/airflowtmp7gl4hl/ingest_track0lEaKC
[2016-06-22 10:00:54,548] {bash_operator.py:62} INFO - Running command: spark-submit.sh --master yarn-cluster --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1 --num-executors 1 ingest_script.py --dt "2016-02-12 01:00:00" --env prod --task_id ingest_track
[2016-06-22 10:00:54,556] {bash_operator.py:70} INFO - Output:
[2016-06-22 10:00:55,200] {bash_operator.py:74} INFO - Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000771b00000, 82313216, 0) failed; error='Cannot allocate memory' (errno=12)
[2016-06-22 10:00:55,226] {bash_operator.py:77} INFO - Command exited with return code 0
My question is, why the exit code is 0 while the submit processes itself failed? How to make it right?
Thanks!

Why does Spark job fail on Mesos with "hadoop: not found"?

I use Spark 1.6.1, Hadoop 2.6.4 and Mesos 0.28 on Debian 8.
While trying to submit a job via spark-submit to a Mesos cluster a slave fails with the following in stderr log:
I0427 22:35:39.626055 48258 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/ad642fcf-9951-42ad-8f86-cc4f5a5cb408-S0\/hduser","items":[{"action":"BYP$
I0427 22:35:39.628031 48258 fetcher.cpp:379] Fetching URI 'hdfs://xxxxxxxxx:54310/sources/spark/SimpleEventCounter.jar'
I0427 22:35:39.628057 48258 fetcher.cpp:250] Fetching directly into the sandbox directory
I0427 22:35:39.628078 48258 fetcher.cpp:187] Fetching URI 'hdfs://xxxxxxx:54310/sources/spark/SimpleEventCounter.jar'
E0427 22:35:39.629243 48258 shell.hpp:93] Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
Failed to fetch 'hdfs://xxxxxxx:54310/sources/spark/SimpleEventCounter.jar': Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was e$
Failed to synchronize with slave (it's probably exited)
My Jar file contains hadoop 2.6 binaries
The path to spark executor/binary is via an hdfs:// link
My jobs don't appear in the framework tab, but they do appear in the driver with the status 'queued' and they just sit there till I shut down the spark-mesos-dispatcher.sh service.
I was seeing a very similar error and I figured out my problem was that hadoop_home wasn't set in the mesos agent.
I added to /etc/default/mesos-slave (path may be different on your install) on each mesos-slave the following line: MESOS_hadoop_home="/path/to/my/hadoop/install/folder/"
EDIT: Hadoop has to be installed on each slave, the path/to/my/haoop/install/folder is a local path

scala nutch gora-cassandra - RuntimeException: job failed

I'm trying to run nutch and load the crawled data into cassandra.
I've got my sbt file
"org.apache.gora" % "gora-cassandra" % "0.3",
"org.apache.nutch" % "nutch" % "2.2.1",
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.2"
and am kicking off the job
ToolRunner.run(NutchConfiguration.create(), new Crawler(), Array("urls"));
but am hitting the slightly vague error
EDIT - updated to be full logs from start of request
[Ljava.lang.String;#526950c7
****file:/home/abdev/Working/Qordaoba/gl/web-crawling-services/crawling-services/urls
[error] play - Cannot invoke the action, eventually got an error: java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local_0002
[error] application -
! #6kemm159h - Internal server error, for (POST) [/nutch/job] ->
play.api.Application$$anon$1: Execution exception[[RuntimeException: job failed: name=generate: null, jobid=job_local_0002]]
at play.api.Application$class.handleError(Application.scala:296) ~[play_2.11-2.3.6.jar:2.3.6]
at play.api.DefaultApplication.handleError(Application.scala:402) [play_2.11-2.3.6.jar:2.3.6]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.6.jar:2.3.6]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.6.jar:2.3.6]
at scala.Option.map(Option.scala:145) [scala-library-2.11.1.jar:na]
Caused by: java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local_0002
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.Crawler.run(Crawler.java:152) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250) ~[nutch-2.2.1.jar:na]
In cassandra - the keyspace webpage and tables sc p f are being created before the error is thrown.
EDIT --- If I put all (sorry its a long list I know) the below jars in my lib folder - then the job runs; and the first few logs are about connecting to cassandra. I don't see those logs when I'm trying to just use the SBT dependencies.
Logs when running with below jar files:
SLF4J: The following set of substitute loggers may have been accessed
SLF4J: during the initialization phase. Logging calls during this
SLF4J: phase were not honored. However, subsequent logging calls to these
SLF4J: loggers will work as normally expected.
SLF4J: See also http://www.slf4j.org/codes.html#substituteLogger
SLF4J: org.webjars.WebJarExtractor
[info] Compiling 5 Scala sources and 1 Java source to /home/abdev/Working/Qordaoba/gl/web-crawling-services/crawling-services/target/scala-2.11/classes...
14/12/10 07:31:03 INFO play: Application started (Dev)
14/12/10 07:31:03 INFO slf4j.Slf4jLogger: Slf4jLogger started
[Ljava.lang.String;#3a6f1296
14/12/10 07:31:05 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
14/12/10 07:31:05 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
14/12/10 07:31:06 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
14/12/10 07:31:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/10 07:31:06 INFO input.FileInputFormat: Total input paths to process : 1
Full list of Jar files
activation-1.1.jar
antlr-3.2.jar
aopalliance-1.0.jar
apache-cassandra-1.2.19.jar
apache-cassandra-clientutil-1.2.19.jar
apache-cassandra-thrift-1.2.19.jar
apache-nutch-2.2.1.jar
asm-3.2.jar
avro-1.3.3.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.1.jar
commons-cli-1.2.jar
commons-codec-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.1.jar
commons-math-2.1.jar
commons-net-1.4.1.jar
compress-lzf-0.8.4.jar
concurrentlinkedhashmap-lru-1.3.jar
cql-internal-only-1.4.1.zip
crawler-commons-0.2.jar
cxf-api-2.5.2.jar
cxf-common-utilities-2.5.2.jar
cxf-rt-bindings-xml-2.5.2.jar
cxf-rt-core-2.5.2.jar
cxf-rt-frontend-jaxrs-2.5.2.jar
cxf-rt-transports-common-2.5.2.jar
cxf-rt-transports-http-2.5.2.jar
elasticsearch-0.19.4.jar
geronimo-javamail_1.4_spec-1.7.1.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
gora-cassandra-0.3.jar
gora-core-0.3.jar
guava-11.0.2.jar
guava-13.0.1.jar
hadoop-core-1.2.0.jar
hamcrest-core-1.3.jar
hector-core-1.1-4.jar
high-scale-lib-1.1.2.jar
hsqldb-2.2.8.jar
httpclient-4.1.1.jar
httpcore-4.1.jar
icu4j-4.0.1.jar
jackson-core-asl-1.8.8.jar
jackson-core-asl-1.9.2.jar
jackson-jaxrs-1.7.1.jar
jackson-mapper-asl-1.8.8.jar
jackson-mapper-asl-1.9.2.jar
jackson-xc-1.7.1.jar
jamm-0.2.5.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jbcrypt-0.3m.jar
jdom-1.1.jar
jersey-core-1.8.jar
jersey-json-1.8.jar
jersey-server-1.8.jar
jettison-1.3.1.jar
jetty-6.1.26.jar
jetty-client-6.1.26.jar
jetty-sslengine-6.1.26.jar
jetty-util5-6.1.26.jar
jetty-util-6.1.26.jar
jline-0.9.1.jar
jline-1.0.jar
json-simple-1.1.jar
jsr305-1.3.9.jar
jsr311-api-1.1.1.jar
junit-4.11.jar
juniversalchardet-1.0.3.jar
libthrift-0.7.0.jar
log4j-1.2.16.jar
lucene-analyzers-3.6.0.jar
lucene-core-3.6.0.jar
lucene-highlighter-3.6.0.jar
lucene-memory-3.6.0.jar
lucene-queries-3.6.0.jar
lz4-1.1.0.jar
metrics-core-2.2.0.jar
neethi-3.0.1.jar
org.osgi.core-4.0.0.jar
org.restlet.ext.jackson-2.0.5.jar
org.restlet-2.0.5.jar
oro-2.0.8.jar
paranamer-2.2.jar
paranamer-ant-2.2.jar
paranamer-generator-2.2.jar
qdox-1.10.1.jar
serializer-2.7.1.jar
servlet-api-2.5-6.1.14.jar
servlet-api-2.5-20081211.jar
slf4j-api-1.6.6.jar
slf4j-api-1.7.2.jar
slf4j-log4j12-1.6.1.jar
slf4j-log4j12-1.7.2.jar
snakeyaml-1.6.jar
snappy-java-1.0.5.jar
snaptree-0.1.jar
solr-solrj-3.4.0.jar
spring-aop-3.0.6.RELEASE.jar
spring-asm-3.0.6.RELEASE.jar
spring-beans-3.0.6.RELEASE.jar
spring-context-3.0.6.RELEASE.jar
spring-core-3.0.6.RELEASE.jar
spring-expression-3.0.6.RELEASE.jar
spring-web-3.0.6.RELEASE.jar
stax2-api-3.1.1.jar
stax-api-1.0.1.jar
stax-api-1.0-2.jar
thrift-python-internal-only-0.7.0.zip
tika-core-1.3.jar
woodstox-core-asl-4.1.1.jar
wsdl4j-1.6.2.jar
wstx-asl-3.2.7.jar
xercesImpl-2.9.1.jar
xml-apis-1.3.04.jar
xmlenc-0.52.jar
xmlParserAPIs-2.6.2.jar
xmlschema-core-2.0.1.jar
zookeeper-3.3.1.jar
Thanks,
Brent

Resources