I'm trying to setup a HDInsight emulator on a Windows 8.1 PC following these instructions: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-emulator-get-started/
When trying to run a MapReduce job, I get a connection error.
How can I solve or further investigate this issue?
Details below.
Prerequisites:
Installed Azure Powershell and Azure SDK for VS 2015
Installed HDInsight Emulator for Azure incl. Hortonworks Data Platform
Started local hdp services (13 services running)
Connected Visual Studio to Emulator (had to follow troubleshooting point 2: replacing IP addresses in core-site.xml with '*' due to dynamic IP)
Created directories and copied text files as suggested
Problem:
When trying to run the first example, I get the following error:
16/01/11 10:36:39 INFO mapreduce.Job: Job job_1452503376359_0003 failed with state FAILED due to: Application application_1452503376359_0003 failed 2 times due to AM Container for appattempt_1452503376359_0003_000002 exited with exitCode: -1000 due to: Call From EH3HOST/192.168.56.1 to EH3HOST:8020 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
.Failing this attempt.. Failing the application.
16/01/11 10:36:39 INFO mapreduce.Job: Counters: 0
The following worked for me:
Search for XML files containing <your own host name>:8020 inside the c:\hdp\hdp-<Version Number>\etc\hadoop\ folder. (e.g. EH3HOST:8020)
You should find at least
mapred-site.xml
core-site.xml
yarn-site.xml
Replace all occurrences within these files with 127.0.0.1:8020.
Related
Running an application in in client mode, the driver logs are printed with the below info messages, any idea on how to resolve this? Any spark configs to be updated? or missing?
[INFO ][dispatcher-event-loop-29][SparkRackResolver:54] Got an error when resolving hostNames. Falling back to /default-rack for all
The jobs runs fine, this msg is not in the executor logs.
Check this bug:
https://issues.apache.org/jira/browse/SPARK-28005
If you want to suppress this in the logs you can try to add this into your log4j.properties
log4j.logger.org.apache.spark.deploy.yarn.SparkRackResolver=ERROR
This can happen while using spart-submit with master yarn in a deploy mode local (not using --deploy-mode cluster) and the path to topology.py script is not correct into your core-site.xml.
Path to core-site.xml can be set via environment variable HADOOP_CONF_DIR (or YARN_CONF_DIR).
Check the path in the param net.topology.script.file.name value of core-site.xml.
If the path is incorrect, deploying driver in local mode will lead to error of executing with the following warning:
23/01/15 18:39:43 WARN ScriptBasedMapping: Exception running /home/alexander/xxx/.conf/topology.py 10.15.21.199
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/john"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
...
23/01/15 18:39:43 INFO SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
I'm trying to send spark job to yarn (without HDFS) in HA mode.
For submitting I'm using org.apache.spark.deploy.SparkSubmit.
When I send request from machine with active Resource Manager, it works well. But if I' trying to send from machine with standby Resource Manager, job fails with error:
DEBUG org.apache.hadoop.ipc.Client - Connecting to spark2-node-dev/10.10.10.167:8032
DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8032
org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep
However, when I send request via command line (spark-submit), it works well through both active and standby machine.
What can cause the problem?
P.S. Use the same parameters for both type of sending job: org.apache.spark.deploy.SparkSubmit and spark-submit command line request. And properties yarn.resourcemanager.hostname.rm_id defined for all rm hosts
The problem was with absence of yarn-site.xml within class path for spark-submitter jar. Actually spark submitter jar does not take to account YARN_CONF_DIR or HADOOP_CONF_DIR env var, so cannot see yarn-site.
One solution that I found was to put yarn-site into classpath of jar.
I was using Spark 1.6.0 to access data on Kerberos enabled HDFS by API DataFrame.read.parquet($path).
My application is deployed as spark on yarn with client mode.
By default, Kerberos ticket expires every 24 hours. Everything works fine in the first 24 hours but failing to read files after 24 hours(or more, like 27 hours).
I have tried several ways to login and renew the ticket, doesn't work.
Set spark.yarn.keytab and spark.yarn.principal in spark-defaults.conf
Set --keytab and --principal in the spark-submit command line
Start a timer in code to call UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() every 2 hours.
Error details are:
WARN [org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:671)] - Couldn't setup connection for adam/cluster1#DEV.COM to cdh01/192.168.1.51:8032
DEBUG [org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1632)] - PrivilegedActionException as:adam/cluster1#DEV.COM (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for adam/cluster1#DEV.COMto cdh01/192.168.1.51:8032
ERROR [org.apache.spark.Logging$class.logError(Logging.scala:95)] - Failed to contact YARN for application application_1490607689611_0002.
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for adam/cluster1#DEV.COM to cdh01/192.168.1.51:8032; Host Details : local host is: "cdh05/192.168.1.41"; destination host is: "cdh01":8032;
The problem was solved.
It was caused by the wrong version of Hadoop lib.
In Spark 1.6 assembly jar, it refer to the old ver. of Hadoop lib, so I downloaded it again without build-in Hadoop lib, and referring to a third party Hadop 2.8 lib.
Then it just works.
I'm setting up a tiny cluster in GCE to play around with it but although instances are created some failures prevent to get it working. I'm following the steps in https://cloud.google.com/hadoop/downloads
So far I'm using (as of now) lastest versions of gcloud (143.0.0) and bdutil (1.3.5), freshly installed.
./bdutil deploy -e extensions/spark/spark_env.sh
using debian-8 as image (as bdutil still uses debian-7-backports).
At some point I got
Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1
full debug output is in https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272
(project id and bucket names changed)
Instances are created, but spark not even installed. Digging a bit I've managed to run spark installation and start hadoop commands in the master after after ssh. But it fails badly when starting the spark-shell:
17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
and not able to import sparkSQL. For what I've read everything should be started automatically.
Up to this point I'm a bit lost and don't know what else to do.
Am I missing any step? Is any of the commands faulty? Thanks in advance.
Update: solved
As pointed out in accepted solution I cloned the repo and cluster was created without issues. When trying to start the spark-shell though it gave
java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.`
That sounded to me like connectors were not initialized properly, so after running
./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors
it worked as expected.
The last version of bdutil on https://cloud.google.com/hadoop/downloads is a bit stale and I'd instead recommend using the version of bdutil at head on github: https://github.com/GoogleCloudPlatform/bdutil.
I am trying to run Titan with embedded cassandra and rexster. Downloaded Titan distribution titan-all-0.3.2 and unpacked on a linux box. After unpacking this is what i ran the command
$ ./bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
This is what i see in the logs
After starting RexPro services its unable to deploy and start grizzly. Has anyone had this issue?
Exception stack trace:
13/10/18 14:51:31 INFO server.RexProRexsterServer: RexPro serving on port: [8184]
Oct 18, 2013 2:51:31 PM org.glassfish.grizzly.servlet.WebappContext deploy
INFO: Starting application [jersey] ...
Oct 18, 2013 2:51:31 PM org.glassfish.grizzly.servlet.WebappContext deploy
SEVERE: [jersey] Exception deploying application. See stack trace for details.
java.lang.RuntimeException: com.sun.jersey.api.container.ContainerException: No WebApplication provider is present
at org.glassfish.grizzly.servlet.WebappContext.initServlets(WebappContext.java:1479)
at org.glassfish.grizzly.servlet.WebappContext.deploy(WebappContext.java:265)
There were some packaging problems in some of the 0.3.2 zip files. You basically need to replace a jar file or two around Jersey to get it to work (or I think use the titan-cassandra distribution instead of titan-all).
You can read more about the issue here and its solution (also reported here), but the answer is:
You should be able to patch 0.3.2 by replacing this jar file in the
Titan lib directory:
jersey-core-1.8.jar
with:
jersey-core-1.17
(http://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.17/jersey-core-1.17.jar)