I'm trying to start a example spark job (hadoop is 2.7 and spark is 3.3.1) on my hadoop cluster containing of namenode and datanode0.
Upon running start-dfs.sh, I can see the datanode within the UI, and running jps on datanode returns me with "Datanode" process.
When I try to run the spark-submit with example, I get following output:
spark#namenode:~$ spark-submit --deploy-mode client --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.3.1.jar 10
23/02/07 21:23:08 INFO SparkContext: Running Spark version 3.3.1
23/02/07 21:23:08 INFO ResourceUtils: ==============================================================
23/02/07 21:23:08 INFO ResourceUtils: No custom resources configured for spark.driver.
23/02/07 21:23:08 INFO ResourceUtils: ==============================================================
23/02/07 21:23:08 INFO SparkContext: Submitted application: Spark Pi
23/02/07 21:23:09 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/02/07 21:23:09 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
23/02/07 21:23:09 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/02/07 21:23:09 INFO SecurityManager: Changing view acls to: spark
23/02/07 21:23:09 INFO SecurityManager: Changing modify acls to: spark
23/02/07 21:23:09 INFO SecurityManager: Changing view acls groups to:
23/02/07 21:23:09 INFO SecurityManager: Changing modify acls groups to:
23/02/07 21:23:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
23/02/07 21:23:09 INFO Utils: Successfully started service 'sparkDriver' on port 45161.
23/02/07 21:23:09 INFO SparkEnv: Registering MapOutputTracker
23/02/07 21:23:09 INFO SparkEnv: Registering BlockManagerMaster
23/02/07 21:23:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/02/07 21:23:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/02/07 21:23:09 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/02/07 21:23:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-09c3077f-8d16-4496-9808-0626cefe1cc7
23/02/07 21:23:09 INFO MemoryStore: MemoryStore started with capacity 93.3 MiB
23/02/07 21:23:09 INFO SparkEnv: Registering OutputCommitCoordinator
23/02/07 21:23:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/02/07 21:23:10 INFO SparkContext: Added JAR file:/home/spark/spark/examples/jars/spark-examples_2.12-3.3.1.jar at spark://namenode:45161/jars/spark-examples_2.12-3.3.1.jar with timestamp 1675801388738
23/02/07 21:23:10 INFO RMProxy: Connecting to ResourceManager at namenode/192.168.1.17:8032
23/02/07 21:23:11 INFO Client: Retrying connect to server: namenode/192.168.1.17:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
23/02/07 21:23:12 INFO Client: Retrying connect to server: namenode/192.168.1.17:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
23/02/07 21:23:13 INFO Client: Retrying connect to server: namenode/192.168.1.17:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
23/02/07 21:23:14 INFO Client: Retrying connect to server: namenode/192.168.1.17:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
23/02/07 21:23:15 INFO Client: Retrying connect to server: namenode/192.168.1.17:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
23/02/07 21:23:16 INFO Client: Retrying connect to server: namenode/192.168.1.17:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
^C23/02/07 21:23:17 INFO DiskBlockManager: Shutdown hook called
23/02/07 21:23:17 INFO ShutdownHookManager: Shutdown hook called
23/02/07 21:23:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-5a15bfc6-654d-4332-be86-3f20b4cf40f1/userFiles-049088cd-d541-4424-930b-dcaa18634860
23/02/07 21:23:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-4d529cc6-0bc8-43a9-b2af-3671ee11f963
23/02/07 21:23:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-5a15bfc6-654d-4332-be86-3f20b4cf40f1
Is it possible that I've messed up the yarn configuration?
Here's my etc hosts:
192.168.1.17 namenode
192.168.1.23 datanode0
This is my yarn-site.xml, maybe I've messed up the configuration?
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
I am running Spark Streaming job in Google Dataproc Cluster. I want to test the case what happens Dataproc abruptly stops/fails while streaming job is still running. Hence I stopped the Dataproc cluster manually while the job is still running,
I got the below info in log once I stopped,
22/10/12 06:46:39 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: airflow-stream-test-m/10.***:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
22/10/12 06:46:39 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: airflow-stream-test-m/10.***:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
22/10/12 06:46:40 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: airflow-stream-test-m/10.***:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
22/10/12 06:46:40 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: airflow-stream-test-m/10.***:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
22/10/12 06:46:41 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: airflow-stream-test-m/10.***:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
22/10/12 06:46:41 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: airflow-stream-test-m/10.***:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The problem is the job just hangs right here. RetryUpToMaximumCountWithFixedSleep never reaches the count 10. Hence the job's status is just shown as 'running' in dataproc console. But I want the job to fail.
Is there any way to reduce RetryUpToMaximumCountWithFixedSleep? I want to reduce it to 2. Also I don't understand why job hangs and RetryUpToMaximumCountWithFixedSleep doesn't increment to 10.
I am using the org.apache.spark.deploy.yarn.Client API in Java to submit Spark application to YARN.
SparkConf sparkConf = new SparkConf();
List<String> submitArgs = new ArrayList<>();
if (StringUtils.hasText(appName)) {
submitArgs.add("--name");
submitArgs.add(appName);
sparkConf.setAppName(appName);
}
submitArgs.add("--jar");
submitArgs.add(appJarPath);
submitArgs.add("--class");
submitArgs.add(appMainClass);
System.setProperty("SPARK_YARN_MODE", "true");
sparkConf.setMaster("yarn")
.set("spark.submit.deployMode", "cluster")
.set("spark.yarn.queue",queue);
ClientArguments clientArguments = new ClientArguments(submitArgs.toArray(new String[submitArgs.size()]));
Client client = new Client(clientArguments, sparkConf);
client.run();
I am trying to create a scenario when the client cannot connect to the resource manager, after so may attempts, it will timeout and throw back an exception. However, the client keeps retrying connecting to the YARN resource manager without any timeout. See below console output:
2018-09-18 14:20:25.046 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.yarn.client.RMProxy : Connecting to ResourceManager at /0.0.0.0:8032
2018-09-18 14:20:27.600 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:29.603 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:31.600 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:33.607 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:35.608 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:37.619 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:39.620 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:41.623 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:43.633 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:45.632 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:18.665 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:20.676 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:22.679 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:24.686 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:26.691 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:28.695 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:30.697 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:32.708 INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
What custom configuration I should do and how so that the retry will stop after a limited number of attempts or time?
I have installed hadoop 2.6.0 multi-node cluster in EC2 instance(ubuntu 14.04 64 bit). All demons(NameNode,SecondaryNameNode,ResourceManager) in master is up,but in slave machine only DataNode is up NodeManager is shutting down due to connection refuse.
Kindly help me in this regard. Thanks in advance
The log file of my NodeManager is below:
2015-09-08 07:59:36,606 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (992.5 M). Thrashing might happen.
2015-09-08 07:59:36,613 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=8192 virtual-memory=17204 virtual-cores=8
2015-09-08 07:59:36,646 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-09-08 07:59:36,666 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 53949
2015-09-08 07:59:36,688 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2015-09-08 07:59:36,688 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2015-09-08 07:59:36,691 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-09-08 07:59:36,692 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 53949: starting
2015-09-08 07:59:36,707 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : ec2-52-88-167-9.us-west-2.compute.amazonaws.com:53949
2015-09-08 07:59:36,713 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-09-08 07:59:36,713 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2015-09-08 07:59:36,716 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2015-09-08 07:59:36,717 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-09-08 07:59:36,717 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2015-09-08 07:59:36,717 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2015-09-08 07:59:36,719 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154:53949
2015-09-08 07:59:36,719 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2015-09-08 07:59:36,719 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2015-09-08 07:59:36,790 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-09-08 07:59:36,793 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2015-09-08 07:59:36,805 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-09-08 07:59:36,806 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2015-09-08 07:59:36,806 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-09-08 07:59:36,807 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-09-08 07:59:36,812 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2015-09-08 07:59:36,812 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2015-09-08 07:59:36,820 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2015-09-08 07:59:36,820 INFO org.mortbay.log: jetty-6.1.26
2015-09-08 07:59:36,863 INFO org.mortbay.log: Extract jar:file:/home/ubuntu/hadoop/hadoop-2.6.0/share/hadoop/yarn/hadoop-yarn-common-2.6.0.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2015-09-08 07:59:37,358 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8042
2015-09-08 07:59:37,359 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2015-09-08 07:59:37,879 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2015-09-08 07:59:37,885 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
2015-09-08 07:59:37,913 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2015-09-08 07:59:37,917 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
**2015-09-08 07:59:38,951 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:39,956 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:40,957 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:41,957 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-08 07:59:42,958 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)**
2015-09-08 08:19:48,256 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: **org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused**
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:264)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy27.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy28.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:191)
... 6 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 18 more
2015-09-08 08:19:48,257 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:264)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.net.ConnectException: Call From ec2-52-88-167-9.us-west-2.compute.amazonaws.com/172.31.29.154 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy27.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy28.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:191)
... 6 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 18 more
2015-09-08 08:19:48,263 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8042
2015-09-08 08:19:48,264 INFO org.apache.hadoop.ipc.Server: Stopping server on 53949
2015-09-08 08:19:48,266 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 53949
2015-09-08 08:19:48,267 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-09-08 08:19:48,267 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2015-09-08 08:19:48,267 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2015-09-08 08:19:48,268 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2015-09-08 08:19:48,268 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-09-08 08:19:48,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2015-09-08 08:19:48,269 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2015-09-08 08:19:48,270 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2015-09-08 08:19:48,270 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2015-09-08 08:19:48,270 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ec2-52-26-161-203.us-west-2.compute.amazonaws.com:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hdfstmp</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://ec2-52-26-161-203.us-west-2.compute.amazonaws.com:8021</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Master machine:
ubuntu#ec2-52-26-161-203:~$ vim /etc/hosts
172.31.23.167 ec2-52-26-161-203.us-west-2.compute.amazonaws.com
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
ubuntu#ec2-52-26-161-203:~$ vim /etc/hadoop/masters
ec2-52-26-161-203.us-west-2.compute.amazonaws.com
ubuntu#ec2-52-26-161-203:~$ vim /etc/hadoop/slaves
ec2-52-88-167-9.us-west-2.compute.amazonaws.com
Slave Machine :
ubuntu#ec2-52-88-167-9:~ vim /etc/hosts
172.31.29.154 ec2-52-88-167-9.us-west-2.compute.amazonaws.com
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
ubuntu#ec2-52-88-167-9:~ vim /etc/hadoop/slaves
ec2-52-88-167-9.us-west-2.compute.amazonaws.com
ubuntu#ec2-52-26-161-203:~$ sudo netstat -lpten | grep java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1000 569904 19910/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1000 570916 20136/java
tcp 0 0 172.31.23.167:8020 0.0.0.0:* LISTEN 1000 569911 19910/java
tcp6 0 0 :::8088 :::* LISTEN 1000 571699 20278/java
tcp6 0 0 :::8030 :::* LISTEN 1000 571690 20278/java
tcp6 0 0 :::8031 :::* LISTEN 1000 571683 20278/java
tcp6 0 0 :::8032 :::* LISTEN 1000 571695 20278/java
tcp6 0 0 :::8033 :::* LISTEN 1000 571702 20278/java
Telnet command:
ubuntu#ec2-52-26-161-203:~$ telnet localhost 8031
Trying ::1...
Connected to localhost.
Escape character is '^]'.
How is it taking 8031 port for resource manager ? I haven't giving in my hadoop configuration files(coresite.xml,mapred-site.xml,hdfs-site.xml) which is above.
I have made modifications in mapred-site.xml and yarn-site.xml which solved my issue.Since I haven't mentioned the host name property value for resource manager in yarn-site.xml it was trying to connect with the address 0.0.0.0 which was the cause for connection refused exception.
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ec2-52-26-161-203.us-west-2.compute.amazonaws.com</value>
</property>
The hadoop documentation http://wiki.apache.org/hadoop/ConnectionRefused
clearly says:
Make sure the destination address in the exception isn't 0.0.0.0 -this
means that you haven't actually configured the client with the real
address for that
Could you please try adding master ip entry in slave machine's host and slave's ip entry to master as well. Also comment out all the entries in the hosts file, if not needed.
I tried to create the folder in file system of HDFS by means of a command
./hadoop fs -mkdir /user/hadoop
and as a result I received the following messages
13/02/17 09:45:50 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 0 time(s).
13/02/17 09:45:51 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 1 time(s).
13/02/17 09:45:52 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 2 time(s).
13/02/17 09:45:53 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 3 time(s).
13/02/17 09:45:54 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 4 time(s).
13/02/17 09:45:55 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 5 time(s).
13/02/17 09:45:56 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 6 time(s).
13/02/17 09:45:57 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 7 time(s).
13/02/17 09:45:58 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 8 time(s).
13/02/17 09:45:59 INFO ipc.Client: Retrying connect to server: one/192.168.1.8:9000. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to one/192.168.1.8:9000 failed on connection exception: java.net.ConnectException: Connection refused
In the /export/hadoop-1.0.1/conf/core-site.xml file the following is specified:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.8:9000</value>
</property>
</configuration>
Wanted to specify - whether correct I specified port? If isn't present, tell what it is necessary?
seems like hadoop daemons are not running. check with jps