YARN add new queue or clear default queue - apache-spark

I'm running YARN on an EMR cluster.
mapred queue -list returns:
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
How do I clear this queue or add a new one? I've been looking for a while now and can't find CLI commands to do so. I only have access to CLI. Any Spark applications I submit hang in the ACCEPTED state, and I've killed all submitted applications via yarn app --kill [app_id]

CurrentCapacity: 0.0 means that the queue is fully unused.
Your jobs, if that's your concern, are NOT hung due to unavailability of resources.
I'm not sure whether EMR allows YARN CLI commands such as schedulerconf:
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#queue:~:text=ResourceManager%20admin%20client-,schedulerconf,-Usage%3A%20yarn
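If schedulerconf is available, adding a queue from the CLI looks roughly like the sketch below. This route only works when the CapacityScheduler is backed by a mutable configuration store (yarn.scheduler.configuration.store.class), which EMR may not enable by default; the queue name and capacities here are made up for illustration.
# Hypothetical: shrink "default" and add a new "sparkjobs" queue in one call
# (capacities under a parent must still sum to 100)
yarn schedulerconf -update "root.default:capacity=50" -add "root.sparkjobs:capacity=50,maximum-capacity=100"
# Fallback if no mutable store is configured: edit capacity-scheduler.xml on the
# ResourceManager node to define the new queue, then reload the queues in place
yarn rmadmin -refreshQueues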

Related

Spark app fails after ACCEPTED state for a long time. Log says Socket timeout exception

I have Hadoop 3.2.2 running on a cluster with 1 name node, 2 data nodes and 1 resource manager node. I tried to run the SparkPi example in cluster mode. The spark-submit is done from my local machine. YARN accepts the job, but the application UI shows it stuck in the ACCEPTED state. In the terminal where I submitted the job it says
2021-06-05 13:10:03,881 INFO yarn.Client: Application report for application_1622897708349_0001 (state: ACCEPTED)
This continues to print until it fails. Upon failure it prints a socket timeout exception.
I tried increasing spark.executor.heartbeatInterval to 3600 secs. Still no luck. I also tried running the code from the namenode, thinking there must be some connection issue with my local machine. Still I'm unable to run it.
I found the answer, albeit I don't know why it works! Adding the private IP address to the security group in AWS did the trick.

Spark on Kubernetes driver pod cleanup

I am running Spark 3.1.1 on Kubernetes 1.19. Once the job finishes, the executor pods get cleaned up, but the driver pod remains in Completed state. How do I clean up the driver pod once it has completed? Is there any configuration option to set?
NAME READY STATUS RESTARTS AGE
my-job-0e85ea790d5c9f8d-driver 0/1 Completed 0 2d20h
my-job-8c1d4f79128ccb50-driver 0/1 Completed 0 43h
my-job-c87bfb7912969cc5-driver 0/1 Completed 0 43h
Concerning the initial question "Spark on Kubernetes driver pod cleanup", it seems that there is no way to pass, at spark-submit time, a TTL parameter to Kubernetes to prevent driver pods in Completed status from lingering forever.
From Spark documentation:
https://spark.apache.org/docs/latest/running-on-kubernetes.html
When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.
It is not very clear what performs this "eventual garbage collection".
spark.kubernetes.driver.service.deleteOnTermination was added to Spark in 3.2.0. This should solve the issue. Source: https://spark.apache.org/docs/latest/core-migration-guide.html
Update: this only deletes the service in front of the pod, not the pod itself.
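For completeness, a minimal sketch of where that flag would be set, assuming Spark 3.2.0+ and a cluster-mode submission (the API server address is a placeholder and the rest of the submission is elided); as noted above, it only affects the driver's headless Service, not the pod:
# Sketch only: explicitly set the 3.2.0+ service-cleanup flag at submit time
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.service.deleteOnTermination=true \
  ...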
According to the official documentation since Kubernetes 1.12:
Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.
When the TTL controller cleans up the Job, it will delete the Job cascadingly, i.e. delete its dependent objects, such as Pods, together with the Job. Note that when the Job is deleted, its lifecycle guarantees, such as finalizers, will be honored.
Example:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      ...
The Job pi-with-ttl will be eligible to be automatically deleted, 100 seconds after it finishes.
If the field is set to 0, the Job will be eligible to be automatically deleted immediately after it finishes.
If customisation of the Job resource is not possible, you may use an external tool to clean up completed jobs; for example, see https://github.com/dtan4/k8s-job-cleaner
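Another pragmatic option, since spark-submit creates plain driver Pods rather than Jobs, is to sweep finished drivers with kubectl on a schedule (e.g. from a CronJob). This sketch assumes the default spark-role=driver label that Spark puts on driver pods; the namespace is a placeholder.
# Delete completed Spark driver pods; adjust the namespace and selectors to your setup
kubectl delete pod \
  --namespace spark-jobs \
  --selector spark-role=driver \
  --field-selector status.phase=Succeeded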

Pyspark: How can I collect all worker logs in a specified single file in a yarn cluster?

In my code I want to put some logger.info('Doing something'). Using the standard library logging doesn't work.
You can use log4j for logging information in your application; make sure the log4j dependency is provided at runtime, with a configured log4j.xml.
In order to aggregate logs, you need to check the following things:
Check whether yarn.log-aggregation-enable is set to true in yarn-site.xml, and make sure the necessary mount points are added in yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix.
For example: yarn.nodemanager.remote-app-log-dir=/mnt/app-logs,/mnt1/app-logs/
and yarn.nodemanager.remote-app-log-dir-suffix=/logs/,/logs/
With the above settings, logs are aggregated in HDFS at "/mnt/app-logs/{username}/logs/", with one subfolder per application.
While the MapReduce/Spark applications are running, you can access the logs from YARN's web UI. Once the application is completed, the logs are served through the Job History Server.
If yarn.log-aggregation-enable is disabled, then you can check the logs under the locations given by yarn.nodemanager.log-dirs on the local node filesystem.
For example: yarn.nodemanager.log-dirs=/mnt/hadoop/logs/,/mnt1/hadoop/logs/
yarn.log-aggregation.retain-seconds -- this value only matters when you have a long-running job that takes more than 7 days (the default value of yarn.log-aggregation.retain-seconds = 7 days), which means aggregated logs are kept for 7 days, after which a cleanup job deletes them from the nodes.
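A quick way to sanity-check this from the CLI, assuming the example paths above, the usual /etc/hadoop/conf location, and a hypothetical "hadoop" user (substitute your own remote-app-log-dir, user and application id; the exact HDFS layout can vary between Hadoop versions):
# Is aggregation enabled in this node's config?
grep -A 1 yarn.log-aggregation-enable /etc/hadoop/conf/yarn-site.xml
# Did the aggregated logs land in HDFS for a given application?
hdfs dfs -ls /mnt/app-logs/hadoop/logs/
hdfs dfs -ls /mnt/app-logs/hadoop/logs/application_1597227165470_1073/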
After checking the above properties, you can do a few things:
You can use the YARN ResourceManager UI and check the logs of the currently running job. If it has finished, you can check the logs via the history server.
(OR)
You can SSH to the master node and run yarn logs -applicationId <appid>, but only after the application has finished.
Note: make sure your Job History Server is up and running and configured with enough resources.
You can retrieve all logs (driver and executor) of your application into one file this way:
yarn logs -applicationId <application id> > /tmp/mylog.log
The application id is the id of your application; you can retrieve it from the Spark History Server if needed.
For example: application_1597227165470_1073
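If you don't want everything in one dump, yarn logs also takes filters; the options below exist in recent Hadoop releases (2.9+/3.x), and the application id is just the sample one from above:
# Only the stdout files from every container, into one file
yarn logs -applicationId application_1597227165470_1073 -log_files stdout > /tmp/stdout.log
# Only the first ApplicationMaster attempt (the driver, in yarn cluster mode)
yarn logs -applicationId application_1597227165470_1073 -am 1 > /tmp/am.log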

why is there a leftover java process even after the closure of the spark context

I wrote a small application in Python that accepts requests and executes PySpark jobs in worker processes. Things are working fine, but even after closing the Spark context, the Java process that was spawned when starting the Spark context still exists. I checked the cluster and the resources are released properly after the closure of the context. The worker also processes subsequent requests without any issues.
10542 pts/3 Sl+ 0:00 \_ Worker - 1
12960 pts/3 Sl+ 0:22 | \_ /usr/jdk64/jdk1.8.0_77//bin/java - hdp.version=3.0.0.0-1634 -cp /usr/hdp/3.0.0.0-1634/spark2//conf/:/usr/hdp/3.0.0.0-1634/spark2/jars/*:/usr
There are 2 questions:
Why is sparkcontext.stop() not killing the Java process on the master node?
My cluster is Kerberos-integrated. The worker submits the job with the keytab and principal populated in the Spark context. If the worker is not processing any jobs for a decent amount of time, the next job errors out with this exception:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation
Token can be issued only with kerberos or web authentication.
When I restart the application, things are fine again. I suspect I am facing this issue because of an expired token left in the Java process. Is there a better way to handle this scenario?
I strongly believe that if the Java process were not left over, I wouldn't face the second issue.
Any suggestions/pointers would be a great help.
Additional info: the jobs are submitted with master "yarn" and deploy mode "client".

shut down local client of hazelcast executor service

We are using a hazelcast executor service to distribute tasks across our cluster of servers.
We want to shut down one of our servers and take it out of the cluster, but allow it to continue working for a period to finish what it is doing, without accepting any new tasks from the Hazelcast executor service.
I don't want to shut down the hazelcast instance because the current tasks may need it to complete their work.
Shutting down the hazelcast executor service is not what I want. That shuts down the executor cluster-wide.
I would like to continue processing the tasks in the local queue until it is empty and then shut down.
Is there a way for me to let a node in the cluster continue to use hazelcast but tell it to stop accepting new tasks from the executor service?
Not that easily. However, you have member attributes (Member::setX/::getX), and you could set an attribute to signal "no new tasks please". When you submit a task, you either preselect a member to execute on based on that attribute, or you use the overload that takes a MemberSelector.

Resources