CPU consumption of an Apache Spark process - apache-spark

I have a system with 6 physical cores, each with 8 hardware threads, for a total of 48 virtual cores. These are the settings in the configuration files:
spark-env.sh
export SPARK_WORKER_CORES=1
spark-defaults.conf
spark.driver.cores 1
spark.executor.cores 1
spark.cores.max 1
So it should use only 1 virtual core, but the output of the top command sometimes shows very large spikes, with %CPU above 4000, e.g.:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22581 sbaig 20 0 0.278t 0.064t 37312 S 4728 6.4 7:11.30 java
....
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22581 sbaig 20 0 0.278t 0.065t 37312 S 1502 6.5 8:22.75 java
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22581 sbaig 20 0 0.278t 0.065t 37312 S 4035 6.6 9:51.64 java
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22581 sbaig 20 0 0.278t 0.080t 37312 S 3445 8.1 15:06.26 java
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22581 sbaig 20 0 0.278t 0.082t 37312 S 4178 8.2 17:37.59 java
...
This means that instead of using 1 virtual core, Spark is using all available cores in the system. Why is it behaving like this, and why is it not using only the 1 core that we set in the SPARK_WORKER_CORES property while the job runs?
I am using Spark 1.6.1 in standalone mode.
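In top's default Irix mode, %CPU is measured relative to a single logical CPU, so 100 means one fully busy hardware thread and a multithreaded process can go well above 100. A minimal sketch (the helper name pct_to_cores is hypothetical) converts the samples above into approximate numbers of busy logical CPUs:

```shell
# Hypothetical helper: convert a %CPU value reported by top in Irix mode
# (100 = one fully busy logical CPU) into a number of busy logical CPUs.
pct_to_cores() {
  awk -v pct="$1" 'BEGIN { printf "%.1f\n", pct / 100 }'
}

# %CPU samples copied from the top output above.
for pct in 4728 1502 4035 3445 4178; do
  echo "${pct}% -> about $(pct_to_cores "$pct") of 48 logical CPUs"
done
```

The 4728 sample works out to roughly 47 of the 48 hardware threads.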
Any help will be highly appreciated.
Thanks
Shuja

As per the information you provided, it looks like you are setting these properties only in the spark-defaults.conf file.
To apply this configuration to your Spark application, set these three properties on the SparkConf object in your code when creating the SparkContext, as shown below:
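import org.apache.spark.{SparkConf, SparkContext}

// Set the properties before the SparkContext is created.
val conf = new SparkConf()
  .set("spark.driver.cores", "1")
  .set("spark.executor.cores", "1")
  .set("spark.cores.max", "1")
val sc = new SparkContext(conf)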
Alternatively, if you submit the application with the spark-submit CLI, you can pass the --driver-cores, --executor-cores and --conf spark.cores.max=1 options when running the application.
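A minimal shell sketch of the spark-submit route, assuming a standalone master. The function name submit_single_core, the SUBMIT_CMD override, and the master URL, class and jar in the example line are all hypothetical placeholders. Note that spark-submit --help lists --driver-cores for standalone mode only under cluster deploy mode, so in client mode that flag may have no effect.

```shell
# Hypothetical wrapper: submit an application capped at one core.
# SUBMIT_CMD (hypothetical, not a Spark setting) can be set to "echo"
# to print the arguments instead of launching spark-submit.
submit_single_core() {
  local master_url="$1" main_class="$2" app_jar="$3"
  "${SUBMIT_CMD:-spark-submit}" \
    --master "$master_url" \
    --driver-cores 1 \
    --executor-cores 1 \
    --conf spark.cores.max=1 \
    --class "$main_class" \
    "$app_jar"
}

# Example (placeholder master URL, class and jar):
# submit_single_core spark://master-host:7077 com.example.MyJob my-job.jar
```

Running it with SUBMIT_CMD=echo shows the exact flags without contacting a cluster.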

Related

How to kill/stop a process that continuously refreshes its PID?

I recently installed Graylog2 on my Ubuntu server for log monitoring. Soon afterwards I got an alert stating that my CPUs were reaching capacity. I logged into my server over SSH and ran top. What I saw confused me and made it difficult to kill the process.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2462 graylog2 20 0 2103292 42684 16424 S 19.3 1.1 0:00.58 java
2470 graylog+ 20 0 2295612 46368 16032 S 13.0 1.1 0:00.39 java
1971 www-data 20 0 354808 36140 19392 S 10.0 0.9 0:00.61 php5
Every time top refreshes, I see that the PIDs of the graylog processes have increased, so I'm unable to kill them by PID.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16937 www-data 20 0 357988 52140 34244 S 45.3 1.3 0:07.45 php5-fpm
24588 graylog2 20 0 2079236 35464 15576 S 9.7 0.9 0:00.29 java
24547 graylog+ 20 0 2295612 37148 15640 S 8.0 0.9 0:00.24 java
What is the proper way to kill/stop a process that continuously re-instantiates itself like that?
I don't know Graylog, but perhaps killall can help you: it selects processes by name.
http://linux.die.net/man/1/killall
Please read the man page before using it.
I don't use it often, so I don't know its disadvantages (if there are any).
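A minimal sketch of the same idea using pkill from procps instead of killall (a swap; both select processes by name). The helper name kill_by_name is hypothetical. Matching the exact process name together with the owning user avoids killing unrelated java processes. top truncates long user names (shown as graylog+), so for that user take the numeric ID from ps -o uid= -p PID, which pkill -u also accepts.

```shell
# Hypothetical helper: send SIGTERM to every process whose name is exactly
# $1, optionally only those owned by user $2 (name or numeric UID).
# Uses pkill from procps rather than killall; both match by name.
# Returns non-zero if no process matched.
kill_by_name() {
  local name="$1" user="${2:-}"
  if [ -n "$user" ]; then
    pkill -x -u "$user" "$name"
  else
    pkill -x "$name"
  fi
}

# Example: stop the java processes owned by the graylog2 user.
# kill_by_name java graylog2
```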

How to tell if a task is running or dead in ubuntu?

I am running an awk script in an Ubuntu terminal. The script works for relatively small files, so I tried it on a big file; it has been running for a long time and still hasn't finished.
I ran top and got the following; the %MEM value changes from time to time:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2263 user 20 0 15.1g 12g 372 D 17 90.1 3:56.43 awko
I don't know whether it's dead or whether I should keep waiting. How can I tell?
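A process that still shows up in top has not exited; the S column tells you what it is doing, and a TIME+ value that keeps growing between refreshes means it is still getting CPU time. Here D means uninterruptible sleep, which usually means waiting on I/O. A minimal sketch (proc_state is a hypothetical helper) that prints the state letter for a PID, or gone once it has exited:

```shell
# Hypothetical helper: print the first letter of a process's state as
# reported by ps, or "gone" if the PID no longer exists.
#   R = running, S = interruptible sleep,
#   D = uninterruptible sleep (usually waiting on I/O),
#   T = stopped, Z = zombie (exited, not yet reaped by its parent)
proc_state() {
  local pid="$1" state
  state="$(ps -o stat= -p "$pid" 2>/dev/null | tr -d '[:space:]')"
  if [ -z "$state" ]; then
    echo "gone"
    return 1
  fi
  echo "${state:0:1}"
}

# Example with the PID from the question:
# proc_state 2263
```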

How to find which PHP file uses the most CPU?

I used the top command and got the result below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30769 test 20 0 48964 23m 5968 R 100 1.4 2:06.89 php
30747 test 20 0 48964 23m 5976 R 57 1.4 6:24.55 php
How can I find which PHP file uses the most CPU?
(Apologies for my poor English.)
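Try this:
ps -e -o pid,%cpu,args | sort -n -k 2 | grep '[.]php'
args is used instead of comm because comm holds only the executable name (php), while args includes the script path; the [.] stops grep from matching its own command line. The output is sorted by %CPU in ascending order, so the PHP file using the most CPU is listed last.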
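top only shows the interpreter name, php. On Linux the full argument list of a given PID, including the script path, can also be read from /proc/PID/cmdline, where the arguments are separated by NUL bytes. A minimal sketch (cmdline_of is a hypothetical helper):

```shell
# Hypothetical helper: print the full command line of a PID (Linux only).
# /proc/PID/cmdline stores argv with NUL separators; turn them into
# spaces and drop the trailing one.
cmdline_of() {
  local pid="$1"
  tr '\0' ' ' < "/proc/$pid/cmdline" | sed 's/ $//'
}

# Example with the busiest PID from the question's top output:
# cmdline_of 30769
```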

Why the SWAP listed in the detail list of the TOP command is greater than in the summary?

The top command output:
Mem: 3991840k total, 1496328k used, 2495512k free, 156752k buffers
**Swap**: 3905528k total, **3980k** used, 3901548k free, 447860k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ **SWAP** COMMAND
28250 www-data 20 0 430m 210m 21m R 63 5.4 0:07.29 **219m** apache2
28266 www-data 20 0 256m 40m 21m S 30 1.0 0:01.94 **216m** apache2
28206 www-data 20 0 260m 44m 21m S 27 1.1 0:10.27 **215m** apache2
28259 www-data 20 0 256m 40m 21m S 26 1.0 0:02.21 **216m** apache2
The detail list shows a group of apache2 processes each using about 210m+ of swap, but the summary reports that only 3980k is used. The total swap in the detail list is much greater than in the summary. Do the two SWAP values refer to the same thing?
Quoted from http://www.linuxforums.org/articles/using-top-more-efficiently_89.html:
VIRT = RES + SWAP
As explained previously, VIRT includes anything inside task's address space, no matter it is in RAM, swapped out or still not loaded from disk. While RES represents total RAM consumed by this task. So, SWAP here means it represents the total amount of data being swapped out OR still not loaded from disk. Don't be fooled by the name, it doesn't just represent the swapped out data.
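The quoted formula can be checked against the question's own rows: the per-process SWAP values are consistent with VIRT minus RES, which is why they have nothing to do with the 3980k of real swap in the summary line. A minimal sketch (swap_estimate is a hypothetical helper that expects PID, VIRT and RES, with VIRT and RES carrying an m suffix):

```shell
# Hypothetical helper: recompute top's SWAP column as VIRT - RES.
# Input lines: "PID VIRT RES", with VIRT and RES in megabytes ("430m").
swap_estimate() {
  awk '{
    virt = $2; res = $3
    sub(/m$/, "", virt); sub(/m$/, "", res)
    printf "%s %dm\n", $1, virt - res
  }'
}

# PID, VIRT and RES copied from the apache2 rows above.
printf '%s\n' '28250 430m 210m' '28266 256m 40m' \
  '28206 260m 44m' '28259 256m 40m' | swap_estimate
```

top printed 219m, 216m, 215m and 216m for these rows; the 1m differences are most likely rounding, since each column is scaled to whole megabytes separately.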

Regarding the VIRT column of the top command's output on Linux

The top man page says that the VIRT column shows memory consumption in kb (kilobits). When I run my application on Linux, the memory consumed is shown as 157m. Does this 157m mean 157 megabytes or 157 megabits? Any clarification is appreciated.
It's in megabytes. Compare your top output with the output of ps aux, which reports VSZ and RSS in kilobytes:
> ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 780 72 ? S Jun26 0:09 init [3]
mysql 28670 2.1 42.1 2733944 1708028 ? Sl Sep24 1910:21 /usr/sbin/mysqld
> top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28670 mysql 15 0 2667m 1.6g 4164 S 104 42.2 1910:37 mysqld
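The conversion can be checked with the mysqld row: since ps reports VSZ and RSS in kilobytes, dividing by 1024 (or by 1024 twice) should land near top's 2667m and 1.6g. A minimal sketch with hypothetical helpers kib_to_mib and kib_to_gib:

```shell
# Hypothetical helpers: convert ps's KiB figures to MiB / GiB.
kib_to_mib() { awk -v k="$1" 'BEGIN { printf "%.1f\n", k / 1024 }'; }
kib_to_gib() { awk -v k="$1" 'BEGIN { printf "%.1f\n", k / 1048576 }'; }

# VSZ and RSS of mysqld (PID 28670) from the ps aux output above.
echo "VSZ: $(kib_to_mib 2733944) MiB (top VIRT: 2667m)"
echo "RSS: $(kib_to_gib 1708028) GiB (top RES: 1.6g)"
```

The VSZ figure comes out a few MiB above top's VIRT, which is expected because the two commands were run at slightly different moments (compare the TIME columns).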
