Separate cpuset for jobs - pbs

Is there a way to restrict CPUs and memory for users running scripts directly, but allow more CPUs and memory on job submission?
I am running Torque/PBS on an Ubuntu 14.04 server and want to allow "normal" usage of 8 CPUs and 16 GB RAM, with the rest dedicated as a "mom" resource for the cluster. A plain cgroups/cpuset configuration also restricts the running jobs.

If you configure Torque with --enable-cpuset, the mom will automatically create a cpuset for each job. Torque isn't really equipped to use only part of a machine, but a hack that might work in combination with using only half the machine is to specify np=X in the nodes file; the mom will then restrict jobs to the first X CPUs.
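For example (a sketch only; the hostname and core count below are placeholders, not values from the question):

    # build Torque with cpuset support
    ./configure --enable-cpuset && make && sudo make install

    # server_priv/nodes (under the Torque spool directory):
    # advertise only 8 cores on this host
    node01 np=8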

Related

Can slurm run 3 separate computers as one "node"?

I'm an intern who's been tasked with installing Slurm across three compute units running Ubuntu. The way things work now, people ssh into one of the compute units and run a job there, since all three units share storage through NFS mounting. Otherwise they are separate machines.
My issue is that, from what I've read in the documentation, it seems like when installing Slurm I would specify each of these compute units as a completely separate node, and any jobs I'd like to run that use multiple cores would still be limited by how many cores are available on the individual node. My supervisor has told me, however, that the three units should be installed as a single node, and when a job comes in that needs more cores than are available on a single compute unit, Slurm should just use all the cores. The intention is that we won't change how we execute jobs (e.g. a parallelized R script), just "wrap" them in an sbatch script before sending them to Slurm for scheduling and execution.
So is my supervisor correct in that slurm can be used to run our parallelized scripts unchanged with more cores than available on a single machine?
Running a script on more cores than are available is nonsense. It does not provide any performance increase, rather the opposite, as more threads have to be managed while the computing power stays the same.
But he is right in the sense that you can wrap your current script and send it to SLURM for execution using a whole node. The three machines will, however, be three nodes. They cannot work as a single node because, well, they are not a single node/machine. They share neither memory nor buses nor peripherals... they just share some disk over the network.
You say that
any jobs I'd like to run that use multiple cores would still be limited by how many cores are available on the individual node
but that's the current situation with SSH. Nothing is lost by using SLURM to manage the resources. In fact, SLURM will take care of giving each job the proper resources and of preventing other users from interfering with your computations.
Your best bet: create a cluster of three nodes as usual and let people send their jobs asking for as many resources as they need, without exceeding the available resources.
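A minimal sketch of such a wrapper (the script name, core count, and memory request are illustrative, not values from the question):

    #!/bin/bash
    #SBATCH --nodes=1                # a job cannot span the three machines
    #SBATCH --cpus-per-task=16       # ask for as many cores as one node offers
    #SBATCH --mem=32G

    Rscript my_parallel_script.R     # the existing parallelized script, unchanged

Submitted with sbatch wrapper.sh, Slurm picks a node with enough free cores and runs the script there.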

Node.js cluster module cannot use all the cpu cores when running inside docker container

When I run the Node.js cluster module on my physical machine, os.cpus().length returns 4, but after putting the app inside a Docker container it returns 2!
I assume this is because, by default, the Docker machine only gets part of the host, which is why the cluster module can only see one CPU (2 logical cores).
If I want my cluster module to utilize all the physical CPU cores, what is the proper way to achieve that?
I tried playing with the --cpuset-cpus=0-1 option, but so far I haven't figured out much.
I am also wondering: if I just create an arbitrary number of workers, will that really utilize all the CPU cores? os.cpus().length is only used to figure out how many CPU cores the machine has, and I could get around that by calling out to a shell script. In that case the question simply becomes: is the Node.js os.cpus() API incompatible with Docker? Is that true?
Your Docker machine uses 2 cores by default. On a Mac you can change this amount in the advanced settings.
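If you are using docker-machine, the two knobs involved look roughly like this (the VM name, image name, and core counts below are placeholders, not values from the question):

    # create the Docker VM with more cores (VirtualBox driver)
    docker-machine create -d virtualbox --virtualbox-cpu-count 4 default

    # optionally pin the container itself to a range of host CPUs
    docker run --cpuset-cpus="0-3" my-node-app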

Is there a way to set the niceness setting of spark executor processes?

I have a cluster of machines that I have to share with other processes. Let's just say I am not a nice person and want my Spark executor processes to have a higher priority than other people's processes. How can I set that?
I am using standalone mode, v2.0.1, running on RHEL 7.
Spark does not currently (2.4.0) support nice process priorities. Grepping through the codebase, there is no usage of nice, and hence no easy way to set the process priority of executors using out-of-the-box Spark. It would also be a little odd for Spark to do this, since it only assumes it can start a JVM, not that the underlying operating system is UNIX.
There are hacky ways to get around this that I do NOT recommend. For instance, if you are using Mesos as a resource manager, you could set spark.mesos.executor.docker.image to an image where java actually calls nice -1 old-java "$@".
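A sketch of what such a wrapper could look like inside that image (the old-java name comes from the answer above; everything else is illustrative, and a negative niceness normally requires root or CAP_SYS_NICE):

    #!/bin/sh
    # installed as "java" in the executor image; the real JVM binary has been
    # renamed to "old-java", so this wrapper is what Spark ends up invoking
    exec nice -n -1 old-java "$@"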
Allocate all the resources to the Spark application, leaving only the minimal resources needed for the OS to run.
A simple scenario:
Imagine a cluster with six nodes running NodeManagers (YARN mode), each equipped with 16 cores and 64 GB of memory. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set to 63 * 1024 = 64512 (megabytes) and 15 respectively. We avoid allocating 100% of the resources to YARN containers because the node needs some resources to run the OS and Hadoop daemons. In this case, we leave a gigabyte and a core for these system processes.
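The corresponding yarn-site.xml entries for that scenario would look roughly like this (a sketch, not a copy of any particular cluster's configuration):

    <!-- yarn-site.xml on each NodeManager host -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>64512</value>  <!-- 63 * 1024 MB; ~1 GB left for OS and daemons -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>15</value>     <!-- one core left for OS and daemons -->
    </property>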

Is it possible to use qsub to distribute jobs on a machine with multiple cores?

Can I use qsub to distribute jobs on a machine that has multiple cores?
My machine has 8 cores; is it possible to distribute a job (a set of different programs) across these 8 cores separately using a PBS server?
If not, is there any alternative? The main script of the program uses qsub to distribute 6 different jobs across nodes when run in parallel mode, but when the user does not opt for the parallel option it uses only one core and leaves the others idle.
My machine has 8 cores; is it possible to distribute a job (a set of different programs) across these 8 cores separately using a PBS server?
qsub is part of Torque, and Torque is simply a scheduler: it does not distribute tasks itself. It is up to the users to do that.
Depending on what you need to do you might look at using MPI or OpenMP.
If each task is its own command, you can spawn each task and they will automatically use the available processors as they are able to.
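For example, if the six programs are independent, a small wrapper could submit each one as its own single-core PBS job (the program names and resource requests are placeholders):

    # submit each program as a separate 1-core job
    for prog in prog1 prog2 prog3 prog4 prog5 prog6; do
        echo "cd \$PBS_O_WORKDIR && ./$prog" | qsub -l nodes=1:ppn=1
    done

The scheduler can then run the jobs concurrently, one per core, on the 8-core machine.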

Hadoop: Using cgroups for TaskTracker tasks

Is it possible to configure cgroups or Hadoop in such a way that each process spawned by the TaskTracker is assigned to a specific cgroup?
I want to enforce memory limits using cgroups. It is possible to assign a cgroup to the TaskTracker, but if jobs wreak havoc the TaskTracker will probably also be killed by the OOM killer, because they are in the same group.
Let's say I have 8 GB of memory on a machine. I want to reserve 1.5 GB for the DataNode and system utilities and let the Hadoop TaskTracker use 6.5 GB of memory. Now I start a job using the streaming API that spawns 4 mappers and 2 reducers (each of which could in theory use 1 GB of RAM), which together eat more memory than allowed. Now the cgroup memory limit will be hit and the OOM killer starts to kill processes. I would rather use a cgroup for each map and reduce task, e.g. a cgroup that is limited to 1 GB of memory.
Is this a real or more of a theoretical problem? Would the OOM killer really kill the Hadoop TaskTracker, or would it start killing the forked processes first? If the latter is true most of the time, my idea would probably work. If not, a bad job would still kill the TaskTracker on all cluster machines and require manual restarts.
Is there anything else to look for when using cgroups?
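(For concreteness, the per-task limit described above could look roughly like this with the libcgroup command-line tools, assuming cgroup v1 with the memory controller mounted; the group names are illustrative.)

    # create a 1 GB memory cgroup for a single task
    cgcreate -g memory:hadoop/task-0001
    cgset -r memory.limit_in_bytes=1G hadoop/task-0001

    # launch the task's process inside that cgroup
    cgexec -g memory:hadoop/task-0001 ./run_task.sh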
Have you looked at the Hadoop parameters that let you set and cap the heap allocation of the TaskTracker's child processes (tasks)? Also do not forget to look at the possibility of JVM reuse (a sketch of both follows the links below).
useful links:
http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
http://developer.yahoo.com/hadoop/tutorial/module7.html
How to avoid OutOfMemoryException when running Hadoop?
http://www.quora.com/Why-does-Hadoop-use-one-JVM-per-task-block
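A sketch of the kind of settings those links discuss, for Hadoop 1.x in mapred-site.xml (the values are illustrative, not recommendations):

    <!-- cap the heap of each spawned map/reduce task JVM -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
    </property>

    <!-- reuse a JVM for several tasks of the same job (-1 = unlimited reuse) -->
    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>-1</value>
    </property>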
If the situation is that you have a lot of students and staff accessing the Hadoop cluster for job submission, you can probably look at job scheduling in Hadoop.
Here is the gist of some scheduler types you may be interested in:
Fair scheduler:
The core idea behind the fair share scheduler was to assign resources to jobs such that on average over time, each job gets an equal share of the available resources.
To ensure fairness, each user is assigned to a pool. In this way, if one user submits many jobs, he or she can receive the same share of cluster resources as all other users (independent of the work they have submitted).
Capacity scheduler:
In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (where the overall capacity of the cluster is the sum of each queue's capacity). Capacity scheduling was defined for large clusters, which may have multiple, independent consumers and target applications.
Here's the link from which I shamelessly copied the above, due to lack of time:
http://www.ibm.com/developerworks/library/os-hadoop-scheduling/index.html
To configure Hadoop use this link: http://hadoop.apache.org/docs/r1.1.1/fair_scheduler.html#Installation
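On Hadoop 1.x, enabling the fair scheduler boils down to something like the following in mapred-site.xml (a sketch based on the installation page linked above; pool definitions go in a separate allocation file):

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>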
