I ran the following commands:
qmgr -c "create queue fastq queue_type=execution"
qmgr -c "set queue fastq started=true"
qmgr -c "set queue fastq enabled=true"
qmgr -c "set queue fastq acl_hosts=compute-0-30"
qmgr -c "set queue fastq acl_host_enable=true"
qmgr -c "set queue fastq acl_users=username"
qmgr -c "set queue fastq acl_user_enable=true"
But when my PBS script has the following header:
#!/bin/sh
#PBS -l nodes=1:ppn=8
#PBS -N job
#PBS -u username
#PBS -q fastq
#PBS -m be
mpirun script
I get the following error:
host.edu > qsub runscript
qsub: Access from host not allowed, or unknown host MSG=host ACL rejected the submitting host: user username#email.com, queue fastq, host host.edu
If you are submitting the job from a machine other than the TORQUE server, you also need to set the submit_hosts server parameter.
See Appendix B in the documentation:
http://docs.adaptivecomputing.com/torque/help.htm#topics/12-appendices/serverParameters.htm#submist_hosts
Take special care to use the machine's full hostname (the value returned by uname -n). Always use the fully qualified hostname, because TORQUE can be quite picky about it.
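As a small helper, the commands can be generated from the submitting machine's own node name so the hostname is spelled exactly as the system reports it. This is a minimal Python sketch, assuming Python 3 on a Unix submitting host; submit_host_commands is a hypothetical helper, and it only prints the qmgr commands for you to run on the TORQUE server. Also appending the host to the queue's acl_hosts is an assumption based on the error message naming the fastq host ACL.

```python
import os
import shlex


def submit_host_commands(fqdn, queue=None):
    """Build qmgr commands that allow `fqdn` to submit jobs (hypothetical helper)."""
    server_attr = f"set server submit_hosts += {fqdn}"
    cmds = ["qmgr -c " + shlex.quote(server_attr)]
    if queue is not None:
        # Assumption: "host ACL rejected the submitting host" refers to the
        # queue's acl_hosts list, so the submitting host is appended there too.
        queue_attr = f"set queue {queue} acl_hosts += {fqdn}"
        cmds.append("qmgr -c " + shlex.quote(queue_attr))
    return cmds


if __name__ == "__main__":
    # os.uname().nodename is the same value that `uname -n` prints (Unix only).
    for cmd in submit_host_commands(os.uname().nodename, queue="fastq"):
        print(cmd)
```

The += operator appends to the list attribute instead of overwriting it, so hosts that are already allowed stay allowed.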
I would also recommend taking a look at torque.setup in the root directory of the tarball; it seeds a basic configuration like the one below:
qmgr -c 'p s'
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = foo.edu
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 19193
set server moab_array_compatible = True
With the command
$>squeue -u mnyber004
I can see all the jobs submitted under my cluster account (Slurm):
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16884 ada CPUeq6 mnyber00 R 1-01:26:17 1 srvcnthpc105
16882 ada CPUeq4 mnyber00 R 1-01:26:20 1 srvcnthpc104
16878 ada CPUeq2 mnyber00 R 1-01:26:31 1 srvcnthpc104
20126 ada CPUeq1 mnyber00 R 22:32:28 1 srvcnthpc103
22004 curie WRI_0015 mnyber00 R 16:11 1 srvcnthpc603
22002 curie WRI_0014 mnyber00 R 16:13 1 srvcnthpc603
22000 curie WRI_0013 mnyber00 R 16:14 1 srvcnthpc603
How can I cancel all the jobs running on the partition ada?
In your case, scancel offers the appropriate filters, so you can simply run:
scancel -u mnyber004 -p ada
If scancel did not offer the filter you need, a frequent idiom is to use the more powerful filtering of squeue together with its --format option to build the proper commands and then pipe them to sh (--noheader keeps the column header out of the generated commands):
squeue -u mnyber004 -p ada --noheader --format "scancel %i" | sh
You can play it safer by first saving the commands to a file, checking it, and then sourcing it:
squeue -u mnyber004 -p ada --noheader --format "scancel %i" > /tmp/remove.sh
source /tmp/remove.sh
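The same filter-then-generate idiom can be sketched in Python when you are already scripting outside the shell. This is an illustration only, assuming squeue's default column layout (a header row containing JOBID and PARTITION); scancel_commands is a hypothetical helper. It only builds the commands, so you can review them before running any.

```python
def scancel_commands(squeue_output, partition):
    """Return one 'scancel <jobid>' command per job listed in `partition`.

    Assumes squeue's default layout: a header row containing JOBID and
    PARTITION, followed by one whitespace-separated row per job."""
    lines = squeue_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    jobid_col = header.index("JOBID")
    part_col = header.index("PARTITION")
    commands = []
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[part_col] == partition:
            commands.append(f"scancel {fields[jobid_col]}")
    return commands


# Sample taken from the squeue listing above.
SAMPLE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16884 ada CPUeq6 mnyber00 R 1-01:26:17 1 srvcnthpc105
16882 ada CPUeq4 mnyber00 R 1-01:26:20 1 srvcnthpc104
16878 ada CPUeq2 mnyber00 R 1-01:26:31 1 srvcnthpc104
20126 ada CPUeq1 mnyber00 R 22:32:28 1 srvcnthpc103
22004 curie WRI_0015 mnyber00 R 16:11 1 srvcnthpc603
22002 curie WRI_0014 mnyber00 R 16:13 1 srvcnthpc603
22000 curie WRI_0013 mnyber00 R 16:14 1 srvcnthpc603
"""

print("\n".join(scancel_commands(SAMPLE, "ada")))
# prints: scancel 16884, scancel 16882, scancel 16878, scancel 20126 (one per line)
```

Because JOBID and PARTITION come before NAME, a job name containing spaces would not shift the two columns this sketch reads.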
Below is my code, which uses the pexpect module to log on over SSH.
#!/usr/bin/env python
import pexpect
import sys
# use ssh to log on to the server
user="inteuser" #username
host="146.11.85.xxx" #host ip
password="xxxx" #password
command="ls -l" # list files in the user's home directory
child = pexpect.spawn('ssh -l %s %s %s'%(user, host, command))
child.expect('password:')
child.sendline(password)
childlog = open('prompt.log',"ab") # save the prompt log to prompt.log
__console__ = sys.stdout # back up the original sys.stdout
sys.stdout = childlog # redirect sys.stdout to childlog
child.expect(pexpect.EOF)
childlog.close()
sys.stdout = __console__ # restore the original sys.stdout
print(child.before) # print what was read before the last expect() match
After I run my script:
[~/Liaohaifeng]$ python3 ssh_test.py
b' \r\ntotal 69636\r\n-rw-rw-r-- 1 inteuser inteuser 949 Nov 28 02:01 01_eITK_trtest01_CrNwid.log\r\n
[~/Liaohaifeng]$ cat prompt.log
total 69412
-rw-rw-r-- 1 inteuser inteuser 949 Nov 28 02:01 01_eITK_trtest01_CrNwid.log
This result is not what I expected. When I remove child.expect(pexpect.EOF) from my script, print(child.before) gives the correct output (it should print the content received before the password prompt matched).
Below is the output after removing child.expect(pexpect.EOF):
[~/Liaohaifeng]$ python3 ssh_test.py
b"\r\n-------------------------------------------------------------------------------\r\n...
These computer resources are provided for authorized users only. For legal,\r\n
security and cost reasons, utilization and access of resources are sxx, in\r\n
accordance with approved internal procedures, at any time if IF YOU ARE NOT AN AUTHORIZED USER; PLEASE EXIT IMMEDIATELY...\r\n "
My goal is to write all of the output to a file when the script runs, but the log file still contains only the directory listing. Why does this happen? Could you please help me update my script? Thank you very much.
You can use the logfile_read attribute of the spawn object:
[STEP 101] # cat example.py
import pexpect, sys
child = pexpect.spawn('bash --norc')
if sys.version_info[0] <= 2:
# python2
child.logfile_read = open('/tmp/pexpect.log', 'w')
else:
# python3
fp = open('/tmp/pexpect.log', 'w')
child.logfile_read = fp.buffer
child.expect('bash-[.0-9]+[$#] ')
child.sendline('echo hello world')
child.expect('bash-[.0-9]+[$#] ')
child.sendline('exit')
child.expect(pexpect.EOF)
child.logfile_read.close()
[STEP 102] # python3 example.py
[STEP 103] # cat /tmp/pexpect.log
bash-4.4# echo hello world
hello world
bash-4.4# exit
exit
[STEP 104] #
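It was a simple problem; adjusting the code order is enough. Attach the log file to child.logfile before the first expect(), so everything pexpect reads from then on is logged: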
#!/usr/bin/env python
import pexpect
import sys
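# use ssh to log on to the server
user="inteuser" #username
host="146.11.85.xxx" #host ip
password="xxxx" #password
command="ls -l" # list files in the user's home directory
child = pexpect.spawn('ssh -l %s %s %s'%(user, host, command))
childlog = open('prompt.log',"ab") # binary mode: spawn() without an encoding produces bytes
child.logfile = childlog # attach the log BEFORE the first expect() so all later output is captured
# note: logfile also records what is sent, including the password;
# use child.logfile_read instead to log only the output
child.expect('password:')
child.sendline(password)
child.expect(pexpect.EOF)
childlog.close()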
I sent this question a few months ago to the slurm-dev list, but it is still unsolved.
The problem is this: after changing the job size as described in the FAQ, I wanted to do it programmatically using the API.
Everything seems to work fine up to the step of updating the environment variables.
When I launch the application, this is what I get:
$ salloc -N1 mpiexec -n 1 ./jobExpansion
salloc: Granted job allocation 559
srun: error: Only allocated 1 nodes asked for 4
In squeue I can see that the allocation has changed, but srun does not see the changes.
I continued debugging, and when I executed:
$ salloc -N1
$ export SLURM_NODELIST=n04,n06,n00,n01
$ export SLURM_NNODES=4
$ mpiexec -n 1 ./jobExpansion
It worked.
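The manual exports above can be reproduced programmatically. Environment variables only propagate from a process to the children it starts afterwards, so the updated values must already be set in whichever process launches srun or mpiexec. Below is a minimal Python sketch, assuming the comma-separated node list and the two variable names from the working shell session; other Slurm variables such as SLURM_JOB_NODELIST and SLURM_JOB_NUM_NODES may also need updating and are not handled here. resized_slurm_env and launch are hypothetical helpers.

```python
import os
import subprocess


def resized_slurm_env(nodes, base=None):
    """Return a copy of `base` (default: os.environ) with the node-list
    variables from the working shell session updated for `nodes`."""
    env = dict(os.environ if base is None else base)
    env["SLURM_NODELIST"] = ",".join(nodes)  # e.g. "n04,n06,n00,n01"
    env["SLURM_NNODES"] = str(len(nodes))    # e.g. "4"
    return env


def launch(cmd, nodes):
    """Run `cmd` with the updated variables. Only this child (and its own
    children) sees them; the calling process's environment is untouched."""
    return subprocess.run(cmd, env=resized_slurm_env(nodes), check=False)


# Hypothetical usage, mirroring the manual session:
# launch(["mpiexec", "-n", "1", "./jobExpansion"], ["n04", "n06", "n00", "n01"])
```

Inside the C program itself, the analogous step would be calling the POSIX setenv() for each variable before the launcher is spawned.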
I don't want to overwhelm you with the complete code, but in case you can help, here are the parts that do the resizing:
slurm_init_job_desc_msg(&job);
job.user_id = getuid();
job.min_nodes = hostsToExpand;
job.dependency = (char *) malloc(sizeof(char) * 20);
snprintf(job.dependency, 20, "expand:%s", pID); // bounded write into the 20-byte buffer
//$ salloc -N4 --dependency=expand:$SLURM_JOBID
slurm_alloc_msg_ptr = slurm_allocate_resources_blocking(&job, 0, NULL);
//$ scontrol update jobid=$SLURM_JOBID NumNodes=0
slurm_init_job_desc_msg(&job_update);
job_update.job_id = slurm_alloc_msg_ptr->job_id;
job_update.min_nodes = 0;
slurm_update_job(&job_update);
//exit
slurm_kill_job(slurm_alloc_msg_ptr->job_id, 9, 0);
//$ scontrol update jobid=$SLURM_JOBID NumNodes=ALL
slurm_init_job_desc_msg(&job_update);
job_update.job_id = procID;
job_update.min_nodes = INFINITE;
slurm_update_job(&job_update);
Everything points to the environment variables, but I am not sure how to update them properly.
Thank you.
EDIT:
If anyone would like to test what I've described, here is the repository:
git clone https://siserte#bitbucket.org/siserte/slurm-job-expansion-test.git
I have been struggling with this problem for a long time: cgroups stop working when I reload the config file (it hangs on mount), so I have to reboot each time for changes to take effect.
These are my steps:
(1.) Fresh start of the OS.
(2.) cgsnapshot -s
# Configuration file generated by cgsnapshot
mount {
cpuset = /sys/fs/cgroup/cpuset;
cpu = /sys/fs/cgroup/cpu;
cpuacct = /sys/fs/cgroup/cpuacct;
memory = /sys/fs/cgroup/memory;
devices = /sys/fs/cgroup/devices;
freezer = /sys/fs/cgroup/freezer;
net_cls = /sys/fs/cgroup/net_cls;
blkio = /sys/fs/cgroup/blkio;
perf_event = /sys/fs/cgroup/perf_event;
}
(3.) cgclear
(4.) cgsnapshot -s
# Configuration file generated by cgsnapshot
(5.) cgconfigparser -l /etc/cgconfig.conf
(6.) cgsnapshot -s
mount {
cpu = /cgroup/cpu_mem_blkio;
cpuacct = /cgroup/cpu_mem_blkio;
memory = /cgroup/cpu_mem_blkio;
blkio = /cgroup/cpu_mem_blkio;
}
group hello1 {
...
group hello2 {
...
(7.) Run the bash script /etc/rc.d/rc.cgred start
Now everything works, but when I do this (with the same config):
(8.) cgclear
(9.) cgconfigparser -l /etc/cgconfig.conf
it hangs forever; under strace it stops at:
mount("cgroup", "/cgroup/cpu_mem_blkio", "cgroup", 0, "cpu,cpuacct,blkio,memory") = ? ERESTARTNOINTR (To be restarted)
Could someone point out what's wrong?
How can I add a new group without rebooting?
Is this normal behavior for cgroups?
I even tried applying this patch:
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1909,7 +1909,7 @@ static void cgroup_kill_sb(struct super_block *sb)
*
* And don't kill the default root.
*/
- if (css_has_online_children(&root->cgrp.self) ||
+ if (!list_empty(&root->cgrp.self.children) ||
root == &cgrp_dfl_root)
cgroup_put(&root->cgrp);
else
Still testing, but the behavior looks the same.
It looks like the right way to do it is to set everything up from the command line:
mount -t cgroup -o cpu,memory,blkio,cpuacct cpu_mem_blkio /cgroup/cpu_mem_blkio
mkdir /cgroup/cpu_mem_blkio/hello1
mkdir /cgroup/cpu_mem_blkio/hello2
echo 200 > /cgroup/cpu_mem_blkio/hello1/cpu.shares
echo 200M > /cgroup/cpu_mem_blkio/hello1/memory.limit_in_bytes
echo 400M > /cgroup/cpu_mem_blkio/hello1/memory.memsw.limit_in_bytes
echo 100 > /cgroup/cpu_mem_blkio/hello1/blkio.weight
...
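The command-line approach above can also be scripted. Below is a minimal Python sketch, assuming a cgroup v1 hierarchy is already mounted at the given mount point (as with the mount command above) and the script runs as root; apply_cgroup_settings is a hypothetical helper. It does the same thing as the mkdir and echo lines: create each group directory, then write each value into its control file in the order given.

```python
import os


def apply_cgroup_settings(mount_point, groups):
    """Create each group under `mount_point` and write its settings.

    `groups` maps a group name to a dict of {control_file: value}. Values are
    written in dict order (insertion order, Python 3.7+). That order matters
    for cgroup v1 memory: memory.limit_in_bytes must be set before
    memory.memsw.limit_in_bytes, because memsw may not be lower than it."""
    for name, settings in groups.items():
        group_dir = os.path.join(mount_point, name)
        os.makedirs(group_dir, exist_ok=True)  # like `mkdir /cgroup/.../hello1`
        for control, value in settings.items():
            # Equivalent of `echo VALUE > /cgroup/.../hello1/CONTROL`
            with open(os.path.join(group_dir, control), "w") as f:
                f.write(f"{value}\n")


# Hypothetical usage with the values shown above (run as root):
# apply_cgroup_settings("/cgroup/cpu_mem_blkio", {
#     "hello1": {
#         "cpu.shares": 200,
#         "memory.limit_in_bytes": "200M",
#         "memory.memsw.limit_in_bytes": "400M",
#         "blkio.weight": 100,
#     },
# })
```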
I am using a bash script to generate mobility files (setdest) in ns2 for various seeds, but I am running into a troublesome segmentation fault. Any help would be appreciated. setdest.cc has been modified, so it is not the standard ns2 file.
I will walk you through the problem.
This shell script produces the segmentation fault:
#!/bin/bash
setdest="/root/ns-allinone-2.1b9a/ns-2.1b9a/indep-utils/cmu-scen-gen/setdest/setdest_mesh_seed_mod"
let nn="70" #Number of nodes in the simulation
let time="900" #Simulation time
let x="1000" #Horizontal dimensions
let y="1000" #Vertical dimensions
for speed in 5
do
for pause in 10
do
for seed in 1 5
do
echo -e "\n"
echo Seed = $seed Speed = $speed Pause Time = $pause
chmod 700 $setdest
$setdest -n $nn -p $pause -s $speed -t $time -x $x -y $y -l 1 -m 50 > scen-mesh-n$nn-seed$seed-p$pause-s$speed-t$time-x$x-y$y
done
done
done
The error is:
scengen_mesh: line 21: 14144 Segmentation fault $setdest -n $nn -p $pause -s $speed -t $time -x $x -y $y -l 1 -m 50 >scen-mesh-n$nn-seed$seed-p$pause-s$speed-t$time-x$x-y$y
Line 21 is the last line of the shell script (the final done).
The strange thing is that if I run the same setdest command in the terminal, there is no problem, e.g.:
$setdest -n 70 -p 10 -s 5 -t 900 -x 1000 -y 1000 -l 1 -m 50
I have worked out exactly where the problem is: it is the -l argument. If I remove that argument in the shell script, there is no problem. Now I will walk you through the modified setdest.cc, where this argument is handled.
This modified setdest reads the XY coordinates of static nodes for a wireless mesh topology from a text file, initpos. The relevant lines of code are:
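FILE *fp_loc;
int locinit = 0; /* stays 0 unless -l is given */
/* "initpos" is a relative path: fopen() resolves it against the current
   working directory and returns NULL if the file is not found there */
fp_loc = fopen("initpos","r");
while ((ch = getopt(argc, argv, "r:m:l:n:p:s:t:x:y:i:o")) != EOF) {
    switch (ch) {
    case 'l':
        locinit = atoi(optarg);
        break;
    /* ... other options ... */
    default:
        usage(argv);
        exit(1);
    }
}
...
if(locinit)
    fscanf(fp_loc,"%lf %lf",&position.X, &position.Y);
if (position.X == -1 && position.Y == -1){
    position.X = uniform() * MAXX;
    position.Y = uniform() * MAXY;
}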
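The position logic in the fragment can be restated as a short Python sketch, assuming the if(locinit) block runs once per node. Unlike the C fragment, this sketch checks that initpos exists before reading; like fopen("initpos","r"), it resolves the relative path against the current working directory. The function names are hypothetical, and treating a missing line like "-1 -1" is an assumption of the sketch, not the original behavior.

```python
import os
import random


def read_initial_positions(lines, num_nodes, max_x, max_y, rng=None):
    """Return one (x, y) per node: use the coordinates from `lines`, or a
    uniform random position when a line reads "-1 -1" (as in setdest)."""
    if rng is None:
        rng = random.Random()
    it = iter(lines)
    positions = []
    for _ in range(num_nodes):
        fields = next(it, "").split()
        if len(fields) >= 2:
            x, y = float(fields[0]), float(fields[1])
        else:
            x, y = -1.0, -1.0  # assumption: a missing line behaves like "-1 -1"
        if x == -1 and y == -1:
            # mirrors uniform() * MAXX / uniform() * MAXY
            x, y = rng.random() * max_x, rng.random() * max_y
        positions.append((x, y))
    return positions


def open_initpos(path="initpos"):
    """Open the positions file, failing loudly instead of returning NULL.

    A relative path is resolved against the current working directory,
    not against the directory where the program lives."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path!r} not found in {os.getcwd()}")
    return open(path)


# Hypothetical usage:
# with open_initpos() as f:
#     nodes = read_initial_positions(f, 70, 1000, 1000)
```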
What I don't get is this.
In the shell script:
- option -l with the value 0 gives no error,
- but any other value (I mostly used 1) causes this segmentation fault.
In the terminal:
- there is no segmentation fault with any value, 0 or 1.
It surely has something to do with the shell script. I am baffled about what is going wrong, and where!
Your help will be highly appreciated.
Cheers