slurm script gives "command not found" - linux

I am trying to submit a script to slurm that runs m4 on an input file. m4 is installed on our cluster, and if I run the script by itself, everything works as expected. But when I submit a run to slurm via a slurm script, I get an error.
Here is the script I would like to run (named m4it.sh).
[Note that I'm printing PATH and SHELL in an attempt to debug.]
#!/usr/bin/env bash
echo "Beginning m4it.sh"
echo "PATH=$PATH"
echo "SHELL=$SHELL"
echo
m4 file.m4 > fileout.txt
and here is my slurm script:
#!/usr/bin/env bash
#
#SBATCH --job-name=m4it
### Account name (req'd)
#SBATCH --account=MyAccount
### Redirect .o and .e files to the logs dir
#SBATCH -o m4it.out
#SBATCH -e m4it.err
#
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
#SBATCH --mem-per-cpu=125
echo "PATH=$PATH"
echo "SHELL=$SHELL"
echo
echo "running m4it.sh"
echo
./m4it.sh
which submits successfully to slurm via
sbatch m4it.slurm
When it executes, I get the following error in my m4it.err logfile:
./m4it.sh: line 8: m4: command not found
The PATH and the SHELL variables (printed to m4it.out by the m4it.slurm and by the m4it.sh scripts) are identical. The PATH contains my PATH when I login, and SHELL is /bin/bash, as expected.
Even if I include a symlink to the m4 executable from a directory in my PATH, I still get this error. Also, it is not just m4 that is the problem. The script will report the command "apropos" as an unknown command, even though it runs fine on the command line. The script can "cd" and "ls" just fine though.
I've checked read/write/execute permissions.
ls -ld / /usr /usr/bin /usr/bin/m4
yields the following:
dr-xr-xr-x. 30 root root 4096 Apr 8 11:11 /
drwxr-xr-x. 14 root root 4096 Feb 17 20:24 /usr
dr-xr-xr-x. 2 root root 36864 Apr 29 11:14 /usr/bin
-rwxr-xr-x 1 root root 212440 Jun 3 2010 /usr/bin/m4
It seems that the node the m4it.sh script executes on is different from the front node and that somehow information (environment variables or paths) are not coming across. I have also tried to export all my settings with the argument --export=ALL as follows:
sbatch m4it.slurm --export=ALL
but this didn't work either (same result).
Can anyone help here?

I was able to log in to the compute node in an interactive session. Indeed that node's /usr/bin is significantly different than the front node's, and m4 is not installed.
This also explains why the symlink from a directory in my PATH no longer worked. It was pointing to /usr/bin/m4, but as soon as the job was executed on that compute node, /usr/bin/m4 no longer existed, and thus the symlink was invalid.
If I want to use m4, the solution is to either ask the admins to install m4 on the compute nodes or, alternatively, copy a local version of the executable to somewhere in my home directory that exists in my PATH variable.

Related

Bash binary reports a different version at different times

I am running a bash script with the shebang line #!/bin/bash on RHEL 7.6 . The bash version is 4.2 and I have verified this by running /bin/bash --version. But once this script is being executed, the BASH_VERSION variable being reported inside the script is 3.X. I have printed out the binary location when the script is running is indeed /bin/bash by printing $(ls -l /proc/$$/exe). That actually reports /bin/bash
lrwxrwxrwx 1 xx yy 0 Jan 28 21:03 /proc/6855/exe -> /bin/bash
I am really confused as to how this can happen?

Execute shell script whithin another script prompts: No such file or directory

(I'm new in shell script.)
I've been stuck with this issue for a while. I've tried different methods but without luck.
Description:
When my script attempt to run another script (SiebelMessageCreator.sh, which I don't own) it prompts:
-bash: ./SiebelMessageCreator.sh: No such file or directory
But the file exists and has execute permissions:
-rwxr-xr-x 1 owner ownergrp 322 Jun 11 2015 SiebelMessageCreator.sh
The code that is performing the script execution is:
(cd $ScriptPath; su -c './SiebelMessageCreator.sh' - owner; su -c 'nohup sh SiebelMessageSender.sh &' - owner;)
It's within a subshell because I first thought that it was throwing that message because my script was running in my home directory (When I run the script I'm root and I've moved to my non-root home directory to run the script because I can't move my script [ policies ] to the directory where the other script resides).
I've also tried with the sh SCRIPT.sh ./SCRIPT.sh. And changing the shebang from bash to ksh because the SiebelMessageCreator.sh has that shell.
The su -c 'sh SCRIPT.sh' - owner is necessary. If the script runs as root and not as owner it brokes something (?) (that's what my partners told me from their experience executing it as root). So I execute it as the owner.
Another thing that I've found in my research is that It can throw that message if it's a Symbolic link. I'm really not sure if the content of the script it's a symbolic link. Here it is:
#!/bin/ksh
BASEDIRROOT=/path/to/file/cpp-plwsutil-c
ore-runtime.jar (path changed on purpose for this question)
java -classpath $BASEDIRROOT com.hp.cpp.plwsutil.SiebelMessageCreator
exitCode=$?
echo "`date -u '+%Y-%m-%d %H:%M:%S %Z'` - Script execution finished with exit code $exitCode."
exit $exitCode
As you can see it's a very siple script that just call a .jar. But also I can't add it to my script [ policies ].
If I run the ./SiebelMessageCreator.sh manually it works just fine. But not with my script. I suppose that discards the x64 x32 bits issue that I've also found when I googled?
By the way, I'm automating some tasks, the ./SiebelMessageCreator.sh and nohup sh SiebelMessageSender.sh & are just the last steps.
Any ideas :( ?
did you try ?
. ./SiebelMessageCreator.sh
you can also perform which sh or which ksh, then modify the first line #!/bin/ksh

user-data (cloud-init) script not executing on EC2

my user-data script
#!
set -e -x
echo `whoami`
su root
yum update -y
touch ~/PLEASE_WORK.txt
which is fed in from the command:
ec2-run-instances ami-05355a6c -n 1 -g mongo-group -k mykey -f myscript.sh -t t1.micro -z us-east-1a
but when I check the file /var/log/cloud-init.log, the tail -n 5 is:
[CLOUDINIT] 2013-07-22 16:02:29,566 - cloud-init-cfg[INFO]: cloud-init-cfg ['runcmd']
[CLOUDINIT] 2013-07-22 16:02:29,583 - __init__.py[DEBUG]: restored from cache type DataSourceEc2
[CLOUDINIT] 2013-07-22 16:02:29,686 - cloud-init-cfg[DEBUG]: handling runcmd with freq=None and args=[]
[CLOUDINIT] 2013-07-22 16:02:33,691 - cloud-init-run-module[INFO]: cloud-init-run-module ['once-per-instance', 'user-scripts', 'execute', 'run-parts', '/var/lib/cloud/data/scripts']
[CLOUDINIT] 2013-07-22 16:02:33,699 - __init__.py[DEBUG]: restored from cache type DataSourceEc2
I've also verified that curl http://169.254.169.254/latest/user-data returns my file as intended.
and no other errors or the output of my script happens. how do I get the user-data scrip to execute on boot up correctly?
Actually, cloud-init allows a single shell script as an input (though you may want to use a MIME archive for more complex setups).
The problem with the OP's script is that the first line is incorrect. You should use something like this:
#!/bin/sh
The reason for this is that, while cloud-init uses #! to recognize a user script, the operating system needs a complete shebang line in order to execute the script.
So what's happening in the OP's case is that cloud-init behaves correctly (i.e. it downloads and tries to run the script) but the operating system is unable to actually execute it.
See: Shebang (Unix) on Wikipedia
Cloud-init does not accept plain bash scripts, just like that. It's a beast that eats YAML file that defines your instance (packages, ssh keys and other stuff).
Using MIME you can also send arbitrary shell scripts, but you have to MIME-encode them.
$ cat my-boothook.txt
#!/bin/sh
echo "Hello World!"
echo "This will run as soon as possible in the boot sequence"
$ cat my-user-script.txt
#!/usr/bin/perl
print "This is a user script (rc.local)\n"
$ cat my-include.txt
# these urls will be read pulled in if they were part of user-data
# comments are allowed. The format is one url per line
http://www.ubuntu.com/robots.txt
http://www.w3schools.com/html/lastpage.htm
$ cat my-upstart-job.txt
description "a test upstart job"
start on stopped rc RUNLEVEL=[2345]
console output
task
script
echo "====BEGIN======="
echo "HELLO From an Upstart Job"
echo "=====END========"
end script
$ cat my-cloudconfig.txt
#cloud-config
ssh_import_id: [smoser]
apt_sources:
- source: "ppa:smoser/ppa"
$ ls
my-boothook.txt my-include.txt my-user-script.txt
my-cloudconfig.txt my-upstart-job.txt
$ write-mime-multipart --output=combined-userdata.txt \
my-boothook.txt:text/cloud-boothook \
my-include.txt:text/x-include-url \
my-upstart-job.txt:text/upstart-job \
my-user-script.txt:text/x-shellscript \
my-cloudconfig.txt
$ ls -l combined-userdata.txt
-rw-r--r-- 1 smoser smoser 1782 2010-07-01 16:08 combined-userdata.txt
The combined-userdata.txt is the file you want to paste there.
More info here:
https://help.ubuntu.com/community/CloudInit
Also note, this highly depends on the image you are using. But you say it is really cloud-init based image, so this applies. There are other cloud initiators which are not named cloud-init - then it could be different.
This is a couple years old now, but for others benefit I had the same issue, and it turned out that cloud-init was running twice, from inside /etc/rc3.d . Deleting these files inside the folder allowed the userdata to run correctly:
lrwxrwxrwx 1 root root 22 Jun 5 02:49 S-1cloud-config -> ../init.d/cloud-config
lrwxrwxrwx 1 root root 20 Jun 5 02:49 S-1cloud-init -> ../init.d/cloud-init
lrwxrwxrwx 1 root root 26 Jun 5 02:49 S-1cloud-init-local -> ../init.d/cloud-init-local
The problem is with cloud-init not allowing the user script to run on the next start-up.
First remove the cloud-init artifacts by executing:
rm /var/lib/cloud/instances/*/sem/config_scripts_user
And then your userdata must look like this:
#!/bin/bash
echo "hello!"
And then start you instance. It now works (tested).

Crontab Permission Denied [duplicate]

This question already has answers here:
CronJob not running
(19 answers)
Closed last month.
I'm having problem with crontab when I'm running a script.
My sudo crontab -e looks like this:
05 00 * * * /opt/mcserver/backup.sh
10 00 * * * /opt/mcserver/suspend.sh
05 08 * * * /sbin/shutdown -r +1
11 11 * * * /opt/mcserver/start.sh <--- This isn't working
And the start.sh looks like this:
#!/bin/sh
screen java -d64 -Xincgc -Xmx2048M -jar craftbukkit.jar nogui
and have these permissions (ls -l output)
-rwxr-xr-x 1 eve eve 72 Nov 24 14:17 start.sh
I can run the command from the terminal, either using sudo or not
./start.sh
But it wont start with crontab.
If i do
grep -iR "start.sh" /var/log
I get the following output
/var/log/syslog:Nov 27 11:11:01 eve-desk CRON[5204]: (root) CMD (eve /opt/mcserver/start.sh)
grep: /var/log/btmp: Permission denied
grep: /var/log/lightdm/x-0-greeter.log: Permission denied
grep: /var/log/lightdm/lightdm.log: Permission denied
grep: /var/log/lightdm/x-0.log: Permission denied
So my question is, why isn't it working?
And since my script run without using sudo, I don't necessarily need to put it in sudo crontab?
( and I'm using Ubuntu 12.10 )
Thanks in advance,
Philip
Answer to twalberg's response
1. Changed owner on craftbukkit to root, to see if that fixed the problem.
-rw-r--r-- 1 root root 12084211 Nov 21 02:14 craftbukkit.jar
and also added an explicit cd in my start.sh script as such:
#!/bin/sh
cd /opt/mcserver/
screen java -d64 -Xincgc -Xmx2048M -jar craftbukkit.jar nogui
2. I'm not quite sure what you mean here. Should I use the following path in my start.sh file when i start java?
(output from which java)
/usr/bin/java
3. When my server closes, screen is terminated. Is it a good idea to start screen in "detached mode" anyway?
Still got the same "Permission denied" error.
Problem solved!
By using the proper flag on screen, as below, it is now working as it should!
screen -d -m java -d64 -Xincgc -Xmx2048M -jar craftbukkit.jar nogui
Thanks a lot to those who answered, and especially twalberg!
Here are some things to check:
root obviously has read/execute permissions on start.sh, but what are the permissions on craftbukkit.jar - can root read it? You may also want to add an explicit cd /path/to/where/craftbukkit.jar/is in your start.sh script.
Is java in root's default path within cron? Note that this path is not necessarily the same as the one that you get via sudo, su or directly logging in as root - it's typically much more restricted. Use full path names to both java and craftbukkit.jar to work around that.
Since screen will not start with a terminal available, you may need screen -d -m ... instead. Hopefully, you intend to eventually attach to each screen instance and terminate it later, or you have arranged for it to terminate automatically when the script is done...
The /var/log/syslog entry shows that cron did in fact execute the script, so it must have failed for one of the above reasons (or something else I haven't noticed yet)
The other errors from grep are simply due to the fact your non-root user does not have permission to read those specific files (this is normal, and a good thing).
start.sh is owned by "eve:eve" and your crontab is running as root.
You can solve this by running following command
chown root:root /opt/craftbukkit/start.sh
Your crontab will be running as root though.
Tip: When running bash in crontab always use absolute paths (it will make debugging a lot easier).
The log shows the user has no access to dir " /var/log/", You should set the log files' permition for the cron's owner.

How does Set-user-id bit work on Linux?

I have the following "root-file" with the following contents:
$ cat root-file
#!/bin/bash
echo $EUID
id
Following are the permissions for this file:
$ ls -l root-file
-rwsr-sr-x 1 root root 15 Nov 18 02:20 root-file
Since the set-user-id bit is set for this file, I would expect that on executing this
file, the effective uid would be displayed as 0 even when a non-root user executes it (since set-user-id bit causes the process to be executed with the effective user-id of the owner of the file, which in this case is root). However, instead I get the following output on executing "root-file" from a non-root shell.
$ ./root-file
1000
uid=1000(chanakya) gid=1000(chanakya) groups=1000(chanakya),4(adm),20(dialout),24(cdrom),46(plugdev),105(lpadmin),119(admin),122(sambashare)
This file/or script is not being executed with effective user-id 0. Why is that so?
you cannot use setuid on shell scripts...
if you absolutely need to use setuid checkout http://isptools.sourceforge.net/suid-wrap.html
Normally something like this could also be established using some custom sudo configuration...

Resources