Cgroup unexpectedly propagates SIGSTOP to the parent - linux

I have a small script to run a command inside a cgroup that limits CPU time:
$ cat cgrun.sh
#!/bin/bash
if [[ $# -lt 1 ]]; then
echo "Usage: $0 <bin>"
exit 1
fi
sudo cgcreate -g cpu:/cpulimit
sudo cgset -r cpu.cfs_period_us=1000000 cpulimit
sudo cgset -r cpu.cfs_quota_us=100000 cpulimit
sudo cgexec -g cpu:cpulimit sudo -u $USER "$@"
sudo cgdelete cpu:/cpulimit
I let the command run: ./cgrun.sh /bin/sleep 10
Then I send SIGSTOP to the sleep process from another terminal. At that moment its parent processes, sudo and cgexec, appear to receive the signal as well and stop. When I then send SIGCONT to sleep, it resumes and runs to completion.
However, sudo and cgexec remain stopped and never reap the sleep zombie. I don't understand how this can happen, or how to prevent it. I also cannot send SIGCONT to sudo and cgexec myself, because I send the signals as a regular user while these commands run as root.
Here is how it looks in htop (some columns omitted):
PID USER S CPU% MEM% TIME+ Command
1222869 user S 0.0 0.0 0:00.00 │ │ └─ /bin/bash ./cgrun.sh /bin/sleep 10
1222882 root T 0.0 0.0 0:00.00 │ │ └─ sudo cgexec -g cpu:cpulimit sudo -u user /bin/sleep 10
1222884 root T 0.0 0.0 0:00.00 │ │ └─ sudo -u desertfox /bin/sleep 10
1222887 user Z 0.0 0.0 0:00.00 │ │ └─ /bin/sleep 10
How can I create a cgroup in such a way that SIGSTOP is not propagated to the parent processes?
Update:
If I start the process using systemd-run, I do not observe the same behavior:
sudo systemd-run --uid=$USER -t -p CPUQuota=10% sleep 10

Instead of using the cgroup tools (cgcreate, cgset, cgexec), I would do it the "hard way" with plain shell commands: create the cpulimit cgroup (a mkdir), set the CFS parameters (echo the values into the corresponding cpu.cfs_* files), start a sub-shell with the (...) notation, move that sub-shell into the cgroup (echo its PID into the cgroup's tasks file), and exec the requested command in the sub-shell.
With this approach, cgrun.sh would look like this:
#!/bin/bash
if [[ $# -lt 1 ]]; then
echo "Usage: $0 <bin>" >&2
exit 1
fi
CGTREE=/sys/fs/cgroup/cpu
sudo -s <<EOF
[ ! -d ${CGTREE}/cpulimit ] && mkdir ${CGTREE}/cpulimit
echo 1000000 > ${CGTREE}/cpulimit/cpu.cfs_period_us
echo 100000 > ${CGTREE}/cpulimit/cpu.cfs_quota_us
EOF
# Sub-shell in background
(
# Pid of the current sub-shell
# ($$ would return the pid of the parent process)
MY_PID=$BASHPID
# Move current process into the cgroup
sudo sh -c "echo ${MY_PID} > ${CGTREE}/cpulimit/tasks"
# Run the command with calling user id (it inherits the cgroup)
exec "$@"
) &
# Wait for the sub-shell
wait $!
# Exit code of the sub-shell
rc=$?
# Delete the cgroup
sudo rmdir ${CGTREE}/cpulimit
# Exit with the return code of the sub-shell
exit $rc
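The difference between $$ and $BASHPID is what makes the script above work: the PID written to the tasks file has to be that of the sub-shell, not that of the script itself. A minimal sketch to see the difference (the variable names are only for illustration, and bash is invoked explicitly because BASHPID is a bash feature):

```shell
# Print the PID of the top-level bash, then "$$ $BASHPID" from inside a sub-shell.
# The trailing ":" only ensures the sub-shell is not the last command of the script.
pids=$(bash -c 'echo "$$"; ( echo "$$ $BASHPID" ); :')

top_pid=$(echo "$pids" | sed -n '1p')
sub_dollar=$(echo "$pids" | sed -n '2p' | cut -d' ' -f1)
sub_bashpid=$(echo "$pids" | sed -n '2p' | cut -d' ' -f2)

echo "top-level shell PID      : $top_pid"
echo "\$\$ inside the sub-shell   : $sub_dollar"
echo "BASHPID inside sub-shell : $sub_bashpid"
```

The first two values should be identical, while the third should be a different PID.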
Before running cgrun.sh, get the PID of the current shell so that the process hierarchy can be displayed from another terminal:
$ echo $$
112588
$ ./cgrun.sh /bin/sleep 50
This creates the following process hierarchy:
$ pstree -p 112588
bash(112588)-+-cgrun.sh(113079)---sleep(113086)
Stop the sleep process:
$ kill -STOP 113086
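To check from the shell which processes are actually stopped, you can look at the state column of ps (T means stopped, as in the htop output from the question). A small self-contained sketch, using a throwaway sleep process instead of the real one; the variable names are mine:

```shell
# Start a disposable process and remember its PID.
sleep 30 &
pid=$!

# Stop it, give the kernel a moment, then read its state letter(s).
kill -STOP "$pid"
sleep 1
stopped_state=$(ps -o stat= -p "$pid" | tr -d ' ')

# Resume it and read the state again.
kill -CONT "$pid"
sleep 1
resumed_state=$(ps -o stat= -p "$pid" | tr -d ' ')

# Clean up the disposable process.
kill "$pid"
wait "$pid" 2>/dev/null || true

echo "after SIGSTOP: $stopped_state, after SIGCONT: $resumed_state"
```

Running the same ps command against the PID of the parent (cgrun.sh in the hierarchy above) shows whether the stop was propagated to it.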
Look at the cgroup to verify that the sleep command is running in it (its PID is in the tasks file) and that the CFS parameters are set correctly:
$ ls -l /sys/fs/cgroup/cpu/cpulimit/
total 0
-rw-r--r-- 1 root root 0 nov. 5 22:38 cgroup.clone_children
-rw-r--r-- 1 root root 0 nov. 5 22:38 cgroup.procs
-rw-r--r-- 1 root root 0 nov. 5 22:36 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 nov. 5 22:36 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpu.shares
-r--r--r-- 1 root root 0 nov. 5 22:38 cpu.stat
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpu.uclamp.max
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpu.uclamp.min
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.stat
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_all
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_sys
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_user
-rw-r--r-- 1 root root 0 nov. 5 22:38 notify_on_release
-rw-r--r-- 1 root root 0 nov. 5 22:36 tasks
$ cat /sys/fs/cgroup/cpu/cpulimit/tasks
113086 # This is the pid of sleep
$ cat /sys/fs/cgroup/cpu/cpulimit/cpu.cfs_*
1000000
100000
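As a sanity check on those two values: the quota is the CPU time allowed per period, so 100000 µs out of every 1000000 µs is 10% of one CPU, the same ratio as CPUQuota=10% in the systemd-run command from the question. A tiny helper to compute it (cfs_percent is a name I made up; it assumes a positive quota, since a quota of -1 means "no limit" in this cgroup v1 interface):

```shell
# Effective CPU limit, as a percentage of one CPU, from the two CFS values.
# Integer arithmetic only, so fractions of a percent are truncated.
cfs_percent() {
    quota_us=$1
    period_us=$2
    echo $(( $quota_us * 100 / $period_us ))
}

# Values used in cgrun.sh: prints 10
cfs_percent 100000 1000000
```

On a live system the arguments can be read straight from the cpu.cfs_quota_us and cpu.cfs_period_us files shown above.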
Send SIGCONT signal to the sleep process:
$ kill -CONT 113086
The process finishes and the cgroup is destroyed:
$ ls -l /sys/fs/cgroup/cpu/cpulimit
ls: cannot access '/sys/fs/cgroup/cpu/cpulimit': No such file or directory
Get the exit code of the script once it is finished (it is the exit code of the launched command):
$ echo $?
0
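The exit code is passed through because wait returns the status of the job it waited for. A stripped-down sketch of just that mechanism, with a sub-shell that exits with an arbitrary status of 3 standing in for the real command:

```shell
# Background sub-shell standing in for the launched command.
( exit 3 ) &

# wait returns the exit status of the job it waited for.
# The "|| rc=$?" form captures it even if the shell runs with "set -e".
rc=0
wait "$!" || rc=$?

echo "sub-shell exited with status $rc"
```

In cgrun.sh the same status is saved in rc before the cgroup is removed, so the rmdir does not overwrite it.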

Related

How to access the scratch folder of a SLURM cluster node

I would appreciate your suggestions and advice on the following.
I am using a SLURM cluster, and my colleagues have advised me to run a Singularity container on the cluster and redirect its output to a folder in the /scratch directory of each compute node.
For example:
singularity exec --bind /local/scratch/bt:/output \
singularity_latest.sif run \
-o /output
My question: how can I access the "output" folder in the scratch directory of the compute node? Thanks a lot!
bogdan
You can think of --bind as a bit like a symlink. Running ls /local/scratch/bt on the host OS is equivalent to running ls /output inside the exec process.
mkdir scratch
touch scratch/file1
ls -l scratch
# total 0
# -rw-rw-r-- 1 tsnowlan tsnowlan 0 Jun 8 09:13 file1
singularity exec -B $PWD/scratch:/output my_image.sif ls -l /output
# total 0
# -rw-rw-r-- 1 tsnowlan tsnowlan 0 Jun 8 09:13 file1
# singularity also accepts relative paths
singularity exec -B scratch:/output my_image.sif touch /output/file2
ls -l scratch
# total 0
# -rw-rw-r-- 1 tsnowlan tsnowlan 0 Jun 8 09:13 file1
# -rw-rw-r-- 1 tsnowlan tsnowlan 0 Jun 8 09:16 file2

Why does Monit create multiple processes if the script is already running?

myscript.sh is already running when I start Monit with this config:
set daemon 20 with start delay 5
check program myscript with path "/home/myscript.sh"
if status != 0 then exec "/home/myscript.sh"
or
set daemon 20 with start delay 5
check program myscript with path "/bin/bash /home/myscript.sh"
if status != 0 then exec "/home/myscript.sh"
Is my config wrong? Why does Monit create a new process?
# ps -ef | grep myscript.sh
root 1580 1571 0 13:29 ? /bin/bash /home/myscript.sh < created by monit
root 32675 15735 0 13:23 pts/2 /bin/bash /home/myscript.sh

Cannot run ANY shell scripts even when root [duplicate]

When trying to run a TeamSpeak server and a Minecraft server on a newly rented VPS, I ran into serious trouble: whenever I try to run a shell script, even as root, it does not work.
One script: spigot.sh
#!/bin/sh
BINDIR=$(dirname "$(readlink -fn "$0")")
cd "$BINDIR"
java -Xms5G -Xmx7G -XX:MaxPermSize=128M -jar spigot.jar
Error after trying to use this as root
root@vps23946:/home/user/minecraft# ./spigot.sh
-bash: ./spigot.sh: Permission denied
Error after trying to use this as user
user@vps23946:~/minecraft$ ./spigot.sh
-bash: ./spigot.sh: Permission denied
Results from ls -l
root@vps23946:/home/user/minecraft# ls -l
total 22616
drwxr-xr-x 16 user root 4096 Jun 6 22:39 backups
-rw-r--r-- 1 user root 2 Jun 7 13:54 banned-ips.json
-rw-r--r-- 1 user root 110 May 25 17:32 banned-ips.txt.converted
-rw-r--r-- 1 user root 229 Jun 7 13:54 banned-players.json
-rw-r--r-- 1 user root 267 May 25 17:32 banned-players.txt.converted
-rw-r--r-- 1 user root 1474 Jun 7 13:54 bukkit.yml
-rw-r--r-- 1 user root 610 Jun 7 13:54 commands.yml
drwxr-xr-x 2 user root 4096 Jun 6 19:56 crash-reports
drwxr-xr-x 2 user root 4096 Jun 7 13:54 C:\Users\Rory Finnegan\Desktop\Prep server\backups
drwxr-xr-x 6 user root 4096 Jun 7 14:25 flat
-rw-r--r-- 1 user root 2576 Apr 3 16:04 help.yml
drwxr-xr-x 2 user root 4096 Jun 7 13:54 logs
-rw-r--r-- 1 user root 415 Jun 7 13:54 ops.json
-rw-r--r-- 1 user root 191 May 28 19:02 ops.txt.converted
-rw-r--r-- 1 user root 0 Apr 3 16:05 permissions.yml
drwxr-xr-x 27 user root 4096 Jun 6 22:39 plugins
-rw-r--r-- 1 user root 768 Jun 7 13:54 server.properties
-rw-r--r-- 1 user root 23053543 May 30 15:48 spigot.jar
-rw-r--r-- 1 user root 122 Jun 7 13:36 spigot.sh
-rw-r--r-- 1 user root 2749 Jun 7 13:54 spigot.yml
-rw-r--r-- 1 user root 2404 Jun 7 14:07 usercache.json
-rw-r--r-- 1 user root 1588 Apr 3 16:04 wepif.yml
-rw-r--r-- 1 user root 783 Jun 6 16:21 whitelist.json
-rw-r--r-- 1 user root 250 May 3 19:31 white-list.txt.converted
drwxr-xr-x 7 user root 4096 Jun 7 14:25 world
drwxr-xr-x 6 user root 4096 Jun 7 14:25 world_nether
drwxr-xr-x 6 user root 4096 Jun 7 14:25 world_the_end
Second Script: ts3server_minimal_runscript.sh
#!/bin/sh
export LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"
D1=$(readlink -f "$0")
D2=$(dirname "${D1}")
cd "${D2}"
if [ -e ts3server_linux_x86 ]; then
if [ -z "`uname | grep Linux`" -o ! -z "`uname -m | grep 64`" ]; then
echo "Do you have the right TS3 Server package for your system? You have: ` uname` `uname -m`, not Linux i386."
fi
./ts3server_linux_x86 $@
elif [ -e ts3server_linux_amd64 ]; then
if [ -z "`uname | grep Linux`" -o -z "`uname -m | grep 64`" ]; then
echo "Do you have the right TS3 Server package for your system? You have: ` uname` `uname -m`, not Linux x86_64."
fi
./ts3server_linux_amd64 $@
elif [ -e ts3server_freebsd_x86 ]; then
if [ ! -z "`uname | grep Linux`" -o ! -z "`uname -m | grep 64`" ]; then
#
With these I get the same errors.
I am running Ubuntu Server 14.04
Scripts and programs must be executable to be invoked by name. Either use chmod to add the executable permission to the file (chmod a+x ./spigot.sh) or invoke an executable interpreter and pass in the script, e.g. /bin/sh ./spigot.sh
Try
chmod +x spigot.sh
and that will enable the script to be executed
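Both suggestions can be tried out safely in a throwaway directory. The sketch below recreates the situation from the question (a script with mode -rw-r--r--, like spigot.sh in the listing); the file name hello.sh and the variable names are just for illustration, and it assumes the temporary directory is not mounted noexec:

```shell
# Create a small script without the executable bit.
tmpdir=$(mktemp -d)
script="$tmpdir/hello.sh"
printf '#!/bin/sh\necho hello\n' > "$script"
chmod 644 "$script"

# Invoking it by name fails, even for root, because no x bit is set.
if "$script" 2>/dev/null; then before="ran"; else before="denied"; fi

# Passing it to an interpreter works regardless of the x bit.
via_sh=$(/bin/sh "$script")

# After adding the executable bit it can be invoked by name.
chmod +x "$script"
after=$("$script")

echo "by name before chmod: $before"
echo "via /bin/sh         : $via_sh"
echo "by name after chmod : $after"
rm -rf "$tmpdir"
```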

Can't find documents with MediaWiki 1.21 with the Lucene-search extension

We're running MediaWiki 1.21 on Ubuntu 12.04.3 with the Lucene-search extension 2.1.3 (from its build.properties file).
I followed the instructions for a Single Host Setup (using ant to build the jar), and Setting Up Suggestions for the Search Box. Things seemed to be working just fine. However, new documents aren't being matched by the type-ahead search feature. Looking at the filesystem, I see that there are various items in the application's indexes directory:
$ cd /usr/local/search/lucene-search-2/indexes
$ ls -l
total 24
drwxr-xr-x 10 root root 4096 Aug 20 2013 import
drwxr-xr-x 7 root root 4096 Apr 14 06:42 index
drwxr-xr-x 2 root root 4096 Apr 14 06:41 search
drwxr-xr-x 9 root root 4096 Aug 20 2013 snapshot
drwxr-xr-x 2 root root 4096 Aug 20 2013 status
drwxr-xr-x 8 root root 4096 Aug 20 2013 update
We have a daily cron job that runs the Lucene-search build command, which dumps the wiki database as xml, and then modifies files in the import and snapshot folders. I noticed that the job reads from the search folder, which contains symbolic links to the update folder:
$ ls -l search/
total 24
lrwxrwxrwx 1 root root 70 Feb 12 21:39 wikidb -> /usr/local/search/lucene-search-2/indexes/update/wikidb/20140212064727
lrwxrwxrwx 1 root root 73 Feb 12 21:39 wikidb.hl -> /usr/local/search/lucene-search-2/indexes/update/wikidb.hl/20140212064727
lrwxrwxrwx 1 root root 76 Apr 14 06:41 wikidb.links -> /usr/local/search/lucene-search-2/indexes/update/wikidb.links/20140414064150
lrwxrwxrwx 1 root root 77 Feb 12 21:39 wikidb.prefix -> /usr/local/search/lucene-search-2/indexes/update/wikidb.prefix/20140212064728
lrwxrwxrwx 1 root root 78 Feb 12 21:39 wikidb.related -> /usr/local/search/lucene-search-2/indexes/update/wikidb.related/20140212064713
lrwxrwxrwx 1 root root 76 Feb 12 21:39 wikidb.spell -> /usr/local/search/lucene-search-2/indexes/update/wikidb.spell/20140212064740
Only the wikidb.links entry is current. The others are a couple of months old, which makes me think I missed something in how our daily cron task is set up. Here's the job:
#!/bin/sh
log=/var/log/lucene-search-2-cron.log
(
echo "Building wiki lucene-search indexes ..."
cd /usr/local/search/lucene-search-2
./build
echo "Stopping the lsearchd service..."
service lsearchd stop
# ok, so stopping the service apparently doesn't mean that the processes are gone, whack them manually
# See tip on using the "[x]yz" character class option so you don't need the additional "grep -v xyz":
# http://stackoverflow.com/questions/3510673/find-and-kill-a-process-in-one-line-using-bash-and-regex
echo "Killing any lucene-search processes that didn't terminate..."
kill -9 $(ps -ef | grep '[l]search' | awk '{print $2}')
echo "Starting the lsearchd service..."
service lsearchd start
) > $log 2>&1
And here's the service script /etc/init.d/lsearchd:
#!/bin/sh -e
### BEGIN INIT INFO
# Provides: lsearchd
# Required-Start: $syslog
# Required-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 1
# Short-Description: Start the Lucene Search daemon
# Description: Provide a Lucene Search backend for MediaWiki. Copied by John Ericson from: http://ubuntuforums.org/showthread.php?t=1476445
### END INIT INFO
# Set to install directory of lucene-search. For example: /usr/local/lucene-search-2.1.3
LUCENE_SEARCH_DIR="/usr/local/search/lucene-search-2"
# Set username for daemon to run as. Can also use syntax "username:groupname" to also specify group for daemon to run as. For example: me:me
RUN_AS_USER="lsearchd"
OPTIONS="-configfile $LUCENE_SEARCH_DIR/lsearch.conf"
test -x $LUCENE_SEARCH_DIR/lsearchd || exit 0
test -n "$RUN_AS_USER" && CHUID_ARG="--chuid $RUN_AS_USER" || CHUID_ARG=""
if [ -f "/etc/default/lsearchd" ] ; then
. /etc/default/lsearchd
fi
. /lib/lsb/init-functions
case "$1" in
start)
cd $LUCENE_SEARCH_DIR
log_begin_msg "Starting Lucene Search Daemon..."
start-stop-daemon --start --quiet --oknodo --chdir $LUCENE_SEARCH_DIR --background $CHUID_ARG --exec $LUCENE_SEARCH_DIR/lsearchd -- $OPTIONS
log_end_msg $?
;;
stop)
log_begin_msg "Stopping Lucene Search Daemon..."
start-stop-daemon --stop --quiet --oknodo --retry 2 --chdir $LUCENE_SEARCH_DIR $CHUID_ARG --exec $LUCENE_SEARCH_DIR/lsearchd
log_end_msg $?
;;
restart)
$0 stop
sleep 1
$0 start
;;
reload|force-reload)
log_begin_msg "Reloading Lucene Search Daemon..."
start-stop-daemon --stop --signal 1 --chdir $LUCENE_SEARCH_DIR $CHUID_ARG --exec $LUCENE_SEARCH_DIR/lsearchd
log_end_msg $?
;;
status)
status_of_proc $LUCENE_SEARCH_DIR/lsearchd lsearchd && exit 0 || exit $?
;;
*)
log_success_msg "Usage: /etc/init.d/lsearchd {start|stop|restart|reload|force-reload|status}"
exit 1
esac
exit 0
Update #1:
I deleted the update directory and ran the build command manually from the console as root. As expected, it only generated the update/wikidb.links entry, none of the other folders exist. I reviewed my earlier setup notes, and don't see anything different, so how did those folders get created, and how do they get maintained?
Update #2:
I retraced my steps from the initial install, and couldn't see anything I missed. So on a chance, I stopped the service and ran lsearchd from the console, and it created the missing directories! So I terminated the process and tried things again: deleted the indexes folder and ran the cron script from the console as root. I confirmed that when run this way, lsearchd DID NOT create the missing directories. And of course, now I remember that I had run lsearchd from the console when initially setting things up, verifying that it was getting client queries for the wiki's Search input field. And these are the indexes it had been using for the lookups, which explains why new documents are not included.
Here is what the command looks like when run as a service:
$ ps -ef | grep [l]search
lsearchd 10192 1 0 14:02 ? 00:00:00 /bin/bash /usr/local/search/lucene-search-2/lsearchd -configfile /usr/local/search/lucene-search-2/lsearch.conf
lsearchd 10198 10192 0 14:02 ? 00:00:01 java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2/LuceneSearch.jar -Djava.rmi.server.hostname=AMWikiBugz -jar /usr/local/search/lucene-search-2/LuceneSearch.jar -configfile /usr/local/search/lucene-search-2/lsearch.conf
So the remaining question is:
Why does lsearchd NOT create the directories when run as a service?
This was a permissions issue. d'oh!
The cron job and service init scripts all execute as root, however the service process is instantiated as the lsearchd user. Once I changed ownership of /usr/local/search/lucene-search-2/indexes/ and all subdirectories to be owned by lsearchd:lsearchd, the lsearchd process was able to create the missing directories when run via the service under cron.
It would have helped if something along the way had logged an error message to syslog indicating that it couldn't write to the target folder.
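Since nothing logged the failure, a quick ownership check would have pointed at the problem sooner. A sketch of such a check (not_owned_by is a helper name I made up; the demo at the bottom uses a temporary directory and the current user so it can be run anywhere):

```shell
# List everything under a directory that is NOT owned by the given user.
# No output means the ownership is consistent.
not_owned_by() {
    owner=$1
    dir=$2
    find "$dir" ! -user "$owner" -print
}

# Demo on a throwaway directory owned by the current user.
demo_dir=$(mktemp -d)
touch "$demo_dir/segment"
leftovers=$(not_owned_by "$(id -un)" "$demo_dir")
echo "entries with a different owner: ${leftovers:-none}"
rm -rf "$demo_dir"
```

On the real system that would be not_owned_by lsearchd /usr/local/search/lucene-search-2/indexes, run as root. Anything it lists is a candidate for the chown described above.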

CouchDB won't let me DELETE. I think I have users set up correctly

I created two databases, "my_new_database" and "albums", and I cannot DELETE either of them. I believe I am still in "Admin Party" mode. To demonstrate the issue, I'll post some info below.
First, here is CouchDB running (started using the SystemV script via service):
$ ps aux | grep couch
couchdb 2939 0.0 0.2 108320 1528 ? S 20:45 0:00 /bin/sh -e /usr/bin/couchdb -a /etc/couchdb/default.ini -a /etc/couchdb/local.ini -b -r 0 -p /var/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb 2950 0.0 0.1 108320 732 ? S 20:45 0:00 /bin/sh -e /usr/bin/couchdb -a /etc/couchdb/default.ini -a /etc/couchdb/local.ini -b -r 0 -p /var/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb 2951 4.8 2.3 362168 14004 ? Sl 20:45 0:00 /usr/lib64/erlang/erts-5.8.5/bin/beam -Bd -K true -A 4 -- -root /usr/lib64/erlang -progname erl -- -home /usr/local/var/lib/couchdb -- -noshell -noinput -sasl errlog_type error -couch_ini /etc/couchdb/default.ini /etc/couchdb/local.ini /etc/couchdb/default.ini /etc/couchdb/local.ini -s couch -pidfile /var/run/couchdb/couchdb.pid -heart
couchdb 2959 0.0 0.0 3932 304 ? Ss 20:45 0:00 heart -pid 2951 -ht 11
ec2-user 2963 0.0 0.1 103424 828 pts/1 S+ 20:45 0:00 grep couch
Here is the output of the ".couch" databases I have ( shown for user ownership and permissions)
$ ls -lat /var/lib/couchdb
-rw-r--r-- 1 couchdb couchdb 23 Oct 11 20:45 couch.uri
drwxr-xr-x 3 couchdb couchdb 4096 Oct 11 19:35 .
-rw-r--r-- 1 couchdb couchdb 79 Oct 11 19:35 database2.couch
-rwxrwxrwx 1 couchdb couchdb 79 Oct 11 19:00 my_new_database.couch
-rw-r--r-- 1 couchdb couchdb 4182 Oct 4 21:52 albums.couch
-rw-r--r-- 1 couchdb couchdb 79 Oct 4 21:42 albums-backup.couch
-rw-r--r-- 1 couchdb couchdb 4185 Oct 4 21:30 _users.couch
drwxr-xr-x 18 root root 4096 Oct 4 20:58 ..
drwxr-xr-x 2 root root 4096 Oct 4 18:34 .delete
Here is my first attempt to DELETE
$ curl -X DELETE http://127.0.0.1:5984/my_new_database
{"error":"unauthorized","reason":"You are not a server admin."}
And my second attempt using an authenticated user.
$ curl -X DELETE http://brian:brian@127.0.0.1:5984/my_new_database
{"error":"error","reason":"eacces"}
The username/password of brian/brian was added to the [admin] section of /etc/couchdb/local.ini
Here is the output of my "_users" file. The "key" and "id" fields confuse me.
$ curl -X GET http://brian:brian@127.0.0.1:5984/_users/_all_docs
{"total_rows":1,"offset":0,"rows":[
{"id":"_design/_auth","key":"_design/_auth","value":{"rev":"1-c44fb12a2676d481d235523092e0cec4"}}
]}
Have you restarted CouchDB after you added the user to local.ini? If so, has the password in the file been hashed, or is it still readable?
Generally your file permissions look OK, so I can't tell what exactly causes the error. For a quick fix you can simply delete the .couch file, though.
This question is really old, but since I got bitten by this today and this is where Google led me, I thought I'd share my solution for others that stumble here. In my case, my Couch lib directory (/usr/local/var/lib/couchdb for me) had a directory called .delete that was owned by root. Changing the owner to couchdb let me delete databases again.
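The ls -lat listing in the question shows the same thing: .delete is owned by root while everything else belongs to couchdb. Because it is a hidden directory, it is easy to miss without the -a flag. A small sketch for checking the owner of such a directory (owner_of is a name I made up; the demo uses a temporary directory rather than the real CouchDB one):

```shell
# Print the owning user of a path (third column of "ls -ld").
owner_of() {
    ls -ld "$1" | awk '{print $3}'
}

# Demo: a hidden .delete directory inside a throwaway directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/.delete"
delete_owner=$(owner_of "$demo_dir/.delete")
echo ".delete is owned by: $delete_owner"
rm -rf "$demo_dir"
```

On the server from the question, owner_of /var/lib/couchdb/.delete would be expected to print root, matching the listing. Changing that owner to couchdb is the fix described above.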
