Does there exist a dummy command that doesn't change input and output? - linux

Is there a dummy command in Linux that passes its input through to its output unchanged and accepts arguments that it ignores?
cmd1 | dummy --tag="some unique string here" | cmd2
My interest in such a dummy command is to be able to identify the processes when I have multiple cmd1 | cmd2 running.

You can use a variable for that:
cmd1 | USELESS_VAR="some unique string here" cmd2
This has the advantage, as Jonathan noted in a comment, of not performing an extra copy of all the data.
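To find the tagged process again later, one possibility on Linux is to scan /proc/PID/environ, which holds each process's initial environment as NUL-separated NAME=VALUE entries. The sketch below assumes a hypothetical helper named find_tagged; it is not a standard tool, and it only sees processes whose environ file the current user is allowed to read.

```shell
# find_tagged NAME=VALUE: print the PID of every process whose initial
# environment contains exactly that entry. Hypothetical helper, Linux only.
find_tagged() {
    for envfile in /proc/[0-9]*/environ; do
        # Entries are NUL-separated; turn them into lines and look for an
        # exact, fixed-string match. Unreadable or vanished files are skipped.
        if { tr '\0' '\n' < "$envfile"; } 2>/dev/null | grep -qxF -- "$1"; then
            pid=${envfile#/proc/}
            echo "${pid%/environ}"
        fi
    done
}

# Usage, with the tag from the answer above:
#   cmd1 | USELESS_VAR="some unique string here" cmd2 &
#   find_tagged 'USELESS_VAR=some unique string here'
```

Only cmd2 carries the tag here; cmd1 would need the same assignment in front of it to be found as well.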

To create the command that you want, make a file called dummy with the contents:
#!/bin/sh
cat
Place the file in your PATH and make it executable (chmod a+x dummy). Because dummy ignores its arguments, you may place any argument on the command line that you choose. Because the body of dummy runs cat (with no arguments), dummy will echo stdin to stdout.
Alternative
My interest in such a dummy command is to be able to identify the processes when I have multiple cmd1 | cmd2 running.
As an alternative, consider
pgrep cmd1
It will show a process ID for every copy of cmd1 that is running. As an example:
$ pgrep getty
3239
4866
4867
4893
This shows that four copies of getty are running.

I'm not quite clear on your use case, but if you want to be able to "tag" different invocations of a command for the convenience of a sysadmin looking at a ps listing, you might be able to rename argv[0] to something suitably informative:
$:- perl -E 'exec {shift} "FOO", @ARGV' sleep 60 &
$:- perl -E 'exec {shift} "BAR", @ARGV' sleep 60 &
$:- perl -E 'exec {shift} "BAZ", @ARGV' sleep 60 &
$:- ps -o cmd=
-bash
FOO 60
BAR 60
BAZ 60
ps -o cmd
You don't have to use perl's exec. There are other conveniences for this, too, such as DJB's argv0, RedHat's doexec, and bash itself.
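For the bash variant, a minimal sketch could rely on the -a option of bash's exec builtin, which sets argv[0] for the command being executed. The helper name run_as is hypothetical. Programs that choose their behaviour from argv[0] (multi-call binaries such as busybox) may not tolerate being renamed this way.

```shell
# run_as TAG CMD [ARGS...]: run CMD with argv[0] replaced by TAG.
# run_as is a hypothetical helper name; it needs bash for "exec -a".
run_as() {
    tag=$1
    shift
    # Inside the inner script, $0 is TAG and "$@" is CMD plus its arguments.
    bash -c 'exec -a "$0" "$@"' "$tag" "$@"
}

# Usage, mirroring the perl example above:
#   run_as FOO sleep 60 &
#   ps -o cmd=
```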

You should find a better way of identifying the processes, but anyway, you can use:
ln -s /dev/null /tmp/some-unique-file
cmd1 | tee /tmp/some-unique-file | cmd2

Related

How to detect if a bash script is already running, considering its arguments [duplicate]

Sorry for my poor english ;)
I need to check if a script is already running or not. I don't want to use a lock file, as it can be tricky (ie: if my script wrote a lock file, but crashed, I will consider it as running).
I also need to take parameters into account. ie:
test.sh 123
should be considered as a different process than
test.sh 456
I tried this :
#!/bin/bash
echo "inside test.sh, script name with arguments: $0 +$*$"
echo " simple pgrep on script name with arguments:"
pgrep -f "$0 +$*$"
echo " counting simple pgrep on script name with arguments with wc -l"
echo $(pgrep -f "$0 +$*$" | wc -l)
echo " counting pgrep echo result with wc -w"
processes=$(pgrep -f "$0 +$*$")
nbProcesses=$(echo $processes | wc -w)
echo $nbProcesses
sleep 300
When I try, I get this result:
[frederic.charrier@charrier tmp]$ /tmp/test.sh 123
inside test.sh, script name with arguments: /tmp/test.sh +123$
simple pgrep on script name with arguments:
123976
counting simple pgrep on script name with arguments with wc -l
2
counting pgrep echo result with wc -w
1
^Z
[1]+ Stopped /tmp/test.sh 123
[frederic.charrier@charrier tmp]$ /tmp/test.sh 123
inside test.sh, script name with arguments: /tmp/test.sh +123$
simple pgrep on script name with arguments:
123976
124029
counting simple pgrep on script name with arguments with wc -l
3
counting pgrep echo result with wc -w
2
My questions are:
when I run the script the first time, it's running once. So pgrep is returning only one result: 123976, which is fine. But why a "wc -l" on 123976 is returning 2?
when I run the script a second time, I get the same strange behavior: pgrep returns the correct result, pgrep | wc -l returns something wrong, and "echo pgrep ... | wc -w" returns the correct result. Why?
How to detect if a bash script is already running
If you are aware of the drawbacks of your method, using pgrep looks fine. Note that both $0 and $* can contain regex metacharacters, so you have to escape them first. I think I would also use pgrep -f "^$0... to match from the beginning.
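The escaping step could be sketched as follows. escape_ere is a hypothetical helper name; it puts a backslash in front of each character that is special in the extended regular expressions pgrep uses, so that the script path and its arguments are matched literally.

```shell
# escape_ere STRING: print STRING with ERE metacharacters backslash-escaped.
# Hypothetical helper; the bracket expression lists the characters to escape.
escape_ere() {
    printf '%s' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

# Usage, following the pattern from the question:
#   pattern="$(escape_ere "$0") +$(escape_ere "$*")\$"
#   pgrep -f "$pattern"
```

Without this, the dot in /tmp/test.sh matches any character, and an argument such as a+b would be read as a repetition rather than as literal text.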
why a "wc -l" on 123976 is returning 2?
Because command substitution $(..) spawns a subshell, so there are two shells running, when pgrep is executed.
Overall, echo $(cmd) is an antipattern. Just run cmd directly.
In some cases, such as when there is a single command inside the command substitution, bash optimizes by replacing (exec) the subshell with the command itself, effectively eliminating the subshell. That's why processes=$(pgrep ..) returns 1.
Why?
There is one more process running.

/usr/bin/time only timing first component of pipeline

I am trying to time the following command and eventually want to output the result to a file (hence the use of /usr/bin/time). However, the command being timed is always seq 1 2 rather than the entirety of the command:
/usr/bin/time -v seq 1 2 | parallel ssh-keygen -G /folder/{}.candidate -b 768
Is there a way to make sure the whole command is timed and not just the first little part?
If you are using bash it is easier to use the bash built-in:
$ time cmd1 | cmd2 | cmd3
will time the entire pipe.
If you really want to use the external time command, you have to run the pipe in a sub-shell:
$ /usr/bin/time sh -c 'cmd1 | cmd2 | cmd3'
or else you'd be piping the time instead of timing the pipe.
This syntax also has the nice side effect that it is easy to separate the output of the command and that of time:
$ /usr/bin/time sh -c 'cmd1 | cmd2 > cmdout.txt 2> cmderr.txt' 2>> time.txt
However, it has an extra difficulty in the quotes. If your command itself contains quotes, you may need to escape them in hard to read ways.
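One way to sidestep part of the quoting problem described above is to keep awkward strings out of the quoted script and pass them as positional parameters instead. This is a sketch: the variable name pattern and the tr stage are illustrative stand-ins for the real pipeline.

```shell
# A value containing both kinds of quotes (illustrative only).
pattern="it's a \"test\""

# The word after the script text becomes $0, the next one becomes $1,
# so the value never has to be embedded inside the single-quoted script.
sh -c 'printf "%s\n" "$1" | tr a-z A-Z' sh "$pattern"   # prints: IT'S A "TEST"

# The same shape with the external time command:
#   /usr/bin/time sh -c 'cmd1 "$1" | cmd2' sh "$pattern" 2>> time.txt
```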

Linux: start a script after another has finished

I read the answer to this question on Stack Overflow, but I am so new to writing shell scripts that I did something wrong. The following are my scripts:
testscript:
#!/bin/csh -f
pid=$(ps -opid= -C csh testscript1)
while [ -d /proc/$pid ] ; do
sleep 1
done && csh testscript2
exit
testscript1:
#!/bin/csh -f
/usr/bin/firefox
exit
testscript2:
#!/bin/csh -f
echo Done
exit
The purpose is for testscript to call testscript1 first; once testscript1 has finished (which means the firefox started in testscript1 has been closed), testscript will call testscript2. However, I got this result after running testscript:
$ csh testscript
Illegal variable name.
Please help me with this issue. Thanks ahead.
I believe this line is not CSH:
pid=$(ps -opid= -C csh testscript1)
In general in csh you define variables like this:
set pid=...
I am not sure what the $() syntax is; perhaps backticks would work as a replacement:
set pid=`ps -opid= -C csh testscript1`
Perhaps you didn't notice that the scripts you found were written for bash, not csh, but
you're trying to process them with the csh interpreter.
It looks like you've misunderstood what the original code was trying to do -- it was
intended to monitor an already-existing process, by looking up its process id using the process name.
You seem to be trying to start the first process from inside the ps command. But
in that case, there's no need for you to do anything so complicated -- all you need
is:
#!/bin/csh
csh testscript1
csh testscript2
Unless you go out of your way to run one of the scripts in the background,
the second script will not run until the first script is finished.
Although this has nothing to do with your problem, csh is more oriented toward
interactive use; for script writing, it's considered a poor choice, so you might be
better off learning bash instead.
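If the goal really is to wait for a process that is already running, which is what the original loop over /proc/$pid was doing, a minimal sketch in sh (not csh) could look like this. wait_for_pid is a hypothetical helper name, and the /proc test makes it Linux-specific.

```shell
# wait_for_pid PID: return once /proc/PID has disappeared.
# Hypothetical helper; relies on the Linux /proc filesystem.
wait_for_pid() {
    # Refuse an empty argument, otherwise "/proc/" itself would always exist.
    [ -n "$1" ] || return 1
    while [ -d "/proc/$1" ]; do
        sleep 1
    done
}

# Usage, where 12345 stands for the PID of the running testscript1:
#   wait_for_pid 12345 && sh testscript2
```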
Try the script below. It checks for testscript1's PID and, if it is not found, executes testscript2:
sp=$(ps -ef | grep testscript1 | grep -v grep | awk '{print $2}')
/bin/ls -l /proc/ | grep $sp > /dev/null 2>&1 && sleep 0 || /bin/csh testscript2

How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
I'm aware I could write a small bash script that each contain the command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts each with a slightly different command. When scripting this solution can start to feel very hackish. An example of such a script in Python:
for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    script = "job" + str(i) + ".sh"
    open(script, "w").write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
Although my answer is a bit late, I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job, you need to do a couple of things. But first, note that using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than being parallelized).
Run qsub in binary mode, since by default qsub treats the command as a job script. For that, pass the "-b y" flag to qsub and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
Echo each call to qsub and then pipe that to the shell.
Suppose you have a file params-query.txt which hold several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job, first read the command lines from STDIN through xargs. Notice that the quotes around the braces are important if you are submitting a command made of piped parts; that way the entire query is treated as a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
If that didn't work as expected then you're probably better off generating an intermediate output between bowtie and samtools before calling samtools to accept that intermediate output. You won't need to change the qsub call through xargs but the code in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}

Why are commands executed in backquotes giving me different results when done in a script?

I have a script that I mean to be run from cron that ensures that a daemon that I wrote is working. The contents of the script file are similar to the following:
daemon_pid=`ps -A | grep -c fsdaemon`
echo "daemon_pid: " $daemon_pid
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon start
fi
When I execute this script from the command prompt, the line that echoes the value of $daemon_pid is reporting a value of 2. This value is two regardless of whether my daemon is running or not. If, however, I execute the command with back quotes and then examine the $daemon_pid variable, the value of $daemon_pid is now one. I have also tried single stepping through the script using bashdb and, when I examine the variables using that tool, they are what they should be.
My question therefore is: why is there a difference in the behaviour between when the script is executed by the shell versus when the commands in the script are executed manually? I'm sure that there is something very fundamental that I am missing.
You're very likely encountering the grep as part of the 'answer' from ps.
To help fully understand what is happening, turn off the -c option to see what data is being returned from just ps -A | grep fsdaemon.
To solve the issue, some systems have a p(rocess)grep (pgrep). That will work, OR
ps -A | grep -v grep | grep -c fsdaemon
Is a common idiom you will see, but at the expense of another process.
The cleanest solution is,
ps -A | grep -c '[f]sdaemon'
The regular expression syntax should work with all greps, on all systems.
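The effect of the bracket trick can be seen without a real daemon. In this sketch the two printf lines stand in for ps output that includes arguments (for example from ps -ef); the function names and PIDs are made up for the illustration.

```shell
# Fake process listings: the daemon plus the grep that is searching for it.
fake_ps_plain()   { printf '%s\n' '101 fsdaemon' '102 grep -c fsdaemon'; }
fake_ps_bracket() { printf '%s\n' '101 fsdaemon' '102 grep -c [f]sdaemon'; }

# A plain pattern also matches the grep command line itself.
fake_ps_plain | grep -c 'fsdaemon'        # prints 2

# [f]sdaemon still matches "fsdaemon", but the grep command line now
# contains the literal text "[f]sdaemon", which the pattern does not match.
fake_ps_bracket | grep -c '[f]sdaemon'    # prints 1
```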
I hope this helps.
The problem is that grep itself shows up... Try running this command with anything after grep -c:
eple:~ erik$ ps -a | grep -c asdfladsf
1
eple:~ erik$ ps -a | grep -c gooblygoolbygookeydookey
1
eple:~ erik$
What does ps -a | grep fsdaemon return? Just look at the processes actually listed... :)
Since this is Linux, why not try the pgrep? This saves you a pipe, and you don't end up with grep reporting back the daemon script itself running.
Any process with arguments including that name will add to the count: grep, and your script.
psing for a process isn't really reliable, you should use a lock file.
As several people have pointed out already, your process count is inflated because ps | grep detects (1) the script itself and (2) the subprocess created by the backquotes, which inherits the name of the main script. So an easy solution is to change the name of the script to something that doesn't include the name you're looking for. But you can do better.
The "best-practice" solution that I would suggest is to use the facilities provided by your operating system. It's not uncommon for an init script to create a PID file as part of the process of starting your daemon; in other words, instead of just running the daemon itself, you use a wrapper script that starts the daemon and then writes the process ID to a file somewhere. If start-stop-daemon exists on your system (and I think it's fairly common these days), you can use that like so:
start-stop-daemon --start --quiet --background \
--make-pidfile --pidfile /var/run/fsdaemon.pid -- /usr/bin/fsdaemon
(obviously replace the path /usr/bin/fsdaemon as appropriate) to start it, and then
start-stop-daemon --stop --quiet --pidfile /var/run/fsdaemon.pid
to stop it. start-stop-daemon has other options that might be useful to you, which you can investigate by reading the man page.
If you don't have access to start-stop-daemon, you can write a wrapper script to do basically the same thing, something like this to start:
echo "$$" > /var/run/fsdaemon.pid
exec /usr/bin/fsdaemon
and this to stop:
kill $(< /var/run/fsdaemon.pid)
rm /var/run/fsdaemon.pid
(this is pretty crude, of course, but it should normally work).
Anyway, once you have the setup to generate a PID file, whether by using start-stop-daemon or not, you can update your check script to this:
daemon_pid=`ps --no-headers --pid $(< /var/run/fsdaemon.pid) | wc -l`
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon restart
fi
(one would think there would be a concise command to check whether a given PID is running, but I don't know it).
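One candidate for such a check is kill -0, which sends no signal and only reports whether the PID could be signalled. The sketch below wraps it in a hypothetical helper named pid_is_running. Note the limitation: kill -0 also fails for a live process that the caller has no permission to signal, so the check should run as the daemon's owner or as root.

```shell
# pid_is_running PID: succeed if PID exists and may be signalled by us.
# Hypothetical helper built on "kill -0", which delivers no signal.
pid_is_running() {
    kill -0 "$1" 2>/dev/null
}

# Usage with the PID file from the answer above:
#   if ! pid_is_running "$(cat /var/run/fsdaemon.pid)"; then
#       echo "restarting fsdaemon"
#       /etc/init.d/fsdaemon restart
#   fi
```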
If you don't want to (or can't) create a PID file, I would at least suggest pgrep instead of ps | grep, since pgrep will search directly for a process by name and won't find anything that just happens to include the same string.
daemon_pid=`pgrep -x -c fsdaemon`
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon restart
fi
The -x means "match exactly", and -c works as with grep.
By the way, it seems a bit misleading to name your variable daemon_pid when it is actually a count.

Resources