Confusion with pid's and processes on linux - linux

from reading docs and online most people have been saying that to kill a process in linux, only the command kill "pid" is needed.
For example to kill memcached would be kill $(cat memcached.pid)
But for pretty much every process that i've tried to kill including the one above, this would not work. I managed to get it to work with a different command:
ps aux | grep (process name here)
That command, for whatever reason would get a different pid, which would work when killing the program.
I guess my question is, why are there different pid's? Isn't the point of an id to be unique? Why do celery, memcached, and other processes all have a different pid's when using the aux | grep command, versus the pid in the .pid file? Is this some kinda error on my configuration or is it ment to be like this?
Also, where is it possible to get all arguments and descriptions for an executable in linux?
I know the "man" command is useful for some functions, but it wont work for many executables, like celery for example.
Thanks!

The process ID (pid) is assigned by the operating system on-the-fly when a process starts up. It's unique in the sense that no two processes have the same ID. However, the actual value is not guaranteed to be the same from one run of the process to another. The best way to think of it is like those "now serving" tickets:
You are correct that you can look up an ID via ps and grep, though you may find it easier to just use:
pgrep (process name here)
Also, if you just want to kill the process, you can even skip the above step and use:
pkill (process name here)

Related

How to reference a process reliably (using a tag or something similar)?

I have multiple processes (web-scrapers) running in the background (one scraper for each website). The processes are python scripts that were spawned/forked a few weeks ago. I would like to control (they listen on sockets to enable IPC) them from one central place (kinda like a dispatcher/manager python script), while the processes (scrapers) remain individual unrelated processes.
I thought about using the PID to reference each process, but that would require storing the PID whenever I (re)launch one of the scrapers because there is no semantic relation between a number and my use case. I just want to supply some text-tag along with the process when I launch it, so that I can reference it later on.
pgrep -f searches all processes by their name and calling pattern (including arguments).
E.g. if you spawned a process as python myscraper --scrapernametag=uniqueid01 then you can run:
TAG=uniqueid01; pgrep -f "scrapernametag=$TAG"
to discover the PID of a process later down the line.

What exactly is doing ps -aux in terminal?

Command ps statically lists all processes. What exactly is doing -aux option?
a - all processes
u - user
x - execute
Something more?
I found a nice explanation here: http://www.linfo.org/ps.html
A common and convenient way of using ps to obtain much more complete information about the processes currently on the system is to use the following:
ps -aux | less
The -a option tells ps to list the processes of all users on the system rather than just those of the current user, with the exception of group leaders and processes not associated with a terminal. A group leader is the first member of a group of related processes.
The -u option tells ps to provide detailed information about each process. The -x option adds to the list processes that have no controlling terminal, such as daemons, which are programs that are launched during booting (i.e., computer startup) and run unobtrusively in the background until they are activated by a particular event or condition.

Monitor multiple instances of same process

I'm trying to monitor multiple instances of the same process. I can't for the life of me do this without running into a problem.
All the examples I have seen so far on the internet involve me writing out the PID or monitoring the process itself. The issue is that if one instance fails, it doesn't mean all the rest have failed as well.
In order for me to write out the PID for each process it would mean I'd probably have to run each process with a short delay to record the correct, seeing as the way I need to record the PID is done through the process name being probed.
If I'm wrong on this, please correct me. But so far I haven't found a way to monitor each individual process, which all have the same name.
To add to the above, the processes are run in a batch script and each one is run in its own screen (ffmpeg would otherwise not be able to run in the background).
If anyone can point me vaguely in the right direction on how to do this in Linux I would really appreciate it. I read somewhere that it would be possible to set up symlinks which would then give me fake process names and that way I can monitor the 'fake' process name.
man wait. For example, in shell script:
wget "$url1" &
pid1=$!
wget "$url2" &
pid2=$!
wait $pid1 $pid2
will launch both wget processes, and wait until both processes are finished (or failed)

How to set process ID in Linux for a specific program

I was wondering if there is some way to force to use some specific process ID to Linux to some application before running it. I need to know in advance the process ID.
Actually, there is a way to do this. Since kernel 3.3 with CONFIG_CHECKPOINT_RESTORE set(which is set in most distros), there is /proc/sys/kernel/ns_last_pid which contains last pid generated by kernel. So, if you want to set PID for forked program, you need to perform these actions:
Open /proc/sys/kernel/ns_last_pid and get fd
flock it with LOCK_EX
write PID-1
fork
VoilĂ ! Child will have PID that you wanted.
Also, don't forget to unlock (flock with LOCK_UN) and close ns_last_pid.
You can checkout C code at my blog here.
As many already suggested you cannot set directly a PID but usually shells have facilities to know which is the last forked process ID.
For example in bash you can lunch an executable in background (appending &) and find its PID in the variable $!.
Example:
$ lsof >/dev/null &
[1] 15458
$ echo $!
15458
On CentOS7.2 you can simply do the following:
Let's say you want to execute the sleep command with a PID of 1894.
sudo echo 1893 > /proc/sys/kernel/ns_last_pid; sleep 1000
(However, keep in mind that if by chance another process executes in the extremely brief amount of time between the echo and sleep command you could end up with a PID of 1895+. I've tested it hundreds of times and it has never happened to me. If you want to guarantee the PID you will need to lock the file after you write to it, execute sleep, then unlock the file as suggested in Ruslan's answer above.)
There's no way to force to use specific PID for process. As Wikipedia says:
Process IDs are usually allocated on a sequential basis, beginning at
0 and rising to a maximum value which varies from system to system.
Once this limit is reached, allocation restarts at 300 and again
increases. In Mac OS X and HP-UX, allocation restarts at 100. However,
for this and subsequent passes any PIDs still assigned to processes
are skipped
You could just repeatedly call fork() to create new child processes until you get a child with the desired PID. Remember to call wait() often, or you will hit the per-user process limit quickly.
This method assumes that the OS assigns new PIDs sequentially, which appears to be the case eg. on Linux 3.3.
The advantage over the ns_last_pid method is that it doesn't require root permissions.
Every process on a linux system is generated by fork() so there should be no way to force a specific PID.
From Linux 5.5 you can pass an array of PIDs to the clone3 system call to be assigned to the new process, up to one for each nested PID namespace, from the inside out. This requires CAP_SYS_ADMIN or (since Linux 5.9) CAP_CHECKPOINT_RESTORE over the PID namespace.
If you a not concerned with PID namespaces use an array of size one.

Advanced pgrep-like process search in bash

I need to find the pid of a certain java process in bash on linux.
If there's only one java process,
PID=$(pgrep java)
works.
For multiple java processes it becomes more complicated. Manually, I run pstree, find the ancestor of the java process that I need first, then find the java process in question. Is it possible to do this in bash? Basically I need the functionality that in pseudo-code looks like:
Having `processname1` and `processname2`
and knowing that `processname2` is in the subtree of 'processname1',
find the pid of `processname2`.
In this example the java process will be processname2.
Reformulating your psuedo-code question: find all processname2 processes which have a processname1 process as parent. This can be directly expressed using the following nested pgrep call:
pgrep -P $(pgrep -d, processname1) processname2
Here's the documentation for those flag straight from the pgrep(1) manpage:
-d delimiter
Sets the string used to delimit each process ID in the output
(by default a newline).
-P ppid,...
Only match processes whose parent process ID is listed.
Note that this will only work if processname2 is an immediate child process of processname1.

Resources