Finding a process ID given a socket and inode in Python 3 - linux

/proc/net/tcp gives me a local address, port, and inode number for a socket (0.0.0.0:5432 and 9289, for example).
I'd like to find the PID for a specific process, given the above information.
It's possible to open every numbered folder in /proc, and then check symlinks for matching socket/inode numbers with a shell command like "$ sudo ls -l /proc/*/fd/ 2>/dev/null | grep socket". However, this seems more computationally expensive than necessary, since <5% of the processes on any given system have open TCP sockets.
What's the most efficient way to find the PID which has opened a given socket? I'd prefer to use standard libraries, and I'm currently developing with Python 3.2.3.
Edit: Removed the code samples from the question, since they are now included in the answer below.

The following code accomplishes the original goal:
import os

def find_pid(inode):
    # get a list of all files and directories in /proc
    procFiles = os.listdir("/proc/")
    # remove the pid of the current python process
    procFiles.remove(str(os.getpid()))
    # set up a list object to store valid pids
    pids = []
    for f in procFiles:
        try:
            # if the filename converts to an integer, it's a pid
            pids.append(str(int(f)))
        except ValueError:
            # if the filename doesn't convert to an integer, it's not a pid,
            # and we don't care about it
            pass
    for pid in pids:
        # check the fd directory for socket information; the process may
        # have exited (or be unreadable) by the time we get here
        try:
            fds = os.listdir("/proc/%s/fd/" % pid)
        except OSError:
            continue
        for fd in fds:
            # return the pid whose fd symlink matches our inode
            if ('socket:[%d]' % inode) == os.readlink("/proc/%s/fd/%s" % (pid, fd)):
                return pid
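
For completeness, here is a sketch of pulling the inode for a given local port out of /proc/net/tcp, assuming the standard proc(5) layout in which the inode is the tenth whitespace-separated field (find_inode is a hypothetical helper name, not part of the answer above):

def find_inode(port):
    # skip the header line, then scan each connection line
    with open("/proc/net/tcp") as f:
        next(f)
        for line in f:
            parts = line.split()
            # field 2 is local_address as hex "ip:port"
            local_port = int(parts[1].split(':')[1], 16)
            if local_port == port:
                return int(parts[9])  # field 10 is the socket inode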

I do not know how to do this in python, but you could use lsof(1):
lsof -i | awk -v sock=158384387 '$6 == sock{print $2}'
Here 158384387 is the inode of the socket. You can then call it from Python using subprocess.Popen.
You will have to use sudo(8) if you want to see sockets opened by other users.
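
If you go the lsof route from Python, a minimal sketch of the subprocess call might look like this (assuming, as in the awk script above, that the DEVICE column of lsof -i holds the socket inode on Linux; find_pid_lsof is a hypothetical name):

import subprocess

def find_pid_lsof(inode):
    # field 2 of each "lsof -i" line is the PID, field 6 the inode
    out = subprocess.check_output(["lsof", "-i"], universal_newlines=True)
    for line in out.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 6 and fields[5] == str(inode):
            return int(fields[1])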

Related

Prevent script running with same arguments twice

We are looking into building a logcheck script that will tail a given log file and send an email when the given arguments are found. I am having trouble accurately determining whether another instance of this script is already running with at least one of the same arguments against the same file. The script can take the following:
logcheck -i <filename(s)> <searchCriterion> <optionalEmailAddresses>
I have tried to use ps aux with a series of grep, sed, and cut, but it always ends up being more code than the script itself and seldom works very efficiently. Is there an efficient way to tell if another version of this script is running with the same filename and search criteria? A few examples of input:
EX1 ./logcheck -i file1,file2,file3 "foo string 0123" email@address.com
EX2 ./logcheck -s file1 Hello,World,Foo
EX3 ./logcheck -i file3 foo email@address1.com,email@address2.com
In this case, example 3 should not run because example 1 is already running with the parameters file3 and foo.
There are many solutions to your problem; I would recommend creating a lock file with the following format:
arg1Ex1 PID#(Ex1)
arg2Ex1 PID#(Ex1)
arg3Ex1 PID#(Ex1)
arg4Ex1 PID#(Ex1)
arg1Ex2 PID#(Ex2)
arg2Ex2 PID#(Ex2)
arg3Ex2 PID#(Ex2)
arg4Ex2 PID#(Ex2)
When your script starts:
It will search the file for all the arguments it has received (with awk or grep)
If one of the arguments is present in the list, fetch that process's PID (awk '{print $2}', for example) and check whether it is still running (ps). This double-check guards against concurrency, and against garbage left in the file by a process that previously ended abnormally.
If the PID is still running, the script will not run
Otherwise, append the arguments to the lock file with the current process's PID and run the script
At the end of the execution, remove the lines that contain the arguments the script used, or remove all lines with its PID (a sketch of this protocol follows).
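
A minimal sketch of that protocol in Python (the lock-file path and function names here are hypothetical; the question's script is bash, but the steps are the same). Note it does no file locking, so a race between two simultaneous starts is still possible:

import os
import sys

LOCKFILE = "/tmp/logcheck.lock"  # hypothetical location

def pid_running(pid):
    # signal 0 checks for existence without actually signalling;
    # for simplicity, treat any error as "not running"
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def acquire(args):
    entries = []
    if os.path.exists(LOCKFILE):
        with open(LOCKFILE) as f:
            entries = [line.split() for line in f if line.strip()]
    # drop garbage left by processes that ended abnormally
    entries = [e for e in entries if pid_running(int(e[1]))]
    for arg, pid in entries:
        if arg in args:
            sys.exit("already running with argument %r (pid %s)" % (arg, pid))
    with open(LOCKFILE, "w") as f:
        for arg, pid in entries:
            f.write("%s %s\n" % (arg, pid))
        for arg in args:
            f.write("%s %d\n" % (arg, os.getpid()))

def release():
    # remove our own lines when the script finishes
    mypid = str(os.getpid())
    with open(LOCKFILE) as f:
        keep = [l for l in f if l.split()[1] != mypid]
    with open(LOCKFILE, "w") as f:
        f.writelines(keep)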

How do I implement "file -s <file>" on Linux in pure Go?

Intent:
Does Go have the functionality (package or otherwise) to perform a special file stat on Linux, akin to the command file -s <path>?
Example:
[root@localhost ~]# file /proc/uptime
/proc/uptime: empty
[root@localhost ~]# file -s /proc/uptime
/proc/uptime: ASCII text
Use Case:
I have a fileglob of files in /proc/* that I need to very quickly detect if they are truly empty instead of appearing to be empty.
Using The os Package:
Code:
result, _ := os.Stat("/proc/uptime")
fmt.Println("Name:", result.Name(), " Size:", result.Size(), " Mode:", int(result.Mode()))
fmt.Printf("%q", result)
Result:
Name: uptime Size: 0 Mode: 292
&{"uptime" '\x00' 'Ĥ' {%!q(int64=63606896088) %!q(int32=413685520) %!q(*time.Location=&{ [] [] 0 0 <nil>})} {'\x03' %!q(uint64=4026532071) '\x01' '脤' '\x00' '\x00' '\x00' '\x00' '\x00' 'Ѐ' '\x00' {%!q(int64=1471299288) %!q(int64=413685520)} {%!q(int64=1471299288) %!q(int64=413685520)} {%!q(int64=1471299288) %!q(int64=413685520)} ['\x00' '\x00' '\x00']}}
Obvious Workaround:
There is the obvious workaround below, but it's a little over the top to need to shell out just to get file stats. (Note that passing "file -s" and the path as separate arguments to bash -c would not work as intended, since the path would become $0 rather than an argument to file; invoking file directly avoids that.)
output, _ := exec.Command("file", "-s", "/proc/uptime").Output()
// parse output etc...
EDIT/MY PRACTICAL USE CASE:
Quickly determining which files are zero size without needing to read each one of them first.
file -s /cgroup/memory/lsf/<cluster>/*/tasks | <clean up commands> | uniq -c
6 /cgroup/memory/lsf/<cluster>/<jobid>/tasks: ASCII text
805 /cgroup/memory/lsf/<cluster>/<jobid>/tasks: empty
So in this case, I know that only those 6 jobs are running and the rest (805) have terminated. Reading the file works like this:
# cat /cgroup/memory/lsf/<cluster>/<jobid>/tasks
#
or
# cat /cgroup/memory/lsf/<cluster>/<jobid>/tasks
12352
53455
...
I'm afraid you might be confusing matters here: file is special precisely because it "knows" a set of heuristics for carrying out its task.
To my knowledge, Go does not have anything like this in its standard library, and I've not come across a 3rd-party package implementing file-like functionality (though I invite you to search by relevant keywords on http://godoc.org).
On the other hand, Go provides full access to the syscall interface of the underlying OS, so when it comes to querying the OS the way file does, there's nothing you could not do in plain Go.
So I suggest you just fetch the source code of file, learn what it does in the mode turned on by the "-s" command-line option, and implement that in your Go code.
We'll try to help you with specific problems doing that, should you have any.
Update
Looks like I've finally grasped what the OP is struggling with: a simple check:
$ stat -c %s /proc/$$/status && wc -c < $_
0
849
That is, the stat call on a file under /proc shows it has no contents, but actually reading from that file returns its contents.
OK, so the solution is simple: instead of calling os.Stat() while traversing the subtree of the filesystem, merely attempt to read a single byte from the file, like in:
var buf [1]byte
f, err := os.Open(fname)
if err != nil {
    // Do something, or maybe ignore.
    // A file that no longer exists is OK to ignore
    // (the POSIX error code will be ENOENT):
    // after path/filepath.Walk() fetched an entry for
    // this file from its directory, the file might well have gone.
    return
}
defer f.Close()
_, err = f.Read(buf[:])
if err == io.EOF {
    // OK, we failed to read even 1 byte, so the file is empty.
} else if err != nil {
    // Otherwise, deal with the error.
}
You might try to be more clever and first obtain the stat information (using a call to os.Stat()) to see whether the file is a regular file, so as not to attempt reading from sockets etc.
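
For comparison, the same one-byte probe is easy to express outside Go as well; here is a Python sketch of the idea (truly_empty is a hypothetical name, not part of the answer's Go code):

def truly_empty(path):
    # stat() reports size 0 for most files under /proc, so do not
    # trust it: the file is only truly empty if a read yields no bytes
    with open(path, 'rb') as f:
        return f.read(1) == b''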
I have a fileglob of files in /proc/* that I need to very quickly
detect if they are truly empty instead of appearing to be empty.
They are truly empty in some sense (e.g. they occupy no space on the filesystem). If you want to check whether any data can be read from them, try reading from them; that's what file -s does:
-s, --special-files
    Normally, file only attempts to read and determine the type of
    argument files which stat(2) reports are ordinary files. This
    prevents problems, because reading special files may have peculiar
    consequences. Specifying the -s option causes file to also read
    argument files which are block or character special files. This is
    useful for determining the filesystem types of the data in raw
    disk partitions, which are block special files. This option also
    causes file to disregard the file size as reported by stat(2)
    since on some systems it reports a zero size for raw disk
    partitions.

core dump filename gets thread name instead of executable name with core_pattern %e.%p.core

I recently started setting some thread names within my application using pthread_setname_np(). After doing this, if a crash occurs within one of the named threads, the core dump filename gets the thread name instead of the executable name when core_pattern is %e.%p.core.
According to the core man page, the %e specifier in core_pattern is supposed to expand to the executable name. It doesn't say anything about the thread name.
I want the executable name and not the thread name, because I have other automated scripts (not maintained by me) that depend on the core filenames beginning with the application name.
Is this a bug in pthread_setname_np() or core_pattern?
I am running on Linux CentOS 6.7.
So I wound up working around the issue by piping the core dump to a Python script, which renames the core file based on a hard-coded mapping of thread-name regex patterns to executable names.
Here's how to pipe the core to a script:
/sbin/sysctl -q -w "kernel.core_pattern=|/opt/mydirectory/bin/core_helper.py --corefile /opt/mydirectory/coredumps/%e.%p.core"
/sbin/sysctl -q -w "kernel.core_pipe_limit=8"
Here's a snippet of a class in core_helper.py. As a bonus, if you give the core filename a .gz extension, it will compress the coredump with gzip.
import os
import re
import subprocess
import sys

class CoredumpHelperConfig:
    def __init__(self, corefile):
        self.corefile = corefile

    # Work-around: Linux is putting the thread name into the
    # core filename instead of the executable. Revert the thread name to
    # the executable name by using this mapping.
    # The order is important -- the first match will be used.
    threadNameToExecutableMapping = [  # (pattern, replace)
        (r'fooThread.*', r'foo'),
        (r'barThread.*', r'foo'),
    ]

    def processCore(self):
        (dirname, basename) = os.path.split(self.corefile)
        # E.g. fooThread0.21495.core (no compression) or
        # fooThread0.21495.core.gz (compression requested)
        match = re.match(r'^(\w+)\.(\d+)\.(core(\.gz)?)$', basename)
        assert match
        (threadName, pid, ext, compression) = match.groups()
        # Work-around for the thread name problem
        execName = threadName
        for (pattern, replace) in CoredumpHelperConfig.threadNameToExecutableMapping:
            if re.match(pattern, threadName):
                execName = re.sub(pattern, replace, threadName)
                break
        self.corefile = os.path.join(dirname, '.'.join([execName, pid, ext]))
        # Pipe the core contents arriving on stdin into corefile,
        # optionally compressing them
        core = open(self.corefile, 'w')
        coreProcessApp = "tee"
        if compression:
            coreProcessApp = "gzip"
        p = subprocess.Popen(coreProcessApp, shell=True,
                             stdin=sys.stdin, stdout=core, stderr=core)
        p.wait()
        core.close()
        return True
I'll leave it as an exercise to the reader on how to write the rest of the file.
The name of the executable that generated the core can be retrieved using gdb.
The following prints it:
gdb -batch -ex "core corefile" | grep "Core was generated" | cut -d\` -f2 | cut -d"'" -f1 | awk '{print $1}'
Or, better yet, use the pid (%p) and /proc to get it. Example:
$ sleep 900 &
[1] 2615
$ readlink /proc/$(pidof sleep)/exe
/bin/sleep
$ basename $(readlink /proc/$(pidof sleep)/exe)
sleep
I had the same problem and worked around it the same way: I get the executable filename using /proc/<pid>/exe:
src_file_path = os.readlink("/proc/%s/exe" % pid)
exec_filename = os.path.basename(src_file_path)

Understanding sincedb files from Logstash file input

When using the file input with Logstash, a sincedb file is written in order to keep track of the current position of monitored log files. How to understand its contents?
Example of a sincedb file:
286105 0 19 20678374
There are 4 fields (source):
inode
major device number
minor device number
byte offset
Assuming that a hard disk were segmented into thousands of very tiny numbered parts, the inode would be more or less the number of the tiny part where the file begins. A given inode is therefore unique only within a single disk, so to address cases where there are multiple disks on the same server, the major and minor device numbers are needed to guarantee the uniqueness of the triplet {inode, major device number, minor device number}. More accurate info about inodes is on Wikipedia.
That said, I am not so sure that files mounted through NFS, for example, could not collide with local files, since the inode of a file mounted through NFS seems to be the remote one. I don't think the plugin writer bothered about such cases, and despite using NFS myself, I have never run into any trouble so far. I also suspect the collision probability is very tiny.
Now, with the triplet formed by the inode and the major and minor device numbers, we have a way of targeting the single log file being read by the plugin without error (or at least that was the original intent). The last number, the byte offset, keeps track of how far the input log file has already been read and shipped to Logstash.
On some specific architectures, like Solaris or Windows, there have been bugs with Ruby wrongly detecting the inode number as 0. This could, for example, lead to Logstash not detecting a file rotation.
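
A minimal sketch of parsing such a sincedb file in Python (assuming the four whitespace-separated fields described above; newer Logstash versions may append extra fields, which this ignores):

from collections import namedtuple

SincedbEntry = namedtuple("SincedbEntry", "inode major minor offset")

def parse_sincedb(path):
    # one entry per line: inode, major device, minor device, byte offset
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 4:
                entries.append(SincedbEntry(*(int(x) for x in fields[:4])))
    return entries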
This was super helpful. I wanted to map all my SinceDB files to the logstash inputs, so I put together a little bash snippet to print this mapping.
filesystems=$(grep path /etc/logstash/conf.d/*.conf | awk -F'=>' '{ print $2 }' \
    | xargs -I {} df -P {} 2>/dev/null | grep -v Filesystem | sort | uniq | cut -d' ' -f 1)
for fs in $filesystems; do
    for f in $(ls -a .sincedb_*); do
        echo $f
        inodes=$(cut -d' ' -f 1 $f)
        for inode in $inodes; do
            sudo debugfs -R "ncheck $inode" $fs 2>/dev/null | grep -v Inode | cut -f 2
        done
        echo
    done
done
I just documented the details about mapping SinceDB files to logstash input.

bash script read pipe or argument

I want my script to read a string either from stdin, if it's piped, or from an argument. First I want to check whether some text is piped, and if not, the script should use an argument as input. My code looks something like this:
value=$(cat) # read from stdin
if [ "$value" != "" ]; then # check that the input is not empty
    # Do something with the piped string
else
    # Do something with the argument string
fi
The problem is that when nothing is piped, the script halts and waits for Ctrl-D, and I don't want that. Any suggestions on how to solve this?
Thanks in advance.
/Tomas
What about checking the argument first?
if (($#)) ; then
process "$1"
else
cat | process
fi
Or, just take advantage of the same behaviour of cat:
cat "$#" | process
If you only need to know if it's a pipe or a redirection, it should be sufficient to determine if stdin is a terminal or not:
if [ -t 0 ]; then
# stdin is a tty: process command line
else
# stdin is not a tty: process standard input
fi
[ (aka test) with -t is equivalent to the libc isatty() function.
The above will work with both something | myscript and myscript < infile. This is the simplest solution, assuming your script is for interactive use.
The [ command is a builtin in bash and some other shells, and since [/test with -t is in POSIX, it's portable too (not relying on Linux, bash, or GNU utility features).
There's one edge case: test -t also returns false if the file descriptor is invalid, though it would take some slight adversity to arrange that. test -e will detect this, assuming you have a filename such as /dev/stdin to use.
The POSIX tty command can also be used, and handles the adversity above. It will print the tty device name and return 0 if stdin is a terminal, and will print "not a tty" and return 1 in any other case:
if tty >/dev/null ; then
# stdin is a tty: process command line
else
# stdin is not a tty: process standard input
fi
(with GNU tty, you can use tty -s for silent operation)
A less portable way, though certainly acceptable on a typical Linux, is to use GNU stat with its %F format specifier; it returns the text "character special file", "fifo", and "regular file" in the cases of a terminal, a pipe, and a redirection respectively. stat requires a filename, so you must provide a specially-named file of the form /dev/stdin, /dev/fd/0, or /proc/self/fd/0, and use -L to chase symlinks:
stat -L -c "%F" /dev/stdin
This is probably the best way to handle non-interactive use (since you can't make assumptions about terminals then), or to detect an actual pipe (FIFO) distinct from redirection.
There is a slight gotcha with %F in that you cannot use it to tell the difference between a terminal and certain other device files, for example /dev/zero or /dev/null which are also "character special files" and might reasonably appear. An unpretty solution is to use %t to report the underlying device type (major, in hex), assuming you know what the underlying tty device number ranges are... and that depends on whether you're using BSD style ptys or Unix98 ptys, or whether you're on the actual console, among other things. In the simple case %t will be 0 though for a pipe or a redirection of a normal (non-special) file.
More general solutions to this kind of problem are to use bash's read with a timeout (read -t 0 ...) or non-blocking I/O with GNU dd (dd iflag=nonblock).
The latter will allow you to detect lack of input on stdin, dd will return an exit code of 1 if there is nothing ready to read. However, these are more suitable for non-blocking polling loops, rather than a once-off check: there is a race condition when you start two or more processes in a pipeline as one may be ready to read before another has written.
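
For comparison, the same terminal test is available in Python through sys.stdin.isatty(), the same isatty() call mentioned above. A minimal sketch of the argument-or-stdin pattern (get_value is a hypothetical name):

import sys

def get_value():
    # prefer a command-line argument; otherwise read piped stdin
    if len(sys.argv) > 1:
        return sys.argv[1]
    if not sys.stdin.isatty():
        return sys.stdin.read()
    sys.exit("usage: pipe a string in or pass it as an argument")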
It's easier to check for command-line arguments first and fall back to stdin if there are none. Shell Parameter Expansion is a nice shorthand instead of the if-else:
value=${*:-$(cat)}
# do something with $value
