core dump filename gets thread name instead of executable name with core_pattern %e.%p.core - linux

I recently started setting thread names within my application using pthread_setname_np(). After doing this, if a crash occurs in one of the named threads, the core dump filename gets the thread name instead of the executable name, even though core_pattern is %e.%p.core.
According to the core man page, the %e specifier in core_pattern is supposed to expand to the executable name. It says nothing about the thread name.
I want the executable name and not the thread name, because I have other automated scripts (not maintained by me) that depend on the core filenames beginning with the application name.
Is this a bug in pthread_setname_np() or core_pattern?
I am running on Linux CentOS 6.7.

So I wound up working around the issue by piping the core dump to a Python script, which then renames the core file based on a hard-coded mapping of thread-name regex patterns to executable names.
Here's how to pipe the core to a script:
/sbin/sysctl -q -w "kernel.core_pattern=|/opt/mydirectory/bin/core_helper.py --corefile /opt/mydirectory/coredumps/%e.%p.core"
/sbin/sysctl -q -w "kernel.core_pipe_limit=8"
Here's a snippet of a class in core_helper.py. As a bonus, if you give the core filename a .gz extension, it will compress the coredump with gzip.
import os
import re
import subprocess
import sys


class CoredumpHelperConfig:
    def __init__(self, corefile):
        self.corefile = corefile

    # Work-around: Linux is putting the thread name into the
    # core filename instead of the executable. Revert the thread name to
    # the executable name by using this mapping.
    # The order is important -- the first match will be used.
    threadNameToExecutableMapping = [  # (pattern, replacement)
        (r'fooThread.*', r'foo'),
        (r'barThread.*', r'foo'),
    ]

    def processCore(self):
        (dirname, basename) = os.path.split(self.corefile)
        # E.g. fooThread0.21495.core (no compression) or
        # fooThread0.21495.core.gz (compression requested)
        match = re.match(r'^(\w+)\.(\d+)\.(core(\.gz)?)$', basename)
        assert match
        (threadName, pid, ext, compression) = match.groups()

        # Work-around for the thread name problem
        execName = threadName
        for (pattern, replace) in CoredumpHelperConfig.threadNameToExecutableMapping:
            if re.match(pattern, threadName):
                execName = re.sub(pattern, replace, threadName)
                break
        self.corefile = os.path.join(dirname, '.'.join([execName, pid, ext]))

        # Pipe the contents of the core into corefile, optionally compressing it.
        # Open in binary mode ('wb') -- core files are binary data.
        coreProcessApp = "gzip" if compression else "tee"
        with open(self.corefile, 'wb') as core:
            p = subprocess.Popen(coreProcessApp, shell=True,
                                 stdin=sys.stdin, stdout=core)
            p.wait()  # let the pipe drain before the file is closed
        return True
I'll leave it as an exercise to the reader on how to write the rest of the file.
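For completeness, here is a minimal sketch of how the rest of core_helper.py might look. The --corefile option matches the core_pattern line above, but everything else (argument parsing, exit handling) is my own assumption, not the original author's code:

#!/usr/bin/env python
# Hypothetical entry point for core_helper.py -- a sketch, not the author's code.
import argparse
import sys

def main():
    parser = argparse.ArgumentParser(description='Rename and store a piped core dump')
    parser.add_argument('--corefile', required=True,
                        help='Destination path, e.g. /opt/mydirectory/coredumps/%%e.%%p.core')
    args = parser.parse_args()
    config = CoredumpHelperConfig(args.corefile)
    return 0 if config.processCore() else 1

if __name__ == '__main__':
    sys.exit(main())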

The name of the executable that generated the core can be retrieved using gdb.
The following prints it:
gdb -batch -ex "core corefile" | grep "Core was generated" | cut -d\` -f2 | cut -d"'" -f1 | awk '{print $1}'
Or better yet, use the PID (%p) and /proc to get it. Example:
$ sleep 900 &
[1] 2615
$ readlink /proc/$(pidof sleep)/exe
/bin/sleep
$ basename $(readlink /proc/$(pidof sleep)/exe)
sleep

I had the same problem and worked around it in the same way. I get the executable filename using /proc/<pid>/exe:
src_file_path = os.readlink("/proc/%s/exe" % pid)
exec_filename = os.path.basename(src_file_path)

Related

Find patterns and rename multiple files

I have a list of machine names and hostnames, e.g.:
# cat /tmp/machine_list.txt
[one]apple machine #1 myserver1
[two]apple machine #2 myserver2
[three]apple machine #3 myserver3
Each server has its own directory, and each directory contains a tar file and a file with the hostname written in it.
# ls /tmp/sos1/*
sosreport1.tar.gz
hostname_map.txt
# cat /tmp/sos1/hostname_map.txt
myserver1
# ls /tmp/sos2/*
sosreport2.tar.gz
hostname_map.txt
# cat /tmp/sos2/hostname_map.txt
myserver2
# ls /tmp/sos3/*
sosreport3.tar.gz
hostname_map.txt
# cat /tmp/sos3/hostname_map.txt
myserver3
Is it possible to rename each sosreport*.tar.gz by matching the hostname_map.txt in its directory against the /tmp/machine_list.txt file, like below?
# ls /tmp/sos1/*
[one]apple_machine_#1_myserver1_sosreport1.tar.gz
# ls /tmp/sos2/*
[two]apple_machine_#2_myserver2_sosreport2.tar.gz
# ls /tmp/sos3/*
[three]apple_machine_#3_myserver3_sosreport3.tar.gz
A single change is easy enough, but how do I handle multiple directories at once?
Something like this?
srvname () {
    awk -v srv="$(cat "$1")" -F '\t' '$2==srv { print $1; exit }' machine_list.txt
}

for dir in /tmp/sos*/; do
    server=$(srvname "$dir"/hostname_map.txt)
    mv "$dir"/sosreport*.tar.gz "$dir/$server.tar.gz"
done
Demo: https://ideone.com/TS5VyQ
The function assumes your mapping file is tab-delimited. If you want underscores instead of spaces in the server names, change the mapping file.
This should be portable to POSIX sh; the cat could be replaced with a Bash redirection, but I feel that it's not worth giving up portability for such a small change.
If this were my project, I'd probably make the function into a self-contained reusable script (with the input file replaced with a here document in the script itself) since there will probably be more situations where you need to perform the same mapping.

Prevent script running with same arguments twice

We are looking into building a logcheck script that will tail a given log file and send an email when the given arguments are found. I am having trouble accurately determining whether another instance of this script is running with at least one of the same arguments against the same file. The script can take the following:
logcheck -i <filename(s)> <searchCriterion> <optionalEmailAddresses>
I have tried to use ps aux with a series of grep, sed, and cut, but it always ends up being more code than the script itself and seldom works very efficiently. Is there an efficient way to tell if another version of this script is running with the same filename and search criteria? A few examples of input:
EX1 ./logcheck -i file1,file2,file3 "foo string 0123" email@address.com
EX2 ./logcheck -s file1 Hello,World,Foo
EX3 ./logcheck -i file3 foo email@address1.com,email@address2.com
In this case, EX3 should not run, because EX1 is already running with the parameters file3 and foo.
There are many solutions to your problem; I would recommend creating a lock file with the following format:
arg1Ex1 PID#(Ex1)
arg2Ex1 PID#(Ex1)
arg3Ex1 PID#(Ex1)
arg4Ex1 PID#(Ex1)
arg1Ex2 PID#(Ex2)
arg2Ex2 PID#(Ex2)
arg3Ex2 PID#(Ex2)
arg4Ex2 PID#(Ex2)
When your script starts, it will:
Search the lock file for all the arguments it has received (awk or grep).
If one of the arguments is present in the list, fetch that entry's PID (awk '{print $2}', for example) and check whether the process is still running (ps). (Double-check for concurrency: if a process previously ended abnormally, garbage might remain in the file.)
If the PID is still there, the script will not run.
Otherwise, append the arguments to the lock file with the current process PID and run the script.
At the end of the execution, remove the lines containing the arguments that were used by the script, or remove all lines with its PID. A rough sketch of this bookkeeping follows.
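Here is a minimal Python sketch of that lock-file logic, assuming one "argument PID" pair per line and an invented lock-file path; a real implementation would also need to hold a lock (e.g. fcntl.flock) across the read-modify-write to be safe against concurrent starts:

import os

LOCKFILE = '/tmp/logcheck.lock'  # assumed location; one "arg pid" pair per line

def pid_running(pid):
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def try_acquire(args):
    entries = []
    if os.path.exists(LOCKFILE):
        with open(LOCKFILE) as f:
            for line in f:
                arg, pid = line.split()
                # drop stale entries left behind by abnormal exits
                if pid_running(int(pid)):
                    entries.append((arg, int(pid)))
    if any(arg in args for (arg, _) in entries):
        return False  # another instance already holds one of our arguments
    with open(LOCKFILE, 'w') as f:
        for (arg, pid) in entries:
            f.write('%s %d\n' % (arg, pid))
        for arg in args:
            f.write('%s %d\n' % (arg, os.getpid()))
    return True

With the examples above, try_acquire(['file3', 'foo']) would return False while EX1 is still running.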

Finding a process ID given a socket and inode in Python 3

/proc/net/tcp gives me a local address, port, and inode number for a socket (0.0.0.0:5432 and 9289, for example).
I'd like to find the PID for a specific process, given the above information.
It's possible to open every numbered folder in /proc, and then check symlinks for matching socket/inode numbers with a shell command like "$ sudo ls -l /proc/*/fd/ 2>/dev/null | grep socket". However, this seems more computationally expensive than necessary, since <5% of the processes on any given system have open TCP sockets.
What's the most efficient way to find the PID which has opened a given socket? I'd prefer to use standard libraries, and I'm currently developing with Python 3.2.3.
Edit: Removed the code samples from the question, since they are now included in the answer below.
The following code accomplishes the original goal:
import os

def find_pid(inode):
    # get a list of all files and directories in /proc
    procFiles = os.listdir("/proc/")

    # remove the pid of the current python process
    procFiles.remove(str(os.getpid()))

    # set up a list object to store valid pids
    pids = []
    for f in procFiles:
        try:
            # convert the filename to an integer and back,
            # saving the result to a list
            integer = int(f)
            pids.append(str(integer))
        except ValueError:
            # if the filename doesn't convert to an integer,
            # it's not a pid, and we don't care about it
            pass

    for pid in pids:
        # check the fd directory for socket information
        try:
            fds = os.listdir("/proc/%s/fd/" % pid)
            for fd in fds:
                # return the pid for sockets matching our inode
                if ('socket:[%d]' % inode) == os.readlink("/proc/%s/fd/%s" % (pid, fd)):
                    return pid
        except OSError:
            # the process may have exited, or we may lack permission
            continue
I do not know how to do this in python, but you could use lsof(1):
lsof -i | awk -v sock=158384387 '$6 == sock{print $2}'
158384387 is the inode of the socket. You can then call it from Python using subprocess.Popen.
You will have to use sudo(8) if you want to see sockets opened by other users.
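If it helps, here is a minimal sketch of that subprocess call, assuming the default lsof output layout on Linux (PID in the second column, socket inode in the DEVICE column); pid_for_inode is an invented name:

import subprocess

def pid_for_inode(inode):
    # run "lsof -i" and scan its output for the socket inode
    out = subprocess.check_output(['lsof', '-i'], universal_newlines=True)
    for line in out.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # fields: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        if len(fields) > 5 and fields[5] == str(inode):
            return int(fields[1])
    return None

Run it with sudo (or as root) to see sockets opened by other users, as noted above.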

What does the '-' operator actually do in Linux?

I see the - operator behaving in different ways with different commands.
For example,
cd -
changes to the previous directory, whereas
vim -
reads from stdin
So I want to know why the - operator behaves in two different ways here. Can someone point me to detailed documentation of the - operator?
It is not an operator; it is an argument. When you write a program in C or C++, it comes in as argv[1] (when it is the first argument) and you can do whatever you like with it.
By convention, many programs use - as a placeholder for stdin where an input file name is normally required, and for stdout where an output file name is expected. But cd does not read a file stream, so why would it need stdin or stdout?
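For example, a filter that follows this convention might handle - like this (a generic sketch, not any particular program's source):

import sys

def open_input(name):
    # by convention, "-" stands for standard input
    if name == '-':
        return sys.stdin
    return open(name)

# copy the named file (or stdin, if the argument is "-") to stdout
with open_input(sys.argv[1]) as f:
    sys.stdout.write(f.read())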
Extra: below is the excerpt from vim's main.c that parses arguments beginning with -; if there is no additional character, it activates stdin input.
else if (argv[0][0] == '-' && !had_minmin)
{
    want_argument = FALSE;
    c = argv[0][argv_idx++];
#ifdef VMS
    ...
#endif
    switch (c)
    {
    case NUL:           /* "vim -"  read from stdin */
                        /* "ex -"   silent mode */
        if (exmode_active)
            silent_mode = TRUE;
        else
        {
            if (parmp->edit_type != EDIT_NONE)
                mainerr(ME_TOO_MANY_ARGS, (char_u *)argv[0]);
            parmp->edit_type = EDIT_STDIN;
            read_cmd_fd = 2;    /* read from stderr instead of stdin */
        }
The dash on its own is a simple command argument. Its meaning is command dependent. Its two most usual meanings are 'standard input' or (less often) 'standard output'. The meaning of 'previous directory' is unique to the cd shell built-in (and it only means that in some shells, not all shells).
cat file1 - file2 | troff ...
This means read file1, standard input, and file2 in that sequence and send the output to troff.
An extreme case of using - to mean 'standard input' or 'standard output' comes from (GNU) tar:
generate_file_list ... |
tar -cf - -T - |
( cd /some/where/else; tar -xf - )
The -cf - options in the first tar mean 'create an archive' and 'the output file is standard output'; the -T - option means 'read the list of files and/or directories from standard input'.
The -xf - options in the second tar mean 'extract an archive' and 'the input file is standard input'. In fact, GNU tar has an option -C /some/where/else which means it does the cd itself, so the whole command could be:
generate_file_list ... |
tar -cf - -T - |
tar -xf - -C /some/where/else
The net effect of this is to copy the files named by the generate_file_list command from under the 'current directory' to /some/where/else, preserving the directory structure. (The 'current directory' has to be taken with a small pinch of salt; any absolute file names are given special treatment by GNU tar — it removes the leading slash — and relative names are taken as relative to the current directory.)
It depends on the program it's being used in. It means different things to different programs.
I think different programs use different conventions; the man pages show how each program interprets -. For example, man bash documents the special parameter - as:
-      Expands to the current option flags as specified upon invocation,
       by the set builtin command, or those set by the shell itself
       (such as the -i option).
and man vim
- The file to edit is read from stdin. Commands are read from stderr,
which should be a tty.

Compressing the core files during core generation

Is there a way to compress core files during core dump generation?
If storage space is limited on the system, is there a way to conserve it by compressing core dumps immediately as they are generated?
Ideally the method would work on older versions of Linux such as 2.6.x.
The Linux kernel /proc/sys/kernel/core_pattern file will do what you want: http://www.mjmwired.net/kernel/Documentation/sysctl/kernel.txt#191
Set the filename to something like |/bin/gzip -1 > /var/crash/core-%t-%p-%u.gz and your core files should be saved compressed for you.
For an embedded Linux system, the following two-step change works to generate compressed core files.
Step 1: create a script:
touch /bin/gen_compress_core.sh
chmod +x /bin/gen_compress_core.sh
cat > /bin/gen_compress_core.sh
#!/bin/sh
exec /bin/gzip -f - >"/var/core/core-$1.$2.gz"
then press Ctrl+D to end the input.
Step 2: update the core pattern file:
cat > /proc/sys/kernel/core_pattern
|/bin/gen_compress_core.sh %e %p
then press Ctrl+D to end the input.
As suggested by the other answer, the Linux kernel /proc/sys/kernel/core_pattern file is a good place to start: http://www.mjmwired.net/kernel/Documentation/sysctl/kernel.txt#141
As the documentation says, you can specify the special character "|", which tells the kernel to pipe the core file to a script. As suggested, you could use |/bin/gzip -1 > /var/crash/core-%t-%p-%u.gz as the pattern; however, it didn't work for me. I expect the reason is that the kernel doesn't treat the > character as output redirection; it probably passes it as a parameter to gzip.
To avoid this problem, as others suggested, you can put a small script in some location. I am using /home/<username>/crashes/core.sh; create it using the following command, replacing <username> with your user. Alternatively, you can change the entire path.
echo -e '#!/bin/bash\nexec /bin/gzip -f - >"/home/<username>/crashes/core-$1-$2-$3-$4.gz"' > ~/crashes/core.sh
This script takes four input parameters, concatenates them, and adds them to the core path. Full paths must be specified in ~/crashes/core.sh, and the location of the script itself can be changed. Now let's tell the kernel to use our executable with parameters when generating the file:
sudo sysctl -w kernel.core_pattern="|/home/<username>/crashes/core.sh %e %p %h %t"
Again, <username> should be replaced (or the entire path changed to match the location and name of the core.sh script). The next step is to crash some program; let's create an example crashing C++ file:
int main()
{
    int *a = nullptr;
    int b = *a;
}
After compiling and running, there are two possibilities; either we will see:
Segmentation fault (core dumped)
Or
Segmentation fault
If we see the latter, there are a few possible reasons:
ulimit is not set; ulimit -c shows the current size limit for cores
apport or your distro's core dump collector is not running; this should be investigated further
there is an error in the script we wrote
I suggest checking with a basic dump path first, to verify the other items aren't the cause; the following should create /tmp/core.dump:
sudo sysctl -w kernel.core_pattern="/tmp/core.dump"
I know there is already an answer to this question; however, it wasn't obvious to me why it wasn't working out of the box, so I wanted to summarize my findings. I hope it helps someone.
