When I invoke my multithreaded Perl script, on a few occasions it throws an exception similar to the following. I am sorry that I cannot share the code, but if really needed I can try to build a snippet. I suspect, though, that this has a theoretical answer.
*** glibc detected *** perl: double free or corruption (!prev): 0x00007f775401e9a0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3d74c75e66]
/lib64/libc.so.6[0x3d74c789b3]
/lib64/libc.so.6[0x3d74c7b880]
/lib64/libc.so.6(realloc+0xe5)[0x3d74c7baf5]
/usr/lib/../lib64/libcrypto.so.10(CRYPTO_realloc+0x5f)[0x7f775907bd8f]
/usr/lib/../lib64/libcrypto.so.10(lh_insert+0xee)[0x7f77590f763e]
/usr/lib/../lib64/libcrypto.so.10(OBJ_NAME_add+0x6b)[0x7f775907f12b]
/usr/lib/../lib64/libcrypto.so.10(EVP_add_cipher+0x27)[0x7f7759102387]
/usr/lib/../lib64/libcrypto.so.10(OpenSSL_add_all_ciphers+0x4b7)[0x7f7759106a07]
/usr/lib/../lib64/libcrypto.so.10(OPENSSL_add_all_algorithms_noconf+0xe)[0x7f775910653e]
/usr/local/lib/libssh2.so.1(libssh2_init+0x39)[0x7f77596800b9]
Why am I getting such errors?
I am using Thread::Queue and threads::shared. Please let me know your views.
Below is the version info for the thread libraries:
use threads; - installed v2.15 (latest - 2.16)
use Thread::Queue; - installed v3.12 (up to date)
use threads::shared; - installed v1.56 (latest - 1.57)
perl - installed v5.26.1
Other libraries are:
use YAML::XS 'LoadFile'; - 0.66 up to date
use Net::Netconf::Manager; - 1.02 up to date
use Config::Properties; - 1.80 up to date
use Sys::Syslog; - 0.35 up to date
use DateTime::Format::Strptime; - 1.74 up to date
use DateTime; - 1.44 up to date
use XML::LibXML; - 2.0129 (latest 2.0139)
use Regexp::Common qw/net/; - 2017060201 up to date
use Getopt::Long; - 2.5 up to date
To give you a firm answer, we need something we can run and troubleshoot. Otherwise the error is not reproducible.
Having said that - this looks similar to something I have encountered before with certain modules not being thread safe - they'll often run fine, and then very occasionally explode in your face.
E.g. Crypt::SSLeay back in 2008, or Net::SSLeay prior to 1.4.2.
The general workaround is to stop loading the culprits at compile time with use - because then the same state is inherited by all threads - and instead within the thread, load them at runtime with require and import. By doing this, you're isolating them - your threads will take slightly longer to start, but you shouldn't be spamming threads anyway in perl.
Or use a different module that is thread safe.
With your update and screenshot - Net::SSH2 is mentioned - and that implies one of your other modules is pulling that in.
However, the Net::SSH2 thread safety notes indicate that libssh2 might have some constraints on thread safety:
Thread-safe: just don't share handles simultaneously
You don't explicitly mention using it, but it looks like it's being pulled in by another module. At a guess, that'll be Net::Netconf::Manager.
And as a second guess - it might well be sharing handles because it doesn't realise it's being run in a thread.
So this module is the one I'd suggest isolating within the threads:
    require Net::Netconf::Manager;
    Net::Netconf::Manager->import;
And do the instantiation within the thread as well.
As you're using a worker-threads model, this should be minimal overhead, and will mean that you're not hitting this problem.
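By way of illustration, here is a minimal sketch of that worker-thread pattern with Thread::Queue; the host names and constructor arguments are placeholders, and the important part is that Net::Netconf::Manager is required, imported and instantiated inside each worker rather than at compile time in the parent:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $queue = Thread::Queue->new();

    my @workers = map {
        threads->create(sub {
            # Load the module at runtime, inside the thread, so no
            # compile-time state is inherited from the parent interpreter.
            require Net::Netconf::Manager;
            Net::Netconf::Manager->import;

            while (defined(my $host = $queue->dequeue())) {
                # Instantiate within the thread too: one connection (and one
                # underlying libssh2 handle) per thread, never shared.
                # Constructor arguments here are illustrative only.
                my $mgr = Net::Netconf::Manager->new(
                    access   => 'ssh',
                    hostname => $host,
                    login    => 'user',
                    password => 'secret',
                );
                # ... issue RPCs with $mgr here ...
            }
        });
    } 1 .. 4;

    $queue->enqueue($_) for qw(router1 router2 router3);
    $queue->enqueue(undef) for @workers;   # one undef per worker tells it to exit
    $_->join() for @workers;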
But more generally - it's unwise to assume modules are thread safe unless they explicitly say they are. The major 'tripping' points are usually whenever any sort of resource sharing can be assumed/implied by the module, such as network sockets, file handles, database connections etc. Often a socket is created at instantiation (e.g. the point where you pass username/password) and having two threads trying to drive a socket concurrently is a potential race condition.
Related
How can I do one of the following?
get a list of the ThreadIds of all running forked threads (preferably with labels) from application code?
get a simple (maybe approximate; e.g. from the last major GC), count of running threads
get either of the above from gdb or similar
Things that don't work for me:
requiring a profiling build
maintaining some data structure to try to track this information myself
Things I thought might be promising:
write a custom eventlog hook to try to get this info https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#hooks-to-change-rts-behaviour
create a hacky RTS binding somehow
GHC 9.6 introduced listThreads :: IO [ThreadId]
From the GHC User's Guide:
GHC now provides a set of operations for introspecting on the threads of a program, GHC.Conc.listThreads, as well as operations for querying a thread’s label (GHC.Conc.threadLabel) and status (GHC.Conc.threadStatus).
According to perldoc threads:
Since Perl 5.8, thread programming has been available using a model
called interpreter threads which provides a new Perl interpreter for
each thread, and, by default, results in no data or state information
being shared between threads.
What type of data or state information is referred to in the above quote? According to perldoc perlxs:
Starting with Perl 5.8, a macro framework has been defined to allow
static data to be safely stored in XS modules that will be accessed
from a multi-threaded Perl.
So it seems to me static variables are shared between threads? But Perl variables are not shared?
(I am trying to figure out exactly what kind of data is thread safe, and how to create a thread safe module)
Each thread has its own interpreter. This struct[1] stores everything that makes up perl, including parser state, regex engine state, the symbol table, and all "SV"s (which include scalars, arrays, hashes, code, etc.). Creating a new thread from within Perl makes a copy of the current interpreter.
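A short illustration of that copying, using only the core threads and threads::shared modules: the plain lexical is duplicated into the new interpreter, so the child's assignment is invisible to the parent, while the :shared variable really is shared.

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $copied          = 1;   # duplicated into the new interpreter
    my $shared : shared = 1;   # genuinely shared between threads

    threads->create(sub {
        $copied = 42;          # changes only this thread's copy
        $shared = 42;          # visible to every thread
    })->join();

    print "copied = $copied\n";   # prints 1
    print "shared = $shared\n";   # prints 42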
XS code may safely use the Perl API because each function has a parameter that specifies the interpreter to use. This is often invisible to the code thanks to macros, but you might have noticed references to "THX" or "Perl context". Just don't pass an SV that belongs to one interpreter to another. (You might have heard of the "Free to wrong pool" error message that can result from this.)
But Perl can't offer any protection to things outside of its knowledge or control, such as the static storage of the external libraries it loads. No copies of those are made. Two threads could call the same C function at the same time, so precautions need to be taken as if you were writing a multi-threaded C program.
That macro framework to which your quote alludes gives access to per-interpreter storage. It also allows the library to specify a function to call on creation of new Perl threads to clone the variables into the new interpreter.
If Perl is built without -Dusemultiplicity, the Perl interpreter consists of a bajillion global (static) variables instead. MULTIPLICITY moves them into a struct and adds a context parameter to Perl API calls. This has a performance penalty, but it allows a process to have multiple Perl interpreters. Because threaded builds of Perl require this, building a threaded perl (-Dusethreads) implies -Dusemultiplicity.
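You can check what your own perl was built with via the core Config module; on a threaded build both flags report "define":

    use strict;
    use warnings;
    use Config;

    # usethreads implies usemultiplicity, but not the other way around.
    print "usethreads      = ", ($Config{usethreads}      || 'no'), "\n";
    print "usemultiplicity = ", ($Config{usemultiplicity} || 'no'), "\n";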
I was trying to check response messages in a Perl program which takes requests through the Amazon API and returns responses. How do I run a parallel fork as a single thread in Perl? I'm using the LWP::UserAgent module and I want to debug the HTTP requests.
As a word of warning - threads and forks are different things in perl. Very different.
However, the long and short of it is - you can't, at least not trivially - a fork is a separate process. A fork actually happens when you run -any- external command in perl; it's just that by default perl sits and waits for that command to finish and return output.
However, if you've got access to the code, you can amend it to run single-threaded - sometimes that's as simple as reducing the parallelism with a config parameter. (In fact, quite often debugging parallel code is a much more complicated task than debugging sequential code, so getting it working before running it in parallel is really important.)
You might be able to embed a waitpid into your primary code so you've only got one thing running at once. Without a code example though, it's impossible to say for sure.
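If you do control the forking code, a minimal sketch of that waitpid idea could look like this; @requests and do_request are hypothetical placeholders, and the point is only the fork-then-immediately-waitpid pattern, which serialises the children:

    use strict;
    use warnings;

    my @requests = ('req1', 'req2');   # stand-in for your real work items

    for my $request (@requests) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            do_request($request);      # hypothetical: issue the HTTP call via LWP
            exit 0;
        }

        # Parent blocks here until this child has finished, so only one
        # request is in flight at a time - much easier to debug.
        waitpid($pid, 0);
    }

    sub do_request {
        my ($req) = @_;
        print "child $$ handling $req\n";   # stand-in for the real LWP call
    }

For inspecting the HTTP traffic itself, LWP::UserAgent's handler mechanism is one option, e.g. $ua->add_handler("request_send", sub { shift->dump; return }); dumps each outgoing request.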
As it seems that mod_perl only manages Perl interpreters per VHOST, is
there any way I can influence which cloned interpreter mod_perl
selects to process a request? I've read through the configurable
scopes and had a look into "modperl_interp_select" in the source and I
could see that if a request already has an interpreter associated, that
one is selected by mod_perl.
else if (r) {
if (is_subrequest && (scope == MP_INTERP_SCOPE_REQUEST)) {
[...]
}
else {
p = r->pool;
get_interp(p);
}
I would like to add some kind of handler before mod_perl selects an
interpreter to process a request and then select an interpreter to
assign it to the request myself, based on different criteria included in
the request.
But I'm having trouble understanding whether such a handler can exist at
all, or if everything regarding a request is already processed by an
interpreter mod_perl has selected.
Additionally, I can see the APR::Pool API, but it doesn't seem to provide
the capability to set some user data on the current pool object, which
is what mod_perl reads via "get_interp".
Could anyone help me on that? Thanks!
A bit on the background: I have a dir structure in cgi-bin like the following:
cgi-bin
software1
customer1
*.cgi
*.pm
customer2
*.cgi
*.pm
software2
customer1
*.cgi
*.pm
customer2
*.cgi
*.pm
Each customer uses a private copy of the software, and the software is using itself; e.g. software1 of customer1 may talk to software2 of customer1 by loading some special client libs of software2 into its own Perl interpreter. To make things more complicated, software2 may even bring general/common parts of software1 with its own private installation by using svn:external. So I have a lot of the same software with the same Perl packages in one VHOST, and I can't guarantee that all of those private installations always have the same version level.
It's quite a mix-up, but one which is known to work under the rules we have within the same Perl interpreter.
But now comes mod_perl: it clones interpreters as needed and reuses them for requests into whichever subdirectory of cgi-bin it likes, and in this case things will break, because suddenly an interpreter that has already processed software1 of customer1 should now process software2 of customer2, which uses common packages of software1; those packages were already loaded by the Perl interpreter and are used because of %INC, instead of the private packages of software2, and so on...
Yes, there are different ways to deal with that, like VHOSTs and subdomains per software or customer or whatever, but I would like to check different ways of keeping one VHOST and the current directory structure, just by using what mod_perl or Apache httpd provides. And one way would be if I could tell mod_perl to always use the same Perl interpreter for requests to the same directory. This way mod_perl would create its pool of interpreters and I would be responsible for selecting one of them per directory.
What I've learned so far is that it's not easily possible to influence mod_perl's decision about the selected interpreter. If one wants to, it seems one would need to patch mod_perl at the C level or provide one's own C handler for httpd as a hook to run before mod_perl. In the end mod_perl is only a combination of handlers for httpd itself, so placing one in front of it to do some special things is possible. It's another question whether it's wise to do so, because one would have to deal with some mod_perl internals, like the fact that no interpreter is available at that point, and in my case I would also need a map somewhere from directories to their associated interpreters.
In the end it's not that easy and I don't want to patch mod_perl or start on low level C for a httpd handler/hook.
For documentation purposes I want to mention two possible workarounds which came into my mind:
Pool of Perl threads
The problem with the current mod_perl approach in my case is that it clones Perl interpreters at a low level in C, and those are run by threads provided by a pool of httpd; every thread can run any interpreter at any given time, unless it's already in use by another thread. With this approach it seems impossible to access the interpreters from within Perl itself without using low-level XS as well; in particular, it's not possible to manage the interpreters and threads with Perl's threads API, simply because they aren't Perl threads, they are Perl interpreters executed by httpd threads. In the end both behave the same, though, because during creation of Perl threads the current interpreter is cloned as well and associated OS threads are created and so on. But while using Perl's threads you have more influence over the shared data and such.
So a workaround for my current problem could be to not let mod_perl and its interpreters process a request, but instead create an own pool of Perl threads directly while starting up in a VHOST, using PerlModule or such. Those threads can be managed entirely within Perl; one could create some queues to dispatch work in the form of absolute paths to the requested CGI applications and such. Besides the thread pool itself, a handler would be needed which would get called instead of e.g. ModPerl::Registry to function as a dispatcher: it would need to decide, based on some criteria, which thread to use and put the requested path into its queue, and the thread itself could ultimately e.g. just create new instances of ModPerl::Registry to process the given file. Of course there would be some glue needed here and there...
There are some downsides to this approach, of course: it sounds like a fair amount of work, it duplicates some of the functionality already implemented by mod_perl, especially regarding pool maintenance, and it doubles the number of threads and the amount of memory used, because the mod_perl interpreters and threads would only be used to execute the dispatcher handler, while additional threads and interpreters would process the requests within the Perl threads. The number of threads shouldn't be a huge problem, though, since the mod_perl ones would just sleep and wait for the Perl thread to finish its work.
@INC hook with source code changes
Another, and I guess easier, approach would be to make use of @INC hooks for Perl's require, in combination again with an own mod_perl handler extending ModPerl::Registry. The key point is that the handler is the first place where the requested file is read, and its source can be changed before compiling it. Every
use XY::Z;
XY::Z->new(...);
could be changed to
use SomePrefix::XY::Z;
SomePrefix::XY::Z->new(...);
where SomePrefix would simply be the full path of the parent directory of the requested file, changed to be a valid Perl package name. ModPerl::Registry already does something similar while transforming a requested CGI script automatically into a mod_perl handler, so this works in general, and ModPerl::Registry already provides some logic to generate the package name and such. The change means that Perl won't find the packages automatically anymore, simply because they don't exist under the new name in a place known to Perl; that's where the @INC hook applies.
The hook is responsible for recognizing such changed packages, simply because of the name of SomePrefix, or a marker prefix in front of SomePrefix, or whatever, and mapping those to a file in the file system, providing a handle to the requested file which Perl can load during "require". Additionally, the hook will provide a callback which gets called by Perl for each line of the file read and functions as a source code filter, again changing each "package", "use" or "require" statement to have SomePrefix in front of it. This again results in the hook being responsible for providing file handles to those packages etc.
The key point here is changing the source code once, at runtime: instead of "XY/Z.pm", which Perl would normally require, which is available n times in my directory structure, and which would be saved as "XY/Z.pm" in %INC, one lets Perl require "SomePrefix/XY/Z.pm"; that is stored in %INC and is unique for every Perl interpreter used by mod_perl, because SomePrefix reflects the unique installation directory of the requested file. There's no room anymore for Perl to think it has already loaded XY::Z just because it processed a request from another directory before.
Of course this only works for simple "use ...;" statements; things like "eval("require $package");" will make things a bit more complicated.
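For reference, a minimal sketch of such an @INC hook; the base directory, prefix name and rewriting regex are illustrative only, and a real implementation would need the per-directory mapping described above:

    use strict;
    use warnings;

    # Hypothetical installation directory and package prefix for one customer.
    my $base   = '/var/www/cgi-bin/software1/customer1';
    my $prefix = 'SomePrefix';

    unshift @INC, sub {
        my (undef, $relpath) = @_;                 # e.g. "SomePrefix/XY/Z.pm"
        return unless $relpath =~ m{^\Q$prefix\E/(.+)\z};

        my $real = "$base/$1";                     # the file as it exists on disk
        return unless -f $real;
        open my $fh, '<', $real or return;

        # Source filter, called once per line read from $fh with the line in $_:
        # prefix package/use/require statements so each copy of the code gets
        # its own, unique entry in %INC.
        my $filter = sub {
            s/\b(package|use|require)\s+XY::/$1 ${prefix}::XY::/g;
            return length($_) ? 1 : 0;
        };

        return ($fh, $filter);
    };

    # With the hook installed, "require SomePrefix::XY::Z;" reads XY/Z.pm from
    # $base and stores it in %INC as "SomePrefix/XY/Z.pm".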
Comments welcome... :-)
"PerlOptions" is a DIR scoped variable, not limited to VirtualHosts, so new interpreter pools can be created for any location. These directive can even be placed in .htaccess files for ease of configuration, but something like this in your httpd.conf should give you the desired effect:
<Location /cgi-bin/software1/customer1>
PerlOptions +Parent
</Location>
<Location /cgi-bin/software1/customer2>
PerlOptions +Parent
</Location>
<Location /cgi-bin/software2/customer1>
PerlOptions +Parent
</Location>
<Location /cgi-bin/software2/customer2>
PerlOptions +Parent
</Location>
I'm trying to write a program that automatically sets process priorities based on a configuration file (basically path - priority pairs).
I thought the best solution would be a kernel module that replaces the execve() system call. Too bad, the system call table isn't exported in kernel versions > 2.6.0, so it's not possible to replace system calls without really ugly hacks.
I do not want to do the following:
-Replace binaries with shell scripts that start and renice the binaries.
-Patch/recompile my stock Ubuntu kernel
-Do ugly hacks like reading kernel executable memory and guessing the syscall table location
-Polling of running processes
I really want to be:
-Able to control the priority of any process based on its executable path and a configuration file. Rules apply to any user.
Does anyone of you have any ideas on how to complete this task?
If you've settled for a polling solution, most of the features you want to implement already exist in the Automatic Nice Daemon. You can configure nice levels for processes based on process name, user and group. It's even possible to adjust process priorities dynamically based on how much CPU time it has used so far.
Sometimes polling is a necessity, and even the better option in the end -- believe it or not. It depends on a lot of variables.
If the polling overhead is low enough, it is far preferable to the added complexity, cost, and RISK of developing your own kernel hooks to get notified of the changes you need. That said, when hooks or notification events are available, or can easily be injected, they should certainly be used if the situation calls for it.
This is classic programmer 'perfection' thinking. As engineers, we strive for perfection. This is the real world though and sometimes compromises must be made. Ironically, the more perfect solution may be the less efficient one in some cases.
I develop a similar 'process and process priority optimization automation' tool for Windows called Process Lasso (not an advertisement, it's free). I had a similar choice to make and have a hybrid solution in place. Kernel-mode hooks are available for certain process-related events in Windows (creation and destruction), but not only are they not exposed at user mode, they also aren't helpful for monitoring other process metrics. I don't think any OS is going to natively inform you of every change to every process metric. The overhead for that many different hooks might be much greater than simple polling.
Lastly, considering the HIGH frequency of process changes, it may be better to handle all changes at once (polling at interval) vs. notification events/hooks, which may have to be processed many more times per second.
You are RIGHT to stay away from scripts. Why? Because they are slow(er). Of course, the linux scheduler does a fairly good job at handling CPU bound threads by downgrading their priority and rewarding (upgrading) the priority of I/O bound threads -- so even in high loads a script should be responsive I guess.
There's another point of attack you might consider: replace the system's dynamic linker with a modified one which applies your logic. (See this paper for some nice examples of what's possible from the largely neglected art of linker hacking).
Where this approach will have problems is with purely statically linked binaries. I doubt there's much on a modern system which actually doesn't link something dynamically (things like busybox-static being the obvious exceptions, although you might regard the ability to get a minimal shell outside of your controls as a feature when it all goes horribly wrong), so this may not be a big deal. On the other hand, if the priority policies are intended to bring some order to an overloaded shared multi-user system then you might see smart users preparing static-linked versions of apps to avoid linker-imposed priorities.
Sure, just iterate through /proc/nnn/exe to get the pathname of the running image. Only use the ones with slashes, the others are kernel procs.
Check to see if you have already processed that one, otherwise look up the new priority in your configuration file and use renice(8) to tweak its priority.
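A rough sketch of that loop in Perl, assuming a hard-coded path-to-priority map standing in for the configuration file:

    use strict;
    use warnings;

    # Stand-in for the configuration file: executable path => nice value.
    my %priority_for = (
        '/usr/bin/backup-job'       => 19,
        '/usr/local/bin/render-all' => 10,
    );
    my %seen;   # PIDs already checked

    while (1) {
        for my $exe_link (glob '/proc/[0-9]*/exe') {
            my ($pid) = $exe_link =~ m{^/proc/(\d+)/};
            next if $seen{$pid}++;

            # Kernel threads have no readable exe symlink, so readlink fails there.
            my $path = readlink $exe_link or next;
            next unless exists $priority_for{$path};

            system 'renice', '-n', $priority_for{$path}, '-p', $pid;
        }
        sleep 5;   # simple polling interval
    }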
If you want to do it as a kernel module then you could look into making your own binary loader. See the following kernel source files for examples:
$KERNEL_SOURCE/fs/binfmt_elf.c
$KERNEL_SOURCE/fs/binfmt_misc.c
$KERNEL_SOURCE/fs/binfmt_script.c
They can give you a first idea where to start.
You could just modify the ELF loader to check for an additional section in ELF files and when found use its content for changing scheduling priorities. You then would not even need to manage separate configuration files, but simply add a new section to every ELF executable you want to manage this way and you are done. See objcopy/objdump of the binutils tools for how to add new sections to ELF files.
Does anyone of you have any ideas on how to complete this task?
As an idea, consider using apparmor in complain-mode. That would log certain messages to syslog, which you could listen to.
If the processes in question are started by executing an executable file with a known path, you can use the inotify mechanism to watch for events on that file. Executing it will trigger an IN_OPEN and an IN_ACCESS event.
Unfortunately, this won't tell you which process caused the event to trigger, but you can then check which of the /proc/*/exe symlinks point to the executable file in question and renice the process ID in question.
E.g. here is a crude implementation in Perl using Linux::Inotify2 (which, on Ubuntu, is provided by the liblinux-inotify2-perl package):
perl -MLinux::Inotify2 -e '
    use warnings;
    use strict;
    my $x = shift(@ARGV);            # path of the executable to watch
    my $w = Linux::Inotify2->new;
    $w->watch($x, IN_ACCESS, sub
    {
        # Find the PID(s) whose running image is the watched file and run
        # the remaining command-line arguments with the PID appended.
        for (glob("/proc/*/exe"))
        {
            if (-r $_ && readlink($_) eq $x && m#^/proc/(\d+)/#)
            {
                system(@ARGV, $1)
            }
        }
    });
    1 while $w->poll
' /bin/ls renice
You can of course save the Perl code to a file, say onexecuting, prepend a first line #!/usr/bin/env perl, make the file executable, put it on your $PATH, and from then on use onexecuting /bin/ls renice.
Then you can use this utility as a basis for implementing various policies for renicing executables (or doing other things).