How to run parallel fork as single thread in perl? - multithreading

I was trying to check response messages in a Perl program which takes requests through the Amazon API and returns responses. How can I run a parallel fork as a single thread in Perl? I'm using the LWP::UserAgent module and I want to debug the HTTP requests.

As a word of warning - threads and forks are different things in perl. Very different.
However, the long and short of it is - you can't, at least not trivially - a fork is a separate process. Forking actually happens when you run -any- external command in perl; it's just that by default perl sits and waits for that command to finish and return output.
However, if you've got access to the code, you can amend it to run single threaded - sometimes that's as simple as reducing the parallelism with a config parameter. (That's quite common, in fact - debugging parallel code is a much more complicated task than debugging sequential code, so getting it working before running in parallel is really important.)
You might be able to embed a waitpid into your primary code so you've only got one thing running at once. Without a code example though, it's impossible to say for sure.
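A minimal sketch of the waitpid idea, assuming the work can be wrapped in a simple fork: the parent forks one child at a time and blocks until it exits before starting the next.

use strict;
use warnings;

my @jobs = ('first task', 'second task', 'third task');   # placeholder work items

for my $job (@jobs) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: do the work, then exit
        print "child $$ handling: $job\n";
        exit 0;
    }

    # parent: wait for this child before forking the next one,
    # so only one thing ever runs at once
    waitpid($pid, 0);
    print "child $pid exited with status ", $? >> 8, "\n";
}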

Related

PyO3 - prevent user submitted code from looping and blocking server thread

I'm writing a game in Rust where each player can submit some python scripts to the server in order to automate various tasks in the game. I plan on using pyo3 to run the python from rust.
However, I can see an issue arising if a player submits a script like this:
def on_event(e):
    while True:
        pass
Now when the server calls the function (using something like PyAny::call1()) the thread will hang as it reaches the infinite loop.
My first thought was to have pyo3 execute the python one statement at a time, therefore being able to exit if the script has been running for over a certain threshold, but I don't think pyo3 supports this.
My next idea was to give each player their own thread to run their own scripts on, that way if one of their scripts got stuck it only affected their gameplay. However, I still have the issue of not being able to kill a thread when it gets stuck in an infinite loop - if a lot of players submitted scripts that just looped, lots of threads would start using a lot of CPU time.
All I need is a way to execute python scripts such that if one of them does loop, it does not affect the server's performance at all.
Thanks :)
One solution is to restrict the time that you give each user script to run.
You can do it via PyThreadState_SetAsyncExc; see here for some code. It uses the interpreter's C API, which you can probably access from Rust (with PyO3 FFI magic).
Another way would be to do it at the OS level: spawn a process for the user script, and then kill it when it runs for too long. This might be more secure if you limit what the process can access (with some OS calls), but it requires some boilerplate for communication between the host and the child process.
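The question is Rust-specific, but the OS-level idea is language-agnostic; a minimal Perl sketch of running the work in a child process and killing it after a time limit (the sleep stands in for a runaway user script):

use strict;
use warnings;
use POSIX ":sys_wait_h";

my $timeout = 5;    # seconds the child is allowed to run

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # child: stand-in for the user-submitted script
    sleep 60;       # pretend to loop forever
    exit 0;
}

# parent: poll until the child exits or the time limit is hit
my $elapsed = 0;
while (waitpid($pid, WNOHANG) == 0) {
    if ($elapsed >= $timeout) {
        kill 'KILL', $pid;       # hard-stop the runaway child
        waitpid($pid, 0);        # reap it
        last;
    }
    sleep 1;
    $elapsed++;
}

print "child exited with status ", $? >> 8, "\n";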

Run multiple copies of Speedy or PersistentPerl to be called from Tomcat

I have a modern webapp running under Tomcat, which often needs to call some legacy perl code to get some results. Right now, we wrap these in a call to Runtime.getRuntime().exec() which is working fine.
However, as the webapp gets busier we are noticing that often the perl is timing out and we need to control this.
I am using commons-pool to ensure that only X number of copies can be run at a time, and threads will queue up nicely for a perl instance when they need one, timing out after Y seconds and returning an error (this is fine, the client will just retry).
However we still have the problem that Perl takes a long time to start up, interpret the script, execute and return. At busy times we are doing this 30-50 times per second. It's a beefy machine but it's starting to struggle.
I have read up on Speedy and PersistentPerl and am considering holding open a copy of this in memory for each object in my pool, so that we do not need to open and close the Perl each time.
Is this a good idea? Any tips for how to go about doing this?
Those approaches should reduce the overhead from the start up time of your script. If the script is something that can be run as a CGI program then you might be better off making it work with Plack and running it with a PSGI server. Your Tomcat application could collect and send the request parameters to your script and/or "web application" running in the background.
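A minimal sketch of the Plack/PSGI idea, assuming the legacy logic can be factored into a run_legacy_report() function (hypothetical name); save it as app.psgi, start it with plackup, and have Tomcat send the request parameters over HTTP instead of exec()ing perl each time:

use strict;
use warnings;
use Plack::Request;

# stand-in for the legacy Perl logic
sub run_legacy_report {
    my ($params) = @_;
    return "params: " . join(', ', map { "$_=$params->{$_}" } sort keys %$params) . "\n";
}

my $app = sub {
    my $env = shift;
    my $req = Plack::Request->new($env);

    # the interpreter start-up cost is paid once per worker, not once per request
    my $result = run_legacy_report($req->parameters->as_hashref);

    return [ 200, [ 'Content-Type' => 'text/plain' ], [ $result ] ];
};

$app;   # plackup uses the last expression in the .psgi file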

Waiting on many parallel shell commands with Perl

Concise-ish problem explanation:
I'd like to be able to run multiple (we'll say a few hundred) shell commands, each of which starts a long running process and blocks for hours or days with at most a line or two of output (this command is simply a job submission to a cluster). This blocking is helpful so I can know exactly when each finishes, because I'd like to investigate each result and possibly re-run each multiple times in case they fail. My program will act as a sort of controller for these programs.
for all commands in parallel {
    submit_job_and_wait()
    tries = 1
    while ! job_was_successful and tries < 3 {
        resubmit_with_extra_memory_and_wait()
        tries++
    }
}
What I've tried/investigated:
I was so far thinking it would be best to create a thread for each submission which just blocks waiting for input. There is enough memory for quite a few waiting threads. But from what I've read, perl threads are closer to duplicate processes than in other languages, so creating hundreds of them is not feasible (nor does it feel right).
There also seem to be a variety of event-loop-ish cooperative systems like AnyEvent and Coro, but these seem to require you to rely on asynchronous libraries, otherwise you can't really do anything concurrently. I can't figure out how to make multiple shell commands with it. I've tried using AnyEvent::Util::run_cmd, but after I submit multiple commands, I have to specify the order in which I want to wait for them. I don't know in advance how long each submission will take, so I can't recv without sometimes getting very unlucky. This isn't really parallel.
my $cv1 = run_cmd("qsub -sync y 'sleep $RANDOM'");
my $cv2 = run_cmd("qsub -sync y 'sleep $RANDOM'");
# Now should I $cv1->recv first or $cv2->recv? Who knows!
# Out of 100 submissions, I may have to wait on the longest one before processing any.
My understanding of AnyEvent and friends may be wrong, so please correct me if so. :)
The other option is to run the job submission in its non-blocking form and have it communicate its completion back to my process, but the inter-process communication required to accomplish and coordinate this across different machines daunts me a little. I'm hoping to find a local solution before resorting to that.
Is there a solution I've overlooked?
You could instead use scientific workflow software such as fireworks or pegasus, which are designed to help scientists submit large numbers of computing jobs to shared or dedicated resources. They can also do much more, so they might be overkill for your problem, but they are still worth having a look at.
If your goal is to find the tightest memory requirements for your job, you could also simply submit the job with a large amount of requested memory, then extract the actual memory usage from accounting (qacct), or, cluster policy permitting, log on to the compute node(s) where your job is running and view the memory usage with top or ps.
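On the AnyEvent point raised in the question: run_cmd's condition variables also accept per-command callbacks, so you don't have to pick a recv order up front. A minimal sketch, with handle_result() as a hypothetical stand-in for the investigate-and-resubmit logic:

use strict;
use warnings;
use AnyEvent;
use AnyEvent::Util qw(run_cmd);

my @cmds = map { "sleep $_" } 3, 1, 2;      # placeholder commands
my $all_done = AnyEvent->condvar;

for my $cmd (@cmds) {
    $all_done->begin;
    run_cmd($cmd)->cb(sub {
        my $exit = shift->recv;             # exit status of this command
        handle_result($cmd, $exit);         # runs as soon as *this* job ends
        $all_done->end;
    });
}

$all_done->recv;                            # returns after every callback has run

sub handle_result {
    my ($cmd, $exit) = @_;
    print "'$cmd' finished with exit status ", $exit >> 8, "\n";
}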

Is it OK to use a non-zero return code for a process that executed successfully?

I'm implementing a simple job scheduler, which spawns a new process for every job to run. When a job exits, I'd like it to report the number of actions executed to the scheduler.
The simplest way I could find is to exit with the number of actions as the return code. The process would, for example, exit with return code 3 for "3 actions executed".
But since the standard (AFAIK) is to use return code 0 when a process exited successfully, and any other value when there was an error, would this approach risk creating any problems?
Note: the child process is not an executable script, but a fork of the parent, so not accessible from the outside world.
What you are looking for is inter process communication - and there are plenty of ways to do it:
Sockets
Shared memory
Pipes
Exclusive file descriptors (to some extent; rather go for something else if you can)
...
The return-code convention is not something a regular programmer should dare to violate.
The only risk is confusing a calling script. What you describe makes sense, since what you really want is the count. As Joe said, use negative values for failures, and you should consider including a --help option that explains the return values ... so you can figure out what this code is doing when you try to use it next month.
I would use logs for it: log the number of actions executed to the scheduler. This way you can also log datetimes and other extra info.
I would not change the return convention...
If the scheduler spawns a child and you are writing that code, you could also open a pipe per child, or a named pipe, or maybe unix domain sockets, and use that for inter process communication, writing the number of processed jobs there.
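A minimal sketch of that pipe approach: the forked child writes the number of actions it executed down a pipe and still exits with 0 for success (the hard-coded count stands in for the real work):

use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # child: keep the write end, do the work, report the count, exit 0
    close $reader;
    my $actions_executed = 3;            # stand-in for the real work
    print {$writer} "$actions_executed\n";
    close $writer;
    exit 0;                              # 0 still means "success"
}

# parent/scheduler: read the count from the pipe, the status from waitpid
close $writer;
chomp(my $count = <$reader> // 0);
close $reader;
waitpid($pid, 0);

printf "child exited with status %d, executed %d actions\n", $? >> 8, $count;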
I would stick with conventions, namely returning 0 for success, especially if your program is visible to or usable by other people; or at least document such decisions well.
Anyway, apart from conventions there are also standards.

Use full processing power with perl

I have a perl script which is running correctly, but it is only using one core of my two-core CPU. How can I make it utilise all cores?
I know that I can create threads using threads->new(); but how do I fit that into something like:
my $twig = XML::Twig::XPath->new(TwigRoots => { TrdCaptRpt => \&top_level });
$twig->parsefile($file);
where the subroutine is being called by something else.
The standard approach with Perl is to not try to use multiple cores with one invocation of the script, but instead to run jobs in parallel on separate cores.
Yes, you can use threading with Perl, but Perl's threading is (very) heavyweight. To avoid potential race conditions, when you spawn a thread Perl simply copies everything that it does not want to explicitly share. Therefore using threading is likely to be much slower than not.
You would need to modify the code of XML::Twig, and there is no canned answer for what would need to be done. If you find yourself having to run this script for multiple files, a better and very simple option is to write your script so it can handle more than one file at a time. You could do that with threads, or with a wrapper script that executes two copies of your script at the same time (perhaps with xargs?).
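For the multiple-files case, a minimal sketch using Parallel::ForkManager from CPAN (one option among several), with process_file() as a hypothetical stand-in for the XML::Twig work:

use strict;
use warnings;
use Parallel::ForkManager;

my @files = @ARGV;                         # files passed on the command line
my $pm    = Parallel::ForkManager->new(2); # at most 2 children: one per core

for my $file (@files) {
    $pm->start and next;                   # parent: loop on; child: fall through
    process_file($file);                   # child does the real work
    $pm->finish;                           # child exits here
}
$pm->wait_all_children;

sub process_file {
    my ($file) = @_;
    print "processing $file in pid $$\n";  # stand-in for the XML::Twig parse
}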
