Multithreading/Parallel Bash Scripts in Unix Environment

I have multiple bash scripts that I have tried to "parallelize" within a master bash script.
Bash Script:
#!/bin/bash
SHELL=/bin/bash
bash /home/.../a.sh &    # first batch, launched in the background
bash /home/.../b.sh &
wait                     # barrier: block until a.sh and b.sh both finish
bash /home/.../c.sh &    # second batch
bash /home/.../d.sh &
bash /home/.../e.sh &
wait                     # barrier: block until c.sh, d.sh and e.sh finish
echo "Done paralleling!"
exit 0
I have run the script normally (without the ampersands) and with the ampersands, and I am not seeing any appreciable difference in processing time, which leads me to believe that something may not be coded correctly or in the most efficient way.

In classic computer-science theory, resource-contention is referred to as "thrashing."
(In the good ol' days, when a 5-megabyte disk drive might be the size of a small washing machine, we used to call it "Maytag Mode," since the poor thing looked like a Maytag washing-machine on the "spin" cycle!)
If you graph the performance curve caused by contention, it slopes upward and then abruptly hits an "elbow": it goes straight up, exponentially. We call that "hitting the wall."
An interesting thing to fiddle around with on this script (if you're just curious ...) is to put wait statements at several places. (Be sure you're doing this correctly ...) Allow, say, two instances to run, wait for them all to complete, then run the next two, and so on; a sketch follows below. If that's usefully faster, try batches of three, and so on. You may find a "sweet spot."
Or ... not. (Don't spend too much time with this. It doesn't look like it's going to be worth it.)
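A minimal sketch of that batching idea (the paths and the batch size are illustrative):

#!/bin/bash
# Run the scripts in batches of BATCH, waiting for each batch to finish
# before launching the next. Vary BATCH to hunt for the sweet spot.
BATCH=2
scripts=(/home/user/a.sh /home/user/b.sh /home/user/c.sh /home/user/d.sh /home/user/e.sh)

for ((i = 0; i < ${#scripts[@]}; i += BATCH)); do
    for s in "${scripts[@]:i:BATCH}"; do
        bash "$s" &    # launch up to BATCH scripts in the background
    done
    wait               # barrier: block until this whole batch is done
done
echo "Done paralleling!"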

You're likely correct. The thing with parallelism is that it allows you to use multiple resources in parallel. That improves your speed if, and only if, a given resource is your limiting factor.
So, for example, if you're reading from a disk, odds are good that the act of reading from the disk is what's limiting you, and doing more in parallel doesn't help; indeed, because of contention it can slow the process down. (The disk has to seek back and forth to service multiple processes, rather than just 'getting on with it' and serialising one read.)
So it really does boil down to what your script actually does and why it's slow. And the best way of checking that is by profiling it.
At a basic level, something like truss or strace might help.
e.g.
strace -fTtc /home/../e.sh
And see what types of system calls are being made, and how much of the total time they're consuming.
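(Here -f follows forked child processes and -c prints a summary table of time, calls and errors per syscall on exit; -T and -t add per-call durations and timestamps to the full trace if you drop -c.)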

Related

Sleep() Methods and OS - Scheduler (Camunda/Groovy)

I've got a question for you and it's not as specific as usual, which could make it a little annoying to answer.
The tool I'm working with is Camunda in combination with Groovy scripts, and the goal is to reduce the maximum CPU load (or peak load). I'm doing this by "stretching" the workload over a certain time frame, since the platform seems to be unhappy with huge workload inputs in a short amount of time. The resulting problem is that Camunda won't react smoothly when someone tries to operate it at the UI level.
So I wrote a small script which basically just lets each individual process determine its own "time to sleep" before running, if a certain threshold is exceeded. This is based on how many processes are trying to run at the same time as the individual process.
It looks like:
Process wants to start -> Process asks how many other processes are running ->
waitingTime = numberOfProcesses * timeToSleep * iterationOfMeasures
[Figure: CPU usage over time. Curves 1 and 3: without the script. Curves 2 and 4: with the script.]
Testing it, I saw that I could stretch the workload and smooth out the UI response. But now I need to describe exactly why this works.
The questions are:
What does a sleep method do, exactly?
What does the sleep method do at the CPU level?
How does an OS scheduler react to a sleep method?
Namely: does the scheduler reschedule, or simply "wait" for the given time?
How can I recreate and test the questions given above?
The main goal is not for you to answer these, but could you give me a hint for finding the right literature to answer them? Maybe you remember a book which helped you understand this kind of thing, or something a professor recommended to you. (Mine won't answer, and I can't blame him.)
I'm grateful for hints and/or recommendations!
I'm sure you could use a timer event:
https://docs.camunda.org/manual/7.15/reference/bpmn20/events/timer-events/
It allows you to postpone the next task trigger for some time, defined by an expression.
About sleep in Java/Groovy: https://www.javamex.com/tutorials/threads/sleep.shtml
Using sleep blocks the current thread in Groovy/Java/Camunda, so instead of doing something useful the thread is simply blocked.
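A quick shell-level way to see what sleep costs at the CPU level (the same principle applies to Thread.sleep on the JVM: the scheduler takes the sleeping task off the run queue until its timer expires, instead of busy-waiting):

# a sleeping process accrues almost no CPU time: the kernel parks it
# until the timer fires, then makes it runnable again
time sleep 2    # expect ~2 s of real time but near-zero user and sys time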

Do not use all MATLAB pool workers

I have set up a local Matlab (R2015b) pool of workers according to my CPU configuration (quad-core, multi-threading => 8 workers in total.)
I have simulations that last 24h but I want to be able to use my computer at the same time. Therefore, I limit myself to 4 simulations a day (sent via batch) so that I can keep working at the same time.
My question is this: how can I queue several jobs without eating up the 8 workers? Another related question is if I reduce the size of the pool to 4 workers, will I still be able to run Matlab smoothly?
Thank you very much for your answer.
I would say that the best solution to your problem is to do it via bash instead of MATLAB. In bash you have a command called nice which allows you to lower the priority of the simulation. That means that if you are using the computer you will get the power, and if you are not using it, the power goes to the computations.
Regarding the second part of your question: the easiest way to queue all the jobs is to make a bash script, something like the following:
for f in $(find . -name 'name_of_matlab_script*')
do
    nice -n 10 matlab -nodisplay < "$f"
done
where the MATLAB scripts would all share the same base name and the star (the glob) takes care of the rest. The scripts then run one after another, while priority goes to whatever else you use your computer for.
If you want more advanced scheduling software, I normally use Slurm.
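For reference, a minimal Slurm batch script could look like this (the job name, time limit and script name are illustrative):

#!/bin/bash
#SBATCH --job-name=matlab-sim    # illustrative job name
#SBATCH --cpus-per-task=1        # one core per simulation
#SBATCH --time=24:00:00          # matches the 24 h runs mentioned above
matlab -nodisplay < name_of_matlab_script1.m

You would submit it with sbatch and let Slurm queue and schedule the jobs.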
Regarding 4 workers instead of 8: as Ander Biguri says in the comments, use as few as possible, as long as you do not add too much extra time.

Puzzled by the cpushare setting on Docker.

I wrote a test program 'cputest.py' in python like this:
import time

while True:
    for _ in range(10120 * 40):
        pass
    time.sleep(0.008)
This costs about 80% CPU when running in a container (without interference from other running containers).
Then I ran this program in two containers by the following two commands:
docker run -d -c 256 --cpuset=1 IMAGENAME python /cputest.py
docker run -d -c 1024 --cpuset=1 IMAGENAME python /cputest.py
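(For context: -c sets the relative cpu-shares weight, 256 vs. 1024, and --cpuset=1 pins both containers to the same core so that they actually contend; in later Docker versions that flag is spelled --cpuset-cpus.)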
and used top to view their CPU usage. It turned out that they used roughly 30% and 67% CPU respectively. I'm pretty puzzled by this result. Would anyone kindly explain it for me? Many thanks!
I sat down last night and tried to figure this out on my own, but ended up not being able to explain the 70 / 30 split either. So, I sent an email to some other devs and got this response, which I think makes sense:
I think you are slightly misunderstanding how task scheduling works, which is why the maths doesn't work. I'll try and dig out a good article, but at a basic level the kernel assigns slices of time to each task that needs to execute, allocating the slices according to the tasks' priorities.
So with those priorities and tight-looped code (no sleep), the kernel assigns 4/5 of the slots to a and 1/5 to b, since 1024 : 256 is a 4 : 1 weight ratio. Hence the 80/20 split.
However when you add in sleep it gets more complex. Sleep basically tells the kernel to yield the current task and then execution will return to that task after the sleep time has elapsed. It could be longer than the time given - especially if there are higher priority tasks running. When nothing else is running the kernel then just sits idle for the sleep time.
But when you have two tasks the sleeps allow the two tasks to interweave. So when one sleeps the other can execute. This likely leads to a complex execution which you can't model with simple maths. Feel free to prove me wrong there!
I think another reason for the 70/30 split is the way you are producing "80% load". The numbers you have chosen for the loop and the sleep just happen to work out on your PC with a single task executing. You could try moving the loop to be based on elapsed time: loop for 0.8 s, then sleep for 0.2 s. That might give you something closer to 80/20, but I don't know.
So in essence, your time.sleep() call is skewing your expected numbers; removing the time.sleep() brings the CPU split far closer to what you'd expect.
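A sketch of that time-based variant, written here as a shell loop just to illustrate the duty-cycle idea (the original test program is Python; GNU date is assumed for nanosecond timestamps):

#!/bin/bash
# Busy-wait for ~0.8 s of wall-clock time, then sleep 0.2 s, so the duty
# cycle depends on elapsed time rather than on a fixed iteration count.
while :; do
    end=$(( $(date +%s%N) + 800000000 ))    # now + 0.8 s, in nanoseconds
    while (( $(date +%s%N) < end )); do :; done
    sleep 0.2
done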

Waiting on many parallel shell commands with Perl

Concise-ish problem explanation:
I'd like to be able to run multiple (say, a few hundred) shell commands, each of which starts a long-running process and blocks for hours or days with at most a line or two of output (each command is simply a job submission to a cluster). The blocking is helpful so I can know exactly when each finishes, because I'd like to investigate each result and possibly re-run each one multiple times in case it fails. My program will act as a sort of controller for these programs.
for all commands in parallel {
    submit_job_and_wait()
    tries = 1
    while ! job_was_successful and tries < 3 {
        resubmit_with_extra_memory_and_wait()
        tries++
    }
}
What I've tried/investigated:
I was so far thinking it would be best to create a thread for each submission which just blocks waiting for input. There is enough memory for quite a few waiting threads. But from what I've read, Perl threads are closer to duplicated processes than threads in other languages, so creating hundreds of them is not feasible (nor does it feel right).
There also seem to be a variety of event-loop-ish cooperative systems like AnyEvent and Coro, but these seem to require you to rely on asynchronous libraries, otherwise you can't really do anything concurrently, and I can't figure out how to run multiple shell commands with them. I've tried using AnyEvent::Util::run_cmd, but after I submit multiple commands, I have to specify the order in which I want to wait for them. I don't know in advance how long each submission will take, so I can't recv without sometimes getting very unlucky. This isn't really parallel.
my $cv1 = run_cmd("qsub -sync y 'sleep $RANDOM'");
my $cv2 = run_cmd("qsub -sync y 'sleep $RANDOM'");
# Now should I $cv1->recv first or $cv2->recv? Who knows!
# Out of 100 submissions, I may have to wait on the longest one before processing any.
My understanding of AnyEvent and friends may be wrong, so please correct me if so. :)
The other option is to run the job submission in its non-blocking form and have it communicate its completion back to my process, but the inter-process communication required to accomplish and coordinate this across different machines daunts me a little. I'm hoping to find a local solution before resorting to that.
Is there a solution I've overlooked?
You could instead use scientific workflow software such as FireWorks or Pegasus, which are designed to help scientists submit large numbers of computing jobs to shared or dedicated resources. They can also do much more, so they might be overkill for your problem, but they are still worth a look.
If your goal is to find the tightest memory requirements for your job, you could also simply submit your job with a large amount of requested memory, and then extract the actual memory usage from accounting (qacct) or, cluster policy permitting, log on to the compute node(s) where your job is running and view the memory usage with top or ps.
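For example, with Grid Engine accounting enabled you can read a finished job's peak memory usage out of qacct (the job ID is illustrative):

# maxvmem in the accounting record is the job's memory high-water mark
qacct -j 1234567 | grep maxvmem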

Should I use C++ or a script for a daemon process?

I need to implement a daemon that extracts data from a database, loads the data into memory, and, according to this data, performs actions like sending emails or writing/updating files. These actions need to be performed every 30 minutes.
I really don't know what to decide: compile a C++ program that will do the task, or use scripts and miscellaneous Linux tools (sed/awk)?
What would be the fastest way to do this, to save CPU and memory?
The dilemma is about maintaining this process: if it's a script it needs no compilation and I can just drop it onto any Linux/Unix machine, but if it's native that's harder.
What do you think?
Use cron(1) to start your program every 30 minutes.
So-called scripting languages will definitely enable you to write your program more quickly than C++. But doing this with shell and sed and/or awk, while definitely possible, is very difficult when you have to cope with all the corner cases, particularly regarding string escaping (think quotes, "&"s, ";"s…).
I suggest you go with a more full-featured "scripting" language such as Perl or Python.
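For the scheduling side, a single crontab entry is enough (the script path is illustrative):

# run every 30 minutes; append output to a log so failures leave a trace
*/30 * * * * /usr/local/bin/process_accounts.py >> /var/log/process_accounts.log 2>&1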
Why are you trying to save CPU & Memory? Are you absolutely sure this is a real requirement (or just "premature optimization")?
Unless performance is critical, there's absolutely no reason to code such a thing in C++. It seems to be a sort of maintenance process (right?). I say write it in the highest level script language you know. Python or PHP seem like good candidates. Even if you don't know these languages, it would still take you less time to familiarize yourself with them than it would take you to do it in C++.
I'd go with a Python/Perl/Ruby implementation with a cron entry to schedule the script to run every 30 minutes.
If performance becomes an issue you can add a column to your DB that tracks the last time you ran the calculations for each account, and then split the processing of your records into groups of 2 or 3 or 4, running them every 15, 10, or 5 minutes respectively.
If, after splitting your calculations into groups, you still have performance demands, then consider C++/C/Java.
I'd still run this using cron though. No need to be a daemon unless you are providing on-demand services.
