Example code to check parallelism of HSL solver MA97 in IPOPT - multithreading

I'm working on solving non-linear optimization problems. Currently I'm evaluating different algorithms to find out which one fits my problem best. I'm using MATLAB 2020b on Ubuntu 20.04 LTS.
I currently have IPOPT with the HSL solvers up and running. My problem consists of a few hundred variables (~500 at the moment). Switching to MA97 didn't show any performance improvement. Is my problem perhaps too small? Nevertheless, I'd like to check whether the parallelism of MA97 (compared to e.g. MA27) is working properly, i.e. whether I compiled everything correctly.
Is there a sample problem with which I can verify that MA97 runs multi-threaded while MA27 does not?

Several approaches have been suggested:
Try to debug from Matlab into the native code and see what IPOPT is calling into. This approach is tricky because Matlab itself uses OpenMP.
Use the proc filesystem: if there are multiple subdirectories under /proc/self/task, the process is multi-threaded. This approach has the same issue as above (the Matlab backend will likely be using multi-threading itself).
Use environment variables to limit the number of OpenMP threads (OMP_THREAD_LIMIT) and check for performance changes. You will need to measure this difference specifically around the call to IPOPT, as, again, Matlab uses OpenMP for its own functionality.
Matlab has a built-in profiler:
% start profiling
profile on
% your code ...
% launch profile viewer
profile viewer
Also, the IPOPT logs may be helpful. If the solver is multi-threaded, there should be a difference between elapsed real time and CPU time, which scales with the degree of parallelism, i.e.
CPU time ≈ thread count × elapsed real time
This is a rough approximation which is only valid up to the point where you become resource-constrained on the number of threads.
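As an illustration of this heuristic (not specific to MATLAB or IPOPT), here is a minimal Python sketch comparing CPU time and wall time around a multi-threaded computation; it assumes NumPy is linked against a multi-threaded BLAS, which stands in for a parallel linear solver:
import time
import numpy as np

a = np.random.rand(3000, 3000)
b = np.random.rand(3000, 3000)

wall_start = time.perf_counter()
cpu_start = time.process_time()
_ = a @ b                                # typically runs on several BLAS threads
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start    # CPU time accumulates across all threads

print(f"wall: {wall:.2f} s, CPU: {cpu:.2f} s, ratio ~ {cpu / wall:.1f} threads")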

I hope you have already solved your problem, but I want to reply to help others. If you pass the option linear_solver ma97, IPOPT should use the HSL MA97 solver. I don't know how this can be done from MATLAB, but if you add an "ipopt.opt" file to the working directory, IPOPT will read this file and apply the specified options.
File content (no equals sign):
linear_solver ma97

Related

Multi-thread usage in Dymola slow down solution speed

Does using the multi-core functionality in Dymola 2020x always speed up the solution? My observation is that, using Advanced.ParallelizeCode=true for a model with ~23k DOF, compilation time is comparable to the single-threaded build, but solution time with the default solver is slower.
Any comments are appreciated!
Multi-core functionality of a single model does not always speed up execution.
There are a number of possible explanations:
There are so many dependencies that it isn't possible to parallelize at all. (Look at the translation log - this is fairly clear.)
It's only possible to parallelize a small part of the model. (Look at the translation log - this takes more time.)
The model uses many external functions (or FMUs), and by default Dymola treats them as critical sections. (See the release notes and manual on __Dymola_ThreadSafe and __Dymola_CriticalRegion.)
In versions before Dymola 2020x you might have to set the environment variable OMP_WAIT_POLICY=PASSIVE. (Shouldn't be needed in your version.)
Using decouple as described in https://www.claytex.com/tech-blog/decouple-blocks-and-model-parallelisation/ can help for the first two.
Note that an alternative to parallelization within the model is to parallelize a sweep of parameters (if that is your scenario). That is done automatically for parameter sweeps, and without any of these drawbacks.

concurrent.futures with multinest

What is the best Python-based MultiNest package that is optimized for multiprocessing with concurrent.futures?
I've had issues getting MultiNest to use all of my CPUs with anything but multiprocessing.Pool, but the Python MultiNest packages don't seem to be able to use that.
On the GitHub issues section for dynesty (one of the two most common pure-Python MultiNest implementations), we discussed this as well:
https://github.com/joshspeagle/dynesty/issues/100
There was not a very settled, final explanation, but the thought is that
(1) The cost function is not large enough to require all of the cores at once
(2) The bootstrap flag should be set to 0 to avoid bootstrapping; it's a trick implemented for speed that seems to be interfering.
I've used Nestle (github.com/kbarbary/nestle) and Dynesty (github.com/joshspeagle/dynesty); they both seem to have this problem no matter the complexity of the cost function.
I have had great success using PyMultiNest (github.com/JohannesBuchner/PyMultiNest), but it requires the Fortran version of MultiNest (github.com/JohannesBuchner/MultiNest), which is very difficult to install correctly -- you need to manually install OpenMPI. Both MultiNest and OpenMPI can have compiler issues depending on the OS, system, and configuration thereof.
I would suggest using PyMultiNest, except that it's so hard to install; using Dynesty and Nestle is trivial, but they have had this issue with full parallelization.
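For what it's worth, here is a minimal sketch of driving dynesty through concurrent.futures; the Gaussian log-likelihood and prior transform are placeholders standing in for a real model, and bootstrap=0 follows the suggestion above:
import numpy as np
from concurrent.futures import ProcessPoolExecutor
import dynesty

NDIM = 3

def loglike(x):
    # placeholder Gaussian log-likelihood
    return -0.5 * np.sum(x ** 2)

def prior_transform(u):
    # map the unit cube to [-10, 10] in each dimension
    return 20.0 * u - 10.0

if __name__ == "__main__":
    ncpu = 4
    with ProcessPoolExecutor(max_workers=ncpu) as pool:
        sampler = dynesty.NestedSampler(
            loglike, prior_transform, NDIM,
            pool=pool, queue_size=ncpu,  # number of likelihood calls dispatched to the pool at once
            bootstrap=0,                 # disable bootstrapping, as suggested above
        )
        sampler.run_nested(maxiter=2000)  # cap iterations so the toy run stays short
    sampler.results.summary()
Whether all cores actually stay busy still depends on how expensive loglike is relative to the overhead of shipping work to the pool.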

Time virtualisation on linux

I'm attempting to test an application which has a heavy dependency on the time of day. I would like to have the ability to execute the program as if it was running in normal time (not accelerated) but on arbitrary date/time periods.
My first thought was to abstract the time retrieval function calls with my own library calls which would allow me to alter the behaviour for testing but I wondered whether it would be possible without adding conditional logic to my code base or building a test variant of the binary.
What I'm really looking for is some kind of localised time domain, is this possible with a container (like Docker) or using LD_PRELOAD to intercept the calls?
I also saw a patch that enabled time to be disconnected from the system time using unshare(CLONE_NEWTIME), but it doesn't look like this got merged.
It seems like a problem that must have been solved numerous times before; is anyone willing to share their solution(s)?
Thanks
AJ
Whilst alternative solutions and tricks are great, I think you're severely overcomplicating a simple problem. It's completely common and acceptable to include certain command-line switches in a program for testing/evaluation purposes. I would simply include a command line switch like this that accepts an ISO timestamp:
./myprogram --debug-override-time=2014-01-01T12:34:56Z
Then at startup, if the switch is set, subtract it from the current system time to get an offset, and make a local apptime() function which corrects the output of the regular system time by that offset; call that everywhere in your code instead.
The big advantage of this is that anyone can reproduce your testing results without reading up on custom Linux tricks, including an external testing team or a future co-developer who's good at coding but not at runtime tricks. When (unit) testing, it's a major advantage to be able to just call your code with a simple switch and test the results for equality against a sample set.
You don't even have to document it; lots of production tools in enterprise-grade products have hidden command-line switches for this kind of behaviour that the 'general public' need not know about.
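A minimal Python sketch of that pattern, assuming a hypothetical --debug-override-time flag and apptime() helper (neither comes from any existing tool):
import argparse
import time
from datetime import datetime, timezone

_offset = 0.0  # seconds added to the real clock; stays 0 in production

def init_time_override(argv=None):
    global _offset
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug-override-time", default=None,
                        help="pretend the program started at this ISO timestamp")
    args, _ = parser.parse_known_args(argv)
    if args.debug_override_time:
        fake_start = datetime.fromisoformat(args.debug_override_time)
        if fake_start.tzinfo is None:
            fake_start = fake_start.replace(tzinfo=timezone.utc)
        _offset = fake_start.timestamp() - time.time()

def apptime():
    # use this everywhere instead of time.time()
    return time.time() + _offset

if __name__ == "__main__":
    init_time_override()
    print(datetime.fromtimestamp(apptime(), tz=timezone.utc))
Running it with --debug-override-time=2014-01-01T12:34:56 makes apptime() report timestamps starting at that instant while still advancing in normal (non-accelerated) time.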
There are several ways to query the time on Linux. Read time(7); I know at least time(2), gettimeofday(2), clock_gettime(2).
So you could use LD_PRELOAD tricks to redefine each of these to e.g. subtract from the seconds part (not the microsecond or nanosecond part) a fixed number of seconds, given e.g. by some environment variable. See this example as a starting point.

Ghostscript failsafe mechanism?

I am running a ghostscript command from the shell to convert a postscript file to JPG, like so:
gs -dBATCH -dSAFER -dNOPAUSE -sDEVICE=jpeg -sOutputFile=out.jpg source.ps
Most of the time this works fine, but occasionally a bad file will cause it to hang.
As I am not an expert in Ghostscript, I can't say whether there are any built-in failsafe mechanisms that could prevent it from hanging, or at least make it fail in a more graceful manner (right now I have to kill the process).
Thanks
On those bad files, I would suggest trying them with -dNOTRANSPARENCY and/or -dNOINTERPOLATION. Disabling transparency, if it makes a difference, will likely cause the output to be incorrect, but it would give you a hint as to whether you've found a bug or just a slow file. Transparency blending and image interpolation are both areas that can easily consume a lot of CPU time and memory.
You might try leaving it running overnight, again in an attempt to establish whether this is a bug or not.
Also, if you're not already doing so, you could consider upgrading to the latest release (9.05); we've fixed a number of problems and improved performance somewhat over the last few releases.
Finally, if you have an example you can share, report it, with the example attached, at the Ghostscript Bugzilla.
Parenthetically, using a Postscript RIP in a traditional "server" configuration generally relies on a Postscript infinite loop - the "server loop" is usually implemented in Postscript.
Chris
PostScript is a completely general programming language. So PostScript programs, like programs in any other full programming language, can get stuck in endless loops as well as go wrong in all the other usual ways. The Halting Theorem proves that, in general, it is impossible to predict whether a given program will get stuck in a loop or not purely from some automatic analysis of it (other than actually running it).
The only way you can guard against hangs is to impose some kind of arbitrary time limit on the execution of a PostScript program, and kill the Ghostscript process when that time is exceeded.
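For example, a minimal Python sketch of that watchdog approach (the 60-second limit and the file names are just assumptions):
import subprocess

cmd = ["gs", "-dBATCH", "-dSAFER", "-dNOPAUSE",
       "-sDEVICE=jpeg", "-sOutputFile=out.jpg", "source.ps"]
try:
    # subprocess.run kills the child process if the timeout expires
    subprocess.run(cmd, check=True, timeout=60)
except subprocess.TimeoutExpired:
    print("Ghostscript exceeded the time limit and was killed")
except subprocess.CalledProcessError as exc:
    print(f"Ghostscript failed with exit code {exc.returncode}")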

Linux kernel scheduling

I wish to know how the old Linux SJF (shortest job first) scheduling algorithm calculated the process runtime.
This is actually one of the major reasons why SJF is rarely used in common environments: the algorithm requires an accurate estimate of the runtime of all processes, which is only available in specialized environments.
In common situations you can only get an estimated, inaccurate running time, for example by recording the lengths of the process's previous CPU bursts and using a mathematical approximation to predict how long it will run next time.
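The classic textbook approximation is an exponential average of previous CPU bursts; here is a small Python sketch (the burst lengths and alpha value are made up for illustration):
def predict_next_burst(bursts, alpha=0.5, tau0=10.0):
    # tau_next = alpha * last_burst + (1 - alpha) * tau_prev
    tau = tau0
    for t in bursts:
        tau = alpha * t + (1 - alpha) * tau
    return tau

print(predict_next_burst([6, 4, 6, 4, 13, 13, 13]))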
If you have some bandwidth to burn, you might be able to find the actual code here. Start at 2.0, where I think you'll find it as experimental.
SJF was (IIRC) extremely short lived, for the exact reasons that ZelluX noted.
I think your only hope of understanding the method behind its madness lies in the code at this point. You may be able to build it and get it to boot in a simulator.
Edit:
I'm now not completely sure if it ever did go into mainline. If you can't find it, don't blame me :)
