What is the difference between Hydra and Torque, and which is better: MPICH2 or OpenMPI?

I have two questions:
What is the difference between Hydra and Torque, or to put it another way: what more does Hydra have to offer compared to Torque? Do I need Hydra at all if I choose to use Torque (+ MAUI)?
Also, what is the advantage of MPICH2 over OpenMPI, given that OpenMPI supports InfiniBand and has also continuously supported the Windows platform? To me it looks like a Swiss Army knife. Am I wrong?

Torque and Hydra are two completely separate things. Torque is a distributed resource manager that allows batch execution of tasks (jobs) on a network of compute systems. Hydra is part of MPICH and is responsible for launching and controlling the processes that are part of an MPI job. The way Torque and Hydra work together is that one submits a job to Torque, which reserves cluster resources and at some point starts the job. The mpiexec command in turn uses Hydra to start and control the processes that make up the MPI job on the compute nodes provided by Torque.
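As a rough sketch, a Torque submission script might look like the following (the queue name, resource counts, and binary are made-up placeholders; this assumes an MPICH whose Hydra launcher was built with Torque/PBS support, so mpiexec picks up the allocated nodes automatically):

    #!/bin/bash
    #PBS -N mpi_job              # job name
    #PBS -l nodes=4:ppn=8        # ask Torque for 4 nodes with 8 slots each
    #PBS -q batch                # hypothetical queue name

    cd "$PBS_O_WORKDIR"
    # Hydra reads the node list provided by Torque and starts
    # one process per slot across the reserved nodes
    mpiexec -n 32 ./my_mpi_app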
MPICH2 and Open MPI are both quite mature MPI implementations. While Open MPI supports more connection protocols, there is an InfiniBand-enabled version of MPICH called MVAPICH. MPICH is also the basis of several commercial MPI implementations, including Intel MPI and Microsoft MPI. While Open MPI has supported Windows for a long time, their Windows maintainer left some time ago and it is unclear if they will continue to support that OS.
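Since both implement the same MPI standard, portable source code compiles and runs unchanged against either; a minimal sanity check (build with mpicc, run with mpiexec -n 4 ./a.out under either implementation):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                  /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total process count */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }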

Related

How does the Linux kernel automatically migrate threads (to balance load or otherwise)?

I looked at this Q&A: How does Linux kernel migrate the process among multiple cores?
and I guess it's somewhat clear how tasks are migrated to different cores via sched_setaffinity().
But am I right that the scheduler may decide to migrate a given task to some other core at any tick?
I looked here https://elixir.bootlin.com/linux/v6.0/source/kernel/sched/core.c and I don't see where something like stop_one_cpu is called from.
It's called by sched_exec, but that is... for exec syscalls only, right?..
It's also called by migrate_task_to, but that's under an ifdef for NUMA...
So where's the usual scheduler migration path? Or am I wrong, and such periodic migration is performed not by the scheduler itself but... by the migration threads ([migration/X]) maybe?
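(For reference, the userspace affinity call I mean above - a minimal sketch that pins the calling thread to CPU 0; the kernel migrates the task if it is currently running elsewhere:)

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);  /* allow CPU 0 only */
        /* pid 0 = the calling thread; forces a migration if needed */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to CPU 0\n");
        return 0;
    }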

How to update Julia on SSH clusters

I'm a PhD student in a lab with SSH-accessible clusters, and I have access to connect to each one of them (there's no queue system, as it is a small lab; hence, as long as someone is not using a lot of cores on a machine, I can run my programs on it).
The lab currently doesn't have a cluster administrator, so maintenance is in the hands of two researchers with computing knowledge. The clusters have a very old version of Julia (0.5.1) and I need an update in order to work; however, one of the two researchers in charge told me that updating Julia would require a very large amount of time and stopping all currently running processes, so he is unwilling to update it on the clusters.
Is there a way that I can update the Julia version on the clusters all by myself, without interrupting or canceling any of the current processes?
I believe none of the current processes are being run with Julia, as I am the only one in the lab who works with it. The languages being used for these processes are C, C++, and Fortran.
Julia does not need to be installed system-wide to be used. In fact, on all OSes - Linux, macOS, and Windows - the Julia distribution is portable and self-contained.
So the easiest way is to use juliaup to install Julia under your own account on each node, and use it to manage all the Julia versions you need. Since nothing system-wide is touched, no running process is affected.
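A sketch of the per-user steps, run on each node (this assumes the nodes can reach the internet; the version number is just an example):

    # installs into ~/.juliaup - no root, no effect on running jobs
    curl -fsSL https://install.julialang.org | sh

    # manage versions entirely within your own account
    juliaup add 1.10         # install a specific Julia version
    juliaup default 1.10     # make it the default `julia`
    juliaup status           # list what is installed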

VM - LLVMPIPE (Software Rendering) and Effect of Core/Thread Usage on Application

I have written an application (Qt/C++) that creates a lot of concurrent worker threads to accomplish its task, utilizing QThreadPool (from the Qt framework). It has worked flawlessly running on a dedicated server/hardware.
A copy of this application is now running in a virtual machine (RHEL 7), and performance has suffered significantly: the thread pool's queue is being exercised quite extensively, and things are getting backed up. This is despite the VM making more cores available to the application than the dedicated, non-virtualized server had.
Today, I did some troubleshooting with the top -H -p <pid> command, and found that there were 16 total llvmpipe-# threads running all at once, apparently for software rendering of my application's very simple graphical display. It looks to me like the presence of so many of these rendering threads has left limited resources available for my actual application's threads to be concurrently running. Meaning, my worker threads are yielding/taking a back seat to these.
As this is a small/simple GUI running on a server, I don't care to dedicate so many threads to software rendering of its display. I read some Mesa3D documentation about the LP_NUM_THREADS environment variable, which limits llvmpipe's thread usage. I set LP_NUM_THREADS=4, and as a result I seem to have effectively freed up 12 cores for my application's worker threads.
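Concretely, I did roughly the following (the binary name is a placeholder):

    # cap Mesa's llvmpipe software-rendering threads, then launch
    export LP_NUM_THREADS=4
    ./my_qt_app &

    # verify the llvmpipe-# thread count has dropped
    top -H -p $(pidof my_qt_app)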
Does this sound reasonable, or will I pay some sort of other consequence for doing this?

Ubuntu Firebird 2 Classic Server config to use all CPU cores

I am using Firebird 2.0.7 Classic Server on my Ubuntu 16.04 server. It is not possible to upgrade to a higher version because the software we run requires this one.
I've used the SuperServer version before, but on Linux the CpuAffinityMask parameter is ignored.
The SuperServer version performs terribly because on Linux it uses only 1 core.
The Classic Server version is a little better, because it assigns 1 core per user.
When I run a demanding task in the program, fb_inet_server uses 100% of 1 core, but the other 23 cores are idle. How can I assign more cores to this process?
The CpuAffinityMask setting is only for SuperServer (and then only for Windows).
If you are using Classic Server, then Firebird can (and will) use all cores if there is sufficient activity; however, the Firebird processes need to coordinate their effort, which - if there is a lot of lock contention - can lead to reduced performance.
To reduce lock contention, you may want to increase the LockHashSlots setting.
Increasing the number of page buffers may also help, but keep in mind that with Classic Server, this setting is per process and can increase memory usage.
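For example, in firebird.conf (these values are illustrative, not tuned recommendations; measure before and after changing them):

    # spread the lock table over more hash slots to reduce contention
    # (a prime number is recommended)
    LockHashSlots = 2011
    # page buffers per connection process; Classic creates one process
    # per connection, so total memory scales with the connection count
    DefaultDbCachePages = 256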
Contrary to what you state, Firebird does not "assign 1 core per user". Classic Server creates a process per connection, and the threads of these processes are scheduled by the OS on any available core.

Do performance stats like Geekbench represent general multi-tasking performance?

I am trying to compare how an i7 dual-core 2.7 GHz would perform vs. an i7 quad-core 2.0 GHz in a multitasking environment. The quad-core scores around 9000 in Geekbench, while the dual-core comes in at around 7500. At the same time, Geekbench explicitly states that the tests show the full performance potential of all the cores. However, in real-world, everyday use, almost none of the applications I would be running are multi-threaded (Ruby runtime, Java IDE, Windows VM on a Mac, app server).
This machine would serve as a web development machine. Which CPU would be most "snappy" in terms of response time in this use case?
The results of a benchmark have practical meaning only if the benchmark closely approximates your typical workload.
You should consider whether your typical development environment regularly calls for parallelism. For example, if I develop a C/C++/Java app, it's common for a change to a header file (or Java source) to cause several other files to be recompiled and a few binaries to be relinked - that's a highly parallel workload, and a many-core CPU may prove advantageous.
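For instance, the rebuild after such a change is typically run with one job per core - this is exactly where the extra cores pay off:

    make -j"$(nproc)"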
On the other hand, if I'm changing a few Python or JavaScript sources, I doubt I will create any parallel workload when I try to execute and test the changes.
However, these are theoretical considerations.
I don't think the speed of the machine is a bottleneck in any development effort. The human is.
