LLVM-IR, interprocess communication, managing layer

LLVM-IR, interprocess communication, managing layer - multithreading

would it be possible to implement a layer managing several running programs on the basis of LLVM-IR?
Background: I have several small programs (which should be coded in IR directly or tranformed from clang to IR). They might get altered on IR level, then should be compiled on/for a given architecture and run there (in parallel).
There should be a central instance, which supervises the individuals, possibly spawning new individuals. Interprogram communication should be possible.
Not using IR, i could use c/c++ and some system specific threading/message passing tools.
If I understood correctly, I could translate the code to IR; but if IR is targetagnostic, would'nt this fail? If not this means that there should be IR level tools for inter program communication...
So I'm confused about this. I read through the LLVM documentation but did not get it clear so far. Any hints?
UPDATE:
So far I found a possible (target dependent) solution.
On could code the managing layer in clang, using the clang interpreter (jit) approach for the individuals which were 'jitted' and then spawned as threads (e.g. using pthread) by the managing layer.
If I understood correctly, the target specific part enters the IR code as externals. In this sense IR is not platform agnostic, as the resulting IR from a given clang code makes use of the platform specific includes.
Using newlib, one gets standard clib functionalities as source, applicable for several targets.
So using newlib+clang, the IR is not platform independent, but the process from source to final machine code is flexible (among the available newlib&llvm targets).
Right?
How about generating threads with round-robin type scheduling? I guess this will be platform dependent (e.g. by using pthread).
Right?

Related

Node WASI vs spawning a child process

In the NodeJS docs it states the following:
The WASI API provides an implementation of the WebAssembly System Interface specification. WASI gives sandboxed WebAssembly applications access to the underlying operating system via a collection of POSIX-like functions.
My question is:
What is the biggest benefit of using the WASI API over, say, spawing some other child process or similar methods of running non-nodejs code?
I would have to assume it's faster than spawning a child process, or using some C code with bindings due to the native-api.
Maybe I'm simply misunderstanding the entire idea behind WASI, which is plausable, given that part of what makes WASM so amazing is the ability to use a server-side, full blown programming language on the web (mostly), like all the crazy tools we've seen with Go/Rust.
Is this more so for the benefit of running WASM in node, natively, and again, if so, what are the benefits compared to running child processes?

I ended up getting my answer from a post here that was removed.
In really high-level terms, WASI is simply an (systems) interface for WASM.
I ended up finding this short 'article', if you will, super helpful too!
Just as WebAssembly is an assembly language for a conceptual machine, WebAssembly needs a system interface for a conceptual operating system, not any single operating system. This way, it can be run across all different OSs.
This is what WASI is — a system interface for the WebAssembly platform.

General server-side use of V8: Isolates

Google's open sourced V8 engine is mature, performant JIT compiler.
Implemented primarily in C++, acting as JS centric execution runtime.
It has an isolation implementation (V8: Isolates), providing isolation granularity within a single process.
Leading to two part question.
(Generic)
Can this capability be broadly used for isolation across server-side web application engines (e.g. nginx, apache) and programming languages?
(And more specific ->)
What I've grasped of V8 - is that it's designed for JS scripting lang (even though, it compiles directly to machine code).
Wanting to use a programming language for source code - say Haskell, C++/C - then tends to still have JS interface in between.
Would there be a much direct way to generate machine code, while still using V8: Isolates?

V8 is a JavaScript (and WebAssembly, in recent versions) engine and as such cannot be used to compile or execute any other languages.
If you have C++ code, you'll need to use a C++ compiler to generate executable machine code for it. Haskell code needs a Haskell compiler.
Depending on your requirements, WebAssembly might be interesting to you: it is a portable compilation target for languages like C++ that is more suitable for this purpose than JavaScript.
This should answer both your "more specific" and the "generic" question.
Note that there isn't really any magic in V8's Isolates that one might want to use for other purposes; the term mostly describes the ability to have several separate instances of V8 in the same process. That's rather easy to pull off if you start your own project from scratch (no matter what its purpose is), you just have to maintain a bit of coding discipline; for an existing codebase it requires refactoring of all global state (static variables etc).
Also, note that the world has learned this year that from a security point of view, there really is no such thing as in-process isolation. If you have strong security requirements, then at the very least you'll have to run separate processes for different security domains. (To be clear, V8's Isolates do not provide protection from side-channel attacks.)

OpenMP program on different hosts

I want to know if it would be possible to run an OpenMP program on multiple hosts. So far I only heard of programs that can be executed on multiple thread but all within the same physical computer. Is it possible to execute a program on two (or more) clients? I don't want to use MPI.

Yes, it is possible to run OpenMP programs on a distributed system, but I doubt it is within the reach of every user around. ScaleMP offers vSMP - an expensive commercial hypervisor software that allows one to create a virtual NUMA machine on top of many networked hosts, then run a regular OS (Linux or Windows) inside this VM. It requires a fast network interconnect (e.g. InfiniBand) and dedicated hosts (since it runs as a hypervisor beneath the normal OS). We have an operational vSMP cluster here and it runs unmodified OpenMP applications, but performance is strongly dependent on data hierarchy and access patterns.
NICTA used to develop similar SSI hypervisor named vNUMA, but development also stopped. Besides their solution was IA64-specific (IA64 is Intel Itanium, not to be mistaken with Intel64, which is their current generation of x86 CPUs).
Intel used to develop Cluster OpenMP (ClOMP; not to be mistaken with the similarly named project to bring OpenMP support to Clang), but it was abandoned due to "general lack of interest among customers and fewer cases than expected where it showed a benefit" (from here). ClOMP was an Intel extension to OpenMP and it was built into the Intel compiler suite, e.g. you couldn't use it with GCC (this request to start ClOMP development for GCC went in the limbo). If you have access to old versions of Intel compilers (versions from 9.1 to 11.1), you would have to obtain a (trial) ClOMP license, which might be next to impossible given that the product is dead and old (trial) licenses have already expired. Then again, starting with version 12.0, Intel compilers no longer support ClOMP.
Other research projects exist (just search for "distributed shared memory"), but only vSMP (the ScaleMP solution) seems to be mature enough for production HPC environments (and it's priced accordingly). Seems like most efforts now go into development of co-array languages (Co-Array Fortran, Unified Parallel C, etc.) instead. I would suggest that you have a look at Berkeley UPC or invest some time in learning MPI as it is definitely not going away in the years to come.

Before, there was the Cluster OpenMP.
Cluster OpenMP, was an implementation of OpenMP that could make use of multiple SMP machines without resorting to MPI. This advance had the advantage of eliminating the need to write explicit messaging code, as well as not mixing programming paradigms. The shared memory in Cluster OpenMP was maintained across all machines through a distributed shared-memory subsystem. Cluster OpenMP is based on the relaxed memory consistency of OpenMP, allowing shared variables to be made consistent only when absolutely necessary. source
Performance Considerations for Cluster OpenMP
Some memory operations are much more expensive than others. To achieve good performance with Cluster OpenMP, the number of accesses to unprotected pages must be as high as possible, relative to the number of accesses to protected pages. This means that once a page is brought up-to-date on a given node, a large number of accesses should be made to it before the next synchronization. In order to accomplish this, a program should have as little synchronization as possible, and re-use the data on a given page as much as possible. This translates to avoiding fine-grained synchronization, such as atomic constructs or locks, and having high data locality source.

Another option for running OpenMP programs on multiple hosts is the remote offloading plugin in the LLVM OpenMP runtime.
https://openmp.llvm.org/design/Runtimes.html#remote-offloading-plugin
The big issue with running OpenMP programs on distributed memory is data movement. Coincidentally, that is also one of the major issues in programming GPU's. Extending OpenMP to handle GPU programming has given rise to OpenMP directives to describe data transfer. Programming GPU's has also forced programmers to think more carefully about building programs that consider data movement.

Using SDK semaphore in native code

We have a large c++ codebase that I'm porting onto Android. We had the foresight to abstract various platform-dependent features (threading, file access etc), so the process involves the gradual implementation of Android-appropriate code functions in the NDK
I was getting on reasonably well until I realised that semaphores (used in our core code) don't appear to have an implementation in the NDK.
I was wondering if it were possible under this (and possibly other) circumstances to implement the required functionlity (if it exists) in the SDK, e.g a 'Java' Semaphore and pass it down to the native code via the JNI interface for the native code to operate on it via appropriate callbacks.
Is there a reason why this might be inadviseable for synchronisation purposes?
Thanks

Have a bad news guys. Semaphore isn't implemented on android version of pthread google source
As for me flock(2) is an answer. But there is potential problem where you are trying to lock same file from different users.

Probably the worst consequence of using a Java-based implementation that you pass to your native code is that your semaphore operations will be very slow since they have to cross the JNI boundary.
Is there any reason why you can't use POSIX semaphores? See semaphore.h from the NDK's headers.

What is the difference with these technology related terms?

What is the difference between the next terms, it can help a lot in interviews and general understanding.
Framerwork
Library
IDE
API

Framework
Some predefined architecture that a developer has chosen and which dictates how the application will be written. It usually already includes many concepts which helps the developer to concentrate on the domain of the application instead of the plumbing. This plumbing is provided by the framework. For example the .NET framework provides out-of-the-box tools that would allow you to talk to web servers, without even knowing the internals of the TCP/IP protocol (actually it helps knowing the internals but you get the point).
Library
A reusable compiled unit that can be redistributed and reused across various projects. Well not necessary compiled in case of dynamic languages.
IDE
It's the development environment where you create the other three parts (usually text editor), it might also include compiler and the possibility to execute, debug and see the output of the program in order to speed up the development process.
API
Application Programming Interface. This could mean many things but usually it is a set of functions given to the disposition of the developer and which perform specific tasks and work only in a specific context.

IDE is a tool for fast, easy and flexible development
An API is provided for an existing software. Using these third party applications can interact with main/primary application.
A framework or library are typically same. They are a common set of functionality for other software to use.
Ref: wiki for Framework, API

Framework: a collection of libraries and programming practices to provide general functionality for a program, so that it doesn't have to be rewritten. Typically a framework for an application program will handle user display and input, among other things. The intent is usually to hide the more complex functionality of an application, and to encourage a certain style.
Library: A piece of software to provide certain functionality to other programs that call it. Typically designed to be reusable and modular, so that a library can be distributed and be useful without its source code.
Integrated Development Environment: A integrated set of tools to write programs and turn them into finished products, usually including at least an editor, compiler, linker, and debugger. IDEs sometimes provide support for frameworks.
Application Programming Interface: A set of function calls and sometimes variable accesses available to a program, typically being the public interface of one or more libraries.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string