Data distribution using OpenMPI - openmpi

I would like to distribute data across multiple machines connected by a TCP/IP network using OpenMPI. Can anyone point me to the right resources and direction? I am new to OpenMPI.
Thanks

It depends on the language you're going to write the software in. But basically, an OpenMPI application looks like this:
Call MPI_Init so that MPI sets up the necessary communication between the nodes.
Use the MPI_Send and MPI_Recv functions to send or receive data. There are blocking and non-blocking variants of these calls, along with several others: broadcast (send to everyone), scatter (distribute data from an array in equal portions to every host), etc.
Call MPI_Finalize to finish the communication process.
An MPI program almost always follows this workflow:
Master host: one host is assigned as master, usually the one with rank (process ID) 0. Its function is to coordinate the work of the slave hosts. For example, if you have to find the maximum value of an array in parallel, it is the master's job to take the array, distribute it in equal portions to the slaves, gather the results from the slaves and choose the maximum from that list.
Slave host: waits to receive data, processes it, and sends the results back to the master.
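The master/slave workflow above can be sketched without any MPI installation. This is not MPI: as an assumption for illustration, Python threads stand in for the MPI processes, and the hypothetical helpers split_evenly, worker_max and master_max play the roles that a scatter, the per-host computation and a gather would play in a real MPI program.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(data, parts):
    """Split data into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def worker_max(chunk):
    """Slave role: receive a chunk, process it, return the partial result."""
    return max(chunk) if chunk else None


def master_max(data, workers=4):
    """Master role: distribute chunks, collect partial results, reduce them."""
    chunks = split_evenly(data, workers)                # "scatter" step
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(worker_max, chunks))    # "gather" step
    # Final reduction happens on the master only.
    return max(p for p in partial if p is not None)


if __name__ == "__main__":
    print(master_max([7, 3, 42, 19, 8, 23, 4, 15, 16]))  # prints 42
```

In an actual MPI program each chunk would travel over the network to a different host, but the division of labour between rank 0 and the other ranks is the same.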
I'd recommend this MPI tutorial for C++ development; also check out this SO post regarding books on the topic.

Here's just one of the many MPI tutorials on the net; I'm surprised you didn't find this yourself.

Related

What is the meaning of using COMM_WORLD or COMM_SELF to instantiate a TS, DMDA, Vec, etc

I'm looking at several examples from PETSc and petsc4py and looking at the PDF user manual of PETSc. The manual states:
For those not familiar with MPI, a communicator is a way of indicating a collection of processes that will be involved together in a calculation or communication. Communicators have the variable type MPI_Comm. In most cases users can employ the communicator PETSC_COMM_WORLD to indicate all processes in a given run and PETSC_COMM_SELF to indicate a single process.
I believe I understand that statement, but I'm unsure what the real consequences of using these communicators are. I'm unsure what really happens when you do TSCreate(PETSC_COMM_WORLD,...) vs TSCreate(PETSC_COMM_SELF,...), or likewise for a distributed array. If you create a DMDA with PETSC_COMM_SELF, does this mean that the DM object won't really be distributed across multiple processes? Or if you create a TS with PETSC_COMM_SELF and a DM with PETSC_COMM_WORLD, does this mean the solver can't actually access ghost nodes? Does it affect the results of DMCreateLocalVector and DMCreateGlobalVector?
The communicator for a solver decides which processes participate in the solver operations. For example, a TS with PETSC_COMM_SELF would run independently on each process, whereas one with PETSC_COMM_WORLD would evolve a single system across all processes. If you are using a DM with the solver, the communicators must be congruent.

Single Camera access by two processes at the same time

I want to use one Camera for two processes / threads, e.g.
a) live streaming and
b) image processing at the same time.
Use case:
An application that can handle multiple requests, each triggered by a user.
a) The user can request: detect cam-1 and do live streaming.
b) Later, the user can request: detect motion / do image processing using the same cam-1, while process (a) is still doing the live streaming.
The challenge I see is accessing the same camera from 2 different processes at the same time. Is there a way to reroute the camera data / pointers to a different process?
Note: OS - Windows
Any help will be appreciated!
Regards, AK
Well, doable. But...
Given the above, there are a few things to respect when designing the target software. One of them is the fact that the camera is a device, which restricts it to a single "commander-in-charge" rather than permitting a schizophrenic "duty" under several concurrent bosses.
That said, the solution lies in a smarter design of the acquired data stream, which can then be delivered to several concurrent consuming processes.
For more hints on such a design concept, read this Answer to a similarly motivated Question.
Avoid letting two threads access the camera at the same time.
If the driver allows it, you may work with multiple buffers, used in a round-robin fashion to store the live stream. Their content can be continuously sent to the display, but when desired you can set one aside and reserve it for longer processing.
If this is not possible, you can copy every desired image to a processing buffer when needed.
If your system must be very responsive and process the images in real time, there is probably no need for two threads!
In any case, if you are working with two threads, there is no need to "reroute the pointers"; you simply let the threads access the buffers.
If they are processes rather than threads, then you can establish the buffers in a shared memory section.
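The idea of one owner for the camera and several consumers for its frames can be sketched as follows. This is a minimal sketch under stated assumptions: fake_camera is a hypothetical stand-in for whatever the real driver call is, the function names are made up for illustration, and every frame is copied to each consumer (the "copy to a processing buffer" variant) rather than shared through driver-managed buffers.

```python
import queue
import threading


def capture_loop(read_frame, consumers, frame_count):
    """Sole owner of the camera: read each frame once, hand a copy to every consumer."""
    for _ in range(frame_count):
        frame = read_frame()
        for q in consumers:
            q.put(bytes(frame))   # each consumer gets its own immutable copy
    for q in consumers:
        q.put(None)               # sentinel: no more frames


def consume(q, handle):
    """Consumer (e.g. streaming or motion detection): process frames until the sentinel."""
    while True:
        frame = q.get()
        if frame is None:
            break
        handle(frame)


def run_demo(frame_count=5):
    counter = iter(range(frame_count))
    # Hypothetical camera: frame i is four bytes, all equal to i.
    fake_camera = lambda: bytearray([next(counter)] * 4)
    stream_q, motion_q = queue.Queue(), queue.Queue()
    streamed, analysed = [], []
    threads = [
        threading.Thread(target=capture_loop,
                         args=(fake_camera, [stream_q, motion_q], frame_count)),
        threading.Thread(target=consume, args=(stream_q, streamed.append)),
        threading.Thread(target=consume,
                         args=(motion_q, lambda f: analysed.append(sum(f)))),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return streamed, analysed
```

Only capture_loop ever touches the camera, so the slow consumer cannot disturb the fast one. With separate processes instead of threads, the queues would be replaced by buffers in a shared memory section, as the answer suggests.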

"Resequencing" messages after processing them out-of-order

I'm working on what's basically a highly-available distributed message-passing system. The system receives messages from someplace over HTTP or TCP, performs various transformations on them, and then sends them to one or more destinations (also using TCP/HTTP).
The system has a requirement that all messages sent to a given destination are in-order, because some messages build on the content of previous ones. This limits us to processing the messages sequentially, which takes about 750ms per message. So if someone sends us, for example, one message every 250ms, we're forced to queue the messages behind each other. This eventually introduces intolerable delay in message processing under high load, as each message may have to wait for hundreds of other messages to be processed before it gets its turn.
In order to solve this problem, I want to be able to parallelize our message processing without breaking the requirement that we send them in-order.
We can easily scale our processing horizontally. The missing piece is a way to ensure that, even if messages are processed out-of-order, they are "resequenced" and sent to the destinations in the order in which they were received. I'm trying to find the best way to achieve that.
Apache Camel has a thing called a Resequencer that does this, and it includes a nice diagram (which I don't have enough rep to embed directly). This is exactly what I want: something that takes out-of-order messages and puts them in-order.
But, I don't want it to be written in Java, and I need the solution to be highly available (i.e. resistant to typical system failures like crashes or system restarts) which I don't think Apache Camel offers.
Our application is written in Node.js, with Redis and Postgresql for data persistence. We use the Kue library for our message queues. Although Kue offers priority queueing, the featureset is too limited for the use-case described above, so I think we need an alternative technology to work in tandem with Kue to resequence our messages.
I was trying to research this topic online, and I can't find as much information as I expected. It seems like the type of distributed architecture pattern that would have articles and implementations galore, but I don't see that many. Searching for things like "message resequencing", "out of order processing", "parallelizing message processing", etc. turns up solutions that mostly just relax the "in-order" requirements based on partitions or topics or whatnot. Alternatively, they talk about parallelization on a single machine. I need a solution that:
Can handle processing of multiple messages simultaneously in any order.
Will always send messages in the order in which they arrived in the system, no matter what order they were processed in.
Is usable from Node.js
Can operate in a HA environment (i.e. multiple instances of it running on the same message queue at once w/o inconsistencies.)
Our current plan, which makes sense to me but which I cannot find described anywhere online, is to use Redis to maintain sets of in-progress and ready-to-send messages, sorted by their arrival time. Roughly, it works like this:
When a message is received, that message is put on the in-progress set.
When message processing is finished, that message is put on the ready-to-send set.
Whenever there's the same message at the front of both the in-progress and ready-to-send sets, that message can be sent and it will be in order.
I would write a small Node library that implements this behavior with a priority-queue-esque API using atomic Redis transactions. But this is just something I came up with myself, so I am wondering: Are there other technologies (ideally using the Node/Redis stack we're already on) that are out there for solving the problem of resequencing out-of-order messages? Or is there some other term for this problem that I can use as a keyword for research? Thanks for your help!
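The in-progress / ready-to-send plan described above can be sketched in a few lines. This is an in-memory, single-process illustration only, not the Redis-backed library: the class name Resequencer and its methods are hypothetical, a monotonically increasing sequence number is assumed as the sort key in place of the arrival time, and nothing here survives a crash.

```python
import heapq


class Resequencer:
    """Release processed messages strictly in arrival order."""

    def __init__(self):
        self._next_seq = 0       # sequence number for the next arrival
        self._in_progress = []   # min-heap of sequence numbers not yet sent
        self._ready = {}         # seq -> processed result waiting to be sent

    def receive(self):
        """Register an arriving message and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        heapq.heappush(self._in_progress, seq)
        return seq

    def complete(self, seq, result):
        """Mark a message as processed; return all results that may now be sent, in order."""
        self._ready[seq] = result
        sendable = []
        # Send while the oldest in-progress message is also ready to send.
        while self._in_progress and self._in_progress[0] in self._ready:
            head = heapq.heappop(self._in_progress)
            sendable.append(self._ready.pop(head))
        return sendable


if __name__ == "__main__":
    r = Resequencer()
    a, b, c = r.receive(), r.receive(), r.receive()
    print(r.complete(c, "C"))   # [] because A and B are still in progress
    print(r.complete(a, "A"))   # ['A']
    print(r.complete(b, "B"))   # ['B', 'C']
```

In the Redis version, the heap and the dictionary would become the two sorted sets, and the loop inside complete would have to run as one atomic transaction so that several instances cannot send the same message.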
This is a common problem, so there are surely many solutions available. This is also quite a simple problem, and a good learning opportunity in the field of distributed systems. I would suggest writing your own.
You're going to have a few problems building this, namely:
1: Guaranteed order of messages
2: Exactly-once delivery
You've found number 1, and you're solving it by resequencing the messages in Redis, which is an OK solution. The other one, however, is not solved.
It looks like your architecture is not geared towards fault tolerance, so currently, if a server crashes, you restart it and continue with your life. This works fine when processing all requests sequentially, because then you know exactly where you crashed, based on what the last successfully completed request was.
What you need is either a strategy for finding out which requests you actually completed and which ones failed, or a well-written apology letter to send to your customers when something crashes.
If Redis is not sharded, it is strongly consistent. It will fail and possibly lose all data if that single node crashes, but you will not have any problems with out-of-order data, or data popping in and out of existence. A single Redis node can thus hold the guarantee that if a message is inserted into the to-process-set, and then into the done-set, no node will see the message in the done-set without it also being in the to-process-set.
How I would do it
Using Redis seems like too much fuss, assuming that the messages are not huge, that losing them is OK if a process crashes, and that running them more than once, or even running multiple copies of a single request at the same time, is not a problem.
I would recommend setting up a supervisor server that takes incoming requests, dispatches each to a randomly chosen slave, stores the responses and puts them back in order again before sending them on. You said you expected the processing to take 750ms. If a slave hasn't responded within say 2 seconds, dispatch it again to another node randomly within 0-1 seconds. The first one responding is the one we're going to use. Beware of duplicate responses.
If the retry request also fails, double the maximum wait time. After 5 failures or so, each waiting up to twice (or any multiple greater than one) as long as the previous one, we probably have a permanent error, so we should ask for human intervention. This algorithm is called exponential backoff, and it prevents a sudden spike in requests from taking down the entire cluster. Without a random interval, retrying after a fixed n seconds would probably cause a self-inflicted DoS attack every n seconds until the cluster dies, if it ever gets a big enough load spike.
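The retry schedule described above can be sketched as follows. This is a minimal sketch, assuming a 1 second initial maximum wait, a doubling factor and 5 retries, as in the text. The names backoff_delays and call_with_retries are hypothetical, and attempt stands for whatever dispatches the request to a slave and returns None on timeout.

```python
import random


def backoff_delays(base_wait=1.0, factor=2.0, max_attempts=5, rng=random):
    """Yield one randomized wait per retry; the upper bound grows by `factor` each time."""
    max_wait = base_wait
    for _ in range(max_attempts):
        yield rng.uniform(0.0, max_wait)   # random delay between 0 and max_wait
        max_wait *= factor


def call_with_retries(attempt, sleep, **backoff_args):
    """Call `attempt()` until it returns a non-None result or the retries run out."""
    result = attempt()
    for delay in backoff_delays(**backoff_args):
        if result is not None:
            break
        sleep(delay)            # wait a random, growing interval before retrying
        result = attempt()
    if result is None:
        raise RuntimeError("permanent failure: ask for human intervention")
    return result
```

Passing sleep in as a parameter keeps the sketch testable. In the real supervisor the wait would be an asynchronous timer rather than a blocking sleep.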
There are many ways this could fail, so make sure this system is not the only place data is stored. However, this will probably work 99+% of the time, it's probably at least as good as your current system, and you can implement it in a few hundred lines of code. Just make sure your supervisor uses asynchronous requests so that you can handle retries and timeouts. JavaScript is by nature single-threaded, so this is slightly trickier than normal, but I'm confident you can do it.

Getting MATLAB to run multiple independent functions that contain infinite while loops

I am currently working with three MATLAB functions and trying to make them run nearly simultaneously in a single MATLAB session (as far as I know, MATLAB is single-threaded). Each function has its own task. It is difficult to explain every detail of each function here, but I will try to include as much information as possible.
The tasks are CONTROL, CAMERA and DATA_DISPLAY. My approach is to create timer objects so that each function is called back continuously, each with a different callback period.
CONTROL sends and receives data over WiFi through a UDP port; it checks whether a packet is available and executes its callback constantly.
CAMERA receives camera frames continuously over TCP and displays them; one timer object, T1, refreshes the captured frame.
DATA_DISPLAY displays all the received data and refreshes continuously, so another timer, T2, refreshes the display.
However, I noticed that timer T2 blocks timer T1 while it executes, slowing down the whole process. I am working on a system with a multi-core CPU and I would expect MATLAB to be able to execute both timer objects in parallel, taking advantage of the cores.
From what I found about the Parallel Computing Toolbox, it does not seem able to deal with infinite loops or continuous callbacks: the code never finishes and displays nothing when executed. But I am not sure I am using this toolbox correctly.
Can anyone suggest a good way of restructuring the code to make it more efficient?
Many thanks
I see a problem with using the Parallel Computing Toolbox here. Its design implies that the jobs are controlled from your primary MATLAB instance. Besides, the primary instance is the only one with a GUI, which would require letting your DATA_DISPLAY task control everything. I don't know if this is possible, but it would result in a very strange architecture. In addition, inter-process communication is not the best idea when processing large amounts of data.
To solve the issue, I would use Java to display your data and implement the DATA_DISPLAY part. The connection to Java is very fast and simple to use. You will have to write a small Java GUI with an appendframe function that allows your CAMERA job to push new data. Obviously, updating the GUI should be done in parallel, without blocking.

Is the lock necessary when a host attempts to receive the data from different sockets

I have three machines A, B, and C that are all connected to each other. If A and B try to send data to C simultaneously, can C use two different threads to receive the respective data without using any locks? Here C is connected to A and B through different sockets. Thanks in advance.
Well, yes - no explicit locks anyway. The IP stack will have its own internal locks, but I don't think that's what you are asking.
You already appreciate that multiple processes can communicate simultaneously with different servers, and multiple processes implies different threads. The IP stack is therefore thread-safe.
Given the usual general care with any shared data inside one multithreaded process (as mentioned in rockstar's comment), there is no problem with those threads communicating with IP endpoints on different peers/hosts. This is very common and works fine.
The two threads on C can safely communicate independently with A and B.
Go ahead - try it!
[posting my comment as answer as it is not wrong and makes sense :P even referenced.]
I would say that you can have 2 threads: one thread listening for data from socket 1 and the other thread listening for data from socket 2.
But whether you need a lock or not depends on what you do with the data. Do you write it to some buffer? Since threads share the data, code and heap segments, you must be careful when you write the received data to a shared location, in which case you need a lock.
This is my basic understanding. I shall wait for more knowledgeable answers here.
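Both answers can be illustrated together: each thread reads its own socket without any lock, and a lock is taken only when writing to the structure both threads share. This is a minimal sketch under stated assumptions: socket.socketpair() stands in for the two real TCP connections from A and B to C, and the function names are hypothetical.

```python
import socket
import threading


def receive_all(conn, label, shared, lock):
    """Read one socket until the peer closes it, then record the data under the lock."""
    chunks = []
    while True:
        data = conn.recv(4096)    # no lock needed: only this thread uses this socket
        if not data:
            break
        chunks.append(data)
    with lock:                    # lock only around the structure both threads write to
        shared[label] = b"".join(chunks)


def run_demo():
    # socketpair() stands in for the connections A -> C and B -> C.
    a_side, c_from_a = socket.socketpair()
    b_side, c_from_b = socket.socketpair()
    shared, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=receive_all, args=(c_from_a, "A", shared, lock)),
        threading.Thread(target=receive_all, args=(c_from_b, "B", shared, lock)),
    ]
    for t in threads:
        t.start()
    a_side.sendall(b"hello from A")
    b_side.sendall(b"hello from B")
    a_side.close()                # closing signals end-of-stream to the receiver
    b_side.close()
    for t in threads:
        t.join()
    c_from_a.close()
    c_from_b.close()
    return shared
```

If each thread kept its received data entirely to itself, the lock could be dropped altogether.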
