Calling between multiple Matlab instances or threads - multithreading

Basically, I need a way to call Matlab functions from an indefinitely-long separate thread.
First, I'm aware that I could use the TCPIP or UDP functionality to communicate between two instances of Matlab. I'll explain why that doesn't really help.
Background: I've written a Matlab class that acts as an interface for a USB device. Matlab was chosen because I need it to run on Mac/Linux/Windows, and the target users are only familiar with Matlab. Because of some inconsistencies in Matlab across platforms, I'm not using BytesAvailableFcn or BytesAvailableFcnMode (I need as near real time as possible, and with those there can be delays of up to hundreds of milliseconds when sending and receiving data); instead I send to and poll the port at a fixed interval using a timer. This introduces some overhead, and if the user holds onto the main thread, the sending/receiving also stops. One of the most important functions of the class is to set callbacks based on the input received from the device: the user supplies a function and a condition to match, and the object calls the function automatically.
Problem: This object works well, completely in the background. However, as mentioned, it consumes some resources on the Matlab thread. I'm curious about making just the serial wrapper and callback functionality run on its own thread. However, if I compile it as a standalone application (for all 3 platforms), I believe my only option will be TCPIP/UDP communication, which then requires the object running on the main thread to poll the port in order to handle the callbacks in real time, negating the benefit of moving it to a standalone application.
Threading in Matlab is a nightmare. Doing anything in real time with the kind of latencies you're describing is not advised. Under the hood, Matlab uses Java for all its platform independence. If you want to do this right, write your app natively in Java and call your Java code from Matlab (to deal with the fact that your users are incapable of installing a JRE but can install Matlab).
That said, there is a better way to handle callbacks than what you are doing. My preferred architecture in this scenario is to have one thread service the hardware, and communicate with other threads via message queues (one for input, one for output, and one for command/control if you need to get super fancy.) Basically, the hardware thread then just focuses on servicing the queues. You have a second thread handle the callbacks. It reads the output queue of the hardware thread, and services the callbacks. I've never done this in matlab (see first paragraph) but it works very well in Java contexts.
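The queue-based architecture described above can be sketched roughly as follows. This is a minimal illustration in Python (chosen only for brevity; it is neither Matlab nor Java), and every name in it is an assumption: fake_device is a hypothetical stand-in for the real USB read, and the (condition, function) pairs mimic the user-registered callbacks from the question.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker thread to exit


def hardware_worker(commands, readings, fake_device):
    """Service the device: apply each queued command, publish each reading."""
    while True:
        cmd = commands.get()
        if cmd is STOP:
            readings.put(STOP)  # propagate shutdown to the callback thread
            return
        readings.put(fake_device(cmd))


def callback_worker(readings, callbacks):
    """Run every user callback whose condition matches an incoming reading."""
    while True:
        value = readings.get()
        if value is STOP:
            return
        for condition, action in callbacks:
            if condition(value):
                action(value)


def run_demo():
    commands, readings = queue.Queue(), queue.Queue()
    seen = []
    callbacks = [(lambda v: v > 10, seen.append)]  # user-registered (condition, function)
    fake_device = lambda cmd: cmd * 2  # hypothetical stand-in for a USB read

    threads = [
        threading.Thread(target=hardware_worker, args=(commands, readings, fake_device)),
        threading.Thread(target=callback_worker, args=(readings, callbacks)),
    ]
    for t in threads:
        t.start()
    for cmd in (1, 6, 9):
        commands.put(cmd)
    commands.put(STOP)
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(run_demo())
```

The hardware thread never runs user code, so a slow callback delays only other callbacks, not the servicing of the device.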


What is meant by "asynchronous I/O primitives" in Node.js?

I was going through Node.js documentation, and couldn't understand the line :
A Node.js app runs in a single process, without creating a new thread for every request. Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking and generally, libraries in Node.js are written using non-blocking paradigms, making blocking behavior the exception rather than the norm.
Source : Introduction to node js
I couldn't understand specifically:
[...] Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking [..]
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
If not then what are these set of asynchronous I/O primitives? If anyone could provide me some link for better understanding or getting started with Node.js, that would be really great.
P.S.: I have practical experience with Node.js, so I understand how its code will work but not why it will work. I want to understand the theoretical part, so I can understand what is actually going on in the background.
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
Yes, that's basically what it means.
In a "traditional" one-thread-per-connection* model you accept a connection and then hand off the handling of that request to a thread (either a freshly started one or from a pool, doesn't really change much) and do all work related to that connection on that one thread, including sending the response.
This can easily be done with synchronous/blocking I/O: have a read method that simply returns the bytes read and a write method that blocks until the writing is done.
This does, however, mean that the thread handling that request cannot do anything else, and also that you need many threads to handle many concurrent connections/requests. And since I/O operations take a relatively huge amount of time (compared with the speed of memory access and computation), most of these threads will be waiting for one I/O operation or another most of the time.
Having asynchronous I/O and an event-based core architecture means that when an I/O operation is initiated, the CPU can immediately go on to process whatever action needs to be done next, which will likely be related to an entirely different request.
Therefore one can handle many more requests on a single thread.
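To make that concrete, here is a small sketch of two "requests" overlapping on a single thread. It uses Python's asyncio rather than Node.js, purely as an analogy, and asyncio.sleep is an assumed stand-in for a real non-blocking read or write; the request names are made up for the illustration.

```python
import asyncio
import threading


async def handle_request(name, io_delay, log):
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(io_delay)  # stand-in for a non-blocking read or write
    log.append((name, "done", threading.get_ident()))


async def main():
    log = []
    # Both "requests" are in flight at once, yet no extra thread is created.
    await asyncio.gather(
        handle_request("slow", 0.2, log),
        handle_request("fast", 0.05, log),
    )
    return log


def run():
    return asyncio.run(main())


if __name__ == "__main__":
    for entry in run():
        print(entry)
```

The "fast" request finishes while the "slow" one is still waiting, and every log entry carries the same thread id: while one request waits on I/O, the thread is free to work on the other.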
The "primitives" just means that basic I/O operations such as "read bytes" and "write bytes" to/from network connections or files are provided as asynchronous operations and higher-level operations need to be built on top of those (again in an asynchronous way, to keep the benefits).
As a side note: many other programming environments have either had asynchronous I/O APIs for a long time or have gotten them in recent years. The one thing that sets Node.js apart is that it's the default option: if you're reading from a socket or a file, then doing it asynchronously is what is "normal" and blocking calls are the big exception. This means that the entire ecosystem surrounding Node.js (i.e. almost all third-party libraries) works with that assumption in mind and is also written in that same manner.
So while Java, for example, has asynchronous I/O you lose that advantage as soon as you use any I/O related libraries that only support blocking I/O.
* I use connection/request interchangeably in this answer, under the assumption that each connection contains a single request. That assumption is usually wrong these days (most common protocols allow multiple request/response pairs in a single connection), but handling multiple requests on a single connection doesn't fundamentally change anything about this answer.
It means Node.js doesn't halt on input/output operations. Suppose a task has a blocking condition, e.g. "if the space key is pressed, do this" or "keep taking input until the Esc key is pressed". Because Node.js is single-threaded, waiting synchronously on that condition would stop all other operations until it finishes. Asynchronous I/O lets the application carry on with other tasks until that one finishes. That is why we use await to get the value out of a promise in an async function: once the data is ready, execution resumes at the line containing the await and continues with the value.

Can single thread do everything that multithread can do?

My 1st Question: As per the title.
I am asking this because I came across a StackExchange question: What can multiple threads do that a single thread cannot?
One of the answers given at that link states that whatever multiple threads can do can be done by a single thread as well.
However, I don't think this is true. My argument is this: suppose we build a simple chat program with socket programming and run it from the command console. If the chat program is single-threaded, it is effectively half-duplex: we cannot listen and talk concurrently, so at any time only one party can talk and the other has to listen. For both parties to be able to send and receive messages concurrently, we have to implement it with multiple threads.
My 2nd Question: Is my argument correct? Or did I miss out some points here, and therefore a single thread still can do everything multithread does?
Let's consider the computer as a whole, and more precisely that your chat application is bound together with the kernel (or the whole OS) as one piece we will call "the software".
Now consider that this "software" runs on a single core (say a i386).
Then you can see that, even if you wrote your chat application using threads (which is probably overkill), the software as a whole runs on a single CPU core, which means that at any given moment it performs one single thing, even if things seem to be happening in parallel.
This is nothing more than a Turing machine (using a single tape).
The parallelism is an illusion created by the kernel, which can switch between tasks fast enough. It is just like a film, which seems to be a continuous picture on screen when there are actually just 24 images per second, and this is enough to fool our brain.
So I would say that anything a multithreaded program does, a single threaded could do.
Nevertheless, we now all use multi-core CPUs, which can to some extent be seen as multiple computers running at the same time (parallel computing), so you can probably find software that works on multiple cores but would not run on a single-core one.
A good example is device drivers (in the kernel). With a poor implementation on a non-preemptive kernel, you can create a busy loop that waits indefinitely for an event. This usually deadlocks on a single core (you prevent the kernel from scheduling another task, and thus prevent the event from being sent). But it can work on multiple cores, as the event is usually eventually sent by another thread running on another core (hopefully).
I want to amend the existing answer (+1):
You absolutely can run multiple parallel IOs on a single thread. An IO is nothing more than a kernel data structure. When you start the IO, the OS talks to the hardware and tells it to do something. Then the CPU is free to do whatever it wants. The hardware calls back into the OS when it's done: it issues an interrupt, which hijacks a CPU core to process the completion notification.
This is called async IO and all OSes provide it.
In fact, this is how socket programs with many connections run. They use async IO to multiplex large numbers of connections onto a small pool of threads.
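As a rough illustration of the chat scenario from the question, the sketch below moves messages in both directions over one socket pair using a single thread and readiness-based multiplexing (Python's selectors module, which wraps select/poll-style system calls). For the sake of a self-contained demo, both "parties" live in the same process; the function name and the messages are made up.

```python
import selectors
import socket


def chat_on_one_thread(a_says, b_says):
    """Exchange byte messages in both directions over one socket pair, on one thread."""
    sock_a, sock_b = socket.socketpair()
    outbox = {sock_a: list(a_says), sock_b: list(b_says)}
    inbox = {sock_a: b"", sock_b: b""}
    # Each side is finished once it has received everything the other side sends.
    expected = {sock_a: sum(map(len, b_says)), sock_b: sum(map(len, a_says))}

    sel = selectors.DefaultSelector()
    for sock in (sock_a, sock_b):
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)

    while any(len(inbox[s]) < expected[s] for s in inbox):
        for key, events in sel.select(timeout=1):
            sock = key.fileobj
            if events & selectors.EVENT_READ:
                inbox[sock] += sock.recv(4096)
            if events & selectors.EVENT_WRITE:
                if outbox[sock]:
                    message = outbox[sock].pop(0)
                    sent = sock.send(message)
                    if sent < len(message):  # partial write: retry the rest later
                        outbox[sock].insert(0, message[sent:])
                else:
                    # Nothing left to say: stop asking for write readiness.
                    sel.modify(sock, selectors.EVENT_READ)

    sel.close()
    sock_a.close()
    sock_b.close()
    return inbox[sock_a], inbox[sock_b]


if __name__ == "__main__":
    print(chat_on_one_thread([b"hi\n", b"how are you?\n"], [b"hello\n"]))
```

Neither side ever blocks waiting for the other: the loop only touches a socket when the OS reports that it can be read or written, so sending and receiving interleave freely on the one thread.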
The core reason why this argument is incorrect is subtle. While it's true that with only a single thread, or single core, or single network interface, that particular component can only be handling a send or a receive at any given time, if it's not the critical path, it does not make sense to describe the overall system as half duplex.
Consider a network link that is full-duplex and takes 1ms to move a chunk of data from one end to the other. Now imagine we have a device that puts data on the link or removes data from the link but cannot do both at the same time. So long as it takes much less than 1ms to process a send or a receive, this single file path that data in both directions must go through does not somehow make the link half-duplex. There will still be data moving in both directions at the same time.
In any realistic chat application, the CPU will not be the limiting factor. So its inability to do more than one thing at a time can't make the system half-duplex. There can still be data moving in both directions at the same time.
For a typical chat application under typical load, the behavior of the system will not be significantly different whether the implementation uses a single thread or has multiple threads with infinite CPU resources. The CPU just won't be the limiting factor.

Does v4l2 support multi-map?

I'm trying to share frames (images) that I receive from a USB camera (Logitech C270) between two processes so that I can avoid a memcpy. I'm using the memory-mapping streaming I/O method described here, and I can successfully get frames from the camera after using v4l2_mmap. However, I have another process (for image processing) which has to use the image buffers after the dequeue and then signal the first process to queue the buffer again.
Searching online, I found that opening a video device multiple times is allowed, but when I try to map the buffers (I tried both v4l2_mmap and plain mmap) in the second process after a successful v4l2_open, I get an EINVAL error.
I found this PDF, which talks about implementing multi-map in v4l2 (not official), and was wondering if this is implemented. I have also tried the user-pointer streaming I/O method, whose documentation explicitly states that shared memory can be implemented with it, but I get EINVAL when I request buffers (according to the documentation, this means the camera doesn't support user-pointer streaming I/O).
Note: I want to keep the code modular, hence two processes. If this is not possible, doing all the work in a single process(multiple threads & global frame buffer) is still possible.
Using standard shared memory function calls is not possible as the two processes have to map to the video device file(/dev/video0) and I cannot have it under /dev/shm.
The main problem with multi-consumer mmap is that this needs to be implemented on the device driver side. That is: even if some devices might support multi-map, others might not.
So unless you can control the camera that is being used with your application, you will eventually come across one that does not, in which case your application would not work.
So in any case, your application should provide means to handle non multi-map devices.
Btw, you do not need multiple processes to keep your code modular.
Multiple processes have their merits (e.g. privilege separation, crash resilience, ...), but might also encourage code duplication.
This may not be relevant now, but:
You don't need the full multi-consumer mechanism to do this. I have used Python to hand off the processing of the mmap buffers to multiple processes (Python multi-threading only allows one thread at a time to execute).
If you're running multi-threaded, then worker threads can pick up the buffer and process it independently when triggered by the master thread.
Since the code is very Python-specific (it relies on Python's multiprocessing support), I won't post it here, as it wouldn't make sense in other languages.

"Multi-process" vs. "single-process multi-threading" for software modules communicating via messaging

We need to build a software framework (or middleware) that will enable messaging between different software components (or modules) running on a single machine. This framework will provide such features:
Communication between modules are through 'messaging'.
Each module will have its own message queue and message handler thread that will synchronously handle each incoming message.
With the above requirements, which of the following approach is the correct one (with its reasoning)?:
Implementing modules as processes, and messaging through shared memory
Implementing modules as threads in a single process, and messaging by pushing message objects to the destination module's message queue.
Of course, there are some apparent pros and cons:
In Option-2, if one module causes segmentation fault, the process (thus the whole application) will crash. And one module can access/mutate another module's memory directly, which can lead to difficult-to-debug runtime errors.
But with Option-1, you need to take care of the states where a module you need to communicate with has just crashed. If there are N modules in the software, there can be 2^N alive/crashed states of the system that affect the algorithms running on the modules.
Again in Option-1, sender cannot assume that the receiver has received the message, because it might have crashed at that moment. (But the system can alert all the modules that a particular module has crashed; that way, sender can conclude that the receiver will not be able to handle the message, even though it has successfully received it)
I am in favor of Option-2, but I am not sure whether my arguments are solid enough or not. What are your opinions?
EDIT: Upon requests for clarification, here are more specification details:
This is an embedded application that is going to run on Linux OS.
Unfortunately, I cannot tell you about the project itself, but I can say that there are multiple components of the project, each component will be developed by its own team (of 3-4 people), and it is decided that the communication between these components/modules are through some kind of messaging framework.
C/C++ will be used as programming language.
What the 'Module Interface API' will automatically provide to the developers of a module is: (1) a message/event handler thread loop, (2) a synchronous message queue, (3) a function pointer member variable where you can set your message handler function.
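For illustration, the three pieces listed above (handler thread loop, message queue, settable handler) might look like this under Option-2, with all modules as threads in one process. The sketch is in Python for brevity even though the project targets C/C++, and the Module class and its method names are hypothetical, not part of any existing framework.

```python
import queue
import threading


class Module:
    """Hypothetical module: one inbox, one handler thread, one settable handler."""

    _STOP = object()  # sentinel used to shut the handler thread down

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()  # (2) the module's message queue
        self.handler = None  # (3) set by the module's developers
        self._thread = threading.Thread(target=self._loop)

    def start(self):
        self._thread.start()

    def post(self, message):
        """Called by other modules: push a message object onto this inbox."""
        self.inbox.put(message)

    def stop(self):
        self.inbox.put(self._STOP)
        self._thread.join()

    def _loop(self):
        # (1) the handler thread loop: messages are handled one at a time.
        while True:
            message = self.inbox.get()
            if message is self._STOP:
                return
            if self.handler is not None:
                self.handler(message)


def run_modules():
    received = []
    logger, doubler = Module("logger"), Module("doubler")
    logger.handler = received.append
    doubler.handler = lambda n: logger.post(n * 2)
    logger.start()
    doubler.start()
    for n in (1, 2, 3):
        doubler.post(n)
    doubler.stop()  # returns after 1, 2, 3 have been handled
    logger.stop()  # returns after 2, 4, 6 have been handled
    return received


if __name__ == "__main__":
    print(run_modules())
```

Because both modules share one address space, the message objects are passed by reference with no copying or serialization, which is the convenience (and the isolation risk) attributed to Option-2 in the question.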
Here is what I could come up with:
Multi-process(1) vs. Single-process, multi-threaded(2):
Impact of segmentation faults: In (2), if one module causes a segmentation fault, the whole application crashes. In (1), modules have different memory regions, and thus only the module that causes the segmentation fault will crash.
Message delivery guarantee: In (2), you can assume that message delivery is guaranteed. In (1), the receiving module may crash before receiving the message or while handling it.
Sharing memory between modules: In (2), the whole memory is shared by all modules, so you can directly send message objects. In (1), you need to use 'Shared Memory' between modules.
Messaging implementation: In (2), you can send message objects between modules; in (1), you need to use network sockets, Unix sockets, pipes, or message objects stored in shared memory. For the sake of efficiency, storing message objects in shared memory seems to be the best choice.
Pointer usage between modules: In (2), you can use pointers in your message objects. The ownership of heap objects (accessed by pointers in the messages) can be transferred to the receiving module. In (1), you need to manually manage the memory (with custom malloc/free functions) in the 'Shared Memory' region.
Module management: In (2), you are managing just one process. In (1), you need to manage a pool of processes each representing one module.
Sounds like you're implementing Communicating Sequential Processes. Excellent!
Tackling threads vs processes first, I would stick to threads; the context switch times are faster (especially on Windows where process context switches are quite slow).
Second, shared memory vs a message queue; if you're doing full synchronous message passing it'll make no difference to performance. The shared memory approach involves a shared buffer that gets copied to by the sender and copied from by the reader. That's the same amount of work as is required for a message queue. So for simplicity's sake I would stick with the message queue.
In fact, you might like to consider using a pipe instead of a message queue. You have to write code to make the pipe synchronous (pipes are normally asynchronous, which would be the Actor Model; message queues can often be set to zero length, which makes them synchronous and properly CSP), but then you could just as easily use a socket instead. Your program can then become multi-machine distributed should the need arise, without any change to the architecture. Named pipes between processes are an equivalent option too, so on platforms where process context switch times are good (e.g. Linux) the whole thread vs. process question goes away. So working a bit harder to use a pipe gives you very significant scalability options.
Regarding crashing; if you go the multiprocess route and you want to be able to gracefully handle the failure of a process you're going to have to do a bit of work. Essentially you will need a thread at each end of the messaging channel simply to monitor the responsiveness of the other end (perhaps by bouncing a keep-awake message back and forth between themselves). These threads need to feed status info into their corresponding main thread to tell it when the other end has failed to send a keep-awake on schedule. The main thread can then act accordingly. When I did this I had the monitor thread automatically reconnect as and when it could (e.g. the remote process has come back to life), and tell the main thread that too. This means that bits of my system can come and go and the rest of it just copes nicely.
Finally, your actual application processes will end up as a loop, with something like select() at the top to wait for message inputs from all the different channels (and monitor threads) that it is expecting to hear from.
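A minimal sketch of such a top-level loop is shown below, using Python's selectors module as a portable wrapper around select()-style calls. The channel names ("module_a", "monitor") and message contents are invented for the example, and local socket pairs stand in for whatever pipes or sockets the real channels would use.

```python
import selectors
import socket


def main_loop(channels):
    """Block in select() until any channel is readable, then dispatch by name."""
    sel = selectors.DefaultSelector()
    for name, sock in channels.items():
        sel.register(sock, selectors.EVENT_READ, data=name)
    handled = []
    while sel.get_map():  # run until every channel has been closed by its peer
        for key, _ in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                handled.append((key.data, chunk))
            else:  # empty read: the sender closed its end
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    return handled


def demo():
    channels, senders = {}, {}
    for name in ("module_a", "monitor"):
        senders[name], channels[name] = socket.socketpair()
    senders["module_a"].sendall(b"work item")
    senders["monitor"].sendall(b"keep-awake")
    for s in senders.values():
        s.close()
    # The order in which ready channels are reported is not guaranteed.
    return sorted(main_loop(channels))


if __name__ == "__main__":
    print(demo())
```

A closed channel shows up as an empty read, which is one simple way for the main loop to notice that a peer has gone away without a dedicated monitor thread; the keep-awake scheme described above additionally catches peers that are alive but unresponsive.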
By the way, this sort of thing is frustratingly hard to implement in Windows. There's just no proper equivalent of select() anywhere in any Microsoft language. There is a select() for sockets, but you can't use it on pipes, etc. like you can in Unix. The Cygwin guys had real problems implementing their version of select(). I think they ended up with a polling thread per file descriptor; massively inefficient.
Good luck!
Your question lacks a description of how the "modules" are implemented and what they do, and possibly a description of the environment in which you are planning to implement all of this.
For example:
If the modules themselves have some requirements which makes them hard to implement as threads (e.g. they use non-thread-safe 3rd party libraries, have global variables, etc.), your message delivery system will also not be implementable with threads.
If you are using an environment such as Python which does not handle thread parallelism very well (because of its global interpreter lock), and running on Linux, you will not gain any performance benefits with threads over processes.
There are more things to consider. If you are just passing data between modules, who says your system needs to use either multiple threads or multiple processes? There are other architectures which do the same thing without either of them, such as event-driven with callbacks (a message receiver can register a callback with your system, which is invoked when a message generator generates a message). This approach will be absolutely the fastest in any case where parallelism isn't important and where receiving code can be invoked in the execution context of the caller.
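The callback-based alternative mentioned above could be sketched like this. MessageBus, subscribe and publish are hypothetical names chosen for the illustration; the point is only that no thread or queue is involved and each receiver runs in the caller's execution context.

```python
class MessageBus:
    """Hypothetical callback registry: no threads, no queues."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # Receivers run synchronously, in the publisher's execution context.
        for callback in self._subscribers.get(topic, ()):
            callback(payload)


def demo():
    bus = MessageBus()
    temperatures = []
    bus.subscribe("temperature", temperatures.append)
    bus.subscribe("temperature", lambda t: temperatures.append(t > 30))
    bus.publish("temperature", 35)
    bus.publish("pressure", 1013)  # no subscribers: silently ignored
    return temperatures


if __name__ == "__main__":
    print(demo())
```

The trade-off is that a slow or crashing callback stalls or takes down the publisher, which is exactly the isolation that separate threads or processes would buy.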
tl;dr: you have only scratched the surface with your question :)

How does Asynchronous programming work in a single threaded programming model?

I was going through the details of node.js and came to know that it supports asynchronous programming even though it essentially provides a single-threaded model.
How is asynchronous programming handled in such cases? Is it that the runtime itself creates and manages threads, but the programmer cannot create threads explicitly? It would be great if someone could point me to some resources to learn about this.
Say it with me now: async programming does not necessarily mean multi-threaded.
Javascript is a single-threaded runtime - you simply aren't able to create new threads in JS because the language/runtime doesn't support it.
Frank says it correctly (although obtusely). In plain English: there's a main event loop that handles things as they come into your app. So "handle this HTTP request" gets added to the event queue, then handled by the event loop when appropriate.
When you call an async operation (a mysql db query, for example), node.js sends "hey, execute this query" to mysql. Since this query will take some time (milliseconds), node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for mysql to get back to us. Like handling that HTTP request.
Edit: By contrast, node.js could simply wait around (doing nothing) for mysql to get back to it. This is called a synchronous call. Imagine a restaurant, where your waiter submits your order to the cook, then sits down and twiddles his/her thumbs while the chef cooks. In a restaurant, like in a node.js program, such behavior is foolish - you have other customers who are hungry and need to be served. Thus you want to be as asynchronous as possible to make sure one waiter (or node.js process) is serving as many people as they can.
Edit done
Node.js communicates with mysql using C libraries, so technically those C libraries could spawn off threads, but inside Javascript you can't do anything with threads.
Ryan said it best: sync/async is orthogonal to single/multi-threaded. For both single- and multi-threaded cases there is a main event loop that calls registered callbacks using the Reactor Pattern. In the single-threaded case the callbacks are invoked sequentially on the main thread. In the multi-threaded case they are invoked on separate threads (typically using a thread pool). It is really a question of how much contention there will be: if all requests require synchronized access to a single data structure (say a list of subscribers), then the benefits of having multiple threads may be diminished. It's problem-dependent.
As far as implementation goes, if a framework is single-threaded then it is likely using the poll/select system calls, i.e. the OS is triggering the asynchronous event.
To restate the waiter/chef analogy:
Your program is a waiter ("you") and the JavaScript runtime is a kitchen full of chefs doing the things you ask.
The interface between the waiter and the kitchen is mediated by queues so requests are not lost in instances of overcapacity.
So your program is assigned one thread of execution. You can only wait on one table at a time. Each time you want to offload some work (like making the food/making a network request), you run to the kitchen and pin the order to a board (queue) for the chefs (runtime) to pick up when they have spare capacity. The chefs will let you know when the order is ready (they will call you back). In the meantime, you go and wait on another table (you are not blocked by the kitchen).
So the accepted answer is misleading. The JavaScript runtime is definitionally multithreaded because I/O does not block your JavaScript program. As a waiter you can continue serving customers, while the kitchen cooks. That involves at least two threads of execution. The reality is that the runtime will maintain several threads of execution behind the scenes, in order to efficiently serve the single thread directly corresponding to your script.
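The waiter/kitchen split can be illustrated with a small Python asyncio sketch. This is only an analogy, not a description of Node.js internals: blocking_read is a made-up stand-in for work a runtime might perform on a helper thread, while the program's own code stays on the single event-loop thread.

```python
import asyncio
import threading
import time


def blocking_read():
    """Stand-in for blocking work done off the main thread."""
    time.sleep(0.1)
    return threading.get_ident()


async def main():
    loop = asyncio.get_running_loop()
    main_thread = threading.get_ident()
    ticks = 0

    # The "kitchen": a worker thread from the default pool does the blocking work.
    pending = loop.run_in_executor(None, blocking_read)

    # The "waiter": the event-loop thread keeps running our code meanwhile.
    while not pending.done():
        ticks += 1
        await asyncio.sleep(0.01)

    worker_thread = await pending
    return main_thread, worker_thread, ticks


def run():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run())
```

The program's code runs on one thread only, yet the blocking call reports a different thread id, which is the distinction this answer draws between the single-threaded script and the threads a runtime may keep behind the scenes.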
By design, only one thread of execution is assigned to the synchronous running of your JavaScript program. This is a good thing because it makes your program easier to reason about than having to handle multiple threads of execution yourself. Don't worry: your JavaScript program can still get plenty complicated though!
