Is it acceptable practice to run multiple instances of the same Go program using goroutines, effectively calling go main()?
If so, is it possible to modify arguments sent to the goroutine (or os.Args[]) such that the main() function does not create an infinite number of goroutines?
The goroutines should then be able to communicate with each other via channels. I understand that goroutines share the same memory space but have separate stacks, so this could cause some race condition issues.
Or perhaps this is an improper use of goroutines, and I should just stick with exec.Command() to execute another instance of the executable and have those instances communicate via JSON-RPC.
Thanks for the assistance.
I'm not sure you understand how a goroutine works here. Think of it like a lightweight thread, as it is pretty much Go's alternative to threads in practice. When you call go foo() you are spawning a goroutine within your executable, just as you would spawn a thread in other languages, not a separate process as with exec.Command() or syscall.ForkExec().
The proper practice in Go is to stick with a single process and use goroutines for concurrent responsibilities. For example, if you are writing your own port listener and want several instances, each listening on a different port, your outline might be:
func APIHandler(port int) {
    // do stuff
}

func main() {
    go APIHandler(80)
    go APIHandler(81)
    go APIHandler(82)
    select {} // block forever; in practice, use a sync.WaitGroup or wait on an error chan
}
I think threads and subroutines are not directly comparable, but I'm designing a system in which many class instances communicate with each other, and I'm not sure exactly how I should do it.
Let's say I have a class that receives a command, executes it, and returns the result. To execute commands, I can write methods to handle them, or I can create a thread for each command, forward each message to the appropriate thread, and have that thread handle the command.
Are there any criteria for choosing one design over the other, such as real-time requirements or concurrency issues? And can the number of threads become a problem if it is too large?
Thank you.
I'm studying threads in C and I have this theoretical question in mind that is driving me crazy. Assume the following code:
1) void main() {
2) createThread(...); // create a new thread that does "something"
3) }
After line 2 is executed, two paths of execution exist. However, I believe that immediately after line 2 it doesn't even matter what the new thread does, because the original thread that executed line 2 will end the entire program at its next instruction. Am I wrong? Is there any chance the original thread gets suspended somehow and the new thread gets its chance to do something (assume the code as is, with no synchronization between threads and no join operations)?
It can work out either way. If you have more than one core, the new thread might get its own core. Even if you don't, the scheduler might give the new thread priority over the existing one. The original thread might exhaust its timeslice right after it creates a new thread.
So that code creates a race condition -- one thread is trying to do work, another thread is trying to terminate the process. Which one wins will depend on the threading implementation, the hardware, and perhaps even some random chance.
If main() finishes before the spawned threads, all those threads will be terminated, as there is no longer a main thread to support them.
Calling pthread_exit() at the end of main() terminates only the main thread; the process stays alive to support the threads it created until they complete execution.
You can learn more about this here: https://computing.llnl.gov/tutorials/pthreads/
Assuming you are using POSIX pthreads (not clear from your example) then you are right. If you don't want that then indeed pthread_exit from main will mean the program will continue to run until all the threads finish. The "main thread" is special in this regard, as its exit normally causes all threads to terminate.
More typically, you'll do something useful in the main thread after a new thread has been forked. Otherwise, what's the point? So you'll do your own processing, wait on some events, etc. If you want main (or any other thread) to wait for a thread to complete before proceeding, you can call pthread_join() with the handle of the thread of interest.
All of this may be beside the point, however, since you are not explicitly using POSIX threads in your example, so I don't know whether that's pseudo-code for the purpose of the example or literal code. On Windows, CreateThread has different semantics from POSIX pthreads. However, you didn't use that capitalization for the call in your example, so I don't know if that's what you intended either. Personally, I use the pthreads-win32 library even on Windows.
As I understand, one of the benefits of NodeJS is that it's one thread per process; in the standard case, you don't need to worry about concurrency.
I also read about NodeJS scaling on multi core machines (Node.js on multi-core machines):
Workers will compete to accept new connections, and the least loaded process is most likely to win. It works pretty well and can scale up throughput quite well on a multi-core box.
In this case, will multiple threads execute in parallel? If so, doesn't that mean we do have to write multithreaded code (if we want to use multiple cores) - and if so, how do I do that?
Or if they don't execute in parallel... where is the boost/benefit of multiple cores coming from?
Edit: My current understanding
So there may be multiple processes on multiple cores but each process only has a single thread.
For example:
var io = require('socket.io').listen(81);
var connections = [];

io.sockets.on('connect', function (socket) {
    console.log('connected...');
    connections.push(socket);
    socket.on('disconnect', function () {
        console.log('disconnected');
        connections.splice(connections.indexOf(socket), 1);
    });
});
There aren't race conditions; there's a single thread, so there won't be concurrent accesses of connections. When you have multiple processes, each process has its own copy of connections. So if you had a massive chatroom, you couldn't balance the load with multiple processes; each process would be its own chatroom.
In this respect, it's not any different from PHP, in that each PHP script has its own copy of the variables, so you don't write locking code. Of course, the rest is completely different, but as far as I can see the argument "you don't have to write thread-locking code" isn't much of a plus, because most data will be stored elsewhere anyway (not as in-memory variables).
The answer to:
Does nodejs carry the “single thread” (no multithreaded locking code) benefit when run on multiple cores?
Is yes: Node still spares you locking code, as each process is single-threaded.
There is no multithreading in Node (JavaScript is designed to run on a single thread). Scaling to multiple cores involves multiple processes, each with a single thread.
So, you have multiple process that execute in parallel, but since they're separate processes, with their own process space, you don't have the same issues with locks as you would with a multi-threaded process. Communicating between processes uses IPC via handles. Since all IO is non-blocking in Node, while child processes are waiting for I/O other processes can continue to execute, and receive data.
By the nature of JavaScript, running code can only execute on a single thread. That means each internal resource of the running Node process is accessible by only one running function at a time, so parallelism can't happen.
An example:
var car = {
    velocity: 100,
};

function speedUpTo150() {
    car.velocity = 150;
}

function slowDownTo80() {
    car.velocity = 80;
}

speedUpTo150();
slowDownTo80();

setTimeout(function() {
    speedUpTo150();
}, 1000);

setTimeout(function() {
    slowDownTo80();
}, 1000);
This example should make it clear that a race condition cannot happen: at any moment, only one function can access car.
Yet Node.js, as you mentioned, can run in a multicore mode. This happens either by clustering (forking) the JavaScript code into several Node.js processes, or by spawning child processes. Again, inside each individual process (whether a cluster worker or a child process) a race condition cannot happen on its internal resources. Nor can one happen while they exchange data, since at any time only one piece of code on each side is executing and applying the exchange.
But you also mentioned external resources, such as MongoDB. Node.js cannot know what MongoDB is serving at any given time beyond its own calls. So in that case a race condition can happen (I am not completely sure how MongoDB handles this case; it's just a hypothesis), since at any time MongoDB may be serving any process, whether that other process is a forked Node.js instance or anything else. In such cases you should implement a locking mechanism.
Note that the same applies to the Actor pattern, where each actor is an individual thread and handles race conditions on its internal resources in a very similar way. But when it comes to external resources, by the Actor model's nature an actor cannot be aware of an external resource's state.
Just food for thought: why not look into an immutability mechanism?
Cheers!
JavaScript always runs in a single thread; there is no such thing as multithreaded code in JavaScript. It is not good for heavy computation, but it is good for IO-bound operations, because it is event-based: while an IO access is in progress, the thread is free to handle other requests/operations. That's why it can handle many "simultaneous" connections well.
I wonder whether I could use QEventLoop (QProcess?) to parallelize multiple calls to the same function with Qt. What exactly is the difference from QtConcurrent or QThread? What, more precisely, are a process and an event loop? I read that QCoreApplication must exec() as early as possible in the main() method, so I wonder how that differs from the main thread.
Could you point me to a good reference on processes and threads with Qt? I went through the official docs and these things remain unclear.
Thanks and regards.
Process and thread are not Qt-specific concepts. You can search for "process vs. thread" anywhere for that distinction to be explained. For instance: What resources are shared between threads?
Though related concepts, spawning a new process is a more "heavyweight" form of parallelism than spawning a new thread within your existing process. Processes are protected from each other by default, while threads of execution within a process can read and write each other's memory directly. The protection you get from spawning processes comes at a greater run-time cost...and since independent processes can't read each other's memory, you have to share data between them using methods of inter-process communication.
Odds are that you want threads, because they're simpler to use in a case where one is writing all the code in a program. Given all the complexities in multithreaded programming, I'd suggest looking at a good book or reading some websites to start with. See: What are some good resources for learning threaded programming?
But if you want to dive in and just get a feel for how threading in Qt looks, you can spend time looking at the examples:
http://qt-project.org/doc/qt-4.8/examples-threadandconcurrent.html
QtConcurrent is an abstraction library that makes it easier to implement some kinds of parallel programming patterns. It's built on top of the QThread abstractions, and there's nothing it can do that you couldn't code yourself by writing to QThread directly. But it might make your code easier to write and less prone to errors.
As for an event loop...that is merely a generic term for how any given thread of execution in your program waits for work items to process, processes them, and can decide when it is no longer needed. If a thread's job were merely to start up, do some math, and exit...then it wouldn't need an event loop. But starting and stopping a thread takes time and churns resources. So typically threads live for longer periods of time, and have an event loop that knows how to wait for events it needs to respond to.
If you build on top of QtConcurrent, you won't have to worry about an event loop in your worker threads because they are managed automatically in a thread pool. The word count example is pretty simple to see:
http://qt-project.org/doc/qt-4.8/qtconcurrent-wordcount-main-cpp.html
I am having some trouble understanding how to use Unix's fork(). When I need parallelization, I am used to spawning threads in my application. It's always something of the form
CreateNewThread(MyFunctionToRun);
void MyFunctionToRun() { ... }
Now, when learning about Unix's fork(), I was given examples of the form:
fork();
printf("%d\n", 123);
in which the code after the fork is "split up". I can't understand how fork() can be useful. Why doesn't fork() have a similar syntax to the above CreateNewThread(), where you pass it the address of a function you want to run?
To accomplish something similar to CreateNewThread(), I'd have to be creative and do something like
//pseudo code
id = fork();
if (id == 0) { //im the child
FunctionToRun();
} else { //im the parent
wait();
}
Maybe the problem is that I am so used to spawning threads the .NET way that I can't think clearly about this. What am I missing here? What are the advantages of fork() over CreateNewThread()?
PS: I know fork() will spawn a new process, while CreateNewThread() will spawn a new thread.
Thanks
fork() says "copy the current process state into a new process and start it running from right here." Because the code is then running in two processes, it in fact returns twice: once in the parent process (where it returns the child process's process identifier) and once in the child (where it returns zero).
There are a lot of restrictions on what it is safe to call in the child process after fork() (see below). The expectation is that the fork() call was part one of spawning a new process running a new executable with its own state. Part two of this process is a call to execve() or one of its variants, which specifies the path to an executable to be loaded into the currently running process, the arguments to be provided to that process, and the environment variables to surround that process. (There is nothing to stop you from re-executing the currently running executable and providing a flag that will make it pick up where the parent left off, if that's what you really want.)
The UNIX fork()-exec() dance is roughly the equivalent of the Windows CreateProcess(). A newer function is even more like it: posix_spawn().
As a practical example of using fork(), consider a shell, such as bash. fork() is used all the time by a command shell. When you tell the shell to run a program (such as echo "hello world"), it forks itself and then execs that program. A pipeline is a collection of forked processes with stdout and stdin rigged up appropriately by the parent in between fork() and exec().
If you want to create a new thread, you should use the Posix threads library. You create a new Posix thread (pthread) using pthread_create(). Your CreateNewThread() example would look like this:
#include <pthread.h>

/* Pthread functions are expected to accept and return void *. */
void *MyFunctionToRun(void *arg);

pthread_t thread;
int error = pthread_create(&thread,
                           NULL,   /* use default thread attributes */
                           MyFunctionToRun,
                           NULL    /* argument */);
Before threads were available, fork() was the closest thing UNIX provided to multithreading. Now that threads are available, usage of fork() is almost entirely limited to spawning a new process to execute a different executable.
below: The restrictions are because fork() predates multithreading, so only the thread that calls fork() continues to execute in the child process. Per POSIX:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] [Option Start] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls. [Option End]
When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not asynch-signal-safe, the behavior is undefined.
Because any library function you call could have spawned a thread on your behalf, the paranoid assumption is that you are always limited to executing async-signal-safe operations in the child process between calling fork() and exec().
History aside, there are some fundamental differences with respect to ownership of resource and life time between processes and threads.
When you fork, the new process occupies a completely separate memory space. That's a very important distinction from creating a new thread. In multi-threaded applications you have to consider how you access and manipulate shared resources. Processes that have been forked have to share resources explicitly, using inter-process means such as shared memory, pipes, remote procedure calls, semaphores, etc.
Another difference is that fork()'ed children can outlive their parent, whereas all threads die when the process terminates.
In a client-server architecture where very, very long uptime is expected, using fork() rather than creating threads could be a valid strategy to combat memory leaks. Rather than worrying about cleaning up memory leaks in your threads, you just fork off a new child process to process each client request, then kill the child when it's done. The only source of memory leaks would then be the parent process that dispatches events.
An analogy: You can think of spawning threads as opening tabs inside a single browser window, while forking is like opening separate browser windows.
It would be more valid to ask why CreateNewThread doesn't just return a thread id like fork() does... after all, fork() set the precedent. Your opinion is just coloured by having seen one before the other. Take a step back and consider that fork() duplicates the process and continues execution... what better place than at the next instruction? Why complicate things by adding a function call into the bargain (and one that only takes a void*)?
Your comment to Mike says "I can't understand is in which contexts you'd want to use it.". Basically, you use it when you want to:
run another process using the exec family of functions
do some parallel processing independently (in terms of memory usage, signal handling, resources, security, robustness), for example:
each process may face intrusive limits on the number of file descriptors it can manage, or, on a 32-bit system, on the amount of memory: a second process can share the work while getting its own resources
web browsers tend to fork distinct processes because they can do some initialisation then call operating system functions to permanently reduce their privileges (e.g. change to a less-trusted user id, change the "root" directory under which they can access files, or make some memory pages read-only); most OSes don't allow the same extent of fine-grained permission-setting on a per-thread basis; another benefit is if a child process seg-faults or similar the parent process can handle that and continue, whereas similar faults in multi-threaded code raise questions about whether memory has been corrupted - or locks have been held - by the crashing thread such that remaining threads are compromised
BTW, using UNIX/Linux doesn't mean you have to give up threads for fork()ing processes... you can use pthread_create() and related functions if you're more comfortable with the threading paradigm.
Setting the difference between spawning a process and a thread aside for a second: fork() is basically a more fundamental primitive. While SpawnNewThread has to do some background work to get the program counter to the right spot, fork does no such work; it just copies (or virtually copies) your program's memory and continues the counter.
Fork has been with us for a very, very, long time. Fork was thought of before the idea of 'start a thread running a particular function' was a glimmer in anyone's eye.
People don't use fork because it's 'better,' we use it because it is the one and only unprivileged user-mode process creation function that works across all variations of Linux. If you want to create a process, you have to call fork. And, for some purposes, a process is what you need, not a thread.
You might consider researching the early papers on the subject.
It is worth noting that multi-processing is not exactly the same as multi-threading. The new process created by fork shares very little context with the old one, which is quite different from the case for threads.
So, let's look at the Unixy thread system: pthread_create has semantics similar to CreateNewThread.
Or, to turn it around, let's look at the way Windows (or Java, or any other system that makes its living with threads) would spawn a process identical to the one currently running (which is what fork does on Unix)... well, we can't, because there isn't one: that's just not part of the all-threads-all-the-time model. (Which is not a bad thing, mind you, just different.)
You fork whenever you want to do more than one thing at the same time. It's called multitasking, and it is really useful.
Here, for example, is a telnet-like program:
#!/usr/bin/perl
use strict;
use IO::Socket;

my ($host, $port, $kidpid, $handle, $line);

unless (@ARGV == 2) { die "usage: $0 host port" }
($host, $port) = @ARGV;

# create a tcp connection to the specified host and port
$handle = IO::Socket::INET->new(Proto    => "tcp",
                                PeerAddr => $host,
                                PeerPort => $port)
    or die "can't connect to port $port on $host: $!";
$handle->autoflush(1);              # so output gets there right away
print STDERR "[Connected to $host:$port]\n";

# split the program into two processes, identical twins
die "can't fork: $!" unless defined($kidpid = fork());

if ($kidpid) {
    # parent copies the socket to standard output
    while (defined ($line = <$handle>)) {
        print STDOUT $line;
    }
    kill("TERM" => $kidpid);        # send SIGTERM to child
}
else {
    # child copies standard input to the socket
    while (defined ($line = <STDIN>)) {
        print $handle $line;
    }
}
exit;
See how easy that is?
Fork()'s most popular use is as a way to clone a server for each new client that connect()s (because the new process inherits all file descriptors in whatever state they exist).
But I've also used it to initiate a new (locally running) service on-demand from a client.
That scheme is best done with two calls to fork(): one process stays in the parent's session until the server is up and able to accept connections; the other (forked off from the child) becomes the server and departs the parent's session so that it can no longer be reached by (say) SIGQUIT.