Socket incoming connections can not push_back elements concurrently to a globally defined std::vector

I am new to socket programming and have run into a problem that I cannot solve. I have read from several sources that the C++ Standard Template Library (STL) containers are not thread-safe, so the programmer has to provide a mechanism that ensures that several threads do not modify the data of a container concurrently.
See, for instance, the question "Thread safety std::vector push_back and reserve".
I have used the std::mutex class to make sure that nobody writes data in the same container at the same time when programming threads. However, this is not working for me when I use sockets.
Suppose I have 4 clients, each one sending data (int) to the server in the following order:
client_0: 4
client_1: 8
client_2: 5
client_3: 7
Observe the following code for a simple server:
#define PORT 60000
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <vector>
#include <string>
#include <iostream>
#include <mutex>
using namespace std;
vector<int> inputQueue; //<--------!
mutex mtx; //<---------------------!
void printVector(vector<int> input) {
cout << "inputQueue: [";
for (unsigned int i = 0; i < input.size(); i++ ) {
if (i != input.size() - 1)
cout << input[i] << ", ";
else
cout << input[i];
}
cout << "]." << endl;
}
int main(int argc, char const *argv[])
{
int server_fd, client_fd;
struct sockaddr_in address;
int opt = 1;
int addrlen = sizeof(address);
if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) == 0) {
perror("socket failed");
exit(EXIT_FAILURE);
}
if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR | SO_REUSEPORT, &opt, sizeof(opt))) {
perror("setsockopt");
exit(EXIT_FAILURE);
}
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons( PORT );
if (bind(server_fd, (struct sockaddr *)&address, sizeof(address))<0) {
perror("bind failed");
exit(EXIT_FAILURE);
}
if (listen(server_fd, 10) < 0) {
perror("listen");
exit(EXIT_FAILURE);
}
while(1) {
char buffer[4];
if ((client_fd = accept(server_fd, (struct sockaddr *)&address, (socklen_t*)&addrlen))<0) {
perror("accept");
exit(EXIT_FAILURE);
}
if (!fork()) {
recv(client_fd, buffer, 4, MSG_WAITALL);
int receivedInt = int(
(unsigned char)(buffer[0]) << 24 |
(unsigned char)(buffer[1]) << 16 |
(unsigned char)(buffer[2]) << 8 |
(unsigned char)(buffer[3])
);
mtx.lock(); //<-------------------------------------!
inputQueue.push_back(receivedInt); //<--------------!
cout << "Client context. Integer registered: " << receivedInt << ": inputQueue length is " << inputQueue.size() << endl;
printVector(inputQueue); //<------------------------!
mtx.unlock(); //<-----------------------------------!
close(server_fd); close(client_fd);
}
cout << "Server context: inputQueue length is " << inputQueue.size() << endl;
printVector(inputQueue);
}
return 0;
}
The server must receive the data in the order in which the clients send it and register each value in a vector of integers, std::vector<int> inputQueue, using the push_back() method, so that inputQueue = {4, 8, 5, 7} once all clients have sent their data.
To clarify: inputQueue is a global variable. It is empty when the server starts, and elements are added as the clients register.
The problem is that none of the clients registers an element in inputQueue. Notice in the code above that, depending on where the cout << statement is placed, the reported size of inputQueue differs. This shows that within the client context each client overwrites the first element of inputQueue, but outside it none of the clients manages to register a single element.
Apparently, each socket has its own copy of inputQueue, so when it is destroyed, the modified copy of inputQueue is also destroyed.
Output is the following:
Server context: inputQueue length is 0
inputQueue: [].
Client context. Integer registered: 4: inputQueue length is 1
inputQueue: [4].
Server context: inputQueue length is 1
inputQueue: [4].
Server context: inputQueue length is 0
inputQueue: [].
Client context. Integer registered: 8: inputQueue length is 1
inputQueue: [8].
Server context: inputQueue length is 0
inputQueue: [].
Server context: inputQueue length is 1
inputQueue: [8].
Client context. Integer registered: 5: inputQueue length is 1
inputQueue: [5].
Server context: inputQueue length is 1
inputQueue: [5].
Server context: inputQueue length is 0
inputQueue: [].
Client context. Integer registered: 7: inputQueue length is 1
inputQueue: [7].
Server context: inputQueue length is 1
inputQueue: [7].
Does anyone have any idea why this happens and how I could solve it? I hope you can help me. Thank you

if (!fork()) {
fork() creates a completely new, independent process with its own virtual memory address space. The shown code, apparently, expects both the child process and the original process to be interacting through the same object, namely a vector, locked by a mutex.
That's not what happens. You now have two completely independent processes. This is no different than running your program twice, at the same time or in quick succession. Do you expect both running copies of your program to somehow share the same vector and mutex? Of course not.
What you are looking to do, instead, is to use std::thread to create a new execution thread in the same process. Your C++ book should have more information how to create new execution threads with std::thread.
Furthermore, even if you replace the fork() with an analogous execution thread: that will still not solve all the problems here. You will also need to correctly handle synchronization between multiple execution threads. Specifically: there are no guarantees whatsoever that a new execution thread will insert something into the vector, before the other execution thread attempts to printVector its contents. The new execution thread could manage to do that, before the original execution thread enters printVector. Or it may not, and printVector finds a completely empty vector, because the other execution thread hasn't managed to push something into it, quickly enough. You now have two completely independent execution threads running at the same time, and you have no guarantees as to which thread does what, first.
You can even get a different result every time you run the multithreaded version of the shown program (and you probably will).
When you are ready to begin tackling this new problem, your C++ book will explain how to use condition variables, together with mutexes, to correctly implement multi-threaded synchronization. Unfortunately, this is not a topic that can be completely covered in a brief answer on stackoverflow.com, but it should have several dedicated chapters in your C++ book, where you will find more information.
P.S. The only reason your output shows anything in the input queue is that nothing stops the child process from continuing to execute the program after it leaves its if statement, so it ends up calling printVector itself. That output is not coming from the parent process. Each child process ends up printing the value it itself inserted into its own vector.

As noted by Miles Budnek, you are creating a new child process. Sockets are global OS objects, so they work as expected. Your vector, and the memory it is stored in, is local to the process and therefore cannot be accessed by your new process.
Consider looking into std::thread:
https://en.cppreference.com/w/cpp/thread/thread
One of the most used methods of starting a thread is with a lambda.
#include <thread>
#include <iostream>
auto print_number(int number) -> void
{
std::cout << number << std::endl; // This runs in the new thread.
}
int main()
{
int num = 12;
auto t = std::thread([num](){print_number(num);}); // Spawn new thread that calls the lambda
t.join(); // Wait for thread to finish execution
return 0;
}

Related

What is the difference between two join statements in the code?

In the code below, there are two joins (one of them is commented out). I would like to know what the difference is between executing join before the loop and executing join after the loop.
#include <iostream>
#include <thread>
using namespace std;
void ThreadFunction();
int main()
{
thread ThreadFunctionObj(ThreadFunction);
//ThreadFunctionObj.join();
for (int j=0;j<10;++j)
{
cout << "\tj = " << j << endl;
}
ThreadFunctionObj.join();
return 0;
}
void ThreadFunction()
{
for (int i=0;i<10;++i)
{
cout << "i = " << i << endl;
}
}

A join() on a thread waits for it to finish execution; your code doesn't continue until the thread is done. As such, calling join() right after starting a new thread defeats the purpose of multi-threading, as it would be the same as executing those two for loops serially. Calling join() after your loop in main() lets both for loops execute concurrently, and at the end of the for loop in main() you wait for the ThreadFunction() loop to be done too. This is the equivalent of you and a friend going out to eat, for example. You both start eating at roughly the same time, but the first one to finish still has to wait for the other (might not be the best example, but hope it does the job).
Hope it helps

std::async performance on Windows and Solaris 10

I'm running a simple threaded test program on both a Windows machine (compiled using MSVS2015) and a server running Solaris 10 (compiled using GCC 4.9.3). On Windows I'm getting significant performance increases from increasing the threads from 1 to the amount of cores available; however, the very same code does not see any performance gains at all on Solaris 10.
The Windows machine has 4 cores (8 logical) and the Unix machine has 8 cores (16 logical).
What could be the cause for this? I'm compiling with -pthread, and it is creating threads since it prints all the "S"es before the first "F". I don't have root access on the Solaris machine, and from what I can see there's no installed tool which I can use to view a process' affinity.
Example code:
#include <iostream>
#include <vector>
#include <future>
#include <random>
#include <chrono>
std::default_random_engine gen(std::chrono::system_clock::now().time_since_epoch().count());
std::normal_distribution<double> randn(0.0, 1.0);
double generate_randn(uint64_t iterations)
{
// Print "S" when a thread starts
std::cout << "S";
std::cout.flush();
double rvalue = 0;
for (int i = 0; i < iterations; i++)
{
rvalue += randn(gen);
}
// Print "F" when a thread finishes
std::cout << "F";
std::cout.flush();
return rvalue/iterations;
}
int main(int argc, char *argv[])
{
if (argc < 2)
return 0;
uint64_t count = 100000000;
uint32_t threads = std::atoi(argv[1]);
double total = 0;
std::vector<std::future<double>> futures;
std::chrono::high_resolution_clock::time_point t1;
std::chrono::high_resolution_clock::time_point t2;
// Start timing
t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threads; i++)
{
// Start async tasks
futures.push_back(std::async(std::launch::async, generate_randn, count/threads));
}
for (auto &future : futures)
{
// Wait for tasks to finish
future.wait();
total += future.get();
}
// End timing
t2 = std::chrono::high_resolution_clock::now();
// Take the average of the threads' results
total /= threads;
std::cout << std::endl;
std::cout << total << std::endl;
std::cout << "Finished in " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << " ms" << std::endl;
}

As a general rule, classes defined by the C++ standard library do not have any internal locking. Modifying an instance of a standard library class from more than one thread, or reading it from one thread while writing it from another, is undefined behavior, unless "objects of that type are explicitly specified as being sharable without data races". (N3337, sections 17.6.4.10 and 17.6.5.9.) The RNG classes are not "explicitly specified as being sharable without data races". (cout is an example of a stdlib object that is "sharable without data races", as long as you haven't done ios::sync_with_stdio(false).)
As such, your program is incorrect because it accesses a global RNG object from more than one thread simultaneously; every time you request another random number, the internal state of the generator is modified. On Solaris, this seems to result in serialization of accesses, whereas on Windows it is probably instead causing you not to get properly "random" numbers.
The cure is to create separate RNGs for each thread. Then each thread will operate independently, and they will neither slow each other down nor step on each other's toes. This is a special case of a very general principle: multithreading always works better the less shared data there is.
There's an additional wrinkle to worry about: each thread will call system_clock::now at very nearly the same time, so you may end up with some of the per-thread RNGs seeded with the same value. It would be better to seed them all from a random_device object. random_device requests random numbers from the operating system, and does not need to be seeded; but it can be very slow. The random_device should be created and used inside main, and seeds passed to each worker function, because a global random_device accessed from multiple threads (as in the previous edition of this answer) is just as undefined as a global default_random_engine.
All told, your program should look something like this:
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <vector>
#include <future>
#include <random>
#include <chrono>
static double generate_randn(uint64_t iterations, unsigned int seed)
{
// Print "S" when a thread starts
std::cout << "S";
std::cout.flush();
std::default_random_engine gen(seed);
std::normal_distribution<double> randn(0.0, 1.0);
double rvalue = 0;
for (uint64_t i = 0; i < iterations; i++)
{
rvalue += randn(gen);
}
// Print "F" when a thread finishes
std::cout << "F";
std::cout.flush();
return rvalue/iterations;
}
int main(int argc, char *argv[])
{
if (argc < 2)
return 0;
uint64_t count = 100000000;
uint32_t threads = std::atoi(argv[1]);
double total = 0;
std::vector<std::future<double>> futures;
std::chrono::high_resolution_clock::time_point t1;
std::chrono::high_resolution_clock::time_point t2;
std::random_device make_seed;
// Start timing
t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threads; i++)
{
// Start async tasks
futures.push_back(std::async(std::launch::async,
generate_randn,
count/threads,
make_seed()));
}
for (auto &future : futures)
{
// Wait for tasks to finish
future.wait();
total += future.get();
}
// End timing
t2 = std::chrono::high_resolution_clock::now();
// Take the average of the threads' results
total /= threads;
std::cout << '\n' << total
<< "\nFinished in "
<< std::chrono::duration_cast<
std::chrono::milliseconds>(t2 - t1).count()
<< " ms\n";
}

(This isn't really an answer, but it won't fit into a comment, especially with the command formatting and links.)
You can profile your executable on Solaris using Solaris Studio's collect utility. On Solaris, that will be able to show you where your threads are contending.
collect -d /tmp -p high -s all app [app args]
Then view the results using the analyzer utility:
analyzer /tmp/test.1.er &
Replace /tmp/test.1.er with the path to the output generated by a collect profile run.
If your threads are contending over some resource(s) as @zwol posted in his answer, you will see it.
Oracle marketing brief for the toolset can be found here: http://www.oracle.com/technetwork/server-storage/solarisstudio/documentation/o11-151-perf-analyzer-brief-1405338.pdf
You can also try compiling your code with Solaris Studio for more data.

What's the correct way of waiting for detached threads to finish?

Look at this sample code:
void OutputElement(int e, int delay)
{
this_thread::sleep_for(chrono::milliseconds(100 * delay));
cout << e << '\n';
}
void SleepSort(int v[], uint n)
{
for (uint i = 0 ; i < n ; ++i)
{
thread t(OutputElement, v[i], v[i]);
t.detach();
}
}
It starts n new threads and each one sleeps for some time before outputting a value and finishing. What's the correct/best/recommended way of waiting for all threads to finish in this case? I know how to work around this but I want to know what's the recommended multithreading tool/design that I should use in this situation (e.g. condition_variable, mutex etc...)?

And now for the slightly dissenting answer. And I do mean slightly because I mostly agree with the other answer and the comments that say "don't detach, instead join."
First imagine that there is no join(). And that you have to communicate among your threads with a mutex and condition_variable. This really isn't that hard nor complicated. And it allows an arbitrarily rich communication, which can be anything you want, as long as it is only communicated while the mutex is locked.
Now a very common idiom for such communication would simply be a state that says "I'm done". Child threads would set it, and the parent thread would wait on the condition_variable until the child said "I'm done." This idiom would in fact be so common as to deserve a convenience function that encapsulated the mutex, condition_variable and state.
join() is precisely this convenience function.
But imho one has to be careful. When one says: "Never detach, always join," that could be interpreted as: Never make your thread communication more complicated than "I'm done."
For a more complex interaction between parent thread and child thread, consider the case where a parent thread launches several child threads to go out and independently search for the solution to a problem. When a solution is first found by any thread, that gets communicated to the parent, and the parent can then take that solution, and tell all the other threads that they don't need to search any more.
For example:
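#include <chrono>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <iterator>
#include <memory>
#include <mutex>
#include <random>
#include <thread>
#include <utility>
#include <vector>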
void OneSearch(int id, std::shared_ptr<std::mutex> mut,
std::shared_ptr<std::condition_variable> cv,
int& state, int& solution)
{
std::random_device seed;
// std::mt19937_64 eng{seed()};
std::mt19937_64 eng{static_cast<unsigned>(id)};
std::uniform_int_distribution<> dist(0, 100000000);
int test = 0;
while (true)
{
for (int i = 0; i < 100000000; ++i)
{
++test;
if (dist(eng) == 999)
{
std::unique_lock<std::mutex> lk(*mut);
if (state == -1)
{
state = id;
solution = test;
cv->notify_one();
}
return;
}
}
std::unique_lock<std::mutex> lk(*mut);
if (state != -1)
return;
}
}
auto findSolution(int n)
{
std::vector<std::thread> threads;
auto mut = std::make_shared<std::mutex>();
auto cv = std::make_shared<std::condition_variable>();
int state = -1;
int solution = -1;
std::unique_lock<std::mutex> lk(*mut);
for (int i = 0; i < n; ++i)
threads.push_back(std::thread(OneSearch, i, mut, cv,
std::ref(state), std::ref(solution)));
while (state == -1)
cv->wait(lk);
lk.unlock();
for (auto& t : threads)
t.join();
return std::make_pair(state, solution);
}
int
main()
{
auto p = findSolution(5);
std::cout << '{' << p.first << ", " << p.second << "}\n";
}
Above I've created a "dummy problem" where a thread searches for how many times it needs to query a URNG until it comes up with the number 999. The parent thread puts 5 child threads to work on it. The child threads work for a while, and then every once in a while, look up and see if any other thread has found the solution yet. If so, they quit, else they keep working. The main thread waits until a solution is found, and then joins with all the child threads.
For me, using the bash time facility, this outputs:
$ time a.out
{3, 30235588}
real 0m4.884s
user 0m16.792s
sys 0m0.017s
But what if, instead of joining with all the threads, it detached those threads that had not yet found a solution? This might look like:
for (unsigned i = 0; i < n; ++i)
{
if (i == state)
threads[i].join();
else
threads[i].detach();
}
(in place of the t.join() loop from above). For me this now runs in 1.8 seconds, instead of the 4.9 seconds above. I.e. the child threads are not checking with each other that often, and so main just detaches the working threads and lets the OS bring them down. This is safe for this example because the child threads own everything they are touching. Nothing gets destructed out from under them.
One final iteration can be realized by noticing that even the thread that finds the solution doesn't need to be joined with. All of the threads could be detached. The code is actually much simpler:
auto findSolution(int n)
{
auto mut = std::make_shared<std::mutex>();
auto cv = std::make_shared<std::condition_variable>();
int state = -1;
int solution = -1;
std::unique_lock<std::mutex> lk(*mut);
for (int i = 0; i < n; ++i)
std::thread(OneSearch, i, mut, cv,
std::ref(state), std::ref(solution)).detach();
while (state == -1)
cv->wait(lk);
return std::make_pair(state, solution);
}
And the performance remains at about 1.8 seconds.
There is still (sort of) an effective join with the solution-finding thread here. But it is accomplished with the condition_variable::wait instead of with join.
thread::join() is a convenience function for the very common idiom that your parent/child thread communication protocol is simply "I'm done." Prefer thread::join() in this common case as it is easier to read, and easier to write.
However don't unnecessarily constrain yourself to such a simple parent/child communication protocol. And don't be afraid to build your own richer protocol when the task at hand needs it. And in this case, thread::detach() will often make more sense. thread::detach() doesn't necessarily imply a fire-and-forget thread. It can simply mean that your communication protocol is more complex than "I'm done."

Don't detach, but instead join:
std::vector<std::thread> ts;
for (unsigned int i = 0; i != n; ++i)
ts.emplace_back(OutputElement, v[i], v[i]);
for (auto & t : ts)
t.join();

Linux ptrace memory reading & process management

I'm trying to read a process's memory on Linux (Xubuntu, to be precise). I'm pretty new to Linux, though I've done this same read using Win32API ReadProcessMemory() before in Windows. The general idea is that I'm trying to develop some software for a game which will get my stats and upload them to a server, which will track my progress and keep a log of it. The end goal is to make a bot which will automatically play and farm data about the game. In order to do this, I need to be able to access the process's memory. In Windows, that's dead easy. In Linux, it's proving a little more complex.
I've found a memory address which contains information I want to read. The information is an int32, and it is stored at 84a1bd8. I found it using GameConqueror 0.13. The address remains correct after restarting, so it appears there is no ASLR (as there was in Windows). I also know the ProcessID (I can find this using task manager for now, though if someone knows a simple way to get a PID by either ClassName, Exe name, or similar, that would be great too!) So, that looks like it should be all I really need to use PTRACE_PEEKDATA to read the memory, right? Well, that's the problem, it doesn't appear to be. My code looks like this:
#include <iostream>
#include <string>
#include <sys/ptrace.h>
#include <errno.h>
using namespace std;
int main()
{
pid_t pid = 4847;
int addr = 0x84a1bd8;
long ret = ptrace(PTRACE_TRACEME, pid, NULL, NULL);
cout << "ptrace Status: " << ret << endl;
cout << "Errno: " << errno << endl;
ret = ptrace(PTRACE_PEEKDATA, pid, (void*)addr, NULL);
cout << "ptrace Status: " << ret << endl;
cout << "Errno: " << errno << endl;
ret = ptrace(PTRACE_DETACH, pid, NULL, NULL);
cout << "ptrace Status: " << ret << endl;
cout << "Errno: " << errno << endl;
return 0;
}
The output looks like this:
ptrace Status: 0
Errno: 0
ptrace Status: -1
Errno: 3
ptrace Status: -1
Errno: 3
Being quite new to Linux, I don't know where to find the error codes or how to work out what this error actually means, nor do I know whether I am even declaring the address correctly. Should I declare it as an int in its decimal equivalent? Is there anything I'm missing?
Thanks for your time

The solution turned out to be that the ptrace() calls must be made in this order:
ptrace(PTRACE_ATTACH, pid, NULL, NULL)
ptrace(PTRACE_PEEKDATA, pid, addr, NULL)
ptrace(PTRACE_DETACH, pid, NULL, NULL)
So the simple answer: You need to attach and detach before and after reading the memory.
It may also be useful to know that between the attach and detach commands, the process will sleep, meaning this method isn't so good for my purpose, but may be useful to others :)
Thanks to @PeterL. for your help.

Does Creating a New Thread Duplicate File Descriptors and Socket Descriptors in Linux?

Everyone knows the classic model of a process listening for connections on a socket and forking a new process to handle each new connection. Normal practice is for the parent process to immediately call close on the newly created socket, decrementing the handle count so that only the child has a handle to the new socket.
I've read that the only difference between a process and a thread in Linux is that threads share the same memory. In this case I'm assuming spawning a new thread to handle a new connection also duplicates file descriptors and would also require the 'parent' thread to close its copy of the socket?

No. Threads share the same memory, so they share the same variables. If you close the socket in the parent thread, it is also closed in the child thread.
EDIT:
man fork: The child inherits copies of the parent’s set of open file descriptors.
man pthreads: threads share a range of other attributes (i.e., these attributes are process-wide rather than per-thread): [...] open file descriptors
And some code:
#include <cstring>
#include <iostream>
using namespace std;
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
// global variable
int fd = -1;
void * threadProc(void * param) {
cout << "thread: begin" << endl;
sleep(2);
int rc = close(fd);
if (rc == -1) {
int errsv = errno;
cout << "thread: close() failed: " << strerror(errsv) << endl;
}
else {
cout << "thread: file is closed" << endl;
}
cout << "thread: end" << endl;
return NULL; // a function returning void* must return a value
}
int main() {
int rc = open("/etc/passwd", O_RDONLY);
fd = rc;
pthread_t threadId;
rc = pthread_create(&threadId, NULL, &threadProc, NULL);
sleep(1);
rc = close(fd);
if (rc == -1) {
int errsv = errno;
cout << "main: close() failed: " << strerror(errsv) << endl;
return 0;
}
else {
cout << "main: file is closed" << endl;
}
sleep(2);
}
Output is:
thread: begin
main: file is closed
thread: close() failed: Bad file descriptor
thread: end

In principle, Linux clone() can implement not only a new process (like fork()), or a new thread (like pthread_create perhaps), but also anything in between.
In practice, it is only ever used for one or the other. Threads created with pthread_create share the file descriptors with all other threads in the process (not just the parent). This is non-negotiable.
Sharing a file descriptor and having a copy is different. If you have a copy (like fork()) then all copies must be closed before the file handle goes away. If you share the FD in a thread, once one closes it, it's gone.

On Linux threads are implemented via the clone syscall using the CLONE_FILES flag:
"If CLONE_FILES is set, the calling process and the child processes share the same file descriptor table. Any file descriptor created by the calling process or by the child process is also valid in the other process. Similarly, if one of the processes closes a file descriptor, or changes its associated flags (using the fcntl(2) F_SETFD operation), the other process is also affected."
Also have a look at the glibc source code for the details of how it is used in createthread.c:
int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL
| CLONE_SETTLS | CLONE_PARENT_SETTID
| CLONE_CHILD_CLEARTID | CLONE_SYSVSEM
#if __ASSUME_NO_CLONE_DETACHED == 0
| CLONE_DETACHED
#endif
| 0);
