How to seed Random^ in C++ / CLI with threads?

How to seed Random^ in C++ / CLI with threads? - multithreading

I have a basic program that is using multi-threading. Each thread needs to use different random numbers when the thread procedure is called. I have tried seeding the random number generator within the thread procedure, but I get the same random numbers for each thread. Here is a simple version of what I am doing:
public ref class ThreadX
{
public:
void ThreadProc()
{
srand(time(NULL));
Console::WriteLine(rand()); //Will output same random numbers
}
}
int main(){
ThreadX^ process1 = gcnew ThreadX(gasStationATM);
Thread^ Thread1 = gcnew Thread(gcnew ThreadStart(process1, &ThreadX::ThreadProc));
Thread^ Thread2 = gcnew Thread(gcnew ThreadStart(process1, &ThreadX::ThreadProc));
Thread1->Start();
Thread2->Start();
}
What I originally thought was when the second thread was started, the second thread started the time for the seed would be different and give a different series of random numbers for the second number. How can I seed the srand in C++ CLI so that each thread generates random numbers.

When you create your Random instance, use the constructor that lets you specify a seed value. The default constructor uses the current system time as the seed value, so two instances created at the same time will use the same seed, and therefore produce the same sequence of random numbers.
For your seed value, there are two options for specifying different values: You could use some value unique to the thread you're running on, or you could use a unique number that you manage.
Random^ GetRandom1()
{
return gcnew Random(Thread::CurrentThread->ManagedThreadId);
}
static int uniqueID = 0;
Random^ GetRandom2()
{
return gcnew Random(Interlocked::Increment(uniqueID));
}

Another option is to use a single central RNG in your main thread. As each new thread is created, seed the thread's own RNG with the next number from the central RNG. This also has the advantage that you can repeat a run exactly if necessary by giving the central RNG a specific seed. That technique can be useful for repeating (and hence fixing) errors.

srand(time(NULL) + rank) will do the job.
At present, your seed is not thread specific. When all the threads are spawned, they have almost same time(NULL), and so value of seed is same for each thread. To avoid it, you can use rank of thread as seed, or any other thread specific variable.

Related

How can I solve the problem of script blocking？

I want to give users the ability to customize the behavior of game objects, but I found that unity is actually a single threaded program. If the user writes a script with circular statements in the game object, the main thread of unity will block, just like the game is stuck. How to make the update function of object seem to be executed on a separate thread?
De facto execution order
The logical execution sequence I want to implement

You can implement threading, but the UnityAPI is NOT thread safe, so anything you do outside of the main thread cannot use the UnityAPI. This means that you can do a calculation in another thread and get a result returned to the main thread, but you cannot manipulate GameObjects from the thread.
You do have other options though, for tasks which can take several frames, you can use a coroutine. This will also allow the method to wait without halting the main thread. It sounds like your best option is the C# Jobs System. This system essentially lets you use multithreading and manages the threads for you.
Example from the Unity Manual:
public struct MyJob : IJob
{
public float a;
public float b;
public NativeArray<float> result;
public void Execute()
{
result[0] = a + b;
}
}
// Create a native array of a single float to store the result. This example waits for the job to complete for illustration purposes
NativeArray<float> result = new NativeArray<float>(1, Allocator.TempJob);
// Set up the job data
MyJob jobData = new MyJob();
jobData.a = 10;
jobData.b = 10;
jobData.result = result;
// Schedule the job
JobHandle handle = jobData.Schedule();
// Wait for the job to complete
handle.Complete();
float aPlusB = result[0];
// Free the memory allocated by the result array
result.Dispose();

using atomic c++11 to implement a thread safe down counter to zero

I'm new to atomic techniques and try to implement a safe thread version for the follow code:
// say m_cnt is unsigned
void Counter::dec_counter()
{
if(0==m_cnt)
return;
--m_cnt;
if(0 == m_cnt)
{
// Do seomthing
}
}
Every thread that calls dec_counter must decrement it by one and "Do something" should be done only one time - at when the counter is decremented to 0.
After fighting with it, I did the follow code that does it well (I think), but I wonder if this is the way to do it, or is there a better way. Thanks.
// m_cnt is std::atomic<unsigned>
void Counter::dec_counter()
{
// loop until decrement done
unsigned uiExpectedValue;
unsigned uiNewValue;
do
{
uiExpectedValue = m_cnt.load();
// if other thread already decremented it to 0, then do nothing.
if (0 == uiExpectedValue)
return;
uiNewValue = uiExpectedValue - 1;
// at the short time from doing
// uiExpectedValue = m_cnt.load();
// it is possible that another thread had decremented m_cnt, and it won't be equal here to uiExpectedValue,
// thus the loop, to be sure we do a decrement
} while (!m_cnt.compare_exchange_weak(uiExpectedValue, uiNewValue));
// if we are here, that means we did decrement . so if it was to 0, then do something
if (0 == uiNewValue)
{
// do something
}
}

The thing with atomic is that only that one statement is atomic.
If you write
std::atomic<int> i {20}
...
if (!--i)
...
Then just 1 thread will enter the if.
However, if you split up the change and the test, then other threads can get into the gap, and you may get strange results:
std::atomic<int> i {20}
...
--i;
// other thread(s) can modify i just here
if (!i)
...
Of course you can split the condition test for the decrement by using a local variable:
std::atomic<int> i {20}
...
int j=--i;
// other thread(s) can modify i just here
if (!j)
...
All the simple math operations are generally efficiently supported for small atomics in c++
For more complex types and expressions, you need to use the read/modify/write member methods.
These allow you to read the current value, calculate the new value, and then call compare_exchange_strong or compare_exchange_weak say "if the value has not changed, then store my new value, otherwise give me the new current value" a a single atomic operation. You can stick this in a loop and keep recalculating the new value until you are lucky enough that your thread is the only writer. If there are not too many threads trying too often to change the value this is reasonably efficient as well.

Give me a scenario where such an output could happen when multi-threading is happening

Just trying to understand threads and race condition and how they affect the expected output. In the below code, i once had an output that began with
"2 Thread-1" then "1 Thread-0" .... How could such an output happen? What I understand is as follows:
Step1:Assuming Thread 0 started, it incremented counter to 1,
Step2: Before printing it, Thread 1 incremented it to 2 and printed it,
Step3: Thread 0 prints counter which should be 2 but is printing 1.
How could Thread 0 print counter as 1 when Thread 1 already incremented it to 2?
P.S: I know that synchronized key could deal with such race conditions, but I just want to have some concepts done before.
public class Counter {
static int count=0;
public void add(int value) {
count=count+value;
System.out.println(count+" "+ Thread.currentThread().getName());
}
}
public class CounterThread extends Thread {
Counter counter;
public CounterThread(Counter c) {
counter=c;
}
public void run() {
for(int i=0;i<5;i++) {
counter.add(1);
}
}
}
public class Main {
public static void main(String args[]) {
Counter counter= new Counter();
Thread t1= new CounterThread(counter);
Thread t2= new CounterThread(counter);
t1.start();
t2.start();
}
}

How could Thread 0 print counter as 1 when Thread 1 already incremented it to 2?
There's a lot more going on in these two lines than meets the eye:
count=count+value;
System.out.println(count+" "+ Thread.currentThread().getName());
First of all, the compiler doesn't know anything about threads. It's job is to emit code that will achieve the same end result when executed in a single thread. That is, when all is said and done, the count must be incremented, and the message must be printed.
The compiler has a lot of freedom to re-order operations, and to store values in temporary registers in order to ensure that the correct end result is achieved in the most efficient way possible. So, for example, the count in the expression count+" "+... will not necessarily cause the compiler to fetch the latest value of the global count variable. In fact it probably will not fetch from the global variable because it knows that the result of the + operation still is sitting in a CPU register. And, since it doesn't acknowledge that other threads could exist, then it knows that there's no way that the value in the register could be any different from what it stored into the global variable after doing the +.
Second of all, the hardware itself is allowed to stash values in temporary places and re-order operations for efficiency, and it too is allowed to assume that there are no other threads. So, even when the compiler emits code that says to actually fetch from or store to the global variable instead of to or from a register, the hardware does not necessarily store to or fetch from the actual address in memory.
Assuming your code example is Java code, then all of that changes when you make appropriate use of synchronized blocks. If you would add synchronized to the declaration of your add method for example:
public synchronized void add(int value) {
count=count+value;
System.out.println(count+" "+ Thread.currentThread().getName());
}
That forces the compiler to acknowledge the existence of other threads, and the compiler will emit instructions that force the hardware to acknowledge other threads as well.
By adding synchronized to the add method, you force the hardware to deliver the actual value of the global variable on entry to the method, your force it to actually write the global by the time the method returns, and you prevent more than one thread from being in the method at the same time.

Design pattern for asynchronous while loop

I have a function that boils down to:
while(doWork)
{
config = generateConfigurationForTesting();
result = executeWork(config);
doWork = isDone(result);
}
How can I rewrite this for efficient asynchronous execution, assuming all functions are thread safe, independent of previous iterations, and probably require more iterations than the maximum number of allowable threads ?
The problem here is we don't know how many iterations are required in advance so we can't make a dispatch_group or use dispatch_apply.
This is my first attempt, but it looks a bit ugly to me because of arbitrarily chosen values and sleeping;
int thread_count = 0;
bool doWork = true;
int max_threads = 20; // arbitrarily chosen number
dispatch_queue_t queue =
dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
while(doWork)
{
if(thread_count < max_threads)
{
dispatch_async(queue, ^{ Config myconfig = generateConfigurationForTesting();
Result myresult = executeWork();
dispatch_async(queue, checkResult(myresult)); });
thread_count++;
}
else
usleep(100); // don't consume too much CPU
}
void checkResult(Result value)
{
if(value == good) doWork = false;
thread_count--;
}

Based on your description, it looks like generateConfigurationForTesting is some kind of randomization technique or otherwise a generator which can make a near-infinite number of configuration (hence your comment that you don't know ahead of time how many iterations you will need). With that as an assumption, you are basically stuck with the model that you've created, since your executor needs to be limited by some reasonable assumptions about the queue and you don't want to over-generate, as that would just extend the length of the run after you have succeeded in finding value ==good measurements.
I would suggest you consider using a queue (or OSAtomicIncrement* and OSAtomicDecrement*) to protect access to thread_count and doWork. As it stands, the thread_count increment and decrement will happen in two different queues (main_queue for the main thread and the default queue for the background task) and thus could simultaneously increment and decrement the thread count. This could lead to an undercount (which would cause more threads to be created than you expect) or an overcount (which would cause you to never complete your task).
Another option to making this look a little nicer would be to have checkResult add new elements into the queue if value!=good. This way, you load up the initial elements of the queue using dispatch_apply( 20, queue, ^{ ... }) and you don't need the thread_count at all. The first 20 will be added using dispatch_apply (or an amount that dispatch_apply feels is appropriate for your configuration) and then each time checkResult is called you can either set doWork=false or add another operation to queue.

dispatch_apply() works for this, just pass ncpu as the number of iterations (apply never uses more than ncpu worker threads) and keep each instance of your worker block running for as long as there is more work to do (i.e. loop back to generateConfigurationForTesting() unless !doWork).

Can I assign a per-thread index, using pthreads?

I'm optimizing some instrumentation for my project (Linux,ICC,pthreads), and would like some feedback on this technique to assign a unique index to a thread, so I can use it to index into an array of per-thread data.
The old technique uses a std::map based on pthread id, but I'd like to avoid locks and a map lookup if possible (it is creating a significant amount of overhead).
Here is my new technique:
static PerThreadInfo info[MAX_THREADS]; // shared, each index is per thread
// Allow each thread a unique sequential index, used for indexing into per
// thread data.
1:static size_t GetThreadIndex()
2:{
3: static size_t threadCount = 0;
4: __thread static size_t myThreadIndex = threadCount++;
5: return myThreadIndex;
6:}
later in the code:
// add some info per thread, so it can be aggregated globally
info[ GetThreadIndex() ] = MyNewInfo();
So:
1) It looks like line 4 could be a race condition if two threads where created at exactly the same time. If so - how can I avoid this (preferably without locks)? I can't see how an atomic increment would help here.
2) Is there a better way to create a per-thread index somehow? Maybe by pre-generating the TLS index on thread creation somehow?

1) An atomic increment would help here actually, as the possible race is two threads reading and assigning the same ID to themselves, so making sure the increment (read number, add 1, store number) happens atomically fixes that race condition. On Intel a "lock; inc" would do the trick, or whatever your platform offers (like InterlockedIncrement() for Windows for example).
2) Well, you could actually make the whole info thread-local ("__thread static PerThreadInfo info;"), provided your only aim is to be able to access the data per-thread easily and under a common name. If you actually want it to be a globally accessible array, then saving the index as you do using TLS is a very straightforward and efficient way to do this. You could also pre-compute the indexes and pass them along as arguments at thread creation, as Kromey noted in his post.

Why so averse to using locks? Solving race conditions is exactly what they're designed for...
In any rate, you can use the 4th argument in pthread_create() to pass an argument to your threads' start routine; in this way, you could use your master process to generate an incrementing counter as it launches the threads, and pass this counter into each thread as it is created, giving you your unique index for each thread.

I know you tagged this [pthreads], but you also mentioned the "old technique" of using std::map. This leads me to believe that you're programming in C++. In C++11 you have std::thread, and you can pass out unique indexes (id's) to your threads at thread creation time through an ordinary function parameter.
Below is an example HelloWorld that creates N threads, assigning each an index of 0 through N-1. Each thread does nothing but say "hi" and give it's index:
#include <iostream>
#include <thread>
#include <mutex>
#include <vector>
inline void sub_print() {}
template <class A0, class ...Args>
void
sub_print(const A0& a0, const Args& ...args)
{
std::cout << a0;
sub_print(args...);
}
std::mutex&
cout_mut()
{
static std::mutex m;
return m;
}
template <class ...Args>
void
print(const Args& ...args)
{
std::lock_guard<std::mutex> _(cout_mut());
sub_print(args...);
}
void f(int id)
{
print("This is thread ", id, "\n");
}
int main()
{
const int N = 10;
std::vector<std::thread> threads;
for (int i = 0; i < N; ++i)
threads.push_back(std::thread(f, i));
for (auto i = threads.begin(), e = threads.end(); i != e; ++i)
i->join();
}
My output:
This is thread 0
This is thread 1
This is thread 4
This is thread 3
This is thread 5
This is thread 7
This is thread 6
This is thread 2
This is thread 9
This is thread 8

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string