Concurrent garbage collection and multithreading

Why can't the simple mark-and-sweep algorithm be run concurrently?
I have read that we need CMS (the concurrent mark-and-sweep collector) for concurrent garbage collection, and that it is more complex than simple mark and sweep. But what is the problem with the simple algorithm that makes CMS necessary?

The simplest reason is that the marker can miss reachable objects if the path to an object changes while marking is in progress. Consider the following program and scenario:
a = new Object();
b = new Object();
b.c = new Object();
// gc process marks a; nothing reachable from a yet
a.c = b.c;   // mutator adds a new path to the object...
b.c = null;  // ...and removes the old one
// gc process marks b; nothing reachable from b anymore
// marking phase completes; the object now at a.c was never marked
// gc sweeps the object referenced by a.c
print a.c; // safety violation
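This is exactly why concurrent collectors such as CMS need extra machinery: the mutator has to cooperate with the marker, typically through a write barrier on every reference store. Below is a minimal C++ sketch of one barrier flavour (snapshot-at-the-beginning, which re-marks the object whose reference is being overwritten); it illustrates the idea and is not CMS's actual implementation:

#include <vector>

struct Object {
    Object* c = nullptr;   // a single reference field, as in the example above
    bool marked = false;
};

std::vector<Object*> gray_list;  // objects the collector still has to scan

// Every pointer store performed by the mutator goes through this barrier.
void write_ref(Object*& field, Object* new_value) {
    Object* old_value = field;
    if (old_value != nullptr && !old_value->marked) {
        old_value->marked = true;        // don't lose the overwritten target
        gray_list.push_back(old_value);  // the collector will rescan it
    }
    field = new_value;
}

With such a barrier, the b.c = null store in the scenario above re-marks the object just before its old path disappears, so the sweep cannot reclaim it. This extra bookkeeping is precisely what makes CMS more complex than plain mark and sweep.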

Related

How can I solve the problem of script blocking?

I want to give users the ability to customize the behavior of game objects, but I found that Unity is effectively a single-threaded program. If a user writes a script containing an endless loop in a game object, the main thread of Unity blocks, as if the game were frozen. How can I make each object's update function appear to execute on a separate thread?
[Figures: the de facto execution order vs. the logical execution sequence I want to implement]
You can implement threading, but the Unity API is NOT thread-safe, so anything you do outside of the main thread cannot use the Unity API. This means that you can do a calculation on another thread and return a result to the main thread, but you cannot manipulate GameObjects from that thread.
You do have other options, though. For tasks that can span several frames, you can use a coroutine, which lets a method wait without halting the main thread. But it sounds like your best option is the C# Job System: it essentially lets you use multithreading while managing the threads for you.
Example from the Unity Manual:
using Unity.Collections;
using Unity.Jobs;

public struct MyJob : IJob
{
    public float a;
    public float b;
    public NativeArray<float> result;

    public void Execute()
    {
        result[0] = a + b;
    }
}

// Create a native array of a single float to store the result.
// This example waits for the job to complete for illustration purposes.
NativeArray<float> result = new NativeArray<float>(1, Allocator.TempJob);

// Set up the job data
MyJob jobData = new MyJob();
jobData.a = 10;
jobData.b = 10;
jobData.result = result;

// Schedule the job
JobHandle handle = jobData.Schedule();

// Wait for the job to complete
handle.Complete();

float aPlusB = result[0];

// Free the memory allocated by the result array
result.Dispose();
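One usage note (not from the original answer): handle.Complete() blocks the calling thread until the job finishes, so in real code you would typically schedule a job early in the frame and call Complete() as late as possible, letting it run in parallel with other main-thread work.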

Concurrent array access with std::atomic

I know how mutex synchronization works, but I have trouble deciding how synchronization should be done in the following oversimplified case:
We have an array with 10 elements.
Thread 1 accesses the array read-only, e.g. something like this:
// const int *my_array;
int something = my_array[5];
Thread 2 does unrelated stuff, but from time to time it may decide to update all 10 elements at once, e.g. something like:
// const int *my_array;
const int *my_temp_array = load_new_data();
// suppose the pointed-to memory is valid.
// because these are pointers, the swap below is a single store
my_array = my_temp_array;
Both threads would need to use a primitive such as std::unique_lock.
But is there a way for this to be done with std::atomic?
Note:
As Igor mentioned:
if thread 1 loops over the array, and thread 2 flips it in the middle of the loop - is it OK to process half old elements and half new ones?
Who allocates memory for the array, and when and by whom should it be deallocated?
The example is oversimplified and I am interested only in the general thread synchronization; this is why the number of elements is fixed at 10. Also, let's suppose there is no memory allocation and that it is OK to process half old elements and half new ones.
If the old memory does not need to be freed, std::atomic<int*> works like this:
#include <atomic>

// global atomic pointer
std::atomic<const int*> myarray{nullptr};

// Thread 1
const int* current_array = myarray.load(); // take a snapshot of the pointer
if (current_array != nullptr)
{
    // do something with current_array[0..9]
    int something = current_array[5];
}

// Thread 2
int* tmp_array = new int[10];
// ... fill tmp_array with the new data ...
myarray.store(tmp_array); // publish the new array in one atomic step
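The question sets deallocation aside, but if the old array did eventually need to be freed, a bare std::atomic<int*> would not tell you when the last reader is done with it. One option, assuming C++20 is available (this goes beyond the original answer), is std::atomic over std::shared_ptr, which keeps each snapshot alive until its readers drop it:

#include <atomic>
#include <memory>
#include <vector>

std::atomic<std::shared_ptr<const std::vector<int>>> my_array{
    std::make_shared<const std::vector<int>>(10, 0)};

// Thread 1
void reader()
{
    auto snapshot = my_array.load(); // pins the current array
    int something = (*snapshot)[5];  // safe even if the writer swaps now
    (void)something;
}

// Thread 2
void writer(std::vector<int> fresh)
{
    my_array.store(std::make_shared<const std::vector<int>>(std::move(fresh)));
    // the old vector is destroyed once the last reader releases its snapshot
}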

Design pattern for asynchronous while loop

I have a function that boils down to:
while(doWork)
{
    config = generateConfigurationForTesting();
    result = executeWork(config);
    doWork = isDone(result);
}
How can I rewrite this for efficient asynchronous execution, assuming all functions are thread-safe and independent of previous iterations, and that the work will probably require more iterations than the maximum number of allowable threads?
The problem here is that we don't know how many iterations are required in advance, so we can't make a dispatch_group or use dispatch_apply.
This is my first attempt, but it looks a bit ugly to me because of the arbitrarily chosen values and the sleeping:
int thread_count = 0;
bool doWork = true;
int max_threads = 20; // arbitrarily chosen number
dispatch_queue_t queue =
    dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
while(doWork)
{
    if(thread_count < max_threads)
    {
        dispatch_async(queue, ^{
            Config myconfig = generateConfigurationForTesting();
            Result myresult = executeWork(myconfig);
            dispatch_async(queue, ^{ checkResult(myresult); });
        });
        thread_count++;
    }
    else
        usleep(100); // don't consume too much CPU
}

void checkResult(Result value)
{
    if(value == good) doWork = false;
    thread_count--;
}
Based on your description, it sounds like generateConfigurationForTesting is some kind of randomization technique or otherwise a generator that can produce a near-infinite number of configurations (hence your comment that you don't know ahead of time how many iterations you will need). With that assumption, you are basically stuck with the model you've created, since your executor needs to be limited by some reasonable bound and you don't want to over-generate, as that would just extend the length of the run after you have already found a value == good measurement.
I would suggest you consider using a serial queue (or OSAtomicIncrement*/OSAtomicDecrement*) to protect access to thread_count and doWork. As it stands, the thread_count increment and decrement happen on two different queues (the main queue for the main thread and the default global queue for the background task) and could therefore race with each other. This could lead to an undercount (which would cause more threads to be created than you expect) or an overcount (which would cause you to never complete your task).
Another option that would make this look a little nicer is to have checkResult add new elements to the queue when value != good. This way, you load up the initial elements of the queue using dispatch_apply(20, queue, ^{ ... }) and you don't need thread_count at all. The first 20 will be added by dispatch_apply (or however many dispatch_apply deems appropriate for your configuration), and then each time checkResult is called you either set doWork = false or add another operation to the queue.
dispatch_apply() works for this: just pass ncpu as the number of iterations (dispatch_apply never uses more than ncpu worker threads) and keep each instance of your worker block running for as long as there is more work to do (i.e. loop back to generateConfigurationForTesting() until doWork becomes false).
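For readers outside of GCD, the same pattern can be expressed as a minimal C++ sketch (under stated assumptions: Config, Result, and the three functions are hypothetical stand-ins for the asker's code, and a fixed pool of ncpu long-lived workers plays the role of the dispatch_apply blocks):

#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the asker's functions.
struct Config { int seed; };
struct Result { bool good; };

Config generateConfigurationForTesting()
{
    static std::atomic<int> next{0};
    return Config{next++};
}
Result executeWork(const Config& c) { return Result{c.seed == 1000}; }
bool isDone(const Result& r) { return r.good; }

std::atomic<bool> doWork{true};

// Each worker is one long-lived "block": it keeps generating and testing
// configurations until any worker finds a good result.
void worker()
{
    while (doWork.load(std::memory_order_relaxed))
    {
        Config config = generateConfigurationForTesting();
        Result result = executeWork(config);
        if (isDone(result))
            doWork.store(false, std::memory_order_relaxed);
    }
}

int main()
{
    unsigned ncpu = std::thread::hardware_concurrency();
    if (ncpu == 0) ncpu = 4; // hardware_concurrency may report 0
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < ncpu; ++i)
        pool.emplace_back(worker);
    for (auto& t : pool)
        t.join();
}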

Identifying memory leaks in C++

I've got the following bit of code, which I've narrowed down as the cause of a memory leak (that is, in Task Manager, the Private Working Set of memory increases with the same repeated input string). I understand the concepts of heap and stack memory, as well as the general rules for avoiding memory leaks, but something somewhere is still going wrong:
while(!quit){
    char* thebuffer = new char[210];
    //checked the function, it isn't creating the leak
    int size = FuncToObtainInputTextFromApp(thebuffer); //stored in thebuffer
    string bufferstring = thebuffer;
    int startlog = bufferstring.find("$");
    int endlog = bufferstring.find("&");
    string str_text = "";
    str_text = bufferstring.substr(startlog, endlog - startlog + 1);
    String^ str_text_m = gcnew String(str_text_m.c_str());
    //some work done
    delete str_text_m;
    delete [] thebuffer;
}
The only thing I can think of is that it might be the creation of 'string str_text', since it never goes out of scope because the while just re-loops? If so, how would I resolve that? Defining it outside the while loop wouldn't solve it, since it would also remain in scope then. Any help would be greatly appreciated.
You should use scope-bound resource management (also known as RAII); it's good practice in any case. Never allocate memory manually; keep it in an automatically allocated class that cleans up the resource for you in its destructor.
Your code might read:
while(!quit)
{
    // completely safe, no leaks possible
    std::vector<char> thebuffer(210);
    int size = FuncToObtainInputTextFromApp(&thebuffer[0]);
    // you never used size; this should be better
    string bufferstring(&thebuffer[0], size);
    // find does not return an int, but a size_t
    std::size_t startlog = bufferstring.find("$");
    std::size_t endlog = bufferstring.find("&");
    // why was this split across two lines?
    // there are also no checks to ensure the above find
    // calls worked; be careful
    string str_text = bufferstring.substr(startlog, endlog - startlog + 1);
    // why copy the string into a String? construct it directly:
    String^ str_text_m = gcnew String(str_text.c_str());
    // ...
    // don't really need to do that, I think;
    // it's garbage collected for a reason
    // delete str_text_m;
}
The point is, you won't get memory leaks if you ensure your resources free themselves. Maybe the garbage collector is causing your leak detector to misfire.
On a side note, your code does a lot of unnecessary copying; you might want to rethink how many times you copy the string around. (For example, find "$" and "&" while the text is still in the vector, and copy straight from there into str_text; no need for an intermediate copy.)
Are you using namespace std, so that str_text's type is std::string? Maybe you meant to write
String^ str_text_m = gcnew String(str_text.c_str());
(and not gcnew String(str_text_m.c_str()))?
Most importantly, allocating a String (or any object) with gcnew declares that you will not be delete'ing it explicitly; you leave that up to the garbage collector. I'm not sure what happens if you do delete it (technically it's not even a pointer, and it definitely does not reference anything on the CRT heap, where new/delete have power).
You can probably safely comment out str_text_m's deletion. You can expect gradual memory increases (as the gcnew allocations accumulate) and sudden decreases (when garbage collection kicks in) at some intervals.
Even better, you can probably reuse str_text_m, along the lines of:
String^ str_text_m = String::Empty;
while(!quit){
    ...
    str_text_m = gcnew String(str_text.c_str());
    ...
}
I know it's recommended to set the freed pointer to NULL after deleting it, just to prevent any invalid memory reference. It may help, it may not.
delete [] thebuffer;
thebuffer = NULL; // prevent use of a dangling pointer
There is a tool called DevPartner that can catch memory leaks at runtime. If you have the PDB for your application, it will give you the line numbers where the leaks were observed. It is best used for really big applications.

Looking for a lock-free RT-safe single-reader single-writer structure

I'm looking for a lock-free design conforming to these requirements:
a single writer writes into a structure and a single reader reads from this structure (the structure exists already and is safe for simultaneous read/write)
but at some point the structure needs to be changed by the writer, which then initialises, switches to, and writes into a new structure (of the same type but with new content)
and the next time the reader reads, it switches to this new structure (if the writer has switched to a new structure several times in between, the reader discards the intermediate structures, ignoring their data).
The structures must be reused, i.e. no heap allocation or freeing is allowed during write/read/switch operations, for real-time purposes.
I have currently implemented a ring buffer containing multiple instances of these structures, but this implementation suffers from the fact that when the writer has used all the structures in the ring buffer, there is no free structure left to switch to... while the rest of the ring buffer contains data that the reader does not have to read but that the writer cannot reuse either. As a consequence, the ring buffer does not fit this purpose.
Any idea (a name or a pseudo-implementation) for such a lock-free design? Thanks for considering this problem.
Here's one. The keys are that there are three buffers and that the reader reserves the buffer it is reading from. The writer writes to one of the other two buffers. The risk of collision is minimal. Plus, this expands: just make the member arrays one element longer than the number of readers plus the number of writers.
class RingBuffer
{
public:
    RingBuffer() : lastFullWrite(0)
    {
        // Initialize the elements of dataBeingRead to false
        for(unsigned int i = 0; i < DATA_COUNT; i++)
        {
            dataBeingRead[i] = false;
        }
    }

    Data read()
    {
        // You may want to check that write has been called at least once here
        // to prevent read from grabbing junk data. Else, initialize the
        // elements of dataArray to something valid.
        unsigned int indexToRead = lastFullWrite;
        Data dataCopy;
        dataBeingRead[indexToRead] = true;   // reserve this slot
        dataCopy = dataArray[indexToRead];
        dataBeingRead[indexToRead] = false;  // release it
        return dataCopy;
    }

    void write( const Data& dataArg )
    {
        unsigned int writeIndex(0);
        // Search for an unused piece of data.
        // It's O(n), but plenty fast enough for small arrays.
        while( writeIndex < DATA_COUNT && dataBeingRead[writeIndex] == true )
        {
            writeIndex++;
        }
        dataArray[writeIndex] = dataArg;
        lastFullWrite = writeIndex;   // publish the freshly written slot
    }

private:
    static const unsigned int DATA_COUNT = 3; // one reader + one writer + 1
    unsigned int lastFullWrite;               // index of the last complete write
    Data dataArray[DATA_COUNT];
    bool dataBeingRead[DATA_COUNT];
};
Note: the way it's written here, reading your data takes two copies. If you pass the data out of the read function through a reference argument, you can cut that down to one copy.
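For example, an overload along these lines could be added to the RingBuffer above (a sketch of the single-copy suggestion, not part of the original answer):

void read( Data& out )
{
    unsigned int indexToRead = lastFullWrite;
    dataBeingRead[indexToRead] = true;   // reserve the slot
    out = dataArray[indexToRead];        // one copy, straight into the caller's object
    dataBeingRead[indexToRead] = false;  // release it
}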
You're on the right track.
Lock-free communication of fixed-size messages between threads/processes/processors
Fixed-size ring buffers can be used for lock-free communication between threads, processes, or processors if there is one producer and one consumer. Some checks to perform:
the head variable is written only by the producer (as an atomic action after writing)
the tail variable is written only by the consumer (as an atomic action after reading)
Pitfall: introducing a size variable or a buffer full/empty flag; these are typically written by both producer and consumer and hence will give you an issue.
I generally use ring buffers for this purpose. The most important lesson I've learned is that a ring buffer of n slots can never contain more than n - 1 elements; this way the head and tail variables are each written only by the producer and the consumer, respectively (a minimal sketch follows below).
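As an illustration of those rules, here is a minimal C++11 sketch of such a single-producer/single-consumer ring buffer (a generic example, not the answerer's own code): head_ is written only by the producer, tail_ only by the consumer, and one slot always stays empty to distinguish full from empty.

#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class SpscRing
{
public:
    bool push(const T& item)  // called by the producer only
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // full: one slot always stays empty
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);  // publish after writing
        return true;
    }

    bool pop(T& item)  // called by the consumer only
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;  // empty
        item = buffer_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free after reading
        return true;
    }

private:
    T buffer_[N];
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};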
Extension for large/variable-size blocks
To use buffers in a real-time environment, you can either use memory pools (often available in optimized form in real-time operating systems) or decouple allocation from usage. The latter fits the question, I believe.
If you need to exchange large blocks, I suggest using a pool of buffer blocks and communicating pointers to buffers through queues: a third queue carries pointers to free buffers. This way allocation can be done in the application (background), while the real-time portion has access to a variable amount of memory.
Application (background):
while (blockQueue.Full != true)
{
    buf = allocate block of memory from heap or buffer pool
    msg = { ...., buf };
    blockQueue.Put(msg)
}

Producer:
msg = blockQueue.Get()
// fill the buffer that msg.buf points to
pQueue.Put(msg)

Consumer:
if (pQueue.Empty == false)
{
    msg = pQueue.Get()
    // use the info in msg, via the buf pointer
    // optionally indicate that buf is no longer used
}
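Putting the two ideas together, a hedged C++ sketch (reusing the SpscRing from the sketch above; Block and the queue sizes are illustrative): one ring carries free-buffer pointers back to the producer, the other carries filled buffers to the consumer.

struct Block { float samples[256]; };  // illustrative payload

SpscRing<Block*, 8> freeQueue;  // consumer -> producer (recycled buffers)
SpscRing<Block*, 8> dataQueue;  // producer -> consumer (filled buffers)

// Application (background): pre-load the free queue once,
// before the real-time threads start.
void setup(Block* pool, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        freeQueue.push(&pool[i]);
}

// Producer (real-time): take a free buffer, fill it, pass it on.
void produce()
{
    Block* buf = nullptr;
    if (freeQueue.pop(buf))
    {
        // ... fill *buf ...
        dataQueue.push(buf);
    }
}

// Consumer (real-time): use the buffer, then recycle it.
void consume()
{
    Block* buf = nullptr;
    if (dataQueue.pop(buf))
    {
        // ... use *buf ...
        freeQueue.push(buf);
    }
}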
