Vala: Invalid read when a joined thread gets unreferenced - multithreading

When I compile the code below and run it under Valgrind, it looks like the thread gets freed when I join it, and then later, when it gets unreferenced, memory that has already been freed is read.
Is this a false positive from Valgrind? If not, is it generally safe to ignore in larger parallel programs? How do I get around it?
int main (string[] args) {
    Thread<int> thread = new Thread<int>.try ("ThreadName", () => {
        stdout.printf ("Hello World");
        return 0;
    });
    thread.join ();
    return 0;
}
==2697== Invalid read of size 4
==2697== at 0x50F2350: g_thread_unref (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
==2697== by 0x400A65: _vala_main (in /home/lockner/test)
==2697== by 0x400A9C: main (in /home/lockner/test)
==2697== Address 0x5dc17e8 is 24 bytes inside a block of size 72 free'd
==2697== at 0x4C2B60C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2697== by 0x50F2547: g_thread_join (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
==2697== by 0x400A4B: _vala_main (in /home/lockner/test)
==2697== by 0x400A9C: main (in /home/lockner/test)
When I manually add "thread = NULL;" between the join call and the _g_thread_unref0 macro in the generated C code, the invalid read is gone in the valgrind output.
g_thread_join (thread);
result = 0;
thread = NULL;
_g_thread_unref0 (thread);
return result;

It turns out it was a missing annotation in glib-2.0.vapi.
Adding [DestroysInstance] above join() solves the problem.

The issue is that g_thread_join() already drops one reference, so the generated code effectively frees the thread twice.
If you needed to add [DestroysInstance], this is clearly a bug in valac / the GThread binding.
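For reference, this is roughly how the ownership rule looks in hand-written C with GLib (a sketch of mine, not valac's actual output): g_thread_try_new() returns a reference and g_thread_join() consumes it, so no g_thread_unref() may follow.
#include <glib.h>
#include <stdio.h>

static gpointer thread_func (gpointer data) {
    printf ("Hello World");
    return NULL;
}

int main (void) {
    GError *error = NULL;
    GThread *thread = g_thread_try_new ("ThreadName", thread_func, NULL, &error);
    if (thread == NULL) {
        fprintf (stderr, "Thread creation failed: %s\n", error->message);
        g_error_free (error);
        return 1;
    }
    g_thread_join (thread); /* waits for the thread AND consumes the reference */
    thread = NULL;          /* the pointer is now dangling, so forget it */
    /* no g_thread_unref() here: that would be exactly the double unref Valgrind reports */
    return 0;
}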

Related

Memory not freed on Mac when vector push_back string

Code is below. I found that when a vector push_back's a string in a Mac demo app, the memory is not freed. I thought the stack variable would be freed when it goes out of function scope; am I wrong? Thanks for any tips.
in model.h:
#pragma once
#include <cstdint>  // for uint8_t
namespace NS {
const uint8_t kModel[8779041] = {4,0,188,250,....};
}
in ViewController.mm:
- (void)start {
std::vector<std::string> params = {};
std::string strModel(reinterpret_cast<const char *>(NS::kModel), sizeof(NS::kModel));
params.push_back(strModel);
}
The answer to your question depends on what you understand "free" memory to mean. The behaviour you are observing can be reproduced with just a couple of lines of code:
void myFunc() {
const auto *ptr = new uint8_t[8779041]{};
delete[] ptr;
}
Let's run this function and see how the memory consumption graph changes:
int main() {
myFunc(); // 1 MB
std::cout << "Check point" << std::endl; // 9.4 MB
return 0;
}
If you put one breakpoint right at the line with the myFunc() invocation and another at the line with the "Check point" console output, you will see the process's memory consumption jump by about 8 MB (on my system and machine configuration, Xcode shows a sudden jump from 1 MB to 9.4 MB). But wait, isn't it supposed to be 1 MB again after the function, since the allocated memory is freed at the end of it? Well, not exactly. The system doesn't reclaim this memory right away, because that is not a cheap operation to begin with, and if your process requests the same amount of memory one CPU cycle later, reclaiming it would have been redundant work. Thus, the system usually doesn't bother shrinking the memory dedicated to a process until it's needed for another process or the system runs out of available resources (it can also be some kind of fixed timer, but overall I would say this is implementation-defined).
Another common reason the memory is not freed is that you are often observing it in a debug build, where memory can remain dedicated to the process to track some tricky scenarios (like NSZombie objects, whose addresses need to remain accessible to the process in order to report use-after-free occasions).
The most important point here is that, internally, the process can differentiate between "freed" and "occupied" memory pages, so it can re-occupy memory that has already been freed. As a result, no matter how many times you call the same function, the memory dedicated to the process stays the same:
int main() {
myFunc(); // 1 MB
std::cout << "Check point" << std::endl; // 9.4 MB
for (int i = 0; i < 10000; ++i) {
myFunc();
}
std::cout << "Another point" << std::endl; // 9.4 MB
return 0;
}
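You can also observe this re-use at the allocator level rather than in the process statistics. The following C sketch is purely illustrative (nothing guarantees an allocator hands back the same block, and the 8 MB size and loop count are arbitrary choices of mine), but on most systems the freed block is immediately recycled for the next request of the same size:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *first = malloc(8u << 20);          /* ~8 MB, roughly the size of the model blob */
    uintptr_t first_addr = (uintptr_t)first; /* remember the address as a plain integer */
    free(first);

    int reused = 0;
    for (int i = 0; i < 1000; i++) {
        void *p = malloc(8u << 20);
        if ((uintptr_t)p == first_addr)      /* often true, never guaranteed */
            reused++;
        free(p);
    }
    printf("allocator handed back the same block %d / 1000 times\n", reused);
    return 0;
}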

Local memory for each CUDA thread

I have a simple program below. My question is: where is temp actually stored? Is it in global or local memory? I need an array temp for each idx so that every thread has its own individual array temp. In this case, it works properly. But in my actual program, when I tried to fill temp[0] from test2, the program stopped. Suppose we have 1024 threads; then it only runs the kernel for around 200 threads. So I am wondering whether temp is shared or not. If it is, maybe there is a collision there. I also did not get any error message. Please, someone, explain this to me.
#include <cstdio>
__device__ void test2(int temp[], int idx) {
    temp[0] = idx;
    printf("%d ", temp[0]);
}
__global__ void test() {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int *temp = (int *) malloc(100 * sizeof(int)); // in-kernel malloc: allocated from the device heap, never freed here
    test2(temp, idx);
}
int main() {
    test<<<1, 1024>>>();
    cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
    return 0;
}
My question is: where is temp actually stored?
The allocation that temp points to is stored in a place called the device heap. It is a form of global memory. However, the temp variable itself (i.e. the pointer value) is in local memory - not shared with or visible to other threads.
I need an array temp for each idx so that every thread has its own individual array temp.
You will get that, subject to the caveats below. Each thread will have its own individual array, referenced by its local variable temp, and each thread will have a separate allocation for storage on the device heap.
People commonly run into problems with in-kernel new or malloc. One of the main reasons is that the device heap is initially limited to 8 MB, across all of your device-heap allocations. So if enough threads make enough (or large enough) allocation requests, you will run out of space.
When you run out of space, the API signals this by returning a NULL pointer for the allocation. If you then attempt to use that NULL pointer, you will have trouble.
For debugging purposes (i.e. to prove this is happening), test the pointer for NULL (i.e. == 0) before using it. If it is NULL, don't use it (perhaps print an error message instead).
You can read more about this in the documentation or in many questions here under the SO cuda tag. If you read any of those sources, you will discover that you can increase the size of the device heap.
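If you do raise the device heap limit, it has to happen on the host before the first kernel that allocates from the heap is launched. Here is a hedged host-side C sketch (the 128 MB figure is an arbitrary example of mine, not a recommendation):
#include <cuda_runtime_api.h>
#include <stdio.h>

int main(void) {
    /* Grow the heap used by in-kernel malloc()/new from its default (8 MB)
       to 128 MB. This must be done before the first heap-allocating kernel runs. */
    size_t requested = (size_t)128 * 1024 * 1024;
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, requested);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize);
    printf("device malloc heap is now %zu bytes\n", actual);
    /* ... launch kernels that call malloc()/new here ... */
    return 0;
}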

Threads issue in C using the pthread library

I declare a global variable and initialize it to 0.
In the main() function I create two threads. The first thread function increments the global variable as many times as its received argument (a function parameter) specifies, using a for loop, while the second function decrements the global variable the same number of times, also using a for loop.
When I pass 1000 as the argument the program works fine, but when I pass 100000 the global variable's value should be zero at the end, yet I find that it is not zero.
I also call the join function for both threads, but that doesn't help.
#include "stdio.h"
#include "stdlib.h"
#include "pthread.h"
int globVar =0;
void *incFunct(void* val){
for (int i=0; i<val; i++)
globVar++;
pthread_exit(NULL);
}
void *decFunct(void* val){
for (int i=0; i<val; i++)
globVar--;
pthread_exit(NULL);
}
int main()
{
pthread_t tid[2];
int val = 1000000;
printf("Initial value of Global variable : %d \n", globVar);
pthread_create(&tid[0], NULL, &incFunct, (void*)val);
pthread_create(&tid[1], NULL, &decFunct, (void*)val);
pthread_join(tid[0], NULL);
pthread_join(tid[1], NULL);
printf("Final Value of Global Var : %d \n", globVar);
return 0;
}
Yeah, you can't do that. Reasonably, you could end up with globVar having any value between -1000000 and +1000000; unreasonably, you might have invited the compiler to burn down your home (ask Google about undefined behaviour).
You need to synchronize the operations of the two threads. One such synchronization is with a pthread_mutex_t: you would acquire the lock (pthread_mutex_lock()) before operating on globVar, and release the lock (pthread_mutex_unlock()) after updating globVar.
For this particularly silly case, atomics might be more appropriate if your compiler happens to support them (/usr/include/stdatomic.h).
One thing that might happen is that the inc thread and the dec thread don't see consistent values for globVar. If you increment a variable you think has a value of 592, and, at the same time, I decrement what I think is the same variable but with a value of 311 — who wins? What happens when it's all over?
Without memory synchronization, you can't predict what will happen when multiple threads update the same memory location. You might have problems with cache coherency, variable tearing, and even reordered operations. Mutexes or C11 atomic variables are two ways to avoid these problems.
(As an aside, I suspect you don't see this problem with one thousand iterations because the first thread finishes well before the second even looks at globVar, and your implementation happens to update memory for that latter thread's consistency.)
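To make the mutex suggestion concrete, here is a minimal sketch of a fixed version of your program. The lock name and the intptr_t cast are my own choices, not part of your original code; a C11-atomics variant would instead declare globVar as atomic_int and use atomic_fetch_add()/atomic_fetch_sub():
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

int globVar = 0;
pthread_mutex_t globVar_lock = PTHREAD_MUTEX_INITIALIZER;

void *incFunct(void *val) {
    for (intptr_t i = 0; i < (intptr_t)val; i++) {
        pthread_mutex_lock(&globVar_lock);   /* nobody else may touch globVar now */
        globVar++;
        pthread_mutex_unlock(&globVar_lock); /* publish the update */
    }
    return NULL;
}

void *decFunct(void *val) {
    for (intptr_t i = 0; i < (intptr_t)val; i++) {
        pthread_mutex_lock(&globVar_lock);
        globVar--;
        pthread_mutex_unlock(&globVar_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[2];
    intptr_t val = 1000000;
    pthread_create(&tid[0], NULL, incFunct, (void *)val);
    pthread_create(&tid[1], NULL, decFunct, (void *)val);
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    printf("Final value of globVar: %d\n", globVar); /* now always 0 */
    return 0;
}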

pthread_join(thread_id, &res), if &res is not NULL - is free(res) needed?

I've stumbled across a code example here. The lines that caught my attention (all other lines skipped):
{
...
void *res;
...
s = pthread_join(tinfo[tnum].thread_id, &res);
...
free(res); /* Free memory allocated by thread */
}
Can anyone deeper into pthreads than myself comment on the free(res), please? I have to say I have never seen this before, and that googling for 1-1.5 hours didn't give me any other similar examples.
In pthread_join(thread_id, &res), if &res is not NULL - is free(res) needed?
It depends on whether the thread's return value was dynamically allocated (with malloc() & co.).
If you look at the function thread_start() on same page, you'll see that it has a return statement:
return uargv;
and uargv was allocated with:
uargv = strdup(tinfo->argv_string);
Hence, the free() call is used in main() after the pthread_join() call.
Because res is filled with uargv (the value returned by the thread). You can conceptually assume there is code like this inside the pthread_join() function:
if (res)
*res = uargv;
Here it's allocated using strdup() (which internally allocates memory), so you free() it. If the thread simply does return NULL; (and free()s uargv itself), then you don't need to free() anything.
The general answer: if you allocate something with a malloc()-family function, then you need to free() it.
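Here is a minimal self-contained sketch of that pattern (the names are mine, not from the linked example): the thread returns memory it obtained from strdup(), so the joining thread owns it and must free() it.
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void *thread_start(void *arg) {
    char *copy = strdup((const char *)arg); /* allocates on the heap */
    return copy;                            /* ownership passes to whoever joins us */
}

int main(void) {
    pthread_t tid;
    void *res = NULL;

    pthread_create(&tid, NULL, thread_start, "hello");
    pthread_join(tid, &res);      /* res now holds the pointer the thread returned */

    printf("thread returned: %s\n", (char *)res);
    free(res);                    /* needed only because the thread allocated it */
    return 0;
}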

Segmentation Fault With Multiple Threads

I get a segmentation fault because of the free() at the end of this function...
Don't I have to free the temporary variable *stck? Or, since it's a local pointer that was never assigned memory via malloc, does the compiler clean it up for me?
void * push(void * _stck)
{
stack * stck = (stack*)_stck;//temp stack
int task_per_thread = 0; //number of push per thread
pthread_mutex_lock(stck->mutex);
while(stck->head == MAX_STACK -1 )
{
pthread_cond_wait(stck->has_space,stck->mutex);
}
while(task_per_thread <= (MAX_STACK/MAX_THREADS)&&
(stck->head < MAX_STACK) &&
(stck->item < MAX_STACK)//this is the amount of pushes
//we want to execute
)
{ //store actual value into stack
stck->list[stck->head]=stck->item+1;
stck->head = stck->head + 1;
stck->item = stck->item + 1;
task_per_thread = task_per_thread+1;
}
pthread_mutex_unlock(stck->mutex);
pthread_cond_signal(stck->has_element);
free(stck);
return NULL;
}
Edit: You totally changed the question so my old answer doesn't really make sense anymore. I'll try to answer the new one (old answer still below) but for reference, next time please just ask a new question instead of changing an old one.
stck is a pointer that you set to point to the same memory as _stck points to. A pointer does not imply allocating memory, it just points to memory that is already (hopefully) allocated. When you do for example
char* a = malloc(10); // Allocate memory and save the pointer in a.
char* b = a; // Just make b point to the same memory block too.
free(a); // Free the malloc'd memory block.
free(b); // Free the same memory block again.
you free the same memory twice.
-- old answer
In push, you're setting stck to point to the same memory block as _stck, and at the end of the call you free stck (thereby calling free() on your common stack once from each thread).
Remove the free() call and, at least for me, it does not crash anymore. Deallocating the stack should probably be done in main() after joining all the threads.
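Here is a hedged sketch of that ownership pattern, with your stack reduced to a simplified stand-in struct: main() allocates the shared stack once, the workers only borrow it, and it is freed exactly once after every thread has been joined.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 4

typedef struct {          /* simplified stand-in for the real stack type */
    pthread_mutex_t mutex;
    int head;
} stack;

static void *push(void *_stck) {
    stack *stck = (stack *)_stck; /* borrow the shared stack, never free it here */
    pthread_mutex_lock(&stck->mutex);
    stck->head++;
    pthread_mutex_unlock(&stck->mutex);
    return NULL;
}

int main(void) {
    pthread_t tids[MAX_THREADS];
    stack *stck = malloc(sizeof *stck);  /* main() owns the allocation */
    pthread_mutex_init(&stck->mutex, NULL);
    stck->head = 0;

    for (int i = 0; i < MAX_THREADS; i++)
        pthread_create(&tids[i], NULL, push, stck);
    for (int i = 0; i < MAX_THREADS; i++)
        pthread_join(tids[i], NULL);     /* wait for every borrower to finish */

    printf("head = %d\n", stck->head);
    pthread_mutex_destroy(&stck->mutex);
    free(stck);                          /* freed exactly once, after all joins */
    return 0;
}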
