How does a garbage collector know that a raw pointer and its referenced memory are no longer used? - garbage-collection

I'm new to garbage collection. I have read this page:
A garbage collector for C and C++. It gives a simple example on that page, under "Using the Garbage Collector: A simple example":
#include "gc.h"
#include <assert.h>
#include <stdio.h>
int main()
{
int i;
GC_INIT(); /* Optional on Linux/X86; see below. */
for (i = 0; i < 10000000; ++i)
{
int **p = (int **) GC_MALLOC(sizeof(int *));
int *q = (int *) GC_MALLOC_ATOMIC(sizeof(int));
assert(*p == 0);
*p = (int *) GC_REALLOC(q, 2 * sizeof(int));
if (i % 100000 == 0)
printf("Heap size = %d\n", GC_get_heap_size());
}
return 0;
}
Here, *p is a pointer stored in garbage-collector-managed memory, and it points to memory that is also inside the managed heap.
I'm curious how the garbage collector knows that the two blocks allocated in a previous loop iteration are leaked and should be reclaimed in a later iteration.
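For background: the Boehm collector is a conservative mark-sweep collector. At collection time it scans the roots (the stack, registers, and static data) for any word whose value falls inside its heap, marks the blocks those words point into, traces further pointers found inside marked non-atomic blocks, and reclaims everything left unmarked. In the loop above, p and q are overwritten on every iteration, so no root points at the previous iteration's blocks any more; GC_MALLOC_ATOMIC additionally promises that q's block contains no pointers worth scanning. A toy sketch of the marking idea, deliberately simplified and not bdwgc's actual code:

#include <stdint.h>
#include <stdio.h>

#define NBLOCKS    4
#define BLOCKWORDS 8

static uintptr_t heap[NBLOCKS * BLOCKWORDS];  /* mock GC heap: 4 blocks */
static int marked[NBLOCKS];

/* Conservative marking: any root word whose value lands inside the heap
 * is treated as a pointer and marks the block it falls in. A real
 * collector would then trace pointers stored inside marked blocks too. */
static void mark_from_roots(const uintptr_t *roots, int nroots)
{
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[NBLOCKS * BLOCKWORDS];
    for (int i = 0; i < nroots; i++)
        if (roots[i] >= lo && roots[i] < hi)
            marked[(roots[i] - lo) / (BLOCKWORDS * sizeof(uintptr_t))] = 1;
}

int main(void)
{
    /* Pretend the stack holds one live pointer (to block 0) and one stale
     * word; blocks 1-3 have no referencing root, so they are "garbage". */
    uintptr_t fake_stack[2] = { (uintptr_t)&heap[0], 0 };
    mark_from_roots(fake_stack, 2);
    for (int b = 0; b < NBLOCKS; b++)
        printf("block %d: %s\n", b, marked[b] ? "reachable" : "garbage");
    return 0;
}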

Related

pthreads code not scaling up

I wrote the following very simple pthread code to test how it scales up. I am running the code on a machine with 8 logical processors and at no time do I create more than 8 threads (to avoid context switching).
With an increasing number of threads, each thread has to do less work. Also, it is evident from the code that there are no shared data structures between the threads that might be a bottleneck. But still, my performance degrades as I increase the number of threads.
Can somebody tell me what I am doing wrong here?
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int NUM_THREADS = 3;
unsigned long int COUNTER = 10000000000000;
unsigned long int LOOP_INDEX;

void* addNum(void *data)
{
    unsigned long int sum = 0;
    for (unsigned long int i = 0; i < LOOP_INDEX; i++) {
        sum += 100;
    }
    return NULL;
}

int main(int argc, char** argv)
{
    NUM_THREADS = atoi(argv[1]);
    pthread_t *threads = (pthread_t*) malloc(sizeof(pthread_t) * NUM_THREADS);
    int rc;
    clock_t start, diff;
    LOOP_INDEX = COUNTER / NUM_THREADS;
    start = clock();
    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_create(threads + t, NULL, addNum, NULL);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d", rc);
            exit(-1);
        }
    }
    void *status;
    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(threads[t], &status);
    }
    diff = clock() - start;
    int sec = diff / CLOCKS_PER_SEC;
    printf("%d", sec);
}
Note: all the answers I found online said that the overhead of creating the threads is more than the work they are doing. To test this, I commented out everything in the addNum() function. But after doing that, no matter how many threads I create, the time taken by the code is 0 seconds. So there is no such overhead, I think.
clock() counts CPU time used across all threads. So all that's telling you is that you're using a little bit more total CPU time, which is exactly what you would expect.
It's the total wall clock elapsed time which should be going down if your parallelisation is effective. Measure that with clock_gettime() specifying the CLOCK_MONOTONIC clock instead of clock().
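A minimal sketch of that measurement; the thread creation and joining from the question would go between the two clock_gettime() calls (on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);

    /* ... create and join the worker threads here ... */

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("wall time: %.3f s\n", elapsed);
    return 0;
}

Measured this way, the elapsed time should drop as threads are added (up to the number of cores), while clock() keeps reporting roughly the same total CPU time.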

Pass multiple args to thread using struct (pthread)

I'm learning to program with pthreads by writing an adder program. After consulting several examples I still don't get how to pass multiple arguments into a thread using a struct. Here is my buggy program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>

typedef struct s_addition {
    int num1;
    int num2;
    int sum;
} addition;

void *thread_add_function(void *ad)
{
    printf("ad.num1:%d, ad.num2:%d\n", ad.num1, ad.num2);
    ad.sum = ad.num1 + ad.num2;
    pthread_exit(0);
}

int main()
{
    int N = 5;
    int a[N], b[N], c[N];
    srand(time(NULL));
    // fill them with random numbers
    for (int j = 0; j < N; j++) {
        a[j] = rand() % 392;
        b[j] = rand() % 321;
    }
    addition ad1;
    pthread_t thread[N];
    for (int i = 0; i < N; i++) {
        ad1.num1 = a[i];
        ad1.num2 = b[i];
        printf("ad1.num1:%d, ad1.num2:%d\n", ad1.num1, ad1.num2);
        pthread_create(&thread[i], NULL, thread_add_function, &ad1);
        pthread_join(thread[i], NULL);
        c[i] = ad.sum;
    }
    printf("This is the result of using pthread.\n");
    for (int i = 0; i < N; i++) {
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }
}
But when compiling I got the following error:
vecadd_parallel.c:15:39: error: member reference base type 'void *' is not a structure or union
printf ("ad.num1:%d, ad.num2:%d\n",ad.num1, ad.num2);
I tried but still cannot get a clue. What am I doing wrong?
It seems you have a problem with trying to access the members of a void * parameter.
You will need to cast the parameter of thread_add_function to the correct type, with something like addition *add = (addition *)ad;, and then use this variable in your function (note that you also have to change your .'s to ->, because it is a pointer).
You also should only pass data to threads that was malloc()'d, as stack-allocated data may not be permanent. It should be fine for the current implementation, but changes later could easily give strange, unpredictable behaviour.
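Putting that together, a sketch of the corrected pattern might look like this. Each thread gets its own struct, so a later iteration cannot overwrite arguments another thread is still reading; the stack-allocated array is safe in this particular sketch only because main() joins every thread before the array goes out of scope:

#include <pthread.h>
#include <stdio.h>

typedef struct s_addition {
    int num1;
    int num2;
    int sum;
} addition;

void *thread_add_function(void *arg)
{
    addition *ad = (addition *)arg;   /* cast the void * back to the struct type */
    ad->sum = ad->num1 + ad->num2;    /* -> instead of . because ad is a pointer */
    return NULL;
}

int main(void)
{
    enum { N = 5 };
    addition ad[N];                   /* one argument struct per thread */
    pthread_t thread[N];

    for (int i = 0; i < N; i++) {
        ad[i].num1 = i;
        ad[i].num2 = 2 * i;
        pthread_create(&thread[i], NULL, thread_add_function, &ad[i]);
    }
    for (int i = 0; i < N; i++) {
        pthread_join(thread[i], NULL);
        printf("%d + %d = %d\n", ad[i].num1, ad[i].num2, ad[i].sum);
    }
    return 0;
}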

C++ 11 std::thread strange behavior

I am experimenting a bit with std::thread and C++11, and I am encountering strange behaviour.
Please have a look at the following code:
#include <cstdlib>
#include <thread>
#include <vector>
#include <iostream>

void thread_sum_up(const size_t n, size_t& count) {
    size_t i;
    for (i = 0; i < n; ++i);
    count = i;
}

class A {
public:
    A(const size_t x) : x_(x) {}

    size_t sum_up(const size_t num_threads) const {
        size_t i;
        std::vector<std::thread> threads;
        std::vector<size_t> data_vector;
        for (i = 0; i < num_threads; ++i) {
            data_vector.push_back(0);
            threads.push_back(std::thread(thread_sum_up, x_, std::ref(data_vector[i])));
        }
        std::cout << "Threads started ...\n";
        for (i = 0; i < num_threads; ++i)
            threads[i].join();
        size_t sum = 0;
        for (i = 0; i < num_threads; ++i)
            sum += data_vector[i];
        return sum;
    }

private:
    const size_t x_;
};

int main(int argc, char* argv[]) {
    const size_t x = atoi(argv[1]);
    const size_t num_threads = atoi(argv[2]);
    A a(x);
    std::cout << a.sum_up(num_threads) << std::endl;
    return 0;
}
The main idea here is that I want to specify a number of threads which do independent computations (in this case, simple increments).
After all threads are finished, the results should be merged in order to obtain an overall result.
Just to clarify: this is only for testing purposes, in order to help me understand how C++11 threads work.
However, when compiling this code using the command
g++ -o threads threads.cpp -pthread -O0 -std=c++0x
on an Ubuntu box, I get very strange behaviour when I execute the resulting binary.
For example:
$ ./threads 1000 4
Threads started ...
Segmentation fault (core dumped)
(should yield the output: 4000)
$ ./threads 100000 4
Threads started ...
200000
(should yield the output: 400000)
Does anybody have an idea what is going on here?
Thank you in advance!
Your code has many problems (see even thread_sum_up for about 2-3 bugs), but the main bug I found by glancing at your code is here:
data_vector.push_back(0);
threads.push_back(std::thread(thread_sum_up, x_, std::ref(data_vector[i])));
See, when you push_back into a vector (I'm talking about data_vector), it can move all the previous data around in memory. But then you take a reference to a cell for your thread, and then push_back again (making the previous reference invalid).
This will cause you to crash.
For an easy fix, add data_vector.reserve(num_threads); just after creating it.
Edit at your request - some bugs in thread_sum_up
void thread_sum_up(const size_t n, size_t& count) {
    size_t i;
    for (i = 0; i < n; ++i); // see that last ';' there? It means this loop body is empty. It shouldn't be there.
    count = i; // You're just setting count to i. Why do that in a loop? Did you mean +=?
}
The cause of your crash might be std::ref(data_vector[i]) being invalidated by the next push_back on data_vector. Since you know the number of threads, call data_vector.reserve(num_threads) before you start spawning them to keep the references from being invalidated.
As you resize the vector with the calls to push_back, it is likely to have to reallocate the storage space, causing the references to the contained values to be invalidated. This causes the thread to write to non-allocated memory, which is undefined behavior.
Your options are to pre-allocate the size you need (vector::reserve is one option), or choose a different container.
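A minimal self-contained sketch of that fix, with hard-coded arguments standing in for the question's argv handling:

#include <cstddef>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

void thread_sum_up(const size_t n, size_t& count) {
    size_t i;
    for (i = 0; i < n; ++i) {}             // busy loop, as in the question
    count = i;
}

int main() {
    const size_t x = 1000;
    const size_t num_threads = 4;
    std::vector<std::thread> threads;
    std::vector<size_t> data_vector;
    data_vector.reserve(num_threads);      // the fix: no reallocation below,
                                           // so the references stay valid
    for (size_t i = 0; i < num_threads; ++i) {
        data_vector.push_back(0);
        threads.emplace_back(thread_sum_up, x, std::ref(data_vector[i]));
    }
    for (auto& t : threads)
        t.join();
    size_t sum = 0;
    for (size_t v : data_vector)
        sum += v;
    std::cout << sum << std::endl;         // prints 4000
    return 0;
}

An alternative that sidesteps the invalidation entirely is a container whose elements never move on push_back, such as std::deque<size_t>, or constructing the vector at its final size up front and assigning by index.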

malloc large memory never returns NULL

When I run this, it seems to have no problem allocating memory, with cnt going over thousands. I don't understand why; am I not supposed to get NULL at some point? Thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

int main(void)
{
    long C = pow(10, 9);
    int cnt = 0;
    int conversion = 8 * 1024 * 1024;
    int *p;
    while (1)
    {
        p = (int *) malloc(C * sizeof(int));
        if (p != NULL)
            cnt++;
        else
            break;
        if (cnt % 10 == 0)
            printf("number of successful malloc is %d with %ld Mb\n", cnt, cnt * C / conversion);
    }
    return 0;
}
Are you running this on Linux? Linux has a highly surprising feature known as overcommit. It doesn't actually allocate memory when you call malloc(), but rather when you actually use that memory. malloc() will happily let you allocate as much memory as your heart desires, never returning a NULL pointer.
It's only when you actually access the memory that Linux takes you seriously and goes out searching for free memory to give you. Of course, there may not actually be enough memory to meet the promise it gave your program. You say, "Give me 8GB," and malloc() says, "Sure." Then you try to write to your pointer and Linux says, "Oops! I lied. How about I just kill off processes (probably yours) until I free up enough memory?"
You're allocating virtual memory. On a 64-bit OS, virtual memory is available in almost unlimited supply.
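A small sketch that makes the difference visible (Linux-specific, 64-bit assumed, and note that actually running it may invite the OOM killer): the allocation succeeds immediately, but physical memory is only committed once the pages are written:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t bytes = (size_t)1 << 33;              /* 8 GiB of virtual memory */
    char *p = (char *)malloc(bytes);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    puts("malloc succeeded; almost nothing is committed yet");
    for (size_t i = 0; i < bytes; i += 4096)     /* touch one byte per page */
        p[i] = 1;                                /* now pages really get committed */
    puts("all pages touched");
    free(p);
    return 0;
}

You can also watch the kernel's bookkeeping directly: VmSize in /proc/self/status jumps at the malloc, while VmRSS only grows during the touching loop.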

Why is physical memory in Linux allocated incrementally rather than all at once?

I wrote the program below, which allocates about 1.2 GB of memory at once, and tested it on Linux. I found that:
If I define the macro WRITE_MEM, the physical memory usage (inspected with the top command) increases linearly.
If I don't define the macro, the physical memory usage stays very small (a few hundred kilobytes) and hardly changes.
I don't understand this phenomenon.
#include <iostream>
#include <cmath>
#include <cstdlib>
using namespace std;

float sum = 0.;

int main(int argc, char** argv)
{
    float* pf = (float*) malloc(1024*1024*300*4);
    float* p = pf;
    for (int i = 0; i < 300; i++) {
        cout << i << "..." << endl;
        float* qf = (float*) malloc(1024*1024*4);
        float* q = qf;
        for (int j = 0; j < 1024*1024; j++) {
            *q++ = sin(j*j*j*j);
        }
        q = qf;
        for (int j = 0; j < 1024*1024; j++) {
#ifdef WRITE_MEM // the physical memory usage will increase linearly
            *p++ = *q++;
            sum += *q;
#else // the physical memory usage stays small and does not change
            p++;
            // or
            // sum += *p++;
#endif
        }
        free(qf);
    }
    free(pf);
    return 0;
}
Linux allocates virtual memory immediately, but doesn't back it with physical memory until the pages are actually used. This causes processes to only use the physical memory they actually require, leaving the unused memory available for the rest of the system.
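A minimal sketch demonstrating this with getrusage(), which on Linux reports the peak resident set size in kilobytes: the RSS barely moves after the malloc and only grows once every page has actually been written:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

static long rss_kb(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;                           /* peak RSS in kilobytes */
}

int main(void)
{
    size_t bytes = 1024UL * 1024 * 300 * 4;        /* ~1.2 GB, as in the question */
    printf("before malloc:  %ld kB\n", rss_kb());
    char *p = (char *)malloc(bytes);
    if (p == NULL)
        return 1;
    printf("after malloc:   %ld kB\n", rss_kb());  /* barely changed */
    memset(p, 1, bytes);                           /* write every page */
    printf("after writing:  %ld kB\n", rss_kb());  /* now ~1.2 GB larger */
    free(p);
    return 0;
}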
