c++ memory leak using threads - multithreading

I have a memory leak with this piece of code and I don't understand why.
Each thread calls the function exec. The function exec simply creates a std::vector and then deletes it. This vector has length equal to the number of threads, and it is created, and deleted, only once.
You can assume this code is thread-safe in the sense that the vector is deleted only after its creation.
#include <mutex>   // std::call_once, std::once_flag
#include <thread>
#include <vector>

class Foo{
public:
    Foo(const std::size_t& numThreads) : size_(numThreads) {}
    inline void alloc(){ std::call_once(bufferflag_, &Foo::alloc_, this); }
    inline void free(){ std::call_once(bufferflag_, &Foo::free_, this); }
private:
    const std::size_t size_;
    std::vector<double>* bufferptr_;
    std::once_flag bufferflag_;
    inline void alloc_(){ bufferptr_ = new std::vector<double>(size_); }
    inline void free_(){ delete [] bufferptr_; }
};
void exec(Foo& comm){
    comm.alloc();
    // sync the threads here with some barrier
    comm.free();
}
int main(){
    Foo comm(10);
    std::vector<std::thread> t(10);
    for(std::size_t tid = 0; tid != 10; ++tid) t[tid] = std::thread(exec, std::ref(comm));
    for(std::size_t tid = 0; tid != 10; ++tid) t[tid].join();
}
HEAP SUMMARY:
in use at exit: 104 bytes in 2 blocks
total heap usage: 23 allocs, 21 frees, 3,704 bytes allocated
104 (24 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
LEAK SUMMARY:
definitely lost: 24 bytes in 1 blocks
indirectly lost: 80 bytes in 1 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 0 bytes in 0 blocks
suppressed: 0 bytes in 0 blocks
UPDATE
If instead of using call_once I just call new and delete from the same thread, there are no memory leaks.

Take a look at this modified code:
class Foo
{
public:
    Foo(const std::size_t& numThreads) : size_(numThreads) {}
    inline void alloc()
    {
        printf("alloc\n");
        std::call_once(alloc_flag, &Foo::alloc_, this);
    }
    inline void foo_free()
    {
        printf("free\n");
        std::call_once(free_flag, &Foo::free_, this); // Changed from bufferflag
    }
private:
    const std::size_t size_;
    std::vector<double>* bufferptr_;
    std::once_flag alloc_flag;
    std::once_flag free_flag;
    inline void alloc_()
    {
        printf("once_alloc_!\n");
        bufferptr_ = new std::vector<double>(size_);
    }
    inline void free_()
    {
        printf("once_free_!\n");
        bufferptr_->clear();
        delete bufferptr_; // Not delete[] bufferptr_
        bufferptr_ = NULL;
    }
};

void exec(Foo& comm){
    comm.alloc();
    // Barrier
    comm.foo_free();
}
If you use two different once_flags, one for alloc and one for free, the free_() method also runs. Using only one flag causes the memory leak, because the flag is consumed by the first alloc, so the later call_once that should run free_() is a no-op.
To check memory leaks I use _CrtDumpMemoryLeaks(); I get some extra leaks even if I comment out the thread create/join routines, but I think that is leftover std::vector allocations. Hope this helps!
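As a minimal standalone sketch of that once_flag behavior (not the poster's code): the second std::call_once on an already-used flag is a no-op, which is exactly why the one-flag version never frees.

#include <iostream>
#include <mutex>

int main() {
    std::once_flag flag;
    std::call_once(flag, [] { std::cout << "runs\n"; });       // executes: flag unused so far
    std::call_once(flag, [] { std::cout << "never runs\n"; }); // skipped: flag already consumed
    return 0;
}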

Related

CS50/pset5/speller memory leak

Can someone help me with this? What am I doing wrong? I don't know exactly what part of this I need to fix; I think it's somewhere in the unload function.
Here is my code:
// Implements a dictionary's functionality
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include "dictionary.h"

#define HASHTABLE_SIZE 10000

// Defines struct for a node
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

node *hashtable[HASHTABLE_SIZE];

// Hashes the word (hash function posted on reddit by delipity)
int hash_index(char *hash_this)
{
    unsigned int hash = 0;
    for (int i = 0, n = strlen(hash_this); i < n; i++)
    {
        hash = (hash << 2) ^ hash_this[i];
    }
    return hash % HASHTABLE_SIZE;
}
// Initializes counter for words in dictionary
int word_count = 0;

// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    // Opens dictionary
    FILE *file = fopen(dictionary, "r");
    if (file == NULL)
    {
        return false;
    }
    // Scans dictionary word by word
    char word[LENGTH + 1];
    while (fscanf(file, "%s", word) != EOF)
    {
        // Mallocs a node for each new word (i.e., creates node pointers)
        node *new_node = malloc(sizeof(node));
        node *cursor;
        node *tmp;
        // Checks if malloc succeeded, returns false if not
        if (new_node == NULL)
        {
            unload();
            return false;
        }
        // Copies word into node if malloc succeeds
        strcpy(new_node->word, word);
        // Initializes & calculates index of word for insertion into hashtable
        int h = hash_index(new_node->word);
        // Initializes head to point to hashtable index/bucket
        node *head = hashtable[h];
        // Inserts new nodes at beginning of lists
        if (head == NULL)
        {
            hashtable[h] = new_node;
            word_count++;
        }
        else
        {
            new_node->next = hashtable[h];
            hashtable[h] = new_node;
            word_count++;
        }
    }
    fclose(file);
    return true;
}
// Returns true if word is in dictionary else false
bool check(const char *word)
{
    // Creates copy of word on which hash function can be performed
    int n = strlen(word);
    char word_copy[LENGTH + 1];
    for (int i = 0; i < n; i++)
    {
        word_copy[i] = tolower(word[i]);
    }
    // Adds null terminator to end string
    word_copy[n] = '\0';
    // Initializes index for hashed word
    int h = hash_index(word_copy);
    // Sets cursor to point to same address as hashtable index/bucket
    node *cursor = hashtable[h];
    // Walks the list from the head
    while (cursor != NULL)
    {
        // If strcasecmp returns 0, the word has been found
        if (strcasecmp(cursor->word, word_copy) == 0)
        {
            return true;
        }
        // Else word has not yet been found, advance cursor
        else
        {
            cursor = cursor->next;
        }
    }
    // Cursor has reached end of list, word not found in dictionary (misspelled)
    return false;
}
// Returns number of words in dictionary if loaded else 0 if not yet loaded
unsigned int size(void)
{
    return word_count;
}

// Unloads dictionary from memory, returning true if successful else false
bool unload(void)
{
    node *head = NULL;
    node *cursor = head;
    // freeing linked lists
    while (cursor != NULL)
    {
        node *temp = cursor;
        cursor = cursor->next;
        free(temp);
    }
    return true;
}
These are the results from check50:
:) dictionary.c, dictionary.h, and Makefile exist
:) speller compiles
:) handles most basic words properly
:) handles min length (1-char) words
:) handles max length (45-char) words
:) handles words with apostrophes properly
:) spell-checking is case-insensitive
:) handles substrings properly
:( program is free of memory errors
valgrind tests failed; rerun with --log for more information.
These are the results when I run valgrind:
==595== Memcheck, a memory error detector
==595== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==595== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==595== Command: ./speller texts/cat.txt
==595==
MISSPELLED WORDS
WORDS MISSPELLED: 0
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 6
TIME IN load: 1.39
TIME IN check: 0.00
TIME IN size: 0.00
TIME IN unload: 0.00
TIME IN TOTAL: 1.40
==595==
==595== HEAP SUMMARY:
==595== in use at exit: 8,013,096 bytes in 143,091 blocks
==595== total heap usage: 143,096 allocs, 5 frees, 8,023,416 bytes allocated
==595==
==595== 8,013,096 bytes in 143,091 blocks are still reachable in loss record 1 of 1
==595== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==595== by 0x401175: load (dictionary.c:52)
==595== by 0x4009B4: main (speller.c:40)
==595==
==595== LEAK SUMMARY:
==595== definitely lost: 0 bytes in 0 blocks
==595== indirectly lost: 0 bytes in 0 blocks
==595== possibly lost: 0 bytes in 0 blocks
==595== still reachable: 8,013,096 bytes in 143,091 blocks
==595== suppressed: 0 bytes in 0 blocks
==595==
==595== For counts of detected and suppressed errors, rerun with: -v
==595== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I'm really stuck on this, and I would greatly appreciate some help!!
You are correct, the problem is unload. Look at this sequence carefully:
node *head = NULL;
node *cursor = head;
// freeing linked lists
while (cursor != NULL)
What is the value of cursor when it first reaches the while?
hashtable holds the head of each linked list. It needs to be mentioned somewhere in the unload function.
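For reference, one possible shape for unload, a sketch assuming the hashtable and node definitions above: loop over every bucket and free each list in turn.

bool unload(void)
{
    // Each hashtable slot is the head of its own linked list
    for (int i = 0; i < HASHTABLE_SIZE; i++)
    {
        node *cursor = hashtable[i];
        while (cursor != NULL)
        {
            node *temp = cursor;
            cursor = cursor->next;
            free(temp);
        }
        hashtable[i] = NULL;
    }
    return true;
}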

multiple thread in CMSIS RTOS - STM32 nucleo L053R8

I am developing with an RTOS (CMSIS-RTOS) for the STM32 Nucleo L053R8 kit, and I have an issue related to multiple tasks.
I create 4 tasks (task_1, task_2, task_3, task_4); however, only 3 tasks run.
This is part of my code:
#include "main.h"
#include "stm32l0xx_hal.h"
#include "cmsis_os.h"
osMutexId stdio_mutex;
osMutexDef(stdio_mutex);
int main(void){
.....
stdio_mutex = osMutexCreate(osMutex(stdio_mutex));
osThreadDef(defaultTask_1, StartDefaultTask_1, osPriorityNormal, 0, 128);
defaultTaskHandle = osThreadCreate(osThread(defaultTask_1), NULL);
osThreadDef(defaultTask_2, StartDefaultTask_2, osPriorityNormal, 0, 128);
defaultTaskHandle = osThreadCreate(osThread(defaultTask_2), NULL);
osThreadDef(defaultTask_3, StartDefaultTask_3, osPriorityNormal, 0, 128);
defaultTaskHandle = osThreadCreate(osThread(defaultTask_3), NULL);
osThreadDef(defaultTask_4, StartDefaultTask_4, osPriorityNormal, 0, 600);
defaultTaskHandle = osThreadCreate(osThread(defaultTask_4), NULL);
}
void StartDefaultTask_1(void const * argument){
    for(;;){
        osMutexWait(stdio_mutex, osWaitForever);
        printf("%s\n\r", __func__);
        osMutexRelease(stdio_mutex);
        osDelay(1000);
    }
}

void StartDefaultTask_2(void const * argument){
    for(;;){
        osMutexWait(stdio_mutex, osWaitForever);
        printf("%s\n\r", __func__);
        osMutexRelease(stdio_mutex);
        osDelay(1000);
    }
}

void StartDefaultTask_3(void const * argument){
    for(;;){
        osMutexWait(stdio_mutex, osWaitForever);
        printf("%s\n\r", __func__);
        osMutexRelease(stdio_mutex);
        osDelay(1000);
    }
}

void StartDefaultTask_4(void const * argument){
    for(;;){
        osMutexWait(stdio_mutex, osWaitForever);
        printf("%s\n\r", __func__);
        osMutexRelease(stdio_mutex);
        osDelay(1000);
    }
}
This is the result in the console (UART): only three of the four tasks print.
When I change the stack size for task 4 from 600 to 128 as below:
osThreadDef(defaultTask_4, StartDefaultTask_4, osPriorityNormal, 0, 128);
defaultTaskHandle = osThreadCreate(osThread(defaultTask_4), NULL);
then no tasks run at all.
Actually I want to create many threads for my application, but this issue makes that difficult to implement.
Could you let me know the root cause of the problem, and how to resolve it?
Thanks in advance!!
There is no common, easy method of stack calculation; it depends on many factors.
I would suggest avoiding stack-greedy functions like printf, scanf, etc. Write your own: not as "smart" and universal, but less resource-hungry.
Avoid large local variables, and be very careful when you allocate memory.
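If the underlying kernel is FreeRTOS (which CMSIS-RTOS wraps in STM32Cube projects; an assumption here), you can also measure rather than guess: uxTaskGetStackHighWaterMark() reports how close a task ever came to overflowing its stack. A sketch, requiring INCLUDE_uxTaskGetStackHighWaterMark set to 1 in FreeRTOSConfig.h (stdio_mutex as in the question's code):

#include "FreeRTOS.h"
#include "task.h"
#include "cmsis_os.h"
#include <stdio.h>

void StartDefaultTask_1(void const * argument){
    for(;;){
        osMutexWait(stdio_mutex, osWaitForever);
        // Minimum stack headroom (in words) this task has ever had;
        // NULL means "the calling task". Size the stack from the observed value.
        printf("%s headroom: %lu words\n\r", __func__,
               (unsigned long)uxTaskGetStackHighWaterMark(NULL));
        osMutexRelease(stdio_mutex);
        osDelay(1000);
    }
}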
Following your suggestions, I debugged and found the root cause: the heap size is too small.
I resolved it with 2 methods:
increase the heap size: #define configTOTAL_HEAP_SIZE ((size_t)5120)
decrease the stack size: #define configMINIMAL_STACK_SIZE ((uint16_t)64)
osThreadDef(defaultTask_6, StartDefaultTask_6, osPriorityNormal, 0, 64);
Do you know how to determine the maximum heap size needed? Please let me know.
Thank you so much
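On the heap question: if the port uses FreeRTOS's heap_4.c (the STM32Cube default; again an assumption about this port), the kernel can report both the current free heap and the lowest it has ever been, so configTOTAL_HEAP_SIZE minus the lowest-ever value is the actual peak demand. A sketch:

#include "FreeRTOS.h"
#include <stdio.h>

// Call from any task to see how much of configTOTAL_HEAP_SIZE is really used.
void report_heap(void){
    printf("free heap now: %u bytes\n\r", (unsigned)xPortGetFreeHeapSize());
    printf("lowest ever:   %u bytes\n\r", (unsigned)xPortGetMinimumEverFreeHeapSize());
}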

I used DeleteLocalRef(jclass), but I still get a memory leak in C++

Here's my experiment under Ubuntu / JVM 1.7.0_75.
1.
A thread will call the function below 50,000 times per second:
void LoggerImp::jniLog() {
    for (int i = 0; i < 100; i++) {
    }
}
Of course, there's nothing problematic here.
2.
I added some JNI code to it:
void LoggerImp::jniLog() {
    for (int i = 0; i < 100; i++) {
        jclass tcls = env->FindClass("com/morefun/bi/sdk/UserInfo");
    }
}
O.K. I didn't release the local reference, so a memory leak happens.
3.
Then I added the code that releases the local reference:
void LoggerImp::jniLog() {
    for (int i = 0; i < 100; i++) {
        jclass tcls = env->FindClass("com/morefun/bi/sdk/UserInfo");
        env->DeleteLocalRef(tcls);
    }
}
There seems to be no memory leak there...
4.
I increased the loop count:
void LoggerImp::jniLog() {
    for (int i = 0; i < 1000; i++) {
        jclass tcls = env->FindClass("com/morefun/bi/sdk/UserInfo");
        env->DeleteLocalRef(tcls);
    }
}
Something horrible happened: there's still a memory leak...
This time the leak is not inside the JVM; the JVM heap looks OK during GC.
The program uses only one JVM env in C++, and the code never returns to Java.
It's supposed to be a stable server thread, so it's very important to keep the memory clean.
This is driving me crazy.
Anyone who has an idea on this issue will be very much appreciated :)
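One technique worth trying in a loop like this (a sketch, not a confirmed fix for whatever is leaking here): JNI local frames. PushLocalFrame/PopLocalFrame release every local reference created inside the frame in one operation, so nothing can slip through even if an individual DeleteLocalRef is missed:

void LoggerImp::jniLog() {
    // Reserve room for at least 16 local references; a non-zero return means failure.
    if (env->PushLocalFrame(16) != 0) {
        return;
    }
    for (int i = 0; i < 1000; i++) {
        jclass tcls = env->FindClass("com/morefun/bi/sdk/UserInfo");
        env->DeleteLocalRef(tcls);
    }
    // Releases any local references still alive in this frame.
    env->PopLocalFrame(NULL);
}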

Why does server mode of the CLR garbage collector occupy more memory than workstation mode?

I have the test below, which I run in workstation mode and server mode of the CLR garbage collector. At the end, in server mode I end up with 520 MB of private bytes, whereas in workstation mode I only end up with 50 MB. Here's my output from windbg:
!eeheap
...
...
GC Heap Size: Size: 0x65ccac8 (106744520) bytes.
!address -summary
...
...
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 53 7ffb`ae203000 ( 127.983 Tb) 99.99%
MEM_RESERVE 94 4`31617000 ( 16.772 Gb) 97.06% 0.01%
MEM_COMMIT 336 0`207d6000 ( 519.836 Mb) 2.94% 0.00%
What causes that big difference, given that at the end of the execution I force a full GC?
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int N = 25 * 1000 * 1000; // Execute allocations in a loop 25 million times
    static int Clear = 1000 * 1000;  // Clear list after every 1 million allocations to give GC 25 chances to clear things up

    static void Main(string[] args)
    {
        // do some warmup
        AllocateRefContainer();
        GC.Collect();
        GC.Collect();

        var sw = Stopwatch.StartNew();
        AllocateRefContainer();
        sw.Stop();
        Console.WriteLine("RefContainer Allocation {0:F2}s, {1:N0} Allocs/s", sw.Elapsed.TotalSeconds, N / sw.Elapsed.TotalSeconds);

        GC.Collect();
        GC.Collect(2, GCCollectionMode.Forced, true);
        Thread.Sleep(1000);
        Console.WriteLine("PooledRefContainer Allocation {0:F2}s, {1:N0} Allocs/s", sw.Elapsed.TotalSeconds, N / sw.Elapsed.TotalSeconds);
        Console.WriteLine("Private Bytes: {0:N0} MB", Process.GetCurrentProcess().PrivateMemorySize64 / (1024 * 1024));
        Console.ReadLine();
    }

    class ReferenceContainer // Class with one object reference
    {
        public ReferenceContainer Obj;
    }

    static List<ReferenceContainer> RContainer = new List<ReferenceContainer>();

    static void AllocateRefContainer()
    {
        var container = RContainer;
        for (int i = 0; i < N; i++)
        {
            container.Add(new ReferenceContainer());
            Calculate();
            if (i % Clear == 0)
            {
                container.Clear();
            }
        }
    }

    // Simulate some CPU only calculation
    static void Calculate()
    {
        long lret = 0;
        for (int i = 0; i < 100; i++)
        {
            lret++;
        }
    }
}
Server mode runs one GC thread per logical processor core, and each core gets its own heap segment, so there is roughly 8x the overhead if you have 8 cores. Also, the GC is non-deterministic, so you cannot reliably assess memory usage from these numbers: the reported bytes are those reserved for the app, not necessarily those in use. Server GC is much more aggressive and may reserve more bytes.
See this MSDN article for more information.
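For anyone reproducing the comparison: the mode switch itself lives in the app.config (this is the standard .NET Framework setting, not something from the question's code):

<configuration>
  <runtime>
    <gcServer enabled="true"/> <!-- set to "false" for workstation GC -->
  </runtime>
</configuration>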

Address allocation in Linux: possible overlap?

I am trying to check whether my program is assigning memory correctly, so I have a series of pointers of different types:
pData1 = 0x844c458 (result of malloc(5 * sizeof(double*)))
pData2 = 0x844c470 (result of malloc(10 * sizeof(double)))
pData3 = 0x844c3a0 (result of malloc(44 * sizeof(double*)))
pData4 = 0x844c358
I think double = 8 bytes, so 5 * 8 = 40 bytes, which means the first two addresses overlap, and similarly the last two?
I am getting an invalid free, so I am investigating memory corruption in my code, trying to find where it might be happening.
----- Edit ----- Adding code details
This is the struct:
typedef struct _ELEMENT
{
    short s;
    char arr[20];
    int size;
    void *ptr1;
    void *ptr2;
} ELEMENT;
There are two classes, Parent and Child (derived from Parent):
class Parent
{
protected:
    int size;
    ELEMENT *ele1;
    ELEMENT *ele2;
public:
    void func();
    ...
};

class Child : public Parent
{
    int a, b, c;
};

Parent::Parent()
{
    ele1 = NULL;
    ele2 = NULL;
}

Parent::~Parent()
{
    for (int i = 0; i < size; i++)
    {
        free(ele1[i].ptr1);
        free(ele2[i].ptr1);
    }
    free(ele1);
    free(ele2);
}

Child::Child()
{
    a = 0; ...
}

Child::~Child()
{
    for (int i = 0; i < size; i++)
    {
        free(ele1[i].ptr1);
        free(ele2[i].ptr1);
    }
    free(ele1);
    free(ele2);
}

void Parent::func()
{
    ele1 = (ELEMENT*)malloc(n * sizeof(ELEMENT));
    ele2 = (ELEMENT*)malloc(n * sizeof(ELEMENT));
    for (int i = 0; i < somenumber; i++)
    {
        ...some processing...
        ele1[i].size = n;
        ele2[i].size = x;
        ele1[i].ptr1 = malloc(ele1[i].size);
        ele2[i].ptr1 = malloc(ele2[i].size);
    }
}

int main()
{
    Parent *p;
    Child *c;
    p = new Parent();
    c = new Child();
    p->func();
    c->func();
    delete(p);
    delete(c);
}
The glibc invalid free comes at the first free in the Parent destructor. This code was working fine on Solaris for years, but porting it to Linux is producing this issue...
Thanks!
The answer to your question is that your program is allocating memory correctly; the first problem you have is that you don't know the sizes of your data types, so your computations are incorrect.
If you post your code and the actual errors you are getting, it's possible we can figure this out. As it is, the deeper problem of the invalid free cannot be answered.
sizeof(double) is 8 bytes and sizeof(double*) is 4 (on your 32-bit system).
Memory obtained by malloc will not overlap unless freed in the meantime.
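A quick standalone sketch (not the original program) to check that arithmetic on the machine in question:

#include <cstdio>
#include <cstdlib>

int main() {
    // On a 32-bit build, sizeof(double*) is 4, so 5 * 4 = 20 bytes = 0x14;
    // 0x844c458 + 0x14 = 0x844c46c, still below 0x844c470 -- no overlap.
    std::printf("sizeof(double)  = %zu\n", sizeof(double));
    std::printf("sizeof(double*) = %zu\n", sizeof(double*));

    void *a = std::malloc(5 * sizeof(double*));
    void *b = std::malloc(10 * sizeof(double));
    std::printf("a = %p, b = %p\n", a, b);

    std::free(a);
    std::free(b);
    return 0;
}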
Use a memory debugger such as valgrind
It looks that way... how did you end up with these pointers? In principle it's OK to have different pointers pointing to the same space; you just shouldn't be freeing except through a pointer that was returned by malloc (or calloc). glibc keeps some bookkeeping data just below the pointer that records how big the block is, so it can free cleanly. If you change the value of the pointer, you can't free part of a block, since the information about the block's size is only available through the unchanged pointer. Could that be the source of your problem?
