using malloc to allocate pointers - malloc

If you want to allocate a integer my understanding is that you do this:
int* x = (int*) malloc(sizeof(int));
If I understand correctly malloc will allocate an integer on the heap and return a pointer to it which will then be stored in x. If I wanted to allocated an int* instead would I do this?
int** x = (int**) malloc(sizeof(int*));
In general, does malloc return a point to the type that it just created? When you do this it looks like you are casting the pointer that malloc returns but not casting the actual memory that malloc allocated. Why is that?

Related

Local memory for each CUDA thread

I have a simple program below. My question is that where is "temp" actually stored? is it in global or local memory? I need array temp for each idx so that every thread has individual array temp. In this case, it is working properly. But in my actual program, when I tried to fill temp[0] from test2 it made the program stopped. Suppose we have 1024 threads then it only run the kernel around 200 threads. So, I am wondering whether temp is shared or not. If yes, maybe there is a collision there. I also did not get any error messsage. Please someone explain about this.
__device__ void test2(int temp[], int idx) {
temp[0] = idx;
printf("%d ", temp[0]);
}
__global__ void test() {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int *temp = (int *) malloc(100 * sizeof (int));
test2(temp, idx);
}
int main() {
test << <1, 1024 >> >();
return 0;
}
My question is that where is "temp" actually stored?
The allocation for temp is stored in a place called the device heap. It is a form of global memory. However the temp variable itself (i.e. the pointer value) is in local memory - not shared or visible to other threads.
I need array temp for each idx so that every thread has individual array temp.
You will get that, subject to caveats below. Each thread will have its own individual array, referenced by its local variable temp. Each thread will have a separate allocation for storage on the device heap.
People commonly have problems with in-kernel new or malloc. One of the main reasons is that the device heap is initially limited to 8MB, across all of your device heap allocations. So if enough threads do a new or malloc of enough allocation requests, you will run out of space.
When you run out of space, the API way to signal that is to return a zero pointer value for the allocation (a NULL pointer). If you then attempt to use this NULL pointer, you will have trouble.
For debugging purposes (i.e. to prove this is happening), test the pointer for NULL (i.e. == 0) before using it. If it is NULL, don't use it (perhaps print an error message instead).
You can read more about this in the documentation or in many questions here on the SO cuda tag. If you read any of these sources, you will discover that you can increase the size of the device heap.

Map multiple kernel buffer into contiguous userspace buffer?

I have allocated multiple kernel accessible buffers using dma_alloc_coherent, each 4MiB in size. The goal is to map these buffers into a contiguous userspace virtual memory. The issue is that remap_pfn_range doesn't seem to be working, as the userspace memory sometimes works and sometimes doesn't, or sometimes duplicates the page mappings of the buffers.
// in probe() function
dma_alloc_coherent(&pcie->dev, BUF_SIZE, &bus_addr0, GFP_KERNEL);
dma_alloc_coherent(&pcie->dev, BUF_SIZE, &bus_addr1, GFP_KERNEL);
// ...
// in mmap() function
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
pfn = dma_to_phys(&pcie->dev, &bus_addr0) >> PAGE_SHIFT;
remap_pfn_range(pfn, vma->vm_start + 0, pfn, BUF_SIZE, vma->vm_page_prot);
pfn = dma_to_phys(&pcie->dev, &bus_addr1) >> PAGE_SHIFT;
remap_pfn_range(pfn, vma->vm_start + BUF_SIZE, pfn, BUF_SIZE, vma->vm_page_prot);
I'm not really sure of the best way to map multiple kernel buffers to contiguous userspace memory, but I have a feeling I am doing it wrong. Thanks in advance.
I have no idea why there isn't a better interface to map multiple buffers contiguously into user space. In theory you can use multiple calls to remap_pfn_range() but getting the correct pfn for memory allocated by dma_alloc_coherent() is essentially impossible on some platforms (e.g. ARM).
I have come up with a solution to this problem that might not be considered "good" but seems to work well enough in my usage on multiple platforms (x86_64, and various ARM). The solution is to temporarily modify the start and end addresses in the struct vm_area_struct while calling dma_mmap_coherent() multiple times, once for each buffer. As long as you reset the VMA start and end addresses to their original values, everything seems to work okay (see my prior disclaimer).
Here is an example:
static int mmap(struct file *file, struct vm_area_struct *vma)
{
. . .
int rc;
unsigned long vm_start_orig = vma->vm_start;
unsigned long vm_end_orig = vma->vm_end;
for (int idx = 0; idx < buffer_list_size; idx++) {
buffer_entry = &buffer_list[idx];
/* Temporarily modify VMA start and end addresses */
if (idx > 0) {
vma->vm_start = vma->vm_end;
}
vma->vm_end = vma->vm_start + buffer_entry->size;
rc = dma_mmap_coherent(dev, vma,
buffer_entry->virt_address,
buffer_entry->phys_addr,
buffer_entry->size);
if (rc != 0) {
pr_err("dma_mmap_coherent: %d (IDX = %d)\n", rc, idx);
return -EAGAIN;
}
}
/* Restore VMA addresses */
vma->vm_start = vm_start_orig;
vma->vm_end = vm_end_orig;
return rc;
}
Unfortunately, the only currently supported method for mmap()ing DMA coherent memory is the macro dma_mmap_coherent() or the function dma_mmap_attrs() (which is called by dma_mmap_coherent()). Unfortunately, that does not support splitting a single VMA across multiple, individually allocated blocks of DMA coherent memory.
(I wish there was a supported way to split the mmap()ing of a VMA across multiple allocations of DMA coherent memory because it affects the buffer allocation in a kernel subsystem that I help maintain. I had to change it to allocate the buffer as a single block of DMA coherent memory instead of many page-sized blocks.)

Memory release while reassign char * to null

I'm a little bit confused regarding string memory usage in c++.
Is it good reassign *PChar to NULL second time? Will assigned first time to *PChar string memory be released?
char * fnc(int g)
{
...
}
char *PChar = NULL;
PChar=fnc(1);
if (PChar) { sprintf(s,"%s",PChar); } ;
*PChar = NULL;
PChar=fnc(2);
if (PChar) { sprintf(s,"%s",PChar); } ;
First things first. The following statement is not what you intend:
*PChar = NULL;
PChar=fnc(2);
You are NOT assigning null to the pointer, but putting value zero (0) to the first character of the said buffer. You might be willing to do:
PChar = NULL;
PChar=fnc(2);
As a good programming practice, yes you should assign a pointer to null after it is used (AND possibility memory-deallocated). But assigning a pointer to null will not free the memory - the pointer will not point to allocated memory, but to non-existent memory location. You need to call delete if it was allocated using new, or need to call free if allocated by malloc.
As for the given statement, the compiler would anyway remove the following statement, as the process of optimization:
// PChar = NULL;
PChar=fnc(2);
You need to be very careful while using pointers, and assignment to it with a statically allocated data or dynamically allocated buffer!
I would suggest declaring a buffer of the PChar type and pass pointer to this buffer in a function call.
Good programming practice cals for passing also the allowed length of the buffer that should be checked in th function.
#define MAX_PCHAR_LEN 1024 // or constant const DWORD . . .
PChar PCharbuf[MAX_PCHAR_LEN] = {0}; // initialize array with 0s
//make a call
fnc (&PCharbuf, MAX_PCHAR_LEN, 2); // whatever 2 means
This way you do not have to worry about who allocates and who released memory, since release is automatic after PCharbuf goes out of scope.

Segmentation Fault With Multiple Threads

I get error segmentation fault because of the free() at the end of this equation...
don't I have to free the temporary variable *stck? Or since it's a local pointer and
was never assigned a memory space via malloc, the compiler cleans it up for me?
void * push(void * _stck)
{
stack * stck = (stack*)_stck;//temp stack
int task_per_thread = 0; //number of push per thread
pthread_mutex_lock(stck->mutex);
while(stck->head == MAX_STACK -1 )
{
pthread_cond_wait(stck->has_space,stck->mutex);
}
while(task_per_thread <= (MAX_STACK/MAX_THREADS)&&
(stck->head < MAX_STACK) &&
(stck->item < MAX_STACK)//this is the amount of pushes
//we want to execute
)
{ //store actual value into stack
stck->list[stck->head]=stck->item+1;
stck->head = stck->head + 1;
stck->item = stck->item + 1;
task_per_thread = task_per_thread+1;
}
pthread_mutex_unlock(stck->mutex);
pthread_cond_signal(stck->has_element);
free(stck);
return NULL;
}
Edit: You totally changed the question so my old answer doesn't really make sense anymore. I'll try to answer the new one (old answer still below) but for reference, next time please just ask a new question instead of changing an old one.
stck is a pointer that you set to point to the same memory as _stck points to. A pointer does not imply allocating memory, it just points to memory that is already (hopefully) allocated. When you do for example
char* a = malloc(10); // Allocate memory and save the pointer in a.
char* b = a; // Just make b point to the same memory block too.
free(a); // Free the malloc'd memory block.
free(b); // Free the same memory block again.
you free the same memory twice.
-- old answer
In push, you're setting stck to point to the same memory block as _stck, and at the end of the call you free stack (thereby calling free() on your common stack once from each thread)
Remove the free() call and, at least for me, it does not crash anymore. Deallocating the stack should probably be done in main() after joining all the threads.

Two structs, one references another

typedef struct Radios_Frequencia {
char tipo_radio[3];
int qt_radio;
int frequencia;
}Radiof;
typedef struct Radio_Cidade {
char nome_cidade[30];
char nome_radio[30];
char dono_radio[3];
int numero_horas;
int audiencia;
Radiof *fre;
}R_cidade;
void Cadastrar_Radio(R_cidade**q){
printf("%d\n",i);
q[0]=(R_cidade*)malloc(sizeof(R_cidade));
printf("informa a frequencia da radio\n");
scanf("%d",&q[0]->fre->frequencia); //problem here
printf("%d\n",q[0]->fre->frequencia); // problem here
}
i want to know why this function void Cadastrar_Radio(R_cidade**q) does not print the data
You allocated storage for your primary structure but not the secondary one. Change
q[0]=(R_cidade*)malloc(sizeof(R_cidade));
to:
q[0]=(R_cidade*)malloc(sizeof(R_cidade));
q[0]->fre = malloc(sizeof(Radiof));
which will allocate both. Without that, there's a very good chance that fre will point off into never-never land (as in "you can never never tell what's going to happen since it's undefined behaviour).
You've allocated some storage, but you've not properly initialized any of it.
You won't get anything reliable to print until you put reliable values into the structures.
Additionally, as PaxDiablo also pointed out, you've allocated the space for the R_cidade structure, but not for the Radiof component of it. You're using scanf() to read a value into space that has not been allocated; that is not reliable - undefined behaviour at best, but most usually core dump time.
Note that although the two types are linked, the C compiler most certainly doesn't do any allocation of Radiof simply because R_cidade mentions it. It can't tell whether the pointer in R_cidade is meant to be to a single structure or the start of an array of structures, for example, so it cannot tell how much space to allocate. Besides, you might not want to initialize that structure every time - you might be happy to have left pointing nowhere (a null pointer) except in some special circumstances known only to you.
You should also verify that the memory allocation succeeded, or use a memory allocator that guarantees never to return a null or invalid pointer. Classically, that might be a cover function for the standard malloc() function:
#undef NDEBUG
#include <assert.h>
void *emalloc(size_t nbytes)
{
void *space = malloc(nbytes);
assert(space != 0);
return(space);
}
That's crude but effective. I use non-crashing error reporting routines of my own devising in place of the assert:
#include "stderr.h"
void *emalloc(size_t nbytes)
{
void *space = malloc(nbytes);
if (space == 0)
err_error("Out of memory\n");
return space;
}

Resources