userspace-kernel shared memory with mmap works strange on multi-thread - linux

I'm currently setting up shared memory between user space and the kernel.
The following code is a simplified version of my mmap method.
for (j = 0; j < 3; j++) {
    RxFrame[j] = kmalloc(4096 * 6, GFP_KERNEL);
    npages = (4096 * 6 - 1) / PAGE_SIZE + 1;   /* round up to whole pages */
    for (i = 0; i < npages * PAGE_SIZE; i += PAGE_SIZE) {
        printk("RxFrame[%d] PAGE_SIZE is %lu\n", j, PAGE_SIZE);   /* PAGE_SIZE is unsigned long */
        SetPageReserved(virt_to_page(RxFrame[j] + i));
    }
}

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long pfn = virt_to_phys(RxFrame[index]);
    ret = remap_pfn_range(vma, vma->vm_start, pfn >> PAGE_SHIFT, len, vma->vm_page_prot);
    if (ret < 0) {
        pr_err("could not map the address area\n");
        return -EIO;
    }
    return 0;
}
And the following is my user-space application code that uses mmap:
FRAME *RxFrame[3];
for (int i = 0; i < 3; i++) {
    RxFrame[i] = (FRAME *)mmap(0, 4096 * 6, PROT_WRITE | PROT_READ, MAP_SHARED, fd[i], 0);
    if (RxFrame[i] == MAP_FAILED) {   // mmap() reports errors with MAP_FAILED, not NULL
        printf("mmap error\n");
        return 0;
    }
}
auto DoReceive = [&](const int index) {
    // read RxFrame[index][0].data
};
for (int i = 0; i < 3; i++) {
    thread[i] = std::thread(DoReceive, i);
}
My goal was to create three kernel buffers, each dedicated to its own data (different data is stored in different kernel buffers), and to access them from user space through mmap, with one thread per buffer.
When only one kernel buffer is used (data stored in just one kernel buffer), I can read RxFrame[index][0].data without any problem.
But when all three kernel buffers are used (different data stored in separate kernel buffers), I can only read the first transmitted data. For example, if RxFrame[0]'s buffer received data, I can read RxFrame[0][0].data, but not RxFrame[1][0].data or RxFrame[2][0].data.
The strange thing is that when a lot of data is received within a short time, I can read RxFrame[1][0].data and RxFrame[2][0].data, which looks like a cached result!
I tried volatile and other mmap options, but I can't figure out why my mapped memory behaves as if it were cached when accessed from multiple threads.
Any help would be appreciated!
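On error, mmap() returns MAP_FAILED ((void *) -1), not NULL, so a `== NULL` check cannot catch a failed mapping. The sketch below shows the intended user-space pattern (one mapping per buffer, one reader thread per mapping) with the correct check. It is only a stand-in for the real setup: it assumes anonymous MAP_SHARED memory in place of the three device file descriptors, and the names map_buffers, reader_arg, reader and run_readers are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define NUM_BUFS 3
#define BUF_SIZE (4096 * 6)   /* same size as each RxFrame buffer above */

/* Map NUM_BUFS shared buffers; returns 0 on success, -1 on failure.
 * Assumption: anonymous MAP_SHARED memory stands in for fd[0..2]. */
static int map_buffers(unsigned char *bufs[NUM_BUFS])
{
    for (int i = 0; i < NUM_BUFS; i++) {
        void *p = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {            /* the error value, not NULL */
            perror("mmap");
            for (int j = 0; j < i; j++)   /* undo the mappings made so far */
                munmap(bufs[j], BUF_SIZE);
            return -1;
        }
        bufs[i] = p;
    }
    return 0;
}

struct reader_arg {
    volatile unsigned char *buf;   /* this thread's own mapping */
    unsigned char first_byte;      /* what the thread observed */
};

static void *reader(void *arg)
{
    struct reader_arg *ra = arg;
    ra->first_byte = ra->buf[0];   /* each thread reads only its own buffer */
    return NULL;
}

/* Fill buffer i with the byte value i + 1, then start one reader per buffer.
 * Returns 0 if every thread saw the value written into its own buffer. */
static int run_readers(void)
{
    unsigned char *bufs[NUM_BUFS];
    pthread_t tid[NUM_BUFS];
    struct reader_arg args[NUM_BUFS];
    int ok = 0;

    if (map_buffers(bufs) != 0)
        return -1;
    for (int i = 0; i < NUM_BUFS; i++) {
        memset(bufs[i], i + 1, BUF_SIZE);
        args[i].buf = bufs[i];
        pthread_create(&tid[i], NULL, reader, &args[i]);
    }
    for (int i = 0; i < NUM_BUFS; i++) {
        pthread_join(tid[i], NULL);
        if (args[i].first_byte != i + 1)
            ok = -1;
        munmap(bufs[i], BUF_SIZE);
    }
    return ok;
}
```

With the real driver, each mmap() call would pass MAP_SHARED with fd[i] instead of MAP_ANONYMOUS and -1. If the threads then still see the same data, it may be worth checking how my_mmap chooses `index` for each open file, since that part is not shown.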


mmap failed to allocate virtual memory

I got the following output in ftrace:
mmap(0x200000000000, 17179869184, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
My code:
void alloc_page_full_reverse()
{
    printf("Allocating default pagesize pages > 128TB \n");
    mmap_chunks_higher(24575, 0);
    printf("Allocating default pagesize pages < 128TB \n");
    /* Note: Allocating a 16GB chunk less due to heap space required
       for other mappings */
    mmap_chunks_lower(8190, 0);
}
int mmap_chunks_higher(unsigned long no_of_chunks, unsigned long hugetlb_arg)
unsigned long i;
char *hptr;
char *hint;
int mmap_args = 0;
for (i = 0; i < no_of_chunks; i++){
hint = hind_addr();
hptr = mmap(hint, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | hugetlb_arg, -1, 0); // MAP_CHUNK_SIZE = 16GB
if (hptr == MAP_FAILED){
printf("\n Map failed at address %p < 384TB in iteration = %lu \n", hptr, i);
if (validate_addr(hptr, 1)){
printf("\n Address failed, not in > 128Tb iterator = %lu\n", i);
printf("> 128Tb: \n chunks allocated= %lu \n", i);
static char *hind_addr(void)
{
    int bits = 48 + rand() % 15;   /* random hint between 2^48 and 2^62 */
    return (char *) (1UL << bits);
}
I need to understand how, before calling mmap, I can validate all the arguments of void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
for example, whether size_t length is valid.
In particular, I want to make sure I have enough memory before calling mmap.
There isn't an interface that lets a process check this, and for good reason. Suppose such a syscall existed and the kernel told a process it could allocate 1 GB of memory. By the time the process actually requests the allocation, the kernel may no longer be able to provide that memory (other processes may have used it in the meantime), so the information would not be useful.
Instead, you should attempt the allocation and handle ENOMEM.
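A minimal sketch of "try, then handle ENOMEM": attempt the mapping and, if the kernel refuses with ENOMEM, retry with a smaller request. The function name map_with_fallback, the halving strategy and the min_len floor are illustrative assumptions, not part of any existing API.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try to map `len` bytes of private anonymous memory. On ENOMEM, halve the
 * request and retry while it is still >= min_len. On success, store the size
 * actually mapped in *mapped_len and return the address; otherwise return
 * NULL with errno describing the failure. */
static void *map_with_fallback(size_t len, size_t min_len, size_t *mapped_len)
{
    while (len >= min_len) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) {
            *mapped_len = len;
            return p;
        }
        if (errno != ENOMEM)   /* e.g. EINVAL: a smaller size will not help */
            return NULL;
        len /= 2;              /* not enough memory: ask for less */
    }
    errno = ENOMEM;
    return NULL;
}
```

The caller always learns how much it really got through *mapped_len, so it can split the work into smaller chunks instead of assuming the full size was available.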

How do I save the result from GetProfileBinary into a smart pointer?

At the moment I have the following member variable in a class:
BYTE *m_pbyImportColumnMappings;
In one of the classes we attempt to read existing data from the registry, and if it is not present, we allocate it. So far, I have changed it like this:
void CImportOCLMAssignmentHistoryDlg::ReadSettings()
UINT uSize;
m_dwImportFlags = theApp.GetNumberSetting(theApp.GetActiveScheduleSection(_T("Options")),
_T("ImportFlags"), ImportAssignment::None);
theApp.GetProfileBinary(theApp.GetActiveScheduleSection(_T("Options")),
_T("ImportColumnMappings"), (LPBYTE*)&m_pbyImportColumnMappings, &uSize);
// Reset memory buffer (if required)
if (uSize != (sizeof(BYTE) * 15))
if (uSize > 0)
delete[] m_pbyImportColumnMappings;
m_pbyImportColumnMappings = nullptr;
m_pbyImportColumnMappings = new BYTE[15];
// Default values
const gsl::span column_mappings(m_pbyImportColumnMappings, 15);
std::fill(begin(column_mappings), end(column_mappings), -1);
m_pbyImportColumnMappings[0] = -1;
m_pbyImportColumnMappings[1] = -1;
m_pbyImportColumnMappings[2] = -1;
m_pbyImportColumnMappings[3] = -1;
m_pbyImportColumnMappings[4] = -1;
m_pbyImportColumnMappings[5] = -1;
m_pbyImportColumnMappings[6] = -1;
m_pbyImportColumnMappings[7] = -1;
m_pbyImportColumnMappings[8] = -1;
m_pbyImportColumnMappings[9] = -1;
m_pbyImportColumnMappings[10] = -1;
m_pbyImportColumnMappings[11] = -1;
m_pbyImportColumnMappings[12] = -1;
m_pbyImportColumnMappings[13] = -1;
m_pbyImportColumnMappings[14] = -1;
My initial change was to use a gsl::span to suppress several warnings about using pointer arithmetic. But I don't know how to turn m_pbyImportColumnMappings into a smart pointer, given that we initially attempt to populate it from GetProfileBinary.
If I could turn it into a smart pointer, I would not need to deallocate the memory when the object goes out of scope.
In a related answer this code was suggested:
theApp.GetProfileBinary(strSection, strEntry,
reinterpret_cast<LPBYTE*>(&pALI), &uBytesRead);
std::unique_ptr<BYTE[]> cleanup(reinterpret_cast<BYTE*>(pALI));
But I am not sure how to apply that cleanup approach, given that we are dealing with a member variable of the class rather than a local variable in a function.
For cleaner code, consider using std::vector and a temporary buffer:
std::vector<BYTE> m_mapping;
m_mapping.resize(15, -1);

UINT len = 0;
BYTE* temp = nullptr;
AfxGetApp()->GetProfileBinary(_T("setting"), _T("key"), &temp, &len);
std::unique_ptr<BYTE[]> cleanup(temp);   // releases temp with delete[] when the scope ends
if (len == m_mapping.size() * sizeof(m_mapping[0]))
    memcpy(m_mapping.data(), temp, len);
else
    std::fill(m_mapping.begin(), m_mapping.end(), -1);
std::vector also has automatic cleanup and additional methods.
Otherwise, using std::unique_ptr to replace new/delete for this member data can be a bit of a nightmare. Example:
m_mapping = nullptr;
GetProfileBinary("setting", "key", &m_mapping, &uSize);
if (uSize != (sizeof(BYTE) * 15))
{
    //delete memory immediately after exiting scope
    //note the extra brackets
    { std::unique_ptr<BYTE[]> cleanup(m_mapping); }

    //allocate new memory and don't manage it anymore
    m_mapping = std::make_unique<BYTE[]>(15).release();
    for (int i = 0; i < 15; i++) m_mapping[i] = -1;
}
Here we are not able to take advantage of std::unique_ptr's memory management; it's only used to turn off the warnings.
You don't need any casting here, because m_pbyImportColumnMappings happens to be a BYTE* and GetProfileBinary expects a pointer to a BYTE*. It allocates the memory with new BYTE[], so delete[] (which std::unique_ptr<BYTE[]> calls) is the matching way to release it.

Building a Simple character device but device driver file will not write or read

I am trying to write a simple character device driver (LKM) that supports read, write and seek.
I have been having a lot of issues with this and have been troubleshooting it for weeks, but I have not been able to get it to work properly. Currently, my module builds, loads and unloads properly, but if I try to echo to the device file, the terminal crashes, and when I try to read from it using cat, it prints "Killed".
Steps for this module:
First, I build the module by running make -C /lib/modules/$(uname -r)/build M=$PWD modules
For my kernel, uname -r is 4.10.17newkernel
I load the module using sudo insmod simple_char_driver.ko
If I run lsmod, the module is listed
If I run dmesg, the KERN_ALERT "This device is now open" in my init function triggers correctly.
Additionally, if I run sudo rmmod, that function's "This device is now closed" KERN_ALERT also triggers correctly.
The module also shows up correctly in cat /proc/devices
I created the device driver file in /dev using sudo mknod -m 777 /dev/simple_char_driver c 240 0
Before making this file, I made sure that the 240 major number was not already in use.
My device driver c file has the following code:
#define BUFFER_SIZE 1024
//minor number 0;
static int place_in_buffer = 0;
static int end_of_buffer = 1024;
static int MAJOR_NUMBER = 240;
char* DEVICE_NAME = "simple_char_driver";
typedef struct{
char* buf;
char *device_buffer;
static int closeCounter=0;
static int openCounter=0;
ssize_t simple_char_driver_read(struct file *pfile, char __user *buffer, size_t length, loff_t *offset)
{
    int bytesRead = 0;
    if (*offset >= BUFFER_SIZE) {
        bytesRead = 0;
    }
    if (*offset + length > BUFFER_SIZE) {
        length = BUFFER_SIZE - *offset;
    }
    printk(KERN_INFO "Reading from device\n");
    if (copy_to_user(buffer, device_buffer + *offset, length) != 0) {
        return -EFAULT;
    }
    copy_to_user(buffer, device_buffer + *offset, length);
    *offset += length;
    printk(KERN_ALERT "Read: %s", buffer);
    printk(KERN_ALERT "%d bytes read\n", bytesRead);
    return 0;
}
ssize_t simple_char_driver_write(struct file *pfile, const char __user *buffer, size_t length, loff_t *offset)
{
    int nb_bytes_to_copy;
    if (BUFFER_SIZE - 1 - *offset <= length) {
        nb_bytes_to_copy = BUFFER_SIZE - 1 - *offset;
        printk("BUFFER_SIZE - 1 -*offset <= length");
    } else if (BUFFER_SIZE - 1 - *offset > length) {
        nb_bytes_to_copy = length;
        printk("BUFFER_SIZE - 1 -*offset > length");
    }
    printk(KERN_INFO "Writing to device\n");
    if (*offset + length > BUFFER_SIZE) {
        printk("sorry, can't do that. ");
        return -1;
    }
    printk("about to copy from device");
    copy_from_user(device_buffer + *offset, buffer, nb_bytes_to_copy);
    device_buffer[*offset + nb_bytes_to_copy] = '\0';
    *offset += nb_bytes_to_copy;
    return nb_bytes_to_copy;
}
int simple_char_driver_open(struct inode *pinode, struct file *pfile)
{
    printk(KERN_ALERT "This device is now open");
    printk(KERN_ALERT "This device has been opened this many times: %d\n", openCounter);
    return 0;
}

int simple_char_driver_close(struct inode *pinode, struct file *pfile)
{
    printk(KERN_ALERT "This device is now closed");
    printk(KERN_ALERT "This device has been closed this many times: %d\n", closeCounter);
    return 0;
}
loff_t simple_char_driver_seek(struct file *pfile, loff_t offset, int whence)
{
    printk(KERN_ALERT "We are now seeking!");
    switch (whence) {
    case 0: { // SEEK_SET
        if (offset <= end_of_buffer && offset > 0) {
            place_in_buffer = offset;
            printk(KERN_ALERT " this is where we are in the buffer: %d\n", place_in_buffer);
        } else {
            printk(KERN_ALERT "ERROR you are attempting to go outside the Buffer");
        }
        break;
    }
    case 1: { // SEEK_CUR
        if (((place_in_buffer + offset) <= end_of_buffer) && ((place_in_buffer + offset) > 0)) {
            place_in_buffer = place_in_buffer + offset;
            printk(KERN_ALERT " this is where we are in the buffer: %d\n", place_in_buffer);
        } else {
            printk(KERN_ALERT "ERROR you are attempting to go outside the Buffer");
        }
        break;
    }
    case 2: { //THIS IS SEEK END
        if ((end_of_buffer - offset) >= 0 && offset > 0) {
            place_in_buffer = end_of_buffer - offset;
            printk(KERN_ALERT " this is where we are in the buffer: %d\n", place_in_buffer);
        } else {
            printk(KERN_ALERT "ERROR you are attempting to go outside the Buffer");
        }
        break;
    }
    }
    printk(KERN_ALERT "I sought %d\n", whence);
    return place_in_buffer;
}
struct file_operations simple_char_driver_file_operations = {
    .owner = THIS_MODULE,
    .read = simple_char_driver_read,
    .write = simple_char_driver_write,
    .open = simple_char_driver_open,
    .llseek = &simple_char_driver_seek,
    .release = simple_char_driver_close,
};

static int simple_char_driver_init(void)
{
    printk(KERN_ALERT "inside %s function\n", __FUNCTION__);
    register_chrdev(MAJOR_NUMBER, DEVICE_NAME, &simple_char_driver_file_operations);
    device_buffer = kmalloc(BUFFER_SIZE, GFP_KERNEL);
    return 0;
}

static void simple_char_driver_exit(void)
{
    printk(KERN_ALERT "inside %s function\n", __FUNCTION__);
    unregister_chrdev(MAJOR_NUMBER, DEVICE_NAME);
}
As I said before, this file builds with no errors or warnings.
However, if I currently try to echo to the device file
using: echo "hello world" >> /dev/simple_char_driver
the terminal I am using crashes.
If I then reopen a terminal and use: cat /dev/simple_char_driver
the terminal prints "Killed".
I am completely lost as to what is going wrong, and I have been searching for a solution for a very long time without success. If anyone has any insight into what is going wrong, please let me know.
Edit: As a user below suggested, I removed all code from my read and write methods except for the printk and the return, to make sure the functions were being triggered.
When I then used echo, dmesg showed that the write printk was triggered and that the device (which I had opened) was closed. When I then tried to cat the device file, dmesg showed that the device reopened, the "Reading from device" printk showed up successfully, and then the device closed again. However, cat did not actually find anything to read from the device file, despite my having echoed "hello world" into it immediately before.
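Because echo and cat each open, access and close the device separately, a small test program that writes, seeks back and reads through one file descriptor can make the behaviour easier to follow. The helper roundtrip below is hypothetical and works on any file descriptor; here it is exercised on a regular temporary file, and it assumes the driver's llseek honours SEEK_SET. As far as I can tell, simple_char_driver_seek only updates place_in_buffer and never sets pfile->f_pos, which is what .read and .write receive as *offset, so with the driver as posted the lseek() call may not change where the next read starts.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg at the current position of fd (0 for a freshly opened fd), seek
 * back to offset 0 and read the data into out, NUL-terminated.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t roundtrip(int fd, const char *msg, char *out, size_t outsz)
{
    size_t len = strlen(msg);
    if (outsz == 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len)   /* ends up in the driver's .write */
        return -1;
    if (lseek(fd, 0, SEEK_SET) != 0)           /* ends up in the driver's .llseek */
        return -1;
    ssize_t n = read(fd, out, outsz - 1);      /* ends up in the driver's .read */
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

Against the real device, the file descriptor would come from open("/dev/simple_char_driver", O_RDWR) instead of mkstemp().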
The final, working read and write functions are as follows:
ssize_t simple_char_driver_read(struct file *pfile, char __user *buffer, size_t length, loff_t *offset)
{
    if (*offset > BUFFER_SIZE) {
        printk("offset is greater than buffer size");
        return 0;
    }
    if (*offset + length > BUFFER_SIZE)
        length = BUFFER_SIZE - *offset;
    if (copy_to_user(buffer, device_buffer + *offset, length) != 0)
        return -EFAULT;
    *offset += length;
    return length;
}
ssize_t simple_char_driver_write(struct file *pfile, const char __user *buffer, size_t length, loff_t *offset)
{
    /* *buffer is the userspace buffer where you are writing the data you want to be written in the device file*/
    /* length is the length of the userspace buffer*/
    /* current position of the opened file*/
    /* copy_from_user function: destination is device_buffer and source is the userspace buffer *buffer */
    int nb_bytes_to_copy;
    if (BUFFER_SIZE - 1 - *offset <= length) {
        nb_bytes_to_copy = BUFFER_SIZE - 1 - *offset;
        printk("BUFFER_SIZE - 1 -*offset <= length");
    } else if (BUFFER_SIZE - 1 - *offset > length) {
        nb_bytes_to_copy = length;
        printk("BUFFER_SIZE - 1 -*offset > length");
    }
    printk(KERN_INFO "Writing to device\n");
    if (*offset + length > BUFFER_SIZE) {
        printk("sorry, can't do that. ");
        return -1;
    }
    printk("about to copy from device");
    copy_from_user(device_buffer + *offset, buffer, nb_bytes_to_copy);
    device_buffer[*offset + nb_bytes_to_copy] = '\0';
    *offset += nb_bytes_to_copy;
    return nb_bytes_to_copy;
}
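The decisive difference between the two read functions is the return value: callers of read(2) such as cat treat a return of 0 as end of file, so a .read that always returns 0 looks like an empty device. The bounds logic of the final version can be checked in user space with a plain array in place of device_buffer, memcpy() in place of copy_to_user(), and long long in place of loff_t; sim_read and these substitutions are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define BUFFER_SIZE 1024

static char device_buffer[BUFFER_SIZE];   /* stands in for the kmalloc'd buffer */

/* User-space model of simple_char_driver_read: copy up to `length` bytes from
 * *offset, clamp at the end of the buffer, advance *offset and return the
 * number of bytes copied (0 tells the caller "end of file"). */
static ssize_t sim_read(char *dst, size_t length, long long *offset)
{
    if (*offset >= BUFFER_SIZE)
        return 0;
    if (*offset + length > BUFFER_SIZE)
        length = BUFFER_SIZE - *offset;
    memcpy(dst, device_buffer + *offset, length);   /* copy_to_user() in the driver */
    *offset += length;
    return (ssize_t)length;
}
```

Because each call advances the offset and the last call returns 0, a reader like cat receives the data once and then stops instead of looping forever.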
Your code in general leaves much to be desired, but what I can see at the moment is that your .write implementation looks dubious. There are two likely mistakes: the absence of a buffer boundary check, and disregard of null termination, which may lead to undefined behaviour in strlen().
First of all, you know the size of your buffer: BUFFER_SIZE. Therefore, you should check that there is still room to write, i.e. that *offset < BUFFER_SIZE - 1. The comparison is strict because the last byte must be reserved for null termination. If the check fails (no space is available), the method should return immediately. I can't say for sure whether you should return 0 to report that nothing has been written, or a negative error code such as -ENOBUFS or -ENOSPC. Either way, the method returns ssize_t, so a negative value may be returned.
Secondly, if the first check succeeds, your method should calculate the actual space available for writing, e.g. with a MIN(A, B) macro (in kernel code, min() or min_t()). In other words, create a variable, say, nb_bytes_to_copy, initialise it as nb_bytes_to_copy = MIN(BUFFER_SIZE - 1 - *offset, length), and use it later in the copy_from_user() call. If the user requests to write 5 bytes of data starting at an offset of 1021 bytes, your driver will write only 2 bytes of it, say, "he" instead of "hello". The return value should also be set to nb_bytes_to_copy, so that the caller can detect the buffer space shortage.
Finally, don't forget about null termination. As soon as you are done with
copy_from_user(device_buffer + *offset, buffer, nb_bytes_to_copy);
make sure to do something like
device_buffer[*offset + nb_bytes_to_copy] = '\0';
Alternatively, if I recall correctly, you may use a special function like strncpy_from_user() to copy the data as a null-terminated string (note, though, that it does not add a terminator when the source string is longer than the count you pass).
Also, although null termination protects any later strlen(), I doubt that you need strlen() at all: you can simply advance the position with *offset += nb_bytes_to_copy.
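Putting those points together, here is a user-space model of the suggested .write logic, with a plain array in place of device_buffer, memcpy() in place of copy_from_user(), and long long in place of loff_t. The name sim_write and the choice of -ENOSPC for a full buffer are assumptions, since whether to return 0 or an error code is left open above. The early check looks only at the offset, so that a write starting at 1021 can still store 2 bytes.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define BUFFER_SIZE 1024
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static char device_buffer[BUFFER_SIZE];   /* stands in for the kmalloc'd buffer */

/* Model of the suggested .write: refuse when no byte is free before the
 * reserved terminator, otherwise copy as much as fits, terminate, advance. */
static ssize_t sim_write(const char *src, size_t length, long long *offsetp)
{
    if (*offsetp < 0)
        return -EINVAL;
    if (*offsetp >= BUFFER_SIZE - 1)
        return -ENOSPC;                          /* no room left (assumed error code) */
    size_t room = (size_t)(BUFFER_SIZE - 1 - *offsetp);
    size_t nb_bytes_to_copy = MIN(room, length);
    memcpy(device_buffer + *offsetp, src, nb_bytes_to_copy);  /* copy_from_user() in the driver */
    device_buffer[*offsetp + nb_bytes_to_copy] = '\0';       /* keep the buffer terminated */
    *offsetp += nb_bytes_to_copy;
    return (ssize_t)nb_bytes_to_copy;            /* may be less than length */
}
```

Returning the shortened count (2 rather than 5 in the example) is what lets the caller notice that the buffer ran out of space.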
By the way, I'd recommend naming the arguments and variables more descriptively. *offset is an eyesore; it would look better named *offsetp. If your method becomes large, an average reader is unlikely to remember that offset is a pointer and not a value. offsetp, where p stands for "pointer", will ease the job of anyone who maintains your code in the future.
To sum up, I doubt your .write implementation and suggest that you rework it. If other mistakes persist, you will need to debug them further. Adding debug printouts may come in handy, but please revisit the basics first, such as null termination and buffer boundary protection. To make this answer a little more useful, I'd also point you to section 3.7 ("read and write") of the "Linux Device Drivers, 3rd Edition" book, which sheds light on the topic under discussion.

mutex and its effect on execution time (and cpu usage)

I wrote a very simple test program to examine the efficiency of pthread mutexes, but I'm not able to analyse the results I get. (I can see 4 CPUs in the Linux System Monitor, which is why I have at least 4 active threads: I want to keep all of them busy.) The mutexes are not actually necessary in the code.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t lock1, lock2, lock3, lock4;
void do_sth() { /* just open a file, read it and copy it to another file */
    int i;
    for (i = 0; i < 1; i++) {
        FILE* fp = fopen("(2) Catching Fire.txt", "r");
        if (fp == NULL) {
            fprintf(stderr, "could not open file\n");
            return;
        }
        char filename[20];
        sprintf(filename, "a%d", (int)pthread_self());
        FILE* wfp = fopen(filename, "w");
        if (wfp == NULL) {
            fprintf(stderr, "could not open file for write\n");
            fclose(fp);
            return;
        }
        int c;
        while ((c = fgetc(fp)) != EOF) {   /* parenthesised: assign first, then compare with EOF */
            fputc(c, wfp);
        }
        fclose(wfp);
        fclose(fp);
    }
}
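void* routine1(void* param) {
    pthread_mutex_lock(&lock1);   /* lock/unlock omitted in the "without mutex" run */
    do_sth();
    pthread_mutex_unlock(&lock1);
    return NULL;
}
void* routine2(void* param) {
    pthread_mutex_lock(&lock2);
    do_sth();
    pthread_mutex_unlock(&lock2);
    return NULL;
}
void* routine3(void* param) {
    pthread_mutex_lock(&lock3);
    do_sth();
    pthread_mutex_unlock(&lock3);
    return NULL;
}
void* routine4(void* param) {
    pthread_mutex_lock(&lock4);
    do_sth();
    pthread_mutex_unlock(&lock4);
    return NULL;
}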
int main(int argc, char** argv) {
int i ;
pthread_mutex_init(&lock1, 0);
pthread_mutex_init(&lock2, 0);
pthread_mutex_init(&lock3, 0);
pthread_mutex_init(&lock4, 0);
pthread_t thread1[4];
pthread_t thread2[4];
pthread_t thread3[4];
pthread_t thread4[4];
for (i = 0; i < 4; i++)
pthread_create(&thread1[i], NULL, routine1, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread2[i], NULL, routine2, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread3[i], NULL, routine3, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread4[i], NULL, routine4, NULL);
for (i = 0; i < 4; i++)
pthread_join(thread1[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread2[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread3[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread4[i], NULL);
printf("Hello, World!\n");
return 0;
}
I run this program in two ways, with and without all the mutexes, and I measure the execution time (using time ./a.out) and the average CPU load (using htop). Here are the results:
First: in htop, I can see that the system's load average increases considerably when I do not use any mutex in the code. I have no idea why this happens. (Are 4 active threads not enough to get the most out of 4 CPUs?)
Second: the program takes (a little) less time to execute with all those mutexes than without them. Why does that happen? I mean, it should take some time to put a thread to sleep and wake it up.
Edit: I guess that when I use locks, I put other threads to sleep, which eliminates a lot of context switches (saving some time). Could this be the reason?
You are using one lock per routine (each lock is shared only by the four threads that run that routine), which is why you don't see an increase in the application's execution time when you use all the mutexes: up to four threads can still be inside do_sth() at the same time, so do_sth() is not actually protected from concurrent execution.
Since all the threads are working on the same file, they should all access it under the same lock (otherwise all the threads can work on the file at the same time, which is exactly what the locks were supposed to prevent).
Try running the experiments again using just one global lock.
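A minimal sketch of the single-global-lock variant: every worker takes the same mutex around its critical section, so at most one thread at a time is inside it. A shared counter replaces do_sth() here so that the effect can be checked; global_lock, worker, run_workers and the iteration count are illustrative names and values, not taken from the question.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 16        /* 4 routines x 4 threads, as in the question */
#define ITERATIONS 10000

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;   /* stands in for the shared work in do_sth() */

static void *worker(void *param)
{
    (void)param;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&global_lock);     /* the same lock for every thread */
        shared_counter++;                     /* critical section */
        pthread_mutex_unlock(&global_lock);
    }
    return NULL;
}

/* Start NUM_THREADS workers, wait for all of them, return the final count. */
static long run_workers(void)
{
    pthread_t threads[NUM_THREADS];
    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

With four separate locks instead, up to four threads could be inside the critical section at once; timing both variants with time ./a.out gives the comparison the answer suggests.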

MMAP fails above 4k size

This is my first post, so please let me know if I have made any mistakes.
My aim is to transfer approximately 150 MB of data from the kernel to user space.
[This is because I am building a driver for the DMA device on the OMAP-L138, to transfer and receive data between the DMA device and an FPGA.]
In the Linux kernel, I allocate a buffer of variable size using dma_alloc_coherent.
I then pass the physical address of this buffer to user space, to be used as the
offset parameter of the mmap call from user space.
Data is then written to and read back from this buffer, between user space and the kernel.
This works fine as long as the buffer size is at most 4096 bytes. Above 4 KB, mmap fails and returns MAP_FAILED.
static int driver_mmap(struct file *f, struct vm_area_struct *vma)
int ret;   /* must be signed, otherwise the "< 0" error check below can never be true */
u32bit size = (vma->vm_end)-(vma->vm_start);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
if (size > (NUM_PAGE*PAGE_SIZE)){
if ((ret = remap_pfn_range(vma,vma->vm_start,
(virt_to_phys((void *)krnl_area) >> PAGE_SHIFT),
size,vma->vm_page_prot)) < 0)
return ret;
printk("\nDVR:The MMAP returned %x to USER SPACE \n",ret);
return 0;
dmasrc_ptr = dma_alloc_coherent( NULL ,GLOBAL_BUFFER_SIZE , &dmasrc ,0);
if( !dmasrc_ptr ) {
    printk(KERN_INFO "DMA_ALLOC_FAILED for the source buffer ...\n");
    return -ENOMEM;
}
printk( "\n--->The address of SRC is %p..\n",dmasrc_ptr);
// Round the allocated KERNEL MEMORY up to the page boundary
krnl_area=(int *)((((unsigned long)dmasrc_ptr) + PAGE_SIZE - 1)&PAGE_MASK);
printk(KERN_CRIT "DVR:The KERNEL VIRTUAL ADDRS is %p..\n",krnl_area);
// Marking the PAGES as RESERVED
for (i = 0; i < (NUM_PAGE * PAGE_SIZE); i+= PAGE_SIZE) {
    SetPageReserved(virt_to_page(((unsigned long)krnl_area) + i));
}
//Application code part
printf("USR:Please enter your requirement ");
printf("\nThe OPTION is %d..\n",option);
case 1 :
ret = ioctl(dev_FD ,IOCTL_UPP_START, &info);
if (ret < 0) {
printf("dma buffer info ioctl failed\n");
offset = info.var;
printf("THE ADDRESS WE GOT IS %X..\n",offset);
case 2 :
printf("THE OFFSET is %X..\n",offset);
if (mmap_Ptr == MAP_FAILED){
printf("USR[UPP] :MMAP FAiled \n\n");
printf("THE MMAP address is %p..\n", (void *)mmap_Ptr);
case 3:
struct upp_struct user_local_struct;
for (i = 0; i <(1024);i++) {
printf("WR:%X ",*(mmap_Ptr+i));
ioctl(dev_FD , IOCTL_UPP_WRITE ,&user_local_struct);
case 4:
struct upp_struct user_local_struct;
ioctl(dev_FD , IOCTL_UPP_READ,&user_local_struct);
for (i = 0; i <(1024);i++) {
printf("USR:You have entered a wrong option \n");
printf("\nUSR:Closing the FILE ENTRIES ...\n");
Use __get_free_pages() to allocate multiple physically contiguous pages, or use vmalloc(). With vmalloc(), however, you need to call remap_pfn_range() on a per-page basis, because vmalloc-ed memory is not necessarily physically contiguous.
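__get_free_pages(gfp_mask, order) takes an allocation order (the block is 2^order pages) rather than a byte count; in kernel code, get_order(size) computes that order, and free_pages(addr, order) releases the block. The user-space function below mirrors the calculation so the rounding is visible. The name order_for_size and the 4 KiB page size are assumptions, and since the page allocator caps the maximum order, a single such allocation may not be able to cover a whole 150 MB buffer.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Smallest order such that (PAGE_SIZE << order) >= size, i.e. the value
 * the kernel's get_order() returns for size > 0. */
static unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

For example, a 5000-byte buffer needs order 1 (two pages), and a 6-page buffer needs order 3, i.e. 8 pages, because orders only come in powers of two.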
