How do limits on shared memory work on Linux?

I was looking into the Linux kernel limits on shared memory.
/proc/sys/kernel/shmall
specifies the maximum number of pages that can be allocated. Calling this number x and the page size p, I assume that "x * p" bytes is the system-wide limit on shared memory.
I then wrote a small program that creates a shared memory segment and attaches to it twice, as below:
shm_id = shmget(IPC_PRIVATE, 4*sizeof(int), IPC_CREAT | 0666);
if (shm_id < 0) {
printf("shmget error\n");
exit(1);
}
printf("\n The shared memory created is %d",shm_id);
ptr = shmat(shm_id,NULL,0);
ptr_info = shmat(shm_id,NULL,0);
In the above program ptr and ptr_info were different, so the shared memory is mapped at two virtual addresses in my process address space.
When I run ipcs, it looks like this:
...
0x00000000 1638416 sun 666 16000000 2
...
Now, coming back to the shmall limit x * p noted above: does this limit apply to the sum of all the virtual memory mapped for every shared memory segment, or does it apply to the physical memory?
There is only one piece of physical memory here (the shared memory segment), but after the two shmat() calls in the program above, twice that amount is mapped in my process address space. So will this limit be hit quickly if I keep calling shmat() on a single shared memory segment?

The limit applies only to physical memory, that is, the real shared memory allocated for all segments, because shmat() just maps an already allocated segment into the process address space.
You can trace it in the kernel: there is only one place where this limit is checked, the newseg() function that allocates new segments (the ns->shm_ctlall comparison). The shmat() implementation does a lot of work, but it never looks at the shmall limit, so you can map one segment as many times as you want (address space is also limited, but in practice you rarely hit that limit).
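To make the accounting concrete, here is a small userspace model of the check described above. It is only a sketch: the struct and the function names model_newseg() and model_attach() are made up for illustration and are not kernel code. The one assumption it encodes is that creating a segment charges its pages against the limit while attaching charges nothing.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of the accounting; not kernel code. */
struct shm_model {
    unsigned long shm_tot;    /* pages currently allocated to segments */
    unsigned long shm_ctlall; /* limit in pages (the shmall value) */
    unsigned long page_size;  /* bytes per page */
};

/* Creating a segment charges its pages against the limit. */
int model_newseg(struct shm_model *m, size_t size)
{
    unsigned long numpages = (size + m->page_size - 1) / m->page_size;

    if (m->shm_tot + numpages < m->shm_tot ||  /* counter would wrap */
        m->shm_tot + numpages > m->shm_ctlall) /* over the limit */
        return -ENOSPC;
    m->shm_tot += numpages;
    return 0;
}

/* Attaching only maps pages that already exist, so nothing is charged. */
int model_attach(const struct shm_model *m)
{
    (void)m;
    return 0;
}
```

With shm_tot one page below shm_ctlall, one page-sized model_newseg() succeeds, any number of model_attach() calls leave the counter unchanged, and the next model_newseg() returns -ENOSPC ("No space left on device").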
You can also test this from userspace with a simple program like this one:
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>
unsigned long int get_shmall() {
FILE *f = NULL;
char buf[512];
unsigned long int value = 0;
if ((f = fopen("/proc/sys/kernel/shmall", "r")) != NULL) {
if (fgets(buf, sizeof(buf), f) != NULL)
value = strtoul(buf, NULL, 10); // no proper checks
fclose(f); // no return value check
}
return value;
}
int set_shmall(unsigned long int value) {
FILE *f = NULL;
char buf[512];
int retval = 0;
if ((f = fopen("/proc/sys/kernel/shmall", "w")) != NULL) {
if (snprintf(buf, sizeof(buf), "%lu\n", value) >= sizeof(buf) ||
fwrite(buf, 1, strlen(buf), f) != strlen(buf))
retval = -1;
fclose(f); // fingers crossed
} else
retval = -1;
return retval;
}
int main()
{
int shm_id1 = -1, shm_id2 = -1;
unsigned long int shmall = 0, shmused, newshmall;
void *ptr1, *ptr2;
struct shm_info shminf;
if ((shmall = get_shmall()) == 0) {
printf("can't get shmall\n");
goto out;
}
printf("original shmall: %lu pages\n", shmall);
if (shmctl(0, SHM_INFO, (struct shmid_ds *)&shminf) < 0) {
printf("can't get SHM_INFO\n");
goto out;
}
shmused = shminf.shm_tot * getpagesize();
printf("shmused: %lu pages (%lu bytes)\n", shminf.shm_tot, shmused);
newshmall = shminf.shm_tot + 1;
if (set_shmall(newshmall) != 0) {
printf("can't set shmall\n");
goto out;
}
if (get_shmall() != newshmall) {
printf("something went wrong with shmall setting\n");
goto out;
}
printf("new shmall: %lu pages (%lu bytes)\n", newshmall, newshmall * getpagesize());
printf("shmget() for %u bytes: ", (unsigned int) getpagesize());
shm_id1 = shmget(IPC_PRIVATE, (size_t)getpagesize(), IPC_CREAT | 0666);
if (shm_id1 < 0) {
printf("failed: %s\n", strerror(errno));
goto out;
}
printf("ok\nshmat 1: ");
ptr1 = shmat(shm_id1, NULL, 0);
if (ptr1 == (void *)-1) { // shmat() returns (void *)-1 on failure, not NULL
printf("failed\n");
goto out;
}
printf("ok\nshmat 2: ");
ptr2 = shmat(shm_id1, NULL, 0);
if (ptr2 == (void *)-1) { // shmat() returns (void *)-1 on failure, not NULL
printf("failed\n");
goto out;
}
printf("ok\n");
if (ptr1 == ptr2) {
printf("ptr1 and ptr2 are the same with shm_id1\n");
goto out;
}
printf("shmget() for %u bytes: ", (unsigned int) getpagesize());
shm_id2 = shmget(IPC_PRIVATE, (size_t)getpagesize(), IPC_CREAT | 0666);
if (shm_id2 < 0)
printf("failed: %s\n", strerror(errno));
else
printf("ok, although it's wrong\n");
out:
if (shmall != 0 && set_shmall(shmall) != 0)
printf("failed to restore shmall\n");
if (shm_id1 >= 0 && shmctl(shm_id1, IPC_RMID, NULL) < 0)
printf("failed to remove shm_id1\n");
if (shm_id2 >= 0 && shmctl(shm_id2, IPC_RMID, NULL) < 0)
printf("failed to remove shm_id2\n");
return 0;
}
What it does: it sets the shmall limit to just one page above what the system currently uses, then creates a new page-sized segment and maps it twice (all successfully), then tries to create one more page-sized segment and fails (run the program as superuser, because it writes to /proc/sys/kernel/shmall):
$ sudo ./a.out
original shmall: 18446744073708503040 pages
shmused: 21053 pages (86233088 bytes)
new shmall: 21054 pages (86237184 bytes)
shmget() for 4096 bytes: ok
shmat 1: ok
shmat 2: ok
shmget() for 4096 bytes: failed: No space left on device
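The byte figures in the output above are just the page counts multiplied by the page size (4096 here). Since shmall can be close to the maximum of unsigned long, as the "original shmall" line shows, a naive multiplication can wrap around. A minimal sketch of a saturating conversion follows; the helper name pages_to_bytes() is made up for illustration.

```c
#include <limits.h>

/* Convert a page count to bytes, saturating at ULONG_MAX instead of wrapping. */
unsigned long pages_to_bytes(unsigned long pages, unsigned long page_size)
{
    if (page_size != 0 && pages > ULONG_MAX / page_size)
        return ULONG_MAX;
    return pages * page_size;
}
```

For example, pages_to_bytes(21054, 4096) gives 86237184, matching the "new shmall" line.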

I did not find any physical memory allocation in the do_shmat() function (linux/ipc/shm.c):
https://github.com/torvalds/linux/blob/5469dc270cd44c451590d40c031e6a71c1f637e8/ipc/shm.c
So shmat() consumes only virtual memory (your process address space);
its main job is to mmap the segment.

Related

Get the serial number of the volume udf cd / dvd disk?

I'm writing a program on Linux that computes the volume serial number (xxxx-xxxx) of a CD as Windows 7 reports it. My program correctly determines the volume serial number on discs with the ISO9660 and Joliet filesystems. But how do I determine the volume serial number of a disc with a UDF filesystem? Can someone tell me?
P.S. In case it is unclear, I'm talking about a serial number of this kind: https://extra-torrent.jimdo.com/2016/01/23/hard-disk-volume-serial-number-change/
#include <QCoreApplication>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>
#include <string.h>
#include <szi/szimac.h>
#include <qfile.h>
#include <iostream>
#include <QDir>
#include <unistd.h>
#define SEC_SIZE 2048
#define VD_N 16
#define VD_TYPE_SUPP 2
#define VD_TYPE_END 255
#define ESC_IDX 88
#define ESC_LEN 3
#define ESC_UCS2L1 "%/#"
#define ESC_UCS2L2 "%/C"
#define ESC_UCS2L3 "%/E"
using namespace std;
int cdid(unsigned char pvd[SEC_SIZE])
{
unsigned char part[4] = {0};
int i;
for(i = 0; i < SEC_SIZE; i += 4)
{
part[3] += pvd[i + 0];
part[2] += pvd[i + 1];
part[1] += pvd[i + 2];
part[0] += pvd[i + 3];
}
return (part[3] << 24) + (part[2] << 16) + (part[1] << 8) + part[0];
}
int main(int argc, char *argv[])
{
FILE *in;
unsigned char buf[SEC_SIZE];
struct cdrom_multisession msinfo;
long session_start;
int id;
QString home=QString(getenv("HOME"))+QString("/chteniestorm");
QFile file(home);
QString ustr = "/dev/sr0"; // device path; the original snippet used ustr without declaring it
in = fopen(ustr.toLocal8Bit().data(), "rb");
if(in == NULL)
{
if (file.open(QIODevice::WriteOnly))
{
file.write("sernom=1");
file.close();
}
cout<<"netdiska"<<endl;
return 0;
}
/* Get session info */
msinfo.addr_format = CDROM_LBA;
if(ioctl(fileno(in), CDROMMULTISESSION, &msinfo) != 0)
{
fprintf(stderr, "WARNING: Can't get multisession info\n");
perror(NULL);
session_start = 0;
}
else
{
session_start = msinfo.addr.lba;
}
fseek(in, 0, SEEK_SET); // back to the beginning
/* Seek to primary volume descriptor */
if(fseek(in, (session_start + VD_N) * SEC_SIZE, SEEK_SET) != 0)
{
if (file.open(QIODevice::WriteOnly))
{
file.write("sernom=2");
file.close();
}
fclose(in);
return 0;
}
/* Read descriptor */
if(fread(buf, 1, SEC_SIZE, in) != SEC_SIZE)
{
if (file.open(QIODevice::WriteOnly))
{
file.write("sernom=3");
file.close();
}
fclose(in);
return 0;
}
/* Calculate disc id */
id = cdid(buf);
/* Search for Joliet extension */
while(buf[0] != VD_TYPE_END)
{
/* Read descriptor */
if(fread(buf, 1, SEC_SIZE, in) != SEC_SIZE)
{
perror(NULL);
return 0;
}
if(buf[0] == VD_TYPE_SUPP
&& (memcmp(buf + ESC_IDX, ESC_UCS2L1, ESC_LEN) == 0
|| memcmp(buf + ESC_IDX, ESC_UCS2L2, ESC_LEN) == 0
|| memcmp(buf + ESC_IDX, ESC_UCS2L3, ESC_LEN) == 0)
)
{
/* Joliet found */
id = cdid(buf);
}
}
fclose(in);
}

It looks like this question has been asked in several places [1], [2], [3], [4] but was never answered, so I will answer it here.
In some of those posts, people decoded the serial number generation algorithm. It is just a checksum, which you have already found and put into your cdid() function. Windows uses the same checksum algorithm for both ISO9660 and UDF filesystems. You have already figured out which ISO9660 structures the checksum is calculated from.
So your question remains open only for UDF. For a UDF filesystem, Windows calculates the checksum from the 512-byte File Set Descriptor (FSD) structure. I suggest reading the OSTA UDF specification to learn how to locate the FSD on a UDF disc.
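As a sketch of what that means in code, the cdid() routine from the question can be generalized to a buffer of any length (a multiple of 4), so the same sums can be run over a 512-byte FSD once you have located and read it. The names serial_checksum() and format_serial() are made up for illustration, and the uppercase XXXX-XXXX formatting is an assumption based on the format mentioned in the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same byte-wise sums as cdid(), for any buffer whose length is a multiple of 4. */
uint32_t serial_checksum(const unsigned char *data, size_t len)
{
    unsigned char part[4] = {0};

    for (size_t i = 0; i + 3 < len; i += 4) {
        part[3] += data[i + 0];
        part[2] += data[i + 1];
        part[1] += data[i + 2];
        part[0] += data[i + 3];
    }
    return ((uint32_t)part[3] << 24) | ((uint32_t)part[2] << 16) |
           ((uint32_t)part[1] << 8) | part[0];
}

/* Format as XXXX-XXXX; out must hold at least 10 bytes. */
void format_serial(uint32_t serial, char out[10])
{
    snprintf(out, 10, "%04X-%04X",
             (unsigned)(serial >> 16), (unsigned)(serial & 0xFFFFu));
}
```

Calling serial_checksum(fsd, 512) on the raw FSD bytes would then play the role that cdid(buf) plays for the ISO9660 descriptors.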
Basically, for plain UDF that does not use a Virtual Allocation Table (VAT), Sparing Table or Metadata Partition, the location of the FSD is stored in the Logical Volume Descriptor (LVD) structure, in the LogicalVolumeContentsUse field (of type long_ad). The LVD is stored in the Volume Descriptor Sequence (VDS). The location of the VDS is stored in the Anchor Volume Descriptor Pointer (AVDP), in the MainVolumeDescriptorSequenceExtent field. The AVDP itself is located at sector 256 of the medium. Optical media have a sector size of 2048 bytes, and common hard disks 512 bytes.
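The first step of that chain (finding the AVDP and reading where the VDS starts) could look roughly like the sketch below. The byte offsets and the tag identifier value 2 are my reading of the ECMA-167 descriptor layout (16-byte tag, then the main VDS extent as a 32-bit length followed by a 32-bit location, all little-endian), so treat them as assumptions and check them against the specification. The function names are made up for illustration.

```c
#include <stdint.h>

#define AVDP_SECTOR 256u /* from the answer: the AVDP lives at sector 256 */
#define TAG_ID_AVDP 2u   /* assumed ECMA-167 tag identifier for an AVDP */

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Byte offset of the AVDP for a given sector size (2048 optical, 512 hard disk). */
uint64_t avdp_offset(uint32_t sector_size)
{
    return (uint64_t)AVDP_SECTOR * sector_size;
}

/* Sum of tag bytes 0-3 and 5-15, modulo 256 (byte 4 stores the checksum). */
unsigned char tag_checksum(const unsigned char *tag)
{
    unsigned char sum = 0;

    for (int i = 0; i < 16; i++)
        if (i != 4)
            sum += tag[i];
    return sum;
}

/*
 * Reads MainVolumeDescriptorSequenceExtent from an AVDP sector.
 * Returns 0 on success, -1 if the tag does not look like an AVDP.
 */
int avdp_main_vds(const unsigned char *sector, uint32_t *length, uint32_t *location)
{
    if (le16(sector) != TAG_ID_AVDP || sector[4] != tag_checksum(sector))
        return -1;
    *length = le32(sector + 16);   /* extent length in bytes */
    *location = le32(sector + 20); /* first sector of the VDS */
    return 0;
}
```

From there you would scan the VDS for the LVD, read its LogicalVolumeContentsUse field, and follow it to the FSD, which this sketch does not attempt.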
For UDF with VAT (e.g. on CD-R/DVD-R/BD-R), a Sparing Table (e.g. on CD-RW/DVD-RW) or a Metadata Partition (e.g. on Blu-ray), it is much more complicated. You need to look into the Virtual, Sparable or Metadata Partition to figure out how to translate the logical location of the FSD into a physical location on the medium.
The udftools project, starting with version 2.0, includes a new tool, udfinfo, which provides various information about a UDF filesystem. It also shows the Windows-specific volume serial number from your question under the winserialnum key. Note that udfinfo cannot yet read the FSD from a UDF filesystem with VAT or Metadata.

Why does my process take too long to die?

Basically I'm using Linux 2.6.34 on PowerPC (Freescale e500mc). I have a process (a kind of VM that was developed in-house) that uses about 2.25 G of mlocked VM. When I kill it, I notice that it takes upwards of 2 minutes to terminate.
I investigated a little. First, I closed all open file descriptors but that didn't seem to make a difference. Then I added some printk in the kernel and through it I found that all delay comes from the kernel unlocking my VMAs. The delay is uniform across pages, which I verified by repeatedly checking the locked page count in /proc/meminfo. I've checked with programs that allocate that much memory and they all die as soon as I signal them.
What do you think I should check now? Thanks for your replies.
Edit: To share more information about the problem, I wrote the program below:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#define MAP_PERM_1 (PROT_WRITE | PROT_READ | PROT_EXEC)
#define MAP_PERM_2 (PROT_WRITE | PROT_READ)
#define MAP_FLAGS (MAP_ANONYMOUS | MAP_FIXED | MAP_PRIVATE)
#define PG_LEN 4096
#define align_pg_32(addr) (addr & 0xFFFFF000)
#define num_pg_in_range(start, end) ((end - start + 1) >> 12)
inline void __force_pgtbl_alloc(unsigned int start)
{
volatile int *s = (int *) start;
*s = *s;
}
int __map_a_page_at(unsigned int start, int whichperm)
{
int perm = whichperm ? MAP_PERM_1 : MAP_PERM_2;
if(MAP_FAILED == mmap((void *)start, PG_LEN, perm, MAP_FLAGS, 0, 0)){
fprintf(stderr,
"mmap failed at 0x%x: %s.\n",
start, strerror(errno));
return 0;
}
return 1;
}
int __mlock_page(unsigned int addr)
{
if (mlock((void *)addr, (size_t)PG_LEN) < 0){
fprintf(stderr,
"mlock failed on page: 0x%x: %s.\n",
addr, strerror(errno));
return 0;
}
return 1;
}
void sigint_handler(int p)
{
struct timeval start = {0 ,0}, end = {0, 0}, diff = {0, 0};
gettimeofday(&start, NULL);
munlockall();
gettimeofday(&end, NULL);
timersub(&end, &start, &diff);
printf("Munlock'd entire VM in %ld secs %ld usecs.\n",
(long)diff.tv_sec, (long)diff.tv_usec);
exit(0);
}
int make_vma_map(unsigned int start, unsigned int end)
{
int num_pg = num_pg_in_range(start, end);
if (end < start){
fprintf(stderr,
"Bad range: start: 0x%x end: 0x%x.\n",
start, end);
return 0;
}
for (; num_pg; num_pg --, start += PG_LEN){
if (__map_a_page_at(start, num_pg % 2) && __mlock_page(start))
__force_pgtbl_alloc(start);
else
return 0;
}
return 1;
}
void display_banner()
{
printf("-----------------------------------------\n");
printf("Virtual memory allocator. Ctrl+C to exit.\n");
printf("-----------------------------------------\n");
}
int main()
{
unsigned int vma_start, vma_end, input = 0;
int start_end = 0; // 0: start; 1: end;
display_banner();
// Bind SIGINT handler.
signal(SIGINT, sigint_handler);
while (1){
if (!start_end)
printf("start:\t");
else
printf("end:\t");
scanf("%i", &input);
if (start_end){
vma_end = align_pg_32(input);
make_vma_map(vma_start, vma_end);
}
else{
vma_start = align_pg_32(input);
}
start_end = !start_end;
}
return 0;
}
As you would see, the program accepts ranges of virtual addresses, each range being defined by start and end. Each range is then further subdivided into page-sized VMAs by giving different permissions to adjacent pages. Interrupting (using SIGINT) the program triggers a call to munlockall() and the time for said procedure to complete is duly noted.
Now, when I run it on a Freescale e500mc with Linux 2.6.34 over the range 0x30000000-0x35000000, I get a total munlockall() time of almost 45 seconds. However, if I do the same thing with smaller start-end ranges in random order (that is, not necessarily increasing addresses) such that the total number of pages (and locked VMAs) is roughly the same, I observe a total munlockall() time of no more than 4 seconds.
I tried the same thing on x86_64 with Linux 2.6.34, with my program compiled with -m32. The variation is not as pronounced as on ppc, but it is still 8 seconds for the first case versus under a second for the second.
I also tried the program on Linux 2.6.10 at one end and 3.19 at the other, and these large differences don't appear there. What's more, munlockall() always completes in under a second.
So it seems that the problem, whatever it is, exists only around version 2.6.34 of the Linux kernel.

You said the VM was developed in-house. Does this mean you have access to the source? I would start by checking to see if it has anything to stop it from immediately terminating to avoid data loss.
Otherwise, could you potentially try to provide more information? You may also want to check out: https://unix.stackexchange.com/ as they would be better suited to help with any issues the linux kernel may be having.

Why does a file with a hole use fewer disk blocks than a file without a hole?

#include "apue.h" /* from the book: provides err_sys() and FILE_MODE */
#include <fcntl.h>
#include <unistd.h>
char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";
char buf3[10];
int
main(void)
{
int fd;
if ((fd = creat("file.hole", FILE_MODE)) < 0) {
err_sys("creat error");
}
if (write(fd, buf1, 10) != 10) { // offset is now = 10
err_sys("buf1 write error");
}
if (lseek(fd, 16380, SEEK_SET) == -1) { // offset now = 16380
err_sys("lseek error");
}
if (write(fd, buf2, 10) != 10) { // offset now = 16390
err_sys("buf2 write error");
}
close(fd);
if ((fd = open("file.hole", O_RDWR)) == -1) {
err_sys("failed to re-open file");
}
ssize_t n;
ssize_t m;
while ((n = read(fd, buf3, 10)) > 0) {
if ((m = write(STDOUT_FILENO, buf3, n)) != n) { // write only the bytes actually read
err_sys("stdout write error");
}
}
if (n == -1) {
err_sys("buf3 read error");
}
exit(0);
}
I'm a newbie in Unix system programming.
The code above creates a file with a hole.
The output is:
$ ls -ls file.hole file.nohole
8 -rw-r--r-- 1 sar 16394 time file.hole
20 -rw-r--r-- 1 sar 16394 time file.nohole
Why does the file with a hole use fewer disk blocks than the file without a hole?
I would have thought the file without a hole takes fewer disk blocks,
because the file with a hole is more spread out than the one without.
From "Advanced Programming in the UNIX Environment", 3rd edition, Stevens and Rago, example 3.2.

Why do you think that a file without a hole takes less space? It is exactly the opposite.
If the file has holes, no disk blocks need to be reserved for them.
The number of disk blocks is not related to how spread out the file is, but directly to the amount of data you wrote to the file.
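You can see the difference between apparent size and allocated space with fstat(): st_size reports the file length including the hole, while st_blocks reports what is actually allocated (in 512-byte units on Linux). Here is a minimal sketch; the function name make_file_with_hole() is made up for illustration, and whether the hole really stays unallocated depends on the filesystem.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes 10 bytes, seeks to hole_end, writes 10 more, then reports the
 * apparent size (st_size) and the allocated 512-byte units (st_blocks).
 * Returns 0 on success, -1 on error.
 */
int make_file_with_hole(const char *path, off_t hole_end,
                        off_t *size, long long *blocks)
{
    struct stat st;
    int ok;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    ok = write(fd, "abcdefghij", 10) == 10 &&
         lseek(fd, hole_end, SEEK_SET) != (off_t)-1 &&
         write(fd, "ABCDEFGHIJ", 10) == 10 &&
         fstat(fd, &st) == 0;
    close(fd);
    if (!ok)
        return -1;
    *size = st.st_size;
    *blocks = (long long)st.st_blocks;
    return 0;
}
```

With hole_end set to 16384 the reported size is 16394 bytes, as in the ls output, while on a filesystem that supports holes the block count covers only the two regions that were actually written.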

The distribution of the data blocks on the hard disk doesn't affect the number of blocks the file system needs to store the data. It really doesn't matter whether the blocks are close together or far apart, since the file system can use the blocks in between for other files.
So the output shows you that file.hole occupies only 8 blocks on the disk, not where they are.

mmap /dev/fb0 fails with "Invalid argument"

I have an embedded system and want to use /dev/fb0 directly. As a first test, I used some code based on example code found all over the net and on SO. Opening succeeds, as do fstat and similar calls. But mmap fails with EINVAL.
Source:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
int main() {
int fbfd = 0;
struct fb_var_screeninfo vinfo;
struct fb_fix_screeninfo finfo;
long int screensize = 0;
char *fbp = 0;
int x = 0, y = 0;
long int location = 0;
// Open the file for reading and writing
fbfd = open("/dev/fb0", O_RDWR);
if (fbfd == -1) {
perror("Error: cannot open framebuffer device");
exit(1);
}
printf("The framebuffer device was opened successfully.\n");
struct stat stat;
fstat(fbfd, &stat);
printf("/dev/mem -> size: %u blksize: %u blkcnt: %u\n",
stat.st_size, stat.st_blksize, stat.st_blocks);
// Get fixed screen information
if (ioctl(fbfd, FBIOGET_FSCREENINFO, &finfo) == -1) {
perror("Error reading fixed information");
exit(2);
}
// Get variable screen information
if (ioctl(fbfd, FBIOGET_VSCREENINFO, &vinfo) == -1) {
perror("Error reading variable information");
exit(3);
}
printf("%dx%d, %dbpp\n", vinfo.xres, vinfo.yres, vinfo.bits_per_pixel);
// Figure out the size of the screen in bytes
screensize = vinfo.xres * vinfo.yres * vinfo.bits_per_pixel / 8;
const int PADDING = 4096;
int mmapsize = (screensize + PADDING - 1) & ~(PADDING-1);
// Map the device to memory
fbp = (char *)mmap(0, mmapsize, PROT_READ | PROT_WRITE, MAP_SHARED, fbfd, 0);
if (fbp == MAP_FAILED) { // compare against MAP_FAILED instead of casting the pointer to int
perror("Error: failed to map framebuffer device to memory");
exit(4);
}
printf("The framebuffer device was mapped to memory successfully.\n");
munmap(fbp, mmapsize); // unmap the same length that was mapped
close(fbfd);
return 0;
}
Output:
The framebuffer device was opened successfully.
/dev/mem -> size: 0 blksize: 4096 blkcnt: 0
640x480, 4bpp
Error: failed to map framebuffer device to memory: Invalid argument
strace:
...
open("/dev/fb0", O_RDWR) = 3
write(1, "The framebuffer device was opene"..., 48The framebuffer device was opened successfully.
) = 48
fstat64(3, {st_mode=S_IFCHR|0640, st_rdev=makedev(29, 0), ...}) = 0
write(1, "/dev/mem -> size: 0 blksize: 409"..., 44/dev/mem -> size: 0 blksize: 4096 blkcnt: 0
) = 44
ioctl(3, FBIOGET_FSCREENINFO or FBIOPUT_CONTRAST, 0xbfca6564) = 0
ioctl(3, FBIOGET_VSCREENINFO, 0xbfca6600) = 0
write(1, "640x480, 4bpp\n", 14640x480, 4bpp
) = 14
old_mmap(NULL, 155648, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 EINVAL (Invalid argument)
write(2, "Error: failed to map framebuffer"..., 49Error: failed to map framebuffer device to memory) = 49
write(2, ": ", 2: ) = 2
write(2, "Invalid argument", 16Invalid argument) = 16
write(2, "\n", 1
) = 1
The boot screen with console and Tux is visible, and cat /dev/urandom > /dev/fb0 fills the screen with noise. The page size is 4096 on the system (getconf PAGESIZE), so 155648 (0x26000) is a multiple of it. Offset and pointer are both zero. Mapping and file mode are both RW. What am I missing?
This is for an embedded device built with uClibc and busybox running a single application, and I have to port it from an ancient kernel. There is existing code for line drawing and such, and no need for multiprocessing or windowing, so please no pointers to DirectFB ;).
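For reference, the 155648-byte length seen in the strace output follows from the screen geometry and the page rounding in the program. The sketch below just reproduces that arithmetic; the helper names are made up for illustration.

```c
#include <stddef.h>

/* Bytes needed for one frame: width * height * bits per pixel, in bytes. */
size_t frame_bytes(size_t xres, size_t yres, size_t bits_per_pixel)
{
    return xres * yres * bits_per_pixel / 8;
}

/* Round n up to a multiple of align; align must be a power of two. */
size_t round_up_pow2(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}
```

For 640x480 at 4 bpp, frame_bytes() gives 153600, and rounding up to the 4096-byte page size gives 155648, so the length passed to mmap is page-aligned as stated.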

The kernel driver that presents the framebuffer doesn't support the legacy direct mmap() of the framebuffer device; you need to use the newer DRM and KMS interface.

Linux kernel, userspace buffers: do access_ok and wait create a race condition?

In the following code (the read implementation for a char driver), is it possible for MMU TLB entries to change during wait_event_interruptible, such that __put_user causes an exception even though access_ok succeeded?
Is it possible to lock the user buffer such that it remains valid for the duration of the request?
Would repeating the access_ok check after wait_event_interruptible returns make this safe?
ssize_t mydriver_pkt_read( struct file* filp, char __user* const buff, size_t count, loff_t* offp )
{
struct mydriver_pkt_private* priv;
volatile unsigned short* iobase;
unsigned next;
char __user* p = buff;
if (count <= 0) return -EINVAL;
if (!access_ok(VERIFY_WRITE, buff, count)) return -EFAULT;
priv = (struct mydriver_pkt_private*)filp->private_data;
iobase = priv->iobase;
next = priv->retained;
if ((next & PKTBUF_FLAG_NOTEMPTY) == 0) {
next = ioread16(iobase);
if ((next & PKTBUF_FLAG_NOTEMPTY) == 0) { // no data, start blocking read
iowrite16(1, iobase); // enable interrupts
if (wait_event_interruptible(priv->wait_for_ringbuffer, (priv->retained & PKTBUF_FLAG_NOTEMPTY)))
return -ERESTARTSYS;
next = priv->retained;
}
}
while (count > 0) {
__put_user( (char)next, p );
p++;
count--;
next = ioread16(iobase);
if ((next & PKTBUF_FLAG_STARTPKT) || !(next & PKTBUF_FLAG_NOTEMPTY)) {
priv->retained = next;
return (p - buff);
}
}
/* discard remainder of packet */
do {
next = ioread16(iobase);
} while ((next & PKTBUF_FLAG_NOTEMPTY) && !(next & PKTBUF_FLAG_STARTPKT));
priv->retained = next;
return (p - buff);
}
Exclusive open code:
int mydriver_pkt_open( struct inode* inode, struct file* filp )
{
struct mydriver_pkt_private* priv;
priv = container_of(inode->i_cdev, struct mydriver_pkt_private, cdevnode);
if (atomic_cmpxchg(&priv->inuse, 0, 1))
return -EBUSY;
nonseekable_open(inode, filp);
filp->private_data = priv;
return 0;
}

Unless you hold the mmap_sem semaphore, page tables can change at any time (because other threads of the same process unmap pages on a different processor, or because page reclaim evicts pages). You don't even need to sleep; it can happen with preemption disabled, as long as the TLB shootdown interrupt can arrive. On an SMP machine it can happen even with interrupts disabled, because you can sometimes see page table updates reflected without an explicit TLB flush.
access_ok() only checks that the address range does not overlap kernel space. So it tells you nothing about whether the page table entries allow access, but its result also does not change, even if you block. If access is denied, __put_user() returns -EFAULT, which must be propagated to userspace (i.e., error out here with -EFAULT).
Note that the only difference between put_user() and __put_user() is that put_user() also performs an access_ok() check. So if you're using it in a loop, doing a single access_ok() ahead of time and then using __put_user() is probably the right thing to do.
