O_DIRECT not defined on Arch Linux - linux

I'm trying to write some low latency disk access code. The issue is that the library I'm using has the following code:
#ifdef O_DIRECT
int flags = O_DIRECT;
#else
int flags = 0;
#endif
and my installation doesn't have O_DIRECT defined.
I've confirmed this via this simple program:
#include <stdio.h>
int main(void){
#ifdef O_DIRECT
printf("O_DIRECT");
#else
printf("Otherwise");
#endif
}
Which prints Otherwise.
So the question is why is this not defined? And additionally how to resolve this?

First, the macro is defined in <fcntl.h>, not <stdio.h>, so your test program has to include that header.
Second, to access the definition you need to #define _GNU_SOURCE, because O_DIRECT is Linux-specific. Note that the definition must come before any libc header is included, not just before fcntl.h.

Related

Is there a Linux command (along the lines of cat or dd) that will allow me to specify the offset of the read syscall?

I am working on a homework assignment for an operating systems class, and we are implementing basic versions of certain file system operations using FUSE.
The only operation that we are implementing that I couldn't test to a point I was happy with was the read() syscall. I am having trouble finding a way to get the read() syscall to be called with an offset other than 0.
I tried some of the commands (like dd, head, and tail) mentioned in answers to this question, but by the time that they reached my implementation of the read() syscall the offset was 0. To clarify: when I called these commands, the calling terminal showed the bytes of the file that I had asked for. But another terminal, which displayed the syscalls being handled by FUSE (and hence by my implementations), showed that my implementation of the read() syscall was always being called with offset 0 (and usually a size of 4096, which I presume is the block size of the real Linux file system I am using). I assume that these commands are making read() syscalls in blocks of 4096 bytes, then internally (i.e., within the dd, head, or tail command's code rather than through syscalls) trimming the output to what is seen on the calling terminal.
Is there any command (or script) I can run (or write and then run in the case of the script) that will allow me to test this syscall with varying offset values?
I figured out the issue I was having. For posterity, I will record the answer rather than just delete my question, because the answer wasn't necessarily easy to find.
Essentially, the issue occurred within FUSE. FUSE defaults to not using direct I/O (which is definitely the correct default to have, don't get me wrong), and that is what caused the reads in 4096-byte chunks: they come from FUSE using a page cache of file contents (AKA a file content cache) in the kernel. For what I wanted to test (as explained in the question), I needed to enable direct I/O. There are a few ways of doing this, but the simplest way for me was to pass -o direct_io as a command line argument. This worked for me because I was using the fuse_main call in the main function of my program.
So my main function looked like this:
int main(int argc, char *argv[])
{
return fuse_main(argc, argv, &my_operations_structure, NULL);
}
and I was able to call my program like this (I used the -d option in addition to the -o direct_io option in order to display the syscalls that FUSE was processing and the output/debug info from my program):
./prog_name -d -o direct_io test_directory
Then, I tested my program with the following simple test program (I know I don't do very much error checking, but this program is only for some quick and dirty tests):
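#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
int main(int argc, char *argv[])
{
    FILE *file;
    char buf[4096];
    size_t size;
    int fd;
    memset(buf, 0, sizeof(buf));
    if (argc != 4)
    {
        printf("usage: ./readTest [size] [offset] [filename]\n");
        return 0;
    }
    file = fopen(argv[3], "r");
    if (file == NULL)
    {
        printf("Couldn't open file\n");
        return -1;
    }
    fd = fileno(file);
    /* Clamp the size so buf always keeps a terminating NUL for printf. */
    size = (size_t) atoi(argv[1]);
    if (size > sizeof(buf) - 1)
        size = sizeof(buf) - 1;
    /* pread() reads at the given offset without moving the file position. */
    if (pread(fd, buf, size, (off_t) atoi(argv[2])) < 0)
        perror("pread");
    printf("%s\n", buf);
    fclose(file);
    return 0;
}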

Controlling the memory map of another process

Is it possible, somehow, to change the memory map of another process in Linux? As opposed, that is, to only being able to control it by way of code running in the process itself calling mmap.
The reason I'm asking is because I'd like to be able to build a process with a very custom memory map, and without being able to use shared libraries or even the vDSO, I don't see any way to do that inside the process itself that does not involve basically writing my own libc to handle syscalls and such. (Even if I were to link libc statically, wouldn't it attempt to use the vDSO?)
Memory mapped with mmap is preserved by fork but wiped by system calls from the exec family, so this can't be achieved by the classic sequence of fork, set up stuff, then exec.
A simple solution is to use an LD_PRELOAD hook. Let's place this code in add_mmap.c.
#include <sys/mman.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
void add_mmap(void) __attribute__((constructor));
void add_mmap(void)
{
int fd;
void *addr;
printf("calling mmap() before main...\n");
fd = open("/etc/passwd", O_RDONLY);
printf("fd=%d\n", fd);
/* map the first 100 bytes of the file to an address chosen by the kernel */
addr = mmap(0, 100, PROT_READ, MAP_SHARED, fd, 0);
printf("addr=%llx\n", (long long unsigned)addr);
}
Then, build it as a dynamic library:
gcc -Wall -g -fPIC -shared -o add_mmap.so add_mmap.c
And finally run some existing program with it:
$ LD_PRELOAD=./add_mmap.so /bin/cat
calling mmap() before main...
fd=3
addr=7fe4916f8000
We can check that the mapping was set up before cat's main ran and is still in place by inspecting the maps file of the running cat process (substitute its PID):
$ cat /proc/27967/maps
...
7f2f7f2d0000-7f2f7f2d1000 r--s 00000000 09:00 1056387 /etc/passwd
...
EDIT
I only show here how to add a memory mapping before the program starts, but my example can easily be extended to transparently inject a "memory manager" thread inside the program. This thread would receive orders through an IPC mechanism like a socket and manipulate the mappings accordingly.

Why a segfault instead of privilege instruction error?

I am trying to execute the privileged instruction rdmsr in user mode, and I expect to get some kind of privilege error, but I get a segfault instead. I have checked the asm and I am loading 0x186 into ecx, which is supposed to be PERFEVTSEL0, based on the manual, page 1171.
What is the cause of the segfault, and how can I modify the code below to fix it?
I want to resolve this before hacking a kernel module, because I don't want this segfault to blow up my kernel.
Update: I am running on Intel(R) Xeon(R) CPU X3470.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <sched.h>
#include <assert.h>
uint64_t
read_msr(int ecx)
{
unsigned int a, d;
__asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
return ((uint64_t)a) | (((uint64_t)d) << 32);
}
int main(int ac, char **av)
{
uint64_t start, end;
cpu_set_t cpuset;
unsigned int c = 0x186;
int i = 0;
CPU_ZERO(&cpuset);
CPU_SET(i, &cpuset);
assert(sched_setaffinity(0, sizeof(cpuset), &cpuset) == 0);
printf("%" PRIu64 "\n", read_msr(c));
return 0;
}
The question I will try to answer: Why does the above code cause SIGSEGV instead of SIGILL, even though the code contains no memory error, only an illegal instruction (a privileged instruction executed from non-privileged user space)?
I would expect to get a SIGILL with si_code ILL_PRVOPC instead of a segfault, too. Your question is currently 3 years old and today, I stumbled upon the same behavior. I am disappointed too :-(
What is the cause of the segfault
The cause seems to be that the Linux kernel code decides to send SIGSEGV. Here is the responsible function:
http://elixir.free-electrons.com/linux/v4.9/source/arch/x86/kernel/traps.c#L487
Have a look at the last line of the function.
In your follow up question, you got a list of other assembly instructions which get propagated as SIGSEGV to userspace though they are actually general protection faults. I found your question because I triggered the behavior with cli.
and how can I modify the code below to fix it?
As of Linux kernel 4.9, I'm not aware of any reliable way to distinguish between a memory error (what I would expect to be a SIGSEGV) and a privileged instruction error from userspace.
There may be a very hacky and unportable way to distinguish these cases. When a privileged instruction causes a SIGSEGV, the siginfo_t si_code is set to a value which is not directly listed in the SIGSEGV section of man 2 sigaction. The documented values are SEGV_MAPERR, SEGV_ACCERR, SEGV_PKUERR, but I get SI_KERNEL (0x80) on my system. According to the man page, SI_KERNEL is a code "which can be placed in si_code for any signal". In strace, you see SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0}. The responsible kernel code is here.
It would also be possible to grep dmesg for this string.
Please, never ever use those two methods to distinguish between GPF and memory error on a production system.
Specific solution for your code: Just don't run rdmsr from user space. But this answer is really unsatisfying if you are looking for a generic way to figure out why a program received a SIGSEGV.

Can I use Linux kernel linked list outside kernel code?

I want to play with kernel linked list before I use it in some part of the kernel code. But if I just include list.h, it doesn't work due to dependencies.
How can I write code using list in a single .c file, e.g. test.c, so that I can test my code just by compiling test.c? Looking forward to hearing from you soon.
Also, how can I use nested linked list?
You can get a userspace port from http://www.mcs.anl.gov/~kazutomo/list/list.h.
It says:
Here is a recipe to cook list.h for user space program
- copy list.h from linux/include/list.h
- remove
  - #ifdef __KERNEL__ and its #endif
  - all #include line
  - prefetch() and rcu related functions
- add macro offsetof() and container_of
The list is not meant to be used in userspace, since it is made for use inside the kernel and depends on several kernel types and so on. You can see this by compiling your code with the correct include paths:
gcc -I path-to-kernel-src/include/ test.c
When test.c contains this code:
#include <stdio.h>
#include <stdlib.h>
#include <linux/list.h>
int main(int argc, char **argv) { }
It fails to compile because list.h pulls in includes that conflict with the userspace includes (stdlib.h).
Nevertheless, the dependencies of a data structure like list are pretty small. You need to sort them out in order to break list.h's dependencies on the rest of the kernel. In a short test, I removed the includes from list.h and added the data types struct list_head/hlist_head and hlist_node.

Question about file seeking position

My previous question was about reading and writing raw data, but a new problem has arisen; it seems there is no end to them...
The question is: the offset parameters of functions like lseek() or fseek() are all 4 bytes wide. If I want to move across a span of more than 4G, that is impossible. I know that in Win32 there is a function SetFilePointer(..., Low, High, ...) which can build 64-bit offsets, which is what I want.
But if I want to create an app on Linux or Unix (to create a file or directly write the raw drive sectors), how can I move to a position beyond 4G?
Thanx, Waiting for your replies...
The offset parameter of lseek is of type off_t. In 32-bit compilation environments, this type defaults to a 32-bit signed integer - however, if you compile with this macro defined before all system includes:
#define _FILE_OFFSET_BITS 64
...then off_t will be a 64-bit signed type.
For fseek, the fseeko function is identical except that it uses the off_t type for the offset, which allows the above solution to work with it too.
A 4-byte unsigned integer can represent values up to 4294967295, so if you want to move more than 4G you need to use lseek64(). In addition, you can use fgetpos() and fsetpos() to change the position in the file.
On Windows, use _lseeki64(), on Linux, lseek64().
I recommend using lseek64() on both systems by doing something like this:
#ifdef _WIN32
#include <io.h>
#define lseek64 _lseeki64
#else
#include <unistd.h>
#endif
That's all you need.

Resources