Where is sys_mount defined in linux kernel(.14.255)? - linux

I find the use of 'sys_' in many places in linux kernel,such as 'sys_mount' ,'sys_unlink'......
But I can't find where are these functions(or something else) defined.I guess they are defined in some ways like #define sys(name) sys_##name(){}.
Can you tell me where and how they are defined?The kernel version is 4.14.255.

The table of syscalls seems to be defined for each architecture, e.g.
#
# 64-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point>
#
# The abi is "common", "64" or "x32" for this file.
#
0 common read sys_read
1 common write sys_write
2 common open sys_open
3 common close sys_close
...
Source: https://github.com/torvalds/linux/blob/v4.14/arch/x86/entry/syscalls/syscall_64.tbl
The functions themselves are spread throughout the code. For example, sys_read is defined as:
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
return ksys_read(fd, buf, count);
}
Source: https://github.com/torvalds/linux/blob/v4.14/fs/read_write.c#L615-L618
The SYSCALL_DEFINE3 macro is defined at https://github.com/torvalds/linux/blob/v4.14/include/linux/syscalls.h#L198.

Related

RIP register doesn't understand valid memory address [duplicate]

I want a simple C method to be able to run hex bytecode on a Linux 64 bit machine. Here's the C program that I have:
char code[] = "\x48\x31\xc0";
#include <stdio.h>
int main(int argc, char **argv)
{
int (*func) ();
func = (int (*)()) code;
(int)(*func)();
printf("%s\n","DONE");
}
The code that I am trying to run ("\x48\x31\xc0") I obtained by writting this simple assembly program (it's not supposed to really do anything)
.text
.globl _start
_start:
xorq %rax, %rax
and then compiling and objdump-ing it to obtain the bytecode.
However, when I run my C program I get a segmentation fault. Any ideas?
Machine code has to be in an executable page. Your char code[] is in the read+write data section, without exec permission, so the code cannot be executed from there.
Here is a simple example of allocating an executable page with mmap:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main ()
{
char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi]
0xC3 // ret
};
int (*sum) (int, int) = NULL;
// allocate executable buffer
sum = mmap (0, sizeof(code), PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
// copy code to buffer
memcpy (sum, code, sizeof(code));
// doesn't actually flush cache on x86, but ensure memcpy isn't
// optimized away as a dead store.
__builtin___clear_cache (sum, sum + sizeof(sum)); // GNU C
// run code
int a = 2;
int b = 3;
int c = sum (a, b);
printf ("%d + %d = %d\n", a, b, c);
}
See another answer on this question for details about __builtin___clear_cache.
Until recent Linux kernel versions (sometime before 5.4), you could simply compile with gcc -z execstack - that would make all pages executable, including read-only data (.rodata), and read-write data (.data) where char code[] = "..." goes.
Now -z execstack only applies to the actual stack, so it currently works only for non-const local arrays. i.e. move char code[] = ... into main.
See Linux default behavior against `.data` section for the kernel change, and Unexpected exec permission from mmap when assembly files included in the project for the old behaviour: enabling Linux's READ_IMPLIES_EXEC process for that program. (In Linux 5.4, that Q&A shows you'd only get READ_IMPLIES_EXEC for a missing PT_GNU_STACK, like a really old binary; modern GCC -z execstack would set PT_GNU_STACK = RWX metadata in the executable, which Linux 5.4 would handle as making only the stack itself executable. At some point before that, PT_GNU_STACK = RWX did result in READ_IMPLIES_EXEC.)
The other option is to make system calls at runtime to copy into an executable page, or change permissions on the page it's in. That's still more complicated than using a local array to get GCC to copy code into executable stack memory.
(I don't know if there's an easy way to enable READ_IMPLIES_EXEC under modern kernels. Having no GNU-stack attribute at all in an ELF binary does that for 32-bit code, but not 64-bit.)
Yet another option is __attribute__((section(".text"))) const char code[] = ...;
Working example: https://godbolt.org/z/draGeh.
If you need the array to be writeable, e.g. for shellcode that inserts some zeros into strings, you could maybe link with ld -N. But probably best to use -z execstack and a local array.
Two problems in the question:
exec permission on the page, because you used an array that will go in the noexec read+write .data section.
your machine code doesn't end with a ret instruction so even if it did run, execution would fall into whatever was next in memory instead of returning.
And BTW, the REX prefix is totally redundant. "\x31\xc0" xor eax,eax has exactly the same effect as xor rax,rax.
You need the page containing the machine code to have execute permission. x86-64 page tables have a separate bit for execute separate from read permission, unlike legacy 386 page tables.
The easiest way to get static arrays to be in read+exec memory was to compile with gcc -z execstack. (Used to make the stack and other sections executable, now only the stack).
Until recently (2018 or 2019), the standard toolchain (binutils ld) would put section .rodata into the same ELF segment as .text, so they'd both have read+exec permission. Thus using const char code[] = "..."; was sufficient for executing manually-specified bytes as data, without execstack.
But on my Arch Linux system with GNU ld (GNU Binutils) 2.31.1, that's no longer the case. readelf -a shows that the .rodata section went into an ELF segment with .eh_frame_hdr and .eh_frame, and it only has Read permission. .text goes in a segment with Read + Exec, and .data goes in a segment with Read + Write (along with the .got and .got.plt). (What's the difference of section and segment in ELF file format)
I assume this change is to make ROP and Spectre attacks harder by not having read-only data in executable pages where sequences of useful bytes could be used as "gadgets" that end with the bytes for a ret or jmp reg instruction.
// TODO: use char code[] = {...} inside main, with -z execstack, for current Linux
// Broken on recent Linux, used to work without execstack.
#include <stdio.h>
// can be non-const if you use gcc -z execstack. static is also optional
static const char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi] // retval = a+b;
0xC3 // ret
};
static const char ret0_code[] = "\x31\xc0\xc3"; // xor eax,eax ; ret
// the compiler will append a 0 byte to terminate the C string,
// but that's fine. It's after the ret.
int main () {
// void* cast is easier to type than a cast to function pointer,
// and in C can be assigned to any other pointer type. (not C++)
int (*sum) (int, int) = (void*)code;
int (*ret0)(void) = (void*)ret0_code;
// run code
int c = sum (2, 3);
return ret0();
}
On older Linux systems: gcc -O3 shellcode.c && ./a.out (Works because of const on global/static arrays)
On Linux before 5.5 (or so) gcc -O3 -z execstack shellcode.c && ./a.out (works because of -zexecstack regardless of where your machine code is stored). Fun fact: gcc allows -zexecstack with no space, but clang only accepts clang -z execstack.
These also work on Windows, where read-only data goes in .rdata instead of .rodata.
The compiler-generated main looks like this (from objdump -drwC -Mintel). You can run it inside gdb and set breakpoints on code and ret0_code
(I actually used gcc -no-pie -O3 -zexecstack shellcode.c hence the addresses near 401000
0000000000401020 <main>:
401020: 48 83 ec 08 sub rsp,0x8 # stack aligned by 16 before a call
401024: be 03 00 00 00 mov esi,0x3
401029: bf 02 00 00 00 mov edi,0x2 # 2 args
40102e: e8 d5 0f 00 00 call 402008 <code> # note the target address in the next page
401033: 48 83 c4 08 add rsp,0x8
401037: e9 c8 0f 00 00 jmp 402004 <ret0_code> # optimized tailcall
Or use system calls to modify page permissions
Instead of compiling with gcc -zexecstack, you can instead use mmap(PROT_EXEC) to allocate new executable pages, or mprotect(PROT_EXEC) to change existing pages to executable. (Including pages holding static data.) You also typically want at least PROT_READ and sometimes PROT_WRITE, of course.
Using mprotect on a static array means you're still executing the code from a known location, maybe making it easier to set a breakpoint on it.
On Windows you can use VirtualAlloc or VirtualProtect.
Telling the compiler that data is executed as code
Normally compilers like GCC assume that data and code are separate. This is like type-based strict aliasing, but even using char* doesn't make it well-defined to store into a buffer and then call that buffer as a function pointer.
In GNU C, you also need to use __builtin___clear_cache(buf, buf + len) after writing machine code bytes to a buffer, because the optimizer doesn't treat dereferencing a function pointer as reading bytes from that address. Dead-store elimination can remove the stores of machine code bytes into a buffer, if the compiler proves that the store isn't read as data by anything. https://codegolf.stackexchange.com/questions/160100/the-repetitive-byte-counter/160236#160236 and https://godbolt.org/g/pGXn3B has an example where gcc really does do this optimization, because gcc "knows about" malloc.
(And on non-x86 architectures where I-cache isn't coherent with D-cache, it actually will do any necessary cache syncing. On x86 it's purely a compile-time optimization blocker and doesn't expand to any instructions itself.)
Re: the weird name with three underscores: It's the usual __builtin_name pattern, but name is __clear_cache.
My edit on #AntoineMathys's answer added this.
In practice GCC/clang don't "know about" mmap(MAP_ANONYMOUS) the way they know about malloc. So in practice the optimizer will assume that the memcpy into the buffer might be read as data by the non-inline function call through the function pointer, even without __builtin___clear_cache(). (Unless you declared the function type as __attribute__((const)).)
On x86, where I-cache is coherent with data caches, having the stores happen in asm before the call is sufficient for correctness. On other ISAs, __builtin___clear_cache() will actually emit special instructions as well as ensuring the right compile-time ordering.
It's good practice to include it when copying code into a buffer because it doesn't cost performance, and stops hypothetical future compilers from breaking your code. (e.g. if they do understand that mmap(MAP_ANONYMOUS) gives newly-allocated anonymous memory that nothing else has a pointer to, just like malloc.)
With current GCC, I was able to provoke GCC into really doing an optimization we don't want by using __attribute__((const)) to tell the optimizer sum() is a pure function (that only reads its args, not global memory). GCC then knows sum() can't read the result of the memcpy as data.
With another memcpy into the same buffer after the call, GCC does dead-store elimination into just the 2nd store after the call. This results in no store before the first call so it executes the 00 00 add [rax], al bytes, segfaulting.
// demo of a problem on x86 when not using __builtin___clear_cache
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main ()
{
char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi]
0xC3 // ret
};
__attribute__((const)) int (*sum) (int, int) = NULL;
// copy code to executable buffer
sum = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON,-1,0);
memcpy (sum, code, sizeof(code));
//__builtin___clear_cache(sum, sum + sizeof(code));
int c = sum (2, 3);
//printf ("%d + %d = %d\n", a, b, c);
memcpy(sum, (char[]){0x31, 0xc0, 0xc3, 0}, 4); // xor-zero eax, ret, padding for a dword store
//__builtin___clear_cache(sum, sum + 4);
return sum(2,3);
}
Compiled on the Godbolt compiler explorer with GCC9.2 -O3
main:
push rbx
xor r9d, r9d
mov r8d, -1
mov ecx, 34
mov edx, 7
mov esi, 4
xor edi, edi
sub rsp, 16
call mmap
mov esi, 3
mov edi, 2
mov rbx, rax
call rax # call before store
mov DWORD PTR [rbx], 12828721 # 0xC3C031 = xor-zero eax, ret
add rsp, 16
pop rbx
ret # no 2nd call, CSEd away because const and same args
Passing different args would have gotten another call reg, but even with __builtin___clear_cache the two sum(2,3) calls can CSE. __attribute__((const)) doesn't respect changes to the machine code of a function. Don't do it. It's safe if you're going to JIT the function once and then call many times, though.
Uncommenting the first __clear_cache results in
mov DWORD PTR [rax], -1019804531 # lea; ret
call rax
mov DWORD PTR [rbx], 12828721 # xor-zero; ret
... still CSE and use the RAX return value
The first store is there because of __clear_cache and the sum(2,3) call. (Removing the first sum(2,3) call does let dead-store elimination happen across the __clear_cache.)
The second store is there because the side-effect on the buffer returned by mmap is assumed to be important, and that's the final value main leaves.
Godbolt's ./a.out option to run the program still seems to always fail (exit status of 255); maybe it sandboxes JITing? It works on my desktop with __clear_cache and crashes without.
mprotect on a page holding existing C variables.
You can also give a single existing page read+write+exec permission. This is an alternative to compiling with -z execstack
You don't need __clear_cache on a page holding read-only C variables because there's no store to optimize away. You would still need it for initializing a local buffer (on the stack). Otherwise GCC will optimize away the initializer for this private buffer that a non-inline function call definitely doesn't have a pointer to. (Escape analysis). It doesn't consider the possibility that the buffer might hold the machine code for the function unless you tell it that via __builtin___clear_cache.
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>
// can be non-const if you want, we're using mprotect
static const char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi] // retval = a+b;
0xC3 // ret
};
static const char ret0_code[] = "\x31\xc0\xc3";
int main () {
// void* cast is easier to type than a cast to function pointer,
// and in C can be assigned to any other pointer type. (not C++)
int (*sum) (int, int) = (void*)code;
int (*ret0)(void) = (void*)ret0_code;
// hard-coding x86's 4k page size for simplicity.
// also assume that `code` doesn't span a page boundary and that ret0_code is in the same page.
uintptr_t page = (uintptr_t)code & -4095ULL; // round down
mprotect((void*)page, 4096, PROT_READ|PROT_EXEC|PROT_WRITE); // +write in case the page holds any writeable C vars that would crash later code.
// run code
int c = sum (2, 3);
return ret0();
}
I used PROT_READ|PROT_EXEC|PROT_WRITE in this example so it works regardless of where your variable is. If it was a local on the stack and you left out PROT_WRITE, call would fail after making the stack read only when it tried to push a return address.
Also, PROT_WRITE lets you test shellcode that self-modifies, e.g. to edit zeros into its own machine code, or other bytes it was avoiding.
$ gcc -O3 shellcode.c # without -z execstack
$ ./a.out
$ echo $?
0
$ strace ./a.out
...
mprotect(0x55605aa3f000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
exit_group(0) = ?
+++ exited with 0 +++
If I comment out the mprotect, it does segfault with recent versions of GNU Binutils ld which no longer put read-only constant data into the same ELF segment as the .text section.
If I did something like ret0_code[2] = 0xc3;, I would need __builtin___clear_cache(ret0_code+2, ret0_code+2) after that to make sure the store wasn't optimized away, but if I don't modify the static arrays then it's not needed after mprotect. It is needed after mmap+memcpy or manual stores, because we want to execute bytes that have been written in C (with memcpy).
You need to include the assembly in-line via a special compiler directive so that it'll properly end up in a code segment. See this guide, for example: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
Your machine code may be all right, but your CPU objects.
Modern CPUs manage memory in segments. In normal operation, the operating system loads a new program into a program-text segment and sets up a stack in a data segment. The operating system tells the CPU never to run code in a data segment. Your code is in code[], in a data segment. Thus the segfault.
This will take some effort.
Your code variable is stored in the .data section of your executable:
$ readelf -p .data exploit
String dump of section '.data':
[ 10] H1À
H1À is the value of your variable.
The .data section is not executable:
$ readelf -S exploit
There are 30 section headers, starting at offset 0x1150:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[...]
[24] .data PROGBITS 0000000000601010 00001010
0000000000000014 0000000000000000 WA 0 0 8
All 64-bit processors I'm familiar with support non-executable pages natively in the pagetables. Most newer 32-bit processors (the ones that support PAE) provide enough extra space in their pagetables for the operating system to emulate hardware non-executable pages. You'll need to run either an ancient OS or an ancient processor to get a .data section marked executable.
Because these are just flags in the executable, you ought to be able to set the X flag through some other mechanism, but I don't know how to do so. And your OS might not even let you have pages that are both writable and executable.
You may need to set the page executable before you may call it.
On MS-Windows, see the VirtualProtect -function.
URL: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898%28v=vs.85%29.aspx
Sorry, I couldn't follow above examples which are complicated.
So, I created an elegant solution for executing hex code from C.
Basically, you could use asm and .word keywords to place your instructions in hex format.
See below example:
asm volatile(".rept 1024\n"
CNOP
".endr\n");
where CNOP is defined as below:
#define ".word 0x00010001 \n"
Basically, c.nop instruction was not supported by my current assembler. So, I defined CNOP as the hex equivalent of c.nop with proper syntax and used inside asm, with which I was aware of.
.rept <NUM> .endr will basically, repeat the instruction NUM times.
This solution is working and verified.

inline asm type mismatch

I am trying to make my own read system call for a 64bit linux system. but it keeps telling me I have a bad type. Is the compiler trying to indirectly address buf? I have a feeling I messed up in my input constraints. I just need the address of buf at %2.
error:
test.c: Assembler messages:
test.c:28: Error: operand type mismatch for `movq'
static int myread(int fd, char *buf, int size) {
register int bytes;
asm(
"movq $0, %%rax\n"
"movq %1, %%rdi\n"
"movq %2, %%rsi\n"
"movq %3, %%rdx\n"
"syscall\n"
"movq %%rax, %0"
: "=r" (bytes)
: "m" (fd), "m" (buf), "m" (size)
: "%rax", "%rdi", "%rsi", "%rdx"
);
return bytes;
}
As Mystical said, the mismatch error comes from the fact that you are using movq (which is for 64bit values) on 32bit integers (like fd and size).
But beyond that, this code is really inefficient and subtly (but dangerously) flawed. Maybe something more like this:
static int myread(int fd, char *buf, int size) {
register int bytes;
asm(
"syscall"
: "=a" (bytes)
: "D" (fd), "S" (buf), "d" (size), "0" (0)
: "rcx", "r11", "memory", "cc"
);
return bytes;
}
To understand this, check out the machine constraints for i386.
Note that syscalls clobber the rcx and r11 registers. Failing to advise the compiler that you are changing these values can lead to very strange problems. And the problems won't happen on the syscall, but a hundred lines downstream.
I'm also going to make a pitch for NOT using inline asm. I'm not sure why you don't want to just use the system calls, but you are just setting yourself up for grief.

C++ argv indices larger than argc

So I've been trying to access command line arguments with indices larger than argc in C++ (because why not) and discovered that there are actually some parameters passed on to the program. The following code produces the following result in my Ubuntu 14.04:
#include <cstdio>
int main (int argc, char** argv) {
for (int i=argc+1; argv[i]!=0; i++)
printf("%3d %s\n", i, argv[i]);
}
This is the output:
2 XDG_VTNR=7
3 LC_PAPER=tr_TR.UTF-8
4 LC_ADDRESS=tr_TR.UTF-8
5 XDG_SESSION_ID=c2
6 XDG_GREETER_DATA_DIR=/var/lib/lightdm-data/kubuzetto
7 SELINUX_INIT=YES
8 LC_MONETARY=tr_TR.UTF-8
9 CLUTTER_IM_MODULE=xim
(and so on)
What is this and is it distro-dependent?
Don't run past argc, it's not defined.
In this case, we know what happened. You see, main() is declared and called as
extern int main(int argc, char **argv, char **envp);
and it just so happens that the the way argv and envp are built leaves envp right after argv, so running off the end encounters envp.
It might change someday, so don't depend on this working. If you wanna depend on the kernel's passing method, provide the kernel's entry point yourself (which must be done in asm [or possibly a naked function] as it doesn't look like a function call).

Configure kern.log to give more info about a segfault

Currently I can find in kern.log entries like this:
[6516247.445846] ex3.x[30901]: segfault at 0 ip 0000000000400564 sp 00007fff96ecb170 error 6 in ex3.x[400000+1000]
[6516254.095173] ex3.x[30907]: segfault at 0 ip 0000000000400564 sp 00007fff0001dcf0 error 6 in ex3.x[400000+1000]
[6516662.523395] ex3.x[31524]: segfault at 7fff80000000 ip 00007f2e11e4aa79 sp 00007fff807061a0 error 4 in libc-2.13.so[7f2e11dcf000+180000]
(You see, apps causing segfault are named ex3.x, means exercise 3 executable).
Is there a way to ask kern.log to log the complete path? Something like:
[6...] /home/user/cclass/ex3.x[3...]: segfault at 0 ip 0564 sp 07f70 error 6 in ex3.x[4...]
So I can easily figure out from who (user/student) this ex3.x is?
Thanks!
Beco
That log message comes from the kernel with a fixed format that only includes the first 16 letters of the executable excluding the path as per show_signal_msg, see other relevant lines for segmentation fault on non x86 architectures.
As mentioned by Makyen, without significant changes to the kernel and a recompile, the message given to klogd which is passed to syslog won't have the information you are requesting.
I am not aware of any log transformation or injection functionality in syslog or klogd which would allow you to take the name of the file and run either locate or file on the filesystem in order to find the full path.
The best way to get the information you are looking for is to use crash interception software like apport or abrt or corekeeper. These tools store the process metadata from the /proc filesystem including the process's commandline which would include the directory run from, assuming the binary was run with a full path, and wasn't already in path.
The other more generic way would be to enable core dumps, and then to set /proc/sys/kernel/core_pattern to include %E, in order to have the core file name including the path of the binary.
The short answer is: No, it is not possible without making code changes and recompiling the kernel. The normal solution to this problem is to instruct your students to name their executable <student user name>_ex3.x so that you can easily have this information.
However, it is possible to get the information you desire from other methods. Appleman1234 has provided some alternatives in his answer to this question.
How do we know the answer is "Not possible to the the full path in the kern.log segfault messages without recompiling the kernel":
We look in the kernel source code to find out how the message is produced and if there are any configuration options.
The files in question are part of the kernel source. You can download the entire kernel source as an rpm package (or other type of package) for whatever version of linux/debian you are running from a variety of places.
Specifically, the output that you are seeing is produced from whichever of the following files is for your architecture:
linux/arch/sparc/mm/fault_32.c
linux/arch/sparc/mm/fault_64.c
linux/arch/um/kernel/trap.c
linux/arch/x86/mm/fault.c
An example of the relevant function from one of the files(linux/arch/x86/mm/fault.c):
/*
* Print out info about fatal segfaults, if the show_unhandled_signals
* sysctl is set:
*/
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
unsigned long address, struct task_struct *tsk)
{
if (!unhandled_signal(tsk, SIGSEGV))
return;
if (!printk_ratelimit())
return;
printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
print_vma_addr(KERN_CONT " in ", regs->ip);
printk(KERN_CONT "\n");
}
From that we see that the variable passed to printout the process identifier is tsk->comm where struct task_struct *tsk and regs->ip where struct pt_regs *regs
Then from linux/include/linux/sched.h
struct task_struct {
...
char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
it with task_lock())
- initialized normally by setup_new_exec */
The comment makes it clear that the path for the executable is not stored in the structure.
For regs->ip where struct pt_regs *regs, it is defined in whichever of the following are appropriate for your architecture:
arch/arc/include/asm/ptrace.h
arch/arm/include/asm/ptrace.h
arch/arm64/include/asm/ptrace.h
arch/cris/include/arch-v10/arch/ptrace.h
arch/cris/include/arch-v32/arch/ptrace.h
arch/metag/include/asm/ptrace.h
arch/mips/include/asm/ptrace.h
arch/openrisc/include/asm/ptrace.h
arch/um/include/asm/ptrace-generic.h
arch/x86/include/asm/ptrace.h
arch/xtensa/include/asm/ptrace.h
From there we see that struct pt_regs is defining registers for the architecture. ip is just: unsigned long ip;
Thus, we have to look at what print_vma_addr() does. It is defined in mm/memory.c
/*
* Print the name of a VMA.
*/
void print_vma_addr(char *prefix, unsigned long ip)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
/*
* Do not print if we are in atomic
* contexts (in exception stacks, etc.):
*/
if (preempt_count())
return;
down_read(&mm->mmap_sem);
vma = find_vma(mm, ip);
if (vma && vma->vm_file) {
struct file *f = vma->vm_file;
char *buf = (char *)__get_free_page(GFP_KERNEL);
if (buf) {
char *p;
p = d_path(&f->f_path, buf, PAGE_SIZE);
if (IS_ERR(p))
p = "?";
printk("%s%s[%lx+%lx]", prefix, kbasename(p),
vma->vm_start,
vma->vm_end - vma->vm_start);
free_page((unsigned long)buf);
}
}
up_read(&mm->mmap_sem);
}
Which shows us that a path was available. We would need to check that it was the path, but looking a bit further in the code gives a hint that it might not matter. We need to see what kbasename() did with the path that is passed to it. kbasename() is defined in include/linux/string.h as:
/**
* kbasename - return the last part of a pathname.
*
* #path: path to extract the filename from.
*/
static inline const char *kbasename(const char *path)
{
const char *tail = strrchr(path, '/');
return tail ? tail + 1 : path;
}
Which, even if the full path is available prior to it, chops off everything except for the last part of a pathname, leaving the filename.
Thus, no amount of runtime configuration options will permit printing out the full pathname of the file in the segment fault messages you are seeing.
NOTE: I've changed all of the links to kernel source to be to archives, rather than the original locations. Those links will get close to the code as it was at the time I wrote this, 2104-09. As should be no surprise, the code does evolve over time, so the code which is current when you're reading this may or may not be similar or perform in the way which is described here.

Difference between structures ttysize and winsize?

I have following code to get width and height of screen in Linux.
#ifdef TIOCGSIZE
struct ttysize ts;
ioctl(STDIN_FILENO, TIOCGSIZE, &ts);
cols = ts.ts_cols;
lines = ts.ts_lines;
#elif defined(TIOCGWINSZ)
struct winsize ts;
ioctl(STDIN_FILENO, TIOCGWINSZ, &ts);
cols = ts.ws_col;
lines = ts.ws_row;
#endif /* TIOCGSIZE */
What is difference between ttysize and winsize?
The ttysize was the original implementation for SunOS 3.0 (February 1986), and soon after made obsolete by winsize, which adds the size of the window in pixels. Here's
what the definitions look like in <sys/ttycom.h> from SunOS 4:
/*
* Window/terminal size structure.
* This information is stored by the kernel
* in order to provide a consistent interface,
* but is not used by the kernel.
*
* Type must be "unsigned short" so that types.h not required.
*/
struct winsize {
unsigned short ws_row; /* rows, in characters */
unsigned short ws_col; /* columns, in characters */
unsigned short ws_xpixel; /* horizontal size, pixels - not used */
unsigned short ws_ypixel; /* vertical size, pixels - not used */
};
#define TIOCGWINSZ _IOR(t, 104, struct winsize) /* get window size */
#define TIOCSWINSZ _IOW(t, 103, struct winsize) /* set window size */
/*
* Sun version of same.
*/
struct ttysize {
int ts_lines; /* number of lines on terminal */
int ts_cols; /* number of columns on terminal */
};
#define TIOCSSIZE _IOW(t,37,struct ttysize)/* set tty size */
#define TIOCGSIZE _IOR(t,38,struct ttysize)/* get tty size */
The data types are different (an integer would waste memory), and the fields have different names.
The ttysize structure has long been obsolete: if either is provided by the system, winsize is supported. That wasn't true when porting ncurses to SCO OpenServer in 1997, as noted in this chunk from lib_setup.c:
/*
* SCO defines TIOCGSIZE and the corresponding struct. Other systems (SunOS,
* Solaris, IRIX) define TIOCGWINSZ and struct winsize.
*/
#ifdef TIOCGSIZE
# define IOCTL_WINSIZE TIOCGSIZE
# define STRUCT_WINSIZE struct ttysize
# define WINSIZE_ROWS(n) (int)n.ts_lines
# define WINSIZE_COLS(n) (int)n.ts_cols
#else
# ifdef TIOCGWINSZ
# define IOCTL_WINSIZE TIOCGWINSZ
# define STRUCT_WINSIZE struct winsize
# define WINSIZE_ROWS(n) (int)n.ws_row
# define WINSIZE_COLS(n) (int)n.ws_col
# endif
#endif
You might notice that Linux is not mentioned in the comment. According to comments in asm-sparc64/ioctls.h, the ioctl for ttysize was unsupported as of 2.6.16:
/* Note that all the ioctls that are not available in Linux have a
* double underscore on the front to: a) avoid some programs to
* think we support some ioctls under Linux (autoconfiguration stuff)
*/
...
#define TIOCCONS _IO('t', 36)
#define __TIOCSSIZE _IOW('t', 37, struct sunos_ttysize) /* SunOS Specific */
#define __TIOCGSIZE _IOR('t', 38, struct sunos_ttysize) /* SunOS Specific */
#define TIOCGSOFTCAR _IOR('t', 100, int)
#define TIOCSSOFTCAR _IOW('t', 101, int)
#define __TIOCUCNTL _IOW('t', 102, int) /* SunOS Specific */
#define TIOCSWINSZ _IOW('t', 103, struct winsize)
#define TIOCGWINSZ _IOR('t', 104, struct winsize)
A much earlier comment in 1995 added the definitions (without the double-underscore). Possibly a few programs used that with Linux, although winsize was well established on most platforms before Linux was begun. A little more digging finds that the double-underscore was introduced in 1996 (patch-2.1.9 linux/include/asm-sparc/ioctls.h). Given that, very few programs would have used it with Linux.
Further reading:
Garbage-collect struct ttysize (OpenBSD mailing list)
/dev/console MKS manual page
They have the same function (at least in this case - maybe one structure has an additional field not used here).
Historically some (mainly older) Linux versions have different definitions of the IOCTL codes. Therefore some Linux versions have only TIOCGSIZE defined (using a ttysize structure) and some Linux versions have only TIOCGWINSZ defined.
Using the "#ifdef" construction the program can be compiled for both Linux versions.
Newer Linux versions should have both IOCTLs defined.

Resources