SIGSEGV segmentation fault at strftime_l lib64/libc.so.6 - linux

I'am porting pro*c codes from UNIX to LINUX. The codes are compiled and created executables successfully. But during run time its raising segmentation fault. I debugged the code step by step and the below is the output of GDB debug.
Breakpoint 4 at 0x3b19690f50
(gdb) n
525 strftime (buf, MAX_STRING_LEN, "%d/%b/%Y:%H:%M:%S", dummy_time);
(gdb) n
Breakpoint 4, 0x0000003b19690f50 in strftime () from /lib64/libc.so.6
(gdb) n
Single stepping until exit from function strftime,
which has no line number information.
0x0000003b19690f70 in strftime_l () from /lib64/libc.so.6
(gdb) n
Single stepping until exit from function strftime_l,
which has no line number information.
Program received signal SIGSEGV, Segmentation fault.
0x0000003b19690f8b in strftime_l () from /lib64/libc.so.6
Actually in code the function strftime() is called. But I have no idea why it is reaching strftime_l() in /lib64/libc.so.6.
This issue is not coming in UNIX. please help on this. code is
static void speed_hack_libs(void)
{
time_t dummy_time_t = time(NULL);
struct tm *dummy_time = localtime (&dummy_time_t);
struct tm *other_dummy_time = gmtime (&dummy_time_t);
char buf[MAX_STRING_LEN];
strftime (buf, MAX_STRING_LEN, "%d/%b/%Y:%H:%M:%S", dummy_time);
}

struct tm *dummy_time = localtime (&dummy_time_t);
struct tm *other_dummy_time = gmtime (&dummy_time_t);
This is not gonna work. From the man page:
The localtime() function converts the calendar time timep to broken-down time representation, expressed relative to the user's specified time-zone. ... The return value points to a statically allocated struct which might be overwritten by
subsequent calls to any of the date and time functions.
The gmtime() function converts the calendar time timep to broken-down time representation, expressed in Coordinated Universal Time (UTC). It
may return NULL when the year does not fit into an integer. The return value points to a statically allocated struct which might be overwritten by subsequent calls to any of the date and time functions.
So, *dummy_time will probably be overwritten by the time you use it, and contain unpredictable garbage. You should copy the data to your buffer like this:
struct tm dummy_time ;
memcpy(&dummy_time, localtime (&dummy_time_t), sizeof(struct tm));
Although I'm not sure how could this cause a SIGSEGV (might be something with getting the month names etc. - check if the problem persists with LC_ALL=C), you must fix this before you can move on. Also, check (in the debugger) the contents of *dummy_time.

It is calling strftime_l because you compiled 64 bit - that is the 64 bit library entry point for strftime. You have two pointers in strftime - a string and a struct tm pointer. One of them is pointing to invalid memory. jpalacek gave you where to look first.

Did you add the time.h header file? I think you have missed it.

Related

How to get the value of RIP(program counter) from inside a system call?

RIP is a register that stores the address of the current instruction. I am writing a system call to get the program counter of a user level process. Taking the following code for an example, the sys_getrip() is the system call that I am working on. How shall I implement it?
The platform that I am working on is Ubuntu 14.04 with Linux kernel 3.14.4.
/*A system call in the kernel level*/
void sys_getrip(){
char *pc =//What's the magic here?
printk(KERN_INFO, "The current program counter of %d is %p", current->pid, pc);
}
/*A test function running in user level*/
void main(void){
...
sys_getrip();
...
...
sys_getrip();
...
}
A similar problem has been posted here: How to print exact value of the program counter in C. I tried the approach in this post by implementing the sys_getrip() as following
void sys_getrip(void){
printk(KERN_INFO "RIP = %p\n", __builtin_return_address(0));
}
In this case, however, different invokes of sys_getrip() in main() gave the same RIP value.

Debugging in threading building Blocks

I would like to program in threading building blocks with tasks. But how does one do the debugging in practice?
In general the print method is a solid technique for debugging programs.
In my experience with MPI parallelization, the right way to do logging is that each thread print its debugging information in its own file (say "debug_irank" with irank the rank in the MPI_COMM_WORLD) so that the logical errors can be found.
How can something similar be achieved with TBB? It is not clear how to access the thread number in the thread pool as this is obviously something internal to tbb.
Alternatively, one could add an additional index specifying the rank when a task is generated but this makes the code rather complicated since the whole program has to take care of that.
First, get the program working with 1 thread. To do this, construct a task_scheduler_init as the first thing in main, like this:
#include "tbb/tbb.h"
int main() {
tbb::task_scheduler_init init(1);
...
}
Be sure to compile with the macro TBB_USE_DEBUG set to 1 so that TBB's checking will be enabled.
If the single-threaded version works, but the multi-threaded version does not, consider using Intel Inspector to spot race conditions. Be sure to compile with TBB_USE_THREADING_TOOLS so that Inspector gets enough information.
Otherwise, I usually first start by adding assertions, because the machine can check assertions much faster than I can read log messages. If I am really puzzled about why an assertion is failing, I use printfs and task ids (not thread ids). Easiest way to create a task id is to allocate one by post-incrementing a tbb::atomic<size_t> and storing the result in the task.
If I'm having a really bad day and the printfs are changing program behavior so that the error does not show up, I use "delayed printfs". Stuff the printf arguments in a circular buffer, and run printf on the records later after the failure is detected. Typically for the buffer, I use an array of structs containing the format string and a few word-size values, and make the array size a power of two. Then an atomic increment and mask suffices to allocate slots. E.g., something like this:
const size_t bufSize = 1024;
struct record {
const char* format;
void *arg0, *arg1;
};
tbb::atomic<size_t> head;
record buf[bufSize];
void recf(const char* fmt, void* a, void* b) {
record* r = &buf[head++ & bufSize-1];
r->format = fmt;
r->arg0 = a;
r->arg1 = b;
}
void recf(const char* fmt, int a, int b) {
record* r = &buf[head++ & bufSize-1];
r->format = fmt;
r->arg0 = (void*)a;
r->arg1 = (void*)b;
}
The two recf routines record the format and the values. The casting is somewhat abusive, but on most architectures you can print the record correctly in practice with printf(r->format, r->arg0, r->arg1) even if the the 2nd overload of recf created the record.
~
~

Confusing result from counting page fault in linux

I was writing programs to count the time of page faults in a linux system. More precisely, the time kernel execute the function __do_page_fault.
And somehow I wrote two global variables, named pfcount_at_beg and pfcount_at_end, which increase once when the function __do_page_fault is executed at different locations of the function.
To illustrate, the modified function goes as:
unsigned long pfcount_at_beg = 0;
unsigned long pfcount_at_end = 0;
static void __kprobes
__do_page_fault(...)
{
struct vm_area_sruct *vma;
... // VARIABLES DEFINITION
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
pfcount_at_beg++; // I add THIS
...
...
// ORIGINAL CODE OF THE FUNCTION
...
pfcount_at_end++; // I add THIS
}
I expected that the value of pfcount_at_end is smaller than the value of pfcount_at_beg.
Because, I think, every time kernel executes the instructions of code pfcount_at_end++, it must have executed pfcount_at_beg++(Every function starts at the very beginning of the code).
On the other hand, as there are many conditional return between these two lines of code.
However, the result turns out oppositely. The value of pfcount_at_end is larger than the value of pfcount_at_beg.
I use printk to print these kernel variables through a self-defined syscall. And I wrote the user level program to call the system call.
Here is my simple syscall and user-level program:
// syscall
asmlinkage int sys_mysyscall(void)
{
printk( KERN_INFO "total pf_at_beg%lu\ntotal pf_at_end%lu\n", pfcount_at_beg, pfcount_at_end)
return 0;
}
// user-level program
#include<linux/unistd.h>
#include<sys/syscall.h>
#define __NR_mysyscall 223
int main()
{
syscall(__NR_mysyscall);
return 0;
}
Is there anybody who knows what exactly happened during this?
Just now I modified the code, to make pfcount_at_beg and pfcount_at_end static. However the result did not change, i.e. the value of pfcount_at_end is larger than the value of pfcount_at_beg.
So possibly it might be caused by in-atomic operation of increment. Would it be better if I use read-write lock?
The ++ operator is not garanteed to be atomic, so your counters may suffer concurrent access and have incorrect values. You should protect your increment as a critical section, or use the atomic_t type defined in <asm/atomic.h>, and its related atomic_set() and atomic_add() functions (and a lot more).
Not directly connected to your issue, but using a specific syscall is overkill (but maybe it is an exercise). A lighter solution could be to use a /proc entry (also an interesting exercise).

Configure kern.log to give more info about a segfault

Currently I can find in kern.log entries like this:
[6516247.445846] ex3.x[30901]: segfault at 0 ip 0000000000400564 sp 00007fff96ecb170 error 6 in ex3.x[400000+1000]
[6516254.095173] ex3.x[30907]: segfault at 0 ip 0000000000400564 sp 00007fff0001dcf0 error 6 in ex3.x[400000+1000]
[6516662.523395] ex3.x[31524]: segfault at 7fff80000000 ip 00007f2e11e4aa79 sp 00007fff807061a0 error 4 in libc-2.13.so[7f2e11dcf000+180000]
(You see, apps causing segfault are named ex3.x, means exercise 3 executable).
Is there a way to ask kern.log to log the complete path? Something like:
[6...] /home/user/cclass/ex3.x[3...]: segfault at 0 ip 0564 sp 07f70 error 6 in ex3.x[4...]
So I can easily figure out from who (user/student) this ex3.x is?
Thanks!
Beco
That log message comes from the kernel with a fixed format that only includes the first 16 letters of the executable excluding the path as per show_signal_msg, see other relevant lines for segmentation fault on non x86 architectures.
As mentioned by Makyen, without significant changes to the kernel and a recompile, the message given to klogd which is passed to syslog won't have the information you are requesting.
I am not aware of any log transformation or injection functionality in syslog or klogd which would allow you to take the name of the file and run either locate or file on the filesystem in order to find the full path.
The best way to get the information you are looking for is to use crash interception software like apport or abrt or corekeeper. These tools store the process metadata from the /proc filesystem including the process's commandline which would include the directory run from, assuming the binary was run with a full path, and wasn't already in path.
The other more generic way would be to enable core dumps, and then to set /proc/sys/kernel/core_pattern to include %E, in order to have the core file name including the path of the binary.
The short answer is: No, it is not possible without making code changes and recompiling the kernel. The normal solution to this problem is to instruct your students to name their executable <student user name>_ex3.x so that you can easily have this information.
However, it is possible to get the information you desire from other methods. Appleman1234 has provided some alternatives in his answer to this question.
How do we know the answer is "Not possible to the the full path in the kern.log segfault messages without recompiling the kernel":
We look in the kernel source code to find out how the message is produced and if there are any configuration options.
The files in question are part of the kernel source. You can download the entire kernel source as an rpm package (or other type of package) for whatever version of linux/debian you are running from a variety of places.
Specifically, the output that you are seeing is produced from whichever of the following files is for your architecture:
linux/arch/sparc/mm/fault_32.c
linux/arch/sparc/mm/fault_64.c
linux/arch/um/kernel/trap.c
linux/arch/x86/mm/fault.c
An example of the relevant function from one of the files(linux/arch/x86/mm/fault.c):
/*
* Print out info about fatal segfaults, if the show_unhandled_signals
* sysctl is set:
*/
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
unsigned long address, struct task_struct *tsk)
{
if (!unhandled_signal(tsk, SIGSEGV))
return;
if (!printk_ratelimit())
return;
printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
print_vma_addr(KERN_CONT " in ", regs->ip);
printk(KERN_CONT "\n");
}
From that we see that the variable passed to printout the process identifier is tsk->comm where struct task_struct *tsk and regs->ip where struct pt_regs *regs
Then from linux/include/linux/sched.h
struct task_struct {
...
char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
it with task_lock())
- initialized normally by setup_new_exec */
The comment makes it clear that the path for the executable is not stored in the structure.
For regs->ip where struct pt_regs *regs, it is defined in whichever of the following are appropriate for your architecture:
arch/arc/include/asm/ptrace.h
arch/arm/include/asm/ptrace.h
arch/arm64/include/asm/ptrace.h
arch/cris/include/arch-v10/arch/ptrace.h
arch/cris/include/arch-v32/arch/ptrace.h
arch/metag/include/asm/ptrace.h
arch/mips/include/asm/ptrace.h
arch/openrisc/include/asm/ptrace.h
arch/um/include/asm/ptrace-generic.h
arch/x86/include/asm/ptrace.h
arch/xtensa/include/asm/ptrace.h
From there we see that struct pt_regs is defining registers for the architecture. ip is just: unsigned long ip;
Thus, we have to look at what print_vma_addr() does. It is defined in mm/memory.c
/*
* Print the name of a VMA.
*/
void print_vma_addr(char *prefix, unsigned long ip)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
/*
* Do not print if we are in atomic
* contexts (in exception stacks, etc.):
*/
if (preempt_count())
return;
down_read(&mm->mmap_sem);
vma = find_vma(mm, ip);
if (vma && vma->vm_file) {
struct file *f = vma->vm_file;
char *buf = (char *)__get_free_page(GFP_KERNEL);
if (buf) {
char *p;
p = d_path(&f->f_path, buf, PAGE_SIZE);
if (IS_ERR(p))
p = "?";
printk("%s%s[%lx+%lx]", prefix, kbasename(p),
vma->vm_start,
vma->vm_end - vma->vm_start);
free_page((unsigned long)buf);
}
}
up_read(&mm->mmap_sem);
}
Which shows us that a path was available. We would need to check that it was the path, but looking a bit further in the code gives a hint that it might not matter. We need to see what kbasename() did with the path that is passed to it. kbasename() is defined in include/linux/string.h as:
/**
* kbasename - return the last part of a pathname.
*
* #path: path to extract the filename from.
*/
static inline const char *kbasename(const char *path)
{
const char *tail = strrchr(path, '/');
return tail ? tail + 1 : path;
}
Which, even if the full path is available prior to it, chops off everything except for the last part of a pathname, leaving the filename.
Thus, no amount of runtime configuration options will permit printing out the full pathname of the file in the segment fault messages you are seeing.
NOTE: I've changed all of the links to kernel source to be to archives, rather than the original locations. Those links will get close to the code as it was at the time I wrote this, 2104-09. As should be no surprise, the code does evolve over time, so the code which is current when you're reading this may or may not be similar or perform in the way which is described here.

Segmentation fault while trying to print parts of a pointer struct

I'm writing a program that must take user input to assign values to parts of a structure. I need to create a pointer to the structure that I will pass through as a one and only parameter for a function that will print each part of the structure individually. I also must malloc memory for the structure. As it is now, the program compiles and runs through main and asks the user for inputs. A segmentation fault occurs after the last user input is collected and when I'm assuming the call to the printContents function is run. Any help would be appreciated!
#include <stdio.h>
#include <stdlib.h>
struct info
{
char name[100], type;
int size;
long int stamp;
};
void printContents(struct info *iptr);
int main(void)
{
struct info *ptr=malloc(sizeof(struct info));
printf("Enter the type: \n");
scanf("%c", &(*ptr).type);
printf("Enter the filename: \n");
scanf("%s", (*ptr).name);
printf("Enter the access time: \n");
scanf("%d", &(*ptr).stamp);
printf("Enter the size: \n");
scanf("%d", &(*ptr).size);
printf("%c", (*ptr).type);
printContents(ptr);
}
void printContents(struct info *iptr)
{
printf("Filename %s Size %d Type[%s] Accessed # %d \n", (*iptr).name, (*iptr).size, (*iptr).type, (*iptr).stamp);
}
Check the operator precedence. Is this &(*ptr).type the thing you're trying to do? Maybe &((*ptr).type) ?
ptr->member is like access to structure variable right? Also same for scanf() usr &ptr->member to get value. For char input use only ptr->charmember .
First let's do it the hard way. We'll assume that the code is already written, the compiler tells us nothing useful, and we don't have a debugger. First we put in some diagnostic output statements, and we discover that the crash happens in printContents:
printf("testing four\n"); /* we see this, so the program gets this far */
printf("Filename %s Size %d Type[%s] Accessed # %d \n", (*iptr).name, (*iptr).size, (*iptr).type, (*iptr).stamp);
printf("testing five\n"); /* the program crashes before this */
If we still can't see the bug, we narrow the problem down by preparing a minimal compete example. (This is a very valuable skill.) We compile and run the code over and over, commenting things out. When we comment something out and the code still segfaults, we remove it entirely; but if commenting it out makes the problem go away, we put it back in. Eventually we get down to a minimal case:
#include <stdio.h>
int main(void)
{
char type;
type = 'a';
printf("Type[%s]\n", type);
}
Now it should be obvious: when we printf("%s", x) something, printf expects x to be a string. That is, x should be a pointer to (i.e. the address of) the first element of a character array which ends with a null character. Instead we've given it a character (in this case 'a'), which it interprets as a number (in this case 97), and it tries to go to that address in memory and start reading; we're lucky to get nothing worse than a segfault. The fix is easy: decide whether type should be a char or a char[], if it's char then change the printf statement to "%c", if it's char[] then change its declaration.
Now an easy way. If we're using a good compiler like gcc, it will warn us that we're doing something fishy:
gcc foo.c -o foo
foo.c:35: warning: format ‘%s’ expects type ‘char *’, but argument 4 has type ‘int’
In future, there's a way you can save yourself all this trouble. Instead of writing a lot of code, getting a mysterious bug and backtracking, you can write in small increments. If you had added one term to that printf statement at a time, you would have seen exactly when the bug appeared, and which term was to blame.
Remember: start small and simple, add complexity a little at a time, test at every step, and never add to code that doesn't work.

Resources