"clocksource tsc unstable" shown when the linux kernel boots up - linux

I am booting up a linux kernel using a full system simulator, and I'd like to run my benchmark on the booted system. However, when it boots up, it shows me this message: "clocksource tsc unstable" and occasionally it hangs in the beggining. However, sometimes it lets me run my benchmark and then probably it hangs in the middle since the application never finishes and seems it's stuck there. Any idea how to fix this issue?
Thanks.

It suggests that, kernel didn't manage to calculate tsc (Time Stamp Counter) value properly i.e value is stale. This usually happens with VM. The way to avoid this is to - use predefined lpj (loops per jiffy) as kernel parameter (lpj=). Try it, hope issue will be fixed!

Related

is there a way to configure spike (riscv simulator) for entry PC etc.?

i'm doing verification to a RISCV cpu that supports Machine-Mode only, and i want to run my generated program on spike simulator.
i'm struggling to find any documentation about it.
how can i configure the first PC to my DUT first PC?
how can i configure other parameters like 'mvendorid' etc. ?
currently i'm working without pk and i'm getting "terminate called after throwing an instance of trap_load_access_fault".
when i'm working with pk, the program enters endless loop and the first PC doesn't look related to the ELF.
any suggestions?
Spike provides an option "--pc" which override the default entry point, run with '--pc DUT_FIRST_PC' should solve your problem.
When spike start simulation, its first pc is defined by DEFAULT_RSTVEC(0x1000), and spike set a "trampoline program" starts at DEFAULT_RSTVEC. After the "trampoline program" finished, spike jump to your program and go on, the jump pc can be override by '--pc' option.

Running perf record with Intel-PT event on compiled binaries from SPECCpu2006 crashes the server machine

I am having a recurring problem when using perf with Intel-PT event. I am currently performing profiling on a Intel(R) Xeon(R) CPU E5-2620 v4 # 2.10GHz machine, with x86_64 architecture and 32 hardware threads with virtualization enabled. I specifically use programs/source codes from SpecCPU2006 for profiling.
I am specifically observing that the first time I perform profiling on one of the compiled binaries from SpecCPU2006, everything works fine and the perf.data file gets generated, which is as expected with Intel-PT. As SpecCPU2006 programs are computationally-intensive(use 100% of CPU at any time), clearly perf.data files would be large for most of the programs. I obtain roughly 7-10 GB perf.data files for most of the profiled programs.
However, when I try to perform profiling the second time on the same compiled binary, after the first one is successfully done -- my server machine freezes up. Sometimes, this happens when I try profiling the third time/the fourth time (after the second or third profiling completed successfully). This behavior is highly unpredictable. Now I cannot profile any more binaries unless I have restarted the machine again.
I have also posted the server error logs which I get once I see that the computer has stopped responding.
Server error logs
Clearly there is an error message saying Fixing recursive fault but reboot is needed!.
This happens for particularly large enough SpecCPU2006 binaries which take more than 1 minute to run without perf.
Is there any particular reason why this might happen ? This should not occur due to high CPU usage, as running the programs without perf or with perf but any other hardware event(that can be seen by perf list) completed successfully. This only seems to happen with Intel-PT.
Please guide me in using the steps to solve this problem. Thanks.
Seems I resolved this issue now. So will post an answer.
The server crashed because of a null pointer dereference/access happening with a specific member of the structure perf_event. Basically the member perf_event->handle was the culprit. This information, as suggested by #osgx, was obtained from var/log/syslog file. A portion of the error message was :-
Apr 19 04:49:15 ###### kernel: [582411.404677] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ea
Apr 19 04:49:15 ###### kernel: [582411.404747] IP: [] perf_event_aux_event+0x2e/0xf0
One possible scenario where this structure member turns out to be NULL is if I start capturing packets even before an earlier run of perf record finished releasing all of its resources. This has been properly handled in kernel version 4.10. I was using kernel version 4.4.
I upgraded my kernel to the newer version and it works fine now!

Raspbian hangs in qemu

i'm running raspbian (2015-05-05-raspbian-wheezy.img) in qemu using compiled kernel (https://github.com/dhruvvyas90/qemu-rpi-kernel) on ubuntu 14.04. my final goal is to launch my python script within the emulation.
i'm following manual from http://www.unixmen.com/emulating-raspbian-using-qemu/, though many others suggest very similar sequence of actions.
things i'm trying and issues i'm experiencing:
first boot is more or less ok. i comment the line in /etc/ld.so.preload as suggested and reboot.
on second boot (after i remove init=/bin/bash) and all subsequent boots i get
ERROR ../libkmod/libkmod.c:554 kmod_search_moddep: could not open moddep file '/lib/modules/3.10.25/modules.dep.bin'
some googling suggested to run "sudo rpi-update". it didn't help, same message during boot.
on second boot (after i remove init=/bin/bash) and all subsequent boots i get
fsck died with exit status 6
looking into "/var/log/fsck/checkfs" as suggested tells that some location is not there, but it doesn't say which one
running "startx" produces error message from 1. it loads the UI eventually, but desktop only has "wastebasket" icon. there is also a thick white stripe on top of the screen blinking, like it keeps trying to load a tab but fails everytime. qemu window stops to respond to further interaction after this.
running "sudo apt-get upgrade" installs some packages, but after reboot i can't even get to UI - just blank screen with mouse cursor.
i'm not very experienced with how linux is configured at low level. i understand that i might be doing something completely stoopid.
so, my questions are:
how do i debug? i couldn't figure out the settings for qemu to write logs. i really don't want to fallback to gdb, as i'm not debugging qemu itself, just want to get notification on it's events.
ctrl key doesn't seem to work inside qemu window.
no copy-paste available. or i can't see how to turn it on.
am i missing something? from all the manuals i have seen it seems like this should go much much smoother. like it should "just work".
Since your post many things changed. The most important things is that now using Andrew Baumann GitHub repo you can build QEMU that boots recent Raspbian. I described my experience woth this code here. Instructions are straight forward. Implementation needs polishing but it best compilation of work so far.
To answer your questions:
QEMU have -s and -S options for GDB. First option setup gdb server hook and second freez CPU, so you can connect debugger. This is not for QEMU debugging this for guest system debugging. Default QEMU logging is to stderr, so if something valuable happen you will see it in terminal. You can raise QEMU verbosity by uncommenting various *DEBUG_ statements in source code. Also check help for -d and -D command line flags of QEMU.
Not sure I can help with this. Only thing that I can say is that my QEMU version 2.5.50 reacts to Ctrl+Alt which exits from GUI after capturing cursor, so it looks like QEMU understand Ctrl key. I assume that QEMU do not capture your special keys combination because your window manager do it before passing to QEMU.
This also not work for me, but I see some work was done in this area. Not sure how to enable and use that feature.
Emulating any hardware is very complex and requires a lot of work. All emulated targets are limited to some most important features. BCM2835/BCM2836 (Raspberry Pi/Raspberry Pi 2) SoC are still not accepted by mainline QEMU, so just work will not apply to those platforms.

How to debug ARM Linux kernel (msleep()) lock up?

I am first of all looking for debugging tips. If some one can point out the one line of code to change or the one peripheral config bit to set to fix the problem, that would be terrific. But that's not what I'm hoping for; I'm looking more for how do I go about debugging it.
Googling "msleep hang linux kernel site:stackoverflow.com" yields 13 answers and none is on the point, so I think I'm safe to ask.
I rebuild an ARM Linux kernel for an embedded TI AM1808 ARM processor (Sitara/DaVinci?). I see the all the boot log up to the login: prompt coming out of the serial port, but trying to login gets no response, doesn't even echo what I typed.
After lots of debugging I arrived at the kernel and added debugging code between line 828 and 830 (yes, kernel version is 2.6.37). This is at this point in the kernel mode before 'sbin/init' is called:
http://lxr.linux.no/linux+v2.6.37/init/main.c#L815
Right before line 830 I added a forever loop printk and I see the results. I have let it run for about a couple of hour and it counts to about 2 million. Sample line:
dbg:init/main.c:1202: 2088430
So it has spit out 60 million bytes without problem.
However, if I add msleep(1000) in the loop, it prints only once, i.e. msleep () does not return.
Details:
Adding a conditional printk at line 4073 in the scheduler that condition on a flag that get set at the start of the forever test loop described above shows that the schedule() is no longer called when it hangs:
http://lxr.linux.no/linux+v2.6.37/kernel/sched.c#L4064
The only selections under .config/'Device Drivers' are:
Block devices
I2C support
SPI support
The kernel and its ramdisk are loaded using uboot/TFTP.
I don't believe it tries to use the Ethernet.
Since all these happened before '/sbin/init', very little should be happenning.
More details:
I have a very similar board with the same CPU. I can run the same uImage and the same ramdisk and it works fine there. I can login and do the usual things.
I have run memory test (64 MB total, limit kernel to 32M and test the other 32M; it's a single chip DDR2) and found no problem.
One board uses UART0, and the other UART2, but boot log comes out of both so it should not be the problem.
Any debugging tips is greatly appreciated.
I don't have an appropriate JTAG so I can't use that.
If msleep doesn't return or doesn't make it to schedule, then in order to debug we can follow the call stack.
msleep calls schedule_timeout_uninterruptible(timeout) which calls schedule_timeout(timeout) which in the default case exits without calling schedule if the timeout in jiffies passed to it is < 0, so that is one thing to check.
If timeout is positive , then setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); is called, followed by __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); before calling schedule.
If we aren't getting to schedule then something must be happening in either setup_timer_on_stack or __mod_timer.
The calltrace for setup_timer_on_stack is setup_timer_on_stack calls setup_timer_on_stack_key which calls init_timer_on_stack_key is either external if CONFIG_DEBUG_OBJECTS_TIMERS is enabled or calls init_timer_key(timer, name, key);which calls
debug_init followed by __init_timer(timer, name, key).
__mod_timer first calls timer_stats_timer_set_start_info(timer); then a whole lot of other function calls.
I would advise starting by putting a printk or two in schedule_timeout probably either side of the setup_timer_on_stack call or either side of the __mod_timer call.
This problem has been solved.
With liberal use of prink it was determined that schedule() indeed switches to another task, the idle task. In this instance, being an embedded Linux, the original code base I copied from installed an idle task. That idle task seems not appropriate for my board and has locked up the CPU and thus causing the crash. Commenting out the call to the idle task
http://lxr.linux.no/linux+v2.6.37/arch/arm/mach-davinci/cpuidle.c#L93
works around the problem.

Address space identifiers using qemu for i386 linux kernel

Friends, I am working on an in-house architectural simulator which is used to simulate the timing-effect of a code running on different architectural parameters like core, memory hierarchy and interconnects.
I am working on a module takes the actual trace of a running program from an emulator like "PinTool" and "qemu-linux-user" and feed this trace to the simulator.
Till now my approach was like this :
1) take objdump of a binary executable and parse this information.
2) Now the emulator has to just feed me an instruction-pointer and other info like load-address/store-address.
Such approaches work only if the program content is known.
But now I have been trying to take traces of an executable running on top of a standard linux-kernel. The problem now is that the base kernel image does not contain the code for LKM(Loadable Kernel Modules). Also the daemons are not known when starting a kernel.
So, my approach to this solution is :
1) use qemu to emulate a machine.
2) When an instruction is encountered for the first time, I will parse it and save this info. for later.
3) create a helper function which sends the ip, load/store address when an instruction is executed.
i am stuck in step2. how do i differentiate between different processes from qemu which is just an emulator and does not know anything about the guest OS ??
I can modify the scheduler of the guest OS but I am really not able to figure out the way forward.
Sorry if the question is very lengthy. I know I could have abstracted some part but felt that some part of it gives an explanation of the context of the problem.
In the first case, using qemu-linux-user to perform user mode emulation of a single program, the task is quite easy because the memory is linear and there is no virtual memory involved in the emulator. The second case of whole system emulation is a lot more complex, because you basically have to parse the addresses out of the kernel structures.
If you can get the virtual addresses directly out of QEmu, your job is a bit easier; then you just need to identify the process and everything else functions just like in the single-process case. You might be able to get the PID by faking a system call to get_pid().
Otherwise, this all seems quite a bit similar to debugging a system from a physical memory dump. There are some tools for this task. They are probably too slow to run for every instruction, though, but you can look for hints there.

Resources