I am running Redmine on Apache 2 with mod_rails (passenger) 2.0.3 and Enterprise Ruby 1.8.6. Every so often I get a segfault from Apache when I try to login. Anyone know how I can debug this issue? I see something like this in Apache's error.log:
[Mon Jan 19 17:09:48 2009] [notice] child pid 8714 exit signal Segmentation fault (11)
The only way I can get the application to work after that is to restart the whole system (restarting Apache alone doesn't help).
First steps are:
1. Find out where the core file is being left on your system (enable core dumps if necessary).
2. Run file(1) on the resulting core file. This will probably say "... generated by httpd", but it's as well to check.
3. Fire up gdb against the executable name from (2) and the core file from (1), and start digging. The where command (or bt) is a good place to start: it gives you a stack trace from the moment the process dumped core.
It sounds like you don't have a mass of C coding experience, so good luck! Tracking down this kind of error can be a real dog. You can try posting the stack trace from (3) here, but don't hold your breath whilst waiting for an answer. At best, the failing function name might be a good string to feed to Google.
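The steps above can be sketched as shell commands. The httpd path and the plain "core" filename are assumptions; where the dump actually lands is governed by /proc/sys/kernel/core_pattern on your system:

```shell
# 1. Enable core dumps for this shell and find out where they land
ulimit -c unlimited
cat /proc/sys/kernel/core_pattern   # a plain "core" means the process's cwd

# 2. Confirm which executable produced the dump
file core                           # expect "... from 'httpd'" or similar

# 3. Open it in gdb and grab a backtrace non-interactively
gdb -batch -ex bt /usr/sbin/httpd core
```

With Passenger you may need to set the limit in Apache's init script, since the worker processes inherit it from there rather than from your login shell.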
I ran into a similar issue with a segfault (11), and found a question on ServerFault which offered an upgrade as the solution.
Was running an older version of Ubuntu and had the segfault problem. A do-release-upgrade brought my system to Ubuntu 11.10 and the problem magically went away.
I've been trying to install MIT Scheme under a 64-bit Windows 10 installation, however whenever I try to start the program I get the following error message:
The system has trapped within critical section "band load".
The trap is an ACCESS_VIOLATION trap.
Successful recovery is unlikely.
Then I'm presented with the option to try to recover, but the program then crashes with another ACCESS_VIOLATION.
I have already tried installing it in different directories and drives, setting the heap size, running with and without --edit and running it in multiple compatibility modes.
GDB reports this:
Thread 1 received signal SIGSEGV, Segmentation fault.
0x779f2a4c in win32u!NtUserMessageCall () from C:\WINDOWS\SysWOW64\win32u.dll
Is there a way to fix this problem or should I just not bother and try another implementation?
Thank you for your help!
I am having a recurring problem when using perf with the Intel-PT event. I am currently profiling on an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz machine, with x86_64 architecture and 32 hardware threads, with virtualization enabled. I specifically use programs/source code from SpecCPU2006 for profiling.
I am specifically observing that the first time I profile one of the compiled binaries from SpecCPU2006, everything works fine and the perf.data file gets generated, as expected with Intel-PT. As the SpecCPU2006 programs are computationally intensive (they use 100% of the CPU at any time), the perf.data files are understandably large: I obtain roughly 7-10 GB perf.data files for most of the profiled programs.
However, when I try to profile the same compiled binary a second time, after the first run has finished successfully, my server machine freezes up. Sometimes this happens on the third or fourth attempt instead (after the previous runs completed successfully). The behavior is highly unpredictable. Once it happens, I cannot profile any more binaries until I restart the machine.
I have also posted the server error logs which I get once I see that the computer has stopped responding.
Server error logs
Clearly there is an error message saying Fixing recursive fault but reboot is needed!.
This happens particularly with the larger SpecCPU2006 binaries, those that take more than a minute to run without perf.
Is there any particular reason why this might happen? It should not be caused by high CPU usage alone, as running the programs without perf, or with perf but any other hardware event (as listed by perf list), completes successfully. It only seems to happen with Intel-PT.
Any guidance on the steps to debug or solve this problem would be appreciated. Thanks.
It seems I have resolved this issue, so I will post an answer.
The server crashed because of a NULL pointer dereference happening on a specific member of the structure perf_event; the member perf_event->handle was the culprit. This information, as suggested by @osgx, was obtained from the /var/log/syslog file. A portion of the error message was:
Apr 19 04:49:15 ###### kernel: [582411.404677] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ea
Apr 19 04:49:15 ###### kernel: [582411.404747] IP: [] perf_event_aux_event+0x2e/0xf0
One possible scenario where this structure member turns out to be NULL is if I start capturing packets even before an earlier run of perf record finished releasing all of its resources. This has been properly handled in kernel version 4.10. I was using kernel version 4.4.
I upgraded my kernel to the newer version and it works fine now!
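As a defensive check before tracing on a machine you haven't upgraded yet, you could warn when the running kernel predates the fix. A minimal sketch; the 4.10 cut-off comes from the explanation above, and the two-field version match is a rough heuristic, not a precise patch-level test:

```shell
# Warn if the kernel is older than 4.10, where the race hit during
# back-to-back `perf record -e intel_pt//` runs was addressed.
kver=$(uname -r | cut -d. -f1-2)
case "$kver" in
    [0-3].*|4.[0-9])
        echo "kernel $kver: upgrade to >= 4.10 before repeated intel_pt runs" ;;
    *)
        echo "kernel $kver: ok" ;;
esac
```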
I'm looking for a way to get rid of the (kernel?) messages that appear over my ncurses app. I wrote the app myself, so I would prefer an API that redirects these messages to /dev/null. I mean messages like the one printed when a USB stick is inserted.
I tried adding this, but unfortunately it doesn't work:
freopen("/dev/null", "w", stderr);
I'm not running X, just ncurses direct from the console.
Thanks!
UPDATE 1:
Someone voted to close this question as not being related to programming. But it is: I wrote the ncurses app myself, and I'm looking for a way to disable the kernel messages. I have updated the question.
UPDATE 2:
Let me explain what I'm doing, and what the problem is, in more detail:
I'm using Tiny Core Linux, which after boot starts a (self-written) ncurses program. Now when you, for example, connect a USB drive, a message (from the kernel, I suspect) is shown over my program. I guess the message is written straight to the framebuffer. I'm using TC 5.x since I need 32-bit; I'm running as root and have full access to the OS.
You should be able to use openvt to have your program run on a new Virtual Terminal.
I'll also note that it should be possible to embed control for the VTs yourself if you prefer to break the external dependency, but note that structures used may not be stable between kernel versions, and may require recompilation.
See the KBD project's sources, specifically openvt.c to see how it works.
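A minimal invocation might look like this; the binary path is a placeholder, -s switches to the new VT, and -w waits for the program to exit:

```shell
# Run the ncurses app on the first free virtual terminal (root required;
# openvt ships with the kbd package).
APP=/path/to/ncurses_app   # placeholder for your binary
if command -v openvt >/dev/null 2>&1; then
    openvt -s -w -- "$APP"
else
    echo "openvt not found: install the kbd package" >&2
fi
```

Kernel messages then keep going to the original console while your app owns its own VT.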
Try configuring the kernel through boot parameters with the option:
loglevel=3 (or a lower value)
0 (KERN_EMERG) system is unusable
1 (KERN_ALERT) action must be taken immediately
2 (KERN_CRIT) critical conditions
3 (KERN_ERR) error conditions
4 (KERN_WARNING) warning conditions
5 (KERN_NOTICE) normal but significant condition
6 (KERN_INFO) informational
7 (KERN_DEBUG) debug-level messages
source: https://www.kernel.org/doc/Documentation/kernel-parameters.txt
See also: Change default console loglevel during boot up
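If rebooting with a kernel parameter is inconvenient, the same cut-off can be applied at runtime as root. A sketch; dmesg -n sets the console loglevel, and /proc/sys/kernel/printk is the sysctl behind it (the remaining three fields are kept at common defaults here):

```shell
# Set the console loglevel to 3 (KERN_ERR), silencing KERN_INFO/KERN_NOTICE
# messages such as USB-insertion notices. Root is required for both forms.
dmesg -n 3
# Equivalently, the first field of the printk sysctl is the console loglevel:
echo "3 4 1 3" > /proc/sys/kernel/printk
```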
It might be impossible to stop another process with sufficient access from writing to /dev/console, but you may be able to redefine the console as some other device at boot time by setting console=ttyS0 (first serial port); see:
https://unix.stackexchange.com/questions/60641/linux-difference-between-dev-console-dev-tty-and-dev-tty0
Also, if we know exactly which piece of software is sending the message, it may be possible to reconfigure it (possibly dynamically), but it would help to know the version and edition of Tiny Core Linux you are using.
E.g. the download site below offers "Core", "TinyCore", and "CorePlus" editions in versions 1.x up to 7:
http://tinycorelinux.net/downloads.html
This would help reproducing the exact same behavior and testing potential solutions.
So I was looking into why a program was getting rid of my background, and the author of the program said to post .xsession-errors and many people did. Then my next question was: What is .xsession-errors? A google search reveals many results but nothing explaining what it is.
What I know so far:
It's some kind of error log. I can't figure out what it's related to (Ubuntu itself? programs?).
I have one and it seems like all Ubuntu systems have it, though I cannot verify.
Linux graphical interfaces (such as GNOME) provide a way to run applications by clicking on icons instead of running them manually on the command-line. However, by doing so, output from the command-line is lost - especially the error output (STDERR).
To deal with this, some display managers (such as GDM) pipe the error output to ~/.xsession-errors, which can then be used for debugging purposes. Note that since all applications launched this way dump to the same log, it can get quite large and difficult to find specific messages.
Update: Per the documentation:
The ~/.xsession-errors X session log file has been deprecated and is
no longer used.
It has been replaced by the systemd journal (journalctl command).
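To actually inspect either log, something like the following works; the journalctl flags assume a systemd user session:

```shell
# Older display managers: read the tail of the per-session stderr log
tail -n 20 ~/.xsession-errors
# systemd-based sessions: the same output now lands in the user journal
journalctl --user -b --no-pager | tail -n 20
```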
It's the error log produced by your X Window System (which the Ubuntu GUI is built on top of).
Basically it's quite a low level error log for X11.
I am an intern who was offered the task of porting a test application from Solaris to Red Hat. The application is written in Ada. It works just fine on the Unix side. I compiled it on the linux side, but now it is giving me a seg fault. I ran the debugger to see where the fault was and got this:
Warning: In non-Ada task, selecting an Ada task.
=> runtime tasking structures have not yet been initialized.
<non-Ada task> with thread id 0b7fe46c0
process received signal "Segmentation fault" [11]
task #1 stopped in _dl_allocate_tls
at 0870b71b: mov edx, [edi] ;edx := [edi]
This seg fault happens before any calls are made or anything is initialized. I have been told that 'tasks' in ada get started before the rest of the program, and the problem could be with a task that is running.
But here is the kicker. This program just generates some code for another program to use. The OTHER program, when compiled under linux gives me the same kind of seg fault with the same kind of error message. This leads me to believe there might be some little tweak I can use to fix all of this, but I just don't have enough knowledge about Unix, Linux, and Ada to figure this one out all by myself.
This is a total shot in the dark, but tasks can blow up like this at startup if they try to allocate too much local memory on the stack. Your main program can safely use the system stack, but tasks have their stacks allocated at startup from dynamic memory, so typically your runtime has a default stack size for tasks. If your task tries to allocate a large array, it can easily blow past that limit. I've had it happen to me before.
There are multiple ways to fix this. One way is to move all your task-local data into package global areas. Another is to dynamically allocate it all.
If you can figure out how much memory would be enough, you have a couple more options. You can make the task a task type, and then use a
for My_Task_Type_Name'Storage_Size use Some_Huge_Number;
statement. You can also put a pragma Storage_Size (Some_Huge_Number); inside the task definition, but I think the "for" statement is preferred.
Lastly, with Gnat you can also change the default task stack size with the -d flag to gnatbind.
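For instance, a hypothetical build line raising the default task stack to 8 MB might look like this (gnatmake's -bargs section passes switches through to gnatbind, and -d accepts a k or m unit suffix):

```shell
# Give every task an 8 MB default stack at bind time (GNAT toolchain)
gnatmake main.adb -bargs -d8m
```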
Off the top of my head: if the code was used on SPARC machines and you're now running on an x86 machine, you may be running into endianness problems.
It's not much help, but it is a common gotcha when going multiplat.
Hunch: the linking step didn't go right. Perhaps the wrong run-time startup library got linked in?
(How likely to find out what the real trouble was, months after the question was asked?)