I have a strange problem with my multithreaded server. It is a Windows service and works similarly to an FTP server, managing socket connections to many clients. It was created using Delphi 2006 (Turbo Delphi) and works well on most machines. Unfortunately, on some machines it sometimes crashes without leaving any trace of its own (exceptions should be saved to the log, but are not). Sometimes the system shows a MessageBox (it is not a MessageBox from the service; I think it is a system MessageBox), but most often I see this information in the System EventLog:
Application popup: ht_switch.exe - Application Error : The exception unknown software exception (0x0eedfade) occurred in the application at location 0x77e4bef7.
In the Application EventLog I can see:
Faulting application ht_switch.exe, version 1.2.0.2, faulting module kernel32.dll, version 5.2.3790.5069, fault address 0x0000bef7.
Sometimes such entries appear in the Application or System EventLog but nothing happens -- my server works as usual. At other times it simply disappears, and then Service Manager reports in the EventLog that my service stopped unexpectedly.
I see no common scenario for this problem. It appears on some WinXP, Win2003 and Win2008 machines. All test machines have all MS patches applied.
I have read the answers to "0x0eedfade kernelbase.dll faulting module in d7 windows service", but I do not use the Dialogs unit.
What can I do to fix it? How to trace such 0x0eedfade exception?
EDIT
I tested my server for several days with both EurekaLog and madExcept.
EurekaLog:
The server works without problems. No exception is reported in the EventLog. No exception is reported in %AppData%\EurekaLab s.a.s\EurekaLog\Bug Reports\ (there should be a directory for my program, but it was not created -- I don't know whether it should have been created or whether this is an EurekaLog error).
EurekaLog7 has a problem with setting "Application Type" to Windows Service. It is a known problem and the authors are working on it. My service compiled with it works on WinXP but was not able to work on Win2003. It simply does not start.
madExcept:
The server worked for 4 hours and then crashed. I caught this exception in my thread:
EAccessViolation: Access violation at address 7C90100B in module 'ntdll.dll'. Read of address 00000018!!!
I did not find any madExcept report for this exception. After the exception, one thread was lost with its socket in the CLOSE_WAIT state (the other side had closed the connection). I then restarted my service, and for the next few hours it worked without problems.
With EurekaLog and madExcept disabled:
After 10-30 minutes I see a MessageBox with the error. But the 0x0eedfade error is cryptic and gives me no hint about the source of the problem. It is also very strange because after displaying such a message the service works without problems (most of the time).
Summary of exception interceptors:
EurekaLog and madExcept are probably good at handling exceptions raised by Delphi, but it seems that either they change the behavior of my service so that the error magically disappears, or they report the exception to a place I cannot find.
EDIT: Problem solved
After some debugging that led me nowhere (the Call Stack showed very strange places), I gave up on it and started to inspect the most recently committed changes. One change was a string operation where the string (AnsiString) can be of length 64 or 128 (some kind of bit mask). I set the 70th character of a string that had earlier been allocated with SetLength(buffer, 64). That was the problem. I think I would have saved time by enabling range checking.
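The failure described above, a write to position 70 of a buffer sized to 64, is the kind of mistake range checking is meant to stop at the faulty line rather than letting it surface later as memory corruption. In Delphi the {$R+} compiler directive enables such checks. The Python sketch below only illustrates the idea with an explicit check; the helper name set_mask_char and the 1-based positions (chosen to mirror Delphi strings) are assumptions, not code from the service.

```python
def set_mask_char(buffer: bytearray, position: int, value: int) -> None:
    """Write one byte at a 1-based position, refusing out-of-range writes.

    Hypothetical helper: it mimics what a range-checked string write does,
    failing loudly at the faulty line instead of touching memory it does
    not own.
    """
    if not 1 <= position <= len(buffer):
        raise IndexError(
            f"position {position} is outside the buffer (length {len(buffer)})"
        )
    buffer[position - 1] = value


mask = bytearray(b"0" * 64)          # analogous to SetLength(buffer, 64)
set_mask_char(mask, 64, ord("1"))    # last valid position

try:
    set_mask_char(mask, 70, ord("1"))  # the kind of write that caused the crash
except IndexError as exc:
    range_error = str(exc)
```

With the check in place the bad write is reported immediately, together with the offending position and the actual buffer length.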
How to trace such 0x0eedfade exception?
This is the code for a Delphi exception. Clearly you are raising a Delphi exception that is not being handled and that is bringing your process down.
You should add madExcept, EurekaLog, JCLDebug or similar to your process. These tools will produce diagnostic reports when your process fails. The most useful part of those reports will be the stack trace at the point of failure. You should then be able to work out where the failure occurs, at the very least, and this is usually enough to work out what is wrong with your code.
Related
I have an application in the Production environment which is Windows Server 2012/IIS 8 and is load balanced.
Recently, out of nowhere, the website's app pool suddenly started getting disabled. The Windows System log recorded the following error message from the Resource-Exhaustion-Detector ...
Application Pool 'x' is being automatically disabled due to a series of failures in the process(es) serving that application pool.
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: w3wp.exe (6604) consumed 5080641536 bytes, w3wp.exe (1572) consumed 477335552 bytes, and w3wp.exe (352) consumed 431423488 bytes.
Anyone got any idea how I figure out what is happening? We've never come across this issue before and the application has been running for a good couple of years.
Also, this isn't something that happens regularly; it seems to happen once every day or so, and even then at a random time. The virtual memory was initially 4GB, but because of the issue above we increased it to 8GB. Recently usage spiked to about 6.8GB out of 8GB, which it has no reason to do.
Any help would be really appreciated!
You have two issues here:
1- You have a serious bug in your process/code that happens intermittently. You need to debug it to find how and when it happens, or at least run ProcDump and keep it attached to the W3WP process on the server until an exception happens, then analyze the dump to find where the code gets stuck and consumes that memory. Otherwise, just debug the code and see what changes were made in the last few months (not days).
2- The application pool gets stopped because it is configured (by you or by default) to be disabled after a certain number of repeated failures. That is normal behavior; the main issue, as I said, is not the application pool itself but something inside the process.
Please let me know if you need further explanation or help on this matter.
I'm using NServiceBus (v 4.0.5) on an Azure virtual machine using the Azure Service Bus transport (v 4.0.5). The NServiceBus.Host service has been crashing on an occasional basis but lately has been crashing more often than not. The exception thrown is:
Application: NServiceBus.Host.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ServiceBus.Common.CallbackException
Stack:
at Microsoft.ServiceBus.Common.Fx+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
I'm using a dedicated machine running the generic host service, and I have 3 machines which send messages to it (I don't use pub/sub).
What I've tried
Rebooting / restarting the service manually.
Researching the error: not many people seem to have received this message, and for the people that have, their response did not apply to my situation.
Verifying the dead letter queue: several messages are placed in the dead letter queue (over 400 in the past 6 months), but I could not correlate any specific message types to the crash (at least 40% of my message types have been found in the dead letter queue). I'm assuming that most of these messages have been added to the DLQ because the service is failing.
Checking application logs: my application logs exceptions to a log4net log, however no exceptions were logged during the time of the crashes.
Checking event logs: nothing relevant was found except for the main error message noted above.
Upgrading NServiceBus to 4.4.2 and WindowsAzureServiceBus package to 5.1.1: due to NuGet package conflicts upgrading is proving to be painful. I'm using Microsoft.Data.OData 5.4.0 and Microsoft.Data.Edm 5.4.0, but the NServiceBus.Azure package depends on v5.2.0 of these assemblies. I could discard the nuget package dependencies and add the references myself, but I'd like to know why the WindowsAzureServiceBus package depends specifically on v5.2.0 before doing this.
Any thoughts or ideas would be helpful.
Thank you!
I will look into this. It sounds like a bug, most likely an unhandled exception coming from the Azure Service Bus (though it doesn't necessarily originate there).
I've created a github issue here: https://github.com/Particular/NServiceBus.Azure/issues/133
Are you able to reproduce the issue? And what has changed between the time when you saw it occasionally and now, when it happens often?
One thing you could do is add an event handler for all exceptions occurring in the app domain and log those as well. That should theoretically catch anything, and if there is an inner exception on this callback exception, you could capture it this way.
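The idea of a process-wide, last-chance handler that logs whatever no other code caught can be sketched as follows. The service in question is .NET, so this Python version is only an analogy: sys.excepthook and threading.excepthook stand in for the app-domain event, and install_last_chance_logging, describe and the logger name are hypothetical.

```python
import logging
import sys
import threading

log = logging.getLogger("last_chance")  # hypothetical logger name


def describe(exc):
    """Flatten an exception and its chained inner exceptions into one line."""
    parts = []
    while exc is not None:
        parts.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__ or exc.__context__
    return " <- ".join(parts)


def install_last_chance_logging():
    """Log exceptions that escape the main thread or any worker thread."""

    def on_main(exc_type, exc, tb):
        log.critical("unhandled: %s", describe(exc))

    def on_thread(args):
        name = args.thread.name if args.thread else "<unknown>"
        log.critical("unhandled in %s: %s", name, describe(args.exc_value))

    sys.excepthook = on_main
    threading.excepthook = on_thread
```

Calling install_last_chance_logging() once at startup is enough. Walking the chain of causes is what surfaces an inner exception hidden behind a generic wrapper.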
On the strict dependency of the packages: this is mostly done because the NuGet package manager does not apply binding redirects to the app.config of worker roles, which tripped up way too many users in the past (it often manifests itself as an infinitely rebooting worker role). So go ahead and override.
I have an iOS 4 application. Sometimes, after any crash in my application, I am not able to launch it again: it crashes again and again and will not launch for 4 or 5 consecutive attempts.
This is my application log error code:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x50000000
How can I solve this kind of application crash?
From the error, it looks like you are trying to access memory that has already been released or was never actually allocated.
If you have no idea where the crash is coming from, you have to work bottom-up to find where it occurs. Disable all functionality of your app and gradually turn it back on until the crash comes up. That way you can isolate the memory issue you're having.
But first try running the Analyze function in Xcode - it may find potential problems in your code.
Behavior:
Application is loaded and being used as expected.
Suddenly, a particular DLL can no longer be loaded. The error message is:
ActiveX component cannot create object.
In each case, the object had been created successfully many times before failure. All objects are marked for "retain in memory".
This error is cleared when the application pool is recycled. It may be hours or months before it is seen again.
Issue has happened within two hours of a refresh, as well as never happened in months of uptime.
Issue has happened with hundreds of simultaneous users (heavy usage) and also with 1-3 users.
While the issue is occurring, the process running that application pool cannot create the object that is failing. However it can create any other objects. Memory, CPU, and other resources all remain at normal usage. In addition, other processes (such as a stand-alone exe) can successfully create the object.
The first instance of the issue appeared in mid 2008. There have been fewer than fifty instances since then, despite a pool of hundreds of servers for it to occur on. All instances except one have failed on the same DLL.
DLL Failure Info:
most common - generic data structure implementing a b-tree, has no references other than to its interface. Code consists of arrays and one use of the vb6 Event functionality. The object has not been changed in any way since 2005.
one-time - interop to a .NET module. the failure is occurring when trying to create the interop object, not the .NET object. This object is updated a few times each year.
Application Environment:
IIS hosted application
VB6, classic ASP, some interop to minor .NET components
Windows Server 2003 / Windows Server 2008 (both have independently had the problem)
Attempts to Reproduce:
Using scripts (and real-life humans) to run the same end-user workflows that our logs reported the days before the issue occurred.
Using scripts to create/destroy suspected objects as fast as possible from multiple simultaneous sessions.
Wild speculation.
No intentional success, but it does manifest randomly on the servers on its own.
Troubleshooting:
Code reviews
Test harnesses to investigate upper limits of object creation / destruction
Verification of ability to create object outside of the process experiencing the issue
Monitoring of resources over time on servers under load
Review of IIS, error, and event logs to determine events leading up to issue
Questions:
Any ideas on how to reproduce the issue?
What could cause this behavior?
Ideas for bypassing the first two questions in favor of a fast solution?
The DLL isn't on a network drive, is it? You can get "glitches" where the drive is momentarily unavailable, which means COM can't do what it needs and could then fail to notice that the drive is available again.
I used Process Monitor to debug a similar problem when accessing the ADO/OLEDB stack. It turned out the environment got corrupted at some point, and the ADO classes are registered with InprocServer32 being a REG_EXPAND_SZ value pointing to %CommonProgramFiles%\System\ado\msado15.dll, or similar on x64 OSes.
Also, when you register an application with Restart Manager, on failure the process gets restarted by the winlogon process, whose environment is different from Explorer's and unfortunately is missing %CommonProgramFiles% -- ouch!
This seems like a random failure; some race condition.
Try VMware to record the state of the machine you run this DLL on. When the error happens you can then replay the recording and inspect the memory contents. That way you won't have to play try-and-catch with the error. At least you will have a solid record of it.
While I can't provide a solution, try catching the error when it happens, refreshing the environment, and then retrying the DLL load.
Our website is in .NET, but it also includes some old ASP and 32-bit libraries. It had been working fine for a while (2 years). But for the past month, we have seen the following error on our IIS7 server, which we have been unable to track down and fix:
"Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18215, time stamp 0x4995344f, exception code 0xe053534f, fault offset 0x0002f328, process id 0x%9, application start time 0x%10."
We are able to reproduce the error:
One of our .ASPX pages starts loading, executing code and queries (we have response.flush() all over the page to track where the code breaks), then it suddenly stops and we get the above error in IIS.
The page stops loading and, without the response.flush(), it's not redirecting to our error.aspx page (as configured in web.config)
The error does NOT happen all the time. Sometimes, it happens 3 times in a row, then it's working fine for 15 minutes non-stop with a proper redirection to error.aspx.
The error we get then is a classic: "Either BOF or EOF is True, or the current record has been deleted."
When the error occurs, the page hangs, and all other sessions on the same computer, from any browser, have hanging web pages as well (BTW, we only allow 1 worker process while we are testing). From other computers, the site loads fine.
I can recycle the Application Pool, kill w3wp.exe, or restart IIS; nothing helps. The only way to successfully load the page again is to restart MS SQL, which handles our session state. I don't know why this is, but we guessed that the session cookies in the users' browsers point to a thread which was not terminated properly (due to the above crash) and IIS is waiting for it to terminate before processing more code (?). If someone can explain this better, that would be really helpful. Is there a timeout which we can set to "terminate" threads? Is it an MS SQL related issue?
I have also looked at the Private and Virtual Memory usages, because I think our code is not the most effective and I am certain we have remaining memory leaks. However, I saw the page crash even though both Private and Virtual Memories were still quite low (under 100MB each).
I have used Debug Diag and WinDbg as indicated here: http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx, but we are not able to make WinDbg work; this is what we are trying to do at the moment.
If someone could help us or point us toward the right direction that would be really great, thank you.
"Either BOF or EOF is True, or the current record has been deleted" means the recordset is empty (the query returned no rows) and you are attempting to do a MoveNext. So check for EOF before you do any moves.
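A minimal sketch of the "check before you move" rule, written in Python with sqlite3 rather than ADO: fetchone() returning None plays the role of EOF being True. The users table and the function name are made up for illustration.

```python
import sqlite3


def first_name_or_none(conn, user_id):
    """Return the matching name, or None when the query yields no rows."""
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:  # the equivalent of testing EOF before any move
        return None
    return row[0]


# Hypothetical in-memory data, only so the sketch can be run as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
```

The point is that an empty result is treated as an ordinary outcome with its own code path, not something discovered when the next read fails.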
IIS is notorious for throwing kernel errors in w3wp.exe like this one. All your errors in session state are just symptoms of the crashed process. Multiple app pools won't help much - they just spread the error around.
I'd wager it is SQL deadlocks due to your user environment changing. This will cause a 10-second lag as SQL tries to determine which query to kill off. One wins, one loses. The loser gets back a pointer to an unexpectedly empty table, you try a move, and the crash follows. You could maybe point your DB to an ODBC connection and turn on tracing, or figure out a way to get SQL to log it.
I had all the same symptoms as above in Perl. I was able to make a wrapper fn() to do all SQL queries and log all SQL, params and any errors to disk to track down the problem. It was deadlocks. We were then able to code in auto-retry, and eventually we reworked the query order and the scanned columns to eliminate the deadlocks.
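The wrapper approach described above might look roughly like this. It is a sketch in Python with sqlite3, not the original Perl: run_query, its parameters and the logger name are hypothetical, and sqlite3.OperationalError merely stands in for whatever deadlock error the real driver reports.

```python
import logging
import sqlite3
import time

log = logging.getLogger("sql")  # hypothetical logger name


def run_query(conn, sql, params=(), retries=3, delay=0.0,
              retry_on=(sqlite3.OperationalError,)):
    """Execute one statement, logging SQL, params and errors; retry on failure.

    retry_on stands in for "this was a deadlock"; a real driver would need
    its own error-code check here (assumption, not from the original post).
    """
    for attempt in range(1, retries + 1):
        log.info("attempt %d: %s params=%r", attempt, sql, params)
        try:
            return conn.execute(sql, params).fetchall()
        except retry_on as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            time.sleep(delay)
```

Routing every query through one function gives a single place to log what ran, with which parameters, and how it failed. Retrying only hides the symptom, which is why the query order still had to be fixed afterwards.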
It's entirely possible that one of your referenced/linked assemblies has randomly become corrupt on disk (it can happen). Can you try to replicate the problem on a new, clean machine with the same specs and fresh installs of the latest xyz drivers you're using?
I solved a mysterious problem that took me months to isolate this way. It seemed that clean, new machines with the same specs and prerequisite drivers would work just fine - only some older machines with the same specs were failing consistently. I ended up uninstalling everything (IIS, ASP.NET, .NET, database and client) and starting from scratch. The root cause, once I isolated it, was that the db client driver was corrupt on the older machines (all the older machines were clones of each other, so I assume they were cloned after the corruption occurred), and it seemed to be messing with the .NET memory space even when I wasn't calling it directly. I have yet to even reply to my "help me debug this monster" post with this answer because I doubted it would ever help anyone.
We started receiving this error after installing windows updates on a Windows Server 2008R2 machine. Windows Process Activation Service (WAS) installs some additional site bindings that caused issues for our setup.
We removed net.tcp, net.pipe, net.msmq, and msmq.formatname bindings from our website and no longer got the faulting application exception.
This is probably an edge case, but just in case someone comes here and is using MVCMailer: I was getting this same error due to the .SendAsync() method on the mailers.
I switched them all to .Send() and the crashing stopped.
See this SO answer for ways to use the mailer async and avoid the crash (allegedly; I did not personally implement it).