HTTP JVM: boom : Message in domino server log - xpages

I sometimes see the message "HTTP JVM: boom" in the server log.
What does it mean?
Can you please explain whether something is going wrong?
The server also crashes sometimes; is this a clue to that? We have a big XPages application running on the server, which is accessed by a lot of people all the time, posting big image files etc.

This is a sysout from developer unit testing code included as a consequence of a concurrent code merge conflict in an early 8.5.3 codestream. It has since been removed from the codestream. It was part of debugging fail-safe try/catch code introduced when making the CD/MIME code more robust but not something you should be alarmed about. It is highly unlikely to be related to the crash you sometimes experience - you should report this crash via the regular IBM support channel.

It is probably a result of a System.out.println in your Java code that is running somewhere.


Server performance degrading over time (a short time) - Xpages

I have an xpages 9.0.1 server running multiple online form type apps. The server runs fine and performance response is quite good. Pages load fast, users are happy.
Over time (I have yet to determine how long), the server performance degrades and ultimately grinds almost to a halt.
Each night I am scheduling -c "tel http restart" and it is getting me out of trouble.
I am not sure what page is causing the problem as the degrading happens over a couple of days.
Most of our XPages are SSJS; all of our Java (of which there is not much) is appropriately recycled.
It does not seem to be affecting RAM: memory usage bounces up a bit and down a bit but stays well within limits. There is no correlation between the increased response times and more memory being used.
So where do I look, and what tools can I use to isolate the problem? We are more Dev than Admin.
Cheers
Damien
There are profiling tools available that may help pinpoint which application is causing problems. From OpenNTF, XPages Toolbox is specifically for XPages and was contributed by Philippe Riand, who at the time was Chief Architect for XPages http://www.openntf.org/main.nsf/project.xsp?r=project/XPages%20Toolbox.
There are more heavy-duty, Java-specific tools like YourKit available.
Chapter 20 of Mastering XPages second edition specifically covers performance and there is also a lot of information in XPages Portable Command Guide about performance tuning.
If performance is degrading over time, it could be session timeout. By default, that's 30 minutes. You can extend it, but the danger then is that a browser cannot tell the server it's closing the session when the user closes the browser. So those sessions hang around. Equally if there are very long-running tasks, they would hang around until they complete and the session would then still be active until the timeout.
Are you recycling your SSJS?
If you go into the server tasks in Domino Admin, what is the CPU usage of the HTTP task doing? Also, what is the memory usage of your nHTTP task? You may want to watch that.
Have you looked at the console to see whether there is anything that looks bad?
If you can't pinpoint a problem, you may want to think about moving some of your pages to a different server to determine which app, if not all of them, is causing this.
Are you using scoped variables with session or application scope? Application scope variables stay alive, so if you are creating those and have some sort of issue where you end up creating a bunch of them, that can affect memory.
Also, there is a server and application setting for when XPages stay in memory. The suggested setting is to keep only the current page in memory and save the others to disk. This is in the XSP properties.

Benchmarking Apache using Specweb 2009

I am benchmarking Apache 2.2 performance using SPECweb2009, specifically the Support workload. I want to load the server so that it runs at high CPU load.
The issue is that when I run the test using 10 Apache server processes and 25 load clients, the test fails with errors like:
HTTPRequestSched: [ERROR] Bad response (-1). Request was GET /support/downloads/dir0000000019/download5_0 HTTP/1.1
SPECweb_Support: [ERROR] STATE 6; makeHttpRequest() failed.
Connection: [ERROR] Bad status: 404
and still the CPU usage for the Apache processes reaches a maximum of only 0.7%.
On the other hand, if I use more Apache processes, say 20, with 25 client load processes, my test passes the QoS without any errors, but the CPU usage for Apache still reaches a maximum of only 0.7%.
My understanding is that in the first case the Apache processes are heavily loaded, so they are not able to send the responses; but in that case they should show high CPU utilization, and they do not.
Am I missing something in the config files?
I would really appreciate it if anyone could suggest good links or config changes to rectify the issue.
Some of my config files are:
httpd-mpm.conf
httpd.conf
SPECweb_Support.config
Test.config
Thanks
You do know that SpecWeb2009 is retired now?
Having said that, the last time I set this up was a few years ago, and from what I remember, to get it working you had to follow the recommended steps carefully (verifying as you go along). In particular, you need to set up for an entire SPECweb suite test; otherwise it's pointless and you might as well use Apache Bench or something else.
In terms of documentation/guidelines, I found that looking through reports worked best in showing how the various elements were set up. This is a typical report; scroll down and read the Notes (HTTP Software Notes). That was an IBM-originated report, and they discuss a little how they configured Apache in this PDF report (see from around page 14 onwards).
So, on the whole, I would say:
ensure you have a good set of guides
be prepared to configure for the whole suite, or seek another benchmarking solution
go through various reports and compare their configs with what you have.
don't be reluctant to start over when things don't seem to be working.
Some documentation links:
a specweb2009 suggested design document
of course, the spec2009 user manual (in case you haven't seen it)
this is a Chinese site (I think) but the config directives are in English
this brief paper talks about web server benchmarking methodologies (seems a useful quick reference)
Hope you find this useful ..

Azure ServiceBus Queue Timing Out

Encountering a strange issue with one of our queues (for production, no less). When I try to put a message onto the queue, it's throwing an exception that simply states:
A timeout has occurred during the operation
The messages do seem to be making it onto the queue, as evidenced by the fact that I can see the queue length increasing in the management portal. However, the client application is not receiving any messages.
The management portal shows that there have been several failed requests, and also several internal server exceptions; though unfortunately I don't see any way to get more details about those failed requests and errors.
I'm somewhat at a loss as to what may have caused this, how to get more information about what's wrong, and how to move ahead in troubleshooting this. Any help would be greatly appreciated.
edit: I should mention, just for completeness' sake, that I did not make any changes to the clients that I'm aware of; this issue just sort of started happening all of a sudden
edit #2, woke up this morning, and things have magically returned to normal. Still not sure what happened, so I'd like to change the tone of the question to solicit suggestions as to how this kind of thing may be mitigated and/or troubleshooted (troubleshot? troubleshat? :) ) better
I have experienced this scenario too. When I created a new Service Bus namespace and pointed my app to it, it worked for me. This suggests that there might be some hardware failure going on (on the node where your Service Bus namespace resides).
Be sure to use transient failure handling, for example http://www.nuget.org/packages/EnterpriseLibrary.WindowsAzure.TransientFaultHandling/
But it might also be necessary to use a "second level retry" for errors that are not transient. This you have to code yourself.
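A retry loop of this kind could look roughly like the sketch below. It is written in Python purely as an illustration and does not use the Service Bus client or the Enterprise Library package; the names `TransientError` and `send_with_retry`, the attempt count, and the delays are all assumptions. Because the question notes that messages still reached the queue despite the timeout, a retry may enqueue a duplicate, so the receiver may need to tolerate that.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as a timeout."""


def send_with_retry(send, message, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(message), retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return send(message)
        except TransientError:
            if attempt == attempts:
                # Out of first-level retries: let the caller's
                # "second level" handling decide what to do next.
                raise
            # Double the delay on each attempt and add a little jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))
```

The caller wraps its real send call in a function that raises `TransientError` for timeouts, and catches the re-raised error to queue the message for a later, slower retry.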
To be more fault tolerant you can also use the new paired namespaces feature. Here is a good resource: http://msdn.microsoft.com/en-us/library/dn292562.aspx
Hth
//Peter

Node.js and Socket.io become unresponsive

I have a relatively simple chat-type application running on Node.js and Socket.io. The node server streams chat data from a Minecraft server and then streams this to any clients connected on the website using Socket.io. A working demo of the system can be found here: standardsurvival.com/chat.
It works decently fine for the most part, but every once in a while the node server stops responding and active connections die shortly thereafter. The process will start consuming 100% CPU during this time but memory always stays relatively constant, so I'm doubting any sort of memory leak is involved.
It's been super frustrating as I haven't been able to reproduce the issue consistently enough to figure out what the problem is, and I don't know where to look. I've been setting up loops and commenting out various parts of the pipeline between the node server and website to try and pinpoint what may be causing it. No luck as of yet.
The code behind this system can be found here and here.
Any ideas?
Well, I ended up figuring out what the problem was. A library I'm using was opening net.Sockets for standard HTTP requests to the Minecraft server, but was never actually closing them. Apparently the "end" event was never called when the request finished. So eventually all available file handles for the process were used up, causing new requests to fail outright, which made the server appear to stop responding. I would have found this out sooner if I had logged this error. Lesson learned.
I added a timeout to all sockets to fix this at least temporarily. Now the server has been running for days without a single issue :)
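The two lessons above, putting a timeout on every socket and logging failures, can be sketched as follows. This is a Python illustration of the pattern, not the Node.js code from the project; the function names and the 5-second default are assumptions.

```python
import logging
import socket

log = logging.getLogger("fetch")


def fetch_status_line(host, port, timeout=5.0):
    """Send a minimal HTTP request and return the first line of the response.

    The timeout bounds the connect and each read, and the with-block closes
    the socket even if the peer never finishes the response.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        data = sock.recv(4096)
    return data.split(b"\r\n", 1)[0].decode(errors="replace")


def safe_fetch(host, port, timeout=5.0):
    """Like fetch_status_line, but log any socket error and return None."""
    try:
        return fetch_status_line(host, port, timeout)
    except OSError as exc:
        # OSError covers timeouts, refused connections, and running out of
        # file descriptors; logging it makes the failure visible early.
        log.warning("request to %s:%s failed: %r", host, port, exc)
        return None
```

Closing the socket in every code path is what keeps file handles from leaking; the timeout only guarantees that a stalled request eventually reaches that cleanup.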

IIS7: Faulting application w3wp.exe, what is the root cause of these crashes?

Our website is in .NET, but it also contains some old ASP and 32-bit libraries. It had been working fine for a while (2 years). But for the past month, we have seen the following error on our IIS7 server, which we have been unable to track down and fix:
"Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18215, time stamp 0x4995344f, exception code 0xe053534f, fault offset 0x0002f328, process id 0x%9, application start time 0x%10."
We are able to reproduce the error:
One of our .ASPX pages starts loading, executing code and queries (we have response.flush() all over the page to track where the code breaks), then it suddenly stops and we get the above error in IIS.
The page stops loading and, without the response.flush(), it's not redirecting to our error.aspx page (as configured in web.config)
The error does NOT happen all the time. Sometimes, it happens 3 times in a row, then it's working fine for 15 minutes non-stop with a proper redirection to error.aspx.
The error we get then is a classic: "Either BOF or EOF is True, or the current record has been deleted."
When the error occurs, the page hangs, and all other sessions on the same computer, from any browser, have hanging web pages as well (BTW, we only allow 1 worker process while we are testing). From other computers, the site loads fine.
I can recycle the Application Pool, kill w3wp.exe, or restart IIS; nothing helps. The only way to successfully load the page again is to restart MS SQL, which handles our session state. I don't know why this is, but we guess that the session cookies in the users' browsers point to a thread which was not terminated properly (due to the above crash) and IIS is waiting for it to terminate before processing more code (?). If someone can explain this better, that would be really helpful. Is there a timeout which we can set to "terminate" threads? Is it an MS SQL related issue?
I have also looked at the Private and Virtual Memory usages, because I think our code is not the most effective and I am certain we have remaining memory leaks. However, I saw the page crash even though both Private and Virtual Memories were still quite low (under 100MB each).
I have used Debug Diag and WinDbg as indicated here: http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx, but we are not able to make windbg work, this is what we are trying to do at the moment.
If someone could help us or point us toward the right direction that would be really great, thank you.
"Either BOF or EOF is True, or the current record has been deleted" means the table is empty and you are attempting to do a MoveNext. So check for eof before you do any moves.
IIS is notorious for throwing kernel errors in w3wp.exe like this one. All your errors in session state are just symptoms of the crashed process. Multiple APP pools won't help much - they just spread the error around.
I'd wager it is SQL deadlocks due to your user environment changing. This will cause a 10-second lag as SQL tries to determine which query to kill off. One wins, one loses. The loser gets back a pointer to an unexpectedly empty table, you try a move, and a crash follows. You could perhaps point your DB to an ODBC connection and turn on tracing, or figure out a way to get SQL to log it.
I had all the same symptoms as above in Perl. I was able to make a wrapper fn() to do all SQL queries and log all sql, + params and any errors to disk to track down the problem. It was deadlocks, then we were able to code in auto-retry, and eventually we recoded the query order and scanned columns to eliminate the deadlocks.
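The wrapper idea described above can be sketched roughly as follows. This is a minimal Python illustration using the standard-library sqlite3 module, not the Perl code or the SQL Server setup discussed here; the names `run_query` and `first_user_name`, the `users` table, the retry counts, and the use of `sqlite3.OperationalError` as a stand-in for a deadlock error are all assumptions. It logs every statement with its parameters and any error, retries a few times, and the caller checks for an empty result before reading a row, which is the equivalent of testing EOF before a MoveNext.

```python
import logging
import sqlite3
import time

log = logging.getLogger("sql")


def run_query(conn, sql, params=(), retries=3, delay=0.1, sleep=time.sleep):
    """Run one statement, logging the SQL, its parameters, and any error.

    sqlite3.OperationalError stands in for a deadlock error here (assumption);
    a real driver would need its own deadlock check.
    """
    for attempt in range(1, retries + 1):
        log.info("attempt %d: %s | params=%r", attempt, sql, params)
        try:
            return conn.execute(sql, params).fetchall()
        except sqlite3.OperationalError as exc:
            log.error("failed: %s | params=%r | error=%s", sql, params, exc)
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(delay * attempt)  # brief pause before the automatic retry


def first_user_name(conn):
    """Return the first name, or None when the result set is empty."""
    rows = run_query(conn, "SELECT name FROM users ORDER BY id")
    if not rows:  # the "EOF" case: nothing to move to, so do not index
        return None
    return rows[0][0]
```

Routing every query through one function means the log shows the exact statement and parameters that failed. In a real system the retry would be limited to the driver's deadlock error (on SQL Server, error 1205) rather than every operational error.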
It's entirely possible that one of your referenced/linked assemblies has randomly become corrupt on disk (it can happen). Can you try to replicate the problem on a new, clean machine with the same specs and fresh installs of the latest xyz drivers you're using?
I solved a mysterious problem that took me months to isolate this way. It seemed that clean, new machines with the same specs and prerequisite drivers would work just fine; only some older machines with the same specs were failing consistently. I ended up uninstalling everything (IIS, ASP.NET, .NET, database and client) and starting from scratch. The cause, when I finally isolated it, was that the DB client driver was corrupt on the older machines (all the older machines were clones of each other, so I assume they were cloned after the corruption occurred), and it seemed to be messing with the .NET memory space even when I wasn't calling it directly. I have yet to even reply to my "help me debug this monster" post with this answer because I doubted it would ever help anyone.
We started receiving this error after installing windows updates on a Windows Server 2008R2 machine. Windows Process Activation Service (WAS) installs some additional site bindings that caused issues for our setup.
We removed net.tcp, net.pipe, net.msmq, and msmq.formatname bindings from our website and no longer got the faulting application exception.
This is probably an edge case, but in case someone coming here is using MVCMailer: I was getting this same error due to the .SendAsync() method on the mailers.
I switched them all to .Send() and the crashing stopped.
See this SO answer for ways to use the mailer async and avoid the crash (allegedly, I did not personally implement it)
