My application, deployed on a WebLogic instance, sometimes becomes very slow. When that happens, the managed server log shows an error related to Stuck Thread Max Time (StuckThreadMaxTime). When I first noticed this, I did some research and increased Stuck Thread Max Time from 600 to 800 seconds, but that didn't fix the issue. I got the following error again:
WatchRule: (SEVERITY = 'Error') AND ((MSGID = 'WL-000337') OR (MSGID = 'BEA-000337'))
WatchData: MESSAGE = [STUCK] ExecuteThread: '58' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "812" seconds working on the request "Http Request Information: weblogic.servlet.internal.ServletRequestImpl#42f38088[POST /****/faces/index.jsf]
", which is more than the configured time (StuckThreadMaxTime) of "800" seconds in "server-failure-trigger". Stack trace:
oracle.jbo.pcoll.PCollNode.objectAt(PCollNode.java:1753)
oracle.jbo.pcoll.PCollNode.objectAt(PCollNode.java:1753)
oracle.jbo.pcoll.PCollection.elementAt(PCollection.java:839)
oracle.jbo.server.QueryCollection.get(QueryCollection.java:2556)
oracle.jbo.server.ViewRowSetImpl.getRow(ViewRowSetImpl.java:5540)
oracle.jbo.server.ViewRowSetIteratorImpl.getRangeIndexOf(ViewRowSetIteratorImpl.java:1179)
oracle.jbo.server.ViewRowSetIteratorImpl.notifyRowUpdated(ViewRowSetIteratorImpl.java:3491)
We are using:
ADF 12c for application development
WebLogic version: 12.2.1 on Windows Server
Database: Oracle 11g
JDK version: 1.8-65
Can anyone please advise me on the reason and possible solution for this issue?
Thanks in advance.
Increasing the stuck thread time will not resolve your issue. From the stack trace, it looks like your application may be reading some file. There can be several reasons for the slowness:
1) The file you are reading may be very large
2) The network may be slow
3) There may be I/O latency
If possible, add some debugging messages to your application. They will show what the application is doing at the moment it slows down.
If you are running your application in a Linux/Unix environment, you can monitor process activity under the /proc directory.
I have a Blazor server side app published on IIS 10.
When I browse to an arbitrary page and just let it idle, after a minute or so (sometimes only 45 seconds, sometimes between one and two minutes) the modal
Attempting to reconnect to server ...
appears for a couple of seconds.
In the browser console the logging shows either
Error: Connection disconnected with error 'Error: Server timeout
elapsed without receiving a message from the server.'.
or
Information: Connection disconnected.
Since this seems to be a timeout problem, I added the following options to ConfigureServices in my Startup.cs:
services.AddServerSideBlazor()
    .AddHubOptions(options =>
    {
        options.ClientTimeoutInterval = TimeSpan.FromMinutes(10);
        options.KeepAliveInterval = TimeSpan.FromSeconds(3);
        options.HandshakeTimeout = TimeSpan.FromMinutes(10);
    });
This does not solve the problem though.
I also went to the advanced settings of my site in IIS and increased the connection timeout from the default 120 sec to 600 sec. This did not help either.
Those frequent disconnections only happen on the live site hosted on IIS 10.
If I start the app locally with Visual Studio the connection is stable.
Any hints of what I'm missing would be appreciated!
Update:
As suggested by @agua from mars in a comment below, I changed the transport type like this:
app.UseEndpoints(endpoints =>
{
    endpoints.MapControllers();
    endpoints.MapBlazorHub(options => { options.Transports = HttpTransportType.LongPolling; });
    endpoints.MapFallbackToPage("/_Host");
});
With this change the connection is still closed. The console log shows
Information: (LongPolling transport) Poll terminated by server.
I also tried HttpTransportType.ServerSentEvents, which does not work at all and gives this error:
Error: Failed to start the connection: Error: Unable to connect to the
server with any of the available transports. ServerSentEvents failed:
Error: 'ServerSentEvents' does not support Binary.
Update 2:
The IIS is configured to use HTTP 1.1
I tried changing to HTTP/2 but this did not change anything regarding the disconnections.
This is related to application pool recycling in IIS, as stated by @Programmer. You can reproduce it by going to the application pool in IIS, right-clicking the pool, and choosing Recycle to force it. Your Blazor app will then show the reconnect modal.
I did not want to disable pool recycling, so I added this JavaScript to the _Host.cshtml file
<script>Blazor.defaultReconnectionHandler._reconnectCallback = function (d) {document.location.reload();}</script>
to automatically reconnect when the server comes back up.
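A possible refinement, sketched under assumptions: during a recycle the server may not be ready at the moment the callback fires, so the page could poll until the server answers and only then reload. The helper name reloadWhenServerIsBack and its parameters (ping, reload, intervalMs, maxAttempts) are hypothetical and not part of Blazor; _reconnectCallback is the same private hook used above and may change between Blazor versions.

```javascript
// Hypothetical helper (not a Blazor API): polls until `ping` reports that the
// server is up, then calls `reload`. Resolves to true if a reload was
// triggered, false if it gave up after `maxAttempts` tries.
async function reloadWhenServerIsBack(ping, reload, intervalMs = 2000, maxAttempts = 30) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let up = false;
    try {
      up = await ping();
    } catch (e) {
      up = false; // a network error is treated as "server still down"
    }
    if (up) {
      reload();
      return true;
    }
    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // the caller could show a manual "reload" prompt here
}

// Possible wiring in _Host.cshtml (assumption; left commented out so the
// sketch also runs outside a browser):
// Blazor.defaultReconnectionHandler._reconnectCallback = function () {
//   reloadWhenServerIsBack(
//     () => fetch(document.location.href, { method: "HEAD" }).then((r) => r.ok),
//     () => document.location.reload()
//   );
// };
```

Injecting ping and reload keeps the retry loop independent of the browser, so it can be exercised without a running site.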
Try this:
app.UseEndpoints(endpoints =>
{
    // other settings
    // ...
    endpoints.MapBlazorHub(options => options.WebSockets.CloseTimeout = new TimeSpan(1, 1, 1));
    // other settings
    // ...
});
This could be related to IIS application pool recycling. Try disabling the recycling to see if that's causing the disconnection.
I have the same problem on my Blazor server too: Myspector.com
I am sure this comes from the network of the data provider. With Othello in Germany over 4G, I see a disconnection within 5 seconds. Over Wi-Fi with T-Online, against the same target server, there is no disconnection at all.
I think some operators are incompatible with Blazor Server/WebSockets.
In my recent experience, especially on a shared server, it helps to increase the application pool memory. Our connectivity issues went away when we raised it from 256 MB to 1 GB for a small user base.
I need to connect a Node.js application to an AS/400 server. To do that, I installed Node.js on the AS/400 following the IBM documentation and tried (successfully) to send and receive data with Class iDataQueue (https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/Toolkit%20for%20i%20APIs?section=Class%20iDataQueue) between the AS/400 server and my development computer. My next step was to run a program (Class iPgm) to receive some data.
I'm running my test with
/path/to/ibm/node/installation/node /home/test/app.js
and I'm getting a "node[537]: pthread_create: Resource temporarily unavailable" error, but I have no idea what the problem is:
The server has enough resources
No other node processes are running on the server
There were no changes to the server config between the iDataQueue and iPgm tests
I have root privileges
I think my code is not the problem because it doesn't even get to run ¬¬'
This is caused by running node in QSH. By default, QSH does not support running threads, so any attempt to create a thread results in that "Resource temporarily unavailable" message. This can be fixed by setting the environment variable QIBM_MULTI_THREADED=Y before running QSH, by using QP2TERM, or by using SSH.
Background:
I'm currently hosting an ASP.NET application in Azure with the following specs:
ASP.NET Core 2.2
Using Flurl for HTTP requests
Kestrel Webserver
Docker (Linux - mcr.microsoft.com/dotnet/core/aspnet:2.2 runtime)
Azure App Service on P2V2 tier app service plan
I have a couple of background jobs running on the service that make a lot of outbound HTTP calls to a third-party service.
Issue:
Under a light load (approximately 1 call per 10 seconds), all requests complete in under a second with no issue. Under a heavy load, when the service can make up to 3 or 4 calls in a 10-second span, some of the requests randomly time out and throw an exception. When I was using RestSharp, the exception read "The operation has timed out". Now that I'm using Flurl, it reads "The call timed out".
Here's the kicker: if I run the same job from my laptop running Windows 10 / Visual Studio 2017, this problem does NOT occur. This leads me to believe I'm hitting some limit or running out of some resource in my hosted environment. It's unclear whether that is connection/socket or thread related.
Things I've tried:
Ensure all code paths to the request are using async/await to prevent lockouts
Ensure Kestrel defaults allow unlimited connections (they do by default)
Ensure Docker's default connection limits are sufficient (2000 by default, more than enough)
Configuring ServicePointManager settings for connection limits
Here is the code in my startup.cs that I'm currently using to try and prevent this issue:
public class Startup
{
    public Startup(IHostingEnvironment hostingEnvironment)
    {
        ...
        // ServicePointManager setup
        ServicePointManager.UseNagleAlgorithm = false;
        ServicePointManager.Expect100Continue = false;
        ServicePointManager.DefaultConnectionLimit = int.MaxValue;
        ServicePointManager.EnableDnsRoundRobin = true;
        ServicePointManager.ReusePort = true;
        // Set Service point timeouts
        var sp = ServicePointManager.FindServicePoint(new Uri("https://placeholder.thirdparty.com"));
        sp.ConnectionLeaseTimeout = 15 * 1000; // 15 seconds
        FlurlHttp.ConfigureClient("https://placeholder.thirdparty.com", cli => cli.Settings.ConnectionLeaseTimeout = new TimeSpan(0, 0, 15));
    }
}
Has anyone else run into a similar issue to this? I'm open to any suggestions on how to best debug this situation, or possible methods to correct the issue. I'm at a complete loss after researching this for several days.
Thank you in advance.
I had similar issues. Take a look at "Asp.net Core HttpClient has many TIME_WAIT or CLOSE_WAIT connections". Debugging with netstat helped me identify the problem. As one possible solution, I suggest using IHttpClientFactory. You can get more information from https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-2.2 . It should be fairly easy to use, as described in "Flurl client lifetime in ASP.Net Core 2.1 and IHttpClientFactory".
I am running an azure webjobs SDK console application (continuous) with the recommended setup:
public static void ProcessQueueMessage([QueueTrigger("logqueue")] string logMessage, TextWriter logger)
The azure queue I am running against has ~6000 messages in it and I am running the web-job locally, as a console application.
The problem I'm having is that the processing randomly stops after processing between zero and ~30 messages. The console stays open, but no more console messages are displayed.
For example, it might just process 2 messages:
Executing: 'Functions.ProcessQueueMessage' - Reason: 'New queue message detected on 'QueueName'.'
Executed: 'Functions.ProcessQueueMessage' (Succeeded)
Executing: 'Functions.ProcessQueueMessage' - Reason: 'New queue message detected on 'QueueName'.'
Executed: 'Functions.ProcessQueueMessage' (Succeeded)
And then, nothing. There doesn't seem to be anything wrong with my internet connection and I can't trace the issues down to any particular messages.
Has anyone else had issues with this SDK?
Update:
I made sure that I was using the right versions of all the dependencies by removing the NuGet packages and then re-running install-package Microsoft.Azure.WebJobs. I am now using WebJobs version 1.1.0, which has pulled in version 4.3 of Azure Storage.
As recommended by Matthew, I pulled down the source code for Azure WebJobs to determine where the process is freezing. Once the freeze occurs, I pause execution and check the running threads. What I believe is the culprit is within Microsoft.Azure.WebJobs.Host.CompositeTraceWriter:
protected virtual void InvokeTextWriter(TraceEvent traceEvent)
{
    if (_innerTextWriter != null)
    {
        string message = traceEvent.Message;
        if (!string.IsNullOrEmpty(message) &&
            message.EndsWith("\r\n", StringComparison.OrdinalIgnoreCase))
        {
            // remove any terminating return+line feed, since we're
            // calling WriteLine below
            message = message.Substring(0, message.Length - 2);
        }
        _innerTextWriter.WriteLine(message);
        if (traceEvent.Exception != null)
        {
            _innerTextWriter.WriteLine(traceEvent.Exception.ToDetails());
        }
    }
}
The line it freezes on is line 66: _innerTextWriter.WriteLine(message);
_innerTextWriter is an instance of System.IO.TextWriter.SyncTextWriter
Is it possible there is some deadlock issue with this class or the way it is being used?
Some notes:
I am running in the debugger, so in this case I believe the textwriter is forwarding to the console internally
I have my batchsize set to 1 via config.Queues.BatchSize = 1;, not sure if that could matter
I'm currently working on setting up an environment on another computer so that I can see if it is reproducible somewhere other than this machine (surface book).
Update
The issue was me not understanding how the new Windows 10 command prompt works. Any time you click in the command window, it goes into "select" mode, which completely pauses execution of the process.
Basically: https://superuser.com/questions/419717/windows-command-prompt-freezing-randomly?newreg=ece53f5584254346be68f85d1fd2f18d
You can tell it is in this state because the window title is prefixed with the word "Select".
You have to press enter or click again to get it going once again.
So, two final comments:
1) What an incredibly confusing and un-intuitive behavior for a command window!
2) I hope some admin will come take pity on the shame I have brought upon myself and my family by deleting this question.
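To get rid of this strange behavior, you can disable QuickEdit Mode: right-click the console window's title bar, choose Properties, and on the Options tab uncheck QuickEdit Mode.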
Strange. When it is in this stuck state, can you try adding a new message to the queue and see if that triggers the function? Are you sure your function isn't hanging internally? What version of the SDK are you using? You might also try upgrading to v1.1.0, which we released just last week. If there really are a bunch of messages in the queue waiting to be processed, I can't think of anything that would cause this. The queue listener in the SDK should chug along, reading batches of messages in parallel and dispatching them to your function. Have you changed any of the JobHostConfiguration.Queues configuration knobs? And you haven't force-updated the Azure SDK to a version higher than the WebJobs SDK supports, have you?
Another option if you can't figure this out might be to clone the SDK, build it and debug it locally. The repo is here. The main queue processing loop is here.
I am using Mobile Services in Windows Azure, with the new Scheduler that is available for a Mobile Service. I named the scheduled job SendOut.
I am running a pretty simple script that will insert a message to a queue. The entire script:
function SendOut() {
    var azure = require('azure');
    var queueService = azure.createQueueService("mailsoutscheduler", "[The key to the storage]");
    queueService.createQueueIfNotExists("mailsout", function(error){ });
    queueService.createMessage("mailsout", "SendOut", function(error){});
}
It works fine when I run the script once. It is scheduled to run every 5 minutes, and it usually goes fine. However, sometimes I receive this error:
An unhandled exception occurred. Error: One of your scripts caused the
service to become unresponsive and the service was restarted. This is
commonly caused by a script executing an infinite loop or a long,
blocking operation. The service was restarted after the script
continuously executed for longer than 1000 milliseconds.
at EventEmitter.<anonymous> (C:\DWASFiles\Sites\VogSendOut\VirtualDirectory0\site\wwwroot\runtime\server.js:84:17)
at EventEmitter.emit (events.js:88:20)
I cannot figure out why I get this error - or how to solve it.
Could it be because it's running in the FREE Mobile Service Tier?
I don't think it's due to the FREE Mobile Service tier.
Try wrapping the script body in a try {} catch (e) {} block and use console.log() to log any error that occurs. That could help you resolve your problem.
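As a rough sketch of that suggestion: a try/catch only sees errors thrown synchronously, while the queue calls in the question report failures through the error argument of their callbacks, so it may help to log both. The sendOut(queueService, log) signature is an assumption made for illustration (it lets the function run without the real azure module); the two queue calls and their arguments are the ones from the question. Chaining createMessage inside the createQueueIfNotExists callback is also an assumption, not something the original script does.

```javascript
// Sketch of the scheduled job with error logging added.
// `queueService` and `log` are injected for illustration; in the real script
// they would be azure.createQueueService(...) and console.log.
function sendOut(queueService, log) {
  try {
    queueService.createQueueIfNotExists("mailsout", function (error) {
      if (error) {
        log("createQueueIfNotExists failed: " + error);
        return;
      }
      // Enqueue only after the queue is known to exist.
      queueService.createMessage("mailsout", "SendOut", function (error) {
        if (error) {
          log("createMessage failed: " + error);
        } else {
          log("message queued");
        }
      });
    });
  } catch (e) {
    // Reached only for errors thrown synchronously by the calls above.
    log("SendOut threw: " + e);
  }
}

// Possible use inside the scheduler script (assumption, not executed here):
// function SendOut() {
//   var azure = require('azure');
//   sendOut(azure.createQueueService("mailsoutscheduler", "[The key to the storage]"), console.log);
// }
```

If the log shows a callback error or a long gap before "message queued", that points at the storage call rather than at the scheduler itself.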