ASP.NET 4.5 GC perfmon changes?

After weeks of testing we deployed .NET 4.5 (upgraded from 4.0) on our ASP.NET production application. Site functionality is solid, as our testing demonstrated, but there are differences that we are working through, and they may prompt other questions.
I have a question about the garbage collection performance monitoring counters. Prior to 4.5, my rule of thumb had been that there are roughly 10x as many gen1 collections as gen2, and 10x as many gen0 collections as gen1. Based on this, a healthy snapshot of GC counters would be
gen0 1200
gen1 150
gen2 20
Now that 4.5 is running, the 10x rule doesn't seem to apply anymore. I'm seeing numbers more like this (taken at roughly the same time of day as before):
gen0 850
gen1 650
gen2 400
I am also seeing more 'induced GC' than before.
In addition, 'bytes in all heaps' and 'cache entries' are much lower, and yet our site is very responsive and CPU is nice and low, as it was before.
When we deployed 4.5 we made NO changes to our 4.0 application.
I don't want to solve a problem that isn't there, but it appears that 'normal' has changed. Does this make sense?
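For anyone wanting to cross-check the perfmon numbers, the same cumulative per-generation counts can be read from inside the application with GC.CollectionCount; a minimal sketch (the 10x ratio is only the rule of thumb from this question, not an official target):
// Cumulative per-generation GC counts for this process, which should track the
// "# Gen N Collections" perfmon counters (requires the usual "using System;").
int gen0 = GC.CollectionCount(0);
int gen1 = GC.CollectionCount(1);
int gen2 = GC.CollectionCount(2);
System.Diagnostics.Trace.WriteLine(string.Format("gen0={0} gen1={1} gen2={2}", gen0, gen1, gen2));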

This was a false alarm. After two weeks of monitoring after the 4.5 migration the GC counters do indeed follow the original pattern - gen0 ~10x gen1 ~10x gen2.
So all is well! Now to get to the bottom of why compilations don't appear... :)

Related

Azure Web Application (Windows) - stalling during CILJit::compileMethod in calls to Entity Framework Core

I have been looking into performance, specifically calls to an ASP.NET Core 3.1 Web API project that is running on Azure.
Note: yes, we should be moving to a later version of .NET Core, and that's in the pipeline, but it's not something I can just switch to without a bit of effort.
We are targeting netcoreapp3.1 for our libraries, and are referencing Entity Framework Core v3.1.5.
Looking at a typical end-to-end trace in Application Insights, we see this:
If I'm reading this correctly, we're spending a grand total of 135ms in the database executing queries, but between the last 2 queries we appear to stall for ~12 seconds!
When I dig into the profiler trace for this request, I see this:
Again, if I'm reading this right, during the second DB call (from our end-to-end transaction above) we spend ~12.4 seconds inside the call to EntityFrameworkQueryableExtensions.ToListAsync() doing JIT compilation.
That looks excessive to me.
This appears to be a pattern I see throughout the day, even though the application is set to Always On and there are no restarts of the application between occurrences.
The questions I have around this are:
is this typically to be expected?
if so, should it really be taking this long?
is there a way to reduce the need to JIT as often as we appear to be doing? (see the sketch below)
will a move to .NET 6 (and future framework versions) help us here?
On average the API performs pretty well, with a typical average response time under 1 second. However, when these stalls do occur they are noticeable and are causing me a headache.
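On the "reduce the need to JIT" point, one general mitigation is to warm up the hot EF Core path once at startup so the compilation cost is paid before user traffic arrives. This is a hedged sketch rather than anything from the original trace (AppDbContext and Orders are placeholder names), and it only helps with first-hit JIT cost, not with recurring stalls like the ones the answer below turned out to explain:
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Hosted services in .NET Core 3.x start before the server begins accepting requests.
public class WarmupHostedService : IHostedService
{
    private readonly IServiceProvider _services;

    public WarmupHostedService(IServiceProvider services) => _services = services;

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        using var scope = _services.CreateScope();
        var db = scope.ServiceProvider.GetRequiredService<AppDbContext>(); // placeholder DbContext
        // Executing the hot query once forces the relevant EF Core/LINQ code paths to be jitted.
        await db.Orders.AsNoTracking().Take(1).ToListAsync(cancellationToken);
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

// Registered in Startup.ConfigureServices:
// services.AddHostedService<WarmupHostedService>();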
Had exactly the same problem. After emailing Azure about the issue, and some back and forth, it turned out that the solution (the one that worked for us) was to turn off the profiler. https://github.com/dotnet/runtime/issues/66649
Off option - same in .NET / .NET Core
After turning the option off we went from having over 1,500 requests a day taking over 10 seconds, to none taking over 5.
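For reference, if the profiler being switched off here is the Application Insights Profiler on App Service, the portal toggle corresponds to application settings along these lines (the names below come from Microsoft's Application Insights documentation, so treat them as an assumption and check the docs and the linked issue for your own setup):
APPINSIGHTS_PROFILERFEATURE_VERSION = disabled
DiagnosticServices_EXTENSION_VERSION = disabled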
Thanks,
Dave

Performance degradation after migrating application from .NET 5.0 to .NET 6.0

I have upgraded my payment application from .NET 5.0 to .NET 6.0 without a single line of code change. I performed stress tests using JMeter on both the pre-migration and post-migration releases.
The post-migration stress results show degradation in two respects (with 100-200 users over a 5-minute duration):
lower TPS (transactions per second)
higher CPU usage
Am I missing something? Do I need to do some server-level configuration to get the best results from my application on .NET 6.0?
UPDATES:
The high CPU usage issue is resolved; it was due to a costly database call in one of the applications. But I still cannot see any improvement in TPS; in fact, it has dropped a bit on .NET 6.0.
I have set the below variables:
set DOTNET_ReadyToRun=0
set DOTNET_TieredPGO=1
set DOTNET_TC_QuickJitForLoops=1
Still no difference in performance can be seen. Please suggest.
It shouldn't be the case; judging by the 'Performance Improvements in .NET 6' article, your app should run faster.
If you're absolutely sure that the lower TPS is caused by higher CPU usage, you need to use a profiler such as YourKit or an APM tool to see what exactly is causing the CPU usage or slowdown.
You can also try setting the following options:
set DOTNET_ReadyToRun=0
set DOTNET_TieredPGO=1
set DOTNET_TC_QuickJitForLoops=1
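One extra hedged check before concluding the flags make no difference: confirm the running process actually sees them, since a "set" in one console window does not reach an already-running service or IIS app pool. For example:
// Log whether each runtime knob is visible to this process (requires "using System;").
foreach (var name in new[] { "DOTNET_ReadyToRun", "DOTNET_TieredPGO", "DOTNET_TC_QuickJitForLoops" })
{
    Console.WriteLine(name + "=" + (Environment.GetEnvironmentVariable(name) ?? "(not set)"));
}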

Fluctuating response times for Azure website

I have a .NET web app hosted on Azure on the S1 Production pricing tier (1 core, 1.75 GB memory, A-series compute). What's weird is that I go through extended periods of poor performance. Usually my average response time is around the 1.4 s mark. Not good by any stretch, but it's something I can work with. However, I'm experiencing extended periods where the response time shoots up to around the 5 s mark or greater. These periods last for days, up to a week, before coming back to normal levels. My knowledge of Azure is pretty limited, but I can't seem to find anything that would explain this.
average response time over the last 30 days
You might first want to identify whether this is an issue with your web app itself or simply the trend of app usage (i.e. it receives maximum hits during specific weeks of the month).
There are several areas you might want to look at for further diagnosis. A few are:
Look at the number of requests during the time the website is slow. This is shown in a web part on the website overview page.
Check 'Diagnose and solve problems'. It is a self-service diagnostic and troubleshooting experience to help you resolve issues with your web app.
If you have a considerable user base and it's a production environment, 1 core + 1.75 GB RAM sometimes might not be sufficient to bear the load. If you determine that this is due to a usage trend from your users, you can plan to scale your application out/up to meet the demands of high usage.

SQL Azure Premium tier is unavailable for more than a minute at a time and we're around 10-20% utilization, if that

We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours: lots of data feeds compiled from 3rd-party web services and custom-generated images. Our service and code are mature; we've been running this for years, and a lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for a full 1-2 minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It occurs less often after switching from the Standard tier to Premium, but we're nowhere near our DB's DTU capacity and we're getting throttled hard far too often.
Our SQL Azure DB is Premium P1 and our load according to the new Azure portal is usually under 20% with a couple spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for 1/2 an hour, like the graph suggests, but it was down for more than 2 full minutes):
Azure reports the size of our DB at a little over 12GB (in our own SQL Server installation, the DB is under 1GB - that's another of many questions, why is it reported as 12GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, and most of our various external service calls are async. DB access is still largely synchronous, but our heaviest queries are async. We heavily utilize in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted per request (those tables are queried only once every 10 minutes to check error levels).
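For illustration only, the backoff described above (2, 4, 8, 16, 32, then 48 seconds before failing completely) looks roughly like the sketch below; this shows the pattern, not the actual library code, and ExecuteWithRetryAsync is a made-up name:
using System;
using System.Data.SqlClient;
using System.Threading.Tasks;

static class DbRetry
{
    // Retries the operation on SqlException with the delays described above (in seconds);
    // once the delays are exhausted the exception propagates and the call fails completely.
    public static async Task<T> ExecuteWithRetryAsync<T>(Func<Task<T>> operation)
    {
        int[] delays = { 2, 4, 8, 16, 32, 48 };
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (SqlException) when (attempt < delays.Length)
            {
                await Task.Delay(TimeSpan.FromSeconds(delays[attempt]));
            }
        }
    }
}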
Aside from batching up those request-logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has something like 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on premium tier we should be guaranteed our DTU level second-by-second. But, again, we go more than an entire solid minute without being able to get a database connection. What is going on?
I can also say that after we experience one of these longer delays, our stats seem to reset. The above image was a couple minutes before a 1 min+ delay and this is a couple minutes after:
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses, something in front of it cuts off all traffic and returns 502s (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.

TFS and SharePoint are slower after upgrading to TFS2010 & SharePoint2010 [closed]

We've recently upgraded (via the migration path) from TFS/SharePoint to TFS 2010/SharePoint 2010.
Most things went well, but there were a few issues that became immediately apparent.
TFS was noticeably slower (as pointed out by the entire dev team); basically all 'Get Latest' and query operations were more sluggish. Starting VS 2010 SP1 is also really slow while it loads all the projects (40+) on my machine; a refresh after that is not normally a problem. Even though other people may only have 3-4 projects open at a time, they too noticed the 'working...' delay.
SharePoint was definitely much slower. The 'Show Portal' page takes forever to load, and basic editing is slower too.
Work items occasionally "time out" for no reason, and end up in a "connection lost" error. It's normally while creating a new work item, and a redo of the same command works fine. It happens even during bulk work item creation, but the timing is random.
The server runs Windows 2008 with 12 GB of RAM and plenty of CPU power (quad core). The IIS connectionTimeout is set to 2 minutes (the default). I've played with MinBytesPerSecond, which defaults to 240 (I've set it to 42 as well, but no joy). I understand that VS 2010 in general might be a bit slower than its 2008 counterpart, but even so. No processors are maxed out. There are lots of MSSQLSERVER info logs in the Event Viewer, though (I just noticed this; not sure if it's a problem). I've also changed the defaultProxy setting in the devenv.exe config file, with no joy there either.
It's too late for a downgrade. ;)
Has anyone experienced similar problems after the upgrade?
I would love to hear from ya! :o)
We experienced performance issues after upgrading from TFS 2008 to 2010, but it is much better now. We have learned that the antivirus and SQL Server configurations are critical. In a virtualized environment, storage performance is key too. We have about 100 TFS users in a two-tier server setup.
The SQL Server had its default memory settings, which were as follows:
1 - SQL Server max memory: 2 TB
2 - Analysis Services max memory: 100%
With those settings, our 8 GB SQL machine was unusable.
Now we have:
1 - SQL Server max memory: 4 GB
2 - Analysis Services max memory: 15%
Now the performance is OK, but not great.
The antivirus exclusions have to be configured too; basically, exclude all the data and temp directories.
As our TFS setup is virtualized, we are in the process of adding more storage hardware for better disk performance. That should definitely solve our performance issues.
Simon
Are all components installed on one machine? Is the SQL layer also installed on that machine? Is the machine virtualized?
It's always better to install the SQL layer on physical hardware than to virtualize it. SharePoint 2010 requires 4 GB of RAM; to ensure that SharePoint is usable you should size the WFE (web front end) with at least 8 GB of RAM.
Our TFS was also slow with 4 GB, so I added another 4 GB. With this setup the entire environment is now really fast.
Let's summarize:
SQL: physical installation w/ 12GB RAM, Quad Core (duplicated for failover)
SharePoint: virtualized w/ 8GB RAM, Quad Core
TFS: virtualized w/ 8GB RAM, Quad Core
Both SharePoint and TFS generate heavy load on the database. I have a showcase machine running on my EliteBook as a Hyper-V image; the image has about 12 GB of RAM and runs from an external SSD, but it is a lot slower than our production environment.
I hope my thoughts and setups are helpful.
Thorsten
