Been running my site on Azure for 2 months with no big issues. Today I noticed a HUGE data spike in my DATAOUT. Note that I don't have any big media files (nothing > 100k on the site) and do not have an FTP site in use. I'm not even sure how anybody could transfer this much info if they wanted to.
A bit hard to read, but at 9:48 AM Eastern today the spike was 230 MB, while all other intervals are about 1 MB.
I couldn't find any specific tools in Azure to help me track down what this spike is all about. An IP address (I suspect it's one IP) would be helpful.
That's a really strange DataOut spike. There are a few ways you could try to track it down.
If you have enabled diagnostics logging for the App Service, you can search the related logs in Kudu around 9:48 AM.
If you have an Application Insights resource mapped to the App Service, you can filter the data to find the specific spike during that window. For example, I have a DataOut spike at 3:22 PM.
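If web server logging is turned on for the site, you could also pull the raw W3C logs from Kudu (they usually sit under LogFiles/http/RawLogs) and total the response bytes per client IP to see whether one address accounts for the 230 MB. Here is a rough sketch of the idea in Python, not an Azure tool; it assumes the log's field list includes c-ip and sc-bytes, which you may need to enable:

```python
# A minimal sketch: total response bytes per client IP from a W3C (IIS) log
# downloaded from Kudu (e.g. LogFiles/http/RawLogs). Assumes the log's
# "#Fields:" header includes c-ip and sc-bytes; adjust if yours differs.
from collections import defaultdict
import sys

def bytes_per_ip(log_path):
    fields = []
    totals = defaultdict(int)
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("#Fields:"):
                # The header line lists the column names for the data rows below.
                fields = line.split()[1:]
                continue
            if line.startswith("#") or not line.strip():
                continue
            parts = line.split()
            if len(parts) != len(fields):
                continue
            row = dict(zip(fields, parts))
            ip = row.get("c-ip", "unknown")
            sc_bytes = row.get("sc-bytes", "0")
            totals[ip] += int(sc_bytes) if sc_bytes.isdigit() else 0
    return totals

if __name__ == "__main__":
    totals = bytes_per_ip(sys.argv[1])
    # Print the top talkers, largest first.
    for ip, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{ip}\t{total} bytes")
```

If one IP dominates the total around 9:48 AM, that's your culprit.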
I have a .NET web app hosted on Azure on the S1 Production pricing tier (1 core, 1.75 GB memory, A-Series compute). What's weird is that I am going through extended periods of poor performance. Usually my average response time is around the 1.4 s mark. Not good by any stretch, but it's something I can work with. However, I'm experiencing extended periods where the response time shoots up to 5 s or more. These periods last for days, up to a week, before coming back to normal levels. My knowledge of Azure is pretty limited, but I can't seem to find anything that would explain this.
average response time over the last 30 days
You might first want to identify whether this is an issue with your web app itself or just a trend in app usage (i.e. it receives the most hits during specific weeks of the month).
There are several areas you might want to look at for further diagnosis. A few are:
Look at the number of requests during the times the website is slow. This is shown as a web part on the website's overview page.
Check 'Diagnose and solve problems'. It is a self-service diagnostic and troubleshooting experience to help you resolve issues with your web app.
If you have a considerable user base and it's a production environment, 1 core + 1.75 GB of RAM might not be sufficient to bear the load. If you determine that the slowdowns are due to a usage trend from your users, then you can plan to scale your application out or up to meet the demands of high usage.
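If you want a data point that's independent of the portal charts, you could also run a simple external probe for a few days and line its log up against the request-count chart. A very rough sketch (nothing Azure-specific; the URL and interval are placeholders):

```python
# A rough external probe: log the site's response time at a fixed interval so
# slow periods can be compared against the request-count chart in the portal.
# The URL and interval are placeholders; adjust them to your site.
import datetime
import time
import requests

URL = "https://yourapp.azurewebsites.net/"   # placeholder
INTERVAL_SECONDS = 60

while True:
    start = time.monotonic()
    try:
        status = requests.get(URL, timeout=30).status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    elapsed = time.monotonic() - start
    print(f"{datetime.datetime.utcnow().isoformat()}Z\t{status}\t{elapsed:.2f}s")
    time.sleep(INTERVAL_SECONDS)
```

If the probe stays fast while the portal shows high average response times, the slowness is likely load-dependent rather than constant.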
Whenever I start my Azure Functions app, Live Metrics Streaming in Application Insights shows a high number of servers online -- currently sitting at 24 and climbing -- and seems to bring another server online every few seconds or so. This is before I even make a call to any HttpTriggers. And for every call I make to an HttpTrigger, Functions seems to bring another server online (at least that's what Application Insights is saying).
I'm at a complete loss of how to account for this behavior, especially since my app isn't doing anything yet (no requests). There aren't any logged errors that I can see except for the "Unable to acquire Singleton lock" one for new servers, which I'd expect. The CPU% for a server spikes when it comes online, but then goes back down to 0% within a few seconds and stays there.
Even more bizarre is that when I left the app running overnight, I saw 180 servers online in the morning, and calls to my API were very slow.
Is the "servers online" metric in Live Metrics Stream misleading with Azure Functions, or is there any way to find out why there are so many servers being brought online? Does anyone have any other suggestions for how to figure out what's happening?
I currently have 4 websites hosted in an S2 hosting plan and this evening received a CPU percentage alert. I went to the management portal and checked all of the sites hosted in the hosting plan but found no reason for the CPU to be so high. After checking site by site and finding no evidence of what could be causing the problem, I stopped every site; much to my surprise, the CPU usage did not drop and has been at a staggering 50% for the last 30 minutes. Is there any way to find out what is causing this? Do you have any idea if it could be a bug in the Azure Web Sites service?
Thanks in advance.
A couple of things to check for:
- Do you have any WebJobs on that system? They also consume resources but don't show up in all reports.
You can also check the Kudu process monitor to see if there are any other processes running (maybe you've been hacked and someone is running something on your box?). If you've never used the Kudu tool, it is quite handy. To get to it in your browser, put '.scm' after the site name in your URL. For example, if your site is
'mysite.azurewebsites.net'
the Kudu tools are at
'mysite.scm.azurewebsites.net'
There is a process explorer in there where you can see what processes are running under your account.
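If you'd rather script it, Kudu also exposes the process list over its REST API at /api/processes, authenticated with the site's deployment credentials (from the publish profile). A quick sketch of the idea; the credential values are placeholders and the printed fields are the ones I'd expect in the JSON, so adjust as needed:

```python
# A minimal sketch (not an official sample): list processes via the Kudu REST
# API. Assumes the /api/processes endpoint and basic auth with the site's
# deployment credentials (e.g. from the publish profile); values are placeholders.
import requests

SITE = "mysite"                      # your web app name
USER = "$mysite"                     # deployment username from the publish profile
PASSWORD = "<deployment-password>"   # deployment password from the publish profile

resp = requests.get(
    f"https://{SITE}.scm.azurewebsites.net/api/processes",
    auth=(USER, PASSWORD),
    timeout=30,
)
resp.raise_for_status()

# Print each process id and name reported by Kudu.
for proc in resp.json():
    print(proc.get("id"), proc.get("name"))
```

Anything in that list you don't recognize is worth a closer look.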
We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd-party web services and custom-generated images. Our service and code are mature; we've been running this for years. A lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, our Premium P1 SQL Azure database routinely becomes unavailable for 1-2 full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It occurs less often after switching from the Standard tier to the Premium tier, but we're nowhere near our DB's DTU capacity and we're still getting throttled hard far too often.
Our SQL Azure DB is Premium P1, and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics. The old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for half an hour, as the graph suggests, but it was down for more than 2 full minutes):
Azure reports the size of our DB at a little over 12 GB (in our own SQL Server installation, the DB is under 1 GB; that's another of many questions: why is it reported as 12 GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, and most of our external service calls are async. DB access is still largely synchronous, but our heaviest queries are async. We heavily utilize in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
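For reference, the retry schedule amounts to something like the following (a simplified Python sketch of the pattern; our real code is .NET and lives inside our DB libraries):

```python
# Simplified sketch of the back-off schedule described above: wait 2, 4, 8, 16,
# 32, then 48 seconds between attempts before failing completely.
import time

BACKOFF_SECONDS = [2, 4, 8, 16, 32, 48]

def with_retries(operation):
    """Run `operation` (a zero-arg callable that hits the DB), retrying on failure."""
    for attempt, delay in enumerate(BACKOFF_SECONDS, start=1):
        try:
            return operation()
        except Exception as exc:  # in practice this should catch transient SQL errors only
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
    # One last try after the final wait; let the exception propagate if it fails.
    return operation()
```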
Aside from batching up those request logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has something like 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on the Premium tier we should be guaranteed our DTU level second by second. But, again, we go more than a full minute without being able to get a database connection. What is going on?
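For anyone who wants to check the same thing on their own database: this is roughly how we pull the recent resource stats, sketched here in Python with pyodbc for brevity (the connection string is a placeholder; our production code is .NET). sys.dm_db_resource_stats keeps about an hour of ~15-second samples, each expressed as a percentage of the tier's limit.

```python
# Rough sketch: read recent resource usage from sys.dm_db_resource_stats, which
# reports CPU, data IO and log write usage as percentages of the tier's limits.
# The connection string is a placeholder; fill in your own server/credentials.
import pyodbc

CONN_STR = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:yourserver.database.windows.net,1433;"
    "Database=yourdb;Uid=youruser;Pwd=yourpassword;Encrypt=yes;"
)

QUERY = """
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(row.end_time, row.avg_cpu_percent,
              row.avg_data_io_percent, row.avg_log_write_percent)
```

In our case these numbers stay well under 100% even during the outages, which is exactly why the minute-long unavailability makes no sense to us.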
I can also say that after we experience one of these longer delays, our stats seem to reset. The above image was a couple minutes before a 1 min+ delay and this is a couple minutes after:
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses, something in front of it cuts off all traffic and returns 502s (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.
I have an Orchard CMS website currently hosted on Windows Azure Websites.
It's a pretty standard blog where images are hosted on SkyDrive and linked, so the blog itself only serves HTML.
I've set it to Shared mode, running 1 instance.
But I keep hitting the quota; it seems like my site is always maxing out the memory (the limit is 512 MB per hour), and I can't understand why.
I've tried increasing to 3 instances, but it doesn't increase the maximum memory I can use.
Update:
The maximum usage limits for websites under Shared mode are:
CPU Time: 4 hours per day, 2.5 minutes per 5-minute interval
File System: 1024 MB
Memory usage: 512 MB per hour
Database: 1024 MB (web instance)
Update2:
I've tried re-creating my website in different regions. Currently my site is hosted in US West, which has the above limits, but other regions have slightly different limits; East Asia, for example, has a 1024 MB per hour memory usage limit! I haven't been able to dig up any documentation on this, which is puzzling.
Update3:
In Update2 I mentioned that different regions have different "memory usage per hour" limits. This is actually not true. I had set up a new site under the "Free" setting, which showed 1024 MB per hour, but when I switched it to "Shared" the memory usage limit came down to 512 MB per hour.
I have not been able to reproduce this issue on any of my other sites despite them running the same source code, which leads me to believe it's something weird with my particular Azure website setup. Possibly something to do with the dashboard, as mentioned by #Vinblad.
I'm planning to set up a new Azure website in a different region and, while I'm at it, upgrade to Orchard 1.6.
I had a similar issue on Azure with Orchard. It was due to the error log files continually growing and taking up space. I'm manually deleting files at the moment but need to look into a more automated solution.
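For the automated part, something as simple as a scheduled job that prunes old log files would probably do. A minimal sketch; the App_Data\Logs path is an assumption about where the Orchard error logs end up, so point it at whichever folder is actually filling up:

```python
# A minimal sketch of automated log cleanup (not Orchard-specific tooling):
# delete log files older than a cutoff. The path below is an assumption about
# where Orchard's error logs accumulate on an Azure website; adjust as needed.
import os
import time

LOG_DIR = r"D:\home\site\wwwroot\App_Data\Logs"  # assumed location of Orchard logs
MAX_AGE_DAYS = 7

cutoff = time.time() - MAX_AGE_DAYS * 24 * 60 * 60

for name in os.listdir(LOG_DIR):
    path = os.path.join(LOG_DIR, name)
    # Remove plain files whose last modification time is older than the cutoff.
    if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
        os.remove(path)
        print(f"deleted {path}")
```

Run as a scheduled WebJob (or any scheduled task), this keeps the folder from growing indefinitely.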
512 MB/hour doesn't make any sense at all; I agree with Steve. 512 MB (not per hour) is, however, more than enough to host Orchard. Try to measure memory on your local copy of the site. If you do see abnormal memory consumption, try to profile it and find the module that's responsible for it. If not, then contact Azure support and ask them why the same application would take more memory on Azure than on your local machine.
Another thing to investigate would be caching: do you have output caching enabled?
I saw this post on the Azure forums where they recommend disabling the dynamic module loader. We gave this a try, but it caused problems with the images, so we had to revert.