Should Azure Diagnostics be enabled for Production Deployment?

I'm looking at performance improvements for an Azure Web Role and wondering if Diagnostics should be left on when publishing/deploying to the production site. This article says to disable it, but one of the comments says you lose critical data.

You should absolutely leave it enabled. How else will you monitor or auto-scale your application once it is running in production?
Whether you use on-demand monitoring software like RedGate/Cerebrata's Diagnostic Manager or an active monitoring/auto-scaling service like AzureWatch, you need Diagnostics enabled so that your instances give the external software a way to monitor them and visualize performance data.
Just don't go crazy and capture every possible kind of diagnostic data at the most frequent rate possible; enable things on an as-needed basis.
Consider the reality that these "thousands of daily transactions" cost approximately one penny per 100k transactions. So, if you transfer data once per minute to table storage, that is 1,440 transactions per server per day, or 43,200 transactions per server per month: a whopping 0.43 cents per server per month. If the ability to quickly debug or be notified of a production issue is not worth 0.43 cents per server per month, then you should reconsider your cost models :)
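For a quick sanity check, here is that arithmetic as a tiny Python sketch (the once-per-minute transfer period, a 30-day month, and the ~$0.01 per 100k transactions rate are the assumptions from above, not exact current pricing):

# Back-of-the-envelope cost of pushing diagnostics to table storage once per
# minute, assuming ~$0.01 per 100,000 storage transactions (figures from above).
TRANSFERS_PER_DAY = 24 * 60                  # one transfer per minute = 1,440/day
DAYS_PER_MONTH = 30
COST_PER_100K_TRANSACTIONS = 0.01            # USD, assumed rate

monthly_transactions = TRANSFERS_PER_DAY * DAYS_PER_MONTH   # 43,200 per server
monthly_cost = monthly_transactions / 100_000 * COST_PER_100K_TRANSACTIONS

print(f"{monthly_transactions} transactions/server/month, "
      f"about ${monthly_cost:.4f} (roughly {monthly_cost * 100:.2f} cents)")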
HTH

Related

Choosing the right EC2 instance for three NodeJS Applications

I'm running three MEAN stack applications. Each application receives over 10,000 monthly users. Could you please assist me in finding an EC2 instance for my apps?
I've been using a "t3.large" instance with two vCPUs and eight gigabytes of RAM, but it costs $62 to $64 per month.
I need help deciding which EC2 instance to use for three Node.js applications.
First, check the CloudWatch metrics for the current instances. Are CPU and memory usage consistent over time? Analysing the metrics should help you decide whether to move to a smaller or larger instance (a minimal boto3 sketch for pulling these metrics programmatically follows the links below).
One way to avoid unnecessary costs is to use Auto Scaling groups and load balancers. With proper settings in place, you always have just the right amount of computing power for your applications.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html
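If you'd rather pull those numbers programmatically than click through the console, a minimal boto3 sketch looks roughly like this (the region and instance ID are placeholders; CPUUtilization is published by EC2 automatically, while memory metrics only appear if you install the CloudWatch agent):

# Minimal sketch: average/max CPU utilization for one instance over the last
# 14 days. Replace the region and instance ID with your own values.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region

end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder ID
    StartTime=start,
    EndTime=end,
    Period=3600,                       # one datapoint per hour
    Statistics=["Average", "Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"avg={point['Average']:.1f}%", f"max={point['Maximum']:.1f}%")

If CPU rarely goes above, say, 40% and memory usage is similarly flat, a smaller instance (or two smaller instances behind a load balancer) is likely to be cheaper for the same workload.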
It depends on your applications: do they need more compute power, more memory, or more storage? Choosing a server is similar to installing an app on a system: check its basic requirements first, and then choose the server accordingly.
If you have 10k+ monthly users, think about using an ALB so that traffic gets distributed evenly. Try caching to serve some content if possible. Use the unlimited burst mode of t3 instances if CPU keeps hitting 100% (see the boto3 sketch below). Also, try to optimize the code so that fewer resources are consumed. Once you are comfortable with the EC2 choice, purchase Savings Plans or Reserved Instances (RIs) to lower the cost.
Also, monitor the servers and traffic using the CloudWatch agent, Internet Monitor, and similar features.
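As a concrete example, here is a minimal boto3 sketch for switching an existing t3 instance to unlimited CPU credits, as suggested above (region and instance ID are placeholders; sustained bursting above the baseline in unlimited mode is billed extra, so check the pricing first):

# Sketch: put a t3 instance into "unlimited" CPU credit mode so it is not
# throttled when its burst credits run out. Placeholder region and instance ID.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

response = ec2.modify_instance_credit_specification(
    InstanceCreditSpecifications=[
        {"InstanceId": "i-0123456789abcdef0", "CpuCredits": "unlimited"},  # placeholder ID
    ]
)
print(response["SuccessfulInstanceCreditSpecifications"])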

Can I disable all logs and metrics except the ContainerLogs?

I use Azure Kubernetes Service, which is connected to a Log Analytics workspace.
The Log Analytics workspace collects too much data, which is quite expensive.
After googling ways to reduce the costs I found a few recommendations, but most of them are about reducing the ContainerLogs size. My case is different: I do need all ContainerLogs, but nothing else.
Running this query shows how the billable data breaks down by table:
// union every table in the workspace, keeping the source table name in 'tt'
union withsource = tt *
| where TimeGenerated > ago(1day)          // last 24 hours
| where _IsBillable == true                // only data that is actually billed
| summarize BillableDataMBytes = sum(_BilledSize) / (1000. * 1000.) by tt
| render piechart
As it turns out, I need only 3% of all the stored data; the remaining 97% I want to disable or reduce (collect new values much less frequently). Is that possible?
Is it possible to disable or reduce at least the "Perf" table?
The Basic Logs and Archive Logs options can help you. Please take a look at this article.
Below is the explanation regarding Basic Logs:
The first option is the introduction of "Basic Logs". This sits alongside the current option, called "Analytics Logs". Basic Logs allow you to designate specific tables in your Log Analytics workspace as "basic". This means that the cost of the data ingestion is reduced; however, some of the features of Log Analytics are not available on that data: you cannot configure alerts on this data, and the options for querying the data are limited. You also pay a cost for running a query against the data.
For the Archive Logs:
The second change is adding an option for archiving logs. It is often necessary to keep certain logs for an extended period due to company or regulatory requirements. Log Analytics does support retention of logs for up to two years, but you pay a retention cost that is relatively high because the data is kept in live tables that can be accessed at any time.
Archive Logs allow you to move the data into an offline state where it cannot be accessed directly but is significantly cheaper. Archive data is charged at $0.025 per GB per month, compared to $0.12 per GB per month for standard data retention. If you need to access the archive data, you pay an additional fee for either querying the archive or restoring it to active tables. This costs $0.007 per GB of data scanned for querying, or $0.123 per GB per day of data restored (so effectively the same as standard data retention). These prices were correct at the time of publishing; please check the Azure Monitor pricing page.
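To put those prices in perspective, here is a small back-of-the-envelope Python sketch using the per-GB figures quoted above (the 500 GB / 12-month retention scenario is just an example, and the prices may be out of date; check the Azure Monitor pricing page):

# Compare standard retention vs. archive for an example volume of logs,
# using the per-GB prices quoted in the article above (may be outdated).
STANDARD_PER_GB_MONTH = 0.12
ARCHIVE_PER_GB_MONTH = 0.025
ARCHIVE_QUERY_PER_GB_SCANNED = 0.007

data_gb = 500            # example: 500 GB of logs kept for compliance
months = 12              # example retention period

standard_cost = data_gb * STANDARD_PER_GB_MONTH * months
archive_cost = data_gb * ARCHIVE_PER_GB_MONTH * months
one_full_scan = data_gb * ARCHIVE_QUERY_PER_GB_SCANNED

print(f"Standard retention for a year: ${standard_cost:,.2f}")
print(f"Archive for a year:            ${archive_cost:,.2f}")
print(f"One query scanning the whole archive: ${one_full_scan:,.2f}")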

SQL Azure Premium tier is unavailable for more than a minute at a time and we're around 10-20% utilization, if that

We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd party web services and custom-generated images. Our service and code are mature; we've been running this for years, and a lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for a full 1-2 minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service with requests waiting 2 minutes for access to our database? This is occurring several times a day. It happens less often since switching from the Standard tier to the Premium tier, but we're nowhere near our DB's DTU capacity and we're still getting throttled hard far too often.
Our SQL Azure DB is Premium P1, and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust the portal's metrics: the old portal gives us no data for our SQL, and the new portal is very obviously wrong at times (our DB was not down for half an hour, as its graph suggests, but it was down for more than 2 full minutes).
Azure reports the size of our DB at a little over 12GB (in our own SQL Server installation, the DB is under 1GB - that's another of many questions, why is it reported as 12GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, and most of our various external service calls are async. DB access is still largely synchronous, but our heaviest queries are async. We make heavy use of in-memory and Redis caching. The most frequent use of our DB is 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
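In rough pseudocode terms (a Python-style sketch; connect() is a placeholder for whatever opens the database connection), the retry behaviour amounts to this:

# Retry with the fixed backoff schedule described above: 2, 4, 8, 16, 32,
# then 48 seconds between attempts, then give up (about 110 seconds total).
import time

RETRY_DELAYS = [2, 4, 8, 16, 32, 48]      # seconds

def with_retries(connect):
    for delay in [0] + RETRY_DELAYS:      # first attempt has no delay
        if delay:
            time.sleep(delay)
        try:
            return connect()
        except Exception as exc:          # real code catches only transient errors
            last_error = exc
    raise last_error                      # all retries exhausted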
Aside from batching up those request logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has something like 2,000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats inside the database seems to show we are nowhere near our resource limits. Also, on the Premium tier we should be guaranteed our DTU level second-by-second. But, again, we go more than an entire solid minute without being able to get a database connection. What is going on?
I can also say that after we experience one of these longer delays, our stats seem to reset: the figures captured a couple of minutes after a 1 min+ delay look reset compared with those captured a couple of minutes before.
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses, something in front of it cuts off all traffic and returns 502s (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.

Reduce costs of Azure availability set

I am planning on running SharePoint Foundation on one VM of size A3 and SQL Server on another of size A6. As far as I understand, this is not enough to qualify for the SLA, and I should use 2 more instances - one for SharePoint and one for SQL Server - configured in 2 separate availability sets.
Can I use scaling (by CPU usage) to turn off one instance and leave only one running at a time in an availability set? This would reduce the costs, but I wonder whether this solution would be good enough to qualify for Azure's SLA. The way I see it, one instance is running at a time while the other one is shut down, so I am billed for only one instance. When there is an update or a failure, the instance that has been running until then is shut down and the other one comes online. Is this the way it works? Can I cut the costs of availability sets like this?
No, the SLA requires two running instances. However, if you want to control your costs, the approach you outline will work. Just keep in mind that the duration/window of a disruption will depend on how quickly you detect that the primary VM has failed and how fast you can start the secondary VM. And depending on the nature of the service disruption, it may not be possible for you to start the secondary at all. So it's a risk.

Windows Azure, MSDN offer, 750 small compute hours

I'm an MSDN subscriber and I'm looking at Azure as a possible platform for a new website that will test the waters for a new service. This website is expecting low to very low traffic at the time of launch. I've heard that this kind of traffic level can be very expensive on Azure, but since they have this MSDN offer, I thought I'd finally take a look at Azure.
In the offer, I'm looking at getting "750 small compute hours per month". From the reading I've done, it seems that, if I purchase nothing more than what's given (although the subscription itself costs thousands of dollars, of course), an entire month would be covered: since 24 (hours) x 31 (max days in a month) = 744, I'm still below my allotted 750 for the month.
Am I missing something in this simple equation? Are there further aspects that could cause the site to be "turned off" temporarily that I should consider?
Yes, you can indeed run a small instance for a whole month. Or you can run 2 extra-small instances instead (having 2 instances means you're covered by the SLA).
There are 2 other things you need to consider:
Depending on your subscription, you can have a maximum of 45 GB of storage (blob/table/queue). If you use Virtual Machines, keep in mind that the system disk (and any additional data disks) are persisted as blobs, so make sure not to reach the limit here.
There are also other limits, but the most important one besides storage is the data transfer allowance, which is also quite limited (max 35 GB out).
If you're expecting very low traffic, have you considered Windows Azure Web Sites? You get 10 of those for free for 12 months. The free ones run on shared instances, but they are perfect for hosting the first low-traffic version of your app.
