I have a WCF service that receives requests from a Sivlerlight client, it reads or writes to a SQL database and then it sends the response back to the client.
With 6 or 7 clients delay starts to increase and I would like to check if the more instances I have, the lesser delay (because of load balancing).
I am trying to implement the autoscaling application block but the rules for storage (such as queue lenght) or CPU usage aren't useful for me because I don't use a storage account and although I have 10 clients connected to the service, the instance CPU usage barely reaches 10%.
How could I set a rule to start a new instance when a certain number of connections is reached?
You can capture IIS's Requests Current performance counter and send it to Windows Azure Diagnostics.
I'm fairly certain that WASABi can scale your WCF/WebRole application based on any performance counter. If for some reason WASABi cannot auto-scale you based on that performance counter or you need more features later, you can try AzureWatch (personal plug)
But either way, ASP.NET\Requests Current (if you're using IIS to hose your WCF) seems like the performance counter you need.
Related
I'm trying to find the optimal cloud architecture to host a software on Microsoft Azure.
The scenario is the following:
A (containerised) REST API is exposed to the users through which they can submit POST and GET requests. POST requests trigger a backend that needs a robust configuration to operate properly and GET requests are sent to fetch the result of the backend, if any. This component of the solution is currently hosted on an Azure Web App Service which does the job perfectly.
The (containerised) backend (triggered by POST requests) perform heavy calculations during a short amount of time (typically 5-10 minutes are allotted for the calculation). This backend needs (at least) 4 cores and 16 Gb RAM, but the more the better.
The current configuration consists in the backend hosted together with the REST API on the App Service with a plan that accommodates the backend's requirements. This is clearly not very cost-efficient, as the backend is idle ~90% of the time. On top of that it's not really scalable despite an automatic scaling rule to spawn new instances based on the CPU use: it's indeed possible that if several POST requests come at the same time, they are handled by the same instance and make it crash due to a lack of memory.
Azure Functions doesn't seem to be an option: the serverless (consumption plan) solution they propose is restricted to 1.5 Gb RAM and doesn't have Docker support.
Azure Container Instances neither, because first the max number of CPUs is 4 (which is really few for the needs here, although acceptable) and second there are cold starts of approximately 2 minutes (I imagine due to the creation of the container group, pull of the image, and so on). Despite the process is async from a user perspective, a high latency is not allowed as the result is expected within 5-10 minutes, so cold starts are a problem.
Azure Batch, which at first glance appears to be a perfect fit (beefy configurations available, made for hpc, cost effective, made for time limited tasks, ...) seems to be slow too (it takes a couple of minutes to create a pool and jobs don't run immediately when submitted).
Do you have any idea what I could use?
Thanks in advance!
Azure Functions
You could look at Azure Functions Elastic Premium plan. EP3 has 4 cores, 14GB of RAM and 250GB of storage.
Premium plan hosting provides the following benefits to your functions:
Avoid cold starts with perpetually warm instances
Virtual network connectivity.
Unlimited execution duration, with 60 minutes guaranteed.
Premium instance sizes: one core, two core, and four core instances.
More predictable pricing, compared with the Consumption plan.
High-density app allocation for plans with multiple function apps.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-premium-plan?tabs=portal
Batch Considerations
When designing an application that uses Batch, you must consider the possibility of Batch not being available in a region. It's possible to encounter a rare situation where there is a problem with the region as a whole, the entire Batch service in the region, or your specific Batch account.
If the application or solution using Batch always needs to be available, then it should be designed to either failover to another region or always have the workload split between two or more regions. Both approaches require at least two Batch accounts, with each account located in a different region.
https://learn.microsoft.com/en-us/azure/batch/high-availability-disaster-recovery
So I have hosted asp.net core 2.2 web service on Azure(S2 plan). The problem is that my application sometimes getting high CPU usage(almost 99%). What I have done for now - checked process explorer on azure. I see there a lot of processes who are consuming CPU. Maybe someone knows if it's okay for these processes consume CPU?
Currently, I don't have an idea where do they come from. Maybe it's normal to have them here.
Shortly about my application:
Currently, there is not much traffic. 500-600 request in a day. Most of the request is used to communicate with MS SQL by querying records, adding, etc.
As well I am using MS Websocket, but high CPU happens when no WebSocket client is connected to web service, so I hardly believe that it's a cause. I tried to use apache ab for load testing, but there isn't any pattern, that after one request's load test, I would get high CPU. So sometimes happens, sometimes don't during load testing.
So I just update screenshot of processes, I see that lots of threads are being locked/used during the time when fluent migrator start running its logging.
Update*
I will remove fluent migrator logging middleware from Configure method. Will look forward with the situation.
UPDATE**
So I removed logging of FluentMigrator. Until now I didn't notice any CPU usage over 90%.
But still, I am confused. My CPU usage is spinning. Is it health CPU usage graph or not?
Also, I tried to make a load test on the websocket server.
I made a script that calls some functions of WebSocket every 100ms from 6-7 clients. So every 100ms there are 7 calls to WebSocket server from different clients, every function within itself queries some data/insert (approximately 3-4 queries of every WebSocket function).
What I did notice, on Azure S1 DTU 20 after 2min I am getting out of SQL pool connections, If I increase DTU to 100, it handles 7 clients properly without any errors of 'no connection pool'.
So the first question: is it a normal CPU spinning?
Second: should I get an error message of 'no SQL connection free' using this kind of load test on DTU 10 Azure SQL. I am afraid that when creating a scoped service on singleton WebSocket Service I am leaking connections.
This topic gets too long, maybe I should move it to a new topic?
-
At this stage I would say you need to profile your application and figure out what areas of your code are CPU intensive. In the past I have used dotTrace, this highlighted methods which are the most expensive with a call tree.
Once you know what areas of your code base are the least efficient, you can begin to refactor them so that they are more efficient. This could simply be changing some small operations, adding caching for queries or using distributed locking for example.
I believe the reason the other DLLs are showing CPU usage is because your code calling methods which are within those DLLs.
I have one asp.net web app running on Azure. It can receive users REST request, and process the data and send it back.
I have setup Scale up for Azure instances if the CPU percentage goes upto 60%, and Scale down if CPU percentage moves below 40%.
One thing I observed that, My tasks are most IO bound not CPU intensive tasks. So if the number of request increases then my app is keeping all the requests waiting.
How can I make my App to process 100's of request at once ?
I am observing that my app can process 2 request in parallel, Is there a way to increase this parallelism ?
How to setup Azure Scaling based upon the number of incoming requests.
I would recommend making sure that you are correctly using async/await in applicable areas. If your web app is performing some long-running operation (such as a database call), you should definitely be using async/await to free up threads to service other requests.
Second, even a small instance should be able to handle more than two requests at a time.
Third, you can set up custom rules to scale your web application on metrics other than CPU utilization. Disk queue length is another metric that you might find useful to scale on.
I have a windows service which wakes up everyday at a particular time and finds around 100k transactions that it needs to process. It will spawn 25 threads, which look at the bucket of transactions that need to be processed and will make a call to a WCF service.
This WCF service will do some internal processing and make a synchronous call to an external service (which we have mocked and written an emulator for the sake of volume testing). Using this setup for shorter runs for around 10k transactions we were able to achieve a TPS of around 10.
I scaled this setup to have three load balanced servers running our WCF services and two other servers running the emulator, also we increased the number of threads on the windows service to 75 threads. With this new setup we expected an increase in performance, but the TPS is still at 10.
I have performance monitor running on all five machines. The three loaded balanced servers which have the WCF service are showing an "Outstanding Calls" of around 25 constantly in "ServiceModelService" category for the WCF service. But the two servers which have the emulators running show only around 9 "Outstanding Calls" constantly for the mocked out service. This same emulator was showing around 20 "Outstanding Calls" when it was running on a single server.
My questions are:
Why is there no increase in TPS in the three load balanced machines setup?
Where is the bottleneck in this system?
The target is to get to a TPS of around 30 with the three loaded balanced servers running the WCF service.
Note: I have increased the maxconnection limit in the web config on the WCF service and windows service to 200 which increased the TPS from around 6 to the current value of 10.
Edit: More information, if each of the load balanced server has 25 outstanding calls, shouldn't the mocked external service have 3*25=75 outstanding calls?
By maxconnection limit I meant:
<system.net>
<connectionManagement>
<add address="*" maxconnection="200" />
</connectionManagement>
</system.net>
You probably can't give us enough information to diagnose the problem accurately. However, what you describe gives me enough to recommend some places to look.
First, you probably shouldn't be spawning 25 (or more) threads in your Windows service. Rather, you can have a single thread looking at the "bucket of transactions," and make asynchronous calls to the WCF service. You can control the number of concurrent transactions using a Semaphore. Something like:
Semaphore _transactionSemaphore = new Semaphore(25, 25);
while (transactionCount > 0)
{
transactionSemaphore.WaitOne(); // wait for a free spot
var transaction = getTransaction();
DoAsyncWcfCall(transaction);
}
And the async completed event (see the above link about asynchronous calls) releases the semaphore:
void AsyncCompletedEvent(...)
{
// do after-call processing
// and then release the semaphore
transactionSemaphore.Release();
}
When transactionCount gets to 0, you have to wait for all outstanding calls to complete. You do that by repeatedly waiting on the semaphore: 25 times. That is:
for (int i = 0; i < 25; ++i)
{
transactionSemaphore.WaitOne();
}
If your main thread has gobbled up the semaphore, then you know that there can't be any outstanding calls.
You can extend that concurrent transaction count to 50 or 75, or whatever value you like. Your hardware being able to handle it, of course.
The difference here is that asynchronous service calls use I/O completion ports rather than individual threads. Allocating a thread that just sits there and waits is very wasteful. With the I/O completion port, the only time a thread gets involved is when the call completes--in the async completed method. Those threads are allocated automatically by the thread pool.
If the service is constantly showing 25 outstanding calls, then the total of outstanding calls for all of the servers better not be more than that. If the WCF services are showing more outstanding transactions than the Windows service is showing, then you have a problem. Come to think of it, if the service is showing more outstanding calls than do the load balanced servers, you also have a problem. If the two don't match, then somebody's dropping something: either the Windows service thinks that it has outstanding calls that the WCF services think are filled, or vice-versa.
It's hard to say where the bottleneck is. If you're experiencing high CPU usage on the Windows service machine, that's probably your bottleneck. You say that the WCF services call yet another external service. That external service could be the bottleneck. Depending on how you mocked it, the mock could be the bottleneck. Have you determined how much time that external service takes? That the services running the mocked service seem to have lower throughput than the WCF service that's talking to the real service makes me think there's a problem with performance of your mock.
I suppose it's possible that your WCF services aren't properly cleaning up resources, and they are spending an inordinate amount of time in garbage collection. Have you verified that you're using the server garbage collector? I think that's the default for WCF services, but you need to check.
Given the information you've provided, I consider those the most likely possible bottlenecks.
One other thing. It's incredibly wasteful to have a Windows service that does nothing but sit there and once a day "wakes up" to process some transactions. You should make that a console application and schedule a task to run it once per day. You can use the Task Scheduler GUI, or you can schedule it with the schtasks command. See Programs are not cats.
Another benefit of making your program a console app rather than a service is that it's a whole lot easier to debug a console app.
I'm running a Windows Azure web role which, on most days, receives very low traffic, but there are some (foreseeable) events which can lead to a high amount of background work which has to be done. The background work consists of many database calls (Azure SQL) and HTTP calls to external web services, so it is not really CPU-intensive, but it requires a lot of threads which are waiting for the database or the web service to answer. The background work is triggered by a normal HTTP request to the web role.
I see two options to orchestrate this, and I'm not sure which one is better.
Option 1, Threads: When the request for the background work comes in, the web role starts as many threads as necessary (or queues the individual work items to the thread pool). In this option, I would configure a larger instance during the heavy workload, because these threads could require a lot of memory.
Option 2, Self-Invoking: When the request for the background work comes in, the web role which receives it generates a HTTP request to itself for every item of background work. In this option, I could configure several web role instances, because the load balancer of Windows Azure balances the HTTP requests across the instances.
Option 1 is somewhat more straightforward, but it has the disadvantage that only one instance can process the background work. If I want more than one Azure instance to participate in the background work, I don't see any other option than sending HTTP requests from the role to itself, so that the load balancer can delegate some of the work to the other instances.
Maybe there are other options?
EDIT: Some more thoughts about option 2: When the request for the background work comes in, the instance that receives it would save the work to be done in some kind of queue (either Windows Azure Queues or some SQL table which works as a task queue). Then, it would generate a lot of HTTP requests to itself, so that the load balancer 'activates' all of the role instances. Each instance then dequeues a task from the queue and performs the task, then fetches the next task etc. until all tasks are done. It's like occasionally using the web role as a worker role.
I'm aware this approach has a smelly air (abusing web roles as worker roles, HTTP requests to the same web role), but I don't see the real disadvantages.
EDIT 2: I see that I should have elaborated a little bit more about the exact circumstances of the app:
The app needs to do some small tasks all the time. These tasks usually don't take more than 1-10 seconds, and they don't require a lot of CPU work. On normal days, we have only 50-100 tasks to be done, but on 'special days' (New Year is one of them), they could go into several 10'000 tasks which have to be done inside of a 1-2 hour window. The tasks are done in a web role, and we have a Cron Job which initiates the tasks every minute. So, every minute the web role receives a request to process new tasks, so it checks which tasks have to be processed, adds them to some sort of queue (currently it's an SQL table with an UPDATE with OUTPUT INSERTED, but we intend to switch to Azure Queues sometime). Currently, the same instance processes the tasks immediately after queueing them, but this won't scale, since the serial processing of several 10'000 tasks takes too long. That's the reason why we're looking for a mechanism to broadcast the event "tasks are available" from the initial instance to the others.
Have you considered using Queues for distribution of work? You can put the "tasks" which needs to be processed in queue and then distribute the work to many worker processes.
The problem I see with approach 1 is that I see this as a "Scale Up" pattern and not "Scale Out" pattern. By deploying many small VM instances instead of one large instance will give you more scalability + availability IMHO. Furthermore you mentioned that your jobs are not CPU intensive. If you consider X-Small instance, for the cost of 1 Small instance ($0.12 / hour), you can deploy 6 X-Small instances ($0.02 / hour) and likewise for the cost of 1 Large instance ($0.48) you could deploy 24 X-Small instances.
Furthermore it's easy to scale in case of a "Scale Out" pattern as you just add or remove instances. In case of "Scale Up" (or "Scale Down") pattern since you're changing the VM Size, you would end up redeploying the package.
Sorry, if I went a bit tangential :) Hope this helps.
I agree with Gaurav and others to consider one of the Azure Queue options. This is really a convenient pattern for cleanly separating concerns while also smoothing out the load.
This basic Queue-Centric Workflow (QCW) pattern has the work request placed on a queue in the handling of the Web Role's HTTP request (the mechanism that triggers the work, apparently done via a cron job that invokes wget). Then the IIS web server in the Web Role goes on doing what it does best: handling HTTP requests. It does not require any support from a load balancer.
The Web Role needs to accept requests as fast as they come (then enqueues a message for each), but the dequeue part is a pull so the load can easily be tuned for available capacity (or capacity tuned for the load! this is the cloud!). You can choose to handle these one at a time, two at a time, or N at a time: whatever your testing (sizing exercise) tells you is the right fit for the size VM you deploy.
As you probably also are aware, the RoleEntryPoint::Run method on the Web Role can also be implemented to do work continually. The default implementation on the Web Role essentially just sleeps forever, but you could implement an infinite loop to query the queue to remove work and process it (and don't forget to Sleep whenever no messages are available from the queue! failure to do so will cause a money leak and may get you throttled). As Gaurav mentions, there are some other considerations in robustly implementing this QCW pattern (what happens if my node fails, or if there's a bad ("poison") message, bug in my code, etc.), but your use case does not seem overly concerned with this since the next kick from the cron job apparently would account for any (rare, but possible) failures in the infrastructure and perhaps assumes no fatal bugs (so you can't get stuck with poison messages), etc.
Decoupling placing items on the queue from processing items from the queue is really a logical design point. By this I mean you could change this at any time and move the processing side (the code pulling from the queue) to another application tier (a service tier) rather easily without breaking any part of the essential design. This gives a lot of flexibility. You could even run everything on a single Web Role node (or two if you need the SLA - not sure you do based on some of your comments) most of the time (two-tier), then go three-tier as needed by adding a bunch of processing VMs, such as for the New Year.
The number of processing nodes could also be adjusted dynamically based on signals from the environment - for example, if the queue length is growing or above some threshold, add more processing nodes. This is the cloud and this machinery can be fully automated.
Now getting more speculative since I don't really know much about your app...
By using the Run method mentioned earlier, you might be able to eliminate the cron job as well and do that work in that infinite loop; this depends on complexity of cron scheduling of course. Or you could also possibly even eliminate the entire Web tier (the Web Role) by having your cron job place work request items directly on the queue (perhaps using one of the SDKs). You still need code to process the requests, which could of course still be your Web Role, but at that point could just as easily use a Worker Role.
[Adding as a separate answer to avoid SO telling me to switch to chat mode + bypass comments length limitation] & thinking out loud :)
I see your point. Basically through HTTP request, you're kind of broadcasting the availability of a new task to be processed to other instances.
So if I understand correctly, when an instance receives request for the task to be processed, it pushes that request in some kind of queue (like you mentioned it could either be Windows Azure Queues [personally I would actually prefer that] or SQL Azure database [Not prefer that because you would have to implement your own message locking algorithm]) and then broadcast a message to all instances that some work needs to be done. Remaining instances (or may be the instance which is broadcasting it) can then see if they're free to process that task. One instance depending on its availability can then fetch the task from the queue and start processing that task.
Assuming you used Windows Azure Queues, when an instance fetched the message, it becomes unavailable to other instances immediately for some amount of time (visibility timeout period of Azure queues) thus avoiding duplicate processing of the task. If the task is processed successfully, the instance working on that task can delete the message.
If for some reason, the task is not processed, it will automatically reappear in the queue after visibility timeout period has expired. This however leads to another problem. Since your instances look for tasks based on a trigger (generating HTTP request) rather than polling, how will you ensure that all tasks get done? Assuming you get to process just one task and one task only and it fails since you didn't get a request to process the 2nd task, the 1st task will never get processed again. Obviously it won't happen in practical situation but something you might want to think about.
Does this make sense?
i would definitely go for a scale out solution: less complex, more manageable and better in pricing. Plus you have a lesser risk on downtime in case of deployment failure (of course the mechanism of fault and upgrade domains should cover that, but nevertheless). so for that matter i completely back Gaurav on this one!