I have a cloud service with a single worker role running a self-hosted Web API.
I'm trying to consume this Web API from another cloud service (in the same VNet) using the xxx.cloudapp.net address, but performance is very unstable. Sometimes, after a couple hundred requests, HTTP requests freeze for some time. It seems like the Azure load balancer is throttling my requests.
Here is the output from Apache Bench with the freezing reproduced (run from another VM in the same VNet):
ab -c 10 -n 1000 http://xxx.cloudapp.net/ping
<..>
Time taken for tests: 39.970 seconds
Complete requests: 1000
Failed requests: 1
(Connect: 1, Receive: 0, Length: 0, Exceptions: 0)
<..>
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 19 402.6 0 9017
Processing: 0 360 2307.4 15 21046
Waiting: 0 318 2178.6 0 21046
Total: 0 379 2339.6 16 21046
Percentage of the requests served within a certain time (ms)
50% 16
66% 16
75% 16
80% 16
90% 16
95% 17
98% 9015
99% 9032
100% 21046 (longest request)
There are no freezes when using the local IP (e.g. 10.0.0.x).
I tried using a web role with IIS, with the same results.
Why is this happening? How can I avoid it? I don't want to use the local IP, because then I would lose the VIP swap feature.
I understand that you are running a perf test from one cloud service to another cloud service.
The Azure load balancer does not throttle. It is an infrastructure component with extremely high limits. What could happen is that you run out of ports. When you create outbound connections, we SNAT to the source VIP. Since you SNAT, you get a limit of 64K ports.
The best way to verify this is to associate a public IP with the VM running the perf test (https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-instance-level-public-ip/) and then run your test again.
Yves
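A common mitigation for SNAT port exhaustion (not part of the answer above, just a general technique) is to reuse connections instead of opening a new one per request, so a long run of requests consumes only a handful of ports. A minimal Python sketch, assuming the same /ping endpoint:

import requests

# A single Session keeps TCP connections alive (HTTP keep-alive), so
# repeated requests share a few SNAT ports instead of consuming a fresh
# ephemeral port for every call.
session = requests.Session()

for _ in range(1000):
    response = session.get("http://xxx.cloudapp.net/ping", timeout=10)
    response.raise_for_status()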
Related
Problem - Increase in latencies (p90 > 30s) for a simple WebSocket server hosted on a VM.
Repro
Run a simple WebSocket server on a single VM. The server simply receives a request and upgrades it to a WebSocket, without any further logic. The client continuously sends 50 parallel requests for a period of 5 minutes (so approximately 3000 requests).
Issue
Most requests have a latency in the range of 100ms-2s. However, for 300-500 requests, we observe high latencies (10-40s, with p90 greater than 30s), while some hit TCP timeouts (the Linux default timeout of 127s).
When analyzing the VM processes, we observe that while requests are taking a long time, the node process loses its CPU share in favor of some processes started by the VM.
Further Debugging
Increasing process priority (renice) and I/O priority (ionice) did not solve the problem.
Increasing cores and memory to 8 cores / 32 GiB did not solve the problem.
Edit - 1
Repro Code ( clustering enabled ) - https://gist.github.com/Sid200026/3b506a9f77cfce3fa4efdd1ec9dd29bc
When monitoring active processes via htop, we find that the processes started by the following commands are causing the issue:
python3 -u bin/WALinuxAgent-2.9.0.4-py2.7.egg -run-exthandlers
/usr/lib/linux-tools/5.15.0-1031-azure/hv_kvp_daemon -n
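For reference, a minimal sketch of how this per-process CPU sampling could be scripted instead of watching htop by hand (psutil and the watched process names are assumptions, not from the original post):

import time
import psutil

# Sample CPU usage once a second for the processes competing with the
# node server; the names are taken from the htop observations above.
WATCHED = ("node", "python3", "hv_kvp_daemon")

while True:  # stop with Ctrl+C
    for proc in psutil.process_iter(["name", "cpu_percent"]):
        if proc.info["name"] in WATCHED:
            print(time.strftime("%H:%M:%S"), proc.info["name"], proc.info["cpu_percent"])
    time.sleep(1)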
I have deployed around 23 models (amounting to 1.57 GB) in an Azure ML workspace using Azure Kubernetes Service. For the AKS cluster, I have used 3 D8sv3 nodes and enabled cluster autoscaling up to 6 nodes.
The AksWebservice is configured with 4.4 cores and 16 GB memory. I have enabled pod autoscaling for the web service, with autoscale_max_replicas set to 40:
aks_config = AksWebservice.deploy_configuration(
    cpu_cores = 4.4, memory_gb = 16, autoscale_enabled = True,
    description = 'TEST - Configuration for Kubernetes Compute Target',
    enable_app_insights = True, max_request_wait_time = 25000,
    autoscale_target_utilization = 0.6, autoscale_max_replicas = 40)
I tried running load tests with 10 concurrent users (using JMeter), and I monitored the cluster's Application Insights:
I can see the nodes and pods scaling. However, there is no spike in CPU/memory utilization. Of 10 concurrent requests, only 5 to 6 pass; the rest fail. When I send an individual request to the deployed endpoint, the response is generated in 7 to 9 seconds. In the load test logs, however, plenty of requests take more than 15 seconds to generate a response, and requests taking more than 25 seconds fail with status code 503. I increased max_request_wait_time for this reason, but I don't understand why responses take so long despite this much compute, when the dashboard shows memory isn't even 30% utilized. Should I be changing the replica_max_concurrent_requests parameter (see the sketch below)? Or should I increase autoscale_max_replicas even further? The concurrent request load may reach 100 in production; is there any solution to this?
I will be grateful for any advice. Thanks.
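For context, this is a sketch of the kind of change I am asking about (illustrative values only, using the same AksWebservice API as above; replica_max_concurrent_requests defaults to 1):

from azureml.core.webservice import AksWebservice

# Same configuration as above, but letting each replica serve several
# requests concurrently instead of one at a time.
aks_config = AksWebservice.deploy_configuration(
    cpu_cores=4.4, memory_gb=16, autoscale_enabled=True,
    enable_app_insights=True, max_request_wait_time=25000,
    autoscale_target_utilization=0.6, autoscale_max_replicas=40,
    replica_max_concurrent_requests=4)  # illustrative value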
We have created an image scoring model with Azure Machine Learning service and deployed it using the AMLS portal on both ACI and AKS.
Though it runs fine on smaller images, for larger images it gets timed out after exactly 1 minute on both ACI and AKS.
It is expected that scoring an image can take a few minutes.
I wanted to know whether it is a limitation of AMLS deployment, or of ACI and AKS, that the deployed web service times out after 60 seconds?
Any workaround would be welcome.
ACI Error:
Post http://localhost:5001/score: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
AKS Error:
Replica closed connection before replying
If you are deploying a service in AKS, then Greg's solution should be sufficient for most cases. However, if your value for scoring_timeout_ms is going to exceed 60000 milliseconds (i.e. 60 seconds), then I recommend also tuning the following config settings. When your model gets deployed in Kubernetes as a Deployment, we define a LivenessProbe, so that if your model container becomes unresponsive, Kubernetes can automatically restart it in an effort to restore the health of your model.
period_seconds: the time interval between LivenessProbe executions. If your model is going to take 45 seconds to respond to a scoring request, then one option is to increase the interval between LivenessProbe executions from the default 10 seconds to 30 seconds (or more).
failure_threshold: the number of LivenessProbe failures after which Kubernetes restarts your model container. If you want to run the LivenessProbe every 10 seconds and your model is going to take 45 seconds to respond, then you can increase failure_threshold from the default 3 to 10. This means Kubernetes will restart your container only after 10 consecutive LivenessProbe failures.
timeout_seconds: how long the LivenessProbe waits before giving up. Another option is to increase timeout_seconds from the default 2 seconds to 30 seconds; the probe will then wait up to 30 seconds when your container is busy, but still return earlier when it is not.
There is no single "correct" config setting to modify, but the combination of these will definitely help prevent the 502 "Replica closed connection before replying" error.
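Putting these together, a minimal sketch of a deployment configuration for a model that needs roughly 45 seconds per scoring call (the values are illustrative, assuming the azureml-core AksWebservice API):

from azureml.core.webservice import AksWebservice

# Allow scoring calls up to 90s; probe every 30s, wait up to 30s per
# probe, and tolerate 10 consecutive probe failures before a restart.
aks_config = AksWebservice.deploy_configuration(
    scoring_timeout_ms=90000,
    period_seconds=30,
    timeout_seconds=30,
    failure_threshold=10)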
The deployment class has a timeout setting you can change in the constructor; that can help. Some clients will time out anyway.
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.aks.aksservicedeploymentconfiguration?view=azure-ml-py
scoring_timeout_ms : int => A timeout to enforce for scoring calls to this Webservice. Defaults to 60000
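As a sketch, the timeout can also be raised on an already-deployed service via update() rather than redeploying (the service name here is hypothetical):

from azureml.core import Workspace
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()
service = AksWebservice(ws, name="image-scoring-service")  # hypothetical name
service.update(scoring_timeout_ms=180000)  # 3 minutes
service.wait_for_deployment(show_output=True)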
I kept the node.js server on one machine and the mongodb server on another machine. Requests were a mixture of 70% reads and 30% writes. I observed that at 100 requests per second the throughput is 60 req/sec, and at 200 requests per second the throughput is 130 req/sec. CPU and memory usage are the same in both cases. If the application can serve 130 req/sec, why didn't it serve the full 100 req/sec in the first case, given that CPU and memory utilization are the same? The machines are running Ubuntu Server 14.04.
Create user threads in JMeter and use "loop forever" for 300 seconds, then collect the values.
I am trying to scale websites on Windows Azure. So far I've tested WordPress, Ghost (blog), and a plain HTML site, and it's all the same: if I scale them up (add instances), they don't get any faster. I am sure I must be doing something wrong...
This is what I did:
I created a new shared website with a plain HTML Bootstrap template on it: http://demobootstrapsite.azurewebsites.net/
Then I installed ab.exe from the Apache project on a hosted bare-metal server (4 cores, 12 GB RAM, 100 MBit).
I ran the test twice, the first time with a single shared instance and the second time with two shared instances, using this command:
ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This means ab.exe will issue 10000 requests from 100 parallel threads.
I expected the response times of the test with two shared instances to be significantly lower than with just one shared instance. But the mean time per request even rose slightly, from 1452.519 ms with one shared instance to 1460.631 ms with two. Later I even ran the site on 8 shared instances, with no effect at all. My first thought was that maybe the shared instances were the problem, so I put the site on a standard VM and ran the test again. But the problem remained the same; adding more instances didn't make the site any faster (it was even a bit slower).
Later I watched a video with Scott Hanselman and Stefan Schackow in which they explained the Azure scaling features. Stefan says that Azure has a kind of "sticky load balancing" which always redirects a client to the same instance/VM to avoid compatibility problems with stateful applications. So I checked the web server logs, and I found a log file for every instance, all of about the same size, which usually means that every instance was used during the test.
PS: During the test run I checked the response time of the website from my local computer (on a different network than the server), and the response times were about 1.5s.
Here are the test results:
######################################
1 instance result
######################################
PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests
Server Software: Microsoft-IIS/8.0
Server Hostname: demobootstrapsite.azurewebsites.net
Server Port: 80
Document Path: /
Document Length: 16396 bytes
Concurrency Level: 100
Time taken for tests: 145.252 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 168800000 bytes
HTML transferred: 163960000 bytes
Requests per second: 68.85 [#/sec] (mean)
Time per request: 1452.519 [ms] (mean)
Time per request: 14.525 [ms] (mean, across all concurrent requests)
Transfer rate: 1134.88 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 8.1 16 78
Processing: 47 1430 93.9 1435 1622
Waiting: 16 705 399.3 702 1544
Total: 62 1445 94.1 1451 1638
Percentage of the requests served within a certain time (ms)
50% 1451
66% 1466
75% 1482
80% 1498
90% 1513
95% 1529
98% 1544
99% 1560
100% 1638 (longest request)
######################################
2 instances result
######################################
PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests
Server Software: Microsoft-IIS/8.0
Server Hostname: demobootstrapsite.azurewebsites.net
Server Port: 80
Document Path: /
Document Length: 16396 bytes
Concurrency Level: 100
Time taken for tests: 146.063 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 168800046 bytes
HTML transferred: 163960000 bytes
Requests per second: 68.46 [#/sec] (mean)
Time per request: 1460.631 [ms] (mean)
Time per request: 14.606 [ms] (mean, across all concurrent requests)
Transfer rate: 1128.58 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 8.1 16 78
Processing: 31 1439 92.8 1451 1607
Waiting: 16 712 402.5 702 1529
Total: 47 1453 92.9 1466 1622
Percentage of the requests served within a certain time (ms)
50% 1466
66% 1482
75% 1482
80% 1498
90% 1513
95% 1529
98% 1544
99% 1560
100% 1622 (longest request)
"Scaling" the website in terms of resources adds more capacity to accept more requests, and won't increase the speed at which a single capacity instance can perform when not overloaded.
For example; assume a Small VM can accept 100 requests per second, processing each request at 1000ms, (and if it was 101 requests per second, each request would start to slow down to say 1500ms) then scaling to more Small VMs won't increase the speed at which a single request can be processed, it just raises us to accepting 200 requests per second under 1000ms each (as now both machines are not overloaded).
For per-request performance; the code itself (and CPU performance of the Azure VM) will impact how quickly a single request can be executed.
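This also explains the flat numbers above: a closed-loop test like ab is bounded by concurrency divided by per-request latency, so measured throughput cannot rise however many instances you add. A quick back-of-the-envelope check in Python, using the figures from the ab runs:

# Little's Law for a closed-loop load test: throughput = concurrency / latency.
concurrency = 100      # ab -c 100
latency_s = 1.4525     # mean time per request from the 1-instance run

print(round(concurrency / latency_s, 1), "req/sec")  # ~68.8, matching ab's 68.85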
Given the complete absence from the question of the most important detail of such a test, it sounds to me like you are merely testing your Internet connection bandwidth. 10 Mbit/sec is a very common rate.
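A quick sanity check of that reading against the transfer rate ab reported (about 1134.88 KB/s in both runs):

# Convert ab's reported transfer rate to megabits per second.
kbytes_per_sec = 1134.88
print(round(kbytes_per_sec * 1024 * 8 / 1e6, 1), "Mbit/sec")  # ~9.3, close to a 10 Mbit/s link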
No, it doesn't scale.
I usually run LogParser against the IIS logs generated at the time of the load test and calculate the RPS and latency (from the time-taken field) from those. This helps isolate whether the slowness comes from the network, from server processing, or from the load test tool's own reporting.
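As a rough Python sketch of the same idea (assuming W3C-format IIS logs where time-taken is the last field; the filename is hypothetical and field positions depend on your #Fields line):

from collections import defaultdict

requests_per_sec = defaultdict(int)
times = defaultdict(list)

with open("u_ex140101.log") as log:              # hypothetical filename
    for line in log:
        if line.startswith("#"):                 # skip W3C header lines
            continue
        fields = line.split()
        second = fields[0] + " " + fields[1]     # date + time give 1s buckets
        requests_per_sec[second] += 1
        times[second].append(int(fields[-1]))    # time-taken in ms

for second in sorted(requests_per_sec):
    avg = sum(times[second]) / len(times[second])
    print(second, "rps=%d" % requests_per_sec[second], "avg_ms=%.0f" % avg)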
Some ideas:
Is Azure throttling to prevent a DoS attack? You are making a hell of a lot of requests from one location to a single page.
Try Small-sized websites rather than Shared; capacity and scaling might be quite different. A load of 50 requests/sec doesn't seem terrible for a shared service.
Try to identify where that time is going; 1.4s is a really long time.
Run load tests from several different machines simultaneously, to determine if there's throttling going on or you're affected by sticky load balancing or other network artefacts.
You said it's OK under a load of about 10 concurrent requests at 50 requests/second. Gradually increase the load you're putting on the server to determine the point at which it starts to choke. Do this across multiple machines too.
Can you log on to Web Sites? Probably not... See if you can replicate the same issues on a Cloud Service web role and analyze from there, using Performance Monitor and the typical IIS tools, to see where the bottleneck is, or whether it's even on the machine versus the Azure network infrastructure.
Before you load test the websites, you should do a baseline test with a single instance, say with 10 concurrent threads, to check how the website behaves when not under load. Then use this baseline to understand how the websites behave under load.
For example, if the baseline shows the website responding in 1.5s when not under load, and again in 1.5s under load, then the website is handling the load easily. If under load the website takes 3-4s with a single instance, then it isn't handling the load so well; try adding another instance and check whether the response time improves.
Here you can test for FREE:
http://tools.pingdom.com/fpt/#!/ELmHA/http://demobootstrapsite.azurewebsites.net/
http://tools.pingdom.com/
Regards
Valentin