My application sends requests to the Azure Machine Learning REST API in order to invoke a batch endpoint and start scoring jobs, as described here. It works well for a small number of requests, but if the app sends many concurrent requests, the REST API sometimes responds with status code 429 "TooManyRequests" and the message "Received too many requests in a short amount of time. Retry again after 1 seconds.". For example, it happened after sending 77 requests at once.
The message is pretty clear, and the best solution I can think of is to throttle outgoing requests, i.e. to make sure the app doesn't exceed the limits when sending concurrent requests. The problem is that I don't know what the request limits for the Azure Machine Learning REST API are. Looking through the Microsoft documentation, I could only find this article, which provides limits for managed online endpoints, whereas I'm looking for batch endpoints.
I would really appreciate it if someone could help me find the Azure ML REST API request limits or suggest a better solution. Thanks.
UPDATE 20 Jun 2022:
I couldn't find out how many concurrent requests Azure Machine Learning batch endpoints allow. So I settled on a limit of 10 concurrent outgoing requests, which solved the "TooManyRequests" problem. To throttle the requests I used SemaphoreSlim, as described here.
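For anyone wanting the shape of that throttle, here is a minimal sketch of the same idea in TypeScript (the original used C#'s SemaphoreSlim; the scoring URI and payload here are placeholders):

```typescript
// Minimal counting semaphore, analogous to C#'s SemaphoreSlim.
class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private slots: number) {}

  async acquire(): Promise<void> {
    if (this.slots > 0) {
      this.slots--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();    // hand the slot straight to a waiter
    else this.slots++;   // or return it to the pool
  }
}

const semaphore = new Semaphore(10); // at most 10 requests in flight

// Placeholder scoring URI and payload; only the throttling pattern matters here.
async function invokeBatchEndpoint(scoringUri: string, payload: unknown): Promise<Response> {
  await semaphore.acquire();
  try {
    return await fetch(scoringUri, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
  } finally {
    semaphore.release();
  }
}
```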
According to the documentation, you can request an increase to the quota, which is one way to solve the request-limit-exceeded issue. Regarding the batch quota limits, here is the document from Microsoft; it shows the quota values you can change.
Document Credit: prkannap and team
Alternatively, you could reduce the number of requests by storing multiple input files in a folder and invoking the job with the folder path.
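For illustration, a hedged sketch of what that folder-based invocation could look like over REST; the payload shape follows the batch endpoint examples in Microsoft's docs, but the URI, token handling, and storage path are placeholders you should verify:

```typescript
// Sketch only: invoke the batch endpoint once with a folder input instead of
// sending one request per file. All values in angle brackets are placeholders.
async function invokeWithFolder(scoringUri: string, token: string): Promise<void> {
  const response = await fetch(scoringUri, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      properties: {
        InputData: {
          // "myInput" is an arbitrary input name; UriFolder points the job at a folder.
          myInput: {
            JobInputType: "UriFolder",
            Uri: "https://<storage-account>.blob.core.windows.net/<container>/<folder>",
          },
        },
      },
    }),
  });
  if (!response.ok) throw new Error(`Invoke failed with status ${response.status}`);
}
```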
If you want further assistance, please file a support ticket and a customer support engineer will assist you.
I am working with the Google Admin SDK to create Google Groups for my organization. I can't add members to a group at creation time; ideally, when I create a new group I'd like to add roughly 60 members. In addition, the ability to add members in bulk after the group is created (a batch request) was deprecated in August 2020. Right now, after I create a group I need to make a web request to the API to add each member individually (about 60 members).
I am using Node.js and Express. Is there a good way to handle 60 web requests to an API? I don't know how taxing this will be on my server. If anyone has any resources to share where I can learn about the impact this would have on a Node.js server, that would be great.
Fortunately, these groups aren't created often, maybe 15 a month.
One idea I have is to offload the work to something like a cloud function, so my Node server makes one request to the cloud function, and then the cloud function makes all the additional requests to add members to the group. I'm not 100% sure this is necessary, and I'm curious about other approaches.
Limits and Quotas
Note that adding group members may take 10 minutes to propagate.
The rate limit for the Directory API is 3,000 queries per 100 seconds per IP address, which works out to around 30 per second. 60 requests is not a large number, but if you try to send them all within a few milliseconds the system may extrapolate the rate and deem it over the limit. I wouldn't expect that to happen, though it's probably best to test it on your end with your own system and connection.
Exponential Backoff
If you do need to make many requests, this is the method Google recommends: if a request fails, repeat it, exponentially increasing the wait time between attempts up to a cap of 16 seconds. You can always implement a longer wait before retrying; it's only 60 requests, after all.
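A minimal sketch of that retry loop (the 1s/2s/4s/8s/16s schedule plus random jitter follows Google's general guidance; `insertMember` is a placeholder for your actual Directory API call):

```typescript
// Retry with exponential backoff: wait ~1s, 2s, 4s, 8s, 16s between attempts.
async function withBackoff<T>(call: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up after the last retry
      const delayMs = Math.min(1000 * 2 ** attempt, 16_000) + Math.random() * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage, with insertMember standing in for your real API call:
// await withBackoff(() => insertMember(groupKey, memberEmail));
```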
Batch Requests
The previously mentioned methods should work for you with no issue: since there are only 60 requests to make, they won't put any real "stress" on the system. That said, the most performant way to handle many requests to the Directory API is to use a batch request, which lets you bundle all your member requests into one large batch of up to 1,000 calls. This will also give you a nice cushion in case you need to increase your request volume in the future!
EDIT - I am sorry, I missed that you mentioned that batching is deprecated. Only global batching is deprecated: if you send a batch request to a single specific API, batching is still supported. What you can no longer do is send one batch request spanning different APIs, like Directory and Sheets in one.
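As a sketch, a homogeneous batch against the Directory API's own batch endpoint looks roughly like this; the endpoint path and multipart format follow Google's batch documentation, but verify them, and the access token is a placeholder:

```typescript
// Build a multipart/mixed body that inserts many members into one group,
// then send it as a single HTTP request to the Directory API batch endpoint.
function buildBatchBody(groupKey: string, emails: string[], boundary: string): string {
  const parts = emails.map(
    (email) =>
      `--${boundary}\r\n` +
      `Content-Type: application/http\r\n\r\n` +
      `POST /admin/directory/v1/groups/${encodeURIComponent(groupKey)}/members\r\n` +
      `Content-Type: application/json\r\n\r\n` +
      `${JSON.stringify({ email, role: "MEMBER" })}\r\n`
  );
  return parts.join("") + `--${boundary}--\r\n`;
}

async function addMembersInBatch(groupKey: string, emails: string[], accessToken: string) {
  const boundary = `batch_${Date.now()}`;
  const response = await fetch("https://www.googleapis.com/batch/admin/directory_v1", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": `multipart/mixed; boundary=${boundary}`,
    },
    body: buildBatchBody(groupKey, emails, boundary),
  });
  return response.text(); // the reply is also multipart/mixed, one part per call
}
```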
References
Limits and Quotas
Exponential Backoff
Batch Requests
We're running our Node backend on Firebase Functions, and have to frequently hit a third-party API (HubSpot), which is rate-limited to 100 requests / 10 seconds.
We're making these requests to HubSpot from our cloud functions, and often find ourselves exceeding HubSpot's rate-limit during campaigns or other website usage spikes. Also, since they are all write requests to update data on HubSpot, these requests cannot be made out of order.
Is there a way to throttle our requests to HubSpot so as not to exceed their rate limit? I'm open to suggestions that don't necessarily involve cloud functions, although that would be preferred.
Note: When I say "throttle", I mean that all requests to HubSpot need to go through. I'm trying to achieve something similar to what Lodash's throttle method does, if that makes sense.
What we usually do in this case is store the data in a database and then pass it over to HubSpot in a metered way (i.e. without exceeding their rate limit) using a cron that runs every minute. For every data item that we pass to HubSpot successfully, we mark it as "success" in the database.
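A sketch of that drain pattern; `db` and `sendToHubSpot` are placeholders for your persistence layer and HubSpot client, and the pacing is chosen to sit safely under the 100 requests / 10 seconds limit from the question:

```typescript
// Placeholders for your own database client and HubSpot call (assumptions).
declare const db: {
  getPendingWrites(opts: { limit: number }): Promise<Array<{ id: string; payload: unknown }>>;
  markSuccess(id: string): Promise<void>;
};
declare function sendToHubSpot(payload: unknown): Promise<void>;

// Runs once a minute (e.g. from a scheduled function) and drains queued writes
// in insertion order, one at a time, so ordering is preserved.
async function drainPendingWrites(): Promise<void> {
  const pending = await db.getPendingWrites({ limit: 500 });
  for (const write of pending) {
    await sendToHubSpot(write.payload);
    await db.markSuccess(write.id);                // only marked after success
    await new Promise((r) => setTimeout(r, 125));  // ~8 req/s, under the 10 req/s cap
  }
}
```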
Cloud Functions cannot be rate limited; they will always attempt to service requests and events as fast as they arrive. But you can use Cloud Tasks to create a task queue that spreads the load of some work over time using a configured rate limit. A task queue can target another HTTP function. This effectively makes your processing asynchronous, but it is really the only mechanism Google Cloud gives you to smooth out load.
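A sketch of the enqueue side with @google-cloud/tasks; the project, location, queue name, and target URL are placeholders, and the rate limit itself lives on the queue (e.g. created with `gcloud tasks queues create hubspot-queue --max-dispatches-per-second=5`):

```typescript
import { CloudTasksClient } from "@google-cloud/tasks";

const client = new CloudTasksClient();

// Enqueue one unit of work; the queue's configured dispatch rate spreads
// delivery to the target HTTP function over time.
async function enqueue(payload: object): Promise<void> {
  const parent = client.queuePath("my-project", "us-central1", "hubspot-queue");
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: "POST",
        url: "https://us-central1-my-project.cloudfunctions.net/processHubspotWrite",
        headers: { "Content-Type": "application/json" },
        body: Buffer.from(JSON.stringify(payload)).toString("base64"),
      },
    },
  });
}
```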
How can I control API usage by consumers during a given period in an Azure Function App HTTP trigger? Simply put: how do I throttle requests once the request limit is exceeded? Please suggest a solution that doesn't use Azure API Gateway.
The only control you have over host creation in Azure Functions is an obscure application setting: WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT. This implies that you can control the number of hosts that are generated, though Microsoft claims that "it's not completely foolproof" and "is not fully supported".
From my own experience, it only throttles host creation effectively if you set the value to something pretty low, i.e. less than 50. At larger values its impact is pretty limited. It's been implied that this feature will be worked on in the future, but the corresponding issue has been open on GitHub with no update since July 2017.
For more details, you could refer to this article.
You can use the initialVisibilityDelay parameter of the CloudQueue.AddMessage method, as outlined in this blog post.
This will delay when each message becomes visible, which prevents 429 errors if implemented correctly, e.g. with the leaky bucket algorithm or an equivalent.
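The same idea sketched with the JavaScript SDK, where the `visibilityTimeout` option of `QueueClient.sendMessage` plays the role of `initialVisibilityDelay` (the connection string, queue name, and per-message spacing are placeholders to tune):

```typescript
import { QueueClient } from "@azure/storage-queue";

// Stagger messages so consumers see a smooth flow instead of a burst:
// message i only becomes visible i * spacing seconds from now.
async function enqueueSpread(messages: string[], connectionString: string): Promise<void> {
  const queue = new QueueClient(connectionString, "work-queue");
  const spacingSeconds = 2; // tune to stay under the downstream rate limit

  for (let i = 0; i < messages.length; i++) {
    await queue.sendMessage(messages[i], { visibilityTimeout: i * spacingSeconds });
  }
}
```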
I'm building an application using tag subscriptions in the real-time API and have a question related to capacity planning. We may have a large number of users posting to a subscribed hashtag at once, so the question is how often will the API actually POST to our subscription processing endpoint? E.g., if 100 users post to #testhashtag within a second or two, will I receive 100 POSTs or does the API batch those together as one update? A related question: is there a maximum rate at which POSTs can be sent (e.g., one per second or one per ten seconds, etc.)?
The Instagram API seems to lack detailed information about both how many updates are sent and what the rate limits are. From the API docs:
Limits
Be nice. If you're sending too many requests too quickly, we'll send back a 503 error code (server unavailable).
You are limited to 5000 requests per hour per access_token or client_id overall. Practically, this means you should (when possible) authenticate users so that limits are well outside the reach of a given user.
In other words, you'll need to check for a 503 and throttle your application accordingly. I haven't seen any information on how long they might block you, but it's best to avoid that completely. I would advise managing this by placing a rate-limiting mechanism in your own code, such as pushing your API requests through a queue with rate control. That will also give you the benefit of a retry if you're throttled, so you won't lose any of the updates.
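A sketch of such a queue with rate control and a 503 retry; the 200 ms pacing (about 5 requests per second) is an arbitrary conservative choice, and the request itself is a placeholder:

```typescript
// Paced work queue: jobs run at a fixed interval, and a 503 puts the job
// back at the front of the queue so no update is lost.
type Job = () => Promise<Response>;
const jobs: Job[] = [];

async function pump(intervalMs = 200): Promise<void> {
  while (jobs.length > 0) {
    const job = jobs.shift()!;
    const res = await job();
    if (res.status === 503) {
      jobs.unshift(job);                              // keep the job for a retry
      await new Promise((r) => setTimeout(r, 5000));  // back off before resuming
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}

// Usage with a placeholder request:
// jobs.push(() => fetch("https://api.instagram.com/v1/<placeholder>"));
// await pump();
```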
Moreover, in the case of real-time updates, a mechanism such as a queue is further relevant because of the following from the API docs:
You should build your system to accept multiple update objects per payload - though often there will be only one included. Also, you should acknowledge the POST within a 2 second timeout--if you need to do more processing of the received information, you can do so in an asynchronous task.
Regarding the number of updates, the API can send you one update or many. The problem is that you can absolutely murder your API quota, because I don't think you can batch calls to specific media items, at least not using the official Python or Ruby clients or the API console, as far as I have seen.
This means that if you receive 500 updates, whether as one request to your server or split across many, it won't matter: either way you need to go and fetch those items. From what I observed in a real application, these fetches seemed to count against our quota, but the quota itself seemed to be consumed erratically. Sometimes we saw no calls consumed at all; other times the available calls dropped by far more than we actually made. My advice is to be conservative and treat the 5,000 as a best guess rather than an absolute. You can check the remaining calls by parsing one of the headers they send back.
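For example, something like this; the header name is an assumption (check what your responses actually carry):

```typescript
// Read the remaining-call count from a response header (name assumed; verify it).
async function remainingCalls(url: string): Promise<string | null> {
  const res = await fetch(url);
  return res.headers.get("x-ratelimit-remaining");
}
```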
Use common sense, don't be stupid, and a rate-limiting mechanism should keep you safe, with the added benefit of handling failures due to outages (which happen more often than you may think), network hiccups, and accidental rate limiting. You could try to be tricky and use different API keys in a pooling mechanism, but this is likely a violation of the TOS, and if they are doing anything by IP, you'd have to split the pool across different machines with different IPs.
My final advice would be to restructure your application so it doesn't rely completely on the subscription mechanism. It's less than reliable and very expensive API-wise. It's only truly useful if you just need to do something in your app that doesn't require calling back to Instagram, your number of items is small, or you can filter out the majority of items to avoid calling back to Instagram except when a specific business rule is matched.
Instead, you can do things like query the tag or the user (e.g. recent media) and scale out that way. Normally this lets you grab 100 items with 1 request rather than 100 items with 100 requests. If you really want to be cute, you could at least merge the subscription notifications asynchronously, bucketing duplicates by shared characteristics such as tag, and issue one batched request per bucket. Sort of like a map/reduce, but on a small data set. You could of course run an actual map/reduce over your own data from time to time as another way of keeping things asynchronous. Again, be careful not to thrash Instagram; use map/reduce to batch your calls in a way that's useful to your app.
Hope that helps.
I use Amazon's Product Advertising API to retrieve their node hierarchy using the API's BrowseNodeLookup method (REST, using Java). In Amazon's sandbox, individual requests seem to work, but if I keep sending requests for various nodes I eventually end up getting HTTP 503 errors.
One of the previous posts on an Amazon forum indicated a limit of 20 requests per second in the sandbox: https://forums.aws.amazon.com/thread.jspa?messageID=152657
After I put throttling in place, I tried limiting the code to 20 requests/sec as well as 10 requests/sec. In both cases I eventually got a 503 error. I posted my question on Amazon's forum but have not received any information, so I was wondering whether anybody knows the answers to the following questions:
What kind of limits does the sandbox environment impose in this case?
Are those or similar limits in place in the production environment?
Do those limits apply to both REST and SOAP calls?
Maybe 10 requests/sec is too many?
I am having the same problem. I found this link that mentions 1 request/sec.
http://www.mail-archive.com/google-appengine@googlegroups.com/msg19305.html
It's approximately 2,000 requests per hour, with the opportunity to scale up if you're a merchant shipping a lot of product sold through their marketplace.
One way to help with this limit is to batch multiple requests into each API call; they're treated as one invocation for the purposes of Amazon's rate-limiting governor. Not only does that improve throughput by letting larger sets of requests be issued, but because you pay the inter-machine latency (between your app and the Amazon server handling your request) once per batch rather than once per item, you make up a good amount of time as well.
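As an illustration, batching several node IDs into one BrowseNodeLookup call might look like this; request signing is omitted, the credentials are placeholders, and whether your API version accepts comma-separated IDs should be verified against the docs:

```typescript
// Sketch: look up several browse nodes with one request instead of one each.
function buildBrowseNodeLookupUrl(nodeIds: string[]): string {
  const params = new URLSearchParams({
    Service: "AWSECommerceService",
    Operation: "BrowseNodeLookup",
    BrowseNodeId: nodeIds.join(","),       // multiple IDs batched into one call
    AWSAccessKeyId: "<your-access-key>",   // placeholder
    AssociateTag: "<your-associate-tag>",  // placeholder
    Timestamp: new Date().toISOString(),
  });
  // A real request also needs a Signature parameter computed per Amazon's spec.
  return `https://webservices.amazon.com/onca/xml?${params.toString()}`;
}
```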