On occasion I'm getting a rate limit error without being over my rate limit. I'm using the text completions endpoint on the paid API, which has a rate limit of 3,000 requests per minute, and I am sending at most 3-4 requests per minute.
Sometimes I will get the following error from the API:
Status Code: 429 (Too Many Requests)
OpenAI error type: server_error
OpenAI error message: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists.
OpenAI's documentation states that a 429 error indicates that you have exceeded your rate limit, which I clearly have not:
https://help.openai.com/en/articles/6891829-error-code-429-rate-limit-reached-for-requests
The weird thing is that the OpenAI error message doesn't say that; it's the message I usually get with a 503 error (Service Unavailable).
I'd love to hear some thoughts on this, any theories, or if anyone else has been experiencing this.
I have seen a few messages on the OpenAI community forum with similar reports. I suggest checking out the error code guide, which has suggestions for mitigating these errors. In general, though, it's possible the model itself was down, which has nothing to do with your rate limit: https://platform.openai.com/docs/guides/error-codes
This error indicates that OpenAI's servers are receiving too many requests from all users combined and have reached their capacity to service your request. It's pretty common at the moment.
Hopefully they will upgrade their servers soon. I'm not really sure why it is such a big problem, since they run on Azure and should be able to scale with ramped-up demand. Maybe they are just trying to minimise costs.
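In the meantime, the practical workaround is to retry with exponential backoff. Here's a minimal sketch using plain HTTP via the requests library; the endpoint, payload shape, and retry counts are illustrative, so adapt them to your setup:

```python
import os
import time
import requests

API_URL = "https://api.openai.com/v1/completions"  # illustrative endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def complete_with_retry(payload, max_retries=5):
    """Retry on 429/5xx with exponential backoff, since this particular 429
    signals server overload rather than a client-side rate limit."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
        if resp.status_code not in (429, 500, 502, 503):
            resp.raise_for_status()  # surface any other error
            return resp.json()
        time.sleep(delay)
        delay *= 2  # back off: 1s, 2s, 4s, ...
    raise RuntimeError(f"still failing after {max_retries} attempts "
                       f"(last status {resp.status_code})")

# e.g. complete_with_retry({"model": "text-davinci-003",
#                           "prompt": "Hello", "max_tokens": 5})
```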
Related
My application sends requests to the Azure Machine Learning REST API in order to invoke a batch endpoint and start scoring jobs, as described here. It works well for a small number of requests, but if the app sends many concurrent requests, the REST API sometimes responds with status code 429 "TooManyRequests" and the message "Received too many requests in a short amount of time. Retry again after 1 seconds.". For example, it happened after sending 77 requests at once.
The message is pretty clear, and the best solution I can think of is to throttle outgoing requests, i.e., to make sure the app doesn't exceed the limits when it sends concurrent requests. The problem is that I don't know what the request limits for the Azure Machine Learning REST API are. Looking through the Microsoft documentation, I could only find this article, which provides limits for managed online endpoints, whereas I'm looking for batch endpoints.
I would really appreciate it if someone could help me find the Azure ML REST API request limits or suggest a better solution. Thanks.
UPDATE 20 Jun 2022:
I couldn't find out how many concurrent requests Azure Machine Learning batch endpoints allow, so I settled on a limit of 10 outgoing requests, which solved the "TooManyRequests" problem. To throttle the requests I used SemaphoreSlim, as described here.
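For anyone doing the same from Python rather than .NET, here's a rough equivalent of the SemaphoreSlim approach using asyncio.Semaphore and aiohttp; the endpoint URL and payloads are placeholders:

```python
import asyncio
import aiohttp

MAX_CONCURRENT = 10  # the cap that made the 429s go away above

async def invoke(session, semaphore, url, payload):
    # The semaphore ensures at most MAX_CONCURRENT requests are in flight.
    async with semaphore:
        async with session.post(url, json=payload) as resp:
            resp.raise_for_status()
            return await resp.json()

async def invoke_all(url, payloads):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(invoke(session, semaphore, url, p) for p in payloads))

# asyncio.run(invoke_all("https://<endpoint>.<region>.inference.ml.azure.com/jobs",
#                        jobs))
```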
According to the documentation, you can request an increase to the request quota, which is one way to resolve the limit-exceeded issue. Microsoft's documentation covers the batch quota limits and shows where to change the quota values.
Document credit: prkannap and team
Alternatively, you could reduce the number of requests by storing multiple input files in a folder and invoking the job with the folder path.
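To illustrate the folder approach, here's a hypothetical sketch with the azure-ai-ml v2 Python SDK. The invoke signature has changed across SDK versions (older releases take input=, newer ones inputs={...}), so check the docs for your version and treat every name and path below as a placeholder:

```python
from azure.ai.ml import MLClient, Input
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# One invocation scores every file in the folder, replacing many
# per-file requests with a single job.
job = ml_client.batch_endpoints.invoke(
    endpoint_name="my-batch-endpoint",
    input=Input(
        type=AssetTypes.URI_FOLDER,
        path="azureml://datastores/workspaceblobstore/paths/batch-inputs/",
    ),
)
```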
If you want further assistance, please file a support ticket and a customer support engineer will assist you.
Once a minute (1,440 times/day), I'm reading a Gmail mailbox from an Azure Logic App. After two days, it started consistently returning 429 Too Many Requests, even though the quota threshold is 20,000/day. It has not run successfully since.
You might be running into Gmail's threshold for concurrent requests due to the parallel actions of Logic Apps; that will also return a 429 error.
What exactly are you doing in the Logic App?
Based on this documentation, the Gmail API enforces the standard daily mail sending limits.
These limits are per-user and are shared by all of the user's clients, whether API clients, native/web clients or SMTP MSA. If these limits are exceeded, an HTTP 429 Too Many Requests "User-rate limit exceeded" error mentioning "(Mail sending)" is returned with a time to retry. Note that daily limits being exceeded may result in these types of errors for multiple hours before the request is accepted, so your client may retry the request with standard exponential backoff.
These per-user limits cannot be increased for any reason.
The mail sending pipeline is complex: once the user exceeds their quota, there can be a delay of several minutes before the API begins to return 429 error responses, so you cannot assume that a 200 response means the email was successfully sent.
You may want to implement exponential backoff. Here's also an additional link which might help: Gmail API error 429 rateLimitExceeded even where is no any activity
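As a sketch of what that could look like: send_once below stands for whatever function issues the Gmail call and returns a response object with status_code and headers (e.g. a requests.Response); the attempt cap and jitter are illustrative:

```python
import random
import time

def send_with_backoff(send_once, max_attempts=6):
    """Retry a throttled call, honoring a Retry-After header when the
    server provides one, otherwise backing off exponentially with jitter."""
    for attempt in range(max_attempts):
        resp = send_once()
        if resp.status_code != 429:
            return resp
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)  # server-specified wait, in seconds
        else:
            delay = (2 ** attempt) + random.random()  # 1s, 2s, 4s ... + jitter
        time.sleep(delay)
    raise RuntimeError("still throttled; the daily quota may be exhausted")
```

Capping the attempts matters here: as the documentation notes, a blown daily quota can keep returning 429s for hours, so retrying forever just wastes quota.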
Could someone suggest how to fix this problem without having to change the plan in Cloudant?
If you are using the official Cloudant Node.js library, see the retry plugin, which handles 429 errors.
Note that 429 retry handling is probably only suitable for development environments or for small fluctuations in demand that exceed your capacity. Excessive reliance on 429 handling will result in a build-up of 'back-pressure' in your application.
I have a DocumentDB database on Azure, and I have a particularly heavy query that runs when I archive a user record and all of their data.
I was on the S1 plan and would get an exception indicating I was hitting the RU/s limit. The S1 plan has 250 RU/s.
I decided to switch to the Standard plan that lets you set the RU/s and pay for it.
I set it to 500 RU/s.
I did the same query and went back and looked at the monitoring chart.
At the time I did this latest query test it said I did 226 requests and 10 were throttled.
Why is that? I set it to 500 RU/s. The query had failed, by the way.
Firstly, Requests != Request Units, so your 226 requests will at some point have caused more than 500 Request Units to be needed within one second.
The DocumentDB API will tell you how many RUs each request costs, so you can examine that client-side to find out which request is causing the problem. From my experience, even a simple by-id request often costs at least a few RUs.
How you see that cost is dependent on which client-side SDK you use. In my code, I have added something to automatically log all requests that cost more than 10 RUs, just so I know and can take action.
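For illustration, here's roughly what that logging looks like with the Python azure-cosmos SDK; the 10-RU threshold, database, and container names are just examples:

```python
from azure.cosmos import CosmosClient

client = CosmosClient(url="<account-uri>", credential="<key>")
container = client.get_database_client("mydb").get_container_client("users")

RU_LOG_THRESHOLD = 10.0  # log anything costlier than this

def read_item_logged(item_id, partition_key):
    item = container.read_item(item=item_id, partition_key=partition_key)
    # Every response reports its RU cost in the x-ms-request-charge header.
    charge = float(
        container.client_connection.last_response_headers["x-ms-request-charge"])
    if charge > RU_LOG_THRESHOLD:
        print(f"read of {item_id} cost {charge} RUs")
    return item
```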
It's also the case that the monitoring tools in the portal are quite inadequate, and I know the team is working on that; you can only see the total RUs per five-minute interval, but you may use 600 RUs in one second and you can't really see that in the portal.
In your case, you may have a single big query that simply costs more than 500 RUs - the logging will tell you. In that case, look at the generated SQL to see why, and maybe even post it here.
Alternatively, it may be the cumulative effect of lots of small requests being fired off in a small time window. If you are doing 226 requests in response to one user action (and I don't know if you are) then you probably want to reconsider your design :)
Finally, you can retry failed requests. I'm not sure about other SDKs, but the .NET SDK automatically retries a request 9 times before giving up (that might be another explanation for the 226 requests hitting the server).
If your chosen SDK doesn't retry, you can easily do it yourself; the server returns status code 429 along with an x-ms-retry-after-ms header telling you how long to wait before retrying.
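Here's what that manual retry could look like, sketched with the Python azure-cosmos SDK (which, like the .NET one, already retries throttled requests by default, so this is only for clients that don't):

```python
import time
from azure.cosmos import exceptions

def read_with_manual_retry(container, item_id, pk, max_attempts=5):
    for _ in range(max_attempts):
        try:
            return container.read_item(item=item_id, partition_key=pk)
        except exceptions.CosmosHttpResponseError as e:
            if e.status_code != 429:
                raise  # only handle throttling here
            # The service tells us how long to wait before retrying.
            wait_ms = float(e.response.headers.get("x-ms-retry-after-ms", 1000))
            time.sleep(wait_ms / 1000.0)
    raise RuntimeError("still throttled after retries")
```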
Please examine the queries and update your question so we can help further.
I use Amazon's Product Advertising API to retrieve their node hierarchy using the API's BrowseNodeLookup method (REST using Java). On Amazon's sandbox, individual requests seem to work, but if I keep sending requests for various nodes I eventually end up getting HTTP 503 errors.
One of the previous posts on an Amazon forum indicated a limit of 20 requests per second on the sandbox: https://forums.aws.amazon.com/thread.jspa?messageID=152657
After I put throttling in place, I tried limiting the code to 20 requests/sec and then to 10 requests/sec. In both cases I still eventually got a 503 error. I posted my question on Amazon's forum but have not received any information, so I was wondering whether anybody knows the answers to the following questions:
What kind of limits does the sandbox environment impose in this case?
Are those or similar limits in place in the production environment?
Do those limits apply to both REST and SOAP calls?
Maybe 10 requests/sec is too many?
I am having the same problem. I found this link that mentions 1 request/sec.
http://www.mail-archive.com/google-appengine@googlegroups.com/msg19305.html
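If you end up throttling client-side, a simple fixed-rate limiter is enough. The original code is Java, but the idea translates directly; here's a sketch in Python pinned to the 1 request/sec figure from that link:

```python
import threading
import time

class RateLimiter:
    """Enforce a minimum interval between requests across threads."""
    def __init__(self, max_per_second):
        self.interval = 1.0 / max_per_second
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def acquire(self):
        with self.lock:
            now = time.monotonic()
            wait = self.next_allowed - now
            # Reserve the next slot before releasing the lock.
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if wait > 0:
            time.sleep(wait)

limiter = RateLimiter(max_per_second=1)  # the 1 req/sec the post mentions

def browse_node_lookup(node_id):
    limiter.acquire()
    # ... issue the BrowseNodeLookup request for node_id here ...
```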
It's approximately 2,000 requests per hour, with the opportunity to scale up if you're a merchant selling a lot of product through their marketplace.
One way to work within this limit is to batch multiple requests in each API call; they're treated as one invocation for the purposes of Amazon's rate-limiting governor. Not only does that help with throughput by permitting larger sets of requests to be issued, but because you're not paying inter-machine latency (between your app and the Amazon server handling your API request) on every request, you make up a good amount of time there as well.