Azure DocumentDB Throttled Requests

Azure DocumentDB Throttled Requests - azure

I have a document db database on azure. I have a particularly heavy query that happens when I archive a user record and all of their data.
I was on the S1 plan and would get an exception that indicated I was hitting the limit of RU/s. The S1 plan has 250.
I decided to switch to the Standard plan that lets you set the RU/s and pay for it.
I set it to 500 RU/s.
I did the same query and went back and looked at the monitoring chart.
At the time I did this latest query test it said I did 226 requests and 10 were throttled.
Why is that? I set it to 500 RU/s. The query had failed, by the way.

Firstly, Requests != Request Units, so your 226 requests will at some point have caused more than 500 Request Units to be needed within one second.
The DocumentDb API will tell you how many RUs each request costs, so you can examine that client side to find out which request is causing the problem. From my experience, even a simple by-id request often cost at least a few RUs.
How you see that cost is dependent on which client-side SDK you use. In my code, I have added something to automatically log all requests that cost more than 10 RUs, just so I know and can take action.
It's also the case that the monitoring tools in the portal are quite inadequate and I know the team are working on that; you can only see the total RUs for every five minute interval, but you may try to use 600 RUs in one second and you can't really see that in the portal.
In your case, you may either have a single big query that just costs more than 500 RU - the logging will tell you. In that case, look at the generated SQL to see why, maybe even post it here.
Alternatively, it may be the cumulative effect of lots of small requests being fired off in a small time window. If you are doing 226 requests in response to one user action (and I don't know if you are) then you probably want to reconsider your design :)
Finally, you can retry failed requests. I'm not sure about other SDKs but the .Net SDK retries a request automatically 9 times before giving up (that might be another explanation for the 229 requests hitting the server).
If your chosen SDK doesn't retry, you can easily do it yourself; the server will return a specific status code (I think 429 but can't quite remember) along with an instruction on how long to wait before retrying.
Please examine the queries and update your question so we can help further.

Related

How to handle multiple API Requests

I am working with the Google Admin SDK to create Google Groups for my organization. I can't add members to the group when creating the group, ideally, when I create a new group I'd like to add roughly 60 members. In addition, the ability to add members after the group is created in bulk (a batch request) was deprecated August 2020. Right now, after I create a group I need to make a web request to the API to add each member of the group individually (which will be about 60 members).
I am using node.js and express, is there a good way to handle 60 web requests to an api? I don't know how taxing this will be on my server. If anyone has any resources to share where I can learn about the impact this would have on a nodejs server that would be great.
Fortunately, these groups aren't created often, maybe 15 a month.
One idea I have is to offload the work to something like a cloud function so my node server makes one request to the cloud function, then the cloud function makes all the additional requests to add members to the Group. I'm not 100% sure if this is necessary and I'm curious on other approaches.

Limits and Quotas
Note that adding group members may take 10 minutes to propagate.
The rate limit for the Directory API is 3000 queries per 100 seconds per IP address. This works out to around 30 per second. 60 requests is not a large amount of requests, but if you try to send them all in a few milliseconds the system may extrapolate the rate and deem it over the limit, I wouldn't think so, though probably best to test it on your end with your system and connection etc.
Exponential Backoff
If you do need to make many requests this is the method Google recommends. It involves repeating the request if it fails and then exponentially increasing the amount of time to wait until it reaches 16 seconds. You can always implement a longer wait to retry. Its only 60 requests after all.
Batch Requests
The previously mentioned methods should work no issue for you, since there are only 60 requests to make, it won't put any "stress" on the system as such. That said, the most performant way to handle many requests to the Directory API is to use a batch request. This allows you to bundle up all your member requests into one large batch, of up to 1000 calls. This will also give you a nice cushion in case you need to increase your requests in future!
EDIT - I am sorry, I missed that you mentioned that Batching is deprecated. Only global batching is deprecated. If you send a batch request to a specific API, batching will still be supported. What you can no longer do is send a single batch request to different APIs, like Directory and Sheets in one.
References
Limits and Quotas
Exponential Backoff
Batch Requests

How to handle the API call rate limit for Docusign in nodejs

How to handle API call rate limit for Docusign in nodejs?I am getting error like "The maximum number of hourly API invocations has been exceeded ".

So, you hit your QPH (queries per hour) limit, right?
Well, there are two things you can do to minize that:
Throttling
Caching
Throttling
When an application has a query limit per period of time, throttling is often a common strategy used.
Throttling revolves around distributing your requests over a period of time. So for example, if you have a 500 limit of requests per hour, you can throttle your application to do 250 requests in the first 30 minutes, and 250 after.
This way you avoid hitting the limit. If you have more requests, then you save them for the next hour.
Caching
Throttling is good because it gives you control over time. But sometimes, requests are similar, so similar in fact that you can just save the answer and use it for later.
Caching is also often used together with throttling, by saving the answers from old requests in your system, and re-using them, you effectively loose the need to make requests against the API, and you gain the ability to answer more user requests (provided you cached them before).
Sum up
There is no silver bullet to solve your problem. There is no simple line of code to do that. Instead, you have two methods that used together will minimize your problem and perhaps even eliminate it altogether if used correctly.

How frequently the Traffic Manager monitors endpoints

How frequently the Traffic Manager monitors endpoints? It's very obvious that it's not event driven (when an endpoint is down it takes up-to 30 secs - 2.5 mins to identify the status of the endpoint as per my observations). Can we configure this frequency, I cannot see any configuration for this.
Is there a relationship between Traffic Manager Monitoring interval and TTL?
This may look like a general question, but my real issue is that I experience a service downtime in a fail over scenario (fail over of the primary). I understand the effect in TTL where until the client DNS cache expires they are calling the cached endpoint. I spent a lot of time on this and now I have narrowed down it to a specific question.
Issue is that there is a delay in Traffic Manager identifying the endpoint status after it's stopped or started. I need a logical explanation for this, could not find any Azure reference which explains this.
Traffic manager settings
I need to understand this delay and plan for that down time.

I have gone through the same issue. Check this link, it explains the Monitoring behaviour
Traffic Manager Monitoring
The monitoring system performs a GET, but does not receive a response in 10 seconds or less. It then performs three more tries at 30 second intervals. This means that at most, it takes approximately 1.5 minutes for the monitoring system to detect when a service becomes unavailable. If one of the tries is successful, then the number of tries is reset. Although not shown in the diagram, if the 200 OK message(s) come back more than 10 seconds after the GET, the monitoring system will still count this as a failed check.
This explains the 30-2 mins delay.
basically the maximum delay would be 1.5 mins + TTL as per the details.

What does the 500 telemetry data points per second limit actually mean in Application Insights?

On this documentation page there is the following limitation of Application Insights documented:
Up to 500 telemetry data points per second per instrumentation key (that is, per application). This includes both the standard telemetry sent by the SDK modules, and custom events, metrics and other telemetry sent by your code.
However it doesn't explain what the implications of that limit are?
a) Does it buffer and throttle, but still persist all data eventually? So say - 1000 data points get pushed within a second - it will persist the first 500, then wait for a bit and push the other 500?
or
b) Does it just drop/not log data? So say - 1000 data points get pushed within a second and only the first 500 will be persisted and the other 500 not (ever)?

It is the latter (b) with the caveat that ALL data will start to be throttled in this case, i.e. once RPC is > 500 (100 for free apps, please see https://azure.microsoft.com/en-us/documentation/articles/app-insights-data-retention-privacy/ for details) is detected, it will start rejecting all data from this instrumentation key on data collection endpoint, until RPC rate is back to under 500.
EDIT: Further information from Bret Grinslade:
The current implementation averages over one minute -- so if you send 30K in 1 minute (500*60) it will throttle your application. The HTTP response will tell the SDK to retry later. If the incoming rate never comes down, the response will tell the SDK to drop the data. We are working on other features to improve this experience -- pre-aggregation on the client, improved burst data rates, etc.

A bit more detail on top of Alex's response. The current implementation averages over one minue -- so if you send 30K in 1 minute (500*60) it will throttle your application. The HTTP response will tell the SDK to retry later. If the incoming rate never comes down, the response will tell the SDK to drop the data. We are working on other features to improve this experience -- pre-aggregation on the client, improved burst data rates, etc.

AI now has the ingestion throttling limit of 16K EPS: https://learn.microsoft.com/en-us/azure/application-insights/app-insights-pricing

Instagram real-time API POST rate

I'm building an application using tag subscriptions in the real-time API and have a question related to capacity planning. We may have a large number of users posting to a subscribed hashtag at once, so the question is how often will the API actually POST to our subscription processing endpoint? E.g., if 100 users post to #testhashtag within a second or two, will I receive 100 POSTs or does the API batch those together as one update? A related question: is there a maximum rate at which POSTs can be sent (e.g., one per second or one per ten seconds, etc.)?

The Instagram API seems to lack detailed information about both how many updates are sent and what are the rate limits. From the [API docs][1]:
Limits
Be nice. If you're sending too many requests too quickly, we'll send back a 503 error code (server unavailable).
You are limited to 5000 requests per hour per access_token or client_id overall. Practically, this means you should (when possible) authenticate users so that limits are well outside the reach of a given user.
In other words, you'll need to check for a 503 and throttle your application accordingly. No information I've seen for how long they might block you, but it's best to avoid that completely. I would advise you manage this by placing a rate limiting mechanism on your own code, such as pushing your API requests through a queue with rate control. That will also give you the benefit of a retry of you're throttled so you won't lose any of the updates.
Moreover, a mechanism such as a queue in the case of real-time updates is further relevant because of the following from the API docs:
You should build your system to accept multiple update objects per payload - though often there will be only one included. Also, you should acknowledge the POST within a 2 second timeout--if you need to do more processing of the received information, you can do so in an asynchronous task.
Regarding the number of updates, the API can send you 1 update or many. The problem with this is you can absolutely murder your API calls because I don't think you can batch calls to specific media items, at least not using the official python or ruby clients or API console as far as I have seen.
This means that if you receive 500 updates either as 1 request to your server or split into many, it won't matter because either way, you need to go and fetch these items. From what I observed in a real application, these seemed to count against our quota, however the quota itself seems to consume resources erratically. That is, sometimes we saw no calls at all consumed, other times the available calls dropped by far more than we actually made. My advice is to be conservative and take the 5000 as a best guess rather than an absolute. You can check the remaining calls by parsing one of the headers they send back.
Use common sense, don't be stupid, and using a rate limiting mechanism should keep you safe and have the benefit of dealing with failures either due to outages (this happens more than you may think), network hicups, and accidental rate limiting. You could try to be tricky and use different API keys in a pooling mechanism, but this is likely a violation of the TOS and if they are doing anything via IP, you'd have to split this up to different machines with different IPs.
My final advice would be to restructure your application to not completely rely on the subscription mechanism. It's less than reliable and very expensive API wise. It's only truly useful if you just need to do something in your app that doesn't require calling back to Instgram, your number of items is small, or you can filter out the majority of items to avoid calling back to Instagram accept when a specific business rule is matched.
Instead, you can do things like query the tag or the user (ex: recent media) and scale it out that way. Normally this allows you to grab 100 items with 1 request rather than 100 items with 100 requests. If you really want to be cute, you could at least merge the subscription notifications asynchronously and combine the similar ones into a single batched request when you combine the duplicate characteristics such as tag into a single bucket. Sort of like a map/reduce but on a small data set. You could of course do an actual map/reduce from time-to-time on your own data as another way of keeping things in async. Again, be careful not to thrash instagram, but rather just use map/reduce to batch out your calls in a way that's useful to your app.
Hope that helps.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string