I'm planning a project and working through all the potential issues I might face. One that I keep running into which might be specific to my project is concurrency issues. From my understanding, Azure Functions scale when under demand which is exactly what I'm looking for but causes a problem when it comes to concurrency. Let me explain the scenario:
An HTTP-triggered Azure Function which does the following:
Gets the client's available credit; if it is zero, auto-charges the client's card.
Deducts credit from the client for the request.
Processes the request and returns the result to the client.
Where I see an issue is getting the available credit and auto-charging the card. Because there may be multiple instances of the function running at once, I might auto-charge the card multiple times, and on top of that, getting and deducting the credit will be affected.
I'm wanting the scaling of Azure Functions but can't figure a way around these concurrency issues. Any insight or pointers in the right direction would be very much appreciated.
I got some credits for using GCP. It asked me for a credit card before accessing. Since it is quite tricky to monitor the cost in real time, I hope to set up a circuit breaker such that once my credit runs out, all my VM instance jobs will cease running. I understand there will still be some small charges for space and static IP, but they are OK.
Is such a circuit breaker feasible? I checked some FAQs on budget alerts, e.g., https://cloud.google.com/billing/docs/how-to/budgets, but there appears to be no mention of it.
Thanks.
It's possible to be notified programmatically by creating a budget and then using Pub/Sub to trigger a Cloud Function that stops your resources. Here is an example based on the previous suggestion.
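To give a feel for the function side, here is a minimal Python sketch of the decision logic only. It assumes the budget notification arrives as base64-encoded JSON in the Pub/Sub message and carries `costAmount` and `budgetAmount` fields (check the current notification format before relying on these names). `handle_budget_event` and the `stop_instances` hook are hypothetical names for illustration; the actual call that stops your VMs depends on your setup and is deliberately left out.

```python
import base64
import json


def should_stop(event: dict) -> bool:
    """Return True when the reported cost has reached the budget amount."""
    # Pub/Sub delivers the notification as base64-encoded JSON in event["data"].
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Field names are an assumption about the notification schema.
    return payload["costAmount"] >= payload["budgetAmount"]


def handle_budget_event(event: dict, stop_instances) -> bool:
    """Call the supplied stop_instances() hook if the budget is used up."""
    if should_stop(event):
        stop_instances()  # hypothetical hook: stop your VM instances here
        return True
    return False
```

Keeping the decision separate from the stop call makes it easy to test the threshold logic without touching any real resources.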
For small apps, rate limits are no problem.
But for apps with real traffic you can hit the limits easily.
HTTP is request-response driven. Just because your backend has hit a rate limit, you can't really hold off sending the response until the limit allows you to resume making your API calls.
What do you do?
I can think of several scenarios:
Wait it out: it sucks, but sometimes it's the easy fix, as you don't need to do anything.
Queue it: this is a lot of work as opposed to just making an API call. It requires that you first store the request in a database, then have a background task go through the database and do the work. Also, the user would be told "it is processing", not "it's done".
Use lots of API accounts: very hacky, and a lot of trouble to manage. Say you are using Amazon; now you would have to create, verify, and validate something like 10 accounts. It's not even possible where you need to verify with, say, a domain name, since Amazon would know that account abc already owns it.
To expand on what your queueing options are:
Unless you can design the problem of hitting this rate limit out of existence as #Hammerbot walks through, I would go with some implementation of queue. The solution can scale in complexity and robustness according to what loads you're facing and how many rate limited APIs you're dealing with.
Recommended
You use some library to take care of this for you. Node-rate-limiter looks promising. It still appears you would have to worry about how you handle your user interaction (make them wait, write to a db/cache-service and notify them later).
"Simplest case" - not recommended
You can implement a minimally functioning queue and back it with a database or cache. I've done this before and it was fine, initially. Just remember that you'll need to implement your own retry logic and will have to worry about things like queue starvation **. Basically, the caveats of rolling your own <insert thing whose implementation someone already worried about> should be taken into consideration.
**(e.g. your calls keep failing for some reason and all of a sudden your background process is endlessly retrying large numbers of failing queue work elements and your app runs out of memory).
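As a rough illustration of that caveat, here is a minimal Python sketch of a hand-rolled queue that caps retries so that failing work items cannot be retried forever. The in-memory `deque` is only a stand-in for the database or cache, and `RetryQueue`, `MAX_ATTEMPTS`, and `call_api` are names made up for this example.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed cap; tune it for the API you are calling


class RetryQueue:
    """In-memory stand-in for a database-backed work queue."""

    def __init__(self):
        self.pending = deque()  # (job, failed attempts so far)
        self.done = []          # (job, result)
        self.dead = []          # jobs that used up all their attempts

    def enqueue(self, job):
        self.pending.append((job, 0))

    def run_once(self, call_api):
        """Try each currently queued job at most once, then return."""
        for _ in range(len(self.pending)):
            job, attempts = self.pending.popleft()
            try:
                self.done.append((job, call_api(job)))
            except Exception:
                if attempts + 1 >= MAX_ATTEMPTS:
                    self.dead.append(job)  # give up; needs manual attention
                else:
                    self.pending.append((job, attempts + 1))
```

A background task would call `run_once` on a schedule. Jobs that keep failing end up in `dead` instead of cycling through the queue indefinitely.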
Complex case:
You have a bunch of API calls that all get rate-limited and those calls are all made at volumes that make you start considering decoupling your architecture so that your user-facing app doesn't have to worry about handling this asynchronous background processing.
High-level architecture:
Your user-facing server pushes work units of different type onto different queues. Each of these queues corresponds to a differently rate-limited processing (e.g. 10 queries per hour, 1000 queries per day). You then have a "rate-limit service" that acts as a gate to consuming work units off the different queues. Horizontally distributed workers then only consume items from the queues if and only if the rate limit service says they can. The results of these workers could then be written to a database and you could have some background process to then notify your users of the result of the asynchronous work you had to perform.
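To make the shape of this concrete, here is a small single-process Python sketch under simplifying assumptions: the "rate-limit service" is a fixed-window counter per queue, the queues are plain `deque` objects, and one worker pass is a function call. `RateGate`, `drain`, and the queue names are invented for the example; a real deployment would put each piece behind its own service.

```python
from collections import deque


class RateGate:
    """Fixed-window limiter: at most `limit` grants per `window` seconds per queue."""

    def __init__(self, limits):
        self.limits = limits  # queue name -> (limit, window_seconds)
        self.windows = {}     # queue name -> (window_start, grants so far)

    def allow(self, name, now):
        limit, window = self.limits[name]
        start, count = self.windows.get(name, (now, 0))
        if now - start >= window:  # window expired: start a new one
            start, count = now, 0
        if count >= limit:
            self.windows[name] = (start, count)
            return False
        self.windows[name] = (start, count + 1)
        return True


def drain(queues, gate, now, handle):
    """One worker pass: consume from each queue only while the gate allows it."""
    results = []
    for name, q in queues.items():
        while q and gate.allow(name, now):
            results.append(handle(name, q.popleft()))
    return results
```

Work that the gate refuses simply stays on its queue until a later pass, which is what lets the user-facing server return immediately.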
Of course, in this case you're wading into a whole world of infrastructure concerns.
For further reading, you could use Lyft's rate-limiting service (which I think implements the token bucket algorithm to handle rate limiting). You could use Amazon's Simple Queue Service (SQS) for the queues and AWS Lambda as the queue consumers.
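For reference, the token bucket idea itself fits in a few lines. This Python sketch shows the general algorithm only; it is not taken from Lyft's service, and the class and parameter names are my own. Time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.last = now                 # time of the last refill

    def try_acquire(self, now, cost=1):
        # Refill according to the time elapsed, without exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1.0` and `capacity=2`, two calls succeed immediately, a third is refused, and one more is allowed after a full second has passed.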
There are two reasons why rate limits may cause you problems.
Chronic: (that is, a sustained situation). You are hitting rate limits because your sustained demand exceeds your allowance.
In this case, consider a local cache, so you don't ask for the same thing twice. Hopefully the API you are using has a reliable "last-modified" date so you can detect when your cache is stale. With this approach, your API calling is to refresh your cache, and you serve requests from your cache.
If that doesn't help, you need higher rate limits.
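A minimal Python sketch of the caching idea, assuming the API offers a cheap way to read a "last-modified" stamp separately from the full data. `fetch_last_modified` and `fetch_body` are hypothetical callables standing in for those two API calls.

```python
class LastModifiedCache:
    """Serve requests from memory; only a refresh job talks to the API."""

    def __init__(self, fetch_last_modified, fetch_body):
        self.fetch_last_modified = fetch_last_modified  # cheap: key -> stamp
        self.fetch_body = fetch_body                    # expensive: key -> data
        self.entries = {}                               # key -> (stamp, data)

    def refresh(self, key):
        """Re-fetch only if the upstream stamp changed. Returns True if re-fetched."""
        stamp = self.fetch_last_modified(key)
        cached = self.entries.get(key)
        if cached is not None and cached[0] == stamp:
            return False
        self.entries[key] = (stamp, self.fetch_body(key))
        return True

    def get(self, key):
        """Request path: never calls the API; returns None if not cached yet."""
        cached = self.entries.get(key)
        return None if cached is None else cached[1]
```

The refresh job runs on whatever schedule your rate limit can afford, so the number of API calls no longer depends on how many requests you serve.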
Acute: your application makes bursts of calls that exceed the rate limit, but on average your demand is under the limit. So you have a short-term problem. I have settled on a brute-force solution for this ("shoot first, ask permission later"). I burst until I hit the rate limit, then I use retry logic, which is easy as my preferred tool is Python, which supports this easily. The returned error is trapped and retry handling takes over. I think every mature library would have something like this.
https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html
The default retry logic is to back off in increasingly large steps of time.
This has a starvation risk, I think. That is, if there are multiple clients using the same API, they share the same rate limit as a pool. On your nth retry, your backoff may be so long that newer clients with shorter backoff times are stealing your slots. By the time your long backoff expires, the rate limit has already been consumed by a younger competitor, so you now wait even longer before retrying, making the problem worse. At the limit, this just means the same as the chronic situation: the real problem is that your total rate limit is insufficient, but you might also not be sharing it fairly among jobs due to starvation. An improvement is to use a less naive algorithm; it's the same contention problem that locking faces in computer science (introducing randomisation is a big improvement). Once again, a mature library is aware of this and should help with built-in retry options.
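To show what the randomisation looks like, here is a hand-rolled Python sketch of exponential backoff with jitter. It is an illustration of the idea, not urllib3's implementation; `RateLimited` is a hypothetical exception standing in for whatever error your client raises on an HTTP 429, and the `base` and `cap` values are arbitrary.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error type standing in for a rate-limit response."""


def backoff_delays(retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one delay per retry, uniform in [0, min(cap, base * 2**n))."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(func, retries=5, sleep=time.sleep, rng=random.random):
    """Call func(); on RateLimited, wait a jittered delay and try again."""
    for delay in backoff_delays(retries, rng=rng):
        try:
            return func()
        except RateLimited:
            sleep(delay)
    return func()  # final attempt; let any error propagate to the caller
```

Because each client draws its delay at random from a growing range, an old client on its nth retry is not guaranteed to wake up later than every newer one, which softens the starvation effect described above.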
I think that this depends on which API you want to call and for what data.
For example, Facebook limits its API calls to 200 requests per hour per user. So if your app grows, and you are using their OAuth implementation correctly, you shouldn't be limited here, because your allowance grows with your number of users.
Now, what data do you need? Do you really need to make all these calls? Could the information you request be stored on one of your own servers?
Let's imagine that you need to display an Instagram feed on a website. So at each visitor request, you reach Instagram to get the pictures you need. And when your app grows, you reach the API limit because you have more visitors than what the Instagram API allows. In this case, you should definitely store the data on your server once per hour, and let your users reach your database rather than Instagram's one.
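A minimal Python sketch of that hourly refresh, assuming a single feed and a hypothetical `fetch` callable that performs the real API request. The clock is injectable only so the behaviour can be checked without waiting an hour.

```python
import time


class HourlyCache:
    """Call fetch() at most once per ttl_seconds; serve the stored copy otherwise."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self.fetch = fetch          # hypothetical: performs the real API call
        self.ttl = ttl_seconds
        self.clock = clock
        self.value = None
        self.fetched_at = None      # None means nothing has been fetched yet

    def get(self):
        now = self.clock()
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.value = self.fetch()
            self.fetched_at = now
        return self.value
```

However many visitors you have, the upstream API then sees at most one request per hour for this feed. In a real app the stored copy would live in your database rather than in process memory.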
Now let's say that you need specific information for every user at each request. Isn't it possible to let each user handle their own connection to the APIs? Either by implementing the API's OAuth 2 flow or by asking the users for their API credentials (not very secure, I think...)?
Finally, if you really can't change the way you are working now, I don't see any other options than the ones you listed here.
EDIT: And finally, as #Eric Stein stated in his comment, some APIs allow you to raise your API limit by paying (a lot of SaaS do that), so if your app grows, you should be able to afford to pay for those services (they are delivering value to you, so it's fair to pay them back).
EDIT: STILL NOT ANSWERED. I appreciate the advice I have received so far, but I still have not found a proper way to test the amount of resources my server is using. I decided to use GCE instead of GAE but I still want to measure the resource usage.
I have searched all over google as well as SA and can't seem to figure this one out.
I would like to deploy my (very small) node.js server to either Google App Engine or Google Compute Engine (not sure which to use yet).
I see that they charge based on how many resources you use, but how can I check this before I make my decision? Basically what I would like to do is find a way to analyse my server and see what CPU/DISK/NETWORK/RAM/Etc it uses, and then possibly make some refinements to my code to get the usage down as low as possible.
I am a hobbyist programmer and this server is just for personal stuff so I don't need anything fancy. I just want to get it hosted on google and not my home server. My real fear is that, since I am not a professional, my code might be doing some crazy background stuff repeatedly that would rack my usage up for nothing.
Quick rundown on what my server does:
Basic node.js express template that IntelliJ made me, then I added my code to sit and listen to a Firebase. When the firebase gets a message (once or twice a day maybe, text message equivalent size) the server sends a quick GCM/FCM message to a few devices. Extremely simple server, very little code. Nothing crazy.
As a little bonus for me, if you have a suggestion as to which platform I should use, I am all-ears.
If you do not need this server to run 24x7, use App Engine. It stops an instance if it is not being used for 15 minutes. The startup time for new instances depends on your code, but for Node.js instances it should not be long.
Generally speaking it is easier to run an app on App Engine than Compute Engine, but if you use a single instance and don't change code often the difference is negligible.
App Engine has a generous free quota. You may end up paying nothing until the usage gets over a certain threshold.
You can run some diagnostic tools on your existing server, but even then you will get an approximation - a server with a different combination of resources sitting on a different network may use resources differently. You may be able to get a rather accurate estimate of memory usage, though.
If this is a small app with not too many users, even a small instance should be able to handle it. There is no harm in trying - start with the smallest instance, test, go to the next instance up if tests fail. Your key concern should be to have enough memory to handle a small number of requests.
As for the number of requests your server can handle, you can configure automatic scaling. It is a default option in App Engine and can be enabled for flexible runtime. Then you can have the smallest instance (i.e. your server does not crash due to the lack of memory) running, and another instance will be added if and when that small instance is not enough.
Well, after over a month I figure I might as well answer this myself.
What I ended up doing was creating a basic instance on Compute Engine (the micro, the smallest one available) and letting it just sit there for a few weeks. I looked back at the data to see what some good baselines were and took note.
Then I took my server code and ran it on the server. I left it there for a few days, changed it, updated it, etc. I just tried to simulate the things I would be doing. I sent messages on my client app (that's what this server is doing after all is said and done) and I let this go on for a few more weeks.
The rest is history. I looked at the baseline then looked at my new memory, CPU, network and disk usage and there we go. Good to go. My free trial still isn't even over so it was a free experiment.
The good news is that my server is more 'lightweight' than I thought.
We are currently in the process of organising a student conference.
The issue is that we offer several different events at the same time over the course of a week. The conference runs the whole day.
It's currently been operating on a first come, first served basis, however this has led to dramatic problems in the past, namely the server crashing almost immediately, as 1000+ students all try to get the best events as quickly as they can.
Is anyone aware of a way to best handle this so that each user has the best chance of enrolling in the events they wish to attend, firstly without the server crashing and secondly with people registering for events which have a maximum capacity, all within a few minutes? Perhaps somehow staggering the registration process or something similar?
I'm aware this is a very broad question, however I'm not sure where to look to when trying to solve this problem...
Broad questions have equally broad answers. There are broadly two ways to handle it:
Write more performant code so that a single server can handle the load.
Optimize backend code; cache data; minimize DB queries; optimize DB queries; optimize third party calls; consider storing intermediate things in memory; make judicious use of transactions trading off consistency with performance if possible; partition DB.
Horizontally scale - deploy multiple servers. Put a load balancer in front of your multiple front end servers. Horizontally scale DB by introducing multiple read slaves.
There are no quick fixes. It all starts with analysis first - which parts of your code are taking most time and most resources and then systematically attacking them.
Some quick fixes are possible; e.g. search results may be cached. The cache might be stale; so there would be situations where the search page shows that there are seats available, when in reality the event is full. You handle such cases on registration time. For caching web pages use a caching proxy.
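Handling it at registration time usually means making the capacity check and the seat claim a single atomic step, so a stale search page cannot cause overbooking. Here is a minimal Python sketch using SQLite; the `events` table and its columns are assumptions made for the example, not something from the question.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical events table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, capacity INTEGER, taken INTEGER)"
    )
    return db


def register(db, event_id):
    """Claim one seat; return False if the event is full or does not exist."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            # The capacity check and the increment happen in one statement.
            "UPDATE events SET taken = taken + 1 WHERE id = ? AND taken < capacity",
            (event_id,),
        )
    return cur.rowcount == 1
```

If `register` returns False, the user is told the event has filled up, even though the cached page still showed free seats.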
We just suffered a SQL Database connectivity issue on Azure. Although very quick, around 1 minute, it did kick all users out, and/or raised Elmah Errors such as:
The wait operation timed out ...
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection
Even glitches like this compromise confidence. I am trying to understand good approaches for dealing with these transitory outages. Some thoughts that come to mind include:
a) Have some code that checks that all required services are running before using them, and that keeps checking and shows a friendly error message until they are. I think there is a tendency to assume all is available and working, and I wonder whether this is a dangerous assumption in the world of cloud. I suppose this is more an approach one would take when building a distributed application, although one may not for a database, which is usually close to the web application.
b) Use failover procedures such as Traffic Manager. However, this is expensive, as one now has >1 instance and also needs to take care of syncing data across >1 DB, etc. Associated link on Failover procedure in Azure
c) Make sure custom error pages are used so the Yellow Screen of Death (YSOD) is not seen:
<customErrors mode="RemoteOnly" defaultRedirect="~/Error/Error" />
A YSOD was nevertheless seen by a colleague; I'm not sure how, with the above in force. One criticism I have of Azure is that if Websites are down, then one can get bad error pages, provided only by Azure and not customisable, although I was advised that using something like CloudFlare can sort this issue.
I think a) is the most interesting concept. Should we code Azure Web Apps as if they are WAN rather than LAN applications, and assume nodes could be down, and so check beforehand?
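To make the first idea concrete, here is a minimal Python sketch of "check before use, and show a friendly message until the service is back". It is language-neutral in spirit; `is_available` and `action` are hypothetical callables (for example, a cheap database ping and the real query), and the attempt count and delay are arbitrary.

```python
import time

FRIENDLY_MESSAGE = "The service is briefly unavailable. Please try again in a moment."


def run_when_available(is_available, action, attempts=3, delay=1.0, sleep=time.sleep):
    """Probe a dependency before using it; return (ok, result_or_message)."""
    for attempt in range(attempts):
        if is_available():
            return True, action()
        if attempt < attempts - 1:
            sleep(delay)  # wait before probing again
    return False, FRIENDLY_MESSAGE
```

One caveat: the dependency can still fail between the probe and the real call, so errors raised by `action` need handling as well. The probe only narrows the window.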
I would really appreciate thoughts on the above. Our feeling is that Azure is getting a few too many of these outage blips now, which may be due to increased customers... not sure. Although no doubt within the 99.9% annual SLA.
EDIT1
A useful MSDN Azure Cloud Architecture article on this:
Resilient Azure Website Architectures