Process long arrays in Cloud Functions - node.js

I'm developing an application that lets users broadcast videos. As many social networks do, users need to receive a notification when someone goes live. To do this I'm using Cloud Functions: I pass the function the array of users that must receive the notification; for every user in the array I need to fetch the FCM token from the server and then send the notification.
For arrays of 10-20 users the function doesn't take long, but for 150-300 users I sometimes get a timeout or very slow execution.
So my question is: is it possible to divide the array into groups of 20-30 users and process those groups at the same time?
Thanks

There are two ways to answer this.
From a development point of view, some languages make concurrent processing easier (Go is very handy for this). Because you spend most of the time in API calls (FCM), performing several calls concurrently is a first solution.
From an architecture point of view, Pub/Sub and/or Cloud Tasks are well designed for this:
Your first function only creates chunks of messages to send and posts them to Cloud Tasks or Pub/Sub.
Your second function receives the chunks and sends the messages. The chunks are processed in parallel across several function instances.
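A minimal sketch of the chunk-and-fan-out idea in Node.js (the chunk size and the `notifyUser` helper are illustrative; the real version would look up the FCM token and call `admin.messaging().send()`, or post each chunk to Pub/Sub / Cloud Tasks instead of processing it in-process):

```javascript
// Split an array into fixed-size chunks (the size of 25 is an arbitrary choice).
function chunkArray(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical per-user send: look up the FCM token, then send the notification.
async function notifyUser(userId) {
  // const token = await getFcmToken(userId);
  // await admin.messaging().send({ token, notification: { ... } });
}

// Process chunk by chunk, with the users inside each chunk handled concurrently.
// allSettled means one failed token lookup doesn't abort the whole chunk.
async function notifyAll(userIds, chunkSize = 25) {
  for (const chunk of chunkArray(userIds, chunkSize)) {
    await Promise.allSettled(chunk.map(notifyUser));
  }
}
```
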

Related

Firebase Functions Batch Processing / Time Limit

I am using Firebase Functions and I have a pubsub function that runs every night.
This function is responsible for doing some processing on every user's account (such as preparing some data for them for the next day and sending them an email).
The issue I'm having is that there is a time limit on how long a function can take to run, which for event-driven functions, is 10 minutes.
https://firebase.google.com/docs/functions/quotas#:~:text=Time%20Limits,-Quota&text=60%20minutes%20for%20HTTP%20functions,minutes%20for%20event%2Ddriven%20functions.
Now that my number of users has scaled significantly, this limit is no longer sufficient to complete the work for all users.
To help with this, I created a firestore.onWrite event function that fires when a certain document is written. Now, when my pubsub function runs, it writes to a document in Firestore, which then triggers my onWrite function to run. This seems to have allowed me to process more users than before. However, 1) this just feels wrong, and 2) it's still not sufficient: there are rate limits on writing documents that I've run into as well, and I just can't squeeze all my users into the time limit allowed.
What is the correct approach for doing nightly batches (on user accounts), preferably within the Firebase ecosystem?
As you correctly pointed out and as it is mentioned in the documentation, the maximum amount of time a function can run before being forcibly terminated is 10 mins for an event-driven function.
However, in order to work around that limitation, I found this SO post from another person who had a similar issue to yours. Most of the recommendations amount to batching the work in groups of under 500 and then committing each batch.
Have a look at the solution there.
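The batch-under-500 recommendation can be sketched like this (the `processedAt` field is illustrative; the fixed part is Firestore's limit of 500 operations per batched write, and `db` would be your initialized Firestore client):

```javascript
// Firestore batched writes accept at most 500 operations per commit,
// so split the documents into groups of up to 500 before committing.
const BATCH_LIMIT = 500;

function toBatches(docs, limit = BATCH_LIMIT) {
  const batches = [];
  for (let i = 0; i < docs.length; i += limit) {
    batches.push(docs.slice(i, i + limit));
  }
  return batches;
}

// Sketch of the commit loop over all user documents.
async function processNightly(db, userDocs) {
  for (const group of toBatches(userDocs)) {
    const batch = db.batch();
    for (const doc of group) {
      batch.update(doc.ref, { processedAt: Date.now() }); // illustrative update
    }
    await batch.commit();
  }
}
```
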
Hope this helps.

Azure Durable Functions as Message Queue

I have a serverless function that receives orders, about ~30 per day. This function is depending on a third-party API to perform some additional lookups and checks. However, this external endpoint isn't 100% reliable and I need to be able to store order requests if the other API isn't available for a couple of hours (or more..).
My initial thought was to split the function into two, the first part would receive orders, do some initial checks such as validating the order, then post the request into a message queue or pub/sub system. On the other side, there's a consumer that reads orders and tries to perform the API requests, if the API isn't available the orders get posted back into the queue.
However, someone suggested I simply use an Azure Durable Function for the requests and store the current backlog in the function state, using the Aggregator Pattern (especially since the API will be working fine 99.99..% of the time). This would make the architecture a lot simpler.
What are the advantages/disadvantages of using one over the other, am I missing any important considerations?
I would appreciate any insight or other suggestions you have. Let me know if additional information is needed.
You could solve this problem with the Durable Task Framework, or with Azure Storage or Service Bus queues, but at your transaction volume I think that's overcomplicating the solution.
If you're dealing with ~30 orders per day, consider one of the simpler solutions:
Use Polly, a well-supported resilience and fault-tolerance framework.
Write request information to your database. Have an Azure Function Timer Trigger read occasionally and finish processing orders that aren't marked as complete.
Durable Task Framework is great when you get into serious volume. But there's a non-trivial learning curve for the framework.
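The "write first, finish later" option above can be sketched as a plain retry pass over persisted orders. This is a sketch, not Azure-specific code: `externalCheck` stands in for the unreliable third-party API call, the `status`/`attempts` fields are assumed, and in Azure this loop would run inside a Timer Trigger reading from your database:

```javascript
// Periodically retry orders that aren't marked complete yet.
// Orders that fail again simply stay pending for the next timer run.
async function retryPending(orders, externalCheck) {
  for (const order of orders) {
    if (order.status === 'complete') continue;
    try {
      await externalCheck(order);                    // may throw while the API is down
      order.status = 'complete';
    } catch (err) {
      order.attempts = (order.attempts || 0) + 1;    // leave it for the next run
    }
  }
  return orders;
}
```
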

How to make multiple API calls with rate limits per user using RabbitMQ?

In my app I am getting data on behalf of different users via one API which has a rate limit of 1 API call every 2 seconds per user.
Currently I am storing all the calls I need to make in a single message queue. I am using RabbitMQ for this.
Currently there is one consumer that takes one message at a time, makes the call, processes the result and then starts on the next message.
The queue is filling up faster than this single consumer can make the API calls (1 call every 2 seconds, since I don't know which user comes next and I don't want to hit the API limits).
My problem is that I don't know how to add more consumers, which in theory would be possible, as the queue holds jobs for different users and the API rate limit is per user: e.g. I could make 2 API calls every 2 seconds if they are for different users.
However, I have no information about the messages in the queue. They could be from a single user or from many different users.
The only solution I see right now is to create a separate queue for each user. But I have many different users (say 1,000) and would rather stay with one queue.
If possible I would stick with RabbitMQ as I use this for other similar tasks as well. But if I need to change my stack I would be willing to do so.
App is using the MEAN stack.
You will need to maintain state somewhere. I had a similar application, and what I did was maintain the state in Redis: before every call, check whether the user has made a request in the last 2 seconds, e.g.:
Redis key:
user:<user_id> // value is an epoch timestamp
Update Redis once the request is made.
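A sketch of that check in Node.js, with an in-memory Map standing in for Redis (in production you'd GET/SET the `user:<user_id>` key with the same comparison, so multiple consumers can share the state):

```javascript
const MIN_INTERVAL_MS = 2000;        // the API's per-user limit: 1 call per 2 s
const lastCall = new Map();          // user id -> epoch ms of the last API call

// Returns true if the consumer may call the API for this user now;
// false means "requeue the message and try a message for another user".
function tryAcquire(userId, now = Date.now()) {
  const prev = lastCall.get(userId);
  if (prev !== undefined && now - prev < MIN_INTERVAL_MS) {
    return false;                    // user already called within the window
  }
  lastCall.set(userId, now);         // record the call, then perform the request
  return true;
}
```
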
Reference:
redis

How to manage multiple parallel threads in nodejs?

Scenario:
I am using node.js to write a heavy application that listens to tweets using the streaming API; for every tweet it calls 3-4 REST APIs, then listens to a web socket for about 5-10 minutes, and then calls 2 more REST APIs.
Current Approach:
Though I have prepared functions for every task, I am not using callbacks. From the tweet function F, I call another function F1, then from inside F1 I call F2, and so on...
Problem:
It makes the code super messy, and most importantly the concurrent requests are overlapping and sharing data with each other. For example, while listening to a web socket I pass endpoint and auth info -
listenXYZ(endpoint, auth)
But it seems that when two requests hit this function at the same time, the same endpoint is sometimes passed to both.
Why does this happen?
How can I accomplish concurrency with a nice, organized data flow and controlled memory management?
Can I use workers?
Multi-threading?
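For what it's worth, the overlapping-endpoint symptom usually comes from module-level variables shared between overlapping requests, not from threads (Node.js runs your JS on a single thread). A sketch of keeping per-tweet state local, with hypothetical stand-ins for the question's calls:

```javascript
// Hypothetical helpers standing in for the question's REST/socket calls.
function buildEndpoint(tweet) { return `/stream/${tweet.id}`; }
async function getAuth(userId) { return { userId }; }
async function listenXYZ(endpoint, auth) { return { endpoint, auth }; }

// Per-tweet state lives in local variables inside one async function,
// so two overlapping tweets can never clobber each other's endpoint.
async function handleTweet(tweet) {
  const endpoint = buildEndpoint(tweet);   // local, never shared
  const auth = await getAuth(tweet.userId);
  return listenXYZ(endpoint, auth);
}
```
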

Instagram real-time API POST rate

I'm building an application using tag subscriptions in the real-time API and have a question related to capacity planning. We may have a large number of users posting to a subscribed hashtag at once, so the question is how often will the API actually POST to our subscription processing endpoint? E.g., if 100 users post to #testhashtag within a second or two, will I receive 100 POSTs or does the API batch those together as one update? A related question: is there a maximum rate at which POSTs can be sent (e.g., one per second or one per ten seconds, etc.)?
The Instagram API seems to lack detailed information about both how many updates are sent and what the rate limits are. From the API docs:
Limits
Be nice. If you're sending too many requests too quickly, we'll send back a 503 error code (server unavailable).
You are limited to 5000 requests per hour per access_token or client_id overall. Practically, this means you should (when possible) authenticate users so that limits are well outside the reach of a given user.
In other words, you'll need to check for a 503 and throttle your application accordingly. I've seen no information on how long they might block you, but it's best to avoid that completely. I would advise managing this by placing a rate-limiting mechanism in your own code, such as pushing your API requests through a queue with rate control. That will also give you the benefit of a retry if you're throttled, so you won't lose any of the updates.
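One way to implement "a queue with rate control" is a token bucket in front of your API calls; a minimal sketch (the capacity and window values are arbitrary, and a real version would requeue requests that get `false`):

```javascript
// Token bucket: up to `capacity` requests, refilled evenly over `windowMs`.
function makeBucket(capacity, windowMs) {
  let tokens = capacity;
  let last = 0;
  // Returns true if a request may go out now, false if it should wait/requeue.
  return function allow(now = Date.now()) {
    tokens = Math.min(capacity, tokens + ((now - last) / windowMs) * capacity);
    last = now;
    if (tokens >= 1) {
      tokens -= 1;
      return true;
    }
    return false;
  };
}
```
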
Moreover, a mechanism such as a queue in the case of real-time updates is further relevant because of the following from the API docs:
You should build your system to accept multiple update objects per payload - though often there will be only one included. Also, you should acknowledge the POST within a 2 second timeout--if you need to do more processing of the received information, you can do so in an asynchronous task.
Regarding the number of updates, the API can send you 1 update or many. The problem is that you can absolutely murder your API quota, because I don't think you can batch calls to specific media items, at least not using the official Python or Ruby clients or the API console as far as I have seen.
This means that if you receive 500 updates, whether as 1 request to your server or split into many, it won't matter: either way you need to go and fetch those items. From what I observed in a real application, these fetches seemed to count against our quota; however, the quota itself seemed to be consumed erratically. That is, sometimes we saw no calls consumed at all, and other times the available calls dropped by far more than we actually made. My advice is to be conservative and treat the 5,000 as a best guess rather than an absolute. You can check the remaining calls by parsing one of the headers they send back.
Use common sense, don't be stupid, and a rate-limiting mechanism should keep you safe, with the added benefit of handling failures due to outages (these happen more than you may think), network hiccups, and accidental rate limiting. You could try to be tricky and use different API keys in a pooling mechanism, but this is likely a violation of the TOS, and if they are doing anything by IP, you'd have to split this up across different machines with different IPs.
My final advice would be to restructure your application to not rely completely on the subscription mechanism. It's less than reliable and very expensive API-wise. It's only truly useful if you just need to do something in your app that doesn't require calling back to Instagram, your number of items is small, or you can filter out the majority of items to avoid calling back to Instagram except when a specific business rule is matched.
Instead, you can do things like query the tag or the user (e.g. recent media) and scale it out that way. Normally this lets you grab 100 items with 1 request rather than 100 items with 100 requests. If you really want to be clever, you could at least merge the subscription notifications asynchronously and combine similar ones into a single batched request, bucketing duplicates by shared characteristics such as the tag. Sort of like a map/reduce, but on a small data set. You could of course run an actual map/reduce on your own data from time to time as another way of keeping things async. Again, be careful not to thrash Instagram; rather, use map/reduce to batch your calls in a way that's useful to your app.
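The merge-similar-notifications idea is just a grouping step before fetching: collect notifications per tag so one recent-media request covers all of them (the `object_id` field name is illustrative here):

```javascript
// Bucket incoming subscription notifications by tag, so that one
// recent-media request per bucket replaces one request per notification.
function groupByTag(notifications) {
  const buckets = new Map();
  for (const n of notifications) {
    const key = n.object_id; // for tag subscriptions: the tag name (assumed field)
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(n);
  }
  return buckets; // fetch once per key, e.g. recent media for that tag
}
```
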
Hope that helps.
