AWS Step/Lambda - storing variable between runs - node.js

In my first foray into cloud computing, I was able to follow Mark West's instructions on using AWS Rekognition to process images from a security camera that are dumped into an S3 bucket and send a notification if a person is detected. His code was set up for the Raspberry Pi camera, but I adapted it to my IP camera by having it FTP the triggered images to my Synology NAS and using CloudSync to mirror them to the S3 bucket. A Step Function calls Lambda functions per the figure below, and I get an email within 15 seconds with the list of labels detected and the image attached.
The problem is that the camera uploads one image per second for as long as the trigger condition persists, so if there is a lot of activity in front of the camera I can quickly rack up a few hundred emails.
I'd like to insert a function between make-alert-decision and nodemailer-send-notification that checks whether an email notification was sent within the last minute. If not, it would proceed to nodemailer-send-notification right away; if so, it would store the list of labels and the path to the attachment in an array, and then send a single email with all of the attachments once 60 seconds had passed.
I know I have to store the data externally, and I came across this article explaining the benefits of different methods of caching data. I also thought that I could examine the timestamps of the files uploaded to S3 and compare the time elapsed between the two most recent uploads to decide whether to proceed or batch the file for later.
Being completely new to AWS, I am looking for advice on which method makes the most sense from a complexity and cost perspective. I can live with the lag involved in any of the methods discussed in the article; I just don't know how to proceed, as I've never used or even heard of any of these services.
Thanks!

You can use an SQS queue to which the make-alert-decision Lambda sends a message with each label and the path to the attachment.
The nodemailer-send-notification Lambda would be a consumer of that queue, but executed on a regular schedule.
You can schedule that Lambda to run every minute, reading all the messages from the queue (deleting them right away, or setting a suitable visibility timeout and deleting them afterwards) to collect the list of attachments and send a single email. You would then get a single email with all the attachments every 60 seconds.
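A minimal Node.js sketch of that setup, assuming the queue URL is passed to both Lambdas via a QUEUE_URL environment variable and the consumer is triggered every minute by an EventBridge (CloudWatch Events) rule; the helper names are illustrative, not from the original post:

const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

// placeholder for the existing nodemailer logic: one email with all attachments
async function sendSingleEmail(alerts) { /* ... */ }

// make-alert-decision: instead of invoking the mailer directly, enqueue the alert
exports.enqueueAlert = async (labels, attachmentPath) => {
  await sqs.sendMessage({
    QueueUrl: process.env.QUEUE_URL,        // assumed environment variable
    MessageBody: JSON.stringify({ labels, attachmentPath }),
  }).promise();
};

// nodemailer-send-notification: runs on a one-minute schedule
exports.handler = async () => {
  const alerts = [];
  // drain the queue (up to 10 messages per receive call)
  while (true) {
    const { Messages } = await sqs.receiveMessage({
      QueueUrl: process.env.QUEUE_URL,
      MaxNumberOfMessages: 10,
    }).promise();
    if (!Messages || Messages.length === 0) break;

    for (const msg of Messages) {
      alerts.push(JSON.parse(msg.Body));
      // delete once the message content has been captured
      await sqs.deleteMessage({
        QueueUrl: process.env.QUEUE_URL,
        ReceiptHandle: msg.ReceiptHandle,
      }).promise();
    }
  }

  if (alerts.length > 0) {
    await sendSingleEmail(alerts);
  }
};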

Related

Generate big data in excel or pdf using REST API

I'm trying to generate an Excel report file in a micro-service using a REST API.
If the generation process takes a long time, the REST connection will time out for the user.
Is there any best practice or architecture pattern for this purpose?
For example: if the data includes 10 columns with 1 million rows, the generation process might take 30 seconds. It also depends on what technical resources we have.
You should do heavy tasks asynchronously. The client should just trigger the process and should not wait for completion. The question then becomes how the client gets the generated Excel file. There are two ways:
In the response to the initiating call, the server returns a job Id. The client keeps polling for the status of that job Id; whenever the job is completed, it gets the file (see the sketch after this list).
Some notification mechanism like Socket.io, where the server notifies the client whenever the job is done. After getting the notification, the client can download the processed file.
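A minimal sketch of the first (polling) option in Node.js/Express, with an in-memory job map and a placeholder report generator; all names are illustrative, and a real service would persist job state and write files to shared storage:

const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());
const jobs = new Map(); // jobId -> { status, filePath }

// placeholder for the actual heavy report generation (e.g. with a spreadsheet library)
async function generateExcel(params) {
  // ... build the workbook, write it to disk, return the path
  return '/tmp/report.xlsx';
}

// 1. Client triggers generation and immediately gets a job id back
app.post('/reports', (req, res) => {
  const jobId = crypto.randomUUID();
  jobs.set(jobId, { status: 'pending' });

  // run the heavy work in the background, updating the job record when done
  generateExcel(req.body)
    .then(filePath => jobs.set(jobId, { status: 'done', filePath }))
    .catch(() => jobs.set(jobId, { status: 'failed' }));

  res.status(202).json({ jobId });
});

// 2. Client polls until the job is done, then downloads the file
app.get('/reports/:jobId', (req, res) => {
  const job = jobs.get(req.params.jobId);
  if (!job) return res.status(404).end();
  if (job.status !== 'done') return res.json({ status: job.status });
  res.download(job.filePath);
});

app.listen(3000);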

Process long arrays in Cloud Functions

I'm developing an application that lets users broadcast videos. As many social networks do, users need to receive a notification when someone goes live. To do so I'm using Cloud Functions, and I pass the function the array of users that must receive the notification; for every user in the array I need to fetch the FCM token from the server and then send the notification.
For arrays of 10/20 users the function doesn't take long, but for 150/300 users I sometimes get a timeout or a very slow execution.
So my question is: is it possible to divide the array into groups of 20/30 users and process the groups at the same time?
Thanks
There are two ways to answer this:
From a development point of view, some languages make concurrent processing easier (Go is very handy for this). Because you spend a lot of time in API calls (FCM), a first solution is to perform several calls concurrently.
From an architecture point of view, Pub/Sub and/or Cloud Tasks are well designed for this:
Your first function only creates chunks of the messages to send and posts them to Cloud Tasks or Pub/Sub.
Your second function receives the chunks and sends the messages. The chunks are processed in parallel across several function instances.
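A rough Node.js sketch of that fan-out, assuming a Pub/Sub topic named live-notifications, a chunk size of 30 (both illustrative), and a Pub/Sub-triggered Cloud Function handling each chunk:

const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const topic = pubsub.topic('live-notifications'); // assumed topic name
const CHUNK_SIZE = 30;

// placeholder: fetch the user's FCM token and send the push notification
async function sendNotificationToUser(userId) { /* ... */ }

// First function: split the user array into chunks and publish one message per chunk
exports.fanOut = async (userIds) => {
  const publishes = [];
  for (let i = 0; i < userIds.length; i += CHUNK_SIZE) {
    const chunk = userIds.slice(i, i + CHUNK_SIZE);
    publishes.push(topic.publishMessage({ json: { userIds: chunk } }));
  }
  await Promise.all(publishes);
};

// Second function: Pub/Sub-triggered, handles a single chunk
exports.sendChunk = async (message) => {
  const { userIds } = JSON.parse(Buffer.from(message.data, 'base64').toString());
  // send the notifications for this chunk concurrently
  await Promise.all(userIds.map(sendNotificationToUser));
};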

What is the best way to keep local copy of Firebase Database on node.js

I have an app where I need to check people's posts constantly. I am trying to make sure that the server can handle more than 100,000 posts. I've tried to explain the program and list the issues I'm worried about below.
I am running a simple node.js program in my terminal that acts as a firebase-admin client controlling the Firebase Database. The program has no connectivity with clients (users); it just keeps the data locally so it can check users' posts every 2-3 seconds. I keep the posts in local hash variables by using on('child_added') to push each post into a posts hash, and likewise for on('child_removed') and on('child_changed').
Are these functions able to handle more than 5 requests per second?
Is this the proper way of keeping data locally for faster processing (and not abusing Firebase limits)? I need to check every post on the platform every 2-3 seconds, so I am trying to keep a local copy of the posts data.
That local copy of the posts is looped through every 2-3 seconds.
If there are thousands of posts, will a simple array variable handle that load?
Second part of the program:
I run a for loop over the posts inside a function, and I run that function every 2-3 seconds using setInterval(). The program needs to check not only newly added posts but all posts in the database, constantly.
If(specific condition for a post) => the program changes the state of the post
.on(child_changed) function => sends an API request to a website after that state change
Can this function run asynchronously? When it is called, the function should not wait for the previous call to finish, because the previous call sends an API request and might not complete quickly. How can I make sure that .on('child_changed') doesn't miss a single change to the posts data?
The Listen for Value Events documentation shows how to observe changes, namely by using the .on method.
In terms of backing up your Realtime Database, you simply export the data manually, or if you have the paid plan you can automate it.
I don't understand why you would want to reinvent the wheel, so to speak, and have your server poll Firebase for updates. Simply use Firebase observers.
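A minimal firebase-admin sketch of that observer-driven approach, assuming the posts live under a 'posts' path and credentials come from the default environment (both assumptions; adjust to your project):

const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.applicationDefault(), // or a service-account key
  databaseURL: 'https://<your-project>.firebaseio.com', // assumed
});

const posts = new Map(); // local copy, keyed by post id
const ref = admin.database().ref('posts');

// keep the local copy in sync instead of polling
ref.on('child_added', snap => posts.set(snap.key, snap.val()));
ref.on('child_changed', snap => {
  posts.set(snap.key, snap.val());
  // react to the change here (e.g. fire your API request) without blocking the listener
});
ref.on('child_removed', snap => posts.delete(snap.key));

// periodic scan over the local copy; each tick works on in-memory data only
setInterval(() => {
  for (const [id, post] of posts) {
    // if (specific condition for a post) => update its state in the database
  }
}, 3000);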

Stream - Suspiciously large amount of feed updates

We are in the process of integrating Stream to power our notifications module.
When looking at the usage metrics in the dashboard, we see a suspiciously large amount of feed updates:
As you can see we have around 9K feed updates per day.
Those 9K daily feed updates don't make sense, since right now our backend code does not create any activities.
The only Stream API call that happens is when a new user registers: we create a new feed for them, of type 'notification', and make this new feed follow a single admin feed of type 'flat':
const notifications = client.feed('notifications', userId);
await notifications.follow('user', 'admin');
So for example, if 200 new users registered today, the admin's flat feed would gain 200 additional followers.
As of today, we have:
4722 streams of type 'notification'
1 stream of type 'flat'
These are the only interactions we have with Stream's API, and we don't understand the source of all those feed updates in the dashboard.
(Maybe these follow calls count as feed updates?)
We have something very similar happening. We have a dev app for testing, and suddenly 1K "read feed" operations on the notification feed group showed up in the log a day ago. That's impossible, since we haven't rolled the feature out, and on this dev app we had read the feed at most 10 times manually, via Postman to our backend, which calls getstream.
The correct ops in the log show the client as stream-python-client-2.11.0, which makes sense.
The incorrect ops in the log show the client as stream-javascript-client-browser-unknown, which does not make sense.
Further, the timestamps of the incorrect ops are all clustered within a short time.
This has not happened since, and has not happened in the production app yet.

SQS: Know remaining jobs

I'm creating an app that uses a job queue backed by Amazon SQS.
Every time a user logs in, I create a bunch of jobs for that specific user, and I want them to wait until all their jobs have been processed before taking them to a specific screen.
My problem is that I don't know how to query the queue to see if there are still pending jobs for a specific user, or what the correct way is to implement such a solution.
Everything regarding the queue (job creation and processing) is working as expected, but I am missing that final step.
Just for the record:
In my previous implementation I was using Redis + Kue: I kept a key with the user id and a job count; every time a job was added, that count was incremented, and every time a job finished or failed, I decremented it. But now I want to move away from Redis + Kue and I am not sure how to implement this step.
Amazon SQS is not the ideal tool for the scenario you describe. A queueing system is normally used in a "send and forget" situation, where the sending system doesn't remain interested in the later processing.
You could investigate Amazon Simple Workflow (SWF), which allows work to be monitored as it goes through several steps. Your existing code could mostly be reused, just with the SWF framework added. Or even power it from Lambda, since you are already using node.js.
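For completeness: the closest SQS itself comes to reporting remaining jobs is the approximate, queue-wide message counts from GetQueueAttributes; there is no per-user view, which is why a separate tracking mechanism (or SWF) is needed. A sketch, assuming the queue URL is known:

const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

// Returns the approximate number of messages still in the queue (visible + in flight).
// Note: these counts are queue-wide and approximate; SQS cannot filter them per user.
async function remainingJobs(queueUrl) {
  const { Attributes } = await sqs.getQueueAttributes({
    QueueUrl: queueUrl,
    AttributeNames: [
      'ApproximateNumberOfMessages',
      'ApproximateNumberOfMessagesNotVisible',
    ],
  }).promise();

  return (
    Number(Attributes.ApproximateNumberOfMessages) +
    Number(Attributes.ApproximateNumberOfMessagesNotVisible)
  );
}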
