How can I get old CloudWatch logs in Node.js Lambda code (aws-sdk)?

I want to fetch old CloudWatch logs in my Lambda code and store them in some other DB. I want an AWS SDK solution so that I can write Node.js code to pick up changes in the logs and save them to that other DB.

The GetLogEvents method may work for you.
Lists log events from the specified log stream. You can list all the log events or filter using a time range.
By default, this operation returns as many log events as can fit in a response size of 1MB (up to 10,000 log events). You can get additional log events by specifying one of the tokens in a subsequent call.
The GetLogEvents documentation has the details.
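For example, a minimal sketch with the AWS SDK for JavaScript (v2); the log group, log stream, region and time range below are placeholders for your own values:

```
const AWS = require('aws-sdk');
const cloudwatchlogs = new AWS.CloudWatchLogs({ region: 'us-east-1' }); // placeholder region

async function fetchOldLogs() {
  const params = {
    logGroupName: '/aws/lambda/my-function',      // placeholder
    logStreamName: '2023/01/01/[$LATEST]abcd',    // placeholder
    startTime: Date.now() - 24 * 60 * 60 * 1000,  // e.g. the last 24 hours
    endTime: Date.now(),
    startFromHead: true
  };

  const events = [];
  let nextToken;
  do {
    const page = await cloudwatchlogs
      .getLogEvents({ ...params, nextToken })
      .promise();
    events.push(...page.events);
    // When the forward token stops changing, there are no more events to read.
    if (page.nextForwardToken === nextToken) break;
    nextToken = page.nextForwardToken;
  } while (nextToken);

  // Each event has .timestamp and .message, ready to be written to your DB.
  return events;
}
```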

Related

Media conversion on AWS

I have an API written in Node.js (/api/uploadS3) which is a PUT request that accepts a video file and a URL (an AWS S3 URL in this case). Once called, its task is to upload the file to that S3 URL.
Now, users are uploading files to this node API in different formats (thanks to the different browsers recording videos in different formats) and I want to convert all these videos to mp4 and then store them in s3.
I wanted to know what is the best approach to do this?
I have two solutions so far:
1. Convert on the node server using ffmpeg -
The issue with this is that ffmpeg can only execute a single operation at a time. And since I have only one server, I will have to implement a queue for multiple requests, which can lead to longer waiting times for users at the end of the queue. Apart from that, I am worried that my node server's traffic-handling capability will be affected while a video conversion is in progress.
Can someone help me understand what will be the effect of other requests coming to my server while video conversion is going on? How will it impact the RAM, CPU usage and speed of processing other requests?
2. Using an AWS Lambda function -
To avoid load on my node server, I was thinking of using AWS Lambda: my node API would upload the file to S3 in the format provided by the user. Once done, S3 would trigger a Lambda function which can take that S3 file, convert it into .mp4 using ffmpeg or AWS MediaConvert, and then upload the mp4 file to a new S3 path. However, I don't want the output path to be just any S3 path but the path that was received by the node API in the first place.
Moreover, I want the user to wait while all this happens as I have to enable other UI features based on the success or error of this upload.
The question here is: is it possible to do this using just a single API like /api/uploadS3 which --> uploads to S3 --> triggers Lambda --> converts the file --> uploads the mp4 version --> returns success or error?
Currently, if I upload to S3 the request ends then and there. So is there a way to defer the API response until all the operations have completed?
Also, how will the lambda function access the path of the output s3 bucket which was passed to the node API?
Any other better approach will be welcomed.
PS - the s3 path received by the node API is different for each user.
Thanks for your post. The output S3 bucket generates file events when a new file arrives (i.e., is delivered from AWS MediaConvert).
This file event can trigger a second Lambda function which can move the file elsewhere using any of the supported transfer protocols, retry if necessary, log a status to AWS CloudWatch and/or AWS SNS, and then send a final API response based on success/completion of the move.
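As a rough sketch of that second Lambda: the destination bucket, the dest-path metadata key and the SNS topic below are illustrative assumptions, not a prescribed layout. One way to recover the user-specific destination path is to persist it when the original upload happens (e.g. as object metadata or a DynamoDB item) and look it up here.

```
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const sns = new AWS.SNS();

exports.handler = async (event) => {
  for (const record of event.Records) {
    // Location of the newly arrived (converted) file, from the S3 event.
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Hypothetical lookup of the user-specific destination path that the node API
    // originally received (stored as object metadata here purely for illustration).
    const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
    const destPath = head.Metadata['dest-path'];

    // Move the converted mp4 to that destination.
    await s3.copyObject({
      Bucket: process.env.DEST_BUCKET,        // assumed environment variable
      CopySource: `${bucket}/${key}`,
      Key: destPath
    }).promise();

    // Log / notify completion so the caller can be told success or failure.
    await sns.publish({
      TopicArn: process.env.STATUS_TOPIC_ARN, // assumed environment variable
      Message: JSON.stringify({ key, destPath, status: 'COMPLETE' })
    }).promise();
  }
};
```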
AWS has a Step Functions feature which can maintain state across successive Lambda functions, for automating simple workflows. This should work for what you want to accomplish. See https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html
Note that any one Lambda function has a 15-minute maximum runtime, so any one transcoding or file copy operation must complete within 15 minutes. The alternative is to run on EC2.
I hope this helps you out!

Which Amazon service should I use to implement a time-based queue dispatcher (serverless application)?

A user submits a CSV file which contains a time (interval) along with each message. I want to send each message to a chat API at the time specified with it. I am using DynamoDB to store the messages and a Lambda function which reads the messages from DynamoDB and, one at a time, uses setTimeout to publish each message to the chat. I am using Node.js to implement that functionality. I also created an Amazon API to trigger that Lambda function.
But this approach is not working. Can anyone suggest which other service I should use to do the same? Is there an Amazon queue service for that?
From the context of your question, what I understand is that you basically need a timer for some point in the future: a system that can notify you at a future time with some metadata so you can take an action.
If this is the case, off the top of my head I think you can use one of the solutions below to achieve your goal.
Prerequisites: I assume you are already using DynamoDB (DDB) as the primary store, so all CSV data is persisted in Dynamo, and you are using DynamoDB Streams to read inserted and updated records to trigger your Lambda function (let's call this Lambda function Proxy_Lambda).
Create another Lambda function that processes records and sends a message to your chat system (let's call this Lambda function Processor_Lambda).
Option 1: AWS SQS
Proxy_Lambda reads records from the DDB stream and, based on the future timestamp attribute present in the record, publishes a message to an AWS SQS queue with an initial visibility timeout matching that timestamp. Sample example: Link. Remember, these messages will not be visible to any consumer until the visibility timeout expires.
Add a trigger for Processor_Lambda to start polling from this SQS queue.
Once a message becomes visible in the queue (after the initial timeout), Processor_Lambda consumes it and sends the chat events.
Result: you get a future timer using the SQS visibility timeout feature. The downside is that you cannot view the in-flight SQS message content until the message's visibility timeout expires.
Note: the maximum visibility timeout is 12 hours. So if your use case demands a timer of more than 12 hours, you need to add logic in Processor_Lambda to send the message back to the queue with a new visibility timeout.
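A minimal sketch of the Proxy_Lambda side of this option, assuming the DynamoDB items carry publishAt (epoch milliseconds) and message attributes and that QUEUE_URL is an environment variable. Note that SendMessage itself only exposes DelaySeconds, capped at 900 seconds, so longer waits rely on the re-enqueue logic described in the note above:

```
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'INSERT') continue;

    // Convert the DynamoDB stream image into a plain object.
    const item = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
    const delaySeconds = Math.max(0, Math.floor((item.publishAt - Date.now()) / 1000));

    await sqs.sendMessage({
      QueueUrl: process.env.QUEUE_URL,
      MessageBody: JSON.stringify({ message: item.message, publishAt: item.publishAt }),
      // SendMessage accepts at most 900 seconds of delay; anything longer has to be
      // handled by re-enqueueing in Processor_Lambda, as noted above.
      DelaySeconds: Math.min(delaySeconds, 900)
    }).promise();
  }
};
```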
Option 2: AWS Step Functions (my preferred approach ;) )
Create a state machine in AWS Step Functions to generate task timers (let's call it Timer_Function). These task timers keep the execution in a wait state until the timer expires. The timer window is provided as input to the step function.
Link Timer_Function to trigger Processor_Lambda once the task timer expires; basically, that will be the next step after the wait step.
Connect Proxy_Lambda with Timer_Function, i.e. Proxy_Lambda reads records from the DDB stream and invokes Timer_Function with the message interval attribute present in the DynamoDB record and the necessary payload.
Result: a Timer_Function that keeps waiting until the time window (message interval) expires, which in turn gives you a mechanism to trigger Processor_Lambda in the future (i.e. at the end of the timer window).
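And a minimal sketch of the Proxy_Lambda side of this option, assuming a state machine whose first state is a Wait state reading $.publishAt (ISO 8601) and whose next state invokes Processor_Lambda; the ARN and attribute names are placeholders:

```
const AWS = require('aws-sdk');
const stepfunctions = new AWS.StepFunctions();

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'INSERT') continue;

    const item = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);

    // Start one execution per record; the state machine's Wait state holds until
    // publishAt, then the following Task state passes the payload to Processor_Lambda.
    await stepfunctions.startExecution({
      stateMachineArn: process.env.STATE_MACHINE_ARN,  // assumed environment variable
      input: JSON.stringify({
        publishAt: new Date(item.publishAt).toISOString(),
        message: item.message
      })
    }).promise();
  }
};
```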
Having said that, I will leave it up to you to choose the right solution based on your use case and business requirements.

What is the best way to keep a local copy of a Firebase Database in Node.js?

I have an app where I need to check people's posts constantly. I am trying to make sure that the server can handle more than 100,000 posts. I have tried to explain the program below and number the issues I am worried about.
I am running a simple node.js program on my terminal that runs as firebase admin controlling the Firebase Database. The program has no connectivity with clients(users), it just keeps the database locally to check users' posts every 2-3 seconds. I am keeping the posts in local hash variables by using on('child_added') to simply push the post to a posts hash and so on for on('child_removed') and on('child_changed').
Are these functions able to handle more than 5 requests per second?
Is this the proper way of keeping data locally for faster processing (and not abusing Firebase limits)? I need to check every post on the platform every 2-3 seconds, so I am trying to keep a local copy of the posts data.
That local copy of the posts are looped through every 2-3 seconds.
If there are thousands of posts, will a simple array variable handle that load?
Second part of the program:
I run a for loop to loop through the posts in a function. I run the function every 2-3 seconds using setInterval(). The program not only needs to check newly added posts; it constantly needs to check all posts in the database.
If(specific condition for a post) => the program changes the state of the post
.on(child_changed) function => sends an API request to a website after that state change
Can this function run asynchronously? When it is called, the function should not wait for the previous call to finish, because the old call is sending an API request and might not complete quickly. How can I make sure that .on('child_changed') doesn't miss a single change on the posts data?
The Listen for Value Events documentation shows how to observe changes; namely, you use the .on method.
In terms of backing up your Realtime Database, you simply export the data manually, or if you have the paid plan you can automate it.
I don't understand why you would want to reinvent the wheel, so to speak, and have your server poll Firebase for updates. Simply use Firebase observers.
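For example, a minimal sketch with the Admin SDK, where the service account file, database URL and posts path are placeholders; each listener fires on its own for every change, so no polling loop is needed:

```
const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.cert(require('./serviceAccount.json')), // placeholder
  databaseURL: 'https://your-project.firebaseio.com'                   // placeholder
});

const posts = {};                        // local copy, keyed by post id
const ref = admin.database().ref('posts');

ref.on('child_added', snap => { posts[snap.key] = snap.val(); });
ref.on('child_removed', snap => { delete posts[snap.key]; });
ref.on('child_changed', snap => {
  posts[snap.key] = snap.val();
  // React to the change here (e.g. fire your API request). Each callback runs
  // independently, so a slow request does not block later change events.
});
```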

AWS Step/Lambda - storing a variable between runs

In my first foray into any computing in the cloud, I was able to follow Mark West's instructions on how to use AWS Rekognition to process images from a security camera that are dumped into an S3 bucket and provide a notification if a person was detected. His code was setup for the Raspberry Pi camera but I was able to adapt it to my IP camera by having it FTP the triggered images to my Synology NAS and use CloudSync to mirror it to the S3 bucket. A step function calls Lambda functions per the below figure and I get an email within 15 seconds with a list of labels detected and the image attached.
The problem is that the camera will upload one image per second for as long as the condition is triggered, and if there is a lot of activity in front of the camera I can quickly rack up a few hundred emails.
I'd like to insert a function between make-alert-decision and nodemailer-send-notification that would check whether an email notification was sent within the last minute; if not, it would proceed to nodemailer-send-notification right away, and if so, it would store the list of labels and the path to the attachment in an array and then send a single email with all of the attachments once 60 seconds had passed.
I know I have to store the data externally, and I came across this article explaining the benefits of different methods of caching data. I also thought that I could examine the timestamps of the files uploaded to S3 and compare the time elapsed between the two most recent uploads to decide whether to proceed or batch the file for later.
Being completely new to AWS, I am looking for advice on which method makes the most sense from a complexity and cost perspective. I can live with the lag involved in any of methods discussed in the article, just don't know how to proceed as I've never used or even heard of any of the services.
Thanks!
You can use an SQS queue to which the Lambda make-alert-decision sends a message with each label and the path to the attachment.
The Lambda nodemailer-send-notification would be a consumer of that queue, but executed on a regular schedule.
You can schedule that Lambda to run every minute, reading all the messages from the queue - either deleting them from the queue right away, or setting a suitable visibility timeout and deleting them afterwards - to get the list of attachments and send a single email. You would then get a single email with all the attachments every 60 seconds.
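A rough sketch of that scheduled consumer, assuming the queue URL is in an environment variable and each message body is JSON with the labels and the attachment path; the actual nodemailer call is left as a comment:

```
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

exports.handler = async () => {
  const attachments = [];
  const toDelete = [];

  // Drain the queue in batches of up to 10 messages.
  while (true) {
    const res = await sqs.receiveMessage({
      QueueUrl: process.env.QUEUE_URL,     // assumed environment variable
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 1
    }).promise();
    if (!res.Messages || res.Messages.length === 0) break;

    for (const msg of res.Messages) {
      attachments.push(JSON.parse(msg.Body));  // assumed shape: { labels, attachmentPath }
      toDelete.push({ Id: msg.MessageId, ReceiptHandle: msg.ReceiptHandle });
    }
  }

  if (attachments.length === 0) return;

  // Send one email containing every label list / attachment collected this minute
  // (nodemailer call omitted here).

  // Remove the processed messages, 10 at a time.
  for (let i = 0; i < toDelete.length; i += 10) {
    await sqs.deleteMessageBatch({
      QueueUrl: process.env.QUEUE_URL,
      Entries: toDelete.slice(i, i + 10)
    }).promise();
  }
};
```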

Azure stream analytics - How to redirect or handle error events/rows?

Is there a way to capture and redirect data error events/rows to a separate output?
For example, say I have events coming through and for some reason there are data conversion errors. I would like to handle those errors and do something, probably a separate output for further investigation?
Currently, with the Stream Analytics error policy, if an event fails to be written to output we only get two options:
Drop - which just drops the event, or
Retry - which retries writing the event until it succeeds.
Collecting all error events is not supported currently. You can enable diagnostic logs and get a sample of every kind of error at frequent intervals.
Here is the documentation link.
If there is a way for you to filter such events in the query itself, then you could redirect such events to a different output and reprocess that later.
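For data-conversion errors specifically, one sketch of such a query uses TRY_CAST, which returns NULL when a value cannot be converted; Input, GoodOutput, ErrorOutput and the temperature column are placeholders:

```
-- Rows whose field converts cleanly go to the main output.
SELECT *
INTO GoodOutput
FROM Input
WHERE TRY_CAST(temperature AS float) IS NOT NULL

-- Rows that fail the conversion go to a separate output for investigation.
SELECT *
INTO ErrorOutput
FROM Input
WHERE TRY_CAST(temperature AS float) IS NULL
```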
