Update a MongoDB Atlas collection using AWS Lambda - Node.js

We use MongoDB Atlas, a cloud MongoDB database, for our DB, and Node.js in the backend. I have to run a cron job at 2 AM every day which fetches data from a third-party API and updates a collection in the DB. The client wants us to use AWS, preferably Lambda. Our system runs on an EC2 instance. Any leads? What would be the most efficient solution? It worked fine with 'node-cron' locally, but they prefer AWS Lambdas.

You can do that by attaching a scheduled (cron) trigger to an AWS Lambda function, and you can reuse the same code you are running locally.
The SAM CLI makes it easy to deploy and test the Lambda.
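For reference, a minimal handler sketch for such a scheduled job could look like the following, assuming a Node.js 18+ runtime (where fetch is built in) and the official mongodb driver; THIRD_PARTY_URL, MONGODB_URI and the database/collection names are placeholders:

// Hypothetical scheduled Lambda handler; adjust names to your own setup.
const { MongoClient } = require('mongodb');

let client; // reuse the connection across warm invocations

exports.handler = async () => {
  const data = await fetch(process.env.THIRD_PARTY_URL).then((res) => res.json());

  if (!client) {
    client = new MongoClient(process.env.MONGODB_URI);
    await client.connect();
  }

  const collection = client.db('mydb').collection('items');
  // upsert each fetched document by its id (adapt to your schema)
  for (const item of data) {
    await collection.updateOne({ _id: item.id }, { $set: item }, { upsert: true });
  }

  return { updated: data.length };
};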
What would be the most efficient solution?
Efficiency shouldn't really differ between the two; the main difference is billing. If this code base is the only thing running on the EC2 instance, you have to keep the instance running (or start it) just to trigger the cron job, whereas Lambda is only charged for the time the triggered code actually runs. There is no major difference otherwise, but Lambda is the better fit for this kind of job.
So, I recommend using AWS Lambda for this.
You should check this link, which describes the different ways to trigger a cron job in AWS.

Related

How to deploy a cron job Node.js script to Google Cloud?

I have a Node.js script that is run as a cron job using the node-cron module.
The script loops over items in my MongoDB and runs some function on each of them.
Is it possible to deploy this Node.js script/app to GCP and have it run every Sunday?
In the cron config in my Node.js app, it is already set to run only on Sundays.
However, I was wondering whether I should use GCP's scheduler or just keep the cron job inside my Node.js app.
I've achieved this before using Heroku Scheduler, but I've been having problems deploying Puppeteer to Heroku, so I am using GCP since Puppeteer works fine in the Google Cloud Node.js environment.
If anyone can give me some insight or some instructions on what I have to do, I would appreciate it.
Thank you
What you are trying to achieve can be done by setting up MongoDB Atlas with Google Cloud. Here you can find the documentation.
Then, you can use Cloud Scheduler and Pub/Sub to trigger a Cloud Function (in Node.js, like your script). Here is an example tutorial.
Then, to connect your Cloud Function to your MongoDB cluster, this detailed guide shows you how to do so.
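For illustration only, a Pub/Sub-triggered Cloud Function in Node.js (using the background-function signature) could look roughly like this; MONGODB_URI, the database/collection names and processItem are placeholders for your own code:

// Hypothetical Cloud Function triggered by the Pub/Sub topic that Cloud Scheduler publishes to.
const { MongoClient } = require('mongodb');

let client; // reused across warm invocations

exports.weeklyJob = async (message, context) => {
  if (!client) {
    client = new MongoClient(process.env.MONGODB_URI);
    await client.connect();
  }

  const items = client.db('mydb').collection('items');
  // loop over the documents and run your per-item function
  for await (const doc of items.find({})) {
    await processItem(doc); // placeholder for your existing logic
  }
};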
This should give you some insight to start searching for more information by yourself. Keep in mind there are alternatives: for example, instead of MongoDB, you could use Firestore with your Cloud Functions and set the cron schedule with Pub/Sub as mentioned above.

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource (preferably dynamically created), as I don't want to perform scraping on my main instance. Further, I don't want that other instance to keep running and charging me when I am not scraping data.
So, preferably, I want a system I can ask to start scraping a site and that shuts down when it finishes.
Currently, I have looked into Google Cloud Functions, but they are capped at 9 minutes per invocation, so they won't fit my requirement, as scraping will take much longer than that. I have also looked at the AWS SDK; it lets us create VMs at runtime and also terminate them, but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible. I have many different scripts that scrape different websites, so a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks
I can't figure out how to push my API script onto the newly created AWS instance.
This is achieved by using UserData:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So basically, you would construct your UserData to install your scripts and all dependencies, then run them. This is executed when new instances are launched.
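As a rough sketch of launching such an instance from your own code with the AWS SDK for JavaScript (v2 shown here; the AMI ID, instance type, repository URL and script names are all placeholders):

// Hypothetical launcher: starts an EC2 instance that installs and runs the scraper,
// then shuts itself down when done.
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' });

const userData = `#!/bin/bash
yum install -y git nodejs
git clone https://github.com/your-org/your-scraper.git /opt/scraper
cd /opt/scraper && npm install && node scrape.js
shutdown -h now   # ends the instance when the scrape finishes
`;

async function launchScraper() {
  const result = await ec2.runInstances({
    ImageId: 'ami-xxxxxxxx',                          // placeholder AMI
    InstanceType: 't3.small',
    MinCount: 1,
    MaxCount: 1,
    InstanceInitiatedShutdownBehavior: 'terminate',   // so "shutdown -h now" terminates it
    UserData: Buffer.from(userData).toString('base64'),
  }).promise();
  return result.Instances[0].InstanceId;
}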
If you want the system to be scalable, you can launch your instances in an Auto Scaling group and scale it up or down as required.
The other option is running your scripts as Docker containers, for example using AWS Fargate.
By the way, AWS Lambda has a limit of 15 minutes, so not much more than Cloud Functions.

How is it possible to debug an AWS Lambda function remotely?

We are taking over a whole application from another company. They built the entire deployment pipeline, but we still don't have access to it. What we know is that there's a Lambda function triggered by certain SNS messages, all the code is in Node.js, and development is done in VS Code. We also have issues debugging it locally, but the bigger problem is that we need to debug it remotely.
Since I am new to AWS services, I'd really appreciate it if somebody could help me with this.
Is it necessary to open a port? How is it possible to connect to a Lambda? Do we need to set up Serverless? Many unresolved questions.
I don't think there is a way to debug a Lambda function remotely. Your best bet is to download the code to your local machine, set up the same environment variables you have on the Lambda function, and take it from there.
Remember, at the end of the day a Lambda is just a container running your code. AWS doesn't allow SSH or any direct connection to those containers. In your case, you should be able to debug locally as long as you have the same environment variables. There are other Lambda-specific aspects as well, but since you have the running code, you should be able to find the issue.
Hope it makes sense.
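As a starting point for that local setup, a small runner script can invoke the handler with a sample SNS event; everything here (the env variable, handler path and payload) is a placeholder you'd replace with values from your own function:

// run-local.js - hypothetical local runner for the Lambda handler
process.env.MONGODB_URI = 'mongodb+srv://...'; // mirror the Lambda's env variables

const { handler } = require('./index'); // the Lambda's entry module

const sampleEvent = {
  Records: [{ Sns: { Message: JSON.stringify({ id: 123 }) } }],
};

handler(sampleEvent, {}) // event, empty context
  .then((result) => console.log('result:', result))
  .catch((err) => console.error('error:', err));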
Thundra (https://www.thundra.io/aws-lambda-debugger) offers live/remote debugging support for AWS Lambda through its native IDE plugins (VS Code and IntelliJ IDEA).
The way AWS has you 'remote' debug is to execute the Lambda locally through Docker while the AWS Toolkit proxies requests to the cloud for you. You end up with a Lambda running on your local computer via Docker that can still access resources in the cloud, such as databases, APIs, etc. You can then step through and debug it using editors like VS Code.
I use SAM with a template.yaml. This way, I can pass event data to the handler, reference dependency layers (shared code libraries), and have a deployment manifest to create a CloudFormation stack (a released instance with history and resource management).
Debugging can be a bit slow, as it compiles, deploys to Docker, and invokes, but it allows step-through debugging and variable inspection.
https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-using-debugging.html
While far from ideal, any console output will be logged to CloudWatch, which you can then use to go through the printed data.
For local debugging, there are many GitHub projects with Dockerfiles from which you can build a Docker container locally, just like AWS does when your Lambda is invoked.

How should I migrate (update the DB schema of) my DB for my AWS Serverless application?

How should I be running my DB migrations in an AWS Serverless application? In a traditional Node.js app, I usually have npm start run sequelize db:migrate first. But with Lambda, how should I do that?
My DB will be in a private subnet. I was wondering if CodeBuild would be able to do it? I was also considering having a Lambda function run the migration ... not sure if that's recommended though.
There are a number of ways to achieve this. You are actually on the right track with CodeBuild; there is nothing wrong with taking that approach.
Since your DB is in a private subnet, you will need to configure CodeBuild to access your VPC. Once you have that configured, it's a simple matter of allowing access from the CodeBuild security group to your database.
You might want to set this whole thing up as a CodePipeline, possibly with multiple buildspec files for different CodeBuild runs. That way you can have a pipeline that looks like:
Source -> CodeBuild (test) -> Approval -> CodeBuild (migrations) -> Lambda
Theoretically, you could also create a Lambda function that does the migration and trigger it as needed. If the migrations take a long time, you could use AWS Batch to run them instead. But using CodeBuild as part of a deployment pipeline makes a lot of sense.
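If you do go the Lambda route, a rough sketch (assuming sequelize-cli and node_modules are packaged with the function, the function runs inside the DB's VPC, and your Sequelize config reads the connection settings from environment variables such as DATABASE_URL; all names are placeholders) could be as simple as shelling out to the CLI:

// Hypothetical migration Lambda: runs the bundled sequelize-cli against the DB.
const { execSync } = require('child_process');

exports.handler = async () => {
  const output = execSync('node ./node_modules/.bin/sequelize db:migrate', {
    env: process.env,   // passes DATABASE_URL (or your own config) through
    encoding: 'utf8',
  });
  console.log(output);
  return { migrated: true };
};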
Lambda might not be the right tool for this task because of its short maximum runtime.
You are better off running a custom script in CodeBuild, with sequential CodeBuild stages in your CodePipeline: the first CodeBuild run completes the migration, and on its completion the second one deploys your Lambdas. If the DB migration fails, you can exit the pipeline.
The buildspec could look like this:
phases:
  pre_build:
    commands:
      - DB migration command
    finally:
      - cleanup command
  build:
    commands:
      - deploy Lambdas command
    finally:
      - cleanup command
Both approaches (Lambda and CodeBuild) are fine; it depends on your continuous integration/deployment flow. For example, if you need to run those migrations in several environments, CodeBuild fits better.
If you don't have a CI/CD mechanism, you can just run it in a Lambda, as it's very flexible in terms of memory (you just need to watch the maximum execution time), or use an existing package like this one (this is a suggestion and depends on your DB).
As a last option, if your process is really heavy and/or needs to do a lot of read/write operations, you could also run it on an AWS ECS service, which can scale up while you run the migrations and return to its defined minimum size afterwards.

Using MongoDB with an AWS Elastic Beanstalk application

I have an Elastic Beanstalk application running (set up with Node.js) and I wondered what the best way to integrate MongoDB would be. As of now, all I've done is SSH into my EB instance (with the EB CLI) and install MongoDB. My main worry comes from the fact that the MongoDB lives on my instance. As I understand it, that means my data will almost certainly be lost as soon as I terminate the instance. Is that correct? If so, what is the best way to hook an EB app up to MongoDB? Does AWS support that natively, without having to rent a DB on a dedicated server?
You definitely do NOT want to install MongoDB on an Elastic Beanstalk instance.
You have a few options for running MongoDB on AWS. You can install it yourself on some EC2 servers (NOT Elastic Beanstalk servers) and handle all of the management yourself. The other option is to use mLab (previously MongoLab), a managed MongoDB-as-a-Service provider that works on AWS as well as other cloud platforms. Using mLab, you can easily provision a MongoDB database in the same AWS region as your Elastic Beanstalk servers.
Given the steps involved in setting up a highly available MongoDB cluster on AWS, I generally recommend using mLab instead of trying to handle it yourself. Just know that the free tier is not very performant, and you will want to upgrade to one of the paid plans for your production database.
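Wiring the Node.js app on Elastic Beanstalk to such a managed database is then just a matter of reading a connection string from an environment property; a minimal sketch, with MONGODB_URI and the database name as placeholders:

// db.js - hypothetical helper; set MONGODB_URI in the Elastic Beanstalk environment properties
const { MongoClient } = require('mongodb');

const client = new MongoClient(process.env.MONGODB_URI);

async function getDb() {
  await client.connect();    // establish (or reuse) the connection
  return client.db('appdb'); // placeholder database name
}

module.exports = { getDb };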
Been there, done that. As @MarkB suggested, it'd be a lot easier to use a SaaS instead.
AWS itself doesn't have native MongoDB support, but depending on your requirements you may find a solution with little or no extra cost (besides the EC2 price) on the AWS Marketplace. These images are vendors' pre-configured, production-ready AMIs of popular tools like MongoDB.
