AWS S3 bucket to bucket sync using Node.js

I have to create a Node.js script to perform an S3 bucket to bucket sync. I don't want to run it whenever a file is uploaded to the master S3 bucket, so I think Lambda is not an option. I need to run the task once daily at a particular time.
How can I achieve this S3 bucket sync in Node.js using the aws-sdk?
Cron can be used for scheduling. I have only found aws-sdk code for copying from one S3 bucket to another. Is there code available to sync two S3 buckets?

AWS S3 bucket synchronization using Node.js and the aws-sdk can be performed with the s3sync package. If you use it together with node-cron, you can schedule the S3 bucket synchronization from Node.js.
I don't know if it helps, but if cron and the aws-cli are available, the same goal can be achieved without Node.js.
You simply add the code below to the crontab.
0 0 * * * aws s3 sync s3://bucket-name-1 s3://bucket-name-2
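If you would rather keep everything in Node.js with just the aws-sdk (you mentioned you already found copy code), a rough sketch of a scheduled one-way sync could look like the following. The bucket names are placeholders, and unlike aws s3 sync it simply copies every listed object without first comparing timestamps or ETags against the target.
const AWS = require('aws-sdk');
const cron = require('node-cron');

const s3 = new AWS.S3();
const SOURCE_BUCKET = 'bucket-name-1'; // placeholder
const TARGET_BUCKET = 'bucket-name-2'; // placeholder

async function syncBuckets() {
  let ContinuationToken;
  do {
    // list a page of objects from the source bucket
    const page = await s3.listObjectsV2({ Bucket: SOURCE_BUCKET, ContinuationToken }).promise();
    for (const obj of page.Contents || []) {
      // copy each object across; a real sync would skip objects that are already up to date,
      // and keys with special characters should be URL-encoded in CopySource
      await s3.copyObject({
        Bucket: TARGET_BUCKET,
        Key: obj.Key,
        CopySource: `${SOURCE_BUCKET}/${obj.Key}`,
      }).promise();
    }
    ContinuationToken = page.NextContinuationToken;
  } while (ContinuationToken);
}

// run once a day at midnight
cron.schedule('0 0 * * *', () => {
  syncBuckets().catch(console.error);
});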

You will need a cron job, and there is a Node.js library for this named node-cron:
let cron = require('node-cron');
cron.schedule('* * * * *', () => {
  // TODO: put the task to run here
});
For a daily cron you can use something like:
0 0 * * *
The first 0 specifies the minutes and the second the hours, so this cron will run every day at midnight.
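Plugged into the snippet above, a daily midnight schedule would look like this (syncBuckets is just a placeholder for whatever the job should do):
const cron = require('node-cron');

// runs every day at 00:00
cron.schedule('0 0 * * *', () => {
  syncBuckets(); // placeholder: put the S3 sync logic here
});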

Related

EC2 instance running S3 Sync command terminates before data transfer is complete

I have an EC2 instance running Linux. This instance is used to run aws s3 commands.
I want to sync the last 6 months worth of data from source to target S3 buckets. I am using credentials with the necessary permissions to do this.
Initially I just ran the command:
aws s3 sync "s3://source" "s3://target" --query "Contents[?LastModified>='2022-08-11' && LastModified<='2023-01-11']"
However, after maybe 10 mins this command stops running, and only a fraction of the data is synced.
I thought this was because my SSM session was terminating, and with it the command stopped executing.
To combat this, I used the following command to try and ensure that this command would continue to execute even after my SSM terminal session was closed:
nohup aws s3 sync "s3://source" "s3://target" --query "Contents[?LastModified>='2022-08-11' && LastModified<='2023-01-11']" --exclude "*.log" --exclude "*.bak" &
Checking the status on the EC2 instance, the command appears to run for about 20 minutes before stopping again for no obvious reason.
The --query parameter controls what information is displayed in the response from an API call.
It does not control which files are copied in an aws s3 sync command. The documentation for aws s3 sync defines the --query parameter as: "A JMESPath query to use in filtering the response data."
Your aws s3 sync command will be synchronizing ALL files unless you use Exclude and Include Filters. These filters operate on the name of the object. It is not possible to limit the sync command by supplying date ranges.
I cannot comment on why the command would stop running before it is complete. I suggest you redirect output to a log file and then review the log file for any clues.

Trying to execute my code every 12hours on the AWS lambda server

I have to run my code every 12 hours. I wrote the following code, and I deployed it to AWS Lambda for the code to run every 12 hours. However, I see that the code does not run every 12 hours. Could you guys help me with this?
nodeCron.schedule("0 */12 * * *", async () => {
let ids = ["5292865", "2676271", "5315840"];
let filternames = ["Sales", "Engineering", ""];
await initiateProcess(ids[0], filternames[0]);
await initiateProcess(ids[1], filternames[1]);
await initiateProcess(ids[2], filternames[2]);
});
AWS Lambda is an event-driven service. It runs your code in response to events. AWS Lambda functions can be configured to run up to 15 minutes per execution. They cannot run longer than that.
I would suggest you use Amazon EventBridge to trigger your Lambda function to run on a 12-hour schedule:
Create a Rule with a schedule to run every 12 hours
In this Rule, create a Target, which is your Lambda function.
Then, you will be able to see if it was executed properly or not, and Lambda logs will be available in CloudWatch Logs.
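With EventBridge doing the scheduling, the function itself no longer needs node-cron. A rough sketch of what the handler might look like, reusing the initiateProcess calls from your own code:
exports.handler = async () => {
  const ids = ["5292865", "2676271", "5315840"];
  const filternames = ["Sales", "Engineering", ""];
  for (let i = 0; i < ids.length; i++) {
    // initiateProcess is the function from your existing code
    await initiateProcess(ids[i], filternames[i]);
  }
};
Note that all of the initiateProcess calls together still have to finish within Lambda's 15-minute limit.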

How to start an ec2 instance using sqs and trigger a python script inside the instance

I have a Python script which takes a video and converts it to a series of small panoramas. Now, there's an S3 bucket where a video will be uploaded (mp4). I need this file to be sent to the EC2 instance whenever it is uploaded.
This is the flow:
Upload video file to S3.
This should trigger EC2 instance to start.
Once it is running, I want the file to be copied to a particular directory inside the instance.
After this, I want the py file (panorama.py) to start running and read the video file from the directory and process it and then generate output images.
These output images need to be uploaded to a new bucket or the same bucket which was initially used.
Instance should terminate after this.
What I have done so far is, I have created a lambda function that is triggered whenever an object is added to that bucket. It stores the name of the file and the path. I had read that I now need to use an SQS queue and pass this name and path metadata to the queue and use the SQS to trigger the instance. And then, I need to run a script in the instance which pulls the metadata from the SQS queue and then use that to copy the file(mp4) from bucket to the instance.
How do I do this?
I am new to AWS and hence do not know much about SQS, or how to transfer metadata and automatically trigger an instance, etc.
Your wording is a bit confusing. It says that you want to "start" an instance (which suggests that the instance already exists), but then says that you want to "terminate" the instance (which would permanently remove it). I am going to assume that you actually intend to "stop" the instance so that it can be used again.
You can put a shell script in the /var/lib/cloud/scripts/per-boot/ directory. This script will then be executed every time the instance starts.
When the instance has finished processing, it can call sudo shutdown -h now to turn off the instance. (Alternatively, it can tell EC2 to stop the instance, but using shutdown is easier.)
For details, see: Auto-Stop EC2 instances when they finish a task - DEV Community
I tried to answer in the most minimalist way; there are many points below that can be improved further. Even so, I think it is still quite a lot, since you mentioned you are new to AWS.
Using AWS Lambda with Amazon S3
Amazon S3 can send an event to a Lambda function when an object is created or deleted. You configure notification settings on a bucket, and grant Amazon S3 permission to invoke a function on the function's resource-based permissions policy.
When the object is uploaded, it will trigger the Lambda function, which creates the instance with EC2 user data (see Run commands on your Linux instance at launch).
For the EC2 instance, make sure you provide the necessary permissions for downloading and uploading the objects via an instance profile (see Using instance profiles).
The user data contains a script that does the rest of the work your workflow needs:
Download the S3 object; you can pass the object name and the S3 bucket name to the same script.
Once step 1 has finished, start panorama.py, which processes the video.
In the next step you can upload the output objects to the S3 bucket.
Eventually terminating the instance will be a bit tricky, which you can achieve by changing the instance-initiated shutdown behavior (see Change the instance initiated shutdown behavior).
OR
you can use the method below to terminate the instance, but in that case your EC2 instance profile must have permission to terminate the instance.
ec2-terminate-instances $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
You can wrap the above steps into a shell script inside the userdata.
Lambda code to launch the EC2 instance:
def launch_instance(EC2, config, user_data):
    # tag_specs is assumed to be defined elsewhere in the Lambda (e.g. built from config)
    ec2_response = EC2.run_instances(
        ImageId=config['ami'],  # e.g. ami-0123b531fc646552f
        InstanceType=config['instance_type'],
        KeyName=config['ssh_key_name'],
        MinCount=1,
        MaxCount=1,
        SecurityGroupIds=config['security_group_ids'],
        TagSpecifications=tag_specs,
        # UserData=base64.b64encode(user_data).decode("ascii")
        UserData=user_data
    )
    new_instance_resp = ec2_response['Instances'][0]
    instance_id = new_instance_resp['InstanceId']
    print(f"[DEBUG] Full ec2 instance response data for '{instance_id}': {new_instance_resp}")
    return (instance_id, new_instance_resp)
Upload file to S3 -> Launch EC2 instance

How to get results of AWS Glue Job when executing via API?

I executed an AWS Glue job via API Gateway to start the job run. The job run is successful, but the result of the script (what it prints) does not come back from the execution; only the job run ID comes back in the response. Is there any way to get the result of the job through an API?
For Glue, anything you print or log goes into CloudWatch.
You have the option of adding a handler to your logger that writes to a stream and pushes that stream to a file in S3. Or, better yet, create a StringIO object, store your result in it, and then send that to S3.
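If the job writes its result to a file in S3 as suggested, the caller (for example a Node.js layer behind the same API Gateway) could then fetch that file. A minimal sketch, where the bucket name, key and JSON format are all assumptions:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function getGlueJobResult() {
  // bucket and key are placeholders for wherever the Glue job writes its result
  const obj = await s3.getObject({ Bucket: 'my-glue-results', Key: 'runs/latest/result.json' }).promise();
  return JSON.parse(obj.Body.toString('utf-8'));
}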

Background process in node

I am new to the whole JavaScript stack. I have been trying to learn by building a small application based on React, Express and Mongo. My application basically saves some config settings to Mongo, and based on these settings the app periodically tries to fetch some values by querying an Elasticsearch index.
So far I have the part that saves the config settings done.
What I need to do now is extract these settings from my Mongo DB and schedule a job which keeps running periodically (the period is one of the settings) to poll my Elastic index. The thing that I cannot wrap my head around is how to create this scheduled job. All I have been using till now is the Express router to interact with my UI and DB.
I did some research; would spawning a child process be the ideal way to go about this?
I would suggest you take a look at node-cron. Cron is a popular task scheduler on UNIX, and node-cron is its implementation in Node.
Basic usage - taken from the doc
var CronJob = require('cron').CronJob;
new CronJob('* * * * * *', function() {
  console.log('You will see this message every second');
}, null, true, 'America/Los_Angeles');
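Applied to your case, a rough sketch building on the example above; getSettings and pollElasticIndex are made-up placeholders for your own Mongo lookup and Elasticsearch query, and pollIntervalMinutes is an assumed name for the stored period:
const { CronJob } = require('cron');

async function startPolling() {
  // getSettings() is a placeholder for reading the saved config from MongoDB
  const settings = await getSettings();

  // e.g. settings.pollIntervalMinutes === 15  ->  "0 */15 * * * *" (every 15 minutes)
  const pattern = `0 */${settings.pollIntervalMinutes} * * * *`;

  new CronJob(pattern, () => {
    // pollElasticIndex() is a placeholder for the Elasticsearch query
    pollElasticIndex(settings).catch(console.error);
  }, null, true);
}
The job runs inside the same Node process as your Express app, so you don't necessarily need to spawn a child process for this.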
