Run sync script on AWS EC2 triggered by write - Linux

I have an EC2 instance running and have set it up to accept SFTP writes (unfortunately I have to use SFTP; I'm aware there are better options, but I can't use them). I have an S3 bucket mounted, but I ran into an issue with allowing SFTP writes directly into the bucket. My workaround is to run
aws s3 sync <directory> s3://<s3-bucket-name>/
And this works. My problem is that I don't know how to run this automatically. I would prefer to run it whenever there is a write to a specified directory, but I will settle for running it at regular intervals.
So essentially my question is: "How do I fire a script automatically on an EC2 instance running Linux?"
Thanks.

Use inotifywait as a file watcher, or use a cron job to kick off your S3 sync script at regular intervals.
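
For example, a minimal sketch of the inotifywait approach (assumes the inotify-tools package is installed; the directory and bucket names are the placeholders from the question):

#!/bin/bash
# Watch a directory and sync it to S3 whenever a file finishes being written.
# <directory> and <s3-bucket-name> are placeholders from the question above.
WATCH_DIR="<directory>"
BUCKET="s3://<s3-bucket-name>/"

# -m: keep watching; -r: recurse; close_write/moved_to: fire once a file has fully landed
inotifywait -m -r -e close_write -e moved_to --format '%w%f' "$WATCH_DIR" |
while read -r changed_file; do
    echo "Detected write: $changed_file - syncing to S3"
    aws s3 sync "$WATCH_DIR" "$BUCKET"
done

The cron alternative is a single crontab entry, e.g. */5 * * * * aws s3 sync <directory> s3://<s3-bucket-name>/ to sync every five minutes.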

Related

Stream stdout from a running AWS EC2 instance to a file in S3 (Linux bash)

So I'm running models on several EC2 Linux instances at once. The model is a black-box binary that prints its progress to stdout as it runs. All instances have the AWS CLI and are hooked up to an S3 bucket where they save the final results after their run, and then self-terminate.
All I need to do is track the progress of each model as it runs. I'm launching these all from Python, so they're randomly assigned an IP, and it would be over-architecting to reserve a block of IPs to SSH into. So my aim is simply to stream their stdout to a file on S3.
This should be the solution https://stackoverflow.com/a/42285350, but it's not working for me. It only writes to the file once, after the .sh has finished (the binary run is triggered inside run.sh).
./run.sh | aws s3 cp - s3://bucket/pathto/results/results/progress.txt
How do I get it to stream the stdout in real time? I'm also hoping not to add much overhead, which could increase simulation run times. Thanks
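
No answer was posted in the thread, but since S3 objects can't be appended to, aws s3 cp - has to wait for EOF before it uploads anything. One workaround (a sketch only; the log path, destination, and interval are assumptions) is to tee stdout to a local file and re-upload that file on a timer:

#!/bin/bash
# Run the model, tee its stdout to a local log, and re-upload the log every 30 s.
# The bucket path and file names below are illustrative placeholders.
LOG=/tmp/progress.txt
DEST=s3://bucket/pathto/results/progress.txt

./run.sh | tee "$LOG" &
PIPELINE_PID=$!

# Overwrite the S3 object with the latest log contents while the run is alive
while kill -0 "$PIPELINE_PID" 2>/dev/null; do
    aws s3 cp "$LOG" "$DEST" --only-show-errors
    sleep 30
done

# One final upload after the run finishes
aws s3 cp "$LOG" "$DEST" --only-show-errors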

Move files from S3 to AWS EFS on the fly

I am building an application where users can upload images; for this I am using S3 as file storage.
In another area of the application there is a process deployed on EC2 that needs to use the uploaded images.
This process needs the images multiple times (it generates reports with them) and it runs on multiple EC2 instances - using Elastic Beanstalk.
The process doesn't need all the images at once, but needs some subset of them for every job it gets (depending on the parameters it receives).
Every EC2 instance does an independent job - they don't share files between them, but they might need the same uploaded images.
What I am doing now is downloading all the images from S3 to the EC2 machine because the process needs the files locally.
I have read that EFS can be mounted to an EC2 instance and then accessed as if it were local storage.
I did not find any example of uploading directly to EFS with Node.js (or another language), but I found a way to transfer files from S3 to EFS - "DataSync".
https://docs.aws.amazon.com/efs/latest/ug/transfer-data-to-efs.html
So I have 3 questions about it:
Is it true that I can't upload directly to EFS from my application (Node.js + Express)?
After I move files to EFS, will I be able to use it exactly like the local storage of the EC2 instance?
Is it a good idea to move files from S3 to EFS all the time, or is there another solution to the problem I described?
For this exact situation, we use https://github.com/kahing/goofys
It's very reliable, and additionally offers the ability to mount S3 buckets as folders on any device - Windows and Mac as well as, of course, Linux.
Works outside of the AWS cloud 'boundary' too - great for developer laptops.
The downside is that it does /not/ work in a Lambda context, but you can't have everything!
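
As a rough sketch of that setup (the bucket name and mount point are placeholders; goofys must be installed and the instance needs an IAM role or credentials with access to the bucket):

# Mount the upload bucket as a local folder with goofys
sudo mkdir -p /mnt/images
goofys your-upload-bucket /mnt/images

# The application can now read the uploaded images as ordinary files
ls /mnt/images/

# For mounting at boot, the goofys README documents an /etc/fstab entry like:
# goofys#your-upload-bucket   /mnt/images   fuse   _netdev,allow_other,--file-mode=0666   0   0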
Trigger a Lambda to call an ECS task when the file is uploaded to S3. The ECS task starts, mounts the EFS volume, and copies the file from S3 to the EFS.
This won't run into the problems Lambda has with timing out on really large files.
I don't have the code, but I'd be interested if someone has already coded this solution.
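
As a sketch of just the container side (the copy the ECS task would perform once the Lambda has started it; the EFS mount path and environment variable names are hypothetical, and the EFS volume is assumed to be attached in the task definition):

#!/bin/bash
# ECS task entrypoint: copy one uploaded object from S3 onto the EFS volume
# mounted into the container. S3_BUCKET and S3_KEY are assumed to be passed
# in as container overrides by the triggering Lambda.
set -euo pipefail

EFS_MOUNT=/mnt/efs                      # mount path from the task definition
S3_BUCKET="${S3_BUCKET:?missing bucket}"
S3_KEY="${S3_KEY:?missing object key}"

mkdir -p "$EFS_MOUNT/$(dirname "$S3_KEY")"
aws s3 cp "s3://$S3_BUCKET/$S3_KEY" "$EFS_MOUNT/$S3_KEY"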

Transfer a file from an EC2 Linux instance to a Windows system

I have a scheduled task on an EC2 Linux system that generates a file daily. Now I want to transfer this file to another Windows machine that is not on AWS.
I want to schedule this job on the EC2 instance only. I don't want to download the file from the target machine; I want the upload to happen from the EC2 side.
I have tried the command below:
scp -i (Key File for my current EC2 instance) (IP of target):(Local file path of current EC2 instance) D:\TEMP(Target Path of window machine)
I am getting:
ssh: connect to host (IP of target) port 22: Connection refused
We already have functionality to store the file in S3, but the timing depends on the EC2 task (sometimes it takes 1 hour, sometimes 4 hours, which is why I want the transfer to run at the end of that task).
The error you're receiving is most likely caused by an incorrect firewall setting on the EC2 Security Group in front of your EC2 instance, or on your Windows server's network.
I would suggest using an Amazon S3 bucket: upload the file from your EC2 instance into S3, where it can wait to be collected by a scheduled job on your Windows machine. You could delete the file from S3 after Windows has downloaded it, or use a lifecycle policy to delete the saved files automatically after a certain time.
This removes the need to open SSH to your EC2 instance, and also keeps the file in S3 so that you can re-download it if you need it again.
I don't know what technology stack you're using, but you could start using Amazon S3 on both servers with the AWS CLI, or an SDK for your preferred programming language.
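
For example, a minimal sketch of the EC2 side, appended to the end of the existing daily task (the report path and bucket are placeholders); the Windows machine would then pull the object on its own schedule, e.g. with aws s3 cp from Task Scheduler:

#!/bin/bash
# End of the daily task: once the report exists, push it to S3 so the
# Windows machine can fetch it whenever it is ready to.
set -e

REPORT=/home/ec2-user/reports/daily-report.csv   # placeholder path
BUCKET=s3://my-report-bucket/daily/              # placeholder bucket

# Date-stamped key so earlier reports aren't overwritten
aws s3 cp "$REPORT" "${BUCKET}report-$(date +%F).csv"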

How to ensure that node.js script on EC2 instance restarts if it ever crashes?

I have a node.js script that streams 1% of Twitter data in real time into my S3 bucket. The script lives on my EC2 instance and is run with forever so that it keeps running on the instance non-stop.
I've noticed that the stream sometimes stops randomly; this might be a dumb question, but how do I ensure that if the code stops, it restarts automatically without me having to check?
Thanks!
You can use something like Upstart on Ubuntu. Some ideas are here
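
A rough sketch of what such an Upstart job could look like (Upstart-based Ubuntu releases such as 14.04; newer releases use systemd instead). The job name, script path, and log path are assumptions; the respawn stanza is what restarts the process if it dies:

# Create an Upstart job for the streaming script. Paths below are placeholders.
sudo tee /etc/init/twitter-stream.conf > /dev/null <<'EOF'
description "Twitter 1% stream to S3"
start on (filesystem and net-device-up IFACE!=lo)
stop on shutdown
respawn
respawn limit 10 60
exec /usr/bin/node /home/ubuntu/stream.js >> /var/log/twitter-stream.log 2>&1
EOF

sudo initctl reload-configuration
sudo start twitter-stream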

How to write a shell script to run scripts on several remote machines without ssh?

Can anyone please tell me how I can write a bash shell script that executes another script on several remote machines without SSH?
The scenario is that I have a couple of scripts that I need to run on 100 Amazon EC2 instances. The naive approach is to write a script that scps both source scripts to all the instances and then runs them by SSHing into each one. Is there a better way of doing this?
Thanks in advance.
If you just want to do stuff in parallel, you can use Parallel SSH or Cluster SSH. If you really don't want to use SSH, you can install a task queue system like Celery. You could even go old school and have a cron job that periodically checks a location in S3 and, if the key exists, downloads the file and runs it, though you have to be careful to run it only once. You can also use tools like Puppet and Chef if you're generally trying to manage a bunch of machines.
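
A minimal sketch of that cron-driven S3 polling idea (the bucket, key, and marker file are placeholders; the marker file is one simple way to make sure the script runs only once per instance):

#!/bin/bash
# Poll S3 for a script and run it exactly once on this instance.
# Intended for cron, e.g.: */5 * * * * /usr/local/bin/poll-and-run.sh
set -e

BUCKET=s3://my-task-bucket
KEY=jobs/run-me.sh
MARKER=/var/tmp/$(basename "$KEY").done

# Already ran this job? Nothing to do.
[ -f "$MARKER" ] && exit 0

# aws s3 ls exits non-zero when the key doesn't exist yet
if aws s3 ls "$BUCKET/$KEY" > /dev/null 2>&1; then
    aws s3 cp "$BUCKET/$KEY" /tmp/run-me.sh
    chmod +x /tmp/run-me.sh
    /tmp/run-me.sh && touch "$MARKER"
fi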
Another option is rsh, but be careful, it has security implications. See rsh vs. ssh.
