Sending files using aws sync - aws-cli

I'm using the aws sync command to have Jenkins send files to S3. The problem is that aws sync sends the same file again even though it has the same file name as the one in S3. That is because we have to tar and then untar the files before sending them to S3, and tarring and untarring changes the modification time. Is there any way to send files to S3 more efficiently, so that the same file does not get sent twice?

You can specify the --size-only flag to make the size of each object the only criteria used to decide whether to sync from source to destination. (docs)
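For example, assuming the files to upload live in ./build and the bucket name is a placeholder:

    aws s3 sync ./build s3://my-bucket/build --size-only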

Related

How can I edit an xml file that is stored in S3 without using a temp directory?

I have a config file in an S3 bucket that needs to be modified by adding the response from ec2.describe_subnets and sent to an api endpoint. Is there a way to automate this process using Lambda to get the file from S3 without having to save it to a temp directory?
Is there a way to automate this process using Lambda to get the file from S3 without having to save it to a temp directory?
If you're asking about modifying the contents of the S3 Object, then the answer is no, it's not possible, because S3 doesn't support that kind of operation. The only thing you can do is overwrite the entire object (i.e., not just parts of it).
If you're asking about overwriting the S3 Object with new contents, then yes, it is possible to do it "without having to save it to a temp directory" if you do it in memory, for example.
Download the object from S3 without writing it to storage, make the changes in memory, and re-upload it to S3. If the file is too big to hold fully in memory, you can do it in a streaming fashion too: initiate the download, make the necessary changes to each chunk as it arrives, upload the modified chunk as part of a multipart upload, free the memory, and repeat.
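A minimal Node.js sketch of the whole-object approach, assuming the AWS SDK for JavaScript v2; the bucket, key, and the example modification are placeholders (the question's ec2.describe_subnets response would slot in where the change is made):

    // Sketch: modify an S3 object entirely in memory (no temp directory).
    // Assumes the AWS SDK for JavaScript v2; bucket and key are placeholders.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    exports.handler = async () => {
      // 1. Download the object into memory (Body is a Buffer).
      const obj = await s3.getObject({ Bucket: 'my-bucket', Key: 'config.json' }).promise();

      // 2. Make the changes in memory (example modification only).
      const config = JSON.parse(obj.Body.toString('utf-8'));
      config.updatedAt = new Date().toISOString();

      // 3. Overwrite the entire object with the new contents.
      await s3.putObject({
        Bucket: 'my-bucket',
        Key: 'config.json',
        Body: JSON.stringify(config),
        ContentType: 'application/json',
      }).promise();
    };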
As a final note, keep S3's consistency model in mind. When this answer was written, S3 offered only eventual consistency for overwrites, meaning that after you updated an object, subsequent reads could still return the previous version (since December 2020, S3 provides strong read-after-write consistency, so this no longer applies). If your consumer cannot deal with stale reads on your storage system, you'll need a different approach: don't overwrite, but write a new object with a new key and send that new key to the consumer, or use a storage system that supports strong consistency, such as DynamoDB.

How to scan and remove malicious files when uploading?

When a malicious file is uploaded, it needs to be scanned and removed. Is there any special package for this on NPM? Can anyone help me with this? Thanks in advance.
The following two basic steps come first:
1. In the frontend, only allow specific file extensions (.pdf, .png, etc.) and enforce limits such as file size. (Don't forget that frontend code can be manipulated.)
2. You should also check file extensions and sizes in the backend (if you are using Node, you can use multer to achieve this; see the sketch below).
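A minimal sketch of those backend checks with multer, assuming an Express app (the route, field name, allowed extensions, and size limit are placeholder choices):

    // Sketch: backend extension and size checks with multer (Express assumed).
    const express = require('express');
    const multer = require('multer');
    const path = require('path');

    const upload = multer({
      dest: '/tmp/uploads',                  // keep uploads out of the code repo
      limits: { fileSize: 5 * 1024 * 1024 }, // reject files over 5 MB
      fileFilter: (req, file, cb) => {
        const allowed = ['.pdf', '.png'];
        const ext = path.extname(file.originalname).toLowerCase();
        cb(null, allowed.includes(ext));     // false silently rejects the file
      },
    });

    const app = express();
    app.post('/upload', upload.single('file'), (req, res) => {
      if (!req.file) return res.status(400).send('File type not allowed');
      res.send('Uploaded');
    });
    app.listen(3000);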
What more can we do in the backend?
Relying on extension checks alone doesn't help (anyone can rename sample.exe to sample.jpg and upload it).
For example, to verify in the backend that an uploaded file really is an image, rather than trusting the file extension, you can use the following approach.
The first eight bytes of a PNG file always contain the following (decimal) values: 137 80 78 71 13 10 26 10
If you want to check whether an uploaded file is a PNG, the condition above will work. The same idea applies to verifying that other file types were uploaded properly (.pdf and .doc have signature rules of their own). Checking the MIME signature data is the best practice.
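For instance, a minimal Node.js check of that PNG signature (the file path is a placeholder):

    // Sketch: verify the PNG magic bytes instead of trusting the extension.
    const fs = require('fs');

    const PNG_SIGNATURE = Buffer.from([137, 80, 78, 71, 13, 10, 26, 10]);

    function isPng(filePath) {
      const fd = fs.openSync(filePath, 'r');
      const header = Buffer.alloc(8);
      fs.readSync(fd, header, 0, 8, 0); // read only the first eight bytes
      fs.closeSync(fd);
      return header.equals(PNG_SIGNATURE);
    }

    console.log(isPng('/tmp/uploads/sample.png')); // true only for a real PNG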
Don't save uploaded files inside your backend code repository; store them in a separate workspace. (optional)
The following links might help.
Cloud Storages
Instead of storing files on your local server, you can save uploaded files in the cloud, for example in an Amazon S3 bucket. Every time a file is uploaded to that S3 bucket, you can trigger a scanner using Lambda (automatic file scanning on Amazon).
Besides Amazon, you can also use Google Drive for file uploads (not an optimal option), but when someone downloads an uploaded file, Google automatically scans it for viruses.
Amazon S3 bucket file scan links:
amazon s3 bucket files scan SO
amazon s3 bucket files reddit
s3 files scanning using lambda & clamav
For a local server:
checking MIME signatures official docs
check file types plugin
clam scan npm
check image content without extension SO 1
check image content without extensions SO 2

How to unzip s3 file through lambda and chain with other lambda functions using nodejs?

Two issues here:
I have a zip file which, once uploaded to S3, should trigger an event, and then:
1. unzip the file
2. after unzipping, a second Lambda should trigger and call an API.
If you want an AWS Lambda function to "unzip to S3", then the Lambda function would need to:
Download the Zip file to local storage
Unzip the files
Loop through the files and upload each one to Amazon S3
Please note that Lambda functions have 512 MB of disk storage available by default (in the /tmp/ directory), and this space needs to hold both the zip file and the extracted files.
Also, you'll need to extend the timeout setting to give adequate time for the function to perform all of the above operations.
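A minimal sketch of those three steps in Node.js, assuming the adm-zip npm package and the standard S3 trigger event (the "unzipped/" destination prefix is a placeholder):

    // Sketch: S3-triggered Lambda that downloads the zip to /tmp, extracts it,
    // and uploads each entry back to S3. Assumes the adm-zip npm package;
    // the "unzipped/" destination prefix is a placeholder.
    const AWS = require('aws-sdk');
    const AdmZip = require('adm-zip');
    const fs = require('fs');
    const { pipeline } = require('stream/promises');

    const s3 = new AWS.S3();

    exports.handler = async (event) => {
      const bucket = event.Records[0].s3.bucket.name;
      const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

      // 1. Download the zip file to local storage.
      const zipPath = '/tmp/input.zip';
      await pipeline(
        s3.getObject({ Bucket: bucket, Key: key }).createReadStream(),
        fs.createWriteStream(zipPath)
      );

      // 2. Unzip the files.
      const zip = new AdmZip(zipPath);

      // 3. Loop through the entries and upload each one to Amazon S3.
      for (const entry of zip.getEntries()) {
        if (entry.isDirectory) continue;
        await s3.putObject({
          Bucket: bucket,
          Key: 'unzipped/' + entry.entryName,
          Body: entry.getData(),
        }).promise();
      }
    };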
The previous answer works, but the /tmp folder has limited space, so it may cause problems as your file sizes grow. Instead, you can create an event-based trigger that runs a Lambda function. This Lambda can then read the zip file into a buffer (instead of downloading it locally), unzip it using the zipfile library, and upload the unzipped files back to S3. There's a detailed tutorial here: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9
You should also increase both the timeout and the memory size, up to the maximum if needed.
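The linked tutorial uses Python's zipfile; a rough Node.js equivalent of the buffer approach, again assuming adm-zip and the same placeholder prefix as above, might look like:

    // Sketch: same S3 trigger, but the zip is read into a buffer instead of /tmp.
    // Assumes the adm-zip npm package; the destination prefix is a placeholder.
    const AWS = require('aws-sdk');
    const AdmZip = require('adm-zip');

    const s3 = new AWS.S3();

    exports.handler = async (event) => {
      const bucket = event.Records[0].s3.bucket.name;
      const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

      // Read the whole zip into memory (Body is a Buffer); no /tmp involved.
      const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
      const zip = new AdmZip(obj.Body);

      for (const entry of zip.getEntries()) {
        if (entry.isDirectory) continue;
        await s3.putObject({
          Bucket: bucket,
          Key: 'unzipped/' + entry.entryName,
          Body: entry.getData(),
        }).promise();
      }
    };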

AWS: What Happens to Static S3 Files When a New Instance of a Website is Deployed?

So a little background. We have a website (js, jquery, less, node) that is hosted on Amazon AWS S3 and is distributed using CloudFront. In the past we have stored our resources statically in the assets folder within app locally and on S3.
Recently we have set up a Node Lambda that listens to Kinesis events and generates a JSON file that is then stored within the assets folder in S3. Currently, the file in the bucket with the same key is overwritten, and the site uses the generated file as it should.
My question is: what happens to that JSON file when we deploy a new instance of our website? Even if we remove the JSON file from the local assets folder, if the deployment overwrites the whole assets directory in S3 when a new one is deployed, does that result in the JSON file being removed?
Thanks in advance!
Please let me know if you need any more clarification.
That will depend on how you're syncing files. I recommend you use the sync command so that only new and changed files are uploaded; a file that exists in S3 but not in your repo will only be deleted if you explicitly say so, otherwise it is left alone.
See, for example, the CLI command docs here: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html. As you can see, if you specify --delete, such files will be deleted.
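For example (the local path and bucket name are placeholders):

    # upload new/changed files only; nothing is deleted remotely
    aws s3 sync ./assets s3://my-bucket/assets
    # also delete remote files that no longer exist locally
    aws s3 sync ./assets s3://my-bucket/assets --delete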
But I'm not sure about your use case: do you want that file to be deleted? It seems that you don't :)

How to upload all files from directory to s3

I am writing a Node.js script which should send images from the directory '/images/' to Amazon S3. I know knox is a very good library, but how can I upload all files from a directory, keeping the original file names? I could probably use the fs module to get all the names and upload them in a loop. Is there any function in knox which can do this?
Knox does not provide any functionality for client-side file handling.
You need to find your files manually and upload them one after another.
Unfortunately, it's impossible to upload multiple files at once: S3 requires that you send the Content-Length header for every file.
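A minimal sketch of that manual loop with knox and fs (the credentials, bucket, and directory are placeholders):

    // Sketch: upload every file in ./images one after another with knox,
    // keeping the original file names. Credentials/bucket are placeholders.
    const knox = require('knox');
    const fs = require('fs');
    const path = require('path');

    const client = knox.createClient({
      key: '<api-key>',
      secret: '<secret>',
      bucket: 'my-bucket',
    });

    const dir = './images';
    fs.readdir(dir, (err, files) => {
      if (err) throw err;
      files.forEach((name) => {
        // The original file name becomes the S3 key.
        client.putFile(path.join(dir, name), '/images/' + name, (err, res) => {
          if (err) throw err;
          console.log('uploaded %s with status %d', name, res.statusCode);
        });
      });
    });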
Why not use the command line tool s3cmd (http://s3tools.org/s3cmd)? If you really want to do it in Node.js, you can spawn a process to execute s3cmd from your JavaScript code.
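A rough sketch of the spawn approach (assumes s3cmd is installed and configured; the paths are placeholders):

    // Sketch: shell out to s3cmd from Node.js.
    const { spawn } = require('child_process');

    const s3cmd = spawn('s3cmd', ['sync', './images/', 's3://my-bucket/images/']);
    s3cmd.stdout.on('data', (chunk) => process.stdout.write(chunk));
    s3cmd.on('close', (code) => console.log('s3cmd exited with code', code));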
