I'm using Django and I have an S3 account with files in a folder called media. I want to allow users to download the entire set of files as a zip archive, to save them having to click on each individual link to get the files.
I'm using Django, Amazon S3, and Boto3. The links look like:
https://mydomain.s3.amazonaws.com/media/filename1.txt
https://mydomain.s3.amazonaws.com/media/filename2.txt
https://mydomain.s3.amazonaws.com/media/filename3.txt
https://mydomain.s3.amazonaws.com/media/filename4.txt
https://mydomain.s3.amazonaws.com/media/filename5.txt
So essentially I want to provide another link that downloads all of the files as a single archive.
Any suggestions on how to do this with Django?
To allow access to a file stored in S3, you can generate an S3 pre-signed URL, which grants specific access (read or write) to that file. Each generated URL gives access to exactly one file, and only for a limited time (a maximum of 7 days). This is the recommended approach. If you don't want the URL to expire, you will need to create a Django route that downloads the file for the user.
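For example, a minimal sketch of generating a pre-signed URL with Boto3 (the bucket and key names are taken from the example links above):

import boto3

s3 = boto3.client("s3")

# Read-only URL for a single object, valid for one hour (S3 allows up to 7 days).
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "mydomain", "Key": "media/filename1.txt"},
    ExpiresIn=3600,
)
print(url)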
To download the files as an archive, you need to create a Django route, such as /downloadAll?user=1. When the user visits that URL, you download the files from S3 and create the archive.
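A rough sketch of such a view, assuming the files live under the media/ prefix of the bucket named in the links above (URL wiring and per-user filtering are left out):

import io
import zipfile

import boto3
from django.http import HttpResponse

def download_all(request):
    s3 = boto3.client("s3")
    bucket = "mydomain"  # assumption: bucket name taken from the example links

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Add every object under the media/ prefix to the zip.
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix="media/"):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
                archive.writestr(obj["Key"], body)

    response = HttpResponse(buffer.getvalue(), content_type="application/zip")
    response["Content-Disposition"] = 'attachment; filename="media.zip"'
    return response

With many or large files this can make the request very slow, which is why the rest of this answer moves the work into a management command.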
To avoid that long wait on the page, I wrote a Django management command; once the files are in place you can build the archive by running:
python manage.py archive_files
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Build the downloadable archive of media files."

    def handle(self, *args, **options):
        self.stdout.write("doing")
        # write_archive_file() is a helper (defined elsewhere) that downloads
        # the files from S3 and writes the zip archive.
        write_archive_file()
        self.stdout.write("done")
Once the archive file is created, I can then make it downloadable.
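For example, a small view along these lines could serve the generated archive (the path and filename below are placeholders for wherever write_archive_file puts it):

from django.http import FileResponse

def download_archive(request):
    # Placeholder path: wherever the management command wrote the archive.
    return FileResponse(open("/tmp/media_archive.zip", "rb"),
                        as_attachment=True, filename="media_archive.zip")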
I have assets from WordPress being uploaded to a GCP Storage bucket. But when I list the links to these assets on the website I'm working on, I would like the file to download automatically when the user clicks the link, instead of being viewed in the browser.
Is there an "easy" way to implement this behaviour?
The project runs WordPress as a headless API with a Next.js frontend.
Thanks
You can change object metadata for your objects in Cloud Storage to force browsers to download files directly, instead of previewing them. You can do this through the available content-disposition property. Setting this property to attachment will allow you to directly download the content.
I quickly tested downloading public objects with and without this property and can confirm the behavior: downloads do happen directly. The documentation explains how to quickly change the metadata for existing objects in your bucket. While it is not directly mentioned, you can use wildcards to apply metadata changes to multiple objects at the same time. For example, this command will apply the content-disposition property to all objects in the bucket:
gsutil setmeta -h "content-disposition:attachment" gs://BUCKET_NAME/**
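If you prefer to set this programmatically, the Cloud Storage client libraries expose the same property; for example, a sketch with the Python client (bucket and object names are placeholders):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("BUCKET_NAME")    # placeholder bucket name
blob = bucket.blob("path/to/asset.jpg")  # placeholder object name

# Ask browsers to download the object instead of rendering it inline.
blob.content_disposition = "attachment"
blob.patch()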
When a malicious file is uploaded, it needs to be scanned and removed. Is there a special package on NPM for this? Can anyone help me with this? Thanks in advance.
The following two steps are the basics:
1. In the frontend, only allow specific file extensions (.pdf, .png, etc.) and enforce limits such as file size. (Don't forget that frontend code can be manipulated.)
2. You should also check file extensions and sizes in the backend (if you are using Node you can use multer to achieve this).
What more can we do in the backend?
Relying only on extension checks doesn't help (anyone can rename sample.exe to sample.jpg and upload it).
For example, to check in the backend whether an uploaded file really is an image, rather than only checking the file extension, you can use the approach below.
The first eight bytes of a PNG file always contain the following (decimal) values: 137 80 78 71 13 10 26 10
If you want to check whether an uploaded file is a PNG, the condition above will work. The same idea applies to checking that other files were uploaded properly (formats such as .pdf and .doc have their own rules). Checking the MIME signature data is the best practice.
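As an illustration of the idea (the check itself is just a byte comparison, regardless of language; the NPM packages linked below do the same thing), a small sketch in Python:

# Verify a file's PNG signature by comparing its first eight bytes.
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

def looks_like_png(path):
    with open(path, "rb") as f:
        return f.read(8) == PNG_SIGNATURE

print(looks_like_png("upload.bin"))  # placeholder filename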
Don't save uploaded files inside your backend code repository; store them somewhere else. (optional)
The following links might help.
Cloud Storages
Instead of storing files on your local server, you can save uploaded files in the cloud, for example in an Amazon S3 bucket. Every time a file is uploaded to that S3 bucket, you can trigger a scanner using Lambda (automatic file scanning on AWS).
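A minimal sketch of the shape of such a Lambda handler, assuming a ClamAV clamscan binary is available in the Lambda environment (for example via a layer, as the links below describe):

import subprocess

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 "ObjectCreated" notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        local_path = "/tmp/" + key.split("/")[-1]
        s3.download_file(bucket, key, local_path)

        # Assumption: clamscan is bundled with the function or in a layer.
        result = subprocess.run(["clamscan", local_path])
        if result.returncode != 0:  # non-zero means infected (or a scan error)
            s3.delete_object(Bucket=bucket, Key=key)  # remove the malicious file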
Besides Amazon, you can also use Google Drive for file uploads (not the optimal choice), but when someone downloads an uploaded file, Google automatically scans it for viruses.
Amazon S3 bucket file scanning links:
amazon s3 bucket files scan SO
amazon s3 bucket files reddit
s3 files scanning using lambda & clamav
For a local server:
checking MIME signature official docs
check file types plugin
clam scan npm
check image content without extension SO 1
check image content without extensions SO 2
So, a little background. We have a website (JS, jQuery, LESS, Node) that is hosted on Amazon AWS S3 and is distributed using CloudFront. In the past we have stored our resources statically in the assets folder, both within the app locally and on S3.
Recently we have set up a Node Lambda that listens to Kinesis events and generates a JSON file that is then stored within the assets folder in S3. Currently, the file in the bucket with the same key is overwritten and the site uses the generated file as it should.
My question is: what happens to that JSON file when we deploy a new instance of our website? Even if we remove the JSON file from the local assets folder, if the deployment overwrites the whole assets directory in the S3 bucket, does that result in the JSON file being removed?
Thanks in advance!
Please let me know if you need any more clarification.
That will depend on how you're syncing files. I recommend the "sync" command, so that only new or changed files are uploaded; a file that exists in S3 but not in your repo is deleted only if you explicitly ask for that, otherwise it is left alone.
See for example the CLI command docs here: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html ... as you can see, if you specify --delete the files will be deleted.
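For example (the local directory and bucket path are placeholders):

# Upload only new/changed files; the generated JSON already in S3 is left untouched.
aws s3 sync ./build s3://your-bucket/assets

# With --delete, anything in S3 that is not present locally is removed,
# which would also remove the generated JSON file.
aws s3 sync ./build s3://your-bucket/assets --delete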
But I'm not sure what your use case is: do you want that file to be deleted? It seems that you don't :)
Users on my site need to be able to upload a bunch of files and folders into S3 while maintaining the folder structure.
Say they have the following files on their local machines.
/file1.jpg
/some_folder/file2.jpg
After upload, I need their s3 urls to be
http://s3bucket.amazon.com/usersfolder/file1.jpg
http://s3bucket.amazon.com/usersfolder/some_folder/file2.jpg
How can I do this? To make matters a little more complicated, uploads from the client side can be initiated only after they download an upload policy.
Edit: I would like to know a solution for the frontend part of this question. It looks like on the server I can use a wildcard character to specify access permissions, so I am good on that part.
I am using Node.js/Express.js as a backend.
We are using Amazon S3 for images on our website, and users upload the images/files directly to S3 through our website. In our policy file we ensure it "begins-with" "upload/". Anyone is able to see the full URLs of these images, since they are publicly readable once uploaded. Could a hacker use the policy data in the JavaScript and the URL of an image to overwrite these images with their own data? I see no way to prevent overwrites after uploading once. The only solution I've seen is to copy/rename the file to a folder that is not publicly writeable, but that requires downloading the image and then uploading it again to S3 (since Amazon can't really rename in place).
If I understood you correctly, the images are uploaded to Amazon S3 storage via your server application.
So only your application has Amazon S3 write permission. Clients can upload images only through your application (which stores them on S3). A hacker can only force your application to upload an image with the same name and overwrite the original one.
How do you handle the situation when a user uploads an image with a name that already exists in your S3 storage?
Consider the following sequence:
The first user uploads an image some-name.jpg.
Your app stores that image in S3 under the name upload-some-name.jpg.
A second user uploads an image some-name.jpg.
Will your application overwrite the original one stored in S3?
I think the question implies the content goes directly through to S3 from the browser, using a policy file supplied by the server. If that policy file has set an expiration, for example, one day in the future, then the policy becomes invalid after that. Additionally, you can set a starts-with condition on the writeable path.
So the only way a hacker could use your policy files to maliciously overwrite files is to get a new policy file, and then overwrite files only in the path specified. But by that point, you will have had the chance to refuse to provide the policy file, since I assume that is something that happens after authenticating your users.
So in short, I don't see a danger here if you are handing out properly constructed policy files and authenticating users before doing so. No need for making copies of stuff.
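For illustration, this is roughly what generating such a policy looks like server-side with Boto3 (the bucket name is a placeholder; the upload/ prefix comes from the question):

import boto3

s3 = boto3.client("s3")

# Short-lived policy that only allows writing under the "upload/" prefix.
post = s3.generate_presigned_post(
    Bucket="your-bucket",                             # placeholder bucket name
    Key="upload/${filename}",
    Conditions=[["starts-with", "$key", "upload/"]],
    ExpiresIn=3600,                                   # policy expires after one hour
)
# post["url"] and post["fields"] are what the browser form submits with the file.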
Actually, S3 does have a copy feature that works great:
Copying Amazon S3 Objects
But as amra stated above, doubling your space by copying sounds inefficient.
Maybe it would be better to give the object some kind of unique ID, like a GUID, and set additional user metadata beginning with "x-amz-meta-" for more information about the object, such as the user who uploaded it, a display name, etc.
On the other hand, you could always check whether the key already exists and return an error if it does.
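A sketch of that existence check with Boto3 (the bucket and key names are placeholders):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def key_exists(bucket, key):
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as error:
        if error.response["Error"]["Code"] == "404":
            return False
        raise

if key_exists("your-bucket", "upload/some-name.jpg"):
    # Reject the upload or pick a unique name instead of overwriting.
    print("key already exists")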