When a malicious file is uploaded, I need to scan it and remove it. Is there a special package on NPM for this? Can anyone help me with this? Thanks in advance.
The following two steps are the basics.
1. In the frontend, only allow specific file extensions (.pdf, .png, etc.) and enforce limits such as file size. (Don't forget that frontend code can be manipulated.)
2. You should also check file extensions and sizes in the backend (if you are using Node, you can use multer to achieve this).
What more can we do in the backend?
Relying only on extension checks doesn't help (anyone can rename sample.exe to sample.jpg and upload it).
For example, if you want to verify in the backend that an uploaded file really is an image, instead of relying on the file extension you can follow the approach below.
The first eight bytes of a PNG file always contain the following (decimal) values: 137 80 78 71 13 10 26 10
If you want to check whether an uploaded file is a PNG, the condition above will work. More generally, you can verify that files were uploaded properly with similar rules (for .pdf, .doc, etc. there are comparable signatures). Checking the file's MIME/signature data is the best practice.
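As a minimal sketch of that signature check (shown here in Python; the same byte comparison works in Node with a Buffer), with the PNG and PDF signatures taken from their published file-format specs and the file paths being illustrative only:

# Verify a file's magic bytes instead of trusting its extension.
MAGIC_SIGNATURES = {
    "png": bytes([137, 80, 78, 71, 13, 10, 26, 10]),  # PNG signature
    "pdf": b"%PDF-",                                   # PDF header
}

def looks_like(path, kind):
    """Return True if the file starts with the expected signature for `kind`."""
    signature = MAGIC_SIGNATURES[kind]
    with open(path, "rb") as f:
        return f.read(len(signature)) == signature

# Example: reject an upload whose bytes don't match the claimed type.
if not looks_like("uploads/avatar.png", "png"):
    print("File claims to be a PNG but its bytes disagree - reject it.")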
Don't save uploaded files inside your backend code repository. Store them in some other location. (optional)
The following links might help.
Cloud Storages
Instead of storing files on your local server, you can save uploaded files in cloud storage such as an Amazon S3 bucket. Every time a file is uploaded to that S3 bucket, you can trigger a scanner using Lambda (automated file scanning on AWS).
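As a rough illustration of that trigger (in Python), here is a skeleton Lambda handler. The scan_file helper is a hypothetical placeholder; in practice you would run something like ClamAV there, as in the links below.

import boto3

s3 = boto3.client("s3")

def scan_file(local_path):
    """Hypothetical placeholder: run your scanner (e.g. ClamAV) and return True if the file is clean."""
    raise NotImplementedError

def lambda_handler(event, context):
    # S3 "object created" events carry the bucket and key of the uploaded file.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        local_path = "/tmp/" + key.rsplit("/", 1)[-1]
        s3.download_file(bucket, key, local_path)
        if not scan_file(local_path):
            # Infected: delete it (or move it to a quarantine prefix instead).
            s3.delete_object(Bucket=bucket, Key=key)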
Besides Amazon, you can also use Google Drive for file uploads (not the optimal choice), but when someone downloads an uploaded file, Google will automatically scan it for viruses.
Amazon S3 bucket file-scan links:
amazon s3 bucket files scan SO
amazon s3 bucket files reddit
s3 files scanning using lambda & clamav
For a local server:
checking MIME signature official docs
check file types plugin
clam scan npm
check image content without extension SO 1
check image content without extensions SO 2
Related
I'm using Django, and I have an S3 account with files in a folder called media. I want to allow users to download the entire list of files as an archived zip folder, to save them having to click each individual link to get the files.
I'm using Django, Amazon S3, and Boto3. The links look like:
https://mydomain.s3.amazonaws.com/media/filename1.txt
https://mydomain.s3.amazonaws.com/media/filename2.txt
https://mydomain.s3.amazonaws.com/media/filename3.txt
https://mydomain.s3.amazonaws.com/media/filename4.txt
https://mydomain.s3.amazonaws.com/media/filename5.txt
So essentially, I want to bundle all the files behind another link that downloads them all as an archive.
Any suggestions on how to do this with Django?
In order to allow access to a file stored in S3, you can generate an S3 pre-signed URL that gives specific access (read or write) to the file. Each URL generated gives access to exactly one file, and the access is granted only for a limited time (maximum 7 days). This is the most recommended approach. If you don't want the URL to expire, you will need to create a Django route that downloads the file for the user.
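A minimal sketch of generating such a pre-signed URL with boto3 (the bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Pre-signed GET URL for one object, valid for 7 days (the maximum).
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "mydomain", "Key": "media/filename1.txt"},
    ExpiresIn=7 * 24 * 3600,
)
print(url)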
In order to download the files as an archive, you need to create a Django route, such as /downloadAll?user=1. When the user visits that URL, you download the files from S3 and create the archive.
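Here is a rough sketch of such a view, assuming the files live under the media/ prefix of a bucket named mydomain (both placeholders); it zips everything in memory, which is fine for a modest number of files:

import io
import zipfile

import boto3
from django.http import HttpResponse

def download_all(request):
    """Fetch every object under media/ from S3 and return them as one zip."""
    s3 = boto3.client("s3")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket="mydomain", Prefix="media/"):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket="mydomain", Key=obj["Key"])["Body"].read()
                archive.writestr(obj["Key"], body)
    response = HttpResponse(buffer.getvalue(), content_type="application/zip")
    response["Content-Disposition"] = 'attachment; filename="media.zip"'
    return response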
To avoid the long wait on the page, I wrote a Django management command that builds the archive file(s) ahead of time; you run it with:
python manage.py archive_files
from django.core.management.base import BaseCommand

# write_archive_file is your own helper that actually builds the archive;
# import it from wherever it is defined in your project.
from myapp.utils import write_archive_file


class Command(BaseCommand):
    help = "Build the downloadable archive file(s)."

    def handle(self, *args, **options):
        self.stdout.write("doing")
        write_archive_file()
        self.stdout.write("done")
With the archive file created, I can then make it downloadable.
I want to save my files to Google Cloud Storage. I have been storing my files under names like doc_personId_fileId. But now, if my user uploads another file, the old file will be replaced. I want to keep revisions. What is the best approach to keep a record of all the revisions? For example:
I have a file named doc_1_1. Now, if the user uploads another file, the old file should be renamed to doc_1_1_revision_1, then doc_1_1_revision_2 and so on, and the new file should be doc_1_1.
What is the best method to handle this?
Or is there anything provided by Google to handle this type of scenario?
Thanks.
You want to upload doc_1_1 a few times, for example 3 times, and expect your bucket to look like:
doc_1_1
doc_1_1_revision_3
doc_1_1_revision_2
. . .
In short, you cannot achieve this automatically with what GCP provides; you will need to work around it in your upload code by doing two operations:
moving the old file to a name that includes the revision
uploading the new file
Alternatively, GCP supports object revisions using two concepts: generation, on the object itself, and metageneration, on the metadata associated with the object. So you can simply keep uploading the new file, pay no attention to other revisions, and leave it to GCP to handle them. Listing files with the option to see generations and metadata will give you all files and their revisions.
Of course, you can also restore / retrieve a file by specifying the revision.
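A minimal sketch of that approach with the google-cloud-storage Python client (the bucket name is a placeholder): enable versioning once, then list every generation of an object.

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")

# Enable object versioning once; older generations are then kept automatically.
bucket.versioning_enabled = True
bucket.patch()

# List every generation (revision) of doc_1_1.
for blob in client.list_blobs(bucket, prefix="doc_1_1", versions=True):
    print(blob.name, blob.generation)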
Your goal is:
I have a file named doc_1_1. Now if user uploads another file. Old file should be named as doc_1_1_revision_1 and after that doc_1_1_revision_2 and so on and new file should be doc_1_1.
Google Cloud Storage does not support this naming technique. You will have to implement this on the client side as part of your upload process.
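One way to implement that client-side renaming with the google-cloud-storage library, sketched here with a placeholder bucket name and without the bookkeeping needed to count existing revisions:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")

def upload_with_revision(name, local_path, revision):
    """Copy the current object to a revision name, then upload the new content."""
    current = bucket.get_blob(name)
    if current is not None:
        # e.g. doc_1_1 -> doc_1_1_revision_1
        bucket.copy_blob(current, bucket, f"{name}_revision_{revision}")
    bucket.blob(name).upload_from_filename(local_path)

upload_with_revision("doc_1_1", "/tmp/doc_1_1.pdf", revision=1)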
Another option is to enable "Object Versioning", where previous objects with the same name still persist; the last uploaded instance is the "current" version.
This link will help you understand object versions:
Object Versioning
So, a little background: we have a website (JS, jQuery, LESS, Node) that is hosted on Amazon AWS S3 and distributed using CloudFront. In the past we have stored our resources statically in the assets folder within the app, both locally and on S3.
Recently we set up a Node Lambda that listens to Kinesis events and generates a JSON file, which is then stored within the assets folder in S3. Currently, the file in the bucket with the same key is overwritten and the site uses the generated file as it should.
My question is: what happens to that JSON file when we deploy a new instance of our website? Even if we remove the JSON file from the local assets folder, if the deployment overwrites the whole assets directory in S3 when a new one is deployed, does that result in the JSON file being removed?
Thanks in advance!
Please let me know if you need any more clarification.
That will depend on how you're syncing files. I recommend you use the "sync" command, so that only new files are uploaded; a file that exists in S3 but not in your repo will only get deleted if you explicitly say so, otherwise it will not.
See for example the CLI command docs here: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html ... as you can see, if you specify --delete the files will be deleted.
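For example (the local folder and bucket names are placeholders):

# Uploads only new/changed files; objects that exist only in S3 (like the generated JSON) are left alone.
aws s3 sync ./assets s3://my-site-bucket/assets

# With --delete, objects present in S3 but missing locally are removed as well.
aws s3 sync ./assets s3://my-site-bucket/assets --delete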
But I'm not sure what your use case is. Do you want that file to get deleted? It seems that you don't :)
I have a shell script which generates an Excel file of more than 100 MB. Now I want to transfer, or rather upload, the file to a URL that belongs to an online storage server. This server generates a URL for the file after the upload.
The question is: if we are able to upload the file using "cURL" to any given URL, how do we get the generated URL back from that web page?
The URL generated after uploading the file is dynamic (Dropbox-style storage).
If it is not possible to get that URL, then how should I transfer such a big file?
Note: this is part of an automation, so please answer with automation in mind.
Thank you in advance.
We are using Amazon S3 for images on our website, and users upload the images/files directly to S3 through our website. In our policy file we ensure the key "begins-with" "upload/". Anyone is able to see the full URLs of these images, since they are publicly readable once uploaded. Could a hacker use the policy data in the JavaScript, together with the URL of an image, to overwrite these images with their own data? I see no way to prevent overwrites after uploading once. The only solution I've seen is to copy/rename the file to a folder that is not publicly writeable, but that requires downloading the image and then uploading it again to S3 (since Amazon can't really rename in place).
If I understood you correctly, the images are uploaded to Amazon S3 storage via your server application.
So only your application has the Amazon S3 write permission. Clients can upload images only through your application (which stores them on S3). A hacker could only force your application to upload an image with the same name and overwrite the original one.
How do you handle the situation where a user uploads an image with a name that already exists in your S3 storage?
Consider the following sequence of actions:
The first user uploads an image some-name.jpg
Your app stores that image in S3 under the name upload-some-name.jpg
A second user uploads another image some-name.jpg
Will your application overwrite the original one stored in S3?
I think the question implies the content goes directly to S3 from the browser, using a policy file supplied by the server. If that policy file has an expiration set, for example one day in the future, then the policy becomes invalid after that. Additionally, you can set a starts-with condition on the writeable path.
So the only way a hacker could use your policy files to maliciously overwrite files is to obtain a new policy file, and even then they could overwrite files only in the path specified. But by that point, you will have had the chance to refuse to provide the policy file, since I assume handing it out happens after you authenticate your users.
So, in short, I don't see a danger here if you are handing out properly constructed policy files and authenticating users before doing so. No need to make copies of anything.
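As an illustration of such a constructed policy, here is a sketch using boto3's pre-signed POST (the bucket name and key prefix are placeholders); the browser then POSTs the file directly to S3 using the returned fields.

import boto3

s3 = boto3.client("s3")

# Short-lived upload policy: the key must start with "upload/", and it expires in 1 hour.
post = s3.generate_presigned_post(
    Bucket="my-site-images",
    Key="upload/${filename}",
    Conditions=[["starts-with", "$key", "upload/"]],
    ExpiresIn=3600,
)
print(post["url"])     # where the browser should POST the form
print(post["fields"])  # hidden form fields carrying the signed policy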
Actually, S3 does have a copy feature that works great:
Copying Amazon S3 Objects
But as amra stated above, doubling your space by copying sounds inefficient.
Maybe it would be better to give the object some kind of unique id, like a GUID, and set additional user metadata beginning with "x-amz-meta-" to carry more information about the object, like the user who uploaded it, its display name, etc.
On the other hand, you could always check whether the key already exists and raise an error.
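A short sketch of both ideas with boto3 (the bucket name is a placeholder): store the upload under a GUID key with x-amz-meta- metadata, or check for an existing key first.

import uuid

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-site-images"

def upload_with_guid(local_path, uploader, display_name):
    """Store the object under a unique key; extra info goes into x-amz-meta-* headers."""
    key = "upload/" + str(uuid.uuid4())
    s3.upload_file(
        local_path, BUCKET, key,
        ExtraArgs={"Metadata": {"uploader": uploader, "display-name": display_name}},
    )
    return key

def key_exists(key):
    """Alternative: check first and refuse to overwrite an existing key."""
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return True
    except ClientError:
        return False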