Revisions in Google Cloud Storage - Node.js

I want to save my files to Google Cloud Storage. I store my files with names like doc_personId_fileId, but if my user uploads another file, the old file is replaced. I want to keep revisions. What is the best approach to keep a record of all the revisions? For example:
I have a file named doc_1_1. Now, if the user uploads another file, the old file should be renamed doc_1_1_revision_1, then doc_1_1_revision_2, and so on, and the new file should be doc_1_1.
What is the best method to do this?
Or is there anything provided by Google to handle this type of scenario?
Thanks.

You want to upload doc_1_1 a few times, for example 3 times, and expect your bucket to look like:
doc_1_1
doc_1_1_revision_3
doc_1_1_revision_2
. . .
In short, GCP does not do this for you automatically; you need to adjust your upload code to perform two operations (sketched below):
move the old file to a name that includes the revision
upload the new file
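A minimal sketch of that two-step flow with the @google-cloud/storage Node.js client; the bucket name and the revision counter passed in by the caller are assumptions for illustration:

const { Storage } = require('@google-cloud/storage');

const storage = new Storage();
const bucket = storage.bucket('my-bucket'); // placeholder bucket name

async function uploadWithRevision(localPath, objectName, revisionNumber) {
  const current = bucket.file(objectName);
  const [exists] = await current.exists();
  if (exists) {
    // 1. move the old file so it becomes a named revision (a copy + delete under the hood)
    await current.move(`${objectName}_revision_${revisionNumber}`);
  }
  // 2. upload the new file under the original name
  await bucket.upload(localPath, { destination: objectName });
}

// e.g. uploadWithRevision('/tmp/doc.pdf', 'doc_1_1', 1);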
Alternatively, GCP supports object revisions natively through two concepts: generation, on the object data itself, and metageneration, on the metadata associated with the object. With Object Versioning enabled you can simply keep uploading the new file and leave the revision handling to GCP. Listing files with the option to include generations and metadata will give you all files and their revisions.
Of course, you can also restore or retrieve a file by specifying the revision.
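For example, a rough sketch of listing generations and restoring one, assuming Object Versioning is enabled on the bucket (the bucket name and generation value are placeholders):

const { Storage } = require('@google-cloud/storage');

const storage = new Storage();
const bucket = storage.bucket('my-bucket');

async function listRevisions(objectName) {
  // versions: true returns noncurrent generations as well as the live object
  const [files] = await bucket.getFiles({ prefix: objectName, versions: true });
  files.forEach((f) => console.log(f.name, f.metadata.generation));
}

async function restoreRevision(objectName, generation) {
  // copying a specific generation over the object name makes it the live version again
  await bucket.file(objectName, { generation }).copy(bucket.file(objectName));
}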

Your goal is:
I have a file named doc_1_1. Now if the user uploads another file, the old file should be renamed doc_1_1_revision_1, then doc_1_1_revision_2, and so on, and the new file should be doc_1_1.
Google Cloud Storage does not support this naming technique. You will have to implement this on the client side as part of your upload process.
Another option is to enable "Object Versioning", where previous objects with the same name still persist. The last uploaded instance is the "current" version.
This link will help you understand object versions:
Object Versioning
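If you go that route, Object Versioning is a bucket-level setting; a minimal sketch of turning it on from Node.js (the console or gsutil work just as well, and the bucket name is a placeholder):

const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

async function enableVersioning(bucketName) {
  // once enabled, overwritten objects are kept as noncurrent generations
  await storage.bucket(bucketName).setMetadata({ versioning: { enabled: true } });
}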

Related

How to organize s3 uploads client/server with AWS SDK

I have a bucket that has multiple users, and I would like to pre-sign URLs for the client to upload to S3 (some files can be large, so I'd rather they not pass through the Node server). My question is this: until the Mongo database is hit, there is no Mongo ObjectId to use as a prefix for the file. I'm separating the files in this structure (UserID/PostID/resource) so you can check all of a user's pictures by looking under /UserID, and you can target a specific post by also adding the PostID. Conversely, there is no object URL until the client uploads the file, so I'm at a bit of an impasse.
Is it bad practice to rename files after they touch the bucket? I just can't pre-know the ObjectId (the post has to be created in Mongo first), but the user has to select which files they want to upload before the object is created. I was thinking the best flow could be one of two situations:
Client sets files -> Mongo creates document -> responds to client with ObjectId and pre-signed URLs for each file, with the key set to /UserID/PostID/name. After a successful upload, the client re-triggers an update function on the server to edit the URLs of the post. After the update, send success to the client.
Client uploads files to the root of the bucket -> Mongo doc is created where the URLs of the uploaded S3 files are stored -> iterate over the list and prepend the UserID and the newly-created PostID, updating the Mongo document -> success response to the client
Is there another approach that I don't know about?
Answering your question:
Is it bad practice to rename files after they touch the bucket?
If you are planning to use S3 to store your files, there is no server involved, so there is no problem changing these files after you upload them.
The only thing you need to understand is that renaming an object requires two requests (sketched below):
copy the object to the new name
delete the object under the old name
This means it could become a cost/latency concern if you have a huge number of changes, but for most cases this will not be a problem.
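A minimal sketch of that "rename" with the aws-sdk v2 Node.js client; the bucket and key names are placeholders:

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

async function renameObject(bucket, oldKey, newKey) {
  // 1. copy the object under the new key
  await s3.copyObject({
    Bucket: bucket,
    CopySource: `${bucket}/${oldKey}`,
    Key: newKey,
  }).promise();
  // 2. delete the object under the old key
  await s3.deleteObject({ Bucket: bucket, Key: oldKey }).promise();
}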
I think the first option would be a good fit for you. The only thing I would change is adding serverless processing for your objects/files; the AWS Lambda service is a good option for that.
In this case, instead of updating the files on the server, you update them from a Lambda function: you only need to add a trigger on your bucket for the PutObject event on S3. This way you can rename your files with the best processing time for your client and at low cost.
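A rough sketch of that Lambda, assuming the bucket trigger is configured for object-created (PutObject) events; the Mongo lookup is a hypothetical placeholder for your own query, and you would filter the trigger by prefix so the copied object does not re-trigger the function:

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    const { userId, postId } = await lookupPostForUpload(key); // hypothetical Mongo lookup
    const finalKey = `${userId}/${postId}/${key}`;

    // "rename" = copy to the final key, then delete the original
    await s3.copyObject({ Bucket: bucket, CopySource: `${bucket}/${key}`, Key: finalKey }).promise();
    await s3.deleteObject({ Bucket: bucket, Key: key }).promise();
  }
};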

Upload to AWS S3 Best Practice

I'm creating a node backend application and I have an entity which can have files assigned.
I have the following options:
Make a request and upload the files as soon as the user selects them in the frontend form and assign them to the entity when the user makes the request to create / update it
Upload the files in the same request which creates / updates the entity
I was wondering if there is a best practice for this scenario. I can't really decide what's better.
This is one of those "it depends" answers, and it depends on how you are doing uploads and whether you plan to clean up your S3 buckets.
I'd suggest creating the entity first (option #2), because then you can store which S3 files belong to that entity. If you tried option #1, you might have untracked files (or some kind of staging area), which could require cleanup at some point in the future. (If your files are small, it may never matter, and you just eat that $0.03/GB fee each month : )
I've seen some websites that look like option #1, where files are included in my form/document as I'm "editing". Pasting an image from my buffer is particularly sweet, and sometimes I see a text placeholder while it is uploaded, with the picture shown when complete. I think these "documents" are actually saved on their servers in some kind of draft status, so it might be your option #2 anyway. You could do the same: create a draft entity and finalize it later on (and then have a way to clean out drafts and their attachments at some point).
Also, depending on the bucket privacy you need to achieve, have a look at AWS Cognito to upload directly from the browser. You could save your server bandwidth, and reduce your request time, by not using your server as a pass-through.
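One sketch of the "no pass-through" idea using a presigned PUT URL generated on the Node server (Cognito, as mentioned above, is another way to achieve the same thing; the bucket, key, and expiry here are placeholders):

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

function getUploadUrl(key, contentType) {
  // the browser PUTs the file straight to S3 with this URL; your server never sees the bytes
  return s3.getSignedUrl('putObject', {
    Bucket: 'my-bucket',
    Key: key,
    ContentType: contentType,
    Expires: 300, // seconds until the URL stops working
  });
}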

Why is saving a file with its original name dangerous in an upload?

Currently I'm working on a web project (Classic ASP) and I'm going to build an upload form.
Folklore says:
"Don't use the real name to save the uploaded files."
What are the problems and dangers from a security point of view?
Proper directory permissions should stop most of this stuff, but I suppose for file names a potential danger is that someone could name a file something like "../Default.asp" or "../Malware.asp", or some other malicious path, attempting to overwrite files and/or place an executable script on your server.
If I'm using a single upload folder, I always save my users' uploads with a GUID file name, simply because users aren't very original and you get name conflicts very often otherwise.
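A small sketch of that GUID naming in Node.js, never using the user-supplied name to build a path (whitelisting extensions is a further precaution, outside this sketch):

const path = require('path');
const { randomUUID } = require('crypto');

function storedNameFor(originalName) {
  const ext = path.extname(path.basename(originalName)); // basename strips any "../" tricks from the supplied name
  return `${randomUUID()}${ext}`;                         // e.g. "3f9c1a0e-....jpg"
}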

Avoid overwriting blobs in Azure

If I upload a file to Azure Blob Storage in a container where a file with that name already exists, it overwrites the file. How can I avoid overwriting it? Below is the scenario:
Step 1 - upload a file "abc.jpg" to Azure in a container called, say, "filecontainer"
Step 2 - once it is uploaded, try uploading a different file with the same name to the same container
Output - it overwrites the existing file with the latest upload
My requirement - I want to avoid this overwrite, as different people may upload files with the same name to my container.
Please help.
P.S.
- I do not want to create different containers for different users
- I am using the REST API with Java
Windows Azure Blob Storage supports conditional headers, which you can use to prevent blobs from being overwritten. You can read more about conditional headers here: http://msdn.microsoft.com/en-us/library/windowsazure/dd179371.aspx.
Since you don't want a blob to be overwritten, you need to specify the If-None-Match conditional header and set its value to *. This causes the upload operation to fail with a Precondition Failed (412) error if the blob already exists.
Another idea would be to check for the blob's existence just before uploading (by fetching its properties); however, I would not recommend this approach as it may lead to concurrency issues.
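As a sketch, here is the same If-None-Match: * condition expressed with the @azure/storage-blob Node.js client (the question uses the REST API from Java, but the header is the same; the connection string, container, and content are placeholders):

const { BlobServiceClient } = require('@azure/storage-blob');

async function uploadIfAbsent(connectionString, blobName, content) {
  const container = BlobServiceClient
    .fromConnectionString(connectionString)
    .getContainerClient('filecontainer');
  const blob = container.getBlockBlobClient(blobName);
  try {
    // ifNoneMatch: '*' asks the service to reject the write if the blob already exists
    await blob.upload(content, Buffer.byteLength(content), {
      conditions: { ifNoneMatch: '*' },
    });
  } catch (err) {
    // surfaced as 412 Precondition Failed, or as a conflict error depending on the operation/version
    if (err.statusCode === 412 || err.statusCode === 409) {
      console.log(`${blobName} already exists, not overwritten`);
    } else {
      throw err;
    }
  }
}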
You have no control over the names your users give the files they upload. You do, however, have control over the names you store those files with. The standard way is to generate a GUID and name each file accordingly. The chance of a conflict is almost zero.
Simple pseudocode looks like this:
// generate a GUID and rename the file the user uploaded with the generated GUID
// store the name of the file in a dbase or what-have-you along with the GUID
// upload the file to blob storage using the name you generated above
Hope that helps.
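A runnable version of that pseudocode with the @azure/storage-blob Node.js client; the saveMapping database call is a hypothetical placeholder for however you persist the GUID-to-name mapping:

const { BlobServiceClient } = require('@azure/storage-blob');
const { randomUUID } = require('crypto');

async function storeUpload(connectionString, originalName, content) {
  const container = BlobServiceClient
    .fromConnectionString(connectionString)
    .getContainerClient('filecontainer');

  const blobName = randomUUID();               // generate a GUID for the stored name
  await saveMapping(blobName, originalName);   // hypothetical: persist GUID -> original name in your database

  // upload under the GUID, keeping the original name only as blob metadata
  await container.getBlockBlobClient(blobName).upload(content, Buffer.byteLength(content), {
    metadata: { originalname: originalName },
  });
  return blobName;
}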
Let me put it this way:
Step one - user X uploads a file "abc1.jpg" and you save it to a local folder XYZ
Step two - user Y uploads another file with the same name "abc1.jpg", and you save it again to the local folder XYZ
What do you do now?
With this I am illustrating that your question does not relate to Azure in any way!
Just do not rely on original file names when saving files, wherever you are saving them. Generate random names (GUIDs, for example) and "attach" the original name as metadata.

Amazon S3 Browser Based Upload - Prevent Overwrites

We are using Amazon S3 for images on our website, and users upload the images/files directly to S3 through our website. In our policy file we ensure it "begins-with" "upload/". Anyone is able to see the full URLs of these images since they are publicly readable after they are uploaded. Could a hacker come in and use the policy data in the JavaScript and the URL of the image to overwrite these images with their own data? I see no way to prevent overwrites after uploading once. The only solution I've seen is to copy/rename the file to a folder that is not publicly writeable, but that requires downloading the image and then uploading it again to S3 (since Amazon can't really rename in place).
If I understood you correctly, the images are uploaded to Amazon S3 via your server application.
So only your application has write permission to Amazon S3. Clients can upload images only through your application (which then stores them on S3). A hacker can only force your application to upload an image with the same name and overwrite the original one.
How do you handle the situation when a user uploads an image with a name that already exists in your S3 storage?
Consider the following actions:
The first user uploads an image some-name.jpg
Your app stores that image in S3 under the name upload-some-name.jpg
A second user uploads an image some-name.jpg
Will your application overwrite the original one stored in S3?
I think the question implies the content goes directly through to S3 from the browser, using a policy file supplied by the server. If that policy file has set an expiration, for example, one day in the future, then the policy becomes invalid after that. Additionally, you can set a starts-with condition on the writeable path.
So the only way a hacker could use your policy files to maliciously overwrite files is to get a new policy file, and then overwrite files only in the path specified. But by that point, you will have had the chance to refuse to provide the policy file, since I assume that is something that happens after authenticating your users.
So in short, I don't see a danger here if you are handing out properly constructed policy files and authenticating users before doing so. No need for making copies of stuff.
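As a sketch, a short-lived policy with a starts-with condition can be generated with the aws-sdk v2 createPresignedPost helper; scoping the writable path per user is one possible choice here, and the bucket name is a placeholder:

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

function getUploadPolicy(userId, callback) {
  s3.createPresignedPost({
    Bucket: 'my-bucket',
    Expires: 300, // the policy is useless after 5 minutes
    Conditions: [
      ['starts-with', '$key', `upload/${userId}/`], // can only write under this user's prefix
    ],
  }, callback);
}

// hand the returned { url, fields } to the browser only after authenticating the user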
Actually, S3 does have a copy feature that works great:
Copying Amazon S3 Objects
But as amra stated above, doubling your space by copying sounds inefficient.
Maybe it would be better to give the object some kind of unique ID like a GUID, and set additional user metadata beginning with "x-amz-meta-" to carry more information about the object, like the user that uploaded it, the display name, etc.
On the other hand, you could always check whether the key already exists and prompt with an error (sketched below).
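A minimal sketch of that existence check with the aws-sdk v2 client; note the race window between the check and the upload that other answers here warn about:

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

async function keyExists(bucket, key) {
  try {
    await s3.headObject({ Bucket: bucket, Key: key }).promise();
    return true; // the key is taken, prompt the user with an error
  } catch (err) {
    if (err.code === 'NotFound') return false; // 404: safe to upload under this key
    throw err;
  }
}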
