How to get the updated 'uploads' folder from AWS Elastic Beanstalk? - node.js

I host my server-side code on Elastic Beanstalk. I use multer to upload files to an 'uploads' folder, so through the API a client can store images, PDFs, etc. in this folder dynamically. Say that when I deployed the .zip, 3 files were stored in the uploads folder, and more files were added after deployment. Now if I change my code and deploy it again, an empty uploads folder gets created. If I download the previous code, I get only the 3 files that were there at the time of deployment. I'm unable to get back the files added after the code was deployed. How can I overcome this?

The first rule of hosting an app on Elastic Beanstalk is that your code should be stateless. By stateless, I mean it should not depend on the machine at all, as instances get created and shut down depending on scaling requirements.
What I do is everything you describe: the file is uploaded to the uploads folder, but then I store it in S3 (or somewhere else where it's safe if the instance is terminated). So basically the uploads folder is just a temporary location.
Content that is dynamically created should not be part of your codebase.
You can't get back the data that is lost: whenever you deploy a new version, the directory where your code is deployed is erased and the new version is copied there. I believe it's /var/app/current/.
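To make the "temporary location" idea concrete, here is a minimal sketch that stages uploads under the OS temp directory instead of inside the deployed code directory. The `stagingDir` helper and the `myapp` name are made up for illustration; multer is only mentioned in a comment and is assumed to be installed separately.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: returns a scratch directory outside the app's deploy
// directory, creating it if needed. Nothing stored here should be treated as
// durable; it goes away with the instance.
function stagingDir(appName = 'myapp') {
  const dir = path.join(os.tmpdir(), `${appName}-uploads`);
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Possible usage with multer's `dest` option:
//   const upload = multer({ dest: stagingDir() });
```

Files staged this way still have to be copied to durable storage before the request is considered done.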
Whenever you deal with uploads in the future, follow these steps:
1. Upload the file to a temp directory on the instance.
2. Copy it to somewhere safe (for example AWS S3).
3. Save the link to the object in safe storage (the S3 link) in the database so you can retrieve the upload later.
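The steps above could be sketched roughly like this. The `storage.put` and `db.saveUpload` interfaces are hypothetical stand-ins for whatever S3 client and database layer you use; `file.originalname` and `file.path` are the fields multer sets on a file written to disk.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Build a collision-resistant object key that keeps the original extension.
function buildObjectKey(originalName, prefix = 'uploads') {
  const ext = path.extname(originalName).toLowerCase();
  return `${prefix}/${crypto.randomUUID()}${ext}`;
}

// Steps 2 and 3: copy the temp file to durable storage, then record where it
// went. `storage` and `db` are hypothetical objects injected by the caller:
//   storage.put(key, localPath) -> Promise<string>  (resolves to the object URL)
//   db.saveUpload(record)       -> Promise<void>
async function persistUpload(file, storage, db) {
  const key = buildObjectKey(file.originalname);
  const url = await storage.put(key, file.path);
  await db.saveUpload({ key, url, originalName: file.originalname });
  return { key, url };
}
```

Passing the storage and database objects in keeps the function independent of any particular SDK, so the S3 part can be swapped or faked in tests.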

Related

Path where I store profile pictures is unreachable in production mode

My goal is to upload a profile picture. I did this in development mode using multer in Node.js. Multer asks for a path where to save the new picture.
In development mode, my Angular frontend and my Node.js backend were in the same project (see below for the project structure). The destination path used in Multer worked for development mode.
I then deployed my backend and frontend separately and now this path doesn't work. How can I make sure that the uploaded profile pictures end up in the same folder as they did in development?
This is the structure in development mode. The SRC folder contains the Angular frontend code and the backend folder contains the Node.js backend.
This is the path I used to store uploaded profile pictures with Multer. The problem now is that I deployed my backend and frontend separately to Heroku, so this path doesn't work anymore.
How can I change my path so that my uploaded profile pictures still get added to this assets/images/profile-pictures folder?
The filesystem that Heroku provides is ephemeral: any changes you make to it will be lost the next time your dyno restarts. This happens frequently (at least once per day).
Instead of storing uploaded files on the local filesystem, Heroku recommends storing them on a third-party service like Amazon S3. The multer-s3 library should let you do that fairly easily.
Once the files have been stored you can access them via Amazon's SDK or, if you've configured your uploads accordingly, via HTTP. Regular HTTP access can be authenticated or anonymous.
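As a rough illustration of the two pieces mentioned above, here is a sketch of a key callback in the `(req, file, cb)` shape that multer-s3 accepts for its `key` option, plus a helper that builds the plain HTTP URL of a stored object. `req.user.id` is an assumption about your auth layer, and the bucket and region names are placeholders.

```javascript
const path = require('node:path');

// Key callback for multer-s3's `key` option: decides where the object lands
// in the bucket. `req.user.id` is assumed to be set by your auth middleware.
function profilePictureKey(req, file, cb) {
  const ext = path.extname(file.originalname).toLowerCase();
  cb(null, `profile-pictures/${req.user.id}${ext}`);
}

// Virtual-hosted-style S3 URL for an object. Anonymous HTTP access through
// this URL only works if the object has been made publicly readable.
function objectUrl(bucket, region, key) {
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

// Possible wiring (multer, multer-s3 and an S3 client assumed installed):
//   multer({ storage: multerS3({ s3, bucket: 'my-bucket', key: profilePictureKey }) })
```

The database would then store either the key or the URL, and the Angular frontend loads the picture from that URL instead of from its own assets folder.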

AWS: What Happens to Static S3 Files When a New Instance of a Website is Deployed?

So a little background. We have a website (js, jquery, less, node) that is hosted on Amazon AWS S3 and is distributed using CloudFront. In the past we have stored our resources statically in the assets folder within app locally and on S3.
Recently we set up a Node lambda that listens to Kinesis events and generates a JSON file that is then stored within the assets folder in S3. Currently, the file in the bucket with the same key is overwritten and the site uses the generated file as it should.
My question is: what happens to that JSON file when we deploy a new instance of our website? Even if we remove the JSON file from the local assets folder, if the deployment overwrites the whole assets directory in the S3 bucket when a new version is deployed, does that result in the JSON file being removed?
Thanks in advance!
Please let me know if you need any more clarification.
That will depend on how you're syncing files. I recommend the "sync" command: it uploads only new or changed files, and a file that exists in S3 but not in your repo gets deleted only if you explicitly ask for that; otherwise it is left alone.
See for example the CLI command docs here: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html ... as you can see, if you specify --delete the files will be deleted.
But I'm not sure what your use case is: do you want that file to get deleted? It seems that you don't :)
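To show what that means for the generated JSON file, here is a deliberately simplified model of a one-way sync. It is not how the CLI is implemented: it skips the changed-file comparison and only models which remote keys would be removed. The `planSync` function and the file names are made up for the example.

```javascript
// Simplified model of syncing a local folder to a bucket. Every local file is
// treated as an upload candidate; remote-only files are deleted only when
// `deleteExtra` is set (the analogue of passing --delete).
function planSync(localKeys, remoteKeys, { deleteExtra = false } = {}) {
  const local = new Set(localKeys);
  const toUpload = [...local];
  const toDelete = deleteExtra ? remoteKeys.filter((key) => !local.has(key)) : [];
  return { toUpload, toDelete };
}

const localFiles = ['assets/app.js', 'assets/site.css'];
const bucketFiles = ['assets/app.js', 'assets/site.css', 'assets/generated.json'];

console.log(planSync(localFiles, bucketFiles).toDelete);
// []
console.log(planSync(localFiles, bucketFiles, { deleteExtra: true }).toDelete);
// [ 'assets/generated.json' ]
```

So as long as the deployment does not pass --delete (or otherwise wipe the prefix), a file written by the lambda that is missing locally stays in the bucket.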

For a web app that allows simple image uploads, how should I store the images? Confused about file system vs. cdn

Every search result says something about storing the images in the file system but store the paths in the database, but I'm not sure exactly what "file system" means. Would that mean you have something like:
/public (assets)
    /js
    /css
    /img
/app (frontend)
/server (backend)
and you'd upload directly to that /public/img directory?
I remember trying something like that in the past with a Node.js app hosted on Heroku, and it wouldn't let me. I had to set up Amazon S3 and upload the images THERE, which leads to my confusion.
Is using something like Amazon S3 the usual practice or do people upload directly to the /img directory (assuming this is the "file system"?) and it just happened to be the case that Heroku doesn't allow this but other hosts do?
I'd characterize the pattern as "store the data in a blob storage service, store a pointer in your database". The uploaded file is the "blob" - once it has left the user's computer and filesystem, is it really a file anymore? :) On the server, a file system can store that "blob". S3 can store that blob. In the first case, you are storing a path. In the second case, you are storing the URL to the S3 object. A database could even store that blob (not at all recommended, though...)
In any case, the question to ask is: "what happens when I need two app servers to support my traffic?". Wherever that blob goes, both app servers need access to it.
In a data center under your control, there are many ways to share a filesystem across servers - network attached storage (NFS- or SMB-mounted volumes), or storage area networks (iSCSI, Fibre Channel). With more limited network/hardware configuration options in cloud-based Infrastructure/Platform-as-a-Service providers, the de facto standard is S3 because it is inexpensive, reliable, easy to use, and can completely offload serving the file from your servers.
For Heroku, though, you don't have much control over the file system. And, know that the file system for each of your dynos is "ephemeral" - it goes away when the dyno restarts. Which will happen when your app goes idle, or every 24 hours, whichever comes first. So that forces the choice a little.
Final point - S3 comes with the ancillary benefit of taking the burden of serving the blob off of your servers. You can also store files directly to S3 from the browser, without routing it through your app (see https://devcenter.heroku.com/articles/s3-upload-node). The benefit in both cases is that those downloads/uploads can take up lots of your application's precious time for stuff that's pretty rote.
Uploading directly to a host file system is generally not a best practice. This is one reason services like S3 are so popular.
If you're using the host file system and ever need more than one instance of a server, the file systems will grow out of sync. Imagine one user uploads 'foo.jpg' to server A (A/app/uploads) and another uploads 'bar.jpg' to server B (B/app/uploads). When either of these images is later requested, the request has a 50% chance of failing, depending on whether the load balancer routes the request to server A or server B.
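A tiny simulation of that scenario, assuming two servers that each keep uploads on their own disk. The classes are invented for the example, and it uses a round-robin balancer instead of a random one so the outcome is deterministic.

```javascript
// Each server only knows about files uploaded to its own local disk.
class AppServer {
  constructor(name) {
    this.name = name;
    this.files = new Set();
  }
  upload(filename) {
    this.files.add(filename);
  }
  serve(filename) {
    return this.files.has(filename) ? 200 : 404;
  }
}

// Hands out servers in turn: A, B, A, B, ...
class RoundRobinBalancer {
  constructor(servers) {
    this.servers = servers;
    this.next = 0;
  }
  pick() {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length;
    return server;
  }
}

const balancer = new RoundRobinBalancer([new AppServer('A'), new AppServer('B')]);
balancer.pick().upload('foo.jpg'); // lands on A
balancer.pick().upload('bar.jpg'); // lands on B

// Two requests for the same file: one hits A, the other hits B.
const statuses = [balancer.pick().serve('foo.jpg'), balancer.pick().serve('foo.jpg')];
console.log(statuses); // [ 200, 404 ]
```

With shared storage such as S3, both servers would look in the same place and both requests would succeed.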
There are several ancillary benefits to avoiding the host filesystem. For instance, you can set the filesystem serving your app to read-only for increased security. Files are a form of state, and stateless web servers allow you to do things like blow away one instance and deploy another instance to take over its work.
You might find this of help:
https://codeforgeek.com/2014/11/file-uploads-using-node-js/
I used multer in my node.js server file to handle uploading from the front end. Basically I had an HTML form that would submit the image to the server file, where it would be handled by multer. This actually led to it being saved in the file system (to answer your question concretely, yes, this was to something like the /img directory right in your project file structure). My application is running on Heroku, and this feature works there as well. However, I would not recommend using the file system to store your images like this (I doubt you will have enough space for a large amount of images/files) - using AWS storage or a DB would be better.

Upload file and folder structure to S3

The users in my site need to be able to upload a bunch of files and folders into S3 while maintaining the folder structure.
Say they have the following files in their local boxes.
/file1.jpg
/some_folder/file2.jpg
After upload, I need their s3 urls to be
http://s3bucket.amazon.com/usersfolder/file1.jpg
http://s3bucket.amazon.com/usersfolder/some_folder/file2.jpg
How can I do this? To make matters a little more complicated, the upload from the client side can be initiated only after they download an upload policy.
Edit: I would like to know a solution for the front-end part of this question. It looks like on the server I can use a wildcard character to specify access permissions, so I am good on that part.
I am using Node.js/Express as a backend.
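One possible piece of the front-end side is mapping each file's relative path to an S3 key under the user's folder. The sketch below assumes the relative path is available on the client (for example from `webkitRelativePath` on files picked through a directory input); `toS3Key` is a made-up helper, and the policy-signed upload itself is not shown.

```javascript
// Turn a client-side relative path into an S3 key under the user's folder.
// Normalises Windows separators, drops empty and "." segments, and rejects
// ".." so a file cannot escape the user's prefix.
function toS3Key(userFolder, relativePath) {
  const segments = relativePath
    .replace(/\\/g, '/')
    .split('/')
    .filter((segment) => segment !== '' && segment !== '.');
  if (segments.includes('..')) {
    throw new Error(`Refusing path containing "..": ${relativePath}`);
  }
  const prefix = userFolder.replace(/^\/+|\/+$/g, '');
  return [prefix, ...segments].join('/');
}

console.log(toS3Key('usersfolder', '/file1.jpg'));
// usersfolder/file1.jpg
console.log(toS3Key('usersfolder', '/some_folder/file2.jpg'));
// usersfolder/some_folder/file2.jpg
```

S3 has no real folders, so keeping the slashes in the key is what preserves the folder structure in the resulting URLs.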

Heroku dynos not sharing the file system

My heroku web app has a feature to download images from S3. It works like this:
1. There is one endpoint (A) to request downloading an array of images, which returns a task id.
2. Those images are downloaded by A to the tmp Heroku folder of my app. When all images are downloaded, a zip file is created.
3. While images may still be downloading, the web client calls another endpoint (B) with the task id from step 1. This second endpoint checks how many images are already downloaded and returns a progress percentage. When the zip is already created, it "returns" the zip file and the images are downloaded.
This approach worked fine in Heroku with 1 dyno. Unfortunately, after scaling to 2 dynos, we realised that it doesn't work anymore. The reason is that dynos in Heroku don't share the same file system, and endpoints A and B are handled by different dynos. Therefore, the dyno handling endpoint B doesn't find any file.
Is there an easy way to make my approach work with multiple dynos?
If not, how should I implement the feature described? (downloading multiple images from S3 in a zip file)
You could create a second S3 bucket and push the zip file to the second S3 bucket once it's done downloading. Then you can redirect the client to download the zip file directly from S3.
Then set up a process that runs periodically to clean out anything old in that S3 bucket.
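The clean-up step could look something like the sketch below. It only decides which objects are old enough to remove; listing and deleting them is left to whatever S3 client you use. `selectExpired` is a made-up helper, and the input is assumed to have the `Key` and `LastModified` fields that S3 returns for each listed object.

```javascript
// Return the keys of objects older than `maxAgeMs` at time `now`.
// `objects` is assumed to look like the entries of an S3 object listing:
//   { Key: string, LastModified: Date | string }
function selectExpired(objects, now, maxAgeMs) {
  return objects
    .filter((object) => now.getTime() - new Date(object.LastModified).getTime() > maxAgeMs)
    .map((object) => object.Key);
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const listing = [
  { Key: 'zips/task-1.zip', LastModified: '2024-01-01T00:00:00Z' },
  { Key: 'zips/task-2.zip', LastModified: '2024-01-01T11:30:00Z' },
];

console.log(selectExpired(listing, new Date('2024-01-01T12:00:00Z'), ONE_HOUR_MS));
// [ 'zips/task-1.zip' ]
```

An S3 lifecycle rule that expires objects after a set number of days may be worth checking as an alternative to running your own periodic job.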
I think the solution is here; it may help:
http://technomile.github.io/wordpress/setup.html

Resources