Heroku dynos not sharing the file system - node.js

My Heroku web app has a feature to download images from S3. It works like this:
There is one endpoint (A) that accepts a request to download an array of images and returns a task id.
Endpoint A downloads those images to my app's tmp folder on Heroku, and when all the images are downloaded it creates a zip file.
While the images may still be downloading, the web client calls another endpoint (B) with the task id from step 1. This second endpoint checks how many images have already been downloaded and returns a progress percentage. Once the zip has been created, it returns the zip file and the client downloads the images.
This approach worked fine on Heroku with 1 dyno. Unfortunately, after scaling to 2 dynos, we have realised that it doesn't work anymore. The reason is that Heroku dynos don't share the same file system, and endpoints A and B are handled by different dynos. Therefore, the dyno serving endpoint B doesn't find any files.
Is there an easy way to make my approach work with multiple dynos?
If not, how should I implement the feature described? (downloading multiple images from S3 in a zip file)

You could create a second S3 bucket and push the zip file to it once the download is complete. Then you can redirect the client to download the zip file directly from S3.
Then set up a process that runs periodically to clean out anything old in that S3 bucket.
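For example, here is a minimal sketch of that idea in Node.js with the aws-sdk package; the bucket name, key prefix, and local zip path are placeholder assumptions you would replace with your own:
const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3();

// Once the zip has been written to the dyno's tmp folder, push it to the second bucket.
function uploadZip(taskId, localZipPath) {
  return s3.upload({
    Bucket: 'my-zip-downloads',            // hypothetical bucket name
    Key: 'zips/' + taskId + '.zip',
    Body: fs.createReadStream(localZipPath)
  }).promise();
}

// Endpoint B can then redirect the client to a short-lived URL instead of streaming the file.
function zipDownloadUrl(taskId) {
  return s3.getSignedUrl('getObject', {
    Bucket: 'my-zip-downloads',
    Key: 'zips/' + taskId + '.zip',
    Expires: 60 * 10                       // link valid for 10 minutes
  });
}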

I think the solution here may help:
http://technomile.github.io/wordpress/setup.html

Related

I want to record some user actions for my Python Flask app deployed on Heroku. Please suggest ways to do it

I have a Python Flask app deployed on Heroku. I want to record user interactions in a file (a kind of log file). Since Heroku storage is temporary, even though I append actions to a log file, the data is lost. I don't want to use a database for this simple task. My idea is to have an API that can modify files in a remote file system. I am looking for such a remote file system (cloud storage) along with an API to accomplish my task.
For example, let us assume that I have 3 buttons on my app and a tracking.txt file. Then
if button1 is clicked, I want to write (append) 1 to tracking.txt.
Similarly for button2 and button3.
I have searched the internet but didn't find anything that fits my exact need, or at least didn't understand the options well.
Any help is appreciated. Thanks in advance.
PS: I am open to changing my approach if there's no way other than using a DB.
One possible solution is to use Amazon S3 together with Boto3, the Amazon Web Services (AWS) SDK for Python.
You can copy (push) your file from Heroku to an S3 bucket (at intervals or after every change, depending on your logic):
import boto3

# Create an S3 client with explicit credentials (on Heroku, set these as config vars).
session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='MY_AWS_ACCESS_KEY_ID',
    aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
)

# Upload the file from its local (ephemeral) path to the S3 bucket.
s3.upload_file(Bucket='data', Key='files/file1.log', Filename='/tmp/file1.log')
An added benefit of this approach is that you can use LocalStack for your local development, so only your (production-like) application on Heroku sends files to S3, while during development you can work offline.

Using backend files in node.js

Sorry, it might be a very novice problem, but I am new to Node and web apps and have been stuck on this for a couple of days.
I have been working with an API called "Face++" that requires users to upload images to detect faces. So basically users need to upload images to my web app's backend, and my backend makes an API request with that image. I managed to upload the files to my Node backend using the tutorial provided below, but now I am struggling with how to use those image files. I really don't know how to access those files. I thought passing just the filepath/filename would help, but it did not. I am really new to web apps.
I used the tutorial here to upload my files on the back end: https://coligo.io/building-ajax-file-uploader-with-node/
thanks
You can also use the Face++ REST API node client
https://www.npmjs.com/package/faceppsdk
As per its documentation, it requires a live URL on the web, so you would have to upload your files to a remote location first (you could upload them to an Amazon S3 bucket).
You can also check the sample code in the documentation, which shows how to upload directly to Face++.
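Alternatively, if the image is already saved on your backend's disk (for example by the upload tutorial you followed), you can post the file itself to the Face++ detect endpoint as multipart form data. This is just a minimal sketch, assuming the v3 detect URL, the form-data npm package, and that your API key/secret live in environment variables:
const fs = require('fs');
const FormData = require('form-data');

// Send a locally stored image to the Face++ v3 detect endpoint (URL assumed here).
function detectFaces(localImagePath) {
  const form = new FormData();
  form.append('api_key', process.env.FACEPP_API_KEY);
  form.append('api_secret', process.env.FACEPP_API_SECRET);
  form.append('image_file', fs.createReadStream(localImagePath));

  return new Promise((resolve, reject) => {
    form.submit('https://api-us.faceplusplus.com/facepp/v3/detect', (err, res) => {
      if (err) return reject(err);
      let body = '';
      res.on('data', chunk => { body += chunk; });
      res.on('end', () => resolve(JSON.parse(body)));
    });
  });
}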

AWS: What Happens to Static S3 Files When a New Instance of a Website is Deployed?

So a little background. We have a website (js, jquery, less, node) that is hosted on Amazon AWS S3 and is distributed using CloudFront. In the past we have stored our resources statically in the assets folder, both locally within the app and on S3.
Recently we have set up a node lambda that listens to Kinesis events and generates a json file that is then stored within the assets folder in S3. Currently, the file in the bucket with the same key is overwritten, and the site uses the generated file as it should.
My question is, what happens to that json file when we deploy a new instance of our website? Even if we remove the json file from the local assets folder, if the deployment overwrites the whole assets directory in S3 when a new version is deployed, does that result in the json file being removed?
Thanks in advance!
Please let me know if you need any more clarification.
That will depend on how you're syncing files. I recommend you use the "sync" command so that only new or changed files are uploaded; a file that doesn't exist in your repo but does exist in S3 will only be deleted if you explicitly ask for that.
See for example the CLI command docs here: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html. As you can see, if you specify --delete, such files will be deleted.
But I'm not sure what your use case is. Do you want that file to get deleted? It seems that you don't want that :)
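For example (the bucket name and local path below are placeholders):
# Upload only new or changed files; the generated json already in the bucket is left alone.
aws s3 sync ./assets s3://my-site-bucket/assets

# With --delete, anything in the bucket that is not in ./assets (e.g. the generated json) is removed.
aws s3 sync ./assets s3://my-site-bucket/assets --delete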

How to get the updated 'uploads' folder from AWS Elastic BeanStalk?

I hosted my server-side code on Elastic Beanstalk. I used multer to upload files into an 'uploads' folder, which means that through the API the client can dynamically store images, PDFs, etc. in this 'uploads' folder. Say that when I hosted the .zip on EB, 3 files were stored in the uploads folder, and more files were added after hosting. Now if I change my code and deploy it to EB, an empty uploads folder is created. If I download the previous code, I get only the 3 files that were there at the time of hosting. I'm unable to get back the files added after the code was hosted. How do I overcome this?
The first rule of hosting an app on Elastic Beanstalk is that your app should be stateless. By stateless, I mean it should not depend on the machine at all, as instances get created and shut down depending on scaling requirements.
What I do is everything you describe: upload the file to the uploads folder, but then store it in S3 (or somewhere it's safe even if the instance is terminated). So basically the uploads folder is just a temporary location.
The content which is dynamically created should not be part of your codebase.
You can't get back the data that is lost: whenever you deploy a new version, the directory where your code is deployed is erased and the new version is copied there. I believe it's /var/app/current/.
Whenever you deal with uploads in the future, you should (see the sketch below):
Upload the file to a temp directory on the instance,
Upload it somewhere safe (something like AWS S3),
Save the link to the object in safe storage (the S3 URL) in your database so you can retrieve the uploads whenever you need them.
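Here is a minimal sketch of those three steps in Node.js with multer and the aws-sdk package; the bucket name and the database helper are placeholder assumptions you would adapt to your app:
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const AWS = require('aws-sdk');

const app = express();
const upload = multer({ dest: '/tmp/uploads/' });    // step 1: temp directory on the instance
const s3 = new AWS.S3();

app.post('/upload', upload.single('file'), async (req, res) => {
  // step 2: push the temp file to S3 so it survives redeploys and instance churn
  const result = await s3.upload({
    Bucket: 'my-app-uploads',                        // hypothetical bucket name
    Key: 'uploads/' + req.file.originalname,
    Body: fs.createReadStream(req.file.path)
  }).promise();

  // step 3: persist result.Location (the S3 URL) in your database instead of the local path
  // await db.saveUpload({ url: result.Location });  // hypothetical DB helper
  res.json({ url: result.Location });
});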

For a web app that allows simple image uploads, how should I store the images? Confused about file system vs. cdn

Every search result says something about storing the images in the file system but store the paths in the database, but I'm not sure exactly what "file system" means. Would that mean you have something like:
/public (assets)
/js
/css
/img
/app (frontend)
/server (backend)
and you'd upload directly to that /public/img directory?
I remember trying something like that in the past with a Node.js app hosted on Heroku, and it wouldn't let me. I had to set up Amazon S3 and upload the images THERE, which leads to my confusion.
Is using something like Amazon S3 the usual practice or do people upload directly to the /img directory (assuming this is the "file system"?) and it just happened to be the case that Heroku doesn't allow this but other hosts do?
I'd characterize the pattern as "store the data in a blob storage service, store a pointer in your database". The uploaded file is the "blob" - once it has left the user's computer and filesystem, is it really a file anymore? :) On the server, a file system can store that "blob". S3 can store that blob. In the first case, you are storing a path. In the second case, you are storing the URL to the S3 object. A database could even store that blob (not at all recommended, though...)
In any case, the question to ask is: "what happens when I need two app servers to support my traffic?". Wherever that blob goes, both app servers need access to it.
In a data center under your control, there are many ways to share a filesystem across servers - network attached storage (NFS- or SMB-mounted volumes), or storage area networks (iSCSI, Fibre Channel). With more limited network/hardware configuration options in cloud-based Infrastructure/Platform-as-a-Service providers, the de facto standard is S3 because it is inexpensive, reliable, easy to use, and can completely offload serving the file from your servers.
For Heroku, though, you don't have much control over the file system. And, know that the file system for each of your dynos is "ephemeral" - it goes away when the dyno restarts. Which will happen when your app goes idle, or every 24 hours, whichever comes first. So that forces the choice a little.
Final point - S3 comes with the ancillary benefit of taking the burden of serving the blob off of your servers. You can also store files directly to S3 from the browser, without routing them through your app (see https://devcenter.heroku.com/articles/s3-upload-node). The benefit in both cases is that those downloads/uploads no longer eat up your application's precious time on work that's pretty rote.
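To illustrate the direct-from-browser upload just mentioned: the server only has to hand the client a short-lived signed URL. This is a rough sketch with the aws-sdk package, where the bucket name and route are placeholders:
const AWS = require('aws-sdk');
const express = require('express');

const s3 = new AWS.S3();
const app = express();

// The browser asks for a signed URL, then PUTs the image straight to S3,
// so the upload never passes through the app server.
app.get('/sign-upload', (req, res) => {
  const url = s3.getSignedUrl('putObject', {
    Bucket: 'my-image-uploads',            // hypothetical bucket name
    Key: 'img/' + req.query.filename,
    ContentType: req.query.type,
    Expires: 60                            // URL valid for 60 seconds
  });
  res.json({ url });
});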
Uploading directly to a host file system is generally not a best practice. This is one reason services like S3 are so popular.
If you're using the host file system and ever need more than one instance of a server, the file systems will grow out of sync. Imagine one user uploads 'foo.jpg' to server A (A/app/uploads) and another uploads 'bar.jpg' to server B (B/app/uploads). When either of these images is later requested, the request has a 50% chance of failing, depending on whether the load balancer routes the request to server A or server B.
There are several ancillary benefits to avoiding the host filesystem. For instance, you can set the filesystem serving your app to read-only for increased security. Files are a form of state, and stateless web servers allow you to do things like blow away one instance and deploy another instance to take over its work.
You might find this of help:
https://codeforgeek.com/2014/11/file-uploads-using-node-js/
I used multer in my node.js server file to handle uploading from the front end. Basically I had an HTML form that would submit the image to the server file, where it would be handled by multer. This actually led to it being saved in the file system (to answer your question concretely: yes, this was something like the /img directory right in your project file structure). My application is running on Heroku, and this feature works there as well. However, I would not recommend using the file system to store your images like this (I doubt you will have enough space for a large number of images/files) - using AWS storage or a DB would be better.
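For reference, a minimal sketch of that multer setup (the directory, route, and field names are placeholders, and remember that on Heroku anything written here disappears when the dyno restarts):
const express = require('express');
const multer = require('multer');
const path = require('path');

const app = express();

// Store uploaded images directly in the project's public/img directory.
const storage = multer.diskStorage({
  destination: path.join(__dirname, 'public', 'img'),
  filename: (req, file, cb) => cb(null, Date.now() + '-' + file.originalname)
});
const upload = multer({ storage });

// An HTML form posts the image to this route; multer writes it to disk first.
app.post('/upload', upload.single('image'), (req, res) => {
  res.json({ path: '/img/' + req.file.filename });   // path the browser can then request
});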
