Static Files and Heroku - node.js

I've been gathering ideas for a web app that will include functionality for merging PDFs based on user selections. There won't be a huge number of documents, but the problem is that the Node plugin I'm using to merge them doesn't seem to be able to pull them down from S3 for the process. I know that storing static files on Heroku is frowned upon, but if they're not something the user changes, is it okay to store some of them there, or is there something I'm overlooking? The argument I've heard against storing anything on Heroku is that the filesystem is ephemeral, so the PDFs the user generates would be deleted when the dyno restarts... but that's no problem, because when they create them, it's a one-and-done download situation. Am I going to run into any issues storing just 100-200 MB of PDFs on the dyno, or is there some clever way I could bridge that gap?
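If the merge plugin only works with local paths, one way to bridge the gap is to keep the canonical copies in S3 and copy them into the dyno's /tmp at request time; /tmp is writable, and its ephemerality doesn't matter for one-and-done output. A minimal sketch, assuming the AWS SDK v2; mergePdfs() is a hypothetical stand-in for whatever merge plugin you're using:

const AWS = require('aws-sdk');
const fs = require('fs');
const path = require('path');

const s3 = new AWS.S3();

// Copy one object from S3 into the dyno's ephemeral /tmp.
async function downloadToTmp(bucket, key) {
  const { Body } = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  const localPath = path.join('/tmp', path.basename(key));
  fs.writeFileSync(localPath, Body);
  return localPath;
}

// Pull down the selected PDFs, then hand local paths to the merge plugin.
async function buildMergedPdf(bucket, keys) {
  const localPaths = [];
  for (const key of keys) {
    localPaths.push(await downloadToTmp(bucket, key));
  }
  return mergePdfs(localPaths, '/tmp/merged.pdf'); // hypothetical merge call
}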

Related

Saving records in a protected folder .htaccess, secure?

I have done quite a lot of searching but haven't really been able to find a clear answer. I'm wondering whether storing simple generated record documents (.txt files, e.g. purchase records) in a directory protected with deny from all is secure? Obviously, anyone going directly to the file in the browser will not be able to access it, but I wonder whether the information in these text files is visible in other ways?
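For reference, the deny-from-all setup in question is typically one of these two forms in an .htaccess file (the second is the Apache 2.4+ syntax):

# Apache 2.2
Order deny,allow
Deny from all

# Apache 2.4+
Require all denied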
Why store them in a place accessible by the browser at all? Can’t you place the files elsewhere on the server, in a directory that the HTTP server does not serve?
I assume you would like to access them later through the browser, and if that’s the case, can’t you create those reports on the fly each time a request is made for them? I have seen servers littered with saved reports when the best solution would have been to regenerate the reports by retrieving the data from a database. Please do not take this as an insult, but if my assumption is correct, try to consider another solution.
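As a rough sketch of the generate-on-the-fly idea, assuming Node/Express (your stack may differ); getPurchaseRecord() is a hypothetical database helper:

const express = require('express');
const app = express();

// Rebuild the record from the database on each request instead of keeping
// .txt files anywhere the HTTP server can reach. Remember to also check that
// the requester is actually allowed to see this record.
app.get('/records/:id', async (req, res) => {
  const record = await getPurchaseRecord(req.params.id); // hypothetical DB lookup
  if (!record) return res.sendStatus(404);
  res.type('text/plain').send(record.toText());          // hypothetical formatter
});

app.listen(3000);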
Technically, the answer to your question is “those files are not accessible if the server is configured correctly, you have no bugs in the code, etc.”

Huge file downloads: what should I use to build this?

I want to build an API that lets users download files, but there is one problem: the files can be huge, sometimes more than 100 GB. I'm thinking about building the API with Node.js, but I don't know if it's a good idea to implement the file download feature in Node. Some users may spend more than a day on a single download; Node is single-threaded, and I'm afraid that could tie it up too long and make the other requests slower, or worse, block them.
I'm going to use cloud computing to host this API, and I'm going to start studying serverless hosts to see whether they're worth it in my case. Do you have any idea what I should use to build the download feature? Is there any open-source code to use as an example?
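For what it's worth, Node can serve large downloads without blocking the event loop as long as you stream rather than buffer. A minimal sketch with Express (paths and names here are assumptions; in production you would also harden the file-name handling, and consider redirecting to presigned cloud-storage URLs so the bytes never pass through your API at all):

const express = require('express');
const fs = require('fs');
const path = require('path');
const app = express();

app.get('/download/:name', (req, res) => {
  // path.basename() blocks trivial ../ traversal; harden further in real use.
  const filePath = path.join('/data/files', path.basename(req.params.name));
  res.setHeader('Content-Disposition',
    `attachment; filename="${path.basename(filePath)}"`);
  // createReadStream sends the file chunk by chunk; a day-long download just
  // means the event loop occasionally pushes another chunk for this socket,
  // while other requests keep being served in between.
  const stream = fs.createReadStream(filePath);
  stream.on('error', () => res.sendStatus(404));
  stream.pipe(res);
});

app.listen(3000);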

Best image upload directory structure practices?

I have developed a large web application with NodeJS. I allow my users to upload multiple images to my Google Cloud Storage bucket.
Currently, I am storing all images under the same directory of /uploads/images/.
I'm beginning to think that this is not the safest approach, and it could affect performance later down the track when the directory has thousands of images. It also opens up a threat, since some images are meant to be private, and it could allow users to find images by guessing a unique ID, such as uploads/images/29rnw92nr89fdhw.png.
Would I be better off changing my structure to something like /uploads/{user-id}/images/ instead? That way each directory would only have a couple dozen images. But can a directory handle thousands of subdirectories without suffering performance issues? Does Google Cloud Storage happen to account for issues like this?
GCS does not actually have "directories." They're an illusion that the UI and command-line tools provide as a nicety. As such, you can put billions of objects inside the same "directory" without running into any problems.
One addendum there: if you are inserting more than a thousand objects per second, there are some additional caveats worth being aware of. In that case, you would see a performance benefit from avoiding sequential object names. In other words, uploading /uploads/user-id/images/000000.jpg through /uploads/user-id/images/999999.jpg, in order, in rapid succession, would likely be slower than using random object names. GCS has documentation with more on this, but it should not be a concern unless you are uploading more than 1000 objects per second.
A nice, long GUID should be effectively unguessable (or at least no more guessable than a password or an access token), but they do have the downside of being non-revocable without renaming the image. Once someone knows it, they know it forever and can leak it to others. If you need firm control of your objects, you could keep them all private and visible only to your project and allow users to access them only via signed URLs. This offers you the most flexibility and control, but it's also harder to implement.
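A minimal sketch of the signed-URL approach with the @google-cloud/storage client (the bucket name and object layout are assumptions):

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

// Objects stay private; the app hands out short-lived V4 read URLs instead of
// relying on unguessable names. "Revoking" access just means not re-issuing.
async function getSignedImageUrl(userId, imageId) {
  const [url] = await storage
    .bucket('my-app-images') // assumed bucket name
    .file(`uploads/${userId}/images/${imageId}`)
    .getSignedUrl({
      version: 'v4',
      action: 'read',
      expires: Date.now() + 15 * 60 * 1000, // valid for 15 minutes
    });
  return url;
}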

Storing quick analytics using Redis and Node.js

I am new to Redis and would like to store web analytics for my site, both globally and per user.
Below is what I have so far.
// to get all unique IPs
client.sadd('visitors', ip);
// to record hits per IP
client.hincrby('hits', ip, 1);
The above works fine so far, and I do get the number of distinct IPs and a hit counter per IP.
The problem comes when storing the activities performed by each IP, i.e. storing the links they clicked and the searches they made, with a datetime.
Can someone please shed some light on how best to manage this?
Thanks
“the problem comes when storing the activities performed by each IP”
You will need a separate structure for storing these.
The simplest rational structure is a “list of actions by session”. Take a look at the sorted set commands, which provide a basic framework for creating a list of actions within a session.
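A minimal sketch in the same node_redis style as the code above, with the timestamp as the score so actions come back in time order:

// One sorted set per IP/session; the score is the timestamp, the member is a
// JSON blob describing the action (the shape of "action" is up to you).
function logAction(ip, action) {
  const entry = JSON.stringify({ ...action, at: new Date().toISOString() });
  client.zadd('actions:' + ip, Date.now(), entry);
}

// e.g. logAction(ip, { type: 'click', url: '/pricing' });
// client.zrange('actions:' + ip, 0, -1, cb) then returns them oldest first.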
This will get you something quickly. However, it is probably not what you really want; in fact, Redis is probably not the right tool for this at all.
If you want to retrace an entire site visit, you really want to connect to some sort of true analytics framework. There are dozens of website tracking tools that provide this type of functionality, so it's not clear that building one yourself is very efficient.

What kind of security issues will I have if I provide my web app write access?

I would like to give my web application write access to a particular folder on my web server. My web app can create files in this folder and write data to those files. However, the web app does not provide any interface to the users, nor does it publicize the fact that it can create or write files. Am I susceptible to any security vulnerabilities? If so, what are they?
You are susceptible to having your server tricked into writing malicious files to that location.
The issues that can arise from that depend on what happens with that folder.
Is it web-accessible?
Then malicious files can be hosted there: scripts that steal cookies, say, or pages that serve up malware.
Is it a folder where applications are executed automatically?
This would be madness. Do not do this.
Is it just some place where you store files for later processing?
Consider what could happen if malicious files were put there. Malicious PDFs, say, get fed into your PDF processing system, a bug in the PDF parser is triggered, attacker-controlled code runs, and then it's all over.
Basically, the issue you expose yourself to is, as I said, malicious files in that location. Think carefully through what happens in that folder and how exposed it is, and decide for yourself how risky it is.
With those risks identified, you can then decide how to proceed. Since you presumably don't allow direct uploads to that area, you can consider the risk significantly lower: you are essentially assessing a situation in which someone has found a bug in your web server that lets them tell it to save a file somewhere, without you ever providing that access. I'd hazard that there aren't hundreds of bugs of that type, though there may be some. Hence it is appropriate to minimise the risk posed by a file in that folder, by making sure the folder and the files in it are used in a restricted way and, if possible, checked to see that they are “good” files.
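As one concrete example of checking for “good” files, here is a sketch of a cheap sanity check before feeding anything from that folder into a PDF pipeline; real PDFs start with the bytes %PDF-. This filters out accidents and lazy attacks, not a determined attacker, since a file can be a valid PDF and still exploit a parser bug:

const fs = require('fs');

// Read just the first five bytes and compare against the PDF magic number.
function looksLikePdf(filePath) {
  const header = Buffer.alloc(5);
  const fd = fs.openSync(filePath, 'r');
  fs.readSync(fd, header, 0, 5, 0);
  fs.closeSync(fd);
  return header.toString('ascii') === '%PDF-';
}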
