NodeJS API sync uploaded files through SFTP

I have a NodeJS REST API which has endpoints for users to upload assets (mostly images). I distribute my assets through a CDN. How I do it right now is: the client calls my endpoint /assets/upload with a multipart form, the API creates the DB resource for the asset and then uses SFTP to transfer the image to the CDN origin. Upon success I respond with the URL of the uploaded asset.
I noticed that the most expensive operation for relatively small files is the connection to the origin through SFTP.
So my first question is:
1. Is it a bad idea to always keep the connection alive so that I can always reuse it to sync my files?
My second question is:
2. Is it a bad idea to have my API handle the SFTP transfer to the CDN origin? Should I consider having a CDN origin that could handle the HTTP request itself?

Short Answer: (1) It is not a bad idea to keep the connection alive, but it comes with complications; I recommend trying without reusing connections first. And (2) the upload should go through the API, but there may be ways to optimize how the API-to-CDN transfer happens.
Long Answer:
1. Is it a bad idea to always keep the connection alive so that I can always reuse it to sync my files?
It is generally not a bad idea to keep the connection alive. Reusing connections can improve site performance, generally speaking.
However, it does come with some complications. You need to make sure the connection is up. You need to make sure that if the connection went down you recreate it. There are cases where the SFTP client thinks that the connection is still alive, but it actually isn't, and you need to do a retry. You also need to make sure that while one request is using a connection, no other requests can do so. You would possibly want a pool of connections to work with, so that you can service multiple requests at the same time.
If you're lucky, the SFTP client library already handles this (see if it supports connection pools). If you aren't, you will have to do it yourself.
My recommendation - try to do it without reusing the connection first, and see if the site's performance is acceptable. If it isn't, then consider reusing connections. Be careful though.
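If you do go down the connection-reuse road, a minimal sketch of the idea is below, assuming the ssh2-sftp-client package; the host, credentials and the retry-once policy are placeholder choices, and it does not serialize concurrent uploads, which is part of the complication described above.

```javascript
// Sketch: lazily create one shared SFTP connection and retry once if it is stale
// (assumes ssh2-sftp-client; host, credentials and paths are placeholders).
const SftpClient = require('ssh2-sftp-client');

const config = {
  host: 'cdn-origin.example.com', // hypothetical CDN origin host
  port: 22,
  username: 'deploy',
  password: process.env.SFTP_PASSWORD,
};

let sftp = null;

async function getConnection() {
  if (sftp) return sftp;
  const client = new SftpClient();
  await client.connect(config);
  sftp = client;
  return sftp;
}

async function uploadAsset(buffer, remotePath) {
  try {
    const client = await getConnection();
    await client.put(buffer, remotePath);
  } catch (err) {
    // The cached connection may have silently died: drop it and retry once.
    sftp = null;
    const client = await getConnection();
    await client.put(buffer, remotePath);
  }
}

module.exports = { uploadAsset };
```

A real pool would also limit how many requests share a connection at once; treat this only as the starting point for measuring whether reuse actually helps.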
2. Is it a bad idea to have my API handle the SFTP transfer to the CDN origin? Should I consider having a CDN origin that could handle the HTTP request itself?
It is generally a good idea to have the HTTP request go through the API for a couple of reasons:
For security reasons, you want your CDN upload credentials to be stored on your API, not on your client (website or mobile app). You should assume that your website code can be seen (via view source) and that people can decompile or reverse engineer mobile apps, so they'll be able to see your credentials in the code.
This hides implementation details from the client, so you can change this in the future without the client code needing to change.
@tarun-lalwani's suggestion is actually a good one - use S3 to store the image, and use a lambda trigger to upload it to the CDN. There are a couple of Node.js libraries that allow you to stream the image through your API's HTTP request directly to the S3 bucket. This means that you don't have to worry about disk space on your machine instance.
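As a rough sketch of that streaming idea (assuming busboy and the AWS SDK v3 @aws-sdk/lib-storage helper; the bucket name, region, route and single-file assumption are placeholders):

```javascript
// Sketch: pipe an incoming multipart upload straight to S3 without touching disk
// (assumes busboy and AWS SDK v3; bucket/region/route names are placeholders).
const express = require('express');
const busboy = require('busboy');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

const app = express();
const s3 = new S3Client({ region: 'us-east-1' });

app.post('/assets/upload', (req, res) => {
  const bb = busboy({ headers: req.headers });

  // Assumes one file per request for simplicity.
  bb.on('file', async (name, fileStream, info) => {
    const upload = new Upload({
      client: s3,
      params: {
        Bucket: 'my-asset-bucket',                    // placeholder bucket
        Key: `uploads/${Date.now()}-${info.filename}`,
        Body: fileStream,                             // streamed, never buffered on disk
        ContentType: info.mimeType,
      },
    });

    try {
      const result = await upload.done();
      res.json({ url: result.Location });
    } catch (err) {
      res.status(500).json({ error: 'upload failed' });
    }
  });

  req.pipe(bb);
});

app.listen(3000);
```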
Regarding your question on @tarun-lalwani's comment - one way to do it is to use the S3 image URL path until the lambda function has finished. S3 can serve images too, if given the proper permissions to do so. Then, after the lambda function has finished uploading to the CDN, you just replace the image path in your DB.

Related

Reusing RabbitMQ connection per expressjs api request

We are trying to use RabbitMQ through a library called amqplib (not node-amqplib). Now all the docs and examples I've read say the connections should be long-lived. My original thought was that as soon as someone makes a request to our web server (Express), it would open a connection, thus open a channel, thus a queue, and so on and so forth. But that means per request we would constantly be opening and closing, which is said to not be how you should do it.
So, with that said, if you have an Express server or some Node.js architecture that involves publishing data per HTTP request, is there some standard? Do I make the connection object global and then pass it into (or call it from) any functions that need it? Is there some better way to get that connection instance per request without re-creating it?
Thanks in advance!
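A common pattern is sketched below: create the connection and channel once, keep them in a module, and have request handlers only publish on the shared channel. This assumes amqplib; the queue name, URL and the simple lazy-reconnect logic are placeholders, not a hardened setup.

```javascript
// Sketch: one amqplib connection/channel shared by all Express requests
// (queue name and URL are placeholders).
const amqp = require('amqplib');
const express = require('express');

let channel = null;

async function getChannel() {
  if (channel) return channel;
  const connection = await amqp.connect(process.env.AMQP_URL || 'amqp://localhost');
  channel = await connection.createChannel();
  await channel.assertQueue('events');                 // hypothetical queue
  connection.on('close', () => { channel = null; });   // reconnect lazily next time
  return channel;
}

const app = express();
app.use(express.json());

app.post('/publish', async (req, res) => {
  // Publish on the shared channel; never reconnect per request.
  const ch = await getChannel();
  ch.sendToQueue('events', Buffer.from(JSON.stringify(req.body)));
  res.sendStatus(202);
});

app.listen(3000);
```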

Which approach is better? Upload a file from the client side or from the server side?

I want to upload a file to AWS S3. I use Node.js as the server.
I wanted to know which approach is better: uploading it from the client side or from the server side?
As the file size is large, I am worried about the bandwidth needed to send it to the server side and process it there.
At the same time, I am worried about the security issues of the key if I process it from the client side.
What are the pros and cons of uploading a file from the client side versus the server side?
Client -> Your Server -> S3 Upload Considerations:
Security - How will you control who uploads files to your server?
Security - How will you control how large of files are uploaded to your server?
Cost - Inbound traffic to your server is free. Outbound from your server to S3 will be charged to you.
Performance - The type and size of your instance will determine overall upload performance from the user to your server and then from your server to S3.
Reliability - Your server(s) will need to be available for the users to upload to your server.
Reliability - Your server needs reliable code to transfer the files to S3.
Complexity - You are in control of, and responsible for, transferring the files to S3. In the customer's mind, upload problems to your server will be your fault.
Client Side Upload Directly to S3 Considerations:
Security - You have many options for granting users the rights to upload to S3: a) Signed URLs, b) Access Keys (a horrible idea), c) Cognito to provide temporary Access Keys. Recommendation: use Signed URLs generated on your server and hand no access keys of any kind to the client (a sketch follows this list).
Cost - Inbound from the client to S3 is free.
Performance - Your server is not the man in the middle in regards to performance. Performance will be limited to the performance of the user's Internet connection. Customers will be less likely (but not always) to blame you for poor upload problems.
Reliability - There are lots of well tested libraries available for uploading to S3. Recommendation: invest in a production quality client (web browser) library that handles Internet interruptions and other issues, retries, nice user prompts, background uploads, etc.
Complexity - You will still need to write code (or purchase) for both the server and the client side. However, by choosing good quality software you can reduce your headaches.
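As a concrete illustration of the Signed URL recommendation above, here is a rough server-side sketch using AWS SDK v3's request presigner; the bucket, key scheme, region, expiry and the placeholder auth comment are assumptions, not a drop-in implementation.

```javascript
// Sketch: issue a pre-signed PUT URL so the browser uploads straight to S3
// (assumes AWS SDK v3; bucket, key scheme, region and expiry are placeholders).
const express = require('express');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const app = express();
const s3 = new S3Client({ region: 'us-east-1' });

app.get('/uploads/sign', async (req, res) => {
  // Authenticate/authorize the user here before handing out a URL.
  const key = `uploads/${Date.now()}-${req.query.filename}`;
  const command = new PutObjectCommand({ Bucket: 'my-upload-bucket', Key: key });
  const url = await getSignedUrl(s3, command, { expiresIn: 300 }); // valid ~5 minutes
  res.json({ url, key });
});

app.listen(3000);
```

The client then PUTs the file directly to that URL, so no long-lived credentials ever reach the browser.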
In most cases, upload from the server side. The following are the important points that matter when uploading files to AWS S3:
Authentication of the user uploading,
Basic conversion/resizing for performance,
Manual logic to control who can upload and how much,
Securing your AWS secret keys.
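A minimal sketch of the authentication and upload-limit points (assuming multer; the requireAuth check and the 5 MB limit are placeholder choices):

```javascript
// Sketch: authenticate and cap upload size before the file goes anywhere near S3
// (assumes multer; requireAuth and the 5 MB limit are placeholders).
const express = require('express');
const multer = require('multer');

const app = express();

function requireAuth(req, res, next) {
  // Placeholder: verify a session or token here.
  if (!req.headers.authorization) return res.sendStatus(401);
  next();
}

const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 5 * 1024 * 1024 }, // reject files over 5 MB
});

app.post('/assets/upload', requireAuth, upload.single('file'), (req, res) => {
  // req.file.buffer now holds the upload; resize/convert and push to S3 here.
  res.json({ size: req.file.size, name: req.file.originalname });
});

app.listen(3000);
```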

What are some strategies to prevent flooding/abuse of api requests

I have an API on my server(node) that writes new data into my database.
To use the API the user is required to provide a token which acts as an identifier, so if someone floods my database or abuses the API, I can tell who it is.
But what are some techniques I can use to prevent the ability to flood or hang my server altogether? Note that most requests to the API are made by the server itself, so in theory I might get dozens of requests a second from my own server's address.
I'd love to get some references to reading materials.
Thanks!
You could use this module: https://www.npmjs.com/package/ddos to put limits depending on the user.
However you will still be exposed to larger scale ddos attacks. These attacks cannot be stopped at the node.js level since they often target infrastructure. This is another can of worms however.
Try to configure limits on a proxy and/or load balancer.
Alternatively, you can use the rate-limiter-flexible package to limit the number of requests per user per N seconds.
It also supports black and white lists, so you're able to whitelist your server's IP.
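For instance, a rough Express middleware built on rate-limiter-flexible could look like this; the 20-requests-per-10-seconds policy and the token-based key are placeholder choices.

```javascript
// Sketch: limit each token (or IP) to N requests per window
// (assumes rate-limiter-flexible; 20 requests / 10 s is a placeholder policy).
const express = require('express');
const { RateLimiterMemory } = require('rate-limiter-flexible');

const app = express();
const limiter = new RateLimiterMemory({ points: 20, duration: 10 });

app.use(async (req, res, next) => {
  const key = req.headers['x-api-token'] || req.ip; // identify the caller
  try {
    await limiter.consume(key); // rejects once the quota is exhausted
    next();
  } catch (rejection) {
    res.status(429).json({ error: 'Too many requests' });
  }
});

app.post('/api/write', (req, res) => res.sendStatus(201));

app.listen(3000);
```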

How to detect and possibly ignore processing a bad/hung client browser request

I'm developing a Node web application. While testing, one of the client Chrome browsers went into a hung state. The browser entered an infinite loop where it continuously downloaded all the JavaScript files referenced by the HTML page. I rebooted the web server (Node.js), but once the web server came back online, it continued receiving tons of requests per second from the same browser.
Obviously, I went ahead and terminated the client browser so that the issue went away.
But I'm concerned about how to handle such problem client connections from the server side once my web application goes live/public, since I will have no access to the clients.
Is there anything (an npm module / some code?) that can make a best guess to detect and handle such bad client connections from within my web server code, and once detected, ignore any future requests from that particular client instance? I understand handling this within the Node server might not be the best approach, but at least I could save my CPU/network by not responding to the bad requests.
P.S.
Btw, I'm planning to deploy my node web application onto Heroku with a small budget. So, if you know of any firewall/configuration that could handle the above scenario please do recommend.
I think it's important to know that this is a pretty rare case. If your application has a very large user base or there is some other reason you are concerned with DOS/DDOS related attacks, it looks like Heroku provides some DDOS security for you. If you have your own server, I would suggest looking into Nginx or HAProxy as load balancers for your app combined with fail2ban. See this tutorial.

Setting up a secure back-end NodeJS server for multiple front-end domains

I've been doing a lot of research recently on creating a backend for all the websites that I run and a few days ago I leased a VPS running Debian.
Long-term, I'd like to use it as the back-end for some web applications. However, these client-side javascript apps are running on completely different domains than the VPS domain. I was thinking about running the various back-end applications on the VPS as daemons. For example, daemon 1 is a python app, daemons 2 and 3 are node js, etc. I have no idea how many of these I might eventually create.
Currently, I only have a single NodeJS app running on the VPS. I want to implement two methods on it listening over some arbitrary port, port 4000 for example:
/GetSomeData (GET request) - takes some params and serves back some JSON
/AddSomeData (POST request) - takes some params and adds to a back-end MySQL db
These methods should only be useable from one specific domain (called DomainA) which is different than the VPS domain.
Now one issue that I feel I'm going to hit my head against is CORS policy. It sounds like I need to include a response header for Access-Control-Allow-Origin: DomainA. The problem is that in the future, I may want to add another acceptable requester domain, for example DomainB. What would I do then? Would I need to validate the incoming request.connection.remoteAddress, and if it matched DomainA/DomainB, write the corresponding Access-Control-Allow-Origin?
As of about 5 minutes ago before posting this question, I came across this from the W3C site:
Resources that wish to enable themselves to be shared with multiple Origins but do not respond uniformly with "*" must in practice generate the Access-Control-Allow-Origin header dynamically in response to every request they wish to allow. As a consequence, authors of such resources should send a Vary: Origin HTTP header or provide other appropriate control directives to prevent caching of such responses, which may be inaccurate if re-used across-origins.
Even if I do this, I'm a little worried about security. By design, anyone on my DomainA website can use the web app; you don't have to be a registered user. I'm concerned about attackers spoofing their IP address to be equal to DomainA. It seems like it wouldn't matter for the GetSomeData request, since my NodeJS would then send the data back to DomainA rather than the attacker. However, what would happen if the attackers ran a script to POST to AddSomeData a thousand times? I don't want my SQL table being filled up by malicious requests.
On another note, I've been reading about nginx and virtual hosts and how you can use them to establish different routes depending on the incoming domain but I don't BELIEVE that I need these things; however perhaps I'm mistaken.
Once again, I don't want to use the VPS as a website server; the Node JS listener is going to be returning some collection of JSON, hence why I'm not making use of port 80. In fact, the primary use of the VPS is to do some heavy manipulation of data (perhaps involving the local MySQL db) and then return a collection of JSON that any number of front-end client browser apps can use.
I've also read some recommendations about making use of NodeJS Restify or ExpressJS. Do I need these for what I'm trying to do?
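To illustrate the dynamic Access-Control-Allow-Origin approach the W3C quote describes, here is a rough Express sketch with a hard-coded allowlist; the domains, port and allowed methods/headers are placeholders, and note that the check is done against the Origin request header, not against remoteAddress. The cors npm package can do the same thing via its origin option.

```javascript
// Sketch: echo back Access-Control-Allow-Origin only for whitelisted origins
// (domain list, port and allowed methods/headers are placeholders).
const express = require('express');
const app = express();

const allowedOrigins = ['https://domain-a.example', 'https://domain-b.example'];

app.use((req, res, next) => {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.includes(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Vary', 'Origin'); // keep caches from reusing the wrong value
  }
  if (req.method === 'OPTIONS') {
    res.setHeader('Access-Control-Allow-Methods', 'GET,POST');
    res.setHeader('Access-Control-Allow-Headers', 'Content-Type');
    return res.sendStatus(204);
  }
  next();
});

app.get('/GetSomeData', (req, res) => res.json({ ok: true }));
app.post('/AddSomeData', express.json(), (req, res) => res.sendStatus(201));

app.listen(4000);
```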
