I have 3 servers running Node.js and a 4th server for the nginx load balancer (reverse proxy) and the front-end code. Everything works perfectly, but now I want to manage file uploads. How can I manage this under this infrastructure?
For example: if a file is uploaded to one of these 3 Node.js servers and stored on that same server, how can I access this file later?
The Node.js servers are behind example.com/api, but because of the reverse proxy each request goes to one server and I don't know which server holds a particular file.
Should I upload the file to all Node.js servers?
If you have three separate node.js servers that are physically separate machines with their own storage, then the usual way to share access to files is to have some shared storage that all three servers can access. When any node.js server takes a file upload, it puts the data on that shared storage where everyone can get to it.
If your three separate node.js servers are just separate processes on the same box, then they can all already access the same disk storage.
When sharing storage from separate processes or servers, you will have to make sure that your file management code is concurrency-safe - proper file locking when writing, concurrency-safe mechanisms for generating unique file names, safe caching, etc.
Or, you could use a database server for storage that all node.js servers have access to. But if you're just storing data files, don't have a lot of metadata associated with them that you want to query, and shared file system access is all you really need, then a database may not be the most efficient way to store the data.
Should I upload the file to all node.js servers?
Usually not since that's just not very efficient. Typically, you would upload once to a shared storage location or server or database that all servers can access.
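For illustration, here is a minimal sketch of an upload endpoint that writes to a shared mount with collision-safe names (the mount path, field name, and use of multer are my assumptions, not part of your setup):

```js
// Sketch only: assumes a shared mount at /mnt/shared/uploads (NFS, EFS, ...) visible to all three servers,
// and the multer middleware for multipart parsing.
const express = require('express');
const multer = require('multer');
const crypto = require('crypto');
const path = require('path');

const SHARED_DIR = '/mnt/shared/uploads'; // hypothetical shared mount point

const storage = multer.diskStorage({
  destination: (req, file, cb) => cb(null, SHARED_DIR),
  // Generate a unique name so concurrent uploads on different servers never collide.
  filename: (req, file, cb) =>
    cb(null, crypto.randomUUID() + path.extname(file.originalname)),
});

const app = express();
const upload = multer({ storage });

app.post('/api/upload', upload.single('file'), (req, res) => {
  // Any of the three servers can now serve this file from the shared mount.
  res.json({ filename: req.file.filename });
});

app.listen(3000);
```

Because every instance sees the same directory, it doesn't matter which server nginx routed the upload to.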
I want to upload a file to AWS S3. I use Node.js as the server.
I want to know which approach is better: uploading from the client side or from the server side?
Since the files are large, I am worried about the bandwidth needed to send them to the server side and process them there.
At the same time, I am worried about the security of the key if I process the upload from the client side.
What are the pros and cons of uploading a file from the client side versus the server side?
Client -> Your Server -> S3 Upload Considerations:
Security - How will you control who uploads files to your server?
Security - How will you control how large of files are uploaded to your server?
Cost - Inbound traffic to your server is free. Outbound from your server to S3 will be charged to you.
Performance - The type and size of your instance will determine overall upload performance from the user to your server and then from your server to S3.
Reliability - Your server(s) will need to be available for the users to upload to your server.
Reliability - Your server needs reliable code to transfer the files to S3.
Complexity - You are in control of, and responsible for, transferring the files to S3. In the customer's mind, any problems uploading to your server will be your fault.
Client Side Upload Directly to S3 Considerations:
Security - You have many options for granting users the right to upload to S3: a) signed URLs, b) access keys (a horrible idea), c) Cognito to provide temporary access keys. Recommendation: use signed URLs generated on your server and hand no access keys of any kind to the client (see the sketch after this list).
Cost - Inbound from the client to S3 is free.
Performance - Your server is no longer the man in the middle; upload performance is limited only by the user's Internet connection. Customers will be less likely (but not always) to blame you for upload problems.
Reliability - There are lots of well tested libraries available for uploading to S3. Recommendation: invest in a production quality client (web browser) library that handles Internet interruptions and other issues, retries, nice user prompts, background uploads, etc.
Complexity - You will still need to write code (or purchase) for both the server and the client side. However, by choosing good quality software you can reduce your headaches.
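To make the signed-URL recommendation concrete, here is a minimal sketch of a server endpoint that hands the browser a short-lived pre-signed PUT URL (bucket name, region, key scheme and expiry are assumptions; shown with the AWS SDK for JavaScript v3):

```js
// Sketch only: the server issues a temporary upload URL; no AWS credentials ever reach the client.
const express = require('express');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');
const crypto = require('crypto');

const s3 = new S3Client({ region: 'us-east-1' }); // assumed region
const BUCKET = 'my-upload-bucket';                // hypothetical bucket name

const app = express();

app.get('/api/upload-url', async (req, res) => {
  // Authenticate the user here before issuing a URL - this is where you control who may upload.
  const key = `uploads/${crypto.randomUUID()}`;
  const command = new PutObjectCommand({ Bucket: BUCKET, Key: key });
  const url = await getSignedUrl(s3, command, { expiresIn: 300 }); // valid for 5 minutes
  res.json({ url, key });
});

app.listen(3000);
```

The client then PUTs the file body directly to `url`, so the upload traffic never touches your server.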
In most cases, upload from the server side. The following points matter when uploading files to AWS S3:
Authenticating the user who is uploading,
Basic conversion/resizing for performance,
Logic to control who can upload and how much,
Securing your AWS secret keys.
I have a NodeJS REST API which has endpoints for users to upload assets (mostly images). I distribute my assets through a CDN. How I do it right now is call my endpoint /assets/upload with a multipart form; the API creates the DB resource for the asset and then uses SFTP to transfer the image to the CDN origin. Upon success I respond with the URL of the uploaded asset.
I noticed that the most expensive operation for relatively small files is the connection to the origin through SFTP.
So my first question is:
1. Is it a bad idea to always keep the connection alive so that I can always reuse it to sync my files?
My second question is:
2. Is it a bad idea to have my API handle the SFTP transfer to the CDN origin, should I consider having a CDN origin that could handle the HTTP request itself?
Short Answer: (1) It is not a bad idea to keep the connection alive, but it comes with complications. I recommend trying without reusing connections first. And (2) the upload should go through the API, but there may be ways to optimize how the API-to-CDN transfer happens.
Long Answer:
1. Is it a bad idea to always keep the connection alive so that I can always reuse it to sync my files?
It is generally not a bad idea to keep the connection alive. Reusing connections can improve site performance, generally speaking.
However, it does come with some complications. You need to make sure the connection is up. You need to make sure that if the connection went down you recreate it. There are cases where the SFTP client thinks that the connection is still alive, but it actually isn't, and you need to do a retry. You also need to make sure that while one request is using a connection, no other requests can do so. You would possibly want a pool of connections to work with, so that you can service multiple requests at the same time.
If you're lucky, the SFTP client library already handles this (see if it supports connection pools). If you aren't, you will have to do it yourself.
My recommendation - try to do it without reusing the connection first, and see if the site's performance is acceptable. If it isn't, then consider reusing connections. Be careful though.
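If you do end up reusing connections, here is a minimal sketch of a small connection pool (assuming the `ssh2-sftp-client` and `generic-pool` npm packages; the host, credentials and pool sizes are illustrative):

```js
// Sketch only: a pool of SFTP connections so concurrent requests don't fight over one session.
const Client = require('ssh2-sftp-client');
const genericPool = require('generic-pool');

const factory = {
  create: async () => {
    const sftp = new Client();
    // Assumed connection details for the CDN origin.
    await sftp.connect({ host: 'origin.example-cdn.com', username: 'deploy', password: 'secret' });
    return sftp;
  },
  destroy: (sftp) => sftp.end(),
};

const pool = genericPool.createPool(factory, { max: 5, min: 1 });

async function uploadToOrigin(localPath, remotePath) {
  const sftp = await pool.acquire();
  try {
    await sftp.put(localPath, remotePath);
    await pool.release(sftp); // hand the healthy connection back
  } catch (err) {
    await pool.destroy(sftp); // the connection may be dead; drop it rather than reuse it
    throw err;
  }
}
```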
2. Is it a bad idea to have my API handle the SFTP transfer to the CDN origin, should I consider having a CDN origin that could handle the HTTP request itself?
It is generally a good idea to have the HTTP request go through the API for a couple of reasons:
For security reasons, you want your CDN upload credentials to be stored on your API server, not on your client (website or mobile app). You should assume that your website's code can be seen (via view source) and that people can generally decompile or reverse engineer mobile apps, so they would be able to see your credentials in the code.
This hides implementation details from the client, so you can change this in the future without the client code needing to change.
@tarun-lalwani's suggestion is actually a good one - use S3 to store the image, and use a Lambda trigger to upload it to the CDN. There are a couple of Node.js libraries that allow you to stream the image through your API's HTTP request directly to the S3 bucket. This means that you don't have to worry about disk space on your machine instance.
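A minimal sketch of that streaming approach (the bucket name and the use of busboy are assumptions; the AWS SDK v3 `Upload` helper does the multipart streaming):

```js
// Sketch only: pipe the incoming multipart file stream straight into S3, never touching local disk.
const express = require('express');
const busboy = require('busboy');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const crypto = require('crypto');

const s3 = new S3Client({ region: 'us-east-1' }); // assumed region
const app = express();

app.post('/assets/upload', (req, res) => {
  const bb = busboy({ headers: req.headers });
  bb.on('file', async (name, fileStream, info) => {
    const key = `assets/${crypto.randomUUID()}`; // hypothetical key scheme
    const upload = new Upload({
      client: s3,
      params: { Bucket: 'my-asset-bucket', Key: key, Body: fileStream }, // hypothetical bucket
    });
    await upload.done(); // streams to S3 as the client uploads
    res.json({ key });   // create your DB resource here as well
  });
  req.pipe(bb);
});

app.listen(3000);
```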
Regarding your question on @tarun-lalwani's comment - one way to do it is to use the S3 image URL until the Lambda function is finished. S3 can serve images too, if given the proper permissions to do so. Then, after the Lambda function has finished uploading to the CDN, you just replace the image path in your DB.
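And a rough skeleton of the S3-triggered Lambda (the actual push to the CDN origin is left as a comment, since it depends on what your origin accepts):

```js
// Sketch only: Lambda fired by an S3 "ObjectCreated" event for the uploaded asset.
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

exports.handler = async (event) => {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

  const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));

  // Push obj.Body to the CDN origin here (SFTP, HTTP PUT, whatever the origin supports),
  // then update the asset row in your DB so it points at the CDN URL instead of the S3 URL.
};
```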
We have a simple Express Node server deployed on Windows Server 2012 that receives GET requests with just 3 parameters. It does some minor processing on these parameters, has a very simple in-memory node-cache for caching some of these parameter combinations, and interfaces with an external license server to fetch a license for the requesting user and set it in a cookie. After that, it interfaces with some workers via a load balancer (running with zmq) to download some large files (in chunks), unzip and extract them, write them to some directories, and display them to the user. On deploying these files, some other calls to the workers are initiated as well.
The node server does not talk to any database or disk. It simply waits for response from the load balancer running on some other machines (these are long operations taking typically between 2-3 minutes to send response). So, essentially, the computation and database interactions happens on other machines. The node server is only a simple message passing/handshaking server that waits for response in event handlers, initiates other requests and renders the response.
We are not using the 'cluster' module or nginx at the moment. With a bare-bones Node server, is it possible to accept and process at least 16 requests simultaneously? Pages such as http://adrianmejia.com/blog/2016/03/23/how-to-scale-a-nodejs-app-based-on-number-of-users/ mention that a simple Node server can handle only 2-9 requests at a time. But even with our bare-bones implementation, no more than 4 requests are accepted at a time.
Is using the cluster module or nginx necessary even in this case? How do we scale this application to a few hundred users to begin with?
An Express server can handle many more than 9 requests at a time, especially if it isn't talking to a database.
The article you're referring to assumes some database access on each request and serving static assets via Node itself rather than a CDN, all of it taking place on a single CPU with 1GB of RAM. That's a database and web server all running on a single core with minimal RAM.
There really are no hard numbers on this sort of thing; you build it and see how it performs. If it doesn't perform well enough, put a reverse proxy in front of it, like nginx or haproxy, to do load balancing.
However, based on your problem, if you really are running into bottlenecks where only 4 connections are possible at a time, it sounds like you're keeping those connections open way too long and blocking others. Better to have those long running processes kicked off by node, close the connections, then have those servers call back somehow when they're done.
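A minimal sketch of that pattern (endpoint names and the in-memory job table are placeholders): accept the request, kick off the long-running work, respond immediately with a job id, and let the workers report back when they finish.

```js
// Sketch only: don't hold the HTTP connection open for the 2-3 minute operation.
const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const jobs = new Map(); // in-memory job table; use Redis/a DB once you run more than one process

app.get('/process', (req, res) => {
  const jobId = crypto.randomUUID();
  jobs.set(jobId, { status: 'pending' });
  startWorkOnBalancer(jobId, req.query); // hypothetical: sends the zmq request and returns immediately
  res.status(202).json({ jobId });       // the connection closes right away
});

// Workers (via the load balancer) call back here when the long operation finishes.
app.post('/jobs/:id/done', (req, res) => {
  jobs.set(req.params.id, { status: 'done', result: req.body });
  res.sendStatus(204);
});

// The browser polls this (or you push the result over a websocket instead).
app.get('/jobs/:id', (req, res) => {
  res.json(jobs.get(req.params.id) || { status: 'unknown' });
});

function startWorkOnBalancer(jobId, params) {
  // send the zmq message here, including jobId so the worker knows where to report back
}

app.listen(3000);
```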
I have an app that receives data from several sources in realtime using logins and passwords. After data is received, it's stored in a memory store and replaced when new data is available. I also use sessions with mongo-db to auth user requests. The problem is I can't scale this app using pm2, since I can use only one connection to my data source per login/password pair.
Is there a way to use a different login/password for each cluster worker, or to get the cluster ID inside the app?
Are memory values/sessions shared between cluster workers, or are they separate? Thank you.
So if I understood this question, you have a node.js app that connects to a 3rd party using HTTP or another protocol, and since you only have a single credential, you cannot connect to said 3rd party using more than one instance. To answer your question: yes, it is possible to set up your cluster workers to use a unique user/pw combination each. The tricky part would be how to assign these credentials to each worker (assuming you don't want to hard code them). You'd have to do this assignment when the servers start up, and perhaps use a data store to hold these credentials and introduce some sort of locking mechanism for each credential (so that each credential is unique to a particular instance).
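If you stay with pm2's cluster mode, one way to do that assignment (relying on the `NODE_APP_INSTANCE` index that pm2 sets per instance; the credential list and `connectToDataSource` are placeholders) is:

```js
// Sketch only: pick a credential based on the pm2 instance index.
// pm2 cluster mode exposes a zero-based instance number in NODE_APP_INSTANCE.
const credentials = [
  { login: 'user1', password: 'pass1' },
  { login: 'user2', password: 'pass2' },
  { login: 'user3', password: 'pass3' },
]; // hypothetical list - load it from config/secrets rather than hard coding it

const instanceId = Number(process.env.NODE_APP_INSTANCE || 0);
const myCredential = credentials[instanceId % credentials.length];

connectToDataSource(myCredential); // hypothetical: your existing connection code
```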
If I were in your shoes, however, what I would do is create a new server whose sole job would be to get this "realtime data" and store it somewhere available to the cluster, such as Redis or some other persistent store. It would be a standalone server, just getting this data. You can also attach a RESTful API to it, so that if your other servers need to communicate with it, they can do so via HTTP or a message queue (again, Redis would work fine there as well).
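A rough sketch of that standalone fetcher (the Redis location, key names and the fetch stub are placeholders):

```js
// Sketch only: one process owns the single login and publishes the data for everyone else.
const express = require('express');
const { createClient } = require('redis');

// Placeholder for your existing realtime-source connection using the single login/password pair.
async function fetchRealtimeData() {
  return { ts: Date.now() };
}

async function main() {
  const redis = createClient({ url: 'redis://localhost:6379' }); // assumed Redis location
  await redis.connect();

  // Poll (or subscribe to) the source and keep the latest snapshot in Redis.
  setInterval(async () => {
    const data = await fetchRealtimeData();
    await redis.set('realtime:latest', JSON.stringify(data));
  }, 1000);

  // Small REST API so the clustered app servers can read the snapshot over HTTP if they prefer.
  const app = express();
  app.get('/latest', async (req, res) => {
    res.json(JSON.parse((await redis.get('realtime:latest')) || 'null'));
  });
  app.listen(4000);
}

main();
```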
'Realtime' is vague; are you using WebSockets? If HTTP requests are being made often enough, that could also be considered 'realtime'.
Possibly your problem is like something we encountered scaling SocketStream (websockets) apps, where the persistent connection requires requests from the same client to be routed to the same process. (There are other network topologies/architectures which don't require this, but that's another topic.)
You'll need to use fork mode with 1 process only, plus a solution to make sessions sticky, e.g.:
https://www.npmjs.com/package/sticky-session
I have some example code but need to find it (it's been over a year since I deployed it).
Basically you wind up just using pm2 for the 'always-on' feature; the sticky-session module handles the Node clusterisation stuff.
I may post an example later.
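In the meantime, here is a minimal sketch of the sticky-session pattern (based on that module's documented usage; the port is an assumption):

```js
// Sketch only: sticky-session routes each client back to the same worker,
// so per-connection/session state stays on one process.
const cluster = require('cluster');
const http = require('http');
const sticky = require('sticky-session');

const server = http.createServer((req, res) => {
  res.end('handled by worker ' + cluster.worker.id);
});

if (!sticky.listen(server, 3000)) {
  // Master process: sticky.listen() forked the workers and returned false here.
  server.once('listening', () => console.log('listening on port 3000'));
} else {
  // Worker process: attach your app / websocket handlers to `server` here.
}
```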
I have an application that has a client and a server. The server is basically only used to store the file names that the clients have, so that when other clients want to search for files, they can go to the server, find the client that has the file they want, and receive the file by connecting to it directly. By now, I can get the socket information of the client that has the file requested by the other client. However, I am now confused about how to connect these two clients. Do I have to create a separate client and server socket between the two clients, or are there other ways?
Now you have two choices:
Let the server continue its role and act as an intermediary between the two parties. It downloads the file from the client which has it and sends it (via any suitable protocol) to the client who requested the file. This is the client-server architecture. It is a simple approach and you get benefits such as file caching, i.e. if the same file is requested again in the future, the server can send it directly without asking the client for it.
You can continue using the P2P architecture and create a separate socket between the two parties. This is not straightforward and needs special care when multiple processes are working simultaneously.
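For option 2, here is a minimal sketch of the direct connection using Node's built-in `net` module (the port and the one-line filename protocol are assumptions): the client that owns files listens on a socket, and the requesting client connects to the host/port it got from the central server.

```js
// Sketch only: peer-to-peer file transfer over a plain TCP socket.
const net = require('net');
const fs = require('fs');

// --- On the peer that HAS the file: listen for direct connections from other peers.
const peerServer = net.createServer((socket) => {
  socket.once('data', (data) => {
    const requestedFile = data.toString().trim();    // naive protocol: the peer sends a filename
    fs.createReadStream(requestedFile).pipe(socket); // stream the file back; socket closes on end
  });
});
peerServer.listen(5000); // the port you registered with the central server

// --- On the peer that WANTS the file: use the host/port the central server handed back.
function downloadFromPeer(host, port, filename, savePath) {
  const socket = net.connect(port, host, () => socket.write(filename + '\n'));
  socket.pipe(fs.createWriteStream(savePath));
}
```

In practice you'd want error handling and a safer protocol (e.g. length-prefixed messages and path validation), but this is the basic shape.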