API development, one gateway page? - security

I'm currently developing an API, and one thing I decided was to have a single gateway.cfm page. The client sends each request to it with a sig for verification, and the gateway processes the request by invoking the components needed and sends the result back.
For example, gateway.cfm?component=getBooks&sig=232345343 will call the getBooks component and return the JSON.
Ignoring any security issues, will this API suffer in performance because all the requests are going to one page? Or does it not matter to the web server whether all the requests go to the same page?
This will also be secured by SSL.

It does not matter for the server if all requests go to one page or to different pages. At least, not for the common webservers (e.g. Apache/IIS).
A webserver has a threadpool, each request gets a thread assigned, each thread performs its work and finishes.
However, there is one detail. On a lower level, the threads that process the requests all read the same binary/text (I don't know whether cfm is compiled or interpreted), so for a very short period of time the file is possibly locked for reading. That may introduce a penalty if the number of requests is enormous. You can only find out whether this is really a performance bottleneck by benchmarking and testing.
But I think that the SSL handshake will hurt performance much sooner than the read lock.

Related

NodeJS Polling per User Structure best practice

My project is a full stack application where a web client subscribes to an unready object. When the subscription is triggered, the backend runs an observation loop on that unready object until it becomes ready. When that happens, it sends a message to the frontend through socketIO (suggestions are welcome, I'm not quite sure if it's the best method). My question is how to construct the observation loop.
My frontend basically subscribes to the backend. It gets a 200 response and connects to the server via WebSocket (socketIO) if it was subscribed correctly, or a 4XX error code if something went wrong. On the backend, when the user subscribes, it should start a "thread" for that user (I know Nodejs doesn't support threads, it's just for the mental image) that polls information from an API every 10 or so seconds.
I do that, because the API that I poll from does not support WebHooks, so I need to observe the API response until it's at the state that I want it (this part I already got cleared).
What I'm asking is: is there a third party library that is actually meant for this kind of task? Should I use worker threads or simple setTimeouts abstracted by classes? The response will be sent over SocketIO, and I already have that part working as well. It's just the polling method that I'm not quite sure how to build.
I'm also open to use another fitting programming language that makes solving this case easier. I'm not in a hurry.
A polling network request (which it sounds like this is) is non-blocking and asynchronous so it doesn't really take much of your nodejs CPU unless you're doing some heavy-weight computation of the result.
So, a single nodejs thread can make a lot of network requests (for your polling and for sending data over socket.io connection) without adding WorkerThreads or clustering. This is something that nodejs is very, very good at.
I'm not aware of any third party library specifically for this, since you have to write custom code to examine the results of the network request anyway, and that's most of the coding. There are a bunch of libraries for making http requests of other servers from nodejs listed here. My favorite in that list is got(), but you can look at the choices and decide what you like.
As for making the repeated requests, I would probably just use either repeated setTimeout() calls or a setInterval() call.
You don't say whether you have to make separate requests for every single client that is subscribed to something or whether you can somehow combine all clients watching the same resource so that you use the same polling interval for all of them. If you can do the latter, that would certainly be more efficient.
If, as you scale, you run into scaling issues, you can then move the polling code to one or more child processes or WorkerThreads and then just communicate back to the main thread via messaging when you have found a new state that needs to be sent to the client. But, I would not anticipate you would need to code that extra step until you reach larger scale. As with most scaling things, you would need to code up the more basic option (which should scale well by itself) and then measure and benchmark and see where any bottlenecks are and modify the architecture based on data, not speculation. Far too often, the architecture is over-designed and over-implemented based on where people think the bottlenecks might be rather than where they actually turn out to be. Not only does this make the development take longer and end up with more complicated implementation than required, but it can target development at the wrong part of the problem. Profile, measure, then decide.
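A minimal sketch of the repeated setTimeout() approach, combined with the idea of sharing one polling loop among all clients watching the same resource. The class name and the callbacks fetchState (the API call), isReady (the check for the wanted state), and notify (for example a function that emits over the client's socket.io connection) are hypothetical names for illustration, not part of any library.

```javascript
// Sketch only: one polling loop per watched resource, shared by all subscribers.
// fetchState, isReady and the notify callbacks are hypothetical and supplied by the caller.
class SharedPoller {
  constructor({ fetchState, isReady, intervalMs = 10000 }) {
    this.fetchState = fetchState; // async (resourceId) => current state from the polled API
    this.isReady = isReady;       // (state) => true once the wanted state is reached
    this.intervalMs = intervalMs;
    this.watchers = new Map();    // resourceId -> Set of notify callbacks
  }

  subscribe(resourceId, notify) {
    let subscribers = this.watchers.get(resourceId);
    const isFirst = !subscribers;
    if (isFirst) {
      subscribers = new Set();
      this.watchers.set(resourceId, subscribers);
    }
    subscribers.add(notify);
    if (isFirst) this.poll(resourceId); // only the first subscriber starts a loop
  }

  async poll(resourceId) {
    const subscribers = this.watchers.get(resourceId);
    if (!subscribers || subscribers.size === 0) {
      this.watchers.delete(resourceId); // nobody is watching any more: stop the loop
      return;
    }
    try {
      const state = await this.fetchState(resourceId);
      if (this.isReady(state)) {
        for (const notify of subscribers) notify(state);
        this.watchers.delete(resourceId);
        return;
      }
    } catch (err) {
      // A failed poll is logged and retried on the next tick.
      console.error(`poll failed for ${resourceId}: ${err.message}`);
    }
    // Schedule the next poll only after this one has finished, so polls never overlap.
    setTimeout(() => this.poll(resourceId), this.intervalMs);
  }
}
```

Chaining setTimeout() after each completed request, rather than using setInterval(), keeps a slow API response from causing overlapping requests for the same resource.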

Node.js design approach. Server polling periodically from clients

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fetch / await calls), so I want the user to know how his request is doing. I've read about socket.io / websockets, but maybe that's somewhat of an overkill solution for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X, like a polling approach.
Although the solution works fine, even with several clients connecting concurrently, maybe there's a better solution? Are there any caveats I'm not considering?
TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
It is my understanding that the overhead for Websockets is lower, but still can be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications in such a way that they can be distributed over many identical servers and which server you connect to is managed by a load balancer. This is true for either polling or Websockets.
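The request-ID scheme described in the question could look roughly like the sketch below. It is an assumption about how such a store might be built, not the asker's actual code: startJob, getStatus, and the shape of the job object are hypothetical names. The status endpoint would simply return getStatus(id) as JSON.

```javascript
const { randomUUID } = require('crypto');

// In-memory status store (hypothetical). With several servers behind a load
// balancer, this state would have to live in storage shared by all of them.
const jobs = new Map();

// Start a long-running task and return its ID immediately.
// `task` is an async function that receives a callback for reporting progress.
function startJob(task) {
  const id = randomUUID();
  const job = { status: 'pending', progress: 0, result: null, error: null };
  jobs.set(id, job);
  Promise.resolve()
    .then(() => task((percent) => { job.progress = percent; }))
    .then((result) => Object.assign(job, { status: 'done', progress: 100, result }))
    .catch((err) => Object.assign(job, { status: 'failed', error: err.message }));
  return id;
}

// Looked up by the status endpoint each time the client polls.
function getStatus(id) {
  return jobs.get(id) || null;
}
```

Each poll is a cheap Map lookup, so the cost of polling in this design comes mainly from the number of HTTP requests rather than from the work done per request.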

Websockets for non-realtime apps?

I have been studying web sockets recently and plan to use them in my application even though the app is not realtime. I am mostly doing this because I want to try it out, and further down the line it might open more possibilities for the app's functionality. Also, I am not bothered about having an API for mobile at the moment, but I think it would still be possible to have some kind of API over web sockets if I needed it in the future.
However for in-production apps are there any real reasons why somebody would consider implementing websockets if there is no real-time element?
Are there any benefits over HTTP requests other than the real timeness of it?
HTTP requests include the full HTTP headers. Depending on the cookie load, this may reach a couple of KB per request. WebSocket protocol headers are minimal compared to that. If you have a lot of requests and care about bandwidth then going with WebSocket makes sense.
Additionally, an HTTP connection is (traditionally) negotiated for each request, which means you have overhead on each request compared to WebSocket, which has persistent connections. Connection establishment takes time (hence the advantage in real-time applications), but it also uses resources on the server. Again, depending on your app's communication patterns, using WebSocket may make sense.

Node.js App with API endpoints which take 20sec+ :: Connections Left Open :: How to Optimize?

I have a Node.js RESTful API returning JSON data. One of the API calls can (and frequently does) take 10 - 20 seconds to finish. This long RTT is due to connecting to external APIs, like DiffBot, MailChimp, Facebook, Twitter, etc. I wish I could make the API call shorter, but I cannot.
Of course, I've implemented the node code in a nice async way, but the problem is that the client's inbound connection (to the node app) is alive while it waits for the server to finish, and thus might be killing my performance. In fact, I'm currently guessing that this may explain my long-running timeout issue in node.
I've already increased maxSockets to a huge number...
require('http').globalAgent.maxSockets = 9999;
For the sake of interest, I'm printing out the active sockets each time a new connection is made (here's the code).
Which gives me output like this:
SOCKETS: {} { 'graph.facebook.com:443': 5, 'api.instagram.com:443': 1 }
Nothing too enlightening there. The max connections I ever see is around 20 or so, total, across all hosts. But this doesn't really tell me anything about incoming connections, or how to optimize them so that my server does not choke when there are many of them alive at once (which I suspect it is).
You should optimize your architecture, not just the code.
First, I would change the way the client and server interact with each other. The server should end the request upon receipt and notify the client once all the tasks for that request are truly complete.
There are different ways to achieve that. For example, the client can query the status of the request using AJAX (poll) every X seconds. Another example would be to use WebSocket.
If you go with the WebSocket approach, look into Socket.IO. It supports many transports behind the same API: if WebSocket is available it uses that, and otherwise it falls back to other transports such as Flash Socket, long-polling, etc.
Second, you shouldn't use one process to do all this work. You should use a queue (preferably a messaging system that supports queues), then, run workers (separate processes) to do the "heavy lifting".
Personally, I use AMQP due to its features and portability (it's an open standard), but feel free to use any other queue system with a persistent backend.
That way, if one or more process(es) crash(es) and you use the right queue, you wouldn't lose any data (such as the API tasks you mentioned).
Hope it helps.
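To illustrate the queue-and-workers idea, here is a small in-memory stand-in. It is not AMQP and it is not persistent, so it would lose tasks on a crash. A real setup would use a broker and separate worker processes as described above. The TaskQueue class and its method names are hypothetical.

```javascript
// In-memory stand-in for a message queue with a fixed number of workers.
// Sketch only: no persistence, and workers run in the same process.
class TaskQueue {
  constructor(handler, concurrency = 2) {
    this.handler = handler;         // async function that processes one task
    this.concurrency = concurrency; // maximum number of tasks running at once
    this.pending = [];
    this.active = 0;
  }

  // Called from the request handler. The handler can respond to the client
  // right away and does not have to wait for the returned promise.
  enqueue(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  // Start queued tasks until the concurrency limit is reached.
  drain() {
    while (this.active < this.concurrency && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.active++;
      Promise.resolve()
        .then(() => this.handler(task))
        .then(resolve, reject)
        .finally(() => {
          this.active--;
          this.drain(); // a finished task frees a slot for the next one
        });
    }
  }
}
```

The concurrency limit plays the role of the number of worker processes: slow external API calls wait in the queue instead of all running at once.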

Being both event-driven servers, why node.js needs async code where Nginx doesn't?

The question is in the title. In other words, if Nginx uses the same event-driven async IO model as node.js, why doesn't it require writing async style code? I know, Nginx is NOT actually executing any code, but rather proxying requests to whoever can. Then why doesn't node do so? Are we missing anything in the current Nginx way? Or gaining anything more from node (apart from the pain of writing async code)?
Ps.
To be more specific, how different is Nginx+php-fpm or Nginx+wsgi+python/ruby from node alone regarding performance or utilizing computing resource that node claims? Couldn't node just use existing FastCGI models, be a sync style JavaScript interpreter and let webserver do its async job?
Cross-posted from NodeJS google groups:
Okay, I'll try my best to answer your question:
Nginx is a web server that only proxies requests. If you take the example of Nginx+php-fpm or Nginx+wsgi+ruby, you have an asynchronous, evented web server sitting in front of a backend server that executes synchronously. Nginx will accept() as many connections as possible and queue them all, and its requests to your synchronous backend server are asynchronous. But the backend server, which also calls accept(), does not queue any connections. If it is single threaded, it can serve only one request at a time. It can serve multiple requests at a time with prefork, fork (slow), or multithreading, but each of these has its own drawbacks: thread creation time (which can be avoided with thread pools, but those are a PITA to implement), context switches, thread deadlocks, the number of accept()ed connections never being greater than the number of threads, etc.
Imagine you have 2 routes to your backend server that Nginx is hitting:
/404, /login.
If the /login route is doing a lot of I/O and if another request is made to /404, the rendering of the /404 page will depend on the completion of /login's request (because the process is blocked). So basically the response to any request will depend on the request that takes the longest time to do I/O. So even though Nginx is async and evented its response time for any request will depend entirely on that one request that takes the longest time to finish (culprit: the synchronous backend server).
Now if you take the example of NodeJS, everything is asynchronous and evented, be it file I/O, network I/O, etc. So nothing blocks the process. Taking the previous example, even if the /login route is doing a lot of I/O, it's all asynchronous and the /404 page is rendered immediately.
My explanation is quite rudimentary. But I think it should give you more clarity.
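The /login and /404 example can be simulated with a few lines of Node.js. This is only an illustration: slowIo and blockingIo are hypothetical stand-ins for real I/O, with a timer representing non-blocking I/O and a busy-wait representing a synchronous call.

```javascript
// Simulated non-blocking I/O: the event loop is free while the timer runs.
const slowIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulated blocking I/O: the process can do nothing else until this returns.
function blockingIo(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* busy-wait stands in for a synchronous call */ }
}

// Synchronous backend: /login blocks the whole process.
function handleSync(route, log) {
  if (route === '/login') blockingIo(50);
  log.push(route); // records the order in which responses complete
}

// Evented backend: /login waits without blocking other requests.
async function handleAsync(route, log) {
  if (route === '/login') await slowIo(50);
  log.push(route);
}

async function demo() {
  const syncLog = [];
  handleSync('/login', syncLog);
  handleSync('/404', syncLog); // cannot start until /login has finished

  const asyncLog = [];
  await Promise.all([
    handleAsync('/login', asyncLog),
    handleAsync('/404', asyncLog), // completes while /login is still waiting
  ]);
  return { syncLog, asyncLog };
}

demo().then(({ syncLog, asyncLog }) => {
  console.log('sync order: ', syncLog);  // [ '/login', '/404' ]
  console.log('async order:', asyncLog); // [ '/404', '/login' ]
});
```

In the synchronous case /404 is served only after /login completes, while in the asynchronous case /404 completes first even though /login was requested earlier.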
nginx is a simple static HTTP and proxy server. Node.js is a full-featured application platform.
Why would you not expect the more specialised application to have abstracted away all the internal workings that you don't need to control directly?
Edit:
Your PS is pretty similar to this question, and is concerned specifically with using Node.JS as an HTTP server. Bear in mind that v0.4.12 had just been released when that question was closed - v0.8.5 is the latest stable release at the moment. The key point anyway is it depends what you're trying to achieve.
This blog post describes a Node.JS-based set-up achieving 250k concurrent connections on a single server. A quick google search shows people attempting similar with nginx+php struggling to reach 100k with far more hardware resources available.
