Caching and Replay Proxy Server in Deployment

I have a logging server that receives data from some stateless clients on a single network (inaccessible from the outside world). I'd like to make sure all logs are eventually received by the server, even if the internet connection goes down.
To do this, the easiest solution would be to set up a proxy server and have the client log to both the logging server and the proxy server. The proxy server then tries to log to the logging server, and if that fails it caches the request and replays it later.
Notes:
All requests are idempotent.
The clients are stateless (logs cannot be cached on the clients).
All parts of the system, except the intermediate "internet" step, are configurable.
The proxy server does not need to read or modify the data.
The logging server response is not used by the client.
I cannot make significant changes to the client or logging server (Cassandra would be great for this application, though).
My questions: is there any off-the-shelf software that can serve as the proxy? If not, is there anything to think about when writing it? Are there any concerns with this scheme?

Your proxy is essentially a simple persistent queue; all you have to do is add/configure a connector to the logging server.
But even without a queue, the whole process amounts to 2 DB queries and 2 REST calls; you will probably waste more time comparing different products than writing it on your own.
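For a sense of scale, here is a minimal sketch of such a store-and-forward proxy in Node.js/Express, assuming the logging server accepts HTTP POSTs and Node 18+ (for the global fetch). The URLs, port, and queue file name are illustrative, and file locking is omitted:

```js
// Store-and-forward proxy sketch. LOG_SERVER_URL, the port, and the queue
// file are illustrative; error handling and file locking are omitted.
const express = require('express');
const fs = require('fs');

const LOG_SERVER_URL = 'http://logging-server:8080/logs'; // hypothetical
const QUEUE_FILE = './pending-logs.jsonl';                // crude persistent queue

const app = express();
app.use(express.json());

// Forward one log entry; any network error or non-2xx counts as failure.
async function forward(body) {
  const res = await fetch(LOG_SERVER_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`logging server returned ${res.status}`);
}

app.post('/logs', async (req, res) => {
  try {
    await forward(req.body);
  } catch {
    // Cache the failed request; idempotency makes later replay safe.
    fs.appendFileSync(QUEUE_FILE, JSON.stringify(req.body) + '\n');
  }
  res.sendStatus(202); // the client never reads the logging server's response
});

// Replay the queue periodically; stop at the first failure and retry later.
setInterval(async () => {
  if (!fs.existsSync(QUEUE_FILE)) return;
  const lines = fs.readFileSync(QUEUE_FILE, 'utf8').split('\n').filter(Boolean);
  const remaining = [];
  for (const [i, line] of lines.entries()) {
    try {
      await forward(JSON.parse(line));
    } catch {
      remaining.push(...lines.slice(i)); // keep this entry and the rest
      break;
    }
  }
  fs.writeFileSync(QUEUE_FILE, remaining.map((l) => l + '\n').join(''));
}, 30000);

app.listen(3000);
```

Replay stops at the first failure so log ordering is preserved while the link is down.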

Related

Simple message passing Nodejs server accepting only 4 requests at a time

We have a simple express node server deployed on Windows Server 2012 that receives GET requests with just 3 parameters. It does some minor processing on these parameters, has a very simple in-memory node-cache for caching some of these parameter combinations, and interfaces with an external license server to fetch a license for the requesting user and set it in the cookie. After that, it interfaces with some workers via a load balancer (running with zmq) to download some large files (in chunks, which it unzips, extracts, and writes to some directories) and display them to the user. On deploying these files, some other calls to the workers are initiated as well.
The node server does not talk to any database or disk. It simply waits for responses from the load balancer running on some other machines (these are long operations, typically taking 2-3 minutes to send a response). So, essentially, the computation and database interactions happen on other machines. The node server is only a simple message-passing/handshaking server that waits for responses in event handlers, initiates other requests, and renders the response.
We are not using the 'cluster' module or nginx at the moment. With a bare-bones node server, is it possible to accept and process at least 16 requests simultaneously? Pages such as http://adrianmejia.com/blog/2016/03/23/how-to-scale-a-nodejs-app-based-on-number-of-users/ mention that a simple node server can handle only 2-9 requests at a time. But even with our bare-bones implementation, not more than 4 requests are accepted at a time.
Is using the cluster module or nginx necessary even for this case? How can we scale this application for a few hundred users to begin with?
An Express server can handle many more than 9 requests at a time, especially if it isn't talking to a database.
The article you're referring to assumes some database access on each request and serving static assets via node itself rather than a CDN, all of it taking place on a single CPU with 1GB of RAM. That's a database and web server all running on a single core with minimal RAM.
There really are no hard numbers on this sort of thing; you build it and see how it performs. If it doesn't perform well enough, put a reverse proxy in front of it like nginx or haproxy to do load balancing.
However, based on your problem, if you really are running into a bottleneck where only 4 connections are possible at a time, it sounds like you're keeping those connections open way too long and blocking others. Better to have node kick off those long-running processes, close the connections, then have those servers call back somehow when they're done.
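A minimal sketch of that kick-off-and-call-back pattern, assuming the worker tier can be given a callback URL and Node 18+ (global fetch); the /process and /done routes, the worker-lb URL, and the in-memory job store are all illustrative:

```js
// Kick-off-and-call-back sketch: return 202 immediately instead of holding
// the connection open for 2-3 minutes. Routes, the worker-lb URL, and the
// in-memory job store are illustrative assumptions.
const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const jobs = new Map(); // jobId -> { status, result }

app.get('/process', async (req, res) => {
  const jobId = crypto.randomUUID();
  jobs.set(jobId, { status: 'pending' });

  // Hand the long-running work to the worker tier, passing a callback URL.
  await fetch('http://worker-lb/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      jobId,
      params: req.query,
      callback: `http://node-server/done/${jobId}`,
    }),
  });

  res.status(202).json({ jobId }); // connection closes right away
});

// The worker tier calls back here when the long job finishes.
app.post('/done/:jobId', (req, res) => {
  jobs.set(req.params.jobId, { status: 'done', result: req.body });
  res.sendStatus(200);
});

// Clients poll for the result (or you could push it over a websocket).
app.get('/status/:jobId', (req, res) => {
  res.json(jobs.get(req.params.jobId) ?? { status: 'unknown' });
});

app.listen(3000);
```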

Handle subscriber server offline using redis pubsub?

I am currently creating a horizontally scalable socket.io server which looks like the following:
LoadBalancer (nginx)
Proxy1 Proxy2 Proxy3 Proxy{N}
BackEnd1 BackEnd2 BackEnd3 BackEnd4 BackEnd{N}
Where the proxies are using sticky sessions + cluster, each with a socket.io server running on a core, all load-balanced by the nginx proxy.
Now to my question: these backend nodes use redis pubsub to communicate with the proxies, which handle all the communication via the transport (websockets).
When a request is sent to a backend server by a proxy, it knows the user who requested it, along with the proxy the user is on. My fear is that, when a proxy server goes offline for whatever reason, any pending request on my backend nodes will fail to reach the user when the proxy comes back online, because the messages were sent while the server was offline. What can I implement to circumvent this issue and essentially have messages queued while any proxy server is offline, then delivered when it's back on?
Pubsub doesn't persist messages. At all. In order to use Redis for this you would need to use a queue instead. For example, you can use a combination of list operations where the producer pushes messages to a list and your client server uses BLPOP or BRPOP, depending on how you add them and whether you want messages in FIFO or LIFO sequence.
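A minimal sketch of that list-based queue in Node.js, using ioredis as an assumed client (any Redis client with blocking list ops works the same way); the queue key naming is illustrative:

```js
// List-as-queue sketch with ioredis. Unlike PUBLISH, LPUSH'd messages
// survive until a consumer pops them, so proxy downtime loses nothing.
const Redis = require('ioredis');

const producer = new Redis();
const consumer = new Redis(); // blocking commands need a dedicated connection

// Backend node: enqueue instead of PUBLISH.
// LPUSH + BRPOP gives FIFO; LPUSH + BLPOP would give LIFO instead.
async function enqueue(proxyId, message) {
  await producer.lpush(`queue:${proxyId}`, JSON.stringify(message));
}

// Proxy: drain its own queue after (re)connecting, blocking until data arrives.
async function consume(proxyId) {
  for (;;) {
    const [, raw] = await consumer.brpop(`queue:${proxyId}`, 0); // 0 = wait forever
    const message = JSON.parse(raw);
    // ...hand the message to the connected socket.io client here...
    console.log('delivered', message);
  }
}

enqueue('proxy1', { userId: 42, event: 'notification' });
consume('proxy1');
```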

node.js built in support for handling requests for same data

In my node.js server app I'm providing a service to my js client that performs some handling of remote APIs.
It might very well be possible that two different clients request the same information. Say client 1 requests information; then, before client 1's request is fully handled (the remote APIs haven't returned their responses yet), client 2 requests the same data. What I'd want is to wait for client 1's data to be ready and then write it to both client 1 and client 2.
This seems to me like a very common issue and I was wondering if there is any library or built-in support in connect or express that addresses it.
You might not want to use HTTP for providing the data to the client. Reasons:
If the remote API takes a long time to respond, you risk the client request timing out, or the browser repeating the request.
You will have to share some state between requests, which is not a good practice.
Have a look at websockets (socket.io would be a place to start). With them you can push data from the server to the client. In your scenario, clients will perform the request to the server, which will return 202, and when the remote API responds, the server will push the data to the clients using websockets.
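For illustration, a minimal sketch of that scheme with Express and socket.io, where the first request for a key triggers the remote call and later requests piggyback on the same push; the room-per-key layout and the fetchRemote stand-in are illustrative:

```js
// 202-then-push sketch: the first request for a key triggers the remote
// call; concurrent requests for the same key just wait for the same push.
const express = require('express');
const http = require('http');
const { Server } = require('socket.io');

const app = express();
const server = http.createServer(app);
const io = new Server(server);

const inFlight = new Set(); // keys with a remote call already running

app.get('/data/:key', (req, res) => {
  const { key } = req.params;
  res.sendStatus(202); // acknowledge now; the result arrives over the socket

  if (inFlight.has(key)) return; // client 2 piggybacks on client 1's call
  inFlight.add(key);

  fetchRemote(key) // stand-in for the slow remote API call
    .then((data) => io.to(key).emit('data', data)) // fan out to every waiter
    .finally(() => inFlight.delete(key));
});

io.on('connection', (socket) => {
  // Clients join a room per resource so one response reaches all of them.
  socket.on('subscribe', (key) => socket.join(key));
});

async function fetchRemote(key) {
  return { key, value: 'result from remote API' }; // placeholder
}

server.listen(3000);
```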

Is it a good idea to create a new node app to handle socket.io?

I want to add some sockets with nodeJs and Socket.io to an existing project.
I already have 2 servers:
An API RESTful web service, to store and manage my data.
A Public web service to return HTML, assets (js, css, images, ...)
On the first try, I created my socket server on the Public one. But I think it would be better to create another one to handle only socket queries.
What do you think? Is it a good idea, or just useless complexity that will add more problems than it solves (maybe duplicated internal libs, ...)?
Also, I'm using a token to communicate between Public and API; do I have to create another one for communication between the Socket server and the API, or can I use the same one?
------[EDIT]------
As nobody understood me well, I have created a schema of the infrastructure I was thinking about.
Is it a good way to proceed?
Do the Public server and the Socket server have to be the same, or can they be separate?
Must I create a socket connection between the API and the Socket server for each connected client?
Thank you!
Thanks for explaining better.
First of all, while this seems reasonable, this way of using Socket.io is not the most common one. The biggest advantage of using Socket.io is that it keeps a channel open for 2-way communication. The main advantage of this is that the server itself can send messages to the client without the latter having to poll periodically.
Think, for example, of a mail client. Without sockets, the browser would have to poll periodically to check for new mail. With an open socket connection, instead, as soon as a new mail comes the server notifies the client immediately.
In your case, the benefits could be limited, and I'm not sure the additional complexity of a Socket.io server (and cost!) would really be worth the modest speed improvement on REST requests. However, in the end it's up to you.
In answer to your points:
See above.
If the "public server" is not written in Node.js, they can't be the same application. Whether they reside on the same server is up to you and your budget. Ideally they should be separate, for bigger workloads.
If you just want the socket server to act as a real-time proxy, then yes, you'll have to create a socket connection for each request. How that will work is:
The client requests a resource from the Socket.io server.
The Socket.io server does the normal HTTP request to the API server (e.g. using request)
The response is returned to the client over the socket connection
The workflow represented in #3 is the reason why you should expect only a moderate performance improvement. Indeed, you'll get somewhat better latency, but most of the overhead of starting an HTTP request is still there!
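A minimal sketch of that real-time proxy workflow, with illustrative event names, API URL, and token handling, assuming Node 18+ for the global fetch:

```js
// Real-time proxy sketch for points 1-3: the Socket.io server relays each
// client request to the REST API over plain HTTP.
const { Server } = require('socket.io');

const io = new Server(3001);
const API_BASE = 'http://api-server:8080'; // hypothetical API server

io.on('connection', (socket) => {
  // 1. The client requests a resource over the socket.
  socket.on('api:get', async (path, ack) => {
    try {
      // 2. The Socket.io server does a normal HTTP request to the API.
      const res = await fetch(`${API_BASE}${path}`, {
        headers: { Authorization: `Bearer ${socket.handshake.auth.token}` },
      });
      // 3. The response goes back to the client over the socket connection.
      ack({ status: res.status, body: await res.json() });
    } catch (err) {
      ack({ status: 502, error: String(err) });
    }
  });
});
```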

Apache httpd request/response logging

Is there a way to log full request and response content (the GET URI, POST data, and the response body) with Apache? I have a bunch of games that communicate with the client side over HTTP; they use different variables and output all sorts of things.
I'd like to push all this content into a database for further processing so I can report game play step by step. I cannot modify the server-side game files themselves to log this data; there are too many of them (thousands).
It's not much data, up to 512 bytes or 1K each of request and response data.
I can't set up Varnish or Squid to do it; I have lots of back-end servers and can't add yet another layer. There's already a load of stuff happening before the app servers (load balancing, firewall, whatnot).
TIA
mod_dumpio will get it into a file.
http://httpd.apache.org/docs/2.2/mod/mod_dumpio.html
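For example, a sketch of an httpd.conf fragment using the 2.2 syntax of the linked docs; note that mod_dumpio writes into the error log, which you would then have to parse into your database:

```apache
# mod_dumpio logs request and response bodies into the error log.
LoadModule dumpio_module modules/mod_dumpio.so

LogLevel debug          # dumpio only emits at debug level in 2.2
DumpIOInput On          # log request bodies (POST data)
DumpIOOutput On         # log response bodies
DumpIOLogLevel debug    # 2.2 only; on 2.4 use "LogLevel dumpio:trace7" instead
```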
