Rsyslog: relay messages from UDP input only - multithreading

Considering a docker container which receives logs on UDP and forwards to a central logging server using TLS, I was wondering if I could be satisfied with one queue or if I needed several.
Indeed, knowing that I won't be doing anything else than sending the logs to the collector, I don't really understand the need for several queues. Surely they make sense if other actions exist such as writing logs on files or duplicate outputs for failover, but what if forwarding is the only task? Wouldn't it be enough to have the main queue and the "send to collector" action in direct queue mode (i.e. no queue)? If the central server goes down (or during network outage) then logs will simply be re-enqueued to the main message queue ?
For example in this scenario below the Action Q makes total sense, but if we discard logging on /var/log/messages wouldn't Action Q be useless ? It's not only useless but also slows down the forwarding, right?

Related

epoll: must I use multi-threading

I've got a basic knowledge from here about epoll. I know that epoll can monitor multiple FDs and handle them.
My question is: can a heavy event block the server so I must use multithreading?
For example, the epoll of a server is monitoring 2 sockets A and B. Now A starts to send lot of messages to the server so the server starts to read them. One second later, B starts to send messages too while A is still sending. In this case, Need I create a thread for these read actions? If I don't, does it mean that the server has no chance to get the messages from B until A finishes its sending?
If you can process incoming messages fast enough (no blocking calls, no heavy computations), you don't need a separate thread. Otherwise, you would benefit from going multi-threaded.
In any case, it helps to understand what happens when you have only one thread and you can't process messages fast enough. If you are working with TCP protocol, the machines sending you the data will simply reduce their transmission rate. When using UDP, some incoming packets will get dropped.

Node script - Failover from one server to another server

I have a nodejs script - lets call it "process1" on server1, and same script is running on server2 - "process2" (just with flag=false).
Process1 will be preforming actions and will be in "running" state at the beginning. process2 will be running but in "block" state with flag programmed within it.
What i want to acomplish is to, implement failover/fallback for this process. If process1 goes down flag on process2 will change, and process2 will take over all tasks from process1 (and vice versa when process1 cames back - fallback).
What is the best approach to do this? TCP connection between those?
NOTE: Even its not too much relevant, but i want to mention that these processes are going to work internally, establishing tcp connection with third server and parsing data we are getting from that server. Both of the processes will be running on both of the servers, but only ONE process at the time can be providing services - running with flag true (and not both of them)
Update: As per discussions bellow and internal research/test and monitoring of solution, using reverse proxy will save you a lot of time. Programming fail-over based on 2 servers only will cover 70% of the cases related with the internal process which is used on the both machines - but you will not be able to detect others 30% of the issues caused because of the issues with the network (especially if you are having a lot of traffic towards DATA RECEIVER).
This is more of an infrastructure problem than it is a Node one, and the same situation can be applied to almost any server.
What you basically need is some service that monitors Server 1 and determines whether it's "healthy" or "alive" and if so continue to direct traffic to it. If the service determines that the server is no longer in a stable condition (e.g. it takes too long to respond, returns an error) it will redirect any incoming traffic to Server 2. When it's happy Server 1 has returned to normal operating conditions it will redirect the traffic back onto it.
In most cases, the "service" in this scenario is a reverse proxy like Nginx or CloudFlare. In your situation, this server would act as a buffer between Data Reciever and your network (Server 1 / Server 2) and route the incoming traffic to the relevant server.
That looks like a classical use case for a reverse proxy. Using a well tested server such as nginx should provide plenty reliability the proxy won't fail (other than hardware failure) and you could put that infront of whatever cluster size you want. You'd even get the benefit of load-balancing if that is applicable and configured properly.
Alternatively and also leaning towards a load-balancing solution, you could have a front server push requests into a queue (ZMQ for example) and either push from the queue to the app server(s) or have your app-server(s) pull tasks from the queue independently.
In both solutions, if it's a requirement not to "push" 2 simultaneous results to your data receiver, you could use an outbound queue that all app-servers push into.

How to design a scalable rpc call listener?

I have to listen for rpc calls , stack them somewhere , process them, and answer. The thing is that they are not run as soon as they come. The response is an ACK for each rpc call recieved.
The problem is that i want to design it in a way that i can have many listening servers writing in the same stack of calls, piling them up as they come.
My objective is to listen to as many calls as possible. How should i achieve this?
My main technology is Perl and node.js but would use any open source software for this task.
It sounds like any kind of job queue will do what you need it to; I'm personally a big fan of using Redis for this kind of thing. Since Redis lists maintain insertion order, you can simply LPUSH your RPC call info on to the end of the list from any number of web servers listening to the RPC calls, and somewhere else (in another process/on another machine, I assume) RPOP (or BRPOP) them off and process them.
Since Node.js uses fully asynchronous IO, assuming you're not doing a lot of processing in your RPC listeners (that is, you're only listening for requests, sending an ACK, and pushing onto Redis), my guess is that Node would be exceedingly efficient at this.
An aside on using Redis for a queue: if you want to ensure that, in the event of a catastrophic failure, jobs are not lost, you'll need to implement a little more logic; from the RPOPLPUSH documentation:
Pattern: Reliable queue
Redis is often used as a messaging server to implement processing of background jobs or other kinds of messaging
tasks. A simple form of queue is often obtained pushing values into a
list in the producer side, and waiting for this values in the consumer
side using RPOP (using polling), or BRPOP if the client is better
served by a blocking operation.
However in this context the obtained
queue is not reliable as messages can be lost, for example in the case
there is a network problem or if the consumer crashes just after the
message is received but it is still to process.
RPOPLPUSH (or
BRPOPLPUSH for the blocking variant) offers a way to avoid this
problem: the consumer fetches the message and at the same time pushes
it into a processing list. It will use the LREM command in order to
remove the message from the processing list once the message has been
processed.
An additional client may monitor the processing list for
items that remain there for too much time, and will push those timed
out items into the queue again if needed.

PUB/SUB with short-lived publisher and long-lived subscribers

Context: OS: Linux (Ubuntu), language: C (actually Lua, but this should not matter).
I would prefer a ZeroMQ-based solution, but will accept anything sane enough.
Note: For technical reasons I can not use POSIX signals here.
I have several identical long-living processes on a single machine ("workers").
From time to time I need to deliver a control message to each of processes via a command-line tool. Example:
$ command-and-control worker-type run-collect-garbage
Each of workers on this machine should receive a run-collect-garbage message. Note: it would be perfect if the solution would somehow work for all workers on all machines in the cluster, but I can write that part myself.
This is easily done if I will store some information about running workers. For example keep the PIDs for them in a known location and open a control Unix domain socket on a known path with a PID somewhere in it. Or open TCP socket and store host and port somewhere.
But this would require careful management of the stored information — e.g. what if worker process suddenly dies? (Nothing unmanageable, but, still, extra fuss.) Also, the information needs to be stored somewhere, thus adding an extra bit of complexity.
Is there a good way to do this in PUB/SUB style? That is, workers are subscribers, command-and-control tool is a publisher, and all they know is a single "channel url", so to say, on which to come for messages.
Additional requirements:
Messages to the control channel must wake up workers from the poll (select, whatever)
loop.
Message delivery must be guaranteed, and it must reach each and every worker that is listening.
Worker should have a way to monitor for messages without blocking — ideally by the poll/select/whatever loop mentioned above.
Ideally, worker process should be "server" in a sense — he should not bother about keeping connections to the "channel server" (if any) persistent etc. — or this should be done transparently by the framework.
Usually such a pattern requires a proxy for the publisher, i.e. you send to the proxy which immediately accepts delivery and then that reliably forwads to the end subscriber workers. The ZeroMQ guide covers a few different methods of implementing this.
http://zguide.zeromq.org/page:all
Given your requirements, Steve's suggestion does seem the simplest: run a daemon which listens on two known sockets - the workers connect to that and the command tool pushes to it which redistributes to connected workers.
You could do something complicated that would probably work, by effectively nominating one of the workers. For example, on startup workers attempt to bind() a PUB ipc:// socket somewhere accessible, like tmp. The one that wins bind()s a second IPC as a PULL socket and acts as a forwarder device on top of it's normal duties, the others connect() to the original IPC. The command line tool connect()s to the second IPC, and pushes it's message. The risk there is that the winner dies, leaving a locked file. You could identify this in the command line tool, rebind then sleep (to allow the connections to be established). Still, that's all a little bit complex, I think I'd go with a proxy!
I think what you're describing would fit well with a gearmand/supervisord implementation.
Gearman is a great task queue manager and supervisord would allow you to make sure that the process(es) are all running. It's TCP based too so you could have clients/workers on different machines.
http://gearman.org/
http://supervisord.org/
I recently set something up with multiple gearmand nodes, linked to multiple workers so that there's no single point of failure
edit: Sorry - my bad, I just re-read and saw that this might not be ideal.
Redis has some nice and simple looking pub/sub functionality that I've not used yet but sounds promising.
Use a mulitcast PUB/SUB. You'll have to make sure the pgm option is compiled into your ZeroMQ distribution (man 7 zmq_pgm).

How to avoid flooding a message queue?

I'm working on an application that is divided in a thin client and a server part, communicating over TCP. We frequently let the server make asynchronous calls (notifications) to the client to report state changes. This avoids that the server loses too much time waiting for an acknowledgement of the client. More importantly, it avoids deadlocks.
Such deadlocks can happen as follows. Suppose the server would send the state-changed-notification synchronously (please note that this is a somewhat constructed example). When the client handles the notification, the client needs to synchronously ask the server for information. However, the server cannot respond, because he is waiting for an answer to his question.
Now, this deadlock is avoided by sending the notification asynchronously, but this introduces another problem. When asynchronous calls are made more rapidly than they can be processed, the call queue keeps growing. If this situation is maintained long enough, the call queue will get totally full (flooded with messages). My question is: what can be done when that happens?
My problem can be summarized as follows. Do I really have to choose between sending notifications without blocking at the risk of flooding the message queue, or blocking when sending notifications at the risk of introducing a deadlock? Is there some trick to avoid flooding the message queue?
Note: To repeat, the server does not stall when sending notifications. They are sent asynchronously.
Note: In my example I used two communicating processes, but the same problem exists with two communicating threads.
If the server is sending informational messages to the client, which you yourself say are asynchronous, it should not have to wait for a reply from the client. If they are not informational, in other words they require an answer, I would say a server should never send such messages to a client, and their presence indicates a poor design.
If you have a constant congestion problem, there is little you can do other than gracefully fail and notify the client that no new messages can be posted; then it is up to the client to maintain a backlog of messages to be posted.
Introducing a priority queue and using message expiration/filtering could allow you to free up space in the queue, but that really just postpones the problem. If possible, you could also aggregate messages or ignore duplicate messages, but again the problem does not seem to be the queue itself. (Not to mention that the more complex queue logic could eat up valuable resources that would be better used actually processing messages.)
Depending on what the server side does, you could introduce result hashing for long computations, offload some types of messages to a dedicated device, check if the server waits unreasonably long for I/O operations, and a myriad of other techniques. Profile if possible, at least try to find out which message(s) causes congestion.
Oh, and the business solution: Compare cost of estimated development time to the cost of better hardware and conclude that you should just buy a more powerful server (or an additional one).
Depending on how important these messages are you might want to look into Message Expiration, or perhaps a Message Filter, though it sounds like your architecture may be incorrect.
I would rather fix the logic in the server side. The message queue should not stall waiting for the answer. Rather have a state machine which can also receive those info queries while it is waiting for the answer from the client.
Of course you can still flood your message queue, but with TCP you can handle it pretty easily.
The best way, I believe, would be to add another state to your client. This I borrowed from the SMPP protocol specs.
Add a congestion state to the client, whereby it always checks the queue length, assuming this is possible, and therefore once a certain threshold is attained, say 1000 unprocessed messages, the client sends the server a message indicating that it's congested and the server will be required to cease all messaging until it receives a notification indicating that the client is no longer congested.
Alternatively, on the server side, if there is a certain number of pending replies, the server could simply cease sending messages until the client replies a certain number of them.
These thresholds can be dynamically calculated or fixed, depending.....

Resources