I'll post as much as possible here and I'll add more info upon your requests in the comments
I have a mule flow that takes SNMP packets over UDP (Inbound UDP Endpoint) and then passes the message to a transformer that transforms the byte array (The packet) into an Trap object then if this trap of some specific kind I'll just log it and ignore it, otherwise, it will be logged and inserted into DB in addition to some updates (The figure below illustrates the my flow).
The application will be listening on port 17985 for coming SNMP traps from an agent, now if the agent have lot of traps the UDP socket's Recv-Q will be like in the figure below and the UDP endpoint will stop to log any events (Traps to DB nor Log messages to log file)
What did I do?
I tried to increase endpoint buffer size to be 10MB.
I tried to increase system buffer size to 25MB but no matter how much is it, it will be filled up as soon as the agent starts to get crazy
Additional Info
The agent may sometimes send up to 400~600 traps per second.
Trap's packet size is ~1500 bytes max.
The database is very fast, no more optimization is needed there I think.
Related
I'm using TCP over a very lossy network system (almost 20% drop rate) but one with extremely low latency (<2ms). And the default TCP implementation on our Linux system is atrocious. Sometimes it waits 5-6 seconds before re-transmitting packets. On the other side, our TCP stack just retries every 20ms, and it's fine.
I can't find any way to manually incur a re-transmission, even with TCP_NODELAY and sending no data aggressively. Also, there do not appear to be per-socket configurations for this. As we only want to change the timing for specific sockets (the ones on this network).
Is there any kernel feature for either manually re-transmitting with TCP or aggressively set the timers such to allow many retries per second?
(I am aware this is a similar (but not the same)) issue as the person in this thread: Application control of TCP retransmission on Linux -- but I do not want to close the connection, like TCP_USER_TIMEOUT just make it keep retrying, a lot.
TCP_NODELAY is an entirely unrelated options - it disables Nagle's algorithm for consolidating packets - it is used on interactive connection such as ssh where it is desirable to have every single character in a separate packet (to get an immediate remote echo).
Normally, the TCP layer will continuously monitor the RTT (Round-Trip Time) for receiving an ACK and will readjust its timeout. This is called Karn's Algorithm.
AFAIK, There are no tunables in the Linux kernel when it comes to retransmissions because this is something that is specified by the RFCs and it is embedded in the protocol.
I suggest you read this introduction: https://www.catchpoint.com/blog/tcp-rtt and then to capture a packet dump containing those exceptionally long retransmissions and post them here.
Also, are you sure that the packet loss is the same in both directions? This can be an explanation.
I did some search on the question, but it seems like people only emphasize on Non-blocking IO.
Let's say if I just have a very simple application to respond "Hello World" text to the client, it still needs time to finish the execution, no matter how quick it is. What if there are two request coming in at exactly the same time, how does Node.js make sure both requests will be processed with one thread?
I read the blog Understanding the node.js event loop which says "Of course, on the backend, there are threads and processes for DB access and process execution". That statement is regarding IO, but I also wonder if there is separate thread to handle the request queue. If that's the case, can I say that the Node.js single thread concept only applies to the developers who build applications on Node.js, but Node.js is actually running on multi-threads behind the scene?
The operating system gives each socket connection a send and receive queue. That is where the bytes sit until something at the application layer handles them. If the receive queue fills up no connected client can send information until there is space available in the queue. This is why an application should handle requests as fast as possible.
If you are on a *nix system you can use netstat to view the current number of bytes in the send and receive queues. In this example, there are 0 bytes in the receive queue and 240 bytes in the send queue (waiting to be sent out by the OS).
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 240 x.x.x.x:22 x.x.x.x:* LISTEN
On Linux you can check the default size and max allowed size of the send/receive queues with the proc file system:
Receive:
cat /proc/sys/net/core/rmem_default
cat /proc/sys/net/core/rmem_max
Send:
cat /proc/sys/net/core/wmem_max
cat /proc/sys/net/core/wmem_default
My Linux application needs to receive a single UDP flow with modestly-sized packets (~1 KB) at a rate on the order of ~600,000 packets per second. My current implementation is naive: it has a single thread that simply calls recv() repeatedly, placing the received data in a queue to be processed by another thread. Therefore, the receiver thread is only in charge of pulling in the packets.
In some initial testing that I've done, I'm only able to receive between 200,000-300,000 packets per second before the thread reaches full utilization of its CPU core. This obviously isn't good enough to meet the goal of ~600,000 packets per second.
Ideally, I would find some way of distributing the packet reception load across multiple threads. In looking for a solution to the problem, I came across the SO_REUSEPORT socket option, which allows multiple TCP/UDP threads to be bound to the same IP/port combination. At first, this seemed to be exactly what I wanted.
However, the article also points out this detail:
Incoming connections and datagrams are distributed to the server sockets using a hash based on the 4-tuple of the connection—that is, the peer IP address and port plus the local IP address and port. This means, for example, that if a client uses the same socket to send a series of datagrams to the server port, then those datagrams will all be directed to the same receiving server (as long as it continues to exist). This eases the task of conducting stateful conversations between the client and server.
Therefore, if I only have a single UDP flow, the above hashing implementation would yield all of the packets being directed to the same receiver thread, thwarting my attempt at parallelizing the work. Therefore, the question is: is there a way to receive a single flow of UDP packets from multiple threads, using SO_REUSEPORT or some other mechanism?
Note that my application can handle reordering of packets; the protocol that the datagrams are formatted with contains sequencing information that I can use to reorder them properly afterward.
If you didn't find the solution for last 3 years take a look at SO_ATTACH_REUSEPORT_CBPF. We had exactly the same issue and we solved it by attaching simple BPF program which distributes datagrams randomly mod n.
I'm working on an application where I need to ensure that even if the network goes down, messages will still arrive at their destination reliably, in-order, and unmodified. I've been using TCP, and up until now, I was just using a strategy of:
If a send/receive fails, do it again until no error.
If the remote disconnects, wait until the next connection and replace the socket I was send/receiving from with this new one (achieved through some threading and blocking to ensure it's swapped cleanly).
I recently realised that this doesn't work, as send can't report errors indicating that the remote hasn't received the message (cite eg. here).
I did also learn that TCP connections can survive brief network outages, as the kernel buffers the packets until the connection is declared dead after the timeout period (cite.
here).
The question: Is it a feasible strategy to just crank the timeout period waaaay higher on both client/server side (using setsockopt and the SO_KEEPALIVE options), so that a connection "never times out"? I'd have to handle errors related to the kernel's buffer filling up, but that should be relatively simple.
Are there any other failure cases?
If both ends doesn't explicitly disconnect, the tcp connection will stay open forever even if you unplug the cable. There is no timeout in TCP.
However, I would use (or design) an application protocol on top of tcp, making it possible to resume data transmission after re-connects. You may use HTTP for example.
That would be much more stable because depending on buffers would, as you say, at some time exhaust the buffers but the buffers would also being lost on let's say a power outage.
I have some troubles understanding send (2) syscall on my linux x86 box.
Consider I established an SSH connection in my app with the other host in LAN. Then I put down the network (e.g. unplug the cable) and call the function (from my app) that sends some SSH packets trough the connection. This function inside calls send like
w = send(s->fd_out,buffer, len, 0);
In debugger I found that send returns len (i.e. w == len after the call).
How this can be if network is unreachable? When I call netstat it says my SSH connection is in state ESTABLISHED even though the network is down.
Can't understand why send executes normally and don't return any error (like EPIPE or ECONNRESET). May be an SSH connection lives some time after the network put down?
Thanks to all.
It's due to the implementation of TCP (and ssh uses TCP). Your send() just writes to a socket, which is just a file descriptor, and return means this operation is successful. It doesn't mean the data has been sent. A file descriptor is just some pointer with state for kernel after all. It's implemented in the kernel to keep TCP state a bit longer before failing a session. In fact, kernel is allowed to indefinitely keep this session until you explicitly call close() or kill your process. So your data is actually buffered in kernel space for network card to deliver it later.
Here is a quick experiment you can do:
Write a server that keeps receiving messages after establishing a connection
socket();
bind();
listen();
while (1) {
accept();
recv();
}
Write a client establishes a connection, takes cin inputs, and send a message to server whenever you hit return.
socket();
connect();
while (1) {
getline();
send();
}
Be careful that you NEVER call close() in while loop on either side. Now, if you unplug your cable AFTER you've established a connection, send a message, reconnect again, and send another message, you will find both messages on the server side.
What you will NEVER observe is that you receive the second message before the first one. You either lose them all, or receive them in order.
Now let me explain why it behaves like this. This is the state diagram of a TCP session.
https://dl.dropbox.com/u/17011409/TCP_State.png
You can see clearly that until you explicitly call close(), the connection will always be in established state. That's expected behavior of TCP. Establishing TCP connection is expensive, and keeping a session alive is good for performance. (That's partially how those TCP DOS works. Attackers keep establishing connections until server runs out of resources to keep TCP state information.)
In this state, your send() will be delegated to kernel for actual sending. TCP guarantees in-order, reliable delivery, but network can lose packets at any time. So TCP HAVE TO buffer your packets, and keep trying. There are algorithms to throttle this retry, but it's buffered for quite a very long time before it declares failure. The default time out to assume a packet loss is 3 seconds in Linux. But after a loss, TCP will retry. Then try again after certain seconds. The fact you unplugged your cable is just the same situation as a packet loss along the way to the destination. Once you plug in your cable again, a retry succeeds, and TCP will start sending remaining messages in order.
I know I must have failed to explain it thoroughly. You really need to know the details of TCP to reason about this behavior. It's required for the properties TCP is giving you. And it's not acceptable to expose internal implementations to programmer. (How about a send call that sometimes returns within milliseconds, and sometimes returns after 10 seconds? I bet no one will want this performance bomb in their code. The point of having a TCP library is exactly to hide this ugly nature of networks.) In fact, you even need to understand multiple RFCs and algorithms of how TCP realize in-order reliable delivery over a lossy network. Congestion control comes into the play of how long the buffer will be there as well. Wikipedia is a good starting point, but it's a full semester's undergraduate course if you really want to understand the details.
With a zero flags argument, send() is equivalent to write(2). And it will write your data on file descriptor (stores in kernel space to deliver).
You have to use other types of flag: MSG_CONFIRM may help you.