TCP or UDP? Delays building up on production for video stream - node.js

I am creating a video stream from a camera with FFmpeg and the Node.js stream Duplex class.
this.ffmpegProcess = spawn('"ffmpeg"', [
  '-i', '-',
  '-loglevel', 'info',
  /**
   * MJPEG Stream
   */
  '-map', '0:v',
  '-c:v', 'mjpeg',
  '-thread_type', 'frame', // suggested for performance on StackOverflow
  '-q:v', '20',            // force image quality: 2-31, where 2 = best, 31 = worst
  '-r', '25',              // force framerate
  '-f', 'mjpeg',
  '-',
], {
  shell: true,
  detached: false,
});
On the local network we are testing it with a few computers, and everything works very stably with a latency of at most 2 seconds.
However, we have deployed the service to AWS production. When the first computer connects we get a latency of around 2 seconds, but as soon as another client connects to the stream, both clients start lagging badly: the latency builds up to 10+ seconds and, in addition, the video itself becomes very slow, as in the motion between frames.
The reason I am asking about TCP vs. UDP is that we are using TCP for the stream, which means every packet that is sent is subject to TCP's acknowledgement and sequencing (on top of the initial SYN, SYN-ACK, ACK handshake).
My question is: could TCP really cause such an issue that the delay grows to 10 seconds with very slow motion, given that on the local network it works just fine?

Yes, TCP is definitely not the correct protocol for this. It can be used, but only with some modification on the source side. Sadly, UDP is not a magic bullet either: without additional logic it will not solve the problem (unless you don't mind seeing broken frames randomly assembled from pieces of other frames).
Explanation
The main features of TCP are that packets are delivered in the correct order and that no packet is lost. These two features are very useful, but quite harmful for video streaming.
On a local network, bandwidth is large and packet loss is very low, so TCP works fine. On the internet, bandwidth is limited and every time the limit is hit, packets are lost. TCP deals with packet loss by resending the packet, and each resend extends the delay of the stream.
For better understanding, imagine that a packet contains a whole frame (a JPEG image). Let's assume the normal delay of the link is 100 ms (the time a frame is in transit). For 25 FPS you need to deliver a frame every 40 ms. If a frame is lost in transit, TCP makes sure a copy is resent. In the ideal case TCP can detect and fix this within two times the link delay, i.e. 2 x 100 ms (in reality it is more, but I keep it simple here). So while a single missing frame is being recovered, 5 frames pile up in the receiver's queue, waiting for it. In this example, one missing frame causes 5 frames of delay. And because TCP keeps a queue of packets, the delay is never reduced; it can only grow. Even in the ideal situation where bandwidth is sufficient, the delay stays the same.
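To make the arithmetic concrete, here is the same calculation as a tiny snippet (the numbers are the idealized ones from the example above, not measurements):

// Idealized numbers from the example above.
const linkDelayMs     = 100;              // one-way transit time of a frame
const frameIntervalMs = 1000 / 25;        // 40 ms between frames at 25 FPS
const recoveryMs      = 2 * linkDelayMs;  // ideal time to detect the loss and resend
const framesQueued    = recoveryMs / frameIntervalMs;
console.log(framesQueued);                // 5 frames end up waiting behind one lost frame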
How to fix it
What I did in Node.js was fix the source side. Packets in a TCP stream can only be skipped if the source skips them; TCP itself has no way to do it.
For that purpose, I used the drain event.
The idea behind the algorithm is that ffmpeg generates frames at its own speed, while Node.js reads the frames and always keeps the last one received. It also has an outgoing buffer the size of a single frame. So if sending a frame is delayed by network conditions, incoming images from ffmpeg are silently discarded (this compensates for low bandwidth), except for the last one received. When the TCP buffer signals (via the drain event) that a frame has been sent, Node.js takes the last received image and writes it to the stream.
This algorithm regulates itself. If the sending bandwidth is sufficient (sending is faster than ffmpeg generates frames), no frame is discarded and 25 fps is delivered. If the bandwidth can transfer only half of the frames, on average every second frame is discarded, so the receiver sees 12.5 fps, but the delay does not grow.
Probably the most complicated part of this algorithm is correctly slicing the byte stream into frames.
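For illustration, here is a minimal sketch of the approach described above, not the original code. It assumes ffmpegProcess and clientSocket refer to the child process and TCP socket from the question, relies on MJPEG frames being delimited by the JPEG SOI (0xFFD8) and EOI (0xFFD9) markers, and uses the return value of write() together with the drain event for backpressure:

const SOI = Buffer.from([0xff, 0xd8]);  // JPEG start-of-image marker
const EOI = Buffer.from([0xff, 0xd9]);  // JPEG end-of-image marker

let pending = Buffer.alloc(0);  // bytes from ffmpeg not yet sliced into frames
let latestFrame = null;         // most recently completed JPEG frame
let readyToSend = true;         // false while the socket's buffer is still full

ffmpegProcess.stdout.on('data', (chunk) => {
  pending = Buffer.concat([pending, chunk]);

  // Slice complete JPEG frames out of the byte stream.
  let start = pending.indexOf(SOI);
  let end = pending.indexOf(EOI, start + 2);
  while (start !== -1 && end !== -1) {
    latestFrame = pending.slice(start, end + 2);  // keep only the newest frame
    pending = pending.slice(end + 2);
    start = pending.indexOf(SOI);
    end = pending.indexOf(EOI, start + 2);
  }

  trySend();
});

clientSocket.on('drain', () => {  // the previous frame has left the outgoing buffer
  readyToSend = true;
  trySend();
});

function trySend() {
  if (!readyToSend || latestFrame === null) return;
  const frame = latestFrame;
  latestFrame = null;                       // older frames are silently dropped
  readyToSend = clientSocket.write(frame);  // false => wait for the 'drain' event
}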

On either the server or a client, run a Wireshark trace; this will tell you the exact "delay" of the packets and which side is not doing the right thing. It sounds like the server is not living up to your expectations. Make sure the stream is UDP.

This might not be relevant for your use case, but this is where an HTTP-based content delivery system, like HLS, is very useful.
UDP or TCP, trying to stream data in real time across the internet is hard.
HLS will spit out files that can be downloaded over HTTP (which is actually quite efficient when serving large files, and preserves the integrity of the video because the transport has built-in error detection and retransmission), and re-assembled in order on the client side by the player.
It adapts easily to streams of different quality (part of the protocol), which lets the client decide how quickly it consumes the data.
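As a rough sketch only (not a tuned configuration; the codec choice, segment length and output path are assumptions), ffmpeg could be asked to produce an HLS playlist from the same input, which any static HTTP server can then serve:

// Hypothetical sketch: have ffmpeg write an HLS playlist plus segments.
const { spawn } = require('child_process');

const hls = spawn('ffmpeg', [
  '-i', '-',                        // same stdin input as in the question
  '-map', '0:v',
  '-c:v', 'libx264',                // HLS players generally expect H.264, not MJPEG
  '-f', 'hls',
  '-hls_time', '2',                 // ~2-second segments
  '-hls_list_size', '5',            // keep a rolling window of 5 segments
  '-hls_flags', 'delete_segments',  // remove old segments from disk
  '/var/www/stream/stream.m3u8',    // playlist path served by the web server
]);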
As an aside, Netflix and the like use Widevine DRM, which is somewhat similar: HTTP-based delivery of the content, although Widevine takes the different quality encodes, stuffs them into one gigantic file, and uses HTTP range requests to switch between quality streams.

Related

When using recv(n) with n greater than the MTU, are you guaranteed to read at least a whole layer 2 frame?

I was wondering: imagine there is no data to read from a TCP socket, and then a whole frame of 1492 bytes arrives (full). If your code (in C or any language supporting TCP) does a recv of, say, 4096 bytes, will the OS guarantee that the recv reads the whole 1492 bytes? Or is it possible that loading the frame into memory and the recv are "interleaved", so that recv may return less?
TCP is a stream-oriented protocol. Data is received in order, but you must not make any assumption about how many times you have to call recv until you have received all your data.
It is up to your application to repeat the calls to recv until you know you have received what you need.
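The question is phrased in terms of C's recv(), but the principle is the same in any language. As a sketch in Node.js terms (matching the stack of the main question), assuming a hypothetical framing where each message is preceded by a 4-byte length prefix, and where socket and handleMessage are placeholders:

// Sketch: accumulate TCP chunks until a complete application-level message
// is available. The 4-byte length prefix is an assumed framing convention;
// socket and handleMessage stand in for your connection and your callback.
let buffered = Buffer.alloc(0);

socket.on('data', (chunk) => {
  buffered = Buffer.concat([buffered, chunk]);

  // A single 'data' event may carry part of a message, or several messages.
  while (buffered.length >= 4) {
    const messageLength = buffered.readUInt32BE(0);
    if (buffered.length < 4 + messageLength) break;  // wait for more bytes

    const message = buffered.slice(4, 4 + messageLength);
    buffered = buffered.slice(4 + messageLength);
    handleMessage(message);
  }
});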
(1) TCP is a stream-oriented protocol. This means that it accepts a stream of data from the upper layer on the sender and returns a stream of data to the upper layer on the receiver. TCP itself receives packets from the IP layer and then reconstructs the stream; that is, at some point the packets cease to exist. In theory it is possible that, somewhere during this reconstruction, only half of an incoming packet has been copied into the buffer, but that seems pretty unlikely to me.
Now, the Linux man page states:
The receive calls normally return any data available up to the requested amount,
I would interpret this as "if one packet has arrived (correctly, in order, etc.), you will get the whole packet's worth of data". But there is no guarantee.
On the other hand, the Windows docs state:
recv will return as much data as is currently available—up to the size of the buffer specified.
This sounds more like a guarantee.
Note, however, that the data will only be returned if the packet has been received correctly and it is the next in-order packet (with the next expected sequence numbers).
(2) Now, the TCP layer works on complete packets; it is actually impossible for it to do any interleaving. Ethernet has a checksum that cannot be computed unless the packet has been received completely, and packets with an incorrect Ethernet checksum should be filtered out by the network card. TCP also has a checksum, which requires all of the packet's data to compute. So if the network card has passed the packet to your OS, the data should be available.
(3) I don't think you can assume that a packet is available immediately after it is received. A pretty common feature of network cards is receive-side offloading (the receive counterpart of TCP segmentation offload), which reconstructs part of the stream and results in the network card passing up one TCP segment that was reassembled from several. There are other mechanisms in place to reduce the number of interrupts, which more or less result in several packets arriving at once. So the more likely situation is that you will see some delay and then receive the data from several packets at once.
The point is, the opposite of what you described is likely to happen. However, I still would not write an application that makes any assumptions about how large a chunk of data is available at a time. This negates the concept of a stream.

Why does TCP/IP speed depend on the size of the data being sent?

When I send small amounts of data (16 bytes and 128 bytes) continuously (using a 100-iteration loop without any inserted delay), the throughput with the TCP_NODELAY setting seems no better than with the normal setting. Additionally, TCP slow start appeared to affect the transmission at the beginning.
The reason I ask is that I want to control a device from a PC via Ethernet. The processing time of this device is around several microseconds, but the huge latency of sending a command affects the entire system. Could you share some ways to solve this problem? Thanks in advance.
Previously, I measured the transfer performance between a Windows PC and a Linux embedded board. To verify TCP_NODELAY, I set up a system with two Linux PCs connected directly to each other, i.e. Linux PC <--> Router <--> Linux PC. The router was used only for the two PCs.
The performance without TCP_NODELAY is shown as follows. It is easy to see that the throughput increases significantly when the data size >= 64 KB. Additionally, when the data size = 16 B, the receive time sometimes drops to 4.2 us. Do you have any explanation for this observation?
The performance with TCP_NODELAY seems unchanged, as shown below.
The full code can be found in https://www.dropbox.com/s/bupcd9yws5m5hfs/tcpip_code.zip?dl=0
Please share your thoughts. Thanks in advance.
I am doing socket programming to transfer a binary file between a Windows 10 PC and a Linux embedded board. The socket libraries are winsock2.h and sys/socket.h for Windows and Linux, respectively. The binary file is copied into an array on Windows before sending, and the received data is stored in an array on Linux.
Windows: socket_send(sockfd, &SOPF->array[0], n);
Linux: socket_recv(&SOPF->array[0], connfd);
I can receive all the data properly. However, it seems to me that the transfer time depends on the size of the data being sent. When the data size is small, the received throughput is quite low, as shown below.
Could you please show me some documents explaining this problem? Thanks in advance.
To establish a TCP connection you need a 3-way handshake: SYN, SYN-ACK, ACK. Then the sender starts to send some data; how much depends on the initial congestion window (configurable on Linux, I don't know about Windows). As long as the sender receives timely ACKs it will continue to send, as long as the receiver's advertised window has space (set with the SO_RCVBUF socket option). Finally, closing the connection also requires a FIN, FIN-ACK, ACK.
So my best guess, without more information, is that the overhead of setting up and tearing down the TCP connection has a huge effect when sending a small number of bytes. Nagle's algorithm (disabled with TCP_NODELAY) shouldn't have much effect as long as the writer is writing quickly. It only prevents sending less-than-full MSS segments, which should increase transfer efficiency in this case, where the sender is simply sending data as fast as possible. The only effect I can see is that the final less-than-full MSS segment might have to wait for an ACK, which again would have more impact on short transfers than on longer ones.
To illustrate this, I sent one byte using netcat (nc) on my loopback interface (which isn't a physical interface, and hence the bandwidth is "infinite"):
$ nc -l 127.0.0.1 8888 >/dev/null &
[1] 13286
$ head -c 1 /dev/zero | nc 127.0.0.1 8888 >/dev/null
A network capture in Wireshark showed that it took a total of 237 microseconds to send one byte, which is a measly 4.2 KB/second. I think you can guess that if I sent 2 bytes, it would take essentially the same amount of time, for an effective rate of about 8.4 KB/second, a 100% improvement!
The best way to diagnose performance problems in networks is to get a network capture and analyze it.
When you run your test with a significant amount of data, for example your biggest test (512 MiB, about 537 million bytes), the following happens.
The data is sent by the TCP layer, which breaks it into segments of a certain length. Let's assume segments of 1460 bytes; that makes about 367,000 segments.
For every segment transmitted there is overhead (control and management data added to ensure good transmission): in your setup, 20 bytes for TCP, 20 for IP, and 16 for Ethernet, for a total of 56 bytes per segment. Note that this is a minimum, not accounting for the Ethernet preamble, for example; moreover, the IP and TCP overhead can sometimes be bigger because of optional fields.
Well, 56 bytes for each of the 367,000 segments means that when you transmit 512 MiB, you also transmit 56 * 367,000 = 20 million bytes on the line. The total becomes 536 + 20 = 556 million bytes, or about 4,448 million bits. If you divide this number of bits by the elapsed time, 4.6 seconds, you get a bitrate of about 966 megabits per second, which is higher than what you calculated without taking the overhead into account.
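As a quick sanity check of that arithmetic (the segment payload and header sizes are the ones assumed in this answer, not measured values):

// Rough effective-bitrate estimate including per-segment header overhead.
const payloadBytes   = 512 * 1024 * 1024;   // 512 MiB ~= 536.9 million bytes
const segmentPayload = 1460;                // assumed TCP payload per segment
const headerBytes    = 56;                  // TCP (20) + IP (20) + Ethernet (16)
const elapsedSeconds = 4.6;                 // measured transfer time

const segments  = Math.ceil(payloadBytes / segmentPayload);  // ~367,700 segments
const wireBytes = payloadBytes + segments * headerBytes;     // ~557 million bytes
const mbps      = (wireBytes * 8) / elapsedSeconds / 1e6;

console.log(mbps.toFixed(0) + ' Mbit/s');   // ~970 Mbit/s, close to the figure above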
From the above calculation, it seems that your Ethernet is gigabit. Its maximum transfer rate is 1,000 megabits per second, and you are getting really close to it. The rest of the gap is due to overhead we didn't account for and to latencies that are always present and tend to be amortized as more data is transferred (but they never disappear completely).
I would say that your setup is fine, but this holds for big data transfers. As the size of the transfer decreases, the per-packet overhead, protocol latencies and other such things become more and more important. For example, if you transmit 16 bytes in 165 microseconds (the first of your tests), the result is 0.78 Mbps; if it took 4.2 us, about 40 times less, the bitrate would be about 31 Mbps (40 times higher). These numbers are lower than expected.
In reality you don't transmit 16 bytes, you transmit at least 16 + 56 = 72 bytes, which is 4.5 times more, so the real transfer rate of the link is also higher. But, you see, transmitting 16 bytes over a TCP/IP link is like measuring the flow rate of an empty aqueduct by dripping a few drops of water into it: the drops get lost before they reach the other end. This is because TCP/IP and Ethernet are designed to carry much more data, with reliability.
Comments and answers on this page point out many of the mechanisms that trade bitrate and reactivity for reliability: the 3-way TCP handshake, Nagle's algorithm, checksums and other overhead, and so on.
Given the design of TCP/IP and Ethernet, it is quite normal that performance is not optimal for small amounts of data. Your tests show that the transfer rate climbs steeply when the data size reaches 64 KB; this is not a coincidence.
From a comment you left above, it seems that you are looking for low-latency communication rather than high bandwidth. It is a common mistake to confuse these different kinds of performance. In this respect, I must also say that TCP/IP and Ethernet are completely non-deterministic. They are quick, of course, but nobody can say exactly how quick, because there are too many layers in between. Even in your simple setup, if a single packet gets lost or corrupted, you can expect delays of seconds, not microseconds.
If you really want low latency you should use something else, for example a CAN bus. Its design is exactly what you want: it transmits small amounts of data with high speed, low latency and deterministic timing (just microseconds after you transmit a packet you know whether it has been received; to be more precise, exactly at the end of the transmission of a packet you know whether it reached the destination).
TCP sockets typically have an internal buffer. In many implementations, the stack waits a little before sending a packet, to see if it can fill the remaining space in the buffer first. This is called Nagle's algorithm. I assume that the times you report above are not due to overhead in the TCP packet, but to the fact that TCP waits for you to queue up more data before actually sending.
Most socket implementations therefore have a parameter or function called something like TcpNoDelay, which can be false (the default) or true. I would try toggling that and seeing whether it affects your throughput. Essentially this flag enables/disables Nagle's algorithm.
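For example, in Node.js (the stack of the main question) this option is exposed as socket.setNoDelay(); a minimal sketch, with the host, port and payload being placeholders:

// Sketch: disable Nagle's algorithm on a Node.js TCP socket so that small
// writes go out immediately instead of being coalesced.
const net = require('net');

const socket = net.connect({ host: '192.168.0.10', port: 5000 }, () => {
  socket.setNoDelay(true);            // equivalent of setting TCP_NODELAY
  socket.write(Buffer.from([0x01]));  // a small command is sent right away
});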

Finding out the number of dropped packets in raw sockets

I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds in capturing all the packets on the socket. I am worried that the receive buffer for this socket occasionally fills up (due to traffic bursts) and that some packets are dropped.
How do I know if packets were dropped due to lack of space in the socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes currently stored in the receive buffer for that socket, but I cannot tell whether any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.
I was having a similar problem to yours. I knew that tcpdump was able to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the tcpdump code, I noticed that it does not generate those statistics itself but uses the libpcap library to get them. libpcap, in turn, gets those statistics via the if_packet.h header and the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
Interact somehow with the Linux header files from my Python script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap, which is pypcap, to get that information.
Since I had no clue how to do the first, I implemented the second option. Here is an example of how to get packet statistics using pypcap and how to parse the packet data using dpkt:
import pcap
import dpkt
import socket

pc = pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)

def packet_handler(ts, pkt):
    # print packet statistics: packets received, packets dropped,
    # packets dropped by the interface
    print pc.stats()
    # example packet parsing using dpkt
    eth = dpkt.ethernet.Ethernet(pkt)
    if eth.type != dpkt.ethernet.ETH_TYPE_IP:
        return
    ip = eth.data
    layer4 = ip.data
    ipsrc = socket.inet_ntoa(ip.src)
    ipdst = socket.inet_ntoa(ip.dst)

pc.loop(0, packet_handler)
The tpacket_stats structure is defined in the linux/if_packet.h header file.
Create a variable of type tpacket_stats and pass it to getsockopt with the PACKET_STATISTICS option (at the SOL_PACKET level); this will give you the received and dropped packet counts.
-- sometimes drops can be caused by the buffer size
-- so if you want to decrease the drop count, try increasing the receive buffer size with the setsockopt function
First off, switch your operating system.
You need a reliable, network-oriented operating system, not some pink fluffy "ease of use" distribution with "security" functionality enabled. NetBSD or Gentoo/Arch Linux (the bare installations, not the GUI-kitted ones).
Start a simultaneous tcpdump on a network tap, capture the traffic you're supposed to receive alongside your program, and compare the results.
There's no efficient way to check on the receiving end whether you've received all the packets you intended to, since packets might be dropped at a lower level than you anticipate.
Also, this is more of a question for the Unix Stack Exchange; there's no programming here that I can see, at least there's no code.
The only certain way to verify packet drops is to have a much beefier sender (perhaps a farm of machines sending packets) to a single client, and to record every packet sent to your receiver. Then analyze the statistics, compare them against your sender's, and see how much you dropped.
A cheaper way is to buy a network tap, or, even more ad hoc, to enable port mirroring on your switch if possible. This lets you dump as much traffic as possible onto a second machine.
This will give you a more accurate result, because your application machine is already busy handling the incoming traffic and processing it.
Furthermore, this is why network taps are effective: they split the communication into two channels, the receiving and sending directions of your traffic, if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port you get more accurate traffic mirroring).
So either use port mirroring, or buy a dedicated network tap.

Peer to Peer Game Networking

I have made a game and roughly implemented P2P networking. I currently send packets every 20th of a second. At the moment I send one packet per NPC telling the client its current position, so with 20 NPCs that is 20 packets being sent every 20th of a second.
My question is: should I instead send one packet every 20th of a second containing all of the current NPC positions? And if so, is there a maximum size this packet should be? Any sources on peer-to-peer game networking are also welcome.
That depends on which protocol you're using, UDP or TCP. In gaming, especially FPSs, UDP tends to be the chosen protocol because it's much less "chatty" than TCP. The trade-off is that UDP can lose packets, because delivery is not guaranteed.
With that in mind, you should send all the position data in each single packet. That way, if you lose a packet here or there, the game can go on and you will stay mostly in sync.
As for the packet size, keep it as small as possible. You definitely want to stay away from needing multiple packets for a single update, especially with UDP. Your data set does not sound like it would be large, but I'm just guessing.
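As an illustration only (the NPC list, port and packing format are made-up placeholders, not from the question), here is how all NPC positions could be packed into a single UDP datagram per tick in Node.js:

// Sketch: pack every NPC's position into one datagram per tick.
// npcs, PEER_HOST and PEER_PORT are hypothetical placeholders.
const dgram = require('dgram');
const socketUdp = dgram.createSocket('udp4');

const npcs = [{ id: 1, x: 10.5, y: 3.2 }, { id: 2, x: 42.0, y: 7.7 }];
const PEER_HOST = '192.168.0.20';
const PEER_PORT = 41234;

setInterval(() => {
  // 2 bytes of count + 12 bytes per NPC (id, x, y) keeps 20 NPCs well
  // under a typical safe datagram size of ~1200-1400 bytes.
  const buf = Buffer.alloc(2 + npcs.length * 12);
  buf.writeUInt16BE(npcs.length, 0);
  npcs.forEach((npc, i) => {
    const off = 2 + i * 12;
    buf.writeUInt32BE(npc.id, off);
    buf.writeFloatBE(npc.x, off + 4);
    buf.writeFloatBE(npc.y, off + 8);
  });
  socketUdp.send(buf, PEER_PORT, PEER_HOST);
}, 50); // every 20th of a second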

Web Audio Api Realtime streaming PCM ADPCM

I have a server that passes the client PCM or ADPCM data.
I initially decided to use PCM because I did not want to deal with encoding and decoding.
I got PCM to work; however, between each chunk of audio I heard glitches (sort of like clipping).
So I thought maybe the reason was latency / high-quality audio and all that stuff.
So I decided to use ADPCM to reduce the amount of data. I wrote an ADPCM-to-PCM decoder in JavaScript; it was a hassle. I was hoping that, since the amount of data was reduced, the glitches would stop (the data would catch up with what is being played).
But I was wrong. I still get the glitches.
Can this even be done with TCP, or is it a lost cause? I don't have UDP over WebSockets.
Do I need to implement a buffering algorithm? I don't want to, as it is real-time audio and I just want to process it as fast as I can.
Do you know a good link to read about real-time audio over the web?
I can give code examples, but this is a high-level question.
PS: I tried to use tabs, but we get a buffering issue and we can't control it.
I also don't get any flow control from the server. It does not say that audio started, stopped or paused.
It is a push protocol, and all I get is ADPCM and PCM data.
Yes, of course you can use TCP. UDP is often used in telephony applications as the lower overhead makes everything faster, and for this application it doesn't matter if packets are dropped or arrive in the wrong order. But since UDP isn't an option, you can use TCP.
As you suspected, it seems to me that your problem is buffer underruns. Either your connection to the server is not fast enough (or at least not consistently fast enough), or you are not providing data from the encoder at a fast enough rate. This can happen if you are recording data in real time and trying to play it back in real time.
A solution is to buffer data server-side before sending it to the client. Make the buffer as large as your latency requirements allow. For internet radio purposes I usually pick a 30-second buffer, as latency doesn't matter there. For your purposes you will probably want a buffer of at least 64 KB, which is the maximum size allowed for a TCP packet. That packet will get fragmented along the way, but that is okay.
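A minimal sketch of that kind of server-side buffering in Node.js (the 64 KB threshold comes from the paragraph above; encoderStream and clientSocket are placeholder names):

// Sketch: accumulate encoder output server-side and flush it to the client
// in larger chunks, instead of writing every tiny chunk immediately.
const FLUSH_THRESHOLD = 64 * 1024;   // buffer size suggested above
let audioBuffer = Buffer.alloc(0);

encoderStream.on('data', (chunk) => {      // PCM/ADPCM chunks from the encoder
  audioBuffer = Buffer.concat([audioBuffer, chunk]);
  if (audioBuffer.length >= FLUSH_THRESHOLD) {
    clientSocket.write(audioBuffer);       // one larger write to the client
    audioBuffer = Buffer.alloc(0);
  }
});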
You might also look into how your server is sending data. Experiment with disabling Nagle's algorithm so that your server isn't waiting for ACKs before sending more small chunks of data.
