Linux raw datalink layer socket only returns partial packet (96 bytes)

In my application, I am receiving packets at the data link layer using a raw socket (type PF_PACKET, SOCK_RAW). What I am finding is that I only get the first 96 bytes of any packet. I'm assuming there is some option somewhere that is preventing me from receiving the entire packet, but what?
Here is a snippet from my code:
int sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP));
int nFlags = fcntl(sock, F_GETFL, 0); // make the socket non-blocking
fcntl(sock, F_SETFL, nFlags | O_NONBLOCK);
int nBytesRead = read(sock, (char *) buf, 1500);
nBytesRead is never more than 96, even though my network sniffer shows longer packets. This is uClinux if that makes a difference.
I found someone else with the same problem at http://www.network-builders.com/raw-socket-captures-only-first-96-bytes-packet-t57283.html but no answers there.

Solved it! What I failed to mention in my original post was that I was attaching a filter to the raw socket so it would only receive traffic on certain TCP/IP ports. This filter code was created with tcpdump, which apparently limits capture to 96 bytes by default. I had to add the -s0 option to my tcpdump command line to tell it to capture everything:
tcpdump -dd -s0 "ip and tcp and dst port 60001"
With that change, it now gives me the full packet. Thanks to this blog post for the clue.
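For reference, attaching such a filter typically looks like the sketch below (the filter array is a placeholder for the { code, jt, jf, k } lines that tcpdump -dd prints, and sock is the raw socket from the snippet above):
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog */
#include <stdio.h>
#include <sys/socket.h>

/* Placeholder: paste the output of `tcpdump -dd -s0 "..."` here. */
struct sock_filter bpf_code[] = {
    { 0x28, 0, 0, 0x0000000c },  /* example instruction only; replace with your generated filter */
};

struct sock_fprog bpf = {
    .len    = sizeof(bpf_code) / sizeof(bpf_code[0]),
    .filter = bpf_code,
};

/* Attach the classic BPF program to the raw socket. */
if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf)) < 0)
    perror("SO_ATTACH_FILTER");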
Hope this helps someone else in the future.

Related

How to set linux kernel not to send RST_ACK, so that I can give SYN_ACK within raw socket

I want to ask a classic question about raw socket programming and Linux kernel TCP handling. I've researched similar threads such as linux raw socket programming question, How to reproduce TCP protocol 3-way handshake with raw sockets correctly?, and TCP ACK spoofing, but still can't find a solution.
I am trying to make a server which doesn't listen on any port, but sniffs SYN packets from remote hosts. After the server does some calculation, it sends back a SYN_ACK packet for the corresponding SYN packet, so that I can create the TCP connection manually, without involving the kernel's handling. I've created a raw socket and sent the SYN_ACK over it, but the packet never gets through to the remote host. When I run tcpdump on the server (Ubuntu Server 10.04) and Wireshark on the client (Windows 7), the server returns RST_ACK instead of my SYN_ACK packet. After doing some research, I learned that we cannot preempt the kernel's TCP handling.
Are there still any other ways to hack or configure the kernel so it doesn't respond with RST_ACK to those packets?
I've added a firewall rule on the server's local IP to suggest to the kernel that something behind the firewall might be waiting for the packet, but still no luck.
Did you try to drop RST using iptables?
iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
should do the job for you.
I recommend using iptables, but since you ask about hacking the kernel as well, here is an explanation of how you could do that (I'm using kernel 4.1.20 as reference):
When a packet is received (as a sk_buff), the IP layer hands it to the handler of the registered transport protocol:
static int ip_local_deliver_finish(struct sock *sk, struct sk_buff *skb)
{
    ...
    ipprot = rcu_dereference(inet_protos[protocol]);
    if (ipprot) {
        ...
        ret = ipprot->handler(skb);
Assuming the protocol is TCP, the handler is tcp_v4_rcv:
static const struct net_protocol tcp_protocol = {
    .early_demux = tcp_v4_early_demux,
    .handler = tcp_v4_rcv,
    .err_handler = tcp_v4_err,
    .no_policy = 1,
    .netns_ok = 1,
    .icmp_strict_tag_validation = 1,
};
So tcp_v4_rcv is called. It will try to find the socket for the received skb, and if it doesn't find one, it will send a reset:
int tcp_v4_rcv(struct sk_buff *skb)
{
    sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    if (!sk)
        goto no_tcp_socket;

no_tcp_socket:
    if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
        goto discard_it;
    tcp_v4_send_reset(NULL, skb);
    ...
There are many different ways you can hack this. You could go to the xfrm4_policy_check function and hack/change the policy for AF_INET. Or you can simply comment out the line that calls xfrm4_policy_check, so that the code will always go to discard_it, or you can comment out the line that calls tcp_v4_send_reset (which will have more consequences, though).
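For illustration, the last of those edits would look roughly like this in the simplified tcp_v4_rcv excerpt above (a sketch against 4.1.20, not a tested patch):
no_tcp_socket:
    if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
        goto discard_it;
    /* tcp_v4_send_reset(NULL, skb); */  /* suppressed: no RST for unmatched segments */
    goto discard_it;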
Hope this helps.

Linux app sends UDP without socket

Fellow coders,
I'm monitoring my outgoing traffic using the libnetfilter_queue module and an iptables rule:
iptables -I OUTPUT 1 -p all -j NFQUEUE --queue-num 11220
A certain app called Jitsi (which runs on Java) is exhibiting a strange behaviour I haven't encountered before:
My monitoring program, which processes the NFQUEUE packets, clearly shows that UDP packets are being sent out,
yet when I look into /proc/net/udp and /proc/net/udp6, they are empty; moreover, /proc/net/protocols has a "sockets" column for UDP, and it is 0.
But the UDP packets keep getting sent.
Then, after a minute or so, /proc/net/udp and /proc/net/protocols begin to show the correct information about UDP sockets.
And again, after a while, there is no information in them while the UDP packets are still being sent.
My only conclusion is that somehow it is possible for an application to send UDP packets without creating a socket, and/or it is possible to create a socket, then delete it (so that the kernel thinks there are none) and still use some obscure method to send packets.
Could somebody with ideas about such behaviour lend a hand, please?
Two ideas:
Try running the app through strace and take a look at that output (a sample invocation follows the systemtap listing below).
You could also try to run it through systemtap with a filter for the socket operations.
From that link:
probe kernel.function("*@net/socket.c").call {
    printf ("%s -> %s\n", thread_indent(1), probefunc())
}
probe kernel.function("*@net/socket.c").return {
    printf ("%s <- %s\n", thread_indent(-1), probefunc())
}
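For the strace idea, a hypothetical invocation could look like this (the launcher command is an assumption; -f follows child threads and -e trace=network restricts the output to socket-related syscalls):
strace -f -e trace=network java -jar jitsi.jar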
Thank you Paul Rubel for giving me a hint in the right direction. strace showed that the Java app was using IPv6 sockets. I had a closer look at /proc/net/udp6, and there those sockets were. I probably took too cursory a look the first time around, chiefly because I didn't even expect to find them there. This is the first time I have stumbled upon IPv4 packets sent over IPv6 sockets, but that is what Java does.
Cheers.

Discard incoming UDP packet without reading

In some cases, I'd like to explicitly discard packets waiting on the socket with as little overhead as possible. It seems there's no explicit "drop udp buffer" system call, but maybe I'm wrong?
The next best way would probably be to recv the packet into a temporary buffer and just drop it. It seems I can't receive 0 bytes, since the man page says about recv: "The return value will be 0 when the peer has performed an orderly shutdown." So 1 is the minimum in this case.
Is there any other way to handle this?
Just in case - this is not a premature optimisation. The only thing this server is doing is forwarding / dispatching the UDP packets in a specific way - although recv with len=1 won't kill me, I'd rather just discard the whole queue in one go with some more specific function (hopefully lowering the latency).
You can have the kernel discard your UDP packets by setting the UDP receive buffer to 0.
int UdpBufSize = 0;
socklen_t optlen = sizeof(UdpBufSize);
setsockopt(socket, SOL_SOCKET, SO_RCVBUF, &UdpBufSize, optlen);
Whenever you see fit to receive packets, you can then set the buffer to, for example, 4096 bytes.
I'd rather just discard the whole queue in one go
Since this is UDP we are talking about here: close(udp_server_socket) and then socket()/bind() again?
To my understanding, that should work.
man says about recv: the return value will be 0 when the peer has performed an orderly shutdown.
That doesn't apply to UDP. There is no "connection" to shut down in UDP. A return value of 0 is perfectly valid; it just means a datagram with no payload was received (i.e. the IP and UDP headers only).
Not sure whether that helps your problem or not. I really don't understand where you are going with the len=1 stuff.
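If the aim is simply to empty the receive queue, a minimal sketch along these lines might do (assumption: on Linux, a zero-length recv() still consumes one queued datagram per call; worth verifying on your kernel):
#include <errno.h>
#include <sys/socket.h>

/* Discard every datagram currently queued on a UDP socket. */
static void drain_udp(int fd)
{
    /* MSG_DONTWAIT makes each call non-blocking; recv() returns -1 with
       errno set to EAGAIN/EWOULDBLOCK once the receive queue is empty. */
    while (recv(fd, NULL, 0, MSG_DONTWAIT) >= 0)
        ;  /* each zero-length read consumes (discards) one datagram */
}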

sendto on Tru64 is returning ENOBUFS

I am currently running an old system on Tru64 which involves lots of UDP sockets using the sendto() function. The sockets are used in our code to send messages to/from various processes and then eventually on to a thick client app that is connected remotely. Occasionally the socket to the thick client gets stuck, which can cause some of these messages to build up. My question is: how can I determine the current buffer usage, and how do I determine the maximum message buffer size? The code below gives a snippet of how I set up the port and use the sendto function.
/* need to adjust the maximum size we can send on this */
/* socket as it needs to be able to cope with the biggest */
/* messages we send; allow double for when the system is */
/* under load */
int lenlen, len;
lenlen = sizeof(len);
len = 2 * 32000;

msg_socket = socket(AF_UNIX, SOCK_DGRAM, 0);
result = setsockopt(msg_socket, SOL_SOCKET, SO_SNDBUF, (char *)&len, lenlen);

result = sendto(msg_socket,
                (char *)message,
                (int)message_len,
                flags,
                dest_addr,
                addrlen);
Note. We have ported this application to Linux and the problem does not seem to appear there.
Any help would be greatly appreciated.
Regards
UDP send buffer size is different from TCP - it just limits the size of the datagram. Quoting Stevens UNP Vol. 1:
...
A UDP socket has a send buffer size (which we can change with the SO_SNDBUF socket option, Section 7.5), but this is simply an upper limit on the maximum-sized UDP datagram that can be written to the socket. If an application writes a datagram larger than the socket send buffer size, EMSGSIZE is returned. Since UDP is unreliable, it does not need to keep a copy of the application's data and does not need an actual send buffer. (The application data is normally copied into a kernel buffer of some form as it passes down the protocol stack, but this copy is discarded by the datalink layer after the data is transmitted.)
UDP simply prepends an 8-byte header and passes the datagram to IP. IPv4 or IPv6 prepends its header, determines the outgoing interface by performing the routing function, and then either adds the datagram to the datalink output queue (if it fits within the MTU) or fragments the datagram and adds each fragment to the datalink output queue. If a UDP application sends large datagrams (say 2,000-byte datagrams), there's a much higher probability of fragmentation than with TCP, because TCP breaks the application data into MSS-sized chunks, something that has no counterpart in UDP.
The successful return from write to a UDP socket tells us that either the datagram or all fragments of the datagram have been added to the datalink output queue. If there is no room on the queue for the datagram or one of its fragments, ENOBUFS is often returned to the application.
Unfortunately, some implementations do not return this error, giving the application no indication that the datagram was discarded without even being transmitted.
The last footnote needs attention - but it looks like Tru64 has this error code listed in the manual page.
The proper way of doing it though is to queue your outstanding messages in the application itself and to carefully check return values and the errno after each system call. This still does not guarantee delivery (since UDP receivers might drop the packets without any notice to the senders). Check the UDP packet discard counters with netstat -s on both/all sides, see if they are growing. There is really no way around this besides switching to TCP or implementing your own timeout/ack and re-transmission logic.
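As a sketch of that approach (hypothetical helper; the retry count and backoff values are arbitrary), retrying a sendto() that fails with ENOBUFS could look like:
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Retry sendto() while the datalink output queue is full (ENOBUFS).
   A short, growing sleep gives the queue time to drain. */
ssize_t send_with_retry(int fd, const void *msg, size_t len,
                        const struct sockaddr *dst, socklen_t dlen)
{
    for (int attempt = 0; attempt < 5; attempt++) {
        ssize_t n = sendto(fd, msg, len, 0, dst, dlen);
        if (n >= 0 || errno != ENOBUFS)
            return n;                /* success, or a non-transient error */
        usleep(1000 << attempt);     /* back off: 1 ms, 2 ms, 4 ms, ... */
    }
    errno = ENOBUFS;
    return -1;                       /* queue stayed full; caller decides */
}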
You should probably be using some sort of congestion control to avoid overloading the network. By far the easiest way to do this is to use TCP instead of UDP.
It fails less often on Linux because there UDP sockets wait for space in the local network interface queue (unless you set them non-blocking). However, on any operating system, if the overfull queue is not in the local system, the packet will be dropped silently.

linux raw socket programming question

I am trying to create a raw socket which sends and receives messages with IP/TCP headers under Linux.
I can successfully bind to a port and receive TCP messages (i.e. SYN).
However, the messages seem to be handled by the OS, not by my program; I am just a reader of them (like Wireshark).
My raw socket binds to port 8888, and then I try to telnet to that port.
In Wireshark, it shows that port 8888 replies with a "rst ack" when it receives the "syn" request. My program shows that it receives a new message, and it does not reply with any message.
Is there any way to actually bind to that port (and prevent the OS from handling it)?
Here is part of my code; I have cut the error checking for easier reading:
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
int tmp = 1;
const int *val = &tmp;
setsockopt (sockfd, IPPROTO_IP, IP_HDRINCL, val, sizeof (tmp));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(8888);
bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr));
//call recv in loop
When your kernel receives a SYN/ACK from the remote host, it finds no record of having sent a SYN to that IP:PORT combination (the SYN was sent from your raw socket), which is why it assumes there has been an error and sends a RST to the remote host. This problem can be solved by setting up an IP filter that blocks all TCP traffic on that port (check the iptables manpage for this). That way you don't have to program in kernel space, nor will there be any effect on already existing kernel TCP modules.
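For example, a rule along these lines (hedged: adjust the chain and port to your setup) would stop the kernel's resets from leaving the machine:
iptables -A OUTPUT -p tcp --sport 8888 --tcp-flags RST RST -j DROP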
man 7 raw says:
Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case the packets are passed to both the kernel module and the raw socket(s).
I take this to mean that you can't "do TCP" on a raw socket without interference from the kernel unless your kernel lacks TCP support -- which, of course, isn't something you want. What raw sockets are good for is implementing other IP protocols that the kernel doesn't handle, or for special applications like sending crafted ICMP packets.
To access raw headers you don't bind a raw socket to a port. That's not done.
Simply write a sniffer to pick up all incoming packets and find out which are yours. That will also give you access to the full packet contents.
This is how you do it:
int sock_raw = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
unsigned char buffer[65536];
struct sockaddr saddr;
int saddr_size, data_size;

while(true)
{
    saddr_size = sizeof saddr;
    //Receive a packet
    data_size = recvfrom(sock_raw, buffer, sizeof buffer, 0,
                         &saddr, (socklen_t *)&saddr_size);
    if(data_size < 0)
    {
        printf("Recvfrom error, failed to get packets\n");
        return 1;
    }
    //Now process the packet
    ProcessPacket(buffer, data_size);
}
In the ProcessPacket function analyse the packet and see if they belong to your application.
Edit:
In case you intend to program raw sockets, check this.
It has a few examples of how to send and receive raw packets.
In case you want to use SOCK_STREAM or SOCK_SEQPACKET (the connection-oriented socket types):
You need to tell the socket to listen after binding it to a given address:port.
int connectionQueue = 10;
if (-1 == listen(sockfd, connectionQueue))
{
    // Error occurred
}
Afterwards, you will need to watch the descriptor for incoming connections using select, then accept each incoming connection; you can service it either on the server socket itself (which means no further connections will be accepted in the meantime) or on a dedicated client socket.
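A minimal sketch of that flow (assuming sockfd is already bound and listening, as above):
#include <sys/select.h>
#include <sys/socket.h>

fd_set readfds;
FD_ZERO(&readfds);
FD_SET(sockfd, &readfds);

/* Wait until the listening socket is readable, i.e. a connection is pending. */
if (select(sockfd + 1, &readfds, NULL, NULL, NULL) > 0 &&
    FD_ISSET(sockfd, &readfds))
{
    struct sockaddr_storage peer;
    socklen_t peerlen = sizeof(peer);
    int clientfd = accept(sockfd, (struct sockaddr *)&peer, &peerlen);
    if (clientfd != -1)
    {
        // Handle the client on its dedicated socket
    }
}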
