Linux raw socket programming question

I am trying to create a raw socket which sends and receives messages with the IP/TCP headers included, under Linux.
I can successfully bind to a port and receive TCP messages (i.e. SYN).
However, the messages seem to be handled by the OS, not by my program; I am just a reader of them (like Wireshark).
My raw socket binds to port 8888, and then I telnet to that port.
In Wireshark, it shows that port 8888 replies with a RST/ACK when it receives the SYN request. My program shows that it received a new message, but it does not reply with anything.
Is there any way to actually bind to that port (i.e. prevent the OS from handling it)?
Here is part of my code; I have cut the error checking for easier reading:
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);

/* tell the kernel we supply the IP header ourselves */
int tmp = 1;
const int *val = &tmp;
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, val, sizeof(tmp));

struct sockaddr_in servaddr;
memset(&servaddr, 0, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(8888);
bind(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
//call recv in loop

When your kernel receives a SYN/ACK from the remote host, it finds no record of having sent a SYN to that IP:port combination (the SYN went out through your raw socket, which the kernel's TCP stack knows nothing about). It therefore assumes there has been an error and sends a RST to the remote host. This problem can be solved by setting up an IP filter that blocks all TCP traffic on that port (check the iptables man page for this). That way you don't have to program in kernel space, nor will there be any effect on the existing kernel TCP modules.
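For example (a sketch, using the port 8888 from the question), an iptables rule like this stops the kernel's RST from ever leaving the machine:
iptables -A OUTPUT -p tcp --sport 8888 --tcp-flags RST RST -j DROP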

man 7 raw says:
Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case the packets are passed to both the kernel module and the raw socket(s).
I take this to mean that you can't "do TCP" on a raw socket without interference from the kernel unless your kernel lacks TCP support -- which, of course, isn't something you want. What raw sockets are good for is implementing other IP protocols that the kernel doesn't handle, or for special applications like sending crafted ICMP packets.
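For instance (a minimal illustration, not part of the original answer), a raw socket for crafting ICMP packets would be opened like this:
int icmp_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP); /* needs root or CAP_NET_RAW */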

To access raw headers you don't bind a raw socket to a port. That's not done.
Simply write a sniffer to "pick up" all incoming packets and find out which ones are yours. That will also give you access to all of the packets' content.
This is how you do it:
unsigned char buffer[65536];
struct sockaddr saddr;
socklen_t saddr_size;
int data_size;

int sock_raw = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
while (1)
{
    saddr_size = sizeof saddr;
    //Receive a packet
    data_size = recvfrom(sock_raw, buffer, sizeof buffer, 0, &saddr, &saddr_size);
    if (data_size < 0)
    {
        printf("Recvfrom error, failed to get packets\n");
        return 1;
    }
    //Now process the packet
    ProcessPacket(buffer, data_size);
}
In the ProcessPacket function, analyse each packet and see if it belongs to your application.
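As a minimal sketch (an illustration only; the answer does not show ProcessPacket, and the Ethernet framing and port 8888 are assumptions taken from the question):

#include <stdio.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

void ProcessPacket(unsigned char *buffer, int size)
{
    //Skip the Ethernet header to reach the IP header
    struct iphdr *iph = (struct iphdr *)(buffer + sizeof(struct ethhdr));
    if (iph->protocol != IPPROTO_TCP)
        return; //not TCP, so not ours

    //The IP header length field (ihl) is in 32-bit words
    struct tcphdr *tcph = (struct tcphdr *)((unsigned char *)iph + iph->ihl * 4);

    //Keep only packets addressed to our port (8888 here)
    if (ntohs(tcph->dest) == 8888)
        printf("Got a packet for our port, %d bytes\n", size);
}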

Edit:
In case you intend to program raw sockets, check this.
It has a few examples of how to send and receive raw packets.
In case you want to use SOCK_STREAM and SOCK_SEQPACKET connection-oriented type sockets:
You need to tell it to listen after binding to a given address:port.
int connectionQueue = 10;
if (-1 == listen(sockfd, connectionQueue))
{
    // Error occurred
}
Afterwards, you will need to watch the descriptor for incoming connections using select(), and accept each incoming connection, either handling it on the server socket itself (which blocks it from accepting new connections in the meantime) or on a dedicated client socket.
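A minimal sketch of that flow (an illustration, assuming sockfd is the bound, listening socket from above):

#include <sys/select.h>

fd_set readfds;
FD_ZERO(&readfds);
FD_SET(sockfd, &readfds);

//Block until the listening socket is readable, i.e. a client is waiting
if (select(sockfd + 1, &readfds, NULL, NULL, NULL) > 0 && FD_ISSET(sockfd, &readfds))
{
    struct sockaddr_in cliaddr;
    socklen_t clilen = sizeof(cliaddr);
    int clientfd = accept(sockfd, (struct sockaddr *)&cliaddr, &clilen);
    if (clientfd != -1)
    {
        //clientfd is a dedicated socket for this client
    }
}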

Related

How to flush raw AF_PACKET socket to get correct filtered packets

sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &f, sizeof(f)); /* f is a struct sock_fprog */
With this simple BPF/LPF attach code, when I try to receive packets on the socket, I get some wrong packets that don't match the filter. It seems those packets got into the socket before I called setsockopt().
It seems like I should first create the AF_PACKET SOCK_RAW socket, then attach the filter, then flush the socket to get rid of those wrong packets.
So the question is, how do I flush those packets?
The "bug" you're describing is real and I've seen it at multiple companies in my career. There is something like an "oral tradition" around this bug that is passed from one network engineer to another. Here are the common fixes:
Just call recv on the socket until it is empty
Double-filter by filtering packets in usermode as well as using the bpf
Use the zero-bpf technique just like libpcap where you apply an empty bpf first, then empty the socket, and then apply the real bpf.
I've written about this problem extensively on my blog to try and codify the oral tradition around this bug into a concrete recommendation and best-practice.
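A sketch of the zero-BPF technique (an illustration; f is assumed to be the struct sock_fprog from the question):

#include <linux/filter.h>

/* a drop-all program: a single "return 0" instruction */
struct sock_filter zero_insn = BPF_STMT(BPF_RET | BPF_K, 0);
struct sock_fprog zero_prog = { 1, &zero_insn };
char drain[1];

/* 1. attach the drop-all filter so nothing new is queued */
setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &zero_prog, sizeof(zero_prog));

/* 2. drain whatever was queued before any filter took effect */
while (recv(sock, drain, sizeof(drain), MSG_DONTWAIT) >= 0)
    ;

/* 3. attach the real filter; everything received from now on matches it */
setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &f, sizeof(f));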

Linux raw datalink layer socket only returns partial packet (96 bytes)

In my application, I am receiving packets at the data link layer using a raw socket (type PF_PACKET, SOCK_RAW). What I am finding is that I only get the first 96 bytes of any packet. I'm assuming there is some option somewhere that is preventing me from receiving the entire packet, but what?
Here is a snippet from my code:
int sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP));
int nFlags = fcntl(sock, F_GETFL, 0); // make it non-blocking
fcntl(sock, F_SETFL, nFlags | O_NONBLOCK);
int nBytesRead = read(sock, (char *) buf, 1500);
nBytesRead is never more than 96, even though my network sniffer shows longer packets. This is uClinux if that makes a difference.
I found someone else with the same problem at http://www.network-builders.com/raw-socket-captures-only-first-96-bytes-packet-t57283.html but no answers there.
Solved it! What I failed to mention in my original post was that I was attaching a filter to the raw socket so it would only receive traffic on certain TCP/IP ports. This filter code was generated with tcpdump, which apparently limits capture to 96 bytes by default. I had to add the -s0 option to my tcpdump command line to tell it to capture everything:
tcpdump -dd -s0 "ip and tcp and dst port 60001"
With that change, it now gives me the full packet. Thanks to this blog post for the clue.
Hope this helps someone else in the future.

Receive all multicast ICMPv6 packets on Linux

I would like to receive all multicast IPv6 packets arriving on a certain interface, without resorting to operate on layer 2, if that is possible.
I open a socket for raw ICMPv6 packets, and receiving unicast packets destined for my machine works just fine. However, many ICMPv6 packets are link-local multicast (e.g. neighbor solicitations). What's the right way to listen for all multicast traffic, including solicited-node multicast? Currently I try to join a multicast group with IPV6_ADD_MEMBERSHIP, but this does not seem to work. Here's my code:
/* open RAW socket to receive on */
if ((sockfd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) < 0) {
    perror("socket");
}

/* get device index */
memset(&if_idx, 0, sizeof(struct ifreq));
strncpy(if_idx.ifr_name, DEVNAME, IFNAMSIZ-1);
if (ioctl(sockfd, SIOCGIFINDEX, &if_idx) < 0) {
    perror("SIOCGIFINDEX");
}

/* configure to receive all multicast packets on this interface */
memset(&mreq, 0, sizeof(struct ipv6_mreq));
inet_pton(AF_INET6, "ff02::", &mreq.ipv6mr_multiaddr);
mreq.ipv6mr_interface = if_idx.ifr_ifindex;
if (setsockopt(sockfd, SOL_SOCKET, IPV6_ADD_MEMBERSHIP, &mreq,
               sizeof(struct ipv6_mreq)) < 0) {
    perror("setsockopt");
}
What am I doing wrong? What I want must be possible somehow. I tried ff02:: and ff02::1:ff00:0 as groups, and the latter even made setsockopt fail. What's going on? Unfortunately there's very little documentation on IPv6 multicast programming.
Use SOL_IPV6 instead of SOL_SOCKET.
Test by subscribing to ff08::1 and generating traffic with ping6 -I eth0 ff08::1.
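Applied to the code above, that is a one-line change (sketch):

if (setsockopt(sockfd, SOL_IPV6, IPV6_ADD_MEMBERSHIP, &mreq,
               sizeof(struct ipv6_mreq)) < 0) {
    perror("setsockopt");
}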
This appears to be impossible after all. I am now using Linux's AF_PACKET socket type in "cooked" mode (SOCK_DGRAM) to access the raw IPv6 packets without the link-layer header, plus a BPF to pick out ICMPv6 ND packets. At least I won't need to deal with parsing the Ethernet header this way, and I can possibly support other link-layer types more easily.

How to set the Linux kernel not to send RST/ACK, so that I can send SYN/ACK from a raw socket

I want to ask a classic question about raw socket programming and Linux kernel TCP handling. I've researched similar threads, like "linux raw socket programming question", "How to reproduce TCP protocol 3-way handshake with raw sockets correctly?", and "TCP ACK spoofing", but still can't find a solution.
I am trying to make a server which doesn't listen on any port, but sniffs SYN packets from remote hosts. After the server does some calculation, it will send back a SYN/ACK packet for the corresponding SYN packet, so that I can create the TCP connection manually, without involving the kernel's TCP handling. I've created a raw socket and sent the SYN/ACK over it, but the packet cannot get through to the remote host. When I run tcpdump on the server (Ubuntu Server 10.04) and Wireshark on the client (Windows 7), the server returns RST/ACK instead of my SYN/ACK packet. After doing some research, I learned that we cannot preempt the kernel's TCP handling.
Are there still any other ways to hack or configure the kernel not to respond with RST/ACK to those packets?
I've added a firewall rule for the local IP of the server, to tell the kernel that maybe there's something behind the firewall waiting for the packet, but still no luck.
Did you try to drop RST using iptables?
iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
should do the job for you.
I recommend using iptables, but since you ask about hacking the kernel as well, here is an explanation of how you could do that (I'm using kernel 4.1.20 as a reference):
When a packet is received (as a sk_buff), the IP protocol handler hands it to the registered transport-protocol handler:
static int ip_local_deliver_finish(struct sock *sk, struct sk_buff *skb)
{
    ...
    ipprot = rcu_dereference(inet_protos[protocol]);
    if (ipprot) {
        ...
        ret = ipprot->handler(skb);
Assuming the protocol is TCP, the handler is tcp_v4_rcv:
static const struct net_protocol tcp_protocol = {
    .early_demux    = tcp_v4_early_demux,
    .handler        = tcp_v4_rcv,
    .err_handler    = tcp_v4_err,
    .no_policy      = 1,
    .netns_ok       = 1,
    .icmp_strict_tag_validation = 1,
};
So tcp_v4_rcv is called. It will try to find the socket for the received skb, and if it doesn't find one, it will send a reset:
int tcp_v4_rcv(struct sk_buff *skb)
{
    ...
    sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    if (!sk)
        goto no_tcp_socket;
    ...
no_tcp_socket:
    if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
        goto discard_it;

    tcp_v4_send_reset(NULL, skb);
    ...
There are many different ways you could hack this. You could go into the xfrm4_policy_check function and change the policy for AF_INET, or simply comment out the line that calls xfrm4_policy_check so that the code always goes to discard_it, or comment out the line that calls tcp_v4_send_reset (which will have wider consequences, though).
Hope this helps.

sendto on Tru64 is returning ENOBUFS

I am currently running an old system on Tru64 which involves lots of UDP sockets using the sendto() function. The sockets are used in our code to send messages to/from various processes, and then eventually on to a thick client app that is connected remotely. Occasionally the socket to the thick client gets stuck; this can cause some of these messages to build up. My question is: how can I determine how much data is currently buffered, and how do I determine the maximum buffer size? The code below gives a snippet of how I set up the socket and use the sendto function.
/* need to adjust the maximum size we can send on this */
/* socket as it needs to be able to cope with the biggest */
/* messages we send; allow double for when the system is under load */
int lenlen, len;
lenlen = sizeof(len);
len = 2 * 32000;

msg_socket = socket(AF_UNIX, SOCK_DGRAM, 0);
result = setsockopt(msg_socket, SOL_SOCKET, SO_SNDBUF, (char *)&len, lenlen);

result = sendto(msg_socket,
                (char *)message,
                (int)message_len,
                flags,
                dest_addr,
                addrlen);
Note: we have ported this application to Linux and the problem does not seem to appear there.
Any help would be greatly appreciated.
Regards
UDP send buffer size is different from TCP - it just limits the size of the datagram. Quoting Stevens UNP Vol. 1:
...
A UDP socket has a send buffer size (which we can change with SO_SNDBUF socket option, Section 7.5), but this is simply an upper limit on the maximum-sized UDP datagram that can be written to the socket. If an application writes a datagram larger than the socket send buffer size, EMSGSIZE is returned. Since UDP is unreliable, it does not need to keep a copy of the application's data and does not need an actual send buffer. (The application data is normally copied into a kernel buffer of some form as it passes down the protocol stack, but this copy is discarded by the datalink layer after the data is transmitted.)
UDP simply prepends an 8-byte header and passes the datagram to IP. IPv4 or IPv6 prepends its header, determines the outgoing interface by performing the routing function, and then either adds the datagram to the datalink output queue (if it fits within the MTU) or fragments the datagram and adds each fragment to the datalink output queue. If a UDP application sends large datagrams (say 2,000-byte datagrams), there's a much higher probability of fragmentation than with TCP, because TCP breaks the application data into MSS-sized chunks, something that has no counterpart in UDP.
The successful return from write to a UDP socket tells us that either the datagram or all fragments of the datagram have been added to the datalink output queue. If there is no room on the queue for the datagram or one of its fragments, ENOBUFS is often returned to the application.
Unfortunately, some implementations do not return this error, giving the application no indication that the datagram was discarded without even being transmitted.
The last footnote needs attention - but it looks like Tru64 has this error code listed in the manual page.
The proper way of doing it, though, is to queue your outstanding messages in the application itself and to carefully check return values and errno after each system call. This still does not guarantee delivery (since UDP receivers might drop packets without any notice to the sender). Check the UDP packet discard counters with netstat -s on both/all sides and see if they are growing. There is really no way around this besides switching to TCP or implementing your own timeout/ack and retransmission logic.
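A minimal sketch of that kind of checking, built around the sendto call from the question (the recovery actions in the comments are illustrations, not part of the original code):

#include <errno.h>

result = sendto(msg_socket, (char *)message, (int)message_len,
                flags, dest_addr, addrlen);
if (result < 0) {
    if (errno == ENOBUFS) {
        /* datalink output queue is full: keep the message queued in
           the application and retry after a short delay */
    } else if (errno == EMSGSIZE) {
        /* datagram larger than the socket send buffer: raise SO_SNDBUF
           or send smaller messages */
    } else {
        perror("sendto");
    }
}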
You should probably be using some sort of congestion control to avoid overloading the network. By far the easiest way to do this is to use TCP instead of UDP.
It fails less often on Linux because there, UDP sockets wait for space in the local network interface queue (unless you set them non-blocking). However, on any operating system, if the overfull queue is not in the local system, the packet will be dropped silently.
