This question is similar to Network port open, but no process attached? and netstat shows a listening port with no pid but lsof does not. But the answers to them can't solve mine, since it is so weird.
I have a server application called lps that waits for tcp connections on port 8588.
[root#centos63 lcms]# netstat -lnp | grep 8588
tcp 0 0 0.0.0.0:8588 0.0.0.0:* LISTEN 6971/lps
As you can see, nothing is wrong with the listening socket. But when I connect a few thousand test clients (written by a colleague) to the server, whether 2000, 3000, or 4000, there are always 5 clients (a different 5 each time) that connect and send a login request to the server but never receive a response. Take 3000 clients as an example. This is what the netstat command gives:
[root#centos63 lcms]# netstat -nap | grep 8588 | grep ES | wc -l
3000
And this is lsof command output:
[root#centos63 lcms]# lsof -i:8588 | grep ES | wc -l
2995
Those 5 connections are here:
[root#centos63 lcms]# netstat -nap | grep 8588 | grep -v 'lps'
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52658 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52692 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52719 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52721 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52705 ESTABLISHED -
The 5 lines above show connections to the server on port 8588 that have no program attached. The second column (Recv-Q) keeps increasing as the clients send requests.
The links above say something about NFS mounts and RPC. As for RPC, I ran rpcinfo -p and the result has nothing to do with port 8588. As for NFS mounts, nfsstat outputs Error: No Client Stats (/proc/net/rpc/nfs: No such file or directory).
Question : How can this happen? Always 5 and also not from the same 5 clients. I don't think it's port conflict as the other clients are also connected to the same server IP and port and they are all properly handled by the server.
Note: I'm using Linux epoll to accept client requests. I also added debug code to my program that records every socket (along with the client's information) that accept returns, but the 5 connections never show up there. This is the uname -a output:
Linux centos63 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Thanks for your kind help! I'm really confused.
Update 2013-06-08:
After upgrading the system to CentOS 6.4, the same problem occurs. Finally I went back to epoll and found this page, which says to set the listening fd to non-blocking and call accept until it returns EAGAIN or EWOULDBLOCK. And yes, it works. No more connections are left pending. But why is that? Unix Network Programming, Volume 1 says:
accept is called by a TCP server to return the next completed connection from the
front of the completed connection queue. If the completed connection queue is empty,
the process is put to sleep (assuming the default of a blocking socket).
So if there are still some completed connections in the queue, why is the process put to sleep?
Update 2013-7-1:
I use EPOLLET when adding the listening socket, so not every connection gets accepted unless I keep calling accept until EAGAIN is returned. I just realized this. My fault. Remember: always read or accept until EAGAIN if using EPOLLET, even on a listening socket. Thanks again to Matthew for providing me with a test program.
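The rule from the update above (with EPOLLET, keep calling accept until EAGAIN) can be sketched in a few lines of Python, the language of the test client further down. This is only a minimal illustration on Linux, not the lps server's code: make_listener, accept_until_eagain and demo are hypothetical names, and the loopback address, backlog and client count are arbitrary assumptions.

```python
import select
import socket


def make_listener(backlog=128):
    """Return a non-blocking TCP listener on a free loopback port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    srv.listen(backlog)
    srv.setblocking(False)       # accept() now raises instead of sleeping
    return srv


def accept_until_eagain(srv):
    """Drain the completed-connection queue; stop on EAGAIN/EWOULDBLOCK."""
    accepted = []
    while True:
        try:
            conn, _peer = srv.accept()
        except BlockingIOError:  # Python's mapping of EAGAIN/EWOULDBLOCK
            return accepted
        accepted.append(conn)


def demo(n_clients=20):
    """Connect n_clients to an EPOLLET listener and count accepted sockets."""
    srv = make_listener()
    ep = select.epoll()
    ep.register(srv.fileno(), select.EPOLLIN | select.EPOLLET)
    clients = [socket.create_connection(srv.getsockname())
               for _ in range(n_clients)]
    accepted = []
    while len(accepted) < n_clients:
        if not ep.poll(1.0):     # no new edge within 1 s: give up
            break
        accepted.extend(accept_until_eagain(srv))
    count = len(accepted)
    for s in accepted + clients:
        s.close()
    ep.close()
    srv.close()
    return count


if __name__ == "__main__":
    print(demo())
```

If accept_until_eagain were replaced by a single accept() per notification, any connections that arrived in the same burst would stay in the queue until another client happened to connect, which matches the ESTABLISHED sockets with a growing Recv-Q and no owning process shown in the netstat output.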
I've tried duplicating your problem using the following parameters:
The server uses epoll to manage connections.
I make 3000 connections.
Connections are blocking.
The server is basically 'reduced' to handling the connections only and performing very little complicated work.
I cannot duplicate the problem. Here is my server source code.
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <err.h>
#include <sysexits.h>
#include <string.h>
#include <unistd.h>
struct {
int numfds;
int numevents;
struct epoll_event *events;
} connections = { 0, 0, NULL };
static int create_srv_socket(const char *port) {
int fd = -1;
int rc;
struct addrinfo *ai = NULL, hints;
memset(&hints, 0, sizeof(hints));
hints.ai_flags = AI_PASSIVE;
hints.ai_socktype = SOCK_STREAM; /* only ask for TCP addresses */
if ((rc = getaddrinfo(NULL, port, &hints, &ai)) != 0)
errx(EX_UNAVAILABLE, "Cannot create socket: %s", gai_strerror(rc));
if ((fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) < 0)
err(EX_OSERR, "Cannot create socket");
/* SO_REUSEADDR only takes effect if it is set before bind() */
rc = 1;
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &rc, sizeof(rc)) < 0)
err(EX_OSERR, "Cannot setup socket options");
if (bind(fd, ai->ai_addr, ai->ai_addrlen) < 0)
err(EX_OSERR, "Cannot bind to socket");
freeaddrinfo(ai);
if (listen(fd, 25) < 0)
err(EX_OSERR, "Cannot setup listen length on socket");
return fd;
}
static int create_epoll(void) {
int fd;
if ((fd = epoll_create1(0)) < 0)
err(EX_OSERR, "Cannot create epoll");
return fd;
}
static bool epoll_join(int epollfd, int fd, int events) {
struct epoll_event ev;
ev.events = events;
ev.data.fd = fd;
if ((connections.numfds+1) >= connections.numevents) {
connections.numevents+=1024;
/* size of one struct epoll_event, not of the pointer */
connections.events = realloc(connections.events,
sizeof(*connections.events)*connections.numevents);
if (!connections.events)
err(EX_OSERR, "Cannot allocate memory for events list");
}
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
warn("Cannot add socket to epoll set");
return false;
}
connections.numfds++;
return true;
}
static void epoll_leave(int epollfd, int fd) {
if (epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, NULL) < 0)
err(EX_OSERR, "Could not remove entry from epoll set");
connections.numfds--;
}
static void cleanup_old_events(void) {
if ((connections.numevents - 1024) > connections.numfds) {
connections.numevents -= 1024;
/* size of one struct epoll_event, not of the pointer */
connections.events = realloc(connections.events,
sizeof(*connections.events)*connections.numevents);
}
}
static void disconnect(int fd) {
shutdown(fd, SHUT_RDWR);
close(fd);
return;
}
static bool read_and_reply(int fd) {
char buf[128];
int rc;
memset(buf, 0, sizeof(buf));
if ((rc = recv(fd, buf, sizeof(buf), 0)) <= 0) {
if (rc < 0) warn("Cannot read from socket"); /* rc == 0 means the peer closed */
return false;
}
if (send(fd, buf, rc, MSG_NOSIGNAL) < 0) {
warn("Cannot send to socket");
return false;
}
return true;
}
int main()
{
int srv = create_srv_socket("8558");
int ep = create_epoll();
int rc = -1;
struct epoll_event *ev = NULL;
if (!epoll_join(ep, srv, EPOLLIN))
err(EX_OSERR, "Server cannot join epollfd");
while (1) {
int i, cli;
rc = epoll_wait(ep, connections.events, connections.numfds, -1);
if (rc < 0 && errno == EINTR)
continue;
else if (rc < 0)
err(EX_OSERR, "Cannot properly perform epoll wait");
for (i=0; i < rc; i++) {
ev = &connections.events[i];
if (ev->data.fd != srv) {
if (ev->events & EPOLLIN) {
if (!read_and_reply(ev->data.fd)) {
epoll_leave(ep, ev->data.fd);
disconnect(ev->data.fd);
continue; /* fd is closed; skip the EPOLLERR/EPOLLHUP handling below */
}
}
if (ev->events & EPOLLERR || ev->events & EPOLLHUP) {
if (ev->events & EPOLLERR)
warn("Error in fd: %d", ev->data.fd);
else
warn("Closing disconnected fd: %d", ev->data.fd);
epoll_leave(ep, ev->data.fd);
disconnect(ev->data.fd);
}
}
else {
if (ev->events & EPOLLIN) {
if ((cli = accept(srv, NULL, 0)) < 0) {
warn("Could not add socket");
continue;
}
epoll_join(ep, cli, EPOLLIN);
}
if (ev->events & EPOLLERR || ev->events & EPOLLHUP)
err(EX_OSERR, "Server FD %d has failed", ev->data.fd);
}
}
cleanup_old_events();
}
}
Here is the client:
from socket import *
import time
scks = list()
for i in range(0, 3000):
s = socket(AF_INET, SOCK_STREAM)
s.connect(("localhost", 8558))
scks.append(s)
time.sleep(600)
When running this on my local machine I get 6001 sockets using port 8558 (1 listening, 3000 client side sockets and 3000 server side sockets).
$ ss -ant | grep 8558 | wc -l
6001
When checking the number of IP connections connected on the client I get 3000.
# lsof -p$(pgrep python) | grep IPv4 | wc -l
3000
I've also tried the test with the server on a remote machine with success too.
I'd suggest you attempt to do the same.
In addition, try turning off iptables completely just in case it's some connection tracking quirk.
Sometimes the iptables option in /proc can help too. So try sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1.
Edit: I've done another test which produces the output you see on your side. Your problem is that you are shutting down the connection on the server side pre-emptively.
I can duplicate results similar to what you are seeing doing the following:
After reading some data in to my server, call shutdown(fd, SHUT_RD).
Do send(fd, buf, sizeof(buf)) on the server.
After doing this the following behaviours are seen.
On the client I get 3000 connections open in netstat/ss with ESTABLISHED.
In lsof output I get 2880 (nature of how I was doing shutdown) connections established.
The remainder of the connections lsof -i:8558 | grep -v ES are in CLOSE_WAIT.
This only happens on a half-shutdown connection.
As such I suspect this is a bug in your client or server program. Either you are sending something to the server which the server objects to, or the server is invalidly closing connections down for some reason.
You need to confirm what state the "anomalous" connections are in (CLOSE_WAIT or something else).
At this stage I also consider this a programming problem and not really something that belongs on serverfault. Without seeing the relevant portions of the source for the client/server it is not going to be possible for anybody to track down the cause of the fault. That said, I am pretty confident this has nothing to do with the way the operating system is handling the connections.
Related
I've stumbled across something interesting and I can't explain it, nor was Googling it productive.
I have one Express server, server 1, bound to localhost:
const express = require('express')
const app = express()
app.get('/', (req, res) => res.send('server 1'))
app.listen(4000, 'localhost')
node 37624 user 27u IPv4 0x681653f502970305 0t0 TCP localhost:4000 (LISTEN)
I have another Express server, server 2, bound to all interfaces at 0.0.0.0:
const express = require('express')
const app = express()
app.get('/', (req, res) => res.send('server 2'))
app.listen(4000, '0.0.0.0')
node 37624 user 27u IPv4 0x681653f502970305 0t0 TCP localhost:4000 (LISTEN)
node 37693 user 25u IPv4 0x681653f4fdbdc005 0t0 TCP *:4000 (LISTEN)
Curling 0.0.0.0 gives a response from server 1, the one bound to localhost, so clearly these two are conflicting.
Somehow, however, this does not throw the error one would expect, EADDRINUSE. How can that be?
Node sets the SO_REUSEADDR flag on its network sockets, and that is what causes this behavior. The REUSEADDR flag has a special interaction with the INADDR_ANY (aka 0.0.0.0 for IPv4) address. From the socket manual pages (reputable source):
SO_REUSEADDR
Indicates that the rules used in validating addresses supplied
in a bind(2) call should allow reuse of local addresses. For
AF_INET sockets this means that a socket may bind, except when
there is an active listening socket bound to the address.
When the listening socket is bound to INADDR_ANY with a
specific port then it is not possible to bind to this port for
any local address. Argument is an integer boolean flag.
From an article that goes into this exact problem:
Some folks don't like SO_REUSEADDR because it has a security stigma
attached to it. On some operating systems it allows the same port to
be used with a different address on the same machine by different
processes at the same time. This is a problem because most servers
bind to the port, but they don't bind to a specific address, instead
they use INADDR_ANY (this is why things show up in netstat output as
*.8080). So if the server is bound to *.8080, another malicious user on the local machine can bind to local-machine.8080, which will
intercept all of your connections since it is more specific.
I modified some Linux test code to explicitly demonstrate this (bottom). When you run it you get the following output:
Opening 0.0.0.0 with no reuse flag:19999
Opening Loopback with no resuse flag:19999
bind: Address already in use
Correct: could not open lookpback with no reuse 19999
Opening 0.0.0.0 with with reuse flag:19999
Opening Loopback with with resuse flag:19999
Correct: could open lookpback with reuse 19999
The first test case opens a socket on the INADDR_ANY address without the REUSEADDR flag set, and when there is an attempt to open a socket on the loopback address, an EADDRINUSE error is thrown by 'bind' (as you originally expected). The second test case does the same thing but with the REUSEADDR flag set, and the second socket is created without an error.
#include <errno.h>
#include <error.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#define PORT 19999
int open_port(int any, int reuse)
{
int fd = -1;
int reuseaddr = 1;
int v6only = 1;
int addrlen;
int ret = -1;
struct sockaddr *addr;
int family = AF_INET;
struct sockaddr_in addr4 = {
.sin_family = AF_INET,
.sin_port = htons(PORT),
.sin_addr.s_addr = any ? htonl(INADDR_ANY) : inet_addr("127.0.0.1"),
};
addr = (struct sockaddr*)&addr4;
addrlen = sizeof(addr4);
if ((fd = socket(family, SOCK_STREAM, IPPROTO_TCP)) < 0) {
perror("socket");
goto out;
}
if (reuse){
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuseaddr,
sizeof(reuseaddr)) < 0) {
perror("setsockopt SO_REUSEADDR");
goto out;
}
}
if (bind(fd, addr, addrlen) < 0) {
perror("bind");
goto out;
}
if (any)
return fd;
if (listen(fd, 1) < 0) {
perror("listen");
goto out;
}
return fd;
out:
close(fd);
return ret;
}
int main(void)
{
int listenfd;
int fd1, fd2;
fprintf(stderr, "Opening 0.0.0.0 with no reuse flag:%d\n", PORT);
listenfd = open_port(1, 0);
if (listenfd < 0)
error(1, errno, "Couldn't open listen socket");
fprintf(stderr, "Opening Loopback with no resuse flag:%d\n", PORT);
fd1 = open_port(0, 0);
if (fd1 >= 0)
error(1, 0, "Was allowed to create an loopback with no reuse");
fprintf(stderr, "Correct: could not open lookpback with no reuse %d\n", PORT);
close(listenfd);
fprintf(stderr, "Opening 0.0.0.0 with with reuse flag:%d\n", PORT);
listenfd = open_port(1, 1);
if (listenfd < 0)
error(1, errno, "Couldn't open listen socket");
fprintf(stderr, "Opening Loopback with with resuse flag:%d\n", PORT);
fd1 = open_port(0, 1);
if (fd1 < 0)
error(1, 0, "Was not allowed to create an loopback with reuse");
fprintf(stderr, "Correct: could open lookpback with reuse %d\n", PORT);
close(fd1);
close(listenfd);
return 0;
}
Hello, I think I can help you with this.
First, look at the difference between 0.0.0.0 and localhost.
If you run your server on 0.0.0.0, requests go to whichever server is available at that time, which here is server 1. 0.0.0.0 is not a specific host; it just means "use a server available to you". Server 1 is already running on the same port, so that is where the request is redirected.
In other words, a request to 0.0.0.0:4000 is redirected to localhost:4000, because 0.0.0.0 is not a host but an address used to refer to all IP addresses on the same machine. So 0.0.0.0:4000 refers to 127.0.0.1:4000, the normal loopback address, and localhost:4000 is the hostname form of 127.0.0.1:4000.
Here is a much simpler explanation: 0.0.0.0:4000 ---> 127.0.0.1:4000 ---> localhost:4000
The kernel in Windows allows multiple applications to share a port as long as the Url is unique
is it possible to make nodejs scripts listen to same port
You are listening to the same port 4000 with your two servers.
And if you want to run two servers, you should explicitly set a different port for each of them, something like this:
// server 1
app.get('/', (req, res) => res.send('server 1'))
app.listen(4000,() => console.log('server 1 listening to port 4000'))
// server 2
app.get('/', (req, res) => res.send('server 2'))
app.listen(5000, () => console.log('server 2 listening to port 5000'))
I need to find the specific interface which is used by a socket, so that I can keep stats for it, using the sysfs files (/sys/class/net/<IF>/statistics/etc).
I've tried two different approaches in the test code below, but both fail. The first one connects to a remote server, and uses ioctl with SIOCGIFNAME, but this fails with 'no such device'. The second one instead uses getsockopt with SO_BINDTODEVICE, but this again fails (it sets the name length to 0).
Any ideas on why these are failing, or how to get the I/F name? After compiling, run the test code as test "a.b.c.d", where a.b.c.d is any IPv4 address which is listening on port 80. Note that I've compiled this on CentOS 7, which doesn't appear to have IFNAMSZ in <net/if.h>, so you may have to comment out the #define IFNAMSZ line to get this to compile on other systems.
Thanks.
EDIT
I've since found that this is essentially a dupe of How can I get the interface name/index associated with a TCP socket?, so I should probably remove this. (Only) one of the answers there is correct (https://stackoverflow.com/a/37987807/785194) - get your local IP address with getsockname, and then look up this address in the list returned by getifaddrs.
On the general issue that sockets are essentially dynamic (mentioned below, and several times in the other question): not really relevant. I've checked the kernel source, and sockets have an interface index and interface name, and the API includes at least three ways to get the current name, and other routines to look up the name from the index, and vice versa. However, the index is sometimes zero, which is not valid, which is why the getsockopt version below fails. No idea why ioctl fails.
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <unistd.h> /* close() */
int main(int argc, char **argv) {
int sock;
struct sockaddr_in dst_sin;
struct in_addr haddr;
if(argc != 2)
return 1;
if(inet_aton(argv[1], &haddr) == 0) {
printf("'%s' is not a valid IP address\n", argv[1]);
return 1;
}
dst_sin.sin_family = AF_INET;
dst_sin.sin_port = htons(80);
dst_sin.sin_addr = haddr;
if((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
perror("socket");
return 1;
}
if(connect(sock, (struct sockaddr*)&dst_sin, sizeof(dst_sin)) < 0) {
perror("connect");
return 1;
}
printf(
"connected to %s:%d\n",
inet_ntoa(dst_sin.sin_addr), ntohs(dst_sin.sin_port));
#if 0 // ioctl fails with 'no such device'
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
// get the socket's interface index into ifreq.ifr_ifindex
if(ioctl(sock, SIOCGIFINDEX, &ifr) < 0) {
perror("SIOCGIFINDEX");
return 1;
}
// get the I/F name for ifreq.ifr_ifindex
if(ioctl(sock, SIOCGIFNAME, &ifr) < 0) {
perror("SIOCGIFNAME");
return 1;
}
printf("I/F is on '%s'\n", ifr.ifr_name);
#else // only works on Linux 3.8+
#define IFNAMSZ IFNAMSIZ // Centos7 bug in if.h??
char optval[IFNAMSZ] = {0};
socklen_t optlen = IFNAMSZ;
if(getsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, &optval, &optlen) < 0) {
perror("getsockopt");
return 1;
}
if(!optlen) {
printf("invalid optlen\n");
return 1;
}
printf("I/F is on '%s'\n", optval);
#endif
close(sock);
return 0;
}
TCP (and UDP) sockets are not bound to interfaces, so there is really no facility for answering this query. Now it's true that in general, a given socket will end up passing packets to a specific interface based on the address of the peer endpoint, but that is nowhere encoded in the socket. That's a routing decision that is made dynamically.
For example, let's say that you are communicating with a remote peer that is not directly on your local LAN. And let's say you have a default gateway configured to be 192.168.2.1 via eth0. There is nothing to prevent your configuring a second gateway, say, 192.168.3.1 via eth1, then taking eth0 down. As long as the new gateway can also reach the remote IP, eth1 can now be used to reach the destination and your session should continue uninterrupted.
So, if you need this info, you'll need to infer it from routing entries (but realize that it is not guaranteed to be static, even though in practice it will likely be so). You can obtain the address of your peer from getpeername(2). You can then examine the available routes to determine which one will get you there.
To do this, you could parse and interpret /proc/net/route for yourself, or you can just ask the ip command. For example, my route to an (arbitrary) ibm.com address goes through my eth0 interface, and connecting a socket to there, my local address will be 192.168.0.102 (which should match what getsockname(2) on the connected socket returns):
$ ip route get 129.42.38.1
129.42.38.1 via 192.168.0.1 dev eth0 src 192.168.0.102
cache
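The question's edit mentions a different route: read the socket's local address with getsockname and match it against the addresses of the host's interfaces. A rough Python sketch of that idea follows. Python's standard library has no getifaddrs wrapper, so this version swaps in socket.if_nameindex() plus the Linux-only SIOCGIFADDR ioctl, which only reports each interface's primary IPv4 address. The helper names ipv4_of_interface and interface_for_socket are made up for illustration.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux-specific ioctl: primary IPv4 address of an interface


def ipv4_of_interface(name):
    """Return the primary IPv4 address of interface `name`, or None."""
    ifreq = struct.pack("256s", name.encode()[:15])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            reply = fcntl.ioctl(probe.fileno(), SIOCGIFADDR, ifreq)
        except OSError:          # e.g. the interface has no IPv4 address
            return None
    # struct ifreq: 16-byte name, then a sockaddr_in whose address is bytes 20..23
    return socket.inet_ntoa(reply[20:24])


def interface_for_socket(sock):
    """Name of the interface that owns the socket's local IPv4 address."""
    local_ip = sock.getsockname()[0]
    for _index, name in socket.if_nameindex():
        if ipv4_of_interface(name) == local_ip:
            return name
    return None


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("127.0.0.1", 9))  # UDP connect only picks a route; nothing is sent
    print(interface_for_socket(s))
    s.close()
```

The caveat from the answer still applies: the result describes where the local address currently lives, not a permanent property of the socket, and secondary addresses on an interface are not seen by this sketch.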
What I am trying to achieve is binding an IPv6 socket to any address of just one particular device, not system-wide. My intuition is that I could setsockopt() with SO_BINDTODEVICE followed by a bind to ::. It mostly does what I expect it to do. The behaviour is the same in v4.
The sockets bound to an interface with SO_BINDTODEVICE will only accept connections made to addresses on that interface. That much is expected.
However, I run into errno "Address already in use", if I'm trying to bind to a source port on interface B when there is a socket using the same port but on interface A.
Ex:
nic A has IPv6 fd00:aaaa::a/64
nic B has IPv6 fd00:bbbb::b/64
they do not share networks.
Put shortly (pseudocode):
process 1 calls socket(...) and binds bind(fd00:aaaa::a/64, 9000).
process 2 calls socket(...) and setsockopt(SO_BINDTODEVICE, "B")
process 2 (continued) calls bind(::, 9000) and gets EADDRINUSE. Why?
How does SO_BINDTODEVICE really work? Does the determination for "addresses in use" ignore, conservatively, the interface sockets are bound to? Is it a networking stack layering issue?
Example traces:
I start a listening socket (server) on a specific address: nc -l fd00:aaaa::a 9000. Its trace is as follows:
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {
sa_family=AF_INET6,
sin6_port=htons(9000),
inet_pton(AF_INET6, "fd00:aaaa::a", &sin6_addr),
sin6_flowinfo=0, sin6_scope_id=0
}, 28) = 0
listen(3, 1) = 0
accept(3, ...
Connecting to it (client) fails if I bind to the port in use by the other interface, even though I've already bound to a different interface:
socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "nicB\0", 5) = 0
bind(3, {sa_family=AF_INET6,
sin6_port=htons(9000),
inet_pton(AF_INET6, "::", &sin6_addr),
sin6_flowinfo=0,
sin6_scope_id=0
}, 28) = -1 //EADDRINUSE (Address already in use)
However, if I don't specify the port, then all is good when binding to :: (while the listener still runs):
socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "nicB\0", 5) = 0
bind(3, {
sa_family=AF_INET6,
sin6_port=htons(0),
inet_pton(AF_INET6, "::", &sin6_addr),
sin6_flowinfo=0, sin6_scope_id=0
}, 28) = 0
connect(3, {
sa_family=AF_INET6,
sin6_port=htons(9000),
inet_pton(AF_INET6, "fd00:aaaa::a", &sin6_addr),
sin6_flowinfo=0, sin6_scope_id=0
}, 28) = ...
Note: This is on 3.19.0-68-generic x86_64 . Ubuntu 14.04. In case it makes a difference, for my tests, nicB is a macvlan in bridge mode whose parent is nicA.
I've found a satisfying explanation for this problem.
The observation is that even though only interface "A" has IP fd00:aaaa::a/64 when the program is started, the listening socket could accept connections coming in over different interfaces if they were to receive that IP in the future. IPs can be added and removed, and server processes listening on :: (or 0.0.0.0 in v4) need not be restarted when interfaces receive new IPs.
So, in a way, process 1's bind("fd00:aaaa::a/64", 9000) binds implicitly to ALL interfaces. Even though process 2 only needs to use interface B, process 1's already got first dibs, because it uses port 9000 on both interfaces, so process 2 gets denied.
If I change program 1 so that it too uses SO_BINDTODEVICE (to interface "A"), then both processes can bind(::, 9000) without issues.
experiment
I've tested this out with a little LD_PRELOAD goop, which precedes calls to bind() with setsockopt(...SO_BINDTODEVICE...). The two following TCP listeners can both bind to port 9000 simultaneously if they are each bound to a different interface.
# LD_PRELOAD=./bind_hook.so _BINDTODEVICE=eth0 nc -l 0.0.0.0 9000
# LD_PRELOAD=./bind_hook.so _BINDTODEVICE=eth1 nc -l 0.0.0.0 9000
If only one of the two uses SO_BINDTODEVICE, then the last process gets EADDRINUSE. Which is the situation put forward in the question.
I'm including the C code (GNU/Linux) for my tool in case someone needs something similar:
/**
* bind_hook.c
*
* Calls setsockopt() with #SO_BINDTODEVICE before _any_ bind().
* The name of the interface to bind to is obtained from
* environment variable `_BINDTODEVICE`.
*
* Needs root perms. errors are not signalled out.
*
* Compile with:
* gcc -Wall -Werror -shared -fPIC -o bind_hook.so -D_GNU_SOURCE bind_hook.c -ldl
* Example usage:
* LD_PRELOAD=./bind_hook.so _BINDTODEVICE=eth0 nc -l 0.0.0.0 9500
*
* #author: init-js
**/
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <dlfcn.h>
#include <errno.h>
static char iface[IF_NAMESIZE];
static int (*bind_original)(int, const struct sockaddr*, socklen_t addrlen);
int bind(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
__attribute__((constructor))
void ctor() {
bind_original = dlsym(RTLD_NEXT, "bind");
char *env_iface = getenv("_BINDTODEVICE");
if (env_iface) {
strncpy(iface, env_iface, IF_NAMESIZE - 1);
}
}
/* modified bind() -- call setsockopt first */
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen) {
int _errno;
if (iface[0]) {
/* preserve errno */
_errno = errno;
setsockopt(sockfd, SOL_SOCKET, SO_BINDTODEVICE,
(void*)iface, IF_NAMESIZE);
errno = _errno;
}
return bind_original(sockfd, addr, addrlen);
}
If there is a socket already bound to a specific IP address and port, you can only bind to that port again if you provide another specific IP address. You cannot use INADDR_ANY in this circumstance.
I want to implement the equivalent of tcpdump -i eth0 arp to observe ARP packets on interface eth0 on my Ubuntu machine. I use libpcap, but the return value of pcap_next_ex is always 0. Running tcpdump -i eth0 arp at the same time does show ARP packets.
/*
* compile(root): gcc test.c -lpcap
* run : ./a.out
* output : time out
* time out
* time out
* ...
*/
#include <pcap.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define ARP_REQUEST 1
#define ARP_REPLY 2
typedef struct arp_hdr_s arp_hdr_t;
struct arp_hdr_s {
u_int16_t htype;
u_int16_t ptype;
u_char hlen;
u_char plen;
u_int16_t oper;
u_char sha[6];
u_char spa[4];
u_char tha[6];
u_char tpa[4];
};
#define MAXBYTES2CAPTURE 2048
int
main(int argc, char **argv)
{
char err_buf[PCAP_ERRBUF_SIZE];
const unsigned char *packet;
int i;
int ret;
arp_hdr_t *arp_header;
bpf_u_int32 net_addr;
bpf_u_int32 mask;
pcap_t *desrc;
struct pcap_pkthdr *pkthdr;
struct bpf_program filter;
net_addr = 0;
mask = 0;
memset(err_buf, 0, PCAP_ERRBUF_SIZE);
desrc = pcap_open_live("eth0", MAXBYTES2CAPTURE, 0, 512, err_buf);
if (desrc == NULL) {
fprintf(stderr, "error: %s\n", err_buf);
exit(-1);
}
ret = pcap_lookupnet("eth0", &net_addr, &mask, err_buf);
if (ret < 0) {
fprintf(stderr, "error: %s\n", err_buf);
exit(-1);
}
ret = pcap_compile(desrc, &filter, "arp", 1, mask);
if (ret < 0) {
fprintf(stderr, "error: %s\n", pcap_geterr(desrc));
exit(-1);
}
ret = pcap_setfilter(desrc, &filter);
if (ret < 0) {
fprintf(stderr, "error: %s\n", pcap_geterr(desrc));
exit(-1);
}
while (1) {
ret = pcap_next_ex(desrc, &pkthdr, &packet);
if (ret == -1) {
printf("%s\n", pcap_geterr(desrc));
exit(1);
} else if (ret == -2) {
printf("no more\n");
} else if (ret == 0) { // here
printf("time out\n");
continue;
}
arp_header = (arp_hdr_t *)(packet + 14);
if (ntohs(arp_header->htype) == 1 && ntohs(arp_header->ptype == 0x0800)) {
printf("src IP: ");
for (i = 0; i < 4; i++) {
printf("%d.", arp_header->spa[i]);
}
printf("dst IP: ");
for (i = 0; i < 4; i++) {
printf("%d.", arp_header->tpa[i]);
}
printf("\n");
}
}
return 0;
}
Without getting too deep in your code, I can see a major problem:
In your use of pcap_open_live(), you do not set promiscuous mode: the third parameter should be non-zero. If the ARP request is not targeted at your interface's IP, pcap will not see it without promiscuous mode. tcpdump uses promiscuous mode unless specifically told not to with --no-promiscuous-mode (and hence requires the CAP_NET_ADMIN privilege, which you get through sudo and which your program will require too).
Side note:
1/ Leak: you may want to free your filter using pcap_freecode() after your pcap_setfilter().
2/ I assume you've read the official tutorial here:
http://www.tcpdump.org/pcap.html
...if that's not the case you'd be well advised to do that first. I quote:
A note about promiscuous vs. non-promiscuous sniffing: The two
techniques are very different in style. In standard, non-promiscuous
sniffing, a host is sniffing only traffic that is directly related to
it. Only traffic to, from, or routed through the host will be picked
up by the sniffer. Promiscuous mode, on the other hand, sniffs all
traffic on the wire. In a non-switched environment, this could be all
network traffic. [... more stuff on promisc vs non-promisc]
EDIT:
Actually, looking deeper at your code and comparing it with my code, which has been running for over a year at production level (both in-house and at the customer), I can see many more things that could be wrong:
You never call pcap_create()
You never call pcap_set_promisc(), we've talked about this already
You never call pcap_activate(), this may be the core issue here
...pcap is very touchy about the sequence order of operations to first get a pcap_t handle, and then operate on it.
At the moment, the best advice I can give you (otherwise this is going to turn into a live debugging session between you and me) is:
1/ read and play/tweak with the code from the official tutorial:
http://www.tcpdump.org/pcap.html
This is mandatory.
2/ FWIW, my - definitely working - sequence of operations is this:
pcap_lookupnet()
pcap_create()
pcap_set_promisc()
pcap_set_snaplen(), you may or may not need this
pcap_set_buffer_size(), you may or may not need this
pcap_activate() with a note: Very important: first activate, then set non-blocking. From PCAP_SETNONBLOCK(3PCAP): When first activated with pcap_activate() or opened with pcap_open_live(), a capture handle is not in "non-blocking mode"; a call to pcap_setnonblock() is required in order to put it into "non-blocking" mode.
...and then, because I do not use stinking blocking/blocking with timeout, busy looping:
pcap_setnonblock()
pcap_get_selectable_fd()
...then and only then:
- pcap_compile()
- followed by a pcap_setfilter()
- and then as I mentioned a pcap_freecode()
- and then a select() or family on the file descriptor I get from pcap_get_selectable_fd(), to pcap_dispatch(), but this is another topic.
pcap is an old API starting back in the 80's, and it's really very, very touchy. But don't get discouraged! It's great - once you get it right.
It would probably work better if you did
if (ntohs(arp_header->htype) == 1 && ntohs(arp_header->ptype) == 0x0800) {
rather than
if (ntohs(arp_header->htype) == 1 && ntohs(arp_header->ptype == 0x0800)) {
The latter evaluates arp_header->ptype == 0x0800, which, when running on a little-endian machine (such as a PC), will almost always evaluate to "false", because the value will look like 0x0008, not 0x0800, in an ARP packet (ARP types are big-endian, so they'll look byte-swapped on a little-endian machine). That means it'll evaluate to 0, and byte-swapping 0 gives you zero, so that if condition will evaluate to "false", and the printing code won't be called.
You'll still get lots of timeouts if you fix that, unless there's a flood of ARP packets, but at least you'll get the occasional ARP packet printed out. (I would advise printing nothing on a timeout; pcap-based programs doing live capturing should expect that timeouts should happen, and should not report them as unusual occurrences.)
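To make the byte-order arithmetic above concrete, here is a small Python sketch that mimics the two C expressions. It is only an illustration: buggy_check and fixed_check are hypothetical names, and raw_ptype stands for what the C code reads from the ptype field as a native 16-bit integer.

```python
import socket
import struct
import sys

ETHERTYPE_IPV4 = 0x0800

# The ARP ptype field as it travels on the wire: big-endian 0x0800.
WIRE_BYTES = bytes([0x08, 0x00])

# What C code sees when it reads those two bytes as a native u_int16_t.
raw_ptype = struct.unpack("=H", WIRE_BYTES)[0]


def buggy_check(raw):
    """ntohs(ptype == 0x0800): compares first, then byte-swaps the 0 or 1."""
    return socket.ntohs(int(raw == ETHERTYPE_IPV4)) != 0


def fixed_check(raw):
    """ntohs(ptype) == 0x0800: byte-swaps first, then compares."""
    return socket.ntohs(raw) == ETHERTYPE_IPV4


if __name__ == "__main__":
    print("host byte order:", sys.byteorder)
    print("raw field value: 0x%04x" % raw_ptype)
    print("buggy check:", buggy_check(raw_ptype))
    print("fixed check:", fixed_check(raw_ptype))
```

On a little-endian host the raw value is 0x0008, so the buggy form compares unequal and ends up byte-swapping 0, while the fixed form swaps to 0x0800 and matches.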
Is it possible for accept() (on Red Hat Enterprise Linux 4 / Linux kernel 2.6) to return the same socket value for different TCP connections from the same process of the same application on the same machine?
I was surprised to find, when I checked the log file, that many connections have the same socket value on the server side. How is that possible?
By the way, I am using a blocking TCP socket to listen.
main(){
int fd, clientfd, len, clientlen;
struct sockaddr_in address, clientaddress;
fd = socket(PF_INET, SOCK_STREAM, 0);
....
memset(&address, 0, sizeof address);
address.sin_family = AF_INET;
address.sin_port = htons(port);
....
bind(fd, &address, sizeof address);
listen(fd, 100);
do {
clientfd = accept(fd, &clientaddress, &clientlen);
if (clientfd < 0) {
....
}
printf("clientfd = %d", clientfd);
switch(fork()){
case 0:
//do something else
exit(0);
default:
...
}
} while(1);
}
My question is: why does printf("clientfd = %d"); print the same number for different connections?
If the server runs in multiple processes (like Apache with the mpm worker model), then every process has its own file descriptor numbering starting from 0.
In other words, it is quite possible that different processes will get the exact same socket file descriptor number. However, the fd number by itself does not really mean anything. The descriptors still refer to different underlying objects, and different local TCP ports.
The socket is just a number. It is a hook to a data structure in the kernel.
BTW TCP uses IP. Look up the RFC
That printf() doesn't print any FD at all. It's missing an FD parameter. What you are seeing could be a return address or any other arbitrary junk on the stack.
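Even with a correct printf, the same number can show up for different connections inside a single process, because a descriptor number becomes available again as soon as it is closed. The Python sketch below illustrates that on Linux. It assumes the server closes its copy of the accepted socket before the next accept(), as a forking parent commonly does; the question's elided default: branch does not show whether that is the case. accepted_fd_numbers is a hypothetical helper name.

```python
import socket


def accepted_fd_numbers():
    """Accept two connections one after the other; return both fd numbers."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    srv.listen(8)
    # Open both clients first so they do not grab descriptor numbers later.
    c1 = socket.create_connection(srv.getsockname())
    c2 = socket.create_connection(srv.getsockname())

    conn1, _ = srv.accept()
    fd1 = conn1.fileno()
    conn1.close()                # the number fd1 is free again

    conn2, _ = srv.accept()      # a different TCP connection...
    fd2 = conn2.fileno()         # ...handed the lowest free number
    conn2.close()

    for s in (c1, c2, srv):
        s.close()
    return fd1, fd2


if __name__ == "__main__":
    print(accepted_fd_numbers())
```

Both values refer to distinct connections; only the small integer used to name them is recycled.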