The same pub-sub code works on my local machine (Linux zephyr 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:08:16 UTC 2014 i686 i686 i686 GNU/Linux).
However, on an EC2 machine (Linux <host> 3.2.0-60-virtual #91-Ubuntu SMP Wed Feb 19 04:13:28 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux) it fails.
The security group is set to allow all traffic on port 19019, and also on all TCP ports starting from 0.
I tried adding prints in the Node.js ZMQ module and could see the data I am sending when I added a print in the flush function.
What else could be the problem?
I tried listening to the pub traffic using tcpflow on port 19019, but it didn't work. How can I listen to this traffic?
sudo tcpflow -i eth0 port 19019 and sudo tcpflow -i lo port 19019
Neither worked. Is there any tool I can use to debug this?
Pub.coffee
zmq = require 'zmq'
dpush_socket = zmq.socket 'pub'
dpush_socket.bind 'tcp://127.0.0.1:19019', (err) ->
  if not err?
    console.log "Bind successful"
dpush_socket.send 'pid' + ' req ' + req.query.pid
Sub.coffee
zmq = require "zmq"
endPoint = "tcp://0.0.0.0:19019"
sub = zmq.socket "sub"
sub.identity = 'worker' + process.pid;
sub.connect endPoint
console.log "worker connected!"
sub.subscribe('')
sub.on "message", (msg) ->
  console.log(sub.identity + 'got ' + msg.toString())
Both ends of the transport should meet on the same IP:PORT.
Sub.coffee
zmq = require "zmq"
# set the URL to the address where PUB .bind() listens
endPoint = "tcp://127.0.0.1:19019" # endPoint = "tcp://0.0.0.0:19019"
Part of the answer is probably what user3666197 pointed out: you need to bind and connect on the same IP. I'm not sure what you intend with the 0.0.0.0 address, and it shouldn't work even on your local machine unless you found some undocumented corner of your network stack that supports this behavior.
The other thing is that you either want to include your send call in your callback, or probably want to use bindSync to ensure that the socket is bound before you attempt to send anything. What may be happening is that the socket is discarding your sent message because the socket hasn't completed binding by the time you get to the call. This could well be different between different machines.
The problem was that I use the Node.js cluster module, and each worker creates a ZMQ pub socket that binds to the same port, which causes the failure. On my local machine only a single worker is spawned.
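The failure mode described here can be reproduced without ZeroMQ. The following is a minimal sketch in Python, where plain stdlib TCP sockets stand in for the ZMQ pub sockets and `try_bind` is a hypothetical helper name. It shows that a second bind to an address that is already bound is rejected with EADDRINUSE, which is presumably what every additional cluster worker runs into.

```python
import errno
import socket


def try_bind(host, port):
    """Try to bind and listen on a TCP socket.

    Returns (socket, None) on success or (None, errno_code) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# First "worker": bind to a port chosen by the OS (port 0 means "any free port").
first, first_err = try_bind("127.0.0.1", 0)
port = first.getsockname()[1]

# Second "worker": try the very same port while the first still holds it.
second, second_err = try_bind("127.0.0.1", port)
print(second is None, second_err == errno.EADDRINUSE)

first.close()
```

Under this reading, one way out is to have exactly one process bind the pub socket and let the workers connect to it; whether that fits the application is a design decision the thread does not settle.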
Related
I have a simple program which creates a simple web server at localhost with a random port between 10000 and 65535 (which is the highest unsigned 16-bit integer). You can also specify a port but if you don't know on which port it runs it's hard to find out.
I have written a little helper program that should show every port that's being listened to.
The helper:
import requests
for port in range(10000, 65535):
    try:
        print(port, requests.get("http://localhost:{}".format(port)))
    except Exception as e:
        print("{}: {}".format(type(e).__name__, port), end="\r")
I expect it to show ConnectionError: 10000, count up to 65535 and show any connections it finds. But it always hangs on port 25565, with the last message shown being the one for port 25564. And if I make a completely unrelated request to 'http://localhost:25565' or any higher port, it hangs as well.
The script hangs on port 25565 when I start a server on 25564.
Normally if a port has no server listening it will immediately refuse the connection and give a ConnectionError. Above port 25564 it doesn't but just waits until I stop it.
This behaviour seems completely random as port 25564 is unassigned according to speedguide.net.
Port 25565 is the standard MySQL and Minecraft Dedicated Server port (according to speedguide.net), neither of which is running on my machine. Therefore the hang still seems random.
I'm using python3 on Ubuntu 20.04 LTS.
Interestingly it didn't fail on my laptop with Linux Mint 21...
As @root requested in the comments, here is the output of nmap localhost:
Starting Nmap 7.80 ( https://nmap.org ) at 2022-09-25 11:42 CEST
Host is up (0.00014s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
80/tcp open http
631/tcp open ipp
8080/tcp open http-proxy
9050/tcp open tor-socks
Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds
Just a little note: port 80/tcp is listened on by apache2 with the "You are an idiot" flash animation.
As per the comments, you can try something like this:
Note that I have added the timeout parameter to the request. Its unit is seconds. The default timeout is None, which means the call will wait (hang) until the connection is closed.
import requests
for port in range(10_000, 65_535):
    try:
        r = requests.get(f'http://localhost:{port}', timeout=5)
        print(port)
    except Exception as e:
        print(f'{type(e).__name__}, {port}', end='\r')
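If the goal is only to find out which ports have a listener, a raw TCP connect with a timeout avoids the HTTP layer entirely. The sketch below uses only the standard library; `find_open_ports` is a hypothetical helper name, not something from the question or from requests.

```python
import socket


def find_open_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection within `timeout` seconds."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect((host, port))
            except OSError:
                # Refused, timed out, or unreachable: treat as "not listening".
                continue
            open_ports.append(port)
    return open_ports


# Demo: open a listener on an OS-chosen port and confirm the scan finds it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

found = find_open_ports("127.0.0.1", [demo_port])
print(found == [demo_port])

listener.close()
```

For the range in the question this would be called as `find_open_ports("localhost", range(10000, 65536))`. Note that `range` excludes its end value, so 65536 is needed to include port 65535.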
I've been trying to troubleshoot this problem for some days now.
A couple of minutes after starting an SSH connection to my Namecheap server (on Mac/Windows/cPanel's "Terminal"), it crashes and gives the following error message:
Error: The connection to the server ended in failure at {TIME} PM. (SIGKILL)
and :
Exit Code: 137
I've tried to set up some kind of log for SIGKILL signals, but it seems none can be created on a Namecheap server:
auditctl doesn't exist,
and we can't get systemtap because no package managers are available.
Details:
uname -a : Linux [-n] 2.6.32-954.3.5.lve1.4.78.el6.x86_64 #1 SMP Thu Mar 26 08:20:27 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
I measured the time between crashes: around 6 minutes.
I don't know Linux servers very well and may have left out needed information, so please ask for any specifics!
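For reference, the exit code 137 reported above is consistent with the SIGKILL in the error message: by shell convention, a process terminated by signal N is reported with exit code 128 + N, and 137 - 128 = 9. A small sketch of that arithmetic (`describe_exit_code` is a hypothetical helper name):

```python
import signal


def describe_exit_code(code):
    """Map a shell-style exit code to a signal name if it follows the 128+N convention."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return None
    return None


print(describe_exit_code(137))
```

On Linux this prints SIGKILL. It only confirms that the session was killed by signal 9; it does not say which process sent the signal or why.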
I upgraded my Linux kernel and dovecot failed to start with the following error messages:
Error: service(managesieve-login): listen(*, 4190) failed: Address already in use
Error: service(pop3-login): listen(*, 110) failed: Address already in use
Error: service(pop3-login): listen(*, 995) failed: Address already in use
Error: service(imap-login): listen(*, 143) failed: Address already in use
Error: service(imap-login): listen(*, 993) failed: Address already in use
Fatal: Failed to start listeners
Strangely enough, I couldn't find any process bound to those port numbers. All commands below return nothing.
# netstat -tulpn | grep 110
# ss -tulpn |grep 110
# fuser 110/tcp
# lsof -i :110
I also tried to change the listen setting to my specific IP address and it still failed the same way.
Any idea how I can solve this problem? Here's my version info:
# uname -a
Linux ip-172-31-26-222 4.14.177-107.254.amzn1.x86_64 #1 SMP Thu May 7 18:30:14 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# dovecot --version
2.2.36 (1f10bfa63)
Hi, it looks like you are using AWS, as I am. I recently updated via yum as well. I noticed that a new package named 'portreserve' was also installed. I killed that process, left /etc/dovecot/dovecot.conf as it was before, and then started Dovecot successfully. I was also immediately able to reconnect my mail clients. I hope that helps you.
I also restarted the portreserve program since it seems useful to limit port access.
Code
Reproducing this requires two applications running and connecting to each other over TCP, so I've made a tiny repo that also includes the PowerShell build script. link to the full repo
However to avoid the extra click, here is the code for clientA.go.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	clientA, err := net.ResolveTCPAddr("tcp4", fmt.Sprintf(":%v", "2222"))
	if err != nil {
		fmt.Println(err)
		return
	}
	clientB, err := net.ResolveTCPAddr("tcp4", fmt.Sprintf(":%v", "3333"))
	if err != nil {
		fmt.Println(err)
		return
	}
	for {
		clientAtoB, err := net.DialTCP("tcp4", clientA, clientB)
		if err != nil {
			fmt.Println(err)
		} else {
			defer clientAtoB.Close()
			clientAtoB.SetLinger(0)
			clientAtoB.SetNoDelay(true)
			clientAtoB.SetKeepAlive(false)
			fmt.Println("connected as Client A!")
			buffer := make([]byte, 64)
			_, err = clientAtoB.Read(buffer)
			if err != nil {
				continue
			}
		}
		time.Sleep(time.Second)
	}
}
The code for clientB.go is identical except the local and remote endpoints are swapped around:
clientBtoA, err := net.DialTCP("tcp4", clientB, clientA)
Problem
I build the same Go code for both Windows and Linux, but at runtime the applications produce different results, specifically in how TCP connections are dialed on each platform.
On Windows, when I run the two executables clientA.exe and clientB.exe (built from the build.ps1 script) I get the desired result. As seen in this screenshot:
However when I upload and execute the Linux binaries, the result is different:
ubuntu@ip-172-31-16-224:~/go/src/github.com/fanmanpro/dial-vs-listen$ sudo chmod +x clientA clientB
ubuntu@ip-172-31-16-224:~/go/src/github.com/fanmanpro/dial-vs-listen$ ls -la
total 10984
drwxrwxr-x 3 ubuntu ubuntu 4096 Apr 27 03:09 .
drwxrwxr-x 4 ubuntu ubuntu 4096 Apr 27 03:08 ..
drwxrwxr-x 8 ubuntu ubuntu 4096 Apr 27 03:08 .git
-rw-rw-r-- 1 ubuntu ubuntu 11255 Apr 27 03:12 A.txt
-rw-rw-r-- 1 ubuntu ubuntu 11255 Apr 27 03:12 B.txt
-rw-rw-r-- 1 ubuntu ubuntu 247 Apr 27 03:08 build.ps1
-rwxrwxr-x 1 ubuntu ubuntu 2950662 Apr 27 03:08 clientA
-rw-rw-r-- 1 ubuntu ubuntu 2642944 Apr 27 03:08 clientA.exe
-rw-rw-r-- 1 ubuntu ubuntu 718 Apr 27 03:08 clientA.go
-rwxrwxr-x 1 ubuntu ubuntu 2950662 Apr 27 03:08 clientB
-rw-rw-r-- 1 ubuntu ubuntu 2642944 Apr 27 03:08 clientB.exe
-rw-rw-r-- 1 ubuntu ubuntu 718 Apr 27 03:08 clientB.go
ubuntu@ip-172-31-16-224:~/go/src/github.com/fanmanpro/dial-vs-listen$ ./clientA > A.txt & ./clientB > B.txt &
[1] 24914
[2] 24915
ubuntu@ip-172-31-16-224:~/go/src/github.com/fanmanpro/dial-vs-listen$ cat A.txt
dial tcp4 :2222->:3333: connect: connection refused
ubuntu@ip-172-31-16-224:~/go/src/github.com/fanmanpro/dial-vs-listen$ cat B.txt
dial tcp4 :3333->:2222: connect: connection refused
ubuntu@ip-172-31-16-224:~/go/src/github.com/fanmanpro/dial-vs-listen$
I don't expect the connection refused error since these two applications are running under the same environment, so no firewalls are in effect, and the permissions are identical.
How can I get the same result regardless of platform? Or why are they different in the first place?
Edit
The successful connection on Windows is not just the luck of good timing. On Windows, I can run A for 5 minutes, then when I run B, both connect successfully.
Update (2020-04-27)
After receiving feedback from Go developers, I've been told that this is likely a Linux configuration issue and not specific to Go. Other than permissions, I can't think of anything that would prevent two applications in the same environment from establishing a TCP connection like this. (This low-level Linux stuff isn't really my forte.)
The reason this doesn't work on Linux is fairly straightforward. Both A and B are clients, and each tries to connect to a counterpart that would need to be listening. On Linux (or UNIX), when you run ClientA it tries to dial ClientB's address and port. If no process is already listening on that address and port to accept the connection at that moment, ClientA immediately gets a connection refused error (this is not entirely true, but it is most of the time; see my EDIT at the end of the answer).
On Windows, Golang uses the ConnectEx API under the hood (for the tcp, tcp4 and tcp6 protocols), which is intended for connection-oriented sockets. This API behaves differently from the Linux connect API. If ConnectEx cannot connect immediately, it returns the error code ERROR_IO_PENDING, and behind the scenes the OS waits/retries until the connection is accepted and established (or it gives up and the attempt definitively fails) and then notifies the caller. This is called overlapped I/O.
Relevant part of MSDN ConnectEx documentation:
Connection-oriented sockets are often unable to complete their connection immediately, and therefore the operation is initiated and the function immediately returns with the ERROR_IO_PENDING or WSA_IO_PENDING error. When the connect operation completes and success or failure is achieved, status is reported using the completion notification mechanism indicated in lpOverlapped.
What happens in your case on Windows is that you call ConnectEx from both sides and the OS connects those sockets for you. This only works if the other side connects within a certain period. If you noticeably increase the time.Sleep interval in both clients (e.g. to 17 and 28), you can see that even on Windows they will have a hard time connecting.
The answer to your question is that your code, as written, depends on OS-specific behavior of TCP dialing in Golang on Windows and is not portable. To make it portable to any platform supported by Golang, you probably want to change the logic so that both ClientA and ClientB listen for incoming connections and also periodically try to connect to the opposite side.
EDIT: I'm not saying your code cannot work on Linux at all. It actually relies on a rare connection mode called TCP simultaneous connect (simultaneous open), in which two processes can connect without either of them listening. Both dialing sides send their SYN at the same time, so each side responds with SYN/ACK, and the connection reaches the ESTABLISHED state once both SYN/ACKs have been received. That requires very precise timing and synchronization of the connect calls in both clients. Both sides would connect if TCP simultaneous connect is allowed in the Linux kernel and that synchronization is achieved (which is hard to do by just running both clients by hand or from the same script; even simulating it within the same process and thread is not that easy).
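The portable approach suggested in this answer (both sides listen and also periodically dial) can be sketched as follows. This is a minimal illustration in Python rather than Go, under stated assumptions: `connect_peer` and `free_ports` are hypothetical names, everything runs on 127.0.0.1, and the demo uses OS-chosen ports instead of 2222 and 3333 so that it does not collide with other services.

```python
import socket
import threading
import time


def connect_peer(listen_port, peer_port, attempts=20, delay=0.2):
    """Listen on listen_port while periodically dialing peer_port.

    Returns a connected socket from whichever side succeeds first, or None.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", listen_port))
    listener.listen(1)
    listener.settimeout(delay)  # accept() waits at most `delay` seconds
    try:
        for _ in range(attempts):
            try:
                # Outbound attempt: succeeds only if the peer is already listening.
                return socket.create_connection(("127.0.0.1", peer_port), timeout=delay)
            except OSError:
                pass
            try:
                # Inbound attempt: wait briefly for the peer to dial us.
                conn, _addr = listener.accept()
                return conn
            except socket.timeout:
                pass
        return None
    finally:
        listener.close()


def free_ports(n):
    """Ask the OS for n distinct free TCP ports (they are released again before use)."""
    socks = [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(n)]
    for s in socks:
        s.bind(("127.0.0.1", 0))
    ports = [s.getsockname()[1] for s in socks]
    for s in socks:
        s.close()
    return ports


port_a, port_b = free_ports(2)
results = {}


def run(name, listen_port, peer_port):
    results[name] = connect_peer(listen_port, peer_port)


a = threading.Thread(target=run, args=("A", port_a, port_b))
b = threading.Thread(target=run, args=("B", port_b, port_a))
a.start()
time.sleep(0.5)  # start B later, as when launching the two programs by hand
b.start()
a.join()
b.join()
print(results["A"] is not None, results["B"] is not None)
```

One caveat of this sketch: if both peers happen to dial at the same moment, each may end up holding its own outbound connection, so a real implementation would need a tie-break rule (for example, keeping only the connection initiated by the peer with the lower port).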
I am trying to set up remote logging for my OpenWRT system. For that I set /etc/config/system as follows:
config system
	option hostname 'MySystem'
	option timezone 'UTC'
	option log_file '/var/log/messages'
	option log_type 'file'
	option log_size '64'
	option log_rotated '10'
	option log_ip '192.168.1.200'
On my Ubuntu system I am trying to receive those log messages. syslog-ng is installed. /etc/syslog-ng/syslog-ng.conf looks like this:
@version: 3.5
@include "scl.conf"
@include "`scl-root`/system/tty10.conf"
# First, set some global options.
options { chain_hostnames(off); flush_lines(0); use_dns(no); use_fqdn(no);
owner("root"); group("adm"); perm(0640); stats_freq(0);
bad_hostname("^gconfd$");
};
source s_net { udp(); };
destination s_messages { file("/var/log/my_test/remote.log");};
log { source(s_net); destination(s_messages);};
@include "/etc/syslog-ng/conf.d/*.conf"
Whenever a message is logged on OpenWRT, /var/log/messages shows:
Mon Dec 19 15:11:18 2016 daemon.emerg logread[1021]: Logread connected to 192.168.1.200:514
Mon Dec 19 15:11:27 2016 local0.info my_service[1348]: My logging message
Mon Dec 19 15:11:27 2016 daemon.emerg logread[1021]: failed to send log data to 192.168.1.200:514 via udp
What could be the problem? Ping from OpenWRT to 192.168.1.200 is successful. I guess OpenWRT is working fine. The problem is the syslog-ng configuration, right?
Thx for any help!
Finally it worked. The problem was on my Ubuntu system (firewall). OpenWRT worked fine.
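A quick way to check whether UDP syslog datagrams reach the receiving host at all, independently of syslog-ng, is a throwaway UDP listener. The sketch below is an assumption-laden illustration: `receive_datagrams` is a hypothetical helper, and the demo sends itself one datagram on an OS-chosen port. For a real check the listener would be bound to ("0.0.0.0", 514) instead, which needs root and requires that no syslog daemon is already holding the port.

```python
import socket


def receive_datagrams(sock, count=1, timeout=5.0):
    """Read up to `count` datagrams from a bound UDP socket; stop early on timeout."""
    sock.settimeout(timeout)
    received = []
    for _ in range(count):
        try:
            data, (host, _port) = sock.recvfrom(4096)
        except socket.timeout:
            break
        received.append((host, data.decode("utf-8", errors="replace")))
    return received


# Self-contained demo on an OS-chosen port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
port = receiver.getsockname()[1]

# <134> is the syslog priority for facility local0 (16 * 8) plus severity info (6).
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"<134>my_service[1348]: My logging message", ("127.0.0.1", port))

messages = receive_datagrams(receiver, count=1, timeout=2.0)
print(messages)

sender.close()
receiver.close()
```

If nothing arrives while the router is sending, the datagrams are being dropped before they reach the application, which would point at a firewall rather than at the syslog-ng configuration.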
I just used the config system part of this question and the server configuration instructions on this page and it worked like a charm.
I created a /etc/rsyslog.d/10-openwrt-remote-logread.conf file with this content (no iptables needed):
$ModLoad imudp
$UDPServerRun 514
:fromhost-ip, isequal, "192.168.0.1" /var/log/openwrt.log
& ~
Now I have a nice openwrt.log file on my Raspberry.