YAF terminating on error (couldn't create connected TCP socket) - linux

I've installed and configured YAF (v. 2.8.4) + SiLK (v. 3.12.1) on Debian 8.2, and I'm facing two problems:
First: every time I start yaf, as soon as a TCP connection is established, the yaf process terminates with this error:
[2016-05-25 08:13:36] yaf terminating on error: couldn't create connected TCP socket to 127.0.0.1:18000 Connection refused
Second: while yaf is running (and there are no TCP connections on eth0), the output of rwfilter --proto=0- --type=all --pass=stdout | rwcut | head is empty.
I have some flow records from two days ago in the /data/ directory, and I'm able to filter them with:
rwfilter --start-date=2016/05/22 --end-date=2017/05/23 --proto=0- --type=all --pass=stdout | rwstats --fields=protocol --bottom --count=10
That shows yaf and SiLK worked correctly on May 23rd (but only for a few minutes!). Unfortunately, I only have today's logs, and the logs for the 23rd are truncated.
Configs and Logs:
ps ax |grep "yaf\|rwflowpack":
58984 ? Ssl 0:00 /usr/local/sbin/rwflowpack --sensor-configuration=/data/sensor.conf --site-config-file=/data/silk.conf --archive-directory=/var/lib/rwflowpack/archive --output-mode=local-storage --root-directory=/data --pidfile=/var/lib/rwflowpack/log/rwflowpack.pid --log-level=info --log-destination=syslog
84140 ? Ss 0:00 /usr/local/bin/yaf -d --live pcap --in eth0 --ipfix tcp --out localhost --ipfix-port 18000 --log /var/log/yaf/log/yaf.log --verbose --silk --applabel --max-payload=2048 --plugin-name=/usr/local/lib/yaf/dpacketplugin.la --pidfile /var/log/yaf/run/yaf.pid
iptables rules:
ACCEPT udp -- anywhere anywhere udp spt:18000
ACCEPT udp -- anywhere anywhere udp dpt:18000
ACCEPT tcp -- anywhere anywhere tcp spt:18000
ACCEPT tcp -- anywhere anywhere tcp dpt:18000
yaf.log:
[2016-05-25 08:48:02] yaf starting
[2016-05-25 08:48:02] Initializing Rules From File: /usr/local/etc/yafApplabelRules.conf
[2016-05-25 08:48:02] Application Labeler accepted 44 rules.
[2016-05-25 08:48:02] Application Labeler accepted 0 signatures.
[2016-05-25 08:48:02] DPI Running for ALL Protocols
[2016-05-25 08:48:02] Initializing Rules from DPI File /usr/local/etc/yafDPIRules.conf
[2016-05-25 08:48:02] DPI rule scanner accepted 63 rules from the DPI Rule File
[2016-05-25 08:48:02] DPI regular expressions cover 7 protocols
[2016-05-25 08:48:02] Forked child 82020. Parent exiting
[2016-05-25 08:48:02] running as root in --live mode, but not dropping privilege
[2016-05-25 08:50:48] Processed 814 packets into 0 flows:
[2016-05-25 08:50:48] Mean flow rate 0.00/s.
[2016-05-25 08:50:48] Mean packet rate 4.90/s.
[2016-05-25 08:50:48] Virtual bandwidth 0.0032 Mbps.
[2016-05-25 08:50:48] Maximum flow table size 36.
[2016-05-25 08:50:48] 29 flush events.
[2016-05-25 08:50:48] 0 asymmetric/unidirectional flows detected (-nan%)
[2016-05-25 08:50:48] YAF read 1643 total packets
[2016-05-25 08:50:48] Assembled 0 fragments into 0 packets:
[2016-05-25 08:50:48] Expired 0 incomplete fragmented packets. (0.00%)
[2016-05-25 08:50:48] Maximum fragment table size 0.
[2016-05-25 08:50:48] Rejected 829 packets during decode: (33.54%)
[2016-05-25 08:50:48] 829 due to unsupported/rejected packet type: (33.54%)
[2016-05-25 08:50:48] 829 unsupported/rejected Layer 3 headers. (33.54%)
[2016-05-25 08:50:48] 729 ARP packets. (29.49%)
[2016-05-25 08:50:48] 83 802.3 packets. (3.36%)
[2016-05-25 08:50:48] yaf terminating on error: couldn't create connected TCP socket to localhost:18000 Connection refused
rwflowpack logs:
May 25 13:17:54 XXX rwflowpack[58984]: 'S0': forward 0, reverse 0, ignored 0
May 25 13:19:54 XXX rwflowpack[58984]: Flushing files after 120 seconds.
May 25 13:19:54 XXX rwflowpack[58984]: 'S0': forward 0, reverse 0, ignored 0
May 25 13:21:54 XXX rwflowpack[58984]: Flushing files after 120 seconds.
May 25 13:21:54 XXX rwflowpack[58984]: 'S0': forward 0, reverse 0, ignored 0
/usr/local/etc/yaf.conf:
ENABLED=1
YAF_CAP_TYPE=pcap
YAF_CAP_IF=eth0
YAF_IPFIX_PROTO=tcp
YAF_IPFIX_HOST=localhost
YAF_IPFIX_PORT=18000
YAF_STATEDIR=/var/log/yaf
YAF_EXTRAFLAGS="--silk --applabel --max-payload=2048 --plugin-name=/usr/local/lib/yaf/dpacketplugin.la"
/data/silk.conf:
version 2
sensor 0 S0 "Description for sensor S0"
sensor 1 S1
sensor 2 S2 "Optional description for sensor S2"
sensor 3 S3
sensor 4 S4
sensor 5 S5
sensor 6 S6
sensor 7 S7
sensor 8 S8
sensor 9 S9
sensor 10 S10
sensor 11 S11
sensor 12 S12
sensor 13 S13
sensor 14 S14
class all
sensors S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14
end class
class all
type 0 in in
type 1 out out
type 2 inweb iw
type 3 outweb ow
type 4 innull innull
type 5 outnull outnull
type 6 int2int int2int
type 7 ext2ext ext2ext
type 8 inicmp inicmp
type 9 outicmp outicmp
type 10 other other
default-types in inweb inicmp
end class
default-class all
packing-logic "packlogic-twoway.so"
/data/sensor.conf:
probe S0 ipfix
    listen-on-port 18001
    protocol tcp
end probe

sensor S0
    ipfix-probes S0
    internal-ipblocks 192.168.1.0/24 10.10.10.0/24
    external-ipblocks remainder
end sensor
/usr/local/etc/rwflowpack.conf:
ENABLED=1
statedirectory=/var/lib/rwflowpack
CREATE_DIRECTORIES=yes
BIN_DIR=/usr/local/sbin
SENSOR_CONFIG=/data/sensor.conf
DATA_ROOTDIR=/data
SITE_CONFIG=/data/silk.conf
PACKING_LOGIC=
INPUT_MODE=stream
INCOMING_DIR=${statedirectory}/incoming
ARCHIVE_DIR=${statedirectory}/archive
FLAT_ARCHIVE=0
ERROR_DIR=
OUTPUT_MODE=local
SENDER_DIR=${statedirectory}/sender-incoming
INCREMENTAL_DIR=${statedirectory}/sender-incoming
COMPRESSION_TYPE=
POLLING_INTERVAL=
FLUSH_TIMEOUT=
FILE_CACHE_SIZE=
FILE_LOCKING=1
PACK_INTERFACES=0
SILK_IPFIX_PRINT_TEMPLATES=
LOG_TYPE=syslog
LOG_LEVEL=info
LOG_DIR=${statedirectory}/log
PID_DIR=${LOG_DIR}
USER=root
EXTRA_OPTIONS=
EXTRA_ENVVAR=
yaf --version:
yaf version 2.8.4 Build Configuration:
* Timezone support: UTC
* Fixbuf version: 1.7.1
* DAG support: NO
* Napatech support: NO
* Netronome support: NO
* Bivio support: NO
* PFRING support: NO
* Compact IPv4 support: NO
* Plugin support: YES
* Application Labeling: YES
* Payload Processing Support: YES
* Entropy support: NO
* Fingerprint Export Support: NO
* P0F Support: NO
* Spread Support: NO
* MPLS Support: NO
* Non-IP Support: NO
* Separate Interface Support: NO
SiLK version:
SiLK 3.12.1; configuration settings:
* Root of packed data tree: /data
* Packing logic: Run-time plug-in
* Timezone support: UTC
* Available compression methods: none [default], zlib
* IPv6 network connections: yes
* IPv6 flow record support: yes
* IPFIX/NetFlow9/sFlow collection: ipfix,netflow9,sflow
* Transport encryption: no
* PySiLK support: no
* Enable assert(): no

I'm new to YAF and SiLK.
I used the link below for building and configuring YAF+SiLK:
https://tools.netsa.cert.org/yaf/libyaf/yaf_silk.html
All parameters are derived from that tutorial.
YAF uses the port given in YAF_IPFIX_PORT to connect to the IPFIX collector on that port; YAF itself does not open or listen on that port.
So I changed the value of YAF_IPFIX_PORT in yaf.conf from 18000 to 18001 (the port defined by listen-on-port in sensor.conf).
Now it's working and I'm able to filter traffic.
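For reference, here are the two settings that now match (both taken from the configs shown above, with the corrected port; the collector side is rwflowpack):
# /usr/local/etc/yaf.conf - yaf exports to the collector on this port
YAF_IPFIX_PROTO=tcp
YAF_IPFIX_HOST=localhost
YAF_IPFIX_PORT=18001
# /data/sensor.conf - rwflowpack listens here
probe S0 ipfix
    listen-on-port 18001
    protocol tcp
end probe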

Related

[RTPEngine] Failed to init DTLS connection: key values mismatch

I have a connection set up like this.
SIP flow:
freeswitch server --(sip)--> opensips --(wss)--> sip client in chrome with jssip/webrtc
RTP flow:
freeswitch server ----> rtpengine ----> sip client in chrome with jssip/webrtc
The SIP client is registered in OpenSIPS. When a SIP call is originated, OpenSIPS shifts the call to the WSS protocol, while rtpengine transfers the RTP stream in an encrypted way.
Everything works fine except the leg between rtpengine and the JsSIP client: the call hangs up after 30 seconds due to NO MEDIA detected by FreeSWITCH.
I checked the rtpengine logs and found these warnings and errors:
ERR [crypto] Failed to init DTLS connection: key values mismatch
WARNING [core] ICE restart detected, but reset not allowed at this point
ERR [rtcp] SRTCP output wanted, but no crypto suite was negotiated
Here is the rtpengine offer configuration in my opensips.cfg:
branch_route[1] {
    # a couple of lines omitted
    $var(rtpengine_flags) = "RTP/SAVPF SDES-no rtcp-mux-offer replace-session-connection replace-origin ICE=force address-family=IP4 out-iface=pub in-iface=pub";
    rtpengine_offer("$var(rtpengine_flags)");
}
Here is the SDP negotiation:
rtpengine -> JsSIP client in the INVITE:
v=0
o=FreeSWITCH 1668770237 1668770238 IN IP4 8.210.107.107
s=FreeSWITCH
c=IN IP4 8.210.107.107
t=0 0
a=msid-semantic: WMS 2ImUufOQPNDojzwQRNjtlprF4InCfAd7
m=audio 42782 RTP/SAVPF 8 0 101
a=ssrc:1803089321 cname:8pRtvw9aaDpk3M0K
a=ssrc:1803089321 msid:2ImUufOQPNDojzwQRNjtlprF4InCfAd7 a0
a=ssrc:1803089321 mslabel:2ImUufOQPNDojzwQRNjtlprF4InCfAd7
a=ssrc:1803089321 label:2ImUufOQPNDojzwQRNjtlprF4InCfAd7a0
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=sendrecv
a=rtcp:42783
a=rtcp-mux
a=setup:actpass
a=fingerprint:sha-256 A4:C8:C6:97:DA:0A:FA:BC:B9:C1:97:D2:22:EF:70:6D:A1:78:B9:F9:00:60:AC:DF:69:E3:60:DB:F7:EA:2C:F5
a=ptime:20
a=ice-ufrag:ODhSSSee
a=ice-pwd:sK872thzGVIdizUSWzfuVzEOWe
a=candidate:zGAW3IRHAMII4q3e 1 UDP 2130706431 8.210.107.107 42782 typ host
a=candidate:zGAW3IRHAMII4q3e 2 UDP 2130706430 8.210.107.107 42783 typ host
JsSIP client -> rtpengine in the 200 OK reply:
v=0
o=- 5243747727296622141 2 IN IP4 127.0.0.1
s=-
t=0 0
a=msid-semantic: WMS O9XWhjITMoc3TM2D2J9KMKWKOC7Bd0xR9DZf
m=audio 64247 RTP/SAVPF 8 0 101
c=IN IP4 10.11.0.34
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:589451690 1 udp 2122260223 10.11.0.34 64247 typ host generation 0 network-id 1 network-cost 50
a=candidate:3667169833 1 udp 2122194687 169.254.43.39 64248 typ host generation 0 network-id 2
a=candidate:2999745851 1 udp 2122129151 192.168.56.1 64249 typ host generation 0 network-id 3
a=candidate:6184858 1 udp 2122063615 169.254.223.104 64250 typ host generation 0 network-id 4
a=candidate:508951713 1 udp 2121998079 192.168.3.30 64251 typ host generation 0 network-id 5 network-cost 10
a=candidate:1839312218 1 tcp 1518280447 10.11.0.34 9 typ host tcptype active generation 0 network-id 1 network-cost 50
a=candidate:2484563673 1 tcp 1518214911 169.254.43.39 9 typ host tcptype active generation 0 network-id 2
a=candidate:4233069003 1 tcp 1518149375 192.168.56.1 9 typ host tcptype active generation 0 network-id 3
a=candidate:1323148138 1 tcp 1518083839 169.254.223.104 9 typ host tcptype active generation 0 network-id 4
a=candidate:1356202065 1 tcp 1518018303 192.168.3.30 9 typ host tcptype active generation 0 network-id 5 network-cost 10
a=ice-ufrag:xbP9
a=ice-pwd:5zKlMQDpo8aqCnJliT5bt6rH
a=ice-options:trickle
a=fingerprint:sha-256 12:FD:0A:B5:05:0B:0D:B8:62:7E:59:65:45:F9:5A:07:10:63:7C:0C:05:96:35:C9:27:D7:D7:7B:DE:C5:70:8A
a=setup:active
a=mid:0
a=sendrecv
a=rtcp-mux
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=ssrc:2078142037 cname:ARU0gPhz4hvVPHMO
a=ssrc:2078142037 msid:O9XWhjITMoc3TM2D2J9KMKWKOC7Bd0xR9DZf f4a6248a-f3ca-4e23-a656-36b5d039ee3e
I searched for related issues or questions and found issue 1524.
However, there is no answer or solution for it yet.
I also read the rtpengine source code and realized the error comes from OpenSSL, but I'm not very familiar with that. I hope someone can help.
After trying many things, I figured it out myself.
Before I explain the cause, I need to briefly describe rtpengine's DTLS process.
When rtpengine starts, it generates its own certificate:
[1668851507.814949] INFO: [crypto] Generating new DTLS certificate
[1668851507.818173] INFO: [core] Startup complete, version undefined
Under the hood, it invokes the OpenSSL API to create the certificate used in the DTLS handshake; this depends on the OpenSSL library installed in the OS.
It acts as the server end, and its generated certificate is passed to the client. My client is a WebRTC SIP client in Chrome, which I assume uses a recent version of its crypto library.
It seems the client and server support different versions of the encryption, which causes the error: key values mismatch.
Therefore, the root cause is an old OpenSSL library.
I had installed rtpengine on CentOS 7, which ships quite an old version of OpenSSL, 1.0.2k (likely from around 2015, if I'm not wrong).
When I switched the OS to Ubuntu 20, this issue was gone immediately.
I then checked the OpenSSL version that Ubuntu 20 installs by default: it is OpenSSL 3.0. I'm not sure whether this exact version is a must, but it works with this version.
So, theoretically, installing this version of OpenSSL in your OS could fix the issue.
But I didn't try that on CentOS 7 anyway.
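A quick way to compare the two environments is to check which OpenSSL the host provides and which shared library the rtpengine daemon actually loads (the binary path below is an assumption; adjust it to your install):
openssl version                         # OpenSSL version installed on the host
ldd /usr/sbin/rtpengine | grep -i ssl   # libssl/libcrypto that rtpengine is linked against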
I hope this helps anyone who runs into a similar issue.

Using TRex to test VPP performance, memif can't connect

The goal is to connect the TRex packet generator to a VPP memif interface on the same host. The memif interface fails to connect, with the following logs.
Failure info:
Apr 28 15:12:24 xx[370159]: memif_connect_client(): Failed to connect socket: /run/vpp/memif.sock.
Apr 28 15:12:25 xx[370159]: EAL: Error - exiting with code: 1#012 Cause:
Apr 28 15:12:25 xx[370159]: rte_eth_dev_start: err=-1, port=0
vpp version: 19.08.3
trex version: v2.89
vpp# show memif
sockets
id listener filename
0 yes (2) /run/vpp/memif.sock
interface memif0/0
socket-id 0 id 0 mode ethernet
flags admin-up
listener-fd 49 conn-fd 0
num-s2m-rings 0 num-m2s-rings 0 buffer-size 0 num-regions 0
interface memif0/1
socket-id 0 id 1 mode ethernet
flags admin-up
listener-fd 49 conn-fd 0
num-s2m-rings 0 num-m2s-rings 0 buffer-size 0 num-regions 0
# cat /etc/trex_cfg.yaml
- version: 2
  interfaces: ["--vdev=net_memif0,role=slave,id=0,socket=/run/vpp/memif.sock",
               "--vdev=net_memif1,role=slave,id=1,socket=/run/vpp/memif.sock"]
  port_info:
    - ip: 172.21.0.253
      default_gw: 172.21.0.254
    - ip: 192.168.1.254
      default_gw: 192.168.1.253
  platform:
    master_thread_id: 16
    latency_thread_id: 17
    dual_if:
      - socket: 0
        threads: [18,19]
The DPDK memif PMD has two modes of operation:
Client mode
Server mode
The start-up sequence also affects how memif communicates and connects to the listener (server).
So, to make VPP the server and TRex the client, use the following (always start VPP first and create the memif interface):
create interface memif id 0 master
set interface state memif0/0 up
set interface ip address memif0/0 12.12.5.1/24
and
--vdev=net_memif0,role=client,id=0,socket=/run/vpp/memif.sock
To make TRex the server and VPP the client, use the following (always start TRex first):
--vdev=net_memif0,role=server,id=0,socket=/run/vpp/memif.sock
and
create interface memif id 0 slave
set interface state memif0/0 up
set interface ip address memif0/0 12.12.5.2/24
Note:
If more than one interface is required, ensure VPP creates memif interfaces 0 and 1 on separate sockets, as sketched below.
I tested the same setup with dpdk-pktgen and VPP.
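A minimal sketch of creating the two interfaces on separate sockets in the VPP CLI (the second socket path /run/vpp/memif1.sock is an assumption; double-check the syntax against your VPP version):
create memif socket id 1 filename /run/vpp/memif1.sock
create interface memif id 0 master
create interface memif socket-id 1 id 1 master
set interface state memif0/0 up
set interface state memif1/1 up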
Since TRex 2.92 has been upgraded to use DPDK 21.02:
in the interfaces line of /etc/trex_cfg.yaml, instead of:
interfaces: ["--vdev=net_memif0,role=slave,id=0,socket=/run/vpp/memif.sock",
it should be:
interfaces: ["--vdev=net_memif0,role=slave,id=0,socket-abstract=no,socket=/run/vpp/memif.sock",
because by default DPDK now treats the socket as an abstract socket rather than a filesystem path.

Unable to record mediasoup producer using FFmpeg on real server

I have built a nice audio-calling app in React Native, many thanks to mediasoup!
To take it to the next level, I need to record some of my calls.
I used this tutorial for reference:
mediasoup recording demo
I followed the FFmpeg route and have reached the point where I create a plainTransport with:
router.createPlainTransport({
  // No RTP will be received from the remote side
  comedia: false,
  // FFmpeg and GStreamer don't support RTP/RTCP multiplexing ("a=rtcp-mux" in SDP)
  rtcpMux: false,
  listenIp: { ip: "0.0.0.0", announcedIp: "MY_PUBLIC_IP" },
});
Then I connect to this transport:
rtpPlainTransport.connect({
  ip: "127.0.0.1",
  port: "port1",
  rtcpPort: "port2",
});
My first doubt: is the IP address in the .connect({}) parameters supplied above correct?
Second, the FFmpeg command requires an SDP header. This is mine:
v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0
m=audio port1 RTP/AVPF 111
a=rtcp:port2
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
When I start recording, the FFmpeg process does not receive any data.
Moreover, on stopping, I get the following message:
Output file is empty, nothing was encoded (check -ss / -t / -frames
parameters if used) Exiting normally, received signal 2. Recording
process exit, code: 255, signal: null
I was able to make the recording save with 127.0.0.1 when the server itself was running on localhost.
However, with my actual server hosted with Nginx, I'm not able to figure out what is going wrong.
I can see data being sent on my audio port:
1 0.000000000 127.0.0.1 → 127.0.0.1 UDP 117 10183 → 5004 Len=75
2 0.020787740 127.0.0.1 → 127.0.0.1 UDP 108 10183 → 5004 Len=66
3 0.043201757 127.0.0.1 → 127.0.0.1 UDP 118 10183 → 5004 Len=76
What do I do with FFmpeg so that it starts the recording!?
Can someone please help?
Solved the error. I had not set the "preferredPayloadType" value in mediaCodecs for audio to 111, which FFmpeg required.
100 does not work, although I don't completely understand why; it has to be 111.
If someone can explain this, it'd be good. But anyway, I'm now able to record!
So mediaCodecs must be:
{
  kind: "audio",
  mimeType: "audio/opus",
  preferredPayloadType: 111,
  clockRate: 48000,
  channels: 2,
  parameters: {
    minptime: 10,
    useinbandfec: 1,
  },
},
and the SDP should be:
v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0
m=audio 5004 RTP/AVPF 111
a=rtcp:5005
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
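For completeness, an FFmpeg invocation of the kind used in the mediasoup recording demo, reading RTP via that SDP file (the file names and the WebM output container are assumptions, not from the original post):
ffmpeg -nostdin -protocol_whitelist file,udp,rtp \
       -i recording.sdp \
       -map 0:a:0 -c:a copy recording.webm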

Python program cannot launch REDHAWK SDR component with external network connected

I am a new user of REDHAWK SDR and have written a Python program under CentOS that controls a "SigGen" component. It works if there is no network connection except the loopback, but fails if I connect a wired network (listed as System eth0).
I do not specify any IP address in the Python program, and omniORB.cfg explicitly lists the loopback address as shown below, since there have been comments in other posts warning against using "localhost".
traceLevel=10
InitRef = NameService=corbaname::127.0.0.1:2809
supportBootstrapAgent = 1
InitRef = EventService=corbaloc::127.0.0.1:11169/omniEvents
Comparing the omniORB data that prints to the screen in the two cases:
Last identical step ==> "omniORB: AsyncInvoker: thread id=2 has started. Total threads=1"
Next step:
for working (no network) ==> "omniORB: Adding root<0> (activating) to object table"
for non-working (network connected) ==> "omniORB: Removing root<0> (etherealizing) from object table"
Full message stream for the network-connected case ==>
[aecom#crancentos1 Desktop]$ python pTrigger.py keyboard 5555 5050
omniORB: Version: 4.1.6
omniORB: Distribution date: Fri Jul 1 15:57:00 BST 2011 dgrisby
omniORB: Information: the omniDynamic library is not linked.
omniORB: omniORBpy distribution date: Fri Jul 1 14:52:31 BST 2011 dgrisby
omniORB: Initialising incoming endpoints.
omniORB: Attempt to set socket to listen on IPv4 and IPv6.
omniORB: Starting serving incoming endpoints.
omniORB: AsyncInvoker: thread id = 2 has started. Total threads = 1
/usr/local/redhawk/core/lib/python/ossie/utils/sb/domainless.py:863:
DeprecationWarning: Component class is deprecated. Use launch() method instead.
warnings.warn('Component class is deprecated. Use launch() method instead.', DeprecationWarning)
omniORB: Adding root<0> (activating) to object table.
omniORB: Creating ref to local: root<0>
target id : IDL:omg.org/CORBA/Object:1.0
most derived id: IDL:omg.org/CosNaming/NamingContextExt:1.0
omniORB: Creating Python ref to local: root<0>
target id : IDL:omg.org/CosNaming/NamingContextExt:1.0
most derived id: IDL:omg.org/CosNaming/NamingContextExt:1.0
omniORB: Version: 4.1.6
omniORB: Distribution date: Fri Jul 1 15:57:00 BST 2011 dgrisby
omniORB: Information: the omniDynamic library is not linked.
omniORB: omniORBpy distribution date: Fri Jul 1 14:52:31 BST 2011 dgrisby
omniORB: Initialising incoming endpoints.
omniORB: Attempt to set socket to listen on IPv4 and IPv6.
omniORB: Starting serving incoming endpoints.
omniORB: AsyncInvoker: thread id = 2 has started. Total threads = 1
omniORB: Removing root<0> (etherealising) from object table
Traceback (most recent call last):
File "pTrigger.py", line 118, in <module>
sigGen=sb.Component("SigGen")
File "/usr/local/redhawk/core/lib/python/ossie/utils/sb/domainless.py", line 872, in __new__
raise AssertionError, "Unable to launch component: '%s'" % e
AssertionError: Unable to launch component: 'resource 'SigGen_2' did not register with virtual environment'
Is there a system variable/token/thing that is "127.0.0.1" in the loopback case that switches to the network IP address when the system makes the network connection, which then confuses omniORB?
Any constructive guidance would be appreciated...
Best Regards,
Brad Meyer
ADDITIONAL DATA
// Firewall is off=============================
// Smoking gun ?===============================
omniORB: omniORBpy distribution date: Fri Jul 1 14:52:31 BST 2011 dgrisby
omniORB: Python thread state scavenger start.
omniORB: Initialising incoming endpoints.
omniORB: Instantiate endpoint 'giop:tcp:127.0.0.1:'
omniORB: Explicit bind to host 127.0.0.1.
omniORB: Bind to address 127.0.0.1 ephemeral port.
omniORB: Publish specification: 'addr'
omniORB: Try to publish 'addr' for endpoint giop:tcp:127.0.0.1:46877
omniORB: Publish endpoint 'giop:tcp:127.0.0.1:46877'
omniORB: Starting serving incoming endpoints.
omniORB: AsyncInvoker: thread id = 2 has started. Total threads = 1
omniORB: giopRendezvouser task execute for giop:tcp:127.0.0.1:46877
==>omniORB: SocketCollection idle. Sleeping.
omniORB: State root<0> (active) -> deactivating
// ifconfig shows loopback running ==================
Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:302 errors:0 dropped:0 overruns:0 frame:0
TX packets:302 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:26197 (25.5 KiB) TX bytes:26197 (25.5 KiB)
// ping 127.0.0.1 works ===========================================
// netstat -tulpn SHOWS OMNI LISTENING ON SOME PORTS=========================================
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:42451 0.0.0.0:* LISTEN 2409/omniEvents
tcp 0 0 REDACTED FOR POST 0.0.0.0:* LISTEN 2617/dnsmasq
tcp 0 0 0.0.0.0:50517 0.0.0.0:* LISTEN 2067/rpc.statd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 2254/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 2098/cupsd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 2346/master
tcp 0 0 127.0.0.1:42251 0.0.0.0:* LISTEN 2022/omniNames
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1902/rpcbind
tcp 0 0 :::39409 :::* LISTEN 2067/rpc.statd
tcp 0 0 :::22 :::* LISTEN 2254/sshd
tcp 0 0 ::1:631 :::* LISTEN 2098/cupsd
tcp 0 0 ::1:25 :::* LISTEN 2346/master
tcp 0 0 :::2809 :::* LISTEN 2022/omniNames
tcp 0 0 :::11169 :::* LISTEN 2409/omniEvents
tcp 0 0 :::111 :::* LISTEN 1902/rpcbind
When multiple network interfaces are available, omniORB arbitrarily chooses one of them for publishing object references (see part 5 in http://omniorb.sourceforge.net/omni41/omniNames.html). In your case, it seems to be grabbing eth0 when you are network-connected, which is not playing well with omniNames for whatever reason (could be a firewall setting).
To get around this, I recommend adding the following line to your /etc/omniORB.cfg file:
endPoint = giop:tcp:127.0.0.1:
This will force omniNames to always use the local loopback instead of eth0. Given your current omniORB.cfg settings, I am assuming using localhost is acceptable for your application. If this is not the case (i.e., you really need to use eth0 instead of localhost), we will need to find the root cause of why omniNames is having trouble with your eth0 interface.
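For context, with that line added the /etc/omniORB.cfg from the question would look roughly like this (existing lines unchanged, new endPoint line at the end):
traceLevel=10
InitRef = NameService=corbaname::127.0.0.1:2809
supportBootstrapAgent = 1
InitRef = EventService=corbaloc::127.0.0.1:11169/omniEvents
endPoint = giop:tcp:127.0.0.1: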
Clarification (since I can't use line breaks in the comments section):
Try turning the log level up to 40 and see if anything useful shows up between these log lines:
omniORB: AsyncInvoker: thread id = 2 has started. Total threads = 1
omniORB: Removing root<0> (etherealising) from object table
I'm having trouble reproducing your problem. In my working case, I get something like this:
omniORB: AsyncInvoker: thread id = 2 has started. Total threads = 1
omniORB: giopRendezvouser task execute for giop:tcp:127.0.0.1:60625
omniORB: Adding root<0> (activating) to object table.
I'm curious as to see if the IP on the second line looks suspicious for you.
I worked on this for a week, but finally found the solution after DrewC pointed me in new directions to look.
In the /etc/hosts file, I added a line of the form "127.0.0.1 'ComputerName'" (sketched below) and the problem went away.
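A minimal sketch of such an entry, assuming the machine's hostname is crancentos1 (taken from the shell prompt above; use whatever hostname returns on your system):
127.0.0.1   localhost localhost.localdomain crancentos1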
Brad

HAProxy + NodeJS gets stuck on TCP Retransmission

I have an HAProxy + NodeJS + Rails setup; I use the NodeJS server for file uploads.
The problem I'm facing is that if I upload through HAProxy to NodeJS and a "TCP (Fast) Retransmission" occurs because of a lost packet, the TX rate on the client drops to zero for about 5-10 seconds and the connection gets flooded with TCP retransmissions.
This does not occur if I upload to NodeJS directly (TCP Retransmission happens too but it doesn't get stuck with dozens of retransmission attempts).
My test setup is a simple HTML4 FORM (method POST) with a single file input field.
The NodeJS Server only reads the incoming data and does nothing else.
I've tested this on multiple machines, networks, browsers, always the same issue.
Here's a TCP Traffic Dump from the client while uploading a file:
.....
TCP 1506 [TCP segment of a reassembled PDU]
>> everything is uploading fine until:
TCP 1506 [TCP Fast Retransmission] [TCP segment of a reassembled PDU]
TCP 66 [TCP Dup ACK 7392#1] 63265 > http [ACK] Seq=4844161 Ack=1 Win=524280 Len=0 TSval=657047088 TSecr=79373730
TCP 1506 [TCP Retransmission] [TCP segment of a reassembled PDU]
>> the last message is repeated about 50 times for >>5-10 secs<< (TX drops to 0 on client, RX drops to 0 on server)
TCP 1506 [TCP segment of a reassembled PDU]
>> upload continues until the next TCP Fast Retransmission and the same thing happens again
The haproxy.conf (haproxy v1.4.18 stable) is the following:
global
    log 127.0.0.1 local1 debug
    maxconn 4096 # Total Max Connections. This is dependent on ulimit
    nbproc 2

defaults
    log global
    mode http
    option httplog
    option tcplog

frontend http-in
    bind *:80
    timeout client 6000
    acl is_websocket path_beg /node/
    use_backend node_backend if is_websocket
    default_backend app_backend

# Rails Server (via nginx+passenger)
backend app_backend
    option httpclose
    option forwardfor
    timeout server 30000
    timeout connect 4000
    server app1 127.0.0.1:3000

# node.js
backend node_backend
    reqrep ^([^\ ]*)\ /node/(.*) \1\ /\2
    option httpclose
    option forwardfor
    timeout queue 5000
    timeout server 6000
    timeout connect 5000
    server node1 127.0.0.1:3200 weight 1 maxconn 4096
Thanks for reading! :)
Simon
Try setting "timeout http-request" to 6 seconds globally. It can typically be too low to pickup re-transmits and while it won't explain the cause it might solve your problem.
Try using https://github.com/nodejitsu/node-http-proxy. I am not sure if it will fit your overall architecture requirements, but it would be worth a try.
