GridGain multicast discovery does not find cluster nodes - gridgain

I'm trying to set up a GridGain cluster in a cloud environment (opensciencedatacloud.org).
I've verified that UDP multicast is available and that port 47400 is open in this environment, but GridGain is still unable to find the other nodes when they are launched. Do you have any idea why it is not working?
Below is the log from one cluster node:
INFO o.g.grid.kernal.GridKernal%nextflow - Config URL: n/a
INFO o.g.grid.kernal.GridKernal%nextflow - Daemon mode: off
INFO o.g.grid.kernal.GridKernal%nextflow - OS: Linux 2.6.32-358.2.1.el6.x86_64 amd64
INFO o.g.grid.kernal.GridKernal%nextflow - OS user: root
INFO o.g.grid.kernal.GridKernal%nextflow - Language runtime: Groovy
INFO o.g.grid.kernal.GridKernal%nextflow - VM information: Java(TM) SE Runtime Environment 1.7.0_51-b13 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 24.51-b03
INFO o.g.grid.kernal.GridKernal%nextflow - VM total memory: 0.83GB
INFO o.g.grid.kernal.GridKernal%nextflow - Remote Management [restart: off, REST: on, JMX (remote: off)]
INFO o.g.grid.kernal.GridKernal%nextflow - GRIDGAIN_HOME=/root
INFO o.g.grid.kernal.GridKernal%nextflow - VM arguments: [-Djava.awt.headless=true]
WARN o.g.grid.kernal.GridKernal%nextflow - SMTP is not configured - email notifications are off.
INFO o.g.grid.kernal.GridKernal%nextflow - Configured caches ['allSessions']
INFO o.g.grid.kernal.GridKernal%nextflow - 3-rd party licenses can be found at: /root/libs/licenses
INFO o.g.grid.kernal.GridKernal%nextflow - Local node user attribute [ROLE=worker]
[gridgain-#5%pub-nextflow%] WARN o.g.grid.kernal.GridDiagnostic - Initial heap size is less than 512MB (59MB). It is highly recommended to allocate at least 512MB of initial heap to run GridGain. Use -Xms512m -Xmx512m to set initial heap size.
INFO o.g.grid.kernal.GridKernal%nextflow - Non-loopback local IPs: 172.16.1.98, fe80:0:0:0:78b5:53ff:fe01:643b%3, fe80:0:0:0:f816:3eff:fe54:f4e8%2, 172.17.42.1
INFO o.g.grid.kernal.GridKernal%nextflow - Enabled local MACs: FA163E54F4E8, 7AB55301643B
INFO o.g.g.s.c.t.GridTcpCommunicationSpi - IPC shared memory server endpoint started [port=48100, tokDir=/root/work/ipc/shmem/cf5dbd14-4bb8-420b-998f-820056aa6d1c-2646]
INFO o.g.g.s.c.t.GridTcpCommunicationSpi - Successfully bound shared memory communication to TCP port [port=48100, locHost=0.0.0.0/0.0.0.0]
INFO o.g.g.s.c.t.GridTcpCommunicationSpi - Successfully bound to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0]
WARN o.g.g.s.c.noop.GridNoopCheckpointSpi - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
INFO o.g.grid.kernal.GridKernal%nextflow - Security status [authentication=off, secure-session=off]
WARN o.g.g.k.p.cache.GridCacheProcessor - Cache write synchronization mode is set to FULL_ASYNC. All single-key 'put' and 'remove' operations will return 'null', all 'putx' and 'removex' operations will return 'true'.
WARN o.g.g.k.p.cache.GridCacheProcessor - Automatically set write order mode to PRIMARY for write synchronization mode [writeSynchronizationMode=FULL_ASYNC, cacheName=allSessions]
WARN o.g.g.k.p.cache.GridCacheProcessor - Query indexing is disabled (queries will not work) for cache: 'allSessions'. To enable change GridCacheConfiguration.isQueryIndexEnabled() property.
INFO o.g.g.k.p.cache.GridCacheDgcManager - <allSessions> DGC trace log disabled.
INFO o.g.g.k.p.cache.GridCacheProcessor - Started cache [name=allSessions, mode=REPLICATED]
INFO org.eclipse.jetty.server.Server - jetty-9.0.5.v20130815
INFO o.e.jetty.server.ServerConnector - Started ServerConnector@7b9617a0{HTTP/1.1}{0.0.0.0:8080}
INFO o.g.g.k.p.r.p.h.j.GridJettyRestProtocol - Command protocol successfully started [name=Jetty REST, host=/0.0.0.0, port=8080]
INFO o.g.g.k.p.r.p.t.GridTcpRestProtocol - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
INFO o.g.g.s.d.tcp.GridTcpDiscoverySpi - Successfully bound to TCP port [port=47500, localHost=/172.16.1.98]
WARN o.g.g.s.d.t.i.m.GridTcpDiscoveryMulticastIpFinder - GridTcpDiscoveryMulticastIpFinder has no pre-configured addresses (it is recommended in production to specify at least one address in GridTcpDiscoveryMulticastIpFinder.getAddresses() configuration property)
>>> +------------------------------------------------------------------------------------+
>>> GridGain ver. platform-os-6.0.2#20140323-sha1:f9c796a1b29d2d7ce2737e681cbe578b5315d79f
>>> +------------------------------------------------------------------------------------+
>>> OS name: Linux 2.6.32-358.2.1.el6.x86_64 amd64
>>> CPU(s): 2
>>> Heap: 0.83GB
>>> VM name: 2646@node.novalocal
>>> Grid name: nextflow
>>> Local node [ID=CF5DBD14-4BB8-420B-998F-820056AA6D1C, order=1]
>>> Local node addresses: [node.novalocal/172.16.1.98]
>>> Local ports: TCP:8080 TCP:11211 TCP:47100 TCP:47500 TCP:48100
>>> GridGain documentation: http://www.gridgain.com/documentation
INFO o.g.g.k.m.d.GridDiscoveryManager - Topology snapshot [ver=1, nodes=1, CPUs=2, heap=0.83GB]

Software firewalls often block multicast packets. Can you try again with the firewall disabled on each node?
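To check whether multicast packets actually travel between two hosts, independently of GridGain, a small probe can help. The following is a minimal sketch, not GridGain's own discovery code: port 47400 comes from the question, the group address 228.1.2.4 is an assumption (use whatever group your GridTcpDiscoveryMulticastIpFinder is configured with), and the function names are hypothetical.

```python
import socket
import struct

MCAST_GROUP = "228.1.2.4"  # assumption: replace with the group your nodes use
MCAST_PORT = 47400         # port mentioned in the question


def membership_request(group, iface="0.0.0.0"):
    """Build the 8-byte ip_mreq structure used to join a multicast group."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def listen(group=MCAST_GROUP, port=MCAST_PORT, timeout=30.0):
    """Wait for one multicast datagram; return (sender_ip, payload) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        data, sender = sock.recvfrom(1024)
        return sender[0], data
    except socket.timeout:
        # Nothing arrived within the timeout window.
        return None
    finally:
        sock.close()


def send(message=b"multicast-probe", group=MCAST_GROUP, port=MCAST_PORT, ttl=1):
    """Send one datagram to the multicast group (TTL 1 keeps it on the local network)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(message, (group, port))
    finally:
        sock.close()
```

Call `listen()` on one host and `send()` on another, ideally while the GridGain nodes are stopped. If `listen()` returns `None`, multicast traffic is not getting through between those hosts, whatever the firewall settings claim.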

Related

STM32CubeIDE and OpenOCD: Error: timed out while waiting for target halted

Hardware/IDE Context:
Part/board: Genuine STM32F103C8 (BluePill)
Programmer: ST-Link V2
IDE: STM32CubeIDE 1.5.1 on fully-updated Windows 10
Flashing utility/debugger: OpenOCD
When I try to build and flash a simple PC_13 LED blinky program to my BluePill board, OpenOCD reports errors like these:
Open On-Chip Debugger 0.10.0+dev-01288-g7491fb4 (2020-10-27-17:36)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : STLINK V2J37S7 (API v2) VID:PID 0483:3748
Info : Target voltage: 3.256346
Info : Unable to match requested speed 8000 kHz, using 4000 kHz
Info : Unable to match requested speed 8000 kHz, using 4000 kHz
Info : clock speed 4000 kHz
Info : stlink_dap_op_connect(connect)
Info : SWD DPIDR 0x1ba01477
Info : STM32F103C8Tx.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : STM32F103C8Tx.cpu: external reset detected
Info : starting gdb server for STM32F103C8Tx.cpu on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Error: timed out while waiting for target halted
Error executing event gdb-attach on target STM32F103C8Tx.cpu:
TARGET: STM32F103C8Tx.cpu - Not halted
Info : device id = 0x20036410
Info : flash size = 64kbytes
Info : accepting 'gdb' connection on tcp/3333
Error: timed out while waiting for target halted
Error executing event gdb-attach on target STM32F103C8Tx.cpu:
TARGET: STM32F103C8Tx.cpu - Not halted
Error: timed out while waiting for target halted
Error executing event gdb-flash-erase-start on target STM32F103C8Tx.cpu:
TARGET: STM32F103C8Tx.cpu - Not halted
Error: Target not halted
Error: failed erasing sectors 0 to 5
Error: flash_erase returned -304
shutdown command invoked
Info : dropped 'gdb' connection
shutdown command invoked
I'd like to use OpenOCD-based flashing for my project so that I can make use of some STM32F103C8 clone boards I have lying around. The upload works again when I switch the flashing mode/"Debug Probe" in STM32CubeIDE from ST-Link (OpenOCD) back to ST-Link (ST-Link GDB Server).
This error is peculiar to me, especially since I specifically remember this exact configuration (STM32CubeIDE + OpenOCD + ST-Link + STM32F103C8) working a couple of months ago. Does anyone have any idea what could be causing it? I have the OpenOCD debugger set to use the standard auto-generated config file.
Also please let me know if there is any more information/details you'd need to help diagnose this issue. I'd be happy to provide anything necessary.
EDIT 2/22/2021:
Here is a copy of the auto-generated (by STM32CubeIDE) OpenOCD .cfg file:
# This is an genericBoard board with a single STM32F103C8Tx chip
#
# Generated by STM32CubeIDE
# Take care that such file, as generated, may be overridden without any early notice. Please have a look to debug launch configuration setup(s)
source [find interface/stlink-dap.cfg]
set WORKAREASIZE 0x5000
transport select "dapdirect_swd"
set CHIPNAME STM32F103C8Tx
set BOARDNAME genericBoard
# Enable debug when in low power modes
set ENABLE_LOW_POWER 1
# Stop Watchdog counters when halt
set STOP_WATCHDOG 1
# STlink Debug clock frequency
set CLOCK_FREQ 8000
# Reset configuration
# use hardware reset, connect under reset
# connect_assert_srst needed if low power mode application running (WFI...)
reset_config srst_only srst_nogate connect_assert_srst
set CONNECT_UNDER_RESET 1
set CORE_RESET 0
# ACCESS PORT NUMBER
set AP_NUM 0
# GDB PORT
set GDB_PORT 3333
# BCTM CPU variables
source [find target/stm32f1x.cfg]
#SWV trace
tpiu config disable
Ultimately, after some further research and trial & error, I settled on a fix that seems to work for me. I noticed that when the error with halting the CPU came up, the correct program seemed to have been loaded and the RESET button just needed to be toggled manually. These are the OpenOCD settings I ended up settling on:
Changes from the default configuration:
SWD Frequency: 8 → 4 MHz
This change is not strictly required, since OpenOCD falls back to 4 MHz during the upload anyway
Reset Mode: Connect Under Reset → None
This works for me with the following output:
Open On-Chip Debugger 0.10.0+dev-01288-g7491fb4 (2020-10-27-17:36)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : STLINK V2J37S7 (API v2) VID:PID 0483:3748
Info : Target voltage: 3.254751
Info : clock speed 4000 kHz
Info : stlink_dap_op_connect(connect)
Info : SWD DPIDR 0x1ba01477
Info : STM32F103C8Tx.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : STM32F103C8Tx.cpu: external reset detected
Info : starting gdb server for STM32F103C8Tx.cpu on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : device id = 0x20036410
Info : flash size = 64kbytes
undefined debug reason 8 - target needs reset
Info : accepting 'gdb' connection on tcp/3333
undefined debug reason 8 - target needs reset
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000474 msp: 0x20005000
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000474 msp: 0x20005000
shutdown command invoked
Info : dropped 'gdb' connection
shutdown command invoked
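To compare a failing run with a working one without reading the whole log, the two messages that matter above can be pulled out with a short script. This is a minimal sketch, assuming the OpenOCD console output has been saved as text; the function names are hypothetical, and the patterns are copied from the log lines in this post.

```python
import re

# "Unable to match requested speed 8000 kHz, using 4000 kHz"
SPEED_RE = re.compile(r"Unable to match requested speed (\d+) kHz, using (\d+) kHz")
HALT_TIMEOUT = "Error: timed out while waiting for target halted"


def speed_fallbacks(log_text):
    """Return (requested_kHz, actual_kHz) for every speed fallback in the log."""
    return [(int(req), int(act)) for req, act in SPEED_RE.findall(log_text)]


def halt_timeouts(log_text):
    """Count how many times the target failed to halt."""
    return sum(1 for line in log_text.splitlines()
               if line.strip().startswith(HALT_TIMEOUT))


failing_log = "\n".join([
    "Info : Unable to match requested speed 8000 kHz, using 4000 kHz",
    "Info : clock speed 4000 kHz",
    "Error: timed out while waiting for target halted",
    "Error: Target not halted",
])

print(speed_fallbacks(failing_log))  # [(8000, 4000)]
print(halt_timeouts(failing_log))    # 1
```

A working run, like the second log above, should report no speed fallbacks and zero halt timeouts.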

Connection between 2 nodes with private blockchain ethereum

Hello to all,
For three days I've been trying to connect two Ethereum nodes to each other, without success. Apologies in advance if this is hard to follow; English is not my first language.
I started over from scratch, so hopefully I can explain the setup more clearly.
I initialize node 1:
Account created,
CustomGenesis.json:
{
  "config": {
    "chainId": 150,
    "homesteadBlock": 0,
    "eip155Block": 0,
    "eip158Block": 0
  },
  "difficulty": "2000",
  "gasLimit": "2100000",
  "alloc": {
    "ff61ed39188497df4b48ae61284e2c76f29adbb4": { "balance": "1000000000000" }
  },
  "coinbase": "0xfad2cc813e3de65335444f88a80c267fbb33b7b5",
  "nonce": "0x0000000000000042",
  "mixhash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "parentHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "timestamp": "0x00"
}
I initialize the blockchain
$ geth --datadir ~/data/.ethereum_private init ~/data/CustomGenesis.json
INFO [02-09|00:36:52] Allocated cache and file handles
database=/home/max/data/.ethereum_private/geth/chaindata cache=16
handles=16
INFO [02-09|00:36:52] Writing custom genesis block
INFO [02-09|00:36:52] Successfully wrote genesis state
database=chaindata hash=8f09d5…b82b6f
INFO [02-09|00:36:52] Allocated cache and file handles
database=/home/max/data/.ethereum_private/geth/lightchaindata cache=16
handles=16
INFO [02-09|00:36:52] Writing custom genesis block
INFO [02-09|00:36:52] Successfully wrote genesis state
database=lightchaindata
hash=8f09d5…b82b6f
$ geth --datadir ~/data/.ethereum_private --nodiscover --maxpeers 1 --networkid 150 --rpc --rpccorsdomain "*" console 2>>eth2.log
INFO [02-09|00:42:30] Starting peer-to-peer node
instance=Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9
INFO [02-09|00:42:30] Allocated cache and file handles
database=/home/max/data/.ethereum_private/geth/chaindata cache=128
handles=1024
INFO [02-09|00:42:30] Initialised chain configuration config="
{ChainID: 150 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: <nil>
EIP155: 0 EIP158: 0 Byzantium: <nil> Engine: unknown}"
INFO [02-09|00:42:30] Disk storage enabled for ethash caches
dir=/home/max/data/.ethereum_private/geth/ethash count=3
INFO [02-09|00:42:30] Disk storage enabled for ethash DAGs
dir=/home/max/.ethash count=2
INFO [02-09|00:42:30] Initialising Ethereum protocol versions="
[63 62]" network=150
INFO [02-09|00:42:30] Loaded most recent local header number=0
hash=8f09d5…b82b6f td=2000
INFO [02-09|00:42:30] Loaded most recent local full block number=0
hash=8f09d5…b82b6f td=2000
INFO [02-09|00:42:30] Loaded most recent local fast block number=0
hash=8f09d5…b82b6f td=2000
INFO [02-09|00:42:30] Loaded local transaction journal
transactions=0 dropped=0
INFO [02-09|00:42:30] Regenerated local transaction journal
transactions=0 accounts=0
INFO [02-09|00:42:30] Starting P2P networking
INFO [02-09|00:42:30] HTTP endpoint opened: http://127.0.0.1:8545
INFO [02-09|00:42:30] RLPx listener up self="enode://e88391e5f801132c12912f52fa27e22231d782a883138e7219a9c16c8bed7212bd03a45580400cbc019997d2365b090246deb216e4b35d80be332fa3ef39ff38@[::]:30303?discport=0"
INFO [02-09|00:42:30] IPC endpoint opened:
/home/max/data/.ethereum_private/geth.ipc
INFO [02-09|00:42:34] Mapped network port proto=tcp
extport=30303 intport=30303 interface="UPNP IGDv1-IP1"
I initialize node 2:
Account created,
CustomGenesis.json: same as above (I changed only the account address)
$ geth --datadir ~/data/.ethereum_private init ~/data/CustomGenesis.json
INFO [02-09|00:50:10] Allocated cache and file handles
database=/home/max/data/.ethereum_private/geth/chaindata cache=16
handles=16
INFO [02-09|00:50:10] Writing custom genesis block
INFO [02-09|00:50:10] Successfully wrote genesis state
database=chaindata hash=822931…c3a730
INFO [02-09|00:50:10] Allocated cache and file handles
database=/home/max/data/.ethereum_private/geth/lightchaindata cache=16
handles=16
INFO [02-09|00:50:10] Writing custom genesis block
INFO [02-09|00:50:10] Successfully wrote genesis state
database=lightchaindata
hash=822931…c3a730
$ geth --datadir ~/data/.ethereum_private --nodiscover --maxpeers 1 --networkid 150 --rpc --rpccorsdomain "*" console 2>>eth2.log
INFO [02-09|00:56:40] Starting peer-to-peer node
instance=Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9
INFO [02-09|00:56:40] Allocated cache and file handles
database=/home/max/data/.ethereum_private/geth/chaindata cache=128
handles=1024
WARN [02-09|00:56:40] Upgrading database to use lookup entries
INFO [02-09|00:56:40] Initialised chain configuration config="
{ChainID: 150 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: <nil>
EIP155: 0 EIP158: 0 Byzantium: <nil> Engine: unknown}"
INFO [02-09|00:56:40] Disk storage enabled for ethash caches
dir=/home/max/data/.ethereum_private/geth/ethash count=3
INFO [02-09|00:56:40] Disk storage enabled for ethash DAGs
dir=/home/max/.ethash count=2
INFO [02-09|00:56:40] Initialising Ethereum protocol
versions="[63 62]" network=150
INFO [02-09|00:56:40] Loaded most recent local header number=0
hash=822931…c3a730 td=2000
INFO [02-09|00:56:40] Loaded most recent local full block
number=0 hash=822931…c3a730 td=2000
INFO [02-09|00:56:40] Loaded most recent local fast block number=0
hash=822931…c3a730 td=2000
INFO [02-09|00:56:40] Regenerated local transaction journal
transactions=0 accounts=0
INFO [02-09|00:56:40] Starting P2P networking
INFO [02-09|00:56:40] HTTP endpoint opened: http://127.0.0.1:8545
INFO [02-09|00:56:40] Database deduplication successful
deduped=0
INFO [02-09|00:56:40] RLPx listener up self="enode://0e1055c31a71086986934bd8ba8add3a13721a81f061a657c61d109eaf0a75faa4d56309bdd699cdba0fac9abd18a92fa05285a7c4cdded73489c41aaaf2ee17@[::]:30303?discport=0"
INFO [02-09|00:56:40] IPC endpoint opened:
/home/max/data/.ethereum_private/geth.ipc
INFO [02-09|00:56:44] Mapped network port
proto=tcp extport=30303 intport=30303 interface="UPNP IGDv1-IP1"
I then add the enode of node 1 as a peer on node 2:
NODE 2
admin.addPeer("enode://e88391e5f801132c12912f52fa27e22231d782a883138e7219a9c16c8bed7212bd03a45580400cbc019997d2365b090246deb216e4b35d80be332fa3ef39ff38@10.0.0.61:30303")
true
>
At this point nothing appears in the log and nothing else happens. After adding node 1's enode on node 2, shouldn't node 2 connect to node 1?
Clearly I'm doing something wrong, but I can't work out what.
Can you help me?
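One thing that can be checked directly from the two init logs above: node 1 reports genesis hash 8f09d5…b82b6f and node 2 reports 822931…c3a730. Geth nodes only peer with each other when they share the same genesis block, and changing the alloc address changes the genesis hash, so this mismatch is worth ruling out first. Below is a minimal sketch of that comparison, assuming the init output of each node is available as text; the helper names are hypothetical.

```python
import re

# Matches the abbreviated hash geth prints, e.g. hash=8f09d5…b82b6f,
# and tolerates a line wrap right after the ellipsis.
HASH_RE = re.compile(r"hash=([0-9a-f]+)…\s*([0-9a-f]+)")


def genesis_hashes(init_log):
    """Collect the distinct abbreviated genesis hashes found in an init log."""
    return {f"{head}…{tail}" for head, tail in HASH_RE.findall(init_log)}


def same_genesis(log_a, log_b):
    """True only if both logs contain hashes and the sets are identical."""
    a, b = genesis_hashes(log_a), genesis_hashes(log_b)
    return bool(a) and a == b


node1_log = "Successfully wrote genesis state database=chaindata hash=8f09d5…b82b6f"
node2_log = "Successfully wrote genesis state database=chaindata hash=822931…c3a730"

print(same_genesis(node1_log, node2_log))  # False
```

If the comparison is false, initializing both nodes from a byte-for-byte identical CustomGenesis.json is the first thing to try.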

Docker-Flink: TaskManagers can't find JobManager when in different nodes in Docker Swarm

This happens even when the nodes are in the same subnet.
I am using the Docker-Flink project at:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
I am creating the services with the following commands:
docker network create -d overlay overlay
docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager
docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager
This is the error I get:
- Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
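The attempt numbers and timeouts in this message and in the TaskManager log further down (500, 1000, 2000, then 4000 milliseconds) are consistent with a timeout that doubles on every retry. Here is a minimal sketch of that pattern, for illustration only: the function name `registration_timeout_ms` is hypothetical, and whether Flink caps the value at some maximum is not shown in these logs.

```python
def registration_timeout_ms(attempt, initial_ms=500):
    """Timeout for a 1-based attempt number, doubling on each retry."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return initial_ms * 2 ** (attempt - 1)


# Attempts 1 to 4, matching the values seen in the logs.
for attempt in range(1, 5):
    print(attempt, registration_timeout_ms(attempt))
```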
These are my environment configurations:
node: ubuntu-swarm-master Azure VM Standard D4s v3 (4 vcpus, 16 GB memory) Docker version 17.03.1-ce, build c6d412e
node: azure-swarm-worker-1 Azure VM Standard D2 v2 Promo (2 vcpus, 7 GB memory) Docker version 17.09.0-ce, build afdb6d4
Flink: using image 1.3.2-hadoop2-scala_2.10
This is from the log of the container running TaskManager:
Starts ok...
Starting Task Manager
config file:
jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting taskmanager as a console application on host 00afd4130a94.
Then there are some errors (scroll right):
2017-11-02 14:06:51,064 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
2017-11-02 14:06:51,065 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2017-11-02 14:06:51,067 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address jobmanager/10.0.0.2:6123.
2017-11-02 14:06:54,578 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/10.0.0.2:6123
2017-11-02 14:06:54,779 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out
2017-11-02 14:06:54,829 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:54,880 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:54,931 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:06:54,981 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:55,031 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:55,032 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:06:56,034 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:06:57,036 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:58,037 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:58,038 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:06:58,138 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/10.0.0.2:6123
2017-11-02 14:06:58,339 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out
2017-11-02 14:06:58,389 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:58,439 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:58,490 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:06:58,541 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:58,592 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:58,592 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:06:59,593 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:07:00,595 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:07:01,599 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:07:01,599 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:07:01,600 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to jobmanager/10.0.0.2:6123. Selecting a local address using heuristics.
2017-11-02 14:07:01,601 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address '00afd4130a94' (10.0.0.5) for communication.
2017-11-02 14:07:01,601 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager
2017-11-02 14:07:01,601 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at 00afd4130a94:0.
2017-11-02 14:07:01,947 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2017-11-02 14:07:01,978 INFO Remoting - Starting remoting
2017-11-02 14:07:02,168 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@00afd4130a94:33881]
2017-11-02 14:07:02,174 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
2017-11-02 14:07:02,192 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: 00afd4130a94/10.0.0.5, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2017-11-02 14:07:02,199 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2017-11-02 14:07:02,201 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 29 GB, usable 25 GB (86.21% usable)
2017-11-02 14:07:02,286 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 101 MB for network buffer pool (number of memory segments: 3260, bytes per segment: 32768).
2017-11-02 14:07:02,393 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
2017-11-02 14:07:02,400 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 2 ms).
2017-11-02 14:07:02,434 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 32 ms). Listening on SocketAddress /10.0.0.5:42921.
2017-11-02 14:07:02,493 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (640 MB), memory will be allocated lazily.
2017-11-02 14:07:02,498 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-e57d51fa-2269-4df0-9910-0fe26c6042bd for spill files.
2017-11-02 14:07:02,501 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
2017-11-02 14:07:02,553 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-2c0c063f-464e-48f1-9fb8-fcfa48868e3a
2017-11-02 14:07:02,564 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-0c5e2b25-70a2-4964-9eec-24b0e79d560e
2017-11-02 14:07:02,572 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1719715507.
2017-11-02 14:07:02,572 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: df5992297d269fa16a5e945e1dce0451 @ 00afd4130a94 (dataPort=42921)
2017-11-02 14:07:02,573 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 2 task slot(s).
2017-11-02 14:07:02,574 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 113/1024/1024 MB, NON HEAP: 33/33/-1 MB (used/committed/max)]
2017-11-02 14:07:02,576 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2017-11-02 14:07:03,106 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
2017-11-02 14:07:04,126 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
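The "connect timed out" lines above can be reproduced outside Flink with a plain TCP probe run from inside the TaskManager container, which helps separate an overlay-network problem from a Flink configuration problem. This is a minimal sketch, assuming Python is available in the container; the function name `can_connect` is hypothetical, and the host name and port are the ones from the config file printed above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


# Example call, to be run from the TaskManager container:
# print(can_connect("jobmanager", 6123))
```

If the probe fails from a container on the worker node but succeeds from a container on the same node as the JobManager, the problem most likely lies in the overlay network between the swarm nodes rather than in Flink itself.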
Here is the log from the container running JobManager:
Starting Job Manager
config file:
jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting jobmanager as a console application on host c30e0fe7b765.
2017-11-02 13:42:33,721 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Current user: flink
2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 981 MiBytes
2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: /docker-java-home/jre
2017-11-02 13:42:33,799 INFO org.apache.flink.runtime.jobmanager.JobManager - Hadoop version: 2.7.2
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms1024m
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx1024m
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments:
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - /opt/flink/conf
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - cluster
2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
2017-11-02 13:42:33,801 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
2017-11-02 13:42:33,801 INFO org.apache.flink.runtime.jobmanager.JobManager - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-11-02 13:42:33,911 INFO org.apache.flink.runtime.jobmanager.JobManager - Loading configuration from /opt/flink/conf
2017-11-02 13:42:33,914 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager
2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-02 13:42:33,916 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-02 13:42:33,916 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-02 13:42:33,917 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-02 13:42:33,917 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-02 13:42:33,924 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager without high-availability
2017-11-02 13:42:33,926 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager on jobmanager:6123 with execution mode CLUSTER
2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager
2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-02 13:42:33,936 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-02 13:42:33,936 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-02 13:42:33,962 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE)
2017-11-02 13:42:34,026 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system reachable at jobmanager:6123
2017-11-02 13:42:34,290 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2017-11-02 13:42:34,327 INFO Remoting - Starting remoting
2017-11-02 13:42:34,505 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink#jobmanager:6123]
2017-11-02 13:42:34,524 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager web frontend
2017-11-02 13:42:34,532 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2017-11-02 13:42:34,532 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'jobmanager.web.log.path'.
2017-11-02 13:42:34,532 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-9f0ba581-3488-4086-a79c-53e17b56352c for the web interface files
2017-11-02 13:42:34,533 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-17a58ccf-7d8b-475e-b727-4a7935a19c0f for web frontend JAR file uploads
2017-11-02 13:42:34,741 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend listening at 0:0:0:0:0:0:0:0:8081
2017-11-02 13:42:34,741 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
2017-11-02 13:42:34,751 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-d10b620a-73ae-40af-bd23-aad5211fe1cc
2017-11-02 13:42:34,752 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2017-11-02 13:42:34,763 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
2017-11-02 13:42:34,769 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive
2017-11-02 13:42:34,774 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with JobManager akka.tcp://flink#jobmanager:6123/user/jobmanager on port 8081
2017-11-02 13:42:34,774 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink#jobmanager:6123/user/jobmanager:00000000-0000-0000-0000-000000000000.
2017-11-02 13:42:34,776 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka.tcp://flink#jobmanager:6123/user/jobmanager.
2017-11-02 13:42:34,785 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink#jobmanager:6123/user/jobmanager
2017-11-02 13:42:34,801 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink#jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(00000000-0000-0000-0000-000000000000).
2017-11-02 13:42:34,814 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#844712453] - leader session 00000000-0000-0000-0000-000000000000
Why can't the TaskManagers talk to the JobManager? I wonder if some configuration is missing. Any help would be much appreciated. Thank you very much!
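One way to narrow this down is to check, from inside a TaskManager host or container, whether the JobManager's RPC endpoint is reachable at all. The sketch below is only a generic TCP reachability probe, not anything Flink-specific. The helper name `can_connect` is made up for illustration, and the host `jobmanager` and port 6123 are taken from the log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration: it only proves that the name
    resolves and the port accepts connections, not that Flink's RPC
    handshake succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Running `can_connect("jobmanager", 6123)` on a TaskManager machine and getting `False` would point at name resolution or networking between the nodes rather than at the Flink configuration itself.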

ConfigurationException while launching Apache Cassandra DB: This node was decommissioned and will not rejoin the ring

This is a snippet from the system log while shutting down:
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:28:50,995 StorageService.java:3788 - Announcing that I have left the ring for 30000ms
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,995 ThriftServer.java:142 - Stop listening to thrift clients
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,997 Server.java:182 - Stop listening for CQL clients
WARN [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,997 Gossiper.java:1508 - No local state or state is in silent shutdown, not announcing shutdown
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:20,997 MessagingService.java:786 - Waiting for messaging service to quiesce
INFO [ACCEPT-sysengplayl0127.bio-iad.ea.com/10.72.194.229] 2016-07-27 22:29:20,998 MessagingService.java:1133 - MessagingService has terminated the accept() thread
INFO [RMI TCP Connection(12)-127.0.0.1] 2016-07-27 22:29:21,022 StorageService.java:1411 - DECOMMISSIONED
INFO [main] 2016-07-27 22:32:17,534 YamlConfigurationLoader.java:89 - Configuration location: file:/opt/cassandra/product/apache-cassandra-3.7/conf/cassandra.yaml
And then while starting up:
INFO [main] 2016-07-27 22:32:20,316 StorageService.java:630 - Cassandra version: 3.7
INFO [main] 2016-07-27 22:32:20,316 StorageService.java:631 - Thrift API version: 20.1.0
INFO [main] 2016-07-27 22:32:20,316 StorageService.java:632 - CQL supported versions: 3.4.2 (default: 3.4.2)
INFO [main] 2016-07-27 22:32:20,351 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 397 MB and a resize interval of 60 minutes
ERROR [main] 2016-07-27 22:32:20,357 CassandraDaemon.java:731 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set, or all existing data is removed and the node is bootstrapped again
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:815) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:725) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:625) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:370) [apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:585) [apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:714) [apache-cassandra-3.7.jar:3.7]
WARN [StorageServiceShutdownHook] 2016-07-27 22:32:20,358 Gossiper.java:1508 - No local state or state is in silent shutdown, not announcing shutdown
INFO [StorageServiceShutdownHook] 2016-07-27 22:32:20,359 MessagingService.java:786 - Waiting for messaging service to quiesce
Is there something wrong with the configuration?
I faced the same issue.
Posting the answer in case it helps others.
As the log suggests, the property "cassandra.override_decommission" must be set to true.
Start Cassandra with:
cassandra -Dcassandra.override_decommission=true
This should add the node back to the cluster.

site unavailable after install and reboot and plonectl start

Ubuntu 10.04 system. This is a new Plone install; it went fine, I created some content, and everything seemed to work. After a kernel update and a reboot, Plone is running but will not serve any pages to a browser; a browser request just times out. I can telnet to port 8080 on the system and send an HTTP GET by hand, and nothing comes back. The log file for client1 in a ZEO install keeps repeating:
2011-08-10T16:59:57 INFO ZServer HTTP server started at Wed Aug 10 16:59:57 2011
Hostname: 0.0.0.0
Port: 8080
------
2011-08-10T16:59:57 INFO Zope Set effective user to "plone"
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage ClientStorage (pid=24596) created RW/normal for storage: '1'
------
2011-08-10T17:00:02 INFO ZEO.cache created temporary cache file '<fdopen>'
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage Testing connection <ManagedClientConnection ('127.0.0.1', 8100)>
------
2011-08-10T17:00:02 INFO ZEO.zrpc.Connection(C) (127.0.0.1:8100) received handshake 'Z3101'
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage Server authentication protocol None
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage Connected to storage: ('dns', 8100)
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
------
2011-08-10T17:00:22 INFO ZServer HTTP server started at Wed Aug 10 17:00:22 2011
Hostname: 0.0.0.0
Port: 8080
I haven't been able to find any other info on what is causing this, nor can I find any documentation on debugging a Plone install.
Thanks for any help you can provide.
Forgive the aborted answer; I misread the log snippet. The repeated log entries you're seeing are what you'd expect from repeated restarts. Are you repeatedly restarting the instance? If not, then it seems your instance is restarting on its own. Shut down the instance, start it using "bin/instance fg", and see if that gives you more information.
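To confirm that the instance really is restarting in a loop, you can count the startup lines in the client log and measure the gaps between them. This is a rough sketch that assumes the timestamp format shown in the question's log. The function names are made up for illustration, and `SAMPLE` just reuses the two startup lines quoted in the question.

```python
import re
from datetime import datetime

# Marker text and timestamp layout are taken from the log in the question.
START_MARKER = "ZServer HTTP server started"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) ")


def find_restarts(log_text):
    """Return the timestamp of every ZServer startup line in the log."""
    starts = []
    for line in log_text.splitlines():
        if START_MARKER in line:
            match = TIMESTAMP.match(line)
            if match:
                starts.append(
                    datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
                )
    return starts


def restart_intervals(starts):
    """Return the seconds elapsed between consecutive startups."""
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]


SAMPLE = """\
2011-08-10T16:59:57 INFO ZServer HTTP server started at Wed Aug 10 16:59:57 2011
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
2011-08-10T17:00:22 INFO ZServer HTTP server started at Wed Aug 10 17:00:22 2011
"""

if __name__ == "__main__":
    print(restart_intervals(find_restarts(SAMPLE)))
```

A steady stream of short intervals in the real log would suggest that something is killing and relaunching the process, which is the case where running in the foreground is most likely to show the actual error.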
