Hazelcast 3.6 - java.io.IOException: No available connection to address

I am using a Hazelcast 3.6 cluster consisting of 2 nodes.
My client configuration is:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getGroupConfig().setName("dev").setPassword("dev-pass");
String[] list = hazelcastServerList.toString().split(" ");
clientConfig.getNetworkConfig().addAddress(list);
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
clientConfig.getNetworkConfig().setSmartRouting(true);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
Sometimes I get this error:
java.io.IOException: No available connection to address Address[{node1_address}]:5701
I wonder:
Why does it happen?
Why does the client not fail over to the second node? That is the whole purpose of a cluster, isn't it?
I don't know whether it is related, but the addresses of the Hazelcast servers are only reachable behind a VPN and resolve to private IPs.
The member config is:
<!--
Copyright (C) 2012.
Olaf Bergner.
Hamburg, Germany. olaf.bergner#gmx.de
All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language
governing permissions and limitations under the License.
-->
<hazelcast
xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
id="hazelcast-server.defaultInstance">
<properties>
<property
name="hazelcast.logging.type">slf4j</property>
<property
name="hazelcast.version.check.enabled">false</property>
<property
name="hazelcast.mancenter.enabled">false</property>
<property
name="hazelcast.memcache.enabled">true</property>
<property
name="hazelcast.rest.enabled">true</property>
<property
name="hazelcast.log.state">true</property>
<property
name="hazelcast.jmx">true</property>
<property
name="hazelcast.jmx.detailed">true</property>
<property
name="hazelcast.executor.client.thread.count">100</property>
</properties>
<group>
<name>dev</name>
<password>dev-pass</password>
</group>
<management-center
enabled="false">http://localhost:8080/mancenter</management-center>
<network>
<port
auto-increment="true">5701</port>
<join>
<multicast
enabled="false">
<multicast-group>IP</multicast-group>
<multicast-port>54327</multicast-port>
<multicast-timeout-seconds>3</multicast-timeout-seconds>
</multicast>
<tcp-ip connection-timeout-seconds="60"
enabled="true">
<!-- <connection-timeout-seconds>60</connection-timeout-seconds> -->
<interface>hostname1:5701</interface>
<interface>hostname2:5701</interface>
</tcp-ip>
</join>
<interfaces
enabled="false">
<interface>10.10.1.*</interface>
</interfaces>
<ssl
enabled="false" />
<socket-interceptor
enabled="false" />
</network>
<partition-group
enabled="false" />
<executor-service name="exec">
<pool-size>16</pool-size>
<!--Queue capacity. 0 means Integer.MAX_VALUE.-->
<queue-capacity>0</queue-capacity>
<statistics-enabled>true</statistics-enabled>
<!-- <core-pool-size>50</core-pool-size>
<max-pool-size>200</max-pool-size>
<keep-alive-seconds>60</keep-alive-seconds> -->
</executor-service>
<map name="default">
<!--
Number of backups. If 1 is set as the backup-count for example, then all entries of
the map will be copied to another JVM for fail-safety. 0 means no backup.
-->
<backup-count>1</backup-count>
<!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
<time-to-live-seconds>86400</time-to-live-seconds>
<!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map. Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
<max-idle-seconds>86400</max-idle-seconds>
<!--
Valid values are:
NONE (no eviction),
LRU (Least Recently Used),
LFU (Least Frequently Used).
NONE is the default.
-->
<eviction-policy>LFU</eviction-policy>
<!--
Maximum size of the map. When max size is reached,
map is evicted based on the policy defined.
Any integer between 0 and Integer.MAX_VALUE. 0 means
Integer.MAX_VALUE. Default is 0.
-->
<max-size policy="PER_NODE">100000</max-size>
<!--
When max. size is reached, specified percentage of
the map will be evicted. Any integer between 0 and 100.
If 25 is set for example, 25% of the entries will
get evicted.
-->
<eviction-percentage>15</eviction-percentage>
<!--
Minimum time in milliseconds which should pass before checking
if a partition of this map is evictable or not.
Default value is 100 millis.
-->
<min-eviction-check-millis>100</min-eviction-check-millis>
<!--
While recovering from split-brain (network partitioning),
map entries in the small cluster will merge into the bigger cluster
based on the policy set here. When an entry merge into the
cluster, there might an existing entry with the same key already.
Values of these entries might be different for that same key.
Which value should be set for the key? Conflict is resolved by
the policy set here. Default policy is PutIfAbsentMapMergePolicy
There are built-in merge policies such as
com.hazelcast.map.merge.PassThroughMergePolicy; entry will be
overwritten if merging entry exists for the key.
com.hazelcast.map.merge.PutIfAbsentMapMergePolicy ; entry will be added if the merging entry doesn't exist in the cluster.
com.hazelcast.map.merge.HigherHitsMapMergePolicy ; entry with the higher hits wins.
com.hazelcast.map.merge.LatestUpdateMapMergePolicy ; entry with the latest update wins.
-->
<merge-policy>com.hazelcast.map.merge.LatestUpdateMapMergePolicy</merge-policy>
</map>
<map name="local">
<!--
Number of backups. If 1 is set as the backup-count for example,
then all entries of the map will be copied to another JVM for
fail-safety. Valid numbers are 0 (no backup), 1, 2, 3.
-->
<backup-count>1</backup-count>
<!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite.
Default is 0.
-->
<time-to-live-seconds>86400</time-to-live-seconds>
<!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map.
Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE.
0 means infinite. Default is 0.
-->
<max-idle-seconds>86400</max-idle-seconds>
<!--
Valid values are:
NONE (no extra eviction, <time-to-live-seconds> may still apply),
LRU (Least Recently Used),
LFU (Least Frequently Used).
NONE is the default.
Regardless of the eviction policy used, <time-to-live-seconds> will still apply.
-->
<eviction-policy>LRU</eviction-policy>
<!--
Maximum size of the map. When max size is reached,
map is evicted based on the policy defined.
Any integer between 0 and Integer.MAX_VALUE. 0 means
Integer.MAX_VALUE. Default is 0.
-->
<!-- <max-size policy="cluster_wide_map_size">0</max-size> -->
<max-size policy="PER_NODE">100000</max-size>
<!--
When max. size is reached, specified percentage of
the map will be evicted. Any integer between 0 and 100.
If 25 is set for example, 25% of the entries will
get evicted.
-->
<eviction-percentage>15</eviction-percentage>
<!--
Specifies when eviction will be started. Default value is 3.
So every 3 (+up to 5 for performance reasons) seconds
eviction will be kicked of. Eviction is costly operation, setting
this number too low, can decrease the performance. -->
<!--
Minimum time in milliseconds which should pass before checking
if a partition of this map is evictable or not.
Default value is 100 millis.
-->
<min-eviction-check-millis>100</min-eviction-check-millis>
<!--
While recovering from split-brain (network partitioning),
map entries in the small cluster will merge into the bigger cluster
based on the policy set here. When an entry merge into the
cluster, there might an existing entry with the same key already.
Values of these entries might be different for that same key.
Which value should be set for the key? Conflict is resolved by
the policy set here. Default policy is PutIfAbsentMapMergePolicy
There are built-in merge policies such as
com.hazelcast.map.merge.PassThroughMergePolicy; entry will be
overwritten if merging entry exists for the key.
com.hazelcast.map.merge.PutIfAbsentMapMergePolicy ; entry will be
added if the merging entry doesn't exist in the cluster.
com.hazelcast.map.merge.HigherHitsMapMergePolicy ; entry with the
higher hits wins.
com.hazelcast.map.merge.LatestUpdateMapMergePolicy ; entry with the
latest update wins.
-->
<merge-policy>com.hazelcast.map.merge.LatestUpdateMapMergePolicy</merge-policy>
</map>
</hazelcast>

In your member config, change the TCP/IP joiner configuration to list the nodes as <member> elements instead of <interface> elements:
<tcp-ip connection-timeout-seconds="60" enabled="true">
<!--connection-timeout-seconds>60</connection-timeout-seconds -->
<member>hostname1:5701</member>
<member>hostname2:5701</member>
</tcp-ip>
In this case, the client config should look like this:
ClientConfig clientConfig = new ClientConfig();
// these are the default values, so setting them explicitly is not necessary
clientConfig.getGroupConfig().setName("dev").setPassword("dev-pass");
String hazelcastServerList = "hostname1:5701 hostname2:5701";
String[] list = hazelcastServerList.split(" ");
clientConfig.getNetworkConfig().addAddress(list);
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
// enabled by default
clientConfig.getNetworkConfig().setSmartRouting(true);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
P.S. For the best performance, the client and the members should be on the same local network. To understand the different ways to configure which network interfaces Hazelcast uses and listens on, please consult the documentation.
Best,
Vik
P.P.S. If you have any questions, write them in the comments below.
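Since the client in the question sits behind a VPN, it can help to check that every configured member address is actually reachable from the client machine before blaming failover. Below is a minimal sketch using only the JDK, not the Hazelcast API. The class name MemberReachability, the 2000 ms timeout, and the fallback to port 5701 for entries without a port are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class MemberReachability {

    // Assumption of this sketch: entries without an explicit port use 5701.
    public static final int DEFAULT_PORT = 5701;

    /** Splits a whitespace-separated "host:port" list into unresolved socket addresses. */
    public static List<InetSocketAddress> parse(String serverList) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : serverList.trim().split("\\s+")) {
            if (entry.isEmpty()) {
                continue; // happens when the whole list is blank
            }
            int colon = entry.lastIndexOf(':');
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0 ? DEFAULT_PORT : Integer.parseInt(entry.substring(colon + 1));
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /** Returns true if a TCP connection to the address succeeds within timeoutMillis. */
    public static boolean isReachable(InetSocketAddress address, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Resolve the host here, so DNS failures also count as "not reachable".
            socket.connect(new InetSocketAddress(address.getHostString(), address.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default list; pass your own list as the first argument.
        String serverList = args.length > 0 ? args[0] : "hostname1:5701 hostname2:5701";
        for (InetSocketAddress address : parse(serverList)) {
            System.out.println(address.getHostString() + ":" + address.getPort()
                    + " reachable=" + isReachable(address, 2000));
        }
    }
}
```

If one of the two members shows up as not reachable from the client host, the problem is on the network or DNS side rather than in the client configuration.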

Related

'vary' in APIM caching policies

I am learning APIM policies.
In the caching policies there are many <vary-by-...> elements.
The Microsoft docs mention that responses are cached by, for example, developer or query parameter.
What does that mean exactly?
Does it have any relation to refreshing cached values?
I guess you are referring to https://learn.microsoft.com/en-us/azure/api-management/api-management-caching-policies#GetFromCache
<cache-lookup vary-by-developer="true | false" vary-by-developer-groups="true | false" caching-type="prefer-external | external | internal" downstream-caching-type="none | private | public" must-revalidate="true | false" allow-private-response-caching="@(expression to evaluate)">
<vary-by-header>Accept</vary-by-header>
<!-- should be present in most cases -->
<vary-by-header>Accept-Charset</vary-by-header>
<!-- should be present in most cases -->
<vary-by-header>Authorization</vary-by-header>
<!-- should be present when allow-private-response-caching is "true"-->
<vary-by-header>header name</vary-by-header>
<!-- optional, can be repeated several times -->
<vary-by-query-parameter>parameter name</vary-by-query-parameter>
<!-- optional, can be repeated several times -->
</cache-lookup>
By default, the cache lookup uses only the URL path as the reference (the cache key) for finding the cached item.
If you also want the lookup to vary by a certain header or query parameter, which extends the cache key, use <vary-by-header> or <vary-by-query-parameter>. Responses for the same URL path are then cached separately for each value of those headers and/or query parameters.
vary-by-developer and vary-by-developer-groups extend the cache key with the user, or the groups of the user, assigned to the subscription key used, so that responses for the same URL path are cached according to who is calling the API operation.
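To make the idea of "extending the cache key" concrete, here is a small sketch in plain Java. It is only an illustration of the concept: the key format, the class name VaryByCacheKey, and the sample path and parameter names are invented, and this is not how API Management builds its keys internally.

```java
import java.util.List;
import java.util.Map;

public class VaryByCacheKey {

    /**
     * Builds an illustrative cache key: the URL path, extended by the values
     * of the selected headers and query parameters. Names are matched
     * case-sensitively here to keep the sketch short.
     */
    public static String cacheKey(String path,
                                  Map<String, String> headers,
                                  Map<String, String> query,
                                  List<String> varyByHeaders,
                                  List<String> varyByQueryParameters) {
        StringBuilder key = new StringBuilder(path);
        for (String name : varyByHeaders) {
            key.append("|h:").append(name).append('=').append(headers.getOrDefault(name, ""));
        }
        for (String name : varyByQueryParameters) {
            key.append("|q:").append(name).append('=').append(query.getOrDefault(name, ""));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        List<String> varyByHeaders = List.of("Accept");
        List<String> varyByQuery = List.of("version");

        String json = cacheKey("/orders", Map.of("Accept", "application/json"),
                Map.of("version", "1"), varyByHeaders, varyByQuery);
        String xml = cacheKey("/orders", Map.of("Accept", "application/xml"),
                Map.of("version", "1"), varyByHeaders, varyByQuery);

        // Same path, different Accept header: two distinct keys, so two cached responses.
        System.out.println(json);
        System.out.println(xml);
    }
}
```

With empty vary-by lists both requests would map to the key "/orders" and share one cached response, which is the default behaviour described above.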

Hazelcast map max-size configuration without eviction policy

In a Hazelcast map configuration, suppose we set eviction-policy to NONE and use max-idle-seconds and time-to-live-seconds as below:
<map name="simpleMap">
<backup-count>0</backup-count>
<max-idle-seconds>360</max-idle-seconds>
<time-to-live-seconds>30</time-to-live-seconds>
<eviction-policy>NONE</eviction-policy>
<max-size>3000</max-size>
<eviction-percentage>30</eviction-percentage>
<merge-policy>com.hazelcast.map.merge.PutIfAbsentMapMergePolicy</merge-policy>
</map>
Can someone explain whether max-size will work in this case?
Configuring max-size with no eviction policy is not a valid configuration. Please check the description here.
If you want max-size to work, set the eviction-policy to a value other than NONE.
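As an analogy for why a size limit needs an eviction policy: once the limit is reached, something has to decide which entry goes. The sketch below uses the JDK's LinkedHashMap, not Hazelcast, to show a map whose maximum size only takes effect because an LRU rule picks the victim. The class name BoundedLruMap is made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    public BoundedLruMap(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // The "policy": when the limit is exceeded, drop the least recently used entry.
        return size() > maxSize;
    }

    public static void main(String[] args) {
        BoundedLruMap<String, Integer> map = new BoundedLruMap<>(2);
        map.put("a", 1);
        map.put("b", 2);
        map.get("a");    // touch "a", so "b" becomes the least recently used entry
        map.put("c", 3); // exceeds the limit, so "b" is evicted
        System.out.println(map.keySet()); // prints [a, c]
    }
}
```

Without the removeEldestEntry override the map would simply keep growing, which is the situation a max-size setting is in when the eviction policy is NONE.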

What is the purpose of "EnableSubscriptionPartitioning" property in an Azure Service Bus Topic?

When it comes to partitioning in an Azure Service Bus Topic, there are two properties: EnablePartitioning and EnableSubscriptionPartitioning.
It is very clear to me what EnablePartitioning property does. Based on my understanding of this property, essentially when this property is set to true, the topic in question will be partitioned across multiple message brokers.
What I am not able to find is any concrete information on EnableSubscriptionPartitioning property. The documentation I looked at simply describes this property as:
Value that indicates whether partitioning is enabled or disabled.
Furthermore, when I create a topic with this property set to true (and the EnablePartitioning property set to false), the topic is created with a size of 118784 MB (MaxSizeInMegabytes property). Here's the response XML I get when I fetch the topic's properties.
<entry xml:base="https://namespace.servicebus.windows.net/$Resources/topics?api-version=2016-07">
<id>https://namespace.servicebus.windows.net/gauravtesttopic?api-version=2016-07</id>
<title type="text">gauravtesttopic</title>
<published>2017-08-18T02:00:12Z</published>
<updated>2017-08-18T02:00:18Z</updated>
<author><name>namespace</name></author>
<link rel="self" href="../gauravtesttopic?api-version=2016-07"/>
<content type="application/xml">
<TopicDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<DefaultMessageTimeToLive>P10675199DT2H48M5.4775807S</DefaultMessageTimeToLive>
<MaxSizeInMegabytes>118784</MaxSizeInMegabytes>
<RequiresDuplicateDetection>false</RequiresDuplicateDetection>
<DuplicateDetectionHistoryTimeWindow>PT10M</DuplicateDetectionHistoryTimeWindow>
<EnableBatchedOperations>true</EnableBatchedOperations>
<SizeInBytes>0</SizeInBytes>
<FilteringMessagesBeforePublishing>false</FilteringMessagesBeforePublishing>
<IsAnonymousAccessible>false</IsAnonymousAccessible>
<AuthorizationRules></AuthorizationRules>
<Status>Active</Status>
<CreatedAt>2017-08-18T02:00:11.5270915Z</CreatedAt>
<UpdatedAt>2017-08-18T02:00:18.087Z</UpdatedAt>
<AccessedAt>0001-01-01T00:00:00Z</AccessedAt>
<SupportOrdering>true</SupportOrdering>
<CountDetails xmlns:d2p1="http://schemas.microsoft.com/netservices/2011/06/servicebus">
<d2p1:ActiveMessageCount>0</d2p1:ActiveMessageCount>
<d2p1:DeadLetterMessageCount>0</d2p1:DeadLetterMessageCount>
<d2p1:ScheduledMessageCount>0</d2p1:ScheduledMessageCount>
<d2p1:TransferMessageCount>0</d2p1:TransferMessageCount>
<d2p1:TransferDeadLetterMessageCount>0</d2p1:TransferDeadLetterMessageCount>
</CountDetails>
<SubscriptionCount>0</SubscriptionCount>
<AutoDeleteOnIdle>P10675199DT2H48M5.4775807S</AutoDeleteOnIdle>
<EnablePartitioning>false</EnablePartitioning>
<IsExpress>false</IsExpress>
<EntityAvailabilityStatus>Available</EntityAvailabilityStatus>
<EnableSubscriptionPartitioning>true</EnableSubscriptionPartitioning>
<EnableExpress>false</EnableExpress>
</TopicDescription>
</content>
</entry>
The problem I run into is that when I try to update the topic, I get an error message from the service complaining about an invalid size. Because the topic is not partitioned, the size should be one of the following: 1GB, 2GB, 3GB, 4GB or 5GB.
Any insights into this would be highly appreciated.

Hazelcast using HD

I am trying to test Hazelcast HD with the following configuration:
<map name="testMap">
<!-- <in-memory-format>BINARY</in-memory-format> -->
<in-memory-format>NATIVE</in-memory-format>
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<read-backup-data>false</read-backup-data>
</map>
<native-memory allocator-type="POOLED" enabled="true">
<size unit="GIGABYTES" value="150"/>
</native-memory>
I have no idea where the data is stored. I checked with Management Center and found that max native memory is 30G, but used is always 0.
Log from node below:
INFO: [192.168.129.155]:5701 [dev] [3.5.1] processors=4, physical.memory.total=38.4G, physical.memory.free=2.3G, swap.space.total=1024.0M, swap.space.free=997.4M, heap.memory.used=261.6M, heap.memory.free=205.4M, heap.memory.total=467.0M, heap.memory.max=8.5G, heap.memory.used/total=56.01%, heap.memory.used/max=3.00%, native.memory.used=0, native.memory.free=30.6G, native.memory.total=0, native.memory.max=30.6G, minor.gc.count=324, minor.gc.time=3225ms, major.gc.count=1, major.gc.time=74ms, load.process=100.00%, load.system=100.00%, load.systemAverage=0.40, thread.count=57, thread.peakCount=61, cluster.timeDiff=3, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=0, operations.running.size=0, operations.pending.invocations.count=0, operations.pending.invocations.percentage=0.00%, proxy.count=2, clientEndpoint.count=1, connection.active.count=13, client.connection.count=1, connection.count=9
Heap memory does not increase, and non-heap and native memory do not change either. Where is the data stored?
Am I missing something?
Update: I am using Hazelcast 3.5 and Management Center 3.5; both are licensed versions.
You should use hazelcast and management center version 3.8.1.
The latest version of management center shows memory consumption and entries cost for binary and native memory.
Thanks

Hazelcast Cluster members going out of memory due to huge number of "IsStillRunningService" objects

We have a system that uses the Hazelcast IExecutorService and IMap on version 3.5. We recently saw Hazelcast cluster members run out of memory in production, one after the other, until all nodes had crashed with an OOM.
During the causal analysis, we found thousands of log entries like the ones below, and the log file size grew exponentially. The storage volume that holds the logs had also run out of space.
WARNING: [10.7.90.189]:30103 [FB] [3.5] Asking if operation execution has been started: com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$InvokeIsStillRunningOperationRunnable@48b3ac3b
Mar 30, 2016 11:09:29 AM com.hazelcast.spi.impl.operationservice.impl.Invocation
WARNING: [10.7.90.189]:30103 [FB] [3.5] While asking 'is-executing': Invocation{ serviceName='hz:core:partitionService', op=com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation{serviceName='hz:core:partitionService', partitionId=-1, callId=59834, invocationTime=1459349279980, waitTimeout=-1, callTimeout=5000}, partitionId=-1, replicaIndex=0, tryCount=0, tryPauseMillis=0, invokeCount=1, callTimeout=5000, target=Address[1.2.3.4]:30102, backupsExpected=0, backupsCompleted=0}
com.hazelcast.core.OperationTimeoutException: No response for 10000 ms. Aborting invocation! Invocation{ serviceName='hz:core:partitionService', op=com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation{serviceName='hz:core:partitionService', partitionId=-1, callId=268177, invocationTime=1459349295209, waitTimeout=-1, callTimeout=5000}, partitionId=-1, replicaIndex=0, tryCount=0, tryPauseMillis=0, invokeCount=1, callTimeout=5000, target=Address[10.7.90.190]:30102, backupsExpected=0, backupsCompleted=0} No response has been received! backups-expected:0 backups-completed: 0
at com.hazelcast.spi.impl.operationservice.impl.Invocation.newOperationTimeoutException(Invocation.java:491)
at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$IsOperationStillRunningCallback.setOperationTimeout(IsStillRunningService.java:224)
at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$IsOperationStillRunningCallback.onFailure(IsStillRunningService.java:219)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture$1.run(InvocationFuture.java:137)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)
I understand that cluster members keep sending heartbeats to make sure all members are alive, and I believe the default interval is 10 seconds. The problem is that if any member becomes unresponsive or hangs, the rest of the members keep making is-executing calls. The heap dump showed that more than 73% of the heap was filled with "IsStillRunningService" objects.
Questions:
How can we find out what exactly went wrong?
Was running out of storage space just a coincidence, or could the two be correlated? We suspect that one might have led to the other, since it happened twice within a week.
Hazelcast XML Configuration:
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.5.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<map name="myMap">
<backup-count>0</backup-count>
<time-to-live-seconds>43200</time-to-live-seconds>
<eviction-policy>LRU</eviction-policy>
<max-size policy="USED_HEAP_PERCENTAGE">75</max-size>
<eviction-percentage>10</eviction-percentage>
<in-memory-format>OBJECT</in-memory-format>
</map>
<executor-service name="calculation">
<pool-size>10</pool-size>
<queue-capacity>400</queue-capacity>
</executor-service>
<executor-service name="loader">
<pool-size>5</pool-size>
<queue-capacity>400</queue-capacity>
</executor-service>
<properties>
<property name="hazelcast.icmp.timeout">5000</property>
<property name="hazelcast.initial.wait.seconds">10</property>
<property name="hazelcast.connection.monitor.interval">5000</property>
</properties>
<network>
<port auto-increment="true" port-count="100">30101</port>
<join>
<multicast enabled="false">
<multicast-group>224.2.2.3</multicast-group>
<multicast-port>54327</multicast-port>
</multicast>
<tcp-ip enabled="true">
<interface>1.2.3.4</interface>
<interface>1.2.3.5</interface>
<interface>1.2.3.6</interface>
</tcp-ip>
<aws enabled="false"/>
</join>
<interfaces enabled="false">
<interface>127.0.0.1</interface>
</interfaces>
</network>
</hazelcast>
StackTrace
LinkedBlockingQueue which holds IsStillRunningService Objects
Can you upgrade to 3.6? Fixes were added there to prevent running into an OOME through is-still-running. In 3.7 the whole mechanism is going to be removed and replaced by a less problematic approach.
https://github.com/hazelcast/hazelcast/pull/7719
