I am working on a project composed of multiple Vert.x microservices, where each service runs in a different container on the OpenShift platform. The event bus is used for communication between services.
Sometimes when a request is made via the event bus there is no response, and it fails with the errors below:
[vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=ea60ebe0-1d81-4041-80d5-79cbe1d2a11c Not connected to server
[vert.x-eventloop-thread-2] WARN io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=97441ebe-8ce9-42b2-996d-35455a5b32f2 Connecting to server 65c9ab20-43f8-4c59-8455-ecca376b71ac failed
Whenever this happens, I can see the error below on the destination server to which the request was made:
message=WARNING: [192.168.33.42]:5701 [cdart] [5.0.3] Resetting local member UUID. Previous: 65c9ab20-43f8-4c59-8455-ecca376b71ac, new: 8dd74cdf-e4c4-443f-a38e-3f6c36721795
Could this be because the reset event raised by Hazelcast is not handled in Vert.x?
Vert.x 4.3.5 is used in this project.
This is a known issue that will be fixed in the forthcoming 4.4 release.
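Until 4.4 is available, one possible stopgap, sketched here under the assumption that the UUID reset is caused by a split-brain merge, is to listen for Hazelcast's MERGED lifecycle event and recycle the clustered Vert.x instance so its event-bus registrations are re-created under the new member UUID (illustrative only, not the official fix):
import com.hazelcast.core.LifecycleEvent.LifecycleState;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class RestartOnMerge {

    public static void main(String[] args) {
        start();
    }

    static void start() {
        HazelcastClusterManager mgr = new HazelcastClusterManager();

        Vertx.clusteredVertx(new VertxOptions().setClusterManager(mgr))
            .onSuccess(vertx ->
                mgr.getHazelcastInstance().getLifecycleService().addLifecycleListener(event -> {
                    // MERGED fires after a network partition heals, which is when
                    // Hazelcast resets the local member UUID. Recycling Vert.x
                    // re-registers clustered event-bus handlers under the new UUID.
                    if (event.getState() == LifecycleState.MERGED) {
                        vertx.close().onComplete(v -> start());
                    }
                }));
    }
}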
I get the error below when I consume a message in a JBoss EJB container:
2022-05-18 22:37:24,699 ERROR [org.jboss.resource.adapter.jms.inflow.JmsActivation] Unable to reconnect org.jboss.resource.adapter.jms.inflow.JmsActivationSpec#8315a9(ra=org.jboss.resource.adapter.jms.JmsResourceAdapter#10a62f0 destination=queues/Subscriber.global.globalvirtual.e2e.cmdm.changepub.Virtual destinationType=javax.jms.Queue tx=true durable=false reconnect=10 provider=java:/ACTIVEMQJMSNONXAProviderAsync user=ldcp.cmdm pass= )
javax.jms.JMSException: Please enable transactions on PulsarConnectionFactory with enableTransaction=true
After getting this error, I set enableTransaction=true and started getting a different error:
javax.jms.JMSException: not supported
at com.datastax.oss.pulsar.jms.PulsarConnection.createConnectionConsumer(PulsarConnection.java:685)
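For reference, enableTransaction is a configuration key on PulsarConnectionFactory; a minimal illustrative way to set it programmatically (the broker URL is hypothetical, and the broker itself must also have transactionCoordinatorEnabled=true) looks like this:
import com.datastax.oss.pulsar.jms.PulsarConnectionFactory;
import java.util.HashMap;
import java.util.Map;

Map<String, Object> config = new HashMap<>();
config.put("webServiceUrl", "http://pulsar-broker:8080"); // hypothetical broker URL
config.put("enableTransaction", true);
PulsarConnectionFactory factory = new PulsarConnectionFactory(config);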
The error is in createConnectionConsumer, which is the method called by the JBoss container. This method is currently not implemented:
https://github.com/datastax/pulsar-jms/blob/96606d0f22a1af8fdde0c5eff0f5dde086d9862c/pulsar-jms/src/main/java/com/datastax/oss/pulsar/jms/PulsarConnection.java#L685
The implementation should be quite straightforward.
There are integration tests for Payara and Apache TomEE, but JBoss is not covered yet.
Please open an issue on GitHub.
I am reading events from an Azure EventHub cluster synchronously via the receiveFromPartition method on the EventHubConsumerClient class.
I create the client once like so:
EventHubConsumerClient eventHubConsumerClient = new EventHubClientBuilder()
        .connectionString(eventHubConnectionString)
        .consumerGroup(consumerGroup)
        .buildConsumerClient();
I then just use a ScheduledExecutorService to retrieve events every 1.5s via:
IterableStream<PartitionEvent> receivedEvents = eventHubConsumerClient.receiveFromPartition(
        partitionId, 1, eventPosition);
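Put together, the polling loop looks roughly like this (a sketch: processEvent is a hypothetical handler, and partitionId/eventPosition are defined as above):
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    // Pull at most one event from the partition on each tick.
    IterableStream<PartitionEvent> receivedEvents = eventHubConsumerClient.receiveFromPartition(
            partitionId, 1, eventPosition);
    receivedEvents.forEach(event -> processEvent(event)); // placeholder handler
}, 0, 1500, TimeUnit.MILLISECONDS);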
The equivalent logic in V3 of the SDK worked fine (using PartitionReceivers), but now I am seeing OOMs in my JVM.
Running a profiler against a local version of the logic, I see the majority of the heap (90%, mainly in the old generation) being taken up by byte[]s, referenced by org.apache.qpid.proton.codec.CompositeReadableBuffer. This pattern is not present when I profile the V3 logic.
What could be causing a leak of the AMQP messages here? Do I need to interact with the SDK further, for example to close a connection that I'm not aware of after each call?
Any advice would be very appreciated, thanks!
Turns out it was a bug, solved here: https://github.com/Azure/azure-sdk-for-java/issues/13775
Regarding Kafka/ZooKeeper security using DIGEST-MD5 authentication: I am trying to rotate/change the credentials/password in both the server (ZooKeeper) and client (Kafka) JAAS config files.
We have a cluster of 3 ZooKeeper nodes and 3 Kafka broker nodes with the JAAS configuration files below.
kafka.conf
Client {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       username="super"
       password="password";
};
zookeeper.conf
Server {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_super="password";
};
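For context, each process loads its JAAS file at startup via the standard Java system property, e.g. on the brokers (paths illustrative):
-Djava.security.auth.login.config=/etc/kafka/kafka.conf
and on the ZooKeeper servers:
-Djava.security.auth.login.config=/etc/zookeeper/zookeeper.conf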
To rotate, we update the credential (password) and do a rolling restart of the ZooKeeper server instances; then, while updating the same credential/password for the super user on the Kafka client instances one at a time, we notice
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
these INFO-level messages in the server logs, which eventually results in an unclean shutdown and restart of the broker, impacting writes and reads for longer than expected. I have tried commenting out requireClientAuthScheme=sasl in the ZooKeeper zoo.cfg (https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication) to allow any client to authenticate to ZooKeeper, but with no success.
As an alternative approach, I tried updating the credential/password in the JAAS config file dynamically using sasl.jaas.config, and I get the same exception documented in this JIRA (reference: https://issues.apache.org/jira/browse/KAFKA-8010).
Does anyone have any suggestions? Thanks in advance.
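One pattern that may help (an illustration, not from the thread): ZooKeeper's DigestLoginModule accepts multiple user_<name> entries in the Server section, so a second super principal can be introduced with the new password, the Kafka clients migrated to it one at a time, and the old entry removed once no client uses it, avoiding any window where servers and clients disagree on a single shared password:
Server {
       org.apache.zookeeper.server.auth.DigestLoginModule required
       user_super="old-password"
       user_super2="new-password";
};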
We have an AWS server running some Node.js services. The services connecting to MS SQL are randomly crashing with the message "Failed to connect to databaseserver:1433 - Could not connect (sequence)".
We are running on:
App server:
Ubuntu Linux 14.04
AWS m5
Node.js 8.11.2
Services use the mssql package, latest version (4.3.0), which includes tedious 2.7.1.
DB server:
Windows Server 2012
SQL Server 2012
Throughput: about 300 rpm; the error also happens when throughput is lower (about 20 rpm).
The app runs in a cluster through PM2 (4 instances). We see the error happen on all 4 instances at the same time, but sometimes on only 1 or 2.
What we tried:
Upgrading to the alpha version of mssql with tedious 3.0.1. This did not make a difference.
Upgrading from an Amazon M4 machine to an M5 machine with enhanced networking.
Changing the pool settings in the app. We tried setting min connections to 0 or a low/high value, and max to a low/high value, but to no avail.
Duplicating the server to a new machine.
Setting idleTimeoutMillis to 1 second.
Pinging the DB server to see if there is a connection problem, but we see no unusual pings when the error happens.
Connection on app startup:
App.sqlConnection = new App.SQL.ConnectionPool(config, function (err) {
    if (err) {
        // Initial connection failed: log and stop this PM2 instance.
        Log.error(err);
        process.exit(1);
    }
    // Any later error emitted by the pool also brings the process down.
    App.sqlConnection.on('error', err => {
        Log.error(`There was a connection err : ${err}`);
        process.exit(1);
    });
});
A request:
var request = new App.SQL.Request(App.sqlConnection);
request.query(sQuery, function (err, results) {
    // err/results are handled here.
});
Errors are caught by the "on error" handler.
The error happens randomly across services; some have more instances of the error than others.
We are running out of options. Any idea how we can see more detailed errors?
I have a couple suggestions.
First, how sure are you that these errors are actually a problem? If your code simply retries, instead of exiting, are the connections stable afterwards, or can a connection drop in the middle of a query?
(Connections dropping in the middle of queries are obviously not good, but random failures on connection, that can be fixed by retries, are the best kind of problem to have IMHO.)
Ignoring the potential in-code fix: when you say you "duplicated the server to a new machine", did you launch a new AMI using the latest Windows Server 2012, or did you image and clone? If your database server is a couple of years old, you might actually be running outdated network drivers in your instance, which could give you some hiccups.
If you want to explore that, you could attempt rebuilding the entire database server from scratch on a newly launched AMI. Alternatively, you can upgrade the PV driver, network adapter, and EC2Config on your existing instance; you can find the instructions at the following links:
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/Upgrading_PV_drivers.html#aws-pv-upgrade
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/sriov-networking.html#enable-enhanced-networking
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_Install.html
We have implemented Hazelcast as an embedded cache in our Spring Boot app and need a way for Hazelcast members within a "cluster group" to discover each other dynamically, so that we don't have to provide every possible IP address/port where Hazelcast might be running.
We came across this Hazelcast plugin on GitHub:
https://github.com/hazelcast/hazelcast-eureka which seems to provide exactly this feature, using Eureka as the discovery/registration tool.
As described in the GitHub documentation, the hazelcast-eureka-one library is included in our boot app's classpath; we also disabled TCP-IP and multicast discovery and added the discovery strategy below in hazelcast.xml:
<discovery-strategies>
    <discovery-strategy class="com.hazelcast.eureka.one.EurekaOneDiscoveryStrategy" enabled="true">
        <properties>
            <property name="self-registration">true</property>
            <property name="namespace">hazelcast</property>
        </properties>
    </discovery-strategy>
</discovery-strategies>
Our application also provides a configured EurekaClient, which we autowire and inject into the plugin implementation:
Config hazelcastConfig = new FileSystemXmlConfig(hazelcastConfigFilePath);
EurekaOneDiscoveryStrategyFactory.setEurekaClient(eurekaClient);
hazelcastInstance = Hazelcast.newHazelcastInstance(hazelcastConfig);
Problem:
We are able to start 2 instances of our Spring Boot app on the same machine, and we notice that each app starts its embedded Hazelcast instance on a separate port (5701, 5702). But they do not seem to recognize each other as running within one cluster; this is what we see in the app logs when the 2nd instance is starting:
Members [1] {
Member [10.41.70.143]:5702 - 7c42eb24-3fa0-45cb-9394-17175cc92b9c this
}
17-12-13 12:22:44.480 WARN [main] c.h.i.Node.log(LoggingServiceImpl.java:168) - [10.41.70.143]:5702 [domain-services] [3.8.2] Config seed port is 5701 and cluster size is 1. Some of the ports seem occupied!
which seems to indicate that both Hazelcast instances are running independently and do not recognize the other running instance in a cluster/group.
Also, immediately after a restart we see this exception thrown frequently on both nodes:
java.lang.ClassCastException: com.hazelcast.nio.tcp.MemberWriteHandler cannot be cast to com.hazelcast.nio.ascii.TextWriteHandler
at com.hazelcast.nio.ascii.TextReadHandler.<init>(TextReadHandler.java:109) ~[hazelcast-3.8.2.jar:3.8.2]
at com.hazelcast.nio.tcp.SocketReaderInitializerImpl.init(SocketReaderInitializerImpl.java:89) ~[hazelcast-3.8.2.jar:3.8.2]
which seems to indicate an incompatibility between Hazelcast libraries on the classpath?
It seems like your Eureka service returns the wrong ports. Hazelcast tries to connect to 8080 and other ports in the same range, whereas Hazelcast itself listens on 5701. I'm not exactly sure why this happens, but it feels like you are requesting the wrong service name from Eureka, which ends up returning the HTTP (Tomcat?) ports instead of the separate Hazelcast service that should be registered.
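If the plugin version in use supports it, one hedged fix is to let the discovery strategy publish and resolve Hazelcast's own host and port through the Eureka instance metadata instead of the application's registration; the use-metadata-for-host-and-port property below comes from the hazelcast-eureka documentation and is an assumption about the deployed plugin version:
<discovery-strategies>
    <discovery-strategy class="com.hazelcast.eureka.one.EurekaOneDiscoveryStrategy" enabled="true">
        <properties>
            <property name="self-registration">true</property>
            <property name="namespace">hazelcast</property>
            <!-- Publish/resolve the Hazelcast port via Eureka metadata rather
                 than the application's HTTP port. -->
            <property name="use-metadata-for-host-and-port">true</property>
        </properties>
    </discovery-strategy>
</discovery-strategies>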