I installed Apache Pulsar in standalone mode. Pulsar sometimes times out. It is not related to high throughput, nor to a particular topic (see the log below). pulsar-admin brokers healthcheck sometimes returns OK and sometimes also times out. How can I investigate this?
10:46:46.365 [pulsar-ordered-OrderedExecutor-7-0] WARN org.apache.pulsar.broker.service.BrokerService - Got exception when reading persistence policy for persistent://nnx/agent_ns/action_up-53da8177-b4b9-4b92-8f75-efe94dc2309d: null
java.util.concurrent.TimeoutException: null
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_232]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_232]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.5.0.jar:2.5.0]
at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$32(BrokerService.java:922) ~[org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0]
at org.apache.bookkeeper.mledger.util.SafeRun$2.safeRun(SafeRun.java:49) [org.apache.pulsar-managed-ledger-2.5.0.jar:2.5.0]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
I am glad you were able to resolve the issue by adding more cores. The issue was a connection timeout while trying to access some topic metadata that is stored inside ZooKeeper, as indicated by the following line in the stack trace:
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.5.0.jar:2.5.0]
Increasing the cores must have freed up enough threads to allow the ZooKeeper node to respond to this request.
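If the timeouts come back, one quick way to check whether ZooKeeper itself is the slow component is its built-in four-letter-word commands. This is a sketch assuming ZooKeeper listens on the default port 2181 and that these commands are enabled on your version:
echo ruok | nc localhost 2181   # replies "imok" if the server is up and responding
echo stat | nc localhost 2181   # reports min/avg/max request latency and outstanding requests
If the reported latencies are high at the same time the broker logs these TimeoutExceptions, the bottleneck is on the ZooKeeper side (CPU starvation, disk, or GC) rather than in Pulsar itself.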
This looks like a connection issue, so check the connection to the server. If you are using TLS, check the certificate file path and make sure you have the right certificate.
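If TLS is enabled, a hedged way to verify the connection and certificate chain from the client side is openssl; the port and CA path below are assumptions (6651 is Pulsar's default TLS binary port):
openssl s_client -connect localhost:6651 -CAfile /path/to/ca.cert.pem
Look for "Verify return code: 0 (ok)" in the output.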
The problem is that there aren't many solutions for Apache Pulsar on the internet yet, but following the Apache Pulsar documentation might help, and there are also the Apache Pulsar GitHub repository and sample projects.
Related
How long do the threads take to stop and exit for ActiveMQConsumer? I get a segmentation fault on closing my application, which I figured out was due to the ActiveMQ threads; if I comment out the consumer, the issue is no longer present. Currently I am using cms::MessageConsumer in activemq-cpp-library-3.9.4.
I see that activemq::core::ActiveMQConsumer has an isClosed() function that I can use to confirm that the consumer is closed, and then move forward with deleting the objects, thereby avoiding the segmentation fault. I am assuming this will solve my issue, but I wanted to know: what is the correct approach with these ActiveMQ objects to avoid issues with threads?
I was using the same session for both the consumer and the producer, but when the broker was stopped and started, the ActiveMQ reconnect kept adding threads. I am not using failover.
So I have separated the sessions for sending and receiving, and have instantiated a connection factory, connection, and session for each separately. This design had no issues until the application's memory stopped getting cleaned up because of the segmentation fault above.
That's why I wanted to know when I should use cms::MessageConsumer vs ActiveMQConsumer.
The ActiveMQ website has documentation with examples for the CMS client. I'd suggest reading those and following how the example code shuts down the connection and the library resources prior to application shutdown, to ensure that resources are cleaned up appropriately.
As with JMS, the CMS consumer instance is linked to a thread in the session that created it, so when shutting down, a good rule to follow is to close the session first, to ensure that message deliveries are stopped before you delete any consumer instances.
I have a 3-node Cassandra cluster (replication factor 2) with Solr installed; each node has RHEL, 32 GB RAM, a 1 TB HDD, and DSE 4.8.3. There are lots of writes happening on my nodes, and my web application also reads from them.
I have observed that all the nodes go down every 3-4 days. I have to restart every node, and then they function quite well until the next 3-4 days, when the same problem repeats. I checked the server logs, but they do not show any error even when the server goes down. I am unable to figure out why this is happening.
In my application, sometimes when I connect to the nodes through the C# Cassandra driver, I get the following error
Cassandra.NoHostAvailableException: None of the hosts tried for query are available (tried: 'node-ip':9042)
at Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout)
at Cassandra.Tasks.TaskHelper.WaitToComplete[T](Task`1 task, Int32 timeout)
at Cassandra.ControlConnection.Init()
at Cassandra.Cluster.Init()
But when I check OpsCenter, none of the nodes are down; the status of every node looks perfectly fine. Could this be a problem with the driver? Earlier I was using Cassandra C# driver version 2.5.0 installed from NuGet, but I have since updated it to version 3.0.3 and the error still persists.
Any help on this would be appreciated. Thanks in advance.
If you haven't done so already, you may want to look at setting your logging level to DEBUG by running the following on all your nodes: nodetool -h 192.168.XXX.XXX setlogginglevel org.apache.cassandra DEBUG
Your first issue is most likely an OutOfMemoryError.
For your second issue, the problem is most likely that you have really long GC pauses. Tailing /var/log/cassandra/debug.log or /var/log/cassandra/system.log may give you a hint but typically doesn't reveal the problem unless you are meticulously looking at the timestamps. The best way to troubleshoot this is to ensure you have GC logging enabled in your jvm.options config and then tail your gc logs taking note of the pause times:
grep 'Total time for which application threads were stopped:' /var/log/cassandra/gc.log.1 | less
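If GC logging is not enabled yet, these are the standard Java 8-era flags that ship (commented out) in Cassandra's GC settings; treat the log path and rotation sizes as defaults to adapt, not values verified for your install:
-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M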
The "Unexpected exception during request; channel = [....] java.io.IOException: Error while read (....): Connection reset by peer" error typically indicates inter-node timeouts, i.e. the coordinator times out waiting for a response from another node and sends a TCP RST packet to close the connection.
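A cheap way to corroborate inter-node timeouts is to check the dropped-message counters on each node; the Dropped section at the bottom of the output counts messages (MUTATION, READ, and so on) that were discarded because they waited past their timeout:
nodetool tpstats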
We are having issues with dropped connections in a node-kafka client. It appears that ZooKeeper is resetting connections on us: when I run a tcpdump and view it in Wireshark, I see connection resets (RST) all over the place.
The source is always one of our ZooKeeper servers and the destination is our Kafka consumer. It appears that our client handles these just fine in most situations; in fact, I'm not at all convinced this is the cause of our failures. But it does seem odd. I was hoping someone with more experience in how kafka-node, ZooKeeper, and Kafka interact could provide some explanation.
ADDING SOME DETAILS FROM LOG
So, I see a few things in the logs. First, there are a ton of the following:
2016-03-11 20:26:32,357 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#868] - Client attempting to establish new session at /10.196.2.106:59300
2016-03-11 20:26:32,358 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#822] - Connection request from old client /10.196.2.106:59296; will be dropped if server is in r-o mode
Then there are a whole bunch of these:
2016-03-12 03:40:49,041 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1527b11827bfcfe, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-03-12 03:40:49,042 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1007] - Closed socket connection for client /10.196.2.106:33197 which had sessionid 0x1527b11827bfcfe
The strange thing is that the IPs correlate to our Secor servers, so this is probably not related.
Other than that, I do not see anything really out of the ordinary.
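One way to tie the session ids in those log lines back to specific client hosts is ZooKeeper's cons four-letter command, which lists every open connection with its client IP and session id (assuming the command is enabled; replace the host with one of your ZooKeeper servers):
echo cons | nc <zookeeper-host> 2181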
I followed an excellent step-by-step tutorial for installing Kafka on Linux. Everything was working fine for me until I restarted Linux. After the restart, I get the following error when I try to consume a topic with kafka-console-consumer.sh.
$ ~/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic TutorialTopic --from-beginning
[2016-02-04 03:16:54,944] WARN [console-consumer-6966_bob-kafka-storm-1454577414492-8728ae43], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
Before I ran the kafka-console-consumer.sh script, I did push data to the Kafka queue using the kafka-console-producer.sh script. These steps worked without issue before restarting Linux.
I can fix the error by manually starting Kafka, but I would rather have Kafka start automatically.
What might cause Kafka to not start correctly after restart?
So I had a similar issue in our Hortonworks cluster earlier today. It seems like ZooKeeper was not starting correctly. I first tried kafka-console-producer and got the exception below:
kafka-console-producer.sh --broker-list=localhost:9093 --topic=some_topic < /tmp/sometextfile.txt
kafka.common.KafkaException: fetching topic metadata for topics
The solution for me was to restart the server that had stopped responding. Yours may be different, but play around with the console producer and see what errors you're getting.
I had this same issue today when running a consumer:
WARN [console-consumer-74006_my-machine.com-1519117041191-ea485191], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
It turned out to be a disk full issue on the machine. Once space was freed up, the issue was resolved.
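If you suspect the same, a quick check is free disk space plus the size of the Kafka and ZooKeeper data directories (the paths below are the /tmp defaults; adjust to your log.dirs):
df -h
du -sh /tmp/kafka-logs /tmp/zookeeper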
Deleting everything relating to ZooKeeper and Kafka from the /tmp folder worked for me. But my system is not production, so proceed with caution.
By default, Kafka stores all its information in the /tmp folder, and the /tmp folder gets deleted every time you restart. You can change where those files are placed by changing the log.dirs property in the config/server.properties file.
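A minimal sketch of the change, assuming you want the data under /var/lib (any path that survives reboots works); note that the bundled ZooKeeper has the same problem via dataDir in config/zookeeper.properties:
# config/server.properties
log.dirs=/var/lib/kafka-logs
# config/zookeeper.properties
dataDir=/var/lib/zookeeper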
For me, this happened when I started a consumer to fetch messages from the broker cluster, but the Kafka server was shut down, so it returned the message "no brokers found when trying to rebalance". When I restarted the Kafka server, the error disappeared.
For the producer and consumer to work, you have to start a broker. The broker is what mediates message passing from producer to consumer.
sh kafka/bin/kafka-server-start.sh kafka/config/server.properties
The broker runs on port 9092 by default.
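To get Kafka starting automatically after a reboot, one option is a systemd unit; this is a sketch with assumed paths (taken from the ~/kafka layout in the question) and an assumed zookeeper.service to order against, so adapt both to your install:
[Unit]
Description=Apache Kafka broker
After=network.target zookeeper.service

[Service]
ExecStart=/home/youruser/kafka/bin/kafka-server-start.sh /home/youruser/kafka/config/server.properties
ExecStop=/home/youruser/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
Save it as /etc/systemd/system/kafka.service, then run sudo systemctl enable --now kafka.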
I followed this article to try Event Hubs with Apache Storm, but when I run the Storm topology it receives events for a minute and then stops receiving. When I restart my program, it receives the remaining messages, but again, a minute into each run it can no longer receive from Event Hub. Please help me with the possible causes of this issue.
Should I change any configuration in Storm or ZooKeeper?
The above jar contains a fix for a known issue in the QPID JMS client, which is used by the Event Hub spout implementation. When the service sends an empty frame (a heartbeat to keep the connection alive), a decoding error occurs in the client, and that causes the client to stop processing commands. Details of this issue can be found here: https://issues.apache.org/jira/browse/QPID-6378