Azure Event Hub Apache Storm issue

I followed this article to try Event Hub with Apache Storm, but when I run the Storm topology it receives events for about a minute and then stops receiving. If I restart my program it receives the remaining messages, but every run stops receiving from Event Hub after about a minute. Please help me with the possible causes of this issue.
Should I change any configuration in Storm or ZooKeeper?

The above jar contains a fix for a known issue in the Qpid JMS client, which is used by the Event Hub spout implementation. When the service sends an empty frame (a heartbeat to keep the connection alive), a decoding error occurs in the client and causes it to stop processing commands. Details of this issue can be found here: https://issues.apache.org/jira/browse/QPID-6378

Related

Apache Pulsar Java client taking too much memory (OOM)

I wrote a simple Apache Pulsar client with Spring Boot: a Pulsar producer initialized as a bean and used in the REST controller to publish incoming API messages to Pulsar, and a consumer that consumes messages, prints some values to the console, and acknowledges them.
As of now the application is very simple, but the moment this Spring Boot app loads I see a memory spike, at times leading to OOM. Is there any specific configuration to use when running the Pulsar client with Spring Boot?
The code is mostly the one found in the Pulsar docs.
I am answering this to document the issue: do not use loops to consume messages; instead adopt a MessageListener subscribed to the consumer via
consumer.messageListener(new MyConsumer())
or
consumer.messageListener((consumer, msg) -> { /* do something */ })
The docs didn't mention this, but I found it while browsing the consumer API.
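For reference, a minimal, self-contained sketch of the listener-based approach, assuming the Pulsar Java client (2.x); the service URL, topic, and subscription name are placeholders. Note that in the Java client the listener is registered on the ConsumerBuilder before subscribe():
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class ListenerConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // The listener is invoked by the client's own threads,
        // so no receive() loop is needed in application code.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                       // placeholder topic
                .subscriptionName("my-subscription")     // placeholder subscription
                .messageListener((c, msg) -> {
                    System.out.println("Received: " + new String(msg.getData()));
                    try {
                        c.acknowledge(msg);
                    } catch (Exception e) {
                        c.negativeAcknowledge(msg);      // redeliver on failure
                    }
                })
                .subscribe();

        // Keep the process alive; in a Spring Boot app the consumer bean
        // would simply stay registered instead.
        Thread.sleep(Long.MAX_VALUE);
    }
}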

ActiveMQ threads

How long do the threads take to stop and exit for ActiveMQConsumer? I get a segmentation fault on closing my application, which I figured out was due to the ActiveMQ threads. If I comment out the consumer, the issue is no longer present. Currently I am using cms::MessageConsumer in activemq-cpp-library-3.9.4.
I see that activemq::core::ActiveMQConsumer has an isClosed() function that I can use to confirm the consumer is closed before moving forward with deleting the objects, thereby avoiding the segmentation fault. I am assuming this will solve my issue, but I wanted to know the correct approach with these ActiveMQ objects to avoid thread issues.
I was using the same session for the consumer and producer, but when the broker was stopped and started, the ActiveMQ reconnect kept adding threads. I am not using failover.
So I separated the send and receive sessions and instantiated a connection factory, connection, and session for each separately. This design had no issues, except that the application's memory was not getting cleaned up because of the above segmentation fault.
That's why I wanted to know: when should I use cms::MessageConsumer vs. ActiveMQConsumer?
The ActiveMQ website has documentation with examples for the CMS client. I'd suggest reading those and following how the example code shuts down the connection and the library resources prior to application shutdown, to ensure that resources are cleaned up appropriately.
As with JMS, the CMS consumer instance is linked to the thread in the session that created it, so if you are shutting down, a good rule to follow is to close the session to ensure that message deliveries are stopped before you delete any consumer instances.
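To make the recommended ordering concrete, here is a minimal sketch expressed with the Java JMS client (since the answer draws the JMS analogy); the broker URL and queue name are placeholders, and the CMS shutdown sequence follows the same pattern in C++:
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ShutdownOrderExample {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("TEST.QUEUE"));

        // ... receive messages ...

        // Shut down in the order the answer recommends: stop delivery,
        // close the consumer and its session (which stops the session's
        // dispatch thread), then close the connection. Only after that
        // is it safe to discard the objects.
        connection.stop();
        consumer.close();
        session.close();
        connection.close();
    }
}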

After Linux restart, Kafka throwing "no brokers found when trying to rebalance"

I followed an excellent step-by-step tutorial for installing Kafka on Linux. Everything was working fine for me until I restarted Linux. After the restart, I get the following error when I try to consume a queue with kafka-console-consumer.sh.
$ ~/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic TutorialTopic --from-beginning
[2016-02-04 03:16:54,944] WARN [console-consumer-6966_bob-kafka-storm-1454577414492-8728ae43], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
Before I ran the kafka-console-consumer.sh script, I did push data to the Kafka queue using the kafka-console-producer.sh script. These steps worked without issue before restarting Linux.
I can fix the error by manually starting Kafka, but I would rather have Kafka start automatically.
What might cause Kafka to not start correctly after restart?
So I had a similar issue in our Hortonworks cluster earlier today. It seems like ZooKeeper was not starting correctly. I first tried kafka-console-producer and got the exception below:
kafka-console-producer.sh --broker-list=localhost:9093 --topic=some_topic < /tmp/sometextfile.txt
kafka.common.KafkaException: fetching topic metadata for topics
The solution for me was to restart the server that had stopped responding. Yours may be different, but play around with the console producer and see what errors you're getting.
I had this same issue today when running a consumer:
WARN [console-consumer-74006_my-machine.com-1519117041191-ea485191], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
It turned out to be a disk full issue on the machine. Once space was freed up, the issue was resolved.
Deleting everything relating to ZooKeeper and Kafka out of the /tmp folder worked for me. But my system is not production, so other people should proceed with caution.
By default Kafka stores all its data in the /tmp folder, which gets cleared every time you restart. You can change the placement of those files by setting the log.dirs property in the config/server.properties properties file.
In my case, I started a consumer to fetch messages from the broker cluster, but the Kafka server was shut down, so it returned the message "no brokers found when trying to rebalance". When I restarted the Kafka server, the error disappeared.
For the producer and consumer to work you have to start a broker. The broker is what mediates message passing from producer to consumer.
sh kafka/bin/kafka-server-start.sh kafka/config/server.properties
The broker runs on port 9092 by default.
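For comparison, a minimal sketch of consuming TutorialTopic with the newer Java consumer client, which connects to the broker on port 9092 rather than to ZooKeeper as the console command above does; the broker address and group id are placeholders:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TutorialTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // the broker must be running here
        props.put("group.id", "tutorial-group");          // placeholder group id
        props.put("auto.offset.reset", "earliest");       // read from the beginning for a new group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("TutorialTopic"));
            while (true) {
                // Fails to make progress if no broker is reachable at bootstrap.servers.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}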

MQTT Paho using Spring Integration stops processing messages on a topic under certain load

I am using Spring Integration with mqtt-paho version 4.0.4 for receiving MQTT messages on a specified topic.
When the application is under heavy load, I found that it sometimes drops the connection with IMA (MQTT); this happened three times in a span of 100,000 (1 lakh) records.
But it regains connectivity and starts consuming the messages received thereafter. There was no issue with IMA reconnectivity.
There is another issue I faced during this testing.
When there is continuous load on the application, at some point it stops receiving messages and we see one message flashed on screen, i.e.
May 04, 2015 2:45:29 PM org.eclipse.paho.client.mqttv3.internal.ClientState checkForActivity
SEVERE: gvjIpONtSpP: Timed out as no activity, keepAlive=60,000 lastOutboundActivity=1,430,730,869,017 lastInboundActivity=1,430,730,929,151
After this, no messages are received by the application even if continuous load is pushed through the utility.
I observed this behavior three times:
at around 40K,
at around 90K,
at around 145K.
There is no consistent point or figure at which the application actually stops receiving messages.
Please let me know if anybody has faced and solved this before.
We had the same issue during MQTT Paho client performance/durability testing, before moving to production. The issue was on the broker side; after adjusting the settings, the IMA broker was able to consume millions of messages with no rejections.
Please look into the max buffer parameter in the IMA configuration web console, and the over-limit behavior policy (what to do with messages published over the specified threshold): reject, rollover, etc.
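The fix described above was on the broker side, but for reference, here is a minimal sketch of where the keepAlive value seen in the timeout log comes from on the client side, using Spring Integration's Paho support; the broker URL, client id, and topic name are placeholders:
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.mqtt.core.DefaultMqttPahoClientFactory;
import org.springframework.integration.mqtt.inbound.MqttPahoMessageDrivenChannelAdapter;
import org.springframework.messaging.MessageChannel;

@Configuration
public class MqttInboundConfig {

    @Bean
    public DefaultMqttPahoClientFactory mqttClientFactory() {
        MqttConnectOptions options = new MqttConnectOptions();
        options.setServerURIs(new String[] {"tcp://localhost:1883"}); // placeholder broker
        options.setKeepAliveInterval(60); // seconds; shows up as keepAlive=60,000 ms in the Paho log
        options.setCleanSession(true);
        DefaultMqttPahoClientFactory factory = new DefaultMqttPahoClientFactory();
        factory.setConnectionOptions(options);
        return factory;
    }

    @Bean
    public MessageChannel mqttInputChannel() {
        return new DirectChannel();
    }

    @Bean
    public MqttPahoMessageDrivenChannelAdapter mqttInbound(DefaultMqttPahoClientFactory factory) {
        // Inbound adapter that delivers received MQTT messages to mqttInputChannel.
        MqttPahoMessageDrivenChannelAdapter adapter =
                new MqttPahoMessageDrivenChannelAdapter("sample-client-id", factory, "some/topic");
        adapter.setQos(1);
        adapter.setOutputChannel(mqttInputChannel());
        return adapter;
    }
}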

Using Apache Kafka for log aggregation

I am learning Apache Kafka from the quickstart tutorial: http://kafka.apache.org/documentation.html#quickstart. So far, I have done the setup as follows: a producer node where a web server is running on port 8888, and a Kafka server (broker), consumer, and ZooKeeper instance on another node. I have tested the default console/file producer and consumer with 3 partitions. The setup works, and I am able to see the messages I sent in the order they were created (within each partition).
Now I want to send the logs generated by the web server to the Kafka broker; these messages will be processed by a consumer later. Currently I am using syslog-ng to capture server logs to a text file. I have come up with 3 rough ideas on how to implement the producer to use Kafka for log aggregation.
Producer implementations
First kind: listen to the TCP port of syslog-ng, fetch each message, and send it to the Kafka server. Here we have two middle processes: the producer and syslog-ng.
Second kind: use syslog-ng as the producer. We would need a way to send messages to the Kafka server instead of writing them to a file. Syslog-ng, the producer, is the only middle process.
Third kind: configure the web server itself as the producer.
Am I correct in my thinking? In the last case we don't have any middle process, but I suspect its implementation will affect server performance. Can anyone let me know the best way of using Apache Kafka (if the above 3 are not good) and guide me through the appropriate server configuration?
P.S.: I am using node.js for my web server
Thanks,
Sarath
Since you specify that you wish to send the generated logs to the Kafka broker, it does look as if running a process just to listen and resend messages mainly creates another point of failure with no additional value (unless you need a specific syslog-ng capability).
Syslog-ng can send messages to external applications using:
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guides/en/syslog-ng-ose-v3.4-guide-admin/html/configuring-destinations-program.html. I don't know if there are other ways to do that.
For the third option, I am not sure Kafka can easily be integrated into Node.js, as it requires a C++ producer, and when I last looked for one I was not able to find it. However, an easy alternative could be to have Kafka read the log file created by the server and send those logs (using the console producer provided with Kafka). This is usually a good approach, as it completely removes the dependency between Kafka and the web server (embedding the producer would require error handling, configuration, etc.). It requires the use of tail --follow and it works very well for us. If you wish for more details on that, I can include them as well. You would still need to supervise the Kafka execution to make sure messages are not lost (and provide a recovery option to resend messages that failed while offline), but the good thing about this method is that there is no dependency between the tools.
Hope it helps...
Eran
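To illustrate the file-reading approach described in the answer above, here is a minimal sketch using the Java producer client in place of the bundled console producer; the log file path, topic name, and broker address are placeholders, and a real tail --follow equivalent would keep polling the file for new lines instead of stopping at end of file:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogFileProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(
                     new FileReader("/var/log/webserver.log"))) {   // placeholder log file
            String line;
            // Publish each existing log line to the topic; stops at EOF.
            while ((line = reader.readLine()) != null) {
                producer.send(new ProducerRecord<>("weblogs", line)); // placeholder topic
            }
            producer.flush();
        }
    }
}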

Resources