Distributed logging: JMS and log4j?

I've been searching for a solution to this problem: I need log entries from apps running on several machines to be sent to and aggregated on a remote server. Requirements:
Logging in the app needs to be asynchronous (it can't wait for a log entry to traverse the network).
Logging in the app needs to be queued: if the network fails, log entries need to be queued locally and sent to the centralized server when the network becomes available again.
I'm looking at using log4j and a JMSAppender. Assuming that's a suitable solution, are there any examples available? What process would be running on the centralized server to receive log entries in this scenario?
Thanks.

One simple setup that comes to mind is to use Apache ActiveMQ.
It is an open-source messaging broker (JMS compatible) that can cluster queues across several physical machines, and an ActiveMQ installation is rather lightweight. You simply install one ActiveMQ broker on each of your application machines. Then on the logging server (Physical Server C in the picture) you would have another ActiveMQ broker. Your application would use a JMS appender (read more here), and you could just use the Apache Camel that is included with ActiveMQ to read from the queue and write the log to a file or database, without needing to write an application for that task.
It could be as simple as adding something like the following to camel.xml in the ActiveMQ installation's conf directory and importing camel.xml from the activemq.xml configuration:
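<route>
  <!-- Consume the log messages the applications' JMS appender sends to this queue -->
  <from uri="activemq:queue:LogQueue"/>
  <!-- Append every message to one file; "&" must be written as "&amp;" inside XML -->
  <to uri="file:target/folder/?fileName=logfile.log&amp;fileExist=Append"/>
</route>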
You could use a myriad of other frameworks, JMS servers and technologies, but I think this is a fairly easy approach that achieves the goal at very low cost and with high stability.
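The answer relies on ActiveMQ for both requirements in the question: the application never waits on the network, and entries are held locally until the central server is reachable again. As a rough, JDK-only illustration of that store-and-forward behaviour (not ActiveMQ's implementation and not a replacement for it), here is a minimal sketch. The class name and the `sender` callback, which stands in for the network hop and returns false when delivery fails, are hypothetical. Unlike a persistent broker, this buffer lives in memory, so queued entries are lost if the process dies.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of store-and-forward logging: application threads only
 * enqueue, and one background thread forwards entries, retrying until the
 * (simulated) network accepts them. In-memory only, so not crash-safe.
 */
public class StoreAndForwardLog {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);
    private final Predicate<String> sender; // returns false if delivery failed

    public StoreAndForwardLog(Predicate<String> sender) {
        this.sender = sender;
        Thread forwarder = new Thread(this::forwardLoop, "log-forwarder");
        forwarder.setDaemon(true); // do not keep the JVM alive just for logging
        forwarder.start();
    }

    /** Called by application code; returns immediately and never touches the network. */
    public boolean log(String entry) {
        return buffer.offer(entry); // false only if the local buffer is full
    }

    private void forwardLoop() {
        try {
            while (true) {
                String entry = buffer.take(); // block until there is something to send
                long backoffMs = 100;
                // Keep retrying the same entry so ordering is preserved.
                while (!sender.test(entry)) {
                    Thread.sleep(backoffMs);
                    backoffMs = Math.min(backoffMs * 2, 5_000); // exponential backoff, capped
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop forwarding on shutdown
        }
    }
}
```

A single forwarder thread retries the head entry before taking the next one, which keeps entries in order at the cost of head-of-line blocking while the link is down; a broker-based setup gives you the same decoupling plus durable storage.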

Related

How exactly does a Nagios server communicate with remote nodes, i.e. which protocol does it use in agent and agentless setups?

I installed Nagios Core and NCPA on a Mac and implemented a few checks via custom plugins to understand how to use it. I am trying to understand the following:
What protocol does the Nagios server actually use to communicate with the NCPA agent, and how exactly does NCPA return the result to Nagios? Does it SSH into the Nagios server and write a file that the server processes?
From an application-monitoring standpoint, how can it be leveraged? Is it just for monitoring that an application is up and running (I read that it can do more, but couldn't find anywhere that shows how that is actually implemented), or is there also a RESTful API that we can invoke from within our application to send custom notifications to the Nagios server? I understand it might require some configuration on the Nagios server end as well.
I came across PagerDuty and Sematext articles (PagerDuty Integration and Sematext Nagios Alert Integration) in which they have integrated their solutions with Nagios, and I am trying to do something similar: add integration support for Nagios so that a user can use our application's UI to configure alerts/notifications. For example, if a condition is met, alert or notify the Nagios server to show a notification on its dashboard.
Can we generate an alert from within a Spark Streaming application based on a variable? E.g. if its value is above a threshold or some condition is met, send an alert to the Nagios server to display as a notification on the Nagios dashboard. I came across a link showing how to monitor the status of a Spark application, but didn't find anything for conditions inside a Spark application.
I tried looking for answers to the above questions but couldn't find anything useful or complete online. I would really appreciate it if someone could help me understand the above.
Nagios is highly configurable, and can communicate across many protocols. NCPA can return JSON or XML data. The most common agentless protocol is probably SNMP. If you can read Python, look directly at the /usr/local/nagios/libexec/check_ncpa.py file to see what's up.
Nagios can check whether a system is running a service, how many resources it is consuming, etc. There is a RESTful API.
Nagios offers an application with a more advanced graphical interface called Nagios XI. Perhaps that is what you are after.
I bet you probably could, yeah. It might take some development work to get the systems to communicate though.
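For pushing an alert from inside an application such as a Spark job, one common pattern is a passive check: the application computes a state itself and hands Nagios a PROCESS_SERVICE_CHECK_RESULT external command. Below is a minimal sketch assuming the code runs on the Nagios host, external commands are enabled (`check_external_commands=1` in nagios.cfg), and the target service accepts passive checks. The command-file path comes from the `command_file` setting in nagios.cfg (`/usr/local/nagios/var/rw/nagios.cmd` is a common default for source installs); the thresholds, host and service names are hypothetical. From another machine you would forward the same result with an add-on such as NRDP or NSCA instead of writing the file directly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NagiosPassiveCheck {
    // Return codes Nagios uses for service states.
    public static final int OK = 0;
    public static final int WARNING = 1;
    public static final int CRITICAL = 2;
    public static final int UNKNOWN = 3;

    /** Maps a metric value to a state using (hypothetical) warning/critical thresholds. */
    public static int stateFor(double value, double warn, double crit) {
        if (Double.isNaN(value)) return UNKNOWN;
        if (value >= crit) return CRITICAL;
        if (value >= warn) return WARNING;
        return OK;
    }

    /** Formats one PROCESS_SERVICE_CHECK_RESULT external command line. */
    public static String serviceCheckResult(long epochSeconds, String host, String service,
                                            int returnCode, String output) {
        // A newline would end the command early, so flatten the output to one line.
        String oneLine = output.replace('\r', ' ').replace('\n', ' ');
        return "[" + epochSeconds + "] PROCESS_SERVICE_CHECK_RESULT;"
                + host + ";" + service + ";" + returnCode + ";" + oneLine + "\n";
    }

    /** Writes the command to the Nagios command file (normally a named pipe read by Nagios). */
    public static void submit(Path commandFile, String command) throws IOException {
        Files.write(commandFile, command.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }
}
```

For example, a driver that sees a lag of 120 with thresholds 60/100 gets `stateFor(120, 60, 100) == CRITICAL`, and the formatted command then ends in `;2;lag=120`. The Nagios service receiving it would be defined as passive-only, so its dashboard state changes only when the application reports.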

Submit & monitor spark jobs via java in cluster mode

I have a Java class which manages jobs and executes them via Spark (using 1.6).
I am using the API sparkLauncher.startApplication(SparkAppHandle.Listener... listeners) in order to monitor the state of the job.
The problem is that I moved to a real cluster environment, and this approach doesn't work when the master and workers are not on the same machine, because the internal implementation uses only localhost (loopback) to open a port for the workers to bind to.
The API sparkLauncher.launch() works but doesn't let me monitor the status.
What is the best practice for a cluster environment using Java code?
I also saw the option of a hidden REST API. Is it mature enough? Should I enable it in Spark somehow (I am getting access denied, even though the port is open from outside)?
REST API
In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available for both running applications, and in the history server. The endpoints are mounted at /api/v1. E.g., for the history server, they would typically be accessible at http://:18080/api/v1, and for a running application, at http://localhost:4040/api/v1.
You can find more details here.
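Since `sparkLauncher.launch()` gives no status callbacks, one option is to poll these JSON endpoints from the Java code that submitted the job. A minimal sketch using the JDK 11+ `java.net.http.HttpClient`; the class name and host names are placeholders, `/api/v1/applications` and `/api/v1/applications/[app-id]/jobs` are endpoints described in Spark's monitoring documentation, and JSON parsing is left to whatever library you already use. In cluster mode the driver (and therefore port 4040) runs on a cluster node rather than on the submitting machine, so the history server on port 18080 is often the easier fixed address, provided event logging is enabled.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class SparkRestStatus {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    /** Lists the applications known to a driver UI (4040) or history server (18080). */
    public static URI applicationsUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/api/v1/applications");
    }

    /** Lists the jobs, including their status, of one application. */
    public static URI jobsUri(String host, int port, String appId) {
        return URI.create("http://" + host + ":" + port + "/api/v1/applications/" + appId + "/jobs");
    }

    /** Performs a GET and returns the raw JSON body, failing on non-200 responses. */
    public static String getJson(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " from " + uri);
        }
        return response.body();
    }
}
```

Polling `jobsUri(...)` periodically and inspecting each job's `status` field is a simple substitute for the listener callbacks, at the cost of some latency.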
Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:
A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Environmental information.
Information about the running executors
You can access this interface by simply opening http://driver-node:4040 in a web browser. If multiple SparkContexts are running on the same host, they will bind to successive ports beginning with 4040 (4041, 4042, etc).
You can find more details here.

Open a port for Kafka communication to the outside-world

I have a Linux VM in Azure with Hortonworks installed on it, which runs Kafka.
The Kafka service is running, and I am able to create a producer and a consumer inside the VM.
I have the server IP, and I'm also able to log into Ambari on port 8080.
When I try to send a message to Kafka from my Java application, I get a TimeoutException after 60 seconds.
What do I need to do in order to set the right port for Kafka communication from outside the VM?
I think the main issue here is that Kafka is listening on the local IP and not on the VM's external (WAN) IP.
Any help will be really appreciated...
If you used the Azure Resource Manager workflow to create the VM, a Network Security Group (NSG) was created automatically. You need to create rules in the NSG to make Kafka reachable. See: https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-nsg/
If you used the Azure classic deployment workflow, you need to define an endpoint to expose Kafka. See: https://azure.microsoft.com/fr-fr/documentation/articles/virtual-machines-windows-classic-setup-endpoints/
Hope this helps,
Julien
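Once an NSG rule or endpoint is in place, a quick way to tell a network block apart from a Kafka configuration problem is a plain TCP connect from the client machine to the broker port. A JDK-only sketch; the class name is hypothetical, and the port to test is whatever the broker listens on (9092 is Kafka's default, but Ambari-managed HDP clusters commonly use 6667, so check the broker's listeners setting). A successful connect only proves the bootstrap port is open: the producer then connects to the address the broker advertises, which must also be reachable from outside.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, silently dropped by a firewall (timeout), or host not resolvable.
            return false;
        }
    }
}
```

A quick refusal usually means nothing is listening on that interface, while a timeout usually points at a firewall or NSG rule dropping the traffic.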
Did you set the Kafka broker properties advertised.host.name and advertised.port? That's how the broker presents itself to the outside world.
(Copy and pasting my response to a similar post)
For recent versions of Kafka (0.10.0 as of this writing), you don't want to use advertised.host.name at all. In fact, even the documentation states that advertised.host.name is already deprecated. Moreover, Kafka will use it not only as the "advertised" host name for the producers/consumers, but for other brokers as well (in a multi-broker environment)... which is kind of a pain if you're using a different (perhaps internal) DNS for the brokers... and you really don't want to get into the business of adding entries to the individual /etc/hosts of the brokers (ew!)
So, basically, you would want the brokers to use the internal name, but use the external FQDNs for the producers and consumers only. To do this, you will update advertised.listeners instead.
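As an illustration of that internal/external split, here is a sketch of the broker settings, built with `java.util.Properties` so they can be printed or merged into server.properties. It assumes Kafka 0.10.2 or later, where named listeners (KIP-103) together with `listener.security.protocol.map` and `inter.broker.listener.name` are available; on 0.10.0 you are limited to one listener per security protocol. The listener names, ports and host names are hypothetical.

```java
import java.util.Properties;

public class KafkaListenerConfig {
    /**
     * Builds broker settings with separate internal and external listeners.
     * INTERNAL/EXTERNAL, the ports and the host names are hypothetical choices.
     */
    public static Properties brokerListeners(String internalHost, String externalHost) {
        Properties p = new Properties();
        // Bind both listeners on all interfaces, one port each.
        p.setProperty("listeners", "INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093");
        // The addresses handed back to clients: internal DNS for brokers, public FQDN for outsiders.
        p.setProperty("advertised.listeners",
                "INTERNAL://" + internalHost + ":9092,EXTERNAL://" + externalHost + ":9093");
        // Both named listeners use plain, unencrypted connections in this sketch.
        p.setProperty("listener.security.protocol.map", "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT");
        // Broker-to-broker traffic stays on the internal listener.
        p.setProperty("inter.broker.listener.name", "INTERNAL");
        return p;
    }
}
```

External producers and consumers then bootstrap against the public FQDN on port 9093, and only that port needs to be opened in the NSG.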

Dynamically Changing Hazelcast Server Log Level

I am using Hazelcast in client-server mode. Is it possible to control the logging level of the Hazelcast server dynamically from a Hazelcast client? My intention is to start the Hazelcast server at ERROR level by default and, in case of any problem, change the log level to DEBUG without restarting the Hazelcast server.
Thanks
JK
Hazelcast does not depend on any custom logging frameworks and makes use of adaptors to connect to a number of existing logging frameworks. See some details here:
http://docs.hazelcast.org/docs/3.5/manual/html/logging.html
Most of the current logging frameworks allow you to dynamically / programmatically change the log levels. I'm at a loss here, since you haven't given any details of the logging framework you have used.
For example, with log4j 1.x (org.apache.log4j.LogManager):
LogManager.getLogger("loggername").setLevel(newLoglevel);
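will achieve what you are looking for. You can also change the log4j configuration file (log4j.xml) at runtime, and the changes will take effect without restarting any of the Hazelcast servers, provided log4j is configured to watch the file (e.g. DOMConfigurator.configureAndWatch in log4j 1.x, or the monitorInterval attribute in log4j 2).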
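If the Hazelcast server is instead configured to log through java.util.logging (`hazelcast.logging.type=jdk`), the JDK already exposes a runtime switch: the platform logging MXBean. A minimal sketch, assuming the code runs inside the member's JVM; the class name is hypothetical. Because the same MXBean is registered under `java.util.logging:type=Logging`, it can also be invoked remotely through JMX (for example from jconsole) if the member enables remote JMX, which changes levels without a restart, although not through the Hazelcast client API itself. Note that java.util.logging has no DEBUG level; FINE is the usual equivalent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.PlatformLoggingMXBean;
import java.util.logging.Logger;

public class HazelcastLogLevel {
    // Keep a strong reference: java.util.logging holds loggers only weakly,
    // and the MXBean rejects names of loggers that do not currently exist.
    private static final Logger HAZELCAST_LOGGER = Logger.getLogger("com.hazelcast");

    /** Changes the level via the logging MXBean, e.g. "FINE" for debug output or "SEVERE". */
    public static void setLevel(String levelName) {
        PlatformLoggingMXBean logging =
                ManagementFactory.getPlatformMXBean(PlatformLoggingMXBean.class);
        logging.setLoggerLevel(HAZELCAST_LOGGER.getName(), levelName);
    }

    /** Returns the level name set on the logger, or "" if it inherits from its parent. */
    public static String currentLevel() {
        return ManagementFactory.getPlatformMXBean(PlatformLoggingMXBean.class)
                .getLoggerLevel(HAZELCAST_LOGGER.getName());
    }
}
```

Setting the level on `com.hazelcast` affects every Hazelcast logger below it in the hierarchy that does not set its own level.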

AWS EC2 instance Application logs

I want to store the logs of applications like uWSGI ("/var/log/uwsgi/uwsgi.log") on a device that can be accessed from multiple instances, with each instance saving its logs to that device under a directory named after the instance.
Does AWS provide any solution for this?
There are a number of approaches you can take here. If you want an experience like writing directly to the filesystem, you could use something like s3fs to mount a common S3 bucket on each of your instances. This would give you a more or less real-time log merge, though honestly I would be concerned about the performance of such a setup in a high-volume application.
You could process the logs at some regular interval to push the data to some common store. This would not be real time, but would likely be a pretty simple solution. The problem here is that it may be difficult to interleave your log entries from different servers if you need to have them arranged in time order.
Personally, I set up a Graylog server for each instance cluster I have, to which I send all my access logs, error logs, etc. It is UDP based, so it is fire-and-forget from the application servers' standpoint. It also provides nice search/querying tools. I like this approach because it removes log management from the application servers altogether.
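Graylog's fire-and-forget UDP input usually speaks GELF, a small JSON format. A JDK-only sketch of sending one uncompressed GELF 1.1 message over UDP; the class name is hypothetical, 12201 is the port commonly used for a GELF UDP input but depends on how the input is configured, and messages too large for a single datagram would need GELF chunking, which this sketch omits. In practice a GELF appender for your logging framework would normally do this for you.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class GelfUdpSender {
    /** Escapes the characters JSON requires to be escaped inside a string. */
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters use the JSON hex escape form.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a minimal GELF 1.1 message: version, host, short_message and a syslog level. */
    public static String gelf(String host, String message, int syslogLevel) {
        return "{\"version\":\"1.1\",\"host\":\"" + jsonEscape(host)
                + "\",\"short_message\":\"" + jsonEscape(message)
                + "\",\"level\":" + syslogLevel + "}";
    }

    /** Fire and forget: sends one datagram and never waits for an acknowledgement. */
    public static void send(InetSocketAddress graylog, String host, String message, int level)
            throws IOException {
        byte[] payload = gelf(host, message, level).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, graylog));
        }
    }
}
```

Because UDP gives no delivery guarantee, a Graylog outage simply drops messages rather than slowing the application down, which is the trade-off the answer describes.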
Two options that I've used:
Use syslog (or Syslog-NG) to log to a centralized location. We do this to ship our AWS log data offsite to our datacenter. Syslog-NG is more reliable than plain ole' Syslog and allows us to use MongoDB as a backing store.
Use logrotate to push your logs to S3. It's not real-time like the Syslog solution, but it's a lot easier to set up and manage, especially if you have a lot of instances and aren't using a VPC.
Loggly and Splunk Storm are also two interesting SaaS products intended to solve this problem.
