I have created a demo environment using Logstash, Redis, Elasticsearch and Kibana. (http://indico.cern.ch/getFile.....
Here the Logstash shipper reads logs from a log file that I have centralized using syslog-ng. The shipper forwards events to Redis, then to the Logstash indexer (filter), and finally to Elasticsearch.
Now I want to skip the Logstash shipper and Redis parts. Is this a good idea, or is Redis mandatory / required to deal with heavy load? I'm not sure about it.
In the PDF linked above I read that Logstash has little buffering and Redis manages the flow of logs, which is why Redis is used. Since Redis keeps data in memory, what happens if memory gets full? I have also read that Logstash and Elasticsearch can be quite hungry in terms of RAM usage and that the JVM options need to be properly tuned. If so, how do I tune the JVM?
Is it required to purge/rotate Elasticsearch data/indexes?
So which setup is best suited for heavy load? I want to parse logs such as system (OS and daemon) logs, syslog, web server logs (Apache, lighttpd), application server logs (Tomcat), database server logs (MySQL) and some application logs (through log files).
Please give your suggestions for improvement. Thanks!
Here is a link to an image of the setup:
(http://a.disquscdn.com/uploads/mediaembed/images/709/3604/original.jpg)
In the setup you describe Redis should not be required; using syslog-ng to centralise the log files serves the same purpose that Redis does when multiple shippers are used.
It might be necessary to prune Elasticsearch indexes to reduce disk space requirements. This will depend on how quickly your Elasticsearch data are growing, how much disk space you have available, and how long you need the logs to remain searchable.
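If you do need to prune, the idea is simple enough to script. A minimal sketch, assuming the default daily logstash-YYYY.MM.dd index naming and an unauthenticated node on localhost:9200 (Elastic's curator tool does the same job with more safeguards):

```python
# Minimal sketch: delete Elasticsearch indexes older than RETENTION_DAYS.
# Assumes the default daily logstash-YYYY.MM.dd index naming and an
# unauthenticated node on localhost:9200 -- adjust both to your setup.
import datetime
import requests

ES_URL = "http://localhost:9200"
RETENTION_DAYS = 30

cutoff = datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS)

# _cat/indices?h=index returns one index name per line
indices = requests.get(ES_URL + "/_cat/indices?h=index").text.split()

for name in indices:
    if not name.startswith("logstash-"):
        continue
    try:
        day = datetime.datetime.strptime(name, "logstash-%Y.%m.%d").date()
    except ValueError:
        continue  # skip indexes that don't follow the daily pattern
    if day < cutoff:
        print("deleting", name)
        requests.delete("%s/%s" % (ES_URL, name))
```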
I can't advise on JVM tuning.
I'm looking for an official AWS CloudWatch Appender for Log4J2.
I've searched all over and didn't find any.
Is anybody out there using CloudWatch in Java apps with Log4J2?
I've been reading that the best approach for integrating with AWS CloudWatch Logs is to use the CloudWatch Logs agent.
It seems that having an independent agent will be much more reliable than the application logging directly to CloudWatch.
[Update] Why might it be more reliable?
If CloudWatch or the web server's connection is down, the appender may miss the log event. A write to disk would never be missed.
Nothing is faster than writing to a file stream on local disk. Under high log volume, sending data through a TCP connection could cause performance impact or bottlenecks in the application.
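To make the trade-off concrete, here is a rough sketch of the two paths (in Python just for illustration; the log group, stream and file names are placeholders, and sequence-token handling, batching limits and retries are left out):

```python
# Rough illustration of the two paths being compared (names are placeholders).
import time
import boto3

logs = boto3.client("logs")

def push_direct(messages):
    # Direct push: every batch is a network call to CloudWatch Logs.
    # If the connection is down, these events are lost unless the
    # application adds its own buffering and retry logic.
    logs.put_log_events(
        logGroupName="my-app",        # placeholder
        logStreamName="instance-1",   # placeholder, stream must already exist
        logEvents=[{"timestamp": int(time.time() * 1000), "message": m}
                   for m in messages],
    )

def write_local(messages):
    # Local append: fast and survives CloudWatch/network outages;
    # the CloudWatch Logs agent tails the file and retries the upload for us.
    with open("/var/log/my-app/app.log", "a") as f:   # placeholder path
        for m in messages:
            f.write(m + "\n")
```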
I would support the answer from Gonzalo.
I just want to update the answer with the new unified agent, which can collect both logs and performance metrics:
Collecting Metrics and Logs from Amazon EC2 Instances
I have the ELK stack installed and am about to do performance testing.
I have a doubt below that I am not able to resolve myself; expert suggestions/opinions would be helpful.
I am unsure whether to:
1. Run Logstash live - meaning, install Logstash and run ELK in parallel with my performance testing of the application.
2. Or first do the performance testing, collect the logs, and feed them to Logstash offline. (This option is very much possible, as I am running the test for only about 30 minutes.)
Which will perform better?
My application is in Java, and since Logstash also runs in a JVM for its parsing, I am afraid it will have an impact on my application's performance.
Considering this, I prefer to go with option 2, but I would like to know whether there are any benefits/advantages to option 1 that I am missing.
Help/suggestions much appreciated
Test your real environment under real conditions to get anything meaningful.
Will you run Logstash on the server? Or will you feed your logs in the background to e.g. Kafka, as described in the blog post of mine you summoned me from? Or will you run a batch job and collect the logs after the fact?
Of course, doing anything on the server itself during processing will have an impact, and tuning your JVM will also have a big influence on how well everything performs. In general it is not an issue to run multiple JVMs on the same server.
Do your tests once with Logstash / Kafka / Flume or any other log processing or shipping tool you want to use enabled, and then run a second pass without these tools to get an idea of how much they impact performance.
When should I use Filebeat, Packetbeat or Topbeat?
I am new to the ELK stack. I may sound silly, but I am really confused about these. I would appreciate any sort of help.
It took me a while, but I have figured out the solution.
Filebeat is used to read input from files. We can use it when an application writes its logs to a file - for example, Elasticsearch's own logs go to a log file, so we can use Filebeat to read data from it.
Topbeat is used to monitor CPU usage, RAM usage and other system resource metrics.
Packetbeat can be used to analyze network traffic, and we can directly log the transactions taking place on the ports where they happen.
While I was wondering about the difference between Logstash and the Beats platform, it turned out that Beats are more lightweight: you do not need to install a JVM on each of your servers, as you would to run Logstash there. However, Logstash has a rich community of plugins, with their count exceeding 200, while Beats is still under development, so Logstash can be used if the required protocol support is not yet available in Beats.
These are all Elasticsearch data shippers belonging to Elastic's Beats family. Each beat helps you analyze different bits and pieces in your environment.
Referring specifically to the beats you mentioned:
Filebeat is good for tracking and forwarding specific log files (e.g. apache access log)
Packetbeat is good for network analysis, monitoring the actual data packets being transferred across the wire
Topbeat can be used for infrastructure monitoring, giving you perf metrics on CPU usage, memory, etc.
There are plenty of resources to help you get started. Try Elastic's site. I also saw a series of tutorials on the Logz.io blog.
I need to collect Azure operation logs to feed my ELK (Elasticsearch, Logstash and Kibana) cluster.
I'm looking for a ready-to-use solution. If none is available, I can write my own and in this case I'm looking for a design which is simple and reliable.
My current design is to have a worker role that uses Azure's REST API to fetch logs every minute or so and push the log entries to my ELK cluster. It sounds like that will cost about US$20/mo, and I'll have to design some bookkeeping for the periods in which my worker role is interrupted.
With so many input options, my hope was that logstash had a plugin for this task.
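For what it's worth, a minimal sketch of that polling design; the endpoint URL, token, query parameter names, response shape and index name below are placeholders/assumptions rather than a real Azure API, and the bookkeeping is just a timestamp file:

```python
# Minimal sketch of the polling worker; all Azure specifics are placeholders.
import datetime
import json
import requests

AZURE_OPS_URL = "https://<azure-operation-logs-endpoint>"   # placeholder
TOKEN = "<bearer-token>"                                     # placeholder
ES_URL = "http://localhost:9200/azure-operations/entry"      # target index/type
CHECKPOINT = "/var/lib/azure-poller/last-poll"               # crude bookkeeping

def last_poll_time():
    try:
        with open(CHECKPOINT) as f:
            return f.read().strip()
    except IOError:
        # first run: start one hour back
        return (datetime.datetime.utcnow()
                - datetime.timedelta(hours=1)).isoformat()

def save_poll_time(ts):
    with open(CHECKPOINT, "w") as f:
        f.write(ts)

def poll_once():
    since = last_poll_time()
    now = datetime.datetime.utcnow().isoformat()
    resp = requests.get(
        AZURE_OPS_URL,
        params={"startTime": since, "endTime": now},   # parameter names are assumptions
        headers={"Authorization": "Bearer " + TOKEN},
    )
    resp.raise_for_status()
    for entry in resp.json().get("value", []):         # response shape is an assumption
        requests.post(ES_URL, data=json.dumps(entry),
                      headers={"Content-Type": "application/json"})
    save_poll_time(now)   # advance the checkpoint only once the batch is forwarded

if __name__ == "__main__":
    poll_once()   # run from cron / the worker role every minute or so
```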
I have set up Graphite and StatsD on a specific machine that will be dedicated to stats. Now, if I would like to connect my application servers to provide stats, what would be the best way?
I know that Carbon does this on the stats machine already, but what do I do on the app servers that don't have Graphite installed?
What I am looking for is to store load, disk usage and free/used memory.
Running collectd (http://collectd.org/) with a Graphite agent (https://github.com/indygreg/collectd-carbon) would be an excellent start for gathering the information you're after.
There is an almost unlimited number of ways to get your data into Graphite.
You can find a list of tools that are known to work very well with Graphite on the readthedocs.org page: http://graphite.readthedocs.org/en/0.9.10/tools.html
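For a quick test without installing anything extra on the app servers, Carbon's plaintext protocol (one "metric.path value timestamp" line per metric, sent to port 2003) is also simple to script. A minimal sketch, with the Graphite host and metric prefix as placeholders (collectd or StatsD remain the better choice for anything beyond this):

```python
# Minimal sketch: push a few load-average gauges straight to Carbon's
# plaintext listener (default TCP port 2003) from an app server.
import os
import socket
import time

CARBON_HOST = "stats.example.com"   # placeholder: your Graphite box
CARBON_PORT = 2003

load1, load5, _ = os.getloadavg()   # Unix only
now = int(time.time())

lines = [
    "servers.app01.load.1min %f %d" % (load1, now),   # metric prefix is a placeholder
    "servers.app01.load.5min %f %d" % (load5, now),
]

sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
sock.sendall(("\n".join(lines) + "\n").encode("ascii"))
sock.close()
```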