is there a recommended way of feeding logstash with azure operation logs? - azure

I need to collect Azure operation logs to feed my ELK (Elasticsearch, Logstash and Kibana) cluster.
I'm looking for a ready-to-use solution. If none is available, I can write my own and in this case I'm looking for a design which is simple and reliable.
My current design is to have a worker role which uses Azure's REST API to fetch logs every minute or so and push the log entries to my ELK cluster. It sounds like that will cost about US$20/month, and I'll have to design some bookkeeping for the periods during which my worker role is interrupted.
With so many input options, my hope was that logstash had a plugin for this task.
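For reference, a minimal sketch of the worker-role design I have in mind (TypeScript on Node 18+; the operation-log URL, token handling and index name are placeholders, not the actual Azure Management API):

```typescript
// Hypothetical sketch of a worker that polls Azure operation logs and bulk-indexes
// them into Elasticsearch. AZURE_LOGS_URL, the token handling and the index name
// are placeholders; the real Azure Management REST API path and auth are omitted.

const AZURE_LOGS_URL = process.env.AZURE_LOGS_URL!;   // placeholder Management API query URL
const AZURE_TOKEN = process.env.AZURE_TOKEN!;         // bearer token obtained out of band
const ES_URL = process.env.ES_URL ?? "http://localhost:9200";

let lastPollTime = new Date(Date.now() - 60_000).toISOString(); // simple bookkeeping cursor

async function pollOnce(): Promise<void> {
  const now = new Date().toISOString();
  // Fetch operation log entries newer than the last cursor (placeholder query syntax).
  const res = await fetch(`${AZURE_LOGS_URL}&startTime=${lastPollTime}&endTime=${now}`, {
    headers: { Authorization: `Bearer ${AZURE_TOKEN}` },
  });
  if (!res.ok) throw new Error(`Azure API returned ${res.status}`);
  const entries: unknown[] = await res.json(); // assumed response shape: an array of log entries

  if (entries.length > 0) {
    // Build an Elasticsearch _bulk body: one action line plus one document line per entry.
    const bulkBody =
      entries
        .flatMap((e) => [JSON.stringify({ index: { _index: "azure-operation-logs" } }), JSON.stringify(e)])
        .join("\n") + "\n";
    const esRes = await fetch(`${ES_URL}/_bulk`, {
      method: "POST",
      headers: { "Content-Type": "application/x-ndjson" },
      body: bulkBody,
    });
    if (!esRes.ok) throw new Error(`Elasticsearch returned ${esRes.status}`);
  }
  // Only advance the cursor after a successful push, so interrupted runs are retried.
  lastPollTime = now;
}

// Poll roughly every minute; on failure the cursor is not advanced and the window is retried.
setInterval(() => pollOnce().catch((err) => console.error("poll failed:", err)), 60_000);
```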

Related

Emitting application level metrics in node js

I want to emit metrics from my Node application to monitor how frequently a certain branch of code is reached. For example, I am interested in knowing how many times a service call didn't return the expected response. I also want to be able to emit, for each service call, the time it took, etc.
I am expecting I will be using a client in the code that will emit metrics to a server and then I will be able to view the metrics in a dashboard on the server. I am more interested in open source solutions that I can host on my own infrastructure.
Please note, I am not interested in system metrics here such as CPU, memory usage etc.
Implement pervasive logging and then use something like Elasticsearch + Kibana to display the logs in a dashboard.
There are other metric dashboard systems such as Grafana, Graphite, Tableau, etc. A lot of them ingest metrics, which are numbers associated with tags, such as function-call counts, CPU load, etc. The main reason I like the Kibana solution is that it is not based on metrics but instead extracts metrics from your log files.
The only thing you really need to do with your code is make sure your logs are timestamped.
Google for Kibana or the "ELK stack" (ELK stands for Elasticsearch + Logstash + Kibana) to see how to set this up. The first time I set it up, it took me just a few hours to get results.
Node has several loggers that can be configured to send log events to ELK. In addition, the Logstash (or the more modern Beats) part of ELK can ingest any log file and parse it with regular expressions before forwarding the data to Elasticsearch, so you do not need to modify your software.
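As a rough illustration of the "just make sure your logs are timestamped" point, here is what emitting structured, timestamped events might look like in Node (assuming the pino logger; winston or bunyan would work the same way):

```typescript
import pino from "pino";   // assumption: pino is used; any structured Node logger works

const log = pino();        // pino emits JSON lines with a timestamp by default

function callService(name: string): void {
  const start = Date.now();
  try {
    // ... perform the actual service call here ...
    log.info({ service: name, durationMs: Date.now() - start }, "service call succeeded");
  } catch (err) {
    // Each branch you care about becomes a log event that Kibana can count and graph.
    log.error({ service: name, err }, "service call returned an unexpected response");
  }
}
```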
The ELK solution can be configured simply or you can spend literally weeks tuning your data parsing and graphs to get more insights - it is very flexible and how you use it is up to you.
Metrics vs Logs (opinion):
What you want is of course the metrics. But metrics alone don't say much. What you are ultimately after is being able to analyse your system for debugging and optimisation. This is where logging has an advantage.
With a solution like Kibana that extracts metrics from logs, you have another layer behind the metrics to deep-dive into. You can query it to find which events caused the metrics. This is not easy to do on a running system, because you would normally have to simulate inputs to your system to reproduce similar metrics and figure out what is happening. With Kibana you can instead analyse historical events that have already happened!
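To make the "query the events behind the metrics" idea concrete, here is a hedged sketch of the kind of drill-down Kibana performs under the hood, written against the official Elasticsearch JavaScript client (the index pattern and field names are assumptions about a typical log schema):

```typescript
import { Client } from "@elastic/elasticsearch";

// Assumes an 8.x client and a typical Logstash/Kibana index layout; adjust to your schema.
const client = new Client({ node: "http://localhost:9200" });

async function eventsBehindTheSpike() {
  const result = await client.search({
    index: "logs-*",
    size: 50,
    sort: [{ "@timestamp": "desc" }],
    query: {
      bool: {
        must: [{ match: { level: "error" } }],                                 // the metric you saw spike
        filter: [{ range: { "@timestamp": { gte: "now-1h", lte: "now" } } }],  // the time window of the spike
      },
    },
  });
  // Each hit is a full log event, so you can read exactly what happened.
  return result.hits.hits.map((h) => h._source);
}

eventsBehindTheSpike().then((events) => console.log(events)).catch(console.error);
```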
Here's an old screenshot of a Kibana set-up I did a few years back to monitor a web service (including all emails it receives):
Note in the screenshot above that, apart from the graphs and metrics I extract from my system, I also display parsed logs at the bottom of the dashboard, so I get a near real-time view of what is happening. This is the email-received dashboard, which we used to monitor things like subscriptions, complaints, click-through rates, etc.

logstash vs a task queue for nodejs application

I have a Node.js web server where there seem to be a couple of bottlenecks preventing it from fully functioning at peak load:
logging multiple events to our SQL server
logging multiple events to our elastic cluster
Under heavy load, both SQL and Elasticsearch seem to reject my requests and/or slow down considerably. So I've decided to reduce the load on these databases via Logstash (for Elasticsearch) and an async task queue (for SQL).
Since I'm working with limited time, I'm wondering if I can solve these issues with just an async task queue (like Kue) to which I can push both the SQL and Elasticsearch logs.
Do I need to implement Logstash as well, or does an async queue solve the same problem as Logstash?
You can try the async library's queue feature and run it in a child process or, much better, on a separate server as a queue microservice. That way you move the load to another location, giving your app server a boost.
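A minimal sketch of that idea, assuming the async npm package's queue (the worker bodies and the concurrency value are placeholders):

```typescript
import async from "async";   // the "async" utility library mentioned above

interface LogTask {
  target: "sql" | "elastic";
  payload: Record<string, unknown>;
}

// The worker drains tasks with bounded concurrency, so SQL/Elasticsearch see a
// steady trickle of writes instead of a spike under peak load.
const queue = async.queue<LogTask>(async (task) => {
  // Replace these placeholders with your existing SQL insert / Elasticsearch index call.
  if (task.target === "sql") {
    // await writeToSql(task.payload);
  } else {
    // await writeToElastic(task.payload);
  }
  console.log(`processed ${task.target} log event`);
}, 5); // concurrency of 5 is an arbitrary example

// In the request handler, enqueue instead of writing synchronously:
queue.push({ target: "elastic", payload: { event: "user_login", at: new Date().toISOString() } });
```

Kue (or Bull) would play the same role; the key point is that the queue bounds concurrency so the databases see a steady write rate.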
As you mentioned you are using Azure, I would strongly recommend using their queue solution plus a couple of Azure Functions to handle reading from the queue and the processing.
I've rolled my own solution before using Node.js and RabbitMQ, with Node workers reading from the queue and writing to Elasticsearch, because that solution couldn't go into the cloud.
It works and is robust but it takes quite a bit of configuration and custom code that is a maintenance nightmare. Ripped that out as soon as I could.
The benefits of using the azure service are:
Very little bespoke configuration is required.
Less custom code === fewer bugs & less maintenance.
Scaling isn't an issue; some huge businesses rely on this.
No 2 a.m. support calls; if Azure is down, they are going to fix it... fast.
Much cheaper: unless the throughput is massive and constant, the scaling model costs less, and Azure Functions are perfect because you won't have running servers sitting there doing nothing when the queue is empty.
Same could be said for AWS and Google Cloud.
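For illustration, a rough sketch of the producer side using the @azure/storage-queue SDK (the queue name, connection string and message shape are assumptions); a queue-triggered Azure Function would then do the actual SQL/Elasticsearch writes:

```typescript
import { QueueClient } from "@azure/storage-queue";

// Connection string and queue name are assumptions; the queue must already exist
// (or call queue.createIfNotExists() once at startup).
const queue = new QueueClient(process.env.AZURE_STORAGE_CONNECTION_STRING!, "app-log-events");

export async function enqueueLogEvent(event: Record<string, unknown>): Promise<void> {
  // Azure Storage Queue messages are plain strings; base64 keeps any payload safe.
  const body = Buffer.from(JSON.stringify(event)).toString("base64");
  await queue.sendMessage(body);
}

// Usage inside a request handler: fire-and-forget style so the web request isn't blocked.
enqueueLogEvent({ type: "order_created", at: new Date().toISOString() }).catch(console.error);
```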

Log4J2 CloudWatch Appender

I'm looking for an official AWS CloudWatch Appender for Log4J2.
I've searched all over and didn't find one.
Anybody there using CloudWatch in Java Apps with Log4J2?
I've been reading that the best approach to integrate with AWS CloudWatch Logs is to use the CloudWatch Logs agent.
It seems that having an independent agent will be much more reliable than the application logging directly to CloudWatch.
[Update] Why might it be more reliable?
If CloudWatch or the web server's connection is down, the appender may miss the log event. A write to local disk would never be missed.
Nothing is faster than writing to a file stream on local disk. With a high log volume, sending data over a TCP connection could cause a performance impact or bottlenecks in the application.
I would support the answer from Gonzalo.
I just want to update the answer with the new unified CloudWatch agent, which can collect both logs and performance metrics:
Collecting Metrics and Logs from Amazon EC2 Instances

Dynamic Service Creation to Distribute Load

Background
The problem we're facing is that we are doing video encoding and want to distribute the load to multiple nodes in the cluster.
We would like to constrain the number of video encoding jobs on a particular node to some maximum value. We would also like to have small video encoding jobs sent to a certain grouping of nodes in the cluster, and long video encoding jobs sent to another grouping of nodes in the cluster.
The idea behind this is to help maintain fairness amongst clients by partitioning the large jobs into a separate pool of nodes. This helps ensure that the small video encoding jobs are not blocked / throttled by a single tenant running a long encoding job.
Using Service Fabric
We plan on using an ASF service for the video encoding. With this in mind we had an idea of dynamically creating a service for each job that comes in. Placement constraints could then be used to determine which pool of nodes a job would run in. Custom metrics based on memory usage, CPU usage ... could be used to limit the number of active jobs on a node.
With this method the node distributing the jobs would have to poll whether a new service could currently be created that satisfies the placement constraints and metrics.
Questions
What happens when a service can't be placed on a node? (Using CreateServiceAsync I assume?)
Will this polling be prohibitively expensive?
Our video encoding executable is packaged along with the service which is approximately 80MB. Will this make the spinning up of a new service take a long time? (Minutes vs seconds)
As an alternative to this we could use a reliable queue based system, where the large jobs pool pulls from one queue and the small jobs pool pulls from another queue. This seems like the simpler way, but I want to explore all options to make sure I'm not missing out on some of the features of Service Fabric. Is there another better way you would suggest?
I have no experience with placement constraints and dynamic services, so I can't speak to that.
The polling of the perf counters isn't terribly expensive; that being said, it's not a free operation. A one-second poll interval shouldn't cause any huge perf impact while still providing a decent degree of resolution.
The service packages get copied to each node at deployment time rather than when services get spun up, so it'll make the deployment a bit slower but not affect service creation.
You're going to want to put the job data in reliable collections however you structure it, but the question is how. One idea that might be worth considering is making the job-processing service a partitioned service and basing your partitioning strategy on encoding job size and/or tenant, so that large jobs from the same tenant end up in the same queue and smaller jobs for other tenants go elsewhere.
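The Service Fabric service itself would be C#, but just to illustrate the partition-key idea, here is a hypothetical sketch of the key derivation (ranged Int64 partitioning, the split point and the hash are all arbitrary choices):

```typescript
// Illustration only: large jobs get a dedicated key range (and therefore dedicated
// partitions), and jobs from the same tenant hash to the same key within that range.

const SMALL_JOB_RANGE = { low: 0, high: 4999 };    // small encoding jobs
const LARGE_JOB_RANGE = { low: 5000, high: 9999 }; // long encoding jobs

// Cheap deterministic hash of the tenant id (djb2-style), folded into a range width.
function hashTenant(tenantId: string, width: number): number {
  let h = 5381;
  for (const ch of tenantId) h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  return h % width;
}

// Large jobs from the same tenant land in the same partition's queue,
// while small jobs spread across a separate pool of partitions.
export function partitionKeyFor(tenantId: string, isLargeJob: boolean): number {
  const range = isLargeJob ? LARGE_JOB_RANGE : SMALL_JOB_RANGE;
  return range.low + hashTenant(tenantId, range.high - range.low + 1);
}
```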
As an aside, one thing I've dealt with in the past is that SF remoting limits the size of the messages sent and throws if a message is too big, so if your video files are being passed from service to service you're going to want to consider a paging strategy for inter-service communication.

Are a Logstash shipper instance and Redis required in this architecture?

I have created a demo environment using Logstash, Redis, Elasticsearch and Kibana. (http://indico.cern.ch/getFile.....
Here the Logstash shipper is reading logs from a log file which I have centralized using syslog-ng. The Logstash shipper forwards them to Redis, then to a Logstash indexer (filter), and finally to Elasticsearch.
Now I want to skip the Logstash shipper and Redis parts. Is this a good idea? Or is Redis mandatory, or required to deal with heavy load? I'm not sure about it.
In the PDF linked above I read that Logstash has little buffering and that Redis manages the flow of logs, which is why Redis is used. As Redis keeps data in memory, what happens if memory gets full? I have also read that Logstash and Elasticsearch can be quite hungry in terms of RAM usage and that the JVM options need to be properly tuned. If so, how do I tune the JVM?
Is it required to purge/rotate Elasticsearch data/indices?
So which setup is best suited for heavy load? I want to parse logs such as system (OS and daemon) logs, syslog, web server logs (Apache, lighttpd), application server logs (Tomcat), database server logs (MySQL) and some application logs (through log files).
Please give your suggestions for improvement. Thanks!
Kindly find the following link for the architecture image:
(http://a.disquscdn.com/uploads/mediaembed/images/709/3604/original.jpg)
In the setup you describe Redis should not be required; using syslog-ng to centralise the log files serves the same purpose as Redis does when multiple shippers are used.
It might be necessary to prune Elasticsearch indices to reduce disk space requirements. This will depend on how quickly your Elasticsearch data are growing, how much disk space you have available and how long you need the logs to remain searchable.
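For example, a hedged sketch of age-based pruning, assuming daily indices named like logstash-YYYY.MM.DD and the official Elasticsearch JavaScript client (tools such as Curator or index lifecycle management do the same job without custom code):

```typescript
import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });
const RETENTION_DAYS = 30; // how long logs stay searchable; adjust to your disk budget

// Assumes daily indices named like "logstash-2016.01.31" (Logstash's default pattern).
async function pruneOldIndices(): Promise<void> {
  const indices = await client.cat.indices({ index: "logstash-*", format: "json" });
  const cutoff = Date.now() - RETENTION_DAYS * 24 * 60 * 60 * 1000;

  for (const idx of indices) {
    const name = idx.index!;
    const datePart = name.replace("logstash-", "").replace(/\./g, "-"); // 2016.01.31 -> 2016-01-31
    if (new Date(datePart).getTime() < cutoff) {
      await client.indices.delete({ index: name });
      console.log(`deleted old index ${name}`);
    }
  }
}

pruneOldIndices().catch(console.error);
```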
I can't advise on JVM tuning.

Resources