Logstash input event size calculation

I want to monitor Logstash event size per minute for a particular event source.
I am collecting events from multiple applications and want to track how much data, in bytes, each application is sending to Logstash.
I am able to count the number of events per application, but I am stuck on the size/volume metric.
Is there any way to achieve this in Logstash?
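
One possible approach, sketched in plain Ruby (the language embedded by Logstash's ruby filter): serialize each event to JSON, take the byte size, and accumulate the total per application and per minute. The `EventSizeTracker` class and the `application` field name are hypothetical, and the JSON length is only an approximation of the bytes originally sent on the wire.

```ruby
require 'json'
require 'time'

# Hypothetical helper: accumulates serialized event size per application per minute.
class EventSizeTracker
  def initialize
    # Keys are [application, minute_bucket]; values are total bytes seen.
    @bytes = Hash.new(0)
  end

  # event: a Hash with an "application" field (assumed name)
  # and an ISO 8601 "@timestamp" string.
  # Returns the size in bytes that was added for this event.
  def record(event)
    minute = Time.iso8601(event.fetch('@timestamp')).utc.strftime('%Y-%m-%dT%H:%M')
    size = JSON.generate(event).bytesize
    @bytes[[event.fetch('application'), minute]] += size
    size
  end

  # Total bytes recorded for one application in one minute bucket.
  def bytes_for(application, minute)
    @bytes[[application, minute]]
  end
end
```

Inside Logstash the same calculation would have to be wired into a filter and exported as a metric; that wiring is not shown here.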

Related

communication filebeat --> logstash: messages wrong order

I am trying to feed a CSV data file to Logstash using Filebeat. Unfortunately the messages arrive out of order. Is there any way to correct this?
Could this be caused by TCP or by the pipeline? On startup Logstash logged: logstash.javapipeline / pipeline_id=>"main", "pipeline.workers"=>8
I tried:
filebeat - output to console - pass
filebeat - output to logstash (localhost) - logstash w/o filter; output to stdout - fail (wrong order of messages)

By default, order is not guaranteed in Logstash: the events in a batch can be reordered during filter processing, and some events can be processed faster than others.
If you need your events to stay in order, you have to set pipeline.workers to 1, which means that only one CPU will be used to process your messages.
Also, set pipeline.ordered to auto in logstash.yml.
With pipeline.workers set to 1, Logstash processes the events in the order they are received, but because it uses only one CPU, performance can suffer if you have a high rate of events per second.
The Logstash documentation has a section on ordering events.
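
The effect of the worker count can be illustrated with a small simulation in plain Ruby. This is not Logstash code: `run_pipeline` is a hypothetical function in which worker threads pull events from a shared queue, spend a random amount of time on each, and append the result to a shared output list.

```ruby
# Simulated pipeline: each worker thread pulls events from a shared queue
# and appends them to the output once "filtering" is done.
def run_pipeline(events, workers)
  queue = Queue.new
  events.each { |e| queue << e }
  workers.times { queue << :stop }      # one stop marker per worker

  output = []
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      while (event = queue.pop) != :stop
        sleep(rand * 0.001)             # stand-in for variable filter time
        lock.synchronize { output << event }
      end
    end
  end
  threads.each(&:join)
  output
end

events = (1..50).to_a
single = run_pipeline(events, 1)   # one worker: output order equals input order
multi  = run_pipeline(events, 8)   # eight workers: same events, order may differ
```

With one worker the output always matches the input order. With eight workers every event still arrives, but nothing guarantees the order.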

Stream Analytics query hits size limit

I'm new to Azure Stream Analytics. I have an Event hub as input source and now I'm trying to execute a simple query on this stream. An example query is like this:
    SELECT
        count(*)
    INTO [output1]
    FROM
        [input1] TIMESTAMP BY Time
    GROUP BY TumblingWindow(second, 10)
So I want to count the events which arrived within a certain time frame.
When executing this query, I always get the following error:
Request exceeded maximum allowed size limit
I have already narrowed the time window, and I'm certain that the number of events within it is not very large (a few hundred at most), so I'm not sure how to avoid this error.
Do you have a hint?
Thanks!

Request exceeded maximum allowed size limit
This error (I believe it should be more explicit) indicates that you exceeded the Azure Stream Analytics resource and object limits.
It's not just about quantity, it's also about size. Please check the size of your source inputs, or try to reduce the window size and test again.
1. Does the record size of the source query mean that one event can only have 64 KB, or does this parameter mean 64 K events?
It means the size of one event should be below 64 KB.
Is there a possibility to use Stream Analytics to select only certain
subfields of the event or is the only way to reduce the event size
before it is sent to the event hub?
As far as I know, ASA only collects data in order to process it, so the size depends entirely on the source side and on your query. Since you need to use COUNT, I'm afraid you have to do something on the Event Hub side. Here is my suggestion:
Use an Azure Function with an Event Hub trigger. When an event streams into the Event Hub, the function is triggered, picks only some of the key-value pairs, and saves them into another Event Hub namespace (just to reduce the size of the source event). Since you only need to COUNT records, I think this will work for you.
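
The trimming step itself is simple. A minimal sketch in plain Ruby rather than in an actual Azure Function: `trim_event` and `within_limit?` are hypothetical helpers, the field names are chosen by the caller, and the 64 KB figure is the per-event limit quoted above.

```ruby
require 'json'

MAX_EVENT_BYTES = 64 * 1024  # per-event limit quoted in the answer above

# Keeps only the listed keys and drops everything else.
def trim_event(event, keep_keys)
  event.select { |key, _value| keep_keys.include?(key) }
end

# True when the JSON-serialized event is below the size limit.
def within_limit?(event)
  JSON.generate(event).bytesize < MAX_EVENT_BYTES
end

# Example: a large payload is dropped, the timestamp used by the query is kept.
big_event = { 'Time' => '2020-01-01T00:00:00Z', 'id' => 1, 'payload' => 'x' * 70_000 }
small_event = trim_event(big_event, ['Time', 'id'])
```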

Really big retrieval lag for Logstash Kafka inputs producing data irregularly

I'm using logstash 2.4 with kafka input 5.1.6. In my config I created a field called input_lag in order to monitor how much time it takes logstash to process logs:
ruby {
  code => "event['lag_seconds'] = (Time.now.to_f - event['@timestamp'].to_f)"
}
I listen to several Kafka topics from a single Logstash instance. For the topics that produce logs regularly everything is OK and the lag is small (several seconds). However, for the topics that produce a small number of logs irregularly I get really big lags, sometimes tens of thousands of seconds.
My configuration for the Kafka input is as follows:
input {
  kafka {
    bootstrap_servers => "kafka-broker1:6667,kafka-broker2:6667,kafka-broker3:6667"
    add_field => { "environment" => "my_env" }
    topics_pattern => "a|lot|of|topics|like|60|of|them"
    decorate_events => true
    group_id => "mygroup1"
    codec => json
    consumer_threads => 10
    auto_offset_reset => "latest"
    auto_commit_interval_ms => "5000"
  }
}
The logstash instance seems healthy, as logs from other topics are being retrieved regularly. I've checked and if I connect to Kafka using its console consumer the delayed logs are there. I've also thought that it might be a problem with too many topics being served by a single logstash instance and extracted those small topics to separate logstash instances but the lag is exactly the same, so it's not the issue.
Any ideas? I suspect that logstash might be using some exponential delay for log retrieval, but have no idea how to confirm and fix that.

Some information is still missing:
What is the Kafka client version?
What is the content of @timestamp?
What is the order of the filters? Is ruby the last one?
"The delayed logs are there": does "there" mean in Kafka?
Timestamp
If no date filter changes this field, @timestamp is the time at which the log entry was read.
In this case the lag is large, so I guess a date filter is used and @timestamp here is the time when the log was generated.
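
The distinction matters for the lag calculation. A minimal sketch in plain Ruby, with a Hash standing in for the event: `read_time` is a hypothetical field that would hold a copy of the read time saved before any date filter overwrites @timestamp, and the timestamps are made-up values.

```ruby
require 'time'

# Seconds elapsed between the time stored in `field` and `now`.
def lag_seconds(event, field, now)
  now.to_f - Time.iso8601(event.fetch(field)).to_f
end

now = Time.iso8601('2020-01-01T12:00:10Z')
event = {
  '@timestamp' => '2020-01-01T09:00:00Z',  # set by a date filter: when the log was generated
  'read_time'  => '2020-01-01T12:00:08Z'   # hypothetical field: when Logstash read the entry
}

total_lag    = lag_seconds(event, '@timestamp', now)  # includes time spent before Logstash
pipeline_lag = lag_seconds(event, 'read_time', now)   # time spent inside Logstash only
```

If the two numbers differ widely, most of the delay happened before the event reached Logstash, for example because the producer wrote it to Kafka late.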
Wait Before Fetch
When the Kafka input plugin consumes messages, it waits for some time for the server to respond. This can be configured with two options:
fetch_max_wait_ms
poll_timeout_ms
You may check them in the config file.
Wait Before Filter
Logstash handles input logs in batches to improve performance, so if not enough logs arrive, it waits for some time.
pipeline.batch.delay
You may check it in logstash.yml.
Metric
Logstash itself generates metric information which, combined with Elasticsearch and Kibana, can be very handy. I suggest you give it a try.
Ref
Kafka Input
Logstash Config
ELK Monitoring

Solution for delaying events for N days

We're currently writing an application in Microsoft Azure and we're planning to use Event Hubs to handle processing of real time events.
However, after the initial processing we will have to delay further processing of the events for N days. The process will work like this:
Event triggered -> Place event in Event Hub -> Event gets fetched from Event Hub and processed -> Event should be delayed for X days -> Event gets further processed (the last two steps might be a loop)
How can we achieve this delay of further event processing without using polling or similar strategies? One idea is to use Azure Queues and their visibility timeout, but 7 days is the supported maximum according to the documentation, and our business demands are in the range of 1-3 months at most. The number of events in our system should be at most 10k per day.
Any ideas would be appreciated, thanks!

As you already mentioned, Event Hubs retains only a 7-day window of data.
Event Hubs are typically used as real-time telemetry data pipelines where data seek performance is critical. In 99.9% of use cases, our users typically need only the last couple of hours of data, if not seconds.
However, if you still need to re-analyze the data some time after the real-time processing is over, for example to run a Hadoop job on last month's data, our seek pattern and store are not optimized for it. We recommend forwarding the messages to other data archival stores that are specialized for big-data queries.
Because data archival is something most of our customers naturally ask for, we are releasing a new feature which automatically archives the data in Avro format into Azure Storage.

Logstash File Input Latency on Linux

Logstash is running.
How long does it take from adding a single line to a log file until Logstash recognizes the new line and starts to transform and output it?
With a simple Bash script I measure from 99 ms up to 800 ms, including a transformation. It's clear that the latency depends on the Logstash transformation, the disk, the OS and the CPU. But how does Logstash recognize the file change? Is there an internal timer? Does Logstash poll the file?

Logstash's file input polls the files being watched at the interval set in the stat_interval parameter, which currently (Logstash 1.5) defaults to 1, i.e. every second.
In other words, assuming that
Logstash isn't behind on reading any of the log files monitored by a particular file input and
the Logstash process isn't CPU-starved (it usually runs at priority 19, so heavy CPU usage by other processes could cause scheduling delays),
new events will on average get picked up within 500 ms and in the worst case within 1000 ms.
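
The 500 ms and 1000 ms figures follow from the polling interval alone. A small sketch in plain Ruby, assuming a poller that checks the file at every multiple of the interval and writes that arrive evenly spread across one interval; `pickup_delay` is a hypothetical helper, not part of Logstash.

```ruby
# Delay between a line being written at `write_time` and the next poll,
# for a poller that checks the file every `interval` seconds.
def pickup_delay(write_time, interval)
  interval - (write_time % interval)
end

interval = 1.0  # the stat_interval default mentioned above, in seconds

# Writes spread evenly across one polling interval.
samples = (0...1000).map { |i| i / 1000.0 }
delays = samples.map { |t| pickup_delay(t, interval) }

average = delays.sum / delays.size  # close to half the interval
worst = delays.max                  # a write just after a poll waits a full interval
```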
