issue having logstash read a file and output to both stdout and another file - logstash

I have a project I am working on and wanted to try to hook it up to the ELK stack beginning with logstash. Essentially I have python writing this to a file named stockLog:
{'price': 30.98, 'timestamp': '2015-08-03 09:51:54', 'symbol':'FORTY',
'ceiling': Decimal('31.21'), 'floor': Decimal('30.68')}
I have logstash installed and (ideally) ready to run. My logstash.conf file looks like this:
input {
  file {
    path => "/home/test001/stockLog"
    start_position => beginning
  }
}
output {
  stdout {}
  file {
    path => "/home/test001/testlog"
  }
}
My goal is to see how logstash is going to read the python dictionary before I install Elasticsearch and start keeping data. Even though logstash has a lot of formatting options, I would like my python script to do the heavy lifting and put the data in a format that is easiest to work with downstream.
My problem is that no matter what I change in the logstash.conf file I can't get anything to print to my terminal showing what logstash is doing. I get no errors but when I execute this command:
test001#test001:~$ sudo /opt/logstash/bin/logstash -f /opt/logstash/logstash.conf
I get a message saying logstash has started correctly and a prompt for typing into my terminal, but no stdout showing what it did (if anything) with the dictionary in my stockLog file.
So far I have tried the file path with and without quotes around it. I added the file output you can see above to check whether anything gets written to that file even when nothing shows on my terminal (it does not), and I tried codec => rubydebug in case logstash just needed to know what format I wanted to see. Nothing shows any sign that logstash is doing anything.
Any help would be greatly appreciated, and if more information is needed, by all means let me know.
Thanks!

In the end the answer turned out to be three steps.
As mentioned above, I needed to stop overwriting the file and append to it instead.
I used the json filter to break the data down the way I wanted to see it. Once the dictionary was converted to JSON with json.dumps in python, the logstash json filter handled the data easily.
I realized it is pointless to try to see what logstash will do before putting the data into elasticsearch, because it is extremely easy to delete the information if it isn't shaped right (I am too indoctrinated by permanent indexes in splunk, sorry guys).
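The three steps above can be sketched in Python. This is a minimal illustration, not the asker's actual script: the field names come from the question, `default=str` is just one common way to make `Decimal` values JSON-serializable, and a temp-file path stands in for /home/test001/stockLog:

```python
import json
import os
import tempfile
from decimal import Decimal

def write_tick(path, tick):
    """Append one JSON line so Logstash's file input sees a new event."""
    line = json.dumps(tick, default=str)  # Decimal isn't JSON-serializable; str() it
    with open(path, "a") as f:            # "a" appends instead of overwriting
        f.write(line + "\n")              # each event must end with a newline

tick = {
    "price": 30.98,
    "timestamp": "2015-08-03 09:51:54",
    "symbol": "FORTY",
    "ceiling": Decimal("31.21"),
    "floor": Decimal("30.68"),
}
path = os.path.join(tempfile.mkdtemp(), "stockLog")
write_tick(path, tick)
```

Each appended line is then a complete JSON document that the logstash json filter can parse on its own.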

Related

what are the usual problems that we face with sincedb in logstash

I am using the ELK stack, working with the file input plugin in logstash.
At first I used file*.txt to match files by pattern.
Later I switched to masterfile.txt, a single file containing the data of all the matching files.
Now I am going back to file*.txt, and here I see the problem: in kibana I only see data from after file*.txt replaced masterfile.txt, not the history.
I feel I must understand the behavior of logstash's sincedb here, and I would also like a possible solution to get the history data.
Logstash stores the position of the last byte it read from each input file in the file referenced by sincedb_path. On the next run, Logstash resumes reading the input file from that stored position.
Take 'start_position' and the name of the index (in the Logstash output) into account if you want to create a new index with different logs.
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-sincedb_path
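For intuition, here is a small Python sketch of the bookkeeping sincedb does. The entry layout assumed here (inode, device major, device minor, byte offset) matches older file-input versions but is an assumption; check the docs linked above for your release. Deleting the sincedb file while Logstash is stopped is the usual way to force a full re-read and recover history:

```python
def parse_sincedb_line(line):
    """Parse one sincedb entry, assuming the classic 4-column layout:
    inode, device major, device minor, last byte offset read."""
    inode, major, minor, pos = line.split()[:4]
    return {"inode": int(inode), "major": int(major),
            "minor": int(minor), "position": int(pos)}

# hypothetical entry: Logstash would resume this file at byte 1024
entry = parse_sincedb_line("262626 0 51713 1024")
```

Because the entry is keyed by inode rather than filename, replacing file*.txt with masterfile.txt and back can leave positions pointing past data you still want read.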

Logstash to output events in Elasticsearch bulk API data format

Is it possible to have Logstash output events in the Elasticsearch bulk API data format?
The idea is to do some heavy parsing on many machines (without direct connectivity to the ES node) and then feed the data manually into ES.
Thanks for the help.
Maybe you need to change the flush_size in Logstash:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-flush_size
Or write the events to a file using the json codec and afterwards load them directly into elasticsearch:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-file.html
Logstash is a single-line type of system, and the bulk format is a multi-line format. Here are two ideas:
1) see if the file{} output's message_format can contain a newline. This would allow you to output the metadata line and then the data line.
2) use logstash's clone{} to make a copy of each event. In the "original" event, use the file{} output with a message_format that looks like the first line of the bulk output (index, type, id). In the cloned copy, the default file{} output might work (or use message_format with the exact format you need).
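Either way, the target is the same framing: for each event, an action/metadata line followed by the document line. A Python sketch of that bulk framing (the index and type names are hypothetical; `_type` matches the Elasticsearch versions of that era):

```python
import json

def to_bulk(events, index="logs", doc_type="event"):
    """Render events in Elasticsearch bulk format: one action/metadata
    line, then the document line, with a mandatory trailing newline."""
    lines = []
    for ev in events:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(ev))
    return "\n".join(lines) + "\n"

payload = to_bulk([{"msg": "hello"}, {"msg": "world"}])
```

The resulting file can then be POSTed to the _bulk endpoint with curl from a machine that does have connectivity to ES.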

Processing large json's using logstash - Not working and not printing any error

I started using logstash (on windows); my main use case is passing logstash a large json (10 mb), filtering the json somehow and writing it out to elastic search.
For now I don't really care about the json filtering (I will care after I get this to work). I want the file to pass through logstash and reach my elastic search.
The client who feeds logstash uses a tcp connection.
My logstash simple configuration file looks like:
input {
  tcp {
    port => 7788
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
  }
  stdout {
    codec => rubydebug
  }
}
This does work for me on small json inputs like:
{"foo":"bar", "bar": "foo"}
I see logstash working and passing the data to elastic search, and everything's ok.
Also, when using the default codec ("text") it worked, but not as expected.
My problem starts when the inputs are large jsons.
Assuming I have a 10 mb json, what do I need to do so logstash can handle it over tcp as json? Should the file be indented or not? What encoding should I use before I convert it into bytes? What codec and settings should my logstash have?
BTW, when I use curl and push the large json directly into elastic search, it works, so there is nothing wrong with the json itself.
Is there any way to get better tracing, or at least to know why I fail?
I found out that the problem wasn't the length but the lack of a newline - So all I needed to do was to add a newline to my log files.
BTW, there is no 4K length limit - At least not when working with TCP.
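A Python sketch of the client side under that fix: each event is serialized and newline-terminated before being sent to the tcp input (port 7788 from the config above). `frame_event` and `send_event` are helper names invented here for illustration:

```python
import json
import socket

def frame_event(event):
    """Serialize one event as a newline-terminated JSON line; the json
    codec on the tcp input treats each such line as one event."""
    return (json.dumps(event) + "\n").encode("utf-8")

def send_event(host, port, event):
    """Ship one framed event to Logstash's tcp input."""
    with socket.create_connection((host, port)) as s:
        s.sendall(frame_event(event))
```

Note the JSON itself must not be pretty-printed across multiple lines, since the newline is what delimits events on the wire.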

Logstash, grok filter not working for fixed length fields

I am a newbie to logstash. I have an input file with fixed-length fields and a logstash config file with the regexp configured as shown below.
Contents of my logstash configuration file first-pipeline.conf:
# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
  file {
    path => "/Users/priya/sample.log"
    start_position => beginning
  }
}
filter {
  grok {
    match => ["message", "(?<RECORD_CODE>.{1})(?<SEQUENCE_NUMBER>.{6})(?<REG_NUMBER>.{12})(?<DATA_TYPE>.{3})"]
  }
}
output {
  stdout {}
}
Content of my sample.log file:
50000026311000920150044236080000000026
5000003631100092015005423608000000002
5000004631100092015006615054962
The output I get from logstash is:
priyas-MacBook-Pro:bin priya$ ./logstash -f first-pipeline.conf
Default settings used: Filter workers: 2
Logstash startup completed
Could someone please help me debug the issue and get it to working?
Thanks and regards,
Priya
I assume the problem in your case is not the grok expression itself but the way the file input is reading your test file.
The file input remembers where it last read from a logfile and continues reading from that position on subsequent runs (it stores this index in a special file called sincedb). start_position => "beginning" only works the first time you start logstash; on subsequent runs it starts where it last ended, meaning you won't see any new lines in your console unless you a) add new lines to your file or b) manually delete the sincedb file (sincedb_path => null was not working under windows, at least when I last tried).
So imho you should first make sure that your grok is working. To do so, simply add the stdin input to your input section like this:
input {
  stdin {
  }
  file {
    path => "/Users/priya/sample.log"
    start_position => beginning
  }
}
Now you can create logstash events manually by typing in your console and pressing enter. These events will be parsed as regular logstash events and you will see the resulting json in your console as well (that's done by the stdout output filter).
After you have made sure your grok is working, you can check whether logstash is picking up the file contents as you would expect. Restart logstash and add a new line of data to your /Users/priya/sample.log file (don't forget the newline/CR at the end, otherwise it won't be picked up). If logstash picks up the new line, it should appear in your console output (because you added the stdout output).
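Since grok's named captures are just regex named groups, the fixed-width pattern from the question can also be prototyped directly with Python's re module before touching the pipeline at all:

```python
import re

# same layout as the grok pattern in the question: 1-char record code,
# 6-char sequence number, 12-char reg number, 3-char data type
PATTERN = re.compile(
    r"(?P<RECORD_CODE>.{1})"
    r"(?P<SEQUENCE_NUMBER>.{6})"
    r"(?P<REG_NUMBER>.{12})"
    r"(?P<DATA_TYPE>.{3})"
)

# first sample line from sample.log
fields = PATTERN.match("50000026311000920150044236080000000026").groupdict()
```

If the slices come out right here, the grok expression itself is fine and the remaining suspect is the file input / sincedb behavior described above.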

Debugging new logstash grok filters before full use

I have been following this guide:
http://deviantony.wordpress.com/2014/06/04/logstash-debug-configuration/
Which I'm hoping will help me test my logstash filters to see if I get the desired output before using them full time.
As part of the guide it tells you to set up an input, an output, and then a filter file. The input seems to work fine:
input {
stdin { }
}
The output is this:
output {
  stdout {
    codec => json
  }
  file {
    codec => json
    path => /tmp/debug-filters.json
  }
}
I am getting the following error when I try to run the logstash process (here I've run it with --configtest as the error advises me to try that, but it doesn't give any more information):
# /opt/logstash/bin/logstash -f /etc/logstash/debug.d -l /var/log/logstash/logstash-debug.log --configtest
Sending logstash logs to /var/log/logstash/logstash-debug.log.
Error: Expected one of #, ", ', -, [, { at line 21, column 17 (byte 288) after output {
stdout {
codec => json
}
file {
codec => json
path =>
I have tried removing the file section from my output and then the logstash process runs, but when I paste my log line into the shell I don't see the log entry broken down into the components I am expecting the grok filter to produce. All I get is:
Oct 30 08:57:01 VERBOSE[1447] logger.c: == Manager 'sendcron' logged off from 127.0.0.1
{"message":"Oct 30 08:57:01 VERBOSE[1447] logger.c: == Manager 'sendcron' logged off from 127.0.0.1","#version":"1","#timestamp":"2014-10-31T16:09:35.205Z","host":"lumberjack.domain.com"}
Initially I was having a problem with a new grok filter, so I have now tried with an existing filter that I know works (as shown above, it is an Asterisk 1.2 filter) and that has been generating entries in elasticsearch for some time.
I have tried touching the json file mentioned in the output, but that hasn't helped.
When I tail the logstash-debug.log now, I just see the same error that is being written to my shell.
Any suggestions on debugging grok filters would be appreciated; if I have missed something blindingly obvious, apologies, I've only been working with ELK & grok for a couple of weeks and I might not be doing this in the most sensible way. I was hoping to drop example log entries into the shell and get the JSON-formatted logstash entry in my console, so I could see whether my filter was working as I hoped, tagging entries up as they will be displayed in kibana at the end. If there is a better way to do this, please let me know.
I am using logstash 1.4.2
As far as debugging a grok filter goes, you can use this link (http://grokdebug.herokuapp.com/). It has a very comprehensive pattern detector, which is a good start.
As for your file output, you need "" around your path. Here is the example I use in production, and here is the documentation on file output: http://logstash.net/docs/1.4.2/outputs/file#path
output {
  stdout {
    codec => rubydebug
  }
  file {
    codec => "plain"
    path => "./logs/logs-%{+YYYY-MM-dd}.txt"
  }
}
The Grokconstructor is a similar Grok debugger to Grokdebug, which #user3195649 mentioned. I like its random examples.
