Processing large JSONs using Logstash - not working and not printing any error

I started using Logstash (on Windows); my main use case is passing Logstash a large JSON (10 MB), filtering the JSON somehow, and writing it out to Elasticsearch.
For now, I don't really care about the JSON filtering (I will care once I get this working). I want the file to pass through Logstash and reach my Elasticsearch.
The client that feeds Logstash uses a TCP connection.
My simple Logstash configuration file looks like this:
input {
  tcp {
    port => 7788
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
  }
  stdout {
    codec => rubydebug
  }
}
This works for me on small JSON inputs like:
{"foo":"bar", "bar": "foo"}
I see Logstash working and passing the data to Elasticsearch, and everything is OK.
Also, when using the default codec ("text") it worked, but not as expected.
My problem starts when the inputs are large JSONs.
Assuming I have a 10 MB JSON - what do I need to do with it so that Logstash can handle it over TCP as JSON? Should the file be indented or not? What encoding should I use before I convert it into bytes? What codec/settings should my Logstash have?
BTW, when I use curl and send the large JSON directly to Elasticsearch, it works - so there is no problem with the JSON itself.
Is there any way I can get better tracing, or at least know why I fail?

I found out that the problem wasn't the length but the lack of a newline - so all I needed to do was add a newline to my log files.
BTW, there is no 4K length limit - at least not when working with TCP.
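For illustration, a minimal client sketch of that fix in Python, assuming Logstash is listening on localhost:7788 with codec => "json" as in the config above (the payload here is just the small example document):

import json
import socket

# Any JSON document works, including a large one; the important part is the
# trailing "\n" that marks the end of the event for the json codec.
payload = json.dumps({"foo": "bar", "bar": "foo"})

with socket.create_connection(("localhost", 7788)) as sock:
    sock.sendall(payload.encode("utf-8") + b"\n")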

Related

logstash refresh lookup file when using translate function

I have a YAML file which I use with the "translate" filter to do a lookup.
What it does is translate a string like "superhost.com" to "found".
My problem is that if I add more entries, these entries are not reflected.
For example:
I add an "ultrahost.com" entry to the YAML file while Logstash is still running. Incoming logs with "ultrahost.com" will not be translated to "found". This only works after I have restarted the Logstash script.
There is a refresh_interval parameter to the translate plugin that can be used to specify how often to re-read the file. The default is 300 seconds (5 minutes). You can lower that to whatever interval matches how often the file is updated.
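A minimal sketch of such a filter, assuming the lookup key is in a field called "hostname" and the dictionary lives at /etc/logstash/lookup.yml (both names are placeholders; exact option names can vary between translate plugin versions):

filter {
  translate {
    field => "hostname"
    destination => "host_status"
    dictionary_path => "/etc/logstash/lookup.yml"
    refresh_interval => 60   # re-read the YAML file every 60 seconds
  }
}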

logstash custom log that has xml tags inside

I have a custom log file that contains plain text as well as XML tags. How do I capture these in separate fields? Here is what it looks like:
1/10/2017 4:16:35 AM :
Error thrown is:
No Error
Request sent is:
SCEO415154712
Response received is:
SCEO4151547trueTBAfalse7169-1TBAfalse2389-1
1/10/2017 4:16:35 AM :
Error thrown is:
No Error
*************************************************************************
Request sent is:
<InventoryMgmtRequest xmlns="http://www.af.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:MsgHeader MessageType="FIXORD" MsgDate="10.01.2017 04:16:32" SystemOfOrigin="ISCS_DE" CommunityID="SG888" xmlns:ns0="http://www.av.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:OrderID>SCEO4151547</ns0:OrderID><ns0:ReservationID></ns0:ReservationID><ns0:CRD></ns0:CRD></ns0:MsgHeader><ns0:MsgBody xmlns:ns0="http://www.ab.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:Product Sku="CH562EE" Qty="1" IsExpress="false" IsTangible="true" Region="EMEA" Country="DE"><ns0:ProdType></ns0:ProdType><ns0:LineItemNum>1</ns0:LineItemNum><ns0:JCID></ns0:JCID></ns0:Product><ns0:Product Sku="CH563EE" Qty="1" IsExpress="false" IsTangible="true" Region="EMEA" Country="DE"><ns0:ProdType></ns0:ProdType><ns0:LineItemNum>2</ns0:LineItemNum><ns0:JCID></ns0:JCID></ns0:Product></ns0:MsgBody></InventoryMgmtRequest>
*************************************************************************
Response received is:
<ns0:InventoryMgmtResponse xmlns:ns0="http://www.ad.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtResponse"><ns0:MsgHeader MsgDate="10.01.2017 04:16:32" MessageType="FIXORD"><ns0:OrderID>SCEO4151547</ns0:OrderID><ns0:ReservationID /><ns0:ReadyToRelease>true</ns0:ReadyToRelease></ns0:MsgHeader><ns0:MsgBody><ns0:Product SKU="CH562EE" LSPSKU="9432GFT" OutOfStock="false" FulfillmentSite="00ZF" SKUExist="true" Region="EMEA" Country="DE" IsTangible="true"><ns0:EDD>TBA</ns0:EDD><ns0:FutureUsed>false</ns0:FutureUsed><ns0:CurrentQty>7169</ns0:CurrentQty><ns0:FutureQty>-1</ns0:FutureQty></ns0:Product><ns0:Product SKU="CH563EE" LSPSKU="9432GFU" OutOfStock="false" FulfillmentSite="00ZF" SKUExist="true" Region="EMEA" Country="DE" IsTangible="true"><ns0:EDD>TBA</ns0:EDD><ns0:FutureUsed>false</ns0:FutureUsed><ns0:CurrentQty>2389</ns0:CurrentQty><ns0:FutureQty>-1</ns0:FutureQty></ns0:Product></ns0:MsgBody></ns0:InventoryMgmtResponse>
*************************************************************************
Also, I don't want to capture the line separators (the lines full of **** at the end) in my grok fields.
There is no simple answer here, I'm afraid. Logstash and other log processing tools work line by line: each line is an event. If your events span more than one line you can use the multiline codec, which is pretty powerful, but in my experience you are better off trying to get the logs onto single lines at the source; this makes it much easier to write a pattern and get the process working reliably.
The issues you have here are many, but if, for example, one of your messages is retransmitted for some reason (sent via TCP) or simply lost (sent via UDP), your pattern will break because part of the message Logstash is expecting is not there.
The best thing you can do, in my opinion, is to try to change the logging process to save a single line per event. Most logging tools should allow this with the right config options. Ideally, get your application to log in JSON format (assuming you're processing logs to save them in Elasticsearch); this involves the lowest overhead on the Logstash server (since Elasticsearch stores documents as JSON). All you would then need to do is pass each event/log line to the json filter, and the fields are generated from the names your application gives them, as in the sketch below.
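A minimal sketch of that last step, assuming the whole JSON document arrives in the default message field:

filter {
  json {
    source => "message"   # parse the JSON the application logged on this line
  }
}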

Logstash to output events in Elasticsearch bulk API data format

Is it possible to have Logstash output events in the Elasticsearch bulk API data format?
The idea is to do some heavy parsing on many machines (without direct connectivity to the ES node) and then feed the data into ES manually.
Thanks for the help.
Maybe you need to change the flush_size in Logstash to a value that suits you:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-flush_size
Or write the events to a file using the json codec and afterwards load them directly into Elasticsearch:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-file.html
Logstash is a single-line type of system, and the bulk format is a multi-line format. Here are two ideas:
1) See if the file{} output's message_format can contain a newline. This would allow you to output the metadata line and then the data line.
2) Use Logstash's clone{} to make a copy of each event. In the "original" event, use the file{} output with a message_format that looks like the first line of the bulk output (index, type, id). In the cloned copy, the default file{} output might work (or use message_format with the exact format you need). A rough sketch of this idea follows below.
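A rough sketch of idea 2, assuming the older file output's message_format option and placeholder index/type names; the interleaving of the two outputs into one file is not guaranteed, so treat this purely as a starting point:

filter {
  clone {
    clones => ["bulk_header"]   # the cloned event's type is set to "bulk_header"
  }
}
output {
  if [type] == "bulk_header" {
    file {
      path => "/tmp/bulk.data"
      # placeholder action line; adjust _index/_type to your needs
      message_format => '{"index":{"_index":"logs","_type":"event"}}'
    }
  } else {
    file {
      path => "/tmp/bulk.data"
      codec => "json"   # the document line itself
    }
  }
}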

issue having logstash read a file and output to both stdout and another file

I have a project I am working on and wanted to try hooking it up to the ELK stack, beginning with Logstash. Essentially I have Python writing this to a file named stockLog:
{'price': 30.98, 'timestamp': '2015-08-03 09:51:54', 'symbol':'FORTY',
'ceiling': Decimal('31.21'), 'floor': Decimal('30.68')}
I have logstash installed and (ideally) ready to run. My logstash.conf file looks like this:
input {
  file {
    path => "/home/test001/stockLog"
    start_position => beginning
  }
}
output {
  stdout {}
  file {
    path => "/home/test001/testlog"
  }
}
My goal is to be able to see how Logstash is going to read the Python dictionary before I install Elasticsearch and start keeping data. Essentially, even though Logstash has a lot of formatting options, I would like my Python script to do the heavy lifting and put the data in a format that is easiest to work with downstream.
My problem is that no matter what I change in the logstash.conf file, I can't get anything printed to my terminal showing what Logstash is doing. I get no errors, but when I execute this command:
test001#test001:~$ sudo /opt/logstash/bin/logstash -f /opt/logstash/logstash.conf
I get a message saying Logstash has started correctly and a prompt to type into my terminal, but no stdout showing what it did, if anything, with the dictionary in my stockLog file.
So far I have tried with and without "" around the file name. I added the file output you can see above to check whether it actually writes anything to that file even though I don't see output in my terminal (it does not), and I tried using codec => rubydebug to see if Logstash just needed an idea of the format I wanted to see. Nothing shows me any sign that Logstash is doing anything.
Any help would be greatly appreciated, and if more information is needed, by all means let me know.
Thanks!
In the end the answer turned out to be three steps:
1) As mentioned above, I needed to stop overwriting the file and just append to it instead.
2) I used the json filter to have the data broken down the way I wanted to see it. Once converted into JSON with json.dumps in Python, the Logstash json filter handled the data easily.
3) I realized that it is pointless to try to see what Logstash is going to do before putting the data into Elasticsearch, because it is extremely easy to remove the information if it isn't shaped right (I am too indoctrinated by permanent indexes in Splunk - sorry guys).
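A minimal Python sketch of steps 1 and 2, using the fields from the question; serializing the Decimal values with default=str is an assumption about how the dictionary gets converted:

import json
from decimal import Decimal

quote = {"price": 30.98, "timestamp": "2015-08-03 09:51:54", "symbol": "FORTY",
         "ceiling": Decimal("31.21"), "floor": Decimal("30.68")}

# Append (don't overwrite) one JSON object per line so the file input picks up new lines.
with open("/home/test001/stockLog", "a") as f:
    f.write(json.dumps(quote, default=str) + "\n")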

Debugging new logstash grok filters before full use

I have been following this guide:
http://deviantony.wordpress.com/2014/06/04/logstash-debug-configuration/
Which I'm hoping will help me test my logstash filters to see if I get the desired output before using them full time.
As part of the guide it tells you to set up an input, an output, and then a filter file. The input seems to work fine:
input {
  stdin { }
}
The output is this:
output {
  stdout {
    codec => json
  }
  file {
    codec => json
    path => /tmp/debug-filters.json
  }
}
I am getting the following error when I try to run the logstash process (here I've run it with --configtest as the error advises me to try that, but it doesn't give any more information):
# /opt/logstash/bin/logstash -f /etc/logstash/debug.d -l /var/log/logstash/logstash-debug.log --configtest
Sending logstash logs to /var/log/logstash/logstash-debug.log.
Error: Expected one of #, ", ', -, [, { at line 21, column 17 (byte 288) after output {
stdout {
codec => json
}
file {
codec => json
path =>
I have tried removing the file section from my output, and I can get the Logstash process running, but when I paste my log line into the shell I don't see the log entry broken down into the components I am expecting the grok filter to break it into. All I get when I do that is:
Oct 30 08:57:01 VERBOSE[1447] logger.c: == Manager 'sendcron' logged off from 127.0.0.1
{"message":"Oct 30 08:57:01 VERBOSE[1447] logger.c: == Manager 'sendcron' logged off from 127.0.0.1","#version":"1","#timestamp":"2014-10-31T16:09:35.205Z","host":"lumberjack.domain.com"}
Initially I was having a problem with a new grok filter, so I have now tried with an existing filter that I know works (as shown above, it is an Asterisk 1.2 filter) and that has been generating entries in Elasticsearch for some time.
I have tried touching the JSON file mentioned in the output, but that hasn't helped.
When I tail logstash-debug.log now, I just see the error that is also being written to my shell.
Any suggestions on debugging grok filters would be appreciated. If I have missed something blindingly obvious, apologies - I've only been working with ELK and grok for a couple of weeks, and I might not be doing this in the most sensible way. I was hoping to be able to drop example log entries into the shell and get the JSON-formatted Logstash entry on my console, so I could see whether my filter was working as I hoped and tagging entries as they will be displayed in Kibana at the end. If there is a better way to do this, please let me know.
I am using logstash 1.4.2
As far as debugging a grok filter goes, you can use this link (http://grokdebug.herokuapp.com/). It has a very comprehensive pattern detector, which is a good start.
As for your file output, you need "" around your path. Here is the example I use in production; the documentation on the file output is at http://logstash.net/docs/1.4.2/outputs/file#path
output {
  stdout {
    codec => rubydebug
  }
  file {
    codec => "plain"
    path => "./logs/logs-%{+YYYY-MM-dd}.txt"
  }
}
The Grokconstructor is a Grok debugger similar to Grokdebug, which #user3195649 mentioned. I like its random examples.
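For completeness, a sketch of a filter block that could be dropped into the debug config; the pattern here is only a guess based on the Asterisk log line quoted in the question, not the author's actual filter:

filter {
  grok {
    # e.g. "Oct 30 08:57:01 VERBOSE[1447] logger.c: == Manager 'sendcron' logged off from 127.0.0.1"
    match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{WORD:level}\[%{POSINT:thread}\] %{DATA:source}: %{GREEDYDATA:msg_text}" ]
  }
}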
