Logstash to output events in Elasticsearch bulk API data format - logstash

Is is possible to have Logstash to output events in Elasticsearch bulk API data format?
The idea is to do some heavy parsing on many machines (without direct connectivity to the ES node) and then feed the data manually into ES.
Thank for the help.

Maybe if you need change the flush_size in Logstash with your value:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-flush_size
Or send metadata in file using json codec and afterload directly on elasticsearch
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-file.html

Logstash is a single-line type of system, and the bulk format is a multi-line format. Here are two ideas:
1) see if the file{} output message_format can contain a newline. This would allow you to output the meta data line and then the data line.
2) use logstash's clone{} to make a copy of each event. In the "original" event, use the file{} output with a message_format that looks like the first line of the bulk output (index, type, id). In the cloned copy, the default file{} output might work (or use the message_format with the exact format you need).

Related

Logstash Read a Property File

I am looking for a way of reading property file in logstash config file so that I can do some data transformation based on the property file value? for example I can skip processing type 1 event and send to index a, process type 2 events and sent to index 2.
If I understand your question correctly, note that logstash will read all the files in your config directory. You can put different processing blocks in different config files, which makes for a nice separation of code. Be sure that each block is wrapped in a conditional so that they don't all run for all events.

what are the usual problems that we face with sincedb in logstash

I am using ELK stack, so using file input plugin in logstash i am working on it
at first i used file*.txt to match with file pattern
later i used masterfile.txt as a single file which has the data of all matching patterns
and now i am going back to file*.txt , but here i see the problem- I am seeing the data on kibana which is the date after the file*.txt is replaced with masterfile.txt but not the history,
I feel like i must understand the behavior of sincedb logstash here
also a possible solution to get the history data
Logstash stores information about the position of the last byte read in the file that contains the logs with sincedb_path. During the execution, Logstash starts reading the input file from the mentioned position.
Take into account 'start_position' and the name of the index ( Logstash -> output) if you want to create a new index with different logs.
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-sincedb_path

Stream Analytics -not seeing the output

I use Stream Analytic to save the EventHub data into an SQL DataBase.
Even though I can see that I have both input and output requests, when I write a query to see the data from the output table I can see just 200 empty rows!! So I send data to this table, but are just NULL values
I think the problem may bethe query between input and output because my output table is empty :(. This is how I wrote it:
SELECT id,sensor,val FROM EventHubInput
Could there be another problem?
I have to mention that my EventHub is the link between a Meshlium and Azure.This is why I think my problem can be also from the frame I send from Meshlium.
I really don't know what to do. HELP ?!
You haven't specified any output.
SELECT id,sensor,val
OUTPUT YourSQLOutput
FROM EventHubInput
Stream Analytics queries' default output is output.
So if your SQL DB alias is SQLDbOutput, it won't work. You should specify it yourself:
SELECT id,sensor,val
INTO SQLDbOutput
FROM EventHubInput
The editor in Azure should tell you the names of your inputs and outputs on the left.
Also make sure your events in Event Hub contain those properties (id, sensor, val), and that the SQL DB contains columns with the same names.

logstash custom log that has xml tags inside

I have a custom log file that has plain text as well as xml tags. How do i capture these in separate fields. Here is how it looks like:
1/10/2017 4:16:35 AM :
Error thrown is:
No Error
Request sent is:
SCEO415154712
Response received is:
SCEO4151547trueTBAfalse7169-1TBAfalse2389-1
1/10/2017 4:16:35 AM :
Error thrown is:
No Error
*************************************************************************
Request sent is:
<InventoryMgmtRequest xmlns="http://www.af.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:MsgHeader MessageType="FIXORD" MsgDate="10.01.2017 04:16:32" SystemOfOrigin="ISCS_DE" CommunityID="SG888" xmlns:ns0="http://www.av.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:OrderID>SCEO4151547</ns0:OrderID><ns0:ReservationID></ns0:ReservationID><ns0:CRD></ns0:CRD></ns0:MsgHeader><ns0:MsgBody xmlns:ns0="http://www.ab.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:Product Sku="CH562EE" Qty="1" IsExpress="false" IsTangible="true" Region="EMEA" Country="DE"><ns0:ProdType></ns0:ProdType><ns0:LineItemNum>1</ns0:LineItemNum><ns0:JCID></ns0:JCID></ns0:Product><ns0:Product Sku="CH563EE" Qty="1" IsExpress="false" IsTangible="true" Region="EMEA" Country="DE"><ns0:ProdType></ns0:ProdType><ns0:LineItemNum>2</ns0:LineItemNum><ns0:JCID></ns0:JCID></ns0:Product></ns0:MsgBody></InventoryMgmtRequest>
*************************************************************************
Response received is:
<ns0:InventoryMgmtResponse xmlns:ns0="http://www.ad.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtResponse"><ns0:MsgHeader MsgDate="10.01.2017 04:16:32" MessageType="FIXORD"><ns0:OrderID>SCEO4151547</ns0:OrderID><ns0:ReservationID /><ns0:ReadyToRelease>true</ns0:ReadyToRelease></ns0:MsgHeader><ns0:MsgBody><ns0:Product SKU="CH562EE" LSPSKU="9432GFT" OutOfStock="false" FulfillmentSite="00ZF" SKUExist="true" Region="EMEA" Country="DE" IsTangible="true"><ns0:EDD>TBA</ns0:EDD><ns0:FutureUsed>false</ns0:FutureUsed><ns0:CurrentQty>7169</ns0:CurrentQty><ns0:FutureQty>-1</ns0:FutureQty></ns0:Product><ns0:Product SKU="CH563EE" LSPSKU="9432GFU" OutOfStock="false" FulfillmentSite="00ZF" SKUExist="true" Region="EMEA" Country="DE" IsTangible="true"><ns0:EDD>TBA</ns0:EDD><ns0:FutureUsed>false</ns0:FutureUsed><ns0:CurrentQty>2389</ns0:CurrentQty><ns0:FutureQty>-1</ns0:FutureQty></ns0:Product></ns0:MsgBody></ns0:InventoryMgmtResponse>
*************************************************************************
Also I don't want to capture the line separators (line full of **** at the end) in my grok fields.
There is no simple answer here I'm afraid. Logstash and other log processing tools works line by line, each line is an event. If your events span more than one line you can use the multiline codec, which is pretty powerful, but in my experience you are better off trying to get the logs on to single lines at source, this makes it so much easier to write a pattern and get the process working reliably.
The issues you have here are many, but if, for example, one of your messages (sent via TCP) is retransmitted for some reason or simply (sent via UDP) lost, your pattern will break as part of the message that logstash is expecting is not there.
The best thing you can do in my opinion is to try and change the logging process to save to a file as a single line per event. Most logging tools should allow this with the right config options. Ideally, get your application to log in json format, (assuming you're processing logs to save them in elasticsearch) this would involve the lowest overhead on the logstash server to process these logs (as elasticsearch saves them in json format). All you would then need to do is pass each event/log line to the json filter and the fields are generated by the names given to it by your application.

Processing large json's using logstash - Not working and not printing any error

I started using logstash (on windows) when my main cause of use will be passing logstash a large json (10 mb), filtering the json somehow and write it out to elastic search.
As for now, I don't really care about the json filtering (I will care after I'll get this to work). I wan't the file to pass through logstash and get to my elastic search.
The client who feeds logstash uses a tcp connection.
My logstash simple configuration file looks like:
input
{
tcp
{
port=>7788
codec=>"json"
}
}
output
{
elasticsearch
{
hosts=>"localhost:9200"
codec=>"json"
}
stdout
{
codec=>rubydebug
}
}
This does work for me on small json inputs like:
{"foo":"bar", "bar": "foo"}
I see the logstash working and passing the data to elastic search and
everything's ok.
Also, when using the default codec ("text") it worked, but not as expected.
My problem starts when the inputs are large jsons.
Assuming I have a 10 mb json - what do I need to do with it so logstash will be able to handle it over tcp as a json? Should the file be indented or not? What encoding should I use before I convert it into bytes? What codec\settings should my logstash have?
BTW, when I use curl and through the large json directly to elastic search - it works - So there are no problems with the json.
Is there any way I can get some better tracing or at least know why I fail?
I found out that the problem wasn't the length but the lack of a newline - So all I needed to do was to add a newline to my log files.
BTW, there is no 4K length limit - At least not when working with TCP.

Resources