Keywords as Fields in Logstash - logstash

I am new to ELK stack.
My requirement is to read several .log files and analyze the data in Kibana.
In the log files, a certain keyword, let's say "xyz", occurs several times.
Is there any way to create a field for this keyword ("xyz") in the Logstash conf file?
I have searched Google and YouTube and read the documentation, but the grok examples use the "WORD" pattern, which does not help because every string of letters matches "WORD".
Please help.

Related

I want to add new column that contains html files in solr indexer using nutch 1.17 version

I want to add a new column that contains the raw HTML files. What configuration changes are required? I tried the segment reader, which reads the content folder, but its output is a text file; I want to index the raw HTML into a column. How can I achieve this?
You may run into special-character issues when indexing raw HTML in Solr. In any case, first customize the index-basic plugin in Nutch. Its class is BasicIndexingFilter.java. Add the following to that class:
String htmlcontent = parse.getData();
doc.add("htmlContent", StringUtil.cleanField(htmlcontent));
After this, you also have to add an "htmlContent" field to the Solr schema. Hopefully that solves your issue.
There may be other options for this task as well.
I found another option, suggested in the comments, that works best. Use the Nutch CLI:
bin/nutch index crawldb-path -dir segments-directory -addBinaryContent -base64

Using a list for a feature in an ML model

I want to run a machine learning algorithm on some data, so I'm exporting the data into a file first.
But one of my features for the text I'm classifying is a list of tags,
and each text can have multiple tags ex. (["mystery", "thriller"]).
When I write the CSV file for exporting the data, is it recommended to write the entire list as a single feature (the "tags" feature)?
Or is it better to make a separate feature for each tag? The only problem then is that most examples will have only one tag, so the other feature columns for those will be blank.
So it seems like writing this list of tags as one feature makes the most sense, but then when parsing it for training, would I then treat every element of that list as its own feature still or no?
If you do it as a single feature just make sure to use some delimiter to separate the tags that won't occur in any of the tags, and also isn't a comma (as that will mess with the csv format), something like | would probably do fine. When you go to build your models and read in that list of tags you can then split it based on that delimiter. In Java this would look like:
String[] tagList = inputString.split("\\|"); // split() takes a regex, so the pipe must be escaped
I'm sure most languages will have a similar method to do this.
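To address the last part of the question (whether each element becomes its own feature), here is a minimal sketch of one common option: split the cell on the delimiter, then turn each row into a 0/1 vector with one slot per known tag. The class name TagFeatures and its methods are hypothetical names for illustration, and the multi-hot encoding is my assumption about what the model needs, not something stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of any library.
public class TagFeatures {

    // Split one CSV cell such as "mystery|thriller" into its tags.
    // The pipe is escaped because String.split takes a regular expression.
    public static List<String> parseTags(String cell) {
        List<String> tags = new ArrayList<>();
        for (String tag : cell.split("\\|")) {
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Collect every tag seen across all rows, in sorted order.
    public static List<String> vocabulary(List<String> cells) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String cell : cells) {
            vocab.addAll(parseTags(cell));
        }
        return new ArrayList<>(vocab);
    }

    // Encode one cell as a 0/1 vector with one slot per vocabulary tag.
    public static int[] encode(String cell, List<String> vocab) {
        int[] row = new int[vocab.size()];
        for (String tag : parseTags(cell)) {
            int i = vocab.indexOf(tag);
            if (i >= 0) {
                row[i] = 1;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("mystery|thriller", "romance", "thriller");
        List<String> vocab = vocabulary(cells);
        System.out.println(vocab);
        for (String cell : cells) {
            System.out.println(cell + " -> " + Arrays.toString(encode(cell, vocab)));
        }
    }
}
```

This keeps the CSV at one column per example while still giving the model a fixed-width input; rows with a single tag simply have a single 1 in their vector.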

how to show contents of the file rather than filename when searching by solr

I have a lot of PDF files (with text inside), and I want to build a simple search engine that finds the sentences containing given keywords. After several hours of searching, I chose Solr as the tool.
I am new to Solr. I downloaded the latest Solr, 6.5.0, and set it up on Windows 7.
I used the following commands to create a collection called gettingstarted, and I can search it by visiting http://localhost:8983/solr/gettingstarted/browse
bin\solr.cmd start
bin\solr.cmd create -c gettingstarted
java -Dauto -Dc=gettingstarted -Drecursive -jar example/exampledocs/post.jar *.pdf
However, it only shows the name of the file that contains the keyword rather than the matching lines of the file. The following picture shows this case:
I also tried the integrated example called techproducts and, to my surprise, it shows the exact sentences that contain the keywords. The following picture shows this case:
So my question is whether I can do something to make the sentences containing the keywords show up as in the techproducts example. I don't know about Velocity, the config files, or even the underlying principles. I just want it to work and give detailed search results. I do not care about security issues or about how the results look (ugliness is OK).
It is the first day I play with solr, so maybe I made some mistakes about the description. Thanks for your patience. I need your help.
http://localhost:8983/solr/gettingstarted/browse
This is an example UI application (Solritas) that ships with Solr by default.
You should use the /select request handler instead, which handles your query and returns the results:
http://localhost:8983/solr/gettingstarted/select?q=keyword
Indexing PDFs:
When you index a PDF, all of the text inside it goes to a field called content by default.
Example:
Assuming you created gettingstarted collection already.
Navigate to directory example/exampledocs/ and hit this command.
java -Dauto -Dc=gettingstarted -jar post.jar solr-word.pdf
If it indexed successfully, go to the admin UI and search for a keyword that appears in the PDF; the result should include the content field with the text of the PDF as its value.
Example query request URL:
http://localhost:8983/solr/gettingstarted/select?q=solr&wt=json&indent=on
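If you end up calling /select from code rather than the browser, the keyword has to be URL-encoded. Below is a minimal sketch of building such a URL; the class name SolrSelectUrl is hypothetical, and adding the standard highlighting parameters hl and hl.fl on the content field is my assumption about how to get matching snippets back (it only helps if content is stored in your schema).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
public class SolrSelectUrl {

    // Build a /select URL for a collection and keyword.
    // baseUrl is expected without a trailing slash, e.g. http://localhost:8983/solr
    public static String build(String baseUrl, String collection, String keyword) {
        // Encode the keyword so spaces and special characters survive the query string.
        String q = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return baseUrl + "/" + collection + "/select"
                + "?q=" + q
                + "&hl=true&hl.fl=content" // assumption: ask for highlighted snippets from "content"
                + "&wt=json&indent=on";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "gettingstarted", "solr"));
    }
}
```

Pasting the printed URL into a browser should behave like the example request above, with an extra highlighting section in the response when the field supports it.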

How can I sort through logs using regex

I have a directory full of log files, each one named for each day, ie, "log.2016-09-26" but they go back a long ways. I'm using filebeat to grab these logs from this directory, but my issue is that I only want the past 2 weeks/14 days. Filebeat wants a regex to filter out what files to exclude. What is the best way to filter these logs?

How to parse single file for different outputs

Does somebody know how to parse a single line from a file and send it to different outputs? For example: the input is a log file, and the outputs are Elasticsearch indices with different templates. I need to parse every line and save it into the first index, and the lines that have a promo code (like ?promo=wteaewfsthser) need to go into another index as well. I think it's possible to use two Logstash instances (correct me if I'm wrong, please). But I want to know whether it is possible to use a single Logstash instance and one configuration file.
Thanks,
Igor
Sounds like you're looking for clone. Note that only the filters that are present after the clone{} will be run on the cloned event.
