How to create multiple indexes from one log file in the Filebeat config file - logstash

I have a log file created by Tomcat, and I want to build two indexes from this file so that one index contains all of the log file's information and the second index contains only part of it.
At the moment I save different parts of the log into different indexes, but I also want all of the log file's information in a separate index.
Can I solve this without running two Filebeat instances on the server?
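Filebeat itself only supports one enabled output at a time, so one possible approach (a sketch, not the only solution) is to ship everything from a single Filebeat instance to Logstash and duplicate the routing there: an unconditional elasticsearch output for the full index, plus a conditional output for the partial one. The log_level field and both index names below are hypothetical placeholders:

input {
  beats {
    port => 5044
  }
}

output {
  # every event goes into the full index
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tomcat-all-%{+YYYY.MM.dd}"
  }
  # only matching events also go into the partial index
  if [log_level] == "ERROR" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "tomcat-errors-%{+YYYY.MM.dd}"
    }
  }
}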

Related

Can Databricks Autoloader Keep Track of File Uploading Time

Is it possible to keep track of S3 file upload time with Databricks Autoloader? It looks like Autoloader adds columns for the file name and processing time, but in our use case we need to know the order in which the files were uploaded to S3.
When you load the data, you can query the _metadata column (or a specific attribute inside it); it includes a file_modification_time field that represents the time of the last file modification (which should match the upload time).
Just do:
df.select("*", "_metadata.file_modification_time")
to get access to that field. See the docs for details.
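As a minimal sketch, the same field can also be used to replay files in upload order; the format and path below are placeholders, and spark is the session Databricks predefines in notebooks:

# Batch-read the landed files; _metadata is available per source file.
df = (spark.read
      .format("json")                      # placeholder format
      .load("s3://my-bucket/landing/"))    # placeholder path

ordered = (df
           .select("*",
                   "_metadata.file_path",
                   "_metadata.file_modification_time")
           .orderBy("file_modification_time"))  # oldest upload first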

azure data factory: iterate over millions of files

Previously I had a problem merging several JSON files into one single file,
which I was able to resolve with the answer to this question.
At first, I tried with just some of the files by using wildcards in the file name in the connection section of the input dataset. But when I remove the file name, theory tells me that all of the files in all folders should be loaded recursively, since I checked the "copy recursively" option in the source section of the copy activity.
The problem is that when I manually trigger the pipeline after removing the file name from the input dataset, only some of the files get loaded: the task ends successfully but loads only around 400+ files, while each folder has 1M+ files. I want to create big CSV files by merging all the small JSON files from the source (I was already able to create a CSV file by mapping the schemas in the copy activity).
It is probably stopping due to a timeout or an out-of-memory exception.
One solution is to loop over the contents of the directory lazily, for example:
foreach (var file in Directory.EnumerateFiles(searchDir))
    ProcessFile(file); // ProcessFile is a hypothetical per-file handler
This way you can process all the files without holding the list (or the contents) of all files in memory at the same time.

How can I sort through logs using regex

I have a directory full of log files, each one named for the day it covers, e.g. "log.2016-09-26", but they go back a long way. I'm using Filebeat to grab these logs from this directory, but my issue is that I only want the past 2 weeks / 14 days. Filebeat wants a regex to decide which files to exclude. What is the best way to filter these logs?
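One thing worth noting: a fixed regex cannot express a moving window like "the last 14 days" (the set of matching dates changes every day), so unless you regenerate the regex daily, Filebeat's ignore_older setting may be a better fit; it skips files whose modification time is older than a given duration. A minimal sketch, assuming a Filebeat 5.x-style prospector config with a placeholder path:

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/log.*   # placeholder path
    # skip files not modified within the last 14 days (14 * 24h = 336h)
    ignore_older: 336h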

How to load files in a specific order

I would like to know how I can load some files in a specific order. For instance, I would like to load my files according to their timestamp, in order to make sure that subsequent data updates are replayed in the proper order.
Let's say I have 2 types of files: deal info files and risk files.
I would like to load T1_Info.csv, then T1_Risk.csv, T2_Info.csv, T2_Risk.csv...
I have tried to implement a comparator, as described on Confluence, but it seems that the loadInstructions file has priority: it orders the info files and the risk files independently (loading T1_Info.csv, T2_Info.csv and then T1_Risk.csv, T2_Risk.csv...).
Do I have to implement a custom file loader, or is it possible using an AP configuration?
The loading of the files based on load instructions is done in
com.quartetfs.tech.store.csv.impl.CSVDataModelFactory.load(List<FileLoadDescriptor>). The FileLoadDescriptor list you receive is created directly from the load instructions files.
What you can do is create a simple instructions file with 2 entries, one for deal info and one for risk. Your custom implementation of CSVDataModelFactory will then be called with a list of two items. In your custom implementation, scan the directory where the files are, sort them in the order you want them parsed, and call super.load() with the list of FileLoadDescriptor you created from the directory scan.
If you also want to load files that are placed in this folder in the future, you have to add a line to your load instructions that matches all files; that will make the super.load() implementation create a directory watcher for them (you should then perhaps override createDirectoryWatcher() so it does not watch the files already present in the folder when load is called).

a number of log4j config questions

I'm working on a project and we want to handle our logging using log4j. I am running into some issues that I am not able to easily resolve by looking at the log4j docs or other documentation online.
I get the basic idea of putting logging code throughout the codebase and then having the properties file route the logged data into a hierarchy of appenders that write out to files. That's fine. This basically allows me to create greppable log files in one hard-coded folder, such as this:
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log
But I have two basic questions. First, I want the log location to be dynamic, such as:
log4j.appender.R.File={$processDir}/example.log
Also, every time the user runs this app, a folder is created with the output files. I would like the log file to be placed there, and I'm not sure how to accomplish that.
The other issue (although I think this will be a lot easier once the first issue is addressed) is about creating a formatted log that does not necessarily reflect the chronological flow of the app: for example, a title, followed by a list of all input files, a list of all output files, and any warnings encountered.
I think for that I would create an object that implemented ObjectRenderer and write a doRender method that gave me the info I wanted.
Does that sound correct?
Thanks!
You can use a variable with this syntax:
log4j.appender.R.File=${processDir}/example.log
You must define the variable either as a system property (e.g. -DprocessDir=...) or manually (after creating the folder, and before log4j loads its configuration, since the substitution happens when the properties file is parsed) with
System.setProperty("processDir", logDir);
