As per the documentation of Logstash's file plugin, the File Rotation section says the following:
To support programs that write to the rotated file for some time after
the rotation has taken place, include both the original filename and
the rotated filename (e.g. /var/log/syslog and /var/log/syslog.1) in
the filename patterns to watch (the path option).
Can anyone clarify how to specify two filenames in the path configuration? That would be of great help, as I did not find an exact example. Some examples suggest using wildcards, like /var/log/syslog*, but I am looking for an example that achieves exactly what the documentation says: two filenames in the path option.
The path attribute is an array, so you can specify multiple files as follows:
input {
  file {
    path => [ "/var/log/syslog.log", "/var/log/syslog1.log" ]
  }
}
You can also use the * wildcard for file names or directories, as follows:
input {
  file {
    path => [ "/var/log/syslog.log", "/var/log/syslog1.log", "/var/log/*.log", "/var/*/*.log" ]
  }
}
When you specify a path like /var/*/*.log, the glob matches .log files exactly one directory level below /var; it is not a fully recursive search. For a recursive search across any number of levels, use the ** glob.
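A minimal sketch of the recursive form (the directory here is hypothetical):

input {
  file {
    # ** descends into any number of subdirectories below /var/log
    path => "/var/log/**/*.log"
  }
}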
Reference: Logstash file input plugin documentation
Right now the Databricks Autoloader requires a directory path from which all the files will be loaded. But if some other kind of log files also start coming into that directory, is there a way to ask Autoloader to exclude those files while preparing the dataframe?
df = spark.readStream.format("cloudFiles") \
    .option(<cloudFiles-option>, <option-value>) \
    .schema(<schema>) \
    .load(<input-path>)
Autoloader supports glob strings in <input-path>; from the documentation:
<input-path> can contain file glob patterns
Glob syntax supports different options: for example, * matches any sequence of characters. So you can specify the input path as, say, path/*.json. You can exclude files as well, although building an exclusion pattern can be a bit more complicated than an inclusion pattern; it is still possible. For example, *.[^l][^o][^g] should exclude files with a .log extension.
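A minimal sketch of the inclusion case, assuming JSON files under a hypothetical /mnt/logs path and a schema defined elsewhere:

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(schema)                  # schema assumed to be defined earlier
      .load("/mnt/logs/*.json"))       # the glob keeps only .json files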
Use pathGlobFilter as one of the options and provide a glob pattern (not a regex) to filter by file type or by files with a specific name.
Note that pathGlobFilter is an include filter: only files matching the pattern are loaded. For instance, to pick up only files named A1.csv, A2.csv, ..., A9.csv from the load location, the value for pathGlobFilter would look like:
df = spark.read.load("/file/load/location",
                     format="csv",
                     schema=schema,
                     pathGlobFilter="A[0-9].csv")
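pathGlobFilter is a generic file-source option, so the same idea should carry over to the Autoloader streaming reader; a sketch, with the path and CSV format assumed:

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("pathGlobFilter", "A[0-9].csv")  # include only A1.csv ... A9.csv
      .schema(schema)
      .load("/file/load/location"))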
In my package, I would like to use one .po file for each .py script it contains.
Here is my file tree:

foo/
    mainscript.py
    commands/
        commandOne.py
    locales/fr/LC_MESSAGES/
        mainscript_fr.po
        commandOne_fr.po
In mainscript.py, I have the following lines to apply gettext to the strings:
if "fr" in os.environ['LANG']:
traduction = gettext.translation('mainscript_fr', localedir='./locales', languages=['fr'])
traduction.install()
else:
gettext.install('')
Until now, it has been working as expected. But now I would like to add another .po file to translate the strings in commandOne.py.
I tried the following code :
if "fr" in os.environ['LANG']:
traduction = gettext.translation('commandOne_fr', localedir='../locales', languages=['fr'])
traduction.install()
else:
gettext.install('')
But I get a "FileNotFoundError: [Errno 2] No translation file found for domain: 'commandOne_fr'".
How can I use multiple files like that? The package being a CLI, there are many strings in a single file because of the help text, verbose mode, etc., and it is not acceptable to have a single .po file with hundreds of strings.
Note: mainscript.py calls a function from commandOne.py, which itself inherits from an abstract class containing other strings to translate... so I hope that any solution will also be applicable to the abstract class file.
Thank you
Translations are retrieved from .mo files, not .po files; see https://docs.python.org/3/library/gettext.html#gettext.translation. Most probably you have to compile commandOne_fr.po into commandOne_fr.mo with the program msgfmt.
Two more hints:
What you are doing looks like premature optimization. You won't have any performance problem until the number of translations gets really big; rather, wait for that to happen.
Why the _fr in the names of the translation files? The language code fr is already a path component.
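Building on both hints, each module can then bind its own domain instead of installing _ globally. A minimal sketch for commandOne.py, assuming the domain is renamed to commandOne and the .mo file has been compiled; note that a relative localedir is resolved against the current working directory, so it is safer to anchor it to the module's own location:

import gettext
import os

# resolve locales/ relative to this file, not the current working directory
LOCALEDIR = os.path.join(os.path.dirname(__file__), '..', 'locales')

# fallback=True returns NullTranslations instead of raising
# FileNotFoundError when no .mo file exists for the requested language
traduction = gettext.translation('commandOne', localedir=LOCALEDIR,
                                 languages=['fr'], fallback=True)
_ = traduction.gettext  # module-local _, does not clobber other modules

print(_("Some help text"))

The same pattern works in the abstract base class's module: give it its own domain (or reuse its module's domain) and bind a local _ there.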
Is there a way to use grok or regex matching in the file input plugin in Logstash? And if yes, how?
Thanks for answering.
With your limited details, I assume you want to pick up some specific files in your directories based on matching patterns. This can be handled in multiple ways:
Filename patterns in the path option. For example, if you just have to include the ".log" files from a directory, you can use:
input {
  file {
    path => "/var/log/applicationDir/*.log"
  }
}
The exclude option, which is matched against the filename, not the full path (filename patterns are valid here, too). In tail mode, for example, you might want to exclude gzipped files:
input {
  file {
    path => "/var/log/applicationDir/*"
    exclude => "*.gz"
  }
}
I use pathlib to match all files recursively, filtering the files based on their content. Then I would like to find the top-level folder of each matching file. Assume the following: I have a file in the folder
a/b/c/file.log
I do the search from level a:
for f in path_data.glob("**/*"):
    if something_inside(f):  # placeholder for the content check
        ...  # I would like to get the folder this file is in, i.e. 'b'
I know that I can get all the parent levels using:
f.parents would give me a/b/c, a/b, a
f.parent would give me a/b/c
f.name would give me file.log
But how could I get b?
Just to be precise: the number of levels at which the file is stored is not known.
UPD: I know I could do it with split, but I would like to know if there is a proper API for this. I couldn't find one.
The question was asked a while ago but didn't get much attention. Nevertheless, I will still publish the answer:
f.parts[0]
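Note that parts lists every component of the path as written, so f.parts[0] is 'b' only when f is relative to the search root (b/c/file.log). If the glob results still include the root (a/b/c/file.log when searching from a), strip it first; a minimal sketch using the folders from the question:

from pathlib import Path

path_data = Path("a")

for f in path_data.glob("**/*"):
    if f.is_file():
        # strip the search root, then take the first remaining component
        top = f.relative_to(path_data).parts[0]
        print(top)  # 'b' for a/b/c/file.log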
I am using the Logstash file input with a glob to read my files:
path => "/home/Desktop/LogstashInput/**/*.log"
Directory structure format:
LogstashInput => server-name => date => abc.log
This reads all log files ending with ".log" within every date directory.
Now I want to read only some particular log files within all the date directories. E.g., the 2014.11.05 directory has abc.log, xyz.log, ... 10 such files, and I want to read, say, only five particular files. What should the path input be?
I read about exclude in Logstash, but that would mean a lot of files to exclude, as there are different types of files within the different server-name directories and different dates.
The Logstash agent is written in Ruby, so refer to the Ruby glob rules. Based on your actual file names, you might be able to get a pattern working.
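For example, Ruby's Dir.glob supports brace alternation, so one pattern can name just the files you want; a sketch with hypothetical file names:

input {
  file {
    # {a,b,c} matches any one of the listed alternatives
    path => "/home/Desktop/LogstashInput/*/*/{abc,xyz,pqr,mno,def}.log"
  }
}

Alternatively, since path accepts an array, you can list five separate globs explicitly, one per file name.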