Remove Logstash indexes using a bash script - Linux

I am looking for a way to remove old Logstash indexes using a script. My Logstash indexes are named logstash-2016.02.29, logstash-2016.03.01, and so on. At the moment I use a Chrome extension called Sense to remove the indexes (see screenshot), or I can use curl to remove them: curl -XDELETE 'http://myIpAddress:9200/logstash-2016.02.29'
I would like to write a script that would run daily and remove Logstash indexes older than 2 weeks from Elasticsearch. Is this possible, and if so, how can I do it using the date from the name of the index?
G

Just use the find command:
find . -name 'logstash*' -mtime +14 -type f -delete
This searches in the current directory and below, for all files whose name starts with "logstash", that are older than 14 days, and then deletes them.
If the file times are totally unreliable, and you have to use the filenames, try something like this:
#!/bin/bash
# cut-off date, 14 days ago, in YYYYMMDD form
testdate=$(date -d '14 days ago' '+%Y%m%d')
for f in ./logstash-[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
    [ -e "$f" ] || continue        # skip if the glob matched nothing
    dt=$(basename "${f//.}")       # drop the dots: logstash-20160229
    dt=${dt#logstash-}             # drop the prefix: 20160229
    [ "$dt" -le "$testdate" ] && rm -f "$f"
done
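If the indexes only exist inside Elasticsearch rather than as files on disk, the same date comparison can drive the curl -XDELETE call from your question instead of rm. A rough sketch, assuming the host/port from the question and that your Elasticsearch version exposes the _cat/indices API:
#!/bin/bash
# Delete Logstash indexes older than 14 days through the Elasticsearch REST API.
es='http://myIpAddress:9200'
testdate=$(date -d '14 days ago' '+%Y%m%d')

# one index name per line, e.g. logstash-2016.02.29
for idx in $(curl -s "$es/_cat/indices/logstash-*?h=index"); do
    dt=${idx#logstash-}    # 2016.02.29
    dt=${dt//.}            # 20160229
    [ "$dt" -le "$testdate" ] && curl -XDELETE "$es/$idx"
done
Run it from cron once a day. (Elasticsearch Curator does the same job with more safety checks, if installing an extra tool is an option.)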

Related

Bash command to archive files daily based on date added

I have a suite of scripts that involve downloading files from a remote server and then parsing them. Each night, I would like to create an archive of the files downloaded that day.
Some constraints are:
Downloading from a Windows server to an Ubuntu server.
Inability to delete files on the remote server.
Require the date added to the local directory, not the date the file was created.
I have deduplication running at the downloading stage (using ncftp); however, the check involves comparing the remote and local directories. One strategy is to create a new folder each day, download files into it, and then tar it sometime after midnight. A problem arises in that the first scheduled download of the new day will grab ALL files on the remote server, because the new local folder is empty.
Because of the constraints, I considered simply archiving files based on "date added" to a central folder. This works very well using a Mac because HFS+ stores extended metadata such as date created and date added. So I can combine a tar command with something like below:
mdls -name kMDItemFSName -name kMDItemDateAdded -raw *.xml | \
xargs -0 -I {} echo {} | \
sed 'N;s/\n/ /' | \
but there doesn't seem to be an analogue under Linux (at least not with ext4, as far as I am aware).
I am open to any form of solution to get around doubling up files into a subsequent day. The end result should be an archives directory full of tar.gz files looking something like:
files_$(date +"%Y-%m-%d").tar.gz
Depending on the method used to back up the files, the modified or changed date should reflect the time it was copied - for example, if you used cp -p to back them up, the modified date would not change but the changed date would reflect the time of the copy.
You can get this information using the stat command:
stat <filename>
which will return the following (along with other file related info not shown):
Access: 2016-05-28 20:35:03.153214170 -0400
Modify: 2016-05-28 20:34:59.456122913 -0400
Change: 2016-05-29 01:39:52.070336376 -0400
This output is from a file that I copied using cp -p at the time shown as 'change'.
You can get just the change time by calling stat with a specified format:
stat -c '%z' <filename>
2016-05-29 01:39:56.037433640 -0400
or with capital Z for that time in seconds since epoch. You could combine that with the date command to pull out just the date (or use grep, etc)
date -d "$(stat -c '%z' <filename>)" -I
2016-05-29
The find command can be used to find files by time frame, in this case using the flags -cmin 'changed minutes', -mmin 'modified minutes', or, less likely, -amin 'accessed minutes'. The sequence of commands to get the minutes since midnight is a little ugly, but it works.
We have to pass find an argument of "minutes since a file was last changed" (or modified, if that criteria works). So first you have to calculate the minutes since midnight, then run find.
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
Unrolling that a bit:
$(date +%s) == seconds since epoch until 'now'
"(date -I) 0" == todays date in format "YYYY-MM-DD 0" with 0 indicating 0 seconds into the day
$(date -d "(date -I 0" +%s)) == seconds from epoch until today at midnight
Then we (effectively) echo ( $now - $midnight ) / 60 to bc to convert the results into minutes.
The find call is passed the minutes since midnight with a leading '-' indicating up to X minutes ago. A '+' would indicate X minutes or more ago.
find /path/to/base/folder -cmin -"$min_since_mid"
The actual answer
Finally to create a tgz archive of files in the given directory (and subdirectories) that have been changed since midnight today, use these two commands:
min_since_mid=$(echo $(( $(date +%s) - $(date -d "$(date -I) 0" +%s) )) / 60 | bc)
find /path/to/base/folder -cmin -"${min_since_mid:-0}" -print0 -exec tar czvf /path/to/new/tarball.tgz {} +
The -print0 argument tells find to also print the matched names null-delimited on stdout (handy if you pipe the list to xargs -0); with -exec ... + the filenames are passed to tar directly as arguments, so spaces in names aren't a problem either way.
The only thing I'm not sure of is whether you should use the changed time (-cmin), the modified time (-mmin) or the accessed time (-amin). Take a look at your backup files and see which field accurately reflects the date/time of the backup - I would think changed time, but I'm not certain.
Update: changed -"$min_since_mid" to -"${min_since_mid:-0}" so that if min_since_mid isn't set you won't error out with invalid argument - you just won't get any results. You could also surround the find with an if statement to block the call if that variable isn't set properly.
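Putting those pieces together with the archive name from the question, a daily job could look roughly like the sketch below. Paths are placeholders, it relies on GNU find/tar (which Ubuntu has), and it uses -cmin as discussed above, so swap in -mmin if that matches your copies better:
#!/bin/bash
# Archive everything changed since midnight into files_YYYY-MM-DD.tar.gz
src='/path/to/base/folder'
dest="/path/to/archives/files_$(date +%F).tar.gz"

# minutes elapsed since midnight today
min_since_mid=$(( ( $(date +%s) - $(date -d "$(date -I) 0" +%s) ) / 60 ))

# -cmin -N: changed less than N minutes ago; tar's --null/-T - reads the
# null-delimited list from find, so odd filenames are safe
find "$src" -type f -cmin -"$min_since_mid" -print0 \
    | tar -czvf "$dest" --null -T -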

shell - faster alternative to "find"

I'm writing a shell script which should output the oldest file in a directory.
This directory is on a remote server and has (worst case) between 1000 and 1500 (temporary) files in it. I have no access to the server and no influence on how the files are stored. The server is connected through a stable but not very fast line.
The result of my script is passed to a monitoring system which in turn alerts the staff if there are too many (= unprocessed) files in the directory.
Unfortunately the monitoring system only allows a maximum execution time of 30 seconds for my script before a timeout occurs.
This wasn't a problem when testing with small directories; testing against the target directory over the remote mount (approx. 1000 files), it is.
So I'm looking for the fastest way to get things like "the oldest / newest / largest / smallest" file in a directory (not recursive) without using 'find' or sorting the output of 'ls'.
Currently I'm using this statement in my sh script:
old)
# return oldest file (age in seconds)
oldest=`find $2 -maxdepth 1 -type f | xargs ls -tr | head -1`
timestamp=`stat -f %B $oldest`
curdate=`date +%s`
echo `expr $(($curdate-$timestamp))`
;;
and I tried this one:
gfind /livedrive/669/iwt.save -type f -printf "%T# %P\n" | sort -nr | tail -1 | cut -d' ' -f 2-
which are two of the many variants one can find using Google.
Additional information:
I'm writing this on a FreeBSD box with sh and bash installed. I have full access to the box and can install programs if needed. For reference: gfind is the GNU "find" utility as known from Linux, since FreeBSD ships a different "find" by default.
Any help is appreciated.
with kind regards,
dura-zell
For the oldest/newest file issue, you can use the -t option to ls, which sorts the output by modification time.
-t Sort by descending time modified (most recently modified first).
If two files have the same modification timestamp, sort their
names in ascending lexicographical order. The -r option reverses
both of these sort orders.
For the size issue, you can use -S to sort file by size.
-S Sort by size (largest file first) before sorting the operands in
lexicographical order.
Notice that for both cases, -r will reverse the order of the output.
-r Reverse the order of the sort.
Those options are available on FreeBSD and Linux; and must be pretty common in most implementations of ls.
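For example, using the directory from your gfind attempt (with the usual caveats: this doesn't distinguish files from directories and breaks on names containing newlines):
# oldest entry first: -t is newest-first, -r reverses the order
oldest=$(ls -tr /livedrive/669/iwt.save | head -n 1)

# largest entry first
largest=$(ls -S /livedrive/669/iwt.save | head -n 1)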
Let us know if it's fast enough.
In general, you shouldn't be parsing the output of ls. In this case, it's just acting as a wrapper around stat anyway, so you may as well just call stat on each file, and use sort to get the oldest.
old) now=$(date +%s)
read name timestamp < <(stat -f "%N %B" "$2"/* | sort -k2,2n)
echo $(( $now - $timestamp ))
The above is concise, but doesn't distinguish between regular files and directories in the glob. If that is necessary, stick with find, but use a different form of -exec to minimize the number of calls to stat:
old ) now=$(date +%s)
read name timestamp < <(find "$2" -maxdepth 1 -type f -exec stat -f "%N %B" '{}' + | sort -k2,2n)
echo $(( $now - $timestamp ))
(Neither approach works if a filename contains a newline, although since you aren't using the filename in your example anyway, you can avoid that problem by dropping %N from the format and just sorting the timestamps numerically. For example:
read timestamp < <(stat -f %B "$2"/* | sort -n)
# or
read timestamp < <(find "$2" -maxdepth 1 -type f -exec stat -f %B '{}' + | sort -n)
)
Could you try creating a shell script that resides on the remote host and, when executed, provides the required output? Then from your local machine just use ssh or something like that to run it, so the script runs locally on the server. Just a thought :-)
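If SSH access turns out to be available, the call from the monitoring side could be as small as this (user, host and script path are made-up placeholders):
# run the check on the server itself so only the result crosses the slow link
ssh monitor@fileserver 'sh /usr/local/bin/oldest_file.sh /livedrive/669/iwt.save'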

Listing files in Unix and saving the output in a variable (fetching the oldest file with a particular extension)

This might be a very simple thing for a shell scripting programmer but am pretty new to it. I was trying to execute the below command in a shell script and save the output into a variable
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I fire it from the terminal but fails when executed through a shell script (sh). Why does the command fail? Does that mean the shell script doesn't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell script or not?
Just to give you a glimpse of my requirement, I was trying to get the oldest file from a particular directory (I also want to make sure upper case and lower case extensions are handled). Is there any other way to do this ?
The above command will work correctly only if BOTH *.pdf and *.PDF files are present in the directory you are currently in. (Note also that the brace expansion *.{PDF,pdf} is a bash feature; a plain POSIX sh won't expand it, which is part of why the command behaves differently when run through sh.)
If you would like to execute it in a directory with only one of those you should consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with filenames containing newlines, or with a very long list of found files.
Parsing ls is always a bad idea. You need another strategy.
How about making a function that gives you the oldest file among the ones given as arguments? The following works in Bash (adapt to your needs):
get_oldest_file() {
    # get oldest file among files given as parameters
    # return is in variable get_oldest_file_ret
    local oldest f
    for f do
        [[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
    done
    get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
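A rough POSIX-leaning workaround (a sketch, not a drop-in replacement) is to cover both cases in the glob itself and skip unexpanded patterns by hand; note that the -ot test operator is a widespread extension (dash, FreeBSD sh, ksh, bash) rather than strict POSIX:
get_oldest_file() {
    # oldest file among the arguments; result in $get_oldest_file_ret
    oldest=
    for f do
        [ -e "$f" ] || continue        # skip a glob that matched nothing
        if [ -z "$oldest" ] || [ "$f" -ot "$oldest" ]; then
            oldest=$f
        fi
    done
    get_oldest_file_ret=$oldest
}

# *.[pP][dD][fF] covers .pdf, .PDF, .Pdf, ... without nocaseglob
get_oldest_file *.[pP][dD][fF]
echo "oldest file is: $get_oldest_file_ret"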
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sorts, using -M, which is the file's age relative to script start (in days), so the oldest file comes first.
fetches the first element ([0]) off the sort.
Prints that.
As @gniourf_gniourf says, parsing ls is a bad idea: it leaves globs unquoted and generally doesn't account for odd characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
    #
    # echo path of oldest *.pdf (case-insensitive) file in current directory
    #
    find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
        | sort -n \
        | tail -1 \
        | cut -d' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including
things like access time and/or write time. I used '%T@ %p\n',
where %T@ is the last write time in UNIX time format, including the fractional part.
This will never contain a space, so it's safe to use as a separator.
Numeric sort and tail get the last item, sorting by the time,
cut removes the time from the output.
I used pipe notation split across lines with \, which is IMO much easier to read and maintain.
The shell code should run on any POSIX shell (the -printf format is a GNU find extension, hence gfind on FreeBSD),
You could easily adjust the function to parametrize the pattern,
time used (access/write), control the search depth or starting dir.

Linux: List file names, if last modified between a date interval

I have 2 variables which contain dates like this: 2001.10.10
And I want to use ls with a filter that only lists files whose last modification was between the first and second date.
The best solution I can think of involves creating temporary files with the boundary timestamps, and then using find:
touch -t YYYYMMDD0000 oldest_file
touch -t YYYYMMDD0000 newest_file
find -maxdepth 1 -newer oldest_file -and -not -newer newest_file
rm oldest_file newest_file
You can use find's -printf '%P\n' option if you want to strip off the leading ./ from all the filenames.
If creating temporary files isn't an option, you might consider writing a script to calculate and print the age of a file, such as described here, and then using that as a predicate.
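With GNU find (the default find on Linux), you can also skip the temporary files entirely and pass the boundaries as date strings with -newermt. The 2001.10.20 upper bound below is just an assumed example value, and the ${from//./-} substitution (which turns 2001.10.10 into 2001-10-10) needs bash:
from=2001.10.10
to=2001.10.20
# -newermt is true for files modified after the given date's midnight;
# "! -newermt" caps the range at midnight of the second date (which is excluded).
find . -maxdepth 1 -type f -newermt "${from//./-}" ! -newermt "${to//./-}"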
Sorry, it is not the simplest. I just now developed it, only for you. :-)
ls -l --full-time|awk '{s=$6;gsub(/[-\.]/,"",s);if ((s>="'"$from_variable"'") && (s<="'"$to_variable"'")) {print $0}}';
The problem is that these simple command-line tools don't handle a date type. So first we convert the dates to integers by removing the separating "-" and "." characters (in your case it is ".", in mine a "-", so I remove both), which you can see in
gsub(/[-\.]/,"",s)
After the removal, we can compare them as integers. In this example, we compare them with the integers $from_variable and $to_variable, so this will list the files modified between $from_variable and $to_variable.
Both "from_variable" and "to_variable" need to be shell variables in the form 20070707 (for 7 July 2007).

Pruning old backups in several steps

I am looking for a way to thin out old backups. The backups are run on a daily basis, and I want to increase the interval as the backups become older.
After a couple of days I'd like to remove the daily backups, leaving only the "Sunday" backup. After a couple of weeks, only the first available backup of each month should be kept.
Since I am dealing with historic backups, I cannot just change the naming scheme.
I tried to use 'find' for it, but couldn't find the right options.
Anyone got anything that might help?
I know it is historical data, but you might prefer coming up with a naming scheme to assist this problem. It might be far easier to tackle this problem in two passes: first, renaming the directories based on the date, then selecting the directories to keep in the future.
You could make a quick approximation, if all the directory dates in the ls -l output look good enough (forcing an ISO date column so that $6 is the date and $8 is the name):
ls -l --time-style=long-iso | awk '{print "mv " $8 " " $6;}' > /tmp/runme
Look at /tmp/runme, and if it looks good, you can run it with sh /tmp/runme. You might wish to prune the entries or something like that; up to you.
If all the backups are stored in directories named, e.g:
2011-01-01/
2011-01-02/
2011-01-03/
...
2011-02-01/
2011-02-02/
...
2011-03-07/
then your problem would be reduced to computing the names to keep and delete. This problem is much easier to solve than searching through all your files and trying to select which ones to keep and delete based on when they were made. (See date "+%Y-%m-%d" output for a quick way to generate this sort of name.)
Once they are named conveniently, you can keep the first backup of every month with a script like this:
for y in `seq 2008 2010`
do for m in `seq -w 1 12`
   do for d in `seq -w 2 31`
      do echo "rm $y-$m-$d"
      done
   done
done
Save its output, inspect it :) and then run the output, similar to the rename script.
Once you've got the past backups under control, you can generate the 2010 from date --date="Last Year" "+%Y", and add other improvements so it handles "one a week" for the current month and maintains itself going forward.
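A sketch of what that self-maintaining version could look like, assuming the YYYY-MM-DD directory names above and retention rules along the lines of the question (keep everything for 14 days, Sundays for 60 days, the first of each month beyond that); the cut-offs are placeholders to tune:
#!/bin/bash
# Thin out backup directories named YYYY-MM-DD.
# Echoes the rm commands; pipe the output to sh once it looks right.
now=$(date +%s)
for dir in [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
    [ -d "$dir" ] || continue
    age_days=$(( (now - $(date -d "$dir" +%s)) / 86400 ))
    dow=$(date -d "$dir" +%u)    # 7 = Sunday
    dom=$(date -d "$dir" +%d)    # day of month
    if [ "$age_days" -le 14 ]; then
        continue                                  # keep all recent backups
    elif [ "$age_days" -le 60 ]; then
        [ "$dow" -eq 7 ] || echo "rm -r $dir"     # keep only Sundays
    else
        [ "$dom" = "01" ] || echo "rm -r $dir"    # keep only the 1st of the month
    fi
done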
I've developed a solution for my similar needs on top of @ajreal's starting point. My backups are named like "backup-2015-06-01T01:00:01" (using date "+%Y-%m-%dT%H:%M:%S").
Two simple steps: touch the backups you want to keep, using a shell glob pattern that matches the first of each month, then use find and xargs to delete anything more than 30 days old.
cd "$BACKUPS_DIR"
# touch backups from the first of each month
touch *-01T*
# delete backups more than 30 days old
echo "Deleting these backups:"
find -maxdepth 1 -mtime +30
find -maxdepth 1 -mtime +30 -print0 | xargs -0 rm -r
Yup, for example:
find -type f -mtime +30
(+30 matches files older than 30 days; a bare 30 would only match files exactly 30 days old.) Details:
http://www.gnu.org/software/findutils/manual/html_mono/find.html#Age-Ranges
