shell script for copying log files into a single compressed file - linux

We have a folder "statuslogs" on our embedded board; it contains logs named in the format daily_status_date_time.log.
We need to get all the files for a particular year into a single file, so it can be fetched from the server.
We did the following in our script
gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
gzip -c statuslogs/daily_status_2018*.log > status_2018.gz
gzip -c statuslogs/daily_status_2019*.log > status_2019.gz
gzip -c statuslogs/daily_status_2020*.log > status_2020.gz
gzip -c statuslogs/daily_status_2021*.log > status_2021.gz
The problem with this logic is that it will still create a status_*.gz file for the years 2019, 2020 and 2021 even when there are no logs for those years.
I tried writing the following logic:
if [ - f statuslogs/daily_status_2017*.log ], but it fails, maybe because of the pattern. Also, I am not using bash; the interpreter is ash.
Can you please help me optimize the script?
Thanks for your time.

You have a syntax error. It's -f, not - f. Example:
if [ -f statuslogs/daily_status_2017*.log ]; then
gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
fi
However, with this you will probably run into a "too many arguments" error, which will happen if you have more than one matching file. So this would work better:
if find statuslogs/daily_status_2017*.log -maxdepth 0 2>/dev/null | grep -q .; then
gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
fi
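An alternative that avoids find entirely is to let the shell expand the glob into the positional parameters and test the first result, which also works in plain ash (a sketch):
set -- statuslogs/daily_status_2017*.log
# if nothing matched, "$1" is still the literal pattern, so -f fails
if [ -f "$1" ]; then
    gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
fi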
Rather than hard-coding a line for each year, it would be better to loop over the years and stop when you reach the current one. For example,
for year in $(seq 2017 $(date +%Y)); do
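Putting those pieces together, a minimal ash-compatible sketch (assuming seq and date are available, which busybox normally provides):
#!/bin/sh
for year in $(seq 2017 "$(date +%Y)"); do
    set -- statuslogs/daily_status_${year}*.log
    # only create an archive if at least one log exists for that year
    if [ -f "$1" ]; then
        gzip -c statuslogs/daily_status_${year}*.log > "status_${year}.gz"
    fi
done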

gzip compresses individual files; it does not bundle several files into one archive (gzip -c file1 file2 > out.gz just concatenates the compressed streams, so the file boundaries are lost). If you want to keep the files separate inside the archive, you need to do one of the following:
Combine the files using tar:
tar czf status_2017.tar.gz statuslogs/daily_status_2017*.log
OR use zip which supports multiple files directly
zip status_2017.zip statuslogs/daily_status_2017*.log
Now, if the problem is just that you want one archive for every year, but only for the years for which files exist, you can handle all the years using a for loop:
for year in `ls statuslogs/daily_status_* | cut -d _ -f 3 | cut -c 1-4 | sort | uniq`; do
tar czf status_$year.tar.gz statuslogs/daily_status_$year*.log;
done
If your shell doesn't support that format of calling, you can try this instead
ls statuslogs/daily_status_* | cut -d _ -f 3 | cut -c 1-4 | sort | uniq > years
cat years | while read year; do
tar czf status_$year.tar.gz statuslogs/daily_status_$year*.log;
done
If you just want one archive for all the logs, you can forget about the year part completely:
tar czf statuslogs.tar.gz statuslogs/daily_status*.log
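You can verify what ended up in an archive with tar's list mode, for example:
tar tzf statuslogs.tar.gz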

Related

Execute script for all but certain files in directory [duplicate]

I need a bash script that iterates over all files in a directory except the ones with specific names. Maybe it can be done with the help of awk/sed during script execution?
Here is my script, that simply merge all file in directory to one:
#!/bin/bash
(find $DIR_NAME -name app.gz\* | sort -rV | xargs -L1 gunzip -c 2> /dev/null || :)
How can I add some file names to an exclude list so the script doesn't iterate over them?
Put the names of the files to be excluded into a file, say "blacklist.txt", one filename per line. Then use
... | grep -v -F -f blacklist.txt | sort ...
to exclude them from the input to xargs (the -v inverts the match, so the blacklisted names are filtered out).
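For example, a sketch of the pipeline from the question with the exclusion wired in (blacklist.txt is an assumed name holding one name per line; grep -F matches it anywhere in the path):
find "$DIR_NAME" -name app.gz\* | grep -v -F -f blacklist.txt | sort -rV | xargs -L1 gunzip -c 2> /dev/null || :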

bash loop file echo to each file in the directory

I searched for a while and tried it by myself but have been unable to get this sorted so far. My folder contains these files:
1.txt, 2.txt, 3.txt, 4.txt, 5.txt, 6.txt
I want to get each file's modified time and echo that timestamp into the file.
#!/bin/bash
thedate=`ls | xargs stat -s | grep -o "st_mtime=[0-9]*" | sed "s/st_mtime=//g"` # get file modified times
files=$(ls | grep -Ev '(5.txt|6.txt)$') # exclude 5.txt and 6.txt
for i in $thedate; do
echo $i >> $files
done
I want to insert each timestamp into its file, but I'm getting an "ambiguous redirect" error. Am I doing it incorrectly? Thanks
In this case, files is a "list" of files, so you probably want to add another loop to handle them one by one.
Your description is slightly confusing but, if your intent is to append the last modification date of each file to that file, you can do something like:
for fspec in [1-4].txt ; do
stat -c %y ${fspec} >>${fspec}
done
Note I've used stat -c %y to get the modification time such as 2017-02-09 12:21:22.848349503 +0800 - I'm not sure what variant of stat you're using but mine doesn't have a -s option. You can still use your option, you just have to ensure it's done on each file in turn, probably something like (in the for loop above):
stat -s ${fspec} | grep -o "st_mtime=[0-9]*" | sed "s/st_mtime=//g" >>${fspec}
You cannot redirect the output to several files at once, as in >> $files.
To process several files you need something like:
#!/bin/bash
for f in ./[0-4].txt ; do
# get file modified time (in seconds)
thedate="$(stat --printf='%Y\n' "$f")"
echo "$thedate" >> "$f"
done
If you want a human readable time format change %Y by %y:
thedate="$(stat --printf='%y\n' "$f")"

Rm and Egrep -v combo

I want to remove all the logs except the current log and the log before that.
These log files are created every 20 minutes, so the file names look like
abc_23_19_10_3341.log
abc_23_19_30_3342.log
abc_23_19_50_3241.log
abc_23_20_10_3421.log
where 23 is today's date (it might include yesterday's date too), 19 is the hour (7 o'clock), and 10, 30, 50, 10 are the minutes.
In this case I want to keep abc_23_20_10_3421.log, which is the current log (the one currently being written), and abc_23_19_50_3241.log (the previous one),
and remove the rest.
I got it to work by creating a folder, putting the files to keep in that folder, removing the rest and then deleting the folder. But that's too long-winded...
I also tried this
files_nodelete=`ls -t | head -n 2 | tr '\n' '|'`
rm *.txt | egrep -v "$files_nodelete"
but it didn't work. However, if I put ls instead of rm it works.
I am an amateur in Linux, so please suggest a simple idea or approach. I tried xargs rm but it didn't work either.
I also read about mtime, but that seems a bit complicated since I am new to Linux.
I am working on a Solaris system.
Try the logadm tool in Solaris, it might be the simplest way to rotate logs. If you just want to get things done, it will do it.
http://docs.oracle.com/cd/E23823_01/html/816-5166/logadm-1m.html
If you want a solution similar (but working) to your try this:
ls abc*.log | sort | head -n-2 | xargs rm
ls abc*.log: lists all files matching the pattern abc*.log
sort: sorts this list lexicographically (by name) from oldest to newest logfile
head -n-2: returns all but the last two entries in the list (you can give -n a negative count)
xargs rm: composes the rm command with the entries from stdin
If there are two or fewer matching files in the directory, this command will return an error like
rm: missing operand
and will not delete any files.
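If that matters, one way to guard against it is to run the removal only when more than two files match (a sketch, using the same GNU-style head -n-2 as above):
count=$(ls abc*.log 2>/dev/null | wc -l)
if [ "$count" -gt 2 ]; then
    ls abc*.log | sort | head -n-2 | xargs rm
fi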
It is usually not a good idea to use ls to point to files. Some files may cause havoc (files which have a newline or a weird character in their name are the usual examples).
Using shell globs: here is an interesting way: we count the files newer than the one we are about to remove!
pattern='abc*.log'
for i in $pattern ; do
[ -f "$i" ] || break ;
#determine if this is the most recent file, in the current directory
# [I add -maxdepth 1 to limit the find to only that directory, no subdirs]
if [ $(find . -maxdepth 1 -name "$pattern" -type f -newer "$i" -print0 | tr -cd '\000' | tr '\000' '+' | wc -c) -gt 1 ];
then
#there are 2 files more recent than $i that match the pattern
#we can delete $i
echo rm "$i" # remove the echo only when you are 100% sure that you want to delete all those files !
else
echo "$i is one of the 2 most recent files matching '${pattern}', I keep it"
fi
done
I only use the globbing mechanism to feed filenames to find, and just count the terminating NULs produced by -print0 (thus I have no problems with any special characters in those filenames; I just need to know how many files were output).
tr -cd '\000' keeps only the \000 characters, i.e. the terminating NULs output by -print0. Then I translate each \000 to a single + character and count them with wc -c. If I see 0, "$i" was the most recent file. If I see 1, "$i" is the one just a bit older (find sees only the most recent one). And if I see more than 1, it means the 2 files (matching the pattern) that we want to keep are both newer than "$i", so we can delete "$i".
I'm sure someone will step in with a better one, but the idea could be reused, I guess...
Thanks guyz for all the answers.
I found my answer
files=`ls -t *.txt | head -n 2 | tr '\n' '|' | rev |cut -c 2- |rev`
rm `ls -t | egrep -v "$files"`
Thank you for the help

Clearing archive files with linux bash script

Here is my problem,
I have a folder where multiple files are stored with a specific name format:
Name_of_file.TypeMM-DD-YYYY-HH:MM
where MM-DD-YYYY-HH:MM is the time of its creation. There could be multiple files with the same name but not the same time of course.
What I want is a script that keeps the 3 newest versions of each file.
So, I found one example there:
Deleting oldest files with shell
But I don't want to delete a fixed number of files; I want to keep a certain number of the newest ones. Is there a way to get that find command to parse the Name_of_file part and keep only the 3 newest?
Here is the code I've tried yet, but it's not exactly what I need.
find /the/folder -type f -name 'Name_of_file.Type*' -mtime +3 -delete
Thanks for help!
So I decided to add my final solution in case anyone would like it. It's a combination of the 2 solutions given.
ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}" | awk 'NR > 3' | xargs rm
One line, super efficient. If the date or name pattern ever changes, just adjust the grep -P pattern to match it. This way you are sure that only the files fitting this pattern will get deleted.
Can you be extra, extra sure that the timestamp on the file is the exact same timestamp on the file name? If they're off a bit, do you care?
The ls command can sort files by timestamp order. You could do something like this:
$ ls -t | awk 'NR > 3' | xargs rm
The ls -t lists the files by modification time, with the newest first.
The awk 'NR > 3' prints the list of files except for the first three lines, which are the three newest.
The xargs rm will remove the files that are older than the first three.
Now, this isn't the exact solution. There are possible problems with xargs because file names might contain weird characters or whitespace. If you can guarantee that's not the case, this should be okay.
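If GNU findutils and recent GNU coreutils are available, a NUL-delimited variant of the same idea sidesteps the whitespace problem (a sketch operating on all regular files in the current directory, keeping the 3 newest):
find . -maxdepth 1 -type f -printf '%T@\t%p\0' | sort -z -rn | tail -z -n +4 | cut -z -f 2- | xargs -0 rm --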
Also, you probably want to group the files by name and keep the newest three of each. Hmm...
ls | sed -E 's/[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}$//' | sort -u | while read file
do
ls -t $file* | awk 'NR > 3' | xargs rm
done
The ls will list all of the files in the directory. The sed strips the trailing MM-DD-YYYY-HH:MM date-time stamp from each name. The sort -u makes sure you only have the unique base file names. Thus
file1.txt-01-12-1950
file2.txt-02-12-1978
file2.txt-03-12-1991
Will be reduced to just:
file1.txt
file2.txt
These are fed through the loop, and ls -t $file* lists all of the files that start with that base name, newest first; awk drops the newest three from the list, and xargs rm deletes the rest, so all but the newest three of each file are removed.
Assuming we're using the date in the filename to date the archive file, and that it is possible to change the date format to YYYY-MM-DD-HH:MM (as established in the comments above), here's a quick and dirty shell script to keep the newest 3 versions of each file within the present working directory:
#!/bin/bash
KEEP=3 # number of versions to keep
while read FNAME; do
NODATE=${FNAME:0:-16} # get filename without the date (remove last 16 chars)
if [ "$NODATE" != "$LASTSEEN" ]; then # new file found
FOUND=1; LASTSEEN="$NODATE"
else # same file, different date
let FOUND="FOUND + 1"
if [ $FOUND -gt $KEEP ]; then
echo "- Deleting older file: $FNAME"
rm "$FNAME"
fi
fi
done < <(\ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}")
Example run:
[me#home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2011-12-12-12:11
some_file.exe2012-01-11-23:11
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
[me#home]$ ./delete_old.sh
- Deleting older file: some_file.exe2012-01-11-23:11
- Deleting older file: some_file.exe2011-12-12-12:11
[me#home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
Essentially, by changing the dates in the file names to the form YYYY-MM-DD-HH:MM, a normal string sort (such as the one done by ls) will automatically group similar files together, sorted by date and time.
The ls -r on the last line simply lists all files within the current working directory and prints the results in reverse order, so newer archive files appear first.
We pass the output through grep to extract only files that are in the correct format.
The output of that command combination is then looped through (see the while loop) and we can simply start deleting after 3 occurrences of the same filename (minus the date portion).
This pipeline will get you the 3 newest files (by modification time) in the current dir
stat -c $'%Y\t%n' file* | sort -n | tail -3 | cut -f 2-
To get all but the 3 newest:
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2-
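To actually delete everything but the 3 newest, that second pipeline can feed xargs (a sketch assuming GNU xargs and file names without embedded newlines):
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2- | xargs -r -d '\n' rm --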

Script for renaming files with logic

Someone has very kindly helped get me started on a mass rename script for renaming PDF files.
As you can see, I need to add a bit of logic to stop the situation below from happening - something like adding a unique number to a duplicate file name?
rename 's/^(.{5}).*(\..*)$/$1$2/' *
rename -n 's/^(.{5}).*(\..*)$/$1$2/' *
Annexes 123114345234525.pdf renamed as Annex.pdf
Annexes 123114432452352.pdf renamed as Annex.pdf
Hope this makes sense?
Thanks
for i in *.pdf
do
    x=''                  # counter
    j="${i:0:2}"          # new name prefix (use ${i:0:5} to keep 5 characters)
    e=".${i##*.}"         # extension
    [ "$i" = "$j$e" ] && continue   # already has the short name, keep it
    while [ -e "$j$x$e" ]           # try to find a free name
    do
        ((x++))                     # increment counter
    done
    mv "$i" "$j$x$e"                # rename
done
before
$ ls
he.pdf hejjj.pdf hello.pdf wo.pdf workd.pdf world.pdf
after
$ ls
he.pdf he1.pdf he2.pdf wo.pdf wo1.pdf wo2.pdf
This should check whether there will be any duplicates:
rename -n [...] | grep -o ' renamed as .*' | sort | uniq -d
If you get any output of the form renamed as [...], then you have a collision.
Of course, this won't work in a couple of corner cases - if your file names contain newlines or the literal string "renamed as", for example.
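Another way to spot collisions without parsing rename's output is to compute the target names directly and look for duplicates (a sketch; names that don't match the pattern pass through unchanged, just as rename leaves them alone):
ls | sed -E 's/^(.{5}).*(\..*)$/\1\2/' | sort | uniq -d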
As noted in my answer on your previous question:
for f in *.pdf; do
tmp=`echo "$f" | sed -r 's/^(.{5}).*(\..*)$/\1\2/'`
mv -b ./"$f" ./"$tmp"
done
That will make backups of deleted or overwritten files. A better alternative would be this script:
#!/bin/bash
for f in "$@"; do
    tar -rvf /tmp/backup.tar "$f"          # back up the original (uncompressed tar)
    new=`echo "$f" | sed -r 's/^(.{5}).*(\..*)$/\1\2/'`
    [ "$new" = "$f" ] && continue          # already has the target name, nothing to do
    tmp="$new"
    i=1
    while [ -e "$tmp" ]; do                # name taken: insert -N before the extension
        tmp=`echo "$new" | sed "s/\./-$i./"`
        i=$((i+1))
    done
    mv -b ./"$f" ./"$tmp"
done
Run the script like this:
find . -exec thescript '{}' \;
The find command gives you lots of options for specifying which files to run on, works recursively, and passes all the filenames to the script. The script backs every file up with tar (uncompressed) and then renames it.
This isn't the best script, since it isn't smart enough to avoid the manual loop that checks for colliding file names.
