How do I extract the date from multiple files with dates in their names? - linux

Let's say I have multiple filenames, e.g. R014-20171109-1159.log.20171109_1159.
I want to create a shell script which creates a folder for every given date and moves the files matching that date into it.
Is this possible?
For the example, a folder "20171109" should be created containing the file "R014-20171109-1159.log.20171109_1159".
Thanks

This is a typical application of a for loop in bash to iterate through files.
At the same time, this solution uses shell parameter expansion.
for file in /path/to/files/*.log.*
do
    foldername=${file#*-}            # strip everything through the first "-"
    foldername=${foldername%%-*}     # strip everything from the next "-" on
    mkdir -p "${foldername}"         # -p suppresses the error if the folder already exists
    [ $? -eq 0 ] && mv "${file}" "${foldername}"   # check last cmd status and move
done
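For the example filename, the two expansions peel the date out step by step:
file="R014-20171109-1159.log.20171109_1159"
foldername=${file#*-}          # removes through the first "-": 20171109-1159.log.20171109_1159
foldername=${foldername%%-*}   # removes from the next "-" on: 20171109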

Since you want to write a shell script, use commands. To get the date, use the cut command, for example:
cat 1.txt
R014-20171109-1159.log.20171109_1159
cut -d "-" -f2 1.txt
Output:
20171109
That is your date; create the folder from it. This way you can loop and create as many folders as you want.
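A minimal sketch of such a loop, assuming filenames like the one above (the path is a placeholder):
for file in /path/to/files/*.log.*; do
    d=$(basename "$file" | cut -d- -f2)   # second "-"-delimited field, e.g. 20171109
    mkdir -p "$d" && mv "$file" "$d"/
done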

It's actually quite easy (my Bash syntax might be a bit off):
for f in /path/to/your/files*; do
    ## Check if the glob gets expanded to existing files.
    ## If not, f here will be exactly the pattern above
    ## and the exists test will evaluate to false.
    [ -e "$f" ] && echo "$f"
    # grep the file name for ".log." and extract the 8 characters after it.
    # Next, check if a folder named with those 8 characters already exists.
    # If not, create it;
    # else just move the file to that folder path.
    break
done
The main idea is from this post (link). Sorry for not providing the actual code, as I haven't worked on Bash recently.
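A concrete version of that outline might look like this (an untested sketch; the path is a placeholder and filenames are assumed to match the question's pattern):
for f in /path/to/your/files/*.log.*; do
    [ -e "$f" ] || continue               # skip if the glob matched nothing
    # extract the 8 characters after ".log."
    d=$(basename "$f" | grep -o '\.log\.[0-9]\{8\}' | cut -c6-13)
    # create the folder if it does not exist already, then move the file
    [ -d "$d" ] || mkdir "$d"
    mv "$f" "$d"
done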

The commands below can be put in a script to achieve this.
Assign a variable with the current date as shown below (use the --date='n day ago' option if you need an older date).
If you need to get it from the file name itself, get the files in a loop, then use the cut command to extract the date string:
dirVar=$(date +%Y%m%d)                      # for the current day
dirVar=$(date +%Y%m%d --date='1 day ago')   # for yesterday
dirVar=$(echo $fileName | cut -c6-13)       # or
dirVar=$(echo $fileName | cut -d- -f2)      # to get it from $fileName
Create a directory from the variable's value as below (-p: no error if the directory already exists):
mkdir -p ${dirVar}
Move the files into that directory with:
mv *log.${dirVar}* ${dirVar}/
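Putting those pieces together for the per-file case, the loop could look like this (a sketch; the path is a placeholder):
for fileName in /path/to/files/*.log.*; do
    dirVar=$(echo $fileName | cut -d- -f2)   # e.g. 20171109
    mkdir -p ${dirVar}
    mv "$fileName" ${dirVar}/
done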

Related

Find and copy specific files by date

I've been trying to get a script working to back up some files from one machine to another but have been running into an issue.
Basically what I want to do is copy two files, one .log and one (or more) .dmp. Their format is always as follows:
something_2022_01_24.log
something_2022_01_24.dmp
I want to do three things with these files:
find the second-to-last .log file (i.e. if something_2022_01_24.log is the latest, I want to find the one before that, say something_2022_01_22.log)
get a substring with just the date (2022_01_22)
copy every .dmp that matches that date (e.g. something_2022_01_22.dmp, something01_2022_01_22.dmp)
For the first one, from what I could find, the best way is to do ls -t *.log | head -2, as its output includes the second-to-last file created.
As for the second one I'm more at a loss because I'm not sure how to parse the output of the first command.
The third one I think I could manage with something of the sort:
[ -f "/var/www/my_folder/*$capturedate.dmp" ] && cp "/var/www/my_folder/*$capturedate.dmp" /tmp/
What do you guys think? Is there any way to do this? How can I compare the substring?
Thanks!
Would you please try the following:
#!/bin/bash

dir="/var/www/my_folder"
second=$(ls -t "$dir/"*.log | head -n 2 | tail -n 1)
if [[ $second =~ .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log ]]; then
    capturedate=${BASH_REMATCH[1]}
    cp -p "$dir/"*"$capturedate".dmp /tmp
fi
second=$(ls -t "$dir"/*.log | head -n 2 | tail -n 1) picks the
second-to-last log file. Please note it assumes that the timestamp
of the file has not been modified since it was created and that the
filename does not contain special characters such as a newline. This
is an easy solution and may need more work for robustness.
The regex .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log matches the log
filename and captures the date substring (enclosed in the parentheses)
into the bash variable ${BASH_REMATCH[1]}.
Then the next cp command does the job. Please be careful
not to include the wildcard * within the double quotes, so that
the wildcard is properly expanded.
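To illustrate the quoting point:
cp -p "$dir/"*"$capturedate".dmp /tmp   # * outside the quotes: expanded by the shell
cp -p "$dir/*$capturedate.dmp" /tmp     # * inside the quotes: stays literal, so cp looks for a file literally named with the asterisk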
FYI here are some alternatives to extract the date string.
With sed:
capturedate=$(sed -E 's/.*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log/\1/' <<< "$second")
With parameter expansion of bash (if something does not include underscores):
capturedate=${second%.log}
capturedate=${capturedate#*_}
With cut command (if something does not include underscores):
capturedate=$(cut -d_ -f2,3,4 <<< "${second%.log}")
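For instance, with a hypothetical value of $second, the parameter-expansion variant behaves like this:
second="something_2022_01_22.log"   # hypothetical example (no underscores before the date)
capturedate=${second%.log}          # something_2022_01_22
capturedate=${capturedate#*_}       # 2022_01_22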

Create folders automatically and move files

I have a lot of daily files, sorted by hour, that come from a data logger (waveform). I downloaded them onto a USB stick; now I need to save them inside folders named with the first 8 digits of the waveform filename.
Those files have the following pattern:
Year-Month-Day-hourMinute-##.Code_Station_location_Channel
for example, inside the USB I have:
2020-10-01-0000-03.AM_REDDE_00_EHE; 2020-10-01-0100-03.AM_REDDE_00_EHE; 2020-10-02-0300-03.AM_REDDE_00_EHE; 2020-10-20-0000-03.AM_REDDE_00_EHE; 2020-10-20-0100-03.AM_REDDE_00_EHE; 2020-11-15-2000-03.AM_REDDE_00_EHE; 2020-11-15-2100-03.AM_REDDE_00_EHE; 2020-11-19-0400-03.AM_REDDE_00_EHE; 2020-11-19-0900-03.AM_REDDE_00_EHE;
I modified a code from user3360767 (shell script to create folder daily with time-stamp and push time-stamp generated logs) a little to speed up creating a folder and moving the files into it:
for filename in 2020-10-01*EHE; do
    foldername=$(echo "$filename" | awk '{print (201001)}');
    mkdir -p "$foldername"
    mv "$filename" "$foldername"
    echo "$filename $foldername" ;
done
2020-10-01*EHE
Here I list all hours from 2020-10-01-0000-03.AM_REDDE_00_EHE
foldername=$(echo "$filename" | awk '{print (201001)}');
Here I create the folder name that belongs to 2020-10-01, and with the following lines I create the folder and then move all files into it:
mkdir -p "$foldername"
mv "$filename" "$foldername"
echo "$filename $foldername" ;
As you may notice, I will always need to modify the line for filename in 2020-10-01*EHE each time the date changes.
Is there a way to create the folders from the first 8 digits of each file name?
Tonino
Use date
And since the foldername doesn't change, you don't need to keep creating one inside the loop.
files="$(date +%Y-%m-%d)*EHE"
foldername=$(date +%Y%m%d)
mkdir -p "$foldername"
for filename in $files; do
    mv "$filename" "$foldername"
    echo "$filename $foldername"
done
Edit:
If you want to specify the folder each time, you can pass it as an argument and use sed to get the filename pattern
foldername=$1
files=$(echo $1 | sed 's/\(....\)\(..\)\(..\)/\1-\2-\3/')
filepattern="$files*EHE"
mkdir -p "$foldername"
for filename in $filepattern; do
    mv "$filename" "$foldername"
    echo "$filename $foldername"
done
You call it with
./<yourscriptname>.sh 20201001
I think you want to move all files whose names end in *EHE into subdirectories. The subdirectories will be created as necessary and will be named according to the date at the start of each filename without the dashes/hyphens.
Please test the following on a copy of your files in a temporary directory somewhere.
#!/bin/bash
for filename in *EHE ; do
    # Derive folder by deleting all dashes from filename, then taking first 8 characters
    folder=${filename//-/}
    folder=${folder:0:8}
    echo "Would move $filename to $folder"
    # Uncomment next 2 lines to actually move file
    # mkdir -p "$folder"
    # mv "$filename" "$folder"
done
Sample Output
Would move 2020-10-01-0000-03.AM_REDDE_00_EHE to 20201001
Would move 2020-10-01-0100-03.AM_REDDE_00_EHE to 20201001
Note that the 2 lines:
folder=${filename//-/}
folder=${folder:0:8}
use "bash parameter substitution", which is described here if you want to learn about it, and obviate the need to create whole new processes to run awk, sed or cut to extract the fields.

Find out if a backup ran by searching the newest file

I'd like to write a short and simple script that searches for a file using a specific filter and checks the age of that file. I want it to write a short output and an exit code, so it can be consumed by an NRPE server.
The script itself works; I only have a problem when the file does not exist. This happens with this command:
newestfile=$(ls -t $path/$filter | head -1)
When the files exist, everything works as it should. When nothing matches my filter, I get the following output (I changed the filter to *.zip to demonstrate):
ls: cannot access '/backup/*.zip': No such file or directory
But I want to get the following output and then just exit the script with code 1:
there are no backups with the filter *.zip in the directory /backup
I am pretty sure this is a very easy problem, but I just don't know what's wrong. By the way, I am still "new" to bash scripts.
Here is my whole code:
#!/bin/bash

# Set the variables
path=/backup
filter=*.tar.gz

# Find the newest file
newestfile=$(ls -t $path/$filter | head -1)

# check if we even have a file
if [ ! -f $newestfile ]; then
    echo "there are no backups with the filter $filter in the directory $path"
    exit 1
fi

# check how old the file is that we found
if [[ $(find "$newestfile" -mtime +1 -print) ]]; then
    echo "File $newestfile is older than 24 hours"
    exit 2
else
    echo "the file $newestfile is younger than 24 hours"
    exit 0
fi
Actually, with your code you should also get an error message bash: no match: /backup/*.zip
UPDATE: Fixed the proposed solution, and the missing quotes in the original solution:
I suggest the following approach:
shopt -u failglob                # Turn off error from globbing
pathfilter="/backup/*.tar.gz"    # Quotes keep the wildcards from being expanded here already

# First see whether we have any matching files
files=($pathfilter)
if [[ ! -e ${files[0]} ]]
then
    # .... No matching files
else
    # Now you can safely fetch the newest file
    # Note: This does NOT work if you have filenames
    # containing newlines
    newestfile=$(ls -tA $pathfilter | head -1)
fi
I don't like using ls for this task, but I don't see an easy way in bash to do it better.
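For what it's worth, here is one ls-free sketch, assuming GNU find (its -printf is not POSIX); it prints each file's modification time and name, sorts numerically, and keeps the newest. It still assumes filenames contain no newlines.
newestfile=$(find /backup -maxdepth 1 -name '*.tar.gz' -printf '%T@ %p\n' | sort -nr | head -n 1 | cut -d' ' -f2-)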

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and am trying to come up with simple code. Could anyone give me some direction here? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete the files in /tmp for which corresponding backup files are not present; the file names in /tmp will match file names in /home/storage or in directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking with ls piped to grep, and delete if there is no match:
#!/bin/bash

[ -z "$1" -o -z "$2" ] && { ## validate input
    printf "error: insufficient input. Usage: %s tmpfiles storage\n" "${0//*\//}"
    exit 1
}

for i in "$1"/*; do
    fn=${i##*/} ## strip path, leaving filename only
    ## if a file in backup matches the filename, skip the rest of the loop
    ls "${2}"* | grep -q "$fn" &>/dev/null && continue
    printf "removing %s\n" "$i"
    # rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget

while read path
do
    name=${path##*/}
    InTarget[$name]=$path
done < <(find $1 -type f)

while read path
do
    name=${path##*/}
    [[ -z ${InTarget[$name]} ]] && rm -f $path
done < <(find $2 -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find $2 -type f) is a bash feature which lets the script read the list of filenames from find without making the assignments to the array run in a subprocess. Here the reason for using the feature is that if the array is updated in a subprocess, it would have no effect on the array value in the script which is passed to the second loop.
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
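To see why the process substitution matters here, compare the two forms (a small illustration; the directory is just an example):
# Piping into the loop runs it in a subshell: the array is empty afterwards.
declare -A A
find /some/dir -type f | while read -r p; do A[${p##*/}]=$p; done
echo "${#A[@]}"   # 0

# Process substitution keeps the loop in the current shell.
declare -A B
while read -r p; do B[${p##*/}]=$p; done < <(find /some/dir -type f)
echo "${#B[@]}"   # number of files found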
I spent some really nice time on this today, because I needed to delete files which have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash

# We need some reference to the files we want to keep and not delete;
# let's assume you want to keep the files in the first folder with jpeg, so you
# need to map them into the desired file extension first.
FILES_TO_KEEP=$(ls -1 "${2}" | sed 's/\.pdf$/.jpeg/g')

# Iterate through the files in the first argument's path
for file in "${1}"/*; do
    # In my case, I did not want to do anything with directories, so continue the cycle when hitting one.
    if [[ -d $file ]]; then
        continue
    fi
    # Omit the path from the iterated file with basename so we can compare it to the files we want to keep
    NAME_WITHOUT_PATH=$(basename "$file")
    # I use a Mac, which means poor-quality CLI tools when it comes to
    # operating with strings; this should be a safe check to see if
    # FILES_TO_KEEP contains NAME_WITHOUT_PATH
    if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
        echo "Not deleting: $NAME_WITHOUT_PATH"
    else
        # If it does not contain a file from the other directory, remove it.
        echo "deleting: $NAME_WITHOUT_PATH"
        rm -rf "$file"
    fi
done
Usage: sh deleteDifferentFiles.sh path/from/where path/source/of/truth

Output from a bash script looping through multiple directories

I am currently trying to write a script that will loop through multiple directories. The main raw_data directory contains ~150 subdirectories (subj001, subj002,...,subj00n), each of which has several subdirectories.
How can I make sure that the output from the script given below will be sent back to the specific subdirectory (e.g. subj0012) the input was taken from, rather than to the current directory (raw_data)?
#!/bin/bash
for dir in ~/raw_data/*
do
    tractor -d -r -b preproc RunStages:1
done
Thank you.
The name of the dir you want to save the output to is in $dir, right? So just send the output there via redirection:
#!/bin/bash
for dir in ~/raw_data/* ; do
    tractor -d -r -b preproc RunStages:1 > "$dir"/output
done
You should make sure that what you are processing really is a directory, though.
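For example, by restricting the glob to directories (a sketch, keeping the tractor invocation from the question):
for dir in ~/raw_data/*/ ; do    # the trailing slash matches directories only
    tractor -d -r -b preproc RunStages:1 > "${dir}output"
done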
You can use the output of find to run a loop and append the output at the desired location, like this:
while read d
do
    echo "$d" >> ~/raw_data/subj0012/output
done < <(find ~/raw_data -type d)
