How to extract a specific name from a folder or file name - Linux

I'm new to Linux commands.
I would like to extract only the date from each listed file name and compare it as a date value.
Example:
/underdirectory
20080206
20080207
bk_20080208
I want to list all of the directories above and check whether each one's date is earlier or later than a specified date.
If every listed directory name is a plain date, the condition check works fine.
Code:
foreach date_directory ( `ls` )
if ( "$date_directory" >= "$fdate" && "$date_directory" <= "$tdate") then
echo ${target_del}${date_directory} >> ${output}
endif
end
But if a name includes extra words, such as "bk_20080228", "bk_20080228_bk", "20080228_bk" or "20080228_tt", the condition check fails.
From bk_20080208 I want to take only 20080208.
Please help me.
Thanks!

Use grep with a regular expression: grep -Eo '[0-9]{4}[0-9]{2}[0-9]{2}' extracts an eight-digit date, and grep -Eo '[0-9]+' extracts any run of digits.

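shopt -s nullglob
for i in *; do echo "$i" | sed -E 's/.*([0-9]{8}).*/\1/'; done
This keeps only the eight-digit date, whether the extra word comes before or after it. The output can be redirected as needed.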
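Putting the two answers together, here is a minimal bash sketch of the whole check. The question's loop is csh, so this is a swapped-in bash equivalent rather than a fix of the original script. The helper names extract_date and in_range, the sample names, and the fdate/tdate values are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the first standalone run of 8 digits in a name.
extract_date() {
    local re='(^|[^0-9])([0-9]{8})([^0-9]|$)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

# Hypothetical helper: succeed when from <= date <= to (all YYYYMMDD).
# 10# forces base 10 so a leading zero is never read as octal.
in_range() {
    (( 10#$1 >= 10#$2 && 10#$1 <= 10#$3 ))
}

fdate=20080207   # assumed example bounds
tdate=20080228

# Sample names from the question; replace the list with */ to scan real directories.
for name in 20080206 20080207 bk_20080208 20080228_tt notes; do
    d=$(extract_date "$name") || continue   # skip names that carry no date
    in_range "$d" "$fdate" "$tdate" && echo "$name -> $d"
done
```

Because YYYYMMDD strings are fixed width, a plain numeric comparison orders them the same way a real date comparison would.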

Related

Find and copy specific files by date

I've been trying to get a script working to backup some files from one machine to another but have been running into an issue.
Basically what I want to do is copy two files, one .log and one (or more) .dmp. Their format is always as follows:
something_2022_01_24.log
something_2022_01_24.dmp
I want to do three things with these files:
find the second-to-last .log file (i.e. if something_2022_01_24.log is the latest, I want the one before it, say something_2022_01_22.log)
get a substring with just the date (2022_01_22)
copy every .dmp that matches the date (i.e something_2022_01_24.dmp, something01_2022_01_24.dmp)
For the first one, from what I could find, the best way is ls -t *.log | head -2, as it shows the two most recently modified files, the second of which is the one I want.
As for the second one I'm more at a loss because I'm not sure how to parse the output of the first command.
The third one I think I could manage with something of the sort:
[ -f "/var/www/my_folder/*$capturedate.dmp" ] && cp "/var/www/my_folder/*$capturedate.dmp" /tmp/
What do you guys think is there any way to do this? How can I compare the substring?
Thanks!
Would you please try the following:
#!/bin/bash
dir="/var/www/my_folder"
second=$(ls -t "$dir/"*.log | head -n 2 | tail -n 1)
if [[ $second =~ .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log ]]; then
capturedate=${BASH_REMATCH[1]}
cp -p "$dir/"*"$capturedate".dmp /tmp
fi
second=$(ls -t "$dir"/*.log | head -n 2 | tail -n 1) picks the
second-to-last log file. Note that it assumes the file's timestamp
has not been modified since creation and that the filename
contains no special characters such as a newline. This is a simple
solution and may need hardening for robustness.
The regex .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log matches the log
filename. It captures the date substring (the part enclosed in parentheses), and bash stores it in
${BASH_REMATCH[1]}.
The cp command then does the job. Be careful
not to put the wildcard * inside the double quotes, so that
the wildcard is expanded properly.
FYI here are some alternatives to extract the date string.
With sed:
capturedate=$(sed -E 's/.*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log/\1/' <<< "$second")
With parameter expansion of bash (if something does not include underscores):
capturedate=${second%.log}
capturedate=${capturedate#*_}
With cut command (if something does not include underscores):
capturedate=$(cut -d_ -f2,3,4 <<< "${second%.log}")
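Since ls -t depends on modification times, one possible alternative is to rank the logs by the date embedded in the file name instead. The sketch below assumes the something_YYYY_MM_DD.log naming from the question. The function name second_newest_date is hypothetical, and it needs bash 4 or later for mapfile.

```shell
#!/bin/bash
# Hypothetical helper: given log file names as arguments, print the
# second-newest date found in them, judged by the name alone.
second_newest_date() {
    local re='_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log$'
    local name dates=()
    for name in "$@"; do
        [[ $name =~ $re ]] && dates+=("${BASH_REMATCH[1]}")
    done
    (( ${#dates[@]} > 0 )) || return 1
    # Zero-padded YYYY_MM_DD strings sort chronologically as plain text.
    mapfile -t dates < <(printf '%s\n' "${dates[@]}" | sort -u)
    (( ${#dates[@]} >= 2 )) || return 1
    printf '%s\n' "${dates[${#dates[@]}-2]}"
}

# Demo with literal names (no files are touched):
second_newest_date something_2022_01_20.log something_2022_01_24.log something_2022_01_22.log
```

In the script above it could be called as capturedate=$(second_newest_date "$dir"/*.log) before the cp line. It returns a non-zero status when fewer than two distinct dates are found.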

How to auto insert a string in filename by bash?

I get an output file every day:
linux-202105200900-foo.direct.tar.gz
The date and time string, e.g. 202105200900, changes every day.
At the moment I have to rename these files manually to
linux-202105200900x86-foo.direct.tar.gz
(inserting the short string x86 after the date/time).
Can a bash script do this?
If you're always inserting the string "x86" at character #18 in the string, you can use this:
var="linux-202105200900-foo.direct.tar.gz"
var2=${var:0:18}"x86"${var:18}
echo "$var2"
The 2nd line means: "assign to variable var2 the first 18 characters of var, followed by x86 followed by the rest of the variable var"
If you want to insert "x86" just before the last hyphen in the string, you may write it like this:
var="linux-202105200900-foo.direct.tar.gz"
var2=${var%-*}"x86-"${var##*-}
echo $var2
The 2nd line means: "assign to variable var2:
the content of the variable var after removing the shortest match of the pattern "-*" at the end,
the string "x86-",
the content of the variable var after removing the longest match of the pattern "*-" at the beginning".
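To make the two expansions above easier to follow, this short sketch prints each intermediate piece. The variable names prefix, suffix, by_hyphen and by_offset are only illustrative.

```shell
#!/bin/bash
var="linux-202105200900-foo.direct.tar.gz"

prefix=${var%-*}     # drop the shortest trailing "-*"  -> linux-202105200900
suffix=${var##*-}    # drop the longest leading "*-"    -> foo.direct.tar.gz
by_hyphen="${prefix}x86-${suffix}"

by_offset="${var:0:18}x86${var:18}"   # same result via the fixed offset 18

printf '%s\n' "$prefix" "$suffix" "$by_hyphen" "$by_offset"
```

Both approaches give linux-202105200900x86-foo.direct.tar.gz here. The hyphen-based form splits at the last hyphen, so it only works while the part after the timestamp contains no further hyphens.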
In addition to the very good answer by @Jean-Loup Sabatier, another, perhaps more general, way is simply to replace the second occurrence of '-' with x86-, which you can do with sed. Let's say you have:
fname=linux-202105200900-foo.direct.tar.gz
You can update that with:
fname="$(sed 's/-/x86-/2' <<< "$fname")"
Which simply uses a command substitution with sed and a herestring to modify fname assigning the modified result back to fname.
Example Use/Output
$ fname=linux-202105200900-foo.direct.tar.gz
$ fname="$(sed 's/-/x86-/2' <<< "$fname")"
$ echo "$fname"
linux-202105200900x86-foo.direct.tar.gz
Do you need this?
❯ dat=$(date '+%Y%m%d%H%M%S'); echo ${dat}
20210520170336
❯ filename="linux-${dat}x86-foo.direct.tar.gz"; echo ${filename}
linux-20210520170336x86-foo.direct.tar.gz
I wanted to keep this as simple as possible. Since only the timestamp changes, this script should do it. Just run it inside the folder where the files are located and all of them will be renamed with x86.
#!/bin/bash
for file in *-foo*; do
    [ -e "$file" ] || continue   # skip if the glob matched nothing
    replaced=$(echo "$file" | sed 's|-foo|x86-foo|g')
    mv -- "$file" "$replaced"
done
This is my output
filip@filip-ThinkPad-T14-Gen-1:~/test$ ls
linux-202105200900-foo.direct.tar.gz linux-202105201000-foo.direct.tar.gz linux-202105201100-foo.direct.tar.gz
filip@filip-ThinkPad-T14-Gen-1:~/test$ ./../development/bash-utils/bulk-rename.sh
filip@filip-ThinkPad-T14-Gen-1:~/test$ ls
linux-202105200900x86-foo.direct.tar.gz linux-202105201000x86-foo.direct.tar.gz linux-202105201100x86-foo.direct.tar.gz
Simply iterate over all the files in the current folder, pipe each name to sed to replace -foo with x86-foo, then rename the file with mv.
As David mentioned in a comment, if you're worried that there could be multiple occurrences of -foo, replace the g (global) flag with 1 (first occurrence only) and that's it!
There is also the rename utility (https://man7.org/linux/man-pages/man1/rename.1.html), you could use:
rename -v 0-foo.direct.tar.gz 0x86-foo.direct.tar.gz *
which results in
`linux-202105200900-foo.direct.tar.gz' -> `linux-202105200900x86-foo.direct.tar.gz'
`linux-202205200900-foo.direct.tar.gz' -> `linux-202205200900x86-foo.direct.tar.gz'
`linux-202305200900-foo.direct.tar.gz' -> `linux-202305200900x86-foo.direct.tar.gz'
In addition to the very good answer by @David C. Rankin, here it is in a loop that renames the files:
#!/usr/bin/bash
for file in linux*   # all files starting with linux
do
    [ -e "$file" ] || continue   # skip if the glob matched nothing
    echo "$file"
    fname="$(sed 's/-/x86-/2' <<< "$file")"
    mv "$file" "$fname"   # rename file
done
Output received:
linux-202105200900x86-foo.direct.tar.gz
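The loops above will insert x86 again if they are run a second time. A possible way around that is to match the 12-digit timestamp explicitly, so that already-renamed files no longer match. This is only a sketch: the function names insert_tag and rename_all are hypothetical, and mv -n (never overwrite) is assumed to be available, as it is in GNU and BSD mv.

```shell
#!/bin/bash
# Hypothetical helper: insert a tag right after the 12-digit timestamp.
# Fails if the name has no "-<12 digits>-" part (e.g. it was already renamed).
insert_tag() {
    local name=$1 tag=$2
    local re='^(.*-[0-9]{12})(-.*)$'
    [[ $name =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}${tag}${BASH_REMATCH[2]}"
}

# Hypothetical helper: rename every matching file in a directory
# (default: the current directory).
rename_all() {
    local dir=${1:-.} file new
    for file in "$dir"/linux-*-foo.direct.tar.gz; do
        [ -e "$file" ] || continue                       # glob matched nothing
        new=$(insert_tag "${file##*/}" x86) || continue  # skip renamed files
        mv -n -- "$file" "$dir/$new"
    done
}

# Demo on a literal name (no files are touched):
insert_tag linux-202105200900-foo.direct.tar.gz x86
```

Calling rename_all /path/to/folder performs the actual renames. Running it twice leaves the files unchanged the second time.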

How do i extract the date from multiple files with dates in it?

Let's say I have multiple file names, e.g. R014-20171109-1159.log.20171109_1159.
I want to create a shell script that creates a folder for every date and moves the files matching that date into it.
Is this possible?
For the example, a folder "20171109" should be created and should contain the file "R014-20171109-1159.log.20171109_1159".
Thanks
This is a typical application of a for loop in bash to iterate through files.
This solution also uses bash shell parameter expansion.
for file in /path/to/files/*.log.*
do
    name=${file##*/}                 # strip the directory part
    foldername=${name#*-}            # drop everything up to the first hyphen
    foldername=${foldername%%-*}     # drop everything from the next hyphen on
    mkdir -p "${foldername}" &&      # -p suppresses the error if the folder already exists
        mv "${file}" "${foldername}" # move only if mkdir succeeded
done
Since you want to write a shell script, use standard commands. To get the date, use cut, for example:
cat 1.txt
R014-20171109-1159.log.20171109_1159
cat 1.txt | cut -d "-" -f2
Output
20171109
That is your date; create the folder from it. This way you can loop and create as many folders as you need.
It's actually quite easy (my Bash syntax might be a bit off):
for f in /path/to/your/files*; do
    ## Check if the glob gets expanded to existing files.
    ## If not, f here will be exactly the pattern above
    ## and the exists test will evaluate to false.
    [ -e "$f" ] || continue
    # Grep the file name for ".log."
    # and extract the 8 characters after ".log.".
    # Next, check whether a folder named after those 8 characters already exists.
    # If not, create it;
    # then move the file into that folder.
done
The main idea is from this post. Sorry for not providing the actual code, as I haven't worked with Bash recently.
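The outline above can be sketched in bash roughly as follows. It assumes the R014-20171109-1159.log.20171109_1159 naming from the question and takes the date from the part after ".log.". The function name sort_by_date is hypothetical, and the date folders are created inside the directory being scanned.

```shell
#!/bin/bash
# Hypothetical helper: move each "*.log.YYYYMMDD_..." file in a directory
# into a subdirectory of that directory named after the date.
sort_by_date() {
    local dir=${1:-.} f name
    local re='\.log\.([0-9]{8})_'
    for f in "$dir"/*.log.*; do
        [ -f "$f" ] || continue          # unmatched glob, or not a regular file
        name=${f##*/}
        [[ $name =~ $re ]] || continue   # no 8-digit date after ".log."
        # mkdir -p is a no-op when the folder already exists.
        mkdir -p "$dir/${BASH_REMATCH[1]}" &&
            mv -- "$f" "$dir/${BASH_REMATCH[1]}/"
    done
}
```

Calling sort_by_date /path/to/your/files processes that directory; with no argument it works on the current one.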
The commands below can be put in a script to achieve this.
Assign the date to a variable as shown below (use the --date='n day ago' option if you need an older date).
If you need to get it from the file name itself, loop over the files and use cut to extract the date string:
dirVar=$(date +%Y%m%d) --> for the current day,
dirVar=$(date +%Y%m%d --date='1 day ago') --> for yesterday,
dirVar=$(echo "$fileName" | cut -c6-13) or
dirVar=$(echo "$fileName" | cut -d- -f2) --> to get it from $fileName
Create the directory from the variable's value (-p: create the directory only if it doesn't exist):
mkdir -p "${dirVar}"
Move the files into that directory with:
mv *log."${dirVar}"* "${dirVar}"/

using grep to find and write specific lines with conditions

I'm new to shell commands.
I need to write an advanced search script and I have no idea how to do it.
I need to find lines in my computer's log files that contain "str1" and do not contain "str2", keep those whose date is greater than or equal to a given date, order them by date, and write the result to a file, with the process running in the background.
More formally, I want to do this:
new_process(write(from *log* where log_line contain "Str1" && log_line not contain "str2" order by Date having Date >= %date%) to read.txt)
Thank you!
You may be making it harder than it needs to be. grep 'str1' "$logfile" | grep -v 'str2' will find all lines containing str1 and not str2. For the given date, touch a file whose modification time is set to the date you want to measure against. Then feed your list to a while read -r fname; do loop, test [ "$fname" -nt "testfilename" ] (this compares file modification times, so each item read must be a file name), and write the names that satisfy the condition to a new file.
Your final script flow will look something like:
newname="${1:-newfile.txt}"
touch -d "date string" testfilename
grep 'str1' "$logfile" | grep -v 'str2' | while read -r fname; do
    [ "$fname" -nt "testfilename" ] && printf "%s\n" "$fname"
done > "$newname"    # redirect once so earlier matches are not overwritten
rm testfilename ## clean up
There are other ways to approach this, but this is a fairly standard approach. Let me know if you have further questions.
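If the date lives inside each log line rather than in a file's modification time, a different approach is to filter on the line itself. The question does not show the log format, so the sketch below assumes that every line starts with an ISO date (YYYY-MM-DD) as its first whitespace-separated field. The function name filter_lines and the sample lines are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: filter_lines INCLUDE EXCLUDE SINCE [FILE...]
# Keeps lines that match INCLUDE, do not match EXCLUDE, and whose first
# field (assumed to be a YYYY-MM-DD date) is >= SINCE, sorted by date.
# Reads standard input when no FILE is given.
filter_lines() {
    local include=$1 exclude=$2 since=$3
    shift 3
    grep -h -e "$include" -- "$@" |
        grep -v -e "$exclude" |
        awk -v since="$since" '($1 "") >= (since "")' |   # forced string compare
        LC_ALL=C sort
}

# Demo on made-up lines:
filter_lines str1 str2 2024-01-01 <<'EOF'
2024-01-05 str1 ok
2023-12-31 str1 too old
2024-01-02 str1 and str2
2024-01-03 has str1
EOF
```

To run it in the background over real logs and write the result to read.txt, something like filter_lines str1 str2 2024-01-01 /var/log/*log* > read.txt & would do, with the path adjusted to wherever the logs actually live.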

Extract part of a file name in bash

I have a folder with lots of files having a pattern, which is some string followed by a date and time:
BOS_CRM_SUS_20130101_10-00-10.csv (3 strings before date)
SEL_DMD_20141224_10-00-11.csv (2 strings before date)
SEL_DMD_SOUS_20141224_10-00-10.csv (3 strings before date)
I want to loop through the folder and extract only the part before the date and output into a file.
Output
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
This is my script but it is not working
#!/bin/bash
# script variables
FOLDER=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/
LOG_FILE=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/log
echo "Starting the programme at: $(date)" >> $LOG_FILE
# Getting part of the file name from FOLDER
for file in `ls $FOLDER/*.csv`
do
mv "${file}" "${file/date +%Y%m%d HH:MM:SS}" 2>&1 | tee -a $LOG_FILE
done #> $LOG_FILE
Use sed with extended regular expressions and a capture group to achieve this.
cat filelist | sed -r 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv/\1/'
where filelist is a file with all the names you care about. Of course, this is just a placeholder because I don't know how you are going to list all eligible files. If a glob will do, for example, you can do
ls mydir/*.csv | sed -r 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv/\1/'
Assuming you won't have digits in the first part, you could use:
$ for i in *csv;do str=$(echo "$i"|sed -r 's/[0-9]+.*//'); echo "$str"; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
Or with parameter substitution:
$ for i in *csv;do echo "${i%_*_*}_"; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
When you use ${var/pattern/replace}, the pattern must be a glob pattern, not a command to execute.
Instead of using the substitution operator, use the pattern removal operator:
mv "${file}" "${file%_*_*-*-*.csv}.csv"
% removes the shortest match of the pattern at the end of the variable, so this pattern matches just the date and time part of the filename.
The substitution:
"${file/date +%Y%m%d HH:MM:SS}"
is unlikely to do anything, because it doesn't execute date +%Y%m%d HH:MM:SS. It just treats it as a pattern to search for, and it's not going to be found.
If you did execute the command, though, you would get the current date and time, which is also (apparently) not what you find in the filename.
If that pattern is precise, then you can do the following:
echo "${file%????????_??-??-??.csv}" >> "$LOG_FILE"
Using grep:
ls *.csv | grep -Eo '^([A-Za-z]+_)+'
output:
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
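For completeness, the same extraction can be done with bash's own regex matching, with no external commands. This is a sketch that assumes the NAME_YYYYMMDD_HH-MM-SS.csv pattern from the question holds for every file. The function name name_prefix is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the part of a file name that precedes the
# YYYYMMDD_HH-MM-SS.csv suffix; any leading directory is ignored.
name_prefix() {
    local re='^(.*_)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv$'
    [[ ${1##*/} =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Demo with the sample names from the question:
for f in BOS_CRM_SUS_20130101_10-00-10.csv \
         SEL_DMD_20141224_10-00-11.csv \
         SEL_DMD_SOUS_20141224_10-00-10.csv; do
    name_prefix "$f"
done
```

In the original script the loop could become for f in "$FOLDER"/*.csv; do name_prefix "$f"; done >> "$LOG_FILE", which appends one prefix per line to the log file.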
