Extract part of a file name in bash - linux

I have a folder with lots of files having a pattern, which is some string followed by a date and time:
BOS_CRM_SUS_20130101_10-00-10.csv (3 strings before date)
SEL_DMD_20141224_10-00-11.csv (2 strings before date)
SEL_DMD_SOUS_20141224_10-00-10.csv (3 strings before date)
I want to loop through the folder and extract only the part before the date and output into a file.
Output
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
This is my script but it is not working
#!/bin/bash
# script variables
FOLDER=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/
LOG_FILE=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/log
echo "Starting the programme at: $(date)" >> $LOG_FILE
# Getting part of the file name from FOLDER
for file in `ls $FOLDER/*.csv`
do
mv "${file}" "${file/date +%Y%m%d HH:MM:SS}" 2>&1 | tee -a $LOG_FILE
done #> $LOG_FILE

Use sed with extended regular expressions and a capture group to achieve this.
sed -r 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv$/\1/' filelist
where filelist is a file containing all the names you care about. Of course, this is just a placeholder because I don't know how you are going to list all eligible files. If a glob will do, for example, you can do
ls mydir/*.csv | sed -r 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv$/\1/'

Assuming you won't have digits in the first part, you could use:
$ for i in *csv;do str=$(echo "$i"|sed -r 's/[0-9]+.*//'); echo "$str"; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
Or with parameter substitution:
$ for i in *csv;do echo ${i%_*_*}_; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

When you use ${var/pattern/replace}, the pattern must be a glob pattern, not a command to execute.
Instead of using the substitution operator, use the pattern removal operator:
mv "${file}" "${file%_*_*-*-*.csv}.csv"
% removes the shortest match of the pattern at the end of the variable, so this pattern matches just the date and time part of the filename (for example, _20130101_10-00-10.csv).
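To make the shortest-match behaviour of % concrete, here is a minimal sketch that compares it with %% (longest match) on one of the sample names from the question. The variable names are illustrative only.

```bash
# Sample name taken from the question.
file='BOS_CRM_SUS_20130101_10-00-10.csv'

# % removes the shortest suffix that matches the glob "_*".
shortest=${file%_*}

# %% removes the longest suffix that matches the same glob.
longest=${file%%_*}

printf '%s\n' "$shortest" "$longest"
# prints:
# BOS_CRM_SUS_20130101
# BOS
```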

The substitution:
"${file/date +%Y%m%d HH:MM:SS}"
is unlikely to do anything, because it doesn't execute date +%Y%m%d HH:MM:SS. It just treats it as a pattern to search for, and it's not going to be found.
If you did execute the command, though, you would get the current date and time, which is also (apparently) not what you find in the filename.
If that pattern is precise, then you can do the following:
echo "${file%????????_??-??-??.csv}" >> "$LOG_FILE"
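Putting this together for the whole folder could look roughly like the sketch below. It assumes every name ends in the fixed-width stamp YYYYMMDD_HH-MM-SS.csv; the function names and the folder and out_file parameters are hypothetical, not taken from the question.

```bash
# Print the part of a file name that precedes the YYYYMMDD_HH-MM-SS.csv stamp.
prefix_before_date() {
    name=${1##*/}                                  # drop any leading directory
    printf '%s\n' "${name%????????_??-??-??.csv}"  # drop the fixed-width stamp
}

# Write the prefix of every .csv file in a folder to an output file.
list_prefixes() {
    folder=$1
    out_file=$2
    for path in "$folder"/*.csv; do
        [ -e "$path" ] || continue   # skip the literal pattern if nothing matched
        prefix_before_date "$path"
    done > "$out_file"
}
```

Iterating over the glob directly, instead of parsing the output of ls, keeps names with spaces intact.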

Using grep:
ls *.csv | grep -Eo '^([A-Za-z]+_)+'
output:
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

Related

How to auto insert a string in filename by bash?

I have the output file day by day:
linux-202105200900-foo.direct.tar.gz
The date and time string, ex: 202105200900 will change every day.
I need to manually rename these files to
linux-202105200900x86-foo.direct.tar.gz
( insert a short string x86 after date/time )
any bash script can help to do this?
If you're always inserting the string "x86" after the first 18 characters of the string, you may use this command:
var="linux-202105200900-foo.direct.tar.gz"
var2=${var:0:18}"x86"${var:18}
echo $var2
The 2nd line means: "assign to variable var2 the first 18 characters of var, followed by x86 followed by the rest of the variable var"
If you want to insert "x86" just before the last hyphen in the string, you may write it like this:
var="linux-202105200900-foo.direct.tar.gz"
var2=${var%-*}"x86-"${var##*-}
echo $var2
The 2nd line means: assign to variable var2 the concatenation of:
the content of the variable var after removing the shortest match of the pattern "-*" at the end,
the string "x86-",
the content of the variable var after removing the longest match of the pattern "*-" at the beginning.
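The same expansion can be wrapped in a small function. This is only a sketch: insert_before_last_hyphen is a hypothetical helper name, and the guard for names without a hyphen is an added assumption about what should happen in that case.

```bash
# Insert a tag just before the last hyphen of a name.
insert_before_last_hyphen() {
    name=$1
    tag=$2
    case $name in
        *-*) ;;                                 # at least one hyphen: carry on
        *) printf '%s\n' "$name"; return 0 ;;   # no hyphen: leave the name alone
    esac
    head=${name%-*}     # everything before the last hyphen
    tail=${name##*-}    # everything after the last hyphen
    printf '%s%s-%s\n' "$head" "$tag" "$tail"
}

insert_before_last_hyphen 'linux-202105200900-foo.direct.tar.gz' 'x86'
# prints: linux-202105200900x86-foo.direct.tar.gz
```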
In addition to the very good answer by @Jean-Loup Sabatier, another, perhaps more general, way is simply to replace the second occurrence of '-' with x86-, which you can do with sed. Let's say you have:
fname=linux-202105200900-foo.direct.tar.gz
You can update that with:
fname="$(sed 's/-/x86-/2' <<< "$fname")"
Which simply uses a command substitution with sed and a herestring to modify fname assigning the modified result back to fname.
Example Use/Output
$ fname=linux-202105200900-foo.direct.tar.gz
$ fname="$(sed 's/-/x86-/2' <<< "$fname")"
$ echo $fname
linux-202105200900x86-foo.direct.tar.gz
Do you need this?
❯ dat=$(date '+%Y%m%d%H%M%S'); echo ${dat}
20210520170336
❯ filename="linux-${dat}x86-foo.direct.tar.gz"; echo ${filename}
linux-20210520170336x86-foo.direct.tar.gz
I wanted to keep it as simple as possible. Considering that only the timestamp is going to change, this script should do it. Just run it inside the folder where the files are located and you'll get all of them renamed with x86.
#!/bin/bash
for file in *-foo*; do
replaced=$(echo "$file" | sed 's|-foo|x86-foo|g')
mv -- "$file" "$replaced"
done
This is my output
filip@filip-ThinkPad-T14-Gen-1:~/test$ ls
linux-202105200900-foo.direct.tar.gz linux-202105201000-foo.direct.tar.gz linux-202105201100-foo.direct.tar.gz
filip@filip-ThinkPad-T14-Gen-1:~/test$ ./../development/bash-utils/bulk-rename.sh
filip@filip-ThinkPad-T14-Gen-1:~/test$ ls
linux-202105200900x86-foo.direct.tar.gz linux-202105201000x86-foo.direct.tar.gz linux-202105201100x86-foo.direct.tar.gz
It simply iterates over the files in the current folder, pipes each name to sed to replace -foo with x86-foo, and then renames the file with the mv command.
As David mentioned in a comment, if you're worried that there could be multiple occurrences of -foo, you can replace the g (global) flag with 1 (first occurrence only) and that's it!
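To illustrate the difference between the two sed flags, here is a small sketch using a made-up file name in which -foo occurs twice (the name is an assumption for demonstration, not one of the real files).

```bash
# A made-up name in which "-foo" occurs twice.
name='linux-202105200900-foo-foo.tar.gz'

all=$(printf '%s\n' "$name" | sed 's|-foo|x86-foo|g')     # every occurrence
first=$(printf '%s\n' "$name" | sed 's|-foo|x86-foo|1')   # first occurrence only

printf '%s\n' "$all" "$first"
# prints:
# linux-202105200900x86-foox86-foo.tar.gz
# linux-202105200900x86-foo-foo.tar.gz
```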
There is also the rename utility (https://man7.org/linux/man-pages/man1/rename.1.html), you could use:
rename -v 0-foo.direct.tar.gz 0x86-foo.direct.tar.gz *
which results in
`linux-202105200900-foo.direct.tar.gz' -> `linux-202105200900x86-foo.direct.tar.gz'
`linux-202205200900-foo.direct.tar.gz' -> `linux-202205200900x86-foo.direct.tar.gz'
`linux-202305200900-foo.direct.tar.gz' -> `linux-202305200900x86-foo.direct.tar.gz'
In addition to the very good answer by @David C. Rankin, here it is in a loop that renames the files:
#!/usr/bin/bash
for file in linux* # All files starting with linux
do
[ -e "$file" ] || continue # Skip the unexpanded pattern when nothing matches
echo "$file"
fname="$(sed 's/-/x86-/2' <<< "$file")"
mv "$file" "$fname" # Rename file
done
Output received:
linux-202105200900x86-foo.direct.tar.gz

Echo to all files found from GREP

I'm having trouble with my code.
grep looks for files that don't contain the word 'code',
and I need to add 'doesn't have' as the last line of those files.
My logic is:
echo 'doesnt have' >> grep -ril 'code' file/file
I'm using -ril to ignore case and get the file names.
Does anyone know how to append text to each .txt file found by a grep search?
How's this for a novel alternative?
echo "doesn't have" |
tee -a $(grep -riL 'code' file/file)
I switched to the -L option to list the files which do not contain the search string.
This is unfortunately rather brittle in that it assumes your file names do not contain whitespace or other shell metacharacters. This can be fixed at the expense of some complexity. (Briefly: have grep output zero-terminated matches, and read them into an array with readarray -d ''. This requires a reasonably recent Bash, and probably GNU grep.)
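A minimal sketch of that more robust variant follows. It assumes Bash 4.4 or later (for readarray -d) and GNU grep (for -Z); append_to_nonmatching is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Append a line to every file under a directory that does NOT contain a pattern,
# coping with spaces and other unusual characters in file names.
append_to_nonmatching() {
    local pattern=$1 dir=$2 text=$3
    local -a files
    # -L lists files without a match; -Z ends each name with a NUL byte.
    readarray -t -d '' files < <(grep -riLZ -- "$pattern" "$dir")
    # Nothing to do when every file matched.
    (( ${#files[@]} )) || return 0
    printf '%s\n' "$text" | tee -a -- "${files[@]}" > /dev/null
}
```

It could be called as append_to_nonmatching 'code' file/file "doesn't have".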
The echo command can append output to only a single file, which must be specified by redirecting standard output. To update multiple files, a loop is needed. The loop iterates over all the files found with grep (-L lists the files that do not contain the pattern):
for file in $(grep -riL 'code' file/file) ; do
echo 'doesnt have' >> "$file"
done
Using a while read loop and process substitution (-L lists the files that do not contain the pattern):
#!/usr/bin/env bash
while IFS= read -r files; do
echo "doesn't have" >> "$files"
done < <(grep -riL 'code' file/file)
As mentioned by @dibery, a pipe does the same job in plain sh:
#!/bin/sh
grep -riL 'code' file/file | {
while IFS= read -r files; do
echo "doesn't have" >> "$files"
done
}

Create file with egrep matches and file names

I need some help...
I'm creating a script of unit tests using shell scripts. That script stores all the beeline calls from all files inside a directory.
The script is doing its job, but I don't want to append the file name if grep does not return results.
That's my code:
for file in $(ls)
do
cat $file | egrep -on '^( +)?\bbeeline.*password=;"?' >> testa_scripts.sh
echo $file >> testa_scripts.sh
done
How can I do that?
Thanks
grep returns a non-zero exit status (1) if it doesn't find any matching lines, so you can use an if statement to test whether it matched anything. The test is inverted with ! here:
for file in ./*; do
if ! egrep -on '...' "$file" >> somefile; then
echo 'grep did not match anything'
fi
done
(I don't think there's any need for the ls instead of just a shell glob here.)
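Applied to the original goal (append the file name only when grep found something), the same exit-status test could be used without the inversion. The sketch below uses the plain pattern 'beeline' as a simplified stand-in for the question's expression, and collect_matches is a hypothetical helper name. The output file should live outside the scanned directory so that grep never reads its own output.

```bash
# Append matching lines, then the file name, only for files that contain a match.
collect_matches() {
    dir=$1
    out=$2
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # grep exits with status 0 only when at least one line matched.
        if grep -En 'beeline' "$file" >> "$out"; then
            printf '%s\n' "$file" >> "$out"
        fi
    done
}
```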

Changing the file names and copying into different directory

I have about 1000 files. I want to rename them by cutting a few characters out of each file name, and copy them to another directory.
Example of the original file names:
vfcon062562~19.xml
vfcon058794~29.xml
vfcon072009~3.xml
vfcon071992~10.xml
vfcon071986~2.xml
vfcon071339~4.xml
vfcon069979~43.xml
The required output cuts the ~ and the characters that follow it, up to the extension.
Example output:
vfcon058794.xml
vfcon062562.xml
vfcon069979.xml
vfcon071339.xml
vfcon071986.xml
vfcon071992.xml
vfcon072009.xml
But I want to place them in a different directory.
If you are using bash or similar you can use the following simple loop:
for input in vfcon*xml
do
mv "$input" "targetDir/$(echo "$input" | awk -F~ '{print $1".xml"}')"
done
Or in a single line:
for input in vfcon*xml; do mv "$input" "targetDir/$(echo "$input" | awk -F~ '{print $1".xml"}')"; done
This uses awk to separate everything before ~ using it as a field separator and printing the first column and appending ".xml" to create the output file name. All this is prepended with the targetDir which can be a full path.
If you are using csh / tcsh then the syntax of the loop will be slightly different but the commands will be the same.
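As an alternative to awk, the same name can be built with shell parameter expansion alone. The sketch below is one possible way to do it and copies rather than moves, as the question asks. The helper names strip_tilde_suffix and copy_stripped are hypothetical, the target directory is assumed to exist already, and two inputs that map to the same output name would overwrite each other.

```bash
# Build the target name by dropping everything from "~" up to the extension.
# Assumes the name contains a "~" and ends in an extension.
strip_tilde_suffix() {
    name=$1
    base=${name%%"~"*}   # everything before the first "~"
    ext=${name##*.}      # everything after the last "."
    printf '%s.%s\n' "$base" "$ext"
}

# Copy every matching file in the current directory into target_dir.
copy_stripped() {
    target_dir=$1
    for input in vfcon*~*.xml; do
        [ -e "$input" ] || continue   # nothing matched the pattern
        cp -- "$input" "$target_dir/$(strip_tilde_suffix "$input")"
    done
}
```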
I like to make sure that my data set is correct before changing anything, so I would put the list into a variable first and check it over.
files=$(ls vfcon*xml)
echo $files | less
Then, like @Stefan said, use a loop:
for i in $files
do
mv "$i" "$(echo "$i" | sed 's/~[0-9]*//')"
done
If you need help with bash you can use http://www.shellcheck.net/

Linux shell script to add leading zeros to file names

I have a folder with about 1,700 files. They are all named like 1.txt or 1497.txt, etc. I would like to rename all the files so that all the filenames are four digits long.
I.e., 23.txt becomes 0023.txt.
What is a shell script that will do this? Or a related question: How do I use grep to only match lines that contain \d.txt (i.e., one digit, then a period, then the letters txt)?
Here's what I have so far:
for a in [command i need help with]
do
mv $a 000$a
done
Basically, run that three times, with commands there to find one digit, two digits, and three digit filenames (with the number of initial zeros changed).
Try:
for a in [0-9]*.txt; do
mv "$a" "$(printf '%04d.%s' "${a%.*}" "${a##*.}")"
done
Change the filename pattern ([0-9]*.txt) as necessary.
A general-purpose enumerated rename that makes no assumptions about the initial set of filenames:
X=1
for i in *.txt; do
mv "$i" "$(printf '%04d.%s' "$X" "${i##*.}")"
X=$((X + 1))
done
On the same topic:
Bash script to pad file names
Extract filename and extension in bash
Using the rename (prename in some cases) script that is sometimes installed with Perl, you can use Perl expressions to do the renaming. The script skips renaming if there's a name collision.
The command below renames only files that have four or fewer digits followed by a ".txt" extension. It does not rename files that do not strictly conform to that pattern. It does not truncate names that consist of more than four digits.
rename 'unless (/0+[0-9]{4}.txt/) {s/^([0-9]{1,3}\.txt)$/000$1/g;s/0*([0-9]{4}\..*)/$1/}' *
A few examples:
Original     Becomes
1.txt        0001.txt
02.txt       0002.txt
123.txt      0123.txt
00000.txt    00000.txt
1.23.txt     1.23.txt
Other answers given so far will attempt to rename files that don't conform to the pattern, produce errors for filenames that contain non-digit characters, perform renames that produce name collisions, try and fail to rename files that have spaces in their names and possibly other problems.
for a in *.txt; do
b=$(printf %04d.txt "${a%.txt}")
if [ "$a" != "$b" ]; then
mv "$a" "$b"
fi
done
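One thing to watch with printf %04d is that a name with leading zeros, such as 08.txt, is read as an octal number and rejected. The sketch below strips leading zeros first and skips names that are not purely digits; pad4 and pad_txt_names are hypothetical helper names, and no attempt is made to detect collisions with existing files.

```bash
# Zero-pad a decimal number to four digits, tolerating existing leading zeros.
pad4() {
    n=$1
    # Strip leading zeros one at a time ("08" -> "8", "0" -> "").
    while [ "${n#0}" != "$n" ]; do n=${n#0}; done
    printf '%04d\n' "${n:-0}"   # fall back to 0 if nothing is left
}

# Rename every all-digit NAME.txt in a directory to its four-digit form.
pad_txt_names() {
    dir=$1
    for a in "$dir"/*.txt; do
        base=${a##*/}
        num=${base%.txt}
        case $num in
            ''|*[!0-9]*) continue ;;   # skip names that are not purely digits
        esac
        b=$(pad4 "$num").txt
        [ "$base" = "$b" ] || mv -- "$a" "$dir/$b"
    done
}
```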
One-liner:
ls | awk '/^([0-9]+)\.txt$/ { printf("%s %04d.txt\n", $0, $1) }' | xargs -n2 mv
How do I use grep to only match lines that contain \d.txt (IE 1 digit, then a period, then the letters txt)?
grep -E '^[0-9]\.txt$'
Let's assume you have files with the .dat extension in your folder. Just copy this code to a file named run.sh, make it executable by running chmod +x run.sh and then execute it using ./run.sh:
#!/bin/bash
num=0
for i in *.dat
do
a=$(printf "%05d" "$num")
mv "$i" "filename_$a.dat"
num=$((num + 1))
done
This will convert all files in your folder to filename_00000.dat, filename_00001.dat, etc.
This version also handles text before (or after) the number. Basically, you can do any regex matching plus printf, as long as your awk supports it: the three-argument match() used here is a GNU awk (gawk) extension. It also supports whitespace characters (except newlines) in filenames.
for f in *.txt ;do
mv "$f" "$(
awk -v f="$f" '{
if ( match(f, /^([a-zA-Z_-]*)([0-9]+)(\..+)/, a)) {
printf("%s%04d%s", a[1], a[2], a[3])
} else {
print(f)
}
}' <<<''
)"
done
To match only single-digit text files, you can do:
$ ls | grep '^[0-9]\.txt$'
One-liner hint:
while [ -f ./result/result`printf "%03d" $a`.txt ]; do a=$((a+1));done
RESULT=result/result`printf "%03d" $a`.txt
To provide a solution that's cautiously written to be correct even in the presence of filenames with spaces:
#!/usr/bin/env bash
pattern='%04d%s' # pad the number to four digits, then append the suffix: change this to taste
# enable extglob syntax: +([[:digit:]]) means "one or more digits"
# enable the nullglob flag: If no matches exist, a glob returns nothing (not itself).
shopt -s extglob nullglob
for f in [[:digit:]]*; do # iterate over filenames that start with digits
suffix=${f##+([[:digit:]])} # find the suffix (everything after the leading digits)
number=${f%"$suffix"} # find the number (everything before the suffix)
printf -v new "$pattern" "$((10#$number))" "$suffix" # pad the number (read as base 10), then append the suffix
if [[ $f != "$new" ]]; then # if the result differs from the old name
mv -- "$f" "$new" # ...then rename the file.
fi
done
The rename.ul command from the util-linux package is installed by default (at least in Ubuntu).
Its usage is (see man rename.ul):
rename [options] expression replacement file...
The command will replace the first occurrence of expression with the given replacement for the provided files.
While forming the command you can use:
rename.ul -nv replace-me with-this in-all?-these-files*
to see what changes the command would make without actually making them. When you are sure, re-run the command without the -v (verbose) and -n (no-act) options.
For your case the commands are:
rename.ul "" 000 ?.txt
rename.ul "" 00 ??.txt
rename.ul "" 0 ???.txt

Resources