Batch renaming files using a variable extracted from file text - Linux

Apologies if this has been answered, but I've spent several hours experimenting and searching online for the solution.
I have a folder with several thousand text files named e.g. '1.dat', '2.dat', '3.dat' etc. I would like to rename all of the files by extracting an 8-digit numerical ID from within the file text (the ID is always on the last line, in columns 65-73), so that '1.dat' becomes '60741308.dat' etc.
I have made it as far as extracting the ID (using tail and cut) from a single file and assigning it to a variable, which I can then use to rename the file, but I am unable to make it work as a batch process in a 'for' loop.
Here is what I have tried:
for i in *.dat
tmpname=$(tail -1 $i| cut -c 65-73)
mv $i $tmpname.dat
done
I get the following error: bash: syntax error near unexpected token `tmpname=$(tail -1 $i| cut -c 65-73)'
Any help much appreciated.

The syntax of a for loop in Bash is:
for i in {1..10}
do
echo $i
done
I can see that you are missing the do keyword in your example. So the correct version would be:
for i in *.dat
do
tmpname=$(tail -1 "$i" | cut -c 65-73)
mv "$i" "$tmpname.dat"
done
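As a defensive refinement of my own (not part of the original answer), you could skip files whose last line does not yield a numeric ID, and use mv -n so that a duplicate ID never silently overwrites an earlier result. A minimal sketch:
for i in *.dat
do
tmpname=$(tail -n 1 "$i" | cut -c 65-73)
# only rename if the extracted field really is numeric; -n refuses to overwrite
if [[ $tmpname =~ ^[0-9]+$ ]]; then
mv -n "$i" "$tmpname.dat"
else
echo "skipping $i: no ID found" >&2
fi
done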

Related

Find and copy specific files by date

I've been trying to get a script working to back up some files from one machine to another, but have been running into an issue.
Basically what I want to do is copy two files, one .log and one (or more) .dmp. Their format is always as follows:
something_2022_01_24.log
something_2022_01_24.dmp
I want to do three things with these files:
find the second-to-last .log file (i.e. something_2022_01_24.log is the latest, I want to find the one before that, say something_2022_01_22.log)
get a substring with just the date (2022_01_22)
copy every .dmp that matches the date (i.e. something_2022_01_22.dmp, something01_2022_01_22.dmp)
For the first one, from what I could find, the best way is to do: ls -t *.log | head -2, as its second line is the second-to-last file created.
As for the second one I'm more at a loss because I'm not sure how to parse the output of the first command.
The third one I think I could manage with something of the sort:
[ -f "/var/www/my_folder/*$capturedate.dmp" ] && cp "/var/www/my_folder/*$capturedate.dmp" /tmp/
What do you guys think, is there any way to do this? How can I compare the substring?
Thanks!
Would you please try the following:
#!/bin/bash
dir="/var/www/my_folder"
second=$(ls -t "$dir/"*.log | head -n 2 | tail -n 1)
if [[ $second =~ .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log ]]; then
capturedate=${BASH_REMATCH[1]}
cp -p "$dir/"*"$capturedate".dmp /tmp
fi
second=$(ls -t "$dir"/*.log | head -n 2 | tail -n 1) picks the
second-newest log file. Please note it assumes that the files'
timestamps have not been modified since creation and that the filenames
do not contain special characters such as a newline. This is an easy
solution and may need further hardening for robustness.
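For instance, a more robust way to pick the second-newest log (a sketch of my own, assuming GNU find and filenames without newlines) avoids parsing ls output altogether:
second=$(find "$dir" -maxdepth 1 -name '*.log' -printf '%T@ %p\n' | sort -rn | sed -n '2p' | cut -d' ' -f2-)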
The regex .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log matches the log
filename. It captures the date substring (enclosed in the parentheses), which bash
makes available in the variable ${BASH_REMATCH[1]}.
Then the next cp command does the job. Please be careful
not to include the wildcard * within the double quotes, so that
the wildcard is properly expanded.
FYI here are some alternatives to extract the date string.
With sed:
capturedate=$(sed -E 's/.*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log/\1/' <<< "$second")
With parameter expansion of bash (if something does not include underscores; the directory part is stripped first, since the path itself may contain them):
capturedate=${second##*/}
capturedate=${capturedate%.log}
capturedate=${capturedate#*_}
With the cut command (if something does not include underscores):
capturedate=$(basename "$second" .log | cut -d_ -f2,3,4)
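Or, as a further alternative of my own, with grep (assuming the date pattern occurs only once in the path):
capturedate=$(grep -oE '[0-9]{4}_[0-9]{2}_[0-9]{2}' <<< "$second")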

Rename large folder of Jpegs

I have a large folder of jpegs, which I would like to rename sequentially to image01.jpg, image02.jpg ... image533.jpg etc.
I have tried using the following
find ‘/myImages/‘ -maxdepth 1 -name ‘*.jpg’ | sort -n | awk 'BEGIN{ x=1 }{printf "mv \"%s\" \”/myImages/image%04d.jpg\”\n”, $0, x++ }' | bash
which I got from here: http://www.algissalys.com/how-to/how-to-quickly-rename-modify-and-scale-all-images-in-a-directory-using-linux
However, this is only returning
>
And then nothing happens. Any suggestions would be great.
The easiest way to do that is with rename, which you can install with homebrew using:
brew install rename
Then, you can go into your directory containing the images and run:
rename --dry-run -X -e '$_ = "$N"' *jpg
Sample Output
'a.jpg' would be renamed to '1.jpg'
'article.jpg' would be renamed to '2.jpg'
'blob-0.jpg' would be renamed to '3.jpg'
'blob-1.jpg' would be renamed to '4.jpg'
'blob-2.jpg' would be renamed to '5.jpg'
'blob-3.jpg' would be renamed to '6.jpg'
If that looks correct, you can run it again without the --dry-run to actually do it, rather than just telling you what it will do.
If you want your names zero-padded, the easiest is to let rename work out how much padding you need automatically like this:
rename --dry-run -X -N ...01 -e '$_ = "$N"' *jpg
The benefits of using rename are that:
it is simple and powerful
it will warn you before overwriting any files
it can do a dry run and tell you what would happen without actually doing anything
If you want an explanation of the command '$_ = "$N"' then read on...
The rename command is actually a Perl script, so the part I mention above is just a Perl script enclosed in single quotes. The $N is just a Perl variable that expands to be a sequentially increasing number. The Perl special variable $_ is filled with the name of the current file before your little Perl script is executed, and crucially, you are expected to set it to the name you want that input file renamed as.
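Since the -e argument is ordinary Perl, you can also format the counter yourself rather than using -N; for instance (my own sketch, relying on the same $N variable the examples above already use):
rename --dry-run -e '$_ = sprintf "image%03d.jpg", $N' *jpg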
You could do that with a bash script. Say you have the following in a file called rename_images.
#!/bin/bash
declare -a FILESERIES
FILESERIES=( $1 )              # expand the quoted glob passed as the first argument
NUM=${#FILESERIES[@]}          # number of files matched
NEWNAME=$2
EXT=$3
for (( i=0; i<NUM; i++ ))
do
FI=${FILESERIES[$i]}
NEWFILENAME="$NEWNAME$i$EXT"
mv "$FI" "$NEWFILENAME"
done
To do what you need, run the script from within the folder with all the images as follows:
./rename_images '*.jpg' image .jpg
And you should be sorted.
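One caveat worth adding: the script numbers the files image0.jpg, image1.jpg and so on. If you want the zero-padded names from the question (image01.jpg etc.), the NEWFILENAME assignment could use printf instead, for example:
NEWFILENAME=$(printf '%s%02d%s' "$NEWNAME" "$i" "$EXT")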

Bulk change file names in a directory - Shell

I tried to find something close to what I need, but ended up with bits and pieces from many questions here, and, obviously, my code doesn't work.
I've never programmed anything by myself, and have close to zero knowledge in programming.
What I'm trying to do, is to rename a bunch of files I have in 2 different directories, so that files in both have the same name, with no space chars.
For example:
~/Documents/Dir1/1.pdf instead of: ~/Documents/Dir1/file A.pdf
~/Documents/Dir2/1.pdf instead of: ~/Documents/Dir2/file A.pdf
This was the extent of my abilities:
#!/bin/bash
b4file=$1
c=0
for i in $b4file do
c=$((c+1))
pref=$(printf "%03d" $c)
mv "$i" "${pref}|$i"
done
The error I get is
mv.sh: line 7: syntax error near unexpected token `c=$((c+1))'
mv.sh: line 7: ` c=$((c+1))'
A for loop reads the files one by one; you also need a ; (or a newline) before do, and a glob to expand the directory contents. The code below removes your error:
#!/bin/bash
b4file=$1
c=0
for i in "$b4file"/*; do
c=$((c+1))
pref=$(printf "%03d" $c)
mv "$i" "$b4file/${pref}|${i##*/}"    # prefix the bare filename, keeping the file in its directory
done
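Run it once for each of the two directories, e.g.:
./mv.sh ~/Documents/Dir1
./mv.sh ~/Documents/Dir2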

Changing the file names and copying into different directory

I have some files, about 1000 in number. I want to rename those files by cutting a few chars out of the file name, and copy them to some other directory.
Ex: original file names:
vfcon062562~19.xml
vfcon058794~29.xml
vfcon072009~3.xml
vfcon071992~10.xml
vfcon071986~2.xml
vfcon071339~4.xml
vfcon069979~43.xml
The required output cuts the ~ and the following chars.
O/P Ex:
vfcon058794.xml
vfcon062562.xml
vfcon069979.xml
vfcon071339.xml
vfcon071986.xml
vfcon071992.xml
vfcon072009.xml
But I want to place them in a different directory.
If you are using bash or similar you can use the following simple loop:
for input in vfcon*xml
do
mv "$input" "targetDir/$(echo "$input" | awk -F~ '{print $1".xml"}')"
done
Or in a single line:
for input in vfcon*xml; do mv "$input" "targetDir/$(echo "$input" | awk -F~ '{print $1".xml"}')"; done
This uses awk with ~ as the field separator to take everything before the first ~ (the first field) and append ".xml", creating the output file name. All this is prefixed with targetDir, which can be a full path.
If you are using csh / tcsh then the syntax of the loop will be slightly different but the commands will be the same.
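As a side note of my own: bash parameter expansion can do the same trimming without spawning awk, e.g.:
for input in vfcon*xml; do mv "$input" "targetDir/${input%%~*}.xml"; done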
I like to make sure that my data set is correct prior to changing anything so I would put that into a variable first and then check over it.
files=$(ls vfcon*xml)
echo $files | less
Then, like @Stefan said, use a loop:
for i in $files
do
mv "$i" "$(echo "$i" | sed 's/~[0-9]*//')"
done
If you need help with bash you can use http://www.shellcheck.net/

Unable to cat ~9000 files using command line

I am trying to cat ~9000 FASTA-like files into one larger file. All of the files are in a single subfolder. I keep getting the "argument list too long" error.
This is a sample name from one of the files
efetch.fcgi?db=nuccore&id=CL640905.1&rettype=fasta&retmode=text
They are considered a document type file by the computer.
You can't use cat * > concatfile as you have limits on command line size. So take them one at a time and append:
ls | while read; do cat "$REPLY" >> concatfile; done
(Make sure concatfile doesn't exist beforehand.)
EDIT: As user6292850 rightfully points out, I might be overthinking it. This suffices, if your files don't have too weird names:
ls | xargs cat > concatfile
(but files with spaces in them, for example, would blow it up)
There is a limit on how many arguments you can place on the command line.
You could use a while loop to handle this:
while read file; do
cat "${file}" >> /path/to/output_file;
done < <(find /path/to/input_folder -maxdepth 1 -type f -print)
This will bypass the problem of an expanded glob with too many arguments.
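A simpler variant of the same idea (my sketch, using find's standard -exec ... + batching) lets find split the file list into safely sized cat invocations, writing the result outside the folder being read:
find /path/to/input_folder -maxdepth 1 -type f -exec cat {} + > /path/to/output_file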
