Extracting numbers from filenames - linux

I have a bunch of files that all have a name and a serial number and an extension. I want to extract this serial number and extension. They look like this:
photo-123.jpg
photo-456.png
photo-789.bmp
etc.
I want to run a bash script to extract these serial numbers and place them in a file in this way:
123
456
789
etc.
Note that not all the photos have the same extension (bmp, png, jpg) but they all start with photo-.

You can use parameter expansion:
$ ls
photo-123.jpg photo-456.png photo-7832525239.bmp photo-789.bmp
$ for file in *; do
[[ -f "$file" ]] || continue
[[ $file == "num.log" ]] && continue
file=${file%.*} && echo "${file#*-}"
done > num.log
$ ls
num.log photo-123.jpg photo-456.png photo-7832525239.bmp photo-789.bmp
$ cat num.log
123
456
7832525239
789
${parameter#word} removes the shortest match from the start and ${parameter##word} removes the longest match from the start. Conversely, ${parameter%word} removes the shortest match from the end and ${parameter%%word} removes the longest match from the end.
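The four operators side by side, on a made-up name with several dashes and dots (the value of name is only an illustration):

```shell
#!/usr/bin/env bash
# Hypothetical name with two '-' and two '.' so that the shortest and
# longest matches differ.
name='photo-2021-123.tar.gz'

echo "${name#*-}"    # shortest prefix up to the first '-': 2021-123.tar.gz
echo "${name##*-}"   # longest prefix up to the last '-':   123.tar.gz
echo "${name%.*}"    # shortest suffix from the last '.':   photo-2021-123.tar
echo "${name%%.*}"   # longest suffix from the first '.':   photo-2021-123
```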
Alternatively, instead of checking that each file exists (which guards against an empty directory, where * would otherwise stay unexpanded), you can read about the nullglob shell option. (Thanks to Adrian Frühwirth for the great feedback.)
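A minimal sketch of the nullglob variant, assuming every file of interest matches photo-*.*. Because the glob only matches photo files, num.log is never picked up, and with nullglob an empty directory simply produces an empty file. The helper name extract_serials is hypothetical:

```shell
#!/usr/bin/env bash
# With nullglob, a pattern that matches nothing expands to nothing,
# so the loop body never sees the literal string "photo-*.*".
shopt -s nullglob

# Hypothetical helper: print the serial number of every photo-*.* file.
extract_serials() {
  local file base
  for file in photo-*.*; do
    base=${file%.*}             # strip the extension: photo-123
    printf '%s\n' "${base#*-}"  # strip up to the first '-': 123
  done
}

# Demo in a throwaway directory with the question's sample names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch photo-123.jpg photo-456.png photo-789.bmp
extract_serials > num.log
cat num.log
```

Run shopt -u nullglob afterwards if the rest of your script relies on the default globbing behaviour.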

Using BASH regex:
f='photo-123.jpg'
[[ "$f" =~ -([0-9]+)\. ]] && echo "${BASH_REMATCH[1]}"
123
To run it against all the matching files:
for f in *-[0-9]*.*; do
[[ "$f" =~ -([0-9]+)\. ]] && echo "${BASH_REMATCH[1]}"
done
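Since the question also mentions the extension, the same regex can be given a second capture group. This is a sketch assuming names of the form <prefix>-<digits>.<ext>; split_name and re are illustrative names, not part of the answer above. Keeping the regex in a variable avoids quoting problems inside [[ ]]:

```shell
#!/usr/bin/env bash
# Regex kept in a variable and used unquoted on the right of =~.
# Group 1: the digits after '-'; group 2: the extension after the last '.'.
re='-([0-9]+)\.([^.]+)$'

# Hypothetical helper: print "<serial> <extension>", or fail if no match.
split_name() {
  if [[ $1 =~ $re ]]; then
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

split_name 'photo-123.jpg'         # prints: 123 jpg
split_name 'photo-7832525239.bmp'  # prints: 7832525239 bmp
```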

Assuming you just want to keep all of the numbers and you're using bash, here are a couple of things which you may find useful:
danny@machine:~$ file=abc123def.jpg
danny@machine:~$ echo "${file//[^0123456789]/}"
123
danny@machine:~$ echo "${file##*.}"
jpg
danny@machine:~$ echo "${file//[^0123456789]/}.${file##*.}"
123.jpg
You should be able to write your script based on that. Or, just remove the leading "photo-" from $name by using
newname=${name#photo-}
Those and several others are explained in the bash man page's Parameter Expansion section.
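Combining the prefix and suffix removals gives a complete extraction. A sketch assuming every name starts with photo- and has a single extension; serials_from_names is a hypothetical helper that takes the file names as arguments (in a real directory, call it as serials_from_names photo-*.* > num.log):

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the "photo-" prefix and the extension from
# each name given as an argument and print what is left.
serials_from_names() {
  local name newname
  for name in "$@"; do
    newname=${name#photo-}          # photo-123.jpg -> 123.jpg
    printf '%s\n' "${newname%.*}"   # 123.jpg -> 123
  done
}

serials_from_names photo-123.jpg photo-456.png photo-789.bmp
```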

Or maybe with two consecutive awk calls:
ls -1 | awk -F- '{print $2}' | awk -F. '{print $1}'
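The two awk calls can also be folded into one by using a bracket expression as the field separator, so that both - and . split fields. A sketch, assuming names contain exactly one - before the number; the names are passed as arguments with printf instead of being parsed from ls, and serials_awk is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one awk call; -F'[-.]' splits on '-' and on '.',
# so "photo-123.jpg" becomes the fields photo, 123 and jpg.
serials_awk() {
  printf '%s\n' "$@" | awk -F'[-.]' '{ print $2 }'
}

serials_awk photo-123.jpg photo-456.png photo-789.bmp
```

In a real directory you would call it as serials_awk photo-*.* instead.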

How about
ls -l | awk '{print $9}' | grep -o -E '[0-9]+'
in the directory where the files reside?

Related

Script to remove spaces in all files and folders?

I wrote a script which removes spaces from a single folder or file name. I want it to remove all spaces from every file and folder name in the directory where the script is located.
My script:
#!/bin/bash
var=$(ls | grep " ")
test=$(echo $var | sed 's/ //')
mv "$var" $test
Thank you for helping!

Try this
ls | grep " " | while IFS= read -r file_name
do
mv "$file_name" "$(echo "$file_name" | sed -E 's/ +//g')"
done
sed -E enables extended regular expressions, so / +/ matches a run of one or more consecutive spaces, and the g flag replaces every occurrence in the name rather than only the first one.
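The difference between the question's sed 's/ //' and the answer's sed -E 's/ +//g' is easy to see on a sample name (the name itself is made up):

```shell
#!/usr/bin/env bash
name='my  holiday photo .txt'   # hypothetical name with a double space

printf '%s\n' "$name" | sed 's/ //'       # my holiday photo .txt (first space only)
printf '%s\n' "$name" | sed -E 's/ +//g'  # myholidayphoto.txt
```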

Something like this might work:
for f in * ; do
if [[ "$f" =~ \ ]] ; then
mv "$f" "${f// /_}"
fi
done
Explanation:
for f in * ; do
loops over all file names in the directory. It doesn't have the quirks of ls that make parsing the output of ls a bad idea.
if [[ "$f" =~ \ ]] ; then
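This is the bash way of pattern matching (the =~ regular-expression operator). The pattern is a single space, and you need to escape it with a backslash (a backslash followed by a space); otherwise the shell treats the space as a word separator instead of as the pattern.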
mv "$f" "${f// /_}"
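${f// /_} is the bash way of pattern substitution. The // means replace all occurrences: the syntax ${variable//pattern/replacement} replaces every match of pattern in the variable with replacement. To delete the spaces instead of turning them into underscores, leave the replacement empty: ${f// /}.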
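Since the question asks for spaces to be removed rather than replaced, here is a sketch that applies an empty replacement to every file and folder directly inside one directory. It is not recursive, it refuses to overwrite an existing name, and strip_spaces is a hypothetical helper that defaults to the current directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every space from the names of the files and
# folders directly inside one directory (default: the current one).
strip_spaces() {
  local dir=${1:-.} f base target
  for f in "$dir"/*; do
    base=${f##*/}                       # name without the directory part
    [[ $base == *" "* ]] || continue    # no space: nothing to do
    target=$dir/${base// /}             # empty replacement deletes spaces
    if [[ -e $target ]]; then
      printf 'skipping %s: %s already exists\n' "$f" "$target" >&2
      continue
    fi
    mv -- "$f" "$target"
  done
}

# Demo on a throwaway directory.
demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/c  d.txt"
mkdir "$demo/my folder"
strip_spaces "$demo"
ls "$demo"
```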

how to not escape space and backslashes in echo and while in bash?

I'm passing two positional arguments to a script, and both arguments are paths. While analyzing the paths, the problem is that some of them look like m i sc . . . . .. . . , i.e. they contain dots and spaces, and sometimes even backslashes in directory names.
I have tried to get the arguments in two ways: directly, and via the at sign ($@).
SOURCE_ARG=$1
DESTINATION_ARG=$2
and
ARG_COUNT=0
for POSITIONAL_ARGUMENTS in "${@}"
do
((ARG_COUNT++))
ARGUMENT_ARRAY[$ARG_COUNT]=$POSITIONAL_ARGUMENTS
done
In the loop, I iterate over the output of the commands that are fed into it:
while IFS= read -r dir
do
echo "${ARGUMENT_ARRAY[1]}"
echo "${dir}"
while IFS= read -r item
do
# do some stuff
done < <(ls -A "$dir"/)
done < <(du -hP "$SOURCE_ARG" | awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g")
When I use echo "${ARGUMENT_ARRAY[1]}" I get exactly the path I need to check, but when I use the loop variable dir, as in echo "${dir}", all the spaces come out escaped, so the other commands cannot work with that path.
What I'm asking is: how can I get $dir inside the loop to come out the same way as echo "${ARGUMENT_ARRAY[1]}" above, with all its spaces and backslashes intact?

Thanks to @Barmar in the comments.
The names came out escaped only because du prints the file names with escapes, so $dir already holds the escaped form and the original special characters are no longer available to the inner loop.
Now that we know the problem was caused by using du in my script:
while IFS= read -r dir
do
    : # do sth
done < <(du -hP "$SOURCE_ARG" | awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g")
We can change the du to find and the problem is solved:
while IFS= read -r dir
do
    : # do sth
done < <(find "$SOURCE_ARG" -type d)
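find still prints one name per line, so a directory name containing a newline would be split in two. A sketch of a NUL-delimited version, assuming a find that supports -print0 (GNU and BSD find both do); list_dirs is a hypothetical helper, and the printf stands in for the real per-directory work:

```shell
#!/usr/bin/env bash
# Each directory name is terminated by a NUL byte instead of a newline,
# so names with spaces, backslashes or newlines reach $dir unchanged.
# Hypothetical helper; SOURCE_ARG mirrors the variable in the question.
list_dirs() {
  local SOURCE_ARG=$1 dir
  while IFS= read -r -d '' dir; do
    printf '%s\n' "$dir"      # replace with the real per-directory work
  done < <(find "$SOURCE_ARG" -type d -print0)
}

# Demo on a throwaway tree with awkward (hypothetical) names.
tmp=$(mktemp -d)
mkdir -p "$tmp/m i sc . . ." "$tmp"/'back\slash'
list_dirs "$tmp"
```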
PS 1:
Another problem, which came up when I printed the lines to check whether they were OK (i.e. while debugging the script), was with echo.
So be sure to try printf "%s\n" "$dir" instead of echo, as some versions of echo process escape sequences.
echo "${dir}"         # may interpret backslash escapes or a leading -n
printf "%s\n" "$dir"  # prints the value verbatim
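A small illustration of the difference, using hypothetical values. In bash, echo treats a leading -n as an option, and with the xpg_echo shell option enabled it also interprets backslash escapes, as some other shells' echo does by default; printf '%s\n' prints the value verbatim in both cases:

```shell
#!/usr/bin/env bash
show_echo()   { echo "$1"; }
show_printf() { printf '%s\n' "$1"; }

show_echo '-n'        # bash's echo takes -n as an option: prints nothing
show_printf '-n'      # prints: -n

shopt -s xpg_echo     # make bash's echo interpret backslash escapes
show_echo 'a\tb'      # prints: a<TAB>b
show_printf 'a\tb'    # %s never interprets escapes: prints a\tb
shopt -u xpg_echo
```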
PS 2:
Also, if a filename has more than one space in a row, the way I used awk collapses them into a single space:
awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g"
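If you want to keep du, here is a sketch of an alternative to the awk step: GNU du separates the size from the path with a TAB, so cut -f2- (TAB is cut's default delimiter) keeps the path exactly as printed, including runs of spaces. Names containing newlines still need the find approach. The sample line below is made up:

```shell
#!/usr/bin/env bash
# Hypothetical du output line: size, TAB, path with a double space.
line=$'4.0K\tmy  dir/sub'

# Assigning $1 makes awk rebuild $0 with single spaces between fields.
printf '%s\n' "$line" | awk '{$1=""; print $0}'   # " my dir/sub" (collapsed)
# cut keeps everything after the first TAB untouched.
printf '%s\n' "$line" | cut -f2-                  # "my  dir/sub"
```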

Add leading zeros twice in filename

I have file names (from image tiles) consisting of two numbers separated by an underscore, e.g.
142_27.jpg
7_39.jpg
1_120.jpg
How can I (in linux) add leading zeros to both of these numbers? What I want is the file names as
142_027.jpg
007_039.jpg
001_120.jpg

You can use a single awk command to format filenames with leading zeroes using a printf:
for f in *.jpg; do
echo mv "$f" "$(awk -F '[_.]' '{printf "%03d_%03d.%s", $1, $2, $3}' <<< "$f")"
done
This will output:
mv 142_27.jpg 142_027.jpg
mv 1_120.jpg 001_120.jpg
mv 7_39.jpg 007_039.jpg
Once you're satisfied with the output, remove the echo before the mv command.

With the Perl-based rename command:
$ touch 142_27.jpg 7_39.jpg 1_120.jpg
$ rename -n 's/\d+/sprintf "%03d", $&/ge' *.jpg
rename(1_120.jpg, 001_120.jpg)
rename(142_27.jpg, 142_027.jpg)
rename(7_39.jpg, 007_039.jpg)
The -n option does a dry run; remove it to actually rename the files.
If the Perl-based rename command is not available:
$ for f in *.jpg; do echo mv "$f" "$(echo "$f" | perl -pe 's/\d+/sprintf "%03d", $&/ge')"; done
mv 1_120.jpg 001_120.jpg
mv 142_27.jpg 142_027.jpg
mv 7_39.jpg 007_039.jpg
Change echo mv to just mv once the dry run looks right.

You can do it with a little shell-script using sed:
for i in *.jpg;
do
new=`echo "$i" | sed -n -e '{ s/^\([0-9]\{0,3\}\)_\([0-9]\{0,3\}\).jpg/000\1_000\2.jpg/ };
{s/\([0-9]*\)\([0-9]\{3\}\)_\([0-9]*\)\([0-9]\{3\}\).jpg/\2_\4.jpg/p };'`;
mv "$i" "$new";
done;
I first prepend three zeros to each number, and afterwards cut off as many leading digits as necessary so that only three digits are left in each place.

With bash parameter expansion (a is the part before the underscore, b the part after it):
Windows (bash):
for f in *.jpg; do a=${f%_*}; b=${f#*_}; mv "$f" "$(printf "%03d_%07s" "$a" "$b")"; done
Linux:
for f in *.jpg; do a=${f%_*}; b=${f#*_}; b=${b%.*}; mv "$f" "$(printf "%03d_%03d.jpg" "$a" "$b")"; done
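A sketch of the Linux loop with the name computation pulled out into a hypothetical helper, padded_name, so it can be checked without renaming anything. The 10# prefix is an addition not in the answer above: it makes bash read an already padded number such as 08 as decimal instead of rejecting it as an invalid octal number. The loop only echoes the mv commands:

```shell
#!/usr/bin/env bash
# Hypothetical helper; names are assumed to look like <int>_<int>.jpg.
padded_name() {
  local f=$1 a b
  a=${f%_*}               # part before the underscore: 142
  b=${f#*_}; b=${b%.*}    # part after it, without the extension: 27
  # 10# forces base 10, so a value like 08 is not treated as octal.
  printf '%03d_%03d.jpg\n' "$((10#$a))" "$((10#$b))"
}

for f in *_*.jpg; do
  [[ -e $f ]] || continue                 # the glob matched nothing
  echo mv -- "$f" "$(padded_name "$f")"   # dry run; drop echo to rename
done
```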

How to use awk's output on bash command line

I have a file named "01 - Welcome To The Jungle.mp3", and I want to run eyeD3 -t "Welcome To The Jungle" "01 - Welcome To The Jungle.mp3" to set its title tag, and do the same for all the files in the folder. I've extracted "Welcome To The Jungle" from the file name with awk, like this:
#!/bin/bash
for i in *.mp3
do
eyeD3 -t $(echo ${i} | awk -F' - ' '{print $2}' | awk -F'.' '{print $1}') ${i}
done
It doesn't work. Neither the whole "$(echo ${i}....)" nor the "${i}" seems to work for passing the names of the respective files.

You need to prevent the shell from word splitting on IFS (default: space, tab, newline), because your file names contain spaces. The usual fix is to put double quotes around the variable expansion and the command substitution.
Do:
for i in *.mp3; do eyeD3 -t "$(echo "$i" | ...)" "$i"; done
You can use a here string (<<<) to avoid the echo:
for i in *.mp3; do eyeD3 -t "$(... <<<"$i")" "$i"; done

You need to double-quote variables that may contain spaces, as @heemayl already pointed out.
Also, in this example, instead of using awk, it would be better to use native Bash parameter expansion to extract the title, for example:
for file in *.mp3; do
title=${file%.mp3}
title=${title#?????}
eyeD3 -t "$title" "$file"
done
That is:
Remove .mp3 at the end
Remove the first 5 characters (the track-number prefix "NN - ")
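Removing a fixed five characters assumes a two-digit track number. A sketch that instead strips everything up to the first " - ", so prefixes like 1 - or 101 - also work; title_from_name is a hypothetical helper, and eyeD3 is only echoed so the loop is a dry run (echo does not show the quoting, but once echo is removed eyeD3 receives the title as a single argument):

```shell
#!/usr/bin/env bash
# Hypothetical helper: file name -> title.
title_from_name() {
  local title=${1%.mp3}           # drop the .mp3 extension
  printf '%s\n' "${title#* - }"   # drop the shortest prefix ending in " - "
}

for file in *.mp3; do
  [[ -e $file ]] || continue      # the glob matched nothing
  echo eyeD3 -t "$(title_from_name "$file")" "$file"
done
```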

Bash script to remove last three characters in a file name

For example, the file is this:
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
I want to rename this file to:
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN

Using ${parameter%word} (Remove matching suffix pattern):
$ echo "$fn"
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
$ echo "${fn%:*}"
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN

Using cut
$ echo $fn
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
$ echo $fn |cut -d: -f1
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN
Using awk
echo $fn |awk -F : '{print $1}'
more ways...

According to the link here:
This should work:
awk '{old=$0;gsub(/...$/,"",$0);system("mv \""old"\" "$0)}'
provided the file name is given as input.
For example:
ls -1 NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00|nawk '{old=$0;gsub(/...$/,"",$0);system("mv \""old"\" "$0)}'

Rename file using bash string manipulations:
# Filename needs to be in a variable
file=NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
# Rename file
mv "$file" "${file%???}"
This removes the last three characters from filename.
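To apply that to many files at once, here is a sketch that loops over its arguments and only renames names ending in : plus two characters (an assumption based on the example); strip_last_three is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename each matching argument to the name
# without its last three characters; other names are left alone.
strip_last_three() {
  local file
  for file in "$@"; do
    [[ $file == *:?? && -e $file ]] || continue
    mv -- "$file" "${file%???}"
  done
}

# Demo in a throwaway directory with the example name.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 'NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00' other.txt
strip_last_three *
ls
```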

Using just bash (a negative length such as ${fn::-3} needs bash 4.2 or later):
fn='NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00'
mv "$fn" "${fn::-3}"

If you have Ruby:
echo NBDG6_CD* | ruby -e 'f=gets.chomp;File.rename(f, f[0..-4])'

Develop Reference