Extract property value in filename? - string

I have many file paths of the form:
dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
I am running a bash script to do some processing on these files, and I need to extract the value of p (in this case 1.2; in general it is a floating-point number) from each of these paths. Basically, I am running a for loop over all the file paths, and for each path I need to extract the value of p. How can I do this?

Parameter expansion is a useful tool for this kind of operation:
#!/bin/bash
# ^^^^ IMPORTANT: Not /bin/sh
f=dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
if [[ $f = *_p=* ]]; then # Check for substring in filename
val=${f##*_p=} # Trim everything before the last "_p="
val=${val%%_*} # Trim everything after first subsequent _
val=${val%.ext} # Trim extension, should it exist.
echo "Extracted $val from filename $f"
fi
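Since the question loops over many paths, the same three expansions can be wrapped in a function and called once per path. This is a minimal sketch: the function name extract_p, the second sample path, and the literal .ext suffix are illustrative assumptions, not part of the answer above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of p for one path, or return 1 if there is none.
# Assumes the extension is literally ".ext", as in the question's example.
extract_p() {
  local f=${1##*/}           # consider only the file name, not the directories
  [[ $f == *_p=* ]] || return 1
  local val=${f##*_p=}       # drop everything up to the last "_p="
  val=${val%%_*}             # drop everything from the next "_" onward
  val=${val%.ext}            # drop the extension, if it is still attached
  printf '%s\n' "$val"
}

# Illustrative paths; in practice these would come from a glob such as dir1/*/*.ext
paths=(
  dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
  dir1/otherdir/name_p=0.5_q=7.ext
)
for f in "${paths[@]}"; do
  if p=$(extract_p "$f"); then
    printf '%s -> p=%s\n' "$f" "$p"
  fi
done
```

Returning a status instead of printing an empty string lets the caller skip files that have no p field.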
Alternately, you could also use shell-native regex support:
#!/bin/bash
# ^^^^ again, NOT /bin/sh
f=dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext
# assigning regex to a variable avoids surprising behavior with some older bash releases
p_re='_p=([[:digit:].]+)(_|[.]ext$)'
if [[ $f =~ $p_re ]]; then # evaluate regex
echo "Extracted ${BASH_REMATCH[1]}" # extract groups from BASH_REMATCH array
fi

For completeness, another approach is to use eval. This carries security risks (a crafted file name could run arbitrary commands), so you have to make up your own mind whether they are acceptable.
I am using IFS for the split; not everyone's favourite, but it is another way to do it. The eval executes each assignment as it finds it, in this case dynamically creating the variables q, a, and p.
fname='dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext'
OldIFS="$IFS"
IFS='_'
for val in $fname
do
if [[ $val == *=* ]]
then
val=${val%.ext}
eval "$val"
fi
done
IFS="$OldIFS"
echo "$q"
echo "$a"
echo "$p"
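If the eval risk is a concern, a similar split can fill a Bash associative array instead of creating variables, so nothing taken from the file name is ever executed. This is an alternative technique, not the answer's code; it assumes Bash 4+ (for declare -A), "_" as the field separator and a literal .ext suffix, and the names parse_props and props are made up for illustration.

```shell
#!/bin/bash
# Hypothetical alternative to eval: collect key=value fields into an associative array.
# Requires bash 4+ for declare -A.
declare -A props=()

parse_props() {
  local fname=${1##*/} field
  local -a fields
  fname=${fname%.ext}                        # assumes a literal ".ext" suffix
  IFS=_ read -r -a fields <<< "$fname"       # split on "_" without touching the global IFS
  props=()                                   # forget results from any previous call
  for field in "${fields[@]}"; do
    [[ $field == ?*=* ]] || continue         # skip fields such as "name" that are not key=value
    props[${field%%=*}]=${field#*=}          # key before the first "=", value after it
  done
}

parse_props 'dir1/someotherdir/name_q=3_a=2.34_p=1.2.ext'
printf 'q=%s a=%s p=%s\n' "${props[q]}" "${props[a]}" "${props[p]}"
```

This prints q=3 a=2.34 p=1.2. Because read splits a here-string, the fields are also never subject to pathname expansion, unlike the unquoted $fname in the for loop.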

Related

How to replace date part in filename with current date

How to replace only the date part of the names of all files in a directory with the current date in Unix.
Folder path: C:/shan
Sample files:
CN_Apria_837p_20180924.txt
DN_Apria_837p_20150502.txt
GN_Apria_837p_20160502.txt
CH_Apria_837p_20170502.txt
CU_Apria_837p_20180502.txt
PN_Apria_837p_20140502.txt
CN_Apria_837p_20101502.txt
Desired result should be:
CN_Apria_837p_20190502.txt
DN_Apria_837p_20190502.txt
GN_Apria_837p_20190502.txt
CH_Apria_837p_20190502.txt
CU_Apria_837p_20190502.txt
PN_Apria_837p_20190502.txt
CN_Apria_837p_20190502.txt
Edit:
I'm completely new to Unix shell scripting. I tried the following, but it's not working.
#!/bin/bash
for i in $(ls $1 | grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}')
do
x=$(echo $i | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
y=$(echo $i | sed "s/$x/$(date +%F)/g")
mv $1/$i $1/$y 2>/dev/null # in case the old date is the same as the current date
done
I would use regular expressions here. From the bash man-page:
An additional binary operator, =~, is available, with the same
precedence as == and !=. When it is used, the string to the right
of the operator is considered an extended regular expression and
matched accordingly (as in regex(3)). The return value is 0 if the
string matches the pattern, and 1 otherwise. .... Substrings
matched by parenthesized subexpressions within the regular
expression are saved in the array variable BASH_REMATCH. ...
The element of BASH_REMATCH with index n is the portion of the
string matching the nth parenthesized sub-expression.
Hence, assuming that the variable x holds the name of one of the files
in question, the code
if [[ $x =~ ^(.*_)[0-9]+([.]txt$) ]]
then
mv "$x" "${BASH_REMATCH[1]}$(date +%Y%m%d)${BASH_REMATCH[2]}"
fi
first tests roughly whether the file indeed follows the required naming scheme, and then modifies the name accordingly.
Of course in practice, you will tailor the regexp to match your application better. Only you can know what variations in the file name are permitted.
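A sketch that applies this test to a whole directory, using the ${BASH_REMATCH[n]} braces that array element references need. The function names new_name_for and rename_dated_files are hypothetical, the new date is passed in as a parameter so the helper can be tested, and the .txt-only regex is the same rough assumption as above.

```shell
#!/bin/bash
# Hypothetical helper: print the new name for one file, or return 1 if it does not match.
new_name_for() {
  local x=$1 newdate=$2
  local re='^(.*_)[0-9]+([.]txt)$'
  [[ $x =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}${newdate}${BASH_REMATCH[2]}"
}

# Hypothetical driver: rename every matching file in directory $1 to today's date.
rename_dated_files() {
  local dir=$1 today path new
  today=$(date +%Y%m%d)
  for path in "$dir"/*.txt; do
    [[ -e $path ]] || continue                  # the glob matched nothing
    new=$(new_name_for "${path##*/}" "$today") || continue
    [[ $new == "${path##*/}" ]] && continue     # already carries today's date
    mv -i -- "$path" "$dir/$new"                # -i asks before overwriting a clash
  done
}
# Usage: rename_dated_files /path/to/dir
```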
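The following should do this (find -print0 with read -d '' keeps paths containing spaces intact):
while IFS= read -r -d '' f
do
newname=$(printf '%s\n' "$f" | sed -E "s/[12][0-9]{3}[01][0-9][0-3][0-9]/$(date '+%Y%m%d')/g")
[ "$newname" != "$f" ] && mv "$f" "$newname"
done < <(find /path/to/files -name "*_*_*_*.txt" -print0)
Note that the sed expression rewrites any date-like run of eight digits anywhere in the path, including directory names.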
Try this Shellcheck-clean code:
#! /bin/bash -p
readonly dir=$1
shopt -s nullglob # Make glob patterns that match nothing expand to nothing
readonly dateglob='20[0-9][0-9][0-9][0-9][0-9][0-9]'
currdate=$(date '+%Y%m%d')
# shellcheck disable=SC2231
for path in "$dir"/*_${dateglob}.* ; do
name=${path##*/}
newname=${name/_${dateglob}./_${currdate}.}
if [[ $newname != "$name" ]] ; then
newpath="$dir/$newname"
printf "%q -> %q\\n" "$path" "$newpath"
mv -i -- "$path" "$newpath"
fi
done
shopt -s nullglob stops the code trying to process a garbage path if nothing matches the glob pattern in for path in ....
The pattern assigned to dateglob assumes that you will not have to process dates before 2000 (or after 2099!). Change it if that assumption is not valid.
The # shellcheck ... line is to prevent Shellcheck warning about the use of ${dateglob} without quotes. The quotes would be wrong in this case because they would prevent the glob pattern being expanded.
The pattern used to match filenames (*_${dateglob}.*) will match many more forms of filename than the examples given (e.g. A_20180313.tar.gz). You might want to change it.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for information about the Bash string manipulation mechanisms used (${path##...}, ${name/...}).
I've added a printf to output details of what is being moved.
The -i option to mv prompts for confirmation if a file would be overwritten. This turns out to be an issue for the example files because both CN_Apria_837p_20180924.txt and CN_Apria_837p_20101502.txt are identical except for the date, so the code tries to rename them to the same thing.
If any of the files with dates in their names have names beginning with '.', the code will not process them. Add the line shopt -s dotglob somewhere before the loop if that is an issue.
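Because the -i prompt only catches a clash at the moment it happens, one option is to look for clashes before renaming anything. A sketch, assuming Bash 4+ (associative arrays) and the same 20YYMMDD pattern; find_collisions is a hypothetical name, and the new date is passed as a parameter.

```shell
#!/bin/bash
# Hypothetical pre-flight check: list files that would be renamed to the same new name.
# Requires bash 4+ (local -A). Note: it leaves nullglob switched on for the caller.
find_collisions() {
  local dir=$1 currdate=$2 path name newname
  local -A seen=()
  local dateglob='20[0-9][0-9][0-9][0-9][0-9][0-9]'
  shopt -s nullglob
  # shellcheck disable=SC2231
  for path in "$dir"/*_${dateglob}.* ; do
    name=${path##*/}
    newname=${name/_${dateglob}./_${currdate}.}
    if [[ -n ${seen[$newname]} ]]; then
      printf 'collision: %s and %s -> %s\n' "${seen[$newname]}" "$name" "$newname"
    else
      seen[$newname]=$name
    fi
  done
}
# Usage: find_collisions /path/to/dir "$(date '+%Y%m%d')"
```

For the example files, the two CN_Apria_837p_... names would be reported as colliding.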

Shell Script working with multiple files [duplicate]

This question already has answers here:
How to iterate over arguments in a Bash script
I have this code below:
#!/bin/bash
filename=$1
file_extension=$( echo $1 | cut -d. -f2 )
directory=${filename%.*}
if [[ -z $filename ]]; then
echo "You forgot to include the file name, like this:"
echo "./convert-pdf.sh my_document.pdf"
else
if [[ $file_extension = 'pdf' ]]; then
[[ ! -d $directory ]] && mkdir $directory
convert $filename -density 300 $directory/page_%04d.jpg
else
echo "ERROR! You must use ONLY PDF files!"
fi
fi
And it is working perfectly well!
I would like to create a script which I can do something like this: ./script.sh *.pdf
How can I do it, using the asterisk?
Thank you for your time!
Firstly realize that the shell will expand *.pdf to a list of arguments. This means that your shell script will never ever see the *. Instead it will get a list of arguments.
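A quick way to see this is a function that just reports what it received. The name show_args is made up for the demonstration; note also that if no file matches, Bash (without nullglob) passes the literal *.pdf instead.

```shell
#!/bin/bash
# Hypothetical demo: report how many arguments arrived and what each one was.
show_args() {
  printf 'got %d argument(s)\n' "$#"
  (( $# > 0 )) && printf '  [%s]\n' "$@"
  return 0
}
# Usage, in a directory containing a.pdf and "b c.pdf":
#   show_args *.pdf
# prints:
#   got 2 argument(s)
#     [a.pdf]
#     [b c.pdf]
```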
You can use a construction like the following:
#!/bin/bash
# Named convert_pdf rather than convert: a function called convert would shadow
# ImageMagick's convert, so calling convert inside it would call the function itself.
convert_pdf() {
local filename=$1
# do your thing here
}
if (( $# < 1 )); then
# an if-body needs at least one command, so print the error and exit here
echo "Usage: $0 file.pdf [file.pdf ...]" >&2
exit 1
fi
while (( $# > 0 )); do
convert_pdf "$1"
shift
done
What this does is first wrap your functionality in a function called convert_pdf (not convert, which would hide the ImageMagick convert command your script calls). Then the main code checks the number of arguments passed to the script; if it is less than 1 (i.e. none), it prints an error saying that a filename should be passed and exits. Then it enters a while loop that runs as long as there are arguments remaining. The first argument is passed to convert_pdf, which does what your script already does. Then shift throws away the first argument and shifts all the remaining arguments "left" by one place: what was $2 is now $1, what was $3 is now $2, etc. Repeating this until the argument list is empty processes every argument.
By the way, your initial assignments have a few issues:
you can't assume that the filename has an extension; your code could match a dot in some directory path instead.
your directory assignment seems to be splitting on . instead of /
your directory assignment will contain the filename if no absolute or relative path was given, i.e. only a bare filename
...
I think you should spend a bit more time on robustness.
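One way to address those points is to take the extension from the file name only (never from a directory) and to build the output directory next to the file. This is a sketch rather than a drop-in replacement: it assumes Bash 4+ (for ${ext,,}), and the function name pdf_output_dir and the "directory named after the file" policy are illustrative choices.

```shell
#!/bin/bash
# Hypothetical helper: print the output directory for a PDF, or fail with a message.
pdf_output_dir() {
  local path=$1
  local base=${path##*/}               # file name without any leading directories
  local parent=.
  [[ $path == */* ]] && parent=${path%/*}
  # Only text after the last dot of the *file name* counts as an extension
  if [[ $base != ?*.* ]]; then
    printf '%s: no extension\n' "$path" >&2
    return 1
  fi
  local ext=${base##*.}
  if [[ ${ext,,} != pdf ]]; then       # case-insensitive comparison (bash 4+)
    printf '%s: not a .pdf file\n' "$path" >&2
    return 1
  fi
  printf '%s\n' "$parent/${base%.*}"
}
# Usage: dir=$(pdf_output_dir "$filename") || exit 1; mkdir -p -- "$dir"
```

With some.dir/README, for example, the original cut -d. -f2 would report dir/README as the extension, while this sketch rejects the file.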
Wrap your code in a loop. That is, instead of:
filename=$1
: code goes here
use:
for filename in "$@"; do
: put your code here
done
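The quotes around "$@" matter: they keep each file name as a single word even if it contains spaces, whereas unquoted $* splits names apart, and "$#" is only the argument count. A small demo with hypothetical function names:

```shell
#!/bin/bash
# Hypothetical demo: how many loop iterations each form produces.
count_quoted()   { local n=0 f; for f in "$@"; do n=$((n + 1)); done; echo "$n"; }
count_unquoted() { local n=0 f; for f in $*;   do n=$((n + 1)); done; echo "$n"; }
show_hash()      { local f; for f in "$#"; do echo "loop saw: $f"; done; }

count_quoted "my file.pdf" other.pdf     # 2: one iteration per argument
count_unquoted "my file.pdf" other.pdf   # 3: "my file.pdf" was split in two
show_hash "my file.pdf" other.pdf        # loop saw: 2 (the count, not a file name)
```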

check if a file is jpeg format using shell script

I know I can use file pic.jpg to get the file type, but how do I write an if statement to check it in a shell script?
E.g. (pseudo code):
if pic.jpg == jpeg file then
Try (assumes Bash v3.0+, using =~, the regex-matching operator):
if [[ $(file -b 'pic.jpg') =~ JPEG ]]; then ...
If you want to match file's output more closely:
if [[ $(file -b 'pic.jpg') =~ ^'JPEG ' ]]; then ...
This will only match if the output starts with 'JPEG', followed by a space.
Alternatively, if you'd rather use a globbing-style pattern:
if [[ $(file -b 'pic.jpg') == 'JPEG '* ]]; then ...
POSIX-compliant conditionals ([ ... ]) do not offer regex or pattern matching, so a different approach is needed:
if expr "$(file -b 'pic.jpg')" : 'JPEG ' >/dev/null; then ...
Note: expr only supports basic regular expressions and is implicitly anchored at the start of the string (no need for ^).
As for why [[ ... ]] rather than [ ... ] is needed in the Bash snippets:
Advanced features such as the regex operator (=~) or pattern matching (e.g., use of unquoted * to represent any sequence of chars.) are nonstandard (not part of the POSIX shell specification).
Since these features are incompatible with how standard conditionals ([ ... ]) work, a new syntax was required; Bash, Ksh, and Zsh use [[ ... ]].
Good old case is worth a mention, too.
case $(file -b pic.jpg) in
'JPEG '*)
echo is
;;
*)
echo is not
;;
esac
The lone right parenthesis might seem uncanny at first, but other than that, this is a reasonably simple, readable, and versatile construct, portable all the way back to the original Bourne shell. (POSIX also allows a matching left parenthesis before the pattern.)
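The case test can be wrapped in small functions so it reads like the if statement the question asked for. A sketch with hypothetical names; splitting out is_jpeg_description keeps the pattern check separate from the call to file, and nothing here needs Bash-only features.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a description printed by file -b starts with "JPEG ".
is_jpeg_description() {
  case $1 in
    'JPEG '*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: run file on a path and test its description.
is_jpeg() {
  is_jpeg_description "$(file -b -- "$1")"
}
# Usage:
#   if is_jpeg pic.jpg; then echo "pic.jpg is a JPEG"; fi
```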
For JPEG files, the file -b output has JPEG as the first word on the line:
pax> file -b somefile.jpg
JPEG image data, JFIF standard 1.01, blah blah blah
So, you can use that to detect it with something like:
inputFile=somefile.jpg
if [[ $(file -b "$inputFile" | awk '{print $1}') == "JPEG" ]] ; then
echo "$inputFile is a JPEG file."
fi

Check if line in file contains a pattern in Bash

I'm trying to figure out why this won't check the lines in the file and echo the matches.
How do you compare or check if strings contain something?
#!/bin/bash
while read line
do
#if the line ends
if [[ "$line" == '*Bye$' ]]
then
:
#if the line ends
elif [[ "$line" == '*Fly$' ]]
then
echo "\*\*\*"$line"\*\*\*"
fi
done < file.txt
The problem is that *Bye$ is not a shell pattern (shell patterns don't use the $ notation, they just use the lack of a trailing *) — and even if it were, putting it in single-quotes would disable it. Instead, just write:
if [[ "$line" == *Bye ]]
(and similarly for Fly).
If you want to use proper regular expressions, that's done with the =~ operator, such as:
if [[ "$line" =~ Bye$ ]]
The limited pattern language you get with == doesn't include things like the end-of-line marker $.
Note that you can do something this simple with shell patterns (*Bye) but, if you want the full power of regular expressions (or even just a consistent notation), =~ is the way to go.
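Putting that together, the question's loop could look like the sketch below. It also uses IFS= read -r so leading whitespace and backslashes survive, and printf for the output, because inside double quotes \* keeps its backslash, so the original echo would print \*\*\* literally. The function name mark_fly_lines is hypothetical; it reads standard input so it can be fed file.txt.

```shell
#!/bin/bash
# Hypothetical function: skip lines ending in "Bye", print lines ending in "Fly" wrapped in asterisks.
mark_fly_lines() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do   # also handles a last line without a newline
    if [[ $line == *Bye ]]; then
      :                                          # deliberately do nothing
    elif [[ $line == *Fly ]]; then
      printf '***%s***\n' "$line"
    fi
  done
}
# Usage: mark_fly_lines < file.txt
```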

Remove substring matching pattern both in the beginning and the end of the variable

As the title says, I'm looking for a way to remove a defined pattern both at the beginning of a variable and at the end. I know I have to use # and % but I don't know the correct syntax.
In this case, I want to remove http:// at the beginning, and /score/ at the end of the variable $line which is read from file.txt.
Well, you can't nest ${var%}/${var#} operations, so you'll have to use a temporary variable.
Like here:
var="http://whatever/score/"
temp_var="${var#http://}"
echo "${temp_var%/score/}"
Alternatively, you can use regular expressions with (for example) sed:
some_variable="$( echo "$var" | sed -e 's#^http://##; s#/score/$##' )"
$ var='https://www.google.com/keep/score'
$ var=${var#*//} #removes everything up to // from the beginning
$ var=${var%/*} #removes everything from the last / to the end
$ echo $var
www.google.com/keep
You have to do it in two steps:
$ string="fooSTUFFfoo"
$ string="${string%foo}"
$ string="${string#foo}"
$ echo "$string"
STUFF
There IS a way to do it in one step using only built-in bash functionality (no running external programs such as sed), with BASH_REMATCH:
url=http://whatever/score/
re='https?://(.*)/score/'
[[ $url =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
This matches against the regular expression on the right-hand side of the =~ test, and puts the groups into the BASH_REMATCH array.
That said, it's more conventional to use two parameter expansions (PE) and a temporary variable:
shopt -s extglob
url=http://whatever/score/
val=${url#http?(s)://}; val=${val%/score/}
printf '%s\n' "$val"
...in the above example, the extglob option is used to allow the shell to recognize "extglobs" (bash's extensions to glob syntax, making glob-style patterns similar in power to regular expressions), among which ?(foo) means that foo is optional.
By the way, I'm using printf rather than echo in these examples because many of echo's behaviors are implementation-defined -- for instance, consider the case where the variable's contents are -e or -n.
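Since the question reads $line from file.txt, here is a sketch that applies the extglob version to each line. The names strip_url and strip_urls_in are hypothetical, and a line missing either part simply keeps that part.

```shell
#!/bin/bash
shopt -s extglob   # needed for the ?(s) pattern below

# Hypothetical helper: remove a leading http:// or https:// and a trailing /score/.
strip_url() {
  local val=${1#http?(s)://}
  val=${val%/score/}
  printf '%s\n' "$val"
}

# Hypothetical driver: apply strip_url to every line of the given file.
strip_urls_in() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    strip_url "$line"
  done < "$1"
}
# Usage: strip_urls_in file.txt
```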
How about Perl:
export x='https://www.google.com/keep/score';
var=$(perl -e 'if ( $ENV{x} =~ /(https:\/\/)(.+)(\/score)/ ) { print qq($2);}')
