How to have the list of files represented by a regex given in input to a bash script - linux

I'm writing a script for the automatic extraction of bib records from scientific papers.
In an old version of the script I passed the name of the folder where all the PDFs were stored as input; now I want to pass a regex. E.g. before:
./AutoBib.sh Papers/
Now:
./Autobib.sh Papers/*.pdf
The folder contains, for example, 3 PDF files: Shrek.pdf, Fiona.pdf, Donkey.pdf. My script should retrieve the DOI from every file and create a file listing all of them, but when I execute it, it returns the DOI of the first file and nothing more.
Here is my code:
for i in $1; do
doi $i
done
doi is a function that extracts the DOI from a PDF and puts it in a txt file. When I run the script it returns only the DOI of the first file.
How can I feed a regex to my script and iterate through all the files that match it?

It's important to understand that Papers/*.pdf is not a regular expression, it's a wildcard pattern that causes bash to perform filename expansion, or globbing.
$1 represents the first argument to your script, so your for loop is only ever iterating over that one argument.
Use "$@" to represent all arguments:
for i in "$@"; do
doi "$i"
done
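A minimal sketch of the difference between $1 and "$@". The helper count_args and the file names are made up for illustration and are not part of the original script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports how many arguments it received.
count_args() {
    echo "$#"
}

# Simulate the script's positional parameters after the shell has expanded
# Papers/*.pdf; the file names are invented for this example.
set -- "Papers/Shrek.pdf" "Papers/Fiona.pdf" "Papers/Donkey 2.pdf"

only_first=$(count_args $1)      # just the first argument
all_args=$(count_args "$@")      # every argument, spaces preserved
echo "first only: $only_first, all: $all_args"
```

With three files, looping over $1 visits one name, while "$@" visits all three, including the name that contains a space.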

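If you want to filter files within a directory by pattern, you can pass the pattern as the second script parameter and search for matching files using find.
Here is the code. It also copes with filenames containing spaces. Note that find -exec runs external commands, so doi has to be an executable here, not a shell function:
find "$1" -maxdepth 1 -name "$2" -exec doi {} \;
Usage example: ./Autobib.sh Papers/ '*.pdf' (quote the pattern so the shell passes it to the script unexpanded)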
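If doi stays a shell function, one possible variation is to let find only list the files and call the function from a loop. This is a sketch under assumptions: the doi below is a stand-in that merely records the file name, and the directory and files are created just for the demonstration:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
out="$workdir/doi.txt"

# Hypothetical stand-in for the real doi function: it only records the file name.
doi() {
    printf '%s\n' "$(basename "$1")" >> "$out"
}

mkdir "$workdir/Papers"
touch "$workdir/Papers/Shrek.pdf" "$workdir/Papers/Fiona Donkey.pdf" "$workdir/Papers/notes.txt"

dir="$workdir/Papers"   # would be "$1" in the script
pattern='*.pdf'         # would be "$2"; quoted so it reaches find unexpanded

# -print0 and read -d '' keep names with spaces intact.
while IFS= read -r -d '' f; do
    doi "$f"
done < <(find "$dir" -maxdepth 1 -name "$pattern" -print0)

found=$(sort "$out")
echo "$found"
```

The process substitution keeps the loop in the current shell, so the function and its variables stay available inside it.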

You can just run the ls command in a loop, passing the directory as the script argument, and it will solve your problem.
for x in $(ls $@/*.pdf)
do
echo "$x" ## if you want only the file name, change this line to echo "$(basename "$x")"
done

Related

How to rename string in multiple filename in a folder using shell script without mv command since it will move the files to different folder? [duplicate]

Write a simple script that will automatically rename a number of files. As an example we want the file *001.jpg renamed to user defined string + 001.jpg (ex: MyVacation20110725_001.jpg) The usage for this script is to get the digital camera photos to have file names that make some sense.
I need to write a shell script for this. Can someone suggest how to begin?
An example to help you get off the ground.
for f in *.jpg; do mv "$f" "$(echo "$f" | sed s/IMG/VACATION/)"; done
In this example, I am assuming that all your image files contain the string IMG and you want to replace IMG with VACATION.
The shell automatically evaluates *.jpg to all the matching files.
The second argument of mv (the new name of the file) is the output of the sed command that replaces IMG with VACATION.
If your filenames include whitespace pay careful attention to the "$f" notation. You need the double-quotes to preserve the whitespace.
You can use rename utility to rename multiple files by a pattern. For example following command will prepend string MyVacation2011_ to all the files with jpg extension.
rename 's/^/MyVacation2011_/g' *.jpg
or
rename <pattern> <replacement> <file-list>
In this example, I am assuming that all your image files begin with "IMG" and you want to replace "IMG" with "VACATION".
Solution: first find all the jpg files, then replace the keyword. The echo makes this a dry run that only prints the mv commands; remove it to rename for real.
find . -name '*jpg' -exec bash -c 'echo mv "$0" "${0/IMG/VACATION}"' {} \;
for file in *.jpg ; do mv "$file" "${file//IMG/myVacation}" ; done
Again assuming that all your image files have the string "IMG" and you want to replace "IMG" with "myVacation".
With bash you can directly convert the string with parameter expansion.
Example: if the file is IMG_327.jpg, the mv command will be executed as if you do mv IMG_327.jpg myVacation_327.jpg. And this will be done for each file found in the directory matching *.jpg.
IMG_001.jpg -> myVacation_001.jpg
IMG_002.jpg -> myVacation_002.jpg
IMG_1023.jpg -> myVacation_1023.jpg
etcetera...
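To try the parameter-expansion rename safely, here is a small sketch that works in a throwaway directory. The file names are invented for the example:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "IMG_001.jpg" "IMG_002.jpg" "IMG beach.jpg"

for file in *.jpg; do
    # ${file//IMG/myVacation} replaces every occurrence of IMG in the name.
    # The quotes keep a name with a space as a single argument to mv.
    mv -- "$file" "${file//IMG/myVacation}"
done

ls
```

The -- tells mv that no options follow, which protects against names that start with a dash.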
find . -type f |
sed -n "s/\(.*\)factory\.py$/& \1service\.py/p" |
xargs -p -n 2 mv
For example, this renames every file under the current directory whose name ends in "factory.py" so that it ends in "service.py" instead.
Explanation:
In the sed command, the -n flag suppresses the normal behavior of echoing input to output after the s/// command is applied, and the p option on s/// forces writing to output if a substitution is made. Since a substitution is only made on a match, sed only produces output for files ending in "factory.py".
In the s/// replacement string, we use "& " to interpolate the entire matching string, followed by a space character, into the replacement. Because of this, it's vital that our RE matches the entire filename. After the space character, we use "\1service.py" to interpolate the string we captured before "factory.py", followed by "service.py", which replaces it. So for more complex transformations you'll have to change the arguments to s/// (with an RE still matching the entire filename).
Example output:
foo_factory.py foo_service.py
bar_factory.py bar_service.py
We use xargs with -n 2 to consume the output of sed two whitespace-delimited strings at a time, passing each pair to mv (I also added the -p option, which asks for confirmation before each command, so you can feel safe when running this). Voilà.
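The sed step can be checked on its own before anything is renamed. A small sketch, feeding made-up paths through the same expression to see the old/new pairs that would reach mv:

```shell
#!/usr/bin/env bash
# The three paths are invented; README.md is there to show a non-matching name.
pairs=$(printf '%s\n' ./foo_factory.py ./bar_factory.py ./README.md |
    sed -n 's/\(.*\)factory\.py$/& \1service.py/p')
echo "$pairs"
# ./foo_factory.py ./foo_service.py
# ./bar_factory.py ./bar_service.py
```

Because xargs -n 2 splits on whitespace, this approach assumes that no path contains spaces.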
NOTE: If you are facing more complicated file and folder scenarios, this post explains find (and some alternatives) in greater detail.
Another option is:
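for i in *001.jpg
do
echo "mv $i yourstring_${i#*_}"
done
Remove the echo once the output looks right.
Parameter substitution with # strips the shortest leading match of the pattern. Assuming names like IMG_001.jpg, ${i#*_} drops everything up to the first underscore and keeps only the last part (001.jpg), so you can change the name. The pattern must match only the prefix you want to drop: *001.jpg would match the whole name and leave nothing.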
Can't comment on Susam Pal's answer, but if you're dealing with spaces, I'd surround with quotes:
for f in *.jpg; do mv "$f" "$(echo "$f" | sed 's/ /-/g')"; done
You can try this:
for file in *.jpg;
do
mv "$file" "${somestring}_${file: -7}"
done
You can see "parameter expansion" in man bash to understand the above better.
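A short sketch of the two expansions involved, with a made-up prefix and file name. The braces around somestring matter: written as $somestring_, bash would look up a variable called somestring_ instead.

```shell
#!/usr/bin/env bash
somestring="MyVacation20110725"   # made-up prefix for illustration
file="IMG_001.jpg"

# ${file: -7} is the last 7 characters; the space before -7 is required.
tail_part="${file: -7}"
# The braces end the variable name before the underscore.
new_name="${somestring}_${tail_part}"

echo "$tail_part"   # 001.jpg
echo "$new_name"    # MyVacation20110725_001.jpg
```

This only works when the part to keep always has the same length, here 7 characters.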

Get list of files from regex

I'm trying to write a bash script that extracts info from PDF documents. The first argument should be a regex or the name of a file. E.g.:
$ autobib shrek2001.pdf
$ autobib *.pdf
My idea is to generate a list of files matching the regex and extract information from them. My code at the moment looks like this:
for article in $(ls $1);do
pdfinfo $article
done
But doing so the loop stops at the first file. How can I loop over all the files matching my regex?
clpgr has it completely right. Change your program to look like this:
for article in "$@"; do
pdfinfo "$article"
done
The reason your program only does the first file is that the shell command gets globbed. That is, when you issue the command autobib *.pdf, you are really issuing this command: autobib 1.pdf 2.pdf 3.pdf (well, I'm making up some file names since I don't know what's in the directory). But the point is, your program will have $1 set to 1.pdf, so you'll be executing $(ls 1.pdf), which only returns 1.pdf.
Truth is, your program may have worked (depending on the file names in the directory) if you executed this way: autobib "*.pdf". In this example, the "*.pdf" is not globbed by the shell because it is quoted. Now, your program's $1 variable will have the value *.pdf.
That said, "$@" is soooooo much better than $(ls $1). "$@" will actually preserve spaces in the arguments.
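The globbing step can be observed directly. A minimal sketch, where autobib_demo is a hypothetical stand-in for the script that only reports how many arguments it received:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script: prints the number of arguments it sees.
autobib_demo() {
    echo "$#"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1.pdf 2.pdf 3.pdf

unquoted=$(autobib_demo *.pdf)     # the shell expands the glob before the call
quoted=$(autobib_demo "*.pdf")     # the literal string *.pdf is passed as one argument
echo "unquoted: $unquoted, quoted: $quoted"
```

With three matching files, the unquoted call receives three arguments and the quoted call receives one.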

shell script to list file names alone in a directory & rename it [duplicate]

I'm new to scripting. I need to rename multiple files in a directory from filename.sh.x to filename.sh.
First I tried to get the file names in a particular directory, so I used the code below:
for entry in PathToThedirectory/*sh.x
do
echo $entry
done
The above code listed all the file names with their full paths.
But my expected output is the file names alone, like abc.sh.x, so that I can then split the string and perform the rename operation easily.
Please help me solve this. Thanks in advance.
First approach trying to follow OP suggestions:
for i in my/path/*.py.x
do
basename=$(basename "$i")
mv my/path/"$basename" my/path/"${basename%.*}"
done
And maybe, you can simplify it:
for i in my/path/*.py.x
do
mv "$i" "${i%.*}";
done
Documentation regarding this kind of operation (parameter expansion): https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
In particular:
${parameter%word} : The word is expanded to produce a pattern just as in filename expansion. If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted
So, ${i%.*} means:
Take $i
Match .* at the end of its value (. being a literal character)
Remove the shortest matching pattern
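A small sketch of these expansions on a made-up path, including ${i##*/}, which gives the file name alone without calling basename:

```shell
#!/usr/bin/env bash
i="my/path/abc.py.x"          # made-up path for illustration

shortest="${i%.*}"            # removes the shortest trailing match: ".x"
longest="${i%%.*}"            # removes the longest trailing match: ".py.x"
name_only="${i##*/}"          # removes the longest leading match of */: the directory part

echo "$shortest"    # my/path/abc.py
echo "$longest"     # my/path/abc
echo "$name_only"   # abc.py.x
```

Note that %% looks for the first dot anywhere in the value, so a dot in a directory name would change the result.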
Look into prename (installed together with the perl package on ubuntu).
Then you can just do something like:
prename 's/\.x$//' *.sh.x
In ksh you can do this:
for file in $(ls "$path")
do
new_file=$(basename "$path/$file" .x)
mv "${path}/${file}" "${path}/${new_file}"
done
This should do the trick:
for file in *.sh.x;
do
mv "$file" "${file%.x}";
done
Running this rename command from the root directory should work:
rename 's/\.sh\.x$/.sh/' *.sh.x
for i in $(ls -la /path | grep -v '^d' | awk '{print $NF}')
do
echo "$(basename "$i")"
done
It will give you the base name of all files, or you can try the command below:
find /path -type f -exec basename {} \;

How to make this (l)unix script dynamically accept directory name in for-loop?

I am teaching myself more (l)unix skills and wanted to see if I could begin to write a program that will eventually read all .gz files and expand them. However, I want it to be super dynamic.
#!/bin/bash
dir=~/derp/herp/path/goes/here
for file in $(find "$dir" -name '*gz')
do
echo $file
done
So when I execute this file, I simply go
bash derp.sh.
I don't like this. I feel the script is too brittle.
How can I rework my for loop so that I can say
bash derp.sh ~/derp/herp/path/goes/here (1)
I tried re-coding it as follows:
for file in $*
However, I don't want to have to type in bash
derp.sh ~/derp/herp/path/goes/here/*.gz.
How could I rewrite this so I could simply type what is in (1)? I feel I must be missing something simple?
Note
I tried
for file in $*/*.gz and that obviously did not work. I appreciate your assistance, my sources have been a wrox unix text, carpentry v5, and man files. Unfortunately, I haven't found anything that will do what I want.
Thanks,
GeekyOmega
for dir in "$@"
do
for file in "$dir"/*.gz
do
echo "$file"
done
done
Notes:
In the outer loop, dir is assigned successively to each argument given on the command line. The special form "$@" is used so that directory names that contain spaces will be processed correctly.
The inner loop runs over each .gz file in the given directory. By placing $dir in double-quotes, the loop will work correctly even if the directory name contains spaces. This form will also work correctly if the gz file names have spaces.
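One detail the nested loop leaves open: if a directory holds no .gz files, the pattern is left as literal text and the loop body runs once with it. A sketch with an existence check added, using invented directory and file names:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir "$workdir/My Documents" "$workdir/Music"
touch "$workdir/My Documents/a b.gz" "$workdir/My Documents/c.gz" "$workdir/Music/song.txt"

# Stands in for the directories given on the command line.
set -- "$workdir/My Documents" "$workdir/Music"

count=0
for dir in "$@"; do
    for file in "$dir"/*.gz; do
        # If nothing matched, $file is the literal pattern; skip it.
        [ -e "$file" ] || continue
        count=$((count + 1))
        echo "$file"
    done
done
echo "found $count"
```

Here the Music directory contributes nothing, and the two files under "My Documents" are found despite the spaces in their paths.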
#!/bin/bash
for file in $(find "$@" -name '*.gz')
do
echo $file
done
You'll probably prefer "$@" instead of $*; if you were to have spaces in filenames, like with a directory named My Documents and a directory named Music, $* would effectively expand into:
find My Documents Music -name '*.gz'
where "$@" would expand into:
find "My Documents" "Music" -name '*.gz'
Requisite note: Using for file in $(find ...) is generally regarded as a bad practice, because it does tend to break if you have spaces or newlines in your directory structure. Using nested for loops (as in John's answer) is often a better idea, or using find -print0 and read as in this answer.

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
Note that quoting your variables is almost always good practice. The for FILENAME in *.bak loop itself copes with spaces in filenames, because glob results are not word-split; the remaining pitfall is that when no file matches, the loop runs once with the literal string *.bak. Read David W.'s answer for a more robust solution, and this document for alternative solutions.
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use the environment variable filtering. Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of %.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo "${file#*/}" # Matches "this/" and will print out "is/my/file/name.txt"
echo "${file##*/}" # Matches "this/is/my/file/" and will print out "name.txt"
echo "${file%/*}" # Matches "/name.txt" and will print out "this/is/my/file"
echo "${file%%/*}" # Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match!. If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match on the period and all characters after it. Since it is a single %, it will match on the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that this will remove the period of the file extension.
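A quick sketch of the suffix and stem expansions on made-up names, including what happens when the name has no period at all:

```shell
#!/usr/bin/env bash
file="backup/report.tar.bak"      # made-up name for illustration
suffix="${file##*.}"              # everything after the last period
stem="${file%.*}"                 # everything before the last period

nodot="README"
nodot_suffix="${nodot##*.}"       # no period, so nothing is removed

echo "$suffix"         # bak
echo "$stem"           # backup/report.tar
echo "$nodot_suffix"   # README
```

Since an unmatched filter returns the original string, it may be worth comparing the result with the input before treating it as a suffix.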
Next, we will loop:
find . -name "*.bak" -print0 | while IFS= read -r -d $'\0' file
do
echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates the names of the files with a NUL character, which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separator is the NUL character. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) or for file in $(find ...) method. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
You can always use the find command to get all the subdirectories
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done

Resources