Bash script to get all files with desired extensions - linux

I'm trying to write a bash script that takes a text file containing a list of extensions and a folder, and writes an output file listing all files that match those extensions, searching recursively in all sub-directories.
The extension list file is my first parameter and the folder is my second parameter.
I have tried:
for i in $1 ; do
find . -name $2\*.$i -print>>result.txt
done
but it doesn't work.

As noted in the comments:
It is not a good idea to write to a hard-coded file name.
The example below only fixes the code given in the question.
It would of course be even better to call the script with
x.sh y . > blabla
and remove the file name from the script itself, but my intention here is only to fix the posted code, not to redesign it.
The following bash script, named x.sh
#!/bin/bash
: >result.txt # delete old content
while read -r i; do # read a line from file
find "$2" -name "*.$i" -print >>result.txt # for every item do a find
done <"$1" # read from file named with first arg from cmdline
with a text file named y with the following content
txt
sh
and called with:
./x.sh y .
results in a file result.txt whose contents are:
a.txt
b.txt
x.sh
OK, let's add some further hints from the comments:
If the result file should not keep any content from earlier runs of the script, it can be simplified to:
#!/bin/bash
while read -r i; do # read a line from file
find "$2" -name "*.$i" -print # for every item do a find
done <"$1" >result.txt # read from file named with first arg from cmdline
And as already mentioned:
The hard coded result.txt could be removed and the call can be something like
./x.sh y . > result.txt
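A minimal sketch of the same loop wrapped in a function, with every expansion quoted so that folder names containing spaces survive and the shell does not expand the `*.ext` pattern itself. The name `list_by_ext` and the throwaway demo directory are made up for illustration; the function prints to stdout, so the caller decides where the result goes.

```shell
#!/bin/bash
# list_by_ext EXT_FILE DIR: print every file under DIR whose extension is
# listed (one per line) in EXT_FILE.
# list_by_ext is a hypothetical helper name, not part of the original answer.
list_by_ext() {
    local ext_file=$1 dir=$2 ext
    while IFS= read -r ext; do
        [ -n "$ext" ] || continue             # skip empty lines
        find "$dir" -type f -name "*.$ext"    # quoted: find sees the pattern, not the shell
    done < "$ext_file"
}

# Demo in a temporary directory (assumed layout, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
touch "$demo/a.txt" "$demo/sub dir/b.txt" "$demo/x.sh" "$demo/c.xml"
printf '%s\n' txt sh > "$demo/exts"

found=$(list_by_ext "$demo/exts" "$demo" | sort)
printf '%s\n' "$found"
rm -rf "$demo"
```

In a real script the last lines would be replaced by something like `list_by_ext "$1" "$2" > result.txt`.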

Give this one-liner command a try.
Replace /mydir with the folder to search.
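Change the list of extensions passed as argument to the grep -E command (egrep is the older name for the same thing):
find /mydir -type f | grep -E '[.](txt|xml)$' >> result.txt
Inside the parentheses, separate the extensions with |.
The . character must be escaped, here written as [.], and the trailing $ anchors the match to the end of the file name so that names like notes.txt.bak are not picked up.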
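If the extensions live in a file, as in the question, the pattern does not have to be edited by hand. The sketch below builds an anchored pattern from such a file with `paste`; it assumes the extensions contain no regular-expression metacharacters, and the sample file names are invented for the demonstration.

```shell
#!/bin/bash
# Build an extended regex such as [.](txt|xml)$ from a file with one extension per line.
exts=$(mktemp)
printf '%s\n' txt xml > "$exts"

pattern="[.]($(paste -sd'|' "$exts"))\$"   # join the lines with | and anchor at end of name
echo "$pattern"

# Filter a sample listing the same way the output of find would be filtered.
matches=$(printf '%s\n' ./a.txt ./b.xml ./notes.txt.bak ./c.sh | grep -E "$pattern")
printf '%s\n' "$matches"
rm -f "$exts"
```

With a real directory the last pipeline would read `find /mydir -type f | grep -E "$pattern"`.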

Related

How to rename file based on parent and child folder name in bash script

I would like to rename file based on parent/subparent directories name.
For example:
test.xml file located at
/usr/local/data/A/20180101
/usr/local/data/A/20180102
/usr/local/data/B/20180101
How can I save the test.xml files in /usr/local/data/output as
A_20180101_test.xml
A_20180102_test.xml
B_20180101_test.xml
I tried the shell script below, but it does not work.
#!/usr/bin/env bash
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
l1="${file%%/*}"
l2="${file#*/}"
l2="${l2%%/*}"
filename="${file##*/}"
target_file_name="${l1}_${l2}_${filename}"
echo cp "$file" "${target_dir_path}/${target_file_name}"
done
Is there anything I am doing wrong in this shell script?
You can use the following command to do this operation:
source_folder="usr/local/data/";target_folder="target"; find $source_folder -type f -name test.xml | awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' | xargs -n2 cp;
or on several lines for readability:
source_folder="usr/local/data/";
target_folder="target";
find $source_folder -type f -name test.xml |\
awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' |\
xargs -n2 cp;
where
target_folder is your target folder
source_folder is your source folder
the find command will search for all the test.xml named files present under this source folder
the awk command receives the target folder as a variable; in the BEGIN block you define the field separator and the output field separator, and then you print the original file name followed by the new one
xargs then passes that output to cp two arguments at a time, and the trick is done
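To see what the awk stage hands to `xargs`, it helps to run it on a single path. The sketch below feeds one sample line (taken from the question's layout) through the same awk program, with `printf "%s "` used instead of `printf $0" "` so that a `%` in a file name cannot be misread as a format specifier.

```shell
#!/bin/bash
# One input line, as find would print it for the relative source folder.
line='usr/local/data/A/20180101/test.xml'

# Same awk program as above: print "<original path> <target>/<parent2>_<parent1>_<file>".
pair=$(printf '%s\n' "$line" |
    awk -v targetF=target 'BEGIN{FS="/"; OFS="_"}
        {printf "%s ", $0; print targetF"/"$(NF-2),$(NF-1),$NF}')

echo "$pair"   # prints: usr/local/data/A/20180101/test.xml target/A_20180101_test.xml
```

Because `xargs -n2` splits on whitespace, this approach only works when none of the paths contain spaces.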
TODO:
You just need to set your source_folder and target_folder variables to match your environment, optionally put the command in a script, and you are good to go!
I've modified your code a little to get it to work. See the comments in the code:
target_dir_path="/usr/local/data/output/"
for file in /usr/local/data/*/*/test.xml; do
tmp=${file%/*/*/*} # Strip the last three path components
curr="${file#"$tmp/"}" # Extract wanted part of the filename
mod=${curr//[\/]/_} # Replace forward slash with underscore
mv "$file" "$target_dir_path$mod" # Move the file
done
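The three parameter expansions are easier to follow on a concrete path. This is a small walk-through rather than a replacement for the loop; `flat_name` is a hypothetical helper name, and the intermediate values are shown in the comments.

```shell
#!/bin/bash
# flat_name PATH: join the last three path components with underscores.
# (flat_name is a made-up name used only in this sketch.)
flat_name() {
    local file=$1 tmp curr
    tmp=${file%/*/*/*}              # /usr/local/data  (shortest suffix "/x/y/z" removed)
    curr=${file#"$tmp/"}            # A/20180101/test.xml
    printf '%s\n' "${curr//\//_}"   # every / replaced by _
}

new_name=$(flat_name /usr/local/data/A/20180101/test.xml)
echo "$new_name"   # prints A_20180101_test.xml
```

Replacing `mv` with `echo mv` (or `cp`, as in the question) in the loop gives a dry run before any file is touched.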
If you have the Perl-based rename command:
$ for f in tst/*/*/test.xml; do
rename -n 's|.*/([^/]+)/([^/]+)/(test.xml)|./$1_$2_$3|' "$f"
done
rename(tst/A/20180101/test.xml, ./A_20180101_test.xml)
rename(tst/A/20180102/test.xml, ./A_20180102_test.xml)
rename(tst/B/20180101/test.xml, ./B_20180101_test.xml)
-n option is for dry run, remove it after testing
change tst to /usr/local/data and ./ to /usr/local/data/output/ for your use case
.*/ to ignore file path
([^/]+)/([^/]+)/(test.xml) capture required portions
$1_$2_$3 re-arrange as required

How to create directories automatically in linux?

I have a file named temp.txt with the following content:
https://abcdef/12345-xyz
https://ghifdfg/5426525-abc
I need to create directories automatically in Linux, using only the number part of each line in the file.
So the result should be that the directories 12345 and 5426525 are created.
Any approach on how to do this would be helpful.
This is code that I found on the internet; it creates new directories named after the files whose names start with BR or W0.
for file in {BR,W0}*.*; do
dir=${file%%.*}
mkdir -p "$dir"
mv "$file" "$dir"
done
Assuming each URL is of the form
http[s]://any/symbols/some_digits-some_letters
Then you indeed could use the simple prefix and suffix modifiers in shell variable expansion.
${x##*/} expands to the suffix part of x that starts after the last slash /.
${y%%-*} expands to the prefix part of y before the first -.
while IFS= read -r x ; do
y=${x##*/}
z=${y%%-*}
mkdir "$z"
done < temp.txt
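A self-contained sketch of the same idea, run in a temporary directory so it can be tried without touching real data. The helper name `dir_from_url` is invented here, and `mkdir -p` is used so that re-running the loop does not fail on directories that already exist.

```shell
#!/bin/bash
# dir_from_url URL: print the digits between the last / and the first - that follows it.
# (dir_from_url is a hypothetical helper name.)
dir_from_url() {
    local last=${1##*/}            # 12345-xyz
    printf '%s\n' "${last%%-*}"    # 12345
}

work=$(mktemp -d)
printf '%s\n' 'https://abcdef/12345-xyz' 'https://ghifdfg/5426525-abc' > "$work/temp.txt"

while IFS= read -r url; do
    [ -n "$url" ] || continue                  # ignore blank lines
    mkdir -p "$work/$(dir_from_url "$url")"
done < "$work/temp.txt"

created=$(ls "$work")
printf '%s\n' "$created"
rm -rf "$work"
```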

Manipulate strings obtained from find command in bash script before dumping to file

I am using the find command in a shell (bash) script to obtain a list of files matching a pattern and dump those file names to a text file.
find ./ -type f -name *.txt >> files.list
Which produces a file containing
helloworld.txt
letter.txt
document-1.txt
document-1.backup.txt
etc.
The find command isn't that important. The one I am actually using contains regex-es, and produces a sensible list of matching files to be used as input file to a program.
I want to tag each filename with a flag to mark whether it is a Type A file or a Type B file, i.e. I want the output file files.list to look like this
helloworld.txt type=A
letter.txt type=A
document-1.txt type=B
document-1.backup.txt type=B
if helloworld and letter are Type A files, and document-1 is a Type B file.
I thought perhaps I could write a files.list.tmp file first, and then read it back in using another bash script processing it line by line... But that would create an additional file and I don't really want to do that.
Can I use the >> operator to create a variable or something?
I'm not really sure what I should do here.
By the way, when this bash script is called, it already knows whether the files matching the regex are of type A or type B, as that is given as an argument to the script.
You wrote: "I thought perhaps I could write a files.list.tmp file first, and then read it back in using another bash script processing it line by line... But that would make additional file and I don't really want to do that"
You can avoid creating temporary file by using process substitution:
./ascript.sh < <(find . -type f -name '*.txt')
And make your script read file names line by line.
You can pipe through the case that contains the choices:
find . -name '*.txt' | while IFS= read -r f; do type=UNKNOWN; case $f in *hello*) type=A;; *letter*) type=A;; *document*) type=B;; esac; echo "$f type=$type"; done
In the code above the choices are as follows
*hello*: type=A
*letter*: type=A
*document*: type=B
default: type=UNKNOWN
You can add as many *PATTERN*) type=TYPE;; entries as you wish. Given your files the output is:
./document-1.backup.txt type=B
./document-1.txt type=B
./letter.txt type=A
./helloworld.txt type=A
You wrote: "By the way, when this bash script is called, it already knows whether the files matching the regex are of type A or type B, as that is given as an argument to the script."
So, as your find command dumps only filenames of one given type per call, it suffices to append the type=… tag to each output line; this can be done with sed, e.g.:
find ./ -type f -name '*.txt' | sed "s/$/ type=$type/" >>files.list
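As a small sketch of how the sed step behaves, the function below appends the tag to whatever arrives on stdin. The name `tag_lines` is made up, and the sketch assumes the type value contains no `/` or `&`, since both are special in a sed replacement.

```shell
#!/bin/bash
# tag_lines TYPE: append " type=TYPE" to every line read from stdin.
# (tag_lines is a hypothetical helper; TYPE must not contain / or &.)
tag_lines() {
    local type=$1
    sed "s/\$/ type=$type/"
}

tagged=$(printf '%s\n' helloworld.txt letter.txt | tag_lines A)
printf '%s\n' "$tagged"
```

In the script itself this would be used as `find ./ -type f -name '*.txt' | tag_lines "$1" >> files.list`, with the type passed as the first argument.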

Get the date part of a filename from a path in a shell script

I have a script as follows
pathtofile="/c/github/something/r1.1./myapp/*.txt"
echo $pathtofile
filename = ${pathtofile##*/}
echo $filename
I always have only one txt file, such as 2015-08-07.txt, in the ../myapp/ directory, so the output is as follows:
/c/github/something/r1.1./myapp/2015-08-07.txt
*.txt
I need to extract the filename as 2015-08-07. I followed a lot of Stack Overflow answers with the same requirement. What is the best approach to get only the date part of the filename from that path?
FYI: the filename changes to today's date every time the script is executed.
When you say:
pathtofile="/c/github/something/r1.1./myapp/*.txt"
you are storing the literal /c/github/something/r1.1./myapp/*.txt in a variable.
When you echo the variable unquoted, the * gets expanded at that point, so the output looks right:
$ echo $pathtofile
/c/github/something/r1.1./myapp/2015-08-07.txt
However, if you quoted it you would see how the content is indeed a *:
$ echo "$pathtofile"
/c/github/something/r1.1./myapp/*.txt
So what you need to do is to store the value in, say, an array:
files=( /c/github/something/r1.1./myapp/*.txt )
This files array will be populated with the expansion of this expression.
Then, since you know that the array contains just a single element, you can print it with:
$ echo "${files[0]}"
/c/github/something/r1.1./myapp/2015-08-07.txt
and then get the name using the technique from the question "Extract filename and extension in Bash":
$ filename=$(basename "${files[0]}")
$ echo "${filename%.*}"
2015-08-07
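Put together, the steps above might look like the sketch below. It runs against a temporary directory standing in for the real path, and it adds one thing the answer does not mention: `nullglob`, so that an empty directory yields an empty array instead of the literal pattern. That guard is an assumption about what is wanted, not part of the original answer.

```shell
#!/bin/bash
# Stand-in for /c/github/something/r1.1./myapp (demo directory, for illustration only).
dir=$(mktemp -d)
touch "$dir/2015-08-07.txt"

shopt -s nullglob            # an unmatched glob expands to nothing
files=( "$dir"/*.txt )       # the glob is expanded here, at assignment time
shopt -u nullglob

if [ "${#files[@]}" -eq 0 ]; then
    echo "no .txt file found" >&2
    date_part=
else
    name=${files[0]##*/}     # strip the directory part: 2015-08-07.txt
    date_part=${name%.txt}   # strip the extension
fi

echo "$date_part"
rm -rf "$dir"
```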
You are doing a lot just to get the filename. With GNU find (-printf is a GNU extension):
$ find /c/github/something/r1.1./myapp/ -type f -printf "%f\n" | sed 's/\.txt$//'
2015-08-07

Launching program several times

I am using macOS. These are the command-line calls that launch my program (two parts):
nucmer --mum file1.txt file2.txt
show-snps -Clr -x 2 out.delta > out_file1.snps
The first part of the program creates the file out.delta. My file2.txt is always the same, but I want to run both parts 35000 times with a different file1.txt each time. All the file1 files are located in the same directory.
Is it possible to do it using BASH?
Keep all the input files in a directory. Create a wrapper script to invoke nucmer script and then show-snps script. Your wrapper script will accept path to file directory as input. Iterate over all files in the directory and call your two scripts.
You could do something along these lines:
find . -maxdepth 1 -type f -print | grep -v './out_' | while IFS= read -r f
do
b=$(basename "$f" .txt)
nucmer --mum "$f" file2.txt
show-snps -Clr -x 2 out.delta > "out_${b}.snps"
done
The find bit finds all files in the current directory. grep filters out any previous output files, in case you've run some previously. The basename line strips off the leading ./ and trailing extension, and then your two programs get run with the input file name and an output filename based on the basename output.
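You could also just use a for loop; the glob is expanded by the shell itself rather than passed to an external command, so the "argument list too long" limit does not apply:
for f in file*.txt; do nucmer --mum "$f" file2.txt; show-snps -Clr -x 2 out.delta > "out_${f%.txt}.snps"; done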
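Before launching 35000 runs it can be worth printing the commands first. The sketch below is a dry run only: `run_one` is a hypothetical wrapper that echoes the two commands instead of executing them, the sample file names are invented, and skipping `file2.txt` in the loop is an assumption (the reference file would otherwise be compared with itself).

```shell
#!/bin/bash
# run_one INPUT REFERENCE: print the two commands for one input file (dry run).
# (run_one is a made-up name; replace the echo lines with the real calls.)
run_one() {
    local input=$1 reference=$2 base
    base=$(basename "$input" .txt)     # sample1.txt -> sample1
    echo "nucmer --mum $input $reference"
    echo "show-snps -Clr -x 2 out.delta > out_${base}.snps"
}

# Demo directory with two inputs and the fixed reference file.
work=$(mktemp -d)
touch "$work/sample1.txt" "$work/sample2.txt" "$work/file2.txt"

plan=$(
    cd "$work" || exit 1
    for f in *.txt; do
        [ "$f" = file2.txt ] && continue   # assumption: do not feed the reference as an input
        run_one "$f" file2.txt
    done
)
rm -rf "$work"
printf '%s\n' "$plan"
```

Since each run overwrites out.delta, the two commands for one input have to finish before the next input starts, which the sequential loop guarantees.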
