Manipulate strings obtained from find command in bash script before dumping to file - linux

I am using the find command in a shell (bash) script to obtain a list of files matching a pattern and dump those filenames to a text file.
find ./ -type f -name '*.txt' >> files.list
Which produces a file containing
helloworld.txt
letter.txt
document-1.txt
document-1.backup.txt
etc.
The find command isn't that important. The one I am actually using contains regexes, and produces a sensible list of matching files to be used as the input file to a program.
I want to tag each filename with a flag to mark whether it is a Type A file or a Type B file; i.e., I want the output file files.list to look like this:
helloworld.txt type=A
letter.txt type=A
document-1.txt type=B
document-1.backup.txt type=B
if helloworld and letter are Type A files, and document-1 is a Type B file.
I thought perhaps I could write a files.list.tmp file first, and then read it back in using another bash script, processing it line by line... But that would create an additional file, and I don't really want to do that.
Can I use the >> operator to create a variable or something?
I'm not really sure what I should do here.
By the way, when this bash script is called, it already knows whether the files matching the regex are of type A or type B, as that is given as an argument to the script.

I thought perhaps I could write a files.list.tmp file first, and then read it back in using another bash script, processing it line by line... But that would create an additional file, and I don't really want to do that.
You can avoid creating a temporary file by using process substitution:
./ascript.sh < <(find . -type f -name '*.txt')
and make your script read the file names line by line.
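For reference, a minimal sketch of what ascript.sh might look like under that calling convention (the script name comes from the example above; the type-as-first-argument convention is an assumption based on the question):
#!/bin/bash
# ascript.sh: read one filename per line from stdin and tag each one.
type=$1                       # assumed: the type (A or B) is passed as the first argument
while IFS= read -r file; do   # IFS= and -r preserve leading whitespace and backslashes
    echo "$file type=$type"
done >> files.list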

You can pipe the output through a while loop containing a case statement with the choices:
find . -name '*.txt' | while IFS= read -r f; do
    type=UNKNOWN
    case $f in
        *hello*)    type=A ;;
        *letter*)   type=A ;;
        *document*) type=B ;;
    esac
    echo "$f type=$type"
done
In the code above, the choices are as follows:
*hello*: type=A
*letter*: type=A
*document*: type=B
default: type=UNKNOWN
You can add as many *PATTERN*) type=TYPE;; entries as you wish. Given your files the output is:
./document-1.backup.txt type=B
./document-1.txt type=B
./letter.txt type=A
./helloworld.txt type=A

By the way, when this bash script is called, it already knows whether the files matching the regex are of type A or type B, as that is given as an argument to the script.
So, as your find command dumps only filenames of one given type per call, it suffices to append the type=… tag to each output line; this can be done with sed, e.g.:
find ./ -type f -name '*.txt' | sed "s/$/ type=$type/" >> files.list
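For example, if the script receives the type as its first argument (an assumption based on the question; the script name is hypothetical), the whole thing can be as small as:
#!/bin/bash
# tagfiles.sh -- usage: ./tagfiles.sh A   (or: ./tagfiles.sh B)
type=$1
find ./ -type f -name '*.txt' | sed "s/$/ type=$type/" >> files.list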

Related

How to copy a file to a new file with a new name in same directory but across multiple directories in bash?

I am trying to copy an existing file (that is found across multiple directories) to a new file with a new name in Bash. For example, 'Preview.json' to 'Performance.json'. I have tried using
find * -type f -name 'Preview.json' -exec cp {} {}"QA" \;
But ended up with 'Preview.jsonQA'. (I am new to Bash.) I have tried moving the "QA" in front of the {} but I got errors because of an invalid path.
In an -exec predicate, the symbol {} represents a path that is being considered, starting at one of the starting-point directories designated in the command. Example: start/dir2/Preview.json. You can form other file names by either prepending or appending characters, but whether that makes sense depends on the details. In your case, appending produces commands such as
cp start/dir2/Preview.json start/dir2/Preview.jsonQA
which is a plausible command in the event that start/dir2/Preview.json exists. But cp does not automatically create directories in the destination path, so the result of prepending characters ...
cp start/dir2/Preview.json QAstart/dir2/Preview.json
... is not as likely to be accepted -- it depends on directory QAstart/dir2 existing.
I think what you're actually looking for may be cp commands of the form ...
cp start/dir2/Preview.json start/dir2/QAPreview.json
... but find cannot do this by itself.
For more flexibility in handling the file names discovered by find, pipe its output into another command. If you want to pass them as command-line arguments to another command, then you can interpose the xargs command to achieve that. The command on the receiving end of the pipe can be a shell function or a compound command if you wish.
For example,
# Using ./* instead of * ensures that file names beginning with - will not
# be misinterpreted as options:
find ./* -type f -name 'Preview.json' |
while IFS= read -r name; do   # read one line and store it in variable $name
    # The destination name needs to be computed differently if the name
    # contains a / character (the usual case) than if it doesn't:
    case "${name}" in
        */*) cp "${name}" "${name%/*}/QA${name##*/}" ;;
        *)   cp "${name}" "QA${name}" ;;
    esac
done
Note that this assumes that none of your file or directory names contain newline characters (the read command would split up newline-containing filenames). That's a reasonably safe assumption, but not absolutely safe.
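If you want to be safe against newlines too, a null-delimited variant works; here is a sketch, assuming your find supports the widely available -print0 extension:
find ./* -type f -name 'Preview.json' -print0 |
while IFS= read -r -d '' name; do   # -d '' makes read stop at NUL, not newline
    case "${name}" in
        */*) cp "${name}" "${name%/*}/QA${name##*/}" ;;
        *)   cp "${name}" "QA${name}" ;;
    esac
done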
Of course, you would generally want to have that in a script, not to try to type it on the fly on the command line.

Why does echo command interpret variable for base directory?

I would like to find some file types in my pictures folder, and I have created the following bash script in the /home/user/pictures folder:
for i in *.pdf *.sh *.txt;
do
echo 'all file types with extension' $i;
find /home/user/pictures -type f -iname $i;
done
But when I execute the bash script, it does not work as expected for files that are located in the base directory /home/user/pictures. Instead of echoing 'all file types with extension *.sh', the command expands the pattern against the base directory:
all file types with extension file1.sh
/home/user/pictures/file1.sh
all file types with extension file2.sh
/home/user/pictures/file2.sh
all file types with extension file3.sh
/home/user/pictures/file3.sh
I would like to know why the echo command does not print "all file types with extension *.sh".
Revised code:
for i in '*.pdf' '*.sh' '*.txt'
do
    echo "all file types with extension $i"
    find /home/user/pictures -type f -iname "$i"
done
Explanation:
In bash, a string containing *, or a variable which expands to such a string, may be expanded as a glob pattern unless that string is protected from glob expansion by putting it inside quotes (although if the glob pattern does not match any files, then the original glob pattern will remain after attempted expansion).
In this case, it is not wanted for the glob expansion to happen - the string containing the * needs to be passed as a literal to each of the echo and the find commands. So the $i should be enclosed in double quotes - these will allow the variable expansion from $i, but the subsequent wildcard expansion will not occur. (If single quotes, i.e. '$i' were used instead, then a literal $i would be passed to echo and to find, which is not wanted either.)
In addition to this, the initial for line needs to use quotes to protect against wildcard expansion in the event that any files matching any of the glob patterns exist in the current directory. Here, it does not matter whether single or double quotes are used.
Separately, the revised code here also removes some unnecessary semicolons. Semicolons in bash are a command separator and are not needed merely to terminate a statement (as in C etc).
Observed behaviour with original code
What seems to be happening here is that one of the patterns used in the initial for statement is matching files in the current directory (specifically the *.sh is matching file1.sh file2.sh, and file3.sh). It is therefore being replaced by a list of these filenames (file1.sh file2.sh file3.sh) in the expression, and the for statement will iterate over these values. (Note that the current directory might not be the same as either where the script is located or the top level directory used for the find.)
It would also still be expected that the *.pdf and *.txt would be used in the expression -- either substituted or not, depending on whether any matches are found. Therefore the output shown in the question is probably not the whole output of the script.
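To see that expansion in action, here is a hypothetical session in a directory containing file1.sh, file2.sh, and file3.sh, but no .pdf files:
$ echo *.sh     # the pattern matches, so it is replaced by the file names
file1.sh file2.sh file3.sh
$ echo *.pdf    # no match, so the literal pattern remains
*.pdf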
Such expressions (*.blabla) change the value of $i in the loop. Here is the trick I would use:
for i in pdf sh txt
do
    echo "all file types with extension *.$i"
    find /home/user/pictures -type f -iname "*.$i"
done

Shell script to update last character of the filename

I have files in directories and sub-directories; the filenames should be changed such that the last character is replaced by a number or a character, depending upon the argument provided. I could do it for numbers, but not for a character.
For example, suppose the file names are 20170504ABCDXXXYYY6.xml or 20170504CFLFXXXYYY6.cfl.bz2.
If I write the command ./updateLastCharacter 5, file names should be 20170504ABCDXXXYYY5.xml or 20170504CFLFXXXYYY5.cfl.bz2.
If the command is ./updateLastCharacter A, file names should be 20170504ABCDXXXYYYA.xml or 20170504CFLFXXXYYYA.cfl.bz2.
I'm new to shell scripting. I tried a lot to make it happen, but what I could do is:
find $directory -exec rename "s/[0-9].xml/$newNumber.xml/;s/[0-9].cfl/$newRevisionNumber.cfl/" {} ";"
This works fine for numbers, but I'm looking for how I can do it for a character with a single-line command.
A simple solution with rename (the Perl version):
find . | rename -n -v 's/[A-Za-z0-9](?=\.)/5/'
-n means no action (a dry run that only shows what would be renamed)
-v means verbose: print the renames to the screen
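To turn that into the requested ./updateLastCharacter script, a minimal sketch (assuming the replacement character arrives as the first argument; drop -n once the preview looks right):
#!/bin/bash
# updateLastCharacter: replace the character just before the first dot
new=$1
find . -type f | rename -n -v "s/[A-Za-z0-9](?=\.)/$new/"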
Alternatively, you can use the renrem utility directly, which is more powerful than the Perl rename. I wrote that program myself because I often needed to rename or remove files.

Bash script to get all files with desired extensions

I'm trying to write a bash script that, if I pass it a text file containing some extensions and a folder, returns an output file with the list of all files that match the desired extensions, searching recursively in all sub-directories.
The folder is my second parameter; the extension list file is my first parameter.
I have tried:
for i in $1 ; do
    find . -name $2\*.$i -print >> result.txt
done
but it doesn't work.
As noted in a comment:
It is not a good idea to write to a hard-coded file name.
The given example fixes only the given code from the OP question.
Yes of course, it is even better to call with
x.sh y . > blabla
and remove the filename from the script itself. But my intention is not to fix the question...
The following bash script, named x.sh,
#!/bin/bash
echo -n > result.txt                 # delete old content
while read i; do                     # read a line from the file
    find "$2" -name "*.$i" -print >> result.txt   # for every item do a find
done < "$1"                          # the file named by the first cmdline arg
with a text file named y with the following content
txt
sh
and called with:
./x.sh y .
results in a file result.txt whose contents are:
a.txt
b.txt
x.sh
OK, let's give some additional hints gathered from the comments:
If the result file should not collect content from previous runs of the script, it can be simplified to:
#!/bin/bash
while read i; do                     # read a line from the file
    find "$2" -name "*.$i" -print    # for every item do a find
done < "$1" > result.txt             # input from the first cmdline arg, output to result.txt
And as already mentioned:
The hard coded result.txt could be removed and the call can be something like
./x.sh y . > result.txt
Give this one-liner command a try.
Replace /mydir with the folder to search.
Change the list of extensions in the pattern passed to the egrep command:
find /mydir -type f | egrep "[.]txt$|[.]xml$" >> result.txt
In the egrep pattern, each extension must be separated with |, the . character must be escaped as [.], and each alternative is anchored with $ so that only the end of the file name matches.

How to make this (l)unix script dynamically accept directory name in for-loop?

I am teaching myself more (l)unix skills and wanted to see if I could begin to write a program that will eventually read all .gz files and expand them. However, I want it to be super dynamic.
#!/bin/bash
dir=~/derp/herp/path/goes/here
for file in $(find "$dir" -name '*gz')
do
    echo $file
done
So when I execute this file, I simply run
bash derp.sh
I don't like this. I feel the script is too brittle.
How can I rework my for loop so that I can say
bash derp.sh ~/derp/herp/path/goes/here (1)
I tried re-coding it as follows:
for file in $*
However, I don't want to have to type bash derp.sh ~/derp/herp/path/goes/here/*.gz.
How could I rewrite this so I could simply type what is in (1)? I feel I must be missing something simple?
Note
I tried
for file in $*/*.gz, and that obviously did not work. I appreciate your assistance; my sources have been a Wrox Unix text, Carpentry v5, and man files. Unfortunately, I haven't found anything that will do what I want.
Thanks,
GeekyOmega
for dir in "$@"
do
    for file in "$dir"/*.gz
    do
        echo "$file"
    done
done
Notes:
In the outer loop, dir is assigned successively to each argument given on the command line. The special form "$@" is used so that directory names that contain spaces will be processed correctly.
The inner loop runs over each .gz file in the given directory. By placing $dir in double-quotes, the loop will work correctly even if the directory name contains spaces. This form will also work correctly if the gz file names have spaces.
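For example, saved as derp.sh (with hypothetical directory names), the script can then be invoked as:
bash derp.sh ~/derp/herp/path/goes/here "$HOME/My Documents"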
#!/bin/bash
for file in $(find "$@" -name '*.gz')
do
    echo "$file"
done
You'll probably prefer "$@" instead of $*; if you were to have spaces in directory names, like a directory named My Documents and a directory named Music, $* would effectively expand into:
find My Documents Music -name '*.gz'
where "$#" would expand into:
find "My Documents" "Music" -name '*.gz'
Requisite note: Using for file in $(find ...) is generally regarded as a bad practice, because it does tend to break if you have spaces or newlines in your directory structure. Using nested for loops (as in John's answer) is often a better idea, or using find -print0 and read as in this answer.
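For reference, a sketch of that last suggestion, pairing find -print0 with read -d '' (both are widely supported extensions, though not strictly POSIX):
#!/bin/bash
# Print every .gz file under the given directories, newline-safe.
find "$@" -name '*.gz' -print0 |
while IFS= read -r -d '' file; do
    echo "$file"
done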
