Find files recursively and rename based on their full path - Linux

I'm looking to search for files of a specific name, modify the name to the full path and then copy the results to another folder.
Is it possible to update each find result with the full path as the file name; i.e.
./folder/subfolder/my-file.csv
becomes
folder_subfolder_my-file.csv
I am listing the files using the following and would like to script it.
find . -name my-file.csv -exec ls {} \;

Since you're using bash, you can take advantage of globstar and use a for loop:
shopt -s globstar # set globstar option
for csv in **/my-file.csv; do
  echo "$csv" "${csv//\//_}"
done
shopt -u globstar # unset the option if you don't want it any more
With globstar enabled, ** does a recursive search (similar to the basic functionality of find).
"${csv//\//_}" is an example of ${var//match/replace}, which does a global replacement of all instances of match (here an escaped /) with replace.
If you're happy with the output, then change the echo to mv.
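For example, a quick sanity check of that expansion on one of the question's paths (globstar results carry no leading ./):
csv=folder/subfolder/my-file.csv
echo "${csv//\//_}"   # prints: folder_subfolder_my-file.csv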

Just to demonstrate how to do this with find:
find . -type f -exec bash -c '
  for file; do
    f=${file#./}
    cp "$file" "./${f//\//_}"
  done' _ {} +
The Bash parameter expansion ${f//x/y} replaces x with y throughout. Because find prefixes each found file with the path where it was found (here, ./), we trim that off in order to avoid doing cp "./file" "._file". And because the slash is used in the parameter expansion syntax itself, we need to backslash the slash we want the shell to interpret literally. Finally, because this parameter expansion syntax is a Bash-only extension, we use bash rather than sh.
Obviously, if you want to rename rather than copy, replace cp with mv.
If your find does not support -exec ... + this needs to be refactored somewhat (probably to use xargs); but it should be supported on any reasonably modern platform.
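If you did need to refactor to xargs, a rough equivalent might look like this (a sketch, assuming GNU find and xargs for -print0/-0):
find . -type f -print0 | xargs -0 bash -c '
  for file; do
    f=${file#./}
    cp "$file" "./${f//\//_}"
  done' _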

With perl's rename command ...
$ prename
Usage: rename [-v] [-n] [-f] perlexpr [filenames]
... you can rename multiple files by applying a regular expression. rename also accepts file names via stdin:
find ... | rename -n 's#/#_#g'
Check the results and if they are fine, remove -n.
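Applied to the original question, a complete pipeline might look like this (a sketch; the leading ./ is stripped first so the results don't start with ._):
find . -name my-file.csv | rename -n 's#^\./##; s#/#_#g'
As above, check the dry-run output before dropping -n.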

Related

Syntactical error while writing a Unix Shell script (Bash shell) [duplicate]

Say I want to copy the contents of a directory excluding files and folders whose names contain the word 'Music'.
cp [exclude-matches] *Music* /target_directory
What should go in place of [exclude-matches] to accomplish this?
In Bash you can do it by enabling the extglob option, like this (replace ls with cp and add the target directory, of course)
~/foobar> shopt extglob
extglob off
~/foobar> ls
abar afoo bbar bfoo
~/foobar> ls !(b*)
-bash: !: event not found
~/foobar> shopt -s extglob # Enables extglob
~/foobar> ls !(b*)
abar afoo
~/foobar> ls !(a*)
bbar bfoo
~/foobar> ls !(*foo)
abar bbar
You can later disable extglob with
shopt -u extglob
The extglob shell option gives you more powerful pattern matching in the command line.
You turn it on with shopt -s extglob, and turn it off with shopt -u extglob.
In your example, you would initially do:
$ shopt -s extglob
$ cp !(*Music*) /target_directory
The full set of available extended globbing operators is (excerpt from man bash):
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. A pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
So, for example, if you wanted to list all the files in the current directory that are not .c or .h files, you would do:
$ ls -d !(*@(.c|.h))
Of course, normal shell globbing works, so the last example could also be written as:
$ ls -d !(*.[ch])
Not in bash (that I know of), but:
cp `ls | grep -v Music` /target_directory
I know this is not exactly what you were looking for, but it will solve your example.
If you want to avoid the memory cost of using the -exec action, I believe you can do better with xargs. I think the following
find . -maxdepth 1 -name '*Music*' -prune -o -print0 | xargs -0 -I{} cp {} dest/
is a more efficient alternative to
find foo -type f ! -name '*Music*' -exec cp {} bar \; # new proc for each exec
A trick I haven't seen on here yet that doesn't use extglob, find, or grep is to treat two file lists as sets and "diff" them using comm:
comm -23 <(ls) <(ls *Music*)
comm is preferable over diff because it doesn't have extra cruft.
This returns all elements of set 1, ls, that are not also in set 2, ls *Music*. This requires both sets to be in sorted order to work properly. No problem for ls and glob expansion, but if you're using something like find, be sure to invoke sort.
comm -23 <(find . | sort) <(find . | grep -i '.jpg' | sort)
Potentially useful.
You can also use a pretty simple for loop:
for f in `find . -not -name "*Music*"`
do
cp $f /target/dir
done
In bash, an alternative to shopt -s extglob is the GLOBIGNORE variable. It's not really better, but I find it easier to remember.
An example that may be what the original poster wanted:
GLOBIGNORE="*techno*"; cp *Music* /only_good_music/
When done, unset GLOBIGNORE to be able to rm *techno* in the source directory.
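Putting the whole sequence together (a minimal sketch using the example's names):
GLOBIGNORE="*techno*"
cp *Music* /only_good_music/
unset GLOBIGNORE   # *techno* matches again from here on, so rm *techno* works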
My personal preference is to use grep and the while command. This allows one to write powerful yet readable scripts ensuring that you end up doing exactly what you want. Plus by using an echo command you can perform a dry run before carrying out the actual operation. For example:
ls | grep -v "Music" | while read filename
do
  echo "$filename"
done
will print out the files that you will end up copying. If the list is correct, the next step is to simply replace the echo command with the copy command as follows:
ls | grep -v "Music" | while read filename
do
  cp "$filename" /target_directory
done
One solution for this can be found with find.
$ mkdir foo bar
$ touch foo/a.txt foo/Music.txt
$ find foo -type f ! -name '*Music*' -exec cp {} bar \;
$ ls bar
a.txt
find has quite a few options; you can get pretty specific about what you include and exclude.
Edit: Adam in the comments noted that this is recursive. The find options -mindepth and -maxdepth can be useful in controlling this.
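For example, to restrict the earlier command to the top level of foo only (GNU find syntax):
find foo -maxdepth 1 -type f ! -name '*Music*' -exec cp {} bar \;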
The following lists all *.txt files in a directory, except those whose names begin with a number.
This works in bash, dash, zsh and all other POSIX compatible shells.
for FILE in /some/dir/*.txt; do  # for each *.txt file
  case "${FILE##*/}" in          # if file basename...
    [0-9]*) continue ;;          # starts with digit: skip
  esac
  ## otherwise, do stuff with $FILE here
done
In line one the pattern /some/dir/*.txt will cause the for loop to iterate over all files in /some/dir whose names end with .txt.
In line two a case statement is used to weed out undesired files. – The ${FILE##*/} expression strips off any leading dir name component from the filename (here /some/dir/) so that patterns can match against only the basename of the file. (If you're only weeding out filenames based on suffixes, you can shorten this to $FILE instead.)
In line three, all files matching the case pattern [0-9]* will be skipped (the continue statement jumps to the next iteration of the for loop). – If you want to, you can do something more interesting here, e.g. skipping all files which do not start with a letter (a–z) using [!a-z]*, or you could use multiple patterns to skip several kinds of filenames, e.g. [0-9]*|*.bak to skip both .bak files and files which start with a number.
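As a concrete sketch of that multiple-pattern variant:
case "${FILE##*/}" in
  [0-9]*|*.bak) continue ;;  # skip files starting with a digit, and .bak files
esac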
This would do it, excluding exactly 'Music' (note that this ^ negation is zsh's extendedglob syntax rather than bash):
cp -a ^'Music' /target
and these for excluding things like Music?* or *?Music:
cp -a ^\*?'Music' /target
cp -a ^'Music'?\* /target

find recursively, but with specific sub-folder name

This command finds all files named "log_7" recursively in the current folder.
find . -name log_7
Assume many sub-folders under the current folder tree have a file with that same name "log_7":
./am/f1/log_7
./ke/f2/log_7
./sa/f6/log_7
..
./xx/f97/log_7
Is there a way to explicitly say that we only want to search for "log_7" in a folder named "f2", such that the result from find will list only one entry:
./ke/f2/log_7
You could use a regular expression.
find . -regex '.*/f2/log_7'
This will only match if log_7 is directly nested under f2.
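If you prefer glob-style matching over a regex, find's -path test (supported by GNU and BSD find) does the same job:
find . -path '*/f2/log_7'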
Yet another way is to filter the find output through grep for the same thing:
find . -name log_7 | grep '/f2/'
But find with its built-in regex support is preferable.
A simple glob should do:
printf '%s\n' */f2/log_7
If there is a possibility for more leading folders, you can use the globstar option:
shopt -s globstar
printf '%s\n' **/f2/log_7

Bash script to change file names of different extension [duplicate]

This question already has answers here:
Batch renaming files with Bash
(10 answers)
Closed 4 years ago.
We are working on an AngularJS project, where the compiled output contains many file types like js, css, woff, etc., along with an individual dynamic hash as part of each file name.
I am working on a simple bash script to find the files with the mentioned file extensions and move them to some folder with the hash removed, by searching for the first instance of '.'.
Please note the file extensions .woff and .css should be retained.
/src/main.1cc794c25c00388d81bb.js ==> /dst/main.js
/src/polyfills.eda7b2736c9951cdce19.js ==> /dst/polyfills.js
/src/runtime.a2aefc53e5f0bce023ee.js ==> /dst/runtime.js
/src/styles.8f19c7d2fbe05fc53dc4.css ==> /dst/styles.css
/src/1.620807da7415abaeeb47.js ==> /dst/1.js
/src/2.93e8bd3b179a0199a6a3.js ==> /dst/2.js
/src/some-webfont.fee66e712a8a08eef580.woff ==> /dst/some-webfont.woff
/src/Web_Bd.d2138591460eab575216.woff ==> /dst/Web_Bd.woff
Bash code:
#!/bin/bash
echo Process web binary files!
echo Processing the name change for js files!!!!!!!!!!!!!
sfidx=0;
SFILES=./src/*.js #{js,css,voff}
DST=./dst/
for files in $SFILES
do
echo $(basename $files)
cp $files ${DST}"${files//.*}".js
sfidx=$((sfidx+1))
done
echo Number of target files detected in srcdir $sfidx!!!!!!!!!!
The above code has two problems:
First, I need to handle all the file extensions in the for loop in one place, instead of running it for each extension. However, this method fails; I'm not sure what needs to be changed:
SFILES=./src/*.{js,css,voff}
cp: cannot stat `./src/*.{js,css,voff}': No such file or directory
Second, the cp command fails for the reason below; I need some help figuring out the correct syntax.
cp $files ${DST}"${files//.*}".js
1.620807da7415abaeeb47.js
cp: cannot create regular file `./dst/./src/1.620807da7415abaeeb47.js.js': No such file or directory
Here is a relatively simple command to do it:
find ./src -type f \( -name \*.js -o -name \*.css -o -name \*.woff \) -print0 |
  while IFS= read -r -d $'\0' line; do
    dest="./dst/$(basename "$line" | sed -E 's/(\..{20}\.)(js|css|woff)/\.\2/g')"
    echo "Copying $line to $dest"
    cp "$line" "$dest"
  done
This is based on the original code and is Shellcheck-clean:
#!/bin/bash
shopt -s nullglob # Make globs that match nothing expand to nothing
echo 'Process web binary files!'
echo 'Processing the name change for js, css, and woff files!!!!!!!!!!!!!'
srcfiles=( src/*.{js,css,woff} )
destdir=dst
for srcpath in "${srcfiles[@]}" ; do
filename=${srcpath##*/}
printf '%s\n' "$filename"
nohash_base=${filename%.*.*} # Remove the hash and suffix from the end
suffix=${filename##*.} # Remove everything up to the final '.'
newfilename=$nohash_base.$suffix
cp -- "$srcpath" "$destdir/$newfilename"
done
echo "Number of target files detected in srcdir ${#srcfiles[*]}!!!!!!!!!"
The code uses an array instead of a string to hold the list of files because it is easier (and generally safer, because it can handle files with names that contain spaces and other special characters). See Arrays [Bash Hackers Wiki] for information about using arrays in Bash.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for information about using ${var##pattern} etc. for extracting parts of strings.
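A quick illustration of those expansions with one of the sample names from the question:
filename=main.1cc794c25c00388d81bb.js
echo "${filename%.*.*}"   # -> main
echo "${filename##*.}"    # -> js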
See Correct Bash and shell script variable capitalization for an explanation of why it is best to avoid uppercase variable names (such as SFILES).
shopt -s nullglob prevents strange things happening if the glob pattern(s) fail to match. See Why is nullglob not default? for more information.
See Bash Pitfalls #2 (cp $file $target) for why it's generally better to use cp -- instead of plain cp (though it's not necessary in this case (since neither argument can begin with '-')).
It's best to keep Bash code Shellcheck-clean. When run on the code in the question it identifies the key problem, and recommends the use of arrays as a way to fix it. It also identifies several other potential problems.
Your problem is precedence of the expansions. Here is my solution:
#!/bin/bash
echo Process web binary files!
echo Processing the name change for js files!!!!!!!!!!!!!
sfidx=0;
SFILES=$(echo ./src/*.{js,css,voff})
DST=./dst/
for file in $SFILES
do
  new=${file##*/}               # basename
  new="${DST}${new%\.*}.js"     # strip the old extension (note the escaped .)
  echo "copying $file to $new"  # sanity check
  cp "$file" "$new"
  sfidx=$((sfidx+1))
done
echo Number of target files detected in srcdir $sfidx!!!!!!!!!!
With three files in ./src all named "gash" I get:
Process web binary files!
Processing the name change for js files!!!!!!!!!!!!!
copying ./src/gash.js to ./dst/gash.js
copying ./src/gash.css to ./dst/gash.js
copying ./src/gash.voff to ./dst/gash.js
Number of target files detected in srcdir 3!!!!!!!!!!
(You might be able to get around this using eval, but that can be a security issue.)
new=${file##*/} - remove the longest string on the left ending in / (remove leading directory names, as basename). If you wanted to use the external non-shell basename program then it would be new=$(basename "$file").
${new%\.*} - remove the shortest string on the right starting with . (remove the old file extension)
A possible approach is to have the find command generate a shell script, then execute it.
src=./src
dst=./dst
find "$src" \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf 'p="%p"; f="%f"; cp "$p" "'"${dst}"'/${f%%%%.*}.${f##*.}"\n'
This will print the shell commands you want to execute. If they are what you want,
just pipe the output to a shell:
find "$src" \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf 'p="%p"; f="%f"; cp "$p" "'"${dst}"'/${f%%%%.*}.${f##*.}"\n'|bash
(or |bash -x if you want to see what is going on.)
If you have files named, e.g., ./src/dir1/a.xyz.js and ./src/dir2/a.uvw.js they will both end up as ./dst/a.js, the second overwriting the first. To avoid this, you might want to use cp -i instead of cp.
If you are absolutely sure that there will never be spaces or other strange characters in your pathnames, you can use less quotes (to the horror of some shell purists)
find $src \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf "p=%p; f=%f; cp \$p ${dst}/\${f%%%%.*}.\${f##*.}\\n"|bash
Some final remarks:
%p and %f are expanded by -printf as the full pathname and the basename of the processed file. They enable us to avoid the basename command. Unfortunately, there is no such directive for the file extension, so we must use parameter expansion in the shell to compose the final name.
In the -printf argument, we must use %% to write a single percent character. Since we need two of them, there have to be four...
${f%%.*} expands in the shell as the value of $f with everything removed from the first dot onwards
${f##*.} expands in the shell as the value of $f with everything removed up to the last dot (i.e., it expands to the file extension)
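Putting it together, the line generated for one of the sample files would be
p="./src/styles.8f19c7d2fbe05fc53dc4.css"; f="styles.8f19c7d2fbe05fc53dc4.css"; cp "$p" "./dst/${f%%.*}.${f##*.}"
which the shell then expands to cp ./src/styles.8f19c7d2fbe05fc53dc4.css ./dst/styles.css.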

No such file or directory when piping. Each command works separately, but not when piping

I have 2 folders: folder_a & folder_b. In each of these folders there are a bunch of files. I am trying to use sed to move all of these files out of these folders and into the current working directory I am in.
My folder structure looks like this:
mytest:
  a:
    1.txt
    2.txt
    3.txt
  b:
    4.txt
    5.txt
The command I am trying to use is:
find . -type d ! -iname '*.*' # find all folders other than root
| sed -r 's/.*/&\/*/' # add '/*' to each of the arguments
| sed -r 'p;s/.*/./' # output: a/* . b/* .
| xargs -n 2 mv # should be creating two commands: 'mv a/* .' and 'mv b/* .'
Unfortunately I get an error:
mv: cannot stat './aaa/*': No such file or directory
I also get the same error when I try this other strategy (using ls instead of mv):
for dir in */; do
ls $dir;
done;
Even if I use sed to replace the spaces in each directory name with '\ ', or surround the directory names with quotes, I get the same error.
I'm not sure if these 2 examples are related in my misunderstanding of bash but they both seem to demonstrate my ignorance of how bash translates the output from one command into the input of another command.
Can anyone shed some light on this?
Update: Completely rewritten.
As @EtanReisner and @melpomene have noted, mv */* . or, more specifically, mv a/* b/* . is the most straightforward solution, but you state that this is in part a learning exercise, so the remainder of the answer shows an efficient find-based solution and explains the problem with the original command.
An efficient find-based solution
Generally, if feasible, it's best and most efficient to let find itself do the work, without involving additional tools; find's -exec action is like a built-in xargs, with {} representing the path at hand (with terminator \;) / all paths (with +):
find . -type f -exec echo mv -t . {} +
To be safe, this will just print the mv commands that would be executed; remove the echo to actually execute them.
This will execute a single[1] mv command to which all matching files are passed, and -t . moves them all to the current dir.
[1] If the resulting command line is too long (which is unlikely), it is split up into multiple commands, just as with xargs.
Operating on files (-type f) bypasses the need for globbing, as find will then enumerate all files for you (it also bypasses the need to exclude . explicitly).
Note that this solution works on entire subtrees, not just (immediate) subdirectories.
It's tempting to consider turning on Bash 4's globstar option and using mv */** ., but that won't work, because it will attempt to move directories as well, not just the files in them.
A caveat re -exec with +: it only works if {} - the placeholder for all paths - is the token immediately before the +.
Since you're on Linux, we can satisfy this condition by specifying the target folder for mv with option -t before the {}; on BSD-based systems such as OSX, you could not do that, because mv doesn't support -t there, so you'd have to use terminator \;, which means that mv is called once for every path, which is obviously much slower.
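On such BSD systems the slower, portable form of the same command would be (one mv invocation per file):
find . -type f -exec mv {} . \;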
Why your command didn't work:
As @EtanReisner points out in a comment, xargs invokes the command specified without (implicitly) involving a shell, so globbing won't work; you can verify this with the following command:
echo '*' | xargs echo # -> '*' - NO globbing
If we leave the globbing issue aside, additional work would have been necessary to make your xargs command work correctly with folder names with embedded spaces (or other shell metacharacters):
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -n 2 echo mv # NOTE: still won't work due to lack of globbing
Note how the (combined) sed command now produces a single output line '<input-path>'/* ., with the input path enclosed in embedded single-quotes, which is required for xargs to recognize <input-path> as a single argument, even if it contains embedded spaces.
(If your filenames contain single-quotes, you'd have to do more work; also note that since now all arguments for a given dir. are on a single line, you could use xargs -L 1 ....)
Also note how -mindepth 1 (only process paths at the subdirectory level or below) is used to skip processing of . itself.
The only way to make globbing happen is to get the shell involved:
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -I {} sh -c 'echo mv {}' # works, but is inefficient
Note the use of xargs' -I option to treat each input line as its own argument ({} is a self-chosen placeholder for the input).
sh -c invokes the (default) shell to execute the resulting command, at which globbing does happen.
However, overall, this is quite inefficient:
A pipeline with 3 segments is used.
A shell instance is invoked for every input path, which in turn calls the mv utility.
Compare this to the efficient find-only solution above, which (typically) creates only 2 processes in total.

Unix: traverse a directory

I need to traverse a directory, starting in one directory and going deeper into different sub-directories. However, I also need to be able to have access to each individual file in order to modify it. Is there already a command to do this, or will I have to write a script? Could someone provide some code to help me with this task? Thanks.
The find command is just the tool for that. Its -exec flag or -print0 in combination with xargs -0 allows fine-grained control over what to do with each file.
Example: Replace all foo's by bar's in all files in /tmp and subdirectories.
find /tmp -type f -exec sed -i -e 's/foo/bar/g' '{}' ';'
for i in $(find); do
  if [ -d "$i" ]; then do something with a directory; fi
  if [ -f "$i" ]; then do something with a file etc.; fi
done
This will return the whole tree (recursively) in the current directory in a list that the loop will go through.
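A more robust variant of the same loop that copes with names containing spaces (a sketch; assumes bash and a find supporting -print0, e.g. GNU find):
find . -print0 | while IFS= read -r -d '' i; do
  if [ -d "$i" ]; then echo "directory: $i"; fi   # do something with a directory
  if [ -f "$i" ]; then echo "file: $i"; fi        # do something with a file
done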
This can be easily achieved by combining find, xargs, and sed (or another file modification command).
For example:
$ find /path/to/base/dir -type f -name '*.properties' | xargs sed -ie '/^#/d'
This will filter all files with file extension .properties.
The xargs command will feed the file path generated by find command into the sed command.
The sed command will delete all lines starting with # in the files (fed in by xargs).
Command combination in this way is very flexible.
For example, the find command has many parameters, so you can filter by user name, file size, file path (e.g. only under the /test/ subfolder), or file modification time.
Another dimension of flexibility is how and what to change in your files. For example, the sed command allows you to make changes to a file by applying substitutions (specified via regular expressions). Similarly, you can use gzip to compress the files. And so on ...
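For instance, combining a couple of those dimensions (a sketch; the path, name pattern, and age threshold are made up):
find /path/to/base/dir -type f -name '*.log' -mtime +30 -exec gzip {} +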
You would usually use the find command. On Linux, you have the GNU version, of course. It has many extra (and useful) options. Either version will allow you to execute a command (e.g. a shell script) on the files as they are found.
The exact details of how to make changes to the file depend on the change you want to make to the file. That is probably best scripted, with find running the script:
POSIX or GNU:
find . -type f -exec your_script '{}' +
This will run your script once for a group of files with those names provided as arguments. If you want to do it one file at a time, replace the + with ';' (or \;).
I am assuming SearchMe is the example directory name you need to traverse completely.
I am also assuming, since it was not specified, that the files you want to modify are all text files. Is this correct?
In such scenario I would suggest using the command:
find SearchMe -type f -exec vi {} \;
If you are not familiar with vi editor, just use another one (nano, emacs, kate, kwrite, gedit, etc.) and it should work as well.
Bash 4+
shopt -s globstar
for file in **
do
if [ -f "$file" ];then
# do some processing to your file here
# where the find command can't do conveniently
fi
done
