How can I search for files in directories that contain spaces in names, using "find"?

How can I search for files in directories that contain spaces in names, using "find"? - linux

How can I search for files in directories that contain spaces in names, using find?
i use script
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
do
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
but the files and directories that contain spaces in names are not processed?

This will grab all the files that have spaces in them
$ls
more space nospace stillnospace this is space
$find -type f -name "* *"
./this is space
./more space

I don't know how to achieve you goal. But given your actual solution, the problem is not really with find but with the for loops since "spaces" are taken as delimiter between items.
find has a useful option for those cases:
from man find:
-print0
True; print the full file name on the standard output, followed by a null character
(instead of the newline character that -print uses). This allows file names
that contain newlines or other types of white space to be correctly interpreted
by programs that process the find output. This option corresponds to the -0
option of xargs.
As the man saids, this will match with the -0 option of xargs. Several other standard tools have the equivalent option. You probably have to rewrite your complex pipeline around those tools in order to process cleanly file names containing spaces.
In addition, see bash "for in" looping on null delimited string variable to learn how to use for loop with 0-terminated arguments.

Do it like this
find . -type f -name "* *"
Instead of . you can specify your path, where you want to find files with your criteria

Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname instead. Also, later on, you use sed to get rid of the extension. You can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"` ; do
path=$(dirname "$i")
for j in `ls $path | grep POD` ; do
file=$(basename "$j" .txt)
# Do what ever you want with the file
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the file name in an array instead and then remove duplicates with sort and uniq.

Use while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
while IFS= read -rd '' j; do
find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
done <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)

Never used for i in $(find...) or similar as it'll fail for file names containing white space as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
i="${i%%[0-9A-Za-z]*\.txt}"
for j in "$i"/*sh*
do
j="${j%%\.txt}"
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
The above will still fail for file names that contains newlines. If you have that situation and can't fix the file names then look into the -print0 option for find, and piping it to xargs -0.

Related

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The files and folders can contain white spaces and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a textfile named ids.txt. This file only contains the IDs but not the whole filename as a string.
What I want to achieve:
I want to read out ids.txt line by line.
For every line in ids.txt I want to do a find search and copy cp the result to destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ;
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Not using -exec also gives no (search)results.
I was thinking that -name "$ids" is the root cause here. My files contain the ID + a String so I should search for names containing the ID plus a variable string (star)
As argument for -name I also tried "$ids *" "$ids"" *" and so on with no luck.
Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?
Do you have any solution for me to automate this process in a bash script to read out ids.txt file, search the filenames and copy them over to specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Going on with the solution from tripleee:
#!/bin/bash
grep . $1 | while read -r id; do
echo "Der Suchbegriff lautet:"$id; echo;
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines the -name "$id*" will be -name * which in turn finds all files and copies all files.
Trying to prevent empty line to be read does not seem to work. They should be filtered by the expression grep . $1 |. What am I doing wrong?

If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n—name\n&*/' ids.txt |
xargs -I {} find -false {} -exec cp {} /home/alex/testzone/output +
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still the inherent problem with using xargs find like this is that xargs could split the list between -o and -name or between -name and the actual file name pattern if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$#"); find -false ${arr[#]/-o_-name_/-o -name } -exec cp {} "$0"' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array where each file name and its flags is a single item, and then replace the flags into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty to rename the variable, since read will only read one argument at a time, so the variable name should be singular.)
while read -r id; do
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt

Please try the following bash script copier.sh
#!/bin/bash
IFS=$'\n' # make newlines the only separator
set -f # disable globbing
file="files.txt" # name of file containing filenames
finish="finish" # destination directory
while read -r n ; do (
du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep $n | sed 's/ *$//g' | xargs -I '{}' cp '{}' $finish
);
done < $file
which copies recursively all the files named in files.txt from . and it's subfiles to ./finish
This new version works even if there are spaces in the directory names or file names.

how to cp files with spaces in the filename when files are provided by find

I would like to ensure that all files found by find with a given criteria are properly copied to the required location.
$from = '/some/path/to/the/files'
$ext = 'custom_file_extension'
$dest = '/new/destination/for/the/files/with/given/extension'
cp 'find $from -name "*.$ext"' $dest
The problem here is that, when a file found with the proper extension and it is containing space cp cannot copy it properly.

You don't do that. You can't splat filenames with spaces that way.
You either get to use something from http://mywiki.wooledge.org/BashFAQ/001 to read the output from find line-by-line or into an array or you use find -exec to do the copy work.
Something like this:
from='/some/path/to/the/files'
ext='custom_file_extension'
dest='/new/destination/for/the/files/with/given/extension'
find "$from" -name "*.$ext" -exec cp -t "$dest" {} +
Using -exec command + here means that find will only execute as many cp commands as it needs based on command length limits. Using -exec command ; here would run one cp-per-file-found (but is more portable to older systems).
See comment from gniourf_gniourf about the use of -t in that cp command to make -exec command + work correctly.

Use -exec:
find "$from" -name "*.$ext" -exec cp {} "$dest" \;

you need to copy file one by one:
for file in "$from"/*."$ext"; do
cp "$file" "$dest"
done
I just use glob here, and it's enough and complete. I think find may introduce problem if the file name contains funny character.

The solution for this sort of problem is xargs -0 and the -print0 flag for find.
-print0 instructs find to print the results with a NUL character termination, instead of a newline, while -0 for xargs tells it expect input in that format.
Finally, the -J option for xargs allows one to put the arguments in the right place for a copy.
find "$from" -name "*.$ext" -print0 | xargs -0 -J % cp % "$dest"

It's better to use -exec argument of find command to do this:
find . -type f -name "*.ext" -exec cp {} ./destination_dir \;
I've checked this case with files containing spaces and it's work for me. Also don't forger to point out '-type f' if you want to find only files, not directories.

Linux: Redirecting output of a command to "find"

I have a list of file names as output of certain command.
I need to find each of these files in a given directory.
I tried following command:
ls -R /home/ABC/testDir/ | grep "\.java" | xargs find /home/ABC/someAnotherDir -iname
But it is giving me following error:
find: paths must precede expression: XYZ.java
What would be the right way to do it?

ls -R /home/ABC/testDir/ | grep -F .java |
while read f; do find . -iname "$(basename $f)"; done
You can also use ${f##*/} instead of basename. Or;
find /home/ABC/testDir -iname '*.java*' |
while read f; do find . -iname "${f##*/}"; done
Note that, undoubtedly, many people will object to parsing the output of ls or find without using a null byte as filename separater, claiming that whitespace in filenames will cause problems. Those people usually ignore newlines in filenames, and their objections can be safely ignored. (As long as you don't allow whitespace in your filenames, that is!)
A better option is:
find /home/ABC/testDir -iname '*.java' -exec find . -iname {}
The reason xargs doesn't work is that is that you cannot pass 2 arguments to -iname within find.

find /home/ABC/testDir -name "\.java"

How do I rename lots of files changing the same filename element for each in Linux?

I'm trying to rename a load of files (I count over 200) that either have the company name in the filename, or in the text contents. I basically need to change any references to "company" to "newcompany", maintaining capitalisation where applicable (ie "Company becomes Newcompany", "company" becomes "newcompany"). I need to do this recursively.
Because the name could occur pretty much anywhere I've not been able to find example code anywhere that meets my requirements. It could be any of these examples, or more:
company.jpg
company.php
company.Class.php
company.Company.php
companysomething.jpg
Hopefully you get the idea. I not only need to do this with filenames, but also the contents of text files, such as HTML and PHP scripts. I'm presuming this would be a second command, but I'm not entirely sure what.
I've searched the codebase and found nearly 2000 mentions of the company name in nearly 300 files, so I don't fancy doing it manually.
Please help! :)

bash has powerful looping and substitution capabilities:
for filename in `find /root/of/where/files/are -name *company*`; do
mv $filename ${filename/company/newcompany}
done
for filename in `find /root/of/where/files/are -name *Company*`; do
mv $filename ${filename/Company/Newcompany}
done

For the file and directory names, use for, find, mv and sed.
For each path (f) that has company in the name, rename it (mv) from f to the new name where company is replaced by newcompany.
for f in `find -name '*company*'` ; do mv "$f" "`echo $f | sed s/company/nemcompany/`" ; done
For the file contents, use find, xargs and sed.
For every file, change company by newcompany in its content, keeping original file with extension .backup.
find -type f -print0 | xargs -0 sed -i .bakup 's/company/newcompany/g'

I'd suggest you take a look at man rename an extremely powerful perl-utility for, well, renaming files.
Standard syntax is
rename 's/\.htm$/\.html/' *.htm
the clever part is that the tool accept any perl-regexp as a pattern for a filename to be changed.
you might want to run it with the -n switch which will make the tool to only report what it would have changed.
Can't figure out a nice way to keep the capitalization right now, but since you already can search through the filestructure, issue several rename with different capitalization until all files are changed.
To loop through all files below current folder and to search for a particular string, you can use
find . -type f -exec grep -n -i STRING_TO_SEARCH_FOR /dev/null {} \;
The output from that command can be directed to a file (after some filtering to just extract the file names of the files that need to be changed).
find . /type ... > files_to_operate_on
Then wrap that in a while read loop and do some perl-magic for inplace-replacement
while read file
do
perl -pi -e 's/stringtoreplace/replacementstring/g' $file
done < files_to_operate_on

There are few right ways to recursively process files. Here's one:
while IFS= read -d $'\0' -r file ; do
newfile="${file//Company/Newcompany}"
newfile="${newfile//company/newcompany}"
mv -f "$file" "$newfile"
done < <(find /basedir/ -iname '*company*' -print0)
This will work with all possible file names, not just ones without whitespace in them.
Presumes bash.
For changing the contents of files I would advise caution because a blind replacement within a file could break things if the file is not plain text. That said, sed was made for this sort of thing.
while IFS= read -d $'\0' -r file ; do
sed -i '' -e 's/Company/Newcompany/g;s/company/newcompany/g'"$file"
done < <(find /basedir/ -iname '*company*' -print0)
For this run I recommend adding some additional switches to find to limit the files it will process, perhaps
find /basedir/ \( -iname '*company*' -and \( -iname '*.txt' -or -ianem '*.html' \) \) -print0

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters and get it to spit them out to file but when a filename has a space in it it seems to appear on a new line.
This list will eventually be used as an input to tar to gzip all the zip files, script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets ;);
do
echo $line >> set1.txt
for line in $(find $line -type f -name \*.zip -mtime +2 ;);
do
echo \"$line\" >> set2.txt
done;
This works as expected until you get a space in a filename then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can get it to keep these filenames with spaces in in a single line with the whole lot wrapped in one set of quotes?
Thanks!

The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
echo "$line" >> set1.txt
while IFS= read -r -d '' file ; do
printf '"%s"\n' "$file" >> set2.txt
done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as #sarnold notes, this is not safe.

You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string