Find & Xargs copy and rename via bash script

Find & Xargs copy and rename via bash script - linux

I have a folder with images and I need to copy them elsewhere removing special chars in the progress.
Lets say I have this
Folder1/ImageÑ%1.jpg
Folder1/ImageÑ%1-70x70.jpg
Folder1/ImageÑ%2.jpg
Folder1/ImageÑ%2-70x70.jpg
Folder1/ImageÑ%3.jpg
Folder1/ImageÑ%3-100x100.jpg
Folder1/ImageÑ%4.jpg
Folder1/ImageÑ%4-100x100.jpg
And I want to copy just the files that doesn't have "-70x70" or "-100x100" in the name to Folder2 and be like this (without any special char):
Folder2/Image1.jpg
Folder2/Image2.jpg
Folder2/Image3.jpg
Folder2/Image4.jpg
I've managed to copy the files, but I have no clue how to rename the files (and remove the special chars) in the same step. I think it's using SED but I can't figure out how.
find Folder1 -type f -regextype posix-extended \( ! -regex '.+\-[0-9]{2,4}x[0-9]{2,4}\.jpg' \) -print0 | xargs -0 cp -p --target-directory=Folder2
Thanks!

To remove weird characters, I recommend you detox
Then :
find Folder1 \( ! -name 'Image*-70x70*' -a ! -name 'Image*-100x100*' \) |
xargs -i% cp -p % Folder2
edit
find explanations :
the ! character is a negation
-a option is a AND condition
the parenthesis are there to enclose the conditions
Edit2
To do it on the fly like you ask (for substituting weird characters):
find Folder1 \( ! -name 'Image*-70x70*' -a ! -name 'Image*-100x100*' \) \
-exec bash -c '
file=$(echo "$1" | perl -pe "s/\303\221%/_weird_/g")
cp -p "$1" "Folder2/$file"
' -- {} \;

This maybe works for you:
#!/bin/bash
DIR=/your/Folder1
DEST=/your/Folder2
for f in `ls -1 $DIR | grep .jpg | egrep -v ".+\-[0-9]{2,4}x[0-9]{2,4}\.jpg"`
do
f=$(basename $f)
F=$(echo "$f" | sed "s/[^A-Z|a-z|0-9|\.]//g;")
cp -p "$DIR/$f" "$DEST/$F"
done
You can use MV instead of CP if you needit.
EDITED
This will get only the files you have to copy:
ls -1 $DIR/*.jpg | egrep -v ".+\-[0-9]{2,4}x[0-9]{2,4}\.jpg
basename command get only the file name in order to replace weir characters and prepare to move
sed it's replacing all character if the aren't letters and numbers (and a dot for file extension)

Something like this perhaps?
for file in Folder1/*.jpg; do
case $file in
*-70x70.jpg | *-100x100.jpg) ;;
*) cp "$file" "Folder2/Image${file#Folder1/Image?}" ;;
esac
done
The construct ${variable#pattern} removes any match for pattern from the beginning of the value of $variable. The pattern Folder1/Image? matches the literal text Folder1/Image (we prepend back the static Image part then) and one arbitrary character.
If the offending character is not a single byte, try doubling the question mark for a start; that will match two arbitrary characters (or bytes, if your locale etc are configured that way). If you have a combining Unicode sequence, it takes much more than that, though -- a hex dump of the actual bytes used to represent the file name would be helpful for diagnosing this. Anyway, you can simply trim from the other end instead, although that gets a bit more involved;
*) base=${file#Folder1/}
prefix=${base%%[0-9]*}
cp "$file" "Folder2/Image${base#$prefix}" ;;
The variant ${variable%pattern} trims from the end instead of the beginning, and doubling the # or % causes the shell to trim the longest possible match instead of the shortest. (I believe the longest-possible matching is a Bash extension?)

Related

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The files and folders can contain white spaces and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a textfile named ids.txt. This file only contains the IDs but not the whole filename as a string.
What I want to achieve:
I want to read out ids.txt line by line.
For every line in ids.txt I want to do a find search and copy cp the result to destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ;
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Not using -exec also gives no (search)results.
I was thinking that -name "$ids" is the root cause here. My files contain the ID + a String so I should search for names containing the ID plus a variable string (star)
As argument for -name I also tried "$ids *" "$ids"" *" and so on with no luck.
Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?
Do you have any solution for me to automate this process in a bash script to read out ids.txt file, search the filenames and copy them over to specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Going on with the solution from tripleee:
#!/bin/bash
grep . $1 | while read -r id; do
echo "Der Suchbegriff lautet:"$id; echo;
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines the -name "$id*" will be -name * which in turn finds all files and copies all files.
Trying to prevent empty line to be read does not seem to work. They should be filtered by the expression grep . $1 |. What am I doing wrong?

If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n—name\n&*/' ids.txt |
xargs -I {} find -false {} -exec cp {} /home/alex/testzone/output +
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still the inherent problem with using xargs find like this is that xargs could split the list between -o and -name or between -name and the actual file name pattern if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$#"); find -false ${arr[#]/-o_-name_/-o -name } -exec cp {} "$0"' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array where each file name and its flags is a single item, and then replace the flags into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty to rename the variable, since read will only read one argument at a time, so the variable name should be singular.)
while read -r id; do
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt

Please try the following bash script copier.sh
#!/bin/bash
IFS=$'\n' # make newlines the only separator
set -f # disable globbing
file="files.txt" # name of file containing filenames
finish="finish" # destination directory
while read -r n ; do (
du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep $n | sed 's/ *$//g' | xargs -I '{}' cp '{}' $finish
);
done < $file
which copies recursively all the files named in files.txt from . and it's subfiles to ./finish
This new version works even if there are spaces in the directory names or file names.

How can I search for files in directories that contain spaces in names, using "find"?

How can I search for files in directories that contain spaces in names, using find?
i use script
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
do
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
but the files and directories that contain spaces in names are not processed?

This will grab all the files that have spaces in them
$ls
more space nospace stillnospace this is space
$find -type f -name "* *"
./this is space
./more space

I don't know how to achieve you goal. But given your actual solution, the problem is not really with find but with the for loops since "spaces" are taken as delimiter between items.
find has a useful option for those cases:
from man find:
-print0
True; print the full file name on the standard output, followed by a null character
(instead of the newline character that -print uses). This allows file names
that contain newlines or other types of white space to be correctly interpreted
by programs that process the find output. This option corresponds to the -0
option of xargs.
As the man saids, this will match with the -0 option of xargs. Several other standard tools have the equivalent option. You probably have to rewrite your complex pipeline around those tools in order to process cleanly file names containing spaces.
In addition, see bash "for in" looping on null delimited string variable to learn how to use for loop with 0-terminated arguments.

Do it like this
find . -type f -name "* *"
Instead of . you can specify your path, where you want to find files with your criteria

Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname instead. Also, later on, you use sed to get rid of the extension. You can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"` ; do
path=$(dirname "$i")
for j in `ls $path | grep POD` ; do
file=$(basename "$j" .txt)
# Do what ever you want with the file
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the file name in an array instead and then remove duplicates with sort and uniq.

Use while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
while IFS= read -rd '' j; do
find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
done <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)

Never used for i in $(find...) or similar as it'll fail for file names containing white space as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
i="${i%%[0-9A-Za-z]*\.txt}"
for j in "$i"/*sh*
do
j="${j%%\.txt}"
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
The above will still fail for file names that contains newlines. If you have that situation and can't fix the file names then look into the -print0 option for find, and piping it to xargs -0.

How to use sed to change file extensions?

I have to do a sed line (also using pipes in Linux) to change a file extension, so I can do some kind of mv *.1stextension *.2ndextension like mv *.txt *.c. The thing is that I can't use batch or a for loop, so I have to do it all with pipes and sed command.

you can use string manipulation
filename="file.ext1"
mv "${filename}" "${filename/%ext1/ext2}"
Or if your system support, you can use rename.
Update
you can also do something like this
mv ${filename}{ext1,ext2}
which is called brace expansion

sed is for manipulating the contents of files, not the filename itself. My suggestion:
rename 's/\.ext/\.newext/' ./*.ext
Or, there's this existing question which should help.

This may work:
find . -name "*.txt" |
sed -e 's|./||g' |
awk '{print "mv",$1, $1"c"}' |
sed -e "s|\.txtc|\.c|g" > table;
chmod u+x table;
./table
I don't know why you can't use a loop. It makes life much easier :
newex="c"; # Give your new extension
for file in *.*; # You can replace with *.txt instead of *.*
do
ex="${file##*.}"; # This retrieves the file extension
ne=$(echo "$file" | sed -e "s|$ex|$newex|g"); # Replaces current with the new one
echo "$ex";echo "$ne";
mv "$file" "$ne";
done

You can use find to find all of the files and then pipe that into a while read loop:
$ find . -name "*.ext1" -print0 | while read -d $'\0' file
do
mv $file "${file%.*}.ext2"
done
The ${file%.*} is the small right pattern filter. The % marks the pattern to remove from the right side (matching the smallest glob pattern possible), The .* is the pattern (the last . followed by the characters after the .).
The -print0 will separate file names with the NUL character instead of \n. The -d $'\0' will read in file names separated by the NUL character. This way, file names with spaces, tabs, \n, or other wacky characters will be processed correctly.

You may try following options
Option 1 find along with rename
find . -type f -name "*.ext1" -exec rename -f 's/\.ext1$/ext2/' {} \;
Option 2 find along with mv
find . -type f -name "*.ext1" -exec sh -c 'mv -f $0 ${0%.ext1}.ext2' {} \;
Note: It is observed that rename doesn't work for many terminals

Another solution only with sed and sh
printf "%s\n" *.ext1 |
sed "s/'/'\\\\''/g"';s/\(.*\)'ext1'/mv '\''\1'ext1\'' '\''\1'ext2\''/g' |
sh
for better performance: only one process created
perl -le '($e,$f)=#ARGV;map{$o=$_;s/$e$/$f/;rename$o,$_}<*.$e>' ext2 ext3

well this should work
mv $file $(echo $file | sed -E -e 's/.xml.bak.*/.xml/g' | sed -E -e 's/.\///g')
output
abc.xml.bak.foobar -> abc.xml

Remove files not containing a specific string

I want to find the files not containing a specific string (in a directory and its sub-directories) and remove those files. How I can do this?

The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This will firstly use find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument.
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then give a -E argument. It's a good habit to use single quotes rather than double quotes as this protects you against any shell magic being applied to the string (such as variable substitution)
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with the -exec argument is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does. This is much faster if you have lots of files. Also don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass it to xargs as this knows the maximum command-line limits and will call rm multiple times each time with as many arguments as it can.
Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories). Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).
Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.

GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find solution if portability is needed. This is slightly faster.

EDIT: This is how you SHOULD NOT do this! Reason is given here. Thanks to #ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: grep pattern will match against full file path from current directory (see find . -type f output)

One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "user_id" {} \; -delete
but then you don't get the nice preview option.

To remove files not containing a specific string:
Bash:
To use them, enable the extglob shell option as follows:
shopt -s extglob
And just remove all files that don't have the string "fix":
rm !(*fix*)
If you want to don't delete all the files that don't have the names "fix" and "class":
rm !(*fix*|*class*)
Zsh:
To use them, enable the extended glob zsh shell option as follows:
setopt extended_glob
Remove all files that don't have the string, in this example "fix":
rm -- ^*fix*
If you want to don't delete all the files that don't have the names "fix" and "class":
rm -- ^(*fix*|*class*)
It's possible to use it for extensions, you only need to change the regex: (.zip) , (.doc), etc.
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html

I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.

This worked for me, you can remove the -f if you're okay with deleting directories.
myString="keepThis"
for x in `find ./`
do if [[ -f $x && ! $x =~ $myString ]]
then rm $x
fi
done

Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' has special characters.
find -type f ! -name "*my string*" -exec rm {} \; -print

Strip leading dot from filenames bash script

I have some files in a bunch of directories that have a leading dot and thus are hidden. I would like to revert that and strip the leading dot.
I was unsuccessful with the following:
for file in `find files/ -type f`;
do
base=`basename $file`
if [ `$base | cut -c1-2` = "." ];
then newname=`$base | cut -c2-`;
dirs=`dirname $file`;
echo $dirs/$newname;
fi
done
Which fails on the condition statement:
[: =: unary operator expected
Furthermore, some files have a space in them and file returns them split.
Any help would be appreciated.

The easiest way to delete something from the start of a variable is to use ${var#pattern}.
$ FILENAME=.bashrc; echo "${FILENAME#.}"
bashrc
$ FILENAME=/etc/fstab; echo "${FILENAME#.}"
/etc/fstab
See the bash man page:
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern just as in pathname expansion. If the pattern matches the beginning of the value of parameter, then the result of the expansion is
the expanded value of parameter with the shortest matching pattern (the ‘‘#’’ case) or the longest matching pattern (the ‘‘##’’ case) deleted.
By the way, with a more selective find command you don't need to do all the hard work. You can have find only match files with a leading dot:
find files/ -type f -name '.*'
Throwing that all together, then:
find files/ -type f -name '.*' -printf '%P\0' |
while read -d $'\0' path; do
dir=$(dirname "$path")
file=$(basename "$path")
mv "$dir/$file" "$dir/${file#.}"
done
Additional notes:
To handle file names with spaces properly you need to quote variable names when you reference them. Write "$file" instead of just $file.
For extra robustness the -printf '\0' and read -d $'\0' use NUL characters as delimiters so even file names with embedded newlines '\n' will work.

find files/ -name '.*' -printf '%f\n'|while read f; do
mv "files/$f" "files/${f#.}"
done

This script works with any file you
can throw at it, even if they have
spaces, newlines or other nefarious
characters in their name.
It works no matter how many subdirectories deep the hidden file is
Unlike other answers thus far, you
don't have to change the rest of the script when you change the path
given to find
*Note: I included an echo so that you can test it like a dry-run. Remove the single echo if you are satisfied with the results.
find . -name '.*' -exec sh -c 'for arg; do d="${arg%/*}"; f=${arg:${#d}}; echo mv "$arg" "$d/${f#.}"; done' _ {} +

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string