Passing filename as variable from find's exec into a second exec command - linux

From reading this stackoverflow answer I was able to remove the file extension from the files using find:
find . -name "S4*" -execdir basename {} .fastq.gz ';'
returned:
S9_S34_R1_001
S9_S34_R2_001
I'm writing a batch script where I want to extract the filenames with the above prefix and pass them as arguments into a program. At the moment I'm doing this with a loop, but I'm wondering whether it can be achieved with find alone.
for i in $(ls | grep 'S9_S34*' | cut -d '.' -f 1); do echo "$i"_trim.log "$i"_R1_001.fastq.gz "$i"_R2_001.fastq.gz; done >> trim_script.sh
Is it possible to do something as follows:
find . -name "S4*" -execdir basename {} .fastq.gz ';' | echo {}_trim.log {}_R1_001.fastq.gz {}_R2_001.fastq.gz {}\ ; >> trim_script.sh

You don't need basename at all, or -exec, if all you're doing is generating a series of strings that contain your files' names within them; the -printf action included in GNU find can do all of that for you, as its %P directive expands to the name of the found file with the starting point stripped off (the bare filename for matches directly under .):
find . -name "S4*" \
-printf '%P_trim.log %P_R1_001.fastq.gz %P_R2_001.fastq.gz %P\n' \
>trim_script.sh
That said, be sure you only do this if you trust your filenames. If you're truly running the result as a script, there are serious security concerns if someone could create an S4$(rm -rf ~).txt file, or something with a similarly malicious name.
What if you don't trust your filenames, or don't have the GNU version of find? Then consider having find pass them into a shell (like bash or ksh) whose printf supports the %q extension, to generate a safely escaped version of those names (note that you should run the generated script with the same interpreter you used for this escaping):
find . -name "S4*" -exec bash -c '
for file do # iterates over "$#", so processes each file in turn
file=${file##*/} # get the basename
printf "%q_trim.log %q_R1_001.fastq.gz %q_R2_001.fastq.gz %q\n" \
"$file" "$file" "$file" "$file"
done
' _ {} + >trim_script.sh
Using -exec ... {} + invokes the smallest possible number of subprocesses -- not one per file found, but instead one per batch of filenames (using the largest possible batch that can fit on a command line).
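If you want to see that batching in action, a quick way is to have each invocation report how many names it received; this is just an illustrative sketch (the sh -c wrapper and the echo exist only for the demonstration):
find . -name "S4*" -exec sh -c 'echo "one shell started for $# file(s)"' _ {} \;
find . -name "S4*" -exec sh -c 'echo "one shell started for $# file(s)"' _ {} +
The first form prints one line per file; the second prints one line per batch.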


Passing linux command as a command line argument to shell script

I am running the following command:
"find . -type f -regextype posix-extended -regex './ctrf.|./rbc.' -exec basename {} ;"
and executing it from a shell script. I am storing the command in a variable in the shell script, like this:
Find_Command=$1
and executing it with:
Files="$(${Find_Command})"
This is not working.
Best Practice: Accept An Array, Not A String
First, your shell script should take the command to run as a series of separate arguments, not a single argument.
#!/usr/bin/env bash
readarray -d '' Files < <("$@")
echo "Found ${#Files[@]} files" >&2
printf ' - %q\n' "${Files[@]}"
called as:
./yourscript find . -type f -regextype posix-extended -regex './ctrf.*|./rbc.*' -printf '%f\0'
Note that there's no reason to use the external basename command: find -printf can directly print you only the filename.
Fallback: Parsing A String To An Array Correctly
If you must accept a string, you can use the answers in Reading quoted/escaped arguments correctly from a string to convert that string to an array safely.
Compromising complete shell compatibility to avoid needing nonstandard tools, we can use xargs:
#!/usr/bin/env bash
readarray -d '' Command_Arr < <(xargs printf '%s\0' <<<"$1")
readarray -d '' Files < <("${Command_Arr[@]}")
echo "Found ${#Files[@]} files" >&2
printf ' - %q\n' "${Files[@]}"
...with your script called as:
./yourscript $'find . -type f -regextype posix-extended -regex \'./ctrf.*|./rbc.*\' -printf \'%f\\0\''
If you want to run a command specified in a variable and save the output in another variable, you can use the following:
command="find something"
output=$($command)
Or, if you want to store the output in an array:
typeset -a output=( $($command) )
However, storing filenames in variables and then trying to access files via those names is a bad idea: filenames can contain any character except NUL, so there is no safe delimiter you can use to separate them (see https://mywiki.wooledge.org/BashPitfalls).
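For illustration, here is a common pattern (a sketch, assuming GNU find and bash) that sidesteps the delimiter problem entirely by never putting the whole list into one variable; the filenames are streamed NUL-delimited and handled one at a time:
find . -type f -regextype posix-extended -regex './ctrf.*|./rbc.*' -print0 |
while IFS= read -r -d '' f; do
    printf 'found: %q\n' "$f"    # replace with whatever per-file processing you need
done
The regex is the one from the question; only the printf is a placeholder.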
I'm not sure what you're trying to accomplish, but your find command contains an error: the -exec option must be terminated with \; (or a quoted ;) to indicate the end of the -exec parameters. Aside from that, this looks like an instance of the XY problem; see https://xyproblem.info/
If you want to get the basename of regular files with the extension .ctrf or .rbc, use the bash script below.
for x in **/*.+(ctrf|rbc); do basename "$x"; done
Or zsh script
basename **/*.(ctrf|rbc)(#q.)
Make sure you have enabled the 'extended glob' option in your shell.
To enable it in bash, run the following command (the recursive ** pattern additionally needs globstar):
shopt -s extglob globstar
And for zsh
setopt extendedglob
You should use an array instead of a string for Find_Command:
#!/usr/bin/env bash
Find_Command=(find . -type f -regextype posix-extended -regex '(./ctrf.*|./rbc.*)' -exec basename {} \;)
Files=($("${Find_Command[@]}"))
The second statement assumes your file names contain no special characters (such as spaces).
Use eval:
Files=$(eval "${Find_Command}")
Be mindful of keeping the parameter sanitized and secure.

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The files and folders can contain white spaces and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a textfile named ids.txt. This file only contains the IDs but not the whole filename as a string.
What I want to achieve:
I want to read out ids.txt line by line.
For every line in ids.txt I want to do a find search and copy cp the result to destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ; done
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Leaving out -exec gives no search results either.
I was thinking that -name "$ids" is the root cause here. My files contain the ID plus a string, so I should search for names containing the ID followed by a wildcard (*).
As the argument for -name I also tried "$ids*", "$ids"*, and so on, with no luck.
Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?
Do you have a solution to automate this process in a bash script that reads the ids.txt file, searches for the filenames, and copies the matches to the specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Going on with the solution from tripleee:
#!/bin/bash
grep . "$1" | while read -r id; do
    echo "The search term is: $id"; echo
    find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines, -name "$id*" becomes -name "*", which in turn matches every file and copies all of them.
Trying to prevent empty lines from being read does not seem to work; they should already be filtered out by the grep . "$1" | part. What am I doing wrong?
If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n-name\n&*/' ids.txt |
xargs sh -c 'find /home/alex/testzone \( -false "$@" \) -exec cp {} /home/alex/testzone/output \;' _
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still the inherent problem with using xargs find like this is that xargs could split the list between -o and -name or between -name and the actual file name pattern if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$@"); find \( -false ${arr[@]/-o_-name_/-o -name } \) -exec cp {} "$0" \;' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array in which each file name pattern and its flags form a single item, and then split the flags back out into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty to rename the variable, since read will only read one argument at a time, so the variable name should be singular.)
while read -r id; do
    find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt
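To end up with the reusable script described in the question (ids file, search directory and output directory as arguments), a minimal sketch along the same lines could look like this; the empty-line guard also covers the problem mentioned in EDIT II:
#!/bin/bash
# usage: my-id-search.sh /path/to/ids.txt /path/to/searchdir /path/to/outputdir
ids_file=$1
search_dir=$2
out_dir=$3

while read -r id; do
    [ -n "$id" ] || continue      # skip empty lines so -name never degenerates to a bare "*"
    find "$search_dir" -name "$id*" -exec cp {} "$out_dir" \;
done < "$ids_file"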
Please try the following bash script copier.sh
#!/bin/bash
IFS=$'\n'          # make newlines the only separator
set -f             # disable globbing
file="files.txt"   # name of the file containing the filenames
finish="finish"    # destination directory
while read -r n; do
    du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep "$n" | sed 's/ *$//g' | xargs -I '{}' cp '{}' "$finish"
done < "$file"
which recursively copies all the files named in files.txt from . and its subdirectories to ./finish.
This new version works even if there are spaces in the directory names or file names.

Only list files; iconv and directories?

I want to convert the encoding of some CSV files with iconv. It has to be a script, so I am working with a do ... done loop. The script lists every item in a specific directory and converts it to another encoding (UTF-8).
Currently, my script lists EVERY item, including directories... So here are my questions:
Does iconv have a problem with directories, or does it ignore them?
And if there is a problem, how can I list/search only for files?
I tried the suggestions from How to list only files in Bash?, but find puts a ./ at the beginning of every item, and that's kinda annoying (and my program doesn't like it either).
Another possibility is ls -p | grep -v / but this would also affect files with / in the name, wouldn't it?
I hope you can help me. Thank you.
Here is the code:
for item in $(ls directory/); do
    FileName=$item
    iconv -f "windows-1252" -t "UTF-8" "$FileName" -o "$FileName"
done
Yeah, I know, the input and output file cannot be the same ^^
Use find directly:
find . -maxdepth 1 -type f -exec bash -c 'iconv -f "windows-1252" -t "UTF-8" "$1" > "$1.converted" && mv "$1.converted" "$1"' -- {} \;
find . -maxdepth 1 -type f finds all files in the working directory
-exec ... executes a command on each such file (including correct handling of e.g. spaces or newlines in the filename)
bash -c '...' executes the command in '...' in a subshell (easier to do the subsequent steps, involving multiple expansions of the filename, this way)
-- terminates option processing, and treats anything after the -- as arguments to the call.
{} is replaced by find with the file name(s) found
$1 in the bash command is replaced with the first (and only) argument, which is the {} replaced by the filename (see above)
\; tells find where the -exec'ed command ends.
Building upon the existing question that you referenced: why don't you just remove the first two characters, i.e. the leading ./?
find . -maxdepth 1 -type f | cut -c 3-
Edit: I agree with @DevSolar about the whitespace problem in the for loop. While I think his solution is better for this problem, I just want to give an alternative way to get around the whitespace issue in the for loop.
OLD_IFS=$IFS
IFS=$'\n'
for item in $(find . -maxdepth 1 -type f | cut -c 3-); do
    FileName=$item
    iconv -f "windows-1252" -t "UTF-8" "$FileName" -o "$FileName"
done
IFS=$OLD_IFS
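For completeness, a NUL-delimited read loop (a sketch, assuming GNU find and using the same temporary-file trick as the answer above) avoids the IFS juggling and also survives newlines in filenames:
while IFS= read -r -d '' FileName; do
    iconv -f "windows-1252" -t "UTF-8" "$FileName" > "$FileName.converted" && mv "$FileName.converted" "$FileName"
done < <(find . -maxdepth 1 -type f -print0)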

Remove files not containing a specific string

I want to find the files not containing a specific string (in a directory and its sub-directories) and remove those files. How I can do this?
The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This will firstly use find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then give a -E argument. It's a good habit to use single quotes rather than double quotes as this protects you against any shell magic being applied to the string (such as variable substitution)
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with the -exec argument is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does; the batched approach is much faster if you have lots of files. Also, don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass it to xargs as this knows the maximum command-line limits and will call rm multiple times each time with as many arguments as it can.
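If you're curious what those limits are on your system, GNU xargs can report them (output varies from machine to machine; the </dev/null just stops it waiting for input):
xargs --show-limits </dev/null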
Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).
Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.
GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find-based solution if portability is needed; this grep -r approach is slightly faster.
EDIT: This is how you SHOULD NOT do it! The reason is given here. Thanks to @ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: the grep pattern is matched against the full file path relative to the current directory (see the output of find . -type f).
One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "user_id" {} \; -delete
but then you don't get the nice preview option.
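If you prefer -delete but still want a dry run first, one simple approach is to run the same expression twice, first with -print and only then with -delete:
find . -type f '!' -exec grep -q "my string" {} \; -print
find . -type f '!' -exec grep -q "my string" {} \; -delete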
To remove files not containing a specific string:
Bash:
To use extended globbing patterns, enable the extglob shell option as follows:
shopt -s extglob
Then just remove all files whose names don't contain the string "fix":
rm !(*fix*)
If you want to keep the files whose names contain "fix" or "class" and delete the rest:
rm !(*fix*|*class*)
Zsh:
To use them, enable the extended glob zsh shell option as follows:
setopt extended_glob
Remove all files whose names don't contain the string, in this example "fix":
rm -- ^*fix*
If you want to keep the files whose names contain "fix" or "class" and delete the rest:
rm -- ^(*fix*|*class*)
It's possible to use this for extensions too; you only need to change the pattern: *.zip, *.doc, etc.
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html
I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
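If newlines in filenames are a concern, a variant of the same idea (again GNU grep and xargs) is to have grep emit NUL-terminated names with -Z and let xargs read them with -0; keep the echo until you have verified the list it produces:
find yourdir -type f -exec grep -F -L -Z 'yourstring' {} + | xargs -0 echo rm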
This worked for me; you can remove the -f test if you're okay with it also picking up directories (though rm without -r will still refuse to remove them).
myString="keepThis"
for x in `find ./`
do if [[ -f $x && ! $x =~ $myString ]]
then rm $x
fi
done
Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' has special characters.
find -type f ! -name "*my string*" -exec rm {} \; -print

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and spits them out to a file, but when a filename has a space in it the name gets split across multiple lines.
This list will eventually be used as input to tar to gzip all the zip files. The script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets); do
    echo $line >> set1.txt
    for line in $(find $line -type f -name \*.zip -mtime +2); do
        echo \"$line\" >> set2.txt
    done
done
This works as expected until you get a space in a filename, at which point set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can keep each filename that contains spaces on a single line, with the whole name wrapped in one set of quotes?
Thanks!
The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
    echo "$line" >> set1.txt
    while IFS= read -r -d '' file ; do
        printf '"%s"\n' "$file" >> set2.txt
    done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
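A sketch of that fallback, spelled out (the temporary file name is only an example; note that without -print0 this is only safe for names without newlines):
find /home -type d -name assets > /tmp/asset_dirs.txt
while IFS= read -r line; do
    echo "$line" >> set1.txt
done < /tmp/asset_dirs.txt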
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.
You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)
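One way to sidestep the multiple-invocation problem entirely is to skip xargs and let GNU tar read the NUL-delimited list itself via --null and --files-from (-T); a sketch, with the archive name being only an example:
find /home -type f -path '*/assets/*.zip' -mtime +2 -print0 |
tar --null --files-from=- -czf assets.tar.gz
Because there is only one tar invocation, nothing gets overwritten, and the compression happens in the same step.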
