Problems with Grep Command in bash script - linux

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it is something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script....
However no one seems to have found a solution, so basically I'm asking you guys if you have ever come across this, and any idea/tips/solutions to this problem...
Regards
Paul

The `problem' is the standard I/O library. When it is writing to a terminal
it is unbuffered, but if it is writing to a pipe then it sets up buffering.
try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`

Are there any directories with spaces in their names in $FILE_LOCTN_LIST? Because if they are, those spaces will need escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0

A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT="" md5sum * |
sort |
while read md5sum filename;
do
[[ $CURRENT == $md5sum ]] && echo $filename is duplicate;
CURRENT=$md5sum;
done

you tagged linux, some i assume you have tools like GNU find,md5sum,uniq, sort etc. here's a simple example to find duplicate files
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; |sort -n | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1

Related

Find and delete files, but leave X newest [duplicate]

Is there a simple way, in a pretty standard UNIX environment with bash, to run a command to delete all but the most recent X files from a directory?
To give a bit more of a concrete example, imagine some cron job writing out a file (say, a log file or a tar-ed up backup) to a directory every hour. I'd like a way to have another cron job running which would remove the oldest files in that directory until there are less than, say, 5.
And just to be clear, there's only one file present, it should never be deleted.
The problems with the existing answers:
inability to handle filenames with embedded spaces or newlines.
in the case of solutions that invoke rm directly on an unquoted command substitution (rm `...`), there's an added risk of unintended globbing.
inability to distinguish between files and directories (i.e., if directories happened to be among the 5 most recently modified filesystem items, you'd effectively retain fewer than 5 files, and applying rm to directories will fail).
wnoise's answer addresses these issues, but the solution is GNU-specific (and quite complex).
Here's a pragmatic, POSIX-compliant solution that comes with only one caveat: it cannot handle filenames with embedded newlines - but I don't consider that a real-world concern for most people.
For the record, here's the explanation for why it's generally not a good idea to parse ls output: http://mywiki.wooledge.org/ParsingLs
ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {}
Note: This command operates in the current directory; to target a directory explicitly, use a subshell ((...)) with cd:
(cd /path/to && ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {})
The same applies analogously to the commands below.
The above is inefficient, because xargs has to invoke rm separately for each filename.
However, your platform's specific xargs implementation may allow you to solve this problem:
A solution that works with GNU xargs is to use -d '\n', which makes xargs consider each input line a separate argument, yet passes as many arguments as will fit on a command line at once:
ls -tp | grep -v '/$' | tail -n +6 | xargs -d '\n' -r rm --
Note: Option -r (--no-run-if-empty) ensures that rm is not invoked if there's no input.
A solution that works with both GNU xargs and BSD xargs (including on macOS) - though technically still not POSIX-compliant - is to use -0 to handle NUL-separated input, after first translating newlines to NUL (0x0) chars., which also passes (typically) all filenames at once:
ls -tp | grep -v '/$' | tail -n +6 | tr '\n' '\0' | xargs -0 rm --
Explanation:
ls -tp prints the names of filesystem items sorted by how recently they were modified , in descending order (most recently modified items first) (-t), with directories printed with a trailing / to mark them as such (-p).
Note: It is the fact that ls -tp always outputs file / directory names only, not full paths, that necessitates the subshell approach mentioned above for targeting a directory other than the current one ((cd /path/to && ls -tp ...)).
grep -v '/$' then weeds out directories from the resulting listing, by omitting (-v) lines that have a trailing / (/$).
Caveat: Since a symlink that points to a directory is technically not itself a directory, such symlinks will not be excluded.
tail -n +6 skips the first 5 entries in the listing, in effect returning all but the 5 most recently modified files, if any.
Note that in order to exclude N files, N+1 must be passed to tail -n +.
xargs -I {} rm -- {} (and its variations) then invokes on rm on all these files; if there are no matches at all, xargs won't do anything.
xargs -I {} rm -- {} defines placeholder {} that represents each input line as a whole, so rm is then invoked once for each input line, but with filenames with embedded spaces handled correctly.
-- in all cases ensures that any filenames that happen to start with - aren't mistaken for options by rm.
A variation on the original problem, in case the matching files need to be processed individually or collected in a shell array:
# One by one, in a shell loop (POSIX-compliant):
ls -tp | grep -v '/$' | tail -n +6 | while IFS= read -r f; do echo "$f"; done
# One by one, but using a Bash process substitution (<(...),
# so that the variables inside the `while` loop remain in scope:
while IFS= read -r f; do echo "$f"; done < <(ls -tp | grep -v '/$' | tail -n +6)
# Collecting the matches in a Bash *array*:
IFS=$'\n' read -d '' -ra files < <(ls -tp | grep -v '/$' | tail -n +6)
printf '%s\n' "${files[#]}" # print array elements
Remove all but 5 (or whatever number) of the most recent files in a directory.
rm `ls -t | awk 'NR>5'`
(ls -t|head -n 5;ls)|sort|uniq -u|xargs rm
This version supports names with spaces:
(ls -t|head -n 5;ls)|sort|uniq -u|sed -e 's,.*,"&",g'|xargs rm
Simpler variant of thelsdj's answer:
ls -tr | head -n -5 | xargs --no-run-if-empty rm
ls -tr displays all the files, oldest first (-t newest first, -r reverse).
head -n -5 displays all but the 5 last lines (ie the 5 newest files).
xargs rm calls rm for each selected file.
find . -maxdepth 1 -type f -printf '%T# %p\0' | sort -r -z -n | awk 'BEGIN { RS="\0"; ORS="\0"; FS="" } NR > 5 { sub("^[0-9]*(.[0-9]*)? ", ""); print }' | xargs -0 rm -f
Requires GNU find for -printf, and GNU sort for -z, and GNU awk for "\0", and GNU xargs for -0, but handles files with embedded newlines or spaces.
All these answers fail when there are directories in the current directory. Here's something that works:
find . -maxdepth 1 -type f | xargs -x ls -t | awk 'NR>5' | xargs -L1 rm
This:
works when there are directories in the current directory
tries to remove each file even if the previous one couldn't be removed (due to permissions, etc.)
fails safe when the number of files in the current directory is excessive and xargs would normally screw you over (the -x)
doesn't cater for spaces in filenames (perhaps you're using the wrong OS?)
ls -tQ | tail -n+4 | xargs rm
List filenames by modification time, quoting each filename. Exclude first 3 (3 most recent). Remove remaining.
EDIT after helpful comment from mklement0 (thanks!): corrected -n+3 argument, and note this will not work as expected if filenames contain newlines and/or the directory contains subdirectories.
Ignoring newlines is ignoring security and good coding. wnoise had the only good answer. Here is a variation on his that puts the filenames in an array $x
while IFS= read -rd ''; do
x+=("${REPLY#* }");
done < <(find . -maxdepth 1 -printf '%T# %p\0' | sort -r -z -n )
For Linux (GNU tools), an efficient & robust way to keep the n newest files in the current directory while removing the rest:
n=5
find . -maxdepth 1 -type f -printf '%T# %p\0' |
sort -z -nrt ' ' -k1,1 |
sed -z -e "1,${n}d" -e 's/[^ ]* //' |
xargs -0r rm -f
For BSD, find doesn't have the -printf predicate, stat can't output NULL bytes, and sed + awk can't handle NULL-delimited records.
Here's a solution that doesn't support newlines in paths but that safeguards against them by filtering them out:
#!/bin/bash
n=5
find . -maxdepth 1 -type f ! -path $'*\n*' -exec stat -f '%.9Fm %N' {} + |
sort -nrt ' ' -k1,1 |
awk -v n="$n" -F'^[^ ]* ' 'NR > n {printf "%s%c", $2, 0}' |
xargs -0 rm -f
note: I'm using bash because of the $'\n' notation. For sh you can define a variable containing a literal newline and use it instead.
Solution for UNIX & Linux (inspired from AIX/HP-UX/SunOS/BSD/Linux ls -b):
Some platforms don't provide find -printf, nor stat, nor support NUL-delimited records with stat/sort/awk/sed/xargs. That's why using perl is probably the most portable way to tackle the problem, because it is available by default in almost every OS.
I could have written the whole thing in perl but I didn't. I only use it for substituting stat and for encoding-decoding-escaping the filenames. The core logic is the same as the previous solutions and is implemented with POSIX tools.
note: perl's default stat has a resolution of a second, but starting from perl-5.8.9 you can get sub-second resolution with the stat function of the module Time::HiRes (when both the OS and the filesystem support it). That's what I'm using here; if your perl doesn't provide it then you can remove the ‑MTime::HiRes=stat from the command line.
n=5
find . '(' -name '.' -o -prune ')' -type f -exec \
perl -MTime::HiRes=stat -le '
foreach (#ARGV) {
#st = stat($_);
if ( #st > 0 ) {
s/([\\\n])/sprintf( "\\%03o", ord($1) )/ge;
print sprintf( "%.9f %s", $st[9], $_ );
}
else { print STDERR "stat: $_: $!"; }
}
' {} + |
sort -nrt ' ' -k1,1 |
sed -e "1,${n}d" -e 's/[^ ]* //' |
perl -l -ne '
s/\\([0-7]{3})/chr(oct($1))/ge;
s/(["\n])/"\\$1"/g;
print "\"$_\"";
' |
xargs -E '' sh -c '[ "$#" -gt 0 ] && rm -f "$#"' sh
Explanations:
For each file found, the first perl gets the modification time and outputs it along the encoded filename (each newline and backslash characters are replaced with the literals \012 and \134 respectively).
Now each time filename is guaranteed to be single-line, so POSIX sort and sed can safely work with this stream.
The second perl decodes the filenames and escapes them for POSIX xargs.
Lastly, xargs calls rm for deleting the files. The sh command is a trick that prevents xargs from running rm when there's no files to delete.
I realize this is an old thread, but maybe someone will benefit from this. This command will find files in the current directory :
for F in $(find . -maxdepth 1 -type f -name "*_srv_logs_*.tar.gz" -printf '%T# %p\n' | sort -r -z -n | tail -n+5 | awk '{ print $2; }'); do rm $F; done
This is a little more robust than some of the previous answers as it allows to limit your search domain to files matching expressions. First, find files matching whatever conditions you want. Print those files with the timestamps next to them.
find . -maxdepth 1 -type f -name "*_srv_logs_*.tar.gz" -printf '%T# %p\n'
Next, sort them by the timestamps:
sort -r -z -n
Then, knock off the 4 most recent files from the list:
tail -n+5
Grab the 2nd column (the filename, not the timestamp):
awk '{ print $2; }'
And then wrap that whole thing up into a for statement:
for F in $(); do rm $F; done
This may be a more verbose command, but I had much better luck being able to target conditional files and execute more complex commands against them.
If the filenames don't have spaces, this will work:
ls -C1 -t| awk 'NR>5'|xargs rm
If the filenames do have spaces, something like
ls -C1 -t | awk 'NR>5' | sed -e "s/^/rm '/" -e "s/$/'/" | sh
Basic logic:
get a listing of the files in time order, one column
get all but the first 5 (n=5 for this example)
first version: send those to rm
second version: gen a script that will remove them properly
With zsh
Assuming you don't care about present directories and you will not have more than 999 files (choose a bigger number if you want, or create a while loop).
[ 6 -le `ls *(.)|wc -l` ] && rm *(.om[6,999])
In *(.om[6,999]), the . means files, the o means sort order up, the m means by date of modification (put a for access time or c for inode change), the [6,999] chooses a range of file, so doesn't rm the 5 first.
Adaptation of #mklement0's excellent answer with some parameters and without needing to navigate to the folder containing the files to be deleted...
TARGET_FOLDER="/my/folder/path"
FILES_KEEP=5
ls -tp "$TARGET_FOLDER"**/* | grep -v '/$' | tail -n +$((FILES_KEEP+1)) | xargs -d '\n' -r rm --
[Ref(s).: https://stackoverflow.com/a/3572628/3223785 ]
Thanks! 😉
found interesting cmd in Sed-Onliners - Delete last 3 lines - fnd it perfect for another way to skin the cat (okay not) but idea:
#!/bin/bash
# sed cmd chng #2 to value file wish to retain
cd /opt/depot
ls -1 MyMintFiles*.zip > BigList
sed -n -e :a -e '1,2!{P;N;D;};N;ba' BigList > DeList
for i in `cat DeList`
do
echo "Deleted $i"
rm -f $i
#echo "File(s) gonzo "
#read junk
done
exit 0
Removes all but the 10 latest (most recents) files
ls -t1 | head -n $(echo $(ls -1 | wc -l) - 10 | bc) | xargs rm
If less than 10 files no file is removed and you will have :
error head: illegal line count -- 0
To count files with bash
I needed an elegant solution for the busybox (router), all xargs or array solutions were useless to me - no such command available there. find and mtime is not the proper answer as we are talking about 10 items and not necessarily 10 days. Espo's answer was the shortest and cleanest and likely the most unversal one.
Error with spaces and when no files are to be deleted are both simply solved the standard way:
rm "$(ls -td *.tar | awk 'NR>7')" 2>&-
Bit more educational version: We can do it all if we use awk differently. Normally, I use this method to pass (return) variables from the awk to the sh. As we read all the time that can not be done, I beg to differ: here is the method.
Example for .tar files with no problem regarding the spaces in the filename. To test, replace "rm" with the "ls".
eval $(ls -td *.tar | awk 'NR>7 { print "rm \"" $0 "\""}')
Explanation:
ls -td *.tar lists all .tar files sorted by the time. To apply to all the files in the current folder, remove the "d *.tar" part
awk 'NR>7... skips the first 7 lines
print "rm \"" $0 "\"" constructs a line: rm "file name"
eval executes it
Since we are using rm, I would not use the above command in a script! Wiser usage is:
(cd /FolderToDeleteWithin && eval $(ls -td *.tar | awk 'NR>7 { print "rm \"" $0 "\""}'))
In the case of using ls -t command will not do any harm on such silly examples as: touch 'foo " bar' and touch 'hello * world'. Not that we ever create files with such names in real life!
Sidenote. If we wanted to pass a variable to the sh this way, we would simply modify the print (simple form, no spaces tolerated):
print "VarName="$1
to set the variable VarName to the value of $1. Multiple variables can be created in one go. This VarName becomes a normal sh variable and can be normally used in a script or shell afterwards. So, to create variables with awk and give them back to the shell:
eval $(ls -td *.tar | awk 'NR>7 { print "VarName=\""$1"\"" }'); echo "$VarName"
leaveCount=5
fileCount=$(ls -1 *.log | wc -l)
tailCount=$((fileCount - leaveCount))
# avoid negative tail argument
[[ $tailCount < 0 ]] && tailCount=0
ls -t *.log | tail -$tailCount | xargs rm -f
I made this into a bash shell script. Usage: keep NUM DIR where NUM is the number of files to keep and DIR is the directory to scrub.
#!/bin/bash
# Keep last N files by date.
# Usage: keep NUMBER DIRECTORY
echo ""
if [ $# -lt 2 ]; then
echo "Usage: $0 NUMFILES DIR"
echo "Keep last N newest files."
exit 1
fi
if [ ! -e $2 ]; then
echo "ERROR: directory '$1' does not exist"
exit 1
fi
if [ ! -d $2 ]; then
echo "ERROR: '$1' is not a directory"
exit 1
fi
pushd $2 > /dev/null
ls -tp | grep -v '/' | tail -n +"$1" | xargs -I {} rm -- {}
popd > /dev/null
echo "Done. Kept $1 most recent files in $2."
ls $2|wc -l
Modified version of the answer of #Fabien if you want to specify a path. Useful if you're running the script elsewhere.
ls -tr /path/foo/ | head -n -5 | xargs -I% --no-run-if-empty rm /path/foo/%

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each output from a command as multiple argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy these outputs, e.g:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will make it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
Anyway, for the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior anyway.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
So in other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; but xargs will specifically avoid this by instead copying only as many arguments as it can safely pass to cp at a time, and running cp multiple times if necessary instead.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
#!/bin/bash
for f in files; do
if grep -q PATTERN "$f"; then
echo cp -v "$f" "${f}.bac"
fi
done
files can be *.txt or *.text which basically means files ending in *.txt or *text or replace with something that you want/need, of course replace PATTERN with yours. Remove echo if you're satisfied with the output. For a recursive solution take a look at the bash shell option globstar

How to get only filenames without Path by using grep

I have got the following Problem.
I´m doing a grep like:
$command = grep -r -i --include=*.cfg 'host{' /omd/sites/mesh/etc/icinga/conf.d/objects
I got the following output:
/omd/sites/mesh/etc/icinga/conf.d/objects/testsystem/test1.cfg:define host{
/omd/sites/mesh/etc/icinga/conf.d/objects/testsystem/test2.cfg:define host{
/omd/sites/mesh/etc/icinga/conf.d/objects/testsystem/test3.cfg:define host{
...
for all *.cfg files.
With exec($command,$array)
I passed the result in an array.
Is it possible to get only the filenames as result of the grep-command.
I have tried the following:
$Command= grep -l -H -r -i --include=*.cfg 'host{' /omd/sites/mesh/etc/icinga/conf.d/objects
but I got the same result.
I know that on the forum a similar topic exists.(How can I use grep to show just filenames (no in-line matches) on linux?), but the solution doesn´t work.
With "exec($Command,$result_array)" I try to get an array with the results.
The mentioned solutions works all, but I can´t get an resultarray with exec().
Can anyone help me?
Yet another simpler solution:
grep -l whatever-you-want | xargs -L 1 basename
or you can avoid xargs and use a subshell instead, if you are not using an ancient version of the GNU coreutils:
basename -a $(grep -l whatever-you-want)
basename is the bash straightforward solution to get a file name without path. You may also be interested in dirname to get the path only.
GNU Coreutils basename documentation
Is it possible to get only the filenames as result of the grep command.
With grep you need the -l option to display only file names.
Using find ... -execdir grep ... \{} + you might prevent displaying the full path of the file (is this what you need?)
find /omd/sites/mesh/etc/icinga/conf.d/objects -name '*.cfg' \
-execdir grep -r -i -l 'host{' \{} +
In addition, concerning the second part of your question, to read the result of a command into an array, you have to use the syntax: IFS=$'\n' MYVAR=( $(cmd ...) )
In that particular case (I formatted as multiline statement in order to clearly show the structure of that expression -- of course you could write as a "one-liner"):
IFS=$'\n' MYVAR=(
$(
find objects -name '*.cfg' \
-execdir grep -r -i -l 'host{' \{} +
)
)
You have then access to the result in the array MYVAR as usual. While I while I was testing (3 matches in that particular case):
sh$ echo ${#MYVAR[#]}
3
sh$ echo ${MYVAR[0]}
./x y.cfg
sh$ echo ${MYVAR[1]}
./d.cfg
sh$ echo ${MYVAR[2]}
./e.cfg
# ...
This should work:
grep -r -i --include=*.cfg 'host{' /omd/sites/mesh/etc/icinga/conf.d/objects | \
awk '{print $1}' | sed -e 's|[^/]*/||g' -e 's|:define$||'
The awk portion finds the first field in it and the sed command trims off the path and the :define.

xargs with multiple arguments

I have a source input, input.txt
a.txt
b.txt
c.txt
I want to feed these input into a program as the following:
my-program --file=a.txt --file=b.txt --file=c.txt
So I try to use xargs, but with no luck.
cat input.txt | xargs -i echo "my-program --file"{}
It gives
my-program --file=a.txt
my-program --file=b.txt
my-program --file=c.txt
But I want
my-program --file=a.txt --file=b.txt --file=c.txt
Any idea?
Don't listen to all of them. :) Just look at this example:
echo argument1 argument2 argument3 | xargs -l bash -c 'echo this is first:$0 second:$1 third:$2'
Output will be:
this is first:argument1 second:argument2 third:argument3
None of the solutions given so far deals correctly with file names containing space. Some even fail if the file names contain ' or ". If your input files are generated by users, you should be prepared for surprising file names.
GNU Parallel deals nicely with these file names and gives you (at least) 3 different solutions. If your program takes 3 and only 3 arguments then this will work:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -N 3 my-program --file={1} --file={2} --file={3}
Or:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -X -N 3 my-program --file={}
If, however, your program takes as many arguments as will fit on the command line:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo d1.txt; echo e1.txt; echo f1.txt;) |
parallel -X my-program --file={}
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
How about:
echo $'a.txt\nb.txt\nc.txt' | xargs -n 3 sh -c '
echo my-program --file="$1" --file="$2" --file="$3"
' argv0
It's simpler if you use two xargs invocations: 1st to transform each line into --file=..., 2nd to actually do the xargs thing ->
$ cat input.txt | xargs -I# echo --file=# | xargs echo my-program
my-program --file=a.txt --file=b.txt --file=c.txt
You can use sed to prefix --file= to each line and then call xargs:
sed -e 's/^/--file=/' input.txt | xargs my-program
Here is a solution using sed for three arguments, but is limited in that it applies the same transform to each argument:
cat input.txt | sed 's/^/--file=/g' | xargs -n3 my-program
Here's a method that will work for two args, but allows more flexibility:
cat input.txt | xargs -n 2 | xargs -I{} sh -c 'V="{}"; my-program -file=${V% *} -file=${V#* }'
I stumbled on a similar problem and found a solution which I think is nicer and cleaner than those presented so far.
The syntax for xargs that I have ended with would be (for your example):
xargs -I X echo --file=X
with a full command line being:
my-program $(cat input.txt | xargs -I X echo --file=X)
which will work as if
my-program --file=a.txt --file=b.txt --file=c.txt
was done (providing input.txt contains data from your example).
Actually, in my case I needed to first find the files and also needed them sorted so my command line looks like this:
my-program $(find base/path -name "some*pattern" -print0 | sort -z | xargs -0 -I X echo --files=X)
Few details that might not be clear (they were not for me):
some*pattern must be quoted since otherwise shell would expand it before passing to find.
-print0, then -z and finally -0 use null-separation to ensure proper handling of files with spaces or other wired names.
Note however that I didn't test it deeply yet. Though it seems to be working.
xargs doesn't work that way. Try:
myprogram $(sed -e 's/^/--file=/' input.txt)
It's because echo prints a newline. Try something like
echo my-program `xargs --arg-file input.txt -i echo -n " --file "{}`
I was looking for a solution for this exact problem and came to the conclution of coding a script in the midle.
to transform the standard output for the next example use the -n '\n' delimeter
example:
user#mybox:~$ echo "file1.txt file2.txt" | xargs -n1 ScriptInTheMiddle.sh
inside the ScriptInTheMidle.sh:
!#/bin/bash
var1=`echo $1 | cut -d ' ' -f1 `
var2=`echo $1 | cut -d ' ' -f2 `
myprogram "--file1="$var1 "--file2="$var2
For this solution to work you need to have a space between those arguments file1.txt and file2.txt, or whatever delimeter you choose, one more thing, inside the script make sure you check -f1 and -f2 as they mean "take the first word and take the second word" depending on the first delimeter's position found (delimeters could be ' ' ';' '.' whatever you wish between single quotes .
Add as many parameters as you wish.
Problem solved using xargs, cut , and some bash scripting.
Cheers!
if you wanna pass by I have some useful tips http://hongouru.blogspot.com
Actually, it's relatively easy:
... | sed 's/^/--prefix=/g' | xargs echo | xargs -I PARAMS your_cmd PARAMS
The sed 's/^/--prefix=/g' is optional, in case you need to prefix each param with some --prefix=.
The xargs echo turns the list of param lines (one param in each line) into a list of params in a single line and the xargs -I PARAMS your_cmd PARAMS allows you to run a command, placing the params where ever you want.
So cat input.txt | sed 's/^/--file=/g' | xargs echo | xargs -I PARAMS my-program PARAMS does what you need (assuming all lines within input.txt are simple and qualify as a single param value each).
There is another nice way of doing this, if you do not know the number of files upront:
my-program $(find . -name '*.txt' -printf "--file=%p ")
Nobody has mentioned echoing out from a loop yet, so I'll put that in for completeness sake (it would be my second approach, the sed one being the first):
for line in $(< input.txt) ; do echo --file=$line ; done | xargs echo my-program
Old but this is a better answer:
cat input.txt | gsed "s/\(.*\)/\-\-file=\1/g" | tr '\n' ' ' | xargs my_program
# i like clean one liners
gsed is just gnu sed to ensure syntax matches version brew install gsed or just sed if your on gnu linux already...
test it:
cat input.txt | gsed "s/\(.*\)/\-\-file=\1/g" | tr '\n' ' ' | xargs echo my_program

Linux: Removing files that don't contain all the words specified

Inside a directory, how can I delete files that lack any of the words specified, so that only files that contain ALL the words are left? I tried to write a simple bash shell script using grep and rm commands, but I got lost. I am totally new to Linux, any help would be appreciated
How about:
grep -L foo *.txt | xargs rm
grep -L bar *.txt | xargs rm
If a file does not contain foo, then the first line will remove it.
If a file does not contain bar, then the second line will remove it.
Only files containing both foo and bar should be left
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
See also #Mykola Golubyev's post for placing in a loop.
list=`Word1 Word2 Word3 Word4 Word5`
for word in $list
grep -L $word *.txt | xargs rm
done
Addition to the answers above: Use the newline character as delimiter to handle file names with spaces!
grep -L $word $file | xargs -d '\n' rm
grep -L word | xargs rm
To do the same matching filenames (not the contents of files as most of the solutions above) you can use the following:
for file in `ls --color=never | grep -ve "\(foo\|bar\)"`
do
rm $file
done
As per comments:
for file in `ls`
shouldn't be used. The below does the same thing without using the ls
for file in *
do
if [ x`echo $file | grep -ve "\(test1\|test3\)"` == x ]; then
rm $file
fi
done
The -ve reverses the search for the regexp pattern for either foo or bar in the filename.
Any further words to be added to the list need to be separated by \|
e.g. one\|two\|three
First, remove the file-list:
rm flist
Then, for each of the words, add the file to the filelist if it contains that word:
grep -l WORD * >>flist
Then sort, uniqify and get a count:
sort flist | uniq -c >flist_with_count
All those files in flsit_with_count that don't have the number of words should be deleted. The format will be:
2 file1
7 file2
8 file3
8 file4
If there were 8 words, then file1 and file2 should be deleted. I'll leave the writing/testing of the script to you.
Okay, you convinced me, here's my script:
#!/bin/bash
rm -rf flist
for word in fopen fclose main ; do
grep -l ${word} *.c >>flist
done
rm $(sort flist | uniq -c | awk '$1 != 3 {print $2} {}')
This removes the files in the directory that didn't have all three words:
You could try something like this but it may break
if the patterns contain shell or grep meta characters:
(in this example one two three are the patterns)
for f in *; do
unset cmd
for p in one two three; do
cmd="fgrep \"$p\" \"$f\" && $cmd"
done
eval "$cmd" >/dev/null || rm "$f"
done
This will remove all files that doesn't contain words Ping or Sent
grep -L 'Ping\|Sent' * | xargs rm

Resources