Bash: How to tail then copy multiple files (eg using xargs)? - linux

I've been trying various combinations of xargs and piping but I just can't get the right result. Previous questions don't quite cover exactly what I want to do:
I have a source directory somewhere, let's say /foo/source, with a mix of different files
I want to copy just the csv files found in source to a different destination, say /foo/dest
But I ALSO at the same time need to remove 232 header rows (eg using tail)
I've figured out that I need to pipe the results of find into xargs, which can then run commands on each find result. But I'm struggling to tail then copy. If I pipe tail into cp, cp does not seem to receive the file (missing file operand). Here are some examples of what I've tried so far:
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 | cp -t /foo/dest'
cp: missing file operand
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 {} | cp -t /foo/dest'
Result:
cp: failed to access '/foo/dest': No such file or directory ...
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 {} > /foo/dest/{}'
sh: /foo/dest/foo/source/0001.csv: No such file or directory ...
Any pointers would be really appreciated!
Thanks

Just use find with exec and copy the file name in a variable:
find your_dir -name "*.csv" -exec sh -c 'f="$1"; tail -n +5 "$f" > "dest_dir/$(basename "$f")"' -- {} \;
Here f="$1" makes $f hold the name of the file, with the full path (the -- fills $0, so the path that find substitutes for {} arrives as $1). Then it is a matter of redirecting the output of tail into a file in dest_dir, using basename to strip the path from the name.
Or, based on Random832's suggestion below in comments (thanks!):
find your_dir -name "*.csv" -exec sh -c 'tail -n +5 "$1" > "dest_dir/$(basename "$1")"' -- {} \;
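As a rough sketch of the same idea that can be tried safely, the following builds a throwaway directory tree, strips a header from every CSV, and writes the result under each file's base name. The paths, file names, and the 2-line header are made up for the illustration. It uses {} + so that one sh handles many files, and passes the destination as an argument instead of hard-coding it in the script.

```shell
# Hypothetical sandbox: all paths and file contents are invented for the demo.
work=$(mktemp -d)
mkdir -p "$work/source/sub" "$work/dest"
printf 'h1\nh2\nrow1\nrow2\n' > "$work/source/a b.csv"
printf 'h1\nh2\nrow3\n' > "$work/source/sub/c.csv"
printf 'not a csv\n' > "$work/source/notes.txt"

# Skip the first 2 lines of every CSV and write the rest to dest/ under the
# base name of the file. "sh" fills $0, the next argument is the destination,
# and the remaining arguments are the paths that find substitutes for {}.
find "$work/source" -name '*.csv' -exec sh -c '
  dest=$1; shift
  for f; do
    tail -n +3 "$f" > "$dest/$(basename "$f")"
  done' sh "$work/dest" {} +

ls "$work/dest"
```

Note that two files with the same base name in different subdirectories would overwrite each other in the destination.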

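Your last command is close, but the problem is that {} is replaced with the full pathname, not just the filename. Use the basename command to extract the filename from it. It is also safer to pass the pathname to sh as an argument instead of substituting {} into the script text, so that names containing spaces or quotes are not reinterpreted by the shell.
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 "$1" > "/foo/dest/$(basename "$1")"' sh '{}'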
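One detail worth checking in these commands: tail -n +K starts output at line K, so it drops K-1 lines. If the goal is to drop exactly 232 header rows, the argument would need to be +233. A small check, using seq only to generate numbered sample lines (the variable names are illustrative):

```shell
# Hypothetical value: number of header rows to drop.
skip=232

# seq prints the numbers 1..240, one per line, so each line shows its own number.
# sed -n 1p prints only the first line that tail lets through.
first_kept=$(seq 1 240 | tail -n "+$((skip + 1))" | sed -n 1p)
first_with_plus_skip=$(seq 1 240 | tail -n "+$skip" | sed -n 1p)

echo "tail -n +$((skip + 1)) starts at line $first_kept"
echo "tail -n +$skip starts at line $first_with_plus_skip"
```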

As an alternative to find and xargs you could use a for loop, and as an alternative to tail you could use sed. Unlike find, the glob only matches files directly inside the source directory, not in its subdirectories. Consider this:
source=/foo/source
dest=/foo/dest
for csv in "$source"/*.csv; do sed '232,$!d' "$csv" > "$dest/$(basename "$csv")"; done

Using GNU Parallel you would do:
find /foo/source -name "*.csv" | parallel tail -n +232 {} '>' /foo/dest/{/}

Related

How to use grep to reverse search files in a folder

I'm trying to create a script which will find missing topics from multiple log files. These logfiles are filled top down, so the newest logs are at the bottom of the file. I would like to grep only the last line from this file which includes UNKNOWN_TOPIC_OR_PARTITION. This should be done in multiple files with completely different names. Is grep the best solution or is there another solution that suits my needs. I already tried adding tail, but that doesn't seem to work.
missingTopics=$(grep -Ri -m1 --exclude=*.{1,2,3,4,5} UNKNOWN_TOPIC_OR_PARTITION /app/tibco/log/tra/domain/)
You could try a combination of find, tac and grep:
find /app/tibco/log/tra/domain -type f ! -name '*.[1-5]' -exec sh -c \
'tac "$1" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION' "sh" '{}' \;
tac prints files in reverse, the -exec sh -c SCRIPT "sh" '{}' \; action of find executes the shell SCRIPT each time a file matching the previous tests is found. The SCRIPT is executed with "sh" as parameter $0 and the path of the found file as parameter $1.
If performance is an issue you can probably improve it with:
find . -type f ! -name '*.[1-5]' -exec sh -c 'for f in "$@"; do \
tac "$f" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION; done' "sh" '{}' +
which will spawn fewer shells. If security is also a concern, you can replace -exec with -execdir (although with this SCRIPT I do not immediately see any exploit).
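A small self-contained sketch of the batched form, assuming GNU tac and a grep that supports -m (the log directory, file names, and log lines are invented for the demonstration):

```shell
# Hypothetical log directory with one active log, one quiet log, one rotated log.
logs=$(mktemp -d)
printf 'old UNKNOWN_TOPIC_OR_PARTITION topicA\nok\nnew UNKNOWN_TOPIC_OR_PARTITION topicB\nok\n' > "$logs/app one.log"
printf 'ok\n' > "$logs/quiet.log"
printf 'UNKNOWN_TOPIC_OR_PARTITION rotated\n' > "$logs/app.log.1"

# For each non-rotated file, reverse it and keep the first match, which is the
# last match in the original file. "|| :" ignores the exit status 1 that grep
# returns for files without a match.
last_hits=$(find "$logs" -type f ! -name '*.[1-5]' -exec sh -c '
  for f in "$@"; do
    tac "$f" | grep -i -m1 UNKNOWN_TOPIC_OR_PARTITION || :
  done' sh {} +)

printf '%s\n' "$last_hits"
```

Here only the newest matching line of the active log is reported; the rotated file is skipped by the ! -name test.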

Find files matching a pattern, replace strings and then diff the output with original, command fails

I am trying to find files named names.* and run sed on the ones that match, then pipe to diff to see what was changed.
However the command fails. If I remove the pipe and diff, it is happy to output results. Why is it failing with the diff? Is there a better way to do this?
> find -type f -name "names.*" -printf '%p' -exec sed 's/Cow/Kitten' {} | diff {} - \;
diff: extra operand ';'
diff: Try 'diff --help' for more information.
find: missing argument to `-exec'
A shell is needed to do what you wanted, like so.
find -type f -name "names.*" -exec sh -c '
for f; do
sed "s/Cow/Kitten/" "$f" | diff "$f" -
done' _ {} \;
As a one-liner:
find -type f -name "names.*" -exec sh -c 'for f; do sed "s/Cow/Kitten/" "$f" | diff "$f" -; done' _ {} \;
See "Understanding the -exec option of find".
Or using a while + read loop and Process Substitution.
#!/usr/bin/env bash
while IFS= read -rd '' files; do
sed 's/Cow/Kitten/' "$files" | diff "$files" -
done < <(find -type f -name "names.*" -print0)
The latter script is safe for file names containing spaces, tabs, or newlines, but it is strictly bash, as opposed to the former script, which is POSIX sh (it should work with any POSIX-compliant shell).
See How can I find and safely handle file names containing newlines, spaces or both?
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
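If only the list of files that would change is of interest, the exit status of diff can be used instead of its output. A minimal sketch under that assumption, with an invented directory and sample files:

```shell
# Hypothetical sample tree: one file contains "Cow", the other does not.
tree=$(mktemp -d)
printf 'Cow says moo\nDog says woof\n' > "$tree/names.txt"
printf 'nothing here\n' > "$tree/names.md"

changes=$(find "$tree" -type f -name 'names.*' -exec sh -c '
  for f; do
    # diff exits non-zero when its inputs differ, so "!" means a change was found.
    if ! sed "s/Cow/Kitten/" "$f" | diff "$f" - > /dev/null; then
      printf "%s\n" "$f"
    fi
  done' sh {} +)

printf 'would change: %s\n' "$changes"
```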

How do I change a string in files with the same name (e.g. data.txt) in all subdirectories in Linux using the terminal?

find . -name "data.txt" -print0 | grep -rl "pa028" ./ |xargs -0 sed -i '' -e 's/pa028/pa014/g'
I tried to replace pa028 with pa014 in the files named "data.txt" in all subdirectories. Can you please correct me?
You can't put grep between find -print0 and xargs -0 because grep operates on lines, and this pipeline contains null-separated text instead of lines. Additionally, grep -r . will ignore the standard input you so expensively set up find to produce.
find . -name "data.txt" -exec grep -q "pa028" {} \; -print0 |
xargs -r -0 sed -i '' -e 's/pa028/pa014/g'
The logic here is to use -exec grep -q as a predicate to find so we produce a null-terminated list of matching files (for which the -exec returns true) to pass to xargs -r -0. (The -r option is important, too; you get weird errors if xargs runs anyway even though find produced no output.)
There is an extension to GNU grep to operate on null-terminated strings with -z and print null-terminated file names with -Z -l but that's a fairly recent development, so I'm not yet prepared to recommend that.
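The same predicate idea also works without xargs, by chaining a second -exec that only runs for files where grep -q succeeded. This is a sketch rather than the answer's own method; it uses sed -i.bak, a suffix form that both GNU and BSD sed accept and that leaves a .bak backup next to each edited file. The directory layout is invented:

```shell
# Hypothetical tree: only one of the two data.txt files contains pa028.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b c"
printf 'id=pa028\n' > "$root/a/data.txt"
printf 'id=pa999\n' > "$root/b c/data.txt"

# grep -q acts as a test: the second -exec only sees files that matched.
find "$root" -type f -name data.txt \
  -exec grep -q pa028 {} \; \
  -exec sed -i.bak -e 's/pa028/pa014/g' {} +

cat "$root/a/data.txt"
```

The untouched file gets no backup, which makes it easy to see afterwards which files were edited.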

Is it possible to pipe the results of FIND to a COPY command CP?

Is it possible to pipe the results of find to a COPY command cp?
Like this:
find . -iname "*.SomeExt" | cp Destination Directory
Seeking, I always find this kind of formula such as from this post:
find . -name "*.pdf" -type f -exec cp {} ./pdfsfolder \;
This raises some questions:
Why can't you just use a | pipe? Isn't that what it's for?
Why does everyone recommend -exec?
How do I know when to use -exec over a | pipe?
There's a little-used option for cp: -t destination -- see the man page:
find . -iname "*.SomeExt" | xargs cp -t Directory
Good question!
Why can't you just use a | pipe? Isn't that what it's for?
You can pipe, of course; xargs is made for these cases:
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
Why does everyone recommend the -exec
The -exec is good because it provides more control of exactly what you are executing. Whenever you pipe there may be problems with corner cases: file names containing spaces or new lines, etc.
how do I know when to use that (exec) over pipe | ?
It is really up to you and there can be many cases. I would use -exec whenever the action to perform is simple. I am not a very good friend of xargs, I tend to prefer an approach in which the find output is provided to a while loop, such as:
while IFS= read -r result
do
# do things with "$result"
done < <(find ...)
You can use | like below:
find . -iname "*.SomeExt" | while IFS= read -r line
do
cp "$line" DestDir/
done
Answering your questions:
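| can be used to solve this issue. But as seen above, it involves a lot of code. Moreover, the pipeline runs find and the loop as two separate processes, in addition to the cp started for each file.
Using -exec inside find avoids the pipe and the loop, although find still starts a cp process for each file (or for each batch of files with -exec ... {} +).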
Try this:
find . -iname "*.SomeExt" -print0 | xargs -0 cp -t Directory
# ........................^^^^^^^..........^^
In case there is whitespace in filenames.
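A quick way to convince yourself that the NUL-delimited form copes with spaces is to try it on a scratch directory. The sketch below assumes GNU cp (for -t); the directory and file names are invented:

```shell
# Hypothetical scratch area with one awkward file name.
box=$(mktemp -d)
mkdir -p "$box/src" "$box/dst"
: > "$box/src/my report.SomeExt"
: > "$box/src/plain.someext"
: > "$box/src/other.txt"

# -print0 and -0 delimit names with NUL bytes, so the space in "my report"
# is not treated as a separator. cp -t is a GNU coreutils option.
find "$box/src" -iname '*.SomeExt' -print0 | xargs -0 cp -t "$box/dst"

ls "$box/dst"
```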
I like the spirit of the response from @fedorqui-so-stop-harming, but it needed a tweak to work in my bash terminal.
In this version...
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
The cp command incorrectly takes Destination_Directory/ as the first argument. I needed to add a replacement string in order to get xargs to insert the argument in the right position for cp. I used a percent symbol for the replacement string, but you can use anything that doesn't conflict with the input from the pipe. This version works for me.
find . -iname "*.SomeExt" | xargs -I % cp % Destination_Directory/
This SOLVED my problem.
find . -type f | grep '\.pdf$' | while IFS= read -r line
do
cp "$line" REPLACE_WITH_TARGET_DIRECTORY
done
If there are spaces in the filenames, try:
find . -iname '*.ext' > list.txt
cat list.txt | awk 'BEGIN {a="'"'"'"}{print "cp "a$0a" Directory"}' > script.sh
sh script.sh
You can inspect list.txt and script.sh before sh script.sh. Remember to delete the list.txt and script.sh afterwards.
I had some files with parentheses and wanted a progress bar, so I replaced the cat line with:
cat list.txt | awk -v X='"' '{print "rsync -Pa "X$0X" /Volumes/Untitled/"}' > script.sh

Linux: Find a List of Files in a Directory recursively

I have a text file with one filename per row:
Interpret 1 - Song 1.mp3
Interpret 2 - Song 2.mp3
...
(About 200 Filenames)
Now I want to search a folder recursively for these filenames to get the full path for each filename in Filenames.txt.
How to do this? :)
(Purpose: I copied files to my MP3 player but some of them are broken, and I want to recopy them all without spending hours searching for them in my music folder)
The easiest way may be the following:
while IFS= read -r file ; do find /dest/directory -name "$file" ; done < orig_filenames.txt > output_file_with_paths
A much faster way is to run the find command only once and filter its output with fgrep (-J is an option of BSD xargs):
find . -type f -print0 | fgrep -zFf ./file_with_filenames.txt | xargs -0 -J % cp % /path/to/destdir
You can use a while read loop along with find:
filecopy.sh
#!/bin/bash
while IFS= read -r line
do
find . -iname "$line" -exec cp '{}' /where/to/put/your/files \;
done < list_of_files.txt
Where list_of_files.txt is the list of files line by line, and /where/to/put/your/files is the location you want to copy to. You can just run it like so in the directory:
$ bash filecopy.sh
+1 for @jm666's answer, but the -J option doesn't work for my flavor of xargs, so I changed it to:
find . -type f -print0 | fgrep -zFf ./file_with_filenames.txt | xargs -0 -I{} cp "{}" /path/to/destdir/
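Note that grep -F matches the listed names anywhere in the path, so a listed name that is a substring of another file name also matches. A possible alternative, sketched here with awk and made-up paths, is to compare only the last path component against the list. It assumes a non-empty list and file names without newlines:

```shell
# Hypothetical music library and list of wanted file names.
music=$(mktemp -d)
mkdir -p "$music/lib/album x" "$music/out"
: > "$music/lib/album x/Interpret 1 - Song 1.mp3"
: > "$music/lib/Interpret 2 - Song 2.mp3"
: > "$music/lib/Unlisted - Song 3.mp3"
printf '%s\n' 'Interpret 1 - Song 1.mp3' 'Interpret 2 - Song 2.mp3' > "$music/wanted.txt"

# awk reads the list first and fills a lookup table; it then reads the find
# output from stdin ("-") and prints each path whose base name is in the table.
find "$music/lib" -type f |
  awk 'NR == FNR { want[$0] = 1; next }
       { name = $0; sub(/.*\//, "", name); if (name in want) print }' \
      "$music/wanted.txt" - > "$music/paths.txt"

# Copy each matching path; one cp per file keeps the sketch simple.
while IFS= read -r path; do
  cp "$path" "$music/out/"
done < "$music/paths.txt"

ls "$music/out"
```

find runs only once here, and the list lookup is an exact match on the file name rather than a substring match on the whole path.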
