Pass a list of files to perl script via pipe - linux

I am having a problem where my Perl script fails when input is piped to it, but works fine when I just list all the file names individually.
For reference, input of the perl script is read with while(<>).
Example:
script.pl file1.tag file2.tag file3.tag
runs fine.
But the following all fail.
find ./*.tag | chomp | script.pl
ls -l *.tag | perl -pe 's/\n/ /g' | script.pl
find ./*.tag | perl -pe 's/\n/ /g' | script.pl
I also tested dumping the list into a text file and catting that into the Perl script:
cat files.text | script.pl
All of them fail the same way. It is like the script is passed no input arguments and the program just finishes.

From perldoc perlop:
The null filehandle <> is special [...] Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to -, which when opened gives you standard input. The @ARGV array is then processed as a list of filenames.
You're not passing any command line arguments to your Perl script, so everything you pipe into it arrives on STDIN instead of being treated as a list of filenames:
$ echo foo > foo.txt
$ echo bar > bar.txt
$ ls | perl -e 'print "<$_>\n" while <>'
<bar.txt
>
<foo.txt
>
Notice that the files foo.txt and bar.txt are not actually read; all we get is the file names. If you want the files to be opened and read, you have to pass them as command line arguments or explicitly set @ARGV:
$ perl -e 'print "<$_>\n" while <>' *
<bar
>
<foo
>
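The other route, explicitly filling @ARGV from the piped-in names before the first <>, might look like this (a minimal sketch; it still breaks if a file name contains a newline):
$ ls | perl -e 'chomp(@ARGV = <STDIN>); print while <>'
bar
foo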
If you have a large number of files, like you're likely to get from find, you should use xargs as Dyno Hongjun Fu suggested.
However, you don't need find, ls, cat, or your Perl one-liner to run your script on all the .tag files in the current directory. Simply do:
script.pl *.tag

You need xargs, e.g.
find ./ -type f -name "*.tag" | xargs -i script.pl {}
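On current GNU xargs, -i is a deprecated spelling of -I. A rough sketch of the equivalent call, plus a batched variant that hands many .tag files to a single script.pl run (assuming GNU find and xargs):
find ./ -type f -name "*.tag" | xargs -I {} script.pl {}
find ./ -type f -name "*.tag" -print0 | xargs -0 script.pl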
what is chomp?

Related

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each line of output from a command as an argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy each of these outputs, e.g.:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
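A plain while read loop is a workable alternative sketch, assuming the matched names contain no newlines:
grep 'pattern' input | while IFS= read -r f; do cp "$f" "$f.bac"; done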
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young's good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will do it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
Anyway, for the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
In other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; xargs specifically avoids this by passing only as many arguments as safely fit on each cp command line, and running cp multiple times if necessary.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
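With GNU tools, one sketch that survives even newlines in file names is to have grep print NUL-terminated names (-Z) and let xargs split on NULs (-0):
grep -lZ "pattern" input | xargs -0 -r sh -c 'for f; do cp -- "$f" "$f.bac"; done' _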
#!/bin/bash
for f in files; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"
    fi
done
Here files can be a glob such as *.txt or *.text (i.e. files ending in .txt or .text), or whatever else you need; likewise replace PATTERN with your own pattern. Remove echo once you're satisfied with the output. For a recursive solution, take a look at the bash shell option globstar, as sketched below.
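A sketch of the recursive variant, assuming bash 4+ for globstar:
#!/bin/bash
shopt -s globstar   # make ** match files in subdirectories recursively
for f in **/*.txt; do
    if grep -q PATTERN "$f"; then
        cp -v "$f" "${f}.bac"
    fi
done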

pipe then hyphen (stdin) as an alternative to for loop

I wrote a few sed and awk commands to extract a set of IDs that are associated with file names. I would like to run a set of commands using these file names from id.txt:
cat id.txt
14235.gz
41231.gz
41234.gz
I usually write for loops as follows:
for i in $(cat id.txt); do
    command <options> $i
done
I thought I could also do cat id.txt | command <options> -
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Use a while read loop; see Don't read lines with for:
while IFS= read -r line_in_text_file; do
    echo "$line_in_text_file"
done < id.txt
Commands don't usually get their filename arguments on standard input. Using - as an argument means to read the file contents from standard input instead of a named file; it doesn't mean to get the filename from stdin.
You can use command substitution to use the contents of the file as all the filename arguments to the command:
command <options> $(cat id.txt)
or you can use xargs
xargs command <options> < id.txt
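For instance, if the hypothetical command is gzip -t (testing each .gz file listed in id.txt), the two spellings would be:
xargs gzip -t < id.txt
gzip -t $(cat id.txt)    # relies on the names containing no whitespace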
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Compound commands can be placed in a pipe; the syntax is not very strict. The usual:
awk 'some awk script' |
while IFS= read -r line; do
    echo "$line"
done |
sed 'some sed script'
I avoid reading input line by line using a while read - it's very slow. It's way faster to use awk scripts and other commands.
Command groups can be used too:
awk 'some awk script' |
{ # or '(', but there is no need for a subshell
    echo "header1,header2"
    # remove first line
    IFS= read -r first_line
    # ignore last line
    sed '$d'
} |
sed 'some sed script'
Remember that piped commands are run in subshells, so variable changes will not affect the parent shell.
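A small demonstration of that pitfall (in a default bash, the counter set inside the piped loop is lost):
count=0
seq 3 | while read -r _; do count=$((count+1)); done
echo "$count"    # prints 0: the while loop ran in a subshell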
Bash has a process substitution extension that lets you run a while loop inside the parent shell:
var=1
while IFS= read -r line; do
    if [[ "$line" == 2 ]]; then
        var=2
    fi
done < <(
    seq 10 |
    sed '$d'
)
echo "$var" # will output 2
xargs can do this
cat id.txt | xargs command
From xargs help
$ xargs --help
Usage: xargs [OPTION]... COMMAND [INITIAL-ARGS]...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.
Mandatory and optional arguments to long options are also
mandatory or optional for the corresponding short option.
  -0, --null                 items are separated by a null, not whitespace;
                             disables quote and backslash processing and
                             logical EOF processing
  -a, --arg-file=FILE        read arguments from FILE, not standard input
  -d, --delimiter=CHARACTER  items in input stream are separated by CHARACTER,
                             not by whitespace; disables quote and backslash
  ...

Bash command using xargs and xargs -0

I just found a difference between two commands:
echo sum.txt| xargs cat
This will output the content of sum.txt.
echo sum.txt| xargs -0 cat
This shows an error:
cat: sum.txt
: No such file or directory
I know -0 treats null bytes as the delimiter. I think the line starting with : appears because the echo command produces a newline, which is why it doesn't produce output like:
cat: sum.txt: No such file or directory
But if echo produces a newline, why does the first command succeed, since xargs uses whitespace as the delimiter by default?
I think what is happening will become more clear if you replace cat with echo -n as an experiment.
echo sum.txt| xargs echo -n
echo sum.txt | xargs -0 echo -n
In the first example, xargs breaks on the newline and discards the newline, leaving just sum.txt.
In the second example, xargs only splits on null bytes, so the whole output of the first echo, including its trailing newline, is passed as a single argument: 'sum.txt\n'.
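One way to see the argument boundaries is to bracket each argument xargs passes (a small illustration using printf instead of cat):
$ echo sum.txt | xargs printf '[%s]\n'
[sum.txt]
$ echo sum.txt | xargs -0 printf '[%s]\n'
[sum.txt
]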

Linux - Get a line from one file and corresponding line from a second file and pass into cp command

I have two .txt files.
'target.txt' is a list of target files
'destination.txt' is a list (on corresponding lines) of destinations.
I'd like to create a command that does the following:
cp [line 1 from target.txt] [line 1 from destination.txt]
For each line of the files.
paste target.txt destination.txt | sed -e 's/^/cp /' > cp.cmds
Then, after inspecting cp.cmds for correctness, you can just run it as a shell script.
sh cp.cmds
The paste command merges two files by concatenating corresponding lines.
paste target.txt destination.txt | while read target dest; do
    cp $target $dest
done
This will not work if any of the filenames contain spaces, though. If that's a requirement, I would use awk to read the first file into an array, then when reading the second file print a cp command with the corresponding lines and quotes around them, and pipe this to sh to execute it.
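A sketch of that awk approach (assuming the file names contain no double quotes):
awk 'NR==FNR { tgt[FNR] = $0; next }
     { printf "cp \"%s\" \"%s\"\n", tgt[FNR], $0 }' target.txt destination.txt | sh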
To handle whitespace in the filenames:
paste -d\\n target.txt destination.txt | xargs -d\\n -n2 -x cp
paste -d\\n interleaves lines of the argument files
xargs -d\\n -n2 reads two complete lines at a time and applies them as two arguments at the end of the command line. The -d flag disables all special processing of quotes, apostrophes and backslashes in the input lines, as well as the eof character (by default _).
The -d command-line option to xargs is a GNU extension. If you are stuck with a POSIX standard xargs, you can use the following alternative, courtesy of the Open Group (see example 2, near the end of the page):
paste -d\\n target.txt destination.txt |
sed 's/[^[:alnum:]]/\\&/g' |
xargs -E "" -n 2 -x cp
The sed command backslash-escapes every non-alphanumeric character
xargs -E "" disables the end-of-file character handling.

Pipe output to use as the search specification for grep on Linux

How do I pipe the output of grep as the search pattern for another grep?
As an example:
grep <Search_term> <file1> | xargs grep <file2>
I want the output of the first grep as the search term for the second grep. The above command is treating the output of the first grep as the file name for the second grep. I tried using the -e option for the second grep, but it does not work either.
You need to use xargs's -i switch:
grep ... | xargs -ifoo grep foo file_in_which_to_search
This takes the option after -i (foo in this case) and replaces every occurrence of it in the command with the output of the first grep.
This is the same as:
grep `grep ...` file_in_which_to_search
Try
grep ... | fgrep -f - file1 file2 ...
If using Bash then you can use backticks:
> grep -e "`grep ... ...`" files
The -e flag and the double quotes are there to ensure that any output from the initial grep that starts with a hyphen isn't then interpreted as an option to the second grep.
Note that the double quoting trick (which also ensures that the output from grep is treated as a single parameter) only works with Bash. It doesn't appear to work with (t)csh.
Note also that backticks are the standard way to get the output from one program into the parameter list of another. Not all programs have a convenient way to read parameters from stdin the way that (f)grep does.
I wanted to search for text in files (using grep) that had a certain pattern in their file names (found using find) in the current directory. I used the following command:
grep -i "pattern1" $(find . -name "pattern2")
Here pattern2 is the pattern in the file names and pattern1 is the pattern searched for
within files matching pattern2.
edit: Not strictly piping but still related and quite useful...
This is what I use to search for a file from a listing:
ls -la | grep 'file-in-which-to-search'
Okay, breaking the rules, as this isn't an answer, just a note that I can't get any of these solutions to work.
% fgrep -f test file
works fine.
% cat test | fgrep -f - file
fgrep: -: No such file or directory
fails.
% cat test | xargs -ifoo grep foo file
xargs: illegal option -- i
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
[-L number] [-n number [-x]] [-P maxprocs] [-s size]
[utility [argument ...]]
fails. Note that a capital I is necessary; if I use that, all is good.
% grep "`cat test`" file
kinda works, in that it returns a line for the terms that match, but it also returns a line "grep: line 3 in test: No such file or directory" for each file that doesn't find a match.
Am I missing something or is this just differences in my Darwin distribution or bash shell?
I tried this way, and it works great.
[opuser@vjmachine abc]$ cat a
not problem
all
problem
first
not to get
read problem
read not problem
[opuser@vjmachine abc]$ cat b
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser@vjmachine abc]$ grep -e "`grep problem a`" b --col
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser@vjmachine abc]$
You should run grep in such a way that it extracts file names only; see the -l parameter (the lowercase L):
grep -l someSearch * | xargs grep otherSearch
This is because with a plain grep, the output contains much more than just the file names. For instance, when you do
grep someSearch *
you will pipe info like this to xargs:
filename1: blablabla someSearch blablabla something else
filename2: bla someSearch bla otherSearch
...
Piping lines like these to xargs makes no sense.
But when you do grep -l someSearch *, your output will look like this:
filename1
filename2
Output like this can now be passed to xargs.
I have found the following command to work, using $() with my first command inside the parentheses so the shell executes it first.
grep $(dig +short) file
I use this to look through files for an IP address when I am given a host name.
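For instance (the host and log file names here are just placeholders):
grep "$(dig +short example.com)" access.log    # resolve the host first, then search the log for its IP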
