How do I use the filenames output by "grep" as arguments to another program - linux

I have this grep command which outputs the names of files (those that contain matches to some pattern), and I want to parse those files with some file-parsing program. The pipeline looks like this:
grep -rl "{some-pattern}" . | {some-file-parsing-program} > a.out
How do I get those file names as command line arguments to the file-parsing program?
For example, let's say grep returns the filenames a, b, c. How do I pass the filenames so that it's as if I'm executing
{some-file-parsing-program} a b c > a.out
?

It looks to me as though you want xargs:
grep -rl "{some_pattern" . | xargs your-command > a.out
I'm not convinced a.out is a good output file name, but we can let that slide. The xargs command reads white-space separated file names from standard input and then invokes your-command with those names as arguments. It may need to invoke your-command several times; unless you're using GNU xargs and you specify -r, your-command will be invoked at least once, even if there are no matching file names.
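A slightly safer variant (a sketch assuming GNU grep and xargs; your-command still stands in for the real program) uses NUL-delimited names, so file names containing spaces or newlines survive intact, and -r skips running the command when there are no matches:
grep -rlZ "{some-pattern}" . | xargs -0 -r your-command > a.out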
Without xargs, sed could not do this job, and awk would be clumsy. Perl (and Python) could manage it 'trivially': it would be easy to write code that reads file names from standard input and then processes each file in turn, as in the shell sketch below.
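For comparison, here is a plain-shell sketch of that read-and-process loop (your-command is a placeholder; unlike the NUL-delimited variant above, it still assumes file names contain no newlines):
grep -rl "{some-pattern}" . |
while IFS= read -r file; do
    your-command "$file"    # one invocation per file name read from stdin
done > a.out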

I don't know of any Linux programs that cannot read from stdin. Depending on the program, the default input may be stdin, or you may need to ask for stdin explicitly with a command line option (often - by itself). Do you have anything particular in mind?
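For example, cat and diff both accept - as a file name meaning standard input (saved_listing is a hypothetical file used only for illustration):
echo hello | cat -            # "-" tells cat to read standard input
ls | diff - saved_listing     # compare the current listing against a saved one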

Related

How to use xargs with find?

I have a large number of files on disk and am trying to use xargs with find to get the output faster.
find . -printf '%m %p\n'|sort -nr
If I write find . -printf '%m %p\n' | xargs -0 -P 0 sort -nr, it gives the error "argument line is too long". Removing the -0 option gives a different error.
Parallelizing tools such as xargs or GNU parallel
are applicable only if the task can be divided into multiple independent
jobs, e.g. processing multiple files at once with the same command.
A single sort cannot be split across them, because sorting is one job
over the whole input.
Although sort has a --parallel option, it may not work well for
piped input. (Not fully evaluated.)
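For what it's worth, a sketch of what trying that option might look like (GNU sort assumed; as noted above, any speedup on piped input is unverified):
find . -printf '%m %p\n' | sort -nr --parallel=4 -S 512M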
As side notes:
The mechanism of xargs is that it reads items (filenames, in most cases) from
the standard input and generates individual commands by combining
the given command with batches of those items. You can then
see why the syntax .. | xargs .. sort is incorrect: each filename
is passed to sort as an argument, so sort tries to sort the contents
of those files rather than the list of names.
The -0 option to xargs tells xargs that input items are delimited
by a null character instead of a newline. It is useful when the input
filenames contain special characters, including newline characters.
To use this feature, you need to handle the piped stream consistently
in that way: add the -print0 option to find and the -z option
to sort. Otherwise the items are wrongly concatenated and cause the
"argument line is too long" error.
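Putting those notes together, a consistently NUL-delimited version of the pipeline might look like this (a sketch assuming GNU find and sort; tr converts the delimiters back to newlines for display):
find . -printf '%m %p\0' | sort -zrn | tr '\0' '\n'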
I suggest using the locate command instead of the find command.
You might want to update the file database with the updatedb command first.
Read more about the locate command here.
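A minimal sketch of that approach (mlocate assumed; the file name is just an example taken from this page):
sudo updatedb            # rebuild the file-name database
locate -b 'mozilla.pdf'  # look up a file by base name from the database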

Why can't I pass a file path argument to the shell command 'more' in pipeline mode?

I have a text file a.txt
hello world
I use following commands:
cmd1:
$ more a.txt
output:
hello world
cmd2:
$ echo 'a.txt'|more
output:
a.txt
I thought cmd2 should be equivalent to echo 'a.txt' | xargs -i more {}, but it's not.
I want to know why cmd2 works like that, and how to write code that works differently in pipeline mode.
Redirection with | or < controls what the stdin stream contains; it has no impact on a program's command line argument list.
Thus, more <a.txt (efficiently) or cat a.txt | more (inefficiently) both attach a file handle, from which the contents of a.txt can be read, to the stdin of the process that becomes more. Similarly, echo a.txt | more makes the literal text a.txt the content that more reads from its stdin stream; stdin is the default place more is documented to get its input from when it isn't given any filename(s) on its command line.
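To make the distinction concrete, here is a tiny hypothetical script, args-vs-stdin.sh, that prints its argument list and its stdin separately:
#!/bin/sh
# args-vs-stdin.sh (demo): show argv and stdin side by side
echo "arguments: $*"
echo "stdin:     $(cat)"
Running echo a.txt | sh args-vs-stdin.sh foo prints "arguments: foo" and "stdin:     a.txt", showing that the pipe feeds stdin, not the argument list.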
Generally, if you have a list of filenames and want to convert them to command-line arguments, this is what xargs is for (though using it without a great deal of care can introduce bugs, potentially-security-impacting ones).
Consider the following, which (using NUL rather than newline delimiters to separate filenames) is a safe use of xargs to take a list of filenames being piped into it, and transform that into an argument list to cat, used to concatenate all those files together and generate a single stream of input to more:
printf '%s\0' a.txt b.txt |
xargs -0 cat -- |
more

How to take advantage of filters

I've read here that
To make a pipe, put a vertical bar (|) on the command line between two commands.
then
When a program takes its input from another program, performs some operation on that input, and writes the result to the standard output, it is referred to as a filter.
So I've first tried the ls command whose output is:
Desktop HelloWord.java Templates glassfish-4.0
Documents Music Videos hs_err_pid26742.log
Downloads NetBeansProjects apache-tomcat-8.0.3 mozilla.pdf
HelloWord Pictures examples.desktop netbeans-8.0
Then I tried ls | echo, which outputs absolutely nothing.
I'm looking for a way to take advantage of pipelines and filters in my bash script. Please help.
echo doesn't read from standard input; it only writes its command-line arguments to standard output. The cat command is what you want: it copies what it reads from standard input to standard output.
ls | cat
(Note that the pipeline above is a little pointless, but does demonstrate the idea of a pipe. The command on the right-hand side must read from standard input.)
Don't confuse command-line arguments with standard input.
echo doesn't read standard input. To try something more useful, try
ls | sort -r
to get the output sorted in reverse,
or
ls | grep '[0-9]'
to only keep the lines containing digits.
In addition to what others have said: if your command (echo in this example) does not read from standard input, you can use xargs to "feed" that command arguments taken from standard input, so
ls | echo
doesn't work, but
ls | xargs echo
works fine.

grep based on blacklist -- without procedural code?

It's a well-known task, simple to describe:
Given a text file foo.txt, and a blacklist file of exclusion strings, one per line, produce foo_filtered.txt that has only the lines of foo.txt that do not contain any exclusion string.
A common application is filtering compiler warnings from a build log while ignoring warnings in files that are not yours. Here foo.txt is the warnings file (itself filtered from the build log), and the blacklist file excluded_filenames.txt contains file names, one per line.
I know how it's done in procedural languages like Perl or AWK, and I've even done it with combinations of Linux commands such as cut, comm, and sort.
But I feel that I should be really close with xargs, and just can't see the last step.
I know that if excluded_filenames.txt has only 1 file name in it, then
grep -v `cat excluded_filenames.txt` foo.txt
will do it.
And I know that I can get the filenames one per line with
xargs -L1 -a excluded_filenames.txt
So how do I combine those two into a single solution, without explicit loops in a procedural language?
Looking for the simple and elegant solution.
You should use the -f option, which reads the patterns from a file:
grep -vf excluded_filenames.txt foo.txt
You could also add -F (fixed strings, which is what fgrep provides), which is more directly the answer to what you asked:
grep -vF "`cat excluded_filenames.txt`" foo.txt
from man grep
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
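As a concrete illustration (the blacklist contents below are invented for the example): since file names usually contain dots, which are regex metacharacters, combining -F with -f so each blacklist line is matched as a literal string is usually what you want:
$ cat excluded_filenames.txt
vendor/generated.c
third_party/zlib.c
$ grep -vFf excluded_filenames.txt foo.txt > foo_filtered.txt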

How do I list one filename per output line in Linux?

I'm using the ls -a command to get the file names in a directory, but the output comes out on a single line.
Like this:
. .. .bash_history .ssh updater_error_log.txt
I need a built-in alternative to get filenames, each on a new line, like this:
.
..
.bash_history
.ssh
updater_error_log.txt
Use the -1 option (note this is a "one" digit, not a lowercase letter "L"), like this:
ls -1a
First, though, make sure your ls supports -1. GNU coreutils (installed on standard Linux systems) and Solaris do; but if in doubt, use man ls or ls --help or check the documentation. E.g.:
$ man ls
...
-1 list one file per line. Avoid '\n' with -q or -b
Yes, you can easily make ls output one filename per line:
ls -a | cat
Explanation: The ls command detects whether its output is going to a terminal or to a file or pipe, and adjusts its format accordingly.
So if you pipe ls -a into Python, it should work without any special measures.
ls is designed for human consumption, and you should not parse its output.
In shell scripts, there are a few cases where parsing the output of ls does work and is the simplest way of achieving the desired effect. Since ls might mangle non-ASCII and control characters in file names, these cases are a subset of those that do not require obtaining a file name from ls.
In Python, there is absolutely no reason to invoke ls. Python has all of ls's functionality built in. Use os.listdir to list the contents of a directory and os.stat (or os.lstat) to obtain file metadata. Other functions in the os module are likely to be relevant to your problem as well.
If you're accessing remote files over ssh, a reasonably robust way of listing file names is through sftp:
echo ls -1 | sftp remote-site:dir
This prints one file name per line, and unlike the ls utility, sftp does not mangle nonprintable characters. You will still not be able to reliably list directories where a file name contains a newline, but that's rarely done (remember this as a potential security issue, not a usability issue).
In Python (beware that shell metacharacters must be escaped in remote_dir):
import os
command_line = "echo ls -1 | sftp " + remote_site + ":" + remote_dir
# read() grabs the whole output; split("\n") leaves a trailing empty string to discard
remote_files = os.popen(command_line).read().split("\n")
For more complex interactions, look up sftp's batch mode in the documentation.
On some systems (Linux, Mac OS X, perhaps some other unices, but definitely not Windows), a different approach is to mount a remote filesystem through ssh with sshfs, and then work locally.
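A minimal sketch of that approach, assuming sshfs is installed and remote-site is reachable:
mkdir -p ~/mnt/remote
sshfs remote-site:dir ~/mnt/remote    # mount the remote directory over SSH
ls -1a ~/mnt/remote                   # ordinary local tools now apply
fusermount -u ~/mnt/remote            # unmount when finished (Linux)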
You can use ls -1.
ls -l will also do the job, since it too prints one entry per line (along with extra details).
You can also use ls -w1.
This sets the output width to 1, which forces one entry per line.
From manpage of ls:
-w, --width=COLS
set output width to COLS. 0 means no limit
ls | tr "" "\n"
Easy, as long as your filenames don't include newlines:
find . -maxdepth 1
If you're piping this into another command, you should probably prefer to separate your filenames by null bytes, rather than newlines, since null bytes cannot occur in a filename (but newlines may):
find . -maxdepth 1 -print0
Printing that on a terminal will probably display as one line, because null bytes are not normally printed. Some programs may need a specific option to handle null-delimited input, such as sort's -z. Your own script similarly would need to account for this.
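For instance, a NUL-safe pipeline might look like this (a sketch assuming GNU tools; ls -ld -- merely stands in for whatever per-file command you run):
find . -maxdepth 1 -print0 | sort -z | xargs -0 -r ls -ld --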
The -1 switch is the obvious way of doing it, but just to mention another option: use echo with a command substitution inside double quotes, which preserves the whitespace (here, the newlines):
echo "$(ls)"
How the ls command behaves is also documented here:
If standard output is a terminal, the output is in columns (sorted
vertically) and control characters are output as question marks;
otherwise, the output is listed one per line and control characters
are output as-is.
Now you see why redirecting or piping makes the output come out one entry per line.
