Using Xargs max-procs with multiple arguments from a file - linux

I have a script that is getting me the results I want. I want to improve the performance of the script.
My script takes its arguments from the file file1.txt. The contents are below:
table1
table2
table3
and so on
Now when I use a while loop like the one below, the script runs sequentially:
while IFS=',' read -r a; do import.sh "$a"; done < file1.txt
Now when I use xargs with --max-procs, the scripts run in parallel, up to the number of processes given by --max-procs:
xargs --max-procs 10 -n 1 sh import.sh < file1.txt
Now I have another script, which takes its arguments from the file file2.txt. The contents are below:
table1,db1
table2,db2
table3,db3
and so on
When I use the while loop, the script performs fine:
while IFS=',' read -r a b; do test.sh "$a" "$b"; done < file2.txt
But when I use the xargs statement below, the script gives me a usage error.
xargs --max-procs 10 -n 1 sh test.sh < file2.txt
The error statement is below:
Usage : test.sh input_file
Why is this happening?
How can I rectify this?

Your second script, test.sh, expects two arguments, but xargs is feeding it only one (one word; in this case, the complete line). You can fix it by first converting the commas to newlines (with a simple sed script) and then passing two arguments (now two lines) per call to test.sh (with -n2):
sed 's/,/\n/g' file2.txt | xargs --max-procs 10 -n2 sh test.sh
Note that xargs supports a custom delimiter via the -d option; you could use it if each line in file2.txt ended with a , (but then you would probably have to strip the newline prefixed to each first field).
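Another option, if you'd rather keep one line of file2.txt per call, is to let a small sh -c wrapper split each line on the comma itself. This is only a sketch, assuming GNU xargs and the test.sh from your question:
# each input line becomes $1 inside the wrapper; set -- re-splits it on commas
xargs --max-procs 10 -d '\n' -n 1 sh -c 'IFS=, ; set -f ; set -- $1 ; sh test.sh "$1" "$2"' _ < file2.txt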

Related

How do I execute a line in a file as a command in the terminal?

I wrote down some commands, one per row, in a file, and I want to execute one of them through grep and a pipe;
for example:
1. there is a file a.txt, whose content is like below:
echo "hello world"
ls -l
2. then I want to execute the first line in my terminal, so I want something like this:
cat a.txt | grep echo | execute the output of previous commands
so that I can finally execute the command on the first line of a.txt.
(I could not find any answer to this, so I came here for help.)
You can either pipe the command to bash (or any other shell) to execute it:
sed -n 1p a.txt | bash
or you can use eval with command substitution:
eval $(head -n1 a.txt)
BTW, this also shows two different ways to extract a line from the file.
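If you would rather pick the line by number than by pattern, the same idea is easy to parameterize; a small sketch:
n=2                          # number of the line to execute
sed -n "${n}p" a.txt | bash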

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each line of output from a command as an argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy each of these files, e.g.:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
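If the listed file names may contain quotes or leading blanks, GNU xargs lets you make this a bit more robust with an explicit newline delimiter (names containing newlines still cannot be represented this way); a hedged variant of the same command:
grep 'pattern' input | xargs -d '\n' -I% cp "%" "%.bac"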
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young's good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will do it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
Anyway, for the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
So in other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; but xargs will specifically avoid this by instead copying only as many arguments as it can safely pass to cp at a time, and running cp multiple times if necessary instead.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
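If your grep supports -Z/--null (GNU grep does), a NUL-delimited pipeline sidesteps the newline problem; a sketch combining it with the subshell approach above:
grep -lZ "pattern" input | xargs -0 -r sh -c 'for f; do cp "$f" "$f.bac"; done' _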
#!/bin/bash
for f in *.txt; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"
    fi
done
The glob *.txt can be changed to *.text or to whatever matches the files you want; likewise, replace PATTERN with yours. Remove echo if you're satisfied with the output. For a recursive solution take a look at the bash shell option globstar, sketched below.
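A rough sketch of that globstar hint (bash 4+ assumed; the glob and PATTERN are placeholders as above):
#!/bin/bash
shopt -s globstar nullglob          # ** also matches files in subdirectories
for f in **/*.txt; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"
    fi
done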

pipe then hyphen (stdin) as an alternative to for loop

I wrote a few sed and awk commands to extract a set of IDs that are associated with file names. I would like to run a set of commands using these filenames from id.txt.
cat id.txt
14235.gz
41231.gz
41234.gz
I usually write for loops as follows:
for i in $(cat id.txt);
do
command <options> $i
done
I thought I could also do cat id.txt | command <options> -
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Use a while read loop; see Don't read lines with for.
while IFS= read -r line_in_text_file; do
echo "$line_in_text_file"
done < id.txt
Commands don't usually get their filename arguments on standard input. Using - as an argument means to read the file contents from standard input instead of a named file; it doesn't mean to get the filename from stdin.
You can use command substitution to use the contents of the file as all the filename arguments to the command:
command <options> $(cat id.txt)
or you can use xargs
xargs command <options> < id.txt
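If the command needs each filename in a particular position, or one invocation per file, xargs -I places it explicitly (command <options> is the same placeholder as above):
xargs -I{} command <options> {} < id.txt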
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Compound commands can be placed in a pipeline; the syntax is not very strict. The usual:
awk 'some awk script' |
while IFS= read -r line; do
echo "$line"
done |
sed 'some sed script'
I avoid reading input line by line using a while read - it's very slow. It's way faster to use awk scripts and other commands.
Command groups can be used too:
awk 'some awk script' |
{ # or '(', but there is no need for a subshell
echo "header1,header2"
# remove first line
IFS= read -r first_line
# ignore last line
sed '$d'
} |
sed 'some sed script'
Remember that pipeline commands are run in subshells, so variable changes will not affect the parent shell.
Bash has a process substitution extension that lets you run the while loop inside the parent shell:
var=1
while IFS= read -r line; do
if [[ "$line" == 2 ]]; then
var=2
fi
done < <(
seq 10 |
sed '$d'
)
echo "$var" # will output 2
xargs can do this
cat id.txt | xargs command
From xargs help
$ xargs --help
Usage: xargs [OPTION]... COMMAND [INITIAL-ARGS]...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.
Mandatory and optional arguments to long options are also
mandatory or optional for the corresponding short option.
-0, --null items are separated by a null, not whitespace;
disables quote and backslash processing and
logical EOF processing
-a, --arg-file=FILE read arguments from FILE, not standard input
-d, --delimiter=CHARACTER items in input stream are separated by CHARACTER,
not by whitespace; disables quote and backslash
...
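The -a option listed above also means the cat can be dropped; for instance (command is still a placeholder):
xargs -a id.txt command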

Pass a list of files to perl script via pipe

I am having a problem where my perl script fails when input is piped to it, but works fine when I just list all the file names individually.
For reference, input of the perl script is read with while(<>).
Example:
script.pl file1.tag file2.tag file3.tag
runs fine.
But the following all fail.
find ./*.tag | chomp | script.pl
ls -l *.tag | perl -pe 's/\n/ /g' | script.pl
find ./*.tag | perl -pe 's/\n/ /g' | script.pl
I also tested dumping it into a text file and catting that into the perl:
cat files.text | script.pl
All of them fail the same way. It is like the script is passed no input arguments and the program just finishes.
From perldoc perlop:
The null filehandle <> is special [...] Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to -, which when opened gives you standard input. The @ARGV array is then processed as a list of filenames.
You're not passing any command line arguments to your Perl scripts, so everything you pipe into them arrives on STDIN instead of being treated as filenames:
$ echo foo > foo.txt
$ echo bar > bar.txt
$ ls | perl -e 'print "<$_>\n" while <>'
<bar.txt
>
<foo.txt
>
Notice that the files foo.txt and bar.txt are not actually read; all we get is the file names. If you want the files to be opened and read, you have to pass them as command line arguments or explicitly set @ARGV:
$ perl -e 'print "<$_>\n" while <>' *
<bar
>
<foo
>
If you have a large number of files, like you're likely to get from find, you should use xargs as Dyno Hongjun Fu suggested.
However, you don't need find, ls, cat, or your Perl one-liner to run your script on all the .tag files in the current directory. Simply do:
script.pl *.tag
you need xargs, e.g.
find ./ -type f -name "*.tag" | xargs -i script.pl {}
what is chomp?
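(chomp is a Perl built-in, not a shell command, which is why that pipeline fails.) If the .tag file names could contain spaces, a NUL-delimited variant of the xargs idea (GNU find and xargs assumed) is a bit safer:
find . -type f -name "*.tag" -print0 | xargs -0 script.pl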

UNIX shell script to run a list of grep commands from a file and get the result in a single delimited file

I am a beginner in unix programming, looking for a way to automate my work.
I want to run a list of grep commands and get the output of all the grep commands in a single delimited file.
I am using the following bash script, but it's not working.
Mockup sh file:
#!/bin/sh
grep -l abcd123
grep -l abcd124
grep -l abcd125
and while running I used the following command:
$ ./Mockup.sh > output.txt
Is it the right command?
How can I get both the grep command and output in the output file?
How can I delimit the output after each command and its result?
How can I get both the grep command and output in the output file
You can use bash -v (verbose) to print each command to stderr before it is executed; the output will, as usual, be available on stdout:
bash -v ./Mockup.sh > output.txt 2>&1
cat output.txt
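A closely related option is bash -x, which traces each command after expansion, so you also see the exact arguments each grep received; for example:
bash -x ./Mockup.sh > output.txt 2>&1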
A suitable shell script could be
#!/bin/sh
grep -l 'abcd123\|abcd124\|abcd125' "$@"
provided that the filenames you pass on the invocation of the script are "well behaved", that is, no whitespace in them. (Edit: using the "$@" expansion takes care of generic whitespace in the filenames, thanks to triplee for the comment.)
This kind of invocation (with alternative matching strings, as per the \| syntax) has the added advantage that each filename occurs exactly once in your final list, because grep -l prints the filename only once, as soon as it finds the first occurrence of any of the three strings in a file.
Addendum about "$@"
% ff () { for i in "$@" ; do printf "[%s]\n" "$i" ; done ; }
% # NB "a s d" below is indeed "a SPACE s TAB d"
% ff "a s d" " ert " '345
345'
[a s d]
[ ert ]
[345
345]
%
cat myscript.sh
########################
#!/bin/bash
echo "Trying to find the file contenting the below string, relace your string with below string"
grep "string" /path/to/folder/* -R -l
########################
Save the above file and run it as below:
sh myscript.sh > output.txt
Once the command prompt returns, you can check output.txt for the required output.
Another approach, less efficient, that tries to address the OP question
How can I get both the grep command and output in the output file?
% cat Mockup
#!/bin/sh
grep -o -e string1 -e string2 -e string3 "$@" 2> /dev/null | sort -t: -k2 | uniq
Output: (mocked up as well)
% sh Mockup file{01..99}
file01:string1
file17:string1
file44:string1
file33:string2
file44:string2
file48:string2
%
Looking at the output from the POV of a consumer, one foresees problems with search strings and/or file names containing colons... oh well, that's another question, maybe.
