show a specific line from a command output Linux script shell - linux

I want to find the file with the largest number of words in a directory, so I tried to take just the second line of the output of this command:
wc * -w | sort -nr
I know it works if I save the output to a file and then use sed like this:
wc * -w | sort -nr >> file
sed -n "2p" file
but that is not what I want; I need to do it directly on the output of the command.
I tried a shell script like this:
for i in `wc * -w | sort -nr`
do
if test $i -eq 2 then
echo "$i"
fi
done
but it was not what I expected
Thank you in advance.

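find . -type f -exec wc -w {} + | sort -nr | awk 'NR==2'
Use find instead of a wildcard, because the wildcard also matches directories.
Pipe the sorted output to awk to print only the second line (the first line is the total that wc prints when it is given more than one file).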
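To read one line straight from a pipeline, without the temporary file, sed can take the pipe as its input. A minimal sketch, assuming a hypothetical helper name nth_line and two throwaway files created in a scratch directory:

```shell
#!/bin/sh
# nth_line N: print only line N of standard input (hypothetical helper name).
nth_line() {
    sed -n "${1}p"
}

# Build two sample files in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'one two three\n' > "$demo_dir/a.txt"
printf 'one\n' > "$demo_dir/b.txt"

# With several files wc adds a "total" line, which sorts first here,
# so line 2 is the file with the most words (a.txt in this demo).
(cd "$demo_dir" && wc -w -- * | sort -nr | nth_line 2)
```

The same helper works after any command, for example some_command | nth_line 5.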

Related

xargs print output from wc -l

I would like to use xargs to count the number of blocks of 4 lines in a list of compressed files, and do the counting in parallel using 8 CPUs, like this:
find $PWD/ -name "*.ext.gz" | xargs -t -n1 -P8 -I % gunzip -c % | paste - - - - | wc -l
Currently, this one-liner does the calculation but I cannot see the output count except for the last one.
What do I need to add to be able to see the number coming from wc -l associated to the input file?
Any ideas?
If I understand your question right, you have a wrong assumption. It would appear that you expect that
gunzip -c <filename> | paste - - - - | wc -l
will be run for each file that find reports. This is incorrect. What is actually happening is that
gunzip -c <filename>
is being run for each file, the outputs of each uncompressed file are all being combined into one large body, and paste - - - - | wc -l is being run on that combined result.
A better approach would be to write a short shell script, say count_groups.sh that looks something like this:
#!/bin/bash
nlines=$(gunzip -c "$1" | wc -l)
(( ngroups = nlines / 4 ))
echo "$1 : $ngroups"
Then, run chmod +x count_groups.sh, and run
find "$PWD"/ -name "*.ext.gz" | xargs -t -P8 -I% ./count_groups.sh %
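The per-file counting can also be sketched without a separate script file by handing xargs a small inline sh -c script. This is only an illustration: the function name count_all and the sample archives are assumptions, and -print0, -0 and -P are extensions found in GNU and BSD find and xargs rather than POSIX.

```shell
#!/bin/sh
# count_all DIR: for each *.ext.gz under DIR print "file : number of 4-line groups".
# Up to 8 files are decompressed in parallel; each file gets its own wc -l.
count_all() {
    find "$1" -name '*.ext.gz' -print0 |
        xargs -0 -n1 -P8 sh -c '
            nlines=$(gunzip -c "$1" | wc -l)
            echo "$1 : $((nlines / 4))"
        ' count_groups |
        sort
}

# Sample data: one archive with 8 lines, one with 4.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf '1\n2\n3\n4\n5\n6\n7\n8\n' | gzip > "$demo_dir/a.ext.gz"
printf '1\n2\n3\n4\n' | gzip > "$demo_dir/b.ext.gz"

count_all "$demo_dir"
```

The trailing sort is there only because parallel jobs finish in no particular order.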

Trying to delete lines beginning with a specific string from files where the file meets a target condition, in bash/linux

I am writing a bash script that will run a couple of times a minute. What I would like it to do is find all files in a specified directory that contain a specified string, and search that list of files and delete any line beginning with a different specific string (in this case it's "<meta").
Here's what I've tried so far, but neither attempt works:
ls -1t /the/directory | head -10 | grep -l "qualifying string" * | sed -i '/^<meta/d' *'
ls -1t /the/directory | head -10 | grep -l "qualifying string" * | sed -i '/^<meta/d' /the/directory'
The only reason I added in the head -10 is so that every time the script runs, it will start by only looking at the 10 most recent files. I don't want it to spend a lot of time searching needlessly through the entire directory since it will be going through and removing the line many times a minute.
The script has to be run out of a different directory than the files are in. Also, would the modified date on the files change if the "<meta" string doesn't exist in the file?
There are a variety of problems with this part of the command...
ls -1t /the/directory | head -10 | grep -l "qualifying string" * ...
First, you appear to be trying to pipe the output of ls ... | head -10 into grep, which would cause grep to search for "qualifying string" in the output of ls. Except then you turn around and provide * as a command line argument to grep, causing it to search in all the files, and completely ignoring the ls and head commands.
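You probably want to read about the xargs command, which reads a list of files on stdin and then runs a given command against that list. For example, you ought to be able to generate your file list like this (ls prints bare file names, so run it from inside /the/directory or prefix the directory to each name):
ls -1t /the/directory | head -10 |
xargs grep -l "qualifying string"
And to apply sed to those files, pass the list through xargs once more, since sed -i takes file names as arguments rather than on stdin:
ls -1t /the/directory | head -10 |
xargs grep -l "qualifying string" |
xargs sed -i '/^<meta/d'
Modifying the files with sed -i updates their modification time, even when no line matches, because sed rewrites the file either way.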
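You can use globbing with the * character to expand file names and loop through the directory. Note that the glob expands in alphabetical order, so this processes the first 10 regular files by name rather than the 10 most recent.
n=0
for file in /the/directory/*; do
if [ -f "$file" ]; then
grep -q "qualifying string" "$file" && sed -i '/^<meta/d' "$file"
n=$((n+1))
fi
[ "$n" -eq 10 ] && break
done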
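If the modification time matters, one option is to run sed only on files that contain both the qualifying string and a line starting with <meta, so untouched files keep their timestamps. A rough sketch, assuming GNU grep, xargs and sed (for -Z, -r and -i), a hypothetical function name strip_meta, and, for brevity, scanning every file in the directory rather than only the 10 most recent:

```shell
#!/bin/sh
# strip_meta DIR: delete lines starting with "<meta" from files in DIR that
# contain "qualifying string". Files without such a line are never rewritten.
strip_meta() {
    # First grep: files containing the qualifying string (NUL-separated names).
    # Second grep: of those, only files that really have a "<meta" line.
    grep -lZ -- "qualifying string" "$1"/* 2>/dev/null |
        xargs -0 -r grep -lZ -- '^<meta' |
        xargs -0 -r sed -i '/^<meta/d'
}

# Hypothetical usage:
# strip_meta /the/directory
```

Because the directory is passed as an argument, the function can be called from any working directory.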

Output filename/lines/type for given directory

I'm trying to teach myself basic file manipulation and scripting in linux but I've hit a wall. Right now I'm trying to output a table that gives something like
FILENAME LINES TYPE
File1 22 File
File2 56 File
Folder1 N/A Directory
when given any directory to search. I've been researching how to format output using awk and using maybe grep and wc to try and get my data but I'm a bit lost. For all I know I'm barking up the wrong tree entirely.
Look at printf to format your output, then look at the file command to find the file type, wc to count the lines, and so on.
All this could be done via a find | while read loop:
printf "%-20.20s %-5.5s %s\n" "File" "Lines" "Type"
find . -type f -print0 | while IFS= read -r -d '' file
do
file_name=$(basename "$file")
lines=$(wc -l < "$file" | sed 's/^ *//')
desc=$(file --brief "$file")
printf "%-20.20s %5.5s %s\n" "$file_name" "$lines" "$desc"
done
The $(...) syntax returns the output of the enclosed command as a string that can be assigned to a variable. Feeding the file to wc on standard input (wc -l < "$file") makes it print only the count without the file name, and sed then removes the leading spaces that some wc implementations add.
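The loop above only visits regular files, while the table in the question also lists directories. A small sketch of that variant, assuming a hypothetical function name list_dir and that only the top level of the directory is wanted:

```shell
#!/bin/sh
# list_dir DIR: print a FILENAME / LINES / TYPE table for the entries in DIR.
list_dir() {
    printf '%-20s %-5s %s\n' "FILENAME" "LINES" "TYPE"
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue   # empty directory: the pattern stays literal
        name=$(basename "$entry")
        if [ -d "$entry" ]; then
            printf '%-20s %-5s %s\n' "$name" "N/A" "Directory"
        else
            # Arithmetic expansion strips any padding wc adds around the count.
            lines=$(( $(wc -l < "$entry") ))
            printf '%-20s %-5s %s\n' "$name" "$lines" "File"
        fi
    done
}

# Hypothetical usage:
# list_dir /some/directory
```

Replacing the literal "File" with the output of file --brief would give the more detailed type from the answer above.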

xargs with multiple arguments

I have a source input, input.txt
a.txt
b.txt
c.txt
I want to feed these input into a program as the following:
my-program --file=a.txt --file=b.txt --file=c.txt
So I try to use xargs, but with no luck.
cat input.txt | xargs -i echo "my-program --file="{}
It gives
my-program --file=a.txt
my-program --file=b.txt
my-program --file=c.txt
But I want
my-program --file=a.txt --file=b.txt --file=c.txt
Any idea?
Don't listen to all of them. :) Just look at this example:
echo argument1 argument2 argument3 | xargs -l bash -c 'echo this is first:$0 second:$1 third:$2'
Output will be:
this is first:argument1 second:argument2 third:argument3
None of the solutions given so far deals correctly with file names containing space. Some even fail if the file names contain ' or ". If your input files are generated by users, you should be prepared for surprising file names.
GNU Parallel deals nicely with these file names and gives you (at least) 3 different solutions. If your program takes 3 and only 3 arguments then this will work:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -N 3 my-program --file={1} --file={2} --file={3}
Or:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -X -N 3 my-program --file={}
If, however, your program takes as many arguments as will fit on the command line:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo d1.txt; echo e1.txt; echo f1.txt;) |
parallel -X my-program --file={}
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
How about:
echo $'a.txt\nb.txt\nc.txt' | xargs -n 3 sh -c '
echo my-program --file="$1" --file="$2" --file="$3"
' argv0
It's simpler if you use two xargs invocations: 1st to transform each line into --file=..., 2nd to actually do the xargs thing ->
$ cat input.txt | xargs -I# echo --file=# | xargs echo my-program
my-program --file=a.txt --file=b.txt --file=c.txt
You can use sed to prefix --file= to each line and then call xargs:
sed -e 's/^/--file=/' input.txt | xargs my-program
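Because xargs splits its input on blanks as well as newlines, the pipeline above turns a name such as my file.txt into two arguments. A sketch that avoids this in plain shell by appending one --file= argument per input line to the positional parameters; the function name run_with_file_args is an assumption made for illustration:

```shell
#!/bin/sh
# run_with_file_args LIST CMD [ARGS...]: run CMD with one --file=<line>
# argument appended for every line of LIST; spaces inside a line are kept.
run_with_file_args() {
    list=$1
    shift
    while IFS= read -r line; do
        set -- "$@" "--file=$line"
    done < "$list"
    "$@"
}

# Hypothetical usage:
# run_with_file_args input.txt my-program
```

The loop reads from a redirection rather than a pipe, so the set -- calls affect the function's own argument list and not a subshell.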
Here is a solution using sed for three arguments, but is limited in that it applies the same transform to each argument:
cat input.txt | sed 's/^/--file=/g' | xargs -n3 my-program
Here's a method that will work for two args, but allows more flexibility:
cat input.txt | xargs -n 2 | xargs -I{} sh -c 'V="{}"; my-program --file=${V% *} --file=${V#* }'
I stumbled on a similar problem and found a solution which I think is nicer and cleaner than those presented so far.
The syntax for xargs that I have ended with would be (for your example):
xargs -I X echo --file=X
with a full command line being:
my-program $(cat input.txt | xargs -I X echo --file=X)
which will work as if
my-program --file=a.txt --file=b.txt --file=c.txt
was done (providing input.txt contains data from your example).
Actually, in my case I needed to first find the files and also needed them sorted so my command line looks like this:
my-program $(find base/path -name "some*pattern" -print0 | sort -z | xargs -0 -I X echo --files=X)
A few details that might not be clear (they were not to me):
some*pattern must be quoted, since otherwise the shell would expand it before passing it to find.
-print0, then -z and finally -0 use null separation to ensure proper handling of files with spaces or other weird names.
Note, however, that I haven't tested it thoroughly yet, though it seems to work.
xargs doesn't work that way. Try:
myprogram $(sed -e 's/^/--file=/' input.txt)
It's because echo prints a newline. Try something like
echo my-program `xargs --arg-file input.txt -i echo -n " --file "{}`
I was looking for a solution to this exact problem and came to the conclusion that I should code a script in the middle.
To pass the whole input line to the script as a single argument, use the -d '\n' delimiter.
Example:
user#mybox:~$ echo "file1.txt file2.txt" | xargs -d '\n' -n1 ScriptInTheMiddle.sh
Inside ScriptInTheMiddle.sh:
#!/bin/bash
var1=`echo "$1" | cut -d ' ' -f1`
var2=`echo "$1" | cut -d ' ' -f2`
myprogram "--file1=$var1" "--file2=$var2"
For this solution to work you need a space between the arguments file1.txt and file2.txt (or whatever delimiter you choose). Inside the script, check -f1 and -f2: they mean "take the first word" and "take the second word", split on the delimiter given to cut -d (it can be ' ', ';', '.', or whatever you wish between single quotes).
Add as many parameters as you wish.
Problem solved using xargs, cut, and some bash scripting.
Cheers!
If you want to stop by, I have some useful tips: http://hongouru.blogspot.com
Actually, it's relatively easy:
... | sed 's/^/--prefix=/g' | xargs echo | xargs -I PARAMS your_cmd PARAMS
The sed 's/^/--prefix=/g' is optional, in case you need to prefix each param with some --prefix=.
The xargs echo turns the list of param lines (one param per line) into a list of params on a single line, and the xargs -I PARAMS your_cmd PARAMS allows you to run a command, placing the params wherever you want.
So cat input.txt | sed 's/^/--file=/g' | xargs echo | xargs -I PARAMS my-program PARAMS runs my-program with the joined params (assuming all lines within input.txt are simple and qualify as a single param value each). Be aware that xargs -I substitutes the whole joined line as one argument; if my-program needs separate arguments, drop the final stage and end the pipeline with xargs my-program instead.
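One detail worth checking before relying on this: xargs -I replaces the placeholder with the entire input line as a single argument. The sketch below counts the arguments the final command receives, using sh -c 'echo "$#"' as a stand-in for the real program; the function names are hypothetical.

```shell
#!/bin/sh
# Number of arguments seen by the command when the joined line goes through -I.
args_with_I() {
    printf '%s\n' --file=a.txt --file=b.txt |
        xargs echo |
        xargs -I PARAMS sh -c 'echo "$#"' _ PARAMS
}

# Number of arguments when xargs appends the items itself.
args_plain() {
    printf '%s\n' --file=a.txt --file=b.txt |
        xargs sh -c 'echo "$#"' _
}

args_with_I   # prints 1: the whole line arrived as one argument
args_plain    # prints 2: one argument per input item
```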
There is another nice way of doing this, if you do not know the number of files upfront:
my-program $(find . -name '*.txt' -printf "--file=%p ")
Nobody has mentioned echoing out from a loop yet, so I'll put that in for completeness' sake (it would be my second approach, the sed one being the first):
for line in $(< input.txt) ; do echo --file=$line ; done | xargs echo my-program
Old but this is a better answer:
cat input.txt | gsed "s/\(.*\)/\-\-file=\1/g" | tr '\n' ' ' | xargs my_program
# i like clean one liners
gsed is just GNU sed, used to make sure the syntax matches (brew install gsed); use plain sed if you're on GNU/Linux already.
test it:
cat input.txt | gsed "s/\(.*\)/\-\-file=\1/g" | tr '\n' ' ' | xargs echo my_program

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d ' ' -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it is something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script....
However no one seems to have found a solution, so basically I'm asking you guys if you have ever come across this, and any idea/tips/solutions to this problem...
Regards
Paul
The `problem' is the standard I/O library. When it is writing to a terminal
it is line-buffered, but if it is writing to a pipe then it sets up full buffering.
Try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
Are there any directories with spaces in their names in $FILE_LOCTN_LIST? If there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
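Independently of buffering, unquoted expansions such as grep $CURRENT_LINE $FILE_LOCTN_LIST break as soon as a line contains a space or a regex metacharacter. A hedged sketch of the same loop with quoting and fixed-string matching; the function name count_matches and its two parameters are assumptions, not part of the original script:

```shell
#!/bin/sh
# count_matches PATTERNS FILE: for every non-empty line of PATTERNS, report
# how many lines of FILE contain it as a literal string.
count_matches() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        # -F: treat the line as fixed text; --: a leading "-" is not an option.
        n=$(grep -c -F -- "$line" "$2")
        printf '%s matched %s times\n' "$line" "$n"
    done < "$1"
}

# Hypothetical usage:
# count_matches "$UNIQ_LIST_FINAL" "$FILE_LOCTN_LIST"
```

Reading the list with while read also replaces the repeated sed "N"'q;d' calls, so each line is fetched once.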
A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT=""
md5sum * |
sort |
while read -r md5sum filename;
do
[[ $CURRENT == "$md5sum" ]] && echo "$filename is duplicate";
CURRENT=$md5sum;
done
You tagged this linux, so I assume you have tools like GNU find, md5sum, uniq, sort, etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; |sort -n | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1

Resources