File name printed twice when using wc - linux

To print the number of lines in every ".txt" file in the current folder, I am using the following script:
for f in *.txt;
do l="$(wc -l "$f")";
echo "$f" has "$l" lines;
done
But the output I get is:
lol.txt has 2 lol.txt lines
Why is lol.txt printed twice (especially after the 2)? I guess some sort of stream flush is required, but I don't know how to achieve that in this case. So what changes should I make to the script to get this output:
lol.txt has 2 lines

You can remove the filename with 'cut':
for f in *.txt;
do l="$(wc -l "$f" | cut -f1 -d' ')";
echo "$f" has "$l" lines;
done

The filename is printed twice because wc -l "$f" itself prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.

wc prints the filename, so you could just write the script as:
ls *.txt | while read f; do wc -l "$f"; done
or, if you really want the verbose output, try
ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done

There is a trick here. Get wc to read stdin and it won't print a file name:
for f in *.txt; do
l=$(wc -l < "$f")
echo "$f" has "$l" lines
done
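A minimal sketch of the stdin trick, run in a throwaway directory with a made-up two-line lol.txt. Some wc implementations (for example on BSD/macOS) pad the count with leading spaces; the arithmetic expansion $((l)) below strips that padding, which is an addition not present in the answer above.

```shell
#!/bin/bash
# Demo setup (assumption): a temporary directory holding one two-line file.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'one\ntwo\n' > lol.txt

for f in *.txt; do
  named=$(wc -l "$f")   # count followed by the file name, e.g. "2 lol.txt"
  l=$(wc -l < "$f")     # wc reads stdin, so it has no file name to print
  l=$((l))              # drop any leading padding some wc versions add
  msg="$f has $l lines"
  echo "with a file argument: $named"
  echo "$msg"
done
```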

Related

wc with find: error if there is a space in the folder name

I need to calculate a folder's size in bytes.
If the folder name contains a space (/folder/with spaces/), the following command does not work properly:
wc -c `find /folder -type f` | grep total | awk '{print $1}'
It fails with these errors:
wc: /folder/with: No such file or directory
wc: spaces/file2: No such file or directory
How can this be done?
Try this line instead; it also adds the per-file counts up into a total:
find /folder -type f | xargs -I{} wc -c "{}" | awk '{s += $1} END {print s}'
You need the names individually quoted.
$: while IFS= read -r n;          # assign the whole line read to $n
do a+=("$n");                     # add quoted "$n" to the array
done < <( find /folder -type f )  # read find's output as a stream
$: wc -c "${a[@]}" |              # pass wc the quoted names
sed -n '${ s/ .*//; p; }'         # ignore all but the total, scrub and print
Compressed to a couple of short lines:
$: while IFS= read -r n; do a+=( "$n" ); done < <( find /folder -type f )
$: wc -c "${a[@]}" | sed -n '${ s/ .*//; p; }'
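For a recursive count that never has to parse wc's "total" line, one option (not taken from the answers here) is to read NUL-delimited names into an array with mapfile and add up the per-file counts. This sketch assumes bash 4.4 or newer (for mapfile -d ''); the demo tree is made up.

```shell
#!/bin/bash
# Hypothetical demo tree with a space in one directory name.
root=$(mktemp -d)
mkdir -p "$root/with spaces"
printf 'abc' > "$root/with spaces/file2"   # 3 bytes
printf 'hello' > "$root/file1"             # 5 bytes

# NUL-delimited names survive spaces (and even newlines) intact.
mapfile -t -d '' files < <(find "$root" -type f -print0)

total=0
for f in "${files[@]}"; do
  # Reading from stdin makes wc print only the number.
  total=$(( total + $(wc -c < "$f") ))
done
echo "$total"
```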
This is because bash (unlike zsh) word-splits the result of the command substitution. You could use an array to collect the file names:
files=()
for entry in *
do
[[ -f $entry ]] && files+=("$entry")
done
wc -c "${files[@]}" | grep .....

Rename files produced by split

I split a huge file, and the output is several files whose names start with the character x.
I want to rename them into a list sorted by name, like below:
part-1.gz
part-2.gz
part-3.gz ...
I tried the following commands:
for (( i = 1; i <= 3; i++ )) ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv $f part-$i.gz ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do for i in 1 .. 3 ; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for i in 1 .. 3 ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${f%}.gz" ;done
Tip: don't do ls -l if you only need the file names. Even better, don't use ls at all, just use the shell's globbing ability. x* expands to all file names starting with x.
Here's a way to do it:
i=1; for f in x*; do mv $f $(printf 'part-%d.gz' $i); ((i++)); done
This initializes i to 1, and then loops over all file names starting with x in alphabetical order, assigning each file name in turn to the variable f. Inside the loop, it renames $f to $(printf 'part-%d.gz' $i), where the printf command replaces %d with the current value of i. You might want something like %02d if you need to prefix the number with zeros. Finally, still inside the loop, it increments i so that the next file receives the next number.
Note that none of this is safe if the input file names contain spaces, but yours don't.
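If the chunk names might contain spaces, quoting "$f" and the new name is enough to make the same loop safe. A sketch in a throwaway directory, using split -l 1 on a made-up three-line file to produce xaa, xab and xac; note that the .gz suffix is only a name here, nothing gets compressed.

```shell
#!/bin/bash
# Throwaway directory and sample input (assumptions for the demo).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\nc\n' > huge.txt
split -l 1 huge.txt          # creates xaa xab xac

i=1
for f in x*; do              # the glob expands in sorted order: xaa, xab, xac
  # Quoting keeps any name containing spaces as a single argument.
  mv -- "$f" "$(printf 'part-%d.gz' "$i")"
  i=$((i + 1))
done
```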

Create directories with a for loop in bash

I have these files. Imagine that each "test" represents the name of one server:
test10.txt
test11.txt
test12.txt
test13.txt
test14.txt
test15.txt
test16.txt
test17.txt
test18.txt
test19.txt
test1.txt
test20.txt
test21.txt
test22.txt
test23.txt
test24.txt
test25.txt
test26.txt
test27.txt
test28.txt
test29.txt
test2.txt
test30.txt
test31.txt
test32.txt
test33.txt
test34.txt
test35.txt
test36.txt
test37.txt
test38.txt
test39.txt
test3.txt
test40.txt
test4.txt
test5.txt
test6.txt
test7.txt
test8.txt
test9.txt
In each txt file, I have this type of data:
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd2;5248.00;4230.53;/usr
2019-10-14-00-00;/dev/hd3;2560.00;481.66;/var
2019-10-14-00-00;/dev/hd4;3584.00;67.65;/tmp
2019-10-14-00-00;/dev/hd5;256.00;26.13;/home
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt
2019-10-14-00-00;/dev/hd5;384.00;0.38;/usr/xxx
2019-10-14-00-00;/dev/hd4;256.00;21.39;/xxx
2019-10-14-00-00;/dev/hd2;512.00;216.84;/opt
2019-10-14-00-00;/dev/hd3;128.00;21.46;/var/
2019-10-14-00-00;/dev/hd8;256.00;75.21;/usr/
2019-10-14-00-00;/dev/hd7;384.00;186.87;/var/
2019-10-14-00-00;/dev/hd6;256.00;0.63;/var/
2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin
2019-10-14-00-00;/dev/hd4;256.00;179.14;/opt/
2019-10-14-00-00;/dev/hd3;2176.00;492.93;/opt/
2019-10-14-00-00;/dev/hd1;256.00;114.83;/opt/
2019-10-14-00-00;/dev/hd9;256.00;41.73;/var/
2019-10-14-00-00;/dev/hd1;3200.00;954.28;/var/
2019-10-14-00-00;/dev/hd10;256.00;0.93;/var/
2019-10-14-00-00;/dev/hd10;64.00;1.33;/
2019-10-14-00-00;/dev/hd2;1664.00;501.64;/opt/
2019-10-14-00-00;/dev/hd4;256.00;112.32;/opt/
2019-10-14-00-00;/dev/hd9;2176.00;1223.1;/opt/
2019-10-14-00-00;/dev/hd11;22784.00;12325.8;/opt/
2019-10-14-00-00;/dev/hd12;256.00;2.36;/
2019-10-14-06-00;/dev/hd12;1024.00;137.18;/
2019-10-14-06-00;/dev/hd1;256.00;2.36;/
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd2;5248.00;4230.53;/usr
2019-10-14-00-00;/dev/hd3;2560.00;481.66;/var
2019-10-14-00-00;/dev/hd4;3584.00;67.65;/tmp
2019-10-14-00-00;/dev/hd5;256.00;26.13;/home
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt
2019-10-14-00-00;/dev/hd5;384.00;0.38;/usr/xxx
2019-10-14-00-00;/dev/hd4;256.00;21.39;/xxx
2019-10-14-00-00;/dev/hd2;512.00;216.84;/opt
2019-10-14-00-00;/dev/hd3;128.00;21.46;/var/
2019-10-14-00-00;/dev/hd8;256.00;75.21;/usr/
2019-10-14-00-00;/dev/hd7;384.00;186.87;/var/
2019-10-14-00-00;/dev/hd6;256.00;0.63;/var/
2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin
2019-10-14-00-00;/dev/hd4;256.00;179.14;/opt/
2019-10-14-00-00;/dev/hd3;2176.00;492.93;/opt/
2019-10-14-00-00;/dev/hd1;256.00;114.83;/opt/
2019-10-14-00-00;/dev/hd9;256.00;41.73;/var/
2019-10-14-00-00;/dev/hd1;3200.00;954.28;/var/
2019-10-14-00-00;/dev/hd10;256.00;0.93;/var/
2019-10-14-00-00;/dev/hd10;64.00;1.33;/
2019-10-14-00-00;/dev/hd2;1664.00;501.64;/opt/
2019-10-14-00-00;/dev/hd4;256.00;112.32;/opt/
I would like to create a directory for each server and, in each directory, a txt file for each FS containing every line that corresponds to that FS.
For that, I tried this loop:
#!/bin/bash
directory=(ls *.txt | cut -d'.' -f1)
for d in $directory
do
if [ ! -d $d ]
then
mkdir $d
fi
done
for i in $(cat *.txt)
do
file=$(echo $i | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3 )
data=$(echo $i | awk -F';' '{print $2}' )
echo $i | grep -w $data >> /xx/xx/xx/xx/xx/${directory/${file}.txt
done
But this loop doesn't work properly: the directories are created, but not the files inside each directory.
I would like something like:
test1/hd1.txt (containing every line for the hd1 FS)
And the same for each server.
Can you show me how to do that?
#!/bin/bash
for src in *.txt; do
# start a subshell so we don't need to cd back afterwards
# make "$src" be stdin before cd, so we don't need full path
# be careful that in subshell only awk reads from stdin
(
# extract server name to use as directory
dir=/xx/xx/xx/xx/xx/"${src%.txt}"
# chain with "&&" so failures don't cause bad files
mkdir -p "$dir" &&
cd "$dir" &&
awk -F \; '{ split($2, dev, "/"); print > (dev[3] ".txt") }'
) < "$src"
done
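The awk script reads lines whose fields are delimited by semicolons.
It splits the second field on slashes to extract the device name (the assumption is that devices always have the form /dev/name).
Finally, the > redirection sends each line to the file for its device; awk keeps that file open, so later lines are appended to it rather than overwriting it.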
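To see why dev[3] is the device name, here is the split() step on its own, applied to one line from the question's sample data. The leading slash produces an empty first element, so the name lands in the third slot.

```shell
#!/bin/bash
# One sample line from the question's data.
line='2019-10-14-00-00;/dev/hd1;1024.00;136.37;/'

# $2 is "/dev/hd1"; splitting it on "/" gives "", "dev", "hd1", so split() returns 3.
parts=$(printf '%s\n' "$line" | awk -F ';' '{ n = split($2, dev, "/"); print n, dev[3] }')
echo "$parts"
```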
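For reference, you can make your script work by doing directory=$(...); adding the prefix to mkdir and to the -d test (assuming the prefix directories already exist); closing the unterminated ${directory reference; and quoting variable references for safety. Because directory holds every server name, it must stay unquoted in the for loops (quoted, it becomes a single word), and the lines must be read one server file at a time so that each line lands in its own server's directory (${d} below):
#!/bin/bash
directory=$(ls *.txt | cut -d'.' -f1)
for d in $directory    # unquoted on purpose: one iteration per server name
do
if [ ! -d /xx/xx/xx/xx/xx/"$d" ]
then
mkdir /xx/xx/xx/xx/xx/"$d"
fi
done
for d in $directory
do
for i in $(cat "$d".txt)    # the data lines contain no spaces, so each word is one line
do
file=$(echo "$i" | awk -F';' '{print $2}' | cut -d'/' -f3 )
data=$(echo "$i" | awk -F';' '{print $2}' )
echo "$i" | grep -w "$data" >> /xx/xx/xx/xx/xx/"${d}"/"${file}".txt
done
done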
for file in `ls *.txt`
do
echo ${file}
directory=`echo ${file} | cut -d'.' -f1`
#echo ${directory}
if [ ! -d ${directory} ]
then
mkdir ${directory}
fi
FS=`cat ${file} | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3`
#echo $FS
for f in $FS
do
cat ${file} |grep -w -e $f > ${directory}/${f}.txt
done
done
Explanation:
The outer for loop runs once for each .txt file in the current directory.
For the selected file, it first creates the corresponding directory.
Next, the FS variable collects all distinct file systems (device names such as hd1) found in that file.
Finally, an inner loop greps the lines of each file system into a separate file in that directory.
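The same per-file, per-FS loop can be written with quoting and without ls or cat. A sketch run on three made-up lines in a temporary directory; it also shows why grep -w matters: the pattern hd1 does not pick up the hd10 line.

```shell
#!/bin/bash
# Sample data and temporary directory are assumptions for the demo.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' \
  '2019-10-14-00-00;/dev/hd1;1024.00;136.37;/' \
  '2019-10-14-00-00;/dev/hd10;64.00;1.33;/' \
  '2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin' > test1.txt

for file in *.txt; do
  directory=${file%.txt}          # test1.txt -> test1
  mkdir -p -- "$directory"
  # 2nd ;-field is /dev/NAME, 3rd /-field is NAME; sort -u keeps each name once
  for fs in $(cut -d';' -f2 "$file" | cut -d'/' -f3 | sort -u); do
    # -w: "hd1" must match as a whole word, so the hd10 line is not included
    grep -w -e "$fs" "$file" > "$directory/$fs.txt"
  done
done
```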

I want to check if some given files contain more than 3 words from an input file in a shell script

My first parameter is the file that contains the given words; the remaining parameters are the directories in which I'm searching for files that contain at least 3 of the words from the first parameter.
I can successfully print the number of matching words, but when testing whether it's at least 3 I get the error: test: too many arguments
Here's my code:
#!/bin/bash
file=$1
shift 1
for i in $*
do
for j in `find $i`
do
if test -f "$j"
then
if test grep -o -w "`cat $file`" $j | wc -w -ge 3
then
echo $j
fi
fi
done
done
You first need to execute the grep | wc pipeline and then compare its output with 3, so you need to change your if statement. Since you are already using backquotes, which cannot be nested without escaping, you can use the other syntax, $(command), which is equivalent to `command`:
if [ $(grep -o -w "`cat $file`" $j | wc -w) -ge 3 ]
then
echo $j
fi
I believe your problem is that you are trying to check whether the result of grep -o -w "`cat $file`" $j | wc -w is greater than or equal to three, but your syntax is incorrect. Try this instead:
if test $(grep -o -w "`cat $file`" $j | wc -w) -ge 3
By putting the grep & wc commands inside the $(), the shell executes those commands and uses the output rather than the text of the commands themselves. Consider this:
> cat words
western
found
better
remember
> echo "cat words | wc -w"
cat words | wc -w
> echo $(cat words | wc -w)
4
> echo "cat words | wc -w gives you $(cat words | wc -w)"
cat words | wc -w gives you 4
>
Note that the $() syntax is equivalent to the backtick notation (`...`) you're already using for the cat $file command.
Hope this helps!
Your code can be refactored and corrected in a few places.
Have it this way:
#!/bin/bash
input="$1"
shift
for dir; do
while IFS= read -r -d '' file; do
if [[ $(grep -woFf "$input" "$file" | sort -u | wc -l) -ge 3 ]]; then
echo "$file"
fi
done < <(find "$dir" -type f -print0)
done
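for dir; loops through all the positional arguments (it is short for for dir in "$@").
grep -woFf "$input" treats each line of $input as a fixed-string pattern (-F, -f), matches whole words only (-w) and prints each match (-o).
Use of sort -u is to remove duplicate words from the output of grep.
Use wc -l instead of wc -w since grep -o prints each matching word on a separate line.
find ... -print0 (together with read -d '') takes care of file names that may contain whitespace.
find ... -type f retrieves only regular files, so there is no need to check -f later.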
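A small demo of the refactored check, with a made-up words file and two made-up files: one containing three distinct listed words, one repeating a single listed word three times. Only the first should be reported, because sort -u collapses the repeats before counting.

```shell
#!/bin/bash
# Hypothetical words file and search directory for the demo.
tmp=$(mktemp -d)
printf '%s\n' western found better remember > "$tmp/words"
mkdir "$tmp/dir"
printf 'better to remember what was found\n' > "$tmp/dir/a.txt"   # 3 distinct words
printf 'better better better\n' > "$tmp/dir/b.txt"                # 1 distinct word

matches=()
while IFS= read -r -d '' file; do
  # Count distinct whole-word, fixed-string matches from the words file.
  if [[ $(grep -woFf "$tmp/words" "$file" | sort -u | wc -l) -ge 3 ]]; then
    matches+=("$file")
  fi
done < <(find "$tmp/dir" -type f -print0)
printf '%s\n' "${matches[@]}"
```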

How to get the result from a background process in a Linux shell script?

For example, let's say I want to count the number of lines in 10 BIG files and print the total.
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' &
done
I was trying something like:
for f in files
do
#this does not work :/
n=$( expr $(wc -l $f | awk '{print $1}') + $n ) &
done
echo $n
I finally found a working solution using anonymous pipes and bash:
#!/bin/bash
# this opens fd 3 in our shell for reading from a pipe whose writing
# end is the stdout (and, via 2>&1, the stderr) of ./a.sh. The process
# substitution <(...) already starts ./a.sh asynchronously, so you don't
# need the background operator (&); exec only attaches the pipe to fd 3.
exec 3< <(./a.sh 2>&1)
# ... do other stuff
# write the contents of the pipe to a variable. If the other process
# hasn't already terminated, cat will block.
output=$(cat <&3)
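A self-contained version of the fd-3 technique, with a short command group standing in for ./a.sh (an assumption for the demo). The main shell keeps running while the command sleeps, and cat only blocks when the result is actually needed.

```shell
#!/bin/bash
# Stand-in for ./a.sh: sleep, then print a result.
exec 3< <({ sleep 1; echo "result from background"; } 2>&1)

echo "main shell keeps working meanwhile"

# Blocks until the background command closes its end of the pipe.
output=$(cat <&3)
exec 3<&-    # close fd 3 once the result has been read
echo "got: $output"
```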
You should probably use GNU parallel:
find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'
or else xargs in parallel mode:
find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'
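Plain newline-separated input to xargs breaks on names with spaces. A variant of the xargs line using find -print0 with xargs -0, tried on two made-up files, one with a space in its name; -n1 runs one wc per file (so no "total" lines reach awk) and -P4 runs up to four at a time.

```shell
#!/bin/bash
# Hypothetical sample files; one name contains a space.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n' > "$tmp/two words.txt"

# -print0 / -0 keep "two words.txt" as a single argument.
total=$(find "$tmp" -maxdepth 1 -type f -print0 |
        xargs -0 -n1 -P4 wc -l |
        awk '{ n += $1 } END { print n }')
echo "$total"
```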
Another option, if this doesn't fit your needs, is to write to temporary files. If you don't want to write to disk, write to /dev/shm, which is a RAM-backed tmpfs on most Linux systems.
#!/bin/bash
declare -a temp_files
count=0
for f in *
do
if [[ -f "$f" ]]; then
temp_files[$count]="$(mktemp /dev/shm/${f}-XXXXXX)"
((count++))
fi
done
count=0
for f in *
do
if [[ -f "$f" ]]; then
cat "$f" | wc -l > "${temp_files[$count]}" &
((count++))
fi
done
wait
cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'
for tf in "${temp_files[@]}"
do
rm "$tf"
done
By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.
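The same map-reduce idea can be compressed into a single loop by giving each background wc its own file inside one temporary directory and removing the whole directory at the end. A sketch on two made-up files; falling back to the default temp location when /dev/shm does not exist is an assumption added here.

```shell
#!/bin/bash
# Hypothetical input files: 2 lines and 3 lines.
data=$(mktemp -d)
printf 'a\nb\n' > "$data/f1"
printf 'c\nd\ne\n' > "$data/f2"

if [[ -d /dev/shm ]]; then work=$(mktemp -d /dev/shm/wc.XXXXXX); else work=$(mktemp -d); fi

i=0
for f in "$data"/*; do
  [[ -f $f ]] || continue
  wc -l < "$f" > "$work/$i" &   # map: each count goes to its own file
  i=$((i + 1))
done
wait                            # let every background wc finish

total=$(cat "$work"/* | awk '{ n += $1 } END { print n }')   # reduce
rm -r -- "$work"
echo "$total"
```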
You could write the results to a file or, better, listen to a FIFO and handle the data as soon as it arrives.
Here is a small example of how FIFOs work:
# create the fifo
mkfifo test
# listen to it
while true; do if read -r line < test; then echo "$line"; fi; done
# in another shell
echo 'hi there' > test
# notice 'hi there' being printed in the first shell
So you could
for f in files
do
#this creates a background process for each file
wc -l "$f" | awk '{print $1}' > test &
done
and listen on the fifo for sizes.
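One pitfall with several writers is that the reader sees end-of-file as soon as no writer has the FIFO open, which can happen between two short-lived writers. A sketch that avoids this by opening the FIFO once in a background command group whose wc children inherit it; the reader then gets EOF only after the group's wait returns. File and FIFO names are made up for the demo.

```shell
#!/bin/bash
# Hypothetical inputs: 2 lines and 1 line.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/f1"
printf 'c\n' > "$tmp/f2"
mkfifo "$tmp/sizes"

# Writer: the group holds the FIFO open; each background wc inherits that stdout.
{
  for f in "$tmp/f1" "$tmp/f2"; do
    wc -l < "$f" &
  done
  wait
} > "$tmp/sizes" &

# Reader: add sizes as they arrive; the loop ends when the group above exits.
total=0
while IFS= read -r n; do
  total=$((total + n))
done < "$tmp/sizes"
echo "$total"
```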
