Count number of words in a file, bash script - Linux

How could I go about printing the number of words in a specified file from a bash script? For example, it would be run as
cat test | ./bash_script.sh
where test contains:
Hello World
This is a test
The output of running cat test | ./bash_script.sh would look like:
Word count: 6.
I am aware that it can be done without a script. I am trying to build wc -w into a bash script that counts the words as shown above. Any help is appreciated! Thank you.

If given a stream of input as shown:
while read -a words; do (( num += ${#words[@]} )); done
echo Word count: $num.
Extending from the link @FredrikPihl gave in a comment, this reads from each file given as an argument, or from stdin if no files are given:
for f in "${@:-/dev/stdin}"; do
    while read -a words; do (( num += ${#words[@]} )); done < "$f"
done
echo Word count: $num.
This should be faster:
for f in "${@:-/dev/stdin}"; do
    words=( $(< "$f") )
    (( num += ${#words[@]} ))
done
echo Word count: $num.
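Assuming either loop above is saved as bash_script.sh and made executable, both invocation styles from the question should then work:
$ cat test | ./bash_script.sh
Word count: 6.
$ ./bash_script.sh test
Word count: 6.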

In pure bash:
read -a arr -d $'\004'
echo ${#arr[@]}

Try this:
wc -w *.md | grep total | awk '{print $1}'
(Note that wc only prints a total line when given more than one file.)

#!/bin/bash
word_count=$(wc -w)
echo "Word count: $word_count."
As @keshlam pointed out in the comments, this can easily be done by executing wc -w from the shell script, though I didn't understand what the use case for it could be. Still, the shell script above will work as per your requirement.

I believe what you need is a function that you could add to your bashrc:
function script1() { wc -w "$1"; }
script1 README.md
335 README.md
Add the function to your .bashrc file and name it whatever you like. It will be available the next time you open a console, or immediately if you source your .bashrc. From then on you can call the function with a file name, as shown above, and it will print the word count.
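For example, a hypothetical session after adding it:
source ~/.bashrc     # reload so the function is defined in the current shell
script1 README.md    # prints the count followed by the file name, as above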

You could expand the contents of the file as arguments to the script and echo the number of arguments, since $# expands to the number of script arguments:
#!/bin/bash
echo "Word count: $#."
Then execute:
./bash_script.sh $(cat file)

Related

Bash exercise: giving multiple inputs to a script from another script

I've come across another exercise, in preparation for the exam, that I find tricky because of the input/output redirection.
It asks:
Write a first script named "contaseparatamente.sh" that takes a variable number of arguments, each the name of a file.
The script must write to standard output the total number of lines of the even-numbered arguments, and to standard error the total number of lines of the odd-numbered arguments.
(I have done it like this, and it works):
#!/bin/bash
NUMEVEN=0
NUMODD=0
for ((i=1; i<=$#; i++)); do
    if ((i%2==0))
    then
        NUMEVEN=$((${NUMEVEN} + `wc -l ${!i} | cut -d ' ' -f 1` ))
    else
        NUMODD=$((${NUMODD} + `wc -l ${!i} | cut -d ' ' -f 1` ))
    fi
done
echo rows of even ${NUMEVEN}
echo rows of odd ${NUMODD} 1>&2
Then it asks: write a second script that launches the first, giving it as arguments the first 7 lines of the output of ls -S1 /usr/include/*.h. In the end, this second script must also show the output of the first script on standard error.
This is my try:
#!/bin/bash
./contaseparatamente.sh <( ls -S1 /usr/include/*.h | head -n 7 ) 2<&1
but this way the result is 0 rows for the even arguments and 7 for the odd, which is not possible.
I don't like the assignment, but...
To pass the args in the simplest way, use an unquoted command substitution. (ugh)
#!/bin/bash
./contaseparatamente.sh $( ls -S1 /usr/include/*.h | head -n 7 )
The stderr of the first script will bleed through and show when you run the second script if you do nothing at all. If you need it on stdout, just redirect.
#!/bin/bash
./contaseparatamente.sh $( ls -S1 /usr/include/*.h | head -n 7 ) 2>&1
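For reference, the reason the original attempt reported 7 lines on the odd side: <( ... ) is process substitution, which expands to a single /dev/fd path, so the first script saw exactly one (odd-numbered) argument containing all 7 lines. A quick way to see this (the fd number will vary):
echo <(ls -S1 /usr/include/*.h | head -n 7)
/dev/fd/63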

How to efficiently loop through the lines of a file in Bash?

I have a file example.txt with about 3000 lines, each containing a string. A small example file would be:
>cat example.txt
saudifh
sometestPOIFJEJ
sometextASLKJND
saudifh
sometextASLKJND
IHFEW
foo
bar
I want to check all repeated lines in this file and output them. The desired output would be:
>checkRepetitions.sh
found two equal lines: index1=1 , index2=4 , value=saudifh
found two equal lines: index1=3 , index2=5 , value=sometextASLKJND
I made a script checkRepetitions.sh:
#!/bin/bash
size=$(cat example.txt | wc -l)
for i in $(seq 1 $size); do
    i_next=$((i+1))
    line1=$(cat example.txt | head -n$i | tail -n1)
    for j in $(seq $i_next $size); do
        line2=$(cat example.txt | head -n$j | tail -n1)
        if [ "$line1" = "$line2" ]; then
            echo "found two equal lines: index1=$i , index2=$j , value=$line1"
        fi
    done
done
However this script is very slow, it takes more than 10 minutes to run. In python it takes less than 5 seconds... I tried to store the file in memory by doing lines=$(cat example.txt) and doing line1=$(cat $lines | cut -d',' -f$i) but this is still very slow...
When you do not want to use awk (a good tool for this job, since it parses the input only once), you can run through the lines several times. Sorting is expensive, but this solution avoids the nested loops you tried.
grep -Fnxf <(uniq -d <(sort example.txt)) example.txt
With uniq -d <(sort example.txt) you find all lines that occur more than once. grep then searches for these (option -f) as complete lines (-x), without regular expressions (-F), and shows the line number where each occurs (-n).
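On the example file above, that pipeline should print each duplicated line prefixed with its line number:
1:saudifh
3:sometextASLKJND
4:saudifh
5:sometextASLKJND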
See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why your script is so slow.
$ cat tst.awk
{ val2hits[$0] = val2hits[$0] FS NR }
END {
    for (val in val2hits) {
        numHits = split(val2hits[val],hits)
        if ( numHits > 1 ) {
            printf "found %d equal lines:", numHits
            for ( hitNr=1; hitNr<=numHits; hitNr++ ) {
                printf " index%d=%d ,", hitNr, hits[hitNr]
            }
            print " value=" val
        }
    }
}
$ awk -f tst.awk file
found 2 equal lines: index1=1 , index2=4 , value=saudifh
found 2 equal lines: index1=3 , index2=5 , value=sometextASLKJND
To give you an idea of the performance difference, here is a bash script written to be as efficient as possible and an equivalent awk script:
bash:
$ cat tst.sh
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    (( ++lineNum ))
    if [[ ${lines[$line]} ]]; then
        printf 'Content previously seen on line %s also seen on line %s: %s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done < "$1"
$ time ./tst.sh file100k > ou.sh
real 0m15.631s
user 0m13.806s
sys 0m1.029s
awk:
$ cat tst.awk
lines[$0] {
    printf "Content previously seen on line %s also seen on line %s: %s\n", \
        lines[$0], NR, $0
}
{ lines[$0]=NR }
$ time awk -f tst.awk file100k > ou.awk
real 0m0.234s
user 0m0.218s
sys 0m0.016s
There are no differences between the outputs of the two scripts:
$ diff ou.sh ou.awk
$
The above uses third-run timing to avoid caching issues, tested against a file generated by the following awk script:
awk 'BEGIN{for (i=1; i<=10000; i++) for (j=1; j<=10; j++) print j}' > file100k
When the input file had zero duplicate lines (generated by seq 100000 > nodups100k) the bash script executed in about the same amount of time as it did above while the awk script executed much faster than it did above:
$ time ./tst.sh nodups100k > ou.sh
real 0m15.179s
user 0m13.322s
sys 0m1.278s
$ time awk -f tst.awk nodups100k > ou.awk
real 0m0.078s
user 0m0.046s
sys 0m0.015s
To demonstrate a relatively efficient (within the limits of the language and runtime) native-bash approach, which you can see running in an online interpreter at https://ideone.com/iFpJr7:
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    lineNum=$(( lineNum + 1 ))
    if [[ ${lines[$line]} ]]; then
        printf 'found two equal lines: index1=%s, index2=%s, value=%s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done <example.txt
Note the use of while read to iterate line-by-line, as described in BashFAQ #1: How can I read a file line-by-line (or field-by-field)?; this permits us to open the file only once and read through it without needing any command substitutions (which fork off subshells) or external commands (which need to be individually started up by the operating system every time they're invoked, and are likewise expensive).
The other part of the improvement here is that we're reading the whole file only once -- implementing an O(n) algorithm -- as opposed to running O(n^2) comparisons as the original code did.

Command to count the characters in a variable

I am trying to count the number of characters in a variable. I used the shell command below, but I am getting the error "command not found" on line 4:
#!/bin/bash
for i in one; do
n = $i | wc -c
echo $n
done
Can someone help me with this?
In bash you can just write ${#string}, which will return the length of the variable string, i.e. the number of characters in it.
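For example:
str="Hello World"
echo "${#str}"    # prints 11; no trailing newline is counted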
Something like this:
#!/bin/bash
for i in one; do
    n=$(echo "$i" | wc -c)
    echo $n
done
Assignments in bash cannot have spaces around the equals sign; as written, bash treats n as a command name, which is exactly the "command not found" error you saw. In addition, you need to capture the output of the command you run and assign that to n.
Use the following instead:
#!/bin/bash
for i in one; do
    n=`echo "$i" | wc -c`
    echo $n
done
It can be as simple as that:
str="abcdef"; wc -c <<< "$str"
7
But mind you that the trailing newline counts as a character:
str="abcdef"; cat -A <<< "$str"
abcdef$
If you need to remove it:
str="abcdef"; tr -d '\n' <<< "$str" | wc -c
6

Cygwin bash: read file word by word

I want to read a text file word by word. The problem: some words contain "/*", and such a word causes the script to list files from the root directory. I tried:
for word in $(< file)
do
printf "[%s]\n" "$word"
done
And several other combinations with echo/cat/etc...
For this file:
/* example file
I get following output:
[/bin]
[/cygdrive]
[/Cygwin.bat]
...
[example]
[file]
It should be easy, but it's driving me nuts.
You need to turn off pathname expansion (globbing). Run a new shell with bash -f and try again. See http://wiki.bash-hackers.org/syntax/expansion/globs or dive into the manpage with man bash, maybe doing man bash | col -b > bash.txt.
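For example, a minimal sketch using set -f to turn globbing off just around the loop (equivalent to running the whole script under bash -f):
set -f                  # disable pathname expansion
for word in $(< file)
do
    printf "[%s]\n" "$word"
done
set +f                  # re-enable it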
How about this solution: the double quotes around $(< file) stop * from being expanded, and sed is used to format the output as required:
for word in "$(< file)"
do
echo "$word" | sed -E 's/(\S*)(\s)/[\1]\2\n/g'
done
Output:
[/*]
[example]
[file]
This may help:
# skip blank lines and comment lines beginning with a hash (#)
cat $CONFIG_FILE | while read LINE
do
    first_char=`echo $LINE | cut -c1-1`
    if [ "${first_char}" = "#" ]
    then
        echo "Skip line with first_char= >>${first_char}<<"
    else
        echo "process line: $LINE"
    fi
done
Another way is to use a case statement, as sketched below.
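For instance, a sketch of the same first-character test written as a case statement (using the same variable names as above):
while read LINE
do
    case $LINE in
        '#'*) echo "Skip comment line: >>$LINE<<" ;;
        *)    echo "process line: $LINE" ;;
    esac
done < "$CONFIG_FILE"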
How about this one?
while read -a a; do printf '[%s]\n' "${a[@]}"; done < file
Output:
[/*]
[example]
[file]

Extracting specified line numbers from a file using a shell script

I have a file with a list of addresses; it looks like this (ADDRESS_FILE):
0xf012134
0xf932193
.
.
0fx12923a
I have another file with a list of numbers; it looks like this (NUMBERS_FILE):
20
40
.
.
12
I want to cut the first 20 lines from ADDRESS_FILE and put them into a new file, then cut the next 40 lines from ADDRESS_FILE, and so on...
I know that a series of sed commands like the ones given below does the job:
sed -n 1,20p ADDRESS_FILE > temp_file_1
sed -n 21,60p ADDRESS_FILE > temp_file_2
.
.
sed -n somenumber,endoffilep ADDRESS_FILE > temp_file_n
But I want to do this automatically using shell scripting, changing the line numbers to cut on each sed execution.
How can I do this?
Also, on a general note, which text processing commands in Linux are most useful in such cases?
Assuming your line numbers are in a file called lines, sorted etc., try:
#!/bin/sh
j=0
count=1
while read -r i; do
    sed -n $j,$i > filename.$count # etc... details of sed/redirection elided
    j=$i
    count=$(($count+1))
done < lines
Note. The above doesn't assume a consistent number of lines to split on for each iteration.
Since you've additionally asked for a general utility, try split. However this splits on a consistent number of lines, and is perhaps of limited use here.
Here's an alternative that reads directly from the NUMBERS_FILE (a bare read with no variable name stores the line in $REPLY):
n=0; i=1
while read; do
    sed -n ${i},+$(( REPLY - 1 ))p ADDRESS_FILE > temp_file_$(( n++ ))
    (( i += REPLY ))
done < NUMBERS_FILE
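With the sample NUMBERS_FILE above (20, 40, ...), the first two iterations would run:
sed -n 1,+19p ADDRESS_FILE > temp_file_0
sed -n 21,+39p ADDRESS_FILE > temp_file_1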
size=$(wc -l < ADDRESS_FILE)
i=1
n=1
while [ $n -lt $size ]
do
    sed -n $n,$((n+19))p ADDRESS_FILE > temp_file_$i
    i=$((i+1))
    n=$((n+20))
done
or just
split -l20 ADDRESS_FILE temp_file_
(thanks Brian Agnew for the idea).
An ugly solution that works with a single sed invocation; it can probably be made less horrible.
This generates a tiny sed script to split the file:
#!/bin/bash
sum=0
count=0
sed -n -f <(while read -r n; do
    echo $((sum+1)),$((sum += n)) "w temp_file_$((count++))"
done < NUMBERS_FILE) ADDRESS_FILE
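With the sample NUMBERS_FILE (20, 40, ...), the process substitution feeds sed a script along these lines:
1,20 w temp_file_0
21,60 w temp_file_1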
