grep -o: Keep input line format - linux

$ echo "abca\ndeaf" | grep -o a
a
a
a
I am looking for the output:
aa
a
Or perhaps
a a
a
or even
a<TAB>a
a
(this is a very very simplified example)
I just want it not to throw away the line grouping.

You can do it with sed by removing any character that isn't a:
echo "abca\ndeaf" | sed 's/[^a]//g'
aa
a
It can't be done with grep alone.

@sudo_O's answer shows how to do this with single-character strings. The difficulty level is raised if you want to match longer strings.
One way to do it is by parsing the output of grep -n -o, like so:
$ cat mgrep
#!/bin/bash
# Print each match along with its line number.
grep -no "$@" | {
matches=() # An array of matches to be printed when the line number changes.
lastLine= # Keep track of the current and previous line numbers.
# Read the matches, with `:' as the separator.
while IFS=: read line match; do
# If this is the same line number as the previous match, add this one to
# the list.
if [[ $line = $lastLine ]]; then
matches+=("$match")
# Otherwise, print out the list of matches we've accumulated and start
# over.
else
(( ${#matches[@]} )) && echo "${matches[@]}"
matches=("$match")
fi
lastLine=$line
done
# Print any remaining matches.
(( ${#matches[@]} )) && echo "${matches[@]}"
}
Example usage:
$ echo $'abca\ndeaf' | ./mgrep a
a a
a
$ echo $'foo bar foo\nbaz\ni like food' | ./mgrep foo
foo foo
foo

Based off John Kugelman's solution, this one works with one input file and gawk
grep -on abc file.txt | awk -v RS='[[:digit:]]+:' 'NF{$1=$1; print}'
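If gawk is not available, the same grouping can be sketched with grep -on and a POSIX awk script. This is only a minimal sketch: the function name group_matches is made up for illustration, and it assumes the input arrives on stdin, so that grep prints a plain LINE:MATCH prefix with no file name in front.

```shell
#!/bin/sh
# group_matches is a hypothetical helper name, not part of grep.
# Assumes input comes from stdin, so grep -on prints "LINE:MATCH".
group_matches() {
    grep -on -- "$1" | awk '
        {
            n = $0; sub(/:.*/, "", n)       # line number before the first colon
            m = substr($0, length(n) + 2)   # the match itself (may contain colons)
        }
        NR == 1 || n != last {              # first match on a new input line
            if (NR > 1) print out
            out = m; last = n
            next
        }
        { out = out " " m }                 # another match on the same line
        END { if (NR > 0) print out }
    '
}

printf 'abca\ndeaf\n' | group_matches a
```

For the sample input this prints "a a" and then "a", the same grouping as the gawk version. Lines without any match produce no output line at all.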

If you're willing to use perl:
$ echo $'abca\ndeaf' | perl -ne '@m = /a/g; print "@m\n"'
a a
a


Print second last line from variable in bash

VAR="1\n2\n3"
I'm trying to print out the second last line. One liner in bash!
I've gotten this far: printf -- "$VAR" | head -2
It however prints out too much.
I can do this with a file no problem: tail -2 ~/file | head -1
You almost solved this yourself. Try
VAR="1\n2\n3"; printf -- "$VAR"|tail -2|head -1
Here is one pure bash way of doing this:
readarray -t arr < <(printf -- "$VAR") && echo "${arr[-2]}"
2
You may also use this awk as a single command:
VAR="1\n2\n3"
awk -F '\\\\n' '{print $(NF-1)}' <<< "$VAR"
2
It may be more efficient to use a temporary variable and parameter expansions:
var=$'1\n2\n3' ; tmpvar=${var%$'\n'*} ; echo "${tmpvar##*$'\n'}"
Use echo -e to interpret the backslash escapes, which turns \n into newlines, and pick the line you are interested in by its number using NR.
$ echo -e "${VAR}" | awk 'NR==2'
2
With multi-line input, tail and head can be combined to print any particular line.
$ echo -e "$VAR" | tail -2 | head -1
2
Or use a fancier sed: x swaps each line with the hold space, so the pattern space always holds the previous line, and every line except the last is deleted.
$ echo -e "$VAR" | sed 'x;$!d'
2
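The "keep the previous line around" idea behind the sed command can also be written out in awk, which may be easier to read. This is a rough sketch, not one of the answers above: the function name second_last_line is invented here, and it assumes the input already contains real newlines (for example from printf or $'...' quoting).

```shell
#!/bin/sh
# second_last_line is a hypothetical helper name used for illustration.
second_last_line() {
    awk '
        { before = prev; prev = $0 }       # remember the two most recent lines
        END { if (NR >= 2) print before }  # print nothing if there is no such line
    '
}

printf '1\n2\n3\n' | second_last_line
```

With the sample data this prints 2. Input with fewer than two lines prints nothing.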

Linux: Extract string from a line including delimiter character using sed command [duplicate]

For example
echo "abc-1234a :" | grep <do-something>
to print only abc-1234a
I think these are closer to what you're getting at, but without knowing what you're really trying to achieve, it's hard to say.
echo "abc-1234a :" | egrep -o '^[^:]+'
... though this will also match lines that have no colon. If you only want lines with colons, and you must use only grep, this might work:
echo "abc-1234a :" | grep : | egrep -o '^[^:]+'
Of course, this only makes sense if your echo "abc-1234a :" is an example that would be replaced with possibly multiple lines of input.
The smallest tool you could use is probably cut:
echo "abc-1234a :" | cut -d: -f1
And sed is always available...
echo "abc-1234a :" | sed 's/ *:.*//'
For this last one, if you only want to print lines that include a colon, change it to:
echo "abc-1234a :" | sed -ne 's/ *:.*//p'
Heck, you could even do this in pure bash:
while read line; do
field="${line%%:*}"
# do stuff with $field
done <<<"abc-1234a :"
For information on the %% bit, you can man bash and search for "Parameter Expansion".
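As a quick illustration of that part of the manual, here is a small sketch of the four prefix/suffix removal forms. The sample string s is made up for this example; it has two colons so that the shortest and longest matches differ.

```shell
#!/bin/sh
# s is an illustrative sample value, not taken from the question.
s='abc-1234a :foo:bar'

printf '[%s]\n' "${s%%:*}"   # remove the longest suffix matching ":*"
printf '[%s]\n' "${s%:*}"    # remove the shortest suffix matching ":*"
printf '[%s]\n' "${s#*:}"    # remove the shortest prefix matching "*:"
printf '[%s]\n' "${s##*:}"   # remove the longest prefix matching "*:"
```

This prints [abc-1234a ], [abc-1234a :foo], [foo:bar] and [bar]. Note that %% keeps the space before the colon, unlike the sed 's/ *:.*//' version.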
UPDATE:
You said:
It's the characters in the first line of input before the colon. The
input could have multiple line though.
The solutions with grep probably aren't your best choice, then, since they'll also print data from subsequent lines that might include colons. Of course, there are many ways to solve this requirement as well. We'll start with sample input:
$ function sample { printf "abc-1234a:foo\nbar baz:\nNarf\n"; }
$ sample
abc-1234a:foo
bar baz:
Narf
You could use multiple pipes, for example:
$ sample | head -1 | grep -Eo '^[^:]*'
abc-1234a
$ sample | head -1 | cut -d: -f1
abc-1234a
Or you could use sed to process only the first line:
$ sample | sed -ne '1s/:.*//p'
abc-1234a
Or tell sed to exit after printing the first line (which is faster than reading the whole file):
$ sample | sed 's/:.*//;q'
abc-1234a
Or do the same thing but only show output if a colon was found (for safety):
$ sample | sed -ne 's/:.*//p;q'
abc-1234a
Or have awk do the same thing (as the last 3 examples, respectively):
$ sample | awk '{sub(/:.*/,"")} NR==1'
abc-1234a
$ sample | awk 'NR>1{nextfile} {sub(/:.*/,"")} 1'
abc-1234a
$ sample | awk 'NR>1{nextfile} sub(/:.*/,"")'
abc-1234a
Or in bash, with no pipes at all:
$ read line < <(sample)
$ printf '%s\n' "${line%%:*}"
abc-1234a
It is possible to do what you want with only sed.
Here is an example:
#!/bin/sh
filename=$1
pattern=yourpattern
# the -n flag disables the default behavior of printing every line
sed -n "
1,/$pattern/ {
/$pattern/n # skip line containing pattern
p # print lines ranging from line 1 until pattern
}
" $filename
exit 0
This works at least for GNU's sed. It should work for other sed too, except
regarding the comments (some implementations of sed don't support comments).
Source: https://www.grymoire.com/Unix/Sed.html

how to check if a word contains all letters in a string bash

let's say I have a file containing words (one per line), and I have a string containing letters
str = "aeiou"
I want to check how many words in the file contain all the letters in string. They don't have to appear in order.
the first thing that came to mind was using cat and grep
cat wordfile | grep a | grep e | grep i | grep letters....
this seems to work, but I wonder if there's a better way.
If the search string is fixed, you might try something like that:
cat wordfile | awk '/a/&&/e/&&/i/&&/o/&&/u/' | wc -l
If needed, the search pattern can easily be built using your favorite scripting language. As I favor Python:
str="aeiou"
search=$(python -c 'print("/"+"/&&/".join([c for c in "'"$str"'"])+"/")')
cat wordfile | awk "$search" | wc -l
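If you would rather not depend on Python, the same awk pattern can be assembled in bash itself. This is a minimal sketch under two assumptions: the function name build_awk_pattern is invented for this example, and str contains only plain letters (no "/" or regex metacharacters).

```shell
#!/bin/bash
# build_awk_pattern is a hypothetical helper name used for illustration.
# Assumes every character of the argument is a plain letter.
build_awk_pattern() {
    local str=$1 pattern= i
    for ((i = 0; i < ${#str}; i++)); do
        # Join the per-letter tests with "&&", skipping the separator the first time.
        pattern+="${pattern:+&&}/${str:i:1}/"
    done
    printf '%s\n' "$pattern"
}

search=$(build_awk_pattern aeiou)
printf 'education\nhello\n' | awk "$search"
```

Here search becomes /a/&&/e/&&/i/&&/o/&&/u/, and only "education" is printed; pipe the result to wc -l to get the count.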
Here is a solution that is done solely in bash. Note the [[ ]] makes this non-portable to sh. This script will read every line in file and then test that it contains every character in str. The file to read must be the first argument for the script. The comments below describe the operation:
#!/bin/bash
str=aeiou
while read line || test -n "$line"; do # read every line in file
match=0; # initialize match = true
for ((i=0; i<${#str}; i++)); do # for each letter in string
[[ $line =~ ${str:$i:1} ]] || { # test it is contained in line - or
match=1 # set match false and
break # break - goto next word
}
done
# if match still true, then all letters in string found in line
test "$match" -eq 0 && echo "all found in '$line'";
done < "$1"
exit 0
testfile (dat/vowels.txt):
a_even_ice_dough_ball
a_even_ice_ball
someword
notallvowels
output:
$ bash vowel.sh dat/vowels.txt
all found in 'a_even_ice_dough_ball'
Messy, but can be done in one step by turning on the PCRE-regex flag of GNU grep
grep -P '^(?=.*a.*)(?=.*e.*)(?=.*i.*)(?=.*o.*)(?=.*u.*)' file | wc -l

Getting a specific line from a string where the line number I must get is stored in a variable?

I'm trying to get a specific line of a variable. The line I must get is stored in i. My code looks like this right now.
$(echo "$data" | sed '$iq;d')
It looks like I'm putting i in there wrong. Putting a number in for i works fine, but $i gets me the entire string.
I haven't found a solution that works with a variable yet. I'm not too familiar with bash and would appreciate help.
Edit: a bit of context
i=5
data=$(netstat -a | grep ESTAB)
line=$(echo "$data" | sed "${i}p")
echo $line
Use sed -n "${i}p" instead.
Example:
i=4; seq 1 10 | sed -n "${i}p"
Output:
4
Bonus:
i=5
readarray -O 1 -t data < <(exec netstat -a | grep ESTAB) ## Stores data as an array of lines starting at index 1
line=${data[i]}
echo "$line"
# printf '%s\n' "${data[@]}" ## Prints whole data.
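The reason the sed -n "${i}p" form works is the quoting: inside single quotes the shell leaves $i untouched, while double quotes let it substitute the number before sed ever sees the script. A small sketch of this, where the wrapper name nth_line is made up for illustration and q is added so sed stops reading once the line has been printed:

```shell
#!/bin/sh
# nth_line is a hypothetical helper name used for illustration.
nth_line() {
    # Double quotes let the shell expand $1 before sed sees the script.
    sed -n "${1}{p;q;}"
}

i=4
printf '%s\n' one two three four five | nth_line "$i"
```

This prints "four". No validation of the argument is done here; a non-numeric value is passed to sed as-is.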
Here is a way to do this in Bash itself. The array is zero-indexed, so line $i is stored at index i-1:
IFS=$'\n' arr=($data)
echo "${arr[i-1]}"

How do I find the count of multiple words in a text file?

I am able to find the number of times a word occurs in a text file, like in Linux we can use:
cat filename|grep -c tom
My question is, how do I find the count of multiple words like "tom" and "joe" in a text file.
Since you have a couple of names, a regular expression is the way to go on this one. At first I thought it was as simple as a grep count on the regular expression for joe or tom, but I found that this does not account for the case where tom and joe are on the same line (or tom and tom, for that matter).
test.txt:
tom is really really cool! joe for the win!
tom is actually lame.
$ grep -c '\<\(tom\|joe\)\>' test.txt
2
As you can see from the test.txt file, 2 is the wrong answer, so we needed to account for names being on the same line.
I then used grep -o, which prints only the part of each line that matches the pattern, so every occurrence of tom or joe ends up on its own line. Piping that output into wc -l counts the occurrences.
$ grep -o '\(joe\|tom\)' test.txt|wc -l
3
3...the correct answer! Hope this helps
Ok, so first split the file into words, then sort and uniq:
tr -cs '[:alnum:]' '\n' < testdata | sort | uniq -c
You use uniq:
sort filename | uniq -c
Use awk:
{for (i=1;i<=NF;i++)
count[$i]++
}
END {
for (i in count)
print count[i], i
}
This will produce a complete word frequency count for the input.
Pipe the output to grep to get the desired entries:
awk -f w.awk input | grep -E 'tom|joe'
BTW, you do not need cat in your example; most programs that act as filters can take the filename as a parameter, so it's better to use
grep -c tom filename
If you don't, there is a strong possibility that people will start throwing the Useless Use of Cat Award at you ;-)
The sample you gave does not search for words "tom". It will count "atom" and "bottom" and many more.
Grep searches for regular expressions. Regular expression that matches word "tom" or "joe" is
\<\(tom\|joe\)\>
You could do regexp,
cat filename |tr ' ' '\n' |grep -c -e "\(joe\|tom\)"
Here is one:
cat txt | tr -s '[:punct:][:space:][:blank:]'| tr '[:punct:][:space:][:blank:]' '\n\n\n' | tr -s '\n' | sort | uniq -c
UPDATE
A shell script solution:
#!/bin/bash
file_name="$2"
string="$1"
if [ $# -ne 2 ]
then
echo "Usage: $0 <pattern to search> <file_name>"
exit 1
fi
if [ ! -f "$file_name" ]
then
echo "file \"$file_name\" does not exist, or is not a regular file"
exit 2
fi
line_no_list=("")
curr_line_indx=1
line_no_indx=0
total_occurance=0
# line_no_list stores pairs: index k holds a line number and index k+1
# holds the number of times the string occurs on that line
while read line
do
flag=0
while [[ "$line" == *$string* ]]
do
flag=1
line_no_list[line_no_indx]=$curr_line_indx
line_no_list[line_no_indx+1]=$((line_no_list[line_no_indx+1]+1))
total_occurance=$((total_occurance+1))
# remove the first occurrence of "$string" from the line and recheck
line=${line/"$string"/}
done
# if we have entered the while loop then increment the
# line index to access the next array pos in the next
# iteration
if (( flag == 1 ))
then
line_no_indx=$((line_no_indx+2))
fi
curr_line_indx=$((curr_line_indx+1))
done < "$file_name"
echo -e "\nThe string \"$string\" occurs \"$total_occurance\" times"
echo -e "The string \"$string\" occurs in \"$((line_no_indx/2))\" lines"
echo "[Occurrence # : Line Number : No. of Occurrences in this line]: "
for ((i=0; i<line_no_indx; i=i+2))
do
echo "$((i/2+1)) : ${line_no_list[i]} : ${line_no_list[i+1]} "
done
echo
I completely forgot about grep -f:
cat filename | grep -c -f names
AWK solution:
Assuming the names are in a file called names:
cat filename | awk 'NR==FNR {h[NR] = $1;ct[NR] = 0; cnt=NR} NR !=FNR {for(i=1;i<=cnt;++i) if(match($0,h[i])!=0) ++ct[i] } END {for(i in h) print h[i], ct[i]}' names -
Note that your original grep doesn't search for words. e.g.
$ echo tomorrow | grep -c tom
1
You need grep -w
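Combining -w with -o gives a per-name count that also handles several names on one line. This is a sketch only: the function name count_names is invented for this example, and it assumes a grep that supports -o, -w and -E together (GNU and BSD grep both do).

```shell
#!/bin/sh
# count_names is a hypothetical helper name used for illustration.
# $1 is an extended regex of alternatives, e.g. 'tom|joe'.
count_names() {
    grep -owE -- "$1" |             # one whole-word match per output line
        sort | uniq -c |            # count identical matches
        awk '{ print $2, $1 }'      # normalize to "name count"
}

printf '%s\n' 'tom is really really cool! joe for the win!' \
              'tom is actually lame.' | count_names 'tom|joe'
```

For the test.txt contents used earlier this prints "joe 1" and "tom 2". Words such as "atom" or "tomorrow" are not counted because of -w.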
gawk -vRS='[^[:alpha:]]+' '{print}' | grep -Ec '^(tom|joe|bob|sue)$'
The gawk program sets the record separator to anything non-alphabetic, so every word will end up on a separate line. Then grep counts lines that match one of the words you want exactly.
We use gawk because POSIX awk doesn't allow a regex as the record separator.
For brevity, you can replace '{print}' with 1 - either way, it's an Awk program that simply prints out all input records ("is 1 true? it is? then do the default action, which is {print}.")
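If gawk is not installed, the same "one word per line" split can be approximated with tr. This is a sketch under the assumption that words consist of alphabetic characters only; the function name count_exact_words is made up for this example, and grep -x is used here in place of the explicit ^...$ anchors.

```shell
#!/bin/sh
# count_exact_words is a hypothetical helper name used for illustration.
# $1 is an extended regex of alternatives, e.g. 'tom|joe'.
count_exact_words() {
    tr -cs '[:alpha:]' '\n' |   # turn every run of non-letters into one newline
        grep -cxE -- "$1"       # count lines that consist of exactly one of the words
}

printf '%s\n' 'tom is really really cool! joe for the win!' \
              'tom is actually lame.' | count_exact_words 'tom|joe'
```

This prints 3 for the sample text. A word like "tomtom" is not counted, because the whole line has to match.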
To find all hits in all lines
echo "tom is really really cool! joe for the win!
tom is actually lame." | awk '{i+=gsub(/tom|joe/,"")} END {print i}'
3
This will count "tomtom" as 2 hits.
