grep shows occurrences of pattern on a per line basis - linux

From the input file:
I am Peter
I am Mary
I am Peter Peter Peter
I am Peter Peter
I want output to be like this:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Where 1, 3 and 2 are occurrences of "Peter".
I tried this, but the info is not formatted the way I wanted:
grep -o -n Peter inputfile

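This is not easily solved with grep; I would suggest moving "two tools up" to awk. With FS set to the pattern, every match acts as a field separator, so a line with n matches has n+1 fields and NF-1 is the count: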
awk '$0 ~ FS { print NF-1, $0 }' FS="Peter" inputfile
Output:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Edit:
To answer a question in the comments:
What if I want case insensitive? and what if I want multiple pattern
like "Peter|Mary|Paul", so "I am Peter peter pAul Mary marY John",
will yield the count of 5?
If you are using GNU awk, you do it by enabling IGNORECASE and setting the pattern in FS like this:
awk '$0 ~ FS { print NF-1, $0 }' IGNORECASE=1 FS="Peter|Mary|Paul" inputfile
Output:
1 I am Peter
1 I am Mary
3 I am Peter Peter Peter
2 I am Peter Peter
5 I am Peter peter pAul Mary marY John
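A portable alternative, offered as a sketch rather than as part of the answer above: instead of relying on FS and gawk's IGNORECASE, count the matches with gsub, which returns the number of substitutions it made. The function name count_matches and its optional "i" argument are hypothetical. Lowercasing both the text and the pattern is a simplification that would mangle case-sensitive regex escapes.

```shell
# count_matches REGEX [i]
# Reads lines on stdin and prints "COUNT LINE" for every line with a match.
# Pass "i" as the second argument for case-insensitive matching.
count_matches() {
    awk -v pat="$1" -v mode="$2" '
        {
            text = $0
            re = pat
            if (mode == "i") {          # compare in lowercase on both sides
                text = tolower(text)
                re = tolower(pat)
            }
            n = gsub(re, "&", text)     # "&" re-inserts the match; n = match count
            if (n > 0) print n, $0      # print the untouched original line
        }'
}

printf '%s\n' 'I am Peter' 'I am Mary' 'I am Peter Peter Peter' | count_matches 'Peter'
# prints:
# 1 I am Peter
# 3 I am Peter Peter Peter
```

Because the substitution runs on a copy of the line, the original text is printed unchanged.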

You don’t need -o or -n. From grep --help:
-o, --only-matching show only the part of a line matching PATTERN
...
-n, --line-number print line number with output lines
Remove them and your output will be better. I think you’re misinterpreting -n -- it just shows the line number, not the occurrence count.
It looks like you’re trying to get the count of “Peter” appearances per line. You’d need something beyond a single grep for that. awk could be a good choice. Or you could loop over each line, split it into words (say, an array), and run grep -c on the words to print that line’s count.
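The loop idea can be sketched roughly as follows. This is one possible reading of it, not the answerer's exact method: it feeds each line to grep -o (a GNU/BSD option, not POSIX) and counts the output lines with wc -l instead of building an array. The function name count_with_grep is hypothetical.

```shell
# count_with_grep PATTERN
# Reads lines on stdin and prints "COUNT LINE" for lines with at least one match.
count_with_grep() {
    while IFS= read -r line; do
        # grep -o prints each match on its own line, so wc -l counts matches
        n=$(printf '%s\n' "$line" | grep -o -- "$1" | wc -l)
        n=$((n))    # normalise wc output, which some systems pad with spaces
        if [ "$n" -gt 0 ]; then
            printf '%s %s\n' "$n" "$line"
        fi
    done
}

printf '%s\n' 'I am Peter' 'I am Mary' 'I am Peter Peter' | count_with_grep 'Peter'
# prints:
# 1 I am Peter
# 2 I am Peter Peter
```

This starts two processes per input line, so the awk approach is likely to be much faster on large files.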


using sed to change numbers in a csv file

I have a csv file with 3 columns like below
Jones Smith 656220665
I would like to convert it to
Jones Smith 000000000
The problem I have is that not all the numbers are the same length; some are 7 digits long. I can't seem to find a way to change them from their current format to 0s, and I have to use sed and cut.
Here are two of the commands I tried to adapt to my needs:
sed 's/\([^ ]*\) \([^_]*\)_\(.*\)/\1 \3/g' Input_file
and
$ sed 's/\(\([^,]\+,\)\{1\}\)\([^,]\+,\)\(.*\)/\1\3\3\4/' /path/to/your/file
Instead of using sed, how about this:
echo 'Jones Smith 656220665' | tr '0-9' '0'
Jones Smith 000000000
For the whole file that's then:
tr '0-9' '0' < file > file.tmp
Edit 1:
added a sed solution:
sed 's/[0-9]/0/g' smith
Jones Smith 000000000
Edit 2:
cat ClientData.csv > ClientData.csv.bak
sed 's/[0-9]/0/g; w ClientData.csv' ClientData.csv.bak | cut -d" " -f 1-2
You can do this very simply with a general sed substitution:
sed 's/[0-9]/0/g'
(where the 'g' provides a global replacement of all instances of [0-9] with '0'), e.g.
$ echo "Jones Smith 656220665" | sed 's/[0-9]/0/g'
Jones Smith 000000000
Give it a shot and let me know if you have further issues.
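The commands above replace every digit on the line, including any digit that happens to appear in a name. If only the number column should be masked, one option is to restrict the substitution to the last field. The sketch below uses awk rather than the sed and cut the question asks for, so treat it as an aside. The function name mask_last_field is hypothetical, and it assumes whitespace-separated columns with the number last.

```shell
# mask_last_field
# Reads rows on stdin and replaces each digit in the last field with 0.
mask_last_field() {
    # gsub with a third argument edits only that field; awk then rebuilds
    # the line with single spaces between fields
    awk '{ gsub(/[0-9]/, "0", $NF); print }'
}

printf '%s\n' 'Jones Smith 656220665' 'Ann Lee2 1234567' | mask_last_field
# prints:
# Jones Smith 000000000
# Ann Lee2 0000000
```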

Randomly selecting (units) from a file where a unit is 2 lines.

I want to select random units from a file, where each unit consists of 2 lines.
For example a file looks like this
Adam
Apple
Mindy
Candy
Steve
Chips
David
Meat
Carol
Carrots
And I want to randomly select, let's say, 2 units.
For example
Adam
Apple
David
Meat
or
Steve
Chips
Carol
Carrots
I've tried using shuf and sort -R, but they only shuffle single lines. Could someone help me please?
Thank you.
You could do it with shuf by joining the lines before shuffling (that might not be a bad idea for a file format in general, if the lines describe a single item):
$ < file sed -e 'N;s/\n/:/' | shuf | head -1 | tr ':' '\n'
Carol
Carrots
The sed loads two lines at a time, and joins them with a colon.
Pick a random number in the correct range, ensure that it is odd (if desired), then use sed to print the 2 lines:
$ a=$(expr $RANDOM % \( $(wc -l < input) / 2 \) \* 2 + 1)
$ sed -n -e ${a}p -e $((a+1))p input
Rather than selecting lines to print, you could walk the file and print each "unit" with a particular probability. For example, to print (roughly) 10% of the "units" in the file, you could do:
awk 'BEGIN{srand()} NR%2 && (rand() < .1) {print; getline; print}' input
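To pick an exact number of units, the join-then-shuffle idea can be sketched with paste doing the joining and shuf -n doing the selection. This assumes GNU shuf, an even number of lines, and no tab characters in the data. The function name pick_units is hypothetical.

```shell
# pick_units COUNT FILE
# Prints COUNT randomly chosen two-line units from FILE.
pick_units() {
    # paste - - joins each pair of lines with a tab,
    # shuf -n picks COUNT joined lines at random,
    # tr splits every chosen pair back into two lines
    paste - - < "$2" | shuf -n "$1" | tr '\t' '\n'
}
```

For the sample file, pick_units 2 file would print two name/food pairs, four lines in total, with each name still followed by its own food.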

How to use cut and paste commands as a single line command?

In Unix, I am trying to write a sequence of cut and paste commands (saving the result of each command in a file) that inverts every name in the file shortlist (below) and places a comma after the last name (for example, bill johnson becomes johnson, bill).
Here is my file shortlist:
2233:charles harris :g.m. :sales :12/12/52: 90000
9876:bill johnson :director :production:03/12/50:130000
5678:robert dylan :d.g.m. :marketing :04/19/43: 85000
2365:john woodcock :director :personnel :05/11/47:120000
5423:barry wood :chairman :admin :08/30/56:160000
I am able to cut from shortlist, but I am not sure how to paste the result into my file filenew in the same command line. Here is my code for cut:
cut -d: -f2 shortlist
result:
charles harris
bill johnson
robert dylan
john woodcock
barry wood
Now I want this to be pasted into filenew so that when I cat filenew, the result looks like this:
harris, charles
johnson, bill
dylan, robert
woodcock, john
wood, barry
Please guide me through this. Thank you.
You could do it with a single awk:
awk -F: '{l=""; split($2,a, / /); if(a[2]) l=a[2] ", "; print l a[1]}' shortlist
I am assuming that if you don't have a second name, you don't want to print the comma (and you don't have more than 2 words in the name).
Once you've used cut to split up the string, it may be easier to use awk than paste to produce the result you want:
$ cut -d":" -f2 shortlist | awk '{printf "%s, %s\n", $2, $1}'
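If the exercise really requires only cut and paste, a sketch along these lines might do. It assumes every name is exactly two words separated by a single space, and it uses mktemp (widely available, but not POSIX) for the intermediate files. The function name invert_names is hypothetical.

```shell
# invert_names INPUT OUTPUT
# Writes "last, first" to OUTPUT for each colon-separated record in INPUT.
invert_names() {
    first=$(mktemp) && last=$(mktemp) || return 1
    cut -d: -f2 "$1" | cut -d' ' -f1 > "$first"    # first names
    cut -d: -f2 "$1" | cut -d' ' -f2 > "$last"     # last names
    # paste cycles through the delimiter list ", ": a comma follows the last
    # name, then a space follows the empty /dev/null column, which yields
    # "last, first" on every line
    paste -d', ' "$last" /dev/null "$first" > "$2"
    rm -f "$first" "$last"
}
```

Calling invert_names shortlist filenew and then cat filenew should show harris, charles on the first line for the sample data.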

Replace single character in string with new line

I'm trying to edit some text files by replacing single characters with a new line.
Before:
Bill Fred Jack L Max Sam
After:
Bill Fred Jack
Max Sam
This is the closest I have gotten, but the single character is not always going to be 'L'.
cat File.txt | tr "L" "\n"
You can try this (GNU sed):
sed "s/\s\S\s/\n/g" File.txt
Explanation:
This converts any word made of a single character, that is, one non-whitespace character with whitespace on both sides, into a line break \n.
\s : whitespace (space or tab)
\S : any non-whitespace character
bash-4.3$ cat file.txt
1.Bill Fred Jack L Max Sam
2.Bill Fred Jack M Max Sam
3.Bill Fred Jack N Max Sam
bash-4.3$ sed 's/\s[A-Z]\s/\n/g' file.txt
1.Bill Fred Jack
Max Sam
2.Bill Fred Jack
Max Sam
3.Bill Fred Jack
Max Sam
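sed 's/[[:blank:]][[:alpha:]][[:blank:]]/\
/g' YourFile
This is a POSIX version. Single quotes are needed here: inside double quotes the shell would remove the backslash-newline pair before sed sees it.
It assumes that the single letter is inside the string and not at an edge (start or end).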
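The same substitution can be sketched in awk, where "\n" in the replacement string is understood by every implementation. This is an alternative under the same assumption (the single letter is surrounded by blanks), not one of the posted answers. It spells the character classes out as [ \t] and [A-Za-z] because some older awk builds lack the [[:...:]] forms. The function name split_on_single_letter is hypothetical.

```shell
# split_on_single_letter
# Reads lines on stdin and replaces " X " (any single letter between
# blanks) with a newline.
split_on_single_letter() {
    awk '{ gsub(/[ \t][A-Za-z][ \t]/, "\n"); print }'
}

printf '%s\n' 'Bill Fred Jack L Max Sam' | split_on_single_letter
# prints:
# Bill Fred Jack
# Max Sam
```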

Shell script problems

So I'm doing some work on shell script. I have this code:
echo "5 Matt male"
echo "8 Sarah female"
echo "9 Paul male"
I am meant to set a threshold number of 6 so that only the lines whose numbers are above 6 are output, hence the lines containing Sarah and Paul. But I have no idea how to do this. I'm so sorry, but it is also meant to print only the ones that also contain "female".
Your data needs to be stored in file.txt.
file.txt:
5 Matt male
8 Sarah female
9 Paul male
awk '{ if ($1 > 6 && $3 == "female") print $0 }' file.txt
If you don't know how to use awk, take a look at http://cm.bell-labs.com/cm/cs/awkbook/
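To avoid hard-coding the threshold, the values can be passed in with awk -v. A minimal sketch, assuming the three-column layout shown above; the function name filter_rows and the variable names min and sex are hypothetical.

```shell
# filter_rows THRESHOLD VALUE
# Reads rows on stdin and prints those whose first column is greater than
# THRESHOLD and whose third column equals VALUE.
filter_rows() {
    # a pattern with no action prints the matching line
    awk -v min="$1" -v sex="$2" '$1 > min && $3 == sex'
}

printf '%s\n' '5 Matt male' '8 Sarah female' '9 Paul male' | filter_rows 6 female
# prints:
# 8 Sarah female
```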
