Combine similar rows in a file using AWK - linux

again!
Can somebody help me with next question:
I need to combine similar rows in a file using awk. Example
File has next rows:
Mike dollar 15
Fred euro 10
Mike euro 4
Fred euro 4
Output should look like:
Mike:
dollar 15
euro 4
Fred:
euro 14
How do I combine similar patterns in different rows into one row?
Thanks a lot for ideas!

awk to the rescue!
$ awk '{a[$1,$2]+=$3; k1s[$1]; k2s[$2]}
END{for(k1 in k1s)
{print k1":";
for(k2 in k2s) if(a[k1,k2]) print k2, a[k1,k2]; print ""}}' file
Mike:
euro 4
dollar 15
Fred:
euro 14

Related

How to delete lines in file1 based on column match with file2

I have 2 files; file1 and file2. File1 has many lines/rows and columns. File2 has just one column, with several lines/rows. All of the strings in file2 are found in file1. I want to create a new file (file3), such that the lines in file1 that contain the any of the strings in file2 are deleted.
For example,
File1:
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
File2:
083
121
Desired file3:
Rick has 241 cars
John won 505 dollars
Note that I do not want to enter the strings in file 2 into a command manually (the actual files are much larger than in the example).
Thanks!
awk approach:
awk 'BEGIN{p=""}FNR==NR{if(!/^$/){p=p$0"|"} next} $0!~substr(p, 1, length(p)-1)' file2 file1 > file3
p="" the variable treated as pattern containing all column values from file2
FNR==NR ensures that the next expression is performed for the first input file i.e. file2
if(!/^$/){p=p$0"|"} means: if it's not an empty line !/^$/ (as it could be according to your input) concatenate pattern parts with | so it eventually will look like 083|121|
$0!~substr(p, 1, length(p)-1) - checks if a line from the second input file(file1) is not matched with pattern(i.e. file2 column values)
The file3 contents:
Rick has 241 cars
John won 505 dollars
grep suites your purpose better than a line editor
grep -v -f File2 File1 >File3
Try this -
#cat f1
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
#cat f2
083
121
#grep -vwf f2 f1
Rick has 241 cars
John won 505 dollars

Replace single character in string with new line

I'm trying to edit some text files by replacing single characters with a new line.
Before:
Bill Fred Jack L Max Sam
After:
Bill Fred Jack
Max Sam
This is the closest I have gotten, but the single character is not always going to be 'L'.
cat File.txt | tr "L" "\n
you can try this
sed "s/\s\S\s/\n/g" File.txt
explanation,
You want to convert any word formed by a single character in special character: line break \n,
\s : Space and tab
\S : Non-whitespace characters
bash-4.3$ cat file.txt
1.Bill Fred Jack L Max Sam
2.Bill Fred Jack M Max Sam
3.Bill Fred Jack N Max Sam
bash-4.3$ sed 's/\s[A-Z]\s/\n/g' file.txt
1.Bill Fred Jack
Max Sam
2.Bill Fred Jack
Max Sam
3.Bill Fred Jack
Max Sam
sed "s/[[:blank:]][[:alpha:]][[:blank:]]/\
/g" YourFile
posix version
assuming that single letter is inside the string and not to the edge (start or end)

grep shows occurrences of pattern on a per line basis

From the input file:
I am Peter
I am Mary
I am Peter Peter Peter
I am Peter Peter
I want output to be like this:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
Where 1, 3 and 2 are occurrences of "Peter".
I tried this, but the info is not formatted the way I wanted:
grep -o -n Peter inputfile
This is not easily solved with grep, I would suggest moving "two tools up" to awk:
awk '$0 ~ FS { print NF-1, $0 }' FS="Peter" inputfile
Output:
1 I am Peter
3 I am Peter Peter Peter
2 I am Peter Peter
###Edit
To answer a question in the comments:
What if I want case insensitive? and what if I want multiple pattern
like "Peter|Mary|Paul", so "I am Peter peter pAul Mary marY John",
will yield the count of 5?
If you are using GNU awk, you do it by enabling IGNORECASE and setting the pattern in FS like this:
awk '$0 ~ FS { print NF-1, $0 }' IGNORECASE=1 FS="Peter|Mary|Paul" inputfile
Output:
1 I am Peter
1 I am Mary
3 I am Peter Peter Peter
2 I am Peter Peter
5 I am Peter peter pAul Mary marY John
You don’t need -o or -n. From grep --help:
-o, --only-matching show only the part of a line matching PATTERN
...
-n, --line-number print line number with output lines
Remove them and your output will be better. I think you’re misinterpreting -n -- it just shows the line number, not the occurrence count.
It looks like you’re trying to get the count of “Peter” appearances per line. You’d need something beyond a single grep for that. awk could be a good choice. Or you could loop over each each line to split into words (say an array) and grep -c the array for each line, to print the line’s count.

Shell script problems

So I'm doing some work on shell script. I have this code:
Echo "5 Matt male"
Echo "8 Sarah female"
Echo "9 Paul male"
I am meant to set a threshold number of 6 which will only output the lines whose numbers are above 6. Hence the lines containing sarah and Paul. But I have no idea on how to do this. Im so sorry but it is also meant to print only the ones that also contain "female"
your date need to be stored in file.txt.
file.txt:
5 Matt male
8 Sarah female
9 Paul male
cat file.txt | awk '{ if( $1 > 5 && $3=="female") print $0}'
If you don't know the usage of awk, take a look this http://cm.bell-labs.com/cm/cs/awkbook/

Cannot get this simple sed command

This sed command is described as follows
Delete the cars that are $10,000 or more. Pipe the output of the sort into a sed to do this, by quitting as soon as we match a regular expression representing 5 (or more) digits at the end of a record (DO NOT use repetition for this):
So far the command is:
$ grep -iv chevy cars | sort -nk 5
I have to add another pipe at the end of that command I think which "quits as soon as we match a regular expression representing 5 or more digits at the end of a record"
I tried things like
$ grep -iv chevy cars | sort -nk 5 | sed "/[0-9][0-9][0-9][0-9][0-9]/ q"
and other variations within the // but nothing works! What is the command which matches a regular expression representing 5 or more digits and quits according to this question?
Nominally, you should add a $ before the second / to match 5 digits at the end of the record. If you omit the $, then any sequence of 5 digits will cause sed to quit, so if there is another number (a VIN, perhaps) before the price, it might match when you didn't intend it to.
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/q'
On the whole, it's safer to use single quotes around the regex, unless you need to substitute a shell variable into it (or unless the regex contains single quotes itself). You can also specify the repetition:
grep -iv chevy cars | sort -nk 5 | sed '/[0-9]\{5,\}$/q'
The \{5,\} part matches 5 or more digits. If for any reason that doesn't work, you might find you're using GNU sed and you need to do something like sed --posix to get it working in the normal mode. Or you might be able to just remove the backslashes. There certainly are options to GNU sed to change the regex mechanism it uses (as there are with GNU grep too).
Another way.
As you don't post a file sample, a did it as a guess.
Here I'm looking for lines with the word "chevy" where the field 5 is less than 10000.
awk '/chevy/ {if ( $5 < 10000 ) print $0} ' cars
I forgot the flag -i from grep ... so the correct is:
awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
$ cat > cars
Chevy 2 3 4 10000
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 10000
CHEVY 2 3 4 2000
Prevy 2 3 4 1000
Prevy 2 3 4 10000
$ awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 2000
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/d'

Resources