sed: How do I replace all lines containing a certain string?

sed: How do I replace all lines containing a certain string? - string

Say I have a file containing these lines:
tomatoes
bananas
tomatoes with apples
pears
pears with apples
How do I replace every line containing the word "apples" with just "oranges"? This is what I want to end up with:
tomatoes
bananas
oranges
pears
oranges

Use sed 's/.*apples.*/oranges/' ?

you can use awk
$ awk '/apples/{$0="orange"}1' file
tomatoes
bananas
orange
pears
orange
this says to search for apples, then change the whole record to "orange".

Related

Understanding the sed N command

The sed manual states about the N command:
N
Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands.
Now, from what I know, sed reads each line of input, applies the script(s) on it, and (if -n is not specified) prints it as the output stream.
So, given the following example file:
Apple
Banana
Grapes
Melon
Running this:
$ sed '/Banana/N;p'
From what I understand, sed should process each line of input: Apple, Banana, Grapes and Melon.
So, I would think that the output will be:
Apple
Apple
Banana
Grapes # since it read the next line with N
Banana
Grapes
Grapes (!)
Grapes (!)
Melon
Melon
Explanation:
Apple is read to the pattern space. it doesn't match Banana regex, so only p is applied. It's printed twice: once for the p command, and once because sed prints the pattern space by default.
Next, Banana is read to the pattern space. It matches the regex, so that the N command is applied: so it reads the next line Grapes to the pattern space, and then p prints it: Banana\nGrapes. Next, the pattern space is printed again due to the default behavior.
Now, I would expect that Grapes will be read to the pattern space, so that Grapes will be printed twice, same as for Apple and Melon.
But in reality, this is what I get:
Apple
Apple
Banana
Grapes
Banana
Grapes
Melon
Melon
It seems that once Grapes was read as part of the N command that was applied to Banana, it will no longer be read as a line of its own.
Is that so? and if so, why isn't it emphasized in the docs?

This might explain it (GNU sed):
sed '/Banana/N;p' file --d
SED PROGRAM:
/Banana/ N
p
INPUT: 'file' line 1
PATTERN: Apple
COMMAND: /Banana/ N
COMMAND: p
Apple
END-OF-CYCLE:
Apple
INPUT: 'file' line 2
PATTERN: Banana
COMMAND: /Banana/ N
PATTERN: Banana\nGrapes
COMMAND: p
Banana
Grapes
END-OF-CYCLE:
Banana
Grapes
INPUT: 'file' line 4
PATTERN: Melon
COMMAND: /Banana/ N
COMMAND: p
Melon
END-OF-CYCLE:
Melon
Where --d is short for --debug
You will see the INPUT: lines go 1,2,4 because the second cycle also grabs input line 3 with the N command.

To debug this, I added = to your script so that you can see what's being emitted at each iteration. The line numbers conveniently demarcate the output from each. Then to identify the default print action at the end of each iteration, I added s/.*/==&==/ so you can see what was printed by sed because you did not specify -n.
sed '/Banana/N;=;p;=;s/.*/==&==/' <<\:
> Apple
> Banana
> Grapes
> Melon
> :
1
Apple
1
==Apple==
3
Banana
Grapes
3
==Banana
Grapes==
4
Melon
4
==Melon==
So, the pattern space containing Banana and Grapes was printed twice, and the first and last lines were printed in isolation twice.

How to delete in file 1 which is not in file 2

I have 2 files; file1 and file2. File1 has many lines/rows and columns. File2 has just one column, with several lines/rows. All of the strings in file2 are found in file1. I want to create a new file (file3),
For example,
File1:
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
File2:
083
121
Desired file3:
Sally ate 083 popcorn
Bruce knows 121 people

Just use grep -f:
$ cat file1
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
$ cat file2
083
121
$ grep -f file2 file1
Sally ate 083 popcorn
Bruce knows 121 people
To save the output in file3:
grep -f file2 file1 > file3

diff 2 files with an output that does not include extra lines

I have 2 files test and test1 and I would like to do a diff between them without the output having extra characters 2a3, 4a6, 6a9 as shown below.
mangoes
apples
banana
peach
mango
strawberry
test1:
mangoes
apples
blueberries
banana
peach
blackberries
mango
strawberry
star fruit
when I diff both the files
$ diff test test1
2a3
> blueberries
4a6
> blackberries
6a9
> star fruit
How do I get the output as
$ diff test test1
blueberries
blackberries
star fruit

A solution using comm:
comm -13 <(sort test) <(sort test1)
Explanation
comm - compare two sorted files line by line
With no options, produce three-column output. Column one contains
lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files
As we only need the lines unique to the second file test1, -13 is used to suppress the unwanted columns.
Process Substitution is used to get the sorted files.

You can use grep to filter out lines that are not different text:
$ diff file1 file2 | grep '^[<>]'
> blueberries
> blackberries
> star fruit
If you want to remove the direction indicators that indicate which file differs, use sed:
$ diff file1 file2 | sed -n 's/^[<>] //p'
blueberries
blackberries
star fruit
(But it may be confusing to not see which file differs...)

You can use awk
awk 'NR==FNR{a[$0];next} !($0 in a)' test test1
NR==FNR means currently first file on the command line (i.e. test) is being processed,
a[$0] keeps each record in array named a,
next means read next line without doing anything else,
!($0 in a) means if current line does not exist in a, print it.

File Manipulation in UNIX - Creating duplicate records based on the count from a column and manipulating only one column

I have a space delimited file.txt with n number of columns. 3rd column in the file.txt is comma delimited, and I want to create duplicate records in same file.txt based on the number of counts in column(n=3) by splitting the comma delimited column with each value.
--file.txt
I have 0,1,2,3 apples
I have 2,3 bananas
I have 3 oranges
--desiredoutput.txt
I have 0 apples
I have 1 apples
I have 2 apples
I have 3 apples
I have 2 bananas
I have 3 bananas
I have 3 oranges

Awk solution:
awk '$3~/,/{
split($3, a, ","); f=$1 OFS $2; sub(/^[^ ]+ +[^ ]+ +[^ ]+/,"");
for (i in a) print f,a[i] $0; next
}1' file
The output:
I have 0 apples
I have 1 apples
I have 2 apples
I have 3 apples
I have 2 bananas
I have 3 bananas
I have 3 oranges

How to delete lines in file1 based on column match with file2

I have 2 files; file1 and file2. File1 has many lines/rows and columns. File2 has just one column, with several lines/rows. All of the strings in file2 are found in file1. I want to create a new file (file3), such that the lines in file1 that contain the any of the strings in file2 are deleted.
For example,
File1:
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
File2:
083
121
Desired file3:
Rick has 241 cars
John won 505 dollars
Note that I do not want to enter the strings in file 2 into a command manually (the actual files are much larger than in the example).
Thanks!

awk approach:
awk 'BEGIN{p=""}FNR==NR{if(!/^$/){p=p$0"|"} next} $0!~substr(p, 1, length(p)-1)' file2 file1 > file3
p="" the variable treated as pattern containing all column values from file2
FNR==NR ensures that the next expression is performed for the first input file i.e. file2
if(!/^$/){p=p$0"|"} means: if it's not an empty line !/^$/ (as it could be according to your input) concatenate pattern parts with | so it eventually will look like 083|121|
$0!~substr(p, 1, length(p)-1) - checks if a line from the second input file(file1) is not matched with pattern(i.e. file2 column values)
The file3 contents:
Rick has 241 cars
John won 505 dollars

grep suites your purpose better than a line editor
grep -v -f File2 File1 >File3

Try this -
#cat f1
Sally ate 083 popcorn
Rick has 241 cars
John won 505 dollars
Bruce knows 121 people
#cat f2
083
121
#grep -vwf f2 f1
Rick has 241 cars
John won 505 dollars

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string