sed command to print lines between two patterns - linux

I am trying to print the lines between two patterns with the sed command, but I want to include the line containing Pattern1 in the result and exclude the line containing Pattern2.
For example:
/PAT1/
line 1
line 2
line 3
/PAT2/
The desired output is:
/PAT1/
line 1
line 2
line 3
I have tried this:
sed -n '/PAT1/,/PAT2/{/PAT2/{d};p}' Input_File
But it is excluding both patterns.

You can do it with awk: awk '/patt1/{flag=1}/patt2/{flag=0}flag' input_file
If input_file is:
111
222
333
444
555
awk '/222/{flag=1}/444/{flag=0}flag' input_file
gives:
222
333
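If you want to stick with sed, one approach is to print the range but filter out the line matching the closing pattern; something like this should give the desired output:
sed -n '/PAT1/,/PAT2/{/PAT2/!p;}' Input_File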

Related

How to replace two lines with a blank line using SED command?

I want to replace the first two lines with a blank line as below.
Input:
sample
sample
123
234
235
456
Output:
(blank line)
123
234
235
456
Delete the first line, and remove all the content from the second line without deleting the line itself:
$ sed -e '1d' -e '2s/.*//' input.txt

123
234
235
456
Or insert a blank line before the first, and delete the first two lines:
$ sed -e '1i\
' -e '1,2d' input.txt

123
234
235
456
Or use tail instead of sed to print all lines starting with the third, with an echo first to produce the blank line:
(echo ""; tail -n +3 input.txt)
Or if you're trying to modify a file in place, use ed instead:
ed -s input.txt <<EOF
1,2c

.
w
EOF
(The c command changes the given range of lines to new content; here the new content is a single blank line, terminated by a lone . on its own line.)
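If your sed supports in-place editing (GNU sed shown here; BSD/macOS sed wants -i '' instead), the first command above can also modify the file directly:
sed -i -e '1d' -e '2s/.*//' input.txt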

How to use awk '{print $1*Number}' from the second line or telling it to ignore NaN values?

I have a file called 'waterproofposters.jsonl' with this type of output:
Regular price
100
200
300
400
500
And I need to take out 2% of each value. I have used the following code:
awk '{print $1*0.98}' waterproofposters.jsonl
And then I have the following output:
0
98
196
294
392
490
And then I'm stuck because I need to have 'Regular price' on the first line instead of '0'.
I thought to replace '0' with 'Regular price' using:
find . -name "waterproof.jsonl" | xargs sed -i -e 's/0/Regular price/g'
But it will replace all the '0's with 'Regular price'.
To print the first line as-is:
awk '{print (NR>1 ? $0*0.98 : $0)}'
To print lines that are not a number as-is:
awk '{print ($0+0 == $0 ? $0*0.98 : $0)}'
I'm using $0 instead of $1 in the multiplication because:
- they're the same thing in your numeric input,
- I aesthetically prefer using the same value across the whole script rather than different values for the numeric vs non-numeric lines, and
- when you use a specific field it causes awk to do field splitting, so it's a bit more efficient not to reference a field when the whole record will do.
Here are both of the above working on the posted sample input:
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
and here's the difference between the two, given an input file with a non-numeric value in the middle:
$ cat file
Regular price
100
200
foobar
400
500
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
0
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
foobar
392
490
You can certainly achieve what you need with a single awk call, but the reason your sed -i -e 's/0/Regular price/g' command did not work as expected is that you used 0 as the regex pattern, and 0 matches a zero character anywhere in the string.
You want to replace only those 0s that are the only character on a line.
Hence, you need the ^ and $ anchors to match the start and end of the line respectively:
sed -i 's/^0$/Regular price/'
If you need to replace on the first line only, add the 1 address before the substitution command:
sed -i '1 s/^0$/Regular price/'
Note you do not need g, since you only expect one replacement per line and g is only needed when performing multiple replacements on a line. By default, all lines will get processed.
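A quick way to see the difference (printf just supplies a zero-only line and a line that merely contains zeros):
$ printf '0\n100\n' | sed 's/0/Regular price/g'
Regular price
1Regular priceRegular price
$ printf '0\n100\n' | sed 's/^0$/Regular price/'
Regular price
100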
How to use awk '{print $1*Number}' from the second line or telling it to ignore NaN values?
I would do it the following way using GNU AWK. Let file.txt content be
Regular price
100
200
300
400
500
then
awk 'NR==1{print}NR>=2{print $1*0.98}' file.txt
output
Regular price
98
196
294
392
490
Explanation: if it is the 1st line, just print it; if it is the 2nd or a later line, print 0.98 times the 1st column's value.
(tested in GNU Awk 5.0.1)

Read a file for specific string and read lines after the match

I have a file which looks like:
AA
2
3
4
CCC
111
222
333
XXX
12
23
34
I am looking for an awk command to search for the string 'CCC' above and print all the lines that occur after 'CCC', but stop reading as soon as I reach 'XXX'.
A very simple command does the reading for me but does not stop at XXX:
awk '$0 == "CCC" {i=1;next};i && i++' c.out
Could you please try the following.
Solution 1: with sed.
sed -n '/CCC/,/XXX/p' Input_file
Solution 2: with awk.
awk '/CCC/{flag=1} flag; /XXX/{flag=""}' Input_file
Solution 3: in case you want to print from string CCC to XXX but exclude both strings, do the following.
awk '/CCC/{flag=1;next} /XXX/{flag=""} flag' Input_file
"Do something between this and that" can easily be solved with a range pattern:
awk '/CCC/,/XXX/' # prints everything between CCC and XXX (inclusive)
But it's not exactly what you've asked. You wanted to print everything after CCC and quit (stop reading) on XXX. This translates to
awk '/XXX/{exit};f;/CCC/{f=1}'
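With the sample input from the question saved as c.out, that gives:
$ awk '/XXX/{exit};f;/CCC/{f=1}' c.out
111
222
333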

Reorder Lines Based On Previous File Order Before Randomization

I have the following lines in file1:
line 1text
line 2text
line 3text
line 4text
line 5text
line 6text
line 7text
With the command cat file1 | sort -R | head -4 I get the following in file2:
line 5text
line 1text
line 7text
line 2text
I would like to order the lines (not numerically, just the same order as file1) into the following file3:
line 1text
line 2text
line 5text
line 7text
The actual data doesn't have digits. Is there an easy way to do this? I was thinking of doing a grep and finding the first instance in a loop, but I'm sure you experienced folks know an easier solution. Your positive input is highly appreciated.
You can decorate with line numbers, select four random lines, sort by line number, and remove the line numbers:
$ nl -b a file1 | shuf -n 4 | sort -n -k 1,1 | cut -f 2-
line 2text
line 5text
line 6text
line 7text
The -b a option to nl makes sure that empty lines are numbered too.
Notice that this loads all of file1 into memory, as pointed out by ghoti. To avoid that (and as a generally smarter solution), we can use a different feature of (GNU) shuf: its -i option takes a number range and treats each number as a line. To get four random line numbers from an input file file1, we can use
shuf -n 4 -i 1-$(wc -l < file1)
Now, we have to print exactly these lines. Sed can do that; we just turn the output of the previous command into a sed script and run sed with sed -n -f -. All together:
shuf -n 4 -i 1-$(wc -l < file1) | sort -n | sed 's/$/p/;$s/p/{&;q}/' |
sed -n -f - file1
sort -n sorts the line numbers numerically. This isn't strictly needed, but if we know that the highest line number comes last, we can quit sed afterwards instead of reading the rest of the file for nothing.
sed 's/$/p/;$s/p/{&;q}/' appends p to each line; on the last line, that p is turned into {p;q} to stop processing the file.
If the output from sort looks like
27
774
670
541
then the sed command turns it into
27p
774p
670p
541{p;q}
sed -n -f - file1 processes file1, using the output of the above sed command as the instructions for sed. -n suppresses output for the lines we don't want.
The command can be parametrized and put into a shell function, taking the file name and the number of lines to print as arguments:
randlines () {
    fname=$1
    nlines=$2
    shuf -n "$nlines" -i 1-$(wc -l < "$fname") | sort -n |
        sed 's/$/p/;$s/p/{&;q}/' | sed -n -f - "$fname"
}
to be used like
randlines file1 4
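For example, one possible run (the selection is random, so your four lines will differ, but they always come out in file1's original order):
$ randlines file1 4
line 1text
line 3text
line 4text
line 6text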
cat can add line numbers:
$ cat -n file
1 line one
2 line two
3 line three
4 line four
5 line five
6 line six
7 line seven
8 line eight
9 line nine
So you can use that to decorate, shuffle, pick four, and restore the order (the undecorate step is shown just below):
$ cat -n file | sort -R | head -4 | sort -n
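Since cat -n separates the number from the text with a tab, a single cut -f2- removes the decoration again, completing the cat-based variant:
$ cat -n file | sort -R | head -4 | sort -n | cut -f2-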
You can also use awk to decorate with a random number and line index (if your sort lacks -R, like on OS X):
$ awk '{print rand() "\t" FNR "\t" $0}' file | sort -n | head -4
0.152208 4 line four
0.173531 8 line eight
0.193475 6 line six
0.237788 1 line one
Then sort by the line numbers and remove the decoration (one or two columns, depending on whether you used cat or awk to decorate):
$ awk '{print rand() "\t" FNR "\t" $0}' file | sort -n | head -4 | cut -f2- | sort -n | cut -f2-
line one
line four
line six
line eight
Another solution could be to sort the whole file:
sort file1 -o file2
then pick random lines from file2:
shuf -n 4 file2 -o file3

Slice 3TB log file with sed, awk & xargs?

I need to slice several TB of log data, and would prefer the speed of the command line.
I'll split the file up into chunks before processing, but need to remove some sections.
Here's an example of the format:
uuJ oPz eeOO 109 66 8
uuJ oPz eeOO 48 0 221
uuJ oPz eeOO 9 674 3
kf iiiTti oP 88 909 19
mxmx lo uUui 2 9 771
mxmx lo uUui 577 765 27878456
The gaps between the first 3 alphanumeric strings are spaces. Everything after that is tabs. Lines are separated with \n.
I want to keep only the last line in each group.
If there's only 1 line in a group, it should be kept.
Here's the expected output:
uuJ oPz eeOO 9 674 3
kf iiiTti oP 88 909 19
mxmx lo uUui 577 765 27878456
How can I do this with sed, awk, xargs and friends, or should I just use something higher level like Python?
awk -F '\t' '
NR==1   {key=$1}                 # initialize the key from the first line
$1!=key {print line; key=$1}     # new key: print the last line of the previous group
        {line=$0}                # remember the current line
END     {print line}             # print the last line of the final group
' file_in > file_out
Try this:
awk 'BEGIN{FS="\t"}
{if($1!=prevKey) {if (NR > 1) {print lastLine}; prevKey=$1} lastLine=$0}
END{print lastLine}'
It saves the last line and prints it only when it notices that the key has changed.
This might work for you (GNU sed):
sed ':a;$!N;/^\(\S*\s\S*\s\S*\)[^\n]*\n\1/s//\1/;ta;P;D' file
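Roughly, it collapses each run of lines that share the same first three whitespace-separated fields down to the run's last line. The same logic spread over several lines with comments:
sed '
# label for the loop
:a
# append the next line to the pattern space, unless at end of input
$!N
# if both lines start with the same three fields, keep only the second
/^\(\S*\s\S*\s\S*\)[^\n]*\n\1/s//\1/
# after a successful substitution, loop to compare against the next line
ta
# otherwise print the first line of the pattern space
P
# delete it and restart the cycle with what remains
D
' file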
