Substitution of a pattern with sed

I have a string
ATOM 2448 N LEU 301 -6.821 36.580 65.884 1.00 19.70 O
Here I need to substitute whatever term is in the fourth column (it is always three capital letters) with NHE.
I tried to use
sed -e 's/[[:upper:]][[:upper:]][[:upper:]]/NHE/g'
but it also substitutes part of the word in the first column:
NHEM 2448 N NHE 301 -6.821 36.580 65.884 1.00 19.70 O
How can I ask sed to substitute only a word that consists of exactly 3 letters (not more than 3)?
Thanks!

This might work for you (GNU sed):
sed 's/\S\+/NHE/4' file
Replace the 4th non-empty column with NHE
An alternative:
sed 's/\S\S*/NHE/4' file

sed -r 's/(([^[:blank:]]+[[:blank:]]+){3})\<[[:upper:]]{3}\>/\1NHE/' file
ATOM 2448 N NHE 301 -6.821 36.580 65.884 1.00 19.70 O
Some systems require -E instead of -r
With GNU sed, you can specify which match gets replaced. Here, you want to replace the 4th whitespace-separated word:
sed -r 's/[^[:blank:]]+/NHE/4' file
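As a quick check of the numbered-match approach, here is a minimal sketch that runs it on the sample line from the question. It spells the non-blank run out as a POSIX bracket expression so it does not rely on GNU extensions; the variable names line and fourth are only illustrative.

```shell
# Sample record from the question (illustrative variable names).
line='ATOM 2448 N LEU 301 -6.821 36.580 65.884 1.00 19.70 O'

# Replace only the 4th run of non-blank characters; the numeric flag 4
# selects which match on the line is substituted.
fourth=$(printf '%s\n' "$line" | sed 's/[^[:blank:]][^[:blank:]]*/NHE/4')

printf '%s\n' "$fourth"
```

Because the match is selected by position rather than by content, the first column is left alone even though it consists of capital letters.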

Related

GNU Awk - don't modify whitespaces

I am using GNU Awk to replace a single character in a file. The file is a single line with varying whitespacing between "fields". After passing through gawk all the extra whitespacing is removed and I end up with single spaces. This is completely unintended and I need it to ignore these spaces and only change the one character I have targeted. I have tried several variations, but I cannot seem to get gawk to ignore these extra spaces.
Since I know this will come up, I read from the end of the line for replacement because the whitespacing is arbitrary/inconsistent in the source file.
Command:
gawk -i inplace -v new=3 'NF {$(NF-5) = new} 1' ~/scripts/tmp_beta_weather_file
Original file example:
2020-07-01 18:29:51.00 C M -11.4 28.9 29 9 23 5.5 000 0 0 00020 044013.77074 1 1 1 3 0 0
Result after command above:
2020-07-01 18:30:51.00 C M -11.8 28.8 29 5 23 5.5 000 0 0 00020 044013.77143 3 1 1 3 0 0
It might be easier with sed:
sed -E 's/([^ ]+)(( [^ ]+){5})$/3\2/' file
Test it first, then add -i for in-place editing.
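If the spacing between the trailing fields also varies, a pattern that expects single spaces will not match. A minimal sketch that tolerates runs of spaces and keeps them intact, assuming the fields are separated by spaces only (no tabs); the sample line and the variable names are made up for illustration:

```shell
# Made-up sample with irregular spacing (assumption: separators are spaces only).
line='2020-07-01  18:29:51.00   C M    1 1  1 3   0 0'

# Match the 6th field from the end, capture the five fields after it together
# with their original spacing, and put that captured tail back unchanged.
result=$(printf '%s\n' "$line" | sed -E 's/[^ ]+(( +[^ ]+){5})$/3\1/')

printf '%s\n' "$result"
```

Only the targeted field changes; every run of spaces before and after it is preserved because sed never splits and rejoins the line the way awk does when a field is assigned.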

sed: filter string subset from lines matching regexp

I have a file of the following format:
abc: A B C D E
abc: 1 2 3 4 5
def D E F G H
def: 10 11 12 23 99
...
That is, the first line, with strings after ':', is a header for the next line, which holds numbers. I'd like to use sed to extract only the lines that start with a PATTERN string and contain numbers.
The number of numbers in a line is variable, but assume that I know exactly how many I'm expecting, so I tried this command:
% sed 's/^abc: \([0-9]+ [0-9]+ [0-9]+\)$/\1/g' < file.txt
But it dumps all entries from the file. What am I doing wrong?
1. sed does substitutions and prints each line, whether a substitution happens or not.
2. Your regular expression is wrong. It would match only three numbers separated by spaces if the extended regex flag (-E) were given. Without it, not even that, because the + sign is interpreted literally.
The best approach here is to use an address and print only the lines that match:
sed -nE '/^abc: [0-9]+ [0-9]+ [0-9]+ [0-9]+ [0-9]+$/p' < file.txt
or better,
sed -nE '/^abc:( [0-9]+){5}$/p' < file.txt
The -n flag disables the "print all lines" behavior of sed described in (1). Only the lines that reach the p command will be printed.
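To see the two behaviours side by side, here is a small sketch using the sample data from the question (the variable names are illustrative). In a basic regular expression the + is an ordinary character, so the substitution never matches and every line is printed unchanged; with -n and an address, only the matching line comes out.

```shell
# Sample data from the question.
input=$(printf '%s\n' 'abc: A B C D E' 'abc: 1 2 3 4 5' 'def D E F G H' 'def: 10 11 12 23 99')

# No -n: sed prints every line, and the basic-regex pattern (literal +) matches nothing.
unfiltered=$(printf '%s\n' "$input" | sed 's/^abc: \([0-9]+ [0-9]+ [0-9]+\)$/\1/')

# -n plus an address and p: only lines matching the extended regex are printed.
filtered=$(printf '%s\n' "$input" | sed -nE '/^abc:( [0-9]+){5}$/p')

printf '%s\n' "$filtered"
```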
"to extract only a line starting with PATTERN string with numbers in the line" and "Number of numbers in a line is variable" together mean at least one number, so:
$ sed -n '/abc: \([0-9]\+\)/p' file
Output:
abc: 1 2 3 4 5
With exactly 5 numbers, anchor the end of the line as well (otherwise lines with more than 5 numbers also match):
$ sed -n '/abc: \([0-9]\+\( \|$\)\)\{5\}$/p' file
With @Mark's additional question in a comment, "If I want to just extract the matched numbers (and remove prefix, e.g, abc)…", this is the pattern I came up with:
sed -En 's/^abc: (([0-9]+[ \t]?)+)[ \t]*$/\1/gp' file.txt
I'm using the -E flag for extended regular expressions to avoid all the escaping that would be needed.
Given this file:
abc: A B C D E
abc: 1 2 3 4 5
abc: 1 c9 A 7f
def D E F G H
def: 10 11 12 23 99
… this regex matches abc: 1 2 3 4 5 while excluding abc: 1 c9 A 7f; it also allows variable whitespace and trailing whitespace.
With any sed:
$ sed -n 's/^abc: \([0-9 ]*\)$/\1/p' file
1 2 3 4 5
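For comparison, a hedged awk sketch of the same extraction: it keeps lines where abc: is followed only by space-separated numbers, then strips the prefix. The sample data mirrors the file shown earlier; the variable names are illustrative, and the pattern assumes single spaces between the numbers.

```shell
# Sample data modelled on the file above, including a line with non-numeric tokens.
input=$(printf '%s\n' 'abc: A B C D E' 'abc: 1 2 3 4 5' 'abc: 1 c9 A 7f' 'def: 10 11 12 23 99')

# Select lines made of "abc:" plus one or more " <digits>" groups,
# remove the "abc: " prefix, and print what is left.
numbers=$(printf '%s\n' "$input" | awk '/^abc:( [0-9]+)+$/ { sub(/^abc: /, ""); print }')

printf '%s\n' "$numbers"
```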

How to remove n characters from a specific column using sed/awk/perl

I have the following tab delimited data:
chr1 3119713 3119728 MA05911Bach1Mafk 839 +
chr1 3119716 3119731 MA05011MAFNFE2 860 +
chr1 3120036 3120051 MA01502Nfe2l2 866 +
What I want to do is to remove the first 7 characters from the 4th column.
Resulting in
chr1 3119713 3119728 Bach1Mafk 839 +
chr1 3119716 3119731 MAFNFE2 860 +
chr1 3120036 3120051 Nfe2l2 866 +
How can I do that?
Note the output needs to be also TAB separated.
I'm stuck with the following code, which removes characters from the start of the first column, which I don't want:
sed 's/^.\{7\}//' myfile.txt
awk 'BEGIN { FS = OFS = "\t" } { $4 = substr($4, 8); print }' myfile.txt
perl -anE'$F[3] =~ s/.{7}//; say join "\t", @F' data.txt
or
perl -anE'substr $F[3],0,7,""; say join "\t", @F' data.txt
With sed
$ sed -E 's/^(([^\t]+\t){3}).{7}/\1/' myfile.txt
chr1 3119713 3119728 Bach1Mafk 839 +
chr1 3119716 3119731 MAFNFE2 860 +
chr1 3120036 3120051 Nfe2l2 866 +
-E use extended regular expressions, to avoid having to use \ for (){}. Some sed versions might need -r instead of -E
^(([^\t]+\t){3}) capture the first three columns, easy to change number of columns if needed
.{7} characters to delete from 4th column
\1 the captured columns
Use -i option for in-place editing
With perl you can use \K for variable length positive lookbehind
perl -pe 's/^([^\t]+\t){3}\K.{7}//' myfile.txt
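The \t escape inside a sed regex is not guaranteed by POSIX. Where it is not available, one option is to put a literal tab into a shell variable and interpolate it into the script; a minimal sketch under that assumption (tab, row and trimmed are illustrative names):

```shell
# A literal tab character; command substitution only strips trailing newlines.
tab=$(printf '\t')

# One tab-separated row from the question.
row=$(printf 'chr1\t3119713\t3119728\tMA05911Bach1Mafk\t839\t+')

# Keep the first three tab-terminated columns (\1) and drop the next 7 characters.
trimmed=$(printf '%s\n' "$row" | sed -E "s/^(([^${tab}]+${tab}){3}).{7}/\\1/")

printf '%s\n' "$trimmed"
```

The script is in double quotes so the shell can expand ${tab}; the backslash of the backreference is doubled so that sed still receives \1.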

how to delete all lines that match a pattern asking permission in vi

Hello, I'm new to vi and I have a problem making vi ask me for permission before deleting each line that matches a pattern. My file looks like this:
SEQRES 1 A 46 GLY SER GLU ALA ARG GLU CYS VAL ASN CYS GLY ALA THR
SEQRES 2 A 46 ALA THR PRO LEU TRP ARG ARG ASP ARG THR GLY HIS TYR
SEQRES 3 A 46 LEU CYS ASN ALA CYS GLY LEU TYR HIS LYS MET ASN GLY
SEQRES 4 A 46 GLN ASN ARG PRO LEU ILE ARG
I want to delete all the lines that contain the string 'GLY'.
This is what I came up with:
:g/GLY/cd
but it's definitely wrong.
Only the :substitute command has the confirm flag. However, if you use a regular expression that matches the entire line (including the trailing newline), you can use that to delete entire lines, with confirmation:
:%s/.*GLY.*\n//c
Alternatively, you could build your own confirmation into :global; here's a simple one that you have to answer with either Enter or Esc:
:g/GLY/if confirm('Delete: ' . getline('.')) | delete _ | endif
[Use the power of replace :%s]
To confirm delete all lines containing vim,
:g/vim/s/.*//gc [This confirm-replace all matching lines with blanks]
:g/^$/d [This deletes all the blank lines]
Bonus: To confirm delete all lines that 'begin' with vim,
:g/^vim/s/.*//gc
:g/^$/d

how to remove only the first two leading spaces in all lines of a file

My input file looks like this:
*CONTROL_ADAPTIVE
$ adpfreq adptol adpopt maxlvl tbirth tdeath lcadp ioflag
0.10 5.000 2 3 0.0 0.0 0 0
I just want to remove the 2 leading spaces from every line.
I used
sed "s/^[ \t]*//" -i inputfile.txt
but it deletes all the leading whitespace from every line. I just want to shift the complete text in the file two positions to the left.
Any solutions to this?
You can specify that you want to delete two matches of the character set in the brackets:
sed -r -i "s/^[ \t]{2}//" inputfile.txt
See the output:
$ sed -r "s/^[ \t]{2}//" file
*CONTROL_ADAPTIVE
$ adpfreq adptol adpopt maxlvl tbirth tdeath lcadp ioflag
0.10 5.000 2 3 0.0 0.0 0 0
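A variant that sticks to basic regular expressions, so it needs neither -r nor the \t escape; a small sketch on made-up lines with different indentation (sample and shifted are illustrative names):

```shell
# Made-up lines indented by 2 and 6 spaces respectively.
sample=$(printf '%s\n' '  *CONTROL_ADAPTIVE' '      0.10     5.000')

# Remove exactly two leading blanks (spaces or tabs); deeper indentation keeps the rest.
shifted=$(printf '%s\n' "$sample" | sed 's/^[[:blank:]]\{2\}//')

printf '%s\n' "$shifted"
```

Lines that start with fewer than two blanks are left untouched, since the pattern requires both of them to be present.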

Resources