I have a long text file that, somewhere near the end, contains a single line whose 3rd column is OXT:
ATOM 2439 O LEU 300 -4.699 34.599 65.335 1.00 83.23 O
ATOM 2440 N LEU 301 -6.822 33.898 65.057 1.00 19.70 N
ATOM 2441 CA LEU 301 -7.080 34.965 64.138 1.00 19.70 C
ATOM 2442 CB LEU 301 -8.165 34.630 63.101 1.00 19.70 C
ATOM 2443 CG LEU 301 -7.762 33.478 62.162 1.00 19.70 C
ATOM 2444 CD1 LEU 301 -8.849 33.207 61.110 1.00 19.70 C
ATOM 2445 CD2 LEU 301 -6.376 33.719 61.543 1.00 19.70 C
ATOM 2446 C LEU 301 -7.556 36.168 64.946 1.00 19.70 C
ATOM 2447 O LEU 301 -8.657 36.695 64.633 1.00 19.70 O
ATOM 2448 OXT LEU 301 -6.821 36.580 65.884 1.00 19.70 O
TER 2449 LEU 301
HETATM 2450 NA NA 302 -13.016 13.036 54.214 1.00 44.33 NA
HETATM 2451 O WAT 303 -18.411 13.587 59.094 1.00 27.41 O
HETATM 2452 O WAT 304 -11.894 17.279 58.575 1.00 18.35 O
HETATM 2453 O WAT 305 -15.811 12.728 54.157 1.00 39.81 O
I need to modify the line matching OXT as follows (see the example below): in the third column, substitute "OXT" with "N "; in the fourth column, substitute the residue name (LEU in this example) with NHE; in the last column, substitute O with N. Importantly, after the substitutions I need to keep the same number of spaces between the columns as in the rest of the file:
ATOM 2439 O LEU 300 -4.699 34.599 65.335 1.00 83.23 O
ATOM 2440 N LEU 301 -6.822 33.898 65.057 1.00 19.70 N
ATOM 2441 CA LEU 301 -7.080 34.965 64.138 1.00 19.70 C
ATOM 2442 CB LEU 301 -8.165 34.630 63.101 1.00 19.70 C
ATOM 2443 CG LEU 301 -7.762 33.478 62.162 1.00 19.70 C
ATOM 2444 CD1 LEU 301 -8.849 33.207 61.110 1.00 19.70 C
ATOM 2445 CD2 LEU 301 -6.376 33.719 61.543 1.00 19.70 C
ATOM 2446 C LEU 301 -7.556 36.168 64.946 1.00 19.70 C
ATOM 2447 O LEU 301 -8.657 36.695 64.633 1.00 19.70 O
ATOM 2448 N NHE 301 -6.821 36.580 65.884 1.00 19.70 N
TER
HETATM 2450 NA NA 302 -13.016 13.036 54.214 1.00 44.33 NA
HETATM 2451 O WAT 303 -18.411 13.587 59.094 1.00 27.41 O
HETATM 2452 O WAT 304 -11.894 17.279 58.575 1.00 18.35 O
HETATM 2453 O WAT 305 -15.811 12.728 54.157 1.00 39.81 O
I have tried to use
awk '$3=="OXT"{ f=1; rn=NR; $3=$NF="N"; $4="NHE" }/TER/ && f && NR-rn == 1{ $0=$1 }1' file
It does the job, but in the modified line there is now a single space between the columns, which is the wrong format:
ATOM 2410 N NHE 299 -17.563 -15.711 -15.915 1.00 76.42 N
However, I need to keep the original spacing between the columns, as in the rest of the file:
ATOM 2448 N NHE 301 -6.821 36.580 65.884 1.00 19.70 N
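Quick and very dirty:
#!/bin/bash
# After the OXT line is rewritten, the next line (the TER record) is replaced by a bare "TER".
skip=0
while IFS= read -r line    # IFS= and -r keep the original spacing of each line
do
    third=$(echo "$line" | awk '{print $3}')
    if [ $skip -eq 1 ]
    then
        echo "TER"
        skip=0
        continue
    fi
    if [ "${third}" == "OXT" ]
    then
        # "N  " is as wide as "OXT", so the following columns stay aligned
        echo "${line}" | sed 's/OXT/N  /'
        skip=1
        continue
    fi
    echo "${line}"
done < /tmp/list
Of course, /tmp/list is the file with all the values. Note that this only rewrites the OXT field and the TER line; the residue name and the last column would need similar sed expressions.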
You can pipe the result of your command to the column command:
$>awk '$3=="OXT"{ f=1; rn=NR; $3=$NF="N"; $4="NHE" }/TER/ && f && NR-rn == 1{ $0=$1 }1' f|column -t
ATOM 2439 O LEU 300 -4.699 34.599 65.335 1.00 83.23 O
ATOM 2440 N LEU 301 -6.822 33.898 65.057 1.00 19.70 N
ATOM 2441 CA LEU 301 -7.080 34.965 64.138 1.00 19.70 C
ATOM 2442 CB LEU 301 -8.165 34.630 63.101 1.00 19.70 C
ATOM 2443 CG LEU 301 -7.762 33.478 62.162 1.00 19.70 C
ATOM 2444 CD1 LEU 301 -8.849 33.207 61.110 1.00 19.70 C
ATOM 2445 CD2 LEU 301 -6.376 33.719 61.543 1.00 19.70 C
ATOM 2446 C LEU 301 -7.556 36.168 64.946 1.00 19.70 C
ATOM 2447 O LEU 301 -8.657 36.695 64.633 1.00 19.70 O
ATOM 2448 N NHE 301 -6.821 36.580 65.884 1.00 19.70 N
TER
HETATM 2450 NA NA 302 -13.016 13.036 54.214 1.00 44.33 NA
HETATM 2451 O WAT 303 -18.411 13.587 59.094 1.00 27.41 O
HETATM 2452 O WAT 304 -11.894 17.279 58.575 1.00 18.35 O
HETATM 2453 O WAT 305 -15.811 12.728 54.157 1.00 39.81 O
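Keep in mind that column -t re-aligns every column to its widest value, which is not necessarily the fixed-width layout of the original file. Another option, sketched below, is to edit $0 with substr/sub instead of assigning to $3, because assigning to a field makes awk rebuild the whole record with single spaces. This is only a sketch: the function name fix_oxt is made up, and it assumes the atom name and the residue name are both 3 characters wide and that the element symbol O is the last thing on the line.

```shell
#!/bin/bash
# fix_oxt (hypothetical name): rewrite the OXT line in place, keeping the spacing.
fix_oxt() {
    awk '
    $3 == "OXT" {
        match($0, /OXT[ \t]+/)                       # atom name plus the gap after it
        gap  = substr($0, RSTART + 3, RLENGTH - 3)   # the original gap, untouched
        rest = substr($0, RSTART + RLENGTH + length($4))
        line = substr($0, 1, RSTART - 1) "N  " gap "NHE" rest
        sub(/O[ \t]*$/, "N", line)                   # element symbol at the end of the line
        print line
        hit = NR
        next
    }
    hit && NR == hit + 1 && /^TER/ { print "TER"; next }   # strip the TER record right after it
    { print }' "$@"
}
```

It can be used as fix_oxt file > file.new, or as a filter at the end of a pipeline.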
I'm attempting to add the number 128 to column 6 of every line in my file below_zn.pdb, which contains 128 lines and 12 columns separated by spaces (not tab delimited). When I use
awk '{ $6+=128; print }' below_zn.pdb
I am able to add 128 to column 6, but the formatting of my file changes. My output looks as follows:
ATOM 1 ZN ZN2 H 129 -13.264 34.400 10.700 1.00 0.00 HETA
ATOM 2 ZN ZN2 H 130 -13.264 25.273 10.700 1.00 0.00 HETA
ATOM 3 ZN ZN2 H 131 -13.264 43.527 10.700 1.00 0.00 HETA
ATOM 4 ZN ZN2 H 132 -13.264 52.654 10.700 1.00 0.00 HETA
ATOM 5 ZN ZN2 H 133 -13.175 29.836 14.467 1.00 0.00 HETA
ATOM 6 ZN ZN2 H 134 -13.175 38.963 14.467 1.00 0.00 HETA
ATOM 7 ZN ZN2 H 135 -13.175 48.090 14.467 1.00 0.00 HETA
ATOM 8 ZN ZN2 H 136 -13.175 57.217 14.467 1.00 0.00 HETA
ATOM 9 ZN ZN2 H 137 -10.679 34.400 -15.527 1.00 0.00 HETA
ATOM 10 ZN ZN2 H 138 -10.679 25.273 -15.527 1.00 0.00 HETA
ATOM 11 ZN ZN2 H 139 -10.679 43.527 -15.527 1.00 0.00 HETA
ATOM 12 ZN ZN2 H 140 -10.679 52.654 -15.527 1.00 0.00 HETA
ATOM 13 ZN ZN2 H 141 -10.590 29.836 -11.760 1.00 0.00 HETA
ATOM 14 ZN ZN2 H 142 -10.590 38.963 -11.760 1.00 0.00 HETA
ATOM 15 ZN ZN2 H 143 -10.590 48.090 -11.760 1.00 0.00 HETA
ATOM 16 ZN ZN2 H 144 -10.590 57.217 -11.760 1.00 0.00 HETA
ATOM 17 ZN ZN2 H 145 -9.288 34.400 1.958 1.00 0.00 HETA
ATOM 18 ZN ZN2 H 146 -9.288 25.273 1.958 1.00 0.00 HETA
ATOM 19 ZN ZN2 H 147 -9.288 43.527 1.958 1.00 0.00 HETA
ATOM 20 ZN ZN2 H 148 -9.288 52.654 1.958 1.00 0.00 HETA
I need to keep the formatting for my file to be useful. I have tried
awk -F'()' '{ $6+=128; print }' below_zn.pdb
but instead of adding the number 128 to all lines of column 6, I am seeing a new column at the farthest right made of the number 128 repeatedly. As seen below:
ATOM 1 ZN ZN2 H 1 -13.264 34.400 10.700 1.00 0.00 HETA 128
ATOM 2 ZN ZN2 H 2 -13.264 25.273 10.700 1.00 0.00 HETA 128
ATOM 3 ZN ZN2 H 3 -13.264 43.527 10.700 1.00 0.00 HETA 128
ATOM 4 ZN ZN2 H 4 -13.264 52.654 10.700 1.00 0.00 HETA 128
ATOM 5 ZN ZN2 H 5 -13.175 29.836 14.467 1.00 0.00 HETA 128
ATOM 6 ZN ZN2 H 6 -13.175 38.963 14.467 1.00 0.00 HETA 128
Is there a way I can use awk/sed/grep or any other command in Linux to add 128 to the numbers in column 6 while keeping the formatting of the original file, which looks as follows:
ATOM 1 ZN ZN2 H 1 -13.264 34.400 10.700 1.00 0.00 HETA
ATOM 2 ZN ZN2 H 2 -13.264 25.273 10.700 1.00 0.00 HETA
ATOM 3 ZN ZN2 H 3 -13.264 43.527 10.700 1.00 0.00 HETA
ATOM 4 ZN ZN2 H 4 -13.264 52.654 10.700 1.00 0.00 HETA
ATOM 5 ZN ZN2 H 5 -13.175 29.836 14.467 1.00 0.00 HETA
ATOM 6 ZN ZN2 H 6 -13.175 38.963 14.467 1.00 0.00 HETA
ATOM 7 ZN ZN2 H 7 -13.175 48.090 14.467 1.00 0.00 HETA
ATOM 8 ZN ZN2 H 8 -13.175 57.217 14.467 1.00 0.00 HETA
ATOM 9 ZN ZN2 H 9 -10.679 34.400 -15.527 1.00 0.00 HETA
ATOM 10 ZN ZN2 H 10 -10.679 25.273 -15.527 1.00 0.00 HETA
ATOM 11 ZN ZN2 H 11 -10.679 43.527 -15.527 1.00 0.00 HETA
ATOM 12 ZN ZN2 H 12 -10.679 52.654 -15.527 1.00 0.00 HETA
ATOM 13 ZN ZN2 H 13 -10.590 29.836 -11.760 1.00 0.00 HETA
ATOM 14 ZN ZN2 H 14 -10.590 38.963 -11.760 1.00 0.00 HETA
ATOM 15 ZN ZN2 H 15 -10.590 48.090 -11.760 1.00 0.00 HETA
ATOM 16 ZN ZN2 H 16 -10.590 57.217 -11.760 1.00 0.00 HETA
ATOM 17 ZN ZN2 H 17 -9.288 34.400 1.958 1.00 0.00 HETA
ATOM 18 ZN ZN2 H 18 -9.288 25.273 1.958 1.00 0.00 HETA
ATOM 19 ZN ZN2 H 19 -9.288 43.527 1.958 1.00 0.00 HETA
ATOM 20 ZN ZN2 H 20 -9.288 52.654 1.958 1.00 0.00 HETA
.
.
.
An important note is that column 7 through 9 can have up to 7 characters (whole number with a period followed by the decimal), and there is one space separating the columns.
My file has the following format
column 1 - 4 characters
1 space
column 2 - 1 character
1 space
column 3 - 2 characters
1 space
column 4 - 1 character
1 space
column 5 - 3 characters
1 space
column 6 - 1,2,or 3 characters
5 spaces
column 7 - up to 7 characters
1 space
column 8 - up to 7 characters
1 space
column 9 - up to 7 characters
2 spaces
column 10 - 4 characters
2 spaces
column 11 - 4 characters
6 spaces
column 12 - 4 characters
end of file
Thank you!
Assumptions:
input is using fixed-width spacing
white space only shows up as a column delimiter (ie, no column values contain white space)
the values in column 6 are left-justified
Adding a new row to demonstrate a wider value for column 6:
$ cat below_zn.pdb
ATOM 1 ZN ZN2 H 1 -13.264 34.400 10.700 1.00 0.00 HETA
ATOM 2 ZN ZN2 H 2 -13.264 25.273 10.700 1.00 0.00 HETA
ATOM 3 ZN ZN2 H 3 -13.264 43.527 10.700 1.00 0.00 HETA
ATOM 4 ZN ZN2 H 4 -13.264 52.654 10.700 1.00 0.00 HETA
ATOM 5 ZN ZN2 H 5 -13.175 29.836 14.467 1.00 0.00 HETA
BUBBLE 206 ZN ZN2 H 7000 -13.175 29.836 14.467 1.00 0.00 HETA-HETA
One awk idea:
awk '
BEGIN { regex1="^([^[:space:]]+[[:space:]]+){5}" # match 1st 5 columns plus trailing white space
regex2="[^[:space:]]+" # match non-white space characters (aka 6th column)
}
{ oldline=$0
match(oldline,regex1) # find 1st 5 columns
newline=substr(oldline,1,RSTART+RLENGTH-1) # save 1st 5 columns for new line
oldline=substr(oldline,RSTART+RLENGTH) # strip off 1st 5 columns
match(oldline,regex2) # match 1st column of shortened line (aka 6th column of original line)
newval=substr(oldline,1,RLENGTH) + 128 # extract column and add 128
newlen=length(newval) # get length of new value
newline=newline newval substr(oldline,RSTART+newlen) # append new value and rest of line to newline
print newline # print newline to stdout
}
' below_zn.pdb
This generates:
ATOM 1 ZN ZN2 H 129 -13.264 34.400 10.700 1.00 0.00 HETA
ATOM 2 ZN ZN2 H 130 -13.264 25.273 10.700 1.00 0.00 HETA
ATOM 3 ZN ZN2 H 131 -13.264 43.527 10.700 1.00 0.00 HETA
ATOM 4 ZN ZN2 H 132 -13.264 52.654 10.700 1.00 0.00 HETA
ATOM 5 ZN ZN2 H 133 -13.175 29.836 14.467 1.00 0.00 HETA
BUBBLE 206 ZN ZN2 H 7128 -13.175 29.836 14.467 1.00 0.00 HETA-HETA
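The answer above assumes column 6 is left-justified. If the values are right-justified instead (residue numbers in standard PDB files are), the extra digits have to be taken from the whitespace to the left of the value. A minimal sketch of that variant, with a made-up function name and under the assumptions that lines do not start with whitespace and that there is enough padding before column 6 to absorb the wider number:

```shell
#!/bin/bash
# add_to_col6_right (hypothetical name): add N to a right-justified 6th column.
add_to_col6_right() {
    awk -v n="${1:-128}" '
    NF >= 6 {
        pos = 1
        for (i = 1; i <= 6; i++) {                 # walk to field 6 without touching $0
            match(substr($0, pos), /[^ \t]+/)
            start = pos + RSTART - 1               # first character of field i
            pos = start + RLENGTH                  # first character after field i
        }
        new = $6 + n
        pad = length($6) - length(new)             # negative when the number got wider
        head = substr($0, 1, start - 1)
        if (pad < 0) head = substr(head, 1, length(head) + pad)   # eat padding on the left
        for (; pad > 0; pad--) head = head " "                    # or add padding back
        print head new substr($0, pos)
        next
    }
    { print }'
}
```

Usage would be something like add_to_col6_right 128 < below_zn.pdb > shifted.pdb (the output file name is just an example).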
I would use GNU AWK for this task in the following way. Let the content of file.txt be
ATOM 1 ZN ZN2 H 1 -13.264 34.400 10.700 1.00 0.00 HETA
ATOM 2 ZN ZN2 H 2 -13.264 25.273 10.700 1.00 0.00 HETA
ATOM 3 ZN ZN2 H 3 -13.264 43.527 10.700 1.00 0.00 HETA
ATOM 4 ZN ZN2 H 4 -13.264 52.654 10.700 1.00 0.00 HETA
ATOM 5 ZN ZN2 H 5 -13.175 29.836 14.467 1.00 0.00 HETA
then
awk 'BEGIN{FPAT="[^[:space:]]+[[:space:]]*";OFS=""}{$6=($6+128) "  ";print}' file.txt
gives output
ATOM 1 ZN ZN2 H 129 -13.264 34.400 10.700 1.00 0.00 HETA
ATOM 2 ZN ZN2 H 130 -13.264 25.273 10.700 1.00 0.00 HETA
ATOM 3 ZN ZN2 H 131 -13.264 43.527 10.700 1.00 0.00 HETA
ATOM 4 ZN ZN2 H 132 -13.264 52.654 10.700 1.00 0.00 HETA
ATOM 5 ZN ZN2 H 133 -13.175 29.836 14.467 1.00 0.00 HETA
Explanation: I inform GNU AWK that a field is one or more (+) non-whitespace ([^[:space:]]) characters followed by zero or more (*) whitespace characters, so the trailing whitespace becomes part of the field, and that the output field separator (OFS) is the empty string. Then, for each line, I increase the value of the 6th column by 128 and concatenate it with two spaces, after which I print the line. Feel free to adjust the number of spaces as required.
(tested in gawk 4.2.1)
I have a multi-line pdb file in the following format
ATOM 2381 CG2 THR A 304 3.359 -8.466 -13.379 1.00 34.89 C
ATOM 2380 OG1 THR A 304 5.073 -10.157 -13.609 1.00 36.00 O
...
ATOM 2380 OG1 THR A 304 5.073 -10.157 -13.609 1.00 36.00 O
TER
HETATM 2382 O HOH A 572 2.739 5.289 20.202 1.00 33.02 O
HETATM 2389 H01 HOH A 572 2.967 5.272 19.270 1.00 33.02 H
HETATM 2390 H02 HOH A 572 2.017 5.906 20.344 1.00 33.02 H
HETATM 2383 O HOH A 619 9.589 -1.213 21.275 1.00 28.34 O
HETATM 2391 H01 HOH A 619 9.100 -1.521 22.041 1.00 28.34 H
HETATM 2392 H03 HOH A 619 9.669 -0.257 21.309 1.00 28.34 H
HETATM 2384 O HOH A 634 8.859 1.214 21.216 1.00 27.10 O
HETATM 2393 H01 HOH A 634 9.495 1.911 21.394 1.00 27.10 H
HETATM 2394 H02 HOH A 634 8.631 0.771 22.037 1.00 27.10 H
HETATM 2385 O HOH A 660 10.309 -1.469 23.867 1.00 43.45 O
HETATM 2395 H01 HOH A 660 9.648 -1.616 24.547 1.00 43.45 H
HETATM 2396 H02 HOH A 660 10.465 -0.527 23.770 1.00 43.45 H
END
Using some utility, I need to copy all lines after the TER record (they can also be identified as the lines starting with HETATM) and save them in a separate file containing:
HETATM 2382 O HOH A 572 2.739 5.289 20.202 1.00 33.02 O
HETATM 2389 H01 HOH A 572 2.967 5.272 19.270 1.00 33.02 H
HETATM 2390 H02 HOH A 572 2.017 5.906 20.344 1.00 33.02 H
HETATM 2383 O HOH A 619 9.589 -1.213 21.275 1.00 28.34 O
HETATM 2391 H01 HOH A 619 9.100 -1.521 22.041 1.00 28.34 H
HETATM 2392 H03 HOH A 619 9.669 -0.257 21.309 1.00 28.34 H
HETATM 2384 O HOH A 634 8.859 1.214 21.216 1.00 27.10 O
HETATM 2393 H01 HOH A 634 9.495 1.911 21.394 1.00 27.10 H
HETATM 2394 H02 HOH A 634 8.631 0.771 22.037 1.00 27.10 H
HETATM 2385 O HOH A 660 10.309 -1.469 23.867 1.00 43.45 O
HETATM 2395 H01 HOH A 660 9.648 -1.616 24.547 1.00 43.45 H
HETATM 2396 H02 HOH A 660 10.465 -0.527 23.770 1.00 43.45 H
Which Unix utility would be useful for this?
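One possible approach is awk: remember when TER has been seen and print only the HETATM lines that follow it. This is a minimal sketch, not a tested solution for every PDB variant; the function name and the file names in the usage line are made up.

```shell
#!/bin/bash
# hetatm_after_ter (hypothetical name): print HETATM records that come after TER.
hetatm_after_ter() {
    awk '
    /^TER/ { seen = 1; next }          # start copying after the TER record
    seen && /^HETATM/ { print }        # skips END and anything else
    ' "$@"
}
```

For example, hetatm_after_ter input.pdb > waters.pdb. If every HETATM line in the file is wanted regardless of where TER is, grep '^HETATM' input.pdb would be enough.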
I would like to ask how to change the letter A to C in the last column using sed.
Input for example:
HETATM 18 H UNK 0 12.447 20.851 23.373 0.00 0.00 0.167 HD
HETATM 19 C UNK 0 11.406 19.947 21.942 0.00 0.00 0.033 A
HETATM 20 C UNK 0 10.684 20.899 21.181 0.00 0.00 0.030 A
HETATM 21 C UNK 0 9.503 20.541 20.507 0.00 0.00 0.019 A
HETATM 22 C UNK 0 9.032 19.211 20.545 0.00 0.00 0.032 A
HETATM 23 C UNK 0 9.772 18.248 21.264 0.00 0.00 0.019 A
HETATM 24 C UNK 0 10.946 18.613 21.948 0.00 0.00 0.030 A
HETATM 25 C UNK 0 7.833 18.846 19.889 0.00 0.00 0.253 C
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OA
Output:
HETATM 18 H UNK 0 12.447 20.851 23.373 0.00 0.00 0.167 HD
HETATM 19 C UNK 0 11.406 19.947 21.942 0.00 0.00 0.033 C
HETATM 20 C UNK 0 10.684 20.899 21.181 0.00 0.00 0.030 C
HETATM 21 C UNK 0 9.503 20.541 20.507 0.00 0.00 0.019 C
HETATM 22 C UNK 0 9.032 19.211 20.545 0.00 0.00 0.032 C
HETATM 23 C UNK 0 9.772 18.248 21.264 0.00 0.00 0.019 C
HETATM 24 C UNK 0 10.946 18.613 21.948 0.00 0.00 0.030 C
HETATM 25 C UNK 0 7.833 18.846 19.889 0.00 0.00 0.253 C
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OA
I tried sed like this:
sed 's/[A*]$/C/'
But the output looks like this:
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OC
Simple sed approach:
sed 's/\<A[[:space:]]*$/C/' file
\< - start-of-word boundary, a GNU extension (assuming the A char occurs only as a standalone char)
[[:space:]]* - match possible whitespace(s) at the end of the string $
The output:
HETATM 18 H UNK 0 12.447 20.851 23.373 0.00 0.00 0.167 HD
HETATM 19 C UNK 0 11.406 19.947 21.942 0.00 0.00 0.033 C
HETATM 20 C UNK 0 10.684 20.899 21.181 0.00 0.00 0.030 C
HETATM 21 C UNK 0 9.503 20.541 20.507 0.00 0.00 0.019 C
HETATM 22 C UNK 0 9.032 19.211 20.545 0.00 0.00 0.032 C
HETATM 23 C UNK 0 9.772 18.248 21.264 0.00 0.00 0.019 C
HETATM 24 C UNK 0 10.946 18.613 21.948 0.00 0.00 0.030 C
HETATM 25 C UNK 0 7.833 18.846 19.889 0.00 0.00 0.253 C
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OA
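Since \< is a GNU extension, a more portable alternative could be awk: test that the last field is exactly A, then run sub() on the whole line so the spacing is left alone. A small sketch with a made-up function name:

```shell
#!/bin/bash
# last_col_a_to_c (hypothetical name): change a standalone A in the last column to C.
last_col_a_to_c() {
    awk '
    $NF == "A" { sub(/A[ \t]*$/, "C") }   # sub() edits $0 directly, spacing is kept
    { print }' "$@"
}
```

Lines whose last field is OA or HD are not touched, because the $NF == "A" test compares the whole field.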
So below is a part of one column-sensitive file from lines 23 to 34. Please look at columns 25 and 26. Lines 23 to 28 are correct as it's supposed to be sequential.
HETATM 21 O HOH 7 -1.609 5.551 -4.296 1.00 0.00 WAT O
HETATM 22 H HOH 7 -1.594 5.971 -3.395 1.00 0.00 WAT H
HETATM 23 H HOH 7 -1.048 4.730 -4.281 1.00 0.00 WAT H
HETATM 24 O HOH 8 -4.693 5.472 -0.557 1.00 0.00 WAT O
HETATM 25 H HOH 8 -3.881 4.900 -0.521 1.00 0.00 WAT H
HETATM 26 H HOH 8 -4.819 5.805 -1.485 1.00 0.00 WAT H
HETATM 27 O HOH 1 0.289 -5.035 5.663 1.00 0.00 WAT O
HETATM 28 H HOH 10 0.241 -4.604 -5.564 1.00 0.00 WAT H
HETATM 29 H HOH 1 -0.399 -5.750 5.605 1.00 0.00 WAT H
HETATM 30 O HOH 11 -1.741 -5.167 0.877 1.00 0.00 WAT O
HETATM 31 H HOH 0 -2.612 -4.754 0.636 1.00 0.00 WAT H
HETATM 32 H HOH 0 -1.819 -5.599 1.769 1.00 0.00 WAT H
However, columns 25 and 26 in lines 29 to 34 (and also in lines beyond 34 that are not included here) need to be edited. They represent the ID number of the water molecules in the file. So columns 25 and 26 in lines 29-31 are supposed to be ' 9' instead of ' 1' or '10', and columns 25 and 26 in lines 32-34 are supposed to be '10' instead of '11' or ' 0'. All lines after 34 suffer from a similar problem, and I also want to change the contents of columns 25 and 26 to '11', '12', etc. for each group of 3 lines. So the final result is expected to look like this:
HETATM 21 O HOH 7 -1.609 5.551 -4.296 1.00 0.00 WAT O
HETATM 22 H HOH 7 -1.594 5.971 -3.395 1.00 0.00 WAT H
HETATM 23 H HOH 7 -1.048 4.730 -4.281 1.00 0.00 WAT H
HETATM 24 O HOH 8 -4.693 5.472 -0.557 1.00 0.00 WAT O
HETATM 25 H HOH 8 -3.881 4.900 -0.521 1.00 0.00 WAT H
HETATM 26 H HOH 8 -4.819 5.805 -1.485 1.00 0.00 WAT H
HETATM 27 O HOH 9 0.289 -5.035 5.663 1.00 0.00 WAT O
HETATM 28 H HOH 9 0.241 -4.604 -5.564 1.00 0.00 WAT H
HETATM 29 H HOH 9 -0.399 -5.750 5.605 1.00 0.00 WAT H
HETATM 30 O HOH 10 -1.741 -5.167 0.877 1.00 0.00 WAT O
HETATM 31 H HOH 10 -2.612 -4.754 0.636 1.00 0.00 WAT H
HETATM 32 H HOH 10 -1.819 -5.599 1.769 1.00 0.00 WAT H
So far I couldn't come up with a pattern that replaces those funky numbers with 9, 10, etc. It would be great if I could fix all these groups of 3 lines with a single vim command instead of having to do it group by group, as there are 50-60 groups with this problem. What I did earlier was simply :26,28s/HOH 1/HOH 8 and this is clearly not the most efficient way.
Sorry for not being clear at the first attempt of the question, but your help would be appreciated. Thank you
Your question is not clear, but from what I understand, trying to select a rectangular block in visual mode might help you. Use ctrl-v in OS X or Linux or ctrl-q in Windows (in normal mode).
Actually, I'd like to thank everyone for your time, and sorry for causing the confusion. I found a way to do it with Python's string formatting, as the pattern is really fuzzy and I'm not so used to regex patterns, so I couldn't figure out a simple way to do it in Vim.
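For reference, the same fixed-column idea can be sketched with awk instead of Python: count the water lines, derive the ID from the count, and overwrite columns 25-26 with a right-justified number. This is not the asker's script, only an illustration. The function name, the first argument (the ID to give the first water in the input) and the / HOH / test for spotting water lines are all assumptions.

```shell
#!/bin/bash
# renumber_waters (hypothetical name): give every group of 3 HOH lines one ID.
# $1 = ID for the first water in the input; remaining arguments are files.
renumber_waters() {
    first=$1; shift
    awk -v first="$first" -v col=25 -v width=2 '
    / HOH / {
        id = first + int(n / 3)            # same ID for 3 consecutive water lines
        n++
        $0 = substr($0, 1, col - 1) sprintf("%" width "d", id) substr($0, col + width)
    }
    { print }' "$@"
}
```

If only part of the file needs fixing, the input could be cut with sed -n first (for example from line 29 on) and the matching starting ID passed as the first argument.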
I would like to add a specific line "TER" to several variable text files:
Input:
[...]
ATOM 4149 C LEU C 9 136.820 120.050 53.540 1.00 0.00
ATOM 4150 O LEU C 9 136.600 118.860 53.240 1.00 0.00
ATOM 4151 O LEU C 9 137.310 120.340 54.650 1.00 0.00
ATOM 4154 N LYS D 2 115.050 134.940 61.060 1.00 0.00
ATOM 4155 H1 LYS D 2 115.660 134.160 61.180 1.00 0.00
ATOM 4156 H2 LYS D 2 114.760 135.000 60.100 1.00 0.00
[...]
Output:
[...]
ATOM 4149 C LEU C 9 136.820 120.050 53.540 1.00 0.00
ATOM 4150 O LEU C 9 136.600 118.860 53.240 1.00 0.00
ATOM 4151 O LEU C 9 137.310 120.340 54.650 1.00 0.00
TER
ATOM 4154 N LYS D 2 115.050 134.940 61.060 1.00 0.00
ATOM 4155 H1 LYS D 2 115.660 134.160 61.180 1.00 0.00
ATOM 4156 H2 LYS D 2 114.760 135.000 60.100 1.00 0.00
[...]
So the pattern is: if after a " C " for the first time a " D " is found add a "TER" before the " D " line (after the " C " line). All other numbers and characters can be variable.
I found some examples with the sed command; however, I do not know how to insert a line before the matched one.
With awk:
$ awk 'last_c5=="C" && $5=="D" {print "TER"} {last_c5=$5} 1' file
ATOM 4149 C LEU C 9 136.820 120.050 53.540 1.00 0.00
ATOM 4150 O LEU C 9 136.600 118.860 53.240 1.00 0.00
ATOM 4151 O LEU C 9 137.310 120.340 54.650 1.00 0.00
TER
ATOM 4154 N LYS D 2 115.050 134.940 61.060 1.00 0.00
ATOM 4155 H1 LYS D 2 115.660 134.160 61.180 1.00 0.00
ATOM 4156 H2 LYS D 2 114.760 135.000 60.100 1.00 0.00
It keeps track of the 5th column of the previous line in the last_c5 variable. If the previous value was C and the current one is D, it prints TER first. The assignment sits in its own action block and the trailing 1 prints every line, including lines with fewer than five fields such as TER or END.
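If the chain letters are not always C and D, the same idea might be generalized to print TER whenever the chain ID changes between two ATOM records. A rough sketch, assuming (as above) that the chain ID is the 5th whitespace-separated field and that the file does not already contain TER records between chains; the function name is made up.

```shell
#!/bin/bash
# add_ter_between_chains (hypothetical name): TER before each change of chain ID.
add_ter_between_chains() {
    awk '
    /^ATOM/ {
        if (prev != "" && $5 != prev) print "TER"   # chain changed since last ATOM
        prev = $5
    }
    { print }' "$@"
}
```

Non-ATOM lines are passed through unchanged and do not reset the remembered chain.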