I am dealing with a pdb file, which basically contains many atoms and their coordinates. The first and second lines look like this:
ATOM 1 ZN ZN2 Z 1 6.703 3.973 -2.488 1.00 0.00 ZINC
ATOM 2 ZN ZN2 Z 1 -2.639 3.973 -2.488 1.00 0.00 ZINC
I have overlapping atoms. This means the three numbers (in the example, 6.703 3.973 -2.488 in line 1, and -2.639 3.973 -2.488 in line 2) appear many times throughout the file. Each line, however, has a different atom number (in the example, the 1 and the 2 after ATOM), so the lines for the overlapping atoms aren't exactly the same; the coordinates are. I want to delete all but one of the lines with repeated coordinates. I do not care about the order. So far, I have manually searched for each set of coordinates and replaced all but the first appearance with three # characters, to later delete all lines containing ###. For example, for line 1 I am doing this:
:%s/6.703 3.973 -2.488/###/gc
:g/###/d
This file, however, is extremely long. I am afraid doing this for every line would take me days, and there is a big chance that I will make mistakes. Is there an easier way to do this? I prefer using the vi editor, but emacs works too.
Thank you!!!
With the format you show in the question (whitespace-delimited fields), you can use a common awk idiom to remove "duplicates" directly from the command-line:
awk '!/^ATOM / || !dupes[$7,$8,$9]++' infile >outfile
This prints any line that does not start with ATOM, plus the first atom line for any given set of coordinates.
The idiom !f[x]++ works as:
the first time array element f[x] is accessed, for any particular value of x, it is treated as the empty string
! means "not". !(empty string) equates to true
++ increments a value by one (initially to 1, then 2, 3, 4, etc) - empty string is treated as zero. (++x increments before x is used; x++ increments after)
!(positive number) is false
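The same first-seen filter can be sketched in Python (a hedged equivalent of the awk one-liner, not a replacement for it; the field indices assume the whitespace-delimited layout shown above):

```python
def dedupe_atoms(lines):
    """Keep non-ATOM lines, and only the first ATOM line per coordinate triple."""
    seen = set()  # coordinate triples already emitted
    out = []
    for line in lines:
        if not line.startswith("ATOM "):
            out.append(line)  # pass non-atom lines through untouched
            continue
        fields = line.split()
        coords = tuple(fields[6:9])  # $7,$8,$9 in awk's 1-based field numbering
        if coords not in seen:       # analogous to !dupes[$7,$8,$9]++
            seen.add(coords)
            out.append(line)
    return out
```

Like the awk version, this keys only on the coordinate fields, so differing atom numbers don't prevent a match.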
Depending on the layout of the file, you may be able to run a visual check by something like:
diff -u infile outfile | sort -k7,9 | less
This should display the atom lines grouped by coordinate.
You can use -S in less to disable wrapping (then left/right arrow to scroll sideways).
The first line of each group will start with space, the rest with hyphen.
Then searching for ^ .* in less provides simple highlighting.
Any non-atom lines should be prefixed with space. Something went wrong if there are any prefixed with plus or hyphen.
This question already has answers here:
Why does incrementing with CTRL-A in Vim take me from "07" to "10"?
(2 answers)
Closed 3 years ago.
The behavior of vim's Ctrl A is weird when incrementing numbers <= 0.005.
Due to some personal need, I want to get a set of numbers that increments by 0.005 each time from 0.005,
like this:
0.005
0.010
0.015
...
then I thought of vim's macro and Ctrl A.
I entered 0.005 on vim's first line, then recorded a macro of yyp Ctrl A. But when I moved the cursor to the 5 and pressed Ctrl A repeatedly, on the third press the number jumped straight from 0.007 to 0.010. If I press it 3 times on each line, the output becomes:
0.005
0.010
0.013
0.016
...
That means I cannot complete the task in vim this way.
After doing this in other ways, I started to be interested in the behavior of vim's Ctrl A.
The text below comes from vim's help manual:
:h CTRL-A:
Add [count] to the number or alphabetic character at or after the cursor.
and :h count:
An optional number that may precede the command to multiply or iterate the command.
If no number is given, a count of one is used, unless otherwise noted.
When I tested some other numbers, I found that the behavior becomes weird starting from 0.01. But I still don't know why Ctrl A behaves like this.
Before starting to read the source of vim, does anyone know why vim's Ctrl A behaves like this on decimals?
BTW, my PC environment is Win10, and I use vim_only_x64 downloaded from vim's official site.
As @Simson said, vim thinks it is an octal number and increases 007 to 010.
You can change this behavior by telling vim not to use octal numbers.
:set nrformats-=octal
Then it increases as expected 0.09 => 0.10
You can see the definition of numbers with :h expr-number
Vim does not recognize 0.004 as a decimal fraction; it interprets the digits after the dot as a separate integer, 004. The leading 0 of a number marks it as octal, so it increments from 007 to 010.
Fun fact: it will also recognize hexadecimal numbers; 0x09 will be incremented to 0x0a.
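What Vim does to the digits under the cursor can be reproduced in Python (a sketch of the parsing rule as described above, not Vim's actual code):

```python
def increment_like_vim(num: str, allow_octal: bool = True) -> str:
    """Increment a digit string the way CTRL-A does: a leading 0 marks the
    number as octal when 'octal' is in 'nrformats'; leading zeros are kept."""
    is_octal = (
        allow_octal
        and num.startswith("0")
        and len(num) > 1
        and all(c in "01234567" for c in num)
    )
    if is_octal:
        value = int(num, 8) + 1
        return format(value, "o").zfill(len(num))  # back to octal digits
    return str(int(num, 10) + 1).zfill(len(num))
```

Note that 009 contains a 9, which is not a valid octal digit, so it falls back to decimal and increments to 010 either way; that is why the surprise only shows up on all-octal-digit strings like 007.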
I am using vim-sexp and vim-sexp-mappings-for-regular-people plugins for editing Clojure files. I don't quite understand what slurp and barf commands do exactly.
I tried playing with them, and it seems that they insert/remove forms at the beginning/end of an adjacent form. Is that correct? If not, what is the proper definition for slurp and barf?
Slurping and barfing are the essential operations/concepts for using a modern structural code editor. After getting used to them, I'm completely incapable of editing code without them. Of the ~20 people who sit with me writing Clojure all day, all of them use these all the time. So saying that they are "helpful for lisp coders" is a very tactful and polite understatement.
slurp: (verb)
"to include the item to one side of the expression surrounding the point into the expression"
barf: (verb)
"to exclude either the left most or right most item in the expression surrounding the point from the expression"
and some examples.
1 2 (3 4) 5 6
slurp right:
1 2 (3 4 5) 6
barf right:
1 2 (3 4) 5 6
slurp left:
1 (2 3 4) 5 6
barf left:
1 2 (3 4) 5 6
and we're back where we started.
When I give talks/presentations introducing paredit I generally leave students/attendees with just these two concepts because I feel they are enough to start getting the benefit of structural editing without being overwhelming. Once you are comfortable with these then move on to structual navigation by learning to move forward/backward and up/down by expression rather than by character.
Even though it lists Emacs keybindings, I still highly recommend the animated guide to paredit that Peter Rincker mentions in his answer.
It may seem gross, but I visualise barf-ing like vomiting (they are synonyms, after all): you are expelling something.
Slurping I visualise as having a drink through a straw, drawing the drink in.
The pipe symbol is the cursor in these illustrations.
So barf-ing to the right (pushing out the 4):
1 2 (3 |4) 5 6 -> 1 2 (3|) 4 5 6
And slurping to the right gets you back the 4 (gross as it may be to re-ingest what you previously threw up):
1 2 (3|) 4 5 6 -> 1 2 (3 4) 5 6
The backward versions do the same things, but with items before the current s-exp.
I find I use the forward/right versions much more than the left, as I'm usually adding something in front, like a let binding, so a session might be:
(some-fn1 (count some-map))
(some-fn2 (count some-map))
aha, a let could come in here to refactor the (count some-map):
(let [c (count some-map)]|)
(some-fn1 c)
(some-fn2 c)
But the let isn't wrapping the 2 calls, so we want to pull in (slurp) the next 2 forms inside the let s-exp. At the cursor position, slurp twice, which gives after the first:
(let [c (count some-map)]|
(some-fn1 c))
(some-fn2 c)
and then on second:
(let [c (count some-map)]|
(some-fn1 c)
(some-fn2 c))
and any decent editor with paredit/structural editing will also do the indentation at the same time for you.
It's also important to note that barf/slurp happen within the current set of brackets: slurping in (let [a (count x)]) will do different things depending on where the cursor is, as there are 3 sets of brackets. That's why I was careful where to put the cursor in the let binding above; otherwise you'd push the wrong bracket in or out. (Which is another way of thinking of barf/slurp: manipulating the position of a bracket, rather than pushing items into or out of the s-exp.)
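The four operations described above can be sketched on a toy model, where a form is a Python list sitting among its siblings (an illustration of the concept only, not how vim-sexp or paredit are implemented):

```python
def slurp_right(siblings, i):
    """Pull the sibling after the form at index i into that form."""
    siblings[i].append(siblings.pop(i + 1))

def barf_right(siblings, i):
    """Push the last item of the form at index i back out as a sibling."""
    siblings.insert(i + 1, siblings[i].pop())

def slurp_left(siblings, i):
    """Pull the sibling before the form at index i into that form.
    Note: removing the left sibling shifts the form's own index down by one."""
    siblings[i].insert(0, siblings.pop(i - 1))

def barf_left(siblings, i):
    """Push the first item of the form at index i back out as a sibling."""
    siblings.insert(i, siblings[i].pop(0))
```

For example, starting from 1 2 (3 4) 5 6 as [1, 2, [3, 4], 5, 6], slurp_right on the inner list gives [1, 2, [3, 4, 5], 6], and barf_right restores it, mirroring the walkthrough above.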
I am not an expert on lisps, emacs, paredit, vim-sexp, or vim-sexp-mappings-for-regular-people. (Why am I posting, right?!)
However, I know that slurp and barf come from Emacs's paredit mode. This Emacs mode is supposedly very helpful for lisp coders. I am sure you can find a nice helpful article on these subjects if you search for paredit. As a matter of fact, I found a nice article for you: The Animated Guide to Paredit. From what I can tell, you are right in your guesses about slurp and barf.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why did we bother with line numbers at all?
I'm curious about why early versions of the BASIC programming language had line numbering like in:
42 PRINT "Hello world!"
Didn't the text editors back then have line numbering?
EDIT: Yes, I know they are used for GOTOs, but why? I mean, was having labels too computationally expensive?
Many microcomputers had a BASIC interpreter in ROM that would start upon bootup. The problem was that there was no text editor or file system to speak of. You had an interactive prompt to do everything through. If you wanted to insert a line of code, you just typed it, starting with the line number. It would insert it into the correct spot in your code, e.g.:
>10 print "hello"
>30 goto 10
>20 print "world"
>list
10 PRINT "hello"
20 PRINT "world"
30 GOTO 10
>
(In that example > is the BASIC prompt)
If you wanted to erase a line, you would type something like ERASE 20.
Some really fancy systems gave you a line editor (i.e. EDIT 10)
And if you didn't plan your line numbers and ran out (how do I insert a line between 10 and 11?) some systems gave you a RENUM command which would renumber your code (and adjust GOTOs and GOSUBs appropriately).
Fun Times!
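The prompt's behavior above, with lines keyed and ordered by their number, can be sketched as a dictionary keyed on line number (a toy model, not any real interpreter):

```python
class TinyBasicStore:
    """Minimal model of a line-numbered BASIC program buffer."""

    def __init__(self):
        self.lines = {}  # line number -> statement text

    def enter(self, text):
        """Handle one line typed at the prompt: '10 PRINT ...' stores line 10;
        a bare number like '20' deletes that line (as on many home micros)."""
        num_str, _, rest = text.partition(" ")
        num = int(num_str)
        if rest.strip():
            self.lines[num] = rest
        else:
            self.lines.pop(num, None)  # bare number erases the line

    def list_program(self):
        """LIST: emit lines sorted by number, regardless of entry order."""
        return [f"{n} {s}" for n, s in sorted(self.lines.items())]
```

Entering lines 10, 30, 20 in that order and then listing produces them in 10/20/30 order, matching the session shown above.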
The original BASIC line numbering was actually an integral part of the language, used for control flow.
The GOTO and GOSUB commands would take a line number and jump to it. This was common then (even though it's discouraged now).
They were used as labels for GOTO and GOSUB
Like this:
10 PRINT "HELLO WORLD"
20 GOTO 10
There were no named labels in some early BASIC versions
They were also required if you wanted to insert a line between 2 existing lines of code, because in the early days, you had no full text editors. Everything had to be typed in the "interactive" interpreter.
So if you typed:
15 PRINT "AND THE UNIVERSE"
The program would become:
10 PRINT "HELLO WORLD"
15 PRINT "AND THE UNIVERSE"
20 GOTO 10
When you ran out of line numbers, you could run a "renumbering" tool to renumber all lines in your program, but in the very early days of the Commodore 64 and other home computers, we didn't even have that, so you'd have to renumber manually. That's why you had to leave gaps of 10 or more in the line numbers, so you could easily add lines in between.
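A renumbering pass like that has to do two things: assign fresh numbers, and rewrite every GOTO/GOSUB target to match. A sketch (it assumes targets are single literal numbers; real RENUM commands also handled things like ON...GOTO lists):

```python
import re

def renum(program, start=10, step=10):
    """Renumber a BASIC program given as {line_number: statement},
    rewriting literal GOTO/GOSUB targets to the new numbers."""
    old_order = sorted(program)
    mapping = {old: start + i * step for i, old in enumerate(old_order)}

    def fix_targets(stmt):
        # Replace the number after GOTO/GOSUB with its renumbered equivalent
        return re.sub(
            r"\b(GOTO|GOSUB)\s+(\d+)",
            lambda m: f"{m.group(1)} {mapping[int(m.group(2))]}",
            stmt,
        )

    return {mapping[old]: fix_targets(program[old]) for old in old_order}
```

This restores the gaps of 10 between lines, which is exactly why you wanted those gaps in the first place: room to insert new lines later.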
If you want to try out the Commodore 64 interpreter, check out this C64 emulator written in Flash: http://codeazur.com.br/stuff/fc64_final/ (no install required)
In BASIC, the line numbers indicated sequence.
Also, many older editors weren't for files, but simply lines ("line editors", e.g. ed, the standard editor). By numbering them this way, you knew which line you were working on.
A simple google search reveals what Wikipedia has to say about it:
Line numbers were a required element of syntax in some older programming languages such as GW-BASIC.[2] The primary reason for this is that most operating systems at the time lacked interactive text editors; since the programmer's interface was usually limited to a line editor, line numbers provided a mechanism by which specific lines in the source code could be referenced for editing, and by which the programmer could insert a new line at a specific point. Line numbers also provided a convenient means of distinguishing between code to be entered into the program and commands to be executed immediately when entered by the user (which do not have line numbers).
Back in the day all languages had sequence numbers, everything was on punched cards.
There was one line per card.
Decks of cards made up your program.
When you dropped the cards, you'd put them in a card sorter that used those sequence numbers.
And of course, they were referenced by control flow constructs.
On the C64, there wasn't even a real editor (built-in, at least). To edit a part of the program, you'd do something like LIST 100-200, and then you could only edit the lines currently displayed on the screen (no scrolling upwards!)
They were labels for statements, so that you could GOTO the line number. The number of the statements did not necessarily have to match the physical line numbers in the file.
The line numbers were used in control flow. There were no named subroutines. You had to use GOSUB 60, for instance, to call the subroutine starting at line 60.
On your update, not all languages had labels, but all languages had line numbers at one time. At one time, everything was punch cards. BASIC was one of the very first interactive languages, where you could actually type something and have a response immediately. Line numbers were still the current technology.
Labels are an extra expense. You have to keep track of the correlation between the symbolic label and the code or data to which it refers. But if every line has a line number (and if all transfer of control flow statements always refer to the beginning of a line), then you don't need a separate symbol table.
Also keep in mind that original BASIC interpreters didn't need a symbol table for variables: There were 26 variables named A-Z. Some were sophisticated and had An-Zn. Some got very fancy and added a distinction between string, integer and floating point by adding "$" or "%" after the variable. But no symbol table was required.
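That "no symbol table" point can be made concrete: with only 26 possible names, the name itself is the index into a fixed array. A sketch (a toy illustration, not any particular interpreter's layout):

```python
def make_vars():
    """Variable store for a BASIC with only A-Z variables:
    a fixed 26-slot array, so no symbol table is needed."""
    return [0.0] * 26

def var_index(name):
    """Map a single-letter variable name directly to its slot."""
    return ord(name.upper()) - ord("A")

# Assigning to C touches slot 2; lookup is pure arithmetic, no search.
v = make_vars()
v[var_index("C")] = 3.14
```

Lookup cost is a single subtraction, which mattered a great deal on machines with a few kilobytes of RAM.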
IIRC, line numbers were mostly used as labels for GOTO and GOSUB statements, since in some (most?) flavors of BASIC there was no way to label a section of code.
They were also used by the editor, i.e. you typed:
edit 100
to edit line 100.
As others have pointed out, these line numbers were used as part of subroutines.
Of course, there's a reason this isn't done anymore. Imagine you say GOTO 20 on line 10, and then later realize you need to write 10 more lines of code after line 10. All of a sudden, you're running up against 20, so you either need to move your subroutine farther away (to higher numbers) and change your GOTO target, or write another subroutine that jumps farther ahead in the code.
In other words, it became a nightmare of true spaghetti code and is not fun to maintain.
It was entered on the command line in many instances (it was, on my old Commodore 64), so there might not always have been a text editor; if there was one, it was quite basic.
In addition, you would need to do GOTOs and the like, as well as inserting lines in between others.
ie:
10 PRINT "HELLO"
20 GOTO 10
15 PRINT " WORLD"
where it would run in the logical order 10 15 20
Some editors only had an "overwrite" mode and no "insert" mode. This made editing of existing code extremely painful. By adding that line-number feature, you could however patch existing code from anywhere within the file:
100 PRINT "Broken Code"
200 PRINT "Foobar"
...
101 patch the broken code
102 patch more broken code
Because line numbers didn't have to be ordered within the file.
Line numbers were a PART of the language; in some VERY early systems, even the OS was just these simple lines. All you had was one line at a time to manipulate. Try writing an accounting system using 1-4k program files, segmenting it by size to get stuff done. To edit, you used the line numbers to tell the interpreter what you were editing. So, if you entered:
10 PRINT "howdy"
20 GOTO 10
10 PRINT "WOOPS"
15 PRINT "MORE WOOPS"
20
RUN
you would get:
WOOPS
MORE WOOPS
The blank 20 effectively deletes that line.