emacs string-insert-rectangle vector of numbers? - text

How can I use emacs string-insert-rectangle operation to add a vector of numbers to a series of lines? For example, I've got this shortened version of a bunch of text entries in my emacs buffer:
element01 8 111111111011010000100000001100101000001111101111011111111000
element01 8 111101101010101001111111111000111110111111011110100101010111
element01 8 111111011001001110111010111111100111010110101011111010110011
element01 8 111111111111111111111111010111101101011101011111000001100000
element01 8 111100111111011111100110110000001011110101000011111011111101
element01 8 111001001011000000011100000101001001100101101011101101011011
element01 8 111011111101101111111111111101101010111110111011111101011011
element01 8 101101111101101111000110111101111010111011101111001101001011
element02 6 110101110101100101100101000111010101110111001001101111111011
element02 6 111001011001001011101110111100111101101011111111111011111101
element02 6 101111100111010111111010010101111101111111101101111011111011
element02 6 111101111111111100111110110111101011111001001101101100111111
element02 6 111111010111101111010011110111001100001000101010111111111101
element02 6 111110111001101111111100111011110000011011100100100111111010
element03 13 110011011111111111101011100111111110011111110100111010011111
element03 13 100011101000111110101101000000000001110110110011110110111101
element03 13 101100011100011111110111110110101101111111110110110100101111
element03 13 111111011110101110101011010111110000010111111011100100011111
element03 13 011100110110110111100101110101111110111100101110010111110011
element03 13 100111111111100100111110110110111111111101011101110110110111
element03 13 101111111111110101110110111011111110111101110110111111111111
element03 13 111110010111110110101111110110111111111110101111111101110011
element03 13 100111111101110110110110111110111010111110110011111111110111
element03 13 110100110111110110110100111010110100110110110110110101111111
element03 13 011111011010111101101001011100111110010111111011111101011010
element03 13 011101111110010000111000000101101010111110100010110110110110
element03 13 110100110110110010101010100011100011000000110011011100110100
element03 13 010101101010110010111100101001001010111001100111110000011011
[...]
And I want to add a column between the second and third column that will look like this:
element01 8 id1 111111111011010000100000001100101000001111101111011111111000
element01 8 id2 111101101010101001111111111000111110111111011110100101010111
element01 8 id3 111111011001001110111010111111100111010110101011111010110011
element01 8 id4 111111111111111111111111010111101101011101011111000001100000
element01 8 id5 111100111111011111100110110000001011110101000011111011111101
element01 8 id6 111001001011000000011100000101001001100101101011101101011011
element01 8 id7 111011111101101111111111111101101010111110111011111101011011
element01 8 id8 101101111101101111000110111101111010111011101111001101001011
element02 6 id9 110101110101100101100101000111010101110111001001101111111011
element02 6 id10 111001011001001011101110111100111101101011111111111011111101
element02 6 id11 101111100111010111111010010101111101111111101101111011111011
element02 6 id12 111101111111111100111110110111101011111001001101101100111111
element02 6 id13 111111010111101111010011110111001100001000101010111111111101
element02 6 id14 111110111001101111111100111011110000011011100100100111111010
element03 13 id15 110011011111111111101011100111111110011111110100111010011111
element03 13 id16 100011101000111110101101000000000001110110110011110110111101
element03 13 id17 101100011100011111110111110110101101111111110110110100101111
element03 13 id18 111111011110101110101011010111110000010111111011100100011111
element03 13 id19 011100110110110111100101110101111110111100101110010111110011
element03 13 id20 100111111111100100111110110110111111111101011101110110110111
element03 13 id21 101111111111110101110110111011111110111101110110111111111111
element03 13 id22 111110010111110110101111110110111111111110101111111101110011
element03 13 id23 100111111101110110110110111110111010111110110011111111110111
element03 13 id24 110100110111110110110100111010110100110110110110110101111111
element03 13 id25 011111011010111101101001011100111110010111111011111101011010
element03 13 id26 011101111110010000111000000101101010111110100010110110110110
element03 13 id27 110100110110110010101010100011100011000000110011011100110100
element03 13 id28 010101101010110010111100101001001010111001100111110000011011
[...]
How can I use something like string-insert-rectangle in emacs to add this new third column with increasing number count?
PS: I know I could do this with a bash/perl/python/etc. script; in this question I am asking whether it can be done easily within Emacs.

I think the simplest solution is to mark the first character of the original third column in the first line, move point to the same character of the last line, and then type:
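C-u C-x r N RET id%d RET
The first RET accepts the default starting number, 1. The second prompt takes the format string; end it with a space (id%d followed by a space) if the number should be separated from the column that follows.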
rectangle-number-lines is an interactive compiled Lisp function in
`rect.el'.
It is bound to C-x r N.
(rectangle-number-lines START END START-AT &optional FORMAT)
Insert numbers in front of the region-rectangle.
START-AT, if non-nil, should be a number from which to begin
counting. FORMAT, if non-nil, should be a format string to pass
to `format' along with the line count. When called interactively
with a prefix argument, prompt for START-AT and FORMAT.
The regexp-replace and macro techniques are both superb general-purpose tools to know, but rectangle-number-lines is pretty much custom-built for this very question.
Edit: I hadn't noticed at the time, but it turns out that this is a new feature in Emacs 24. Earlier versions of Emacs will translate that sequence to C-x r n (lower-case n) which runs an entirely different function.

You can use query-replace-regexp directly, by adding a new column with the match count \#.
The matches look for 3 columns separated by spaces, which will be stored in submatch strings \1 to \3. The replaced string adds a new column using the match count.
Version 1 (simpler, but starts at 0):
M-x query-replace-regexp RET
^\(\w+\)\ +\(\w+\)\ +\(\w+\)$ RET
\1 \2 id\# \3 RET
Note I used spaces for matching and replacing. You can use tabs instead.
Version 2 (uses Lisp to adjust the row count with the 1+ function, so numbering starts at 1):
M-x query-replace-regexp RET
^\(\w+\)\ +\(\w+\)\ +\(\w+\)$ RET
\,(format "%s %s id%d %s" \1 \2 (1+ \#) \3) RET

Here is a log of how you can solve it with a keyboard macro. AFAIK you can't solve this with just string-insert-rectangle.
Where a register name is required, I used "a".
C-1 C-x r n    number-to-register
C-x (          kmacro-start-macro
C-M-f          forward-sexp [3 times]
C-M-b          backward-sexp
C-u C-x r i    insert-register
C-x r +        increment-register
C-x )          kmacro-end-macro
C-SPC          set-mark-command
M->            end-of-buffer
C-x C-k r      apply-macro-to-region-lines

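This is a way to do it in Emacs; unfortunately, this approach doesn't use string-insert-rectangle. It also rudely assumes that every line has more than 10 characters, and it inserts the new field after the 10th character of each line, so adjust the offset to match your column. Hilarity will ensue if a line is shorter. M-x doit will invoke it.
(defun doit ()
  "Insert \"idN \" after the 10th character of every line in the buffer."
  (interactive)
  (save-excursion
    (goto-char (point-min))       ; preferred over beginning-of-buffer in Lisp code
    (let ((n 1))
      (while (< (point) (point-max))
        (forward-char 10)         ; hard-coded column offset
        (insert "id" (int-to-string n) " ")
        (end-of-line)
        (forward-line)
        (setq n (1+ n))))))       ; plain setq, since incf needs the cl library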

Related

How to add a string to line 13 in my text file

I have a very large text file that is difficult to open in text editors.
Lines 12 - 15 are:
1 15.9994
2 24.305
Atoms
I would like to add:
3 196 to line 14 and then have a blank line between 3 196 and Atoms like it is currently. I tried:
sed '14 a <3 196>' file.data
But it did not seem to change anything. Does anyone know how I can do this?
By default, sed writes the edited text to standard output; it does not modify the input file.
If you want the input file to be modified, you can use GNU sed -i:
sed -i '14 a <3 196>' file.data
Before:
[...]
9
10
11
1 15.9994
2 24.305
Atoms
16
17
[...]
After:
[...]
9
10
11
1 15.9994
2 24.305
<3 196>
Atoms
16
17
[...]
Note: If you want it after line 13 instead of 14, change 14 to 13 in your code. Similarly, if you wanted 3 196 instead of <3 196>, change <3 196> to 3 196 in your code.
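For comparison, the same "append after line N" edit can be sketched in Python. This is only an illustration under a few assumptions: the function name insert_after_line is made up for this example, and the file is rewritten through a temporary file in the same directory so that a very large file is streamed line by line rather than loaded into memory.

```python
import os
import tempfile


def insert_after_line(path, line_no, new_text):
    """Insert new_text as its own line after line number line_no (1-based)."""
    dir_name = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=dir_name, delete=False
    ) as dst:
        for current, line in enumerate(src, start=1):
            dst.write(line)
            if current == line_no:
                # If the anchor line is the last line and lacks a newline, add one first.
                if not line.endswith("\n"):
                    dst.write("\n")
                dst.write(new_text + "\n")
    # Swap the rewritten copy into place (same directory, so no cross-device move).
    os.replace(dst.name, path)
```

Calling insert_after_line("file.data", 13, "3 196") would play the role of the sed -i command above with 13 as the address. Note that this sketch does not preserve the original file's permissions, and it does nothing if the file has fewer than line_no lines.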

Count Number of occurrence at each line

I have the following file
ENST001 ENST002 4 4 4 88 9 9
ENST004 3 3 3 99 8 8
ENST009 ENST010 ENST006 8 8 8 77 8 8
Basically I want to count how many times ENST* is repeated in each line so the expected results is
2
1
3
Any suggestion please ?
Try this (gsub returns the number of replacements it made on the line, which is the count you want):
awk '{print gsub("ENST[0-9]+","")}' INPUTFILE
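The awk one-liner counts regex matches per line. A rough Python equivalent, using the three sample lines from the question, might look like this (the function name count_ids is just an illustrative choice):

```python
import re

# Same pattern as in the awk command: "ENST" followed by one or more digits.
ENST = re.compile(r"ENST[0-9]+")


def count_ids(lines):
    """Return the number of ENST identifiers found on each line."""
    return [len(ENST.findall(line)) for line in lines]


sample = [
    "ENST001 ENST002 4 4 4 88 9 9",
    "ENST004 3 3 3 99 8 8",
    "ENST009 ENST010 ENST006 8 8 8 77 8 8",
]
print(count_ids(sample))  # [2, 1, 3]
```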

How to find all Common substrings which is 3 chars or longer

Is there an efficient algorithm to find and list all common substrings (of length 3 or more) of two strings?
Example input:
Length : 0 5 10 15 20 25 30
String 1 : ABC-DEF-GHI-JKL-ABC-ABC-STU-MWX-Y
String 2 : ABC-JKL-MNO-ABC-DEF-PQR-DEF-ZWX-Y
Example output:
In string 1 2
---------------------------
ABC-DEF 0 12
ABC-DE 0 12
BC-DEF 1 13
:
-ABC- 15,19 11
-JKL- 11 3
-DEF- 3 15
-JKL 11 3
JKL- 12 4
-DEF 3 15,23
DEF- 4 16
WX-Y 29 29
ABC- 0,16,20 0,12
-ABC 15,19 11
DEF- 4 16,24
DEF 4 16,24
ABC 0,16,20 0,12
JKL 12 4
WX- 29 29
X-Y 30 30
-AB 15,19 11
BC- 1,17,21 1,13
-DE 3 15,23
EF- 5 17,25
-JK 11 3
KL- 13 5
:
In the example, "-D" and "-M" are also common substrings, but they are not required because their length is only 2. (Some outputs may be missing from the example because there are so many of them.)
You can find all common substrings using a data structure called a generalized suffix tree.
Libstree contains some example code for finding the longest common substring. That example code can be modified to obtain all common substrings.
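To make the expected output concrete, here is a brute-force sketch in Python. It is not a generalized suffix tree: it simply indexes every substring of length 3 or more in a dictionary, which is fine for short strings like the example but uses far more time and memory than a suffix tree on long inputs. The function names are illustrative only.

```python
from collections import defaultdict


def substring_positions(s, min_len=3):
    """Map every substring of s with length >= min_len to its start offsets."""
    positions = defaultdict(list)
    for start in range(len(s)):
        for end in range(start + min_len, len(s) + 1):
            positions[s[start:end]].append(start)
    return positions


def common_substrings(a, b, min_len=3):
    """Return {substring: (offsets in a, offsets in b)} for shared substrings."""
    pos_a = substring_positions(a, min_len)
    pos_b = substring_positions(b, min_len)
    return {sub: (pos_a[sub], pos_b[sub]) for sub in pos_a if sub in pos_b}


s1 = "ABC-DEF-GHI-JKL-ABC-ABC-STU-MWX-Y"
s2 = "ABC-JKL-MNO-ABC-DEF-PQR-DEF-ZWX-Y"
result = common_substrings(s1, s2)
print(result["ABC"])      # ([0, 16, 20], [0, 12])
print(result["ABC-DEF"])  # ([0], [12])
```

The offsets are 0-based, matching the "Length" ruler in the question.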

redefine length.character in R

Since length is a generic method, why can't I do
length.character <- nchar
? It seems that strings are treated special in R. Is there a reason for that? Would you discourage defining functions like head.character and tail.character?
If you look at the help page for InternalMethods (mentioned in the Details section of the help page for length), it states:
"For efficiency, internal dispatch only occurs on objects, that is those for which ‘is.object’ returns true."
Vectors are not objects in the same sense as other objects are, so method dispatch is not done on any basic vector (not just character). If you really want to use this type of dispatch, you need an object with a class attribute, e.g.:
> tmp <- state.name
> class(tmp) <- 'mynewclass'
> length.mynewclass <- nchar
> length(tmp)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8
[26] 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12 9 5 4 7 8 10 13 9 7
>
My 2c:
Strings are not treated specially in R. If length did the same thing as nchar, then you would get unexpected results if you tried to compute length(c("foo", "bazz")). Or to put it another way, would you expect the length of a numeric vector to return the number of digits in each element of the vector or the length of the vector itself?
Also, creating this method might have side effects on other functions that expect the normal string behavior.
Now I have found a reason not to define head.character: it changes how head works. For example:
head.character <- function(s,n) if(n<0) substr(s,1,nchar(s)+n) else substr(s,1,n)
test <- c("abc", "bcd", "cde")
head("abc", 2) # works fine
head(test,2)
Without the definition of head.character, the last line would return c("abc", "bcd"). With head.character defined, the function is applied to each element of the vector and returns c("ab", "bc", "cd").
But I have a strhead and a strtail function now.. :-)

linux/shell script

I have written a program that generates a parameter index for 2 variables, say a and b, each in 5 steps. I have to do the same for 23 variables, and I don't want to write 23 nested for-loops. How can I turn this into a single loop that works for all 23 variables? I think it can be done with an array, but I don't know how to implement it in the program.
Could you please help me?
Program:
int z, p
float a, b
float a0, an, s, a1, b0, bn, b1
str var
s=5; a0=1; an=10; b0=8; bn=13 // s= steps, a0, b0= initial value, an,bn=final value
z=0
a1=(an-a0)/s
b1=(bn-b0)/s
for (a=(a1+a0);a<=an;a=a+a1)
    for (b=(b1+b0);b<=bn;b=b+b1)
        echo {z} {a} {b} -format "%25s" >> /home/genesis/genesis-2.3/genesis/Scripts/kinetikit/dhanu19.txt
        z=z+1
    end
end
output : dhanu19.txt
0 2.8 9
1 2.8 10
2 2.8 11
3 2.8 12
4 2.8 13
5 4.6 9
6 4.6 10
7 4.6 11
8 4.6 12
9 4.6 13
10 6.4 9
11 6.4 10
12 6.4 11
13 6.4 12
14 6.4 13
15 8.2 9
16 8.2 10
17 8.2 11
18 8.2 12
19 8.2 13
20 10 9
21 10 10
22 10 11
23 10 12
24 10 13
Have you considered writing either a script or a program to write the script for you? Generating shell-scripts, then running them can sometimes be a powerful solution to problems.
Which shell are you referring to? The syntax for declaring arrays differs between zsh, bash, and other shells.
Let's assume you write the 23 nested for-loops.
With 5 steps per loop, you end up with 5^23 (about 1.2 × 10^16) parameter combinations!
Even if each combination took only 1 byte of output, you would still need to store something like 10^16 bytes, or ten thousand terabytes.
I think you should reconsider your problem or reformulate your question.
Edit:
This is not a forum (and even in forums you can edit your post).
Please edit your question instead of posting a new answer. I think it is interesting.
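On the original question of replacing nested loops with a single loop over an array of variables, a minimal sketch in Python (not the scripting language used in the question) could look like the following. The helper names grid and parameter_table are assumptions made for this example, and the (start, stop) pairs are the a and b values from the posted program.

```python
from itertools import product


def grid(start, stop, steps):
    """Return `steps` evenly spaced values ending at stop, like the posted loops."""
    width = (stop - start) / steps
    return [start + width * (i + 1) for i in range(steps)]


def parameter_table(ranges, steps):
    """Yield (index, value_1, ..., value_n) for every combination of values."""
    axes = [grid(lo, hi, steps) for lo, hi in ranges]
    for index, values in enumerate(product(*axes)):
        yield (index, *values)


# One (start, stop) pair per variable; adding variables means adding pairs here.
ranges = [(1, 10), (8, 13)]
rows = list(parameter_table(ranges, steps=5))
for row in rows[:3]:
    print(row[0], *(round(v, 6) for v in row[1:]))

# The number of combinations grows as steps ** number_of_variables.
print(5 ** 23)  # 11920928955078125
```

The single loop over product(*axes) replaces any number of nested loops, but it does not change the size of the result: with 23 variables and 5 steps each, the table is still far too large to write out in full.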
