How to replace each occurrence of a found word with a different word in the vi/Vim editor? - linux

For example, I have the text:
10 3 4 2 10 , 4 ,10 ....
Now I want to change each 10 to a different word.
I know :%s/10/replace-words/gc, but it only lets me confirm each replacement interactively (yes/no). I want to change each occurrence of 10 to a different word, like replace1, 3, 4 , 2 , replace2, 4, replace3 ....

Replaces each occurrence of 10 with replace{index_of_match}:
:let @a=1 | %s/10/\='replace'.(@a+setreg('a',@a+1))/g
Replaces each occurrence of 10 with a word from a predefined list:
:let b = ['foo', 'bar', 'vim'] | %s/10/\=(remove(b, 0))/g
Replaces each occurrence of 10 with a word from a predefined list, followed by the index of the match:
:let @a=1 | let b = ['foo', 'bar', 'vim'] | %s/10/\=(b[@a-1]).(@a+setreg('a',@a+1))/g
But since you have to type in every word anyway, the benefit of the second and third commands is minimal. See the answer from SpoonMeiser for the "manual" solution.
Update: As wished, the explanation for the regex part in the second example:
% = on every line in the document
s/<search>/<replace>/g = s means do a search & replace, g means replace every occurrence on a line.
\= interprets the following as an expression.
remove(b, 0) removes the element at index 0 of the list b and returns it.
So for the first occurrence, the line will be %s/10/foo/g. The second time, the list is only ['bar', 'vim'], so the line will be %s/10/bar/g, and so on.
Note: This is a quick draft and unlikely to be the best and cleanest way to achieve it. If somebody wants to improve it, feel free to add a comment.
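For readers who want to see the same idea outside Vim, here is a minimal Python sketch of "take the next word from a list for each match". The function name replace_each is made up for illustration, and leaving extra matches unchanged once the list runs out is a choice made in this sketch, not something the Vim command does.

```python
import re

def replace_each(text, pattern, words):
    """Replace successive matches of pattern with successive items of words."""
    queue = list(words)  # copy so the caller's list is not consumed
    # Like remove(b, 0) in the Vim command: hand out the first remaining word.
    # In this sketch, matches beyond the end of the list are left unchanged.
    return re.sub(pattern, lambda m: queue.pop(0) if queue else m.group(0), text)

result = replace_each("10 3 4 2 10 , 4 ,10", r"10", ["foo", "bar", "vim"])
print(result)  # foo 3 4 2 bar , 4 ,vim
```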

Is there a pattern to the words you want or would you want to type each word at each occurrence of the word you're replacing?
If I were replacing each instance of "10" with a different word, I'd probably do it somewhat manually:
/10
cw
<type word>ESC
ncw
<type word>ESC
ncw
<type word>ESC
Which doesn't seem too onerous, if each word is different and has to be typed separately anyway.

Related

How can I remove repeated characters in a string with R?

I would like to implement a function with R that removes repeated characters in a string. For instance, say my function is named removeRS, so it is supposed to work this way:
removeRS('Buenaaaaaaaaa Suerrrrte')
Buena Suerte
removeRS('Hoy estoy tristeeeeeee')
Hoy estoy triste
My function is going to be used with strings written in spanish, so it is not that common (or at least correct) to find words that have more than three successive vowels. No bother about the possible sentiment behind them. Nonetheless, there are words that can have two successive consonants (especially ll and rr), but we could skip this from our function.
So, to sum up, this function should replace the letters that appear at least three times in a row with just that letter. In one of the examples above, aaaaaaaaa is replaced with a.
Could you give me any hints to carry out this task with R?
I did not think very carefully on this, but this is my quick solution using references in regular expressions:
gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"
() captures a letter first, \\1 refers to that letter, + means to match it once or more; put all these pieces together, we can match a letter two or more times.
To include other characters besides letters, replace [[:alpha:]] with a regex matching whatever you wish to include.
I think you should pay attention to the ambiguities in your problem description. This is a first stab, but it clearly does not work with "Good Luck" in the manner you desire:
removeRS <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="")
removeRS('Buenaaaaaaaaa Suerrrrte')
#[1] "Buena Suerte"
Since you want to replace letters that appear AT LEAST 3 times, here is my solution:
gsub("([[:alpha:]])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
#[1] "Buenna Suertee"
As you can see the 4 "a" have been reduced to only 1 a, the 3 r have been reduced to 1 r but the 2 n and the 2 e have not been changed.
As suggested above, you can replace [[:alpha:]] with any combination such as [a-zA-KM-Z], or simply list the characters inside the square brackets, e.g. [yQ], if you want your code to affect only repetitions of y and Q. (No | is needed inside square brackets; there it would be matched as a literal character.)
gsub("([ae])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
# [1] "Buenna Suerrrtee"
# triple r are not affected and there are no triple e.
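As a cross-check of the back-reference technique, here is a small Python sketch of the same "three or more in a row" rule. The name remove_rs and the min_run parameter are assumptions added for illustration; they do not come from the answers above.

```python
import re

def remove_rs(text, min_run=3):
    """Collapse runs of at least min_run identical letters into one letter."""
    # [^\W\d_] matches one letter (a word character that is not a digit or "_").
    # \1{n,} then requires at least n further copies of that same letter.
    pattern = r"([^\W\d_])\1{%d,}" % (min_run - 1)
    return re.sub(pattern, r"\1", text)

print(remove_rs("Buennaaaa Suerrrtee"))  # Buenna Suertee
```

With min_run=2 the function behaves like the first gsub answer, collapsing any doubled letter as well.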

A complicated case of conditional line splitting to be performed in Vim

Here is a sample of text that I’m working with:
Word1
Word2
...
Word4 / Word5 Word6
Word7
Word8 Word9 Word10 / Word11 Word12 Word13 Word14
Word15
Word16
...
I would like to transform it by splitting the lines containing
slash-separated chunks, so that the first chunk (preceding the slash)
gets the trailing words copied from the second chunk (following the
slash) to equalize the number of words in both lines resulting from
the chunks, if the former one has fewer words than the latter.
In other words, the desired transformation is to target the lines
consisting of two groups of words separated by a (space-surrounded)
slash character. The first group of words (preceding the slash) on
a target line has 1 to 3 words, but always fewer than the second
group.
Thus, the target lines have the following structure:
‹G1› / ‹G2› ‹G3›
where ‹G1› and
‹G2› ‹G3› (i.e.,
‹G2› concatenated with ‹G3›)
constitute the two aforementioned groups of words, with
‹G2› standing for as many of the leading words of the
after-slash group as there are in the before-slash one, and
‹G3› standing for the remaining words in the
after-slash group.
Such lines should be replaced with two lines, as follows:
‹G1› ‹G3›
‹G2› ‹G3›
For the above example, the desired result is as follows:
Word1
Word2
...
Word4 Word6
Word5 Word6
Word7
Word8 Word9 Word10 Word14
Word11 Word12 Word13 Word14
Word15
Word16
...
Could you please help me implement this transformation in Vim?
You can write a function to expand slash:
fun! ExpandSlash() range
  for i in range(a:firstline, a:lastline)
    let ws = split(getline(i))
    let idx = index(ws, '/')
    if idx==-1
      continue
    endif
    let h= join(ws[ : idx-1])
    let m= join(ws[idx+1 : 2*idx])
    let t= join(ws[2*idx+1 : ])
    call setline(i, h.' '.t.'/'.m.' '.t)
  endfor
endfun
:%call ExpandSlash()
:%s#/#\r#
before
1 2 3 / 4 5 6 7 8
after
1 2 3 7 8
4 5 6 7 8
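The slicing logic of the function above can be sketched in Python to make the three groups explicit. This is only an illustration of the index arithmetic; the name expand_slash is hypothetical, and it returns the two resulting lines as a list instead of editing a buffer.

```python
def expand_slash(line):
    """Split '<G1> / <G2> <G3>' into ['<G1> <G3>', '<G2> <G3>']."""
    words = line.split()
    if "/" not in words:
        return [line]  # not a target line, keep it as is
    idx = words.index("/")             # also the number of words in G1
    head = words[:idx]                 # G1
    middle = words[idx + 1:2 * idx + 1]  # G2: as many words as G1 (Vim's slice end is inclusive, Python's is not)
    tail = words[2 * idx + 1:]         # G3: the remaining words
    return [" ".join(head + tail), " ".join(middle + tail)]

print(expand_slash("1 2 3 / 4 5 6 7 8"))  # ['1 2 3 7 8', '4 5 6 7 8']
```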
One can use the following command to perform the desired transformation:
:g~/~s~\s*/\s*~\r~|-|exe's/\ze\n\%(\s*\w\+\)\{'.len(split(getline('.'))).'}\(.*\)$/\1'
This :global command selects the lines matching the pattern /
(here, it is delimited by ~ characters) and executes the commands
that follow it for each of those lines.
Let us consider them one by one.
The slash character with optional surrounding whitespace that
separates the first and the second groups of words on the
current line (as defined in the question’s statement), is
replaced by the newline character:
:s~\s*/\s*~\r~
Here the tilde characters are used again to delimit the
pattern and the replacement strings, so that there is no
need to escape the slash.
After the above substitution the cursor is located on the line
next to the one where the substituted slash was. To make writing
the following commands more convenient, the cursor is moved back
to the line just above:
:-
The - address is the shortening for the .-1 range denoting
the line preceding the current one (see :help :range).
The third group of words, which is now at the end of the next
line, is to be appended to the current one. In order to do
that, the number of words in the first group is determined.
Since the current line contains the first group only, that
number can be calculated by separating the contents of that
line into whitespace-delimited groups with the help of the
split() function:
len(split(getline('.')))
The getline('.') call returns the current line as a string,
split() converts that string into a list of words, and
len() counts the number of items in that list.
Using the number of words, a substitution command is generated
and run with the :execute command:
:exe's/\ze\n\%(\s*\w\+\)\{'.len(split(getline('.'))).'}\(.*\)$/\1'
The substitutions have the following structure:
:s/\ze\n\%(\s*\w\+\)\{N}\(.*\)$/\1
where N is the number of words that were placed before
the slash.
The pattern matches the newline character of the current line
followed by exactly N words on the second line. A word
is matched as a sequence of whitespace preceding a series of
one or more word characters (see :help /\s and :help /\w).
The word pattern is enclosed between the \%( and \)
escaped parentheses (see :help /\%() to treat it as a single
atom for the \{N} specifier (see :help /\{) to match
exactly N occurrences of it. The remaining text to the
end of the next line is matched as a subgroup to be referenced
from the replacement expression.
Because of the \ze atom at the very beginning of the
pattern, its match has zero width (see :help /\ze). Thanks
to that, the substitution command replaces the empty string
just before the newline character with the text matched by the
subgroup, thus inserting the third group of words after the
first one.
For the given example the result is equivalent to replacing each / with the last word on the line and a line break \r. Here is a global substitute command to do it:
:%s#/ \ze.*\(\<\w\+$\)#\1\r#
Explanation:
/ \ze match the / and stop matching (nothing after the \ze will be substituted)
.* match any intermediate characters
\( start another match group
\<\w\+$ match the last word before the end of the line
\) stop the match group
However, you then say that the trailing group g3 may contain more than one word, which means the replace operation needs to be able to count the number of words before and after the /. I'm afraid I don't know how to do that, but I'm sure someone will leap to your rescue before long!

Count word occurrences in R

Is there a function for counting the number of times a particular keyword is contained in a dataset?
For example, if dataset <- c("corn", "cornmeal", "corn on the cob", "meal") the count would be 3.
Let's for the moment assume you wanted the number of element containing "corn":
length(grep("corn", dataset))
[1] 3
After you get the basics of R down better you may want to look at the "tm" package.
EDIT: I realize that this time around you wanted any-"corn" but in the future you might want to get word-"corn". Over on r-help Bill Dunlap pointed out a more compact grep pattern for gathering whole words:
grep("\\<corn\\>", dataset)
Another quite convenient and intuitive way to do it is to use the str_count function of the stringr package:
library(stringr)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
# for mere occurrences of the pattern:
str_count(dataset, "corn")
# [1] 1 1 1 0
# for occurrences of the word alone:
str_count(dataset, "\\bcorn\\b")
# [1] 1 0 1 0
# summing it up
sum(str_count(dataset, "corn"))
# [1] 3
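The difference between counting substring occurrences, whole-word occurrences, and matching elements is easy to mix up, so here is a short Python sketch of the three counts on the same data. It is meant only to illustrate the distinction; the variable names are made up for this example.

```python
import re

dataset = ["corn", "cornmeal", "corn on the cob", "meal"]

# Every occurrence of the substring "corn" (counts "cornmeal" too)
substring_total = sum(s.count("corn") for s in dataset)

# Only occurrences of "corn" as a whole word (\b is a word boundary)
whole_word_total = sum(len(re.findall(r"\bcorn\b", s)) for s in dataset)

# Number of elements that contain "corn" at least once
elements_containing = sum("corn" in s for s in dataset)

print(substring_total, whole_word_total, elements_containing)  # 3 2 3
```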
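You can also do something like the following, but note that it counts only the elements that are exactly equal to "corn" (1 for this dataset, not 3):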
length(dataset[which(dataset=="corn")])
I'd just do it with string division like:
library(roperators)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
# for each vector element:
dataset %s/% 'corn'
# for everything:
sum(dataset %s/% 'corn')
You can use the str_count function from the stringr package to get the number of keywords that match a given character vector.
The pattern argument of the str_count function accepts a regular expression that can be used to specify the keyword.
The regular expression syntax is very flexible and allows matching whole words as well as character patterns.
For example the following code will count all occurrences of the string "corn" and will return 3:
sum(str_count(dataset, regex("corn")))
To match complete words use:
sum(str_count(dataset, regex("\\bcorn\\b")))
The "\b" is used to specify a word boundary. When using str_count function, the default definition of word boundary includes apostrophe. So if your dataset contains the string "corn's", it would be matched and included in the result.
This is because apostrophe is considered as a word boundary by default. To prevent words containing apostrophe from being counted, use the regex function with parameter uword = T. This will cause the regular expression engine to use the unicode TR 29 definition of word boundaries. See http://unicode.org/reports/tr29/tr29-4.html. This definition does not consider apostrophe as a word boundary.
The following code will give the number of times the word "corn" occurs. Words such as "corn's" will not be included.
sum(str_count(dataset, regex("\\bcorn\\b", uword = T)))

Move lines matched by :g to the top of the file

I have a large text file with several calls to a specific function method_name.
I've matched them using :g/method_name.
How would I move them to the top of the file (with the first match being on the top)?
I tried :g/method_name/normal ddggP but that reverses the order. Is there a better way to directly cut and paste all the matching lines, in order?
Example input file:
method_name 1
foo
method_name 2
bar
method_name 3
baz
Example output file:
method_name 1
method_name 2
method_name 3
foo
bar
baz
How about trying it the other way around: moving the un-matched lines to the bottom:
:v/method_name/normal ddGp
This seems to achieve what you want.
I think you can achieve the desired result by first creating a variable assigned
to 0:
:let i=0
And then executing this command:
:g/method_name/exec "m ".i | let i+= 1
It basically calls :m passing as address the value of i, and then increments
that value by one so it can be used in the next match. Seems to work.
Of course, you can delete the variable when you don't need it anymore:
:unlet i
If the file is really large, the number of matching entries is small, and you don't want to move the rest of the file around with the v/<pattern>/ m$ solution, you may do this:
Pick any mark you don't care about, say 'k. Now the following key sequence does what you want:
ggmk:g/method_name/ m 'k-1
ggmk marks first line with 'k.
m 'k-1 moves matching line to 1 line before the 'k mark (and mark moves down with the line it is attached to).
This will only move a few matching lines, not the entire file.
Note: this somehow works even if the first line contains the pattern -- and I don't have an explanation for that.
For scripts:
normal ggmk
g/method_name/ m 'k-1
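What all of these commands implement is a stable partition: matching lines first, everything else after, with the original order kept inside each group. A minimal Python sketch of that idea, using a plain substring test instead of a Vim pattern (the function name is hypothetical):

```python
def move_matches_to_top(lines, needle):
    """Return lines with those containing needle first, order preserved."""
    matching = [line for line in lines if needle in line]
    others = [line for line in lines if needle not in line]
    return matching + others

lines = ["method_name 1", "foo", "method_name 2", "bar", "method_name 3", "baz"]
reordered = move_matches_to_top(lines, "method_name")
print("\n".join(reordered))
```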

Any other ways to emulate `tr` in J?

I picked up J a few weeks ago, about the same time the CodeGolf.SE beta opened to the public.
A recurrent issue (of mine) when using J over there is reformatting input and output to fit the problem specifications. So I tend to use code like this:
( ] ` ('_'"0) ) @. (= & '-')
This one is untested for various reasons (edit me if wrong); the intended meaning is "convert - to _". Other conversions that come up frequently: newlines to spaces (and the converse), merging numbers with j, changing brackets.
This takes up quite a few characters, and is not that convenient to integrate to the rest of the program.
Is there any other way to proceed with this? Preferably shorter, but I'm happy to learn anything else if it's got other advantages. Also, a solution with an implied functional obverse would be a big relief.
It sometimes goes against the nature of code golf to use library methods, but in the string library, the charsub method is pretty useful:
'_-' charsub '_123'
-123
('_-', LF, ' ') charsub '_123', LF, '_stuff'
-123 -stuff
rplc is generally short for simple replacements:
'Test123' rplc 'e';'3'
T3st123
Amend m} is very short for special cases:
'*' 0} 'aaaa'
*aaa
'*' 0 2} 'aaaa'
*a*a
'*&' 0 2} 'aaaa'
*a&a
but becomes messy when the list has to be a verb:
b =: 'abcbdebf'
'L' (]g) } b
aLcLdeLf
where g has to be something like g =: ('b' E. ]) # ('b' E. ]) * [: i. #.
There are a lot of other "tricks" that work on a case by case basis. Example from the manual:
To replace lowercase 'a' through 'f' with uppercase 'A'
through 'F' in a string that contains only 'a' through 'f':
('abcdef' i. y) { 'ABCDEF'
Extending the previous example: to replace lowercase 'a' through
'f' with uppercase 'A' through 'F' leaving other characters unchanged:
(('abcdef' , a.) i. y) { 'ABCDEF' , a.
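For comparison, the lookup-table idea in the manual's example (find each character in a source alphabet, then index into a target alphabet, leaving everything else alone) corresponds to what tr does. Here is a minimal Python sketch of that behaviour using the standard str.maketrans and str.translate; the wrapper name tr is made up for this example.

```python
def tr(text, source, target):
    """Map each character of source to the character at the same position in target."""
    # str.maketrans requires source and target to have equal length;
    # characters not listed in source pass through unchanged.
    table = str.maketrans(source, target)
    return text.translate(table)

print(tr("abcxyz", "abcdef", "ABCDEF"))  # ABCxyz
print(tr("_123\n_stuff", "_\n", "- "))   # -123 -stuff
```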
I've only dealt with the newlines and CSV, rather than the general case of replacement, but here's how I've handled those. I assume Unix line endings (or line endings fixed with toJ) with a final line feed.
Single lines of input: ".{:('1 2 3',LF) (Haven't gotten to use this yet)
Rectangular input: (".;._2) ('1 2 3',LF,'4 5 6',LF)
Ragged input: probably (,;._2) or (<;._2) (Haven't used this yet either.)
One line, comma separated: ".;._1}:',',('1,2,3',LF)
This doesn't replace tr at all, but does help with line endings and other garbage.
You might want to consider using the 8!:2 foreign:
8!:2]_1
-1

Resources