Align at longest word - vim

I have the following code:
a = 123
p.value = 0.123
p.long.name = "abc"
How can I align the lines in Vim as shown below?
a           = 123
p.value     = 0.123
p.long.name = "abc"
Thanks for any hints.

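Without a plugin:
:%s/=/            &/
:%s/\%13c\s\+=/=
The first command inserts a run of spaces before the first equals sign on every line, enough to push every = past column 13. The second command removes the whitespace that starts at column 13 and ends at an equals sign, so every = ends up in column 13. Column 13 is one past the longest left-hand side, p.long.name (11 characters), plus a space. You could also use a Visual block selection, press < and then . repeatedly to shift left as many times as necessary.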
However, this is rather clumsy. With the Tabular plugin you just type :Tab /= and it does the work. The range is calculated automatically as the largest range around the cursor in which all lines match the pattern.
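Both approaches produce the same alignment. Outside Vim, that result can be sketched in a few lines of Python: pad everything before the first = to the length of the longest left-hand side, then add one space. This is only a minimal illustration and not how Tabular works internally. The name align_on is hypothetical, and the sketch assumes every line contains the separator.

```python
def align_on(lines, sep="="):
    """Align lines so the first `sep` on each line lands in the same column.

    Hypothetical helper; assumes every line contains `sep`.
    """
    # Split each line at its first separator and trim the left-hand side.
    parts = [line.split(sep, 1) for line in lines]
    lefts = [left.rstrip() for left, _ in parts]
    width = max(len(left) for left in lefts)
    aligned = []
    for left, (_, right) in zip(lefts, parts):
        # Pad to the longest left-hand side, add one space, re-attach the rest.
        aligned.append(f"{left.ljust(width)} {sep}{right}")
    return aligned


for line in align_on(['a = 123', 'p.value = 0.123', 'p.long.name = "abc"']):
    print(line)
```

With this input the longest left-hand side is p.long.name, so every = lands in column 13. That is the same column the substitution commands target.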

Related

How do I split a single cell's content onto separate lines within the same cell?

In the cell below, I want each comma-separated item to appear on its own line. I can add these line breaks manually with Alt+Enter, but this time I want to automate it.
BCM:Open,Event:Site XXXX is down,Service Affected :2G,Impact :Coverage
Restored at XXXX Area,Reason:Under Investigation,Recovery Time :30
Minutes,Start time:14:25:13,End Time:15:18:03,Duration:00:52:50,SLA:1
Hour.
For some reason, none of the above worked for me. This, however, did:
Select the range of cells you need to change.
Go to Home > Find & Select > Replace, or press Ctrl+H.
Find what: ,
Replace with: press Ctrl+Shift+J
Click Replace All.
The shortcut apparently enters a line-feed character, the same kind of line break Alt+Enter inserts, even though the Replace box may look empty.
To replace commas with newline characters, use this formula (assuming the text to be altered is in cell A1):
=SUBSTITUTE(A1,",",CHAR(10))
You may then have to increase the row height to see all of the values in the cell.
I've left a comment about the other part of your question.
Edit: to get this working, I had to turn on "Wrap Text" in the "Format Cells" dialog.
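The SUBSTITUTE formula replaces every comma with a line-feed character, CHAR(10). This is the same character Alt+Enter inserts. As a rough illustration outside Excel, here is the equivalent string operation in Python, applied to a shortened copy of the cell from the question.

```python
# Python equivalent of =SUBSTITUTE(A1,",",CHAR(10)): replace every comma
# with a line feed (chr(10) is "\n").
cell = "BCM:Open,Event:Site XXXX is down,Service Affected :2G,Impact :Coverage"
wrapped = cell.replace(",", chr(10))
print(wrapped)
```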
Use
=SUBSTITUTE(A1,",",CHAR(10) & CHAR(13))
This will replace each comma with a new line. Change A1 to the cell you are referencing.
You can also do this without VBA from the Find/Replace dialog box; see my answer at https://stackoverflow.com/a/6116681/509840.
Windows (unlike some other operating systems, such as Linux) uses CR+LF for line breaks:
CR = 13 = 0x0D = ^M = \r = carriage return
LF = 10 = 0x0A = ^J = \n = line feed (newline)
The characters need to be in that order if you want the line breaks to be consistently visible when copied to other Windows programs. So the Excel formula would be:
=SUBSTITUTE(A1,",",CHAR(13) & CHAR(10))
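The code values above can be checked in Python, which uses the same \r and \n escapes. This sketch mirrors the CHAR(13) & CHAR(10) formula and confirms the CR+LF order. The helper name to_crlf is hypothetical.

```python
CR, LF = chr(13), chr(10)

# The values listed above: CR = 13 = 0x0D = "\r", LF = 10 = 0x0A = "\n".
assert CR == "\r" and ord(CR) == 0x0D
assert LF == "\n" and ord(LF) == 0x0A


def to_crlf(text):
    """Replace each comma with a Windows-style CR+LF pair.

    Mirrors =SUBSTITUTE(A1,",",CHAR(13) & CHAR(10)).
    """
    return text.replace(",", CR + LF)


print(repr(to_crlf("a,b,c")))  # 'a\r\nb\r\nc'
```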

Extract last word in string in R - error faced

First, I want to extract the first and last words of the Description column (which always contains at least 3 words) into newly created columns, firstword and lastword. However, word() does not seem to work on all rows. Many rows have an empty lastword even though the Description column clearly shows a last word. The first two lines of code below do this extraction.
Second, I am trying to use the third line of code to replace lastword with firstword when lastword is empty. However, it isn't working.
Is there a way to rectify this?
c1$lastword = word(c1$Description,start=-1) #extract last word
c1$firstword = word(c1$Description,start=1) #extract first word
c1$lastword=ifelse(c1$lastword == " ", c1$firstword, c1$lastword)
I realise that there is white space at the beginning of some of the rows of the Description variable, which isn't shown when viewed in R.
Removing the whitespace using stri_trim() solved the issue.
library(stringi)  # provides stri_trim()
c1$Description = stri_trim(c1$Description, "left") # remove leading whitespace
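The following Python sketch illustrates why stray whitespace breaks the extraction. It is not R. It imitates stringr's word() under the assumption of its default single-space separator. Splitting on single spaces turns a leading or trailing space into an empty "word". An empty word is the empty string, so a comparison with " " (as in the question's third line) would likely never match. Comparing with "" is probably what was intended. The helper name first_last is hypothetical.

```python
def first_last(description, trim=True):
    """Imitate word(x, 1) and word(x, -1) with a single-space separator."""
    # Trimming first mirrors stri_trim() (both sides here); without it, a
    # leading or trailing space yields an empty "word" at that end.
    text = description.strip() if trim else description
    words = text.split(" ")
    first, last = words[0], words[-1]
    # The question's fallback. An empty word is "", not " ".
    if last == "":
        last = first
    return first, last


print("  Site is down ".split(" "))  # ['', '', 'Site', 'is', 'down', '']
print(first_last("  Site is down ", trim=False))  # ('', '')
print(first_last("  Site is down "))  # ('Site', 'down')
```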

vim search and replace successive string occurrence with different string

Let’s say I have multiple STRING occurrences. I want to replace the 1st occurrence with STRING_A, the 2nd with STRING_B, and the 3rd with STRING_C.
For example:
Color of my pant is STRING. Color of my hair is STRING. Color of my car is STRING.
After I run search and replace, I should get:
Color of my pant is STRING_A. Color of my hair is STRING_B. Color of my car is STRING_C.
Any help will be greatly appreciated.
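From the Vim wiki:
:let @a=1 | %s/STRING/\='STRING_'.(@a+setreg('a',@a+1))/g
But this will give you STRING_1, STRING_2, etc.
A slight modification gives the desired result:
:let @a=65 | %s/STRING/\='STRING_'.nr2char(@a+setreg('a',@a+1))/g
This works because setreg() returns 0. So @a+setreg('a',@a+1) evaluates to the current counter value while bumping register a for the next match, and nr2char(65) is "A".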
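The Vim one-liners use a register as a counter that is incremented on every match. Here is the same idea as a Python sketch. re.sub() accepts a function that is called once per match, so a counter starting at ord("A") produces STRING_A, STRING_B, and so on.

```python
import itertools
import re

text = ("Color of my pant is STRING. Color of my hair is STRING. "
        "Color of my car is STRING.")

# Counter starting at 65 (ord("A")), like `let @a=65` in the Vim version.
counter = itertools.count(ord("A"))
result = re.sub(r"STRING", lambda m: "STRING_" + chr(next(counter)), text)
print(result)
```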
If you want to get the substitutions from an array, first define an array:
:let foo=['bar','baz','bak']
Then do the substitution:
:let @a=0 | %s/STRING/\=get(foo, @a+setreg('a',@a+1))/g
This will give you:
Color of my pant is bar. Color of my hair is baz. Color of my car is bak.
You can define a List of replacements, and then use :help sub-replace-expression to pop replacements off it:
:let r = ['bar', 'baz', 'bak']
:%substitute/STRING/\=remove(r, 0)/g
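The List-based approach translates the same way. In this Python sketch, an iterator over the replacements plays the role of remove(r, 0), and each match consumes the next item. As with the Vim version, the code fails if there are more matches than replacements.

```python
import re

text = ("Color of my pant is STRING. Color of my hair is STRING. "
        "Color of my car is STRING.")
replacements = iter(["bar", "baz", "bak"])

# Each match consumes the next replacement, like remove(r, 0) in Vim.
result = re.sub(r"STRING", lambda m: next(replacements), text)
print(result)
```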

OpenRefine: Split multi-valued cells by token/word count?

I have a large corpus of text data that I'm pre-processing in OpenRefine for document classification with MALLET.
Some of the cells are long (>150,000 characters) and I'm trying to split them into <1,000 word/token segments.
I'm able to split long cells into 6,000-character chunks using "Split multi-valued cells" by field length, which roughly translates to 1,000-word/token chunks. However, it splits words across rows, so I'm losing some of my data.
Is there a function I could use to split long cells by the first whitespace (" ") after every 6,000th character, or even better, split every 1,000 words?
Here is my simple solution:
Go to Edit cells -> Transform and enter
value.replace(/((\s+\S+?){999})\s+/,"$1###")
This will replace every 1000th whitespace (consecutive whitespaces are counted as one and replaced if they appear at the split border) with ### (you can choose any token you like, as long as it doesn't appear in the original text).
Then go to Edit cells -> Split multi-valued cells and split using the token ### as the separator.
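The regular expression can be checked on a small scale with Python's re.sub(). This sketch uses chunks of 3 words instead of 1,000, so the quantifier is {2} rather than {999}. It assumes GREL's replace() replaces every regex match, as re.sub() does and as the description of "every 1000th whitespace" implies. The helper name chunk_marker is hypothetical.

```python
import re


def chunk_marker(value, n, token="###"):
    """Replace every n-th whitespace run with `token` (requires n >= 2).

    Same pattern as the GREL answer, with {n-1} in place of {999}.
    """
    pattern = r"((\s+\S+?){%d})\s+" % (n - 1)
    return re.sub(pattern, r"\1" + token, value)


marked = chunk_marker("w1 w2 w3 w4 w5 w6 w7", 3)
print(marked)               # w1 w2 w3###w4 w5 w6###w7
print(marked.split("###"))  # ['w1 w2 w3', 'w4 w5 w6', 'w7']
```

Each match starts at the whitespace after a chunk's first word and consumes n-1 more words plus the following whitespace. Splitting on the token therefore yields chunks of n words.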
The simplest way is probably to split your text by spaces, insert a very rare character (or group of characters) after each group of 1,000 elements, rejoin the text, and then use "Split multi-valued cells" with your rare character(s) as the separator.
You can do that in GREL, but it is much clearer if you choose "Python/Jython" as the script language.
So: Edit cells -> Transform -> Python/Jython:
my_list = value.split(' ')
n = 1000
i = n
# Insert a separator after every n words; the +1 skips the separator itself.
while i < len(my_list):
    my_list.insert(i, '|||')
    i += n + 1
return " ".join(my_list)
Here is a more compact version:
text = value.split(' ')
n = 1000
return "|||".join([' '.join(text[i:i+n]) for i in range(0,len(text),n)])
You can then split using ||| as separator.
If you prefer to split by characters instead of words, it looks like you can do that in two lines with textwrap:
import textwrap
return "|||".join(textwrap.wrap(value, 6000))
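The question also asked about splitting at the first whitespace after every 6,000th character. textwrap.wrap() gets close, with two caveats. By default it replaces each whitespace character with a space and drops whitespace at the break points. It also still breaks words longer than the width. If the original text should be kept exactly, a hand-rolled loop is an option. split_after_chars below is a hypothetical helper, shown with a small limit. It sticks to Python 2 compatible syntax, so it should also work in a Jython transform ending in return split_after_chars(value, 6000), though that has not been tested here.

```python
import re

WHITESPACE = re.compile(r"\s")


def split_after_chars(value, n, token="|||"):
    """Join chunks with `token`, breaking at the first whitespace at or
    after every n-th character of each chunk, so words are never cut."""
    chunks = []
    start = 0
    while len(value) - start > n:
        # Look for whitespace from the n-th character of the current chunk on.
        match = WHITESPACE.search(value, start + n)
        if match is None:
            break  # no more whitespace: keep the rest as the last chunk
        chunks.append(value[start:match.start()])
        start = match.end()  # the single whitespace character is dropped
    chunks.append(value[start:])
    return token.join(chunks)


print(split_after_chars("aaa bbb ccc ddd eee", 5))  # aaa bbb|||ccc ddd|||eee
```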
