Align text after n-th column in vim removing unnecessary blanks - vim

In Vim, on a Windows machine (with no access to Unix-like commands such as column), I want to reformat this code to make it more readable:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FORMAT 99
COLUMN VALUE_2 FORMAT 99
COLUMN VALUE_3 FORMAT 999
COLUMN VALUE_4 FORMAT 999
And I want to get this, using as few commands as possible:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FORMAT 99
COLUMN VALUE_2 FORMAT 99
COLUMN VALUE_3 FORMAT 999
COLUMN VALUE_4 FORMAT 999
Note that this is just an excerpt; there are many more lines that need the same treatment.

You could use the following command:
:%s/\w\zs\s*\zeFORMAT/^I
The pattern matches the whitespace between the end of the previous word and FORMAT, and replaces it with a tab:
\w Any 'word' character
\zs Start of the match
\s* Any amount of whitespace
\ze End of the match
FORMAT The literal word FORMAT
\zs and \ze restrict the substitution to the whitespace only; see :h /\zs and :h /\ze
Note that ^I stands for a literal tab, typed as Ctrl+V followed by Tab (Ctrl+Q followed by Tab where Ctrl+V is mapped to paste, as is common in gVim on Windows). Writing \t in the replacement also inserts a tab.
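To check what the pattern captures outside Vim, here is a minimal Python sketch of the same substitution. Python's re module has no \zs/\ze, so a lookbehind and a lookahead stand in for them. The function name and the sample line (including its run of spaces) are assumptions made for illustration, not part of the answer.

```python
import re

# Whitespace preceded by a word character and followed by FORMAT.
# (?<=\w) plays the role of \w\zs, (?=FORMAT) the role of \zeFORMAT.
GAP_BEFORE_FORMAT = re.compile(r"(?<=\w)\s*(?=FORMAT)")


def tab_before_format(line: str) -> str:
    """Replace the whitespace in front of FORMAT with a single tab."""
    return GAP_BEFORE_FORMAT.sub("\t", line)


if __name__ == "__main__":
    # Hypothetical input line with an arbitrary run of spaces
    sample = "COLUMN KEY_ID" + " " * 12 + "FORMAT 9999999999"
    print(repr(tab_before_format(sample)))
```

Only the whitespace is replaced; the word before it and FORMAT itself are left untouched, which is what \zs and \ze achieve in the Vim pattern.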
The tabular plugin recommended by @SatoKatsura would be a good way to do it too.
You can also generalize that. Let's say you have the following file:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FOO 99
COLUMN VALUE_2 BAR 99
You could use this command:
:%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/^I
where the pattern breaks down as follows:
^ Match the beginning of the line
\(\w*\s\)\{1} One occurrence of the pattern \w*\s i.e. one column
\w* Another column
\zs\s*\ze The whitespaces after the previous column
You can increase the count in \{1} to apply the command to later columns.
EDIT: to answer @aturegano's comment, here is a way to align the column at a given position:
:%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/\=repeat(' ', 30-matchstrpos(getline('.'), submatch(0))[1])
The idea is still to match the whitespace that must be aligned. In the replacement part of the substitute command we use a sub-replace-expression (see :h sub-replace-expression).
This lets us compute the replacement from an expression, which can be explained like this:
\= Interpret the rest of the replacement as an expression
repeat(' ', XX) Replace the match with XX spaces
XX is decomposed like this:
30- 30 minus the next expression
matchstrpos()[1] Returns the byte index at which the second argument (used as a pattern) first matches in the first one
getline('.') The current line (i.e. the one containing the match)
submatch(0) The matched string
[1] Necessary since matchstrpos() returns a list:
[matchedString, StartPosition, EndPosition]
and we are looking for the second value.
You then simply have to replace 30 with the column to which you want to move the next column.
Because submatch(0) is searched for from the start of the line, a match consisting of a single space is found at the space after the first column instead, so the result is only right when the whitespace being replaced is longer than that.
See :h matchstrpos(), :h getline() and :h submatch()
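The arithmetic behind repeat(' ', 30 - start) can be checked with a small Python sketch. It is an illustration under assumptions rather than a translation of the Vim command: the function name and the target parameter are hypothetical, and the start of the whitespace is taken from the length of the captured prefix instead of from matchstrpos().

```python
import re

# First column, one whitespace character, second column: the prefix to keep.
# The trailing \s* is the whitespace that gets replaced.
PREFIX = re.compile(r"^((?:\w*\s){1}\w*)\s*")


def align_third_column(line: str, target: int = 30) -> str:
    """Pad the first two columns so the next one starts at index `target`."""
    # ljust() pads with spaces up to `target` and leaves a longer prefix
    # untouched, like repeat(' ', n) returning '' for a negative n.
    return PREFIX.sub(lambda m: m.group(1).ljust(target), line, count=1)


if __name__ == "__main__":
    for sample in ("COLUMN KEY_ID   FORMAT 9999999999", "COLUMN VALUE_1 FOO 99"):
        print(align_third_column(sample))
```

With target=30 the third column starts at 0-based index 30 on every line, whatever the length of the second column.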

For alignment, there are three well-known plugins:
the venerable Align - Help folks to align text, eqns, declarations, tables, etc
the modern tabular
the contender vim-easy-align

Posting an answer as requested:
:g/^COLUMN / s/.*/\=call('printf', ['%s %-30s %s %s'] + split(submatch(0)))/
Explanation:
g/^COLUMN / - apply the following command to lines matching /^COLUMN / (cf. :h :global)
\= - replace with the result of evaluating an expression, rather than with a fixed string (cf. :h s/\=)
submatch(0) - the line being matched
split(...) - split line into words
printf(...) - format the line
call(...) - we'd like to have printf('%s %-30s %s %s', list), but printf() doesn't take "real" lists as arguments, so we have to unfold the list with a call(...) (cf. :h call()).
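For comparison, the same split-then-format idea can be sketched in Python, where unpacking the list of words takes the place of call(). The function name and the width parameter are hypothetical, and the guard on four fields is an assumption about the shape of the lines.

```python
def format_column_line(line: str, width: int = 30) -> str:
    """Rebuild a 'COLUMN name FORMAT mask' line with the name padded to `width`."""
    fields = line.split()
    # Only touch lines like the ones selected by :g/^COLUMN / that split
    # into exactly four words; everything else is returned unchanged.
    if not line.startswith("COLUMN ") or len(fields) != 4:
        return line
    keyword, name, fmt, mask = fields
    # {name:<{width}} left-justifies the name, like %-30s in printf()
    return f"{keyword} {name:<{width}} {fmt} {mask}"


if __name__ == "__main__":
    print(format_column_line("COLUMN KEY_ID    FORMAT 9999999999"))
    print(format_column_line("COLUMN VALUE_1 FORMAT 99"))
```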

Yet another solution:
:%s/ \{2,}/ /g
This solution is not perfect because the result will have an extra single space on the first line. To fix this problem:
:%s/\%>15c \{2,}/ /g
Explanation of pattern:
\%>15c \{2,}
\%>15c Matches only after column 15
(space)\{2,} Matches two or more consecutive spaces
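The effect of the column restriction can be sketched in Python by leaving the first 15 characters alone and squeezing spaces only in the rest. This assumes plain ASCII text (Vim counts bytes for \%c); the function name and the sample lines, including their runs of spaces, are made up for illustration.

```python
import re


def squeeze_spaces_after(line: str, column: int = 15) -> str:
    """Collapse runs of two or more spaces that start after `column`."""
    # \%>15c only lets a match start past column 15, i.e. at a 0-based
    # index >= 15, so the first `column` characters are kept as they are.
    head, tail = line[:column], line[column:]
    return head + re.sub(r" {2,}", " ", tail)


if __name__ == "__main__":
    # Hypothetical input: names of different lengths, FORMAT at one column
    samples = [
        "COLUMN KEY_ID" + " " * 7 + "FORMAT 9999999999",
        "COLUMN VALUE_1" + " " * 6 + "FORMAT 99",
    ]
    for sample in samples:
        print(squeeze_spaces_after(sample))
```

On these samples the shorter name keeps one more space than the longer one, so FORMAT ends up in the same column on both lines.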

Related

Append characters based on the count of a match in Vim

I would like to append - to the end of each matched word. The number of - characters appended should depend on the length of the match, so that the total number of characters on each line remains constant.
As shown in the example below, the total number of characters should be 6.
e.g.
ab
xyz
abcde
The above text should be replaced to:
ab----
xyz---
abcde-
You can use \= to substitute with an expression, see :h sub-replace-expression.
When the substitute string starts with \=, the remainder is interpreted as an expression.
The submatch() function can be used to obtain matched text. The whole matched text can be accessed with submatch(0). The text matched with the first pair of () with submatch(1). Likewise for further sub-matches in ().
So you can achieve it like this:
:[range]s//\=submatch(0) . repeat('-', 6-strlen(submatch(0)))/
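The expression is plain length arithmetic, which a short Python sketch makes easy to verify. The function name and the total parameter are hypothetical.

```python
def pad_with_dashes(word: str, total: int = 6) -> str:
    """Append '-' characters until the word is `total` characters long."""
    # Same arithmetic as repeat('-', 6 - strlen(submatch(0))); a negative
    # count yields an empty string, so longer words are returned unchanged.
    return word + "-" * (total - len(word))


if __name__ == "__main__":
    for word in ("ab", "xyz", "abcde"):
        print(pad_with_dashes(word))
```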

How do I remove text using sed?

For instance, let's say I have a text file:
worker1, 0001, company1
worker2, 0002, company2
worker3, 0003, company3
How would I use sed to take the first 2 characters of the first column (here "wo"), remove the rest of that column, and attach them to the second column, so that the output looks like this:
wo0001,company1
wo0002,company2
wo0003,company3
$ sed -E 's/^(..)[^,]*, ([^,]*,) /\1\2/' file
wo0001,company1
wo0002,company2
wo0003,company3
s/ begin substitution
^(..) match the first two characters at the beginning of the line, captured in a group
[^,]* match any amount of non-comma characters of the first column
, match a comma and a space character
([^,]*,) match the second field and comma captured in a group (any amount of non-comma characters followed by a comma)
(space) match the next space character
/\1\2/ replace with the first and second capturing group
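The same expression can be tried with Python's re module, which uses the same extended-regex syntax for these constructs. This is only a sketch for checking the pattern; the function name is hypothetical.

```python
import re


def merge_columns(line: str) -> str:
    """Keep two characters of column 1 and glue them to column 2."""
    # Group 1: first two characters; group 2: second field plus its comma.
    # The rest of column 1, the ", " separator and the space after the
    # second comma are matched but not kept.
    return re.sub(r"^(..)[^,]*, ([^,]*,) ", r"\1\2", line)


if __name__ == "__main__":
    for row in ("worker1, 0001, company1", "worker2, 0002, company2"):
        print(merge_columns(row))
```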

Search for a pattern in Column in a CSV and replace another pattern in the same line using sed command

I want to check whether the second column of a CSV file starts with a pattern and, if it does, replace something else on the same line.
I wrote the following sed command for the CSV below to change I to N if the pattern 676 exists in the second column. But it also matches 676 in the 7th and 9th columns, since ,676 occurs there. Ideally, only the second column should be checked for the prefix 676. All I want is to check whether the second column starts with 676 (not 676 in the middle or at the end of the value, e.g. 46769777) and then change ,I, to ,N,.
sed -i '/,676/ {; s/,I,/,N,/;}' temp.csc
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,I,TTTT,I,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
Expected result:
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
If you are not bound to sed, awk might be a better option for you. Give this a try:
awk -F"," '{match($2,/^676/)&&gsub(",I",",N")}{print}' temp.csc
match() tests whether the second column starts with (^) 676; when it does, gsub() replaces every ,I with ,N on that line.
Result:
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
This requires that 676 appear at the beginning of the second column before any changes are made:
$ sed '/^[^,]*,676/ s/,I,/,N,/g' file
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
Notes:
The regex /^[^,]*,676/ requires that 676 appear after the first appearance of a comma on the line. In more detail:
^ matches the beginning of the line
[^,]* matches the first column
,676 matches the first comma followed by 676
In your desired output, ,I, was replaced with ,N, every time it appeared on the line. To accomplish this, g (meaning global) was added to the substitute command.
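The address-then-substitute logic can be sketched in Python to check it against the sample rows. The function name is hypothetical; the regex is the one from the sed address above.

```python
import re

# Same test as the sed address: 676 right after the first comma.
SECOND_COLUMN_676 = re.compile(r"^[^,]*,676")


def flag_rows(rows):
    """Change ,I, to ,N, on rows whose second column starts with 676."""
    return [
        row.replace(",I,", ",N,") if SECOND_COLUMN_676.match(row) else row
        for row in rows
    ]


if __name__ == "__main__":
    rows = [
        "6768880,55999777,S,I,TTTT,I,67677,yy",
        "6768880,676999777,S,I,TTTT,I,67677,yy",
        "6768880,46769777,S,I,TTTT,I,67677,yy",
    ]
    print("\n".join(flag_rows(rows)))
```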

Delete lines based on character(s) in specific position on line

I'm working with a large text file and need to be able to delete lines based on the value of the 25th character on the line, i.e. if it is equal to H, K or Z. Is this possible, either just by matching one of the letters and running 3 commands or (even better) by all 3 in one command? Any help greatly appreciated!
You can use :global to find the lines matching a regex and then execute a command on each of them.
In this case it matches any 24 characters from the beginning of the line and, if the character after them is H, K, or Z, deletes that line (the d at the end of the command stands for delete).
:g/^.\{24\}[HKZ]/d
Edit: as Peter Ricker points out \%25c would also work.
:g/\%25c[HKZ]/d
\%25c matches at the 25th column and then continues matching the regex from there.
You could also use \%v if you wanted to match virtual columns instead.
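You can also try the following ex command. Note that it acts on the current line only, and that a line shorter than 25 characters yields an empty pattern, which always matches, so such a line would be deleted too: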
:if match( "HKZ", strpart( getline("."), 24, 1) ) != -1 | delete | endif
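The position test itself is easy to sketch in Python. The function name and its parameters are hypothetical; lines shorter than the tested position are kept, which is how the two :g commands above behave.

```python
def drop_flagged_lines(lines, position=25, letters="HKZ"):
    """Drop lines whose character at 1-based `position` is one of `letters`."""
    index = position - 1  # the 25th character sits at 0-based index 24
    return [
        line
        for line in lines
        if not (len(line) > index and line[index] in letters)
    ]


if __name__ == "__main__":
    sample = ["x" * 24 + "H tail", "x" * 24 + "A tail", "short line"]
    print(drop_flagged_lines(sample))
```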

Matching only a <tab> that is between two numbers

How to match a tab only when it is between two numbers?
Sample script
209.65834 27.23204908
119.37987 15.03317082
74.240635 8.30561924
29.1014 0
931.8861 -100.00000
-16.03784 -8.30562
;
_mirror
l
;
29.1014 0
1028.10 0.00
n
_spline
935.4875 250
924.2026913 269.8820375
912.9178825 277.4506484
890.348265 287.3181854
(in the above script, the tabs are between the numbers, not the spaces) (blank lines are significant; there is nothing in them, but I can't lose them)
I wish to get a "," between the numbers. Tried with :%s/\t/\,/ but that will touch the empty lines too, and the end of lines.
Try this:
:%s/\(\d\)\t\(-\?\d\)/\1,\2/
\d matches any digit. -\? means "an optional -". The pairs of (escaped) parentheses capture the matches; \1 refers to the first captured match and \2 to the second.
google://vim+regex -> http://vimregex.com/ ->
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/gc
You have two groups of digits here ([0-9]) with a tab (\t) between them. Add some escape symbols and you have the answer. Note that a tab followed by a minus sign (as in 931.8861<tab>-100.00000) is not matched by this pattern.
g replaces every occurrence on a line, c asks for confirmation before each change.
\1 and \2 refer to the captured groups (digits in your case).
It's not really hard to find answers to questions like this by yourself.
try
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/g
Explanation: search for the pattern <digit>\t<digit> and remember the parts that match <digit>.
\( ... \) captures and remembers the part that matches.
\1 recalls the first captured digit, \2 the second captured digit.
So if the match is on 123\t789, <digit>\t<digit> matches 3\t7,
and the 3 and 7 are remembered as \1 and \2.
or
:g/[0-9]/ s/\t/,/g
Explanation: select all lines containing a digit, then replace tabs with commas on those lines.
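To see how the capture-group version treats blank lines, keywords and negative numbers, here is a small Python sketch using the pattern with the optional minus sign. The function name is hypothetical, and unlike :s without the g flag, re.sub replaces every match on the line.

```python
import re

# A digit, a tab, then a digit optionally preceded by a minus sign.
TAB_BETWEEN_NUMBERS = re.compile(r"(\d)\t(-?\d)")


def tabs_to_commas(line: str) -> str:
    """Replace a tab with a comma only when it sits between two numbers."""
    # \1 and \2 put the captured characters back around the comma.
    return TAB_BETWEEN_NUMBERS.sub(r"\1,\2", line)


if __name__ == "__main__":
    for sample in ("931.8861\t-100.00000", "_mirror", "", "29.1014\t0\t"):
        print(repr(tabs_to_commas(sample)))
```

Blank lines and keyword lines come back unchanged, and a tab at the end of a line is left alone because no digit follows it.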
