Delete lines based on character(s) in specific position on line - vim

I'm working with a large text file and need to be able to delete lines based on the value of the 25th character on the line, i.e. if it is equal to H, K or Z. Is this possible, either by matching one letter at a time and running 3 commands or (even better) by matching all 3 in one command? Any help greatly appreciated!

You can use :global (:g) to find every line that matches a regex and then execute a command on each of those lines.
In this case the pattern matches any 24 characters from the beginning of the line and then requires the next character, the 25th, to be H, K, or Z; matching lines are deleted (the d at the end of the command stands for delete).
:g/^.\{24\}[HKZ]/d
Edit: as Peter Ricker points out \%25c would also work.
:g/\%25c[HKZ]/d
\%25c matches at the 25th column (counted in bytes), and the rest of the pattern is matched from there.
You could also use \%25v if you wanted to match virtual (screen) columns instead.
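To sanity-check the pattern outside Vim, here is a minimal Python sketch of the same idea. It uses Python's re syntax rather than Vim's regex engine, and the function name and sample lines are made up for illustration.

```python
import re

# Same idea as :g/^.\{24\}[HKZ]/d, written in Python's re syntax:
# 24 arbitrary characters from the start of the line, then H, K or Z.
POSITION_25 = re.compile(r"^.{24}[HKZ]")


def drop_flagged_lines(lines):
    """Return the lines whose 25th character is not H, K or Z."""
    return [line for line in lines if not POSITION_25.match(line)]


sample = [
    "x" * 24 + "H tail",  # 25th character is H: dropped
    "x" * 24 + "A tail",  # 25th character is A: kept
    "short line",         # fewer than 25 characters: kept
    "x" * 23 + "K tail",  # K is the 24th character: kept
]
kept = drop_flagged_lines(sample)
print(kept)
```

Lines shorter than 25 characters can never match, so they are always kept, which is also how the :g command behaves.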

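You can try the following ex command. The :g/^/ prefix runs it on every line rather than only the current one, and the =~# test is case-sensitive and never matches lines shorter than 25 characters:
:g/^/if getline('.')[24] =~# '[HKZ]' | delete | endif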

Related

How to combine and negate these two patterns together?

In Vim, I want to delete any lines that are not 2 or 3 characters long.
:g/^..$/d
:g/^...$/d
Those delete lines of 2 or 3 characters. How can I combine the two into one command and negate it, i.e. delete every line except those with 2 or 3 characters?

You can use :v to execute a command on lines that do not match a pattern.
This requires a single pattern, though, which in your case is easy to build with the \= modifier: it makes the preceding item optional, so the third . may or may not match.
So to delete all lines with either 2 or 3 characters, you can use:
:g/^...\=$/d
And to delete all lines except those with either 2 or 3 characters:
:v/^...\=$/d
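As a quick way to check which lines survive, here is a small Python sketch of the same filter. It is an illustration in Python's re syntax, not Vim's; the function name and sample data are hypothetical.

```python
import re

# Python spelling of Vim's ^...\=$ : two characters plus an optional third.
TWO_OR_THREE = re.compile(r"..?.")


def keep_two_or_three(lines):
    """Keep only lines of length 2 or 3, like :v with the pattern above."""
    return [line for line in lines if TWO_OR_THREE.fullmatch(line)]


sample = ["", "a", "ab", "abc", "abcd"]
kept = keep_two_or_three(sample)
print(kept)
```

fullmatch is used here in place of the ^ and $ anchors, so the whole line has to be consumed by the pattern.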

The following would be my regex of choice:
:v/\v^.{2,3}$/d
Explanation of the pattern (\v turns on "very magic" mode, so the braces need no backslashes):
^ Assert position at the beginning of the line
. Match any single character
{2,3} Between 2 and 3 times, as many times as possible, giving back as needed (greedy)
$ Assert position at the very end of the line

How about "delete all lines with fewer than two or more than three characters"?
:g/^.\{,1}$\|^.\{4,}/d

Align text after n-th column in vim removing unnecessary blanks

In Vim, on a Windows machine (with no access to Unix-like commands such as column), I want to reformat this code to make it more readable:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FORMAT 99
COLUMN VALUE_2 FORMAT 99
COLUMN VALUE_3 FORMAT 999
COLUMN VALUE_4 FORMAT 999
And I want to get this, using as few commands as possible:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FORMAT 99
COLUMN VALUE_2 FORMAT 99
COLUMN VALUE_3 FORMAT 999
COLUMN VALUE_4 FORMAT 999
Note this is just an excerpt, as there are many more lines on which I must do the same.

You could use the following command:
:%s/\w\zs\s*\zeFORMAT/^I
The pattern matches the whitespace between the end of the previous word and FORMAT, and replaces it with a tab:
\w Any 'word' character
\zs Start the matching
\s* Any number of whitespace
\ze End the matching
FORMAT The actual word format
\zs and \ze restrict the substitution to the whitespace only; see :h /\zs and :h /\ze.
Note that ^I should be inserted with Ctrl+V followed by Tab.
The tabular plugin recommended by @SatoKatsura would be a good way to do it too.
You can also generalize that. Let's say you have the following file:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FOO 99
COLUMN VALUE_2 BAR 99
You could use this command:
:%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/^I
Where the pattern can be broken down like this:
^ Match the beginning of the line
\(\w*\s\)\{1} One occurrence of the pattern \w*\s i.e. one column
\w* Another column
\zs\s*\ze The whitespaces after the previous column
You can change the count in \{1} to apply the command to a later column.
EDIT: to answer @aturegano's comment, here is a way to align the column at a given position:
:%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/\=repeat(' ', 30-matchstrpos(getline('.'), submatch(0))[1])
The idea is still to match the whitespace that must be aligned; in the replacement part of the substitution we use a sub-replace-expression (see :h sub-replace-expression).
This lets us compute the replacement with an expression, which can be explained like this:
\= Evaluate the following characters as an expression
repeat(' ', XX) Replace the match with XX spaces
XX is decomposed like this:
30- 30 minus the next expression
matchstrpos()[1] Returns the byte offset at which the second argument first matches in the first one
getline('.') The current line (i.e. the one containing the match)
submatch(0) The matched string
[1] Necessary since matchstrpos() returns a list:
[matchedString, StartPosition, EndPosition]
and we are looking for the second value.
You then simply have to replace 30 with the column at which you want the next column to start.
See :h matchstrpos(), :h getline() and :h submatch()

For alignment, there are three well-known plugins:
the venerable Align - Help folks to align text, eqns, declarations, tables, etc
the modern tabular
the contender vim-easy-align

Posting an answer as requested:
:g/^COLUMN / s/.*/\=call('printf', ['%s %-30s %s %s'] + split(submatch(0)))/
Explanation:
g/^COLUMN / - apply the following command to lines matching /^COLUMN / (cf. :h :global)
\= - replace with the result of evaluating an expression, rather than with a fixed string (cf. :h s/\=)
submatch(0) - the line being matched
split(...) - split line into words
printf(...) - format the line
call(...) - we'd like to have printf('%s %-30s %s %s', list), but printf() doesn't take "real" lists as arguments, so we have to unfold the list with a call(...) (cf. :h call()).
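The split-then-format idea can be sketched in Python as follows. This is only an illustration of the same steps, not a Vim replacement; the function name, the width parameter and the pass-through rule for unexpected lines are assumptions made for the example.

```python
def align_column_line(line, width=30):
    """Rebuild a 'COLUMN name FORMAT mask' line with the name padded to `width`."""
    parts = line.split()  # like split(submatch(0)): split on runs of whitespace
    if len(parts) != 4 or parts[0] != "COLUMN":
        return line  # assumption: leave anything unexpected untouched
    keyword, name, fmt, mask = parts
    # Equivalent of printf('%s %-30s %s %s', ...): left-justify the name.
    return f"{keyword} {name:<{width}} {fmt} {mask}"


aligned = align_column_line("COLUMN   KEY_ID    FORMAT 9999999999", width=10)
print(aligned)
```

Left-justifying the second word to a fixed width is what makes the FORMAT keyword start in the same column on every line.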

Yet another solution:
:%s/ \{2,}/ /g
This solution is not perfect because the result will have an extra single space on the first line. To fix this problem:
:%s/\%>15c \{2,}/ /g
Explanation of the pattern:
\%>15c \{2,}
\%>15c Matches only after column 15
a space followed by \{2,} Matches two or more consecutive spaces
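The effect of restricting the match to positions after column 15 can be sketched in Python by leaving the first 15 characters alone and squeezing spaces only in the rest. The function name and sample line are hypothetical, and the sketch counts characters, whereas \%>15c counts bytes (the two agree for plain ASCII text).

```python
import re


def squeeze_after_column(line, column=15):
    """Collapse runs of 2+ spaces, but only past the first `column` characters."""
    head, tail = line[:column], line[column:]
    return head + re.sub(r" {2,}", " ", tail)


line = "AB  CD" + " " * 12 + "EF   GH"
squeezed = squeeze_after_column(line)
print(repr(squeezed))
```

The double space inside the first 15 characters is kept, while the runs of spaces that begin after that point are reduced to a single space.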

Remove extra commas from only 2nd and 3rd row of CSV file

I have a comma delimited file (CSV file) test.csv as shown below.
FHEAD,1,2,3,,,,,,
FDEP,2,3,,,,,,,,
FCLS,3,,,4-5,,,,,,,
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
I want to remove the empty columns only from the 2nd and 3rd rows of the file, i.e. only in the records that start with FDEP or FCLS should the empty columns (,,) be removed.
After removing the empty columns, the same file test.csv should look like this:
FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
How can I do this in Unix???

Here's one way to do it, using sed:
sed '/^F\(DEP\|CLS\),/ { s/,\{2,\}/,/g; s/,$// }' test.csv
We use an address of /^F\(DEP\|CLS\),/, i.e. the commands in the braces will only process lines matching ^F\(DEP\|CLS\),. This regex matches beginning-of-line, followed by F, followed by either DEP or CLS, followed by ,. In other words, we look for lines starting with FDEP, or FCLS,. (\| is a GNU sed extension to basic regular expressions.)
Having found such a line, we first substitute (s command) all runs (g flag, match as many times as possible) of 2 or more (\{2,\}) commas (,) in a row by a single ,. This squeezes ,,, down to a single ,.
Second, we substitute , at end-of-string by nothing. This gets rid of any trailing comma.
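The two substitutions can be checked against the sample data with a short Python sketch. It mirrors the sed logic using Python's re module; the function name is made up for illustration.

```python
import re

# Only records starting with "FDEP," or "FCLS," are touched.
TARGET = re.compile(r"^F(DEP|CLS),")


def squeeze_commas(line):
    """Squeeze runs of commas and drop a trailing comma on FDEP/FCLS lines."""
    if not TARGET.match(line):
        return line
    line = re.sub(r",{2,}", ",", line)  # like s/,\{2,\}/,/g
    return re.sub(r",$", "", line)      # like s/,$//


rows = [
    "FHEAD,1,2,3,,,,,,",
    "FDEP,2,3,,,,,,,,",
    "FCLS,3,,,4-5,,,,,,,",
    "FDETL,4,5,6,7,8,",
    "FTAIL,5,67,,,,,,",
]
cleaned = [squeeze_commas(row) for row in rows]
print("\n".join(cleaned))
```

Note that squeezing runs of commas also removes empty columns in the middle of a record, which is what the expected output for the FCLS row asks for.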

Search for a pattern in Column in a CSV and replace another pattern in the same line using sed command

I want to check whether the second column of a CSV file starts with a given pattern and, if it does, replace something else in the same line.
I wrote the following sed command to change I to N if the pattern 676 exists in the second column of the CSV below. But it also matches 676 in other columns (for example the 7th), since ,676 occurs there too. Ideally, I want only the second column to be checked for the prefix 676 (not 676 in the middle or at the end of the value, e.g. 46769777), and then change ,I, to ,N,.
sed -i '/,676/ {; s/,I,/,N,/;}' temp.csc
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,I,TTTT,I,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
Expected result:
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy

If you are not bound to sed, awk might be a better option for you. Give this a try:
awk -F, '$2 ~ /^676/ { gsub(/,I,/, ",N,") } 1' temp.csc
The condition $2 ~ /^676/ selects lines whose second column starts with (^) 676. gsub replaces every ,I, on those lines with ,N,, and the trailing 1 prints every line.
Result:
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy

This requires that 676 appear at the beginning of the second column before any changes are made:
$ sed '/^[^,]*,676/ s/,I,/,N,/g' file
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
Notes:
The regex /^[^,]*,676/ requires that 676 appear immediately after the first comma on the line. In more detail:
^ matches the beginning of the line
[^,]* matches the first column
,676 matches the first comma followed by 676
In your desired output, ,I, was replaced with ,N, every time it appeared on the line. To accomplish this, g (meaning global) was added to the substitute command.
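A field-based variant of the same check can be sketched in Python. Unlike the sed command, it splits each line on commas and replaces every field that is exactly I, so it would also change an I in the first or last field; the function name and the prefix parameter are assumptions for the example.

```python
def flag_rows(lines, prefix="676"):
    """Change I fields to N on rows whose second column starts with `prefix`."""
    result = []
    for line in lines:
        fields = line.split(",")
        if len(fields) > 1 and fields[1].startswith(prefix):
            fields = ["N" if field == "I" else field for field in fields]
        result.append(",".join(fields))
    return result


rows = [
    "6768880,55999777,S,I,TTTT,I,67677,yy",
    "6768880,676999777,S,I,TTTT,I,67677,yy",
    "6768880,46769777,S,I,TTTT,I,67677,yy",
]
flagged = flag_rows(rows)
print("\n".join(flagged))
```

Testing the second field directly avoids any accidental match on 676 in the first or seventh column.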

Keep duplicate lines notepad++

I need to remove the unique lines and keep the duplicates in my text file (I have read the articles on removing duplicate lines, but I want to do the opposite). Is there any way I could do that using regular expressions or TextFX?
E.g:
file1.txt
hello
world
hello
After operation, output should be
hello
hello
Thanks in advance

In the Replace dialogue:
Find:
^(.+)\r?\n(?!(.|\r?\n)*\1)
Replace:
*leave empty!*
Options:
Select radio button "Regular Expression"
Leave checkbox ". matches newline" unselected
Pros:
Duplicate line doesn't need to be immediately after the 1st occurrence
Cons:
If a line appears x times in your data, x-1 occurrences will be left after the replacement, not x as asked in the question.
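If running a script outside Notepad++ is acceptable, counting the lines first keeps all x occurrences. The following is a minimal Python sketch under that assumption; the function name is made up, and it compares whole lines exactly.

```python
from collections import Counter


def keep_duplicates(lines):
    """Keep every occurrence of any line that appears more than once."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] > 1]


result = keep_duplicates(["hello", "world", "hello"])
print(result)
```

Because the counts are computed over the whole input, the duplicates do not need to be adjacent and the original order is preserved.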

This finds every line that is directly followed by a repeated line (it does NOT find the last line, though):
.+\r\n(?=(.+\r\n)\1)
000000    111111 22
This matches a non-empty line (0), but only if it is followed by a non-empty line (1) that is itself followed by an identical line (2).
Note that this assumes \r\n (Windows) line endings. For a Unix text file use just \n; for a classic Mac text file, just \r.
In the Replace dialog, mark Regular expression, unmark ". matches newline", and leave "Replace with" empty.
Example:
"Zulu
Alpha
Alpha
Bravo
Charlie
Charlie
Delta
Echo
Echo
Foxtrott
"
(file ends with empty line)
-->
"Alpha
Alpha
Charlie
Charlie
Echo
Echo
Foxtrott
"
