In Vim, how to remove all lines that are duplicate somewhere

I have a file that contains lines as follows:
one one
one one
two two two
one one
three three
one one
three three
four
I want to remove all occurrences of the duplicate lines from the file and leave only the non-duplicate lines. So, in the example above, the result should be:
two two two
four
I saw this answer to a similar looking question. I tried to modify the ex one-liner as given below:
:syn clear Repeat | g/^\(.*\)\n\ze\%(.*\n\)*\1$/exe 'syn match Repeat "^' . escape(getline ('.'), '".\^$*[]') . '$"' | d
But it does not remove all occurrences of the duplicate lines; it removes only some of them.
How can I do this in Vim, or more specifically, with Ex commands?
To clarify, I am not looking for sort u.

If you have access to UNIX-style commands, you could do:
:%!sort | uniq -u
The -u option to the uniq command performs the task you require. From the uniq command's help text:
-u, --unique
only print unique lines
I should note however that this answer assumes that you don't mind that the output doesn't match any sort order that your input file might have already.

If you are on a Linux box with awk available, this line does what you need (note that awk does not guarantee the order in which the remaining lines are printed):
:%!awk '{a[$0]++}END{for(x in a)if(a[x]==1)print x}'
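The counting idea behind the awk one-liner can also be sketched in Python. This is only an illustration, not part of the original answer: the function name `lines_occurring_once` is made up here. Unlike awk's `for (x in a)` loop, which visits keys in an unspecified order, the sketch keeps the surviving lines in their original order.

```python
from collections import Counter

def lines_occurring_once(lines):
    """Return the lines that occur exactly once, in their original order."""
    counts = Counter(lines)  # first pass: count every distinct line
    return [line for line in lines if counts[line] == 1]  # second pass: filter

# The sample from the question.
sample = [
    "one one", "one one", "two two two", "one one",
    "three three", "one one", "three three", "four",
]
print(lines_occurring_once(sample))  # ['two two two', 'four']
```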

Assuming you are on a UNIX derivative, the command below should do what you want:
:sort | %!uniq -u
uniq only compares adjacent lines, so we must sort them first. Vim's built-in :sort command saves some typing: it works on the whole buffer by default, so we don't need to pass it a range, and it's a built-in command, so we don't need the !.
Then we filter the whole buffer through uniq -u.

My PatternsOnText plugin version 1.30 now has a
:DeleteAllDuplicateLinesIgnoring
command. Without any arguments, it'll work as outlined in your question.

It does not preserve the order of the remaining lines, but this seems to work:
:sort|%s/^\(.*\)\n\%(\1\n\)\+//
(This version is @Peter Rincker's idea, with a little correction from me.) On Vim 7.3, the following even shorter version works:
:sort | %s/^\(.*\n\)\1\+//
Unfortunately, due to differences between the regular-expression engines, this no longer works in Vim 7.4 (including patches 1-52).
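To see what the sort-then-substitute pattern does without worrying about Vim's regex engines, here is a rough Python equivalent. It is a sketch under the assumption that every line ends with a newline; `drop_repeated_runs` is a made-up name, and Python's `re` syntax differs from Vim's (`(?:...)` instead of `\%(...\)`).

```python
import re

def drop_repeated_runs(lines):
    """Sort the lines, then delete every run of two or more identical lines."""
    text = "".join(line + "\n" for line in sorted(lines))
    # ^(.*)\n captures one whole line; (?:\1\n)+ demands at least one more copy
    # of it directly below, so lines that occur once never match.
    text = re.sub(r"^(.*)\n(?:\1\n)+", "", text, flags=re.MULTILINE)
    return text.splitlines()

sample = [
    "one one", "one one", "two two two", "one one",
    "three three", "one one", "three three", "four",
]
print(drop_repeated_runs(sample))  # ['four', 'two two two']
```

As with the Vim command, the result comes out in sorted order rather than in the original order.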

Taking the code from here and modifying it to delete the lines instead of highlighting them, you'll get this:
function! DeleteDuplicateLines() range
  " First pass: count how often each non-empty line occurs in the range.
  let lineCounts = {}
  for lineText in getline(a:firstline, a:lastline)
    if lineText != ""
      let lineCounts[lineText] = get(lineCounts, lineText, 0) + 1
    endif
  endfor
  " Second pass: delete from the bottom up, so that the numbers of the
  " lines still to be checked are not shifted by earlier deletions.
  let lineNum = a:lastline
  while lineNum >= a:firstline
    let lineText = getline(lineNum)
    if lineText != "" && lineCounts[lineText] > 1
      execute lineNum . 'delete _'
    endif
    let lineNum -= 1
  endwhile
endfunction
command! -range=% DeleteDuplicateLines <line1>,<line2>call DeleteDuplicateLines()

This is not any simpler than @Ingo Karkat's answer, but it is a little more flexible. Like that answer, this leaves the remaining lines in the original order.
function! RepeatedLines(...)
  let first = a:0 ? a:1 : 1
  let last = (a:0 > 1) ? a:2 : line('$')
  let lines = []
  for line in range(first, last - 1)
    if index(lines, line) != -1
      continue
    endif
    let newlines = []
    " Escape the backslash and the / delimiter; \^ and \$ anchor the whole line.
    let text = escape(getline(line), '\/')
    execute 'silent' (line + 1) ',' last
          \ 'g/\V\^' . text . '\$/call add(newlines, line("."))'
    if !empty(newlines)
      call add(lines, line)
      call extend(lines, newlines)
    endif
  endfor
  " Sort numerically; by default sort() compares the string representations.
  return sort(lines, 'n')
endfun
:for x in reverse(RepeatedLines()) | execute x 'd' | endfor
A few notes:
- My function accepts arguments instead of handling a range. It defaults to the entire buffer.
- This illustrates some of the functions for manipulating lists. :help list-functions
- I use /\V (very nomagic), so the only characters I need to escape in the search pattern are the backslash itself and the / delimiter. :help /\V

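1. Add line numbers so that you can restore the order after sorting:
:%s/^/\=printf("%d ", line("."))/
2. Sort on the text that follows the line number:
:sort /^\d\+ /
3. Remove the duplicate lines:
:%s/^\(\d\+ \)\(.*\)\n\(\d\+ \2\n\)\+//
4. Restore the original order (numeric sort):
:sort n
5. Remove the line numbers added in step 1:
:%s/^\d\+ //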
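The five steps above follow a decorate, sort, filter, restore pattern. A small Python sketch of the same sequence may make the data flow easier to follow; it is illustrative only, and the function name `remove_duplicated_keep_order` is invented for this example.

```python
from itertools import groupby

def remove_duplicated_keep_order(lines):
    # Step 1: attach the original line number to every line.
    numbered = list(enumerate(lines, start=1))
    # Step 2: sort by the text, so identical lines become adjacent.
    numbered.sort(key=lambda pair: pair[1])
    # Step 3: keep only the groups of identical text that have a single member.
    survivors = []
    for _, group in groupby(numbered, key=lambda pair: pair[1]):
        members = list(group)
        if len(members) == 1:
            survivors.extend(members)
    # Step 4: restore the original order by sorting on the number (numerically).
    survivors.sort(key=lambda pair: pair[0])
    # Step 5: strip the line numbers again.
    return [text for _, text in survivors]

sample = [
    "one one", "one one", "two two two", "one one",
    "three three", "one one", "three three", "four",
]
print(remove_duplicated_keep_order(sample))  # ['two two two', 'four']
```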

Use Perl; it can do this easily:
use strict; use warnings; use diagnostics;
# read the input file
open(File1, '<input.txt') or die "can not open file:$!\n";
my @data1 = <File1>;
close(File1);
# save each row and the number of times it occurs in a hash
my %rownum;
foreach my $line1 (@data1)
{
    if (exists($rownum{$line1}))
    {
        $rownum{$line1}++;
    }
    else
    {
        $rownum{$line1} = 1;
    }
}
# if a row occurs exactly once, print it
open(File2, '>output.txt') or die "can not open file:$!\n";
foreach my $line1 (@data1)
{
    if ($rownum{$line1} == 1)
    {
        print File2 $line1;
    }
}
close(File2);

Related

prevblank() or startpara() function?

For the purposes of fixing indent/rst.vim, I need a VimL function that returns the line number of the first line of a paragraph (a set of lines separated by blank lines). Does anybody have something like that written?

I guess you want line("'{"). It returns the number of (blank) line before the previous paragraph (or 1 if the paragraph is at the beginning of a file). See :h '{ and :h {.
Update: a "complete" version would be:
function! StartPara()
  let l:lnum = line("'{")
  return l:lnum > 1 ? l:lnum + 1 : 1 + empty(getline(1))
  " or a shorter but slightly less efficient version:
  " return l:lnum + empty(getline(l:lnum))
endfunction
Note that a line containing only spaces is counted as a "paragraph" line, not as a "separator" line. Thus we don't need to match a regex.
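For readers who want to check the paragraph rule outside Vim, here is a minimal Python sketch of the same idea. It assumes that `lnum` is 1-based and points at a non-empty line, and that only completely empty lines separate paragraphs; `paragraph_start` is a hypothetical helper, not a Vim function.

```python
def paragraph_start(lines, lnum):
    """Return the 1-based number of the first line of the paragraph containing lnum.

    Only completely empty lines act as separators; a line made of spaces
    counts as paragraph text.
    """
    # Walk upwards for as long as the line above is not empty.
    while lnum > 1 and lines[lnum - 2] != "":
        lnum -= 1
    return lnum

text = ["a", "b", "", "c", " ", "d"]
print(paragraph_start(text, 6))  # 4, because the line of spaces does not separate
```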

I got an even better answer on Reddit:
" user-defined function names must start with an uppercase letter
function! GetParagraphStart()
  let paragraph_mark_start = getpos("'{")[1]
  return getline(paragraph_mark_start) =~ '\S' ? paragraph_mark_start : paragraph_mark_start + 1
endfunction
which is similar to what Matt suggested, but more complete.

Passing line numbers from external command to run macro

I've seen a couple of questions about passing line numbers from Vim to an external command, but I want to do the opposite. I want to run a file through jshint and then apply corrections to each line number based on the jshint output.
For example, I'm trying to append a semicolon on each line that is missing one. Right now I'm shelling out to jshint and parsing the output but I'm not sure how I can use that to run a macro on multiple lines.
My current thought is to:
1. call jshint and parse out the line numbers for "Missing semicolon" errors
2. iterate through the line numbers
3. for each line number, run <LINE_NUMBER>GA;
Here is what I have so far for parsing the jshint output:
:r ! jshint % | grep 'Missing semicolon' | awk '{ print $3 }' | sed 's/,//'
Is there a convenient way for me to do something like xargs in Vim or to parse the output of the external command into an array that I can loop over?

Well, let's see. You might try using errorformat:
let lines = split(system('jshint --verbose ' . shellescape(expand('%', 1))), "\n", 1)
let &errorformat = '%f: line %l\, col %v\, %m'
cgetexpr lines
for line in uniq(sort(map(filter(getqflist(), 'v:val["valid"] && v:val["text"] =~# "\\m^Missing semicolon"'), 'v:val["lnum"]')))
  execute line . 's/$/;/'
endfor
Not what I'd call "convenient", but what do I know.
Then it might occur to you that the missing semicolons might not always be at end of lines. So you'd modify the code like this:
function! Cmp(a, b)
  return a:a[0] == a:b[0] ? a:b[1] - a:a[1] : a:b[0] - a:a[0]
endfunction
let lines = split(system('jshint --verbose ' . shellescape(expand('%', 1))), "\n", 1)
let &errorformat = '%f: line %l\, col %v\, %m'
cgetexpr lines
for p in uniq(sort(map(filter(getqflist(), 'v:val["valid"] && v:val["text"] =~# "\\m^Missing semicolon"'),
      \ '[str2nr(v:val["lnum"]), str2nr(v:val["col"])]'), 'Cmp'))
  let line = getline(p[0])
  call setline(p[0], line[ : p[1]-2] . ';' . line[p[1]-1 :])
endfor
Then it may occur to you that this doesn't handle the case of tabs. That's a problem because by default JSHint's idea of a tab is tab stop = 4, while Vim's is tab stop = 8. Then you... might fix that as an exercise, or you might come to your senses and use a real JavaScript parser to fix this instead of Vim. :)
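For comparison, the parse-then-patch approach can be sketched outside Vim in Python. Everything here is an assumption made for illustration: the report format is inferred from the errorformat string above (`<file>: line <l>, col <c>, <message>`), the sample report is made up, and the semicolon is inserted before the reported 1-based column, as in the setline() call above. Like the Vim version, it ignores the tab-width problem.

```python
import re

# Assumed report format, inferred from the errorformat above.
REPORT_LINE = re.compile(r"^.*: line (\d+), col (\d+), (.*)$")

def missing_semicolon_positions(report):
    """Return the unique (line, col) pairs reported as 'Missing semicolon'."""
    positions = set()
    for entry in report.splitlines():
        match = REPORT_LINE.match(entry)
        if match and match.group(3).startswith("Missing semicolon"):
            positions.add((int(match.group(1)), int(match.group(2))))
    return positions

def insert_semicolons(source_lines, positions):
    """Insert ';' before the 1-based column of every position."""
    fixed = list(source_lines)
    # Work from the last position to the first, so earlier columns stay valid.
    for lnum, col in sorted(positions, reverse=True):
        line = fixed[lnum - 1]
        fixed[lnum - 1] = line[:col - 1] + ";" + line[col - 1:]
    return fixed

# Made-up sample data.
report = (
    "app.js: line 1, col 10, Missing semicolon.\n"
    "app.js: line 2, col 10, Missing semicolon.\n"
    "app.js: line 2, col 1, Some other warning.\n"
)
source = ["var a = 1", "var b = 2"]
print(insert_semicolons(source, missing_semicolon_positions(report)))
```

Processing the positions in descending order plays the same role as the Cmp() function above: an insertion never shifts a column that still has to be patched.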

iterate through regex results vimscript

In vimscript, how can I iterate through all the matches of a regex in the current file and then run a shell command for each result?
I think this is a start, but I can't figure out how to feed it the whole file and get each match.
while search(ENTIRE_FILE, ".*{{\zs.*\ze}}", 'nw') > 0
system(do something with THIS_MATCH)
endwhile

Assuming that we have a file with the content:
123 a shouldmatch
456 b shouldmatch
111 c notmatch
And we like to match
123 a shouldmatch
456 b shouldmatch
with the regex
.*shouldmatch
If you only have one match per line you can use readfile() and afterwards loop through the lines and check each line with matchstr(). [1]
function! Test001()
  let file = readfile(expand("%:p")) " read current file
  for line in file
    let match = matchstr(line, '.*shouldmatch') " regex match
    if(!empty(match))
      echo match
      " your command with match
    endif
  endfor
endfunction
You can put this function in your ~/.vimrc and call it with call Test001().
[1] http://vimdoc.sourceforge.net/htmldoc/eval.html#matchstr%28%29

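You could use substitute() instead. For instance (dosomething stands for your shell command; readfile() returns a list, so it is joined into a single string first):
call substitute(join(readfile(expand('%')), "\n"), '.*{{\zs.*\ze}}',
\ '\=system("dosomething " . shellescape(submatch(0)))', 'g')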
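If the per-match command does not have to run inside Vim, the loop from the question can be sketched in Python. This is a hedged illustration: `template_matches` and `run_for_each` are names invented here, and unlike the greedy `.*{{\zs.*\ze}}` pattern, the regex below is non-greedy and returns every `{{...}}` pair on a line.

```python
import re
import subprocess

def template_matches(text):
    """Return the text between each {{ and }} pair, matched non-greedily."""
    return re.findall(r"\{\{(.*?)\}\}", text)

def run_for_each(matches, command):
    """Run `command` (a list of arguments) once per match and collect its output."""
    outputs = []
    for match in matches:
        # Passing a list instead of a shell string avoids quoting problems.
        result = subprocess.run(command + [match], capture_output=True, text=True)
        outputs.append(result.stdout)
    return outputs

print(template_matches("a {{x}} b {{y z}}\n{{w}}"))  # ['x', 'y z', 'w']
```

A call such as `run_for_each(template_matches(text), ["echo"])` would then invoke the command once per match.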

Vim: Column-wise increment inside and outside?

By "outside", I mean solutions that do not use Vim's scripting hacks but instead reuse basic *nix tools. By "inside", I mean solutions that do the column increment within Vim itself, for example with scripting.
1 1
1 2
1 3
1 ---> 4
1 5
1 6
. .
. .
Vim has a script that does column-wise incrementing, VisIncr. It has gathered roughly equal numbers of up and down votes, perhaps because it feels a bit like reinventing the wheel. How do you column-increment stuff in Vim without using such a script? The other question is, how do you column-increment stuff without (outside) Vim?
Most elegant, reusable and preferably-small wins the race!

I don't see a need for a script, a simple macro would do
"a yyp^Ayy
then play it, or map to play it.
Of course, there is always the possibility that I misunderstood the question entirely...

The optimal choice of technique depends heavily on the actual circumstances
of the transformation. There are at least two points of variation affecting
the implementation:
1. Are the lines to operate on the only ones in the file? If not,
   is the range of lines defined by context (e.g., is it separated by blank
   lines, like a paragraph), or is it arbitrary and should it be specified by
   the user?
2. Do those lines already contain numbers that should be changed, or is
   it necessary to insert new ones, leaving the text on the lines intact?
Since there is no information to answer these questions, below we will try to
construct a flexible solution.
A general solution is a substitution operating on the beginnings of the lines
in the range specified by the user. Visual mode is probably the simplest way
of selecting an arbitrary range of lines, so we assume here that boundaries of
the range are defined by the visual selection.
:'<,'>s/^\d\+/\=line(".")-line("'<")+1/
If it is necessary to number every line in a buffer, the command can be
simplified as follows.
:%s/^\d\+/\=line('.')/
In any case, if the number should be merely inserted at the beginnings of the
lines (without modifying the ones that already exist), one can change the
pattern from ^\d\+ to ^, and optionally add a separator:
:'<,'>s/^/\=(line(".")-line("'<")+1).' '/
or
:%s/^/\=line('.').' '/
respectively.
For a solution based on command-line tools, one can consider using stream
editors like Sed or text extraction and reporting tools like AWK.
To number each of the lines in a file using Sed, run the commands
$ sed = filename | sed 'N;s/\n/ /'
In order to do the same in AWK, use the command
$ awk '{print NR " " $0}' filename
which could be easily modified to limit numbering to a particular range of lines
satisfying a certain condition. For example, the following command numbers the
lines two through eight.
$ awk '{print (2<=NR && NR<=8 ? ++n " " : "") $0}' filename
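The same renumbering can be sketched in Python for readers who prefer it to Sed or AWK. This is an illustrative sketch only: `renumber` is a made-up name, and, like the Vim substitution above, the counter follows the line position within the range, so a line without a leading number still uses up a number.

```python
import re

def renumber(lines, first=1, last=None):
    """Replace the leading number on lines first..last (1-based) with 1, 2, 3, ..."""
    last = len(lines) if last is None else last
    result = []
    counter = 0
    for lnum, line in enumerate(lines, start=1):
        if first <= lnum <= last:
            counter += 1
            # Only a number at the very start of the line is replaced.
            line = re.sub(r"^\d+", str(counter), line)
        result.append(line)
    return result

print(renumber(["1 a", "1 b", "1 c"]))  # ['1 a', '2 b', '3 c']
```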
For those interested in how commands similar to those from the script linked in
the question statement are implemented, the following command can serve as
a reference.
vnoremap <leader>i :call EnumVisualBlock()<cr>
function! EnumVisualBlock() range
  if visualmode() != "\<c-v>"
    return
  endif
  let [l, r] = [virtcol("'<"), virtcol("'>")]
  let [l, r] = [min([l, r]), max([l, r])]
  let start = matchstr(getline("'<"), '^\d\+', col("'<")-1)
  let off = start - line("'<")
  let w = max(map([start, line("'>") + off], 'len("".v:val)'))
  exe "'<,'>" 's/\%'.l.'v.*\%<'.(r+1).'v./'.
        \ '\=printf("%'.w.'d",line(".")+off).repeat(" ",r-l+1-w)'
endfunction

If you want to change 1 1 1 1 ... to 1 2 3 4 ... (with those numbers on different lines):
:let i=1 | g/1/s//\=i/g | let i+=1
If some of the 1s in 1 1 1 1 ... are on the same line:
:let g:i = 0
:func! Inc()
: let g:i+=1
: return g:i
:endfun
:%s/1/\=Inc()/g

longest line in vim?

Is there a command to determine length of a longest line in vim? And to append that length at the beginning of the file?

GNU's wc command has a -L (--max-line-length) option, which prints the maximum line length of the file; see the GNU wc man page. The FreeBSD wc also has -L, but not --max-line-length; see the FreeBSD wc man page.
How to use these from vim? The command:
:%!wc -L
filters the buffer through wc -L and replaces the buffer's contents with the maximum line length.
To retain the file contents and put the maximum line length on the first line do:
:%yank
:%!wc -L
:put
Instead of using wc, the question "Find length of longest line - awk bash" describes how to use awk to find the length of the longest line.
OK, now for a pure Vim solution. I'm somewhat new to scripting, but here goes. What follows is based on the FilterLongestLineLength function from textfilter.
function! PrependLongestLineLength()
  let maxlength = 0
  let linenumber = 1
  while linenumber <= line("$")
    exe ":".linenumber
    " virtcol("$") is one past the last displayed column, hence the -1
    let linelength = virtcol("$") - 1
    if maxlength < linelength
      let maxlength = linelength
    endif
    let linenumber = linenumber+1
  endwhile
  exe ':0'
  exe 'normal O'
  exe 'normal 0C'.maxlength
endfunction
command! PrependLongestLineLength call PrependLongestLineLength()
Put this code in a .vim file (or your .vimrc) and :source the file. Then use the new command:
:PrependLongestLineLength
Thanks, figuring this out was fun.

If you work with tabs expanded (that is, the file contains no literal tab characters), a simple
:0put=max(map(getline(1,'$'), 'len(v:val)'))
is enough.
Otherwise, I guess we will need the following (that you could find as the last example in :h virtcol(), minus the -1):
0put=max(map(range(1, line('$')), "virtcol([v:val, '$'])-1"))
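The difference between the two commands (raw length versus displayed width) can be illustrated with a short Python sketch. It is an approximation for illustration only: it assumes a fixed tab stop of 8 and one display column per character, which is not true for wide characters, and `longest_line_lengths` is a made-up name.

```python
def longest_line_lengths(lines, tabstop=8):
    """Return (longest raw length, longest displayed width) of the lines."""
    # Raw length: a tab counts as one character, as with len() on the line text.
    raw = max((len(line) for line in lines), default=0)
    # Displayed width: tabs are expanded to the next tab stop first.
    displayed = max((len(line.expandtabs(tabstop)) for line in lines), default=0)
    return raw, displayed

print(longest_line_lengths(["a\tb", "abcd"]))  # (4, 9)
```

In the sample, "abcd" is the longest line by raw length, but "a\tb" is the widest on screen because the tab pads it out to column 8.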

Use
:!wc -L %
which only displays the result, rather than
:%!wc -L
which replaces the buffer's contents.
To insert that length at the beginning of the file:
:0r !wc -L % | cut -d' ' -f1

Here is a simple, and hence easily remembered, approach:
1. Select all text: ggVG
2. Substitute each character (.) with "a": :'<,'>s/./a/g
3. Sort and keep only unique lines: :'<,'>sort u
4. Count the characters in the longest line (if there are too many to count easily, just look at the column position in the Vim status bar).
I applied this to examine Enzyme Commission (EC) numbers, prior to making a PostgreSQL table:
I copied the ec_numbers data to Calc, then took each column in Neovim, replaced each character with "a",
:'<,'>s/./a/g
and then sorted for unique lines
:'<,'>sort u
aaaaaaa
aaaaaaaa
aaaaaaaaa
aaaaaaaaaa
aaaaaaaaaaa
... so the longest EC number entry [x.x.x.x] is 11 char, VARCHAR(11).
Similarly applied to the Accepted Names, we get
aaaaa
aaaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
i.e. the longest name is 147 char: VARCHAR(200) should cover it!

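For Neovim users:
local bufnr = 0 -- assumption: 0 means the current buffer (bufnr was not defined in the original snippet)
local lines = vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)
local width = #(lines[1])
for _, line in ipairs(lines) do
  if #line > width then
    width = #line
  end
end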
