Removing duplicate rows in vi? - vim

I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates. I am interested in doing this from within vi/vim, if possible.

If you're OK with sorting your file, you can use:
:sort u

Try this:
:%s/^\(.*\)\(\n\1\)\+$/\1/
It searches for any line immediately followed by one or more copies of itself, and replaces it with a single copy.
Make a copy of your file though before you try it. It's untested.
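To see what that substitution is meant to do without touching a buffer, here is a minimal shell sketch of the same idea: each run of identical adjacent lines collapses to one copy. It is an awk analogue rather than the Vim regex itself, and the function name collapse_adjacent is made up for illustration.

```shell
# collapse_adjacent: hypothetical helper, not part of any tool.
# Prints a line only when it differs from the line directly before it.
collapse_adjacent() {
  awk 'NR == 1 || $0 != prev { print } { prev = $0 }' "$@"
}

# Runs of "a" collapse, but the later run of "a" is kept because it is not
# adjacent to the first one.
printf 'a\na\nb\na\na\na\n' | collapse_adjacent
```

Like the substitution, this leaves non-adjacent duplicates alone, which is why the other answers sort first.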

From command line just do:
sort file | uniq > file.new

awk '!x[$0]++' yourfile.txt if you want to preserve the order (i.e., sorting is not acceptable). In order to invoke it from vim, :! can be used.
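A short sketch of why that awk one-liner keeps the first occurrence of each line and preserves order. The wrapper name dedupe_keep_order and the array name seen are illustrative choices, not anything awk requires.

```shell
# dedupe_keep_order: hypothetical wrapper around the awk idiom.
dedupe_keep_order() {
  # seen[$0] is 0 (false) the first time a line appears, so !seen[$0] is
  # true and awk prints the line; the ++ then marks it as seen, so later
  # copies evaluate to false and are skipped.
  awk '!seen[$0]++' "$@"
}

# The first "b" and the first "a" stay in their original positions.
printf 'b\na\nb\nc\na\n' | dedupe_keep_order
```

When running this as a filter from inside Vim, keep in mind that a bare ! in a :! command is replaced by the previous shell command, so it likely needs a backslash, as in :%!awk '\!x[$0]++'.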

I would combine two of the answers above:
go to head of file
sort the whole file
remove duplicate entries with uniq
1G
!Gsort
1G
!Guniq
If you were interested in seeing how many duplicate lines were removed, use control-G before and after to check on the number of lines present in your buffer.

g/^\(.*\)$\n\1$/d
Works for me on Windows. Lines must be sorted first though.

Select the lines in visual-line mode (Shift+v), then :!uniq. That'll only catch duplicates which come one after another.

If you don't want to sort/uniq the entire file, you can select the lines you want to make uniq in visual mode and then simply: :sort u.

Regarding how Uniq can be implemented in VimL, search for Uniq in a plugin I'm maintaining. You'll see various ways to implement it that were given on Vim mailing-list.
Otherwise, :sort u is indeed the way to go.

:%s/^\(.*\)\(\n\1\)\+$/\1/gec
or
:%s/^\(.*\)\(\n\1\)\+$/\1/ge
This is my answer: it collapses runs of duplicate lines and keeps one copy of each instead of removing them all.

I would use !}uniq, but that only works if there are no blank lines.
For every line in a file use: :1,$!uniq.

This version removes only repeated lines that are contiguous, that is, it deletes only consecutive repeated lines. With the given map, the function does not touch blank lines. But if you change the regex to match the start of line (^), it will also remove duplicated blank lines.
" function to delete duplicate lines
function! DelDuplicatedLines()
while getline(".") == getline(line(".") - 1)
exec 'norm! ddk'
endwhile
while getline(".") == getline(line(".") + 1)
exec 'norm! dd'
endwhile
endfunction
nnoremap <Leader>d :g/./call DelDuplicatedLines()<CR>

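An alternative method that does not use vi/vim (for very large files) is to use sort and uniq from the Linux command line:
sort {file-name} | uniq -u
Note that uniq -u prints only the lines that occur exactly once, so every copy of a duplicated line is dropped. To keep one copy of each line, use plain uniq (or sort -u) instead.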
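A small sketch comparing the uniq variants on made-up sample data; the sample function exists only for this illustration.

```shell
# Sample data: "a" appears twice, "b" and "c" once each.
sample() { printf 'b\na\nc\na\n'; }

sample | sort | uniq      # one copy of every line: a, b, c
sample | sort | uniq -u   # only lines that were never repeated: b, c
sample | sort -u          # same result as sort | uniq
```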

This worked for me for both .csv and .txt
awk '!seen[$0]++' <filename> > <newFileName>
Explanation:
The first part of the command, awk '!seen[$0]++' <filename>, prints the unique rows. The second part, after the redirection arrow (>), saves that output to <newFileName>.

Related

Vim - sort the contents of a register before/after pasting it?

As part of a project of mine I'm trying to move certain lines from a file to the top, sorted in a certain fashion. I'm not sure how to do the sort once those lines are up there - I don't want to disturb the other lines in the file.
I'm moving them by yanking them and putting them back down, like so:
:g/pattern/yank A
:g/pattern/d
0put A
This moves all the lines I specify up to the top of the file like I need, but now I need to sort them according to a pattern, like so:
[range]sort r /pattern2/
Is there a way to sort the contents of a register before pasting it? Or a way to sort only lines which match /pattern/? (because all the yanked lines will, of course).
I'm stymied and help would be appreciated.
edit - a possible workaround might be to count the number of lines before they're yanked, and then use that to select and sort those lines once they're placed again. I'm not sure how to count those lines - I can print the number of lines that match a pattern with the command :%s/pattern//n but I can't do anything with that number, or use that in a function.
The whole point of :g/pattern/cmd is to execute cmd on every line matching pattern. cmd can, of course, be :sort.
In the same way you did:
:g/pattern/yank A
to append every line matching pattern to register a and:
:g/pattern/d
to cut every line matching pattern, you can do:
:g/pattern/sort r /pattern2/
to sort every line matching pattern on pattern2.
Your example is wasteful anyway. Instead of abusing registers with three commands you could simply do:
:g/pattern/m0
to move every line matching pattern to the top of the buffer before sorting them with:
:g//sort r /pattern2/
See :help :global, :help :sort, :help :move.
I know this is old, and may not be of any use to you anymore, but I just figured this one out today. It relies on the system's sort command (not vim's). Assuming you're saving to register A:
qaq
:g/pattern/yank A
<C-O>
:put=system('sort --stable --key=2,3',@A)
qaq: clears register A of anything
:g/pattern/yank A: searches current buffer for pattern and copies it to register A
<C-O>: pressing Ctrl+O in normal mode returns you to the last place your cursor was
:put=system('sort --stable --key=2,3',@A): sends the contents of register A to the sort command's STDIN and pastes the output to the current position of the cursor.
I mapped this whole thing to <F8>:
noremap <F8> qaq:g/pattern/yank A<CR><C-O>:put=system('sort --stable --key=2,3',@A)<CR>
I don't know how janky this is considered, cuz I'm a complete noob to vim. I spent hours today trying to figure this out. It works for me and I'm happy with it, hopefully it'll help someone else too.

remove almost-duplicates containing substring of next line

I need to know a way to remove duplicate strings in line, but let me explain, cause I have already used uniq. In a file, I get these two lines:
ANASI:A=4-63261950;
ANASI:A=4-63261950,ES=541;
The string 4-63261950 is duplicated in both lines, but the line itself is different, only that string is equal in both lines. I just need a way to process the entire file and remove the first line and leave only the one with the ANASI:A=4-63261950,ES=541;. The file will contain several lines with this exact same scenario. Is there a way to do this with sed or something?
awk to the rescue...
assuming your delimiters and structure stays the same
sort file | awk -F"[;,]" '!a[$1]++'
will pick the first one based on lexical order (, < ;)
If file is huge (and memory a problem or issue)
sort YourFile | awk -F '[;,]' 'Last != $1{print}{Last = $1}'
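To check both pipelines against the two sample lines from the question, a sketch along these lines may help. LC_ALL=C is an assumption added here so that ',' sorts before ';' by byte value; other locales may order punctuation differently. The sample function is only for illustration.

```shell
# The two sample lines from the question.
sample() {
  printf '%s\n' 'ANASI:A=4-63261950;' 'ANASI:A=4-63261950,ES=541;'
}

# Array version: remembers every key ($1) seen so far.
sample | LC_ALL=C sort | awk -F'[;,]' '!a[$1]++'

# Low-memory version: remembers only the key of the previous line,
# which is enough because the input is sorted.
sample | LC_ALL=C sort | awk -F'[;,]' 'Last != $1 { print } { Last = $1 }'
```

With this input, each pipeline keeps only the longer line, ANASI:A=4-63261950,ES=541;, because it sorts first and the shorter line shares its first field.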
This might work for you (GNU sed):
sed -r 'N;/^(.*);\n\1,/!P;D' file
This uses a moving window to compare successive pairs of lines to print the required match.

sort rows in 'VI' editor

I have to sort the following rows by the values to the left of '='. The sort should carry the column after '=' along with each row; that is, the column after '=' does not have to be sorted separately:
50599=1000000
50454=00000054
50080=00005464
50098=00000875
50661=00000665
50788=10000035
50988=10000006
50994=10000656
57009=00000005
57022=10000008
57040=10000005
57000=10000005
57060=10000089
57067=10005640
57102=00000765
57190=00000867
This needs to be done in 'VI' editing the file.
RESULT should be ::
50080=00005464
50098=00000875 ...etc.
Try:
:%!sort
It will sort by the whole line alphabetically. If you want to sort numerically (i.e. the number in the first column can have different widths), then try:
:%!sort -n
Don't worry about the =, it will not modify any line, it will just change their order.
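The difference only shows up when the keys have different widths, so here is a sketch with made-up key=value lines. LC_ALL=C is an assumption added to make the plain sort order predictable.

```shell
# Keys of different widths; the values after '=' just travel with their line.
sample() { printf '%s\n' '100=x' '9=y' '50=z'; }

sample | LC_ALL=C sort      # character order: 100=x, 50=z, 9=y
sample | LC_ALL=C sort -n   # numeric order on the leading number: 9=y, 50=z, 100=x
```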
This answer is coming 2 years late, but might still be relevant, in visual mode select the block you want to sort and run:
:!sort
You can do the following to see the sorted output:
:!sort %
Explanation:
: : enters ex mode.
! : allows you to run a shell command.
% : the name of the file currently open.
To sort the file by changing it you can redirect its output to a temp file and then copy its content back to the original file:
:!(sort %>/tmp/tmp;cp -f /tmp/tmp %)
I'm not sure exactly when in the last eight years vi built this in, but you can now run:
:sort n
to sort numerical entries instead of using :! to run the sort command. See :help sort

How to add line numbers to range of lines in Vim?

How can I add line numbers to a range of lines in a file opened in Vim? Not as in :set nu—this just displays line numbers—but actually have them be prepended to each line in the file?
With
:%s/^/\=line('.')/
EDIT: to sum up the comments.
This command can be tweaked as much as you want.
Let's say you want to add numbers in front of lines from a visual selection (V + move), and you want the numbering to start at 42.
:'<,'>s/^/\=(line('.')-line("'<")+42)/
If you want to add a string between the number and the old text from the line, just concatenate (with . in VimL) it to the number-expression:
:'<,'>s/^/\=(line('.')-line("'<")+42).' --> '/
If you need this to sort as text, you may want to zero pad the results, which can be done using printf for 0001, 0002 ... instead of 1, 2... eg:
:%s/^/\=printf('%04d', line('.'))/
Anyway, if you want more information, just open vim help: :h :s and follow the links (|subreplace-special|, ..., |submatch()|)
cat -n adds line numbers to its input. You can pipe the current file to cat -n and replace the current buffer with what it prints to stdout. Fortunately this convoluted solution is less than 10 characters in vim:
:%!cat -n
Or, if you want just a subselection, visually select the area, and type this:
:!cat -n
That will automatically put the visual selection markers in, and will look like this after you've typed it:
:'<,'>!cat -n
In order to erase the line numbers, I recommend using control-v, which will allow you to visually select a rectangle, you can then delete that rectangle with x.
On a GNU system: with the external nl binary:
:%!nl
In a Unix-like environment, you can use cat or awk to generate line numbers easily. Because Vim has a friendly interface to the shell, everything works in Vim as well as it does in the shell.
From Vim Tip28:
:%!cat -n
or
:%!awk '{print NR,$0}'
But if you use Vim on MS-DOS, Win9x, or Win2000, you lose this toolkit.
Here is a very simple way to achieve this using only Vim:
fu! LineIt()
exe ":s/^/".line(".")."/"
endf
Or, a sequence composed with alphabet is as easy as above:
exe "s/^/".nr2char(line("."))."/"
You can also use a subst:
:g/^/exe ":s/^/".line(".")."^I/"
You can also only want to print the lines without adding them to the file:
"Sometimes it could be useful especially be editing large source files to print the line numbers out on paper.
To do so you can use the option :set printoptions=number:y to activate and :set printoptions=number:n to deactivate this feature.
If the line number should be printed always, place the line set printoptions=number:y in the vimrc."
First, you can remove the existing line numbers if you need to:
:%s/^[0-9]*//
Then, you can add line numbers. NR refers to the current line number starting at one, so you can do some math on it to get the numbering you want. The following command gives you four digit line numbers:
:%!awk '{print 1000+NR*10,$0}'
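Outside Vim, the same awk expression can be tried on a few made-up lines to see the numbering it produces. The function name number_lines is hypothetical.

```shell
# number_lines: hypothetical wrapper around the awk command above.
# NR is the 1-based record number, so the prefixes run 1010, 1020, 1030, ...
number_lines() {
  awk '{ print 1000 + NR * 10, $0 }' "$@"
}

printf 'alpha\nbeta\ngamma\n' | number_lines
```

The comma in print inserts awk's output field separator (a single space by default) between the number and the original line.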
The "VisIncr" plugin is good for inserting columns of incrementing numbers in general (or letters, dates, roman numerals etc.). You can control the number format, padding, and so on. So insert a "1" in front of every line (via :s or :g or visual-block insert), highlight that column in visual-block mode, and run one of the commands from the plugin.
If someone wants to put a tab (or some spaces) after the line numbers inserted by this excellent answer, here's a way. From normal mode, do:
:%s/^/\=line('.').'    '/
^ means beginning of a line and %s is the directive for substitution. So, we say that put a line number at the beginning of each line and add 4 spaces to it and then put whatever was the contents of the line before the substitution, and do this for all lines in the file.
This will automatically substitute it. Alternatively, if you want the command to ask for confirmation from you, then do:
:%s/^/\=line('.').'    '/igc
P.S: power of vim :)
The best reply is done in a duplicate question.
In summary:
With CTRL-V, then G, I, 0 you can insert a column of zeros.
Then select the whole column and increment:
CTRL-V g CTRL-A
See also: https://vim.fandom.com/wiki/Making_a_list_of_numbers#Incrementing_selected_numbers

Vim delete blank lines

What command can I run to remove blank lines in Vim?
:g/^$/d
:g will execute a command on lines which match a regex. The regex is 'blank line' and the command is :d (delete)
Found it, it's:
g/^\s*$/d
Source: Power of g at vim wikia
Brief explanation of :g
:[range]g/pattern/cmd
This acts on the specified [range] (default whole file), by executing the Ex command cmd for each line matching pattern (an Ex command is one starting with a colon such as :d for delete). Before executing cmd, "." is set to the current line.
:v/./d
or
:g/^$/d
or
:%!cat -s
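These alternatives are not all equivalent, which a quick shell sketch can show. It assumes a GNU or BSD cat, where -s squeezes runs of blank lines; grep . is used here only as a rough command-line analogue of :v/./d, and the sample function is made up.

```shell
# Three blank lines between a and b, one between b and c.
sample() { printf 'a\n\n\n\nb\n\nc\n'; }

sample | grep .    # drops every empty line: a, b, c
sample | cat -s    # squeezes each run of blank lines down to one, so
                   # a single blank line remains between the letters
```

So cat -s is the one to reach for when single blank lines should survive, and the :g or :v forms when every empty line should go.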
The following can be used to remove only multi blank lines (reduce them to a single blank line) and leaving single blank lines intact:
:g/^\_$\n\_^$/d
how to remove all the blanks lines
:%s,\n\n,^M,g
(do this multiple times until all the empty lines are gone)
how to remove all the blanks lines leaving SINGLE empty line
:%s,\n\n\n,^M^M,g
(do this multiple times)
how to remove all the blanks lines leaving TWO empty lines AT MAXIMUM,
:%s,\n\n\n\n,^M^M^M,g
(do this multiple times)
To input ^M on Windows, I have to press Ctrl-Q and then Ctrl-M.
How about:
:g/^[ \t]*$/d
This works for me
:%s/^\s*$\n//gc
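Working with Perl in Vim (the script is quoted so the shell does not strip the backslash, and -i is dropped because the filter reads from stdin):
:%!perl -pe 's/^\s*$//'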
I tried a few of the answers on this page, but a lot of them didn't work for me. Maybe because I'm using Vim on Windows 7 (don't mock, just have pity on me :p)?
Here's the easiest one that I found that works on Vim in Windows 7:
:v/\S/d
Here's a longer answer on the Vim Wikia: http://vim.wikia.com/wiki/Remove_unwanted_empty_lines
Press delete key in insert mode to remove blank lines.
This function removes only runs of two or more blank lines. Put the lines below in your vimrc, then use \d to call the function.
fun! DelBlank()
let _s=@/
let l = line(".")
let c = col(".")
:g/^\n\{2,}/d
let @/=_s
call cursor(l, c)
endfun
map <special> <leader>d :keepjumps call DelBlank()<cr>
:g/^\s*$/d
^ begin of a line
\s* at least 0 spaces and as many as possible (greedy)
$ end of a line
paste
:command -range=% DBL :<line1>,<line2>g/^\s*$/d
in your .vimrc, then restart Vim.
If you run the command :5,12DBL
it will delete all blank lines between row 5 and row 12.
I think my answer is the best answer!
If something has double linespaced your text then this command will remove the double spacing and merge pre-existing repeating blank lines into a single blank line. It uses a temporary delimiter of ^^^ at the start of a line so if this clashes with your content choose something else. Lines containing only whitespace are treated as blank.
%s/^\s*\n\n\+/^^^\r/g | g/^\s*$/d | %s/^^^^.*
This worked for me:
:%s/^[^a-zA-Z0-9]*$\n//ig
It basically deletes all the lines that don't have a number or letter. Since all the items in my list had letters, it deleted all the blank lines.

Resources