Delete occurrences that occur fewer than X times - vim

I have a file with numbers. One number per line
1234
54332
54321
32452
1234
1234
54321
I want to delete every number that doesn't appear more than 3 times.
I was thinking about sorting, then joining lines, and then deleting the ones that don't have 3 words.
I think there is a better way but I don't know enough vim to do it.
Any tips?

As I commented under your question, I would do it with awk. Of course, vim can do it too, with a custom function, for example.
You could try this line:
%!awk '{a[$0]++}END{for(x in a)if(a[x]>3)for(y=1;y<=a[x];y++)print x}'
Note that your example is not ideal, because no line in it appears more than 3 times. If you add another 1234 line, the result of the above command would be:
1234
1234
1234
1234
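One thing to keep in mind: awk's for (x in a) visits keys in an unspecified order, so the one-liner above groups the surviving lines by value rather than keeping them where they were. If the original order matters, a minimal sketch along the following lines may help. It assumes a POSIX awk; the function name keep_frequent and its threshold argument are made up for illustration.

```shell
# Hypothetical helper: print, in original order, every input line whose
# content occurs more than $1 times in the whole input.
keep_frequent() {
  awk -v min="$1" '
    { count[$0]++; line[NR] = $0 }   # tally each value and remember its position
    END {
      for (i = 1; i <= NR; i++)      # replay the lines in input order
        if (count[line[i]] > min) print line[i]
    }'
}

# Example: the sample file with one extra 1234 line, threshold 3
printf '%s\n' 1234 54332 54321 32452 1234 1234 54321 1234 | keep_frequent 3
```

The whole file is held in memory until the END block runs, which is usually fine for a buffer that is already open in an editor.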

Related

How can you create a vim macro that reorders lines?

Let's start off with the text
1 The
2 Quick
3 Brown
4 Fox
5 Jumps
6 Over
7 The
8 Lazy
9 Dog
Let's then say that you want to make the first line the last line repeatedly with a macro. That is, this is the goal state after 1 run
1 Quick
2 Brown
3 Fox
4 Jumps
5 Over
6 The
7 Lazy
8 Dog
9 The
Use case : I want to apply a longer macro with the word The the first time, Quick the 2nd time, etc.
The naive approach works exactly once :
q11Gdd8Gpq
@1 <- This works
@1 <- This breaks
This breaks when repeated. I've tried other approaches which avoid dd (e.g. making a new line above the 1st line, d1j, returning to the 8th line, paste, J to join lines). Everything I try works when run once, but something is changing the macro buffer during this run.
How do you make a macro that does this that can be run multiple times?
This page has the answer, https://vim.fandom.com/wiki/Moving_lines_up_or_down
Outside my specific application (thanks @Amadan in the comments) this is
q1:1m$<cr>q
For me, where I am rotating items in a list with contents after the list, the entire solution ended up being
q1:1m8<cr>q
However for the problem as stated, $ rather than a line number is correct.
This situation does not require a macro: the common idiom is
:global/^/move $

how to print two strings in a line one with space delimiter and another between two strings in Linux

I have a file with more than 100 lines.
But only some lines have specific pattern like abc.
My question is that I want two things to print
5th word of line which has pattern abc.
words between 2 distinct strings (xxx, yyy).
Say for example my file has the content below:
This is first line.
Second line has abc pattern with xxx as first separator and yyy as second separator.
This is third line.
Again fourth line has same pattern abc with separators xxx and yyy.
And so on.
The required output is like below:
pattern as first separator and
same and
I tried many ways in Linux but if I was able to print 5th word then content between xxx and yyy I was not able to print and vice versa.
Can any one help me please?
Let me answer your question:
My question is that I want two things to print
5th word of line which has pattern abc.
words between 2 distinct strings (xxx, yyy).
You can use awk for both parts of your question:
awk '/abc/{print $5}' input_file.txt
awk '/xxx.*yyy/{if(match($0,"xxx.*yyy")){print substr($0,RSTART,RLENGTH)}}' input_file.txt
If you need to combine both requirements in one command:
awk '/abc/{print $5} /xxx.*yyy/{if(match($0,"xxx.*yyy")){print substr($0,RSTART,RLENGTH)}}' input_file.txt
OUTPUT:
pattern
xxx as first separator and yyy
same
xxx and yyy
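The output above keeps the separators and puts the two results on separate lines, whereas the question asks for the 5th word and the text between xxx and yyy on one line, without the separators. A minimal sketch of that variant follows. The function name fifth_and_between is made up for illustration, and the offsets 3 and 6 assume both separators are exactly three characters long, as in the question.

```shell
# Hypothetical helper: for lines containing "abc" and an xxx ... yyy span,
# print the 5th word followed by the text strictly between xxx and yyy.
fifth_and_between() {
  awk '/abc/ && match($0, /xxx.*yyy/) {
    inner = substr($0, RSTART + 3, RLENGTH - 6)   # drop the leading xxx and trailing yyy
    gsub(/^ +| +$/, "", inner)                    # trim the surrounding spaces
    print $5, inner
  }' "$@"
}

# Example: fifth_and_between input_file.txt
```

Because .* is greedy, a line with several yyy occurrences is matched up to the last one.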

Linux - flip last bottom lines only, with awk

We are currently learning some awk commands to manipulate/rearrange text on files.
Our original text goes like this:
Ford,mondeo,1998,5800
ford,fiesta,1994,4575
Chevy,impala,1998,1000
Plymouth,Fury,1992,2000
Dodge,charger,1989,5950
Ford,F150,1991,1450
It's asking us to rearrange the columns (part one), which I already did,
but at the same time to flip the bottom two lines (part two) so that it looks like this:
1998 5800 Ford mondeo
1994 4575 ford fiesta
1998 1000 Chevy impala
1992 2000 Plymouth Fury
1991 1450 Ford F150
1989 5950 Dodge charger
Again, I got the order and spacing taken care of, but how do I switch the bottom two lines (or flip them)? By the way, this can only be done with awk.
The following code switches the last two lines of a file:
awk '{array[NR%3]=$0;
if(NR>2){print array[(NR+1)%3]}}
END{print array[NR%3];
print array[(NR-1)%3]}' test
To do so, we store the last three lines in an array. We print a line only once the line two positions after it has been read. Finally, we take the last two lines from the array, invert their order, and print them.
A more detailed description of the code:
We store the content of the current line ($0) in an array. As we only need to keep track of the last three lines read, the index of our array can be limited to the values 0, 1, and 2. We achieve this by using the modulo operator (NR % 3).
From the third line on, we print the line that was read two lines earlier (if(NR>2)...). This is safe, because in this case there are still two lines in the array that we can handle separately.
When the end of the file is reached (END{...}), we print the last two lines in inverse order.
Why did we use NR%3 (and not NR%2)? If there were only two lines in our input file, printing must happen in the END-block and not before!
This is meant to swap the last two lines and print "omitted" for the others:
$ tac cars | awk 'NR==1{getline one}
NR==2{print; print one}
NR>2{print "omitted"}' | tac
omitted
omitted
omitted
Plymouth,Fury,1992,2000
Dodge,charger,1989,5950
You can add formatting. The same can be done without tac by putting the logic in an END block.
UPDATE: awk only solution: keep two lines in memory, when done print them in reverse order
$ awk 'NR>2{print pp}
{pp=p;p=$0}
END{print p;print pp}' car
Ford,mondeo,1998,5800
ford,fiesta,1994,4575
Chevy,impala,1998,1000
Plymouth,Fury,1992,2000
Ford,F150,1991,1450
Dodge,charger,1989,5950
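The two-line buffer above can be combined with the column rearrangement from part one of the question in a single awk program. A minimal sketch, assuming comma-separated input and the year, price, make, model output order shown in the question; the function name rearrange_and_swap is made up for illustration.

```shell
# Hypothetical helper: reorder the columns to "year price make model" and
# print the last two lines in swapped order.
rearrange_and_swap() {
  awk -F, '
    NR > 2 { print pp }                       # emit the line from two records back
    { pp = p; p = $3 " " $4 " " $1 " " $2 }   # shift the buffer, store the reformatted line
    END {
      if (NR >= 1) print p                    # last line first
      if (NR >= 2) print pp                   # then the second-to-last line
    }' "$@"
}

# Example: rearrange_and_swap car
```

The NR checks in the END block only matter for inputs with fewer than two lines, where there is nothing to swap.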

How to delete double lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have the same lines in double, given a specific text file. Could you show me the command please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This compares only character columns 2 through the end of the line, ignoring the first character (which is enough for the example, where the leading number is a single digit). This is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. A line is printed only the first time its string is seen. In the first case, we use the entire input line contained in $0. In the second case, we use only the substring consisting of the 2nd character and everything after it.
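When the leading column can be wider than one character (for example, line numbers 10 and above), a fixed substr offset no longer lines up. A minimal sketch that instead drops the whole first whitespace-separated field before comparing; the function name dedupe_ignoring_first_field is made up for illustration.

```shell
# Hypothetical helper: keep the first occurrence of each line, comparing
# everything after the first whitespace-separated field.
dedupe_ignoring_first_field() {
  awk '{
    key = $0
    sub(/^[ \t]*[^ \t]+[ \t]+/, "", key)   # strip the first field and its separator
  }
  !seen[key]++' "$@"
}

# Example: dedupe_ignoring_first_field file.txt
```

Lines consisting of a single field are compared as a whole, since there is no separator to strip.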
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other, and
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first, because it only compares consecutive lines.
Like this:
sort file.txt | uniq

How to break one line of text into multiple lines of equal character length

Say I have 45 characters in a line in a text file, and I want to break them up into multiple lines, 10 characters each, what command should I use in putty? Also, if there are not enough characters at the end to make it 10, just leave it as it is.
ex:
12345678901234567890123456789012345
to
1234567890
1234567890
1234567890
12345
Try using the cut command with the option -c to cut the string based on number of characters.
a=1
b=4
myText="longString"
echo "$myText" | cut -c "$a-$b"
This will output long, which should help you achieve what you want to do. To repeat this for the next chunk, update the variables a and b.
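Updating a and b by hand can be wrapped in a loop. A minimal sketch in POSIX sh, assuming the text fits on one line; the function name split_fixed and its two arguments are made up for illustration.

```shell
# Hypothetical helper: print $1 in chunks of $2 characters, one chunk per line.
split_fixed() {
  text=$1
  width=$2
  len=${#text}
  start=1
  while [ "$start" -le "$len" ]; do
    end=$((start + width - 1))
    # cut stops at the end of the line, so a short final chunk is left as it is
    printf '%s\n' "$text" | cut -c "$start-$end"
    start=$((end + 1))
  done
}

split_fixed 12345678901234567890123456789012345 10
```

For this particular task the standard fold utility gives the same result in one step: fold -w 10 file.txt.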
