Vim: find lines with fewer occurrences of a string

In a data file I need to find all lines that contain the pattern |^| fewer than 10 times.
I need them in two ways:
a search, so I can go through the file and examine the data
as a list to be copied, including the next line
I use Gvim in Windows.
So far I've tried along the lines of:
/[|^|]{,9}
/[|^|]*{,9}
:g/\v(\|[^|^|]*){,9}/p
Is anyone able to help me?
Edit: an example (as real data is not allowed to be used)
abc|^|def|^|ghi|^|jkl|^|mno|^|pqr|^|stu|^|vwx|^|yza|^|bcd|^|efg
abc|^|def|^|ghi|^|jkl|^|mno|^|pqr|^|stu|^|vwx|^|yza|^|
bcd|^|efg
abc|^|def|^|ghi|^|jkl|^|mno|^|pqr|^|stu|^|vwx|^|yza|^|bcd|^|efg
Final solution:
:v/\v(\|\^\|.*){10}

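I tried this one; I think it will help. It matches lines that contain |^| at least 10 times, so run the same pattern through :v to get the lines with fewer.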
/\(|^|.*\)\{10}
or with \v
/\v(\|\^\|.*){10}
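For anyone who wants to double-check the result outside Vim, here is a minimal Python sketch of the same idea. It is not part of the original answers; the names short_lines and AT_LEAST_TEN are made up for illustration, and the sample is the example data from the question.

```python
import re

DELIM = "|^|"

def short_lines(lines, minimum=10):
    """Return (line_number, line, next_line) for lines with fewer than `minimum` delimiters."""
    result = []
    for i, line in enumerate(lines):
        if line.count(DELIM) < minimum:
            # Include the following line, as the question asks for it in the list.
            next_line = lines[i + 1] if i + 1 < len(lines) else None
            result.append((i + 1, line, next_line))
    return result

# Rough Python counterpart of the Vim pattern: ten groups, each starting with a literal |^|.
AT_LEAST_TEN = re.compile(r"(\|\^\|.*){10}")

sample = [
    "abc|^|def|^|ghi|^|jkl|^|mno|^|pqr|^|stu|^|vwx|^|yza|^|bcd|^|efg",
    "abc|^|def|^|ghi|^|jkl|^|mno|^|pqr|^|stu|^|vwx|^|yza|^|",
    "bcd|^|efg",
    "abc|^|def|^|ghi|^|jkl|^|mno|^|pqr|^|stu|^|vwx|^|yza|^|bcd|^|efg",
]

for number, line, next_line in short_lines(sample):
    print(number, line, next_line)
```

Counting with str.count avoids regex backtracking entirely; the compiled pattern is only there to show that "does not match ten groups" and "fewer than ten delimiters" pick out the same lines.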

Related

vim Search Replace should use replaced text in following searches

I have a data file (comma separated) that has a lot of NAs (It was generated by R). I opened the file in vim and tried to replace all the NA values to empty strings.
Here is a sample slimmed down version of a record in the file:
1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1
Once I am done with the search-replace, the intended output should be:
1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
In other words, all the NAs should be replaced except the words NATIONAL, NANA and AMERICANA.
I used the following command in vim to do this:
1, $ s/\,NA\,/\,\,/g
But, it doesn't seem to work. Here is the output that I get:
1,1,,NA,,NATIONAL,,1,NANA,1,AMERICANA,1
As you can see, there is one ,NA, that is left out of the replacement process.
Does anyone have a good way to fix it? Thanks.
A trivial solution is to run the same command again and it will take care of the remaining ,NA,. However, it is not a feasible solution because my actual data file has 100s of columns and 500K+ rows each with a variable number of NAs.
The comma has no special meaning, so you don't have to escape it:
:1,$s/,NA,/,,/g
Which doesn't solve your problem.
You can use % as a shorthand for 1,$:
:%s/,NA,/,,/g
Which doesn't solve your problem either.
The best way to match all those NA words to the exclusion of other words containing NA would be to use word boundaries:
:%s/,\<NA\>,/,,/g
Which still doesn't solve your problem.
Once the word boundaries are in place, the commas that you used to restrict the match to NA (and that are causing the error) become unnecessary:
:%s/\<NA\>//g
See :help :range and :help \<.
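As a quick sanity check of the word-boundary idea, here is a small Python sketch on the sample record from the question. Python's \b is only an approximation of Vim's \< and \> (Vim's notion of a word depends on the 'iskeyword' option), so treat it as an illustration rather than an exact equivalent.

```python
import re

record = "1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1"

# \bNA\b only matches NA when it stands alone as a word, so NATIONAL,
# NANA and AMERICANA are left untouched.
cleaned = re.sub(r"\bNA\b", "", record)
print(cleaned)
```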
Use % instead of 1,$ (% means "the buffer" aka the whole file).
You don't need \,. , works fine.
Vim finds discrete, non-overlapping matches, so in ,NA,NA,NA, it only finds the first and the third ,NA,: the middle one doesn't have its own separate surrounding commas. We can exclude certain characters of the regex from the match with \zs (match start) and \ze (match end). The pattern still requires the surrounding characters, but the match itself doesn't include them, so we can match every NA in ,NA,NA,NA,.
TL;DR: :%s/,\zsNA\ze,//g
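The non-overlapping behaviour is easy to see by printing match positions. The sketch below uses Python, where lookbehind and lookahead play roughly the role of \zs and \ze here; that mapping is an analogy I am assuming for illustration, not a statement about how Vim implements them.

```python
import re

record = ",NA,NA,NA,"

# Each ,NA, match consumes both commas, so the middle NA has no leading comma left.
consuming = [m.span() for m in re.finditer(r",NA,", record)]

# Zero-width assertions require the commas without consuming them.
zero_width = [m.span() for m in re.finditer(r"(?<=,)NA(?=,)", record)]

after_consuming = re.sub(r",NA,", ",,", record)
after_zero_width = re.sub(r"(?<=,)NA(?=,)", "", record)

print(consuming, after_consuming)
print(zero_width, after_zero_width)
```

The first pattern finds two matches and leaves one NA behind; the second finds all three.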

Saving a flat file through Vim adds an invisible byte to the file that creates a new line

The title is not really specific, but I have trouble identifying the correct keywords as I'm not sure what is going on here. For the same reason, it is possible that my question has a duplicate. If that's the case: sorry!
I have a Linux application that receives data via flat files. I don't know exactly how those files are generated, but I can read them without any problem. Those are short files, only one line each.
For testing purposes, I tried to modify one of those files and inject it into the application again. But when I do that I can see in the log that a mysterious line break was added at the end of the message (resulting in the application not recognising the message)...
For the sake of example, let's say I receive a flat file, named original, that contains the following:
ABCDEF
I make a copy of this file and name it copy.
If I compare those two files using the "diff" command, it says they are identical (as I expect them to be).
If I open copy via Vi and then quit without changing or saving anything and then use the "diff" command, it says they are identical (as I also expect them to be).
If I open copy via Vi and then save it without changing anything and then use the "diff" command, I get the following (I added the dot for layout purposes):
diff original copy
1c1
< ABCDEF
\ No newline at end of file
---
.> ABCDEF
And if I compare the size of my two files, I can see that original is 71 bytes while copy is 72.
It seems that the format of the file changes when I save it. I first thought of an encoding problem, so I used the ":set list" command in Vim to see the invisible characters. But for both files, I can see the following:
ABCDEF$
I have found other ways to do my test, but this problem still bugs me and I would really like to understand it. So, my two questions are:
What is happening here?
How can I modify a file like that without creating this mysterious line break?
Thank you for your help!
What happens is that Vim is set by default to assume that the files you edit end with a "newline" character. That's normal behavior in UNIX-land. But the "files" your program is reading look more like "streams" to me because they don't end with a newline character.
To ensure that those "files" are written without a newline character, set the following options before writing:
:set binary noeol
See :help 'eol'.
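To make the one-byte difference concrete, here is a small Python sketch that writes the example content with and without a final newline and compares the sizes. The file names follow the question; the helper strip_final_newline is hypothetical and is just one way to undo the change outside Vim, not what Vim does internally.

```python
import os
import tempfile

def strip_final_newline(data: bytes) -> bytes:
    """Remove a single trailing newline byte, if present."""
    return data[:-1] if data.endswith(b"\n") else data

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original")
    copy = os.path.join(tmp, "copy")

    with open(original, "wb") as handle:
        handle.write(b"ABCDEF")        # no newline at end of file
    with open(copy, "wb") as handle:
        handle.write(b"ABCDEF\n")      # what a save with 'eol' set produces

    size_original = os.path.getsize(original)
    size_copy = os.path.getsize(copy)

    with open(copy, "rb") as handle:
        restored = strip_final_newline(handle.read())

print(size_original, size_copy, restored)
```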

Keep only rows in a range using vim

I was wondering if there's a way in VIM to keep only rows in a certain range, i.e., say I wanted to keep only rows 1:20 in a file, and discard everything else. Better yet, say I wanted to keep lines 1-20 and 40-60, is there a way to do this?
Is there a way to do this without manually deleting stuff?
If you mean entire lines by "rows", just use the :delete command with the inverted range:
:21,$delete
removes all lines except 1-20.
If the ranges are non-consecutive, an alternative is the :vglobal command with regular expression atoms that match only in certain lines. For example, to only keep lines 3 and 7:
:g!/\%3l\|\%7l/delete
There are also atoms for "before line N" (\%<Nl) and "after line N" (\%>Nl), so you can build ranges with them, too.
In order to keep lines 1 through 20 and 40 through 60, the following construct should do:
:v/\%>0l\%<21l\|\%>39l\%<61l/d
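The logic of that pattern (keep a line if its number falls in any of the ranges, delete it otherwise) can be sketched in Python as follows. The function name keep_ranges and the generated sample lines are assumptions for illustration only.

```python
def keep_ranges(lines, ranges):
    """Keep lines whose 1-based number falls inside any inclusive (start, end) range."""
    return [
        line
        for number, line in enumerate(lines, start=1)
        if any(start <= number <= end for start, end in ranges)
    ]

lines = [f"line {n}" for n in range(1, 101)]
kept = keep_ranges(lines, [(1, 20), (40, 60)])
print(len(kept), kept[0], kept[19], kept[20], kept[-1])
```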
If you want to (as I now understand from your comments) save (different) parts of the buffer as new files, it's best to not modify the original file, but to write fragments as separate files. In fact, Vi(m) supports this well, because you can just pass a range to the :write command:
:1,20w newfile1
:40,60w newfile2
Append works, too:
:40,60w >> newfile1
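If you ever need the same thing in a script, a rough Python counterpart of writing and appending line ranges might look like this. The helper write_range is hypothetical, and the buffer is faked with generated lines; the original lines are never modified.

```python
import os
import tempfile

def write_range(lines, start, end, path, append=False):
    """Write lines start..end (1-based, inclusive) to path, appending if requested."""
    mode = "a" if append else "w"
    with open(path, mode) as handle:
        handle.writelines(lines[start - 1:end])

buffer_lines = [f"line {n}\n" for n in range(1, 101)]

with tempfile.TemporaryDirectory() as tmp:
    newfile1 = os.path.join(tmp, "newfile1")
    write_range(buffer_lines, 1, 20, newfile1)                 # like :1,20w newfile1
    write_range(buffer_lines, 40, 60, newfile1, append=True)   # like :40,60w >> newfile1
    with open(newfile1) as handle:
        written = handle.read().splitlines()

print(len(written), written[0], written[-1])
```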
There is more than one way to achieve what you want:
If this question is really about the first rows in a file:
head -n 20 filename > newfile
If it shall be a vim solution:
:21<Enter>
dG
However, you mention that you want to split up a large file into smaller pieces. The tool for this is split: it lets you split up files into chunks of even line count or even size.

Vim - How to move the result of a search to the beginning of the file?

I want to search some text and move the entire line where the text belongs to the beginning of the file. Just that.
How about the simple move command?
:g/^C/m0
:g/^B/m0
:g/^A/m0
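One detail worth knowing: :g/pattern/m0 moves each matching line to the top in turn, so the matches end up in reverse order, which is presumably why the commands above run C, then B, then A. A minimal Python sketch of that behaviour, with a made-up function name and made-up sample lines:

```python
import re

def move_matches_to_top(lines, pattern):
    """Mimic :g/pattern/m0: every match is moved to the top, one after another."""
    regex = re.compile(pattern)
    matched = [line for line in lines if regex.search(line)]
    rest = [line for line in lines if not regex.search(line)]
    # Each later match lands above the earlier ones, hence the reversal.
    return matched[::-1] + rest

lines = ["B one", "zzz", "C one", "A one"]
for pattern in (r"^C", r"^B", r"^A"):
    lines = move_matches_to_top(lines, pattern)
print(lines)

reversed_demo = move_matches_to_top(["x", "A1", "y", "A2"], r"^A")
print(reversed_demo)
```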
:g/regex/norm ddggP
Well, what I'm going to suggest is about as primitive as an answer can get, but nothing else springs to mind at the moment.
:g/A ... some text not including A, B or C.../d
(will tell you how many lines it has yanked)
and then you go to the beginning of the file and, for example
5P
Although, if cases are as simple as this, maybe sorting lines by first letter .... I've never done anything similar but look for older questions.

vi search copy paste search copy

I was just wondering if anyone could help out with how to do the following using vi.
I have a text file and it might contain something like
start of text file:
--something1.something2--
--anotherThing1.something2--
end of text file:
I want to convert each such line by finding the first run of characters matching [A-Za-z0-9], copying it to the buffer, and then appending it on the same line just before the closing --:
start of text file:
--something1.something2.something1--
--anotherThing1.something2.anotherThing1--
end of text file:
Is there a single VI command to do this?
Cheers
Ben
:%s/--\([a-zA-Z0-9]*\)\.\(.*\)--/--\1.\2.\1--/gc
or without asking confirmation for every replace:
:%s/--\([a-zA-Z0-9]*\)\.\(.*\)--/--\1.\2.\1--/g
will produce:
--something1.something2.something1--
--anotherThing1.something2.anotherThing1--
from:
--something1.something2--
--anotherThing1.something2--
This copies the first word after '--' (up to the first '.') and appends '.' plus that word just before the last '--'.
This was done using Vim.
RE COMMENTS:
Someone mentioned that it will not work when there are multiple words and so on.
I tested it on the following:
start of text file:
--something1.something2.something3.something4.something5.something6--
--anotherThing1.something2.anotherThing3.anotherThing4.anotherThing5--
end of text file:
after replace with the above expression:
start of text file:
--something1.something2.something3.something4.something5.something6.something1--
--anotherThing1.something2.anotherThing3.anotherThing4.anotherThing5.anotherThing1--
end of text file:
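The same capture-group idea can be tried in Python, which makes it easy to test against the lines above. This is only a sketch of the technique with Python's re syntax (groups are written without backslashes, and the dot is escaped), not the Vim command itself.

```python
import re

PATTERN = re.compile(r"--([A-Za-z0-9]*)\.(.*)--")

def append_first_word(line):
    """Append the first word (group 1) before the closing --, keeping the rest (group 2)."""
    return PATTERN.sub(r"--\1.\2.\1--", line)

short = append_first_word("--something1.something2--")
long = append_first_word(
    "--anotherThing1.something2.anotherThing3.anotherThing4.anotherThing5--"
)
print(short)
print(long)
```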
Try this:
%s:\([A-Za-z0-9]\+\)\(\..\+\)--:\1\2.\1--:g
Holy crap, in the time it took me to login to post that answer, you posted it and already got a comment!
%s/--\(\w*\)\.\(.*\)--/--\1.\2.\1--/g
--ab1.cd2..yz99-- -> --ab1.cd2..yz99.ab1--
Building on stefanB's solution but using negated character classes and the "very magic" setting, I arrive at the following substitution command:
:%s/\v--([^.]+)\.(.*)--/--\1.\2.\1--/
Depending on the exact requirements (what are allowed characters for "something1" and "anotherThing1") this might or might not be more correct.
One more thing to consider: all solutions posted so far assume that there is only one occurrence of the "--text.someOtherText--" pattern per line. If this is not the case, the (.*) part of the pattern would have to be adjusted and the /g modifier is required.
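To illustrate the multiple-occurrence case, here is a Python sketch comparing a greedy (.*) with a non-greedy (.*?) on a made-up line containing two blocks. The sample line is an assumption of mine, and the non-greedy quantifier is just one possible adjustment (in Vim the counterpart would be \{-}); Python's re.sub replaces all matches by default, which corresponds to /g.

```python
import re

line = "--a.b-- --c.d--"

# Greedy: (.*) runs to the last -- on the line, so only one replacement happens.
greedy = re.sub(r"--([^.]+)\.(.*)--", r"--\1.\2.\1--", line)

# Non-greedy: (.*?) stops at the nearest --, so each block is handled separately.
non_greedy = re.sub(r"--([^.]+)\.(.*?)--", r"--\1.\2.\1--", line)

print(greedy)
print(non_greedy)
```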
