How to delete text in a file based on regular expression using vim - vim

I have an XML file like this:
<fruit><apple>100</apple><banana>200</banana></fruit>
<fruit><apple>150</apple><banana>250</banana></fruit>
Now I want to delete all the text in the file except the text inside the apple tag. That is, the file should contain:
100
150
How can I achieve this?

:%s/.*apple>\(.*\)<\/apple.*/\1/
That should do what you need. Worked for me.
The pattern matches everything up to and including the opening apple tag, captures everything between the opening and closing apple tags in a group, and then matches the rest of the line. The whole line is then replaced with the first backreference (\1), which is the text between the apple tags.
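For readers who want to try the same idea outside Vim, here is a minimal sketch in Python's re module. It is an illustration only, not part of the answer above: the list name lines and the result name apple_only are made up for the example, and Python writes the group as (...) rather than \(...\).

```python
import re

# The two sample lines from the question.
lines = [
    "<fruit><apple>100</apple><banana>200</banana></fruit>",
    "<fruit><apple>150</apple><banana>250</banana></fruit>",
]

# Same shape as the Vim pattern: swallow the prefix, capture the tag body,
# swallow the suffix, then keep only group 1.
apple_only = [re.sub(r".*apple>(.*)</apple.*", r"\1", line) for line in lines]

print("\n".join(apple_only))
```

Because the prefix and suffix are consumed by the match, each line is replaced as a whole, which is why only the captured text survives.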

I personally use this:
%s;.*<apple>\(\d*\)</apple>.*;\1;
Since the text contains '/', which is the default separator, using ';' as the separator makes the command clearer.
And I found that the non-greedy match @Conspicuous Compiler mentioned should be
\{-}
instead of "{-}" in Vim.
However, after I changed Conspicuous' solution to
%s/.*apple>(.\{-\})<\/apple.*/\1^M/g
my Vim said it can't find the pattern.

In this case, one can use the general technique for collecting pattern matches
explained in my answer to the question "How to extract regex matches
using Vim".
In order to collect and store all of the matches in a list, run the Ex command
:let t=[] | %s/<apple>\(.\{-}\)<\/apple>\zs/\=add(t,submatch(1))[1:0]/g
The command purposely does not change the buffer's contents, only collects the
matched text. To set the contents of the current buffer to the
newline-separated list of matches, use the command
:0pu=t | +,$d_
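The collect-then-replace idea can be sketched outside Vim as well. The snippet below is a rough Python analogue, not the Vim mechanism itself: text stands in for the buffer, matches plays the role of the list t, and new_buffer is a hypothetical name for the final contents. Python's .*? corresponds to Vim's non-greedy .\{-}.

```python
import re

# Stand-in for the buffer contents from the question.
text = (
    "<fruit><apple>100</apple><banana>200</banana></fruit>\n"
    "<fruit><apple>150</apple><banana>250</banana></fruit>\n"
)

# First pass: only collect the captured groups, leaving the text untouched.
matches = re.findall(r"<apple>(.*?)</apple>", text)

# Second pass: build the new contents as a newline-separated list of matches.
new_buffer = "\n".join(matches)
print(new_buffer)
```

Unlike a whole-line substitution, collecting matches this way also copes with several apple tags on one line, since every occurrence is recorded separately.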

Related

vim Search Replace should use replaced text in following searches

I have a data file (comma separated) that has a lot of NAs (it was generated by R). I opened the file in vim and tried to replace all the NA values with empty strings.
Here is a sample slimmed down version of a record in the file:
1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1
Once I am done with the search-replace, the intended output should be:
1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
In other words, all the NAs should be replaced except the words NATIONAL, NANA and AMERICANA.
I used the following command in vim to do this:
1, $ s/\,NA\,/\,\,/g
But, it doesn't seem to work. Here is the output that I get:
1,1,,NA,,NATIONAL,,1,NANA,1,AMERICANA,1
As you can see, there is one ,NA, that is left out of the replacement process.
Does anyone have a good way to fix it? Thanks.
A trivial solution is to run the same command again and it will take care of the remaining ,NA,. However, it is not a feasible solution because my actual data file has 100s of columns and 500K+ rows each with a variable number of NAs.
A comma has no special meaning, so you don't have to escape it:
:1,$s/,NA,/,,/g
Which doesn't solve your problem.
You can use % as a shorthand for 1,$:
:%s/,NA,/,,/g
Which doesn't solve your problem either.
The best way to match all those NA words to the exclusion of other words containing NA would be to use word boundaries:
:%s/,\<NA\>,/,,/g
Which still doesn't solve your problem.
But the word boundaries make those commas useless. You used them to restrict the match to NA, and they are what causes the error, so drop them:
:%s/\<NA\>//g
See :help :range and :help \<.
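The difference between the comma-delimited pattern and the word-boundary pattern can be checked with a small Python sketch. This is an illustration under the assumption that Python's \b behaves closely enough to Vim's \< and \> for this data; the variable names record, naive and bounded are invented for the example.

```python
import re

# The sample record from the question.
record = "1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1"

# Matching the surrounding commas consumes them, so every other NA in a run
# of adjacent NAs is skipped.
naive = re.sub(r",NA,", ",,", record)

# Matching NA as a whole word consumes only the two letters.
bounded = re.sub(r"\bNA\b", "", record)

print(naive)
print(bounded)
```

The first result reproduces the leftover NA from the question, while the second matches the intended output and leaves NATIONAL, NANA and AMERICANA alone.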
Use % instead of 1,$ (% means "the buffer" aka the whole file).
You don't need \,. , works fine.
Vim finds discrete, non-overlapping matches, so in ,NA,NA,NA, it only finds the first ,NA, and the third ,NA,, because the middle one doesn't have its own separate surrounding commas. We can exclude certain characters of our regex from the match with \zs (match start) and \ze (match end). The pattern still requires the surrounding characters to be present, but the match itself doesn't include them, so we can match every NA in ,NA,NA,NA,.
TL;DR: %s/,\zsNA\ze,//g
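Python has no \zs or \ze, but lookbehind and lookahead play a similar role, so the effect can be sketched as follows. Treat this as an analogy rather than an exact equivalent; the names cells, consumed and lookaround are hypothetical.

```python
import re

cells = ",NA,NA,NA,"

# With the commas inside the match, matches cannot overlap, so the middle NA
# has no leading comma left to match against.
consumed = re.findall(r",NA,", cells)

# With the commas only asserted (not consumed), every NA qualifies.
lookaround = re.sub(r"(?<=,)NA(?=,)", "", cells)

print(consumed)
print(lookaround)
```

Only two matches are found in the first case, whereas the lookaround version removes all three NAs and leaves just the commas.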

From each filepath in each line extract filename and add next to it

Given my text in a file:
/home/dir1/file1.txt
/home/dir2/file2.txt
...
/home/dirn/filek.txt
I would like it to be this text instead:
/home/dir1/file1.txt file1
/home/dir2/file2.txt file2
...
/home/dirn/filek.txt filek
Can I write this in vim?
This regex works:
:%s#.*/\(.*\)\.txt$#& \1#
The starting .*/ skips everything until the last slash (it's greedy, so it will catch all the directory part.)
Then we capture the filename in a group with \(.*\).
And finally we match the extension with \.txt and anchor it to the end of the line.
For the replacement, we use a & to keep the full path around and then add \1 to include the filename only.
We can use # as a delimiter, so we don't need to escape the /s. We only have one here, but that's a useful technique when paths are involved, so I'm making sure I use it here for consistency.
You might want to take a look at :help pattern-searches to learn more about Vim regexes you can use for search and substitutions.
(In general, the Vim documentation is great and the help system can be very useful if you know how to navigate it, see :help helphelp for more.)
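As a cross-check of the pattern, here is a small sketch of the same substitution in Python. It is only an illustration: the list names paths and annotated are made up, and Python spells Vim's & (the whole match) as \g<0>.

```python
import re

# Two of the sample paths from the question.
paths = ["/home/dir1/file1.txt", "/home/dir2/file2.txt"]

# Greedy .*/ runs to the last slash, the group captures the bare file name,
# and \g<0> re-inserts the whole matched path before the captured name.
annotated = [re.sub(r".*/(.*)\.txt$", r"\g<0> \1", path) for path in paths]

print("\n".join(annotated))
```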

What is wrong with this vim regular expression?

I have a list of files with extension .elf like this
file1.elf
file2.elf
file3.elf
I am trying to run them in a shell with a run command like run file1.elf >file1.log and get the result in a log file named after the input file with a .log extension.
My list of files is very big. I am trying out a Vim regular expression that will match the file name, e.g. file1 in file1.elf, and use it to create the name of the log file. I am trying this:
s/\(\(\<\w\+\)\@<=\.elf\)/\1 >\2\.log/
Here I try to match text which is followed by .elf and keep it in \1. I expect the entire file name to be in it, and I was hoping \2 would contain just the file name minus the extension. But this gives me
run file1 >file1.run, i.e. \1 does not take the full file name; it has somehow missed the .elf extension. I can use \1\.elf to get the proper result, but I was wondering why the expression is not working as I expected.
You use \@<= in your match pattern. This is the positive lookbehind assertion. As per the documentation (:help /\@<=),
Matches with zero width if the preceding atom matches just before what follows
The important part is that it matches with zero width. This is what you are experiencing: the .elf (which follows) is matched but with zero width, so \1 does not contain the suffix .elf.
Instead, it would be easier to go with a
%s/\v(.*)\.elf$/run \1.elf > \1.log/
Here, I've used \v to turn on very magic (:help magic). With this turned on, you don't need all those backslashes when you use grouping parentheses.
Then there is (.*) to match and store the filename up until
\.elf$ which seems to be each files suffix.
In the substitution part, after the / I add the literal run followed by \1. \1 will be replaced by the stored filename (without .elf suffix).
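The suggested substitution can be sanity-checked with a short Python sketch. This is an illustration, not the Vim command itself: Python's default regex syntax already behaves like Vim's very magic mode for grouping parentheses, and the names names and commands are invented for the example.

```python
import re

# The sample file list from the question.
names = ["file1.elf", "file2.elf", "file3.elf"]

# Capture everything before the .elf suffix and reuse it twice:
# once to rebuild the input name and once to build the log name.
commands = [re.sub(r"(.*)\.elf$", r"run \1.elf > \1.log", name) for name in names]

print("\n".join(commands))
```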
The \@<= seems pointless and unneeded. Removing it gets you the desired behavior.

Delete text with GREP in Textwrangler

I have the following source code from the Wikipedia page of a list of Games. I need to grab the name of the game from the source, which is located within the title attribute, as follows:
<td><i>007: Quantum of Solace</i><sup id="cite_ref-4" class="reference"><span>[</span>4<span>]</span></sup></td>
As you can see above, in the title attribute there's a string. I need to use GREP to search through every single line for when that occurs, and remove everything excluding:
title="Game name"
I have the following (in TextWrangler) which returns every single occurrence:
title="(.*)"
How can I now set it to remove everything surrounding that, but to ensure it keeps either the string alone, or title="string".
I use a multi-step method to process these kind of files.
First you want to have only one HTML tag per line. GREP works on each line, so you want to minimise the need for complicated patterns. I usually replace all > with >\n.
Then you want to develop a pattern for each occurrence of the item you want, in this case title=".*?". Put that in between parentheses (). Then add some filler to that pattern so that it covers the whole line around each occurrence: .*?(title=".*?").*
Replace everything that matches .*?(title=".*?").* with \1
Finally, make smart use of the TextWrangler function Process Lines Containing to filter out any remaining rubbish.
Notes
\1 refers to the first match between (). You can also reorder things using multiple parentheses, e.g. match (.*?), (.*) and replace with \2, \1 to shuffle columns.
Learn how to write lazy regular expressions. The use of ? in these patterns is very powerful. Basically, ? makes the preceding quantifier stop at the first place where the next part of the pattern matches, instead of the last place.
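The greedy versus lazy difference is easiest to see on a concrete line. The sketch below uses Python's re module, whose syntax for these patterns is the same as the GREP dialect shown above, as far as I know. The input line is hypothetical: the anchor tag with href, title and class attributes is an assumption about what the Wikipedia source looked like, not text taken from the question.

```python
import re

# Hypothetical source line (assumed markup, for illustration only).
line = (
    '<td><i><a href="/wiki/007:_Quantum_of_Solace" '
    'title="007: Quantum of Solace" class="mw">007: Quantum of Solace</a></i></td>'
)

# Greedy: .* runs to the LAST double quote on the line.
greedy = re.search(r'title="(.*)"', line).group(1)

# Lazy: .*? stops at the FIRST double quote after title=",
# and the surrounding filler lets the replacement drop the rest of the line.
lazy = re.sub(r'.*?(title=".*?").*', r"\1", line)

print(greedy)
print(lazy)
```

The greedy capture drags the following attribute along, while the lazy version leaves exactly the title="..." pair.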
I've figured this problem out, it was quite simple. Instead of retrieving the content in the title attribute, I'd retrieve the page name.
To ensure I only struck the correct line where the content was, I'd use the following string for searching the code.
(.*)/wiki/(.*)"
Returning \2
After that, I simply remove any cases where there is HTML code:
<(.*)
Returning ''
Finally, I'll remove the remaining content after the page name:
"(.*)
Returning ''
A bit of cleaning up the spacing and I have a list for all game names.

Using Vim, how do you use a variable to store count of patterns found?

This question was helpful for getting a count of a certain pattern in Vim, but it would be useful to me to store the count and sum the results so I can echo a concise summary.
I'm teaching a class on basic HTML to some high schoolers, and I'm using this script to quickly check the numbers of required elements throughout all their pages without leaving Vim. It works fine, but when students have more than 10 .html files it gets cumbersome to add up the various sections by hand.
Something like:
img_sum = :bufdo %s/<img>//gen
would be nice. I think I'll write a ruby script to check the pages more thoroughly and check for structure, but for now I'm curious about how to do this in Vim.
The problem can be solved by a counter separate from the one built into the
:substitute command: Use a Vim script variable to hold the number of pattern
matches. A convenient way to register every match and modify a particular
variable accordingly, is to take advantage of the substitute with an
expression feature of the :substitute command (see :help sub-replace-\=).
The idea is to use a substitution that evaluates an expression increasing
a counter on every occurrence, and does not change the text it is operating
on.
The first part of the technique cannot be implemented straightforwardly
because it is forbidden to use Ex commands in expressions (including \=
substitute expressions), and therefore it is not possible to use the :let
command to modify a variable. Answering the question "gVim find/replace
with counter", I have proposed a simple trick to overcome that limitation,
which is based on using a single-item list (or dictionary containing a single
key-value pair). Since the map() function transforms a list or a dictionary
in place, that only item could be changed in a constrained expression context.
To do that, one should call the map() function passing an expression
evaluating to the new value along with the list containing the current value.
The second half of the technique is how to avoid changing text when using
a substitution command. In order to achieve that, one can make the pattern
have zero-width by prepending \ze or by appending \zs atoms to it (see
:help /\zs, :help /\ze). In such a way, the modified pattern captures
a string of zero width just before or after the occurrence of the initial
pattern. So, if the replacement text is also empty, substitution does not
cause any change in the contents of a buffer. To make the substitute
expression evaluate to an empty string, one can just extract an empty
substring or sublist from the resulting value of that expression.
The two ideas are put into action in the following command.
:let n=[0] | bufdo %s/pattern\zs/\=map(n,'v:val+1')[1:]/ge
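The two ideas (a single-item list that is mutated in place, and a substitution that leaves the text unchanged) have a close analogue in Python, sketched below. This is an analogy, not the Vim mechanism: count_without_changing, bump, pages and the <img\b pattern are all names and values invented for the example.

```python
import re

def count_without_changing(pattern, text, counter):
    """Bump counter[0] once per match and return the text unchanged."""
    def bump(match):
        counter[0] += 1          # mutate the single-item list in place
        return match.group(0)    # put the matched text back as it was
    return re.sub(pattern, bump, text)

# One shared counter across all "buffers", like n=[0] with :bufdo.
counter = [0]
pages = ["<p>a</p><img src='x.png'>", "<img src='y.png'><img src='z.png'>"]
unchanged = [count_without_changing(r"<img\b", page, counter) for page in pages]

print(counter[0])
```

As in the Vim command, the counter lives in a one-element list so that the replacement expression can update it without an assignment statement.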
I think the answer above is hard to understand. A prettier way is to use the external command grep, like this (note that grep piped into wc -l counts matching lines, not individual matches):
:let found=0
:bufdo let found=found+(system('grep "<p>" '.expand('%:p') . '| wc -l'))
:echo found

Resources