Delete text with GREP in Textwrangler - string

I have the following source code from the Wikipedia page of a list of Games. I need to grab the name of the game from the source, which is located within the title attribute, as follows:
<td><i>007: Quantum of Solace</i><sup id="cite_ref-4" class="reference"><span>[</span>4<span>]</span></sup></td>
As you can see above, in the title attribute there's a string. I need to use GREP to search through every single line for when that occurs, and remove everything excluding:
title="Game name"
I have the following (in TextWrangler) which returns every single occurrence:
title="(.*)"
How can I now set it to remove everything surrounding that, but to ensure it keeps either the string alone, or title="string".

I use a multi-step method to process this kind of file.
First, get one HTML tag per line. GREP works line by line, so this minimises the need for complicated patterns. I usually replace every > with >\n
Then develop a pattern for each occurrence of the item you want. In this case 'title=".*?"'. Put that in between parentheses (). Then add some filling around it so the pattern matches the whole line: .*?(title=".*?").*
Replace everything that matches .*?(title=".*?").* with \1
Finally, make smart use of the TextWrangler function "Process Lines Containing" to filter out any remaining rubbish.
Notes
The \1 refers to the first match captured between (). You can also reorder things using multiple parentheses, for example (.*?), (.*) with \2, \1 to shuffle columns.
Learn how to use lazy regular expressions. The ? in these patterns is very powerful. Basically, a ? after a quantifier makes the pattern stop at the next occurrence of whatever follows it, rather than at the last occurrence on the line.
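To see the lazy versus greedy difference outside TextWrangler, here is a minimal Python sketch. The sample line, including the anchor tag and its title value, is made up for illustration and is not taken from the Wikipedia page.

```python
import re

# Hypothetical sample line (the anchor tag and title value are assumptions).
line = '<td><i><a title="Example Game" href="/wiki/Example_Game">Example Game</a></i></td>'

# Lazy: .*? stops at the first closing quote after title=".
lazy = re.sub(r'.*?(title=".*?").*', r'\1', line)

# Greedy: .* runs on to the last quote on the line.
greedy = re.search(r'title=".*"', line).group(0)

print(lazy)
print(greedy)
```

The lazy version keeps only the title attribute, while the greedy one also swallows the href attribute that follows it.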

I've figured this problem out, it was quite simple. Instead of retrieving the content in the title attribute, I'd retrieve the page name.
To ensure I only struck the correct line where the content was, I'd use the following string for searching the code.
(.*)/wiki/(.*)"
Returning \2
After that, I simply remove any cases where there is HTML code:
<(.*)
Returning ''
Finally, I'll remove the remaining content after the page name:
"(.*)
Returning ''
A bit of cleaning up the spacing and I have a list for all game names.

Related

Inject a code into the unique function in multiple files (Sublime editor)

Taking the advice of Xaelias I'll modify this post, because the initial question was quite unclear and overcomplicated.
So basically I have multiple script files that need to be edited (too many of them to afford to edit them manually). What I need to do for each script file is to insert a certain code at a specific location inside each of them.
So if my script file was called foo.script, and if the code inside it was as follows:
cut_bar() {
some code...
}
cut_cabbage() {
some code...
}
...
Then I'd like my final look of the foo.script file to be as follows:
cut_bar() {
some code...
message msg_foo_bar
}
cut_cabbage() {
some code...
message msg_foo_cabbage
}
...
(cut_ is a universal prefix shared by all those functions inside of which the code needs to be added).
Is there any way I can do this in Sublime Text editor? Or is there no other way but to develop a small program that does all of this (in which case, tips would be appreciated likewise!).
We'll try a first solution, which is far from perfect, but might be enough for what you want to do. If it does not work, we'll try another solution.
You want to do a search and replace (⌥⌘F (on mac) or Find→Replace...).
Make sure that the RegEx modifier is active (.* on the left)
Find What: (?<=^cut_)(.*?)(\(\)\h*\{(?:.|\v)*?)(^\}\h*$) (explanations below)
Replace With: \1\2\tmessage msg_foo_\1\n\3
If it does not work as intended, maybe we'll just need to tweak the RegEx a little. Or maybe we'll need to resort to python and ST plugins!
RegEx explanation:
(?<=^cut_): we want our match to start at the beginning of a line (^) with cut_. This is not captured because, as you said, it is a constant, so we don't really care about it
(.*?): this matches the rest of the name of the method. This is captured and will be \1
(\(\)\h*\{(?:.|\v)*?): we stop the name capture at (), \h is for any horizontal space (spaces or tabs) that could be between the end of the name, and {, then we match every character and \v (vertical space such as new lines) (this is captured as \2)
(^\}\h*$): ... up until we meet a line that starts with }, and might have any kind of horizontal space before the end of the line ($), this is captured as \3
From there, the replace part is kind of straightforward I guess.
\1\2 is everything we captured but do not want to modify (the rest of the method name, and its body, except for the last line })
Then we put what you wanted, msg_foo_\1, which will transform into msg_foo_bar for example, and then put back the ending } with \3
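If the editor route does not work out, the same idea can be sketched in Python. This is only an illustration under assumptions: the sample text is the foo.script example from the question, the prefix is captured explicitly instead of using a lookbehind, and \h is written as [ \t] because Python's re module has no \h.

```python
import re

source = """cut_bar() {
some code...
}
cut_cabbage() {
some code...
}
"""

pattern = re.compile(
    r"^cut_(\w+)"          # group 1: function name after the cut_ prefix
    r"(\(\)[ \t]*\{.*?)"   # group 2: "() {" and the body, matched lazily
    r"(^\}[ \t]*$)",       # group 3: the closing brace on its own line
    re.MULTILINE | re.DOTALL,
)

# Re-emit the prefix, name and body, then add the message line before the brace.
result = pattern.sub(r"cut_\1\2message msg_foo_\1\n\3", source)
print(result)
```

To run this over many files you would read each file, apply pattern.sub, and write the result back. The foo part of the message name is hard-coded here, as it is in the replacement string above.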

replacing part of regex matches

I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some logic behind it, because the theoretical regex aren't like the ones used in UNIX (or vi, are they different?) and I'm always struggling to figure them out.
This has to be done in the vi editor, as this is my main work tool.
Thanks!
To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.
For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(\w*\) means "match any number of word characters, and remember what you matched". You can then access it in the replacement with \1 (or \2 for the second, etc.)
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
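For comparison with other regex dialects, here is a rough Python equivalent of the two substitutions. It is a sketch only: the sample line is invented, and Python writes the whole match as \g<0> where vim uses &.

```python
import re

# Hypothetical line containing two calls and one function pointer.
code = "x = get_num(a) + get_str(b); fp = &get_distance;"

# Like :%s/get_\w\+/&_struct/g  (every get_ name, whole match reused)
whole_match = re.sub(r"get_\w+", r"\g<0>_struct", code)

# Like :%s/get_\(\w*\)(/get_\1_struct(/g  (only names followed by "(")
calls_only = re.sub(r"get_(\w*)\(", r"get_\1_struct(", code)

print(whole_match)
print(calls_only)
```

The first form also renames &get_distance, while the second leaves it alone because no ( follows the name.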

Using Vim, how do you use a variable to store count of patterns found?

This question was helpful for getting a count of a certain pattern in Vim, but it would be useful to me to store the count and sum the results so I can echo a concise summary.
I'm teaching a class on basic HTML to some high schoolers, and I'm using this script to quickly check the numbers of required elements throughout all their pages without leaving Vim. It works fine, but when students have more than 10 .html files it gets cumbersome to add up the various sections by hand.
Something like:
img_sum = :bufdo %s/<img>//gen
would be nice. I think I'll write a ruby script to check the pages more thoroughly and check for structure, but for now I'm curious about how to do this in Vim.
The problem can be solved by a counter separate from the one built into the
:substitute command: use a Vim script variable to hold the number of pattern
matches. A convenient way to register every match and modify a particular
variable accordingly is to take advantage of the substitute-with-an-expression
feature of the :substitute command (see :help sub-replace-\=).
The idea is to use a substitution that evaluates an expression increasing
a counter on every occurrence, and does not change the text it is operating
on.
The first part of the technique cannot be implemented straightforwardly
because it is forbidden to use Ex commands in expressions (including \=
substitute expressions), and therefore it is not possible to use the :let
command to modify a variable. Answering the question "gVim find/replace
with counter", I have proposed a simple trick to overcome that limitation,
which is based on using a single-item list (or dictionary containing a single
key-value pair). Since the map() function transforms a list or a dictionary
in place, that only item could be changed in a constrained expression context.
To do that, one should call the map() function passing an expression
evaluating to the new value along with the list containing the current value.
The second half of the technique is how to avoid changing text when using
a substitution command. In order to achieve that, one can make the pattern
have zero-width by prepending \ze or by appending \zs atoms to it (see
:help /\zs, :help /\ze). In such a way, the modified pattern captures
a string of zero width just before or after the occurrence of the initial
pattern. So, if the replacement text is also empty, substitution does not
cause any change in the contents of a buffer. To make the substitute
expression evaluate to an empty string, one can just extract an empty
substring or sublist from the resulting value of that expression.
The two ideas are put into action in the following command.
:let n=[0] | bufdo %s/pattern\zs/\=map(n,'v:val+1')[1:]/ge
I think the answer above is hard to understand. A simpler way is to use the external command grep, like this (note that grep piped to wc -l counts matching lines, not individual matches):
:let found=0
:bufdo let found=found+(system('grep "<p>" '.expand('%:p') . '| wc -l'))
:echo found
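Since the question mentions writing a separate script for more thorough checks, here is a minimal Python sketch of the same tally. The file names and page contents are made up, and the pages are held in a dictionary rather than read from disk so the example is self-contained.

```python
import re

# Hypothetical pages; in practice these would be read from the students' .html files.
pages = {
    "index.html": '<p>Hi</p><img src="a.png"><img src="b.png">',
    "about.html": '<p>About</p><img src="c.png">',
}

# Count every <img ...> opening tag per page, then sum across pages.
per_file = {name: len(re.findall(r"<img\b", html)) for name, html in pages.items()}
img_sum = sum(per_file.values())

print(per_file)
print(img_sum)
```

Unlike the grep and wc -l approach, this counts individual matches, so two images on one line are counted twice.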

How do you delete everything but a specific pattern in Vim?

I have an XML file where I only care about the size attribute of a certain element.
First I used
global!/<proto name="geninfo"/d
to delete all lines that I don't care about. That leaves me a whole bunch of lines that look like this:
<proto name="geninfo" pos="0" showname="General information" size="174">
I want to delete everything but the value for "size."
My plan was to use substitute to get rid of everything not matching 'size="[digit]"', then remove the string 'size' and the quotes, but I can't figure out how to substitute the negation of a string.
Any idea how to do it, or ideas on a better way to achieve this? Basically I want to end up with a file with one number (the size) per line.
You can use matching groups:
:%s/^.*size="\([0-9]*\)".*$/\1/
This will replace lines that contain size="N" by just N and not touch other lines.
Explanation: this will look for a line that contains some random characters, then somewhere the chain size=", then digits, then ", then some more random characters, then the end of the line. Now what I did is that I wrapped the digits in (escaped) parentheses. That creates a group. In the second part of the search-and-replace command, I essentially say "I want to replace the whole line with just the contents of that first group" (referred to as \1).
:v:size="\d\+":d|%s:.*size="\([^"]\+\)".*:\1:
The first command (up to the |) deletes every line which does not match the size="<SOMEDIGIT(S)>" pattern. The second (%s...) removes everything before and after the value of the size attribute (the surrounding " characters are removed as well).
HTH
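The filter-then-capture approach can also be sketched in Python. This is an illustration, not a replacement for the Vim commands: the second and third sample lines are invented, and only the first comes from the question.

```python
import re

lines = [
    '<proto name="geninfo" pos="0" showname="General information" size="174">',
    '<field name="num" pos="0" size="4">',  # hypothetical line that should be skipped
    '<proto name="geninfo" pos="0" showname="General information" size="98">',  # hypothetical
]

sizes = []
for line in lines:
    # Same filter as :global!/<proto name="geninfo"/d
    if '<proto name="geninfo"' not in line:
        continue
    # Same capture as size="\([0-9]*\)", but requiring at least one digit
    match = re.search(r'size="(\d+)"', line)
    if match:
        sizes.append(match.group(1))

print("\n".join(sizes))
```

This leaves one number per kept line, which is the end result the question asks for.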

How to delete text in a file based on regular expression using vim

I have an XML file like this:
<fruit><apple>100</apple><banana>200</banana></fruit>
<fruit><apple>150</apple><banana>250</banana></fruit>
Now I want delete all the text in the file except the words in tag apple. That is, the file should contain:
100
150
How can I achieve this?
:%s/.*apple>\(.*\)<\/apple.*/\1/
That should do what you need. Worked for me.
Basically, it grabs everything up to and including the opening apple tag, captures everything between the opening and closing apple tags as a backreference, and matches the rest of the line. It then replaces the whole line with the first backreference, which is the text between the apple tags.
I personally use this:
%s;.*<apple>\(\d*\)</apple>.*;\1;
Since the text contains '/', which is the default separator, using ';' as the separator makes the command clearer.
I also found that the non-greedy match Conspicuous Compiler mentioned should be
\{-}
instead of "{-}" in Vim.
However, after I changed Conspicuous' solution to
%s/.*apple>(.\{-\})<\/apple.*/\1^M/g
my Vim said it can't find the pattern.
In this case, one can use the general technique for collecting pattern matches
explained in my answer to the question "How to extract regex matches
using Vim".
In order to collect and store all of the matches in a list, run the Ex command
:let t=[] | %s/<apple>\(.\{-}\)<\/apple>\zs/\=add(t,submatch(1))[1:0]/g
The command purposely does not change the buffer's contents, only collects the
matched text. To set the contents of the current buffer to the
newline-separated list of matches, use the command
:0pu=t | +,$d_
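The collect-the-matches idea maps onto a short Python sketch as well. It uses the two sample lines from the question, and the lazy .*? plays the role of \{-} in Vim.

```python
import re

text = (
    "<fruit><apple>100</apple><banana>200</banana></fruit>\n"
    "<fruit><apple>150</apple><banana>250</banana></fruit>\n"
)

# Collect the text between each pair of apple tags without touching the input.
matches = re.findall(r"<apple>(.*?)</apple>", text)

# Join the matches with newlines, like replacing the buffer with the list.
print("\n".join(matches))
```

As with the Vim command, the matches are gathered into a list first, and only then written out one per line.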

Resources