Mark words including dashes (-) in a Notepad++ search

I would like to mark the SQL script names in a text log in Notepad++. The SQL files appear in the text in this format:
AAAAAAAA.BBBBBBBBBBB.sql
So I search for this expression (Search menu):
\w*.sql
With it I get BBBBBBBBBBB.sql, as expected. The problem is that some script names contain dashes (-), and for those I don't get the whole name, only the part after the last dash.
For example, in:
AAAAAAAA.BBBBB-CCCCCCC.sql
I would like to get BBBBB-CCCCCCC.sql, but I just get CCCCCCC.sql
Is there a pattern that matches the whole name?

If the match cannot start or end with a hyphen:
\w+(?:-\w+)*\.sql
\w+ Match 1+ word characters
(?:-\w+)* Optionally repeat: match - followed by 1+ word characters
\.sql Match .sql
See a regex demo.
Note that in your pattern \w* can also match zero characters, and an unescaped . matches any character, not only a literal dot.
Another option is a character class that matches either - or a word character, but this also allows arbitrary mixes such as --a--.sql:
[\w-]+\.sql
See another regex demo.
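To see the difference between the question's pattern and the two suggested ones outside the editor, here is a small sketch using Python's re module. It assumes Python treats \w, non-capturing groups and character classes the same way Notepad++ does for plain ASCII names like these, which I believe holds but is worth confirming in the editor itself.

```python
import re

line = "AAAAAAAA.BBBBB-CCCCCCC.sql"

original = r"\w*.sql"                  # the question's pattern: unescaped dot, \w can't cross '-'
no_edge_hyphen = r"\w+(?:-\w+)*\.sql"  # hyphens allowed only between word characters
char_class = r"[\w-]+\.sql"            # any mix of word characters and hyphens

print(re.findall(original, line))        # ['CCCCCCC.sql']
print(re.findall(no_edge_hyphen, line))  # ['BBBBB-CCCCCCC.sql']
print(re.findall(char_class, line))      # ['BBBBB-CCCCCCC.sql']

# The two fixes differ on names with leading, trailing or doubled hyphens:
print(re.findall(no_edge_hyphen, "--a--.sql"))  # []
print(re.findall(char_class, "--a--.sql"))      # ['--a--.sql']
```

Both fixes agree on well-formed names; pick the stricter one if stray hyphens should not be marked.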

Related

What do you understand by this RegEx?

I'm working with VBA and trying to split a string into three columns. Almost all strings look like Company Name 3567782 Agent Name.pdf
With this pattern I want to match all the text before a space and digits (1st group), the digits (2nd group), and all the text after the following space and before .pdf (3rd group).
strPattern = "^(.+)\n(\d{4,10})\n(.+).pdf"
I recall that spaces in Python are \s, but I saw that in VBA they are \n.
Can you help me find the right pattern for what I'm looking for?
As I mentioned in my comment, I use https://regex101.com. There are others, but I find this one the most helpful.
When I put in your regex
^(.+)\n(\d{4,10})\n(.+).pdf
and test string
Company Name 3567782 Agent Name.pdf
The first thing I notice is that the regex does not match the test string (see MATCH INFORMATION on the right side).
Here are a couple of things I saw:
\n is a newline, not a space. In a regex, a space is just a literal space " ". (\s, which matches any whitespace character, also works in VBA.)
The last "." in ".pdf" is not a literal period; it is a token that matches any character. To match a literal period, you need \.
If we change those two things, it returns three groups that seem to match what you are looking for.
^(.+) (\d{4,10}) (.+)\.pdf
For the digits, it looks like you want between 4 and 10 of them. If that's correct, the rest of your regex is good. You could put a handful of example strings into the TEST STRING area to make sure it works in all cases.
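VBA can't be run here, so this is a rough check of the same two fixes with Python's re module standing in for VBScript.RegExp. The assumption is that the tokens involved (\n, \d{4,10}, . and \.) mean the same thing in both engines. The second sample string is made up, in the spirit of testing a handful of strings.

```python
import re

samples = [
    "Company Name 3567782 Agent Name.pdf",
    "Acme Widgets Ltd 1234 John Q Public.pdf",  # hypothetical extra test string
]

question = r"^(.+)\n(\d{4,10})\n(.+).pdf"  # \n is a newline, so this cannot match
fixed = r"^(.+) (\d{4,10}) (.+)\.pdf"      # literal spaces and an escaped dot

print(re.match(question, samples[0]))  # None
for s in samples:
    m = re.match(fixed, s)
    print(m.groups() if m else "no match")
```

The greedy (.+) in the first group backtracks to the last " digits " run, so company names with several words still land in one column.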
I'd use either of these:
(?:(?:([a-zA-Z]+\.?)|(\d+)))
Captures a-z/A-Z letters greedily, with an optional . to allow for the .pdf, or captures digits.
This version excludes the space ([ ] or \s).
Or keep the search structured so you can control what goes into each column:
^(\w+\s\w+)|(\d+)|(\w+\s\w+\.\w+$)
\b or ^ - word boundary or start of string
(\w+\s\w+) - 1st capture: \w+ matches word characters (letters, digits, underscore) greedily, followed by exactly one whitespace character (use \s+ to allow more), followed again by word characters
|(\d+) - alternation: \d+ captures just digits
|(\w+\s\w+\.\w+$) - similar to the 1st group, but allows for the '.' of .pdf and is anchored to the end of the string ($).
You could optionally build the '.' into the 1st group, as in my first pattern, but for neatness and better control I prefer the second.
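A quick sketch, again using Python's re as a stand-in for the VBA engine, of what each suggestion actually returns for the example string. Using m.lastindex to pick whichever of the three alternatives matched is my choice for the demo, not part of the answer.

```python
import re

s = "Company Name 3567782 Agent Name.pdf"

# First suggestion: letters (with an optional trailing dot) or digits, one token at a time
words = [a or b for a, b in re.findall(r"(?:(?:([a-zA-Z]+\.?)|(\d+)))", s)]
print(words)  # ['Company', 'Name', '3567782', 'Agent', 'Name.', 'pdf']

# Second suggestion: three alternatives, one per column
structured = r"^(\w+\s\w+)|(\d+)|(\w+\s\w+\.\w+$)"
# Only one group participates in each match; lastindex is that group's number
columns = [m.group(m.lastindex) for m in re.finditer(structured, s)]
print(columns)  # ['Company Name', '3567782', 'Agent Name.pdf']
```

Note that the third column keeps the .pdf extension, and both alternatives built from \w+\s\w+ assume names of exactly two words.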

Vim search and replace to amend a string with common structure but varying words

I'm fairly new to Vim and haven't been able to find on this site how to search and replace when part of the string varies. I need to apply a global edit to every occurrence of SetTag("...") where ... is any word. My edit adds one more term after the second quotation mark, for example SetTag("err" + __LINE__), where + __LINE__ is the part I need to add. Can anyone tell me how to do this with a Vim search command? Thanks!
NB: I assume a "word" is any sequence of characters other than a double-quote character. Modify as needed.
:%s/SetTag("\([^"]*\)")/SetTag("\1" + __LINE__)/
The escaped parentheses capture the sub-match; the \1 in the replacement string is replaced by that sub-match.
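For comparison, a sketch of the same substitution with Python's re.sub rather than Vim. Grouping is reversed there: plain ( ) capture and \( \) are literal, the opposite of Vim's default (magic) mode. Also, re.sub replaces every match, while Vim's :s without the g flag replaces only the first match on each line; with one SetTag per line the results are the same. The two-line input is hypothetical.

```python
import re

src = 'SetTag("err");\nSetTag("warn");'

# Vim: :%s/SetTag("\([^"]*\)")/SetTag("\1" + __LINE__)/
# [^"]* captures everything between the quotes; \1 puts it back in the replacement.
out = re.sub(r'SetTag\("([^"]*)"\)', r'SetTag("\1" + __LINE__)', src)
print(out)
```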

^[:blank:] does not match dot in sed

I have an input as follows:
INa.aa................... October 2010 after its previous U.S.-based owners failed to pay debts
My goal is to put brackets around every word starting with the letter i or I, so I ran this command:
sed 's/\<i[^[:blank:]]*\>/(&)/gi' input_data
Which returned this output:
(INa.aa)................... October 2010 after (its) previous U.S.-based owners failed to pay debts
What I don't get is: why doesn't [^[:blank:]]* also include the dots after INa.aa?
Thank you for any suggestions.
You use the \> "end of word" escape. A word boundary is defined as
the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa
in the manual (referring to \b). In the case of \>, the "vice-versa" does not apply.
What is a "word" character?
A "word" character is any letter or digit or the underscore character.
And "non-word" characters are all the others. You expect the boundary between your periods and a blank to match \>, but it doesn't: both the period and the blank are non-word characters. The word boundary is between the last a and the first period.
The period between the two a's is also next to word boundaries, but it is not a blank, so [^[:blank:]]* matches through it. The greedy match first runs all the way to the blank, then backtracks to the rightmost position where \> can match, which is right after the second aa.
If you want to match everything up to the next blank, you can just skip the \> in your regex.
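To make that backtracking visible, here is a sketch in Python. Python's re has no \< or \>, so I emulate them with lookarounds (an assumption: this mirrors sed's word anchors for ASCII text like this), and [^[:blank:]] becomes [^ \t]. The first substitution reproduces the question's output; the second drops the end-of-word anchor, as suggested above.

```python
import re

line = ("INa.aa................... October 2010 after its previous "
        "U.S.-based owners failed to pay debts")

# Emulated anchors (Python has no \< or \>):
WORD_START = r"(?<!\w)(?=\w)"  # like \<  : non-word (or start) before, word char after
WORD_END = r"(?<=\w)(?!\w)"    # like \>  : word char before, non-word (or end) after
NOT_BLANK = r"[^ \t]"          # like [^[:blank:]] : anything but space or tab

with_end = WORD_START + "i" + NOT_BLANK + "*" + WORD_END
without_end = WORD_START + "i" + NOT_BLANK + "*"

# \g<0> is the whole match, like & in sed
marked = re.sub(with_end, r"(\g<0>)", line, flags=re.IGNORECASE)
marked_to_blank = re.sub(without_end, r"(\g<0>)", line, flags=re.IGNORECASE)
print(marked)           # (INa.aa) stops before the dots
print(marked_to_blank)  # the dots are now inside the brackets
```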

How do I put several characters after the first letter and the last letter by use of Vim?

How do I insert several characters after the first letter and after the last letter throughout the whole text using Vim?
E.g. I need to put {{c1:: after the first letter and }} after the last letter. Also, I want to ignore two-letter words.
You mean in every word? Try this:
:%s/\<\(\w\)\(\w\w\+\)\>/\1{{c1::\2}}/g
That replaces the first character of every word with that same character followed by {{c1::, and adds }} at the end of the word. Words shorter than three characters are ignored.
If your words contain more than just [a-zA-Z0-9_], replace \w with a more appropriate character class.
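A sketch of the same substitution in Python's re.sub, handy for checking the result on a sample sentence outside Vim. Here \b stands in for both \< and \>, and the sample sentence is made up.

```python
import re

text = "I see the cat on a mat"

# Vim: :%s/\<\(\w\)\(\w\w\+\)\>/\1{{c1::\2}}/g
# (\w) is the first letter; (\w\w+) needs at least two more, so 1- and 2-letter words are skipped.
cloze = re.sub(r"\b(\w)(\w\w+)\b", r"\1{{c1::\2}}", text)
print(cloze)  # I s{{c1::ee}} t{{c1::he}} c{{c1::at}} on a m{{c1::at}}
```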

Vim - Insert something between every letter

In vim I have a line of text like this:
abcdef
Now I want to add an underscore or something else between every letter, so this would be the result:
a_b_c_d_e_f
The only way I know of doing this would be to record a macro like this:
qqa_<esc>lq4@q
Is there a better, easier way to do this?
:%s/\(\p\)\p\@=/\1_/g
The : starts a command.
The % applies it to every line in the file.
The \(\p\) matches and captures a printable character. You could replace \p with \w if you only wanted to match word characters, for example.
The \p\@= is a lookahead check that the matched (first) \p is followed by another \p. This second \p is not part of the match. This is important: because it is not consumed, it can be matched as the first \p in the next round.
In the replacement part, \1 inserts the captured (first) \p, and the _ is a literal.
The final g flag replaces every match on the line, not just the first.
If you want to add _ only between letters, you can do it like this:
:%s/\a\zs\ze\a/_/g
Replace \a with some other pattern if you want more than ASCII letters.
To understand how this is supposed to work: :help \a, :help \zs, :help \ze.
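Both substitution approaches above can be approximated with Python's re.sub for experimenting. Assumptions: . stands in for Vim's \p (it matches any character except a newline, which is close but not identical), and (?<=[A-Za-z])(?=[A-Za-z]) plays the role of \a\zs\ze\a, i.e. an empty match between two ASCII letters.

```python
import re

line = "abcdef"

# Vim: :%s/\(\p\)\p\@=/\1_/g  (keep a character that is followed by another one)
after_each = re.sub(r"(.)(?=.)", r"\1_", line)

# Vim: :%s/\a\zs\ze\a/_/g  (insert at the empty position between two letters)
between_letters = re.sub(r"(?<=[A-Za-z])(?=[A-Za-z])", "_", line)

# The letters-only version leaves digits and spaces alone
mixed = re.sub(r"(?<=[A-Za-z])(?=[A-Za-z])", "_", "ab 12 cd")

print(after_each)       # a_b_c_d_e_f
print(between_letters)  # a_b_c_d_e_f
print(mixed)            # a_b 12 c_d
```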
Here's a quick and a little more interactive way of doing this, all in normal mode.
With the cursor at the beginning of the line, press:
i_<Esc>x to insert and delete the separator character. (We do this for the side effect: the deleted _ is left in the unnamed register.)
gp to put the separator back.
., hold it down until the job is done.
Unfortunately we can't use a count with . here, because it would just paste the separator 'count' times on the spot.
Use a positive lookahead and substitute:
:%s/\(.\(.\)\@=\)/\1_/g
This matches any character that is followed by another character. Since . does not match a line break, no _ is added after the last character of a line.
:%s/../&:/g
This adds ":" after every two characters, on every line of the file.
The two periods match two characters at a time.
In the replacement, "&" stands for the whole matched text, so "&:" puts the two matched characters back followed by the character you want to add.
Simply put that character right after "&".
"/g" applies the change to every match on the line, not just the first.
I haven't figured out how to exclude the end of the line, though, so the inserted character also gets tacked onto the end. For example:
"c400ad4db63b"
Becomes "c4:00:ad:4d:b6:3b:"
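Here is a Python sketch of why the trailing colon appears, plus one way around it that reuses the lookahead idea from the earlier answers: only insert after a pair that is followed by at least one more character. The Vim form in the comment is my own translation, not something from the thread, so double-check it before relying on it.

```python
import re

mac = "c400ad4db63b"

# Vim: :%s/../&:/g  (& is the whole match, so every pair gets a trailing ':')
every_pair = re.sub(r"..", r"\g<0>:", mac)

# Lookahead version; in Vim presumably :%s/..\(.\)\@=/&:/g
between_pairs = re.sub(r"..(?=.)", r"\g<0>:", mac)

print(every_pair)     # c4:00:ad:4d:b6:3b:
print(between_pairs)  # c4:00:ad:4d:b6:3b
```

With an odd-length string the last character is left on its own (abcde becomes ab:cd:e).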
