Remove both duplicates (original and duplicate) from text notepad++ - text

I need to remove both duplicates like:
admin
user
admin
result:
user
I have tried several approaches, but none of them works in Notepad++.

You have to sort your file before applying this (for example, using the TextFX plugin).
Ctrl+H
Find what: ^(.+)(?:\R\1)+
Replace with: NOTHING
check Wrap around
check Regular expression
DO NOT CHECK . matches newline
Replace all
Explanation:
^ : Beginning of line
(.+) : group 1, 1 or more any character but newline
(?: : start non capture group
\R : any kind of linebreak
\1 : content of group 1
)+ : end group, must appear 1 or more times
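To check the logic outside Notepad++, here is a minimal Python sketch of the same sort-then-replace idea. The function name drop_repeated_lines is hypothetical. Python's re module has no \R, so \n stands in for it, and a $ anchor (not part of the pattern above) is added so that a line cannot match just the beginning of a longer line that follows it.

```python
import re


def drop_repeated_lines(text: str) -> str:
    """Remove every line that occurs more than once, including the first copy."""
    # Sort first so that identical lines end up adjacent.
    joined = "\n".join(sorted(text.splitlines()))
    # ^(.+)        one whole line, captured as group 1
    # (?:\n\1$)+   followed by one or more identical lines
    # \n?          plus the line break after the last copy, if any
    pattern = re.compile(r"^(.+)(?:\n\1$)+\n?", re.MULTILINE)
    return pattern.sub("", joined)


print(drop_repeated_lines("admin\nuser\nadmin"))  # user
```

Note that the output comes back in sorted order, just as it does in the editor after the sorting step.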

Related

How to find lines that contain 3 specific characters?

Is there a way to remove lines that contain three specific characters?
For example the characters should be U S E
So for these lines, it should remove only USER and UDES:
USER
USAD
UDES
Thanks
Ctrl+H
Find what: ^(?=.*U)(?=.*S)(?=.*E).+\R?
Replace with: LEAVE EMPTY
TICK Match case
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Replace all
Explanation:
^ # beginning of line
(?= # positive lookahead, make sure we have after:
.* # 0 or more any character but newline
U # letter uppercase U
) # end lookahead
(?=.*S) # same as above for letter S
(?=.*E) # same as above for letter E
.+ # 1 or more any character but newline
\R? # any kind of linebreak, optional
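The same stacked-lookahead idea can be tried in Python as a rough sketch. The name HAS_U_S_E is made up for this example, and \n replaces \R because Python's re module does not support \R.

```python
import re

# Each lookahead scans the current line for one letter, in any order.
HAS_U_S_E = re.compile(r"^(?=.*U)(?=.*S)(?=.*E).+\n?", re.MULTILINE)

text = "USER\nUSAD\nUDES\n"
result = HAS_U_S_E.sub("", text)
print(result)  # USAD
```

Because . does not match a line break, each lookahead stays within its own line, which is why USAD survives even though the next line contains an E.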

Keep just the last 10 characters of a line

I have a file which is as following
!J INCé0001438823
#1 A LIFESAFER HOLDINGS, INC.é0001509607
#1 ARIZONA DISCOUNT PROPERTIES LLCé0001457512
#1 PAINTBALL CORPé0001433777
$ LLCé0001427189
$AVY, INC.é0001655250
& S MEDIA GROUP LLCé0001447162
I just want to keep the last 10 characters of each line so that it becomes as following:-
0001438823
0001509607
0001457512
0001433777
0001427189
0001655250
0001447162
:%s/.*\(.\{10\}\)/\1
: ex command
% entire file
s/ substitute
.* anything (greedy)
. followed by any character
\{10\} exactly 10 of them
\( \) put them in a match group
/ replace with
\1 said match group
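For comparison, here is a small Python sketch of the same substitution, next to a plain slice that gives the same result. The variable names are illustrative only; the sample lines are taken from the question.

```python
import re

lines = [
    "!J INCé0001438823",
    "#1 A LIFESAFER HOLDINGS, INC.é0001509607",
    "$ LLCé0001427189",
]

# Regex form, mirroring the substitute command: greedy .* then a 10-char group.
by_regex = [re.sub(r".*(.{10})", r"\1", line) for line in lines]

# Slice form: take the last 10 characters directly.
by_slice = [line[-10:] for line in lines]

print(by_regex)  # ['0001438823', '0001509607', '0001427189']
```

The greedy .* first swallows the whole line and then backtracks just far enough to leave ten characters for the group, which is why the group always ends up holding the last ten.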
I would treat this as a shell script problem. Enter the following in vim:
:%! rev|cut -c1-10|rev
The :%! pipes the entire buffer through the filter that follows it: rev reverses each line, cut -c1-10 keeps the first 10 characters, and the second rev restores the original order.
For a single line, you could use:
$9hd0
$ go to end of line
9h go 9 characters left
d0 delete to beginning of line
Assuming the é character appears only once in a line, and only before your target ten digits, then this would seem to work:
:% s/^.*é//
: command
% all lines
s/ / substitute (i.e., search-and-replace) the stuff between / and /
^ search from beginning of line,
. including any character (wildcard),
* any number of the preceding character,
é finding "é";
// replace with the stuff between / and / (i.e., nothing)
Note that you can type the é character with ctrl-k e' (control-k, then e, then apostrophe, without spaces). On my system at least, this works both in insert mode and when typing the "substitute" command. (To see the list of characters you can enter with the ctrl-k "digraph" feature, use :dig or :digraph.)
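As a quick sanity check of the delimiter-based approach, the same pattern can be run in Python. This is only a sketch, and it relies on the same assumption as above, namely that é appears only before the target digits.

```python
import re

line = "#1 PAINTBALL CORPé0001433777"

# Greedy ^.*é removes everything up to and including the last "é".
stripped = re.sub(r"^.*é", "", line)
print(stripped)  # 0001433777
```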

How to match only one occurrence of a double space in a line?

line A
foo  bar bar  foo bar foo
line B
foo bar  bar foo
In line A, there are multiple occurrences of a double space.
I only want to match lines like line B, which has exactly one double-space occurrence.
I tried
^.*\s{2}.*$
but it will match both.
How may I have the desired output? Thank you.
If you wish to match strings that contain no more than one run of two or more spaces between words, you could use the following regular expression.
r'^(?!(?:.*(?<! ) {2,}(?! )){2})'
Note that this expression matches
abc    de fgh
where there are four spaces between 'c' and 'd'.
Python's regex engine performs the following operations.
^ : match beginning of string
(?! : begin negative lookahead
(?: : begin non-capture group
.* : match 0+ characters other than line terminators
(?<! ) : negative lookbehind asserts match is not preceded by a space
[ ]{2,} : match 2+ spaces
(?! ) : negative lookahead asserts match is not followed by a space
) : end non-capture group
{2} : execute non-capture group twice
) : end negative lookahead
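A short Python check of that expression, as a sketch: the constant name AT_MOST_ONE_RUN and the sample strings are made up for illustration, while the regular expression itself is the one given above.

```python
import re

# The expression from the answer above, unchanged.
AT_MOST_ONE_RUN = re.compile(r'^(?!(?:.*(?<! ) {2,}(?! )){2})')

samples = [
    "foo bar  bar foo",           # one run of two spaces
    "abc    de fgh",              # one run of four spaces
    "foo  bar bar  foo bar foo",  # two separate runs
]
matched = [s for s in samples if AT_MOST_ONE_RUN.match(s)]
print(matched)  # ['foo bar  bar foo', 'abc    de fgh']
```

The lookbehind and lookahead around the run force each run of spaces to be taken whole, so a run of four spaces counts once rather than as two double spaces.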
You can do:
^(?!.*[ \t]{2,}.*[ \t]{2,})
# Negative look ahead assertion that states 'only start the match
# on this line IF there are NOT 2 (or potentially more) breaks with
# two (or potentially more) of tabs or spaces'.
If you want to require ONE double space in the line but not more:
^(?=.*[ \t]{2,})(?!.*[ \t]{2,}.*[ \t]{2,})
# Positive look ahead that states 'only start this match if there is
# at least one break with two tabs or spaces'
# BUT
# Negative look ahead assertion that states 'only start the match
# on this line IF there are NOT 2 (or potentially more) breaks with
# two (or potentially more) of tabs or spaces'.
If you want to limit to only two spaces (not tabs and not more than 2 spaces):
^(?=.*[ ]{2})(?!.*[ ]{2}.*[ ]{2})
# Same as above but remove the tabs as part of the assertion
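A minimal Python sketch of this last variant, filtering lines one at a time. The name EXACTLY_ONE is hypothetical, and the sample text assumes line A contains two double spaces and line B one.

```python
import re

# Requires one double space and rejects a second, non-overlapping one.
EXACTLY_ONE = re.compile(r'^(?=.*[ ]{2})(?!.*[ ]{2}.*[ ]{2})')

text = "line A\nfoo  bar bar  foo bar foo\nline B\nfoo bar  bar foo"
hits = [line for line in text.splitlines() if EXACTLY_ONE.match(line)]
print(hits)  # ['foo bar  bar foo']
```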
Note: Your regex uses \s as the class for a space. \s is equivalent to [ \t\n\r\f\v], so it matches both horizontal and vertical whitespace characters.
Note 2:
You can do this without a regex as well (assuming you only want lines that have 1 and only 1 double space in them):
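>>> txt = '''\
... line A
... foo  bar bar  foo bar foo
... line B
... foo bar  bar foo'''
>>> # Splitting on a double space yields exactly 2 parts when it occurs once.
>>> [line for line in txt.splitlines() if len(line.split('  ')) == 2]
['foo bar  bar foo']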
You can get the match without lookarounds by starting the match with 1+ non whitespace chars.
Then optionally repeat a single whitespace char followed by non whitespace chars before and after matching a double whitespace char.
The negated character class [^\S\r\n] will match any whitespace chars except a newline or carriage return. If you want to allow matching newlines as well, you could use \s
^\S+(?:[^\S\r\n]\S+)*[^\S\r\n]{2}(?:\S+[^\S\r\n])*\S+$
Explanation
^ Start of string
\S+ Match 1+ non whitespace chars
(?: Non capture group
[^\S\r\n]\S+ Match a whitespace char without a newline, followed by 1+ non whitespace chars
)* Close group and repeat 0+ times
[^\S\r\n]{2} Match the 2 whitespace chars without a newline
(?: Non capture group
\S+[^\S\r\n] Match 1+ non whitespace chars followed by a whitespace char without a newline
)* Close group and repeat 0+ times
\S+ Match 1+ non whitespace chars
$ End of string
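The pattern can be exercised in Python as follows. This is a sketch: the constant name SINGLE_DOUBLE_SPACE and the test strings are assumptions made for the example.

```python
import re

SINGLE_DOUBLE_SPACE = re.compile(
    r'^\S+(?:[^\S\r\n]\S+)*[^\S\r\n]{2}(?:\S+[^\S\r\n])*\S+$'
)

for line in ("foo bar  bar foo", "foo  bar bar  foo bar foo", "foo bar"):
    # Prints the line together with True or False.
    print(repr(line), bool(SINGLE_DOUBLE_SPACE.match(line)))
```

Only the first string matches. Since every word must be separated by exactly one whitespace char apart from the single double, a run of three spaces is rejected as well.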

Notepad++ remove text between string+bracket and another bracket

So, I've seen that you can remove between two characters and remove between two strings but I haven't been able to find a system that works between a string and a character.
I need to remove the numbers between the two brackets in...
provinces= {
923 6862 9794 9904 11751 11846 11882
}
Keep in mind that these files also contain other brackets which are needed. I've looked around for a solution for this but none seem to work :/
Thanks for the help.
This one will do the job:
Ctrl+H
Find what: \b(provinces\s*=\s*\{)[^}]+(\})
Replace with: $1$2
Replace all
Explanation:
\b : a word boundary
( : start group 1
provinces : literally "provinces"
\s* : 0 or more whitespace characters
= : equal sign
\s* : 0 or more whitespace characters
\{ : an open curly bracket, must be escaped because it has special meaning in regex
) : end group 1
[^}]+ : 1 or more characters that are not a close curly bracket (line breaks included)
(\}) : group 2, a close curly bracket, escaped.
Replacement:
$1$2 : group 1 then group 2
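The same find-and-replace can be sketched in Python, where the replacement groups are written \1\2 rather than $1$2. The second block (other= { ... }) is invented here to show that unrelated brackets are left alone.

```python
import re

text = """provinces= {
923 6862 9794 9904 11751 11846 11882
}
other= {
1 2 3
}"""

# [^}]+ also matches line breaks, so the whole block body is removed.
cleaned = re.sub(r'\b(provinces\s*=\s*\{)[^}]+(\})', r'\1\2', text)
print(cleaned)
```

This leaves "provinces= {}" followed by the untouched other= block.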

Replace C statement with substitute in vim

I would like to use vim's substitute function (:%s) to search and replace a certain pattern of code. For example if I have code similar to the following:
if(!foo)
I would like to replace it with:
if(foo == NULL)
However, foo is just an example. The variable name can be anything.
This is what I came up with for my vim command:
:%s/if(!.*)/if(.* == NULL)/gc
It searches the statements correctly, but it tries to replace it with ".*" instead of the variable that's there (i.e "foo"). Is there a way to do what I am asking with vim?
If not, is there any other editor/tools I can use to help me with modifications like these?
Thanks in advance!
You need to use capture grouping and backreferencing in order to achieve that:
      Pattern    String sub.  flags
    |---------| |------------| |-|
:%s/if(!\(.*\))/if(\1 == NULL)/gc
         |---|     |--|
           |        ^
           |________|
The matched string in pattern will be exactly repeated in string substitution
:help /\(
\(\)  A pattern enclosed by escaped parentheses.
      E.g., "\(^a\)" matches 'a' at the start of a line.
\1    Matches the same string that was matched by
      the first sub-expression in \( and \). {not in Vi}
      Example: "\([a-z]\).\1" matches "ata", "ehe", "tot", etc.
\2    Like "\1", but uses second sub-expression,
 ...
\9    Like "\1", but uses ninth sub-expression.
Note: The numbering of groups is done based on which "\(" comes first
in the pattern (going left to right), NOT based on what is matched
first.
You can use
:%s/if(!\(.*\))/if(\1 == NULL)/gc
Putting .* inside \( \) creates a numbered capture group, which means the regex captures whatever .* matches.
In the replacement, \1 inserts the text captured by that group.
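For readers more familiar with Python, here is a rough equivalent of that substitution. In Python's syntax the capture group is written ( ) and literal parentheses are escaped, the opposite of Vim's default. The input string is made up for illustration.

```python
import re

code = "if(!foo)\nif(!ptr)"

# \( \) in Vim corresponds to ( ) in Python; literal parentheses are escaped.
converted = re.sub(r"if\(!(.*)\)", r"if(\1 == NULL)", code)
print(converted)
# if(foo == NULL)
# if(ptr == NULL)
```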
A macro is easy in this case, just do the following:
qa .............. starts recording macro 'a'
f! .............. jumps to the next '!'
x ............... deletes it
e ............... jumps to the end of the word
a ............... starts append mode (insert)
 == NULL ........ literal ' == NULL' (note the leading space)
<ESC> ........... leaves insert mode
q ............... stops recording macro 'a'
:%norm @a ....... applies macro 'a' to every line of the file
:g/^if(!/ norm @a applies macro 'a' to the lines starting with if(!
Try the following:
%s/if(!\(.\{-}\))/if(\1 == NULL)/gc
The quantifier \{-} makes . match as few characters as possible (the non-greedy counterpart of .*), so the match stops at the first closing parenthesis.
The parentheses \( and \) divide the search pattern into subexpressions, so that you can refer to those subgroups in the substitute string.
Finally, \1 inserts the first matched subexpression, in our case whatever is captured inside the parentheses.
More information can be found under :help non-greedy.
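The difference between the greedy and the non-greedy quantifier shows up as soon as a line contains more than one closing parenthesis. The Python sketch below uses .*? as the counterpart of Vim's .\{-}; the sample line is an assumption made for the example.

```python
import re

line = "if(!foo) bar();"

greedy = re.sub(r"if\(!(.*)\)", r"if(\1 == NULL)", line)
lazy = re.sub(r"if\(!(.*?)\)", r"if(\1 == NULL)", line)

print(greedy)  # if(foo) bar( == NULL);
print(lazy)    # if(foo == NULL) bar();
```

The greedy version runs to the last ")" on the line and mangles the call to bar(), while the non-greedy one stops at the first ")" and produces the intended condition.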
