I have a text file which contains lines like this:
137.147.138.224|write|write|Australia
137.154.4.3|United States
And I want to find:
137.154.4.3|United States
In place of 137.154.4.3|United States there may be anything similar, like 155.186.7.9|India or 185.173.4.7|Japan. I have a long list of entries like that, and I just want to find the ones that contain only one vertical bar (|).
To find the lines which have an IP, a | and then a country, you can use this regex pattern:
\d+\.\d+\.\d+\.\d+\|[^|]+$
\d+\.\d+\.\d+\.\d+ # four groups of digits (1 or more each) separated by literal dots
\| # a literal '|' character
[^|]+ # 1 or more characters that are not '|'
$ # end of line
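If you would rather filter the file in a script, here is a minimal Python sketch of the same pattern. The sample lines are modelled on the question, and the names IP_COUNTRY and single_bar_lines are only illustrative. It uses re.fullmatch, which anchors both ends of the line, so the trailing $ is not needed.

```python
import re

# Same idea as the pattern above: dotted quad, one literal '|',
# then a tail that contains no further '|'.
IP_COUNTRY = re.compile(r"\d+\.\d+\.\d+\.\d+\|[^|]+")

# Hypothetical sample data modelled on the question.
sample_lines = [
    "137.147.138.224|write|write|Australia",
    "137.154.4.3|United States",
    "155.186.7.9|India",
]

# fullmatch() requires the whole line to fit the pattern.
single_bar_lines = [line for line in sample_lines if IP_COUNTRY.fullmatch(line)]
print(single_bar_lines)
```

Only the lines with exactly one vertical bar after the IP address survive the filter.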
Is there a way to remove lines that contain three specific characters?
For example the characters should be U S E
So from these lines, it should remove only USER and UDES:
USER
USAD
UDES
Thanks
Ctrl+H
Find what: ^(?=.*U)(?=.*S)(?=.*E).+\R?
Replace with: LEAVE EMPTY
TICK Match case
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Replace all
Explanation:
^ # beginning of line
(?= # positive lookahead, make sure we have after:
.* # 0 or more any character but newline
U # letter uppercase U
) # end lookahead
(?=.*S) # same as above for letter S
(?=.*E) # same as above for letter E
.+ # 1 or more any character but newline
\R? # any kind of linebreak, optional
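The same lookahead idea can be tried out in Python. This is only a sketch: the three sample lines come from the question, while HAS_U_S_E and kept are made-up names. Python's re is case-sensitive by default, which corresponds to ticking "Match case".

```python
import re

# Three independent lookaheads from the start of the line:
# each one only checks that its letter occurs somewhere on the line.
HAS_U_S_E = re.compile(r"^(?=.*U)(?=.*S)(?=.*E)")

sample = ["USER", "USAD", "UDES"]

# Keep every line that does NOT contain all three letters.
kept = [line for line in sample if not HAS_U_S_E.search(line)]
print(kept)
```

Because each lookahead starts again from the beginning of the line, the order in which U, S and E appear does not matter.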
I have a file that looks like this:
!J INCé0001438823
#1 A LIFESAFER HOLDINGS, INC.é0001509607
#1 ARIZONA DISCOUNT PROPERTIES LLCé0001457512
#1 PAINTBALL CORPé0001433777
$ LLCé0001427189
$AVY, INC.é0001655250
& S MEDIA GROUP LLCé0001447162
I just want to keep the last 10 characters of each line, so that it becomes:
0001438823
0001509607
0001457512
0001433777
0001427189
0001655250
:%s/.*\(.\{10\}\)/\1
: ex command
% entire file
s/ substitute
.* anything (greedy)
. followed by any character
\{10\} exactly 10 of them
\( \) put them in a match group
/ replace with
\1 said match group
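For comparison, the same "keep the last 10 characters" idea can be sketched outside vim with a Python slice. The two sample lines are copied from the question; last_ten is just an illustrative name.

```python
# Two sample lines taken from the question.
sample = ["!J INCé0001438823", "$ LLCé0001427189"]

# line[-10:] keeps the final 10 characters. Lines shorter than 10
# characters come back unchanged, just as the vim pattern simply
# fails to match them.
last_ten = [line[-10:] for line in sample]
print(last_ten)
```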
I would treat this as a shell script problem. Enter the following in vim:
:%! rev|cut -c1-10|rev
The :%! pipes the entire buffer through the filter that follows it and replaces the buffer with the filter's output.
For a single line you could use:
$9hd0
$ go to end of line
9h go 9 characters left
d0 delete to beginning of line
Assuming the é character appears only once in a line, and only before your target ten digits, then this would seem to work:
:% s/^.*é//
: command
% all lines
s/ / substitute (i.e., search-and-replace) the stuff between / and /
^ search from beginning of line,
. including any character (wildcard),
* any number of the preceding character,
é finding "é";
// replace with the stuff between / and / (i.e., nothing)
Note that you can type the é character by using ctrl-k e' (control-k, then e, then apostrophe, without spaces). On my system at least, this works in insert mode and when typing the "substitute" command. (To see the list of characters you can enter with the ctrl-k "digraph" feature, use :dig or :digraph.)
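The "drop everything up to the é" approach also translates to a short Python sketch, under the same assumption that é only appears before the ten digits. The sample lines are from the question; ids is an illustrative name.

```python
# Two sample lines taken from the question.
sample = ["#1 PAINTBALL CORPé0001433777", "$AVY, INC.é0001655250"]

# rpartition() splits on the LAST "é", like the greedy ^.*é in the
# vim command. A line without "é" comes back unchanged.
ids = [line.rpartition("é")[2] for line in sample]
print(ids)
```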
line A
foo bar bar foo bar foo
line B
foo bar bar foo
In line A, there are multiple occurrences of a double space.
I only want to match lines like line B, which contains a double space exactly once.
I tried
^.*\s{2}.*$
but it matches both.
How can I get the desired output? Thank you.
If you wish to match strings that contain no more than one run of two or more spaces between words, you could use the following regular expression:
r'^(?!(?:.*(?<! ) {2,}(?! )){2})'
Note that this expression matches
abc    de fgh
where there are four spaces between 'c' and 'd'.
Python's regex engine performs the following operations.
^ : match beginning of string
(?! : begin negative lookahead
(?: : begin non-capture group
.* : match 0+ characters other than line terminators
(?<! ) : negative lookbehind asserts match is not preceded by a space
[ ]{2,} : match 2+ spaces
(?! ) : negative lookahead asserts match is not followed by a space
) : end non-capture group
{2} : execute non-capture group twice
) : end negative lookahead
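To see how the expression behaves, here is a small Python check. The pattern is copied from above; the sample lines and the names AT_MOST_ONE_GAP, samples and results are made up for illustration. A "gap" here means a run of two or more spaces.

```python
import re

# Pattern copied from the answer above.
AT_MOST_ONE_GAP = re.compile(r"^(?!(?:.*(?<! ) {2,}(?! )){2})")

# Made-up sample lines mapped to the expected verdict.
samples = {
    "foo  bar bar  foo bar foo": False,  # two gaps -> rejected
    "foo bar  bar foo": True,            # one gap -> accepted
    "abc    de fgh": True,               # one gap of four spaces -> accepted
    "foo bar": True,                     # no gap at all -> accepted
}

results = {line: bool(AT_MOST_ONE_GAP.search(line)) for line in samples}
print(results == samples)
```

Note that a line with no double space at all is also accepted, since the pattern only rules out two or more gaps.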
You can do:
^(?!.*[ \t]{2,}.*[ \t]{2,})
# Negative look ahead assertion that states 'only start the match
# on this line IF there are NOT 2 (or potentially more) breaks with
# two (or potentially more) of tabs or spaces'.
If you want to require ONE double space in the line but not more:
^(?=.*[ \t]{2,})(?!.*[ \t]{2,}.*[ \t]{2,})
# Positive look ahead that states 'only start this match if there is
# at least one break with two tabs or spaces'
# BUT
# Negative look ahead assertion that states 'only start the match
# on this line IF there are NOT 2 (or potentially more) breaks with
# two (or potentially more) of tabs or spaces'.
If you want to limit to only two spaces (not tabs and not more than 2 spaces):
^(?=.*[ ]{2})(?!.*[ ]{2}.*[ ]{2})
# Same as above but remove the tabs as part of the assertion
Note: Your regex uses \s as the class for a space. That class also matches everything in [\r\n\t\f\v ], so both horizontal and vertical whitespace characters.
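As a quick check of the last variant (exactly one double space, spaces only), here is a Python sketch. The sample text is an assumption modelled on the question, with the double spaces placed arbitrarily; `.*$` is appended so that the otherwise zero-width match returns the whole line, and re.MULTILINE makes ^ and $ work per line.

```python
import re

# Variant from above plus ".*$" so the match contains the whole line.
EXACTLY_ONE = re.compile(r"^(?=.*[ ]{2})(?!.*[ ]{2}.*[ ]{2}).*$", re.MULTILINE)

# Assumed sample text: line A has two double spaces, line B has one.
text = "line A\nfoo  bar bar  foo bar foo\nline B\nfoo bar  bar foo"

matches = EXACTLY_ONE.findall(text)
print(matches)
```

With this variant a run of four consecutive spaces counts as two double spaces, so such a line is rejected.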
Note 2:
You can do this without a regex as well (assuming you only want lines that have 1 and only 1 double space in them):
txt='''\
line A
foo bar bar foo bar foo
line B
foo bar bar foo'''
>>> [line for line in txt.splitlines() if len(line.split('  '))==2]
['foo bar bar foo']
You can get the match without lookarounds by starting the match with 1+ non whitespace chars.
Then optionally repeat a single whitespace char followed by non whitespace chars before and after matching a double whitespace char.
The negated character class [^\S\r\n] will match any whitespace chars except a newline or carriage return. If you want to allow matching newlines as well, you could use \s
^\S+(?:[^\S\r\n]\S+)*[^\S\r\n]{2}(?:\S+[^\S\r\n])*\S+$
Explanation
^ Start of string
\S+ Match 1+ non whitespace chars
(?: Non capture group
[^\S\r\n]\S+ Match a whitespace char without a newline, followed by 1+ non whitespace chars
)* Close group and repeat 0+ times
[^\S\r\n]{2} Match the 2 whitespace chars without a newline
(?: Non capture group
\S+[^\S\r\n] Match 1+ non whitespace chars followed by a whitespace char without a newline
)* Close group and repeat 0+ times
\S+ Match 1+ non whitespace chars
$ End of string
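The pattern can be checked in Python like this. It is only a sketch: the sample text is assumed (the double spaces are placed arbitrarily), and ONE_DOUBLE and matched are illustrative names. re.MULTILINE is used so that ^ and $ anchor at each line.

```python
import re

# Pattern copied from above; no lookarounds involved.
ONE_DOUBLE = re.compile(
    r"^\S+(?:[^\S\r\n]\S+)*[^\S\r\n]{2}(?:\S+[^\S\r\n])*\S+$",
    re.MULTILINE,
)

# Assumed sample text: line A has two double spaces, line B has one.
text = "line A\nfoo  bar bar  foo bar foo\nline B\nfoo bar  bar foo"

matched = [m.group(0) for m in ONE_DOUBLE.finditer(text)]
print(matched)
```

Unlike the lookahead versions, this pattern also rejects lines with leading or trailing whitespace, because it must start and end with \S+.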
I want to search a string and modify it.
I build a string like this:
|Bor.Team1-FCTeam2|
but also:
|FCTeam2-Bor.Team1|
or, as a pattern:
|Text-Text|
I want to change the name of Team1, which can be written in different ways, e.g.:
Bor.Team1
B.Team1
BTeam091
...
I always want the same name -> B.Team1
The word Team always appears IN every variant!
I played around with sed 's/\bBor.Team1\b/B.Team1/g', but then I have to find and know every variant in advance.
That should do what you need:
echo "$string" | sed -e 's/[a-zA-Z.]\+Team\([0-9]\+\)/B.Team\1/g'
It searches for a run of alphabetic characters and dots, followed by the word Team, followed by one or more digits ([0-9]\+). The whole match is replaced by whatever you want; in the example that is B.Team followed by the captured number (\1).
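Here is a rough Python equivalent of that substitution, in case you want to test it on a few strings first. TEAM_VARIANT and normalise are made-up names, and the inputs are taken from the question.

```python
import re

# Letters and dots, then "Team", then the captured team number.
TEAM_VARIANT = re.compile(r"[A-Za-z.]+Team([0-9]+)")

def normalise(s: str) -> str:
    """Rewrite every '<prefix>Team<number>' as 'B.Team<number>'."""
    return TEAM_VARIANT.sub(r"B.Team\1", s)

print(normalise("|Bor.Team1-FCTeam2|"))
print(normalise("BTeam091"))
```

Be aware that the pattern rewrites every team name, not only Team1: in the first string FCTeam2 also becomes B.Team2, and BTeam091 becomes B.Team091 rather than B.Team1.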
Find any name containing Team1 inside |...| delimited content, replace the whole name with B.Team1, and keep the other side of the - unchanged:
sed -e 's/|[^-|]*Team1[^-|]*-\([^|]*\)|/|B.Team1-\1|/g
s/|\([^-|]*\)-[^|]*Team1[^|]*|/|\1-B.Team1|/g' YourFile
ex:
|bla.Team1.corp-other| -> |B.Team1-other|
|ot.her-foolTeam1| -> |ot.her-B.Team1|
I want to do two things:
1) count the number of times a given word appears in a text file
2) print out the context of that word
This is the code I am currently using:
my $word_delimiter = qr{
[^[:alnum:][:space:]]*
(?: [[:space:]]+ | -- | , | \. | \t | ^ )
[^[:alnum:]]*
}x;
my $word = "hello";
my $count = 0;
#
# here, a file's contents are loaded into $lines, code not shown
#
$lines =~ s/\R/ /g; # replace all line breaks with blanks (cannot just erase them, because this might connect words that should not be connected)
$lines =~ s/\s+/ /g; # replace all multiple whitespaces (incl. blanks, tabs, newlines) with single blanks
$lines = " ".$lines." "; # add a blank at beginning and end to ensure that first and last word can be found by regex pattern below
while ($lines =~ m/$word_delimiter$word$word_delimiter/g ) {
++$count;
# here, I would like to print the word with some context around it (i.e. a few words before and after it)
}
Three problems:
1) Is my $word_delimiter pattern catching all reasonable characters I can expect to separate words? Of course, I would not want to separate hyphenated words, etc. [Note: I am using UTF-8 throughout but only English and German text; and I understand what reasonably separates a word might be a matter of judgment]
2) When the file to be analyzed contains text like "goodbye hello hello goodbye", the counter is incremented only once, because the regex only matches the first occurrence of " hello ". After all, the second time it could find "hello", it is not preceded by another whitespace. Any ideas on how to catch the second occurrence, too? Should I maybe somehow reset pos()?
3) How to (reasonably efficiently) print out a few words before and after any matched word?
Thanks!
1. Is my $word_delimiter pattern catching all reasonable characters I can expect to separate words?
Word characters are denoted by the character class \w. It also matches digits, the underscore, and characters from non-Latin scripts.
\W represents the negated sense (non-word characters).
\b represents a word boundary and has zero-length.
Using these already available character classes should suffice.
2. Any ideas on how to catch the second occurrence, too?
Use zero-length word boundaries.
while ( $lines =~ /\b$word\b/g ) {
++$count;
}
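The difference between a delimiter that consumes characters and a zero-length boundary is easy to see in a small Python sketch (Python's \b behaves the same way here). The sample text is the one from the question; consuming and boundary are illustrative names.

```python
import re

text = "goodbye hello hello goodbye"
word = "hello"

# Delimiters that consume characters: the space after the first "hello"
# is used up, so the second "hello" has no leading space left to match.
consuming = re.findall(rf"\s{re.escape(word)}\s", text)

# Zero-length word boundaries consume nothing, so both occurrences match.
boundary = re.findall(rf"\b{re.escape(word)}\b", text)

print(len(consuming), len(boundary))
```

re.escape plays the role that \Q...\E would play in Perl if the word could contain regex metacharacters.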