I would like to find all the number combinations that do not contain 3 zeros in a row.
There might be some delimiters (at most 2 characters) between the digits.
I'm using Python and I would like to perform this search with a regex.
Accepted numbers:
This is number 1234 which should be accepted.
12-45
1 2 0 0 3 4 5
Not accepted numbers:
1
12
123
1000
1000-2000
30000-31000
21 000-32 000-50 000
21 00 03 00 00
The regex I could come up with is:
([\s\-]{0,2}\d(?!000)){4,}
My regex finds all the accepted numbers, but it doesn't filter out all the not-accepted numbers.
I actually use this regex in Python to remove the matched numbers from the text.
P.S. The delimiters are not only spaces; they should include at least \s and the dash.
P.P.S. The numbers might be in the middle of the string, so I don't think I can use ^ and $ in my regex.
You could assert that there are not 3 zeros in a row, while matching optional delimiters between the digits.
\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b
Explanation
\b A word boundary
(?! Negative lookahead, assert that what is directly to the right is not
[\d\s-]*? Match any of a digit, whitespace char or -, as few times as possible
0(?:[\s-]*0){2}) Match a zero followed by 2 more zeros, with optional delimiters in between, then close the lookahead
\d Match a digit
(?:[\s-]*\d){3,} Repeat 3 or more times matching a digit with optional delimiters in between
\b A word boundary
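A quick way to check the pattern against the question's samples is Python's re module, since that is the question's target language. This is a sketch: the sample strings come from the question, and each one is tested on its own because in Python \s also matches a newline, so running the pattern over the whole multi-line list at once could join neighbouring lines into a single match.

```python
import re

# The answer's pattern, unchanged: whitespace and "-" are the allowed delimiters.
NUMBER = re.compile(r"\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b")

accepted = [
    "This is number 1234 which should be accepted.",
    "12-45",
    "1 2 0 0 3 4 5",
]
rejected = [
    "1", "12", "123", "1000", "1000-2000", "30000-31000",
    "21 000-32 000-50 000", "21 00 03 00 00",
]

# findall returns the full matches (the pattern has no capturing groups).
for s in accepted + rejected:
    print(repr(s), "->", NUMBER.findall(s))

# Removing the matches, which is what the question ultimately wants to do.
cleaned = NUMBER.sub("", accepted[0])
print(cleaned)
```

Each accepted sample yields one match and each rejected sample yields none. Note that re.sub leaves the spaces that surrounded a removed number in place.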
For instance, let's say I have a text file:
worker1, 0001, company1
worker2, 0002, company2
worker3, 0003, company3
How would I use sed to take the first 2 characters of the first column (so "wo"), remove the rest of that column, and attach them to the second column, so that the output looks like this:
wo0001,company1
wo0002,company2
wo0003,company3
$ sed -E 's/^(..)[^,]*, ([^,]*,) /\1\2/' file
wo0001,company1
wo0002,company2
wo0003,company3
s/ begin substitution
^(..) match the first two characters at the beginning of the line, captured in a group
[^,]* match any amount of non-comma characters of the first column
", " match a comma followed by a space character
([^,]*,) match the second field and comma captured in a group (any amount of non-comma characters followed by a comma)
" " match the next space character
/\1\2/ replace with the first and second capturing group
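For comparison, the same substitution can be written with Python's re.sub. For this particular pattern, sed -E's extended regex and Python's re behave the same way, and both use \1 and \2 for the captured groups in the replacement. This is a sketch that works on the sample lines from the question instead of reading a file:

```python
import re

lines = [
    "worker1, 0001, company1",
    "worker2, 0002, company2",
    "worker3, 0003, company3",
]

# ^(..)        the first two characters, captured as group 1
# [^,]*,       the rest of the first column, its comma and the following space
# ([^,]*,)     the second column plus its comma, captured as group 2,
#              followed by the next space
pattern = re.compile(r"^(..)[^,]*, ([^,]*,) ")

result = [pattern.sub(r"\1\2", line) for line in lines]
print("\n".join(result))
```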
Today I started using Vim. I get confused by the :g and :%s commands. What is the difference between :g and :%s?
:g, short for global, executes a command on all lines that match a regex:
:g/LinesThatMatchThisRegex/ExecuteThisCommand
Example:
:g/hello/d
This will delete (d) all lines that contain hello.
On the other hand, :%s just performs a search (on a regex) and replace throughout the file:
:%s/hello/world/g
The g at the end means global or greedy (this is disputed) so it will replace all occurrences on the line, not just one per line. You can also use the c flag (:%s/hello/world/gc) if you want to confirm each replacement manually.
This command replaces all occurrences of hello with world.
Both the :g and :%s commands support regular expressions.
The s command means substitute and the % means throughout the buffer. So %s means substitute throughout the entire buffer. You can also give a line range:
:10,15s/hello/world/g
This will execute the search and replace seen earlier on only lines 10 to 15 (inclusive).
They are different.
:g can execute commands on matched lines, and :s is one of those commands. That is, you can combine :g and :s.
:%s just does search and replace on the whole buffer. It can do some other things with an expression too, but that is not as straightforward as :g.
E.g.:
:g/foo/s/bar/blah/g
This will do the bar->blah substitution only on lines that contain foo. With :s alone we could write:
:%s/.*foo.*/\=substitute(submatch(0), 'bar', 'blah', 'g')/
So :g is easier.
In short, if you are dealing with a substitution task, :s should usually come to mind first. If you want to do something like "for all lines that match xxx, I want to delete/join/indent/...", :g may be helpful for you.
Review:
The ":" mode (i.e. ex-mode) commands in vi or Vim have this form:
[Address-specifier] [command] [command-specifics] [cmd-modifiers]
Address can be a single line address (ex-mode operates on "lines"), or a line range.
For instance, a very simple command is "p", which will print the addressed line(s).
:1p - will print line 1.
:5p - will print line 5.
:1,5p - will print lines 1 through 5. 1,5 is an address range.
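:7;+3p - will print lines 7 through 10 (7, 7+3=10). A relative range: with ";" the +3 is counted from line 7, whereas with "," (as in :7,+3p) it would be counted from the cursor line.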
There are some shorthands in the address space; $ and % are the most popular.
$ means "last line in the file". Thus the expression:
:1,$p - will print all lines, 1 to the LAST line in the file.
The expression 1,$ is used so frequently (i.e. apply the following command to all lines in the file) that it has an even shorter shorthand: %. % means "1,$".
So:
:%p - will print all lines, 1 to the LAST line in the file, just like 1,$
There is also a special "global" command, whose effect is to supply a set of addresses that is not necessarily a linear range of lines, but is instead determined by a regular expression match. The ":g/regex/" prefix fits into the "Address specifier" part of the ex-command format (not the command part, which follows it).
It allows specifying a "list" of lines, matched by regular expression rather than by "line number" or "range of lines". A line matches when the regular expression occurs in it, and that line is then included in the list of lines to which the command will apply.
Application of :1,$s vs :%s vs :g/./s
Using the following file as an example:
1: 1
2: 1 2 3 4 5 6 1 2
3: 3 2 1
4: 2 3 1 2
This command, using the global prefix/regex for address, and the "p" print command:
:g/1 2/p - will print
2: 1 2 3 4 5 6 1 2
4: 2 3 1 2
Lines 2 and 4 both match the :g/1 2/ regular expression, which effectively expands into a list of line numbers, with the following command applied to each item in the list; approximately like this command:
:2p | 4p
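The "regex becomes a list of line numbers" idea can be modelled in a few lines of Python. This is only an illustration, not how Vim is implemented, and it treats the pattern as a Python regex (here it is a plain literal, so Vim's syntax and Python's agree):

```python
import re

lines = ["1", "1 2 3 4 5 6 1 2", "3 2 1", "2 3 1 2"]

# Step 1: the :g/1 2/ part turns a regex into a list of 1-based line numbers.
matched = [n for n, text in enumerate(lines, start=1) if re.search("1 2", text)]

# Step 2: the command ("p", print) is applied to each listed line.
for n in matched:
    print(f"{n}: {lines[n - 1]}")
```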
The substitution command allows substituting a field matching a regular expression, with other text. If we applied the substitution command to our example file, on line 2, we can see its effect.
1: ....
2: 1 2 3 4 5 6 1 2
3: ....
Command:
:2s/1 2/2 1/ will change line 2 to be instead: 2: 2 1 3 4 5 6 1 2
It changes ONLY the first instance of the pattern "1 2" to be "2 1".
If we "undo" this command using "u", we can then run the command again, modified.
We can use the "p" modifier on the command, which for "substitute" does not do much.
It applies the change, but also prints the applied changes at the bottom of the screen (somewhat redundantly in this example).
:2s/1 2/2 1/p
u (to undo), and then we try it again.
We can use the "c" modifier to ask for confirmation.
:2s/1 2/2 1/c - The "confirm" modifier for the substitute command asks for confirmation on each change.
u (to undo).
The "global" modifier (not the global/regex address operator) can make the substitute command perform multiple substitutions on a line.
:2s/1 2/2 1/g - The "g" here is a modifier to the "s" substitute command.
It means: perform the substitution globally on THE LINE. Modifiers modify "commands", and commands apply to 1 or more lines, as set by the address field. The "g" at the end of the substitute command means: substitute globally on this line, i.e. every time the regular expression of the substitute command occurs, perform the substitution.
2: 2 1 3 4 5 6 2 1 - Here, both the first and second instance are substituted.
If the substitute command cannot find its match regular expression, then it does nothing. This means it can be applied to a range of lines, and have impacts only on the lines that have at least one match to the substitute command regex.
:1,4s/1 2/2 1/
:1,$s/1 2/2 1/
:%s/1 2/2 1/
are all equivalent on this 4-line file, and will substitute the FIRST occurrence on each line of the
substitute command's regex match pattern with the replacement pattern.
1: 1
2: 1 2 3 4 5 6 1 2
3: 3 2 1
4: 2 3 1 2
becomes:
1: 1
2: 2 1 3 4 5 6 1 2
3: 3 2 1
4: 2 3 2 1
Adding "g" to the end gives:
:%s/1 2/2 1/g
1: 1
2: 2 1 3 4 5 6 2 1
3: 3 2 1
4: 2 3 2 1
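The first-occurrence versus every-occurrence behaviour can be mimicked with Python's re.sub, whose count argument plays the role of the absent or present g flag. This is a rough model under that assumption: the helper name substitute is made up for illustration, and Python's regex syntax is not Vim's, although for the literal "1 2" they agree.

```python
import re

lines = ["1", "1 2 3 4 5 6 1 2", "3 2 1", "2 3 1 2"]

def substitute(lines, pattern, repl, g_flag=False):
    """Hypothetical model of :%s/pattern/repl/ (g_flag=False) or :%s/pattern/repl/g."""
    # In re.sub, count=1 replaces only the first match; count=0 replaces all of them.
    count = 0 if g_flag else 1
    # Lines without a match come back unchanged, just as :s leaves them alone.
    return [re.sub(pattern, repl, line, count=count) for line in lines]

first_only = substitute(lines, "1 2", "2 1")
every = substitute(lines, "1 2", "2 1", g_flag=True)
print(first_only)  # ['1', '2 1 3 4 5 6 1 2', '3 2 1', '2 3 2 1']
print(every)       # ['1', '2 1 3 4 5 6 2 1', '3 2 1', '2 3 2 1']
```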
The :g/regex/ Prefix
The :g/regex/ Address specifier applies to any command that follows it, and that command can include the substitute command, including with the "g" modifier.
:g/3 4/s/1 2/2 1/g
This command says, "globally match lines with regex /3 4/" and then run command
:s/1 2/2 1/g.
Only line 2 includes the regex /3 4/, so only line 2 is matched. Thus on this file:
:g/3 4/s/1 2/2 1/g is equivalent to:
:2s/1 2/2 1/g, which substitutes all occurrences of 1 2 with 2 1 on line 2.
1: 1
2: 1 2 3 4 5 6 1 2
3: 3 2 1
4: 2 3 1 2
becomes:
1: 1
2: 2 1 3 4 5 6 2 1
3: 3 2 1
4: 2 3 1 2
Notice that line 4 is unchanged, because it does not contain the pattern "3 4" required by the address specifier.
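The two-step order (first select lines with one regex, then substitute with another) can be sketched in Python. The function global_substitute is a hypothetical helper for illustration. Because s neither adds nor removes lines, a plain loop over the lines is enough to model it here.

```python
import re

lines = ["1", "1 2 3 4 5 6 1 2", "3 2 1", "2 3 1 2"]

def global_substitute(lines, line_regex, pattern, repl):
    """Hypothetical model of :g/line_regex/s/pattern/repl/g."""
    out = []
    for line in lines:
        if re.search(line_regex, line):         # first: the :g/line_regex/ address
            line = re.sub(pattern, repl, line)  # second: s///g on that line only
        out.append(line)
    return out

changed = global_substitute(lines, "3 4", "1 2", "2 1")
print(changed)  # ['1', '2 1 3 4 5 6 2 1', '3 2 1', '2 3 1 2']
```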
:g/regex-line-match/s/match-regex-substitute/sub-pattern/g
:%s/match-regex-substitute/sub-pattern/g
The two lines are often equivalent in EFFECT, but not always. The equivalence depends on the regex patterns and how they match, and on the fact that "substitute" does nothing on a line with no match for match-regex-substitute.
% = 1,$ which matches all lines, and then applies the substitute pattern.
:g/./ would match every non-empty line, if prefixed (:g/^/ matches every line, including empty ones).
If the regex pattern of the global/regex prefix is the same as the match pattern of the substitute, that is a lot of extra typing, but it restricts the substitute command to only the lines that matched the global/regex. If the global/regex expression truly matches every line, such as :g/^/, then the global line has the same effect as %. (Since % covers all lines and :g/^/ matches all lines, the "s" does the same thing in both cases.) With a more typical regular expression (one that matches some specific string), the :g/regex/ prefix is different from %: the "s" command is applied only to lines that first matched the :g/regex/ prefix, instead of to all lines 1,$. The substitute then either applies its own per-line match pattern successfully (and substitutes), or finds no match on the given line and does nothing.
The place where the global/regex prefix is interesting is when its regular expression is different from the substitution match regex pattern. In this case, the global/regex is applied FIRST, to determine which lines will be subject to the substitute command's "match-replace-regex" pattern (which can be different), and that pattern is applied SECOND. We saw this in our example above, where we used a global/regex prefix of "3 4" and a substitute match-regex-pattern of "1 2".
VERY ADVANCED:
While global/regex essentially builds a list of lines on which to apply commands, the manner in which that list is used is not the same as for 1,$ or other fixed range specifiers. A fixed range is computed "all at once", at the moment the :[address]command is typed, and handed to the command as a whole. The global/regex command instead works in two passes: it first marks every line that matches the regex, and then runs its subordinate command once for each marked line. If an application of the command deletes a marked line (for example, by joining it into the line above), that line's mark disappears and the command is not run for it.
We will use the "join" command to illustrate the difference.
1: 1
2: 1 2 3 4 5 6 1 2
3: 3 2 1
4: 2 3 1 2
If I apply the "join" command to a range of lines, using range syntax such as :1,$j (or :%j), the result is:
1: 1 1 2 3 4 5 6 1 2 3 2 1 2 3 1 2
This happens because 1,$ selects lines 1 to 4 at the start, and "j" then joins all of the lines of the range into one.
But if we instead use the global prefix operator (matching all lines), the application is different:
:g/./j
This will render:
1: 1 1 2 3 4 5 6 1 2
2: 3 2 1 2 3 1 2
The difference comes from "how" and "when" the command is applied in each of the two syntaxes. In the :%j syntax, the whole range is computed up front and "j" is applied once to all of those lines.
With the global/regex syntax, the command is applied to one marked line at a time, "from where you are". Here /./ marks all four lines. "j" first runs on LINE1, joining lines 1+2 into new-1; original line 2 no longer exists, so its mark is gone. The next marked line is original line 3 (now new-2), and "j" joins it with original line 4 into new-2 = 3+4. Original line 4 has been joined away as well, so no marked lines remain and the command stops. The result is:
1: 1 1 2 3 4 5 6 1 2
2: 3 2 1 2 3 1 2
The key difference is that the command runs once per matched line, in order, and any matched line consumed by an earlier application of the command is skipped.
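Vim's :g first marks every matching line and then runs the command on each marked line that still exists. A line that is deleted or joined away loses its mark. That two-pass behaviour can be mimicked in Python. This is a hypothetical model: the functions range_join and global_join are made up for illustration, and they join with a single space, ignoring Vim's extra whitespace handling for :j.

```python
import re

lines = ["1", "1 2 3 4 5 6 1 2", "3 2 1", "2 3 1 2"]

def range_join(lines):
    """:%j - the whole range is fixed up front and joined in one go."""
    return [" ".join(lines)]

def global_join(lines, regex="."):
    """:g/regex/j - mark matching lines first, then join once per surviving mark."""
    rows = list(enumerate(lines))  # (original line id, text)
    # Pass 1: mark every line the regex matches.
    marks = [i for i, text in rows if re.search(regex, text)]
    # Pass 2: run "j" on each marked line that still exists.
    for mark in marks:
        ids = [i for i, _ in rows]
        if mark not in ids:
            continue  # the line was joined away, so its mark vanished
        k = ids.index(mark)
        if k + 1 < len(rows):  # simplification: nothing to join after the last line
            rows[k] = (mark, rows[k][1] + " " + rows[k + 1][1])
            del rows[k + 1]
    return [text for _, text in rows]

print(range_join(lines))   # ['1 1 2 3 4 5 6 1 2 3 2 1 2 3 1 2']
print(global_join(lines))  # ['1 1 2 3 4 5 6 1 2', '3 2 1 2 3 1 2']
```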
As an earlier poster summed up in far fewer words (but assuming much more knowledge on the part of the reader):
:g/first-search-pattern/s/match-pattern/substitute-pattern/g or /gc for confirm.
SUMMARY:
All of these patterns can be different, and the trailing g or gc can be present (all occurrences on each line, with or without confirmation) or omitted (first occurrence on each line only). While writing:
:%s/pattern/replace/g is common, the following is nearly equivalent:
:g/./s/pattern/replace/g (less common, and basically the same "substitute" command, applied only to non-empty lines).
After a copy-paste from Wikipedia into Vim, I get this:
1 A
2
3 [+] Métier agricole<200e> – 44 P • 2 C
4 [×] Métier de l'ameublement<200e> – 10 P
5 [×] Métier de l'animation<200e> – 5 P
6 [+] Métier en rapport avec l'art<200e> – 11 P • 4 C
7 [×] Métier en rapport avec l'automobile<200e> – 10 P
8 [×] Métier de l'aéronautique<200e> – 15 P
The problem is that <200e> is a single character.
I'd like to know how to put it in a search/replace (via / or :).
Check the help for \%u:
/\%d /\%x /\%o /\%u /\%U E678
\%d123 Matches the character specified with a decimal number. Must be
followed by a non-digit.
\%o40 Matches the character specified with an octal number up to 0377.
Numbers below 040 must be followed by a non-octal digit or a non-digit.
\%x2a Matches the character specified with up to two hexadecimal characters.
\%u20AC Matches the character specified with up to four hexadecimal
characters.
\%U1234abcd Matches the character specified with up to eight hexadecimal
characters.
These are sequences you can use. <200e> is how Vim displays the single code point U+200E (LEFT-TO-RIGHT MARK), an invisible formatting character, rather than two bytes 0x20 and 0x0e, so \%u200e should match it, and
:%s/\%u200e//g
removes every occurrence.
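To confirm that <200e> is one character and not two bytes, Python can report its Unicode name and its UTF-8 encoding. In this sketch, re.sub then plays the role of :%s/\%u200e//g on one line taken from the question:

```python
import re
import unicodedata

lrm = "\u200e"
print(unicodedata.name(lrm))           # LEFT-TO-RIGHT MARK
print(len(lrm), lrm.encode("utf-8"))   # 1 b'\xe2\x80\x8e'

line = "Métier agricole\u200e – 44 P • 2 C"

# Python counterpart of Vim's :%s/\%u200e//g for a single line.
cleaned = re.sub("\u200e", "", line)
print(cleaned)  # Métier agricole – 44 P • 2 C
```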
To replace ^#:
:%s/\%x00//g
To replace ^L (enter the ^L by typing Ctrl-V Ctrl-L):
:%s/^L//g
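The same kind of clean-up can be done outside Vim. Python's re accepts \xhh escapes much like Vim's \%x. Here is a small sketch on a made-up string:

```python
import re

# Made-up sample text containing a form feed (shown by Vim as ^L) and a NUL.
text = "first page\x0csecond page\x00end"

# \x00 is the character Vim's \%x00 matches; \x0c is the form feed.
cleaned = re.sub(r"[\x00\x0c]", "", text)
print(cleaned)  # first pagesecond pageend
```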
References:
gvim - How to remove this symbol "^#" with vim? - Super User
vim - Deleting form feed ^L characters - Stack Overflow
If you want to quickly select this extraneous character everywhere and replace it / get rid of it, you could:
isolate one of the strange characters by adding a space before and after it, so it becomes a "word"
use the * command to search for the word under the cursor. If you have 'hlsearch' set, you should then see all of the occurrences of the extraneous character highlighted.
replace the last searched item with something else, globally:
:%s//something else/g
How to match a tab only when it is between two numbers?
Sample script
209.65834 27.23204908
119.37987 15.03317082
74.240635 8.30561924
29.1014 0
931.8861 -100.00000
-16.03784 -8.30562
;
_mirror
l
;
29.1014 0
1028.10 0.00
n
_spline
935.4875 250
924.2026913 269.8820375
912.9178825 277.4506484
890.348265 287.3181854
(in the above script, the tabs are between the numbers, not the spaces) (blank lines are significant; there is nothing in them, but I can't lose them)
I wish to put a "," between the numbers. I tried :%s/\t/\,/, but that also touches the empty lines and the ends of lines.
Try this:
:%s/\(\d\)\t\(-\?\d\)/\1,\2/
\d matches any digit. -\? means "an optional -". Each pair of escaped parentheses captures its match; \1 refers to the first captured match and \2 to the second.
Searching Google for "vim regex" leads to http://vimregex.com/, which suggests:
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/gc
You have 2 groups of digits here ([0-9]) with a tab character \t between them. Add the escaping that Vim's regex syntax needs and you have the answer.
g makes multiple changes on a single line; c asks for confirmation on each one.
\1 and \2 refer to the captured groups (the digits in your case).
It's not really hard to find an answer to questions like that by yourself.
Try:
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/g
Explanation: search for the pattern <digit>\t<digit> and remember the parts that match <digit>.
\( ... \) captures and remembers the part that matches.
\1 recalls the first captured digit, \2 the second captured digit.
So for 123\t789, the pattern <digit>\t<digit> matches 3\t7,
and the 3 and 7 are remembered as \1 and \2.
or
:g/[0-9]/ s/\t/,/g
Explanation: filter all lines containing a digit, then substitute every tab with a comma on those lines.
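The three approaches can be compared in Python on a small made-up sample that follows the question's data. The sample includes a tab followed by a negative number, like the question's 931.8861 -100.00000 line. It is a sketch under two assumptions: Vim's \d is translated as [0-9] and -\? as -?, and re.sub replaces every match on a line. Each line here has at most one tab, so that makes no difference for the versions without the g flag.

```python
import re

# Made-up sample: tabs between numbers, a negative second number,
# a non-numeric line and an empty line that must be kept.
sample = "209.65834\t27.23204908\n931.8861\t-100.00000\n;\n\n_mirror"

# :%s/\(\d\)\t\(-\?\d\)/\1,\2/
first = re.sub(r"([0-9])\t(-?[0-9])", r"\1,\2", sample)

# :%s/\([0-9]\)\t\([0-9]\)/\1,\2/g  (does not match a tab before "-")
second = re.sub(r"([0-9])\t([0-9])", r"\1,\2", sample)

# :g/[0-9]/ s/\t/,/g  (only lines containing a digit are touched)
third = "\n".join(
    line.replace("\t", ",") if re.search(r"[0-9]", line) else line
    for line in sample.split("\n")
)

print(first)
print(second)
print(third)
```

On this sample, the first and third versions give the same result. The [0-9]\t[0-9] version leaves the tab before -100.00000 untouched.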