I need to build a regular expression for strings in which the characters + and - never stand side by side (they must be separated by some other character). I came up with (a*(+|-)a)*, where a is any other character, * is the Kleene closure, and the parentheses are for grouping. But this expression does not recognize strings such as "+", "-", "+ a-", etc. Maybe someone can help me get unstuck. I need a regular expression so that I can build a finite automaton.
That might do:
^(\+(?![\+-])|-(?![\+-])|[^\+-])*?$
It anchors the match to a single line: ^ to $. The lazy quantifier ()*? stops at the first $ it can reach, so in multiline mode no more than one line is matched.
The three alternatives inside the parentheses are as follows:
\+(?![\+-]) if the character is + the next one must not be + or -
-(?![\+-]) if the character is - the next one must not be + or -
The previous two only look ahead at the second character without consuming it, so they could be combined into one alternative: [\+-](?![\+-]).
[^\+-] any character that is neither + nor -
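As a quick check of the pattern described above, here is a minimal sketch in Python. It assumes Python's re module, which supports the same lookahead syntax; the helper name is_valid and the sample strings are only illustrative.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r'^(\+(?![\+-])|-(?![\+-])|[^\+-])*?$')

def is_valid(s):
    """Return True if no two of the characters + and - are adjacent in s."""
    return PATTERN.match(s) is not None

# Try a few strings, including the ones from the question.
for sample in ['+', '-', '+ a-', 'a+b-c', '', '++', '+-', 'a-+b']:
    print(repr(sample), is_valid(sample))
```

The strings from the question ("+", "-", "+ a-") are accepted, while anything containing ++, +-, -+ or -- is rejected.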
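Be aware, however, that a regex as implemented by programming languages is more powerful than a formal regular expression; the lookahead used above, for instance, is not one of the formal operators. Since you want to build a finite automaton, a regular grammar serves you better than a regex: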
S = +T
S = -T
S = #S
S = ε
T = #S
T = ε
This grammar is right-regular, and # is any character that is neither + nor -. ε is epsilon, the empty string.
Here is the deterministic finite automaton (where P=+ M=- #=not + and not -):
   /---- # ----\
   |           |
   |           v
(S = start/final)--- P,M -->(T = final)---- P,M --->(error)
        ^                          |
        |                          |
        \----------- # ------------/
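The automaton above can be simulated in a few lines. This is a sketch in Python under the same naming as the diagram (states S, T and error; P,M for + or -; # for everything else). The function name dfa_accepts and the table layout are illustrative choices, not part of the original answer.

```python
# Transition table for the automaton above.
# Input classes: 'PM' stands for + or -, '#' for any other character.
TRANSITIONS = {
    ('S', 'PM'): 'T',
    ('S', '#'): 'S',
    ('T', 'PM'): 'error',
    ('T', '#'): 'S',
}
ACCEPTING = {'S', 'T'}  # both S and T are final states

def dfa_accepts(s):
    """Run the DFA over s and report whether it ends in a final state."""
    state = 'S'
    for ch in s:
        symbol = 'PM' if ch in '+-' else '#'
        state = TRANSITIONS.get((state, symbol), 'error')
        if state == 'error':
            return False  # the error state is a trap: no way back
    return state in ACCEPTING

print(dfa_accepts('+ a-'))
print(dfa_accepts('a+-b'))
```

Because the error state has no outgoing transitions to an accepting state, the loop can stop as soon as it is reached.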
I want to extract only the letters from a given string, but my code does exactly the opposite.
s= "A man, a plan, a canal: Panama"
newS = ''.join(re.findall("[^a-zA-Z]*", s))
print(newS)  # my o/p: , , :
expected o/p string is:
"A man a plan a canal Panama"
Your regular expression is inverting the match - that's what the caret symbol (^) does inside square brackets (negated character class). You first need to remove that.
Next, you should be matching a sequence of one or more characters (+) rather than zero or more characters (*) -- using * will match the empty string, which you don't want in this case.
Finally your join should join with a space to get the intended output, rather than an empty string -- which won't retain the spaces between the words.
newS = ' '.join(re.findall(r'[a-zA-Z]+', s))
Though not essential in this case, it's advised to use raw strings for regular expressions (r). More in this post.
Full working code:
import re
s = 'A man, a plan, a canal: Panama'
newS = ' '.join(re.findall(r'[a-zA-Z]+', s))
print(newS)
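To see the difference between * and + described above, the two quantifiers can be compared side by side. This is a small sketch using the same string; the variable names are illustrative.

```python
import re

s = 'A man, a plan, a canal: Panama'

# * allows zero-length matches, so the result is littered with empty strings.
star_matches = re.findall(r'[a-zA-Z]*', s)
# + requires at least one letter, so only the words come back.
plus_matches = re.findall(r'[a-zA-Z]+', s)

print(star_matches)
print(plus_matches)
print(' '.join(plus_matches))
```

Joining plus_matches with a space gives the expected string, whereas star_matches would first have to be filtered for empty entries.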
From this code intended to convert a balanced ternary representation to a Haskell Integer:
frombal3 :: String -> Integer
frombal3 "+" = 1
frombal3 "0" = 0
frombal3 "-" = -1
frombal3 current:therest = \
(*) frombal3 current (^) 3 length therest \
+ frombal3 therest
I got the error:
main.hs:7:3: error: parse error on input ‘+’
|
7 | + frombal3 therest
| ^
<interactive>:3:1: error:
• Variable not in scope: main
• Perhaps you meant ‘min’ (imported from Prelude)
It is not clear what you are trying to achieve, but I can see some mistakes that can be already pointed out.
Problems
You don't need \ to continue a line; that is only needed inside string literals. Indentation is enough in Haskell.
You need to wrap the pattern in parentheses: (current:therest). Furthermore, this pattern makes current a Char and not a String, so you cannot pass it directly to your function, which takes a String.
You need to parenthesize your function arguments as well: if you want to multiply frombal3 current by 3, you need (*) (frombal3 current) 3, or the much better frombal3 current * 3. Function application binds tighter than any infix operator, so the infix form needs no extra parentheses and reads more clearly.
Suggestions
I am not sure what you want to achieve, but this looks like something that can be done with a fold or a simple list comprehension.
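Assuming the goal is the balanced ternary conversion stated in the question, the fold idea could look roughly like this. The sketch is in Python purely to illustrate the logic (the same shape maps onto a left fold in Haskell); the names from_bal3 and DIGITS are made up for this example.

```python
from functools import reduce

# Value of each balanced ternary digit.
DIGITS = {'+': 1, '0': 0, '-': -1}

def from_bal3(text):
    """Left fold: multiply the accumulator by 3 and add the next digit."""
    return reduce(lambda acc, ch: acc * 3 + DIGITS[ch], text, 0)

print(from_bal3('+'))    # 1
print(from_bal3('+-0'))  # 9 - 3 + 0 = 6
```

Folding from the left avoids computing 3 ^ length therest at every step, which is what the original recursion was attempting.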
Don't use backslashes, and remember to properly bracket pattern matches:
frombal3 :: String -> Integer
frombal3 "+" = 1
frombal3 "0" = 0
frombal3 "-" = -1
frombal3 (current:therest) = -- ^ Note brackets
(*) frombal3 current (^) 3 length therest
+ frombal3 therest
This still causes a problem due to how you're using operators, but I think you can solve this on your own, especially since I can't work out what you're trying to do here.
You appear to be trying to use backslashes to continue onto the next line; don't do that. If you just delete all the backslashes, the error will go away. (You'll get several other errors, but this particular one will go away.)
Haskell uses indentation to detect where one part ends and the next begins. You don't need to manually add backslashes to the end of each line to continue an expression.
What is an efficient way in MATLAB to replace/insert one symbol (in series of symbols) with several others that correspond to the one that is being replaced?
For example, consider a string Eq: Eq = 'A*exp(-((x-xc)/w)^2)'. Is there a way to replace * with .*, / with ./, \ with .\, and ^ with .^ without writing four separate strrep() lines?
Regular expressions will do the job nicely. Regular expressions simply find patterns in text. You specify what kind of pattern you are looking for by a regular expression, and the output gives you the locations of where the pattern occurred.
For our particular case, not only do we want to find where patterns occur, we also want to replace those patterns with something else. Specifically, use the function regexprep from MATLAB to replace matches in a string with something else. What you want to do is replace all *, /, \ and ^ symbols by adding a . in front of each.
regexprep takes the string you're looking at as its first input and the pattern you're trying to find as its second. In our case, we want to find any of *, /, \ and ^. To specify this pattern, you put those symbols in [] brackets. Regular expressions reserve \ as an escape character, so that symbols which would otherwise have a special meaning can be matched literally. As such, you need to use \\ for the \ character and \^ for the ^ character. The third input is what you want to replace each match with. In our case, we simply want to reuse each matched character, but with a . added in front of it. This is done with \.$0 in the replacement syntax. $0 refers to the entire match, which here is the single matched symbol. . is also a metacharacter in regular expression patterns, so it is escaped with a \ to make sure it is treated literally.
Without further ado:
>> Eq = 'A*exp(-((x-xc)/w)^2)';
>> out = regexprep(Eq, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2)
The pattern we are looking for is [*/\\\^], which means that we want to find any of *, /, \ (written as \\ in the regex) and ^ (written as \^ in the regex). Each match is replaced by the same symbol with a . character in front, which is what \.$0 does.
As a more complicated example, let's make sure that we include all of the symbols you're looking for in a sample equation:
>> A = 'A*exp(-((x-xc)/w)^2) \ b^2';
>> out = regexprep(A, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2) .\ b.^2
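For readers who want to cross-check the pattern outside MATLAB, here is a rough Python analogue. It assumes Python's re.sub, where \g<0> plays the role of MATLAB's $0; the function name add_dots is illustrative.

```python
import re

def add_dots(expr):
    """Prefix every *, /, \\ and ^ in expr with a dot."""
    # [*/\\^] matches any one of * / \ ^ ; \g<0> re-inserts the whole match.
    return re.sub(r'[*/\\^]', r'.\g<0>', expr)

print(add_dots('A*exp(-((x-xc)/w)^2)'))
print(add_dots(r'A*exp(-((x-xc)/w)^2) \ b^2'))
```

Inside a Python character class, ^ only needs escaping when it comes first, so the pattern is slightly shorter than the MATLAB one.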
I'd go with regexprep as in rayryeng's answer. But here's another approach, just to provide an alternative.
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
[~, jj] = sort([1:numel(Eq) ii-.5]); %// will be used to properly order the result
result = [Eq repmat('.',1,numel(ii))]; %// insert dots at the end
result = result(jj); %// properly order the result
And a variant:
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
jj = sort([1:numel(Eq) ii-.5]); %// dot locations are marked with fractional part
result = Eq(ceil(jj)); %// repeat characters where the dots will be placed
result(mod(jj,1)>0) = '.'; %// place dots at indices with fractional part
The vectorize function already does almost all of what you want except that it does not convert mldivide (\) to ldivide (.\).
By "efficient," do you mean fewer lines of code or faster? Regular expressions are almost always slower than other approaches and less readable. I don't think they're necessary or a good choice in this case. If you only need to convert your string once, then speed is less of a concern than readability (strrep will still be faster). If you need to do it many times, this simple code that you alluded to is 4–5 times faster than regexprep for short strings like your example (and much faster for longer strings):
out = strrep(Eq,'*','.*');
out = strrep(out,'/','./');
out = strrep(out,'\','.\');
out = strrep(out,'^','.^');
If you want one line, use:
out = strrep(strrep(strrep(strrep(Eq,'*','.*'),'/','./'),'\','.\'),'^','.^');
which will also be slightly faster still. Or create your own version of vectorize and call that.
Where regular expressions shine is in more complex cases, e.g., if your string is already partially vectorized: Eq = 'A.*exp(-((x-xc)/w)^2)'. Even still, the vectorize function just uses strrep and then calls strfind to "remove any possible '..*', '../', etc." and replace them with the proper element-wise operators because it's faster (symbolic math strings can get very large, for example).
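The partially vectorized case mentioned above is one where a single regular expression can do the whole job. The following is a sketch in Python, not MATLAB, and the use of a negative lookbehind is an assumption about how one might approach it, not what vectorize does internally.

```python
import re

def add_missing_dots(expr):
    """Add a dot before *, /, \\ and ^ unless one is already there."""
    # (?<!\.) is a negative lookbehind: skip operators already preceded by a dot.
    return re.sub(r'(?<!\.)[*/\\^]', r'.\g<0>', expr)

print(add_missing_dots('A.*exp(-((x-xc)/w)^2)'))
```

Because operators that already carry a dot are skipped, applying the function twice gives the same result as applying it once.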
I have to do some tricky text processing. How can I transform a string dynamically, as in the following example?
Example: /hello/ baby /deneme/ /hello2/
Output: (/hello/) baby (/deneme/) (/hello2/)
This is a pretty rudimentary solution, but it works for the case you've given (SQL Fiddle here):
SELECT
  in_str,
  (
    -- If the string starts with '/', prepend '('
    CASE WHEN in_str LIKE '/%' THEN '(' ELSE '' END
    -- Replace / followed by a space with /)
    + REPLACE(
        -- Replace / preceded by a space with (/
        REPLACE( in_str, ' /', ' (/' ),
        '/ ', '/) '
      )
    -- If the string ends with '/', append ')'
    + CASE WHEN in_str LIKE '%/' THEN ')' ELSE '' END
  ) AS out_str
FROM table1;
If table1 has the following in_str values this will give you the corresponding out_str values:
in_str                    out_str
------------------------  ------------------------------
/one/ two /three/ /four/  (/one/) two (/three/) (/four/)
one /two/ /three/         one (/two/) (/three/)
/one/ /two/ three         (/one/) (/two/) three
//one / // two/ /         (//one (/) (//) two/) (/)
I've included the last one to demonstrate some edge cases. Also note that this only handles / characters that are immediately preceded or followed by a space, or that sit at the beginning or end of the string. Other whitespace characters like newlines and tabs aren't handled. For example, if you had a string like this (where ⏎ indicates a newline and ⇒ a tab):
/one/⇒/two/⏎
/three/⏎
...the output you would get is this:
(/one/⇒/two/⏎
/three/⏎
You could handle these scenarios with additional REPLACE functions, but that's a rabbit hole you'll have to jump down yourself.
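To make the behaviour described above easier to experiment with, here is a sketch in Python. The first function mirrors the CASE/REPLACE logic of the query; the second is one possible way to treat any whitespace as a delimiter, using regular expressions. Both function names are made up for this example, and the regex variant is an assumption about what "handling these scenarios" could look like, not part of the original answer.

```python
import re

def wrap_like_sql(in_str):
    """Mirror the CASE/REPLACE logic of the query above (spaces only)."""
    prefix = '(' if in_str.startswith('/') else ''
    suffix = ')' if in_str.endswith('/') else ''
    body = in_str.replace(' /', ' (/').replace('/ ', '/) ')
    return prefix + body + suffix

def wrap_any_whitespace(in_str):
    """Variant treating any whitespace (or a string boundary) as a delimiter."""
    # A / with no non-whitespace character before it opens a group.
    opened = re.sub(r'(?<!\S)/', '(/', in_str)
    # A / with no non-whitespace character after it closes a group.
    return re.sub(r'/(?!\S)', '/)', opened)

print(wrap_like_sql('/one/ two /three/ /four/'))
print(repr(wrap_any_whitespace('/one/\t/two/\n/three/\n')))
```

On space-separated input the two functions agree; they differ only when tabs or newlines separate the words.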
Is there a way in Vim in which I could navigate to the next differing indent level?
So from here to there for example:
-> var a = 1;
   var b = 2;
   var func = function(){
->     return a + b;
   }
This should work for indents made up of spaces (not tabs):
call search('^ \{0,'.eval(indent(".")-1).'}\S\|^ \{'.eval(indent(".")+1).',}\S')
This is made up of two regular expressions:
^ \{0,'.eval(indent(".")-1).'}\S matches a smaller indent, using the \{n,m} construction matching from n to m of the preceding space.
^ \{'.eval(indent(".")+1).',}\S matches a larger indent, using the \{n,} construction matching at least n of the preceding space.
The regexes are sandwiched between ^ and \S to apply only to the leading whitespace on the line. Then they are joined by the \| ('OR') operator.
Of course the search() call could be mapped to a key combination for convenience.
EDIT
Chris Johnsen points out that the calls to eval() are superfluous, so the command can be reduced to this:
call search('^ \{0,'.(indent(".")-1).'}\S\|^ \{'.(indent(".")+1).',}\S')
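The logic of the search pattern can be restated outside Vim. This Python sketch is an approximation for illustration only: it assumes space-only indents and does not wrap around at the end of the buffer, whereas Vim's search() may wrap depending on 'wrapscan'. The function name next_differing_indent is hypothetical.

```python
def next_differing_indent(lines, current):
    """Index of the next non-blank line whose space indent differs, or None."""
    def indent_of(line):
        return len(line) - len(line.lstrip(' '))

    base = indent_of(lines[current])
    for i in range(current + 1, len(lines)):
        rest = lines[i].lstrip(' ')
        # Like \S in the pattern: require a non-whitespace character after the spaces.
        if not rest or rest[0].isspace():
            continue
        if indent_of(lines[i]) != base:
            return i
    return None

source = [
    'var a = 1;',
    'var b = 2;',
    'var func = function(){',
    '    return a + b;',
    '}',
]
print(next_differing_indent(source, 0))
```

A smaller indent and a larger indent are both just "different from the current one", which is why the Vim pattern needs two alternatives joined by \| while the comparison here is a single !=.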