I am reading this tutorial and came across a bash script that uses [...] as a wildcard. What exactly does [...] stand for in a bash script?
It's a regex-style character matching syntax; from the Bash Reference Manual, §3.5.8.1 (Pattern Matching):
[...]
Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale's collating sequence and character set, is matched. If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. A ‘-’ may be matched by including it as the first or last character in the set. A ‘]’ may be matched by including it as the first character in the set. The sorting order of characters in range expressions is determined by the current locale and the value of the LC_COLLATE shell variable, if set.
For example, in the default C locale, ‘[a-dx-z]’ is equivalent to ‘[abcdxyz]’. Many locales sort characters in dictionary order, and in these locales ‘[a-dx-z]’ is typically not equivalent to ‘[abcdxyz]’; it might be equivalent to ‘[aBbCcDdxXyYz]’, for example. To obtain the traditional interpretation of ranges in bracket expressions, you can force the use of the C locale by setting the LC_COLLATE or LC_ALL environment variable to the value ‘C’.
Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard:
alnum alpha ascii blank cntrl digit graph lower
print punct space upper word xdigit
A character class matches any character belonging to that class. The word character class matches letters, digits, and the character ‘_’.
Within ‘[’ and ‘]’, an equivalence class can be specified using the syntax [=c=], which matches all characters with the same collation weight (as defined by the current locale) as the character c.
Within ‘[’ and ‘]’, the syntax [.symbol.] matches the collating symbol symbol.
(emphasis added to the most common usage patterns)
The tutorial uses it when talking about regular expressions, in addition to the globbing characters ('*' and '?'). For example, the regular expression [a-z] matches one lowercase character.
Strictly speaking, the wildcard is the whole bracket expression, for example [abc], which matches any one of the three letters.
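As a rough illustration of the bracket expressions described above, here is a minimal sketch in Python rather than bash. It assumes Python's `fnmatch` module is an acceptable stand-in for shell globbing: it supports `[abc]`, ranges, and `[!...]` negation, but not `[^...]` negation or POSIX classes such as `[:alpha:]`, and its ranges follow code point order (like the C locale), not the current locale. The file names are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for this illustration.
names = ["a.txt", "b.txt", "d.txt", "x.txt", "B.txt", "1.txt"]

def select(pattern):
    """Return the names matched by a glob pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]

print(select("[abc].txt"))     # any one of a, b, c
print(select("[a-dx-z].txt"))  # two ranges: a to d, and x to z
print(select("[!a-z].txt"))    # any one character that is NOT in a to z
```

In bash itself the same patterns would be used directly on the command line, for example `ls [abc].txt`.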
I want to replace a line that represents part of a mathematical equation:
f(x,z,time,temp)=-(2.0)/(exp(128*((x-2.5*time)*(x-2.5*time)+(z-0.2)*(z-0.2))))+(
with a new one similar to the above. Both the new and the old line are stored in bash variables.
The main problem is that the equation is full of special characters that prevent a proper search and replace in bash, even when I use a delimiter character that does not occur in the equation.
I used
sed -n "s|$OLD|$NEW|g" restart.k
and
sed -i "s|$OLD|$NEW|g" restart.k
but every time I get wrong results.
Any idea how to solve this?
The only character in your pattern that needs escaping for sed is * (. is also special, but it matches a literal dot too, so it is harmless here), so escape it and do the replacement as usual:
sed "s:$(sed 's:[*]:\\&:g' <<<"$old"):$new:" infile
If there are more special characters in your real data, you will need to add them inside the brackets []. There are some exceptions:
the ^ character: it can be placed anywhere in [] except as the first character, because a leading ^ negates the characters within the bracket expression.
the ] character: it must be the first character, because this character is also used to end the bracket expression.
the - character: it must be the first or last character, because elsewhere this character defines a range of characters.
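The idea behind the answer above (turn every special character of the search string into a literal before substituting) can be sketched in Python instead of sed. This is only an illustration under stated assumptions: `replace_literal` is a hypothetical helper, `re.escape` escapes all regex metacharacters rather than a hand-picked set, and the replacement equation is invented.

```python
import re

def replace_literal(text: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` in `text`, treating `old` as plain text."""
    pattern = re.escape(old)  # backslash-escape every regex metacharacter
    # A function replacement keeps backslashes and group references in `new` literal.
    return re.sub(pattern, lambda _match: new, text)

old = "f(x,z,time,temp)=-(2.0)/(exp(128*((x-2.5*time)*(x-2.5*time)+(z-0.2)*(z-0.2))))+("
new = "f(x,z,time,temp)=-(1.0)/(exp(64*((x-time)*(x-time)+z*z)))+("  # made-up replacement
line = "  " + old + " rest of line"

print(replace_literal(line, old, new))
```

When no pattern matching is needed at all, a plain string replacement (`line.replace(old, new)` in Python) avoids the escaping question entirely.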
I have the following string in the code at multiple places,
m_cells->a[ Id ]
and I want to replace it with
c(Id)
where the string Id could be anything, including numbers.
A regular expression replace like below should do:
%s/m_cells->a\[\s\(\w\+\)\s\]/c(\1)/g
If you wish to apply the replacement operation on a number of files you could use the :bufdo command.
Full explanation of @BasBossink's answer (as a separate answer because this won't fit in a comment), because regexes are awesome but non-trivial and definitely worth learning:
In Command mode (ie. type : from Normal mode), s/search_term/replacement/ will replace the first occurrence of 'search_term' with 'replacement' on the current line.
The % before the s tells vim to perform the operation on all lines in the document. Any range specification is valid here, eg. 5,10 for lines 5-10.
The g after the last / performs the operation "globally" - all occurrences of 'search_term' on the line or lines, not just the first occurrence.
The "m_cells->a" part of the search term is a literal match. Then it gets interesting.
Many characters have special meaning in a regex, and if you want to use the character literally, without the special meaning, then you have to "escape" it, by putting a \ in front.
Thus \[ and \] match the literal '[' and ']' characters.
Then we have the opposite case: literal characters that we want to treat as special regex entities.
\s matches white*s*pace (space, tab, etc.).
\w matches "*w*ord" characters (letters, digits, and underscore _).
(. matches any character (except a newline). \d matches digits. There are more...)
If a character is not followed by a quantifier, then exactly one such character matches. Thus, \s will match one space or tab, but not fewer or more.
\+ is a quantifier, and means "one or more". (\? matches 0 or 1; * (with no backslash) matches any number: zero or more. Warning: matching on zero occurrences takes a little getting used to; when you're first learning regexes, you don't always get the results you expected. It's also possible to match on an arbitrary exact number or range of occurrences, but I won't get into that here.)
\( and \) work together to form a "capturing group". This means that we don't just want to match on these characters, we also want to remember them specially so that we can do something with them later. You can have any number of capturing groups, and they can be nested too. You can refer to them later by number, starting at 1 (not 0). Just start counting (escaped) left parentheses from the left to determine the number.
So here, we are matching a space followed by a group (which we will capture) of at least one "word" character followed by a space, within the square brackets.
The section between the second and third / is the replacement text.
The "c" is literal.
\1 means the first captured group, which in this case will be the "Id".
In summary, we are finding text that matches the given description, capturing part of it, and replacing the entire match with the replacement text that we have constructed.
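For readers who want to experiment outside Vim, the same substitution can be sketched with Python's `re` module. Note the syntax difference: in Python, parentheses and `+` are special without a backslash, whereas Vim's default "magic" mode needs `\(`, `\)` and `\+`. The sample line of C++ is invented for the illustration.

```python
import re

# Vim:    :%s/m_cells->a\[\s\(\w\+\)\s\]/c(\1)/g
# Python: same pattern, but (, ) and + are special without a backslash.
pattern = re.compile(r"m_cells->a\[\s(\w+)\s\]")

line = "x = m_cells->a[ i ] + m_cells->a[ 42 ];"  # hypothetical source line
result = pattern.sub(r"c(\1)", line)              # \1 is the captured Id

print(result)  # x = c(i) + c(42);
```

`re.sub` replaces all occurrences by default, which corresponds to the g flag in the Vim command.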
Perhaps a final suggestion: c after the final / (doesn't matter whether it comes before or after the 'g') enables *c*onfirmation: vim will highlight the characters to be replaced and will show the replacement text and ask whether you want to go ahead. Great for learning.
Yes, regexes are complicated, but super powerful and well worth learning. Once you have them internalized, they're actually fairly easy. I suggest that, as with learning vim itself, you start with the basics, get fluent in them, and then incrementally add new features to your repertoire.
Good luck and have fun.
Are there any rules for unix/linux shell variable naming?
For example, like the common rules for Java variable naming.
Avoid naming a variable after a UNIX command; it makes the code confusing and can produce unexpected results. Also keep in mind the reserved words (if, else, elif, do, done...), and that uppercase names are conventionally reserved for environment and shell-internal variables.
From Rules for Naming variable name:
A variable name must begin with an alphabetic character or underscore
character (_), followed by zero or more alphanumeric or underscore
characters.
Or as seen in The Open Group Base Specifications Issue 7:
In the shell command language, a word consisting solely of
underscores, digits, and alphabetics from the portable character set.
The first character of a name is not a digit.
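The POSIX definition quoted above translates into a short pattern. The following is a minimal sketch in Python; `is_valid_name` is a hypothetical helper, and it checks only the syntax of a name, not whether the name collides with an existing environment variable.

```python
import re

# POSIX "name": underscores, digits and ASCII letters; the first character is not a digit.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_name(name: str) -> bool:
    """Return True if `name` is syntactically valid as a shell variable name."""
    return NAME_RE.fullmatch(name) is not None

for candidate in ["count", "_tmp", "file2", "2file", "my-var", "x"]:
    print(candidate, is_valid_name(candidate))
```

The explicit ASCII character classes are deliberate: Python's `\w` would also accept non-ASCII letters, which are outside the portable character set.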
I messed something up. In my xml, each non preferred term has a preferred term to use:
Something I have done has created some non preferred terms where the preferred term to use has exactly the same name as the non preferred term itself.
<term>
<termId>127699289611384833453kNgWuDxZEK37Lo4QVWZ</termId>
<termUpdate>Add</termUpdate>
<termName>Adenosquamous Carcinoma</termName>
<termType>Nd</termType>
<termStatus>Active</termStatus>
<termApproval>Approved</termApproval>
<termCreatedDate>20110704T09:41:31</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20110704T09:45:17</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
<relation>
<relationType>USE</relationType>
<termId>1276992897N1537166632rbr7BISWAI93SarY118G</termId>
<termName>Adenosquamous Carcinoma</termName>
</relation>
Is there a text editor with a find and replace function I can use to tell it that if the <termName> in the <relation> equals the <termName> of the actual term, it should just delete the whole <term>? I looked at the related queries and they mentioned regular expressions, but I've spent ages trying to build them and they are beyond me,
thanks!
This answer comes nearly 3 years late, but Perl regular expressions can indeed be used for this task.
Finding and deleting a term block whose relation contains the same termName as the term itself is possible with UltraEdit for Windows v21.10.0.1032, and most likely with other text editors supporting Perl regular expressions, using a case-sensitive Perl regular expression Replace with this search string:
^[ \t]*<term>(?:(?!</term>)[\S\s])+<termName>([^\r\n]+?)</termName>(?:(?!</term>)[\S\s])+<relation>(?:(?!</term>)[\S\s])+<termName>\1</termName>(?:(?!</term>)[\S\s])+</term>[ \t\r]*\n
The replace string is an empty string.
Explanation:
^ ... start every search at beginning of a line.
[ \t]* ... there can be 0 or more spaces or tabs at beginning of the line.
<term> ... this string must be found next on the line.
Next follows the tricky expression. It is required to match any character up to the next string of interest, while avoiding a match that runs into the next term block when the remaining expression cannot be satisfied within the current term block.
(?:(?!</term>)[\S\s])+ ... this expression matches any character, because [\S\s] matches any non-whitespace character or any whitespace character. Because of the +, there must be at least 1 character before the next fixed string, but there can be more. Additionally, the regexp engine must look ahead at every character it matches to check that </term> does NOT follow. If the string </term> starts at the current position, the engine must stop matching characters there and continue with the next part of the search string. So this expression can match any character, but not beyond </term>, and therefore only characters between <term> and </term>. Because of ?: nothing is captured/marked for back referencing by this expression.
<termName> ... this fixed string within a term block must be found next.
([^\r\n]+?) ... matches the characters of the name of the term and captures/marks this string for back referencing. Instead of the negative character class expression [^\r\n], it would also be possible to use another class definition, or just . if a dot does not match newline characters. Also possible would be ([^<]+) if an unencoded opening angle bracket cannot be part of the term name. According to the XML specification, the character < must be encoded as &lt; within an element's value, except within a CDATA block.
</termName> ... this fixed string within a term block must be found next.
(?:(?!</term>)[\S\s])+ ... again any character within a term block up to next fixed string.
<relation> ... this fixed string within a term block must be found next.
(?:(?!</term>)[\S\s])+ ... again any character within a term block up to next fixed string.
<termName> ... this fixed string within a term block must be found next.
\1 ... this expression back references the captured/marked term name and therefore the next string must be the same as the name of the term defined above.
</termName> ... this fixed string within a term block must be found next.
(?:(?!</term>)[\S\s])+ ... again any character within a term block up to next fixed string.
</term> ... this fixed string marking end of a term block must be found next.
[ \t\r]*\n ... matches 0 or more spaces, tabs and carriage returns and next a line-feed. So this expression works for a DOS/Windows (CR+LF) and a Unix (only LF) text file.
Also possible with UltraEdit is:
(?s)^[ \t]*<term>(?:(?!</term>).)+<termName>([^<]+?)</termName>(?:(?!</term>).)+<relation>(?:(?!</term>).)+<termName>\1</termName>(?:(?!</term>).)+</term>[ \t\r]*\n
(?s) ... this expression at beginning of the search string changes the behavior of . from matching any character except line terminators to really any character and therefore . is now like [\S\s].
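To try the second search string without UltraEdit, here is a minimal sketch using Python's `re` module. It assumes Python's engine is close enough to the Perl engine for this pattern (lookahead, non-capturing groups and backreferences are all supported); `re.MULTILINE` is added so that ^ matches at the start of every line. The two sample term blocks are shortened and partly invented.

```python
import re

# Shortened sample data: the first term points at itself, the second does not.
xml = """<term>
<termName>Adenosquamous Carcinoma</termName>
<termType>Nd</termType>
<relation>
<relationType>USE</relationType>
<termName>Adenosquamous Carcinoma</termName>
</relation>
</term>
<term>
<termName>Lung Tumour</termName>
<termType>Nd</termType>
<relation>
<relationType>USE</relationType>
<termName>Lung Neoplasm</termName>
</relation>
</term>
"""

INSIDE = r"(?:(?!</term>).)+"  # any characters, but never across </term>

pattern = re.compile(
    r"^[ \t]*<term>" + INSIDE
    + r"<termName>([^<]+?)</termName>" + INSIDE
    + r"<relation>" + INSIDE
    + r"<termName>\1</termName>" + INSIDE   # \1: same name as the term itself
    + r"</term>[ \t\r]*\n",
    re.DOTALL | re.MULTILINE,  # DOTALL plays the role of (?s)
)

cleaned = pattern.sub("", xml)  # empty replacement deletes matching blocks
print(cleaned)
```

For large or irregular files an XML parser would be more robust than a regular expression; the sketch only mirrors the approach of the answer.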
Could anyone please provide an explanation of the syntax in the following example, or post me a link where there is a more general explanation of the individual symbols used in this expression? I found Vim help to be incomplete in this regard.
:set foldexpr=getline(v:lnum)=~'^\\s*$'&&getline(v:lnum+1)=~'\\S'?'<1':1
What is unclear to me is the following.
Why are strings enclosed in single quotes instead of double quotes? Is it a matter of choice?
What does the question mark mean in =~'\\S'?'<1':1?
What does the expression 'string1'string2'string3 mean?
What does :1 mean?
The foldexpr option is supposed to contain an expression that evaluates
to an integer or a string of a particular format that specifies the folding
level of the line whose number is stored in the v:lnum global variable
at the moment of evaluation.
Let us follow the logic of this foldexpr example from top to bottom.
getline(v:lnum)=~'^\\s*$'&&getline(v:lnum+1)=~'\\S'?'<1':1
At the top level, the whole expression is an instance of the ternary
operator A ? B : C. The result of the operator is the value of the
B expression if A evaluates to non-zero, and the value of the
C expression otherwise (see :help expr1). In this case, B is
the string literal '<1', and C is the number 1 (for meaning
of '<1' and 1 as fold level specifiers see :help fold-expr).
The A expression consists of two conditions joined by the
&& operator:
getline(v:lnum) =~ '^\\s*$' && getline(v:lnum+1) =~ '\\S'
Both conditions have the same form:
getline(N) =~ S
The getline function returns the contents of the line (in the current
buffer) that is referenced by the line number passed as an argument
(see :help getline). When the foldexpr is evaluated, the v:lnum
variable contains the number of the line for which the folding level
should be calculated.
The =~ operator tests whether its left operand matches a regular
expression given by its right string operand, and returns a boolean value
(see :help expr4, in particular, near the end of the expr4 section).
Thus, the A condition is intended to check that the v:lnum-th line
matches the '^\\s*$' pattern, and the line following that v:lnum-th
line matches the '\\S' pattern.
The regular expression patterns are specified in the expression as
string literals. String literals have two syntactic forms and can be
quoted using double or single quotes. The difference between these
forms is that a double-quoted string can contain various control
sequences that start with a backslash. Those sequences make it possible
to specify special characters that cannot easily be typed otherwise (a
double quote, for example, is written \"). Single-quoted strings, on the
other hand, do not allow such backslash sequences. (For a complete
description of single- and double-quoted strings see :help expr-string
and :help literal-string.)
A notable consequence of the double-quoted string syntax is that
the backslash itself must be escaped (\\). That is why single-quoted
strings are often used to specify regular expressions: there
is no need to escape the constantly needed backslash. One can
notice, though, that backslashes are nevertheless escaped in the
patterns above. This is because some symbols (including the backslash)
have special meaning in Ex commands (including :set, of
course). When you hit Enter to start the command
:set foldexpr=...
Vim interprets some character sequences first (see :help cmdline-special).
In particular, the \\ sequence is treated as a single backslash.
Putting it all together, the expression tests whether line number
v:lnum contains only blank characters and whether the following line
(number v:lnum+1) has any non-blank character (see :help pattern
to grasp the meaning of the patterns). If so, the expression evaluates
to the string '<1', otherwise it evaluates to the number 1.
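The logic of the expression can be restated outside Vim as a small function. The following is a sketch in Python, not Vim script: `fold_level` and the sample buffer are hypothetical, the inner `getline` only imitates Vim's behaviour of returning an empty string for a line number past the end of the buffer, and `[ \t]` is used because Vim's \s matches only space and tab.

```python
import re

def fold_level(lines, lnum):
    """Mimic getline(v:lnum)=~'^\\s*$' && getline(v:lnum+1)=~'\\S' ? '<1' : 1."""
    def getline(n):
        # Like Vim's getline(): an empty string for a non-existent line.
        return lines[n - 1] if 1 <= n <= len(lines) else ""

    is_blank = re.fullmatch(r"[ \t]*", getline(lnum)) is not None         # ^\s*$
    next_has_text = re.search(r"[^ \t]", getline(lnum + 1)) is not None   # \S
    # Ternary operator: A ? B : C
    return "<1" if is_blank and next_has_text else 1

buffer_lines = ["first paragraph", "", "second paragraph", "", ""]
for number in range(1, len(buffer_lines) + 1):
    print(number, fold_level(buffer_lines, number))
```

Only a blank line that is directly followed by text yields '<1' (a fold of level 1 ends there); every other line gets level 1, so each paragraph together with its trailing blank lines forms one fold.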