Two-asterisk string match in Tcl

Help me solve a problem in Tcl.
Using my macro, I want to find a string that contains two asterisks (**).
I tried the following command:
string match \*\* string_name
But it doesn't work. Can you explain where I made a mistake and how to do it correctly?
Thanks in advance!

The Tcl parser processes the backslashes before string match ever runs, so what the command actually receives is string match ** string_name. You need the backslashes themselves to reach string match, so that it sees two escaped asterisks. To do that, double the backslashes:
string match \\*\\* $s
Or use braces:
string match {\*\*} $s
Note that the above will match only if $s contains 2 asterisks, and nothing else. To allow for anything before and after the asterisks, you can use more asterisks...
string match {*\*\**} $s
There are a few other ways to check whether a string contains a double asterisk. For instance, you can use string first; since it does not support patterns, nothing needs escaping:
string first ** $s
If you get something greater than -1, then ** is present in $s.
Or if you happen to know some regular expressions:
regexp -- {\*\*} $s
Those are the most common I think.
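For comparison, here is how the same three checks might look in Python. This is only an analogy, not Tcl: the function names are made up for illustration, and Python's fnmatch has no backslash escape, so a literal asterisk is written as [*] instead.

```python
import fnmatch
import re


def has_double_asterisk_glob(s):
    # Glob-style match, similar in spirit to Tcl's `string match`.
    # [*] is a one-character class holding a literal asterisk.
    return fnmatch.fnmatchcase(s, "*[*][*]*")


def has_double_asterisk_find(s):
    # Plain substring search, similar to `string first`: no pattern syntax.
    return s.find("**") > -1


def has_double_asterisk_regex(s):
    # Regular expression, similar to `regexp -- {\*\*} $s`.
    return re.search(r"\*\*", s) is not None
```

All three return True for a value such as a**b and False for a*b.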


Bash script matching a glob at the beginning of a string

I have folders in a directory with names giving specific information. For example:
[allied]_remarkable_points_[treatment]
[nexus]_advisory_plans_[inspection]
....
So I have a structure similar to this: [company]_title_[topic]. The script has to match the file naming structure to variables in a script in order to extract the information:
COMPANY='[allied]';
TITLE='remarkable points'
TOPIC='[treatment]'
The folder names do not have a constant number of characters, so I can't use indexed matching in the script. I managed to extract $TITLE and $TOPIC, but I can't manage to match the first string, since the variable gives me back the complete folder name.
FOLDERNAME=${PWD##*/}
This is the line giving me grief:
COMPANY=`expr $FOLDERNAME : '\(\[.*\]\)'`
I tried to avoid the greedy behaviour by placing ? in the regular expression:
COMPANY=`expr $FOLDERNAME : '\(\[.*?\]\)'`
but as soon as I do that, it returns nothing.
Any ideas?
expr isn't needed for regular-expression matching in bash.
[[ $FOLDERNAME =~ (\[[^]]*\]) ]] && COMPANY=${BASH_REMATCH[1]}
Use [^]]* instead of .* to do a non-greedy match of the bracketed portion. A bigger regular expression can capture all three parts (the title is matched with .* because it can itself contain underscores, as in remarkable_points):
[[ $FOLDERNAME =~ (\[[^]]*\])_(.*)_(\[[^]]*\]) ]] && {
COMPANY=${BASH_REMATCH[1]}
TITLE=${BASH_REMATCH[2]}
TOPIC=${BASH_REMATCH[3]}
}
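The same three-group capture can be sketched in Python, which may make the roles of the groups easier to see. The helper name split_folder_name and the anchoring with ^ and $ are assumptions made for this illustration, not part of the original answer.

```python
import re

# [company]_title_[topic]: the bracketed parts must not contain "]",
# while the title may contain underscores.
FOLDER_RE = re.compile(r"^(\[[^\]]*\])_(.*)_(\[[^\]]*\])$")


def split_folder_name(name):
    """Return (company, title, topic), or None if the name does not fit."""
    m = FOLDER_RE.match(name)
    if m is None:
        return None
    return m.groups()
```

For the first folder in the question this yields [allied], remarkable_points and [treatment].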
Bash has built-in string manipulation functionality.
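for f in *; do
    company=${f%%\]*}       # drop everything from the first ] onward
    company=${company#\[}   # strip off leading [
    topic=${f##*\[}         # drop everything up to the last [
    topic=${topic%\]}       # strip off trailing ]
    :                       # placeholder for the real work
done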
The construct ${variable#wildcard} removes the shortest prefix matching wildcard from the value of variable and returns the resulting string. Doubling the # removes the longest possible match instead of the shortest. Using % selects suffix removal instead of prefix removal.
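To make the shortest/longest distinction concrete, here is a rough Python emulation of the four expansions needed for this folder layout. It is only a sketch: the function name is hypothetical, and partition/rpartition stand in for the shell's wildcard matching.

```python
def company_and_topic(name):
    # ${f%%\]*}: remove the longest suffix that starts at a "]",
    # i.e. keep the text before the FIRST "]".
    company = name.partition("]")[0]
    # ${company#\[}: remove one leading "[" if present.
    if company.startswith("["):
        company = company[1:]
    # ${f##*\[}: remove the longest prefix that ends at a "[",
    # i.e. keep the text after the LAST "[".
    topic = name.rpartition("[")[2]
    # ${topic%\]}: remove one trailing "]" if present.
    if topic.endswith("]"):
        topic = topic[:-1]
    return company, topic
```

As in the shell, a name without the delimiter is left unchanged by the corresponding step.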
If for some reason you do want to use expr: your non-greedy regex attempt doesn't work because the .*? syntax is significantly newer than expr, which only understands POSIX basic regular expressions. In fact, if you are using Bash, you should probably not be using expr at all. Bash provides superior built-in features for every use case where expr once made sense, back when the sh shell had no built-in regex matching or arithmetic.
Fortunately, though, it's not hard to get non-greedy matching in this isolated case. Just change the regex to not match on square brackets.
COMPANY=`expr "$FOLDERNAME" : '\(\[[^][]*\]\)'`
(The closing square bracket needs to come first within the negated character class; in any other position, a closing square bracket closes the character class. Many newbies expect to be able to use backslash escapes for this, but that's not how it works. Notice also the addition of double quotes around the variable.)
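The difference between greedy matching, the lazy .*? form, and the negated character class can be checked quickly in Python, whose re module does support lazy quantifiers. Note one difference from the POSIX syntax described above: Python allows a backslash escape inside a character class, so the sketch writes [^\]] rather than putting the bracket first.

```python
import re

name = "[allied]_remarkable_points_[treatment]"

# Greedy: .* runs to the LAST "]" in the string.
greedy = re.match(r"\[.*\]", name).group(0)

# Lazy: .*? stops at the FIRST "]" (not available in expr).
lazy = re.match(r"\[.*?\]", name).group(0)

# Negated class: cannot cross a "]", so it also stops at the first one.
# This form works in engines without lazy quantifiers.
negated = re.match(r"\[[^\]]*\]", name).group(0)
```

Here greedy is the whole folder name, while lazy and negated are both [allied].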
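If you're not averse to using grep (GNU grep, for the -P option), feed it the name on standard input, since a bare argument would be treated as a file to search:
COMPANY=$(grep -Po '^\[.*?\]' <<< "$FOLDERNAME")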

How to capture a string between parentheses?

str = "fa, (captured)[asd] asf, 31"
for word in str:gmatch("\(%a+\)") do
print(word)
end
Hi! I want to capture a word between parentheses.
My code should print the string "captured".
lua: /home/casey/Desktop/test.lua:3: invalid escape sequence near '\('
And I got this syntax error.
Of course, I could just find the positions of the parentheses and use the string.sub function,
but I prefer simple code.
Also, square brackets gave me a similar error.
The escape character in Lua patterns is %, not \. So use this:
word=str:match("%((%a+)%)")
If you only need one match, there is no need for a gmatch loop.
To capture the string in square brackets, use a similar pattern:
word=str:match("%[(%a+)%]")
If the captured string is not entirely composed of letters, use .- instead of %a+.
lhf's answer likely gives you what you need, but I'd like to mention one more option that I feel is underused and may work for you as well. One issue with using %((%a+)%) is that it doesn't work for nested parentheses: if you apply it to something like "(text(more)text)", you'll get "more" even though you may expect "text(more)text". Note that you can't fix it by asking to match to the first closing parenthesis (%(([^%)]+)%)) as it will give you "text(more".
However, you can use the %bxy pattern item, which balances occurrences of x and y and will return (text(more)text) in this case (you'd need to use something like (%b()) to capture it). Again, this may be overkill for your case, but it is useful to keep in mind and may help someone else who comes across this problem.
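For readers wondering what balanced matching involves, the idea behind %b() can be sketched in Python with a simple depth counter. This is an illustration of the concept, not Lua's implementation: the function name is made up, and unlike %b it only tries the first opening delimiter instead of retrying at later positions.

```python
def first_balanced(text, open_ch="(", close_ch=")"):
    """Return the first balanced open_ch...close_ch span, delimiters included."""
    start = text.find(open_ch)
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                # Every opener since `start` has been closed.
                return text[start:i + 1]
    return None  # ran out of text while still inside a group
```

Applied to (text(more)text), it returns the whole outer group rather than stopping at the first closing parenthesis.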

Lua - How to remove quotes around integers in strings

So I have this string:
{"scores":{"1":["John",60],"2":["Jude",60],"3":["Max",60],"4":["Kyle",60],"5":["Smith",60],"6":["Mark",50],"7":["Luke",40],"8":["Anne",30],"9":["Bruce",20],"10":["kazuo",10]}}
There are a number of integers there that have quotes around them, and I want to get rid of them. How do I do that? I already tried out:
print(string.gsub(string, '/"(\d)"/', "%1"));
but it does not work. :(
Lua does not have regular expressions like Perl; instead, it has patterns. These are similar, with a few differences.
There is no need for delimiting slashes / /, and the escape character is %, not \. Otherwise, your attempt is essentially correct (use %d+ rather than a single %d so that multi-digit numbers such as "10" also match):
print(string.gsub(str, '"(%d+)"', "%1"))
Here str is the variable containing the input string (naming it string would shadow the standard string library). Also note that string.gsub returns 2 values, which are both printed, the second being the number of substitutions. Use an extra pair of parentheses to keep only the first result.
You can simplify the notation a little using the colon (:) operator:
print((str:gsub('"(%d+)"', "%1")))
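The two-return-value behaviour has a close parallel in Python's re.subn, which may help if the extra number printed by gsub looks puzzling. The function names below are made up for this sketch; note that Python regexes use \d where Lua patterns use %d.

```python
import re


def unquote_integers_with_count(s):
    # Like string.gsub: returns (new_string, number_of_substitutions).
    return re.subn(r'"(\d+)"', r"\1", s)


def unquote_integers(s):
    # Like wrapping the gsub call in extra parentheses: string only.
    return re.sub(r'"(\d+)"', r"\1", s)
```

Only quoted runs of digits are touched; quoted names such as "John" are left alone.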

Why do I have to escape the final ]?

I have a file containing a string like this one:
print $hash_xml->{'div'}{'div'}{'div'}[1]...
I want to replace {'div'}{'div'}{'div'}[1] by something else.
So I tried
%s/{'div'}{'div'}{'div'}[1]/by something else/gc
The string was not found. I thought I had to escape the {, }, [ and ].
Still, the string was not found.
So I tried to search a single { and it found them.
Then I tried to search {'div'}{'div'}{'div'} and it found it again.
Then {'div'}{'div'}{'div'}[1 was still found.
To find {'div'}{'div'}{'div'}[1]
I had to use %s/{'div'}{'div'}{'div'}[1\]
Why?
vim 7.3 on Linux
The [] are used in regular expressions to wrap a set of acceptable characters.
When both are supplied unescaped, Vim treats them as the delimiters of such a set.
So when you leave one out, or escape the final one, Vim cannot interpret the single remaining bracket as part of a set, so it searches for it literally (basically the best it can do given the search string).
Personally, I would escape both the opening and the closing square bracket to ensure that the meaning is clear.
That's because the [ and ] characters are used to build the search pattern.
See :h pattern and use the help file pattern.txt to try the following experiment:
Searching for the "[0-9]" pattern (without quotes) using /[0-9] will match every digit from 0 to 9 individually (see :h \[)
Now, if you try /\[0-9] or /[0-9\] you will match the whole text: a zero, a hyphen and a nine inside square brackets. That's because when you escape either [ or ], the [] collection ceases to exist and both brackets are taken literally.
Using your search pattern, /{'div'}{'div'}{'div'}[1\] and /{'div'}{'div'}{'div'}\[1] should match the same pattern which is the one you want, while /{'div'}{'div'}{'div'}[1] matches the string {'div'}{'div'}{'div'}1.
In order to avoid being caught by these special characters in regular expressions, you can try using the very nomagic flag.
E.g.:
:%s/\V{'div'}[1]/replacement/
Notice the \V flag at the beginning of the pattern. With \V, only the backslash (and the / that ends the pattern) keeps a special meaning.
Because the square brackets mean that Vim thinks you're looking for any one of the characters inside. This is known as a 'character class'. Escaping either of the square brackets lets Vim know that you're looking for the literal string ending with '[1]'.
Ideally you should write your expression as:
%s/{'div'}{'div'}{'div'}\[1\]/replacement string/
to ensure that the meaning is completely clear.
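The character-class behaviour is not specific to Vim, and it can be reproduced in a few lines of Python. This is only an analogy: Python's re syntax differs from Vim's in details (for instance, Python does not fall back to a literal search when one bracket is escaped), and re.escape is merely comparable in spirit to \V.

```python
import re

line = "print $hash_xml->{'div'}{'div'}{'div'}[1]"

# Unescaped: [1] is a character class matching the single character "1".
class_pattern = re.compile(r"'\}[1]")

# Escaped: both brackets are literal characters.
literal_pattern = re.compile(r"'\}\[1\]")

# re.escape() escapes every metacharacter of a fixed string for you.
auto_escaped = re.compile(re.escape("{'div'}{'div'}{'div'}[1]"))
```

The unescaped pattern finds nothing in line but would match text ending in '}1, while the two escaped patterns find the bracketed index.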

Ignore escape characters (backslashes) in R strings

While running an R-plugin in SPSS, I receive a Windows path string as input e.g.
'C:\Users\mhermans\somefile.csv'
I would like to use that path in subsequent R code, but then the backslashes need to be replaced with forward slashes, otherwise R interprets them as escapes (e.g. "\U used without hex digits" errors).
However, I have not been able to find a function that can replace the backslashes with forward slashes or double-escape them. All those functions assume those characters are already escaped.
So, is there something along the lines of:
>gsub('\\', '/', 'C:\Users\mhermans')
C:/Users/mhermans
You can try the allowEscapes argument of scan(), which here reads the path typed at the console:
X=scan(what="character",allowEscapes=F)
C:\Users\mhermans\somefile.csv
print(X)
[1] "C:\\Users\\mhermans\\somefile.csv"
As of version 4.0.0, released in April 2020, R provides a syntax for specifying raw strings. The string in the example can be written as:
path <- r"(C:\Users\mhermans\somefile.csv)"
From ?Quotes:
Raw character constants are also available using a syntax similar to the one used in C++: r"(...)" with ... any character sequence, except that it must not contain the closing sequence )". The delimiter pairs [] and {} can also be used, and R can be used in place of r. For additional flexibility, a number of dashes can be placed between the opening quote and the opening delimiter, as long as the same number of dashes appear between the closing delimiter and the closing quote.
First you need to get it assigned to a name:
pathname <- 'C:\\Users\\mhermans\\somefile.csv'
Notice that in order to get it into a character vector you needed to double them all, which gives a hint about how you could use regex. Actually, if you read it in from a text file, R will do all the doubling for you. Mind you, it is not really doubling the backslashes: each one is stored as a single backslash, but it is displayed doubled and needs to be typed doubled at the console. Otherwise the R interpreter tries (and often fails) to turn it into a special character. To compound the problem, regex uses the backslash as an escape as well. So to detect a backslash with grep, sub or gsub you need to quadruple it:
gsub("\\\\", "/", pathname)
# [1] "C:/Users/mhermans/somefile.csv"
You needed to doubly "double" the backslashes. The first of each pair of \'s signals to the regex engine that what comes next is a literal.
Consider:
nchar("\\A")
# returns `[1] 2`
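The same two layers of escaping (string literal first, regex engine second) exist in Python, so a small sketch there may help separate them. The function names are hypothetical; the point is only to show where each level of doubling comes from.

```python
import re

# In a normal literal each doubled backslash is ONE stored character;
# a raw literal needs no doubling at all.
path = "C:\\Users\\mhermans\\somefile.csv"
raw_path = r"C:\Users\mhermans\somefile.csv"


def to_forward_slashes(p):
    # The regex engine needs two backslashes to mean one literal backslash,
    # and a normal string literal doubles each of those again: four in total.
    return re.sub("\\\\", "/", p)


def to_forward_slashes_plain(p):
    # A fixed-string replacement skips the regex layer: only the
    # string-literal doubling remains.
    return p.replace("\\", "/")
```

Both literals hold the same string, and both functions turn it into C:/Users/mhermans/somefile.csv.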
If file E:\Data\junk.txt contains the following text (without quotes): C:\Users\mhermans\somefile.csv
You may get a warning with the following statement, but it will work:
texinp <- readLines("E:\\Data\\junk.txt")
If file E:\Data\junk.txt contains the following text (with quotes): "C:\Users\mhermans\somefile.csv"
The above readLines statement might also give you a warning, but texinp will now contain:
"\"C:\\Users\\mhermans\\somefile.csv\""
So, to get what you want, make sure there aren't quotes in the incoming file, and use:
texinp <- suppressWarnings(readLines("E:\\Data\\junk.txt"))
