string.gmatch to find a string included between two inequality signs

I'm using Lua and have already searched Google without success: I can't find a way to get the string between inequality signs (< >). Other brackets are easy to match, but these are not. Is it possible?
Target: how do I grab "name" from between the inequality signs?
String: < name >: Message

If name does not contain >, then <(.-)> works.

You can use the (%b<>) pattern to capture matching <>. Then using that value, you can simply use string.sub to cut off the first and last char:
local name, message = ('< name<> > : Foo Bar!'):match('(%b<>)%s*:%s*(.*)')
name = name:sub(2, -2) -- strip the enclosing < and >
print(name, 'sent message :', message)
As you can see, this also takes care of strings containing other, embedded <> signs.
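For anyone curious what %b<> does under the hood, here is a minimal sketch of the same balanced scan done by hand. It is written in JavaScript (the language of a later question on this page), because JavaScript regexes have no counterpart to %b. The function name splitChatLine and its return shape are hypothetical, and unlike Lua's match it only tries the first < in the line instead of retrying at later positions.

```javascript
// Hypothetical helper: mimics the Lua pattern '(%b<>)%s*:%s*(.*)' by hand.
// Returns { name, message }, or null when the line does not fit that shape.
function splitChatLine(line) {
  const start = line.indexOf('<');
  if (start === -1) return null;

  // Walk forward, tracking nesting depth, until the opening '<' is closed.
  let depth = 0;
  let end = -1;
  for (let i = start; i < line.length; i++) {
    if (line[i] === '<') depth++;
    else if (line[i] === '>') depth--;
    if (depth === 0) { end = i; break; }
  }
  if (end === -1) return null; // brackets never balanced

  // After the balanced group: optional whitespace, ':', optional whitespace.
  const rest = /^\s*:\s*([\s\S]*)$/.exec(line.slice(end + 1));
  if (rest === null) return null;

  // slice(start + 1, end) drops the outer '<' and '>', like the string.sub call.
  return { name: line.slice(start + 1, end), message: rest[1] };
}

console.log(splitChatLine('< name<> > : Foo Bar!'));
```

The depth counter is the whole trick: an inner < raises it and the matching > lowers it, so only the > that closes the first < ends the group.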

Related

What is % in Lua [not for numeric operations]?

So I was using string.find() to find "(" in my string. But without "%" before "(" (like "%(") it says: "unfinished capture". What exactly is this symbol doing?
Works:
local str = "(text)"
print(str:find("%("))
Doesn't work:
local str = "(text)"
print(str:find("("))
% is the escape character in Lua patterns, which are used by the string-searching functions (string.find, string.match, string.gmatch, string.gsub). For example, %s matches a single whitespace character, and %( matches the literal character (. You can't write ( on its own because an unescaped ( opens a capture, which is a mechanism for retrieving part of a match. Other characters can be typed directly unless they are special in the same way; the magic characters are ^$()%.[]*+-?
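The same idea shows up in other pattern languages; only the escape character differs. As a rough analogue (this is JavaScript, not Lua), the escape there is a backslash, and an unescaped ( is likewise rejected as an unfinished group. The variable names are only illustrative.

```javascript
const str = '(text)';

// Escaped: \( matches a literal '(' and search() returns its 0-based index.
const index = str.search(/\(/);
console.log(index); // 0

// Unescaped: '(' opens a group, so the pattern itself is invalid.
let failed = false;
try {
  new RegExp('(');
} catch (err) {
  failed = err instanceof SyntaxError;
}
console.log(failed); // true
```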

LUA -- gsub problems -- passing a variable to the match string isn't working [duplicate]

Been stuck on this for over a day.
I'm trying to use gsub to extract a portion of an input string. The exact pattern of the input varies in different cases, so I'm trying to use a variable to represent that pattern, so that the same routine - which is otherwise identical - can be used in all cases, rather than separately coding each.
So, I have something along the lines of:
newstring , n = oldstring:gsub(matchstring[i],"%1");
where matchstring[] is an indexed table of the different possible pattern matches, set up so that "%1" will match the target sequence in each matchstring[].
For instance, matchstring[1] might be
"\[User\] <code:%w*>([^<]*)<\\code>.*" -- extract user name from within the <code>...<\code>
while matchstring[2] could be
"\[World\] (%w)* .*" -- extract user name as first word after prefix '[World] '
and matchstring[3] could be
"<code:%w*>([^<]*)<\\code>.*" -- extract username from within <code>...<\code> at start
This does not work.
Yet when, debugging one of the cases, I replace matchstring[i] with the exact same string -- only now passed as a string literal rather than saved in a variable -- it works.
So.. I'm guessing there must be some 'processing' of the string - stripping out special characters or something - when it's sent as a variable rather than a string literal ... but for the life of me I can't figure out how to adjust the matchstring[] entries to compensate!
Help much appreciated...
FACEPALM
Thank you, Piglet, you got me on the right track.
Given how this particular platform processes and passes strings, anything within <...> needed the escape character \ for downstream use, but of course (duh) Lua's gsub itself needs the standard % escape.
Much obliged.

Get a value from the string with regex

I have this for example:
<#445288012218368010>
And I want to get the value between the <# and > symbols.
I tried this:
string.replace(/^(?:\<\#)(?:.*)(?:\>)$/gim, '');
But then I don't get any result. It will delete/remove the whole string.
I want only this part: 445288012218368010 (it will be dynamic, so yeah it will be not the same numbers).
Anyway, this is for a Discord chat bot. I know there are other methods for checking mentioned names, but I want to do this with a regex because what I'm trying to do can't use the common method.
So yeah how can I get the value from between those symbols?
I need this in node.js regex.
You can use String#match, which returns the regular expression matches for the string. In this case the RegExp would be <#(\d+)>; the parentheses around \d+ make it a capturing group. This way, <string>.match(/<#(\d+)>/) gives you the match result and <string>.match(/<#(\d+)>/)[1] gives you the first group of the regex (in this case the number).
Your regex matches, but you use a non-capturing group (?:.*), so you get the full match and replace that with an empty string. Note that you could omit the first and third non-capturing groups and write <# and > directly.
You could match what is between the brackets using a capturing group, ([^>]+) or (\d+), then use replace and refer to the first capturing group as $1 in the replacement.
console.log("<#445288012218368010>".replace(/^<#([^>]+)>$/gim, '$1'));
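A minimal sketch of the String#match approach with a guard for input that contains no <#...> token, since match returns null in that case and indexing it would throw. The function name extractId is hypothetical. The value is kept as a string because an 18-digit ID like this one is larger than Number.MAX_SAFE_INTEGER and would lose precision as a number.

```javascript
// Hypothetical helper: returns the digits between "<#" and ">", or null.
function extractId(text) {
  const result = text.match(/<#(\d+)>/);
  // match() returns null when nothing matches, so guard before indexing.
  return result === null ? null : result[1];
}

console.log(extractId('<#445288012218368010>'));
console.log(extractId('no mention here'));
```

Because the pattern is not anchored with ^ and $, this also finds the token in the middle of a longer message.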

How to find a substring of a double-quoted string with a dollar sign in Groovy

I wanted to correct the automatically created Linux scripts. I use the replaceAll(String, String) function to replace "$APP_ARGS" with something else.
I have tried variants:
replaceAll('"$APP_ARGS"', 'simulators ' + '"\\\\$APP_ARGS"') - doesn't find
replaceAll('\"\$APP_ARGS\"',... - doesn't find
replaceAll('"\$APP_ARGS"',... - doesn't find
replaceAll('\\"\\$APP_ARGS\\"',... - editor warning - excessive escape
replaceAll('"\\\\$APP_ARGS"',... - doesn't find
replaceAll('\\\\"\\\\$APP_ARGS\\\\"',... - doesn't find
replaceAll($/"$$APP_ARGS"/$, ...) - does not find
replaceAll('"[$]APP_ARGS"', 'something simple') - finds.
replaceAll('"[$]APP_ARGS"', '"\\\\$APP_ARGS"') - fails.
As you can see, when I use the regex character-class form, the search works. But is there a way to make escaping work? I need that $ in the replacement string, too.
According to the Groovy manuals, a /../ string needs no escaping for anything except the slashes themselves. But
replaceAll(/"$APP_ARGS"/,...
fails, too, with a message: Could not get unknown property 'APP_ARGS'.
It seems that behaviour of that function has no logic and we have to find the correct solution by experiments.
replaceAll('"\\$APP_ARGS"', 'simulators ' + '"\\$APP_ARGS"')
An additional catch is that the \\ before $ must appear in both strings, the pattern and the replacement.
The first argument of replaceAll is always treated as a regexp, so we need to quote $ (which otherwise means end of line). The second parameter may contain backreferences to groups from the regexp, which start with a $, so that one must be quoted too.
A saner way is to use replace instead of replaceAll: replace treats both parameters as literal text, so no regex quoting is needed.

XML schema restriction pattern for not allowing specific string

I need to write an XSD schema with a restriction on a field, to ensure that
the value of the field does not contain the substring FILENAME at any location.
For example, all of the following must be invalid:
FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456
None of these values should be valid.
In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$/ but the XSD pattern language does not support negative lookahead.
How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$/ ?
I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.
The question "XSD restriction that negates a matching string" covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?
OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)
There are two approaches one can use here.
First, in XSD 1.1, a simple assertion of the form
not(matches($v, 'FILENAME'))
ought to do the job.
Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').
One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.
But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.
Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:
a prefix which contains no occurrences of the character 'F'
zero or more occurrences of 'F', each followed by a string that doesn't begin with 'ILENAME' and doesn't contain any 'F'.
The prefix is easy to match: [^F]*.
The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form
$prefix || '([^F' || next-character-in-forbidden-string || ']'
|| '[^F]*)?'
Then we join all of those regular expressions with or-bars.
The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
Two points to bear in mind:
XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.
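One way to sanity-check the pattern outside an XSD processor is to run it through a general-purpose regex engine with explicit anchors. The sketch below does that in JavaScript and compares each result with a plain substring test. The sample strings beyond those in the question are made up for illustration, and the code assumes JavaScript and XSD agree on the few constructs used here (character classes, groups, ?, * and |).

```javascript
// The XSD 1.0 pattern from the answer, with the readability newlines removed.
const body =
  '[^F]*' +
  '((F([^FI][^F]*)?)' +
  '|(FI([^FL][^F]*)?)' +
  '|(FIL([^FE][^F]*)?)' +
  '|(FILE([^FN][^F]*)?)' +
  '|(FILEN([^FA][^F]*)?)' +
  '|(FILENA([^FM][^F]*)?)' +
  '|(FILENAM([^FE][^F]*)?))*';

// XSD patterns are implicitly anchored; JavaScript regexes are not,
// so the anchors are added explicitly here.
const xsdLike = new RegExp('^(?:' + body + ')$');

const samples = [
  'FILENAME', 'ORIGINFILENAME', 'FILENAMETEST', '123FILENAME456',
  'FEEFIFILENAME', '', 'FILENAM', 'FILE', 'FFILENAMF', 'report.txt',
];

for (const s of samples) {
  const accepted = xsdLike.test(s);
  // A string should be accepted exactly when it lacks the forbidden substring.
  const expected = !s.includes('FILENAME');
  console.log(JSON.stringify(s), accepted, accepted === expected ? 'ok' : 'MISMATCH');
}
```

Dropping the ^ and $ from xsdLike reproduces the first pitfall above: an unanchored test then accepts every input, because the pattern can match the empty string.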
