LUA -- gsub problems -- passing a variable to the match string isn't working [duplicate] - string

This question already has an answer here:
How to match a sentence in Lua
(1 answer)
Closed 1 year ago.
Been stuck on this for over a day.
I'm trying to use gsub to extract a portion of an input string. The exact pattern of the input varies in different cases, so I'm trying to use a variable to represent that pattern, so that the same routine - which is otherwise identical - can be used in all cases, rather than separately coding each.
So, I have something along the lines of:
newstring , n = oldstring:gsub(matchstring[i],"%1");
where matchstring[] is an indexed table of the different possible pattern matches, set up so that "%1" will match the target sequence in each matchstring[].
For instance, matchstring[1] might be
"\[User\] <code:%w*>([^<]*)<\\code>.*" -- extract user name from within the <code>...<\code>
while matchstring[2] could be
"\[World\] (%w)* .*" -- extract user name as first word after prefix '[World] '
and matchstring[3] could be
"<code:%w*>([^<]*)<\\code>.*" -- extract username from within <code>...<\code> at start
This does not work.
Yet when, debugging one of the cases, I replace matchstring[i] with the exact same string -- only now passed as a string literal rather than saved in a variable -- it works.
So.. I'm guessing there must be some 'processing' of the string - stripping out special characters or something - when it's sent as a variable rather than a string literal ... but for the life of me I can't figure out how to adjust the matchstring[] entries to compensate!
Help much appreciated...

FACEPALM
Thankyou, Piglet, you got me on the right track.
Given how this particular platform processes & passes strings, anything within <...> needed the escape character \ for downstream use, but of course - duh - for the lua gsub's processing itself it needed the standard %
much obliged

Related

Using flex to identify variable name without repeating characters

I'm not fully sure how to word my question, so sorry for the rough title.
I am trying to create a pattern that can identify variable names with the following restraints:
Must begin with a letter
First letter may be followed by any combination of letters, numbers, and hyphens
First letter may be followed with nothing
The variable name must not be entirely X's ([xX]+ is a seperate identifier in this grammar)
So for example, these would all be valid:
Avariable123
Bee-keeper
Y
E-3
But the following would not be valid:
XXXX
X
3variable
5
I am able to meet the first three requirements with my current identifier, but I am really struggling to change it so that it doesn't pick up variables that are entirely the letter X.
Here is what I have so far: [a-z][a-z0-9\-]* {return (NAME);}
Can anyone suggest a way of editing this to avoid variables that are made up of just the letter X?
The easiest way to handle that sort of requirement is to have one pattern which matches the exceptional string and another pattern, which comes afterwards in the file, which matches all the strings:
[xX]+ { /* matches all-x tokens */ }
[[:alpha:]][[:alnum:]-]* { /* handle identifiers */ }
This works because lex (and almost all lex derivatives) select the first match if two patterns match the same longest token.
Of course, you need to know what you want to do with the exceptional symbol. If you just want to accept it as some token type, there's no problem; you just do that. If, on the other hand, the intention was to break it into subtokens, perhaps individual letters, then you'll have to use yyless(), and you might want to switch to a new lexing state in order to avoid repeatedly matching the same long sequence of Xs. But maybe that doesn't matter in your case.
See the flex manual for more details and examples.

Finding substring of variable length in bash

I have a string, such as time=1234, and I want to extract just the number after the = sign. However, this number could be in the range of 0 and 100000 (eg. - time=1, time=23, time=99999, etc.).
I've tried things like $(string:5:8}, but this will only work for examples of a certain length.
How do I get the substring of everything after the = sign? I would prefer to do it without outside commands like cut or awk, because I will be running this script on devices that may or may not have that functionality. I know there are examples out there using outside functions, but I am trying to find a solution without the use of such.
s=time=1234
time_int=${s##*=}
echo "The content after the = in $s is $time_int"
This is a parameter expansion matching everything matching *= from the front of the variable -- thus, everything up to and including the last =.
If intending this to be non-greedy (that is, to remove only content up to the first = rather than the last =), use ${s#*=} -- a single # rather than two.
References:
The bash-hackers page on parameter expansion
BashFAQ #100 ("How do I do string manipulations in bash?")
BashFAQ #73 ("How can I use parameter expansion? How can I get substrings? [...])
BashSheet quick-reference, paramater expansion section
if time= part is constant you can remove prefix by using ${str#time=}
Let's say you have str='time=123123' if you execute echo ${str#time=} you would get 123123

XML schema restriction pattern for not allowing specific string

I need to write an XSD schema with a restriction on a field, to ensure that
the value of the field does not contain the substring FILENAME at any location.
For example, all of the following must be invalid:
FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456
None of these values should be valid.
In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$ but the XSD pattern language does not support negative lookahead.
How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$ ?
I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.
The question XSD restriction that negates a matching string covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?
OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)
There are two approaches one can use here.
First, in XSD 1.1, a simple assertion of the form
not(matches($v, 'FILENAME'))
ought to do the job.
Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').
One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.
But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.
Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:
a prefix which contains no occurrences of the character 'F'
zero or more occurrences of 'F' followed by a string that doesn't match 'ILENAME' and doesn't contain any 'F'.
The prefix is easy to match: [^F]*.
The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form
$prefix || '([^F' || next-character-in-forbidden-string || ']'
|| '[^F]*'
Then we join all of those regular expressions with or-bars.
The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
Two points to bear in mind:
XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.

need guidance with basic function creation in MATLAB

I have to write a MATLAB function with the following description:
function counts = letterStatistics(filename, allowedChar, N)
This function is supposed to open a text file specified by filename and read its entire contents. The contents will be parsed such that any character that isn’t in allowedChar is removed. Finally it will return a count of all N-symbol combinations in the parsed text. This function should be stored in a file name “letterStatistics.m” and I made a list of some commands and things of how the function should be organized according to my professors' lecture notes:
Begin the function by setting the default value of N to 1 in case:
a. The user specifies a 0 or negative value of N.
b. The user doesn’t pass the argument N into the function, i.e., counts = letterStatistics(filename, allowedChar)
Using the fopen function, open the file filename for reading in text mode.
Using the function fscanf, read in all the contents of the opened file into a string variable.
I know there exists a MATLAB function to turn all letters in a string to lower case. Since my analysis will disregard case, I have to use this function on the string of text.
Parse this string variable as follows (use logical indexing or regular expressions – do not use for loops):
a. We want to remove all newline characters without this occurring:
e.g.
In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
In my younger and more vulnerableyears my father gave me some advicethat I’ve been turning over in my mindever since.
Replace all newline characters (special character \n) with a single space: ' '.
b. We will treat hyphenated words as two separate words, hence do the same for hyphens '-'.
c. Remove any character that is not in allowedChar. Hint: use regexprep with an empty string '' as an argument for replace.
d. Any sequence of two or more blank spaces should be replaced by a single blank space.
Use the provided permsRep function, to create a matrix of all possible N-symbol combinations of the symbols in allowedChar.
Using the strfind function, count all the N-symbol combinations in the parsed text into an array counts. Do not loop through each character in your parsed text as you would in a C program.
Close the opened file using fclose.
HERE IS MY QUESTION: so as you can see i have made this list of what the function is, what it should do, and using which commands (fclose etc.). the trouble is that I'm aware that closing the file involves use of 'fclose' but other than that I'm not sure how to execute #8. Same goes for the whole function creation. I have a vague idea of how to create a function using what commands but I'm unable to produce the actual code.. how should I begin? Any guidance/hints would seriously be appreciated because I'm having programmers' block and am unable to start!
I think that you are new to matlab, so the documentation may be complicated. The root of the problem is the basic understanding of file I/O (input/output) I guess. So the thing is that when you open the file using fopen, matlab returns a pointer to that file, which is generally called a file ID. When you call fclose you want matlab to understand that you want to close that file. So what you have to do is to use fclose with the correct file ID.
fid = open('test.txt');
fprintf(fid,'This is a test.\n');
fclose(fid);
fid = 0; % Optional, this will make it clear that the file is not open,
% but it is not necessary since matlab will send a not open message anyway
Regarding the function creation the syntax is something like this:
function out = myFcn(x,y)
z = x*y;
fprintf('z=%.0f\n',z); % Print value of z in the command window
out = z>0;
This is a function that checks if two numbers are positive and returns true they are. If not it returns false. This may not be the best way to do this test, but it works as example I guess.
Please comment if this is not what you want to know.

Simple string replacement set of rules

I have an application where users set up a bunch of objects by filling up a bunch of text boxes which represent values that these objects will take. Just like setting up a Person object which requires you to enter a Name and a LastName properties.
Now I want to introduce global variables that the user will be able to assign values to, or which's values will change during the execution of the program. And I want the user to be able to use them when filling up any object's properties. My first idea was to choose an special character that will mark the beginning of a variable name, and then let the user use the character itself twice to represent the character itself.
For instance, say I have a global variable called McThing. Then, say the symbol I choose to mark the beginning of a variable is %. The user would then be able to enter as a person's last name the string "Mc. %McThing", which then I'd replace using the value of McThing. If McThing's value is "Donalds", the last name would become "Mc. Donalds".
The problem with this is that, if I'd have a variable called He and another called Hello and the user enters "%Hello" as the string I wouldn't know which variable needs to be replaced. I could change my rules to, for instance, use the "%" symbol to mark both the beginning and the end of the variable name. But I'm not sure whether this will cause any other problem.
What would be the simplest possible set of rules to achieve this such that the user will be able to represent every possible string without ambiguities? Ideally, the variable names can have any character but I could restrict their names to a given set of characters.
Your approach of marking both beginning and end with % has one problem. What happens if the input string is %foo%%bar%? Do I get the value of foo and the value of bar? Or do I get the value of foo%bar? (Of course if % in variable names isn't allowed, this isn't a problem.)
The simplest way I can think of to avoid this problem is to use one symbol for the beginning and another (e.g. #) for the end. That should avoid any ambiguity. If the user wants a # in text or a variable name, he escapes it like so: %#. This causes no problems, since empty variable names are not a thing (at least I hope not).
It will be fine and easy to implement on the assumptions that:
You have no empty variable names (i.e. if we see a %% is that a % or an empty variable name?)
Variable names cannot contain %s (i.e. if we see a % in a variable name, is that the end or a %?)
OR
Variable names cannot start or end with % and you cannot have 2 variable names in a row
(i.e. is %a%%b% = a and b or a%b?)
These assumptions will ensure that any %% always represents a % character, and any % always represents the start or the end of the string.
These assumptions might not necessarily be required, but at the very least they will make the implementation a lot more difficult (with the above assumptions, we never have to look more than 1 character forward).
An alternative with no such restrictions, loosely based on the way C/Java/etc. does it:
Have % take on a role similar to \ in C/Java/etc. - use:
%s to denote the start of the string
%e to denote the end of the string
%% to denote the % character
You can also use the same characters to represent the start and end, but, we may as well make them different so it's easier to read.

Resources