Allowing only usernames using "reasonable" characters

A username for a website can contain the space character, and yet it cannot be composed only of space characters. It can contain some symbols (like underscore and dash), but starting with certain symbols would look weird. Non-latin letters should be allowed, preferably for all languages, but tab and newline characters shouldn't. And definitely no Zalgo.
The rules composing what should and shouldn't be allowed in a reasonable naming system are complicated, however they are virtually the same for every website. Reimplementing them is probably a bad idea. Where can I find an implementation? I'm using PHP.

You should validate the username entered by the new user against a regular expression that checks it against the allowed character set.
Example: the following allows only English alphanumeric characters plus -, _ and .
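// Returns true when $name contains no character outside the allowed set.
function isNewUsernameValid($name, $filter = "[^a-zA-Z0-9\-\_\.]") {
    // The filter is a negated class, so any match means a disallowed character.
    return !preg_match("~" . $filter . "~iU", $name);
}

if (!isNewUsernameValid($name)) {
    print "Not a valid name.";
}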
For your particular case, you'll have to come up with and test the regular expression.
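The rules listed in the question go beyond what a single ASCII character class can express. As a rough illustration only (written in Python rather than PHP, and not a standard implementation), the sketch below checks Unicode general categories instead of fixed ranges. The set of allowed symbols, the limit on consecutive combining marks used to reject Zalgo text, and the function name are all assumptions made for this example.

```python
import unicodedata

# Assumption: the only non-alphanumeric characters we accept.
ALLOWED_SYMBOLS = {"_", "-", " "}
# Assumption: more than this many combining marks in a row counts as Zalgo.
MAX_COMBINING_RUN = 2


def is_reasonable_username(name: str) -> bool:
    """Return True if `name` passes the illustrative rules described above."""
    # Compose characters first so "e" + combining acute becomes a single letter.
    name = unicodedata.normalize("NFC", name)
    if not name:
        return False
    # The first character must be a letter or digit in any script.
    # This also rejects names made only of spaces or starting with a symbol.
    if unicodedata.category(name[0])[0] not in ("L", "N"):
        return False

    combining_run = 0
    for ch in name:
        major = unicodedata.category(ch)[0]
        if major == "M":
            # Combining marks are needed by many scripts, so allow short runs.
            combining_run += 1
            if combining_run > MAX_COMBINING_RUN:
                return False
            continue
        combining_run = 0
        if major in ("L", "N") or ch in ALLOWED_SYMBOLS:
            continue
        # Everything else (tabs, newlines, other punctuation, symbols) is rejected.
        return False
    return True
```

With these assumptions, names such as "john_doe" or "Иван Петров" pass, while "_admin", a name containing a tab, or a letter stacked with many combining marks do not. The thresholds would need tuning against real data.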

Related

How can I use arbitrary text as a function name in Rust?

Is there a way in Rust to use any text as a function name? Something like:
fn 'This is the name of the function' { ... }
I find it useful for test functions and it is allowed by other languages.

There's no way. According to the official reference:
An identifier is any nonempty ASCII string of the following form:
Either
The first character is a letter.
The remaining characters are alphanumeric or _.
Or
The first character is _.
The identifier is more than one character. _ alone is not an identifier.
The remaining characters are alphanumeric or _.
A raw identifier is like a normal identifier, but prefixed by r#. (Note that
the r# prefix is not included as part of the actual identifier.)
Unlike a normal identifier, a raw identifier may be any strict or reserved
keyword except the ones listed above for RAW_IDENTIFIER.

You can't have spaces in function names (and this is true of most programming languages). Usual practice for function names in Rust is to replace spaces with underscores, so the following is allowed:
fn This_is_the_name_of_the_function() { ... }
although usual practice would use a lower-case t.

Perl critic policy violation in checking index of substring in a string

for my $item (@array) {
    if (index($item, '$n') != -1) {
        print "HELLO\n";
    }
}
Problem is: Perl critic gives below policy violation.
String may require interpolation at line 168, near '$item, '$n''. (Severity: 1)
Please advise how do I fix this?

In this case the analyzer either found a bug or is plain wrong in flagging your code.
Are you looking for a literal "$n" in $item, or for what $n variable evaluates to?
If you want to find the literal $n characters then there is nothing wrong with your code.
If you expect $item to contain the value stored in the $n variable then allow it to be evaluated:
if (index($item, $n) != -1)
If this is indeed the case but $n may also contain other escape sequences or encodings which you need as literal characters (so their evaluation must be suppressed) then you may need to do a bit more, depending on what exactly may be in that variable.
If you do need to find the character $ followed by n (which would explain deliberately putting single quotes around a variable) you need to handle the warning.
For the particular policy that is violated see Perl::Critic::Policy::ValuesAndExpressions::RequireInterpolationOfMetachars
This policy warns you if you use single-quotes or q// with a string that has unescaped metacharacters that may need interpolation.
To satisfy the policy you'd need to use double quotes and escape the $, for example qq(\$n). In my opinion this would change the fine original code segment into something strange to look at.
If you end up wanting to simply silence the warning, see the "Bending The Rules" section of the Perl::Critic documentation.
A comment. The tool perlcritic is useful but you have to use it right. It's a static code analyzer and it doesn't know what your program is doing, so to say; it can catch bad practices but can't tell you how to write programs. Many of its "policies" are unsuitable for particular code.
The book that it is based on says all this very nicely in its introduction. Use sensibly.
When I look at the question where this comes from it appears that you are looking for index at which substrings were matched, so you need the content of $n variable, not literal "$n". Then perlcritic identified a bug in the code, good return for using it!

Exclude some characters in Unicode category

I'm trying to implement a rule along the lines of "all characters in the Letter and Symbol Unicode categories except a few reserved characters." From the lexer rules, I know I can use \p{___} to match against Unicode categories, but I am unsure of how to handle excluding certain characters.
Looking at example grammars, I am led a few different directions. For example, the Java 9 grammar seems to use predicates in order to directly use Java's built in isJavaIdentifier() while others manually define every valid character.
How can I achieve this functionality?

Without target specific code, you will have to define the ranges yourself so that the chars you want to exclude are not part of these ranges. You cannot use \p{...} and then exclude certain characters from it.
With target specific code, you can do as in the Java 9 grammar:
@lexer::members {
  boolean aCustomMethod(int character) {
    // Your logic to see if 'character' is valid. You're sure
    // that it's at least a char from \p{Letter} or \p{Symbol}
    return true;
  }
}
TOKEN
  : [\p{Letter}\p{Symbol}] {aCustomMethod(_input.LA(-1))}?
  ;
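The logic that such a predicate has to carry out can be sketched independently of ANTLR. The Python snippet below is only an illustration of "category membership minus a reserved set". The reserved characters and the function name are hypothetical, and in a real grammar the same check would be written in the target language of the generated lexer.

```python
import unicodedata

# Hypothetical reserved characters that must be rejected even though they
# belong to the Letter or Symbol categories.
RESERVED = frozenset("$=λ")


def is_allowed_char(ch: str) -> bool:
    """True if `ch` is a Letter or Symbol code point that is not reserved."""
    # The first letter of the general category is the major class:
    # "L" for letters, "S" for symbols, "N" for numbers, and so on.
    major = unicodedata.category(ch)[0]
    return major in ("L", "S") and ch not in RESERVED
```

Here "a" and "+" are accepted, "$" and "λ" are rejected because they are reserved, and "1" is rejected because it is neither a letter nor a symbol.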

XML schema restriction pattern for not allowing specific string

I need to write an XSD schema with a restriction on a field, to ensure that
the value of the field does not contain the substring FILENAME at any location.
For example, all of the following must be invalid:
FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456
None of these values should be valid.
In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$/ but the XSD pattern language does not support negative lookahead.
How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$/ ?
I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.
The question XSD restriction that negates a matching string covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?

OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)
There are two approaches one can use here.
First, in XSD 1.1, a simple assertion of the form
not(matches($value, 'FILENAME'))
ought to do the job.
Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').
One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.
But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.
Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:
a prefix which contains no occurrences of the character 'F'
zero or more occurrences of 'F' followed by a string that doesn't match 'ILENAME' and doesn't contain any 'F'.
The prefix is easy to match: [^F]*.
The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form
'(' || $prefix || '([^F' || next-character-in-forbidden-string || ']'
|| '[^F]*' || ')?)'
Then we join all of those regular expressions with or-bars.
The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
Two points to bear in mind:
XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.
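The construction described above can be generated mechanically. The following Python sketch builds the pattern from a forbidden string and checks it with `re.fullmatch`, which mimics the implicit anchoring of XSD patterns. It is an illustration under two assumptions that are not part of the original answer: the forbidden string consists only of letters and digits (so no escaping is needed), and its first character does not occur again later in the string (true for 'FILENAME'; other strings need a more careful construction). The function names are made up for this example.

```python
import re


def build_exclusion_pattern(forbidden: str) -> str:
    """Build a lookahead-free pattern matching strings without `forbidden`."""
    if len(forbidden) < 2 or not forbidden.isalnum():
        raise ValueError("sketch only handles alphanumeric strings of length >= 2")
    first = forbidden[0]
    if first in forbidden[1:]:
        raise ValueError("sketch assumes the first character does not recur")

    alternatives = []
    # One alternative per proper prefix: the prefix, optionally followed by a
    # character that is neither `first` nor the next forbidden character,
    # and then any run of characters other than `first`.
    for i in range(1, len(forbidden)):
        prefix, next_char = forbidden[:i], forbidden[i]
        alternatives.append(f"({prefix}([^{first}{next_char}][^{first}]*)?)")
    return f"[^{first}]*(" + "|".join(alternatives) + ")*"


def is_allowed_value(value: str, forbidden: str) -> bool:
    """Check `value` the way an anchored XSD pattern facet would."""
    return re.fullmatch(build_exclusion_pattern(forbidden), value) is not None
```

For 'FILENAME' this produces the same expression as the one written out above with the newlines removed. Note that Python's `re` syntax is not identical to the XSD regular expression language, so the generated string should still be verified with an actual schema processor.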

String pattern or String manipulation to search and replace a pattern in lua

I get the list of domains on a system and I need to replace only the patterns which contain "domain\username" with '*'.
As of now I am able to mask the domain names with * using string.gsub(), but what pattern should I add to make sure any occurrence of domain\username is replaced with *?
Example:
Suppose the system has two domains, test.com and work-user.com, and two users, admin and guest, and a file contains the following details:
User tried to login from TEST\admin; but should have logged in from work-user\user1, No logs present for testing\guest, account.
The domain test.com and WORK-USER.org are active and TESTING domain in inactive.
Then the output should look like this:
User tried to login from *********; but should have logged in from ********\user1, No logs present for testing\*****, account.
The domain ****.com and *********.org are active and TESTING domain in inactive.
Since Testing and user1 are not the domain and username on that system, they should not get replaced.
I have the logic to replace the username and domain name independently in any given format, but when it is the format of domain\username I am not able to replace it.
I have to add some logic\pattern after I get the domain name so it matches the above requirement.
Can you please let me know how to proceed?
I tried the below code:
test_string="User tried to login from TEST\\admin; but should have logged in from work-user\\user1, No logs present for testing\\guest, account. The domain test.com and WORK-USER.org are active and TESTING domain in inactive"
s= "test"
t=( string.gsub(s.."$DNname", "%$(%w+)", {DNname="\\([%w_]+)"}) )
n=( string.gsub(s.."$DNname", "%$(%w+)", {DNname="\\([%a%d]+)([%;%,%.%s]?)"}) )
print (t)
print(n)
r=string.match(test_string,t)
res=string.match(test_string,n)
print(r)
print(res)
It is printing nil, and is not able to match any pattern

First let's talk about why your code doesn't work.
For one thing, your patterns both have a backslash in them, so you are right away missing anything without a backslash:
print(t) -- test\([%w_]+)
print(n) -- test\([%a%d]+)([%;%,%.%s]?)
But there is also another problem. The only thing with a backslash that ought to match in your test message is TEST\admin. But here TEST is all uppercase, and pattern matching is case sensitive, so you will not find it.
The first part of the answer, then, is to make a case-insensitive pattern. This can be done as follows:
s= "[Tt][Ee][Ss][Tt]"
Here I have replaced each letter with the character class that will match either the uppercase or lowercase letter.
What happens if we look for this pattern in the original message, though? We will have an unfortunate problem: we will find testing and TESTING. It looks like you may have already encountered this problem as you wrote "([%;%,%.%s]?)".
The better way to do this is the frontier pattern. (Note that the frontier pattern is an undocumented feature in Lua 5.1. I'm not sure if it is in Lua 5.0 or not. It became a documented feature in Lua 5.2.)
The frontier pattern takes a character set and matches only at a position between two characters where the previous character is not in the set and the next character is in the set; it consumes no characters itself. It sounds complicated, but basically it lets you find the beginnings or endings of words.
To use the frontier pattern, we need to figure out what a domain or username might look like. We may not be able to do this perfectly, but, in practice, being overly greedy should be fine.
s = "%f[%w-][Tt][Ee][Ss][Tt]%f[^%w-]"
This new pattern will match "TEST" and "test", but will not match "TESTING" or "testing".
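For readers more used to other regex engines, the same whole-token idea can be expressed with lookarounds. The Python sketch below is a swapped-in technique, not Lua's frontier pattern: a negative lookbehind and a negative lookahead play the role of the two %f items, and `re.IGNORECASE` replaces the per-letter character classes. The function names and the default set of token characters are assumptions chosen to mirror the Lua example.

```python
import re


def whole_token_pattern(name: str, token_chars: str = r"A-Za-z0-9\-") -> re.Pattern:
    """Compile a case-insensitive pattern matching `name` only as a whole token."""
    # The lookarounds assert that no token character touches the match on
    # either side, which is what the two frontier items do in the Lua pattern.
    before = f"(?<![{token_chars}])"
    after = f"(?![{token_chars}])"
    return re.compile(before + re.escape(name) + after, re.IGNORECASE)


def find_whole_tokens(name: str, text: str) -> list:
    """Return every whole-token occurrence of `name` in `text`."""
    return [m.group(0) for m in whole_token_pattern(name).finditer(text)]


def mask_whole_tokens(name: str, text: str) -> str:
    """Replace each whole-token occurrence of `name` with asterisks."""
    return whole_token_pattern(name).sub(lambda m: "*" * len(m.group(0)), text)
```

Applied to a text containing TEST\admin, test.com, testing\guest and TESTING, searching for "test" finds only the first two, just as the Lua pattern above does.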
Before proceeding, let's look at a problem that might occur with a domain like your "work-user". The character "-" has a special meaning in patterns, so we must escape it. All special characters can be escaped by adding a "%" in front. So, our work-user pattern would look like:
s = "%f[%w-][Ww][Oo][Rr][Kk]%-[Uu][Ss][Ee][Rr]%f[^%w-]"
Well, these kinds of patterns are sort of awful to write out, so let's try to write a function to do it for us:
function string_to_pattern(str, frontier_set, ci)
  -- escape magic characters; "-" is written as "%-" so that "+-?" is not
  -- read as a range, which would also escape digits and break the pattern
  str = str:gsub("[][^$()%%.*+%-?]", "%%%0")
  if ci then
    -- make the resulting pattern case-insensitive
    str = str:gsub("%a", function(letter)
      return "["..letter:upper()..letter:lower().."]"
    end)
  end
  if frontier_set then
    str = "%f["..frontier_set.."]"..str.."%f[^"..frontier_set.."]"
  end
  return str
end
print(string_to_pattern("work-user", "%w-", true))
-- %f[%w-][Ww][Oo][Rr][Kk]%-[Uu][Ss][Ee][Rr]%f[^%w-]
I'll go ahead and mention the corner case now: this pattern will not match "-work-user" or "work-user-". This may be okay or not depending on what kind of messages get generated. You could take "-" out of the frontier set, but then you would match e.g. "my-work-user". You can decide if this matters, but I haven't thought about how to solve it with Lua's pattern matching language.
Now, how do we replace a match with *'s? This part is pretty easy. The built-in string.gsub function will allow us to replace matches of our patterns with other strings. We just need to generate a replacement string that consists of as many *'s as characters.
function string_to_stars(str)
  return ("*"):rep(str:len())
end
local pattern = string_to_pattern("test", "%w-", true)
print( (test_string:gsub(pattern, string_to_stars)) )
Now, there's a final problem. We can match users in the same way we match domains. For example:
-- note the different frontier_set here
-- I don't know what the parameters for your usernames are,
-- but this matches your code
local pattern = string_to_pattern("admin", "%w_", true)
print( (test_string:gsub(pattern, string_to_stars)) )
However, even if we replace all the domains and usernames separately, the backslash between "TEST" and "admin" in "TEST\admin" will not be replaced. We could do a hack like this:
test_string:gsub("%*\\%*","***")
This would replace "*\*" with "***" in the final output. However, this is not quite robust because it could replace a "*\*" that was in the original message and not a result of our processing. To do things properly, we would have to iterate over all domain+user pairs and do something like this:
test_string:gsub(domain_pattern .. "\\" .. user_pattern, string_to_stars)
Note that this must be done before any other replacements, as otherwise the domain and username will have already been replaced, and can no longer be matched.
Now that the problem is solved in that way, let me suggest an alternative approach that reflects something more like what I would write from scratch. I think it is probably simpler and more readable. Instead of using pattern matching to find our domains and usernames exactly, let's instead just match tokens that could be domains or usernames and then check if they match exactly later.
local message = -- broken into multiple lines only for
                -- formatting reasons
  "User tried to login from TEST\\admin; but should "
  .."have logged in from work-user\\user1, No logs present "
  .."for testing\\guest, account. The domain test.com and "
  .."WORK-USER.org are active and TESTING domain in inactive"
-- too greedy, but may not matter in your case
local domain_pattern = "%w[%w-]*"
-- again, not sure
local user_pattern = "[%w_]+"
-- for case-insensitivity, call :lower before inserting into the set
local domains = {["test"]=true, ["work-user"]=true}
local users = {["admin"]=true, ["guest"]=true}
local pattern = "(("..domain_pattern..")\\("..user_pattern.."))"
message = message:gsub(pattern, function(whole, domain, user)
  -- only call lower if case-insensitive
  if domains[domain:lower()] and users[user:lower()] then
    return string_to_stars(whole)
  else
    return whole
  end
end)
local function replace_set(message, pattern, set, ci)
  return (message:gsub(pattern, function(str)
    if ci then str = str:lower() end
    if set[str] then
      return string_to_stars(str)
    else
      return str
    end
  end))
end
message = replace_set(message, domain_pattern, domains, true)
message = replace_set(message, user_pattern, users, true)
print(message)
Notice how simple the patterns are in this example. We no longer need case-insensitive character classes like "[Tt]" because the case-insensitivity is checked after the matching by forcing both strings to be lowercase with string.lower (which may not be maximally efficient, but, hey, this is Lua). We no longer need to use the frontier pattern because we are guaranteed to get full words because of greedy matching. The backslash case is still weird, but I've handled it in the same "robust" way as I suggested above.
A final note: I don't know exactly why you're doing this, but I can guess that it is to prevent someone from seeing domains or usernames. Replacing them with *'s is not necessarily the best way to go. First, doing matching in these ways could be problematic if your messages are (for example) delimited with letters. This seems unlikely for user-friendly messages, but I don't know whether that's something you should count on when security is at stake. Another thing is that you are not hiding the lengths of the domains or usernames. This can also be a major source of insecurity. For example, a user might reasonably guess that ***** is "admin".
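One way to avoid leaking lengths is to substitute a constant marker instead of one asterisk per character. The Python sketch below follows the "match candidate tokens, then look them up in a set" approach from the second Lua example, in a single pass so that inserted markers are never rescanned. It is an illustration only: the marker text, the token character class and the function name are assumptions, and it does not address the other concern about how messages are delimited.

```python
import re

# Hypothetical constant marker; its length says nothing about the hidden name.
PLACEHOLDER = "[hidden]"

# A candidate token, optionally followed by a backslash and a second token.
TOKEN = re.compile(r"[A-Za-z0-9_-]+(?:\\[A-Za-z0-9_-]+)?")


def redact(message: str, domains, users) -> str:
    """Replace known domains, users and domain\\user pairs with PLACEHOLDER."""
    known_domains = {d.lower() for d in domains}
    known_users = {u.lower() for u in users}
    known = known_domains | known_users

    def hide(part: str) -> str:
        return PLACEHOLDER if part.lower() in known else part

    def replace(match: re.Match) -> str:
        token = match.group(0)
        domain, sep, user = token.partition("\\")
        if sep and domain.lower() in known_domains and user.lower() in known_users:
            # A known pair collapses into one marker, backslash included.
            return PLACEHOLDER
        if sep:
            # Only one side may be known: treat the two parts independently.
            return hide(domain) + sep + hide(user)
        return hide(token)

    return TOKEN.sub(replace, message)
```

Called with the domains test and work-user and the users admin and guest, this hides TEST\admin as a single marker, hides only the domain in work-user\user1 and only the user in testing\guest, and leaves TESTING untouched.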
