Sanitize string for comparison in Matlab - string

This is a follow-up question from this that considered evalc, instead of fiddling with file descriptors manually. Below is an example of poor sanitization. I want to remove things such as trailing characters, all whitespace, all newlines etc -- the things that usually cause unexpected results -- is there a ready-made sanitization command to do this?
EDU>> a
a =
1 +1*{x} -1*{y}*{z}
EDU>> b
b =
1 +1*{x} -1*{y}*{z}
EDU>> isequal(a,b)
ans =
0

I don't know whether a ready-made, robust implementation exists, but this works pretty well:
xx=@(x)regexprep(x,'\s',''); isequal(xx(a),xx(b))
Here I use an anonymous function to remove oddities such as trailing whitespace and newlines, which are often hard to see in the command window.
The commands strtrim() and deblank() can also be useful for removing leading or trailing whitespace.

Related

Python3 strip() gets unexpected result

It's a weird problem
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120"
And two strings below:
s1="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\[Content_Types].xml"
s2="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\_rels\.rels"
When I use the command below:
s1.strip(to_be_stripped)
s2.strip(to_be_stripped)
I get these outputs:
'[Content_Types].x'
'_rels\\.'
If I use lstrip(), they will be:
'[Content_Types].xml'
'_rels\\.rels'
These are the correct outputs.
However, if we replace every ProjectKnown with zeus_pipeline:
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
And:
s2="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120\\_rels\.rels"
then s2.lstrip(to_be_stripped) will be '.rels'.
If I use / instead of \\, nothing goes wrong. I am wondering why this problem happens.
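strip isn't meant to remove full strings exactly. Rather, you give it a string, and every character in that string is removed from the start and end of the string to be stripped. The same applies to lstrip, which explains the '.rels' result: zeus_pipeline adds the character _ to the set, so lstrip no longer stops at the underscore in _rels and keeps removing characters until it reaches the dot.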
In your case, the variable to_be_stripped contains the characters m and l, so those are stripped from the end of s1. However, it doesn't contain the character x, so the stripping stops there and no characters beyond that are removed.
Check out this question. The accepted answer is probably more extensive than you need - I like another user's suggestion of using replace instead of strip. This would look like:
s1.replace(to_be_stripped, "")
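To make the difference concrete, here is a small sketch comparing the character-set behaviour of strip with exact prefix removal. The helper name remove_prefix is made up for illustration and is not from the original posts; on Python 3.9 and later the built-in str.removeprefix does the same job.

```python
def remove_prefix(text, prefix):
    """Return text without the leading prefix (exact match only)."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


to_be_stripped = "D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120"
s1 = to_be_stripped + "\\[Content_Types].xml"

# strip() treats its argument as a set of characters, so the trailing
# 'm' and 'l' (both occur in to_be_stripped) are removed as well.
print(s1.strip(to_be_stripped))           # [Content_Types].x

# Exact prefix removal leaves the rest of the string untouched.
print(remove_prefix(s1, to_be_stripped))  # \[Content_Types].xml
```

Unlike replace, this only touches the start of the string, so a second occurrence of the prefix later in the path would be left alone.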

Determine when an SQL alias can be an open name

What would be the highest-performing implementation to determine if a string that represents an SQL alias needs to be wrapped in double-quotes?
Presently, in pg-promise I am wrapping every alias in double-quotes, to play it safe. I am looking to make the output SQL neater and shorter, if possible.
And I am divided which approach is the best -
to use a regular expression, somehow
to do a direct algorithm with strings
not to change it at all, if there are reasons for that
Basically, I am looking to improve function as.alias, if possible, not to wrap aliases into double quotes when it is not needed.
What have I tried so far...
I thought at first to do it only for the 99% of all cases - not to add double-quotes when your alias is the most typical one, just a simple word:
function skipQuotes(alias) {
    const m = alias.match(/[A-Z]+|[a-z]+/);
    return m && m[0] === alias;
}
This only checks it is a single word that uses either upper or lower case, but not the combination.
SOLUTION
Following the answer, I ended up with an implementation that should cover 99% of all practical use cases, which is what I was trying to achieve:
const m = alias.match(/[a-z_][a-z0-9_$]*|[A-Z_][A-Z0-9_$]*/);
if (m && m[0] === alias) {
    // double quotes will be skipped
} else {
    // double quotes will be added
}
i.e. the surrounding double quotes are not added when the alias uses a simple syntax:
it is a same-case single word, without spaces
it can contain underscores, and can start with one
it can contain digits and $, but cannot start with those
Removing double quotes is admirable -- it definitely makes queries easier to read. The rules are pretty simple. A "valid" identifier consists of:
Letters (including diacritical marks), numbers, underscore, and dollar sign.
Starts with a letter (including diacriticals) or underscore.
Is not a reserved word.
(I think I have this summarized correctly. The real rules are in the documentation.)
The first two are readily implemented using regular expressions. The last probably wants a reference table for lookup (and the list varies by Postgres release -- although less than you might imagine).
Otherwise, the identifier needs to be surrounded by escape characters. Postgres uses double quotes (which is ANSI standard).
One reason you may want to do this is because Postgres converts identifiers to lower case for comparison. So, the following works fine:
select xa, Xa, xA, "xa"
from (select 1 as Xa) y
However, this does not work:
select Xa
from (select 1 as "Xa") y
Nor does:
select "Xa"
from (select 1 as Xa) y
In fact, there is no way to refer to "Xa" without using quotes (at least none that I can readily think of).
Enforcing the discipline of exact matches can be a good thing or a bad thing. I find that one discipline too many: I admit to often ignoring case when writing "casual" code; it is just simpler to type without capitalization (or using double quotes). For more formal code, I try to be consistent.
On the other hand, the rules do allow:
select "Xa", "aX", ax
from (select 1 as "Xa", 2 as "aX", 3 as AX) y
(This returns 1, 2, 3.)
I would be happy if this naming convention were not allowed.
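The identifier rules described in this answer can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pg-promise implementation: the names needs_quotes and as_alias are hypothetical, the reserved-word set is a tiny sample rather than the real list, and only lower-case ASCII names are left unquoted (because unquoted identifiers are folded to lower case, as shown above). Names with diacritics are simply quoted, which is conservative but safe.

```python
import re

# Illustrative sample only; the full reserved-word list is in the
# PostgreSQL documentation and varies by release.
RESERVED_SAMPLE = {"select", "from", "where", "table", "user", "order", "group"}

# Assumption: starts with a lower-case letter or underscore, followed by
# lower-case letters, digits, underscores or dollar signs.
SIMPLE_NAME = re.compile(r"[a-z_][a-z0-9_$]*")


def needs_quotes(alias):
    """True when the alias cannot safely be written as an open name."""
    if SIMPLE_NAME.fullmatch(alias) is None:
        return True
    return alias in RESERVED_SAMPLE


def as_alias(alias):
    """Wrap the alias in double quotes only when required."""
    if needs_quotes(alias):
        # An embedded double quote is escaped by doubling it.
        return '"' + alias.replace('"', '""') + '"'
    return alias


print(as_alias("total_count"))  # total_count
print(as_alias("Xa"))           # "Xa"
print(as_alias("order"))        # "order"
```

Quoting every mixed-case alias keeps the emitted name identical to the requested one, at the cost of a few extra quotes.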

J string manipulation using only builtins

You are given a string like ))()(())(, and you wish to remove all instances of () from the string, which in this case means these two instances:
))()(())(
^^ ^^
leaving only ))()(.
I know you can use the library function stringreplace, or you could load up a regex library, but what I want to know is whether there is a concise way of accomplishing this with the J builtin operators exclusively.
I should clarify that my own solution was:
#~(-.@+._1&|.)@('()'&E.)
which I consider verbose -- so any similar solutions would not qualify as "concise" in my book. I'm really asking if there is a way to use a builtin (or maybe a simple combination of 2) to solve this directly. I expect this answer is no.
I think you are right that there is no ultra-concise way of expressing the operation you want to perform using just J primitives. The version I came up with was very much like the one Dan suggested above.
However given that a built in library verb rplc (based on stringreplace) performs exactly the operation you are after, I'm not sure why it would be better to replace it with a primitive.
'))()(())(' rplc '()';''
))()(
Having said that, if you can come up with a compelling case, then there is probably no reason it couldn't be added.
Not sure how concise it is, but I think that this will work:
deparen=. (-.@:(+/)@:(_1&|. ,: ])@:E. # ])
'()' deparen '))()(())('
))()(
Essentially the work is done by -. @: (+/) @: (_1&|. ,: ] )@:E. which creates a bit string that removes the '()' instances when # (Copy) is applied to the right argument.
E. identifies the positions of '()' using a bit string. Shift and laminate to get the positions of '(' and ')', add them together to get 1 1 in the string wherever there is a '()', and then negate so that these positions become 0 0 and are removed by Copy.
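For readers who do not read J, the same mask-and-copy idea can be sketched in Python. The function name remove_pairs is made up for illustration, and the sketch assumes a two-character pattern, as in the question. Like the J versions, it makes a single pass and does not re-scan the result.

```python
def remove_pairs(text, pair="()"):
    """Remove every occurrence of a two-character pair in a single pass."""
    n = len(text)
    # Like E.: True at each index where the pair starts.
    starts = [text.startswith(pair, i) for i in range(n)]
    # Shift right by one to mark the second character of each pair.
    shifted = [False] + starts[:-1]
    # Keep a character only if it is in neither marked position.
    keep = [not (a or b) for a, b in zip(starts, shifted)]
    # Like Copy (#): select the characters whose mask bit is set.
    return "".join(ch for ch, k in zip(text, keep) if k)


print(remove_pairs("))()(())("))  # ))()(
```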

String pattern or String manipulation to search and replace a pattern in lua

I get the list of domains on a system and I need to replace only the patterns which contain "domain\username" with '*'.
As of now I am able to mask the domain names with * using string.gsub(), but what pattern should I add to make sure that any occurrence of domain\username is replaced with *?
Example:
Suppose the system has two domains, test.com and work-user.com, and two users, admin and guest, and a file contains the following details:
User tried to login from TEST\admin; but should have logged in from work-user\user1, No logs present for testing\guest, account.
The domain test.com and WORK-USER.org are active and TESTING domain in inactive.
Then the output should look like this:
User tried to login from *********; but should have logged in from ********\user1, No logs present for testing\*****, account.
The domain ****.com and *********.org are active and TESTING domain in inactive.
Since Testing and user1 are not the domain and username on that system, they should not get replaced.
I have the logic to replace the username and domain name independently in any given format, but when it is the format of domain\username I am not able to replace it.
I have to add some logic\pattern after I get the domain name so it matches the above requirement.
Can you please let me know how to proceed?
I tried the below code:
test_string="User tried to login from TEST\\admin; but should have logged in from work-user\\user1, No logs present for testing\\guest, account. The domain test.com and WORK-USER.org are active and TESTING domain in inactive"
s= "test"
t=( string.gsub(s.."$DNname", "%$(%w+)", {DNname="\\([%w_]+)"}) )
n=( string.gsub(s.."$DNname", "%$(%w+)", {DNname="\\([%a%d]+)([%;%,%.%s]?)"}) )
print(t)
print(n)
r=string.match(test_string,t)
res=string.match(test_string,n)
print(r)
print(res)
It is printing nil, and is not able to match any pattern
First let's talk about why your code doesn't work.
For one thing, your patterns both have a backslash in them, so you are right away missing anything without a backslash:
print(t) -- test\([%w_]+)
print(n) -- test\([%a%d]+)([%;%,%.%s]?)
But there is also another problem. The only thing with a backslash that ought to match in your test message is TEST\admin. But here TEST is all uppercase, and pattern matching is case sensitive, so you will not find it.
The first part of the answer, then, is to make a case-insensitive pattern. This can be done as follows:
s= "[Tt][Ee][Ss][Tt]"
Here I have replaced each letter with the character class that will match either the uppercase or lowercase letter.
What happens if we look for this pattern in the original message, though? We will have an unfortunate problem: we will find testing and TESTING. It looks like you may have already encountered this problem as you wrote "([%;%,%.%s]?)".
The better way to do this is the frontier pattern. (Note that the frontier pattern is an undocumented feature in Lua 5.1. I'm not sure if it is in Lua 5.0 or not. It became a documented feature in Lua 5.2.)
The frontier pattern takes a character set and will only match spaces between characters where the previous character is not in the set and the next character is in the set. It sounds complicated, but basically it lets you find the beginnings or endings of words.
To use the frontier pattern, we need to figure out what a domain or username might look like. We may not be able to do this perfectly, but, in practice, being overly greedy should be fine.
s = "%f[%w-][Tt][Ee][Ss][Tt]%f[^%w-]"
This new pattern will match "TEST" and "test", but will not match "TESTING" or "testing".
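Python's re module has no frontier pattern, but negative lookbehind and lookahead give a rough analogue, which may help if the idea is unfamiliar. The sketch below is only an illustration in a different language, not part of the Lua solution; the helper name whole_token and the shortened sample message are assumptions.

```python
import re


def whole_token(name):
    """Case-insensitive regex matching name only as a whole token."""
    boundary = "A-Za-z0-9-"  # same role as the frontier set %w-
    return re.compile(
        "(?<![" + boundary + "])" + re.escape(name) + "(?![" + boundary + "])",
        re.IGNORECASE,
    )


message = ("login from TEST\\admin; no logs for testing\\guest. "
           "test.com is active, TESTING is not")
print(whole_token("test").findall(message))  # ['TEST', 'test']
```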
Before proceeding, let's look at a problem that might occur with a domain like your "work-user". The character "-" has a special meaning in patterns, so we must escape it. All special characters can be escaped by adding a "%" in front. So, our work-user pattern would look like:
s = "%f[%w-][Ww][Oo][Rr][Kk]%-[Uu][Ss][Ee][Rr]%f[^%w-]"
Well, these kinds of patterns are sort of awful to write out, so let's try to write a function to do it for us:
function string_to_pattern(str, frontier_set, ci)
    -- escape magic characters ("-" is written as "%-" inside the set so
    -- that "+-?" is not read as a range, which would also escape digits)
    str = str:gsub("[][^$()%%.*+%-?]", "%%%0")
    if ci then
        -- make the resulting pattern case-insensitive
        str = str:gsub("%a", function(letter)
            return "["..letter:upper()..letter:lower().."]"
        end)
    end
    if frontier_set then
        str = "%f["..frontier_set.."]"..str.."%f[^"..frontier_set.."]"
    end
    return str
end
print(string_to_pattern("work-user", "%w-", true))
-- %f[%w-][Ww][Oo][Rr][Kk]%-[Uu][Ss][Ee][Rr]%f[^%w-]
I'll go ahead and mention the corner case now: this pattern will not match "-work-user" or "work-user-". This may be okay or not depending on what kind of messages get generated. You could take "-" out of the frontier set, but then you would match e.g. "my-work-user". You can decide if this matters, but I haven't thought about how to solve it with Lua's pattern matching language.
Now, how do we replace a match with *'s? This part is pretty easy. The built-in string.gsub function will allow us to replace matches of our patterns with other strings. We just need to generate a replacement string that consists of as many *'s as characters.
function string_to_stars(str)
    return ("*"):rep(str:len())
end
local pattern = string_to_pattern("test", "%w-", true)
print( (test_string:gsub(pattern, string_to_stars)) )
Now, there's a final problem. We can match users in the same way we match domains. For example:
-- note that different frontier_set here
-- I don't know what the parameters for your usernames are,
-- but this matches your code
local pattern = string_to_pattern("admin", "%w_", true)
print( (test_string:gsub(pattern, string_to_stars)) )
However, even if we replace all the domains and usernames separately, the backslash between "TEST" and "admin" in "TEST\admin" will not be replaced. We could do a hack like this:
test_string:gsub("%*\\%*","***")
This would replace "*\*" with "***" in the final output. However, this is not quite robust because it could replace a "*\*" that was in the original message and not a result of our processing. To do things properly, we would have to iterate over all domain+user pairs and do something like this:
test_string:gsub(domain_pattern .. "\\" .. user_pattern, string_to_stars)
Note that this must be done before any other replacements, as otherwise the domain and username will have already been replaced, and can no longer be matched.
Now that the problem is solved in that way, let me suggest an alternative approach that reflects something more like what I would write from scratch. I think it is probably simpler and more readable. Instead of using pattern matching to find our domains and usernames exactly, let's instead just match tokens that could be domains or usernames and then check if they match exactly later.
local message = -- broken into multiple lines only for
                -- formatting reasons
    "User tried to login from TEST\\admin; but should "
    .."have logged in from work-user\\user1, No logs present "
    .."for testing\\guest, account. The domain test.com and "
    .."WORK-USER.org are active and TESTING domain in inactive"

-- too greedy, but may not matter in your case
local domain_pattern = "%w[%w-]*"
-- again, not sure
local user_pattern = "[%w_]+"

-- for case-insensitivity, call :lower before inserting into the set
local domains = {["test"]=true, ["work-user"]=true}
local users = {["admin"]=true, ["guest"]=true}

local pattern = "(("..domain_pattern..")\\("..user_pattern.."))"
message = message:gsub(pattern, function(whole, domain, user)
    -- only call lower if case-insensitive
    if domains[domain:lower()] and users[user:lower()] then
        return string_to_stars(whole)
    else
        return whole
    end
end)

local function replace_set(message, pattern, set, ci)
    return (message:gsub(pattern, function(str)
        if ci then str = str:lower() end
        if set[str] then
            return string_to_stars(str)
        else
            return str
        end
    end))
end

message = replace_set(message, domain_pattern, domains, true)
message = replace_set(message, user_pattern, users, true)
print(message)
Notice how simple the patterns are in this example. We no longer need case-insensitive character classes like "[Tt]" because the case-insensitivity is checked after the matching by forcing both strings to be lowercase with string.lower (which may not be maximally efficient, but, hey, this is Lua). We no longer need to use the frontier pattern because we are guaranteed to get full words because of greedy matching. The backslash case is still weird, but I've handled it in the same "robust" way as I suggested above.
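The same match-loosely-then-check-the-set idea carries over to other regex engines. Below is a rough Python sketch of it, offered only as a cross-language illustration: the function names (stars, mask_pairs, mask_set, mask), the ASCII-only token expressions and the shortened sample message are all assumptions, not part of the Lua answer.

```python
import re

DOMAINS = {"test", "work-user"}   # stored in lower case
USERS = {"admin", "guest"}

DOMAIN_RE = r"[A-Za-z0-9][A-Za-z0-9-]*"  # loose token, checked against the set later
USER_RE = r"[A-Za-z0-9_]+"


def stars(text):
    return "*" * len(text)


def mask_pairs(message):
    """Mask domain\\user only when both halves are known names."""
    pair = re.compile("(" + DOMAIN_RE + r")\\(" + USER_RE + ")")

    def repl(m):
        if m.group(1).lower() in DOMAINS and m.group(2).lower() in USERS:
            return stars(m.group(0))
        return m.group(0)

    return pair.sub(repl, message)


def mask_set(message, token_re, names):
    """Mask every token whose lower-cased form is in names."""
    def repl(m):
        token = m.group(0)
        return stars(token) if token.lower() in names else token

    return re.sub(token_re, repl, message)


def mask(message):
    # Pairs first: afterwards the individual names are already masked.
    message = mask_pairs(message)
    message = mask_set(message, DOMAIN_RE, DOMAINS)
    return mask_set(message, USER_RE, USERS)


sample = "login from TEST\\admin; from work-user\\user1, testing\\guest. test.com TESTING"
print(mask(sample))
```

As in the Lua version, the pair replacement has to run before the single-name replacements.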
A final note: I don't know exactly why you're doing this, but I can maybe guess that it is to prevent someone from seeing domains or usernames. Replacing them with *'s is not necessarily the best way to go. First, doing matching in these ways could be problematic if your messages are (for example) delimited with letters. This seems unlikely for user-friendly messages, but I don't know whether that's something you should count on when security is at stake. Another thing is that you are not hiding the lengths of the domains or usernames. This can also be a major source of insecurity. For example, a user might reasonably guess that ***** is "admin".

What's the point of nesting brackets in Lua?

I'm currently teaching myself Lua for iOS game development, since I've heard lots of very good things about it. I'm really impressed by the level of documentation there is for the language, which makes learning it that much easier.
My problem is that I've found a Lua concept that nobody seems to have a "beginner's" explanation for: nested brackets for quotes. For example, I was taught that long strings with escaped single and double quotes like the following:
string_1 = "This is an \"escaped\" word and \"here\'s\" another."
could also be written without the overall surrounding quotes. Instead one would simply replace them with double brackets, like the following:
string_2 = [[This is an "escaped" word and "here's" another.]]
Those both make complete sense to me. But I can also write the string_2 line with "nested brackets," which include equal signs between both sets of the double brackets, as follows:
string_3 = [===[This is an "escaped" word and "here's" another.]===]
My question is simple. What is the point of the syntax used in string_3? It gives the same result as string_1 and string_2 when given as an input for print(), so I don't understand why nested brackets even exist. Can somebody please help a noob (me) gain some perspective?
It would be used if your string contains a substring that is equal to the delimiter. For example, the following would be invalid:
string_2 = [[This is an "escaped" word, the characters ]].]]
Therefore, in order for it to work as expected, you would need to use a different string delimiter, like in the following:
string_3 = [===[This is an "escaped" word, the characters ]].]===]
I think it's safe to say that not a lot of string literals contain the substring ]], in which case there may never be a reason to use the above syntax.
It helps to, well, nest them:
print [==[malucart[[bbbb]]]bbbb]==]
Will print:
malucart[[bbbb]]]bbbb
But if that's not useful enough, you can use them to put whole programs in a string:
loadstring([===[print "o m g"]===])()
Will print:
o m g
I personally use them for my static/dynamic library implementation. If you don't know whether the program contains a closing bracket with the same number of =s, you can find a safe level with something like this (prog is the program text as a string):
local c = 0
-- plain find (fourth argument true), so nothing is treated as a pattern character
while prog:find("]" .. string.rep("=", c) .. "]", 1, true) do
    c = c + 1
end
-- do stuff
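If the Lua source is generated from another language, the same level search can be done there. The Python sketch below is one possible way to do it; the function names long_bracket_level and lua_long_string are made up for illustration. It also covers one case the loop above does not: text ending in "]" (plus any =s) would merge with the closing bracket. It does not handle text that starts with a newline, which Lua skips immediately after the opening bracket.

```python
def long_bracket_level(text):
    """Smallest number of '=' signs whose closing bracket cannot clash with text."""
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # Clash if the closer occurs inside the text, or if the text ends
        # with a partial closer that the real closing bracket would complete.
        if closer in text or text.endswith(closer[:-1]):
            level += 1
        else:
            return level


def lua_long_string(text):
    """Wrap text in a Lua long-bracket literal of a safe level."""
    eq = "=" * long_bracket_level(text)
    return "[" + eq + "[" + text + "]" + eq + "]"


print(lua_long_string('print "o m g"'))  # [[print "o m g"]]
print(lua_long_string("a]]b"))           # [=[a]]b]=]
```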

Resources