SQL Tautologies pattern is not clear - security

I have a question about rule id 942130 (https://github.com/fastly/waf_testbed/blob/master/templates/default/REQUEST-942-APPLICATION-ATTACK-SQLI.conf.erb). Why does this rule check "only" x = x in its pattern (where x can be any character)? The attack can only be successful if there is an "or" in the query, i.e. "or x=x". For this reason, shouldn't the pattern be expanded so that "or x = x" is checked? Have I overlooked something?
An example is "FooH=HA", which is detected as a SQL injection, but I don't understand why. The issue is the "H=H".
Shouldn't the pattern compare the whole string before and after = instead of just single characters?

It would be a lot more complicated to express that in the regex (which is already quite complicated). A lot of things could happen between OR and x=x that would still be syntactically valid.
Also, OR is not actually needed. Consider, for example, a SQL injection where the parameter is meant to be used in something like "ORDER BY x" and the attacker passes "WHERE x=x". Or HAVING. Or a lot of other potential attacks.
It's just a lot simpler to define a tautology as x=x, which would not normally appear in legitimate input.


Parse arithmetic/boolean expression but skip capture

Given the following expression
x = a + 3 + b * 5
I would like to turn that into the following data structure, where I'm only interested in capturing the variables used on the RHS and keeping the string intact. I'm not interested in parsing a more specific structure, since I'm doing a transformation from language to language and not handling evaluation.
Variable "x" (Expr ["a","b"] "a + 3 + b * 5")
I've been using this tutorial as my starting point, but I'm not sure how to write an expression parser without buildExpressionParser. That doesn't seem to be the way I should approach this.
I am not sure why you want to avoid buildExpressionParser, as it hides a lot of the complexity in parsing expressions with infix operators. It is the right way to do things....
Sorry about that, but now that I got that nag out of the way, I can answer your question.
First, here is some background:
The main reason writing a parser for expressions with infix operators is hard is because of operator precedence. You want to make sure that this
x+y*z
parses as this
    +
   / \
  x   *
     / \
    y   z
and not this
      *
     / \
    +   z
   / \
  x   y
Choosing the correct parsetree isn't a very hard problem to solve.... But if you aren't paying attention, you can write some really bad code. Why? Performance....
The number of possible parsetrees, ignoring precedence, grows exponentially with the size of the input. For instance, if you write code to try all possibilities then throw away all but the ones with the proper precedence, you will have a nasty surprise when your parser tackles anything in the real world (remember, exponential complexity often ain't just slow, it is basically not a solution at all.... You may find that you are waiting half an hour for a simple parse, no one will use that parser).
I won't repeat the details of the "proper" solution here (a google search will give the details), except to note that the proper solution runs at O(n) with the size of the input, and that buildExpressionParser hides all the complexity of writing such a parser for you.
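For reference, here is a minimal sketch of what using buildExpressionParser looks like with Parsec (Text.Parsec.Expr). The Expr type and the token parsers are invented for illustration and deliberately simplified (single-token lexing, no parentheses):
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- A toy AST, invented here just for illustration.
data Expr
  = Var String
  | Num Integer
  | BinOp String Expr Expr
  deriving Show

-- Skip trailing whitespace after every token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

term :: Parser Expr
term = lexeme (Var <$> many1 letter)
   <|> lexeme (Num . read <$> many1 digit)

-- Operators are listed from highest to lowest precedence;
-- buildExpressionParser turns this table into an efficient parser.
table :: OperatorTable String () Identity Expr
table =
  [ [ binary "*", binary "/" ]
  , [ binary "+", binary "-" ]
  ]
  where
    binary name = Infix (BinOp name <$ lexeme (string name)) AssocLeft

expr :: Parser Expr
expr = spaces *> buildExpressionParser table term

-- ghci> parse expr "" "a + 3 + b * 5"
-- Right (BinOp "+" (BinOp "+" (Var "a") (Num 3)) (BinOp "*" (Var "b") (Num 5)))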
So, back to your original question....
Do you need to use buildExpressionParser to get the variables out of the RHS, or is there a better way?
You don't need it....
Since all you care about is getting the variables used on the right side, you don't care about operator precedence. You can just make everything left associative and write a simple O(n) parser. The parsetrees will be wrong, but who cares? You will still get the same variables out. You don't even need a context free grammar for this; this regular expression basically does it:
<variable>(<operator><variable>)*
(where <variable> and <operator> are defined in the obvious way).
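As a rough Parsec sketch of that idea (the operand and operator definitions below are simplified assumptions, not the grammar of your actual source language), you can collect the variable names without building a real tree:
import Data.Either (rights)
import Text.Parsec
import Text.Parsec.String (Parser)

-- An operand is either a number or a variable name; we only keep the names.
operand :: Parser (Either Integer String)
operand = (Left . read <$> many1 digit) <|> (Right <$> many1 letter)

operator :: Parser Char
operator = oneOf "+-*/"

-- <variable>(<operator><variable>)* as a flat list: the structure is
-- "wrong", but it is enough to recover the variables on the RHS.
rhsVariables :: Parser [String]
rhsVariables = rights <$> lexeme operand `sepBy1` lexeme operator
  where lexeme p = p <* spaces

-- ghci> parse rhsVariables "" "a + 3 + b * 5"
-- Right ["a","b"]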
However....
I wouldn't recommend this, because, as simple as it is, it will still be more work than using buildExpressionParser. And it will be trickier to extend (like adding parentheses). But most importantly, later on you may accidentally use it somewhere where you do need a full parsetree, and be confused for a while about why the operator precedence is so completely messed up.
Another solution is that you could rewrite your grammar to remove the ambiguity (again, google will tell you how).... This would be good as a learning exercise, but you would basically be repeating what buildExpressionParser is doing internally.

Advantage to a certain string comparison order

Looking at some pieces of code around the internet, I've noticed some authors tend to write string comparisons like
if("String"==$variable)
in PHP, or
if("String".equals(variable))
Whereas my preference is:
if(variable.equals("String"))
I realize these are effectively equal: they compare two strings for equality. But I was curious if there was an advantage to one over the other in terms of performance or something else.
Thank you for the help!
One advantage of using an equality function, or of writing if( constant == variable ) rather than if( variable == constant ), is that it prevents you from accidentally making a typo and writing an assignment instead of a comparison. For instance:
if( s = "test" )
will assign "test" to s, resulting in undesired behaviour that may cause a hard-to-find bug. However:
if( "test" = s )
will, in most languages that I'm aware of, result in some form of warning or compiler error, helping you avoid a bug later on.
With a simple int example, this prevents accidental writes of
if (a=5)
which would be a compile error if written as
if (5=a)
I sure don't know about all languages, but decent C compilers warn you about if (a=b). Perhaps whatever language your question is written in doesn't have such a feature, so to be able to generate an error in such a case, they have reversed the order of the comparison arguments.
"Yoda conditions", some call these.
The kind of syntax a language uses has nothing to do with efficiency. It is all about how the comparison algorithm works.
In the examples you mentioned, this:
if("String".equals(variable))
and this:
if(variable.equals("String"))
would be exactly the same, because the expression "String" will be treated as a String object.
Languages that provide a comparison method for Strings will use the fastest method, so you shouldn't care about it unless you want to implement the method yourself ;)

Programming language without ELSE keyword - is it more complicated?

I'm working on a simple programming language for kids, based on Karel. For controlling program flow, I currently provide these facilities (in pseudocode):
defining parameterless procedures
if [not] EXPRESSION STATEMENT
while [not] EXPRESSION STATEMENT
I don't have any means to return from a procedure, and I don't provide the else statement.
Take the following code for an example:
if something
statement1
if not something
statement2
The execution of code flows to if, executing statement1 if something is true;
then testing if something is not true (but the state of the program has changed!), then executing statement2. This can lead to both tests succeeding.
Does this limit the programmer? So far I've been able to solve all of my example problems by just using if ... if not ..., or using if not first, then if.
So, my question is:
Is adding the else statement necessary? It would make the language a bit more complicated by adding more keywords. Are all problems that would be solvable with an else statement also solvable without it, albeit in a more complicated way?
Or is omitting the else statement actually making the language more complicated and counter-intuitive?
If something is expensive to evaluate, then your language without else might give a problem, because the evaluation will be performed twice.
Another potential problem is that if statement1 can modify the value of something, you may end up with both tests succeeding - something that could not happen if you used else.
Of course these problems can be mitigated by storing the result in a temporary local variable:
bool result = something
if result
statement1
if not result
statement2
So no, you aren't limiting the programmer in what is possible - everything that can be done with else can be done without it by using the above approach. But it is a little more code to write each time, and it introduces a few new potential problems for the unwary programmer that would be avoided if you allowed else.
Semantically speaking you could avoid having the else construct, but from a practical point of view I don't see any necessity for doing so.
The concept of "do something if an expression is true, otherwise do something else" is not strange or confusing; it is actually more straightforward than having to evaluate and negate the expression again just to check the opposite case. It's free (in the sense of "no added complexity") optional syntactic sugar that comes almost automatically when developing a language.
I have seen many features far more useless than the else statement. Also, you are not considering that evaluating a condition twice may be harmful because of side effects, or wasteful (CPU time), or simply that you have already computed the result and are forced to compute it again because of a shortcoming of the language rather than for any good reason.
If something has side effects, then your approach will cause them to happen twice, which is probably not what you want.
IMHO it's a bad idea to teach children to duplicate code.

What are some examples of where using parentheses in a program lowers readability?

I always thought that parentheses improved readability, but in my textbook there is a statement that the use of parentheses dramatically reduces the readability of a program. Does anyone have any examples?
I can find plenty of counterexamples where the lack of parentheses lowered the readability, but the only example I can think of for what the author may have meant is something like this:
if(((a == null) || (!(a.isSomething()))) && ((b == null) || (!(b.isSomething()))))
{
// do some stuff
}
In the above case, the ( ) around the method calls are unnecessary, and this kind of code may benefit from factoring terms out into variables. With all of those close parens in the middle of the condition, it's hard to see exactly what is grouped with what.
boolean aIsNotSomething = (a == null) || !a.isSomething(); // parens for readability
boolean bIsNotSomething = (b == null) || !b.isSomething(); // ditto
if(aIsNotSomething && bIsNotSomething)
{
// do some stuff
}
I think the above is more readable, but that's a personal opinion. That may be what the author was talking about.
Some good uses of parens:
to distinguish between order of operation when behavior changes without the parens
to distinguish between order of operation when behavior is unaffected, but someone who doesn't know the binding rules well enough is going to read your code. The good citizen rule.
to indicate that an expression within the parens should be evaluated before used in a larger expression: System.out.println("The answer is " + (a + b));
Possibly confusing use of parens:
in places where it can't possibly have another meaning, like in front of a.isSomething() above. In Java, if a is an Object, !a by itself is an error, so clearly !a.isSomething() must negate the return value of the method call.
to link together a large number of conditions or expressions that would be clearer if broken up. As in the code example up above, breaking up the large parenthetical statement into smaller chunks allows the code to be stepped through in a debugger more straightforwardly, and if the conditions/values are needed later in the code, you don't end up repeating expressions and doing the work twice. This is subjective, though, and obviously meaningless if you only use the expressions in one place and your debugger shows you intermediate evaluated expressions anyway.
Apparently, your textbook is written by someone who hates Lisp.
Anyway, it's a matter of taste; there is no single truth for everyone.
I think that parentheses are not the best way to improve the readability of your code. You can use a new line to set off, for example, the conditions in an if statement. I don't use parentheses if they are not required.
Well, consider something like this:
Result = (x * y + p * q - 1) % t
and
Result = (((x * y) + (p * q)) - 1) % t
Personally I prefer the former (but that's just me), because the latter makes me think the parentheses are there to change the actual order of operations, when in fact they aren't doing that. Your textbook might also be referring to cases where you can split your calculations into multiple variables. For example, you'll probably have something like this when solving a quadratic ax^2+bx+c=0:
x1 = (-b + sqrt(b*b - 4*a*c)) / (2*a)
Which does look kind of ugly. This looks better in my opinion:
SqrtDelta = sqrt(b*b - 4*a*c);
x1 = (-b + SqrtDelta) / (2*a);
And this is just one simple example; when you work with algorithms that involve a lot of computations, things can get really ugly, so splitting the computations up into multiple parts will help readability more than parentheses will.
Parentheses reduce readability when they are obviously redundant. The reader expects them to be there for a reason, but there is no reason. Hence, a cognitive hiccough.
What do I mean by "obviously" redundant?
Parentheses are redundant when they can be removed without changing the meaning of the program.
Parentheses that are used to disambiguate infix operators are not "obviously redundant", even when they are redundant, except perhaps in the very special case of the multiplication and addition operators. Reason: many languages have 10–15 levels of precedence, many people work in multiple languages, and nobody can be expected to remember all the rules. It is often better to disambiguate, even if the parentheses are redundant.
All other redundant parentheses are obviously redundant.
Redundant parentheses are often found in code written by someone who is learning a new language; perhaps uncertainty about the new syntax leads to defensive parenthesizing.
Expunge them!
You asked for examples. Here are three examples I see repeatedly in ML code and Haskell code written by beginners:
Parentheses between if (...) then are always redundant and distracting. They make the author look like a C programmer. Just write if ... then.
Parentheses around a variable are silly, as in print(x). Parentheses are never necessary around a variable; the function application should be written print x.
Parentheses around a function application are redundant if that application is an operand in an infix expression. For example,
(length xs) + 1
should always be written
length xs + 1
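For contrast, here is a small sketch (xs and ys are hypothetical bindings) of a case where the parentheses are not redundant, because function application binds tighter than any infix operator:
xs, ys :: [Int]
xs = [1, 2, 3]
ys = [4, 5]

redundant :: Int
redundant = (length xs) + 1       -- same as: length xs + 1

required :: Int
required = length (xs ++ ys) + 1  -- without the parens this would parse as
                                  -- (length xs) ++ (ys + 1), a type error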
Anything taken to an extreme and/or overused can make code unreadable. It wouldn't be too hard to make the same claim about comments. Anyone who has looked at code with a comment on virtually every line will tell you it is difficult to read. Or you could put whitespace around every line of code, which would make each line easy to read, but normally people want similar, related lines (that don't warrant a breakout method) to be grouped together.
You have to go way over the top with them to really damage readability, but as a matter of personal taste, I have always found:
return (x + 1);
and similar in C and C++ code to be very irritating.
If a method doesn't take parameters, why require an empty () to call method()? I believe in Groovy you don't need to do this.

In Functional Programming, is it considered a bad practice to have incomplete pattern matchings

Is it generally considered a bad practice to use non-exhaustive pattern matchings in functional languages like Haskell or F#, meaning that the specified cases don't cover all possible input cases?
In particular, should I allow code to fail with a MatchFailureException etc. or should I always cover all cases and explicitly throw an error if necessary?
Example:
let head (x::xs) = x
Or
let head list =
    match list with
    | x::xs -> x
    | _ -> failwith "Applying head to an empty list"
F# (unlike Haskell) gives a warning for the first piece of code, since the [] case is not covered, but can I ignore it, for the sake of succinctness, without breaking functional style conventions? A MatchFailure does state the problem quite well, after all...
If you complete your pattern matching with the constructor [] rather than the catch-all _, the compiler will have a chance to warn you, the day someone adds a third constructor to lists, that the function needs another look.
My colleagues and I, working on a large OCaml project (200,000+ lines), force ourselves to avoid partial pattern-matching warnings (even if that means writing | ... -> assert false from time to time) and to avoid so-called "fragile pattern-matchings" (pattern matchings written in such a way that the addition of a constructor may not be detected) too. We consider that the maintainability benefits.
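The same idea in a small Haskell sketch (the Shape type here is made up purely for illustration; compile with -Wall or -Wincomplete-patterns to see the warning):
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- A hypothetical type that may grow new constructors over time.
data Shape
  = Circle Double
  | Square Double

-- Fragile: the catch-all is correct today, but if a Ring constructor is
-- added later, the compiler will not remind us to revisit this function.
isCircle :: Shape -> Bool
isCircle (Circle _) = True
isCircle _          = False

-- Non-fragile: every constructor is spelled out, so adding a constructor
-- to Shape makes GHC emit an incomplete-pattern warning here.
isCircle' :: Shape -> Bool
isCircle' (Circle _) = True
isCircle' (Square _) = False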
Explicit is better than implicit (borrowed from the Zen of Python ;))
It's exactly the same as in a C switch over an enum... It's better to write all the cases (with fall-through) rather than just putting a default, because the compiler will tell you if you add new elements to the enumeration and forget to handle them.
I think that it depends quite a bit on the context. Are you trying to write robust, easy to debug code, or are you trying to write something simple and succinct?
If I were working on a long term project with multiple developers, I'd put in the assert to give a more useful error message. I also agree with Pascal's comment that not using a wildcard would be ideal from a software engineering perspective.
If I were working on a smaller scale project on which I was the only developer, I wouldn't think twice about using an incomplete match. If necessary, you can always check the compiler warnings.
I think it also depends a bit on the types you're matching against. Realistically, no extra union cases will be added to the list type, so you don't need to worry about fragile matching. On the other hand, in code that you control and are actively working on, there may well be types which are in flux and have additional union cases added, which means that protecting against fragile matching may be worth it.
This is a special case of a more general question, which is "should you ever create partial functions". Incomplete pattern matches are only one example of partial functions.
As a rule, total functions are preferable. When you find yourself looking at a function that just has to be partial, ask yourself if you can solve the problem in the type system first. Sometimes that is more trouble than it's worth (e.g. creating a whole type of lists with known lengths just to avoid the "head []" problem). So it's a trade-off.
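One lightweight way to stay total, short of length-indexed lists, is to move the failure into the return type; here is a minimal Haskell sketch of that idiom (safeHead and firstOrZero are names invented here):
-- Total: the empty-list case is part of the type, so callers are
-- forced to decide what to do about it.
safeHead :: [a] -> Maybe a
safeHead (x:_) = Just x
safeHead []    = Nothing

-- Example use: fall back to a default value for an empty list.
firstOrZero :: [Int] -> Int
firstOrZero = maybe 0 id . safeHead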
Or maybe you are just asking whether it's good practice in partial functions to say things like
head [] = error "head: empty list"
In which case the answer is YES!
The Haskell prelude (standard functions) contains many partial functions, e.g. head and tail only work on non-empty lists, but don't ask me why.
This question has two aspects.
For the user of the API, failwith... simply throws a System.Exception, which is unspecific (and therefore is sometimes considered a bad practice in itself). On the other hand, the implicitly thrown MatchFailureException can be specifically caught using a type test pattern, and is therefore preferable.
For the reviewer of the implementation code, failwith... clearly documents that the implementer has at least given some thought to the possible cases, and is therefore preferable.
As the two aspects contradict each other, the right answer depends on the circumstances (see also kvb's answer). A solution which is 100% "correct" from any point of view would have to
deal with every case explicitly,
throw a specific exception where necessary, and
clearly document the exception
Example:
/// <summary>Gets the first element of the list.</summary>
/// <exception cref="ArgumentException">The list is empty.</exception>
let head list =
    match list with
    | [] -> invalidArg "list" "The list is empty."
    | x::xs -> x

Resources