What are some general best practices for input validation?

What are some general best practices for input validation? - security

What are some best pratices associated with use of IRIs to prevent character missrepresentation, spoofing, or character injection?

There is no one silver bullet for preventing all attacks that involve injecting control characters. Vulnerabilities are highly dependent on how the data is being used. For instance xss uses the control characters <> where as SQL Injection uses the control characters '"\, to mix both of these filters does not make sense.
One can use a collection of Regular Expressions to insure that data is valid before it is used. A specific regular expression can be used to prevent a specific vulnerability on a function by function basis. Input validation goes beyond the realm of security and is often required for the program to work properly.
Regex's are not always the best way to get the job done. For instance if you are using the mysql library there should be the function call mysql_real_escape_string() which insures that all control characters that mysql recognizes are properly escaped. It is in your best interest to use this function instead of attempting to write your own security system, re-inventing the wheel is bad engineering and can be catastrophic when it comes to security systems.

Related

How do you deal with legacy data integrity issues when rewriting software?

I am working on a project which is a rewrite of an existing legacy software. The legacy software primarily consists of CRUD operations (create, read, update, delete) on an SQL database.
Despite the CRUD-based style of coding, the legacy software is extremely complex. This software complexity is not only the result of the complexity of the problem domain itself, but also the result of poor (and regularly bordering on insane) design decision. This poor coding has lead to the data in the database lacking integrity. These integrity issues are not solely in terms of relationships (foreign keys), but also in terms of the integrity within a single row. E.g., the meaning of column "x" outright contradicts the meaning of column "y". (Before you ask, the answer is "yes", I have analysed the problem domain and correctly understand the meaning and purpose of these columns, and better than the original software developers it seems).
When writing the replacement software, I have used principles from Domain Driven Design and Command Query Reponsibility Segregation, primarily due to the complexity of the domain. E.g., I've designed aggregate roots to enforce invariants in the write model, command handlers to perform "cross-aggregate" consistency checks, query handlers to query intentionally denormalised data in a manner appropriate for various screens, etc, etc.
The replacement software works very well when entering new data, in terms of accuracy and ease of use. In that respect, it is successful. However, because the existing data is full of integrity issues, operations that involve the existing data regularly fail by throwing an exception. This typically occurs because an aggregate can't be read from a repository because the data passed to the constructor violates the aggregate's invariants.
How should I deal with this legacy data that "breaks the rules". The old software worked fine in this respect, because it performed next to no validation. Because of this lack of validation, it was easy for inexperienced users to enter nonsensical data (and experienced users became very valuable because they had years of understanding it's "idiosyncrasies").
The data itself is very important, so it cannot be discarded. What can I do? I've tried sorting out the integrity issues as I go, and this has worked in some cases, but in others it is nearly impossible (e.g., data is outright missing from the database because the original developers decided not to save it). The sheer number of data integrity issues is overwhelming.
What can I do?

for a question tagged with DDD the answer is almost always talk to your domain expert. How do they want things to work.
I also noticed your question is tagged with CQRS. are you actually implementing CQRS? in that case it should be almost a non-issue.
Your domain model will live on the command side of your application and always enforce validation. The read stack will provide just dumb viewmodels. This means that on a read, your domain model isn't even involved and also no validation is applied. It will just show whatever nonsense it can use to populate your viewmodel. However on a write validations are triggered. and any write will need to adher to the full validations of your viewmodel.
Now back to reality: be VERY sure that the validations you implement are actually the valididations required. For example even something simple as a telephone number (often implemented as 3 digits dash 3 digits dash 4 digits). But then companies haver special phone numbers like 1800-CALLME which not only have digits but also have letters, and could even be of different lengths (and different countries might also have different rules). If your system needs to handle this it pritty much means you can't apply any validation on phonenumbers.
This is just an example how what you might think is a real validation really can't be implemented at all because that 1 special case it needs to handle. The rule here becomes again. Talk to your domain expert how he wants to have things handled. But be VERY careful that your validations doens't make it near impossible for real users to use your system. since that's the fastest way to have your project killed.
Update: In DDD you would also hear the term anti-corruption layer. This layer ensures that incomming data meets the expectations of your domain model. This might be the prefered method but if you say you cannot ignore items with garbage data then this might not solve the problem in your case.

How to test mysqli's real_escape_string()?

I'm fairly new to mysqli (not to mysql!) and I'm updating a currently-mysql-function which secures a string (or an (recursive) array with strings) for basic sanitation.
The php.net mysqli::real_escape_string() manual has a very clear warning about that charset. I've implemented this.
How do I test this? I can't find the information I'm looking for. I guess I'm looking for certain strings to input resulting in an unsecure result, and a result considered save.
I don't mean "add slashes to ' <- those single quotes". I'm looking for some more advanced tricks or vulnerabilities.
I'm also not looking for prepared statements. Those are wonderful and I'd love to use those, but not an option at this point because updating I'm a old system as fast as possible, prepared statements are not an option at this point in time. I'll be adding those in the future.

Here is the code you are looking for, I believe. Just change mysql to mysqli.
Also please note that
this function is not to "secure" strings but to format them. Means every string that is going into query have to be processed, no matter if you count it "dangerous" or not.
this function have to be used to format SQL string literals only. And it is utterly useless for all other query parts.
this function should not to be used in the application code, but to support emulated prepared statements only.
Anyway, if your database encoding is conventional utf-8, there is no point to bother with encoding at all. "A clear warning" actually connected to some marginal and extremely rarely used encodings only.

Grails - Security Encoding By Default

I submitted data up to a controller in Grails and did a javascript injection which worked. I was surprised as I figured the default would be to encode parameters as they came into the controllers. Can I set this easily so that all parameters are encoded when they hit the controller? Also, am I safe to do a GORM create with the text as it came up, or is this vulnerable to SQL injection? I see in the guide how to prevent SQL Injection for a find query, but what about object creation?
I also saw that there is an encodeAsHTML method that I can call to encode on the way back down to the client. Wouldn't I want to do this before it went into the database so that I only have to encode once? Thanks!

User Input
The idea of "sanitizing your inputs" (or pre-encoding or pre-escaping) your content is a terrible one. It does not actually protect you when you really need it, and leads to all sorts of design headaches. Even PHP finally dumped the technique.
It's always better to handle the data via proper APIs correctly, which eliminates the risk. For example, SQL injections is absolutely eliminated using prepared statements or statements with placeholders for the content. This technique has been around for a very long time (as long as I've using Java & SQL).
Grails (GORM) automatically handles encoding any content that is saved via the objects, including setting a single property, creating a new object and saving it, or setting properties via obj.properties = params or something similar.
As long as you are using GORM, there is no risk of SQL injection.
Content Storage
Also, it's generally considered incorrect to store the already-encoded information in the database, as that encoding would only be correct for a given display type (such as HTML). If you instead wanted to render it using JSON, the HTML encoding is incorrect. XML is also slightly different, and there are times when you might prefer plain text, too.
Instead, you generally should store the raw (UTF8 or similar) data in the database, and convert it to the correct display type when it is being rendered for display. Note that it's converted when it's being rendered — this doesn't necessarily mean every time you send it to the client. You can use a variety of caching techniques to make sure this doesn't happen too often — including the new cache plugin added to Grails 2.1.
Preventing XSS Attacks
One highly recommended technique, however, is to set the default view codec to HTML, using the grails.views.default.codec option, like this:
grails.views.default.codec = 'html'
This will only affect GSPs, and only content echoed using dollar-sign syntax (${foo}). This gives you the flexibility of overriding this behavior using tags (the recommended way) or the <%= %> syntax for one-off situations. It should, however, provide a decent catch to prevent XSS attacks in general.
Performance Considerations
One final note: the concern about encoding the content as HTML being a performance issue would be considered premature optimization. Chances are extremely high that any performance bottlenecks would be somewhere else, and not in the encoding of the content. Build your application first — using a good design — and optimize later after you can benchmark and analyze the running application.

SQL Injection or Server.HTMLEncode or both? Classic ASP

People say to prevent SQL Injection, you can do one of the following (amongst other things):
Prepare statements (parameterized)
Stored procedures
Escaping user input
I have done item 1, preparing my statements, but I'm now wondering if I should escape all user input as well. Is this a waste of time seeming as I have prepared statements or will this double my chances of prevention?

It's usually a waste of time to escape your input on top of using parametrized statements. If you are using a database "driver" from the database vendor and you are only using parametrized
statements without doing things like SQL String concatenation or trying to parametrize the actual SQL syntax, instead of just providing variable values, then you are already as safe as you can be.
To sum it up, your best option is to trust the database vendor to know how to escape values inside their own SQL implementation instead of trying to roll your own encoder, which for a lot of DBs out there can be a lot more work then you think.
If on top of this you want additional protection you can try using a SQL Monitoring Solution. There are a few available that can spot out-of-the-ordinary SQL queries and block/flag them, or just try to learn your default app behavior and block everything else. Obviously your mileage may vary with these based on your setup and use-cases.

Certainly the first step to prevent SQL Injection attacks is to always use parameterised queries, never concatenate client supplied text into a SQL string. The use of stored procedures is irrelevant once you have taken the step to parameterise.
However there is a secondary source of SQL injection where SQL code itself (usually in an SP) will have to compose some SQL which is then EXEC'd. Hence its still possible to be vunerable to injection even though your ASP code is always using parameterised queries. If you can be certain that none of your SQL does that and will never do that then you are fairly safe from SQL Injection. Depending on what you are doing and what version to SQL Server you are using there are occasions where SQL compositing SQL is unavoidable.
With the above in mind a robust approach may require that your code examines incoming string data for patterns of SQL. This can be fairly intense work because attackers can get quite sophisticated in avoiding SQL pattern detection. Even if you feel the SQL you are using is not vunerable it useful to be able to detect such attempts even if they fail. Being able to pick that up and record additional information about the HTTP requests carrying the attempt is good.
Escaping is the most robust approach, in that case though all the code that uses the data in you database must understand the escaping mechanim and be able to unescape the data to make use of it. Imagine a Server-side report generating tool for example, would need to unescape database fields before including them in reports.
Server.HTMLEncode prevents a different form of Injection. Without it an attacker could inject HTML (include javascript) into the output of your site. For example, imagine a shop front application that allowed customers to review products for other customers to read. A malicious "customer" could inject some HTML that might allow them to gather information about other real customers who read their "review" of a popular product.
Hence always use Server.HTMLEncode on all string data retrieved from a database.

Back in the day when I had to do classic ASP, I used both methods 2 and 3. I liked the performance of stored procs better and it helps to prevent SQL injection. I also used a standard set of includes to filter(escape) user input. To be truly safe, don't use classic ASP, but if you have to, I would do all three.

First, on the injections in general:
Both latter 2 has nothing to do with injection actually.
And the former doesn't cover all the possible issues.
Prepared statements are okay until you have to deal with identifiers.
Stored provedures are vulnerable to injections as well. It is not an option at all.
"escaping" "user input" is most funny of them all.
First, I suppose, "escaping" is intended for the strings only, not whatever "user input". Escaping all other types is quite useless and will protect nothing.
Next, speaking of strings, you have to escape them all, not only coming from the user input.
Finally - no, you don't have to use whatever escaping if you are using prepared statements
Now to your question.
As you may notice, HTMLEncode doesn't contain a word "SQL" in it. One may persume then, that Server.HTMLEncode has absolutely nothing to do with SQL injections.
It is more like another attack prevention, called XSS. Here it seems a more appropriate action and indeed should be used on the untrusted user input.
So, you may use Server.HTMLEncode along with prepared statements. But just remember that it's completely different attacks prevention.
You may also consider to use HTMLEncode before actual HTML output, not at the time of storing data.

What should I be doing to secure my web form UI?

I have a mostly desktop programming background. In my spare time I dabble in web development, page faulting my way from problem to solution with some success. I have reached the point were I need to allow site logins and collect some data from a community of users (that is the plan anyway).
So, I realize there is a whole world of nefarious users out there who are waiting eagerly for an unsecured site to decorate, vandalize and compromise. If I am really lucky, a couple of those users might make their way to my site. I would like to be reasonably prepared for them.
I have a UI to collect information from a logged in user and some of that information is rendered into HTML pages. My site is implemented with PHP/MySQL on the back end and some javascript stuff on the front. I am interested in any suggestions/advice on how I should tackle any of the following issues:
Cross Site Scripting : I am hoping this will be reasonably simple for me since I am not supporting marked down input, just plain text. Should I be just scanning for [A-Za-z ]* and throwing everything else out? I am pretty ignorant about the types of attacks that can be used here, so I would love to hear your advice.
SQL injection : I am using parametized queries (mysqli) here , so I am hoping I am OK in this department. Is there any extra validation I should be doing on the user entered data to protect myself?
Trollish behaviour : I am supporting polylines drawn by the user on a Google Map, so (again if I am lucky enough to get some traffic) I expect to see a few hand drawn phallices scrawled across western Europe. I am planning to implement some user driven moderation (flag inaproriate SO style), but I would be interested in any other suggestions for discouraging this kind of behavior.
Logins : My current login system is a pretty simple web form, MySQL query in PHP, mp5 encoded password validation, and a stored session cookie. I hope the system is simple enough to be secure, but I wonder if their are vulnerabilies here I am not aware of?
I hope I haven't been too verbose here and look forward to hearing your comments.

Your first problem is that you are concerned with your UI. A simple rule to follow is that you should never assume the submitted data is coming from a UI that you created. Don't trust the data coming in, and sanitize the data going out. Use PHP's strip_tags and/or htmlentities.
Certain characters (<,>,",') can screw up your HTML and permit injection, but should be allowed. Especially in passwords. Use htmlentities to permit the use of these characters. Just think about what would happen if certain characters were output without being "escaped".
Javascript based checks and validation should only be used to improve the user experience (i.e. prevent a page reload). Do not use eval except as an absolute last resort.

Cross Site Scripting can be easily taken care of with htmlentities, there is also a function called strip tags which removes the tags from the post and you'll note that this allows you to whitelist certain tags. If you do decide to allow specific tags through in the future keep in mind that the attributes on these tags are not cleaned in any way, this can be used to insert javascript in the page (onClick etc.) and is not really recommended. If you want to have formatting in the future I'd recommend implementing a formatting language (like [b] for bold or something similar) to stop your users from just entering straight html into the page.
SQL Injection is also easily taken care of as you can prepare statements and then pass through the user data as arguments to the prepared statement. This will stop any user input from modifying the sql statement.
CSRF (Cross-Site Request Forgery) is an often overlooked vulnerability that allows an attacker to submit data from a victims account using the form. This is usually done either by specifying your form get string for an img src (the image loads for the victim, the get loads and the form is processed, but the user is unaware ). Additionally if you use post the attacker can use javascript to auto-submit a hidden form to do the same thing as above. To solve this one you need to generate a key for each form, keep one in the session and one on the form itself (as a hidden input). When the form is submitted you compare the key from the input with the key in the session and only continue if they match.
Some security companies also recommend that you use the attribute 'autocomplete="off"' on login forms so the password isn't saved.

Against XSS htmlspecialchars is pretty enough, use it to clear the output.
SQL injection: if mysql parses your query before adds the parameters, afaik its not possible to inject anything malicious.

I would look into something else besides only allowing [A-Za-z]* into your page. Just because you have no intention of allowing any formatting markup now doesn't mean you won't have a need for it down the line. Personally I hate rewriting things that I didn't design to adapt to future needs.
You might want to put together a whitelist of accepted tags, and add/remove from that as necessary, or look into encoding any markup submitted into plain text.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string