What is the best way of prevention from stored XSS ?
should every text field (even plain text) be sanitized server-side to prevent from XSS HTML using something like OWASP Java HTML Sanitizer Project?
or should the client protect itself from XSS bugs by applying XSS prevention rules?
The problem with the first solution is that data may be modified (character encoding, partial or total deletion...). Which can alter the behavior of the application, especially for display concerns.
You apply sanitisation if and only if your data needs to conform to a specific format/standard and you are sure you can safely discard data; e.g. you strip all non-numeric characters from a telephone or credit card number. You always apply escaping for the appropriate context, e.g. HTML-encode user-supplied data when putting it into HTML.
Most of the time you don't want to sanitise, because you want to explicitly allow freeform data input and disallowing certain characters simply makes little sense. One of the few exceptions I see here would be if you're accepting HTML input from your users, you will want to sanitise that HTML to filter out unwanted tags and attributes and ensure the syntax is valid; however, you'd probably want to store the raw, unsanitised version in the database and apply this sanitisation only on output.
The gold standard in security is: Validate your inputs, and encode, not sanitize, your outputs.
First, validate the input server side. A good example of this would be a phone number field on a user profile. Phone numbers should only consist of digits, dashes, and perhaps a +. So why allow users to submit letters, special characters, etc? It only increases attack surface. So validate that field as strictly as you can, and reject bad inputs.
Second, encode the output according to its output context. I'd recommend doing this step server side as well, but it's relatively safe to do client side as long as you are using a good, well tested front-end framework. The main problem with sanitization is that different contexts have different requirements for safety. To prevent XSS when you are injecting user data into an HTML attribute, directly into the page, or into a script tag, you need to do different things. So, you create specific output encoders based on the output context. Entity encode for the HTML context, JSON.stringify for the script context, etc.
The client has to defend against this. For two reasons:
Because this is where the vulnerability happens. You might be calling a 3rd party API and they haven't escaped / encoded everything. It is better not trust anything.
Second, the API could be written for HTML page and for Android App. So why should the server html encode what some may consider html tags in a request when on the way back out it may be going to android app?
Related
We are using the ZAP tool for our application security testing and running across a High priority vulnerability in form of Cross site XSS DOM based. Attached image for reference.
Can someone help to set a direction to resolve this. What could be the possible areas we can look into
Additional information : We are using the following libraries in our application JQuery 3.6.0, Bootstrap 4.3.1 Screen Shot ZAP report
Here's the current solution associated with that alert:
Phase: Architecture and Design Use a vetted library or framework that does not
allow this weakness to occur or provides constructs that make this weakness
easier to avoid. Examples of libraries and frameworks that make it easier to
generate properly encoded output include Microsoft's Anti-XSS library, the OWASP
ESAPI Encoding module, and Apache Wicket. Phases: Implementation; Architecture
and Design Understand the context in which your data will be used and the
encoding that will be expected. This is especially important when transmitting
data between different components, or when generating outputs that can contain
multiple encodings at the same time, such as web pages or multi-part mail
messages. Study all expected communication protocols and data representations to
determine the required encoding strategies. For any data that will be output to
another web page, especially any data that was received from external inputs,
use the appropriate encoding on all non-alphanumeric characters. Consult the XSS
Prevention Cheat Sheet for more details on the types of encoding and escaping
that are needed. Phase: Architecture and Design For any security checks that are
performed on the client side, ensure that these checks are duplicated on the
server side, in order to avoid CWE-602. Attackers can bypass the client-side
checks by modifying values after the checks have been performed, or by changing
the client to remove the client-side checks entirely. Then, these modified
values would be submitted to the server. If available, use structured mechanisms
that automatically enforce the separation between data and code. These
mechanisms may be able to provide the relevant quoting, encoding, and validation
automatically, instead of relying on the developer to provide this capability at
every point where output is generated. Phase: Implementation For every web page
that is generated, use and specify a character encoding such as ISO-8859-1 or
UTF-8. When an encoding is not specified, the web browser may choose a different
encoding by guessing which encoding is actually being used by the web page. This
can cause the web browser to treat certain sequences as special, opening up the
client to subtle XSS attacks. See CWE-116 for more mitigations related to
encoding/escaping. To help mitigate XSS attacks against the user's session
cookie, set the session cookie to be HttpOnly. In browsers that support the
HttpOnly feature (such as more recent versions of Internet Explorer and
Firefox), this attribute can prevent the user's session cookie from being
accessible to malicious client-side scripts that use document.cookie. This is
not a complete solution, since HttpOnly is not supported by all browsers. More
importantly, XMLHTTPRequest and other powerful browser technologies provide read
access to HTTP headers, including the Set-Cookie header in which the HttpOnly
flag is set. Assume all input is malicious. Use an 'accept known good' input
validation strategy, i.e., use an allow list of acceptable inputs that strictly
conform to specifications. Reject any input that does not strictly conform to
specifications, or transform it into something that does. Do not rely
exclusively on looking for malicious or malformed inputs (i.e., do not rely on a
deny list). However, deny lists can be useful for detecting potential attacks or
determining which inputs are so malformed that they should be rejected outright.
When performing input validation, consider all potentially relevant properties,
including length, type of input, the full range of acceptable values, missing or
extra inputs, syntax, consistency across related fields, and conformance to
business rules. As an example of business rule logic, 'boat' may be
syntactically valid because it only contains alphanumeric characters, but it is
not valid if you are expecting colors such as 'red' or 'blue.' Ensure that you
perform input validation at well-defined interfaces within the application. This
will help protect the application even if a component is reused or moved
elsewhere.
It's quite a wall of text, it wasn't written by me :)
You should also refer to the DOM XSS Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html which outlines 7 rules:
RULE #1 - HTML Escape then JavaScript Escape Before Inserting Untrusted Data into HTML Subcontext within the Execution Context
RULE #2 - JavaScript Escape Before Inserting Untrusted Data into HTML Attribute Subcontext within the Execution Context
RULE #3 - Be Careful when Inserting Untrusted Data into the Event Handler and JavaScript code Subcontexts within an Execution Context
RULE #4 - JavaScript Escape Before Inserting Untrusted Data into the CSS Attribute Subcontext within the Execution Context
RULE #5 - URL Escape then JavaScript Escape Before Inserting Untrusted Data into URL Attribute Subcontext within the Execution Context
RULE #6 - Populate the DOM using safe JavaScript functions or properties
RULE #7 - Fixing DOM Cross-site Scripting Vulnerabilities
Which further defines 12 guidelines.
The TipTap editor and its progeny quasar-tiptap can export user created content in the browser in both HTML and JSON formats. If I plan on allowing round trips of the user data to my server what are the pros and cons of using either format for storage.
I would assume HTML has a greater likelihood of XSS attacks, and indeed such a vulnerability has been found (and rectified) in the past. And using JSON would be easier for backend parsing should it ever be required.
Beyond this, are there any major benefits of using either format? Preserving fidelity of user input is important. Size is important (any differences in image storage?). Editor performance is important. Scripting attack vulnerability is extremely important.
Which to choose?
I am sure at 12 months later you have made your choice. The usual caveats that this question is not really answerable and can only really elicit opinions. However, the best I can offer (and I am definately no expert):
User entered text is a risk (cross-site scripting, SQL injection attack, potentially creating a false alert dialog with disinformation for the user reading another user's content) regardless of how the rich text data is sent to the server and/or stored. This is because eventually the content will be a) very likely stored in a database b) sent to and interpreted by a browser in HTML format. Whether it goes through an intermediary JSON format is largely irrelevant. Stripping out all unwanted tags, attributes and SQL commands is going to be an extremely important server responsibility (known as sanitising), whatever the format. Many more well tested sanitising libraries exist for HTML in many more languages/server side technologies than for the relatively niche ProseMirror JSON format used by TipTap.
The TipTap documentation talks about the advantages of the 2 formats here
The ProseMirror JSON format is reasonably verbose, and inspecting both the JSON and HTML side by side, the JSON format seems to require more characters. Obviously it will depend on the rich text itself - how many formatting marks etc, but I would not worry about this in terms of server storage. Given you are likely using Vue or React with TipTap, it is likely you can set up an editor and have 2 seperate divs displaying the output side by side. You can see the 2 formats for the same text by scrolling to the bottom of this page
Editor performance - if you store the data as HTML you can pass it straight to the client. If you store it as JSON you will have to parse it and create the HTML. If you do this client side, the client will have to download and execute a library to perform this task - on NPM this is currently 14kB, so not a big issue.
Both formats will almost certainly send image data to the server as Base64 strings, so no bandwidth will be saved with either format for image files. Base64 is less efficient for DB storage as compared to saving as binary objects, particularly if compression is used, so you could strip these out at the server, but this will have a cost in time spent developing and testing the backend which may well be better spent on other things
This is my first question on stack overflow and I have taken a lot of time to search for the similar question but surprisingly could not find one.
So I read that no data should be trusted, whether from a client or that which is coming out of a database. Now while there are lots of examples that show how to sanitize data from a user ($_POST or $_GET), I could not find one that shows how the data from a database should be sanitized.
Now maybe it's the same as the data coming from a user / client (that's what I think it should be) but I found no example of it. So I am asking it just to make sure.
So for example if the result of a query yields as follows:-
$row=mysqli_fetch_assoc($result);
$pw = $row['Password'];
$id = $row['ID'];
$user = $row['Username'];
then do the variables $pw, $id and $user have to be sanitized before they should be used in the program? If so, then how ?
Thanks to all.
Your thinking is back to front here. By the time you are able to sanitise inputs using php, it's probably too late. The data is already in php. You don't sanitise inputs. You:
validate input & sanitise output
Normally a database is wrapped by the application tier. So the only data in there should have been filtered and escaped by your code. You should be able to trust it. But even then, in a relational database the data is fairly strongly typed. Hence there is little scope for attacking php from the data tier.
But you should be sanitising (escaping or encoding) any output. How you do that depends on where and how you are sending the data, hence it should be done at the point where it leaves php, not the point where it enters php. And the method you use (mysqli_escape, HTMLentities, base64, urlencode.....) Should be appropriate to where the data is going. Indeed it is better practice to change the representation of a copy of the data (and discard it after use) rather than the original.
It depends... How are you accessing this database? Who works on / maintains it? Going in is definitely a far bigger concern. However, if you wanted to sanitize it coming out of a database you need to know what you are sanitizing for. If you want to sanitize web traffic against XSS you'd probably want to remove all url's not on a whitelist, perhaps script tags and a few other things as well. Are you sanitizing data going into a C/C++ program? Then you probably want to make sure you're protecting yourself against buffer overflow issues as this is a legitimate avenue of attack.
I'm drawing some assumptions about your design here but I'm going to assume you're just working on the model aspect of an MVC application using PHP. PHP, in this case, has been most vulnerable to SQL Injection attacks on the backend, and XSS (cross site scripting) attacks on the front end. (NOTE: This isn't a PHP problem exclusively, this is a problem in all programming and different languages provide different solutions to different problems. Remember - you need to know what you're sanitizing for what reason. There is no one size fits all.
So really, unless you are sanitizing against something universal in all the code this model will sanitize for, you probably don't want to sanitize here. XSS would be a bigger concern to you now than sql injection... the way out is too late to stop an injection attack.
To take some liberty just to get the juices flowing - From a security standpoint, given your code seems to revolve around authentication, I would be much more immediately concerned around how you are storing and processing your credentialing data. A few things should definitely be doing:
Running the password through a secure, 1-way hash BEFORE storage (such as BCrypt).
Salting these hashes (with a different salt for EACH user) before storing them in the database to protect your user's data from things such as rainbow table attacks.
Using TLS for all communications.
Establishing and maintaining a secure session (track user-login without exposing password data with every single request sent, amongst other things).
I submitted data up to a controller in Grails and did a javascript injection which worked. I was surprised as I figured the default would be to encode parameters as they came into the controllers. Can I set this easily so that all parameters are encoded when they hit the controller? Also, am I safe to do a GORM create with the text as it came up, or is this vulnerable to SQL injection? I see in the guide how to prevent SQL Injection for a find query, but what about object creation?
I also saw that there is an encodeAsHTML method that I can call to encode on the way back down to the client. Wouldn't I want to do this before it went into the database so that I only have to encode once? Thanks!
User Input
The idea of "sanitizing your inputs" (or pre-encoding or pre-escaping) your content is a terrible one. It does not actually protect you when you really need it, and leads to all sorts of design headaches. Even PHP finally dumped the technique.
It's always better to handle the data via proper APIs correctly, which eliminates the risk. For example, SQL injections is absolutely eliminated using prepared statements or statements with placeholders for the content. This technique has been around for a very long time (as long as I've using Java & SQL).
Grails (GORM) automatically handles encoding any content that is saved via the objects, including setting a single property, creating a new object and saving it, or setting properties via obj.properties = params or something similar.
As long as you are using GORM, there is no risk of SQL injection.
Content Storage
Also, it's generally considered incorrect to store the already-encoded information in the database, as that encoding would only be correct for a given display type (such as HTML). If you instead wanted to render it using JSON, the HTML encoding is incorrect. XML is also slightly different, and there are times when you might prefer plain text, too.
Instead, you generally should store the raw (UTF8 or similar) data in the database, and convert it to the correct display type when it is being rendered for display. Note that it's converted when it's being rendered — this doesn't necessarily mean every time you send it to the client. You can use a variety of caching techniques to make sure this doesn't happen too often — including the new cache plugin added to Grails 2.1.
Preventing XSS Attacks
One highly recommended technique, however, is to set the default view codec to HTML, using the grails.views.default.codec option, like this:
grails.views.default.codec = 'html'
This will only affect GSPs, and only content echoed using dollar-sign syntax (${foo}). This gives you the flexibility of overriding this behavior using tags (the recommended way) or the <%= %> syntax for one-off situations. It should, however, provide a decent catch to prevent XSS attacks in general.
Performance Considerations
One final note: the concern about encoding the content as HTML being a performance issue would be considered premature optimization. Chances are extremely high that any performance bottlenecks would be somewhere else, and not in the encoding of the content. Build your application first — using a good design — and optimize later after you can benchmark and analyze the running application.
I have a mostly desktop programming background. In my spare time I dabble in web development, page faulting my way from problem to solution with some success. I have reached the point were I need to allow site logins and collect some data from a community of users (that is the plan anyway).
So, I realize there is a whole world of nefarious users out there who are waiting eagerly for an unsecured site to decorate, vandalize and compromise. If I am really lucky, a couple of those users might make their way to my site. I would like to be reasonably prepared for them.
I have a UI to collect information from a logged in user and some of that information is rendered into HTML pages. My site is implemented with PHP/MySQL on the back end and some javascript stuff on the front. I am interested in any suggestions/advice on how I should tackle any of the following issues:
Cross Site Scripting : I am hoping this will be reasonably simple for me since I am not supporting marked down input, just plain text. Should I be just scanning for [A-Za-z ]* and throwing everything else out? I am pretty ignorant about the types of attacks that can be used here, so I would love to hear your advice.
SQL injection : I am using parametized queries (mysqli) here , so I am hoping I am OK in this department. Is there any extra validation I should be doing on the user entered data to protect myself?
Trollish behaviour : I am supporting polylines drawn by the user on a Google Map, so (again if I am lucky enough to get some traffic) I expect to see a few hand drawn phallices scrawled across western Europe. I am planning to implement some user driven moderation (flag inaproriate SO style), but I would be interested in any other suggestions for discouraging this kind of behavior.
Logins : My current login system is a pretty simple web form, MySQL query in PHP, mp5 encoded password validation, and a stored session cookie. I hope the system is simple enough to be secure, but I wonder if their are vulnerabilies here I am not aware of?
I hope I haven't been too verbose here and look forward to hearing your comments.
Your first problem is that you are concerned with your UI. A simple rule to follow is that you should never assume the submitted data is coming from a UI that you created. Don't trust the data coming in, and sanitize the data going out. Use PHP's strip_tags and/or htmlentities.
Certain characters (<,>,",') can screw up your HTML and permit injection, but should be allowed. Especially in passwords. Use htmlentities to permit the use of these characters. Just think about what would happen if certain characters were output without being "escaped".
Javascript based checks and validation should only be used to improve the user experience (i.e. prevent a page reload). Do not use eval except as an absolute last resort.
Cross Site Scripting can be easily taken care of with htmlentities, there is also a function called strip tags which removes the tags from the post and you'll note that this allows you to whitelist certain tags. If you do decide to allow specific tags through in the future keep in mind that the attributes on these tags are not cleaned in any way, this can be used to insert javascript in the page (onClick etc.) and is not really recommended. If you want to have formatting in the future I'd recommend implementing a formatting language (like [b] for bold or something similar) to stop your users from just entering straight html into the page.
SQL Injection is also easily taken care of as you can prepare statements and then pass through the user data as arguments to the prepared statement. This will stop any user input from modifying the sql statement.
CSRF (Cross-Site Request Forgery) is an often overlooked vulnerability that allows an attacker to submit data from a victims account using the form. This is usually done either by specifying your form get string for an img src (the image loads for the victim, the get loads and the form is processed, but the user is unaware ). Additionally if you use post the attacker can use javascript to auto-submit a hidden form to do the same thing as above. To solve this one you need to generate a key for each form, keep one in the session and one on the form itself (as a hidden input). When the form is submitted you compare the key from the input with the key in the session and only continue if they match.
Some security companies also recommend that you use the attribute 'autocomplete="off"' on login forms so the password isn't saved.
Against XSS htmlspecialchars is pretty enough, use it to clear the output.
SQL injection: if mysql parses your query before adds the parameters, afaik its not possible to inject anything malicious.
I would look into something else besides only allowing [A-Za-z]* into your page. Just because you have no intention of allowing any formatting markup now doesn't mean you won't have a need for it down the line. Personally I hate rewriting things that I didn't design to adapt to future needs.
You might want to put together a whitelist of accepted tags, and add/remove from that as necessary, or look into encoding any markup submitted into plain text.