I don't understand the security patch from last week: https://typo3.org/teams/security/security-bulletins/typo3-core/typo3-core-sa-2016-022/ . I have an old TYPO3 6.2 installation. I truncated all cf_* tables and opened the pages with UIDs 2-6, without a cHash. As a result I see 13 cf_cache_hash entries.
Now I have opened a detail page from a listing page in the frontend. I see some parameters in the URL, like action, controller, the UID of the currently displayed record and, of course, a cHash.
Then I copied these parameters (excluding id=x) onto the URLs of my pages 2-6. cf_cache_hash still contains 13 records, so there is no cache flooding.
Or how am I supposed to interpret this quote?
Links with a valid cHash argument lead to newly generated page cache entries. Because the cHash is not bound to a specific page, attackers could use valid cHash arguments for multiple pages, leading to additional useless page cache entries.
Next problem:
If extensions like realurl are used, it is required to flush their caches (and TYPO3 caches as well)
Can you please tell me WHICH tables I/we should clear?
tx_realurl_urldecodecache
tx_realurl_urlencodecache
are maybe OK. But what about tx_realurl_pathcache? Of course I can clear it, but what about older entries from an earlier realurl configuration? If I truncate that table, those old entries are no longer valid and will not be built again, so old search engine results become invalid.
Question from one of our customers: is it enough to clear the system cache in the backend, or should he click on "Clear all cache" in the Install Tool? Nice. IMO, that is not enough and the tables have to be truncated directly in the DB. Right?
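For reference, I would expect a direct DB-level flush to look something like the sketch below (tx_realurl_chashcache is my assumption of a further affected table; check which tx_realurl_* tables the installation actually has):

-- Sketch: flushing the realurl caches directly in the database.
-- tx_realurl_chashcache is an assumption; verify the table list
-- against your own installation before running this.
TRUNCATE TABLE tx_realurl_urldecodecache;
TRUNCATE TABLE tx_realurl_urlencodecache;
TRUNCATE TABLE tx_realurl_chashcache;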
Next one:
This means if such URLs are indexed by a search engine, visitors from this search engine will end up on a not properly working page.
Hey cool. And now? What is the solution? Keep it as it is? IMO it depends on the Install Tool setting pageNotFoundOnCHashError. Right?
Please tell us what to do and please add some more details how to handle that.
Stefan
For me it boils down to (after installing the updated TYPO3 version):
If you don't use realurl: enable
$GLOBALS['TYPO3_CONF_VARS']['FE']['cHashIncludePageId'] = true;
and you are probably "done". Of course all old Google hits will be broken, but on a "public" site it's quite probable you never cared about Google anyway if you didn't run realurl (or something similar)
If you use realurl 1.x on TYPO3 6.2
Don't enable the config (there'll probably never be a proper patch)
Two options:
take the risk of a DDoS
use the 1.x version from https://github.com/mogic-le/typo3-realurl
If I understand it correctly, it will set TYPO3 to no_cache mode if there is no hit on the caching table. While that is a performance issue, it will prevent cache table entries from being created (as a side effect)
If you run 7.6+ and realurl 2
Wait for realurl 2.1 (and take the risk?)
Change the caching framework to something like memcached (it's somewhat suggested between the lines: if you have a caching backend that cannot be used for a DDoS, you don't really have to care)
Use the fork from helhum (though I think that won't help you one bit regarding old links)
Realurl >= 2.1.0 supports this core option, but it is recommended to update to at least 2.1.4, which fixes various other cHash issues.
I'd like to exclude certain pages from the Varnish cache based on the content of the page (for instance if the Form uses a particular hidden field which is a security feature and needs to be unique on every page refresh).
I have dozens of forms, so I don't want to have to exclude each unique page individually from the cache.
Is this possible within the VCL?
No, normally not. The proper way to do it would be to set cache headers (for instance "Cache-Control: no-cache, must-revalidate") on the pages with the non-cacheable forms, which Varnish in turn will read.
As a nice side effect, that will also disable most client-side caches, which often cause trouble with CAPTCHAs and the like.
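A minimal PHP sketch of that approach (the $pageHasSecureForm flag is illustrative, not an existing API; hook it into wherever your forms are rendered):

<?php
// Sketch: mark pages that contain a one-time-token form as non-cacheable,
// so Varnish (and browsers) will not serve a stale copy.
// $pageHasSecureForm is an illustrative flag for "this page has such a form".
if ($pageHasSecureForm) {
    header('Cache-Control: no-cache, must-revalidate');
}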
On a website to which I have access, there are some input fields. In the sixth field I need to enter an input string from a list of 10000 strings; then a new page appears, for which I would just need to count the number of lines. Finally I would like to get a table with two columns: input string and number of resulting lines. Since I would otherwise have to enter the information manually for all 10000 different strings, I wonder what the best approach is to enter a string into a generic form field and get the resulting text. I have heard about curl, but I am not sure whether it is the easiest option.
P.S.
Example of the interactive way: I type some string of words into Google search and then I get a new page with the search results. Beforehand I entered my Google username and password, so the results will probably be filtered according to my profile.
Example of the non-interactive way: a script somehow submits my user information and search query, and saves the search results to some text file. Imagine the same idea, but for a more complicated website like this one.
What you want to do is send an HTTP POST with specific data. This can be done with any proper HTTP client code; one such is libcurl (or the pycurl binding, or even the curl command line tool). On the response from the POST you probably get a redirect and then the results, or you need to do a separate request for the results; then you're done and can go on to the next POST. Repeat until all POSTs are done.
What you may need to take into account is that you may have to deal with cookies and possibly follow a redirect from the POST. A good approach is to record a "manual session" done with a browser (using Firebug or LiveHTTPHeaders etc.) and then use that recording to help you repeat the same thing with an HTTP client.
A decent tutorial with details to get started on this kind of work can be found here: http://curl.haxx.se/docs/httpscripting.html
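To make that concrete, here is a rough shell sketch; the URL, the field name "field6", and strings.txt are assumptions to be replaced with what you record from a manual session:

# Sketch: POST each of the 10000 strings and record the line count of
# the response. Cookies are kept in cookies.txt; -L follows redirects.
while read -r s; do
  n=$(curl -s -L -b cookies.txt -c cookies.txt \
        --data-urlencode "field6=$s" \
        "http://example.com/the-form-page" | wc -l)
  printf '%s\t%s\n' "$s" "$n"
done < strings.txt > results.tsv

results.tsv then holds the two-column table (input string, number of lines) you described.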
You could also use JMeter to run all the POSTs. You can use its CSV input to supply the 10000 strings, then save the result as XML and extract the necessary data.
I am trying to use cURL from Bash to download a webpage's source code. I have difficulty downloading a page's code when the page uses something more complex than simple HTML. For example, I am trying to view the source code of the following page with the following command:
curl "http://shop.sprint.com/NASApp/onlinestore/en/Action/DisplayPhones?INTNAV=ATG:HE:Phones"
However, the result of this doesn't match the source code shown by Firefox when I click "View source". I believe it is because there are JavaScript elements on the page, but I cannot be sure.
For example, I cannot do:
curl "http://shop.sprint.com/NASApp/onlinestore/en/Action/DisplayPhones?INTNAV=ATG:HE:Phones" | grep "Access to 4G speeds"
Even though that phrase is clearly found in the Firefox source. I tried looking through the man pages but I don't know enough about the problem to figure out a possible solution.
A preferable answer will include why this is not working the way I expect it to and a solution to the issue using curl or another tool executable from a Linux box.
EDIT:
Upon a suggestion below, I have also tried a user-agent switch, with no success:
curl "http://shop.sprint.com/NASApp/onlinestore/en/Action/DisplayPhones?INTNAV=ATG:HE:Phones" -A "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3" | grep -i "Sorry"
I don't see the "Access to 4G speed" thing in the first place when I go to that page.
The two most likely culprits for this difference are cookies and your user-agent.
You can specify cookies manually with either curl or wget. Dump your cookies from Firefox using whatever plugin you want, or just
javascript:prompt('',document.cookie);
in your location bar
Then read through the man pages for wget or curl to see how to include that cookie.
EDIT:
It appears to be what I thought, a missing cookie.
curl --cookie "INSERT THE COOKIE YOU GOT HERE" http://shop.sprint.com/NASApp/onlinestore/en/Action/DisplayPhones?INTNAV=ATG:HE:Phones | grep "Access to 4G"
As stated above, you can grab whatever your cookie is from above: javascript:prompt('',document.cookie), then copy the default text that comes up. Make sure you're on the Sprint page when you put that in the location bar (otherwise you'll end up with the wrong website's cookie).
EDIT 2
The reason your browser cookie and your shell cookie were different was the difference in interaction that took place.
The reason I didn't see the "Access to 4G speeds" thing you were talking about in the first place was that I hadn't entered my zip code.
If you want to have a constantly relevant cookie, you can force curl to do whatever is required to obtain that cookie, in this case, entering a zip code.
In curl, you can do this with multiple requests and holding the retrieved cookies in a cookie jar:
[stackoverflow] curl --help | grep cookie
-b/--cookie <name=string/file> Cookie string or file to read cookies from (H)
-c/--cookie-jar <file> Write cookies to this file after operation (H)
-j/--junk-session-cookies Ignore session cookies read from file (H)
So simply specify a cookie jar, send the request that submits the zip code, then work away.
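For illustration (the SubmitZip URL and the "zipcode" field name are my guesses; record the real request with LiveHTTPHeaders or similar first):

# Sketch: step 1 submits the zip code and saves the session cookies,
# step 2 replays those cookies for the real page request.
curl -c cookies.txt --data "zipcode=90210" \
     "http://shop.sprint.com/NASApp/onlinestore/en/Action/SubmitZip"
curl -b cookies.txt \
     "http://shop.sprint.com/NASApp/onlinestore/en/Action/DisplayPhones?INTNAV=ATG:HE:Phones" \
     | grep "Access to 4G"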
If you are getting different source code from the same URL, the server is most likely sniffing your user agent and serving different code.
JavaScript can act on the DOM and do all sorts of things, but if you use "view source" the code will be exactly the same as what your browser first received (before any DOM manipulation).
I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.
Is there some way to do an XSS attack even if HTML Encode is used?
No.
Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.
For instance, consider server-generated client-side JavaScript: if the server dynamically outputs HtmlEncoded values directly into client-side JavaScript, HtmlEncode will not stop an injected script from executing.
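A minimal sketch of that pattern (classic ASP-style pseudocode; note that older HtmlEncode implementations do not encode single quotes):

<script>
  // Sketch: if somevar arrives as   '; alert(document.cookie); //
  // it breaks out of the string literal below. HtmlEncode converts
  // < > & " but (in older implementations) leaves the single quote alone.
  var q = '<%= HtmlEncode(somevar) %>';
</script>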
Next, consider the following pseudocode:
<input value=<%= HtmlEncode(somevar) %> id=textbox>
Now, in case it's not immediately obvious: if somevar (sent by the user, of course) is set, for example, to
a onclick=alert(document.cookie)
the resulting output is
<input value=a onclick=alert(document.cookie) id=textbox>
which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).
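A classic DOM-based sketch, where the payload lives in the URL fragment and never reaches the server (this relies on the browser not URL-encoding the fragment):

<script>
  // Sketch: a URL ending in  #<img src=x onerror=alert(document.cookie)>
  // is written straight into the page. The fragment is never sent to the
  // server, so no server-side encoding can ever run on it.
  document.write('Hello ' + location.hash.substring(1));
</script>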
Also don't forget about UTF-7 type attacks - where the attack looks like
+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-
Nothing much to encode there...
The solution, of course (in addition to proper and restrictive whitelist input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF your output context IS HTML; otherwise you may need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.
Note that encoding should not be restricted to user input; it should also cover stored values from the database, text files, etc.
Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
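Concretely, both declarations should agree on the charset.
In the HTTP response header:
Content-Type: text/html; charset=UTF-8
And in the page's <head>:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">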
For some more information, and a pretty definitive (constantly updated) list, check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html
If you systematically encode all user input before displaying it, then you are mostly safe, but still not 100% safe.
(See @Avid's post for more details.)
In addition, problems arise when you need to let some tags go unencoded so that users can post images, bold text, or any feature that requires the input to be processed as (or converted to) unencoded markup.
You will have to set up a decision-making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to let a disallowed tag pass through.
It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).
If you encode everything, it will (depending on your platform and the implementation of HtmlEncode). But any useful web application is so complex that it's easy to forget to check every part of it. Or maybe a third-party component isn't safe. Or maybe some code path that you thought did the encoding didn't, so you missed it somewhere else.
So you might want to check things on the input side too. And you might want to check stuff you read from the database.
As mentioned by everyone else, you're safe as long as you encode all user input before displaying it. This includes all request parameters and data retrieved from the database that can be changed by user input.
As mentioned by Pat, you'll sometimes want to display some tags, just not all of them. One common way to do this is to use a markup language like Textile, Markdown, or BBCode. However, even markup languages can be vulnerable to XSS, so be aware.
# Markup example
[foo](javascript:alert\('bar'\);)
If you do decide to let "safe" tags through I would recommend finding some existing library to parse & sanitize your code before output. There are a lot of XSS vectors out there that you would have to detect before your sanitizer is fairly safe.
I second metavida's advice to find a third-party library to handle output filtering. Neutralizing HTML characters is a good approach to stopping XSS attacks. However, the code you use to transform metacharacters can be vulnerable to evasion attacks; for instance, if it doesn't properly handle Unicode and internationalization.
A classic simple mistake homebrew output filters make is to catch only < and > but miss things like ", which can break user-controlled output out into the attribute space of an HTML tag, where JavaScript can be attached to the DOM.
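For instance, with a hypothetical naiveFilter() that only strips < and >:

<input type="text" value="<%= naiveFilter(uservar) %>">
<!-- with uservar set to:  " onmouseover="alert(document.cookie)
     this renders as:
     <input type="text" value="" onmouseover="alert(document.cookie)">
     closing the value attribute and attaching JavaScript to the element -->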
No, just encoding common HTML tokens DOES NOT completely protect your site from XSS attacks. See, for example, this XSS vulnerability found in google.com:
http://www.securiteam.com/securitynews/6Z00L0AEUE.html
The important thing about this type of vulnerability is that the attacker is able to encode his XSS payload using UTF-7, and if you haven't specified a different character encoding on your page, a user's browser could interpret the UTF-7 payload and execute the attack script.
One other thing you need to check is where your input comes from. You can use the referrer string (most of the time) to check that it comes from your own page, but putting a hidden random number or the like into your form and then checking it (against a session variable, maybe) also helps you know that the input is coming from your own site and not some phishing site.
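A rough PHP sketch of that hidden-token idea (the field and file names are mine, and random_bytes()/hash_equals() require PHP 7+/5.6+):

<?php
// form.php (sketch): issue a random token tied to the visitor's session
session_start();
$_SESSION['form_token'] = bin2hex(random_bytes(16));
echo '<input type="hidden" name="token" value="' . $_SESSION['form_token'] . '">';

<?php
// submit.php (sketch): verify the token in constant time
session_start();
if (!hash_equals($_SESSION['form_token'], $_POST['token'] ?? '')) {
    die('Form was not submitted from this site.');
}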
I'd like to suggest HTML Purifier (http://htmlpurifier.org/). It doesn't just filter the HTML, it basically tokenizes and re-compiles it. It is truly industrial-strength.
It has the additional benefit of allowing you to ensure valid html/xhtml output.
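A minimal usage sketch, following HTML Purifier's standard API ($dirty_user_html is illustrative):

<?php
require_once 'HTMLPurifier.auto.php';

$config   = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean    = $purifier->purify($dirty_user_html);  // re-built, sanitized markup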
Also, n'thing Textile; it's a great tool and I use it all the time, but I'd run it through HTML Purifier too.
I don't think you understood what I meant regarding tokens. HTML Purifier doesn't just 'filter'; it actually reconstructs the HTML. http://htmlpurifier.org/comparison.html
I don't believe so. HtmlEncode converts all functional characters (characters which could be interpreted by the browser as code) into entity references, which cannot be parsed by the browser as code and thus cannot be executed.
<script/>
There is no way that the above can be executed by the browser.
*Unless there is a bug in the browser, of course.*
myString.replace(/<[^>]*>?/gm, '');
I use it successfully.
Strip HTML from Text JavaScript