Browser codepage detection

I have an ASP.NET page where a user can enter some text in a TEXTAREA and submit it to the server. This text will be stored in a database and will be presented in a WinForms application.
How can I make sure that the WinForms application presents the exact characters that the user entered in the TEXTAREA?
That is, do I have a potential problem if, for example, the user enters language-specific letters such as Æ, Ø and Å, which are Danish letters?
Those letters have different codes depending on the codepage, so as far as I can see, I need to know what codepage the TEXTAREA control is using for its input. Or am I missing something here?
I have tried to find material on this on the net, but it is difficult to find something that addresses this issue. I typically found pages talking about what codepage the server requires the browser to use in order to display the sent data correctly.
But my question goes the other way, i.e. from client to server.

You could also use the HEBCI (HTML Entity-Based Codepage Inference) technique if you really want to be sure that users sending text with crappy browsers don't corrupt your data backbone.
In essence this is how it works:
Every codepage has its own fingerprint. For instance, the single entity º (&ordm;) can be used to distinguish between the Big Three: ISO-8859-1/Windows-1252 (= BA), MacRoman (= BC), and UTF-8 (= C2BA).
In a form you simply add a hidden input containing those fingerprint characters as entities (like &deg;, &divide;, and &mdash;), and when the user submits the form you simply check the returned hex values and compare them against your fingerprint table.
Only if this does not give a match should you fall back to other solutions.
A slightly larger implementation works great with only five codepoints:
my @fp_ents = qw/deg divide mdash bdquo euro/;

my %fingerprints = (
    "UTF-8"        => ['c2b0', 'c3b7', 'e28094', 'e2809e', 'e282ac'],
    "WINDOWS-1252" => ['b0',   'f7',   '97',     '84',     '80'],
    "MAC"          => ['a1',   'd6',   'd1',     'e3',     'db'],
    "MS-HEBR"      => ['b0',   'ba',   '97',     '84',     '80'],
    "MAC-CYRILLIC" => ['a1',   'd6',   'd1',     'd7',     ''],
    "MS-GREEK"     => ['b0',   '',     '97',     '84',     '80'],
    "MAC-IS"       => ['a1',   'd6',   'd0',     'e3',     ''],
    "MS-CYRL"      => ['b0',   '',     '97',     '84',     '88'],
    "MS932"        => ['818b', '8180', '815c',   '',       ''],
    "WINDOWS-31J"  => ['818b', '8180', '815c',   '',       ''],
    "WINDOWS-936"  => ['a1e3', 'a1c2', 'a1aa',   '',       ''],
    "MS_KANJI"     => ['818b', '8180', '',       '',       ''],
    "ISO-8859-15"  => ['b0',   'f7',   '',       '',       'a4'],
    "ISO-8859-1"   => ['b0',   'f7',   '',       '',       ''],
    "CSIBM864"     => ['80',   'dd',   '',       '',       ''],
);
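For illustration, here is a minimal server-side sketch of the matching step in Python (my own, not part of the original technique; the hidden-field handling and the reduced table are assumptions). Empty entries in the full table above mean the codepage cannot represent that character, so this sketch only uses rows where all five codepoints survive:

    # Sketch: infer the client's codepage from the raw bytes of the
    # hidden fingerprint field submitted with the form.
    FINGERPRINTS = {
        "UTF-8":        ["c2b0", "c3b7", "e28094", "e2809e", "e282ac"],
        "WINDOWS-1252": ["b0", "f7", "97", "84", "80"],
        "MAC":          ["a1", "d6", "d1", "e3", "db"],
    }

    def infer_codepage(raw_field: bytes):
        """Return the codepage whose fingerprint matches the submitted bytes."""
        hex_value = raw_field.hex()
        for codepage, parts in FINGERPRINTS.items():
            if hex_value == "".join(parts):
                return codepage
        return None  # no match: fall back to other detection strategies

    # A Windows-1252 browser would post the five entities as b"\xb0\xf7\x97\x84\x80".
    assert infer_codepage(b"\xb0\xf7\x97\x84\x80") == "WINDOWS-1252"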

You can look at the Content-Type header of the request to find out the encoding.
For more details, see this SO answer to a related question.

Related

Example of restricted_package_name usage in FCM

I am currently working on two different apps but testing them on a single hardware device (as that is one of my usage scenarios). I have seen a similar question on Stack Overflow about how to send notifications to one of two apps in the same Firebase project.
However, in my case I have two different apps (with different package names) and want to be able to differentiate between them when I use push_service.notify_single_device().
From the previous question I understand I can use restricted_package_name, but after lots of googling I have yet to find any example code. How do I use this?
Would the following usage be correct?
pushresult = push_service.notify_single_device(
    registration_id=requester_fcm_key,
    message_title="Message Title",
    message_body="Please hold on while we connect you",
    restricted_package_name='com.domain.packagename',
    data_message=requester_data_message)
It would be great if someone could share some sample/example code that I could take a look at and understand.
While looking for answers, I widened my search to include FCM usage in other languages, and sure enough, I found a PHP example on stefanhoth's GitHub page:
$fields = array(
    'registration_ids'        => $registrationIDs,
    'restricted_package_name' => 'com.example.myandroidapp',
    'collapse_key'            => 'somekey_' . $messageType,
    'data'                    => array(
        "KEY_GCM_MESSAGE_TYPE" => $messageType,
        "payload"              => $message,
    ),
);
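For comparison, here is a minimal Python sketch (mine, not from the linked page) that posts the same kind of payload straight to the legacy FCM HTTP endpoint; the server key, registration ID, and data keys are placeholders:

    import json
    import requests  # third-party 'requests' package

    SERVER_KEY = "YOUR_SERVER_KEY"               # placeholder
    REGISTRATION_ID = "DEVICE_REGISTRATION_ID"   # placeholder

    payload = {
        "to": REGISTRATION_ID,
        # Only the app whose package name matches receives the message.
        "restricted_package_name": "com.domain.packagename",
        "notification": {
            "title": "Message Title",
            "body": "Please hold on while we connect you",
        },
        "data": {"KEY_GCM_MESSAGE_TYPE": "connect"},
    }

    response = requests.post(
        "https://fcm.googleapis.com/fcm/send",
        headers={
            "Authorization": "key=" + SERVER_KEY,
            "Content-Type": "application/json",
        },
        data=json.dumps(payload),
    )
    print(response.status_code, response.text)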

Invalid length for a Base-64 char array - System.Web.UI.ViewStateException: Invalid viewstate [duplicate]

I have very little to go on here. I can't reproduce this locally, but when users get the error I get an automatic email exception notification:
Invalid length for a Base-64 char array.
at System.Convert.FromBase64String(String s)
at System.Web.UI.ObjectStateFormatter.Deserialize(String inputString)
at System.Web.UI.ObjectStateFormatter.System.Web.UI.IStateFormatter.Deserialize(String serializedState)
at System.Web.UI.Util.DeserializeWithAssert(IStateFormatter formatter, String serializedState)
at System.Web.UI.HiddenFieldPageStatePersister.Load()
I'm inclined to think there is a problem with data that is being assigned to viewstate.
For example:
List<int> SelectedActionIDList = GetSelectedActionIDList();
ViewState["_SelectedActionIDList"] = SelectedActionIDList;
It's difficult to guess the source of the error without being able to reproduce the error locally.
If anyone has had any experience with this error, I would really like to know what you found out.
After UrlDecode processes the text, it has replaced all '+' chars with ' ', hence the error. You should simply call this statement to make the string Base64-compatible again:
sEncryptedString = sEncryptedString.Replace(' ', '+');
I've seen this error caused by the combination of a good-sized ViewState and over-aggressive content-filtering devices/firewalls (especially when dealing with K-12 educational institutions).
We worked around it by storing Viewstate in SQL Server. Before going that route, I would recommend trying to limit your use of viewstate by not storing anything large in it and turning it off for all controls which do not need it.
References for storing ViewState in SQL Server:
MSDN - Overview of PageStatePersister
ASP Alliance - Simple method to store viewstate in SQL Server
Code Project - ViewState Provider Model
My guess is that something is either encoding or decoding too often - or that you've got text with multiple lines in.
Base64 strings have to be a multiple of 4 characters in length - every 4 characters represents 3 bytes of input data. Somehow, the view state data being passed back by ASP.NET is corrupted - the length isn't a multiple of 4.
Do you log the user agent when this occurs? I wonder whether it's a badly-behaved browser somewhere... another possibility is that there's a proxy doing naughty things. Likewise try to log the content length of the request, so you can see whether it only happens for large requests.
Try this:
public string DecodeBase64(string data)
{
    // Undo UrlDecode's '+' -> ' ' substitution, then pad the length
    // up to a multiple of 4 before decoding.
    string s = data.Trim().Replace(" ", "+");
    if (s.Length % 4 > 0)
        s = s.PadRight(s.Length + 4 - s.Length % 4, '=');
    return Encoding.UTF8.GetString(Convert.FromBase64String(s));
}
int len = qs.Length % 4;
if (len > 0)
    qs = qs.PadRight(qs.Length + (4 - len), '=');
where qs is any Base64-encoded string.
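The same repair is easy to express in Python; a minimal sketch (mine, for illustration) that undoes the '+' substitution and restores the padding before decoding:

    import base64

    def repair_and_decode(s: str) -> bytes:
        s = s.strip().replace(" ", "+")  # undo UrlDecode's '+' -> ' ' substitution
        s += "=" * (-len(s) % 4)         # pad length up to a multiple of 4
        return base64.b64decode(s)

    print(repair_and_decode("aGVsbG8"))  # b'hello'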
As others have mentioned this can be caused when some firewalls and proxies prevent access to pages containing a large amount of ViewState data.
ASP.NET 2.0 introduced the ViewState Chunking mechanism which breaks the ViewState up into manageable chunks, allowing the ViewState to pass through the proxy / firewall without issue.
To enable this feature simply add the following line to your web.config file.
<pages maxPageStateFieldLength="4000" />
This should not be used as an alternative to reducing your ViewState size but it can be an effective backstop against the "Invalid length for a Base-64 char array" error resulting from aggressive proxies and the like.
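With chunking enabled, the single __VIEWSTATE hidden field is split into several numbered fields, roughly like this (illustrative markup, values elided):

    <input type="hidden" name="__VIEWSTATEFIELDCOUNT" id="__VIEWSTATEFIELDCOUNT" value="3" />
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="..." />
    <input type="hidden" name="__VIEWSTATE1" id="__VIEWSTATE1" value="..." />
    <input type="hidden" name="__VIEWSTATE2" id="__VIEWSTATE2" value="..." />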
This isn't an answer, sadly. After running into the intermittent error for some time and finally being annoyed enough to try to fix it, I have yet to find a fix. I have, however, determined a recipe for reproducing my problem, which might help others.
In my case it is SOLELY a localhost problem, on my dev machine that also has the app's DB. It's a .NET 2.0 app I'm editing with VS2005. The Win7 64 bit machine also has VS2008 and .NET 3.5 installed.
Here's what will generate the error, from a variety of forms:
1. Load a fresh copy of the form.
2. Enter some data, and/or postback with any of the form's controls. As long as there is no significant delay, repeat all you like, and no errors occur.
3. Wait a little while (1 or 2 minutes maybe, not more than 5), and try another postback.
4. There is a minute or two of delay "waiting for localhost", then "Connection was reset" by the browser, and global.asax's application error trap logs:
Application_Error event: Invalid length for a Base-64 char array.
Stack Trace:
at System.Convert.FromBase64String(String s)
at System.Web.UI.ObjectStateFormatter.Deserialize(String inputString)
at System.Web.UI.Util.DeserializeWithAssert(IStateFormatter formatter, String serializedState)
at System.Web.UI.HiddenFieldPageStatePersister.Load()
In this case, it is not the SIZE of the ViewState, but something to do with page and/or ViewState caching that seems to be biting me. Setting the <pages> attributes enableEventValidation="false" and viewStateEncryption="Never" in Web.config did not change the behavior. Neither did setting maxPageStateFieldLength to something modest.
Take a look at your HttpHandlers. I've been noticing some weird and completely random errors over the past few months after I implemented a compression tool (RadCompression from Telerik). I was noticing errors like:
System.Web.HttpException: Unable to validate data.
System.Web.HttpException: The client disconnected. ---> System.Web.UI.ViewStateException: Invalid viewstate.
and
System.FormatException: Invalid length for a Base-64 char array.
System.Web.HttpException: The client disconnected. ---> System.Web.UI.ViewStateException: Invalid viewstate.
I wrote about this on my blog.
This is because of a huge ViewState. In my case I got lucky, since I was not using the ViewState: I just added EnableViewState="false" on the form tag and the ViewState went from 35k to 100 chars.
During initial testing of Membership.ValidateUser with a SqlMembershipProvider, I used a hash (SHA1) algorithm combined with a salt, and if I changed the salt length to a length not divisible by four, I received this error.
I have not tried any of the fixes above, but if the salt is being altered, this may help someone pinpoint it as the source of this particular error.
As Jon Skeet said, the string must be a multiple of 4 characters. But I was still getting the error.
At least it went away in debug mode: put a breakpoint on Convert.FromBase64String() and step through the code. Miraculously, the error disappeared for me :) It is probably related to ViewState and the similar issues others have reported.
In addition to #jalchr's solution, which helped me, I found that when calling ATL::Base64Encode from a C++ application to encode the content you pass to an ASP.NET web service, you need something else too. In addition to
sEncryptedString = sEncryptedString.Replace(' ', '+');
from #jalchr's solution, you also need to ensure that you do not use the ATL_BASE64_FLAG_NOPAD flag on ATL::Base64Encode:
BOOL bEncoded = Base64Encode(lpBuffer,
                             nBufferSizeInBytes,
                             strBase64Encoded.GetBufferSetLength(base64Length),
                             &base64Length,
                             ATL_BASE64_FLAG_NOCRLF /*|ATL_BASE64_FLAG_NOPAD*/);

Real time Prefix matching and auto-complete in Quora

How is real-time autocomplete with prefix matching implemented at Quora?
Since Solr and Sphinx don't support real-time updates, what changes were made to support real-time updating?
Looks like it's done using JavaScript and jQuery. I grabbed a few key lines from the minified script on the Quora homepage that I think support this theory:
Here's an ajax call to a resource providing JSON data:
$.ajax({type:"GET",url:this.resultsQueryPath,dataType:"json",data:a,success:this.fnbind(function(a){this.ajaxCallback(a)}),error:this.fnbind(function(a,b,c){console.log(b,c),this.requestOutstanding=!1,this.$("#results_shell").html("Could not retrieve results: "+b)})})
Note that the successful result gets put into the "a" variable. Then later, here's the autocompletion based on the keydown of the "question_box" element, which is completing from the parent of "a":
this.$ ("##item input.question_box").keydown (ƒ (b) {
if (b.keyCode==9&&!b.shiftKey)for (var c=e.getLiveDomId (a.cid),d=a.parent ().orderedVisibleChildren (),f\^M=0;f<d.length-1;++f)if (c==d [f]) {
$ (this).blur (),$ ("#"+d [f+1]+" input.question_box").focus ();return!1}
})
I think this is pretty incontrovertible, but it would still be nice to have the un-minified script to compare. For instance, I can't see where resultsQueryPath comes from (I can't locate its source; it may be intentionally obfuscated).
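That JSON endpoint's server side isn't visible, but for readers wondering what prefix matching itself looks like, here is a minimal in-memory sketch in Python (an illustration of the technique, not Quora's actual implementation; class and method names are invented):

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.titles = []  # titles whose prefix passes through this node

    class AutocompleteIndex:
        def __init__(self):
            self.root = TrieNode()

        def add(self, title):
            # Real-time update: inserting one title is cheap, no full re-index.
            node = self.root
            for ch in title.lower():
                node = node.children.setdefault(ch, TrieNode())
                if len(node.titles) < 10:  # keep only a few suggestions per prefix
                    node.titles.append(title)

        def complete(self, prefix):
            node = self.root
            for ch in prefix.lower():
                node = node.children.get(ch)
                if node is None:
                    return []
            return node.titles

    index = AutocompleteIndex()
    index.add("How is real time autocomplete implemented?")
    index.add("How do trie data structures work?")
    print(index.complete("how "))  # both titles share this prefix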

drupal_add_js() only adds the JS when no error message (D6)

In my custom form (in a custom module) drupal_add_js() only adds the JS when there is no error message.
My code goes like this:
function ntcf_redo_order_form(&$form_state = array()) {
    global $base_path, $user;
    $my_dir = drupal_get_path('module', 'ntcf_redo');
    drupal_add_js("$my_dir/order.js", 'module', 'header', FALSE, TRUE, FALSE);
    $form = array();
    ...
    return $form;
}
If the validation function uses form_set_error() to display an error message and highlight the offending field, the message is displayed and the field highlighted, but the drupal_add_js() call does nothing. Without a pending error message to display, all is well.
EDIT: this problem does not occur with drupal_set_message(), only with form_set_error().
I tried adding the 3 trailing parameters to the drupal_add_js() call to tell it not to optimize the file (i.e. not combine it with other JS files). There is no mention of the file order.js in the HTML, and it makes no difference whether I use the last 4 parameters ('header', FALSE, TRUE, FALSE) or not.
On the admin Performance page, I turned off "Optimize JavaScript files" and pretty much all caching, which also made no difference.
Extra Details:
I'm not sure if this makes a difference, but it wouldn't surprise me, so here goes:
What I'm doing here is a multi-part "wizard" form that allows the user to proceed forward and go back. Also, many of the pages use AJAX, so I need to do all the "required" field validation in the _submit function instead of letting Drupal do it automatically (since that makes a mess of AJAX). So, if a "required" field is missing, the _submit() function sets an error message, and the form generation function generates the same form again (with the additional decoration resulting from the error message).
Also: this is off-topic, but it might help someone using Google: when doing a multi-page form that allows going backward, you MUST assign a weight to every item on the form, or else the fields tend to "wander" when you go backwards.
Any ideas?
I had the same problem. This is a workaround I found (for Drupal 7; it may work in 6):
1. In your form setup (or hook_form_alter), do this:
$form['#post_render'][] = 'yourfunction';
2. Define:
function yourfunction($content, $element) {
    $my_dir = drupal_get_path('module', 'ntcf_redo');
    drupal_add_js("$my_dir/order.js", 'module', 'header', FALSE, TRUE, FALSE);
    return $content;
}
I think this works (while your approach does not) because hook_form_alter and/or hook_form do NOT get called again for a prepared/cached form, so the initial form load WILL load the JavaScript, but subsequent posts will NOT.
HTH
Mike's answer ($form['#post_render'][] = 'yourfunction';) will work, though it's not the optimal way and can cause issues with drupal_add_js.
The best way to do this is by adding your JavaScript via the Form API '#attached' property.
Instead of using drupal_add_js or a new callback on '#post_render':
$form['#attached']['js'] = array(
    drupal_get_path('module', 'module_name') . '/file/path/filename.js',
);
You may pass in a 'css' array as well. Being an array, you can pass in as many files as you want.
*This is for Drupal 7. Other versions may be different.

Best way to handle security and avoid XSS with user entered URLs

We have a high security application and we want to allow users to enter URLs that other users will see.
This introduces a high risk of XSS hacks - a user could potentially enter javascript that another user ends up executing. Since we hold sensitive data it's essential that this never happens.
What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?
Is there any advice on dealing with redirections (for instance, a "this link goes outside our site" message on a warning page before following the link)?
Is there an argument for not supporting user entered links at all?
Clarification:
Basically our users want to input:
stackoverflow.com
And have it output to another user as a clickable link:
stackoverflow.com
What I really worry about is them using this in an XSS hack. I.e. they input:
stackoverflow.com"><script>alert('hacked!');</script>
So other users get this link:
<a href="stackoverflow.com"><script>alert('hacked!');</script>
My example is just to explain the risk - I'm well aware that JavaScript and URLs are different things, but by letting them input the latter they may be able to execute the former.
You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know how to deal with links, do they also know to sanitise <iframe>, <img> and clever CSS references?
I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?
If you think URLs can't contain code, think again!
https://owasp.org/www-community/xss-filter-evasion-cheatsheet
Read that, and weep.
Here's how we do it on Stack Overflow:
/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
The process of rendering a link "safe" should go through three or four steps:
1. Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
2. Clean the link up: regexes are a good start. Make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're doing the links only as references to other information, you can also force the protocol at the end of this process: if the portion before the first colon is not 'http' or 'https', append 'http://' to the start. This allows you to create usable links from incomplete input, as a user would type into a browser, and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
3. Check that the result is a well-formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
4. Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.
If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.
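As an illustration of steps 1-3, here is a rough Python sketch (mine, not from the answer above); the allowed-scheme set, the rejected characters, and the function name are assumptions:

    from urllib.parse import urlparse, unquote

    ALLOWED_SCHEMES = {"http", "https"}

    def render_safe_link(raw):
        # Step 1: unescape repeatedly until the string stops changing,
        # so nothing stays hidden behind double encoding.
        link = raw.strip()
        while unquote(link) != link:
            link = unquote(link)
        # Step 2: throw the link away if it could close an HTML attribute,
        # then force a protocol onto scheme-less input.
        if '"' in link or "'" in link or "<" in link:
            return None
        if link.split(":", 1)[0].lower() not in ALLOWED_SCHEMES:
            link = "http://" + link
        # Step 3: check the result parses as a well-formed absolute URL.
        parts = urlparse(link)
        if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
            return None
        return link

    print(render_safe_link("stackoverflow.com"))             # http://stackoverflow.com
    print(render_safe_link('javascript:alert("hacked!")'))   # None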
Use a library, such as OWASP-ESAPI API:
PHP - http://code.google.com/p/owasp-esapi-php/
Java - http://code.google.com/p/owasp-esapi-java/
.NET - http://code.google.com/p/owasp-esapi-dotnet/
Python - http://code.google.com/p/owasp-esapi-python/
Read the following:
https://www.golemtechnologies.com/articles/prevent-xss#how-to-prevent-cross-site-scripting
https://www.owasp.org/
http://www.secbytes.com/blog/?p=253
For example:
$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$esapi = new ESAPI( "/etc/php5/esapi/ESAPI.xml" ); // Modified copy of ESAPI.xml
$sanitizer = ESAPI::getSanitizer();
$sanitized_url = $sanitizer->getSanitizedURL( "user-homepage", $url );
Another example is to use a built-in function. PHP's filter_var function is an example:
$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$sanitized_url = filter_var($url, FILTER_SANITIZE_URL);
Note that filter_var with FILTER_SANITIZE_URL only strips characters that are illegal in URLs; it still lets javascript: URLs through, so you must check the scheme yourself. Using the OWASP ESAPI Sanitizer is probably the best option.
Still another example is the code from WordPress:
http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L2561
Additionally, since there is no way of knowing where the URL links (i.e., it might be a valid URL, but the contents of the URL could be mischievous), Google has a safe browsing API you can call:
https://developers.google.com/safe-browsing/lookup_guide
Rolling your own regex for sanitization is problematic for several reasons:
Unless you are Jon Skeet, the code will have errors.
Existing APIs have many hours of review and testing behind them.
Existing URL-validation APIs consider internationalization.
Existing APIs will be kept up-to-date with emerging standards.
Other issues to consider:
What schemes do you permit (are file:/// and telnet:// acceptable)?
What restrictions do you want to place on the content of the URL (are malware URLs acceptable)?
Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
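A minimal sketch of that output-side approach in Python (illustrative; the whitelist mirrors the protocols suggested above, and the function name is invented):

    import html
    from urllib.parse import urlparse

    SAFE_PROTOCOLS = {"http", "https", "mailto"}

    def link_html(url, text):
        # Refuse anything whose scheme is not explicitly whitelisted;
        # this is what blocks javascript: links.
        if urlparse(url).scheme not in SAFE_PROTOCOLS:
            return html.escape(text)  # render plain text instead of a link
        # HTML-encode at output time so the URL cannot break out of the attribute.
        return '<a href="%s">%s</a>' % (html.escape(url, quote=True), html.escape(text))

    print(link_html("https://stackoverflow.com", "stackoverflow.com"))
    print(link_html("javascript:alert('hacked!')", "stackoverflow.com"))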
You don't specify the language of your application, so I'll presume ASP.NET; for that you can use the Microsoft Anti-Cross Site Scripting Library.
It is very easy to use: all you need is an include and that is it :)
While you're on the topic, why not give the Design Guidelines for Secure Web Applications a read?
If you're using any other language: where a library exists for ASP.NET, an equivalent is likely available for your language as well (PHP, Python, RoR, etc.).
For Pythonistas, try Scrapy's w3lib.
OWASP ESAPI pre-dates Python 2.7 and is archived on the now-defunct Google Code.
How about not displaying them as a link? Just use the text.
Combined with a warning to proceed at your own risk, that may be enough.
Addendum: see also Should I sanitize HTML markup for a hosted CMS? for a discussion on sanitizing user input.
There is a library for JavaScript that solves this problem:
https://github.com/braintree/sanitize-url
Try it =)
In my project written in JavaScript I use this regex as a whitelist:
url.match(/^((https?|ftp):\/\/|\.{0,2}\/)/)
The only limitation is that you need to put ./ in front of files in the same directory, but I think I can live with that.
Using regular expressions to prevent XSS vulnerabilities is complicated and thus hard to maintain over time, and it can leave some vulnerabilities behind. Having URL validation using regular expressions is helpful in some scenarios, but it is better not to mix it with vulnerability checks.
A solution is probably to use a combination of an encoder like AntiXssEncoder.UrlEncode for encoding the query portion of the URL and UriBuilder for the rest:
public sealed class AntiXssUrlEncoder
{
    public string EncodeUri(Uri uri, bool isEncoded = false)
    {
        // Encode the query portion of the URL to prevent XSS attacks if it
        // is not already encoded. Otherwise let UriBuilder take care of it.
        var encodedQuery = isEncoded
            ? uri.Query.TrimStart('?')
            : AntiXssEncoder.UrlEncode(uri.Query.TrimStart('?'));
        var encodedUri = new UriBuilder
        {
            Scheme = uri.Scheme,
            Host = uri.Host,
            Path = uri.AbsolutePath,
            Query = encodedQuery.Trim(),
            Fragment = uri.Fragment
        };
        if (uri.Port != 80 && uri.Port != 443)
        {
            encodedUri.Port = uri.Port;
        }
        return encodedUri.ToString();
    }

    public static string Encode(string uri)
    {
        var baseUri = new Uri(uri);
        var antiXssUrlEncoder = new AntiXssUrlEncoder();
        return antiXssUrlEncoder.EncodeUri(baseUri);
    }
}
You may need to add whitelisting to exclude some characters from encoding; that can be helpful for particular sites.
HTML-encoding the page that renders the URL is another thing you may need to consider.
BTW, please note that encoding the URL may break web parameter tampering checks, so the encoded link may appear not to work as expected.
Also, you need to be careful about double encoding.
P.S. AntiXssEncoder.UrlEncode would have been better named AntiXssEncoder.EncodeForUrl, to be more descriptive. Basically, it encodes a string for use in a URL; it does not encode a given URL and return a usable URL.
You could hex-encode the entire URL and send it to your server. That way the client would not understand the content at first glance. After reading the content, you could decode the URL and send it to the browser.
Allowing a URL and allowing JavaScript are two different things.
