The documentation at https://developers.facebook.com/docs/authentication/signed_request/ provides an example of encoded data that appears to be wrong.
<?php
// Decode the example string from the Facebook docs (note: no "=" padding).
echo base64_decode("eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsIjAiOiJwYXlsb2FkIn0");
echo "\n";
// Re-encode the same JSON; PHP appends the "=" padding character.
echo base64_encode('{"algorithm":"HMAC-SHA256","0":"payload"}');
echo "\n";
?>
Gives the output:
{"algorithm":"HMAC-SHA256","0":"payload"}
eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsIjAiOiJwYXlsb2FkIn0=
The argument to base64_decode is missing the = padding character. Running base64_encode on the same data shows that the example was not created with PHP, or the PHP version used was buggy, or something else went wrong. The same documentation page also provides a signature that is computed over the string without the = padding.
Question: Is the omission of padding a bug in Facebook's documentation, or should I expect these kinds of omissions in production code?
Other languages don't fail as "gracefully" as PHP does and will actually refuse to decode base64 data with missing padding characters, so this is somewhat important.
My shop does stuff in C#, and we went for adding the padding ourselves.
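For anyone hitting this, re-adding the padding is a one-liner; here's a minimal sketch in Node.js (the helper name is mine; note that Node's own Buffer happens to tolerate missing padding, while stricter decoders do not):

// Restore standard base64 padding: the encoded length must be a multiple of 4.
function addBase64Padding(s) {
  const remainder = s.length % 4;
  return remainder === 0 ? s : s + '='.repeat(4 - remainder);
}

const unpadded = 'eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsIjAiOiJwYXlsb2FkIn0';
const padded = addBase64Padding(unpadded);
console.log(Buffer.from(padded, 'base64').toString('utf8'));
// -> {"algorithm":"HMAC-SHA256","0":"payload"}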
Knowing Facebook, from years of developing on their platform ... well, let's just say that this little documentation hiccup doesn't surprise me at all.
I'm using
https://clifff.com/2015/10/01/2015-failed-experiments-with-aws-lambda/
together with
https://www.twilio.com/blog/2015/09/build-your-own-ivr-with-aws-lambda-amazon-api-gateway-and-twilio.html
to create an image resizing service on AWS Lambda. I solved the content-type issue the first article was stuck on, but the encoding seems like a dead end. Any help would be greatly appreciated!
Ruby:
require 'base64'
Base64.decode64("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
-> working image
GIF89a\u0001\u0000\u0001\u0000\x80\u0000\u0000\u0000\u0000\u0000\xFF\xFF\xFF!\xF9\u0004\u0001\u0000\u0000\u0000\u0000,\u0000\u0000\u0000\u0000\u0001\u0000\u0001\u0000\u0000\u0002\u0001D\u0000;
API Gateway with:
$util.base64Decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
-> broken image
GIF89a\u0001\u0000\u0001\u0000�\u0000\u0000\u0000\u0000\u0000���!�\u0004\u0001\u0000\u0000\u0000\u0000,\u0000\u0000\u0000\u0000\u0001\u0000\u0001\u0000\u0000\u0002\u0001D\u0000;
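For what it's worth, you can reproduce that corruption locally. My guess (it matches the output above) is that API GW round-trips the body through a UTF-8 string, so bytes that aren't valid UTF-8, such as 0xFF, become the U+FFFD replacement character. A Node.js sketch:

// Round-trip the GIF bytes through a UTF-8 string, as API GW appears to do.
const gif = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7', 'base64');
const mangled = Buffer.from(gif.toString('utf8'), 'utf8');
console.log(gif.length, mangled.length);
// lengths differ: invalid bytes such as 0x80 and 0xFF were each replaced
// with the three-byte UTF-8 encoding of U+FFFD (�)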
OK, this is a super old thread, but since the issue is still not resolved and the binary support in API GW is miserably documented, I thought somebody might find the workaround I found useful:
I believe that binary data is passed around as UTF-8 strings somewhere inside API GW. If you only need to return a tracking pixel (and not, e.g., a generated image), you can avoid the mangled encoding by using an image whose binary data contains no problematic bytes.
For example, the shortest tracking pixel (a 26-byte GIF) has a byte with the hexadecimal value 0xFF in the middle, and this will break API GW. But if you open the picture in a hex editor and replace that byte with 0x00, you get something that is still a valid image (even Microsoft browsers don't complain about it) and that API GW can pass through unharmed.
Just make your "Body Mapping" template look like this:
$util.base64Decode("R0lGODlhAQABAAAAACwAAAAAAQABAAACADs=")
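If you want to verify that a candidate image will survive, a quick Node.js check (my own sketch, not part of the template) is to confirm every byte is below 0x80 and therefore passes through UTF-8 string handling unchanged:

// Every byte below 0x80 is a single-byte UTF-8 character and survives
// API GW's string handling; anything higher gets mangled.
const pixel = Buffer.from('R0lGODlhAQABAAAAACwAAAAAAQABAAACADs=', 'base64');
console.log(pixel.every((b) => b < 0x80)); // true -> safe for this workaround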
Yeah ... looks like it's a known issue: https://forums.aws.amazon.com/thread.jspa?messageID=668306
Having searched around for a while now, I believe my problem may not be directly related to what others have had. I am using Unicode characters in forms (with AngularJS on the client side) and noticed that the UTF-8 strings didn't display properly in the server logs. So I decided to base64-encode all strings on the client side before submitting to the server (Node.js/Express 4). The JSON data arrives at the server intact, but when I try to convert it from base64 to UTF-8 using a Buffer, I get different symbols. I tested the strings on http://www.base64decode.org/ and they decode fine. Can anyone suggest what I might be doing wrong?
Example char: σ, base64="z4M=".
On the server, this line decodes all JSON values from base64 to UTF-8:
Object.keys(req.body).forEach(function (key) {
  // Buffer.from replaces the deprecated new Buffer(...) constructor
  req.body[key] = Buffer.from(req.body[key], 'base64').toString('utf8');
});
And the "σ" char becomes "Ο" on the server. Can anyone assist?
Thus I decided to base64.encode all strings on the client side before submitting to the server (nodejs/express4).
No need to, really. Whatever you were doing wrong with the UTF-8 JSON before is probably still going wrong now.
Try to debug that instead.
noticed that the UTF8 strings didn't display on the server logs properly.
What do they display instead?
What OS are you on?
Did you look at the logs with a hex viewer?
To me this looks like a typical "I have a problem X, thought my solution halfway through, and am now stuck on a sub-problem Y". Go back to X and attack it the right way (no base64).
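For what it's worth, the quoted server-side line is fine in isolation; "z4M=" really does decode to σ, so the corruption must happen before the request reaches Express. A quick check in Node.js:

// Sanity check: the base64 for σ decodes fine on its own.
const buf = Buffer.from('z4M=', 'base64'); // bytes 0xCF 0x83, UTF-8 for σ
console.log(buf.toString('utf8')); // σ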
Within Node.js, I am using querystring.stringify() to encode an object into a query string for usage in a URL. Values that have spaces are encoded as %20.
I'm working with a particularly finicky web service that will only accept spaces encoded as +, as used to be commonly done prior to RFC3986.
Is there a way to set an option for querystring so that it encodes spaces as +?
Currently I am simply doing a .replace() to replace all instances of %20 with +, but this is a bit tedious if there is an option I can set ahead of time.
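For concreteness, this is the replace-based workaround I'm describing, wrapped in a small helper (the helper name is just for illustration):

const querystring = require('querystring');

// Encode as usual, then swap '%20' for '+' as the legacy service expects.
function stringifyWithPlus(obj) {
  return querystring.stringify(obj).replace(/%20/g, '+');
}

console.log(stringifyWithPlus({ a: 'b c' })); // a=b+c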
If anyone is still facing this issue, the "qs" npm package has a feature to encode spaces as +:
const qs = require('qs');
qs.stringify({ a: 'b c' }, { format: 'RFC1738' }); // -> 'a=b+c'
I can't think of any library doing that by default, and unfortunately, I'd say your implementation may be the more efficient way to do this, since any other option would probably either do what you're already doing, or use slower non-compiled pure JavaScript code.
What about asking the web service provider to follow the RFC?
https://github.com/kvz/phpjs is a Node.js package that provides ports of PHP's functions. At the time of writing, its http_build_query implementation only supports urlencode-style encoding (the query string includes + instead of spaces), but hopefully it will soon include the enc_type parameter / rawurlencode (%20 for spaces).
See http://php.net/http_build_query.
RFC1738 (+'s) will be the default enc_type either way, so you can use it immediately for your purposes.
I am looking for a URL encoding method that is most efficient in terms of space. Raw binary (base2) could be represented in base16, which is smaller and URL-safe, but base64 is even more efficient. However, the usual base64 encoding isn't URL-safe...
So what is the smallest encoding method that is also safe for URLS?
This is what the Base64 URL encoding variant is for.
It uses the same standard Base64 Alphabet except that + is changed to - and / is changed to _.
Most modern Base64 implementations will support this alternate encoding. If yours doesn't, it's usually just a matter of doing a search/replace on the Base64 input prior to decoding, or on the output prior to sending it to a browser.
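For instance, recent Node.js releases (15.7+) expose the URL-safe variant directly as a Buffer encoding; a quick illustration:

const buf = Buffer.from([0xfb, 0xef, 0xff]);
console.log(buf.toString('base64'));    // ++//  (not URL-safe)
console.log(buf.toString('base64url')); // --__  (URL-safe, padding dropped)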
You can use a 62-character representation instead of the usual base 64. This will give you URLs like the YouTube ones:
http://www.youtube.com/watch?v=0JD55e5h5JM
You can use the PHP functions provided in this page if you need to map strings to a database numerical ID:
http://bsd-noobz.com/blog/how-to-create-url-shortening-service-using-simple-php
Or this one if you need to directly convert a numerical ID to a short URL string:
http://kevin.vanzonneveld.net/techblog/article/create_short_ids_with_php_like_youtube_or_tinyurl/
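If all you need is to shorten numeric IDs, the conversion itself is only a few lines; a minimal Node.js sketch (the alphabet order is arbitrary, just keep it consistent):

const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Convert a non-negative integer ID to its base-62 string representation.
function base62(n) {
  let out = '';
  do {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out;
}

console.log(base62(123456789)); // IWAuh (with this particular alphabet)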
"base66" (theoretical, according to spec)
As far as I can tell, the optimal encoding for URLs is a "base66" encoding into the following alphabet:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789-_.~
These are all the "Unreserved Characters" according to the URI specification, RFC 3986 (section 2.3), so they will appear as-is in the URL. Using this "base66" encoding could give a URL like:
https://example.org/articles/.3Ja~jkWe
The question is then whether you want . and ~ in your URLs.
On some older servers (ancient by now, I guess), ~joe would mean the "www directory" of the user joe on that server, and thus a user might be confused as to what the ~ character is doing in the middle of your URL.
This is common for academic websites, especially CS professors (e.g. Donald Knuth's website https://www-cs-faculty.stanford.edu/~knuth/)
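To make the "base66" idea concrete, here is a toy encoder in Node.js using BigInt (my own sketch; it drops leading zero bytes, so a real implementation would need a length convention):

const B66 =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~';

// Treat the input bytes as one big integer and write it out in base 66.
function base66(bytes) {
  let n = BigInt('0x00' + Buffer.from(bytes).toString('hex'));
  let out = '';
  do {
    out = B66[Number(n % 66n)] + out;
    n /= 66n;
  } while (n > 0n);
  return out;
}

console.log(base66(Buffer.from('hi'))); // GI~ (0x6869 = 26729 in base 66)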
"base80" (in practice, but not battle-tested)
However, in my own testing the following 14 other symbols also do not get percent-encoded (in Chrome 95 and Firefox 93):
!$'()*+,:;=#[]
(see also this StackOverflow answer)
leaving a "base80" URL encoding possible. Some of these (notably + and =) would not work in the query string part of the URL, only in the path part. All in all, this ends up giving you beautiful, hyper-compressed URLs like:
https://example.org/articles/1OWG,HmpkySCbBy#RG6_,
https://example.org/articles/21Cq-b6Ud)txMEW$,hc4K
https://example.org/articles/:3Tx**U9X'd;tl~rR]q+
There's a plethora of reasons why you might not want all of those symbols in your URLs. One example is that StackOverflow's own "linkifier" won't include the trailing comma in the link it generates (I've manually made it part of the link here).
Also, the percent-encoding seems to be quite finicky: in some cases Firefox would initially percent-encode ' and ~] but on later requests would not.
My ColdFusion (MX7 on IIS 6) site has search functionality which appends the search term to the URL e.g. http://www.example.com/search.cfm/searchterm.
The problem I'm running into is this is a multilingual site, so the search term may be in another language e.g. القاهرة leading to a search URL such as http://www.example.com/search.cfm/القاهرة
The problem comes when I retrieve the search term from the URL. I'm using cgi.PATH_INFO to retrieve the path of the search page plus the search term, and extracting the search term from it, e.g. /search.cfm/searchterm. However, when Unicode characters are used in the search, they are converted to question marks, e.g. /search.cfm/??????.
These appear to be actual question marks, rather than the browser failing to render the Unicode characters or the characters being mangled on output.
I can't find any information about whether ColdFusion supports Unicode in the URL, or how I can go about resolving this and getting hold of the complete URL in some way - does anyone have any ideas?
Cheers,
Tom
Edit: Further research has led me to believe the issue may be related to IIS rather than ColdFusion, but my original query still stands.
Further edit
The result of GetPageContext().GetRequest().GetRequestUrl().ToString() is http://www.example.com/search.cfm/searchterm/????? so it appears the issue goes fairly deep.
Yeah, it's not really ColdFusion's fault. It's a common problem.
It's mostly the fault of the original CGI specification, which specifies that PATH_INFO has to be %-decoded, thus losing the original %xx byte sequences that would have allowed you to work out which real characters were meant.
And it's partly IIS's fault, because it always tries to read submitted %xx bytes in the path part as UTF-8-encoded Unicode (unless the path isn't a valid UTF-8 byte sequence in which case it plumps for the Windows default code page, but gives you no way to find out this has happened). Having done so, it puts it in environment variables as a Unicode string (as envvars are Unicode under Windows).
However, most byte-based tools using the C stdio (and I'm assuming this applies to ColdFusion, as it does to Perl, Python 2, PHP, etc.) then try to read the environment variables as bytes, and the MS C runtime encodes the Unicode contents again using the Windows default code page. So any characters that don't fit in the default code page are lost for good. This would include your Arabic characters when running on a Western Windows install.
A clever script with direct access to the Win32 GetEnvironmentVariableW API could call it to retrieve a native-Unicode environment variable, which it could then encode to UTF-8 or whatever else it wanted, assuming the input was also UTF-8 (which is what you'd generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x will throw away any non-default-codepage characters before they even reach the environment variables.
Otherwise, your best bet is URL rewriting. If a layer above CF can convert search.cfm/القاهرة to search.cfm/?q=القاهرة, then you don't face the same problem, because the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes remain where a tool at CF's level can see them.
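To illustrate the difference (in Node.js rather than CF, but the principle is the same): percent-encoded bytes in a query string are lossless until something explicitly decodes them, which is exactly what PATH_INFO denies you:

// UTF-8 percent-encoding keeps the original bytes recoverable.
const raw = encodeURIComponent('القاهرة');
console.log(raw); // %D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9
console.log(decodeURIComponent(raw)); // القاهرة — a lossless round trip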
Here's what you could do:
<cfset url.searchTerm = URLEncodedFormat("القاهر", "utf-8") >
<cfset myVar = URLDecode(url.searchTerm , "utf-8") >
Of course, I'd recommend that you work with something like this in that case:
yourtemplate.cfm?searchTerm=%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1
And then you do URL rewriting in IIS (if not already done by framework/rest of the app) http://learn.iis.net/page.aspx/461/creating-rewrite-rules-for-the-url-rewrite-module/ to match your pattern.
You can set the character encoding of the URL and FORM scopes using the setEncoding() function:
http://www.adobe.com/livedocs/coldfusion/7/htmldocs/wwhelp/wwhimpl/common/html/wwhelp.htm?context=ColdFusion_Documentation&file=00000623.htm
You need to do this before you access any of the variables in this scope.
But, the default encoding of those scopes is already UTF-8, so this may not help. Also, this would probably not affect the CGI scope.
Is the IIS Server logging the correct characters into the request log?