Aws lambda messes up my encoding

Aws lambda messes up my encoding - node.js

I'm using
https://clifff.com/2015/10/01/2015-failed-experiments-with-aws-lambda/
+
https://www.twilio.com/blog/2015/09/build-your-own-ivr-with-aws-lambda-amazon-api-gateway-and-twilio.html
to create an image resizing service on aws lambda ... I solved the content-type issue the first article was stuck at, but encoding seems like a dead end ... any help would be greatly appreciated!
ruby
Base64.decode64("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
-> working image
GIF89a\u0001\u0000\u0001\u0000\x80\u0000\u0000\u0000\u0000\u0000\xFF\xFF\xFF!\xF9\u0004\u0001\u0000\u0000\u0000\u0000,\u0000\u0000\u0000\u0000\u0001\u0000\u0001\u0000\u0000\u0002\u0001D\u0000;
api gateway with
$util.base64Decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
-> broken image
GIF89a\u0001\u0000\u0001\u0000�\u0000\u0000\u0000\u0000\u0000���!�\u0004\u0001\u0000\u0000\u0000\u0000,\u0000\u0000\u0000\u0000\u0001\u0000\u0001\u0000\u0000\u0002\u0001D\u0000;

OK, this is super old thread, but since the issue is still not resolved, and the binary support for API GW is miserably documented, I thought that somebody might find the workaround I found useful:
I believe, that binary data is passed around as UTF-8 strings somewhere inside the API GW. If you only need to return a tracking pixel (and not e.g. generated image), then you can avoid the problem with messed encoding by using an image that does not have problematic bytes in its binary data.
For example, the shortest tracking pixel (26 byte long GIF) has a byte with hexadecimal representation 0xFF in the middle. This will break the API GW. But if you edit this picture in hex editor and replace the byte with 0x00, you'll get something that is still a valid image (and EVEN Microsoft browsers don't complain about it), but can still be processed by API GW.
Just make your "Body Mapping" template look like this:
$util.base64Decode("R0lGODlhAQABAAAAACwAAAAAAQABAAACADs=")

Yeah ... looks like it's a know issue https://forums.aws.amazon.com/thread.jspa?messageID=668306&#668306

Related

Decoding base64 while using GitHub API to Download a File

I am using the GitHub API to download a file from GitHub. I have been able to successfully authenticate as well as get a response from github, and see a base64 encoded string representing the file contents.
Unfortunately, I get an unusual error (string length is not a multiple of 4) when decoding the base64 string.
The HTTP request is illustrated below:
GET /repos/:owner/:repo/contents/:path
The (partial) response is illustrated below:
{
"name":....,
"download_url":...",
"type":"file",
"content":"ewogICAgInN3YWdnZXIiOiAiM...
}
The issue I am encountering is that the length of the string is 15263 bytes, and I get an error in decoding the string (string length is not a multiple of 4). I am using node.js and the 'base64-js' npm module to decode the string. Code to execute the decoding is illustrated below:
var base64 = require('base64-js');
var contents = base64.toByteArray(fileContent);
The decoding causes an exception:
Error: Invalid string. Length must be a multiple of 4
at placeHoldersCount (.../node_modules/base64-js/index.js:23:11)
at Object.toByteArray (...node_modules/base64-js/index.js:42:18)
:
:
I would think that the GitHub API is sending me the correct data, so I figure that is not the issue.
Am I performing the decoding improperly or is there another problem I am overlooking?
Any help is appreciated.

I experimented a bit and found a solution by using a different base64 decoding library as follows:
var base64 = require('js-base64').Base64;
var contents = base64.decode(res.content);
I am not sure if it is mandatory to have an encoded string length divisible by 4 (clearly my 15263 character length string is not divisible by 4) but the alternate library decoded the string properly.
A second solution which I also found to work is specific to how to use the GitHub API. By adding the following to the GitHub API call header, I was also able to get the decoded file contents:
'accept': 'application/vnd.github.VERSION.raw'

After much experimenting, I think I nailed down the difference between the working and broken base64 decoding.
It appears GitHub Base-64 encodes with:
UTF-8 charset
Base 64 MIME encoder (RFC2045)
As opposed to a "basic" (RFC4648) Base64 encoder. Several languages seem to default to the basic encoder (including Java, which I was using). When I switched to a MIME encoder, I got the full contents of the file un-garbled. This would explain why switching libraries in some cases fixed the issue.
I will note the contents field contained newline characters - decoders are supposed to ignore them, but not all do, so if you still get errors, you may need to try removing them.
The media-type header will do the job better, however in my case I am trying to use the API via a GitHub App - at time of writing, GitHub requires a specific media type be used when doing that, and it returns the JSON response.

For some reason the Github APIs base64 encoded content doesn't decode properly at all the online base64 decoders I've tried from the front page of google.
Python works however:
import base64
base64.b64decode("ewogICAgInN3YWdnZXIiOiAiM...")

Google Protocol Buffers (protobuf) in Python3 - trouble with ParseFromString (encoding?)

I've got Google Protocol buffers 80% working in Python3. My .proto file works, I'm encoding data, life is almost good.
The problem is that I can't ParseFromString the result of SerializeToString.
When I print SerializeToString it looks like what I'd expect, a fairly compact binary representation (preceded by b').
My guess is that perhaps this is a difference in the way Python2 and Python3 handle strings. The putput of SerializeToString is Bytes, not a string.
Printed output of SerializeToString (Python type is ):
b'\x10\xd7\xeb\x8e\xcd\x04\x1a\x0cnamegoeshere2#\x08\x80\xf8\xde\xc3\x9f\xb0\x81\x89\x14\x11\x00\x00\x00\x00\x00\x80d\xc0\x19\x00\x00\x00\x00\x00\xc0m#!\x00\x00\x00\x00\x00\x80R\xc0)\x00\x00\x00\x00\x00x\xb7\xc01\x00\x00\x00\x00\x00\x8c\x95#9\x00\x00\x00\x00\x00\x16\xb2#'
result of ParseFromString(message):
None
No error is provided...
So - my best guess is that all I need to do is .decode() the bytes object generated, the problem is that I have no clue what the encoding is. I've tried UTF-8, -16, Latin-1, and a few others without success. My Google-Fu is strong but I haven't found anything on this.
Any help would be appreciated.

ParseFromString is a method -- it does not return anything, but rather fills in self with the parsed content. Use it like:
message = MyMessageType()
message.ParseFromString(data)
print(message.some_field)

Node.js wrong UTF8 string representation, even though byte-codes seem correct

Having searched around for a while now, I believe my problem may not be directly related to what others had. I am using unicode chars in forms (using angularjs for client-side) and noticed that the UTF8 strings didn't display on the server logs properly. Thus I decided to base64.encode all strings on the client side before submitting to the server (nodejs/express4). The JSON data arrives properly to the server, but when I try to convert it from base64 to UTF8 using a buffer I'm getting different symbols. I tested the strings on http://www.base64decode.org/ and they decode fine. Can anyone suggest what I might be doing wrong?
Example char: σ, base64="z4M=".
On the server this line decodes all JSON values to UTF8:
Object.keys(req.body).forEach(function(key) { req.body[key] = new Buffer(req.body[key], 'base64').toString('utf8'); });
And the "σ" char becomes "Ο" on the server. Anyone can assist?

Thus I decided to base64.encode all strings on the client side before submitting to the server (nodejs/express4).
No need to, really. Probably the thing you were doing wrong with utf-8 json is also wrong now.
Try to debug that.
noticed that the UTF8 strings didn't display on the server logs properly.
What do they display?
And on what OS are you?
Did you look at the logs with a hex viewer?
To me this looks like a typical "I have an a problem X, thought my solution half the way, but I am stuck with a sub-problem Y". Go back to X and attack it the right way (no base64).

artist working with base 64

Hi Im fairly new to base64 code and could do with some info/help etc. I have recently used it while converting my photos into code as part of an experiment regarding my sculptures, I converted a photo into 152 pages of base 64 using an online converter, just to get started. I was wondering how I can reverse this, and also if I edited out some code would the image change or would it not convert at all!? Like I said I am interested in using this to help my experiments. thanks for any info .

Base64 decode functions will decode it. If you remove some of the base64 code it may not translate back into a valid image or translate into valid base64 code at all depending on how much you remove. Depending on how the image was originally compressed, it MAY give you something, but the results will be incredibly unpredictable... So you'll never know what the result of the change will be on the final product. Part of the base64 standard involves a padded width (which is why some strings end in up to 2 = signs).

What is the smallest URL friendly encoding?

I am looking for a url encoding method that is most efficient in terms of space. Raw binary (base2) could be represented in base16 which is smaller and is url safe, but base64 is even more efficient. However, the usual base64 encoding isn't url safe....
So what is the smallest encoding method that is also safe for URLS?

This is what the Base64 URL encoding variant is for.
It uses the same standard Base64 Alphabet except that + is changed to - and / is changed to _.
Most modern Base64 implementations will support this alternate encoding. If yours doesn't, it's usually just a matter of doing a search/replace on the Base64 input prior to decoding, or on the output prior to sending it to a browser.

You can use a 62 character representation instead of the usual base 64. This will give you URLs like the youtube ones:
http://www.youtube.com/watch?v=0JD55e5h5JM
You can use the PHP functions provided in this page if you need to map strings to a database numerical ID:
http://bsd-noobz.com/blog/how-to-create-url-shortening-service-using-simple-php
Or this one if you need to directly convert a numerical ID to a short URL string:
http://kevin.vanzonneveld.net/techblog/article/create_short_ids_with_php_like_youtube_or_tinyurl/

"base66" (theoretical, according to spec)
As far as I can tell, the optimal encoding for URLs is a "base66" encoding into the following alphabet:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789-_.~
These are all the "Unreserved characters" according the URI specification RFC 3986 (section 2.3), so they will appear as-is in the URL. Using this "base66" encoding could give a URL like:
https://example.org/articles/.3Ja~jkWe
The question is then if you want . and ~ in your URLs?
On some older servers (ancient by now, I guess) ~joe would mean the "www directory" of the user joe on this server. And thus a user might be confused as to what the ~ character is doing in the middle of your URL.
This is common for academic websites, especially CS professors (e.g. Donald Knuth's website https://www-cs-faculty.stanford.edu/~knuth/)
"base80" (in practice, but not battle-tested)
However, in my own testing the following 14 other symbols also do not get
percent-encoded (in Chrome 95 and Firefox 93):
!$'()*+,:;=#[]
(see also this StackOverflow answer)
leaving a "base80" URL encoding possible. Some of these (notably + and =) would not work in the query string part of the URL, only in the path part. All in all, this ends up giving you beautiful, hyper-compressed URLs like:
https://example.org/articles/1OWG,HmpkySCbBy#RG6_,
https://example.org/articles/21Cq-b6Ud)txMEW$,hc4K
https://example.org/articles/:3Tx**U9X'd;tl~rR]q+
There's a plethora of reasons why you might not want all of those symbols in your URLs. One example is that StackOverflow's own "linkifier" won't include that ending comma in the link it generates (I've manually made it a part of the link here).
Also the percent encoding seems to be quite finicky. In some cases Firefox would initially percent-encode ' and ~] but on later requests would not.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Aws lambda messes up my encoding - node.js

Yeah ... looks like it's a know issue https://forums.aws.amazon.com/thread.jspa?messageID=668306&#668306

Related

Decoding base64 while using GitHub API to Download a File

Google Protocol Buffers (protobuf) in Python3 - trouble with ParseFromString (encoding?)

Node.js wrong UTF8 string representation, even though byte-codes seem correct

artist working with base 64

What is the smallest URL friendly encoding?

Categories

Resources