How do I validate base64 images in NodeJS/Express? - node.js

Let's say I am creating a REST API with Node/Express and data is exchanged between the client and server via JSON.
A user is filling out a registration form and one of the fields is an image input to upload a profile image. Images cannot be sent through JSON and therefore must be converted to a base64 string.
How do I validate that this is indeed a base64 string of an image on the server side? Or is it best practice not to send the profile image as base64 at all?

You could start by checking whether the string is a base64-encoded image with a proper MIME type.
I found this library on the npm registry that does exactly that (not tested).
const isBase64 = require('is-base64');
let base64str_img = 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAoHBwgHBgoIC...ljA5GC68sN8AoXT/AF7fw7//2Q==';
console.log(isBase64(base64str_img, { mime: true })); // true
Then you can verify whether the MIME type is allowed within your app, or perform other checks, such as trying to load the image and catching any errors.
Anyway, if you want to be really sure about user input, you have to validate it yourself in the first place. That is the best practice you should care about.
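For example, here is a minimal sketch of such a check, assuming the is-base64 usage shown above and a made-up allow-list of MIME types:

const isBase64 = require('is-base64');

// Hypothetical allow-list -- adjust it to whatever your app accepts
const ALLOWED_MIME_TYPES = ['image/jpeg', 'image/png'];

function isAllowedProfileImage(dataUri) {
  // Reject anything that is not a base64 data URI with a MIME prefix
  if (!isBase64(dataUri, { mime: true })) {
    return false;
  }
  // Extract the MIME type from the "data:<mime>;base64,..." prefix
  const match = /^data:([^;]+);base64,/.exec(dataUri);
  return match !== null && ALLOWED_MIME_TYPES.includes(match[1]);
}

Note that this only checks the declared MIME type; it does not prove the payload really decodes to an image, which is what the jimp answer below addresses.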

The Base64 value is a valid image only if its decoded data has a correct image MIME type, and the width and height are greater than zero. A handy way to check all of that is to install the jimp package and use it as follows:
var b64 = 'R0lGODdhAQADAPABAP////8AACwAAAAAAQADAAACAgxQADs=',
    buf = Buffer.from(b64, 'base64');

require('jimp').read(buf).then(function (img) {
  if (img.bitmap.width > 0 && img.bitmap.height > 0) {
    console.log('Valid image');
  } else {
    console.log('Invalid image');
  }
}).catch(function (err) {
  console.log(err);
});

I wanted to do something similar, and ended up Googling it and found nothing, so I made my own base64 validator:
function isBase64(text) {
  let utf8 = Buffer.from(text, "base64").toString("utf8");
  return !(/[^\x00-\x7f]/.test(utf8));
}
This isn't great, because I used it for a different purpose, but you may be able to build on it. Here is an example using atob to reject invalid base64 chars (they are ignored otherwise):
function isBase64(text) {
  try {
    let utf8 = atob(text);
    return !(/[^\x00-\x7f]/.test(utf8));
  } catch (_) {
    return false;
  }
}
Now, about how it works:
Buffer.from(text, "base64") removes all invalid base64 chars from the string and converts it to a buffer; toString("utf8") then converts the buffer back to a string. atob does something similar, but instead of removing the invalid chars, it throws an error when it encounters one (hence the try...catch).
!(/[^\x00-\x7f]/.test(utf8)) will return true if all the chars from the decoded string belong in the ASCII charset, otherwise it will return false. This can be altered to use a smaller charset, for example, [^\x30-\x39\x41-\x5a\x61-\x7a] will only return true if all the characters are alphanumeric.
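For instance, here is a minimal sketch of that stricter variant (the function name is made up), which only accepts input that decodes to alphanumeric text:

function isBase64Alnum(text) {
  try {
    let decoded = atob(text);
    // true only if every decoded character is 0-9, A-Z or a-z
    return !(/[^\x30-\x39\x41-\x5a\x61-\x7a]/.test(decoded));
  } catch (_) {
    return false;
  }
}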

Related

nodeJS: convert response.body in utf-8 (from windows-1251 encoding)

I'm trying to convert an HTML body encoded in windows-1251 into utf-8, but I still get messed-up characters in the HTML.
They are basically Russian letters, but I can't get them to be shown properly. I get ??????? ?? ???
const GOT = require('got') // https://www.npmjs.com/package/got
const WIN1251 = require('windows-1251') // https://www.npmjs.com/package/windows-1251

async function query() {
  var body = Buffer.from(await GOT('https://example.net/', {resolveBodyOnly: true}), 'binary')
  var html = WIN1251.decode(body.toString('utf8'))
  console.log(html)
}

query()
You’re doing a lot of silly encoding back-and-forth here. And the ‘backs’ don’t even match the ‘forths’.
First, you use the got library to download a webpage; by default, got will dutifully decode the response text as UTF-8. You stuff the returned Unicode string into a Buffer with the binary encoding, which throws away the higher octet of each UTF-16 code unit of the Unicode string. Then you use .toString('utf8'), which interprets this mutilated string as UTF-8 (in actuality, it is most likely not valid UTF-8 at all). Then you pass the ‘UTF-8’ string to the windows-1251 package, to decode it as a ‘code page 1251’ string. Nothing good can possibly come from all this confusion.
The windows-1251 package you want to use takes so-called ‘binary’ (pseudo-Latin-1) strings as input. What you should do instead is take the binary response, interpret it as a Latin-1/‘binary’ string, and then pass it to the windows-1251 library for decoding.
In other words, use this:
const GOT = require('got');
const WIN1251 = require('windows-1251');

async function query() {
  const body = await GOT('https://example.net/', {
    resolveBodyOnly: true,
    responseType: 'buffer'
  });
  const html = WIN1251.decode(body.toString('binary'));
  console.log(html);
}

query();

AWS Lambda base64 encoded form data with image in Node

So, I'm trying to pass an image to a Node Lambda through API Gateway and this is automatically base64 encoded. This is fine, and my form data all comes out correct, except somehow my image is being corrupted, and I'm not sure how to decode this properly to avoid this. Here is the relevant part of my code:
const multipart = require('aws-lambda-multipart-parser');

exports.handler = async (event) => {
  console.log({ event });
  const buff = Buffer.from(event.body, 'base64');
  // using utf-8 appears to lose some of the data
  const decodedEventBody = buff.toString('ascii');
  const decodedEvent = { ...event, body: decodedEventBody };
  const jsonEvent = multipart.parse(decodedEvent, false);
  const asset = Buffer.from(jsonEvent.file.content, 'ascii');
}
First off, it would be good to know whether aws-sdk has a way of parsing the multipart form data rather than using this unsupported third-party code. Next, the value of asset ends up as a buffer that's exactly the same size as the original file, but some of the byte values are off. My assumption is that the way this is being encoded vs. decoded is slightly different, and maybe some of the characters are being interpreted differently.
Just an update in case anybody else runs into a similar problem: I updated 'ascii' to 'latin1' in both places and then it started working fine.
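For reference, a minimal sketch of the corrected handler (same aws-lambda-multipart-parser setup and file field as in the question); 'latin1' maps every byte value 0-255 to a character one-to-one, whereas 'ascii' loses the high bit:

const multipart = require('aws-lambda-multipart-parser');

exports.handler = async (event) => {
  const buff = Buffer.from(event.body, 'base64');
  // 'latin1' preserves all 256 byte values; 'ascii' strips the high bit
  const decodedEventBody = buff.toString('latin1');
  const decodedEvent = { ...event, body: decodedEventBody };
  const jsonEvent = multipart.parse(decodedEvent, false);
  const asset = Buffer.from(jsonEvent.file.content, 'latin1');
  // ... use asset (upload, store, etc.) ...
};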

iconv-lite not decoding everything properly, even though I'm using proper decoding

I'm using this piece of code to download a webpage (using request library) and decode everything (using iconv-lite library). The loader function is for finding some elements from the body of the website, then returning them as a JavaScript object.
request.get({url: url, encoding: null}, function(error, response, body) {
  // if webpage exists, process it, otherwise throw 'not found' error
  if (response.statusCode === 200) {
    body = iconv.decode(body, "iso-8859-1");
    const $ = cheerio.load(body);
    async function show() {
      var data = await loader.getDay($, date, html_tags, thumbs, res, image_thumbnail_size);
      res.send(JSON.stringify(data));
    }
    show();
  } else {
    res.status(404);
    res.send(JSON.stringify({"error":"No content for this date."}))
  }
});
The pages are encoded in ISO-8859-1, and the content looks normal; there are no bad chars. When I wasn't using iconv-lite, some characters, e.g. ü, looked like this: �. Now, when I'm using the library as in the code provided above, most of the chars look good, but some, e.g. š, show up as an empty box, even though they're displayed without any problems on the website.
I'm sure it's not a cheerio issue, because when I printed the output using res.send(body); or res.send(JSON.stringify({"body":body}));, the empty box character was still present. Maybe it's a problem with Express? Is there a way to fix that?
EDIT:
I copied the empty box character into Google, and it changed to š; maybe that's important.
Also, I tried to change the output encoding of Express using res.charset, but that didn't help.
I used this website: https://validator.w3.org/nu/?doc=https%3A%2F%2Fapod.nasa.gov%2Fapod%2Fap170813.html to check whether the page I'm scraping really has ISO-8859-1 encoding; it turned out that it uses Windows-1252. I changed the encoding in my API (var encoding = 'windows-1252') and it works well now.
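In other words, the only change needed in the snippet above is the charset passed to iconv-lite (the encoding variable mirrors the edit described above):

// Windows-1252 assigns printable characters (such as š) to byte values
// that ISO-8859-1 reserves for control codes, which is why those chars broke
var encoding = "windows-1252";
body = iconv.decode(body, encoding);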

Vanilla node.js response encoding - weird behaviour (heroku)

I'm developing my understanding of servers by writing a webdev framework using vanilla node.js. For the first time, a situation arose where a French character was included in a JSON response from the server, and this character showed up as an unrecognized symbol (a question mark within a diamond in Chrome).
The problem was the encoding, which was being specified here:
/*
  At this stage we have access to "response", which is an object of the
  following format:

  response = {
    data: 'A string which contains the data to send',
    encoding: 'The encoding type of the data, e.g. "text/html", "text/json"'
  }
*/
var encoding = response.encoding;
var length = response.data.length;
var data = response.data;

res.writeHead(200, {
  'Content-Type': encoding,
  'Content-Length': length
});

res.end(data, 'binary'); // Everything is encoded as binary
The problem was that everything sent by the server is encoded as binary, which ruins the ability to display certain characters. The fix seemed simple: include a boolean binary value in response, and adjust the 2nd parameter of res.end accordingly:
/*
  At this stage we have access to "response", which is an object of the
  following format:

  response = {
    data: 'A string which contains the data to send',
    encoding: 'The encoding type of the data, e.g. "text/html", "text/json"',
    binary: 'a boolean value determining the transfer encoding'
  }
*/
...
var binary = response.binary;
...
res.end(data, binary ? 'binary' : 'utf8'); // Encode responses appropriately
Here is where I have produced some very, very strange behavior. This modification causes French characters to appear correctly, but occasionally causes the last character of a response to be omitted on the client side!
This bug only happens once I host my application on heroku. Locally, the last character is never missing.
I noticed this bug because certain responses (not all of them!) now break the JSON.parse call on the client-side, although they are only missing the final } character.
I have a horrible band-aid solution right now, which works:
var length = response.data.length + 1;
var data = response.data + ' ';
I am simply appending a space to every single response sent by the server. This actually causes all text/html, text/css, text/json, and application/javascript responses to work because they can tolerate the unnecessary whitespace, but I hate this solution and it will break other Content-Types!
My question is: can anyone give me some insight into this problem?
If you're going to explicitly set a Content-Length, you should always use Buffer.byteLength() on the body to calculate the length, since that method returns the actual number of bytes in the string and not the number of characters like the string .length property will return.
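Applied to the snippet above, a minimal sketch of that fix follows; with the length computed from .length, any multi-byte character makes the declared Content-Length smaller than the actual body, so the client cuts off the tail of the response.

var data = response.data;
var encoding = response.encoding;

res.writeHead(200, {
  'Content-Type': encoding,
  // byteLength counts bytes in the encoded body, e.g. 'é' is 1 character but 2 bytes in UTF-8
  'Content-Length': Buffer.byteLength(data, 'utf8')
});

res.end(data, 'utf8');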

Node.js encoding UTF-8 issue

I have been facing an encoding/decoding issue with the Node.js Express framework.
Brief background: I store a PDF file in a MySQL database in a LONGBLOB column with the latin1 charset. From the server side, I need to send the binary data in UTF-8 encoding, as my client only knows how to decode UTF-8.
I tried all the possible solutions available on Google.
For example:
new Buffer(mySqlData).toString('utf8');
I already tried the "utf8" module with its utf8.encode(mySqlData) function, but it is not working.
I also already tried "base64" encoding and retrieving the data at the client with base64 decoding. It works just fine, but I need to have UTF-8 encoding. Also, as you know, base64 certainly increases the size.
Please help, guys.
OK, your problem is the conversion from Latin-1 to UTF-8. If you just call buffer.toString('utf-8'), the Latin-1 encoded characters come out wrong.
To convert another charset to UTF-8, the simple way is to use iconv and icu-charset-detector. With those, you can convert to UTF-8 from almost any charset (with a few exceptions).
This is an example of conversion using streams. The resulting stream is encoded in UTF-8:
var charsetDetector = require("node-icu-charset-detector"),
    Iconv = require('iconv').Iconv,
    Stream = require('stream');

function convertToUtf8(source, callback) {
  var iconv,
      charsetTestStream = new Stream.PassThrough(),
      newResStream = new Stream.PassThrough();

  source.pipe(charsetTestStream);
  source.pipe(newResStream);

  charsetDetector.detectCharsetStream(charsetTestStream, function (charset) {
    if (!iconv && charset && !/utf-*8/i.test(charset.toString())) {
      try {
        iconv = new Iconv(charset, 'utf-8');
        console.log('Converting from charset %s to utf-8', charset);
        iconv.on('error', function (err) {
          callback(err);
        });
        var convertStream = newResStream.pipe(iconv);
        callback(null, convertStream);
      } catch (err) {
        callback(err);
      }
      return;
    }
    callback(null, newResStream);
  });
}
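A hypothetical usage sketch, assuming the LONGBLOB from MySQL is available as mySqlData and res is the Express response (both taken from the question; the stream wrapping is illustrative):

var Stream = require('stream');

// Wrap the buffer coming from MySQL in a readable stream
var source = new Stream.PassThrough();
source.end(mySqlData);

convertToUtf8(source, function (err, utf8Stream) {
  if (err) {
    return res.status(500).send(err.message);
  }
  res.set('Content-Type', 'application/pdf; charset=utf-8');
  utf8Stream.pipe(res); // stream the converted data to the client
});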
