I have a base64 string whose decoded bytes are gzipped UTF-8 text, so I need to base64-decode it and then gunzip the result.
What I'm doing is:
var zlib = require('zlib');
var data = 'base64/gzipped data';
var buffer = Buffer.from(data, 'base64');
zlib.gunzip(buffer, function (err, dezipped) {
  if (err) throw err;
  console.log(dezipped.toString());
});
This all runs fine, but the UTF-8 characters come out wrong. I could call .toString('utf8') on the base64 buffer, but gunzip does not seem to accept a string.
Anyone have any suggestions on a good way to do this?
EDIT:
This is essentially what I'm trying to do in Node.js; it works in PHP as:
gzinflate(substr(base64_decode($base64),10));
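For reference, here is a minimal sketch of the base64 → gunzip → UTF-8 pipeline using only Node's built-in zlib module. The helper name decodeGzippedBase64 and the sample text are my own, and the input is generated in the snippet so it can be run as-is. Node's gunzip parses the gzip header itself, so the substr(…, 10) step from the PHP version should not be needed here.

```javascript
const zlib = require('zlib');

// Build a sample input: gzip some UTF-8 text, then base64-encode the result.
const original = 'Manutenção';
const base64 = zlib.gzipSync(Buffer.from(original, 'utf8')).toString('base64');

// Hypothetical helper: base64 string -> gzipped bytes -> UTF-8 text.
function decodeGzippedBase64(b64) {
  const compressed = Buffer.from(b64, 'base64'); // decode base64 to raw bytes
  const inflated = zlib.gunzipSync(compressed);  // gunzip works on the Buffer
  return inflated.toString('utf8');              // interpret the bytes as UTF-8
}

console.log(decodeGzippedBase64(base64));
```

If this round trip gives correct accents but the real data does not, the bytes inside the gzip stream are probably not UTF-8 in the first place.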
I'm using Firebase real-time database for my app. Daily backup is enabled for the database.
The database contains data with accents in words such as "Manutenção".
If I check this text in the Firebase console it is shown as "Manutenção".
If I export the data from the Firebase console it is shown as "Manutenção".
But if I download the backup file (.gzip) and extract it, the text is shown as "Manuten√ß√£o". Note the encoding of the accents, which matches UTF-8 bytes read as MacRoman according to https://string-functions.com/encodingtable.aspx?encoding=65001&decoding=10000
Why does the .gzip backup file encode the accents?
How to decode these encoded accents programmatically?
I tried to use the node module iconv but was not able to convert it.
var Iconv = require('iconv').Iconv;
var iconv = new Iconv('macintosh', 'UTF-8');
var buffer = iconv.convert('Manutenção');
console.log(buffer.toString()); // Manutenção
How can I get back "Manutenção" from "Manuten√ß√£o"?
Thanks!
Judging by these threads, it seems to be an issue with macOS:
How can I convert encoding of special characters in python?
How to decode these characters? á é í
Solution
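const iconv = require('iconv-lite');
// Wrapped in a function (the name is illustrative) so the return statements are valid
function fixMacRoman(data) {
  // '¬' and '√' are typical leftovers of UTF-8 bytes read as MacRoman
  const isMacRomanEncoded = data.includes('¬') || data.includes('√');
  if (isMacRomanEncoded) {
    // MacRoman encoded: recover the original bytes, then decode them as UTF-8
    const buffer = iconv.encode(data, 'MacRoman');
    return iconv.decode(buffer, 'utf-8');
  }
  // Not MacRoman encoded, return the original
  return data;
}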
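To see why encoding as MacRoman and then decoding as UTF-8 undoes the damage, here is a small dependency-free sketch. It uses a deliberately partial MacRoman table (only the four characters needed for this demo), and the names MAC_ROMAN_BYTES and macRomanToBytes are hypothetical; real code should keep using iconv-lite as above.

```javascript
// Partial MacRoman table: just enough for this demo, not a full codec.
const MAC_ROMAN_BYTES = { '¬': 0xc2, '√': 0xc3, '£': 0xa3, 'ß': 0xa7 };

// Map each character back to the single byte MacRoman would use for it.
function macRomanToBytes(text) {
  const bytes = [];
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code < 0x80) {
      bytes.push(code); // ASCII is identical in MacRoman
    } else if (ch in MAC_ROMAN_BYTES) {
      bytes.push(MAC_ROMAN_BYTES[ch]);
    } else {
      throw new Error('Character not in demo table: ' + ch);
    }
  }
  return Buffer.from(bytes);
}

const garbled = 'Manuten√ß√£o';
// The recovered bytes are the original UTF-8 sequence, so decode them as UTF-8.
const repaired = macRomanToBytes(garbled).toString('utf8');
console.log(repaired);
```

The garbled text is simply the original UTF-8 bytes shown through the wrong table, so mapping the characters back to bytes loses nothing.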
I'm using Iconv in Node.js to convert scraped HTML (via request with binary encoding) from SHIFT_JIS to UTF-8:
request({url: url, encoding: 'binary'}, function (error, res, html) {
  var iconv = new Iconv('SHIFT_JIS', 'UTF-8//TRANSLIT//IGNORE');
  var converted = iconv.convert(Buffer.from(html, 'binary')).toString('utf8');
});
The conversion I'm getting back looks like:
é«SnÌ\r\núêXj[J[ÍAVvÉÈ調ȫ³É\r\nå«ÈCpNgð^
While the pre-conversion looks like: ���[�J�b�g����X�j�[�J�[
I tried using encoding:null in the request, but that didn't work either.
The conversion actually works as posted above. The problem was in how I handled the final response outside the request function.
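For anyone checking the same pipeline: the 'binary' encoding maps each byte to exactly one character, so converting the string back with the same encoding should recover the original bytes before Iconv sees them. A minimal sketch with Node built-ins only (the sample bytes are arbitrary stand-ins for a Shift_JIS response body):

```javascript
// Arbitrary sample bytes standing in for a Shift_JIS response body.
const rawBytes = Buffer.from([0x83, 0x58, 0x83, 0x6a, 0x0d, 0x0a]);

// A 'binary' string holds one character per byte.
const binaryString = rawBytes.toString('binary');

// Converting back with the same encoding recovers the bytes unchanged.
const recovered = Buffer.from(binaryString, 'binary');
console.log(recovered.equals(rawBytes));

// Going through UTF-8 instead replaces invalid bytes, so the round trip is lossy.
const viaUtf8 = Buffer.from(rawBytes.toString('utf8'), 'utf8');
console.log(viaUtf8.equals(rawBytes));
```

This is why the bytes must never pass through a UTF-8 string before the charset conversion.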
I used the code below to encode a file to base64.
var bitmap = fs.readFileSync(file);
return new Buffer(bitmap).toString('base64');
I figured out that the file has issues with the “” and ‘’ characters, but it's fine with "
When the file contains It’s, node encodes the characters, but when I decode, I see it as
It’s
Here's the javascript I'm using to decode:
fs.writeFile(reportPath, body.buffer, {encoding: 'base64'})
So, once the file is encoded and decoded, it becomes unusable with these funky characters - It’s
Can anyone shed some light on this?
This should work.
Sample script:
const fs = require('fs')
const filepath = './testfile'
//write "it's" into the file
fs.writeFileSync(filepath,"it's")
//read the file
const file_buffer = fs.readFileSync(filepath);
//encode contents into base64
const contents_in_base64 = file_buffer.toString('base64');
//write into a new file, specifying base64 as the encoding (decodes)
fs.writeFileSync('./fileB64',contents_in_base64,{encoding:'base64'})
//file fileB64 should now contain "it's"
I suspect your original file is not UTF-8 encoded. Looking at your decoding code:
fs.writeFile(reportPath, body.buffer, {encoding: 'base64'})
I am guessing your content comes from an HTTP request of some sort, so it is possible that the content is not UTF-8 encoded. Take a look at this:
https://www.w3.org/International/articles/http-charset/index (if no charset is specified, a text/* Content-Type is treated as ISO-8859-1).
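A quick way to check that base64 itself is not the culprit is to round-trip the bytes and compare them. The sketch below uses my own sample string with a typographic apostrophe. The bytes survive base64 unchanged, and the text only breaks when those bytes are decoded with the wrong charset.

```javascript
// "It’s" with a typographic apostrophe (U+2019), written as an escape.
const text = 'It\u2019s';
const original = Buffer.from(text, 'utf8');

// Encode to base64 and decode again.
const roundTripped = Buffer.from(original.toString('base64'), 'base64');

// base64 preserves the bytes exactly.
console.log(roundTripped.equals(original));

// Decoding the bytes as UTF-8 gives the original text back.
console.log(roundTripped.toString('utf8') === text);

// Decoding the same bytes with a single-byte charset does not.
console.log(roundTripped.toString('latin1') === text);
```

If the first check passes on your data but the text still looks wrong, look at the charset used when the decoded bytes are read or displayed.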
Here is the code that helped.
var bitmap = fs.readFileSync(file);
// Remove the typographic quote characters
var tmp = bitmap.toString().replace(/[“”‘’]/g, '');
// Create a buffer from the string and return the result as base64
return Buffer.from(tmp).toString('base64');
You can provide base64 encoding to the readFileSync function itself.
const fileDataBase64 = fs.readFileSync(filePath, 'base64')
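As a sanity check, passing 'base64' to readFileSync should give the same string as reading a Buffer and calling toString('base64') on it. A small sketch that writes a temporary file (path and contents are made up for the demo) and compares both results:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway file containing a typographic apostrophe.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'b64-'));
const filePath = path.join(dir, 'sample.txt');
fs.writeFileSync(filePath, 'It\u2019s', 'utf8');

// Read as base64 directly, and via an intermediate Buffer.
const direct = fs.readFileSync(filePath, 'base64');
const viaBuffer = fs.readFileSync(filePath).toString('base64');
console.log(direct === viaBuffer);

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true });
```

Neither form touches the text encoding of the file, since both operate on the raw bytes.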
I have been facing an encoding/decoding issue with the Node.js Express framework.
Brief background: I store a PDF file in a MySQL database as a LONGBLOB with the latin1 charset. From the server side, I need to send the binary data UTF-8 encoded, as my client only knows how to decode UTF-8.
I tried all the solutions I could find on Google.
For example:
new Buffer(mySqlData).toString('utf8');
I also tried the "utf8" module with utf8.encode(mySqlData), but it did not work.
I also tried base64 encoding on the server and base64 decoding on the client. That works fine, but I need UTF-8 encoding, and base64 also increases the size.
Please help.
OK, your problem is the conversion from latin1 to UTF-8. If you just call buffer.toString('utf-8'), the latin1-encoded characters come out wrong.
To convert another charset to UTF-8, the simple way is to use iconv and icu-charset-detector. With them, you can convert to UTF-8 from almost any charset (a few are not supported).
Here is an example of conversion using streams. The resulting stream is UTF-8 encoded:
var charsetDetector = require("node-icu-charset-detector"),
    Iconv = require('iconv').Iconv,
    Stream = require('stream');

function convertToUtf8(source, callback) {
  var iconv,
      charsetTestStream = new Stream.PassThrough(),
      newResStream = new Stream.PassThrough();
  // Feed the source both to the charset detector and to the stream that gets converted
  source.pipe(charsetTestStream);
  source.pipe(newResStream);
  charsetDetector.detectCharsetStream(charsetTestStream, function (charset) {
    // Only convert when a charset was detected and it is not already UTF-8
    if (!iconv && charset && !/utf-*8/i.test(charset.toString())) {
      try {
        iconv = new Iconv(charset, 'utf-8');
        console.log('Converting from charset %s to utf-8', charset);
        iconv.on('error', function (err) {
          callback(err);
        });
        var convertStream = newResStream.pipe(iconv);
        callback(null, convertStream);
      } catch (err) {
        callback(err);
      }
      return;
    }
    // Already UTF-8 (or unknown): pass the data through untouched
    callback(null, newResStream);
  });
}
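When the source charset is already known to be latin1, Node's built-in Buffer can do the conversion without extra modules. A minimal sketch with made-up sample bytes, showing why toString('utf8') alone fails on latin1 data:

```javascript
// "café" encoded as latin1: the é is the single byte 0xE9.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Reading latin1 bytes as UTF-8 yields a replacement character for 0xE9.
const wrong = latin1Bytes.toString('utf8');
console.log(wrong.includes('\ufffd'));

// Decode with the real source charset first, then encode as UTF-8.
const text = latin1Bytes.toString('latin1');
const utf8Bytes = Buffer.from(text, 'utf8');
console.log(text);
console.log(utf8Bytes);
```

Here the é becomes the two bytes C3 A9 in the UTF-8 buffer. This only covers latin1; for other or unknown charsets the detector plus iconv approach above is still needed.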
I am receiving a base64-encoded string. When I decode it and save it to an image file, everything seems to work, but when I download and open the file, my viewer does not recognize it.
I know the string is a valid image because I tested it with http://www.freeformatter.com/base64-encoder.html
and it gave me the right image.
Is there something I am missing?
var imagestring = "/9j/2wCEAA0JCQsJCA0LCgsODQ0PEx8UExEREyYbHRcfLSgwLy0oLCsyOEg9MjVENissP1U/REpNUFFQMDxYX1hOXkhPUE0BDQ4OExATJRQUJU00LDRNTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTU1NTf/AABEIAHgAoAMBIQACEQEDEQH/3QAEAAr/xAGiAAABBQEBAQEBAQAAAAAAAAAAAQIDBAUGBwgJCgsQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5+gEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoLEQACAQIEBAMEBwUEBAABAncAAQIDEQQFITEGEkFRB2FxEyIygQgUQpGhscEJIzNS8BVictEKFiQ04SXxFxgZGiYnKCkqNTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqCg4Q=hYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDzcU6gBamtztlVj0BzTA2rK5njm3xHaCMdamuNUvktHLT7HzxwOlaNX1HYt2mpS3cCOyrkjn61Zhjyc4rCTuCL0aYFThayKFxTSKoQhFNIq0AhFNNUA0nnFIatEjaKoD//0POBS0AKK1vD1it/fNFIPl8tvwPQH8yKqKuxPYtDFudkjhCDg57Gs+5DzT4RzLn05q5PSxTN7TLYwWqKwO7GSPetWFK5ZAi2i1IBUIYYppFUAhFNIq0A00w1YhpppqkIaaSrQH//0fNw6+tHmL60AKJE9a6Hwtq+naW8811JiUgBBtJHv0H0pp2Ey5JrGiTytLcSI7tyS0TH+lRnW9KUlYZlRPRYmA/lSdnsMlj13TF63P8A443+FW4/EWkjrd/+Q3/wrJpjROviXSB/y9/+Q3/wp3/CTaP/AM/f/kN/8KmzHcT/AISbSP8An7/8hv8A4U3/AISXSP8An7/8hv8A4VVguIfEuk/8/f8A5Df/AApD4k0n/n7/APIb/wCFUFxp8R6V/wA/X/kN/wDCmHxFpf8Az9f+Q2/wqhDT4h0v/n6/8ht/hTf+Eg0z/n4=f/Ibf4VSYhDr+mf8/P8A443+FJ/b+m/8/P8A443+FUmgP//S8xooAKeikg4GaAEpQKAFPXijJpNAKHKkZORU2KlgJRSASimAlFMBKKoAooA//9PzGigAqxb8L9aaEyN12OR27UgPpSGPFPCA+30qW7ACwqDk80/FTe4CUUAJSUwCkqgA0lABRTA//9TzGigAqePOAAKaAWVcgEjkVGCKT0AehBrodM8Nf2lpMl9HqVhGY87oZZtr4Hes5XAq3ei31iqtc2skaOMo+Mqw9mHB/A1RZCp5qUAwimmqQCUlMAopgIaSmAUUwP/V8xooAUDJxU8W4nk8U0JhM2DwagAzQ9xomVRjFSpKE+XNS0B6h8ONZN9p0ulTyxyGLmOGUfeTuAfb6HrXNeP4LO11wwWdiLQog8xVI2sTzkAHAGMen0qOgHKq/wA5UjpSHrQA2imAlFMBKSmAUUwP/9bzGigBynByKnB2jLZPsKaEQuxdqcq0hjxwMYqJvlNAFi0neOTcjFSvIIOMGrl7qFxqEomu5Wmk2hd7dSB0ye59zWb3AqYAORSUAFIaYCUUwEpKYBRTA//X8xooAkQY579qepOeRTQh/lq3I+U0vlNjj
B+hp2C5XO4mgRsfQVFxk6KEGBTs1ABmkpgFJQAUlUAlFMBKWgD/0PMaXtQA4NgU8HPamBI=opxUihgeRxVElXvTxWJQ6lpAFFMBKKAEoqgCkpgFFAH/0fMhS4zQA5Bg89KsrFVJXEyVI8U8lY1LN0Fa2sSZobJzUi1yssdS0AFFABRTQBSUwCimAUlAH//S8xHWpAD6UALkVaicGMEnpxVR3ExXuAmODzURD3b7UWRyOdqDNOcugkQlAhwev1pVPNZFD6WkAUUAFFMAopgFJTAKKAP/0/MgcUu8kYzQAnSpI5CnB6UICdgJE4AJ6iozdzmEwCVliJyY1OFJ9SB1NE1cRFThUjHA04UgCloAKKoAopgFFABRQB//1PNNnHPFIv3hkcUAObaBxTaAJoG8sknOKY4yxI4BND2ENxTqkYopc0AOzRQAUUwClpgFFABRQB//1fNnBAzTBuHQUkA7BI5HIpQCDkDFCAX5j2pdrelAFmx0y51CUx26AsBnk4qOW0kgcpIMMpwajmV7CIzGR2pNp9KoYoU+lLtPpQAu0+lG0+lMA2n0pdp9KYBtPpRtPpQAbT6UFTQB/9k=";
var decodedImage = new Buffer(imagestring, 'base64');
fs.writeFile('imagedecoded.jpg', decodedImage , function(err) {});
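One thing that may be worth checking: the string above appears to contain '=' padding in the middle, not only at the end. That can happen when several separately encoded chunks are concatenated, and a decoder may not read past the first padding. This is only a guess about this data. The sketch below uses hypothetical helpers (decodeChunkedBase64, looksLikeJpeg) and a made-up sample to decode each padded chunk separately and check the JPEG signature:

```javascript
// Hypothetical helper: split after each run of '=' padding, decode every
// chunk on its own, and join the resulting bytes.
function decodeChunkedBase64(input) {
  const chunks = input.match(/[^=]+=*/g) || [];
  return Buffer.concat(chunks.map((chunk) => Buffer.from(chunk, 'base64')));
}

// JPEG files start with the bytes FF D8.
function looksLikeJpeg(buffer) {
  return buffer.length > 2 && buffer[0] === 0xff && buffer[1] === 0xd8;
}

// Made-up sample: "a", "bc" and "def" encoded separately, then concatenated.
const sample = 'YQ==YmM=ZGVm';

// Detect padding that is followed by more data.
console.log(/=[^=]/.test(sample));

console.log(decodeChunkedBase64(sample).toString('utf8'));

// The image string above starts with "/9j/", which decodes to FF D8 FF.
console.log(looksLikeJpeg(Buffer.from('/9j/', 'base64')));
```

Note that a chunk whose length happens to need no padding cannot be detected this way, so this is a diagnostic aid, not a general fix.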