I'm using the Firebase Realtime Database for my app. Daily backups are enabled for the database.
The database contains data with accents in words such as "Manutenção".
If I check this text in the Firebase console it is shown as "Manutenção".
If I export the data from the Firebase console it is shown as "Manutenção".
But if I download the backup file (.gzip) and extract it, the text is shown as "Manuten√ß√£o". Notice the mangled accents: the UTF-8 bytes are being displayed as if they were Mac OS Roman, which matches https://string-functions.com/encodingtable.aspx?encoding=65001&decoding=10000
Why does the .gzip backup file encode the accents?
How to decode these encoded accents programmatically?
I tried to use the node module iconv but was not able to convert it.
var Iconv = require('iconv').Iconv;
var iconv = new Iconv('macintosh', 'UTF-8');
var buffer = iconv.convert('Manuten√ß√£o');
console.log(buffer.toString()); // still not "Manutenção"
how can I get back "Manutenção" from "Manuten√ß√£o"?
Thanks!
Checking related threads, it seems to be a macOS issue:
How can I convert encoding of special characters in python?
How to decode these characters? á é í
Solution
const iconv = require('iconv-lite');

function fixMacRomanMojibake(data) {
  // Heuristic: '¬' and '√' show up when UTF-8 bytes are read as MacRoman
  const isMacRomanEncoded = data.indexOf('¬') > -1 || data.indexOf('√') > -1;
  if (isMacRomanEncoded) {
    // MacRoman-mangled: re-encode to MacRoman to recover the original bytes,
    // then decode those bytes as UTF-8
    const buffer = iconv.encode(data, 'MacRoman');
    return iconv.decode(buffer, 'utf-8');
  }
  // not MacRoman-mangled, return the original
  return data;
}
Related
A legacy 3rd party I'm working with requires that we provide them with text files using ANSI (~ASCII) character encoding.
The content to be saved to the file is large so I'm using streams. If using the fs library I can do something like this:
const file = fs.createWriteStream(filePath, {encoding: 'ascii'});
data.pipe(file).on('error', handleError).on('finish', handleSuccess);
I'm trying to do the equivalent in the GCS node client:
const storage = new Storage();
const gcsFile = storage.bucket(bucket).file(fileName, {});
const fileStream = gcsFile.createWriteStream();
data.pipe(fileStream).on('error', handleError).on('finish', handleSuccess);
However the createWriteStream method has no such option to specify the character encoding.
Is there a way to explicitly stream data using ASCII character encoding to GCS?
I think you can pass options.metadata to the createWriteStream. Does something like the following work for you?
const gcsFile = gcsBucket.file(fileName);
fs.createReadStream('myLocalFile.txt')
  .pipe(gcsFile.createWriteStream({
    metadata: {
      contentEncoding: 'ascii'
    }
  }))
  .on('error', handleError)
  .on('finish', handleSuccess);
See the client ref docs here: https://googleapis.dev/nodejs/storage/latest/global.html#CreateWriteStreamOptions
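Note that metadata only labels the object (contentEncoding is normally used for things like gzip); it does not transcode the bytes you upload. If the third party really needs the characters stored as ANSI/ASCII, one option is to transcode the stream before piping it to GCS. This is a sketch only, not part of the answer above, using the iconv-lite package and assuming data is a readable stream of UTF-8 text:
const iconv = require('iconv-lite');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();
const gcsFile = storage.bucket(bucket).file(fileName);

data
  .pipe(iconv.decodeStream('utf8'))    // bytes -> JS strings
  .pipe(iconv.encodeStream('latin1'))  // JS strings -> single-byte "ANSI" bytes; use 'ascii' for strict 7-bit
  .pipe(gcsFile.createWriteStream({
    metadata: { contentType: 'text/plain; charset=iso-8859-1' }
  }))
  .on('error', handleError)
  .on('finish', handleSuccess);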
I have a base64-encoded CSV file, and I want to process it without saving it to storage. How do you decode a base64 string, assign it to a variable, and then parse it using Node.js?
There are many modules in the main npm repository; this is just one I chose, and you can use another. The module is base-x. The docs page has examples, which you need to modify slightly to work with base64:
var BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
var bs64 = require('base-x')(BASE64);
var decoded = bs64.decode(yourStringVariable);
// then store the decoded string or log it, or whatever
// console.log(decoded);
// myApi.store(decoded); etc.
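Node can also do this with no extra dependency: Buffer.from understands base64 directly. A minimal sketch, where getBase64String is a hypothetical source of the encoded data and the CSV parsing is a naive split (a real parser such as csv-parse would be more robust):
const base64Csv = getBase64String();
const csvText = Buffer.from(base64Csv, 'base64').toString('utf8');

// naive parse: one array of cells per line
const rows = csvText
  .split(/\r?\n/)
  .filter(line => line.length > 0)
  .map(line => line.split(','));

console.log(rows);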
I used the code below to encode a file to base64.
var bitmap = fs.readFileSync(file);
return new Buffer(bitmap).toString('base64');
I figured that in the file we have issues with “” and ‘’ characters, but it’s fine with "
When we have It’s, node encodes the characters, but when I decode, I see it as
It’s
Here's the JavaScript I'm using to decode:
fs.writeFile(reportPath, body.buffer, {encoding: 'base64'})
So, once the file is encoded and decoded, it becomes unusable with these funky characters - It’s
Can anyone shed some light on this?
This should work.
Sample script:
const fs = require('fs')
const filepath = './testfile'
//write "it's" into the file
fs.writeFileSync(filepath,"it's")
//read the file
const file_buffer = fs.readFileSync(filepath);
//encode contents into base64
const contents_in_base64 = file_buffer.toString('base64');
//write into a new file, specifying base64 as the encoding (decodes)
fs.writeFileSync('./fileB64',contents_in_base64,{encoding:'base64'})
//file fileB64 should now contain "it's"
I suspect your original file does not have utf-8 encoding, looking at your decoding code:
fs.writeFile(reportPath, body.buffer, {encoding: 'base64'})
I am guessing your content comes from an HTTP request of some sort, so it is possible that the content is not UTF-8 encoded. Take a look at this:
https://www.w3.org/International/articles/http-charset/index: if no charset is specified, a text/* Content-Type defaults to ISO-8859-1.
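If that is what happened (the UTF-8 bytes of “It’s” were read as Windows-1252, producing "It’s"), the damage can be reversed the same way as the MacRoman case above. A sketch using iconv-lite, not part of the original answer:
const iconv = require('iconv-lite');

// Re-encode the mangled string as win1252 to recover the original UTF-8 bytes,
// then decode those bytes as UTF-8.
const broken = 'It’s';
const fixed = iconv.decode(iconv.encode(broken, 'win1252'), 'utf8');
console.log(fixed); // It’s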
Here is the code that helped.
var bitmap = fs.readFileSync(file);
// Remove the non-standard characters
var tmp = bitmap.toString().replace(/[“”‘’]/g, '');
// Create a buffer from the string and return the result
return new Buffer(tmp).toString('base64');
You can provide base64 encoding to the readFileSync function itself.
const fileDataBase64 = fs.readFileSync(filePath, 'base64')
I have been facing an encoding/decoding issue with the Node.js Express framework.
Brief background: I store a PDF file in a MySQL database, in a longblob column with the latin1 charset. From the server side, I need to send the binary data in UTF-8 encoding, as my client only knows how to decode UTF-8.
I tried all the possible solutions I could find on Google.
For ex:
new Buffer(mySqlData).toString('utf8');
I already tried the "utf8" module with utf8.encode(mySqlData), but it is not working.
I also already tried base64 encoding and retrieving the data on the client with base64 decoding. It works just fine, but I need UTF-8 encoding. And as you know, base64 certainly increases the size.
Please help guys.
OK, your problem is the conversion from latin1 to UTF-8. If you just call buffer.toString('utf-8'), the latin1-encoded characters will come out wrong.
To convert another charset to UTF-8, the simple way is to use iconv and icu-charset-detector. With those, you can switch to UTF-8 from almost any charset (a few are not supported).
This is an example of conversion using streams. The resulting stream is encoded as UTF-8:
var charsetDetector = require("node-icu-charset-detector"),
    Iconv = require('iconv').Iconv,
    Stream = require('stream');

function convertToUtf8(source, callback) {
  var iconv,
      charsetTestStream = new Stream.PassThrough(),
      newResStream = new Stream.PassThrough();

  source.pipe(charsetTestStream);
  source.pipe(newResStream);

  charsetDetector.detectCharsetStream(charsetTestStream, function (charset) {
    if (!iconv && charset && !/utf-*8/i.test(charset.toString())) {
      try {
        iconv = new Iconv(charset, 'utf-8');
        console.log('Converting from charset %s to utf-8', charset);
        iconv.on('error', function (err) {
          callback(err);
        });
        var convertStream = newResStream.pipe(iconv);
        callback(null, convertStream);
      } catch (err) {
        callback(err);
      }
      return;
    }
    callback(null, newResStream);
  });
}
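If, as in your case, you already know the data is latin1 (the MySQL column charset), the detector is not needed. A minimal sketch with the iconv-lite package (not what the answer above uses), assuming mySqlData is a Buffer containing latin1-encoded text; for truly binary data such as a PDF, charset conversion does not apply:
const iconv = require('iconv-lite');

// Interpret the latin1 bytes correctly, producing a JS string,
// then serialize that string as UTF-8 for the client.
const text = iconv.decode(mySqlData, 'latin1');
const utf8Buffer = Buffer.from(text, 'utf8');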
I have a web service that takes a base 64 encoded string representing an image, creates a thumbnail of that image using the imagemagick library, then stores both of them in mongodb. I am doing this with the following code (approximately):
var buf = new Buffer(req.body.data, "base64"); //original image
im.resize({ srcData: buf, width: 256 }, function(err, stdout, stderr) {
this.thumbnail = new Buffer(stdout, "binary");
//store buf and stdout in mongo
});
You will notice that I am creating a Buffer object using the "binary" encoding, which the docs say not to do:
'binary' - A way of encoding raw binary data into strings by using
only the first 8 bits of each character. This encoding method is
deprecated and should be avoided in favor of Buffer objects where
possible. This encoding will be removed in future versions of Node.
First off I'm not sure what they are saying there. I'm trying to create a Buffer object and they seem to imply I should already have one.
Secondly, the source of the problem appears to be that the imagemagick resize method returns a string containing binary data. typeof(stdout) returns "string", and printing it to the screen certainly appears to show a bunch of non-character data.
So what do I do here? I can't change how imagemagick works. Is there another way of doing what I'm trying to do?
That's how I am doing the same thing successfully, storing images in MongoDB.
//original ---> base64
var thumbnail = new Buffer(req.body.data).toString('base64');
//you can store this string value in a mongoose model property, and save to mongodb
//base64 ---> image
var buffer = new Buffer(thumbnail, "base64");
I am not sure if storing images as base64 is the best way to do it, though.
Please try this, as your base64 might not have been pre-processed (the data URI prefix needs to be stripped first):
var imgRawData = req.body.images[0].replace(/^data:image\/png;base64,|^data:image\/jpeg;base64,|^data:image\/jpg;base64,|^data:image\/bmp;base64,/, "");
var yourBuffer = new Buffer(imgRawData, "base64");
Then save yourBuffer into MongoDB.
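Putting the pieces together with the question's original im.resize call, here is a sketch only; the req.body.data field comes from the question, and how you persist the buffers depends on your schema:
var im = require('imagemagick');

// strip the data URI prefix and decode the base64 payload
var imgRawData = req.body.data.replace(/^data:image\/(png|jpeg|jpg|bmp);base64,/, "");
var original = Buffer.from(imgRawData, "base64");

im.resize({ srcData: original, width: 256 }, function (err, stdout, stderr) {
  if (err) { /* handle error */ return; }
  // stdout is a binary string; wrap it in a Buffer before storing
  var thumbnail = Buffer.from(stdout, "binary");
  // store `original` and `thumbnail` in MongoDB, e.g. in Buffer-typed schema fields
});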