Encoding a file to base64 in Node.js

I used the code below to encode a file to base64.
var bitmap = fs.readFileSync(file);
return new Buffer(bitmap).toString('base64');
I figured out that the problem is with the “” and ‘’ characters in the file; a plain " is fine.
When the file contains It’s, Node encodes it, but when I decode it, I see it as
It’s
Here's the JavaScript I'm using to decode:
fs.writeFile(reportPath, body.buffer, {encoding: 'base64'})
So once the file is encoded and then decoded, it ends up unusable because of these funky characters: It’s
Can anyone shed some light on this?

This should work.
Sample script:
const fs = require('fs')
const filepath = './testfile'
//write "it's" into the file
fs.writeFileSync(filepath,"it's")
//read the file
const file_buffer = fs.readFileSync(filepath);
//encode contents into base64
const contents_in_base64 = file_buffer.toString('base64');
//write into a new file, specifying base64 as the encoding (decodes)
fs.writeFileSync('./fileB64',contents_in_base64,{encoding:'base64'})
//file fileB64 should now contain "it's"
I suspect your original file is not UTF-8 encoded. Looking at your decoding code:
fs.writeFile(reportPath, body.buffer, {encoding: 'base64'})
I am guessing your content comes from an HTTP request of some sort, so it is possible that the content is not UTF-8 encoded. Take a look at this:
https://www.w3.org/International/articles/http-charset/index - if no charset is specified, a text/* Content-Type defaults to ISO-8859-1.
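If that is the case, a minimal sketch of handling it (assuming the file is actually Windows-1252/ISO-8859-1 rather than UTF-8, that iconv-lite is acceptable, and with hypothetical file paths) would be to decode with the right charset first, then re-encode as UTF-8 before base64-encoding:

const fs = require('fs');
const iconv = require('iconv-lite'); // assumption: not part of the original answer

const rawBytes = fs.readFileSync('./report.txt');            // raw bytes, no encoding assumed
const text = iconv.decode(rawBytes, 'win1252');               // smart quotes decode correctly from Windows-1252
const base64 = Buffer.from(text, 'utf8').toString('base64');  // now UTF-8, then base64

fs.writeFileSync('./report-decoded.txt', base64, { encoding: 'base64' }); // round-trips cleanly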

Here is the code that helped.
var bitmap = fs.readFileSync(file);
// Strip the curly quote characters that were causing trouble
var tmp = bitmap.toString().replace(/[“”‘’]/g, '');
// Create a buffer from the cleaned string and return it as base64
return Buffer.from(tmp).toString('base64');

You can also pass 'base64' as the encoding directly to readFileSync:
const fileDataBase64 = fs.readFileSync(filePath, 'base64')

Related

Firebase RTDB .gzip and Node JS encoding

I'm using Firebase real-time database for my app. Daily backup is enabled for the database.
The database contains data with accents in words such as "Manutenção".
If I check this text in the Firebase console it is shown as "Manutenção".
If I export the data from the Firebase console it is shown as "Manutenção".
But if I download the backup file (.gzip), after extraction the text is shown as "Manuten√ß√£o". Notice the encoding of the accents here; it matches https://string-functions.com/encodingtable.aspx?encoding=65001&decoding=10000 (UTF-8 bytes read as Mac OS Roman).
Why does the .gzip backup file encode the accents this way?
How can I decode these accents programmatically?
I tried the Node module iconv but was not able to convert it.
var Iconv = require('iconv').Iconv;
var iconv = new Iconv('macintosh', 'UTF-8');
var buffer = iconv.convert('Manuten√ß√£o');
console.log(buffer.toString()); // still garbled, not "Manutenção"
How can I get back "Manutenção" from "Manuten√ß√£o"?
Thanks!
Checking related threads, it seems to be a macOS encoding issue; see for example "How can I convert encoding of special characters in python?" and "How to decode these characters? á é í".
Solution
const iconv = require('iconv-lite');

// Wrapped in a helper for clarity; `data` is the string read from the backup.
function fixMacRomanEncoding(data) {
  // '¬' and '√' only show up when UTF-8 bytes have been misread as MacRoman
  const isMacRomanEncoded = data.indexOf('¬') > -1 || data.indexOf('√') > -1;
  if (isMacRomanEncoded) {
    // MacRoman encoded: re-encode back to the original bytes, then decode as UTF-8
    const buffer = iconv.encode(data, 'MacRoman');
    return iconv.decode(buffer, 'utf-8');
  }
  // not MacRoman encoded, return the original
  return data;
}
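A quick check of the helper as wrapped above (the sample strings come from the question):

console.log(fixMacRomanEncoding('Manuten√ß√£o')); // "Manutenção"
console.log(fixMacRomanEncoding('Manutenção'));   // unchanged, already UTF-8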

nodeJS: convert response.body to utf-8 (from windows-1251 encoding)

I'm trying to convert an HTML body encoded in windows-1251 to UTF-8, but I still get messed-up characters in the HTML.
The text is basically Russian, but I can't get it to display properly; I just get ??????? ?? ???
const GOT = require('got') // https://www.npmjs.com/package/got
const WIN1251 = require('windows-1251') // https://www.npmjs.com/package/windows-1251
async function query() {
  var body = Buffer.from(await GOT('https://example.net/', {resolveBodyOnly: true}), 'binary')
  var html = WIN1251.decode(body.toString('utf8'))
  console.log(html)
}
query()
You’re doing a lot of silly encoding back-and-forth here. And the ‘backs’ don’t even match the ‘forths’.
First, you use the got library to download a webpage; by default, got will dutifully decode the response text as UTF-8. You then stuff the returned Unicode string into a Buffer with the binary encoding, which throws away the higher octet of each UTF-16 code unit of the string. Then you call .toString('utf8'), which interprets this mutilated string as UTF-8 (in actuality, it is most likely not valid UTF-8 at all). Finally you pass the resulting ‘UTF-8’ string to the windows-1251 package to decode it as a ‘code page 1251’ string. Nothing good can possibly come from all this confusion.
The windows-1251 package you want to use takes so-called ‘binary’ (pseudo-Latin-1) strings as input. What you should do instead is take the binary response, interpret it as a Latin-1/‘binary’ string, and then pass it to the windows-1251 library for decoding.
In other words, use this:
const GOT = require('got');
const WIN1251 = require('windows-1251');
async function query() {
  const body = await GOT('https://example.net/', {
    resolveBodyOnly: true,
    responseType: 'buffer'
  });
  const html = WIN1251.decode(body.toString('binary'));
  console.log(html);
}
query();
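For what it's worth, the same result can be had without the 'binary'-string round trip at all, for example by letting iconv-lite decode the raw buffer directly (a sketch; iconv-lite is not part of the original answer):

const got = require('got');
const iconv = require('iconv-lite');

async function query() {
  // ask got for the raw bytes so nothing is decoded prematurely
  const body = await got('https://example.net/', {
    resolveBodyOnly: true,
    responseType: 'buffer'
  });
  // decode the windows-1251 bytes straight from the Buffer
  const html = iconv.decode(body, 'win1251');
  console.log(html);
}
query();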

Upload text file to Google Cloud storage with ASCII encoding with node

A legacy 3rd party I'm working with requires that we provide them with text files using ANSI (~ASCII) character encoding.
The content to be saved to the file is large, so I'm using streams. With the fs library I can do something like this:
const file = fs.createWriteStream(filePath, {encoding: 'ascii'});
data.pipe(file).on('error', handleError).on('finish', handleSuccess);
I'm trying to do the equivalent in the GCS node client:
const storage = new Storage();
const gcsFile = storage.bucket(bucket).file(fileName, {});
const fileStream = gcsFile.createWriteStream();
data.pipe(fileStream).on('error', handleError).on('finish', handleSuccess);
However, the createWriteStream method has no such option to specify the character encoding.
Is there a way to explicitly stream data using ASCII character encoding to GCS?
I think you can pass options.metadata to the createWriteStream. Does something like the following work for you?
const gcsFile = gcsBucket.file(fileName);

fs.createReadStream('myLocalFile.txt')
  .pipe(gcsFile.createWriteStream({
    metadata: {
      contentEncoding: 'ascii'
    }
  }))
  .on('error', handleError)
  .on('finish', handleSuccess);
See the client ref docs here: https://googleapis.dev/nodejs/storage/latest/global.html#CreateWriteStreamOptions
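If the bytes themselves need to be transcoded before upload, another option (just a sketch using iconv-lite, which is not mentioned in the answer; latin1 is used here as the closest match to "ANSI") is to convert the stream before piping it to GCS, since createWriteStream simply stores whatever bytes it receives:

const iconv = require('iconv-lite');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();
const gcsFile = storage.bucket(bucket).file(fileName);

data
  .pipe(iconv.decodeStream('utf8'))     // incoming bytes -> string stream
  .pipe(iconv.encodeStream('latin1'))   // string stream -> ANSI/Latin-1 bytes
  .pipe(gcsFile.createWriteStream())
  .on('error', handleError)
  .on('finish', handleSuccess);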

Decode Base64, then parse CSV in Express

I have a base64-encoded CSV file, and I want to process it without saving it to storage. How do you decode a base64 string, assign it to a variable, and then parse it using Node.js?
There are many modules on npm; this is just one I chose, and you can use another. The module is base-x; its docs page has examples, which you need to modify slightly to work with the base64 alphabet:
var BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
var bs64 = require('base-x')(BASE64);
var decoded = bs64.decode(yourStringVariable);
// then use the decoded Buffer however you need, e.g.:
// console.log(decoded.toString());
// myApi.store(decoded); etc.
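For completeness, Node's built-in Buffer can also decode base64 without any extra module; a minimal sketch (the sample string and the naive comma-split parsing are purely illustrative; a real CSV parser such as csv-parse would handle quoting and embedded commas):

const base64Csv = 'bmFtZSxhZ2UKQWxpY2UsMzAKQm9iLDI1'; // hypothetical input ("name,age\nAlice,30\nBob,25")

// decode the base64 string into a UTF-8 string
const csvText = Buffer.from(base64Csv, 'base64').toString('utf8');

// naive parse for illustration only
const rows = csvText.trim().split('\n').map(line => line.split(','));
console.log(rows); // [ [ 'name', 'age' ], [ 'Alice', '30' ], [ 'Bob', '25' ] ]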

Node.js encoding UTF-8 issue

I have been facing an encoding/decoding issue with the Node.js Express framework.
Brief background: I store PDF files in a MySQL database in a LONGBLOB column with the latin1 charset. From the server side, I need to send the binary data in UTF-8 encoding, as my client only knows how to decode UTF-8.
I tried all the possible solutions available on Google.
For example:
new Buffer(mySqlData).toString('utf8');
I already tried the "utf8" module with its utf8.encode(mySqlData) function, but it is not working.
I also tried base64 encoding and decoding the data with base64 on the client. That works just fine, but I need UTF-8 encoding, and base64 certainly increases the size.
Please help, guys.
OK, your problem is the conversion from Latin-1 to UTF-8. If you just call buffer.toString('utf-8'), the Latin-1 encoded characters come out wrong.
To convert another charset to UTF-8, the simple way is to use iconv and node-icu-charset-detector. With those, you can convert to UTF-8 from almost any charset (with a few exceptions).
Here is an example of the conversion using streams; the resulting stream is UTF-8 encoded:
var charsetDetector = require("node-icu-charset-detector"),
    Iconv = require('iconv').Iconv,
    Stream = require('stream');

function convertToUtf8(source, callback) {
  var iconv,
      charsetTestStream = new Stream.PassThrough(),
      newResStream = new Stream.PassThrough();

  source.pipe(charsetTestStream);
  source.pipe(newResStream);

  charsetDetector.detectCharsetStream(charsetTestStream, function (charset) {
    if (!iconv && charset && !/utf-*8/i.test(charset.toString())) {
      try {
        // detected charset differs from UTF-8: pipe the response through an iconv converter
        iconv = new Iconv(charset, 'utf-8');
        console.log('Converting from charset %s to utf-8', charset);
        iconv.on('error', function (err) {
          callback(err);
        });
        var convertStream = newResStream.pipe(iconv);
        callback(null, convertStream);
      } catch (err) {
        callback(err);
      }
      return;
    }
    // already UTF-8 (or charset unknown): return the untouched stream
    callback(null, newResStream);
  });
}
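A hypothetical usage from an Express route might look like this (the route, the response object, and the getPdfStreamFromMySql helper are assumptions, not part of the answer):

app.get('/report', function (req, res, next) {
  var pdfStream = getPdfStreamFromMySql(); // hypothetical: a readable stream of the LONGBLOB bytes
  convertToUtf8(pdfStream, function (err, utf8Stream) {
    if (err) return next(err);
    utf8Stream.pipe(res); // the client receives UTF-8 encoded data
  });
});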
