Buffer.from(base64EncodedString, 'base64').toString('binary') vs 'utf8' - node.js

In Node.js: Why does this test fail on the second call of main?
test('base64Encode and back', () => {
  function main(input: string) {
    const base64string = base64Encode(input);
    const text = base64Decode(base64string);
    expect(input).toEqual(text);
  }
  main('demo');
  main('😉😉😉');
});
Here are my functions:
export function base64Encode(text: string): string {
  const buffer = Buffer.from(text, 'binary');
  return buffer.toString('base64');
}
export function base64Decode(base64EncodedString: string): string {
  const buffer = Buffer.from(base64EncodedString, 'base64');
  return buffer.toString('binary');
}
From these pages, I figured I had written these functions correctly so that one would reverse the other:
https://github.com/node-browser-compat/btoa/blob/master/index.js
https://github.com/node-browser-compat/atob/blob/master/node-atob.js
https://stackoverflow.com/a/47890385/470749
If I change the 'binary' options to be 'utf8' instead, the test passes.
But my database currently has data where this function only seems to work if I use 'binary'.

binary is an alias for latin1
'latin1': Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and will be mapped to characters in that range.
This character set cannot represent multibyte UTF-8 characters.
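You can see the lossy round trip directly (a minimal REPL-style check using only the built-in Buffer):

```javascript
// Round-tripping a multibyte string through 'binary'/'latin1' loses
// information, while 'utf8' preserves it.
const emoji = '😉'; // U+1F609, four bytes in UTF-8

// 'binary' (latin1) keeps only the low byte of each UTF-16 code unit,
// so the surrogate pair D83D DE09 collapses to the bytes 0x3D 0x09.
const lossy = Buffer.from(emoji, 'binary').toString('binary');
console.log(lossy === emoji); // false — the emoji is destroyed

// 'utf8' encodes all four bytes, so the round trip is lossless.
const lossless = Buffer.from(emoji, 'utf8').toString('utf8');
console.log(lossless === emoji); // true
```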
To get utf8 multibyte characters back, go directly to base64 and back again
function base64Encode(str) {
  return Buffer.from(str).toString('base64');
}
function base64Decode(str) {
  return Buffer.from(str, 'base64').toString();
}
> base64Encode('😉')
'8J+YiQ=='
> base64Decode('8J+YiQ==')
'😉'

How to escape special characters for linux, nodejs exec function

I'm running this ffmpeg command on my Linux server; when I paste it into the terminal it works just fine, but as soon as I use execPromise to run the EXACT same command, it returns an error.
const { exec } = require('child_process');
const { promisify } = require('util');
const execPromise = promisify(exec);
const encode = async ffmpegCode => {
  try {
    // Here I can see that the code is the exact same one as the one
    // that works when pasted into the terminal
    console.log(ffmpegCode);
    await execPromise(ffmpegCode);
    return 200;
  } catch (err) {
    console.log(err);
  }
};
I need \: to be interpreted as such. When I type it as is, \:, the error message shows me that it interpreted it as : which is expected.
If I pass in \\:, I expect it to interpret it as I need it which would be \: but the error shows me that it interprets it as \\:.
\\\: is interpreted as \\: and \\\\: is interpreted as \\\\:.
Part of the command passed:
...drawtext=text='timestamp \\: %{pts \\: localtime \\: 1665679092.241...
Expected command:
...drawtext=text='timestamp \: %{pts \: localtime \: 1665679092.241...
Error message:
...drawtext=text='timestamp \\: %{pts \\: localtime \\: 1665679092.241...
How do I get \: through to the exec function?
It looks like the issue might be related to escaping special characters in the ffmpegCode string. The execPromise function passes the command through a shell, which interprets the backslash character (\) as an escape character and modifies the string passed to ffmpeg in unexpected ways.
To properly escape special characters in the ffmpegCode string, you can use the built-in JSON.stringify() function. This function properly escapes special characters in a string and produces a valid JSON string that can be passed as an argument to execPromise.
Here's an updated encode function that uses JSON.stringify() to properly escape the ffmpegCode string:
const encode = async ffmpegCode => {
  try {
    console.log(ffmpegCode);
    const command = `ffmpeg ${JSON.stringify(ffmpegCode)}`;
    await execPromise(command);
    return 200;
  } catch (err) {
    console.log(err);
  }
};
By wrapping the ffmpegCode string with JSON.stringify(), the backslashes and other special characters will be properly escaped, allowing the execPromise function to properly execute the ffmpeg command.
Note that the resulting command string will be wrapped in double quotes, so you may need to modify your ffmpeg command to properly handle quoted arguments.

Apply regex to .txt file node.js

I'm trying to escape quotes in txt file using node.js and regex.
My code looks like this:
const fs = require("fs");
const utf8 = require("utf8");
var dirname = ".\\f\\";
const regex = new RegExp(`(?<=".*)"(?=.*"$)`, "gm");
fs.readFile(dirname + "test.txt", (error, data) => {
  if (error) {
    throw error;
  }
  var d = data.toString();
  d = utf8.encode(d);
  console.log(`File: ${typeof d}`); // string
  // d = `Another string\n"Test "here"."\n"Another "here"."\n"And last one here."`;
  console.log(`Text: ${typeof d}`); // string
  var re = d.replace(regex, '\\"');
  console.log(`Result:\n${re}`);
  /* Another string
     "Test \"here\"."
     "Another \"here\"."
     "And last one here."
  */
});
The problem is:
When I uncomment that line (using the hard-coded string), everything works fine. But if I read the text from the file, it doesn't want to work.
Thanks for any comments on this.
Well.. turns out the problem was in file encoding. The file was encoded in UTF-16, not in UTF-8. Node.js wasn't giving me any signs of wrong encoding, so well, nice.

How to handle special characters in csv file with encoding other then utf-8

I am trying to read a csv file in Node.js using createReadStream but am stuck on an issue with special characters. When the csv file's charset is UTF-8, the special characters come through intact, but if the charset is anything other than UTF-8, the special characters are converted to ?.
Here is what i have tried :
let parseOptions = {
  headers: false,
  ignoreEmpty: false,
  trim: true,
  discardUnmappedColumns: false,
  quoteHeaders: true
};
let stream = fs.createReadStream(obj.data.file_data.path, { encoding: 'utf8' });
let parser = csv.fromStream(stream, parseOptions)
  .on("data", function(row) {
    console.log('Row data ----->', row);
    // Prints row
  })
  .on("end", function() {
    // process data here
  });
I have tried the encoding options binary, utf16 and others as well, but nothing seems to handle all characters. Is there any way to ignore the charset and keep special characters intact, or to convert them to UTF-8?

Converting a string from utf8 to latin1 in NodeJS

I'm using a Latin1 encoded DB and can't change it to UTF-8, meaning that I run into issues with certain application data. I'm using Tesseract to OCR a document (Tesseract encodes in UTF-8) and tried to use iconv-lite; however, it creates a buffer, and I need to convert that buffer into a string. But again, buffer-to-string conversion does not allow "latin1" encoding.
I've read a bunch of questions/answers; however, all I get is setting client encoding and stuff like that.
Any ideas?
Since Node.js v7.1.0, you can use the transcode function from the buffer module:
https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc
For example:
const buffer = require('buffer');
const latin1Buffer = buffer.transcode(Buffer.from(utf8String), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
You can create a buffer from the UTF-8 string you have, and then decode that buffer to Latin-1 using iconv-lite, like this:
var buff = Buffer.from(tesseract_string, 'utf8');
var DB_str = iconv.decode(buff, 'ISO-8859-1');
I've found a way to convert any encoded text file, to UTF8
var fs = require('fs'),
    charsetDetector = require('node-icu-charset-detector'),
    iconvlite = require('iconv-lite');

/* Having different encodings on text files in a git repo,
 * but need to always serve standard 'utf-8' */
function getFileContentsInUTF8(file_path) {
  var content = fs.readFileSync(file_path);
  var original_charset = charsetDetector.detectCharset(content);
  var jsString = iconvlite.decode(content, original_charset.toString());
  return jsString;
}
It's also in a gist here: https://gist.github.com/jacargentina/be454c13fa19003cf9f48175e82304d5
Maybe you can try this, where content should be your database buffer data (in latin1 encoding).

Invalid character entity parsing xml

I am trying to parse a string of xml and I getting an error
[Error: Invalid character entity
Line: 0
Column: 837
Char: ]
Does xml not like brackets? Do I need to replace all the brackets with something like \\]? Thanks
Ok, the invalid character was the dash and an &. I fixed it by doing the following (escaping the ampersand as an entity and replacing the dash with a plain hyphen):
xml = data.testSteps.replace(/[\n\r]/g, '\\n')
    .replace(/&/g, "&amp;")
    .replace(/–/g, "-");
Thanks
Using a node domparser will get around having to do a string replace on every character that is not easily parsed as a string. This is especially useful if you have a large amount of XML to parse that may have different characters.
I would recommend xmldom as I have used it successfully with xml2js
The combined usage looks like the following:
var parseString = require('xml2js').parseString;
var DOMParser = require('xmldom').DOMParser;

var xmlString = "<test>some stuff </test>";
var xmlStringSerialized = new DOMParser().parseFromString(xmlString, "text/xml");

parseString(xmlStringSerialized, function (err, result) {
  if (err) {
    // did not work
  } else {
    // worked! use JSON.stringify()
    var allDone = JSON.stringify(result);
  }
});