parameter from package.json script (Encoding problem) - node.js

https://nodejs.org/docs/latest/api/process.html#processargv
https://www.golinuxcloud.com/pass-arguments-to-npm-script/
I am passing a parameter to a script invoked from package.json as follows:
--pathToFile=./ESMM/Parametrização_Dezembro_PS1_2022.xlsx
In code, I retrieve that parameter from the arguments:
const value = process.argv.find( element => element.startsWith( `--pathToFile=` ) );
const pathToFile=value.replace( `--pathToFile=` , '' );
The string obtained seems to be in the wrong format/encoding:
./ESMM/Parametrização_Dezembro_PS1_2022.xlsx
I tried converting to latin1 (other past issues were fixed with this encoding):
const latin1Buffer = buffer.transcode(Buffer.from(pathToFile), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
but I still don't get the string in the correct encoding:
./ESMM/Parametriza?º?úo_Dezembro_PS1_2022.xlsx
My package.json is in UTF-8.
My current locale is (chcp): Active code page: 850
OS: Windows
This seems to be related to:
https://code.visualstudio.com/docs/editor/tasks#_changing-the-encoding-for-a-task-output
vs code, how to change encoding for terminal triggered by "build task"
https://pt.stackoverflow.com/questions/148543/como-consertar-erro-de-acentua%C3%A7%C3%A3o-do-cmd
Get argv raw bytes in Node.js
I will try those configurations.
const min = parseInt("0xD800",16), max = parseInt("0xDFFF",16);
console.log(min);//55296
console.log(max);//57343
let textFiltered = "",specialChars = 0;
for(let charAux of pathToFile){
const hexChar = Buffer.from(charAux, 'utf8').toString('hex');
console.log(hexChar)
const intChar = parseInt(hexChar,16);
if(hexChar.length > 2){
//if(intChar>min && intChar<max){
//console.log(Buffer.from(charAux, 'utf8').toString('hex'))
specialChars++;
console.log(`specialChars(${specialChars}): ${hexChar}`);
}else{
textFiltered += String.fromCharCode(intChar);
}
}
console.log(textFiltered); //normal characters
./ESMM/Parametrizao_Dezembro_PS1_2022.xlsx
console.log(`specialChars(${specialChars}): ${hexChar}`); //special characters
specialChars(1): e2949c
specialChars(2): c2ba
specialChars(3): e2949c
specialChars(4): c3ba
It seems that the hex value e2949c marks a special character, since it repeats, and ideally 0xc2ba should convert to "ç" and 0xc3ba to "ã". I am still trying to figure that out.
Each Unicode codepoint can be written in a string with \u{xxxxxx} where xxxxxx represents 1–6 hex digits

As @JosefZ indicated (though for Python), in my case I am going to use a direct conversion, since all the parameters will contain the keyword "Parametrização".
The problem I encountered is that my package.json and my script are in the correct format, UTF-8, as stated by @tripleee (thanks for the help provided), but process.argv returns a <string[]> that is basically UTF-16. So my solution is to deal with the ├, which in hex is "e2949c", and retrieve the correct characters:
const UTF8_Character = "e2949c" //├
// for these cases, use this lookup object, which holds the correct characters
const personalized_encoding = {
"c2ba": "ç",
"c3ba": "ã"
}
let textFiltered = "",specialChars = 0;
for(let charAux of pathToFile){
const hexChar = Buffer.from(charAux, 'utf8').toString('hex');
//console.log(hexChar)
const intChar = parseInt(hexChar,16);
if(hexChar.length > 2){
if(hexChar === UTF8_Character) continue;
specialChars++;
//console.log(`specialChars(${specialChars}): ${hexChar}`);
textFiltered += personalized_encoding[hexChar];
}else{
textFiltered += String.fromCharCode(intChar);
}
}
console.log(textFiltered);
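The hex values above are consistent with the UTF-8 bytes of the argument having been decoded as code page 850: in CP850, byte 0xC3 is ├, 0xA7 is º and 0xA3 is ú, while ç and ã are C3 A7 and C3 A3 in UTF-8. Under that assumption, a more general alternative to a per-word lookup is to map each character back to its CP850 byte and decode the resulting bytes as UTF-8. The following is only a minimal sketch: the table `CP850_TO_BYTE` and the function `fixCp850Mojibake` are hypothetical names, and the table covers only the three characters seen in this question.

```javascript
// Partial reverse table for code page 850 (assumption: only the characters
// observed in this question; extend it for other accented letters).
const CP850_TO_BYTE = new Map([
  ["├", 0xc3],
  ["º", 0xa7],
  ["ú", 0xa3],
]);

// Re-encode a string that was wrongly decoded as CP850, then decode as UTF-8.
function fixCp850Mojibake(text) {
  const bytes = [];
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code < 0x80) {
      bytes.push(code); // ASCII is identical in CP850 and UTF-8
    } else if (CP850_TO_BYTE.has(ch)) {
      bytes.push(CP850_TO_BYTE.get(ch));
    } else {
      throw new Error(`No CP850 mapping for "${ch}"`);
    }
  }
  return Buffer.from(bytes).toString("utf8");
}

const garbled = "./ESMM/Parametriza├º├úo_Dezembro_PS1_2022.xlsx";
console.log(fixCp850Mojibake(garbled));
// ./ESMM/Parametrização_Dezembro_PS1_2022.xlsx
```

If the console code page can be changed, running `chcp 65001` before the script may avoid the mis-decoding altogether.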

Related

Issue concatenating two strings containing '&' in dart

I have code like this:
// Language = Dart
var someVariable = 'Hello';
var someOtherVariable = 'World';
var str = 'somedomain?x=${someVariable}&y=${someOtherVariable}';
return str;
// Expected:
// somedomain?x=Hello&y=World;
// Actual
// somedomain?x=Hello
If I replace the & character with any letter, the strings concatenate successfully. What am I doing wrong?
This is the actual code which I used in FlutterFlow, and am having issues with:
Future<String> getEventUrlFromReference(BuildContext context, DocumentReference? eventReference) async {
var userId = currentUser?.uid as String;
return "https://somedomain.com/event?eventReference=${eventReference?.id}" + "&invitedBy="+userId;
}
// result: https://somedomain.com/event?eventReference=referencevalue
This was a string encoding issue. I was using the result of my function as the body text in sms://<number>?&body=<string_containing_&_character>. The text appended to the SMS body is truncated at the & character, and I mistakenly assumed it was a string concatenation issue.
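The truncation can be avoided by percent-encoding the body before placing it in the URI, so that & is no longer read as a parameter separator. A minimal sketch in JavaScript using the standard `encodeURIComponent` (in Dart, `Uri.encodeComponent` plays the same role); the phone number, the sample link and the `buildSmsUri` helper are hypothetical:

```javascript
// Build an sms: URI whose body may itself contain '&' and '?'.
function buildSmsUri(number, body) {
  // Percent-encode the body so '&' becomes %26 and is not read as a separator.
  return `sms://${number}?&body=${encodeURIComponent(body)}`;
}

const link = "https://somedomain.com/event?eventReference=abc&invitedBy=user1";
console.log(buildSmsUri("5550100", link));
// sms://5550100?&body=https%3A%2F%2Fsomedomain.com%2Fevent%3FeventReference%3Dabc%26invitedBy%3Duser1
```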

Parsing log file to JSON object

Does anyone know the best way to parse a (Salesforce) log file into a structured JSON object?
There are certain sections in the log file that can be identified, like EXECUTION_STARTED, EXECUTION_FINISHED, CODE_UNIT_STARTED, CODE_UNIT_FINISHED and many more.
There is also time information that I would like to have in the JSON object.
Are there any libraries available in Node.js that could be used to accomplish this?
I was looking into antlr4, but it seems quite complex.
-Jani
Looks like a CSV -> JSON conversion
// Reading the file using default
// fs npm package
const fs = require("fs");
const csv = fs.readFileSync("CSV_file.csv")
// Convert the data to String and
// split it in an array
var array = csv.toString().split("\r");
// All the rows of the CSV will be
// converted to JSON objects which
// will be added to result in an array
let result = [];
// The array[0] contains all the
// header columns so we store them
// in headers array
let headers = array[0].split(", ")
// Since headers are separated, we
// need to traverse remaining n-1 rows.
for (let i = 1; i < array.length - 1; i++) {
let obj = {}
// Create an empty object to later add
// values of the current row to it
// Declare string str as current array
// value to change the delimiter and
// store the generated string in a new
// string s
let str = array[i]
let s = ''
// By Default, we get the comma separated
// values of a cell in quotes " " so we
// use flag to keep track of quotes and
// split the string accordingly
// If we encounter opening quote (")
// then we keep commas as it is otherwise
// we replace them with pipe |
// We keep adding the characters we
// traverse to a String s
let flag = 0
for (let ch of str) {
if (ch === '"' && flag === 0) {
flag = 1
}
else if (ch === '"' && flag === 1) flag = 0
// ch is a single character, so compare against ',' (not ', ')
if (ch === ',' && flag === 0) ch = '|'
if (ch !== '"') s += ch
}
// Split the string using pipe delimiter |
// and store the trimmed values in a properties array
let properties = s.split("|").map(p => p.trim())
// For each header, if the value contains
// multiple comma separated data, then we
// store it in the form of array otherwise
// directly the value is stored
for (let j in headers) {
if (properties[j].includes(", ")) {
obj[headers[j]] = properties[j]
.split(", ").map(item => item.trim())
}
else obj[headers[j]] = properties[j]
}
// Add the generated object to our
// result array
result.push(obj)
}
// Convert the resultant array to json and
// generate the JSON output file.
let json = JSON.stringify(result);
fs.writeFileSync('output.json', json);
Change "CSV_file.csv" above to whatever your CSV file is.
Save the js file as app.js and run using
node app.js
Source: https://www.geeksforgeeks.org/how-to-convert-csv-to-json-file-having-comma-separated-values-in-node-js/
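For a log whose lines are pipe-delimited events rather than CSV rows, a line-by-line parser may be closer to what the question asks. The sketch below assumes each event line looks like `HH:MM:SS.mmm (nanoseconds)|EVENT_TYPE|detail|...`. That layout, the sample lines and the helpers `parseLogLine` and `parseLog` are assumptions for illustration, not taken from the question.

```javascript
// Assumed line layout:
// "12:00:00.002 (2000000)|CODE_UNIT_STARTED|[EXTERNAL]|MyTrigger"
const LINE_PATTERN = /^(\d{2}:\d{2}:\d{2}\.\d+) \((\d+)\)\|([A-Z_]+)(?:\|(.*))?$/;

// Turn one log line into an object, or null if it is not an event line.
function parseLogLine(line) {
  const match = LINE_PATTERN.exec(line);
  if (!match) return null; // header lines or free-form continuation text
  const [, time, nanos, event, rest] = match;
  return {
    time,
    elapsedNanos: Number(nanos),
    event,
    details: rest ? rest.split("|") : [],
  };
}

// Parse a whole log, keeping only the lines that matched the pattern.
function parseLog(text) {
  return text
    .split(/\r?\n/)
    .map(parseLogLine)
    .filter((entry) => entry !== null);
}

const sample = [
  "12:00:00.001 (1000000)|EXECUTION_STARTED",
  "12:00:00.002 (2000000)|CODE_UNIT_STARTED|[EXTERNAL]|MyTrigger",
  "some free-form line",
  "12:00:00.009 (9000000)|CODE_UNIT_FINISHED|MyTrigger",
  "12:00:00.010 (10000000)|EXECUTION_FINISHED",
].join("\n");

console.log(JSON.stringify(parseLog(sample), null, 2));
```

Pairing each *_STARTED entry with its *_FINISHED counterpart (for example with a stack) would give a nested structure, at the cost of more code.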

nodejs convert decimal to hex(00-00) and reverse?

So I want to send a hex value to a service. The problem is that the service wants this hex value in a specific format, 00-00, and also reversed.
For Example, when I want to tell the Service I want tag '1000':
1000 dec is 3E8 in hex. easy.
Now the service wants the format 03-E8.
Also the service reads from right to left so the final value would be E8-03.
Another example, '3' would be 03-00.
Edit:
I forgot to say that I don't need it as a string, but as a Uint8Array. So I would create the final result with new Uint8Array([232, 3]) (equals E8-03 = 1000).
So the question in general is: how can I get [232, 3] from an input of 1000, or [3, 0] from 3?
Is there a built-in method or a package that can already do this conversion?
There's probably a much simpler way, but here is a simple solution.
Edit: If you need at least two pairs, you can change the first argument of padStart.
function dec2hexUi8Arr (n) {
const hex = (n).toString(16);
const len = hex.length;
const padLen = len + (len % 2);
const hexPad = hex.padStart(Math.max(padLen, 4), '0');
const pairs = hexPad.match(/../g).reverse().map(p => parseInt(p, 16));
const ui8Arr = new Uint8Array(pairs);
return ui8Arr;
}
const vals = [
1000,
3,
65536
].map(dec2hexUi8Arr);
console.log (vals);
I don't know any tool or package. I would do something like this:
const a = 1000;
const a0 = a % 256;
const a1 = Math.floor(a / 256);
const arr = new Uint8Array([a0, a1]);
console.log(arr);
const arrStr = []
arr.forEach((elem) => {
const str = elem.toString(16).toUpperCase();
if (str.length === 2) {
arrStr.push(str);
} else {
arrStr.push('0' + str)
}
});
// arr is already in the reversed (low byte first) order the service expects
console.log(arrStr.join('-'));
Output:
Uint8Array(2) [ 232, 3 ]
E8-03
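The "reversed" byte order the service expects is little-endian, which the standard `DataView` can write directly. A minimal sketch, assuming the tag always fits in 16 bits; the helper names `toUint16LE` and `toDashedHex` are hypothetical:

```javascript
// Encode a non-negative integer below 65536 as two little-endian bytes.
function toUint16LE(n) {
  if (!Number.isInteger(n) || n < 0 || n > 0xffff) {
    throw new RangeError(`${n} does not fit in 16 bits`);
  }
  const bytes = new Uint8Array(2);
  new DataView(bytes.buffer).setUint16(0, n, true); // true = little-endian
  return bytes;
}

// Format bytes as upper-case hex pairs joined by '-'.
function toDashedHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join("-");
}

console.log(toUint16LE(1000));              // bytes 232, 3
console.log(toDashedHex(toUint16LE(1000))); // E8-03
console.log(toDashedHex(toUint16LE(3)));    // 03-00
```

For larger values, `setUint32` with the same little-endian flag would follow the same pattern.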

NodeJS RTF ANSI Find and Replace Words With Special Chars

I have a find and replace script that works no problem when the words don't have any special characters. However, there will be a lot of times where there will be special characters since it's finding names. As of now this is breaking the script.
The script looks for {<some-text>} and attempts to replace the contents (as well as remove the braces).
Example:
text.rtf
Here's a name with special char {Kotouč}
script.ts
import * as fs from "fs";
// Ingest the rtf file.
const content: string = fs.readFileSync("./text.rtf", "utf8");
console.log("content::\n", content);
// The string we are looking to match in file text.
const plainText: string = "{Kotouč}";
// Look for all text that matches the pattern `{TEXT_HERE}`.
const anyMatchPattern: RegExp = /{(.*?)}/gi;
const matches: string[] = content.match(anyMatchPattern) || [];
const matchesLen: number = matches.length;
for (let i: number = 0; i < matchesLen; i++) {
// It correctly identifies the targeted text.
const currMatch: string = matches[i];
const isRtfMetadata: boolean = currMatch.endsWith(";}");
if (isRtfMetadata) {
continue;
}
// Here I need a way to escape `plainText` string so that it matches the source.
console.log("currMatch::", currMatch);
console.log("currMatch === plainText::", currMatch === plainText);
if (currMatch === plainText) {
const newContent: string = content.replace(currMatch, "IT_WORKS!");
console.log("newContent:", newContent);
}
}
output
content::
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Here's a name with special char \{Kotou\uc0\u269 \}.}
currMatch:: {Kotou\uc0\u269 \}
currMatch === plainText:: false
It looks like ANSI escaping, and I've tried using jsesc but that produces a different string, {Kotou\u010D} instead of what the document produces {Kotou\uc0\u269 \}.
How can I dynamically escape the plainText string variable so that it matches what is found in the document?
What I needed was to deepen my knowledge of RTF formatting as well as general text encoding.
The raw RTF text read from the file gives us a few hints:
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600...
This part of the rtf file metadata tells us a few things.
It is using RTF file formatting version 1. The encoding is ANSI, specifically code page 1252 (\ansicpg1252), also known as Windows-1252 or CP-1252, which is:
...a single-byte character encoding of the Latin alphabet
(source)
The valuable piece of information from that is that it is using the Latin alphabet; this will be used later.
Knowing the specific RTF version used, I stumbled upon the RTF 1.5 Spec.
A quick search of that spec for one of the escape sequences I was looking into revealed that \uc0 is an RTF-specific escape control sequence. Knowing that, I was able to parse what I was really after, \u269. Now I knew it was Unicode and had a good hunch that \u269 stood for Unicode character code 269. So I looked that up...
The \u269 (char code 269) shows up on this page to confirm. Now I know the character set and what needs to be done to get the equivalent plain text (unescaped), and there's a basic SO post I used here to get the function started.
Using all this knowledge I was able to piece it together from there. Here's the full corrected script and its output:
script.ts
import * as fs from "fs";
// Match RTF unicode control sequence: http://www.biblioscape.com/rtf15_spec.htm
const unicodeControlReg: RegExp = /\\uc0\\u/g;
// Extracts the unicode character from an escape sequence with handling for rtf.
const matchEscapedChars: RegExp = /\\uc0\\u(\d{2,6})|\\u(\d{2,6})/g;
/**
* Util function to strip junk characters from string for comparison.
* @param {string} str
* @returns {string}
*/
const cleanupRtfStr = (str: string): string => {
return str
.replace(/\s/g, "")
.replace(/\\/g, "");
};
/**
* Detects escaped unicode and looks up the character by that code.
* @param {string} str
* @returns {string}
*/
const unescapeString = (str: string): string => {
const unescaped = str.replace(matchEscapedChars, (cc: string) => {
const stripped: string = cc.replace(unicodeControlReg, "");
const charCode: number = Number(stripped);
// See unicode character codes here:
// https://unicodelookup.com/#latin/11
return String.fromCharCode(charCode);
});
// Whitespace is removed later by cleanupRtfStr, not here.
return unescaped;
};
// Ingest the rtf file.
const content: string = fs.readFileSync("./src/TEST.rtf", "binary");
console.log("content::\n", content);
// The string we are looking to match in file text.
const plainText: string = "{Kotouč}";
// Look for all text that matches the pattern `{TEXT_HERE}`.
const anyMatchPattern: RegExp = /{(.*?)}/gi;
const matches: string[] = content.match(anyMatchPattern) || [];
const matchesLen: number = matches.length;
for (let i: number = 0; i < matchesLen; i++) {
const currMatch: string = matches[i];
const isRtfMetadata: boolean = currMatch.endsWith(";}");
if (isRtfMetadata) {
continue;
}
if (currMatch === plainText) {
const newContent: string = content.replace(currMatch, "IT_WORKS!");
console.log("\n\nnewContent:", newContent);
break;
}
const unescapedMatch: string = unescapeString(currMatch);
const cleanedMatch: string = cleanupRtfStr(unescapedMatch);
if (cleanedMatch === plainText) {
const newContent: string = content.replace(currMatch, "IT_WORKS_UNESCAPED!");
console.log("\n\nnewContent:", newContent);
break;
}
}
output
content::
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Here\'92s a name with special char \{Kotou\uc0\u269 \}}
newContent: {\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Here\'92s a name with special char \IT_WORKS_UNESCAPED!}
Hopefully that helps others who aren't familiar with character encoding/escaping and its uses in RTF-formatted documents!
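The original question asked for the opposite direction: escaping the plain text so that it matches what is in the document. A minimal sketch in JavaScript, assuming the writer emits every non-ASCII character as `\uc0\uN ` (with a trailing space) and escapes braces and backslashes, as in the sample above; other RTF writers may format these sequences differently, and `escapeForRtf` is a hypothetical helper:

```javascript
// Escape non-ASCII characters as RTF "\uc0\uN " sequences. N is a signed
// 16-bit decimal, so UTF-16 code units above 32767 are written as negatives.
function escapeForRtf(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit === 0x5c || unit === 0x7b || unit === 0x7d) {
      out += "\\" + text[i]; // backslash and braces are RTF syntax
    } else if (unit < 0x80) {
      out += text[i]; // plain ASCII passes through unchanged
    } else {
      const signed = unit > 0x7fff ? unit - 0x10000 : unit;
      out += `\\uc0\\u${signed} `;
    }
  }
  return out;
}

console.log(escapeForRtf("{Kotouč}"));
// \{Kotou\uc0\u269 \}
```

With this, the escaped needle could be searched for directly in the raw file content instead of unescaping every match.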

Convert Uint8Array into hex string equivalent in node.js

I am using node.js v4.5. Suppose I have this Uint8Array variable.
var uint8 = new Uint8Array(4);
uint8[0] = 0x1f;
uint8[1] = 0x2f;
uint8[2] = 0x3f;
uint8[3] = 0x4f;
This array can be of any length but let's assume the length is 4.
I would like to have a function that converts uint8 into the equivalent hex string.
var hex_string = convertUint8_to_hexStr(uint8);
//hex_string becomes "1f2f3f4f"
You can use Buffer.from() and subsequently use toString('hex'):
let hex = Buffer.from(uint8).toString('hex');
Another solution:
Base function to convert a uint8 value to hex:
// pad with a leading 0 if < 16
function i2hex(i) {
return ('0' + i.toString(16)).slice(-2);
}
reduce:
uint8.reduce(function(memo, i) {return memo + i2hex(i)}, '');
Or map and join:
Array.from(uint8).map(i2hex).join('');
Buffer.from has multiple overloads.
If it is called with your uint8 directly, it unnecessarily copies the content, because it selects the Buffer.from( <Buffer|Uint8Array> ) version.
You should call the Buffer.from( arrayBuffer[, byteOffset[, length]] ) version, which does not copy and just creates a view of the buffer.
let hex = Buffer.from(uint8.buffer,uint8.byteOffset,uint8.byteLength).toString('hex');
Buffer is Node.js specific.
This is a version that works everywhere:
const uint8 = new Uint8Array(4);
uint8[0] = 0x1f;
uint8[1] = 0x2f;
uint8[2] = 0x3f;
uint8[3] = 0x4f;
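function convertUint8_to_hexStr(uint8) {
// return the joined string; without `return` the function yields undefined
return Array.from(uint8)
.map((i) => i.toString(16).padStart(2, '0'))
.join('');
}
console.log(convertUint8_to_hexStr(uint8)); // "1f2f3f4f"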
