How to compare filenames with difference in special character encoding?


I am working with a system that syncs files between two vendors. The tooling is written in JavaScript and transforms file names before sending them to the destination. I am trying to fix a bug where file names are not compared correctly between the origin and the destination.
The script uses the file name to check whether the file already exists at the destination.
For example:
the following file name contains a special character that is encoded differently at the source and at the destination.
source: Chinchón.jpg // hex code: ó
destination : Chinchón.jpg // hex code: 0xf3
The function that does the transformation is:
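export const normalizeText = (text: string) => text
  .normalize('NFC')               // compose base letters with combining marks
  .replace(/\p{Diacritic}/gu, "") // remove characters with the Unicode Diacritic property
  .replace(/\u{2019}/gu, "'")     // right single quotation mark -> apostrophe
  .replace(/\u{ff1a}/gu, ":")     // fullwidth colon -> colon
  .trim()                         // strip leading/trailing whitespace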
And the comparison happens like this:
const array1 = ['Chinchón.jpg'];
console.log(array1.includes('Chinchón.jpg')); // false
Should I reverse the transformation before comparing? What's the best way to do that?

If I understood your question correctly, normalize both sides with the same function before comparing:
// prepare dictionary
const rawDictionary = ['Chinchón.jpg']
const dictionary = rawDictionary.map(x => normalizeText(x))
...
const rawComparant = 'Chinchón.jpg'
const comparant = normalizeText(rawComparant)
console.log(dictionary.includes(comparant))
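A minimal, self-contained sketch of that idea in plain JavaScript (the type annotation dropped). Because the source hex code did not survive in the question, it assumes the source name uses a decomposed "o" followed by U+0301 COMBINING ACUTE ACCENT while the destination uses the precomposed U+00F3; `sourceName` and `destinationName` are hypothetical names.

```javascript
// Same steps as the question's normalizeText, without the TypeScript annotation.
const normalizeText = (text) => text
  .normalize('NFC')               // compose "o" + U+0301 into U+00F3
  .replace(/\p{Diacritic}/gu, '') // drop characters with the Diacritic property
  .replace(/\u{2019}/gu, "'")     // right single quotation mark -> apostrophe
  .replace(/\u{ff1a}/gu, ':')     // fullwidth colon -> colon
  .trim();

// Assumption: source is decomposed, destination is precomposed.
const sourceName = 'Chincho\u0301n.jpg';
const destinationName = 'Chinch\u00f3n.jpg';

// The raw strings differ code unit by code unit.
console.log(sourceName === destinationName); // false

// Normalize every name on both sides, then compare.
const dictionary = [destinationName].map(normalizeText);
console.log(dictionary.includes(normalizeText(sourceName))); // true
```

Since both sides pass through the same function, there is no need to reverse the transformation; NFC alone is what makes the two spellings of "ó" identical here.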

Related

The output of crypto.createCipheriv with Chinese characters is not correct

When the input contains no Chinese characters, PHP and Node produce the same result.
But when it does contain Chinese characters, the PHP output is correct and the Node output is not:
const crypto = require('crypto');
function encodeDesECB(textToEncode, keyString) {
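var key = Buffer.from(keyString.substring(0, 8), 'utf8'); // new Buffer() is deprecated
// On Node.js builds linked against OpenSSL 3, 'des-ecb' may require --openssl-legacy-provider
var cipher = crypto.createCipheriv('des-ecb', key, '');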
cipher.setAutoPadding(true);
var c = cipher.update(textToEncode, 'utf8', 'base64');
c += cipher.final('base64');
return c;
}
console.log(encodeDesECB(`{"key":"test"}`, 'MIGfMA0G'))
console.log(encodeDesECB(`{"key":"测试"}`, 'MIGfMA0G'))
Node output:
6RQdIBxccCUFE+cXPODJzg==
6RQdIBxccCWXTmivfit9AOfoJRziuDf4
PHP output:
6RQdIBxccCUFE+cXPODJzg==
6RQdIBxccCXFCRVbubGaolfSr4q5iUgw

The problem is not the encryption but a different JSON serialization of the plaintext.
In the PHP code, json_encode() converts the characters to Unicode escape sequences, i.e. the encoding returns {"key":"\u6d4b\u8bd5"}. In the Node.js code, however, {"key":"测试"} is encrypted as is.
This means that different plaintexts are encrypted in the end. To get the same ciphertext, the plaintexts must be byte-for-byte identical.
If Unicode escape sequences are to be used in the Node.js code (as in the PHP code), the string has to be converted first. The jsesc package can do this:
const jsesc = require('jsesc');
...
console.log(encodeDesECB(jsesc(`{"key":"测试"}`, {'lowercaseHex': true}), 'MIGfMA0G')); // 6RQdIBxccCXFCRVbubGaolfSr4q5iUgw
now returns the result of the posted PHP code.
Conversely, if the Unicode characters are to be left unescaped in the PHP code (as in the Node.js code), set the JSON_UNESCAPED_UNICODE flag in json_encode():
$data = json_encode($data, JSON_UNESCAPED_UNICODE); // 6RQdIBxccCWXTmivfit9AOfoJRziuDf4
now returns the result of the posted NodeJS code.
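To see the plaintext difference directly, the sketch below (Node.js only, no DES involved) compares the UTF-8 bytes of the two JSON strings. It also shows a hypothetical dependency-free helper, `escapeNonAscii`, that rewrites every non-ASCII UTF-16 code unit as a lowercase \uXXXX escape, which matches what the answer says PHP's json_encode produces for these characters. It is an assumption-labeled alternative to jsesc, not a full reimplementation of json_encode (for example, PHP also escapes "/" as "\/" by default, which this helper does not).

```javascript
// The plaintext bytes that actually reach the cipher.
const raw = '{"key":"测试"}';
const phpStyle = '{"key":"\\u6d4b\\u8bd5"}'; // json_encode's default output per the answer

console.log(Buffer.from(raw, 'utf8').length);      // 16 (each CJK char is 3 bytes)
console.log(Buffer.from(phpStyle, 'utf8').length); // 22 (all ASCII)

// Hypothetical helper: escape each UTF-16 code unit above 0x7F as \uXXXX (lowercase hex).
// Characters outside the BMP come out as two surrogate escapes, because the regex
// (without the u flag) matches individual code units.
function escapeNonAscii(str) {
  return str.replace(/[\u0080-\uffff]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

console.log(escapeNonAscii(raw) === phpStyle); // true
```

Passing the escaped string to the question's encodeDesECB, in place of the jsesc call, should then encrypt the same bytes as the PHP code.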

Node URI regex not capturing capture groups

I know there are a billion regex questions on Stack Overflow, but I can't understand why my URI matcher isn't working in Node.
I have the following:
var uri = "file:tmp.db?mode=ro"
function parseuri2db(uri){
var regex = new RegExp("(?:file:)(.*)(?:\\?.*)");
let dbname = uri.match(regex)
return dbname
}
I'm trying to extract only the database name, which I expect to appear after a non-capturing file: group and before an optional ? followed by parameters up to the end of the string.
I'm currently using:
var regex1 = new RegExp("(?:file:)(.*)(?:\\?.*)");
but I thought the correct pattern was more like:
var regex2 = new RegExp("(?:file:)(.*)(?:\\??.*)");
with a 0-or-1 ? quantifier on the \\? literal. But the latter fails.
Anyway, my result is:
console.log(parseuri2db(conf.db_in.filename))
[ 'file:tmp.db?mode=ro',
'tmp.db',
index: 0,
input: 'file:tmp.db?mode=ro' ]
This seems to capture the whole string in the first element, rather than just the single capture group I asked for.
My questions are:
What am I doing wrong that I'm getting multiple captures?
How can I rephrase this to capture my capture groups with names?
I expected something like the following to work for (2):
function parseuri2db(uri){
// var regex = new RegExp("(?:file:)(.*)(?:\\?.*)");
// let dbname = uri.match(regex)
var regex = new RegExp("(?<protocol>file:)(?<fname>.*)(<params>\\?.*)");
let [, protocol, fname, params] = uri.match(regex)
return fname
}
console.log(parseuri2db(conf.db_in.filename))
But:
SyntaxError: Invalid regular expression: /(?<protocol>file:)(?<fname>.*)(<params>\?.*)/: Invalid group
Update 1
The answer to my first question is that I needed to stop the capture group from also capturing the ? literal, by matching [^?]* instead of .*:
"(?:file:)([^?]*)(?:\\??.*)"

The regular expression engine in that version of Node.js does not support named capture groups.
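For reference, String.prototype.match with a non-global regular expression returns the whole match at index 0 followed by one element per capture group, which is why 'file:tmp.db?mode=ro' appears first. On a Node.js version that does support named groups (they are an ES2018 feature; this sketch assumes Node.js 10 or later), the named-group version might look like the following. `parseUri2Db` is a hypothetical name, and the pattern also adds the ? that is missing from (<params>...) in the question.

```javascript
// Assumes Node.js 10+ (named capture groups).
function parseUri2Db(uri) {
  // "file:" prefix, then everything up to an optional "?", then optional params.
  const regex = /^(?<protocol>file:)(?<fname>[^?]*)(?<params>\?.*)?$/;
  const match = uri.match(regex);
  if (match === null) {
    return null; // not a file: URI
  }
  // match.groups holds the named captures; an unmatched optional group is undefined.
  const { protocol, fname, params = '' } = match.groups;
  return { protocol, fname, params };
}

console.log(parseUri2Db('file:tmp.db?mode=ro'));
// { protocol: 'file:', fname: 'tmp.db', params: '?mode=ro' }
console.log(parseUri2Db('file:tmp.db'));
// { protocol: 'file:', fname: 'tmp.db', params: '' }
```

Using [^?]* rather than .* keeps the greedy group from swallowing the query string, the same fix as in Update 1.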

Node.js: How to know real filename which matches an internal filename in different case on Windows

My Node.js program wants to read the contents of the file "test.txt" on a Windows machine. It checks with fs.existsSync() that the file exists and reads its content. But now I want the program instead to give an error or warning if the name of the file on disk is actually "TEST.txt" or any other name which differs in case from the name my program is looking for, e.g. "test.txt".
Is there a straightforward way to figure out that even though existsSync() tells me a file exists, the file on disk has a name which differs in case from the file-name I am using to look for it?

You can use fs.readdir to get a list of all files in the directory and then check whether the file name matches exactly, including case.
var fs = require('fs');
var path = __dirname;
var filename = 'test.txt';
var files = fs.readdirSync(path);
var exists = files.includes(filename);
// true if file on disk is "test.txt",
// false if file on disk is "TEST.txt"
console.log(exists);

How to generate a base64 encoded, SHA-512 hash in Appcelerator?

I have been trying this for two days without success. We are using Appcelerator 5.1.0.
I'm able to hash a string using the Securely module. However, the resulting string is in hex format, and I need it as a base64-encoded string.
I tried the Ti.Utils.base64encode function, but the result doesn't match what the backend generates. Here's my code snippet:
function convertHexToBase64(hexStr){
console.log("hex: "+hexStr);
var hexArray = hexStr
.replace(/\r|\n/g, "")
.replace(/([\da-fA-F]{2}) ?/g, "0x$1 ")
.replace(/ +$/, "")
.split(" ");
var byteString = String.fromCharCode.apply(null, hexArray);
var base64String = Ti.Utils.base64encode(byteString).toString();
console.log("base64 string:"+base64String);
return base64String;
}
I looked for other modules to use, and Node's Buffer is the closest I can get, but I'm not sure how to use a Node class in Appcelerator.
Can anyone shed some light on this? Thanks.

I finally did it with the help of Forge. Here are the steps for future reference:
Create a folder under the lib folder and name it forge.
Install the module on the local machine (via npm), then copy the whole contents of its js folder into the forge folder.
In the code, create the object:
var forge = require('forge/forge');
Hash the string first to get a buffer object, then encode it as a base64 string:
var md = forge.md.sha512.create();
md.update(saltedText);
var buffer = md.digest();
var result = forge.util.encode64(buffer.getBytes());
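If pulling in Forge is not an option, the hex digest from Securely can also be converted without Node's Buffer: decode each pair of hex digits into a byte and base64-encode those bytes directly, instead of building a character string and relying on how Ti.Utils.base64encode treats it. The sketch below is plain JavaScript; `hexToBase64` is a hypothetical name, it assumes an even-length hex string with no whitespace, and it has not been tested inside Appcelerator. On a Node.js backend, the value to compare against would be crypto.createHash('sha512').update(text).digest('base64').

```javascript
// Plain JavaScript, no Buffer: decode hex to bytes, then base64-encode the bytes.
var BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function hexToBase64(hex) {
  var bytes = [];
  for (var i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }
  var out = '';
  for (var j = 0; j < bytes.length; j += 3) {
    var hasB1 = j + 1 < bytes.length;
    var hasB2 = j + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit number (missing bytes count as 0).
    var n = (bytes[j] << 16) | ((hasB1 ? bytes[j + 1] : 0) << 8) | (hasB2 ? bytes[j + 2] : 0);
    out += BASE64_ALPHABET.charAt((n >> 18) & 63);
    out += BASE64_ALPHABET.charAt((n >> 12) & 63);
    out += hasB1 ? BASE64_ALPHABET.charAt((n >> 6) & 63) : '='; // pad if 1 byte left
    out += hasB2 ? BASE64_ALPHABET.charAt(n & 63) : '=';        // pad if 1-2 bytes left
  }
  return out;
}

console.log(hexToBase64('4d616e')); // TWFu (the bytes of "Man")
```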

NodeSchool IO Exercise 3

I've started learning Node.js.
I'm currently on exercise 3, where we have to count the number of newline characters ("\n") in a file buffer.
My solution passes the tester, but when I create my own file, file.txt, I can get the buffer and print the string, yet the newline count (console.log(numNewLines)) is 0.
Here is the code:
//import file system module
var fs = require("fs");
//get the buffer object based on argv[2]
var buf = fs.readFileSync(process.argv[2]);
//convert buffer to string
var str_buff = buf.toString();
//length of str_buff
var str_length = str_buff.length;
var numNewLines = 0;
for (var i = 0; i < str_length; i ++)
{
if(str_buff.charAt(i) == '\n')
{
numNewLines++;
}
}
console.log(numNewLines);

If I understand your question correctly, you are trying to count the lines of the current file.
From the documentation:
The first element will be 'node', the second element will be the name of the JavaScript file.
So you should replace process.argv[2] with process.argv[1].
Edit:
If you are passing a file name as a command-line parameter, like:
node server.js test.txt
your code should work without any problem.

Your code is fine. You should check the file that you are using for the input.
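For comparison, a shorter sketch that counts "\n" the same way: splitting on "\n" yields one more piece than there are newline characters. Note that a file holding a single line with no trailing newline contains no "\n" at all, so 0 is the correct count for it; that is worth checking in a hand-made file.txt. `countNewLines` is a hypothetical name.

```javascript
const fs = require('fs');

// Number of '\n' characters in a string: pieces after splitting, minus one.
function countNewLines(text) {
  return text.split('\n').length - 1;
}

// Same input as the exercise: a file path passed as the first command-line argument.
if (process.argv[2]) {
  const contents = fs.readFileSync(process.argv[2], 'utf8');
  console.log(countNewLines(contents));
}
```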

Develop Reference
