node base64 decode incorrect in windows10

Run the following code with Node.js v16 or v18 on Windows 10 (10.0.19044):
const content = 'c21hcnQgc2l6aW5nOmk6MQ1hZG1pbmlzdHJhdGl2ZSBzZXNzaW9uOmk6MA1zY3JlZW4gbW9kZSBpZDppOjENZGVza3RvcHdpZHRoOmk6MjE5NA1kZXNrdG9waGVpZ2h0Omk6MTIzNA1mdWxsIGFkZHJlc3M6czoxMjcuMC4wLjE6OTQ3Nw1yZWRpcmVjdGNsaXBib2FyZDppOjANYXV0b3JlY29ubmVjdGlvbiBlbmFibGVkOmk6MQ1kcml2ZXN0b3JlZGlyZWN0OnM6RHluYW1pY0RyaXZlcw1jb25uZWN0aW9uIHR5cGU6aToxDWF1dGhlbnRpY2F0aW9uIGxldmVsOmk6MA1hbGxvdyBmb250IHNtb290aGluZzppOjE='
console.log(
  Buffer.from(content, 'base64').toString()
)
// output: allow font smoothing:i:1icDrives
But the correct result should be:
window.atob('c21hcnQgc2l6aW5nOmk6MQ1hZG1pbmlzdHJhdGl2ZSBzZXNzaW9uOmk6MA1zY3JlZW4gbW9kZSBpZDppOjENZGVza3RvcHdpZHRoOmk6MjE5NA1kZXNrdG9waGVpZ2h0Omk6MTIzNA1mdWxsIGFkZHJlc3M6czoxMjcuMC4wLjE6OTQ3Nw1yZWRpcmVjdGNsaXBib2FyZDppOjANYXV0b3JlY29ubmVjdGlvbiBlbmFibGVkOmk6MQ1kcml2ZXN0b3JlZGlyZWN0OnM6RHluYW1pY0RyaXZlcw1jb25uZWN0aW9uIHR5cGU6aToxDWF1dGhlbnRpY2F0aW9uIGxldmVsOmk6MA1hbGxvdyBmb250IHNtb290aGluZzppOjE=')
// output: smart sizing:i:1\radministrative session:i:0\rscreen mode id:i:1\rdesktopwidth:i:2194\rdesktopheight:i:1234\rfull address:s:127.0.0.1:9477\rredirectclipboard:i:0\rautoreconnection enabled:i:1\rdrivestoredirect:s:DynamicDrives\rconnection type:i:1\rauthentication level:i:0\rallow font smoothing:i:1

The string is decoded correctly: if you assign the value to a variable and inspect it, you will see the output you expect. The issue is what gets displayed in the console. The string contains control characters, and the terminal interprets them instead of printing them.
The problem is \r, the carriage return character. It moves the cursor back to the start of the line, so each segment is printed over the previous one, and only the last segment (plus the tail of any longer earlier segment) stays visible.
You need to escape these characters for the string to display in full. There are a couple of ways to do this:
Using JSON.stringify as a quick way to escape the full string:
const content = "c21hcnQgc2l6aW5nOmk6MQ1hZG1pbmlzdHJhdGl2ZSBzZXNzaW9uOmk6MA1zY3JlZW4gbW9kZSBpZDppOjENZGVza3RvcHdpZHRoOmk6MjE5NA1kZXNrdG9waGVpZ2h0Omk6MTIzNA1mdWxsIGFkZHJlc3M6czoxMjcuMC4wLjE6OTQ3Nw1yZWRpcmVjdGNsaXBib2FyZDppOjANYXV0b3JlY29ubmVjdGlvbiBlbmFibGVkOmk6MQ1kcml2ZXN0b3JlZGlyZWN0OnM6RHluYW1pY0RyaXZlcw1jb25uZWN0aW9uIHR5cGU6aToxDWF1dGhlbnRpY2F0aW9uIGxldmVsOmk6MA1hbGxvdyBmb250IHNtb290aGluZzppOjE="
console.log(JSON.stringify(Buffer.from(content, 'base64').toString()))
//output: "smart sizing:i:1\radministrative session:i:0\rscreen mode id:i:1\rdesktopwidth:i:2194\rdesktopheight:i:1234\rfull address:s:127.0.0.1:9477\rredirectclipboard:i:0\rautoreconnection enabled:i:1\rdrivestoredirect:s:DynamicDrives\rconnection type:i:1\rauthentication level:i:0\rallow font smoothing:i:1"
Or, replacing the special characters using String.prototype.replaceAll():
const content = "c21hcnQgc2l6aW5nOmk6MQ1hZG1pbmlzdHJhdGl2ZSBzZXNzaW9uOmk6MA1zY3JlZW4gbW9kZSBpZDppOjENZGVza3RvcHdpZHRoOmk6MjE5NA1kZXNrdG9waGVpZ2h0Omk6MTIzNA1mdWxsIGFkZHJlc3M6czoxMjcuMC4wLjE6OTQ3Nw1yZWRpcmVjdGNsaXBib2FyZDppOjANYXV0b3JlY29ubmVjdGlvbiBlbmFibGVkOmk6MQ1kcml2ZXN0b3JlZGlyZWN0OnM6RHluYW1pY0RyaXZlcw1jb25uZWN0aW9uIHR5cGU6aToxDWF1dGhlbnRpY2F0aW9uIGxldmVsOmk6MA1hbGxvdyBmb250IHNtb290aGluZzppOjE="
console.log(Buffer.from(content, 'base64').toString().replaceAll('\r','\\r'))
//output: smart sizing:i:1\radministrative session:i:0\rscreen mode id:i:1\rdesktopwidth:i:2194\rdesktopheight:i:1234\rfull address:s:127.0.0.1:9477\rredirectclipboard:i:0\rautoreconnection enabled:i:1\rdrivestoredirect:s:DynamicDrives\rconnection type:i:1\rauthentication level:i:0\rallow font smoothing:i:1
Same as above but using RegEx:
const content = "c21hcnQgc2l6aW5nOmk6MQ1hZG1pbmlzdHJhdGl2ZSBzZXNzaW9uOmk6MA1zY3JlZW4gbW9kZSBpZDppOjENZGVza3RvcHdpZHRoOmk6MjE5NA1kZXNrdG9waGVpZ2h0Omk6MTIzNA1mdWxsIGFkZHJlc3M6czoxMjcuMC4wLjE6OTQ3Nw1yZWRpcmVjdGNsaXBib2FyZDppOjANYXV0b3JlY29ubmVjdGlvbiBlbmFibGVkOmk6MQ1kcml2ZXN0b3JlZGlyZWN0OnM6RHluYW1pY0RyaXZlcw1jb25uZWN0aW9uIHR5cGU6aToxDWF1dGhlbnRpY2F0aW9uIGxldmVsOmk6MA1hbGxvdyBmb250IHNtb290aGluZzppOjE="
console.log(Buffer.from(content, 'base64').toString().replace(/\r/g, '\\r'))
//output: smart sizing:i:1\radministrative session:i:0\rscreen mode id:i:1\rdesktopwidth:i:2194\rdesktopheight:i:1234\rfull address:s:127.0.0.1:9477\rredirectclipboard:i:0\rautoreconnection enabled:i:1\rdrivestoredirect:s:DynamicDrives\rconnection type:i:1\rauthentication level:i:0\rallow font smoothing:i:1
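To see where the odd `allow font smoothing:i:1icDrives` output comes from, here is a minimal sketch that mimics how a terminal handles a carriage return. `renderWithCarriageReturns` is a hypothetical helper written only for this illustration, and the input is shortened to the two segments that matter (the longest one and the last one).

```javascript
// Hypothetical helper: mimics a terminal that treats "\r" as
// "move the cursor back to column 0 without clearing the line".
function renderWithCarriageReturns(text) {
  const line = [];
  let column = 0;
  for (const ch of text) {
    if (ch === '\r') {
      column = 0; // cursor returns to the start; characters already shown stay put
    } else {
      line[column] = ch; // overwrite whatever was at this column
      column += 1;
    }
  }
  return line.join('');
}

// Shortened stand-in for the decoded string: a 32-character segment
// followed by a 24-character segment.
const decoded = 'drivestoredirect:s:DynamicDrives\rallow font smoothing:i:1';

console.log(renderWithCarriageReturns(decoded));
// prints: allow font smoothing:i:1icDrives
```

The last segment overwrites the first 24 columns, and the final 8 characters of the longer segment (`icDrives`) are left behind, which matches what the console showed.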

Related

Unescape encoded string in Node (\x##)

I have the following encoded string in Node:
const test = '\x50\x77\x6b\x6d\x77\x37\x54\x43\x6f\x51\x3d\x3d'
I want to get its unencoded value, Pwkmw7TCoQ==
How can I achieve this?

Use the .toString() method. It should work.
test.toString()

Nothing to do there. Just print the string to the console as it is.
const test = '\x50\x77\x6b\x6d\x77\x37\x54\x43\x6f\x51\x3d\x3d';
console.log(test);
'\x50\x77\x6b\x6d\x77\x37\x54\x43\x6f\x51\x3d\x3d' and 'Pwkmw7TCoQ==' are different notations for the same value.
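A quick check of that claim, plus a sketch for a different situation that the question does not describe but that often causes the confusion: the escapes arriving as literal backslash text (for example, read from a file). `unescapeHex` is a hypothetical helper, not a built-in.

```javascript
// In source code, '\x50' is just another way to write 'P'.
const test = '\x50\x77\x6b\x6d\x77\x37\x54\x43\x6f\x51\x3d\x3d';
console.log(test === 'Pwkmw7TCoQ=='); // true

// Assumption: the text really contains backslashes. String.raw keeps them,
// so this string is 48 characters long: "\", "x", "5", "0", ...
const rawText = String.raw`\x50\x77\x6b\x6d\x77\x37\x54\x43\x6f\x51\x3d\x3d`;

// Hypothetical helper: replace each \xHH sequence with the character it names.
function unescapeHex(text) {
  return text.replace(/\\x([0-9a-fA-F]{2})/g, (_match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(unescapeHex(rawText)); // Pwkmw7TCoQ==
```

Only the second case needs any conversion; a string literal written with \x escapes is already the plain value.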

Fails to parse Hebrew text from pdf using iText 7 with .net

I am trying to read a PDF file with several pages, using iText 7 on .NET Core 2.1.
The following is my code:
Rectangle rect = new Rectangle(0, 0, 1100, 1100);
LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
inputStr = PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy);
inputStr gets the following string:
"\u0011\v\u000e\u0012\u0011\v\f)(*).=*%'\f*).5?5.5*.\a \u0011\u0002\u001b\u0001!\u0016\u0012\u001a!\u0001\u0015\u001a \u0014\n\u0015\u0017\u0001(\u001b)\u0001)\u0016\u001c*\u0012\u0001\u001d\u001a \u0016* \u0015\u0001\u0017\u0016\u001b\u001a(\n,\u0002>&\u00...
and in the Text Visualizer, it looks like this:
)(*).=*%'*).5?5.5*. !!
())* * (
,>&2*06) 2.-=9 )=&,

2..*0.5<.?
.110
)<1,3
  2.3*1>?)10/6
 (& >(*,1=0>>*1?

  2.63)&*,..*0.5
  206)&13'?*9*<
  *-5=0>
?*&..,?)..*0.5
It looks like I am unable to resolve the encoding, or there is a specific, custom encoding at the PDF level that I cannot read or parse.
Looking at the Document Properties, under Fonts it says the following:
Any ideas how can I parse the document correctly?
Thank you
Yaniv

Analysis of the shared files
file1_copyPasteWorks.pdf
The font definitions here have an invalid ToUnicode entry:
/ToUnicode/Identity-H
The ToUnicode value is specified as
A stream containing a CMap file that maps character codes to Unicode values
(ISO 32000-2, Table 119 — Entries in a Type 0 font dictionary)
Identity-H is a name, not a stream.
Nonetheless, Adobe Reader interprets this name, and for apparently any name starting with Identity- assumes the text encoding for the font to be UCS-2 (essentially UTF-16). As this indeed is the case for the character codes used in the document, copy&paste works, even if for the wrong reasons. (Without this ToUnicode value, Adobe Reader also returns nonsense.)
iText 7, on the other hand, for mapping to Unicode first follows the Encoding value with unexpected results.
Thus, in this case Adobe Reader arrives at a better result by interpreting meaning into an invalid piece of data (and without that also returns nonsense).
file2_copyPasteFails.pdf
The font definitions here have valid but incomplete ToUnicode maps which only contain entries for the used Western European characters but not for Hebrew ones. They don't have Encoding entries.
Both Adobe Reader and iText 7 here trust the ToUnicode map and, therefore, cannot map the Hebrew glyphs.
How to parse
file1_copyPasteWorks.pdf
In case of this file the "problem" is that iText 7 applies the Encoding map. Thus, for decoding the text one can temporarily replace the Encoding map with an identity map:
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
{
    PdfPage page = pdfDocument.GetPage(i);
    PdfDictionary fontResources = page.GetResources().GetResource(PdfName.Font);
    foreach (PdfObject font in fontResources.Values(true))
    {
        if (font is PdfDictionary fontDict)
            fontDict.Put(PdfName.Encoding, PdfName.IdentityH);
    }
    string output = PdfTextExtractor.GetTextFromPage(page);
    // ... process output ...
}
This code shows the Hebrew characters for your file 1.
file2_copyPasteFails.pdf
Here I don't have a quick work-around. You may want to analyze multiple PDFs of that kind. If they all encode the Hebrew characters the same way, you can create your own ToUnicode map from that and inject it into the fonts like above.

Converting stream to string in node.js

I am reading a file that arrives as an attachment, as follows:
let content = fs.readFileSync(attachmentNames[index], {encoding: 'utf8'});
When I inspect content it looks fine and I can see the file contents, but when I try to assign it to another variable
attachmentXML = builder.create('ATTACHMENT','','',{headless:true})
    .ele('FILECONTENT',content).up()
I get the following error:
Error: Invalid character in string: PK
In the actual message, PK is followed by a couple of rectangular boxes (special characters) that are not displayed here.
builder here refers to an instance of the xmlbuilder https://www.npmjs.com/package/xmlbuilder node module.

I fixed this by passing the string through the JavaScript escape() function.
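A note on a possible alternative, which is not what the poster did: content that starts with PK is typically a ZIP-based file (the ZIP signature is the bytes `PK\x03\x04`), so the attachment is probably binary rather than UTF-8 text. Assuming that is the case here, a common approach is to keep the bytes in a Buffer and embed them as Base64. The sketch below uses a small hand-made Buffer as a stand-in for the file, so it runs without any attachment on disk.

```javascript
// Stand-in for fs.readFileSync(path) called WITHOUT an encoding option,
// which returns a Buffer. These bytes begin with the ZIP signature "PK\x03\x04".
const fileBytes = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);

// Base64 uses only A-Z, a-z, 0-9, "+", "/" and "=", all of which are
// legal characters in XML text, unlike the raw control bytes.
const encoded = fileBytes.toString('base64');
console.log(encoded); // UEsDBBQA

// Whoever consumes the XML reverses the step to recover the original bytes.
const restored = Buffer.from(encoded, 'base64');
console.log(restored.equals(fileBytes)); // true
```

The `encoded` string could then be passed where `content` is used in the question's `.ele('FILECONTENT', content)` call; whether the receiving side expects Base64 is an assumption that would need to be checked.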

golang remove characters (used for readability) in const string at compile time (spaces, \n and \t)

Spaces and line breaks are useful for laying out URLs and SQL queries to make them more readable.
Is there a way to remove characters from a const string at compile time in Go?
Example (runtime version):
const url = `https://example.com/path?
attr1=test
&attr2=test
`
// this is the code to be replaced
urlTrim := strings.Replace(
	strings.Replace(url, "\n", "", -1),
	" ", "", -1,
)

Constant expressions cannot contain function calls (except a few built-in functions). So what you want cannot be done using a raw string literal.
If your goal with using multiple lines is just for readability, simply use multiple literals and concatenate them:
const url = "https://example.com/path?" +
"attr1=test" +
"&attr2=test"
Try it on the Go Playground.
See related question: Initialize const variable

nodejs skipping single quote from json key in output

I see a very weird problem when using JSON in Node.js: the single quotes are dropped from the revision key. I want to pass this JSON as input to the Node request module, and since the single quotes are missing from the 'revision' key, I believe it is not accepted as valid JSON input. Could someone help me retain them so that I can use it? I have made multiple attempts but have not been able to get it right.
What did I try?
jsondata = {
    'splits': {
        'os-name': 'ubuntu',
        'platform-version': 'os',
        'traffic-percent': 100,
        'revision': 'master'
    }
}
console.log(jsondata)
Expected:
{ splits:
   { 'os-name': 'ubuntu',
     'platform-version': 'os',
     'traffic-percent': 100,
     'revision': 'master'
   }
}
But in the actual output the single quotes are missing from the revision key:
{ splits:
   { 'os-name': 'ubuntu',
     'platform-version': 'os',
     'traffic-percent': 100,
     revision: 'master'
   }
}
Run 2: I tried the code below, which produces the same result.
data = JSON.stringify(jsondata)
result = JSON.parse(data)
console.log(result)
Run 3: I built the object another way.
jsondata = {}
temp = {}
splits = []
temp['revision'] = 'master'
temp['os-name'] = 'ubuntu'
temp['platform-version'] = 'os'
temp['traffic-percent'] = 100
splits.push(temp)
jsondata['splits'] = splits
console.log(jsondata)
Run 4: I tried replacing the single quotes with double quotes.
Run 5: I changed the position of the revision line.

This is what is supposed to happen. The quotes are kept only if the object key is not a valid JavaScript identifier. In your example, 'splits' and 'revision' don't have a dash in their names, so they are the only keys printed without quotes.
You shouldn't receive any error when using this object. If you do, update this post with the scenario and the error.

You should note that JSON and JavaScript are not the same things.
JSON is a text format in which all keys and all string values are surrounded by double quotes ("key" and "value"). A JSON string is produced by JSON.stringify, and is required by JSON.parse.
A JavaScript object has very similar syntax to the JSON file format, but is more flexible - the values can be surrounded by double quotes or single quotes, and the keys can have no quotes at all as long as they are valid JavaScript identifiers. If the keys have spaces, dashes, or other non-valid characters, then they need to be surrounded by single quotes or double quotes.
If you need your string to be valid JSON, generate it with JSON.stringify. If it's OK for it to be just valid JavaScript, then it's already fine - it does not matter whether the quotes are there or not.
If, for some reason, you need some imaginary third option (perhaps you are interacting with an API where someone has written their own custom string parser, and they are demanding that all keys are surrounded by single quotes?) you will probably need to write your own little string generator.
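A small sketch of the distinction described above, using the object from the question. It shows that the quotes in the source make no difference to the key itself, and that JSON.stringify always emits double-quoted keys regardless of how console.log chooses to display the object.

```javascript
const jsondata = {
  'splits': {
    'os-name': 'ubuntu',
    'platform-version': 'os',
    'traffic-percent': 100,
    'revision': 'master'
  }
};

// Quoted or not in the source, the key is the same string.
console.log('revision' in jsondata.splits); // true
console.log(jsondata.splits.revision);      // master

// console.log(jsondata) shows an inspection view, not JSON.
// For real JSON text, serialize explicitly:
const asJson = JSON.stringify(jsondata);
console.log(asJson);
// {"splits":{"os-name":"ubuntu","platform-version":"os","traffic-percent":100,"revision":"master"}}

// Parsing the JSON text gives back an equivalent object.
console.log(JSON.parse(asJson).splits.revision); // master
```

If a library expects a JSON string, pass `asJson`; if it expects a plain object, pass `jsondata` directly.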
