I'm making a GET request in Node.js to a URL which returns an object I'm trying to parse. However, I am getting an "unexpected token" error. I have played with different encodings, as well as converting the response body into a string and then removing those tokens, but nothing works. Setting the encoding to null also did not solve my problem.
Below is the response body I'm getting:
��[{"unit":"EN15","BOX":"150027","CD":"12 - Natural Gas Leak","Levl":"1","StrName":"1000 N Madison Ave","IncNo":"2020102317","Address":"1036 N Madison Ave","CrossSt":"W 5TH ST/NECHES ST"},{"unit":"EN23","BOX":"230004","CD":"44 - Welfare Check","Levl":"1","StrName":"S Lancaster Rd / E Overton Rd","IncNo":"2020102314","Address":"S Lancaster Rd / E Overton Rd","CrossSt":""}]
These are the headers that go with my request:
headers: {'Content-Type': 'text/plain; charset=utf-8'}
Here is how I'm parsing the response body:
const data = JSON.parse(response.body)
Any help would be greatly appreciated!
UPDATE: I had to do this with the response body to make it work:
// strips the two leading U+FFFD replacement characters, then any NUL bytes
const data = response.body.replace(/^\uFFFD/, '').replace(/^\uFFFD/, '').replace(/\0/g, '')
You are probably getting the byte order mark (BOM) of the UTF-8 string.
The simplest workaround would be to remove it before parsing:
const data = JSON.parse(response.body.toString('utf8').replace(/^\uFFFD/, ''));
Update: your first 2 characters are the Unicode REPLACEMENT CHARACTER (U+FFFD). In order to remove them, match the \uFFFD character as above.
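A more robust alternative is to request the raw bytes (e.g. encoding: null, so response.body is a Buffer) and strip an actual BOM before decoding. This is only a sketch under that assumption; the helper name is mine, but the BOM byte signatures are standard:

function decodeStrippingBOM(buf) {
  if (buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) {
    return buf.slice(3).toString('utf8'); // UTF-8 BOM
  }
  if (buf[0] === 0xFF && buf[1] === 0xFE) {
    // UTF-16 LE BOM; decoding these two bytes as UTF-8 is what produces two U+FFFDs
    return buf.slice(2).toString('utf16le');
  }
  return buf.toString('utf8');
}

const data = JSON.parse(decodeStrippingBOM(response.body));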
Related
I'm receiving Polish text from a SOAP action that has the Polish diacritics encoded as XML entities, but as far as I can tell, they are encoded as ISO-8859-1 rather than UTF-8, and I'm struggling to decode them properly in Node.js.
Example text: Borek Fa&#197;&#130;&#196;&#153;cki
Expected decoding result: Borek Fałęcki
Current result: Borek FaÅÄcki
While I achieved the correct result in PHP using following code:
echo html_entity_decode('Borek Fa&#197;&#130;&#196;&#153;cki', ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1, 'ISO-8859-1');
I'm having no luck doing the same in Node.js. There aren't many complete packages for decoding HTML/XML entities; I have used both entities and html-entities, but they produce the same results, and neither seems to have any charset settings.
const { decode, encode } = require('html-entities');
const entities = require('entities');
const txt = 'Borek Fa&#197;&#130;&#196;&#153;cki';
console.log('html-entities decode', decode(txt));
console.log('utf8-encoding', encode('Borek Fałęcki', {
mode: 'nonAsciiPrintable',
numeric: 'decimal',
level: 'xml',
}));
console.log('entities decode', entities.decodeXML(txt));
Output:
html-entities decode Borek FaÅÄcki
utf8-encoding Borek Fa&#322;&#281;cki
entities decode Borek FaÅÄcki
As we can see, when encoded with UTF-8 there are single entities for each character:
&#322; = ł
&#281; = ę
With ISO-8859-1, there are two entities per character. I have run out of ideas for how to achieve the same decoding result as in PHP. If there were no entities, I could just convert the encoding to UTF-8, but with entities I have no idea how to do it properly. I cannot get the other side to send UTF-8, since this is a closed old protocol that I have no control over.
The correct XML encoding of Borek Fałęcki is Borek Fa&#322;&#281;cki. The SOAP action XML that you receive is wrongly encoded.
However, the following expression converts it as needed:
Buffer.concat(
  "Borek Fa&#197;&#130;&#196;&#153;cki"
    .match(/[^&]+|&#\d+;/g)   // split into plain runs and numeric entities
    .map(c => c[0] === "&"
      ? Buffer.of(Number(c.substring(2, c.length - 1))) // entity -> one raw byte
      : Buffer.from(c))       // plain text -> its UTF-8 bytes
).toString()                  // reassemble the bytes and decode them as UTF-8
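Wrapped as a reusable helper (the function name is mine, and this assumes every numeric entity in the payload encodes a single raw byte, as in your samples):

function decodeEntityBytes(s) {
  return Buffer.concat(
    s.match(/[^&]+|&#\d+;/g).map(c => c[0] === "&"
      ? Buffer.of(Number(c.substring(2, c.length - 1)))
      : Buffer.from(c))
  ).toString();
}

console.log(decodeEntityBytes("Borek Fa&#197;&#130;&#196;&#153;cki")); // Borek Fałęcki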
I have a problem verifying a string created by crypto.createHmac with Node.js.
I ran some tests, first in PHP, where everything is OK, but I can't find the correct way to do this in Node.js.
PHP CODE:
$jsonData = '"face_url":"https:\/\/"';
echo($jsonData);
echo("\n");
$client_secret = 'kqm6FksaIT';
echo hash_hmac("sha256", $jsonData, $client_secret);
Result:
"face_url":"https:\/\/"
34a4eb09a639c9b80713158ae89e7e8311586e6e6d76e09967f4e42a24759b3e
With Node.js, I have a problem with the interpretation of the string:
var crypto = require('crypto');
var str = '"face_url":"https:\/\/"';
console.log(str);
//OK
var buf1 = crypto.createHmac('sha256','kqm6FksaIT').update(str);
var v = buf1.digest('hex');
console.log(v);
//END
RESULT:
"face_url":"https://"
eb502c4711a6d926eeec7830ff34e021ed62c91e574f383f6534fdd30857a907
=> FAIL.
As you can see, the interpretation of the string differs: "face_url":"https:\/\/" vs. "face_url":"https://".
I have tried a lot of things: Buffer.from with base64, utf8, JSON.stringify, JSON.parse, but I can't find a solution.
If you try with another string like '"face_url":"https"', it's OK: the result is the same.
I am trying to validate the key received in a Netatmo POST packet, which contains:
"face_url":"https:\/\/netatmocameraimage.blob.core
You can find an implementation of netatmo webhook in PHP here:
https://github.com/Netatmo/Netatmo-API-PHP/blob/master/Examples/Webhook_Server_Example.php
After reflection, the only difference between the two code paths was the interpretation of request.body.
In PHP, it seems to be plain text.
Node.js, however, parses the request body as JSON.
Following that supposition, I ran some tests with Node.js this morning and configured the Express server with the following option:
var express = require('express');
var crypto = require('crypto');
var app = express();
var bodyParser = require('body-parser');
app.use(bodyParser.text({type:"*/*"}));
After that, the string appears correctly, with those famous escaped slashes intact:
console.log result:
,"face_url":"https://netatmocameraimage.blob.core.windows.net/production/
And voilà! The HMAC is now CORRECT!
The HMAC from Netatmo is calculated over the raw text, not over re-serialized JSON!
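A minimal sketch of the whole verification route under that finding; the header name and secret below are placeholders of mine, not from the Netatmo docs:

var express = require('express');
var crypto = require('crypto');
var bodyParser = require('body-parser');

var app = express();
app.use(bodyParser.text({ type: '*/*' })); // keep the body as the raw text that was signed

app.post('/webhook', function (req, res) {
  var expected = crypto
    .createHmac('sha256', 'kqm6FksaIT') // placeholder client secret
    .update(req.body)                   // HMAC over the raw body, exactly as received
    .digest('hex');
  var received = req.headers['x-netatmo-secret']; // placeholder header name
  res.sendStatus(expected === received ? 200 : 403);
});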
In the PHP code, only the escape sequences \\ and \' are recognized in a single-quoted string; in all other cases the backslash is interpreted literally, i.e. \/ is a literal backslash followed by a literal slash (see here, sec. Single quoted). This explains the output of the PHP code:
$jsonData = '"face_url":"https:\/\/"';
...
Output:
"face_url":"https:\/\/"
34a4eb09a639c9b80713158ae89e7e8311586e6e6d76e09967f4e42a24759b3e
In JavaScript, the backslash is ignored for characters that do not represent an escape sequence (see here, last passage), i.e. \/ is equivalent to a literal slash. This explains the output of the JavaScript code:
var str = '"face_url":"https:\/\/"';
...
Output:
"face_url":"https://"
eb502c4711a6d926eeec7830ff34e021ed62c91e574f383f6534fdd30857a907
So in order for the JavaScript code to give the same result as the PHP, the backslash itself must be escaped:
var str = '"face_url":"https:\\/\\/"';
...
Output:
"face_url":"https:\/\/"
34a4eb09a639c9b80713158ae89e7e8311586e6e6d76e09967f4e42a24759b3e
Presumably the string with the \/ is the result of a JSON serialization in PHP with json_encode(), which escapes the / by default, i.e. converts it to \/, see also here. In JavaScript, / is simply serialized as /. Note that in PHP the escaping of / can be disabled with JSON_UNESCAPED_SLASHES, see also here.
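To see the difference directly, a quick illustration (the PHP lines are shown as comments for comparison):

// JavaScript: JSON.stringify does not escape slashes
console.log(JSON.stringify({ face_url: 'https://' }));
// {"face_url":"https://"}

// PHP: json_encode escapes them by default
// echo json_encode(['face_url' => 'https://']);
// {"face_url":"https:\/\/"}
// echo json_encode(['face_url' => 'https://'], JSON_UNESCAPED_SLASHES);
// {"face_url":"https://"}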
I am using the Gmail API to send email from a Node.js API. I am rendering the raw body using the following utility function:
message += '[DEFAULT EMOJI 😆]'
const str = [
'Content-Type: text/html; charset="UTF-8"\n',
'MIME-Version: 1.0\n',
'Content-Transfer-Encoding: 7bit\n',
'to: ',
to,
'\n',
'from: ',
from.name,
' <',
from.address,
'>',
'\n',
'subject: ',
subject + '[DEFAULT EMOJI 😆]',
'\n\n',
message
].join('');
return Buffer.alloc(str.length, str).toString('base64').replace(/\+/g, '-').replace(/\//g, '_');
The code I have used to send the email is:
const r = await gmail.users.messages.send({
auth,
userId: "me",
requestBody: {
raw: makeEmailBody(
thread.send_to,
{
address: user.from_email,
name: user.from_name,
},
campaign.subject,
campaign.template,
thread.id
),
},
});
The emojis are being rendered in the body but not in the subject. (The screenshots compared Gmail in Google Chrome on desktop, left, with the Gmail app on mobile, right.)
Your utility may benefit from several improvements (we will get to the emoji problem):
First, make it RFC 822 compliant by separating lines with CRLF (\r\n).
Second, be careful with the Content-Transfer-Encoding header: setting it to 7bit is easiest, but may not be generic enough (quoted-printable is likely the better option). A sketch incorporating both fixes follows.
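Here is a minimal sketch of the builder with CRLF line endings. It also uses Buffer.from instead of Buffer.alloc(str.length, str), because str.length counts UTF-16 units rather than bytes, so Buffer.alloc truncates multibyte content such as emoji; encodedSubject is assumed to be the RFC 2047 encoded-word produced further below:

const str = [
  'Content-Type: text/html; charset="UTF-8"',
  'MIME-Version: 1.0',
  'Content-Transfer-Encoding: 7bit',
  `to: ${to}`,
  `from: ${from.name} <${from.address}>`,
  `subject: ${encodedSubject}`,
  '',
  message,
].join('\r\n');

// URL-safe Base64 (on Node 15+, Buffer.from(str).toString('base64url') does the same)
return Buffer.from(str).toString('base64').replace(/\+/g, '-').replace(/\//g, '_');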
Now to the emoji problem:
You need to make sure the subject is correctly encoded separately from the body to be able to pass the emoji. According to RFC 1342 [0], you can use either Base64 or quoted-printable encoding on a subject to create an encoded-word, described as:
"=" "?" charset "?" encoding "?" encoded-text "?" "="
Where encoding is either Q for quoted-printable or B for Base64 encoding.
Note that the resulting encoded subject string must not be longer than 76 chars in length, which reserves 75 chars for the string and 1 for the separator (to use multiple words, separate them with a space or newline [CRLF works as well]).
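For example, the subject Café (é is the two bytes C3 A9 in UTF-8) becomes:

subject: =?utf-8?Q?Caf=C3=A9?=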
So, set your charset to utf-8 and the encoding to Q, encode the actual subject with something like the snippet below [1], and you are halfway done:
/**
 * @summary RFC 1342 header encoding
 * @see {@link https://www.rfc-editor.org/rfc/rfc1342}
 */
class HeaderEncoder {
  /**
   * @summary encode using Q encoding
   */
  static quotedPrintable(str: string, encoding = "utf-8") {
    let encoded = "";
    for (const char of str) {
      const cp = char.codePointAt(0);
      // pad to 2 hex digits and uppercase, so e.g. byte 0x0A becomes "=0A", not "=a"
      encoded += `=${cp.toString(16).toUpperCase().padStart(2, "0")}`;
    }
    return `=?${encoding}?Q?${encoded}?=`;
  }
}
Now, the fun part. I was working on a GAS project that had to leverage the Gmail API directly (after all, that is what the client library does under the hood). Even with the correct encoding, an attempt to pass something like "Beep! \u{1F697}" resulted in an incorrectly parsed subject.
It turns out you need to leverage fromCodePoint operating on a byte array or buffer of the original string. This snippet should suffice (don't forget to apply it only to multibyte chars):
const escape = (u: string) => String.fromCodePoint(...Buffer.from(u));
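Putting the two snippets together, a hypothetical run:

const subject = HeaderEncoder.quotedPrintable(escape("Beep! \u{1F697}"));
// =?utf-8?Q?=42=65=65=70=21=20=F0=9F=9A=97?=

The escape step maps each UTF-8 byte of the emoji (F0 9F 9A 97) to its own code point, so the per-character loop above emits one =XX group per byte, which is what Q encoding expects.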
[0] This is the initial RFC; it would be more appropriate to refer to RFC 2047. Also, see RFC 2231 for including locale info in the header (and some more obscure extensions).
[1] If the char falls into the range of printable US-ASCII, it can be left as-is, but due to an extensive set of rules, I recommend sticking to the 48-57 (digits), 65-90 (uppercase), and 97-122 (lowercase) ranges.
I am getting a value from an API response as below:
{
"ORG_ID":"165",
"DEPOT_NAME":"Pesto",
"DEPOT_SHORT_NAME":"PSD",
"PROD_ID":"709492",
"DESCRIPTION":"EX CL (2X14) U17\SH36\5",
"PRICE":"3708.55",
"STOCK":"2"
},
Now when I parse it as JSON with JSON.parse(response), it crashes the app. The error is below:
undefined:11
"DESCRIPTION":"EXELON HGC 4.5MG (2X14) U17\SH36\5",
^
SyntaxError: Unexpected token S in JSON at position 296
What should I do to get rid of these escapes? I need the same values, though: I don't want to change any value or remove these slashes.
You need to escape the special characters before parsing the JSON.
In this case, for it to be valid, it should be:
{
"ORG_ID":"165",
"DEPOT_NAME":"Pesto",
"DEPOT_SHORT_NAME":"PSD",
"PROD_ID":"709492",
"DESCRIPTION":"EX CL (2X14) U17\\SH36\\5",
"PRICE":"3708.55",
"STOCK":"2"
}
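If you cannot fix the producer of the response, a heuristic sketch is to double every backslash that does not start a valid JSON escape before parsing. This is my own workaround, and it assumes the payload never contains a lone backslash that happens to be followed by one of the escape letters:

const raw = '{"DESCRIPTION":"EX CL (2X14) U17\\SH36\\5"}'; // as received over the wire
const fixed = raw.replace(/\\(?![\\"\/bfnrtu])/g, '\\\\');  // double invalid backslashes
const data = JSON.parse(fixed);
console.log(data.DESCRIPTION); // EX CL (2X14) U17\SH36\5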
I just discovered that Node (tested: v0.8.23, current git: v0.11.3-pre) ignores any decoding errors in its Buffer handling, silently replacing any non-UTF-8 bytes with '\ufffd' (the Unicode REPLACEMENT CHARACTER) instead of throwing an exception about the non-UTF-8 input. As a consequence, fs.readFile, process.stdin.setEncoding, and friends mask a large class of bad input errors for you.
An example which doesn't fail but really ought to:
> notValidUTF8 = new Buffer([ 128 ], 'binary')
<Buffer 80>
> decodedAsUTF8 = notValidUTF8.toString('utf8') // no exception thrown here!
'�'
> decodedAsUTF8 === '\ufffd'
true
'\ufffd' is a perfectly valid character that can occur in legal UTF-8 (as the sequence ef bf bd), so it is non-trivial to monkey-patch in error handling based on its showing up in the result.
Digging a little deeper, it looks like this stems from node just deferring to v8's strings and that those in turn have the above behaviour, v8 not having any external world full of foreign-encoded data.
Are there Node modules, or other means, that let me catch UTF-8 decode errors, preferably with context about where the error was discovered in the input string or buffer?
I hope you solved the problem in the intervening years; I had a similar one and eventually solved it with this ugly trick:
function isValidUTF8(buf){
  // decode and re-encode; any invalid bytes become U+FFFD and change the result
  return Buffer.compare(Buffer.from(buf.toString(), 'utf8'), buf) === 0;
}
which converts the buffer back and forth and check it stays the same.
The 'utf8' encoding can be omitted.
Then we have:
> isValidUTF8(Buffer.from('this is valid, 指事字 eè we hope', 'utf8'))
true
> isValidUTF8(Buffer.from([128]))
false
> isValidUTF8(Buffer.from('\ufffd'))
true
where the '\ufffd' character is correctly considered as valid utf8.
UPDATE: now this works in JXcore, too
From Node 8.3 on, you can use util.TextDecoder to solve this cleanly:
const util = require('util')
const td = new util.TextDecoder('utf8', {fatal:true})
td.decode(Buffer.from('foo')) // works!
td.decode(Buffer.from([ 128 ])) // throws TypeError
This will also work in some browsers by using TextDecoder in the global namespace.
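The question also asked for context about where the error occurs. TextDecoder does not report a position, but a helper of my own devising can locate the first offending byte by feeding the fatal decoder one byte at a time in streaming mode:

const util = require('util')

function firstInvalidByteOffset(buf) {
  const td = new util.TextDecoder('utf8', { fatal: true })
  for (let i = 0; i < buf.length; i++) {
    try {
      td.decode(buf.subarray(i, i + 1), { stream: true })
    } catch (e) {
      return i // the byte at offset i made the sequence invalid
    }
  }
  try {
    td.decode() // flush: throws if the input ended mid-sequence
  } catch (e) {
    return buf.length
  }
  return -1 // valid UTF-8
}

firstInvalidByteOffset(Buffer.from([0x66, 0x80])) // 1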
As Josh C. said above: "npmjs.org/package/encoding"
From the npm website: "encoding is a simple wrapper around node-iconv and iconv-lite to convert strings from one encoding to another."
Download:
$ npm install encoding
Example usage:
var encoding = require('encoding');
var result = encoding.convert(Buffer.from([ 128 ]), "utf8");
console.log(result); // <Buffer 80>
Visit the site: npm - encoding