How to encode an arbitrary string for a request in Node.js?

I have a string like this: "abcde 李". It can be any string containing non-Latin characters.
I want to encode it for use in a request, so that it becomes "abcde %E6%9D%8E" and can be passed to http.request.
I have tried this:
str.toString("utf-8");
or
var buffer = new Buffer(str);
str = buffer.toString('utf-8');
but neither of them works. What is the proper way to handle this?

That string is already a Unicode string, so converting it to UTF-8 and back changes nothing. It looks like you're trying to percent-encode it for use in an HTTP query string, so try this:
var qs = require('querystring');
qs.escape('abcde 李'); // => 'abcde%20%E6%9D%8E'
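For comparison, here is a minimal sketch of the same escaping using only built-in globals, so no require is needed. The variable names and the example.com URL are illustrative assumptions, not part of the question.

```javascript
const value = 'abcde 李';

// Percent-encode a single component: non-ASCII characters become UTF-8 bytes.
const escaped = encodeURIComponent(value);
console.log(escaped); // abcde%20%E6%9D%8E

// URLSearchParams builds a whole query string; note that it writes spaces as '+'.
const params = new URLSearchParams({ q: value });
console.log(params.toString()); // q=abcde+%E6%9D%8E

// Attach the query to a URL object (example.com is a placeholder host).
const url = new URL('http://example.com/search');
url.search = params.toString();
console.log(url.href); // http://example.com/search?q=abcde+%E6%9D%8E
```

Both forms decode back to the same string on the server; which one fits depends on whether you are escaping one value or assembling a full query.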

Related

nodeJS: convert response.body in utf-8 (from windows-1251 encoding)

I'm trying to convert an HTML body encoded in windows-1251 into UTF-8, but I still get garbled characters in the HTML.
The text is Russian (Cyrillic), but I can't get it to display properly. Instead I get ??????? ?? ???
const GOT = require('got') // https://www.npmjs.com/package/got
const WIN1251 = require('windows-1251') // https://www.npmjs.com/package/windows-1251
async function query() {
var body = Buffer.from(await GOT('https://example.net/', {resolveBodyOnly: true}), 'binary')
var html = WIN1251.decode(body.toString('utf8'))
console.log(html)
}
query()
You’re doing a lot of silly encoding back-and-forth here. And the ‘backs’ don’t even match the ‘forths’.
First, you use the got library to download a webpage; by default, got will dutifully decode response texts as UTF-8. You stuff the returned Unicode string into a Buffer with the binary encoding, which throws away the higher octet of each UTF-16 code unit of the Unicode string. Then you use .toString('utf8'), which interprets the mutilated bytes as UTF-8 (in actuality, they are most likely not valid UTF-8 at all). Then you pass the ‘UTF-8’ string to the windows-1251 package to decode it as a ‘code page 1251’ string. Nothing good can possibly come from all this confusion.
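To see the lossy step in isolation, here is a small sketch using only the built-in Buffer. The sample word and variable names are assumptions chosen for illustration; the real response would contain whatever got produced.

```javascript
// A Unicode string, as got would return it after decoding the response as text.
const text = 'Привет';

// The 'binary' (latin1) encoding keeps only the low byte of each UTF-16 code unit,
// so U+041F becomes 0x1f, U+0440 becomes 0x40, and so on.
const mangled = Buffer.from(text, 'binary');
console.log(mangled.toString('hex')); // 1f4038323542

// Going the other way is safe: a latin1 string maps each byte to one character code.
const win1251Bytes = Buffer.from([0xcf, 0xf0]); // "Пр" in windows-1251
const binaryString = win1251Bytes.toString('latin1');
console.log(binaryString.charCodeAt(0).toString(16)); // cf
console.log(binaryString.charCodeAt(1).toString(16)); // f0
```

The first half shows why the original text cannot be recovered once it has passed through the string-to-binary conversion; the second half shows the direction that preserves every byte.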
The windows-1251 package you want to use takes so-called ‘binary’ (pseudo-Latin-1) strings as input. What you should do instead is take the raw binary response, interpret it as a Latin-1/‘binary’ string, and then pass it to the windows-1251 library for decoding.
In other words, use this:
const GOT = require('got');
const WIN1251 = require('windows-1251');
async function query() {
const body = await GOT('https://example.net/', {
resolveBodyOnly: true,
responseType: 'buffer'
});
const html = WIN1251.decode(body.toString('binary'))
console.log(html)
}
query()
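As a possible alternative that avoids the extra package, the built-in TextDecoder can decode the bytes directly. This is a sketch under the assumption that your Node.js build includes full ICU data (the official binaries of recent versions do); otherwise the 'windows-1251' label is rejected with a RangeError. The sample bytes stand in for a downloaded body.

```javascript
// Bytes of "Привет" in windows-1251, standing in for the raw response body.
const body = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// TextDecoder accepts any Uint8Array, and a Buffer is one.
const decoder = new TextDecoder('windows-1251');
const html = decoder.decode(body);

console.log(html); // Привет
```

With responseType set to 'buffer' as in the answer above, the downloaded body could be passed to decoder.decode in the same way.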

JSON via NodeJS Express Websocket has lots of extra characters

I have a nodeJS server configured with Express and BodyParser
const express = require('express')
const expressWs = require('express-ws')
const bodyParser = require('body-parser')
app.ws('/ws', websocket)
When the websocket gets a message I pass it on
ws.onmessage = e => {
const {action, payload} = JSON.parse(e.data)
channel.send(action,payload)
}
However when it comes to the app via the channel it's got lots of extra characters in it
"{\"action\":\"guide_data_retreived\",\"payload\":[{\"id\":544,\"json\":\"{\\\"code\\\":\\\"lPvwP4rz\\\",\\\"coverDesign\\\":null,\\\"created\\\":1535018423000,\\\"description\\\":\\\"{\\\\\\\"blocks\\\\\\\":[{\\\\\\\"key\\\\\\\":\\\\\\\"dpcth\\\\\\\",\\\\\\\"text\\\\\\\":\\\\\\\"This is an example of a medical emergency. \\\\\\\",\\\\\\\"type\\\\\\\":\\\\\\\"unstyled\\\\\\\",\\\\\\\"depth\\\\\\\":0,\\\\\\\"inlineStyleRanges\\\\\\\":[],\\\\\\\"entityRanges\\\\\\\":[],\\\\\\\"data\\\\\\\":{}},
Which makes it unparseable.
Any idea where this is coming from and how to fix it?
The problem is that you have several levels of string-encoded JSON nested within your objects.
The \'s are escape characters. They indicate that the quote following them is not a terminating quote but a character inside the string. For instance, say I have a JavaScript object like this and stringify it:
// myFile.js
const foo = { "x" : "abc" }
const s = JSON.stringify(foo) // s is a string
Then s will include these escape characters, so that the quotes around "x" and "abc" will be interpreted as inside the string rather than as string terminators:
s == "{\"x\":\"abc\"}" // -> true
So since s is a string, you can also put it inside another object, like this:
const bar = { "nested" : s }
And if you stringify that, you will end up with another level of escapes to signify that s is a string and not a nested JSON object within the object:
JSON.stringify(bar) == "{\"nested\":\"{\\\"x\\\":\\\"abc\\\"}\"}" // -> true
So it's clear that within your application, you're passing around strings instead of objects. For instance, inside your payload, json is a string-encoded JSON object, and inside json, description is again string-encoded JSON.
If you have control over the message being sent on this websocket, then you should parse those strings before you put them in the payload.
So for instance, if you are building the payload like this:
function sendMessage(ws, action, id, json) {
ws.send(action, {id: id, json: json})
}
then you should change it to this:
function sendMessage(ws, action, id, json) {
ws.send(action, {id: id, json: JSON.parse(json)})
}
And so on, for each level of nested object.
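If you cannot change the sender, a rough receiving-side workaround is to unwrap the nested strings recursively. The helper below, deepParse, is a hypothetical name introduced for this sketch; it assumes that any string value starting with { or [ is meant to be JSON, which may not hold for all of your data.

```javascript
// Recursively replace string-encoded JSON with the parsed value.
function deepParse(value) {
  if (typeof value === 'string') {
    const trimmed = value.trim();
    if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
      try {
        return deepParse(JSON.parse(trimmed));
      } catch (err) {
        return value; // looked like JSON but was not; keep the original string
      }
    }
    return value;
  }
  if (Array.isArray(value)) {
    return value.map(deepParse);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, deepParse(inner)])
    );
  }
  return value;
}

// Build a message with the same three levels of nesting as in the question.
const description = JSON.stringify({ blocks: [{ key: 'dpcth', depth: 0 }] });
const json = JSON.stringify({ code: 'lPvwP4rz', description: description });
const message = JSON.stringify({ action: 'guide_data_retreived', payload: [{ id: 544, json: json }] });

const parsed = deepParse(message);
console.log(parsed.payload[0].json.description.blocks[0].key); // dpcth
```

Fixing the sender remains the cleaner option, since this heuristic will also parse any ordinary text field that happens to contain valid JSON.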

NodeJS Express encodes the URL - how to decode

I'm using NodeJS with Express, and when I use foreign characters in the URL, they automatically get encoded.
How do I decode it back to the original string?
Before calling NodeJS, I escape characters.
So the string: אובמה
Becomes %u05D0%u05D5%u05D1%u05DE%u05D4
The entire URL now looks like: http://localhost:32323/?query=%u05D0%u05D5%u05D1%u05DE%u05D4
Now in my NodeJS, I get the escaped string %u05D0%u05D5%u05D1%u05DE%u05D4.
This is the relevant code:
var url_parts = url.parse(req.url, true);
var params = url_parts.query;
var query = params.query; // '%u05D0%u05D5%u05D1%u05DE%u05D4'
I've tried url and querystring libraries but nothing seems to fit my case.
querystring.unescape(query); // still '%u05D0%u05D5%u05D1%u05DE%u05D4'
Update 16/03/18
escape and unescape are deprecated.
Use:
encodeURIComponent('אובמה') // %D7%90%D7%95%D7%91%D7%9E%D7%94
decodeURIComponent('%D7%90%D7%95%D7%91%D7%9E%D7%94') // אובמה
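On the server side, a sketch of reading such a parameter with the built-in URL class, which decodes percent-encoded UTF-8 for you. The request path is modelled on the question; the base URL is an assumption needed only because req.url is a relative path. In Express, req.query normally holds already-decoded values, so this is mainly relevant when parsing the URL yourself.

```javascript
// What the client would send after encodeURIComponent('אובמה').
const encoded = encodeURIComponent('אובמה');
const requestPath = '/?query=' + encoded; // shape of req.url on the server

// A base is required for relative paths; its value does not affect the query.
const parsedUrl = new URL(requestPath, 'http://localhost:32323');
const query = parsedUrl.searchParams.get('query');

console.log(encoded); // %D7%90%D7%95%D7%91%D7%9E%D7%94
console.log(query);   // אובמה
```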
Old answer
unescape('%u05D0%u05D5%u05D1%u05DE%u05D4') gives "אובמה"
Try:
var querystring = unescape(query);
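If the client cannot be changed and keeps sending the legacy %uXXXX form, the same decoding can be sketched without the deprecated unescape. decodeLegacyEscape is a hypothetical helper name; it assumes the input was produced by escape(), where %XX denotes a single code unit rather than a UTF-8 byte.

```javascript
// Decode escape()-style sequences: %uXXXX (UTF-16 code unit) and %XX (code unit below 256).
function decodeLegacyEscape(input) {
  return input.replace(
    /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/g,
    (match, wide, narrow) => String.fromCharCode(parseInt(wide || narrow, 16))
  );
}

const decoded = decodeLegacyEscape('%u05D0%u05D5%u05D1%u05DE%u05D4');
console.log(decoded); // אובמה
```

A single regular expression handles both forms in one pass, so a decoded percent sign is never reinterpreted as the start of another escape.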
You should use decodeURI() and encodeURI() to encode/decode a URL with foreign characters.
Usage:
var query = 'http://google.com';
query = encodeURI(query);
query = decodeURI(query); // http://google.com
Reference on MDN:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURI
Decoding query parameters from a URL
decodeURIComponent cannot be used directly to parse query parameters from a URL. It needs a bit of preparation.
function decodeQueryParam(p) {
return decodeURIComponent(p.replace(/\+/g, ' '));
}
console.log(decodeQueryParam('search+query%20%28correct%29'));
// 'search query (correct)'
SOURCE: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#decoding_query_parameters_from_a_url
Update May 2020: decodeURI vs. decodeURIComponent in Node.js
I am adding this answer because I spent four hours cleaning up noisy data caused by choosing the wrong decoder.
Encoded email input = myemail%40gmail.com
Encoded URL path input = %2Fus%2Fhome
```
decodeURI leaves the escapes of reserved URL characters (such as @ and /) untouched
email output => myemail%40gmail.com
url output => %2Fus%2Fhome
decodeURIComponent decodes every escape sequence
email output => myemail@gmail.com
url output => /us/home
```
In short, use decodeURI on a complete URL and decodeURIComponent on a single component such as a query value.
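To make the difference between the two decoders concrete, here is a short runnable sketch. The sample values are illustrative assumptions modelled on the inputs above.

```javascript
const email = 'myemail%40gmail.com';
const path = '%2Fus%2Fhome';

// decodeURI leaves escapes of reserved characters (; / ? : @ & = + $ , #) as they are.
const emailViaDecodeURI = decodeURI(email);
const pathViaDecodeURI = decodeURI(path);
console.log(emailViaDecodeURI); // myemail%40gmail.com
console.log(pathViaDecodeURI);  // %2Fus%2Fhome

// decodeURIComponent decodes every escape sequence.
const emailViaComponent = decodeURIComponent(email);
const pathViaComponent = decodeURIComponent(path);
console.log(emailViaComponent); // myemail@gmail.com
console.log(pathViaComponent);  // /us/home
```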

Node Convert String to HTTP Header

I have a ByteBuffer which I have converted to a String. Is there a nice way to convert that to an "object" like http.IncomingMessage so that I can do something like this:
var message = stringToIncomingMessage(buf.toString());
var host = message.headers['host']
I essentially need a nice way of extracting the Host field from the string version of an HTTP request. I tried using a regex to find "Host", but sanitizing got complicated.
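One possible approach, sketched here without any library, is to split the request head on CRLF and collect the headers into an object with lower-cased names, similar in shape to message.headers. parseRequestHead is a hypothetical helper, not a Node.js API, and this sketch ignores edge cases such as repeated headers (the last value wins) and obsolete line folding.

```javascript
// Parse the request line and headers from the text of a raw HTTP request.
function parseRequestHead(raw) {
  // The head ends at the first blank line; anything after it is the body.
  const headEnd = raw.indexOf('\r\n\r\n');
  const head = headEnd === -1 ? raw : raw.slice(0, headEnd);
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, target, version] = requestLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip malformed lines
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers };
}

const rawRequest =
  'GET /index.html HTTP/1.1\r\n' +
  'Host: example.com:8080\r\n' +
  'Accept: */*\r\n' +
  '\r\n';

const message = parseRequestHead(rawRequest);
console.log(message.headers['host']); // example.com:8080
```

Splitting at the first colon only keeps values that themselves contain colons, such as a host with a port, intact.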

What is a good replacement for new Buffer("my string", "binary")

According to the node docs the "binary" encoding will be deprecated in future versions.
However I found that my code only works if I create my buffer like this:
var buffer = new Buffer("Special chars like ñ and backspace", "binary");
What is the right way to achieve the same thing?
I fixed it simply by passing the encoding parameter to the request's write method, like this:
var http = require("http");
var httpReq = http.request(options, callback);
httpReq.write("some string with special characters", "binary");
There are no deprecations in the docs about the write method AFAIK.
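If a Buffer is still needed, a minimal sketch of the modern equivalent follows. In current Node.js, 'binary' is an alias for 'latin1', and Buffer.from replaces the deprecated new Buffer constructor. The sample string is shortened from the question for illustration.

```javascript
const text = 'Special chars like \u00f1'; // \u00f1 is ñ

// 'latin1' and its alias 'binary' both write one byte per character.
const latin1Bytes = Buffer.from(text, 'latin1');
const binaryBytes = Buffer.from(text, 'binary');
console.log(latin1Bytes.equals(binaryBytes)); // true

// The same character takes one byte in latin1 but two in UTF-8.
const oneByte = Buffer.from('\u00f1', 'latin1');
const twoBytes = Buffer.from('\u00f1', 'utf8');
console.log(oneByte.toString('hex'));  // f1
console.log(twoBytes.toString('hex')); // c3b1

// Decoding with the same encoding restores the original string.
const roundTrip = latin1Bytes.toString('latin1');
console.log(roundTrip === text); // true
```

This only round-trips characters up to U+00FF; anything above that is truncated to its low byte.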
