How to make UTF-8 in request-promise? - node.js

I made a request with Request-Promise, and the umlauts in the response come out broken:
var file = rp({uri: serviceURL, encoding: 'utf8'}).forEach(function (polizeistelle) {
console.log(polizeistelle)
});
In the console log it says 'pr�si' instead of 'präsi'
Thanks for help

This is because the serviceURL is not delivering UTF-8. The encoding: 'utf8' option does not convert the response to UTF-8; it merely tells the library to interpret the response bytes as UTF-8.
You should use
rp({uri: serviceURL, encoding: 'latin1'})
to read the response correctly, and convert it to utf8 afterwards, if you need to.
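A minimal sketch of the difference, using only Node's built-in Buffer and no network call. The byte values are an assumed ISO-8859-1 encoding of 'präsi', chosen for illustration; the real service response may differ.

```javascript
// Assumed sample: the bytes of "präsi" as an ISO-8859-1 server would send them.
const responseBytes = Buffer.from([0x70, 0x72, 0xe4, 0x73, 0x69]);

// Interpreting the bytes as UTF-8 fails: 0xE4 alone is not valid UTF-8,
// so the decoder substitutes the replacement character U+FFFD.
const misread = responseBytes.toString('utf8');

// Interpreting the same bytes as latin1 (ISO-8859-1) gives the intended text.
const text = responseBytes.toString('latin1');

// "Converting to UTF-8 afterwards" just means encoding the string again.
const utf8Bytes = Buffer.from(text, 'utf8');

console.log(misread);
console.log(text);
console.log(utf8Bytes);
```

Once the string has been decoded correctly, it can be written out in any encoding; the damage only happens when the wrong charset is used for the first decode.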

Related

Node-Fetch API and Blob to convert Windows-1252 characters to utf-8 [duplicate]

I'm trying to use Fetch to bring some data onto the screen. However, some of the characters are showing as a weird � sign, which I believe has something to do with converting special characters.
When debugging on the server side or if I call the servlet on my browser, the problem doesn't happen, so I believe the issue is with my JavaScript. See the code below:
var myHeaders = new Headers();
myHeaders.append('Content-Type','text/plain; charset=UTF-8');
fetch('getrastreiojadlog?cod=10082551688295', myHeaders)
.then(function (response) {
return response.text();
})
.then(function (resp) {
console.log(resp);
});
I think it is probably some detail, but I haven't managed to find out what is happening. So any tips are welcome
Thx
The response's text() function always decodes the payload as utf-8.
If the payload uses another charset, you can use TextDecoder to decode the response buffer (NOT the text) with the charset of your choice.
Using your example it should be:
var myHeaders = new Headers();
myHeaders.append('Content-Type','text/plain; charset=UTF-8');
fetch('getrastreiojadlog?cod=10082551688295', myHeaders)
.then(function (response) {
return response.arrayBuffer();
})
.then(function (buffer) {
const decoder = new TextDecoder('iso-8859-1');
const text = decoder.decode(buffer);
console.log(text);
});
Notice that I'm using iso-8859-1 as decoder.
Credits: Schneide Blog
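The effect of the decoder label can be checked without a server. This is a small sketch with an assumed three-byte sample ("Olá" in ISO-8859-1). Note that in the WHATWG Encoding standard the label iso-8859-1 is an alias for windows-1252, which agrees with ISO-8859-1 for ordinary accented letters.

```javascript
// Assumed sample payload: "Olá" encoded as ISO-8859-1.
const payload = new Uint8Array([0x4f, 0x6c, 0xe1]);

// This is what response.text() effectively does: decode as UTF-8.
const asUtf8 = new TextDecoder('utf-8').decode(payload);

// Decoding the raw bytes with the charset the server really used.
const asLatin1 = new TextDecoder('iso-8859-1').decode(payload);

console.log(asUtf8);   // contains the U+FFFD replacement character
console.log(asLatin1); // the intended text
```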
Maybe your server isn't returning a UTF-8 encoded response. Try to find out which charset it uses and then set that in the call headers.
Maybe ISO-8859-1:
myHeaders.append('Content-Type','text/plain; charset=ISO-8859-1');
As it turns out, the problem was that the servlet was serving the data without explicitly declaring the encoding on the response.
By adding the following line in the Java servlet:
response.setContentType("text/html;charset=UTF-8");
it was possible to get the characters in the right format.

How can I get the value in utf-8 from an axios get receiving iso-8859-1 in node.js

I have the following code:
const notifications = await axios.get(url)
const ctype = notifications.headers["content-type"];
The ctype receives "text/json; charset=iso-8859-1"
And my string is like this: "'Ol� Matheus, est� pendente.',"
How can I decode from iso-8859-1 to utf-8 without getting those errors?
Thanks
text/json; charset=iso-8859-1 is not a standard content type. The registered media type for JSON is application/json, and JSON exchanged between systems must be encoded as UTF-8.
So the best way to get around this, at least on the server, is to first get a buffer (does axios support returning buffers?), decode it from ISO-8859-1 into a JavaScript string, and only then run JSON.parse on it.
Sketch (axios can return the raw bytes when asked for responseType: 'arraybuffer'):
// In Node.js, responseType: 'arraybuffer' makes response.data a Buffer of raw bytes.
const response = await axios.get(url, {responseType: 'arraybuffer'});
// Node.js calls ISO-8859-1 'latin1'. Decode the bytes first, then parse the JSON.
const notifications = JSON.parse(Buffer.from(response.data).toString('latin1'));
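Rather than hard-coding the charset, it can be read from the Content-Type header the question already inspects. The following is a rough sketch: charsetFromContentType and decodeBody are hypothetical helper names, the sample body is made up, and the regular expression only handles the simple charset=... form.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw bytes with the declared charset,
// falling back to UTF-8 when the header does not declare one.
// TextDecoder throws a RangeError for labels it does not know.
function decodeBody(bytes, contentType) {
  const charset = charsetFromContentType(contentType) || 'utf-8';
  return new TextDecoder(charset).decode(bytes);
}

// Made-up sample: a JSON body encoded as ISO-8859-1, as in the question.
const rawBody = Buffer.from('{"msg":"Ol\u00e1 Matheus"}', 'latin1');
const bodyText = decodeBody(rawBody, 'text/json; charset=iso-8859-1');
const parsed = JSON.parse(bodyText);

console.log(parsed.msg);
```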

Node.js encoding UTF-8 issue

I have been facing an encoding/decoding issue with the Node.js Express framework.
Brief background: I store a PDF file in a MySQL database in a longblob column with the latin1 charset. From the server side, I need to send the binary data in UTF-8 encoding, as my client only knows how to decode UTF-8.
I tried all the possible solutions available on google.
For ex:
new Buffer(mySqlData).toString('utf8');
I already tried the "UTF8" module with utf8.encode(mySqlData), but it is not working.
I also tried "base64" encoding and decoding the data from base64 on the client. That works just fine, but I need the data to be UTF-8 encoded. Also, base64 noticeably increases the size.
Please help guys.
OK, your problem is the conversion from latin1 to UTF-8. If you just call buffer.toString('utf-8'), the latin1-encoded characters come out wrong.
To convert another charset to UTF-8, the simplest way is to use iconv and icu-charset-detector. With those, you can convert to UTF-8 from almost any charset (with a few exceptions).
This is an example of conversion using streams. The resulting stream is encoded as UTF-8:
var charsetDetector = require("node-icu-charset-detector"),
    Iconv = require('iconv').Iconv,
    Stream = require('stream');

function convertToUtf8(source, callback) {
    var iconv,
        charsetTestStream = new Stream.PassThrough(),
        newResStream = new Stream.PassThrough();
    // Feed the source into two streams: one for detection, one for the result.
    source.pipe(charsetTestStream);
    source.pipe(newResStream);
    charsetDetector.detectCharsetStream(charsetTestStream, function (charset) {
        // Only convert when a charset was detected and it is not already UTF-8.
        if (!iconv && charset && !/utf-*8/i.test(charset.toString())) {
            try {
                iconv = new Iconv(charset, 'utf-8');
                console.log('Converting from charset %s to utf-8', charset);
                iconv.on('error', function (err) {
                    callback(err);
                });
                var convertStream = newResStream.pipe(iconv);
                callback(null, convertStream);
            } catch (err) {
                callback(err);
            }
            return;
        }
        callback(null, newResStream);
    });
}
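When the source charset is already known to be latin1 and the data fits in memory, Node's built-in buffer.transcode may be enough without extra modules. This is only a sketch under those assumptions: the sample bytes are made up, and transcode supports just a handful of encodings (ascii, latin1, ucs2/utf16le, utf8) and relies on Node being built with ICU.

```javascript
const { transcode } = require('buffer');

// Made-up sample: "Grüne" encoded as latin1 (0xFC is "ü").
const latin1Bytes = Buffer.from([0x47, 0x72, 0xfc, 0x6e, 0x65]);

// Re-encode the bytes from latin1 to UTF-8 in one step.
const utf8Bytes = transcode(latin1Bytes, 'latin1', 'utf8');

console.log(utf8Bytes);                  // "ü" is now the two bytes C3 BC
console.log(utf8Bytes.toString('utf8')); // the text decodes correctly as UTF-8
```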

Umlauts broken when doing get request

I'm trying to query a web service which answers with plain text. The text often has German umlauts in it. In the received stream the umlauts are broken. Any idea what I am doing wrong?
Regards,
Torsten
Here is the sample code:
var request = require('request');
var uri = <anUriWithUserId>;
request(uri, {encoding: 'utf8','content-type': 'text/plain; charset=UTF-8'},
function (error, response, body)
{
console.log("encoding: " + response.headers['content-encoding']);
console.log("type: " + response.headers['content-type']);
console.log(body);
});
And the response:
encoding: undefined
type: text/plain
error=0
---
asin=
name=Eistee
detailname=Pfanner Der Gr�ne Tee, Zitrone - Kaktusfeige, 2,0 l
vendor=Hermann Pfanner Getr�nke GmbH, Lauterach, �sterreich
maincat=Getr�nke, Alkohol
When you set the encoding option in your request call, you tell the request module to decode the response body with this encoding. This ignores the encoding actually used by the web service, which may or may not be UTF-8. You need to find out which encoding was used by the web service and use that.
Depending on how compliant the web service is, you could also try to set the Accept-Charset: utf-8 header.
As your output shows, the web service doesn't declare the encoding it uses in the Content-Type header, which is a bad habit imho.
Side note: Content-Encoding isn't for the charset but for compression; gzip might be a valid value for it.
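The two layers can be illustrated locally with Node's zlib module. In this sketch the sample word and the choice of latin1 plus gzip are assumptions for illustration; the point is only that compression (Content-Encoding) and charset (Content-Type) are undone separately, in reverse order.

```javascript
const zlib = require('zlib');

// Sender side (simulated): text -> charset bytes -> compressed bytes.
const charsetBytes = Buffer.from('Getr\u00e4nke', 'latin1'); // charset layer
const wireBytes = zlib.gzipSync(charsetBytes);               // Content-Encoding layer

// Receiver side: undo the compression first, then decode the charset.
const decompressed = zlib.gunzipSync(wireBytes);
const received = decompressed.toString('latin1');

console.log(received);
```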

Module request how to properly retrieve accented characters? � � �

I'm using: Module: Request -- Simplified HTTP request method to scrape a webpage with accented characters á é ó ú ê ã etc.
I've already tried encoding: utf-8 with no success. I'm still getting these ��� characters in the result.
request.get({
uri: url,
encoding: 'utf-8'
// ...
Is there any configuration to fix it?
I don't know if it is a bug, but I filed an issue for this module. No answers yet. :/
Since binary is deprecated, it seems like a better idea to use iconv-lite and handle the decoding correctly:
var request = require("request"), iconv = require('iconv-lite');
// encoding: null makes request return the body as a raw Buffer.
var requestOptions = { encoding: null, method: "GET", uri: "http://something.com"};
request(requestOptions, function(error, response, body) {
    // body is already a Buffer here, so it can be decoded directly.
    var utf8String = iconv.decode(body, "ISO-8859-1");
    console.log(utf8String);
});
The important part is to set encoding: null on the HTTP request, so that the body is returned as a raw Buffer instead of an already decoded string.
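Why the raw bytes matter can be seen with Buffer alone. The sketch below uses assumed sample bytes ("áéó" in ISO-8859-1): once they have been decoded as UTF-8, the original byte values are replaced and cannot be recovered by converting the string afterwards.

```javascript
// Assumed sample: "áéó" as an ISO-8859-1 server would send it.
const rawBody = Buffer.from([0xe1, 0xe9, 0xf3]);

// What happens when the body is decoded as UTF-8 up front:
// the invalid bytes are replaced by U+FFFD and their values are lost.
const lossyString = rawBody.toString('utf8');
const roundTrip = Buffer.from(lossyString, 'latin1');

// Keeping the raw Buffer allows decoding with the right charset later.
const correctString = rawBody.toString('latin1');

console.log(roundTrip.equals(rawBody)); // false: the original bytes are gone
console.log(correctString);
```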
Specify the encoding as utf8, not utf-8. Here is a list of possible encodings for a buffer from the Node.js documentation.
ascii - for 7 bit ASCII data only. This encoding method is very fast, and will strip the high bit if set.
utf8 - Unicode characters. Many web pages and other document formats use UTF-8.
base64 - Base64 string encoding.
binary - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node.
I tried this and it worked (Shift_JIS):
var concat = require('concat-stream'),
Iconv = require('iconv').Iconv,
request = require('request');
var conv = new Iconv('Shift_JIS', 'utf8'),
req = request('http://www.alc.co.jp/');
req.pipe(conv);
req.on('error', function() {
console.log('an error occurred');
});
conv.pipe(concat(function(body) {
console.log(body.toString());
}));
https://github.com/request/request/issues/1080#issuecomment-56172161
Not a direct answer to OP, but I had a similar problem and this might help someone.
I had the issue because the response was gzip-compressed, so it needs to be decompressed first:
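var zlib = require('zlib');
var headers = {
'Accept-Encoding': 'gzip',
};
// encoding: null keeps the body as a raw Buffer so that it can be gunzipped.
request({url: url, headers: headers, encoding: null}, (e, r, b) => {
zlib.gunzip(b, (e, b) => { console.log(b.toString()); });
});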
