Umlauts broken when doing get request - node.js

I'm trying to query a webservice which answers with plain text. The text often has german umlauts in it. In the received stream the umlauts are broken. Any ideas what am I doing wrong?
Regards,
Torsten
Here is the sample code:
var request = require('request');
var uri = <anUriWithUserId>;
request(uri, {encoding: 'utf8','content-type': 'text/plain; charset=UTF-8'},
function (error, response, body)
{
console.log("encoding: " + response.headers['content-encoding']);
console.log("type: " + response.headers['content-type']);
console.log(body);
});
And the response:
encoding: undefined
type: text/plain
error=0
---
asin=
name=Eistee
detailname=Pfanner Der Gr�ne Tee, Zitrone - Kaktusfeige, 2,0 l
vendor=Hermann Pfanner Getr�nke GmbH, Lauterach, �sterreich
maincat=Getr�nke, Alkohol

When you set the encoding option in your request call, you advise the request module to decode the response body with this encoding. In this way you ignore the encoding used by the webservice, wich may or may not be utf-8. You need to find out wich encoding was used be the webservice and use that.
Depending on how complient the webservice you could also try to set the Accept-Charset: utf-8 header.
As your output shows, the webservice doesn't provide the used encoding in the Content-Type header, which is a bad habbit imho.
Sidenote: Content-Encoding isn't for charset, but for compression, gzip migh be a valid value for it.

Related

How do I send raw binary data via HTTP in node.js?

From a node.js back end, I need to send an HTTP message to a REST endpoint. The endpoint requires some parameters that it will expect to find in the HTTP message. Some of the parameters are simple enough, just requiring a number or a string as an argument. But one of the parameters is to be "the raw binary file content being uploaded" and this has puzzled me. As far as I understand, the parameters need to be gathered together into a string to put in the body of the HTTP request; How do I add raw binary data to a string? Obviously, for it to be in the string, it cannot be raw binary data; it needs to be encoded into characters.
The endpoint in question is the Twitter media upload API. The "raw binary data" parameter is called media. Below is an incomplete code snippet showing the basic gist of what I've tried. Specifically, the line where I build the requestBody string. I don't believe it is anywhere near correct, because the endpoint is returning a "bad request" message.
var https = require("https");
var base64ImageData = /* (some base 64 string here) */;
var options = {
host: "api.twitter.com",
path: "/1.1/media/upload.json",
method: "POST",
headers: {
"Content-Type": "multipart/form-data"
}
};
var request = https.request(options, function(response) {});
var requestBody = "media_id=18283918294&media=" + Buffer.from(base64ImageData, "base64").toString("binary");
request.write(requestBody);
request.end();
Also worth noting, Twitter themselves note the following extremely confusing statement:
"When posting base64 encoded images, be sure to set the “Content-Transfer-Encoding: base64” on the image part of the message."
Source: https://developer.twitter.com/en/docs/media/upload-media/uploading-media/media-best-practices
That might be part of the answer to my question, but what I don't understand is: How do I apply different headers to different parts of the HTTP message? Because apparently, the image data needs to have a Content-Transfer-Encoding header of "base64" while the rest of the HTTP message does not...
How do I apply different headers to different parts of the HTTP message?
This is the point of the multipart/form-data content type. A multi-part message looks like this:
Content-Type: multipart/form-data; boundary=---foo---
---foo---
Content-Disposition: form-data; name="datafile1"; filename="r.gif"
Content-Transfer-Encoding: base64
Content-Type: image/gif
// data goes here
---foo---
Content-Disposition: form-data; name="datafile2"; filename="g.png"
Content-Transfer-Encoding: base64
Content-Type: image/png
// another file's data goes here
---foo---
You probably don't want to put all this together yourself. There are a bunch of good libraries for putting together complex POSTs. For example: https://www.npmjs.com/package/form-data

Return JSON response in ISO-8859-1 with NodeJS/Express

I have built an API with Node and Express which returns some JSON. The JSON data is to be read by a web application. Sadly this application only accepts ISO-8859-1 encoded JSON which has proven to be a bit difficult.
I can't manage to return the JSON with the right encoding even though I've tried the methods in the Express documentation and also all tips from googling the issue.
The Express documentation says to use "res.set()" or "res.type()" but none of these is working for me. The commented lines are all the variants that I've tried (using Mongoose):
MyModel.find()
.sort([['name', 'ascending']])
.exec((err, result) => {
if (err) { return next(err) }
// res.set('Content-Type', 'application/json; charset=iso-8859-1')
// res.set('Content-Type', 'application/json; charset=ansi')
// res.set('Content-Type', 'application/json; charset=windows-1252')
// res.type('application/json; charset=iso-8859-1')
// res.type('application/json; charset=ansi')
// res.type('application/json; charset=windows-1252')
// res.send(result)
res.json(result)
})
None of these have any effect on the response, it always turns into "Content-Type: application/json; charset=utf-8".
Since JSON should(?) be encoded in utf-8, is it event possible to use any other encoding with Express?
If you look at the lib/response.js file in the Express source code (in your node_modules folder or at https://github.com/expressjs/express/blob/master/lib/response.js) you'll see that res.json takes your result, generates the corresponding JSON representation in a JavaScript String, and then passes that string to res.send.
The cause of your problem is that when res.send (in that same source file) is given a String argument, it encodes the string as UTF8 and also forces the charset of the response to utf-8.
You can get around this by not using res.json. Instead build the encoded response yourself. First use your existing code to set up the Content-Type header:
res.set('Content-Type', 'application/json; charset=iso-8859-1')
After that, manually generate the JSON string:
jsonString = JSON.stringify(result);
then encode that string as ISO-8859-1 into a Buffer:
jsonBuffer = Buffer.from(jsonString, 'latin1');
Finally, pass that buffer to res.send:
res.send(jsonBuffer)
Because res.send is no longer being called with a String argument, it should skip the step where it forces charset=utf-8 and should send the response with the charset value that you specified.

How to make UTF-8 in request-promise?

I made a request with Request-Promise with umlauts after the request:
var file = rp({uri: serviceURL, encoding: 'utf8'}).forEach(function (polizeistelle) {
console.log(polizeistelle)
}
In the console log it says 'pr�si' instead of 'präsi'
Thanks for help
This is because the serviceURL is not delivering utf8. Here utf-8 is not converting to utf8, but merely tells to interpret the response as utf8.
You should use
rp({uri: serviceURL, encoding: 'latin1'})
to read the response correctly, and convert it to utf8 afterwards, if you need to.

non-chunked multipart/mixed POST?

I'm trying to send multiple binary files to a web service in a single multipart/mixed POST but can't seem to get it to work... target server rejects it. One thing I noticed is Node is trying to do the encoding as chunked, which I don't want:
POST /SomeFiles HTTP/1.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=123456789012345
Host: host.server.com
Connection: keep-alive
Transfer-Encoding: chunked
How do I get it to stop being chunked? Docs say that can be disabled by setting the Content-Length in the main request header but I can't set the Content-Length since, among other reasons, I'm loading one file at a time -- but it shouldn't be required either since it's multipart.
I think the rest is OK (excepting it's not chunked, unless the req_post.write() is doing that part), e.g. for the initial part I:
var req_post = https.request({
hostname: 'host.server.com',
path: '/ManyFiles',
method: 'POST',
headers: {
'MIME-Version': '1.0',
'Content-Type': 'multipart/mixed; boundary=' + boundary
}
},...);
and then pseudo-code:
while ( true ) {
// { get next file here }
req_post.write('\r\n--' + boundary + '\r\nContent-Type: ' + filetype + '\r\nContent-Length: ' + filesize + '\r\n\r\n');
req_post.write(chunk);// filesize
if ( end ) {
req_post.write('\r\n--' + boundary + '--\r\n');
req_post.end();
break;
}
}
Any help/pointers is appreciated!
The quick answer is you cannot disable chunked without setting content-length. The chunked encoding was introduced for cases where you do not know the size of the payload when you start transmitting. Originally, content-length was required and the recipient knew it had the full message when it received content-length bytes. Chunked encoding removed that requirement by transmitting mini-payloads each with a content-length, followed by a zero-size to denote completion of the payload. So, if you do not set the content-length, and you do not use the chunked methodology, the recipient will never know when it has the full payload.
To help solve your problem, if you cannot send chunked and do not want to read all the files before sending, take a look at fs.stat. You can use it to get the file size without reading the file.

Module request how to properly retrieve accented characters? � � �

I'm using: Module: Request -- Simplified HTTP request method to scrape a webpage with accented characters á é ó ú ê ã etc.
I've already tried encoding: utf-8 with no success. I'm still getting this ��� characters in the result.
request.get({
uri: url,
encoding: 'utf-8'
// ...
Is there any configuration to fix it?
I don't know if it is an issue, but I filled one for this module. No answers yet. :/
Since binary is deprecated it seems like a better idea to use iconv and correctly handle the decoding:
var request = require("request"), iconv = require('iconv-lite');
var requestOptions = { encoding: null, method: "GET", uri: "http://something.com"};
request(requestOptions, function(error, response, body) {
var utf8String = iconv.decode(new Buffer(body), "ISO-8859-1");
console.log(utf8String);
});
The important part is to set the encoding on the HTTP request to be null encoding: null.
Specify the encoding as utf8 not utf-8. Here are a list of possible encodings for a buffer from the Node.js documentation.
ascii - for 7 bit ASCII data only. This encoding method is very fast, and will strip the high bit if set.
utf8 - Unicode characters. Many web pages and other document formats use UTF-8.
base64 - Base64 string encoding.
'binary - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is depreciated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node.
I were tried and OK (Shift_JIS):
var concat = require('concat-stream'),
Iconv = require('iconv').Iconv,
request = require('request');
var conv = new Iconv('Shift_JIS', 'utf8'),
req = request('http://www.alc.co.jp/');
req.pipe(conv);
req.on('error', function() {
console.log('an error occurred');
});
conv.pipe(concat(function(body) {
console.log(body.toString());
}));
https://github.com/request/request/issues/1080#issuecomment-56172161
Not a direct answer to OP, but I hate a similar problem and might help someone.
I had the issue because there was a gzip compression, so it needs to be decompressed first
var headers = {
'Accept-Encoding': 'gzip',
};
request({url:url, 'headers': headers, encoding:null},(e,r,b)=>{zlib.gunzip(b, (e,b)=>{console.log(b.toString())}) })

Resources