urlencoding form data with windows-1252 charset in node.js

I need to POST a form whose data must be URL-encoded using the windows-1252 charset. For simple characters the default encoding (UTF-8) works, but the special characters have to be encoded with the required charset.
The npm "request" package I am using does not allow setting a specific charset and uses UTF-8 by default underneath. I tried another package, "Restler", which allows an encoding to be set, but it throws an "invalid charset" exception when I specify windows-1252 (Node only offers a handful of encodings in its Buffer class, and windows-1252 is not one of them).
Please let me know whether what I am trying to achieve is even possible in Node or not. For verification purposes, I created a small client in Java using Apache's HTTP client library with windows-1252 encoding, and my request was accepted by the server. So far, I have not been able to figure this out in Node.

Sending HTTP request data in a legacy encoding like Windows-1252 is not straightforward in node, as there is no native support for these encodings.
Support can be added in the form of an iconv library, so it's definitely doable, even if it does not work out of the box.
The following targets restler, because you are using it, but in principle this applies to any client HTTP library.
Notes:
Traditional HTTP POSTs are URL-encoded; we will use qs for this.
Support for encodings other than UTF-8 will be provided by qs-iconv, as documented in qs - Dealing with special character sets.
Restler usually encodes data as UTF-8 if you pass it as a string or plain object, but if you pass a Buffer, Restler will send it as it is.
Setting a proper Content-Type and Content-Length will ensure the data can be interpreted properly at the receiving end. Since we supply our own data here, we need to set those headers manually.
Be aware that any character that is not contained in the target charset (Windows-1252 in this case) will be encoded as ? by iconv (%3F in URL form) and therefore will be lost.
Code:
var rest = require('restler');
var qs = require('qs');
var win1252 = require('qs-iconv/encoder')('win1252');
var requestData = {
  key1: "‘value1‘",
  key2: "‘value2‘"
};

var requestBody = qs.stringify(requestData, { encoder: win1252 });
// => "key1=%91value1%91&key2=%91value2%91"

// the percent-encoded body is plain ASCII; Buffer.from replaces the deprecated new Buffer()
var requestBuf = Buffer.from(requestBody);

rest.post('your/url', {
  data: requestBuf,
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded; charset=windows-1252',
    'Content-Length': requestBuf.length
  }
}).on('complete', function(data) {
  console.log(data);
});
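For completeness, the same approach should work with the request package the question mentions, since request also sends a Buffer body verbatim. A minimal sketch, assuming the same requestBuf and headers as above (the URL is a placeholder):
var request = require('request');

request.post({
  url: 'your/url',                      // placeholder, as above
  body: requestBuf,                     // the pre-encoded Windows-1252 body from above
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded; charset=windows-1252',
    'Content-Length': requestBuf.length
  }
}, function (err, response, body) {
  if (err) return console.error(err);
  console.log(body);
});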

Related

node-superagent responseType('blob') vs. buffer(true)

Due to the deprecation of request, we're currently rewriting the request-service in our node app with superagent. So far all looks fine; however, we're not quite sure how to request binary data/octet-streams and process the actual response body as a Buffer. According to the docs (on the client side) one should use
superAgentRequest.responseType('blob');
which seems to work fine on NodeJS, but I've also found this github issue where they use
superAgentRequest.buffer(true);
which works just as well. So I'm wondering what the preferred method to request binary data in NodeJS is?
According to superagent's source-code, using the responseType() method internally sets the buffer flag to true, i.e. the same as setting it manually to true.
In case of dealing with binary-data/octet-streams, a binary data parser is used, which is in fact just a simple buffer:
module.exports = (res, fn) => {
  const data = []; // Binary data needs binary storage

  res.on('data', chunk => {
    data.push(chunk);
  });

  res.on('end', () => {
    fn(null, Buffer.concat(data));
  });
};
In both cases this parser is used, which explains the behaviour. So you can go with either of the mentioned methods to deal with binary data/octet-streams.
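To make that concrete, here is a small sketch of both options (the URL is a placeholder); in Node, either way res.body ends up as a Buffer for binary responses:
const superagent = require('superagent');

// Option 1: responseType('blob') -- in Node this buffers the body into a Buffer
superagent
  .get('http://example.com/some/file.bin')
  .responseType('blob')
  .then(res => {
    console.log(Buffer.isBuffer(res.body)); // true
  });

// Option 2: buffer(true) -- relies on the binary parser shown above for octet-streams
superagent
  .get('http://example.com/some/file.bin')
  .buffer(true)
  .then(res => {
    console.log(Buffer.isBuffer(res.body)); // true for application/octet-stream responses
  });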
As per documentation https://visionmedia.github.io/superagent/
SuperAgent will parse known response-body data for you, currently supporting application/x-www-form-urlencoded, application/json, and multipart/form-data. You can setup automatic parsing for other response-body data as well:
You can set a custom parser (that takes precedence over built-in parsers) with the .buffer(true).parse(fn) method. If response buffering is not enabled (.buffer(false)) then the response event will be emitted without waiting for the body parser to finish, so response.body won't be available.
So to parse other response types, you will need to set .buffer(true).parse(fn). But if you do not want to parse the response, there is no need to set buffer(true).
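As a sketch of that last case, a custom parser that just collects the raw bytes into a Buffer could look like this (URL is a placeholder):
const superagent = require('superagent');

superagent
  .get('http://example.com/report.bin')
  .buffer(true)
  .parse((res, done) => {
    // collect the raw chunks ourselves and hand back a single Buffer
    const chunks = [];
    res.on('data', chunk => chunks.push(chunk));
    res.on('end', () => done(null, Buffer.concat(chunks)));
  })
  .then(res => {
    console.log(res.body.length); // res.body is the Buffer produced by the parser
  });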

Handling UTF8 characters in express route parameters

I'm having an issue with a NodeJS REST api created using express.
I have two calls, a get and a post set up like this:
router.get('/:id', (request, response) => {
  console.log(request.params.id);
});

router.post('/:id', (request, response) => {
  console.log(request.params.id);
});
now, I want the ID to be able to contain special characters (UTF8).
The problem is, when I use postman to test the requests, it looks like they are encoded very differently:
GET http://localhost:3000/api/â outputs â
POST http://localhost:3000/api/â outputs â
Does anyone have any idea what I am missing here?
I must mention that the post call also contains a file upload so the content type will be multipart/form-data
You should encode your URL on the client and decode it on the server. See the following articles:
What is the proper way to URL encode Unicode characters?
Can urls have UTF-8 characters?
Which characters make a URL invalid?
For JavaScript, encodeURI may come in handy.
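A minimal sketch of that idea, assuming the routes from the question (Express percent-decodes route parameters for you, so once the client sends a properly encoded URL the server side just works):
// Client side: percent-encode the path segment before putting it in the URL
const id = 'â';
const url = 'http://localhost:3000/api/' + encodeURIComponent(id);
// => "http://localhost:3000/api/%C3%A2"

// Server side: Express hands the parameter over already percent-decoded
router.get('/:id', (request, response) => {
  console.log(request.params.id); // "â"
  response.send(request.params.id);
});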
It looks like postman does UTF-8 encoding but NOT proper url encoding. Consequently, what you type in the request url box translates to something different than what would happen if you typed that url in a browser.
I'm requesting: GET localhost/ä but it encodes it on the wire as localhost/ä
(This is now an invalid URL because it contains non-ASCII characters.)
But when I type localhost/ä in to google chrome, it correctly encodes the request as localhost/%C3%A4
So you could try manually url encoding your request to http://localhost:3000/api/%C3%A2
In my opinion this is a bug (perhaps a regression). I am using the latest version of PostMan v7.11.0 on MacOS.
Does anyone have any idea what I am missing here?
Yeah, it doesn't output Ã¢, it outputs â, but whatever you're checking the result with thinks it's reading something else (ISO-8859-1 maybe?), not UTF-8, and renders it as Ã¢.
Most likely, you're viewing the result in a web browser, and the web server is sending the wrong Content-Type header. Try setting the header explicitly, e.g. res.set('Content-Type', 'text/plain; charset=utf-8') or res.set('Content-Type', 'text/html; charset=utf-8'); then your browser should render your â correctly.
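A minimal sketch of that fix in the Express route from the question:
router.get('/:id', (request, response) => {
  // tell the client explicitly that the body is UTF-8
  response.set('Content-Type', 'text/plain; charset=utf-8');
  response.send(request.params.id);
});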

nodejs partial data just from firefox

I have a server running on nodejs, and I have the following piece of code to manage a post request -
form.on('file', function (field, file) {
  var RecordingInfo = JSON.parse(file.name);
  ...
when I tried to upload a file I got the following exception:
undefined:1
"}
SyntaxError: Unexpected end of input
at Object.parse (native)
at IncomingForm.<anonymous> (.../root.js:31:34)
...
Searching around the web, I found that this exception is caused because the data comes in chunks, and the event is fired after the first chunk arrives, before I have all the data. OK. The thing is, after a little testing I found that from Chrome I can upload large files (tried a 1.75 GB file) without any problem, while Firefox crashes the server with a 6 KB file.
My question is - why are they different?
A sample capture can be downloaded from here. The first post is from Chrome, the second from Firefox.
The complete file.name string before uploading is:
// chrome
"{"subject":"flksajfd","lecturer":"אבישי וינר","path":"/גמרא","fileType":".png"}"
// firefox
"{"subject":"fdsa","lecturer":"אלקס ציקין","path":"/גמרא","fileType":".jpg"}"
(The data submitted is not the same, but I don't think it matters)
Chrome is encoding double-quotes in the JSON-encoded "filename" as %22 while Firefox is encoding them as \".
Your file upload parsing library, Formidable, explicitly truncates the filename from the last \ character. It expects double-quotes to be encoded as %22 although RFC 2616 allows backslash-escaped quotes like Firefox has implemented. You can consider this a bug in Formidable. The result is that the following JSON string:
'{"subject":"fdsa",...,"fileType":".jpg"}'
...is encoded as follows:
'{%22subject%22:%22fdsa",...,%22fileType%22:%22.jpg%22}' // Chrome
'{\"subject\":\"fdsa\",...\"fileType\":\".jpg\"}' // Firefox
...and then decoded by Formidable:
'{"subject":"fdsa",..."fileType":".jpg"}' // Chrome
'"}' // Firefox
To fix the issue you have a few choices:
Raise the issue with Formidable to correctly handle backslash-escaped quoted-value strings (or fix it yourself and submit a pull request).
Send the JSON payload in a separate part of the FormData object, e.g. using a Blob (a simple variant of this is sketched after the code below).
Transliterate all double-quote characters in your JSON-format filename to a 'safe' character that will not appear elsewhere in the string (I chose ^ as an example); replace the quote client-side and reinstate it server-side as follows.
Client:
var formData = new FormData();
formData.append('file', $scope.recording, JSON.stringify(RecordingInfo).replace(/"/g, '^'));
Server:
form.on('file', function (field, file) {
  var RecordingInfo = JSON.parse(file.name.replace(/\^/g, '"'));
  // ...
});
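For completeness, a minimal sketch of the second option, sending the metadata as its own part of the FormData object instead of packing it into the filename. For simplicity this uses a plain string field rather than a Blob, which formidable exposes through its 'field' event (the field name 'info' is made up for illustration):
// Client: keep the filename simple and send the metadata as a separate field
var formData = new FormData();
formData.append('file', $scope.recording, 'recording.png');
formData.append('info', JSON.stringify(RecordingInfo));

// Server: formidable emits plain string parts through the 'field' event
var recordingInfo;
form.on('field', function (name, value) {
  if (name === 'info') recordingInfo = JSON.parse(value);
});
form.on('file', function (field, file) {
  // recordingInfo holds the parsed metadata once the 'info' field has arrived
});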

Send file to browser from string using socket.io

I am using PDFKit and socket.io in a node.js project to generate a pdf when a user clicks a button on the front end. How do I stream or otherwise send the resulting pdf to the end-user from here? I'd rather avoid saving the file to the file system and then having to delete it later if I can... hoping to stream it somehow.
socket.on('customerRequestPDF', function(){
  doc = new PDFDocument;
  doc.text('Some text goes here', 100, 100);

  //I could do this but would rather avoid it
  doc.write('output.pdf');

  doc.output(function(string) {
    //ok I have the string.. now what?
  });
});
A websocket isn't really the appropriate mechanism to deliver the PDF. Just use a regular HTTP request.
// assuming Express, but works similarly with the vanilla HTTP server
app.get('/pdf/:token/filename.pdf', function(req, res) {
  var doc = new PDFDocument();
  // ...
  doc.output(function(buf) { // as of PDFKit v0.2.1 -- see edit history for older versions
    res.writeHead(200, {
      'Content-Type': 'application/pdf',
      'Cache-Control': 'private',
      'Content-Length': buf.length
    });
    res.end(buf);
  });
});
Now a word of warning: This PDF library is broken. As of version 0.2.1, the output is a proper Buffer, but it uses the deprecated binary string encoding internally instead of Buffers. (Previous versions gave you the binary-encoded string.) From the docs:
'binary' - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node.
This means that when node removes the binary string encoding, the library will stop working.
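If you are on a newer PDFKit (0.3 and later), the document object is itself a readable stream, so a sketch like the following avoids doc.output() entirely and pipes straight into the response (assuming the same Express route as above):
var PDFDocument = require('pdfkit');

app.get('/pdf/:token/filename.pdf', function(req, res) {
  var doc = new PDFDocument();

  res.writeHead(200, {
    'Content-Type': 'application/pdf',
    'Cache-Control': 'private'
  });

  doc.pipe(res);                            // stream the PDF directly into the HTTP response
  doc.text('Some text goes here', 100, 100);
  doc.end();                                // finalizing the document ends the response stream
});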

What encoding should I use to properly generate an ETag with crypto in nodeJS?

In my nodeJS app, I'd like to generate ETags for all the content that I return to the client. I need the ETag to be based off the actual content of the file instead of the date, so that the same file across different node processes has the same ETag.
Right now, I am doing the following:
var fs = require('fs'), crypto = require('crypto');

fs.readFile(pathToFile, function(err, buf){
  var eTag = crypto.createHash('md5').update(buf).digest('hex');
  res.writeHead(200, {'ETag': '"' + eTag + '"', 'Content-Type': contentType});
  res.end(buf);
});
I am not sure what encodings I should be using for the different crypto functions in order to have a proper system in place. Should I be using something other than hex? Should I get the fs.readFile call to return a hex encoded buffer? If so, will doing so impact the content returned to users?
Best, and thanks, Sami
You're doing it fine. There is no reason to encode the file in any special format, and using hex for the output is pretty standard. The requirements, loosely speaking, are:
the same document should always return the same ETag
any changes in the document causes a change in ETag
the ETag data should fit neatly into an HTTP header
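As a follow-up sketch, the same hex digest can be used to answer conditional requests; this assumes the req, res, pathToFile and contentType variables from the snippet above:
var fs = require('fs'), crypto = require('crypto');

fs.readFile(pathToFile, function(err, buf) {
  if (err) throw err;
  var eTag = '"' + crypto.createHash('md5').update(buf).digest('hex') + '"';

  // the client already has this exact content, so answer 304 with no body
  if (req.headers['if-none-match'] === eTag) {
    res.writeHead(304, { 'ETag': eTag });
    return res.end();
  }

  res.writeHead(200, { 'ETag': eTag, 'Content-Type': contentType });
  res.end(buf);
});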
