Converting from Windows-1255 to UTF-8 in Node JS


I'm extracting text from a Windows-1255-encoded webpage using Node.js, and I'm trying to decode the text with the windows-1255 package.
After installing it with npm and requiring it in the relevant file, I tried using it like this:
var title = windows1255.decode($('#title').text());
This doesn't seem to have any effect. Any idea why?
Thanks!
Morgan

I don't know if you're still waiting for an answer about this issue, but the following worked for me.
When fetching the data (a file), I set the encoding option of the GET request to 'binary':
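var request = require('request');
var encoding = require('encoding'); // npm package providing encoding.convert()
var options = {
method: 'GET',
url: 'myURL',
encoding: 'binary'
};
request(options, function (error, response, body) {
if (error) { return console.error(error); }
// deal with Hebrew encoding: convert from CP1255 to UTF-8
var csvString = encoding.convert(body, 'UTF8', 'CP1255').toString();
});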
Then I convert the body from CP1255 (= windows-1255) to UTF-8.
Hope it helps :)
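For comparison, Node's built-in TextDecoder can decode Windows-1255 bytes directly, assuming a Node build with full ICU (the default for official binaries since Node 13). As in the answer above, the key point is to decode the raw bytes of the response, not text that has already been decoded as UTF-8. A minimal sketch; the helper name decodeWindows1255 is hypothetical:

```javascript
// Minimal sketch: decode raw Windows-1255 bytes into a JavaScript string.
// Assumes a Node build with full ICU (official binaries include it), since
// TextDecoder only supports legacy encodings such as windows-1255 with ICU.
function decodeWindows1255(bytes) {
  // bytes must be the raw response body (a Buffer or Uint8Array),
  // not text that was already decoded as UTF-8.
  const decoder = new TextDecoder('windows-1255');
  return decoder.decode(bytes);
}

// Hebrew letters occupy 0xE0-0xFA in Windows-1255; these bytes spell "shalom".
const raw = Buffer.from([0xf9, 0xec, 0xe5, 0xed]);
const text = decodeWindows1255(raw);

// JavaScript strings are UTF-16 internally; encode as UTF-8 only when
// writing the text out (to a file, a socket, and so on).
const utf8Bytes = Buffer.from(text, 'utf8');
console.log(text, utf8Bytes.length); // שלום 8
```

With request, passing encoding: null makes body a Buffer that can be handed to decodeWindows1255(body) as-is.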

Related

POST request to retrieve pdf in node.js

I am making a POST request to retrieve a pdf. The request works fine if I do it in postman, but I get an empty pdf if I do it through node.js using the request package. Here's my request using the request package:
let body = {
attr1: "attr1",
attr2: "attr2"
}
let opts = {
url: "some_url",
method: "post",
headers: {
"Content-Type": "application/x-www-form-urlencoded",
},
body
}
request(opts).then(pdf => {
console.log(pdf) // prints out the binary version of the pdf file
fs.writeFileSync("testing.pdf", pdf);
});
I use the exact same request parameters in Postman, and there it returns the PDF with the correct content.
Can someone help? Or is the way I am saving my pdf incorrect?
Thanks in advance!

Solution: I had to set encoding: null in the request options, so that request returns the body as a raw Buffer instead of decoding it as a UTF-8 string.

Try
fs.writeFileSync("testing.pdf", pdf, 'binary');
The third argument here tells fs to write binary rather than trying to UTF-8 encode it.

According to the docs, the third parameter is the encoding (a string such as 'binary' or 'utf8') or an options object.
'application/pdf' is a MIME type rather than an encoding, so it is not accepted there; if pdf is already a Buffer, write it without a third argument:
fs.writeFileSync("testing.pdf", pdf);
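The underlying issue in this thread is that a PDF is binary data: if the response body is turned into a UTF-8 string at any point, bytes that are not valid UTF-8 are replaced and the file is corrupted. A small sketch using only Node built-ins; the sample bytes are made up to resemble a PDF header, and the output file goes to the OS temp directory:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Made-up bytes resembling the start of a PDF: "%PDF-1.4\n%" followed by
// high bytes that many real PDFs include to mark the file as binary.
const pdfBytes = Buffer.from([
  0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34, 0x0a, 0x25, 0xe2, 0xe3, 0xcf, 0xd3,
]);

// Lossy path: decoding as UTF-8 turns invalid sequences into U+FFFD,
// so encoding the string again produces different bytes.
const asString = pdfBytes.toString('utf8');
const corrupted = Buffer.from(asString, 'utf8');

// Safe path: keep the body as a Buffer (request's encoding: null does this)
// and write it without any encoding argument.
const outFile = path.join(os.tmpdir(), 'testing.pdf');
fs.writeFileSync(outFile, pdfBytes);
const roundTripped = fs.readFileSync(outFile);

console.log(corrupted.equals(pdfBytes));    // false
console.log(roundTripped.equals(pdfBytes)); // true
```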

NODE Js: Wrong file type generating?

I want to create an mp3 file from words I give it programmatically. I am using the Google Text-To-Speech API to convert the text into .mp3. The code runs fine and generates the file test.mp3, but it is not a valid .mp3 (it looks like an .mp3 file but cannot be opened). Can anyone help me with this?
My code is:
var fs = require('fs');
var request = require('request');
var text = 'Hello World';
var options = {
url: 'http://translate.google.com/translate_tts?ie=UTF-8&q=' + encodeURIComponent(text) + '&tl=en&client=t',
headers: {
'Referer': 'http://translate.google.com/',
'User-Agent': 'stagefright/1.2 (Linux;Android 5.0)'
}
}
request(options)
.pipe(fs.createWriteStream('test.mp3'))

I got the answer:
just change the URL to use + text + instead of + encodeURIComponent(text) +:
url: 'http://translate.google.com/translate_tts?ie=UTF-8&q=' + text + '&tl=en&client=t'
I don't know why encodeURIComponent() is not working, but luckily it's fixed now.
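When a download looks like an .mp3 but will not play, the server has often returned something else, such as an HTML error page. One way to check is to inspect the first bytes of the file. A rough heuristic sketch; the function name sniffAudio is hypothetical, and the checks cover only the common cases (an ID3v2 tag or an MPEG audio frame sync):

```javascript
// Rough heuristic: guess whether a downloaded buffer is really MP3 audio.
// MP3 files usually start with an ID3v2 tag ("ID3") or with an MPEG frame
// sync (11 set bits: 0xFF followed by a byte whose top 3 bits are set).
function sniffAudio(buf) {
  if (buf.length >= 3 && buf.toString('latin1', 0, 3) === 'ID3') {
    return 'mp3 (ID3 tag)';
  }
  if (buf.length >= 2 && buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0) {
    return 'mp3 (frame sync)';
  }
  // Skip leading whitespace and look for markup, which suggests an error page.
  const head = buf.toString('utf8', 0, Math.min(buf.length, 64)).trimStart();
  if (head.startsWith('<')) {
    return 'html or xml (probably an error page)';
  }
  return 'unknown';
}

console.log(sniffAudio(Buffer.from('ID3\x04\x00', 'latin1')));        // mp3 (ID3 tag)
console.log(sniffAudio(Buffer.from([0xff, 0xfb, 0x90, 0x64])));        // mp3 (frame sync)
console.log(sniffAudio(Buffer.from('<!DOCTYPE html><html>', 'utf8'))); // html or xml (probably an error page)
```

For example, calling sniffAudio(fs.readFileSync('test.mp3')) after the download finishes shows what the server actually sent.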

Iconv encoding conversion in Node

I'm using Iconv in Node.js to convert scraped HTML (via request with binary encoding) from SHIFT_JIS to UTF-8:
var Iconv = require('iconv').Iconv;
request({url:url, encoding:'binary'}, function (error, res, html) {
var iconv = new Iconv('SHIFT_JIS', 'UTF-8//TRANSLIT//IGNORE');
var converted = iconv.convert(Buffer.from(html, 'binary')).toString('utf8');
});
The conversion I'm getting back looks like:
é«SnÌ\r\núêXj[J[ÍAVvÉÈèª¿È«³É\r\nå«ÈCpNgð^
While the pre-conversion looks like: ���[�J�b�g����X�j�[�J�[
I tried using encoding:null in the request, but that didn't work either.

The encoding actually works as posted above; the problem was in how the final response was handled outside the request callback.
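Instead of round-tripping through the deprecated 'binary' string encoding, the response can be requested as a Buffer (encoding: null) and decoded once, using the charset the server declares in its Content-Type header. A minimal sketch, assuming a Node build with full ICU so that TextDecoder knows Shift_JIS; the helper name decodeBody and the UTF-8 fallback are illustrative choices, not part of request or iconv:

```javascript
// Minimal sketch: decode a raw response body using the charset from the
// Content-Type header, falling back to UTF-8 when none is declared.
function decodeBody(buf, contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
  const charset = match ? match[1].toLowerCase() : 'utf-8';
  // TextDecoder throws a RangeError for labels it does not recognise.
  return new TextDecoder(charset).decode(buf);
}

// "日本" encoded as Shift_JIS.
const sjisBytes = Buffer.from([0x93, 0xfa, 0x96, 0x7b]);
console.log(decodeBody(sjisBytes, 'text/html; charset=Shift_JIS'));      // 日本
console.log(decodeBody(Buffer.from('caf\u00e9', 'utf8'), 'text/html')); // café
```

Inside a request callback this would be used as decodeBody(html, res.headers['content-type']), with encoding: null set in the options.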

Requestjs returning original language

I'm using request.js and cheerio to capture some text of my site.
The original text is English, and I would like to capture the translated version.
Here's what I have for the request:
request.get({
uri: 'http://immocostablancasofia.com/listing/villa-in-lliber-ref-p01638/?lang=nl',
followAllRedirects: true
}, callback);
It returns the English version instead of the Dutch one.
I also tried using formData, with no luck.

Add options for request:
var options = {
url: 'http://immocostablancasofia.com/listing/villa-in-lliber-ref-p01638/',
headers: {'Accept-Language': 'nl-NL'},
qs: {lang:'nl'}
};
And
request.get(options, callback);

I changed the code by adding headers: {'Accept-Language': 'nl-NL'}, and it works!
request.get({
uri: 'http://immocostablancasofia.com/listing/villa-in-lliber-ref-p01638/?lang=nl',
headers: {'Accept-Language': 'nl-NL'},
followAllRedirects: true
}, callback);
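Both answers combine two signals: the lang query parameter and the Accept-Language header. A small sketch of building those options with Node's built-in URL class; the helper name localizedRequestOptions is hypothetical, and whether a site honours either signal depends on the server:

```javascript
// Hypothetical helper: build request options that ask for a translated page
// both via the ?lang= query parameter and via the Accept-Language header.
function localizedRequestOptions(pageUrl, lang, locale) {
  const url = new URL(pageUrl);
  url.searchParams.set('lang', lang); // replaces any existing lang parameter
  return {
    url: url.toString(),
    headers: { 'Accept-Language': locale },
    followAllRedirects: true, // request option used in the question
  };
}

const opts = localizedRequestOptions(
  'http://immocostablancasofia.com/listing/villa-in-lliber-ref-p01638/',
  'nl',
  'nl-NL'
);
console.log(opts.url);
// http://immocostablancasofia.com/listing/villa-in-lliber-ref-p01638/?lang=nl
console.log(opts.headers['Accept-Language']); // nl-NL
```

The resulting object can be passed to request.get(opts, callback) in the same way as the options in the answers above.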

Module request how to properly retrieve accented characters? � � �

I'm using the request module ("Simplified HTTP request method") to scrape a webpage with accented characters such as á é ó ú ê ã.
I've already tried encoding: 'utf-8' with no success. I'm still getting these ��� characters in the result.
request.get({
uri: url,
encoding: 'utf-8'
// ...
Is there any configuration to fix it?
I don't know if it is a bug, but I filed an issue for this module. No answers yet. :/

Since binary is deprecated, it seems like a better idea to use iconv-lite and decode the response correctly:
var request = require("request"), iconv = require('iconv-lite');
var requestOptions = { encoding: null, method: "GET", uri: "http://something.com"};
request(requestOptions, function(error, response, body) {
var utf8String = iconv.decode(body, "ISO-8859-1"); // body is already a Buffer because of encoding: null
console.log(utf8String);
});
The important part is to set encoding: null on the HTTP request, so the body arrives as a raw Buffer.
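The same effect can be reproduced without any network access: ISO-8859-1 (Latin-1) bytes are not valid UTF-8, so decoding them as UTF-8 produces the replacement characters from the question. Node's Buffer supports Latin-1 natively as 'latin1', so for this one encoding iconv-lite is optional. A minimal sketch with made-up bytes:

```javascript
// "café" encoded as ISO-8859-1: é is the single byte 0xE9.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Wrong: 0xE9 starts a multi-byte UTF-8 sequence that never completes,
// so it is replaced by U+FFFD, the "�" seen in the question.
const wrong = latin1Bytes.toString('utf8');

// Right: decode with the page's real charset. 'latin1' is built into Buffer.
const right = latin1Bytes.toString('latin1');

console.log(JSON.stringify(wrong)); // "caf�"
console.log(right);                 // café
```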

Specify the encoding as utf8, not utf-8. Here is a list of possible encodings for a buffer, from the Node.js documentation:
ascii - for 7-bit ASCII data only. This encoding method is very fast, and will strip the high bit if set.
utf8 - Unicode characters. Many web pages and other document formats use UTF-8.
base64 - Base64 string encoding.
binary - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node.
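To see what these encodings do to the same bytes, here is a short sketch using the UTF-8 bytes of "é" (0xC3 0xA9). In current Node versions 'binary' is kept as an alias for 'latin1', and Buffer recognises both 'utf8' and 'utf-8':

```javascript
// The UTF-8 encoding of "é" is the two bytes 0xC3 0xA9.
const bytes = Buffer.from('\u00e9', 'utf8');

console.log(bytes.toString('utf8'));   // é
console.log(bytes.toString('base64')); // w6k=
console.log(bytes.toString('latin1')); // Ã© (each byte read as one character)
console.log(bytes.toString('binary')); // Ã© ('binary' is an alias for 'latin1')
console.log(bytes.toString('hex'));    // c3a9

// Both spellings are recognised by Buffer.
console.log(Buffer.isEncoding('utf8'), Buffer.isEncoding('utf-8')); // true true
```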

I tried this and it works (Shift_JIS):
var concat = require('concat-stream'),
Iconv = require('iconv').Iconv,
request = require('request');
var conv = new Iconv('Shift_JIS', 'utf8'),
req = request('http://www.alc.co.jp/');
req.pipe(conv);
req.on('error', function() {
console.log('an error occurred');
});
conv.pipe(concat(function(body) {
console.log(body.toString());
}));
https://github.com/request/request/issues/1080#issuecomment-56172161
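Piping into a converter, as above, works chunk by chunk, and a multi-byte character can be split across two chunks, so the decoder has to carry the partial bytes over to the next chunk. A minimal sketch of that idea with the built-in TextDecoder and its stream option, assuming a Node build with full ICU; the helper name decodeChunks is hypothetical, and the chunk boundaries are chosen to cut characters in half:

```javascript
// Minimal sketch: decode a Shift_JIS body that arrives in arbitrary chunks.
// { stream: true } tells TextDecoder to keep an incomplete multi-byte
// sequence buffered until the next call instead of emitting U+FFFD.
function decodeChunks(chunks, encoding) {
  const decoder = new TextDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush whatever is still buffered
}

// "日本" in Shift_JIS is 93 FA 96 7B; split it so both characters are cut.
const chunks = [
  Buffer.from([0x93]),
  Buffer.from([0xfa, 0x96]),
  Buffer.from([0x7b]),
];
console.log(decodeChunks(chunks, 'shift_jis')); // 日本
```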

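Not a direct answer to the OP, but I had a similar problem and this might help someone.
The issue was that the response was gzip-compressed, so it needed to be decompressed first:
var zlib = require('zlib');
var headers = { 'Accept-Encoding': 'gzip' };
request({url: url, headers: headers, encoding: null}, (err, res, body) => {
if (err) return console.error(err);
zlib.gunzip(body, (gzErr, unzipped) => { // body is a Buffer; decompress before decoding
if (gzErr) return console.error(gzErr);
console.log(unzipped.toString());
}); });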
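The same idea as a self-contained sketch using Node's built-in zlib: check for compression first, then decode the text. The helper name decodeResponse is hypothetical; it looks at the Content-Encoding value and also at the gzip magic bytes (0x1F 0x8B). The sample data is compressed locally so the example runs without network access:

```javascript
const zlib = require('zlib');

// Hypothetical helper: undo gzip compression (if any) before decoding text.
function decodeResponse(body, contentEncoding, charset = 'utf8') {
  const looksGzipped = body.length >= 2 && body[0] === 0x1f && body[1] === 0x8b;
  const raw = contentEncoding === 'gzip' || looksGzipped ? zlib.gunzipSync(body) : body;
  return raw.toString(charset);
}

// Simulate a gzip-compressed response body containing accented characters.
const compressed = zlib.gzipSync(Buffer.from('ação é ótima', 'utf8'));

console.log(decodeResponse(compressed, 'gzip'));                     // ação é ótima
console.log(decodeResponse(Buffer.from('olá', 'utf8'), undefined)); // olá
```

request also has a gzip: true option that asks for compressed content and decodes it automatically, which avoids handling this by hand.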
