I'm using a Latin1-encoded DB and can't change it to UTF-8, which means I run into issues with certain application data. I'm using Tesseract to OCR a document (Tesseract encodes its output in UTF-8) and tried to use iconv-lite; however, it creates a buffer, and I then have to convert that buffer into a string. But again, buffer-to-string conversion does not seem to allow "latin1" encoding.
I've read a bunch of questions/answers; however, all I get is setting client encoding and stuff like that.
Any ideas?
Since Node.js v7.1.0, you can use the transcode function from the buffer module:
https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc
For example:
const buffer = require('buffer');

// re-encode the UTF-8 bytes as Latin-1 bytes
const latin1Buffer = buffer.transcode(Buffer.from(utf8String), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
You can create a buffer from the UTF-8 string you have, and then decode that buffer to Latin-1 using iconv-lite, like this:
var iconv = require('iconv-lite');

var buff = Buffer.from(tesseract_string, 'utf8');
var DB_str = iconv.decode(buff, 'ISO-8859-1');
I've found a way to convert any encoded text file to UTF-8:
var fs = require('fs'),
    charsetDetector = require('node-icu-charset-detector'),
    iconvlite = require('iconv-lite');

/* Having different encodings
 * on text files in a git repo,
 * but we need to always serve
 * standard 'utf-8'
 */
function getFileContentsInUTF8(file_path) {
    var content = fs.readFileSync(file_path);
    var original_charset = charsetDetector.detectCharset(content);
    var jsString = iconvlite.decode(content, original_charset.toString());
    return jsString;
}
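For example, to get the contents of any text file as a normal JavaScript string (the path here is just a placeholder):

var text = getFileContentsInUTF8('./some-file.txt');
console.log(text); // decoded from whatever charset the detector found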
It's also in a gist here: https://gist.github.com/jacargentina/be454c13fa19003cf9f48175e82304d5
Maybe you can try this, where content should be your database buffer data (in Latin-1 encoding).
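A minimal sketch of that idea, where db_buffer stands in for whatever Buffer your database driver returns:

var iconvlite = require('iconv-lite');

// db_buffer is hypothetical; substitute your driver's raw result
var text = iconvlite.decode(db_buffer, 'latin1');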
I need to read a .csv file, get the data in .json format, and work with it.
I'm using the npm package convert-csv-to-json. As a result, Cyrillic symbols aren't displayed properly:
const csvToJson = require('convert-csv-to-json');
let json = csvToJson.fieldDelimiter(',').getJsonFromCsv("input.csv");
console.log(json);
Result: the Cyrillic symbols come out garbled.
If I try to decode the file:
const csvToJson = require('convert-csv-to-json');
let json = csvToJson.asciiEncoding().fieldDelimiter(',').getJsonFromCsv("input.csv");
console.log(json);
the result is still not displayed correctly.
When I open the .csv file using AkelPad or Notepad++, it displays as it should, and the detected encoding is Windows-1251 (ANSI, Cyrillic).
Is there a way to read the file with the proper encoding, or to decode the resulting string?
Try using UTF-8 encoding instead of ASCII.
That is, change
let json = csvToJson.asciiEncoding().fieldDelimiter(',').getJsonFromCsv("input.csv");
to
let json = csvToJson.utf8Encoding().fieldDelimiter(',').getJsonFromCsv("input.csv");
Here is code that solves the problem:
const fs = require('fs');
const iconv = require('iconv-lite');
const Papa = require('papaparse');

// read the csv file into a buffer
const buffer = fs.readFileSync("input.csv");

// decode the buffer to a string using the file's actual encoding
let dataString = iconv.decode(buffer, 'win1251');

// parse the string into an array of objects
let config = {
    header: true
};
const parsedOutput = Papa.parse(dataString, config);
console.log('parsedOutput: ', parsedOutput);
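With header: true, parsedOutput.data is an array of row objects keyed by the CSV header fields, so you can access it like this:

console.log(parsedOutput.data[0]); // the first data row as an object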
Hi, I am trying to read a file, and I am having trouble with the FileReader readAsArrayBuffer function in Node.js.
var fs = require("fs");
var forge = require("node-forge");
var FileReader = require("filereader");

let p12_path = __dirname + "/file.p12";
var p12xxx = fs.readFileSync(p12_path, "utf-8");

var reader = new FileReader();
reader.readAsArrayBuffer(p12xxx); // The problem is here
reader.onloadend = function() {
    arrayBuffer = reader.result;
    var arrayUint8 = new Uint8Array(arrayBuffer);
    var p12B64 = forge.util.binary.base64.encode(arrayUint8);
    var p12Der = forge.util.decode64(p12B64);
    var p12Asn1 = forge.asn1.fromDer(p12Der);
    ............
};
The error:
Error: cannot read as File: "0�6�\.............
You are reading a .p12 (PKCS#12) file, which is a binary format, not text, so you should not specify an encoding. As per the fs docs, "If the encoding option is specified then this function returns a string", but because this is mostly binary data, reading it as UTF-8 produces invalid characters. When you exclude the encoding, it gives you a Buffer object instead, which is what you most likely want.
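A minimal sketch of that fix; the forge calls mirror the ones in the question, and fromDer accepts a binary string just as it does with the decode64 result there:

var fs = require("fs");
var forge = require("node-forge");

// no encoding argument, so readFileSync returns a raw Buffer
var p12Buffer = fs.readFileSync(__dirname + "/file.p12");

// hand the raw bytes to forge as a binary string
var p12Asn1 = forge.asn1.fromDer(p12Buffer.toString("binary"));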
According to the npm filereader docs, what you pass to the reader needs to be a file in UTF-8 encoding; otherwise it cannot read it.
The printed "0�6�\............. shows the file is obviously not UTF-8 and is therefore not readable this way.
I'm using node-gd to process images, but I'd like to do a few things before saving them to the disk. Right now I save the file with the .savePng() and .saveJpeg() functions.
I'd like to convert it to a stream which can be piped to an FS stream.
I tried the module streamifier because it sounds like it would do what I need, but when running the code below, the exported image is unreadable (though the same size as the one saved via node-gd).
Here is what I attempted to do:
var gd = require("node-gd");
var fs = require("fs");
const streamifier = require('streamifier');
var inputImage = gd.createFromPng('input.png');
var writeStream = fs.createWriteStream('output.png');
var pngstream = inputImage.pngPtr();
streamifier.createReadStream(pngstream).pipe(writeStream);
Is there something I'm missing?
The png pointer needs to first be converted to a buffer, like so:
var pngstream = Buffer.from(inputImage.pngPtr(), 'binary');
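Putting it together with the code from the question:

var gd = require("node-gd");
var fs = require("fs");
const streamifier = require('streamifier');

var inputImage = gd.createFromPng('input.png');
var writeStream = fs.createWriteStream('output.png');

// pngPtr() returns a binary string; wrapping it in a Buffer keeps the
// raw bytes intact instead of having the stream re-encode them as UTF-8
var pngstream = Buffer.from(inputImage.pngPtr(), 'binary');
streamifier.createReadStream(pngstream).pipe(writeStream);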
Conversely, how can I convert the binary data back into an image? The image data saved in the backend is stored as binary.
Try this:
var fs = require("fs");

fs.readFile('image.jpg', function(err, data) {
    if (err) throw err;
    // Encode to base64 (data is already a Buffer)
    var encodedImage = Buffer.from(data).toString('base64');
    // Decode from base64 back to a Buffer of raw bytes
    var decodedImage = Buffer.from(encodedImage, 'base64');
});
Hope it will be useful for you.
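If you then want the decoded bytes back on disk as an image, you can write the Buffer straight out inside the same callback (copy.jpg is just a placeholder name):

fs.writeFile('copy.jpg', decodedImage, function (err) {
    if (err) throw err;
});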
You can also do it by using fs.createReadStream instead of a Buffer; note that it is the new Buffer() constructor that is deprecated, not the Buffer class itself (use Buffer.from() instead).
Find more info about the differences in https://medium.com/tensult/stream-and-buffer-concepts-in-node-js-87d565e151a0
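A minimal sketch of the streaming approach (file names are placeholders):

const fs = require('fs');

// copy an image without ever holding the whole file in memory
fs.createReadStream('image.jpg').pipe(fs.createWriteStream('copy.jpg'));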
If you want a solution for reading files (images included, obviously) and converting them to binary, I wrote a small snippet in Node.js; have a look, I hope it helps you out. It reads a file into a binary string, but you can of course convert that string to an array or byte array. If you get stuck here, please let me know in the comments below.
Here is a simple yet robust snippet you can try.
params format:
getBinary({
    path: '<file_relative_path>',
    padlength: '<prepending_padding_length>', // Default: 4
    debug: false,                             // Default: true
    limit: 10,                                // Default: full file length
    putSpacing: Boolean                       // Default: false
})
Params description:
1. path: specifies the relative path of the file to be read.
2. padlength: the file is read as hex digits, and each digit is rendered as a binary number without leading zeros (ex: hex(f): 1111, hex(0): 0), so if you need a uniform-length binary string the digits are zero-filled, as hex(0): 0000 when padlength is 4.
3. limit: limits how much of the read buffer is rendered.
4. putSpacing: if true, puts a space after each padlength-bit group.
or
getBinary('<file_relative_path>');
Get it here: https://computopedia.com/how-to-convert-image-to-binary-nodejs/
Gist: https://gist.github.com/shankha96/cffe620776066078289ea1f8b15956e0
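The idea boils down to something like this minimal sketch (a hypothetical re-implementation, not the code from the gist):

const fs = require('fs');

// render a file as a binary string, one zero-filled group per hex digit
function getBinary(filePath, padlength = 4, putSpacing = false) {
    const hex = fs.readFileSync(filePath).toString('hex');
    const groups = [...hex].map(function (h) {
        return parseInt(h, 16).toString(2).padStart(padlength, '0');
    });
    return groups.join(putSpacing ? ' ' : '');
}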
I have a relatively small file (a few hundred kilobytes) that I want to keep in memory for direct access for the entire execution of the code.
I don't know the internals of Node.js exactly, so I'm asking: is an fs open enough, or do I have to read the whole file and copy it into a Buffer?
Basically, you need to use the readFile or readFileSync function from the fs module. They return the complete content of the given file, but differ in their behavior (asynchronous versus synchronous).
If blocking Node.js (e.g. on startup of your application) is not an issue, you can go with the synchronous version, which is as easy as:
var fs = require('fs');
var data = fs.readFileSync('/etc/passwd');
If you need to go asynchronous, the code looks like this:
var fs = require('fs');
fs.readFile('/etc/passwd', function (err, data) {
    // ...
});
Please note that in either case you can give an options object as the second parameter, e.g. to specify the encoding to use. If you omit the encoding, the raw buffer is returned:
var fs = require('fs');
fs.readFile('/etc/passwd', { encoding: 'utf8' }, function (err, data) {
    // ...
});
Valid encodings are utf8, ascii, utf16le, ucs2, base64 and hex. There is also a binary encoding, but it is deprecated and should not be used any longer. You can find more details on how to deal with encodings and buffers in the appropriate documentation.
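A quick sketch contrasting the two return types:

const fs = require('fs');

const asBuffer = fs.readFileSync('/etc/passwd');         // Buffer
const asString = fs.readFileSync('/etc/passwd', 'utf8'); // string
console.log(Buffer.isBuffer(asBuffer), typeof asString); // true 'string'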
As easy as
var buffer = fs.readFileSync(filename);
It's also possible to do this synchronously:
var fs = require('fs');
var path = require('path');
// Buffer mydata
var BUFFER = bufferFile('../public/mydata');
function bufferFile(relPath) {
    return fs.readFileSync(path.join(__dirname, relPath)); // blocks until the file is read
}
fs is the file system module. readFileSync() returns a Buffer, or a string if you ask for one.
fs resolves relative paths against the current working directory, not your module's directory, so joining with __dirname via path is the work-around.
To load as a string, specify the encoding:
return fs.readFileSync(path.join(__dirname, relPath), { encoding: 'utf8' });