Cannot read Cyrillic symbols from a .csv file - node.js

I need to read a .csv file, get the data in .json format, and work with it.
I'm using the npm package convert-csv-to-json. The problem: the Cyrillic symbols aren't displayed properly:
const csvToJson = require('convert-csv-to-json');
let json = csvToJson.fieldDelimiter(',').getJsonFromCsv("input.csv");
console.log(json);
Result: garbled characters appear in place of the Cyrillic text.
If I try to decode the file with an explicit encoding:
const csvToJson = require('convert-csv-to-json');
let json = csvToJson.asciiEncoding().fieldDelimiter(',').getJsonFromCsv("input.csv");
console.log(json);
the result is still garbled.
When I open the .csv file using AkelPad or Notepad++, it displays as it should, and the detected format is Win-1251 (ANSI, Cyrillic).
Is there a way to read the file with the proper encoding, or to decode the resulting string?

Try using UTF-8 encoding instead of ASCII.
To do that, change
let json = csvToJson.asciiEncoding().fieldDelimiter(',').getJsonFromCsv("input.csv");
to
let json = csvToJson.utf8Encoding().fieldDelimiter(',').getJsonFromCsv("input.csv");

This code solves the problem:
const fs = require('fs');
const iconv = require('iconv-lite');
const Papa = require('papaparse');
// read the csv file as a raw buffer
const buffer = fs.readFileSync("input.csv");
// decode the Win-1251 bytes into a JS string
const dataString = iconv.decode(buffer, 'win1251');
// parse the string into an array of objects, using the first row as headers
const config = {
    header: true
};
const parsedOutput = Papa.parse(dataString, config);
console.log('parsedOutput: ', parsedOutput);
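If you'd rather avoid extra dependencies, Node's built-in TextDecoder can also decode Windows-1251. This is a minimal sketch, assuming a Node.js build with full ICU (the default since Node 13), which is what makes the 'windows-1251' label available; the sample bytes are just an illustration:

```javascript
// Decode Win-1251 bytes without third-party packages.
// Assumes a Node.js build with full ICU (the default since Node 13).
const win1251Bytes = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]); // "Привет" in Win-1251
const text = new TextDecoder('windows-1251').decode(win1251Bytes);
console.log(text); // Привет
```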

Related

Getting error while reading json file using node.js

I am getting the following error while reading a JSON file using Node.js. My code is explained below.
SyntaxError: Unexpected token # in JSON at position 0
at JSON.parse (<anonymous>)
My JSON file is given below.
test.json:
#PATH:/test/
#DEVICES:div1
#TYPE:p1
{
name:'Raj',
address: {
city:'bbsr'
}
}
This JSON file has some lines starting with #. I need to remove those # lines from the file. My code is explained below.
fs.readdirSync(`${process.env['root_dir']}/uploads/${fileNameSplit[0]}`).forEach(f => {
console.log('files', f);
let rawdata = fs.readFileSync(`${process.env['root_dir']}/uploads/${fileNameSplit[0]}/${f}`);
let parseData = JSON.parse(rawdata);
console.log(parseData);
});
Here I am trying to read the file first, but I get the above error. I need to remove those # lines from the JSON file, then read all the data, and convert the removed lines into an object like const obj = {PATH:'/test/', DEVICES:'div1', TYPE:'p1'}. I am using the Node.js fs module to achieve this.
As you said, you need to remove those # lines from the JSON file. You need to code this yourself. To help with that, read the file into a string and not a Buffer by providing a charset to readFileSync.
const text = fs.readFileSync(path, 'utf8');
console.log(text);
const arr = text.split("\n");
const noComments = arr.filter(x => x[0] !== "#");
const filtered = noComments.join("\n");
const data = JSON.parse(filtered);
console.log(data);
Note that JSON.parse will only succeed if the remaining content is valid JSON; in your example the keys and string values would also need double quotes.
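The question also asks to turn the removed # lines into an object like const obj = {PATH:'/test/', DEVICES:'div1', TYPE:'p1'}. A sketch of that step, assuming each header line has the form #KEY:value:

```javascript
// the "#KEY:value" header lines from the question's test.json
const lines = ['#PATH:/test/', '#DEVICES:div1', '#TYPE:p1'];

// build an object from the header lines; everything after the first ':' is the value
const headers = {};
for (const line of lines) {
  const idx = line.indexOf(':');
  headers[line.slice(1, idx)] = line.slice(idx + 1);
}
console.log(headers); // { PATH: '/test/', DEVICES: 'div1', TYPE: 'p1' }
```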

Reading file using Node.js "Invalid Encoding" Error

I am creating an application with Node.js and I am trying to read a file called "datalog.txt." I use the "append" function to write to the file:
//Appends buffer data to a given file
function append(filename, buffer) {
let fd = fs.openSync(filename, 'a+');
fs.writeSync(fd, str2ab(buffer));
fs.closeSync(fd);
}
//Converts string to buffer
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i < strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
append("datalog.txt","12345");
This seems to work great. However, now I want to use fs.readFileSync to read from the file. I tried using this:
const data = fs.readFileSync('datalog.txt', 'utf16le');
I changed the encoding parameter to all of the encoding types listed in the Node documentation, but all of them resulted in this error:
TypeError: Argument at index 2 is invalid: Invalid encoding
All I want to be able to do is be able to read the data from "datalog.txt." Any help would be greatly appreciated!
NOTE: Once I can read the data of the file, I want to be able to get a list of all the lines of the file.
The encoding has to be passed inside an options object:
const data = fs.readFileSync('datalog.txt', {encoding:'utf16le'});
Okay, after a few hours of troubleshooting and looking at the docs, I figured out a way to do this.
try {
    // get metadata on the file (we need the file size)
    let fileData = fs.statSync("datalog.txt");
    // create a typed array to hold the file contents
    // (fs.readSync needs a Buffer or TypedArray, not a bare ArrayBuffer)
    let dataBuffer = new Uint8Array(fileData.size);
    // read the contents of the file into the typed array
    fs.readSync(fs.openSync("datalog.txt", 'r'), dataBuffer, 0, fileData.size, 0);
    // interpret the bytes as UTF-16 code units and build a string
    let data = String.fromCharCode.apply(null, new Uint16Array(dataBuffer.buffer));
    // split the contents into lines
    let dataLines = data.split(/\r?\n/);
    // print out each line
    dataLines.forEach((line) => {
        console.log(line);
    });
} catch (err) {
    console.error(err);
}
Hope it helps someone else with the same problem!
This works for me:
index.js
const fs = require('fs');
// Write
fs.writeFileSync('./customfile.txt', 'Content_For_Writing');
// Read
const file_content = fs.readFileSync('./customfile.txt', {encoding:'utf8'}).toString();
console.log(file_content);
node index.js
Output:
Content_For_Writing
Process finished with exit code 0

Cannot read File from fs with FileReader

Hi, I am trying to read a file and I am having trouble with the FileReader readAsArrayBuffer function in Node.js.
var FileReader = require("filereader");
let p12_path = __dirname + "/file.p12";
var p12xxx = fs.readFileSync(p12_path, "utf-8");
var reader = new FileReader();
reader.readAsArrayBuffer(p12xxx);//The problem is here
reader.onloadend = function() {
arrayBuffer = reader.result;
var arrayUint8 = new Uint8Array(arrayBuffer);
var p12B64 = forge.util.binary.base64.encode(arrayUint8);
var p12Der = forge.util.decode64(p12B64);
var p12Asn1 = forge.asn1.fromDer(p12Der);
............
}
The error:
Error: cannot read as File: "0�6�\.............
You are reading a PKCS#12 (.p12) file, which is not a text-based format and should not have an encoding specified. As per the fs docs, "If the encoding option is specified then this function returns a string", but because it is mostly a binary file you get invalid UTF-8 characters. When you exclude the encoding, it gives you a Buffer object instead, which is what you most likely want.
According to the npm filereader docs, fs.readFileSync(p12_path, "utf-8") expects the file at that path to actually contain UTF-8 encoded text, otherwise it cannot be read correctly.
The printed "0�6�\............. shows the file is obviously not UTF-8 and therefore not readable that way.

How to give a file name as input in babyparse

I am trying to use babyparse to parse a CSV file, but I get the output below when I pass it a file name.
The file and the code are in the same directory.
My code:
var Papa = require('babyparse');
var fs = require('fs');
var file = 'test.csv';
Papa.parse(file,{
step: function(row){
console.log("Row: ", row.data);
}
});
Output:
Row: [ [ 'test.csv' ] ]
file must be a File object: http://papaparse.com/docs#local-files. In nodejs, you should use the fs API to load the content of the file and then pass it to PapaParse: https://nodejs.org/api/fs.html#fs_fs_readfilesync_filename_options
var Papa = require('babyparse');
var fs = require('fs');
var file = 'test.csv';
var content = fs.readFileSync(file, { encoding: 'binary' });
Papa.parse(content, {
step: function(row){
console.log("Row: ", row.data);
}
});
The encoding option is important: setting it to binary works for any text/CSV file, and you could also set it to utf8 if your file is Unicode.

Converting a string from utf8 to latin1 in NodeJS

I'm using a Latin-1 encoded DB and can't change it to UTF-8, which means I run into issues with certain application data. I'm using Tesseract to OCR a document (Tesseract outputs UTF-8) and tried to use iconv-lite; however, it produces a buffer, and I then need to convert that buffer into a string. But again, buffer-to-string conversion does not allow "latin1" encoding.
I've read a bunch of questions/answers; however, all I get is setting client encoding and stuff like that.
Any ideas?
Since Node.js v7.1.0, you can use the transcode function from the buffer module:
https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc
For example:
const buffer = require('buffer');
const latin1Buffer = buffer.transcode(Buffer.from(utf8String), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
You can create a buffer from the UTF-8 string you have, and then decode that buffer to Latin-1 using iconv-lite, like this:
var iconv = require('iconv-lite');
var buff = Buffer.from(tesseract_string, 'utf8');
var DB_str = iconv.decode(buff, 'ISO-8859-1');
I've found a way to convert any encoded text file to UTF-8:
const fs = require('fs');
const charsetDetector = require('node-icu-charset-detector');
const iconvlite = require('iconv-lite');
/* Having different encodings
* on text files in a git repo
* but need to serve always on
* standard 'utf-8'
*/
function getFileContentsInUTF8(file_path) {
var content = fs.readFileSync(file_path);
var original_charset = charsetDetector.detectCharset(content);
var jsString = iconvlite.decode(content, original_charset.toString());
return jsString;
}
It's also in a gist here: https://gist.github.com/jacargentina/be454c13fa19003cf9f48175e82304d5
Maybe you can try this, where content should be your database buffer data (in Latin-1 encoding).
