I need to read an .xlsx file in Node.js. The file contains text with accents, apostrophes, and so on. Then I have to save the text in a JSON file.
What are the best practices for this task?
Stage 1 - take a look at the node-xlsx module, or at xlsx, which is more robust and possibly better for your needs.
Stage 2 - writing the data to JSON: if the module can return JSON directly, great. If you use xlsx, it has a built-in option to convert sheets to JSON --> take a look here.
Since you may need to strip and/or preserve special accents etc., you may want to validate the returned data before producing the JSON file.
As for actually writing a JSON file, there are a huge number of npm modules for the task.
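A minimal sketch of the whole flow, assuming the xlsx (SheetJS) package and that the first sheet holds the data; input.xlsx and output.json are placeholder names:

const XLSX = require('xlsx');
const fs = require('fs');

// Read the workbook and take its first sheet.
const workbook = XLSX.readFile('input.xlsx');
const sheet = workbook.Sheets[workbook.SheetNames[0]];

// sheet_to_json maps each row to an object keyed by the header row.
const rows = XLSX.utils.sheet_to_json(sheet);

// Writing with the 'utf8' encoding keeps accents and apostrophes intact.
fs.writeFileSync('output.json', JSON.stringify(rows, null, 2), 'utf8');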
We want JSON format from the binary content of a PDF file using Node.js.
We are getting the binary content of the PDF from a third-party API response, and we will save it in our database, so we are looking for working code to convert the binary PDF data into a JSON format.
In simple words: we want to pass in binary data and get JSON data back.
The JSON format doesn't natively support binary data.
Use Base64 or Base85.
I think the best you can do space-wise is Base85, which represents four bytes as five characters. However, this is only about a 7% improvement over Base64, it's more expensive to compute, and implementations are less common than for Base64, so it's probably not a win.
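A minimal sketch of the Base64 route in Node.js; the third-party API call itself is omitted, and document.pdf is a placeholder standing in for the binary response body:

const fs = require('fs');

// Placeholder for the binary PDF received from the third-party API.
const pdfBuffer = fs.readFileSync('document.pdf');

// Encode the bytes as a Base64 string so they survive inside JSON.
const payload = JSON.stringify({
  filename: 'document.pdf',
  contentType: 'application/pdf',
  data: pdfBuffer.toString('base64'),
});

// To recover the original bytes later:
const decoded = Buffer.from(JSON.parse(payload).data, 'base64');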
For my research I have a dataset of about 20,000 gzipped multiline JSON files (~2TB, all with the same schema). I need to process and clean this data (I should say I'm very new to data analytics tools).
After spending a few days reading about Spark and Apache Beam, I'm convinced that the first step would be to convert this dataset to NDJSON. Most books and tutorials assume you are working with a newline-delimited file.
What is the best way to go about converting this data?
I've tried launching a large instance on gcloud and using gunzip and jq to do this. Not surprisingly, it seems this will take a long time.
Thanks in advance for any help!
Apache Beam supports decompressing files if you use TextIO.
But the delimiter remains the newline character.
For multiline JSON, you can read each complete file in parallel, convert the JSON string to a POJO, and then reshuffle the data to utilize parallelism.
So the steps would be
Get the file list > Read individual files > Parse file content to JSON objects > Reshuffle > ...
You can get the file list with FileSystems.match("gs://my_bucket/*").metadata().
You can read an individual file with Compression.detect(fileResourceId.getFilename()).readDecompressed(FileSystems.open(fileResourceId)).
Converting to NDJSON is not necessary if you use Spark's sc.wholeTextFiles. Point this method at a directory, and you'll get back an RDD[(String, String)] where ._1 is the filename and ._2 is the content of the file.
I want to compare the data in two .csv files and find the updated data between them using Node.js.
Is there any possibility to do this in Node.js?
Thanks, I am a newbie to this.
It will be easiest using one of the following modules:
https://www.npmjs.com/package/csv
https://www.npmjs.com/package/tsv
or others that you can find in:
https://www.npmjs.com/browse/keyword/csv
https://www.npmjs.com/browse/keyword/tsv
(Don't worry whether it's CSV or TSV - just make sure you use the correct delimiter, which is a comma in your case.)
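A minimal sketch of the comparison itself, assuming the csv-parse parser (part of the csv package above, with the sync entry point of recent versions), that both files fit in memory, and that they share the same header row; old.csv and new.csv are placeholder names:

const fs = require('fs');
const { parse } = require('csv-parse/sync');

// Parse a CSV file into an array of objects keyed by the header row.
const parseFile = (path) =>
  parse(fs.readFileSync(path, 'utf8'), { columns: true });

const oldRows = parseFile('old.csv');
const newRows = parseFile('new.csv');

// Report rows in new.csv that don't appear verbatim in old.csv.
const seen = new Set(oldRows.map((row) => JSON.stringify(row)));
const updated = newRows.filter((row) => !seen.has(JSON.stringify(row)));

console.log(updated);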
I'm trying to receive some data in CSV format. From what I've read, StrongLoop only works with JSON data. So can I receive CSV and transform it to JSON to process the data?
Thanks.
This isn't a StrongLoop specific question. It is a general Node.js and data question. As such, I will answer in a generic fashion, but it is applicable to StrongLoop.
You will need to use a library to convert the delimited file into a JavaScript object. There are many packages on npm for reading/parsing/transforming/etc. CSV files: search npm.
The package that I have used extensively is David's CSV parser.
These libraries will allow you to parse and transform CSV into JavaScript objects (JSON).
Beware, however, that most CSV I have dealt with is not well formed: quotes aren't properly escaped, strings containing delimiters aren't quoted, etc.
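As a minimal sketch, here is a streaming CSV-to-JSON conversion with csv-parse (David's parser mentioned above); data.csv and data.json are placeholder names, and the relax_quotes option for tolerating loosely formatted input is an assumption - check the option names against the version you install:

const fs = require('fs');
const { parse } = require('csv-parse');

const records = [];
fs.createReadStream('data.csv')
  // columns: true keys each record by the header row;
  // relax_quotes (an assumed option) tolerates stray quotes in bad input.
  .pipe(parse({ columns: true, relax_quotes: true, trim: true }))
  .on('data', (record) => records.push(record))
  .on('end', () => {
    fs.writeFileSync('data.json', JSON.stringify(records, null, 2));
  });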
I am trying to work with Paradox files and convert them to an Excel file.
Does anyone know how to achieve such a conversion?
I wrote a small Python script to read Paradox .DB files.
But please be careful, it's not complete: some field types may not be converted (only memos AFAIK, but I'm not a Paradox expert).
https://gist.github.com/BertrandBordage/9892556
You can either read a .DB file as Python objects using paradox.read('your_file.DB') or convert it to a CSV file using paradox.to_csv('your_file.DB'). The resulting CSV can then be opened or imported directly in Excel.