Node.js: best way to convert an input file to JSON format

I have an input file.txt in the following format:
Order: Order1
from customerA to customerB
orderItemA
orderItemB
Order: Order2
from customerC to customerD
orderItemC
orderItemD
I want to convert this whole input into JSON and then start processing it. At the moment all I can think of is to read it line by line and process as I go, but is there a Node package or an alternative way to read the whole file in one go and convert it to JSON? I found https://www.npmjs.com/package/plain-text-data-to-json but it did not work for my format.
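For what it's worth, a minimal line-by-line parser for the format shown above. The field names (name, from, to, items) are my own choice for illustration, not anything mandated by the input:

const fs = require('fs');

// Parse the order file into an array of order objects.
function parseOrders(path) {
  const lines = fs.readFileSync(path, 'utf8').split(/\r?\n/).filter(Boolean);
  const orders = [];
  let current = null;
  for (const line of lines) {
    if (line.startsWith('Order:')) {
      // "Order: Order1" starts a new order
      current = { name: line.slice('Order:'.length).trim(), from: null, to: null, items: [] };
      orders.push(current);
    } else if (current && line.startsWith('from ')) {
      // "from customerA to customerB"
      const m = line.match(/^from (\S+) to (\S+)/);
      if (m) { current.from = m[1]; current.to = m[2]; }
    } else if (current) {
      // anything else is an order item
      current.items.push(line.trim());
    }
  }
  return orders;
}

console.log(JSON.stringify(parseOrders('file.txt'), null, 2));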

Related

How can I convert a Pyspark dataframe to a CSV without sending it to a file?

I have a dataframe which I need to convert to a CSV file, and then I need to send this CSV to an API. As I'm sending it to an API, I do not want to save it to the local filesystem and need to keep it in memory. How can I do this?
Easy way: convert your dataframe to a Pandas dataframe with toPandas(), then save it to a string. To save to a string rather than a file, call to_csv with path_or_buf=None. Then send the string in an API call.
From to_csv() documentation:
Parameters
path_or_buf : str or file handle, default None
File path or object, if None is provided the result is returned as a string.
So your code would likely look like this:
csv_string = df.toPandas().to_csv(path_or_buf=None)
Alternatives: use tempfile.SpooledTemporaryFile with a large buffer to create an in-memory file. Or you can even use a regular file; just make your buffer large enough and don't flush or close the file. Take a look at Corey Goldberg's explanation of why this works.

Converting 2TB of gzipped multiline JSONs to NDJSON

For my research I have a dataset of about 20,000 gzipped multiline JSON files (~2TB, all with the same schema). I need to process and clean this data (I should say I'm very new to data analytics tools).
After spending a few days reading about Spark and Apache Beam, I'm convinced that the first step is to convert this dataset to NDJSON. Most books and tutorials assume you are working with some newline-delimited file.
What is the best way to go about converting this data?
I've tried just launching a large instance on gcloud and using gunzip and jq to do this. Not surprisingly, it seems this will take a long time.
Thanks in advance for any help!
Apache Beam supports decompressing files if you use TextIO, but the delimiter remains the newline.
For multiline JSON you can instead read each complete file in parallel, convert the JSON string to a POJO, and finally reshuffle the data to utilize parallelism.
So the steps would be
Get the file list > Read individual files > Parse file content to JSON objects > Reshuffle > ...
You can get the file list with FileSystems.match("gs://my_bucket/**").metadata().
Read individual files with Compression.detect(resourceId.getFilename()).readDecompressed(FileSystems.open(resourceId)).
Converting to NDJSON is not necessary if you use sc.wholeTextFiles. Point this method at a directory, and you'll get back an RDD[(String, String)] where ._1 is the filename and ._2 is the content of the file.

Convert text content to JSON using the CSV node in Node-RED?

Can anyone help me with passing multi-line input to a CSV node, where each line has different columns/elements?
I have a text file with below content
H|1|2|3|$|4|4
D|3|4|5
D|4|4|6
D|2|3|4
How can I pass different column names for the H (header) and D (detail) lines to the CSV node that generates the JSON?
Thanks in advance!!
The best you can do is to use a switch node to split lines based on the first character and then run those lines through two separate CSV nodes.
The two CSV nodes can then each hold the column names for their line type.
The switch node should be configured something like this: test msg.payload, routing messages that start with H to the first output and messages that start with D to the second.
You can then join the two legs of the flow back together to consume the messages coming out of the CSV nodes.
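Alternatively, if you would rather do the whole thing in code instead of switch + CSV nodes, a Function node can parse the pipe-delimited lines directly. A minimal sketch; the column names below are purely illustrative assumptions:

// Function node: split pipe-delimited H/D lines into objects.
// The column names are placeholders; substitute your real ones.
const headerCols = ['recordType', 'f1', 'f2', 'f3', 'sep', 'f4', 'f5'];
const detailCols = ['recordType', 'd1', 'd2', 'd3'];

const toObject = (cols, fields) =>
    Object.fromEntries(cols.map((name, i) => [name, fields[i]]));

msg.payload = msg.payload
    .split(/\r?\n/)
    .filter(Boolean)
    .map(line => {
        const fields = line.split('|');
        return fields[0] === 'H'
            ? toObject(headerCols, fields)
            : toObject(detailCols, fields);
    });

return msg;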

Node.js best way to read xlsx as UTF-8 text

I need to read an xlsx file in Node.js. The xlsx contains text with accents, apostrophes, and so on. I then have to save the text to a JSON file.
What are the best practices to perform that task?
Stage 1 - take a look at the node-xlsx module, or the more robust and possibly better suited xlsx.
Stage 2 - writing the data to JSON: if the module can return a JSON format, great. If you use xlsx, it has an option to convert sheets to JSON - take a look here.
Since you may need to actually strip and/or protect special accents etc., you may want to validate the returned data before producing the JSON file.
As for actually writing a JSON file, there are a huge number of npm modules for the task.
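As a concrete sketch with the xlsx package (the file names here are placeholders; check the sheet_to_json options against your data):

const XLSX = require('xlsx');
const fs = require('fs');

// Read the workbook; cell text comes back as JavaScript strings,
// so accents and apostrophes survive as proper UTF-8 when written out.
const workbook = XLSX.readFile('input.xlsx');

// Convert the first sheet to an array of row objects.
const sheet = workbook.Sheets[workbook.SheetNames[0]];
const rows = XLSX.utils.sheet_to_json(sheet, { defval: null });

// Write the result as pretty-printed JSON.
fs.writeFileSync('output.json', JSON.stringify(rows, null, 2), 'utf8');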

Cat output different from redirect output

I have a ".msg" file, and I wanted to do some textual parsing on it. I have not installed any package to convert a .msg file to any other kind of format.
When I do :
cat testing.msg <--- shows up correctly, and what appears on the screen is what I want
However when I do:
cat testing.msg > file <---- the file seems to be encoded when I open it in vi.
file reports it as:
testing.msg: CDF V2 Document, No summary info
How can I correctly read the .msg file so that I get the proper textual data?
I have tried changing the $LANG variables but that doesn't seem to work.
