Streaming a large CSV file into a MongoDB database using Mongoose - Node.js

I am searching for an efficient and quick way to stream a large CSV file (10 million lines) into a MongoDB database via Mongoose.
Two problems arise: handling the data as a stream rather than importing it all at once, which fs.createReadStream should solve (although I am still learning how to use it), and inserting that volume of data into MongoDB through Mongoose without overloading it with insert requests, which could lead to errors.

You simply need the 'stream-to-mongo-db' and 'csvtojson' npm libraries.
Here is the example code I use to dump millions of records from big CSV files. It just works!
const fs = require('fs');
const csv = require('csvtojson');
const streamToMongoDB = require('stream-to-mongo-db').streamToMongoDB;

const csvFile = './bigFiles.csv';
const dbURL = 'mongodb://localhost/tweets';
const collection = 'tweets';

// Read the CSV, convert each row to a JSON document, and stream the documents into MongoDB
fs.createReadStream(csvFile)
  .pipe(csv())
  .pipe(streamToMongoDB({ dbURL: dbURL, collection: collection }));
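To know when the import has finished (or failed), you can keep a reference to the writable stream returned by the last pipe() and listen for its events; a minimal sketch reusing the variables above:

const writer = fs.createReadStream(csvFile)
  .pipe(csv())
  .pipe(streamToMongoDB({ dbURL: dbURL, collection: collection }));

writer.on('error', function (err) { console.error('import failed:', err); });
writer.on('finish', function () { console.log('all rows written to MongoDB'); });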

There is an insertMany() method in Mongoose, but it only lets you insert a limited batch of documents (around 10,000) per call. My solution is to loop asynchronously with that method, inserting batch after batch until the stream finishes.
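A minimal sketch of that batching approach, assuming a stream that emits one parsed row object per 'data' event (for example the csvtojson pipeline above) and a hypothetical Mongoose model named Tweet:

const mongoose = require('mongoose');

// Hypothetical model -- swap in your own schema/collection
const Tweet = mongoose.model('Tweet', new mongoose.Schema({}, { strict: false }));

function insertInBatches(rowStream, batchSize) {
  return new Promise(function (resolve, reject) {
    var batch = [];
    var pending = Promise.resolve();

    function flush(docs) {
      // ordered:false lets MongoDB keep inserting even if one document fails
      pending = pending.then(function () {
        return Tweet.insertMany(docs, { ordered: false });
      });
      return pending;
    }

    rowStream.on('data', function (row) {
      batch.push(row);
      if (batch.length >= batchSize) {
        rowStream.pause(); // stop reading while MongoDB catches up
        flush(batch.splice(0))
          .then(function () { rowStream.resume(); })
          .catch(reject);
      }
    });

    rowStream.on('end', function () {
      // flush whatever is left, then wait for all pending inserts
      if (batch.length) { flush(batch.splice(0)); }
      pending.then(resolve).catch(reject);
    });

    rowStream.on('error', reject);
  });
}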

Related

How to upload multiple files inside MongoDB using Node.js

Hello friends, I am new to Node.js and MongoDB. Is it possible to upload multiple files inside a single MongoDB document along with other information?
You can make use of BSON (Binary JSON) to store files in MongoDB collections. However, BSON documents have a 16 MB size limit. If you are planning to store files bigger than that, consider GridFS (a sketch follows the snippet below).
You can write files to MongoDB like so in Node.js:
var Binary = require('mongodb').Binary;
var fs = require('fs');

// Read the file that you want to store
var file_data = fs.readFileSync(file_path);
var db_doc = {};
db_doc.file_data = Binary(file_data);
var my_collection = db.collection('files');
my_collection.insert(db_doc, function(err, result){
  // more code...
});
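For files larger than the 16 MB BSON limit, here is a minimal GridFS sketch using the driver's GridFSBucket, assuming db is an already-connected Db instance and the file path and name are placeholders:

var GridFSBucket = require('mongodb').GridFSBucket;
var fs = require('fs');

// Stream the file into GridFS instead of embedding it in a document
var bucket = new GridFSBucket(db, { bucketName: 'uploads' });
fs.createReadStream(file_path)
  .pipe(bucket.openUploadStream('myFile.pdf'))
  .on('error', function (err) { console.log(err); })
  .on('finish', function () { console.log('file stored in GridFS'); });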

How to save returned protobuf object in nodejs?

In my code, a function is returning a protobuf object and I want to save it in a file xyz.pb.
When I try to save it using fs.writeFileSync, it does not get saved.
The object is circular in nature, so I tried saving it with the circular-json module to confirm there is anything inside it, and it does have data.
But since I used circular-json, the output is not properly formatted and is of no use.
How can I save this protobuf in a file using nodejs?
Thanks!
You can try to use streams, as mentioned in the Node.js documentation, as follows:
const crypto = require('crypto');
const fs = require('fs');
const wstream = fs.createWriteStream('fileWithBufferInside');
// creates random Buffer of 100 bytes
const buffer = crypto.randomBytes(100);
wstream.write(buffer);
wstream.end();
Or you can convert the buffer to JSON and save it in a file as follows:
const crypto = require('crypto');
const fs = require('fs');
const wstream = fs.createWriteStream('myBinaryFile');
// creates random Buffer of 100 bytes
const buffer = crypto.randomBytes(100);
wstream.write(JSON.stringify(buffer));
wstream.end();
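Note that JSON.stringify on a Buffer produces an object of the form {"type":"Buffer","data":[...]}, so to read it back you rebuild the Buffer from the data array; a minimal sketch, assuming the file written above:

const fs = require('fs');

// Parse the JSON text and restore the original bytes
const parsed = JSON.parse(fs.readFileSync('myBinaryFile', 'utf8'));
const restored = Buffer.from(parsed.data);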
Also, if your application logic doesn't require synchronous behaviour, you should not use writeFileSync, because it blocks your code until it finishes, so be careful.
Try using writeFile or streams instead; it's more convenient.
The purpose of Protocol Buffers is to serialize strongly typed messages to binary format and back into messages. If you want to write a message from memory into a file, first serialize the message into binary and then write binary to a file.
NodeJS Buffer docs
NodeJS write binary buffer into a file
Protocol Buffers JavaScript SDK Docs
It should look something like this:
const buffer = messageInstance.serializeBinary()
fs.writeFile("filename.pb", buffer, "binary", callback)
I found an easy way to save a protobuf object to a file:
convert the protobuf object into a Buffer and then save it.
const protobuf = somefunction(); // returning protobuf object
const buffer = protobuf.toBuffer();
fs.writeFileSync("filename.pb", buffer);

GZip Paginated JSON Response

I have a paginated request that gives me a list of objects, which I later concat to get the full list of objects.
If I attempt to JSON.stringify this, it fails for large objects with a RangeError. I was looking for a way to use zlib.gzip to handle large JSON objects.
Try installing stream-json; it will solve your problem. It's a great wrapper around streams for parsing JSON.
// require the stream-json module
const StreamArray = require('stream-json/utils/StreamArray');
// require fs if you're reading from a file
const fs = require('fs');
const zlib = require('zlib');

// Create an instance of StreamArray
const streamArray = StreamArray.make();

fs.createReadStream('./YOUR_FILE.json.gz')
  .pipe(zlib.createUnzip())  // unzip
  .pipe(streamArray.input);  // feed the JSON into the stream parser

// Here you can do whatever you want with the stream;
// for example, pipe it to the response:
streamArray.output.pipe(process.stdout);
In the example I'm using a JSON file, but you can use a collection and pass it to the stream.
Hope that helps.
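For the original problem (gzipping a large concatenated list without hitting JSON.stringify's RangeError), one way, sketched here under the assumption that the list fits in memory as objects (just not as one giant JSON string) and using a hypothetical writeGzippedJson helper, is to stringify one element at a time and push the pieces through zlib.createGzip():

const fs = require('fs');
const zlib = require('zlib');

function writeGzippedJson(objects, outPath, callback) {
  const gzip = zlib.createGzip();
  const out = fs.createWriteStream(outPath);
  gzip.pipe(out).on('finish', callback).on('error', callback);

  gzip.write('[');
  objects.forEach(function (obj, i) {
    // stringify one element at a time so no single huge string is ever built
    // (a production version should also honor gzip.write()'s return value and wait for 'drain')
    gzip.write((i ? ',' : '') + JSON.stringify(obj));
  });
  gzip.write(']');
  gzip.end();
}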

Import data from a CSV file into a MongoDB collection with Meteor

I would like to get data from a CSV file into a MongoDB collection using Meteor.js, and I would be grateful for any help.
You can use papa-parse and read the CSV file using the Node file system like this:
var fs = Npm.require('fs');
// Assume that the csv file is in yourApp/public/data folder
var data = fs.readFileSync(process.env.PWD + '/public/data/yourCSV.csv', 'utf8');
var usersData = Papa.parse(data, {header: true});
usersData will be in JSON format; you can store it in MongoDB as you want.
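To actually load the parsed rows into a collection, here is a minimal sketch, assuming an existing Mongo.Collection hypothetically named Items (with header: true, Papa.parse puts the rows in usersData.data):

// Items = new Mongo.Collection('items');  // defined elsewhere in your app
usersData.data.forEach(function (row) {
  Items.insert(row);
});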
csv-parse can also be used to parse CSV files. Loading the MongoDB collection can then be done via the upsert method of a Meteor collection.

Populate MongoDB from CSV using NodeJS

I am trying to populate my MongoDB using data from a CSV file. There are currently no databases or collections in my MongoDB and I would like to create these using an update function that creates objects parsed from a csv file.
I am using ya-csv to parse my csv file and the mongodb driver for node.
My code looks like this:
var csv = require('ya-csv');
var fs = require('fs');
var MongoClient = require('mongodb').MongoClient;
var Server = require('mongodb').Server;

var mongoclient = new MongoClient(new Server('localhost', 27017, {'native_parser': true}));
var reader = csv.createCsvFileReader('YT5.csv', {columnsFromHeader: true, 'separator': ','});

reader.addListener('data', function(data){
  var nameHolder = data.name;
  // I have no issue getting the needed variables from my csv file
  mongoclient.db(nameHolder).collection('assets').update(
    {assetId: data.assetId, name: data.name},
    {upsert: true},
    function(err, updated){ if(err){ console.log(err); } }
  );
});

reader.addListener('end', function(data){
  console.log("done");
});
I have not created the databases or collections beforehand; can this update create them for me? I get an error:
[Error: Connection was destroyed by application]
When I run this, the databases get created but they're empty.
Any help would be greatly appreciated.
Unless there is a specific need to use Node.js, say to not only reorder but also make some complex modification to the fields read from the CSV file, use mongoimport.
If all you need to do is skip and reorder some fields, work the fields with a simple awk one-liner, skipping and changing order as needed:
cat /tmp/csv | awk -F',' 'BEGIN{OFS=","} {print $1,$2,$4,$3,$10}' | mongoimport --type csv --db test --collection csv_import
Additionally, if there is a need to change the collection or db name based on the CSV values (field 10 in this example is used as the db name and field 11 as the collection):
cat /tmp/csv | awk -F',' 'BEGIN{OFS=","} {print $1,$2,$4,$3,$10 | "mongoimport --type csv --db "$10" --collection "$11 }'
Or you could split the file into the per db/collection chunks first.
If you convert the CSV rows into a JavaScript array (each row being an object), you can use https://github.com/bitliner/MongoDbPopulator
