I'm trying to write/export some data from MongoDB to a file, so that later I can read it back and update it.
filehandle = await fs.promises.open(params.credentialsFilename, 'w');
// ... some other code
let records = await mongoConn.db(params.db).collection(COLLECTION_RECORDS)
  .find({}).toArray();
for (const anObj of records) {
  console.log('[exportMongoDbWriteToFile] writing _id: ', anObj._id);
  console.log(anObj);
  await filehandle.write(JSON.stringify(anObj, null, 2)); // start array
  // await filehandle.write(anObj); // doesn't work with an object
}
The problem is that JSON.stringify(anObj) converts ObjectId(hex_id) into the plain hex_id string for the _id property (and for any other property holding an ObjectId reference).
This is a problem because, when reading the data back into MongoDB, the string hex_id is not the same as ObjectId(hex_id) (as far as I know).
However, console.log(anObj); actually prints the full object with the ObjectId notation. I'm not sure how the object is serialized for the console, but we'd like the same output written into the file too.
Edit: we're using the default mongodb library
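One possible approach, sketched under the assumption that the bson package (a dependency of the official mongodb driver) is available: its EJSON helper writes extended JSON, which keeps ObjectId values as {"$oid": "..."} and can revive them when parsing.

// A minimal sketch, assuming the `bson` package shipped with the official mongodb driver.
const { EJSON, ObjectId } = require('bson');

const doc = { _id: new ObjectId(), name: 'example' }; // hypothetical document

// Serialize: the ObjectId is written as {"$oid":"..."} rather than a bare hex string
const text = EJSON.stringify(doc, null, 2, { relaxed: false });

// Deserialize: _id comes back as an ObjectId instance, not a plain string
const restored = EJSON.parse(text, { relaxed: false });
console.log(restored._id instanceof ObjectId); // true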
Related
I have always created a single connection with one connection string. My question is: how do I make multiple connections (to multiple MongoDB instances) when an array of connection strings is given, in a Node.js GET API?
Let's say all the connection strings point to the same kind of database. E.g., my database name is "University" and this database is available in all the different locations. I want to write one common API that returns an array of universities gathered from the different connections. How can I do that?
Example
connectionString1 = mongodb://localhost:27017
connectionString2 = mongodb://localhost:27018
connectionString3 = mongodb://localhost:27019
Now I want to connect with all three connection strings, fetch all the records from them, and send them back
in the response of one common API. How can I do this in an efficient manner? Also, after each query completes, I need to close the corresponding database instance.
Your input will help me understand this structure better.
Execute your query against each database using a helper function (named exec in the example below), and await the returned array of promises with Promise.allSettled. Once settled, process the results (e.g. reduce, maybe sort) to merge them properly.
// each db obtained with client.db(name)
const dbArr = [db1, db2, ...];

// execute the query for the given collection across each db, return an array of promises
function exec(coll, query) {
  const p = [];
  for (const db of dbArr) {
    // toArray() returns a promise that resolves with the matching documents
    p.push(db.collection(coll).find(query).toArray());
  }
  return p;
}

// main
async function fetchUniversitiesBy(filter) {
  try {
    // make the mongo filter doc
    const query = filter;
    const results = await Promise.allSettled(exec('university', query));
    // merge the fulfilled results; rejected entries can be inspected via `status`/`reason`
    return results
      .filter((r) => r.status === 'fulfilled')
      .reduce((acc, r) => [...acc, ...r.value], []);
  } catch (e) {
    console.log(e);
  } finally {
    // client `close()` here
  }
}
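For completeness, here is a hedged sketch of how dbArr, and the client close() mentioned in the finally block, might be wired up from the question's connection strings. It assumes the official mongodb driver and should run inside an async function:

// Sketch only: build one client per connection string from the question.
const { MongoClient } = require('mongodb');

const uris = [
  'mongodb://localhost:27017',
  'mongodb://localhost:27018',
  'mongodb://localhost:27019',
];

// connect to every instance up front
const clients = await Promise.all(uris.map((uri) => MongoClient.connect(uri)));
// the same database name ("University") exists on each instance
const dbArr = clients.map((client) => client.db('University'));

// ...call fetchUniversitiesBy(filter) here...

// close every connection once the queries are done
await Promise.all(clients.map((client) => client.close()));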
In terms of the 'API', invoke fetchUniversitiesBy where you defined your api/universities/get route (or however it is defined). Your request params can be passed through as the filter.
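For illustration, a minimal route sketch assuming Express (the question does not say which framework is used); the route path and query parameter names are made up:

// Hypothetical Express route; the path and query parameter are assumptions.
const express = require('express');
const app = express();

app.get('/api/universities', async (req, res) => {
  // e.g. /api/universities?name=MIT -> { name: 'MIT' } as the mongo filter
  const filter = req.query.name ? { name: req.query.name } : {};
  const universities = await fetchUniversitiesBy(filter);
  res.json(universities || []);
});

app.listen(3000);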
I'm new to NodeJS and MongoDB and trying to insert a CSV file into MongoDB.
My first version was to create an array variable and push the data into the array like this:
.on('data', (data) => { array.push(JSON.parse(data)) })
Then, after pushing all the objects into the array, I insert them into MongoDB using
TempModel.insertMany(array)
This solution worked great for me with small files, and even with large ones if I allocate enough memory to Node.js so the array can hold more objects.
But with very large files I get an error:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
I am guessing this error occurs because there are too many objects in the array (correct me if I am wrong).
So my new solution was to stream the CSV file and insert every line in it as an object into MongoDB, instead of pushing it into the array.
But when I run the project it stops at the first line and doesn't insert it into MongoDB.
That's the code I have now.
Any ideas on how I can make it work?
Is it good to insert millions of objects one by one into MongoDB
instead of using insertMany?
I have created a schema and model in Mongoose, then created a read stream that converts the CSV file into objects and inserts them into MongoDB:
const tempSchema = new mongoose.Schema({}, { stric: false });
const TempModel = mongoose.model('tempCollection', tempSchema);

fs.createReadStream(req.file.path)
  .pipe(csv())
  .on('data', (data) => {
    TempModel.insertOne(JSON.parse(data));
  })
  .on('end', () => {
    console.log('finished');
  });
The snippet can be restructured to use stream pipes to control the data flow down to the MongoDB write operations. This will avoid the memory issue and provide a means to batch operations together.
A somewhat complete pseudocode example:
import fs from "fs";
import util from "util";
import stream from "stream";
import mongoose from "mongoose";
import csv from "csv-parser"; // assuming csv() in the question comes from csv-parser

const tempSchema = new mongoose.Schema({}, { strict: false });
const TempModel = mongoose.model('tempCollection', tempSchema);

// Promisify waiting for the file to be parsed and stored in MongoDB
await util.promisify(stream.pipeline)(
  fs.createReadStream(req.file.path),
  csv(),
  // Create a Writable to piggyback on the stream's built-in batching logic
  new stream.Writable({
    objectMode: true, // csv() emits parsed row objects, not strings
    // Buffer at most 1000 parsed rows at a time; while this queue is full,
    // no additional rows are read/parsed into memory
    highWaterMark: 1000,
    writev: async (chunks, next) => {
      try {
        // Bulk write the buffered documents to MongoDB
        await TempModel.bulkWrite(
          chunks.map(({ chunk: row }) => ({
            insertOne: { document: row }
          })),
          { ordered: false }
        );
        // Signal completion
        next();
      } catch (error) {
        // Propagate errors
        next(error);
      }
    }
  })
);
console.log('finished');
Is it possible to query Firestore documents by updateTime, the field that is available on a document snapshot as doc.updateTime, and use it in a where query?
I am using the Node.js SDK.
As far as I know there is no way to query on the metadata that Firestore automatically maintains. If you need to query the last update date, you will need to add a field with that value to the document's data.
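A minimal sketch of that approach with the Node.js Admin SDK; the collection and field names (items, updatedAt) are just placeholders, and the calls are assumed to run inside an async function:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// On every write, also set an explicit, queryable timestamp field
await db.collection('items').doc('someId').set({
  name: 'example',
  updatedAt: admin.firestore.FieldValue.serverTimestamp()
}, { merge: true });

// Later, filter on that field (it must exist on the documents you want returned)
const snapshot = await db.collection('items')
  .where('updatedAt', '>=', new Date(Date.now() - 24 * 60 * 60 * 1000))
  .get();
snapshot.forEach((doc) => console.log(doc.id, doc.get('updatedAt').toDate()));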
I really need to query Firebase on the document _updateTime, so I wrote a function that copies that hidden internal timestamp to a queryable field. It took some work to figure this out, so I am posting the complete solution. (Technically, this is "Cloud Firestore" rather than the "Realtime Database".)
This is done using Firebase Functions, which itself took some tries to get working. This tutorial was helpful:
https://firebase.google.com/docs/functions/get-started
However, on Windows 10, the only command line that worked was the new Bash shell, available since about 2017. This was something of a runaround to install, but necessary. The GIT Bash shell, otherwise very useful, was not able to keep track of screen positions during Firebase project setup.
In my example code I have left in all the console.log statements to show detail. It was not obvious at first where these logs go: they do not go to the command line, but to the Firebase console:
https://console.firebase.google.com/u/0/
under (yourproject) > Functions > Logs
For testing, I found it useful to deploy only one function at first (this is done from the CLI):
firebase deploy --only functions:testFn
Below is my working function, heavily commented, and with some redundancy for illustration. Replace 'PlantSpp' with the name of your collection of documents:
// The Cloud Functions for Firebase SDK to create Cloud Functions and set up triggers.
const functions = require('firebase-functions');
// The Firebase Admin SDK to access Cloud Firestore.
const admin = require('firebase-admin');
admin.initializeApp();

// Firestore maintains an internal _updateTime for every document, but this is
// not queryable. This function copies that to a visible field 'Updated'.
exports.makeUpdateTimeVisible = functions.firestore
  .document('PlantSpp/{sppId}')
  .onWrite((sppDoc, context) => {
    console.log("Event type: ", context.eventType);
    // context.eventType = 'google.firestore.document.write', so it cannot be used
    // to distinguish e.g. a create from an update
    const docName = context.params.sppId; // this is how to get the document name
    console.log("Before: ", sppDoc.before); // if a create, a 'DocumentSnapshot',
                                            // otherwise a 'QueryDocumentSnapshot'
    // if a create, everything about sppDoc.before is undefined
    if (typeof sppDoc.before._fieldsProto === "undefined") {
      console.log('document "', docName, '" has been created');
      // set flags here if desired
    }
    console.log("After: ", sppDoc.after); // if a delete, a 'DocumentSnapshot',
                                          // otherwise a 'QueryDocumentSnapshot'
    // if a delete, everything about sppDoc.after is undefined
    if (typeof sppDoc.after._fieldsProto === "undefined") {
      console.log('document "', docName, '" has been deleted');
      // other fields could be fetched from sppDoc.before
      return null; // no need to proceed
    }
    console.log(sppDoc.after.data()); // the user-defined fields:values
                                      // inside curly braces
    console.log(sppDoc.after._fieldsProto); // similar to the previous, except with
                                            // data types, e.g.
                                            // data() has { Code: 'OLDO',...
                                            // _fieldsProto has { Code: { stringValue: 'OLDO' },...
    const timeJustUpdated = sppDoc.after._updateTime; // this is how to get the
                                                      // internal non-queryable timestamp
    console.log(timeJustUpdated);
    // e.g. Timestamp { _seconds: 1581615533, _nanoseconds: 496655000 }
    // later: Timestamp { _seconds: 1581617552, _nanoseconds: 566223000 }
    // shows this is correctly updating

    // see if the doc has the 'Updated' field yet
    if (sppDoc.after._fieldsProto.hasOwnProperty('Updated')) {
      console.log("doc has the field 'Updated' with the value",
        sppDoc.after._fieldsProto.Updated);
      console.log("sppDoc:", sppDoc);
      const secondsInternal = timeJustUpdated._seconds;
      console.log(secondsInternal, "seconds, internal timestamp");
      const secondsExternal = sppDoc.after.data().Updated._seconds;
      console.log(secondsExternal, "seconds, external timestamp");
      // Careful here. If we just update the externally visible time to the
      // internal time, we will go into an infinite loop, because that update
      // will call this function again, and by then the internal time will have
      // advanced.
      // The following exit will not work:
      if (secondsInternal === secondsExternal) return null; // will never exit
      // Instead, allow the external time to lag the internal time by a little:
      const secondsLate = secondsInternal - secondsExternal;
      if (secondsLate < 120) { // two minutes is sufficient for this purpose
        console.log("the field 'Updated' is", secondsLate,
          "seconds late, good enough");
        return null;
      }
      console.log("the field 'Updated' is", secondsLate,
        "seconds late, updating");
      // return a promise of a set operation to update the timestamp
      return sppDoc.after.ref.set({
        Updated: timeJustUpdated
      }, { merge: true }); // 'merge' prevents overwriting the whole doc
      // this change will call this same function again
    } else { // the field 'Updated' does not exist in the document yet
      // this illustrates how to add a field
      console.log("doc does not have the field 'Updated', adding it now.");
      // return a promise of a set operation to create the timestamp
      return sppDoc.after.ref.set({
        Updated: timeJustUpdated
      }, { merge: true }); // 'merge' prevents overwriting the whole doc
      // this change will call this same function again
    }
  });
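Once the 'Updated' field exists on the documents, it can be used in an ordinary query. A small sketch, assuming it runs inside an async function and that the seven-day cutoff is just an example:

// Query PlantSpp documents modified in the last 7 days via the copied field
const db = admin.firestore();
const cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);

const snapshot = await db.collection('PlantSpp')
  .where('Updated', '>=', cutoff)
  .orderBy('Updated', 'desc')
  .get();
snapshot.forEach((doc) => console.log(doc.id, doc.get('Updated').toDate()));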
True, there is no query for time created/modified, but when you fetch a document those fields exist in the payload. You have:
payload.doc['_document'].proto.createTime and payload.doc['_document'].proto.updateTime
Sure, it's not good practice to rely on private fields, so this will probably need ongoing adjustments as Firestore changes its data model, but for now, for my uses, it gets me this otherwise un-queryable data.
In Meteor, on the server side, I want to use the .find() function on a Collection and then get a Node ReadStream interface from the cursor that is returned. I've tried using .stream() on the cursor as described in the MongoDB docs (seen here), but I get the error "Object [object Object] has no method 'stream'", so it looks like Meteor collections don't have this option. Is there a way to get a stream from a Meteor Collection's cursor?
I am trying to export some data to CSV, and I want to pipe the data directly from the collection's stream into a CSV parser and then into the response going back to the user. I am able to get the response stream from the Router package we are using, and it's all working except for getting a stream from the collection. Fetching the array from the find() to push it into the stream manually would defeat the purpose of a stream, since it would put everything in memory. I guess my other option is to use a forEach on the collection and push the rows into the stream one by one, but this seems dirty when I could pipe the stream directly through the parser with a transform on it.
Here's some sample code of what I am trying to do:
response.writeHead(200, { 'content-type': 'text/csv' });

// Set up a future
var fut = new Future();

var users = Users.find({}).stream();

CSV().from(users)
  .to(response)
  .on('end', function (count) {
    log.verbose('finished csv export');
    response.end();
    fut.ret();
  });

return fut.wait();
Have you tried creating a custom function and piping to it?
Though this would only work if Users.find() supported .pipe() (again, only if Users.find() inherits from a Node.js streamable object).
Kind of like:
var stream = require('stream');
var util = require('util');

function StreamReader() {
  stream.Writable.call(this);
  this.data = '';
  // 'finish' fires once end() has been called and all chunks have been processed
  this.on('finish', function () {
    console.log(this.data); // this.data contains the raw data as a string, so do
                            // whatever you need to make it usable, e.g. split on ','
    db.close();
  });
}

util.inherits(StreamReader, stream.Writable);

StreamReader.prototype._write = function (chunk, encoding, callback) {
  this.data = this.data + chunk.toString('utf8');
  callback();
};

Users.find({}).pipe(new StreamReader());
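As an aside (not part of the original answer, and assuming a reasonably recent Meteor release): the underlying Node MongoDB driver collection is exposed via rawCollection(), and its cursors can be consumed as a Node stream, which may be closer to what the question is after:

// Sketch: rawCollection() returns the native driver collection; its cursor
// can then be read as a Node Readable stream of documents.
var rawCursor = Users.rawCollection().find({});
var readable = rawCursor.stream();

readable.on('data', function (doc) {
  // `doc` is a plain document object; transform it / write it to the response here
});
readable.on('end', function () {
  response.end();
});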
In my Node application I'm using Redis to store the data. When getting the stored value by its key, I'm not getting the expected output.
var redis = require('redis');
var client = redis.createClient();
var pageContent = { a: "a", b: "b", c: "c" };

// client.set('A', pageContent); // here I'm setting the value
client.get('A', function (err, res) {
  if (!err) {
    Object.keys(res).forEach(function (k) {
      console.log('key is ' + k + ' value is ' + res[k]);
    });
  } else {
    console.log('error');
  }
});
The above code is not giving the stored value. While looping over the result I'm getting the error below:
TypeError: Object.keys called on non-object
So I have tried res.toString(), but I'm still not getting the stored value; instead I'm getting only [object Object].
The issue is that you are trying to save an object with SET. In Redis, SET and GET only work with strings, so the reason you get [object Object] back is that this is the string which was saved in Redis: the string representation of your object.
You can either serialize your objects as e.g. JSON, using JSON.stringify when saving, and JSON.parse when reading, or you can save your objects as redis hashes, using HMSET when saving, and HGETALL (or HGET / HMGET) when reading.
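A small sketch of both options, assuming the callback-style redis client used in the question (the hash key name is just an example):

// Option 1: serialize to JSON and use SET/GET
client.set('A', JSON.stringify(pageContent), function (err) {
  if (err) return console.log(err);
  client.get('A', function (err, res) {
    if (err) return console.log(err);
    var obj = JSON.parse(res); // { a: 'a', b: 'b', c: 'c' }
    Object.keys(obj).forEach(function (k) {
      console.log('key is ' + k + ' value is ' + obj[k]);
    });
  });
});

// Option 2: store the flat object as a redis hash with HMSET/HGETALL
client.hmset('A:hash', pageContent, function (err) {
  if (err) return console.log(err);
  client.hgetall('A:hash', function (err, res) {
    if (err) return console.log(err);
    console.log(res); // { a: 'a', b: 'b', c: 'c' } (all values come back as strings)
  });
});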
Edit: Note, though, that if you decide to use redis hashes, you cannot have "nested" objects -- i.e., you cannot store an object where one of the properties is an array or another object. That is,
{
  a: 1,
  b: 2
}
is okay, while
{
  a: {
    b: 2
  }
}
is not. If you have such objects, you need another model (JSON with SET/GET works perfectly well in this case).