Resolve MongoDB reference - node.js

I am currently building a chat app with Node.js and MongoDB.
Basically I have two collections to maintain in the db:
user = {
  _id: ObjectId("1234"),
  account: "stan123"
}

thread = {
  _user: ObjectId("1234"),
  messages: [
    {
      body: "hi",
      _user: ObjectId("1234")
    },
    {
      body: "second msg",
      _user: ObjectId("1234")
    }
  ]
}
I am planning to pass the thread model, with all references (user) resolved, to the client side, so that I can construct my widget with it.
I searched for solutions to this. Some suggest making extra calls from the client side to get the data.
However, I am worried that as the number of messages grows, there will be a considerable number of HTTP calls that might hurt site speed.
I know some drivers can resolve DBRefs automatically and make the code clean.
However, based on
http://docs.mongodb.org/manual/applications/database-references/
I decided to just use ids to maintain the references, to keep things as simple as possible.
My plan is to resolve all references on the server side. My current approach is to get the length of the message array first, then loop through the message array and make a second query per message to resolve the user info separately.
In each query callback, I do messageToResolve++ and check if (messageToResolve >= thread.messages.length).
If the condition is met, I send the resolved model to the client and end the response.
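Here is a minimal sketch of that approach, assuming Mongoose-style models named Thread and User and an Express-style res; those names are mine, for illustration only:

// look up the thread, then resolve each message's _user in parallel
Thread.findOne({ _user: someUserId }, function (err, thread) {
  if (err) return res.send(500);
  var result = thread.toObject(); // plain object we can attach users to
  var messageToResolve = 0;
  result.messages.forEach(function (message) {
    User.findById(message._user, function (err, user) {
      if (err) return res.send(500);
      message.user = user; // attach the resolved user info
      messageToResolve++;
      // the last callback to finish sends the fully resolved model
      if (messageToResolve >= result.messages.length) {
        res.json(result);
      }
    });
  });
  // note: an empty messages array would need its own res.json(result)
});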
This is not a case where I would consider embedding the user, because that would be painful whenever user data needs to be updated.
(The messages are embedded because they exist only as long as the thread exists.)
I am not sure if this is a good way to do it.
Does anyone have a better solution?
Sorry if I didn't explain my problem and solution clearly enough.
And thanks in advance.

Related

Socket.io offline behaviour server-side

I have run into an unforeseen problem with my socket.io setup.
I use socket.io to live-load data from my database (MongoDB, Node.js, React).
To accomplish this, I use MongoDB's change streams to detect changes and then push them to the front-end via socket.io.
Now this works perfectly as long as the user is connected, and right now, when the user reconnects, it just reloads all data. While this is fine for most users, there is a small group with very bad network connections, so their front-end reloads data all the time, which causes it to be unresponsive for some time.
So, I am looking for a way to only send events that occurred while the front-end was offline. The front-end can do this quite easily: https://socket.io/docs/v4/client-offline-behavior/
It doesn't seem possible to do this on the server side, since socket.io (server side) immediately forgets sockets that have disconnected and thus can't buffer events for them.
So, I was wondering if there is a good way to do this? Or would this need a full "wrapper" around socket.io that caches disconnected sockets?
Any help or advice would be appreciated!
I find this a really interesting and painful problem! ^^'
If you can give more details, it may help people give you a better answer.
For instance:
How much data is stored in the database, how much will a typical user receive, and how many events are triggered in a given time frame?
How long may an event take to become visible? I mean, if users receive an event with a 10s, 30s, ... delay, is it harmful for the service you provide?
How is your data structured? Is it a simple JSON array with the same fields, custom fields, dynamic JSON objects, etc.?
How is your React app structured? Do you run heavy logic when your data updates, etc.?
I think you should put more controls in your front-end code and update only when there is new data.
Some paths to explore
1. Put more controls in your front end
As you stated, for users with a bad connection, the React client seems to update its state too quickly when it reloads data after the websocket reconnects, again and again. The UI may freeze in this case, yes.
For this, I can think of two approaches:
Before updating the state, check whether the current React state is the same as the data you receive from the websocket connection. If the reconnection is quick enough and no new data has arrived, it should be the same; in that case, do not update the React state.
If too many events are triggered and new data arrives after each reconnection, you can buffer the data from the websocket and apply it only once per time frame. By time frame, I mean you can use functions like setInterval or requestAnimationFrame to trigger the React update. Here is some pseudo React code to illustrate this:
function App() {
  const [events, setEvents] = useState({ datas: [] });
  const bufferedEvents = useRef([]);

  useEffect(() => {
    websocket.on("connected", (newEvents) => {
      bufferedEvents.current = bufferedEvents.current.concat(newEvents);
    });
    websocket.on("data", (newEvent) => {
      bufferedEvents.current = bufferedEvents.current.concat(newEvent);
    });

    // In the setInterval callback we take all the events received at
    // connection time plus the new ones, update the React state with them,
    // and clean bufferedEvents at the same time.
    const intervalId = setInterval(() => {
      const events = bufferedEvents.current;
      bufferedEvents.current = [];
      // update only if new data arrived
      if (events.length > 0) {
        setEvents((prevState) => ({ datas: prevState.datas.concat(events) }));
      }
    }, 1000); // Trigger a data update every second; you could replace this
    // approach with requestAnimationFrame and adapt the refresh time as needed.

    // Do not forget to clear the interval when the component unmounts.
    return () => {
      clearInterval(intervalId);
    };
  }, []);

  return (
    <div>
      <span>Total events: {events.datas.length}</span>
      <br />
      {events.datas.map((event, i) => (
        // an index key is enough for this append-only sketch
        <div key={i}>{event.data}</div>
      ))}
    </div>
  );
}
You can look at this article for details on using requestAnimationFrame.
I think modifying the front end is needed in any case, but on its own it is not really great for performance.
2. Fetch only new data in your back end
For this approach, it really depends on how your data is structured in the database.
If the data has some timestamp in it, I can think of a naive but simple solution: a cookie with a timestamp in it.
When the user connects for the first time, this cookie is null.
When they fetch the data on the websocket connection, they receive all the data. When the data arrives, you update the cookie timestamp with the most recent date in the data.
When the websocket reconnects after a disconnect, you open it with the cookie timestamp in it. With this information you can query only the data more recent than the timestamp in the cookie.
This way, you don't have to download the entirety of the data, only the fresh part. A sketch of the server side follows.
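A minimal sketch of the server side of this idea, assuming a socket.io v4 server and the MongoDB Node driver; the events collection, the updatedAt field, and passing the timestamp through the handshake auth payload are all my illustrative assumptions:

const { Server } = require("socket.io");
const { MongoClient } = require("mongodb");

const io = new Server(3000, { cors: { origin: "*" } });
const mongo = new MongoClient("mongodb://localhost:27017");

async function main() {
  await mongo.connect();
  const events = mongo.db("app").collection("events");

  io.on("connection", async (socket) => {
    // assumed: the client reads its cookie and sends the last-seen timestamp
    // in the handshake auth payload; undefined on the very first visit
    const lastSeen = socket.handshake.auth.lastSeen;
    const query = lastSeen ? { updatedAt: { $gt: new Date(lastSeen) } } : {};

    // send only the documents newer than the client's timestamp
    socket.emit("connected", await events.find(query).toArray());
  });
}

main();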
Other approaches may be more helpful, but without more information on your data and more precise requirements, it is hard to say.
If you have a lot of data, I would personally look into a pagination mechanism, and maybe combine classic HTTP requests for fetching the data with websockets, SSE, or long polling for live events.
You can leave a comment if needed and I will update my response!
Cheers

Strapi & react-admin: I'd like to set the 'Content-Range' header dynamically when any fetchAll query fires

I'm still a novice web developer, so please bear with me if I miss something fundamental!
I'm creating a backoffice for a Strapi backend, using react-admin.
React-admin uses a 'data provider' to link itself with an API. Luckily, someone already wrote a data provider for Strapi. I had no problem with steps 1 and 2 of this README, and I can authenticate to Strapi within my React app.
I now want to fetch and display my Strapi data, starting with Users. In order to do that, quoting step 3 of this README: 'In controllers I need to set the Content-Range header with the total number of results to build the pagination'.
So far I tried to do this in my User controller, with no success.
What I'm trying to achieve:
First, I'd like it to simply work with the ctx.set('Content-Range', ...) hard-coded in the controller, as in the aforementioned step 3.
Second, I think it would be very dirty to copy/paste this logic into every controller (not to mention any future controllers), instead of having some callback function dynamically appending the Content-Range header to any fetchAll request. Ultimately that's what I aim for, because with ~40 Strapi objects to administrate already and plenty more to come, it has to scale.
Technical infos
node -v: 11.13.0
npm -v: 6.7.0
strapi version: 3.0.0-alpha.25.2
uname -r output: Linux 4.14.106-97.85.amzn2.x86_64
DB: mySQL v2.16
So far I've tried accessing the count() method of the User model, as in the aforementioned step 3, but my controller doesn't look like the example, since I'm working with the users-permissions plugin.
This is the action I've tried to edit (located in project/plugins/users-permissions/controllers/User.js):
find: async (ctx) => {
  let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
  data.reduce((acc, user) => {
    acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
    return acc;
  }, []);
  // Send 200 `ok`
  ctx.send(data);
},
From what I've gathered from the Strapi documentation (here and also here), context is a sort of wrapper object. I've only worked with Express-generated APIs before, so I understood this snippet as 'use the fetchAll method of the User model object, with ctx.query as an argument', but I had no luck logging ctx.query. And since I can't log anything, I'm kind of blocked.
In my exploration, I naively tried to log the full ctx object and work from there:
  // Send 200 `ok`
  ctx.send(data);
  strapi.log.info(ctx.query, ' were query');
  strapi.log.info(ctx.request, 'were request');
  strapi.log.info(ctx.response, 'were response');
  strapi.log.info(ctx.res, 'were res');
  strapi.log.info(ctx.req, 'were req');
  strapi.log.info(ctx, 'is full context');
},
Unfortunately, I fear I am missing something obvious, as it gives me no output at all. Making a fetchAll request from my React app with these console.logs prints this in my terminal:
[2019-09-19T12:43:03.409Z] info were query
[2019-09-19T12:43:03.410Z] info were request
[2019-09-19T12:43:03.418Z] info were response
[2019-09-19T12:43:03.419Z] info were res
[2019-09-19T12:43:03.419Z] info were req
[2019-09-19T12:43:03.419Z] info is full context
[2019-09-19T12:43:03.435Z] debug GET /users?_sort=id:DESC&_start=0&_limit=10& (74 ms)
While in my frontend I get the good ol' 'The Content-Range header is missing in the HTTP Response' message I'm trying to solve.
After writing this wall of text, I realize the logging issue is separate from my original problem, but if I were at least able to log ctx properly, maybe I'd be able to find the solution myself.
Trying to summarize:
The actual problem is: how do I set my Content-Range header properly in my Strapi controller? (partially answered, cf. edit 3)
Collateral problem n°1: I can't even log the ctx object (cf. edit 2).
Collateral problem n°2: Once I figure out the actual problem, is it feasible to address it dynamically (basically some callback function for index/fetchAll routes, in which the model is a variable, on which I'd call the appropriate count() method, and finally append the result to my response header)? I'm not asking for the code here, just whether you think it's feasible and/or know a more elegant way.
Thank you for reading through, and excuse me if it was confusing; I wasn't sure which infos would be relevant, so I thought the more the better.
/edit1: Forgot to mention: in my controller I also tried to log the strapi.plugins['users-permissions'].services.user object to see if it actually has a count() method, but got no luck with that either. I also tried the original snippet (step 3 of the aforementioned README), but it failed as expected, since afaik the User model isn't imported anywhere (the only import in User.js being lodash).
/edit2: About the logs, my bad, I just misunderstood the documentation. I now do:
  ctx.send(data);
  strapi.log.info('ctx should be : ', { ctx });
  strapi.log.info('ctx.req = ', { ...ctx.req });
  strapi.log.info('ctx.res = ', { ...ctx.res });
  strapi.log.info('ctx.request = ', { ...ctx.request });
  strapi.log.info('ctx.response = ', { ...ctx.response });
ctx logs this way; it also seems to need the spread operator to display nested objects ({ctx.req} crashes the server, {...ctx.req} is okay). Cool, because it narrows the question down to what's interesting.
/edit3: As expected, having logs helps big time. I've managed to display my users (although in a dirty way). I couldn't find any count() method, but watching the data object that is passed to ctx.send(), it's equivalent to your typical res.data, i.e. pure JSON with my user list. So a simple .length did the trick:
  let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
  data.reduce((acc, user) => {
    acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
    return acc;
  }, []);

  ctx.set('Content-Range', data.length); // <-- it did the trick

  // Send 200 `ok`
  ctx.send(data);
Now starting to work on the hard part: the dynamic callback function that will do this for any index/fetchAll call. Will update once I figure it out.
I'm using React Admin and Strapi together and installed ra-strapi-provider.
It's a little boring to paste the Content-Range header into all of my controllers, so I searched for a better solution. Then I found the middleware concept and created one that fits my needs. It's probably not the best solution, but it does its job well:
const _ = require("lodash");
module.exports = strapi => {
return {
// can also be async
initialize() {
strapi.app.use(async (ctx, next) => {
await next();
if (_.isArray(ctx.response.body))
ctx.set("Content-Range", ctx.response.body.length);
});
}
};
};
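In case it helps, a hedged sketch of where such a middleware might live in a Strapi 3.x project; the contentRange name is mine, and the exact layout and config keys vary between alpha/beta/stable releases, so treat this as an assumption to verify against your version's docs:

// ./middlewares/contentRange/index.js  <- the middleware above
// ./config/middleware.js               <- enable it there, e.g.:
module.exports = {
  settings: {
    contentRange: {
      enabled: true, // assumption: key names may differ in your release
    },
  },
};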
I hope it helps
For people still landing on this page:
Strapi has been updated from #alpha to #beta. Beware, as some of the code in my OP is no longer valid; some of their documentation is also not up to date.
I failed to find a "clever" way to solve this problem; in the end I copy/pasted the ctx.set('Content-Range', data.length) bit into all relevant controllers, and it just worked.
If somebody comes up with a clever solution to this problem, I'll happily accept their answer. With the current Strapi version I don't think it's doable with policies or lifecycle callbacks.
The "quick & easy fix" is still to customize each relevant Strapi controller.
With strapi#beta you don't have direct access to a controller's code: you'll first need to "rewrite" one with the help of this doc, then add the ctx.set('Content-Range', data.length) bit. Test it properly with RA; for the other controllers, you'll then just have to create the folder, name the file, copy/paste your code, and "Search & Replace" the model name.
The "longer & cleaner fix" would be to dive into the react-admin source code and refactor it so the lack of a Content-Range header doesn't break pagination.
You'd then have to maintain your own react-admin fork, so make sure you're already committed to this library and have A LOT of tables to manage through it (so many that customizing every Strapi controller would be too tedious).
Before forking RA, please remember all the stuff you can do with the Strapi backoffice alone (including embedding your custom React app into it) and make sure it will be worth the trouble.

How do I manage groups/rooms with node WebSockets?

TL;DR below.
I am currently developing a React/Redux SPA that is driven by real-time data. I've decided to use ws instead of socket.io, since socket.io feels a bit high-level for what I'm doing; I'd rather manage sockets myself.
That said, I'm struggling to find a way to manage the separation of updates/messages per view/route. Since I'm using client-side routing, doing it per Express route won't really work...
Messages between the server and client via WebSockets are JSON, with actions like GET_ITEMS, then a response of GET_ITEMS_SUCCESS with an array of 'items', and ..._ERROR actions for errors, etc. This is all fine, since it's just a 1-to-1 transaction. The problem arises when broadcasting (1-to-all) to all relevant clients when the server receives an update.
So, I assume it is best practice to limit these broadcasts to the clients that are viewing/want the data. When viewing, for example, the Item page, there is no point broadcasting updates to the User data, since that is only used on the User page.
I haven't been able to find any common practices for dealing with this sort of situation, just a few small outdated/barely-used wrappers for ws that add a few basic join/leave functions but don't offer much flexibility in implementation.
What I think MIGHT work is to have an object/array for each 'group'/'room', which stores the clients currently listening for updates from a given section. So a user would send an action INIT_LISTEN with a param of category, e.g. ITEM, for updates and other actions related to items.
TL;DR
What my question really boils down to is: how do I store a reference to a single socket? (The ws client object? A ws client ID?) Then, can I store this in an object/array to iterate through, like below?
const ClientRooms = {
  Items: {
    "xyz": { ...ws } /* the ws client object, keyed by some id */
    /* ...rest of the clients */
  }
}
or
const ClientRooms = {
  Items: ["xyz"] /* array of ws ids */
}
I have a "ping--pong" heartbeat function to keep clients active and prevent silent connection failures/disconnections. I can't find if ws.terminate() still fires the ws close event so I can iterate 'group'/'room' the object/array to find and remove instances of that client.

node-vertica throws exception for multi-queries with a callback

I have created a simple web interface for Vertica.
I expose simple operations on top of a Vertica cluster.
One of the functionalities I expose is querying Vertica.
When my user enters a multi-query, the node module throws an exception and my process exits with code 1.
Is there any way to catch this exception?
Is there any way to overcome the problem differently?
Right now there's no way to overcome this when using a callback for the query result.
Preventing it would involve making sure there's only one query in the user's input, which is hard because it involves parsing SQL.
The callback API isn't built to deal with multi-queries. I simply haven't bothered implementing proper handling of this case, because it has never been an issue for me.
Instead of a callback, you can use the event listener API, which sends you lower-level messages, and handle this yourself:
q = conn.query("SELECT...; SELECT...");
q.on("fields", function(fields) { ... }); // 1 time per query
q.on("row", function(row) { ... }); // 0...* time per query
q.on("end", function(status) { ... }); // 1 time per query

Cancel previous MongoDB operation from the same client

I have a MongoDB collection of 3,257,477 cities, and I'm using Mongoose on Node.js to access it. I make requests to it repeatedly (once per 500ms). Requests are usually answered very quickly. However, when I make a bad typo, the query takes a long time and requests start to pile up until the initial request is answered. Here are some logs I collected of requests and responses:
21:48:50 started query for "new"
21:48:50 finished query for "new"
21:48:52 started query for "newj ljl" // blockage
21:48:54 started query for "newj"
21:48:55 started query for "new"
21:48:57 started query for "new ye"
21:48:59 started query for "new york"
21:49:08 finished query for "newj ljl" // blockage removed, quick queries flood in
21:49:08 finished query for "new"
21:49:08 finished query for "new york"
21:49:08 finished query for "new ye"
21:49:23 finished query for "newj"
I'm able to cancel the requests made by the client, so I'm not worried about queries coming back in the wrong order. And I'm not interested in making that query faster at this point, since queries for actual correct spellings are quick.
I'm wondering how a new request can cancel an old request made by the same client. In other words, "newj ljl" gets canceled when "newj" arrives, "newj" gets canceled when "new" arrives, and so on. If the result is just going to be thrown out, why tie up the database?
Is there a proper way to do this?
Update:
I'm aware of db.currentOp().inprog, and I'm thinking I can use the client property of the documents within that array to know whether it's a repeat request, but I can't quite figure out how to access that from Mongoose. I'm also not sure when to do it, or how to know which request was spawned by this client (and therefore which to cancel). I'd like an actual code example using Mongoose, or the native Node.js MongoDB driver, if possible!
Here's some sample code to go off of:
models.City.find({ ... })
  .exec(function (err, cities) {
  });
Below is what I came up with to solve the issue.
I can easily run db.currentOp().inprog and db.killOp() from the Mongo shell, but I really need this to happen automatically, when needed, from Mongoose. Since you can reference the MongoDB driver using require('mongoose').connection.db, you can execute those commands by running "queries" against the following special collections:
db.collection('$cmd.sys.inprog');
db.collection('$cmd.sys.killop');
The full solution:
var db = require('mongoose').connection.db,
    // get the client IP address
    ip = request.headers['x-forwarded-for'] ||
         request.connection.remoteAddress ||
         request.socket.remoteAddress ||
         request.connection.socket.remoteAddress;

// same thing as db.currentOp().inprog
db.collection('$cmd.sys.inprog').findOne(function (err, data) {
  if (err) throw err;

  data.inprog.filter(function (op) {
    // get the operation's client IP address without the port
    return ip == op.client.split(':')[0];
  }).forEach(function (op) {
    // same thing as db.killOp()
    db.collection('$cmd.sys.killop')
      .findOne({ 'op': op.opid }, function (err, data) {
        if (err) throw err;
      });
  });

  // start the new cities query
  models.City.find({ ... })
    .exec(function (err, cities) {
    });
});
Helpful links:
https://groups.google.com/forum/#!topic/mongodb-user/1wFp7AqWnM4
drop database with mongoose
How to determine a user's IP address in node
You can try using db.killOp():
http://docs.mongodb.org/manual/reference/method/db.killOp/#db.killOp
UPDATE: You can get the list of current operations from db.currentOp() and identify the operation to be cancelled by matching fields like op, query, and client:
http://docs.mongodb.org/manual/reference/method/db.currentOp/#db.currentOp
You can definitely do this with killOp, and the above solution looks like it could work for the problem as stated. However, I think it may be worthwhile to dig a bit deeper.
The fact that a query which will return no results is noticeably slow seems unusual. That reeks of a full collection scan. The questions to ask are, first, do you have indexes set up, and second, are you querying with a general regex? MongoDB doesn't really handle regex searches like { "name" : /.*new york.*/ } particularly well, as the sketch below illustrates.
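A hedged sketch of the alternative: a case-sensitive regex anchored with ^ can walk an index on name instead of scanning every document. The escapeRegex helper, the userInput variable, and the index itself are my assumptions, not part of the original setup.

var escapeRegex = function (s) {
  // escape regex metacharacters in the user's raw input
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
};

// assumed: an index on "name" exists, e.g. db.cities.ensureIndex({ name: 1 })
models.City.find({ name: new RegExp('^' + escapeRegex(userInput)) })
  .limit(10)
  .exec(function (err, cities) {
    // prefix-anchored and case-sensitive, so the index can be used
  });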
Also, the whole "send an http request every time the user hits a key" approach is simple and elegant, but also causes some unnecessary server load. Perhaps a search button or a client-side timeout where you only send a request if a user hasn't hit a key for 1 second could help alleviate the need for the killop approach.
