A common connection string for Mongoose connecting to a replica set looks something like the following:
var connection = mongoose.createConnection("mongodb://db_1:27017/client_test,mongodb://db_2:27017/client_test", {
    replSet : { rs_name : "rs0", poolSize : 5, socketOptions : { keepAlive : 1 } }
}, function(err) {
    if (err) { throw err; }
});
The problem with that is that if one of the two hosts is down, the connection fails. If you only specify one host, then no requests end up getting sent to secondaries.
Here's my proof for that claim. If you specify one host, set up your replica set so that there is one primary and an arbiter, and then perform a query such as
myApi.find({}).slaveOk().read("s").exec(function(err, docs) {
    console.log(docs);
});
It will return results. But since I am specifying "s" (secondary), this query should throw an error, because there are no running secondaries. In addition, if you bring the secondary online and then run db.currentOp(true) on it, you will never see any actual queries sent its way.
The moment you alter the connection string to specify every host, you will see connections go to the secondary. The dilemma is that now, because you had to specify the additional host in the connection string, the connection fails whenever that secondary is offline, and we've lost failover (the entire point of replica sets).
I can't determine if this is a configuration mistake on my part, a bug in Mongoose, or a conceptual flaw in my understanding of how replica sets function. Some of the docs seem to state that reading from secondaries is basically a bad idea, but the usual reason given is stale data. My issue doesn't have anything to do with stale data: I can't figure out a way to set up the system so that queries go to secondaries without losing failover capacity.
1. The connection string just defines seed servers. The MongoDB driver tries to connect to these servers and discovers the other members of the replica set from them (the same membership information rs.status() shows). You could have a replica set with 5 nodes but specify only one in the connection string; the driver would still be able to find the other four, as long as the server from the connection string is available.
2. My proposal is to use secondaryPreferred instead of just secondary, so that if no secondary is available the request is sent to the primary.
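The two points above can be sketched together: a seed-list connection string naming every known member, plus a secondaryPreferred read preference so reads fall back to the primary when no secondary is up. The host names, database name, and replica-set name below are placeholders, and the helper is purely illustrative:

```javascript
// Build a seed-list URI: the driver only needs one reachable host from
// the list to discover the rest of the replica set members.
function buildSeedListUri(hosts, dbName, options) {
  const query = Object.entries(options)
    .map(([key, value]) => `${key}=${value}`)
    .join('&');
  return `mongodb://${hosts.join(',')}/${dbName}?${query}`;
}

const uri = buildSeedListUri(
  ['mongo1:27017', 'mongo2:27017', 'mongo3:27017'], // placeholder hosts
  'client_test',
  { replicaSet: 'rs0', readPreference: 'secondaryPreferred' }
);

console.log(uri);
// mongodb://mongo1:27017,mongo2:27017,mongo3:27017/client_test?replicaSet=rs0&readPreference=secondaryPreferred
```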
Ok, I believe I have solved all of my problems. Here is what I learned.
Specify all possible replica set nodes in your connection string, otherwise Mongoose will never send requests there. Mongoose has a specific format for this, which is different from the node-mongodb-native driver's. Example below.
In order to prevent it from hanging forever if one of the nodes is down at boot, you need to specify connectTimeoutMS in the 'replset' options, so it only waits that long for a response from each node on initial connection. If a node comes online at a later date, it will still become available.
The hostname entries in your MongoDB replica set configuration need to match the hostname entries in the connection string from your application, and all hostnames need to be reachable from all parties (mongo to mongo and application to mongo). In my case I had aliased the hostnames from mongo to mongo as mongo1:27017, mongo2:27017, and mongo3:27017, while my application server used a connection string with IPs. Mongoose was attempting to re-initiate the connection using the mongo1:27017 hostname (which my application server could not reach) rather than the IP address I specified in the connection string. This resulted in it never re-connecting to a node it lost contact with. It is possible that had I used hostnames the application could reach it would still have worked, but I think it's a best practice to make the connection string and the replica configuration identical to remove possible places for error.
On the MongoDB node where you run rs.initiate(), you might need to update the hostname to a value that all boxes (other mongods and the application server) can reach. By default it will likely end up with a hostname like localhost, which means something different on each machine. This can be done from that box's mongo shell like so.
Example:
// from mongo shell
conf = rs.conf()
conf.members[0].host = "mongo1:27017"
rs.reconfig(conf)
Final functioning connection string, which successfully fails over between nodes, including throwing errors if a query is destined for a secondary but there are no secondaries:
var connection = mongoose.createConnection("mongodb://mongo1:27017/client_test,mongo2:27017/client_test,mongo3:27017/client_test", {
    replset : { rs_name : "rs0", poolSize : 5, socketOptions : { keepAlive : 1, connectTimeoutMS : 1000 } }
}, function(err) {
    if (err) { throw err; }
});
Working replica set configuration:
{
    "_id" : "rs0",
    "version" : 4,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongo1:27017"
        },
        {
            "_id" : 1,
            "host" : "mongo2:27017"
        },
        {
            "_id" : 2,
            "host" : "mongo3:27017",
            "arbiterOnly" : true
        }
    ]
}
I had an issue similar to yours while dealing with replica sets. In my case I had one primary node with a priority of 10, one secondary with a priority of 0 (for analytics), and an arbiter.
My writes would fail after reconnecting the primary instance, and I went through a lot trying to fix it. Here's the most important thing I learned:
When my primary is down or unreachable, there has to be another member eligible to become primary (at least 2 members in my set have to have a priority >= 1). If I have only arbiters, hidden members, or members with a priority of 0, queries get stuck even after I reconnect my primary: my client is unable to complete write queries. Read queries still work, but writes don't.
This is what I faced with Mongoose, even with keepAlive, autoReconnect, and all the socket and connection timeout options set.
Hopefully this helps.
Related
We are creating a NodeJS based solution that makes use of MongoDB, by means of Mongoose. We recently started adding support for Atlas, but we would like to be able to fallback to non-Atlas based queries, when Atlas is not available, for a given connection.
I can't assume the software will be using MongoDB Cloud. Although I could make assumptions based on the URL, I'd still need to have a way to be able to do something like:
const available: boolean = MyModel.connection.isAtlasAvailable()
The reason we want this is because if we make an assumption on Atlas being available and then the client uses a locally hosted MongoDB, the following code will break, since $search is Atlas specific:
const results = await Person.aggregate([
    {
        $search: {
            index: 'people_text_index',
            deleted: { $ne: true },
            'text': {
                'query': filter.query,
                'path': {
                    wildcard: '*'
                }
            },
            count: {
                type: 'total'
            }
        }
    },
    {
        $addFields: {
            'mongoMeta': '$$SEARCH_META'
        }
    },
    { $skip : offset },
    { $limit: limit }
]);
I suppose I could surround this with a try/catch and then fall back to a non-Atlas search, but I'd rather check something is doable before trying an operation.
Is there any way to check whether MongoDB Atlas is available, for a given connection? As an extension to the question, does Mongoose provide a general pattern for checking for feature availability, such as if the connection supports transactions?
I suppose I could surround this with a try/catch and then fall back to a non-Atlas search, but I'd rather check something is doable before trying an operation.
For an isAtlasCluster() check, it would be more straightforward to use a regex match confirming that the hostname in the connection URI ends in mongodb.net, as used by MongoDB Atlas clusters.
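A minimal sketch of such a check, assuming the connection URI is available as a string (the helper name isAtlasCluster is illustrative, not a Mongoose API):

```javascript
// Heuristic: Atlas cluster hostnames end in '.mongodb.net'.
// Works for both standard (mongodb://) and SRV (mongodb+srv://) URIs.
function isAtlasCluster(uri) {
  // Strip the scheme and optional credentials, keep the host list.
  const match = uri.match(/^mongodb(\+srv)?:\/\/(?:[^@/]+@)?([^/?]+)/);
  if (!match) return false;
  // Every seed host should be an Atlas host for this to count as Atlas.
  return match[2].split(',').every((host) =>
    /\.mongodb\.net(:\d+)?$/.test(host)
  );
}

console.log(isAtlasCluster('mongodb+srv://user:pw@cluster0.abc12.mongodb.net/mydb')); // true
console.log(isAtlasCluster('mongodb://localhost:27017/mydb')); // false
```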
However, it would also be much more efficient to set a feature flag based on the connection URI when your application is initialised rather than using try/catch within the model on every request (which will add latency of at least one round trip failure for every search request).
I would also note that checking for an Atlas connection is not equivalent to checking if Atlas Search is configured for a collection. If your application requires some initial configuration of search indexes, you may want to have a more explicit feature flag configured by an app administrator or enabled as part of search index creation.
There are a few more considerations depending on the destination cluster tier:
Atlas free & shared tier clusters support fewer indexes so a complex application may have a minimum cluster tier requirement.
Atlas Serverless Instances (currently in preview) do not currently support Atlas Search (see Serverless Instance Limitations).
As an extension to the question, does Mongoose provide a general pattern for checking for feature availability, such as if the connection supports transactions?
Multi-document transactions are supported in all non-EOL versions of MongoDB server (4.2+) as long as you are connected to a replica set or sharded cluster deployment using the WiredTiger storage engine (default for new deployments since MongoDB 3.2). MongoDB 4.0 also supports multi-document transactions, but only for replica set deployments using WiredTiger.
If your application has a requirement for multi-document transaction support, I would also check that on startup or make it part of your application deployment prerequisites.
Overall this feels like complexity that should be covered by prerequisites and set up of your application rather than runtime checks which may cause your application to behave unexpectedly even if the initial deployment seems fine.
I believe this is more of a MongoDB question than a Meteor question, so don't get scared if you know a lot about mongo but nothing about meteor.
Running Meteor in development mode, but connecting it to an external Mongo instance instead of using Meteor's bundled one, results in the same problem. This leads me to believe this is a Mongo problem, not a Meteor problem.
The actual problem
I have a Meteor project which continuously gets data added to the database and displays it live in the application. It works perfectly in development mode, but behaves strangely when built and deployed to production. It works as follows:
A tiny script running separately collects broadcast UDP packages and shoves them into a mongo collection
The Meteor application then publishes a subset of this collection so the client can use it
The client subscribes and live-updates its view
The problem here is that the subscription appears to only get data about every 10 seconds, while these UDP packages arrive and get shoved into the database several times per second. This makes the application behave strangely
It is most noticeable on the collection of UDP messages, but not limited to it: it happens with every collection that is subscribed to, even those not populated by the external script
Querying the database directly, either through the mongo shell or through the application, shows that the documents are indeed added and updated as they are supposed to be. The publication just fails to notice, and appears to fall back to querying on a 10-second interval
Meteor uses oplog tailing on the MongoDB to find out when documents are added/updated/removed and update the publications based on this
Anyone with a bit more Mongo experience than me who might have a clue about what the problem is?
For reference, this is the dead simple publication function
/**
 * Publishes a custom part of the collection. See {@link https://docs.meteor.com/api/collections.html#Mongo-Collection-find} for args
 *
 * @returns {Mongo.Cursor} A cursor to the collection
 *
 * @private
 */
function custom(selector = {}, options = {}) {
    return udps.find(selector, options);
}
and the code subscribing to it:
Tracker.autorun(() => {
    // Params for the subscription
    const selector = {
        "receivedOn.port": port
    };
    const options = {
        limit,
        sort: {"receivedOn.date": -1},
        fields: {
            "receivedOn.port": 1,
            "receivedOn.date": 1
        }
    };
    // Make the subscription
    const subscription = Meteor.subscribe("udps", selector, options);
    // Get the messages
    const messages = udps.find(selector, options).fetch();
    doStuffWith(messages); // Not actual code. Just for demonstration
});
Versions:
Development:
node 8.9.3
mongo 3.2.15
Production:
node 8.6.0
mongo 3.4.10
Meteor uses two modes of operation to provide real time on top of MongoDB, which doesn't have any built-in real-time features: poll-and-diff and oplog-tailing.
1 - Oplog-tailing
It works by reading the mongo database’s replication log that it uses to synchronize secondary databases (the ‘oplog’). This allows Meteor to deliver realtime updates across multiple hosts and scale horizontally.
It's more complicated, and provides real-time updates across multiple servers.
2 - Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database. Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag for external writes.
(The default is 10 seconds, and this is exactly what you are experiencing.)
This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos).
This approach is simple and delivers easy-to-understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N²) with the number of users. Meteor automatically de-duplicates identical queries, though, so if each user runs the same query the results can be shared.
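The O(N²) claim follows from a simple count: each write by any of the N users triggers a re-run of the query for each of the N subscribed users. A quick sketch of that cost model, with purely illustrative numbers:

```javascript
// Rough cost model for poll-and-diff: every write by any user causes the
// query to be re-run and re-diffed for every subscribed user.
function refetchesPerInterval(users, writesPerUserPerInterval) {
  const totalWrites = users * writesPerUserPerInterval;
  return totalWrites * users; // each write fans out to every subscriber
}

console.log(refetchesPerInterval(10, 1));  // 100
console.log(refetchesPerInterval(100, 1)); // 10000 -> 10x the users, 100x the work
```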
You can tune poll-and-diff by changing values of pollingIntervalMs and pollingThrottleMs.
You have to use the disableOplog: true option to opt out of oplog tailing on a per-query basis.
Meteor.publish("udpsPub", function (selector) {
    return udps.find(selector, {
        disableOplog: true,
        pollingThrottleMs: 10000,
        pollingIntervalMs: 10000
    });
});
Additional links:
https://medium.baqend.com/real-time-databases-explained-why-meteor-rethinkdb-parse-and-firebase-dont-scale-822ff87d2f87
https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908
How to use pollingThrottle and pollingInterval?
It's a DDP (WebSocket) heartbeat configuration.
Meteor's real-time communication and live updates are performed using DDP (a JSON-based protocol which Meteor implemented on top of SockJS).
It connects client and server so that either side can change data and react to changes.
The DDP (WebSocket) protocol implements so-called PING/PONG messages (heartbeats) to keep WebSockets alive. The server sends a PING message to the client through the WebSocket, which then replies with a PONG.
By default the heartbeatInterval is configured at a little more than 17 seconds (17500 milliseconds).
Check here: https://github.com/meteor/meteor/blob/d6f0fdfb35989462dcc66b607aa00579fba387f6/packages/ddp-client/common/livedata_connection.js#L54
You can configure heartbeat time in milliseconds on server by using:
Meteor.server.options.heartbeatInterval = 30000;
Meteor.server.options.heartbeatTimeout = 30000;
Other Link:
https://github.com/meteor/meteor/blob/0963bda60ea5495790f8970cd520314fd9fcee05/packages/ddp/DDP.md#heartbeats
So we are having an interesting problem. We wanted to add authentication at the MongoDB layer for more security, but we are not getting a favorable outcome.
Pre-Setup
Use mongo shell (against the admin database) as root
Switch to desired database (applicationdb)
Execute db.createUser()
Validate user was created successfully
{
    "_id" : "applicationdb.appuser",
    "user" : "appuser",
    "db" : "applicationdb",
    "roles" : [
        {
            "role" : "readWrite",
            "db" : "applicationdb"
        }
    ]
}
Scenario 1:
Change mongodb.conf, auth=true
Restart the Mongod service
Connect mongoose using:
mongoose.connect('mongodb://appuser:password@xx.xxx.xxx.xxx:27017/applicationdb');
No errors are received on connect, but trying to perform a GET through Mongoose causes the operation to time out without any error (at least none that I could find)
Scenario 2:
Change mongodb.conf, auth=false
Restart the Mongod service
Connect mongoose using:
mongoose.connect('mongodb://xx.xxx.xxx.xxx:27017/applicationdb');
No errors are received on connect, and performing a GET through Mongoose returns documents successfully
Why do we get this timeout and never a completed request when using authentication in MongoDB?
Any help would be great, we're at a loss on this one!
You need to restart the mongod service with the --auth option; see here.
If that doesn't work:
Try setting server options in Mongoose with keepAlive set. See here and here.
I am trying to connect a Node.js app to MongoDB with a replica set, but it throws an error when any write operation is performed.
It throws MongoError: not master.
It tries to write on secondary mongo instances.
I have the options set as { db: { readPreference: 'secondaryPreferred' } } and pass them to MongoClient.connect in the Node.js code using the native Mongo driver.
The URL used to connect looks like mongodb://admin:pass@host_one:27017,host_two:27017,host_three:27017/dbName
Any help would be really appreciated.
Did you add in your replicaSet name?
mongodb://admin:pass@host_one:27017,host_two:27017,host_three:27017/dbName?replicaSet=my-replica-set
replicaSet=name
The driver verifies that the name of the replica set it connects to matches this name. Implies that the hosts given are a seed list, and the driver will attempt to find all members of the set. No default value.
If this is not set it will be treated as a standalone node.
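A quick guard against that pitfall is to check the connection string for the replicaSet parameter before connecting; the helper below is illustrative, not part of the driver:

```javascript
// Returns true when the connection string names a replica set, i.e. the
// driver will treat the hosts as a seed list rather than standalone nodes.
function hasReplicaSetName(uri) {
  const queryStart = uri.indexOf('?');
  if (queryStart === -1) return false;
  return uri
    .slice(queryStart + 1)
    .split('&')
    .some((pair) => pair.startsWith('replicaSet=') && pair.length > 'replicaSet='.length);
}

console.log(hasReplicaSetName(
  'mongodb://admin:pass@host_one:27017,host_two:27017/dbName?replicaSet=my-replica-set'
)); // true
console.log(hasReplicaSetName(
  'mongodb://admin:pass@host_one:27017,host_two:27017/dbName'
)); // false
```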
Maybe your replica set configuration is not correct.
To check the configuration, run the rs.conf() command on your mongo servers. You need to have a mongo host running as the primary member.
MongoError: Not master
This error suggests that the primary member of your replica set is not configured properly.
You can confirm this by entering the mongo shell on host_one. If the mongo shell prompt doesn't show PRIMARY, it's not configured properly.
Mongo shell prompt of host_two and host_three should show SECONDARY after proper configuration.
Important : Run rs.initiate() on just one and only one mongod instance for the replica set.
You can execute this command on the primary member to make the configuration work properly.
rs.initiate();
cfg = {
    _id: 'rs0',
    members: [{
        _id: 0,
        host: 'host_one:27017',
        priority: 2
    }, {
        _id: 1,
        host: 'host_two:27017',
        priority: 1
    }, {
        _id: 2,
        host: 'host_three:27017',
        priority: 1
    }]
};
cfg.protocolVersion = 1;
rs.reconfig(cfg, {
    force: true
});
Please note that the priority value indicates the relative eligibility of a member to become primary.
Specify higher values to make a member more eligible to become primary, and lower values to make it less eligible. A member with a priority of 0 is ineligible to become primary.
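Given a config like cfg above, the primary-eligible members are simply those with a priority greater than 0 (the default priority is 1 when unset). A quick illustrative sanity check, not a driver API:

```javascript
// Members with priority 0 can never become primary; a resilient replica set
// needs at least two eligible members (priority > 0) so an election can
// still succeed when the current primary goes down.
function primaryEligible(members) {
  return members.filter((m) => (m.priority === undefined ? 1 : m.priority) > 0);
}

const members = [
  { _id: 0, host: 'host_one:27017', priority: 2 },
  { _id: 1, host: 'host_two:27017', priority: 1 },
  { _id: 2, host: 'host_three:27017', priority: 0 }, // e.g. analytics-only node
];

console.log(primaryEligible(members).map((m) => m.host));
// [ 'host_one:27017', 'host_two:27017' ]
```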
You can again check your replica set configuration using this command
rs.conf()
Read preference is not applicable to writes. Writes must always be performed on the primary.
You should be connecting to replica set instead of directly to an individual node. See node.js mongodb how to connect to replicaset of mongo servers
I've been doing some reading up on deepstream.io and so far I've discovered the following:
All records are stored in the same table (deepstream_records by default)
To interact with this data, the client can be used, both client side (browser) and server side (node), though apparently it should not be used on the server side (node).
Questions:
How should I interact with records from the server side?
Is there something stopping me from changing records in the database?
Would changes to records in the database update client subscriptions?
Would this be considered bad practice?
Why are all the records stored in the same table?
Data example from RethinkDB:
{
    "_d": { },
    "_v": 0,
    "ds_id": "users/"
}, {
    "_d": { },
    "_v": 0,
    "ds_id": "users/admin"
}
Why are all the records stored in the same table?
server.set( 'storage', new RethinkDBStorageConnector( {
    port: 5672,
    host: 'localhost',
    /* (Optional) A character that's used as part of the
     * record names to split them into a table and an id part, e.g.
     *
     * books/dream-of-the-red-chamber
     *
     * would create a table called 'books' and store the record under the name
     * 'dream-of-the-red-chamber'
     */
    splitChar: '/'
}));
server.start();
Did you forget to set the splitChar? (It has no default.)
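A sketch of what the connector does with splitChar, assuming it splits the record name on the first occurrence of that character (the helper is illustrative, not the connector's actual code):

```javascript
// Split a deepstream record name into a table and an id using splitChar.
// Without a splitChar match, everything lands in the single default table.
function recordNameToTableAndId(recordName, splitChar) {
  const idx = recordName.indexOf(splitChar);
  if (idx === -1) {
    return { table: 'deepstream_records', id: recordName };
  }
  return {
    table: recordName.slice(0, idx),
    id: recordName.slice(idx + 1),
  };
}

console.log(recordNameToTableAndId('books/dream-of-the-red-chamber', '/'));
// { table: 'books', id: 'dream-of-the-red-chamber' }
```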
How should I interact with records on the server side?
To interact with this data you would create a node client that connects to your server using TCP (default port 6021). The server itself is a very efficient message broker that can distribute messages with low latency, and our recommendation is not to include any custom code that isn't necessary, even when using permissioning and dataTransforms.
https://deepstream.io/tutorials/core/transforming-data
You can see this explained in the FX provider example in the tutorials:
https://deepstream.io/tutorials/core/active-data-providers
And the tank game tutorial example:
https://github.com/deepstreamIO/ds-tutorial-tanks
Is there something stopping me from changing records in the database?
deepstream maintains its low latency by doing all writes and reads against the cache. Writing to the database happens afterwards, so it doesn't introduce a performance hit. Because of this, changing a record directly in the database will not notify any of the users, and will also break some of the logic used for merge handling.
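The write path described above can be sketched as: apply the change to the cache synchronously (so subscribers can be notified immediately), then queue the database write to happen afterwards. This is a simplified model to illustrate the design, not deepstream's actual code:

```javascript
// Cache-first write model: the cache is the source of truth for reads and
// notifications; the database write is deferred so it never adds latency
// to the hot path.
class CacheFirstStore {
  constructor() {
    this.cache = new Map();
    this.pendingDbWrites = []; // flushed to the database asynchronously
  }

  set(recordName, version, data) {
    // Synchronous: subscribers can be notified right away.
    this.cache.set(recordName, { _v: version, _d: data });
    // Deferred: the database catches up later.
    this.pendingDbWrites.push({ recordName, version, data });
  }

  get(recordName) {
    return this.cache.get(recordName);
  }
}

const store = new CacheFirstStore();
store.set('users/admin', 1, { name: 'admin' });
console.log(store.get('users/admin')); // { _v: 1, _d: { name: 'admin' } }
console.log(store.pendingDbWrites.length); // 1
```

A write made directly to the database bypasses this path entirely, which is why subscribers are never notified of it.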