We are building a Node.js solution that uses MongoDB through Mongoose. We recently started adding support for Atlas, but we would like to be able to fall back to non-Atlas queries when Atlas is not available for a given connection.
I can't assume the software will be using MongoDB Cloud. Although I could make assumptions based on the URL, I'd still need a way to do something like:
const available: boolean = MyModel.connection.isAtlasAvailable()
We want this because if we assume Atlas is available and the client then uses a locally hosted MongoDB, the following code will break, since $search is Atlas-specific:
const results = await Person.aggregate([
  {
    $search: {
      index: 'people_text_index',
      deleted: { $ne: true },
      'text': {
        'query': filter.query,
        'path': {
          wildcard: '*'
        }
      },
      count: {
        type: 'total'
      }
    }
  },
  {
    $addFields: {
      'mongoMeta': '$$SEARCH_META'
    }
  },
  { $skip : offset },
  { $limit: limit }
]);
I suppose I could surround this with a try/catch and then fall back to a non-Atlas search, but I'd rather check something is doable before trying an operation.
Is there any way to check whether MongoDB Atlas is available, for a given connection? As an extension to the question, does Mongoose provide a general pattern for checking for feature availability, such as if the connection supports transactions?
I suppose I could surround this with a try/catch and then fall back to a non-Atlas search, but I'd rather check something is doable before trying an operation.
As an isAtlasCluster() check, it would be more straightforward to use a regex match to confirm the hostname in the connection URI ends in mongodb.net as used by MongoDB Atlas clusters.
However, it would also be much more efficient to set a feature flag based on the connection URI when your application is initialised rather than using try/catch within the model on every request (which will add latency of at least one round trip failure for every search request).
I would also note that checking for an Atlas connection is not equivalent to checking if Atlas Search is configured for a collection. If your application requires some initial configuration of search indexes, you may want to have a more explicit feature flag configured by an app administrator or enabled as part of search index creation.
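A minimal sketch of the start-up feature flag described above. The helper names (extractHosts, looksLikeAtlasUri, resolveSearchFlag) and the idea of passing in an administrator-set override are assumptions made for illustration; they are not part of Mongoose or the MongoDB driver. A mongodb.net hostname only suggests an Atlas cluster and says nothing about whether a search index has been created.

```javascript
// Hypothetical helpers; none of these are part of Mongoose or the MongoDB driver.

// Pull the host names out of a mongodb:// or mongodb+srv:// connection string.
function extractHosts(uri) {
  const withoutScheme = uri.replace(/^mongodb(\+srv)?:\/\//i, '');
  // The authority section ends at the first "/" or "?".
  const authority = withoutScheme.split(/[\/?]/)[0];
  // Drop "user:password@" if present (lastIndexOf returns -1 when absent).
  const hostList = authority.slice(authority.lastIndexOf('@') + 1);
  return hostList
    .split(',')
    .map((host) => host.replace(/:\d+$/, '').toLowerCase())
    .filter((host) => host.length > 0);
}

// Heuristic only: true when every host ends in ".mongodb.net".
function looksLikeAtlasUri(uri) {
  const hosts = extractHosts(uri);
  return hosts.length > 0 && hosts.every((host) => host.endsWith('.mongodb.net'));
}

// An explicit boolean override (for example, set by an administrator) wins;
// otherwise fall back to the hostname heuristic.
function resolveSearchFlag(uri, explicitOverride) {
  if (typeof explicitOverride === 'boolean') return explicitOverride;
  return looksLikeAtlasUri(uri);
}
```

The flag would be computed once when the application initialises and then consulted by the model code, so no search request pays for a failed round trip.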
There are a few more considerations depending on the destination cluster tier:
Atlas free & shared tier clusters support fewer indexes so a complex application may have a minimum cluster tier requirement.
Atlas Serverless Instances (currently in preview) do not currently support Atlas Search (see Serverless Instance Limitations).
As an extension to the question, does Mongoose provide a general pattern checking for feature availability, such as if the connection supports transactions?
Multi-document transactions are supported in all non-EOL versions of MongoDB server (4.2+) as long as you are connected to a replica set or sharded cluster deployment using the WiredTiger storage engine (default for new deployments since MongoDB 3.2). MongoDB 4.0 also supports multi-document transactions, but only for replica set deployments using WiredTiger.
If your application has a requirement for multi-document transaction support, I would also check that on startup or make it part of your application deployment prerequisites.
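As a rough illustration of such a start-up check, the sketch below interprets the reply of the server's hello command (older servers answer isMaster with the same fields). The function name supportsTransactions is hypothetical, and the sketch assumes the usual wire-version mapping (7 for MongoDB 4.0, 8 for MongoDB 4.2). It does not inspect the storage engine, so treat it as a first filter rather than a guarantee.

```javascript
// Hypothetical helper: decides from a `hello` (or legacy `isMaster`) reply
// whether multi-document transactions are plausibly available.
const WIRE_VERSION_4_0 = 7; // assumed wire version of MongoDB 4.0
const WIRE_VERSION_4_2 = 8; // assumed wire version of MongoDB 4.2

function supportsTransactions(helloReply) {
  const wireVersion = helloReply.maxWireVersion || 0;
  // Sessions are a prerequisite for transactions.
  const hasSessions = helloReply.logicalSessionTimeoutMinutes != null;
  // mongos routers identify themselves with msg: "isdbgrid".
  const isMongos = helloReply.msg === 'isdbgrid';
  // Replica set members report the set name.
  const isReplicaSetMember = typeof helloReply.setName === 'string';

  if (!hasSessions) return false;
  if (isMongos) return wireVersion >= WIRE_VERSION_4_2; // sharded: 4.2+
  if (isReplicaSetMember) return wireVersion >= WIRE_VERSION_4_0; // replica set: 4.0+
  return false; // standalone servers do not support transactions
}
```

With Mongoose, the reply could probably be fetched once after connecting, for example through the underlying driver's admin command helper, and the result stored alongside the other start-up flags.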
Overall this feels like complexity that should be covered by prerequisites and set up of your application rather than runtime checks which may cause your application to behave unexpectedly even if the initial deployment seems fine.
Related
I am trying to watch for changes in my collection but I am getting the following error:
"MongoError: Majority read concern requested, but it is not supported by the storage engine."
The answer seems to be: "To use watch you need to use replica set which is not part of mLab".
But I have a paid plan with a replica set. My connection to mLab looks like this:
mongoose.connect('mongodb://<dbuser>:<dbpassword>@ds327925-a0.mlab.com:27925,ds327925-a1.mlab.com:27925/<dbname>?replicaSet=rs-ds327925');
const taskCollection = db.collection('tasks');
const changeStream = taskCollection.watch();
changeStream.on('change', (change) => {
});
Majority read concern requires WiredTiger storage engine.
Availability of WiredTiger in mLab is limited to "dedicated" plans, apparently.
Besides upgrading your plan, you could also consider migrating to MongoDB Atlas.
In MongoDB 4.2+ change streams do not require majority read concern, but I don't imagine 4.2 is available in mLab either.
I'm using this code to run the tests outlined in this blog post.
(For posterity, relevant code pasted at the bottom).
What I've found is that if I run these experiments with a local instance of Mongo (in my case, using docker)
docker run -d -p 27017:27017 -v ~/data:/data/db mongo
Then I get pretty good performance, similar results as outlined in the blog post:
finished populating the database with 10000 users
default_query: 277.986ms
query_with_index: 262.886ms
query_with_select: 157.327ms
query_with_select_index: 136.965ms
lean_query: 58.678ms
lean_with_index: 65.777ms
lean_with_select: 23.039ms
lean_select_index: 21.902ms
[nodemon] clean exit - waiting
However, when I switch to using a cloud instance of Mongo, in my case an Atlas sandbox instance, with the following configuration:
CLUSTER TIER: M0 Sandbox (General)
REGION: GCP / Iowa (us-central1)
TYPE: Replica Set - 3 nodes
LINKED STITCH APP: None Linked
(Note that I'm based in Melbourne, Australia).
Then I get much worse performance.
adding 10000 users to the database
finished populating the database with 10000 users
default_query: 8279.730ms
query_with_index: 8791.286ms
query_with_select: 5234.338ms
query_with_select_index: 4933.209ms
lean_query: 13489.728ms
lean_with_index: 10854.134ms
lean_with_select: 4906.428ms
lean_select_index: 4710.345ms
I understand there will be some round-trip overhead between my computer and the Mongo instance, but I would expect that to add 200ms at most.
It seems that the round-trip time is being added multiple times, or that something else entirely is going on that I'm not aware of. Can someone explain what would cause this to blow out?
A good answer might involve doing an explain plan, and explaining that in terms of network latency.
Tests against different Atlas instances - For those suggesting the issue is that I'm using a Sandbox instance of Atlas - here are the results for M20 and M30 instances:
BACKUPS: Active
CLUSTER TIER: M20 (General)
REGION: GCP / Iowa (us-central1)
TYPE: Replica Set - 3 nodes
LINKED STITCH APP: None Linked
BI CONNECTOR: Disabled
adding 10000 users to the database
finished populating the database with 10000 users
default_query: 9015.309ms
query_with_index: 8779.388ms
query_with_select: 4568.794ms
query_with_select_index: 4696.811ms
lean_query: 7694.718ms
lean_with_index: 7886.828ms
lean_with_select: 3654.518ms
lean_select_index: 5014.867ms
BACKUPS: Active
CLUSTER TIER: M30 (General)
REGION: GCP / Iowa (us-central1)
TYPE: Replica Set - 3 nodes
LINKED STITCH APP: None Linked
BI CONNECTOR: Disabled
adding 10000 users to the database
finished populating the database with 10000 users
default_query: 8268.799ms
query_with_index: 8933.502ms
query_with_select: 4740.234ms
query_with_select_index: 5457.168ms
lean_query: 9296.202ms
lean_with_index: 9111.568ms
lean_with_select: 4385.125ms
lean_select_index: 4812.982ms
These really don't show any significant difference (be aware that any difference may just be network noise).
Tests colocating the Mongo client and the mongo database instance
I created a docker container and ran it on Google's Cloud Run, in the same region (US Central1), the results are:
2019-12-30 11:46:06.814 AEDT finished populating the database with 10000 users
2019-12-30 11:46:07.885 AEDT default_query: 1071.233ms
2019-12-30 11:46:08.917 AEDT query_with_index: 1031.952ms
2019-12-30 11:46:09.375 AEDT query_with_select: 457.659ms
2019-12-30 11:46:09.657 AEDT query_with_select_index: 281.678ms
2019-12-30 11:46:10.281 AEDT lean_query: 623.417ms
2019-12-30 11:46:10.961 AEDT lean_with_index: 680.622ms
2019-12-30 11:46:11.056 AEDT lean_with_select: 94.722ms
2019-12-30 11:46:11.148 AEDT lean_select_index: 91.984ms
So while this doesn't give results as fast as running on my own machine - it does show that colocating the client and the database gives a very large performance improvement.
So the question again is - why is the improvement ~7000ms?
The test code:
(async () => {
try {
await mongoose.connect('mongodb://localhost:27017/perftest', {
useNewUrlParser: true,
useCreateIndex: true
})
await init()
// const query = { age: { $gt: 22 } }
const query = { favoriteFruit: 'potato' }
console.time('default_query')
await User.find(query)
console.timeEnd('default_query')
console.time('query_with_index')
await UserWithIndex.find(query)
console.timeEnd('query_with_index')
console.time('query_with_select')
await User.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
console.timeEnd('query_with_select')
console.time('query_with_select_index')
await UserWithIndex.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
console.timeEnd('query_with_select_index')
console.time('lean_query')
await User.find(query).lean()
console.timeEnd('lean_query')
console.time('lean_with_index')
await UserWithIndex.find(query).lean()
console.timeEnd('lean_with_index')
console.time('lean_with_select')
await User.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
.lean()
console.timeEnd('lean_with_select')
console.time('lean_select_index')
await UserWithIndex.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
.lean()
console.timeEnd('lean_select_index')
process.exit(0)
} catch (err) {
console.error(err)
}
})()
My best guess is that you're dealing with slow network throughput between your local machine and Atlas (something I've experienced myself this week - hence how I found this post!)
Looking at your local query performance:
default_query: 277.986ms
query_with_index: 262.886ms
The query with index isn't noticeably any faster than the one without. For an indexed query to take 262ms in a Node app with a local DB probably means that either:
The index isn't being used properly OR more likely...
You're returning quite a few results in the query. If the query returns say 3,000 results and each result is 1KB, that's 3MB of JSON data that your app needs to handle.
I've got a 150Mbit/s internet connection and yet my throughput to Atlas (M2 shared tier, if that makes a difference) fluctuates between around 1Mbit/s to 6Mbit/s.
On localhost I have a Mongo query that returns 2,400 results for a total of 1.7MB of JSON data. The roundtrip time for that query in my Node app (using console.time() like you did) connected to Mongo on the same local dev machine is ~150ms. But when connecting that local app to Atlas the query takes 2,400ms to 3,400ms to return. When I profiled the query on Atlas it only took 2ms to execute, so the query itself is really fast, it's apparently the data transfer that's slow.
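The arithmetic behind that observation can be sketched as follows. This is only a back-of-the-envelope estimate: it assumes 1MB means 10^6 bytes, ignores protocol overhead and latency, and the function name estimateTransferMs is made up for the example.

```javascript
// Rough transfer-time estimate: payload size divided by sustained throughput.
function estimateTransferMs(payloadBytes, throughputMbitPerSec) {
  const payloadBits = payloadBytes * 8;
  const bitsPerSecond = throughputMbitPerSec * 1e6;
  return (payloadBits / bitsPerSecond) * 1000;
}

const payloadBytes = 1.7e6; // the 1.7MB result set mentioned above
for (const mbit of [1, 4, 6, 150]) {
  const ms = Math.round(estimateTransferMs(payloadBytes, mbit));
  console.log(`${mbit} Mbit/s -> ${ms} ms`);
}
```

At 4 to 6 Mbit/s the estimate comes out at roughly 2.3 to 3.4 seconds, which lines up with the 2,400ms to 3,400ms measured against Atlas, while the full 150Mbit/s line would move the same payload in under 100ms.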
Based on these results, I have a feeling that Atlas perhaps throttles throughput over the public internet (or just doesn't bother optimizing for it in their network) because 99% of apps are colocated in the same network region as their Atlas DB. That's the reason why they ask you to pick not just AWS, Azure, etc but your specific network region when creating a cluster.
UPDATE: I just ran a few Amazon EC2 speed tests for my network region (us-east-1) using a 3rd-party service and the average download speed was 4.5Mbit/s for smaller files (1KB to 128KB) and 41Mbit/s for larger files (256KB to 10MB). So the primary issue may be generally slow throughput on the EC2 instances that Atlas clusters run on rather than any throttling by Atlas, or perhaps a combination of both.
It usually takes a little time for a request to propagate over the network. How long depends on connection speed, latency, distance to the server and many other factors. A server on your local computer doesn't face these issues the way a cloud environment does.
You say you are confident that the maximum delay due to network propagation is ~200ms.
If so, there are several other possible reasons to consider:
Usually, sandbox plans are for testing and they have limited resources allocated to them.
They don't use SSD drives to store data and use cheap storage solutions.
They assume that sandbox plans are usually just for exploring features.
Most of the time those instances run on shared virtual machines.
Make sure there are no other services running on your computer that consume a lot of bandwidth (e.g. torrent applications).
Cloud services depend on a variety of metrics like System Availability, Response Time, Throughput, Latency and many more...
If the user base and the data center are located in the same region, the average overall response time is about 50ms. If they are in different regions, the response time increases significantly, to around 200ms - 400ms, and it can also depend on the type of instance you're using and the region you choose.
Since you're using an Atlas Sandbox cluster, you should first select the nearest region to avoid poor performance, as Atlas Sandbox clusters have their own limitations. If you're looking for quicker response times and faster performance, try upgrading your instance.
If you are sure that it's not a network issue such as latency or bandwidth versus response size, then it's either a low-end host (non-SSD, low RAM), a misconfigured web server/proxy, or throttling/filtering of your traffic.
To narrow it down further, use an encrypted (https) connection (it's easy, just install letsencrypt on your server) and try a VPN to change your network route.
Also you can try running the script directly on the server to measure actual executing performance.
Of course, you have to consider that your network delay applies to each request to the cloud instance, so if you have a ping time of 30ms or more, each query will take roughly 30ms longer. Moreover, if your instance is a sandbox (free account, https://docs.atlas.mongodb.com/tutorial/deploy-free-tier-cluster/), you will have limited, shared CPU and RAM.
This is why your mongo db queries are slow.
Making a system faster in production is one of the design goals.
We need to take many variables into account:
Networking, for example, VPC/subnetting
MongoDB Storage (SSD)
MongoDB Indexes
MongoDB RAM, CPU
Node Web Servers or Cluster
Cloud Tenants
TLS encryption
You may need to rule out every possible bottleneck, one by one.
I believe this is more of a MongoDB question than a Meteor question, so don't get scared if you know a lot about mongo but nothing about meteor.
Running Meteor in development mode, but connecting it to an external Mongo instance instead of using Meteor's bundled one, results in the same problem. This leads me to believe this is a Mongo problem, not a Meteor problem.
The actual problem
I have a Meteor project which continuously gets data added to the database and displays it live in the application. It works perfectly in development mode, but behaves strangely when built and deployed to production. It works as follows:
A tiny script running separately collects broadcast UDP packages and shoves them into a mongo collection
The Meteor application then publishes a subset of this collection so the client can use it
The client subscribes and live-updates its view
The problem here is that the subscription appears to get data only about every 10 seconds, while these UDP packages arrive and get shoved into the database several times per second. This makes the application behave weirdly.
It is most noticeable on the collection of UDP messages, but not limited to it. It happens with every collection which is subscribed to, even those not populated by the external script
Querying the database directly, either through the mongo shell or through the application, shows that the documents are indeed added and updated as they are supposed to. The publication just fails to notice and appears to default to querying on a 10 second interval
Meteor uses oplog tailing on the MongoDB to find out when documents are added/updated/removed and update the publications based on this
Anyone with a bit more Mongo experience than me who might have a clue about what the problem is?
For reference, this is the dead simple publication function
/**
 * Publishes a custom part of the collection. See {@link https://docs.meteor.com/api/collections.html#Mongo-Collection-find} for args
 *
 * @returns {Mongo.Cursor} A cursor to the collection
 *
 * @private
 */
function custom(selector = {}, options = {}) {
  return udps.find(selector, options);
}
and the code subscribing to it:
Tracker.autorun(() => {
// Params for the subscription
const selector = {
"receivedOn.port": port
};
const options = {
limit,
sort: {"receivedOn.date": -1},
fields: {
"receivedOn.port": 1,
"receivedOn.date": 1
}
};
// Make the subscription
const subscription = Meteor.subscribe("udps", selector, options);
// Get the messages
const messages = udps.find(selector, options).fetch();
doStuffWith(messages); // Not actual code. Just for demonstration
});
Versions:
Development:
node 8.9.3
mongo 3.2.15
Production:
node 8.6.0
mongo 3.4.10
Meteor uses two modes of operation to provide real-time updates on top of MongoDB, which doesn't have any built-in real-time features: poll-and-diff and oplog tailing.
1 - Oplog-tailing
It works by reading the mongo database’s replication log that it uses to synchronize secondary databases (the ‘oplog’). This allows Meteor to deliver realtime updates across multiple hosts and scale horizontally.
It's more complicated, and provides real-time updates across multiple servers.
2 - Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database. Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag for external writes.
(The default is 10 seconds, and this is what you are experiencing.)
This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos).
This approach is simple and delivers easy-to-understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N²) with users. Meteor automatically de-duplicates identical queries, though, so if each user does the same query the results can be shared.
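To make the poll-and-diff idea concrete, here is a simplified sketch. It is not Meteor's actual implementation: the function names diffResults and startPolling are invented for the example, documents are assumed to carry an _id, and changes are detected with a naive JSON comparison that is sensitive to key order.

```javascript
// Illustrative only: compare two query results and report what changed.
function diffResults(previousDocs, currentDocs) {
  const previousById = new Map(previousDocs.map((doc) => [String(doc._id), doc]));
  const currentById = new Map(currentDocs.map((doc) => [String(doc._id), doc]));
  const added = [];
  const changed = [];
  const removed = [];

  for (const [id, doc] of currentById) {
    if (!previousById.has(id)) {
      added.push(doc);
    } else if (JSON.stringify(previousById.get(id)) !== JSON.stringify(doc)) {
      changed.push(doc); // naive comparison, sensitive to key order
    }
  }
  for (const [id, doc] of previousById) {
    if (!currentById.has(id)) removed.push(doc);
  }
  return { added, changed, removed };
}

// Re-run the query on a timer (polling) and pass any differences to a callback.
// runQuery is expected to return a promise of an array of documents.
function startPolling(runQuery, onDiff, intervalMs) {
  let previous = [];
  const timer = setInterval(() => {
    runQuery()
      .then((current) => {
        const diff = diffResults(previous, current);
        previous = current;
        if (diff.added.length || diff.changed.length || diff.removed.length) {
          onDiff(diff);
        }
      })
      .catch((err) => console.error('poll failed:', err));
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop polling
}
```

With a 10 second interval, a document written by an external process is only noticed on the next poll, which matches the lag described in the question.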
You can tune poll-and-diff by changing values of pollingIntervalMs and pollingThrottleMs.
Use the disableOplog: true option to opt out of oplog tailing on a per-query basis.
Meteor.publish("udpsPub", function (selector) {
return udps.find(selector, {
disableOplog: true,
pollingThrottleMs: 10000,
pollingIntervalMs: 10000
});
});
Additional links:
https://medium.baqend.com/real-time-databases-explained-why-meteor-rethinkdb-parse-and-firebase-dont-scale-822ff87d2f87
https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908
How to use pollingThrottle and pollingInterval?
It's a DDP (Websocket ) heartbeat configuration.
Meteor's real-time communication and live updates are performed over DDP (a JSON-based protocol that Meteor implemented on top of SockJS).
It lets the client and server change data and react to each other's changes.
The DDP (WebSocket) protocol implements so-called PING/PONG messages (heartbeats) to keep WebSockets alive. The server sends a PING message to the client through the WebSocket, and the client replies with PONG.
By default, heartbeatInterval is configured at a little more than 17 seconds (17500 milliseconds).
Check here: https://github.com/meteor/meteor/blob/d6f0fdfb35989462dcc66b607aa00579fba387f6/packages/ddp-client/common/livedata_connection.js#L54
You can configure heartbeat time in milliseconds on server by using:
Meteor.server.options.heartbeatInterval = 30000;
Meteor.server.options.heartbeatTimeout = 30000;
Other Link:
https://github.com/meteor/meteor/blob/0963bda60ea5495790f8970cd520314fd9fcee05/packages/ddp/DDP.md#heartbeats
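For illustration, the heartbeat exchange can be sketched as a tiny responder. This is not Meteor's own code; the function name replyToHeartbeat is hypothetical, and it relies on the DDP specification's description of ping (with an optional id) and pong (echoing that id when present).

```javascript
// Illustrative responder for DDP heartbeat messages (not Meteor's own code).
// A "ping" may carry an optional id; the matching "pong" echoes it back.
function replyToHeartbeat(rawMessage) {
  const message = JSON.parse(rawMessage);
  if (message.msg !== 'ping') return null; // not a heartbeat, nothing to answer
  const pong = { msg: 'pong' };
  if (message.id !== undefined) pong.id = message.id;
  return JSON.stringify(pong);
}

console.log(replyToHeartbeat('{"msg":"ping","id":"42"}'));
```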
Working with express(nodejs), mongoose, and Azure Cosmos DB to return objects.
When I connect to my local mongodb, the following code correctly returns a list of commit objects that exist in local mongodb.
Commit
.find({}, function(err, commits) {
if (err) {
res.render('search/index', {});
} else {
res.json(commits);
}
});
However, when connecting to Azure Cosmos DB using a PRIMARY CONNECTION STRING shown on my Azure portal website, the code just returns an empty list.
I checked that the mongoose.connection.readyState value is 1.
In addition, I can connect to the Azure Cosmos DB using Robo 3T.
Mongoose was designed to work with MongoDB. If your local testing with a real MongoDB server yields the expected result, then the fault is unlikely to be in mongoose or your code. Since CosmosDB only attempts to mimic MongoDB's API, there is no guarantee that it will work the same way. In your case, apparently it doesn't.
Being able to connect to CosmosDB using tools designed to work with MongoDB doesn't necessarily mean that CosmosDB will return the correct result.
If you require a cloud-based MongoDB deployment, using MongoDB Atlas is likely the best solution at this point in time.
Finally I could solve this problem by myself.
The latest version (v3.1.1) of the library doesn't work for connecting to Azure Cosmos DB.
You should use mongodb 2.2.33.
I found the solution from a comment on https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb-samples.
I'm trying to write a test for a method that connects to mongo, but I don't want to need mongo running, or to make a real connection to it, for my tests to pass.
Here's my current test which is successful when my mongo daemon is running.
describe('with a valid mongo string parameter', function() {
  it('should return a fulfilled promise', function() {
    var con = mongoFactory.getConnection('mongodb://localhost:27017');
    // Return the assertion so Mocha waits for the promise to settle
    return expect(con).to.be.fulfilled;
  });
});
mongoFactory.getConnection code:
getConnection: function getConnection(connectionString) {
  // do stuff here
  // Initialize connection once
  MongoClient.connect(connectionString, function(err, database) {
    if (err) {
      // Return early so the deferred is not also resolved below
      return def.reject(err);
    }
    def.resolve(database);
  });
  return def.promise;
}
There are a couple of SO answers related to unit testing code that uses MongoDB as a data store:
Mocking database in node.js?
Mock/Test Mongodb Database Node.js
Embedded MongoDB when running integration tests
Similar: Unit testing classes that have online functionality
I'll make an attempt at consolidating these solutions.
Preamble
First and foremost, you should want MongoDB to be running while performing your tests. MongoDB's query language is complex, so running legitimate queries against a stable MongoDB instance is required to ensure your queries are running as planned and that your application is responding properly to the results. With this in mind, however, you should never run your tests against a production system, but instead a peripheral system to your integration environment. This can be on the same machine as your CI software, or simply relatively close to it (in terms of process, not necessarily network or geographically speaking).
This ENV could be low-footprint and completely run in memory (resource 1) (resource 2), but would not necessarily require the same performance characteristics as your production ENV. (If you want to performance test, this should be handled in a separate environment from your CI anyway.)
Setup
Install a mongod service specifically for CI. If repl sets and/or sharding are of concern (e.g. write concern, no use of $isolated, etc.), it is possible to mimic a clustered environment by running multiple mongod instances (1 config, 2x2 data for shard+repl) and a mongos instance on the same machine with either some init.d scripts/tweaks or something like docker.
Use environment-specific configurations within your application (either embedded via .json files, or in some place like /etc, /home/user/.your-app or similar). Your application can load these based on a node environment variable like NODE_ENV=int. Within these configurations your db connection strings will differ. If you're not using env-specific configs, start doing this as a means to abstract the application runtime settings (i.e. "local", "dev", "int", "pre", "prod", etc.). I can provide a sample upon request.
Include test-oriented fixtures with your application/testing suite. As mentioned in one of the linked questions, MongoDB's Node.js driver supports some helper libraries: mongodb-fixtures and node-database-cleaner. Fixtures provide a working and consistent data set for testing: think of them as a bootstrap.
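A small sample of the environment-specific configuration idea from the setup steps above. Everything here is a placeholder: the environment names follow the ones listed above, while the host names, database names and the loadConfig function are invented for the example. A real application would more likely read these values from .json files than from an inline object.

```javascript
// Hypothetical per-environment settings; hosts and database names are placeholders.
const CONFIGS = {
  local: { mongoUri: 'mongodb://localhost:27017/myapp_local' },
  int: { mongoUri: 'mongodb://ci-mongo:27017/myapp_int' },
  prod: { mongoUri: 'mongodb://prod-mongo:27017/myapp' },
};

// Pick the settings for the given environment name, defaulting to "local".
function loadConfig(envName) {
  const name = envName || 'local';
  if (!Object.prototype.hasOwnProperty.call(CONFIGS, name)) {
    throw new Error(`Unknown environment: ${name}`);
  }
  return { env: name, ...CONFIGS[name] };
}

// Typical use: the environment comes from NODE_ENV, e.g. NODE_ENV=int.
const config = loadConfig(process.env.NODE_ENV_EXAMPLE);
console.log(config.env, config.mongoUri);
```

The final two lines read a made-up NODE_ENV_EXAMPLE variable so the sketch runs anywhere; in an application that would be process.env.NODE_ENV.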
Builds/Tests
Clean the associated database using something like node-database-cleaner.
Populate your fixtures into the now empty database with the help of mongodb-fixtures.
Perform your build and test.
Repeat.
On the other hand...
If you still decide that not running MongoDB is the correct approach (and you wouldn't be the only one), then abstracting your data store calls from the driver with an ORM is your best bet (for the entire application, not just testing). For example, something like model claims to be database agnostic, although I've never used it. Utilizing this approach, you would still require fixtures and env configurations, however you would not be required to install MongoDB. The caveat here is that you're at the mercy of the ORM you choose.
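As a sketch of that abstraction, the example below uses a hand-rolled repository with an in-memory implementation rather than a real ORM. The class and function names (InMemoryUserRepository, registerUser) are hypothetical; a production implementation with the same insert/findById methods would wrap the MongoDB driver.

```javascript
// Hypothetical repository: application code depends on this small interface
// (insert / findById) rather than on a MongoDB driver directly.
class InMemoryUserRepository {
  constructor() {
    this.users = new Map();
    this.nextId = 1;
  }

  // Store a copy of the user with a generated string id.
  async insert(user) {
    const stored = { ...user, _id: String(this.nextId++) };
    this.users.set(stored._id, stored);
    return stored;
  }

  // Resolve to the stored user, or null when the id is unknown.
  async findById(id) {
    return this.users.get(id) || null;
  }
}

// Application code receives the repository instead of opening a connection,
// so tests can pass in the in-memory version.
async function registerUser(repository, name) {
  if (!name) throw new Error('name is required');
  return repository.insert({ name });
}
```

Tests built this way run without a mongod process, at the cost of not exercising real query behaviour, which is the trade-off described above.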
You could try tingodb.
TingoDB is an embedded JavaScript in-process filesystem or in-memory database upwards compatible with MongoDB at the API level.