Interacting with deepstream.io records server side - node.js

I've been doing some reading up on deepstream.io and so far I've discovered the following:
All records are stored in the same table (deepstream_records by default)
To interact with this data, the client can be used both client side (browser) and server side (node), but it's not clear to me whether it should be used on the server side (node).
Questions:
How should I interact with records from the server side?
Is there something stopping me from changing records in the database?
Would changes to records in the database update client subscriptions?
Would this be considered bad practice?
Why are all the records stored in the same table?
Data example from RethinkDB:
{
  "_d": { },
  "_v": 0,
  "ds_id": "users/"
},
{
  "_d": { },
  "_v": 0,
  "ds_id": "users/admin"
}

Why are all the records stored in the same table?
server.set( 'storage', new RethinkDBStorageConnector( {
  port: 5672,
  host: 'localhost',
  /* (Optional) A character that's used as part of the
   * record names to split them into a table and an id part, e.g.
   *
   * books/dream-of-the-red-chamber
   *
   * would create a table called 'books' and store the record under the name
   * 'dream-of-the-red-chamber'
   */
  splitChar: '/'
}));
server.start();
Did you perhaps not specify the splitChar? (It has no default value.)
How should I interact with records on the server side?
To interact with this data you would create a node client that connects to your server via TCP (default port 6021). The server itself is a very efficient message broker that can distribute messages with low latency, and our recommendation is not to include any custom code that isn't necessary, even when using permissioning and dataTransforms. (A minimal client sketch follows the tutorial links below.)
https://deepstream.io/tutorials/core/transforming-data
You can see this explained in the FX provider example in the tutorials:
https://deepstream.io/tutorials/core/active-data-providers
And the tank game tutorial example:
https://github.com/deepstreamIO/ds-tutorial-tanks
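For completeness, here is a minimal sketch of what such a server-side client could look like, assuming the deepstream.io-client-js package and the default TCP port 6021 mentioned above; the record name and field are illustrative, borrowed from the question's data example.
const deepstream = require("deepstream.io-client-js");

// Connect to the deepstream server over TCP (default TCP port 6021)
const client = deepstream("localhost:6021").login({}, (success) => {
  if (!success) { throw new Error("deepstream login failed"); }

  // Reads and writes go through the client, never directly through the database
  const record = client.record.getRecord("users/admin");
  record.set("lastSeen", Date.now());

  // Subscriptions fire for changes made by any other client as well
  record.subscribe("lastSeen", (value) => {
    console.log("lastSeen is now", value);
  });
});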
Is there something stopping me from changing records in the database?
deepstream maintains its low latency by doing all reads and writes against the cache. Writing to the database happens afterwards, asynchronously, so that it doesn't introduce a performance hit. Because of this, changing a record directly in the database will not notify any of the subscribed users, and it will also break some of the logic used for merge handling.

Related

Why is a test node.js app slow compared to running a query in the Astra CQL console?

I made a test account in DataStax Astra (https://astra.datastax.com/) and want to try Cassandra.
The homepage has a cqlsh console. If I select data there it is very fast, maybe 1 ms.
If I run the same query with Node.js and the cassandra-driver it takes 2-3 seconds, and I have only ONE row.
Why does it take so long? Is it my code's fault?
const { Client } = require("cassandra-driver");

async function run() {
  const client = new Client({
    cloud: {
      secureConnectBundle: "secure-connect-weinf.zip",
    },
    keyspace: 'wf_db',
    credentials: {
      username: "admin",
      password: "password",
    },
  });

  await client.connect();

  // Execute a query
  const rs = await client.execute("SELECT * FROM employ_by_id;");
  console.log(`${rs}`);

  await client.shutdown();
}

// Run the async function
run();
Unfortunately, it's not an apples-for-apples comparison.
Every time your app connects to a Cassandra cluster (Astra or otherwise), the driver executes these high-level steps:
Unpack the secure bundle to get cluster info
Open a TCP connection over the internet
Create a control connection to one of the nodes in the cluster
Obtain schema from the cluster using the control connection
Discover the topology of the cluster using the control connection
Open connections to the nodes in the cluster
Compute query plan (list of hosts to connect to based on load-balancing policy)
And finally, run the query
In contrast when you access the CQL Console on the Astra dashboard, the UI automatically connects + authenticates to the cluster and when you type a CQL statement it goes through the following steps:
Skipped (you're already authenticated to the cluster)
Skipped (it's already connected to a node within the same local VPC)
Skipped (already connected to cluster)
Skipped (already connected to cluster)
Skipped (already connected to cluster)
Skipped (already connected to cluster)
Skipped (already connected to cluster)
And finally, run the query
As you can see, the CQL Console does not have the same overhead as running an app repeatedly which only has 1 CQL statement in it.
In reality, your app will reuse the same cluster session to execute queries throughout its life, so it doesn't have the same overhead as repeatedly re-running the app above. The initialisation phase (steps 1 to 6 above) is only done when the app starts; once it's running, it only has to do steps 7 and 8. Cheers!
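As a minimal sketch of that pattern, reusing the connection details from the question (the id column in the WHERE clause is an assumption about the employ_by_id schema):
const { Client } = require("cassandra-driver");

// One client (and therefore one set of pooled connections) for the whole app
const client = new Client({
  cloud: { secureConnectBundle: "secure-connect-weinf.zip" },
  keyspace: "wf_db",
  credentials: { username: "admin", password: "password" },
});

// Connect once at startup...
const ready = client.connect();

// ...then reuse the same session for every query the app runs.
async function getEmployee(id) {
  await ready;
  // The `id` column name is an assumption about the employ_by_id schema
  const rs = await client.execute(
    "SELECT * FROM employ_by_id WHERE id = ?",
    [id],
    { prepare: true }
  );
  return rs.first();
}

module.exports = { getEmployee };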

Syncing app state with clients using socketio

I'm running a node server with SocketIO which keeps a large object (app state) that is updated regularly.
All clients receive the object after connecting to the server and should keep it updated in real-time using the socket (read-only).
Here's what I have considered:
1:
Emit a delta of changes to the clients using diff after updates
(this requires dealing with the reliability of delivery and lost updates)
2:
Use the diffsync package (however, it allows clients to push changes to the server, while I need updates to be unidirectional, i.e. server --> clients)
I'm confident there should be a readily available solution to deal with this but I was not able to find a definitive answer.
The solution is fairly easy: modify the server so that it accepts updates only from trusted clients.
let Server = require('diffsync').Server;
let receiveEdit = Server.prototype.receiveEdit;

// Only forward edits that come from a trusted connection
Server.prototype.receiveEdit = function (connection, editMessage, sendToClient) {
  if (checkIsTrustedClient(connection)) {
    receiveEdit.call(this, connection, editMessage, sendToClient);
  }
};
but note this comment in the diffsync server source:
// TODO: implement backup workflow
// has a low priority since `packets are not lost` - but don't quote me on that :P
console.log('error', 'patch rejected!!', edit.serverVersion, '->',
  clientDoc.shadow.serverVersion, ':',
  edit.localVersion, '->', clientDoc.shadow.localVersion);
The second option is to build your own solution based on jsondiffpatch.
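Here is a minimal sketch of that approach, assuming socket.io on the server and the jsondiffpatch package; the event names and version counter are illustrative and not part of either library.
const http = require("http").createServer();
const io = require("socket.io")(http);
const jsondiffpatch = require("jsondiffpatch").create();

let state = { /* the large app state */ };
let version = 0;                                  // incremented on every broadcast
let lastSent = JSON.parse(JSON.stringify(state)); // snapshot of what clients already have

io.on("connection", (socket) => {
  // New clients get the full state plus the current version
  socket.emit("state:full", { version, state });
  // A client that notices a gap in versions can ask for a full resync
  socket.on("state:resync", () => socket.emit("state:full", { version, state }));
});

// Call this after every server-side mutation of `state`
function broadcastChanges() {
  const delta = jsondiffpatch.diff(lastSent, state);
  if (!delta) return;                             // nothing changed
  version += 1;
  io.emit("state:patch", { version, delta });
  lastSent = JSON.parse(JSON.stringify(state));
}

http.listen(3000);
On the client side each delta would be applied with jsondiffpatch.patch(localState, delta), and the client would request state:resync whenever an incoming version isn't exactly one ahead of its own, which covers the lost-update concern from option 1.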

Meteor MongoDB subscription delivering data in 10 second intervals instead of live

I believe this is more of a MongoDB question than a Meteor question, so don't get scared if you know a lot about mongo but nothing about meteor.
Running Meteor in development mode, but connecting it to an external Mongo instance instead of using Meteor's bundled one, results in the same problem. This leads me to believe this is a Mongo problem, not a Meteor problem.
The actual problem
I have a meteor project which continuously gets data added to the database and displays it live in the application. It works perfectly in development mode, but has strange behaviour when built and deployed to production. It works as follows:
A tiny script running separately collects broadcast UDP packets and shoves them into a mongo collection
The Meteor application then publishes a subset of this collection so the client can use it
The client subscribes and live-updates its view
The problem here is that the subscription appears to only get data about every 10 seconds, while these UDP packets arrive and get shoved into the database several times per second. This makes the application behave strangely.
It is most noticeable on the collection of UDP messages, but not limited to it. It happens with every collection which is subscribed to, even those not populated by the external script
Querying the database directly, either through the mongo shell or through the application, shows that the documents are indeed added and updated as they are supposed to. The publication just fails to notice and appears to default to querying on a 10 second interval
Meteor uses oplog tailing on the MongoDB to find out when documents are added/updated/removed and updates the publications based on this
Anyone with a bit more Mongo experience than me who might have a clue about what the problem is?
For reference, this is the dead simple publication function
/**
 * Publishes a custom part of the collection. See {@link https://docs.meteor.com/api/collections.html#Mongo-Collection-find} for args
 *
 * @returns {Mongo.Cursor} A cursor to the collection
 *
 * @private
 */
function custom(selector = {}, options = {}) {
  return udps.find(selector, options);
}
and the code subscribing to it:
Tracker.autorun(() => {
  // Params for the subscription
  const selector = {
    "receivedOn.port": port
  };
  const options = {
    limit,
    sort: { "receivedOn.date": -1 },
    fields: {
      "receivedOn.port": 1,
      "receivedOn.date": 1
    }
  };

  // Make the subscription
  const subscription = Meteor.subscribe("udps", selector, options);

  // Get the messages
  const messages = udps.find(selector, options).fetch();
  doStuffWith(messages); // Not actual code. Just for demonstration
});
Versions:
Development:
node 8.9.3
mongo 3.2.15
Production:
node 8.6.0
mongo 3.4.10
Meteor uses two modes of operation to provide real time on top of MongoDB, which doesn't have any built-in real-time features: poll-and-diff and oplog-tailing.
1 - Oplog-tailing
It works by reading the mongo database’s replication log that it uses to synchronize secondary databases (the ‘oplog’). This allows Meteor to deliver realtime updates across multiple hosts and scale horizontally.
It's more complicated, and provides real-time updates across multiple servers. (See the note after the links below for how it is typically enabled in production.)
2 - Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database. Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag for external writes.
(The default is 10 seconds, and this is exactly what you are experiencing.)
This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos).
This approach is simple and delivers easy-to-understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N²) with users. Meteor automatically de-duplicates identical queries, though, so if each user does the same query the results can be shared.
You can tune poll-and-diff by changing the values of pollingIntervalMs and pollingThrottleMs.
You have to use the disableOplog: true option to opt out of oplog tailing on a per-query basis.
Meteor.publish("udpsPub", function (selector) {
  return udps.find(selector, {
    disableOplog: true,
    pollingThrottleMs: 10000,
    pollingIntervalMs: 10000
  });
});
Additional links:
https://medium.baqend.com/real-time-databases-explained-why-meteor-rethinkdb-parse-and-firebase-dont-scale-822ff87d2f87
https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908
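One thing worth checking, since the problem only appears with an external MongoDB: Meteor only performs oplog tailing when it is given an oplog connection, typically via the MONGO_OPLOG_URL environment variable pointing at the replica set's local database; without it, every publication falls back to poll-and-diff and its 10-second interval. Below is a hypothetical launch wrapper for a bundled Meteor app; the hostnames, database name, replica set name and authSource are placeholders.
// start.js - hypothetical launch wrapper for a bundled Meteor app.
// MONGO_URL and MONGO_OPLOG_URL are the environment variables Meteor reads;
// the hostnames, database names and authSource below are placeholders.
process.env.MONGO_URL =
  "mongodb://mongo1:27017,mongo2:27017/myapp?replicaSet=rs0";
process.env.MONGO_OPLOG_URL =
  "mongodb://mongo1:27017,mongo2:27017/local?replicaSet=rs0&authSource=admin";

// Boot the bundled server only after the environment is set
require("./bundle/main.js");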
How to use pollingThrottle and pollingInterval?
It's a DDP (WebSocket) heartbeat configuration.
Meteor's real-time communication and live updates are performed using DDP (a JSON-based protocol which Meteor implemented on top of SockJS).
Both client and server can change data and react to changes.
The DDP (WebSocket) protocol implements so-called PING/PONG messages (heartbeats) to keep the WebSocket alive. The server sends a PING message to the client through the WebSocket, which then replies with PONG.
By default the heartbeatInterval is configured at a little more than 17 seconds (17500 milliseconds).
Check here: https://github.com/meteor/meteor/blob/d6f0fdfb35989462dcc66b607aa00579fba387f6/packages/ddp-client/common/livedata_connection.js#L54
You can configure heartbeat time in milliseconds on server by using:
Meteor.server.options.heartbeatInterval = 30000;
Meteor.server.options.heartbeatTimeout = 30000;
Other Link:
https://github.com/meteor/meteor/blob/0963bda60ea5495790f8970cd520314fd9fcee05/packages/ddp/DDP.md#heartbeats

Pusher Account over quota

We use Pusher in our application in order to have real-time updates.
Something very strange happens: while Google Analytics says we have around 200 simultaneous connections, Pusher says we have 1500.
I would like to monitor Pusher connections in real time but could not find any method to do so. Can somebody help?
Currently there's no way to get realtime stats on the number of connections you currently have open for your app. However, it is something that we're investigating currently.
In terms of why the numbers vary between Pusher and Google Analytics, it's usually down to the fact that Google Analytics uses different methods of tracking whether or not a user is on the site. We're confident that our connection counting is correct, however, that's not to say that there isn't a potentially unexpected reason for your count to be high.
A connection is counted as a WebSocket connection to Pusher. When using the Pusher JavaScript library a new WebSocket connection is created when you create a new Pusher instance.
var pusher = new Pusher('APP_KEY');
Channel subscriptions are created over the existing WebSocket connection (known as multiplexing), and do not count towards your connection quota (there is no limit on the number allowed per connection).
var channel1 = pusher.subscribe('ch1');
var channel2 = pusher.subscribe('ch2');
// All done over a single connection

// more subscriptions
// ...
var channel100 = pusher.subscribe('ch100');
// Still just 1 connection
Common reasons why connections are higher than expected
Users open multiple tabs
If a user has multiple tabs open to the same application, multiple instances of Pusher will be created and therefore multiple connections will be used e.g. 2 tabs open will mean 2 connections are established.
Incorrectly coded applications
As mentioned above, a new connection is created every time a new Pusher object is instantiated. It is therefore possible to create many connections in the same page.
Using an older version of one of our libraries
Our connection strategies have improved over time, and we recommend that you keep up to date with the latest versions.
Specifically, in newer versions of our JS library, we carry out ping-pong requests between server and client to verify that the client is still around.
Other remedies
While our efforts are always to keep a connection going indefinitely to an application, it is possible to disconnect manually if you feel this works in your scenario. It can be achieved by calling disconnect() on the pusher instance. Below is some example code:
var pusher = new Pusher("APP_KEY");
var timeoutId = null;

function startInactivityCheck() {
  timeoutId = window.setTimeout(function () {
    pusher.disconnect();
  }, 5 * 60 * 1000); // called after 5 minutes
}

// called by something that detects user activity
function userActivityDetected() {
  if (timeoutId !== null) {
    window.clearTimeout(timeoutId);
  }
  startInactivityCheck();
}
How this disconnection is transmitted to the user is up to you but you may consider prompting them to let them know that they will not receive any further real-time updates due to a long period of inactivity. If they wish to start receiving real-time updates again they should click a button.
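As a hypothetical counterpart to the inactivity example above, such a button could re-establish the connection with pusher.connect(); the element id and handler below are illustrative.
// Hypothetical "resume updates" button handler pairing with the inactivity
// example above; the element id is illustrative.
document.getElementById("resume-updates").addEventListener("click", function () {
  pusher.connect();        // reopen the single WebSocket connection
  startInactivityCheck();  // restart the inactivity timer from the example above
});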

Trouble getting Mongoose to reconnect to nodes and send requests to secondaries

A common connection string for mongoose connecting to a replica set looks something like the following:
var connection = mongoose.createConnection("mongodb://db_1:27017/client_test,mongodb://db_2:27017/client_test", {
  replSet: { rs_name: "rs0", poolSize: 5, socketOptions: { keepAlive: 1 } }
}, function (err) {
  if (err) { throw err; }
});
The problem with that is if one of the two hosts is down, then it will fail to connect. If you only specify one host, then no requests end up getting sent to secondaries.
Here's my proof for that claim. If you specify one host, set up your replica set so that there is one primary and an arbiter, and then perform a query such as
myApi.find({}).slaveOk().read("s").exec(function (err, docs) {
  console.log(docs);
});
It will return results. However, since I am specifying "s" (secondary), this query should throw an error because there are no running secondaries. In addition, if you bring the secondary online and then run db.currentOp(true), you will never see any actual queries sent its way.
The moment you alter the connection string to specify every host, you will see connections go to the secondary. The dilemma is that now, because you had to specify the additional host in the connection string, in the event a secondary was offline it would fail to connect, and we've lost failover (the entire point of replica sets).
I can't determine if this is a configuration mistake on my part, a bug in Mongoose, or a conceptual flaw in my understanding of the way replica sets function. Some of the docs seem to state that reading from secondaries is basically a bad idea, but the reason given is usually stale data. My issue doesn't have anything to do with stale data; I can't figure out a way to set up the system so that I can get queries to secondaries without losing failover capacity.
1. The connection string just defines seed servers; the MongoDB driver tries to connect to these servers and gets information about the other servers in the replica set (by calling rs.status()). You could have a replica set with 5 nodes but specify only one in the connection string, and the driver would still be able to find the other four, as long as the server from the connection string is available.
2. My proposal is to use secondaryPreferred instead of just secondary, so that if no secondary is available the request is sent to the primary (see the sketch below).
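A small sketch of that suggestion against the question's myApi model; mongoose exposes read preferences through Query#read().
// 'secondaryPreferred' reads from a secondary when one is available and
// falls back to the primary otherwise (mongoose also accepts the shorthand "sp").
myApi.find({})
  .read("secondaryPreferred")
  .exec(function (err, docs) {
    if (err) { throw err; }
    console.log(docs);
  });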
Ok, I believe I have solved all of my problems. Here is what I learned.
Specify all possible replica nodes in your connection string, otherwise Mongoose will never send requests there. Mongoose has a specific format for this which is different from the node-mongodb-native driver. Example below.
In order to prevent it from hanging forever if one of the nodes is down on bootup, you need to specify connectTimeoutMS in the 'replset' options; then it will only wait that long for responses from each node on the initial connection. If the node comes online at a later date, it will still be available.
The hostname entries in your mongodb replica setup need to match the hostname entries in the connection string from your application, and all hostnames need to be accessible from all parties (mongo to mongo and application to mongo). In my case I had aliased the hostnames from mongo to mongo as mongo1:27017, mongo2:27017, and mongo3:27017. My application server used a connection string with IPs. Mongoose was attempting to re-initiate the connection using the mongo1:27017 hostname (which my application server could not reach) rather than the IP address I specified in the connection string. This resulted in it never re-connecting to a node it lost contact with. It is possible that had I used hostnames the application could reach it still would have worked, but I think it's a best practice to make the connection string and the replica setup identical to remove possible places for error.
On the mongodb node where you run rs.initiate() you might need to update the hostname to a value that all boxes (other mongodb nodes and the application server) can reach. By default it will likely end up with a hostname like localhost, which means something different on each machine. This can be done from that box's mongo shell, like so.
Example:
// from mongo shell
conf = rs.conf()
conf.members[0].host = "mongo1:27017"
rs.reconfig(conf)
Final functioning connection string which successfully fails over between nodes, including throwing errors if a query is destined for a secondary but there are no secondaries:
var connection = mongoose.createConnection("mongodb://mongo1:27017/client_test,mongo2:27017/client_test,mongo3:27017/client_test", {
  replset: { rs_name: "rs0", poolSize: 5, socketOptions: { keepAlive: 1, connectTimeoutMS: 1000 } },
}, function (err) {
  if (err) { throw err; }
});
Working replica setup
{
  "_id" : "rs0",
  "version" : 4,
  "members" : [
    {
      "_id" : 0,
      "host" : "mongo1:27017"
    },
    {
      "_id" : 1,
      "host" : "mongo2:27017"
    },
    {
      "_id" : 2,
      "host" : "mongo3:27017",
      "arbiterOnly" : true
    }
  ]
}
I had an issue similar to yours while dealing with a replica set: in my case I had 1 primary node with a priority of 10, 1 secondary with a priority of 0 (for analytics) and an arbiter.
My writes would fail after reconnecting the primary instance, and I went through a lot trying to fix it. Here's the most important thing I learnt:
When my primary is down or unreachable, there has to be another member eligible to become primary (at least 2 members in the set must have a priority >= 1). If I have only arbiters, hidden members, or members with a priority of 0, queries get stuck even after I reconnect my primary: my client is unable to complete write queries. Read queries still work, but writes don't. A minimal config sketch is shown below.
This is what I faced with mongoose, even with keepAlive, autoReconnect and all the socket and connection timeout values set.
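A minimal mongo-shell sketch of such a configuration; the priority values are illustrative.
// Keep at least two data-bearing members eligible to become primary
// so failover can happen when the current primary goes down.
conf = rs.conf()
conf.members[0].priority = 10  // preferred primary
conf.members[1].priority = 1   // still eligible to take over (not priority 0)
// members[2] stays an arbiter or an analytics node with priority 0
rs.reconfig(conf)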
Hopefully this helps.

Resources