How to efficiently sync Apollo's cache using subscriptions and AWS AppSync - node.js

I'm using aws-appsync in a Node.js client to keep a cached list of data items. This cache must be available at all times, including when not connected to the internet.
When my Node app starts, it calls a query which returns the entire list of items from the AppSync data source. This is cached by Apollo's cache storage, which allows future queries (using the same GraphQL query) to be made using only the cache.
The app also makes a subscription to the mutations which are able to modify the list on other clients. When an item in the list is changed, the new data is sent to the app. This can trigger the original query for the entire list to be re-fetched, thus keeping the cache up to date.
Fetching the entire list when only one item has changed is not efficient. How can I keep the cache up to date, while minimising the amount of data that has to be fetched on each change?
The solution must provide a single point to access cached data. This can either be a GraphQL query or access to the cache store directly. However, using results from multiple queries is not an option.
The Apollo documentation hints that this should be possible:
In some cases, just using [automatic store updates] is not enough for your application ... to update correctly. For example, if you want to add something to a list of objects without refetching the entire list ... Apollo Client cannot update existing queries for you.
The alternatives it suggests are refetching (essentially what I described above) and using an update callback to manually update the cached query results in the store.
Using update gives you full control over the cache, allowing you to make changes to your data model in response to a mutation in any way you like. update is the recommended way of updating the cache after a mutation.
However, here it is referring to mutations made by the same client, rather than syncing between clients using subscriptions. The update callback option doesn't appear to be available to a subscription (which provides the updated item data) or a query (which could fetch the updated item data).

As long as your subscription includes the full resource that was added, it should be possible by reading from and writing to the cache directly. Let's assume we have a subscription like this one from the docs:
const COMMENTS_SUBSCRIPTION = gql`
  subscription onCommentAdded {
    commentAdded {
      id
      content
    }
  }
`;
The Subscription component includes an onSubscriptionData prop, so we should be able to do something along these lines:
<Subscription
  subscription={COMMENTS_SUBSCRIPTION}
  onSubscriptionData={({ client, subscriptionData: { data, error } }) => {
    if (!data) return
    const current = client.readQuery({ query: COMMENTS_QUERY })
    client.writeQuery({
      query: COMMENTS_QUERY,
      data: {
        comments: [...current.comments, data.commentAdded],
      },
    })
  }}
/>
Or, if you're using plain JavaScript instead of React:
const observable = client.subscribe({ query: COMMENTS_SUBSCRIPTION })
observable.subscribe({
  // The subscription result arrives as { data }, so destructure it here
  next: ({ data }) => {
    if (!data) return
    const current = client.readQuery({ query: COMMENTS_QUERY })
    client.writeQuery({
      query: COMMENTS_QUERY,
      data: {
        comments: [...current.comments, data.commentAdded],
      },
    })
  },
  complete: console.log,
  error: console.error
})
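In both variants, COMMENTS_QUERY is assumed to be the list query that originally populated the cache. A minimal sketch of its shape (the comments field name and selection set are assumptions, not taken from the original question):

// Hypothetical list query that the cache writes above are keyed against
const COMMENTS_QUERY = gql`
  query getComments {
    comments {
      id
      content
    }
  }
`;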

Related

Use an array of values to query Firestore and setup a snapshot listener

Here is my problem:
I have a firestore collection that has a number of documents. There are about 500 documents generated/updated every hour and saved to the collection.
I would like to query the collection and setup a real-time snapshot listener for a subset of document IDs, that are provided by the client.
I think maybe I could do something like this (this syntax is likely not correct... just trying to get a feel for whether it's even possible... but isn't the "in" operator limited to an array of 10 items?):
const subbedDocs = ["doc1","doc2","doc3","doc4","doc5"]
docsRef.where('docID', 'in', subbedDocs).onSnapshot((doc) => {
handleSnapshot(doc);
});
I'm sorry, that code probably doesn't make sense... I'm still trying to learn all the ins and outs of Firestore.
Essentially, what I am trying to do is take an array of IDs and set up an .onSnapshot listener for those IDs. This list of IDs could be upwards of 40-50 items. Is this even possible? I am trying to avoid setting up a listener on the whole collection and filtering out things I am not "subscribed" to, as that seems wasteful from a resources perspective.
If you have the doc IDs in your array (it looks like you do), you can loop over them and start a listener for each one:
const subbedDocs = ["doc1", "doc2", "doc3", "doc4", "doc5"];
for (let i = 0; i < subbedDocs.length; i++) {
const docID = subbedDocs[i];
docsRef.doc(docID).onSnapshot((doc) => {
handleSnapshot(doc);
});
}
It would be better to listen to a single query covering all the filtered docs at once, but if you want to listen to each of them with an explicit listener, this will do the trick.
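One practical note (my addition, not part of the original answer): each onSnapshot call returns an unsubscribe function, so with 40-50 listeners it is worth keeping those around so the listeners can be detached later:

// Keep the unsubscribe functions so the listeners can be detached later
const unsubscribers = subbedDocs.map((docID) =>
  docsRef.doc(docID).onSnapshot((doc) => handleSnapshot(doc))
);
// Later, when the client no longer follows these documents:
// unsubscribers.forEach((unsub) => unsub());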
As you've discovered, Firestore's in operator only allows up to 10 entries in the array. I'm also guessing you've added docID as a field in the document, since I don't believe docID references the actual document ID.
I would not take this approach, because of the 10-entry limitation. Instead, as the client selects documents to follow, add the client's unique ID to an array field (named something like "ListenerArray") on each of those documents. Your query then avoids the limitation entirely, and you can allow as many client listeners as Firestore's implementation limits permit. The query would look more like:
docsRef.where('ListenerArray', 'array-contains', clientID).onSnapshot((doc) => {
  handleSnapshot(doc);
})
array-contains checks a single value against all entries in a document's array field, without the 10-entry limit, and every client can mark any number of documents to subscribe to.
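A minimal sketch of the client-side "subscribe" step this answer assumes (the field and variable names are hypothetical), using arrayUnion to add the client's ID to a document:

// Mark a document as followed by this client so the array-contains
// listener above will start receiving its snapshots.
const clientID = "client-abc"; // assumed: a stable per-client identifier
docsRef.doc("doc1").update({
  ListenerArray: firebase.firestore.FieldValue.arrayUnion(clientID)
});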

Nodejs, MongoDB concurrent request creates duplicate record

Let me keep it really simple. I am running a Node.js server. When I receive data from a PATCH request, I first need to check the database: if the row exists I update it, otherwise I create the record. Here is my code, which I call in the request handler callback.
let dbCollection = db$master.collection('users');
createUser(req.body).then(user => {
  dbCollection.updateMany(
    { rmisId: req.body.rmisId },
    { $set: { ...user } },
    { upsert: true }
  ).then((result) => {
    log.debug("RESULTS", user);
    return result;
  })
  .catch((err) => {
    log.debug(err);
    return err;
  });
})
This works fine for sequential requests, but it creates duplicate records when I receive 10 concurrent requests. I am running on my local machine and reproducing the concurrent requests using Apache JMeter. Please help me if you have experienced this kind of problem.
Thank you!
UPDATE
I have tested another approach that first reads the database with dbCollection.find({ rmisId: req.body.rmisId }) to determine whether the record exists, but it makes no difference.
You cannot check-and-then-update. MongoDB operations are atomic at the document level. After you check and see that the record does not exist, another request may create the document you just checked for, and then you recreate the same record if you don't have unique indexes or if you're generating the IDs yourself.
Instead, use the upsert you're already doing, but without the separate create step. It looks like you're getting the ID from the request, so simply search on that ID and upsert the user record. That way, if some other thread inserts it before you do, you'll update what the previous thread inserted. If that isn't the behaviour you want, add a unique index on that user ID field.
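A minimal sketch of that suggestion, assuming the db$master, log, and req objects from the question and building the document straight from req.body (the createUser helper is omitted here); updateOne is used instead of updateMany since rmisId identifies a single user:

const dbCollection = db$master.collection('users');

// One-time setup: a unique index on rmisId ensures two concurrent upserts
// cannot both insert a new document for the same rmisId.
dbCollection.createIndex({ rmisId: 1 }, { unique: true });

// In the PATCH handler: a single atomic upsert, with no separate create step.
dbCollection.updateOne(
  { rmisId: req.body.rmisId },
  { $set: { ...req.body } },
  { upsert: true }
)
  .then((result) => log.debug("RESULT", result))
  .catch((err) => log.debug(err));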

Which HTTP Method to Choose When Building Restful API

I am new to Node.js and have built my first Node.js RESTful API using the hapi.js framework. All the services basically do is run database queries. An example service looks like this:
let myservice = {
  method: "POST",
  path: "/updateRule",
  config: {
    handler: (request, reply) => {
      updateRule(request.payload)
        .then((result) => {
          reply(successResponse(request, result));
        })
        .catch((err) => reply(failResponse(request, err)).code(500));
    },
    validate: {
      payload: {
        ruleId: joi.number().required(),
        ruleName: joi.string().required(),
        ruleDesc: joi.string().required()
      }
    },
    auth: "jwt",
    tags: ["api", "a3i"]
  },
}
updateRule(input): Promise<any> {
  return new Promise((resolve, reject) => {
    let query = `select a3i.update_rule(p_rul_id := ${input.ruleId}, p_rul_name := '${input.ruleName}', p_rul_desc := '${input.ruleDesc}')`;
    postgresQuery(lbPostgres, query, (data, commit, rollback) => {
      try {
        let count = data.rows[0].update_rule.count;
        if (count === 1) {
          let ruleId = data.rows[0].update_rule.result[0];
          let payload: SuccessPayload = {
            type: "string",
            content: `Rule ${ruleId} has been updated`
          };
          commit();
          resolve(payload);
        } else {
          let thisErr = new Error("No rule can be found.");
          thisErr.name = "4003";
          throw thisErr;
        }
      }
      catch (err) {
        rollback();
        if (err.name === "4003") {
          reject(detailError(4003, err.message));
        } else {
          reject(detailError(4001, err.message));
        }
      }
    }, reject);
  });
}
As you can see, when the service is called, it issues a database query and updates the specified row in a database table. Similarly, I have other services named createRule/deleteRule that create/delete records in the table. In my opinion, the only difference between the services is the database query they run. I read the post PUT vs. POST in REST but couldn't see any difference between POST and PUT in my case.
Here are my questions:
What HTTP method should I used in this case?
Most RESTful API examples (for example https://www.codementor.io/olatundegaruba/nodejs-restful-apis-in-10-minutes-q0sgsfhbd) use the same URL with different HTTP methods to do different operations on the same "resource", which in my opinion is usually a database table. What's the benefit of this architecture compared with my practice, in which one URL has only one HTTP method and does only one type of operation?
I know this question does not refer to a specific problem. Some people may give it a down-vote. But as a beginner I really want to know what a typical RESTful API looks like and to make sure my API follows best practice. Please help!
If the resource already exists and thus you have a specific URI to that exact resource and you want to update it, then use PUT.
If the resource does not exist yet, you want to create it, and you will let the server pick the URI that represents the new resource, then use POST. The POST URI will be a generic "create new resource" URI, not a URI to a specific resource, and the server will create the URI that represents the new resource.
You can also use PUT to create a new resource if the caller is going to create the resource URI that represents the new resource. In that case, you would just PUT to that new resource and, if a resource with that URI already exists, it would be updated, if not, it would be created.
You do not have to support both. You can decide to make your api work in a way that you just use one or the other.
In your specific case, an update of a specific row in your database that already exists would pretty much always be a PUT because it already exists so you're doing a PUT to a specific URI that represents that row.
What's the benefit of this architecture compared with my practice in which one URL only has one HTTP method and only do one type of operation?
It's really up to you how you want to present your API. The general concept behind REST is that you have several components:
resource identifier
data
method
In some cases, the method can be subsumed by GET, PUT, POST or DELETE so you just need the resource identifier, data and GET, PUT, POST or DELETE.
In other cases or other designs, the method is more detailed than can be expressed with just PUT or POST, so you put the method in the URL itself, in which case you may not need the distinction between PUT and POST as much.
For example, an action might be "buy". While you could capture that in a POST where the method is implied by the rest of the URL, you may want to POST to a URL that actually contains a method, such as /buy, for clarity, and then reuse that same endpoint prefix with other methods such as /addToCart, etc. It really depends on what the objects in your REST design are and what operations you want to surface on them. Sometimes the objects lend themselves to just GET, PUT, POST and DELETE, and sometimes you want more information in the URL about the specific operation to be carried out on that resource.
If you want to be REST compliant, you can just use POST and GET.
If you want to be RESTful, base your methods on the CRUD operations:
Create -> POST
Read -> GET
Update -> PUT or PATCH
Delete -> DELETE
When building a full API, using different methods on the same URL can be easier to build and understand: every query about your users lives at the user URL rather than at user/get, user/add, user/update, and so on. You get the same functionality without a proliferation of URLs.
When you build an API, you will also want logs for stats analysis and other purposes. If you split by method, you can simply filter the logs to count, say, POST requests versus GET requests.
In fact, you could build an API using only GET requests, but splitting by method and URL is the best way to avoid complex URLs (or URLs stuffed with action names) and the easiest way to log every request going through your API.
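As a rough sketch of that layout (hypothetical route definitions in the hapi style used in the question; getRules, createRule, updateRule and deleteRule are assumed handlers), the same /rules URL can expose all the CRUD operations through different methods:

// One resource URL, differentiated by HTTP method (handler names are assumptions)
const routes = [
  { method: "GET",    path: "/rules",          handler: (request, reply) => reply(getRules()) },
  { method: "POST",   path: "/rules",          handler: (request, reply) => reply(createRule(request.payload)) },
  { method: "PUT",    path: "/rules/{ruleId}", handler: (request, reply) => reply(updateRule(request.params.ruleId, request.payload)) },
  { method: "DELETE", path: "/rules/{ruleId}", handler: (request, reply) => reply(deleteRule(request.params.ruleId)) }
];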
Level 1 is REST
Level 2 is RESTful
Level 3 is HATEOAS
You can find more information in books and articles written by Martin Fowler.
What I usually do is use "POST" for creating a new resource, and use "PUT" for updating an already existing resource.
For your second question: yes, most APIs use the same URL to do different things on the same resource. That can be for security reasons, where you don't want to expose what you are doing in your URLs (/delete, for example). Also, many frameworks automatically generate a URL for a resource (object class) and then differentiate by request method, so people don't tend to use custom URLs for those operations.

User Segmentation Engine using MongoDB

I have an analytics system that tracks customers and their attributes, as well as their behavior in the form of events. It is implemented using Node.js and MongoDB (with Mongoose).
Now I need to implement a segmentation feature that allows grouping stored users into segments based on certain conditions, for example something like purchases > 3 AND country = 'Netherlands'.
An important requirement here is that the segments get updated in realtime, not just periodically. This basically means that every time a user's attributes change or he triggers a new event, I have to check again which segments he belongs to.
My current approach is to store the conditions for the segments as MongoDB queries, that I can then execute on the user collection in order to determine which users belong to a certain segment.
For example a segment to filter out all users that are using Gmail would look like this:
{
  _id: '591638bf833f8c843e4fef24',
  name: 'Gmail Users',
  condition: { 'email': { $regex: '.*gmail.*' } }
}
When a user matches the condition I would then store that he belongs to the 'Gmail Users' segment directly on the user's document:
{
  username: 'john.doe',
  email: 'john.doe@gmail.com',
  segments: ['591638bf833f8c843e4fef24']
}
However, by doing this I would have to execute the queries for all segments every time a user's data changes, just to check whether he is part of each segment or not. That feels complicated and cumbersome from a performance point of view.
Can you think of any alternative way to approach this? Maybe use a rule-engine and do the processing in the application and not on the database?
Unfortunately I don't know a better approach but you can optimize this solution a little bit.
I would do the same:
Store the segment conditions in a collection
Once you find a matching user, store the segment id in the user's document (segments)
An important requirement here is that the segments get updated in realtime and not just periodically.
You have no choice; you need to run the segmentation query every time a segment changes.
I would have to execute all queries for all segments every time a user's data changes
This is where I would change your solution, actually just optimise it a little bit:
You don't need to run the segmentation queries against the whole collection. If you put the user's id into the query with an $and, MongoDB will fetch the user first and only then check the rest of the segmentation conditions. Make sure MongoDB uses the user's _id index; you can use .explain() to verify this or .hint() to force it. Unfortunately you still need to run N+1 queries if you have N segments (the +1 is for the user update). A sketch of this query appears after this list.
I would fetch every segment and store them in a cache (Redis). If a segment changed, I would update the cache as well (or just invalidate it and let the next query handle the rest, depending on the implementation). The point is that I would have every segment available without hitting the database, and when a user record is updated I would go through every segment in Node.js, validate the user against the conditions, and set the user's segments array in the same update query, so no extra database operation is required.
I know it could be a pain in the ass implementing something like this but it doesn't overload the database ...
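Here is a minimal sketch of the first option above (collection and field names are taken from the question; the Gmail condition is the example segment, and the snippet assumes the mongo shell or a driver with an equivalent API):

// Scope the segment condition to one user so the query resolves through the
// _id index instead of scanning the whole users collection.
db.users.find({
  $and: [
    { _id: ObjectId(userId) },            // index lookup first
    { email: { $regex: '.*gmail.*' } }    // then the segment's condition
  ]
});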
Update
Let me give you some technical details about my second suggestion:
(This is just pseudocode!)
Segment cache
module.exports = function() {
  return new Promise(function(resolve) {
    Redis.get('cache:segments', function(err, segments) {
      // handle error
      // Segments are cached
      if (segments) {
        segments = JSON.parse(segments);
        return resolve(segments);
      }
      // Fetch segments from the database and save them to the cache
      Segments.find().exec(function(err, segments) {
        // handle error
        // Save to the cache with a 60-second expiration
        Redis.set('cache:segments', JSON.stringify(segments), 'EX', 60, function(err) {
          // handle error
          return resolve(segments);
        });
      });
    });
  });
};
User update
// ...
let user = yield User.findOne({ _id: ObjectId(req.body.userId) });
// etc ...
// Fetch segments from the cache or from the database
let segments = yield segmentCache();
let userSegments = [];
segments.forEach(function(segment) {
  if (checkSegment(user, segment)) {
    userSegments.push(segment._id);
  }
});
// Override the user's segments with userSegments
This is where the magic happens: somehow you need to define the conditions in a way that lets you evaluate them in an if statement.
Hint: Lodash has functions for this: _.gt, _.gte, _.eq ...
Check segments
module.exports = function(user, segment) {
  let keys = Object.keys(segment.condition);
  // Every field in the condition must match the user's value
  return keys.every(function(key) {
    return user[key] === segment.condition[key];
  });
};
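The strict-equality check above only covers conditions like country = 'Netherlands'. A hypothetical extension (my addition, following the Lodash hint) that also understands a few MongoDB-style operators, such as the $gt in purchases > 3 or the $regex in the Gmail segment, might look like this:

const _ = require('lodash');

// Evaluate one field's condition against a user's value.
function matchesCondition(value, condition) {
  // A plain value means equality, e.g. { country: 'Netherlands' }
  if (typeof condition !== 'object' || condition === null) {
    return _.isEqual(value, condition);
  }
  // An operator object, e.g. { $gt: 3 } or { $regex: '.*gmail.*' }
  return Object.keys(condition).every(function(op) {
    switch (op) {
      case '$gt':    return _.gt(value, condition[op]);
      case '$gte':   return _.gte(value, condition[op]);
      case '$lt':    return _.lt(value, condition[op]);
      case '$lte':   return _.lte(value, condition[op]);
      case '$regex': return new RegExp(condition[op]).test(value);
      default:       return false; // unsupported operator
    }
  });
}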
You are already storing an entire segment "query" in a document in the segments collection, so why not include a field in the same document that enumerates which fields of the users document affect membership in that segment?
Since the code that changes user data knows which fields are being modified, it can fetch only the segments that are computed from those fields, significantly reducing the number of segmentation "queries" you have to re-run.
Note that a change in a user's data may add them to a segment they are not currently a member of, so checking only the segments currently stored on the user is not sufficient.
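A minimal sketch of that idea (the fields key is an assumed name, not from the answer), extending the segment document from the question:

// Segment document that also records which user fields its condition reads
{
  _id: '591638bf833f8c843e4fef24',
  name: 'Gmail Users',
  condition: { 'email': { $regex: '.*gmail.*' } },
  fields: ['email']
}

// When an update only touches `email`, fetch just the segments affected by it:
// db.segments.find({ fields: { $in: ['email'] } })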

Can I restrict Easy Search to only return published data?

I'm using matteodem's Easy Search package and have just discovered that instead of returning only published documents, the searches have access to the entire collection.
I've tested this by setting my publish function to return an empty array, and then checking that MyCollection.find().fetch() in the console correctly returns []. But searching MyCollection with Easy Search still returns all matching documents in the collection.
Is there any way to ensure that Easy Search only passes permitted data up to the client? I can't find anything in the documentation.
Easy Search runs the search on the server, where it has universal access. According to the docs, you can set up a default selector to filter the search by some criteria. In your case you can just copy the selector from your normal publication (the first parameter of your publication's find()) and set that as the default selector for Easy Search.
let index = new EasySearch.Index({
  collection: someCollection,
  fields: ['name'],
  engine: new EasySearch.Minimongo({
    sort: () => ['score'], // sort by score
    selector: function (searchObject, options, aggregation) {
      // selector contains the default mongo selector that Easy Search would use
      let selector = this.defaultConfiguration().selector(searchObject, options, aggregation);
      // modify the selector to only match documents created by the current user
      selector.createdBy = this.userId || options.search.userId; // to run on both client and server
      return selector;
    }
  })
});
From matteodem, the developer of Easy Search, I have a simple answer to my original question of how to ensure that only published data are returned by a search.
The easiest way is to use a client-side search engine such as Minimongo. This searches the data delivered by your publications, so it sees exactly the data that you have published, no more, no less. No selector is needed.
Example code:
postsIndex = new EasySearch.Index({
  collection: Posts,
  fields: ['name', 'tags', 'created_by_username'],
  defaultSearchOptions: {
    limit: 6
  },
  engine: new EasySearch.Minimongo() // search only on the client, so only published documents are returned
});
Alternatively, you could use a server-side search engine such as MongoDB. In that case you should add a selector to control what data are returned.
Unfortunately I haven't been able to get selectors working with the MongoDB engine. You can't use Meteor.userId() because the code runs on both server and client, and when I try the recipe for this in the documentation, I see an error on the server. Using Michel Floyd's construction, options.search.userId exists but is null. If I find out how to do it, I'll update this answer for completeness.
