Is DynamoDB's UpdateItem thread-safe?

If I have a table that looks like the following:
Books {
author: StringAttribute,
title: StringAttribute,
rating: NumberAttribute
}
and I issue two UpdateItem requests concurrently to the same item, one that updates the rating and another that updates the title, is it possible to hit a race condition in which they both succeed but one overwrites the other? Do I need to introduce my own locking in my code?
I've been looking through AWS's docs for something that addresses this scenario but so far have not been able to find anything relevant.
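To make the scenario concrete, here is a minimal Node.js sketch, assuming the aws-sdk v2 DocumentClient and a made-up bookId partition key (the question doesn't state the table's key schema):
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// hypothetical partition key; the real key schema isn't stated above
const key = { bookId: 'book-1' };

// request 1: the update expression names only the rating attribute
const updateRating = docClient.update({
  TableName: 'Books',
  Key: key,
  UpdateExpression: 'SET rating = :r',
  ExpressionAttributeValues: { ':r': 5 },
}).promise();

// request 2: the update expression names only the title attribute, issued concurrently
const updateTitle = docClient.update({
  TableName: 'Books',
  Key: key,
  UpdateExpression: 'SET title = :t',
  ExpressionAttributeValues: { ':t': 'New Title' },
}).promise();

Promise.all([updateRating, updateTitle]).then(() => console.log('both updates returned'));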


Idempotency of cronjobs - "workflow" tables in database

I'm currently working on a backend system and am faced with porting cronjob functionality from a legacy system to the new backend. A bunch of these jobs are not idempotent, and I want to make them idempotent when porting them.
As I understand it, for a job to be idempotent, its state (whether it has been completed, or possibly whether it is currently being performed) should somehow be represented in the database / entity model. That way, a task can always conditionally opt out of running if the data shows that it's already done / being handled.
I can imagine simple scenarios where you can just add an extra field (column) to entities (tables) for tasks specifically related to that entity, for example
entity Reservation {
id
user_id
...
reminder_sent(_at) <- whether the "reminder" task has been performed yet
}
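To make that conditional opt-out concrete, here is a minimal knex-style sketch against the reservation example above (using the timestamp variant of the column; the db handle and the sendReminder helper are assumptions):
// The update only succeeds for rows where the reminder has not been sent yet,
// so re-running the job is a no-op for reservations that were already handled.
async function sendReminderIfNeeded(db, reservationId, sendReminder) {
  const updated = await db('reservations')
    .where({ id: reservationId })
    .whereNull('reminder_sent_at')
    .update({ reminder_sent_at: new Date() });

  if (updated === 1) {
    // this run "claimed" the reservation, so perform the side effect
    // (claiming first gives at-most-once delivery; adjust ordering to taste)
    await sendReminder(reservationId);
  }
  // updated === 0 means another run already handled this reservation
}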
But more generally, I feel like tasks often involve a bunch of different entities, and it would "pollute" the entities if they needed to "know about" the tasks that operate on them. Also, in more complex cases the "state" of a job can be more complicated than just "done or not done yet". Here are some examples from our business:
If a user has more than a certain amount in total unpaid invoices, we send three consecutive email reminders at certain intervals until it is resolved, and if not, we end up sending the data to an external party for collection. If the user pays the invoices in question but then acquires new ones, the workflow should restart instead of continue.
Once every month, certain users get rewarded with vouchers. The description of the voucher will note the details, e.g. "Campaign bla bla, Jul 2022", but that's all we have "in" the voucher's data to tie it to this job.
I feel like there must be a well-known engineering concept here, but I can't seem to find the right resources on the internet. For the time being, I'm calling these things "workflows", and I think it makes sense for them to have their own entity/table, e.g.
entity Workflow_UnpaidInvoicesReminder {
id
# "key" by which the job is idempotent
user_id
invoice_id / invoice_ids
# workflow status/result fields
created_at
paid_at
first_reminder_sent_at
second_reminder_sent_at
third_reminder_sent_at
sent_externally_at
}
entity Workflow_CampaignVouchers {
id
# "key" by which the job is idempotent
user_id
campaign_key
# workflow status/result fields
created_at
voucher_id
}
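And a sketch of how the monthly voucher job could use the idempotency key of the second workflow entity, assuming a unique constraint on (user_id, campaign_key), a knex-style db handle and a hypothetical createVoucher helper:
// One row per (user_id, campaign_key); the unique constraint is the idempotency guard.
async function runCampaignVoucherJob(db, userId, campaignKey, createVoucher) {
  try {
    await db('workflow_campaign_vouchers').insert({
      user_id: userId,
      campaign_key: campaignKey,
      created_at: new Date(),
    });
  } catch (err) {
    // unique-constraint violation: this user/campaign was already handled
    // (real code should inspect the error code instead of catching everything)
    return;
  }
  const voucherId = await createVoucher(userId, campaignKey);
  await db('workflow_campaign_vouchers')
    .where({ user_id: userId, campaign_key: campaignKey })
    .update({ voucher_id: voucherId });
}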
Can someone help me find the appropriate terms and resources to describe the above? I can't seem to find relevant information online about the general idea of "workflows" like these.

MongoDB: is replacing an array with a new version more efficient than adding elements to it?

I have a single /update-user endpoint on my server that triggers an updateUser query on mongo.
Basically, I retrieve the user id from the cookie and inject the received form, which can contain any key allowed in the User model, into the Mongo query.
It looks like:
const form = {
  friends: [
    { id: "1", name: "paul", thumbnail: "www.imglink.com" },
    { id: "2", name: "joe", thumbnail: "www.imglink2.com" },
  ],
  locale: "en",
  age: 77
}
function updateUser(form, _id) {
  // $set takes the fields to overwrite as a plain object
  return UserDB.findOneAndUpdate({ _id }, { $set: form })
}
So each time, I erase the necessary data and replace it with brand-new data. Sometimes this data can be an array of 50 objects (let's say I've removed two people from a 36-friend array like the one described above).
It is very convenient, because I can abstract all the logic both in the front end and the back end with a single update function. However, is this pure heresy from a performance point of view? Should I rather use 10 endpoints to update each part of the form?
The form is dynamic; I never know what is going to be inside, except that it belongs to the User model, which is why I've used this strategy.
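For comparison, the "adding/removing elements" approach from the title would use targeted array operators instead of replacing the whole friends array; a rough sketch (helper names are made up, field names follow the form above):
// Remove a single friend by id instead of rewriting the whole array
function removeFriend(_id, friendId) {
  return UserDB.updateOne({ _id }, { $pull: { friends: { id: friendId } } });
}

// Add a single friend without resending the existing ones
function addFriend(_id, friend) {
  return UserDB.updateOne({ _id }, { $push: { friends: friend } });
}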
From MongoDB's point of view, it doesn't matter much. MongoDB is a journalled database (particularly with the WiredTiger storage engine), and it probably (re)writes a large part of the document on update. It might make a minor difference under very heavy loads when replicating the oplog between primary and replicas, but if you have performance constraints like these, you'll know. If in doubt, benchmark and monitor your production system - don't over-optimize.
Focus on what's best for the business domain. Is your application collaborative? Do multiple users edit the same documents at the same time? What happens when they overwrite one another's changes? Are the JSON payloads that the client sends to the back-end large enough to clog up the network? These are the most important questions to ask; performance should only be optimized once you have the UX, the interaction model and the concurrency issues nailed.

Repository methods for querying children of an aggregate root

I have an Order aggregate root class containing child value objects:
class Order {
val id: String
val lines: Seq[OrderLine]
val destination: Destination
//...omit other fields
}
This is a CQRS read model, represented by an order-search microservice responsible for searching orders by some filter.
There is an OrderApplicationService that uses OrderRepository (I am not sure it is a pure repository in DDD terms):
trait OrderRepository {
def search(filter:OrderFilter):Seq[Order]
def findById(orderId:String):Order
}
and ElasticSearchOrderRepository which uses ES as search engine.
Due to new requirements, I need a new API method for the UI that will search for all destinations across orders by some filter. It should be a /destinations endpoint that calls the repository to find the data. Performance is important in this case, so searching for all orders and then mapping them to destinations doesn't seem like a good solution.
What is the most appropriate option to solve this?
Add a new method to OrderRepository, e.g. def searchOrderDestinations(filter: DestinationFilter): Seq[Destination]
Create a new repository:
trait OrderDestinationRepository {
def searchOrderDestinations(filter:DestinationFilter): Seq[Destination]
}
The same question applies to the application service - do I need to create a new DestinationAppService?
Are these options applicable? Or maybe there is some better solution?
Thanks in advance!
This is a CQRS read model
Perfect - create and update a list of your orders indexed by destination, and use that to serve the query results.
Think "relational database that includes the data you need to create the view, and an index". Queries go to the database, which acts as a cache for the information. A background process (async) runs to update the information in database.
How often you run that process will depend on how stale the data in the view can be. How bad is it for the business if the view shows results as of 10 minutes ago? as of 1 minute ago? as of an hour ago?
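Sketched in JavaScript for brevity (the order store, the destination index and the scheduling are placeholders; in this system it would be Scala code writing to a dedicated Elasticsearch index), the background refresh could look roughly like this:
// Rebuild a destination-keyed read model from the orders so that the
// /destinations endpoint never has to scan and map whole orders.
async function refreshDestinationView(orderStore, destinationIndex) {
  const since = await destinationIndex.lastRefreshedAt();
  const orders = await orderStore.findUpdatedSince(since);
  for (const order of orders) {
    // one small document per destination, queryable by the destination filter
    await destinationIndex.upsert({
      destination: order.destination,
      orderId: order.id,
    });
  }
  await destinationIndex.markRefreshed(new Date());
}

// e.g. run every minute; the interval is the acceptable staleness of the view
// setInterval(() => refreshDestinationView(orderStore, destinationIndex), 60 * 1000);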

GraphQL Dataloader vs Mongoose Populate

To perform a join-like operation, we can use either GraphQL (with DataLoader) or Mongoose.
Before asking my question, I would like to give the following Task/Activities example (none of this code is tested; it is given just for the example's sake):
Task {
_id,
title,
description,
activities: [{ //Of Activity Type
_id,
title
}]
}
In Mongoose, we can retrieve the activities related to a task with the populate method, with something like this:
const task = await TaskModel.findById(taskId).populate('activities');
Using GraphQL and DataLoader, we can achieve the same result with something like:
const DataLoader = require('dataloader');
// DataLoader calls this with an array of task ids and expects an array of
// results in the same order
const getActivitiesByTask = async (taskIds) => {
  const activities = await ActivityModel.find({ task: { $in: taskIds } });
  return taskIds.map((id) => activities.filter((a) => String(a.task) === String(id)));
};
const dataloaders = () => ({
  activitiesByTask: new DataLoader(getActivitiesByTask),
});
// ...
// SET The dataloader in the context
// ...
//------------------------------------------
// In another file
const resolvers = {
  Query: {
    Task: (_, { id }) => TaskModel.findById(id),
  },
  Task: {
    activities: (task, _, context) => context.dataloaders.activitiesByTask.load(task._id),
  },
};
I tried to find an article demonstrating which way is better regarding performance, resource exhaustion, etc., but I failed to find any comparison of the two methods.
Any insight would be helpful, thanks!
It's important to note that dataloaders are not just an interface for your data models. While dataloaders are touted as a "simplified and consistent API over various remote data sources" -- their main benefit when coupled with GraphQL comes from being able to implement caching and batching within the context of a single request. This sort of functionality is important in APIs that deal with potentially redundant data (think about querying users and each user's friends -- there's a huge chance of refetching the same user multiple times).
On the other hand, mongoose's populate method is really just a way of aggregating multiple MongoDB requests. In that sense, comparing the two is like comparing apples and oranges.
A fairer comparison might be using populate as illustrated in your question as opposed to adding a resolver for activities along the lines of:
activities: (task, _, context) => Activity.find().where('_id').in(task.activities)
Either way, the question comes down to whether you load all the data in the parent resolver, or let the resolvers further down do some of the work. Because resolvers are only called for fields that are included in the request, there is a potentially major performance difference between these two approaches.
If the activities field is requested, both approaches will make the same number of roundtrips between the server and the database -- the difference in performance will probably be marginal. However, your request might not include the activities field at all. In that case, the activities resolver will never be called and we can save one or more database requests by creating a separate activities resolver and doing the work there.
On a related note...
From what I understand, aggregating queries in MongoDB using something like $lookup is generally less performant than just using populate (some conversation on that point can be found here). In the context of relational databases, however, there are additional considerations to ponder with the above approaches. That's because your initial fetch in the parent resolver could be done using joins, which will generally be much faster than making separate db requests. That means that, at the expense of making the no-activities-field queries slower, you can make the other queries significantly faster.
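For reference, the $lookup variant mentioned above could look roughly like this for the Task/Activity example (a sketch; the 'activities' collection name and the task field on Activity are assumptions):
const mongoose = require('mongoose');

// Fetch a task and its activities in one round trip using $lookup
async function getTaskWithActivities(taskId) {
  const [task] = await TaskModel.aggregate([
    // aggregate() does not auto-cast ids, so convert explicitly
    { $match: { _id: new mongoose.Types.ObjectId(taskId) } },
    {
      $lookup: {
        from: 'activities',      // assumed collection name behind ActivityModel
        localField: '_id',
        foreignField: 'task',
        as: 'activities',
      },
    },
  ]);
  return task;
}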

How to abort a particular task in a Bull queue?

Using this package, https://github.com/OptimalBits/bull, is it possible to abort a certain task while it is still in the "waiting queue"?
My use-case is as follows:
I have a MongoDB collection "users" and a collection "friendship" where I store the name and avatar of both users, so I only need one query to get the friend list of a certain user. When a user changes their avatar, I have to update all documents referencing this user in the "friendship" collection. This operation is not performance-critical, since I want it to run in the background and consistency is not important for this use case. But when a user updates their avatar several times in a short time span, I want to cancel all older queued tasks (for updating the friendship collection) except the newest. Is this possible with Bull?
Thanks in advance; I would appreciate any information about this.
Looking at the Bull reference, you will find that there is a Job.remove() method. Since you haven't posted any code, I can only guess what it looks like, so I have described what you could do.
What you have to do is store the Promise<Job> returned by Queue.add(), for instance in a Map<string, Map<string, Promise<Job>>>. The outer key would be the _id of your user, and the inner Map holds all the queued jobs for that user, keyed by JobId. Once a job has completed (you can await it with Job.finished()), you need to remove its entry from the Map.
Whenever a user changes their avatar, you can then look into your Map to see whether you need to remove any jobs. Since the value in the outer Map is another Map keyed by JobId, you can easily remove jobs by JobId. That may sound a bit complex, but don't be afraid - if you understand how Maps work, it shouldn't be a problem :-).
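A minimal sketch of that bookkeeping, using only Bull's documented Queue.add(), Job.remove() and Job.finished() (the queue name, job payload and in-memory Map are assumptions):
const Queue = require('bull');
const avatarQueue = new Queue('friendship-avatar-updates'); // assumed queue name

// userId -> Map<jobId, job> of still-pending avatar update jobs
const pendingJobsByUser = new Map();

async function scheduleAvatarUpdate(userId, newAvatarUrl) {
  // drop older jobs for this user that are still waiting
  const pending = pendingJobsByUser.get(userId) || new Map();
  for (const [jobId, job] of pending) {
    await job.remove().catch(() => {}); // ignore jobs that already started or finished
    pending.delete(jobId);
  }

  const job = await avatarQueue.add({ userId, newAvatarUrl });
  pending.set(job.id, job);
  pendingJobsByUser.set(userId, pending);

  // clean up the bookkeeping once the job has run
  job.finished().finally(() => pending.delete(job.id));
}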
