MongoDB Best way to pair and delete sequential database entries - node.js

Okay so let's say I'm making a game of blind war!
Users A & B have x amount of soldiers
There are currently 0 DB docs.
User A sends 50 soldiers making a DB doc
User B sends 62 soldiers after user A!
This creates a new DB doc.
I need the most effective/scalable way to look up user A's doc, compare it to user B's doc, then delete both docs! (After returning the result, of course)
Here's the problem! I could potentially have 10,000+ users sending soldiers at relatively the same time! How can I successfully complete the above process without overlapping?
I'm using the MEAN stack for development, so I'm not limited to doing this in the database, but obviously the web app has to be 100% secure!
If you need any additional info or explanation please let me know and I'll update this question
-Thanks

One thing that comes to mind here is you may not need to do all the work that you think you need to, and your problem can probably be solved with a little help from TTL Indexes and possibly capped collections. Consider the following entries:
{ "_id" : ObjectId("531cf5f3ba53b9dd07756bb7"), "user" : "A", "units" : 50 }
{ "_id" : ObjectId("531cf622ba53b9dd07756bb9"), "user" : "B", "units" : 62 }
So there are two entries, and you got each _id value back when you inserted. At the start, "A" had no one to play against, but the entry for "B" will play against the one before it.
ObjectIds are monotonic, which means that the "next" one along is always greater in value than the last. So with the inserted data, just do this:
db.moves.find({
    _id: { $lt: ObjectId("531cf622ba53b9dd07756bb9") },
    user: { $ne: "B" }
}).limit(1)
That gives the "move" inserted immediately before the current move that was just made, because anything inserted previously will have an _id with a lower value than the current item. You also make sure that you are not "playing" against the user's own move, and of course you limit the result to one document only.
So the "moves" will be forever moving forward, When the next insert is made by user "C" they get the "move" from user "B", and then user "A" would get the "move" from user "C", and so on.
All that "could" happen here is that "B" make the next "move" in sequence, and you would pick up the same document as in the last request. But that is a point for your "session" design, to store the last "result" and make sure that you didn't get the same thing back, and as such, deal with that however you want to in your design.
That should be enough to "play" with. But let's get to your "deletion" part.
Naturally you "think" you want to delete things, but back to my initial "helpers" this should not be necessary. From above, deletion becomes only a factor of "cleaning-up", so your collection does not grow to massive proportions.
If you applied a TTL index, in much the same way as this tutorial explains, your collection entries would be cleaned up for you and removed after a certain period of time.
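As a rough sketch of that clean-up, assuming each "move" document also carries a createdAt date field (TTL indexes work on date fields, not on the ObjectId _id), with the one-hour lifetime picked purely as an example:

// Store an explicit creation date with each move:
db.moves.insert({ user: "C", units: 45, createdAt: new Date() })

// Let the server expire moves an hour after they were created:
db.moves.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })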
Also, what can be done here, especially considering that we are using the increasing nature of the _id key and that this is more or less a "queue" in nature, is to apply this as a capped collection. So you can set a maximum size for how many "moves" you will keep at any given time.
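For instance, the "moves" collection could be created as a capped collection up front; the size and max values below are placeholders, not recommendations:

db.createCollection("moves", { capped: true, size: 1048576, max: 10000 })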
Combining the two together, you get something that only "grows" to a certain size, and will be automatically cleaned for you, should activity slow down a bit. And that's going to keep all of the operations fast.
Bottom line is that the concurrency of "deletes" that you were worried about has been removed, by actually "removing" the need to delete the documents that were just played. The query keeps it simple, and the TTL index and capped collection look after your data management for you.
So there you have what is my take on a very concurrent game of "Blind War".

Related

PostgreSQL: Is it possible to limit inserts per user based on time difference between timestamp column and current time?

I have an issue where two almost concurrent requests (+- 10 ms apart) by the same user (unintentionally duplicated by the client side) successfully execute the whole use case logic twice. I can't really solve this situation in the code of my API, so I've been thinking about how to limit one user_id to inserting a row into the order table at most once every second, for example.
I want to achieve this: if a row with user_id X exists in the order table and that row was created (inserted) less than 1 second ago, an insert with user_id X would fail.
This could be an effective way of avoiding requests unintentionally duplicated by the client side, because I can't imagine a situation where a user would intentionally send two complex requests less than 1 second apart. I'm also interested in any other ideas, for example what's the proper way to deal with similar situations in APIs.
There is one problem with your idea. If the server becomes really slow for just a second, the orders will arrive more than one second apart in the database and will be inserted.
I'd recommend generating a unique ID, like a UUID, in the front-end, and sending that with the request. You could, for example, generate a new one every page load. Then, if the server sees that the received UUID already exists in the database, the order is skipped.
This avoids any potential timing issues, but also retains the possibility of someone re-ordering the exact same products.
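As a server-side sketch of that idea, assuming a UNIQUE constraint on a request_uuid column in the order table and the node-postgres (pg) driver; the table and column names are only illustrative:

const { Pool } = require('pg');
const pool = new Pool();

// Insert the order with the client-generated UUID; a duplicate UUID violates the
// UNIQUE constraint and raises error code 23505 (unique_violation), which is
// treated here as "this request was already handled".
async function createOrder(userId, requestUuid, payload) {
    try {
        await pool.query(
            'INSERT INTO "order" (user_id, request_uuid, payload) VALUES ($1, $2, $3)',
            [userId, requestUuid, payload]
        );
        return { duplicate: false };
    } catch (err) {
        if (err.code === '23505') {
            return { duplicate: true };   // already inserted, skip silently
        }
        throw err;                        // anything else is a real failure
    }
}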
You can do it with an EXCLUDE constraint. You need to create your own immutable helper function, and use an extension.
create extension btree_gist;

create function addsec(timestamptz) returns tstzrange immutable language sql as $$
    select tstzrange($1, $1 + interval '1 second')
$$;

create table orders (
    userid int,
    t timestamptz,
    exclude using gist (userid with =, addsec(t) with &&)
);
But you should probably change the front end anyway to include a validation token, as currently it may be subject to CSRF attacks.
Note that EXCLUDE constraints may be much less efficient than UNIQUE constraints. Also, I'm not 100% sure that addsec really is immutable. There might be weird things with leap seconds or something that messes it up.

MongoDB (+ Node/Mongoose): How to track processed records without "marking" all of them?

I have several large "raw" collections of documents which are processed in a queue, and the processed results are all placed into a single collection.
The queue only runs when the system isn't otherwise indisposed, and new data is being added into the "raw" collections all the time.
What I need to do is make sure the queue knows which documents it has already processed, so it doesn't either (a) process any documents more than once, or (b) skip documents. Updating each raw record with a "processed" flag as I go isn't a good option because it adds too much overhead.
I'm using MongoDB 4.x, with NodeJS and Mongoose. (I don't need a strictly mongoose-powered answer, but one would be OK).
My initial attempt was to do this by retrieving the raw documents sorted by _id in a smallish batch (say 100), then grabbing the first and last _id values in the return result, and storing those values, so when I'm ready to process the next batch, I can limit my find({}) query to records with an _id greater than what I stored as the last-processed result.
But looking into it a bit more, unless I'm misunderstanding something, it appears I can't really count on a strict ordering by _id.
I've looked into ways to implement an auto-incrementing numeric ID field (SQL style), which would have a strict ordering, but the solutions I've seen look like they add a nontrivial amount of overhead each time I create a record (not dissimilar to what it would take to mark processed records, just would be on the insertion end instead of the processing end), and this system needs to process a LOT of records very fast.
Any ideas? Is there a way to do an auto-incrementing numeric ID that's super efficient? Will default _id properties actually work in this case and I'm misunderstanding? Is there some other way to do it?
As per the documentation of ObjectID:
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
So if you are creating that many records per second then _id ordering is not for you.
However, a Timestamp within a mongod instance is guaranteed to be unique.
BSON has a special timestamp type for internal MongoDB use and is not associated with the regular Date type. Timestamp values are a 64 bit value where:
the first 32 bits are a time_t value (seconds since the Unix epoch)
the second 32 bits are an incrementing ordinal for operations within a given second.
Within a single mongod instance, timestamp values are always unique.
Although it clearly states that this is for internal use, it may be something for you to consider. Assuming you are dealing with a single mongod instance, you can decorate your records with timestamps as they go into the "raw" collections ... then you only need to remember the last processed record. Your queue would only pick records with timestamps larger than the last processed timestamp.
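A minimal sketch of that approach with the official Node.js driver follows; the collection names, the ts field, and the single "lastProcessed" bookmark document are assumptions for illustration, not part of any standard API:

const { MongoClient } = require('mongodb');

// Insertion side: let the server stamp each raw record with an internal
// timestamp right after it is written.
async function insertRaw(raw, payload) {
    const { insertedId } = await raw.insertOne({ payload: payload });
    await raw.updateOne({ _id: insertedId }, { $currentDate: { ts: { $type: 'timestamp' } } });
}

// Queue side: fetch only records stamped after the last processed one,
// then move the bookmark forward.
async function processBatch(raw, state) {
    const bookmark = await state.findOne({ _id: 'lastProcessed' });
    const query = bookmark ? { ts: { $gt: bookmark.ts } } : {};
    const docs = await raw.find(query).sort({ ts: 1 }).limit(100).toArray();

    for (const doc of docs) {
        // ... process doc here ...
    }
    if (docs.length > 0) {
        await state.updateOne(
            { _id: 'lastProcessed' },
            { $set: { ts: docs[docs.length - 1].ts } },
            { upsert: true }
        );
    }
}

async function main() {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const db = client.db('app');
    await insertRaw(db.collection('raw'), { some: 'data' });
    await processBatch(db.collection('raw'), db.collection('queueState'));
    await client.close();
}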

Mongodb document insertion order

I have a mongodb collection for tracking user audit data. So essentially this will be many millions of documents.
Audits are tracked by loginID (user) and their activities on items. example: userA modified 'item#13' on date/time.
Case: I need to query with filters based on user and item. That's Simple. This returns many thousands of documents per item. I need to list them by latest date/time (descending order).
Problem: How can I insert new documents at the top of the stack (like a capped collection)? Or is it possible to find records from the bottom of the stack (reverse order)? I do NOT like the idea of find-and-sort, because when dealing with thousands and millions of documents, sorting is a bottleneck.
Any solutions?
Stack: mongodb, node.js, mongoose.
Thanks!
the top of the stack?
you're implying there is a stack, but there isn't - there's a tree, or more precisely, a B-Tree.
I do NOT like the idea of find and sorting
So you want to sort without sorting? That doesn't seem to make much sense. Stacks are essentially in-memory data structures; they don't work well on disk because they require huge contiguous blocks (in fact, huge stacks don't even work well in memory, and growing a stack requires copying the entire data set, so that would hardly work).
sorting is a bottleneck
It shouldn't be, at least not for data that is stored closely together (data locality). Sorting is an O(m log n) operation, and since the _id field already encodes a timestamp, you already have a field that you can sort on. m is relatively small, so I don't see the problem here. Have you even tried that? With MongoDB 3.0, index intersection has become more powerful; you might not even need _id in the compound index.
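For the original audit question, that boils down to something like the following in the mongo shell; the collection and field names are assumed from the question rather than prescribed:

// One compound index covering the filter and the descending sort:
db.audits.createIndex({ loginID: 1, item: 1, _id: -1 })

// Latest activity first for one user/item pair; the sort walks the index,
// so there is no in-memory sort over millions of documents:
db.audits.find({ loginID: "userA", item: "item#13" }).sort({ _id: -1 }).limit(50)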
On my machine, getting the top items from a large collection, filtered by an index takes 1ms ("executionTimeMillis" : 1) if the data is in RAM. The sheer network overhead will be in the same league, even on localhost. I created the data with a simple network creation tool I built and queried it from the mongo console.
I have encountered the same problem. My solution is to create an additional collection that maintains the top 10 records. The good point is that you can query it quickly. The bad point is that you need to update the additional collection.
I found this, which inspired me. I implemented my solution with Ruby + Mongoid.
My solution:
Collection definition:
class TrainingTopRecord
  include Mongoid::Document
  field :training_records, :type=>Array
  belongs_to :training
  index({training_id: 1}, {unique: true, drop_dups: true})
end
Maintenance process:
# Find or create the top-records document for this training.
if t.training_top_records == nil
  training_top_records = TrainingTopRecord.create! training_id: t.id
else
  training_top_records = t.training_top_records
end
training_top_records.training_records = [] if training_top_records.training_records == nil
top_10_records = training_top_records.training_records

# Add the new record and keep the array sorted by 'return', descending.
top_10_records.push({
  'id' => r.id,
  'return' => r.return
})
top_10_records.sort_by! { |record| -record['return'] }

# Limit training_records' size to 10.
top_10_records.slice! 10, top_10_records.length - 10
training_top_records.save
MongoDB's ObjectId is structured in a way that gives it a natural ordering.
This means the last inserted item is fetched last.
You can override that by using: db.collectionName.find().sort({ $natural: -1 }) during a fetch.
Filters can then follow.
You will not need to create any additional indices since this works on _id, which is indexed by default.
This is possibly the only efficient way you can achieve what you want.

How to account for a failed write or add process in Mongodb

So I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them (i.e. when a client adds information to the database, like a username for example, but quits in the middle of the registration process; now the DB is left with some "hanging" information that isn't associated with anything). How can MongoDB handle that? Or, if no one can answer that question, maybe they can point me to a source/example that can? Thanks.
MongoDB does not support transactions: you can't perform atomic multi-statement transactions to ensure consistency. You can only perform an atomic operation on a single document at a time. When dealing with NoSQL databases you need to validate your data as much as you can; they seldom complain about anything. There are some workarounds or patterns to achieve SQL-like transactions. For example, in your case, you can store the user's information in a temporary collection, check the data's validity, and store it in the users collection afterwards.
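A rough sketch of that single-document case with the Node.js driver; the collection names and the isValidRegistration helper are made up for illustration:

async function register(db, username, email) {
    const pending = db.collection('pendingRegistrations');
    const users = db.collection('users');

    // 1. Park the half-finished registration in a disposable collection.
    const { insertedId } = await pending.insertOne({ username, email, createdAt: new Date() });

    // 2. Only copy it into the real collection once it has been validated.
    if (await isValidRegistration(username, email)) {   // your own validation logic
        await users.insertOne({ username, email });
    }

    // 3. Clean up the temporary document either way, so nothing is left hanging.
    await pending.deleteOne({ _id: insertedId });
}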
This should be straightforward, but things get more complicated when we deal with multiple documents. In this case, you need to create a designated collection for transactions. For instance:
transaction collection:
{
    id: ...,
    state: "new_transaction",
    value1: values from document_1 before updating document_1,
    value2: values from document_2 before updating document_2
}
// update document 1
// update document 2
Ooohh!! something went wrong while updating document 1 or 2? No worries, we can still restore the old values from the transaction collection.
This pattern is known as compensation to mimic the transactional behavior of SQL.
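A sketch of that compensation flow with the Node.js driver, staying close to the shape above; the collection names and the two-document update are only illustrative:

async function updateTwoDocuments(db, id1, newValues1, id2, newValues2) {
    const docs = db.collection('documents');
    const txns = db.collection('transactions');

    // 1. Remember the pre-update state of both documents.
    const before1 = await docs.findOne({ _id: id1 });
    const before2 = await docs.findOne({ _id: id2 });
    const { insertedId: txnId } = await txns.insertOne({
        state: 'new_transaction',
        value1: before1,
        value2: before2
    });

    try {
        // 2. Update document 1 and document 2 (each update is atomic on its own).
        await docs.updateOne({ _id: id1 }, { $set: newValues1 });
        await docs.updateOne({ _id: id2 }, { $set: newValues2 });
        await txns.updateOne({ _id: txnId }, { $set: { state: 'done' } });
    } catch (err) {
        // 3. Something went wrong while updating document 1 or 2: restore the
        //    old values that were saved in the transaction collection.
        await docs.replaceOne({ _id: id1 }, before1);
        await docs.replaceOne({ _id: id2 }, before2);
        await txns.updateOne({ _id: txnId }, { $set: { state: 'rolled_back' } });
        throw err;
    }
}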

Running query on database after a document/row is of certain age

What is the best practice for running a database query after any document in a collection reaches a certain age?
Let's say this is a Node.js web system with MongoDB, with a collection of posts. After a new post is inserted, it should be updated with some data after 60 minutes.
Would a cron job that checks all posts with (age < one hour) every minute or two be the best solution? What would be the least stressful solution if this system has >10,000 active users?
Some ideas:
Create a second collection as a queue with a "time to update" field containing the time at which the source record needs to be updated. Index it, and scan through looking for values older than "now" (see the sketch after these ideas).
Include the field mentioned above in the original document and index it the same way
You could just clear the value when done or reset it to the next 60 minutes depending on behavior (rather than inserting/deleting/inserting documents into the collection).
By keeping the update-collection distinct, you have a better chance of always keeping the entire working set of queued updates in memory (compared to storing the update info in your posts).
I'd kick off the update not as a web request to the same instance of Node but instead as a separate process so as to not block user-requests.
As to how you schedule it -- that's up to you and your architecture and what's best for your system. There's no right "best" answer, especially if you have multiple web servers or a sharded data system.
You might use a capped collection, although you'd run the risk of losing records that still need to be updated (though you'd gain performance).
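A possible shape for the queue-collection idea, in the mongo shell; the collection name, the updateAt field, and the worker loop are assumptions rather than a prescribed design:

// Index the due-time so the scan below stays cheap:
db.updateQueue.createIndex({ updateAt: 1 })

// When a post is inserted, schedule its follow-up update for one hour later
// (newPostId stands in for the _id of the post that was just created):
db.updateQueue.insertOne({ postId: newPostId, updateAt: new Date(Date.now() + 60 * 60 * 1000) })

// The worker (cron job or separate process) picks up whatever is due and
// removes each job once the post has been updated:
db.updateQueue.find({ updateAt: { $lte: new Date() } }).forEach(function (job) {
    db.posts.updateOne({ _id: job.postId }, { $set: { enriched: true } })   // placeholder update
    db.updateQueue.deleteOne({ _id: job._id })
})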
