ArangoDB: having a few large collections vs. a lot of small collections

I have a question regarding performance/best practice:
Scenario: I have a user-collection and a chatbot-collection. There can be a lot of users (let's say 100-1000 users) in the user-collection. Each user can have multiple chatbots (around 10 per user).
Option A: I create an edge collection to define the connection between user -> chatbot. At the end I would have 1 user-collection, 1 chatbot-collection (containing all chatbots from all users) and 1 edge-collection (containing the definitions from a user to its chatbots)
Option B: I create a separate chatbot-collection for each user, to have all chatbots of a specific user in one place. Chatbot-collection name would be e.g. user_xyz(user._key)_chatbots. So if I need all chatbots of a user with the _key 'abc', I would check the collection user_abc_chatbots. In this case I don’t need an edge collection for the connection user -> chatbot. At the end I would have 1 user-collection and a lot of user_xyz_chatbots-collections (depending on how many users I have - can be 100-1000 as I wrote before).
Now my question: What is the better option? Also regarding performance - imagine I have to get all (or a specific) chatbot of a user each time I receive a request.
Would be awesome if you can give me feedback on your experience/thoughts :)

Looking at the numbers you posted, i.e. 100 - 1000 users and about 10 chatbots per user, this would mean just 1000 to 10000 chatbots in total.
For data of this size, I would say it makes more sense to store all chatbots in a single collection, and use an (indexed) attribute to store the user id for each chatbot. This is a 1:n relationship (1 user mapped to n chatbots).
That way you can easily and still quickly find all chatbots mapped to a particular user, but this setup will also allow you to make analyses for all users or all chatbots easily.
This would be much more difficult to achieve if the chatbots of each user would be located in a different collection.
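To make the single-collection idea concrete, here is a minimal in-memory sketch in Python. It does not talk to ArangoDB; the attribute name "user", the sample documents and the helper names are all assumptions made up for illustration, and the dictionary only stands in for what an index on the user attribute would do in the database.

```python
from collections import defaultdict

# Hypothetical sample data: one collection holding the chatbots of all users.
# Each document stores the owning user's key in a "user" attribute (assumed name).
chatbots = [
    {"_key": "bot1", "user": "abc", "name": "Support bot"},
    {"_key": "bot2", "user": "abc", "name": "Sales bot"},
    {"_key": "bot3", "user": "xyz", "name": "FAQ bot"},
]


def build_user_index(docs):
    """Map each user key to the chatbots that reference it (stand-in for a database index)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc["user"]].append(doc)
    return index


user_index = build_user_index(chatbots)


def chatbots_of(user_key):
    """Return all chatbots of one user without scanning the whole collection."""
    return user_index.get(user_key, [])


def chatbots_per_user():
    """A cross-user analysis that stays simple because everything lives in one collection."""
    return {user: len(docs) for user, docs in user_index.items()}
```

Looking up the chatbots of user 'abc' is a single index lookup, and the per-user count shows the kind of analysis that would need a loop over many collections in Option B.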
In addition, if the same chatbots can be mapped to multiple users, it may actually make sense to use three collections:
one collection for users
one collection for chatbots
and one mapping collection between users and chatbots
This would be an n:m relationship, in which each user can still be mapped to any number of chatbots, but if multiple users are mapped to the same chatbot, the data of each chatbot does not need to be stored redundantly.
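The three-collection layout can be sketched the same way. This is only an in-memory illustration in Python, not ArangoDB code; the collection contents and the attribute names "user" and "chatbot" in the mapping documents are hypothetical.

```python
# Hypothetical sample data for the three collections.
users = {"abc": {"name": "Alice"}, "xyz": {"name": "Bob"}}
shared_chatbots = {"bot1": {"name": "Support bot"}, "bot2": {"name": "FAQ bot"}}

# Mapping collection: one small document per (user, chatbot) pair.
user_chatbot_map = [
    {"user": "abc", "chatbot": "bot1"},
    {"user": "abc", "chatbot": "bot2"},
    {"user": "xyz", "chatbot": "bot2"},  # bot2 is shared, but its data is stored only once
]


def mapped_chatbots_of(user_key):
    """Follow the mapping from one user to that user's chatbots."""
    return [shared_chatbots[m["chatbot"]] for m in user_chatbot_map if m["user"] == user_key]


def mapped_users_of(chatbot_key):
    """Follow the mapping in the other direction, from a chatbot to its users."""
    return [users[m["user"]] for m in user_chatbot_map if m["chatbot"] == chatbot_key]
```

The mapping documents carry only keys, so a chatbot shared by many users costs one extra small document per user instead of a full copy.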
I would only recommend using separate chatbot collections per user if each chatbot has an individual data structure that is separate from all others and needs special indexing or querying. In this case it may make sense to separate different chatbots.
However, having too many collections (here we would think of at most 1,000) also isn't great, because each collection has a small overhead even when empty. This overhead is amortized much better with fewer collections that are used frequently than with many collections that are seldom used.

Related

Managing messages in chat room with MongoDB and Graphql

I am wondering how to manage messages in my chat room. My assumptions are:
There is a collection rooms with fields like id, messages, participants
There can be many rooms with many participants
Now, I have doubts:
should I have separate collection with messages (id, author, text, where author is a reference to users collection)?
Or maybe should I keep simple objects in messages instead of documents with refs?
I can imagine that collection with messages will be huuuuge (if is not cleared). Will Mongo handle it? Or maybe there is a better way for doing that?
Regards

It depends on the scale of what you're building.
I would say that what meatsacks like you and me consider to be huge amounts is often peanuts for database systems (be it relational or a NoSQL datastore).
It's hard to say without knowing anything about the project, but I suspect you'll likely be better off if you design your data model based on correctness/usefulness, and worry about performance as a next step.
Based on the entities you describe (rooms, messages, participants, users, ...) I'm picturing an application such as Discord. In such a case I would think of rooms and users as first-order entities and both participants and messages as (big) ordered lists of data belonging to a room (while each entry in both obviously also has a reference to its personal user/author).
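As a rough sketch of that model, here is one way to keep messages as their own ordered list, each entry referencing its room and author. It is plain Python rather than MongoDB or GraphQL code, and the class and field names (Message, MessageStore, room_id, author_id) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    room_id: str    # reference to the room the message belongs to
    author_id: str  # reference to the user who wrote it
    text: str


class MessageStore:
    """In-memory stand-in for a separate messages collection."""

    def __init__(self):
        self._messages = []

    def post(self, room_id, author_id, text):
        """Append a message; ids grow with insertion order."""
        message = Message(len(self._messages) + 1, room_id, author_id, text)
        self._messages.append(message)
        return message

    def latest(self, room_id, limit=50):
        """Return the newest `limit` messages of a room, oldest first."""
        if limit <= 0:
            return []
        in_room = [m for m in self._messages if m.room_id == room_id]
        return in_room[-limit:]
```

Reading only the latest page of a room is the access pattern that keeps a very large message list manageable, since a client rarely needs the full history at once.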

MongoDB, how to manage user related records

I'm currently trying to learn Node.js and MongoDB by building the server side of a web application which should manage insurance documents for an insurance agent.
So let's say I'm the user, I sign in, then I start to add my customers and their insurances.
So I have 2 collection related, Customers and Insurances.
I have one more collection to store the users login data, let's call it Users.
I don't want the new users to see and modify the customers and the insurances of other users.
How can I "divide" every user related record, so that each user can work only with his data?
I figured I could add to every record the _id of the user who created it.
For example I login as myself, I got my Id "001", I could add one field with this value in every customer and insurance.
In that way I could filter every query with this code.
Would it be a good idea? In my opinion this filtering is a waste of processing power for mongoDB.
If someone has any idea of a solution, or even a link to an article about it, it would be helpful.
Thank you.

This is more a general permissions problem than just a MongoDB question. Also, without knowing more about your schemas it's hard to give specific advice.
However, here are some approaches:
1) Embed sub-documents
Since MongoDB is a document store allowing you to store arbitrary JSON-like objects, you could simply store the customers and licenses wholly inside each user object. That way querying for a user would return their customers and licenses as well.
2) Denormalise
Common practice for NoSQL databases is to denormalise related data (i.e. duplicate the data). This might include embedding a sub-document that is a partial representation of your customers/licenses/whatever inside your user document. This has a similar benefit to the above solution in that it eliminates additional queries for sub-documents. It also has the same drawback of requiring more care to preserve data integrity.
3) Reference with foreign key
This is a more traditionally relational approach, and is basically what you're suggesting in your question. Depending on whether you want the reference to be bi-directional (both documents reference each other) or uni-directional (one document references the other) you can either store the user's ID as a foreign user_id field, or store an array of customer_ids and insurance_ids in the user document. In relational parlance this is sometimes described as "has many" or "belongs to" (the user has many customers, the customer belongs to a user).
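For the foreign-key approach, the per-user filtering can be centralised in one helper so no query forgets it. The sketch below is plain Python with an in-memory stand-in for a collection; the helper names scoped_filter and find_matching and the sample customers are hypothetical, and the filter dict only mirrors the shape of an equality query. With an index on user_id, this kind of filter is usually cheap for the database.

```python
def scoped_filter(user_id, criteria=None):
    """Return a query filter that always restricts results to one user's records."""
    query = dict(criteria or {})
    query["user_id"] = user_id  # set last so caller-supplied criteria cannot override the owner
    return query


def find_matching(collection, query):
    """Tiny in-memory stand-in for an equality-only find()."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in query.items())]


# Hypothetical sample data: every customer carries the id of the user who created it.
customers = [
    {"name": "Rossi", "city": "Rome", "user_id": "001"},
    {"name": "Bianchi", "city": "Milan", "user_id": "001"},
    {"name": "Verdi", "city": "Rome", "user_id": "002"},
]

my_roman_customers = find_matching(customers, scoped_filter("001", {"city": "Rome"}))
```

Here my_roman_customers contains only the record for Rossi, even though another user also has a customer in Rome.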

How to design order schema for multiple products?

I have to design a schema that stores a user id and that user's order, which can contain multiple products (like bread and butter), together with the quantity ordered of each product. Please guide.

It is difficult to provide you with a real solution to your problem, as designing a NoSQL DB structure depends on how you want to access your data. You can keep orders as nested/embedded documents in the User model or store them in a separate collection. In the first case, you will have all the data in one request, but a plain query cannot return only the orders that match certain criteria: you will get all of the user's orders, including those that do not match, and then you would need to filter them out. Or you could use aggregation to get exactly what you need.
However, there is a limitation to keep in mind. A MongoDB document has a size limit of 16 megabytes. Since users may have very many orders, some users could eventually reach the document size limit. Aggregation also has a limitation: pipeline stages have a limit of 100 megabytes of RAM, but you can override it.
Having orders in a separate collection would require you to separately load them for users. While it is one more request, it will give you more flexibility in terms of how you query them.
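As an illustration of the separate-collection variant, an order document could hold the user id plus a list of line items, each with a product and a quantity. This is only a sketch in Python, not Mongoose code; the names Order, OrderItem, user_id, product and quantity are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OrderItem:
    product: str
    quantity: int


@dataclass
class Order:
    user_id: str
    items: list = field(default_factory=list)

    def add(self, product, quantity=1):
        """Add a product, merging quantities if it is already part of the order."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        for item in self.items:
            if item.product == product:
                item.quantity += quantity
                return
        self.items.append(OrderItem(product, quantity))

    def to_document(self):
        """Shape the order as a plain dict, e.g. for storing it in an orders collection."""
        return {
            "user_id": self.user_id,
            "items": [{"product": i.product, "quantity": i.quantity} for i in self.items],
        }


order = Order("user-1")
order.add("bread", 2)
order.add("butter")
order.add("bread")
```

After these calls, order.to_document() holds one entry for bread with quantity 3 and one for butter with quantity 1. Because each order is its own document, a user with many orders does not grow a single document toward the size limit.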
Then, of course, create/update operations are also done differently for both cases.
My advice would be that you carefully design your application first: what data you need, where you will show it, and how you create/update it. That will give you a better idea, and chances are that a relational DB will be a better choice for what you need (though not necessarily).

Azure Search- replicating result of nested SQL query

I have a database with the following schema, depicting the linkage between individuals who connect with multiple Advisors, where these Advisors have affiliations with multiple organizations
Individuals--> Advisors (m:n relationship)
Advisors --> Enterprises (m:n relationship)
The business need is to enable search on all these concepts and organize results around AdvisorIds. As an example, display of a search result could be as follows
a) Advisor1-> connected to Individuals A,B,C; and linked to Enterprises X,Y
b) Advisor2-> connected to Individuals A, E; and linked to Enterprises M,X,Z
Towards this, we created a flattened table on these concepts and the relationship between them. Hence the same AdvisorId would appear in multiple rows
When I search for a string, I want to ensure that ALL records around an AdvisorId are returned together, irrespective of the search score of the individual records.
One approach could be
a) first run an Azure Search and get a result of AdvisorId, ordered by search score of each record. This will repeat Advisor Ids
b) take a distinct set of AdvisorIds (across pages) via standard SQL
c) for each AdvisorId, pick all the related records via standard SQL
2 questions
Here a lot of processing in (b) and (c) will be done outside Azure, leading to delays. Also, if I were to use pagination for (a), I am never sure of the number of AdvisorIds I end up with after the distinct operation
I wanted to check if there is a way to get the nested search implemented in Azure to do (a), (b) and (c) as a single API call
If I were to use facets for handling (a) and (b) together, how do I ensure that the ordering is based on the best search-score document within a facet

There isn't a way to achieve what you want in a single request unless you model your data differently. Instead of denormalizing the Individuals-Advisors and Advisors-Enterprises relationships, it may be possible to have one document per Advisor and use collections to store information about the related Individuals and Enterprises. This may or may not work for you, depending on whether you need to support correlated filtering on Individuals and Enterprises that are related to an Advisor. There is a whitepaper here that should help you evaluate whether this approach would work for you.
Another option might be to model Individuals, Advisors, and Enterprises as separate indexes, issue three queries, and do a client-side join. However, this is limited by the number of Advisor IDs you'd need to send in the queries on Individuals and Enterprises. Azure Search has limits on the size of filters that can make this impractical unless your queries have low recall.
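The client-side join itself is straightforward once the three result sets are in hand. The sketch below is plain Python and does not call Azure Search; the function name client_side_join, the field names (advisor_id, score, individual, enterprise) and the sample rows are all hypothetical, with the sample data modelled on the example in the question.

```python
def client_side_join(advisor_hits, individual_links, enterprise_links):
    """Attach related individuals and enterprises to each advisor hit, best score first."""
    individuals = {}
    for link in individual_links:
        individuals.setdefault(link["advisor_id"], []).append(link["individual"])
    enterprises = {}
    for link in enterprise_links:
        enterprises.setdefault(link["advisor_id"], []).append(link["enterprise"])

    ranked = sorted(advisor_hits, key=lambda hit: hit["score"], reverse=True)
    return [
        {
            "advisor_id": hit["advisor_id"],
            "score": hit["score"],
            "individuals": individuals.get(hit["advisor_id"], []),
            "enterprises": enterprises.get(hit["advisor_id"], []),
        }
        for hit in ranked
    ]


# Hypothetical results of the three separate queries.
advisor_hits = [{"advisor_id": "Advisor2", "score": 0.4}, {"advisor_id": "Advisor1", "score": 0.9}]
individual_links = [
    {"advisor_id": "Advisor1", "individual": "A"},
    {"advisor_id": "Advisor1", "individual": "B"},
    {"advisor_id": "Advisor2", "individual": "A"},
]
enterprise_links = [
    {"advisor_id": "Advisor1", "enterprise": "X"},
    {"advisor_id": "Advisor2", "enterprise": "M"},
]

joined = client_side_join(advisor_hits, individual_links, enterprise_links)
```

Since each advisor appears once in the advisor result set, paging over advisors stays predictable, which addresses the concern about not knowing how many distinct AdvisorIds a page yields.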
We are working on making Azure Search better for scenarios like yours. For example, we're currently working on adding support for complex types. Please vote on User Voice and feel free to suggest other features that would help.

Architecting a Mongodb lottery app with bets and jackpots in different currencies

I am designing a nodejs lottery app, using MongoDB/Mongoose; it currently works with fake money.
I want users to continue to be able to bet in a 'sandbox', with fake money, but I also want to allow users to use 1+ currencies, each currency with a different jackpot.
I'm looking for the best way to architect this within MongoDB:
Some possibilities:
Use an entirely separate database for each currency. Users will have to have 1 account for each currency. Not ideal.
Have 'bet', 'jackpot', etc. schemas have a 'currency' field. Probably easiest, but not sure if this is a relational way of thinking. It doesn't feel particularly elegant.
Have 2 separate databases for 'bet' and 'jackpot', but a shared database with 'user' information. Since I do use 'populate' a couple of times, this may or may not be feasible.
I appreciate any thoughts on this.

When you want the lottery of each currency separated, you could put them in the same database, but in different collections for each currency. That way you can easily decide which data is currency-agnostic (like user accounts) and which data is currency-dependent (like bets).
Keep in mind, though, that any queries which get data about lotteries in different currencies will get more complicated, because you can only query one collection at a time. When you need a lot of such queries (like when the user has a dashboard where he sees all lotteries he currently takes part in regardless of currency), you should rather go for the solution with a currency field.
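To show why the currency-field variant keeps cross-currency queries simple, here is a small sketch in plain Python (not Mongoose). The field names, the sample bets, the "SANDBOX" pseudo-currency and the assumption that a jackpot is just the sum of all bets in its currency are invented for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample data: every bet carries a currency field.
bets = [
    {"user": "u1", "currency": "SANDBOX", "amount": Decimal("5")},
    {"user": "u1", "currency": "EUR", "amount": Decimal("2.50")},
    {"user": "u2", "currency": "EUR", "amount": Decimal("1.50")},
]


def jackpots(all_bets):
    """Sum bets per currency (assumes a jackpot is simply the total of its bets)."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for bet in all_bets:
        totals[bet["currency"]] += bet["amount"]
    return dict(totals)


def bets_of(user, all_bets):
    """Dashboard-style query: all bets of a user, regardless of currency."""
    return [bet for bet in all_bets if bet["user"] == user]
```

The dashboard query touches one list and filters on the user only, while the per-currency jackpots fall out of a single grouping pass. With one collection per currency, the same dashboard would need one query per collection.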
