I've been using MongoDB for a few years now, and I'm starting to use Prisma's client, which seems to be providing excellent type safety. I'm not familiar at all with relational data storage, though...
This is how my database is looking:
Users are each associated with a single group
Each one of them can be a candidate in the group, and all vote for one of the group candidates during an election once they've been selected
What best practices would you advise me to follow here to store the votes? (I'm not talking about the privacy perspective specific to voting, this is a whole other topic, but only the performance & data storage stance)
Typically with MongoDB, I would create an embedded object in the Group document, with the User IDs of candidates as the keys, and an array of the User IDs of the voters for each candidate as the values.
But strong types & the relational database-inspired approach of Prisma seem to prevent me from doing anything like that, most likely for the better.
In a relational database, you would typically create a separate table to store the votes. Each row in the table would represent a single vote and would contain the following columns:
group_id: the ID of the group in which the vote took place
candidate_id: the ID of the candidate who received the vote
voter_id: the ID of the user who cast the vote
You can also add a timestamp column to store the date and time when the vote was cast.
To get the votes for a particular group, you would perform a query like this:
SELECT * FROM votes WHERE group_id = :groupId;
You can then use the results of this query to build the necessary data structures in your application.
Using a separate table to store votes has the advantage of keeping the data normalized, which can make it easier to work with and more performant. It also allows you to use foreign key constraints to ensure that each vote references a valid group and candidate.
Related
I'm currently trying to learn Node.js and Mongoodb by building the server side of a web application which should manage insurance documents for the insurance agent.
So let's say i'm the user, I sign in, then I start to add my customers and their insurances.
So I have 2 collection related, Customers and Insurances.
I have one more collection to store the users login data, let's call it Users.
I don't want the new users to see and modify the customers and the insurances of other users.
How can I "divide" every user related record, so that each user can work only with his data?
I figured out I can actually add to every record, the _id of the one user who created the record.
For example I login as myself, I got my Id "001", I could add one field with this value in every customer and insurance.
In that way I could filter every query with this code.
Would it be a good idea? In my opinion this filtering is a waste of processing power for mongoDB.
If someone has any idea of a solution, or even a link to an article about it, it would be helpful.
Thank you.
This is more a general permissions problem than just a MongoDB question. Also, without knowing more about your schemas it's hard to give specific advice.
However, here are some approaches:
1) Embed sub-documents
Since MongoDB is a document store allowing you to store arbitrary JSON-like objects, you could simply store the customers and licenses wholly inside each user object. That way querying for a user would return their customers and licenses as well.
2) Denormalise
Common practice for NoSQL databases is to denormalise related data (ie. duplicate the data). This might include embedding a sub-document that is a partial representation of your customers/licenses/whatever inside your user document. This has the similar benefit to the above solution in that it eliminates additional queries for sub-documents. It also has the same drawbacks of requiring more care to be taken for preserving data integrity.
3) Reference with foreign key
This is a more traditionally relational approach, and is basically what you're suggesting in your question. Depending on whether you want the reference to be bi-directional (both documents reference each other) or uni-directional (one document references the other) you can either store the user's ID as a foreign user_id field, or store an array of customer_ids and insurance_ids in the user document. In relational parlance this is sometimes described to as "has many" or "belongs to" (the user has many customers, the customer belongs to a user).
We're building a database of cars and their properties, supposed to be stored in a DynamoDB.
Creating a cars table and filling it with objects that has properties like brand, model, year etc is easy.
But we also want a few other features en the admin interface:
Suggestions when typing
When creating a car, it should suggest brand and model from existing cars, when typing in the field.
Should we then maintain a list of brands and models in another table, and make a query to that table, when the user types?
Or is it good enough to query the "rich" table of car definitions, and get all values for brand, all model values where brand has a certain value, etc? My first thought is that it would be a heavy operation and we'd want a separate index of cars and models. But I'm not a NoSQL expert...
Best matches
When enrolling a new car in our system we want to use use an existing defined car as a reference if possible.
So when the user has typed in a brand, model, year etc we want to show a few options of the best matches - we can accept that they year etc. is different, but want the best matches first.
What is the best way to do matches like this on data in a NoSQL database? Any links to tools, concepts etc. will be appreciated :)
Thanks in advance
In dynamodb (all nosql), the less you create tables the best is your architecture (this is one of the main reason we use nosql), so no need of a new table, just add a new attribute and fill it with the searchable data you want, just have in mind that querying by dynamodb is case sensitive and you only can use the begins_with or the contains function to query data
The cons are :
You will use lot of reading capacity unit
You have to manage the capital letters
You have to fabric at each creation the searchable attribute
The solution I suggest is using aws cloudsearch, which gives an out of the boxes suggester, you will will have better results and give a better user experience, the indexation in cloudsearch is automatic each time you have a new item, but be aware of the pricing, however they will give you 30 day for free
So I have this MEAN-project I hobby on in my spare time.
Right now I'm setting up users and rooms, and am a bit hesitant about progressing further, as I am unsure about the proper protocol of db's in general.
As I recall, you're not supposed to have a Many-To-Many relationship; rather, you're supposed to have a relation table.
Right now, my User schema has an array of rooms he is in, and my Room schema has an array of users tied to it (the third and last schema being Message).
Is it better to have a userroomrelation doc that holds a PK, an id of one room, and then a list of all users in this room?
Thanks,
Rasmus
MongoDB isn't a relational database like *SQL databases (hence why MongoDB is called NoSQL), so using a relation table is fairly inefficient in Mongo. Holding an array of user _id's in the room collection is about as ideal as you could get, if you don't want repeat data.
Here are some more indepth answers on many-to-many in MongoDB.
How can a User be in more than one room? Isn't that just a property on the User? And if you index that why would you also need to store it on the Room?
There is no one right way, it really depends on how many of each object you have and if it's a small number (as rooms and users implies) you may be better with a simpler and more robust (cannot store impossible values) approach like having a single property on a user RoomId. That's never going to be inconsistent and if you need to find the set of users in a given room it's a cheap query.
In MongoDB you CAN denormalize the data and store an array on each object containing part or all of the other object, but you can also create an effective join collection if you want to.
For example you could have a collection {UserId, RoomId, DateTimeEntered, DateTimeLeft} with appropriate indexes which allows you to quickly find all the users in a given room at a given time. Once you have the set of Ids you could go load them if you need them for display OR you could add the fields you need for display to this table {UserId, UserName, ...} BUT then you have the problem of maintaining that data if it ever changes OR keeping it intact if you need to know that when they entered the room that's what it was called.
There are also a TON of other questions on StackOverflow relating to how you should store related data, I suggest you go read those also.
I am using Azure DocumentDB and all my experience in NoSql has been in MongoDb. I looked at the pricing model and the cost is per collection. In MongoDb I would have created 3 collections for what I was using: Users, Firms, and Emails. I noted that this approach would cost $24 per collection per month.
I was told by the people I work with that I'm doing it wrong. I should have all three of those things stored in a single collection with a field to describe what the data type is. That each collection should be related by date or geographic area so one part of the world has a smaller portion to search.
and to:
"Combine different types of documents into a single collection and add
a field across all to separate them in searching like a type field or
something"
I would never have dreamed of doing that in Mongo, as it would make indexing, shard keys, and other things hard to get right.
There might not be may fields that overlap between the objects (example: Email and firm objects)
I can do it this way, but I can't seem to find a single example of anyone else doing it that way - which indicates to me that maybe it isn't right. Now, I don't need an example, but can someone point me to some location that describes which is the 'right' way to do it? Or, if you do create a single collection for all data - other than Azure's pricing model, what are the advantages / disadvantages in doing that?
Any good articles on DocumentDb schema design?
Yes. In order to leverage CosmosDb to it's full potential need to think of a Collection is an entire Database system and not as a "table" designed to hold only one type of object.
Sharding in Cosmos is exceedingly simply. You just specify a field that all of your documents will populate and select that as your partition key. If you just select a generic value such as key or partitionKey you can easily separate the storage of your inbound emails, from users, from anything else by picking appropriate values.
class InboundEmail
{
public string Key {get; set;} = "EmailsPartition";
// other properties
}
class User
{
public string Key {get; set;} = "UsersPartition";
// other properties
}
What I'm showing is still only an example though. In reality your partition key values should be even more dynamic. It's important to understand that queries against a known partition are extremely quick. As soon as you need to scan across multiple partitions you'll see much slower and more costly results.
So, in an app that ingests a lot of user data. Keeping a single user's activity together in one partition might make sense for that particular entity.
If you want evidence that this is the appropriate way to use CosmosDb, consider the addition of the new Gremlin Graph APIs. Graphs are inherently heterogenous as they contain many different entities and entity types as well as the relationships between them. The query boundary of Cosmos is at the collection level so if you tried putting your entities all in different collections none of the Graph API or queries would work.
EDIT:
I noticed in the comments you made this statement And you would have an index on every field in both objects. CosmosDb does automatically index every field of every document. They use a special proprietary path based indexing mechanism that ensures every path of your JSON tree has indices on it. You have to specifically opt out of this auto indexing feature.
I have a system where actions of users need to be sent to other users who subscribe to those updates. There aren't a lot of users/subscribers at the moment, but it could grow rapidly so I want to make sure I get it right. Is it just this simple?
create table subscriptions (person_uuid uuid,
subscribes_person_uuid uuid,
primary key (person_uuid, subscribes_person_uuid)
)
I need to be able to look up things in both directions, i.e. answer the questions:
Who are Bob's subscribers.
Who does Bob subscribe to
Any ideas, feedback, suggestions would be useful.
Those two queries represent the start of your model:
you want the user to be the PK or part of the PK.
depending on the cardinality of subscriptions/subscribers you could go with:
for low numbers: using a single table and two sets
for high numbers: using 2 tables similar to the one you describe
#Jacob
Your use case looks very similar to the Twitter example, I did modelize it here
If you want to track both sides of relationship, I'll need to have a dedicated table to index them.
Last but not least, depending on the fact that the users are mutable OR not, you can decide to denormalize (e.g. duplicate User content) or just store user ids and then fetch users content in a separated table.
I've implemented simple join feature in Achilles. Have a look if you want to go this way