cassandra shopping cart data modeling explain - cassandra

I came across this screenshot of a Cassandra data model for a shopping cart.
Can anyone help me with two questions?
Q1: Why is cart_id a UUID and user_id a TEXT?
Q2: Why do I need items_by_name when the items_by_id table exists?

As the diagram above shows, each arrow carries a label saying which query the table is needed for. In Cassandra, you design the schema around the queries your application requires. That answers your second question: if your application has a query like Q3, you need to create that table, because otherwise you cannot look up items by name. As for your first question, that is more of an architectural decision about what kind of key you want.
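The "one table per query" idea can be sketched in plain Python, with no Cassandra involved. The two dictionaries stand in for the items_by_id and items_by_name tables from the question; the helper names (add_item, find_by_id, find_by_name) and the sample rows are hypothetical and only illustrate that each lookup needs a structure keyed by the value being searched for.

```python
import uuid

# Two in-memory "tables", each keyed for exactly one query.
items_by_id = {}    # answers: "find an item by its id"
items_by_name = {}  # answers: "find items by name"


def add_item(name, price):
    """Write the same item to both tables, as a query-first schema requires."""
    item_id = uuid.uuid4()
    row = {"id": item_id, "name": name, "price": price}
    items_by_id[item_id] = row
    # Several items may share a name, so this table maps name -> list of rows.
    items_by_name.setdefault(name, []).append(row)
    return item_id


def find_by_id(item_id):
    return items_by_id.get(item_id)


def find_by_name(name):
    return items_by_name.get(name, [])


pen_id = add_item("pen", 1.50)
add_item("notebook", 4.00)

print(find_by_id(pen_id)["name"])
print([row["price"] for row in find_by_name("pen")])
```

The cost of the second table is a second write per item; the benefit is that a lookup by name never has to scan every row.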

Related

NoSQL - how to implement autosuggest and best matches properly?

We're building a database of cars and their properties, to be stored in DynamoDB.
Creating a cars table and filling it with objects that have properties like brand, model, year etc. is easy.
But we also want a few other features in the admin interface:
Suggestions when typing
When creating a car, it should suggest brand and model from existing cars as the user types in the field.
Should we maintain a list of brands and models in another table, and query that table when the user types?
Or is it good enough to query the "rich" table of car definitions, and get all values for brand, all model values where brand has a certain value, etc.? My first thought is that it would be a heavy operation and we'd want a separate index of cars and models. But I'm not a NoSQL expert...
Best matches
When enrolling a new car in our system we want to use an existing defined car as a reference if possible.
So when the user has typed in a brand, model, year etc., we want to show a few options of the best matches. We can accept that the year etc. is different, but we want the best matches first.
What is the best way to do matches like this on data in a NoSQL database? Any links to tools, concepts etc. will be appreciated :)
Thanks in advance
In DynamoDB (as in NoSQL generally), the fewer tables you create, the better your architecture tends to be (this is one of the main reasons we use NoSQL), so there is no need for a new table. Just add a new attribute and fill it with the searchable data you want. Keep in mind that querying in DynamoDB is case sensitive, and that you can only use the begins_with or contains functions to match the data.
The cons are:
You will use a lot of read capacity units
You have to handle capitalization yourself
You have to build the searchable attribute every time an item is created
The solution I suggest is AWS CloudSearch, which provides an out-of-the-box suggester. You will get better results and a better user experience, and indexing in CloudSearch is automatic each time you have a new item. Be aware of the pricing, although they will give you 30 days for free.
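The searchable-attribute approach from the first part of this answer can be sketched in plain Python. This is an in-memory illustration only and does not call DynamoDB; the attribute name search_key and the helpers make_search_key and suggest are hypothetical. It shows the two chores the cons list mentions: normalizing case when the item is created, and matching with a begins-with comparison.

```python
def make_search_key(brand, model):
    """Build the lowercase attribute stored next to the original fields."""
    return f"{brand} {model}".strip().lower()


def suggest(cars, typed, limit=5):
    """Return display names whose search key begins with the typed text."""
    prefix = typed.strip().lower()
    matches = [c for c in cars if c["search_key"].startswith(prefix)]
    matches.sort(key=lambda c: c["search_key"])
    return [f'{c["brand"]} {c["model"]}' for c in matches[:limit]]


cars = []
for brand, model in [("Volvo", "V70"), ("Volvo", "XC90"),
                     ("Volkswagen", "Golf"), ("Audi", "A4")]:
    # The searchable attribute has to be built at creation time.
    cars.append({"brand": brand, "model": model,
                 "search_key": make_search_key(brand, model)})

print(suggest(cars, "VOL"))
```

Because the input is lowercased in the same way as the stored attribute, typing "VOL", "Vol" or "vol" gives the same suggestions.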

PouchDB structure

I am new to the NoSQL concept, and when I started learning PouchDB I found this conversion chart. My confusion is how PouchDB handles the case where I have multiple tables: does it mean that I need to create multiple databases? From my understanding, a PouchDB database can store a lot of documents, but does a document correspond to a row in SQL, or have I misunderstood?
The answer to this question seems to be surprisingly under-documented. While @llabball clearly gave a decent answer, I don't think that views are always the way to go.
As you can read here in the section When not to use map/reduce, Nolan explains that for simpler applications, the key is to abuse _ids and leverage the power of allDocs().
In other words, if you had two separate types (say artists and albums), you could prefix the id of each doc with its type to obtain an easily searchable data set. For example, _id: 'artist_name' and _id: 'album_title' would allow you to easily retrieve artists in name order.
Laying out the data this way will result in better performance due to not requiring extra indexes, and less code. Clearly however, if your data requirements are more complex, then views are the way to go.
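The id-prefix layout described above can be sketched in plain Python as a range scan over sorted ids. This is a simulation, not PouchDB code: the sample documents and the helper ids_with_prefix are hypothetical. In PouchDB itself the same range is usually expressed with the startkey and endkey options of allDocs(), with a high Unicode character such as '\ufff0' appended to the prefix for the end of the range.

```python
from bisect import bisect_left

# Documents of two types share one database; the _id prefix encodes the type.
docs = {
    "artist_Miles Davis": {"genre": "jazz"},
    "album_Kind of Blue": {"year": 1959},
    "artist_Bob Dylan": {"genre": "folk"},
    "album_Blonde on Blonde": {"year": 1966},
}

sorted_ids = sorted(docs)  # stands in for the primary index, ordered by _id


def ids_with_prefix(prefix):
    """Range scan: every id from `prefix` up to `prefix` plus a very high character."""
    start = bisect_left(sorted_ids, prefix)
    end = bisect_left(sorted_ids, prefix + "\ufff0")
    return sorted_ids[start:end]


print(ids_with_prefix("artist_"))
```

No extra index is built here: the ordering of the ids alone is enough to return one type, already sorted by name.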
... does it mean that i need to create multiple databases?
No.
... a document mean a row in sql or am i misunderstood?
That's right. The SQL table defines the column headers (name and type); these correspond to the JSON property names of the doc.
So all docs (rows) with the same properties (a so-called "schema") are the equivalent of your SQL table. You can have as many different schemas in one database as you want (visit json-schema.org for some inspiration).
How do you request them separately? Create CouchDB views! You can get all or some "rows" of your tabular data (docs with the same schema) with one request, as you know it from SQL.
To make such views easy to write, a type property is very common in CouchDB docs. The name of your SQL table can become the type, like doc.type: "animal".
Your view names might then be animalByName or animalByWeight, depending on your needs.
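A view of this kind can be sketched in plain Python. Real CouchDB views are written as JavaScript map functions, so this is only an illustration of the idea: the function animal_by_name mirrors the animalByName view name from the answer, while query_view and the sample docs are hypothetical.

```python
def animal_by_name(doc):
    """Map function: emit (key, value) pairs only for docs of type 'animal'."""
    if doc.get("type") == "animal":
        yield doc["name"], doc


def query_view(docs, map_fn):
    """Run the map function over every doc and return rows sorted by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


docs = [
    {"_id": "1", "type": "animal", "name": "zebra", "weight": 300},
    {"_id": "2", "type": "plant", "name": "fern"},
    {"_id": "3", "type": "animal", "name": "ant", "weight": 0.001},
]

names = [key for key, _ in query_view(docs, animal_by_name)]
print(names)
```

The plant doc lives in the same database but never appears in the result, which is how one database can hold several "tables".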
Sometimes a multiple-database plan is a good option, like a database per user or even a database per user feature. Take a look at this conversation on the CouchDB mailing list.

How to arrange my Data in NoSQL (Invoices)

I'm taking my first steps with NoSQL databases, and so far my knowledge is very basic. I am trying to set up a database for a small invoice system.
In SQL I'd create 4 tables: Products, Customers, Invoices, and a join table linking invoices to products.
But how do I do this with NoSQL? Do I even build relations, or do I just build 1 document for each invoice?
Keep in mind that NoSQL design is based not only on the structure of the data but also, strongly, on how the data is used. So you should first ask yourself what kind of queries you need to do over your data and take it from there.
First figure out how far you want to go with denormalization and aggregation. For instance: which sets of data will you often need to query or update together? Try to keep those in a single document, even if it means duplicating data from other entities (e.g. storing customer data along with the invoice data).
So ask yourself why you want to use a non-relational database and how you will use the data. Then decide which modeling techniques to apply and how far to take them. The Highly Scalable Blog has a great article about NoSQL data modeling if you care to give it a read.
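The denormalization advice above can be sketched in Python as a single invoice document that carries copies of the customer and product data it needs. All names, ids and prices are made up for illustration, and build_invoice is a hypothetical helper, not part of any database API.

```python
import json

customer = {"id": "cust-1", "name": "Ada Example", "city": "Berlin"}
products = {
    "prod-1": {"name": "Widget", "unit_price": 2.50},
    "prod-2": {"name": "Gadget", "unit_price": 10.00},
}


def build_invoice(invoice_id, customer, order):
    """Copy customer and product data into one self-contained invoice document."""
    lines = []
    for product_id, quantity in order:
        product = products[product_id]
        lines.append({
            "product_id": product_id,
            "name": product["name"],
            "unit_price": product["unit_price"],
            "quantity": quantity,
        })
    total = sum(line["unit_price"] * line["quantity"] for line in lines)
    return {
        "_id": invoice_id,
        "type": "invoice",
        "customer": dict(customer),  # duplicated on purpose
        "lines": lines,
        "total": total,
    }


invoice = build_invoice("invoice-1", customer, [("prod-1", 4), ("prod-2", 1)])
print(json.dumps(invoice, indent=2))
```

Reading the invoice now takes one lookup with no joins. The trade-off is that a later change to the customer record does not reach invoices already written, which for invoices is often what you want anyway.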
... or just build 1 document for each invoice.
Yes, do that to begin with. Think of your data in CouchDB as a read-only copy of your data in the relational database. The docs are like the results of your SQL queries.
Do i even build relations?
Of course you can; it's the same as in your SQL tables. You include the ids of foreign docs and name the property after the relation you want to express, e.g. doc.customer_id in an invoice doc can point to the doc._id of a customer doc.
It helps to think of CouchDB views as "relations", e.g. you can create a view called InvoicesByCustomer for the example above.
In summary, I would recommend beginning with the 1-document-per-invoice approach and following @JavoSN's hint ...
So you should first ask yourself what kind of queries you need to do over your data and take it from there
... when you know that clearly, it's time to dig deeper into your options for document design.
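The customer_id reference and the InvoicesByCustomer view mentioned in this answer can be sketched in plain Python. This is an in-memory illustration rather than CouchDB code, and the sample documents and the function invoices_by_customer are hypothetical.

```python
from collections import defaultdict

docs = [
    {"_id": "cust-1", "type": "customer", "name": "Ada"},
    {"_id": "cust-2", "type": "customer", "name": "Grace"},
    {"_id": "inv-1", "type": "invoice", "customer_id": "cust-1", "total": 20.0},
    {"_id": "inv-2", "type": "invoice", "customer_id": "cust-2", "total": 5.0},
    {"_id": "inv-3", "type": "invoice", "customer_id": "cust-1", "total": 7.5},
]


def invoices_by_customer(docs):
    """Group invoice ids under the customer id they point to."""
    index = defaultdict(list)
    for doc in docs:
        if doc.get("type") == "invoice":
            index[doc["customer_id"]].append(doc["_id"])
    return dict(index)


view = invoices_by_customer(docs)
print(view["cust-1"])
```

The relation lives only in the customer_id property; the grouped index plays the role a join would play in SQL.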

Is Cassandra's thrift interface aware of compound keys table CQL3?

Is there a way for Cassandra's Thrift interface to know in advance whether a particular client query will use a table defined with a compound key (CQL3)? How can you find out what the schema of the table is?
Cassandra stores the schema information in some system tables. You could query those to get the schema information that indicates that the rows have a compound primary key.
But you might want to reconsider why you want to do this. Your application program should know the schema of the tables it manipulates; it should already know what tables it uses and what their primary key is.
Check out this question and the answer for details on how to determine the schema from system tables.
Anyway, as Raedwald already said, you should probably think twice about why you'd want to do this.

Design approach cassandra for Rowkey and already exists check

I am new to Cassandra and my cluster is giving a lot of read timeout errors. I tweaked the timeout but the problem remains, so it may be a problem with my design (for my application, Cassandra is expected to store trillions of records):
Question 1: In all my Cassandra tables I use a UUID as the row key, but for a few tables I break that rule just for maintainability. For example, in the user table I use the email id as the row key, so that I can understand the stored data by looking at the table. Is using a UUID the right approach at this scale, and is the second approach for the user table right or not?
Question 2: I have a relations table with startNodeId, relationTypeId and endNodeId. Its row key is a UUID, the relationId. I define secondary indexes on startNode, relationType and endNode, because the business case requires lookups by any of them. Because of that, for each new row I have to do a get to check whether the relation already exists. One approach to avoid the existence check is to take startNodeId, relationTypeId and endNodeId, sort them, create a hash code and use that as the row key, so the explicit check is avoided. Is this the right approach?
Please guide me, I am stuck on these thoughts. Any guidance will really help me.
To answer your first question: if you are comfortable handling row keys with non-UUID values, that is great and also easier to track; otherwise go for the UUID.
Regarding your second question, why don't you try a compound key? You don't have to maintain hash codes and the like; leave that to Cassandra.
1) Better to use natural keys rather than UUIDs: email, timestamp, composite primary keys, and so on. Using UUIDs is an approach from the RDBMS world; you should avoid it in Cassandra.
2) Read-modify-update is the wrong pattern for Cassandra. Try rewriting the data if your business case allows it. Or just use a timestamp and get the row with the latest timestamp (don't forget about TTL).
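The compound-key suggestion can be sketched in plain Python, with a dictionary standing in for a table whose key is the triple of ids from the question. This is an in-memory illustration under that assumption, not Cassandra code, and upsert_relation is a hypothetical helper. Writing the same key twice overwrites the row, so no lookup is needed before the write. Keeping the components in their original order rather than sorting them also preserves the direction of the relation.

```python
relations = {}  # (start_node_id, relation_type_id, end_node_id) -> row


def upsert_relation(start_node_id, relation_type_id, end_node_id, **attrs):
    """Write a relation; writing the same key again overwrites, never duplicates."""
    key = (start_node_id, relation_type_id, end_node_id)
    relations[key] = {"attrs": attrs}
    return key


upsert_relation("n1", "FOLLOWS", "n2", since=2013)
upsert_relation("n1", "FOLLOWS", "n2", since=2014)  # same key, no prior read needed
upsert_relation("n2", "FOLLOWS", "n1", since=2014)  # direction matters: a new row

print(len(relations))
```

This matches the second answer's point about rewriting data instead of reading before writing: the last write for a given key wins.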

Resources