Deleting 'foreign key' relationships in CosmosDB? - azure

I'm using CosmosDB as a document database. For some things I have to store a 'foreign key' link, and the way I've been doing it is as a string property on a document pointing at the ID of the 'foreign key' document.
If the foreign document is deleted, what would be the most efficient way of finding all foreign key links and ensuring they are removed?
I'd like it to be as automated as possible so I don't have to 'think' about it too hard in a project. My best idea so far is to store foreign key links in a well-defined structure like:
{
"foreignId": "crazy_person",
"foreignType": "person"
}
And store that structure anywhere, in any shape or form, across various documents. Then, when "crazy_person" is deleted, I would find all documents that contain that structure and remove every instance whose foreignId/foreignType matches crazy_person.
I'm not sure how I would implement this, though, as the structure above could appear anywhere in a given document, like so:
{
"foreignPerson": {
"foreignId": "crazy_person",
"foreignType": "person"
},
"foreignPeople": [
{
"foreignId": "crazy_person",
"foreignType": "person"
}
]
}

There is NO Foreign Key concept in CosmosDB. According to the docs, any inter-document relationships that you have in documents are effectively "weak links" and will not be verified by the database itself. If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
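As an illustration of doing it "in your application", here is a minimal sketch of the clean-up step the question describes: given a document that has already been loaded, remove every {foreignId, foreignType} structure that points at the deleted document, wherever it sits. The function names are hypothetical and no Cosmos DB SDK call is used or implied; loading the candidate documents and writing them back is left out.

```javascript
// Hypothetical helper (not part of any Cosmos DB SDK): returns a copy of
// `value` with every {foreignId, foreignType} link to the deleted document
// removed, wherever it appears. The input is not modified.
function isMatchingLink(value, foreignId, foreignType) {
  return value !== null && typeof value === "object" && !Array.isArray(value) &&
    value.foreignId === foreignId && value.foreignType === foreignType;
}

function removeForeignLinks(value, foreignId, foreignType) {
  if (Array.isArray(value)) {
    // Drop matching links from arrays, then recurse into what is left.
    return value
      .filter((item) => !isMatchingLink(item, foreignId, foreignType))
      .map((item) => removeForeignLinks(item, foreignId, foreignType));
  }
  if (value !== null && typeof value === "object") {
    const result = {};
    for (const [key, child] of Object.entries(value)) {
      // Drop properties whose value is a matching link; recurse otherwise.
      if (!isMatchingLink(child, foreignId, foreignType)) {
        result[key] = removeForeignLinks(child, foreignId, foreignType);
      }
    }
    return result;
  }
  return value; // strings, numbers, booleans and null are kept as-is
}
```

Applied to the example document from the question with ("crazy_person", "person"), this leaves {"foreignPeople": []}. Because the links can sit at any depth, a query on a fixed property path may not find every referring document, so keeping links at predictable paths makes the "find" half of the job much easier.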

Related

Can I create multiple collections per database?

Switching from MongoDB to PouchDB (with Cloudant), I like the "one database per user" concept, but is there a way to create multiple collections/tables per database?
Example:
- Peter
  - History
  - Settings
  - Friends
- John
  - History
  - Settings
  - Friends
etc.
CouchDB does not have the concept of collections. However, you can achieve similar results using type identifiers on your documents in conjunction with CouchDB views.
Type Identifiers
When you save a document in CouchDB, add a field that specifies the type. For example, you would store a friend like so:
{
_id: "XXXX",
type: "Friend",
first_name: "John",
...
}
And you would store history like this:
{
_id: "XXXX",
type: "History",
url: "http://www.google.com",
...
}
Both of these documents would be in the same database, and if you queried all documents on that database then you would receive both.
Views
You can create views that filter on type and then query those views directly. For example, create a view to retrieve friends like so (in Cloudant you can go to "add new Design Document" and copy and paste this directly):
{
"_id" : "_design/friends",
"views" : {
"all" : {
"map" : "function(doc){ if (doc.type && doc.type == 'Friend') { emit(doc._id, doc._rev)}}"
}
}
}
Let's expand the map function:
function(doc) {
if (doc.type && doc.type == "Friend") {
emit(doc._id, doc._rev);
}
}
Essentially, this map function says to include in this view only documents that have type == "Friend". Now we can query this view, and only friends will be returned:
http://SERVER/DATABASE/_design/friends/_view/all
Here friends is the name of the design document and all is the name of the view. Replace SERVER with your server and DATABASE with your database name.
You can find more information about views here:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
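To make "only associate documents to this view that have type == Friend" concrete, here is a toy emulation of what the view indexer does with the map function from the design document: call it once per document and collect the emitted rows. runMap is a hypothetical helper for illustration, not CouchDB's actual view server.

```javascript
// The map function exactly as stored in the "_design/friends" document.
const friendsMap =
  "function(doc){ if (doc.type && doc.type == 'Friend') { emit(doc._id, doc._rev)}}";

// Toy stand-in for CouchDB's view server, for illustration only: it runs the
// map function once per document and collects the rows passed to emit().
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key: key, value: value });
  // Compile the map source with `emit` in scope.
  const mapFn = new Function("emit", "return (" + mapSource + ");")(emit);
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  // CouchDB returns rows sorted by key; plain string comparison is a
  // simplification of CouchDB's real collation rules.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}
```

Fed one "Friend" document and one "History" document, runMap returns a single row for the friend; the History document never reaches the view, even though both live in the same database.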
You could look into relational-pouch for something like this. Otherwise, you could do "3 databases per user." ;)
I may not fully understand what you need here, but in general you can achieve what you describe in 3 different ways in CouchDB/Cloudant/PouchDB.
1) Single document per person (Peter, John). This works if the collections are not enormous and, more importantly, if they are not updated by different users concurrently (or, worse, in different database instances), which would lead to conflicts. In the JSON you just have an element for each collection holding an array, and you can manipulate everything with just one document. This makes access a breeze.
2) Single document per collection (Peter History, Peter Settings, etc.). Similar constraints apply, but you create a document to hold each of these collections. Provided they are not often modified concurrently, you would then have one document for Peter's History and another for Peter's Settings.
3) Single document per item. This is the finest-grained approach: lots of small, simple documents, each containing one element (say, a single History entry for Peter). The code gets slightly simpler because removing an item becomes a delete, and many clients can update items simultaneously, but now you depend on views to bring all the items into a list. A view with keys [person, listName, item], for example, would let you access what you want.
Generally, your data schema decisions come down to concurrency. You mention PouchDB, so it may be that you have a single-threaded client, in which case option 1 is nice and easy?
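A minimal sketch of option 3, assuming hypothetical per-item documents shaped like { _id, person: "Peter", list: "History", ... }. The map function emits the composite key [person, listName, item id], and the helper builds the query string for one person's list. It relies on CouchDB's collation, in which an object ({}) sorts after strings, so [person, list, {}] is an upper bound for every key in that list.

```javascript
// Map function for the view (stored in a design document). emit() is
// supplied by CouchDB's view server; the field names are hypothetical.
function itemsMap(doc) {
  if (doc.person && doc.list) {
    // Composite key: all items of one person's list sort next to each other.
    emit([doc.person, doc.list, doc._id], null);
  }
}

// Builds the query string for "every item in one person's list".
// View parameters are JSON values, so each is JSON-encoded, then URL-encoded.
function listRangeQuery(person, list) {
  const params = {
    startkey: [person, list],
    endkey: [person, list, {}], // {} sorts after any string id
    include_docs: true,         // return the item documents themselves
  };
  return Object.entries(params)
    .map(([name, value]) => name + "=" + encodeURIComponent(JSON.stringify(value)))
    .join("&");
}
```

Appending listRangeQuery("Peter", "History") to the view URL would return Peter's History items; changing the second argument selects Settings or Friends instead.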

Basic CouchDB Queries

I've never worked with a database before, but I chose CouchDB because I needed a JSON database, and HTTP queries seemed kinda simple. However, the documentation assumes a level of knowledge I just don't have.
Assuming I have a database called 'subjects', it seems I can access the json by using GET on
http://localhost:5984/subjects/c6604f65029f1a6a5d565da029001f4c
However beyond that I'm stuck. Ideally I want to be able to:
Access a list of all the keys in the database (not their values)
Access an individual element by its key
Do I need to use views for this? Or can I just set fields in my GET request? Can someone give me a complete example of the request they'd use? Please don't link to the CouchDB documentation, it really hasn't helped me so far.
Views can be used to fetch the data.
1) To get all keys from the database, you can use the view below (as written, it lists the ids of all documents whose type is "article"):
function(doc) {
if (doc.type == "article")
emit(doc._id, null); // emit(key, value); to use another field as the key, emit doc.<fieldname> instead
}
You can access this view from a browser using the URL below:
http://<ipaddress>:<port>/databasename/_design/designdocumentname/_view/viewname
For example:
http://<ipaddress>:<port>/article/_design/articlelist/_view/articlelist
Here article is the database name, and articlelist is the name of both the design document and the view.
2) To access documents by key
For example, the view below returns all the articles belonging to a particular department:
function(doc) {
if(doc.type == 'article' ) {
emit([doc.departmentname], doc);
}
}
Query this view by department name.
For example, to get all the articles belonging to the "IBU3" department:
http://<ipaddress>:<port>/department/_design/categoryname/_view/categoryname?key=[%22IBU3%22]
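A small sketch for building these URLs, including the two built-in endpoints that cover the question directly without any view: GET /db/_all_docs lists every document id (the row value holds only the revision unless include_docs=true is added), and GET /db/docid fetches one document. The helper names are hypothetical; the server, database, design document and view names are the placeholders used above. View parameters such as key must be JSON, which is why ["IBU3"] appears as %5B%22IBU3%22%5D once encoded.

```javascript
// Builds a view URL; params are JSON-encoded and then URL-encoded, because
// CouchDB expects key/startkey/endkey values to be JSON.
function viewUrl(server, db, ddoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => name + "=" + encodeURIComponent(JSON.stringify(value)))
    .join("&");
  const base = `${server}/${encodeURIComponent(db)}/_design/` +
    `${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(view)}`;
  return query ? base + "?" + query : base;
}

// Built-in endpoint: one row per document (id, key and current revision).
function allDocsUrl(server, db) {
  return `${server}/${encodeURIComponent(db)}/_all_docs`;
}

// Built-in endpoint: a single document by its _id.
function docUrl(server, db, id) {
  return `${server}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
}
```

For instance, viewUrl("http://localhost:5984", "department", "categoryname", "categoryname", { key: ["IBU3"] }) produces the department query above, and docUrl("http://localhost:5984", "subjects", "c6604f65029f1a6a5d565da029001f4c") produces the URL from the question.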

Is a type property the correct way to store different data entities in CouchDB?

I'm trying to wrap my head around CouchDB. I'm trying to switch from MongoDB to CouchDB because I find the concept of views more appealing. In CouchDB it looks like all records are stored in a single database; there is no concept of collections, like in MongoDB. So, when storing different data entities such as users, blog posts, comments, etc., how do you differentiate between them within your map/reduce functions? I was thinking about just using some sort of type property and, for each item, making sure to always specify the type. This line of thought was reinforced when I read the CouchDB cookbook website, where an example does the same thing.
Is this the most reliable way of doing this, or is there a better method? I was thinking of alternatives, and I think the only other one is to embed as much as I can into logical documents. For example, the top-level records in the database would all be User records, and each User would have an array of Posts to which you add all of that user's Posts. The downside here would be that embedded documents wouldn't get their own id properties, correct?
Using type is convenient and fast when creating views. Alternatively, you can identify the entity by a part of the JSON document, i.e. instead of defining:
{
type: "user",
firstname: "John",
lastname: "Smith"
}
You would have:
{
user: {
firstname: "John",
lastname: "Smith"
}
}
And then in the view for emitting documents containing user information, instead of using:
function (doc) {
if (doc.type === "user") emit(null, doc);
}
You would write:
function (doc) {
if (doc.user) emit(null, doc);
}
As you can see, there is not much difference. As you have already realized, the first approach is the most widely used, but the second (afaik) is also well accepted.
Regarding storing all Posts of one User in a single document: it depends on how you plan to update your document. Remember that you need to write the whole document each time you update it (unless you use attachments). That means that each time a user writes a new Post, you need to retrieve the document containing the array of Posts, add or modify one element, and write the document back. That is probably too heavy.
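To show why the embedded-array design is heavy, here is a deliberately simplified in-memory model of CouchDB's update rule (not the real storage engine): every write must carry the document's current _rev and replaces the whole document. TinyStore, addPost and the "-sim" revision strings are hypothetical.

```javascript
// Simplified, in-memory model: a write must carry the current _rev
// and always stores the entire document again.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }
  get(id) {
    const doc = this.docs.get(id);
    return doc ? JSON.parse(JSON.stringify(doc)) : undefined; // hand out a copy
  }
  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) {
      throw new Error("conflict"); // someone else wrote a newer revision first
    }
    const n = current ? parseInt(current._rev, 10) + 1 : 1;
    const saved = JSON.parse(JSON.stringify({ ...doc, _rev: n + "-sim" }));
    this.docs.set(doc._id, saved);
    return saved._rev;
  }
}

// Embedded design: adding one post means reading the whole user document,
// appending to its array and writing the whole document back.
function addPost(store, userId, post) {
  const user = store.get(userId);
  user.posts = (user.posts || []).concat([post]);
  return store.put(user);
}
```

Any client still holding the older revision now gets a conflict on its next write and must re-read the user and retry, which is the concurrency cost that separate per-Post documents avoid.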

CouchDB - Parameter and views - What goes on behind the scenes, and is it fast/faster than temporary views?

Considering these three documents...
[
{
_id: "...",
_rev: "...",
title: "Foo",
body: "..."
},
{
_id: "...",
_rev: "...",
title: "Bar",
body: "..."
},
{
_id: "...",
_rev: "...",
title: "Hello World!",
body: "..."
},
]
And this view...
byTitle: {
map: function (document)
{
emit(document.title, document);
}
}
What goes on behind the scenes, when I query the view?...
GET /database/_design/posts/_view/byTitle?key="Foo"
I've asked a few questions on views lately... questions about what I phrased as "dynamic parameters"... Essentially I wanted to know how to do the equivalent of SELECT ... WHERE field = parameter
All answers steered me towards using temporary views, which are really slow and should not be used in production. So my second question is: is the above method for querying by title fit for use in production? Or am I forcing CouchDB to do unspeakable horrors, performance-wise? Am I essentially doing the same as using a temporary view?
I think you have misinterpreted some of the answers. You can use a temporary view to test various map/reduce functions. When you are satisfied with the code, you should put it into a design document and use it for querying.
Temporary views are slow because the index is built and deleted for every query. Putting the view into a design document tells CouchDB not to delete the index and to keep it updated (this is done at query time).
So
GET /database/_design/posts/_view/byTitle?key="Foo"
is the fastest way to query by title because it is indexed.
As a side note: you can use
byTitle: {
map: function (document)
{
emit(document.title, null);
}
}
and query with include_docs=true to save some disk space.
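Here is a simplified model of why the permanent byTitle view is fast and what include_docs=true does: the index is built once, kept sorted by key (a sorted array stands in for CouchDB's B+ tree), and a key lookup is a binary search rather than a pass over every document. With emit(document.title, null) the rows hold only key and id; include_docs then fetches each full document by id. All names here are hypothetical.

```javascript
// Index rows as produced by emit(document.title, null): key + doc id only.
function buildIndex(docs) {
  return docs
    .map((doc) => ({ key: doc.title, id: doc._id, value: null }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search for the first row whose key is >= key (O(log n)).
function lowerBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Rough equivalent of ?key="..."&include_docs=true.
function queryByKey(index, docsById, key, includeDocs) {
  const rows = [];
  for (let i = lowerBound(index, key); i < index.length && index[i].key === key; i++) {
    const row = { ...index[i] };
    if (includeDocs) row.doc = docsById.get(row.id); // look up the full document
    rows.push(row);
  }
  return rows;
}
```

A temporary view would, in effect, redo buildIndex on every request, which is the cost the permanent view avoids.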
For answering your question, a few things have to be cleared out (and I hope I get it all right):
Permanent vs. temporary views:
The difference between permanent and temporary views is that permanent views are stored permanently.
To understand the storing part, you need to know that CouchDB's storage engine relies on a B+ tree, which offers very powerful indexing capabilities that let us find data in that storage by key in "logarithmic amortized time" (CouchDB book).
CouchDB handles documents in an "append only" manner. That means it is not like most relational DBMSs, where single values within a table row get updated and locking occurs. If a document is updated, it simply gets a new revision (_rev) and is appended to the storage.
When you create a permanent view, the first time you query it the view is executed for each document in your database, and the results are stored in a new B+ tree file for that view, providing a new index that aggregates the data according to the key you defined in your view.
When documents handled by that view are updated, the whole permanent view does not need to be recomputed; only the updated documents are reprocessed.
Now you should be able to understand why temporary views are nice for developing or testing in Futon, but, since they have to be recomputed for all your documents on every query, they are not recommended for anything other than development.
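The incremental update described above can be sketched as follows. This is a simplified model, not CouchDB's implementation: the view remembers which rows each document produced, so a changed document only has its own rows replaced. CouchDB applies such changes lazily, when the view is next queried, by reading the changes made since the index was last updated; this sketch applies them one at a time. The map function takes emit as a second argument here, unlike real CouchDB map functions.

```javascript
// Simplified incremental view: rows are tracked per document id, so an
// update re-runs the map function for one document, not the whole database.
class IncrementalView {
  constructor(mapFn) {
    this.mapFn = mapFn;         // (doc, emit) => void, simplified signature
    this.rowsByDoc = new Map(); // doc id -> rows emitted for that doc
    this.mapCalls = 0;          // counts map invocations, for illustration
  }
  update(doc) {
    this.rowsByDoc.delete(doc._id); // forget the document's old rows
    if (doc._deleted) return;       // deleted documents emit nothing
    const rows = [];
    this.mapCalls++;
    this.mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
    this.rowsByDoc.set(doc._id, rows);
  }
  rows() {
    return [...this.rowsByDoc.values()].flat()
      .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}
```

With the byTitle map function, renaming one post costs one map call, whereas a temporary view would run the map over every document again.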
Anyway, Marcello is right. If you just intend to pass back complete documents, you are encouraged to emit null as the value and query with "include_docs=true". Why? Because if you emit the whole document, the B-tree for your permanent view has to store a copy of that data next to each indexing key; with include_docs=true it only stores the key and the document id, and the documents are looked up at query time.
@Marcello-Nuccio I am not sure, though, whether it is correct to say that dynamic views have no index. As I understood it, they do have one, but it makes no sense to keep it, as it is recomputed on every query? OK, now my brain is hurting!

Does CouchDB supports referential integrity?

I am new to CouchDB and learning about it. I did not come across CouchDB support for referential integrity.
Can we create a foreign key for a field in the CouchDB document?
For example, is it possible to ensure that a vendor name used in an order document exists in the vendor database?
Does CouchDB support referential integrity?
And is it possible to make a field in a document the primary key?
No, CouchDB doesn't do foreign keys as such, so you can't have it handle the referential integrity of the system for you. You would need to handle checking for vendors at the application level.
As to whether you can make a field a primary key: the primary key is the _id field, but you can use any valid JSON as a key for the views on the db. So, for instance, you could create a view of orders with their vendor as the key.
Something like:
function(doc) {
if (doc.type == 'order')
emit(doc.vendor,doc);
}
would grab all the docs in the database that have a type attribute with the value order and add them to a view using their vendor as the key.
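Since the check has to live at the application level, here is a minimal sketch of it. The document shapes are assumptions: vendors as { type: "vendor", name } and orders as { type: "order", vendor: <vendor name> }, matching the view above. One function checks an order before saving; the other finds existing orders whose vendor no longer exists.

```javascript
// Hypothetical document shapes: { type: "vendor", name } and
// { type: "order", vendor: <vendor name> }. CouchDB does not enforce the link.
function vendorNames(docs) {
  return new Set(docs.filter((d) => d.type === "vendor").map((d) => d.name));
}

// Call before saving a new or changed order.
function assertVendorExists(order, names) {
  if (order.type === "order" && !names.has(order.vendor)) {
    throw new Error("unknown vendor: " + order.vendor);
  }
}

// Periodic clean-up: orders whose vendor has been deleted ("dangling" links).
function danglingOrders(docs) {
  const names = vendorNames(docs);
  return docs.filter((d) => d.type === "order" && !names.has(d.vendor));
}
```

Note that check-then-save is not atomic: another client can delete the vendor between the check and the write, which is why a periodic scan like danglingOrders is still useful.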
Intro to CouchDB views
These questions are incredibly relational database specific.
In CouchDB, or any other non-RDBMS, you wouldn't store your data the same way you would in an RDBMS, so designing the relationship this way may not be best. But, just to give you an idea of how you could do this, let's assume you have a document for a vendor and a bunch of documents for orders that need to "relate" back to the vendor document.
There are no primary keys as such; documents have an _id, which is typically a UUID. If you have a document for a vendor and you're creating a new document for something like an order, you can reference the vendor document's _id.
{"type":"order","vendor-id":"asd7d7f6ds76f7d7s"}
To look up all orders for a particular vendor you would have a map view something like:
function(doc) { if (doc.type == 'order') {emit(doc['vendor-id'], doc)}}
The document _id will not change, so there is "integrity" there, even if you change other attributes on the vendor document, like its name or billing information. If you stick the vendor name or other attributes from the vendor document directly into the order document, you would need to write a script if you ever wanted to change them in bulk.
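A sketch of such a bulk-change script, assuming the orders copied the vendor's name into a hypothetical "vendor-name" field. It builds the request body for POST /{db}/_bulk_docs, which updates many documents in one request; each document must keep its _id and current _rev, or CouchDB rejects that update as a conflict. Fetching the orders (e.g. via the vendor-id view above) is not shown.

```javascript
// Hypothetical: orders carry a copied "vendor-name" next to "vendor-id".
// Returns the body for POST /{db}/_bulk_docs; the input is not modified.
function renameVendorInOrders(orders, vendorId, newName) {
  const docs = orders
    .filter((o) => o.type === "order" &&
                   o["vendor-id"] === vendorId &&
                   o["vendor-name"] !== newName)   // skip already-updated orders
    .map((o) => ({ ...o, "vendor-name": newName })); // keeps _id and _rev
  return { docs: docs };
}
```

Each document in the batch can succeed or fail (for example with a conflict) independently, so the per-document results in the response should be checked and failed updates retried.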
Hope that helps a bit.
While it is not possible to create an FK constraint, you can get part of the way there using CouchDB's validate function (validate_doc_update):
function(newDoc, oldDoc, userCtx, secObj) {
if (newDoc && newDoc.type) switch (newDoc.type) {
case 'fish':
var allSpecies = ['trout', 'goldfish'];
// JavaScript arrays have no contains() method; indexOf returns -1 when the value is missing
if (allSpecies.indexOf(newDoc.species) === -1) {
throw({forbidden: 'fish must be of a known species'});
}
break;
case 'mammals':
if (['M', 'F'].indexOf(newDoc.sex) === -1) {
throw({forbidden: 'mammals must have their sex listed'});
}
break;
}
}
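Now, if a person were really clever (I'm not), they might call out to the DB itself for the list of species... that would be a foreign key. (Note, though, that a validate_doc_update function only receives the new document, the old document, the user context and the security object; it cannot read other documents, so such a lookup would have to happen outside the validation function.)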
You may also want to read up on:
How do I DRY up my CouchDB views?
