CouchDB and Couchbase Document Keys - couchdb

In reference material for CouchDB and Couchbase, the common guidance is to store the type of a document as a property within the document itself.
I've got a database with different documents that record certain behaviour by URL, so naturally I use the URL as the id of the document.
The problem is that by using just the URL as the document id, I get clashes between documents of different types. So I have started using the type as the first part of the key, like this:
{ doc._id: "rss_entry|http://www.spiegel.de/1234", [...] }
{ doc._id: "page_text|http://www.spiegel.de/1234", [...] }
Now I wonder why I've never seen this approach to modelling the type in any of the documentation.

Prefixes are commonly used. In addition to support for scenarios such as yours, prefixing allows one to perform logical range queries against views. There is use of this technique in the modeling examples, but perhaps the concept is not described in as much detail as you are expecting. In the section http://docs.couchbase.com/couchbase-devguide-2.5/#modeling-documents, the documents are keyed as beer_NNNN and brewery_NNNN. Also, the section http://docs.couchbase.com/couchbase-devguide-2.5/#using-reference-documents-for-lookups goes a bit deeper into this technique. There is a counter document named user::count and then each user is keyed as user::NNNN. Additionally, there are documents in the example that are keyed as fb::NNNN for a Facebook ID, email::XXX#YYYY.com for a user's email address, etc.
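As a rough illustration of the prefix idea, the sketch below builds prefixed ids and derives the bounds you might pass as `startkey` and `endkey` to select one type. The helper names `makeDocId` and `prefixRange` are made up for this example, and plain JavaScript string comparison only stands in for the database's own collation.

```javascript
// Hypothetical helper: compose a document id from a type prefix and a natural key.
// The "|" separator mirrors the question; "::" or "_" would work the same way.
function makeDocId(type, naturalKey) {
  return type + "|" + naturalKey;
}

// Hypothetical helper: bounds covering every id that starts with one type prefix.
// "\ufff0" is a high code point often used as an upper sentinel in key ranges.
function prefixRange(type) {
  const prefix = type + "|";
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}

const url = "http://www.spiegel.de/1234";
const ids = [makeDocId("rss_entry", url), makeDocId("page_text", url)];

// Simulate the range query in memory (assumption: simple code-unit ordering).
const range = prefixRange("rss_entry");
const rssOnly = ids.filter((id) => id >= range.startkey && id <= range.endkey);
console.log(rssOnly);
```

The same URL now yields two distinct ids, and the range picks out only the `rss_entry` documents.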

Related

CouchDB check if a document exists in a validation function

I would like to see if a document exists in the database that has the name field "name" set to "a name" before allowing a new document to be added to the database.
Is this possible in CouchDB using update handlers (inside design documents)?
It seems you are looking for a unique constraint in CouchDB. The only unique constraint supported by CouchDB is based on the document ID.
You should include the value of your "name" attribute in the document ID if you want document uniqueness to be based on it.
Validate document update functions defined in design documents can only use the data of the document being created, updated, or deleted; they cannot use data from other documents in the database.
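A minimal sketch of that combination, assuming a made-up id convention of "name:" followed by the name value: the client derives the id from the name, and a validate function (which sees only the incoming document) rejects any document whose id does not follow the convention. CouchDB itself then rejects a second document with the same id as a conflict.

```javascript
// Hypothetical client-side helper: the unique "name" value becomes part of the id.
function idForName(name) {
  return "name:" + name;
}

// Shape of a validate_doc_update function. It receives only the incoming document,
// the previously stored revision (or null) and request context; no other documents.
// The id convention is inlined because a design-document function must be self-contained.
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) {
    return; // allow deletions
  }
  if (typeof newDoc.name !== "string" || newDoc.name === "") {
    throw { forbidden: "Document must have a non-empty name" };
  }
  if (newDoc._id !== "name:" + newDoc.name) {
    throw { forbidden: "Document _id must be derived from its name" };
  }
}

// Local check outside CouchDB: a well-formed document passes without throwing.
validateDocUpdate({ _id: idForName("a name"), name: "a name" }, null, {}, {});
```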
You can find a similar question here.
This is not widely known, but the _update endpoint is allowed to return a doc whose _id property differs from the requested one. In your case, this means you need a unique document, say _id:"doc-name", which will serve as a constraint.
Then you call something like POST _design/whatever/_update/saveDependentDoc/doc-name, providing the new doc with a different _id as the request body.
Your _update function will effectively receive two docs as input (or null and the new doc if the constraint doc is missing). The function then decides what to do: return the received doc to persist it, or return nothing.
This solution isn't a full answer to your question, but it might be helpful in some cases.
Of course, this trick only works for updating existing docs if you know the revision.
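One possible reading of this trick, sketched as an update function. The decision rule here is an assumption: an existing constraint doc is taken to mean the name is already in use, so the dependent doc is rejected; how the constraint doc itself gets created is left open, as in the answer. Returning `[null, response]` is how an update function signals that nothing should be saved.

```javascript
// Sketch of an update function, e.g. stored as "saveDependentDoc" in a design document.
// doc: the document addressed in the URL (the constraint doc), or null if it is missing.
// req.body: the raw request body, assumed to be the JSON of the dependent doc.
function saveDependentDoc(doc, req) {
  if (doc !== null) {
    // Assumption: constraint doc present means the name is taken, so save nothing.
    return [null, { code: 409, body: "name already in use" }];
  }
  var newDoc = JSON.parse(req.body);
  // The returned doc carries its own _id, which may differ from the one in the URL.
  return [newDoc, { code: 201, body: "saved " + newDoc._id }];
}

// Local check outside CouchDB with a hand-built request object.
var result = saveDependentDoc(null, { body: '{"_id":"post-42","name":"a name"}' });
console.log(result[1].body);
```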

DDD/CQRS: Combining read models for UI requirements

Let's use the classic example of a blog context. In our domain we have the following scenarios: Users can write Posts. Posts must be cataloged in at least one Category. Posts can be described using Tags. Users can comment on Posts.
The four entities (Post, Category, Tag, Comment) are implemented as different aggregates because I have not detected any rule by which one entity's data should affect another. So, for each aggregate I will have one repository that represents it. Also, each aggregate references the others by their ids.
Following CQRS, from this scenario I have derived typical use cases that result in commands such as WriteNewPostCommand, PublishPostCommand, DeletePostCommand, etc., along with their respective queries to get data from repositories: FindPostByIdQuery, FindTagByTagNameQuery, FindPostsByAuthorIdQuery, etc.
Depending on which side of the app we are on (backend or frontend), we will have more or less complex queries. On the front page, we may need to build some widgets to get the last comments, the latest posts of a category, etc. These are queries that involve a simple Query object (few search criteria) and a very simple QueryHandler (a single repository as a dependency of the handler class).
But in other places these queries can be more complex. In an admin panel we need to show in a table the results that satisfy complex search criteria. It might be useful to search posts by author name (not id), category names, tag names, publish date... These criteria belong to different aggregates and different repositories.
In addition, in our table of posts we don't want to show the post along with the author ID or category IDs. We need to show all the information (user name, avatar, category name, category icon, etc.).
My questions are:
At the infrastructure layer, when we design repositories, should the search methods (findAll, findById, findByCriterias...) return the corresponding entity referencing all of its associations by id? I mean, if I have a method findPostById(uuid) or findPostByCustomFilter(filter), should it return a post instance with a reference to all the category ids, tag ids, and the author id that it has? Or should my repo have some kind of method that populates a given post instance with the associations I want?
If I want to search posts created from 12/12/2014, written by John, categorised under the "News" and "Videos" categories and tagged "sci-fi" and "adventure", and get the full details of each aggregate, how should I create my Query and QueryHandler?
a) Create a Query with all my parameters (authorName, categoriesNames, TagsNames, and whether I want to retrieve the User, Category, and Tag associations in full detail), and then its QueryHandler assembles the different read models into a single one. Or...
b) Create different Queries (FindCategoryByName, FindTagByName, FindUserByName) that my web controller calls first, and then call FindPostQuery, passing it the authorid, categoryid, and tagid returned from the other queries?
Solution b) appears cleaner, but it seems more expensive to me.
On the query side, there are no entities. You are free to populate your read models in any way that suits your requirements best. Whatever data you need to display on (a part of) the screen, you put it in the read model. It's not the command side repositories that return these read models but specialized query side data access objects.
You mentioned "complex search criteria" -- I recommend you model it with a corresponding SearchCriteria object. This object would be technology agnostic, but it would be passed to your query side data access object, which would know how to combine the criteria to build a lower level query for the specific data store it's targeted at.
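A minimal sketch of such a criteria object and one data access object that understands it. All names (`PostSearchCriteria`, `InMemoryPostQueryDao`, the row fields) are invented for illustration, the store is just an in-memory array of denormalized rows, and requiring every listed category and tag to match is an assumption about the intended semantics.

```javascript
// Hypothetical, technology-agnostic criteria object: plain data, no SQL or driver types.
class PostSearchCriteria {
  constructor({ authorName = null, categoryNames = [], tagNames = [], publishedFrom = null } = {}) {
    this.authorName = authorName;
    this.categoryNames = categoryNames;
    this.tagNames = tagNames;
    this.publishedFrom = publishedFrom; // ISO date string, e.g. "2014-12-12"
  }
}

// Hypothetical query-side data access object for an in-memory read model.
// A SQL or document-store version would translate the same criteria into its own query.
class InMemoryPostQueryDao {
  constructor(rows) {
    this.rows = rows;
  }

  search(criteria) {
    return this.rows.filter((row) =>
      (criteria.authorName === null || row.authorName === criteria.authorName) &&
      criteria.categoryNames.every((name) => row.categoryNames.includes(name)) &&
      criteria.tagNames.every((name) => row.tagNames.includes(name)) &&
      (criteria.publishedFrom === null || row.publishedAt >= criteria.publishedFrom)
    );
  }
}

const dao = new InMemoryPostQueryDao([
  { title: "Space news", authorName: "John", categoryNames: ["News", "Videos"], tagNames: ["sci-fi", "adventure"], publishedAt: "2015-01-10" },
  { title: "Old post", authorName: "John", categoryNames: ["News"], tagNames: ["sci-fi"], publishedAt: "2013-05-01" },
]);

const found = dao.search(new PostSearchCriteria({
  authorName: "John",
  categoryNames: ["News", "Videos"],
  tagNames: ["sci-fi", "adventure"],
  publishedFrom: "2014-12-12",
}));
console.log(found.map((row) => row.title));
```

Because each row already carries the author, category, and tag names, the result can be shown in the admin table without further lookups.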
With simple applications like this, it's easier to not get distracted by aggregates. Do event sourcing, and subscribe to the events with one set of tables that is easy to query the way you want.
In other words, it sounds like your main goal is to be able to query easily for the scenarios you describe. Start with that end goal. Now write your event handlers to adjust your tables accordingly.
Start with events and the UI. Then everything else will fit easily. Google "Event Modeling", as it will help you formulate ideas around what and how you want to build this style of application.
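To make the "event handler adjusts your tables" idea concrete, here is a small sketch. The event names and fields are invented for the example, and a `Map` stands in for the query-friendly table.

```javascript
// Hypothetical read table: one denormalized row per post, keyed by post id.
const postRows = new Map();

// Hypothetical event handler: each event adjusts the read table so it stays easy to query.
function applyEvent(event) {
  switch (event.type) {
    case "PostWritten":
      postRows.set(event.postId, {
        postId: event.postId,
        title: event.title,
        authorName: event.authorName,
        published: false,
        commentCount: 0,
      });
      break;
    case "PostPublished":
      postRows.get(event.postId).published = true;
      break;
    case "CommentAdded":
      postRows.get(event.postId).commentCount += 1;
      break;
    default:
      break; // events this read model does not care about
  }
}

const events = [
  { type: "PostWritten", postId: "p1", title: "Hello", authorName: "John" },
  { type: "PostWritten", postId: "p2", title: "Draft", authorName: "John" },
  { type: "PostPublished", postId: "p1" },
  { type: "CommentAdded", postId: "p1" },
  { type: "CommentAdded", postId: "p1" },
];
events.forEach(applyEvent);

// The UI query is now a trivial filter over the prepared rows.
const publishedByJohn = [...postRows.values()].filter(
  (row) => row.published && row.authorName === "John"
);
console.log(publishedByJohn);
```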
I can see three problems in your approach and they need to be solved separately:
1. In CQRS the Queries are completely separate from the Commands. So, don't try to solve your queries with your command pipeline's repositories. The point of CQRS is precisely to allow you to solve the commands and queries in very different ways, as they have very different requirements.
2. You mention DDD in the question title, but you don't mention your Bounded Contexts in the question itself. If you follow DDD, you'll most likely have more than one BC. For example, in your question, it could be that CategoryName and AuthorName belong to two different BCs, which are also different from the BC where the blog posts are. If that is the case and each BC properly owns its own data, the data that you want to search by and show in the UI will potentially be stored in different databases, so implementing a query in the DB with a join might not even be possible.
3. Searching and Reading data are two different concerns and can/should be solved differently. When you search, you get some search criteria (including sorting and paging) and the result is basically a list of IDs (authorIds, postIds, commentIds). When you Read data, you get one or more Ids and the result is one or more DTOs with all the required data properties. It is normal that you need to read data from multiple BCs to populate a single page; that's called UI composition.
So if we agree on these 3 points and especially focussing on point 3, I would suggest the following:
1. Figure out all the searches that you want to do and see if you can decompose them into simple searches by BC. For example, searching blog posts by author name is a problem, because the author information could be in a different BC than the blog posts. So, why not implement a SearchAuthorByName in the Authors BC and then a SearchPostsByAuthorId in the Posts BC? You can do this from the Client itself or from the API. Doing it in the client gives the client a lot of flexibility, because there are many ways a client can get an authorId (from a MyFavourites list, from a paginated list, or from a search by name), and getting the posts by authorId is then a separate operation. You can do the same with tags, categories, and other things. The Post will have Ids, but not the extra details about those IDs.
2. Potentially, you might want more complicated searches. As long as the search criteria (including sorting fields) contain fields from a single BC, you can easily create a read model and execute the search there. Note that this is only for the search criteria. If the search result needs data from multiple BCs, you can solve it with UI composition. But if the search criteria contain fields from multiple BCs, then you'll need some sort of search engine capable of indexing data coming from multiple sources. This is especially evident if you want to do full-text search, search by categories, tags, etc. with large quantities of data. You will need to use some specialized service like Elasticsearch, and it won't belong to any of your existing BCs; it'll be like a supporting service.
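The search-then-read split and UI composition can be sketched as follows. The two objects are hypothetical in-memory stand-ins for an Authors BC and a Posts BC; in a real system each call would typically be a separate API request.

```javascript
// Hypothetical Authors BC: search returns ids, read returns DTOs.
const authorsBC = {
  authors: [
    { id: "a1", name: "John", avatar: "john.png" },
    { id: "a2", name: "Mary", avatar: "mary.png" },
  ],
  searchAuthorByName(name) {
    return this.authors.filter((a) => a.name === name).map((a) => a.id);
  },
  readAuthors(ids) {
    return this.authors.filter((a) => ids.includes(a.id));
  },
};

// Hypothetical Posts BC: posts know author ids only, not author details.
const postsBC = {
  posts: [
    { id: "p1", title: "Hello", authorId: "a1" },
    { id: "p2", title: "Other", authorId: "a2" },
  ],
  searchPostsByAuthorId(authorId) {
    return this.posts.filter((p) => p.authorId === authorId).map((p) => p.id);
  },
  readPosts(ids) {
    return this.posts.filter((p) => ids.includes(p.id));
  },
};

// Composition done by the client (or an API layer): search by BC, then read and merge.
function postsByAuthorName(name) {
  const authorIds = authorsBC.searchAuthorByName(name);
  const postIds = authorIds.flatMap((id) => postsBC.searchPostsByAuthorId(id));
  const authorsById = new Map(authorsBC.readAuthors(authorIds).map((a) => [a.id, a]));
  return postsBC.readPosts(postIds).map((post) => ({
    title: post.title,
    authorName: authorsById.get(post.authorId).name,
    authorAvatar: authorsById.get(post.authorId).avatar,
  }));
}

console.log(postsByAuthorName("John"));
```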
Following CQRS, you will have a separate stack for queries and commands. Your query stack should be a different module, namespace, DLL, or package in your project.
a) You will create one QueryModel, and this query model will return whatever you need. If you are familiar with Entity Framework or NHibernate, you will create a façade to hold these queries together: a DbContext or Session.
b) You can create these separate queries but, again, if you are familiar with any ORM, you should return the set that represents the model: return every set as IQueryable and use LET (Linq Expression Trees) to make your query stack more dynamic.
Using Entity Framework and C#, for example:
using System.Data.Entity;
using System.Linq;

// Query-side facade: exposes read-only IQueryable sets instead of repositories.
public class QueryModelDatabase : DbContext, IQueryModelDatabase
{
    private readonly DbSet<Order> _orders;
    private readonly DbSet<Product> _products;

    public QueryModelDatabase() : base("dbname")
    {
        _products = base.Set<Product>();
        _orders = base.Set<Order>();
    }

    // Orders are always returned with their items and each item's product.
    public IQueryable<Order> Orders
    {
        get { return _orders.Include("Items").Include("Items.Product"); }
    }

    public IQueryable<Product> Products
    {
        get { return _products; }
    }
}
Then you can write queries the way you need and return whatever shape you want:
using (var db = new QueryModelDatabase())
{
    // db.Orders already includes Items and Items.Product, so no extra Include calls are needed.
    var queryable = from o in db.Orders
                    where o.OrderId == orderId
                    select new OrderFoundViewModel
                    {
                        Id = o.OrderId,
                        State = o.State.ToString(),
                        Total = o.Total,
                        OrderDate = o.Date,
                        Details = o.Items
                    };
    try
    {
        // First() throws InvalidOperationException when no order matches.
        var order = queryable.First();
        return order;
    }
    catch (InvalidOperationException)
    {
        // Fall back to an empty view model instead of propagating the exception.
        return new OrderFoundViewModel();
    }
}

What is the difference between `ID` and `Internal ID` for NetSuite records?

According to the help pop up:
ID
This field's value represents the script ID, used to identify this
record for scripting purposes. It is a text field.
Internal ID
This field's value is a read-only system-generated unique identifier.
It is an integer field.
Both fields seem to uniquely identify a record type.
One is a string, one an integer.
The string ID is used for searches and
loading of records, but I've also seen Internal ID used when
referring to a record type from a list's point of view.
Can anyone provide the reasoning behind having two identifiers and when to use one versus the other when scripting?
The major difference is that you (as the creator of a custom record or script) are in complete control of the text ID. You can establish patterns and best practices for defining these IDs, and it will make it very easy for developers to identify record types just by looking at the string ID. You have no control over the numeric ID. When looking at code, it is much easier for me to determine what records I am referring to if it looks like:
nlapiSearchRecord('customrecord_product', null, filters, columns);
nlapiResolveURL('SUITELET', 'customscript_sl_orderservice', 'customdeploy_sl_orderservice')
as opposed to looking at:
nlapiSearchRecord(118, null, filters, columns);
nlapiResolveURL('SUITELET', 13, 1)
I'm not even sure the second nlapiSearchRecord actually works, but I know that nlapiResolveURL can be written that way.
That said, if you simply let NetSuite generate the text ID, you'll end up with generic IDs like customrecord1, which I find no more useful than the numeric ID. It is a good practice to explicitly specify your own IDs.
Furthermore, the numeric ID can vary between environments (e.g. Sandbox could be different than Production, until a subsequent refresh occurs). If you are following good migration practices, then the text ID should never vary between environments, so your code would not have to make any kind of decision on which ID to use based on environment.
Rarely have I found myself referencing any record, whether native or custom, by its numeric ID; scripts are always using the text ID to reference a record type.
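One way to act on this, sketched under assumptions: keep the explicitly named text IDs in a single shared constants object so that scripts never embed numeric IDs. The object and function names are invented for this example, and the search function is passed in as a parameter so the sketch runs outside NetSuite; inside a SuiteScript 1.0 script that parameter would be nlapiSearchRecord.

```javascript
// Hypothetical central registry of text IDs, identical in Sandbox and Production.
const RECORD_TYPES = Object.freeze({
  PRODUCT: "customrecord_product",
});

const SUITELETS = Object.freeze({
  ORDER_SERVICE: Object.freeze({
    script: "customscript_sl_orderservice",
    deployment: "customdeploy_sl_orderservice",
  }),
});

// searchRecord is injected so this can run anywhere; it is called with the same
// argument order as the nlapiSearchRecord calls shown above.
function findProducts(searchRecord, filters, columns) {
  return searchRecord(RECORD_TYPES.PRODUCT, null, filters, columns);
}

// Stand-in search function that just records what it was asked for.
const calls = [];
findProducts(function (type, id, filters, columns) {
  calls.push(type);
  return [];
}, [], []);
console.log(calls);
```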

Retrieving a value object without an Aggregate Root

I'm developing an application with a Domain-Driven Design approach. In a special case I have to retrieve the list of value objects of an aggregate and present them. To do that I've created a read-only repository like this:
public interface IBlogTagReadOnlyRepository : IReadOnlyRepository<BlogTag, string>
{
    IEnumerable<BlogTag> GetAllBlogTagsQuery(string tagName);
}
BlogTag is a value object in the Blog aggregate. It works fine, but when I think about this way of handling it and the future of the project, my concerns grow! It's not a good idea to create a separate read-only repository for every value object in such cases, is it?
Does anybody know a better solution?
You should not keep value objects in their own repository since only aggregate roots belong there. Instead you should review your domain model carefully.
If you need to keep track of value objects spanning multiple aggregates, then maybe they belong to another aggregate (e.g. a tag cloud) that could even serve as sort of a factory for the tags.
This doesn't mean you don't need a BlogTag value object in your Blog aggregate. A value object in one aggregate could be an entity in another or even an aggregate root by itself.
Maybe you should take a look at this question. It addresses a similar problem.
I think you just need a query service, as this method serves the user interface; it's just for presentation (reporting). Do something like:
public IEnumerable<BlogTagViewModel> GetDistinctListOfBlogTagsForPublishedPosts()
{
    var tags = new List<BlogTagViewModel>();
    // Go to database and run query
    // transform to collection of BlogTagViewModel
    return tags;
}
This code would be at the application layer level not the domain layer.
And notice the language I use in the method name, it makes it a bit more explicit and tells people using the query exactly what the method does (if this is your intent - I am guessing a little, but hopefully you get what I mean).
Cheers
Scott
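For illustration only, the query-service idea above might be filled in like this. The sketch is in JavaScript rather than C#, the row shape is an assumption, and an in-memory array stands in for the database query.

```javascript
// Hypothetical query service function: presentation-only, no domain objects involved.
// rows: assumed result of a database query, one entry per blog post.
function getDistinctListOfBlogTagsForPublishedPosts(rows) {
  const seen = new Set();
  const tags = [];
  for (const row of rows) {
    if (!row.published) {
      continue; // drafts do not contribute tags
    }
    for (const tagName of row.tags) {
      if (!seen.has(tagName)) {
        seen.add(tagName);
        tags.push({ name: tagName }); // view model, shaped for the UI
      }
    }
  }
  return tags;
}

const tagList = getDistinctListOfBlogTagsForPublishedPosts([
  { title: "First", published: true, tags: ["ddd", "cqrs"] },
  { title: "Second", published: true, tags: ["cqrs", "couchdb"] },
  { title: "Draft", published: false, tags: ["secret"] },
]);
console.log(tagList);
```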

CouchDB views - Multiple join... Can it be done?

I have three document types: MainCategory, Category, and SubCategory. Each has a parentid which relates to the id of its parent document.
I want to set up a view so that I can get a list of the SubCategories which sit under a MainCategory (preferably just using a map function), but I haven't found a way to arrange the view so that this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query string parameters would I use to get, say, the rows whose key starts with ["22945"..., when all I have at query time is the id "11810"? (At query time I don't know the id "22945".)
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
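A rough sketch of the Array of Ancestors idea applied here. The `ancestors` field name and the sample documents are assumptions, and the small `emit` function plus the loop only imitate what the view engine would do with the map function.

```javascript
// Hypothetical documents: each node stores the full path of its ancestor ids.
const docs = [
  { _id: "11810", type: "MainCategory", ancestors: [] },
  { _id: "22945", type: "Category", ancestors: ["11810"] },
  { _id: "33123", type: "SubCategory", ancestors: ["11810", "22945"] },
  { _id: "33453", type: "SubCategory", ancestors: ["11810", "22945"] },
  { _id: "33610", type: "SubCategory", ancestors: ["11098", "22056"] },
];

// Stand-in for the view engine's emit(); real view rows also carry the doc id automatically.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: index every SubCategory under each of its ancestors.
function mapSubCategoriesByAncestor(doc) {
  if (doc.type === "SubCategory" && Array.isArray(doc.ancestors)) {
    doc.ancestors.forEach(function (ancestorId) {
      emit(ancestorId, doc._id);
    });
  }
}

docs.forEach(mapSubCategoriesByAncestor);

// Equivalent of querying the view with key="11810": only the MainCategory id is needed.
const underMain = rows
  .filter((row) => row.key === "11810")
  .map((row) => row.value)
  .sort();
console.log(underMain);
```

With this layout the intermediate Category id never has to be known at query time, at the cost of rewriting the `ancestors` array whenever a category moves.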

Resources