Core Data predicate based on multiple entities - core-data

Normally if you have a 1 to many relationship in Core Data I understand that you should set that up as a relationship in the data model.
In this case, it is difficult to do because of the origin and management of the data.
I'm trying to essentially accomplish a join.
I'd like to fetch an entity A which meets some criteria on A but also meets a criterion on B.code and another attribute of B.
The equivalent select statement would be:
select attributeFromA from A, B where A.code = B.code and B.attrib="foo"
Is there a reasonable way to accomplish this without creating a relationship in core data?

I've only found two solutions, neither very good.
From what I've read, Core Data does not support a query against multiple entities unless they have a relationship between them.
Add a relationship anyway. This can be particularly bad since the data is coming from a server: there is no easy way to maintain relationships when each table is updated individually from the server, and the relationships need to be recreated whenever the data changes.
Manually perform the join outside of Core Data. In the above case, the intent is to get the set of object identifiers ('code') that match. One way to do that is to perform separate queries and then take the intersection. Set up each query to retrieve only 'code', not managed objects.
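The manual join can be sketched independently of Core Data. The TypeScript below is only an illustration: plain arrays stand in for the results of two separate fetches, and the row shapes `ARow` and `BRow` and the function `joinOnCode` are hypothetical names, not Core Data API.

```typescript
// Hypothetical row shapes: each stands in for the result of one fetch
// that returns only the attributes needed, not full managed objects.
interface ARow { code: string; attributeFromA: string; }
interface BRow { code: string; attrib: string; }

// Roughly emulates: select attributeFromA from A, B
//                   where A.code = B.code and B.attrib = "foo"
// (each matching A row is returned once, even if several B rows match).
function joinOnCode(aRows: ARow[], bRows: BRow[], attrib: string): string[] {
  // Step 1: collect the codes of the B rows that match the criterion.
  const matchingCodes = new Set(
    bRows.filter((b) => b.attrib === attrib).map((b) => b.code)
  );
  // Step 2: keep the A rows whose code appears in that set.
  return aRows
    .filter((a) => matchingCodes.has(a.code))
    .map((a) => a.attributeFromA);
}

const aRows: ARow[] = [
  { code: "c1", attributeFromA: "alpha" },
  { code: "c2", attributeFromA: "beta" },
  { code: "c3", attributeFromA: "gamma" },
];
const bRows: BRow[] = [
  { code: "c1", attrib: "foo" },
  { code: "c2", attrib: "bar" },
  { code: "c3", attrib: "foo" },
];
const joined = joinOnCode(aRows, bRows, "foo");
```

In Core Data terms, step 2 would typically become a second fetch on A with a predicate of the form `code IN %@`, passing in the codes collected in step 1.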

Related

Reuse same database tables in different repositories (repositories overlap on the data they access)

Suppose I have database tables Customer, Order, Item. I have an OrderRepository that accesses, directly with SQL/my ORM, both the Order and Item tables. E.g. I could have a method, getItems, on the OrderRepository that returns all items of that order.
Suppose I now also create an ItemRepository. Given I now have 2 repositories accessing the same database table, is that generally considered poor design? My thinking is, sometimes a user wants to update the details of an Item (e.g. name), but when using the OrderRepository, it doesn't really make sense to not be able to access the items directly (you want to know about all the items in an order).
Of course, the OrderRepository could internally create* an ItemRepository and call methods like getItemsById(ids: string[]). However, consider the case where I want to get all orders and items ever purchased by a Customer. Assuming you had the orderIds for a customer, you could have a getOrders(ids: string[]) on the OrderRepository to fetch all the orders and then do a second query to fetch all the Items. I feel you make your life harder (and less efficient) in the sense that you have to do the join to match items with orders in the app code rather than doing a join in SQL.
If it's not considered bad practice, is there some kind of limit to how much overlap repositories should have with each other? I've spent a while trying to search for this on the web, but it seems all the tutorials/blogs/videos really don't go further than 1 table per entity (which may be an anti-pattern).
Or am I missing a trick?
Thanks
FYI: using express with TypeScript (not C#)
* Is a repository creating another repository considered acceptable? Shouldn't only the service layer do that?
It's difficult to separate the Database Model from the DDD design but you have to.
In your example:
GetItems should have this signature - OrderRepository.GetItems(Ids: int[]) : ItemEntity. Note that this method returns an Entity (not a DAO from your ORM). To get the ItemEntity, the method might pull information from several DAOs (tables, through your ORM) but it should only pull what it needs for the entity's hydration.
Say you want to update an item's name using the ItemRepository; your signature for that could look like ItemRepository.rename(Id: int, name: string) : void. When this method does its work, it could change the same table as the GetItems above, but note that it could also change other tables as well (for example, it could add an audit of the change to an AuditTable).
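The two signatures can be sketched in TypeScript as follows. This is a minimal in-memory illustration, not a definitive design: the `Database` class, the row shapes and the audit layout are all assumptions standing in for real tables and an ORM, and the ids passed to `getItems` are assumed to be order ids.

```typescript
// Domain entity returned to callers; not a row or DAO from the ORM.
interface ItemEntity { id: number; orderId: number; name: string; }

// Hypothetical in-memory stand-ins for the database tables.
interface ItemRow { id: number; order_id: number; name: string; }
interface AuditRow { itemId: number; oldName: string; newName: string; }

class Database {
  items: ItemRow[] = [];
  audit: AuditRow[] = [];
}

class OrderRepository {
  private readonly db: Database;
  constructor(db: Database) { this.db = db; }

  // Reads the item table, but only to hydrate entities for the given orders.
  getItems(orderIds: number[]): ItemEntity[] {
    const wanted = new Set(orderIds);
    return this.db.items
      .filter((row) => wanted.has(row.order_id))
      .map((row) => ({ id: row.id, orderId: row.order_id, name: row.name }));
  }
}

class ItemRepository {
  private readonly db: Database;
  constructor(db: Database) { this.db = db; }

  // Writes the same item table and also records the change in an audit table.
  rename(id: number, name: string): void {
    const row = this.db.items.find((r) => r.id === id);
    if (!row) throw new Error(`Item ${id} not found`);
    this.db.audit.push({ itemId: id, oldName: row.name, newName: name });
    row.name = name;
  }
}

const db = new Database();
db.items.push(
  { id: 1, order_id: 10, name: "Pen" },
  { id: 2, order_id: 11, name: "Ink" }
);
const orderRepo = new OrderRepository(db);
const itemRepo = new ItemRepository(db);
itemRepo.rename(1, "Fountain pen");
const itemsOfOrder10 = orderRepo.getItems([10]);
```

Both repositories touch the same underlying table here, yet neither depends on the other; each exposes only what its own use case needs.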
DDD gives you the ability to use different tables for different Contexts if you want. It gives you enough flexibility to make really bold choices when it comes to the infrastructure that surrounds your domain. So ultimately, it's a matter of what makes sense for your specific situation and team. Some teams would apply CQRS, and the GETOrder and Rename methods would look completely different under the covers.

How to use Azure Search Service with heterogeneous data sources

I have worked on Azure Search service previously where I created an indexer directly on a SQL DB in the Azure Portal.
Now I have a use case where I want to ingest from multiple data sources, each having a different data schema. Assume these data sources to be 3 search APIs of the X, Y and Z teams. Each of them takes a search term and gives back results in its own schema. I want my Azure Search Service to be a proxy for these so that I have one search API that a user can use to get results from multiple sources, ordered correctly.
How should I go about doing it? I assume that I might have to create a common schema and, whenever a user searches for something, call these 3 APIs, get the results, map them to the common schema and then index this data into an Azure Search index. Finally, I would call the Azure Search API to give the results back to the caller.
I would appreciate any help! If I can get hold of better documentation for doing this work, that would be great as well.
Your assumption is correct. You can work with 3 different indexes and fire queries against them, or you can try to combine all of them in the same index. The benefit of the second approach is a better way to implement ordering / paging as all the information will be stored in the same index.
It really depends on what you mean by ordered correctly. Should team X be able to see results from teams Y and Z? The only way you can get ranked results like this is to maintain a single index with a common schema containing data from all teams.
One potential pitfall with this approach is conflicts in the schema, for example if one team requires a field to be of a specific datatype or to use a specific analyzer while another team has different requirements. We do this in our indexes, but with some carefully selected common fields and then dedicated fields prefixed according to our own naming convention to avoid conflicts.
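A mapping into such a common schema might look like the TypeScript sketch below. The result shapes `XResult` and `YResult`, the common fields `id`, `source` and `title`, and the `x_`/`y_` prefixes are all hypothetical; only the idea of shared fields plus team-prefixed dedicated fields comes from the text above. Uploading the documents to the index is not shown.

```typescript
// Hypothetical result shapes for two of the team APIs.
interface XResult { docId: string; headline: string; score: number; }
interface YResult { key: number; title: string; tags: string[]; }

// Common index document: shared fields plus team-prefixed dedicated fields.
interface IndexDoc {
  id: string;
  source: string;
  title: string;
  [prefixedField: string]: unknown;
}

// Team X: the headline becomes the common title; the score stays X-specific.
function fromX(r: XResult): IndexDoc {
  return { id: `x-${r.docId}`, source: "X", title: r.headline, x_score: r.score };
}

// Team Y: the tags are only meaningful for Y, so they get the y_ prefix.
function fromY(r: YResult): IndexDoc {
  return { id: `y-${r.key}`, source: "Y", title: r.title, y_tags: r.tags };
}

const docs: IndexDoc[] = [
  fromX({ docId: "17", headline: "Quarterly report", score: 0.9 }),
  fromY({ key: 4, title: "Release notes", tags: ["v2"] }),
];
```

Prefixing the document key with the source as well keeps ids from different teams from colliding in the shared index.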
One thing to consider is the need to reset the index. If you need to add, change or remove fields you will have to delete the index and create it again with a new schema. If you have a common index and team X needs to add a new property, you would need to reset (delete and create) the common index which affects all teams.
So, creating separate indexes per team has its benefits. Each team can have their own schema without risk of conflicts and they can reset their index without affecting the other teams.

PouchDB structure

I am new to the NoSQL concept, so when I started to learn PouchDB I found this conversion chart. My confusion is how PouchDB handles multiple tables: does it mean that I need to create multiple databases? From my understanding, a PouchDB database can store a lot of documents, but does a document correspond to a row in SQL, or have I misunderstood?
The answer to this question seems to be surprisingly under-documented. While @llabball clearly gave a decent answer, I don't think that views are always the way to go.
As you can read here in the section When not to use map/reduce, Nolan explains that for simpler applications, the key is to abuse _ids, and leverage the power of allDocs().
In other words, if you had two separate types (say artists and albums), then you could prefix the id of each document with its type to obtain an easily searchable data set. For example, using _id: 'artist_name' and _id: 'album_title' would allow you to easily retrieve artists in name order.
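The prefix trick amounts to a range read over sorted ids. The TypeScript below is a sketch under assumptions: `prefixRange` and `idsInRange` are hypothetical helpers, and the range read is emulated in memory with plain string comparison rather than run against PouchDB, whose collation may order some characters differently.

```typescript
// Build the key range that selects every _id starting with `prefix`.
// "\ufff0" is a high Unicode character commonly used as an upper bound.
function prefixRange(prefix: string): { startkey: string; endkey: string } {
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}

// In-memory emulation of a range read over sorted ids (not PouchDB itself).
function idsInRange(
  ids: string[],
  range: { startkey: string; endkey: string }
): string[] {
  return ids
    .filter((id) => id >= range.startkey && id <= range.endkey)
    .sort();
}

const ids = ["album_Hunky Dory", "artist_Bowie", "artist_ABBA", "album_Arrival"];
const artists = idsInRange(ids, prefixRange("artist_"));
// With PouchDB the same range would be passed to allDocs, e.g.
// db.allDocs({ include_docs: true, ...prefixRange("artist_") }).
```

Because the ids are already the sort key, no secondary index is needed to list one type in name order.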
Laying out the data this way will result in better performance due to not requiring extra indexes, and less code. Clearly however, if your data requirements are more complex, then views are the way to go.
... does it mean that i need to create multiple databases?
No.
... a document mean a row in sql or am i misunderstood?
That's right. A SQL table defines column headers (name and type); in a document, these correspond to the JSON property names.
So all docs (rows) with the same properties (a so-called "schema") are the equivalent of your SQL table. You can have as many different schemas in one database as you want (visit json-schema.org for some inspiration).
How to request them separately? Create CouchDB views! You can get all/some "rows" of your tabular data (docs with the same schema) with one request as you know it from SQL.
To make such views easy to write, it is very common for CouchDB docs to carry a type property. The table name you know from SQL can be your type, like doc.type: "animal".
Your view names might then be animalByName or animalByWeight, depending on your needs.
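What such a view does can be emulated in a few lines of TypeScript. This is only a sketch: `runView` is a hypothetical in-memory stand-in for the database running a map function, and `emit` is passed in as a parameter here, whereas in a real CouchDB map function it is provided by the environment.

```typescript
type Doc = { _id: string; type: string; name: string };
type Emit = (key: string) => void;
type MapFn = (doc: Doc, emit: Emit) => void;
type Row = { key: string; id: string };

// Emulates running a map function over all docs and sorting rows by key.
function runView(docs: Doc[], map: MapFn): Row[] {
  const rows: Row[] = [];
  for (const doc of docs) {
    map(doc, (key) => { rows.push({ key, id: doc._id }); });
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Equivalent in spirit to a view named animalByName:
// only docs of type "animal" emit a row, keyed by name.
const animalByName: MapFn = (doc, emit) => {
  if (doc.type === "animal") emit(doc.name);
};

const allDocs: Doc[] = [
  { _id: "a1", type: "animal", name: "Zebra" },
  { _id: "p1", type: "plant", name: "Fern" },
  { _id: "a2", type: "animal", name: "Ant" },
];
const animalRows = runView(allDocs, animalByName);
```

The plant document is skipped by the type check, so the view behaves like a query against a single "table" sorted by name.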
Sometimes a multiple-database plan is a good option, like a database per user or even a database per user-feature. Take a look at this conversation on the CouchDB mailing list.

How to perform intersection operation on two datasets in Key-Value store?

Let's say I have 2 datasets, one for rules, and the other for values.
I need to filter the values based on rules.
I am using a Key-Value store (Couchbase, Cassandra, etc.). I can use multi-get to retrieve all the values from one table and all the rules from the other, and perform validation in a loop.
However, I find this very inefficient. I move a massive volume of data (values) over the network, and the client is kept busy filtering.
What is the common pattern for finding the intersection between two tables with Key-Value store?
The idea behind the NoSQL data model is to write data in a denormalized way so that each table can answer one precise query. As an example, imagine you have reviews made by customers on shops. You need to know the reviews made by a user on shops and also the reviews received by a shop. This would be modeled using two tables:
ShopReviews
UserReviews
In the first table you query by shop id, in the second by user id. The data is written twice, but each read is a direct access by a single key.
In the same way you should organize values by rules (I can't be more precise without knowing the relation between them) and so on. One more consideration: newer versions of NoSQL databases support collections, which might help to model 1-to-many relations.
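The write-twice pattern can be sketched in TypeScript with two in-memory maps standing in for the two tables. The `Review` shape and the helper names are assumptions for illustration; a real store would use its own client API for the reads and writes.

```typescript
interface Review { shopId: string; userId: string; text: string; }

// Two hypothetical key-value "tables", each keyed for exactly one query.
const shopReviews = new Map<string, Review[]>(); // key: shop id
const userReviews = new Map<string, Review[]>(); // key: user id

function append(table: Map<string, Review[]>, key: string, review: Review): void {
  const existing = table.get(key) || [];
  table.set(key, [...existing, review]);
}

// Every write goes to both tables, so every read is a single key lookup.
function addReview(review: Review): void {
  append(shopReviews, review.shopId, review);
  append(userReviews, review.userId, review);
}

addReview({ shopId: "s1", userId: "u1", text: "Great coffee" });
addReview({ shopId: "s1", userId: "u2", text: "Slow service" });
addReview({ shopId: "s2", userId: "u1", text: "Nice staff" });

const forShop = shopReviews.get("s1") || []; // reviews received by shop s1
const byUser = userReviews.get("u1") || [];  // reviews written by user u1
```

The cost is duplicated storage and two writes per review; the benefit is that neither query needs to scan or filter on the client.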
HTH, Carlo

Core Data: am I on the right track? Setting up data model for data that contains multiple arrays, eg. accelerometer data

I am working on a project that involves a lot of data, and at first I was doing it all in plist, and I realized it was getting out of hand and I would have to learn Core Data. I'm still not entirely sure whether I can do what I want in Core Data, but I think it should work out. I've set up a data model, but I'm not sure if it's the right way to do it. Please read on if you think you can help out and let me know if I'm on the right track. Please bear with me, because I am trying to explain it as thoroughly as I can.
I've got the basic object with attributes set up at the root level; say a person with attributes like a name, date of birth, etc. Pretty simple. You set up one entity like this "Person" in your model, and you can save as many of them as you want in your data and retrieve them as an array, right? It could be sorted based on an attribute in the Person, such as the date they were added to the database.
Now where I get a bit more confused is when I want to store several different collections of data with each person. For example a list of courses and associated test marks. In a plist I would have stored an array of dictionaries that stored this, sorted by the date assessed. The way I set this up in my data model was that I added an entity called "Tests" and a "to-many" relationship from Person to Tests, and then when I pull that I get an NSSet that I can order by a timestamp again? Is there a better way to do this?
Similarly the Person may have a set of arrays of numerical data (the kind that you could graph over time, e.g. Nike+ stores your running data like distance vs time, and a person would have multiple runs associated with them, hence a set of arrays, each with its own associated date of collection). The way I set this up is a little different, with a "Runs" entity with just a timestamp attribute, and that is connected from Person via a to-many relationship, with inverse "forPerson". Then the Runs entity is connected to another entity via a to-many relationship that has attributes to store numerical data and the time. Once again I would use a time/order attribute to sort them.
So the main question I have is whether using an internal attribute like timestamp to sort a set would be the right way to load an "array" from Core Data. The approaches I found when searching forums/Stack Overflow for how to store NSArrays in Core Data seem overly complicated compared to this, giving me the sense that I'm misunderstanding something.
Thanks for your help. Sorry for all the text, but I'm new to Core Data and I figure setting up the data model properly is essential before starting to code methods for getting/saving data. If necessary, I can set up a sample model to demonstrate this and post a picture of it.
Core Data will give you NSSets by default. These are convertible to arrays by calling allObjects, or sortedArrayUsingDescriptors: if you want a sorted array. Setting the "ordered" property on the relationship description gives you an NSOrderedSet in the managed object instead. Hashed sets provide quicker adds, access and membership checks, with a penalty (relative to ordered sets) for the sort.
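The idea of sorting an unordered set by an attribute is the same in any language. Here is a TypeScript analogue, not Core Data code: the `Run` shape and its values are made up, and in Core Data itself the sort would be done with sortedArrayUsingDescriptors: and a sort descriptor on the timestamp attribute.

```typescript
interface Run { id: string; timestamp: number; }

// A collection with no meaningful order, standing in for the set
// that a to-many relationship returns.
const runs = new Set<Run>([
  { id: "evening", timestamp: 300 },
  { id: "morning", timestamp: 100 },
  { id: "noon", timestamp: 200 },
]);

// Copy into an array and sort by the timestamp attribute.
const sortedRuns = Array.from(runs).sort((a, b) => a.timestamp - b.timestamp);
```

The set itself is left untouched; the sorted array is a separate copy built on demand.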
