Fetch into Map for a one-to-many relationship in jOOQ

We have a Feature table and a Tier table; one tier can have multiple features. We wanted to query and get the output as
Map<Byte, List<String>>
where the Byte corresponds to a tier ID and the list contains the feature names.
We managed to get an answer from an older post here.
Further, we wanted to get a map of
Map<Byte, List<FeaturePojo>>

We got it working using the following:
.intoGroups(FreemiumFeature.FREEMIUM_FEATURE.ID,
    c -> c.into(FreemiumTierFeature.FREEMIUM_TIER_FEATURE)
          .into(FeaturePojo.class));
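For reference, here is a minimal sketch of how the full query might be assembled. Only the intoGroups() call above comes from the original snippet; the FEATURE_ID join column, the Byte key type and the DSLContext wiring are assumptions, and FreemiumFeature, FreemiumTierFeature and FeaturePojo come from the project's jOOQ code generation.

import java.util.List;
import java.util.Map;

import org.jooq.DSLContext;

public class TierFeatureQueries {

    // Sketch only: join column and key type are assumptions.
    public Map<Byte, List<FeaturePojo>> featuresById(DSLContext create) {
        return create
            .select()
            .from(FreemiumFeature.FREEMIUM_FEATURE)
            .join(FreemiumTierFeature.FREEMIUM_TIER_FEATURE)
            .on(FreemiumFeature.FREEMIUM_FEATURE.ID
                .eq(FreemiumTierFeature.FREEMIUM_TIER_FEATURE.FEATURE_ID)) // assumed FK column
            .fetch()
            // group records by the ID field, mapping each one into the POJO
            .intoGroups(FreemiumFeature.FREEMIUM_FEATURE.ID,
                c -> c.into(FreemiumTierFeature.FREEMIUM_TIER_FEATURE)
                      .into(FeaturePojo.class));
    }
}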

Related

How to use Azure Search Service with heterogeneous data sources

I have worked on Azure Search service previously where I created an indexer directly on a SQL DB in the Azure Portal.
Now I have a use case where I want to ingest from multiple data sources, each having a different data schema. Assume these data sources are the search APIs of three teams, X, Y, and Z. All of them take a search term and give back results in their own schema. I want my Azure Search service to be a proxy for these, so that I have one search API a user can call to get results from multiple sources, ordered correctly.
How should I go about doing this? I assume I would have to create a common schema, and whenever a user searches for something, call these three APIs, map their results to the common schema, and index that data into an Azure Search index. Finally, I would call the Azure Search API to give the results back to the caller.
I would appreciate any help! If I can get hold of better documentation for this work, that would be great as well.
Your assumption is correct. You can work with 3 different indexes and fire queries against them, or you can try to combine all of them in the same index. The benefit of the second approach is a better way to implement ordering / paging as all the information will be stored in the same index.
It really depends on what you mean by ordered correctly. Should team X be able to see results from teams Y and Z? The only way you can get ranked results like this is to maintain a single index with a common schema containing data from all teams.
One potential pitfall with this approach is conflicts in the schema, for example if one team requires a field to be of a specific datatype or to use a specific analyzer while another team has different requirements. We do this in our indexes, but with some carefully selected common fields and then dedicated fields prefixed according to our own naming convention to avoid conflicts.
One thing to consider is the need to reset the index. If you need to add, change or remove fields you will have to delete the index and create it again with a new schema. If you have a common index and team X needs to add a new property, you would need to reset (delete and create) the common index which affects all teams.
So, creating separate indexes per team has its benefits. Each team can have their own schema without risk of conflicts and they can reset their index without affecting the other teams.
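To make the single-index, common-schema idea concrete, here is a rough sketch using the Azure Search Java SDK (azure-search-documents). The endpoint, index name, CommonDoc shape and the prefixed field names are all assumptions for illustration, not part of the answer above.

import java.util.List;

import com.azure.core.credential.AzureKeyCredential;
import com.azure.search.documents.SearchClient;
import com.azure.search.documents.SearchClientBuilder;

// Sketch only: one common schema with shared fields plus team-prefixed fields
// (x_..., y_...) to avoid datatype/analyzer conflicts between teams.
public class CommonIndexUpload {

    public static class CommonDoc {
        public String id;          // key field in the common index
        public String title;       // common field shared by all teams
        public String source;      // which team the result came from: "X", "Y" or "Z"
        public String x_category;  // dedicated field for team X
        public Double y_price;     // dedicated field for team Y
    }

    public static void main(String[] args) {
        SearchClient client = new SearchClientBuilder()
            .endpoint("https://<your-service>.search.windows.net")           // assumption
            .credential(new AzureKeyCredential(System.getenv("SEARCH_KEY"))) // assumption
            .indexName("common-index")                                       // assumption
            .buildClient();

        // Map one result from team X's API into the common schema and index it.
        CommonDoc doc = new CommonDoc();
        doc.id = "x-123";
        doc.title = "Some result from team X";
        doc.source = "X";
        doc.x_category = "widgets";

        client.uploadDocuments(List.of(doc));
    }
}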

Azure Table Storage Filter Query

I am trying to filter rows against a string-type column. Basically I want to filter with part of a string. It is very similar to the LIKE operation in MySQL.
I have gone through this document: https://learn.microsoft.com/en-us/rest/api/storageservices/querying-tables-and-entities
However, I couldn't find relevant information for my requirement. Any suggestion would be helpful.
Basically I want to filter with part of a string. It is very similar to the LIKE operation in MySQL.
Azure Tables has limited querying support and unfortunately LIKE is not supported. What you would need to do is fetch all the entities and then apply the filter on the client side.
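A minimal sketch of that client-side filtering with the Azure Tables SDK for Java; the table name, connection string and the Name property are assumptions.

import java.util.List;
import java.util.stream.Collectors;

import com.azure.data.tables.TableClient;
import com.azure.data.tables.TableClientBuilder;
import com.azure.data.tables.models.TableEntity;

// Sketch only: the Table service has no LIKE, so fetch the entities and filter
// the string property on the client.
public class SubstringFilter {

    public static void main(String[] args) {
        TableClient table = new TableClientBuilder()
            .connectionString(System.getenv("STORAGE_CONNECTION_STRING")) // assumption
            .tableName("Customers")                                       // assumption
            .buildClient();

        String needle = "smith";

        List<TableEntity> matches = table.listEntities().stream()
            .filter(e -> {
                Object name = e.getProperty("Name");                      // assumed property
                return name != null && name.toString().toLowerCase().contains(needle);
            })
            .collect(Collectors.toList());

        matches.forEach(e -> System.out.println(e.getPartitionKey() + " / " + e.getRowKey()));
    }
}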

Way to dump the relations from Freebase?

I have run through the Google API for Freebase, but it is still confusing.
Is there a simple way to dump the relations from Freebase?
I want to dump all entity-name pairs with a specific relation (e.g. marry_with, ...), and I also want the Chinese entity names.
Should I
write MQL to query all entities satisfying the condition? (but the MQL service is going to be retired soon)
or dump all of Freebase and parse it?
or is there another API capable of doing this?
or is another KB (YAGO, DBpedia, Wikidata) easier for doing this?
Which way is easiest to work with?
Please point me in some direction. Thanks.
Freebase was retired and Wikidata is the recommended alternative.
You can use the Wikidata Query API to get entities with a specific property.
For instance, the query http://wdq.wmflabs.org/api?q=CLAIM[26] retrieves the IDs of all items having the property spouse (P26).
You can combine this with the Wikidata API, for instance to get labels and aliases in English for the first three items returned by the previous query:
http://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q23|Q24|Q42&languages=en&props=labels|aliases
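If it helps, here is a small Java sketch of calling that wbgetentities endpoint; the item IDs are just the examples from the URL above, and the response is printed raw instead of being parsed.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: fetch English labels and aliases for a few example items.
// %7C is the URL-encoded form of the "|" separator used by the Wikidata API.
public class WikidataLabels {

    public static void main(String[] args) throws Exception {
        String url = "https://www.wikidata.org/w/api.php"
            + "?action=wbgetentities&ids=Q23%7CQ24%7CQ42"
            + "&languages=en&props=labels%7Caliases&format=json";

        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());

        // Print the raw JSON; a real client would parse it with a JSON library.
        System.out.println(response.body());
    }
}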

Is there any way to skip rows when I retrieve from Azure table storage?

I believe in the past the answer to this question was no. However, has anything changed with the recent releases, or does anyone know of a way I can do this? I am using DataTables and would love to be able to do something like: skip 50, retrieve 50 rows; skip 100, retrieve 50 rows; and so on.
It is still not possible to skip rows. The only navigation construct supported is top. The Table Service REST API is the definitive way to access Windows Azure Storage, so its documentation is the go-to location for what is or is not possible.
What you're asking here is possible using continuation tokens. Scott Densmore blogged about this a while ago to explain how you can use continuation tokens for paging when you're displaying a table (like what you're asking here with DataTables): Paging with Windows Azure Table Storage. The blog post shows how to display pages of 3 items while using continuation tokens to move forward and back between pages.
Besides that, there's also Steve's post that describes the same concept: Paging Over Data in Windows Azure Tables.
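With the current Azure Tables SDK for Java, that continuation-token paging looks roughly like this; the table name, the page size of 50 and the idea of handing the token back for the next request are assumptions.

import com.azure.core.http.rest.PagedResponse;
import com.azure.data.tables.TableClient;
import com.azure.data.tables.TableClientBuilder;
import com.azure.data.tables.models.TableEntity;

// Sketch only: there is no skip, so each page hands back a continuation token
// that the next request passes in to resume where the previous page stopped.
public class ContinuationPaging {

    static String printPage(TableClient table, String continuationToken, int pageSize) {
        Iterable<PagedResponse<TableEntity>> pages = (continuationToken == null)
            ? table.listEntities().iterableByPage(pageSize)
            : table.listEntities().iterableByPage(continuationToken, pageSize);

        for (PagedResponse<TableEntity> page : pages) {
            page.getValue().forEach(e -> System.out.println(e.getRowKey()));
            return page.getContinuationToken(); // hand this back for the "next page" request
        }
        return null; // no more pages
    }

    public static void main(String[] args) {
        TableClient table = new TableClientBuilder()
            .connectionString(System.getenv("STORAGE_CONNECTION_STRING")) // assumption
            .tableName("Orders")                                          // assumption
            .buildClient();

        String token = printPage(table, null, 50);  // first page of 50
        if (token != null) {
            printPage(table, token, 50);            // next page of 50
        }
    }
}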
Yes (kinda) and no. No, in the sense that a Skip operation is not directly supported by the REST API. You could of course do it in memory, but that would defeat the purpose.
However, you can of course approximate this pattern if you structure your data correctly. We do something like this ourselves. We align our partition key to the datetime and use the RowKey as a discriminator. This means we can always pinpoint the partition range we are interested in and then Take() some amount of data. So, for example, we can easily Take() the first 20 rows per hour by specifying a unique query (skipping over data we don't want). The partition key is simply aligned per hour and then we optionally discriminate further using the RowKey - finally, we just take data. When executed in parallel, this works just dandy.
Again, the more technically correct answer is NO. However, you can approximate it cleverly using the PK and RK.
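A rough sketch of that pattern with the Azure Tables SDK for Java; the hour-aligned partition key format, the table name and the page size of 20 are assumptions based on the description above.

import com.azure.core.util.Context;
import com.azure.data.tables.TableClient;
import com.azure.data.tables.TableClientBuilder;
import com.azure.data.tables.models.ListEntitiesOptions;
import com.azure.data.tables.models.TableEntity;

// Sketch only: partition keys are aligned per hour, so "skipping" becomes
// pointing the query at the right partition and taking the first N rows.
public class PartitionRangeTake {

    public static void main(String[] args) {
        TableClient table = new TableClientBuilder()
            .connectionString(System.getenv("STORAGE_CONNECTION_STRING")) // assumption
            .tableName("Events")                                          // assumption
            .buildClient();

        // First 20 rows of one hour-aligned partition; the RowKey discriminates further.
        ListEntitiesOptions options = new ListEntitiesOptions()
            .setFilter("PartitionKey eq '2016-07-04T10'")                 // assumed key format
            .setTop(20);

        for (TableEntity e : table.listEntities(options, null, Context.NONE)) {
            System.out.println(e.getRowKey());
        }
    }
}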

How does Solr work with data split into different services and therefore not synchronously available?

Take for instance an e-commerce store with catalog and price data in different web services. Now, we know that Solr does not allow partial updates to a document field (JIRA bug), so how do you index these two services?
I had three possibilities, but I'm not sure which one is correct:
Partial update - not possible
Solr join - have price and catalog in separate indexes and join them in Solr. You can't join them in your client-side code without screwing up pagination and facet counts. I don't know if this is possible pre-Solr 4.0.
Have some sort of intermediate indexing service, which composes an entire document based on the results from both these services and sends it for indexing. However, there are two problems with this approach:
3.1 You can still compose documents partially, and then when the document is complete, set a flag indicating that it is a complete document. However, each time a document has to be indexed, the service first has to check whether the document exists in the index, edit it and push it back. So, a big performance hit.
3.2 Your intermediate service checks whether a particular id is available from all services - if not, it silently drops it and hopes that when it appears in the other service, the first service will already be populated. This is OK, but it means that an item is not available in search until all fields are available (not always desirable - if you don't have a price, you could simply mark it out-of-stock and still have it available).
Of all these methods, only #3.2 looks viable to me - does anyone know how you do this kind of thing with DIH? Because now you have two different entry points (two different web services) into indexing, and each has to check the other.
The usual way to solve this is close to your 3.2: write code that creates the document you want to index from the different available services. The usual flow would be to fetch all the items from the catalog, then fetch the prices when indexing. Whether you want to have items in the search from the catalog that don't have prices available depends on your business rules for the service. If you want to speed up the process (fetch product, fetch price, repeat), expand the API to fetch 1000 products and then the prices for all those products at the same time.
There is no reason why you should drop an item from the index if it doesn't have a price, unless you don't want items without prices in your index. It's up to you and your particular needs what information has to be available before indexing the document.
As far as I remember, 4.0 will probably support partial updates as it moves to the new abstraction layer for the index files, although I'm not sure it'll make your situation that much more flexible.
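A condensed SolrJ sketch of that flow; the field names, the Product shape and the idea of passing in a pre-fetched id-to-price map are assumptions, the point being that a complete document is composed before it is sent for indexing.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

// Sketch only: compose one full document per catalog item, attaching the price
// when it is known, and index the whole batch in one go.
public class CatalogPriceIndexer {

    public static class Product {
        public final String id;
        public final String name;
        public Product(String id, String name) { this.id = id; this.name = name; }
    }

    public static void index(SolrClient solr, List<Product> products,
                             Map<String, Double> pricesById) throws Exception {
        List<SolrInputDocument> docs = new ArrayList<>();
        for (Product p : products) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", p.id);
            doc.addField("name", p.name);
            Double price = pricesById.get(p.id);
            if (price != null) {
                doc.addField("price", price);
            } else {
                doc.addField("in_stock", false); // still searchable, just marked unavailable
            }
            docs.add(doc);
        }
        solr.add(docs);
        solr.commit();
    }
}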
Approach 3.2 is the most common, though I think about it slightly differently. First, think about what you want in your search results, then create one Solr document for each potential result, with as much information as you can get. If it is OK to have a missing price, then add the document that way.
You may also want to match the documents in Solr, but get the latest data for display from the web services. That gives fresh results and avoids skew between the batch updates to Solr and the live data.
Don't hold your breath for fine-grained updates to be added to Solr and Lucene. It gets a lot of its speed from not having record-level locking and update.
