I have several levels of terms in the term store. I want to be able to build the whole tree of terms from a single request.
The following request doesn't return terms with their children:
https://graph.microsoft.com/v1.0/sites/{siteId}/termStore/groups/{groupId}/sets/{setId}/terms?$expand=children
What is wrong with it?
I was able to get the children of a single term by its ID, but not for the whole set of terms.
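Whatever the reason `$expand=children` has no effect on this collection, one workaround that relies only on the per-term children request the asker says does work is to walk the tree level by level. This is a sketch under assumptions: `fetch_json(url)` is a hypothetical callable that performs an authenticated GET and returns the parsed JSON body, the children URL shape is assumed from the set URL in the question, and paging via `@odata.nextLink` is ignored.

```python
from typing import Callable, Dict, List

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"

def children_url(site_id: str, group_id: str, set_id: str, term_id: str) -> str:
    # Assumed URL for "children of one term", built on the set URL from the question.
    return (f"{GRAPH_ROOT}/sites/{site_id}/termStore/groups/{group_id}"
            f"/sets/{set_id}/terms/{term_id}/children")

def build_tree(term: dict, site_id: str, group_id: str, set_id: str,
               fetch_json: Callable[[str], dict]) -> Dict:
    """Return {"term": term, "children": [...]}, issuing one children request per term."""
    response = fetch_json(children_url(site_id, group_id, set_id, term["id"]))
    kids = response.get("value", [])  # Graph returns collections under "value"
    return {
        "term": term,
        "children": [build_tree(k, site_id, group_id, set_id, fetch_json) for k in kids],
    }

def build_forest(root_terms: List[dict], site_id: str, group_id: str, set_id: str,
                 fetch_json: Callable[[str], dict]) -> List[Dict]:
    # root_terms: the first-level terms of the set, however you obtain them.
    return [build_tree(t, site_id, group_id, set_id, fetch_json) for t in root_terms]
```

This costs one request per term, so it trades the single query the asker wanted for a predictable recursive walk.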
I'd like an expert opinion on my use case and on how I plan to use indices, to check whether my approach has problems or whether there are better ways to achieve it. Since I am new to Elasticsearch (ES), your opinions would really help. We store data in CouchDB, in a separate database for each type of data.
I have a database that serves as a link between two others. For example, database A holds 'floor' data, database B links floors to items, and there is a separate database for each item type a floor can have (e.g., card reader, camera, etc.).
We need to search for items linked to a floor and return them with filtering and paging. (Right now my links database holds only IDs and types, but I also plan to store each item's name in the links database, so that I can filter while paging.)
My plan for filtering and paging is simply to have one index per database. Given a floor, I'll query the links index for its linked items of a given type with a search filter, which gives me one page of matching items; I'll then use the IDs from that result to fetch the full objects from the index of that item type's database.
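The two-step lookup described above can be sketched as two Elasticsearch request bodies, built here as plain Python dicts (sending them with a client is left out). The field names `floor_id`, `type`, `name`, and `item_id` are hypothetical; `type` and `floor_id` are assumed to be keyword fields, and the item indices are assumed to use the CouchDB ID as the document `_id`.

```python
def links_page_query(floor_id, item_type, name_filter, page, page_size):
    """Step 1: one page of links for a floor and item type, optionally filtered by name."""
    body = {
        "query": {
            "bool": {
                # Exact-match filters do not affect scoring and can be cached.
                "filter": [
                    {"term": {"floor_id": floor_id}},
                    {"term": {"type": item_type}},
                ],
            }
        },
        "from": page * page_size,  # zero-based page number
        "size": page_size,
        "_source": ["item_id"],  # only the linked item IDs are needed here
    }
    if name_filter:
        body["query"]["bool"]["must"] = [{"match": {"name": name_filter}}]
    return body

def items_by_ids_query(links_response):
    """Step 2: fetch the full objects from the item-type index by the IDs from step 1."""
    ids = [hit["_source"]["item_id"] for hit in links_response["hits"]["hits"]]
    return {"query": {"ids": {"values": ids}}, "size": len(ids)}
```

Note that the `ids` query does not promise to return documents in the order of the ID list, so re-order the fetched objects by that list if the page order matters.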
Please let me know if there is a better way to handle this, e.g., whether I can create a single index for my floor, links, and item databases, and whether that is possible through the Logstash CouchDB plugin.
Many thanks.
Your setup does not sound wrong, but there are alternatives. You can use nested objects or parent-child relationships for an easier setup. Both approaches have their advantages. It all depends on the type of queries you would like to run and the number of related items.
I would start by reading the following section of the Definitive Guide, which should give you a good start:
https://www.elastic.co/guide/en/elasticsearch/guide/current/modeling-your-data.html?q=model
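To make the two alternatives concrete, here are minimal index mappings for each, written as Python dicts in the shape of the Elasticsearch mapping API (typeless mappings, as in Elasticsearch 7 and later). All index, relation, and field names are hypothetical.

```python
# Option 1: nested objects. Items live inside the floor document and are
# reindexed together with it whenever anything changes.
nested_mapping = {
    "mappings": {
        "properties": {
            "floor_name": {"type": "text"},
            "items": {
                "type": "nested",
                "properties": {
                    "item_type": {"type": "keyword"},
                    "name": {"type": "text"},
                },
            },
        }
    }
}

# Option 2: parent-child via a join field. Floors and items are separate
# documents in the same index; each item must be routed to its floor's shard.
join_mapping = {
    "mappings": {
        "properties": {
            "floor_item": {"type": "join", "relations": {"floor": "item"}},
            "item_type": {"type": "keyword"},
            "name": {"type": "text"},
        }
    }
}

def floor_doc(name):
    return {"name": name, "floor_item": "floor"}

def item_doc(floor_id, item_type, name):
    # Index this with routing=floor_id so it lands on the same shard as its parent.
    return {"item_type": item_type, "name": name,
            "floor_item": {"name": "item", "parent": floor_id}}
```

Roughly, nested suits a small number of items per floor that change together, while the join field suits many items that are updated independently.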
I have a document with nested types, and I use a nested query to search the nested part.
When I get a hit, the search result returned is the whole document.
Could anyone tell me how I can get back only the nested part, or the part of the document that contains it?
Should I use parent-child, or can nested types also meet my requirement?
Thanks!
With nested documents you can only get back the whole structure (parent + all children) and you can only update the whole structure.
If you switch to parent/child, you can index the parent and every child independently, at the price of higher memory usage and somewhat worse performance. In return, you can search on the parents and get back the children, or search on the children and get back the parents.
Also, children are separate documents, so you can index and query them independently, regardless of the fact that they have a parent.
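The two search directions described above correspond to Elasticsearch's `has_child` and `has_parent` queries. A minimal sketch of the request bodies as Python dicts, assuming a join field whose parent type is `blogpost` and child type is `comment`, and a text field `body` (all hypothetical names):

```python
def parents_with_matching_children(text):
    # Search on the children, get back the parents.
    return {"query": {"has_child": {"type": "comment",
                                    "query": {"match": {"body": text}}}}}

def children_of_matching_parents(text):
    # Search on the parents, get back the children.
    return {"query": {"has_parent": {"parent_type": "blogpost",
                                     "query": {"match": {"body": text}}}}}
```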
I'm developing a search engine that takes the semantics of the data into account, unlike the usual keyword-based index. I managed to build a reasonable index using metadata extraction methods and RDF, but I have difficulty applying such methods to the search query itself, since the query is much shorter than the actual data. Any ideas on how to tag a search query successfully, using similar methods, natural language processing, etc.?
Thank You!
Yes, the sample size of a typical query is too small for semantic analysis to be of any value.
One approach might be to constrain or expand your query using drop-down menus for things like "Named Entities" or "Subject Verb Object" tuples.
Another approach would be to expand simple keywords using rules created from your metadata so that, for example, a query for 'car' might be expanded to the tuple pattern
(*,[drive,operate,sell],[car,automobile,vehicle])
before submission.
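The rule-based expansion can be sketched in a few lines. The rule table here is hand-written and hypothetical (in practice it would be derived from your metadata), and `*` stands for "any subject", as in the pattern above.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, List[str], List[str]]  # (subject, verbs, objects)

# Hypothetical rules: keyword -> (verbs it usually occurs with, synonyms)
RULES: Dict[str, Tuple[List[str], List[str]]] = {
    "car": (["drive", "operate", "sell"], ["car", "automobile", "vehicle"]),
}

def expand(query: str) -> List[Pattern]:
    """Turn each known keyword into a subject-verb-object pattern."""
    patterns = []
    for word in query.lower().split():
        if word in RULES:
            verbs, objects = RULES[word]
            patterns.append(("*", verbs, objects))
        else:
            patterns.append(("*", [], [word]))  # no rule: match the bare keyword as an object
    return patterns
```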
Finally, you might try padding the query with a semantically neutral prefix and/or suffix, so that it is long enough to trigger OpenCalais' recognizer.
Something like 'The user has specified the following terms in her query: one, two, three.'.
And once the results are returned, filter out all results that match only the added prefix/suffix.
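A sketch of the padding trick: wrap the user's terms in a fixed sentence before sending it to the recognizer, then keep only entities that share a word with the user's own terms. The recognizer call itself is not shown; `entities` is assumed to be a list of `(text, type)` pairs extracted from its response, and word overlap is only an approximation of "matches the padding".

```python
from typing import List, Tuple

PREFIX = "The user has specified the following terms in her query: "

def pad_query(terms: List[str]) -> str:
    # Semantically neutral wrapper so the text is long enough for the recognizer.
    return PREFIX + ", ".join(terms) + "."

def drop_padding_matches(entities: List[Tuple[str, str]],
                         terms: List[str]) -> List[Tuple[str, str]]:
    """Keep entities whose text shares at least one word with the user's own terms."""
    user_words = {w.lower() for t in terms for w in t.split()}
    kept = []
    for text, kind in entities:
        if user_words & {w.lower() for w in text.split()}:
            kept.append((text, kind))
    return kept
```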
Just a few quick thoughts.
You need to build a semantic tree based on combinations of keywords.
For example, automobile --> vehicle --> car captures the technical aspect of a car, whereas
travel --> hire/rent --> vehicle --> car captures the travel aspect of renting a car.
In this case MongoDB will help you a lot.
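The tree idea can be sketched as a small keyword graph in which each root-to-keyword path (e.g. travel --> hire/rent --> vehicle --> car) is one context for that keyword; a query term is then tagged with every context path that ends in it. The graph is hand-written and hypothetical, and it is kept in a plain dict rather than MongoDB so the sketch stays self-contained.

```python
from typing import Dict, List

# Hypothetical keyword graph: each node points to more specific keywords.
EDGES: Dict[str, List[str]] = {
    "automobile": ["vehicle"],
    "travel": ["hire/rent"],
    "hire/rent": ["vehicle"],
    "vehicle": ["car"],
}

def paths_to(target: str, node: str, path: List[str]) -> List[List[str]]:
    """Depth-first search for all paths from `node` down to `target`."""
    path = path + [node]
    if node == target:
        return [path]
    found = []
    for nxt in EDGES.get(node, []):
        found.extend(paths_to(target, nxt, path))
    return found

def contexts(keyword: str) -> List[List[str]]:
    # Roots are nodes that never appear as anyone's child.
    children = {c for kids in EDGES.values() for c in kids}
    roots = [n for n in EDGES if n not in children]
    return [p for r in roots for p in paths_to(keyword, r, [])]
```

For example, `contexts("car")` yields both the technical path starting at `automobile` and the rental path starting at `travel`.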
I have a social network set up, and I want to search its entries via an API. The social network's database is MySQL. I want the search to return results in the following order: results that match the query AND belong to friends of the user performing the search should be ranked above results that simply match the query.
Can this be done in one query, or will I have to run two separate queries, merge the results, and remove duplicates?
I could possibly build a data structure using Lucene and search that index efficiently, but I wonder whether the cost of updating a document every time a new relationship is created would be too high.
Thanks
The reference to Lucene complicates the equation a little bit. Let's solve it (or at least get a baseline) without it first.
Assume the following data model (or something close to it):
tblUsers
    UserId        PK
    UserName
    Age
    ...
tblBuddies
    UserId        FK to tblUsers.UserId
    FriendId      FK to tblUsers.UserId (Id of one of the friends)
    BuddyRating   float 0.0 to 1.0 (or whatever normalized scale) indicating
                  the level of friendship/similarity/whatever
tblItems
    ItemId        PK
    ItemName
    Description
    Price
    ...
tblUsersToItems
    UserId        FK to tblUsers.UserId
    ItemId        FK to tblItems.ItemId
    ItemRating    float 0.0 to 1.0 (or whatever normalized scale) indicating
                  the "value" assigned to the item by the user.
A naive query (but a good basis for an optimized one) could be:
SELECT I.ItemId, I.ItemName, I.Description,
       COALESCE(SUM(UI.ItemRating * B.BuddyRating), 0) AS FriendScore
FROM tblItems I
LEFT JOIN tblUsersToItems UI ON I.ItemId = UI.ItemId
LEFT JOIN tblBuddies B ON UI.UserId = B.FriendId
                      AND B.UserId = 'IdOfCurrentUser' -- in ON, not WHERE, so non-friend matches are kept
WHERE I.ItemName = 'MP3 Player' -- SomeSearchCriteria
GROUP BY I.ItemId, I.ItemName, I.Description
ORDER BY FriendScore DESC
LIMIT 25;
The idea is that an item gets more weight if it is recommended/used by a friend. The extra weight is larger if the friend is a close friend [BuddyRating] and/or if the friend recommends the item more strongly [ItemRating].
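To check the weighting idea end to end, the sketch below runs a variant of the query in an in-memory SQLite database from Python. It uses the table and column names of the data model above (trimmed to the columns needed) with made-up sample rows. The current-user condition sits in the ON clause, so matching items with no friend activity are still returned with a score of 0, and it uses LIMIT, as MySQL does, instead of TOP.

```python
import sqlite3

def ranked_items(conn, current_user, item_name, limit=25):
    """Items matching the search, ranked by SUM(ItemRating * BuddyRating) over the user's friends."""
    sql = """
        SELECT I.ItemId, I.ItemName,
               COALESCE(SUM(UI.ItemRating * B.BuddyRating), 0) AS FriendScore
        FROM tblItems I
        LEFT JOIN tblUsersToItems UI ON I.ItemId = UI.ItemId
        LEFT JOIN tblBuddies B ON UI.UserId = B.FriendId
                              AND B.UserId = ?   -- friend filter kept in ON, not WHERE
        WHERE I.ItemName = ?                     -- the search criteria
        GROUP BY I.ItemId, I.ItemName
        ORDER BY FriendScore DESC, I.ItemId
        LIMIT ?
    """
    return conn.execute(sql, (current_user, item_name, limit)).fetchall()

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblItems (ItemId INTEGER PRIMARY KEY, ItemName TEXT);
        CREATE TABLE tblBuddies (UserId INTEGER, FriendId INTEGER, BuddyRating REAL);
        CREATE TABLE tblUsersToItems (UserId INTEGER, ItemId INTEGER, ItemRating REAL);
        INSERT INTO tblItems VALUES (1, 'MP3 Player'), (2, 'MP3 Player'),
                                    (3, 'MP3 Player'), (4, 'Camera');
        -- User 10 is friends with 20 (close) and 30 (less close); 40 is a stranger.
        INSERT INTO tblBuddies VALUES (10, 20, 0.9), (10, 30, 0.2);
        INSERT INTO tblUsersToItems VALUES (20, 2, 1.0), (30, 3, 1.0), (40, 1, 1.0);
    """)
    return ranked_items(conn, 10, "MP3 Player")
```

With this data, item 2 (used by the close friend) ranks first, item 3 second, and item 1 (used only by a stranger) still appears, last, with a score of 0.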
Optimizing such a query depends on the overall number of items, the average/max number of buddies a given user has, and the average/max number of items a user may have in his/her list.
Is this the kind of idea/info you are looking for, or am I missing the question?
One way is to store your entire social network graph separately from Lucene. Run your keyword query on Lucene, and also look up all of the user's friends in your network graph. Boost the search results belonging to those friends by some factor and re-sort. This re-sort is done outside Lucene. I've done things like this before, and it performs pretty well.
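The re-sort step can be sketched independently of Lucene: take the scored hits from the keyword query, look up the searcher's friends in the separately stored graph, multiply the friends' hits by a boost factor, and sort again. The hit shape `(doc_id, author_id, score)`, the dict-of-sets friend graph, and the default boost of 2.0 are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, int, float]  # (doc_id, author_id, score) from the keyword search

def rerank(hits: List[Hit], user_id: int, friends: Dict[int, Set[int]],
           boost: float = 2.0) -> List[Hit]:
    """Boost hits authored by the user's friends, then re-sort by score (outside Lucene)."""
    my_friends = friends.get(user_id, set())
    boosted = [
        (doc_id, author, score * boost if author in my_friends else score)
        for doc_id, author, score in hits
    ]
    return sorted(boosted, key=lambda h: h[2], reverse=True)
```

A multiplicative boost keeps relevance in play: a friend's weak match can still rank below a stranger's very strong one, which an additive or tiered scheme would change.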
You can also create a custom HitCollector that does the boosting as the hits are being collected in Lucene. You'd have to construct a list of the internal Lucene IDs that belong to the friends of the current user.
Your social network graph can be stored in MySQL, in memory as a sparse adjacency matrix, or you can take a look at Neo4j.
So for a new project, I'm building a system for an ecommerce site. The idea is to import products from suppliers and instead of inserting them directly into our catalog, we would store all the information in a staging area. Each supplier has their own stage (i.e. table in the database), and then I will flatten the multiple staging areas into a single entity (currently a single table but later on perhaps into Sphinx or Solr). Then our merchandisers would be able to search the staging products' relevant fields (name and description) and be shown a list of products that match and then choose to have those products pushed into the live catalog. The search will query on the single table (the flattened staging areas).
My design calls for storing only searchable and filterable fields in the single flattened table, e.g., name, description, supplier_id, supplier_prod_id, etc. The search queries will return only the IDs of the matching items plus a class (supplier_id) used to identify which staging area each product comes from.
Another senior engineer feels the flattened search table should include other meta fields (which would not be searched on), but could be used when 'pushing' the products from stage to live catalog. He also feels that the query should return all this other information.
I feel pretty strongly about only having searchable fields in the flattened table and having the search return only class/id pairs which could be used to fetch all the other necessary metadata about the product (simple select * from class_table where id in (1,2,3)).
Part of my reasoning is that this will make it easier later on to switch the flattened table from the database to a search server like Sphinx or Solr, and the rest of the code wouldn't have to change just because the implementation of the search changed.
Am I on the right path? How can I convince the other engineer that it is important to keep only searchable fields and return only IDs? Or, more specifically, why should a search application return only the IDs of objects?
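A minimal sketch of the proposed split, assuming SQLite and hypothetical table and column names (`flat_search`, `stage_id`, `stage_supplier_1`, ...): the search step returns only `(supplier_id, stage_id)` pairs, and a separate hydration step groups them by supplier and runs one `SELECT * ... WHERE id IN (...)` per staging table. Moving search to Sphinx or Solr would only mean replacing `search` with another function that returns the same pairs.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from supplier_id (the "class") to its staging table.
STAGE_TABLES: Dict[int, str] = {1: "stage_supplier_1", 2: "stage_supplier_2"}

def search(conn: sqlite3.Connection, text: str) -> List[Tuple[int, int]]:
    """Search the flattened table; return only (supplier_id, stage_id) pairs."""
    rows = conn.execute(
        "SELECT supplier_id, stage_id FROM flat_search "
        "WHERE name LIKE ? OR description LIKE ?",
        (f"%{text}%", f"%{text}%"),
    )
    return rows.fetchall()

def hydrate(conn: sqlite3.Connection, pairs: List[Tuple[int, int]]) -> List[tuple]:
    """Fetch full staging rows, one IN (...) query per staging table."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for supplier_id, stage_id in pairs:
        by_class[supplier_id].append(stage_id)
    results: List[tuple] = []
    for supplier_id, ids in by_class.items():
        table = STAGE_TABLES[supplier_id]  # whitelist lookup: never interpolate user input
        marks = ",".join("?" * len(ids))
        results += conn.execute(f"SELECT * FROM {table} WHERE id IN ({marks})", ids).fetchall()
    return results
```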
I think that you're on the right path. If those other fields provide no value either to uniquely identify a staged item or to allow the user to filter staged items, then the data is fundamentally useless until the item is pushed to the live environment. If the other engineer feels that the extra metadata will help the users make a more informed decision, then you might as well make those extra fields searchable (thereby meeting your stated purpose for the table(s)).
The only reason I could think of to pre-fetch that other, non-searchable data would be for a performance improvement on the push to the live environment.
You should use each tool for what it does best. A full text search engine, such as Solr or Sphinx, excels at searching textual fields and ranking the hits quickly. It has no special advantage in retrieving stored data in a select-like fashion. A database is optimized for that. So, yes, you are on the right path. Please see Search Engine versus DBMS for other issues involved in deciding what to store inside the search engine.
In the case of Sphinx, it only returns document IDs and named attributes back to you anyway (attributes being mostly numerical data). I'd say you've got the right idea, as the other metadata is just a simple JOIN away from the flattened table if you need it.
You can regard Solr as a powerful index: since an index gives IDs back, it would be logical for Solr to do the same.
You can use the Solr query parameter fl to ask for identifier-only results, for instance fl=id.
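For illustration, here is how such an identifier-only request could be built, and its JSON response reduced to a list of IDs. The host, port, and core name (`products`) are assumptions; `q`, `fl`, `rows`, `start`, and `wt` are standard Solr select parameters.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/products/select"  # hypothetical core

def ids_only_url(query: str, rows: int = 20, start: int = 0) -> str:
    params = {"q": query, "fl": "id", "rows": rows, "start": start, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

def ids_from_response(resp: dict) -> list:
    # With fl=id, each doc in response.docs carries only its id field.
    return [doc["id"] for doc in resp["response"]["docs"]]
```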
However, one feature does need Solr to give you back some data: highlighting of the search terms in the matched documents. If you don't need it, using Solr to retrieve only the identifiers is fine (I assume you need only the list of documents and no other features, such as facets, related documents, or spell checking).
That said, it does matter how you build your objects in your search function: either from the DB, using Solr only to retrieve IDs; from fields returned by Solr (provided they're stored); or from a mix of both. Think Solr for the highlighted content fields and the DB for the other ones. Again, if you don't need highlighting, this is not an issue.
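The mixed approach might look like this: ask Solr for IDs plus highlighting (`hl=true`, `hl.fl=description`), fetch the remaining fields from the database by ID, and use the highlighted snippet wherever Solr returned one. The response shape follows Solr's JSON format, in which `highlighting` maps each document ID to per-field lists of snippets; the DB rows and the `description` field are hypothetical.

```python
from typing import Dict, List

# Request parameters for IDs plus highlighted description snippets.
HL_PARAMS = {"fl": "id", "hl": "true", "hl.fl": "description", "wt": "json"}

def build_results(solr_resp: dict, db_rows: Dict[str, dict]) -> List[dict]:
    """Solr for IDs and highlighted snippets, DB for everything else; keep Solr's ranking."""
    highlights = solr_resp.get("highlighting", {})
    results = []
    for doc in solr_resp["response"]["docs"]:
        row = db_rows.get(doc["id"])
        if row is None:
            continue  # index out of sync with the DB: skip rather than fail
        row = dict(row)  # copy so the caller's DB rows are not mutated
        snippets = highlights.get(doc["id"], {}).get("description")
        if snippets:
            row["description"] = snippets[0]  # first highlighted fragment
        results.append(row)
    return results
```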
I'm using Solr with thousands of documents, but I only return the IDs, for the following reasons:
For Solr:
- If the index gets out of sync with the DB, it's not a big deal: at worst an item appears in the wrong place in the results, but the data shown are correct (especially important in your case, where displaying a wrong price could be a big issue).
- You save a lot of time when you don't ask Solr to return the 'description' of documents (I mean many lines of text).
For your DB:
- You can cache your results by ID, so it's even faster (you don't need to get all the data from Solr every time).
- You build your results the same way everywhere (you don't need one method to build HTML from Solr data and another one from your DB).
I think there are many more reasons...
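The caching point can be sketched as a small read-through cache keyed by ID: Solr supplies the ordered IDs, and only IDs missing from the cache are fetched from the database. `load_from_db` is a hypothetical callable that takes a list of IDs and returns a dict of rows; the plain-dict cache has no eviction or invalidation, which a real deployment would need.

```python
from typing import Callable, Dict, Iterable, List

class RowCache:
    def __init__(self, load_from_db: Callable[[List[str]], Dict[str, dict]]):
        self._load = load_from_db
        self._rows: Dict[str, dict] = {}

    def get_many(self, ids: Iterable[str]) -> List[dict]:
        """Return rows in the given (Solr) order, querying the DB only for cache misses."""
        ids = list(ids)
        missing = [i for i in ids if i not in self._rows]
        if missing:
            self._rows.update(self._load(missing))
        return [self._rows[i] for i in ids if i in self._rows]
```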