Solr: list partial matches at the top (different scenario)

I am searching the field company (**which is n-grammed, because I need to fetch results on partial matches**) with the search text "aetnahmo". I can already bring exact matches and partial matches to the very top.
I need to handle a scenario such as the following.
Example: in the results below, I need to rank "AETNA BETTER HLTH PAHMO" and "AETNA BETTER HEALTH MAHMO" above "CIGNAHMO HEALTHPLAN - METH".
Even though these results do not contain 'aetnahmo', they do start with 'aetna'. I need to display results that start with this prefix below the exact and similar matches.
"docs": [
{
"company": "AETNAHMOGNPIPA",
"score": 0.32741508
},
{
"company": "AETNAHMOPOSOUT OF NETWORK",
"score": 0.32741508
},
{
"company": "CIGNAHMO HEALTHPLAN - METH",
"score": 0.14788051
},
{
"company": "CIGNAHMOPOSOZ08",
"score": 0.14500062
},
{
"company": "CIGNAHMOPOSGNPIPA",
"score": 0.14500062
},
{
"company": "HUMANAHMO MCD",
"score": 0.14500062
},
{
"company": "AETNA BETTER HLTH PAHMO",
"score": 0.1069743
},
{
"company": "AETNA BETTER HEALTH MAHMO",
"score": 0.1069743
},
{
"company": "MOLINA HLTHCARE IL PAHMO",
"score": 0.067287326
},
{
"company": "BCBSMAHMO OUTPT",
"score": 0.065203
}
]
Is there a way to achieve this? Please help.

Phrase boosting will work here.
You'll need to use the edismax query parser along with the pf (phrase fields) parameter.
The following params appended to your query should do the trick:
&defType=edismax&pf=company
I've tested this out on Solr 5.4.1 with the dataset that you've posted above, and the results are as follows:
Query: http://localhost:8983/solr/Test/select?q=company%3Aaetnamho&wt=json&indent=true&defType=edismax&pf=company&stopwords=true&lowercaseOperators=true&omitHeader=true
Response:
{
"response":{"numFound":9,"start":0,"docs":[
{
"company":"AETNAHMOPOSOUT OF NETWORK",
"id":2,
"_version_":1530885533005250560},
{
"company":"AETNA BETTER HEALTH MAHMO",
"id":8,
"_version_":1530885600368918528},
{
"company":"AETNA BETTER HLTH PAHMO",
"id":7,
"_version_":1530885592734236672},
{
"company":"AETNAHMOGNPIPA",
"id":1,
"_version_":1530885512290631680},
{
"company":"CIGNAHMO HEALTHPLAN - METH",
"id":3,
"_version_":1530885543046414336},
{
"company":"MOLINA HLTHCARE IL PAHMO",
"id":9,
"_version_":1530885608894889984},
{
"company":"CIGNAHMOPOSGNPIPA",
"id":5,
"_version_":1530885565434560512},
{
"company":"CIGNAHMOPOSOZ08",
"id":4,
"_version_":1530885555631423488},
{
"company":"HUMANAHMO MCD",
"id":6,
"_version_":1530885585061806080}]
}}
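If you build the request from code, the same parameters can be assembled with the standard URLSearchParams class. This is a minimal Node.js sketch: buildSolrQueryUrl is a hypothetical helper, and the host, core (Test) and field (company) are just the values from the example above; it uses the search text from the question.

```javascript
// Sketch: assemble a Solr select URL that uses edismax with phrase boosting.
// Host, core and field names are example values; substitute your own.
function buildSolrQueryUrl(baseUrl, core, field, term) {
  const params = new URLSearchParams({
    q: `${field}:${term}`,
    defType: 'edismax', // Extended DisMax query parser
    pf: field,          // phrase-boost documents matching on the same field
    wt: 'json',
  });
  return `${baseUrl}/solr/${core}/select?${params.toString()}`;
}

const url = buildSolrQueryUrl('http://localhost:8983', 'Test', 'company', 'aetnahmo');
console.log(url);
// http://localhost:8983/solr/Test/select?q=company%3Aaetnahmo&defType=edismax&pf=company&wt=json
```

URLSearchParams takes care of percent-encoding (the ":" in the query becomes %3A), so the term can be passed in as typed by the user.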


elasticsearch doesn't find results when searching the exact term

I am using the elasticsearch module in my Node.js app to query my index using fuzzy completion. The text I'm trying to search is Rome–Fiumicino Leonardo da Vinci International Airport. When searching this term I get no results, but if I cut the term to 50 characters it finds it and returns results.
const result = await elasticsearch.search({
index: 'myIndex',
body: {
suggest: {
fuzzinessZero: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 0,
},
contexts,
},
},
fuzzinessOne: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 1,
},
contexts,
},
},
fuzzinessTwo: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 2,
},
contexts,
},
},
},
}
})
This is the result I get in fuzzinessOne
As you can see, the result in the text field is cut to 50 characters (maybe that's the issue). Inside _source I get back all the inputs used for the search; one of them is the full exact term I tried to search, along with all the other available combinations.
It is worth mentioning that I'm using AWS OpenSearch.
These are the settings I use to create the index:
settings: {
analysis: {
filter: {
autocomplete_filter: {
type: 'edge_ngram',
min_gram: 2,
max_gram: 20,
},
shingle_filter: {
type: 'shingle',
max_shingle_size: 3,
},
},
analyzer: {
autocomplete: {
type: 'custom',
tokenizer: 'standard',
filter: ['lowercase', 'shingle_filter', 'asciifolding'],
},
},
},
}
You are facing this issue because the default value of the max_input_length parameter is 50.
Below is the description of this parameter from the documentation:
Limits the length of a single input, defaults to 50 UTF-16 code points. This limit is only used at index time to reduce the total number of characters per input string in order to prevent massive inputs from bloating the underlying datastructure. Most use cases won’t be influenced by the default value since prefix completions seldom grow beyond prefixes longer than a handful of characters.
You can keep this default behaviour, or you can update your index mapping with a higher max_input_length value and reindex your data, for example:
{
"mappings": {
"dynamic": "false",
"properties": {
"namesuggest": {
"type": "completion",
"analyzer": "keyword_lowercase_analyzer",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 100,
"contexts": [
{
"name": "searchable",
"type": "CATEGORY"
}
]
}
}
},
"settings": {
"index": {
"mapping": {
"ignore_malformed": "true"
},
"refresh_interval": "5s",
"analysis": {
"analyzer": {
"keyword_lowercase_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
}
}
},
"number_of_replicas": "0",
"number_of_shards": "1"
}
}
}
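To see why the full 54-character name finds nothing while its first 50 characters do, here is a deliberately simplified model in JavaScript. It is an assumption for illustration only, not how Elasticsearch or OpenSearch actually store completion inputs (it ignores the analyzer, the underlying FST and fuzziness); indexInput and isSuggested are hypothetical names.

```javascript
// Simplified model (assumption, not the real implementation): each completion
// input is cut to maxInputLength UTF-16 code units at index time, and a
// suggestion is returned when the stored input starts with the query text.
function indexInput(input, maxInputLength = 50) {
  // String.prototype.slice counts UTF-16 code units, like the documented limit
  return input.slice(0, maxInputLength).toLowerCase();
}

function isSuggested(storedInput, queryText) {
  return storedInput.startsWith(queryText.toLowerCase());
}

const airportName = 'Rome–Fiumicino Leonardo da Vinci International Airport';
console.log(airportName.length); // 54
console.log(isSuggested(indexInput(airportName), airportName)); // false: only 50 units were stored
console.log(isSuggested(indexInput(airportName), airportName.slice(0, 50))); // true
console.log(isSuggested(indexInput(airportName, 100), airportName)); // true after raising the limit
```

In this model, a query longer than the stored (truncated) input can never be a prefix of it, which matches the behaviour described in the question.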

PouchDB/CouchDB Group By Value in Array

I am using PouchDB and I have a dataset representing a social network as a graph. People are documents, and the people each person follows are stored in an array of their _ids. Here is a sample of the data:
[
{
"_id": "mc0001",
"name": "Jill Martin",
"joined": "2020-01-15",
"follows": []
},
{
"_id": "mc0002",
"name": "Elena Markova",
"joined": "2020-01-21",
"follows": ["mc0001"]
},
{
"_id": "mc0003",
"name": "Carlos Sanchez",
"joined": "2020-01-27",
"follows": ["mc0001", "mc0002"]
},
{
"_id": "mc0004",
"name": "Ai Sato",
"joined": "2020-02-21",
"follows": ["mc0001", "mc0003"]
},
{
"_id": "mc0005",
"name": "Ming Wu",
"joined": "2020-03-21",
"follows": ["mc0002", "mc0003", "mc0004"]
}
]
What I would like to do is query for each person and get a list of their followers. I am looking for something like this:
[
{
"_id": "mc0001",
"followers": ["mc0002", "mc0003", "mc0004"]
},
{
"_id": "mc0002",
"followers": ["mc0003", "mc0005"]
},
{
"_id": "mc0003",
"followers": ["mc0004", "mc0005"]
},
{
"_id": "mc0004",
"followers": ["mc0005"]
},
{
"_id": "mc0005",
"followers": []
}
]
Is there a way to do this without changing the data structure (e.g. moving the followers array into the doc of the person being followed)?
Create a map/reduce view that loops through the follows array in each document and emits each entry, like this:
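function (doc) {
  // Skip documents without a follows array
  if (!Array.isArray(doc.follows)) return;
  // One row per followed user: key = followed user, row id = follower
  for (var i = 0; i < doc.follows.length; i++) {
    emit(doc.follows[i], null);
  }
}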
You end up with an index keyed on the followed user, where each row's id is the _id of one of that user's followers. You can then query the index, supplying the key of the user whose followers you want to find, like this:
$URL/users/_design/users/_view/by-follower?key="mc0001"&reduce=false
You will get something like this:
{"total_rows":8,"offset":0,"rows":[
{"id":"mc0002","key":"mc0001","value":null},
{"id":"mc0003","key":"mc0001","value":null},
{"id":"mc0004","key":"mc0001","value":null}
]}
This is not exactly the format of the data you have in your question, but you can see that the id field in each object contains a follower of your desired user, so you can go from there.
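To get from the view rows to the exact shape in the question, the rows only need to be grouped by key. A minimal sketch in plain JavaScript: mapFollows stands in for the view (it emits the same key/id pairs) so the example runs without a database, and toFollowerLists is a hypothetical helper. With PouchDB, the rows would instead come from querying the whole view once, e.g. the rows array returned by db.query('users/by-follower'), assuming the design doc and view names from the URL above.

```javascript
// Stand-in for the by-follower view: key = followed user, id = follower.
function mapFollows(docs) {
  const rows = [];
  for (const doc of docs) {
    for (const followed of doc.follows || []) {
      rows.push({ id: doc._id, key: followed, value: null });
    }
  }
  return rows;
}

// Group view rows into [{ _id, followers }], one entry per user.
function toFollowerLists(docs, rows) {
  // Seed every user so people nobody follows still get an empty array
  const followersByUser = new Map(docs.map((doc) => [doc._id, []]));
  for (const row of rows) {
    if (followersByUser.has(row.key)) followersByUser.get(row.key).push(row.id);
  }
  return Array.from(followersByUser, ([_id, followers]) => ({ _id, followers }));
}

// The sample users from the question (names and dates omitted)
const users = [
  { _id: 'mc0001', follows: [] },
  { _id: 'mc0002', follows: ['mc0001'] },
  { _id: 'mc0003', follows: ['mc0001', 'mc0002'] },
  { _id: 'mc0004', follows: ['mc0001', 'mc0003'] },
  { _id: 'mc0005', follows: ['mc0002', 'mc0003', 'mc0004'] },
];
console.log(JSON.stringify(toFollowerLists(users, mapFollows(users))));
```

For the sample data this produces the five { _id, followers } entries listed in the question, including an empty followers array for mc0005.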

Not able to query for nested relations using dgraph-orm

I am using dgraph-orm to fetch nested relational values, but it only works for a single level of nesting, not for multiple levels.
I am getting the page details but unable to fetch the avatar of the page.
Here is my snippet:
let posts = await PagePost.has('page_id', {
filter: {
page_id: {
uid_in: [page_id]
}
},
include: {
page_id: {
as: 'page',
include: {
avatar: {
as: 'avatar'
}
}
},
owner_id: {
as: 'postedBy'
}
},
order: [], // accepts order like the above example
first: perPage, // accepts first
offset: offset, // accepts offset
});
I am not getting the avatar for the page_id attribute:
{
"uid": "0x75b4",
"title": "",
"content": "haha",
"created_at": "2019-09-23T08:50:52.957Z",
"status": true,
"page": [
{
"uid": "0x75ac",
"name": "Saregamaapaaaa...",
"description": "This a is place where you can listen ti thrilling music.",
"created_at": "2019-09-23T06:46:50.756Z",
"status": true
}
],
"postedBy": [
{
"uid": "0x3",
"first_name": "Mohit",
"last_name": "Talwar",
"created_at": "2019-07-11T11:37:33.853Z",
"status": true
}
]
}
Is there support for multi-level field querying in the ORM?
There was an issue in the ORM itself: it was not able to recognize the correct model name for multi-level includes, so it generated the wrong queries.
This is fixed in version 1.2.4; please run npm update dgraph-orm --save to update your DgraphORM.
Thanks for the issue.

Mongoose: how to find a specific value

After a lot of reading, I am stuck.
The document I posted here is from a store database I am trying to build.
Every store has some fields. I am interested in the items array, which contains JSON objects.
I want to filter the items with three filters: first by the store ID, then by the category ID, and finally by the semi-category ID.
The data comes from the front end, meaning I supply the StoreID, the CategoryID and the SemiCategoryID.
After the back end receives the data, I expect to get back only the items that match the values supplied by the front end.
{
"_id": {
"$oid": "5a1844b5685cb50a38adf5bb" --> **ID of the STORE**
},
"name": "ACE",
"user_id": "59e4c41105d1f6227c1771ea",
"imageURL": "none",
"rating": 5,
"items": [
{
"name": "NirCohen",
"categoryID": "5a0c2d292235680012bd12c9",
"semiCatID": "5a0c2d5a2235680012bd12ca",
"_id": {
"$oid": "5a1958181cd8a208882a80f9"
}
},
{
"name": "he",
"categoryID": "5a0c2d292235680012bd12c9",
"semiCatID": "5a0c2d5a2235680012bd12ca",
"_id": {
"$oid": "5a1973c40e561e08b8aaf2b2"
}
},
{
"name": "a",
"categoryID": "5a0c2d292235680012bd12c9",
"semiCatID": "5a0c2d5a2235680012bd12ca",
"_id": {
"$oid": "5a197439bc1310314c4c583b"
}
},
{
"name": "aaa",
"categoryID": "5a0c2d292235680012bd12c9",
"semiCatID": "5a0c2d5a2235680012bd12ca",
"_id": {
"$oid": "5a197474558a921bb043317b"
}
}
],
"__v": 9
}
I want the back end to return the items filtered according to the query.
The problem is that I can't manage to get the query right.
Your help will be much appreciated,
thank you in advance.
If I understand you correctly, you are doing something like this:
Store.find({}).then(..);
If you only want to find the stores that contain an item whose categoryID is equal to the variable myCategory, you could filter them by using dot notation on the items array:
Store.find({ 'items.categoryID': myCategory }).then(..);
Please let me know if this is not what you are after, then we can keep trying to figure this out together.
EDIT: So you are sending the variables StoreID, CategoryID and SemiCategoryID from the front end, receiving them in the back end, and want to filter your collection on all three fields?
If so, then I think all you have to do is change your current query:
store.findOne({ _id: req.body.userID }, (err, store) => { console.log(store); });
To something like:
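store.findOne({
  _id: req.body.StoreID, // the store's own _id
  // $elemMatch requires both ids to match on the same item
  items: {
    $elemMatch: {
      categoryID: req.body.CategoryID,
      semiCatID: req.body.SemiCategoryID
    }
  }
}, (err, result) => { console.log(result); });
This way, the store you get back from Mongo must match all three criteria sent from the front end: its _id is the StoreID, and at least one of its items has both the CategoryID and the SemiCategoryID. Note that the whole store document, with all of its items, is returned.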
As far as I understood your question, you can use findById:
Store.findById(storeID).then(..);
or
Store.findOne({ _id: new mongoose.Types.ObjectId(storeID) }).then(..);
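If the goal is to return only the matching items rather than the whole store, one option is to fetch the store by its _id (for example with Store.findById(storeID).lean()) and filter the items array in Node. A minimal sketch: filterStoreItems is a hypothetical helper, and the third sample item is made up to show something being filtered out. Doing the filtering inside MongoDB instead would mean an aggregation with a $filter stage.

```javascript
// Keep only the items matching both the category and the semi-category.
// String() makes the comparison work whether the ids are stored as strings
// (as in the question) or as ObjectIds.
function filterStoreItems(store, categoryID, semiCatID) {
  if (!store || !Array.isArray(store.items)) return [];
  return store.items.filter(
    (item) =>
      String(item.categoryID) === String(categoryID) &&
      String(item.semiCatID) === String(semiCatID)
  );
}

// Trimmed-down store document; the last item is hypothetical
const storeDoc = {
  name: 'ACE',
  items: [
    { name: 'NirCohen', categoryID: '5a0c2d292235680012bd12c9', semiCatID: '5a0c2d5a2235680012bd12ca' },
    { name: 'he', categoryID: '5a0c2d292235680012bd12c9', semiCatID: '5a0c2d5a2235680012bd12ca' },
    { name: 'other', categoryID: '5a0c2d292235680012bd12c9', semiCatID: 'someOtherSemiCatID' },
  ],
};
console.log(filterStoreItems(storeDoc, '5a0c2d292235680012bd12c9', '5a0c2d5a2235680012bd12ca').length); // 2
```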

MongoDB: Query model and check if document contains object or not, then mark / group result

I have a model called Post, which contains an array property with the ids of the users who have liked the post.
Now I need to query the Post model and mark each returned result with likedBySelf true/false for use by the client. Is this possible?
I don't have to store the likedBySelf property in the database, just modify the results to include it.
A temporary solution I found was to run two queries, one that finds the posts liked by user x and one that finds the posts not liked by user x, then map over each result (setting likedBySelf to true/false), combine the two arrays and return the combined array. But this limits other query functions such as limit and skip.
So now my queries look like this:
var notLikedByQuery = Post.find({likedBy: {$ne: req.body.user._id}})
var likedByQuery = Post.find({likedBy: req.body.user._id})
(I'm using the Mongoose lib)
PS. A typical post can look like this (JSON):
{
"_id": {
"$oid": "55fc463c83b2d2501f563544"
},
"__t": "Post",
"groupId": {
"$oid": "55fc463c83b2d2501f563545"
},
"inactiveAfter": {
"$date": "2015-09-25T17:13:32.426Z"
},
"imageUrl": "https://hootappprodstorage.blob.core.windows.net/devphotos/55fc463b83b2d2501f563543.jpeg",
"createdBy": {
"$oid": "55c49e2d40b3b5b80cbe9a03"
},
"inactive": false,
"recentComments": [],
"likes": 8,
"likedBy": [
{
"$oid": "558b2ce70553f7e807f636c7"
},
{
"$oid": "559e8573ed7c830c0a677c36"
},
{
"$oid": "559e85bced7c830c0a677c43"
},
{
"$oid": "559e854bed7c830c0a677c32"
},
{
"$oid": "559e85abed7c830c0a677c40"
},
{
"$oid": "55911104be2f86e81d0fb573"
},
{
"$oid": "559e858fed7c830c0a677c3b"
},
{
"$oid": "559e8586ed7c830c0a677c3a"
}
],
"location": {
"type": "Point",
"coordinates": [
10.01941398718396,
60.96738099591897
]
},
"updatedAt": {
"$date": "2015-09-22T08:45:41.480Z"
},
"createdAt": {
"$date": "2015-09-18T17:13:32.426Z"
},
"__v": 8
}
@tskippe you can use a method like the following to determine whether the post is liked by the user, and call the function wherever you need it.
var processIsLiked = function (postId, userId, doc, next) {
  // lean() returns a plain JS object, so fields added to it are never saved
  var q = Post.findById(postId);
  q.lean().exec(function (err, post) {
    if (err) return utils.handleErr(err, post); // your own error handler
    if (post) {
      // likedBy holds ObjectIds, so compare them as strings
      post.isLiked = (post.likedBy || []).some(function (id) {
        return String(id) === String(userId);
      });
    }
    doc.post = post;
    next(doc);
  });
};
Because you are using q.lean(), you get a plain object back and don't need to persist anything. You just process it, add an isLiked field to the post and send back the response. Note that we are manipulating doc directly. You can also tweak it to accept a doc containing an array of posts, iterate over it and attach an isLiked field to each post.
I found that MongoDB's aggregation framework with a $project stage was my best bet, so I wrote an aggregation like this.
Explanation:
I want to keep the entire document, but the purpose of $project is to reshape documents, so you have to specify the properties you want to keep. A simple way of keeping all of them is to use "$$ROOT".
So I define a $project stage that puts all the original properties under doc: "$$ROOT", then creates a new property likedBySelf, which is true if the specified USERID is in the likedBy array and false otherwise.
I think this is cleaner and simpler than querying every single model after a query to set a likedBySelf flag. It may not be faster, but it's cleaner.
Model.aggregate([
  { $project: {
    doc: "$$ROOT",
    likedBySelf: {
      $cond: {
        // USERID must be an ObjectId: aggregate() does not cast strings like find() does
        "if": { "$setIsSubset": [
          [USERID],
          "$likedBy"
        ]},
        "then": true,
        "else": false
      }
    }
  }}
]);
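As an alternative to the aggregation, the flag can also be added in Node after a single lean() query, which keeps skip() and limit() usable on that one query (the limitation mentioned in the question). A minimal sketch: markLikedBySelf is a hypothetical helper and the sample posts p1/p2 are made up; in an app, posts would come from something like Post.find(query).skip(skip).limit(limit).lean().

```javascript
// Return copies of the posts with a likedBySelf flag for the given user.
// likedBy may hold ObjectIds, so ids are compared as strings.
function markLikedBySelf(posts, userId) {
  const me = String(userId);
  return posts.map((post) => ({
    ...post,
    likedBySelf: (post.likedBy || []).some((id) => String(id) === me),
  }));
}

const samplePosts = [
  { _id: 'p1', likedBy: ['558b2ce70553f7e807f636c7', '559e8573ed7c830c0a677c36'] },
  { _id: 'p2', likedBy: [] },
];
console.log(markLikedBySelf(samplePosts, '558b2ce70553f7e807f636c7').map((p) => p.likedBySelf)); // [ true, false ]
```

Unlike the $$ROOT projection, this keeps each post's fields at the top level instead of nesting them under doc.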
