ElasticSearch Field boosting using java api - search

I am new to ES and am trying to search using the Java API. I am unable to figure out how to provide field-specific boosting through the Java API.
Here is the example:
My index document looks like:
"_source": {
"th_id": 1,
"th_name": "test name",
"th_description": "test desc",
"th_image": "test-img",
"th_slug": "Make-Me-Smart",
"th_show_title": "Coast Tech Podcast",
"th_sh_category": "Alternative Health"
}
When I search for keywords, I want documents to rank higher if the keyword is found in "th_name" than if it is found in one of the other fields.
Currently I am using the code below to search:
QueryBuilder qb1 = QueryBuilders.multiMatchQuery(keyword, "th_name", "th_description", "th_show_title", "th_sh_category");
SearchResponse response = client.prepareSearch("talk").setTypes("themes")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setQuery(qb1)
.setFrom(start).setSize(maxRows)
.setExplain(true).execute().actionGet();
Is there anything I can do at query time to boost a document when the keyword is found in the "th_name" field rather than in the other fields?

The accepted answer did not work for me. The ES version I am using is 6.2.4.
QueryBuilders.multiMatchQuery(keyword)
.field("th_name", 2.0f)
.field("th_description")
.field("th_show_title")
.field("content");
Hope it helps someone else.

Edit: This has changed and no longer works in ES 6.x and upwards.
You should also be able to boost a field directly in the Multi-match query:
"The multi_match query supports field boosting via ^ notation in the fields json field.
{
"multi_match" : {
"query" : "this is a test",
"fields" : [ "subject^2", "message" ]
}
}
In the above example hits in the subject field are 2 times more important than in the message field."
In the Java API, just use the MultiMatchQueryBuilder:
MultiMatchQueryBuilder builder =
new MultiMatchQueryBuilder( keyword, "th_name^2", "th_description", "th_show_title", "th_sh_category" );
Disclaimer: Not tested
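For reference, the ^ notation quoted above can be sketched as a plain request body. This is only an illustration of the JSON shape from the quoted documentation, written in JavaScript; the helper name buildMultiMatch and the boost values are assumptions, and it says nothing about how a particular Java client version treats "th_name^2".

```javascript
// Build a multi_match request body with per-field boosts using the ^ notation.
// Field names come from the question; buildMultiMatch is a hypothetical helper.
function buildMultiMatch(keyword, boosts) {
  // A boost of 1 is the default, so the suffix is only added for other values.
  const fields = Object.entries(boosts).map(([name, boost]) =>
    boost === 1 ? name : `${name}^${boost}`
  );
  return { query: { multi_match: { query: keyword, fields } } };
}

const body = buildMultiMatch("test name", {
  th_name: 2,
  th_description: 1,
  th_show_title: 1,
  th_sh_category: 1,
});

console.log(JSON.stringify(body, null, 2));
```

Here hits in th_name would count twice as much as hits in the other three fields, mirroring the subject^2 example in the quote.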

You can use "BoostingQuery"
http://www.elasticsearch.org/guide/reference/query-dsl/boosting-query.html
javadoc : https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/query/BoostingQueryBuilder.java


What is the use of "useMaster: true" in sequelize(nodejs)?

I was going through a project codebase (Node.js 10) that uses Sequelize to run queries against MySQL, and in some places it passes useMaster: true.
I am wondering what it is there for. Does anyone have any idea?
Example code:
[error , coupons ] = await to(#wagner.get('sequelize').query( query , { useMaster: true, replacements : [coupon_offset,parseInt(limit)] ,type: #wagner.get('sequelize').QueryTypes.SELECT } ))
Did you check the official documentation?
It says about useMaster in the Params section of query instance method:
Force the query to use the write pool, regardless of the query type.
See query method
I'm not sure whether it's applicable to MySQL, but I'd still recommend finding out more about write pools.

Cannot get documentid using pnp.sp.search in spfx app

In an older JavaScript app I used keyword-query to search for document properties, and I could add the 'DlcDocID' field (Document id) to be retrieved.
I am currently developing an SPFx version of the app, and I use pnp.sp.search to get document data. This way I can get the UniqueId and the DocId, but not the Document Id. How can I have this property included in the search results?
Extra:
I am using 1.3.11, and this code
pnp.sp.search(
{
Querytext:query,
RowLimit:rows,
StartRow:start,
SelectProperties: ["DocId"
, "UniqueId"
,"FileType"
,"ServerRedirectedEmbedURL"
, "ServerRedirectedPreviewURL"
,"LastModifiedTime"
,"Write"
,"Size"
,"SPWebUrl"
,"ParentLink"
,"Title"
,"HitHighlightedSummary"
,"Path"
,"Author"
,"LastModifiedTime"
,"DlcDocID"
],
But DlcDocID is never retrieved.
Looking at the docs, DlcDocID should be retrievable (it's queryable and retrievable by default). Have you tried using SearchQueryBuilder and selectProperties?
const q = SearchQueryBuilder().text(yourQuery)
.rowLimit(10).processPersonalFavorites.selectProperties('*', 'DlcDocID');
const results = await sp.search(q);
SearchQueryBuilder reference
The issue was that the pnp SearchResult interface didn't have the DlcDocID in this version. Adding it solved the problem.

Parse values from the JSON returned by URL of Google Place API

I am creating a bot using dialogflow-fulfillment, and I am using Google Place API to pull additional information about hospitals.
I have made a dummy response, for the sake of example, that is returned by Google Place API, here is the link: http://www.mocky.io/v2/5c2b9f9e3000007000abafe3
{
"candidates" : [
{
"formatted_address" : "140 George St, The Rocks NSW 2000, Australia",
"name" : "Museum of Contemporary Art Australia",
"photos" : [
{
"height" : 3492,
"html_attributions" : [
"\u003ca href=\"https://maps.google.com/maps/contrib/105784220914426417603/photos\"\u003eKeith Chung\u003c/a\u003e"
],
"photo_reference" : "CmRaAAAAaGaCX-kivNEaJ-z97AduTYgW3d98uv53-8skNrS1k1GTgOtiQ1-Z2gfWJydrpkrshuV_kHPKizl088dezEJgIxYGoTWqtJgah-u_I46qNNYMfUbk8LKBZqxzkHyIL1nWEhBO6lPa0NgvlyLGBrXpXFPUGhT0lAUj_oCiOWV2MEYdBeKf-kTtgg",
"width" : 4656
}
]
}
],
"status" : "OK"
}
I need to parse values of my choice from the JSON returned by the Google Place API. For example, if I had to parse the value of 'name' from the JSON above using Python, I would do this:
import requests, json
api_key = ''
r = requests.get('https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=Museum%20of%20Contemporary%20Art%20Australia&inputtype=textquery&fields=photos,formatted_address,name&key=' + api_key)
x = r.json()
y = x['candidates']
print(y[0]['name'])
The code above is clear and works perfectly. Since I am inexperienced with Node.js, could you please show me something similar in Node.js to parse a value, for instance the value of 'name'?
Your valuable reply will encourage me.
P.S: Humbly, the question involves first making a call to Google Place API and then parsing values from the returned JSON. Please follow the steps given in Python code above for better understanding.
Get the API response with an async HTTP request (there are plenty of npm libraries, such as request, that set headers and so on for you), then use the standard JSON.parse(body) to get a plain JavaScript object that contains a structured representation of the API response.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse
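A minimal sketch of that parsing step, mirroring the Python in the question. The sample body is trimmed from the question's example; the helper names are hypothetical, and fetchPlaceName is shown but not called because it assumes Node.js 18+ (global fetch) and a valid API key.

```javascript
// Sample response body, trimmed from the question's example.
const sampleBody = `{
  "candidates": [
    {
      "formatted_address": "140 George St, The Rocks NSW 2000, Australia",
      "name": "Museum of Contemporary Art Australia"
    }
  ],
  "status": "OK"
}`;

// Parse the raw response text and read the first candidate's name.
function firstCandidateName(bodyText) {
  const data = JSON.parse(bodyText);
  const candidates = data.candidates || [];
  return candidates.length > 0 ? candidates[0].name : undefined;
}

console.log(firstCandidateName(sampleBody));

// Not called here: assumes Node.js 18+ (global fetch) and a valid API key.
async function fetchPlaceName(apiKey) {
  const url =
    "https://maps.googleapis.com/maps/api/place/findplacefromtext/json" +
    "?input=" + encodeURIComponent("Museum of Contemporary Art Australia") +
    "&inputtype=textquery&fields=photos,formatted_address,name&key=" + apiKey;
  const response = await fetch(url);
  return firstCandidateName(await response.text());
}
```

The x['candidates'][0]['name'] lookup from the Python version becomes data.candidates[0].name once the body has been parsed.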

Simple solr query in order to search for a name in a field

I am trying to find all docs that have, for example, both Reddy and Kumar in the respondent_name field, but whenever I run this search I never get what I want.
I have tried it:
respondent_name:”Reddy Kumar”~3
/select?indent=on&q=respondent_name:%E2%80%9DReddy%20Kumar%E2%80%9D~3&rows=1000&wt=json
I did get results from this query, but most of them were docs that have the words Reddy and Kumar in any field, not docs that have these words in the respondent_name field.
I tried to follow this site but for some reason, it doesn't work for me:
https://dzone.com/articles/tips-name-search-solr
Json docs samples:
{
"db_id":["590dd7e5fa25c3080f0d706e"],
"main_title":["District Courts, Ananthapur"],
"respondent_name":["Erikela Sunkanna",
"The ManagerLegal CellICICI Lombard General Insurance Co.Ltd.",
"S. Chandrasekhar Reddy"],
"id":"9a3c1c41-a634-4df8-91e6-1a90072ebd9f",
"_version_":1574199934804033536
},
{
"db_id":["590dd7bbfa25c3080f0d6eb4"],
"main_title":["Jr . Civil Courts,Uravakonda"],
"respondent_name":["Nayika Chinnappayya",
"Nayika Chinna Reddy",
"Nayika Devi",
"Nayika Gowramma"],
"id":"65dff199-5e51-415e-9fa3-14693c74a6a9",
"_version_":1574199931583856640
},
{
"db_id":["590dd7d0fa25c3080f0d6f93"],
"main_title":["Jr . Civil Courts,Tadipatri"],
"respondent_name":["Y.Hari Kumar Reddy"],
"id":"a162347b-6af1-4c1f-b65b-3eee76f71469",
"_version_":1574199933688348672
}
Can anyone tell me what I am doing wrong?
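One thing that stands out in the URL above is %E2%80%9D, which is the encoding of a typographic quote (”) rather than a straight double quote ("). Only the straight quote delimits a phrase in a query, so this may be why the words end up being searched outside respondent_name. A sketch of building the same request with straight quotes, assuming that is the cause; buildSelectUrl is a hypothetical helper and the parameters are copied from the question:

```javascript
// Build the /select URL with straight quotes around the phrase.
// buildSelectUrl is a hypothetical helper; parameters mirror the question.
function buildSelectUrl(field, phrase, slop) {
  const q = `${field}:"${phrase}"~${slop}`;
  return `/select?indent=on&q=${encodeURIComponent(q)}&rows=1000&wt=json`;
}

const url = buildSelectUrl("respondent_name", "Reddy Kumar", 3);
console.log(url);

// For comparison: a typographic quote encodes to three bytes, a straight one to %22.
console.log(encodeURIComponent("\u201D"), encodeURIComponent('"'));
```

If the results still include matches from other fields, the field's analyzer and the default search field would be the next things to check.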

How to link/join multiple Lucene docs by AND operation

I am a beginner with Lucene and I am blocked by a search issue. We are developing an API that uses Lucene as the search engine for our application, and we have to run many queries that join different conditions.
We store many entities into lucene as individual documents.
Each entity arrives as a number of records, and each record is stored in Lucene as an individual doc. A sample of the data structure is below.
Serial nos. 1 --> 16 are docs in Lucene.
1) "id": "1","sendr_name": "sender1", "recip_name": "recipient1", "subject": "subject1"
2) "id": "1","attachment": "attachment1"
3) "id": "1","domain": "domain1", "ip": "ip1"
5) "id": "1","mid": "mid1"
6) "id": "1","type": "type1"
7) "id": "2","sendr_name": "sender1", "recip_name": "recipient1", "subject": "subject1"
8) "id": "2","attachment": "attachment2"
9) "id": "2","domain": "domain1", "ip": "ip2"
10) "id": "2","mid": "mid2"
11) "id": "2","type": "type2"
12) "id": "3","sendr_name": "sender1", "recip_name": "recipient3", "subject": "subject3"
13) "id": "3","attachment": "attachment3"
14) "id": "3","domain": "domain1", "ip": "ip3"
15) "id": "3","mid": "mid3"
16) "id": "3","type": "type3"
Note: serial nos. 1-16 are documents for different entities, and the field "id" is generated internally, so the user cannot supply an id value in a query.
My need is to extract specific entity or entities on specific condition.
+sendr_name:sender1 + recip_name:recipient1 +subject:subject1 +attachment:attachment1 +domain:domain1 +mid:mid1
This is to get an entity info(1-6 docs for an entity).
But the query above returns no results, because attachment, mid and domain are in different docs.
Is there any way to span an AND condition across multiple docs? Or is there any way to join a query on a field, like doc1.id = doc2.id?
I would appreciate any suggestions or help to solve this issue.
First of all, with plain Lucene, it's not recommended to store heterogeneous documents in the same index, as that can cause a multitude of other problems in the long run, including infrastructure problems.
Go through this SO Answer. You would be better off using a higher-level technology such as Solr or Elasticsearch, which is better able to handle the scenario you describe.
You have not shown any code, so it's not clear whether you are using Java or .NET, or which Lucene API version.
I am using Lucene 6.0 with Java, and I think it's achievable with a BooleanQuery as the top-level container.
public static BooleanQuery.Builder buildQuery(final SearchBean searchBean) {
BooleanQuery.Builder finalQuery = new BooleanQuery.Builder();
finalQuery.add(buildDoc1Query(searchBean).build(), Occur.SHOULD);
finalQuery.add(buildDoc2Query(searchBean).build(), Occur.SHOULD);
....
....
return finalQuery;
}
That is, first you build a query for each entity type, depending on what needs to be searched. SearchBean is a POJO that has all the searchable fields for all doc types combined.
private static BooleanQuery.Builder buildDoc1Query(SearchBean searchBean) {
// Collects the MUST clauses for the fields that live in the first document type.
BooleanQuery.Builder doc1MatchQuery = new BooleanQuery.Builder();
if (StringUtils.isNotEmpty(searchBean.getSender_name())) {
doc1MatchQuery.add(new BoostQuery(new TermQuery(new Term(AppConstants.SENDER_NAME, searchBean.getSender_name())), MatchingBooster.SENDER_NAME), BooleanClause.Occur.MUST);
}
if (StringUtils.isNotEmpty(searchBean.getRecip_name())) {
doc1MatchQuery.add(new BoostQuery(new TermQuery(new Term(AppConstants.RECIP_NAME, searchBean.getRecip_name())), MatchingBooster.RECIP_NAME), BooleanClause.Occur.MUST);
}
....
....
return doc1MatchQuery;
}
StringUtils is coming from Apache Commons library.
AppConstants contains indexed field names.
What is important here is BooleanClause.Occur.MUST in the child queries and Occur.SHOULD in the master query; that way you group the child queries into one master query.
So you will get something like (+sendr_name:sender1 + recip_name:recipient1 +subject:subject1) (+attachment:attachment1) and so on.
The above will give you doc1 and doc2.
You can remove the boosting part in the sample code above (BoostQuery) and use TermQuery directly.
Hope it helps, and let me know if I misunderstood your requirement.
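Note that a SHOULD-of-MUST query like this can also return docs from entities that satisfy only some of the groups, so the "doc1.id = doc2.id" part still has to be done on the hits afterwards. A minimal sketch of that post-processing step in plain JavaScript (not the Lucene API); the hit objects, condition groups and function names are assumptions modelled on the sample data in the question:

```javascript
// Hypothetical hits, shaped like the sample documents in the question.
const hits = [
  { id: "1", sendr_name: "sender1", recip_name: "recipient1", subject: "subject1" },
  { id: "1", attachment: "attachment1" },
  { id: "1", domain: "domain1", ip: "ip1" },
  { id: "1", mid: "mid1" },
  { id: "2", sendr_name: "sender1", recip_name: "recipient1", subject: "subject1" },
  { id: "2", attachment: "attachment2" },
  { id: "2", domain: "domain1", ip: "ip2" },
  { id: "2", mid: "mid2" },
];

// One condition group per document type; all fields in a group must match one doc.
const groups = [
  { sendr_name: "sender1", recip_name: "recipient1", subject: "subject1" },
  { attachment: "attachment1" },
  { domain: "domain1" },
  { mid: "mid1" },
];

function matches(doc, group) {
  return Object.entries(group).every(([field, value]) => doc[field] === value);
}

// Keep only the ids for which every condition group was satisfied by some doc.
function entityIdsMatchingAll(docs, conditionGroups) {
  const satisfied = new Map(); // id -> Set of satisfied group indexes
  for (const doc of docs) {
    conditionGroups.forEach((group, index) => {
      if (matches(doc, group)) {
        if (!satisfied.has(doc.id)) satisfied.set(doc.id, new Set());
        satisfied.get(doc.id).add(index);
      }
    });
  }
  return [...satisfied.entries()]
    .filter(([, set]) => set.size === conditionGroups.length)
    .map(([id]) => id);
}

console.log(entityIdsMatchingAll(hits, groups));
```

In this sample, entity 2 matches the sender and domain groups but not the attachment or mid groups, so only entity 1 survives the join.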
