How to link/join multiple Lucene docs by AND operation

I am a beginner with Lucene and I am blocked by a search issue. We are developing an API that uses Lucene as the search engine for our application, and we have to run many queries that join different conditions.
We store many entities in Lucene.
Each entity arrives as a number of records, and each record is stored in Lucene as an individual document. A sample of the data structure is below.
Serial numbers 1 --> 16 are documents in Lucene.
1) "id": "1","sendr_name": "sender1", "recip_name": "recipient1", "subject": "subject1"
2) "id": "1","attachment": "attachment1"
3) "id": "1","domain": "domain1", "ip": "ip1"
5) "id": "1","mid": "mid1"
6) "id": "1","type": "type1"
7) "id": "2","sendr_name": "sender1", "recip_name": "recipient1", "subject": "subject1"
8) "id": "2","attachment": "attachment2"
9) "id": "2","domain": "domain1", "ip": "ip2"
10) "id": "2","mid": "mid2"
11) "id": "2","type": "type2"
12) "id": "3","sendr_name": "sender1", "recip_name": "recipient3", "subject": "subject3"
13) "id": "3","attachment": "attachment3"
14) "id": "3","domain": "domain1", "ip": "ip3"
15) "id": "3","mid": "mid3"
16) "id": "3","type": "type3"
Note: serial numbers 1-16 are documents belonging to different entities, and the field "id" is generated internally, so the user cannot supply an id value in a query.
I need to extract a specific entity, or several entities, matching specific conditions, for example:
+sendr_name:sender1 +recip_name:recipient1 +subject:subject1 +attachment:attachment1 +domain:domain1 +mid:mid1
This should return the information for one entity (documents 1-6).
But the query above returns no results, because attachment, mid and domain are stored in different documents.
Is there any way to make an AND condition span multiple documents? Or is there any way to join a query on a field, like doc1.id = doc2.id?
Any suggestions or help to solve this issue would be appreciated.

First of all, with plain Lucene it's not recommended to store heterogeneous documents in the same index, as that can cause a multitude of problems in the long run, including infrastructure problems.
Go through this SO Answer. You would be better off using a higher-level technology such as Solr or Elasticsearch, which are better able to handle the scenario you describe.
You have not shown any code, so it's not clear whether you are using Java or .NET, or which Lucene API version.
I am using Lucene 6.0 with Java, and I think it's achievable with a BooleanQuery as the top-level container.
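import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;

// Top-level query: one SHOULD clause per document type.
public static BooleanQuery.Builder buildQuery(final SearchBean searchBean) {
    BooleanQuery.Builder finalQuery = new BooleanQuery.Builder();
    finalQuery.add(buildDoc1Query(searchBean).build(), Occur.SHOULD);
    finalQuery.add(buildDoc2Query(searchBean).build(), Occur.SHOULD);
    // ....
    // ....
    return finalQuery;
}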
That is, you first build one query per document type, depending on which fields need to be searched. SearchBean is a POJO that holds all the searchable fields for all document types combined.
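import org.apache.commons.lang3.StringUtils; // commons-lang3 assumed
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.TermQuery;

// Child query for the first document type: every supplied field becomes a MUST clause.
private static BooleanQuery.Builder buildDoc1Query(SearchBean searchBean) {
    BooleanQuery.Builder doc1MatchQuery = new BooleanQuery.Builder();
    if (StringUtils.isNotEmpty(searchBean.getSender_name())) {
        doc1MatchQuery.add(new BoostQuery(new TermQuery(new Term(AppConstants.SENDER_NAME, searchBean.getSender_name())), MatchingBooster.SENDER_NAME), BooleanClause.Occur.MUST);
    }
    if (StringUtils.isNotEmpty(searchBean.getRecip_name())) {
        doc1MatchQuery.add(new BoostQuery(new TermQuery(new Term(AppConstants.RECIP_NAME, searchBean.getRecip_name())), MatchingBooster.RECIP_NAME), BooleanClause.Occur.MUST);
    }
    // ....
    // ....
    return doc1MatchQuery;
}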
StringUtils comes from the Apache Commons library.
AppConstants contains the indexed field names.
What is important here is BooleanClause.Occur.MUST in the child queries and Occur.SHOULD in the master query; that way you group the child queries into one master query.
So you will get something like (+sendr_name:sender1 +recip_name:recipient1 +subject:subject1) (+attachment:attachment1) and so on.
The above will give you doc1 and doc2.
You can remove the boosting part in the sample code above (BoostQuery) and use TermQuery directly.
Hope it helps, and let me know if I misunderstood your requirement.
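A note on semantics: a master query made only of SHOULD clauses matches any document that satisfies at least one child group, so on its own it does not require that every group matched for the same id. One possible way to get the AND across documents is to collect the ids matched by each group and intersect them. The sketch below shows that idea in plain Java, with in-memory maps standing in for Lucene documents; the class and method names are hypothetical, and against a real index each idsMatching call would be one Lucene search.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: plain maps stand in for Lucene documents and searches.
// Assumes every document carries an "id" field.
final class EntityJoinSketch {

    // Ids of all documents that contain every field/value pair of one clause group.
    static Set<String> idsMatching(List<Map<String, String>> docs, Map<String, String> group) {
        Set<String> ids = new TreeSet<>();
        for (Map<String, String> doc : docs) {
            if (doc.entrySet().containsAll(group.entrySet())) {
                ids.add(doc.get("id"));
            }
        }
        return ids;
    }

    // AND across documents: keep only the ids that satisfy every group.
    static Set<String> entityIds(List<Map<String, String>> docs, List<Map<String, String>> groups) {
        Set<String> result = null;
        for (Map<String, String> group : groups) {
            Set<String> ids = idsMatching(docs, group);
            if (result == null) {
                result = ids;
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // A subset of the sample data from the question.
    static List<Map<String, String>> sampleDocs() {
        return List.of(
            Map.of("id", "1", "sendr_name", "sender1", "recip_name", "recipient1", "subject", "subject1"),
            Map.of("id", "1", "attachment", "attachment1"),
            Map.of("id", "1", "domain", "domain1", "ip", "ip1"),
            Map.of("id", "1", "mid", "mid1"),
            Map.of("id", "2", "sendr_name", "sender1", "recip_name", "recipient1", "subject", "subject1"),
            Map.of("id", "2", "attachment", "attachment2"),
            Map.of("id", "2", "domain", "domain1", "ip", "ip2"),
            Map.of("id", "2", "mid", "mid2"));
    }

    // One group per document type, mirroring the query in the question.
    static List<Map<String, String>> sampleGroups() {
        return List.of(
            Map.of("sendr_name", "sender1", "recip_name", "recipient1", "subject", "subject1"),
            Map.of("attachment", "attachment1"),
            Map.of("domain", "domain1"),
            Map.of("mid", "mid1"));
    }

    public static void main(String[] args) {
        System.out.println(entityIds(sampleDocs(), sampleGroups())); // prints [1]
    }
}
```

With the surviving ids in hand, a second lookup by id would return all documents of the matching entities. The cost is one search per group plus the final fetch, which may or may not be acceptable depending on index size.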

Related

Get exception details from Azure Monitor Workbook that deals with multiple app insight instances

I am working on creating a workbook that provides an umbrella view over multiple Application Insights instances. Our solution has many microservices (Azure Functions), each with its own Application Insights instance. The aim of this workbook is to provide a health status for the whole app by surfacing errors across Application Insights instances in a single view.
I have used the "Failure Analysis" template to set this up. The user can select different Application Insights instances at the top, and the views filter based on that selection. One view shows exception counts with trends. Each error may belong to a different Application Insights instance.
When you click on a line item, all instances of that error are shown in a following view.
I use the following query to load it:
let row = dynamic({Row});
let req = requests
    | where '{Row}' == '{}' or (row.Kind == 'Application' and row.Id == appName) or (row.Kind == 'Request' and row.Id == strcat(appName, "::", name))
    | where success == "False";
let errors = exceptions
    | where appName == appName
    | where timestamp between({TimeRange:start}..{TimeRange:end});
errors
| join req on operation_Id
| project operation_Id, itemId, timestamp, requestName=name, exception=type, method, outerMessage, innermostMessage, details, appName
As mentioned in the question "Get exception details from a Azure Monitor Workbook", the itemId is available, and I try to link it to the "Exception Details" view. Please note that these errors can come from any one of many Application Insights instances, depending on what was selected in the previous view.
I have configured the itemId and appName columns using the Link renderer and the Automatic renderer.
However, the link always directs to one specific Application Insights instance (not the one associated with the error), and hence the error does not load. Is it possible to load the "Detail Views" across Application Insights instances using this technique? If not, what other avenues are there?

When the Application Insights resource is workspace-based, each telemetry item has a field _ResourceId that contains a link to the resource, for example:
/subscriptions/c8vfbeab-a5a67-4272-aa6e-4c9f4142e962/resourcegroups/rg-my-resource-group/providers/microsoft.insights/components/my-ai-resource
You can use this as part of the URL to create a deep link to the details page of the Application Insights resource that owns the telemetry item with the specified item id. Take this query for example:
exceptions
| take 1
| extend portalUrl = strcat("https://portal.azure.com/#blade/AppInsightsExtension/BladeRedirect/BladeName/searchV1/ResourceId/", url_encode(_ResourceId), "/BladeInputs/%7B%22tables%22%3A%5B%22availabilityResults%22%2C%22requests%22%2C%22exceptions%22%2C%22pageViews%22%2C%22traces%22%2C%22customEvents%22%2C%22dependencies%22%5D%2C%22timeContextWhereClause%22%3A%22%7C%20where%20timestamp%20%3E%20datetime(%5C%222022-02-12T12%3A55%3A02.739Z%5C%22)%20and%20timestamp%20%3C%20datetime(%5C%222022-03-14T12%3A55%3A02.739Z%5C%22)%22%2C%22filterWhereClause%22%3A%22%7C%20where%20*%20has%20%5C%22a1a20ad1a12ff348a852288a4d9953a5%5C%22%7C%20order%20by%20timestamp%20desc%22%2C%22originalParams%22%3A%7B%22eventTypes%22%3A%5B%7B%22value%22%3A%22availabilityResult%22%2C%22tableName%22%3A%22availabilityResults%22%2C%22label%22%3A%22Availability%22%7D%2C%7B%22value%22%3A%22request%22%2C%22tableName%22%3A%22requests%22%2C%22label%22%3A%22Request%22%7D%2C%7B%22value%22%3A%22exception%22%2C%22tableName%22%3A%22exceptions%22%2C%22label%22%3A%22Exception%22%7D%2C%7B%22value%22%3A%22pageView%22%2C%22tableName%22%3A%22pageViews%22%2C%22label%22%3A%22Page%20View%22%7D%2C%7B%22value%22%3A%22trace%22%2C%22tableName%22%3A%22traces%22%2C%22label%22%3A%22Trace%22%7D%2C%7B%22value%22%3A%22customEvent%22%2C%22tableName%22%3A%22customEvents%22%2C%22label%22%3A%22Custom%20Event%22%7D%2C%7B%22value%22%3A%22dependency%22%2C%22tableName%22%3A%22dependencies%22%2C%22label%22%3A%22Dependency%22%7D%5D%2C%22timeContext%22%3A%7B%22durationMs%22%3A2592000000%7D%2C%22filter%22%3A%5B%5D%2C%22searchPhrase%22%3A%7B%22originalPhrase%22%3A%22", itemId,"%22%2C%22_tokens%22%3A%5B%7B%22conjunction%22%3A%22and%22%2C%22value%22%3A%22a1a20ad1a12ff348a852288a4d9953a5%22%2C%22isNot%22%3Afalse%2C%22kql%22%3A%22%20*%20has%20%5C%22a1a20ad1a12ff348a852288a4d9953a5%5C%22%22%7D%5D%7D%2C%22sort%22%3A%22desc%22%7D%7D")
| project timestamp, problemId, itemId, portalUrl
If you create a workbook and render a table based on the query above, you have to configure the itemId column to be rendered as a link whose URL comes from the portalUrl column.
Clicking the itemId column should then open the details page of the correct Application Insights resource.
I hope this gives you enough clues to extend your own queries by including the deep link URL in the query output.
For completeness, this is the full workbook Gallery Template:
{
"version": "Notebook/1.0",
"items": [
{
"type": 3,
"content": {
"version": "KqlItem/1.0",
"query": "exceptions\n| take 1\n| extend portalUrl = strcat(\"https://portal.azure.com/#blade/AppInsightsExtension/BladeRedirect/BladeName/searchV1/ResourceId/\", url_encode(_ResourceId), \"/BladeInputs/%7B%22tables%22%3A%5B%22availabilityResults%22%2C%22requests%22%2C%22exceptions%22%2C%22pageViews%22%2C%22traces%22%2C%22customEvents%22%2C%22dependencies%22%5D%2C%22timeContextWhereClause%22%3A%22%7C%20where%20timestamp%20%3E%20datetime(%5C%222022-02-12T12%3A55%3A02.739Z%5C%22)%20and%20timestamp%20%3C%20datetime(%5C%222022-03-14T12%3A55%3A02.739Z%5C%22)%22%2C%22filterWhereClause%22%3A%22%7C%20where%20*%20has%20%5C%22a1a20ad1a12ff348a852288a4d9953a5%5C%22%7C%20order%20by%20timestamp%20desc%22%2C%22originalParams%22%3A%7B%22eventTypes%22%3A%5B%7B%22value%22%3A%22availabilityResult%22%2C%22tableName%22%3A%22availabilityResults%22%2C%22label%22%3A%22Availability%22%7D%2C%7B%22value%22%3A%22request%22%2C%22tableName%22%3A%22requests%22%2C%22label%22%3A%22Request%22%7D%2C%7B%22value%22%3A%22exception%22%2C%22tableName%22%3A%22exceptions%22%2C%22label%22%3A%22Exception%22%7D%2C%7B%22value%22%3A%22pageView%22%2C%22tableName%22%3A%22pageViews%22%2C%22label%22%3A%22Page%20View%22%7D%2C%7B%22value%22%3A%22trace%22%2C%22tableName%22%3A%22traces%22%2C%22label%22%3A%22Trace%22%7D%2C%7B%22value%22%3A%22customEvent%22%2C%22tableName%22%3A%22customEvents%22%2C%22label%22%3A%22Custom%20Event%22%7D%2C%7B%22value%22%3A%22dependency%22%2C%22tableName%22%3A%22dependencies%22%2C%22label%22%3A%22Dependency%22%7D%5D%2C%22timeContext%22%3A%7B%22durationMs%22%3A2592000000%7D%2C%22filter%22%3A%5B%5D%2C%22searchPhrase%22%3A%7B%22originalPhrase%22%3A%22\", itemId,\"%22%2C%22_tokens%22%3A%5B%7B%22conjunction%22%3A%22and%22%2C%22value%22%3A%22a1a20ad1a12ff348a852288a4d9953a5%22%2C%22isNot%22%3Afalse%2C%22kql%22%3A%22%20*%20has%20%5C%22a1a20ad1a12ff348a852288a4d9953a5%5C%22%22%7D%5D%7D%2C%22sort%22%3A%22desc%22%7D%7D\")\n| project timestamp, problemId, itemId, portalUrl",
"size": 1,
"timeContext": {
"durationMs": 86400000
},
"queryType": 0,
"resourceType": "microsoft.insights/components",
"gridSettings": {
"formatters": [
{
"columnMatch": "problemId",
"formatter": 1,
"formatOptions": {
"linkColumn": "portalUrl",
"linkTarget": "Url"
}
}
]
}
},
"name": "query - 2"
}
],
"fallbackResourceIds": [
"/subscriptions/4547474-1a67-4272-aa6e-4c9f4142e269/resourceGroups/rg-resourcegroup-prod/providers/microsoft.insights/components/appi-demo-prod"
],
"$schema": "https://github.com/Microsoft/Application-Insights-Workbooks/blob/master/schema/workbook.json"
}

From a quick perusal of the code, it looks like if you also have an appName column in the row where itemId is, it will look through the resource list used in the query for a resource with that same name and, if it finds none, just take the first resource it can find.
I see that you have appName there, but I'm not sure about the rest of the configuration in your step. Are you also using all of the resources in the query?

String comparison in Apache FreeMarker

I have a list containing the values below. The list gets its data from a REST API.
I know how to compare a single record, but the list has many values, so I don't know how to search in an array or list. For example, the list has the values below, and I need to check whether a particular record exists in it.
var items = [
    {
        "organizationCode": "FP1",
        "organizationName": "FTE Process Org"
    },
    {
        "organizationCode": "T11",
        "organizationName": "FTE Discrete Org"
    },
    {
        "organizationCode": "PD2",
        "organizationName": "Product development Org"
    },
    {
        "organizationCode": "PD1",
        "organizationName": "Product1 development Org"
    },
    {
        "organizationCode": "MD1",
        "organizationName": "Main development Org"
    }
]
I have to search for a value based on organizationName.
I tried
items.organizationName?contains(<input>)
but it does not work.
I could not find any material on this, so I am seeking help.

The problem is that items.organizationName is already an error, since items is purely a list, so it has no named sub-variables directly. (You may have seen that pattern where items comes from XML, where you can get the list of sub-elements like that. But the same doesn't work with plain lists.) So, what you need is this expression:
items?map(it -> it.organizationName)?seq_contains(input)
Slightly related: sometimes you want to retrieve the list of matching items (so that you also have the organizationCode), in which case you should write items?filter(it -> it.organizationName == input).
Note that ?map and ?filter were added in FreeMarker 2.3.29.
See also:
https://freemarker.apache.org/docs/ref_builtins_sequence.html#ref_builtin_filter
https://freemarker.apache.org/docs/ref_builtins_sequence.html#ref_builtin_map
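For readers who prepare the data model in Java, the logic of the two expressions can be sketched with streams. This is only an illustration of the semantics, not FreeMarker API; the class and method names are hypothetical, and items are modelled as plain maps. Note that, like ?seq_contains, the check is an exact match on the whole name, not a substring search.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the template logic in plain Java.
final class OrgLookupSketch {

    // Mirrors items?map(it -> it.organizationName)?seq_contains(input)
    static boolean containsName(List<Map<String, String>> items, String input) {
        return items.stream()
                .map(it -> it.get("organizationName"))
                .anyMatch(input::equals);
    }

    // Mirrors items?filter(it -> it.organizationName == input)
    static List<Map<String, String>> matching(List<Map<String, String>> items, String input) {
        return items.stream()
                .filter(it -> input.equals(it.get("organizationName")))
                .collect(Collectors.toList());
    }

    // The list from the question.
    static List<Map<String, String>> sampleItems() {
        return List.of(
            Map.of("organizationCode", "FP1", "organizationName", "FTE Process Org"),
            Map.of("organizationCode", "T11", "organizationName", "FTE Discrete Org"),
            Map.of("organizationCode", "PD2", "organizationName", "Product development Org"),
            Map.of("organizationCode", "PD1", "organizationName", "Product1 development Org"),
            Map.of("organizationCode", "MD1", "organizationName", "Main development Org"));
    }

    public static void main(String[] args) {
        System.out.println(containsName(sampleItems(), "FTE Process Org")); // prints true
        System.out.println(containsName(sampleItems(), "FTE"));             // prints false
        System.out.println(matching(sampleItems(), "Main development Org").get(0).get("organizationCode")); // prints MD1
    }
}
```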

Display data from 2 assets using a single query in Hyperledger Fabric?

I have these two assets:
1]
asset bloodBankInformations identified by bloodBankId {
    o String bloodBankId
    o contact bloodBankContactDetails
    --> bloodData bloodBankBloodData
}
2]
asset bloodData extends bloodBankInformations {
    o String bloodDatakey
    o bloodquotas bloodquota
}
Now I want to query these assets so that I get data from both of them in one single query.

Composer Queries are run against a single Asset or Participant registry. There is no 'join' as you might see in a relational database.
However, for your example it is possible to use a filter on the REST server to filter the results (like a query) and to resolve the relationship fields.
Before showing you my example, I wonder whether you really wanted to 'extend' the original asset. I have simplified your model in my example, but the same principle works if you did intend to extend.
Model:
asset bloodBankInformations identified by bloodBankId {
    o String bloodBankId
    o contact bloodBankContactDetails
    --> bloodData bloodBankBloodData
}
asset bloodData extends bloodBankInformations {
    o String bloodDatakey
    o bloodquotas bloodquota
}
Filter:
Used with a GET on the REST server endpoint /bloodBankInformations:
{"where":{"bloodBankId":"BB03"},"include":"resolve"}
My Response Body:
[
    {
        "$class": "org.acme.mynetwork.bloodBankInformations",
        "bloodBankId": "BB03",
        "bloodBankContactDetails": "Diferent address!",
        "bloodBankBloodData": {
            "$class": "org.acme.mynetwork.bloodData",
            "bloodDatakey": "BL04",
            "bloodquota": "Quota BBBB"
        }
    }
]
There is some additional information in the Composer Knowledge Wiki on filters.
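When calling the REST server from code rather than from its explorer page, the filter JSON has to be URL-encoded. The sketch below builds such a URL in Java. It assumes that the filter is passed in a query parameter named filter and that the server listens on http://localhost:3000/api; both are assumptions to check against your own deployment, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: builds a GET URL carrying the filter shown above.
final class ComposerFilterUrlSketch {

    // baseUrl and the "filter" parameter name are assumptions, not confirmed by the answer.
    static String filterUrl(String baseUrl, String assetEndpoint, String filterJson) {
        return baseUrl + "/" + assetEndpoint + "?filter="
                + URLEncoder.encode(filterJson, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String filter = "{\"where\":{\"bloodBankId\":\"BB03\"},\"include\":\"resolve\"}";
        System.out.println(filterUrl("http://localhost:3000/api", "bloodBankInformations", filter));
    }
}
```

The resulting URL can then be requested with any HTTP client; the response should have the shape of the response body shown above.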

Buildfire: Private Portal plugin

I am having a very specific issue with a gateway plugin I am trying to finish.
I am trying to navigate to a different plugin using
buildfire.pluginInstance.get($scope.deepLinnk, function (err, plugin) {
    if (err) {
        $scope.status = 'error!';
    }
    else {
        console.log(plugin);
        $scope.navigateSignIn(plugin);
    }
});
$scope.navigateSignIn = function (plugin) {
    buildfire.navigation.navigateTo({
        pluginId: plugin.token,
        instanceId: plugin.instanceId,
        title: plugin.title,
        folderName: plugin.pluginTypeId
    });
};
The navigateTo object above is the only way I can get buildfire.navigation.navigateTo to work for plugins made by BuildFire.
However, when I try to navigate to plugins that I have created, the debugger shows an alert saying "cannot load config file", then the entire platform crashes and makes me sign in again.
How can I navigate to plugins that I have created?

How are you getting the pluginId, instanceId and folderName? You can't simply save them or hard-code them. You need to initiate a dynamic data lookup; see https://github.com/BuildFire/sdk/wiki/How-to-use-the-Datastore-Dynamic-Data
You can also look at an example such as the folder plugin: https://github.com/BuildFire/folderPlugin/blob/d84551feb06cfc304c325480ca96d87795a66929/widget/widget.controller.js#L163
Basically, every time a plugin is updated, plugin identifiers like folderName or title may change. So you need to keep your reference data fresh using dynamic data.
Here is a simple example that may paint a better picture. Say you are referencing a plugin titled "Holiday Sales", so you save {title: "Holiday Sales"} to your datastore collection and henceforth refer to it by that title. This may work for a short period of time. However, if the app owner changes the title to "Summer Sale", your copy is now out of date. In traditional databases you would have two tables: one holding the source of truth, and another with a foreign key referencing the first. That way you join and always display the latest data.
Dynamic data is a sort of assisted lookup. You simply give it a key and what that key references. Then, at run time, when you make the call it performs the lookup server side and returns the latest data you are looking for.
Sample:
buildfire.datastore.save("MyData", {
    _buildfire: { /// key identifier
        myPluginsToNavTo: {
            data: ["123123-123123", "asdasda-asdasd"] /// plugin instances
            , dataType: "pluginInstance"
        }
    }
});
======
buildfire.datastore.getWithDynamicData("MyData", function (err, data) {
    // data would be:
    /*
    _buildfire: { /// key identifier
        myPluginsToNavTo: {
            data: ["55f71347d06b61b4010351dc", "asdasda-asdasd"]
            , dataType: "pluginInstance"
            , result: [ /// <============= new property added dynamically
                {
                    "id": "55f71347d06b61b4010351dc",
                    "data": {
                        "pluginTypeId": 3212,
                        "token": "6372b101-addf-45da-bb0a-9208a09e7b6b",
                        "title": "YouTube Plugin",
                        "iconUrl": "http://s3-us-west-2.amazonaws.com/pluginserver/plugins/6372b101-addf-45da-bb0a-9208a09e7b6b/resources/image.png"
                    }
                }
                , {
                    "id": "asdasda-asdasd",
                    "data": {
                        "pluginTypeId": 123123,
                        "token": "1223123123-addf-45da-bb0a-9208a09e7b6b",
                        "title": "Plugin 2",
                        "iconUrl": "..."
                    }
                }
            ]
        }
    }
    */
});
hope this helps

ElasticSearch Field boosting using java api

I am new to ES and trying to search using the Java APIs. I am unable to figure out how to provide field-specific boosting using the Java APIs.
Here is the example:
My index document looks like:
"_source": {
    "th_id": 1,
    "th_name": "test name",
    "th_description": "test desc",
    "th_image": "test-img",
    "th_slug": "Make-Me-Smart",
    "th_show_title": "Coast Tech Podcast",
    "th_sh_category": "Alternative Health"
}
When I search for keywords, I want to rank results higher if the keyword is found in "th_name" than if it is found in some other field.
Currently I am using the code below to search:
QueryBuilder qb1 = QueryBuilders.multiMatchQuery(keyword, "th_name", "th_description", "th_show_title", "th_sh_category");
SearchResponse response = client.prepareSearch("talk").setTypes("themes")
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setQuery(qb1)
    .setFrom(start).setSize(maxRows)
    .setExplain(true).execute().actionGet();
Is there anything I can do at query time to boost a document when the keyword is found in the "th_name" field rather than in other fields?

The accepted answer did not work for me. The ES version I am using is 6.2.4.
QueryBuilders.multiMatchQuery(keyword)
    .field("th_name", 2.0f)
    .field("th_description")
    .field("th_show_title")
    .field("content")
Hope it helps someone else.

Edit: This has changed and no longer works in ES 6.x and upwards.
You should also be able to boost a field directly in the multi-match query:
"The multi_match query supports field boosting via ^ notation in the fields json field.
{
    "multi_match" : {
        "query" : "this is a test",
        "fields" : [ "subject^2", "message" ]
    }
}
In the above example hits in the subject field are 2 times more important than in the message field."
In the Java API, just use the MultiMatchQueryBuilder:
MultiMatchQueryBuilder builder =
    new MultiMatchQueryBuilder(keyword, "th_name^2", "th_description", "th_show_title", "th_sh_category");
Disclaimer: Not tested
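To see what the ^ notation looks like for the fields in the question, here is a small plain-Java sketch that assembles the multi_match JSON body by hand. It does not use the Elasticsearch client at all; the class and method names are hypothetical, and the escaping covers only backslashes and double quotes in the query text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: builds the multi_match JSON with per-field boosts.
final class MultiMatchBoostSketch {

    // Renders field names with the ^ notation; a boost of 1.0 is left implicit.
    static String fieldsArray(Map<String, Float> boosts) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, Float> entry : boosts.entrySet()) {
            float boost = entry.getValue();
            String field = boost == 1.0f ? entry.getKey() : entry.getKey() + "^" + boost;
            joiner.add("\"" + field + "\"");
        }
        return joiner.toString();
    }

    // Minimal escaping for the query text only (backslash and double quote).
    static String multiMatch(String query, Map<String, Float> boosts) {
        String escaped = query.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"multi_match\":{\"query\":\"" + escaped + "\",\"fields\":" + fieldsArray(boosts) + "}}";
    }

    // Fields from the question, with th_name weighted twice as much as the others.
    static Map<String, Float> sampleBoosts() {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("th_name", 2.0f);
        boosts.put("th_description", 1.0f);
        boosts.put("th_show_title", 1.0f);
        boosts.put("th_sh_category", 1.0f);
        return boosts;
    }

    public static void main(String[] args) {
        System.out.println(multiMatch("podcast", sampleBoosts()));
    }
}
```

The output is the value that would go under the "query" key of a search request body. Whether you send JSON like this or use the builder's field(name, boost) method is a matter of which client API you have available.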

You can use "BoostingQuery"
http://www.elasticsearch.org/guide/reference/query-dsl/boosting-query.html
javadoc : https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/query/BoostingQueryBuilder.java
