I have the following dynamic mapping template:
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"objects": {
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
}
],
"dynamic_date_formats": ["yyyy-MM-dd" , "yyyy-MM-dd HH:mm:ss"]
}
}
The only problem is that when a date is empty, it throws an error. I just want to ignore empty dates. My data has multiple date fields, so I don't want to define a mapping for each date field individually.
Below is the error I am getting:
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: mapper [pb_bureau.applications.accounts.dateclosed] of different type, current_type [text], merged_type [date]
{"index":{"_id":"02ade9b5-1ca5-4006-ab06-9c96439e7d02"}}
Below are the dates we are inserting (a blank field means the date is null):
select date1, date2 from cbl_application_credit_report_account;
2014-11-14
2018-03-31
2012-07-27 2012-07-23
2015-11-30
2017-08-04 2016-05-13
Below is the mapping I am applying:
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"objects": {
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
},
{
"dates_ignore_malformed": {
"path_match": "*",
"match_mapping_type": "date",
"mapping": {
"format": "yyyy-MM-dd||yyyy-MM-dd HH:mm:ss",
"ignore_malformed": true
}
}
}
],
"dynamic_date_formats": ["yyyy-MM-dd" , "yyyy-MM-dd HH:mm:ss"]
}
}
Is there any way in the dynamic mapping to ignore empty dates?
The error happens because a field whose first value is an empty string gets dynamically mapped as text, and a later real date then conflicts with that mapping. Marking dynamically detected date fields with ignore_malformed makes Elasticsearch skip the empty values instead.
Mapping:
PUT my_index4
{
"mappings": {
"dynamic_templates": [
{
"objects": {
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
},
{
"dates_ignore_malformed": {
"path_match": "*",
"match_mapping_type": "date",
"mapping": {
"format": "yyyy-MM-dd||yyyy-MM-dd HH:mm:ss" ---> date format on which to be applied ,
"ignore_malformed": true ---> Ignores if field s malformed
}
}
}
],
"dynamic_date_formats": [
"yyyy-MM-dd",
"yyyy-MM-dd HH:mm:ss"
]
}
}
Data:
POST my_index4/_doc
{
"date":"2019-01-01 04:30:22",
"Id":1
}
POST my_index4/_doc
{
"name":2,
"date":"2019-01-01"
}
POST my_index4/_doc
{
"name":2,
"date":""
}
Query:
GET my_index4/_search
Result:
"hits" : [
{
"_index" : "my_index4",
"_type" : "_doc",
"_id" : "NT5XSG0BbzgYofLxTDZ_",
"_score" : 1.0,
"_source" : {
"date" : "2019-01-01 04:30:22",
"Id" : 1
}
},
{
"_index" : "my_index4",
"_type" : "_doc",
"_id" : "Nj5XSG0BbzgYofLxUTaT",
"_score" : 1.0,
"_source" : {
"name" : 2,
"date" : "2019-01-01"
}
},
{
"_index" : "my_index4",
"_type" : "_doc",
"_id" : "Nz5XSG0BbzgYofLxWDYi",
"_score" : 1.0,
"_ignored" : [
"date"
],
"_source" : {
"name" : 2,
"date" : ""
}
}
]
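Note that the third document is still indexed; the empty date is simply skipped and recorded in the _ignored metadata field. If you later want to find documents whose date was skipped, _ignored is itself searchable. A minimal sketch, assuming Elasticsearch 6.4+ (where the _ignored field was introduced):
GET my_index4/_search
{
  "query": {
    "term": {
      "_ignored": "date"
    }
  }
}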
Can you help me, please? I have a problem with the Completion Suggester in Elasticsearch.
Example: I have this mapping :
PUT music
{
"mappings": {
"properties": {
"suggest": {
"type": "completion"
},
"title": {
"type": "keyword"
}
}
}
}
and index multiple suggestions for a document as follows:
PUT music/_doc/1?refresh
{
"suggest": [
{
"input": "Nirva test",
"weight": 10
},
{
"input": "Nirva hola",
"weight": 3
}
]
}
Querying: you can run this request in Kibana:
POST music/_search?pretty
{
"suggest": {
"song-suggest": {
"prefix": "nirv",
"completion": {
"field": "suggest"
}
}
}
}
In the result I retrieve only the first value, but not both.
I ran the test in the Kibana Dev Tools too, and this is the result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"song-suggest" : [
{
"text" : "nir",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "Nirvana test",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
}
]
}
}
Expected result:
"suggest" : {
"song-suggest" : [
{
"text" : "nirvana",
"offset" : 0,
"length" : 7,
"options" : [
{
"text" : "Nirvana test",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
},
{
"text" : "nirvana b",
"offset" : 0,
"length" : 9,
"options" : [
{
"text" : "Nirvana best",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
}
]
}
This is the default behavior of the current implementation. You can check #31738. Below is one of the comments explaining why it returns only one document/suggestion:
The completion suggester is document-based by design so we cannot
return one entry per matching suggestion. It is documented that it
returns documents not suggestions and a single input can be indexed in
multiple suggestions (if you have synonyms in your analyzer for
instance) so it is not trivial to differentiate a match from its
variations. Also the completion suggester does not visit all
suggestions to select the top N, it has a special structure (a
weighted FST) that can visit suggestions in the order of their scores
and early terminates the query once enough documents have been found.
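A common workaround, assuming each suggestion can live in its own document, is to split the inputs across documents so that each one comes back as its own hit (a sketch, not the only option):
PUT music/_doc/1?refresh
{
  "suggest": { "input": "Nirvana test", "weight": 10 }
}
PUT music/_doc/2?refresh
{
  "suggest": { "input": "Nirvana best", "weight": 3 }
}
With one suggestion per document, the same prefix query returns one option per matching input.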
I have a name field with the value "abc_name". When I search for "abc_" I get proper results, but when I search for "abc_##£&-#&" I still get the same results. I want my query to ignore special characters that don't match my query.
My query uses:
multi_match
type: cross_fields
operator: AND
I am using the standard search_analyzer for my fields, and I want to keep this structure as it is, otherwise it will affect my other search behaviour (a sketch of the query shape follows the mapping below):
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
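For reference, a minimal sketch of the query shape being described; the index name and the second field in the list are hypothetical:
POST my_index/_search
{
  "query": {
    "multi_match": {
      "query": "abc_##£&-#&",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["name", "another_field"]
    }
  }
}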
Please see the sample below, where I've created a custom analyzer that should fit your use case.
Sample Mapping:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "custom_tokenizer",
"filter": ["lowercase", "3_5_edge_ngram"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+". <---- Note this pattern
}
},
"filter": {
"3_5_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
The above pattern simply ignores tokens of the form abc_$%^^##; as a result, such a token is not indexed.
Note the way the analyzer works:
First it executes the tokenizer.
Then it applies the edge_ngram filter to the tokens generated.
You can verify this by removing the edge_ngram filter from the above mapping and checking which tokens get generated via the Analyze API:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
I've added the edge_ngram filter because you want a search for abc_ to succeed when the token generated by the tokenizer is abc_name.
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case 2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case-1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case-2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows based on the index I've created and let me know if that works:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "punctuation",
"filter": ["lowercase"]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "autocompete", <----- Assuming you have already this in setting
"search_analyzer": "my_custom_analyzer". <----- Note this
}
}
}
}
Please try and let me know if this works for all your use-cases.
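To see why this helps, you can run the problematic search string through the punctuation tokenizer (a quick check, reusing the settings above):
POST some_test_index/_analyze
{
  "tokenizer": "punctuation",
  "text": "abc_##£&-#&"
}
Since the whole string matches the separator pattern, no tokens are produced, so the query matches nothing.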
I created an index in Elasticsearch 6.5.1 and successfully loaded data into it. There is one field, "submitted_date", which is a timestamp. Below is the mapping of this field:
"submitted_date": { "type": "date", "format":"yyyy-MM-dd HH:mm:ss.SSS" },
Then I created the index pattern, using "submitted_date" as the Time Filter field name. After that I tried to check the data in the Discover tab, but no data is shown; there is a message saying "No results match your search criteria."
NOTE that I have changed the time range in the picker (top right corner of the Kibana dashboard) in every possible way.
The data does appear in the Dev Tools tab with Elasticsearch queries.
PS: I inserted the data using Node.js with the official Elasticsearch library; I did not use Logstash.
I followed this article, but it did not help me.
UPDATE: sample document
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 10480,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "tests",
"_id" : "1214334",
"_score" : 1.0,
"_source" : {
"priority" : "4",
"submitted_date" : "2018-01-04T18:32:21.000Z",
"submitted_month" : 0,
"submitted_month_name" : "January",
"submitted_day" : 4,
"submitted_weekday" : "Tuesday",
"submitted_hour" : 18,
"submitted_year_month" : "2018-0",
"submitted_year_month_name" : "2018-January",
"date_key" : "20180104",
"year_month_key" : "201801",
"status" : "Closed"
}
}
]
}
}
Inspect request
{
"version": true,
"size": 500,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"_source": {
"excludes": []
},
"aggs": {
"2": {
"date_histogram": {
"field": "submitted_date",
"interval": "1d",
"time_zone": "Asia/Kolkata",
"min_doc_count": 1
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
{
"field": "close_date",
"format": "date_time"
},
{
"field": "last_modified_date",
"format": "date_time"
},
{
"field": "last_resolved_date",
"format": "date_time"
},
{
"field": "submitted_date",
"format": "date_time"
},
{
"field": "time_to_resolve",
"format": "date_time"
}
],
"query": {
"bool": {
"must": [
{
"match_all": {}
},
{
"range": {
"submitted_date": {
"gte": 1514745000000,
"lte": 1543937620414,
"format": "epoch_millis"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
},
"highlight": {
"pre_tags": [
"#kibana-highlighted-field#"
],
"post_tags": [
"#/kibana-highlighted-field#"
],
"fields": {
"*": {}
},
"fragment_size": 2147483647
}
}
Index creation:
function _putMapping() {
return client.indices.create({
index: process.env.ELASTICSEARCH_INDEX,
body: {
settings:{
index:{
"number_of_shards": 1,
"number_of_replicas": 5
},
"index.mapping.ignore_malformed" : true
},
mappings:{
tests:{
properties:{
"last_modified_date": { "type": "date" },
"last_resolved_date": { "type": "date" },
"time_to_resolve": { "type": "date" },
"submitted_date": { "type": "date", "format":"yyyy-MM-dd HH:mm:ss.SSS" },
"date_key": { "type": "integer" },
"priority": { "type": "long" },
"submitted_hour": { "type": "long" },
"submitted_month": { "type": "long" },
"submitted_year": { "type": "long" },
"submitted_year": { "type": "keyword" },
"submitted_year_month": { "type": "keyword" },
"submitted_year_month_name": { "type": "keyword" },
}
}
}
}
});
}
Your submitted_date is coming in like 2018-01-04T18:32:21.000Z, but your mapping is set to yyyy-MM-dd HH:mm:ss.SSS.
You need to change it to "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'".
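For example (assuming all your documents use this ISO-style format), the mapping line would become:
"submitted_date": { "type": "date", "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" },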
I'm migrating data from Mongo to Arango and I need to reproduce a $group aggregation. I have successfully reproduced the results, but I'm concerned that my approach may be sub-optimal. Can the AQL be improved?
I have a collection of data that looks like this:
{
"_id" : ObjectId("5b17f9d85b2c1998598f054e"),
"department" : [
"Sales",
"Marketing"
],
"region" : [
"US",
"UK"
]
}
{
"_id" : ObjectId("5b1808145b2c1998598f054f"),
"department" : [
"Sales",
"Marketing"
],
"region" : [
"US",
"UK"
]
}
{
"_id" : ObjectId("5b18083c5b2c1998598f0550"),
"department" : "Development",
"region" : "Europe"
}
{
"_id" : ObjectId("5b1809a75b2c1998598f0551"),
"department" : "Sales"
}
Note that each value can be a string, an array, or absent.
In Mongo I'm using the following code to aggregate the data:
db.test.aggregate([
{
$unwind:{
path:"$department",
preserveNullAndEmptyArrays: true
}
},
{
$unwind:{
path:"$region",
preserveNullAndEmptyArrays: true
}
},
{
$group:{
_id:{
department:{ $ifNull: [ "$department", "null" ] },
region:{ $ifNull: [ "$region", "null" ] },
},
count:{$sum:1}
}
}
])
In Arango I'm using the following AQL:
FOR i IN test
LET FIELD1=(FOR a IN APPEND([],NOT_NULL(i.department,"null")) RETURN a)
LET FIELD2=(FOR a IN APPEND([],NOT_NULL(i.region,"null")) RETURN a)
FOR f1 IN FIELD1
FOR f2 IN FIELD2
COLLECT id={department:f1,region:f2} WITH COUNT INTO counter
RETURN {_id:id,count:counter}
Edit:
APPEND is used to convert string values into an array (see the example below).
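For example (my understanding of the AQL semantics, consistent with how the query behaves): APPEND([], "Sales") yields ["Sales"], while APPEND([], ["Sales", "Marketing"]) yields ["Sales", "Marketing"], so both a scalar and an array end up as an array that the FOR loops can iterate over.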
Both produce results that look like this:
{
"_id" : {
"department" : "Marketing",
"region" : "US"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Development",
"region" : "Europe"
},
"count" : 1.0
}
{
"_id" : {
"department" : "Sales",
"region" : "null"
},
"count" : 1.0
}
{
"_id" : {
"department" : "Marketing",
"region" : "UK"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Sales",
"region" : "UK"
},
"count" : 2.0
}
{
"_id" : {
"department" : "Sales",
"region" : "US"
},
"count" : 2.0
}
Your approach seems alright. I would suggest using TO_ARRAY() instead of APPEND() to make it easier to understand, though.
Both functions skip null values (TO_ARRAY(null) returns an empty array, not [null]), so it is unavoidable to either provide some placeholder or test for null explicitly and return an array containing a null value (or whatever works best for you):
FOR doc IN test
FOR field1 IN doc.department == null ? [ null ] : TO_ARRAY(doc.department)
FOR field2 IN doc.region == null ? [ null ] : TO_ARRAY(doc.region)
COLLECT department = field1, region = field2
WITH COUNT INTO count
RETURN { _id: { department, region }, count }
Collection test:
[
{
"_key": "5b17f9d85b2c1998598f054e",
"department": [
"Sales",
"Marketing"
],
"region": [
"US",
"UK"
]
},
{
"_key": "5b18083c5b2c1998598f0550",
"department": "Development",
"region": "Europe"
},
{
"_key": "5b1808145b2c1998598f054f",
"department": [
"Sales",
"Marketing"
],
"region": [
"US",
"UK"
]
},
{
"_key": "5b1809a75b2c1998598f0551",
"department": "Sales"
}
]
Result:
[
{
"_id": {
"department": "Development",
"region": "Europe"
},
"count": 1
},
{
"_id": {
"department": "Marketing",
"region": "UK"
},
"count": 2
},
{
"_id": {
"department": "Marketing",
"region": "US"
},
"count": 2
},
{
"_id": {
"department": "Sales",
"region": null
},
"count": 1
},
{
"_id": {
"department": "Sales",
"region": "UK"
},
"count": 2
},
{
"_id": {
"department": "Sales",
"region": "US"
},
"count": 2
}
]
I have the following mapping
{
"cloth": {
"dynamic" : false,
"_source" : {"enabled" : false },
"properties": {
"name": {
"type": "string",
"index": "analyzed"
},
"variation": {
"type": "nested",
"properties": {
"size": {
"type": "string",
"index": "not_analyzed"
},
"color": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
I am not able to figure out a way to retrieve the nested object's fields using the fields parameter.
{
"fields" : ["name" , "variation.size", "variation.color"],
"query" : {
"nested" : {
"path" : "variation",
"query" : {
"bool" : {
"must" : [
{ "term" : { "variation.size" : "XXL" } },
{ "term" : { "variation.color" : "red" } }
]
}
}
}
}
}
The above query returns
"_id" : "1",
"_score" : 1.987628,
"fields" : {
"variation.size" : [ "XXL", "XL" ],
"variation.color" : [ "red", "black" ],
"name" : [ "Test shirt" ]
}
When I tried
"fields" : ["name" , "variation"]
I got the error
status: 400
reason: "ElasticsearchIllegalArgumentException[field [variation] isn't a leaf field]"
Which is as expected.
How can I get the variation object as it is?
Expected result: I need to retrieve the variation object as a whole, so that I can preserve the association of size and color (like "red" with "XXL"):
"variation" : { "size": "XXL", "color": "red" }
Update: Source is disabled for this Index Type.
If you use source filtering, it will return the nested objects as a whole (note that this relies on _source, which your update says is disabled for this type). Your query would be:
{
"_source": [
"name",
"variation"
],
"query": {
"nested": {
"path": "variation",
"query": {
"bool": {
"must": [
{
"term": {
"variation.size": "XXL"
}
},
{
"term": {
"variation.color": "red"
}
}
]
}
}
}
}
}
You should use this:
"script_fields": {
"variation": {
"script": {
"inline": "doc['variation.size'].value + ' ' + doc['variation.red'].value"
}
}
}
I use Elasticsearch v5.1.1.