Simple date histogram? - groovy

How can I view documents classified per weekday?
My data looks like this:
{"text": "hi", "created_at": "2016-02-21T18:30:36.000Z"}
For this I am using a dateConversion.groovy script, kept in the scripts folder of ES 5.1.1.
Date date = new Date(doc[date_field].value);
java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat(format);
sdf.format(date)
When I execute the following aggregation in the ES plugin:
"aggs": {
"byDays": {
"terms": {
"script": {
"lang": "groovy",
"file": "dateConversion",
"params": {
"date_field": "created_at",
"format": "EEEEEE"
}
}
}
} ``
I get an exception like this:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Unable to find on disk file script [dateConversion] using lang [groovy]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "testindex-stats",
        "node": "vVhZxH7pQ7CO3qpbYm_uew",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Unable to find on disk file script [dateConversion] using lang [groovy]"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Unable to find on disk file script [dateConversion] using lang [groovy]"
    }
  },
  "status": 400
}

To use a file script in an aggregation, the script value is not a string but an object, and I think you also need to specify lang alongside file. Also make sure the file is named dateConversion.groovy and sits in the config/scripts directory, since that is where Elasticsearch looks for file scripts.
"aggs": {
"byDays": {
"terms": {
"script": {
"lang": "groovy",
"file": "dateConversion",
"params": {
"date_field": "created_at",
"format": "EEEEEE"
}
}
}
}
}

Some parts of my code needed modifications:
{
  "aggs": {
    "byDays": {
      "terms": {
        "script": {
          "file": "test",
          "params": {
            "date_field": "created_at",
            "format": "EEEEEE"
          }
        }
      }
    }
  }
}
And so did my test.groovy code:
Date date = new Date(doc[date_field].value);
date.format(format);
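For reference, the same conversion can be sketched in plain Java (the class name and sample epoch value are mine; the epoch-millis long is what doc[date_field].value yields for a date field):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class WeekdayFormat {
    // Mirror of the Groovy file script: format an epoch-millis timestamp
    // with a SimpleDateFormat pattern such as "EEEE" for the weekday name.
    static String weekday(long epochMillis, String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ENGLISH);
        // Be explicit about the timezone, or the weekday shifts near midnight.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // 1456079436000L is 2016-02-21T18:30:36Z from the sample document.
        System.out.println(weekday(1456079436000L, "EEEE")); // Sunday
    }
}
```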

Related

Modeshape full-text-search works only on binary files

I am trying to perform a full-text-search on my Modeshape 5.3.0.Final repository. The query is as simple as:
Query query = queryManager.createQuery("SELECT * FROM [nt:resource] AS data WHERE ISDESCENDANTNODE('/somenode') AND CONTAINS(data.*, '*" + text + "*')", Query.JCR_SQL2);
It seems to work well for binary stored files (pdf, doc, docx, etc.) but it does not match txt files, or any plain-text format file.
This is my repository configuration:
{
  "name": "Persisted-Repository",
  "textExtraction": {
    "extractors": {
      "tikaExtractor": {
        "name": "General content-based extractor",
        "classname": "tika"
      }
    }
  },
  "workspaces": {
    "predefined": ["otherWorkspace"],
    "default": "default",
    "allowCreation": true
  },
  "security": {
    "anonymous": {
      "roles": ["readonly", "readwrite", "admin"],
      "useOnFailedLogin": false
    }
  },
  "storage": {
    "persistence": {
      "type": "file",
      "path": "/var/content/storage"
    },
    "binaryStorage": {
      "type": "file",
      "directory": "/var/content/binaries",
      "minimumBinarySizeInBytes": 999,
      "mimeTypeDetection": "content"
    }
  },
  "indexProviders": {
    "lucene": {
      "classname": "lucene",
      "directory": "/var/content/indexes"
    }
  },
  "indexes": {
    "textFromFiles": {
      "kind": "text",
      "provider": "lucene",
      "nodeType": "nt:resource",
      "columns": "jcr:data(BINARY)"
    }
  }
}
Currently I'm working around this by running a second search over configured text-file extensions and manually extracting the text with Tika (though since it's already text, Tika is probably not required here) to look for occurrences.
Does anybody know if this is expected behavior, or am I doing something wrong?
Cheers!
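The workaround for plain-text files can skip Tika entirely, as suspected above: read the node's binary stream as text and match directly. A minimal sketch (class and method names are mine, not a ModeShape API):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PlainTextSearch {
    // For text files the content is already text, so no extraction step is
    // needed: decode the stream as UTF-8 and do a case-insensitive match.
    static boolean containsText(InputStream data, String needle) throws IOException {
        String content = new String(data.readAllBytes(), StandardCharsets.UTF_8);
        return content.toLowerCase().contains(needle.toLowerCase());
    }

    public static void main(String[] args) throws IOException {
        InputStream txt = new ByteArrayInputStream(
                "Hello ModeShape world".getBytes(StandardCharsets.UTF_8));
        System.out.println(containsText(txt, "modeshape")); // true
    }
}
```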

parsing exception on numbers

I am trying to index data that looks like the following:
var json = {
  "kwg": {
    "kwg0List": [
      {
        "lemma": "bilingue",
        "categories": ["terme"],
        "occList": [
          {
            "startTimeSec": 537.1,
            "endTimeSec": 537.46,
            "value": "bilingue"
          },
          {
            "startTimeSec": 563.2,
            "endTimeSec": 563.55,
            "value": "bilingue"
          }
        ]
      }
    ]
  }
}
Everything works fine. Now let's say, for whatever reason, that one of the startTimeSec fields is equal to 10. It is then interpreted as a long and no longer as a double.
I would get the following error: a mapper_parsing_exception telling me that the field should be a double and not a long.
My question is: is there a way to "force" the long to be cast to a double when indexing, or is checking beforehand that the data is correctly formatted the only way of doing it?
Trace:
{
  "took": 1112,
  "errors": true,
  "items": [
    {
      "create": {
        "_index": "library",
        "_type": "relevance",
        "_id": "AViRhRJ-_Tb2laJ1W4JH",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "mapper [kwg.kwg0List.occList.endTimeSec] of different type, current_type [double], merged_type [long]"
          }
        }
      }
    }
  ]
}
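Defining an explicit double mapping for these fields up front avoids the dynamic-mapping conflict; the other option mentioned in the question, checking the data before indexing, can be sketched on the client side like this (class name and field list are mine, based on the fields in the question):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DoubleNormalizer {
    // Fields that should always be indexed as doubles, even when the JSON
    // happens to carry an integral value like 10.
    static final List<String> DOUBLE_FIELDS = List.of("startTimeSec", "endTimeSec");

    // Walk the document (maps and lists) and widen integral numbers in the
    // known double fields, so dynamic mapping never sees a long for them.
    @SuppressWarnings("unchecked")
    static void normalize(Map<String, Object> doc) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map) {
                normalize((Map<String, Object>) v);
            } else if (v instanceof List) {
                for (Object item : (List<Object>) v) {
                    if (item instanceof Map) normalize((Map<String, Object>) item);
                }
            } else if (v instanceof Number && DOUBLE_FIELDS.contains(e.getKey())) {
                e.setValue(((Number) v).doubleValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> occ = new HashMap<>();
        occ.put("startTimeSec", 10L); // would otherwise be mapped as long
        normalize(occ);
        System.out.println(occ.get("startTimeSec")); // 10.0
    }
}
```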

Elasticsearch: Searching for fields with mapping not_analyzed get no hits

I have Elasticsearch running and make all my requests from Node.js.
I have applied the following mapping to my index "mastert4":
{
  "mappings": {
    "mastert4": {
      "properties": {
        "s": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
I added exactly one document to the index, which looks pretty much like this:
{
  "master": {
    "vi": "ff155d9696818dde0627e14c79ba5d344c3ef01d",
    "s": "Anne Will"
  }
}
Neither of the following search queries returns any hits:
{
  "index": "mastert4",
  "body": {
    "query": {
      "filtered": {
        "query": {
          "match"/"term": {
            "s": "anne will"/"Anne Will"
          }
        }
      }
    }
  }
}
But the following query does return the exact document:
{
  "index": "mastert4",
  "body": {
    "query": {
      "filtered": {
        "query": {
          "constant_score": {
            "filter": [
              {
                "missing": {
                  "field": "s"
                }
              }
            ]
          }
        }
      }
    }
  }
}
And if I search for
{
  "exists": {
    "field": "s"
  }
}
I get no hits again.
When analyzing the field itself, I get:
{
  "tokens": [
    {
      "token": "Anne Will",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 1
    }
  ]
}
I'm really at a dead end here. Can someone tell me where I went wrong? Thanks!
You've enclosed the fields s and vi inside an outer object called master, which is not declared in your mapping. That's the reason. If you query for master.s, you'll get results.
Another solution is to remove the enclosing master object from your document, which will also work:
{
  "vi": "ff155d9696818dde0627e14c79ba5d344c3ef01d",
  "s": "Anne Will"
}

Elasticsearch: transform date with groovy script

I have the following (simplified) mapping:
{
  "event_date": {
    "_source": { "enabled": true },
    "_all": { "enabled": true },
    "dynamic": "strict",
    "properties": {
      "start_date_time": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "start_date_day": {
        "type": "date",
        "format": "dateOptionalTime",
        "index": "not_analyzed"
      }
    }
  }
}
The indexed objects will look like this:
{
"start_date_time": "2017-05-08T18:23:45+0200"
}
The property start_date_day should always contain the same date, but with the time set to 00:00:00. In the example above, start_date_day must be "2017-05-08T00:00:00+0200".
I think it is possible to achieve this with a transform mapping and a Groovy script, but the Groovy code I developed did not work in the Elasticsearch context, and I am not that familiar with the Groovy language.
Maybe someone has an idea on how to solve this?
Yes, this is doable. For testing/running you might need to turn on script.groovy.sandbox.enabled: true in ../conf/elasticsearch.yml first.
PUT datetest/
{
  "mappings": {
    "event_date": {
      "_source": { "enabled": true },
      "_all": { "enabled": true },
      "dynamic": "strict",
      "transform": {
        "script": "ctx._source['start_date_day'] = new Date().parse(\"yyyy-MM-dd\", ctx._source['start_date_time']).format(\"yyyy-MM-dd\");",
        "lang": "groovy"
      },
      "properties": {
        "start_date_time": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "start_date_day": {
          "type": "date",
          "format": "dateOptionalTime",
          "index": "not_analyzed",
          "store": "yes"
        }
      }
    }
  }
}
Sample data:
PUT /datetest/event_date/1
{
  "start_date_time": "2017-05-08T18:23:45+0200"
}
Sample output:
GET /datetest/event_date/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["start_date_time", "start_date_day"]
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "datetest",
        "_type": "event_date",
        "_id": "1",
        "_score": 1,
        "fields": {
          "start_date_day": ["2017-05-08T00:00:00.000Z"],
          "start_date_time": ["2017-05-08T18:23:45+0200"]
        }
      }
    ]
  }
}
I think it is the format that isn't good. "date" seems to be MM/DD/YYYY and only this. If you want the time, you need a datetime format.
I have found this link which can help you: Elastic date format
You can try changing the type from "date" to "basic_date_time".
Based on your comments, since you don't need the hour part, you can simply define the mapping for the start_date_day field using the date format and use the following transform:
{
  "event_date": {
    "_source": { "enabled": true },
    "_all": { "enabled": true },
    "dynamic": "strict",
    "transform": {
      "script": "ctx._source['start_date_day'] = ctx._source['start_date_time'].split('T')[0]",
      "lang": "groovy"
    },
    "properties": {
      "start_date_time": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "start_date_day": {
        "type": "date",
        "format": "date",
        "index": "not_analyzed"
      }
    }
  }
}
ES will only store the date part and leave the hours and timezone out.
However, you should note that when using transform, the original source is stored without modification; the result of the transform is indexed, though, and hence searchable.
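The transform above boils down to a plain string split on the ISO-8601 'T' separator. A minimal Java rendering of the same logic (class and method names are mine):

```java
public class DateTruncate {
    // What the Groovy transform computes: the date part of an ISO-8601
    // timestamp, obtained by splitting on the 'T' separator.
    static String dayOf(String isoDateTime) {
        return isoDateTime.split("T")[0];
    }

    public static void main(String[] args) {
        System.out.println(dayOf("2017-05-08T18:23:45+0200")); // 2017-05-08
    }
}
```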

function score groovy script not returning any result

My query is:
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "lang": "groovy",
            "script_file": "category-score",
            "params": {
              "my_modifier": "doc['category'].value"
            }
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_file": "popularity-score",
            "params": {
              "my_modifier": "doc['popularity'].value"
            }
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_file": "type-score",
            "params": {
              "my_modifier": "doc['finder_type'].value"
            }
          }
        }
      ],
      "query": {
        "filtered": {
          "query": {
            "multi_match": {
              "query": "rent,buy",
              "fields": ["category", "categorytags"]
            }
          },
          "filter": {
            "bool": {
              "must": [
                { "terms": { "city": ["mumbai"] } }
              ]
            }
          },
          "_cache": true
        }
      },
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
And my three script files look like this:
(my_modifier == 'rent,buy' ? 10 : 0)
log(my_modifier)
(my_modifier > 0 ? 20 : 0)
I am trying to calculate the score of matching documents in the function_score query with three script_score functions.
My scripts get compiled on startup, as I can see in the logs, but the query doesn't return any results. The ES version is 1.6.0.
Also, how can I enable inline/dynamic scripting in ES 1.6.0? I have tried many settings changes in elasticsearch.yml, since ES shipped some breaking changes to the scripting module in the 1.6.0 release.
"my_modifier": "doc['finder_type'].value"
Params should be a value and not a script.
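For what it's worth, with score_mode set to "sum" the three functions conceptually combine as in this plain-Java sketch (field values are passed directly as parameters, which is the kind of plain value params should carry; class and method names are mine):

```java
public class ScoreSketch {
    // Sum of the three script_score functions from the query:
    // a category match bonus, a log-scaled popularity, and a type bonus.
    static double score(String category, double popularity, long finderType) {
        double categoryScore = "rent,buy".equals(category) ? 10 : 0;
        double popularityScore = Math.log(popularity);
        double typeScore = finderType > 0 ? 20 : 0;
        return categoryScore + popularityScore + typeScore;
    }

    public static void main(String[] args) {
        // 10 (category match) + log(e) = 1 + 20 (type bonus) = 31.0
        System.out.println(score("rent,buy", Math.E, 1)); // 31.0
    }
}
```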
