Kibana and Groovy scripting

I was looking for a way to calculate a ratio in Kibana. After a lot of research I found this approach: using the "JSON Input" feature in a visualization.
All my information is in one index, with two types of documents (boots and reboots).
I am looking for a script that counts the number of documents of type boots, does the same for the reboots type, and then divides the second by the first.
It sounds really easy, but I could not find a way to do it, and I am not yet comfortable enough with Groovy to write it myself.
I found many ways to manipulate document values (doc['mydocname'].values etc.), but nothing about the type.
Thanks in advance.
EDIT: I tried this:
{
  "aggs" : {
    "boots_count" : { "value_count" : { "_type" : "boots" } }
  }
}
This is supposed to count the number of values of a field (here the field _type) in the index. But when I put it into "JSON Input" in a visualization, it results in this error:
Error: Request to Elasticsearch failed: {"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[BbXJ0O6tRxa_OcyBfYCGJQ][informationbe][0]: SearchParseException[[informationbe][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"#sitePoste\",\"size\":5,\"order\":{\"1\":\"desc\"}},\"aggs\":{\"1\":{\"avg\":{\"script\":\"0\",\"lang\":\"expression\",\"ratio\":{\"boots_count\":{\"value_count\":{\"_type\":\"boots\"}}}}}}}}
I am clearly doing something wrong, but where?
EDIT 2: On the other hand, I am trying scripted fields, with something like this using a Lucene expression:
doc['_type:boots'].count / doc['_type:reboots'].count
but it does not work either. I am fairly confident about the "doc['_type:boots']" part; I guess the problem is in the "XXX.count" part.
After many attempts, I understand better and better how it works. The default scope of a scripted field is the document, not the whole index, so I cannot count values across the whole index from inside a document.
I am looking for a workaround; I'll post it here if I find something interesting.

I finally solved my problem:
I added a scripted field: if the type of the document is boots, the scripted field is 1, otherwise 0. Then I created a search containing only boots and reboots documents (filter _type:boots _type:reboots) and calculated the average of the scripted field in a metric.
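A minimal sketch of such a scripted field (assuming Groovy as the script language and that the type is readable through doc['_type']; the type names are the ones from the question):
doc['_type'].value == 'boots' ? 1 : 0
Averaged over a search filtered to the two types, this gives the proportion of boots documents among boots and reboots.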
Everything works well!

Related

ADF: can't build simple Json file transformation (one field flattening)

I need help transforming a simple JSON file inside an Azure Data Flow. I need to flatten just one field, date_sk, in this example:
{
  "date_sk": {"string": "2021-09-03"},
  "is_influencer": 0,
  "is_premium": -1,
  "doc_id": "234"
}
Desired transformation:
"date_sk": {"string":"2021-09-03"}
to become
"dateToGroupBy" : "2021-09-03"
I create the source stream; note the strange projection Azure picks: there is no "string" field any more, but this is how the automatic Azure transformation works for some reason:
Data preview of the same source stream node:
And here is how it suggests transforming it in a separate "Derived Column" modifier. I played with the right-hand part, but this format (date_sk.{}) is the only one I was able to pick that does not display any error:
But then the output dateToGroupBy field turns out to be empty:
Any ideas on what could have gone wrong and how I can build the expected transformation? Thank you.
Alright, it turned out to be a Microsoft bug in ADF.
ADF stumbles over "string" as a JSON field name and cannot handle it, even though schema and data validation pass through OK and show no errors.
When I replace "date_sk": {"string":"2021-09-03"} with "date_sk": {"s1":"2021-09-03"} (or anything other than string), everything starts working just fine,
and dateToGroupBy is filled with the date values taken from date_sk.s1.
When I put string back, the output values show NULL.
It should either report an error at the validation stage or handle this field name properly.
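For reference, the renamed input that works looks like this (only the inner field name changes; everything else stays the same):
{
  "date_sk": {"s1": "2021-09-03"},
  "is_influencer": 0,
  "is_premium": -1,
  "doc_id": "234"
}
with the Derived Column then presumably reading date_sk.s1 into dateToGroupBy.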

Using Logstash Aggregate Filter plugin to process data which may or may not be sequenced

Hello all!
I am trying to use the Aggregate filter plugin of Logstash v7.7 to correlate and combine data from two different CSV file inputs which represent API data calls. The idea is to produce a record showing a combined picture. As you can expect the data may or may not arrive in the right sequence.
Here is an example:
/data/incoming/source_1/*.csv
StartTime, AckTime, Operation, RefData1, RefData2, OpSpecificData1
231313232,44343545,Register,ref-data-1a,ref-data-2a,op-specific-data-1
979898999,75758383,Register,ref-data-1b,ref-data-2b,op-specific-data-2
354656466,98554321,Cancel,ref-data-1c,ref-data-2c,op-specific-data-2
/data/incoming/source_2/*.csv
FinishTime,Operation,RefData1, RefData2, FinishSpecificData
67657657575,Cancel,ref-data-1c,ref-data-2c,FinishSpecific-Data-1
68445590877,Register,ref-data-1a,ref-data-2a,FinishSpecific-Data-2
55443444313,Register,ref-data-1a,ref-data-2a,FinishSpecific-Data-2
I have a single pipeline that receives both these CSVs, and I am able to process and write them as individual records to a single index. However, the idea is to combine the records from the two sources into one record each, representing a superset of Operation-related information.
Unfortunately, despite several attempts, I have been unable to figure out how to achieve this with the Aggregate filter plugin. My primary question is whether this is a suitable use of this particular plugin. And if so, any suggestions would be welcome!
At the moment, I have this:
input {
  file {
    path => ['/data/incoming/source_1/*.csv']
    tags => ["source1"]
  }
  file {
    path => ['/data/incoming/source_2/*.csv']
    tags => ["source2"]
  }
}
filter {
  # use the tags to do some source 1 and 2 related massaging, calculations, etc.
  aggregate {
    task_id => "%{Operation}_%{RefData1}_%{RefData1}"
    code => "
      map['source_files'] ||= []
      map['source_files'] << {'source_file' => event.get('path') }
    "
    push_map_as_event_on_timeout => true
    timeout => 600 # assuming this is the most far apart they will arrive
  }
  ...
}
output {
  elasticsearch { ... }
}
And other such variations. However, I keep getting individual records written to the index and cannot get a single combined one. Again, as you can see from the data set, there is no guarantee of the sequencing of the records, so I am wondering whether this filter is the right tool for the job to begin with? :-\
Or is it just me not being able to use it right! ;-)
In either case, any inputs/comments/suggestions are welcome. Thanks!
PS: This message is cross-posted on the Elastic forums. I am providing a link there just in case some answers pop up there too.
The answer is to use Elasticsearch in upsert mode. Please see the specifics here..
I recommend, first, that the information reach you in order so that the filter can handle it better; second, you could set these options in your pipeline.yml: pipeline.workers: 1 and pipeline.ordered: true, thereby guaranteeing the order of processing.
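A minimal sketch of what the upsert approach could look like in the output section (the index name and document id are assumptions; the point is that both CSV events for the same operation target the same document, so whichever arrives second is merged into the record created by the first):
output {
  elasticsearch {
    hosts         => ["http://localhost:9200"]
    index         => "operations"
    document_id   => "%{Operation}_%{RefData1}_%{RefData2}"
    action        => "update"
    doc_as_upsert => true
  }
}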

Azure Search: ismatch not filtering

I'm using Azure Search to perform some custom searches in a database.
I have one field with this kind of structure:
"STUFF": "05-05-16-00|"
but I'm having trouble creating the filter, because it is possible that I will not have all the numbers that build this structure. It all depends on what the end user types. So I need a wildcard to fill in the blanks with the missing numbers, like this:
"05-05-??-??" -> the pipe is important, because this field can contain more than one code.
Now I need to catch all the possible elements that START WITH 05-05, for example 05-05-11-01.
I thought I was supposed to use the search.ismatch() function, but it doesn't work.
Here is some code:
search.ismatch('05-05-??-??','STUFF')
And the results were:
"STUFF": "02-02-16-00|",
"STUFF": "02-02-14-00|",
This is driving me crazy, because I don't know why these results came back.
It may be important to know that I'm performing a POST request to the Azure Search API with this code in 'filter'.
Maybe I should escape the special characters like - and ? like this:
search.ismatch('05\\-05\\-\\?\\?\\-\\?\\?','STUFF')
But the results were the same.
Can somebody please help me?
EDIT 1
Following this article I changed some things and made the following search:
search.ismatch('\"05-00*\"','STUFF','simple', 'all')
And I started getting some results, but now these are my results:
"STUFF": "06-05-02-00|", //WRONG
"STUFF": "05-02-05-01|", //RIGHT
"STUFF": "05-02-02-07|", //RIGHT
For some reason it returns the right structure, but the match is not anchored at the front of the text.
EDIT 2
I made some changes: I replaced all the "-" with the keyword "OU", and I am trying to follow this question to build something like a "contains", performing a POST request with the following parameters:
{
  "search": "*",
  "filter": "search.ismatch('/.*08010000OU/.*','STUFF', 'full', 'all')",
  "skip": "0",
  "count": true
}
I'm trying to use a wildcard at the beginning of the query because I am still missing some information.
I believe you won't be able to solve this using the StandardAnalyzer. Try switching to the WhitespaceAnalyzer for this particular field; it will probably work with "05-05*".
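A sketch of what that could look like (the field definition and request body are assumptions built from the names in the question): in the index definition, the field would use the whitespace analyzer,
{ "name": "STUFF", "type": "Edm.String", "searchable": true, "filterable": true, "analyzer": "whitespace" }
and the query would then be a simple prefix match:
{
  "search": "*",
  "filter": "search.ismatch('05-05*', 'STUFF', 'simple', 'all')",
  "count": true
}
With whitespace analysis, a value like 05-05-16-00| is kept as a single token, so the prefix 05-05* only matches codes that actually start with 05-05.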

How do I create a composite index for my Firestore query?

I'm trying to perform a Firestore query on a collection, and it fails because an index needs to be created for the query I'm attempting. The error contains a link that is supposed to auto-create the missing index for me. However, when I follow the link and attempt to create the index that has been prepared for me, I encounter an error stating "name only indexes are not supported". I should also point out that I have been using the npm functions-framework to test the cloud function that contains the relevant query.
I have tried creating the composite index manually myself, but none of the indexes I have made seem to satisfy my attempted query.
Sample docs in my Items Collection:
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "en-us" <string>
}
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "en-us" <string>
}
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "fr" <string>
}
{
descriptionLastModified:someTimestamp <a timestamp datatype>
detectedLanguage: "en-us" <string>
}
These are all queries I have tried which fail:
let queryRef = itemsRef.where('descriptionLastModified','<=', oneDayAgoTimestamp).orderBy("descriptionLastModified","desc").where("detectedLanguage", '==', "en-us").get()
let queryRef = itemsRef.where('descriptionLastModified','<=', oneDayAgoTimestamp).where("detectedLanguage", '==', "en-us").get()
let queryRef = itemsRef.where("detectedLanguage", '==', "en-us").where('descriptionLastModified','<=', oneDayAgoTimestamp).get()
I have made the following composite indexes at the collection level to no avail:
CollectionId:items Fields: descriptionLastModified:DESC detectedLangauge: ASC
CollectionId:items Fields: descriptionLastModified:ASC detectedLangauge: ASC
CollectionId:items Fields: detectedLangauge: ASC descriptionLastModified:DESC
My expectation is that I should be able to filter my items by their descriptionLastModified timestamp field and additionally by the value of their detectedLanguage string field.
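For reference, the index these queries ask for, written in the firestore.indexes.json format accepted by the Firebase CLI (a sketch using the collection name from the question, deployable with firebase deploy --only firestore:indexes), would be:
{
  "indexes": [
    {
      "collectionGroup": "items",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "detectedLanguage", "order": "ASCENDING" },
        { "fieldPath": "descriptionLastModified", "order": "DESCENDING" }
      ]
    }
  ]
}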
In case anyone finds this in the future: it's 2021, and I still find that composite indexes created manually, despite being incredibly simple (or so you'd think; I fully understand why the OP thought his indexes would work), often just don't work. Doubtless there is some subtlety that reading some guides would make clear, but I haven't found the trick yet, and I have been using Firestore intensively at work for over 18 months.
The trick is to use the link the error creates, but this often fails: you get a dialogue box telling you an index will be created, but no details that would let you create it manually, and the friendly blue 'Create' button does nothing; it neither creates the index nor dismisses the window.
For a while I had it working in Firefox, but it stopped. A colleague who has to create them a lot tells me that Edge is the most reliable, and that you have to be very careful not to have multiple Google accounts signed in: if Edge (or Chrome) takes you to the wrong login when following the link, then even if you switch back to the correct account (and you have to, because it will assume your default login rather than the one currently selected in your Google Cloud console window), it only works about 1 time in 3. He tells me that in Edge it works about 60% of the time.
I used to get about 30% with Firefox by just hitting refresh a few times, but I can't get it working in anything other than Edge now. And actually, unless there is a client with little cash who will notice, I just go for inefficient and costly queries that return a superset of the results and filter them afterwards in code. Mostly this runs in Node.js and it's nippy enough for my purposes. It's a real shame to ramp up the read counts and the consequential bills, but there just doesn't seem to be a fix.
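A sketch of that fallback (Node.js with the Admin SDK; the query shape comes from the question, and the in-code filter covers the part a composite index would normally handle):
// Query on the range/order condition only, then apply the equality
// filter in code; more reads, but no composite index needed.
const snap = await itemsRef
  .where('descriptionLastModified', '<=', oneDayAgoTimestamp)
  .orderBy('descriptionLastModified', 'desc')
  .get();

const enUsItems = snap.docs
  .map(doc => ({ id: doc.id, ...doc.data() }))
  .filter(item => item.detectedLanguage === 'en-us');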

search with startkey, endkey and array keys

I have a view which returns several elements with array keys.
Example :
{"total_rows":4,"offset":0,"rows":[
{"id":"","key":[15,"2"],"value":1,"doc":{},
{"id":"","key":[20,"2"],"value":1,"doc":{},
{"id":"","key":[20,"3"],"value":1,"doc":{},
{"id":"","key":[20,"4"],"value":1,"doc":{}
]}
I'm trying to search through those elements. So if I do the following request:
/database/_design/element/_view/all/?
startkey=[15, "2"]&
endkey=[20, "3"]&
include_docs=true&reduce=false
Live example : http://jchris.couchone.com/keyhuh/_design/Record/_view/by_CreationDate_and_BoreholeName?startkey=[1267686720,%22sp4%22]&endkey=[1267686725,%22sp4\u9999%22]&include_docs=true&reduce=false
This one doesn't work. It returns all the records, even the last one, which doesn't match the second element of the array.
Strangely enough, it works with strings only.
Example :
{"total_rows":4,"offset":0,"rows":[
{"id":"","key":["15","2"],"value":1,"doc":{},
{"id":"","key":["20","2"],"value":1,"doc":{},
{"id":"","key":["20","3"],"value":1,"doc":{},
{"id":"","key":["20","4"],"value":1,"doc":{}
]}
If I do the following request:
/database/_design/element/_view/all/?
startkey=["15", "2"]&
endkey=["20", "3"]&
include_docs=true&
reduce=false
Live Example : http://jchris.couchone.com/keyhuh/_design/Record/_view/by_Client_and_BoreholeName?startkey=[%22Test1%22,%22sp4%22]&endkey=[%22Test1%22,%22sp4\u9999%22]&include_docs=true&reduce=false
Here it works well and returns only the first three elements.
Am I missing something in CouchDB's handling of array keys that mix integers and strings? Or have I hit a bug?
Note: it behaves the same with CouchDB 0.10 and 0.11.
This looks wrong, and there are a few things it could be. Is it possible for you to share your code with us? If the data isn't proprietary you could replicate your db to http://jchris.couchone.com/keyhuh and I'll take a look at the whole thing there.
...
Thanks for posting the live data. This is the query that is busted?
http://jchris.couchone.com/keyhuh/_design/Record/_view/by_Client_and_BoreholeName?startkey=[%22Test1%22,%22sp4%22]&endkey=[%22Test1%22,%22sp4\u9999%22]&reduce=false
Because that looks fine to me. What am I missing?
