Elasticsearch bulk partial update for timestamped index with Kibana - python-3.x

I am using Elasticsearch and Kibana with a Python client. My data are stored in Elasticsearch, and I use Kibana for data analysis and visualisation. In Kibana, I created a new index pattern with the timestamp field.
When I run the bulk partial update code, the documents disappear.
Then I removed the index pattern and re-created it without the timestamp field. Now only the fields provided in "_source" (data_partial) can be seen in Kibana's Discover panel.
So I am wondering whether the partial update ('doc_as_upsert': True) only works for an index pattern without a timestamp field, or whether I am missing something else.
from elasticsearch import helpers
from elasticsearch.helpers import scan

def add_data_partial_to_bulk(es, index):
    d_body = []
    qu = {'query': {'match_all': {}}}
    for hit in scan(es, index=index, query=qu):
        body = {
            "_index": index,
            "_id": hit["_id"],
            "doc_as_upsert": True,  # << this partial update only works for an index pattern w/o a timestamp field
            "_source": {
                "data_partial": "hello world"}
        }
        d_body.append(body)
    return d_body

doc_body = add_data_partial_to_bulk(es, es_index_name)
helpers.bulk(es, doc_body)
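
For reference, under the bulk helper's default "index" op type, "_source" is treated as a whole replacement document, which would explain why only data_partial remains afterwards. A partial update is normally expressed with "_op_type": "update" and a "doc" key instead; a minimal sketch of that action shape (es and es_index_name are assumed to be defined as above):

def build_partial_update_actions(es, index):
    # one "update" action per existing document: "doc" holds only the fields
    # to merge in, and doc_as_upsert creates the document if it is missing
    for hit in scan(es, index=index, query={'query': {'match_all': {}}}):
        yield {
            "_op_type": "update",
            "_index": index,
            "_id": hit["_id"],
            "doc": {"data_partial": "hello world"},
            "doc_as_upsert": True,
        }

helpers.bulk(es, build_partial_update_actions(es, es_index_name))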

Related

Is there any way to create a partial index in ArangoDB?

I want to create a partial index for the collection, where the index is applied to documents conditionally. For example, I want to check the uniqueness of documents only if they have a certain field value. In other words, I'm looking for a construct like this for index creation:
db.person.createIndex(
    { age: 1 },
    { partialFilterExpression: { age: { $gte: 18 } } }
);
This example is from MongoDB; it applies the index only to documents whose 'age' value is greater than 18.
There is no way to create a "filtered index" (like you can in SQL). According to the docs, you can include attributes, but not conditionally.
You could try a sparse index, but I think your best bet is adding the age attribute to a "skiplist" index, which supports sorting and gt/lt evaluation.
Make sure you use the explain feature to validate index usage.
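
As a sketch of the suggested workaround, a sparse skiplist index on age could be created with the python-arango driver; documents without an age attribute are then left out of the index (the connection details and the "person" collection name are assumptions):

from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")  # assumed local server
db = client.db("_system", username="root", password="")  # assumed credentials
person = db.collection("person")

# sparse=True keeps documents lacking "age" out of the index; a skiplist
# index supports sorting and range (gt/lt) evaluation
person.add_skiplist_index(fields=["age"], sparse=True, unique=False)

Note this is not a true filtered index: it skips documents missing the attribute, not documents failing an arbitrary condition such as age >= 18.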

Update the same field in different documents with different values at once with MongoDB

I didn't find a question similar to mine, and I'm not sure this is possible. I have several documents; each document is a person, for example:
{
    "name": "Paul",
    "score": 105
}
{
    "name": "John",
    "score": 98
}
Before the update I have a dict (in Python) with the names and the new scores: {"Paul": 107, "John": 92}. How do I update the score in all the documents from the dict in one request?
You cannot update multiple documents with different conditions in a single query; you can refer to the MongoDB update docs. MongoDB introduced a multi parameter, but it has a different meaning: an update query with { multi: true } will update multiple documents, but only those matching the single condition set in the query part.
Alternatively, you can update the documents one by one in a loop. This feature is still missing in MongoDB, so we also do such things the same way.
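
A sketch of that loop in pymongo, batching the per-name updates into a single bulk_write request so only one round trip is made (the connection string and collection names are assumptions):

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # assumed connection
people = client["mydb"]["people"]  # assumed database/collection names

new_scores = {"Paul": 107, "John": 92}

# one UpdateOne per name; bulk_write sends them together, although the
# server still executes each update with its own condition
ops = [UpdateOne({"name": name}, {"$set": {"score": score}})
       for name, score in new_scores.items()]
people.bulk_write(ops)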

Searching for a particular phrase in _all fields generates fewer records than doing the same thing on a small number of fields

I wanted to search for a particular phrase using Elasticsearch, both on the _all field and on only 2 fields. The phrase is taken from a file listing more than 10000 keywords. Here is the code:
from elasticsearch import Elasticsearch
import json

es = Elasticsearch(['localhost:9200/'])
keyword_array = []
with open('localDrive\\extract_keywords\\t2.txt') as my_keywordfile:
    for keyword in my_keywordfile.readlines():
        keyword_array.append(keyword.strip().strip("'"))
with open('LocalFile\\_description_Results2.txt', 'w', encoding="utf-8") as f:
    for x in keyword_array:
        doc = {
            "query": {
                "multi_match": {
                    "query": x,
                    "type": "phrase",
                    "fields": ["title", "description"],
                }
            }
        }
        res = es.search(index='xxx_062617', body=doc)
        json.dump(res, f, ensure_ascii=False)
        f.write("\n")
Also, the query that matches the _all field is:
"multi_match": {
    "query": x,
    "type": "phrase",
    "fields": "_all",
}
Now what happens is that I get 101 returned records if I query only on title and description, but only 100 returned records if I use the _all field. And when I combine the IDs of all the records and remove the duplicate ones, I see that only 86 of them are duplicates!
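A minimal sketch of the ID comparison described above (res_fields and res_all are hypothetical names for the two search responses):

ids_fields = {hit["_id"] for hit in res_fields["hits"]["hits"]}
ids_all = {hit["_id"] for hit in res_all["hits"]["hits"]}
print(len(ids_fields | ids_all), "unique ids across both queries")
print(len(ids_fields & ids_all), "ids returned by both queries")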
My questions are:
Does type: phrase work differently when I use the _all field?
Shouldn't I get more records when I use the _all field?
If _all includes all fields, including title and description, why does using _all not cover all the records returned by querying title and description?
Thanks,

Azure Stream Processing upsert to DocumentDB with array

I'm using Azure Stream Analytics to copy my JSON over to DocumentDB, using upsert to overwrite the document with the latest data. This is great for my base data, but I would love to be able to append to the list data, as unfortunately I can only send one list item at a time.
In the example below, the document is matched on id, and all items are updated, but I would like the "myList" array to keep growing with the "myList" data from each document (with the same id). Is this possible? Is there any other way to use Stream Analytics to update this list in the document?
I'd rather steer clear of using a tumbling window if possible, but is that an option that would work?
Sample documents:
{
    "id": "1234",
    "otherData": "example",
    "myList": [{"listitem": 1}]
}
{
    "id": "1234",
    "otherData": "example 2",
    "myList": [{"listitem": 2}]
}
Desired output:
{
    "id": "1234",
    "otherData": "example 2",
    "myList": [{"listitem": 1}, {"listitem": 2}]
}
My current query:
SELECT id, otherData, myList INTO [myoutput] FROM [myinput]
Currently, arrays are not merged; this is the existing behavior of the DocumentDB output from ASA, as also mentioned in this article. I doubt using a tumbling window would help here.
Note that changes in the values of array properties in your JSON document result in the entire array getting overwritten, i.e. the array is not merged.
You could flatten the input that arrives as an array (myList) into one row per element using the GetArrayElements function.
Your query might look something like:
SELECT i.id, i.otherData, listItemFromArray.ArrayValue AS listItem
INTO myoutput
FROM myinput i
CROSS APPLY GetArrayElements(i.myList) AS listItemFromArray
cheers!
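
That still emits one output row per list item rather than a growing array. If the merge has to happen in the document itself, one alternative outside Stream Analytics (not part of the answer above) is a client-side read-modify-write with the azure-cosmos Python SDK; the account, database, container, and partition-key choices here are assumptions:

from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<key>")  # assumed account and key
container = client.get_database_client("mydb").get_container_client("mycoll")

def append_list_item(incoming):
    # read the current document, grow myList, then upsert the merged result;
    # assumes the document already exists and id doubles as the partition key
    doc = container.read_item(item=incoming["id"], partition_key=incoming["id"])
    doc["myList"] = doc.get("myList", []) + incoming["myList"]
    doc["otherData"] = incoming["otherData"]
    container.upsert_item(doc)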

CouchDB - How to get results in reverse date order and without the id field

I'm testing some CouchDB features, and I want to get results in reverse order of insertion date while querying by the "i" field.
A sample doc:
{
    "_id": "970c3a0fdbb23dde47fb4075091a4d2b",
    "_rev": "1-54448147611ff5e89189bb44e58c1521",
    "doc_type": "Test",
    "e": "3/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/36/2",
    "d": "64/183/329/2/360/10/13/47/6/351/331/320/355/342/7/335/47/18/30/56/18/323/351/325/323/218/163/155/155/155/155",
    "f": "1399305161/1399305185/1399305194/1399305254/1399305314/1399305374/1399305434/1399305447/1399305506/1399305566/1399305626/1399305668/1399305727/1399305787/1399305847/1399305908/1399305963/1399305970/1399306022/1399306068/1399306078/1399306100/1399306159/1399306219/1399306279/1399306308/1399306321/1399306379/1399306439/1399306493/1399306506",
    "i": "3566120224",
    "dated": "1399305161",
    "v": "0/5/6/32/63/63/51/16/35/60/0/10/64/31/64/48/14/31/6/55/60/50/0/0/21/5/34/0/0/0/0",
    "date": "2014-05-05T15:52:42Z"
}
My view:
function(doc) {
    if(doc.i && doc.date){
        emit([doc.i, doc.date], 1); // 1 to test only
    }
}
I'm testing it with:
myview?startkey=["3566120224"]&endkey=["3566120224",{}]&reversed=true
But I'm getting the data in ascending date order, not reversed:
{"total_rows":545,"offset":508,"rows":[
{"id":"407ee687674b783601ce6d7da906515e","key":["3566120224","2014-05-05T14:11:01Z"],"value":1},
{"id":"407ee687674b783601ce6d7da9062b51","key":["3566120224","2014-05-05T14:15:21Z"],"value":1},
{"id":"407ee687674b783601ce6d7da905f4d9","key":["3566120224","2014-05-05T14:19:41Z"],"value":1},
{"id":"407ee687674b783601ce6d7da905b4e1","key":["3566120224","2014-05-05T14:24:01Z"],"value":1},
{"id":"407ee687674b783601ce6d7da905733c","key":["3566120224","2014-05-05T14:28:22Z"],"value":1},
{"id":"407ee687674b783601ce6d7da904e7ea","key":["3566120224","2014-05-05T14:32:42Z"],"value":1},
{"id":"407ee687674b783601ce6d7da9043709","key":["3566120224","2014-05-05T14:37:02Z"],"value":1},
{"id":"407ee687674b783601ce6d7da9039896","key":["3566120224","2014-05-05T14:41:22Z"],"value":1},
{"id":"407ee687674b783601ce6d7da90303be","key":["3566120224","2014-05-05T14:45:43Z"],"value":1},
{"id":"407ee687674b783601ce6d7da90239ae","key":["3566120224","2014-05-05T14:50:03Z"],"value":1},
{"id":"407ee687674b783601ce6d7da9018442","key":["3566120224","2014-05-05T14:54:23Z"],"value":1},
{"id":"407ee687674b783601ce6d7da90104f0","key":["3566120224","2014-05-05T14:58:43Z"],"value":1},
{"id":"407ee687674b783601ce6d7da9007b67","key":["3566120224","2014-05-05T15:03:04Z"],"value":1},
{"id":"90bb394f7a4a581ff4dc78bfaffff448","key":["3566120224","2014-05-05T15:07:24Z"],"value":1},
{"id":"90bb394f7a4a581ff4dc78bfafff368e","key":["3566120224","2014-05-05T15:11:44Z"],"value":1},
{"id":"90bb394f7a4a581ff4dc78bfaffe7e65","key":["3566120224","2014-05-05T15:16:05Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091f8e5c","key":["3566120224","2014-05-05T15:24:45Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091f6241","key":["3566120224","2014-05-05T15:29:05Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091f254a","key":["3566120224","2014-05-05T15:33:26Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091ed01b","key":["3566120224","2014-05-05T15:37:46Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091e5f42","key":["3566120224","2014-05-05T15:42:06Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091dd992","key":["3566120224","2014-05-05T15:46:26Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091d3853","key":["3566120224","2014-05-05T15:50:47Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091c9a3c","key":["3566120224","2014-05-05T15:55:07Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091bf465","key":["3566120224","2014-05-05T15:59:27Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091ba442","key":["3566120224","2014-05-05T16:03:47Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091ad482","key":["3566120224","2014-05-05T16:08:08Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb4075091a2130","key":["3566120224","2014-05-05T16:12:28Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb40750919a6ef","key":["3566120224","2014-05-05T16:16:48Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb407509192479","key":["3566120224","2014-05-05T16:21:08Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb40750918a977","key":["3566120224","2014-05-05T16:25:29Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb40750917b468","key":["3566120224","2014-05-05T16:29:49Z"],"value":1},
{"id":"970c3a0fdbb23dde47fb407509170583","key":["3566120224","2014-05-05T16:34:09Z"],"value":1}
]}
I have the same timestamp twice (date and dated, the epoch version); 1399305161 corresponds to 2014-05-05T15:52:42Z. I thought I could order the results with a data type that is easier for CouchDB to parse, but using the dated field didn't work either.
Also, I don't need the id field; how can I exclude it from the results?
If you look here, you can see that reversed is not a supported query option. To reverse the data, you want to swap startkey and endkey and set descending=true:
myview?startkey=["3566120224",{}]&endkey=["3566120224"]&descending=true
If you don't want the ID field, you can just return the reduced value at the highest grouping level:
myview?startkey=["3566120224",{}]&endkey=["3566120224"]&descending=true&group_level=2
To remove doc.i from the key, just write:
function(doc) {
    if(doc.i && doc.date){
        emit([doc.date], 1); // doc.i removed from the key
    }
}
To reverse the results:
You can have the client pass the descending option when querying. I'd consider that bad practice, though, as it depends on the client remembering to use the feature.
Assuming descending order is the default use of the view, I'd opt for emitting the date as a number instead: Date.parse(doc.date) returns the epoch time in milliseconds. If you emit the negative of that value (prefix it with '-'), the view is sorted in descending order by default. This assumes you don't need the formatted date in the key.
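
A minimal sketch of the descending query from Python with requests (the host, database name, and design-document name are assumptions):

import json
import requests

# startkey and endkey are swapped relative to the ascending query because
# descending=true walks the index from the high key down to the low key
params = {
    "startkey": json.dumps(["3566120224", {}]),
    "endkey": json.dumps(["3566120224"]),
    "descending": "true",
}
resp = requests.get(
    "http://localhost:5984/mydb/_design/app/_view/myview",  # assumed names
    params=params,
)
for row in resp.json()["rows"]:
    print(row["key"], row["value"])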
