Speeding up Cloudant query for type text index - couchdb

We have a table with this type of structure:
{_id:15_0, createdAt: 1/1/1, task_id:[16_0, 17_0, 18_0], table:”details”, a:b, c: d, more}
We created indexes using
{
"index": {},
"name": "paginationQueryIndex",
"type": "text"
}
It auto created
{
"ddoc": "_design/28e8db44a5a0862xxx",
"name": "paginationQueryIndex",
"type": "text",
"def": {
"default_analyzer": "keyword",
"default_field": {
},
"selector": {
},
"fields": [
],
"index_array_lengths": true
}
}
We are using the following query
{
"selector": {
"createdAt": { "$gt": 0 },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit”: 20
}
It takes 700-800 ms for first time, after that it decreases to 500-600 ms
Why does it take longer the first time?
Any way to speed up the query?
Any way to add indexes to specific fields if type is “text”? (instead of indexing all the fields in these records)

You could try creating the index more explicitly, defining the type of each field you wish to index e.g.:
{
"index": {
"fields": [
{
"name": "createdAt",
"type": "string"
},
{
"name": "task_id",
"type": "string"
},
{
"name": "table",
"type": "string"
}
]
},
"name": "myindex",
"type": "text"
}
Then your query becomes:
{
"selector": {
"createdAt": { "$gt": "1970/01/01" },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
Notice that I used strings where the data type is a string.
If you're interested in performance, try removing clauses from your query one at-a-time to see if one is causing the performance problem. You can also look at the explanation of your query to see if it using your index correctly.
Documentation on creating an explicit text query index is here

Related

Return the documents if the array field length greater than 0 in esclient nodejs

I have millions of documents in my es index.
I wanted to fetch the documents where the array field length greater than 0.
My docs looks like this
[
{
"primaryKey": "9c30d9e8-af04-4cc8-afcb-0c1311988c1e",
"language": "all",
"industry": [
"Accounting & auditing"
],
"text": "what's the status of my incident?",
"textId": "d0c70fc4-5e2a-4cab-a5f6-32339e6632dd",
"extractions": [],
"active": true,
"status": "active",
"createdAt": 1620208485092,
"updatedAt": 1620208485092,
"secondaryKey": "5db5f725-ec09-49da-9507-7bb2f94fd741"
},
{
"primaryKey": "9c30d9e8-af04-4cc8-afcb-0c1311988c1e",
"language": "all",
"industry": [
"Accounting"
],
"text": "What is the rating of my incident",
"textId": "4a53533f-293e-440c-aaa9-f7e5ae1436ca",
"extractions": [
{
"name": "Abinas Patra",
"role": "api-user",
"primaryKey": "ed12851d-c18d-4c92-8cc3-1782e41bc9d0"
},
{
"name": "Anil Patra",
"role": "ui-user",
"primaryKey": "933fad33-78b3-4779-a7bd-c62c6e02af75"
}
],
"active": true,
"status": "active",
"createdAt": 1620208485092,
"updatedAt": 1620208485092,
"secondaryKey": "5db5f725-ec09-49da-9507-7bb2f94fd741"
}
]
I am using elasticsearch nodejs client.
I tried in the below way
let dataCount = await esClient.count({
index: "indexName",
type: "docType",
body: {
query: {
bool: {
must: [
{
"script": {
"script": {
"inline": "doc['extractions'].values.length > 0",
"lang": "painless"
}
}
},
{
"match": {
"primaryKey": {
query: primaryKey,
"operator": "and"
}
}
},
{
"match": {
"language": {
query: language,
"operator": "and"
}
}
}
]
}
}
}
});
I get runtime parsing error everytime, i tried with exist field as well.
{"error":{"root_cause":[{"type":"script_exception","reason":"runtime error","script_stack":["org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:65)","org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:27)","doc[\'extractions\'].values.length > 1"," ^---- HERE"],
tried this as well
must_not:[
{
"script": {
"script": "_source.extractions.size() > 0"
}
}
]
Can anyone please help here.
thanks :)

How can I update one document at nested array

{
"_id": "5e28b029a0c8263a8a56980a",
"name": "Recruiter",
"data": [
{
"_id": "5e28b0980f89ba3c0782828f",
"targetLink": "https://www.linkedin.com/in/dan-kelsall-7aa0926b/",
"name": "Dan Kelsall",
"headline": "Content Marketing & Copywriting",
"actions": [
{
"result": 1,
"name": "VISIT"
},
{
"result": 1,
"name": "FOLLOW"
}
]
},
{
"_id": "5e28b0980f89ba3c078283426f",
"targetLink": "https://www.linkedin.com/in/56wergwer/",
"name": "56wergwer",
"headline": "asdgawehethre",
"actions": [
{
"result": 1,
"name": "VISIT"
}
]
}
]
}
Here is one of my mongodb document. I'd like to update data->actions->result
So this is what I've done
Campaign.updateOne({
'data.targetLink': "https://www.linkedin.com/in/dan-kelsall-7aa0926b/",
'data.actions.name': "Follow"
}, {$set: {'data.$.actions.result': 0}})
But it seems not updating anything and even it can't find the document by this 'data.actions.name'
You need the positional filtered operator since the regular positional operator ($) can only be used for one level of nested arrays:
Campaign.updateOne(
{ "_id": "5e28b029a0c8263a8a56980a", "data.targetLink": "https://www.linkedin.com/in/dan-kelsall-7aa0926b/" },
{ $set: { "data.$.actions.$[action].result": 0 } },
{ arrayFilters: [ { "action.name": "Follow" } ] }
)

cloudant searching index by multiple values

Cloudant is returning error message:
{"error":"invalid_key","reason":"Invalid key use-index for this request."}
whenever I try to query against an index with the combination operator, "$or".
A sample of what my documents look like is:
{
"_id": "28f240f1bcc2fbd9e1e5174af6905349",
"_rev": "1-fb9a9150acbecd105f1616aff88c26a8",
"type": "Feature",
"properties": {
"PageName": "A8",
"PageNumber": 1,
"Lat": 43.051523,
"Long": -71.498852
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-71.49978935969642,
43.0508382914137
],
[
-71.49978564033566,
43.052210148524
],
[
-71.49791499857444,
43.05220740550381
],
[
-71.49791875962663,
43.05083554852429
],
[
-71.49978935969642,
43.0508382914137
]
]
]
}
}
The index that I created is for field "properties.PageName", which works fine when I'm just querying for one document, but as soon as I try for multiple ones, I would receive the error response as quoted in the beginning.
If it helps any, here is the call:
POST https://xyz.cloudant.com/db/_find
request body:
{
"selector": {
"$or": [
{ "properties.PageName": "A8" },
{ "properties.PageName": "M30" },
{ "properties.PageName": "AH30" }
]
},
"use-index": "pagename-index"
}
In order to perform an $or query you need to create a text (full text) index, rather than a json index. For example, I just created the following index:
{
"index": {
"fields": [
{"name": "properties.PageName", "type": "string"}
]
},
"type": "text"
}
I was then be able to perform the following query:
{
"selector": {
"$or": [
{ "properties.PageName": "A8" },
{ "properties.PageName": "M30" },
{ "properties.PageName": "AH30" }
]
}
}

Elasticsearch match with stemming

How do I do a search for a stemmed match?
I.e. at the moment I have many documents that contain the word "skateboard" in the item_title field, but only 3 documents that contain the word "skateboards". Because of this, when I do the following search:
POST /my_index/my_type/_search
{
"size": 100,
"query" : {
"multi_match": {
"query": "skateboards",
"fields": [ "item_title^3" ]
}
}
}
I only get 3 results. However, I would like also documents with the word "skateboard" to be returned.
From what I understand from Elasticsearch I would expect that this is done by specifying a mapping on the item_title field that contains an analyser which indexes the stemmed version of each word, but I can't seem to find the documentation on how to do this, which suggests that it's done in a different way.
Suggestions?
Here's one example:
PUT /stem
{
"settings": {
"analysis": {
"filter": {
"filter_stemmer": {
"type": "stemmer",
"language": "english"
}
},
"analyzer": {
"tags_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"filter_stemmer"
],
"tokenizer": "standard"
}
}
}
},
"mappings": {
"test": {
"properties": {
"item_title": {
"analyzer": "tags_analyzer",
"type": "text"
}
}
}
}
}
Index some sample docs:
POST /stem/test/1
{
"item_title": "skateboards"
}
POST /stem/test/2
{
"item_title": "skateboard"
}
POST /stem/test/3
{
"item_title": "skate"
}
Perform the query:
GET /stem/test/_search
{
"query": {
"multi_match": {
"query": "skateboards",
"fields": [
"item_title^3"
]
}
},
"fielddata_fields": [
"item_title"
]
}
And see the results:
"hits": [
{
"_index": "stem",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"item_title": "skateboards"
},
"fields": {
"item_title": [
"skateboard"
]
}
},
{
"_index": "stem",
"_type": "test",
"_id": "2",
"_score": 1,
"_source": {
"item_title": "skateboard"
},
"fields": {
"item_title": [
"skateboard"
]
}
}
]
I have added, also, the fielddata_fields element so that you can see how the content of the field has been indexed. As you can see, in both cases, the indexed term is skateboard.

How to use facet filtering with nested documents on ElasticSearch

I have the following mapping:
curl -XPUT 'http://localhost:9200/bookstore/user/_mapping' -d '
{
"user": {
"properties": {
"user_id": { "type": "integer" },
"gender": { "type": "string", "index" : "not_analyzed" },
"age": { "type": "integer" },
"age_bracket": { "type": "string", "index" : "not_analyzed" },
"current_city": { "type": "string", "index" : "not_analyzed" },
"relationship_status": { "type": "string", "index" : "not_analyzed" },
"books" : {
"type": "nested",
"properties" : {
"b_oid": { "type": "string", "index" : "not_analyzed" },
"b_name": { "type": "string", "index" : "not_analyzed" },
"bc_id": { "type": "integer" },
"bc_name": { "type": "string", "index" : "not_analyzed" },
"bcl_name": { "type": "string", "index" : "not_analyzed" },
"b_id": { "type": "integer" }
}
}
}
}
}'
Now, I try to query for example for Users which have "gender": "Male", have bought book in a certain category "bcl_name": "Trivia" and show the "b_name" book titles. I somehow cannot get it to run.
I have the query
curl -XGET 'http://localhost:9200/bookstore/user/_search?pretty=1' -d '{
"size": 0,
"from": 0,
"query": {
"filtered": {
"query": {
"terms": {
"gender": [
"Male"
]
}
}
}
},
"facets": {
"CategoryFacet": {
"terms": {
"field": "books.b_name",
"size": 5,
"shard_size": 1000,
"order": "count"
},
"nested": "books",
"facet_filter": {
"terms": {
"books.bcl_name": [
"Trivia"
]
}
}
}
}
}'
which returns a result, but I'm not sure whether this is correct. I looked for some examples, and found this (http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/) for example. I'm able to rewrite my query like this:
curl -XGET 'http://localhost:9200/bookstore/user/_search?pretty=1' -d '{
"size": 0,
"from": 0,
"query": {
"filtered": {
"query": {
"terms": {
"gender": [
"Male"
]
}
},
"filter": {
"nested": {
"path": "books",
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"books.bcl_name": "Trivia"
}
}
]
}
}
}
}
}
}
},
"facets": {
"CategoryFacet": {
"terms": {
"field": "books.b_name",
"size": 5,
"shard_size": 1000,
"order": "count"
},
"nested": "books"
}
}
}'
which shows different results.
I, as a beginner, am a litte lost right now. Can someone please give me hint on how to solve this`? Thanks a lot in advance!
First query means:
Search for users whose gender : "Male"
But "CategoryFacet" includes the count of gender : "Male" AND
books.bcl_name : "Trivia"
So in result set you get all "Male" users, but your CategoryFacet gives you the count of "Male users AND whose books.bcl_name is Trivia".
In second query your "CategoryFacet" does not include extra filtering. It just returns the facets from the exact result set.

Resources