limit in _source in elasticsearch - node.js

This is my source from ES:
"_source": {
"queryHash": "query412236215",
"id": "query412236215",
"content": {
"columns": [
{
"name": "Catalog",
"type": "varchar(10)",
"typeSignature": {
"rawType": "varchar",
"typeArguments": [],
"literalArguments": [],
"arguments": [
{
"kind": "LONG_LITERAL",
"value": 10
}
]
}
}
],
"data": [
[
"apm"
],
[
"postgresql"
],
[
"rest"
],
[
"system"
],
[
"tpch"
]
],
"query_string": "show catalogs",
"execution_time": 1979
},
"createdOn": "1514269074289"
}
How can I get the first n records inside _source.data?
Let's say _source.data has 100 records; I want only 10 at a time. Also, is it possible to set an offset so I can fetch the next 10 records?
Thanks

Take a look at scripting. As far as I know there isn't any built-in solution, because Elasticsearch is primarily built for searching and filtering, with document storage only a secondary concern.

First, the order in _source is stable, so it's not totally impossible:
When you get a document back from Elasticsearch, any arrays will be in
the same order as when you indexed the document. The _source field
that you get back contains exactly the same JSON document that you
indexed.
However, arrays are indexed—made searchable—as multivalue fields,
which are unordered. At search time, you can’t refer to "the first
element" or "the last element." Rather, think of an array as a bag of
values.
However, source filtering doesn't cover this, so you're out of luck with arrays.
Also inner hits won't help you. They do have options for sort, size, and from, but those will only return the matched subdocuments and I assume you want to page freely through all of them.
So your final hope is scripting, where you can build whatever you want. But this is probably not what you want:
Do you really need paging here? Results are transferred in a compressed fashion, so the overhead of paging is probably much larger than transferring the data in one go.
If you do need paging, because your array is huge, you probably want to restructure your documents.
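If you do stick with scripting, a script field can slice the array on the server. Below is a minimal sketch using the @elastic/elasticsearch Node client; the index name queries, the client version, and Painless access to params._source are assumptions to adapt:
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// Return one page of content.data: "from" is the offset, "size" the page length.
async function getDataPage(from, size) {
  const response = await client.search({
    index: 'queries', // assumption: adapt to your index name
    body: {
      query: { term: { queryHash: 'query412236215' } },
      _source: false, // we only want the script field back
      script_fields: {
        data_page: {
          script: {
            lang: 'painless',
            source: 'def d = params._source.content.data; ' +
                    'int to = (int) Math.min(params.from + params.size, d.size()); ' +
                    'return d.subList(params.from, to);',
            params: { from: from, size: size }
          }
        }
      }
    }
  });
  return response.body.hits.hits[0].fields.data_page;
}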

Related

Mango index "does not contain a valid index for this query" even when specified manually

I'm trying to efficiently query data via Mango (that seems to be the only option given my requirements; see Searching for sub-objects with a date range containing the queried date value), but I can't even get a very simple index/query pair to work: although I specify my index manually for the query, I'm told that my index "was not used because it does not contain a valid index for this query. No matching index found, create an index to optimize query time."
(I'm doing all of this via Fauxton on CouchDB v. 3.0.0)
Let's say my documents look like this:
{
"tenant": "TNNT_a",
"$doctype": "JobOpening",
// a bunch of other fields
}
All documents with a $doctype of "JobOpening" are guaranteed to have a tenant property. The searches I wish to perform will only ever be for documents with $doctype of "JobOpening" and a tenant selector will always be provided when querying.
Here's the test index I've configured:
{
"index": {
"fields": [
"tenant",
"$doctype"
],
"partial_filter_selector": {
"\\$doctype": {
"$eq": "JobOpening"
}
}
},
"ddoc": "job-openings-doctype-index",
"type": "json"
}
And here's the query
{
"selector": {
"tenant": "TNNT_a",
"\\$doctype": "JobOpening"
},
"use_index": "job-openings-doctype-index"
}
Why isn't the index being used for the query?
I've tried not using a partial index, and I think the $doctype escaping is done properly in the requisite places, but nothing seems to keep CouchDB from performing a full scan.
The index isn't being used because the unescaped $doctype in the index definition is not recognized by the query planner. Changing the fields declaration from $doctype to \\$doctype in the design document solves the issue:
{
"index": {
"fields": [
"tenant",
"\\$doctype"
],
"partial_filter_selector": {
"\\$doctype": {
"$eq": "JobOpening"
}
}
},
"ddoc": "job-openings-doctype-index",
"type": "json"
}
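For reference, the same corrected definition can be applied over HTTP instead of Fauxton; a sketch using Node 18's global fetch, where host, port, and the database name stack (taken from the explain output below) are assumptions, and credentials are omitted:
async function createIndex() {
  const res = await fetch('http://localhost:5984/stack/_index', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // '\\$doctype' in JS source serializes to the escaped \$doctype used above
    body: JSON.stringify({
      index: {
        fields: ['tenant', '\\$doctype'],
        partial_filter_selector: { '\\$doctype': { $eq: 'JobOpening' } }
      },
      ddoc: 'job-openings-doctype-index',
      type: 'json'
    })
  });
  return res.json(); // expect { result: 'created', ... }
}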
After that small refactor, the query
{
"selector": {
"tenant": "TNNT_a",
"\\$doctype": "JobOpening"
},
"use_index": "job-openings-doctype-index"
}
Returns the expected result, and produces an "explain" which confirms the job-openings-doctype-index was queried:
{
"dbname": "stack",
"index": {
"ddoc": "_design/job-openings-doctype-index",
"name": "7f5c5cea5acd90f11fffca3e3355b6a03677ad53",
"type": "json",
"def": {
"fields": [
{
"tenant": "asc"
},
{
"\\$doctype": "asc"
}
],
"partial_filter_selector": {
"\\$doctype": {
"$eq": "JobOpening"
}
}
}
},
// etc etc etc
Whether this change is intuitive is unclear; however, it is consistent, and it suggests that field names starting with a "special" character may not be desirable.
Regarding the indexing of the filtered field, the documentation on partial_filter_selector states:
Technically, we don’t need to include the filter on the "status" [e.g.
$doctype here] field in the query selector ‐ the partial index
ensures this is always true - but including it makes the intent of the
selector clearer and will make it easier to take advantage of future
improvements to query planning (e.g. automatic selection of partial
indexes).
Despite that, I would not choose to index a field whose value is constant.
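As a quick sanity check, the _explain endpoint returns the chosen index without running the query; a sketch with the same assumptions as above:
async function explainQuery() {
  const res = await fetch('http://localhost:5984/stack/_explain', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      selector: { tenant: 'TNNT_a', '\\$doctype': 'JobOpening' },
      use_index: 'job-openings-doctype-index'
    })
  });
  const plan = await res.json();
  console.log(plan.index.ddoc); // "_design/job-openings-doctype-index" when matched
  return plan;
}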

How to include two analyzers into a single SEARCH statement?

I have a feeds collection with documents like this:
{
"created": 1510000000,
"find": [
"title of the document",
"body of the document"
],
"filter": [
"/example.com",
"-en"
]
}
created contains an epoch timestamp
find contains an array of fulltext snippets, e.g. the title and the body of a text
filter is an array with further search tokens, such as hashtags, domains, locales
The problem is that find contains fulltext snippets, which we want to tokenize, e.g. with a text analyzer, whereas filter contains final tokens which we want to compare as a whole, e.g. with the identity analyzer.
The goal is to combine find and filter into a single custom analyzer, or to combine two analyzers using two SEARCH statements, or something to that end.
I did manage to query by either find or by filter successfully, but do not manage to query by both. This is how I query by filter:
I created a feeds_search view:
{
"writebufferIdle": 64,
"type": "arangosearch",
"links": {
"feeds": {
"analyzers": [
"identity"
],
"fields": {
"find": {},
"filter": {},
"created": {}
},
"includeAllFields": false,
"storeValues": "none",
"trackListPositions": false
}
},
"consolidationIntervalMsec": 10000,
"writebufferActive": 0,
"primarySort": [],
"writebufferSizeMax": 33554432,
"consolidationPolicy": {
"type": "tier",
"segmentsBytesFloor": 2097152,
"segmentsBytesMax": 5368709120,
"segmentsMax": 10,
"segmentsMin": 1,
"minScore": 0
},
"cleanupIntervalStep": 2,
"commitIntervalMsec": 1000,
"id": "362444",
"globallyUniqueId": "hD6FBD6EE239C/362444"
}
and I created a sample query:
FOR feed IN feeds_search
SEARCH ANALYZER(feed.created < 9990000000 AND feed.created > 1500000000
AND (feed.find == "title of the document")
AND (feed.`filter` == "/example.com" OR feed.`filter` == "-uk"), "identity")
SORT feed.created
LIMIT 20
RETURN feed
The sample query works, because find contains the full text (identity analyzer). As soon as I switch to a text analyzer, single word tokens work for find, but filter no longer works.
I tried using a combination of SEARCH and FILTER, which gives me the desired result, but I assume it probably performs worse than having the SEARCH analyzer do the whole thing. I see that analyzers is an array in the view syntax, but I can't seem to set an individual analyzer for each field.
Analyzers can be added as a property to each field in fields. What is specified in the top-level analyzers array is the default, used whenever a more specific analyzer is not set for a given field:
"analyzers": [
"identity"
],
"fields": {
"find": {
"analyzers": [
"text_en"
]
},
"filter": {},
"created": {}
},
Credits: Simran at ArangoDB
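With the per-field analyzer in place, the query can mix tokenized and whole-token matching without one global ANALYZER wrapper. A sketch via arangojs, where the connection details are assumptions and the text_en wrapping follows the view definition above:
const { Database, aql } = require('arangojs');
const db = new Database({ url: 'http://localhost:8529' }); // assumption: local server

// find is indexed with text_en (tokenized), filter with the default identity
// analyzer (whole tokens), so both match styles coexist in one SEARCH.
async function searchFeeds() {
  const cursor = await db.query(aql`
    FOR feed IN feeds_search
      SEARCH feed.created > 1500000000 AND feed.created < 9990000000
        AND ANALYZER(feed.find == "title", "text_en")
        AND (feed.\`filter\` == "/example.com" OR feed.\`filter\` == "-uk")
      SORT feed.created
      LIMIT 20
      RETURN feed
  `);
  return cursor.all();
}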

How to get character matches in Azure Search index instead of substrings

I created an Azure index for my DocumentDB collection, and it seems to be working fine. The index has properties for a user account like FirstName, LastName, and Username. The problem is the default tokenizer seems to be tokenizing the Username field. While I want token matches for the first two fields, I'd like character matching for the usernames. Is there an easy way to achieve this through the Azure portal? If not, how can I achieve this?
Adding another answer based on your comments above. Basically, what you want is prefix, suffix, and wildcard search: if the username were user246392, you could find it by typing "use", "392", or even "er246". The prefix case is easy, because you could search use* and it would find it.
Kendra Little did a really nice blog post on how to leverage RegEx with Azure Search, which can allow you to do the full wildcard part of your ask (i.e. search for "392").
If you wanted to do the suffix search, there is an efficient trick: create a new field with a custom analyzer that indexes each token in reverse order. Here is an example of an index schema that allows this (over the suffixName field):
{
"name":"people",
"fields": [
{ "name":"id", "type":"Edm.String", "key":true, "searchable":false },
{"name": "suffixName", "type": "Edm.String", "searchable":true, "indexAnalyzer":"suffixIndexingAnalyzer", "searchAnalyzer":"reverseText"}
],
"analyzers": [
{
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "suffixIndexingAnalyzer",
"tokenizer": "keyword_v2",
"tokenFilters": [
"asciifolding",
"lowercase",
"reverse",
"my_edgeNGramForSuffix"
],
"charFilters": []
},
{
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "reverseText",
"tokenizer": "classic",
"tokenFilters": [
"lowercase",
"reverse"
],
"charFilters": []
}
],
"tokenFilters":[
{
"#odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"name": "my_edgeNGramForSuffix",
"minGram": 2,
"maxGram": 25,
"side": "front"
}
]
}
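To see why the suffix trick works, trace user246392 through suffixIndexingAnalyzer: keyword_v2 keeps the whole string as a single token, lowercase and reverse turn it into 293642resu, and the edge n-gram filter emits 29, 293, 2936, and so on, which are exactly the reversed suffixes of the username. At query time, reverseText turns a search for 392 into 293, which matches one of those indexed grams.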
Can you give us an example of what you would want to do over this username field? I am not sure what you mean by character matching. Is it a RegEx-based character match? If so, perhaps a custom analyzer that enables RegEx search might help for this field? Please note, RegEx is not as performant as typical indexing, as we would need to scan the entire content, as opposed to going to the inverted index to find token matches.

Pagination with per-row access rights

Hi, I am using CouchDB. Assume I have an articles document with the field users, containing an array of user IDs that are allowed to view this article.
Example scenario: a paginated table view shows 10 articles per page. My controller retrieves the first 10 articles from CouchDB, then performs the access-rights check one by one on the returned articles. But the current user may only have view rights on, say, 8 of them, so the table will show only 8 articles instead of 10.
What are the best practices for handling such a situation, besides implementing the access-rights logic in the CouchDB layer?
To accomplish this, I would simply use a view keyed on the users field:
function (doc) {
doc.users.forEach(function (user) {
emit([ user ]);
});
}
I emitted an array with just 1 item in this case. I presume you'd also emit something like doc.created in order to have your articles sorted; you would simply add it after user in that array, as sketched below.
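A sketch of that extended map function, assuming each article document carries a created timestamp:
function (doc) {
  // one row per user who may view this article, sortable by creation time
  doc.users.forEach(function (user) {
    emit([ user, doc.created ]);
  });
}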
The view results would look something like:
{
"rows": [
{ "id": "<article-1>", "key": [ "<user-1>", "<created>" ] },
{ "id": "<article-2>", "key": [ "<user-1>", "<created>" ] },
{ "id": "<article-3>", "key": [ "<user-1>", "<created>" ] },
{ "id": "<article-1>", "key": [ "<user-2>", "<created>" ] },
{ "id": "<article-1>", "key": [ "<user-3>", "<created>" ] }
]
}
You can then paginate like you normally would with CouchDB: use start_key=["<user-1>"]&end_key=["<user-1>","\ufff0"] in addition to the usual limit=10&skip=0 for page 1, limit=10&skip=10 for page 2, and so on; a full request sketch follows.
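Putting it together with Node 18's fetch; the database name articles and the design-document/view names are assumptions:
async function articlesForUser(userId, page, pageSize) {
  const params = new URLSearchParams({
    start_key: JSON.stringify([userId]),        // view keys must be JSON-encoded
    end_key: JSON.stringify([userId, '\ufff0']),
    include_docs: 'true',
    limit: String(pageSize),
    skip: String(page * pageSize) // note: skip degrades on deep pages
  });
  const res = await fetch(
    'http://localhost:5984/articles/_design/articles/_view/by_user?' + params
  );
  const { rows } = await res.json();
  return rows.map(function (row) { return row.doc; });
}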

Filtering Contentful Query on Linked Objects

I'm attempting to utilize Contentful on a current project of mine and I'm trying to understand how to filter my query results based on a field in a linked object.
My top level object contains a Link defined as such:
"name": "Service_Description",
"fields": [
{
"name": "Header",
"id": "header",
"type": "Link",
"linkType": "Entry",
"required": true,
"validations": [
{
"linkContentType": [
"offerGeneral"
]
}
],
"localized": false,
"disabled": false,
"omitted": false
},
This "header" field links to another content type that has this definition:
"fields": [
{
"name": "General",
"id": "general",
"type": "Link",
"linkType": "Entry",
"required": true,
"validations": [
{
"linkContentType": [
"genericGeneral"
]
}
],
"localized": false,
"disabled": false,
"omitted": false
},
which then links to the lowest level:
"fields": [{
"name": "TagList",
"id": "tagList",
"type": "Array",
"items": {
"type": "Link",
"linkType": "Entry",
"validations": [
{
"linkContentType": [
"tag"
]
}
]
},
"validations": []
}
where tagList is an array of tags this piece of content may have.
I want to be able to run a query from the top-level object that says: get me X of these Service_Description entries that contain a tag from a supplied list of tags.
In PostMan, I've been running with this:
https://cdn.contentful.com/spaces/{SPACE_ID}/entries?access_token={ACCESS_TOKEN}&content_type=serviceDescription&include=3
I'm trying to add a filter something like so:
fields.header.fields.general.fields.tagList.sys.id%5Bin%5D={TAG_SYS_ID}
This is clearly incorrect, but I've been struggling with how to walk this relationship to achieve my goal. Perusing the documentation, this seems to have something to do with includes, but I'm unsure how that solves the problem.
Any direction on how to achieve this, or on whether it is possible at all?
This is now possible; I believe it was solved for in the API based on requests for this functionality. You can see the thread here.
The gist of it is that you have to query on the entries that have linked entries, and then include the content type for those linked entries in the query, like so:
contentfulClient.getEntries({
'content_type': 'location',
'fields.market.fields.marketName': 'New York',
'fields.market.sys.contentType.sys.id': 'marketRegion'
})
Unfortunately what you are requesting is not currently possible in Contentful.
We were facing a very similar issue with nested/referenced content types and support said it wasn't possible.
We ended up writing a fairly complicated system that allowed us to do what you want: essentially doing a full-text search for the referenced content, then querying all of the parent entries and matching the relationships by iterating over the parents.
Sorry it couldn't be easier. Hopefully the devs work on something that improves this; we have brought it to their attention.
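A rough sketch of that chained approach for the schema in the question. The content-type IDs, the tagIds argument, and the [in] reference filters are assumptions to verify against your space, and each hop costs an extra round trip:
// Walk the chain bottom-up: tags -> genericGeneral -> offerGeneral -> serviceDescription.
async function serviceDescriptionsByTags(client, tagIds, limit) {
  const generals = await client.getEntries({
    content_type: 'genericGeneral',
    'fields.tagList.sys.id[in]': tagIds.join(','),
    select: 'sys.id'
  });
  const headers = await client.getEntries({
    content_type: 'offerGeneral',
    'fields.general.sys.id[in]': generals.items.map(e => e.sys.id).join(','),
    select: 'sys.id'
  });
  return client.getEntries({
    content_type: 'serviceDescription',
    'fields.header.sys.id[in]': headers.items.map(e => e.sys.id).join(','),
    limit: limit
  });
}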
