My Elasticsearch index has data with a mapping like this for dates:
{
"startTime": {
"type": "string",
"format": "yyyy/MM/dd",
"index": "analyzed",
"analyzer": "keyword"
}
}
I am adding a date range picker and want to use the picked dates to query Elasticsearch for data whose startTime falls inside the chosen range. I'm not sure how to structure this query, or whether it will even work with this being a string field (I can potentially change it, though).
Can anyone help me here?
Your field is a string, so the format property is ignored. You should change your mapping and use the date type. Have a look here to see the core types available in Elasticsearch.
I would use a filter instead of a query. It will be cached and therefore faster. The following is an example range filter with fixed day boundaries (replace PublishTime with your own date field):
{
"filter" : {
"range" : {
"PublishTime" : {
"from" : "20130505T000000",
"to" : "20131105T235959"
}
}
}
}
Note that if you use the filter like this, with boundaries rounded to the day, it stays the same filter for the whole day, so you make good use of the cache.
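As a rough sketch of how the two dates from a date range picker could be turned into such a range clause, here is a small Python helper. It assumes the field has been remapped to the date type with the yyyy/MM/dd format from the question; the function name build_start_time_range is hypothetical, and where the clause goes in the request body depends on your Elasticsearch version.

```python
from datetime import date


def build_start_time_range(start, end):
    """Return a range clause matching startTime between start and end, inclusive.

    Assumption: startTime is mapped as a date with format yyyy/MM/dd.
    """
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y/%m/%d"  # Python equivalent of the yyyy/MM/dd mapping format
    return {
        "range": {
            "startTime": {
                "gte": start.strftime(fmt),  # inclusive lower bound
                "lte": end.strftime(fmt),    # inclusive upper bound
            }
        }
    }


clause = build_start_time_range(date(2013, 5, 5), date(2013, 11, 5))
```

Because the bounds are whole days, the same clause is produced for every request made with the same picker selection, which fits the caching remark above.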
I have stored sentences in elasticsearch for autosuggestion.
format:
{
"text": "what is temperature in chicago"
}
It suggests correctly when "w", "wha" or "what" is typed, but I am wondering if there is any way I can fetch the most searched sentences from Elasticsearch.
It sounds like what you need is a terms aggregation.
Your request body should look something like this:
{
"query": {
//your query
},
"aggs": {
"common" : {
"terms" : { "field" : "text.keyword", "size": 20 }
}
}
}
If I understand your question correctly, you want the most common searches for a given input query. A simple solution can be implemented as follows.
Track what the user finally selects (an ES document) and increment its counter by 1, keeping a mapping from _id to counter.
Run a batch job that syncs this data into ES so that each document carries its counter value.
Use this when giving suggestions, i.e. sort by the count field.
This will start working properly as users start using it.
Your ES document would look like this:
{ "text":"what is temperature in chicago",
"count":10
}
This is a very raw solution and there can be many others, but it is a good place to start.
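The counting idea can be sketched in memory like this. It is only an illustration: the class name SuggestionCounter is hypothetical, the counts live in a Python Counter rather than in ES, and the batch sync into the index is left out.

```python
from collections import Counter


class SuggestionCounter:
    """Tracks how often each suggestion (keyed by _id) was selected."""

    def __init__(self):
        self._counts = Counter()

    def record_selection(self, doc_id):
        # Called when a user finally picks a suggestion.
        self._counts[doc_id] += 1

    def rank(self, candidates):
        # Most selected first; unseen ids count as 0 and keep their original order.
        return sorted(candidates, key=lambda doc: self._counts[doc["_id"]], reverse=True)


tracker = SuggestionCounter()
tracker.record_selection("b")
tracker.record_selection("b")
tracker.record_selection("a")
candidates = [
    {"_id": "a", "text": "what is temperature in chicago"},
    {"_id": "b", "text": "what is temperature in boston"},
    {"_id": "c", "text": "what is time in chicago"},
]
ranked = tracker.rank(candidates)
```

In a real setup the counts would be written back to the count field of each document so that ES can do the sorting itself.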
I have a collection with a Date field that is populated by a C# application using a DateTime object. This field is serialized to the following format "2018-06-10T17:32:48.3285735Z".
I haven't touched the Index Policy in the collection, so strings are using the Range index type. From what I've read in the documentation, that's the most efficient way to index dates, however, when I use the Date field in an ORDER BY clause, the query consumes at least 10x more RUs than if I were to query using the timestamp (_ts) number field. That means paying 10x more for this single collection.
To illustrate the issue:
SELECT TOP 100 * FROM c ORDER BY c.Date DESC
//query consumes a minimum of 500 RUs
SELECT TOP 100 * FROM c ORDER BY c._ts DESC
//query consumes 50 RUs
Is this how it is supposed to work or am I missing something? I suspect that if this was the expected behavior, it would be emphasized in the index documentation, and storing dates as numbers would be highlighted as the best practice.
EDIT:
This is the index policy for the collection (I never changed it).
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": []
}
This may have to do with index collisions (multiple values map to the same index term).
You may want to narrow the range of the Date field and see if that helps. Basically, try this query:
SELECT TOP 100 * FROM c WHERE (c.Date BETWEEN '2000-01-01' AND '2100-01-01') ORDER BY c.Date DESC
Please note that the added filter should not change the query result set.
Did you try specifically configuring for Range Queries?
I think by default strings are hashed and you have to specify indexing for range queries.
I found this in the documentation:
By default, Azure Cosmos DB indexes all string properties within
documents consistently with a Hash index.
Documentation link
For setting up a range query index on the collection:
DocumentCollection collection = new DocumentCollection { Id = "orders" };
collection.IndexingPolicy = new IndexingPolicy(new RangeIndex(DataType.String)
{ Precision = -1 });
await client.CreateDocumentCollectionAsync("/dbs/orderdb", collection);
The document they are querying against looks like this:
{
"id": "09152014101",
"OrderDate": "2014-09-15T23:14:25.7251173Z",
"ShipDate": "2014-09-30T23:14:25.7251173Z",
"Total": 113.39
}
Documentation link
I believe this is an optimisation deficiency when the query uses TOP and ORDER BY. I've found that whilst there is not much difference in RU for a range query using timestamp as number and timestamp as string, in scenarios such as yours the range index on string appears to be ignored.
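If storing the date as a number turns out to be the cheaper option, as the question suspects, a numeric copy of the field could be written next to the string. The Python sketch below shows only the conversion; the function name and the idea of an extra numeric property are assumptions, not something taken from the Cosmos DB documentation. Fractional seconds are dropped so the value has the same one-second resolution as _ts.

```python
import calendar
import time


def iso_to_epoch_seconds(value):
    """Convert e.g. '2018-06-10T17:32:48.3285735Z' to whole seconds since the Unix epoch (UTC)."""
    # Strip the trailing 'Z' and any fractional seconds before parsing.
    whole_seconds = value.rstrip("Z").split(".")[0]
    parsed = time.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    # timegm interprets the parsed struct_time as UTC.
    return calendar.timegm(parsed)


epoch = iso_to_epoch_seconds("2018-06-10T17:32:48.3285735Z")
```

An ORDER BY on such a numeric property would then sort the same way as the original string field, down to the second.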
User Voice issue here:
https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/32345410-optimise-top-with-order-by-clause-queries
I have a documentDB collection that looks like this sample:
{
"data1": "hello",
"data2": [
{
"key": "key1",
"value": "value1"
},
{
"key": "key2",
"value": "value2"
}
]
}
In reality the data has a lot of other fields and the embedded array has some fields where the data is quite large. I need to query the data and I care about the small "key" field in the data2 array but I do not need the large "value". I am finding returning all the value data is causing performance problems, but if I exclude the array data from the SELECT all together it is fast (so the data size is the issue).
I cannot figure out a way to return only the "key" but exclude the "value" in the embedded array.
I basically want SELECT r.data1, r.data2.key and to have it return as:
{
"data1": "hello",
"data2": [
{
"key": "key1"
},
{
"key": "key2"
}
]
}
but it doesn't seem possible to SELECT r.data2.key, because it is in an array.
A JOIN will cause it to return a copy of each document for each "data2" array element, which does not work for me. My only other option would be to migrate the data and put the data I want into its own array so I can select the whole object.
Is this possible somehow, in a way I have not been able to figure out?
Mike,
As you have surmised, this is not possible without a custom UDF until DocumentDB supports sub-queries. If you would like to go down that route, see the following answer for an example of how the UDF may have to look:
DocumentDB Sub Query
Good luck!
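For reference, the reshaping such a UDF would have to perform is small. The sketch below shows it in Python purely to illustrate the logic; an actual DocumentDB UDF is written in JavaScript and registered on the collection, and the function name project_keys is hypothetical.

```python
def project_keys(document):
    """Return data1 plus data2 reduced to its 'key' fields, dropping the large 'value' fields."""
    return {
        "data1": document["data1"],
        "data2": [{"key": item["key"]} for item in document.get("data2", [])],
    }


sample = {
    "data1": "hello",
    "data2": [
        {"key": "key1", "value": "value1"},
        {"key": "key2", "value": "value2"},
    ],
}
projected = project_keys(sample)
```

Note that running this on the client would not reduce the payload; the benefit only appears when the projection happens on the server.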
Here's my query as it stands:
"query":{
"fuzzy":{
"author":{
"value":query,
"fuzziness":2
},
"career_title":{
"value":query,
"fuzziness":2
}
}
}
This is part of a callback in Node.js. Query (which is being plugged in as a value to compare against) is set earlier in the function.
What I need it to be able to do is to check both the author and the career_title of a document, fuzzily, and return any documents that match in either field. The above statement never returns anything, and whenever I try to access the object it should create, it says it's undefined. I understand that I could write two queries, one to check each field, then sort the results by score, but I feel like searching every object for one field twice will be slower than searching every object for two fields once.
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html
As shown at the link above, you can specify the fuzziness in a multi_match query:
{
"query": {
"multi_match": {
"fields": [ "text", "title" ],
"query": "SURPRIZE ME!",
"fuzziness": "AUTO"
}
}
}
Something like this, with your own fields (author and career_title) in place of text and title. Hope this helps.
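Applied to the fields from the question, the request body could be built as in the sketch below. It is written in Python rather than Node.js, the function name is hypothetical, and only the body is constructed; sending it to Elasticsearch is left to whichever client you use.

```python
def build_fuzzy_multi_match(text, fields=("author", "career_title"), fuzziness=2):
    """Build a multi_match body that matches text fuzzily in any of the given fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "fuzziness": fuzziness,
            }
        }
    }


body = build_fuzzy_multi_match("tolkein")
```

A single multi_match covers both fields in one request, so there is no need to run two queries and merge the results by score.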
I'm new to search and am having trouble interpreting the documentation on boosting fields in the mapping.
I want to achieve a simple boosting where the title of some article is more important than the tags associated with the article.
Here's an attempt at the config, which I have put in config/[index_name]/[some_name].json:
{
"[type]": {
"properties": {
"_boost": {
"name": "title",
"null_value": 2.0
},
"title": {
"type": "string"
}
}
}
}
I can tell the file is being read because of error messages from previous attempts at this file. I have also been deleting the index and recreating it between attempts so that it will use this mapping.
Will this work? It doesn't give any error messages, but I can't tell if there is any boost in effect from the output of _search or get _mapping API calls.
Here is the result of the _mapping call:
{
"[type]" : {
"properties" : {
"title" : {
"type" : "string"
},
"tags": {
"type" : "string"
}
}
}
}
Have a look at the example in the boost field documentation.
The boost field mapping (applied on the root object) allows to define
a boost field mapping where its content will control the boost level
of the document
The following mapping defines a field named _boost. If the _boost field itself exists within the JSON document indexed, its value will control the boost level of the indexed document.
{
"tweet" : {
"_boost" : {"name" : "_boost", "null_value" : 1.0}
}
}
Nothing special, the example just tells elasticsearch to consider the _boost field as it is and give a default 1.0 value to it when not present. But you are defining a boost for a specific document: that means that when the document matches a query, its score will be boosted according to the _boost field mapping that you applied to the root object. This doesn't have anything to do with boosting at a field level.
With your mapping you're saying that the content of the title field should be used as _boost, and you're giving a default _boost value of 2.0.
"_boost": {
"name": "title",
"null_value": 2.0
}
This doesn't make sense since the title contains text, and I guess it's not what you want either.
There are different ways to give more importance to a match on the title field.
As far as I understood from the documentation you can do it in your mapping like this:
{
"[type]" : {
"properties" : {
"title" : {
"type" : "string",
"boost" : 2.0
},
"tags": {
"type" : "string"
}
}
}
}
Quite honestly, I haven't tried it and have never used it before, but Lucene does allow you to specify a boost per field at index time. The boost becomes part of the norms for that field and is taken into account when there's a match on that specific field. So, this would be what you were looking for.
Anyway, I would personally do boosting at query time instead of index time, so that you don't need to modify your mapping and you can change the weight without reindexing. You can for example use a query string and search on different fields giving them different weights like this:
{
"query_string" : {
"fields" : ["title^2", "content"],
"query" : "this AND that OR thus"
}
}
You need to take into account that the query string query gets parsed and allows you to use the lucene query syntax.
Furthermore, you can combine different queries together using the bool query. You can express a boost for a match on title with a should clause containing for example a term query and a specific boost for it like this:
"should" : [
{
"term" : { "title" : { "value" : "your query", "boost" : 2.0 } }
}
]
You can use whatever query you want as a should clause. If you go for the term query, remember that its input is not analyzed.
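Putting the should clause into a complete bool query might look like the following Python sketch. It uses match queries instead of term so that the input is analyzed; the function name is hypothetical, the title and tags fields come from the question, and the availability of the match query on your Elasticsearch version is an assumption.

```python
def build_boosted_query(text, title_boost=2.0):
    """Build a bool query where a match on title weighs more than a match on tags."""
    return {
        "query": {
            "bool": {
                "should": [
                    # A hit on title contributes title_boost times its normal score.
                    {"match": {"title": {"query": text, "boost": title_boost}}},
                    # A hit on tags contributes with the default boost of 1.0.
                    {"match": {"tags": {"query": text}}},
                ]
            }
        }
    }


body = build_boosted_query("elasticsearch boosting")
```

Since the boost lives in the query, the weight can be tuned per request without touching the mapping or reindexing.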