latlon format for CloudSearch - Amazon

I want to do a geographic search in Amazon CloudSearch. I index documents like this when uploading:
[{"type": "add", "id": "kdhrlfh1304532987654321987654321", "fields":{"name": "user1", "latlon":[12.628611, 120.694152] , "phoneverifiedon": "2015-05-04T15:39:03Z", "fbnumfriends": 172}},
{"type": "add", "id": "kdhrlfh1304532987654321987654322", "fields": {"name": "user2", "latlon":[12.628645,20.694178] , "phoneverifiedon": "2015-05-04T15:39:03Z", "fbnumfriends": 172}}]
I got the error below:
Status: error
Adds: 0
Deletes: 0
Errors:
{ ["Field "latlon" must have array type to have multiple values (near operation with index 1; document_id kdhrlfh1304532987654321987654321)","Validation error for field 'latlon': Invalid latlon value 12.628611"] }
I have tried multiple formats for the "latlon" field.
What is the correct format for latitude/longitude in CloudSearch?

The correct syntax for document submission is a single string with the two values comma-separated, e.g. "latlon": "12.628611, 120.694152".
[
{
"type": "add",
"id": "kdhrlfh1304532987654321987654321",
"fields": {
"name": "user1",
"latlon" : "12.628611, 120.694152"
"phoneverifiedon": "2015-05-04T15:39:03Z",
"fbnumfriends": 172
}
}
]
It is definitely confusing that the submission syntax doesn't match the query syntax, which uses an array to represent lat-lon.
https://forums.aws.amazon.com/thread.jspa?threadID=151633
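If you are submitting batches from Python, a minimal sketch with boto3 might look like the following (the document endpoint URL is a placeholder; substitute your own domain's document endpoint):

import json
import boto3

# CloudSearch document uploads go to the domain's document endpoint,
# not the regular regional API endpoint.
client = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-yourdomain.us-east-1.cloudsearch.amazonaws.com",  # placeholder
)

batch = [
    {
        "type": "add",
        "id": "kdhrlfh1304532987654321987654321",
        "fields": {
            "name": "user1",
            # latlon is submitted as a single "lat,lon" string, not an array
            "latlon": "12.628611, 120.694152",
            "phoneverifiedon": "2015-05-04T15:39:03Z",
            "fbnumfriends": 172,
        },
    }
]

response = client.upload_documents(
    documents=json.dumps(batch).encode("utf-8"),
    contentType="application/json",
)
print(response["status"])  # "success" when the batch is accepted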

Related

Unable to fetch the entire column index based on the value using JSONPath finder in npm

I have the response payload below. I want to check whether the amount equals 1000, and if it matches, return the entire column object as output.
Sample Input:
{
"sqlQuery": "select SET_UNIQUE, amt as AMOUNT from transactionTable where SET_USER_ID=11651 ",
"message": "2 rows selected",
"row": [
{
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
},
{
"column": [
{
"value": "226064213",
"name": "SET_UNIQUE"
},
{
"value": "916",
"name": "AMOUNT"
}
]
}
]
}
Expected Output:
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
In the sample above, I want to fetch the entire column whenever the AMOUNT value is 1000.
I tried the expressions below to achieve this, but with no luck:
1. row[*].column[?(@.value==1000)].column
2. row[*].column[?(@.value==1000)]
I don't want to do this by index, because the index can change.
Any ideas?
I think you'd need nested expressions, which isn't something that's widely supported. Something like
$.row[?(@.column[?(@.value==1000)])]
The inner expression returns matches for value==1000, then the outer expression checks for existence of those matches.
Another alternative that might work is
$.row[?(@.column[*].value==1000)]
but this assumes some implicit type conversions that may or may not be supported.
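If your JSONPath implementation supports neither form, the same existence check is easy to express directly in Python (a sketch, assuming the payload has already been parsed from JSON):

# Select every "column" whose entries include name == "AMOUNT" and value == "1000".
# Note the values in the payload are strings, so we compare against "1000".
def matching_columns(payload):
    return [
        row["column"]
        for row in payload.get("row", [])
        if any(
            col.get("name") == "AMOUNT" and col.get("value") == "1000"
            for col in row.get("column", [])
        )
    ]

Applied to the sample input above, matching_columns(payload) returns only the first row's column array, regardless of the position of the AMOUNT entry within it.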

How to create a field mapping in Azure Search with a complex targetField

I use the Azure Search indexer to index documents from a MongoDB CosmosDB which contains objects with fields named _id.
As Azure Search does not allow underscores at the beginning of a field name in the index, I want to create a field mapping.
JSON structure in Cosmos --> structure in index
{
"id": "test"
"name": "test",
"productLine": {
"_id": "123", --> "id": "123"
"name": "test"
}
}
The documentation has exactly this scenario as an example, but only for a top-level field.
"fieldMappings" : [ { "sourceFieldName" : "_id", "targetFieldName" : "id" } ]}
I tried the following:
"fieldMappings" : [ { "sourceFieldName" : "productLine/_id", "targetFieldName" : "productLine/id" } ] }
that results in an error stating:
Value is not accepted. Valid values: "doc_id", "name", "productName".
What is the correct way to create a mapping for a target field that is a subfield?
It's not possible to directly map subfields. You can get around this by adding a Skillset with a Shaper cognitive skill to the indexer, and an output field mapping.
You will also want to attach a Cognitive Services resource to the skillset. The shaper skill doesn't get billed, but attaching a Cognitive Services resource allows you to process more than 20 documents per day.
Shaper skill
{
"#odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"context": "/document",
"inputs": [
{
"name": "id",
"source": "/document/productLine/_id"
},
{
"name": "name",
"source": "/document/productLine/name"
}
],
"outputs": [
{
"name": "output",
"targetName": "renamedProductLine"
}
]
}
Indexer skillset and output field mapping
"skillsetName": <skillsetName>,
"outputFieldMappings": [
{
"sourceFieldName": "/document/renamedProductLine",
"targetFieldName": "productLine"
}
]
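For reference, here is a sketch of pushing such a skillset through the Azure Search REST API from Python; the service name, admin key, and api-version are placeholders for your own values:

import requests

service = "your-search-service"   # placeholder
api_key = "YOUR-ADMIN-API-KEY"    # placeholder
api_version = "2019-05-06"        # use the version your service supports

skillset = {
    "name": "rename-productline",
    "description": "Reshape productLine so _id becomes id",
    # A Cognitive Services resource can also be attached here, as noted above.
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
            "context": "/document",
            "inputs": [
                {"name": "id", "source": "/document/productLine/_id"},
                {"name": "name", "source": "/document/productLine/name"},
            ],
            "outputs": [{"name": "output", "targetName": "renamedProductLine"}],
        }
    ],
}

resp = requests.put(
    f"https://{service}.search.windows.net/skillsets/{skillset['name']}",
    params={"api-version": api_version},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=skillset,
)
resp.raise_for_status()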

Elasticsearch Search query with multiple filters

Hello, I am new to Elasticsearch, but I have gone through the basic Elasticsearch 5.1 documentation.
Problem in one line:
Search is successful but filters are not working properly.
Mapping datatypes
{
"properties": {
"title": {"type": "string"},
"description": {"type": "string"},
"slug": {"type": "string"},
"course_type": {"type": "string", "index" : "not_analyzed"},
"price": {"type": "string"},
"categories": {"type": "keyword", "index" : "not_analyzed"},
"tags": {"type" : "keyword"},
// "tags": {"type" : "keyword", "index" : "not_analyzed"},
"status": {"type" : "string","index" : "not_analyzed"},
}
}
As noted by @Darth_Vader, I tried setting a mapping as well; the mapping above is what I used.
Document in index (Req-1)
....
{
"_index": "learnings",
"_type": "materials",
"_id": "582d9xxxxxxxx9b27fab2c",
"_score": 1,
"_source": {
"title": "Mobile Marketing",
"slug": "mobile-marketing",
"description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla eleifend hendrerit vehicula.",
"categories": [
"Digital Marketing"
],
"tags": [
"digital-marketing",
"mobile-marketing"
],
"status": "published"
}
},
...
I have around a hundred documents like the one above in the index.
The FULL SEARCH QUERY that I am using:
"query": {
"bool": {
"must": {
"multi_match" : {
"query" : "mobile",
"fields" : [ "title^5", "tags^4", "categories^3" ],
"operator": "and"
}
},
"filter": {
"bool" : {
"must" : [
{"term" : {"status": "published"} }
]
}
}
}
}
In the above query, the most important search criterion/filter is {"term" : {"status": "published"} }. Every search result must meet this requirement.
Now, from the list of results, I want to filter further. Say I want to get only documents which have mobile-marketing as a tag. My document (Req-1) has this tag (mobile-marketing).
NOW the problem is:
If I modify my search query and add the required filter as shown below, I get NO search results (hits = 0), even though my document (Req-1) has mobile-marketing as a tag:
"filter": {
"bool" : {
"must" : [
{"term" : {"status": "published"} },
{"term" : {"tags": "mobile-marketing"} }
]
}
}
BUT if I change the filter {"tags": "mobile-marketing"} TO {"tags": "mobile"}, I get the required document (Req-1) as a result.
I want to get the same document using the filter {"tags": "mobile-marketing"}. So where am I going wrong?
What modification does my search query need?
Thanks
How does your mapping look for tags?
It seems like you've got the mapping for your tags field as analyzed. What analyzed does is, from the books:
First analyze the string and then index it. In other words, index this
field as full text.
So it analyzes the string first. The value mobile-marketing contains a hyphen, so the analyzer tokenizes it: mobile and marketing get stored as two separate tokens.
Whereas if it's not_analyzed:
Index this field, so it is searchable, but index the value exactly as
specified. Do not analyze it.
So this will basically store the value as it is without analyzing it, which should do the trick in your case. Maybe you should have a look at this point as well.
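A minimal sketch of that fix, assuming Elasticsearch 5.x and the official Python client: recreate the index with tags (and the other exact-match fields) mapped as keyword, reindex your documents, and the term filter on the full value will match.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder address

# keyword fields are stored verbatim and never tokenized,
# so "mobile-marketing" stays one indexable value.
es.indices.create(index="learnings", body={
    "mappings": {
        "materials": {
            "properties": {
                "title":      {"type": "text"},
                "tags":       {"type": "keyword"},
                "categories": {"type": "keyword"},
                "status":     {"type": "keyword"},
            }
        }
    }
})

# After reindexing, the exact-value filter from the question now matches:
res = es.search(index="learnings", body={
    "query": {"bool": {"filter": [
        {"term": {"status": "published"}},
        {"term": {"tags": "mobile-marketing"}},
    ]}}
})
print(res["hits"]["total"])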
Hope it helps!

How to search through data with an arbitrary number of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum, and text fields.
The created form is used to:
register a new member for the event;
search through registered members.
What is the best search tool for the second task (searching members of an event)? Is Elasticsearch well suited for this?
I wrote a post about how to index arbitrary data into Elasticsearch and then search it by specific fields and values, all without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function (https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4); a rough Python sketch follows this list.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
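As referenced in step 2 above, the linked flattenData gist is JavaScript; a rough Python equivalent (a sketch, not the original function, and limited to scalar and nested-object values) could look like this:

# Sketch of a flattenData-style helper: walk a JSON object and emit one
# key/type/value entry per leaf, matching the shape shown in the examples below.
def flatten_data(obj, prefix=""):
    entries = []
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            entries.extend(flatten_data(value, prefix=f"{path}."))
        elif isinstance(value, bool):  # check bool before int: bool is an int subclass
            entries.append({"key": path, "type": "boolean",
                            "key_type": f"{path}.boolean", "value_boolean": value})
        elif isinstance(value, int):
            entries.append({"key": path, "type": "long",
                            "key_type": f"{path}.long", "value_long": value})
        else:
            entries.append({"key": path, "type": "string",
                            "key_type": f"{path}.string", "value_string": str(value)})
    return entries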
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
Then we will wrap this data in a document as shown before and index it.
Then the second event moderator creates another form with a new field, a field with the same name and type, and a field with the same name but a different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch, we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
Elasticsearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So yes, Elasticsearch suits these cases well.
However, you may want to fine-tune this behavior, or the default mapping applied by Elasticsearch may not correspond to what you need: in this case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution is covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.
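For illustration, a sketch of what such a nested key/value mapping could look like with the Python client (the field names follow the flatData example from the previous answer; the address and index/type names are placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder address

# User-defined fields live as nested key/value entries, so the cluster
# mapping never grows with user input.
es.indices.create(index="members", body={
    "mappings": {
        "member": {
            "properties": {
                "flatData": {
                    "type": "nested",
                    "properties": {
                        "key":          {"type": "keyword"},
                        "value_string": {"type": "text",
                                         "fields": {"keyword": {"type": "keyword"}}},
                        "value_long":   {"type": "long"},
                    }
                }
            }
        }
    }
})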

elasticsearch: distinct token matching

tl;dr: I want a query that matches every token at most once.
Given I have an elasticsearch index with the following words:
["stackoverflow", "overflow", "awesome", "some"]
Is there any Elasticsearch query that matches "stackoverflow" and "awesome" in the sentence "stackoverflow community is awesome" and doesn't match "overflow" and "some"?
I can't do it by score alone, because misspelling detection is also involved.
What I'm searching for is something like consuming matching, where each input token can be matched at most once. Unfortunately, I couldn't find anything suitable so far :(
Thanks!
more details:
The indexed documents look like this
{"name": "stackoverflow",
"type": "brand"},
{"name": "awesome",
"type": "descriptor"},
{"name": "overflow",
"type": "brand"},
{"name": "some",
"type": "descriptor"}
My query looks like this:
{
"min_score": 1,
"query": {
"match": {
"name": {
"query": "stakoverflow community is awesom",
"fuzziness": 2
}
}
},
"rescore": {
"window_size": 10,
"query": {
"rescore_query": {
"match": {
"name": "stakoverflow community is awesom"
}
},
"query_weight": 0.9,
"rescore_query_weight": 1.1
}
}
}
So I basically try to catch misspellings in the first query and prefer exact (non-misspelled) matches in the rescore.
What I'd like to achieve:
For each token, I'd like to have at most 1 match:
INPUT stakoverflow community is awesom
OUTPUT stackoverflow <nothing> <nothing> awesome
My problem is that I also get overflow and some returned. overflow might even have a better score than awesome, because it's not a misspelling.