How to iterate through indexed field to add field from another index - python-3.x

I'm rather new to Elasticsearch, so I'm coming here hoping to find some advice.
I have two indices in elastic from two different csv files.
The index_1 has this mapping:
{'settings': {
'number_of_shards' : 3
},
'mappings': {
'properties': {
'place': {'type': 'keyword' },
'address': {'type': 'keyword' },
}
}
}
The file is about 400 000 documents long.
index_2, built from a much smaller file (about 50 documents), has this mapping:
{'settings': {
"number_of_shards" : 1
},
'mappings': {
'properties': {
'place': {'type': 'text' },
'address': {'type': 'keyword' },
}
}
}
The "place" field in index_2 contains all of the unique values of the "place" field in index_1.
In both indices the "address" fields are postcodes of type keyword with the structure 0000AZ.
Based on the "place" keyword in index_1, I want to assign the value of the "address" field from index_2.
I have tried using the pandas library, but the index_1 file is too large. I have also tried creating modules based on pandas and elasticsearch, quite unsuccessfully, although I believe this is a promising direction. A good solution would stay within the elasticsearch library as much as possible, as these indices will later be used for further analysis.

If I understand correctly, it sounds like you want to use updateByQuery.
The request body should look a little like this:
{
'query': {'term': {'place': "placeToMatch"}},
'script': 'ctx._source.address = "updatedZipCode"'
}
This will update the address field of all documents with the matched place.
EDIT:
So what we want to do is use updateByQuery while iterating over all the documents in index_2.
First step: get all the documents from index_2, using the basic search feature:
{
"index": 'index_2',
"size": 100, // get all documents; once size is over 10,000 you'll have to paginate.
"body": {"query": {"match_all": {}}}
}
Now we iterate over all the results and use updateByQuery for each of them:
// pseudocode
doc = response.hits.hits[i]
// update by query request: match on place, copy the address over.
{
index: 'index_1',
body: {
'query': {'term': {'place': doc._source.place}},
'script': {
'source': 'ctx._source.address = params.address',
'params': {'address': doc._source.address}
}
}
}
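As a rough Python sketch of that loop, following the question's goal of copying "address" from index_2 into the index_1 documents with the same "place": the function below only builds the update-by-query request bodies, so it runs without a cluster. The function name, the sample hits, and the place names are hypothetical; the index and field names are taken from the question.

```python
def build_update_requests(hits, target_index="index_1"):
    """Turn search hits from index_2 into update-by-query requests for index_1."""
    requests = []
    for hit in hits:
        source = hit["_source"]
        requests.append({
            "index": target_index,
            "body": {
                # Match every index_1 document with the same place keyword.
                "query": {"term": {"place": source["place"]}},
                # Pass the value as a script parameter instead of
                # concatenating it into the script source.
                "script": {
                    "source": "ctx._source.address = params.address",
                    "lang": "painless",
                    "params": {"address": source["address"]},
                },
            },
        })
    return requests


# Hypothetical hits, shaped like response["hits"]["hits"] of a match_all search.
sample_hits = [
    {"_source": {"place": "Amsterdam", "address": "1011AB"}},
    {"_source": {"place": "Utrecht", "address": "3511CD"}},
]
update_requests = build_update_requests(sample_hits)
```

Each entry would then be sent through the client's update-by-query call; the exact keyword arguments depend on the client version, so check them against the version you have installed.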

Related

Sorted API data (Node.js & Mongoose) not displayed in sorted order in Angular UI

I sorted the data in the backend and tested it via Postman, where I get the sorted order.
const locationInfo = await locationDetails.find(query).sort({sectionName:1});
res.json(locationInfo);
[
{ //some other keys &values
"sectionName": "Closet",
},
{
"sectionName": "Dining",
},
{
"sectionName": "Kitchen",
},
{
"sectionName": "Other",
},
{
"sectionName": "Refrigerator",
}
]
After the REST call, I store the result in:
this.result=data;
But when I display the same data in the UI, it is not in sorted order. I also checked the console, and the order of the data has changed there too.
Console Data
[{
sectionName: "Refrigerator",
},
{
sectionName: "Kitchen",
},
{
sectionName: "Dining",
},
{
sectionName: "Closet",
},
{
sectionName: "Other",
}]
Note: I also tried sorting in the .ts file, but it does not work.
this.result.sort(function(a,b){a.sectionName-b.sectionName});
Any help would be appreciated. Thanks!
sectionName is not a reliable criterion for the order you want. MongoDB sorts strings by plain comparison, and documents with the same value can come back in any order.
Here is an example directly from the MongoDB documentation about cursor.sort():
db.restaurants.insertMany( [
{ "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan"},
{ "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens"},
{ "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn"},
{ "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan"},
{ "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn"},
] );
# The following command uses the sort() method to sort on the borough field:
db.restaurants.find().sort( { "borough": 1 } )
Documents are returned in alphabetical order by borough, but the order of those documents with duplicate values for borough might not be the same across multiple executions.
Sorting is most predictable on a numeric field with unique values. If you are in control of the backend and can change how data is stored in the database, I suggest you create a field for the creation date, or just an index, to indicate the order of the items.
Let's say your document looks something like this:
# Doc 1
{
sectionName: "Refrigerator",
order:1
}
# Doc 2
{
sectionName: "Refrigerator",
order:2
}
Then you can do
const locationInfo = await locationDetails.find(query).sort({order:1});
which will return you the documents sorted using the order field, and the order will be consistent.
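To illustrate the point about consistent ordering, here is a small in-memory Python sketch, not a MongoDB call. It sorts on the order field and uses _id as a tie-breaker, so documents that share an order value always come out the same way. The sample documents and the function name are hypothetical; in Mongoose the analogous idea would be adding _id as a second sort key.

```python
# Hypothetical documents; two of them share the same "order" value.
sections = [
    {"_id": 3, "sectionName": "Refrigerator", "order": 2},
    {"_id": 1, "sectionName": "Closet", "order": 1},
    {"_id": 2, "sectionName": "Refrigerator", "order": 1},
]


def sort_sections(docs):
    """Sort by the explicit order field, breaking ties on the unique _id."""
    return sorted(docs, key=lambda d: (d["order"], d["_id"]))


ordered_ids = [d["_id"] for d in sort_sections(sections)]
```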

Cloudant Sorting on a nullable field

I want to sort on a field, say name, which is indexed in Cloudant DB. Using the index without a sort, I get all the documents, both those that have the name field and those that don't. But when I sort on the name field, I do not get the documents that lack it.
Is there any way to do this using the query indexes? I want all the documents in sorted order, including those without the name field.
For example, below are some documents:
{
"_id": 1234,
"classId": "abc",
"name": "Happa"
}
{
"_id": 12345,
"classId": "abc",
"name": "Prasanth"
}
{
"_id": 123456,
"classId": "abc",
}
Below is the query I am trying to execute:
{
"selector": {
"classId": "abc",
"name" :{
"or" : [
{"$exists": true},{"$exists": false}
]
}
},
"sort": [{ "classId": "asc" }, { "name": "asc" }],
"use_index": "idx-classId_name"
},
I am expecting all the documents to be returned in sorted order, including the documents that don't have the name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sort only on the classId field, which appears in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix in the documents without the name field with those that have it.
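A minimal sketch of that client-side ordering in Python, assuming the documents have already been fetched and that documents without a name should go last (that placement is a choice made here, not something Cloudant prescribes). The function name is hypothetical; the documents mirror the ones in the question.

```python
# Documents as they might come back from a query sorted on classId only.
docs = [
    {"_id": 123456, "classId": "abc"},
    {"_id": 12345, "classId": "abc", "name": "Prasanth"},
    {"_id": 1234, "classId": "abc", "name": "Happa"},
]


def sort_with_missing_last(documents, field="name"):
    """Order by classId, then by the field, with documents lacking it last."""
    return sorted(
        documents,
        # False sorts before True, so documents that have the field come first.
        key=lambda d: (d["classId"], field not in d, d.get(field, "")),
    )


sorted_ids = [d["_id"] for d in sort_with_missing_last(docs)]
```

Swapping `field not in d` for `field in d` would put the documents without a name first instead.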
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
if (doc && doc.classId) {
var name = doc.name || "[notfound]";
emit(doc.classId+"-"+name, 1);
}
}
which will key the docs on "classId-name" and for docs with no name, a specified sentinel value.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).

ArangoDB AQL Updating Strange Attribute Names

In ArangoDB I have a lookup table as shown below:
{
'49DD3A82-2B49-44F5-A0B2-BD88A32EDB13': 'Human readable value 1',
'B015E210-27BE-4AA7-83EE-9F754F8E469A': 'Human readable value 2',
'BC54CF8A-BB18-4E2C-B333-EA7086764819': 'Human readable value 3',
'8DE15947-E49B-4FDC-89EE-235A330B7FEB': 'Human readable value n'
}
I have documents in a separate collection, such as the one below, which have non-human-readable attribute and value pairs, as in "details":
{
"ptype": {
"name": "BC54CF8A-BB18-4E2C-B333-EA7086764819",
"accuracy": 9.6,
"details": {
"49DD3A82-2B49-44F5-A0B2-BD88A32EDB13": "B015E210-27BE-4AA7-83EE-9F754F8E469A",
"8DE15947-E49B-4FDC-89EE-235A330B7FEB": true,
}
}
}
I need to update the above document by looking up the human readable values out of the lookup table and I also need to update the non-human readable attributes with the readable attribute names also found in the lookup table.
The result should look like this:
{
"ptype": {
"name": "Human readable value 3",
"accuracy": 9.6,
"details": {
"Human readable value 1": "Human readable value 2",
"Human readable value n": true,
}
}
}
so ptype.name and ptype.details are updated with values from the lookup table.
This query should help you see how a LUT (Look Up Table) can be used.
One cool feature of AQL is that you can do a LUT query and assign its value to a variable with the LET command, and then access the contents of that LUT later.
See if this example helps:
LET lut = {
'aaa' : 'Apples',
'bbb' : 'Bananas',
'ccc' : 'Carrots'
}
LET garden = [
{
'size': 'Large',
'plant_code': 'aaa'
},
{
'size': 'Medium',
'plant_code': 'bbb'
},
{
'size': 'Small',
'plant_code': 'ccc'
}
]
FOR doc IN garden
RETURN {
'size': doc.size,
'vegetable': lut[doc.plant_code]
}
The result of this query is:
[
{
"size": "Large",
"vegetable": "Apples"
},
{
"size": "Medium",
"vegetable": "Bananas"
},
{
"size": "Small",
"vegetable": "Carrots"
}
]
You'll notice in the bottom query that actually returns data, it's referring to the LUT by using the doc.plant_code as the look up key.
This is much more performant than performing subqueries there, because with 100,000 garden documents you don't want to run a supporting query 100,000 times to work out the name for each plant_code.
If you wanted to confirm that you could find a value in the LUT, you could optionally have your final query in this format:
FOR doc IN garden
RETURN {
'size': doc.size,
'vegetable': (lut[doc.plant_code] ? lut[doc.plant_code] : 'Unknown')
}
This optional way to return the value for vegetable uses an inline if/then/else, where if the value is not found in the lut, it will return the value 'Unknown'.
Hope this helps you with your particular use case.
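The same lookup idea, applied to both attribute names and values as the question asks, can be sketched outside the database. The Python below is an illustration of the logic only, not AQL; the function name is hypothetical, and the data is copied from the question. Keys and string values found in the lookup table are replaced, and everything else is left untouched.

```python
lut = {
    "49DD3A82-2B49-44F5-A0B2-BD88A32EDB13": "Human readable value 1",
    "B015E210-27BE-4AA7-83EE-9F754F8E469A": "Human readable value 2",
    "BC54CF8A-BB18-4E2C-B333-EA7086764819": "Human readable value 3",
    "8DE15947-E49B-4FDC-89EE-235A330B7FEB": "Human readable value n",
}


def translate(value, lookup):
    """Recursively replace dict keys and string values found in the lookup."""
    if isinstance(value, dict):
        return {lookup.get(k, k): translate(v, lookup) for k, v in value.items()}
    if isinstance(value, str):
        return lookup.get(value, value)
    return value  # numbers, booleans, etc. pass through unchanged


document = {
    "ptype": {
        "name": "BC54CF8A-BB18-4E2C-B333-EA7086764819",
        "accuracy": 9.6,
        "details": {
            "49DD3A82-2B49-44F5-A0B2-BD88A32EDB13": "B015E210-27BE-4AA7-83EE-9F754F8E469A",
            "8DE15947-E49B-4FDC-89EE-235A330B7FEB": True,
        },
    }
}
translated = translate(document, lut)
```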

Couchdb mango query speed

I have the following type of documents:
{
"_id": "0710b1dd6cc2cdc9c2ffa099c8000f7b",
"_rev": "1-93687d40f54ff6ca72e66ca7fc99caff",
"date": "2018-06-04T07:46:08.848Z",
"topic": "some topic",
}
The collection is not very large, only 20k documents.
However, the following query is very slow. It takes about 5 seconds!
{
selector: {
topic: 'some topic'
},
sort: ['date'],
}
I tried various indexes, e.g.
index: {
fields: ['topic', 'date']
}
but nothing really worked well.
What am I missing here?
When sorting in a Mango query, you need to ensure that the sort order you are asking for matches the index that you are using.
If you are indexing the data set in topic, date order, then you can use the following query on "topic" to get the data out in date order using the index:
{
"selector": {
"topic": "some topic"
},
"sort": [
"topic",
"date"
]
}
Because the sort matches the form of the data in the index, the index is used to answer the query which should speed up your query time considerably.
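One way to keep the index and the sort in step is to build both request bodies from the same field list. The Python sketch below only constructs the JSON bodies (for the database's _index and _find endpoints) and does not send them; the helper names and the index name are hypothetical.

```python
def build_index_request(fields, name):
    """Body for creating a Mango JSON index on the given fields, in order."""
    return {"index": {"fields": list(fields)}, "name": name, "type": "json"}


def build_find_request(selector, index_fields):
    """Body for a Mango query whose sort repeats the index fields in order."""
    return {"selector": selector, "sort": list(index_fields)}


index_fields = ["topic", "date"]
index_request = build_index_request(index_fields, "topic-date-index")
find_request = build_find_request({"topic": "some topic"}, index_fields)
```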

Scores optimization for Elasticsearch

We have a catalog of products stored in Elasticsearch.
Each document looks like this:
{
'family': 'products family',
'category': 'products category',
'name': 'product name',
'description': 'product description'
}
We are trying to build a query that will get the fuzzy match for a search term and will score the results by the following order of fields:
family
category
name
description
Is there a way to do it?
A simple approach would be to use multi-match query giving each field an appropriate boost.
{
"query": {
"multi_match": {
"query": "produce",
"fields": ["family^4","category^3","name^2","description"],
"fuzziness" : "AUTO",
"rewrite" : "constant_score_auto"
}
}
}
All documents which match on the same field would get the same score.
You can change this behavior by tweaking the rewrite parameter.
Article gives further insight to this.
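If the field priority is kept as an ordered list, the boosted field strings can be derived from it rather than written by hand. This is a minimal Python sketch that only builds the query body; the function name is hypothetical, the linear boost scheme (highest priority gets the largest boost) is an assumption that reproduces the boosts in the answer, and the rewrite setting is left out to keep the sketch minimal.

```python
def build_boosted_query(term, fields_by_priority):
    """Build a fuzzy multi_match body, boosting earlier fields more."""
    count = len(fields_by_priority)
    boosted = []
    for position, field in enumerate(fields_by_priority):
        boost = count - position  # first field gets the highest boost
        boosted.append(field if boost == 1 else f"{field}^{boost}")
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": boosted,
                "fuzziness": "AUTO",
            }
        }
    }


query_body = build_boosted_query(
    "produce", ["family", "category", "name", "description"]
)
```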
