How to give my own _id while inserting data in Elasticsearch? - python-3.x

I have a sample database as below:
SNO    Name  Address
99123  Mike  Texas
88124  Tom   California
I want to keep my SNO as the Elasticsearch _id to make it easier to update documents according to my SNO.
Python code to create an index:
abc = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 2
    }
}
es.indices.create(index='test', body=abc)
I fetched data from Postman as below:
{
    "_index": "test",
    "_id": "13",
    "_data": {
        "FirstName": "Sample4",
        "LastName": "ABCDEFG",
        "Designation": "ABCDEF",
        "Salary": "99",
        "DateOfJoining": "2020-05-05",
        "Address": "ABCDE",
        "Gender": "ABCDE",
        "Age": "21",
        "MaritalStatus": "ABCDE",
        "Interests": "ABCDEF",
        "timestamp": "2020-05-05T14:42:46.394115",
        "country": "Nepal"
    }
}
And the insert code in Python is below:
req_JSON = request.json
input_index = req_JSON['_index']
input_id = req_JSON['_id']
input_data = req_JSON['_data']
doc = input_data
res = es.index(index=input_index, body=doc)
I thought the _id would remain the same as what I had given, but it generated an auto _id instead.

You can simply do it like this:
res = es.index(index=input_index, body=doc, id=input_id)
The added id=input_id argument tells Elasticsearch to use your own _id instead of auto-generating one.
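As a minimal sketch of wiring the Postman payload into that call (the payload values are just the sample from the question; the final es.index(...) line is commented out because it needs a live cluster):

```python
# Map the incoming payload onto the es.index() keyword arguments.
payload = {
    "_index": "test",
    "_id": "13",
    "_data": {"FirstName": "Sample4", "country": "Nepal"},
}

kwargs = {
    "index": payload["_index"],
    "id": payload["_id"],       # explicit _id -- the argument that was missing
    "body": payload["_data"],
}
# res = es.index(**kwargs)  # requires a running Elasticsearch cluster
```

With id set, indexing the same _id again overwrites the document, which is exactly what makes updates by SNO straightforward.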

Related

How to replace existing key in jsonb?

I'm trying to update a jsonb array in Postgres by replacing the entire array. It's important to note that I'm not trying to add an array to the object, but simply to replace the whole thing with new values. When I try the code below, I get this error in the console:
error: cannot replace existing key
I'm using Node.js as the server-side language.
server.js
//new array with new values
var address = {
    "appt": appt,
    "city": city,
    "street": street,
    "country": country,
    "timezone": timezone,
    "coordinates": coordinates,
    "door_number": door_number,
    "state_province": state_province,
    "zip_postal_code": zip_postal_code
}
//query
var text = "UPDATE users SET info = JSONB_insert(info, '{address}', '" + JSON.stringify(address) + "') WHERE id=$1 RETURNING *";
var values = [userid];
//pool...[below]
users table:
id (serial) | info (jsonb)
And this is the object I need update
{
    "dob": "1988-12-29",
    "type": "seller",
    "email": "eyetrinity3#test.com",
    "phone": "5553766962",
    "avatar": "f",
    "address": [
        {
            "appt": "",
            "city": "Brandon",
            "street": "11th Street East",
            "country": "Canada",
            "timezone": "Eastern Standard Time",
            "coordinates": [
                "-99.925011",
                "49.840649"
            ],
            "door_number": "666",
            "state_province": "Manitoba",
            "zip_postal_code": "R7A 7B8"
        }
    ],
    "last_name": "doe",
    "first_name": "john",
    "date_created": "2022-11-12T19:44:36.714Z"
}
The below works in db-fiddle PostgreSQL v15 (it did not work in v12).
Specific element:
update json_update_t set info['address'][0] = '{
    "appt": "12",
    "city": "crater",
    "street": "11th Street East",
    "country": "mars",
    "timezone": "Eastern Standard Time",
    "coordinates": [
        "-99.925011",
        "49.840649"
    ],
    "door_number": "9999",
    "state_province": "marsbar",
    "zip_postal_code": "abc 123"
}';
Whole array:
update json_update_t set info['address'] = '[{
    "appt": "14",
    "city": "crater",
    "street": "11th Street East",
    "country": "mars",
    "timezone": "Eastern Standard Time",
    "coordinates": [
        "-99.925011",
        "49.840649"
    ],
    "door_number": "9999",
    "state_province": "marsbar",
    "zip_postal_code": "abc 123"
}]';
I have found the answer for this. Going through some of my older apps I coded, I stumbled upon it. It's not JSONB_INSERT but JSONB_SET. Notice the difference: the latter will replace the entire key, not insert or add to the object.
JSONB_INSERT --> insert
UPDATE users SET info = JSONB_insert(info, '{address,-1}', '" + JSON.stringify(address) + "',true) WHERE id=$1 RETURNING*
JSONB_SET --> set and replace
UPDATE users SET info = JSONB_SET(info, '{address}', '" + JSON.stringify(address) +"') WHERE id=$1 RETURNING*
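As a plain-Python analogy of that set-versus-insert distinction (this is an illustration of the semantics, not the Postgres implementation; jsonb_set_like is a hypothetical helper):

```python
# Hypothetical analogy for JSONB_SET semantics: overwrite the value at a key
# wholesale, instead of inserting/appending as JSONB_INSERT would.
def jsonb_set_like(doc, key, new_value):
    updated = dict(doc)        # shallow copy; leave the original untouched
    updated[key] = new_value   # replace the whole value at that key
    return updated

info = {"first_name": "john", "address": [{"city": "Brandon"}]}
new_address = [{"city": "crater", "country": "mars"}]
result = jsonb_set_like(info, "address", new_address)
```

The whole "address" array is swapped out in one step, which is what the question asked for.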

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to write queries in Cosmos DB, as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far, but apparently I don't understand very well how they work.
In a structure such as the one below, how do I query the city with the largest population among all states using the Data Explorer in Azure:
{
    "id": 1,
    "states": [
        {
            "name": "New York",
            "cities": [
                {
                    "name": "New York",
                    "population": 8500000
                },
                {
                    "name": "Hempstead",
                    "population": 750000
                },
                {
                    "name": "Brookhaven",
                    "population": 500000
                }
            ]
        },
        {
            "name": "California",
            "cities": [
                {
                    "name": "Los Angeles",
                    "population": 4000000
                },
                {
                    "name": "San Diego",
                    "population": 1400000
                },
                {
                    "name": "San Jose",
                    "population": 1000000
                }
            ]
        }
    ]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states) {
    function compareStates(stateA, stateB) {
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    var onlyWithOneCity = states.map(s => {
        var maxpop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxpop)
        };
    });
    return onlyWithOneCity.sort(compareStates)[0];
}
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
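For reference, the logic the query is after (flatten every state's cities, then take the one with the largest population) can be sketched in a few lines of Python, using a trimmed version of the sample document from the question:

```python
# Flatten the nested states -> cities arrays and pick the most populous city.
doc = {
    "id": 1,
    "states": [
        {"name": "New York", "cities": [
            {"name": "New York", "population": 8500000},
            {"name": "Hempstead", "population": 750000},
        ]},
        {"name": "California", "cities": [
            {"name": "Los Angeles", "population": 4000000},
            {"name": "San Diego", "population": 1400000},
        ]},
    ],
}

cities = [
    {"stateName": s["name"], "cityName": c["name"], "population": c["population"]}
    for s in doc["states"]
    for c in s["cities"]
]
top = max(cities, key=lambda c: c["population"])
# top -> {'stateName': 'New York', 'cityName': 'New York', 'population': 8500000}
```

The two list-comprehension loops play the role of the two JOINs in the SQL attempt above.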

PySpark Dataframe to Json - grouping data

We are trying to create a JSON document from a DataFrame. Please find the DataFrame below:
+----------+--------------------+----------+--------------------+-----------------+--------------------+---------------+--------------------+---------------+--------------------+--------------------+
| CustId| TIN|EntityType| EntityAttributes|AddressPreference| AddressDetails|EmailPreference| EmailDetails|PhonePreference| PhoneDetails| MemberDetails|
+----------+--------------------+----------+--------------------+-----------------+--------------------+---------------+--------------------+---------------+--------------------+--------------------+
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Alternate|[{Home, 460 M XXX...| Primary|[{Home, HEREBY...| Alternate|[{Home, {88888888...|[{7777777, 999999...|
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Alternate|[{Home, 460 M XXX...| Primary|[{Home, HEREBY...| Primary|[{Home, {88888888...|[{7777777, 999999...|
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Primary|[{Home, PO BOX 695020...| Primary|[{Home, HEREBY...| Alternate|[{Home, {88888888...|[{7777777, 999999...|
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Primary|[{Home, PO BOX 695020...| Primary|[{Home, HEREBY...| Primary|[{Home, {88888888...|[{7777777, 999999...|
+----------+--------------------+----------+--------------------+-----------------+--------------------+---------------+--------------------+---------------+--------------------+--------------------+
So the initial columns CustId, TIN, EntityType and EntityAttributes will be the same for a particular customer, say 1234567890 in our example, but he might have multiple addresses/phones/emails. Could you please help us with how to group them under one JSON?
Expected Structure :
{
    "CustId": 1234567890,
    "TIN": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "EntityType": "Person",
    "EntityAttributes": [
        {
            "FirstName": "PRINCESS",
            "LastName": "XXXXXX",
            "BirthDate": "xxxx-xx-xx",
            "DeceasedFlag": "False"
        }
    ],
    "Address": [
        {
            "AddressPreference": "Alternate",
            "AddressDetails": {
                "AddressType": "Home",
                "Address1": "460",
                "City": "XXXX",
                "State": "XXX",
                "Zip": "XXXX"
            }
        },
        {
            "AddressPreference": "Primary",
            "AddressDetails": {
                "AddressType": "Home",
                "Address1": "PO BOX 695020",
                "City": "XXX",
                "State": "XXXX",
                "Zip": "695020"
            }
        }
    ],
    "Phone": [
        {
            "PhonePreference": "Primary",
            "PhoneDetails": {
                "PhoneType": "Home",
                "PhoneNumber": "xxxxx",
                "FormatPhoneNumber": "xxxxxx"
            }
        },
        {
            "PhonePreference": "Alternate",
            "PhoneDetails": {
                "PhoneType": "Home",
                "PhoneNumber": "xxxx",
                "FormatPhoneNumber": "xxxxx"
            }
        }
    ],
    "Email": [
        {
            "EmailPreference": "Primary",
            "EmailDetails": {
                "EmailType": "Home",
                "EmailAddress": "xxxxxxx#GMAIL.COM"
            }
        }
    ]
}
UPDATE
Tried with the below recommended groupBy method; it ended up giving 1 customer's details, but the email is repeated 4 times in the list. Ideally it should have only 1 email. Also, for AddressPreference, Alternate has 1 address and Primary has 1 address, but Alternate shows 2 entries and Primary shows 2. Could you please help with an ideal solution?
Probably this should work. id here is like CustId in your example, which has repeating values.
>>> df.show()
+----+------------+----------+
| id| address| email|
+----+------------+----------+
|1001| address-a| email-a|
|1001| address-b| email-b|
|1002|address-1002|email-1002|
|1003|address-1003|email-1002|
|1002| address-c| email-2|
+----+------------+----------+
Aggregate on those repeating columns and then convert to JSON (collect_list comes from pyspark.sql.functions):
>>> from pyspark.sql.functions import collect_list
>>> results = df.groupBy("id").agg(collect_list("address").alias("address"), collect_list("email").alias("email")).toJSON().collect()
>>> for i in results: print(i)
...
{"id":"1003","address":["address-1003"],"email":["email-1002"]}
{"id":"1002","address":["address-1002","address-c"],"email":["email-1002","email-2"]}
{"id":"1001","address":["address-a","address-b"],"email":["email-a","email-b"]}

How to find common struct for all documents in collection?

I have an array of documents that have more or less the same structure, but I need to find the fields that are present in all documents. Something like:
{
    "name": "Jow",
    "salary": 7000,
    "age": 25,
    "city": "Mumbai"
},
{
    "name": "Mike",
    "backname": "Brown",
    "sex": "male",
    "city": "Minks",
    "age": 30
},
{
    "name": "Piter",
    "hobby": "footbol",
    "age": 25,
    "location": "USA"
},
{
    "name": "Maria",
    "age": 22,
    "city": "Paris"
}
All docs have name and age. How to find them with ArangoDB?
You could do the following:
Retrieve the attribute names of each document
Get the intersection of those attributes
i.e.
LET attrs = (FOR item IN test RETURN ATTRIBUTES(item, true))
RETURN APPLY("INTERSECTION", attrs)
APPLY is necessary so each list of attributes in attrs can be passed as a separate parameter to INTERSECTION.
Documentation:
ATTRIBUTES: https://www.arangodb.com/docs/stable/aql/functions-document.html#attributes
INTERSECTION: https://www.arangodb.com/docs/stable/aql/functions-array.html#intersection
APPLY: https://www.arangodb.com/docs/stable/aql/functions-miscellaneous.html#apply
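The same two steps, collect each document's attribute names and then intersect them, look like this in plain Python (with the sample docs from the question):

```python
docs = [
    {"name": "Jow", "salary": 7000, "age": 25, "city": "Mumbai"},
    {"name": "Mike", "backname": "Brown", "sex": "male", "city": "Minks", "age": 30},
    {"name": "Piter", "hobby": "footbol", "age": 25, "location": "USA"},
    {"name": "Maria", "age": 22, "city": "Paris"},
]

# iterating a dict yields its keys, so set(doc) is its attribute names;
# intersection(*rest) mirrors APPLY("INTERSECTION", attrs) in the AQL above
common = set(docs[0]).intersection(*docs[1:])
```

The result is the set of fields present in every document.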

Merge documents by fields

I have two types of docs: main docs and additional info for them.
{
    "id": "371",
    "name": "Mike",
    "location": "Paris"
},
{
    "id": "371-1",
    "age": 20,
    "lastname": "Piterson"
}
I need to merge them by id, to get result doc. The result should look like:
{
    "id": "371",
    "name": "Mike",
    "location": "Paris",
    "age": 20,
    "lastname": "Piterson"
}
Using COLLECT / INTO, SPLIT(), and MERGE():
FOR doc IN collection
COLLECT id = SPLIT(doc.id, '-')[0] INTO groups
RETURN MERGE(MERGE(groups[*].doc), {id})
Result:
[
    {
        "id": "371",
        "location": "Paris",
        "name": "Mike",
        "lastname": "Piterson",
        "age": 20
    }
]
This will:
Split each id attribute at any - and return the first part
Group the results into separate arrays (groups)
Merge #1: Merge all objects into one
Merge #2: Merge the id into the result
See REMOVE & INSERT or REPLACE for write operations.
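The same grouping-and-merging can be sketched in plain Python, using the two sample docs; dict.update plays the role of MERGE:

```python
from collections import defaultdict

docs = [
    {"id": "371", "name": "Mike", "location": "Paris"},
    {"id": "371-1", "age": 20, "lastname": "Piterson"},
]

merged = defaultdict(dict)
for doc in docs:
    base_id = doc["id"].split("-")[0]          # "371-1" -> "371"
    fields = {k: v for k, v in doc.items() if k != "id"}
    merged[base_id].update(fields)             # later docs overwrite earlier keys
    merged[base_id]["id"] = base_id            # put the grouped id back

result = list(merged.values())
```

As in the AQL, documents sharing a base id collapse into one object carrying the union of their fields.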