How to replace existing key in jsonb? - node.js

I'm trying to update a jsonb array in Postgres by replacing the entire array. Note that I'm not trying to add an array to the object, but to replace the existing one with new values. When I run the code below, I get this error in the console:
error: cannot replace existing key
I'm using Node.js as the server-side language.
server.js
//new array with new values
var address = {
"appt": appt,
"city": city,
"street": street,
"country": country,
"timezone": timezone,
"coordinates": coordinates,
"door_number": door_number,
"state_province": state_province,
"zip_postal_code": zip_postal_code
}
//query
var text = "UPDATE users SET info = JSONB_insert(info, '{address}', '" + JSON.stringify(address) + "') WHERE id=$1 RETURNING*";
var values = [userid];
//pool...[below]
users table
id (serial) | info (jsonb)
And this is the object I need to update:
{
"dob": "1988-12-29",
"type": "seller",
"email": "eyetrinity3#test.com",
"phone": "5553766962",
"avatar": "f",
"address": [
{
"appt": "",
"city": "Brandon",
"street": "11th Street East",
"country": "Canada",
"timezone": "Eastern Standard Time",
"coordinates": [
"-99.925011",
"49.840649"
],
"door_number": "666",
"state_province": "Manitoba",
"zip_postal_code": "R7A 7B8"
}
],
"last_name": "doe",
"first_name": "john",
"date_created": "2022-11-12T19:44:36.714Z",
}

The statements below work in db-fiddle with PostgreSQL v15 (they did not work in v12; jsonb subscripting was added in PostgreSQL 14).
Specific element:
update json_update_t set info['address'][0] = '{
"appt": "12",
"city": "crater",
"street": "11th Street East",
"country": "mars",
"timezone": "Eastern Standard Time",
"coordinates": [
"-99.925011",
"49.840649"
],
"door_number": "9999",
"state_province": "marsbar",
"zip_postal_code": "abc 123"
}';
Whole array:
update json_update_t set info['address'] = '[{
"appt": "14",
"city": "crater",
"street": "11th Street East",
"country": "mars",
"timezone": "Eastern Standard Time",
"coordinates": [
"-99.925011",
"49.840649"
],
"door_number": "9999",
"state_province": "marsbar",
"zip_postal_code": "abc 123"
}]';

I found the answer while going through some of my older apps. The function to use is not JSONB_INSERT but JSONB_SET. JSONB_INSERT only adds a new value and raises "cannot replace existing key" when the target key of an object already exists; JSONB_SET replaces the value stored under the key.
JSONB_INSERT --> insert (here: after the last element of the address array)
UPDATE users SET info = JSONB_insert(info, '{address,-1}', '" + JSON.stringify(address) + "',true) WHERE id=$1 RETURNING*
JSONB_SET --> set and replace (here: the whole address value)
UPDATE users SET info = JSONB_SET(info, '{address}', '" + JSON.stringify(address) +"') WHERE id=$1 RETURNING*
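The JSONB_SET statement above builds the SQL by concatenating JSON.stringify(address) into the string, which breaks as soon as a value contains a single quote and leaves the query open to SQL injection. A minimal sketch of the same update with query parameters follows. It assumes node-postgres (pg), whose query method accepts a config object with text and values; buildReplaceAddressQuery is a hypothetical helper name, and the sample user id and address are made up.

```javascript
// Sketch: replace info.address with a parameterized query instead of string concatenation.
// buildReplaceAddressQuery is a hypothetical helper, not part of node-postgres.
function buildReplaceAddressQuery(userId, address) {
  return {
    // jsonb_set replaces the value at the path; $2 is sent as text and cast to jsonb.
    text: "UPDATE users SET info = jsonb_set(info, '{address}', $2::jsonb) WHERE id = $1 RETURNING *",
    // Stringify explicitly: a JavaScript array passed as-is would be sent as a Postgres array.
    values: [userId, JSON.stringify(address)],
  };
}

// Made-up sample input: user id 42 and a one-element address array.
const query = buildReplaceAddressQuery(42, [{ city: "Brandon", door_number: "666" }]);
console.log(query.text);
console.log(query.values);
// With a configured pg Pool (assumed to exist as `pool`):
//   const { rows } = await pool.query(query);
```

Because the JSON travels as a parameter, no quoting or escaping is needed on the JavaScript side.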

Related

PySpark Dataframe to Json - grouping data

We are trying to create JSON from a DataFrame. The DataFrame is shown below:
+----------+--------------------+----------+--------------------+-----------------+--------------------+---------------+--------------------+---------------+--------------------+--------------------+
| CustId| TIN|EntityType| EntityAttributes|AddressPreference| AddressDetails|EmailPreference| EmailDetails|PhonePreference| PhoneDetails| MemberDetails|
+----------+--------------------+----------+--------------------+-----------------+--------------------+---------------+--------------------+---------------+--------------------+--------------------+
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Alternate|[{Home, 460 M XXX...| Primary|[{Home, HEREBY...| Alternate|[{Home, {88888888...|[{7777777, 999999...|
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Alternate|[{Home, 460 M XXX...| Primary|[{Home, HEREBY...| Primary|[{Home, {88888888...|[{7777777, 999999...|
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Primary|[{Home, PO BOX 695020...| Primary|[{Home, HEREBY...| Alternate|[{Home, {88888888...|[{7777777, 999999...|
|1234567890|XXXXXXXXXXXXXXXXXX...| Person|[{null, PRINCESS,...| Primary|[{Home, PO BOX 695020...| Primary|[{Home, HEREBY...| Primary|[{Home, {88888888...|[{7777777, 999999...|
+----------+--------------------+----------+--------------------+-----------------+--------------------+---------------+--------------------+---------------+--------------------+--------------------+
The initial columns CustId, TIN, EntityType and EntityAttributes are the same for a particular customer, say 1234567890 in our example, but that customer may have multiple addresses, phones and emails. Could you please help us group them under one JSON object?
Expected structure:
{
"CustId": 1234567890,
"TIN": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"EntityType": "Person",
"EntityAttributes": [
{
"FirstName": "PRINCESS",
"LastName": "XXXXXX",
"BirthDate": "xxxx-xx-xx",
"DeceasedFlag": "False"
}
],
"Address": [
{
"AddressPreference": "Alternate",
"AddressDetails": {
"AddressType": "Home",
"Address1": "460",
"City": "XXXX",
"State": "XXX",
"Zip": "XXXX"
}
},
{
"AddressPreference": "Primary",
"AddressDetails": {
"AddressType": "Home",
"Address1": "PO BOX 695020",
"City": "XXX",
"State": "XXXX",
"Zip": "695020",
}
}
],
"Phone": [
{
"PhonePreference": "Primary",
"PhoneDetails": {
"PhoneType": "Home",
"PhoneNumber": "xxxxx",
"FormatPhoneNumber": "xxxxxx"
}
},
{
"PhonePreference": "Alternate",
"PhoneDetails": {
"PhoneType": "Home",
"PhoneNumber": "xxxx",
"FormatPhoneNumber": "xxxxx"
}
},
{
],
"Email": [
{
"EmailPreference": "Primary",
"EmailDetails": {
"EmailType": "Home",
"EmailAddress": "xxxxxxx#GMAIL.COM"
}
}
],
}
]
}
UPDATE
I tried the group-by method recommended below. It produced one record per customer, but the email is repeated 4 times in the list when there should be only one. Also, there is one Alternate address and one Primary address, but the result shows 2 entries for each. Could you please help with a better solution?
This should probably work. Here id plays the role of CustId in your example and has repeating values.
>>> df.show()
+----+------------+----------+
| id| address| email|
+----+------------+----------+
|1001| address-a| email-a|
|1001| address-b| email-b|
|1002|address-1002|email-1002|
|1003|address-1003|email-1002|
|1002| address-c| email-2|
+----+------------+----------+
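Aggregate the repeating columns and then convert to JSON. Note that collect_list keeps duplicate values, whereas collect_set drops them.
>>> from pyspark.sql.functions import collect_list
>>> results = df.groupBy("id").agg(collect_list("address").alias("address"),collect_list("email").alias("email")).toJSON().collect()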
>>> for i in results: print(i)
...
{"id":"1003","address":["address-1003"],"email":["email-1002"]}
{"id":"1002","address":["address-1002","address-c"],"email":["email-1002","email-2"]}
{"id":"1001","address":["address-a","address-b"],"email":["email-a","email-b"]}

How to find common struct for all documents in collection?

I have an array of documents that have more or less the same structure, and I need to find the fields that are present in all documents. Something like:
{
"name": "Jow",
"salary": 7000,
"age": 25,
"city": "Mumbai"
},
{
"name": "Mike",
"backname": "Brown",
"sex": "male",
"city": "Minks",
"age": 30
},
{
"name": "Piter",
"hobby": "footbol",
"age": 25,
"location": "USA"
},
{
"name": "Maria",
"age": 22,
"city": "Paris"
},
All docs have name and age. How to find them with ArangoDB?
You could do the following:
1. Retrieve the attribute names of each document.
2. Get the intersection of those attributes.
For example:
LET attrs = (FOR item IN test RETURN ATTRIBUTES(item, true))
RETURN APPLY("INTERSECTION", attrs)
APPLY is necessary so each list of attributes in attrs can be passed as a separate parameter to INTERSECTION.
Documentation:
ATTRIBUTES: https://www.arangodb.com/docs/stable/aql/functions-document.html#attributes
INTERSECTION: https://www.arangodb.com/docs/stable/aql/functions-array.html#intersection
APPLY: https://www.arangodb.com/docs/stable/aql/functions-miscellaneous.html#apply
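To make the role of APPLY concrete, here is the same computation in plain JavaScript (an illustration only, not ArangoDB code). The intersection function is a hypothetical stand-in for AQL's INTERSECTION, and the spread operator plays the part of APPLY by passing each attribute list as a separate argument.

```javascript
// Plain-JavaScript illustration of the AQL query above (not ArangoDB code).
// intersection is a hypothetical variadic helper standing in for AQL's INTERSECTION.
function intersection(...lists) {
  if (lists.length === 0) return [];
  // Keep only the values of the first list that appear in every other list.
  return lists.reduce((common, list) => common.filter((x) => list.includes(x)));
}

const docs = [
  { name: "Jow", salary: 7000, age: 25, city: "Mumbai" },
  { name: "Mike", backname: "Brown", sex: "male", city: "Minks", age: 30 },
  { name: "Piter", hobby: "footbol", age: 25, location: "USA" },
  { name: "Maria", age: 22, city: "Paris" },
];

// One list of attribute names per document, like ATTRIBUTES(item, true) per item.
const attrs = docs.map((doc) => Object.keys(doc));
// Spreading attrs passes each list as its own argument, which is what APPLY does in AQL.
const common = intersection(...attrs);
console.log(common); // [ 'name', 'age' ]
```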

Getting separate street number and street name from Foursquare search api?

I'm using Foursquare's venue search API and everything is working as expected, but the response does not contain separate fields for street number and street name. It contains one field named "address" that holds both, as in the example below:
"location": {
"address": "180 Orchard St",
"crossStreet": "btwn Houston & Stanton St",
"lat": 40.72173744277209,
"lng": -73.98800687282996,
"labeledLatLngs": [
{
"label": "display",
"lat": 40.72173744277209,
"lng": -73.98800687282996
}
],
"distance": 8,
"postalCode": "10002",
"cc": "US",
"city": "New York",
"state": "NY",
"country": "United States",
"formattedAddress": [
"180 Orchard St (btwn Houston & Stanton St)",
"New York, NY 10002",
"United States"
]
I need to have the street number and street name separately. Is there any way to get a response that has them separated, or is parsing the field myself the only way?
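If parsing turns out to be necessary, a rough heuristic could look like the sketch below. It assumes addresses that start with a house number, as in "180 Orchard St"; splitStreetAddress is a hypothetical helper, and addresses in other formats (number after the street name, no number at all) simply come back with number set to null.

```javascript
// Heuristic sketch: split a leading house number from the rest of the address.
// splitStreetAddress is a hypothetical helper; it only handles "number first" addresses.
function splitStreetAddress(address) {
  // Digits, optionally followed by one letter ("12B") or a range ("12-14"), then whitespace.
  const match = /^\s*(\d+[A-Za-z]?(?:-\d+)?)\s+(.+?)\s*$/.exec(address);
  if (!match) return { number: null, street: address.trim() };
  return { number: match[1], street: match[2] };
}

const parsed = splitStreetAddress("180 Orchard St");
console.log(parsed); // { number: '180', street: 'Orchard St' }

// An ordinal street name is not mistaken for a house number.
const noNumber = splitStreetAddress("11th Street East");
console.log(noNumber); // { number: null, street: '11th Street East' }
```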

Return distinct and sorted query in AQL

So I have two collections, one with cities with an array of postal codes as a property and one with postal codes and their latitude & longitude.
I want to return the cities closest to a coordinate. This is easy enough with a geo index, but the issue I'm having is that the same city is returned multiple times. Sometimes it can be both the 1st and the 3rd closest, because the postal code that I'm searching in borders another city.
cities example data:
[
{
"_key": "30936019",
"_id": "cities/30936019",
"_rev": "30936019",
"countryCode": "US",
"label": "Colorado Springs, CO",
"name": "Colorado Springs",
"postalCodes": [
"80904",
"80927"
],
"region": "CO"
},
{
"_key": "30983621",
"_id": "cities/30983621",
"_rev": "30983621",
"countryCode": "US",
"label": "Manitou Springs, CO",
"name": "Manitou Springs",
"postalCodes": [
"80829"
],
"region": "CO"
}
]
postalCodes example data:
[
{
"_key": "32132856",
"_id": "postalCodes/32132856",
"_rev": "32132856",
"countryCode": "US",
"location": [
38.9286,
-104.6583
],
"postalCode": "80927"
},
{
"_key": "32147422",
"_id": "postalCodes/32147422",
"_rev": "32147422",
"countryCode": "US",
"location": [
38.8533,
-104.8595
],
"postalCode": "80904"
},
{
"_key": "32172144",
"_id": "postalCodes/32172144",
"_rev": "32172144",
"countryCode": "US",
"location": [
38.855,
-104.9058
],
"postalCode": "80829"
}
]
The following query works but as an ArangoDB newbie I'm wondering if there's a more efficient way to do this:
FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
FOR c IN cities
FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
COLLECT close = c._id AGGREGATE distance = MIN(p.distance)
FOR c2 IN cities
FILTER c2._id == close
SORT distance
RETURN c2
The first FOR in the query will use the geo index and probably return few documents (just the postal codes around the specified location).
The second FOR will look up the city for each found postal code. This may be an issue, depending on whether there is an index on cities.postalCodes and cities.countryCode. If not, the second FOR has to do a full scan of the cities collection each time it is invoked, which will be inefficient. It may therefore be worth creating an index on the two attributes like this:
db.cities.ensureIndex({ type: "hash", fields: ["countryCode", "postalCodes[*]"] });
The third FOR can be removed entirely when not COLLECTing by c._id but by c:
FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
FOR c IN cities
FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
COLLECT city = c AGGREGATE distance = MIN(p.distance)
SORT distance
RETURN city
This will shorten the query string, but I don't think it will help efficiency much, as the third FOR uses the primary index to look up the city documents, which is O(1).
In general, when in doubt about a query using indexes, you can use db._explain(queryString) to show which indexes will be used by a query.
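For readers less familiar with COLLECT ... AGGREGATE, the deduplication step in the queries above can be sketched in plain JavaScript (an illustration of the logic only, not how ArangoDB executes it). closestCities is a hypothetical helper, the city documents are trimmed to the fields used, and the distances are made up.

```javascript
// Plain-JavaScript illustration of COLLECT city AGGREGATE distance = MIN(...) plus SORT.
// closestCities is a hypothetical helper; the distances below are made up.
function closestCities(postalCodes, cities) {
  const best = new Map(); // city _id -> { city, distance }
  for (const p of postalCodes) {
    for (const c of cities) {
      const matches = c.countryCode === p.countryCode && c.postalCodes.includes(p.postalCode);
      if (!matches) continue;
      // Keep only the smallest distance seen for each city.
      const seen = best.get(c._id);
      if (!seen || p.distance < seen.distance) {
        best.set(c._id, { city: c, distance: p.distance });
      }
    }
  }
  return [...best.values()].sort((a, b) => a.distance - b.distance).map((entry) => entry.city);
}

const cities = [
  { _id: "cities/30936019", name: "Colorado Springs", countryCode: "US", postalCodes: ["80904", "80927"] },
  { _id: "cities/30983621", name: "Manitou Springs", countryCode: "US", postalCodes: ["80829"] },
];
// Without deduplication, Colorado Springs would appear as both 1st and 3rd closest.
const nearbyPostalCodes = [
  { postalCode: "80904", countryCode: "US", distance: 1500 },
  { postalCode: "80829", countryCode: "US", distance: 2900 },
  { postalCode: "80927", countryCode: "US", distance: 20000 },
];
const cityNames = closestCities(nearbyPostalCodes, cities).map((c) => c.name);
console.log(cityNames); // [ 'Colorado Springs', 'Manitou Springs' ]
```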

Update inner object in arangodb

I have an object stored in ArangoDB which has additional inner objects. My current use case requires that I update just one of the elements.
Store Object
{
"status": "Active",
"physicalCode": "99999",
"postalCode": "999999",
"tradingCurrency": "USD",
"taxRate": "14",
"priceVatInclusive": "No",
"type": "eCommerce",
"name": "John and Sons inc",
"description": "John and Sons inc",
"createdDate": "2015-05-25T11:04:14+0200",
"modifiedDate": "2015-05-25T11:04:14+0200",
"physicalAddress": "Corner moon and space 9 station",
"postalAddress": "PO Box 44757553",
"physicalCountry": "Mars Sector 9",
"postalCountry": "Mars Sector 9",
"createdBy": "john.doe",
"modifiedBy": "john.doe",
"users": [
{
"id": "577458630580",
"username": "john.doe"
}
],
"products": [
{
"sellingPrice": "95.00",
"inStock": "10",
"name": "School Shirt Green",
"code": "SKITO2939999995",
"warehouseId": "723468998682"
},
{
"sellingPrice": "95.00",
"inStock": "5",
"name": "School Shirt Red",
"code": "SKITO245454949495",
"warehouseId": "723468998682"
},
{
"sellingPrice": "95.00",
"inStock": "10",
"discount": "5%",
"name": "School Shirt Blue",
"code": "SKITO293949495",
"warehouseId": "723468998682"
}
]
}
I want to change the stock value of just one of the products:
{
"sellingPrice": "95.00",
"inStock": "10",
"discount": "5%",
"name": "School Shirt Blue",
"code": "SKITO293949495",
"warehouseId": "723468998682"
}
In other words: reduce the stock of one product by 1 where the store id = x. Something to this effect:
FOR store IN stores
FILTER store._key == "837108415472"
FOR product IN store.products
FILTER product.code == "SKITO293949495"
UPDATE product WITH { inStock: (product.inStock - 1) } IN store.products
Apart from the above, it possibly makes sense to store each product as a separate document in a store_products collection. I believe that in NoSQL this is the best approach to reduce document size.
I found the answer in these two questions: arangodb-aql-update-single-object-in-embedded-array and arangodb-aql-update-for-internal-field-of-object.
However, I believe it is best to maintain separate documents and use joins when retrieving, which makes updates easier.
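As far as I know, an AQL UPDATE targets a document in a collection rather than an element of an embedded array, so one way to approach this is to rebuild the products array and write it back on the store document. The array transformation itself can be sketched in plain JavaScript (an illustration of the logic only, not AQL). decrementStock is a hypothetical helper, the store is trimmed to the fields used, and inStock is kept as a string because that is how the sample document stores it.

```javascript
// Sketch: return a copy of the store with one product's stock reduced.
// decrementStock is a hypothetical helper; the store below is a trimmed sample.
function decrementStock(store, productCode, amount = 1) {
  const products = store.products.map((product) =>
    product.code === productCode
      ? // inStock is stored as a string, so convert, subtract, and convert back.
        { ...product, inStock: String(Number(product.inStock) - amount) }
      : product
  );
  return { ...store, products };
}

const store = {
  name: "John and Sons inc",
  products: [
    { name: "School Shirt Green", code: "SKITO2939999995", inStock: "10" },
    { name: "School Shirt Red", code: "SKITO245454949495", inStock: "5" },
    { name: "School Shirt Blue", code: "SKITO293949495", inStock: "10" },
  ],
};
const updated = decrementStock(store, "SKITO293949495");
console.log(updated.products.map((p) => `${p.name}: ${p.inStock}`));
```

The original store object is left untouched, so the new products array can be sent as the replacement value in whatever update call is used.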
