How to find common struct for all documents in collection? - arangodb

I have an array of documents, that have more or less same structure. But I need find fields that present in all documents. Somethink like:
{
"name": "Jow",
"salary": 7000,
"age": 25,
"city": "Mumbai"
},
{
"name": "Mike",
"backname": "Brown",
"sex": "male",
"city": "Minks",
"age": 30
},
{
"name": "Piter",
"hobby": "footbol",
"age": 25,
"location": "USA"
},
{
"name": "Maria",
"age": 22,
"city": "Paris"
},
All docs have name and age. How to find them with ArangoDB?

You could do the following:
Retrieve the attribute names of each document
Get the intersection of those attributes
i.e.
LET attrs = (FOR item IN test RETURN ATTRIBUTES(item, true))
RETURN APPLY("INTERSECTION", attrs)
APPLY is necessary so each list of attributes in attrs can be passed as a separate parameter to INTERSECTION.
Documentation:
ATTRIBUTES: https://www.arangodb.com/docs/stable/aql/functions-document.html#attributes
INTERSECTION: https://www.arangodb.com/docs/stable/aql/functions-array.html#intersection
APPLY: https://www.arangodb.com/docs/stable/aql/functions-miscellaneous.html#apply

Related

How to replace existing key in jsonb?

I'm trying to update a jsonb array in Postgres by replacing the entire array. It's important to note, I'm not trying to add an array to the object, but simply replace the whole thing with new values. When I try the code below, I get this error in the console
error: cannot replace existing key
I'm using Nodejs as server-side language.
server.js
//new array with new values
var address = {
"appt": appt,
"city": city,
"street": street,
"country": country,
"timezone": timezone,
"coordinates": coordinates,
"door_number": door_number,
"state_province": state_province,
"zip_postal_code": zip_postal_code
}
//query
var text = "UPDATE users SET info = JSONB_insert(info, '{address}', '" + JSON.stringify(address) + "') WHERE id=$1 RETURNING*";
var values = [userid];
//pool...[below]
users table
id(serial | info(jsonb)
And this is the object I need update
{
"dob": "1988-12-29",
"type": "seller",
"email": "eyetrinity3#test.com",
"phone": "5553766962",
"avatar": "f",
"address": [
{
"appt": "",
"city": "Brandon",
"street": "11th Street East",
"country": "Canada",
"timezone": "Eastern Standard Time",
"coordinates": [
"-99.925011",
"49.840649"
],
"door_number": "666",
"state_province": "Manitoba",
"zip_postal_code": "R7A 7B8"
}
],
"last_name": "doe",
"first_name": "john",
"date_created": "2022-11-12T19:44:36.714Z",
}
below works in db-fiddle Postgresql v15 (did not in work in v12)
specific element
update json_update_t set info['address'][0] = '{
"appt": "12",
"city": "crater",
"street": "11th Street East",
"country": "mars",
"timezone": "Eastern Standard Time",
"coordinates": [
"-99.925011",
"49.840649"
],
"door_number": "9999",
"state_province": "marsbar",
"zip_postal_code": "abc 123"
}';
whole array
update json_update_t set info['address'] = '[{
"appt": "14",
"city": "crater",
"street": "11th Street East",
"country": "mars",
"timezone": "Eastern Standard Time",
"coordinates": [
"-99.925011",
"49.840649"
],
"door_number": "9999",
"state_province": "marsbar",
"zip_postal_code": "abc 123"
}]';
I have found the answer for this. Going through some of my older apps I coded, I stumbled upon the answer. It's not JSONB_INSERT but JSONB_SET. Notice the difference. The later will replace the entire key and not insert or add to the object.
JSONB_INSERT --> insert
UPDATE users SET info = JSONB_insert(info, '{address,-1}', '" + JSON.stringify(address) + "',true) WHERE id=$1 RETURNING*
JSONB_SET --> set and replace
UPDATE users SET info = JSONB_SET(info, '{address}', '" + JSON.stringify(address) +"') WHERE id=$1 RETURNING*

Merge documents by fields

I have two types of docs. Main docs and additional info for it.
{
"id": "371"
"name": "Mike",
"location": "Paris"
},
{
"id": "371-1",
"age": 20,
"lastname": "Piterson"
}
I need to merge them by id, to get result doc. The result should look like:
{
"id": "371"
"name": "Mike",
"location": "Paris"
"age": 20,
"lastname": "Piterson"
}
Using COLLECT / INTO, SPLIT(), and MERGE():
FOR doc IN collection
COLLECT id = SPLIT(doc.id, '-')[0] INTO groups
RETURN MERGE(MERGE(groups[*].doc), {id})
Result:
[
{
"id": "371",
"location": "Paris",
"name": "Mike",
"lastname": "Piterson",
"age": 20
}
]
This will:
Split each id attribute at any - and return the first part
Group the results into sepearate arrays (groups)
Merge #1: Merge all objects into one
Merge #2: Merge the id into the result
See REMOVE & INSERT or REPLACE for write operations.

python to find the oldest person in the state with the lowest average

I am new to programming and I understand for loops, if/else statements and a bit of dictionaries. I don't mind coding things myself but I am having hard time figuring out the logic.
Problem statement - find the oldest person in the state with the lowest average age from this file.
I am able to iterate over the entire list and get all the entries printed out. But not sure how to find the oldest person with the lowest average... which is per state.
sample data:
[
{
"name": "xyz",
"state": "Connecticut",
"age": 95
},
{
"name": "abc",
"state": "Maine",
"age": 82
},
{
"name": "qwr",
"state": "Missouri",
"age": 56
},
{
"name": "qwer",
"state": "Maryland",
"age": 56
},
{
"name": "asdf",
"state": "Rhode Island",
"age": 17
},
]
Hm, could you clarify what you mean by "lowest average age"? Do you mean that you want to find 1.) the oldest person in the entire list, and also 2.) the state that has the lowest average age? In that case, you can do something like this:
def get_oldest_citizen(data):
return max(data, key=lambda citizen: citizen.age)
def get_state_with_youngest_average_age(data):
state_ages = {}
for citizen in data:
if citizen.state in state_ages:
state_ages[citizen.state] += [citizen.age]
else:
state_ages[citizen.state] = [citizen.age]
for state, ages in state_ages.items():
state_ages[state] = sum(ages) / len(ages)
return min(state_ages.values())
There's definitely a more concise and pythonic way to do the second one, but it gets the basic idea across -- basically just keep track of all citizens of a given state's ages, then take the average once you've recorded them all and find the youngest of those.
Edit 1: I realized now what you mean, that you want to find the oldest citizen in the state that has the lowest average age. In that case, all you have to do is use these two functions together, supplying the first with data you can find from the second.
You can rearrange your data by state. Then use min and lambda to find the lowest average state. then max and lambda to find theoldest person in that state.
data = [
{"name": "xyz", "state": "Connecticut", "age": 95},
{"name": "abc", "state": "Maine", "age": 82},
{"name": "qwr", "state": "Missouri", "age": 56},
{"name": "qwer", "state": "Maryland", "age": 22},
{"name": "asdf", "state": "Rhode Island", "age": 57},
{"name": "adam", "state": "Maryland", "age": 14},
]
from collections import defaultdict
data_by_state = defaultdict(list)
for record in data:
data_by_state[record["state"]].append(record)
lowest_avg_age_state = min(data_by_state, key=lambda state: sum([record["age"] for record in data_by_state[state]]) / len(data_by_state[state]))
print(f"{lowest_avg_age_state=}")
oldest_person_in_state = max(data_by_state[lowest_avg_age_state], key=lambda record: record["age"])
print(f"{oldest_person_in_state['name']=}")
OUTPUT
lowest_avg_age_state='Maryland'
oldest_person_in_state['name']='qwer'

Return distinct and sorted query in AQL

So I have two collections, one with cities with an array of postal codes as a property and one with postal codes and their latitude & longitude.
I want to return the cities closest to a coordinate. This is easy enough with a geo index but the issue I'm having is the same city being returned multiple times and some times it can be the 1st and 3rd closest because the postal code that I'm searching in bordering another city.
cities example data:
[
{
"_key": "30936019",
"_id": "cities/30936019",
"_rev": "30936019",
"countryCode": "US",
"label": "Colorado Springs, CO",
"name": "Colorado Springs",
"postalCodes": [
"80904",
"80927"
],
"region": "CO"
},
{
"_key": "30983621",
"_id": "cities/30983621",
"_rev": "30983621",
"countryCode": "US",
"label": "Manitou Springs, CO",
"name": "Manitou Springs",
"postalCodes": [
"80829"
],
"region": "CO"
}
]
postalCodes example data:
[
{
"_key": "32132856",
"_id": "postalCodes/32132856",
"_rev": "32132856",
"countryCode": "US",
"location": [
38.9286,
-104.6583
],
"postalCode": "80927"
},
{
"_key": "32147422",
"_id": "postalCodes/32147422",
"_rev": "32147422",
"countryCode": "US",
"location": [
38.8533,
-104.8595
],
"postalCode": "80904"
},
{
"_key": "32172144",
"_id": "postalCodes/32172144",
"_rev": "32172144",
"countryCode": "US",
"location": [
38.855,
-104.9058
],
"postalCode": "80829"
}
]
The following query works but as an ArangoDB newbie I'm wondering if there's a more efficient way to do this:
FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
FOR c IN cities
FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
COLLECT close = c._id AGGREGATE distance = MIN(p.distance)
FOR c2 IN cities
FILTER c2._id == close
SORT distance
RETURN c2
The first FOR in the query will use the geo index and probably return few documents (just the postal codes around the specified location).
The second FOR will look up the city for each found postal code. This may be an issue, depending on whether there is an index present on cities.postalCodes and cities.countryCode. If not, then the second FOR has to do a full scan of the cities collection each time it is involved. This will be inefficient. It may therefore be create an index on the two attributes like this:
db.cities.ensureIndex({ type: "hash", fields: ["countryCode", "postalCodes[*]"] });
The third FOR can be removed entirely when not COLLECTing by c._id but by c:
FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
FOR c IN cities
FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
COLLECT city = c AGGREGATE distance = MIN(p.distance)
SORT distance
RETURN city
This will shorten the query string, but it may not help efficiency much I think, as the third FOR will use the primary index to look up the city documents, which is O(1).
In general, when in doubt about a query using indexes, you can use db._explain(queryString) to show which indexes will be used by a query.

Update inner object in arangodb

I have an object stored in arangodb which has additional inner objects, my current use case requires that I update just one of the elements.
Store Object
{
"status": "Active",
"physicalCode": "99999",
"postalCode": "999999",
"tradingCurrency": "USD",
"taxRate": "14",
"priceVatInclusive": "No",
"type": "eCommerce",
"name": "John and Sons inc",
"description": "John and Sons inc",
"createdDate": "2015-05-25T11:04:14+0200",
"modifiedDate": "2015-05-25T11:04:14+0200",
"physicalAddress": "Corner moon and space 9 station",
"postalAddress": "PO Box 44757553",
"physicalCountry": "Mars Sector 9",
"postalCountry": "Mars Sector 9",
"createdBy": "john.doe",
"modifiedBy": "john.doe",
"users": [
{
"id": "577458630580",
"username": "john.doe"
}
],
"products": [
{
"sellingPrice": "95.00",
"inStock": "10",
"name": "School Shirt Green",
"code": "SKITO2939999995",
"warehouseId": "723468998682"
},
{
"sellingPrice": "95.00",
"inStock": "5",
"name": "School Shirt Red",
"code": "SKITO245454949495",
"warehouseId": "723468998682"
},
{
"sellingPrice": "95.00",
"inStock": "10",
"discount": "5%",
"name": "School Shirt Blue",
"code": "SKITO293949495",
"warehouseId": "723468998682"
}
]
}
I want to change just one of the products stock value
{
"sellingPrice": "95.00",
"inStock": "10",
"discount": "5%",
"name": "School Shirt Blue",
"code": "SKITO293949495",
"warehouseId": "723468998682"
}
Like update store product stock less 1 where store id = x, something to this effect
FOR store IN stores
FILTER store._key == "837108415472"
FOR product IN store.products
FILTER product.code == "SKITO293949495"
UPDATE product WITH { inStock: (product.inStock - 1) } IN store.products
Apart from the above possibly it makes sense to store product as a separate document in collection store_products. I believe in NOSQL that is the best approach to reduce document size.
Found answer
here arangodb-aql-update-single-object-in-embedded-array and there
arangodb-aql-update-for-internal-field-of-object
I however believe it is best to maintain separate documents and rather use joins when retrieving. Updates easily

Resources