Is reading the whole object from DocumentDb faster and more efficient?

I'm trying to understand whether it would actually be more efficient to read the entire document from Azure DocumentDb than to read a single property that may contain multiple objects.
Let's use this basketball team object as an example:
{
  id: 123,
  name: "Los Angeles Lakers",
  coach: "Byron Scott",
  players: [
    { id: 24, name: "Kobe Bryant" },
    { id: 3, name: "Anthony Brown" },
    { id: 4, name: "Ryan Kelly" },
  ]
}
If I want to get only the list of players, is it more efficient/faster to read the entire team document and extract the players from it, OR is it better to send a SQL statement and read only the players from the document?

Returning only the players will be more efficient on the network, as you're returning less data. You should also be able to look at the Request Units consumed by your query.
For example, I put your document into one of my collections and ran two queries in the portal (if you do the same and look at the bottom of the portal, you'll see the resulting Request Unit cost). I slightly modified your document with a unique ID and quotes around everything, so I could load it via the portal:
{
  "id": "basketball123",
  "name": "Los Angeles Lakers",
  "coach": "Byron Scott",
  "players": [
    { "id": 24, "name": "Kobe Bryant" },
    { "id": 3, "name": "Anthony Brown" },
    { "id": 4, "name": "Ryan Kelly" }
  ]
}
I first selected just player data:
SELECT c.players FROM c where c.id="basketball123"
with an RU cost of 2.2.
I then asked for the entire document:
SELECT * FROM c where c.id="basketball123"
with an RU cost of 2.24.
Note: Your document size is very small, so there's really not much difference here. But at least you can see that returning a subset costs less than returning the entire document.
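If you run the same comparison from code instead of the portal, the SDK response exposes the request charge too. Below is a minimal sketch using the @azure/cosmos Node SDK; the endpoint, key, database, and container names are placeholders, and the exact response shape is assumed from the current SDK.

import { CosmosClient } from "@azure/cosmos";

// Placeholder connection details: substitute your own account values.
const client = new CosmosClient({
  endpoint: "https://<account>.documents.azure.com",
  key: "<key>",
});
const container = client.database("<database>").container("<container>");

async function comparePlayerQueries() {
  // Return only the players array.
  const playersOnly = await container.items
    .query('SELECT c.players FROM c WHERE c.id = "basketball123"')
    .fetchAll();
  console.log("players only, RU charge:", playersOnly.requestCharge);

  // Return the entire document.
  const fullDoc = await container.items
    .query('SELECT * FROM c WHERE c.id = "basketball123"')
    .fetchAll();
  console.log("full document, RU charge:", fullDoc.requestCharge);
}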

Related

How to create graphs or histograms with MongoDB

I need to make a graph showing how many total users we have registered over a time interval. (The language is TypeScript, but I can accept an answer in another language or one using MongoDB's aggregate.)
Example:
Day 1: 10 total users registered
Day 2: 139 ...
Day 3: 1230 ...
Day 4: 2838 ...
...
...
Current day: X total users registered (the series ends at the current day).
It should be noted that all users have a field called createdAt, which is of type Date.
I tried to obtain the users using buckets, but it is not an optimal solution:
const response = await this.userModel.aggregate([
  {
    $bucketAuto: {
      groupBy: '$createdAt',
      buckets: 4,
    },
  },
]);
console.log(response);
I have also thought about using MongoDB's mapReduce and passing it the appropriate function, but in terms of performance I would like to know whether I can build the pipeline simply with aggregate. mapReduce would be a second option (slightly slower), and as a last resort I could fetch all the users (only the createdAt field) and process them in my backend.
Thank you in advance for your answers
Update
To clarify: with $bucketAuto the grouping uses non-specific time intervals; it basically balances the groups by the number of users rather than grouping by creation date. Also, when I passed explicit date boundaries to MongoDB with $bucket, the result was not as expected.
$bucketAuto Option
$bucket option
Input example:
const list = [
  { "createdAt": "2021-08-30T23:47:16.663Z", "_id": "612d6e044007a95446848cef" },
  { "createdAt": "2021-08-31T04:18:11.820Z", "_id": "612dad830541fa001bb63671" },
  { "createdAt": "2021-08-31T04:18:47.794Z", "_id": "612dada70541fa001bb63674" },
  { "createdAt": "2021-08-31T04:20:14.415Z", "_id": "612dadfe0541fa001bb63678" },
  { "createdAt": "2021-08-31T04:22:45.580Z", "_id": "612dae950541fa001bb63682" },
  { "createdAt": "2021-08-31T11:24:28.471Z", "_id": "612e116c0541fa001bb63688" },
  { "createdAt": "2021-08-31T18:47:09.452Z", "_id": "612e792dba2a3e1d081c9f3d" }
];
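Not an authoritative answer, but here is a minimal sketch of the aggregate-based approach discussed above: group the users per calendar day with $dateToString, sort, and compute the running total in application code. It assumes the same userModel and createdAt field as in the question.

// Sketch only: counts registrations per calendar day, then turns them into a
// cumulative total in application code.
const perDay = await this.userModel.aggregate([
  {
    $group: {
      _id: { $dateToString: { format: '%Y-%m-%d', date: '$createdAt' } },
      registrations: { $sum: 1 },
    },
  },
  { $sort: { _id: 1 } },
]);

// Running total: day N shows the total number of users registered up to that day.
let total = 0;
const histogram = perDay.map((day) => {
  total += day.registrations;
  return { date: day._id, totalUsers: total };
});
console.log(histogram);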

Is it possible to have varying data structures in an Azure Search index?

Below is some of the data I'm putting into an Azure search index:
I could go with this rigid structure, but it needs to support different data types. I could keep adding fields (i.e. Field4, Field5, ...), but I wondered if I could have something like a JSON field, so the index could be modelled like below:
[
  {
    "entityId": "dba656d3-f044-4cc0-9930-b5e77e664a8f",
    "entityName": "character",
    "data": {
      "name": "Luke Skywalker",
      "role": "Jedi"
    }
  },
  {
    "entityId": "b37bf987-0978-4fc4-9a51-b02b4a5eed53",
    "entityName": "character",
    "data": {
      "name": "C-3PO",
      "role": "Droid"
    }
  },
  {
    "entityId": "b161b9dc-552b-4744-b2d7-4584a9673669",
    "entityName": "film",
    "data": {
      "name": "A new hope"
    }
  },
  {
    "entityId": "e59acdaf-5bcd-4536-a8e9-4f3502cc7d85",
    "entityName": "film",
    "data": {
      "name": "The Empire Strikes Back"
    }
  },
  {
    "entityId": "00501b4a-5279-41e9-899d-a914ddcc562e",
    "entityName": "vehicle",
    "data": {
      "name": "Sand Crawler",
      "model": "Digger Crawler",
      "manufacturer": "Corellia Mining Corporation"
    }
  },
  {
    "entityId": "fe815cb6-b03c-401e-a871-396f2cd3eaba",
    "entityName": "vehicle",
    "data": {
      "name": "TIE/LN starfighter",
      "model": "Twin Ion Engine/Ln Starfighter",
      "manufacturer": "Sienar Fleet Systems"
    }
  }
]
I know that I can put JSON in a string field, but that would negatively impact the search matching and also filtering.
Is this possible in Azure search or is there a different way to achieve this kind of requirement?
See the article How to model complex data types. The hotel example data translates nicely to your use case, I believe. If your different entities have different sets of properties, you can create a "complex type" similar to the Address or Rooms example below.
Structural updates

You can add new sub-fields to a complex field at any time without the need for an index rebuild. For example, adding "ZipCode" to Address or "Amenities" to Rooms is allowed, just like adding a top-level field to an index.
{
  "HotelId": "1",
  "HotelName": "Secret Point Motel",
  "Description": "Ideally located on the main commercial artery of the city in the heart of New York.",
  "Tags": ["Free wifi", "on-site parking", "indoor pool", "continental breakfast"],
  "Address": {
    "StreetAddress": "677 5th Ave",
    "City": "New York",
    "StateProvince": "NY"
  },
  "Rooms": [
    {
      "Description": "Budget Room, 1 Queen Bed (Cityside)",
      "RoomNumber": 1105,
      "BaseRate": 96.99
    },
    {
      "Description": "Deluxe Room, 2 Double Beds (City View)",
      "Type": "Deluxe Room",
      "BaseRate": 150.99
    }
    . . .
  ]
}
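Applied to the entities in the question, a hedged sketch of such an index definition with the @azure/search-documents SDK could look like the following; the index name and the exact set of sub-fields under data are assumptions, and as the quoted passage notes, more sub-fields can be added later without a rebuild.

import { SearchIndexClient, AzureKeyCredential } from "@azure/search-documents";

// Placeholder service details.
const indexClient = new SearchIndexClient(
  "https://<service>.search.windows.net",
  new AzureKeyCredential("<admin-key>")
);

async function createEntityIndex() {
  // "data" is a complex field; its sub-fields can be extended over time.
  await indexClient.createIndex({
    name: "entities",
    fields: [
      { name: "entityId", type: "Edm.String", key: true, filterable: true },
      { name: "entityName", type: "Edm.String", searchable: true, filterable: true, facetable: true },
      {
        name: "data",
        type: "Edm.ComplexType",
        fields: [
          { name: "name", type: "Edm.String", searchable: true },
          { name: "role", type: "Edm.String", searchable: true, filterable: true },
          { name: "model", type: "Edm.String", searchable: true },
          { name: "manufacturer", type: "Edm.String", searchable: true, filterable: true },
        ],
      },
    ],
  });
}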

"Transaction was aborted due to detection of concurrent modification" in FaunaDB

I have a document that could be written to from many different concurrent requests. The same section of the document isn't altered, but it could see concurrent writes (from a Node.js app).
example:
{
  name: "testing",
  results: {
    a: { ... },
    b: { ... },
  }
}
I could update the document with "c", etc etc.
If I don't await the transactions (in a test, for example), I get partial writes and the error "transaction was aborted due to detection of concurrent modification". What's the best way to go about this? I feel like Fauna's main selling point is dealing with issues like this, but I don't have enough knowledge to work my way around it.
Anyone have any queue strategies/ideas/suggestions?
index:
CreateIndex({
"name": "byName",
"unique": true,
"source": Collection("Testing"),
"serialized": true,
"terms":
[
{ "field": [ "data", "name" ] }
]
})
A JS AWS Lambda function is what is doing the writing.
Currently the unit of transaction in Fauna is the document. So in this case I'd recommend something like the following:
CreateCollection({name: "result"})
CreateCollection({name: "sub-result"})

CreateIndex({
  name: "result-agg",
  source: Collection("sub-result"),
  terms: [{"field": ["data", "parent"]}]
})
This assumes parent contains the ref of the main result document. Then, given $ref as a result ref:
Let(
  {
    subs: Select("data", Map(Paginate(Match(Index("result-agg"), $ref)), Lambda("x", Get(Var("x"))))),
    main: Select("data", Get($ref))
  },
  Merge(Var("main"), { results: Var("subs") })
)
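In practice, that means each concurrent Lambda write creates its own sub-result document instead of updating the shared one, so writes no longer contend. A rough sketch with the faunadb JS driver (the data shape and helper function are assumptions, not Fauna's prescribed pattern):

import faunadb from "faunadb";

const q = faunadb.query;
const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET });

// Called from the Lambda handler for each partial write ("a", "b", "c", ...).
// Every invocation creates a new "sub-result" document pointing at the shared
// result document, so no two requests modify the same document.
async function writeSubResult(resultRef, key, value) {
  return client.query(
    q.Create(q.Collection("sub-result"), {
      data: { parent: resultRef, key, value },
    })
  );
}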

Making MongoDB more 'relational'

I like using MongoDB but can't quite swallow the non-relational aspect of it. As far as I can tell from mongo users and the docs: "It's fine, just duplicate parts of your data".
As I'm worried about scaling, and about simply forgetting to update every part of the code that touches the duplicated data, it seems like a good trade-off to just do an extra query when my API has to return a user with a summary of their posts included:
{
  "id": 1,
  "name": "Default user",
  "posts_summary": [
    {
      "id": 1,
      "name": "I am making a blog post",
      "description": "I write about some stuff and there are comments after it",
      "tags_count": 3
    },
    {
      "id": 2,
      "name": "This is my second post",
      "description": "In this one I write some more stuff",
      "tags_count": 4
    }
  ]
}
...when the posts data looks like this below:
//db.posts
{
  "id": 1,
  "owner": 1,
  "name": "I am making a blog post",
  "description": "I write about some stuff and there are comments after it",
  "tags": ["Writing", "Blogs", "Stuff"]
},
{
  "id": 2,
  "owner": 1,
  "name": "This is my second post",
  "description": "In this one I write some more stuff",
  "tags": ["Writing", "Blogs", "Stuff", "Whatever"]
}
So behind the API, when the query to get the user succeeds, I do an additional query to the posts collection to get the "posts_summary" data I need, and add it in before the API sends the response.
It seems like a good trade-off considering the problems it will solve later. Is this what some mongo users do to get around it not being relational, or have I made a mistake when designing my schema?
You can use schema references to implement relational mapping with Mongoose's populate:
http://mongoosejs.com/docs/populate.html
Using Mongoose, your schemas would look like this:
const mongoose = require('mongoose');
const Schema = mongoose.Schema;

const userSchema = new Schema({
  _id: Number,
  name: String,
  owner: String,
  posts: [{ type: Schema.Types.ObjectId, ref: 'Post' }]
});

const postSchema = new Schema({
  _id: Number,
  name: String,
  owner: String,
  description: String,
  tags: [String]
});

const User = mongoose.model('User', userSchema);
const Post = mongoose.model('Post', postSchema);
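A quick usage sketch with these schemas (run inside an async function; the field selection is just to keep the payload close to the posts_summary shape from the question):

// Fetch the user and resolve the referenced posts in one call.
const user = await User.findById(1).populate('posts', 'name description tags');
console.log(user.posts); // array of populated Post documents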

What's best practice for "joining" a bunch of values in mongoose/mongodb without populate?

Let me start off by stating that I'm aware of the populate method that mongoose offers, but since my work has decided to move to the native MongoDB drivers in the future, I can no longer rely on populate to avoid work for myself later on.
If I have two collections of Documents
People:
{ "_id": 1, "name": "Austin" }
{ "_id": 2, "name": "Doug" }
{ "_id": 3, "name": "Nick" }
{ "_id": 4, "name": "Austin" }
Hobbies:
{ "Person": 1, "Hobby": "Cars" }
{ "Person": 1, "Hobby": "Boats" }
{ "Person": 3, "Hobby": "Chess" }
{ "Person": 4, "Hobby": "Cars" }
How should I go about joining each document in People with Hobbies? Ideally I would prefer to call the database only twice: once to get the people and a second time to get the hobbies, then return the joined objects to the client app.
It depends on what your primary concern is. Generally, I would say to embed the hobbies into the People documents, like:
{
  "_id": 1,
  "name": "Austin",
  "hobbies": ["Cars", "Boats"]
},
{
  "_id": 2,
  "name": "Doug",
  "hobbies": []
},
{
  "_id": 3,
  "name": "Nick",
  "hobbies": ["Chess"]
},
{
  "_id": 4,
  "name": "Austin",
  "hobbies": ["Cars"]
}
which would give you the possibility of using a multikey index on hobbies and allow queries like this:
db.daCollection.find({"hobbies":"Cars"})
which would return both Austins as complete documents. Yes, I know that there would be a lot of redundant entries. If you want to prevent that, you could model it like this:
{
  "_id": 1,
  "name": "Cars"
},...

{
  "_id": 1,
  "name": "Austin",
  "hobbies": [1, ...]
}
which would need an additional index on the name field of the hobby collection to be efficient. So when you want to find every person who is into cars, you would first need to look up the hobby's _id and then query for it like
db.person.find({"hobbies":1})
I think it is easier, more intuitive and for most use cases faster if you use the embedding.
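For completeness, if the two collections stay separate as in the question, a rough sketch of the two-round-trip join with the native Node.js driver could look like this (collection names and the already-connected db handle are assumptions):

import type { Db } from "mongodb";

// Two database calls, then the join happens in application code.
async function getPeopleWithHobbies(db: Db) {
  const people = await db.collection("people").find({}).toArray();

  const ids = people.map((p) => p._id);
  const hobbies = await db
    .collection("hobbies")
    .find({ Person: { $in: ids } })
    .toArray();

  // Group hobbies by person id, then attach them to each person.
  const byPerson = new Map();
  for (const h of hobbies) {
    const list = byPerson.get(h.Person) ?? [];
    list.push(h.Hobby);
    byPerson.set(h.Person, list);
  }

  return people.map((p) => ({ ...p, hobbies: byPerson.get(p._id) ?? [] }));
}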
