I have a document that could be written to by many concurrent requests. The same section of the document is never altered by two requests, but the document as a whole can see concurrent writes (from a Node.js app).
Example:
{
  name: "testing",
  results: {
    a: { ... },
    b: { ... },
  }
}
I could then update the document with "c", and so on.
If I don't await the transactions (in a test, for example), I get partial writes and the error "transaction was aborted due to detection of concurrent modification". What's the best way to handle this? I feel like Fauna's main selling point is dealing with issues like this, but I don't know enough to find my way around it.
Does anyone have any queue strategies, ideas, or suggestions?
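One possible queue strategy, as a minimal sketch: chain the writes for a given document so that they run one after another inside a single process. `KeyedWriteQueue` and `writeFn` are hypothetical names, not part of any Fauna driver, and `writeFn` stands in for whatever function performs the actual update.

```javascript
// Hypothetical helper: serializes async writes per document key so that two
// writes to the same document never overlap within this process.
class KeyedWriteQueue {
  constructor() {
    this.tails = new Map(); // key -> promise of the most recently queued write
  }

  // `writeFn` is a placeholder for the function that performs the real update.
  enqueue(key, writeFn) {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Start only after the previous write has settled, even if it failed.
    const next = previous.catch(() => {}).then(() => writeFn());
    this.tails.set(key, next);
    // Drop the entry once this write is done and nothing newer was queued.
    const cleanup = () => {
      if (this.tails.get(key) === next) this.tails.delete(key);
    };
    next.then(cleanup, cleanup);
    return next;
  }
}
```

This only orders writes issued from the same process. Separate Lambda instances would still contend with each other, so a retry on the contention error (or a different data layout) would still be needed there.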
index:
CreateIndex({
"name": "byName",
"unique": true,
"source": Collection("Testing"),
"serialized": true,
"terms":
[
{ "field": [ "data", "name" ] }
]
})
The writes come from a JavaScript AWS Lambda function.
Currently the unit of transaction in Fauna is the document. So in this case I'd recommend something like the following:
CreateCollection({name: "result"})
CreateCollection({name: "sub-result"})
CreateIndex({
name: "result-agg",
source: Collection("sub-result"),
terms: [{"field": ["data", "parent"]}]
})
This assumes parent contains the ref of the main result. Then, given $ref as a result ref:
Let({
  subs: Select("data", Map(Paginate(Match(Index("result-agg"), $ref)), Lambda("x", Get(Var("x"))))),
  main: Select("data", Get($ref))
},
  Merge(Var("main"), {results: Var("subs")})
)
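To make the idea concrete, here is a rough in-memory sketch in plain JavaScript. It does not use the Fauna driver: `mainResult`, `subResults`, `addSubResult`, and `readAggregated` are hypothetical stand-ins for the two collections, the write path, and the read query above.

```javascript
// Hypothetical in-memory stand-ins for the "result" and "sub-result" collections.
const mainResult = { ref: "result/1", data: { name: "testing" } };
const subResults = [];

// Write side: each concurrent writer creates its own sub-result document
// instead of modifying the shared main document.
function addSubResult(parentRef, key, value) {
  subResults.push({ data: { parent: parentRef, key, value } });
}

// Read side: collect every sub-result whose parent matches and merge the list
// into the main document's data, roughly what the Let/Merge query does.
function readAggregated(main) {
  const subs = subResults.filter((doc) => doc.data.parent === main.ref);
  return { ...main.data, results: subs };
}
```

Because every writer touches a different document, no two writes compete for the same unit of transaction; the cost moves to the read, which has to gather the pieces.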
Imagine I have an array of objects, available before the aggregate query:
const groupBy = [
{
realm: 1,
latest_timestamp: 1318874398, //Date.now() values, usually different to each other
item_id: 1234, //always the same
},
{
realm: 2,
latest_timestamp: 1312467986, //actually it's $max timestamp field from the collection
item_id: 1234,
},
{
realm: ..., //there are many of them
latest_timestamp: ...,
item_id: 1234,
},
{
realm: 10,
latest_timestamp: 1318874398, //but sometimes they can be the same
item_id: 1234,
},
]
And collection (example set available on MongoPlayground) with the following schema:
{
realm: Number,
timestamp: Number,
item_id: Number,
field: Number, //any other useless fields in this case
}
My problem is: how can I $group the values from the collection via the aggregation framework, using the already available set of data (from groupBy)?
What has been tried already:
Okay, let's skip the bad ideas, like:
for (const element of groupBy) {
//array of `find` queries
}
My current working aggregation query is something like this:
//first stage
{
$match: {
item_id: 1234,
realm: { $in: [1, 2, 3, 4, ..., 10] },
},
},
{
$group: {
_id: {
realm: '$realm',
},
latest_timestamp: {
$max: '$timestamp',
},
data: {
$push: '$$ROOT',
},
},
},
{
$unwind: '$data',
},
{
$addFields: {
'data.latest_timestamp': {
$cond: {
if: {
$eq: ['$data.timestamp', '$latest_timestamp'],
},
then: '$latest_timestamp',
else: '$$REMOVE',
},
},
},
},
{
$replaceRoot: {
newRoot: '$data',
},
},
//Finally, after these stages, I can do useful work
but I find it a bit cumbersome, and I have heard that using .mapReduce could solve my problem a bit faster than this query. (The official docs don't sound promising about it, though.) Is that true?
As of now, I am using 4 or 5 stages before I can start working with the documents that are useful to me.
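For reference, this is roughly what those stages compute, written as a plain JavaScript sketch over an in-memory array rather than as a pipeline. `tagLatestPerRealm` is a hypothetical name; the field names come from the schema above. Unlike the pipeline, it keeps the input order.

```javascript
// Marks, within each realm, the documents that carry that realm's newest
// timestamp by adding a latest_timestamp field to them.
function tagLatestPerRealm(docs) {
  // First pass: newest timestamp per realm (the $group + $max step).
  const latestByRealm = new Map();
  for (const doc of docs) {
    const current = latestByRealm.get(doc.realm);
    if (current === undefined || doc.timestamp > current) {
      latestByRealm.set(doc.realm, doc.timestamp);
    }
  }
  // Second pass: add the field only where the timestamps match
  // (the $unwind + $addFields + $replaceRoot steps).
  return docs.map((doc) => {
    const latest = latestByRealm.get(doc.realm);
    return doc.timestamp === latest ? { ...doc, latest_timestamp: latest } : { ...doc };
  });
}
```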
Recent update:
I have checked the $facet stage and found it interesting for this particular case. It will probably help me out.
For what it's worth:
After receiving the documents from the necessary stages, I build a cluster chart, which you may also know as a heatmap.
After that, I iterate over each document (or array of objects) one by one to find its correct x and y coordinates, which should be:
[
{
x: x (number, actual $price),
y: y (number, actual $realm),
value: price * quantity,
quantity: sum_of_quantity_on_price_level
}
]
As of now, it's old, awful code with for loops nested inside each other, but in the future I will use the $facet => $bucket operators for that kind of job.
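In the meantime, the nested loops could be replaced by a single pass with a Map. This is only a sketch: `buildHeatmapCells` is a hypothetical name, and it assumes each input document carries `price`, `realm`, and `quantity` fields.

```javascript
// Collapses documents into heatmap cells keyed by (price, realm).
// Assumed input shape: { price: Number, realm: Number, quantity: Number }.
function buildHeatmapCells(docs) {
  const cells = new Map();
  for (const { price, realm, quantity } of docs) {
    const key = `${price}:${realm}`;
    const cell = cells.get(key) ?? { x: price, y: realm, value: 0, quantity: 0 };
    cell.quantity += quantity;             // sum of quantity on this price level
    cell.value = cell.x * cell.quantity;   // price * quantity
    cells.set(key, cell);
  }
  return [...cells.values()];
}
```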
So, I have found an answer to my question in another, but related, way.
I was thinking about using the $facet operator, and to be honest it's still an option, but using it as shown below is bad practice.
//building the $facet query before aggregation
const ObjectQuery = {}
for (const realm of realms) {
  //computed property name: one facet per realm
  Object.assign(ObjectQuery, { [realm.name]: [ ... ] })
}
//mongoose query here
aggregation([{
  $facet: ObjectQuery
},
...
])
So, I have chosen a $project stage and the $switch operator to filter results, much as $group does.
MapReduce could also solve this problem, but the official Mongo docs recommend avoiding it and using aggregation with the $group and $merge operators instead.
I'm having a hard time understanding why I keep getting 0 results back from a query I am trying to perform. Basically, I am trying to return only results within a date range. On a given table I have a createdAt field, which is a DateTime scalar. It gets filled in automatically by Prisma (or GraphQL, I'm not sure which one sets it). So every table has a createdAt, which is a DateTime string representing when the row was created.
Here is my schema for this given table:
type Audit {
id: ID! @unique
user: User!
code: AuditCode!
createdAt: DateTime!
updatedAt: DateTime!
message: String
}
I queried this table and got back some results, which I'll share here:
"getAuditLogsForUser": [
{
"id": "cjrgleyvtorqi0b67jnhod8ee",
"code": {
"action": "login"
},
"createdAt": "2019-01-28T17:14:30.047Z"
},
{
"id": "cjrgn99m9osjz0b67568u9415",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T18:06:03.254Z"
},
{
"id": "cjrgnhoddosnv0b67kqefm0sb",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T18:12:35.631Z"
},
{
"id": "cjrgnn6ufosqo0b67r2tlo1e2",
"code": {
"action": "login"
},
"createdAt": "2019-01-28T18:16:52.850Z"
},
{
"id": "cjrgq8wwdotwy0b67ydi6bg01",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T19:29:45.616Z"
},
{
"id": "cjrgqaoreoty50b67ksd04s2h",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T19:31:08.382Z"
}]
Here is my getAuditLogsForUser schema definition:
getAuditLogsForUser(userId: String!, before: DateTime, after: DateTime): [Audit!]!
So, to test, I want to get all the results between the first and the last.
2019-01-28T19:31:08.382Z is last
2019-01-28T17:14:30.047Z is first.
Here is my code that would inject into the query statement:
if (args.after && args.before) {
where['createdAt_lte'] = args.after;
where['createdAt_gte'] = args.before;
}
console.log(where)
return await context.db.query.audits({ where }, info);
In the playground I execute this statement:
getAuditLogsForUser(before: "2019-01-28T19:31:08.382Z" after: "2019-01-28T17:14:30.047Z") { id code { action } createdAt }
So I want anything with createdAt_lte (less than or equal) set to 2019-01-28T17:14:30.047Z and createdAt_gte (greater than or equal) set to 2019-01-28T19:31:08.382Z.
However, I get literally no results back even though we KNOW there are results.
I tried to look up documentation on the DateTime scalar on the GraphQL website. I couldn't find anything on it, but I see it in my generated Prisma schema. It's just defined as a scalar, with nothing else special about it. I don't think I'm defining it elsewhere either. I am using graphql-yoga, if that makes any difference.
(generated prisma file)
scalar DateTime
I'm wondering if it's even being handled as a true datetime. It must be, though, because it gets generated as an ISO DateTime string in UTC.
I'm just having a hard time grasping what my issue could be at this moment. Maybe I need to define it in some other way? Any help is appreciated.
Sorry, I misread your example in my first reply. This is what you tried in the playground, correct?
getAuditLogsForUser(
before: "2019-01-28T19:31:08.382Z",
after: "2019-01-28T17:14:30.047Z"
){
id
code { action }
createdAt
}
This will not work, since before and after do not refer to time but are cursors used for pagination. They expect an id. Since ids are also strings, this query does not throw an error but will not find anything. Here is how pagination is used: https://www.prisma.io/docs/prisma-graphql-api/reference/queries-qwe1/#pagination
What I think you want to do is use a filter in the query. For this you can use the where argument. The query would look like this:
getAuditLogsForUser(
where:{AND:[
{createdAt_lte: "2019-01-28T19:31:08.382Z"},
{createdAt_gte: "2019-01-28T17:14:30.047Z"}
]}
) {
id
code { action }
createdAt
}
Here are the docs for filtering: https://www.prisma.io/docs/prisma-graphql-api/reference/queries-qwe1/#filtering
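On the resolver side, a small sketch of building such a filter might look like the following. The helper names `buildCreatedAtRange` and `matchesRange`, and the parameter names `from` and `to`, are hypothetical; only the `createdAt_gte` and `createdAt_lte` keys come from the filter above. The lower bound goes into `createdAt_gte` and the upper bound into `createdAt_lte`; swapping them describes an empty range.

```javascript
// Builds a where-filter for a date range. `from` and `to` are ISO-8601 strings.
function buildCreatedAtRange(from, to) {
  const where = {};
  if (from) where.createdAt_gte = from; // lower bound: createdAt >= from
  if (to) where.createdAt_lte = to;     // upper bound: createdAt <= to
  return where;
}

// Local stand-in for what the filter selects, useful for checking the bounds.
function matchesRange(createdAt, where) {
  const t = Date.parse(createdAt);
  if (where.createdAt_gte && t < Date.parse(where.createdAt_gte)) return false;
  if (where.createdAt_lte && t > Date.parse(where.createdAt_lte)) return false;
  return true;
}
```

Renaming the arguments to something like `from` and `to` also avoids the clash with the pagination arguments `before` and `after`.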
OK, so I figured out it had to do with the fact that I used "after" and "before" as argument names. I have no clue why this completely breaks everything, but it just won't return ANY results if you have these as arguments. Very strange. It must be clashing with some other variable somehow, probably a bug on GraphQL's end.
As soon as I tried a new variable name, voilà, it works.
This is also possible:
const fileData = await prismaClient.fileCuratedData.findFirst({
  where: {
    fileId: fileId,
    createdAt: {
      gte: fromdate,
    },
  },
});
I'm trying to execute some aggregate queries against data in TSI. For example:
{
"searchSpan": {
"from": "2018-08-25T00:00:00Z",
"to": "2019-01-01T00:00:00Z"
},
"top": {
"sort": [
{
"input": {
"builtInProperty": "$ts"
}
}
]
},
"aggregates": [
{
"dimension": {
"uniqueValues": {
"input": {
"builtInProperty": "$esn"
},
"take": 100
}
},
"measures": [
{
"count": {}
}
]
}
]
}
The above query, however, does not return any records, although there are many events stored in TSI for that specific searchSpan. Here is the response:
{
"warnings": [],
"events": []
}
The query is based on the examples in the documentation, which can be found here. The documentation lacks crucial information about requirements, and some of the examples do not even work.
Any help would be appreciated. Thanks!
@Vladislav,
I'm sorry to hear you're having issues. In reviewing your API call, I see two fixes that should help remedy this issue:
1) It looks like you're using our /events API with a payload for the /aggregates API. Notice the "events" in the response. Additionally, “top” is redundant for the /aggregates API, as we don't support a top-level limit clause there.
2) We do not enforce that the "count" property be present in the limit clause (“take”, “top” or “sample”), and it looks like you did not specify it, so the value defaulted to 0. That’s why the call is returning 0 events.
I would recommend that you use the /aggregates API rather than /events, and that you specify “count” in the limit clause to ensure you get some data back.
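As a rough sketch of those two fixes applied to the payload from the question: the helper names below are hypothetical, and placing "count" inside "top" is an assumption about where the limit clause expects it, so please check it against the API reference.

```javascript
// For the /aggregates API: drop the top-level "top" limit clause.
function toAggregatesPayload(payload) {
  const { top, ...rest } = payload;
  return rest;
}

// For the /events API: drop "aggregates" and give the limit clause an
// explicit count (assumed placement: inside "top").
function toEventsPayload(payload, count) {
  const { aggregates, ...rest } = payload;
  return { ...rest, top: { ...payload.top, count } };
}
```

For example, `toEventsPayload(query, 100)` would keep the searchSpan and the sort from the question and add `"count": 100` to the limit clause.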
Additionally, I'll note your feedback on documentation. We are ramping up a new hire on documentation now, so we hope to improve the quality soon.
I hope this helps!
Andrew
I like using MongoDB but can't quite swallow the non-relational aspect of it. As far as I can tell from mongo users and the docs: "It's fine, just duplicate parts of your data".
I'm worried about scaling, and about simply forgetting to update the parts of the code that keep the duplicated data in sync. So it seems like a good trade-off to just do an extra query when my API has to return the data for a user with a summary of posts included:
{
"id": 1,
"name": "Default user",
"posts_summary": [
{
"id": 1,
"name": "I am making a blog post",
"description": "I write about some stuff and there are comments after it",
"tags_count": 3
},
{
"id": 2,
"name": "This is my second post",
"description": "In this one I write some more stuff",
"tags_count": 4
}
]
}
...when the posts data looks like this below:
//db.posts
{
"id": 1,
"owner": 1,
"name": "I am making a blog post",
"description": "I write about some stuff and there are comments after it",
"tags": ["Writing", "Blogs", "Stuff"]
},
{
"id": 2,
"owner": 1,
"name": "This is my second post",
"description": "In this one I write some more stuff",
"tags": ["Writing", "Blogs", "Stuff", "Whatever"]
}
So behind the API, when the query to get the user succeeds, I do an additional query to the posts collection to get the "posts_summary" data I need, and add it in before the API sends the response.
It seems like a good trade-off considering the problems it will solve later. Is this what some mongo users do to get around it not being relational, or have I made a mistake when designing my schema?
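The assembly step described above can be sketched in plain JavaScript as follows. `buildUserResponse` is a hypothetical name, and `posts` stands for whatever the second query returns; the field names follow the sample documents.

```javascript
// Combines a user document with the result of the extra posts query into the
// API response shape shown above.
function buildUserResponse(user, posts) {
  const posts_summary = posts
    .filter((post) => post.owner === user.id) // keep only this user's posts
    .map(({ id, name, description, tags }) => ({
      id,
      name,
      description,
      tags_count: tags.length, // only the count is exposed, not the tags
    }));
  return { ...user, posts_summary };
}
```

Nothing is duplicated in storage this way; the summary is derived at read time, at the cost of one extra query per request.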
You can use schema objects as references to implement relational mapping with mongoose:
http://mongoosejs.com/docs/populate.html
With mongoose, your schemas would look like this:
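const mongoose = require('mongoose');
const { Schema } = mongoose;

const userSchema = new Schema({
  _id: Number,
  name: String,
  owner: String,
  // Post _ids are Numbers here, so the reference type has to match.
  Post: [{ type: Number, ref: 'Post' }]
});
const postSchema = new Schema({
  _id: Number,
  name: String,
  owner: String,
  description: String,
  tags: [String]
});
const User = mongoose.model('User', userSchema);
const Post = mongoose.model('Post', postSchema);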
I'm trying to understand whether it would actually be more efficient to read the entire document from Azure DocumentDB than to read a single property that may have multiple objects in it.
Let's use this basketball team object as an example:
{
id: 123,
name: "Los Angeles Lakers",
coach: "Byron Scott",
players: [
{ id: 24, name: "Kobe Bryant" },
{ id: 3, name: "Anthony Brown" },
{ id: 4, name: "Ryan Kelly" },
]
}
If I want to get only a list of players, is it more efficient/faster to read the entire team document and extract the players from it, OR is it better to send a SQL statement and read only the players from the document?
Returning only the players will be more efficient on the network, as you're returning less data. And, you should also be able to look at the Request Units burned for your query.
For example, I put your document into one of my collections and ran two queries in the portal (if you do the same and look at the bottom of the portal, you'll see the resulting Request Unit cost). I slightly modified your document, giving it a unique ID and putting quotes around everything, so I could load it via the portal:
{
"id": "basketball123",
"name": "Los Angeles Lakers",
"coach": "Byron Scott",
"players": [
{ "id": 24, "name": "Kobe Bryant" },
{ "id": 3, "name": "Anthony Brown" },
{ "id": 4, "name": "Ryan Kelly" }
]
}
I first selected just player data:
SELECT c.players FROM c where c.id="basketball123"
with an RU cost of 2.2.
I then asked for the entire document:
SELECT * FROM c where c.id="basketball123"
with an RU cost of 2.24.
Note: Your document size is very small, so there's really not much difference here. But at least you can see that returning a subset costs less than returning the entire document.
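To get a feel for the network side of this, here is a small sketch that compares the serialized size of the whole document with that of the players-only projection. It runs locally in Node.js and says nothing about Request Units; the variable names are just for illustration.

```javascript
// The sample document from above.
const team = {
  id: "basketball123",
  name: "Los Angeles Lakers",
  coach: "Byron Scott",
  players: [
    { id: 24, name: "Kobe Bryant" },
    { id: 3, name: "Anthony Brown" },
    { id: 4, name: "Ryan Kelly" },
  ],
};

// Bytes on the wire if the whole document is returned as JSON.
const fullBytes = Buffer.byteLength(JSON.stringify(team));
// Bytes on the wire if only the players property is returned.
const playersBytes = Buffer.byteLength(JSON.stringify({ players: team.players }));

console.log({ fullBytes, playersBytes });
```

With a document this small the gap is only a few dozen bytes, but it grows with every extra property the projection leaves out.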