I wrote a brief schema for my project.
I am a beginner with Cassandra.
The schema looks like this:
User = { uid: "",
    media: {
        media1: {
            Rating: "",
            Views: "",
            Like: ""
        },
        media2: {
        },
        media3: {
        },
        ...
    }
}
Media = { mediaId: {
        user: {
            user1: {
                rating: "",
                views: "",
                like: "",
                comment: ""
            },
            user2: {
            },
            user3: {
            },
            ...
        },
        category: "",
        views: "",
        rating: "",
        likes: "",
        attributes: {
            audio: {
                albumimgurl: "",
                track: "",
                artist: "",
                duration: "",
                url: ""
            },
            image: {
                smallurl: "",
                largeurl: "",
                title: ""
            },
            video: {
                coverimage: "",
                url: "",
                duration: "",
                title: ""
            },
            article: {
                title: "",
                content: ""
            },
            wallpaper: {
                title: "",
                smallurl: "",
                midurl: "",
                largeurl: ""
            }
        }
    }
}
First of all, I have no idea whether my schema is right for Cassandra.
Please tell me whether this schema fits Cassandra.
Thank you.
Starting with a JSON model is fine, since it is easy to read - not in your case, but in general ;)
Here is a nice formatter: http://jsonlint.com/
One level in a JSON document corresponds to a Column Family; two levels already represent a Super Column Family, and those are deprecated. More levels are not possible. When you need more levels, use compound keys.
To remove one level from your JSON document:
attributes: {
    audio: {
        albumimgurl: "",
        track: "",
        artist: "",
        duration: "",
        url: ""
change it to:
attributes: {
    audio:albumimgurl: "",
    audio:track: "",
    audio:artist: "",
    audio:duration: "",
    audio:url: ""
where audio:albumimgurl is the column name - this is a compound (composite) column in Cassandra.
You can use any number of components, so attributes:audio:albumimgurl is fine too.
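For reference, today you would model this in CQL with clustering columns instead of hand-built composite column names. Below is a minimal, untested sketch using the Node.js DataStax driver; the keyspace, table, and column names are my own assumptions, not anything from your schema:

const cassandra = require('cassandra-driver');

// Assumed local cluster; keyspace/table names are hypothetical.
const client = new cassandra.Client({
    contactPoints: ['127.0.0.1'],
    localDataCenter: 'datacenter1',
    keyspace: 'mediadb',
});

// Each (media_id, section, attribute) row plays the role of one composite
// column such as attributes:audio:albumimgurl.
const createTable = `
    CREATE TABLE IF NOT EXISTS media_attributes (
        media_id  text,
        section   text,   -- e.g. 'audio', 'image', 'video'
        attribute text,   -- e.g. 'albumimgurl', 'track'
        value     text,
        PRIMARY KEY (media_id, section, attribute)
    )`;

async function run() {
    await client.execute(createTable);
    await client.execute(
        'INSERT INTO media_attributes (media_id, section, attribute, value) VALUES (?, ?, ?, ?)',
        ['media1', 'audio', 'track', 'My Track'],
        { prepare: true }
    );
    // One slice query fetches a whole "level": all audio attributes of media1.
    const rs = await client.execute(
        'SELECT attribute, value FROM media_attributes WHERE media_id = ? AND section = ?',
        ['media1', 'audio'],
        { prepare: true }
    );
    console.log(rs.rows);
    await client.shutdown();
}

run().catch(console.error);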
Imagine I have an array of objects, available before the aggregate query:
const groupBy = [
{
realm: 1,
latest_timestamp: 1318874398, //Date.now() values, usually different to each other
item_id: 1234, //always the same
},
{
realm: 2,
latest_timestamp: 1312467986, //actually it's $max timestamp field from the collection
item_id: 1234,
},
{
realm: ..., //there are many of them
latest_timestamp: ...,
item_id: 1234,
},
{
realm: 10,
latest_timestamp: 1318874398, //but sometimes then can be the same
item_id: 1234,
},
]
And collection (example set available on MongoPlayground) with the following schema:
{
realm: Number,
timestamp: Number,
item_id: Number,
field: Number, //any other useless fields in this case
}
My problem is, how to $group the values from the collection via the aggregation framework by using the already available set of data (from groupBy) ?
What has been tried already.
Okay, let's skip the naive ideas, like:
for (const element of groupBy) {
//array of `find` queries
}
My current working aggregation query looks something like this:
//first stage
{
$match: {
"item": 1234
"realm" [1,2,3,4...,10]
}
},
{
$group: {
_id: {
realm: '$realm',
},
latest_timestamp: {
$max: '$timestamp',
},
data: {
$push: '$$ROOT',
},
},
},
{
$unwind: '$data',
},
{
$addFields: {
'data.latest_timestamp': {
$cond: {
if: {
$eq: ['$data.timestamp', '$latest_timestamp'],
},
then: '$latest_timestamp',
else: '$$REMOVE',
},
},
},
},
{
$replaceRoot: {
newRoot: '$data',
},
},
//At last, after these stages I can do the useful work
but I find it a bit clunky, and I have heard that using mapReduce could solve my problem a bit faster than this query (though the official docs don't sound promising about it). Is that true?
As of now, I am using 4 or 5 stages before I can start working with the documents that are useful to me.
Recent update:
I have checked the $facet stage and found it promising for this particular case. It will probably help me out.
For what it's worth:
After receiving documents from the necessary stages, I am building a representative cluster chart, which you may also know as a heatmap.
After that I was iterating over each document (or array of objects) one by one to find its correct x and y coordinates, which should end up as:
[
{
x: x (number, actual $price),
y: y (number, actual $realm),
value: price * quantity,
quantity: sum_of_quantity_on_price_level
}
]
As of now, it's awful old code with nested for loops, but in the future I will be using the $facet => $bucket operators for that kind of job.
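To make that concrete, here is a rough, untested sketch of such a $bucket stage; the price and quantity field names come from the shape above, while the boundaries are invented for illustration:

// Hypothetical $bucket stage grouping documents into price levels.
const bucketStage = {
    $bucket: {
        groupBy: '$price',
        boundaries: [0, 100, 500, 1000, 5000], // illustrative price levels
        default: 'other',
        output: {
            quantity: { $sum: '$quantity' },
            value: { $sum: { $multiply: ['$price', '$quantity'] } },
        },
    },
};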
So, I have found an answer to my question in another, but related, way.
I was thinking about using the $facet operator and, to be honest, it's still an option, but using it as below is bad practice:
//building the $facet query before aggregation
const ObjectQuery = {}
for (const realm of realms) {
    Object.assign(ObjectQuery, { [realm.name]: [ ... ] })
}
//mongoose query here
aggregation([{
$facet: ObjectQuery
},
...
])
So, I have chosen a $project stage with the $switch operator to filter results the way $group does.
Also, MapReduce could solve this problem as well, but for some reason the official Mongo docs recommend avoiding it and choosing aggregation instead: the $group and $merge operators.
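For completeness, here is a rough sketch of that $project + $switch approach, built from the groupBy array above; the exact projection fields are my assumption:

// One $switch branch per realm, built from the pre-computed groupBy array.
const branches = groupBy.map((g) => ({
    case: { $eq: ['$realm', g.realm] },
    then: g.latest_timestamp,
}));

const pipeline = [
    { $match: { item_id: 1234, realm: { $in: groupBy.map((g) => g.realm) } } },
    {
        // $project attaches the already-known latest timestamp to every document,
        // replacing the $group/$unwind/$replaceRoot dance shown earlier.
        $project: {
            realm: 1,
            timestamp: 1,
            item_id: 1,
            latest_timestamp: { $switch: { branches, default: null } },
        },
    },
];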
This is my model:
"_id":{
"$oid":"5f0dbca73ef98355649d7cc7"
},
"name":"Multiple Image test",
"description":"Awesome",
"price":{
"$numberInt":"15000"
},
"images":[
{
"_id":{
"$oid":"5f0dbca73ef98355649d7cc9"
},
"data":{
"$binary":{
"base64":"random buffer data",
"subType":"00"
}
},
"contentType":"image/jpeg"
},
{
"_id":{
"$oid":"5f0dbca73ef98355649d7cc8"
},
"data":{
"$binary":{
"base64":"Random buffer data",
"subType":"00"
}
},
"contentType":"image/jpeg"
}
],
}
Now how can I access a particular image's data from the images field?
I am using Mongoose, so which query can be used, and how do I use it to access the data? Any kind of help would be appreciated.
And to boil Asiri's comment down into something that fits your case:
You use the $elemMatch operator: https://docs.mongodb.com/manual/reference/operator/query/elemMatch/
MyModel.find({'images': {$elemMatch: {_id: ObjectId("5f0dbca73ef98355649d7cc8")}}})
You might not need the cast to ObjectId. (Now you should be able to guess that I haven't tested this answer, and for that I'm profoundly sorry.)
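If you only want the matched image back rather than the whole document, $elemMatch also works as a projection. Another untested sketch, with both ids taken from your sample document:

const mongoose = require('mongoose');
const { ObjectId } = mongoose.Types;

// $elemMatch in the projection returns only the first matching array element.
MyModel.findOne(
    { _id: ObjectId('5f0dbca73ef98355649d7cc7') },
    { images: { $elemMatch: { _id: ObjectId('5f0dbca73ef98355649d7cc8') } } }
).then((doc) => {
    // doc.images now holds just the one matched image subdocument.
    console.log(doc.images[0].contentType);
});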
I have an events collection in ArangoDB with the record below inserted. (I am new to Arango.)
INSERT {
"source": "ABC",
"target": "ZYX",
"tranno": "ABCDEF",
"type": "REST",
"attributes" : { "myID" : "12345"}
} INTO events
But trying to create a fulltext index on attributes results in the error given below. It would be great if you could help with this.
events.createIndex ({ type: "fulltext", fields: [ "attributes" ], minLength: 3 })
Query: AQL: syntax error, unexpected identifier near 'events.createIndex ({ type: "ful...' at position 1:1 (while parsing)
AQL is a language used for data selection and data manipulation.
Unlike SQL, it is not also a data definition language, so you can't use AQL to create indexes.
In order to create an index, please use ArangoDB's web interface (Collections => target collection => Indexes => "+" icon) or the ArangoShell. The ArangoShell is a separate executable that is shipped with all ArangoDB packages.
In the ArangoShell you can use the command
db.events.createIndex ({ type: "fulltext", fields: [ "attributes" ], minLength: 3 })
to create the index.
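Once the index exists, it can be queried from AQL with the FULLTEXT() function. A small sketch in the ArangoShell; note that a fulltext index only covers string values, so indexing a concrete text attribute such as attributes.myID (taken from your sample document) may work better than indexing the whole attributes object:

// Query the fulltext index from the ArangoShell.
db._query(`
    FOR e IN FULLTEXT(events, 'attributes.myID', '12345')
        RETURN e
`).toArray();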
I'm having a hard time understanding why I keep getting 0 results back from a query I am trying to perform. Basically, I am trying to return only results within a date range. The table in question has a createdAt field, which is a DateTime scalar. It gets filled in automatically by Prisma (or GraphQL, I am not sure which one sets it). So on any table I have a createdAt DateTime string representing when the row was created.
Here is my schema for this given table:
type Audit {
id: ID! #unique
user: User!
code: AuditCode!
createdAt: DateTime!
updatedAt: DateTime!
message: String
}
I queried this table and got back some results, I'll share them here:
"getAuditLogsForUser": [
{
"id": "cjrgleyvtorqi0b67jnhod8ee",
"code": {
"action": "login"
},
"createdAt": "2019-01-28T17:14:30.047Z"
},
{
"id": "cjrgn99m9osjz0b67568u9415",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T18:06:03.254Z"
},
{
"id": "cjrgnhoddosnv0b67kqefm0sb",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T18:12:35.631Z"
},
{
"id": "cjrgnn6ufosqo0b67r2tlo1e2",
"code": {
"action": "login"
},
"createdAt": "2019-01-28T18:16:52.850Z"
},
{
"id": "cjrgq8wwdotwy0b67ydi6bg01",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T19:29:45.616Z"
},
{
"id": "cjrgqaoreoty50b67ksd04s2h",
"code": {
"action": "adminLogin"
},
"createdAt": "2019-01-28T19:31:08.382Z"
}]
Here is my getAuditLogsForUser schema definition
getAuditLogsForUser(userId: String!, before: DateTime, after: DateTime): [Audit!]!
So, to test, I would want to get all the results between the last and the first:
2019-01-28T19:31:08.382Z is last
2019-01-28T17:14:30.047Z is first.
Here is my code that would inject into the query statement:
if (args.after && args.before) {
where['createdAt_lte'] = args.after;
where['createdAt_gte'] = args.before;
}
console.log(where)
return await context.db.query.audits({ where }, info);
In the playground I execute this statement:
getAuditLogsForUser(before: "2019-01-28T19:31:08.382Z" after: "2019-01-28T17:14:30.047Z") { id code { action } createdAt }
So I want anything with createdAt_lte (less than or equal) set to 2019-01-28T17:14:30.047Z and createdAt_gte (greater than or equal) set to 2019-01-28T19:31:08.382Z.
However, I get literally no results back even though we KNOW there are results.
I tried to look up some documentation on the DateTime scalar on the GraphQL website. I literally couldn't find anything on it, but I see it in my generated Prisma schema. It's just defined as a scalar, with nothing else special about it. I don't think I'm defining it elsewhere either. I am using graphql-yoga, if that makes any difference.
(generated prisma file)
scalar DateTime
I'm wondering if it's truly handling this as a real datetime? It must be, though, because it gets generated as a DateTime ISO string in UTC.
I'm just having a hard time grasping what my issue could possibly be at this moment; maybe I need to define it in some other way? Any help is appreciated.
Sorry, I misread your example in my first reply. This is what you tried in the playground, correct?
getAuditLogsForUser(
before: "2019-01-28T19:31:08.382Z",
after: "2019-01-28T17:14:30.047Z"
){
id
code { action }
createdAt
}
This will not work, since before and after do not refer to time but are cursors used for pagination. They expect an id. Since ids are also strings, this query does not throw an error, but it will not find anything. Here is how pagination is used: https://www.prisma.io/docs/prisma-graphql-api/reference/queries-qwe1/#pagination
What I think you want to do is use a filter in the query. For this you can use the where argument. The query would look like this:
getAuditLogsForUser(
where:{AND:[
{createdAt_lte: "2019-01-28T19:31:08.382Z"},
{createdAt_gte: "2019-01-28T17:14:30.047Z"}
]}
) {
id
code { action }
createdAt
}
Here are the docs for filtering: https://www.prisma.io/docs/prisma-graphql-api/reference/queries-qwe1/#filtering
OK, so I figured out it had to do with the fact that I used "after" and "before" as argument names. I have no clue why this completely screws everything up, but it just won't return ANY results if you use them as arguments. Very strange. They must be shadowing some other variables somehow; probably a bug on GraphQL's end.
As soon as I tried a new variable name, voilà, it works.
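For the record, a minimal sketch of the resolver with renamed arguments; the names from and to are hypothetical (anything other than the reserved before/after should behave the same), and the relation filter shape is also my assumption:

// Schema: getAuditLogsForUser(userId: String!, from: DateTime, to: DateTime): [Audit!]!
const resolvers = {
    Query: {
        getAuditLogsForUser: async (parent, args, context, info) => {
            const where = { user: { id: args.userId } }; // assumed relation filter
            if (args.from && args.to) {
                where.createdAt_gte = args.from; // on/after the start of the range
                where.createdAt_lte = args.to;   // on/before the end of the range
            }
            return context.db.query.audits({ where }, info);
        },
    },
};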
This is also possible:
const fileData = await prismaClient.fileCuratedData.findFirst({
    where: {
        fileId: fileId,
        createdAt: { gte: fromdate },
    },
});
I have a problem representing a complex data structure in Cassandra.
A JSON example of the data:
{
"A": {
"A_ID" : "1111"
"field1": "value1",
"field2": "value2",
"field3": [
{
"id": "id1",
"name": "name1",
"segment": [
{
"segment_id": "segment_id_1",
"segment_name": "segment_name_1",
"segment_value": "segment_value_1"
},
{
"segment_id": "segment_id_2",
"segment_name": "segment_name_2",
"segment_value": "segment_value_2"
},
...
]
},
{
"id": "id2",
"name": "name2",
"segment": [
{
"segment_id": "segment_id_3",
"segment_name": "segment_name_3",
"segment_value": "segment_value_3"
},
{
"segment_id": "segment_id_4",
"segment_name": "segment_name_4",
"segment_value": "segment_value_4"
},
...
]
},
...
]
}
}
Only one query will be used:
Find by A_ID.
I think this data should be stored in one table (column family), without serialization/deserialization operations, for better efficiency.
How can I do this if CQL does not support nested maps and lists?
Cassandra 2.1 adds support for nested structures: https://issues.apache.org/jira/browse/CASSANDRA-5590
The downside to "just store it as a json/protobuf/avro/etc blob" is that you have to read and rewrite the entire blob to update any field. So at the very least you should pull your top-level fields into Cassandra columns, leveraging collections as appropriate.
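As a hedged sketch of what "pulling the top-level fields into columns" could look like here, using user-defined types and frozen collections (all type and table names are invented, and the exact frozen placement varies a bit by Cassandra version):

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
    contactPoints: ['127.0.0.1'],
    localDataCenter: 'datacenter1',
    keyspace: 'mykeyspace',
});

// UDTs mirroring the nested JSON, so field1/field2 stay updatable in place
// while only the nested field3 entries are frozen values.
const statements = [
    `CREATE TYPE IF NOT EXISTS segment (
        segment_id text, segment_name text, segment_value text)`,
    `CREATE TYPE IF NOT EXISTS field3_entry (
        id text, name text, segment frozen<list<frozen<segment>>>)`,
    `CREATE TABLE IF NOT EXISTS a_records (
        a_id   text PRIMARY KEY,
        field1 text,
        field2 text,
        field3 list<frozen<field3_entry>>)`,
];

async function run() {
    for (const s of statements) await client.execute(s);
    // The single query the schema is designed for: find by A_ID.
    const rs = await client.execute(
        'SELECT * FROM a_records WHERE a_id = ?', ['1111'], { prepare: true });
    console.log(rs.rows[0]);
    await client.shutdown();
}
run().catch(console.error);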
As you will be using it just as a key/value store, you could actually store it either as JSON or, to save space, as something like BSON or even Protobuf.
I personally would store it as a Protobuf record, as Protobuf doesn't store the field names, which repeat across your records.
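As a rough illustration of that trade-off with protobufjs (the .proto shape is my own, trimmed to the top level of the JSON above):

const protobuf = require('protobufjs');

// Hypothetical message mirroring the top level of the document.
const { root } = protobuf.parse(`
    syntax = "proto3";
    message A {
        string id = 1;
        string field1 = 2;
        string field2 = 3;
    }
`);

const A = root.lookupType('A');
// Only field numbers go on the wire, not field names - that is the size win
// over JSON when the same names repeat across many records.
const bytes = A.encode(A.create({ id: '1111', field1: 'value1' })).finish();
// `bytes` (a Uint8Array) can then be written to a single Cassandra blob column.
console.log(bytes.length);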