Cassandra + Fetch the last records using in query - node.js

I am new to Cassandra and am using it with Node.js.
I have a user_activity table. Rows are inserted into it based on user activity.
I also have a list of users. I need to fetch the last record for each user in that list.
I would rather not run the query inside a for loop. Is there another way to achieve this?
Example Code:
var userlist = ["12", "34", "56"];
var query = 'SELECT * FROM user_activity WHERE userid IN ?';
server.user.execute(query, [userlist], {
    prepare: true
}, function(err, result) {
    // the callback parameter is `result`, so log that name
    console.log(result);
});
How can I get the last record for each user in the list?
Example:
user id = 12 - need to get last record;
user id = 34 - need to get last record;
user id = 56 - need to get last record;
I need to get these 3 records.
Table Schema:
CREATE TABLE test.user_activity (
userid text,
ts timestamp,
clientid text,
clientip text,
status text,
PRIMARY KEY (userid, ts)
)

It is not possible if you use the IN filter.
If you filter on a single userid, you can apply ORDER BY on the clustering column. You need a column for the inserted/updated time, which in your table is ts. So the query will look like this:
SELECT * FROM user_activity WHERE userid = '12' ORDER BY ts DESC LIMIT 1;
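If the goal is only to avoid a sequential for loop, one option is to send the single-user query for every user at once and wait for all of them together. Below is a minimal sketch of that idea. The `execute` argument and the `fetchLatestPerUser` name are hypothetical: `execute` is assumed to be a thin wrapper around something like `client.execute(query, params, { prepare: true })` that returns a promise resolving to an object with a `rows` array.

```javascript
// Hypothetical helper: `execute(query, params)` is assumed to return a promise
// that resolves to an object with a `rows` array, for example a thin wrapper
// around client.execute(query, params, { prepare: true }).
const LATEST_ACTIVITY_QUERY =
  'SELECT * FROM user_activity WHERE userid = ? ORDER BY ts DESC LIMIT 1';

async function fetchLatestPerUser(execute, userIds) {
  // Start every single-partition query at once instead of awaiting them one by one.
  const results = await Promise.all(
    userIds.map((userId) => execute(LATEST_ACTIVITY_QUERY, [userId]))
  );
  // Each result holds at most one row; skip users that have no activity.
  return results
    .map((result) => result.rows[0])
    .filter((row) => row !== undefined);
}
```

Each query touches a single partition, so this still sends one request per user, but the requests run concurrently rather than one after another.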

You can set N to the number of records you want. Note that LIMIT applies to the whole result set, not to each user:
SELECT * FROM user_activity WHERE userid IN ? ORDER BY ts DESC LIMIT N

Related

Prisma how to add hours while comparing columns in the same table

I am using NestJS and Prisma 4.4.0.
My table:
id: int
created_at: Timestamp
first_active: Timestamp
The query that I want to implement:
select count(*) from {table} where id = {id} and first_active <= {created_at} + 48hours
I want to get a count of users who were active within 48 hours of creation.
With https://www.prisma.io/docs/reference/api-reference/prisma-client-reference#compare-columns-in-the-same-table I can now access the column name.
Example
where: {
// find all users where 'name' is in a list of tags
id: ${id},
first_active: {
this.prisma.table.fields.created_at // Not sure how to + 48 hours
}
},
Any suggestions on how I can add a number of hours (48 in the query above) to created_at?
You will need to perform two queries to accomplish this, for now. First, retrieve the created_at, then add the necessary hours.
You could create a feature request if you would like to see this functionality added to Prisma.
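The two-query approach could be sketched roughly as below. This is only an illustration under assumptions: `addHours` and `countActiveWithin` are hypothetical names, the model is taken to be `table` as in the question, and `id` is assumed to be a unique field so that `findUnique` applies.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Returns a new Date that is `hours` hours after `date`.
function addHours(date, hours) {
  return new Date(date.getTime() + hours * HOUR_MS);
}

// `prisma` is a Prisma client; `table` is the model name used in the question.
async function countActiveWithin(prisma, id, hours) {
  // Query 1: read created_at for the row.
  const row = await prisma.table.findUnique({
    where: { id },
    select: { created_at: true },
  });
  if (row === null) return 0;

  // Query 2: count rows whose first_active is at or before created_at + hours.
  const cutoff = addHours(row.created_at, hours);
  return prisma.table.count({
    where: { id, first_active: { lte: cutoff } },
  });
}
```

Calling `countActiveWithin(prisma, id, 48)` would then correspond to the `first_active <= created_at + 48hours` condition for that id.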

How to avoid Cassandra ALLOW FILTERING?

I have the following data model:
campaigns {
id int PRIMARY KEY,
scheduletime text,
SchduleStartdate text,
SchduleEndDate text,
enable boolean,
actionFlag boolean,
.... etc
}
Here I need to fetch the data based on start date and end date without ALLOW FILTERING.
I have received several suggestions to redesign the schema to fulfill the requirement, but I cannot filter the data based on id since I need the data between the dates.
Can someone give me a good suggestion for this scenario, so that I can execute the following query:
select * from campaigns WHERE startdate='XXX' AND endDate='XXX' ; // Without ALLOW FILTERING
CREATE TABLE campaigns (
SchduleStartdate text,
SchduleEndDate text,
id int,
scheduletime text,
enable boolean,
PRIMARY KEY ((SchduleStartdate, SchduleEndDate),id));
You can run the queries below against the table:
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx'; -- to get the answer to the above question.
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx' and id = 1; -- if you want to filter the data further by a specific id
Here SchduleStartdate and SchduleEndDate are used as the partition key, and id is used as the clustering key to make sure the entries are unique.
This way, you can filter by start date, end date, and then id if needed.
One downside is that filtering by id alone won't be possible, since you need to restrict the partition key columns first.

Cassandra QueryBuilder not returning any result, whereas same query works fine in CQL shell

SELECT count(*) FROM device_stats
WHERE orgid = 'XYZ'
AND regionid = 'NY'
AND campusid = 'C1'
AND buildingid = 'C1'
AND floorid = '2'
AND year = 2017;
The above CQL query returns the correct result, 32032, in the CQL shell.
But when I run the same query using the QueryBuilder Java API, I see the count as 0.
BuiltStatement tagSummaryQuery = QueryBuilder.select()
        .countAll()
        .from("device_stats")
        .where(eq("orgid", "XYZ"))
        .and(eq("regionid", "NY"))
        .and(eq("campusid", "C1"))
        .and(eq("buildingid", "C1"))
        .and(eq("floorid", "2"))
        .and(eq("year", "2017"));
try {
    ResultSetFuture tagSummaryResults = session.executeAsync(tagSummaryQuery);
    tagSummaryResults.getUninterruptibly().all().stream().forEach(result -> {
        System.out.println(" totalCount > " + result.getLong(0));
    });
} catch (Exception e) {
    e.printStackTrace();
}
I have only 20 partitions and 32032 rows per partition.
What could be the reason QueryBuilder is not executing the query correctly?
Schema :
CREATE TABLE device_stats (
orgid text,
regionid text,
campusid text,
buildingid text,
floorid text,
year int,
endofwindow timestamp,
categoryid timeuuid,
devicestats map<text,bigint>,
PRIMARY KEY ((orgid, regionid, campusid, buildingid, floorid,year),endofwindow,categoryid)
) WITH CLUSTERING ORDER BY (endofwindow DESC,categoryid ASC);
// Using the keys function to index the map keys
CREATE INDEX ON device_stats (keys(devicestats));
I am using cassandra 3.10 and com.datastax.cassandra:cassandra-driver-core:3.1.4
Moving my comment to an answer since that seems to solve the original problem:
Changing .and(eq("year", "2017")) to .and(eq("year", 2017)) solves the issue since year is an int and not a text.

Cassandra Schema for retrieving date-ordered records

Folks,
I would like to solve the following with one table in Cassandra. The service tracks when users open an asset. On subsequent events for the same asset, we simply overwrite the accessDate.
example record:
{ userId: "string", assetId: "string", accessDate: unixTimestamp }
With this said, we need to fulfill the following access requirements (each requirement has its own bulletpoint for readability):
Be able to return all assets a user has opened, and at what time.
This is easy to achieve, table could look like:
CREATE TABLE user_assets_tracker (
userId uuid,
accessDate timestamp,
assetId uuid,
PRIMARY KEY (userid, accessDate, assetId)
);
This allows us to query for all assets, and when each was last accessed.
SELECT *
FROM user_assets_tracker
WHERE userId = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC;
Dandy. Now the harder bits, which I am unsure about, was hoping you folks could chime in:
Show me all the assets the user added in the past 30 days.
Naturally the LIMIT here is not what we need. Also, we may need to have 2 tables to achieve this.
SELECT *
FROM user_assets_tracker
WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC
LIMIT 10; -- ?????
Show me the last accessed item for the user. I think this one is easier, the LIMIT 1 solves that.
This is probably straight forward, with this schema:
CREATE TABLE user_assets_tracker (
userId uuid,
accessDate timestamp,
assetId uuid,
PRIMARY KEY (userid, accessDate, assetId)
);
SELECT *
FROM user_assets_tracker
WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC
LIMIT 1;
Retrieve the full record for a particular userId + assetId
Since accessDate comes before assetId in our schema, I am not sure how to do this as well. Another table?
Thanks!!
PS It seems that SASI Index could be the solution
Since you are always selecting assets ordered by accessDate DESC, define your schema with that clustering order:
CREATE TABLE user_assets_tracker (
userid uuid,
accessdate timestamp,
assetid uuid,
PRIMARY KEY (userid, accessdate, assetid)
) WITH CLUSTERING ORDER BY (accessdate DESC, assetid ASC);
Now you don't need to specify ORDER BY accessDate DESC every time. The data will be ordered by accessDate DESC by default.
Show me all the assets the user added in the past 30 days.
First, get the timestamp for 30 days ago.
Let's say that timestamp is 2017-02-05 12:00:00+0000.
Now you can query:
SELECT * FROM user_assets_tracker WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 AND accessdate >= '2017-02-05 12:00:00+0000'
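Computing the 30-days-ago timestamp on the client might look like the sketch below. `daysAgo` and `buildRecentAssetsQuery` are hypothetical helper names, and the returned query and parameters are assumed to be passed to the driver as a prepared statement.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the instant `days` days before `now` (defaults to the current time).
function daysAgo(days, now = new Date()) {
  return new Date(now.getTime() - days * DAY_MS);
}

// Builds the query text and parameters; the caller passes them to the driver.
function buildRecentAssetsQuery(userId, days, now = new Date()) {
  return {
    query: 'SELECT * FROM user_assets_tracker WHERE userid = ? AND accessdate >= ?',
    params: [userId, daysAgo(days, now)],
  };
}
```

For example, `buildRecentAssetsQuery('522b1fe2-2e36-4cef-a667-cd4237d08b89', 30)` produces the parameters for the past 30 days relative to the current time.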
Retrieve the full record for a particular userId + assetId
If you are using Cassandra 3.0 or above you can use Materialized Views
Create a materialized view:
CREATE MATERIALIZED VIEW user_assets AS
SELECT *
FROM user_assets_tracker
WHERE userid IS NOT NULL AND assetid IS NOT NULL AND accessdate IS NOT NULL
PRIMARY KEY (userid, assetid, accessdate);
Now if you want to get all data with userid and assetid, here is the query
SELECT * FROM user_assets WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 AND assetid = 1d45e6c2-02a1-11e7-aac5-b9ab92bee74c;
One more thing: if a huge amount of data is inserted for a single user, you should add a time bucket alongside userid in the partition key. For more, see the answer at https://stackoverflow.com/a/41857183/2320144
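As a rough illustration of time bucketing, the sketch below derives a bucket value from a timestamp. Everything here is an assumption for the sake of the example: the monthly granularity, the `YYYY-MM` text format, and the helper names `monthBucket` and `bucketsBetween`. The partition key would then be something like (userid, bucket), and a range query has to visit each bucket that overlaps the range.

```javascript
// Hypothetical bucket format: one partition per user per calendar month (UTC).
function monthBucket(date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  return `${year}-${month}`;
}

// Lists the buckets a query has to visit to cover [from, to], newest first.
function bucketsBetween(from, to) {
  const buckets = [];
  const cursor = new Date(Date.UTC(to.getUTCFullYear(), to.getUTCMonth(), 1));
  const first = Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1);
  while (cursor.getTime() >= first) {
    buckets.push(monthBucket(cursor));
    // The cursor always sits on day 1, so stepping back a month cannot overflow.
    cursor.setUTCMonth(cursor.getUTCMonth() - 1);
  }
  return buckets;
}
```

The right bucket size depends on how much data one user produces. A month is used here only to keep the example short.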

Wide column pagination in CQL table

Let's say I have this table:
CREATE TABLE comments
(
postId uuid,
commentId timeuuid,
postedBy text,
postedById uuid,
text text,
blocked boolean,
anonymous boolean,
PRIMARY KEY(postId, commentId)
)
How can I perform wide column pagination on this table, with something like:
SELECT * FROM comments WHERE postId = '123' AND commentId > '34566'
I was going through Automatic Paging, but I am confused about which of the three approaches mentioned in that document I should use.
If you want to compare a timeuuid field, you need to use an expression like the one below:
SELECT * FROM comments WHERE postId = '123' AND commentId > maxTimeuuid('2013-08-01 15:05-0500')
Once you've received the ResultSet from the execute method, you should be able to simply iterate over it using the iterator method. Pagination will happen automatically, based on the value specified in setFetchSize or the default value of 5000.
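The paragraph above describes the Java driver. For comparison, paging manually with a page-state token could be sketched in Node.js as follows. `fetchAllPages` and `execute` are hypothetical names: `execute` is assumed to behave like the Node.js driver's `client.execute`, resolving to an object with `rows` and, while more pages remain, a `pageState` value.

```javascript
// `execute(query, params, options)` is a hypothetical wrapper, assumed to behave
// like the Node.js driver's client.execute: it resolves to an object with `rows`
// and, while more pages remain, a `pageState` token.
async function fetchAllPages(execute, query, params, fetchSize) {
  const rows = [];
  let pageState;
  do {
    // Ask for the next page, resuming from the token of the previous one.
    const result = await execute(query, params, { prepare: true, fetchSize, pageState });
    rows.push(...result.rows);
    pageState = result.pageState; // undefined or null once the last page is read
  } while (pageState);
  return rows;
}
```

In a real endpoint you would usually return one page plus its token to the caller instead of collecting every page in memory.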
