Query by Interleaved table fields using Spring Data Spanner - google-cloud-spanner

I'm trying to query by a field of an interleaved table using Spring Data Spanner. The id comparison is done automatically by Spring Data Spanner when it performs the ARRAY STRUCT inner join, but I'm not able to add a WHERE clause on the interleaved table to the query.
Considering the example below:
CREATE TABLE Singers (
Id INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (Id);
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
Id INT64 NOT NULL,
AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, Id),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
Let's suppose I want to query all Singers where the AlbumTitle is "Fear of the Dark", how can I write a repository method to achieve that using Spring Data Spanner?

Your example seems to either contain a couple of typos, or it is otherwise not completely correct:
The Singers table has a column Id which is the primary key. That is in itself fine, but when creating a hierarchy of interleaved tables, it is recommended to prefix the primary key column with the table name. So it would be better to name it SingerId.
The Albums table has a SingerId column and an Id column, and together these two columns form the primary key of Albums. This is technically incorrect (and confusing), and it is also the reason I think your example is not completely correct. Because Albums is interleaved in Singers, the primary key of Albums must start with the primary key columns of the Singers table, in the same order, followed by any additional columns that complete the primary key of Albums. In your example it is the Id column that matches the Singers table, while SingerId is just an extra column in Albums that, despite its name, has nothing to do with the Singers table.
The example data model should therefore be changed to:
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (SingerId);
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
From this point on you can consider the SingerId column in the Albums table as a foreign key reference to a Singer and treat it as you would in any other database system. Note also that there can be multiple albums for each singer, so a query for "all Singers where the AlbumTitle is 'Fear of the Dark'" is slightly ambiguous. I would rather say:
Give me all singers that have at least one album with the title "Fear of the Dark"
A valid query for that would be:
SELECT *
FROM Singers
WHERE SingerId IN (
SELECT SingerId
FROM Albums
WHERE AlbumTitle='Fear of the Dark'
)
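To expose that query through Spring Data Spanner, you can put the SQL on a repository method with @Query. The following is only a sketch, assuming the corrected schema above and a recent Spring Cloud GCP release (the entity and repository names are made up for this example, and the package names differ in older versions, where they start with org.springframework.cloud.gcp):
import java.util.List;
import org.springframework.data.repository.query.Param;
import com.google.cloud.spring.data.spanner.core.mapping.Column;
import com.google.cloud.spring.data.spanner.core.mapping.PrimaryKey;
import com.google.cloud.spring.data.spanner.core.mapping.Table;
import com.google.cloud.spring.data.spanner.repository.SpannerRepository;
import com.google.cloud.spring.data.spanner.repository.query.Query;
// Entity mapped to the Singers table (hypothetical mapping for this example).
@Table(name = "Singers")
class Singer {
  @PrimaryKey
  @Column(name = "SingerId")
  Long singerId;
  @Column(name = "FirstName")
  String firstName;
  @Column(name = "LastName")
  String lastName;
}
// The @Query SQL is the same subquery as above; @albumTitle is bound from the @Param argument.
interface SingerRepository extends SpannerRepository<Singer, Long> {
  @Query("SELECT * FROM Singers WHERE SingerId IN ("
      + " SELECT SingerId FROM Albums WHERE AlbumTitle = @albumTitle)")
  List<Singer> findSingersByAlbumTitle(@Param("albumTitle") String albumTitle);
}
Calling singerRepository.findSingersByAlbumTitle("Fear of the Dark") then returns every singer with at least one album of that title.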

Related

How can one interleave with two tables?

Let's pretend I have the schema
CREATE TABLE Account (
AccountId BYTES(MAX),
Foo STRING(1024)
) PRIMARY KEY (AccountId);
CREATE TABLE Customer (
CustomerId BYTES(MAX),
Bar STRING(1024)
) PRIMARY KEY (CustomerId);
And I create a new table:
CREATE TABLE Order (
AccountId BYTES(MAX),
CustomerId BYTES(MAX),
Baz STRING(1024)
) PRIMARY KEY (AccountId, CustomerId);
which I'd like to INTERLEAVE with both Account and Customer. How can one do this? I'm familiar with how to INTERLEAVE with one table when building a hierarchy, but not sure how to achieve this with two tables.
You cannot interleave one table in two other tables, but you can create a hierarchy of interleaved tables. In your example, that would mean interleaving the Customer table in the Account table, and the Order table in the Customer table like this:
CREATE TABLE Account (
AccountId BYTES(MAX),
Foo STRING(1024)
) PRIMARY KEY (AccountId);
CREATE TABLE Customer (
AccountId BYTES(MAX),
CustomerId BYTES(MAX),
Bar STRING(1024)
) PRIMARY KEY (AccountId, CustomerId),
INTERLEAVE IN PARENT Account;
CREATE TABLE Order (
AccountId BYTES(MAX),
CustomerId BYTES(MAX),
OrderId BYTES(MAX),
Baz STRING(1024)
) PRIMARY KEY (AccountId, CustomerId, OrderId),
INTERLEAVE IN PARENT Customer;
The reason you cannot interleave one table in two other tables the way you ask is that interleaving actually means Cloud Spanner stores the rows of the interleaved child table physically together with the rows of the parent table. There would be no way to determine where to store the child rows if a table were interleaved in two different, unrelated parent tables.
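Querying down such a hierarchy works like any ordinary join; interleaving only changes the physical layout. A sketch against the schema above (@accountId is a placeholder parameter, and the Order table is backtick-quoted because ORDER is a reserved word in GoogleSQL):
-- All orders under one account, walking Account -> Customer -> Order.
SELECT c.CustomerId, o.OrderId, o.Baz
FROM Customer AS c
JOIN `Order` AS o
  ON o.AccountId = c.AccountId
  AND o.CustomerId = c.CustomerId
WHERE c.AccountId = @accountId;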

Apache Cassandra table not sorting by name or title correctly

I have the following Apache Cassandra Table working.
CREATE TABLE user_songs (
member_id int,
song_id int,
title text,
timestamp timeuuid,
album_id int,
album_title text,
artist_names set<text>,
PRIMARY KEY ((member_id, song_id), title)
) WITH CLUSTERING ORDER BY (title ASC);
CREATE INDEX user_songs_member_id_idx ON music.user_songs (member_id);
When I run select * FROM user_songs WHERE member_id = 1; I thought the CLUSTERING ORDER BY on title would give me the results sorted in ascending order by title, but it doesn't.
Two questions:
Is there something wrong with the table in terms of ordering or the PK?
Do I need more tables for my needs in order to get the titles sorted by member_id?
Note - my Cassandra queries for this table are:
Find all songs for a member_id
Remove a song for a member_id given its song_id
Hence the composite PK.
UPDATE
It is similar to: Query results not ordered despite WITH CLUSTERING ORDER BY
However, one of the suggestions in the comments is to use member_id, song_id, title as the primary key instead of the composite partition key I currently have. When I do that, it seems that I cannot delete with only song_id and member_id, which is the data I have when deleting (the title is missing at delete time).
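For what it's worth, the usual Cassandra answer here is the denormalization pattern described in the answers further down: clustering order only applies within a single partition, so rows fetched across partitions through the member_id index come back in token order, not title order. A sketch with one (hypothetically named) table per query:
-- Reads: all songs of a member, sorted by title.
CREATE TABLE user_songs_by_member (
member_id int,
title text,
song_id int,
album_id int,
album_title text,
artist_names set<text>,
PRIMARY KEY ((member_id), title, song_id)
) WITH CLUSTERING ORDER BY (title ASC, song_id ASC);
-- Deletes: look up the title by (member_id, song_id), then delete from both tables.
CREATE TABLE user_songs_title_by_id (
member_id int,
song_id int,
title text,
PRIMARY KEY ((member_id), song_id)
);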

Cassandra order by on combination of composite keys

I originally wrote a table that tracks feeds that have been assigned to a user for review.
create table user_feed (
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, languageid, topicid, dateinserted)
);
I realized soon after I created this table that I wouldn't be able to sort it (ORDER BY ... DESC) by dateinserted, because in Cassandra, for some weird reason, I can only order by the second (and last) column of a composite key (as in, the table has to have a two-column key, and ORDER BY can only happen on the second column of that key), so I changed my table to this:
create table user_feed (
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, dateinserted)
);
and now I was able to run a query to get the latest feeds for the user, using order by.
However, I now have a new requirement to sort the feeds by a combination of (languageid + userid) or (topicid + userid) or (languageid + topicid + userid).
I had the idea to create three new tables and combine the keys into one key column. For example, for the userid + topicid query I would use:
create table user_feed_by_topic (
usertopicidkey text,
dateinserted timeuuid,
primary key (usertopicidkey, dateinserted)
);
where usertopicidkey = userid.toString() + topicid.toString().
Of course, this solution requires 4 separate inserts whenever I need to insert a new feed row, since I have 4 tables tracking identical data but partitioned differently to allow different orderings.
My question is, is there a better way to do this? Is there any way to achieve what I want (query by a combination of columns and order by another column) or am I stuck with my 4 table design approach?
Many thanks,
Cassandra will order all rows based on the PK's clustering columns. If your primary key is (userid, languageid, topicid, dateinserted), all rows within a partition will be sorted by languageid, topicid and dateinserted in ascending order. This implies that rows will only be sorted by date within a specific language and topic. You'd have to use the date as the first clustering key column to change this behaviour.
It's common practice to denormalize your data across multiple tables to implement different ordering strategies.
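As a concrete sketch of that denormalization, reusing the column types from the question and a composite partition key instead of a concatenated text key (the table name is illustrative):
create table user_feed_by_topic (
userid uuid,
topicid uuid,
languageid uuid,
dateinserted timeuuid,
primary key ((userid, topicid), dateinserted)
) with clustering order by (dateinserted desc);
-- Similar tables (e.g. user_feed_by_language, user_feed_by_language_topic) cover the
-- other combinations; every new feed row is written to each of them.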

how to query to get an ordered result from cassandra table

I need to query like:
select * from items where group='a' order by update_time desc;
However, the column update_time of each row will not be fixed; it will change as we need.
So, how can I design the Cassandra tables to achieve the goal of querying to get an ordered result?
Sample Table
CREATE TABLE items (
ItemA text,
update_time timeuuid,
ItemB int,
PRIMARY KEY ( ItemA, update_time)
) WITH CLUSTERING ORDER BY (update_time DESC);
The ordering field should be part of the clustering key.
Please refer to the table above, where we order the rows by update_time in descending order.
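One caveat from the question ("update_time ... will change as we need") is worth spelling out: update_time is now a clustering column, and primary key columns cannot be modified with UPDATE, so moving an item to a new position means deleting the old row and inserting a new one. A sketch with bind markers:
BEGIN BATCH
DELETE FROM items WHERE ItemA = ? AND update_time = ?;  -- the old update_time
INSERT INTO items (ItemA, update_time, ItemB) VALUES (?, now(), ?);
APPLY BATCH;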

Search For Multiple Properties by Value Cassandra

How can we design a Cassandra model for storing a group, say 'Item', having n properties P1, P2, ..., PN, and retrieve the item by searching on a property value?
For Example
Item     Item_Type   State    Country
Item1    Solid       State1   Country1
In a traditional RDBMS we can issue a select query:
select Item from table where Item_Type='Solid' and Country='Country1'
How can we achieve such a model in NoSQL Cassandra? We have tried Cassandra secondary indexes, but they do not seem to be applicable.
For properties P1..PN you will have to ALTER the table, as with RDBMSs, or use an outdated Thrift-based API (I'd suggest Astyanax for this) that can add columns on the fly (but this is considered bad practice). Another possibility is to use a collection of properties, where one of your columns is a collection of values:
CREATE TABLE item (
item_id text PRIMARY KEY,
property set<text>
);
For SELECTing values with multiple WHERE conditions you can use secondary indexes, or, if you know which columns will be required in the WHERE clause, a composite key; but I would recommend secondary indexes if a lot of columns need to appear in the WHERE clause.
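If you go the collection route, a value in the set can be searched with CONTAINS once the collection is indexed (supported since Cassandra 2.1); a sketch against the item table above, with a hypothetical index name:
CREATE INDEX item_property_idx ON item (property);
SELECT * FROM item WHERE property CONTAINS 'Solid';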
The answer to many Cassandra data modelling questions is: denormalize.
You can solve your problem by building the indexes yourself. For each property, have rows keyed by the property name, with the value and item ID as clustering columns:
CREATE TABLE item_index (
property TEXT,
value TEXT,
item_id TEXT,
PRIMARY KEY (property, value, item_id)
)
You also need a table for the items:
CREATE TABLE items (
item_id TEXT,
property TEXT,
value TEXT,
PRIMARY KEY (item_id, property)
)
(Notice that in the item_index table all three columns are in the primary key, because I assume that multiple items can have the same value for the same property, while the items table has only item_id and property in the primary key, because I assume that an item can have only one value per property -- you can handle multi-valued properties too, but it takes a few more steps and would complicate the example.)
Every time you insert an item you also insert a row in the item_index table for each property of the item:
INSERT INTO items (item_id, property, value) VALUES ('thing1', 'color', 'blue');
INSERT INTO items (item_id, property, value) VALUES ('thing1', 'shoe_size', '8');
INSERT INTO item_index (property, value, item_id) VALUES ('color', 'blue', 'thing1');
INSERT INTO item_index (property, value, item_id) VALUES ('shoe_size', '8', 'thing1');
(you might want to insert the item as a single BATCH command too)
To find items by shoe size you need to do two queries (sorry, but that's the price you pay for the flexibility -- maybe someone else can come up with a solution that does not require two queries):
SELECT item_id FROM item_index WHERE property = 'shoe_size' AND value = '8';
SELECT * FROM items WHERE item_id = ?;
where the ? is one of the item_ids returned from the first query (because more than one can match, remember).
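If the first query returns several ids, one way to avoid running the second query once per item is an IN clause on the partition key (fine for small result sets, though large IN lists put extra work on the coordinator):
SELECT * FROM items WHERE item_id IN ('thing1', 'thing2');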
