Inverted index: Delete then Insert pattern? - cassandra

I have an inverted index table on a table of Users. The table allows querying users by last name. It is called "users_by_lastname".
The primary key of this table includes "lastname", so that column cannot be updated.
If a user changes their last name in the main "Users" table, should I be deleting and re-inserting into the inverted index table, "users_by_lastname"?
I cannot update a primary key column in Cassandra... Are there other patterns that handle this better?

In Cassandra 3.0 you can deal with this problem by creating the inverted index table as a materialized view of the Users table. Then Cassandra will take care of maintaining the view automatically whenever you update the base table.
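For example, a minimal sketch of such a view, assuming a hypothetical Users schema with user_id as the partition key (every primary key column of the base table must also appear in the view's primary key, and each must be filtered with IS NOT NULL):

CREATE MATERIALIZED VIEW users_by_lastname AS
    SELECT * FROM users
    WHERE lastname IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (lastname, user_id);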
On earlier versions of Cassandra, your only option is to delete the old row and then insert one with the new last name in the application-maintained inverted index table.
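If you stay with the application-maintained table, the delete and insert can at least be grouped in a logged batch so that the two mutations are applied together. A sketch, again with hypothetical columns and values:

BEGIN BATCH
    DELETE FROM users_by_lastname WHERE lastname = 'Smith' AND user_id = 42;
    INSERT INTO users_by_lastname (lastname, user_id, first_name) VALUES ('Jones', 42, 'Alice');
APPLY BATCH;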

Related

Can we create data sync between two tables without indexes in Azure Data Sync

Can we sync two tables in two different databases in Azure without indexes? As we all know, if two databases have to be in sync, the hub database and the member database should have the same schema. But is there a chance of avoiding indexes?
Please help me with this.
Each table needs to have a primary key and you cannot avoid that. Please read the following excerpt from Microsoft documentation:
Each table must have a primary key. Don't change the value of the primary key in any row. If you have to change a primary key value, delete the row and recreate it with the new primary key value.
Source here.
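In practice that means something like the following, sketched here against a hypothetical Customers table whose primary key is CustomerId:

DELETE FROM Customers WHERE CustomerId = 42;

INSERT INTO Customers (CustomerId, Name, Email)
VALUES (4242, 'Jane Doe', 'jane@example.com');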

How to remove data from an index when indexing a view in Azure Search?

I index a view from my database. When I add an entry to the table, an entry is also added to the view, and the indexer picks up the new data. But when I delete this entry from the table, it remains in the index. How do I set up a soft delete?
I read the documentation and it says it is necessary to add a field. Should I add it myself?
The only way to remove deleted documents from your index is by using the SQL Integrated Change Detection feature or by setting up a Data Deletion Detection Policy (Soft Delete).
If you choose the second option, then you have to create a soft-delete column in your view.
Then tell your datasource that it should track the soft-delete field:
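The policy is set on the data source definition. A rough sketch of the REST payload, assuming a hypothetical view name and an IsDeleted column (use whatever soft-delete column and marker value your view actually exposes):

{
  "name": "my-sql-datasource",
  "type": "azuresql",
  "credentials": { "connectionString": "<your connection string>" },
  "container": { "name": "my_view" },
  "dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",
    "softDeleteMarkerValue": "true"
  }
}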

Cassandra - join two tables and save result to new table

I am working on a self-service BI application where users can upload their own datasets, which are stored in Cassandra tables that are created dynamically. The data is extracted from files that the user uploads, so each dataset is written into its own Cassandra table, modeled on the column headers in the uploaded file, while indexing the dimensions.
Once the data is uploaded, the users are allowed to build reports, analyze, etc., from within the application. I need a way to allow users to merge/join data from two or more datasets/tables based on matching keys and write the result into a new Cassandra table. Once a dataset/table is created, it will stay immutable and data is only read from it.
user table 1: username, email, employee id
user table 2: employee id, manager

I need to merge the data in user table 1 and user table 2 on matching employee id and write the result to a new table that is created dynamically.

new table: username, email, employee id, manager
What would be the best way to do this?
The only option that you have is to do the join in your application code. There are too few details here to suggest a proper solution.
Please add details about the table keys, usage patterns, and so on. In general, in Cassandra you model from a usage point of view, i.e. starting with the queries that you'll execute on the data.
In order to merge two tables in this pattern, you have to do it in the application: create the third (target) table and fill it with data from both source tables. Make sure that you read the data in pages so you don't run out of memory; how much that matters depends on the size of the data.
Another alternative is to do the join in Spark, but that may be over-engineering in your case.
You can give the merged table a primary key based on the user (the employee id), so that the merged data for each user goes into one row; that key should be unique, and this is a one-time action.
Then, when the user triggers the merge, go through one table in pages using a fetch size (in Java you can set this through the query options; it gives you a fixed window of rows that is loaded, and once it is consumed the driver fetches the next page). Say you have a fetch size of 1000 items: iterate over them from one table, find the matches in the second table, and after 1000 rows are processed, issue a batch of 1000 inserts into the new table.
If that is time-consuming, you can, as suggested, use another tool such as Apache Spark or Spring Batch and do it in the background, informing the user that it will take place.
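As a rough sketch of that approach (hypothetical table and column names; the real schema comes from the uploaded datasets), the application would create the target table once and then, for each page of rows read from user table 1 and matched against user table 2 by employee id, execute the insert, ideally as a prepared statement batched per page:

CREATE TABLE merged_users (
    employee_id text PRIMARY KEY,
    username text,
    email text,
    manager text
);

INSERT INTO merged_users (employee_id, username, email, manager)
VALUES (?, ?, ?, ?);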

CQL Select on column value

I am creating account data in Cassandra. Accounts are most commonly queried by account id, but the account is also often queried by login name. I have created a user table with a primary key of (account_id, login_name). Because of this, I have to use "ALLOW FILTERING" on the table to query by login_name.
Is there a better way to create the table that does not have the impact of a filterable table?
A possible approach is to define a new table that has the exact same elements in the primary key but in the reverse order: (login_name, account_id). You can still keep the table you present as the reference table, which stores all the relevant account data, but this new table allows more efficient queries based on login_name. I am not sure how much you would gain compared with the ALLOW FILTERING query... but this kind of data duplication for query optimization is a normal practice in NoSQL databases.
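A minimal sketch of the two tables, with hypothetical column types; the full account data stays in the table keyed by account_id, and the second table only serves the lookup by login_name:

CREATE TABLE accounts_by_id (
    account_id uuid PRIMARY KEY,
    login_name text,
    email text
);

CREATE TABLE accounts_by_login (
    login_name text,
    account_id uuid,
    PRIMARY KEY (login_name, account_id)
);

Writes then have to go to both tables (for example in a logged batch) so that they stay consistent.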
HTH.

DB2 backup and restore db table preserving foreign key constraints

I'm currently having issues going through a test backup and restore of a database table on my development machine for DB2. I was never entirely successful. Although I was able to restore all the data after a drop and re-create of the table, I wasn't able to reset the foreign key constraint: I got an SQL error complaining that the keys don't match. Here are my exact steps; I'm sure this is not entirely the right way to do it, but it does eventually restore the 5423 rows of data:
The process
1. export to /export/home/dale/comments.ixf of ixf messages /export/home/dale/msg.txt select * from .comments
Note: step 1 exports 5423 rows of data to a location
2. drop table .comments
3. import from /export/home/dale/comments.ixf of ixf create into .comments
Note: step 3 here creates the table but does not insert any data rows
4. load client from /export/home/dale/comments.ixf of ixf modified by identityoverride replace into .comments
Note: up until this step, I'm able to insert the 5423 rows of data into the recreated db table
5. alter table .comments add FOREIGN KEY (comments_id) REFERENCES .news (article_key)
Note: here the alter table fails as DB2 complains that some comments_id does not match an article_key
Could anyone help with my problem here? Thanks in advance
The error means that some of the rows you IMPORT into the Comments table refer to rows that do not exist in the News table.
You might not be forming the constraint correctly. The column name "comments_id" sounds like the primary key of the Comments table. You want the foreign key, which matches the primary key of the News table; it might be called "article_key" or "article_id".
ALTER TABLE Comments
ADD FOREIGN KEY (article_key)
REFERENCES News (article_key);
If "comment_id" is really not the primary key of the "Comments" table, then the problem comes from not backing up and restoring both the News and Comments table at the same time.
You can either EXPORT and IMPORT the News table along with the Comments table, or remove the Comments that refer to missing News rows with something like this
DELETE FROM Comments
WHERE comments_id NOT IN (
SELECT article_key
FROM News
)
Before you do this, you might want to try listing the Comments which would be deleted by the above query
SELECT *
FROM Comments
WHERE comments_id NOT IN (
SELECT article_key
FROM News
)
I found a solution to my problem, as noted in my comments above.
user980717 resolved my first problem, where I had set the wrong column as the foreign key.
For my second issue, "SQL0668N Operation not allowed for reason code "1" on table "tablename". SQLSTATE=57016", I needed to run "set integrity for niwps.comments immediate checked" to ensure the data satisfied all constraints defined on the table. Thanks to all who took the effort to help me with my problems. Cheers
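For reference, that command in full. After a LOAD, a table with constraints is left in the "set integrity pending" state, which is what SQL0668N reason code "1" reports; this statement checks the loaded rows against the constraints and clears the state:

SET INTEGRITY FOR niwps.comments IMMEDIATE CHECKED;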
