If I have two column families, one for customer info and another for customer address info, how can I insert customer info and their address info into the two separate column families with the same row key (customer id)?
Use a batch insert:
BEGIN BATCH
    <INSERT statement for the customer info column family> ;
    <INSERT statement for the customer address info column family> ;
APPLY BATCH ;
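For example, a minimal sketch assuming two hypothetical tables, customer_info and customer_address, both using customer_id as the partition key:

BEGIN BATCH
    INSERT INTO customer_info (customer_id, name, email)
    VALUES (42, 'Jane Doe', 'jane@example.com');
    INSERT INTO customer_address (customer_id, street, city)
    VALUES (42, '1 Main St', 'Springfield');
APPLY BATCH;

A logged batch like this guarantees that either both writes are eventually applied or neither is, so the two column families stay consistent for the same customer id.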
Why do you need two tables with the same primary key? Can't you combine them into one?
I'm confused as to how primary keys in Cassandra allow for quick data access. Say, for example, I create a table of Students whose columns include a Student Id and a Country.
I choose the primary key to be Student Id. My understanding is that all the students will be placed around the cluster based on some hash of this value. Say I also choose Country as a clustering column. So within each partition of students (who have been split based on their Id), they will be ordered by Country (presumably alphabetically).
So if I then want to retrieve all students for a specific country, will I have to visit multiple nodes in the cluster? While the students have been ordered by Country within each node, there is nothing to say that all the students for a specific country have been stored on the same node. Is this type of query even supported?
If I had only added 5 students to a 5-node cluster, would it be possible that all the students would be stored on separate nodes if the Student Id was a UUID?
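In CQL terms, the table I am describing would look something like this (the name column is just a hypothetical extra column):

CREATE TABLE students (
    student_id uuid,
    country    text,
    name       text,
    PRIMARY KEY (student_id, country)
);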
So if I then want to retrieve all students for a specific country will I have to visit multiple nodes in the cluster?
Yes.
While the students have been ordered by Country within each node, there is nothing to say that all the students for a specific country have been stored on the same node?
Correct.
Is this type of query even supported?
It is, but it's considered an anti-pattern in Cassandra. What happens is that the coordinator (the node that receives the request from the client) has to query ALL the other nodes, since it has to scan every row of that column family.
If I had only added 5 students to a 5-node cluster, would it be possible that all the students would be stored on separate nodes if the Student Id was a UUID?
Yes.
The way to solve your problem is to have a column family for each query (one for selecting by Student Id and the other for selecting by Country, each with a different primary key) while duplicating the rows (when you create a student, you have to insert it into both column families).
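A minimal sketch of that layout, using hypothetical table names (students_by_id and students_by_country):

CREATE TABLE students_by_id (
    student_id uuid,
    country    text,
    name       text,
    PRIMARY KEY (student_id)
);

CREATE TABLE students_by_country (
    country    text,
    student_id uuid,
    name       text,
    PRIMARY KEY (country, student_id)
);

The application writes every new student to both tables; a query such as SELECT * FROM students_by_country WHERE country = 'Ireland' then hits a single partition instead of scanning the whole cluster.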
I have a table in which I am trying to populate a column that increments when a condition is met.
In my example, I need to update the department count where the department = 'Marketing'. Does anyone have a good way to do this using SQLite?
Current Table

Name      Department   Department_Count
James     Accounting   NULL
Jennifer  Marketing    NULL
Micheal   Warehouse    NULL
Natalie   Marketing    NULL
Rebecca   Marketing    NULL
Desired Table After Update

Name      Department   Department_Count
James     Accounting   NULL
Jennifer  Marketing    1
Micheal   Warehouse    NULL
Natalie   Marketing    2
Rebecca   Marketing    3
Edit:
Currently, I insert the rows where the department is 'Marketing' into a new table and then use the primary key / rowid as an auto-increment so I can number those rows.
This requires creating a new table, which is not ideal: it takes up extra space and is redundant, since the underlying data already exists in the original table.
I'm using Python to interact with my database, if that helps with solving this problem.
Update:
Actually, thinking about it a little further, you may not need a trigger:
INSERT INTO "Table" (Department, Department_Count)
VALUES (?, (SELECT IFNULL(MAX(Department_Count), 0) + 1 FROM "Table" WHERE Department = ?));
may give you what you want.
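For back-filling rows that already exist (as in the desired table above), one option is an UPDATE with a correlated subquery over rowid; this sketch assumes the table is literally named "Table" as in the snippet above:

UPDATE "Table"
SET Department_Count = (
    SELECT COUNT(*)
    FROM "Table" AS t2
    WHERE t2.Department = "Table".Department
      AND t2.rowid <= "Table".rowid
)
WHERE Department = 'Marketing';

Each 'Marketing' row is numbered by how many 'Marketing' rows have a rowid less than or equal to its own, which reproduces the 1, 2, 3 sequence in insertion order.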
Original Answer
You cannot do this declaratively, but you can probably accomplish what you want procedurally using a trigger.
Two possible strategies:
Use an AFTER INSERT trigger to execute an UPDATE statement against the most recently inserted row (its rowid is available AFTER INSERT) to set the Department_Count column from a SELECT expression based on the current data in the table (a sketch of this follows below).
Use an INSTEAD OF trigger (on a view over the table, since SQLite only allows INSTEAD OF triggers on views) to perform an alternate INSERT, combining the values from the NEW row with a similar SELECT statement to get the maximum value (plus 1) from the Department_Count column.
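A sketch of the first strategy, again assuming the table is named "Table" and that only 'Marketing' rows should be numbered:

-- Number each newly inserted 'Marketing' row from the current maximum.
CREATE TRIGGER number_marketing_rows
AFTER INSERT ON "Table"
WHEN NEW.Department = 'Marketing'
BEGIN
    UPDATE "Table"
    SET Department_Count = (
        SELECT IFNULL(MAX(Department_Count), 0) + 1
        FROM "Table"
        WHERE Department = NEW.Department
    )
    WHERE rowid = NEW.rowid;
END;

With this trigger in place, a plain INSERT of a 'Marketing' row picks up the next number automatically, and rows for other departments keep a NULL Department_Count.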
I am using Cassandra.
I have two column families, A and B. Both column families contain the same data, but they have different primary keys. Now I am using a batch statement to update the rows in these two tables.
Table schema is as follows:
Primary key of table A: [id1 (partition key), id2 (partition key), id3 (clustering key)]
Primary key of table B: [id1 (partition key), id2 (partition key), state (clustering key), id3 (clustering key)]
I want to update the state in both tables. state is a clustering key in B, while in table A it is a regular column.
What I do is fetch the state from A and treat it as the old state.
Then, in a batch, I first delete the row from table A, then delete the row from table B, then insert the new row into table A, and finally insert the new row into table B.
Note: using the old state fetched from A, I build the primary key of B, and then delete from B and insert the new row into B.
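Roughly, the batch looks like this (with placeholder values, where 'old' is the state previously read from A):

BEGIN BATCH
    DELETE FROM A WHERE id1 = 'v1' AND id2 = 'v2' AND id3 = 'v3';
    DELETE FROM B WHERE id1 = 'v1' AND id2 = 'v2' AND state = 'old' AND id3 = 'v3';
    INSERT INTO A (id1, id2, id3, state) VALUES ('v1', 'v2', 'v3', 'new');
    INSERT INTO B (id1, id2, state, id3) VALUES ('v1', 'v2', 'new', 'v3');
APPLY BATCH;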
It works fine, but not for parallel requests.
If 2 requests come in for the same primary key from 2 different instances, I get the problem: table B ends up with two entries, one with the old state and one with the new state.
So how can I solve that in Cassandra?
Cassandra 2.0 and above supports lightweight transactions,
which let you add an IF NOT EXISTS condition while you are inserting.
In your case, you can't check it when you are fetching the state from table A, but you can restrict the insert, which will prevent duplicates in your case. E.g.
INSERT INTO A (id1, id2, state, id3)
VALUES ('val1', 'val1', 'val3', 'val4')
IF NOT EXISTS;
So the first one to execute will succeed, but the second one will fail (its insert is not applied). Handle the retry / failure in your client based on your business requirement.
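For the read-modify-write on table A itself, a conditional UPDATE (a compare-and-set against the old state) is another option; this is a sketch using the question's column names and placeholder values:

UPDATE A
SET state = 'new'
WHERE id1 = 'val1' AND id2 = 'val2' AND id3 = 'val4'
IF state = 'old';

If another request has already changed the state, the condition fails and Cassandra reports that the statement was not applied, so the client can re-read and retry.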
Check this doc for more information: https://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
http://postimg.org/image/89yglfakx/
Refer to the image at the above link for reference.
I have an Excel file which gets updated on a daily basis, i.e. the data is different every time.
I am pulling the data from the Excel sheet into the table using Talend. I have a primary key, Company_ID, defined in the table.
The error I am facing is that the Excel sheet has a few duplicate Company_ID values.
It will also pick up more duplicate values in the future, since the Excel file is updated daily, so the duplicates in the Company_ID field will be different each time.
For Company_ID 1, I want to choose the unique record, i.e. the one that doesn't have nulls in the rest of the columns.
For Company_ID 3, there is a null value in some columns, which is fine since it is the only record for that Company_ID.
How do I choose, in Talend, the unique row that has the maximum number of non-null column values, as in the case of Company_ID 1?
I tried using tUniqRow, but it simply picks the first record among the duplicates, so if the first record for a duplicate Company_ID has null values, it won't work.
I'm a newbie to Cassandra, and I am confused about data archival. The following is the approach I am trying to implement:
Filter the records to be archived.
Create a new column family
Move the filtered records to the new column family
Delete the filtered records from the existing column family
Filter the records to be archived - achieved with the use of secondary indexes.
Create a new column family - done with a CREATE TABLE query.
Move the filtered records to the new column family - I thought of implementing this using the approach mentioned in "cassandra copy data from one columnfamily to another columnfamily", but that copies all the data from column family 1 to 2. Is it possible to move only the filtered rows to the new column family?
Delete the filtered records from the existing column family - I am not sure how to achieve this in CQL. Please help me.
Additionally, let me know if there is a better approach.
COPY then DELETE sounds like a valid strategy here.
For deleting rows, take a look at the DELETE command; it takes the same WHERE condition as a SELECT does.
Unfortunately this won't work for a query that requires "ALLOW FILTERING", although there is an enhancement request to add this.
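For example, a delete keyed on the partition key (hypothetical table and column names):

DELETE FROM orders WHERE order_id = 1234;

Note that, unlike SELECT, DELETE cannot filter on a secondary index, so in practice you would first read the matching partition keys (using your secondary index) and then issue a DELETE per key.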