Copying a value from one table to another in Cassandra

I have a very large table in Cassandra that consists of (caseid, timestamp, activity) as columns, with caseid and timestamp forming the primary key. The values of caseid are repeated, and I want to extract the first value of activity for each caseid and put it into another table (named initialActivity) that consists of only activity. Can someone please help me with how I can achieve this using a CQL query? Thanks.

Please try this:

    INSERT INTO initialActivity (activity) VALUES
        (SELECT activity FROM preActivity WHERE caseId = 111 LIMIT 1);

Only the first activity value with caseId = 111 will get inserted into the initialActivity table.
Please refer to the CQL documentation for more info.
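Note that CQL does not actually support nested SELECTs or INSERT ... SELECT, so a statement like the one above will not parse; in practice you read the value on the client and then write it. A minimal sketch with the Python cassandra-driver (the node address and keyspace name are assumptions):

    # Minimal sketch, assuming the Python cassandra-driver and a local node;
    # 'mykeyspace' is a placeholder for the real keyspace.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('mykeyspace')

    # First activity for the caseid, i.e. first by clustering order on timestamp
    row = session.execute(
        "SELECT activity FROM preActivity WHERE caseid = %s LIMIT 1", (111,)).one()

    if row is not None:
        session.execute(
            "INSERT INTO initialActivity (activity) VALUES (%s)", (row.activity,))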

Related

Pyspark: Row count is not matching the count of records appended

I am trying to identify and insert only the delta records into the target Hive table from a PySpark program. I am using a left anti join on ID columns, and it is able to identify the new records successfully. But I noticed that the total number of delta records is not the same as the difference between the table record count before the load and after the load.
    delta_df = src_df.join(tgt_df, src_df.JOIN_HASH == tgt_df.JOIN_HASH, how='leftanti') \
        .select(src_df.columns).drop("JOIN_HASH")

    delta_df.count()  # gives the correct delta count

    delta_df.write.mode("append").format("hive") \
        .option("compression", "snappy").saveAsTable(hivetable)
But delta_df.count() does not match count(*) from the Hive table after writing minus count(*) from the Hive table before writing; the difference always comes out higher than the delta count.
I have a unique timestamp column for each load in the source, and to my surprise, the count of records in the target for the current load (grouping by the unique timestamp) is less than the delta count.
I am not able to identify the issue here; do I have to write the DataFrame in some other way?
It was a problem with the line delimiter. When the table is created with spark.write, no line.delim is specified in its SERDEPROPERTIES, and column values containing * were getting split into multiple rows.
Adding the SERDEPROPERTIES below made it store the data correctly:

    'line.delim'='\n'
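For reference, a minimal sketch of applying that property to an existing table from PySpark (the database and table names are placeholders):

    # Hypothetical sketch: set line.delim on an existing Hive table via Spark SQL.
    # 'mydb.hivetable' stands in for the real target table.
    spark.sql(r"ALTER TABLE mydb.hivetable SET SERDEPROPERTIES ('line.delim'='\n')")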

How to show only the last two rows in a QTableView from SQLite using QSqlQueryModel?

Below is my example code:
    # PyQt5 assumed; the snippet runs inside a widget class that owns tableView.
    from PyQt5.QtSql import QSqlDatabase, QSqlQueryModel

    db = QSqlDatabase.addDatabase('QSQLITE')
    db.setDatabaseName('book.db')
    db.open()

    model = QSqlQueryModel()
    model.setQuery("SELECT * FROM card")
    self.tableView.setModel(model)
I am using QSqlQueryModel, QTableView, and SQLite3, and I am able to view all rows in my table. But I want to view only the last two rows of my table, which are the newly inserted rows. The table has no "id" field; it has numeric and text fields. How is it possible?
If you want to get the last 2 rows ordered by a field that indicates insertion order, in your case "rowid", then you have to add a filter to the SQL command like this:
model.setQuery("SELECT * FROM card ORDER BY rowid DESC LIMIT 2")
Another possible option is to filter the table using QSortFilterProxyModel, but it is less efficient.
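A minimal sketch of that proxy-model alternative (assuming the source rows arrive in insertion order, e.g. from the original unfiltered query):

    # Hedged sketch: keep the full query in the source model and let a proxy
    # accept only the last two source rows.
    from PyQt5.QtCore import QSortFilterProxyModel

    class LastTwoRowsProxy(QSortFilterProxyModel):
        def filterAcceptsRow(self, source_row, source_parent):
            # accept only the final two rows of the source model
            return source_row >= self.sourceModel().rowCount() - 2

    proxy = LastTwoRowsProxy()
    proxy.setSourceModel(model)
    self.tableView.setModel(proxy)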

Cassandra is inserting null values in a skipped column

Can anybody please help me understand why Cassandra inserts null values in columns that were skipped? Isn't it supposed to skip the column, storing no value at all (not even null), if I skip the column entirely while inserting data? I am a bit confused, because according to the following tutorial, data is stored by row key with the columns (see the diagram of a column family); if that is true, then I should not get null for the column.
Or is the whole concept I learned about the Cassandra column family wrong?
http://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
Here is the CQL script:

    CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

    CREATE TABLE users (firstname text, lastname text, age int, gender ascii, PRIMARY KEY (firstname));

    INSERT INTO users (firstname, age, gender, lastname) VALUES ('Michael', 30, 'male', 'smith');
Here, I am skipping a column, but when I run a select query, it shows null for that column. Why is Cassandra filling in null for that column?
    INSERT INTO users (firstname, age, gender) VALUES ('Jane', 23, 'female');
    SELECT * FROM users;
Why don't you go to the most comprehensive source of documentation and learning for Cassandra: http://academy.datastax.com? It's free. The content at tutorialspoint.com is very old and has not been updated in ages (SuperColumns have been deprecated since 2011-2012...).
Here, I am skipping a column, but when I run a select query, it shows null for that column. Why is Cassandra filling in null for that column?
In CQL, null means the value is not present or has been deleted.
Since you did not insert any value for the lastname column, Cassandra returns null (== not present, in this case); nothing is actually stored on disk for the skipped column.
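A minimal sketch with the Python cassandra-driver (local node assumed) showing that the skipped column simply reads back as null/None; nothing is "filled in":

    # Minimal sketch, assuming the Python cassandra-driver and a local node.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('test')
    session.execute(
        "INSERT INTO users (firstname, age, gender) VALUES ('Jane', 23, 'female')")

    row = session.execute("SELECT * FROM users WHERE firstname = 'Jane'").one()
    print(row.lastname)  # None: no cell was ever written for lastname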

Reading the most recently updated rows in Cassandra

I have a use case and want suggestions on the below.
Structure:

    Rowkey_1:
        Column1 = value1;
        Column2 = value2;
    Rowkey_2:
        Column1 = value1;
        Column2 = value2;
" Suppose i am writing 1000 rows into cassandra with each row having couple of columns. After sometime i update only 100 rows and make changes for column values ".
-> when i read data from cassandra i only want to get these 100 updated rows and not the entire row key information.
Is there a way to say to cassandra like give me all row keys from start - > end where time in between "Time_start" to "Time_end"
in SQL Lingo -- > select from "" to "" where time between "time_start" and "time_end".
P.S. I read Basic Time Series with Cassandra, where it says you can annotate the row key like the below.
Inserting data:

    {:key => 'server1-load-20110306', :column_name => TimeUUID(now), :column_value => 0.75}

Here the column family has TimeUUID columns.
My question is: can you annotate your row key with a date and time like this: { :key => '2012-11-18 16:00:15' }?
OR is there any other way to get only the recently updated rows?
Any suggestion/guidance is really appreciated.
You can't do range queries on keys unless you use ByteOrderedPartitioner, which you shouldn't use. The way to do this is to write known sentinel values as keys, such as a timestamp representing the beginning of the day, and then do the column slice by time.
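A hedged sketch of that sentinel-key pattern expressed in modern CQL from Python (the keyspace, table, and column names are illustrative, not from the question):

    # Hedged sketch of the day-bucket pattern: the partition key is a sentinel
    # (the day), and a timeuuid clustering column allows slicing by time.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('mykeyspace')  # keyspace is hypothetical
    session.execute("""
        CREATE TABLE IF NOT EXISTS updates_by_day (
            day text,
            updated_at timeuuid,
            row_key text,
            PRIMARY KEY (day, updated_at)
        )
    """)

    # Record each update under today's bucket; a time-bounded read is then a
    # single slice of that partition.
    rows = session.execute(
        "SELECT row_key FROM updates_by_day WHERE day = %s "
        "AND updated_at >= minTimeuuid(%s) AND updated_at <= maxTimeuuid(%s)",
        ('2012-11-18', '2012-11-18 16:00:00', '2012-11-18 17:00:00'))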

Cassandra BETWEEN & ORDER BY operations

I want to perform SQL operations such as BETWEEN and ORDER BY with ASC/DESC order on Cassandra 0.7.8.
As far as I know, Cassandra 0.7.8 does not have direct support for these operations. Kindly let me know whether there is a way to accomplish them by tweaking a secondary index.
Below is my data model design:

    Emp (KS) {
        User (CF) {
            bsanderson (RowKey): { eno, name, dept, dob, email }
            prothfuss (RowKey): { eno, name, dept, dob, email }
        }
    }
Queries:
- SELECT * FROM emp WHERE dept='IT' ORDER BY dob ASC
- SELECT * FROM emp WHERE eno BETWEEN ? AND ? ORDER BY dob ASC
Thanks in advance.
Regards,
Thamizhananl
SELECT * FROM emp WHERE dept='IT' ORDER BY dob ASC
You can select rows where the 'dept' column has a certain value, by using the built-in secondary indexes. However, the rows will be returned in the order determined by the partitioner (RandomPartitioner or OrderPreservingPartitioner). To order by arbitrary values such as DOB, you would need to sort at the client.
Or, you could support this query directly by having a row for each dept, and a column for each employee, keyed (and therefore sorted) by DOB. But be careful of shared birthdays! And you'd still need subsequent queries to retrieve other data (the results of your SELECT *) for the employees selected, unless you denormalise so that the desired data is stored in the index too.
SELECT * FROM emp WHERE eno BETWEEN ? AND ? ORDER BY dob ASC
Secondary index querying in Cassandra requires at least one equality term, so I think you can do dept='IT' AND eno >= x AND eno <= y, but not a pure BETWEEN-style query on its own.
You could do this by creating your own index row, with a column for each employee, keyed on the employee number, with an appropriate comparator so all the columns are automatically sorted in employee-number order. You could then do a range query on that row to get a list of matching employees - but you would need further queries to retrieve other data for each employee (dob etc), unless you denormalise so that the desired data is stored in the index too. You would still need to do the dob ordering at the client.
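A hedged sketch of that hand-rolled index row using pycassa (a Thrift-era Python client contemporary with Cassandra 0.7.x; the column family name and comparator are assumptions):

    # Hedged sketch: one index row per dept, one column per employee keyed by
    # eno so the comparator keeps columns sorted in employee-number order.
    import pycassa

    pool = pycassa.ConnectionPool('Emp')
    index_cf = pycassa.ColumnFamily(pool, 'UserByEno')  # comparator: LongType

    # Index write: row key = dept, column name = eno, column value = User row key.
    index_cf.insert('IT', {1001: 'bsanderson', 1002: 'prothfuss'})

    # BETWEEN-style query: a column slice over the index row.
    matches = index_cf.get('IT', column_start=1000, column_finish=2000)
    for eno, user_key in matches.items():
        print(eno, user_key)  # follow-up reads on User fetch dob etc.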
As far as I know, the columns in a column family are sorted by the comparator you specify when creating it, and you can use a clustering key to get the sort order you want; the rows in a column family are sorted by the partitioner.
I suggest you read Cassandra: The Definitive Guide, Chapter 6.
