How to retrieve data from Cassandra?

I have a Cassandra column family named Data3. It has two columns, with data as follows:

URL              Data
www.google.com   Google

I want a query in Cassandra similar to SQL's SELECT * FROM Table1 WHERE Data='Google'.
Thanks

select * from Data3 where Data = 'Google'
This is CQL, as described in the CQL Language Reference on DataStax.
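Note that in Cassandra this query only succeeds if Data is part of the primary key or has a secondary index; otherwise it is rejected. A minimal sketch of the two usual options (ALLOW FILTERING requires a CQL3-capable Cassandra):
CREATE INDEX ON Data3 (Data);
-- or, without an index (scans the whole table; expensive on large data):
SELECT * FROM Data3 WHERE Data = 'Google' ALLOW FILTERING;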
Oddly enough, we used an earlier version of Cassandra where CQL was not supported, and we never thought we would actually need something like SQL. If you want more detail, read these articles:
CQL utility
You can also query without a SQL-style utility.
You can see a non-SQL example/tutorial here, including how to select columns.

Related

How can I see the location of an external Delta table in Spark using Spark SQL?

If I create an external table in Databricks, how can I check its location (in Delta Lake) using an SQL query?
This can be done in multiple ways.
%sql
show create table database.tablename
or
%sql
desc formatted database.tablename
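For illustration, the output of show create table for an external Delta table includes the path in a LOCATION clause; the schema and path here are hypothetical:
CREATE TABLE database.tablename (
    id INT,
    name STRING)
USING delta
LOCATION 'dbfs:/mnt/external/tablename'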
It can also be done with the following command:
describe detail <the table>
The location is listed in the location column.
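For example (the exact set of returned columns can vary by Databricks runtime):
%sql
describe detail database.tablename
-- returns a single row with columns such as format, name, location, createdAt, sizeInBytes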

How to delete all rows in Cassandra Keyspace

I need to delete all rows in Cassandra, but with Amazon Keyspaces it isn't possible to execute TRUNCATE tbl_name because the TRUNCATE API isn't supported yet.
The few ideas that come to mind are all a little bit tricky:
Solution A
select all the rows
loop over the rows and delete them (one by one or in batches)
Solution B
DROP TABLE
CREATE TABLE with the structure of the old table
Do you have any ideas for keeping the process as simple as possible?
Thanks in advance
If the data is not required, option B is the way to go: drop the table and recreate it. You can set the capacity in the CREATE TABLE statement using custom table properties.
CREATE TABLE my_keyspace.my_table (
    id text,
    division text,
    project text,
    role text,
    manager_id text,
    PRIMARY KEY (id, division))
WITH CUSTOM_PROPERTIES={
    'capacity_mode': {
        'throughput_mode': 'PROVISIONED',
        'read_capacity_units': 10,
        'write_capacity_units': 20},
    'point_in_time_recovery': {'status': 'enabled'}}
AND TAGS={'pii': 'true', 'prod': 'true'};
Option C: if you need to keep the data, you can also leverage on-demand capacity mode, which is a pay-per-request mode. With no requests you only pay for storage. You can change capacity modes once a day.
ALTER TABLE my_keyspace.my_table
WITH CUSTOM_PROPERTIES={
    'capacity_mode': {'throughput_mode': 'PAY_PER_REQUEST'}};
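Since capacity modes can only be changed once a day, switching back to provisioned mode is the mirror-image statement; this sketch simply reuses the capacity numbers from the CREATE TABLE example above:
ALTER TABLE my_keyspace.my_table
WITH CUSTOM_PROPERTIES={
    'capacity_mode': {
        'throughput_mode': 'PROVISIONED',
        'read_capacity_units': 10,
        'write_capacity_units': 20}};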
Solution B should be fine in the absence of TRUNCATE. In older versions of Cassandra (prior to 2.1), recreating a table with the same name was a problem; see the DataStax FAQ blog. The issue has since been resolved via CASSANDRA-5202.
If the data in the table is not required anymore, it is better to drop the table and recreate it; deleting rows one by one (solution A) would be a very tedious task if the table contains a large amount of data.
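For comparison, solution A would look roughly like this sketch (using the example table above; every DELETE in Cassandra must specify the full primary key, which is why this gets tedious at scale):
-- fetch the primary keys, then delete row by row (key values are placeholders)
SELECT id, division FROM my_keyspace.my_table;
DELETE FROM my_keyspace.my_table WHERE id = 'some_id' AND division = 'some_division';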

External hive table on top of parquet returns no data

I created a Hive table on top of a parquet folder written via Spark. On one test server it runs fine and returns results (Hive version 2.6.5.196), but in production it returns no records (Hive 2.6.5.179). Could someone please point out what the exact issue could be?
If you created the table on top of an existing partition structure, you have to make it known to the table that there are partitions at this location.
MSCK REPAIR TABLE table_name; -- adds missing partitions
SELECT * FROM table_name; -- should return records now
This problem shouldn't occur if there are only files in that location (no partition directories) and they are in the expected format.
You can verify with:
SHOW CREATE TABLE table_name; -- to see the expected format
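For reference, the output for a partitioned, Parquet-backed external table typically contains clauses like these (table name, columns, and path are hypothetical):
CREATE EXTERNAL TABLE my_db.my_table (
    id string,
    value double)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 'hdfs:///data/my_table';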
"I created a hive table on top of a parquet folder written via spark."
Check whether the database you are using exists:
show databases;
Compare the DDL of the table you created on your test server with the one on production:
show create table table_name;
Make sure both DDLs match exactly.
Run msck repair table table_name to load the incremental data, or the data from all partitions.
Run select * from table_name to view the records. A consolidated session is sketched below.
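Putting those steps together as one HiveQL session (database and table names are placeholders):
SHOW DATABASES;
USE my_db;                        -- assumed database name
SHOW CREATE TABLE my_table;       -- compare this DDL between test and production
MSCK REPAIR TABLE my_table;       -- register partitions that exist on HDFS
SELECT * FROM my_table LIMIT 10;  -- verify records come back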

Spark SQL query issue - SQL with Subquery doesn't seem to retrieve records

I have a Spark SQL query like:
SELECT * FROM xTable a WHERE EXISTS (filter subquery) AND (a.date IN (SELECT max(b.date) FROM xTable b))
Under certain circumstances (when a filter table is not provided), my filter subquery should simply do a SELECT 1.
Whenever I run this in Impala it returns records; in Hive it complains that only one subquery expression is allowed. However, when I run it as Spark SQL in Spark 2.4, it returns an empty dataframe. Any idea why? What am I doing wrong?
Ok, I think I found the reason. It is not related to the query; it seems to be an issue with creating a table from a CSV file in Hive.
You select the source (the path to the CSV file in HDFS) and then, under format, check the 'Has Header' checkbox. The table appears to be created fine.
Then, when I execute the following in Hive or Impala:
Select max(date) from xTable
I get the max date back (the date column is a string).
However, when I run the same via Spark SQL, I get the result 'date' (the same name as the column header).
If I remove the header from the CSV file before importing it, and then create the headers and types manually, I don't face this issue.
Seems like some form of bug, or maybe a user error on my end.
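A plausible explanation (an assumption on my part, not confirmed in the thread): the 'Has Header' option records the header via a table property like the one below. Hive and Impala honor it and skip the first line, but Spark SQL 2.4 ignores it when reading the files, so Spark treats the header row as data and max(date) returns the literal string 'date', which sorts above the real date strings:
-- table property typically set by the 'Has Header' import option (assumption)
ALTER TABLE xTable SET TBLPROPERTIES ('skip.header.line.count'='1');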

CQL multiple range select

I tried to run this query against Cassandra, but I can't get a result:
SELECT smth..smthz, else..elsez FROM CF
Is it possible to use multiple range select expressions in CQL?
No, multiple range select expressions like this are not supported in Cassandra. You have to handle it in your application code to achieve that functionality, as sketched below.
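A sketch of that workaround, reusing the question's own (CQL 2-era) range syntax: run each range as its own query and merge the results client-side.
SELECT smth..smthz FROM CF;
SELECT else..elsez FROM CF;
-- merge the two result sets in application code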
