I'm wondering if there is a query in CQL3 that allows you to get column names of a specific columnfamily that has a static schema?
Thanks in advance :)
If you want to get the column names of a specific table with CQL3, try this:
select * from system.schema_columns WHERE keyspace_name='#KS' AND columnfamily_name='#CF' allow filtering;
Note: keyspace_name in the WHERE clause is optional. It's mainly used for better filtering (say, when a table with the same name exists in multiple keyspaces).
You could use the system keyspace to do this:
SELECT column_name FROM system.schema_columnfamilies
WHERE keyspace_name = 'testks' AND columnfamily_name = 'testcf';
Output in cqlsh (using cql3):
column_name
-------------
password
You can work out the key for the table by using:
SELECT key_aliases FROM system.schema_columnfamilies WHERE keyspace_name='testks'
AND columnfamily_name='testcf';
Output:
key_aliases
--------------
["username"]
From my latest test, we should use schema_columns rather than schema_columnfamilies to get all the column names; schema_columnfamilies can be used for getting table names.
Get column names:
SELECT column_name FROM system.schema_columns WHERE keyspace_name = 'KeySpaceName' AND columnfamily_name = 'TableName';
Get column family names, i.e., table names:
select columnfamily_name from system.schema_columnfamilies where keyspace_name='KeySpaceName';
As per the latest documentation, for Cassandra 3.x none of the above answers will work; the query to show columns is now:
SELECT * FROM system_schema.columns WHERE keyspace_name = 'xxxx' AND table_name = 'xxx';
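If you want the column names programmatically rather than in cqlsh, a minimal sketch with the DataStax Python driver might look like this (the contact point, keyspace and table names are placeholders, and Cassandra 3.x with system_schema is assumed):
# Sketch: list the columns of a table via system_schema (Cassandra 3.x+).
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(['127.0.0.1']).connect()  # placeholder contact point
rows = session.execute(
    "SELECT column_name, type FROM system_schema.columns "
    "WHERE keyspace_name = %s AND table_name = %s",
    ('xxxx', 'xxx'))  # placeholder keyspace/table, as in the query above
for row in rows:
    print(row.column_name, row.type)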
Related
I'm trying to join two tables in Spark SQL. Each table has 50+ columns. Both have the column id as the key.
spark.sql("select * from tbl1 join tbl2 on tbl1.id = tbl2.id")
The joined table has duplicated id column.
We can of course specify which id column to keep like below:
spark.sql("select tbl1.id, .....from tbl1 join tbl2 on tbl1.id = tbl2.id")
But since we have so many columns in both tables, I do not want to type all the other column names in the query above (other than the id column, there are no other duplicated column names).
What should I do? Thanks.
If id is the only column name in common, you can take advantage of the USING clause:
spark.sql("select * from tbl1 join tbl2 using (id) ")
The USING clause matches columns that have the same name in both tables. When using select *, the column appears only once.
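For what it's worth, the DataFrame API behaves the same way: passing the join key by name (rather than as an expression) deduplicates it. A minimal sketch, assuming a SparkSession named spark and that tbl1 and tbl2 are registered as temp views:
# Sketch: joining on a column name (USING semantics) keeps a single id column.
df1 = spark.table("tbl1")
df2 = spark.table("tbl2")

joined = df1.join(df2, "id")  # like USING (id): one id column in the result
# joined = df1.join(df2, df1.id == df2.id)  # would keep both id columns
joined.printSchema()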
Assuming you want to preserve the "duplicates", you can use the internal row id or an equivalent. This helped me in the past when I had to delete exactly one of two identical rows.
select *,ctid from table;
also outputs, in PostgreSQL, the internal counter id, so rows that were exactly identical before now become different. I don't know about spark.sql, but I assume that you can access a similar attribute there.
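Spark has no ctid, but monotonically_increasing_id() can play a similar role: it tags each row with a unique (though not consecutive) id, so previously identical rows become distinguishable. A sketch, assuming a SparkSession named spark:
# Sketch: emulate an internal row id with monotonically_increasing_id().
# The ids are unique per row but not consecutive, and not stable across runs.
from pyspark.sql.functions import monotonically_increasing_id

tagged = spark.table("tbl1").withColumn("_row_id", monotonically_increasing_id())
tagged.show()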
val joined = spark
  .sql("select * from tbl1")
  .join(
    spark.sql("select * from tbl2"),
    Seq("id"),  // join key given by name, so id is kept only once
    "inner"     // optional; "inner" is the default join type
  )
joined should have only one id column. Tested with Spark 2.4.8
For example, we can use
select count(*) from student_database;
to calculate the number of rows in a table.
But how do we calculate the number of tables in a keyspace?
DESCRIBE TABLES;
gives you the list of all tables in that keyspace.
And for a Cassandra 2.x (and lower) answer:
SELECT COUNT(*) FROM system.schema_columnfamilies
WHERE keyspace_name='your keyspace';
SELECT count(*) FROM system_schema.tables WHERE keyspace_name='your keyspace';
The above query will work in Cassandra 3.0 and above.
from cassandra.cluster import Cluster  # pip install cassandra-driver
session = Cluster(['127.0.0.1']).connect()  # placeholder contact point
rows = session.execute("SELECT count(*) FROM system_schema.tables WHERE keyspace_name = 'your_keyspace_name'")
print(list(rows))
Result:
[Row(count=2)]
Cassandra version: 2.1.10
CREATE TABLE customer_raw_data (
    id uuid,
    hash_prefix bigint,
    profile_data map<varchar,varchar>,
    PRIMARY KEY (hash_prefix,id));
I have an index on profile_data and I have rows where profile_data is null.
How do I write a select query to retrieve the rows where profile_data is null?
I tried the following
select count(*) from customer_raw_data where profile_data=null;
select count(*) from customer_raw_data where profile_data CONTAINS KEY null;
With reference to https://issues.apache.org/jira/browse/CASSANDRA-3783:
There is currently no SELECT support for indexed nulls, and given the design of Cassandra, this is considered a difficult/prohibitive problem.
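Since CQL itself cannot select on an indexed null, one workaround is to fetch the candidate rows and filter client-side. A minimal sketch with the Python driver, assuming the schema above (the contact point and keyspace name are placeholders; note this scans the whole table, so it is only reasonable for small data sets):
# Sketch: count rows whose profile_data map is null, client-side.
# Note: Cassandra stores an empty collection as null, so the driver
# returns None in both cases.
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(['127.0.0.1']).connect('your_keyspace')
rows = session.execute("SELECT profile_data FROM customer_raw_data")
print(sum(1 for row in rows if not row.profile_data))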
The basic problem: the column in the WHERE condition has to be either a primary key or a secondary index, so make your column whichever is suitable and then try the query below.
select count(*) from customer_raw_data where profile_data='';
SELECT * FROM TableName WHERE colName > 5000 ALLOW FILTERING; -- works fine
SELECT * FROM TableName WHERE colName > 5000 limit 10 ALLOW FILTERING;
https://cassandra.apache.org/doc/old/CQL-3.0.html
Check the "ALLOW FILTERING" Part.
I have a Cassandra database in which columns can be added or removed based on the application's needs. The column names start with the prefix RSSI. I was wondering if it is possible to select all columns where the column name is like %RSSI%. In MySQL you can do something like select count(*) FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name ='MACTrain' AND column_name LIKE '%RSSI%'. Is it possible in Cassandra? If not, what can be a solution to select columns based on a specific wildcard?
You can obtain the columns metadata of a table by querying the system keyspace:
select * from system.schema_columns
where keyspace_name = 'yourks' and columnfamily_name = 'yourtable';
For Cassandra v3.0 and above, you can use the new system_schema keyspace:
select * from system_schema.columns
where keyspace_name = 'yourks' and table_name = 'yourtable';
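Since CQL has no LIKE on these system tables, the wildcard match has to happen client-side. A minimal sketch with the Python driver against the Cassandra 3.x schema (the contact point and keyspace are placeholders; 'mactrain' is lower-cased because Cassandra folds unquoted identifiers to lower case):
# Sketch: emulate column_name LIKE '%RSSI%' by filtering client-side.
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(['127.0.0.1']).connect()
rows = session.execute(
    "SELECT column_name FROM system_schema.columns "
    "WHERE keyspace_name = %s AND table_name = %s",
    ('your_keyspace', 'mactrain'))
rssi_columns = [r.column_name for r in rows if 'rssi' in r.column_name.lower()]
print(rssi_columns)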
We are trying to store lots of attributes for a particular profile_id inside a table (using CQL3) and cannot wrap our heads around which approach is the best:
a. create table mytable (profile_id, a1 int, a2 int, a3 int, a4 int ... a3000 int) primary key (profile_id);
OR
b. create MANY tables, eg.
create table mytable_a1(profile_id, value int) primary key (profile_id);
create table mytable_a2(profile_id, value int) primary key (profile_id);
...
create table mytable_a3000(profile_id, value int) primary key (profile_id);
OR
c. create table mytable (profile_id, a_all text) primary key (profile_id);
and just store 3000 "columns" inside a_all, like:
insert into mytable (profile_id, a_all) values (1, "a1:1,a2:5,a3:55, .... a3000:5");
OR
d. none of the above
The type of query we would be running on this table:
select * from mytable where profile_id in (1,2,3,4,5423,44)
We tried the first approach and the queries keep timing out and sometimes even kill cassandra nodes.
The answer would be to use a clustering column. A clustering column allows you to create dynamic columns that you could use to hold the attribute name (col name) and its value (col value).
The table would be
create table mytable (
    profile_id text,
    attr_name text,
    attr_value int,
    PRIMARY KEY(profile_id, attr_name)
)
This allows you to do inserts like:
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a1', 3);
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a2', 1031);
.....
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'an', 2);
This would be the optimal solution.
Because you then want to do the following
'The type of query we would be running on this table: select * from mytable where profile_id in (1,2,3,4,5423,44)'
This would require 6 queries under the hood, but Cassandra should be able to do this in no time, especially if you have a multi-node cluster.
Also, if you use the DataStax Java Driver, you can run these requests asynchronously and concurrently on your cluster.
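As an illustration of that idea with the Python driver instead (a sketch, not the Java API; the contact point and keyspace are placeholders, and the ids are strings because profile_id is text in the table above):
# Sketch: fire one query per profile_id concurrently, then collect results.
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(['127.0.0.1']).connect('your_keyspace')
query = session.prepare("SELECT * FROM mytable WHERE profile_id = ?")

futures = [session.execute_async(query, (pid,))
           for pid in ('1', '2', '3', '4', '5423', '44')]
for future in futures:
    for row in future.result():  # blocks until that query completes
        print(row.attr_name, row.attr_value)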
For more on data modelling and the DataStax Java Driver, check out DataStax's free online training. It's worth a look:
http://www.datastax.com/what-we-offer/products-services/training/virtual-training
Hope it helps.