Cassandra valid column names

I'm creating an API which will work on either MongoDB or Cassandra; for that reason I'm using '_id' as a column name.
This should be a valid name according to the docs:
Keyspace, column, and table names created using CQL can only contain alphanumeric and underscore characters. User-defined data type names and field names, user-defined function names, and user-defined aggregate names created using CQL can only contain alphanumeric and underscore characters. If you enter names for these objects using anything other than alphanumeric characters or underscores, Cassandra will issue an invalid syntax message and fail to create the object.
However, when I run this statement:
CREATE TABLE users(_id bigint, entries map<timestamp, text>, PRIMARY KEY(_id));
I get the following error:
Invalid syntax at line 1, char 20
Is it possible to use underscores in column names?

Underscores in columns names? Yes. Column names starting with underscores? No.
From the CREATE TABLE documentation:
Valid table names are strings of alphanumeric characters and underscores, which begin with a letter.

You can create a column name starting with an underscore. Use quotes:
CREATE TABLE users("_id": bigint, entires: map<timestamp, text>, PRIMARY KEY("_id"));
The column name will be _id
Although you can, it does not mean that you should have such a column - you will need to continue using quotes in each query making it cumbersome:
SELECT "_id" FROM users;

Related

How to fix a SQL query in Databricks if the column name has a bracket in it

I have a file with data like this, which I have converted into a Databricks table.
Select * from myTable
Output:
Product[key] Product[name]
123 Mobile
345 television
456 laptop
I want to query my table for laptop data.
I am using the below query:
Select * from myTable where Product[name]='laptop'
I am getting the below error in Databricks:
AnalysisException: cannot resolve 'Product' given input columns:
[spark_catalog.my_db.myTable.Product[key], spark_catalog.my_db.myTable.Product[name]]
When certain characters appear in the column names of a SQL table, you get a parse exception. These characters include brackets, dots (.), hyphens (-), etc. So when such characters appear in column names, we need an escape character so that they are parsed simply as part of the column name.
For SQL in Databricks, this character is the backtick (`). Enclosing your column name in backticks ensures that it is parsed correctly, even when it includes characters like '[]' (as in this case).
Since you converted file data into a Databricks table, you could not see that the real problem was parsing the column name; manually creating a table with the same schema in Databricks surfaces the same error. Once you use backticks in the following way, using the column name is no longer a problem:
create table mytable(`Product[key]` integer, `Product[name]` varchar(20))
select * from mytable where `Product[name]`='laptop'
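If you would rather not backtick the names in every downstream query, you can also alias them once (a sketch; the alias names product_key and product_name are hypothetical):
select `Product[key]` as product_key, `Product[name]` as product_name
from mytable
where `Product[name]`='laptop'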

When and why are Google Cloud Spanner table and column names case-sensitive?

Spanner documentation says:
Table and column names:
Can be between 1-128 characters long. Must start with an uppercase or lowercase letter.
Can contain uppercase and lowercase letters, numbers, and underscores, but not hyphens.
Are case-insensitive. For example, you cannot create tables named mytable and MyTable in the same database, or columns named mycolumn and MyColumn in the same table.
https://cloud.google.com/spanner/docs/data-definition-language#table_statements
Given that, I have no idea what this means:
Table names are usually case insensitive, but may be case sensitive
when querying a database that uses case sensitive table names.
https://cloud.google.com/spanner/docs/lexical#case-sensitivity
In fact it seems that table names are case-sensitive, for example:
Queries fail if we don't match the case shown in the UI.
This seems to be an error in the documentation. Table names are case-insensitive in Cloud Spanner. I'll follow up with the docs team.
Edit: Updated docs https://cloud.google.com/spanner/docs/data-definition-language#naming_conventions
I'll add a couple of examples so we can see the difference.
In the first example the case of the table name does not matter, since there is only one table:
Example 1:
SELECT *
FROM Roster
WHERE LastName = @myparam
returns all rows where LastName is equal to the value of query parameter myparam.
But in Example 2 we compare two tables:
SELECT id, name FROM Table1
EXCEPT DISTINCT
SELECT id, name FROM Table2
This gives you everything in Table1 that is not in Table2.
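To illustrate the case-insensitivity described above, a minimal sketch (assuming the Roster table from the documentation): both of these statements resolve to the same table and column:
SELECT LastName FROM Roster;
SELECT lastname FROM roster;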

How to change the type of a column name in Cassandra?

I read somewhere that Cassandra supports different types for column names, unlike the RDBMSs out there that support only strings.
How do I go about changing the column name of a table in Cassandra? Or how do I create a table FOO with a column named 1985-12-05?
You can use the ALTER command; check https://docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html
But 1985-12-05 won't work, as keyspace, column, and table names created using CQL can only contain alphanumeric and underscore characters.
You can use the map column type:
CREATE TABLE FOO (
  id int,
  mymap map<timestamp, text>,
  PRIMARY KEY (id)
);
This will serve as a dynamic column for you.
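A quick sketch of how the map plays that role (the sample values are hypothetical):
-- Each map key acts like a dynamic column name:
INSERT INTO FOO (id, mymap) VALUES (1, {'1985-12-05': 'some value'});
-- Later keys can be added without altering the schema:
UPDATE FOO SET mymap['1990-01-01'] = 'another value' WHERE id = 1;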

How to delete a row in CQL based on Set<text> content

I have a Cassandra table and one column is defined as Set<text>. I want to delete rows that contain specific elements in that set.
For example, if the table had a column names that contained values like ["Alice","Bob","Eve"],
I want a command to delete all the rows that contain the word Eve.
If name were of type text, then the command would go something like:
delete from keyspace.table where name='Eve';
However, that does not work, since name is not text but Set<text>. What would be an equivalent command here?
You can match set elements with CONTAINS, which requires a secondary index on the name column. Note that DELETE itself only accepts primary key columns in its WHERE clause, so in practice you first select the matching primary keys with CONTAINS and then delete those rows by key.
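A minimal sketch of that two-step approach (ks.users and the id key are hypothetical stand-ins for your keyspace, table, and primary key):
-- Indexing a set column indexes its elements, which enables CONTAINS:
CREATE INDEX IF NOT EXISTS ON ks.users (name);
-- Find the rows whose set contains 'Eve':
SELECT id FROM ks.users WHERE name CONTAINS 'Eve';
-- Delete each returned row by its primary key (42 is a sample id):
DELETE FROM ks.users WHERE id = 42;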

Why do Cassandra/CQL restrict the use of a WHERE clause on a column that is not indexed?

I have a table as follows in Cassandra 2.0.8:
CREATE TABLE emp (
  empid int,
  deptid int,
  first_name text,
  last_name text,
  PRIMARY KEY (empid, deptid)
);
When I try to search with "select * from emp where first_name='John';",
the CQL shell says:
"Bad Request: No indexed columns present in by-columns clause with Equal operator"
I searched for the issue, and everywhere it says to add a secondary index for the column first_name.
But I need to know the exact reason why that column needs to be indexed.
The only thing I can figure out is performance.
Are there any other reasons?
Cassandra does not support searching by an arbitrary column, because that would involve scanning all the rows, and full scans are not supported.
The data is internally organised into something one can compare to HashMap[X, SortedMap[Y, Z]]: the key of the outer map is the partition key value, and the key of the inner map is a concatenation of the clustering column values plus the name of a regular column.
Unless you have an index on a column, you need to provide the full (preferred) or a partial path to the data you want to collect with the query. Therefore, you should design your schema so that queries contain the primary key value and, optionally, some range on the clustering columns.
You may read about what is allowed and what is not here.
Alternatively, you can create a secondary index in Cassandra, but that will hamper your write performance.
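As a concrete sketch of both options (emp_by_first_name is a hypothetical name for the query-driven table):
-- Option 1: a secondary index makes the equality filter legal, at a write-performance cost:
CREATE INDEX ON emp (first_name);
SELECT * FROM emp WHERE first_name = 'John';
-- Option 2 (usually preferred): a second table keyed by the query you need:
CREATE TABLE emp_by_first_name (
  first_name text,
  empid int,
  deptid int,
  last_name text,
  PRIMARY KEY (first_name, empid, deptid)
);
SELECT * FROM emp_by_first_name WHERE first_name = 'John';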
