When and why are Google Cloud Spanner table and column names case-sensitive?

Spanner documentation says:
Table and column names:
Can be between 1-128 characters long. Must start with an uppercase or lowercase letter.
Can contain uppercase and lowercase letters, numbers, and underscores, but not hyphens.
Are case-insensitive. For example, you cannot create tables named mytable and MyTable in the same database or columns named mycolumn and
MyColumn in the same table.
https://cloud.google.com/spanner/docs/data-definition-language#table_statements
Given that, I have no idea what this means:
Table names are usually case insensitive, but may be case sensitive
when querying a database that uses case sensitive table names.
https://cloud.google.com/spanner/docs/lexical#case-sensitivity
In fact, it seems that table names are case-sensitive; for example, queries fail if we don't match the case shown in the UI.

This seems to be an error in the documentation. Table names are case-insensitive in Cloud Spanner. I'll follow up with the docs team.
Edit: Updated docs https://cloud.google.com/spanner/docs/data-definition-language#naming_conventions
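For instance, assuming a database with a table created as Roster (as in the examples below), both of these statements should resolve to the same table if names are indeed case-insensitive, as this answer concludes:
SELECT COUNT(*) FROM Roster;
SELECT COUNT(*) FROM roster;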

I'll add a couple of examples, so we can see the difference.
Table names are case-sensitive; in this first example it does not matter, since there is only one table:
Example 1:
SELECT *
FROM Roster
WHERE LastName = @myparam
returns all rows where LastName is equal to the value of query parameter myparam.
But it does matter in Example 2, where we compare two tables (or write other kinds of queries that reference several tables):
SELECT id, name
FROM Table1
EXCEPT DISTINCT
SELECT id, name
FROM Table2
It will give you everything in Table1 but not in Table2; for instance, if Table1 holds ids 1, 2, 3 and Table2 holds 2, 3, only the row with id 1 is returned.

Related

How to fix the SQL query in databricks if column name has bracket in it

I have a file which has data like this, and I have converted that file into a Databricks table.
Select * from myTable
Output:
Product[key]    Product[name]
123             Mobile
345             television
456             laptop
I want to query my table for the laptop data.
I am using the query below:
Select * from myTable where Product[name]='laptop'
I am getting the below error in Databricks:
AnalysisException: cannot resolve 'Product' given input columns:
[spark_catalog.my_db.myTable.Product[key], spark_catalog.my_db.myTable.Product[name]]
When certain characters appear in the column names of a SQL table, you get a parse exception. These characters include brackets, dots (.), hyphens (-), etc. So, when such characters appear in a column name, we need an escape character so that they are parsed as part of the column name.
For SQL in Databricks, this character is the backtick (`). Enclosing your column name in backticks ensures that it is parsed correctly even when it includes characters like '[]' (as in this case).
Since you converted file data into a Databricks table, the main problem, parsing the column name, was hidden from you; you would hit it directly if you created the table manually with this schema. Once you use backticks in the following way, using the column name is no longer a problem:
create table mytable(`Product[key]` integer, `Product[name]` varchar(20))
select * from mytable where `Product[name]`='laptop'
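If you'd rather not repeat the backticks in every query, one option (a sketch; the view name mytable_clean and the aliases product_key and product_name are made up here) is to project the awkward columns once under plain aliases in a view:
create view mytable_clean as
select `Product[key]` as product_key, `Product[name]` as product_name
from mytable;
select * from mytable_clean where product_name = 'laptop';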

GCP Data Catalog - search columns containing a dot in the column name

Take the public GitHub dataset as an example:
SELECT *
FROM `bigquery-public-data.github_repos.commits`
LIMIT 2
There are column names like
difference.old_mode
Searching with:
column:difference.old_mode
shows no results.
So, in this case the period isn't actually part of the column name; it's an indication that you're dealing with a complex type (there's a record/struct column named difference, and within that exists a column named old_mode).
Per the search reference, there's no special syntax documented for complex schemas.
A suggestion might be to leverage a logical AND operator like column:(difference,old_mode). It's not as precise as specifying the column relationship, but it should return the results you're interested in receiving.
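Side by side (exact matches depend on what is indexed in your project):
column:difference.old_mode      (no results for the nested field)
column:(difference,old_mode)    (matches entries whose column names contain both difference and old_mode)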

How to split name data in the same column in PostgreSQL?

I am new to PostgreSQL and am using pgAdmin 4 on a Mac. I have one column of imported data that holds some usernames, sometimes a last name, and mostly a first and last name in the same column.
What I mainly want is to query and count the most frequent occurrences of a name in the column. I will be able to tell from the results whether each one is a first or last name. Listing the top 50 should do it. Please assist with the specific code, including how to address the table and column.
Have played with this, but need more:
select surname, count(*) from atreedata
group by surname
order by count(*) desc limit 40;
Works great with only one name! I need the most common names listed by name and count.
Common Column Example:
John Smith
jsmith3
Stacey123
Bob Smith
Jones
So, if I understand it correctly, you just need to find the most frequent words in the surname column.
There's a built-in function regexp_split_to_table that splits a string into words and creates a row for each word. So:
select surname_word, count(*) as surname_word_count
from (
    select regexp_split_to_table(surname, E'\\s+') as surname_word
    from atreedata
) as surname_words
group by surname_word
order by surname_word_count desc
limit 40;
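Against the five sample values above, this returns Smith with a count of 2 and everything else with a count of 1 (the relative order of the ties is not guaranteed):
 surname_word | surname_word_count
--------------+--------------------
 Smith        |                  2
 John         |                  1
 Bob          |                  1
 jsmith3      |                  1
 Stacey123    |                  1
 Jones        |                  1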

Cassandra valid column names

I'm creating an API which will work on either Mongo or Cassandra; for that reason I'm using '_id' as a column name.
This should be a valid name according to the docs:
Keyspace, column, and table names created using CQL can only contain alphanumeric and underscore characters. User-defined data type names and field names, user-defined function names, and user-defined aggregate names created using CQL can only contain alphanumeric and underscore characters. If you enter names for these objects using anything other than alphanumeric characters or underscores, Cassandra will issue an invalid syntax message and fail to create the object.
However, when I run this statement:
CREATE TABLE users(_id bigint, entires map<timestamp, text>, PRIMARY KEY(_id));
I get the following error:
Invalid syntax at line 1, char 20
Is it possible to use underscores in column names?
Underscores in column names? Yes. Column names starting with underscores? No.
From the CREATE TABLE documentation:
Valid table names are strings of alphanumeric characters and underscores, which begin with a letter.
You can create a column name starting with an underscore. Use quotes:
CREATE TABLE users("_id": bigint, entires: map<timestamp, text>, PRIMARY KEY("_id"));
The column name will be _id
Although you can, it does not mean that you should have such a column: you will need to keep using quotes in every query, which makes it cumbersome:
SELECT "_id" FROM users;

Cassandra: How to check if a column value is already present

I am using Cassandra. There is a column name (of type text) which stores usernames:
name
------
bob
Bob
bobby
mike
michael
micky
BOB
I have 2 questions:
I have to select all user names that start with 'bo'. I know there is no LIKE equivalent in Cassandra, but is there any way to achieve that? (An additional column is an option, but is there something else?)
There are 3 entries: bob, Bob and BOB. Is there any way to fetch all 3 rows if I pass WHERE name='bob'? I need to fetch the names case-insensitively.
Thanks in advance.
Let's take the second question first. If you want to support case-insensitive queries, you should store a second, upper-cased copy of the text data you want to search in another column. Then, by querying that column, you'll be able to do case-insensitive requests.
Going back to searches for bo*. The best way to do that is to use a schema that allows you to leverage clustering columns (columns 2 and higher of the primary key) for range searches. Here is an example:
CREATE TABLE t1 (region INT, name TEXT, PRIMARY KEY (region, name));
In particular, if you make name the second column of the key, you will be able to perform searches such as
SELECT * FROM t1 WHERE name >= 'bo' and name < 'bp' ALLOW FILTERING;
which will return results you're looking for. This only works for trailing wildcards: the leading characters have to be constant for range queries to work. Again, if you want to do case-insensitive searches, have the case-insensitive column be the second part of the primary key and query by it:
SELECT * FROM t1 WHERE name_upper >= 'BO' and name_upper < 'BP' ALLOW FILTERING;
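A minimal sketch of that layout (the table name t1_by_upper and the application-maintained name_upper column are assumptions, not part of the original schema):
CREATE TABLE t1_by_upper (
    region INT,
    name_upper TEXT,   -- upper-cased copy of name, written by the application
    name TEXT,
    PRIMARY KEY (region, name_upper)
);
-- Case-insensitive prefix search within one partition; no ALLOW FILTERING is
-- needed here because name_upper is the clustering column:
SELECT * FROM t1_by_upper WHERE region = 1 AND name_upper >= 'BO' AND name_upper < 'BP';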
