Is there a way to match every case variant of a string when doing this:
select count(word) from table where word="abcd"
Actually, doing that does not give the same result as this:
select count(word) from table where word="ABCD"
Ignoring case in a WHERE clause is very simple. You can, for example, convert both sides of the comparison to upper case:
SELECT COUNT(word)
FROM table
WHERE UPPER(word)=UPPER('ABCD')
Regardless of the capitalization used for the search term, the UPPER function makes the two sides match as desired.
select count(word) from table where lower(word)="abcd"
However, this assumes the table is not partitioned. If it is partitioned by word, wrapping the column in lower() forces a full table scan.
SELECT count(word)
FROM table
WHERE word RLIKE "(?i)WOrd1|wOrd2"
Spanner documentation says:
Table and column names:
Can be between 1-128 characters long. Must start with an uppercase or lowercase letter.
Can contain uppercase and lowercase letters, numbers, and underscores, but not hyphens.
Are case-insensitive. For example, you cannot create tables named mytable and MyTable in the same database, or columns named mycolumn and MyColumn in the same table.
https://cloud.google.com/spanner/docs/data-definition-language#table_statements
Given that, I have no idea what this means:
Table names are usually case insensitive, but may be case sensitive
when querying a database that uses case sensitive table names.
https://cloud.google.com/spanner/docs/lexical#case-sensitivity
In fact, it seems that table names are case-sensitive: queries fail if we don't match the case shown in the UI.
This seems to be an error in the documentation. Table names are case insensitive in Cloud Spanner. I'll follow up with the docs team.
Edit: Updated docs https://cloud.google.com/spanner/docs/data-definition-language#naming_conventions
I'll add a couple of examples so we can see the difference.
Table names are case sensitive. In this example it does not matter, since there is only one table:
Example 1:
SELECT *
FROM Roster
WHERE LastName = @myparam
returns all rows where LastName is equal to the value of query parameter myparam.
But in Example 2 we compare two tables (or make other kinds of queries involving several tables):
SELECT id, name FROM Table1
EXCEPT DISTINCT
SELECT id, name FROM Table2
It will give you everything in Table1 but not in Table2.
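A small worked example with inline literal rows (GoogleSQL syntax; the values are made up for illustration):
-- First set: (1, 'a'), (2, 'b'); second set: (2, 'b')
SELECT id, name FROM (SELECT 1 AS id, 'a' AS name UNION ALL SELECT 2, 'b')
EXCEPT DISTINCT
SELECT id, name FROM (SELECT 2 AS id, 'b' AS name)
-- Returns only (1, 'a'): the row present in the first set but not the second.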
By default, Presto performs a case-sensitive GROUP BY, but I want to do a case-insensitive one. One method is to convert everything in the column to lower case and then group by it, i.e.
select * from (select lower(name_of_the_column) as name_of_the_column, other_columns from table)
where conditions
group by name_of_the_column
One way we can reduce the time is by putting the conditions into the inner SELECT inside the brackets. Is there any better method?
You don't need to push lower(...) into a subquery. If you simply write:
SELECT lower(name_of_the_column), ...
FROM ...
GROUP BY lower(name_of_the_column) -- or just "GROUP BY 1"
Presto will do the conversion to lowercase only once for each row (not twice).
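Putting that together, a minimal sketch (the table, column, and filter names are hypothetical):
SELECT lower(name_of_the_column) AS name_lower,
       count(*) AS cnt
FROM some_table
WHERE some_condition = true
GROUP BY 1  -- i.e. GROUP BY lower(name_of_the_column)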
When I used MySQL I was able to query the database with a statement like SELECT * FROM table WHERE col LIKE "%attribute%";
Is there a way I can do that in Cassandra?
Cassandra CQL doesn't have a LIKE operator. It has limited filtering capabilities so you are restricted to equals, range queries on some numeric fields, and the IN operator which is similar to equals.
The most common approach to doing searches of Cassandra data seems to be pairing Cassandra with Apache Solr. Or you can pair it with Apache Spark which has more filtering capabilities than CQL.
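For reference, a rough sketch of the kind of filtering plain CQL does allow, using a hypothetical schema:
CREATE TABLE posts (
    author text,
    created_at timestamp,
    body text,
    PRIMARY KEY (author, created_at)
);
SELECT * FROM posts WHERE author = 'alice';                                -- equality on the partition key
SELECT * FROM posts WHERE author = 'alice' AND created_at > '2015-01-01';  -- range on a clustering column
SELECT * FROM posts WHERE author IN ('alice', 'bob');                      -- IN, similar to equality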
If your col is a collection type (set, list, or map), you can use CONTAINS to perform the search.
Sample:
SELECT id, description FROM products WHERE features CONTAINS '32-inch';
For the map data type:
SELECT id, description FROM products WHERE features CONTAINS KEY 'refresh-rate';
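Note that CONTAINS and CONTAINS KEY generally require a secondary index on the collection; a sketch, assuming features is a map as in the linked post:
CREATE INDEX ON products (features);        -- indexes the map's values, for CONTAINS
CREATE INDEX ON products (KEYS(features));  -- indexes the map's keys, for CONTAINS KEY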
References:
http://www.datastax.com/dev/blog/cql-in-2-1
CQL LIKE statements are now available in Scylla Open Source 3.2 RC1, the release candidate for Scylla, a CQL-compatible database. We'd love feedback before the release. Here are the details:
CQL: LIKE Operation #4477
The new CQL LIKE keyword allows matching any column to a search pattern, using % as a wildcard. Note that LIKE only works with ALLOW FILTERING.
LIKE Syntax support:
'_' matches any single character
'%' matches any substring (including an empty string)
'\' escapes the next pattern character, so it matches verbatim
any other pattern character matches itself
an empty pattern matches empty text fields
For example:
INSERT INTO t (id, name) VALUES (17, 'Mircevski');
SELECT * FROM t WHERE name LIKE 'Mirc%' ALLOW FILTERING;
Source: [RELEASE] Scylla 3.2 RC1
I need to do a wildcard search like this on Apache Cassandra: SELECT * FROM table_name WHERE col_name LIKE 'string%';
I know there is no wildcard support like this in Cassandra, so I have to maintain some indices for this purpose. I have read through these links, which were very helpful:
Cassandra CQL 3 - Prefix Select
is there any trick to do wildcards search on apache cassandra?
I could design a data model that allows wildcards at the end of the string, so that SELECT * FROM table_name WHERE col_name LIKE 'str%'; can be answered by maintaining an index and running a normal range query from 'str' to 'sts' (see the sketch after this question). But I want wildcards at the beginning of the string, like '%str'.
Is there any possible way to do this? Any help will be appreciated.
Thanks in advance.
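For context, a minimal sketch of the prefix-range approach mentioned above, using a hypothetical schema:
-- col_name is a clustering column, so it can be range-scanned within a partition.
CREATE TABLE words_by_bucket (
    bucket int,
    col_name text,
    id uuid,
    PRIMARY KEY (bucket, col_name)
);
-- "col_name LIKE 'str%'" becomes a range over the clustering column:
SELECT * FROM words_by_bucket WHERE bucket = 0 AND col_name >= 'str' AND col_name < 'sts';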
Cassandra is not the right tool for searches beyond the primary keys. You would be better off looking at something like Solr for this type of job.
Of course, you can create a table with all triplet combinations of letters and numbers as the partition key, and then reference the partition or primary keys of the table that contains the data.
For example, you will have rows with the partition keys "aaa", "aab", "aac", ..., "ZZZ", etc., and the rest of the columns will tell you the primary-key values of table_name where this triplet occurs in col_name. You will then have to update this table every time you modify the data in col_name of table_name.
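A rough sketch of such a triplet lookup table (all names here are hypothetical):
CREATE TABLE trigram_index (
    trigram text,      -- 'aaa', 'aab', ..., every 3-character window of col_name
    table_key uuid,    -- primary key of the row in table_name containing this triplet
    PRIMARY KEY (trigram, table_key)
);
-- Rows whose col_name contains 'str' anywhere, including the '%str' case:
SELECT table_key FROM trigram_index WHERE trigram = 'str';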
But I don't think it will be a very efficient use of Cassandra.
I want to use case insensitivity in several tables that came from another DBMS, where the fields and indexes can be case insensitive.
This means we can search for the needed row with the key in any letter case (DAta, Data, data, etc.) and find it by any of these spellings.
I tried to use the UPPER function with an index and to use that in a primary key, to preserve the program logic.
But I failed: I didn't find any valid SQL statement to define it.
Maybe it's an impossible mission?
Or do you know a way to define a primary key with an "upper" index?
Thanks for any info!
If you want to do a case-insensitive search, you should use a case-insensitive collation. If you always want to treat the field's value in a case-insensitive manner, define it at the field level, i.e.
CREATE TABLE T (
Foo VARCHAR(42) CHARACTER SET UTF8 COLLATE UNICODE_CI,
...
)
but you can also specify the collation in the query, like
SELECT * FROM T WHERE Foo = 'bar' COLLATE UNICODE_CI
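As a usage sketch (assuming the table T above, other columns omitted): once Foo is declared with COLLATE UNICODE_CI, comparisons ignore case without any UPPER()/LOWER() calls:
INSERT INTO T (Foo) VALUES ('Data');
SELECT * FROM T WHERE Foo = 'DATA';  -- matches 'Data'
SELECT * FROM T WHERE Foo = 'data';  -- also matches 'Data'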
Read more about the available collations in the Firebird language reference.
IMHO a better way is to use an index by expression:
create index idx_upper on persons computed by (upper(some_name))
These SQL queries:
select * from persons order by upper(some_name);
select * from persons where upper(some_name) starting with 'OBAM';
will use the index idx_upper.