Hive column name change to upper case? - apache-spark

I have a Hive table with a column named abc, and I want to change it to aBc with:
ALTER TABLE mytable CHANGE abc aBc BIGINT
but it didn't work.
I know that Hive is case-insensitive, so I tried running the statement through a Spark session;
that doesn't work either.
Is it possible to change it with a DDL clause?
I know an alternative solution that requires dropping the old table and recreating it with a
CREATE TABLE clause, but the table is too large and I don't want to recreate it.
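One way to see why the ALTER reads as a no-op: the Hive metastore normalizes column identifiers to lower case, so abc and aBc are the same identifier from its point of view. A minimal sketch of that comparison (a hypothetical helper, not actual Hive code):

```python
def hive_normalize(identifier: str) -> str:
    """Hive stores column identifiers lower-cased in the metastore."""
    return identifier.lower()

# From the metastore's perspective, the "rename" changes nothing:
old_name, new_name = "abc", "aBc"
print(hive_normalize(old_name) == hive_normalize(new_name))  # True: same column
```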

Related

SparkSQL cannot find existing rows in a Hive table that Trino(Presto) does

We are using Trino (Presto) and SparkSQL to query Hive tables on S3, but they give different results for the same query on the same tables. We found the main problem: there are rows in a problematic Hive table that can be found with a simple WHERE filter on a specific column in Trino, but cannot be found with SparkSQL. The SQL statements are identical in both.
On the other hand, SparkSQL can find these rows in the source table of the problematic table when filtering on the same column.
The CREATE statement:
CREATE TABLE problematic_hive_table AS SELECT c1,c2,c3 FROM source_table
The SELECT that finds the missing rows in Trino but not in SparkSQL:
SELECT * FROM problematic_hive_table WHERE c1='missing_rows_value_in_column'
And this is the SELECT that does find these rows in SparkSQL, against the source table:
SELECT * FROM source_table WHERE c1='missing_rows_value_in_column'
We execute the CTAS in Trino (Presto). If we use ...WHERE trim(c1) = 'missing_key' then Spark can also find the missing rows, but the fields do not appear to contain trailing spaces (the lengths of these fields are the same in the source table as in the problematic table). In the source table, Spark can find these missing rows without trim.
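The symptom (exact equality fails, trimmed equality succeeds) is consistent with values carrying invisible trailing characters, e.g. a carriage return or padding, that one engine materializes and the other does not. A small engine-independent illustration, with a hypothetical stray character:

```python
stored = "missing_key\r"   # hypothetical: value carries an invisible trailing char
wanted = "missing_key"

print(stored == wanted)           # False: exact match fails
print(stored.strip() == wanted)   # True: trimming removes the invisible char
```

Comparing the raw bytes (or hex) of the column values in both engines is a quick way to confirm or rule this out.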

How to add new column in to partition by clause in Hive External Table

I have an external Hive table which is filled by a Spark job and partitioned by (event_date date). I have now modified the Spark code and added one extra column, 'country'. In earlier written data the country column will have null values, as it is newly added. Now I want to alter the 'partitioned by' clause to partition by (event_date date, country string). How can I achieve this? Thank you!!
Please try to alter the partition using the command below:
ALTER TABLE table_name PARTITION part_spec SET LOCATION path
part_spec:
: (part_col_name1=val1, part_col_name2=val2, ...)
See the Databricks Spark SQL language manual for the ALTER TABLE command.

drop column in a table/view using spark sql only

I have 30 columns in a table, table_old.
I want to use 29 of the columns in that table, excluding one; that column is dynamic.
I am using string interpolation.
The Spark SQL query I am using is below:
drop_column=now_current_column
var table_new=spark.sql(s"""alter table table_old drop $drop_column""")
but it's throwing an error:
mismatched input expecting 'partition'
I don't want to drop the column using a DataFrame. My requirement is to drop the column from the table using Spark SQL only.
As mentioned in the previous answer, DROP COLUMN is not yet supported by Spark.
However, there is a workaround to achieve the same result without much overhead. This trick works for both EXTERNAL and in-memory tables. The snippet below works for an EXTERNAL table; you can easily modify it for in-memory tables as well.
import org.apache.spark.sql.types.StructType

val dropColumnName = "column_name"
val tableIdentifier = "table_name"
val tablePath = "table_path"
val newSchema = StructType(spark.read.table(tableIdentifier).schema.filter(col => col.name != dropColumnName))
spark.sql(s"drop table ${tableIdentifier}")
spark.catalog.createTable(tableIdentifier, "orc", newSchema, Map("path" -> tablePath))
Here orc is the file format; replace it with the format you need. For in-memory tables, remove the tablePath option and you are good to go. Hope this helps.
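The core of the workaround is language-agnostic: read the existing schema, filter out the target column, drop the table, and recreate it over the same path. The filtering step can be sketched in plain Python over (name, type) pairs (a sketch of the logic, not a Spark API):

```python
def drop_column_from_schema(schema, column_name):
    """Return the schema (a list of (name, type) pairs) without the given column."""
    return [(name, typ) for name, typ in schema if name != column_name]

schema = [("c1", "bigint"), ("c2", "string"), ("c3", "double")]
print(drop_column_from_schema(schema, "c2"))  # [('c1', 'bigint'), ('c3', 'double')]
```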
DROP COLUMN (and, in general, the majority of ALTER TABLE commands) is not supported in Spark SQL.
If you want to drop a column you should create a new table:
CREATE TABLE tmp_table AS
SELECT ...  -- all columns except the dropped one
FROM table_old
and then drop the old table or view, and reclaim the name.
Dropping columns is now supported by Spark if you're using v2 tables; see
https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html
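Note also that the interpolated statement in the question omits the COLUMN keyword; in older Spark versions ALTER TABLE ... DROP only accepts PARTITION, which is what produces the mismatched input expecting 'partition' error. With v2 tables the statement needs the keyword spelled out. A sketch of building it (table and column names are placeholders):

```python
def build_drop_column_sql(table: str, column: str) -> str:
    """Spark's v2 DDL syntax is: ALTER TABLE t DROP COLUMN c."""
    return f"ALTER TABLE {table} DROP COLUMN {column}"

print(build_drop_column_sql("table_old", "now_current_column"))
# ALTER TABLE table_old DROP COLUMN now_current_column
```

The resulting string would then be passed to spark.sql(...) as in the question.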

Cassandra Altering the table

I have a table in Cassandra, say employee(id, email, role, name, password), with only id as the primary key.
I want to ...
1. Add another column (manager_id) with a default value in it
I know that I can add a column to the table, but there is no way I can provide a default value for that column through CQL. I also cannot update the value for manager_id later, since I would need to know the id (the partition key; the values are randomly generated unique values which I don't know) to update a row. Is there any way I can achieve this?
2. Rename this table to all_employee.
I also know that it's not allowed to rename a table in Cassandra, so I am copying the data from the table (employee) to CSV, copying from the CSV into a new table (all_employee), and deleting the old table (employee). I am doing this through an automated script with CQL queries in it. The script works fine, but it will fail if it gets executed again (which I cannot restrict), since the employee table will no longer be there once it's deleted. Essentially I am looking for an "IF EXISTS" clause for the COPY query, which is not supported in CQL. Is there any other way I can achieve this outcome?
Please note that the amount of data in the table is very small, so performance is not an issue.
For #1
I don't think Cassandra supports default column values. You need to handle that in your application: write the default value every time you insert a row.
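The application-side default can be as simple as substituting the value before building the INSERT. A hedged sketch (the helper and the default value are assumptions, not Cassandra API):

```python
DEFAULT_MANAGER_ID = "unassigned"  # assumed application-level default

def with_default_manager(row: dict) -> dict:
    """Fill manager_id with the application-level default when it is missing."""
    out = dict(row)
    if out.get("manager_id") is None:
        out["manager_id"] = DEFAULT_MANAGER_ID
    return out

print(with_default_manager({"id": 1, "name": "Ada"}))
# {'id': 1, 'name': 'Ada', 'manager_id': 'unassigned'}
```

Every insert path in the application would route rows through this helper before issuing the CQL INSERT.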
For #2
you can check whether the table exists before trying to copy from it:
SELECT table_name FROM system_schema.tables WHERE keyspace_name='your_keyspace_name' AND table_name='your_table_name';
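In the script, that existence check can gate the COPY step so a re-run becomes a no-op. A sketch of the control flow (the query result is modeled as a plain list of rows; driver/session details are assumptions):

```python
# Hypothetical keyspace/table names; the real script would substitute its own.
EXISTS_CQL = (
    "SELECT table_name FROM system_schema.tables "
    "WHERE keyspace_name='my_keyspace' AND table_name='employee';"
)

def should_run_copy(query_rows) -> bool:
    """Only export/COPY if the source table still exists (non-empty result)."""
    return len(query_rows) > 0

print(should_run_copy([("employee",)]))  # True: table exists, run COPY
print(should_run_copy([]))               # False: already migrated, skip COPY
```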

Cassandra: Adding new column to the table

Hi, I just added a new column business_sys to my table my_table:
ALTER TABLE my_table ALTER business_sys TYPE set<text>;
But then I dropped this column because I wanted to change its type:
ALTER TABLE my_table DROP business_sys;
When I tried to add the same column name back with a different type, I got the error message:
"Cannot add a collection with the name business_sys because a collection with the same name and a different type has already been used in the past"
This is the command I executed to add the column with the different type:
ALTER TABLE my_table ADD business_sys list<text>;
What did I do wrong? I am pretty new to Cassandra. Any suggestions?
You're running into CASSANDRA-6276. The problem is that when you drop a column in Cassandra, the data in that column doesn't just disappear, and Cassandra may attempt to read that data with its new comparator type.
From the linked JIRA ticket:
Unfortunately, we can't allow dropping a component from the comparator, including dropping individual collection columns from ColumnToCollectionType.
If we do allow that, and have pre-existing data of that type, C* simply wouldn't know how to compare those...
...even if we did, and allowed [users] to create a different collection with the same name, we'd hit a different issue: the new collection's comparator would be used to compare potentially incompatible types.
The JIRA suggests that this may not be an issue in Cassandra 3.x, but I just tried it in 3.0.3 and it fails with the same error.
What did I do wrong? I am pretty new to Cassandra. Any suggestions?
Unfortunately, the only way around this one is to use a different name for your new list.
EDIT: I've tried this out in Cassandra and ended up with inconsistent, missing data. The best way to proceed is to change the column name, as suggested in CASSANDRA-6276. And always follow documentation guidelines :)
-WARNING-
According to this comment from CASSANDRA-6276, running the following workaround is unsafe.
Elaborating on #masum's comment - it's possible to work around the limitation by first recreating the column with a non-collection type such as an int. Afterwards, you can drop and recreate again using the new collection type.
From your example, assuming we have a business_sys set:
ALTER TABLE my_table ADD business_sys set<text>;
ALTER TABLE my_table DROP business_sys;
Now re-add the column as int and drop it again:
ALTER TABLE my_table ADD business_sys int;
ALTER TABLE my_table DROP business_sys;
Finally, you can re-create the column with the same name but different collection type:
ALTER TABLE my_table ADD business_sys list<text>;
Cassandra doesn't allow you to recreate a dropped collection column under the same name with a different type, but there is a workaround.
Once you have dropped the column with the SET type, you can first recreate it with a non-collection "default" type such as varchar or int.
After recreating it with one of those types, you can drop the column once again and finally recreate it with the proper type.
This is illustrated below:
ALTER TABLE my_table DROP business_sys;           -- the drop you've already done
ALTER TABLE my_table ADD business_sys varchar;    -- recreate with a non-collection type
ALTER TABLE my_table DROP business_sys;           -- drop again
ALTER TABLE my_table ADD business_sys list<text>; -- recreate with the proper type
