Brisk Cassandra TimeUUIDType

I am using Brisk. Cassandra column families automatically map to Hive tables.
However, if a column's data type is TimeUUIDType in the column family, it is unreadable in the Hive table.
For example, I used the following command to create an external table in Hive to map the column family:
Hive > create external table A (rowkey string, column_name string, value string)
> STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> WITH SERDEPROPERTIES (
> "cassandra.columns.mapping" = ":key,:column,:value");
If a column name is of type TimeUUIDType in Cassandra, it becomes unreadable in the Hive table.
For example, a row in the Cassandra column family looks like:
RowKey: 2d36a254bb04272b120aaf79d70a3578
=> (column=29139210-b6dc-11df-8c64-f315e3a329d6, value={"event_id":101}, timestamp=1283464254261)
where the column name is a TimeUUIDType.
In the Hive table, the same row looks like this:
2d36a254bb04272b120aaf79d70a3578 t��ߒ4��!�� {"event_id":101}
So the column name is unreadable in the Hive table.

This is a known issue with the automatic table mapping. For best results with TimeUUIDType, turn the auto-mapping feature off via the "cassandra.autoCreateHiveSchema" property in $brisk_home/resources/hive/hive-site.xml, and create the table in Hive manually.
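For illustration, a manual definition might look like the sketch below; it reuses the DDL from the question, with the column name mapped as binary instead of string so Hive keeps the raw 16-byte TimeUUID instead of mangling it. Whether the Brisk storage handler accepts binary here is an assumption, so verify against your version:
hive> create external table A (rowkey string, column_name binary, value string)
    > STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
    > WITH SERDEPROPERTIES (
    > "cassandra.columns.mapping" = ":key,:column,:value");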

Related

create table in hive

I am trying to create a Hive table with this syntax:
create table table_name as orc as select * from table1 partitioned by (Acc_date date)
I am getting an error. My requirement is to create the table using a select statement and then append to it when the next load happens.
I am trying to replicate this spark command:
df1.distinct().repartition("acc_date").write.mode("append").partitionBy("acc_date").format("parquet").saveAsTable("schema.table_name")
Make it a two-step process:
1. Create the partitioned table as you want.
2. Insert data into it.
Details
1. The create statement may be like this:
create table if not exists table_name
(col1 int, col2 ...)
partitioned by (acc_date date)
stored as orc;
2. The insert will be like below. Make sure the partition column is the last column in the select clause.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table_name partition (acc_date)
select col1, col2, ..., acc_date from table1;
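To mirror the Spark command from the question more closely (df1.distinct() before the append), the insert can also deduplicate. This sketch assumes col1 and col2 stand in for table1's actual columns:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table_name partition (acc_date)
select distinct col1, col2, acc_date from table1;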

Databricks table metadata through JDBC driver

The Spark JDBC driver (SparkJDBC42.jar) is unable to capture certain information from the below table structure:
- the table level comment
- the TBLPROPERTIES key-value pair information
- the PARTITIONED BY information
However, it captures the column level comments (e.g. the comment against the employee_number column), all columns of the employee table, and their technical data types.
Please advise if I need to configure any additional properties to be able to read/extract the information that the driver currently cannot.
create table default.employee(
employee_number INT COMMENT 'Unique identifier for an employee',
employee_name VARCHAR(50),
employee_age INT)
PARTITIONED BY (employee_age)
COMMENT 'this is a table level comment'
TBLPROPERTIES ('created.by.user' = 'Noor', 'created.date' = '10-08-2021');
You should be able to execute:
describe table extended default.employee
via the JDBC interface as well. It returns a result set with 3 columns that you can parse into column level and table level properties - this shouldn't be very complex, as there are explicit delimiters between the row-level and the table-level data.
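For orientation, here is a hedged sketch of what that three-column output might look like for the employee table; the exact rows and labels vary by Spark/Databricks version, so treat it as illustrative, not verbatim:
employee_number     int            Unique identifier for an employee
employee_name       varchar(50)
employee_age        int
# Partition Information
# col_name          data_type      comment
employee_age        int
# Detailed Table Information
Database            default
Table               employee
Comment             this is a table level comment
Table Properties    [created.by.user=Noor, created.date=10-08-2021]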
You can also execute:
show create table default.employee
which will give you a result set with one column, containing the SQL statement that you can parse.

How to add a new column to the partition by clause in a Hive external table

I have an external Hive table which is filled by a Spark job and partitioned by (event_date date). Now I have modified the Spark code and added one extra column, 'country'. In the earlier written data the country column will have null values, as it is newly added. Now I want to alter the 'partitioned by' clause to partitioned by (event_date date, country string). How can I achieve this? Thank you!!
Please try to alter the partition using the below command:
ALTER TABLE table_name PARTITION part_spec SET LOCATION path
part_spec:
: (part_col_name1=val1, part_col_name2=val2, ...)
See the Databricks Spark SQL language manual for the ALTER TABLE command.
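For illustration, a concrete invocation of that command might look like the following; the table name, partition value, and path are made-up examples:
ALTER TABLE table_name PARTITION (event_date='2021-08-10')
SET LOCATION 'hdfs:/data/table_name/event_date=2021-08-10';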

Fetch distinct field values from frozen set column in Cassandra columnfamily

Hi, please help me write a CQL query for the below requirement.
- Column family contains the columns: deptid (datatype: uuid), emplList (datatype: set<frozen<employee>>)
How would I get all distinct employee names from the employee objects stored in the set as the column value for emplList?
Such queries can't be expressed in pure CQL - Cassandra is optimized to read data by primary key, and aggregation operations are very limited. You have 2 choices:
- Read all data from the table in your program, and extract the distinct values yourself.
- Use Spark with the Spark Cassandra Connector - it will also read all the data from the table, but you'll have a higher level abstraction to work with the data, and it can perform a more optimized scan of your table.
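As a sketch of the second option: the Spark Cassandra Connector surfaces a set<frozen<employee>> column in Spark SQL as an array of structs, so a query along these lines could work. The column and field names (emplList, name) and the table mapping are assumptions based on the question:
SELECT DISTINCT empl.name
FROM keyspace_name.table_name
LATERAL VIEW explode(emplList) t AS empl;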

Invalid Column name Error in DSE Analytics Spark

I have one table whose structure is roughly as follows:
CREATE TABLE keyspace_name.table_name (
id text PRIMARY KEY,
type text,
bool_yn boolean,
created_ts timestamp,
modified_ts timestamp
)
Recently I added a new column to the table:
alter table keyspace_name.table_name add first_name text;
When I query the new column from the table in cqlsh, it gives me the result. For example:
select first_name from keyspace_name.table_name limit 10;
But if I try to perform the same query in dse spark-sql, it gives me the following error:
Error in query: cannot resolve 'first_name' given input columns: [id, type, bool_yn, created_ts, modified_ts];
I don't know what's wrong in spark-sql. I've tried nodetool repair but the problem still persists.
Any help would be appreciated. Thanks!
If the table schema changes, the Spark metastore doesn't automatically refresh the schema, so manually remove the old table from Spark SQL with a DROP TABLE command, then run SHOW TABLES. The new table with the latest schema will be created automatically. This does not change the data in Cassandra.
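A minimal sketch of that sequence in the spark-sql shell, using the names from the question; the final select just verifies that the refreshed schema now includes the new column:
DROP TABLE keyspace_name.table_name;  -- removes only the Spark metastore entry
SHOW TABLES;                          -- the table is re-created with the latest schema
SELECT first_name FROM keyspace_name.table_name LIMIT 10;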
