How to read hive managed table data using spark? - apache-spark

I am able to read a Hive external table using spark-shell, but when I try to read data from a Hive managed table it only shows the column names.
Please find my queries here:

Could you please try using the database name along with the table name?
sql("select * from db_name.test_managed")
If the result is still the same, please share the output of describe formatted for both tables.
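For reference, a minimal PySpark sketch of the same check, assuming a SparkSession built with Hive support; test_external is a placeholder for the external table's actual name:

# Build a Hive-enabled session; the names follow the example above.
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Query the managed table with the database-qualified name.
spark.sql("SELECT * FROM db_name.test_managed").show()

# Compare the metadata of both tables (Location, Provider, Table Type, SerDe).
spark.sql("DESCRIBE FORMATTED db_name.test_managed").show(100, truncate=False)
# 'test_external' is a placeholder for your external table's name.
spark.sql("DESCRIBE FORMATTED db_name.test_external").show(100, truncate=False)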

Related

Write data frame to hive table in spark

Could you please tell me if this command could cause problems by overwriting all tables in the DB:
df.write.option("path", "path_to_the_db/hive/").mode("overwrite").saveAsTable("result_data")
result_data is a new table in the DB; it did not exist before.
After these commands, all tables disappeared.
I was using Spark 3 and trying to resolve this error:
Can not create the managed table('result_data').
The associated location('dbfs:/user/hive/warehouse/result_data') already exists.
I expected that a new table would be created without any issues if it didn't exist.
If path_to_the_db/hive contains other tables and you overwrite into that folder, it seems possible that the whole directory would be emptied first, yes. Perhaps you should instead use path_to_the_db/hive/result_data.
According to the error, though, your table does already exist.
Alternatively, you can use Spark to register a temporary view and then run an INSERT OVERWRITE query against an existing table.
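A minimal sketch of both suggestions, assuming the names from the question; existing_table is a placeholder for a table that is already defined:

# Option 1: give the new table its own directory instead of the database root.
df.write \
    .option("path", "path_to_the_db/hive/result_data") \
    .mode("overwrite") \
    .saveAsTable("result_data")

# Option 2: register a temporary view, then overwrite an existing table in SQL.
# 'existing_table' is a placeholder for a table that already exists.
df.createOrReplaceTempView("result_data_staging")
spark.sql("INSERT OVERWRITE TABLE existing_table SELECT * FROM result_data_staging")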

Adding custom metadata to DataFrame schema using iceberg table format

I'm adding custom metadata to the DataFrame schema in my PySpark application using StructField's metadata field.
It worked fine when I wrote Parquet files directly to S3: the custom metadata was available when reading those Parquet files back, as expected.
But it does not work with the Iceberg table format. There is no error, but the metadata on df.schema.fields is always empty.
Is there a way to solve this?
Solved by making sure the key is always 'comment'. For example:
{'comment': 'my_metadata_info_field'}
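A minimal PySpark sketch of that fix, assuming an Iceberg catalog is already configured; my_catalog.db.my_table is a placeholder table name:

from pyspark.sql.types import StructType, StructField, StringType

# Put the custom metadata under the 'comment' key so it survives the Iceberg round trip.
schema = StructType([
    StructField("name", StringType(), True,
                metadata={"comment": "my_metadata_info_field"}),
])
df = spark.createDataFrame([("example",)], schema=schema)

# 'my_catalog.db.my_table' is a placeholder; use your configured Iceberg table.
df.writeTo("my_catalog.db.my_table").createOrReplace()

# On read, the comment should show up again in the field metadata.
print(spark.table("my_catalog.db.my_table").schema["name"].metadata)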

Presto - can I do alter table if exists?

How can I rename a table, but only if it exists?
Something like: ALTER TABLE mydb.myname IF EXISTS RENAME TO mydb.my_new_name
You can do something like:
ALTER TABLE users RENAME TO people;
or
ALTER TABLE mydb.myname RENAME TO mydb.my_new_name;
Please note that IF EXISTS syntax is not available here. You can find more information here: https://docs.starburstdata.com/latest/sql/alter-table.html
The work for adding it is tracked under: https://github.com/prestosql/presto/issues/2260
Currently you need to handle this in a different layer, such as a Java program that runs SQL queries against Presto over JDBC.
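The same existence check can be done from any client; here is a minimal sketch in Python with the presto-python-client package (host, port, user, and catalog are placeholders), emulating IF EXISTS via information_schema:

import prestodb

# Connection details are placeholders; adjust for your cluster.
conn = prestodb.dbapi.connect(
    host="presto-host", port=8080, user="me",
    catalog="hive", schema="mydb",
)
cur = conn.cursor()

# Emulate ALTER TABLE IF EXISTS: check for the table first.
cur.execute(
    "SELECT count(*) FROM information_schema.tables "
    "WHERE table_schema = 'mydb' AND table_name = 'myname'"
)
if cur.fetchone()[0] > 0:
    cur.execute("ALTER TABLE mydb.myname RENAME TO mydb.my_new_name")
    cur.fetchall()  # drain the cursor so the statement fully executes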

Which Hybris D.B. Table contain product description

I need to get the product description directly from the database, so please suggest which table contains it.
My finding:
The productslp table contains a p_description column, but it is a CLOB datatype and I am unable to get the data from it.
I would suggest you raise a support ticket with SAP Product Support, since it may be a matter of using the correct Oracle JDBC driver.
We use HANA DB and had the same issue; they provided me with an updated driver and that resolved it for me.
Alternatively, you can check the description in Backoffice/hMC if that is an acceptable solution.
You can get it using the SQL tab in HAC with this query:
SELECT ps.p_description, ps.p_summary FROM products AS p JOIN productslp AS ps ON p.pk = ps.itempk WHERE pk = 8796158722049

Schema crawler reading data from table

I understand we can read data from a table using a command in SchemaCrawler.
How can I do that programmatically in Java? I could see examples for reading schemas, tables, etc., but how do I get the data?
Thanks in advance.
SchemaCrawler allows you to obtain database metadata, including result set metadata. Standard JDBC provides a way to get data by using java.sql.ResultSet, and you can use SchemaCrawler to obtain the result set metadata using schemacrawler.utility.SchemaCrawlerUtility.getResultColumns(ResultSet).
Sualeh Fatehi, SchemaCrawler
