I would like to list all empty tables in my Athena database.
I tried:
select table_schema, table_name from information_schema.tables
where table_schema = 'database'
But that only gives me the table names along with the schema name; it does not tell me which tables are empty.
Thanks for your help.
I do not think this is possible within a single query. Your query gives you the list of tables; having that, you could iterate over them from an external tool and check each one for rows.
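For illustration, here is a minimal sketch of that outer loop in Python, assuming the PyAthena client library; the staging bucket, region, and schema name below are placeholders, not values from your setup:
# Sketch: find empty tables in an Athena schema (assumes PyAthena;
# staging dir, region, and schema name are placeholders)
from pyathena import connect

cursor = connect(s3_staging_dir='s3://my-athena-results/',
                 region_name='us-east-1').cursor()

# 1) list all tables in the schema
cursor.execute("select table_name from information_schema.tables "
               "where table_schema = 'database'")
tables = [row[0] for row in cursor.fetchall()]

# 2) a table is empty if it has no first row
empty_tables = []
for table in tables:
    cursor.execute('select 1 from "database"."{}" limit 1'.format(table))
    if cursor.fetchone() is None:
        empty_tables.append(table)

print(empty_tables)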
I have a list of five table names. I need to delete all the other tables in Databricks that are not in this list. I don't know what command or method to use to solve this.
Please help me with this.
Regards,
Manoranjini Muthuraj
#pyspark code
#list of tables to keep
keep_tables = ['table_1', 'table_2', 'table_3', 'table_4', 'table_5']
#get list of all tables from my_database
df = spark.sql('show tables in my_database')
#loop through the tables; if a table is not in keep_tables, do the operation on it (drop/delete/count etc.)
#**Careful** the code will drop every table not in the keep list
for t in df.collect():
    if t.tableName not in keep_tables:
        #do the table operation (drop/delete/count etc.)
        print('operate on table {}'.format(t.tableName))
        spark.sql('drop table my_database.{}'.format(t.tableName))
Is there any way to search or filter for a particular table in a dataset by table name when calling the List operation? I understand the documentation mentions using labels to filter tables, but in my case this will not suffice, as there is no restriction on the number of tables that can be created under a dataset with or without a label. I am using the Node library for my operations.
The preferred way to search or filter for a particular table (or any other metadata object) is to query INFORMATION_SCHEMA. There are multiple INFORMATION_SCHEMA views that could be used: INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.TABLE_OPTIONS, INFORMATION_SCHEMA.COLUMNS, etc.
More info at https://cloud.google.com/bigquery/docs/information-schema-tables
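As a rough sketch, the filtering query could look like the following; the Python client is used here purely for illustration (the same SQL can be sent through the Node @google-cloud/bigquery client), and the project, dataset, and name pattern are placeholders:
# Sketch: filter tables in a dataset by name via INFORMATION_SCHEMA.TABLES
# (project, dataset, and LIKE pattern are placeholders)
from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT table_name
    FROM `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`
    WHERE table_name LIKE 'events_%'
"""

for row in client.query(sql).result():
    print(row.table_name)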
I use Databricks. I am trying to create a table as below:
target_table_name = 'test_table_1'
spark.sql("""
drop table if exists %s
""" % target_table_name)
spark.sql("""
create table if not exists {0}
USING org.apache.spark.sql.parquet
OPTIONS (
path ("/mnt/sparktables/ds=*/name=xyz/")
)
""".format(target_table_name))
Even though using "*" gives me flexibility for loading different files (pattern matching) and eventually creating a table, I wish to create a table based on two completely different paths (no pattern matching).
path1 = /mnt/sparktables/ds=*/name=xyz/
path2 = /mnt/sparktables/new_path/name=123fo/
Spark uses the Hive metastore to create these permanent tables. They are essentially external tables in Hive.
Generally, what you are trying to do is not possible, because a Hive external table needs a single, unique location at the time of creation.
However, you can still get a Hive table backed by different locations if you incorporate a partitioning strategy in your Hive metastore.
In the Hive metastore you can have partitions that point to different locations.
There is no off-the-shelf way to achieve this, though. First you would need to specify a partition key for your dataset and create the table from the first location, where all of its data belongs to one partition. Then you can alter the table to add a new partition.
Sample:
create external table tableName(<schema>) partitioned by (name string) location '/mnt/sparktables/ds=*/name=xyz/'
Then you can add partitions
alter table tableName add partition(name='123fo') location '/mnt/sparktables/new_path/name=123fo/'
The alternative to this process is to create two DataFrames out of the two locations, combine them, and then saveAsTable, as sketched below.
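A minimal PySpark sketch of that alternative, assuming both paths hold Parquet files with identical schemas (the database and table names are placeholders taken from the question):
# Sketch: read both locations, union them, and persist as one table
# (paths, database and table names are placeholders; schemas must match)
df1 = spark.read.parquet('/mnt/sparktables/ds=*/name=xyz/')
df2 = spark.read.parquet('/mnt/sparktables/new_path/name=123fo/')

combined = df1.unionByName(df2)  # use union() if column order is guaranteed to match

combined.write.mode('overwrite').saveAsTable('my_database.test_table_1')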
I would do something like this:
create or replace view `mytable` as
select * from parquet.`path1`
union all
select * from parquet.`path2`
The view understands how to query from both locations. I assume you will not append/overwrite the table as it would lead to more ambiguity.
You can create DataFrames separately for the two (or more) parquet locations and then union them (assuming they have identical schemas):
df1.union(df2)
I have a set of SQL select queries to execute and need to share the consolidated results in an Excel sheet; I'm using SQLyog to do this.
Every time I execute them, the results come back as multiple result tables. Can I get the results in a single table?
Select * from a.table;
Select * from b.table;
To get the result in a single table you need to use JOINs in your query.
For example, say I have an area_table and a height_table.
To get the result in a consolidated table I would use a JOIN and write the query as:
Select a.*,b.* from area_table a
Join height_table b;
Business Case:
I have a list of key IDs in an excel spreadsheet. I want to use Power Query to join these IDs with a details table in a SQL Server database.
Problem
Currently, using Power Query, I only know how to import the entire table, which has more than 1 million records, and then do a left join against an existing query that targets a local table of IDs.
What I want to do is send that set of IDs in the original query so I'm not pulling back the entire table and then filtering it.
Question
Is there an example of placing an IN clause targeting a local table similar to what is shown below?
= Sql.Database("SQLServer001", "SQLDatabase001",
[Query="SELECT * FROM DTree WHERE ParentID
IN(Excel.CurrentWorkbook(){[Name="tbl_IDs"]}[Content])"])
I would first build a "Connection only" query on the Excel spreadsheet's key IDs.
Then I would start a new query by connecting to the SQL table. In that query I would add a Merge step to apply the key-ID query as an Inner Join (filter).
This will download the 1m rows to apply the filter, but it is surprisingly quick as this is mostly done in memory. It will only write the filtered result to an Excel table.
To improve performance, filter the rows and columns as much as you can before the Merge step.