Creating a view table with intermediate view tables I want to drop - databricks

In a SQL notebook, I have created a bunch of views that merge/join data to ultimately build one combined table. I don't want to keep all of these intermediate views in the database. Is there a way to assign a variable to the intermediate views I created (table1, table2, etc.)? Here is an example with "Combined_table" as the final output:
CREATE OR REPLACE VIEW database.table1 AS SELECT ... FROM ... LEFT JOIN ... ON ...;
CREATE OR REPLACE VIEW database.table2 AS SELECT ... FROM database.table1 ... LEFT JOIN ... ON ...;
CREATE OR REPLACE VIEW database.table3 AS SELECT ... FROM database.table1 ... LEFT JOIN ... ON ...;
CREATE OR REPLACE VIEW database.Combined_table AS SELECT table2.field1, table2.field2, table3.field1 FROM database.table4 LEFT JOIN table2 ON ... LEFT JOIN table3 ON ...;
Hopefully you get the idea. Is there a more efficient way to do this/pass a variable to the intermediate tables?

You can use temporary views.
CREATE TEMPORARY VIEW view_name AS query
TEMPORARY views are session-scoped and are dropped when the session ends, because their definitions are never persisted to the underlying metastore, if any.
Like a normal view, a temporary view is only a metadata object, i.e. it does not materialize any data. So, depending on the complexity of the queries, it might not be the best solution. However, in many cases (like yours, where each interim view is only used once) it is enough and works just fine. Try it out and see.
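To see the session-scoped behavior without a Databricks cluster, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in. This is an assumption: SQLite's CREATE TEMP VIEW is analogous to, not identical with, Databricks' CREATE TEMPORARY VIEW. The object names mirror the question and the data is made up. One caveat worth checking on your side: as far as I know, Spark SQL refuses to create a permanent view that references a temporary view, so the sketch materializes the final result with CREATE TABLE ... AS SELECT rather than keeping Combined_table as a permanent view.

```python
import os
import sqlite3
import tempfile


def build_combined(path):
    """Build combined_table from session-scoped intermediate views.

    SQLite TEMP views stand in for Databricks TEMPORARY views here; the
    tables, columns and rows are hypothetical.
    """
    conn = sqlite3.connect(path)
    try:
        conn.executescript("""
            CREATE TABLE table4 (id INTEGER PRIMARY KEY, label TEXT);
            CREATE TABLE source (id INTEGER, field1 TEXT, field2 TEXT);
            INSERT INTO table4 VALUES (1, 'a'), (2, 'b');
            INSERT INTO source VALUES (1, 's1', 's2');

            -- Intermediate steps: visible only to this connection ("session").
            CREATE TEMP VIEW table1 AS
                SELECT t4.id, s.field1, s.field2
                FROM table4 AS t4 LEFT JOIN source AS s ON t4.id = s.id;
            CREATE TEMP VIEW table2 AS SELECT id, field1, field2 FROM table1;
            CREATE TEMP VIEW table3 AS SELECT id, field1 FROM table1;

            -- Final result is materialized, so it outlives the session.
            CREATE TABLE combined_table AS
                SELECT table2.field1 AS t2_field1,
                       table2.field2 AS t2_field2,
                       table3.field1 AS t3_field1
                FROM table4
                LEFT JOIN table2 ON table4.id = table2.id
                LEFT JOIN table3 ON table4.id = table3.id;
        """)
        conn.commit()
    finally:
        conn.close()  # ending the session drops table1, table2 and table3


def persisted_objects(path):
    """Names of the tables and views stored in the database file itself."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


path = os.path.join(tempfile.mkdtemp(), "demo.db")
build_combined(path)
print(persisted_objects(path))  # ['combined_table', 'source', 'table4']
```

Only the base tables and the materialized result remain in the database; the intermediate views never reach the schema.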

Related

How to force an ADX to update from source with all historical data

Does anyone know a way to force a child table to update from the source table? It would be a one-off command to run when the child table is created; after that, we have an auto-update policy in place.
Googling suggests trying this, but it produces a syntax error:
.update policy childTable with (sourceTable)
Thanks!:)
An update policy is a mechanism that works during ingestion and does not support backfill.
You can consider using a materialized view with the backfill property (if your transformation logic fits within its limitations), or creating the child table from a query using the .set command.
If your source table is huge, you might need to split the backfill into multiple commands.
We had to use this:
.append childTable <| updateFunction
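Splitting a large backfill into several commands, as suggested above, could look like the following Python sketch, which only generates the command text. It assumes (hypothetically) that the output of updateFunction has a datetime column named Timestamp to filter on; adjust the column, table and function names to your schema and run each command with whatever client you already use.

```python
from datetime import datetime, timedelta


def backfill_commands(table, query, start, end, step):
    """Split [start, end) into windows and build one .append command per window.

    `query` is the tabular expression to append (e.g. the update function).
    `Timestamp` is a hypothetical column name used for the filter.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    commands = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        # Half-open windows (>= lo, < hi) so no row lands in two commands.
        commands.append(
            f".append {table} <| {query} "
            f"| where Timestamp >= datetime({lo.isoformat()}) "
            f"and Timestamp < datetime({hi.isoformat()})"
        )
        lo = hi
    return commands


for cmd in backfill_commands("childTable", "updateFunction",
                             datetime(2024, 1, 1), datetime(2024, 1, 4),
                             timedelta(days=1)):
    print(cmd)
```

Rows whose Timestamp is null fall outside every window, so they would need a separate command if they matter.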

Print table name on which query is executed

Looking at the following lines of code:
query = "DROP TABLE IF EXISTS my_table"
cur.execute(query)
conn.commit()
# print(table_name)
I'm running various queries against multiple tables, and each time I want to return the name of the table and the action executed. Is there a way to get some kind of metadata about the action being run from cur.execute or conn.commit?
In the example above, I'd like to print the table name (my_table) and the action (DROP TABLE). However, I want this to be dynamic: if I'm creating a table, I want the name of the newly created table and the action (CREATE TABLE).
Thanks.
Quick and Dirty
tables = ['table_1', 'table_2', 'table_3']
action = 'DROP TABLE'
for table in tables:
    cur.execute(f'{action} IF EXISTS {table}')
    print(f'ACTION: {action}')
    print(f'TABLE: {table}')
conn.commit()
HOWEVER, please never do something like this in anything other than a tiny app that will never leave your computer, and especially not with anything that accepts input from a user.
Pasting table names straight into an f-string opens the door to SQL injection, and bad things will happen.
Dynamically interfacing with databases using OOP is a solved problem, and it's not worth reinventing the wheel. Have you considered using an ORM like SQLAlchemy?
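If you do need something like this without pulling in an ORM, one minimal sketch is to allowlist the actions and validate identifiers before building the SQL, reporting the action and table as you go. It uses sqlite3 only so the example runs anywhere; ALLOWED_ACTIONS, the identifier rule and run_ddl are hypothetical names, and the untyped columns in CREATE TABLE are a SQLite convenience that other databases may reject.

```python
import re
import sqlite3

# Hypothetical allowlist of statements this helper is willing to run.
ALLOWED_ACTIONS = {"CREATE TABLE", "DROP TABLE"}
# Conservative identifier rule (an assumption): letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def run_ddl(cur, action, table, columns=()):
    """Run an allowlisted DDL action on validated identifiers and report it."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    for name in (table, *columns):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if action == "CREATE TABLE":
        if not columns:
            raise ValueError("CREATE TABLE needs at least one column")
        cur.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    else:
        cur.execute(f"DROP TABLE IF EXISTS {table}")
    # The action and table are known up front, so report them directly.
    print(f"ACTION: {action}")
    print(f"TABLE: {table}")
    return action, table


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
run_ddl(cur, "CREATE TABLE", "table_1", ["id", "name"])
run_ddl(cur, "DROP TABLE", "table_1")
conn.commit()
conn.close()
```

The regex does not reject reserved words such as `select`, so those would still fail at execution time rather than being caught up front.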

Cassandra create and load data atomicity

I have a web service that looks for the most recently created table, named
[name_YYYYMMddHHmmss].
I have a persister job that creates and loads a table (insert or bulk).
Is there something that hides a table until it is fully loaded?
First, I created a technical table. It works, but I would need one per keyspace (using cassandraAuth), and I don't like this.
I was thinking about tags (create a table with a tag, then modify or remove the tag once the table is loaded), but they don't seem to exist.
There is also the table comment option.
Any ideas?
A table comment is a good option. We use it to store service information about a table, e.g. for tracking table versions.
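The table-comment approach can be turned into a small protocol: the persister creates the table WITH comment = 'loading', switches the comment to 'ready' once the load finishes, and the web service only picks tables marked ready. The Python sketch below covers just the CQL text and the selection logic, so it runs without a cluster. The marker values and helper names are hypothetical, and it assumes Cassandra 3.0+, where table metadata (including the comment) lives in system_schema.tables.

```python
READY = "ready"      # hypothetical marker written once loading finishes
LOADING = "loading"  # hypothetical marker written at creation time

# Metadata query; %s is the placeholder style of the DataStax Python driver
# (cassandra-driver) for non-prepared statements.
LIST_TABLES_CQL = (
    "SELECT table_name, comment FROM system_schema.tables "
    "WHERE keyspace_name = %s"
)


def mark_ready_cql(keyspace, table):
    """CQL the persister runs after the load completes."""
    return f"ALTER TABLE {keyspace}.{table} WITH comment = '{READY}'"


def latest_ready_table(rows, prefix):
    """Pick the newest <prefix>_YYYYMMddHHmmss table whose comment is READY.

    `rows` are (table_name, comment) pairs as returned by LIST_TABLES_CQL.
    The fixed-width timestamp suffix sorts chronologically as a string.
    """
    best_name, best_ts = None, ""
    for name, comment in rows:
        head, _, ts = name.rpartition("_")
        if head == prefix and len(ts) == 14 and ts.isdigit() and comment == READY:
            if ts > best_ts:
                best_name, best_ts = name, ts
    return best_name


rows = [("name_20240101000000", READY), ("name_20240102000000", LOADING)]
print(latest_ready_table(rows, "name"))  # name_20240101000000
```

The newer table is ignored until its comment flips to ready, so the web service never sees a half-loaded table.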

How to populate the collection of child records?

I have an application that uses EF 5.0. Suppose that I have two tables:
TableA(IDTableA, IDTableB, ...)
TableB(IDTableB, ...)
When I query for elements of TableA, I can do:
List<TableA> lstResult = myContext.TableA.SqlQuery("select * from TableA").ToList();
However, in this case the related TableB data in each TableA entity is not populated (it is null), so I need to run another query:
1. Convert all the IDs in lstResult into a string in the format ID1, ID2, ...; I call this string strIDs.
2. Run the query:
List<TableB> lstResult2 = myContext.TableB.SqlQuery("select * from TableB where IDTableB IN (" + strIDs + ")").ToList();
At this point, EF populates the related TableB data in each TableA entity.
So this forces me to run two queries: one to get the TableA records and another to get the TableB records, so that EF knows the relation between them.
If TableA has many foreign keys, I need many additional queries, so I think this is inefficient.
Lazy loading works in a similar way: it runs these additional queries when I need the information. However, I don't want to use lazy loading, because my context is short-lived: when I return the entities as the result of the query and then try to access the child records, I get an error saying that the entity is outside the context.
Another solution is to use a view, but I would like to know whether it is possible to populate the child records directly.
Also, I would like to use raw SQL, because it is easier for me to build dynamic queries that way.
Thanks.
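The two-query pattern described in the question does not depend on EF. As a language-neutral illustration (Python with sqlite3, not C# and not EF's API), the sketch below loads TableA, collects the distinct IDTableB values and fetches all matching TableB rows in one query, binding the IDs as parameters instead of concatenating strIDs into the SQL. The Name column is hypothetical, and very long ID lists may need to be chunked to stay under the database's limit on bound parameters.

```python
import sqlite3


def load_a_with_b(conn):
    """Return TableA rows with their related TableB value attached, in two queries."""
    rows_a = conn.execute(
        "SELECT IDTableA, IDTableB FROM TableA ORDER BY IDTableA"
    ).fetchall()
    # Distinct foreign-key values, ignoring rows without a parent.
    ids_b = sorted({id_b for _, id_b in rows_a if id_b is not None})
    names_by_id = {}
    if ids_b:
        placeholders = ", ".join("?" for _ in ids_b)  # one "?" per ID
        query = f"SELECT IDTableB, Name FROM TableB WHERE IDTableB IN ({placeholders})"
        names_by_id = dict(conn.execute(query, ids_b).fetchall())
    # Attach children in memory, loosely similar to EF's relationship fix-up.
    return [
        {"IDTableA": id_a, "IDTableB": id_b, "TableB": names_by_id.get(id_b)}
        for id_a, id_b in rows_a
    ]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (IDTableB INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TableA (IDTableA INTEGER PRIMARY KEY, IDTableB INTEGER);
    INSERT INTO TableB VALUES (10, 'ten'), (20, 'twenty');
    INSERT INTO TableA VALUES (1, 10), (2, 20), (3, 10), (4, NULL);
""")
for row in load_a_with_b(conn):
    print(row)
```

Each extra foreign key costs one more batched query of this shape, not one query per parent row.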

CouchDB views - Multiple join... Can it be done?

I have three document types, MainCategory, Category and SubCategory; each has a parentid that refers to the id of its parent document.
I want to set up a view so that I can get a list of the SubCategories that sit under a MainCategory (preferably using just a map function), but I haven't found a way to arrange the view to make this possible.
I have currently set up a view that produces the following output:
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query-string parameters would I use to get, say, the rows whose key starts with ["22945", ... when all I have at query time is the id "11810" (I don't know the id "22945" at query time)?
I hope some of that makes sense.
Thanks
The way you store your categories seems suboptimal for the query you are trying to perform.
MongoDB.org has a page on various strategies for implementing tree structures (they should apply to CouchDB and other document databases as well). You should consider the Array of Ancestors pattern, where you always store the full path to each node. This makes updating or moving categories more difficult, but querying is easy and fast.
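A minimal sketch of the Array of Ancestors idea, written in Python so it runs without a CouchDB server: map_doc emulates a map function that emits one [ancestor, type] key per ancestor, so every SubCategory under a given MainCategory shares a key and can be fetched with a single key query. The document ids, the `ancestors` and `type` fields, and the database, design-document and view names are all hypothetical. In a real design document the map function would be JavaScript emitting the same keys.

```python
import json
from urllib.parse import urlencode

# Hypothetical documents: each stores the ids of all its ancestors, root first.
DOCS = [
    {"_id": "main1", "type": "MainCategory", "ancestors": []},
    {"_id": "cat1", "type": "Category", "ancestors": ["main1"]},
    {"_id": "sub1", "type": "SubCategory", "ancestors": ["main1", "cat1"]},
    {"_id": "sub2", "type": "SubCategory", "ancestors": ["main1", "cat1"]},
    {"_id": "main2", "type": "MainCategory", "ancestors": []},
]


def map_doc(doc):
    """Python stand-in for a CouchDB map function: one row per ancestor."""
    for ancestor in doc.get("ancestors", []):
        yield [ancestor, doc["type"]], doc["_id"]


def query_view(docs, key):
    """Emulate ?key=...: return the emitted values whose key equals `key`."""
    return [value for doc in docs for k, value in map_doc(doc) if k == key]


def view_url(db, ddoc, view, key):
    """Build the view URL; CouchDB expects key parameters to be JSON-encoded."""
    return f"/{db}/_design/{ddoc}/_view/{view}?" + urlencode({"key": json.dumps(key)})


print(query_view(DOCS, ["main1", "SubCategory"]))  # ['sub1', 'sub2']
print(view_url("mydb", "categories", "by_ancestor", ["main1", "SubCategory"]))
```

Querying with startkey=["main1"] and endkey=["main1", {}] would instead return every descendant of main1, since objects sort after strings in CouchDB's view collation.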
