Query to show Tables and Table definition in Starcounter Database

How can I get a list of table names and definitions by either SQL statement or code behind for the Starcounter DB?

Metadata about created tables, their columns, and their indexes is stored in meta-data tables. Database classes are publicly exposed for the corresponding meta-data tables.
Tables (types) are described by Starcounter.Metadata.RawView and Starcounter.Metadata.ClrClass, both of which extend Starcounter.Metadata.Table. ClrClass describes loaded CLR classes only, while RawView describes all created tables. Both include descriptions of user-defined classes/tables and metadata classes/tables.
For example, all loaded user-defined classes can be enumerated:
foreach (ClrClass c in Db.SQL<ClrClass>(
    "select c from Starcounter.Metadata.ClrClass c where Updatable = ?", true)) {
    Console.WriteLine(c.FullName);
}
The Updatable property of Table is true for user-defined tables and false for meta-data/system tables.
Properties (columns) are described by Starcounter.Metadata.Member and its children. For example, to enumerate all columns of all user-defined tables:
foreach (Member m in Db.SQL<Member>(
    "select m from Column m, RawView v where m.Table = v and v.Updatable = ?",
    true)) {
    Console.WriteLine(m.Name);
}
Indexes are described by Starcounter.Metadata.Index and Starcounter.Metadata.IndexedColumn.
Currently there is a one-to-one match between database classes and tables. However, both this and the metadata schema might change in the future.

Related

How do I create a Django migration for my ManyToMany relation that includes an on-delete cascade?

I'm using PostgreSQL 10, Python 3.9, and Django 3.2. I have set up this model with the accompanying many-to-many relationship ...
class Account(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    ...
    crypto_currencies = models.ManyToManyField(CryptoCurrency)
After generating and running Django migrations, the following table was created ...
\d cbapp_account_crypto_currencies;
      Table "public.cbapp_account_crypto_currencies"
      Column       |  Type   | Modifiers
-------------------+---------+------------------------------------------------------------------------------
 id                | integer | not null default nextval('cbapp_account_crypto_currencies_id_seq'::regclass)
 account_id        | uuid    | not null
 cryptocurrency_id | uuid    | not null
Indexes:
    "cbapp_account_crypto_currencies_pkey" PRIMARY KEY, btree (id)
    "cbapp_account_crypto_cur_account_id_cryptocurrenc_38c41c43_uniq" UNIQUE CONSTRAINT, btree (account_id, cryptocurrency_id)
    "cbapp_account_crypto_currencies_account_id_611c9b45" btree (account_id)
    "cbapp_account_crypto_currencies_cryptocurrency_id_685fb811" btree (cryptocurrency_id)
Foreign-key constraints:
    "cbapp_account_crypto_account_id_611c9b45_fk_cbapp_acc" FOREIGN KEY (account_id) REFERENCES cbapp_account(id) DEFERRABLE INITIALLY DEFERRED
    "cbapp_account_crypto_cryptocurrency_id_685fb811_fk_cbapp_cry" FOREIGN KEY (cryptocurrency_id) REFERENCES cbapp_cryptocurrency(id) DEFERRABLE INITIALLY DEFERRED
How do I alter my field relation, or generate a migration, such that the foreign-key constraint is ON DELETE CASCADE? That is, when I delete an account, I would like the accompanying records in this table to be deleted as well.
I had a closer look at this. I tried to replicate your models, and I also see that the intermediary table has no cascade in the database. I have no answer to your main question of how to add the cascade to the constraint itself, but Django already emulates the cascade, which covers the behavior you asked for:
When I delete an account, I would like accompanying records in this table to also be deleted.
To demonstrate:
a = Account.objects.create(name='test')
c1 = CryptoCurrency.objects.create(name='c1')
c2 = CryptoCurrency.objects.create(name='c2')
c3 = CryptoCurrency.objects.create(name='c3')
a.crypto_currencies.set([c1, c2, c3])
If you do:
a.delete()
Django runs the following SQL which simulates the cascade on the intermediary table:
[
    {
        'sql': 'DELETE FROM "myapp_account_crypto_currencies" WHERE "myapp_account_crypto_currencies"."account_id" IN (3)', 'time': '0.002'
    },
    {
        'sql': 'DELETE FROM "myapp_account" WHERE "myapp_account"."id" IN (3)', 'time': '0.001'
    }
]
I can't find in the documentation why it is done this way though. Even adding a custom intermediary like this results in the same behavior:
class Account(models.Model):
    name = models.CharField(max_length=100)
    crypto_currencies = models.ManyToManyField(CryptoCurrency, through='myapp.AccountCryptocurrencies')

class AccountCryptocurrencies(models.Model):
    account = models.ForeignKey(Account, on_delete=models.CASCADE)
    cryptocurrency = models.ForeignKey(CryptoCurrency, on_delete=models.CASCADE)
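For contrast with the emulated cascade shown above, here is a minimal sketch of what a cascade enforced by the database itself looks like. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without a server, and the table names and integer ids are simplified stand-ins, not the schema from the question.

```python
import sqlite3


def delete_account_with_db_cascade():
    """Return the number of join-table rows left after deleting the account."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and their cascades) when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY);
        CREATE TABLE cryptocurrency (id INTEGER PRIMARY KEY);
        CREATE TABLE account_crypto_currencies (
            id INTEGER PRIMARY KEY,
            account_id INTEGER NOT NULL
                REFERENCES account(id) ON DELETE CASCADE,
            cryptocurrency_id INTEGER NOT NULL
                REFERENCES cryptocurrency(id) ON DELETE CASCADE
        );
        INSERT INTO account (id) VALUES (1);
        INSERT INTO cryptocurrency (id) VALUES (10), (11), (12);
        INSERT INTO account_crypto_currencies (account_id, cryptocurrency_id)
            VALUES (1, 10), (1, 11), (1, 12);
    """)
    # Only the parent row is deleted explicitly; the database removes the
    # dependent join rows because of ON DELETE CASCADE.
    conn.execute("DELETE FROM account WHERE id = 1")
    (remaining,) = conn.execute(
        "SELECT COUNT(*) FROM account_crypto_currencies").fetchone()
    conn.close()
    return remaining
```

Here a single DELETE is issued and the database cleans up the join table, whereas Django issues the two DELETE statements itself.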
When you use a ManyToManyField, Django creates an intermediary table for you, in this case named cbapp_account_crypto_currencies. What you want to do in the future is to always create the intermediary model, AccountCryptoCurrencies, explicitly and then set the through attribute of the ManyToManyField. This will allow you to add more fields to the intermediary model later. See more here: https://docs.djangoproject.com/en/3.2/ref/models/fields/#django.db.models.ManyToManyField.through.
What you now need to do is create this intermediary model:
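class AccountCryptoCurrencies(models.Model):
    # on_delete is a required argument since Django 2.0;
    # 'Account' is referenced by name because it is defined below
    account = models.ForeignKey('Account', on_delete=models.CASCADE)
    cryptocurrency = models.ForeignKey(CryptoCurrency, on_delete=models.CASCADE)

    class Meta:
        # reuse the table Django auto-created for the ManyToManyField
        db_table = 'cbapp_account_crypto_currencies'

class Account(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    ...
    crypto_currencies = models.ManyToManyField(CryptoCurrency, through=AccountCryptoCurrencies)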
You now need to generate a migration, but do not apply it yet! Modify the migration by wrapping its operations in a SeparateDatabaseAndState operation. I haven't written your migration file because I don't have the full model, but you can see how to do it in: How to add through option to existing ManyToManyField with migrations and data in django
Now you can apply the migration, and you should have an explicit intermediary table without losing data. You can also add further fields to the intermediary model and change the existing ones, for example the on_delete behavior of the account field, and migrate those changes. Note that Django emulates on_delete in Python when it deletes objects; it does not add ON DELETE CASCADE to the database constraint.

Generate database schema diagram for Databricks

I'm creating a Databricks application and the database schema is getting to be non-trivial. Is there a way I can generate a schema diagram for a Databricks database (something similar to the schema diagrams that can be generated from mysql)?
There are two possible variants:
1. using Spark SQL with show databases, show tables in <database>, describe table ...
2. using spark.catalog.listDatabases, spark.catalog.listTables, spark.catalog.listColumns.
The second variant isn't very performant when you have a lot of tables in the database/namespace, although it's slightly easier to use programmatically. In both cases, the implementation is just three nested loops iterating over the list of databases, then the list of tables inside each database, and then the list of columns inside each table. This data can be used to generate a diagram using your favorite diagramming tool.
Here is the code for generating the source for PlantUML:
# This script generates PlantUML diagram for tables visible to Spark.
# The diagram is stored in the db_schema.puml file, so just run
# 'java -jar plantuml.jar db_schema.puml' to get PNG file
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

# Variables
# list of databases/namespaces to analyze. Could be empty, then all existing
# databases/namespaces will be processed
databases = ["a", "airbnb"]  # put databases/namespace to handle
# change this if you want to include temporary tables as well
include_temp = False

# implementation
spark = SparkSession.builder.appName("Database Schema Generator").getOrCreate()

# if databases aren't specified, then fetch list from the Spark
if len(databases) == 0:
    databases = [db["namespace"] for db in spark.sql("show databases").collect()]

with open("db_schema.puml", "w") as f:
    # PlantUML sources must start with @startuml and end with @enduml
    f.write("\n".join(
        ["@startuml", "skinparam packageStyle rectangle", "hide circle",
         "hide empty methods", "", ""]))
    # one package per database, one class per table, one field per column
    for database_name in databases:
        f.write(f'package "{database_name}" {{\n')
        tables = spark.sql(f"show tables in `{database_name}`")
        for tbl in tables.collect():
            table_name = tbl["tableName"]
            db = tbl["database"]
            if include_temp or not tbl["isTemporary"]:
                lines = []
                try:
                    lines.append(f'class {table_name} {{')
                    cols = spark.sql(f"describe table `{db}`.`{table_name}`")
                    for cl in cols.collect():
                        col_name = cl["col_name"]
                        data_type = cl["data_type"]
                        lines.append(f'{{field}} {col_name} : {data_type}')
                    lines.append('}\n')
                    f.write("\n".join(lines))
                except AnalysisException as ex:
                    print(f"Error when trying to describe {db}.{table_name}: {ex}")
        f.write("}\n\n")
    f.write("@enduml\n")
The generated db_schema.puml file can then be rendered into a picture with PlantUML.
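The rendering step can also be separated from Spark, which makes it easy to try out on its own. The following is a minimal sketch, assuming the metadata has already been collected into a plain dictionary of the form {database: {table: [(column, type), ...]}}. The function name render_plantuml, that dictionary layout, and the "listings" table are hypothetical and not part of the script above.

```python
def render_plantuml(schema):
    """Render {database: {table: [(column, type), ...]}} as PlantUML source."""
    lines = ["@startuml", "hide circle", "hide empty methods", ""]
    for database_name, tables in schema.items():
        # One package per database.
        lines.append(f'package "{database_name}" {{')
        for table_name, columns in tables.items():
            # One class per table, one field per column.
            lines.append(f"class {table_name} {{")
            for col_name, data_type in columns:
                lines.append(f"{{field}} {col_name} : {data_type}")
            lines.append("}")
        lines.append("}")
        lines.append("")
    lines.append("@enduml")
    return "\n".join(lines) + "\n"


# Hypothetical sample metadata, standing in for what the Spark loops collect.
example = {"airbnb": {"listings": [("id", "bigint"), ("name", "string")]}}
puml = render_plantuml(example)
```

Writing puml to a .puml file gives PlantUML the same kind of input as the Spark script produces.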

non-ordinal access to rows returned by Spark SQL query

In the Spark documentation, it is stated that the result of a Spark SQL query is a SchemaRDD. Each row of this SchemaRDD can in turn be accessed by ordinal. I am wondering if there is any way to access the columns using the field names of the case class on top of which the SQL query was built. I appreciate the fact that the case class is not associated with the result, especially if I have selected individual columns and/or aliased them: however, some way to access fields by name rather than ordinal would be convenient.
A simple way is to use the "language-integrated" select method on the resulting SchemaRDD to select the column(s) you want -- this still gives you a SchemaRDD, and if you select more than one column then you will still need to use ordinals, but you can always select one column at a time. Example:
// setup and some data
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
case class Score(name: String, value: Int)
val scores =
  sc.textFile("data.txt").map(_.split(",")).map(s => Score(s(0),s(1).trim.toInt))
scores.registerAsTable("scores")
// initial query
val original =
  sqlContext.sql("Select value AS myVal, name FROM scores WHERE name = 'foo'")
// now a simple "language-integrated" query -- no registration required
val secondary = original.select('myVal)
secondary.collect().foreach(println)
Now secondary is a SchemaRDD with just one column, and it works despite the alias in the original query.
Edit: but note that you can register the resulting SchemaRDD and query it with straight SQL syntax without needing another case class.
original.registerAsTable("original")
val secondary = sqlContext.sql("select myVal from original")
secondary.collect().foreach(println)
Second edit: When processing an RDD one row at a time, it's possible to access the columns by name by using the matching syntax:
val secondary = original.map {case Row(myVal: Int, _) => myVal}
although this could get cumbersome if the right-hand side of the '=>' requires access to a lot of the columns, as each of them would need to be matched on the left. (This is from a very useful comment in the source code for the Row companion object.)

how to get eager loading in a many to many relationships?

I have a database with four tables. TableA and TableB are the main tables and the TableC is the table of the many to many relationships.
TableA(IDTableA, Name...)
TableB(IDTableB, Name...)
TableC(IDTableA, IDTableB)
This creates three entities: EntityA has an ICollection of EntityC, and EntityC has a collection of EntityB, so when I try to get the related entities I do this:
myContext.EntityA.Include(a=>a.EntityB.Select(b=>b.EntityC));
But this throws an exception saying that the collection is null.
So I would like to know whether eager loading is possible when there is a table for the many-to-many relationship.
Thanks.
I think you need this:
var q = myContext.EntityC.Include("EntityA").Include("EntityB").ToList();
If you want Bs of an A:
var aId; // = something...;
var bs = from c in q
         where c.EntityAId == aId
         select c.EntityBId;
And simply vice versa if you need As of a B:
var bId; // = something...;
var eas = from c in q
          where c.EntityBId == bId
          select c.EntityAId;
With a many-to-many association in Entity Framework you can choose between two implementations:
1. The junction table (C) is part of the conceptual model (class model) and the associations are A-C-B (1-n-1). A can't have a collection of Bs.
2. The junction table is not part of the conceptual model, but Entity Framework uses it transparently to sustain the association A-B (n-m). A has a collection of Bs and B has a collection of As. This is only possible when table C contains only the two FK columns to A and B.
So you can't have both.
You (apparently) chose the first option, so you will always have to query the other entities through C, like
from a in context.As
select new { a, Bs = a.Cs.Select(c => c.B) }
or
from a in context.As.Include(a1 => a1.Cs.Select(c => c.B))
select a

Querying multiple tables with a where clause in LINQ to SQL

Forgive my ignorance with Linq to SQL but...
How do you query multiple tables in one fell swoop?
Example:
I want to query, say 4 tables for a title that includes the following word "penguin". Funnily enough each table also has a field called TITLE.
Tables are like so:
I want to query each table (column: TITLE) for the word "penguin". Each table is referenced (via foreign key) to a parent table that is simply called Reference, and is linked on a column called REF_ID. So ideally the result should come back with a list of REF_ID's where the query criteria was matched.
If you can help you will be richly rewarded....... (with a green tick ;)
The code I have works for just one table - but not for two:
var refs = db.REFERENCEs
    .Include(r => r.BOOK).Where(r => r.BOOK.TITLE.Contains(titleString)).Include(r => r.JOURNAL.AUTHORs)
    .Include(r => r.JOURNAL).Where(r => r.JOURNAL.TITLE.Contains(titleString));
I had a similar scenario a while back and ended up creating a view that unioned my tables and then mapped that view to a LINQ-to-SQL entity.
Something like this:
-- REFERENCES is a reserved keyword in T-SQL, so the view name is bracketed
create view dbo.[References] as
    select ref_id, title, 'Book' as source from dbo.Book
    union all
    select ref_id, title, 'Journal' from dbo.Journal
    union all
    select ref_id, title, 'Magazine' from dbo.Magazine
    union all
    select ref_id, title, 'Report' from dbo.Report
The mapping would look like this (using attributes):
[Table(Name="References")]
public class Reference {
    [Column(Name="Ref_Id", IsPrimaryKey=true)]
    public int Id {get;set;}
    [Column]
    public string Title {get;set;}
    [Column]
    public string Source {get;set;}
}
Then a query might look like this:
var query = db.GetTable<Reference>().Where(r => r.Title.Contains(titleString));
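The union-view idea can be tried out without SQL Server or LINQ to SQL. Below is a minimal sketch using Python's built-in sqlite3 module, assuming only two of the four tables and made-up sample titles. The view name all_references and the function find_ref_ids are hypothetical.

```python
import sqlite3


def find_ref_ids(title_word):
    """Return (ref_id, source) pairs whose title contains title_word."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE book (ref_id INTEGER, title TEXT);
        CREATE TABLE journal (ref_id INTEGER, title TEXT);
        INSERT INTO book VALUES (1, 'The Penguin Atlas'), (2, 'Walrus Tales');
        INSERT INTO journal VALUES (3, 'Journal of Penguin Studies');
        -- One view that stacks the per-type tables and tags each row.
        CREATE VIEW all_references AS
            SELECT ref_id, title, 'Book' AS source FROM book
            UNION ALL
            SELECT ref_id, title, 'Journal' FROM journal;
    """)
    # A single filter on the view replaces one query per table.
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    rows = conn.execute(
        "SELECT ref_id, source FROM all_references "
        "WHERE title LIKE '%' || ? || '%' ORDER BY ref_id",
        (title_word,),
    ).fetchall()
    conn.close()
    return rows
```

Calling find_ref_ids("penguin") returns the matching REF_ID values together with the table each one came from, which is the list the question asks for.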
