DBT Duplication Check Ignores Schemas - python-3.x

During dbt compile, there is a model duplication check to be sure models aren’t stepping on top of each other. This check is causing me problems.
Our Architecture
Our system delineates the stages of processing into different schemas, and we want to begin using dbt. So, say we’re importing a raw dataset we’re calling jaffles, we’ll have a raw.jaffles table, a clean.jaffles table, and so on. Note that raw and clean in this example are different schemas.
The Problem
This breaks the duplication check. No matter how I customize the schema names, or how I call ref, the duplication check happens before touching any of that, notices we have two models named “jaffles”, ignores that they wouldn’t actually collide from being in different schemas, and throws an error.
Possible Solutions
Ideally, I'd customize how it solves for the paths it uses to check duplication to include schema. But I can't find how to customize that part.
Possibly I could skip this check altogether and do the integrity check myself. But I couldn't find options to disable this.
The only solution I'm seeing that could work is to rename each of the views to be unique, which would be a lot of work and would pollute the otherwise super-clean naming convention we have already established.

As stated in the docs, "model names need to be unique, even if they are in distinct folders".
What you could do, though, is to use custom aliases (see the docs), where you can re-use the same table/view name within 2 or more different schemas. In your example, you could have two different models that have a specific schema assigned each:
-- models/.../raw_jaffles.sql
{{ config(alias='jaffles', schema='raw') }}
-- models/.../clean_jaffles.sql
{{ config(alias='jaffles', schema='clean') }}
Nevertheless, the file names (and therefore the model names) still need to be different from each other.
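To make the distinction concrete, here is a minimal Python sketch of the two kinds of uniqueness involved. It is only an illustration of the idea and not dbt's actual implementation; the Model class and both helper functions are hypothetical names invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a dbt model (not a dbt class)."""
    name: str    # model name, i.e. the file name without .sql
    schema: str  # schema set via config(schema=...)
    alias: str   # relation name set via config(alias=...)


def duplicate_names(models):
    """Model names used more than once (what the compile check objects to)."""
    counts = Counter(m.name for m in models)
    return sorted(name for name, n in counts.items() if n > 1)


def duplicate_relations(models):
    """(schema, alias) pairs used more than once (a real collision in the database)."""
    counts = Counter((m.schema, m.alias) for m in models)
    return sorted(rel for rel, n in counts.items() if n > 1)


# Two files both called jaffles.sql: the names clash even though the relations do not.
CLASHING = [Model("jaffles", "raw", "jaffles"), Model("jaffles", "clean", "jaffles")]

# Unique file names plus a shared alias: no clash of either kind.
ALIASED = [Model("raw_jaffles", "raw", "jaffles"), Model("clean_jaffles", "clean", "jaffles")]
```

With the aliased layout, both helpers return an empty list, while the clashing layout is rejected on the name alone even though raw.jaffles and clean.jaffles would never collide.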

Related

DDD: How to save the order of aggregates?

I have the two Aggregates 'notebook' and 'note'.
When I use the rule 'aggregates reference each other only by their IDs', I think I have two options:
Notebook(List<NoteId>, [other properties])
Note([other properties])
or
Notebook([other properties])
Note(NotebookId, [other properties])
With the first option, I need two DB calls to show all notes of a notebook (one to get the list and the second to load the notes).
So my current favorite is the second option. I have a few options in mind for saving the order of the notes, but each of them has some disadvantages.
What is a good approach to solve my problem? Or is the first option better and the two DB calls are negligible?
Can anybody help?
Big THX
It looks like the order of the Notes is important, at least in relation to the Notebook, so maybe it should be part of the domain. If so, I would suggest storing it together with the Note, or using some other information from the Note to provide an ordering when a list is loaded.
If not, why is the order relevant? I mean, the two entities have related but separate lifecycles, or at least it looks that way: one aggregate, the Notebook, has a list that only references the other, the Note. Hence no direct interaction is planned. But, given that the domain is correctly modelled (there's not enough information to say anything about that), somewhere you need an ordered list of Notes. The only way to have it as you need it is to store the information (or use something already stored); otherwise the hypothesis (order is relevant) is no longer valid.
Update after information about the number of Notes and their size
It looks like your domain is organized in this way:
a root entity, the Notebook, where the order of each Note, identified only by its ID, is also stored: any change in the order will be made from here, not from the Note
another root entity, the Note, with its own lifecycle and its own 'actions' (operations that trigger a change in the entity)
Whenever you load the Notebook, you must also load the Notes and their order to show them correctly ordered. On the other hand, when you change the order, this structure allows you to have a single action (or operation) on the Notebook, for example changeOrder(NoteId), that updates the order of the given Note and, if needed, changes the order of all the others. The trick here is that when you persist the Notebook you work only with the IDs of the Notes, so you don't have to load each whole entity, just the part you need, which you update and save again. So the size of the Note entity is not important, because you don't use all of it. Hence, every change could trigger an update of all the (NoteID, order) pairs for that Notebook. You can't do it differently. But to support this you only need a single function in the repository that loads the Note IDs and their order and saves them again; that should not be too expensive.
On the other hand, all the actions that operate directly on the Note must load it in full. In this case loading and saving the whole entity is required, because you are changing the Note itself.
Anyway, the way you persist the order is delegated entirely to the persistence layer, which is built on top of the domain. I mean, the domain has a Notebook and a set of Notes with order 1, 2, 3, etc.
Even though I don't think this needs such a complex solution, you could use a totally different way to store the order: for example, use steps of 100 (so 100, 200, 300, etc.). Each new Note is placed midway between its two neighbours and is the only one that has to be saved each time. Every once in a while you run a job, or something similar, that normalizes all the values, restoring the steps of 100 (or whatever you use to persist the order). As I said, this looks like an overcomplicated solution to the problem, but it also shows that the domain entities can be totally different from the persistence ones.
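As a rough sketch of the Notebook side of this, assuming Python and a zero-based target position (the answer's changeOrder(NoteId) does not say how the new position is passed, so the new_position parameter and the order_pairs helper are assumptions made for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class Notebook:
    """Aggregate root that knows its Notes only by ID, in display order."""
    notebook_id: str
    note_ids: list[str] = field(default_factory=list)

    def add_note(self, note_id: str) -> None:
        """Append a Note reference at the end of the Notebook."""
        if note_id in self.note_ids:
            raise ValueError(f"note {note_id!r} is already in the notebook")
        self.note_ids.append(note_id)

    def change_order(self, note_id: str, new_position: int) -> None:
        """Move one Note to a zero-based position; the others shift around it."""
        if not 0 <= new_position < len(self.note_ids):
            raise IndexError("new_position is out of range")
        self.note_ids.remove(note_id)  # raises ValueError for an unknown ID
        self.note_ids.insert(new_position, note_id)

    def order_pairs(self) -> list[tuple[str, int]]:
        """The (NoteID, order) pairs a repository would persist, starting at 1."""
        return [(note_id, order) for order, note_id in enumerate(self.note_ids, start=1)]
```

A repository could then save only the result of order_pairs() for the Notebook, without ever loading the full Note entities.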
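The gap-based scheme could look roughly like this in Python. It is a sketch under assumptions: positions are integers, the step is 100 as in the example, and the function names are made up for illustration.

```python
STEP = 100  # gap between neighbouring positions after normalization


def position_between(before: int, after: int):
    """Return an integer strictly between two positions, or None if the gap is used up."""
    if after - before < 2:
        return None  # no room left: time to run the normalization job
    return (before + after) // 2


def normalize(positions: dict) -> dict:
    """Re-space all positions to STEP, 2*STEP, ... while keeping the current order."""
    ordered_ids = sorted(positions, key=positions.get)
    return {note_id: (index + 1) * STEP for index, note_id in enumerate(ordered_ids)}
```

Inserting between 100 and 200 yields 150, so only the new Note's row is written. When position_between returns None, the periodic normalization job restores the gaps.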

Can I get schemacrawler to ignore the schema name?

I'm attempting to make a comparison of two Oracle DBs - I'm running a report on two different schema names - in my case, a schema prefix. E.g. Using:
-schemas=FOO.*
then
-schemas=BAR.*
Is there a way of hiding this prefix from the report, so that it isn't shown as an obvious difference when comparing the two reports?
I know I can use the 'unimportant' text feature in Beyond Compare, but it would be nice to cover this upfront.
I have a feeling that I'm missing something obvious, or maybe no one ever requires this as the schema name is fairly fundamental. I suppose I am just comparing across schemas.
If it is in the help, I have probably misunderstood what I have read.
Any hints would be welcome.
Many thanks.
Of course, this was answered in an obvious place...
SchemaCrawler HowTo
How to hide catalog and schema names in text output
Change the configuration for the SchemaCrawler
schemacrawler.format.show_unqualified_names=true in the
schemacrawler.config.properties file. This setting will show
unqualified names of database objects such as tables and procedures.
That is, the catalog and schema names will not be displayed. Use with
care, especially if you have foreign keys that reference tables in
other schemas, or synonyms.
However, in my situation, the schema names actually appeared inside the returned SQL, procedures, etc., so they are a fundamental part of what the DB is holding.
As far as I can see, my best option is to use Beyond Compare or something similar to strip these small strings out to aid the comparison.
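If a script is preferable to a diff-tool setting, the same stripping can be done before comparing. The following is a minimal Python sketch, not a SchemaCrawler feature: the function name is hypothetical, it assumes the exact schema name is known, and it will also strip the prefix inside string literals or comments.

```python
import re


def strip_schema_prefix(text: str, schema: str) -> str:
    """Remove 'SCHEMA.' and '"SCHEMA".' qualifiers from report text."""
    pattern = re.compile(
        r'(?<![\w$#])"?' + re.escape(schema) + r'"?\.',  # not preceded by an identifier character
        re.IGNORECASE,  # unquoted Oracle identifiers are case-insensitive
    )
    return pattern.sub("", text)
```

Running each report through this with its own schema name (FOO for one, BAR for the other) leaves two texts that can be diffed without the prefix showing up as a difference.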

Complex Finds in Domain Driven Design

I'm looking into converting part of a large existing VB6 system into .NET. I'm trying to use domain driven design, but I'm having a hard time getting my head around some things.
One thing that I'm completely stumped on is how I should handle complex find statements. For example, we currently have a screen that displays a list of saved documents, that the user can select and print off, email, edit or delete. I have a SavedDocument object that does the trick for all the actions, but it only has the properties relevant to it, and I need to display the client name that the document is for and their email address if they have one. I also need to show the policy reference that this document may have come from. The Client and Policy are linked to the SavedDocument but are their own aggregate roots, so are not loaded at the same time the SavedDocuments are.
The user is also allowed to specify several filters to reduce the list. These too can be based on properties that are stored on the SavedDocument or on the Client and Policy.
I'm not sure how to handle this from a Domain driven design point of view.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocuments, that I then have to turn into a different object or DTO, and fill with the additional client and policy information? That seems a little slow as I have to load all the details using multiple calls.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocumentsForList objects that contain just the information I want? This seems the quickest but doesn't feel like I'm using DDD.
Do I load everything from their objects and do all the filtering and column selection in a service? This seems the slowest, but also appears to be very domain orientated.
I'm just really confused about how to handle these situations, and I'm not really seeing any other people asking questions about it, which makes me feel that I'm missing something.
Queries can be handled in a few ways in DDD. Sometimes you can use the domain entities themselves to serve queries. This approach can become cumbersome in scenarios such as yours when queries require projections of multiple aggregates. In this case, it is easier to use objects explicitly designed for the respective queries - effectively DTOs. These DTOs will be read-only and won't have any behavior. This can be referred to as the read-model pattern.
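A read model of this kind could be sketched as follows. Python and SQLite are used here only for illustration (the question targets .NET), and the table names, column names, and the single client-name filter are all assumptions rather than anything from the original system.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedDocumentListItem:
    """Read-only DTO for the list screen: data only, no behavior."""
    document_id: int
    title: str
    client_name: str
    client_email: Optional[str]
    policy_reference: Optional[str]


def find_saved_documents(conn: sqlite3.Connection, client_name_contains: Optional[str] = None):
    """Project three aggregates into list items with one query (hypothetical schema)."""
    sql = """
        SELECT d.id, d.title, c.name, c.email, p.reference
        FROM saved_document d
        JOIN client c ON c.id = d.client_id
        LEFT JOIN policy p ON p.id = d.policy_id
    """
    params = []
    if client_name_contains:
        sql += " WHERE c.name LIKE ?"
        params.append(f"%{client_name_contains}%")
    sql += " ORDER BY d.id"
    return [SavedDocumentListItem(*row) for row in conn.execute(sql, params)]
```

The SavedDocument, Client and Policy aggregates stay untouched and are still used for the print, email, edit and delete actions. The list screen only ever sees the DTO.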

Optional or boolean elements to specify characteristics in XML schema?

I'm trying to create an XML schema to describe some aspects of hospitals. A hospital may have 24 hour coverage on: emergency services, operating room, pharmacist, etc. The entire list is relatively short - around 10. The coverage may be on more than one of these services.
My question is how best to represent this. I'm thinking along the lines of:
<coverage>
<emergencyServices/>
<operatingRoom/>
</coverage>
Basically, the services are optional and, if they exist, the coverage is offered by the hospital.
Alternatively, I could have:
<coverage>
<emergencyServices>true</emergencyServices>
<operatingRoom>true</operatingRoom>
<pharmacist>false</pharmacist>
</coverage>
In this case, I require all the elements, but a value of false means that the coverage isn't offered.
There are probably other approaches.
What's the best practice for something like this? And, if I use the first option, what type should the elements be in the schema?
Best practice here depends really on the consumer.
The short and simple rule is that markup is for structure, and content is for data. So having them contain xs:boolean values is generally the best course.
Now, on to the options:
Having separate untyped elements is simple and clear; sometimes processing systems may have difficulty reading them, because some XML-relational mappers may not see any data in the elements to put in relational tables. But if they had values, like <emergencyServices>true</emergencyServices>, then the relational table would have a value to hold.
Again, if you have fixed element names, it means if your consumer is using a system that maps the XML to a database, every time you add a service, a schema change will have to be made.
There are several other ways; each has trade-offs:
Use an xs:string element with an enumeration, and allow it to repeat. Then you could have <coverage>emergencyServices</coverage><coverage>operatingRoom</coverage>. It makes adding to the list simpler, but allows duplicates. This scheme does not require schema changes in the database for the consumer.
You could use attributes on the <coverage> element. They would have an xs:boolean type, but still require a schema change. Of course, this evokes the attribute vs. element argument.
One good resource is Chapter 11 of Effective XML. At least this should be read before making a final decision.
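From the consumer's side, the difference between the two layouts in the question can be seen in a short Python sketch using the standard library's ElementTree. The function names are hypothetical, and the documents are assumed to have no namespaces.

```python
import xml.etree.ElementTree as ET


def services_from_presence(xml_text: str) -> set:
    """Option 1: an element's presence means the coverage is offered."""
    root = ET.fromstring(xml_text)
    return {child.tag for child in root}


def services_from_booleans(xml_text: str) -> set:
    """Option 2: every element is present and carries an xs:boolean value."""
    root = ET.fromstring(xml_text)
    # xs:boolean allows the lexical forms true, false, 1 and 0.
    return {child.tag for child in root if (child.text or "").strip() in ("true", "1")}
```

Both functions return the same set of service names for the two examples in the question. The boolean form simply gives a mapping tool a value to store, which the empty elements do not.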

In what order are migrations executed in Orchard?

I have a habit of keeping a separate Migration Class for every custom Type or Part. A lot of the time I want to attach a Taxonomy Field for the same Taxonomy to several custom Parts. Since I'm not sure which migration will run first, I have to check if the Taxonomy already exists in each migration and create it if it doesn't, leading to a lot of duplicate code. I could move my code into a service for the sake of re-usability/maintainability but it would be easier still if I knew for sure which migration was going to get executed first.
They should be running in order of dependency, starting with the dependency, ending with the module depending on it.
However, for this sort of thing, you might want to try recipes rather than migrations.
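Running "in order of dependency" amounts to a topological sort of the modules. The Python sketch below only illustrates that idea and is not Orchard's code; the module name MyModule.Parts is hypothetical, and it assumes the dependency on the taxonomy module has been declared.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def migration_order(dependencies: dict) -> list:
    """Order modules so that each one comes after everything it depends on.

    dependencies maps a module name to the set of modules it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical example: a custom module that depends on the taxonomy module.
EXAMPLE = {
    "Orchard.Taxonomies": set(),
    "MyModule.Parts": {"Orchard.Taxonomies"},
}
```

Under this ordering, anything created by the dependency's migration already exists by the time the dependent module's migration runs. Modules that do not depend on each other have no guaranteed relative order.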
