dbt: breaking up the schema.yml file for readability

We have a large schema.yml file in our dbt folders. It is not the cleanest, and it is hard to find what we need in it. I am curious whether anyone knows of a way to split this file up. I am not trying to overcomplicate things or separate the dbt project into multiple projects; I just want to clean up the schema.yml file for readability. Thanks!

You can split this up as far as one model per file, and name the files whatever you want.
The way I usually do it is one file per model, with the file named after the model.
Just make sure you have
version: 2
models:
at the top of each file and you're good to go!
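For example, a per-model file might look like this (the customers model and its columns are just an illustration; dbt picks up any .yml file under your models directory, so the file name and location are up to you):

version: 2

models:
  - name: customers
    description: One row per customer
    columns:
      - name: customer_id
        description: Primary key for the customer
        tests:
          - unique
          - not_null

A common convention is to keep each such file next to the .sql file for the model it describes.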

Related

How to organize the playground in docassemble / best practices?

This isn't exactly the best Stack Overflow question because it's opinion-based, but I'm going to try to ask it in a way that lends itself to an answer with some degree of factuality (as opposed to opinion).
I know you can switch projects (https://docassemble.org/docs/playground.html#projects), which is of course very useful. What I'm thinking about in particular is that I have seen some tutorials that abstract code out of interviews into .py files -- this seems reasonably useful to me, not least because of linting (tangent: is there a docassemble linter?).
Because of the way docassemble does inheritance, I think I would rather have my entire playground be one big directory with subdirectories for projects, rather than starting from scratch with new projects. Some of the .yml files, .py files, static files, etc. can probably be written so that they can be reused across interviews, and I'd love to do that in a way that's less clunky than re-importing them into a new project whenever I need them.
Can we organize the playground in docassemble, or are we stuck with a one-level directory?
If the playground can be organized (e.g. into directories, subdirectories, etc.), are there any community-accepted or JHPyle-recommended best practices around that? (I'm thinking of something like a PEP, though presumably less formal.) I know it's probably easy enough to come up with a file naming convention with a similar effect, but that's a bit hacky.
Is it possible, as an alternative, to simply edit the packages directly?
The main thing I'd like to accomplish, and the main impetus for this question, is keeping my code DRY by using helper functions / helper .yml files.
The Playground is a simplified interface for people who are new to programming. It supports "projects" but does not support subdirectories. Advanced programmers can write their code in Python packages using a text editor and can use subdirectories under the data directory if they want to.
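For reference, a docassemble add-on package written outside the Playground is an ordinary Python package; a rough sketch of its layout (the package name mypackage is just an illustration) looks like:

docassemble-mypackage/
    setup.py
    docassemble/
        mypackage/
            __init__.py
            helpers.py
            data/
                questions/
                    interview.yml
                    shared-blocks.yml
                static/
                templates/
                sources/

Interviews in any project can then include the shared .yml files from that package and import the helper module, which keeps things DRY without relying on the Playground's flat file list.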

Using UglifyJS on the whole Node project?

I need to obfuscate my source code as well as possible, so I decided to use uglifyjs2. My project structure has nested directories; how can I run uglifyjs2 over the whole project instead of passing it every input file individually?
I wouldn't mind if it minified the whole project into a single file or something
I've done something very similar to this in a project I worked on. You have two options:
Option 1: Leave the files in their directory structure.
This is by far the easier option, but provides a much lower level of obfuscation since someone interested enough in your code basically has a copy of the logical organization of files.
An attacker can simply pretty-print all the files and rename the obfuscated variable names in each file until they have an understanding of what is going on.
To do this, use fs.readdir and fs.stat to recursively go through folders, read in every .js file and output the mangled code.
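A minimal sketch of that recursive walk, assuming uglify-js 2.x and using the synchronous fs calls for brevity (run it against a copy of your sources, since it overwrites files in place):

var fs = require('fs');
var path = require('path');
var UglifyJS = require('uglify-js');

// Walk a directory tree and mangle every .js file in place.
function uglifyDir(dir) {
  fs.readdirSync(dir).forEach(function (name) {
    var full = path.join(dir, name);
    if (fs.statSync(full).isDirectory()) {
      uglifyDir(full);                                      // recurse into subfolders
    } else if (path.extname(full) === '.js') {
      var result = UglifyJS.minify(full, { mangle: true }); // uglify-js 2.x accepts a file path
      fs.writeFileSync(full, result.code);                  // overwrite with the mangled code
    }
  });
}

uglifyDir('./build');   // './build' is a copy of the project, not the original sources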
Option 2: Compile everything into a single JS file.
This is much more difficult for you to implement, but does make life harder on an attacker since they no longer have the benefit of your project's organization.
Your main problem is reconciling your require calls with files that no longer exist (since everything is now in the same file).
I did this by using Uglify to perform static analysis of my source code by analyzing the AST for calls to require. I then loaded the source code of the required file and repeated.
Once all code was loaded, I replaced the require calls with calls to a custom function, wrapped each file's source code in a function that emulates how node's module system works, and then mangled everything and compiled it into a single file.
My custom require function does most of what node's require does except that rather than searching the disk for a module, it searches the wrapper functions.
Unfortunately, I can't really share any code for #2 since it was part of a proprietary project, but the gist is:
Parse the source text into an AST using UglifyJS.parse.
Use the TreeWalker to visit every node of the AST and check if
node instanceof UglifyJS.AST_Call && node.start.value == 'require'
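A rough sketch of that check with uglify-js 2.x, collecting the module names passed to require (the entry.js file name and the assumption that the argument is a string literal are mine):

var fs = require('fs');
var UglifyJS = require('uglify-js');

var source = fs.readFileSync('entry.js', 'utf8');   // start from some entry file
var ast = UglifyJS.parse(source);
var required = [];

var walker = new UglifyJS.TreeWalker(function (node) {
  // A call expression whose first token is the word "require"
  if (node instanceof UglifyJS.AST_Call && node.start.value == 'require') {
    required.push(node.args[0].value);   // e.g. './lib/foo'
  }
});
ast.walk(walker);

// "required" now lists the modules to load and process recursively.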
Having just completed a huge pure Node.js project spread across 80+ files, I had the same problem as the OP. I needed at least minimal protection for my hard work, but it seems this very basic need has not been covered by the npm open-source community. To add salt to the injury, the JXCore package encryption system was cracked last week in a few hours, so it's back to obfuscation...
So I created a complete solution that handles file merging and uglifying. You have the option of excluding specified files/folders from merging; those files are then copied to the output location of the merged file, and references to them are rewritten automatically.
npm link for node-uglifier
GitHub repo of node-uglifier
PS: I would be glad if people would contribute to make it even better. This is a war between thieves and hard-working coders like yourself. Let's join forces and increase the pain of reverse engineering!
This isn't supported natively by uglifyjs2.
Consider using webpack to package up your entire app into a single minified .js file, excluding node_modules:
http://jlongster.com/Backend-Apps-with-Webpack--Part-I
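A minimal sketch of such a config, assuming webpack 1-3 and the webpack-node-externals package to keep node_modules out of the bundle (the entry point and paths are illustrative):

// webpack.config.js
var path = require('path');
var nodeExternals = require('webpack-node-externals');

module.exports = {
  entry: './server.js',               // your app's entry point
  target: 'node',                     // build for Node, not the browser
  externals: [nodeExternals()],       // keep node_modules as plain require()s
  output: {
    path: path.join(__dirname, 'build'),
    filename: 'app.min.js'
  }
};

Running webpack -p then emits a single minified build/app.min.js.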
I had the same need - for which I created node-optimize and grunt-node-optimize.
https://www.npmjs.com/package/grunt-node-optimize

Custom log processing/parsing

I have a log format like this:
[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]
GET /foo
{"controller"=>"foo", "action"=>"index"}
[26830431.7966868][666][2.1876697540283203][30398][api][1374829888.4944339][request_end]
200 OK
Each entry is constructed using this pattern:
[request_id][user_id][time_from_request_started][process_id][app][timestamp][tagline]
payload
During a request there are many points where I log something; the app has fairly complex behaviour, and this helps me a lot in debugging what users do.
I would like to parse the logs into a directory structure like this:
req_id
|
|----[time_from_request_started][process_id][timestamp][tagline]
|
etc
Basically, each directory will be named after a req_id and contain files whose names are the rest of the tag line. These files will contain the payload.
I will also have another directory, keyed by user id, containing symlinks to the requests made by that user.
First question: is this structure sensible? In my opinion it will make log access easy and fast. The reason I want to use directories and files is that I like the Unix approach and want to try it, to feel its drawbacks and advantages for myself.
Second question: I would have no problem using Ruby to build this, but I would like to learn a new tool that is better suited to the job. I am thinking about using plain Unix tools (pipes, awk, etc.), or writing the parser in Go, which I am learning right now (I even have time to implement a simple map-reduce). Which tool is best suited for this?
I would not store logs in a directory to see how the users behave.
Depending on what behaviour you want to keep track of, you could use different tools; Mixpanel or keen.io are two of them.
Instead of logging what the user did to a file, you would send an event to either of those (they are pretty similar, so pick the one you think has better docs/libraries), then graph those events to better understand the behaviour of your users. I've done this a lot recently; to display the data in a nice way I've used rickshaw.
The key point of this suggestion is that if you go the file route you will still have to find a way to understand your data, and graphs help a lot with that. Also, visualization is something keen.io does by default; you may still want to build your own graphs, but it's a good start.
Hope this helped.
Is this structure correct?
Only you can know that; it depends directly on how the data needs to be accessed and used.
What tool is best suited for this?
You could probably use Unix tools to achieve this, but it would also be a good exercise to practice your Go skills by writing it yourself, and the result would be more extensible.
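If you do take the Go route, here is a rough sketch of the splitting step, reading the log from stdin; the bracketed header format is taken from the question, while the output paths and the complete lack of error handling are simplifications of mine:

package main

import (
    "bufio"
    "fmt"
    "os"
    "path/filepath"
    "regexp"
)

// Matches header lines like:
// [26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]
var headerRe = regexp.MustCompile(`^\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]$`)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    var out *os.File
    for scanner.Scan() {
        line := scanner.Text()
        if m := headerRe.FindStringSubmatch(line); m != nil {
            reqID, userID := m[1], m[2]
            // file name: [time_from_request_started][process_id][timestamp][tagline]
            name := fmt.Sprintf("[%s][%s][%s][%s]", m[3], m[4], m[6], m[7])
            if out != nil {
                out.Close()
            }
            os.MkdirAll(reqID, 0755)
            out, _ = os.Create(filepath.Join(reqID, name))
            // symlink the request directory under the user's directory
            os.MkdirAll(filepath.Join("users", userID), 0755)
            os.Symlink(filepath.Join("..", "..", reqID), filepath.Join("users", userID, reqID))
        } else if out != nil {
            fmt.Fprintln(out, line) // payload lines belong to the current entry
        }
    }
    if out != nil {
        out.Close()
    }
}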

scons: Unnecessarily rebuilds files during the first timestamp-only build

I am doing a timestamp-only build to bulk convert image files. Many of the converted image files already exist, but I like to make sure that they are all checked through each time.
How come SCons requires a database file (.sconsign.dblite) that it uses for MD5 hash data when it's instructed (via env.Decider("timestamp-newer")) to only deal with timestamps? It shouldn't need to keep a database between builds for timestamps because all the information is associated with the files themselves.
If the dblite database doesn't exist, SCons reconverts all the images, regardless of whether their timestamps imply they need to be rebuilt. The title is an example of the message I get when the dblite database does not exist.
If anyone can explain this I'd really appreciate it. I love the functional programming with Python, but SCons itself is not quite doing it for me at the moment.
Using "timestamp-newer", SCons actually stores the timestamp info. You can see why here:
Using Time Stamps to Decide If a File Has Changed
Try using "timestamp-match" instead.
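A minimal SConstruct sketch of that suggestion; the .tif-to-.png conversion and the convert command are just placeholders for whatever your real image pipeline does:

# SConstruct
env = Environment()

# Rebuild a target only when a source's timestamp differs from the one
# recorded on the previous run, without comparing MD5 content signatures.
env.Decider("timestamp-match")

for src in Glob("images/*.tif"):
    target = str(src).replace(".tif", ".png")
    env.Command(target, src, "convert $SOURCE $TARGET")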
I finally got this sorted. Brady was right about how to use SCons, but a few days ago I eventually worked out that you can also control exactly what gets built by controlling which build commands are issued in the first place. In my case, I skipped any image files for which the target file already exists, using os.path.exists().
Sounds simple, but it is a conceptual difference between SCons and make, because make does not save its state between builds in the way SCons does.
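A sketch of that approach with the same placeholder conversion as above; the trade-off is that an existing output is never refreshed, even if its source later changes, until you delete it:

# SConstruct
import os

env = Environment()

for src in Glob("images/*.tif"):
    target = str(src).replace(".tif", ".png")
    # Only issue a build command when the converted file is missing,
    # so SCons never even considers existing outputs.
    if not os.path.exists(target):
        env.Command(target, src, "convert $SOURCE $TARGET")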
Yes, I'm trying to work out the same thing, but I'm doing bulk conversion of video files, which takes several days if done unnecessarily. I've already done most of it.
So I want a way to tell SCons, "For files that exist now, store their existing timestamps/MD5s, and don't rebuild unless that changes in future."
Will report back if I find a way...
I think your question is really about why there's a .sconsign.dblite at all when you set the decider to just check timestamps.
One reason is that it allows SCons to keep track of the method used to produce each target. If that changes, even if the timestamp doesn't, it should rebuild the affected targets.
Have you tried building a single file, and then using the sconsign utility to examine the contents of the .sconsign.dblite file?
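For example, after building a single target, running something like

sconsign .sconsign.dblite

should print the stored entries for each file: the timestamp or content signature, the implicit dependencies, and the signature of the action used to build it.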

Code archive? What do people use?

I have loads of notepad, .js, and .cs files in a folder that I refer back to when I'm developing. They are just sitting in a folder on my laptop. Is anyone aware of a better, more structured way of storing all this stuff? I'm thinking of some kind of cloud website or something?
You can use a wiki for this kind of thing. There are wikis that are local, such as TiddlyWiki.
One way or another, to keep things safe, you should use source control, and/or back up to the cloud.
I keep my code samples that aren't project-specific in a revision-controlled directory tree, based on the language they're in; actual projects are also kept in revision control, but are kept separately. I have tons of them now.
For smaller idioms and snippets that are useful or that I forget as I switch between languages for a period of time, I pop them into a wiki, with different pages also based on which language they're in. I don't put whole files in there; I just extract the pieces that I tend to forget and pop them in there.
They do tend to build up as time goes on, so just putting the smaller pieces in is much more efficient for fast lookup.
