Can't connect dbt to Databricks

I am trying to connect to a Spark cluster on Databricks, following this tutorial: https://docs.databricks.com/dev-tools/dbt.html. I have the dbt-databricks connector installed (https://github.com/databricks/dbt-databricks). However, no matter how I configure it, I keep getting "Database error, failed to connect" when I run dbt test / dbt debug.
This is my profiles.yaml:
databricks_cluster:
  outputs:
    dev:
      connect_retries: 5
      connect_timeout: 60
      host: <my_server_hostname>
      http_path: <my_http_path>
      schema: default
      token: <my_token>
      type: databricks
  target: dev
This is my dbt_project.yml:
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_dem'
version: '1.0.0'
config-version: 2
# This setting configures which "profile" dbt uses for this project.
profile: 'databricks_cluster'
# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"
# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
  dbt_dem:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view
I have also tried using the Spark connector, but I still get the same error with that. Any ideas as to why I can't connect to the Databricks cluster?
These are the logs corresponding to the error:
============================== 2022-02-18 08:43:22.123066 | 4b91f9d3-28ad-4f5a-93db-f431b6d9af14 ==============================
08:43:22.123066 [info ] [MainThread]: Running with dbt=1.0.1
08:43:22.123841 [debug] [MainThread]: running dbt with arguments Namespace(cls=<class 'dbt.task.debug.DebugTask'>, config_dir=False, debug=None, defer=None, event_buffer_size=None, fail_fast=None, log_cache_events=False, log_format=None, partial_parse=None, printer_width=None, profile=None, profiles_dir='/Users/keremaslan/.dbt', project_dir=None, record_timing_info=None, rpc_method=None, send_anonymous_usage_stats=None, single_threaded=False, state=None, static_parser=None, target=None, use_colors=None, use_experimental_parser=None, vars='{}', version_check=None, warn_error=None, which='debug', write_json=None)
08:43:22.124057 [debug] [MainThread]: Tracking: tracking
08:43:22.143750 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751ef42e0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751ef4eb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751eca730>]}
08:43:22.236001 [debug] [MainThread]: Executing "git --help"
08:43:22.264682 [debug] [MainThread]: STDOUT: "b"usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]\n [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]\n [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]\n [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]\n <command> [<args>]\n\nThese are common Git commands used in various situations:\n\nstart a working area (see also: git help tutorial)\n clone Clone a repository into a new directory\n init Create an empty Git repository or reinitialize an existing one\n\nwork on the current change (see also: git help everyday)\n add Add file contents to the index\n mv Move or rename a file, a directory, or a symlink\n restore Restore working tree files\n rm Remove files from the working tree and from the index\n sparse-checkout Initialize and modify the sparse-checkout\n\nexamine the history and state (see also: git help revisions)\n bisect Use binary search to find the commit that introduced a bug\n diff Show changes between commits, commit and working tree, etc\n grep Print lines matching a pattern\n log Show commit logs\n show Show various types of objects\n status Show the working tree status\n\ngrow, mark and tweak your common history\n branch List, create, or delete branches\n commit Record changes to the repository\n merge Join two or more development histories together\n rebase Reapply commits on top of another base tip\n reset Reset current HEAD to the specified state\n switch Switch branches\n tag Create, list, delete or verify a tag object signed with GPG\n\ncollaborate (see also: git help workflows)\n fetch Download objects and refs from another repository\n pull Fetch from and integrate with another repository or a local branch\n push Update remote refs along with associated objects\n\n'git help -a' and 'git help -g' list available subcommands and some\nconcept guides. See 'git help <command>' or 'git help <concept>'\nto read about a specific subcommand or concept.\nSee 'git help git' for an overview of the system.\n""
08:43:22.265387 [debug] [MainThread]: STDERR: "b''"
08:43:22.272505 [debug] [MainThread]: Acquiring new databricks connection "debug"
08:43:22.273434 [debug] [MainThread]: Using databricks connection "debug"
08:43:22.273833 [debug] [MainThread]: On debug: select 1 as id
08:43:22.274044 [debug] [MainThread]: Opening a new connection, currently in state init
08:43:22.888586 [debug] [MainThread]: Databricks adapter: Error while running:
select 1 as id
08:43:22.889031 [debug] [MainThread]: Databricks adapter: Database Error
failed to connect
08:43:22.889905 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751f7eaf0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb752113040>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb7521130a0>]}
08:43:24.130154 [debug] [MainThread]: Connection 'debug' was properly closed.

Check your profiles.yml (usually found here: ~/.dbt/profiles.yml) and make sure there is:
- No https:// in the host name, and
- A leading slash in the http_path
For example:
host: server.name.com # instead of https://server.name.com
http_path: /sql/protocolv1/o/0/0000-000000-text000 # note the leading slash
These are easy mistakes to make if, like me, you copied the http_path straight out of the cluster config page and the host straight out of your browser URL.
Another example of profiles.yml is in the dbt-databricks README: https://github.com/databricks/dbt-databricks
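For reference, such a profile typically looks something like this (just a sketch; the host, http_path, and token values are placeholders to replace with your own):
databricks_cluster:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: default
      host: server.name.com                                 # no https:// prefix
      http_path: /sql/protocolv1/o/0/0000-000000-text000    # note the leading slash
      token: dapiXXXXXXXXXXXXXXXXXXXXXXXX
      connect_retries: 5
      connect_timeout: 60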

I had not specified this in the original question, but I had used conda to set up the virtual environment. Somehow that doesn't work, so I'd recommend following the tutorial to the letter and using pipenv.
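For what it's worth, a rough sketch of the pipenv-based setup (the Python version is just an example; check the tutorial for the exact steps):
pipenv --python 3.9            # create the virtual environment
pipenv install dbt-databricks  # pulls in dbt-core and the Databricks adapter
pipenv shell                   # activate the environment
dbt debug                      # re-test the connection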

Related

GitLab runner unable to clone private GitLab repositories

I'm using a shared runner that has access to the entire project group. For one particular project on my GitLab server, whose visibility is set to "private", when the runner attempts to clone the repository it fails with a 403 response.
I searched high and low in the documentation but couldn't find an explanation or a solution. I noticed that when I switched the project's visibility to internal, everything started working.
Does anyone know why GitLab runners cannot access private repositories? If I wish to grant the runner access and the ability to execute the repo's CI/CD pipeline, how can I do so?
Here are the logs:
Fetching changes with git depth set to 50...
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /builds/4q6GE3ka/0/mysecretproject/myproject/.git/
Created fresh repository.
remote: You are not allowed to download code from this project.
fatal: unable to access 'https://example.com/mysecretproject/project.git/': The requested URL returned error: 403

Is there a solution for this odd bitbake error?

When I used Yocto to build my first Linux system, after 'bitbake imx-image-multimedia' was executed I faced this odd error:
ERROR: gnu-config-native-20190501+gitAUTOINC+b98424c249-r0 do_unpack: Unpack failure for URL: 'git://git.savannah.gnu.org/config.git'. No up to date source found: clone directory not available or not up to date: /home/admin/Linux/Yocto/fsl/downloads//git2/git.savannah.gnu.org.config.git; shallow clone not enabled
ERROR: Logfile of failure stored in: /home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/temp/log.do_unpack.73483
ERROR: Task (virtual:native:/home/admin/Linux/Yocto/fsl/sources/poky/meta/recipes-devtools/gnu-config/gnu-config_git.bb:do_unpack) failed with exit code '1'
Curious about the logfile, I opened /home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/temp/log.do_unpack.73483 and I see:
DEBUG: Executing python function do_unpack
DEBUG: Executing python function base_do_unpack
DEBUG: Running 'export PSEUDO_DISABLED=1; unset _PYTHON_SYSCONFIGDATA_NAME; export SSH_AUTH_SOCK="/run/user/0/vscode-ssh-auth-sock-7925763"; export PATH="/home/admin/Linux/Yocto/fsl/sources/poky/scripts/native-intercept:/home/admin/Linux/Yocto/fsl/sources/poky/scripts:/home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/recipe-sysroot-native/usr/bin/x86_64-linux:/home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/recipe-sysroot-native/usr/bin:/home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/recipe-sysroot-native/usr/sbin:/home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/recipe-sysroot-native/usr/bin:/home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/recipe-sysroot-native/sbin:/home/admin/Linux/Yocto/fsl/build/tmp/work/x86_64-linux/gnu-config-native/20190501+gitAUTOINC+b98424c249-r0/recipe-sysroot-native/bin:/home/admin/Linux/Yocto/fsl/sources/poky/bitbake/bin:/home/admin/Linux/Yocto/fsl/build/tmp/hosttools"; export HOME="/root"; git -c core.fsyncobjectfiles=0 branch --contains b98424c249119b79d3f709e26eb86f2fd4d5e5f3 --list master 2> /dev/null | wc -l' in /home/admin/Linux/Yocto/fsl/downloads//git2/git.savannah.gnu.org.config.git
ERROR: Unpack failure for URL: 'git://git.savannah.gnu.org/config.git'. No up to date source found: clone directory not available or not up to date: /home/admin/Linux/Yocto/fsl/downloads//git2/git.savannah.gnu.org.config.git; shallow clone not enabled
DEBUG: Python function base_do_unpack finished
DEBUG: Python function do_unpack finished
What does "ERROR: Unpack failure for URL: 'git://git.savannah.gnu.org/config.git'. No up to date source found: clone directory not available or not up to date: /home/admin/Linux/Yocto/fsl/downloads//git2/git.savannah.gnu.org.config.git; shallow clone not enabled" mean?
What can I do to fix it? Thanks!
See first if this is similar to this thread
Currently, for my Yocto builds (for NXP and other boards) I have been sharing the same "downloads" DL_DIR to avoid unnecessary fetch operations.
I tried to use an empty DL_DIR...and it worked fine.
After investigating, I found out there is something wrecked in the "git2" sub-directory of DL_DIR.
I don't know what exactly.
So if you have a custom DL_DIR with a lot of stuff in it, try renaming your "git2" subdirectory to "git2.bak".
Also check whether the // seen in /home/admin/Linux/Yocto/fsl/downloads//git2/git.savannah.gnu.org.config.git comes from an empty environment variable, i.e. whether there should be an intermediate folder between downloads/ and git2.
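A minimal sketch of that rename, using the paths from the error above (adjust if your DL_DIR lives elsewhere):
mv /home/admin/Linux/Yocto/fsl/downloads/git2 /home/admin/Linux/Yocto/fsl/downloads/git2.bak
bitbake imx-image-multimedia   # re-run the failing build; the fetcher repopulates git2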
I recently encountered this problem. I had an empty do_fetch function:
do_fetch(){
    :
}
Just by removing it, the git repo was cloned properly.

How to store the downloads folder in our private repo in Yocto

After a successful "bitbake core-image-sato" build, I moved the downloads folder to my private repository, then deleted the downloads folder and fetched it back from my private repository.
I added BB_NO_NETWORK = "1" to local.conf, and when I try "bitbake core-image-sato" it fails.
NOTE: Executing RunQueue Tasks
ERROR: gnu-config-native-20150728+gitAUTOINC+b576fa87c1-r0 do_fetch: Network access disabled through BB_NO_NETWORK (or set indirectly due to use of BB_FETCH_PREMIRRORONLY) but access requested with command LANG=C git -c core.fsyncobjectfiles=0 fetch -f --prune --progress git://git.savannah.gnu.org/config.git refs/*:refs/* (for url git://git.savannah.gnu.org/config.git)
ERROR: gnu-config-native-20150728+gitAUTOINC+b576fa87c1-r0 do_fetch: Function failed: base_do_fetch
ERROR: Logfile of failure stored in: /home/jamal/test/new_repot/build/tmp/work/x86_64-linux/gnu-config-native/20150728+gitAUTOINC+b576fa87c1-r0/temp/log.do_fetch.29816
ERROR: Task (virtual:native:/home/jamal/test/new_repot/sources/poky/meta/recipes-devtools/gnu-config/gnu-config_git.bb:do_fetch) failed with exit code '1'
It is trying to fetch the source code again from the network, and since network access is disabled, it fails.
Can you please help me resolve this problem? Thanks for your time and patience.
The problem is the missing BB_GENERATE_MIRROR_TARBALLS = "1" in local.conf. Tarballs of git repositories are not created by default for performance reasons; see the manual. Setting that variable enables creation of the tarballs, so they can be reused later and the git server doesn't need to be contacted.
(Please see the comments on the question for more information; we discussed the solution there. Thanks to #md.jamal for testing it.)
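A sketch of the relevant local.conf settings (set the first one before the initial, networked build so DL_DIR also gets the tarballs):
# create tarballs of git clones in DL_DIR so they can be reused as a mirror later
BB_GENERATE_MIRROR_TARBALLS = "1"
# once DL_DIR has been restored from the private repository, force offline builds
BB_NO_NETWORK = "1"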

Can't use a different environment for puppet agent

I have an agent/master setup. I have created a new environment in /etc/puppetlabs/code/environments/ called master.
The content of environment.conf for the master directory environment is
modulepath = site:modules:$basemodulepath
manifest = manifests/site.pp
and when I try puppet agent -t --environment master I am getting this error:
Notice: Local environment: 'master' doesn't match server specified node environment 'production', switching agent to 'production'.
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for node1.localpuppet.com
Info: Applying configuration version '1490712072'
Notice: Applied catalog in 0.67 seconds
I am new to Puppet. What changes do I need to make?
[Screenshot: PE Console config]
This is a "really fun" quirk of Puppet Enterprise that showed up in the last couple of years. You have to specify the nodes in the PE Classifier that are allowed to specify their directory environment in the puppet.conf or in the puppet agent -t --environment arguments.
In the agent-specified environment tab in the Classifier (you see it at the bottom of your picture above), you can enable it for all nodes. Do this by adding a rule, selecting the name fact, using a regular expression (~), then using the regexp for matching all characters (.*). After you fill this out, the PE Classifier will give you a number of matching nodes. It should be all that are subscribed to your master. Remember to click in the bottom right to update your rules. Your nodes will now be able to use master instead of production from the config file or CLI arguments.
That being said, if you are doing this to avoid naming your default Git branch production in your control repository when working with Code Manager, you should really just rename the branch as that is much easier.
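For completeness, once the Classifier rule is in place, the agent-side setting is just the standard one, e.g. in the agent's puppet.conf (a sketch; the same value can also be passed with --environment on the command line):
[agent]
environment = master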

Perforce error - cannot submit from non-stream client

Coming from a Git and SVN background, I've set up P4V on Windows and managed to get the content of my repository into my local folder (somehow), but once I made modifications and created new files in my local folder, I can't submit them back to Perforce. It gives this error:
//depot/main/p4config.txt - warning: cannot submit from non-stream client
No files to submit.
Submit failed -- fix problems above then use 'p4 submit -c 6'.
My depot hierarchy:
C:\Perforce\kernelpanic\main>p4 client -S //depot/main -o
# A Perforce Client Specification.
#
# Client: The client name.
# Update: The date this specification was last modified.
# Access: The date this client was last used in any way.
# Owner: The user who created this client.
# Host: If set, restricts access to the named host.
# Description: A short description of the client (optional).
# Root: The base directory of the client workspace.
# AltRoots: Up to two alternate client workspace roots.
# Options: Client options:
# [no]allwrite [no]clobber [no]compress
# [un]locked [no]modtime [no]rmdir
# SubmitOptions:
# submitunchanged/submitunchanged+reopen
# revertunchanged/revertunchanged+reopen
# leaveunchanged/leaveunchanged+reopen
# LineEnd: Text file line endings on client: local/unix/mac/win/share.
# ServerID: If set, restricts access to the named server.
# View: Lines to map depot files into the client workspace.
# Stream: The stream to which this client's view will be dedicated.
# (Files in stream paths can be submitted only by dedicated
# stream clients.) When this optional field is set, the
# View field will be automatically replaced by a stream
# view as the client spec is saved.
#
# Use 'p4 help client' to see more about client views and options.
Client: kernelpanic
Update: 2012/10/04 15:35:16
Access: 2012/10/04 15:59:39
Owner: me.kernelpanic
Host: kernelpanic
Description:
Created by me.kernelpanic.
Root: C:/Perforce/kernelpanic
Options: noallwrite noclobber nocompress unlocked nomodtime normdir
SubmitOptions: submitunchanged
LineEnd: local
View:
//depot/... //kernelpanic/...
//depot/main/doc/... //kernelpanic/main/doc/...
//depot/* //kernelpanic/*
//depot/main/* //kernelpanic/main/*
If possible, I'd like to add the files in C:\Perforce\kernelpanic\main\src as well...
Please help, I can't understand Perforce. I've tried doing a checkout both before and after making modifications, but both ways failed to submit the change to the server. I'd love to stick to Git, but our client is using Perforce so we have to play nice with them. Thanks a lot for your help!
It's possible that P4V has guided you to create a stream depot and a mainline, but has somehow created a non-stream workspace for you. From what I've heard, that's likely to happen for first-time P4V users due to something in the setup.
To see if that's the problem, go to Connection/Edit Workspace and look in the 'Stream' field. Is it empty? If so, use Browse to select the stream. This will turn your current workspace into a stream workspace.
If you prefer to work with the command-line, you can add a stream to your client as follows:
Show available streams:
$ p4 streams
Stream //stream/...1
Stream //stream/...2
Stream //stream/...3
…
Check your current client config:
$ p4 client -o
...
In my case, there was no stream mentioned there. Then edit the client config:
$ p4 client
and add:
Stream: //stream/...1
Check that it worked by rerunning p4 client -o.
Now retry p4 submit.
