I have a project for which I want to be able to run some entry points on databricks. I used dbx for that, having the following deployment.yaml file:
build:
  python: "poetry"
environments:
  default:
    workflows:
      - name: "test"
        existing_cluster_id: "my-culster-id"
        spark_python_task:
          python_file: "file://tests/test.py"
I'm able to run the test script with the execute command:
poetry run dbx execute --cluster-id=my-culster-id test
My problem with this option is that it launches the script interactively and I can't really retrieve the executed code on Databricks, except by looking at the cluster's logs.
So I tried using the deploy and launch commands, such that a proper job is created and run on Databricks.
poetry run dbx deploy test && poetry run dbx launch test
However the job run fails with the following error, which I don't understand:
Run result unavailable: job failed with error message
Library installation failed for library due to user error. Error messages:
'Manage' permissions are required to modify libraries on a cluster
In any case, what do you think is the best way to run a job that can be traced on Databricks from my local machine?
I've recently added a documentation section on the differences of execute and launch, would that be something that answers your question?
I would like to run Spark from source code on my Windows machine. I did the following steps:
git clone https://github.com/apache/spark
Added the SPARK_HOME variable into the user variables.
Added %SPARK_HOME%\bin to the PATH variable.
./build/mvn -DskipTests clean package
./bin/spark-shell
The last command returns the following error:
What should I do to fix the error?
First, refer to the link below for the solution. The top voted answer gave me the working script for this problem.
Failed to start master for Spark in Windows
The reason is that the Spark launch scripts do not support Windows. The Spark documentation (https://spark.apache.org/docs/1.2.0/spark-standalone.html) says that Windows users must start the master and workers manually. So you need to start the master first and then run spark-shell.
I am trying to create a secret scope in a Databricks notebook. The notebook is running using a cluster created by my company's admin - I don't have access to create or edit clusters. I'm following the instructions in the Databricks user notebooks (https://docs.databricks.com/user-guide/secrets/example-secret-workflow.html#example-secret-workflow) but get an error:
/bin/bash: databricks: command not found
Below is the code I've tried that returns the error:
%sh -e
databricks secrets create-scope --scope scopename
%sh is used so I can run shell commands in the notebook. I've tried using
%sh
and also
%sh -e
no luck.
I should be able to create a secret scope using this code but have had no luck. Any suggestions on the cause of this? Has anyone else had the same issue?
I've not heard of running the CLI from the cluster before. Even if it is installed I doubt it is configured.
You can download the CLI and run it from your local machine: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html
You will need to be running Python locally. If you prefer there is also a PowerShell command-line (disclaimer I produced this): https://github.com/DataThirstLtd/azure.databricks.cicd.tools
Databricks clusters don't have databricks-cli installed by default. That doesn't mean you can't install it on the cluster. You can install databricks-cli using the following command in any databricks notebook:
%sh
/databricks/python/bin/pip install databricks-cli==0.9.1
Logging in may be a problem, as you can't respond to interactive prompts from shell commands within notebooks. You can create the .databrickscfg file in the home directory (~) on the cluster using the following set of commands:
%sh
> ~/.databrickscfg
echo "[DEFAULT]" >> ~/.databrickscfg
echo "host = <your host>" >> ~/.databrickscfg
echo "token = <your token>" >> ~/.databrickscfg
You can save these commands as shell scripts that can be run automatically on cluster start up (init scripts).
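The echo commands above can also be collapsed into a single here-document. This is only a minimal sketch: the helper name write_databricks_cfg is hypothetical (it is not part of databricks-cli), the host and token are the same placeholders as above, and the optional third argument exists only so the target path can be overridden.

```shell
#!/bin/sh
# Hypothetical helper (not part of databricks-cli): writes a [DEFAULT] profile.
# $1 = workspace URL, $2 = personal access token, $3 = target file (optional).
write_databricks_cfg() {
    cfg_host="$1"
    cfg_token="$2"
    cfg_file="${3:-$HOME/.databrickscfg}"
    # Tighten the umask in a subshell so a newly created file is readable
    # only by its owner; the file will hold a token.
    ( umask 077
      cat > "$cfg_file" <<EOF
[DEFAULT]
host = $cfg_host
token = $cfg_token
EOF
    )
}
```

It would be called as write_databricks_cfg "<your host>" "<your token>", which overwrites ~/.databrickscfg just like the commands above.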
I faced the same issue with a notebook.
If you have to run Databricks CLI commands on your Databricks instance, the easiest way should be to use the web terminal.
You can launch the web terminal from compute->Clusters->Apps->Launch Web Terminal
If the CLI is not installed, you can use pip install databricks-cli
Configure the user with the command databricks configure or databricks configure --token
Now you are good to run Databricks CLI commands
Here's a sample run on databricks web terminal which worked for me:
One other cause of the error (/bin/bash: databricks: command not found) that I noticed on my Mac, and that is not listed here, is that the user path is not exported. Add this to your bash profile file or just run the command: export PATH="......(path to your python library)/Library/Python/3.9/bin"
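Note that an export written exactly like the one above replaces PATH rather than extending it. A small sketch of appending instead, assuming a hypothetical helper name append_to_path; the directory passed in is whatever bin folder your Python installation uses.

```shell
#!/bin/sh
# Hypothetical helper: add a directory to PATH only if it is not already listed.
append_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;  # keep existing entries, add the new one last
    esac
}
```

For example, append_to_path "/path/to/python/bin" (a placeholder path) in the bash profile keeps the existing entries, so other commands remain available.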
I created a new gocd pipeline and have three shell script files to run on different stages.
The problem is the go agent doesn't know npm.
Note: I have npm installed on the machine with go agent and I manually run the shell script from the pipeline.
Here is my shell script to install the packages.
#!/bin/sh
npm install
The error:
01:34:43.674 [go] Start to execute task: <exec command="./install.sh" />.
01:34:43.680 ./install.sh: line 3: npm: command not found
01:34:43.814 [go] Current job status: failed.
Problem
Assuming you have npm/nodejs installed on the agent, the problem probably lies in the fact the user doesn't have its PATH environment variable configured to look into the folder npm was installed in.
Solution
1) You can specify the whole path (/usr/bin/npm) when creating a task.
2) You can edit the .bashrc/.bash_profile of the user running the gocd agent server. In that case you'll be able to call 'npm' without the path prepended.
Example Working Configuration
Consider modification of the agent init script. Changing .bashrc/.bash_profile of the user running the gocd agent does not work because the go agent insulates itself from the calling environment. So on our systems we add these PATH items to the go agent startup scripts. (I use puppet to create agents. The default agent init scripts are not that good - you need to own them.)
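Whichever place the PATH ends up being set, it can help to make the script itself report what the agent actually sees. A minimal sketch, assuming a hypothetical helper name require_cmd added to install.sh:

```shell
#!/bin/sh
# Hypothetical helper: fail early, and print the PATH the agent actually sees,
# when a required tool cannot be found.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found; PATH is: $PATH" >&2
    return 1
}

# Example use at the top of install.sh:
#   require_cmd npm || exit 1
#   npm install
```

If the printed PATH lacks the folder npm was installed in, that confirms the agent's environment differs from the login shell's.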
I am running a CODED-UI test as a command in my Jenkins workflow. This command works when executed in the server machine's cmd window but fails when executed through Jenkins with the error.
"Error calling Initialization method for test class xxx.xx.xx.CodedUITest.CodedUITest3: Microsoft.VisualStudio.TestTools.UITest.Extension.UITestException: To run tests that interact with the desktop, you must set up the test agent to run as an interactive process. For more information, see "How to: Set Up Your Test Agent to Run Tests That Interact with the Desktop" (http://go.microsoft.com/fwlink/?LinkId=255012)
If you are running the tests as part of your team build, you must also set up the build agent to run as an interactive process. For more information, see "How to: Configure and Run Scheduled Tests After Building Your Application" (http://go.microsoft.com/fwlink/?LinkId=254735)" .
I installed the test agents and selected the option to run them as an interactive desktop process, but the error still persists. The user ID provided in the test agent is the same user ID used for calling the command.
Trying to figure out what else I am missing.
Command used : C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\MStest.exe /testcontainer:"E:\workspace\Microsoft\xxx\Publish\Test2.orderedtest"
You need to install a test agent. You have a Jenkins server, and from there you are triggering the Coded UI tests. For Coded UI tests to run, you have to install a test agent and specify it as the place where your tests run. That agent should have VSTS installed on it. You can have the server and agent on the same machine if needed.
I'm new to this area and was trying to run the following commands from jenkins:
npm install
grunt quickStart
So far I have Jenkins running on a Windows machine as a Windows service, and I've also installed the NodeJS plugin for Jenkins.
However, I'm stuck and quite confused by the instructions here. It's asking me to add one or more NodeJS installations, but I could not find those settings and I'm not even sure I need them in the first place.
Here is the bit that asks me to do this:
I cannot see this setting for the Jenkins job I create. Is there an easy way to run those commands in Jenkins from a .bat or .sh script? A .bat would be preferred since I'm on a Windows machine.
Note : I've already checked out a project using git in jenkins!!!
Thanks
"It's asking me to add one or more NodeJS installations, but I could not find those settings and I'm not even sure I need them in the first place."
I don't think you need that. I have pointed Jenkins to the Node installation folder and nothing more. For this, go to Manage Jenkins->Configure System->NodeJS->NodeJS installations, type in any name you like, and point it to the Node home folder.
...cannot see this setting for the jenkins job I create...
Once you have configured that in your Jenkins configuration, you should have that configuration available like so:
...Is there an easy way to run those commands in Jenkins from a .bat or .sh script? A .bat would be preferred since I'm on a Windows machine
I'm sorry, I don't understand which commands you are referring to...
so summarizing :
you have to tell jenkins where you have your node installation
use that configuration in your jobs
hope this helps...
The way I did it was through Execute shell. Since the build tool for Node.js is npm, I simply wrote a shell script that tells Jenkins to run npm install in the workspace directory where it clones the git repository and then, if successful, to zip the package and move it to another folder.
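A script along those lines might look roughly like this. It is only a sketch, not the original script: the function name build_and_archive and the archive name package.tar.gz are hypothetical, and it uses tar rather than zip so that it depends only on tools usually present in a POSIX shell environment.

```shell
#!/bin/sh
# Hypothetical helper: run a build command in the workspace, then archive the
# workspace and move the archive to another folder only if the build succeeded.
# $1 = workspace dir, $2 = destination dir, remaining args = build command.
build_and_archive() {
    workspace="$1"
    dest="$2"
    shift 2
    # Default to "npm install" when no build command is given.
    [ "$#" -gt 0 ] || set -- npm install
    # Create the archive outside the workspace so tar does not pack it into itself.
    archive="$(mktemp -d)/package.tar.gz"
    ( cd "$workspace" && "$@" ) || return 1
    tar -czf "$archive" -C "$workspace" . || return 1
    mkdir -p "$dest" && mv "$archive" "$dest/"
}
```

In an Execute shell build step this could be called as build_and_archive "$WORKSPACE" /path/to/artifacts, where the destination is a placeholder and WORKSPACE is the variable Jenkins sets to the job's checkout directory.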