Can I have more than one connection in databricks-connect? - databricks

I have set up a miniconda Python environment on my PC, installed the databricks-connect package in it, and configured the tool with databricks-connect configure to connect to the Databricks instance I use when developing code in the US.
I now need to connect to a different Databricks instance for developing code in the EU, and I thought I could do this by setting up a second miniconda environment, installing databricks-connect in that environment, and pointing its configuration at the new Databricks instance.
Alas, this did not work. When I run databricks-connect configure in either miniconda environment, I see the same configuration in both, which is whatever I configured last.
My question therefore is: Is there a way to have multiple databricks-connect connections at the same time and toggle between the two without having to reconfigure each time?
Thank you for your time.

Right now, databricks-connect relies on a single central configuration file, and this causes problems. There are two approaches to work around that:
Use environment variables, as described in the documentation, but they still have to be set somewhere, and you also need different Python environments for different versions of databricks-connect
Specify the parameters as Spark configuration (see the same documentation); a sketch of this is shown below
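For the second approach, a minimal sketch in Python: the spark.databricks.service.* property names and the DATABRICKS_* environment variable names are the ones documented for databricks-connect (double-check them against the docs for your version), and the host, token, and cluster ID values are placeholders.

from pyspark.sql import SparkSession

# Approach 2: pass the connection parameters as Spark configuration instead of
# relying on ~/.databricks-connect. All values below are placeholders.
spark = (
    SparkSession.builder
    .config("spark.databricks.service.address", "https://<eu-workspace>.cloud.databricks.com")
    .config("spark.databricks.service.token", "<personal-access-token>")
    .config("spark.databricks.service.clusterId", "<cluster-id>")
    .config("spark.databricks.service.port", "15001")
    .getOrCreate()
)

# Approach 1 is the same idea via environment variables set before starting Python:
# DATABRICKS_ADDRESS, DATABRICKS_API_TOKEN, DATABRICKS_CLUSTER_ID,
# DATABRICKS_ORG_ID, DATABRICKS_PORT.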
For each Databricks cluster, do the following:
create a separate Python environment with name <name> & activate it
install databricks-connect into it
configure databricks-connect
move ~/.databricks-connect to ~/.databricks-connect-<name>
write a wrapper script that activates the Python environment & symlinks ~/.databricks-connect-<name> to ~/.databricks-connect (I have such a script for Zsh; it would be too long to paste here, but a minimal sketch of the symlink part follows)
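A minimal sketch of the symlink part of such a wrapper, in Python rather than Zsh; activating the matching conda environment is still done separately, and the profile name is whatever <name> you used above.

import sys
from pathlib import Path

def use_profile(name: str) -> None:
    """Point ~/.databricks-connect at the saved ~/.databricks-connect-<name>."""
    home = Path.home()
    target = home / f".databricks-connect-{name}"   # per-workspace config saved above
    link = home / ".databricks-connect"             # the file databricks-connect reads
    if not target.exists():
        sys.exit(f"No saved configuration at {target}")
    if link.is_symlink() or link.exists():
        link.unlink()                               # remove the old link/file
    link.symlink_to(target)                         # switch to the chosen workspace
    print(f"Now using {target}")

if __name__ == "__main__":
    use_profile(sys.argv[1])                        # e.g. python switch_db.py eu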

Related

Upgrading MariaDB on AWS Linux machine

I have a Moodle site which runs on a Linux AWS box and I'm trying to upgrade it. I need to have MariaDB 10.3 on there, and I currently have 10.2.10.
I've followed the instructions for upgrading using yum from this webpage https://www.ryadel.com/en/mariadb-10-upgrade-10-3-without-losing-data-how-to/ and all goes fine until I get to the Running Transaction Check step, at which point I get the following:
Transaction check error:
file /etc/my.cnf from install of MariaDB-common-10.3.27-1.el7.centos.x86_64 conflicts with file from package mariadb-config-3:10.2.10-2.amzn2.0.3.x86_64
file /usr/lib64/libmysqlclient.so.18 from install of MariaDB-compat-10.3.27-1.el7.centos.x86_64 conflicts with file from package mariadb-libs-3:10.2.10-2.amzn2.0.3.x86_64
I'm not sure what to do now. Any help or pointers would be appreciated.
EC2 is not designed specifically for databases
You seem to be installing and running your database on EC2 (what you call a Linux AWS box). This means you can SSH into the instance, install software manually, carry out updates, edit configuration files and settings, and so on.
RDS is designed for databases
RDS also has other really convenient features, like automatic version upgrades and maintenance window management.
If your situation allows, I would suggest using a service designed for databases instead of having to configure things manually. It will save you a lot of time and troubleshooting, and it is also more secure.
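For reference, creating a managed MariaDB 10.3 instance is a single API call. A minimal sketch with boto3, assuming AWS credentials are already configured; the identifier, instance class, credentials, and maintenance window are placeholders, and the engine version should be a 10.3.x version available in your region.

import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="moodle-db",           # hypothetical name
    Engine="mariadb",
    EngineVersion="10.3",                       # pick a specific 10.3.x offered in your region
    DBInstanceClass="db.t3.small",              # placeholder size
    AllocatedStorage=20,                        # GiB
    MasterUsername="admin",
    MasterUserPassword="<choose-a-password>",
    AutoMinorVersionUpgrade=True,               # the automatic upgrades mentioned above
    PreferredMaintenanceWindow="sun:03:00-sun:04:00",
)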

Execute databricks magic command from PyCharm IDE

With databricks-connect we can successfully run code written for Databricks or Databricks notebooks from many IDEs. Databricks has also created many magic commands to support multi-language use in each cell, via commands like %sql or %md. One issue I am currently facing when I try to execute Databricks notebooks in PyCharm is as follows:
How to execute a Databricks-specific magic command from PyCharm.
E.g.
Importing a script or notebook is done in Databricks using this command:
%run
'./FILE_TO_IMPORT'
Whereas in an IDE, from FILE_TO_IMPORT import XYZ works.
Also, every time I download a Databricks notebook, it comments out the magic commands, which makes the notebook impossible to use anywhere outside the Databricks environment.
It's really inefficient to convert all the Databricks magic commands every time I want to do any development.
Is there any configuration I could set that automatically detects Databricks-specific magic commands?
Any solution to this will be helpful. Thanks in advance!
Unfortunately, as of databricks-connect version 6.2.0:
"We cannot use magic commands outside the Databricks environment directly. This would require creating custom functions, but again, that will only work for Jupyter, not PyCharm."
Again, since importing .py files requires the %run magic command, this also becomes a major issue. A workaround is to convert the set of files to be imported into a Python package, add it to the cluster via the Databricks UI, and then import and use it in PyCharm. But this is a very tedious process.
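A minimal sketch of that packaging workaround; the package and module names here are placeholders.

# setup.py for the shared code you would otherwise pull in with %run
from setuptools import setup, find_packages

setup(
    name="shared_notebook_code",     # hypothetical package name
    version="0.1.0",
    packages=find_packages(),        # picks up shared_notebook_code/*.py
)

# Build a wheel with:  python setup.py bdist_wheel
# Upload the wheel to the cluster as a library via the Databricks UI, then the
# same import works both in a notebook and in PyCharm:
#   from shared_notebook_code.file_to_import import XYZ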

possibility of using several virtual environments simultaneously

Hello, I noticed that some of the libraries I use are needed for development but also for analysis, and sometimes I have to use both. Is there a way to use a base virtual environment, as in the templates, and can I use the two libraries simultaneously from time to time?
Yes.
Create two virtual environments.
Call them v1 and v2.
Open two terminals.
First terminal: activate v1 and install your library 1.
Second terminal: activate v2 and install your library 2.
Yes, of course you can use multiple virtual environments. You need to create the virtual envs first. If you're already familiar with anaconda envs, you can create virtual envs with them; otherwise you can create venvs the normal way.
After creating them, open two terminals, one for each venv, and run your scripts.
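A small sketch of that setup using the standard-library venv module; v1/v2 and library1/library2 are placeholders, and on Windows the pip paths would be v1\Scripts\pip and v2\Scripts\pip.

import subprocess
import venv

# Create the two isolated environments.
for name in ("v1", "v2"):
    venv.create(name, with_pip=True)

# Install a different library into each environment via that environment's own pip.
subprocess.check_call(["v1/bin/pip", "install", "library1"])   # placeholder package
subprocess.check_call(["v2/bin/pip", "install", "library2"])   # placeholder package

# Day to day you would just activate one environment per terminal, as described above:
#   source v1/bin/activate   (terminal 1)
#   source v2/bin/activate   (terminal 2)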

pyspark installation on windows 10 fails

I installed Spark according to all the tutorials I could find on the internet and set up all the environment variables, yet I am still not able to launch it. Please see the attached report.
Make sure your environment variables are set up properly for the Spark home and path, for example:
SPARK_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
PATH += D:\Spark\spark-2.3.0-bin-hadoop2.7\bin
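A quick sanity check from Python that the variables are visible and that a local session starts; this assumes pyspark is importable in the environment you launch it from, and the app name is just a placeholder.

import os
from pyspark.sql import SparkSession

# Both of these should reflect the Spark install configured above.
print(os.environ.get("SPARK_HOME"))
print(os.environ.get("PATH"))

# If the variables are right, a local session should come up without errors.
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
print(spark.version)
spark.stop()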

Change Python path in CDH for pyspark

I need to change the Python that is being used with my CDH 5.5.1 cluster. My research pointed me to setting PYSPARK_PYTHON in spark-env.sh. I tried that manually without success. I then used Cloudera Manager to set the variable in both the 'Spark Service Environment Advanced Configuration Snippet' and the 'Spark Service Advanced Configuration Snippet', and about everywhere else that referenced spark-env.sh. This hasn't worked and I'm at a loss as to where to go next.
You need to add the PYSPARK_PYTHON variable to the YARN configuration:
YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve)
Do that, restart the cluster, and you are good to go.
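The value you put in that safety valve is just an environment variable assignment, e.g. PYSPARK_PYTHON=/opt/anaconda/bin/python (the path is only a placeholder). After the restart, a minimal check that the driver and executors picked up the new interpreter, assuming a working pyspark session:

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

driver_python = sys.executable                          # interpreter on the driver
executor_python = (
    spark.sparkContext
    .parallelize([0], 1)
    .map(lambda _: __import__("sys").executable)        # interpreter on an executor
    .collect()[0]
)
print("driver:", driver_python)
print("executor:", executor_python)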
