I am learning to use Azure Machine Learning. It has its own Notebooks (which are OK!) and it also lets me use Jupyter Notebook and VS Code.
However, I am wondering if there is a way to use Spyder efficiently with Azure Machine Learning.
E.g., I was able to install RStudio as a custom application from a Docker image, following the steps provided here: Stackoverflow link
Spyder supports connecting to a remote Python kernel; it does, however, require SSH.
You can enable SSH on your Compute Instance (see below), but only when you first set it up. Also, many companies have policies against enabling SSH, so this might not work for you. If it doesn't, I can highly recommend VSCode.
I had been working on a GCP AI Notebook for the past couple of weeks when I got a '524' error. I followed the troubleshooting instructions here. I connected to the notebook instance via SSH and restarted the Jupyter service. I can now open JupyterLab, but I can't find any of my work! Here is the JupyterLab screenshot. I searched for the files using the Terminal in JupyterLab as well as the Cloud Shell, but found nothing. It looks as if my instance had been wiped clean.
Please help; I've lost all the code I have been working on for the past couple of weeks.
Based on the terminal output, it seems you are using a container-based instance.
This means you have a base OS with a Docker container running the JupyterLab service on top. I would be interested to know which Docker image you are running. Is it a Deep Learning Container?
By default (if you are using Deep Learning Containers), files are stored in /home/jupyter, and this folder is mapped to the local disk, so you can check whether anything is inside it. Do you have something there?
You can SSH into the instance and verify which container is running and which parameters were passed:
sudo docker ps --no-trunc
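If the notebooks were written to the mapped volume, a quick way to look for them from a terminal on the instance is a short Python sweep. This is only an illustrative sketch; the /home/jupyter path is the usual Deep Learning Containers mount, and you may want to widen the search root if nothing turns up:

```python
import pathlib

def find_notebooks(root):
    """Return the paths of all .ipynb files found under root."""
    return [str(p) for p in pathlib.Path(root).rglob("*.ipynb")]

# Sweep the default Deep Learning Containers mount, falling back to the
# home directory if that path does not exist on this machine.
root = pathlib.Path("/home/jupyter")
if not root.exists():
    root = pathlib.Path.home()
for path in find_notebooks(root):
    print(path)
```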
I am trying to run a python3 program continuously on GCP. What is the best way to do this?
So far I have tried using a Google Compute Engine virtual machine running Debian Linux. I used nohup, but the program still stops when the SSH connection is broken.
What other ways could I try to run the program through the vm? Are there better alternatives using GCP to run the program continuously?
Python installation depends on the operating system; the documentation [1] can help you run a Python program on Linux or Windows without any trouble.
On the other hand, Google App Engine applications [2] are easy to create, easy to maintain, and easy to scale as your traffic and data-storage needs change. With App Engine, there are no servers to maintain: you simply upload your application, and it is easy to operate. The documentation [3] may also be helpful.
To learn more about Python on Google Cloud Platform, please see the documentation [4].
[1] https://cloud.google.com/python/setup#installing_python
[2] https://codelabs.developers.google.com/codelabs/cloud-app-engine-python3/#0
[3] https://cloud.google.com/python/getting-started
[4] https://cloud.google.com/python
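Whichever environment you choose, it also helps to make the program itself behave well as a long-running process. A minimal sketch (names, log format, and the work loop are illustrative) of a loop that logs its progress and exits cleanly on SIGTERM, which is what systemd or `kill` will send:

```python
import logging
import signal
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

_running = True

def _handle_term(signum, frame):
    # SIGTERM is the standard shutdown signal; flip the flag so the
    # loop can finish its current unit of work and exit cleanly.
    global _running
    _running = False

signal.signal(signal.SIGTERM, _handle_term)

def main_loop(iterations=None):
    """Run units of work until SIGTERM (or `iterations` runs, for testing)."""
    done = 0
    while _running and (iterations is None or done < iterations):
        logging.info("processing one unit of work")  # replace with real work
        done += 1
        time.sleep(0.1)
    logging.info("shutting down cleanly")
    return done

if __name__ == "__main__":
    main_loop()
```

Run under systemd (or even nohup) this survives the SSH session ending, because the shutdown signal is handled rather than tied to the terminal.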
I am looking to use Databricks Connect for developing a pyspark pipeline. DBConnect is really awesome because I am able to run my code on the cluster where the actual data resides, so it's perfect for integration testing, but I also want to be able, during development and unit testing (pytest with pytest-spark), to simply use a local Spark environment.
Is there any way to configure DBConnect so for one use-case I simply use a local Spark environment, but for another it uses DBConnect?
My 2 cents, since I've been doing this type of development for some months now:
Work with two Python environments: one with databricks-connect (and thus, no pyspark installed), and another one with only pyspark installed. When you want to execute the tests, just activate the "local" virtual environment and run pytest as usual. Make sure, as some commenters pointed out, that you are initializing the pyspark session using SparkConf().setMaster("local").
PyCharm helps immensely when switching between environments during development. I am always on the "local" venv by default, but whenever I want to execute something using databricks-connect, I just create a new Run configuration from the menu. Easy peasy.
Also, be aware of some of databricks-connect's limitations:
It is no longer officially supported, and Databricks recommends moving towards dbx whenever possible.
UDFs just won't work in databricks-connect.
MLflow integration is not reliable. In my use case, I am able to download and use models, but I am unable to log a new experiment or track models using the Databricks tracking URI. This might depend on your Databricks Runtime, MLflow, and local Python versions.
I am trying to figure out the best way to use a local IDE, such as Microsoft Visual Studio Code, with distributed computing power. Currently, we are bringing data locally, but that doesn't seem like a sustainable solution, for reasons such as future data growth, cloud data security, etc. One workaround we thought of is to tunnel into EC2 instances, but I would like to hear the best way to solve this in a machine learning / data science environment (we are using Databricks and AWS services).
I'm not sure why you are connecting the IDE to the compute directly. I have used VS Code for running scripts against an HDInsight cluster. Before I fire my scripts, I configure the cluster against which they are going to run. The same is true for Databricks.
I am very impressed with IPython Notebook, and I'd like to use it more extensively. My question has to do with secure data. I know only very little about networking. If I use IPython Notebook, is the data sent out over the web to a remote server? Or is it all contained locally? I am not talking about setting up a common resource for multiple access points, just using the data on my machine as I would with SAS or R.
Thanks
If you run the notebook on your machine, then no, it doesn't send anything externally. There are sites like Wakari where you can use the IPython notebook that's running on a server, and obviously that will send your code and data to their servers.
If you did want to expose your notebook server on the internet, then there are security measures that you should take, but that's not necessary if you're just running ipython notebook locally, which is the default way it starts up.
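If you want to verify this yourself, a locally-bound notebook server only answers on the loopback interface. A quick stdlib sketch that checks whether anything is accepting connections on a given host and port (8888 is the notebook server's default port; adjust as needed):

```python
import socket

def is_listening(host, port):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket() as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

# A notebook server started the default way answers on 127.0.0.1
# but not on the machine's external address.
print(is_listening("127.0.0.1", 8888))
```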