Workspace URL for a machine learning experiment notebook - python-3.x

I am running a machine learning experiment in Databricks and I want to obtain the workspace URL for certain uses.
I know how to obtain a notebook's workspace URL manually, as described here: https://learn.microsoft.com/en-us/azure/databricks/workspace/per-workspace-urls
Similar to how you can obtain the path of your notebook with
dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
how do I programmatically obtain the notebook's URL?

There are two values available:
Browser host name: this gives you just the host name, without the http/https scheme, but it is exactly the name you see in the browser:
dbutils.notebook.entry_point.getDbutils().notebook().getContext() \
.browserHostName().get()
API URL: the base URL, with the HTTPS scheme, that you can use to call APIs:
dbutils.notebook.entry_point.getDbutils().notebook().getContext() \
.apiUrl().get()
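Because browserHostName() returns the host without a scheme, you have to add https:// yourself to get a clickable workspace URL. A minimal sketch, assuming the workspace is served over HTTPS; workspace_url is a hypothetical helper, not part of dbutils:

```python
from urllib.parse import urlunsplit


def workspace_url(browser_host_name: str) -> str:
    """Turn the bare host from browserHostName() into an https URL.

    workspace_url is a hypothetical helper, not part of dbutils.
    """
    # Drop surrounding whitespace and any trailing slash so the host is clean.
    host = browser_host_name.strip().rstrip("/")
    if not host:
        raise ValueError("empty host name")
    # Fields are: scheme, netloc, path, query, fragment.
    return urlunsplit(("https", host, "/", "", ""))


# Inside a Databricks notebook (dbutils only exists there):
# ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
# print(workspace_url(ctx.browserHostName().get()))
```

The apiUrl() value already carries its scheme, so it needs no such conversion.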
P.S. I really prefer to convert that information into a Python dict so that it's easier to investigate and use. I use code like this:
import json
ctx = json.loads(dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().toJson())
ctx
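Once the context is a plain dict, you still need to know where a given value sits. Because the nesting of the context JSON may differ between Databricks runtime versions, a small recursive search avoids hard-coding it. A sketch; find_values is a hypothetical helper, and the sample JSON is a made-up, trimmed-down stand-in for the real toJson() output:

```python
import json
from typing import Any, Iterator


def find_values(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear in nested sections.
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


# Hypothetical, trimmed-down context JSON; the real layout may differ.
sample = '{"tags": {"browserHostName": "adb-1.2.azuredatabricks.net"}, "extraContext": {}}'
ctx = json.loads(sample)
print(list(find_values(ctx, "browserHostName")))
```

In a notebook you would pass the ctx dict built above instead of the sample.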

Related

How to delete GKE (Google Kubernetes Engine) cluster using python?

I'm new to using GKE from Python. I would like to delete my GKE (Google Kubernetes Engine) cluster using a Python script.
I found an API, delete_cluster(), in the google-cloud-container Python library to delete the GKE cluster.
https://googleapis.dev/python/container/latest/index.html
But I'm not sure how to use that API and pass the required parameters in Python. Can anyone explain it with an example?
Or is there any other way to delete a GKE cluster in Python?
Thanks in advance.
First, you'd need to configure the Python Client for Google Kubernetes Engine as explained in this section of the link you shared. Basically, set up a virtual environment and install the library with pip install google-cloud-container.
If you are running the script in an environment such as Cloud Shell, with a user that has enough access to manage GKE resources (at least the Kubernetes Engine Cluster Admin role), the client library will handle authentication for the script automatically, and the following script will most likely work:
from google.cloud import container_v1
project_id = "YOUR-PROJECT-NAME"  # Change me.
zone = "ZONE-OF-THE-CLUSTER"  # Change me.
cluster_id = "NAME-OF-THE-CLUSTER"  # Change me.
# Fully qualified cluster name: projects/{project}/locations/{zone}/clusters/{cluster}
name = f"projects/{project_id}/locations/{zone}/clusters/{cluster_id}"
client = container_v1.ClusterManagerClient()
response = client.delete_cluster(name=name)  # returns a long-running Operation
print(response)
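The name argument has the form projects/{project}/locations/{location}/clusters/{cluster}, where location is the zone for a zonal cluster or the region for a regional one. A small helper that builds and sanity-checks it; cluster_resource_name is hypothetical and not part of google-cloud-container:

```python
def cluster_resource_name(project_id: str, location: str, cluster_id: str) -> str:
    """Build the `name` argument for ClusterManagerClient.delete_cluster.

    cluster_resource_name is a hypothetical helper, not a library function.
    """
    for label, value in (("project_id", project_id),
                         ("location", location),
                         ("cluster_id", cluster_id)):
        # An empty part or a stray slash would produce a malformed resource name.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"projects/{project_id}/locations/{location}/clusters/{cluster_id}"


# Usage with the client from the script above (IDs are hypothetical):
# client.delete_cluster(name=cluster_resource_name("my-project", "europe-west1-b", "my-cluster"))
```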
Notice that, as per the delete_cluster method documentation, you only need to pass the name parameter. If for some reason you are only given the credentials (generally in the form of a JSON key file) of a service account that has enough permissions to delete the cluster, you'd need to modify the script and use the credentials parameter to authenticate the client, in a similar fashion to:
...
client = container_v1.ClusterManagerClient(credentials=credentials)
...
Here credentials is a credentials object loaded from the service account JSON key file you were given (for example with google.oauth2.service_account.Credentials.from_service_account_file("path/to/key.json")), not the filename itself. Alternatively, ClusterManagerClient.from_service_account_file("path/to/key.json") builds an authenticated client directly.
Finally, notice that the response returned by the delete_cluster method is an Operation object, which can be used to monitor the long-running operation in a similar fashion to how it is explained here, with the self_link attribute pointing to the long-running operation.
After running the script, you could use a curl command similar to:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
https://container.googleapis.com/v1/projects/[PROJECT-NUMBER]/zones/[ZONE-WHERE-THE-CLUSTER-WAS-LOCATED]/operations/operation-[OPERATION-NUMBER]
and check the status field of the response (it stays RUNNING while the deletion is in progress). You could also use the requests library, or any equivalent, to automate this check of the long-running operation within your script.
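A sketch of that automated check, using the standard library's urllib instead of requests. It assumes the operation ID is taken from response.name, that the access token comes from gcloud auth application-default print-access-token as in the curl command, and that the project may be given as an ID or number; the helper names are hypothetical. The fetch step is passed in as a function so the polling loop can be exercised without network access:

```python
import json
import time
import urllib.request
from typing import Callable


def operation_url(project: str, zone: str, operation_id: str) -> str:
    """Zonal v1 operations endpoint, same shape as the curl example above."""
    return (f"https://container.googleapis.com/v1/projects/{project}"
            f"/zones/{zone}/operations/{operation_id}")


def fetch_operation(url: str, token: str) -> dict:
    """GET the operation resource with a bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_operation(fetch: Callable[[], dict], interval: float = 10.0,
                       timeout: float = 1800.0) -> dict:
    """Call `fetch` until the operation reports status DONE or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        op = fetch()
        if op.get("status") == "DONE":
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still {op.get('status')!r} after {timeout}s")
        time.sleep(interval)


# Hypothetical IDs; token is the output of the gcloud command above.
# url = operation_url("my-project", "europe-west1-b", response.name)
# op = wait_for_operation(lambda: fetch_operation(url, token))
```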
This page contains an example of the command you are trying to run.
Some more details that are required for the command to succeed:
Your environment needs to have the authentication environment variables set; this page contains instructions for how to do that.
Once your environment is successfully authenticated, you can run the delete cluster command like so:
from google.cloud import container_v1
client = container_v1.ClusterManagerClient()
# Replace the <...> placeholders with your project, location and cluster name.
response = client.delete_cluster(name="projects/<project>/locations/<location>/clusters/<cluster>")

How to connect Google Datastore from a script in Python 3

We want to do some work with the data in Google Datastore. We already have a database, and we would like to use Python 3 to handle the data and run queries from a script on our development machines. What would be the easiest way to accomplish this?
From the official documentation:
You will need to install the Cloud Datastore client library for Python:
pip install --upgrade google-cloud-datastore
Set up authentication by creating a service account and setting an environment variable. The official documentation walks through this step, so please take a look at it for more details. You can perform this step using either the GCP console or the command line.
Then you will be able to create a Cloud Datastore client and use it, as in the example below:
# Imports the Google Cloud client library
from google.cloud import datastore
# Instantiates a client
datastore_client = datastore.Client()
# The kind for the new entity
kind = 'Task'
# The name/ID for the new entity
name = 'sampletask1'
# The Cloud Datastore key for the new entity
task_key = datastore_client.key(kind, name)
# Prepares the new entity
task = datastore.Entity(key=task_key)
task['description'] = 'Buy milk'
# Saves the entity
datastore_client.put(task)
print('Saved {}: {}'.format(task.key.name, task['description']))
As @JohnHanley mentioned, you will find a good example in the Bookshelf app tutorial, which uses Cloud Datastore to store persistent data and metadata for books.
You can create a service account, download its credentials as a JSON file, and then set an environment variable called GOOGLE_APPLICATION_CREDENTIALS pointing to that JSON file. You can see the details at the link below.
https://googleapis.dev/python/google-api-core/latest/auth.html
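When authentication fails, the cause is often that the variable is unset or points at the wrong file. A small pre-flight check can catch that before the client is created. A sketch, assuming the key file is a standard service account key (which has "type": "service_account" and a "client_email" field); check_credentials_env is a hypothetical helper:

```python
import json
import os
from pathlib import Path


def check_credentials_env(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account email from the key file named by `var`.

    Hypothetical helper: raises a readable error if the variable is unset,
    the file is missing, or the JSON is not a service account key.
    """
    path = os.environ.get(var)
    if not path:
        raise RuntimeError(f"{var} is not set")
    key_file = Path(path)
    if not key_file.is_file():
        raise RuntimeError(f"{var} points to a missing file: {key_file}")
    data = json.loads(key_file.read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        raise RuntimeError(f"{key_file} is not a service account key (type={data.get('type')!r})")
    return data.get("client_email", "")


# Call it before datastore.Client() to fail early with a clear message:
# print("Authenticating as", check_credentials_env())
```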

What is the suggested method to get service versions?

What is the best way to get the list of service versions in the Google App Engine flexible environment (from a service instance, in Python 3)? I want to authenticate using a service account JSON key file. I need to find the current default version (the one receiving most of the traffic).
Is there any library I can use, like googleapiclient.discovery or google.appengine.api.modules? Or should I build it from scratch and call the REST API apps.services.versions.list using OAuth? I couldn't find any information in the Google docs.
https://cloud.google.com/appengine/docs/standard/python3/python-differences#cloud_client_libraries
Finally I was able to solve it. Simple things on GAE became big problems.
SOLUTION:
I have the path to service_account.json set in the GOOGLE_APPLICATION_CREDENTIALS environment variable. Then you can use google.auth.default:
from googleapiclient.discovery import build
import google.auth
APPLICATION_ID = "your-app-id"  # placeholder: your App Engine application (project) ID
SERVICE_ID = "default"  # placeholder: the service whose traffic split you want
# Reads the key file named in GOOGLE_APPLICATION_CREDENTIALS.
creds, project = google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform.read-only'])
service = build('appengine', 'v1', credentials=creds, cache_discovery=False)
data = service.apps().services().get(appsId=APPLICATION_ID, servicesId=SERVICE_ID).execute()
print(data['split']['allocations'])
The return value is the allocations dictionary, with version IDs as keys and the fraction of traffic each version receives (a number between 0 and 1) as values.
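To answer the original question (which version is the default, i.e. receives most of the traffic), pick the key with the largest fraction. A minimal sketch; default_version is a hypothetical helper:

```python
def default_version(allocations: dict) -> str:
    """Return the version ID with the largest traffic fraction.

    Hypothetical helper; on a tie, max() keeps the first version it saw.
    """
    if not allocations:
        raise ValueError("no traffic allocations")
    return max(allocations, key=allocations.get)


# With the response from the snippet above:
# print(default_version(data['split']['allocations']))
```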
All the best!
You can use Google's Python Client Library to interact with the Google App Engine Admin API in order to get the list of a GAE service's versions.
Once you have google-api-python-client installed, you might want to use the list method to list all services in your application:
list(appsId, pageSize=None, pageToken=None, x__xgafv=None)
The arguments of the method should include the following:
appsId: string, Part of `name`. Name of the resource requested. Example: apps/myapp. (required)
pageSize: integer, Maximum results to return per page.
pageToken: string, Continuation token for fetching the next page of results.
x__xgafv: string, V1 error format. Allowed values: v1 error format, v2 error format
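You can find more information on this method in the link mentioned above. To list the versions of a particular service, the analogous method is apps().services().versions().list(), which additionally takes a servicesId argument.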
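Because list returns results one page at a time, collecting everything means following nextPageToken until it is absent. A sketch of that loop; iter_items is a hypothetical helper, and the page-fetching function is passed in so the loop does not depend on the client library:

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict],
               items_key: str) -> Iterator[dict]:
    """Yield items from every page, following nextPageToken until it runs out."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get(items_key, [])
        token = page.get("nextPageToken")
        if not token:
            return


# Usage with googleapiclient (APPLICATION_ID and SERVICE_ID are placeholders):
# def fetch_versions(token):
#     kwargs = {"appsId": APPLICATION_ID, "servicesId": SERVICE_ID}
#     if token:
#         kwargs["pageToken"] = token
#     return service.apps().services().versions().list(**kwargs).execute()
# for version in iter_items(fetch_versions, "versions"):
#     print(version["id"])
```

For services instead of versions, call apps().services().list(appsId=...) and use "services" as items_key.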

How to download via URL from DBFS in Azure Databricks

It is documented here that I should be able to download a file from the Databricks File System (DBFS) using a URL like:
https://<your-region>.azuredatabricks.net?o=######/files/my-stuff/my-file.txt
But when I try to download it from the URL with my own "o=" parameter similar to this:
https://westeurope.azuredatabricks.net/?o=1234567890123456/files/my-stuff/my-file.txt
it only gives the following error:
HTTP ERROR: 500
Problem accessing /. Reason:
java.lang.NumberFormatException: For input string:
"1234567890123456/files/my-stuff/my-file.txt"
Am I using the wrong URL or is the documentation wrong?
I already found a similar question that was answered, but that one does not seem to fit the Azure Databricks documentation and might apply to AWS Databricks:
Databricks: Download a dbfs:/FileStore File to my Local Machine?
Thanks in advance for your help
The URL should be:
https://westeurope.azuredatabricks.net/files/my-stuff/my-file.txt?o=1234567890123456
Note that the file must be in the FileStore folder (dbfs:/FileStore/...); the /files/ part of the URL maps to that folder.
As a side note, I've been working on a tool called DBFS Explorer to help with things like this, if you would like to give it a try:
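The mapping from a DBFS path to the download URL is mechanical: strip the FileStore prefix, put the rest after /files/, and append the workspace ID as the o query parameter. A sketch; filestore_download_url is a hypothetical helper, and it assumes the host and workspace ID are the ones you already use in the browser:

```python
from urllib.parse import quote


def filestore_download_url(host: str, dbfs_path: str, workspace_id: str) -> str:
    """Map a dbfs:/FileStore/... path to its browser download URL.

    Hypothetical helper: only files under FileStore are served from /files/.
    """
    for prefix in ("dbfs:/FileStore/", "/FileStore/"):
        if dbfs_path.startswith(prefix):
            relative = dbfs_path[len(prefix):]
            break
    else:
        raise ValueError("only files under /FileStore can be downloaded this way")
    # quote() keeps "/" but escapes spaces and other unsafe characters.
    return f"https://{host}/files/{quote(relative)}?o={workspace_id}"


print(filestore_download_url("westeurope.azuredatabricks.net",
                             "dbfs:/FileStore/my-stuff/my-file.txt",
                             "1234567890123456"))
```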
https://datathirst.net/projects/dbfs-explorer/

Introduce new data source in Terraform

I am new to Terraform and have been trying to understand its constructs. Let's say I have a service that exposes REST APIs and I want to call those APIs as part of my Terraform script. What are the steps I need to take?
My understanding is that I need to write a custom provider, but I am unable to connect the dots on how to add a new data source type for the new provider.
Also, assuming we do have the required provider, what protocol would be used to communicate with my service? Is it HTTP(S)?
One more point to note: my service is currently used for configuring storage in the backend.
Recent versions of Terraform (> 0.9, I believe) support external data sources, so you don't have to create a custom provider. You can call any shell or Python script that returns values you can use as data:
data "external" "example" {
program = ["python", "${path.module}/example-data-source.py"]
query = {
# arbitrary map from strings to strings, passed
# to the external program as the data query.
id = "abc123"
}
}
In your case, you could use a simple curl call in a bash script to call your endpoint and return the data to Terraform as a map of strings.
Do note the warnings at the top of that page.
This is considerably more difficult than it appears: it is impossible to debug the interaction between what Terraform sends to my script and what the script expects. It just fails to parse the arguments and gives me no feedback about what is getting into the program.
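The external data source protocol is small: Terraform writes the query map to the program's stdin as a JSON object, and the program must print a JSON object whose values are all strings to stdout; on failure it should print a message to stderr and exit with a non-zero status. A minimal sketch of example-data-source.py under that protocol, using urllib instead of curl. API_BASE and the shape of the service's JSON response are hypothetical stand-ins for your storage service:

```python
import json
import sys
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical endpoint of the storage service; replace with the real REST API.
API_BASE = "https://storage-config.example.com/api/volumes/"


def http_get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def handle(query: dict, fetch: Callable[[str], dict]) -> dict:
    """Turn Terraform's query into the flat string map Terraform expects."""
    data = fetch(API_BASE + urllib.parse.quote(query["id"], safe=""))
    # Only string values are allowed, so JSON-encode everything else;
    # nested objects survive as strings that jsondecode() can unpack.
    return {k: v if isinstance(v, str) else json.dumps(v) for k, v in data.items()}


def main() -> None:
    try:
        query = json.load(sys.stdin)  # the `query` block, as a JSON object
        result = handle(query, http_get_json)
    except Exception as exc:
        # Terraform reports stderr when the program exits non-zero.
        print(f"example-data-source: {exc}", file=sys.stderr)
        sys.exit(1)
    json.dump(result, sys.stdout)


if __name__ == "__main__":
    main()
```

The values are then available in the configuration as data.external.example.result["some_key"].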
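One way to get visibility into what Terraform actually passes in is to log the raw stdin to a file before parsing it, and to report parse failures on stderr with a non-zero exit so Terraform shows them in its error message. A sketch of such a helper; read_query, the log location, and the EXTERNAL_DS_LOG variable name are all hypothetical:

```python
import json
import os
import sys
import tempfile
from datetime import datetime, timezone

# Hypothetical log location; override with the EXTERNAL_DS_LOG environment variable.
LOG_PATH = os.environ.get("EXTERNAL_DS_LOG",
                          os.path.join(tempfile.gettempdir(), "external-ds.log"))


def read_query(stream=sys.stdin, log_path: str = LOG_PATH) -> dict:
    """Read Terraform's query, log the raw text, and fail loudly on bad input."""
    raw = stream.read()
    # Append a timestamped copy of exactly what arrived on stdin.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{datetime.now(timezone.utc).isoformat()} stdin={raw!r}\n")
    try:
        query = json.loads(raw)
    except json.JSONDecodeError as exc:
        print(f"could not parse query {raw!r}: {exc}", file=sys.stderr)
        sys.exit(1)
    if not isinstance(query, dict):
        print(f"expected a JSON object, got {type(query).__name__}", file=sys.stderr)
        sys.exit(1)
    return query


# In the data source script, use read_query() in place of json.load(sys.stdin),
# then inspect the log file after running terraform plan.
```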
