How do I access Databricks Repos metadata?

Is there a way to access data such as Repo url and Branch name inside a notebook within a Repo? Perhaps something in dbutils.

You can use the Repos API for that, specifically the Get command. You can extract the notebook path from the notebook context available via dbutils, and then do two queries:
1. Get the repo ID by path via the Workspace API (a repo path always consists of three components: /Repos, a directory (per-user or custom), and the actual repository name).
2. Fetch the repo data.
Something like this:
import json
import requests

# The notebook context contains the notebook path plus the API URL and token
ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
notebook_path = ctx['extraContext']['notebook_path']
# A repo path always has 3 components: /Repos/<directory>/<repo name>
repo_path = '/'.join(notebook_path.split('/')[:4])
api_url = ctx['extraContext']['api_url']
api_token = ctx['extraContext']['api_token']

# 1. Look up the repo ID by path via the Workspace API
repo_dir_data = requests.get(
    f"{api_url}/api/2.0/workspace/get-status",
    headers={"Authorization": f"Bearer {api_token}"},
    params={"path": repo_path}).json()
repo_id = repo_dir_data['object_id']

# 2. Fetch the repo data via the Repos API
repo_data = requests.get(
    f"{api_url}/api/2.0/repos/{repo_id}",
    headers={"Authorization": f"Bearer {api_token}"}).json()
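The repo URL and branch are then fields of the response; for example (field names per the Repos API Get response):

# repo_data is the JSON body returned by the Repos Get call
print(repo_data['url'])     # remote Git repository URL
print(repo_data['branch'])  # currently checked-out branch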

Related

import json as variable in shell script on windows and linux?

I want to read data from a JSON file into Azure DevOps as a variable.
I saw this post:
import Azure Devops pipeline variables from json file
But the answer there is only PowerShell-compatible. Is there any solution that works with both Linux and Windows agents?

First, PowerShell works not only on Windows but also on Linux:
Install PowerShell on Linux
Second, for your requirement, I wrote a demo YAML pipeline for you (based on Python):
1. Get the value at a specific route in the JSON file and set it as a pipeline variable.
trigger:
- none

pool:
  vmImage: ubuntu-latest

steps:
- task: PythonScript@0
  inputs:
    scriptSource: 'inline'
    script: |
      import json

      def get_json_content_of_specific_route(json_file_path, json_route):
          length = len(json_route)
          # open the json file
          with open(json_file_path, 'r') as f:
              data = json.load(f)
          # walk down to data[json_route[0]][json_route[1]]...[json_route[length-1]]
          content = data
          for i in range(length):
              content = content[json_route[i]]
          return content

      json_file_path = 'Json_Files/parameters.json'  # path of the json file
      route = ['parameters', 'secretsPermissions', 'value']  # route of the content you want to get
      content = get_json_content_of_specific_route(json_file_path, route)
      print(content)
      # logging command that sets the pipeline variable
      print("##vso[task.setvariable variable=myJobVar]"+str(content))
- powershell: |
    Write-Host $(myJobVar)  # the content is available after the logging command has set the variable
Variables read from JSON this way are temporary: they exist only for that pipeline run. See "using logging command to set the variables" for details.
(Screenshots of the repository structure and of the variable being set and read during the pipeline run are omitted.)
2. If you want permanent variables, there are two situations.
First, for a classic pipeline, you need these REST APIs:
Get pipeline definition
Change pipeline definition
A Python script that changes the variables of a classic pipeline:
import json
import base64
import requests

org_name = "xxx"
project_name = "xxx"
pipeline_definition_id = "xxx"
personal_access_token = "xxx"

key = 'variables'
var_name = 'BUILDNUMBER'

# Azure DevOps basic auth expects base64(":<PAT>")
authorization = str(base64.b64encode(bytes(':'+personal_access_token, 'ascii')), 'ascii')

url = "https://dev.azure.com/"+org_name+"/"+project_name+"/_apis/build/definitions/"+pipeline_definition_id+"?api-version=6.0"
headers = {
    'Authorization': 'Basic '+authorization
}

# Get the current pipeline definition
response = requests.request("GET", url, headers=headers)
print(response.text)
json_content = response.text

def get_content_of_json(json_content, key, var_name):
    data = json.loads(json_content)
    return data[key][var_name].get('value')

def change_content_of_json(json_content, key, var_name):
    data = json.loads(json_content)
    data[key][var_name]['value'] = str(int(get_content_of_json(json_content, key, var_name)) + 1)
    return data

# Increment the variable and PUT the updated definition back
json_data = change_content_of_json(json_content, key, var_name)
headers2 = {
    'Authorization': 'Basic '+authorization,
    'Content-Type': 'application/json'
}
response2 = requests.request("PUT", url, headers=headers2, data=json.dumps(json_data))
Second, if your pipeline is a YAML (non-classic) pipeline, you need to change the variables section of the YAML definition itself.
After changing the YAML definition during a pipeline run, you can follow this to push the changes back:
Push Back Changes to Repository
Also, you can base your pipeline on a variable group and update the group via the REST API:
Variablegroups - Update
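A minimal sketch of that variable-group route, under some assumptions (the 6.0-preview.2 API version, a group containing a BUILDNUMBER variable, and placeholder org/project/group-id/PAT values); the usual pattern is GET the group, change a value, and PUT the whole group back:

import json
import base64
import requests

org_name = "xxx"
project_name = "xxx"
group_id = "xxx"
personal_access_token = "xxx"

authorization = str(base64.b64encode(bytes(':'+personal_access_token, 'ascii')), 'ascii')
url = f"https://dev.azure.com/{org_name}/{project_name}/_apis/distributedtask/variablegroups/{group_id}?api-version=6.0-preview.2"
headers = {
    'Authorization': 'Basic '+authorization,
    'Content-Type': 'application/json'
}

# GET the group, update one variable, PUT the group back
group = requests.get(url, headers=headers).json()
group['variables']['BUILDNUMBER']['value'] = str(int(group['variables']['BUILDNUMBER']['value']) + 1)
response = requests.put(url, headers=headers, data=json.dumps(group))
print(response.status_code)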

How to create Wiki Subpages in Azure Devops thru Python?

My Azure DevOps page will look like this (screenshot omitted).
I have 4 pandas dataframes.
I need to create 4 subpages in the Azure DevOps wiki, one from each dataframe: say, Sub1 from the first dataframe, Sub2 from the second, and so on.
The result should appear in tab form (screenshot omitted).
Is it possible to create subpages through the API?
I have referenced the following docs but am unable to make sense of them. Any inputs will be helpful. Thanks.
https://github.com/microsoft/azure-devops-python-samples/blob/main/API%20Samples.ipynb
https://learn.microsoft.com/en-us/rest/api/azure/devops/wiki/pages/create%20or%20update?view=azure-devops-rest-6.0
To create a wiki subpage, use the Pages - Create Or Update API and specify the path as pagename/subpagename. Regarding how to use the API from Python, you can use the Azure DevOps Python API; its implementation is below for reference:
def create_or_update_page(self, parameters, project, wiki_identifier, path, version, comment=None, version_descriptor=None):
    """CreateOrUpdatePage.
    [Preview API] Creates or edits a wiki page.
    :param :class:`<WikiPageCreateOrUpdateParameters> <azure.devops.v6_0.wiki.models.WikiPageCreateOrUpdateParameters>` parameters: Wiki create or update operation parameters.
    :param str project: Project ID or project name
    :param str wiki_identifier: Wiki ID or wiki name.
    :param str path: Wiki page path.
    :param String version: Version of the page on which the change is to be made. Mandatory for `Edit` scenario. To be populated in the If-Match header of the request.
    :param str comment: Comment to be associated with the page operation.
    :param :class:`<GitVersionDescriptor> <azure.devops.v6_0.wiki.models.GitVersionDescriptor>` version_descriptor: GitVersionDescriptor for the page. (Optional in case of ProjectWiki).
    :rtype: :class:`<WikiPageResponse> <azure.devops.v6_0.wiki.models.WikiPageResponse>`
    """
    route_values = {}
    if project is not None:
        route_values['project'] = self._serialize.url('project', project, 'str')
    if wiki_identifier is not None:
        route_values['wikiIdentifier'] = self._serialize.url('wiki_identifier', wiki_identifier, 'str')
    query_parameters = {}
    if path is not None:
        query_parameters['path'] = self._serialize.query('path', path, 'str')
    if comment is not None:
        query_parameters['comment'] = self._serialize.query('comment', comment, 'str')
    if version_descriptor is not None:
        if version_descriptor.version_type is not None:
            query_parameters['versionDescriptor.versionType'] = version_descriptor.version_type
        if version_descriptor.version is not None:
            query_parameters['versionDescriptor.version'] = version_descriptor.version
        if version_descriptor.version_options is not None:
            query_parameters['versionDescriptor.versionOptions'] = version_descriptor.version_options
    additional_headers = {}
    if version is not None:
        additional_headers['If-Match'] = version
    content = self._serialize.body(parameters, 'WikiPageCreateOrUpdateParameters')
    response = self._send(http_method='PUT',
                          location_id='25d3fbc7-fe3d-46cb-b5a5-0b6f79caf27b',
                          version='6.0-preview.1',
                          route_values=route_values,
                          query_parameters=query_parameters,
                          additional_headers=additional_headers,
                          content=content)
    response_object = models.WikiPageResponse()
    response_object.page = self._deserialize('WikiPage', response)
    response_object.eTag = response.headers.get('ETag')
    return response_object
For more details, refer to:
https://github.com/microsoft/azure-devops-python-api/blob/451cade4c475482792cbe9e522c1fee32393139e/azure-devops/azure/devops/v6_0/wiki/wiki_client.py#L107

I was able to achieve this with the REST API:
import requests
import base64
import pandas as pd

pat = 'TO BE FILLED BY YOU'  # CONFIDENTIAL
authorization = str(base64.b64encode(bytes(':'+pat, 'ascii')), 'ascii')
headers = {
    'Accept': 'application/json',
    'Authorization': 'Basic '+authorization
}

df = pd.read_csv('sf_metadata.csv')  # metadata of 3 tables
df.set_index('TABLE_NAME', inplace=True, drop=True)
df_test1 = df.loc['CURRENCY']
x1 = df_test1.to_html()  # convert to HTML to preserve the tabular structure

# JSON body for the PUT request
SamplePage1 = {
    "content": x1
}

# API call to the Azure DevOps wiki
response = requests.put(
    url="https://dev.azure.com/xxx/yyy/_apis/wiki/wikis/yyy.wiki/pages?path=SamplePag2&api-version=6.0",
    headers=headers, json=SamplePage1)
print(response.text)
Based on #usr_lal123's answer, here is a function that can update a wiki page or create it if it doesn't exist:
import requests
import base64

pat = ''  # Personal Access Token to be created by you
authorization = str(base64.b64encode(bytes(':'+pat, 'ascii')), 'ascii')

def update_or_create_wiki_page(organization, project, wikiIdentifier, path):
    # Check whether the page exists by performing a GET
    headers = {
        'Accept': 'application/json',
        'Authorization': 'Basic '+authorization
    }
    response = requests.get(
        url=f"https://dev.azure.com/{organization}/{project}/_apis/wiki/wikis/{wikiIdentifier}/pages?path={path}&api-version=6.0",
        headers=headers)

    # An existing page returns an ETag in its response, which is required when updating it
    if response.ok:
        headers['If-Match'] = response.headers['ETag']

    pageContent = {
        "content": "[[_TOC_]] \n ## Section 1 \n normal text"
                   + "\n ## Section 2 \n [ADO link](https://azure.microsoft.com/en-us/products/devops/)"
    }
    response = requests.put(
        url=f"https://dev.azure.com/{organization}/{project}/_apis/wiki/wikis/{wikiIdentifier}/pages?path={path}&api-version=6.0",
        headers=headers, json=pageContent)
    print("response.text: ", response.text)

Access Locust Host attribute - Locust 1.0.0+

I had previously asked and solved the problem of dumping stats using an older version of Locust, but the setup and teardown methods were removed in Locust 1.0.0, and now I'm unable to get the host (base URL).
I'm looking to print out some information about requests after they've run. Following the docs at https://docs.locust.io/en/stable/extending-locust.html, I have a request_success listener inside my sequential task set; some rough sample code below:
import json
import uuid

from locust import SequentialTaskSet, events, task

class SearchSequentialTest(SequentialTaskSet):
    @task
    def search(self):
        path = '/search/tomatoes'
        headers = {"Content-Type": "application/json"}
        unique_identifier = uuid.uuid4()
        data = {
            "name": f"Performance-{unique_identifier}",
        }
        with self.client.post(
            path,
            data=json.dumps(data),
            headers=headers,
            catch_response=True,
        ) as response:
            json_response = json.loads(response.text)
            self.items = json_response['result']['payload'][0]['uuid']
            print(json_response)

    @events.request_success.add_listener
    def my_success_handler(request_type, name, response_time, response_length, **kw):
        print(f"Successfully made a request to: {self.host}/{name}")  # fails: self is not defined here
But I cannot access self.host, and if I remove it I only get a relative URL.
How do I access the base_url inside a TaskSet's event hooks?

You can do it by referencing the User class variable directly in your request handler:
print(f"Successfully made a request to: {YourUser.host}/{name}")
Or you can use absolute URLs in your test (task) like this:

with self.client.post(
    self.user.host + path,
    ...

Then you'll get the full URL in your request listener.
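Putting the first option together, a minimal sketch (the user class and host below are placeholders, and request_success is the pre-2.0 event name used in the question):

from locust import HttpUser, events

class YourUser(HttpUser):
    host = "https://api.example.com"  # placeholder base URL
    tasks = [SearchSequentialTest]

@events.request_success.add_listener
def my_success_handler(request_type, name, response_time, response_length, **kw):
    # reference the class attribute rather than self.host
    print(f"Successfully made a request to: {YourUser.host}{name}")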

How to send a GraphQL query to AppSync from python?

How do we post a GraphQL request through AWS AppSync using boto?
Ultimately I'm trying to mimic a mobile app accessing our stackless/CloudFormation stack on AWS, but with Python, not JavaScript or Amplify.
The primary pain point is authentication; I've tried a dozen different ways already. This is the current one, which generates a "401" response with "UnauthorizedException" and "Permission denied"; that's actually pretty good considering some of the other messages I've had. I'm now using the aws_requests_auth library to do the signing part. I assume it authenticates me using the stored ~/.aws/credentials from my local environment, or does it?
I'm a little confused as to where and how Cognito identities and pools will come into it, e.g. say I wanted to mimic the sign-up sequence?
Anyway, the code looks pretty straightforward; I just don't grok the authentication.
import requests
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

APPSYNC_API_KEY = 'inAppsyncSettings'
APPSYNC_API_ENDPOINT_URL = 'https://aaaaaaaaaaaavzbke.appsync-api.ap-southeast-2.amazonaws.com/graphql'

headers = {
    'Content-Type': "application/graphql",
    'x-api-key': APPSYNC_API_KEY,
    'cache-control': "no-cache",
}

query = """{
    GetUserSettingsByEmail(email: "john@washere"){
        items {name, identity_id, invite_code}
    }
}"""

def test_stuff():
    # Use the library to generate auth headers.
    auth = BotoAWSRequestsAuth(
        aws_host='aaaaaaaaaaaavzbke.appsync-api.ap-southeast-2.amazonaws.com',
        aws_region='ap-southeast-2',
        aws_service='appsync')

    # Create an http graphql request.
    response = requests.post(
        APPSYNC_API_ENDPOINT_URL,
        json={'query': query},
        auth=auth,
        headers=headers)
    print(response)

    # this didn't work:
    # response = requests.post(APPSYNC_API_ENDPOINT_URL, data=json.dumps({'query': query}), auth=auth, headers=headers)
Yields:

{
  "errors" : [ {
    "errorType" : "UnauthorizedException",
    "message" : "Permission denied"
  } ]
}
It's quite simple, once you know. There are some things I didn't appreciate:

I've assumed IAM authentication (OpenID is appended way below).
There are a number of ways for AppSync to handle authentication. We're using IAM, so that's what I need to deal with; yours might be different.

Boto doesn't come into it.
We want to issue a request like any regular client; they don't use boto, and neither do we. Trawling the AWS boto docs was a waste of time.

Use the AWS4Auth library.
We are going to send a regular HTTP request to AWS, so while we can use Python requests, the requests need to be authenticated by attaching headers. And, of course, AWS auth headers are special and different from all others. You can try to work out how to build them yourself, or you can go looking for someone else who has already done it: aws_requests_auth, the one I started with, probably works just fine, but I ended up with AWS4Auth. There are many others of dubious value; none endorsed or provided by Amazon (that I could find).

Specify appsync as the service.
What service are we calling? I didn't find any examples of anyone doing this anywhere; all the examples are trivial S3 or EC2 or even EB, which left uncertainty. Should we be talking to the api-gateway service? What's more, you feed this detail into the AWS4Auth routine as authentication data. Obviously, in hindsight, the request hits AppSync, so it will be authenticated by AppSync; specify "appsync" as the service when putting together the auth headers.
It comes together as:
import requests
from requests_aws4auth import AWS4Auth

# Use AWS4Auth to sign a requests session
session = requests.Session()
session.auth = AWS4Auth(
    # An AWS 'ACCESS KEY' associated with an IAM user.
    'AKxxxxxxxxxxxxxxx2A',
    # The 'secret' that goes with the above access key.
    'kwWxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxgEm',
    # The region you want to access.
    'ap-southeast-2',
    # The service you want to access.
    'appsync'
)

# As found in AWS AppSync under Settings for your endpoint.
APPSYNC_API_ENDPOINT_URL = ('https://nqxxxxxxxxxxxxxxxxxxxke'
                            '.appsync-api.ap-southeast-2.amazonaws.com/graphql')

# Use a JSON format string for the query. It does not need reformatting.
query = """
query foo {
    GetUserSettings (
        identity_id: "ap-southeast-2:8xxxxxxb-7xx4-4xx4-8xx0-exxxxxxx2"
    ){
        user_name, email, whatever
    }}"""

# Now we can simply post the request...
response = session.request(
    url=APPSYNC_API_ENDPOINT_URL,
    method='POST',
    json={'query': query}
)
print(response.text)
Which yields
# Your answer comes as a JSON formatted string in the text attribute, under data.
{"data":{"GetUserSettings":{"user_name":"0xxxxxxx3-9102-42f0-9874-1xxxxx7dxxx5"}}}
Getting credentials
To get rid of the hardcoded key/secret you can consume the local AWS ~/.aws/config and ~/.aws/credentials, and it is done this way...

import boto3
import requests
from requests_aws4auth import AWS4Auth

# Use AWS4Auth to sign a requests session
session = requests.Session()
credentials = boto3.session.Session().get_credentials()
session.auth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    boto3.session.Session().region_name,
    'appsync',
    session_token=credentials.token
)
...<as above>
This does seem to respect the AWS_PROFILE environment variable for assuming different roles.
Note that STS.get_session_token is not the way to do it, as it may try to assume a role from a role, depending on where it keyword-matched the AWS_PROFILE value. Profiles in the credentials file work because the keys are right there, but profiles found only in the config file do not, as those assume a role already.
OpenID
In this scenario, all the complexity is transferred to the conversation with the OpenID Connect provider. The hard part is all the auth hoops you jump through to get an access token, and then using the refresh token to keep it alive; that is where all the real work lies.
Once you finally have an access token, and assuming you have configured the "OpenID Connect" authorization mode in AppSync, you can very simply drop the access token into the header:
response = requests.post(
    url="https://nc3xxxxxxxxxx123456zwjka.appsync-api.ap-southeast-2.amazonaws.com/graphql",
    headers={"Authorization": ACCESS_TOKEN},
    json={'query': "query foo{GetStuff{cat, dog, tree}}"}
)
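As for where Cognito comes into it: if your API instead uses the "Amazon Cognito User Pools" authorization mode, one way to obtain such a token is boto3's cognito-idp client. A minimal sketch, assuming an app client that allows USER_PASSWORD_AUTH (the client id, credentials, and endpoint below are placeholders):

import boto3
import requests

idp = boto3.client('cognito-idp', region_name='ap-southeast-2')
tokens = idp.initiate_auth(
    ClientId='YOUR_APP_CLIENT_ID',  # placeholder
    AuthFlow='USER_PASSWORD_AUTH',
    AuthParameters={'USERNAME': 'john@example.com', 'PASSWORD': 'secret'},
)['AuthenticationResult']

response = requests.post(
    url='https://example.appsync-api.ap-southeast-2.amazonaws.com/graphql',
    headers={'Authorization': tokens['IdToken']},  # or AccessToken, depending on your setup
    json={'query': 'query foo{GetStuff{cat, dog, tree}}'}
)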
You can set up an API key on the AppSync end and use the code below. This works for my case.

import requests

# establish a requests session
session = requests.Session()

# As found in AWS AppSync under Settings for your endpoint.
APPSYNC_API_ENDPOINT_URL = 'https://vxxxxxxxxxxxxxxxxxxy.appsync-api.ap-southeast-2.amazonaws.com/graphql'

# set up the query string (optional)
query = """query listItemsQuery {listItemsQuery {items {correlation_id, id, etc}}}"""

# Now we can simply post the request...
response = session.request(
    url=APPSYNC_API_ENDPOINT_URL,
    method='POST',
    headers={'x-api-key': '<APIKEYFOUNDINAPPSYNCSETTINGS>'},
    json={'query': query}
)
print(response.json()['data'])
Building off Joseph Warda's answer, you can use the class below to send AppSync commands.

# fileName: AppSyncLibrary.py
import requests

class AppSync():
    def __init__(self, data):
        self.APPSYNC_API_ENDPOINT_URL = data["endpoint"]
        self.api_key = data["api_key"]
        self.session = requests.Session()

    def graphql_operation(self, query, input_params):
        response = self.session.request(
            url=self.APPSYNC_API_ENDPOINT_URL,
            method='POST',
            headers={'x-api-key': self.api_key},
            json={'query': query, 'variables': {"input": input_params}}
        )
        return response.json()
For example, in another file within the same directory:

from AppSyncLibrary import AppSync

APPSYNC_API_ENDPOINT_URL = {YOUR_APPSYNC_API_ENDPOINT}
APPSYNC_API_KEY = {YOUR_API_KEY}

init_params = {"endpoint": APPSYNC_API_ENDPOINT_URL, "api_key": APPSYNC_API_KEY}
app_sync = AppSync(init_params)

mutation = """mutation CreatePost($input: CreatePostInput!) {
    createPost(input: $input) {
        id
        content
    }
}
"""

input_params = {
    "content": "My first post"
}

response = app_sync.graphql_operation(mutation, input_params)
print(response)
Note: This requires you to activate API access for your AppSync API. Check this AWS post for more details.
graphql-python/gql supports AWS AppSync since version 3.0.0rc0.
It supports queries, mutations, and even subscriptions on the realtime endpoint.
The documentation is available here.
Here is an example of a mutation using API key authentication:
import asyncio
import os
import sys
from urllib.parse import urlparse

from gql import Client, gql
from gql.transport.aiohttp import AIOHTTPTransport
from gql.transport.appsync_auth import AppSyncApiKeyAuthentication

# Uncomment the following lines to enable debug output
# import logging
# logging.basicConfig(level=logging.DEBUG)

async def main():
    # Should look like:
    # https://XXXXXXXXXXXXXXXXXXXXXXXXXX.appsync-api.REGION.amazonaws.com/graphql
    url = os.environ.get("AWS_GRAPHQL_API_ENDPOINT")
    api_key = os.environ.get("AWS_GRAPHQL_API_KEY")

    if url is None or api_key is None:
        print("Missing environment variables")
        sys.exit()

    # Extract host from url
    host = str(urlparse(url).netloc)

    auth = AppSyncApiKeyAuthentication(host=host, api_key=api_key)

    transport = AIOHTTPTransport(url=url, auth=auth)

    async with Client(
        transport=transport, fetch_schema_from_transport=False,
    ) as session:
        query = gql(
            """
            mutation createMessage($message: String!) {
                createMessage(input: {message: $message}) {
                    id
                    message
                    createdAt
                }
            }"""
        )
        variable_values = {"message": "Hello world!"}

        result = await session.execute(query, variable_values=variable_values)
        print(result)

asyncio.run(main())
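If you need IAM rather than an API key, gql also ships an IAM authentication class; a minimal sketch, assuming gql>=3.0 with botocore installed and credentials taken from the default boto3 chain:

import boto3
from gql.transport.appsync_auth import AppSyncIAMAuthentication

boto3_session = boto3.session.Session()
auth = AppSyncIAMAuthentication(
    host=host,  # same host extracted from the url as above
    credentials=boto3_session.get_credentials(),
    region_name=boto3_session.region_name,
)
transport = AIOHTTPTransport(url=url, auth=auth)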
I am unable to add a comment due to low rep, but I just want to add that I tried the accepted answer and it didn't work: I was getting an error saying my session token is invalid, probably because I was using AWS Lambda.
I got it to work pretty much exactly as written, but by adding the session token parameter to the AWS4Auth object. Here's the full piece:
import os

import requests
from requests_aws4auth import AWS4Auth

def AppsyncHandler(event, context):
    # These are env vars that are always present in an AWS Lambda function.
    # If not using AWS Lambda, you'll need to add them manually to your env.
    access_id = os.environ.get("AWS_ACCESS_KEY_ID")
    secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
    session_token = os.environ.get("AWS_SESSION_TOKEN")
    region = os.environ.get("AWS_REGION")

    # Your AppSync endpoint
    api_endpoint = os.environ.get("AppsyncConnectionString")
    resource = "appsync"

    session = requests.Session()
    session.auth = AWS4Auth(access_id,
                            secret_key,
                            region,
                            resource,
                            session_token=session_token)

The rest is the same.
Hope this helps everyone.

import requests
import json
import os
from dotenv import load_dotenv

load_dotenv(".env")

class AppSync(object):
    def __init__(self, data):
        self.APPSYNC_API_ENDPOINT_URL = data["endpoint"]
        self.api_key = data["api_key"]
        self.session = requests.Session()

    def graphql_operation(self, query, input_params):
        response = self.session.request(
            url=self.APPSYNC_API_ENDPOINT_URL,
            method='POST',
            headers={'x-api-key': self.api_key},
            json={'query': query, 'variables': {"input": input_params}}
        )
        return response.json()

def main():
    APPSYNC_API_ENDPOINT_URL = os.getenv("APPSYNC_API_ENDPOINT_URL")
    APPSYNC_API_KEY = os.getenv("APPSYNC_API_KEY")
    init_params = {"endpoint": APPSYNC_API_ENDPOINT_URL, "api_key": APPSYNC_API_KEY}
    app_sync = AppSync(init_params)

    # Note: despite the original variable name, this is a query, not a mutation
    operation = """
    query MyQuery {
        getAccountId(id: "5ca4bbc7a2dd94ee58162393") {
            _id
            account_id
            limit
            products
        }
    }
    """
    input_params = {}
    response = app_sync.graphql_operation(operation, input_params)
    print(json.dumps(response, indent=3))

main()

Add headers and payload in requests.put

I am trying to use requests in a Python 3 script to update a deploy key in a GitLab project with write access. Unfortunately, I am receiving a 404 error when trying to connect via the requests module. The code is below:

project_url = str(url) + '/deploy_keys/' + str(DEPLOY_KEY_ID)
headers = {"PRIVATE-TOKEN": "REDACTED"}
payload = {"can_push": "true"}
r = requests.put(project_url, headers=headers, json=payload)
print(r)

Is there something I am doing wrong in the syntax of my private token/headers?
I have gone through the GitLab API and requests documentation, and I have confirmed that my private token works outside of the script.
I am expecting it to update the deploy key with write access, but am receiving a 404 upon exit, making me think the issue is with the headers/auth.
This is resolved; it was actually not an auth problem. You must use the project ID instead of the project URL, for example:

project_url = f'https://git.REDACTED.com/api/v4/projects/{project_id}/deploy_keys/{DEPLOY_KEY_ID}'
headers = {"PRIVATE-TOKEN": "REDACTED"}
payload = {'can_push': 'true'}
try:
    r = requests.put(project_url, headers=headers, data=payload)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print(err)
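If you don't have the numeric project ID handy, GitLab also accepts the URL-encoded "namespace/project" path wherever a project ID is expected, so you can look the ID up first (placeholder names below):

import requests

project_id = requests.get(
    'https://git.REDACTED.com/api/v4/projects/mygroup%2Fmyproject',
    headers={'PRIVATE-TOKEN': 'REDACTED'},
).json()['id']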
