Azure ML Studio Pipeline run under Service Principal - azure-machine-learning-service

I have a draft pipeline created inside the Azure Machine Learning service workspace (Designer mode). I try to run the pipeline from Python using the Azure ML Python SDK. It starts but quickly fails on the second step.
Trace from step:
Traceback (most recent call last):
File "invoker.py", line 81, in <module>
execute(args)
File "invoker.py", line 71, in execute
ret = run(generate_run_command(args))
File "invoker.py", line 52, in run
return subprocess.Popen(command, stdout=sys.stdout, stderr=sys.stderr).wait(timeout=timeout)
File "/azureml-envs/azureml_b05af1507517824d92fd90bb8ce7897a/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/azureml-envs/azureml_b05af1507517824d92fd90bb8ce7897a/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: ''
When I submit a job for the drafted pipeline in the UI there is no problem. When I submit a job for the same draft pipeline from the Python SDK, it fails with "Permission denied" on the second step, which is actually "Apply SQL Transformation"; the first step is Import Dataset. When I resubmit the failed job from the UI there is also no problem. It is clear that the problem is with the service principal. I granted all possible permissions to the SP for the workspace. It didn't help. Has anybody had luck running an Azure ML drafted pipeline from Python?

The service principal method works for me when running an Azure ML drafted pipeline from Python. I am using Python 3.7 and azureml-sdk 1.47.0.
My code
import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication
from dotenv import load_dotenv

load_dotenv()

def getMLWorkspace():
    # Connect to the Azure ML service workspace using a service principal
    svcpr = ServicePrincipalAuthentication(
        tenant_id=os.environ['TENANT_ID'],
        service_principal_id=os.environ['SERVICE_PRINCIPAL_ID'],
        service_principal_password=os.environ['SERVICE_PRINCIPAL_PWD'])

    subscription_id = os.environ['SUBSCRIPTION_ID']
    resource_group = os.environ['RESOURCE_GROUP']
    workspace_name = os.environ['WORKSPACE_NAME']

    ws = Workspace(
        subscription_id=subscription_id,
        resource_group=resource_group,
        workspace_name=workspace_name,
        auth=svcpr)

    print('Workspace configuration succeeded')
    return ws
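The snippet above only shows the workspace connection. For completeness, here is a minimal sketch of how a run could then be submitted with that workspace, assuming the draft has been published as a pipeline; the pipeline id and experiment name below are placeholders, not values from the question:

from azureml.pipeline.core import PublishedPipeline

ws = getMLWorkspace()

# Placeholder id - replace with the id of your published pipeline
published_pipeline = PublishedPipeline.get(workspace=ws, id='<PIPELINE_ID>')

# Submit the pipeline as a new run under an experiment and wait for it to finish
run = published_pipeline.submit(ws, experiment_name='my-experiment')
run.wait_for_completion(show_output=True)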

Related

403 permission error when executing from command line client on Bigquery

I have set up gcloud on my local system. I am using Python 3.7 to insert records into a BigQuery dataset situated in projectA. I try it from the command-line client with the project set to projectA. The first command I give is to get authenticated:
gcloud auth login
Then I start Python 3 in interactive mode and give the following commands:
from googleapiclient.discovery import build
from google.cloud import bigquery
import json
body = {...}  # placeholder - pass the JSON payload dict here
bigquery = build('bigquery', 'v2', cache_discovery=False)
bigquery.tabledata().insertAll(projectId="projectA",datasetId="compute_reports",tableId="compute_snapshot",body=body).execute()
I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/googleapiclient/http.py", line 915, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://bigquery.googleapis.com/bigquery/v2/projects/projectA/datasets/compute_reports/tables/compute_snapshot/insertAll?alt=json returned "Access Denied: Table projectA:compute_reports.compute_snapshot: User does not have bigquery.tables.updateData permission for table projectA:compute_reports.compute_snapshot."
I am executing it as a user with the Owner and BigQuery Data Owner roles on the project, and I also added Data Editor on the dataset, which includes these permissions:
bigquery.tables.update
bigquery.datasets.update
Still, I am getting this error.
Why, with my credentials, am I still not able to execute the insert in BigQuery?
The error lies in the permissions: the service account used by the Python runtime (the default service account set in the bash profile) did not have BigQuery Data Editor access for projectA. Once I granted that access, it started working.
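For reference, granting that access from the command line looks roughly like this; the service account email below is a placeholder, not a value from the question:

gcloud projects add-iam-policy-binding projectA \
    --member="serviceAccount:YOUR_SA@projectA.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataEditor"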

Roles for queue.yaml deployment with service account

I'm trying to deploy our app.yaml and queue.yaml using the following command:
gcloud --verbosity=debug --project PROJECT_ID app deploy app.yaml queue.yaml
I created a new service account with the roles
App Engine Deployer
App Engine Service Admin
Cloud Build Service Account
for deploying the app.yaml, which works by itself. When trying to deploy the queue.yaml, I get the following error:
DEBUG: Running [gcloud.app.deploy] with arguments: [--project: "PROJECT_ID", --verbosity: "debug", DEPLOYABLES:1: "[u'queue.yaml']"]
DEBUG: Loading runtimes experiment config from [gs://runtime-builders/experiments.yaml]
INFO: Reading [<googlecloudsdk.api_lib.storage.storage_util.ObjectReference object at 0x7fcc7dba0dd0>]
DEBUG: API endpoint: [https://appengine.googleapis.com/], API version: [v1]
Configurations to update:
descriptor: [/home/dominic/workspace/PROJECT/api/queue.yaml]
type: [task queues]
target project: [PROJECT_ID]
DEBUG: (gcloud.app.deploy) PERMISSION_DENIED: The caller does not have permission
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 983, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 807, in Run
resources = command_instance.Run(args)
File "/usr/lib/google-cloud-sdk/lib/surface/app/deploy.py", line 117, in Run
default_strategy=flex_image_build_option_default))
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 606, in RunDeploy
app, project, services, configs, version_id, deploy_options.promote)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/output_helpers.py", line 111, in DisplayProposedDeployment
DisplayProposedConfigDeployments(project, configs)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/output_helpers.py", line 134, in DisplayProposedConfigDeployments
project, 'cloudtasks.googleapis.com')
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/services/enable_api.py", line 43, in IsServiceEnabled
service = serviceusage.GetService(project_id, service_name)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/services/serviceusage.py", line 168, in GetService
exceptions.ReraiseError(e, exceptions.GetServicePermissionDeniedException)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/services/exceptions.py", line 96, in ReraiseError
core_exceptions.reraise(klass(api_lib_exceptions.HttpException(err)))
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/exceptions.py", line 146, in reraise
six.reraise(type(exc_value), exc_value, tb)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/services/serviceusage.py", line 165, in GetService
return client.services.Get(request)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/third_party/apis/serviceusage/v1/serviceusage_v1_client.py", line 297, in Get
config, request, global_params=global_params)
File "/usr/bin/../lib/google-cloud-sdk/lib/third_party/apitools/base/py/base_api.py", line 731, in _RunMethod
return self.ProcessHttpResponse(method_config, http_response, request)
File "/usr/bin/../lib/google-cloud-sdk/lib/third_party/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
self.__ProcessHttpResponse(method_config, http_response, request))
File "/usr/bin/../lib/google-cloud-sdk/lib/third_party/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
http_response, method_config=method_config, request=request)
GetServicePermissionDeniedException: PERMISSION_DENIED: The caller does not have permission
ERROR: (gcloud.app.deploy) PERMISSION_DENIED: The caller does not have permission
I've also tried the following roles:
Cloud Tasks Admin
Cloud Tasks Queue Admin
Cloud Tasks Service Agent
I'm using the Project Editor role for now, which works, but I would like to grant only the roles that are actually required.
In addition to the Cloud Tasks Queue Admin role, you have to add Service Account User to allow Cloud Tasks to generate tokens on behalf of the service account.
I was banging my head against the wall for a while with this myself; it seems "intuitively" you also need the "serviceusage.services.list" permission, so the Service Usage Viewer role.
Found via this issue: https://issuetracker.google.com/issues/137078982
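For reference, the role bindings mentioned in these answers can be granted with gcloud; the project id and service account email below are placeholders:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:DEPLOY_SA@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:DEPLOY_SA@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/serviceusage.serviceUsageViewer"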

Azure function import pyodbc error after publishing

First, thanks a ton for all your posts and responses that have helped me immensely in getting this far!
I have successfully created an Azure Function that imports pyodbc and azure.functions, as shown below.
import logging
import pyodbc
import json
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    try:
It works fine in VS Code but when I try to run it after publishing, it fails with
2019-11-22T14:31:17.743 [Information] Executing 'Functions.godataexcelautomation' (Reason='This function was programmatically called via the host APIs.', Id=79cebf6c-b371-4a12-b623-16931abe7757)
2019-11-22T14:31:17.761 [Error] Executed 'Functions.godataexcelautomation' (Failed, Id=79cebf6c-b371-4a12-b623-16931abe7757)
Result: Failure
Exception: ModuleNotFoundError: No module named 'pyodbc'
Stack: File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 242, in _handle__function_load_request
func_request.metadata.entry_point)
File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/loader.py", line 66, in load_function
mod = importlib.import_module(fullmodname)
File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/home/site/wwwroot/godataexcelautomation/__init__.py", line 2, in <module>
import pyodbc
Appreciate any help you can give. It seems like I need to make pyodbc available to the Azure portal? In the .json file?
Thanks in advance!
I got the same error as you when I deployed the Python function from VS Code directly. Please check that you have added pyodbc to your requirements.txt, and try the command below to deploy your Python function; it solved my issue:
func azure functionapp publish <APP_NAME> --build remote
By the way, you should specify ODBC Driver 17 for SQL Server as the ODBC driver in your Python function.
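For reference, a minimal connection sketch using that driver; the server, database, and credentials below are placeholders, not values from the question:

import pyodbc

# Placeholder connection values - substitute your own server, database and credentials
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;"
    "Uid=<your-user>;"
    "Pwd=<your-password>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())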

Registering and downloading a fastText .bin model fails with Azure Machine Learning Service

I have a simple RegisterModel.py script that uses the Azure ML Service SDK to register a fastText .bin model. This completes successfully and I can see the model in the Azure Portal UI (I cannot see which model files are in it). I then want to download the model (DownloadModel.py) and use it (for testing purposes); however, it throws an error on the model.download method (tarfile.ReadError: file could not be opened successfully) and produces a 0 byte rjtestmodel8.tar.gz file.
I then use the Azure Portal and Add Model and select the same bin model file and it uploads fine. Downloading it with the download.py script below works fine, so I am assuming something is not correct with the Register script.
Here are the 2 scripts and the stacktrace - let me know if you can see anything wrong:
RegisterModel.py
import azureml.core
from azureml.core import Workspace, Model

ws = Workspace.from_config()

model = Model.register(workspace=ws,
                       model_name='rjSDKmodel10',
                       model_path='riskModel.bin')
DownloadModel.py
# Works when downloading the UI Uploaded .bin file, but not the SDK registered .bin file
import os
import azureml.core
from azureml.core import Workspace, Model
ws = Workspace.from_config()
model = Model(workspace=ws, name='rjSDKmodel10')
model.download(target_dir=os.getcwd(), exist_ok=True)
Stacktrace
Traceback (most recent call last):
File "...\.vscode\extensions\ms-python.python-2019.9.34474\pythonFiles\ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "...\.vscode\extensions\ms-python.python-2019.9.34474\pythonFiles\lib\python\ptvsd\__main__.py", line 432, in main
run()
File "...\.vscode\extensions\ms-python.python-2019.9.34474\pythonFiles\lib\python\ptvsd\__main__.py", line 316, in run_file
runpy.run_path(target, run_name='__main__')
File "...\.conda\envs\DoC\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "...\.conda\envs\DoC\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "...\.conda\envs\DoC\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "...\\DownloadModel.py", line 21, in <module>
model.download(target_dir=os.getcwd(), exist_ok=True)
File "...\.conda\envs\DoC\lib\site-packages\azureml\core\model.py", line 712, in download
file_paths = self._download_model_files(sas_to_relative_download_path, target_dir, exist_ok)
File "...\.conda\envs\DoC\lib\site-packages\azureml\core\model.py", line 658, in _download_model_files
file_paths = self._handle_packed_model_file(tar_path, target_dir, exist_ok)
File "...\.conda\envs\DoC\lib\site-packages\azureml\core\model.py", line 670, in _handle_packed_model_file
with tarfile.open(tar_path) as tar:
File "...\.conda\envs\DoC\lib\tarfile.py", line 1578, in open
raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
Environment
riskModel.bin is 6 megs
AMLS 1.0.60
Python 3.7
Working locally with Visual Code
The Azure Machine Learning service SDK has a bug with how it interacts with Azure Storage, which causes it to upload corrupted files if it has to retry uploading.
A couple of workarounds:
The bug was introduced in the 1.0.60 release. If you downgrade to AzureML-SDK 1.0.55, the code should fail when there are issues uploading instead of silently corrupting data.
It's possible that the retry is being triggered by the low timeout values that the AzureML-SDK defaults to. You could investigate changing the timeout in site-packages/azureml/_restclient/artifacts_client.py
This bug should be fixed in the next release of the AzureML-SDK.
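If you want to try the downgrade workaround from the first bullet, the version pin would look like this:

pip install azureml-sdk==1.0.55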

Localhost: how to get credentials to connect GAE Python 3 app and Datastore Emulator?

I'd like to use the new Datastore Emulator together with a GAE Flask app on localhost. I want to run it in the Docker environment, but the error I get (DefaultCredentialsError) happens with or without Docker.
My Flask file looks like this (see the whole repository here on GitHub):
main.py:
from flask import Flask
from google.cloud import datastore

app = Flask(__name__)

@app.route("/")
def index():
    return "App Engine with Python 3"

@app.route("/message")
def message():
    # auth
    db = datastore.Client()

    # add object to db
    entity = datastore.Entity(key=db.key("Message"))
    message = {"message": "hello world"}
    entity.update(message)
    db.put(entity)

    # query from db
    obj = db.get(key=db.key("Message", entity.id))
    return "Message for you: {}".format(obj["message"])
The index() handler works fine, but the message() handler throws this error:
[2019-02-03 20:00:46,246] ERROR in app: Exception on /message [GET]
Traceback (most recent call last):
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/app/main.py", line 16, in message
db = datastore.Client()
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/cloud/datastore/client.py", line 210, in __init__
project=project, credentials=credentials, _http=_http
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/cloud/client.py", line 223, in __init__
_ClientProjectMixin.__init__(self, project=project)
INFO 2019-02-03 20:00:46,260 module.py:861] default: "GET /message HTTP/1.1" 500 291
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/cloud/client.py", line 175, in __init__
project = self._determine_default(project)
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/cloud/datastore/client.py", line 228, in _determine_default
return _determine_default_project(project)
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/cloud/datastore/client.py", line 75, in _determine_default_project
project = _base_default_project(project=project)
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/cloud/_helpers.py", line 186, in _determine_default_project
_, project = google.auth.default()
File "/tmp/tmpJcIw2U/lib/python3.5/site-packages/google/auth/_default.py", line 306, in default
raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
I checked the website in the error log and tried the JSON auth file (GOOGLE_APPLICATION_CREDENTIALS), but the result was that my app then connected to the production Datastore on Google Cloud, instead of the local Datastore Emulator.
Any idea how to resolve this?
I managed to solve this problem by adding env vars directly into the Python code (in this case in main.py) and using the Mock library:
import os

import mock
from flask import Flask, render_template, request
from google.cloud import datastore
import google.auth.credentials

app = Flask(__name__)

if os.getenv('GAE_ENV', '').startswith('standard'):
    # production
    db = datastore.Client()
else:
    # localhost
    os.environ["DATASTORE_DATASET"] = "test"
    os.environ["DATASTORE_EMULATOR_HOST"] = "localhost:8001"
    os.environ["DATASTORE_EMULATOR_HOST_PATH"] = "localhost:8001/datastore"
    os.environ["DATASTORE_HOST"] = "http://localhost:8001"
    os.environ["DATASTORE_PROJECT_ID"] = "test"

    credentials = mock.Mock(spec=google.auth.credentials.Credentials)
    db = datastore.Client(project="test", credentials=credentials)
The Datastore Emulator is then run like this:
gcloud beta emulators datastore start --no-legacy --data-dir=. --project test --host-port "localhost:8001"
Requirements needed:
Flask
google-cloud-datastore
mock
google-auth
GitHub example here: https://github.com/smartninja/gae-2nd-gen-examples/tree/master/simple-app-datastore
The fact that credentials are required indicates you're reaching the actual Datastore, not the Datastore emulator (which neither needs nor requests credentials).
To reach the emulator the client applications (that support it) need to figure out where the emulator is listening and, for that, you need to set the DATASTORE_EMULATOR_HOST environment variable for them. From Setting environment variables:
After you start the emulator, you need to set environment variables so that your application connects to the emulator instead of the production Datastore mode environment. Set these environment variables on the same machine that you use to run your application.
You need to set the environment variables each time you start the emulator. The environment variables depend on dynamically assigned port numbers that could change when you restart the emulator.
See the rest of that section for details about setting the environment variables, and maybe peek at Is it possible to start two dev_appserver.py connecting to the same google cloud datastore emulator?
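As a shortcut, once the emulator is running, gcloud can print the required export commands, and evaluating its output in the same shell that runs your application sets them (Unix shells):

# Prints export commands for DATASTORE_EMULATOR_HOST and friends; eval them in the current shell
$(gcloud beta emulators datastore env-init)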
