Load registered model with MLflow

I have registered a model to an MLflow server, with the artifacts stored in an AWS S3 bucket.
Now I want to load the model with the MLflow API:
model = mlflow.tensorflow.load_model("models:/sample-ann-1/1")
However, I'm not sure where the artifact is stored, or what the artifact path is.
I know the URI of my tracking server. The artifact is stored on an AWS S3 host and I have the S3 URI. How can I find the artifact path of the registered model?
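A minimal sketch of one way to do this, assuming the tracking server URI and the registered model name/version from the question (boto3 and AWS credentials must be configured so MLflow can reach the bucket; the server URL below is a placeholder):
import mlflow
from mlflow.tracking import MlflowClient

# Point the client at the tracking server (URL is a placeholder).
mlflow.set_tracking_uri("http://my-tracking-server:5000")

# Load the registered model directly by its models:/ URI;
# MLflow looks up the S3 artifact location in the registry for you.
model = mlflow.tensorflow.load_model("models:/sample-ann-1/1")

# To see where the artifacts actually live, ask the registry
# for the download URI of that model version.
client = MlflowClient()
print(client.get_model_version_download_uri("sample-ann-1", "1"))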

Related

Terraform state file in multiple backends

I am using Gitlab CI/CD to deploy to AWS with Terraform.
I would like to use the Gitlab REST API to store and lock/unlock my state.
To add some security and prevent any loss of my state file, I want also to backup my state file to an S3 bucket.
My question is: how do I sync/update the state file in my S3 bucket when my pipeline runs a terraform apply and makes changes to my AWS resources?
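A minimal sketch of one way to do the backup after terraform apply, assuming the job can run Python with boto3 and has AWS credentials available (the bucket name and key below are placeholders):
import subprocess
import boto3

# Pull the current state from whatever backend is configured
# (the GitLab HTTP backend in this setup).
state = subprocess.run(
    ["terraform", "state", "pull"], check=True, capture_output=True
).stdout

# Copy the pulled state to the backup bucket.
boto3.client("s3").put_object(
    Bucket="my-terraform-state-backup",
    Key="prod/terraform.tfstate",
    Body=state,
)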

Terraform wants to recreate imported resources

Locally I:
Created main.tf
Initialized with ‘terraform init’
Imported the GCP project and Cloud Run service
Updated main.tf so ‘terraform plan’ was not trying to do anything.
Checked main.tf into GitHub
I set up GitHub Actions to:
Checkout
Setup Gcloud
Initialize with ‘terraform init’
Plan with ‘terraform plan’
Terraform plan is trying to recreate everything.
How do I make it detect existing resources?
By default, Terraform initialises a local state. The problem with this state is that it is available only to you on your PC; if you execute a plan somewhere else, that state is lost. To solve this, you need to set up a remote backend for Terraform so the state file is stored in a centralised location.
If you are using Google Cloud, you can use a Cloud Storage bucket to store the state file. Terraform offers the gcs backend for this. You have to create a bucket and provide the bucket name in the gcs backend configuration:
terraform {
  backend "gcs" {
    bucket = "tf-state-prod"
    prefix = "terraform/state"
  }
}

Storing MLflow artifacts in S3 while having a SQL database as backend

When using a SQL database as the backend for MLflow, are the artifacts stored in the same database or in the default ./mlruns directory?
Is it possible to store them in a different location, such as AWS S3?
Yes, you can use different artifact locations for each experiment and have the same backend registry. Here is an example that shows it.
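A minimal sketch of such a setup (the experiment names, paths and bucket are placeholders; parameters and metrics go to the SQLite registry, artifacts go to each experiment's own location):
import mlflow

# The backend registry is a local SQLite file; artifacts can live elsewhere.
mlflow.set_tracking_uri("sqlite:///mlruns.db")

# Each experiment can point at its own artifact location.
exp_local = mlflow.create_experiment("exp-local", artifact_location="./mlruns")
exp_s3 = mlflow.create_experiment("exp-s3", artifact_location="s3://my-mlflow-artifacts/exp-s3")

with mlflow.start_run(experiment_id=exp_s3):
    mlflow.log_param("alpha", 0.1)           # stored in mlruns.db
    mlflow.log_text("hello", "notes.txt")    # stored in the S3 artifact location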
In this example, my backend registry is "mlruns.db" and the artifacts will be stored in their respective locations.
Yes. You can make use of the mlflow server command as below.
mlflow server --backend-store-uri=sqlite:///mlflow.db --default-artifact-root="s3://<bucket_name>" --host 0.0.0.0 --port 80
Also, don't forget to install boto3 and configure the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables so that MLflow can read from and write to the bucket.
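From the client side, a minimal sketch assuming the server above is reachable at http://<host>:80 (the host and the logged values are placeholders):
import mlflow

# Point at the tracking server started above.
mlflow.set_tracking_uri("http://<host>:80")

with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.93)              # stored in mlflow.db on the server
    mlflow.log_text("model summary", "summary.txt")  # uploaded to s3://<bucket_name>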

Cloud Build: avoid billing by changing the eu.artifacts.<project>.appspot.com bucket to single-region

Using the App Engine standard environment for Python 3.7.
When running the app deploy command, container images are uploaded to Google Cloud Storage in the bucket eu.artifacts.<project>.appspot.com.
This message is printed during app deploy
Beginning deployment of service [default]...
#============================================================#
#= Uploading 827 files to Google Cloud Storage =#
#============================================================#
File upload done.
Updating service [default]...
The files are uploaded to a multi-region (eu); how do I change this to upload to a single region?
I'm guessing that it's a configuration file that should be added to the repository to instruct App Engine, Cloud Build or Cloud Storage that the files should be uploaded to a single region.
Is the eu.artifacts.<project>.appspot.com bucket required, or could all files be ignored using the .gcloudignore file?
The issue is similar to this one: How can I specify a region for the Cloud Storage buckets used by Cloud Build for a Cloud Run deployment?, but for App Engine.
I'm triggering the Cloud Build using a service account.
I tried to implement the changes in the solution from the link above, but I'm not able to get rid of the multi-region bucket.
substitutions:
  _BUCKET: unused
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['app', 'deploy', '--promote', '--stop-previous-version']
artifacts:
  objects:
    location: 'gs://${_BUCKET}/artifacts'
    paths: ['*']
Command: gcloud builds submit --gcs-log-dir="gs://$BUCKET/logs" --gcs-source-staging-dir="gs://$BUCKET/source" --substitutions=_BUCKET="$BUCKET"
I delete the whole bucket after deploying, which prevents billing:
gsutil -m rm -r gs://us.artifacts.<project-id>.appspot.com
-m - multi-threading/multi-processing (instead of deleting object by object, this flag deletes objects in parallel)
rm - command to remove objects
-r - recursive
https://cloud.google.com/storage/docs/gsutil/commands/rm
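For reference, a rough Python equivalent of that cleanup using the google-cloud-storage client (the project id is a placeholder, and deleting the bucket itself is optional):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("us.artifacts.<project-id>.appspot.com")

# Delete every object in the bucket, then the now-empty bucket itself.
bucket.delete_blobs(list(client.list_blobs(bucket)))
bucket.delete()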
After investigating a little bit more, I want to mention that this kind of bucket is created by the Container Registry product when you deploy a new container (i.e. when you deploy your App Engine application): when you push an image to a registry with a new hostname, Container Registry creates a storage bucket in the specified multi-regional location. This bucket is the underlying storage for the registry. Within a project, all registries with the same hostname share one storage bucket.
Based on this, it is not accessible by default and contains the container images written when you deploy a new container. It's not recommended to modify it, because the artifacts bucket is meant to contain deployment images, and touching them may affect your app.
Finally, something curious that I found: when you create a default bucket (as is the case with the aforementioned bucket), you also get a staging bucket with the same name, prefixed with "staging.". You can use this staging bucket for temporary files used for staging and test purposes; it also has a 5 GB limit, but it is automatically emptied on a weekly basis.

Cannot read .json from a google cloud bucket

I have a folder structure within a Google Cloud Storage bucket
bucket_name = 'logs'
json_location = '/logs/files/2018/file.json'
I try to read this json file in a Jupyter notebook using this code:
import os
from google.cloud import storage

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "logs/files/2018/file.json"

def download_blob(source_blob_name, bucket_name, destination_file_name):
    """Downloads a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print('Blob {} downloaded to {}.'.format(
        source_blob_name,
        destination_file_name))
Then calling the function
download_blob('file.json', 'logs', 'file.json')
And I get this error
DefaultCredentialsError: File /logs/files/2018/file.json was not found.
I have looked at all the similar questions asked on Stack Overflow and cannot find a solution.
The json file is present and can be opened or downloaded at the json_location in Google Cloud Storage.
There are two different perspectives regarding the json file you refer to:
1) The json file used for authenticating to GCP.
2) The json you want to download from a bucket to your local machine.
For the first one, if you are accessing your Jupyter server remotely, most probably the json doesn't exist on that remote machine but on your local machine. If this is your scenario, try uploading the json to the Jupyter server. Executing ls -l /logs/files/2018/file.json on the remote machine can help verify it is there. Then, os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "JSON_PATH_ON_JUPYTER_SERVER" should work.
On the other hand, I executed your code and got:
>>> download_blob('static/upload_files_CS.png', 'bucketrsantiago', 'file2.json')
Blob static/upload_files_CS.png downloaded to file2.json.
The file gs://bucketrsantiago/static/upload_files_CS.png was downloaded to my local machine with the name file2.json. This helps to clarify that the only problem is regarding the authentication json file.
GOOGLE_APPLICATION_CREDENTIALS is supposed to point to a file on the local disk where you are running jupyter. You need the credentials in order to call GCS, so you can't fetch them from GCS.
In fact, you are best off not messing around with credentials at all in your program, and leaving it to the client library. Don't touch GOOGLE_APPLICATION_CREDENTIALS in your application. Instead:
If you are running on GCE, just make sure your GCE instance has a service account with the right scopes and permissions. Applications running in that instance will automatically have the permissions of that service account.
If you are running locally, install the Google Cloud SDK and run gcloud auth application-default login. Your program will then automatically use whichever account you log in as.
Complete instructions here
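Putting the two answers together, a minimal sketch of the corrected usage (the key path is a placeholder for a service-account key that exists on the machine running Jupyter, and it assumes the object lives at files/2018/file.json inside the logs bucket; skip the environment variable entirely if you rely on gcloud auth application-default login):
import os
from google.cloud import storage

# A service-account key on the local disk, not a path inside the bucket (placeholder).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/user/keys/my-service-account.json"

client = storage.Client()
bucket = client.get_bucket("logs")
# The blob name is the object's path inside the bucket.
blob = bucket.blob("files/2018/file.json")
blob.download_to_filename("file.json")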
