Python can't find the env var - python-3.x

I'm passing an env var via docker container run in GitHub Actions like so:
run: docker container run -d -e MY_KEY="some key" -p 3000:3000 somedockerimage/somedockerimage:0.0.2
I know it should be passed the right way because it works with Node.js.
In the Python file:
import os
api_key = os.environ['MY_KEY']
print(api_key)
The results I get:
File "print.py", line 4, in <module>
api_key = os.environ['MY_KEY']
File "/usr/lib/python3.8/os.py", line 675, in __getitem__
raise KeyError(key) from None
KeyError: 'MY_KEY'

I don't see anything incorrect with the way you're running your Docker container. It's difficult to say without seeing the rest of the project, but you may need to delete your .pyc files with something like find . -name \*.pyc -delete. This answer could add more context.
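If the variable genuinely isn't reaching the container, a defensive lookup makes that obvious instead of dying with a bare KeyError. A minimal sketch (it doesn't fix the root cause, it only reports it more clearly):
import os
import sys

# os.environ.get returns None instead of raising KeyError when the variable
# is missing, so we can print an actionable message.
api_key = os.environ.get("MY_KEY")
if api_key is None:
    sys.exit("MY_KEY is not set inside the container; check the -e flag on docker container run")
print(api_key)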

Related

"OSError" whilst trying to run a Python app inside a Docker container using Application Default Credentials

Error
My Python app is running fine locally. I've created a Dockerfile and built an image. Inside the app I'm using the Python Google Cloud Logging library. When I try running the image I can see the following error from the Docker container logs:
File "/home/apprunner/app/./app/main.py", line 12, in <module>
2022-12-20 15:14:37 client = google.cloud.logging.Client()
2022-12-20 15:14:37 File "/home/apprunner/app/.venv/lib/python3.9/site-packages/google/cloud/logging_v2/client.py", line 122, in __init__
2022-12-20 15:14:37 super(Client, self).__init__(
2022-12-20 15:14:37 File "/home/apprunner/app/.venv/lib/python3.9/site-packages/google/cloud/client/__init__.py", line 320, in __init__
2022-12-20 15:14:37 _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
2022-12-20 15:14:37 File "/home/apprunner/app/.venv/lib/python3.9/site-packages/google/cloud/client/__init__.py", line 271, in __init__
2022-12-20 15:14:37 raise EnvironmentError(
2022-12-20 15:14:37 OSError: Project was not passed and could not be determined from the environment.
Running the Docker container
I'm running the Docker container using the following commands where I pass in Application Default Credentials:
# Set shell variable
ADC=~/.config/gcloud/application_default_credentials.json \
docker run \
-d \
-v ${ADC}:/tmp/keys/application_default_credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/application_default_credentials.json \
IMAGE_NAME
I'm following the official guide on Docker with Google Cloud Access but using Application Default Credentials instead of a service account.
I've checked that my application_default_credentials.json is present in that location and I've checked that ${ADC} has the correct value:
$ echo $ADC
/Users/ian/.config/gcloud/application_default_credentials.json
Debugging
I see the stack trace points to the line in my code that calls the Logging library:
client = google.cloud.logging.Client()
And below it seems to suggest that it is expecting a project as well as the credentials:
_ClientProjectMixin.__init__(self, project=project, credentials=credentials)
Is this a problem with how I'm passing in my Application Default Credentials or should I be passing in some other project information?
Update
If I explicitly pass in the project argument in my code I can get the Docker container to run successfully:
client = google.cloud.logging.Client(project='my-project')
However, I don't want to make code changes for local development, and this shouldn't be required. I don't understand why this isn't being pulled from my ADC(?)
I've been able to get it to run but only by explicitly passing in the project ID.
Solution
The GOOGLE_CLOUD_PROJECT variable is an explicit requirement alongside GOOGLE_APPLICATION_CREDENTIALS. The cleanest way is to pass both in as environment variables when running the container. This is the first place that is searched.
Set shell vars:
ADC=~/.config/gcloud/application_default_credentials.json \
PROJECT=my-project
Docker run:
$ docker run \
-v ${ADC}:/tmp/keys/application_default_credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/application_default_credentials.json \
-e GOOGLE_CLOUD_PROJECT=${PROJECT} \
IMAGE_NAME
Explanation
The docs mention that the project should be inferred from the environment if not explicitly provided:
# if project not given, it will be inferred from the environment
client = google.cloud.logging.Client(project="my-project")
To have the project inferred from the environment you need to have:
1. Installed the Google Cloud SDK.
2. Created Application Default Credentials (ADC):
gcloud auth application-default login
3. Set an active project:
gcloud config set project PROJECT_ID
Passing ADC into the locally running container through environment variables works for authentication, but it doesn't pass in the active project, because that is set in your local configuration (step 3 above, stored in ~/.config/gcloud/configurations on Mac/Linux). So no project can be inferred from the environment inside the container: it is neither set there nor passed in, and the client searches through its list of locations in order without finding anything.
Best Practice
It's good practice to take both the authentication credentials and the project identifier from the same place:
Credentials and project information must come from the same place (principle of least surprise).
Be explicit in setting them rather than relying on Application Default Credentials:
we really encourage people to explicitly pass credentials and project to the client constructor and not depend on the often surprising behavior of application default credentials.
Make them easy to find by setting them in the first place that will be searched:
GOOGLE_CLOUD_PROJECT environment variable
GOOGLE_APPLICATION_CREDENTIALS JSON file
With these in mind, passing them both in as environment variables ticks all the boxes.
Note: this looks to be the same throughout the Google Cloud Python client libraries, not just Logging.
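For completeness, here is a minimal sketch of what passing the project explicitly can look like in application code, assuming the container is started with both environment variables from the docker run above (credentials are still resolved from GOOGLE_APPLICATION_CREDENTIALS):
import os
import google.cloud.logging

# Read the project ID from the env var passed with -e GOOGLE_CLOUD_PROJECT=...
# and hand it to the client explicitly, so nothing is left to inference.
project_id = os.environ["GOOGLE_CLOUD_PROJECT"]
client = google.cloud.logging.Client(project=project_id)
client.setup_logging()  # optionally route the standard logging module to Cloud Logging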

MLflow saves models to relative place instead of tracking_uri

Sorry if my question is too basic, but I cannot solve it.
I am experimenting with mlflow currently and facing the following issue:
Even though I have set the tracking_uri, the MLflow artifacts are saved to the ./mlruns/... folder relative to the path from which I run mlflow run path/to/train.py (on the command line). The MLflow server searches for the artifacts following the tracking_uri (mlflow server --default-artifact-root here/comes/the/same/tracking_uri).
The following example should make clear what I mean:
I set the following in the training script before the with mlflow.start_run() as run:
mlflow.set_tracking_uri("file:///home/#myUser/#SomeFolders/mlflow_artifact_store/mlruns/")
My expectation would be that MLflow saves all the artifacts to the place I gave in the tracking URI. Instead, it saves the artifacts relative to the place from which I run mlflow run path/to/train.py, i.e. running the following from /home/#myUser:
mlflow run path/to/train.py
creates the structure:
/home/#myUser/mlruns/#experimentID/#runID/artifacts
/home/#myUser/mlruns/#experimentID/#runID/metrics
/home/#myUser/mlruns/#experimentID/#runID/params
/home/#myUser/mlruns/#experimentID/#runID/tags
and therefore it doesn't find the run artifacts in the tracking_uri, giving the error message:
Traceback (most recent call last):
File "train.py", line 59, in <module>
with mlflow.start_run() as run:
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
active_run_obj = client.get_run(existing_run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/tracking/client.py", line 151, in get_run
return self._tracking_client.get_run(run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/tracking/_tracking_service/client.py", line 57, in get_run
return self.store.get_run(run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/store/tracking/file_store.py", line 524, in get_run
run_info = self._get_run_info(run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/store/tracking/file_store.py", line 544, in _get_run_info
"Run '%s' not found" % run_uuid, databricks_pb2.RESOURCE_DOES_NOT_EXIST
mlflow.exceptions.MlflowException: Run '788563758ece40f283bfbf8ba80ceca8' not found
2021/07/23 16:54:16 ERROR mlflow.cli: === Run (ID '788563758ece40f283bfbf8ba80ceca8') failed ===
Why is that so? How can I change the place where the artifacts are stored and this directory structure is created? I have tried mlflow run --storage-dir here/comes/the/path, setting the tracking_uri, and setting the registry_uri. If I run mlflow run path/to/train.py from /home/path/to/tracking/uri it works, but I need to run the scripts remotely.
My end goal is to change the artifact URI to an NFS drive, but I cannot make it work even on my local computer.
Thanks for reading it, even more thanks if you suggest a solution! :)
Have a great day!
This issue was solved by the following:
I had mixed up the tracking_uri with the backend_store_uri.
The tracking_uri is where the MLflow-related data (e.g. tags, parameters, metrics) is saved, which can be a database. The artifact_location, on the other hand, is where the artifacts are stored (other data, not MLflow-related, produced by the preprocessing/training/evaluation scripts).
What led me astray is that when running mlflow server from the command line, --backend-store-uri should be given the tracking_uri (the same value set in the script with mlflow.set_tracking_uri()) and --default-artifact-root the location of the artifacts. Somehow I didn't get that tracking_uri = backend_store_uri.
Here's my solution
Launch the server
mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri postgresql://DB_USER:DB_PASSWORD@DB_ENDPOINT:5432/DB_NAME --default-artifact-root s3://S3_BUCKET_NAME
Set the tracking URI to an HTTP URI like:
mlflow.set_tracking_uri("http://my-tracking-server:5000/")
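For reference, a minimal client-side sketch that goes with this server setup; the server address and artifact file name below are placeholders, and the artifact ends up under the --default-artifact-root configured on the server rather than in a local ./mlruns folder:
import mlflow

# Point the client at the tracking server started above.
mlflow.set_tracking_uri("http://my-tracking-server:5000/")

with mlflow.start_run() as run:
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)

    # Write a small placeholder file and log it as an artifact; the server
    # decides where it is stored (the S3 bucket in this example).
    with open("model.txt", "w") as f:
        f.write("placeholder artifact")
    mlflow.log_artifact("model.txt")

    print("Run ID:", run.info.run_id)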

Docker firewall issue with cBioportal

We are sitting behind a firewall and are trying to run a Docker image (cBioPortal). Docker itself could be installed through a proxy, but now we encounter the following issue:
Starting validation...
INFO: -: Unable to read xml containing cBioPortal version.
DEBUG: -: Requesting cancertypes from portal at 'http://cbioportal-container:8081'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during validation step:
Traceback (most recent call last):
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4491, in request_from_portal_api
response.raise_for_status()
File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://cbioportal-container:8081/api-legacy/cancertypes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/metaImport.py", line 127, in <module>
exitcode = validateData.main_validate(args)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4969, in main_validate
portal_instance = load_portal_info(server_url, logger)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4622, in load_portal_info
parsed_json = request_from_portal_api(path, api_name, logger)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4495, in request_from_portal_api
) from e
ConnectionError: Failed to fetch metadata from the portal at [http://cbioportal-container:8081/api-legacy/cancertypes]
Now we know it is a firewall issue, because it works when we install it outside the firewall, but we do not yet know how to change the firewall. Our idea was to look up the files and lines that throw the errors, but we do not know how to look into the files since they are inside the Docker container.
So we cannot just do something like
vim /cbioportal/core/src/main/scripts/importer/validateData.py
...because there is nothing there. Of course we know this file is within the Docker image, but as I said, we don't know how to look into it. At the moment we do not know how to solve this riddle - any help appreciated.
Maybe you still need this.
You can access this Python file within the container by using docker-compose exec cbioportal sh or docker-compose exec cbioportal bash.
Then you can use cd, cat, vi, vim or similar to access the path given in your post.
I'm not sure which command you're actually running, but when I did the import call like
docker-compose run --rm cbioportal metaImport.py -u http://cbioportal:8080 -s study/lgg_ucsf_2014/lgg_ucsf_2014/ -o
I had to replace http://cbioportal:8080 with the server's IP address.
Also notice that the studies path is one level deeper than in the official documentation.
When running cBioPortal behind a proxy, the study import is only available in offline mode:
First you need to get inside the container
docker exec -it cbioportal-container bash
Then generate the portal info folder:
cd $PORTAL_HOME/core/src/main/scripts
./dumpPortalInfo.pl $PORTAL_HOME/my_portal_info_folder
Then import the study offline. The -o flag is important to overwrite despite warnings.
cd $PORTAL_HOME/core/src/main/scripts
./importer/metaImport.py -p $PORTAL_HOME/my_portal_info_folder -s /study/lgg_ucsf_2014 -v -o
Hope this helps.
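If you want a quick way to confirm from inside the container that the proxy is what is blocking the validator, a small check with the same requests library can help. This is just a sketch; trust_env=False makes requests ignore any HTTP(S)_PROXY variables so the internal hostname is not routed through the corporate proxy:
import requests

# The URL is the one from the validator's error message.
url = "http://cbioportal-container:8081/api-legacy/cancertypes"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables inside the container

try:
    response = session.get(url, timeout=30)
    response.raise_for_status()
    print("Portal reachable, status", response.status_code)
except requests.exceptions.RequestException as exc:
    print("Portal not reachable:", exc)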

Subprocess can't find file when executed from a Python file in Docker container

I have created a simple Flask app which I am trying to deploy to Docker.
The basic user interface will load on localhost, but when I execute a command which calls a specific function, it keeps showing:
"Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."
Looking at the Docker logs I can see the problem is that the file cannot be found by the subprocess.Popen command:
"FileNotFoundError: [Errno 2] No such file or directory: 'test_2.bat': 'test_2.bat'
172.17.0.1 - - [31/Oct/2019 17:01:55] "POST /login HTTP/1.1" 500"
The file certainly exists in the Docker environment; within the container I can see it listed in the root directory.
I have also tried changing:
item = subprocess.Popen(["test_2.bat", i], shell=False,stdout=subprocess.PIPE)
to:
item = subprocess.Popen(["./test_2.bat", i], shell=False,stdout=subprocess.PIPE)
which generated the alternative error:
"OSError: [Errno 8] Exec format error: './test_2.bat'
172.17.0.1 - - [31/Oct/2019 16:58:54] "POST /login HTTP/1.1" 500"
I have added a shebang to the top of both .py files involved in the Flask app (although I may have done this wrong):
#!/usr/bin/env python3
and this is the Dockerfile:
FROM python:3.6
RUN adduser lighthouse
WORKDIR /home/lighthouse
COPY requirements.txt requirements.txt
# RUN python -m venv venv
RUN pip install -r requirements.txt
RUN pip install gunicorn
COPY templates templates
COPY json_logs_nl json_logs_nl
COPY app.py full_script_manual_with_list.py schema_all.json ./
COPY bq_load_indv_jsons_v3.bat test_2.bat ./
RUN chmod 644 app.py
RUN pip install flask
ENV FLASK_APP app.py
RUN chown -R lighthouse:lighthouse ./
USER lighthouse
# EXPOSE 5000
CMD ["flask", "run", "--host=0.0.0.0"]
I am using Ubuntu and WSL 2 to run Docker on a Windows machine, without VirtualBox. I have no trouble navigating my Windows file system or building Docker images, so I think this configuration is not the problem - but just in case.
If anyone has any ideas to help subprocess locate test_2.bat I would be very grateful!
Edit: the app works exactly as expected when executed locally via the command line with "flask run"
If anyone is facing a similar problem, the solution was to put the command directly into the Python script rather than calling it in a separate file. It is split into separate strings to allow the "url" variable to be iteratively updated, as this all occurs within a for loop:
url = str(i)
var_command = "lighthouse " + url + " --quiet --chrome-flags=\" --headless\" --output=json output-path=/home/lighthouse/result.json"
item = subprocess.Popen([var_command], stdout=subprocess.PIPE, shell=True)
item.communicate()
As a side note, if you would like to run Lighthouse within a container you need to install it just as you would to run it on the command line, in a Node container. This container can then communicate with my Python container if both are deployed in the same pod via Kubernetes and share a namespace. Here is a Lighthouse container Dockerfile I've used: https://github.com/GoogleChromeLabs/lighthousebot/blob/master/builder/Dockerfile
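For anyone who prefers to avoid shell=True, the same call can be expressed as a list of separate arguments. This is only a sketch, assuming the lighthouse CLI is on PATH inside the container and using the same flags as the command above:
import subprocess

def run_lighthouse(url):
    # Each flag is its own list element, so no shell quoting is needed
    # and the URL cannot be misinterpreted by a shell.
    cmd = [
        "lighthouse",
        url,
        "--quiet",
        "--chrome-flags= --headless",
        "--output=json",
        "--output-path=/home/lighthouse/result.json",
    ]
    item = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout, _ = item.communicate()
    return stdout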

Docker compose error while creating a portus demo environment why?

I am trying to create a Portus environment using Docker Compose, but I get this error and I don't know how to solve it:
ERROR: for crono Container command 'bin/crono' not found or does not exist.
Traceback (most recent call last):
File "<string>", line 3, in <module>
File "compose/cli/main.py", line 63, in main
AttributeError: 'ProjectError' object has no attribute 'msg'
docker-compose returned -1
It's probably that the Data Space Available value is near 0 MB. You can check this value using the command "docker info".
If this is your case, you can resolve the problem by following these steps:
If your images are not yet uploaded to a Docker repository, save them first with "docker save -o DockerImageName.tar IMAGE_NAME".
su
systemctl stop docker
mkdir /new/path/to/docker
vim /etc/docker/daemon.json
Add the following to your daemon.json: { "graph": "/new/path/to/docker" }
systemctl start docker
Then you can try to bring your container up again:
su docker
If you saved your Docker images in the first step, load them with "docker load -i DockerImageName.tar".
cd /path/to/docker-compose
docker-compose up -d
