I recently ported my scripts from Python 2.x to 3.x. During production runs through automation (Rundeck), we are seeing errors caused by the logger not handling blocking I/O. Any ideas on how to resolve this would be great.
Ubuntu 18.04.1 LTS
Python 3.6.7
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.6/logging/__init__.py", line 998, in emit
self.flush()
File "/usr/lib/python3.6/logging/__init__.py", line 978, in flush
self.stream.flush()
BlockingIOError: [Errno 11] write could not complete without blocking
I was getting the same error on CI builds. It looks like it was a capacity issue with the output stream. After reducing the log output, the errors went away.
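If reducing the log volume is not an option, another approach is to decouple the application from the (possibly non-blocking) output stream by queueing records and letting a background thread do the writing. Below is a minimal sketch using the standard library's QueueHandler/QueueListener; the logger name and the use of stdout are placeholders, and this smooths out bursts rather than enlarging the pipe itself.
# Minimal sketch: queue log records in memory and let a background thread
# write them, so application code never waits on a full stdout pipe.
import logging
import logging.handlers
import queue
import sys

log_queue = queue.Queue(-1)                          # unbounded in-memory queue
queue_handler = logging.handlers.QueueHandler(log_queue)

stream_handler = logging.StreamHandler(sys.stdout)   # the handler that actually writes
listener = logging.handlers.QueueListener(log_queue, stream_handler)
listener.start()

logger = logging.getLogger("myscript")               # placeholder logger name
logger.setLevel(logging.INFO)
logger.addHandler(queue_handler)

logger.info("records now pass through the queue")
listener.stop()                                      # flushes remaining records at shutdown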
I recently faced this error while building my Docker image using docker-compose in CI, and I found a workaround that may help someone.
The error:
BlockingIOError: [Errno 11] write could not complete without blocking
If you do not want to lose any logs, you can send all of them to a file and save it as an artifact (tested on Bamboo and Jenkins):
docker-compose build --no-cache my_image > myfile.txt
If you do not want the logs:
docker-compose build --no-cache my_image > /dev/null
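If you want to keep the build output visible in the CI console and still keep the full log as an artifact, piping through tee works; the same idea as a small Python wrapper (just a sketch, my_image and myfile.txt are the placeholder names from above) would be:
# build_and_log.py: stream docker-compose build output to the console
# while also writing it to a file that the CI job can archive.
import subprocess
import sys

LOGFILE = "myfile.txt"                     # placeholder artifact name
cmd = ["docker-compose", "build", "--no-cache", "my_image"]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
with open(LOGFILE, "w") as log:
    for line in proc.stdout:
        sys.stdout.write(line)             # still visible in the CI console
        log.write(line)                    # saved as a build artifact
sys.exit(proc.wait())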
I have some strange pytest behaviour I can't explain and was wondering if someone has seen this before.
We're running pytest with 4 parallel workers: pytest -n 4 tests/e2e.
Now, sometimes, it runs the same test on 2 different workers, like here:
[gw1] [ 40%] PASSED tests/e2e/test_get_connector.py::test_invoke_api_with_permission_not_found
..... more tests....
[gw3] [100%] PASSED tests/e2e/test_get_connector.py::test_invoke_api_with_permission_not_found Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/jenkins/.virtualenvs/lib/python3.8/site-packages/jsii/_kernel/providers/process.py", line 284, in stop
self._process.stdin.close()
BrokenPipeError: [Errno 32] Broken pipe
You can see that the same test is run twice, on two different workers.
The second run of that test fails with a Broken Pipe error, but the whole test run still succeeds (the second run is also marked as PASSED, as you can see) and the Jenkins job also succeeds.
Does anyone have an idea why this is happening?
This behaviour makes it hard to trust the pipeline.
Thanks!
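For what it's worth, one way to make duplicated runs easier to spot is a tiny autouse fixture in conftest.py that records which worker executed each test. This is only a diagnostic sketch and assumes pytest-xdist, which provides the worker_id fixture.
# conftest.py: diagnostic only, records which xdist worker ran each test.
import pytest

@pytest.fixture(autouse=True)
def _record_worker(request, worker_id):
    # worker_id is "gw0", "gw1", ... under pytest-xdist, or "master" without -n
    print(f"{request.node.nodeid} ran on {worker_id}")  # visible with -s or in captured output
    yield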
I followed the official Airflow docker guide.
It works fine for most of the simple jobs I have.
I then tried to use this guide; for that I needed to add this line to the .env file:
_PIP_ADDITIONAL_REQUIREMENTS=pyspark xlrd apache-airflow-providers-apache-spark
Unfortunately, the DAG is not being loaded.
The problem seems to be related to JAVA_HOME because the docker output shows this message:
airflow-scheduler_1 | is not set
In the Airflow web GUI it shows the following error:
Broken DAG: [/opt/airflow/dags/SparkETL.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/pyspark/context.py", line 339, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/home/airflow/.local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number
I tried adding an install -y openjdk-11-jdk command in the docker-compose file, and also setting JAVA_HOME: '/usr/lib/jvm/java-11-openjdk-amd64' there. In this situation the airflow-scheduler reports that the path does not exist.
Sorry if my question is too basic, but I cannot solve it.
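For what it's worth, pyspark's launch_gateway only needs a java binary it can execute, found via JAVA_HOME or PATH. A quick way to see what the container actually provides is to exec into the scheduler container and run a small check like the sketch below (a diagnostic only; the service name is the one from the official docker-compose file).
# check_java.py: run inside the Airflow containers, e.g.
#   docker-compose exec airflow-scheduler python /path/to/check_java.py
import os
import shutil
import subprocess

print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
print("java on PATH =", shutil.which("java"))

java = shutil.which("java") or os.path.join(os.environ.get("JAVA_HOME", ""), "bin", "java")
try:
    subprocess.run([java, "-version"], check=True)   # pyspark needs this to succeed
except (OSError, subprocess.CalledProcessError) as exc:
    print("java is not runnable:", exc)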
I am experimenting with mlflow currently and facing the following issue:
Even though I have set the tracking_uri, the MLflow artifacts are saved to the ./mlruns/... folder relative to the path from where I run mlflow run path/to/train.py (on the command line). The MLflow server then searches for the artifacts following the tracking_uri (mlflow server --default-artifact-root here/comes/the/same/tracking_uri).
The following example should make clear what I mean:
I set the following in the training script before the with mlflow.start_run() as run:
mlflow.set_tracking_uri("file:///home/#myUser/#SomeFolders/mlflow_artifact_store/mlruns/")
My expectation was that mlflow would save all the artifacts to the place given in the tracking URI. Instead, it saves the artifacts relative to the place from where I run mlflow run path/to/train.py, i.e. running the following
/home/#myUser/ mlflow run path/to/train.py
creates the structure:
/home/#myUser/mlruns/#experimentID/#runID/artifacts
/home/#myUser/mlruns/#experimentID/#runID/metrics
/home/#myUser/mlruns/#experimentID/#runID/params
/home/#myUser/mlruns/#experimentID/#runID/tags
and therefore it doesn't find the run artifacts in the tracking_uri, giving the error message:
Traceback (most recent call last):
File "train.py", line 59, in <module>
with mlflow.start_run() as run:
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
active_run_obj = client.get_run(existing_run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/tracking/client.py", line 151, in get_run
return self._tracking_client.get_run(run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/tracking/_tracking_service/client.py", line 57, in get_run
return self.store.get_run(run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/store/tracking/file_store.py", line 524, in get_run
run_info = self._get_run_info(run_id)
File "/home/#myUser/miniconda3/envs/mlflow-ff56d6062d031d43990effc19450800e72b9830b/lib/python3.6/site-packages/mlflow/store/tracking/file_store.py", line 544, in _get_run_info
"Run '%s' not found" % run_uuid, databricks_pb2.RESOURCE_DOES_NOT_EXIST
mlflow.exceptions.MlflowException: Run '788563758ece40f283bfbf8ba80ceca8' not found
2021/07/23 16:54:16 ERROR mlflow.cli: === Run (ID '788563758ece40f283bfbf8ba80ceca8') failed ===
Why is that so? How can I change the place where the artifacts are stored and where this directory structure is created? I have tried mlflow run --storage-dir here/comes/the/path, as well as setting the tracking_uri and registry_uri. If I run mlflow run path/to/train.py from /home/path/to/tracking/uri it works, but I need to run the scripts remotely.
My end goal is to change the artifact URI to an NFS drive, but even on my local computer I cannot make it work.
Thanks for reading it, even more thanks if you suggest a solution! :)
Have a great day!
This issue was solved by the following:
I had mixed up the tracking_uri with the backend_store_uri.
The tracking_uri is where the MLflow-related data (e.g. tags, parameters, metrics, etc.) is saved, which can be a database. The artifact_location, on the other hand, is where the artifacts are saved (other data, not MLflow-related, belonging to the preprocessing/training/evaluation/etc. scripts).
What led me astray is that when running mlflow server from the command line, one should pass the tracking_uri as --backend-store-uri (and also set it in the script via mlflow.set_tracking_uri()) and pass the location of the artifacts as --default-artifact-root. Somehow I didn't get that tracking_uri = backend_store_uri.
Here's my solution
Launch the server
mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri postgresql://DB_USER:DB_PASSWORD@DB_ENDPOINT:5432/DB_NAME --default-artifact-root s3://S3_BUCKET_NAME
Set the tracking URI to an HTTP URI like
mlflow.set_tracking_uri("http://my-tracking-server:5000/")
Recently I needed to add WebSockets to my backend application, currently hosted in the Google App Engine (GAE) standard environment. Because WebSockets are only available in GAE's flexible environment, I have been attempting a redeployment, but with little success.
To make the change to the flexible environment, I updated the app.yaml file from
runtime: nodejs10
env: standard
to
runtime: nodejs
env: flex
While this previously worked in the standard environment, now with env: flex, when I run the command gcloud app deploy --appyaml=app-staging.yaml --verbosity=debug I get the following stack trace:
Do you want to continue (Y/n)? Y
DEBUG: No bucket specified, retrieving default bucket.
DEBUG: Using bucket [gs://staging.finnsalud.appspot.com].
DEBUG: Service [appengineflex.googleapis.com] is already enabled for project [finnsalud]
Beginning deployment of service [finnsalud-staging]...
INFO: Using ignore file at [~/checkouts/twilio/backend/.gcloudignore].
DEBUG: not expecting type '<class 'NoneType'>'
Traceback (most recent call last):
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 982, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 809, in Run
resources = command_instance.Run(args)
File "/google-cloud-sdk/lib/surface/app/deploy.py", line 115, in Run
return deploy_util.RunDeploy(
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 669, in RunDeploy
deployer.Deploy(
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 428, in Deploy
source_files = source_files_util.GetSourceFiles(
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/source_files_util.py", line 184, in GetSourceFiles
return list(it)
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/util/gcloudignore.py", line 233, in GetIncludedFiles
six.ensure_str(upload_directory), followlinks=True):
File "//google-cloud-sdk/lib/third_party/six/__init__.py", line 884, in ensure_str
raise TypeError("not expecting type '%s'" % type(s))
TypeError: not expecting type '<class 'NoneType'>'
ERROR: gcloud crashed (TypeError): not expecting type '<class 'NoneType'>'
This stack trace mentions an error in google-cloud-sdk/lib/googlecloudsdk/command_lib/util/gcloudignore.py, so I also reviewed my .gcloudignore file but was unable to find anything out of place:
.gcloudignore
.git
.gitignore
node_modules/
In an attempt to work around this bug, I tried removing my .gcloudignore file, which resulted in a different error but still failed:
Do you want to continue (Y/n)? Y
DEBUG: No bucket specified, retrieving default bucket.
DEBUG: Using bucket [gs://staging.finnsalud.appspot.com].
DEBUG: Service [appengineflex.googleapis.com] is already enabled for project [finnsalud]
Beginning deployment of service [finnsalud-staging]...
DEBUG: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 982, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 809, in Run
resources = command_instance.Run(args)
File "/google-cloud-sdk/lib/surface/app/deploy.py", line 115, in Run
return deploy_util.RunDeploy(
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 669, in RunDeploy
deployer.Deploy(
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 428, in Deploy
source_files = source_files_util.GetSourceFiles(
File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/source_files_util.py", line 184, in GetSourceFiles
return list(it)
File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/util.py", line 165, in FileIterator
entries = set(os.listdir(os.path.join(base, current_dir)))
File "/usr/local/Cellar/python#3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
ERROR: gcloud crashed (TypeError): expected str, bytes or os.PathLike object, not NoneType
Thinking this might be an error related to the version of my CLI, I also ran the following commands to try to update it:
gcloud app update
gcloud components update
Unfortunately, this did not change the output.
I have noticed that when I run this command with the app.yaml env value set to flex, there are no updates to the logging section on Google Cloud and no changes to the files uploaded to the project's storage bucket. To me, this indicates that the crash occurs in the CLI before any communication with the Google Cloud services is made. If this is correct, then it seems unlikely that the cause of the error is a bad configuration on Google Cloud; it must be related to something (software or configuration) on my local machine.
I have also tried using the 'Hello World' app.yaml configuration from the flexible environment's 'Getting Started' page to rule out a configuration error in my own application's app.yaml, but this also did not change the output.
Finally, if at any point I change env: flex back to env: standard, the issue disappears. Unfortunately, as stated above, this won't work for deploying my WebSockets feature.
This has gotten me thinking that the error is possibly due to a bug in the gcloud CLI. However, if this were the case, I would have expected to see many more bug reports for this issue from others who are also using GAE's flexible environment.
Regardless, given that this stack trace points to code within the gcloud CLI, I have opened a bug ticket with Google, which can be found here: https://issuetracker.google.com/issues/176839574
I have also seen this similar SO post, but it is not the exact error I am experiencing and remains unresolved: gcloud app deploy fails with flexible environment
If anyone has any ideas on other steps to try or methods to overcome this issue, I would be immensely grateful if you drop a note on this post. Thanks!
I deployed a Node.js application using the Quickstart for Node.js in the standard environment.
Then I changed the app.yaml file from:
runtime: nodejs10
to
runtime: nodejs
env: flex
Everything worked as expected.
It might be related to your specific use case.
Surprisingly, this issue does seem to be caused by a bug in the gcloud CLI. However, there is a workaround.
When the --appyaml flag is specified for a deployment to the flex environment, the CLI crashes with the messages outlined in my question above. However, if you copy your .yaml file, renaming it to app.yaml (the default), and drop the --appyaml flag when deploying, the build proceeds without errors.
If you have also experienced this error, please follow the Google issue, as I am working with the Google engineers to make sure they reproduce and eventually fix this bug.
Broken app.yaml
runtime:nodejs14
Fixed app.yaml
runtime: nodejs14
I am dead serious. And:
gcloud info --run-diagnostics
was ZERO HELP.
Once I did this, the "ERROR: gcloud crashed (TypeError): expected string or bytes-like object" went away.
I guess "colon + space" is part of the spec:
Why does the YAML spec mandate a space after the colon?
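You can see the difference with any YAML parser, for example PyYAML (just an illustration, not the parser gcloud itself uses):
# Without a space after the colon the line is one plain scalar string,
# so "runtime" is never set; with the space it becomes a key/value mapping.
import yaml   # pip install pyyaml

print(yaml.safe_load("runtime:nodejs14"))    # 'runtime:nodejs14'  (a string)
print(yaml.safe_load("runtime: nodejs14"))   # {'runtime': 'nodejs14'}  (a mapping)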
We are sitting behind a firewall and are trying to run a Docker image (cBioPortal). Docker itself could be installed through a proxy, but now we encounter the following issue:
Starting validation...
INFO: -: Unable to read xml containing cBioPortal version.
DEBUG: -: Requesting cancertypes from portal at 'http://cbioportal-container:8081'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during validation step:
Traceback (most recent call last):
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4491, in request_from_portal_api
response.raise_for_status()
File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://cbioportal-container:8081/api-legacy/cancertypes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/metaImport.py", line 127, in <module>
exitcode = validateData.main_validate(args)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4969, in main_validate
portal_instance = load_portal_info(server_url, logger)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4622, in load_portal_info
parsed_json = request_from_portal_api(path, api_name, logger)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4495, in request_from_portal_api
) from e
ConnectionError: Failed to fetch metadata from the portal at [http://cbioportal-container:8081/api-legacy/cancertypes]
Now we know that it is a firewall issue, because it works when we install it outside the firewall. But we do not know how to change the firewall yet. Our idea was to look at the files and lines which throw the errors, but we do not know how to look into the files since they are inside the Docker container.
So we cannot just do something like
vim /cbioportal/core/src/main/scripts/importer/validateData.py
...because there is nothing there. Of course we know this file is within the Docker image, but as I said, we don't know how to look into it. At the moment we do not know how to solve this riddle; any help is appreciated.
Maybe you still need this.
You can access this Python file within the container by using docker-compose exec cbioportal sh or docker-compose exec cbioportal bash.
Then you can use cd, cat, vi, vim, etc. to access the path given in your post.
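While you are inside the container, you can also check whether a corporate proxy is intercepting the request that times out; the validator uses requests, so a quick diagnostic sketch like the one below works there (adding the internal hostname to NO_PROXY is a guess about typical proxy setups, not a cBioPortal-specific fix).
# proxy_check.py: run inside the container to see which proxy settings the
# validator inherits and whether the portal URL is reachable directly.
import os
import urllib.request

import requests

url = "http://cbioportal-container:8081/api-legacy/cancertypes"

print("proxy-related env:", {k: v for k, v in os.environ.items() if "proxy" in k.lower()})
print("urllib sees:", urllib.request.getproxies())

# If a proxy shows up above, the internal hostname may need to be listed in
# NO_PROXY (e.g. NO_PROXY=cbioportal-container) so this request bypasses it.
response = requests.get(url, timeout=30)
print(response.status_code)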
I'm not sure which command you're actually running, but when I did the import call like
docker-compose run --rm cbioportal metaImport.py -u http://cbioportal:8080 -s study/lgg_ucsf_2014/lgg_ucsf_2014/ -o
I had to replace http://cbioportal:8080 with the server's IP address.
Also notice that the studies path is one level deeper than in the official documentation.
With cBioPortal behind a proxy, the study import is only available in offline mode, as follows.
First you need to get inside the container:
docker exec -it cbioportal-container bash
Then generate the portal info folder:
cd $PORTAL_HOME/core/src/main/scripts
./dumpPortalInfo.pl $PORTAL_HOME/my_portal_info_folder
Then import the study offline. The -o flag is important to overwrite despite warnings.
cd $PORTAL_HOME/core/src/main/scripts
./importer/metaImport.py -p $PORTAL_HOME/my_portal_info_folder -s /study/lgg_ucsf_2014 -v -o
Hope this helps.