I'm trying to submit a job with PyTorch code to google-cloud-ml, so I wrote a "setup.py" file and added the "install_requires" option.
"setup.py"
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['http://download.pytorch.org/whl/cpu/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl','torchvision']
setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My keras trainer application package.'
)
I then submitted the job to google-cloud-ml, but it fails with this error message:
{
insertId: "3m78xtf9czd0u"
jsonPayload: {
created: 1516845879.49039
levelname: "ERROR"
lineno: 829
message: "Command '['pip', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', u'trainer-0.1.tar.gz']' returned non-zero exit status 1"
pathname: "/runcloudml.py"
}
labels: {
compute.googleapis.com/resource_id: "6637909247101536087"
compute.googleapis.com/resource_name: "cmle-training-master-5502b52646-0-ql9ds"
compute.googleapis.com/zone: "us-central1-c"
ml.googleapis.com/job_id: "run_ml_engine_pytorch_test_20180125_015752"
ml.googleapis.com/job_id/log_area: "root"
ml.googleapis.com/task_name: "master-replica-0"
ml.googleapis.com/trial_id: ""
}
logName: "projects/exem-191100/logs/master-replica-0"
receiveTimestamp: "2018-01-25T02:04:55.421517460Z"
resource: {
labels: {…}
type: "ml_job"
}
severity: "ERROR"
timestamp: "2018-01-25T02:04:39.490387916Z"
}
So how can I use PyTorch in Google Cloud ML Engine?
I found a solution for setting up PyTorch in google-cloud-ml.
First,
you have to get a PyTorch .whl file and store it in a Google Storage bucket. You will then have a bucket link like this:
gs://bucketname/directory/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl
Which .whl file you need depends on your Python version and your CUDA version.
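For example, assuming the wheel has already been downloaded to your local machine, you can copy it to the bucket with gsutil (bucket and directory names here are placeholders):
gsutil cp torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl gs://bucketname/directory/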
Second,
you write the command line and the setup.py file, because you have to configure the google-cloud-ml job.
The related link is submit_job_to_ml-engine.
You write the setup.py file to describe your setup; the related link is write_setup.py_file.
These are my command and my setup.py file:
=====================================================================
"command"
#commandline code
JOB_NAME="run_ml_engine_pytorch_test_$(date +%Y%m%d_%H%M%S)"
REGION=us-central1
OUTPUT_PATH=gs://yourbucket
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir $OUTPUT_PATH \
--runtime-version 1.4 \
--module-name models.pytorch_test \
--package-path models/ \
--packages gs://yourbucket/directory/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl \
--region $REGION \
-- \
--verbosity DEBUG
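Note that the --packages flag is what ships the torch wheel from your bucket onto the training machines; your trainer code itself is packaged and staged from --package-path.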
=====================================================================
"setup.py"
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['torchvision']
setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My pytorch trainer application package.'
)
=====================================================================
Third,
if you have experience submitting jobs to ml-engine, you probably already know the file structure a submitted package needs: packaging_training_model.
You have to follow the link above to learn how to pack your files.
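For reference, the file layout implied by the command above (--package-path models/ together with --module-name models.pytorch_test, and setup.py next to the package) looks roughly like this; the __init__.py is what lets find_packages() pick up the models package:
your_project/
    setup.py
    models/
        __init__.py
        pytorch_test.py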
The actual error message is a bit buried, but it is this:
'install_requires' must be a string or list of strings containing
valid project/version requirement specifiers; Invalid requirement,
parse error at "'://downl'"
To use packages not hosted on PyPI, you need to use dependency_links (see this documentation). Something like this ought to work:
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['torchvision']
DEPENDENCY_LINKS = ['http://download.pytorch.org/whl/cpu/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl']
setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    dependency_links=DEPENDENCY_LINKS,
    packages=find_packages(),
    include_package_data=True,
    description='My keras trainer application package.'
)
Related
I'm trying to deploy a script on Google Cloud Functions for the first time. I went through the documentation and figured out the basics. Then I started trying to deploy my actual script. I'm facing an error with dependencies from the requirements.txt file. I'm at the stage where I don't know enough to be specific about my problem, so I'll list what I did.
After I run the gcloud command gcloud functions deploy FILENAME --runtime python37 with my file name, I hit this error:
ERROR: (gcloud.functions.deploy) OperationError: code=3, message=Build failed:
{
"error": {
"canonicalCode": "INVALID_ARGUMENT",
"errorMessage": "`pip_download_ wheels` had stderr output:\nERROR: Could not find
a version that satisfies the requirement pywin32==227 (from -r requirements.txt (line 32))
(from versions: n one)\nERROR:\r\nNo matching distribution found for pywin32==227 (from -r requirements.txt (line 32))
\n\nerror: `pip_download_wheels` returned code: 1",
"errorTyp e": "InternalError",
"errorId": "8C994D6A"
}
}
This is my requirements.txt file:
attrs==19.3.0
autobahn==20.4.3
Automat==20.2.0
cachetools==4.1.0
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
constantly==15.1.0
cryptography==2.9.2
enum34==1.1.10
google-api-core==1.17.0
google-auth==1.14.1
google-cloud-bigquery==1.24.0
google-cloud-core==1.3.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
hyperlink==19.0.0
idna==2.9
incremental==17.5.0
kiteconnect==3.8.2
numpy==1.18.3
pandas==1.0.3
protobuf==3.11.3
pyarrow==0.17.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
PyHamcrest==2.0.2
pyOpenSSL==19.1.0
python-dateutil==2.8.1
pytz==2020.1
pywin32==227
requests==2.23.0
rsa==4.0
service-identity==18.1.0
six==1.14.0
tqdm==4.45.0
Twisted==20.3.0
txaio==20.4.1
urllib3==1.25.9
wincertstore==0.2
zope.interface==5.1.0
Can you help me figure out how to get past this error?
Edit: Based on the suggestion to keep only the required dependencies in the requirements.txt file, I tried that, and I'm getting a slightly different error:
ERROR: (gcloud.functions.deploy) OperationError: code=3, message=Build failed:
{
"error": {
"canonicalCode": "INVALID_ARGUMENT",
"errorMessage": "`pip_download_\r\nwheels` had stderr output:
\n WARNING: Legacy build of wheel for 'kiteconnect' created no files.
\n Command arguments: /opt/python3.7/bin/python3.7 -u -c 'imp\r\nort sys, setuptools, tokenize; sys.argv[0] = '\"'\"'/tmp/pip-wheel-fdr9r30n/kiteconnect/setup.py'\"'\"'; __file__='\"'\"'/tmp/pip-wheel-fdr9r30n/kiteconnect/s\r\netup.py'\"'\"';f=getattr(tokenize, '\"'\"'open'\"'\"', open)(__file__);code=f.read().replace('\"'\"'\\r\\n'\"'\"', '\"'\"'\\n'\"'\"');f.close();exec(compile(c\r\node, __file__, '\"'\"'exec'\"'\"'))' bdist_wheel -d /tmp/pip-wheel-zkanpa3p\n Command output: [use --verbose to show]\nERROR: Failed to build one or more whe\r\nels\n\nerror: `pip_download_wheels` returned code: 1",
"errorType": "InternalError",
"errorId": "7EF920E4"
}
}
The new requirements.txt file looks like this:
google-api-core==1.17.0
google-auth==1.14.1
google-cloud-bigquery==1.24.0
google-cloud-core==1.3.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
kiteconnect==3.8.2
numpy==1.18.3
pandas==1.0.3
pyarrow==0.17.0
python-dateutil==2.8.1
tqdm==4.45.0
The pywin32 package only provides distributions for the Windows platform, so you won't be able to install it in the Google Cloud Functions runtime.
Do you really need it? Your requirements.txt file looks like the output of pip freeze. You probably don't need all those dependencies. It should only include the dependencies you need to import in your function.
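For example, if the function body only imports the BigQuery client and pandas (an assumption here; list whatever your code actually imports), the file could be as short as:
google-cloud-bigquery==1.24.0
pandas==1.0.3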
I have successfully built OpenCV with GStreamer but cannot run a simple test program that reads an RTSP video stream and writes frames into a new one.
CMake build flags (taken from https://medium.com/#galaktyk01/how-to-build-opencv-with-gstreamer-b11668fa09c; all required libs were also installed):
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D INSTALL_C_EXAMPLES=OFF \
-D PYTHON_EXECUTABLE=$(which python3) \
-D BUILD_opencv_python2=OFF \
-D CMAKE_INSTALL_PREFIX=$(python3 -c "import sys; print(sys.prefix)") \
-D PYTHON3_EXECUTABLE=$(which python3) \
-D PYTHON3_INCLUDE_DIR=$(python3 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-D PYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
-D WITH_GSTREAMER=ON \
-D BUILD_EXAMPLES=ON ..
Code snippet (with a freely accessible, working RTSP camera):
import cv2
import sys
gst = "rtspsrc location=rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa latency=0 ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! appsink"
video_capture = cv2.VideoCapture(gst)
if video_capture.isOpened() == False:
    print("VideoCapture Failed!")
    sys.exit(1)
cv2 build information:
print(cv2.getBuildInformation())
GStreamer:
base: YES (ver 1.8.3)
video: YES (ver 1.8.3)
app: YES (ver 1.8.3)
riff: YES (ver 1.8.3)
pbutils: YES (ver 1.8.3)
libv4l/libv4l2: NO
v4l/v4l2: linux/videodev2.h
gPhoto2: NO
Error output with GStreamer logging enabled (export GST_DEBUG=2):
python3 test.py
0:00:00.010455708 1392 0x2674e10 ERROR GST_PIPELINE grammar.y:716:priv_gst_parse_yyparse: no element "omxh264dec"
0:00:00.010472883 1392 0x2674e10 ERROR GST_PIPELINE grammar.y:801:priv_gst_parse_yyparse: link has no sink [source=#0x267f3e0]
0:00:00.010722031 1392 0x2674e10 ERROR GST_PIPELINE grammar.y:801:priv_gst_parse_yyparse: link has no source [sink=#0x2689390]
0:00:01.579497450 1392 0x7ff6a002f450 WARN basesrc gstbasesrc.c:2948:gst_base_src_loop:<udpsrc0> error: Internal data flow error.
0:00:01.579576046 1392 0x7ff6a002f450 WARN basesrc gstbasesrc.c:2948:gst_base_src_loop:<udpsrc0> error: streaming task paused, reason not-linked (-1)
0:00:01.580609562 1392 0x25fdb70 WARN rtspsrc gstrtspsrc.c:5483:gst_rtspsrc_try_send:<rtspsrc0> send interrupted
0:00:01.580704628 1392 0x25fdb70 WARN rtspsrc gstrtspsrc.c:7552:gst_rtspsrc_pause:<rtspsrc0> PAUSE interrupted
OpenCV Error: Unspecified error (GStreamer: unable to start pipeline
) in cvCaptureFromCAM_GStreamer, file /opencv-3.4.0/modules/videoio/src/cap_gstreamer.cpp, line 890
VIDEOIO(cvCreateCapture_GStreamer (CV_CAP_GSTREAMER_FILE, filename)): raised OpenCV exception:
/opencv-3.4.0/modules/videoio/src/cap_gstreamer.cpp:890: error: (-2) GStreamer: unable to start pipeline
in function cvCaptureFromCAM_GStreamer
I can't manage to use the xgboost package in the Azure Machine Learning Studio interpreter. I am trying to import an xgboost model that I trained, in order to deploy it there. But it seems that my package is not set up correctly, because I cannot access certain functions, particularly "xgboost.sklearn".
My model of course uses xgboost.sklearn.something to do the classification.
I tried to install the package following two different methods:
from the tar.gz approach, like here:
How can certain python libraries be imported in azure ML?Like the line import humanfriendly gives error
and also as a clean package built with the sandbox, like here:
Uploading xgboost to azure machine learning: %1 is not a valid Win32 application\r\nProcess returned with non-zero exit code 1
import sys
import sklearn
import pandas as pd
import pickle
import xgboost
def azureml_main(dataframe1 = None, dataframe2 = None):
    sys.path.insert(0, ".\Script Bundle")
    model = pickle.load(open(".\\Script Bundle\\xgboost\\XGBv1.pkl", 'rb'))
    dataframe1, dftrue = filterdata(dataframe1)
    ## one processing step
    pred = predictorV1(dataframe1, dftrue)
    dataframe1['Y'] = pred
    return dataframe1
Here is the error I get
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Caught exception while executing function: Traceback (most recent call last):
File "C:\server\invokepy.py", line 199, in batch
odfs = mod.azureml_main(*idfs)
File "C:\temp\1098d8754a52467181a9509ed16de8ac.py", line 89, in azureml_main
model = pickle.load(open(".\\Script Bundle\\xgboost\\XGBv1.pkl", 'rb'))
ImportError: No module named 'xgboost.sklearn'
Process returned with non-zero exit code 1
---------- End of error message from Python interpreter ----------
Start time: UTC 05/22/2019 13:11:08
End time: UTC 05/22/2019 13:11:49
I want to use custom command-line options with distributed/subprocess testing (pytest-xdist).
I have two custom options added with addoption: --resources_dir and --output_dir.
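They are defined in my conftest.py roughly like this (a simplified sketch; the fixture names are just for illustration):
# conftest.py (simplified sketch of the custom options)
import pytest

def pytest_addoption(parser):
    parser.addoption("--resources_dir", action="store", required=True)
    parser.addoption("--output_dir", action="store", required=True)

@pytest.fixture
def resources_dir(request):
    return request.config.getoption("--resources_dir")

@pytest.fixture
def output_dir(request):
    return request.config.getoption("--output_dir")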
I try to start it with:
python3 -m pytest -vs --junitxml=/tmp/result_alert_test.xml --resources_dir=test/resources --output_dir=/tmp/ -n auto test_*
The error message:
Replacing crashed worker gw82
usage: -c [options] [file_or_dir] [file_or_dir] [...]
-c: error: the following arguments are required: --resources_dir, --output_dir
[gw83] node down: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/execnet/gateway_base.py", line 1072, in executetask
Without xdist (-n auto), when I run the tests in a single process, it works:
python3 -m pytest -vs --junitxml=/tmp/result_alert_test.xml --resources_dir=test/resources --output_dir=/tmp/ test_*
If I start it with this last command, it works in a single process with no errors:
=============================== test session starts ===============================
platform linux -- Python 3.5.2, pytest-3.5.0, py-1.5.3, pluggy-0.6.0 -- /usr/bin/python3
cachedir: ../../../../../.pytest_cache
rootdir: /, inifile:
plugins: xdist-1.22.2, forked-0.2
collected 115 items
https://github.com/pytest-dev/pytest/issues/2026
There is no fix for this bug yet.
I worked around it by using environment variables instead:
python3 -m pytest -vsx --full-trace --junitxml=${TEST_REPORT_DIR}/result_alert_test.xml --tx=popen//env:TEST_DIR=${TESTS_ROOT} --tx=popen//env:TEST_OUTPUT_DIR=${TEST_OUTPUT_DIR} -n auto -vs test_*
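With this workaround the paths come from environment variables instead of pytest options, so the xdist worker processes never have to parse --resources_dir/--output_dir at all. A minimal sketch of a matching conftest.py (fixture names are illustrative; TEST_DIR and TEST_OUTPUT_DIR are the variables set via --tx=popen//env:... above):
# conftest.py sketch: read the paths from environment variables
import os
import pytest

@pytest.fixture
def resources_dir():
    return os.environ["TEST_DIR"]

@pytest.fixture
def output_dir():
    return os.environ["TEST_OUTPUT_DIR"]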
I'm trying to infer timezone in PySpark given the longitude and latitude of an event. I came across the timezonefinder library which works locally. I wrapped it in a user defined function in an attempt to use it as the timezone inferrer.
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def get_timezone(longitude, latitude):
    from timezonefinder import TimezoneFinder
    tzf = TimezoneFinder()
    return tzf.timezone_at(lng=longitude, lat=latitude)

udf_timezone = F.udf(get_timezone, StringType())

df = sqlContext.read.parquet(INPUT)
df.withColumn("local_timezone", udf_timezone(df.longitude, df.latitude))\
  .write.parquet(OUTPUT)
When I run on a single node, this code works. However, when running in parallel, I get the following error:
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 177, in main
process()
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 172, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 104, in <lambda>
func = lambda _, it: map(mapper, it)
File "<string>", line 1, in <lambda>
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 71, in <lambda>
return lambda *a: f(*a)
File "/tmp/c95422912bfb4079b64b88427991552a/enrich_data.py", line 64, in get_timezone
File "/opt/conda/lib/python2.7/site-packages/timezonefinder/__init__.py", line 3, in <module>
from .timezonefinder import TimezoneFinder
File "/opt/conda/lib/python2.7/site-packages/timezonefinder/timezonefinder.py", line 59, in <module>
from .helpers_numba import coord2int, int2coord, distance_to_polygon_exact, distance_to_polygon, inside_polygon, \
File "/opt/conda/lib/python2.7/site-packages/timezonefinder/helpers_numba.py", line 17, in <module>
#jit(b1(i4, i4, i4[:, :]), nopython=True, cache=True)
File "/opt/conda/lib/python2.7/site-packages/numba/decorators.py", line 191, in wrapper
disp.enable_caching()
File "/opt/conda/lib/python2.7/site-packages/numba/dispatcher.py", line 529, in enable_caching
self._cache = FunctionCache(self.py_func)
File "/opt/conda/lib/python2.7/site-packages/numba/caching.py", line 614, in __init__
self._impl = self._impl_class(py_func)
File "/opt/conda/lib/python2.7/site-packages/numba/caching.py", line 349, in __init__
"for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'inside_polygon': no locator available for file '/opt/conda/lib/python2.7/site-packages/timezonefinder/helpers_numba.py'
I can import the library locally on the nodes where I got the error.
Any solution along these lines would be appreciated:
Is there a native Spark way to do this task?
Is there another way to load the library?
Is there a way to avoid the caching that numba does?
Eventually this was solved by abandoning timezonefinder completely and instead using the geo-spatial timezone dataset from timezone-boundary-builder, querying it with magellan, the geo-spatial SQL query library for Spark.
One caveat was that Point and the other objects in the library are not wrapped for Python. I ended up writing my own Scala function for timezone matching and dropped the magellan objects before returning the dataframe.
I encountered this error when running timezonefinder on a Spark cluster.
RuntimeError: cannot cache function 'inside_polygon': no locator available for file '/disk-1/hadoop/yarn/local/usercache/timezonefinder1.zip/timezonefinder/helpers_numba.py'
The issue was that the numpy versions differed between the cluster and the timezonefinder package we shipped to Spark:
the cluster had numpy 1.13.3, whereas the numpy in timezonefinder.zip was 1.17.2.
To overcome the version mismatch, we created a custom conda environment with timezonefinder and numpy 1.17.2 and submitted the Spark job using that environment.
Creating a custom conda environment with the timezonefinder package installed:
conda create --name timezone-conda python timezonefinder
source activate timezone-conda
conda install -y conda-pack
conda pack -o timezonecondaevnv.tar.gz -d ./MY_CONDA_ENV
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands
Submitting the Spark job with the custom conda environment:
!spark-submit --name app_name \
--master yarn \
--deploy-mode cluster \
--driver-memory 1024m \
--executor-memory 1GB \
--executor-cores 5 \
--num-executors 10 \
--queue QUEUE_NAME \
--archives ./timezonecondaevnv.tar.gz#MY_CONDA_ENV \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./MY_CONDA_ENV/bin/python \
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=./MY_CONDA_ENV/bin/python \
--conf spark.executorEnv.PYSPARK_PYTHON=./MY_CONDA_ENV/bin/python \
--conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=./MY_CONDA_ENV/bin/python \
./main.py
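Note that the #MY_CONDA_ENV suffix on --archives is what unpacks the packed environment under that directory name inside each YARN container, which is why the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON settings point at ./MY_CONDA_ENV/bin/python.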