I am trying to execute an MPI-based task on an Azure Batch pool using the Python SDK.
My setup includes the following steps:
Create the pool from a template JSON.
Create SAS URLs and upload the executable and supporting files as common resource files.
Create the task object.
Add environment settings to the task (using azure.batch.models.EnvironmentSetting).
Add the coordination script and run_script.
The coordination script looks like:
#!/usr/bin/bash
export MPI_HOME="/usr/local/openmpi4.0/openmpi-4.0.0/build"
export CUDA_HOME="/usr/local/cuda-11.0"
export LD_LIBRARY_PATH="$MPI_HOME/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export PATH="$MPI_HOME/bin:$CUDA_HOME/bin:$PATH"
export PYTHONPATH="/usr/local/lib64/python3.6/site-packages:$PYTHONPATH"
The run_script looks like:
#!/usr/bin/bash
export MPI_HOME="/usr/local/openmpi4.0/openmpi-4.0.0/build"
export CUDA_HOME="/usr/local/cuda-11.0"
export LD_LIBRARY_PATH="$MPI_HOME/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export PATH="$MPI_HOME/bin:$CUDA_HOME/bin:$PATH"
export PYTHONPATH="/usr/local/lib64/python3.6/site-packages:$PYTHONPATH"
mpirun -np 3 --prefix /usr/local/openmpi4.0/openmpi-4.0.0/build --map-by ppr:4:node --oversubscribe --mca btl_openib_allow_ib 1 -host $AZ_BATCH_HOST_LIST $AZ_BATCH_TASK_SHARED_DIR/EXECUTABLE_FILE $AZ_BATCH_TASK_SHARED_DIR/INPUT_FILE_WITH_PARAMS
I am stuck with the error:
/path/to/my/executable: error while loading shared libraries: libmpi.so.12: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
I am not sure what I am doing wrong.
Clearly, the environment settings are not being applied.
Can you point me to how I should go about the environment setup for these tasks?
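For reference, the task is built roughly like this (a sketch with placeholder IDs, SAS URLs, and script names; the real values differ):
import azure.batch.models as batchmodels

# Environment variables I expect the task to see (placeholders for the real list).
env_settings = [
    batchmodels.EnvironmentSetting(name="MPI_HOME",
                                   value="/usr/local/openmpi4.0/openmpi-4.0.0/build"),
    batchmodels.EnvironmentSetting(name="CUDA_HOME", value="/usr/local/cuda-11.0"),
]

# Multi-instance settings: the coordination command runs on every node,
# the task command line runs on the primary node only.
multi_instance = batchmodels.MultiInstanceSettings(
    number_of_instances=3,
    coordination_command_line="bash coordination.sh",   # placeholder script name
    common_resource_files=[
        batchmodels.ResourceFile(http_url="<SAS URL>", file_path="EXECUTABLE_FILE"),
    ],
)

task = batchmodels.TaskAddParameter(
    id="mpi-task",
    command_line="bash run_script.sh",                   # placeholder script name
    environment_settings=env_settings,
    multi_instance_settings=multi_instance,
)

batch_client.task.add(job_id="my-job", task=task)        # batch_client created earlier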
I want to deploy my Node.js application to Pivotal Cloud Foundry using manifest.yml. I need to update the PATH variable of the container before the application starts, to include the path of a directory inside my application's src directory. Can this be achieved?
manifest.yml:
---
applications:
- name: node-apollo-graphql-server
command: npm start
instances: 1
memory: 512M
buildpack: dicf_nodejs_buildpack_rc
stack: cflinuxfs3
You cannot do this by setting environment variables with cf push -e or the env: block in manifest.yml. If you set PATH using one of those methods, you'll override it, when what you likely want to do is append to it.
To append to $PATH, add a file named .profile to the root of your project (the directory from which you run cf push). In that file, put one line, export PATH=$PATH:<new loc>, where <new loc> is the path you want to append to the $PATH environment variable.
The .profile file is sourced before your application starts, so you can use it to dynamically set environment variables or apply configuration ahead of startup.
The only caveat is that, because this happens before your application starts, it blocks the start of your application. As such, avoid running expensive or time-consuming processes here; otherwise you will delay the start of your application, or possibly even cause app failures if you exceed the startup timeout (cf push -t).
OK, this is very strange. I have some init scripts that I would like to run when a cluster starts.
The cluster has the init script, which is in a file in DBFS,
basically this:
dbfs:/databricks/init-scripts/custom-cert.sh
Now, when I create the init script like this, it works (no SSL errors for my endpoints). Also, the event log for the cluster shows the duration as 1 second for the init script:
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
However, if I just put the init script in a bash script and upload it to DBFS through a pipeline, the init script does not do anything. It executes, according to the event log, but the execution duration is 0 seconds.
I have the sh script in a file named
custom-cert.sh
with the same contents as above, i.e.
#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt"
but when I check /usr/local/share/ca-certificates/, it does not contain the certificate copied from /dbfs/orgcertificates/orgcerts.crt, even though the cluster init script has run.
Also, I have compared the contents of the init script in both cases and, at least to the naked eye, I can't see any difference,
i.e.
%sh
cat /dbfs/databricks/init-scripts/custom-cert.sh
shows the same contents in both scenarios. What is the problem in the second case?
EDIT: I read a bit more about init scripts and found that their logs are written here:
%sh
ls /databricks/init_scripts/
Looking at the err file in that location, it seems there is an error
sudo: update-ca-certificates
: command not found
Why is it that update-ca-certificates is found in the first case but not when I put the same script in a sh script and upload it to DBFS (instead of executing dbutils.fs.put within a notebook)?
EDIT 2 (in response to the first answer): after running the command
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/orgcertificates/orgcerts.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
the output is the file custom-cert.sh. I then restart the cluster with the init script location set to dbfs:/databricks/init-scripts/custom-cert.sh, and then it works. So it is essentially the same content that the init script is reading (the generated sh script). Why can't it read the script if I do not use dbutils.fs.put but instead just put the contents in a bash file and upload it during the CI/CD process?
As we are aware, an init script is a shell script that runs during the startup of each cluster node, before the Apache Spark driver or worker JVM starts. In case 2, when you run the bash command with the %sh magic command, you are executing it only on the local driver node, so the worker nodes are not able to access the result. In case 1, by using dbutils.fs.put you are writing the file to the DBFS root, so the worker nodes, along with the driver node, can also access that path.
Ref: https://docs.databricks.com/data/databricks-file-system.html#summary-table-and-diagram
It seems that the observations I made in the comments section of my question are the way to go.
I now create the init script using a Databricks job that I run during the CI/CD pipeline from Azure DevOps.
The notebook has these commands:
dbutils.fs.rm("/databricks/init-scripts/custom-cert.sh")
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash
cp /dbfs/internal-certificates/certs.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
I then create a Databricks job (pointing to this notebook); the cluster is a job cluster, which is just temporary. Of course, in my case, even this job creation is automated using a PowerShell script.
I then call this Databricks job in the release pipeline, again using a PowerShell script.
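For illustration, a rough Python equivalent of that call, using the Databricks Jobs run-now REST endpoint (the workspace URL, token, and job ID below are placeholders; the actual pipeline uses PowerShell):
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"  # placeholder
token = "<personal access token from a pipeline secret>"        # placeholder
job_id = 123                                                    # placeholder

# Trigger the job that (re)creates the init script in DBFS.
resp = requests.post(
    f"{workspace_url}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": job_id},
)
resp.raise_for_status()
print("Triggered run:", resp.json()["run_id"])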
This creates the file
/databricks/init-scripts/custom-cert.sh
I then use this file in any other cluster that accesses my org's endpoints (without certificate errors).
I do not know (or still do not understand) why the same script file can't just be part of a repo and be uploaded during the release process (instead of this Databricks job calling a notebook). I would love to know the reason. The other answer on this question does not hold true, as you can see: the cluster script is created by a job cluster and then accessed from another cluster as part of its init script.
It simply boils down to how the init script gets created.
But it gets my job done, and I'm sharing it in case it helps someone get their job done too.
I have raised a support case, though, to understand the reason.
I'm using Rails 5. I would like to run a Sidekiq process locally using the default queue. My worker class looks roughly like this:
module Accounting::Workers
class FileGenerationWorker
include Sidekiq::Worker
def perform
print "starting work ...\n"
...
I have set up my config/sidekiq.yml file like so, in the hope of running the worker daily at a specific time (11:05 am):
:concurrency: 20
:queues:
- default
...
:schedule:
Accounting::Workers::FileGenerationWorker:
cron: "0 5 11 * *"
queue: default
However, when I start my rails server ("rails s"), I don't see my print statement output to the console or any of the work performed, which tells me my worker isn't running. What else am I missing in order to get this worker scheduled properly locally?
Run the workers with
bundle exec sidekiq
You may need to provide the path to the worker module. For example,
bundle exec sidekiq -r ./worker.rb
Sidekiq by itself doesn't support a :schedule: map entry in the Sidekiq configuration file.
Periodic job functionality is provided in extensions such as sidekiq-scheduler.
You need to use classes declared in the extended Sidekiq module provided in sidekiq-scheduler. For example,
./worker.rb
require 'sidekiq-scheduler'
require './app/workers/accounting'
Sidekiq.configure_client do |config|
config.redis = {db: 1}
end
Sidekiq.configure_server do |config|
config.redis = {db: 1}
end
./app/workers/accounting.rb
module Accounting
# ...
end
module Accounting::Workers
class FileGenerationWorker
include Sidekiq::Worker
def perform
puts "starting work ...\n"
end
end
end
I'm running Cypress in one of my release stages and it gives me this output:
Finished processing: D:\a\r1\a\_ClientWeb-Build-CI\ShellArtifact\tests\integration\cypress\videos\onboarding.spec.js.mp4 (0 seconds)
I have 2 questions:
Is the path name relative to the app service? If I have an app service called randomname and run the Cypress stage on that randomname app service, should I be able to find the Cypress output in randomname.scm.azurewebsites.net?
If I go into the SCM debug console and do cd D:\a\, I get:
cd : Cannot find path 'D:\a\' because it does not exist.
So how do I actually access my Cypress test results?
I've also tried archiving the files into a zip file:
In the output of the task step I see:
Creating archive: d:\home\testing\somefile.zip
But when I try to access the D:/home/testing folder on my appname.scm.azurewebsites.net I get:
cd : Cannot find path 'D:\home\testing' because it does not exist.
The path D:\a\r1\a is inside the hosted agent that runs the release pipeline; it is not in your application.
The same goes for the zip file: when you specify d:/home/..., that path is on the agent.
After the release finishes, all the files are deleted, so you need to save them somewhere else (maybe in Azure?) during the pipeline, for example with the "Azure File Copy" task.
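If a script step is preferable to the Azure File Copy task, here is a minimal sketch of uploading the archive to Azure Blob Storage with the azure-storage-blob Python package (the connection string, container, and paths are placeholders):
from azure.storage.blob import BlobClient

# Placeholders throughout; take the connection string from a pipeline secret variable.
blob = BlobClient.from_connection_string(
    conn_str="<storage-account-connection-string>",
    container_name="test-results",
    blob_name="cypress/somefile.zip",
)

# Upload the archive produced earlier in the stage.
with open(r"d:\home\testing\somefile.zip", "rb") as data:
    blob.upload_blob(data, overwrite=True)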
I'm trying to run a .sh file from a .py file in a PySpark job, but I always receive a message saying that the .sh file is not found.
This is my code:
test.py:
import os,sys
os.system("sh ./check.sh")
and my gcloud command:
gcloud beta dataproc jobs submit pyspark --cluster mserver file:///home/myuser/test.py
The test.py file is loaded fine, but the system can't find the check.sh file.
I figure it is something related to the file's path, but I'm not sure.
I also tried os.system("sh home/myuser/check.sh"), with the same result.
I think this should be easy to do... any ideas?
The "current working directory" used by Dataproc jobs submitted through the API is a temporary directory with a unique name for each job; if the file wasn't uploaded with the job itself, you'll have to access it using your absolute path.
If you indeed added the check.sh file manually to /home/myuser/check.sh, then you should be able to call it using the fully qualified path, os.system("sh /home/myuser/check.sh"); make sure to start your absolute path with a /.
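A small sketch of what test.py could look like with the absolute path and an explicit error check (assumes Python 3.7+ on the cluster; subprocess is used instead of os.system so failures surface clearly):
import subprocess

# Use the absolute path: the job's working directory is a temporary, per-job
# directory, so "./check.sh" will not resolve there.
result = subprocess.run(["sh", "/home/myuser/check.sh"],
                        capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError("check.sh failed: " + result.stderr)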