I have a weird error I can't seem to get around, and the Python error is something I can't figure out. "Unresolvable contact points" sounds like an IP issue? The CLUSTER-IP is None when I look at the client service below:
NAME                    TYPE        CLUSTER-IP   EXTERNAL-IP   SELECTOR                                                                                                                   PORTS
rook-cassandra-client   ClusterIP   None         <none>        app=rook-cassandra,app.kubernetes.io/managed-by=rook-cassandra-operator,app.kubernetes.io/name=rook-cassandra,cassandra.rook.io/cluster=rook
All I am trying to do is pass the arguments below in env:
env:
replicationClass: SimpleStrategy
replicationFactor: 3
name: sample
cluster: rook-cassandra-client
namespace: xxxx-cassandra
The keyspace job then fails with the errors below:
FailedCreatePodSandBox
Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" network for pod "rook-cassandra-create-keyspace-job-zf4p9": NetworkPlugin cni failed to set up pod "rook-cassandra-create-keyspace-job-xxxxx_xxxx-cassandra" network: add cmd: failed to assign an IP address to container
{}
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/audit/__main__.py", line 54, in <module>
main()
File "/audit/__main__.py", line 50, in main
print(args.func(args))
File "/audit/__main__.py", line 9, in keyspace
keyspace.create(args.name, args.replication_class, args.replication_factor, args.cluster, args.namespace)
File "/audit/keyspace/create_keyspace.py", line 17, in create
raise e
File "/audit/keyspace/create_keyspace.py", line 14, in create
session = self.__connect_to_cluster(cassandra_service, cassandra_namespace)
File "/audit/keyspace/create_keyspace.py", line 8, in __connect_to_cluster
cluster = Cluster([f"{service}.{namespace}"])
File "/usr/local/lib/python3.8/site-packages/cassandra/cluster.py", line 1174, in __init__
raise UnresolvableContactPoints(self._endpoint_map_for_insights)
cassandra.UnresolvableContactPoints: {}
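If it helps narrow this down: my understanding is that the driver raises UnresolvableContactPoints when the contact point hostname cannot be resolved via DNS, which is how a headless service (CLUSTER-IP: None) is addressed. Here is a minimal check I can run from inside a pod, using the names from my setup (a diagnostic sketch, not a fix):

# Minimal DNS check (assumption: UnresolvableContactPoints means the
# service hostname could not be resolved from inside the job's pod).
import socket

service = "rook-cassandra-client"
namespace = "xxxx-cassandra"
try:
    print(socket.getaddrinfo(f"{service}.{namespace}", 9042))
except socket.gaierror as err:
    # A resolution failure here matches the UnresolvableContactPoints above.
    print(f"DNS resolution failed: {err}")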
I'm trying to run training of the following model: https://github.com/Atten4Vis/ConditionalDETR
using the script conddetr_r50_epoch50.sh, just as the README says. It looks like this:
script_name1=`basename $0`                        # e.g. "conddetr_r50_epoch50.sh"
script_name=${script_name1:0:${#script_name1}-3}  # strip the trailing ".sh"
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env \
main.py \
--coco_path ../data/coco \
--output_dir output/$script_name
But I am getting the following errors:
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [DESKTOP-16DB4TE]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [DESKTOP-16DB4TE]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "C:\DETR\ConditionalDETR\main.py", line 258, in <module>
main(args)
File "C:\DETR\ConditionalDETR\main.py", line 116, in main
utils.init_distributed_mode(args)
File "C:\DETR\ConditionalDETR\util\misc.py", line 429, in init_distributed_mode
torch.cuda.set_device(args.gpu)
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\cuda\__init__.py", line 326, in set_device
torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 55928) of binary: C:\ProgramData\Anaconda3\envs\conditional_detr\python.exe
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\launch.py", line 195, in <module>
main()
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\launch.py", line 191, in main
launch(args)
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\launch.py", line 176, in launch
run(args)
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\run.py", line 753, in run
elastic_launch(
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\launcher\api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "C:\ProgramData\Anaconda3\envs\conditional_detr\lib\site-packages\torch\distributed\launcher\api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
I am very new to PyTorch, so I do not quite understand why I'm getting these errors or what I should do to fix this.
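If it helps, my guess (hedged, since I'm new to this) is that the AttributeError on torch._C._cuda_setDevice usually indicates a CPU-only PyTorch build, which lacks the CUDA bindings entirely. A minimal sanity check:

# Minimal sanity check (assumption: the AttributeError on torch._C._cuda_setDevice
# usually means a CPU-only PyTorch build is installed).
import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.cuda.is_available())  # False means torch.cuda.set_device() cannot work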
I'm trying to create a local Airflow environment with Docker + Ubuntu on Windows.
I used the following wizard:
https://github.com/aws/aws-mwaa-local-runner
This wizard creates 2 containers (one for the DB and one for Airflow).
Now I'm stuck with the following problem:
My Airflow container keeps restarting after throwing 2 exceptions:
"ERROR: You need to initialize the database. Please run airflow db init. Make sure the command is run using Airflow version 2.3.2."
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in
sys.exit(main())
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/main.py", line 38, in main
args.func(args)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 51, in command
return func(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/db_command.py", line 35, in initdb
db.initdb()
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 71, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 648, in initdb
upgradedb(session=session)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 68, in wrapper
return func(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 1449, in upgradedb
command.upgrade(config, revision=to_revision or 'heads')
File "/usr/local/lib/python3.7/site-packages/alembic/command.py", line 294, in upgrade
script.run_env()
File "/usr/local/lib/python3.7/site-packages/alembic/script/base.py", line 490, in run_env
util.load_python_file(self.dir, "env.py")
File "/usr/local/lib/python3.7/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
module = load_module_py(module_id, path)
File "/usr/local/lib/python3.7/site-packages/alembic/util/compat.py", line 182, in load_module_py
spec.loader.exec_module(module)
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/migrations/env.py", line 107, in
run_migrations_online()
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/migrations/env.py", line 101, in run_migrations_online
context.run_migrations()
File "", line 8, in run_migrations
File "/usr/local/lib/python3.7/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
self.get_context().run_migrations(**kw)
File "/usr/local/lib/python3.7/site-packages/alembic/runtime/migration.py", line 548, in run_migrations
for step in self._migrations_fn(heads, self):
File "/usr/local/lib/python3.7/site-packages/alembic/command.py", line 283, in upgrade
return script._upgrade_revs(revision, rev)
File "/usr/local/lib/python3.7/site-packages/alembic/script/base.py", line 365, in _upgrade_revs
revs = list(revs)
File "/usr/local/lib/python3.7/site-packages/alembic/script/revision.py", line 1040, in _iterate_revisions
total_space.remove(rev.revision)
KeyError: '75d5ed6c2b43'"
I tried running the following command in Ubuntu to upgrade the DB:
"docker exec -it aws-mwaa-local-runner-202_local-runner_1 /entrypoint.sh airflow db upgrade"
but I get the same error:
"..... KeyError: '75d5ed6c2b43'"
I also tried to reset and initialize the Airflow DB with
docker exec -it aws-mwaa-local-runner-202_local-runner_1 /entrypoint.sh airflow initdb
docker exec -it aws-mwaa-local-runner-202_local-runner_1 /entrypoint.sh airflow reset
and I still get the "..... KeyError: '75d5ed6c2b43'" error.
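In case it's relevant, my understanding is that a KeyError naming an alembic revision means the revision recorded in the metadata DB's alembic_version table is unknown to the installed Airflow's migration graph (for example, a DB volume created by a different Airflow version). A diagnostic sketch; the connection URL is a placeholder for my DB container's settings:

# Hedged diagnostic: read the alembic revision recorded in the metadata DB.
# The connection URL is a placeholder; substitute your DB container's settings.
import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://airflow:airflow@localhost:5432/airflow")
with engine.connect() as conn:
    # If this prints '75d5ed6c2b43' but the installed Airflow has no such
    # migration, the DB volume and the Airflow version are out of sync.
    print(conn.execute(sqlalchemy.text("SELECT version_num FROM alembic_version")).scalar())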
Waiting for a response,
thanks.
I have an automated Ansible playbook that generates the Terraform files required to create AWS resources into a folder. Once the files are generated, an Ansible task runs terraform plan (see "plan code" below), but it fails with the error shown under "error".
Ansible Version:
ansible [core 2.11.2]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
Terraform Version:
Terraform v0.12.31
plan code:
- name: Create Terraform plan
check_mode: true
community.general.terraform:
project_path: "{{ terraform_dir }}"
plan_file: "{{ terraform_dir }}/tfplan"
state: "planned"
force_init: true
register: plan
tags:
- build
- plan
- never
error:
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):
File \"/root/.ansible/tmp/ansible-tmp-1625160772.6764412-713-181137940992596/AnsiballZ_terraform.py\", line 100, in <module>
_ansiballz_main()
File \"/root/.ansible/tmp/ansible-tmp-1625160772.6764412-713-181137940992596/AnsiballZ_terraform.py\", line 92, in _ansiballz_main
invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
File \"/root/.ansible/tmp/ansible-tmp-1625160772.6764412-713-181137940992596/AnsiballZ_terraform.py\", line 40, in invoke_module
runpy.run_module(mod_name='ansible_collections.community.general.plugins.modules.terraform', init_globals=dict(_module_fqn='ansible_collections.community.general.plugins.modules.terraform', _modlib_path=modlib_path),
File \"/usr/lib/python3.8/runpy.py\", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File \"/usr/lib/python3.8/runpy.py\", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File \"/usr/lib/python3.8/runpy.py\", line 87, in _run_code
exec(code, run_globals)
File \"/tmp/ansible_community.general.terraform_payload_xnzkbjkr/ansible_community.general.terraform_payload.zip/ansible_collections/community/general/plugins/modules/terraform.py\", line 497, in <module>
File \"/tmp/ansible_community.general.terraform_payload_xnzkbjkr/ansible_community.general.terraform_payload.zip/ansible_collections/community/general/plugins/modules/terraform.py\", line 393, in main
File \"/tmp/ansible_community.general.terraform_payload_xnzkbjkr/ansible_community.general.terraform_payload.zip/ansible_collections/community/general/plugins/modules/terraform.py\", line 238, in get_version
File \"/usr/lib/python3.8/json/__init__.py\", line 357, in loads
return _default_decoder.decode(s)
File \"/usr/lib/python3.8/json/decoder.py\", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File \"/usr/lib/python3.8/json/decoder.py\", line 355, in raw_decode
raise JSONDecodeError(\"Expecting value\", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
", "module_stdout": "", "msg": "MODULE FAILURE
See stdout/stderr for the exact error", "rc": 1}**
I'm not sure whether I missed installing some required module or whether I need to make code changes.
Any idea how to fix the above error?
Please let me know if any further information is needed.
Thanks in advance.
Fixed the above error by changing the ansible-galaxy collection version (ansible-galaxy collection install community.general:1.3.9).
The collection versions causing the above error are >1.3.9, as versions greater than 1.3.9 are not compatible with Terraform v0.12.31.
Previously I had community.general version 2.2.0 installed; downgrading to 1.3.9 fixed my issue.
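For context, my read of the traceback is that the failing get_version call in the module parses the output of terraform version as JSON, which Terraform 0.12 does not emit (that 0.12 prints plain text here is my assumption; the json.loads failure is straight from the traceback). A rough reproduction sketch:

# Rough reproduction sketch: newer community.general builds parse
# `terraform version -json`; on Terraform 0.12 the output is not JSON,
# so json.loads() fails exactly as in the traceback above.
import json
import subprocess

out = subprocess.run(["terraform", "version", "-json"],
                     capture_output=True, text=True).stdout
try:
    print(json.loads(out)["terraform_version"])
except json.JSONDecodeError:
    # Terraform v0.12.x lands here: "Expecting value: line 1 column 1 (char 0)"
    print("non-JSON version output; pin community.general to 1.3.9 or upgrade Terraform")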
Installing CloudTracker from https://github.com/duo-labs/cloudtracker and getting the ERROR below:
Could not load yaml from config file config.yaml while running
cloudtracker --account demo --list users (venv)
[ec2-user@ip-10-0-0-245 ~]$ cloudtracker --account demo --list users
Traceback (most recent call last):
File "/home/ec2-user/venv/lib64/python3.7/site-packages/cloudtracker/cli.py", line 97, in main
config = yaml.load(args.config)
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/__init__.py", line 72, in load
return loader.get_single_data()
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/constructor.py", line 35, in get_single_data
node = self.get_single_node()
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
while not self.check_event(MappingEndEvent):
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
if self.check_token(KeyToken):
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
return self.fetch_value()
File "/home/ec2-user/venv/lib64/python3.7/site-packages/yaml/scanner.py", line 579, in fetch_value
self.get_mark())
yaml.scanner.ScannerError: mapping values are not allowed here
in "config.yaml", line 3, column 8
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/venv/bin/cloudtracker", line 8, in <module>
sys.exit(main())
File "/home/ec2-user/venv/lib64/python3.7/site-packages/cloudtracker/cli.py", line 101, in main
"ERROR: Could not load yaml from config file {}\n{}".format(args.config.name, e)
argparse.ArgumentError: ERROR: Could not load yaml from config file config.yaml
mapping values are not allowed here
in "config.yaml", line 3, column 8
OK, solved the error today.
It was an indentation error in the config.yaml file that contains the bucket name and account ID.
Because of the indentation error in that file, CloudTracker was unable to find the path to the CloudTrail logs.
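For anyone hitting the same ScannerError: PyYAML reports "mapping values are not allowed here" when bad indentation or an unquoted colon puts a key: value where a plain scalar was expected. A minimal reproduction (the field name is illustrative, not CloudTracker's actual schema):

# Minimal reproduction of the ScannerError; the content is illustrative only.
import yaml

broken = "s3_bucket: my-bucket: extra"  # unquoted second colon starts a new mapping
try:
    yaml.safe_load(broken)
except yaml.scanner.ScannerError as err:
    print(err)  # mapping values are not allowed here ...

print(yaml.safe_load("s3_bucket: 'my-bucket: extra'"))  # quoting (or re-indenting) fixes it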
I have Kubeflow Pipelines running in my local environment, along with JupyterLab and the Elyra extensions. I've created a notebook pipeline and configured the runtime configuration as follows, setting api_endpoint to http://localhost:31380/pipeline (with security disabled). When I try to run the pipeline, the following error message is displayed:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/tornado/web.py", line 1703, in _execute
result = await result
File "/usr/local/lib/python3.8/site-packages/elyra/pipeline/handlers.py", line 89, in post
response = await PipelineProcessorManager.instance().process(pipeline)
File "/usr/local/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 70, in process
res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
File "/usr/local/Cellar/python#3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/elyra/pipeline/processor_kfp.py", line 100, in process
raise lve
File "/usr/local/lib/python3.8/site-packages/elyra/pipeline/processor_kfp.py", line 89, in process
client.upload_pipeline(pipeline_path,
File "/usr/local/lib/python3.8/site-packages/kfp/_client.py", line 720, in upload_pipeline
response = self._upload_api.upload_pipeline(pipeline_package_path, name=pipeline_name, description=description)
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 83, in upload_pipeline
return self.upload_pipeline_with_http_info(uploadfile, **kwargs) # noqa: E501
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 177, in upload_pipeline_with_http_info
return self.api_client.call_api(
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/api_client.py", line 378, in call_api
return self.__call_api(resource_path, method,
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/api_client.py", line 195, in __call_api
response_data = self.request(
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/api_client.py", line 421, in request
return self.rest_client.POST(url,
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/rest.py", line 279, in POST
return self.request("POST", url,
File "/usr/local/lib/python3.8/site-packages/kfp_server_api/rest.py", line 196, in request
r = self.pool_manager.request(
File "/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 79, in request
return self.request_encode_body(
File "/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 171, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 325, in urlopen
conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
File "/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 231, in connection_from_host
raise LocationValueError("No host specified.")
urllib3.exceptions.LocationValueError: No host specified.
The root cause is an issue in the Kubeflow Pipelines kfp package version 1.0.0 that is distributed with Elyra v1.4.1 (and lower). To work around the issue, replace localhost with 127.0.0.1 in your runtime configuration, e.g. http://127.0.0.1:31380/pipeline.
Edit: With the availability of Elyra v1.5+, which requires a more recent version of the kfp package, you can also upgrade Elyra to resolve the issue.
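For reference, the last frames of the traceback can be reproduced on their own: urllib3 raises LocationValueError for any URL whose host component ends up empty, which is what the affected kfp version effectively produced for the localhost endpoint (the exact mangling inside kfp 1.0.0 is my assumption; the urllib3 behavior is standard):

# Reproduces the final frames above: urllib3 raises LocationValueError
# ("No host specified.") when the URL it is handed has an empty host.
from urllib3 import PoolManager
from urllib3.exceptions import LocationValueError

try:
    PoolManager().request("POST", "http:///pipeline")  # note: no host in the URL
except LocationValueError as err:
    print(err)  # No host specified.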