I'm trying to run a multi-node job using PyTorch's DistributedDataParallel on a university supercomputer that I log in to via ssh on port 22. Following this tutorial, when I set MASTER_PORT=12340 (or some other number) in the SLURM script, I get no response, presumably because nothing is listening on that port. If I set MASTER_PORT=22, I get "permission denied" when the code reaches the dist.init_process_group() call:
dist.init_process_group(backend=opt.dist_backend, init_method=opt.dist_url,
world_size=opt.world_size, rank=opt.rank)
gives me:
Traceback (most recent call last):
File "train_dist.py", line 262, in <module>
main()
File "train_dist.py", line 220, in main
world_size=opt.world_size, rank=opt.rank)
File "/home/miniconda3/envs/vit/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 595, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/home/miniconda3/envs/vit/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 232, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "/home/miniconda3/envs/vit/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 161, in _create_c10d_store
hostname, port, world_size, start_daemon, timeout, multi_tenant=True
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:22 (errno: 13 - Permission denied). The server socket has failed to bind to 0.0.0.0:22 (errno: 13 - Permission denied).
I have also tried to reroute the port 22 traffic to another port (e.g. 65000), but I get permission denied even for attempting the rerouting. I'm not sure what else to try at this point. Does anyone have any suggestions, or is this something I need to ask the administrator about?
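The bind error above is consistent with how the port is used: MASTER_PORT is the port on which the rank 0 process itself opens a listening socket for the rendezvous, and it is unrelated to the ssh port. On Linux, binding to a port below 1024 normally requires root, which matches the errno 13 on port 22. A minimal sketch, assuming an unprivileged port is acceptable on the cluster: ask the OS for a free port on the node that hosts rank 0 and export it as MASTER_PORT for every rank. The helper name find_free_port is hypothetical and not part of PyTorch.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the kernel to pick any free unprivileged port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    # Run on the node that will host rank 0, then export the printed
    # value as MASTER_PORT for all ranks in the SLURM script.
    print(find_free_port())
```

The port is only known to be free at the moment of the check, so another process could take it before training starts. All ranks need the same MASTER_PORT, and MASTER_ADDR should name the node where rank 0 runs.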
I have set up LibreTranslate on my local system (Ubuntu Focal Fossa) by following the steps described at https://github.com/LibreTranslate/LibreTranslate and scaled the app with gunicorn and nginx as described in the same tutorial. I run LibreTranslate as an Ubuntu service unit. Below is the ExecStart command from my service file.
ExecStart=/home/support/LibreTranslate/env/bin/gunicorn --workers 3 --log-level 'error' --error-logfile /home/support/LibreTranslate/Logs/gunicorn_nohup.log --bind unix:libretranslate.sock -m 007 wsgi:app
I started gunicorn with 3 workers. However, after running for some time, it started to return 500 Internal Server Error. Below is the log generated by gunicorn:
[2022-05-10 13:44:03 +0100] [306482] [ERROR] Error handling request /detect
Traceback (most recent call last):
File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
self.handle_request(listener, req, client, addr)
File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 179, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/home/support/LibreTranslate/wsgi.py", line 14, in app
instance = main()
File "/home/support/LibreTranslate/app/main.py", line 121, in main
app = create_app(args)
File "/home/support/LibreTranslate/app/app.py", line 113, in create_app
remove_translated_files.setup(get_upload_dir())
File "/home/support/LibreTranslate/app/remove_translated_files.py", line 23, in setup
scheduler.start()
File "/home/support/LibreTranslate/env/lib/python3.8/site-packages/apscheduler/schedulers/background.py", line 38, in start
self._thread.start()
File "/usr/lib/python3.8/threading.py", line 852, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Does anyone know why this is happening? And is there any other way to achieve the same thing without running into this issue?
I have raised an issue on the LibreTranslate community forum. Here is the link: https://community.libretranslate.com/t/python-library-of-libretranslate-run-with-gunicorn-and-nginx-not-freeing-up-threads/221
and the link to the GH issue: https://github.com/argosopentech/LibreTranslate-init/issues/10
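The traceback suggests that the WSGI callable in wsgi.py calls main() on every request, and that each call starts a new APScheduler background thread, so threads may pile up until the process can no longer create one. This is an inference from the traceback, not a confirmed diagnosis. A minimal sketch of one way to avoid that pattern: build the app once per worker process and reuse it. make_cached_app and factory are hypothetical names; in wsgi.py the factory would be whatever main() currently does.

```python
import threading


def make_cached_app(factory):
    """Wrap a WSGI app factory so it runs at most once per process."""
    lock = threading.Lock()
    cache = {}

    def wsgi_app(environ, start_response):
        # Double-checked locking: only the first request builds the app.
        if "app" not in cache:
            with lock:
                if "app" not in cache:
                    cache["app"] = factory()
        return cache["app"](environ, start_response)

    return wsgi_app
```

With this wrapper, the scheduler behind the factory would be started once per gunicorn worker instead of once per request.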
I have been trying to get a connection between TwinCAT 3 on Windows and Python on Ubuntu. I already have the connection between TwinCAT 3 on Windows and Python on Windows working, but not to Ubuntu. I have a virtual machine set up through Oracle VM VirtualBox. I have tried many things but so far have had no success in creating the connection.
I use a bridged network adapter and tried to open the port for the virtual machine's IP address in Linux through sudo ufw allow
I have the following code:
pyads.open_port()
pyads.add_route('10.11.104.206.1.1','127.0.0.1')
pyads.close_port()
plc = pyads.Connection('10.11.104.206.1.1', 851)
plc.open()
try:
# try to connect to PLC
plc.read_state()
print('Connection succeeded')
except Exception:
print('Connection failed')
And this is the error I get:
2020-11-22T22:45:46+0100 Error: Connect TCP socket failed with: 111
Traceback (most recent call last):
File "/home/laurence/ws_moveit/devel/lib/moveit_tutorials/move_panda_LKO.py", line 15, in <module>
exec(compile(fh.read(), python_script, 'exec'), context)
File "/home/laurence/ws_moveit/src/moveit_tutorials/doc/move_panda_LKO/scripts/move_panda_LKO.py", line 64, in <module>
pyads.add_route('10.11.104.206.1.1','127.0.0.1')
File "/usr/local/lib/python3.8/dist-packages/pyads/ads.py", line 188, in add_route
return adsAddRoute(adr.netIdStruct(), ip_address)
File "/usr/local/lib/python3.8/dist-packages/pyads/pyads_ex.py", line 155, in wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pyads/pyads_ex.py", line 177, in adsAddRoute
raise ADSError(error_code)
pyads.pyads_ex.ADSError: ADSError: target port not found ADS Server not started (6).
These are the NetIds/IP addresses:
CX-52EE70      169.254.64.202   5.82.238.112.1.1    TCP_IP
LEENLAPTOP19   127.0.0.1        10.11.104.206.1.1   TCP_IP
I have tried combinations with other NetIds/IP addresses, so sometimes I get other errors (110, 113), but usually 111, which means connection refused. I do not know what I am doing wrong. Any ideas?
Please make sure the PLC runtime is running (i.e. a PLC program is running) when you connect. If the PLC is in config or exception mode, the PLC runtime ADS port (851, or 801 for TC2) is not present. That is what ADS error 6, "target port not found", is trying to tell us.
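Error 111 (connection refused) in the question is reported at the TCP level, separately from the ADS error, so it can help to check first that the PLC's IP address accepts TCP connections on the ADS port at all. A minimal sketch using only the standard library, assuming the standard ADS TCP port 48898; can_connect is a hypothetical helper and not part of pyads.

```python
import socket

ADS_TCP_PORT = 48898  # standard ADS/AMS TCP port (0xBF02)


def can_connect(host: str, port: int = ADS_TCP_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (111), timeouts (110)
        # and no route to host (113).
        return False

# Example call with the PLC address listed in the question:
# can_connect("169.254.64.202")
```

If this returns False for the PLC's address, the problem is more likely in the network path (firewall, VM bridge, wrong IP address) than in the PLC program.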
I am getting this when I run make infra to start localstack:
Starting local dev environment. CTRL-C to quit.
cannot import name 'dns_server' from 'localstack_ext.services' (unknown location)
Starting mock S3 (http port 4572)...
Starting mock SNS (http port 4575)...
2019-09-21T13:11:08:INFO:localstack.multiserver: Starting multi API server process on port 51492
Starting mock SQS (http port 4576)...
Starting mock DynamoDB (http port 4569)...
Starting mock Lambda service (http port 4574)...
Starting mock CloudWatch Logs (http port 4586)...
Ready.
But when I open http://localhost:4569 it throws an error and doesn't show that it started.
Below are the errors I am getting:
2019-09-21T13:15:05:ERROR:localstack.services.generic_proxy: Error forwarding request: the JSON object must be str, bytes or bytearray, not NoneType Traceback (most recent call last):
File "/Users//workspaces/others/localstack/localstack/services/generic_proxy.py", line 240, in forward
path=path, data=data, headers=forward_headers)
File "/Users//workspaces/others/localstack/localstack/services/dynamodb/dynamodb_listener.py", line 35, in forward_request
data = json.loads(to_str(data))
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
Error 2:
2019-09-21T13:14:11:ERROR:localstack.services.generic_proxy: Error forwarding request:'QueueUrl' Traceback (most recent call last):
File "/Users//workspaces/others/localstack/localstack/services/generic_proxy.py", line 240, in forward
path=path, data=data, headers=forward_headers)
File "/Users//workspaces/others/localstack/localstack/services/sqs/sqs_listener.py", line 53, in forward_request
self._set_queue_attributes(req_data)
File "/Users/*****/workspaces/others/localstack/localstack/services/sqs/sqs_listener.py", line 245, in _set_queue_attributes
queue_url = req_data['QueueUrl'][0]
KeyError: 'QueueUrl'
Please help with this; it has been a blocker for me.
This is because I was hitting http://localhost:4569, which isn't a user interface, so it throws an error. But if I create tables using the AWS CLI and perform some operations, it works fine without any issue.
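The first traceback fits this explanation: a browser sends a GET with no body, and the DynamoDB listener then calls json.loads on None. The DynamoDB API instead expects a POST with a JSON body and an X-Amz-Target header naming the operation. A minimal sketch of what such a request looks like, built with the standard library only; build_list_tables_request is a hypothetical helper, and whether localstack accepts the request without the signed Authorization header that real AWS requires is an assumption that may depend on the version.

```python
import json
import urllib.request


def build_list_tables_request(endpoint: str = "http://localhost:4569/"):
    """Build (but do not send) a DynamoDB ListTables request."""
    body = json.dumps({}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        # Names the API version and operation, as the AWS SDKs and CLI do.
        "X-Amz-Target": "DynamoDB_20120810.ListTables",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_list_tables_request()
    print(req.get_method(), req.full_url, req.data)
    # To send it against a running localstack instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read())
```

In practice the AWS CLI or an SDK pointed at the local endpoint builds these requests for you, which is why table operations work while a plain browser visit fails.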
My script works very well when I use it at my office (Wi-Fi connection). But when I get home and use my personal Wi-Fi or any other Wi-Fi, it doesn't work...
Could you please help me?
Extracted from general.log:
ERROR [2018-08-11 21:31:18] [mehdidouache] Message: Can not connect to the Service /Users/Mehdi/Desktop/InstaPy/assets/chromedriver
Traceback (most recent call last):
File "/Users/Mehdi/Desktop/InstaPy/instapy/instapy.py", line 299, in set_selenium_local_session
chrome_options=chrome_options)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
self.service.start()
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 102, in start
raise WebDriverException("Can not connect to the Service %s" % self.path)
selenium.common.exceptions.WebDriverException: Message: Can not connect to the Service /Users/Mehdi/Desktop/InstaPy/assets/chromedriver
I have an app written in Python 3 that uses pyserial (python3-serial) to communicate with another processor via a serial cable, and I'm losing some responses. I have a tap on the line itself and have determined that the responses are indeed being placed on the wire.
So I wanted to sniff the port at a low Linux level to see if the responses are showing up there. My Python 3 app uses something like
port = serial.Serial(path, baudrate=115200, timeout=0)
where path is something like _port_TD200_, which is a symlink to /dev/ttyS3.
My general approach was to change that link so that it points at a virtual port, and then bridge the real port to the virtual port. So I used ln -sf to relink _port_TD200_ to /tmp/ttyV0 and then ran this:
sudo socat /dev/ttyS3,raw,echo=0 SYSTEM:'tee input.txt | socat - "PTY,link=/tmp/ttyV0,raw,echo=0,waitslave" | tee output.txt'
However, when I try to restart the service that runs the pyserial app... it chokes on the change:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/serial/serialposix.py", line 265, in open
self.fd = os.open(self.portstr, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
PermissionError: [Errno 13] Permission denied: '/opt/pilot/_port_TD200_'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/pilot/radio.py", line 19, in __init__
self.port = serial.Serial(path, baudrate=115200, timeout=0)
File "/usr/lib/python3/dist-packages/serial/serialutil.py", line 236, in __init__
self.open()
File "/usr/lib/python3/dist-packages/serial/serialposix.py", line 268, in open
raise SerialException(msg.errno, "could not open port {}: {}".format(self._port, msg))
serial.serialutil.SerialException: [Errno 13] could not open port /opt/pilot/_port_TD200_: [Errno 13] Permission denied: '/opt/pilot/_port_TD200_'
My guess is that the pyserial open machinery, which opens a file descriptor but ALSO does setserial-like things, is not happy because /tmp/ttyV0 is not enough of a real serial port. Is this correct? If so, is there a way to modify the socat incantation to make it possible?
Or is it something else? I admit I don't really understand the socat incantation. I pulled it from this stack exchange question...
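One thing the traceback does show is that errno 13 is raised by os.open itself, before pyserial applies any serial settings, so the failure may be a plain file-permission problem rather than the PTY not being "enough of a serial port". Since socat was started with sudo, the PTY it creates may be owned by root and not writable by the service's user; this is an assumption, not something the logs confirm. A minimal diagnostic sketch to run as the same user as the service; describe_port is a hypothetical helper and not part of pyserial.

```python
import os
import stat


def describe_port(path: str) -> dict:
    """Resolve a (possibly symlinked) device path and report who may open it."""
    target = os.path.realpath(path)
    info = os.stat(target)
    return {
        "target": target,
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        # True if the current user may open the target for read and write.
        "can_open_rw": os.access(target, os.R_OK | os.W_OK),
    }

# Example call with the symlink from the traceback:
# describe_port("/opt/pilot/_port_TD200_")
```

If can_open_rw comes back False, the socat man page documents user=, group= and mode= options for addresses such as PTY, which may be worth trying so that the virtual port is created with permissions the service can use.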