torch.distributed.barrier() added on all processes not working - pytorch

import torch
import os

torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
if local_rank > 0:
    torch.distributed.barrier()
print(f"Entered process {local_rank}")
if local_rank == 0:
    torch.distributed.barrier()
The above code hangs forever, but if I remove both torch.distributed.barrier() calls, both print statements execute. Am I missing something here?
On the command line I launch it with torchrun --nnodes=1 --nproc_per_node 2 test.py, where test.py is the name of the script.
I tried the code with and without the torch.distributed.barrier() calls:
With the barrier() statements, I expected the statement to print for one GPU and then exit -- it did not behave as expected.
Without the barrier() statements, I expected both to print -- it behaved as expected.

It is better to put your multiprocessing initialization code inside an if __name__ == "__main__": guard to avoid endless process generation, and to redesign the control flow to fit your purpose:
if __name__ == "__main__":
    import torch
    import os

    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    if local_rank > 0:
        torch.distributed.barrier()
    else:
        print(f"Entered process {local_rank}")
        torch.distributed.barrier()
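One more thing worth checking (an assumption on my part, not part of the answer above): with the NCCL backend, each rank should be pinned to its own GPU before the first collective call, otherwise both ranks can end up on GPU 0 and the barrier deadlocks. A minimal sketch:

import os
import torch

if __name__ == "__main__":
    local_rank = int(os.environ["LOCAL_RANK"])
    # Pin this rank to its own GPU *before* the first NCCL collective,
    # so the two ranks don't both map onto GPU 0 and deadlock.
    torch.cuda.set_device(local_rank)
    torch.distributed.init_process_group(backend="nccl")
    if local_rank > 0:
        torch.distributed.barrier()  # non-zero ranks wait here
    else:
        print(f"Entered process {local_rank}")
        torch.distributed.barrier()  # rank 0 releases the waiting ranks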

Related

Why can't I see terminal input (stdout) in Linux after executing this Python3 script?

I wrote a Python 3 script (shown below; repo here: https://gitlab.com/papiris/gcode-motor-stutter-generator).
After I execute it on Linux (Raspberry Pi OS bullseye 32-bit) and either exit with Ctrl+C or let it finish, I can't see what I type in that terminal tab anymore. The terminal (KDE Konsole) still responds to commands; the text just isn't visible. I can open a new terminal tab and keep working, but the tabs I ran this script in never show my typed input again.
Why is this, and how can I fix it?
I tried searching for this topic, but couldn't find anything similar.
#!/usr/bin/env python3
import sys
from sys import stdin
from curtsies import Input
from threading import Thread
from queue import Queue, Empty

### non-blocking read of stdin
def enqueue_input(stdin, queue):
    try:
        with Input(keynames='curses') as input_generator:
            for _input in iter(input_generator):
                queue.put(_input)
    except KeyboardInterrupt:
        sys.exit(1)

q = Queue()
t = Thread(target=enqueue_input, args=(stdin, q))
t.daemon = True  # thread dies with the program
t.start()

def main():
    while True:
        try:
            input_key = q.get(timeout=2)
        except Empty:
            print('printing continuously')
        else:
            if input_key == 'n':
                print('extrusion loop stopped, moving on')
                break

if __name__ == "__main__":
    main()
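No fix is shown in this thread, but a plausible cause (my assumption, not confirmed here) is that curtsies' Input context manager switches the tty out of echo mode, and because the reader runs in a daemon thread, the context manager never exits cleanly when the program ends, leaving echo disabled. A minimal defensive sketch, assuming a POSIX terminal, is to save and restore the tty state around main():

import sys
import termios

if __name__ == "__main__":
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)  # snapshot tty settings before raw mode starts
    try:
        main()
    finally:
        # Restore echo and friends even on Ctrl+C or normal exit
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)

(Typing stty sane in the affected tab should also bring the echo back.)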

Fail fast with MPI4PY

I'd like the following behavior when running an MPI script with mpi4py: when any process throws an exception, mpirun (and its spawned processes) should immediately exit with non-zero error codes. Instead, I find that execution continues even if one or more processes throw an exception.
I am using mpi4py 3.0.0 with OpenMPI 2.1.2. I'm running this script with
mpirun --verbose -mca orte_abort_on_non_zero_status 1 -n 4 python my_script.py. I expected this to immediately end before the sleep is hit, but instead, processes with ranks != 0 sleep:
import time
import mpi4py

def main():
    import mpi4py.MPI
    mpi_comm = mpi4py.MPI.COMM_WORLD
    if mpi_comm.rank == 0:
        raise ValueError('Failure')
    print('{} continuing to execute'.format(mpi_comm.rank))
    time.sleep(10)
    print('{} exiting'.format(mpi_comm.rank))

if __name__ == '__main__':
    main()
How can I get the behavior I'd like (fail quickly if any process fails)?
Thank you!
It seems to be a known issue with mpi4py. From https://groups.google.com/forum/#!topic/mpi4py/RovYzJ8qkbc, I read:
mpi4py initializes/finalizes MPI for you. The initialization occurs at import time, and the finalization when the Python process is about to finalize (I'm using the Py_AtExit() C-API call to do this). As MPI_Finalize() is collective and likely blocking in most MPI impls, you get the deadlock.
A solution is to override sys.excepthook and explicitly call MPI.COMM_WORLD.Abort in it.
Here is your code modified:
import sys
import time
import mpi4py.MPI

mpi_comm = mpi4py.MPI.COMM_WORLD

def mpiabort_excepthook(type, value, traceback):
    mpi_comm.Abort()
    sys.__excepthook__(type, value, traceback)

def main():
    if mpi_comm.rank == 0:
        raise ValueError('Failure')
    print('{} continuing to execute'.format(mpi_comm.rank))
    time.sleep(10)
    print('{} exiting'.format(mpi_comm.rank))

if __name__ == "__main__":
    sys.excepthook = mpiabort_excepthook
    main()
    sys.excepthook = sys.__excepthook__
It turns out mpi4py can be run as a module, which fixes this issue (internally it calls Abort(), as jcgiret says):
mpirun --verbose -mca orte_abort_on_non_zero_status 1 -n 4 python -m mpi4py my_script.py
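If you'd rather not rely on the -m mpi4py runner, the same effect can be approximated by hand; this is just a sketch of the idea, equivalent in spirit to the excepthook answer above:

import sys
import traceback
import mpi4py.MPI

def main():
    ...

if __name__ == '__main__':
    try:
        main()
    except Exception:
        traceback.print_exc()           # show the error on this rank first
        mpi4py.MPI.COMM_WORLD.Abort(1)  # tear down every rank at once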

How to run the python file in Spark

I have Python code in a file and want to know how to run it. I am using Ubuntu. In my code, I fetch JSON from a URL and need to show it as a scatter graph using Spark. I am new to PySpark; please guide me on how to achieve this. Please find my code below:
import multiprocessing
import time
import json
from sseclient import SSEClient as EventSource

# Complete the function body as you need; I just placed your code inside.
# I don't have the package installed, so please try it yourself.
def func(n):
    url = 'https://stream.wikimedia.org/v2/stream/recentchange'
    with open('w.txt', 'w', encoding='utf8') as file:
        for event in EventSource(url):
            if event.event == 'message':
                try:
                    change = json.loads(event.data)
                except ValueError:
                    pass
                else:
                    # write one JSON object per line
                    file.write(str(event.data) + '\n')

if __name__ == '__main__':
    # Start your function as a separate process
    p = multiprocessing.Process(target=func, name="func", args=(10,))
    p.start()
    # Wait 3 seconds (give your time in secs) for func
    time.sleep(3)
    # Terminate func
    p.terminate()
    # Cleanup
    p.join()
You have to use the spark-submit command to run your Python script with Spark (from a command-line terminal):
spark-submit /home/sample.py
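For the scatter-graph part, a minimal sketch of loading the logged events into Spark and plotting might look like this (assumptions on my part: the script above wrote one JSON object per line to w.txt, matplotlib is available on the driver, and x_field / y_field are hypothetical placeholders for two numeric fields that actually exist in your events):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wikimedia-scatter").getOrCreate()

# spark.read.json expects one JSON object per line
df = spark.read.json("w.txt")

# 'x_field' and 'y_field' are hypothetical column names -- replace them
# with numeric fields that exist in your events
pdf = df.select("x_field", "y_field").dropna().toPandas()

import matplotlib
matplotlib.use("Agg")  # headless backend; writes to a file, no display needed
import matplotlib.pyplot as plt

plt.scatter(pdf["x_field"], pdf["y_field"])
plt.savefig("scatter.png")

Save it as, say, scatter.py and run it with spark-submit scatter.py as above.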

psutil hangs with a psutil.NoSuchProcess process no longer exists

I've been messing with this but keep getting an error; I'm assuming the PID is changing right before I ask for the cpu_percent.
Here's my little test program. It basically opens a file, waits for the program to finish loading, closes the program, and repeats. After a few loads I'll get "psutil.NoSuchProcess: process no longer exists (pid=10144)". Any guidance on this would be great.
import psutil
import time
import os

def Monitor():
    # Poll until MSACCESS.EXE stops consuming CPU
    busy = True
    while busy:
        for proc in psutil.process_iter():
            if proc.name() == 'MSACCESS.EXE':
                a = proc.cpu_percent(interval=1)
                time.sleep(10)
                print(a)
                busy = a > 0
    print("Finished")

while True:
    os.startfile('\\\\revvedupoffice\\Revved Up\\Collective 10.0.accdb')
    Monitor()
    os.system("TASKKILL /F /IM MSACCESS.EXE")
    time.sleep(10)
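No answer is shown here, but the usual remedy (based on psutil's documented behavior, not on this thread): the process can exit between process_iter() and the cpu_percent() call, so per-process queries should be wrapped in try/except. A minimal sketch, assuming psutil 5.3+ for the attrs argument:

import psutil

def access_cpu_percent():
    """Return MSACCESS.EXE CPU usage, or None if it is not running."""
    for proc in psutil.process_iter(['name']):
        try:
            if proc.info['name'] == 'MSACCESS.EXE':
                return proc.cpu_percent(interval=1)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # the process vanished (or is protected) mid-query
    return None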

Why no call_at_threadsafe and call_later_threadsafe?

I'm using Python 3.5.2 on 32-bit Windows and I am aware that asyncio's call_at is not thread-safe, hence the following code won't print 'bomb' unless I uncomment the loop._write_to_self() line.
import asyncio
import threading

def bomb(loop):
    loop.call_later(1, print, 'bomb')
    print('submitted')
    # loop._write_to_self()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    threading.Timer(2, bomb, args=(loop,)).start()
    loop.run_forever()
However, I couldn't find any information about why call_at_threadsafe and call_later_threadsafe are not implemented. Does a reason exist?
Simply use loop.call_soon_threadsafe to schedule loop.call_later:
loop.call_soon_threadsafe(loop.call_later, 1, print, 'bomb')
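Putting the workaround into the original repro, a sketch (same Python 3.5 API as the question):

import asyncio
import threading

def bomb(loop):
    # call_soon_threadsafe both schedules the callback and wakes the
    # loop, so call_later then runs safely on the loop's own thread.
    loop.call_soon_threadsafe(loop.call_later, 1, print, 'bomb')
    print('submitted')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    threading.Timer(2, bomb, args=(loop,)).start()
    loop.run_forever()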
