subprocess called from multiprocess not finishing - python-3.x

I've got a set of videos from which I'm trying to pull random frames. There are a lot of videos, so I want to work in parallel, and ffmpeg can splice out the frames for me, so here's the important part of the code:
import os
from tqdm import tqdm
from joblib import Parallel, delayed
from multiprocessing import current_process
from subprocess import Popen
vids_dir = 'just a string'
out_dir = 'another string'
def process_each(f):
temp = 'temp' + str(current_process()._identity[0])
os.mkdir(temp)
proc = Popen(['ffmpeg -i ' + vids_dir + '/' + f + ' ' + temp + '/' + f[:-4] + '_%03d.jpg &> /dev/null'], shell=True) # convert to frames
proc.wait()
# do stuff
os.system('rm -rf ' + temp) # clean up
Parallel(n_jobs=10)(delayed(process_each)(f) for f in tqdm(os.listdir(vids_dir)))
I can print out the command being passed to Popen and execute it in a shell, and it works. I can open a python3 session and call the command from Popen or subprocess.call or even os.system, and it works. I can even set n_jobs=1 in my Parallel, and it works.
But the moment I actually parallelize this, I find ffmpeg doesn't flush its full results to the temporary folders; it only gets the first one or few frames.
What on earth could be going on? subprocess and multiprocessing should be able to mix this way.

Related

How to implement Multiprocessing in Azure Databricks - Python

I need to get details of each file from a directory. It is taking longer time. I need to implement Multiprocessing so that it's execution can be completed early.
My code is like this:
from pathlib import Path
from os.path import getmtime, getsize
from multiprocessing import Pool, Process
def iterate_directories(root_dir):
for child in Path(root_dir).iterdir():
if child.is_file():
modified_time = datetime.fromtimestamp(getmtime(file)).date()
file_size = getsize(file)
# further steps...
else:
iterate_directories(child) ## I need this to run on separate Process (in Parallel)
I tried to do recursive call using below, but it is not working. It comes out of loop immediately.
else:
p = Process(target=iterate_directories, args=(child))
Pros.append(p) # declared Pros as empty list.
p.start()
for p in Pros:
if not p.is_alive():
p.join()
What am I missing here? How can I run for sub-directories in parallel.
You have to get the directories list first and then you have to use multiprocessing pool to call the function.
something like below.
from pathlib import Path
from os.path import getmtime, getsize
from multiprocessing import Pool, Process
Filedetails = ''
def iterate_directories(root_dir):
for child in Path(root_dir).iterdir():
if child.is_file():
modified_time = datetime.fromtimestamp(getmtime(file)).date()
file_size = getsize(file)
Filedetails = Filedetails + '\n' + '{add file name details}' + modified_time + file_size
else:
iterate_directories(child) ## I need this to run on separate Process (in Parallel)
return Filesdetails #file return from that particular directory
pool = multiprocessing.Pool(processes={define how many processes you like to run in parallel})
results = pool.map(iterate_directories, {explicit directory list })
print(results) #entire collection will be printed here. it basically a list you can iterate individual directory level
.
pls let me know, how it goes.
The problem is this line:
if not p.is_alive():
What this translates to is that if the process is already complete, only then wait for it to complete, which obviously does not make much sense (you need to remove the not from the statement). Also, it is completely unnecessary as well. Calling .join does the same thing internally that p.is_alive does (except one blocks). So you can safely just do this:
for p in Pros:
p.join()
The code will then wait for all child processes to finish.

How to run Python subprocess consistently and in parallel?

I wrote a python script that converts mp3 to wav with mpg123. Then ffmpeg takes the output of mpg123 and upsample it. Finally, ffmpeg output file uploads to cloud. All these steps must run consistently.
subprocess.run(['mpg123', ...])
subprocess.run(['ffmpeg', ...])
upload()
Suppose, I have many mp3 files and I'd like to run 10 threads at the same time. I know that Python offers threading in subprocess.Popen, threading, concurrent.futures and multiprocessing modules. What is the right way to parallelize this process?
You could use the MPipe library:
from mpipe import OrderedStage, Pipeline
def increment(value):
return value + 1
def double(value):
return value * 2
stage1 = OrderedStage(increment, 3)
stage2 = OrderedStage(double, 3)
pipe = Pipeline(stage1.link(stage2))
for number in range(10):
pipe.put(number)
pipe.put(None)
for result in pipe.results():
print(result)

how can restrict processes in parallel

I have this script, but I only want to start 3 perl processes at a time. Once these 3 are done, the script should start the next three.
at the moment all processes are started in parallel
unfortunately I don't know what to do. can someone help me?
my script:
import json, os
import subprocess
from subprocess import Popen, PIPE
list = open('list.txt', 'r')
procs = []
for dirs in list:
args = ['perl', 'test.pl', '-a', dirs]
proc = subprocess.Popen(args)
procs.append(proc)
for proc in procs:
proc.wait()
list.txt :
dir1
dir2
dir3
dir4
dir5
dir6
dir7
dir8
dir9
dir10
dir11
test.pl
$com=$ARGV[0];
$dirs=$ARGV[1];
print "$com $dirs";
sleep(5);
Use Python's concurrent.futures module - it has the figure of a 'Process Pool' that will automatically keep only that many worker process, and start new tasks as the older ones are completed.
As target function, put a simple Python function to open your external process, and wait synchronously for the result - a function with the lines currently inside your for loop.
Using concurrent.futures, your code might look like this:
import json, os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from subprocess import Popen, PIPE
mylist = open('list.txt', 'r')
def worker(dirs):
args = ['perl', 'test.pl', '-a']
proc = subprocess.run(args + [dirs])
executor = ThreadPoolExecutor(3) # 3 is: max-workers.
# ProcessPoolExecutor could be an option, but you don't need
# it - the `perl` process will run in other process anyway.
procs = []
for dirs in mylist:
proc = executor.submit(worker, dirs)
procs.append(proc)
for proc in as_completed(procs):
try:
result = proc.result()
except Exception as exc:
# handle any error that may have been raised in the worker
pass

Running a function or BASH command after exiting npyscreen

I'm trying to run a function after entering in npyscreen, tried a few things and am still stuck. Just exits npyscreen and returns to a bash screen. This function is supposed to start a watchdog/rsync watch-folder waiting for files to backup.
#!/usr/bin/env python
# encoding: utf-8
import npyscreen as np
from nextscript import want_to_run_this_function
class WuTangClan(np.NPSAppManaged):
def onStart(self):
self.addForm('MAIN', FormMc, name="36 Chambers")
class FormMc(np.ActionFormExpandedV2):
def create(self):
self.rza_gfk = self.add(np.TitleSelectOne, max_height=4, name="Better MC:", value=[0], values=["RZA", "GhostFace Killah"], scroll_exit=True)
def after_editing(self):
if self.rza_gfk.value == [0]:
want_to_run_this_function()
self.parentApp.setNextForm(None)
else:
self.parentApp.setNextForm(None)
if __name__ == "__main__":
App = WuTangClan()
App.run()
I not sure if i understood correctly what you want.
For executing any kind of bash command i like to use subprocess module, he has the Popen constructor, which you can use to run anything from a bash.
e.g, on windows
import subprocess
process = subprocess.Popen(['ipconfig','/all'])
On unix like system:
import subprocess
process = subprocess.Popen(['ip','a'])
If you have a ".py" file you can pass the parameters like if you where running it from the terminal
e.g
import subprocess
process = subprocess.Popen(['python3','sleeper.py'])
You can even retrieve the process pid and kill it whenever you want, you can look at subprocess module documentation here

Print in real time the result of a bash command launched with subprocess in Python

I'm using the subprocess module to run a bash command. I want to display the result in real time, including when there's no new line added, but the output is still modified.
I'm using python 3. My code is running with subprocess, but I'm open to any other module. I have some code that return a generator for every new line added.
import subprocess
import shlex
def run(command):
process = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
while True:
line = process.stdout.readline().rstrip()
if not line:
break
yield line.decode('utf-8')
cmd = 'ls -al'
for l in run(cmd):
print(l)
The problem comes with commands of the form rsync -P file.txt file2.txt for example, which shows a progress bar.
For example, we can start by creating a big file in bash:
base64 /dev/urandom | head -c 1000000000 > file.txt
Then try to use python to display the rsync command:
cmd = 'rsync -P file.txt file2.txt'
for l in run(cmd):
print(l)
With this code, the progress bar is only printed at the end of the process, but I want to print the progress in real time.
From this answer you can disable buffering when print in python:
You can skip buffering for a whole python process using "python -u"
(or #!/usr/bin/env python -u etc) or by setting the environment
variable PYTHONUNBUFFERED.
You could also replace sys.stdout with some other stream like wrapper
which does a flush after every call.
Something like this (not really tested) might work...but there are
probably problems that could pop up. For instance, I don't think it
will work in IDLE, since sys.stdout is already replaced with some
funny object there which doesn't like to be flushed. (This could be
considered a bug in IDLE though.)
>>> class Unbuffered:
.. def __init__(self, stream):
.. self.stream = stream
.. def write(self, data):
.. self.stream.write(data)
.. self.stream.flush()
.. def __getattr__(self, attr):
.. return getattr(self.stream, attr)
..
>>> import sys
>>> sys.stdout=Unbuffered(sys.stdout)
>>> print 'Hello'
Hello

Resources