How can I restrict the number of processes running in parallel? - python-3.x

I have this script, but I only want to start 3 perl processes at a time. Once these 3 are done, the script should start the next three.
At the moment all processes are started in parallel.
Unfortunately I don't know what to do. Can someone help me?
My script:
import json, os
import subprocess
from subprocess import Popen, PIPE
list = open('list.txt', 'r')
procs = []
for dirs in list:
    args = ['perl', 'test.pl', '-a', dirs]
    proc = subprocess.Popen(args)
    procs.append(proc)
for proc in procs:
    proc.wait()
list.txt :
dir1
dir2
dir3
dir4
dir5
dir6
dir7
dir8
dir9
dir10
dir11
test.pl :
$com=$ARGV[0];
$dirs=$ARGV[1];
print "$com $dirs";
sleep(5);

Use Python's concurrent.futures module. It provides pools (a thread pool or a process pool) that keep at most a fixed number of workers running and start a new task as soon as an earlier one completes.
As the target function, use a simple Python function that starts your external process and waits synchronously for it to finish, i.e. a function containing the lines currently inside your for loop.
Using concurrent.futures, your code might look like this:
import json, os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from subprocess import Popen, PIPE

def worker(dirs):
    # Runs one perl process and blocks until it exits.
    args = ['perl', 'test.pl', '-a']
    return subprocess.run(args + [dirs])

executor = ThreadPoolExecutor(3)  # 3 is: max-workers.
# ProcessPoolExecutor could be an option, but you don't need
# it - the `perl` process will run in other process anyway.
procs = []
with open('list.txt', 'r') as mylist:
    for dirs in mylist:
        # strip the trailing newline before passing the name to perl
        proc = executor.submit(worker, dirs.strip())
        procs.append(proc)
for proc in as_completed(procs):
    try:
        result = proc.result()
    except Exception as exc:
        # handle any error that may have been raised in the worker
        pass
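The executor above refills a slot as soon as any single task finishes. If the literal behaviour from the question is wanted instead (start three, wait for all three, then start the next three), a minimal sketch could look like the following. The name `run_in_batches` and the stand-in commands are illustrative assumptions, not part of the original script; the real command would be `['perl', 'test.pl', '-a', directory]` for each line of list.txt.

```python
import subprocess
import sys


def run_in_batches(commands, batch_size=3):
    """Run commands batch_size at a time; wait for a whole batch before the next."""
    return_codes = []
    for start in range(0, len(commands), batch_size):
        batch = commands[start:start + batch_size]
        procs = [subprocess.Popen(cmd) for cmd in batch]    # start the whole batch
        return_codes.extend(proc.wait() for proc in procs)  # block until all have exited
    return return_codes


# Stand-in commands so the sketch runs without perl installed.
demo_commands = [[sys.executable, "-c", "pass"] for _ in range(7)]
demo_codes = run_in_batches(demo_commands, batch_size=3)
```

Strict batches are simpler to reason about, but a slow process holds up the start of the next batch, which is why the pool-based version usually finishes sooner.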

Related

v2ray (vmess|vless|trojan|ss) test in python

Friends, how can I test (vmess|vless|trojan|ss) configurations with Python?
I need a function that tests the speed of given v2ray configs.
There is a project, vmessping by v2fly, that supports only vmess, and there is LiteSpeedTest for trojan/ss.
Sample:
from subprocess import Popen, PIPE
def speedtest(vmesslink):
    process = Popen(["./vmessspeed", vmesslink], stdout=PIPE)
    stdout = process.communicate()[0]
    return stdout

How to implement Multiprocessing in Azure Databricks - Python

I need to get the details of each file in a directory. It is taking a long time. I need to implement multiprocessing so that its execution completes sooner.
My code is like this:
from datetime import datetime
from pathlib import Path
from os.path import getmtime, getsize
from multiprocessing import Pool, Process
def iterate_directories(root_dir):
    for child in Path(root_dir).iterdir():
        if child.is_file():
            modified_time = datetime.fromtimestamp(getmtime(child)).date()
            file_size = getsize(child)
            # further steps...
        else:
            iterate_directories(child) ## I need this to run on separate Process (in Parallel)
I tried to make the recursive call as shown below, but it is not working. It comes out of the loop immediately.
        else:
            p = Process(target=iterate_directories, args=(child))
            Pros.append(p) # declared Pros as empty list.
            p.start()
    for p in Pros:
        if not p.is_alive():
            p.join()
What am I missing here? How can I process the sub-directories in parallel?
You have to get the list of directories first and then use a multiprocessing pool to call the function.
Something like below:
from datetime import datetime
from multiprocessing import Pool
from os.path import getmtime, getsize
from pathlib import Path

def iterate_directories(root_dir):
    # Collects one line of details per file under root_dir (sub-directories are walked serially).
    file_details = []
    for child in Path(root_dir).iterdir():
        if child.is_file():
            modified_time = datetime.fromtimestamp(getmtime(child)).date()
            file_size = getsize(child)
            file_details.append('{} {} {}'.format(child, modified_time, file_size))
        else:
            file_details.extend(iterate_directories(child))
    return file_details  # details for that particular directory

if __name__ == '__main__':
    directories = []  # placeholder: fill in the explicit directory list
    with Pool(processes=4) as pool:  # 4 is a placeholder: how many processes to run in parallel
        results = pool.map(iterate_directories, directories)
    print(results)  # a list with one entry per directory, which you can iterate
Please let me know how it goes.
The problem is this line:
if not p.is_alive():
This says: only if the process has already finished, wait for it to finish, which obviously does not make much sense (you would need to remove the not from the statement). The check is also unnecessary: .join() already waits until the process has exited and returns immediately if it has finished, so testing p.is_alive() first adds nothing. So you can safely just do this:
for p in Pros:
    p.join()
The code will then wait for all child processes to finish.

subprocess called from multiprocess not finishing

I've got a set of videos from which I'm trying to pull random frames. There are a lot of videos, so I want to work in parallel, and ffmpeg can splice out the frames for me, so here's the important part of the code:
import os
from tqdm import tqdm
from joblib import Parallel, delayed
from multiprocessing import current_process
from subprocess import Popen
vids_dir = 'just a string'
out_dir = 'another string'
def process_each(f):
    temp = 'temp' + str(current_process()._identity[0])
    os.mkdir(temp)
    proc = Popen(['ffmpeg -i ' + vids_dir + '/' + f + ' ' + temp + '/' + f[:-4] + '_%03d.jpg &> /dev/null'], shell=True) # convert to frames
    proc.wait()
    # do stuff
    os.system('rm -rf ' + temp) # clean up
Parallel(n_jobs=10)(delayed(process_each)(f) for f in tqdm(os.listdir(vids_dir)))
I can print out the command being passed to Popen and execute it in a shell, and it works. I can open a python3 session and call the command from Popen or subprocess.call or even os.system, and it works. I can even set n_jobs=1 in my Parallel, and it works.
But the moment I actually parallelize this, I find ffmpeg doesn't flush its full results to the temporary folders; it only gets the first one or few frames.
What on earth could be going on? subprocess and multiprocessing should be able to mix this way.

Running a function or BASH command after exiting npyscreen

I'm trying to run a function after exiting npyscreen. I have tried a few things and am still stuck: it just exits npyscreen and returns to a bash prompt. The function is supposed to start a watchdog/rsync watch folder that waits for files to back up.
#!/usr/bin/env python
# encoding: utf-8
import npyscreen as np
from nextscript import want_to_run_this_function
class WuTangClan(np.NPSAppManaged):
    def onStart(self):
        self.addForm('MAIN', FormMc, name="36 Chambers")
class FormMc(np.ActionFormExpandedV2):
    def create(self):
        self.rza_gfk = self.add(np.TitleSelectOne, max_height=4, name="Better MC:", value=[0], values=["RZA", "GhostFace Killah"], scroll_exit=True)
    def after_editing(self):
        if self.rza_gfk.value == [0]:
            want_to_run_this_function()
            self.parentApp.setNextForm(None)
        else:
            self.parentApp.setNextForm(None)
if __name__ == "__main__":
    App = WuTangClan()
    App.run()
I'm not sure I understood correctly what you want.
For executing any kind of bash command I like to use the subprocess module. It has the Popen constructor, which you can use to run anything you would run from bash.
E.g., on Windows:
import subprocess
process = subprocess.Popen(['ipconfig','/all'])
On Unix-like systems:
import subprocess
process = subprocess.Popen(['ip','a'])
If you have a ".py" file, you can pass the parameters as if you were running it from the terminal,
e.g.:
import subprocess
process = subprocess.Popen(['python3','sleeper.py'])
You can even retrieve the process pid and kill it whenever you want; see the subprocess module documentation.
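As a rough illustration of retrieving the pid and stopping the process, here is a small sketch. The child command is a stand-in (an inline Python sleep instead of a real script such as sleeper.py), so it is an assumption made only to keep the example self-contained.

```python
import subprocess
import sys

# Stand-in for a long-running script; sleeps for a minute unless stopped.
process = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

pid = process.pid                       # operating-system process id of the child
still_running = process.poll() is None  # poll() returns None while the child is alive

process.terminate()                     # ask the child to stop (SIGTERM on Unix)
exit_code = process.wait()              # reap the child and collect its exit status
```

`process.kill()` can be used instead of `terminate()` when the child ignores the polite request.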

Can't capture stdout with unittest

I have a python3.7 script, which takes a YAML file as input and processes it depending on the instructions within. The YAML file I am using for unit testing looks like this:
...
tasks:
- echo '1'
- echo '2'
- echo '3'
- echo '4'
- echo '5'
The script loops over the tasks and runs each one using an os.system() call.
Manual testing indicates that the output is as expected:
1
2
3
4
5
But I can't make it work in my unit test. Here's how I'm trying to capture the output:
from application import application
from io import StringIO
import unittest
from unittest.mock import patch
class TestApplication(unittest.TestCase):
    def test_application_tasks(self):
        expected = ['1','2','3','4','5']
        with patch('sys.stdout', new=StringIO()) as fakeOutput:
            application.parse_event('some event') # print() is called here within parse_event()
            self.assertEqual(fakeOutput.getvalue().strip().split(), expected)
When running python3 -m unittest discover -s tests, all I get is AssertionError: Lists differ: [] != ['1', '2', '3', '4', '5'].
I also tried using with patch('sys.stdout', new_callable=StringIO) as fakeOutput: instead, but to no avail.
Another thing I tried was self.assertEqual(fakeOutput.getvalue(), '1\n2\n3\n4\n5'), and here is what unittest outputs:
AssertionError: '' != '1\n2\n3\n4\n5'
+ 1
+ 2
+ 3
+ 4
+ 5
Obviously, the script works and outputs the right result, but fakeOutput does not capture it.
Using patch as a decorator does not work either:
from application import application
from io import StringIO
import unittest
from unittest.mock import patch
class TestApplication(unittest.TestCase):
    @patch('sys.stdout', new_callable=StringIO)
    def test_application_tasks(self, fakeOutput):
        expected = ['1','2','3','4','5']
        application.parse_event('some event') # print() is called here within parse_event()
        self.assertEqual(fakeOutput.getvalue().strip().split(), expected)
This outputs exactly the same error: AssertionError: Lists differ: [] != ['1', '2', '3', '4', '5']
os.system runs a new process. If you monkey-patch sys.stdout this affects the current process but has no consequences for any new processes.
Consider:
import sys
from os import system
from io import BytesIO
capture = sys.stdout = BytesIO()
system("echo Hello")
sys.stdout = sys.__stdout__
print(capture.getvalue())
Nothing is captured because only the child process has written to its stdout. Nothing has written to the stdout of your Python process.
Generally, avoid os.system. Instead, use the subprocess module which will let you capture output from the process that is run.
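For comparison, a minimal sketch of capturing a child's output with subprocess.run might look like this. The child command is a stand-in chosen so the example runs anywhere Python is installed; `capture_output` and `text` require Python 3.7 or newer.

```python
import subprocess
import sys

# The child command is a stand-in; any executable plus arguments works the same way.
completed = subprocess.run(
    [sys.executable, "-c", "print('Hello')"],
    capture_output=True,  # collect the child's stdout and stderr in the parent
    text=True,            # decode the captured bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit status
)
captured = completed.stdout.strip()
```

Here the output travels through a pipe back to the parent process, so no patching of sys.stdout is involved.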
Thank you, Jean-Paul Calderone. Only after I posted the question did I realize that os.system() creates a completely separate process and that I therefore need to tackle the problem differently :)
To actually be able to test my code, I had to rewrite it using subprocess instead of os.system(). In the end, I went with subprocess_run_result = subprocess.run(task, shell=True, stdout=subprocess.PIPE) and then getting the result using subprocess_run_result.stdout.strip().decode("utf-8").
In the tests I just create an instance of the class and call a method, which runs the tasks in a subprocess.
My whole refactored code and tests are here in this commit if anyone would like to take a look.
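The refactoring described here could be sketched roughly as follows. The function name `run_tasks` and the test class are hypothetical and are not taken from the linked commit; the tasks use `echo 1` rather than `echo '1'` only so the quoting behaves the same in different shells.

```python
import subprocess
import unittest


def run_tasks(tasks):
    """Run each shell command and return its decoded, stripped stdout."""
    outputs = []
    for task in tasks:
        result = subprocess.run(task, shell=True, stdout=subprocess.PIPE)
        outputs.append(result.stdout.strip().decode("utf-8"))
    return outputs


class TestRunTasks(unittest.TestCase):
    def test_outputs_are_captured(self):
        tasks = ["echo 1", "echo 2", "echo 3"]
        self.assertEqual(run_tasks(tasks), ["1", "2", "3"])
```

Because the function returns what the child processes printed, the test compares return values directly and no longer needs to patch sys.stdout.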
Your solution is fine, just use getvalue instead, like so:
with patch("sys.stdout", new_callable=StringIO) as f:
    print("Foo")
    r = f.getvalue()
print("r: {r!r} ;".format(r=r))
r: 'Foo\n' ;
