I have developed a Python script that (in theory) converts WAV files to MP3 files. I'm trying to make it async (asyncio) to convert multiple files concurrently and reduce the processing time.
But whether I convert 1 file or 10, the time spent is the same. I'm not very good at using async yet.
import asyncio
import tempfile
import time

from pydub import AudioSegment

# logger and readAsyncFile are helpers defined elsewhere in the script


class SoundsProcessing:

    async def cconvert(self, sem, audioFileIndex, audioFile, aOutputFormat='mp3'):
        try:
            async with sem:
                # Dump the raw WAV bytes to a temporary file so pydub can read it
                inputfile = tempfile.NamedTemporaryFile()
                inputfile.write(audioFile)
                inputfile.flush()  # make sure the data is on disk before pydub reads it
                outputfile = tempfile.NamedTemporaryFile()
                AudioSegment.from_wav(inputfile.name).export(outputfile.name + '.' + aOutputFormat, format=aOutputFormat)
                inputfile.close()
                # Read the converted file back in
                audio = await readAsyncFile(outputfile.name + '.' + aOutputFormat)
                self.audioFiles[audioFileIndex] = audio
                outputfile.close()
                logger.add('INFO', "Audio Files conversion: " + outputfile.name + " indexed " + str(audioFileIndex) + " is Done")
                return audio
        except Exception as e:
            logger.add('WARNING', "Audio Files conversion: " + str(e))
            return False

    async def audioConversion(self, aOutputAudioFormat='mp3'):
        tasks = []
        sem = asyncio.Semaphore(10)
        start_at = time.time()
        logger.add('INFO', "Audio Files conversion: Start session for " + str(len(self.audioFiles)) + " files/treatment")
        audioFileIndex = 0
        for audioFile in self.audioFiles:
            task = asyncio.ensure_future(self.cconvert(sem, audioFileIndex, audioFile, aOutputAudioFormat))
            tasks.append(task)
            audioFileIndex = audioFileIndex + 1
        responses = asyncio.gather(*tasks)
        await responses
        time_lapse = round(time.time() - start_at, 2)
        query_by_seconds = round(len(self.audioFiles) / time_lapse, 2)
        logger.add('INFO', "Audio Files conversion: End session on " + str(time_lapse) + " seconds (" + str(query_by_seconds) + "q/s)")

    def convert(self, aOutputAudioFormat='mp3'):
        self.results = {}
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        future = asyncio.ensure_future(self.audioConversion(aOutputAudioFormat))
        loop.run_until_complete(future)
You can rewrite the code correctly as async, and at first glance it is OK, but it won't be any faster: this is a CPU-bound task. The time is spent in the .from_wav call, and the asyncio loop is blocked until it returns.
What you can try, if you have a multi-core machine, is to move the body of your cconvert method (the part inside the async with sem: block) into a synchronous function (forget the async file access) and run it in a ProcessPoolExecutor, using the loop.run_in_executor call and passing in an explicit concurrent.futures.ProcessPoolExecutor instance.
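For illustration, here is a minimal sketch of that approach, assuming the same pydub-based conversion as in the question; convert_sync and convert_all are hypothetical names, not part of the original script:

import asyncio
import concurrent.futures
import tempfile

from pydub import AudioSegment

def convert_sync(audioFile, aOutputFormat='mp3'):
    # Plain synchronous conversion; safe to run in a worker process.
    with tempfile.NamedTemporaryFile() as inputfile:
        inputfile.write(audioFile)
        inputfile.flush()
        outname = inputfile.name + '.' + aOutputFormat
        AudioSegment.from_wav(inputfile.name).export(outname, format=aOutputFormat)
        with open(outname, 'rb') as f:
            return f.read()

async def convert_all(audioFiles, aOutputFormat='mp3'):
    loop = asyncio.get_running_loop()
    # Each conversion runs in its own process, so the work spreads over multiple cores.
    with concurrent.futures.ProcessPoolExecutor() as pool:
        futures = [
            loop.run_in_executor(pool, convert_sync, audioFile, aOutputFormat)
            for audioFile in audioFiles
        ]
        return await asyncio.gather(*futures)

The semaphore is no longer needed here: the pool's worker count (max_workers) already limits how many conversions run at the same time.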
Python 3.10.6
ManagerTask() is responsible for executing Task() methods.
Note: Task's methods are actually Celery tasks (async).
I'd like to add the option to track the execution of those tasks inside the ManagerTask class.
I managed to get it working, but as this is my first asyncio code I'm not sure I'm doing it right (I'm aware of Flower).
Second, I'm currently running a single Main() which orchestrates the execution of Task(); in the real world I need to expand it to support executing multiple sessions of Main() in parallel.
class ManagerTask:
    def __init__(self, id: int) -> None:
        self.id = id
        self.tasks: List["Task"] = []
        self.executed_tasks: List["Task"] = []
        self.state = State.SCHEDULED

    def load_tasks(self, configs: List[Dict]):
        # code for loading tasks
        ...

    async def run(self):
        """Execute all tasks and start monitoring results."""
        execute_task = asyncio.create_task(self.task_exec())
        monitor_progress = asyncio.create_task(self.monitor_progress())
        await execute_task
        await monitor_progress

    async def monitor_progress(self):
        """If one task failed, mark Main.state failed; if all succeeded, mark Main.state success."""
        failure_flag = False
        count_success = 0
        i = 0
        while True:
            state = self.executed_tasks[i].task_run.state
            print(f'Rule_ID:{self.executed_tasks[i].rule_id} \
                  \nCelery_UUID:{self.executed_tasks[i].task_celery_id} \
                  \nstatus - {self.executed_tasks[i].task_run.state}')
            if state == 'SUCCESS':
                count_success += 1
                i += 1
            if state == 'FAILURE':
                failure_flag = True
                i += 1
            # all tasks processed (either failed or succeeded)
            if i == len(self.executed_tasks) - 1:
                if failure_flag:
                    self.state = State.FAILED
                elif count_success == len(self.executed_tasks):
                    self.state = State.FINISHED
                break
            # otherwise wait
            await asyncio.sleep(3)

    async def task_exec(self):
        for task in self.starting_tasks:
            task.run()
Client (executes the app):
...code
cust = MainTask(id=1)
cust.load_tasks(configs=rules, db_connections=db_connections)
asyncio.run(cust.run())
print("MainTask State:" + cust.state)
Example of output:
Rule_ID:4
Celery_UUID:fecde27c-b58a-43cd-9498-3478404c248b
status - FAILURE
....
Rule_ID:6
Celery_UUID:85df9bba-3d75-4b00-a533-a81cd3f6afb3
status - SUCCESS
MainTask State:Failed
1. Is this the proper way of executing asyncio?
2. For running multiple MainTask sessions, how should I do it: threads or asyncio? As this program executes all tasks using Celery, I think I should also run it with asyncio, but I'm not sure.
Second, I would be thankful for guidance on whether this is the right approach:
async def exec_id(id):
    cust = MainTask(id=id)
    cust.load_tasks(configs=rules, ...)
    await cust.run()

async def main():
    ids = [111, 222, 333]
    for id in ids:
        await exec_id(id)

asyncio.run(main())
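Note that the loop above awaits each session one after the other, so the sessions run sequentially. If the goal is to run them concurrently, a minimal sketch (reusing the same exec_id coroutine from above) could hand them all to asyncio.gather:

import asyncio

async def main():
    ids = [111, 222, 333]
    # Schedule all sessions at once and wait for them together,
    # so they run concurrently on the same event loop.
    await asyncio.gather(*(exec_id(id) for id in ids))

asyncio.run(main())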
The problem I am having is that:
To start an async function in the background I need an asyncio event loop.
This event loop usually exists in the main thread and, when started, blocks the execution of that thread (i.e. lines of code after starting the event loop aren't run until the event loop is cancelled).
However, ROS2 has its own event loop (the executor) that also usually runs in the main thread and blocks execution. This means it is difficult to have both event loops running.
My attempted solution was to start the asyncio event loop in a separate thread. It is started in the Node constructor and stopped after the Node is destroyed.
This looks like this:
class IncrementPercentDoneServiceNode(Node):
    def __create_task(self, f: Awaitable):
        self.__task = self.__loop.create_task(f)

    def __init__(self):
        super().__init__('increment_percent_done_service_node')
        self.__loop = asyncio.new_event_loop()
        self.__task: Optional[Task] = None
        self.__thread = threading.Thread(target=self.__loop.run_forever)
        self.__thread.start()
        self.done = False
        self.create_service(Trigger, 'start_incrementing',
            callback=lambda request, responce: (
                self.get_logger().info("Starting service"),
                self.__loop.call_soon_threadsafe(self.__create_task, self.__increment_percent_complete()),
                TriggerResponse(success=True, message='')
            )[-1]
        )

    def __del__(self):
        print("stopping loop")
        self.done = True
        if self.__task is not None:
            self.__task.cancel()
        self.__loop.stop()
        self.__thread.join()

    async def __increment_percent_complete(self):
        timeout_start = time.time()
        duration = 5
        while time.time() < (timeout_start + duration):
            time_since_start = time.time() - timeout_start
            percent_complete = (time_since_start / duration) * 100.0
            self.get_logger().info("Percent complete: {}%".format(percent_complete))
            await asyncio.sleep(0.5)
        self.get_logger().info("leaving async function")
        self.done = True


if __name__ == '__main__':
    rclpy.init()
    test = IncrementPercentDoneServiceNode()
    e = MultiThreadedExecutor()
    e.add_node(test)
    e.spin()
Is this a sensible way to do it? Is there a better way? How would I cancel the start_incrementing service with another service? (I know that this is what actions are for, but I cannot use them in this instance).
I'm trying to simulate processing in threads by using asyncio.Queue. However, I'm struggling to turn the threaded processing-simulation part into an asynchronous loop.
In brief, my script: 1) receives processing requests over a websocket; 2) assigns each request to the requested queue (which simulates a thread); 3) runs the processing queues, which put responses into one shared response queue; and then 4) the websocket takes the responses out of the shared queue one by one and sends them out to the server.
Simplified version of my code:
# Initialize empty processing queues for the number of threads I want to simulate
processing_queues = [asyncio.Queue() for i in range(n_queues)]

# Initialize shared response queue
response_q = asyncio.Queue()

# Set up a websocket context manager
async with websockets.connect(f"ws://{host}:{port}") as websocket:
    while True:
        # Read incoming requests
        message = await websocket.recv()
        # Parse message -> get request data and on which thread / queue to process it
        request_data, queue_no = parse_message(message)
        # Put the request data into the requested queue (imitating a thread)
        await processing_queues[queue_no].put(request_data)

        # THIS IS WHERE I THINK ASYNCHRONY BREAKS (AND I NEED HELP)
        # Do processing in each imitated processing thread
        for proc_q in processing_queues:
            if not proc_q.empty():
                request_data = await proc_q.get()
                # do the processing
                response = process_data(request_data)
                # Add the response to the response queue
                await response_q.put(response)

        # Send responses back to the server
        if not response_q.empty():
            response_data = await response_q.get()
            await websocket.send(response_data)
From the output of the script, I deduced that 1) I seem to receive requests and send out responses asynchronously; 2) processing in the queues does not happen asynchronously. Correct me if I'm wrong.
I was reading about create_task() in asyncio. Maybe that could be a way to solve my problem?
I'm open to any solution (even hacky).
P.S. I would just use threads from the threading library, but I need asyncio for the websockets library.
P.P.S. Here is the threaded version of my idea.
class ProcessingImitationThread(threading.Thread):
    def __init__(self, thread_id, request_q, response_q):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.request_q = request_q
        self.response_q = response_q

    def run(self):
        while True:
            try:
                (x, request_id) = self.request_q.get_nowait()
            except Empty:
                time.sleep(0.2)
            else:
                if x == -1:
                    # EXIT CONDITION
                    break
                else:
                    sleep_time_for_x = count_imitation(x, state)
                    time.sleep(sleep_time_for_x)
                    self.response_q.put(request_id)
                    print(f"request {request_id} executed")


# Set up
processing_qs = [queue.Queue() for i in range(n_processes_simulated)]
response_q = queue.Queue()
processing_thread_handlers = []
for i in range(n_processes_simulated):
    # create and start a worker thread
    t = ProcessingImitationThread(i, processing_qs[i], response_q)
    processing_thread_handlers.append(t)
    t.start()

# Main loop
while True:
    # receive requests and assign to the requested queue (so that its thread picks them up)
    if new_request:
        requested_process, x, request_id = parse(new_request)
        processing_qs[requested_process].put((x, request_id))
    ...
    # if there are any new responses, send them out to the server
    if response_q.qsize() > 0:
        request_id = response_q.get()
        # Networking: send to server
        ...

# Close down
...
EDIT: fixes small typos.
Your intuition that you need create_task is correct, as create_task is the closest async equivalent of Thread.start: it creates a task that runs in parallel (in the async sense) with whatever you are doing now.
You need separate coroutines, running in parallel, each draining its respective queue; something like this:
async def main():
    processing_qs = [asyncio.Queue() for i in range(n_queues)]
    response_q = asyncio.Queue()
    async with websockets.connect(f"ws://{host}:{port}") as websocket:
        processing_tasks = [
            asyncio.create_task(processing(processing_q, response_q))
            for processing_q in processing_qs
        ]
        response_task = asyncio.create_task(
            send_responses(websocket, response_q))
        while True:
            message = await websocket.recv()
            requested_process, x, request_id = parse(message)
            await processing_qs[requested_process].put((x, request_id))

async def processing(processing_q, response_q):
    while True:
        x, request_id = await processing_q.get()
        ... create response ...
        await response_q.put(response)

async def send_responses(websocket, response_q):
    while True:
        msg = await response_q.get()
        await websocket.send(msg)
Like the many other threads I've opened, I am trying to create a multi-feature instant replay system built around the Blackmagic HyperDeck, which operates over Telnet. The current feature I am trying to implement is an in-out replay, which requires storing two timecode variables in the format hh:mm:ss;ff, where h = hours, m = minutes, s = seconds, and f = frames at 30 fps. The Telnet command for this is transport info, and the response returns 9 lines, of which I only want the timecode from the 7th. Any idea on how to do this? It is way out of my league.
status: stopped
speed: 0
slot id: 1
clip id: 1
single clip: false
display timecode: 00:00:09;22
timecode: 00:00:09;22
video format: 1080i5994
loop: false
Here's ideally what I would like it to look like
import telnetlib
host = "192.168.1.13" #changes for each device
port = 9993 #specific for hyperdecks
timeout = 10
session = telnetlib.Telnet(host, port, timeout)
def In():
    session.write(b"transport info \n")
    line = session.read_until(b";00", 0.5)
    print(line)
    # code to take response and store given line as variable IOin

def out():
    session.write(b"transport info \n")
    line = session.read_until(b";00", 0.5)
    print(line)
    # code to take response and store given line as variable IOout

def IOplay():
    IOtc = "playrange set: in: " + str(IOin) + " out: " + str(IOout) + " \n"
    session.write(IOtc.encode())
    speed = "play: speed: " + str(Pspeed.get()) + "\n"
    session.write(speed.encode())
For the most part, here's what I got to at least partially work:
TCi = 1
TCo = 1

def In():
    global TCi
    session.write(b"transport info \n")
    by = session.read_until(b";00", 0.5)
    print(by)
    s = by.find(b"00:")
    TCi = by[s:s + 11]

def Out():
    global TCo
    session.write(b"transport info \n")
    by = session.read_until(b";00", 0.5)
    print(by)
    s = by.find(b"00:")
    TCo = by[s:s + 11]

def IOplay():
    IOtc = "playrange set: in: " + str(TCi) + " out: " + str(TCo) + " \n"
    print(IOtc.encode())
    session.write(IOtc.encode())
    speed = "play: speed: 2 \n"
    session.write(speed.encode())
except that it ends up encoded as
b"playrange set: in: b'00:00:01;11' out: b'00:00:03;10' \n"
rather than
"playrange set: in: 00:00:01;11 out: 00:00:03;10 \n"
I need to get rid of the apostrophes and the b prefix in front of the variables.
Any ideas?
def get_timecode(text):
    tc = ''
    lines = text.split('\r\n')
    for line in lines:
        var, val = line.split(': ', maxsplit=1)
        if var == 'timecode':
            tc = val
    return tc
You could choose to go directly to lines[6], without scanning, but that would be more fragile if the client got out of sync with the server, or if the server's output formatting changed in a later release.
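For illustration, the direct-indexing variant would be a one-liner (this assumes the nine-line transport info response shown above, with the timecode always on the 7th line):

# Fragile: relies on "timecode: ..." always being the 7th line of the response.
tc = text.split('\r\n')[6].split(': ', maxsplit=1)[1]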
EDIT:
You wrote:
session.write(b"transport info \n")
#code to take response and store given line as variable IOin
You don't appear to be reading anything from the session.
I don't use telnetlib, but the docs suggest you'll
never obtain those nine lines of text if you don't do something like:
expect = b"foo" # some prompt string returned by server that you never described in your question
session.write(b"transport info\n")
bytes = session.read_until(expect, timeout)
text = bytes.decode()
print(text)
print('Timecode is', get_timecode(text))
When executing a bat file from Groovy, its output is not printed by the Groovy script until the bat file completes. For comparison, I tested the same exact bat file from C# and Perl; both print the output of the bat file as it is being written to STDOUT.
def cmd = "batfile.bat param1"
println cmd.execute().text()
Is there a way to tell Groovy to read the stream and print immediately?
Thank you for the response! One addition to note: when using the recommendation, the exec did not wait for the process to complete, which we desire in this case, so adding process.waitFor() accomplished this. Working code example below. (Note: test.bat can be anything you like, such as: sleep 5.)
import groovy.time.*
def times = [:]
def progStartTime = new Date()
String[] caches = ["cache1", "cache2", "cache3"]
def cmd
def batFile = "test.bat "
println new Date()
for (String item : caches) {
    def timeStart = new Date()
    cmd = [batFile, item]
    //println cmd.execute().text
    def process = cmd.execute()
    process.consumeProcessOutput(System.out, System.err)
    process.waitFor()
    def timeStop = new Date()
    TimeDuration duration = TimeCategory.minus(timeStop, timeStart)
    println "cache: " + item + " : " + duration
    times.put(item, duration)
}
times.each{ k, v -> println "cache: ${k}, took: ${v}" }
def progStopTime = new Date()
TimeDuration duration = TimeCategory.minus(progStopTime, progStartTime)
println "Total Program duration: " + duration
println new Date()
First of all, I believe it should read:
cmd.execute().text
without parentheses, so that we call the Groovy Process.getText() method. However, that will not solve your problem, as the getText() method waits for process completion before returning.
If you don't need control of the output but just want it sent directly to standard out and standard err, you can use the Groovy Process.consumeProcessOutput() method:
def process = "batfile.bat param1".execute()
process.consumeProcessOutput(System.out, System.err)
This will send the process's out and err stream output directly to the system out and err streams as the output becomes available.
If you need processing or control, something like the following should solve your problem:
def process = "batfile.bat param1".execute()
process.in.withReader { r ->
    r.eachLine { line ->
        // some token processing
        println "batfile output> ${line.toUpperCase()}"
    }
}
Also, parameters with spaces tend to cause havoc, so I have found it is often safer to use the Groovy List.execute() form instead, as in:
def process = ["batfile.bat", "param1"].execute()
which does the same thing but keeps parameter integrity with regard to spaces.