Python Multithreading Producer Consumer Pattern - python-3.x

I'm still learning how to code and these are my first attempts at multithreading.
I've read a bunch of multithreading articles. I thought these were very helpful:
Processing single file from multiple processes
Python module of the week: multiprocessing
Producer-consumer problem in Python
Multiprocessing
There's quite a lot to think about. Especially for a beginner.
Unfortunately, when I try to put this information into practice my code isn't quite working.
The idea behind this code is to read simplified.txt which contains lines of comma delimited numbers. Eg: 0.275,0.28,0.275,0.275,36078.
The producer thread reads each line and strips the newline character from the end of the line. Then each number in the line is split and assigned a variable.
Variable1 is then placed into the queue.
The consumer thread will pick up items in the queue, square it, then add an entry into the log file.
The code I am using comes from this template. This is the code I have so far:
import threading
import queue
import time
import logging
import random
import sys
read_file = 'C:/temp/temp1/simplified.txt'
log1 = open('C:/temp/temp1/simplified_log1.txt', "a+")
logging.basicConfig(level=logging.DEBUG, format='(%(threadName)-9s) %(message)s',)
BUF_SIZE = 10
q = queue.Queue(BUF_SIZE)
class ProducerThread(threading.Thread):
def __init__(self, name, read_file):
super(ProducerThread,self).__init__()
self.name = name
self.read_file = read_file
def run(self, read_file):
while True:
if not q.full():
with open(read_file, 'r') as f:
for line in f:
stripped = line.strip('\n\r')
value1,value2,value3,value4,value5,value6,value7 = stripped.split(',')
q.put(value1)
logging.debug('Putting ' + str(value1) + ' : ' + str(q.qsize()) + ' items in queue')
time.sleep(random.random())
return
class ConsumerThread(threading.Thread):
def __init__(self, name, value1, log1):
super(ConsumerThread,self).__init__()
self.name = name
self.value1 = value1
self.log1 = log1
return
def run(self):
while True:
if not q.empty():
value1 = q.get()
sqr_value1 = value1 * value1
log1.write("The square of " + str(value1) + " is " + str(sqr_value1))
logging.debug('Getting ' + str(value1) + ' : ' + str(q.qsize()) + ' items in queue')
time.sleep(random.random())
return
if __name__ == '__main__':
p = ProducerThread(name='producer')
c = ConsumerThread(name='consumer')
p.start()
time.sleep(2)
c.start()
time.sleep(2)
When I run the code, I get this error:
Traceback (most recent call last):
File "c:/Scripta/A_Simplified_Producer_Consumer_Queue_v0.1.py", line 60, in <module>
p = ProducerThread(name='producer')
TypeError: __init__() missing 1 required positional argument: 'read_file'
I don't know where else I need to add read_file.
Any help would be greatly appreciated. Thanks in advance.

Your ProducerThread class requires 2 parameters (name and read_file) as arguments to its constructor as defined in its __init__ method, where you only provide the first such argument when you create an instance in your main block. You have the same problem with your second class.
You should either provide the read_file to the constructors when creating instances or just remove it from the constructor signature since you don't appear to use it anyways (you use the read_file passed into run function, but I don't think that is correct). Seems like you're attempting to override that method from the Thread superclass and I doubt that takes such a parameter.

Thank you userSeventeen for setting me on the right path.
I thought that in order to use outside variables I needed to place them in the init method, then again into the run method. You've clarified that I only needed to use the variables in the run methods.
This is the working code. I had to remove the while true: statement as I did not want the code to run forever.
import threading
import queue
import time
import logging
import random
import sys
import os
read_file = 'C:/temp/temp1/simplified.txt'
log1 = open('C:/temp/temp1/simplified_log1.txt', "a+")
logging.basicConfig(level=logging.DEBUG, format='(%(threadName)-9s) %(message)s',)
BUF_SIZE = 10
q = queue.Queue(BUF_SIZE)
class ProducerThread(threading.Thread):
def __init__(self, name):
super(ProducerThread,self).__init__()
self.name = name
def run(self):
with open(read_file, 'r') as f:
for line in f:
stripped = line.strip('\n\r')
value1,value2,value3,value4,value5 = stripped.split(',')
float_value1 = float(value1)
if not q.full():
q.put(float_value1)
logging.debug('Putting ' + str(float_value1) + ' : ' + str(q.qsize()) + ' items in queue')
time.sleep(random.random())
return
class ConsumerThread(threading.Thread):
def __init__(self, name):
super(ConsumerThread,self).__init__()
self.name = name
return
def run(self):
while not q.empty():
float_value1 = q.get()
sqr_value1 = float_value1 * float_value1
log1.write("The square of " + str(float_value1) + " is " + str(sqr_value1))
logging.debug('Getting ' + str(float_value1) + ' : ' + str(q.qsize()) + ' items in queue')
time.sleep(random.random())
return
if __name__ == '__main__':
p = ProducerThread(name='producer')
c = ConsumerThread(name='consumer')
p.start()
time.sleep(2)
c.start()
time.sleep(2)

Related

How to find out how long a search for files will take on python?

So I have a little app that searches for all xml files on my pc, copying the files that have 44 digits as the filename to the "output" folder.
The problem is that the final user needs an indication of the progress and remaining time of the task.
This is the module to copy files:
xml_search.py
import os
import re
from threading import Thread
from datetime import datetime
import time
import shutil
import winsound
os.system('cls')
def get_drives():
response = os.popen("wmic logicaldisk get caption")
list1 = []
t1 = datetime.now()
for line in response.readlines():
line = line.strip("\n")
line = line.strip("\r")
line = line.strip(" ")
if (line == "Caption" or line == ""):
continue
list1.append(line + '\\')
return list1
def search1(drive):
for root, dir, files in os.walk(drive):
for file in files:
if re.match("\d{44}.xml", file):
filename = os.path.join(root, file)
try:
shutil.copy(filename, os.path.join('output', file))
except Exception as e:
pass
def exec_(callback):
t1 = datetime.now()
list2 = [] # empty list is created
list1 = get_drives()
for each in list1:
process1 = Thread(target=search1, args=(each,))
process1.start()
list2.append(process1)
for t in list2:
t.join() # Terminate the threads
t2 = datetime.now()
total = str(t2-t1)
print(total, file=open('times.txt', 'a'), end="\n")
for x in range(3):
winsound.Beep(2000,100)
time.sleep(.1)
callback()
if __name__ == "__main__":
exec_()
The below code uses progressbar library and it shows
indication of the progress and remaining time of the task
import progressbar
from time import sleep
bar = progressbar.ProgressBar(maxval=1120, \
widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.ETA()])
bar.start()
for i in range(1120):
bar.update(i+1)
sleep(0.1)
bar.finish()
You would need to add the above modified code to your code.
So in your case, you would need to count the number of files and provide it as input to ProgressBar constructor's maxval argument and remove sleep call.
The suggested solution with progress bar should work with one thread. You would need to figure out how to initiate the progress bar and where to put the updates if you insist to work with multiple threads.
Try to implement a timer decorator like the following:
import time
def mytimer(func):
def wrapper():
t1 = time.time()
result = func()
t2 = time.time()
print(f"The function {func.__name__} was run {t2 - t1} seconds")
return result
return wrapper
#mytimer
def TimeConsumingFunction():
time.sleep(3)
print("Hello timers")
TimeConsumingFunction()
Output:
/usr/bin/python3.7 /home/user/Documents/python-workspace/timers/example.py
Hello timers
The function TimeConsumingFunction was run 3.002610206604004 seconds
Process finished with exit code 0

how to save user input string in init()

I don't understand how to save the input string and then for in init() to later use it in another function
Class Person:
def __init__ (self):
....
def cCount (self):
num_A = self.count('A')
return num_A
import sys
def main():
inX = input('sequence?')
while inX
myY = Person(inX)
myCnumber = myY.cCount()
print (" {0}".format(myCnumber))
...
I want the output to be the count the number of As in the user input string
You can rearrange your code a little bit like this to achieve what you want:
class Person:
def __init__ (self, sequence):
self.sequence = sequence
def cCount (self):
return self.sequence.count('A')
import sys
def main():
inX = input('sequence?')
while inX:
myY = Person(inX)
myCnumber = myY.cCount()
print (" {0}".format(myCnumber))
inX = input('sequence?')
main()
Result
sequence?ABCDEF
1
sequence?XAAAEWF
3
sequence?
Changes made
Class was replaced with class - lowercase
Init definition added with sequence parameter
Inside init added self.sequence and initialized it
In def cCount, changed to self.sequence.count('A') and returned it
Indented def main's body
Added colon after inX
added inX = input.. at the bottom of while so that while can continue until you just hit Enter

Why are my class functions executed when importing the class?

it's probably a very basic question but I was unable to find an answer that I could thoroughly understand.
In my main program main_program.py, I'm importing a class that itself imports another class:
in main_program.py:
from createTest import *
in createTest.py:
print("TEST")
from recordRecallQused import *
print("TEST")
now in recordRecallQused:
class recordRecallQused:
def __init__(self, path):
self.path = path
try:
with open(self.path, 'r',newline = '') as question_used:
question_used.closed
except IOError:
#if file doesnt exist
print("the file doesn't exist")
with open(self.path, 'w',newline = '') as question_used:
question_used.closed
def recallQused(self):
list_Qused = []
print("I'm being executed")
with open(self.path, 'r',newline = '') as question_used:
questionused = csv.reader(question_used)
for item in questionused:
if len(item)>0:
list_Qused.append(item[0])
question_used.closed
return list_Qused
What I obtain in the kernel:
>TEST
>I'm being executed
>TEST
so functions inside the class are executed even though they are not called, but I have read that it's "normal", "def" are no statements but "live" things.
Still, I have tried something much more simple:
in main_program_TEST.py
from class1 import *
a = class1()
in class1.py:
print("t")
from class2 import *
print("t")
class class1:
def __init__(self):
pass
def message(self):
print("prout")
in class2.py:
class class2:
def __init__(self):
pass
def message(self):
print("prout2")
When executing main_program_TEST.py the kernel displays
>t
>t
so this time the functions in class2.py have not been executed, otherwise the kernel would show instead:
>t
>prout2
>t
I really wonder why.
Stephen Rauch you are right, part of my code in recordRecallQused.py was calling the function.
"""#load all list
print("loading questions info")
# questions info: answers, category
list_AllQ = []
with open('questionsInfoTo130.csv', newline = '') as csvfile:
questionsInfo = csv.reader(csvfile)
# loop over the questions information rows
for (i,row) in enumerate(questionsInfo):
if(i!=0):
list_AllQ.append(row)
csvfile.close()
path = 'question_used.csv'"""
list_AllQ = [[0,1,2,1,"que"],[0,1,2,2,"que"],[0,1,2,3,"que"],[0,1,2,4,"que"],[0,1,2,55,"que"],[0,1,2,6,"que"],[0,1,2,7,"que"],[0,1,2,8,"que"],[0,1,2,9,"que"]]
a = recordRecallQused('question_used.csv')
list_Qused = a.recallQused()
list_avQ = a.createListavQ(list_Qused, list_AllQ)
list_Qtest = a.createListQtest(list_avQ)
a.recordQused(list_Qtest)

Python 3 run two applcations in windows parallel

one question: I want to run two different exe files parallel (Windows).
All my tests starts the two applications, but one after the other (after closing the application). What's wrong?
import threading
import subprocess
import os.path
def Worker(aPrg):
_, name = os.path.split(aPrg)
if os.path.isfile(aPrg):
lExe = []
lExe.append(aPrg)
print('Start: ' + name)
lResult = subprocess.call(lExe)
else:
print('ERROR: ' + name + ' not available!')
return
def main():
t1 = threading.Thread(target=Worker('C:\\windows\\notepad.exe'))
t2 = threading.Thread(target=Worker('c:\\windows\\explorer.exe'))
t1.start()
t2.start()
if __name__ == '__main__':
main()
Thanks for all ideas!
Geosucher
The reasons for this problem is discussed here : Python threading appears to run threads sequentially
This should help you:
import subprocess
import os.path
import multiprocessing
def Worker(aPrg):
_, name = os.path.split(aPrg)
if os.path.isfile(aPrg):
lExe = []
lExe.append(aPrg)
print('Start: ' + name)
lResult = subprocess.call(lExe)
else:
print('ERROR: ' + name + ' not available!')
return
def main():
p = multiprocessing.Pool(2)
p.map(Worker, ('c:\\windows\\explorer.exe','c:\\windows\\explorer.exe'))
if __name__ == '__main__':
main()

How to decorate an asyncio.coroutine to retain its __name__?

I've tried to write a decorator function which wraps an asyncio.coroutine and returns the time it took to get done. The recipe below contains the code which is working as I expected. My only problem with it that somehow I loose the name of the decorated function despite the use of #functools.wraps. How to retain the name of the original coroutine? I checked the source of asyncio.
import asyncio
import functools
import random
import time
MULTIPLIER = 5
def time_resulted(coro):
#functools.wraps(coro)
#asyncio.coroutine
def wrapper(*args, **kargs):
time_before = time.time()
result = yield from coro(*args, **kargs)
if result is not None:
raise TypeError('time resulted coroutine can '
'only return None')
return time_before, time.time()
print('= wrapper.__name__: {!r} ='.format(wrapper.__name__))
return wrapper
#time_resulted
#asyncio.coroutine
def random_sleep():
sleep_time = random.random() * MULTIPLIER
print('{} -> {}'.format(time.time(), sleep_time))
yield from asyncio.sleep(sleep_time)
if __name__ == '__main__':
loop = asyncio.get_event_loop()
tasks = [asyncio.Task(random_sleep()) for i in range(5)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
for task in tasks:
print(task, task.result()[1] - task.result()[0])
print('= random_sleep.__name__: {!r} ='.format(
random_sleep.__name__))
print('= random_sleep().__name__: {!r} ='.format(
random_sleep().__name__))
The result:
= wrapper.__name__: 'random_sleep' =
1397226479.00875 -> 4.261069174838891
1397226479.00875 -> 0.6596335046471768
1397226479.00875 -> 3.83421163259601
1397226479.00875 -> 2.5514027672929713
1397226479.00875 -> 4.497471439365472
Task(<wrapper>)<result=(1397226479.00875, 1397226483.274884)> 4.266134023666382
Task(<wrapper>)<result=(1397226479.00875, 1397226479.6697)> 0.6609499454498291
Task(<wrapper>)<result=(1397226479.00875, 1397226482.844265)> 3.835515022277832
Task(<wrapper>)<result=(1397226479.00875, 1397226481.562422)> 2.5536720752716064
Task(<wrapper>)<result=(1397226479.00875, 1397226483.51523)> 4.506479978561401
= random_sleep.__name__: 'random_sleep' =
= random_sleep().__name__: 'wrapper' =
As you can see random_sleep() returns a generator object with different name. I would like to retain the name of the decorated coroutine. I am not aware if this is problem is specific to asyncio.coroutines or not. I also tried the code with different decorator orders, but all has the same result. If I comment #functools.wraps(coro) then even random_sleep.__name__ becomes wrapper as I expected.
EDIT: I've posted this issue to Python Issue Tracker and received the following answer by R. David Murray: "I think this is a specific case of a more general need to improve 'wraps' that was discussed on python-dev not too long ago."
The issue is that functools.wraps changes only wrapper.__name__ and wrapper().__name__ stays wrapper. __name__ is a readonly generator attribute. You could use exec to set appropriate name:
import asyncio
import functools
import uuid
from textwrap import dedent
def wrap_coroutine(coro, name_prefix='__' + uuid.uuid4().hex):
"""Like functools.wraps but preserves coroutine names."""
# attribute __name__ is not writable for a generator, set it dynamically
namespace = {
# use name_prefix to avoid an accidental name conflict
name_prefix + 'coro': coro,
name_prefix + 'functools': functools,
name_prefix + 'asyncio': asyncio,
}
exec(dedent('''
def {0}decorator({0}wrapper_coro):
#{0}functools.wraps({0}coro)
#{0}asyncio.coroutine
def {wrapper_name}(*{0}args, **{0}kwargs):
{0}result = yield from {0}wrapper_coro(*{0}args, **{0}kwargs)
return {0}result
return {wrapper_name}
''').format(name_prefix, wrapper_name=coro.__name__), namespace)
return namespace[name_prefix + 'decorator']
Usage:
def time_resulted(coro):
#wrap_coroutine(coro)
def wrapper(*args, **kargs):
# ...
return wrapper
It works but there is probably a better way than using exec().
In the time since this question was asked, it became possible to change the name of a coroutine. It is done by setting __qualname__ (not __name__):
async def my_coro(): pass
c = my_coro()
print(repr(c))
# <coroutine object my_coro at 0x7ff8a7d52bc0>
c.__qualname__ = 'flimflam'
print(repr(c))
# <coroutine object flimflam at 0x7ff8a7d52bc0>
import asyncio
print(repr(asyncio.ensure_future(c)))
# <Task pending name='Task-737' coro=<flimflam() running at <ipython-input>:1>>
The usage of __qualname__ in a coroutine object's __repr__ is defined in the CPython source

Resources