Simultaneously executing a loop and a function in Python 3 - multithreading

I have a problem: I am writing a program in Python 3.2 that requires that a loop run uninterrupted and separate from the rest of the program, but at the same time it must be able to send and receive data (such as a string) from the main part of the script. The parts would work like this:
# Continuing loop (LOOP)
while True:
    data.read()
    if data[2] == "ff":
        string += data
    if request == True:
        SEND(string, MAIN)
        string = []
# Main program (MAIN)
hexValues = REQUEST(string, LOOP)
So, like having two processes of Python running at the same time but talking to each other.
Is this even possible? If so, how should I do it?
EDIT: I am using Ubuntu GNU/Linux and Python 3.2.

This is what the threading module is for. You can also look at multiprocessing.
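As a rough illustration of the threading route, here is a minimal sketch in which a background loop keeps collecting values and hands them to the main program on request. The names `reader_loop`, `request_q` and `reply_q` are made up for this example, and the counter merely stands in for the real `data.read()` call.

```python
import queue
import threading
import time


def reader_loop(request_q, reply_q, stop):
    """Background loop: keep collecting values, hand them over when asked."""
    collected = []
    counter = 0
    while not stop.is_set():
        # Stand-in for data.read(): produce a two-digit hex string.
        collected.append(format(counter % 256, "02x"))
        counter += 1
        try:
            request_q.get(timeout=0.01)  # has the main program asked for data?
        except queue.Empty:
            continue
        reply_q.put(collected)  # send everything gathered so far
        collected = []          # start a fresh batch


request_q, reply_q = queue.Queue(), queue.Queue()
stop = threading.Event()
thread = threading.Thread(
    target=reader_loop, args=(request_q, reply_q, stop), daemon=True
)
thread.start()

# Main program: do other work, then request the collected values.
time.sleep(0.1)
request_q.put("send")
hex_values = reply_q.get(timeout=2)

stop.set()
thread.join()
print(len(hex_values), "values received")
```

The two queues do the job of SEND and REQUEST in the pseudocode; `queue.Queue` is safe to use from several threads without extra locking.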

Related

Multiprocessing with Multiple Functions: Need to add a function to the pool from within another function

I am measuring the metrics of an encryption algorithm that I designed. I have declared 2 functions and a brief sample is as follows:
import sys, random, timeit, psutil, os, time
from multiprocessing import Process
from subprocess import check_output
pid=0
def cpuUsage():
    global running
    while pid == 0:
        time.sleep(1)
    running=True
    p = psutil.Process(pid)
    while running:
        print(f'PID: {pid}\t|\tCPU Usage: {p.memory_info().rss/(1024*1024)} MB')
        time.sleep(1)
def Encryption():
    global pid, running
    pid = os.getpid()
    myList=[]
    for i in range(1000):
        myList.append(random.randint(-sys.maxsize,sys.maxsize)+random.random())
    print('Now running timeit function for speed metrics.')
    p1 = Process(target=metric_collector())
    p1.start()
    p1.join()
    number=1000
    unit='msec'
    setup = '''
import homomorphic,random,sys,time,os,timeit
myList={myList}
'''
    enc_code='''
for x in range(len(myList)):
    myList[x] = encryptMethod(a, b, myList[x], d)
'''
    dec_code='''
\nfor x in range(len(myList)):
    myList[x] = decryptMethod(myList[x])
'''
    time=timeit.timeit(setup=setup,
                       stmt=(enc_code+dec_code),
                       number=number)
    running=False
    print(f'''Average Time:\t\t\t {time/number*.0001} seconds
Total time for {number} Iters:\t\t\t {time} {unit}s
Total Encrypted/Decrypted Values:\t {number*len(myList)}''')
    sys.exit()
if __name__ == '__main__':
    print('Beginning Metric Evaluation\n...\n')
    p2 = Process(target=Encryption())
    p2.start()
    p2.join()
I am sure there's an implementation error in my code, I'm just having trouble grabbing the PID for the encryption method and I am trying to make the overhead from other calls as minimal as possible so I can get an accurate reading of just the functionality of the methods being called by timeit. If you know a simpler implementation, please let me know. Trying to figure out how to measure all of the metrics has been killing me softly.
I've tried acquiring the pid a few different ways, but I only want to measure performance when timeit is run. Good chance I'll have to break this out separately and run it that way (instead of multiprocessing) to evaluate the function properly, I'm guessing.
There are at least three major problems with your code. The net result is that you are not actually doing any multiprocessing.
The first problem is here, and in a couple of other similar places:
p2 = Process(target=Encryption())
What this code passes to Process is not the function Encryption but the returned value from Encryption(). It is exactly the same as if you had written:
x = Encryption()
p2 = Process(target=x)
What you want is this:
p2 = Process(target=Encryption)
This code tells Python to create a new Process and execute the function Encryption() in that Process.
The second problem has to do with the way Python handles memory for Processes. Each Process lives in its own memory space. Each Process has its own local copy of global variables, so you cannot set a global variable in one Process and have another Process be aware of this change. There are mechanisms to handle this important situation, documented in the multiprocessing module. See the section titled "Sharing state between processes." The bottom line here is that you cannot simply set a global variable inside a Process and expect other Processes to see the change, as you are trying to do with pid. You have to use one of the approaches described in the documentation.
The third problem is this code pattern, which occurs for both p1 and p2.
p2 = Process(target=Encryption)
p2.start()
p2.join()
This tells Python to create a Process and to start it. Then you immediately wait for it to finish, which means that your current Process must stop at that point until the new Process is finished. You never allow two Processes to run at once, so there is no performance benefit. The only reason to use multiprocessing is to run two things at the same time, which you never do. You might as well not bother with multiprocessing at all since it is only making your life more difficult.
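The three points above can be put together in a small sketch. It assumes a placeholder workload instead of the real encryption code, and the names `worker`, `monitor` and `run_demo` are invented for illustration. The function objects are passed to Process without calling them, the PID travels through a shared `Value` rather than a global, and both processes are started before either is joined.

```python
import multiprocessing as mp
import os
import time


def worker(shared_pid, done):
    """Stand-in for the timed workload: publish the PID, work, signal the end."""
    shared_pid.value = os.getpid()
    sum(i * i for i in range(200_000))  # placeholder for the real work
    time.sleep(0.3)
    done.set()


def monitor(shared_pid, done, samples):
    """Wait for the worker's PID, then sample until the worker is done."""
    while shared_pid.value == 0 and not done.is_set():
        time.sleep(0.01)
    while not done.is_set():
        samples.value += 1  # a real monitor might query psutil here
        time.sleep(0.05)


def run_demo(start_method=None):
    """Run worker and monitor concurrently; None means the platform default."""
    ctx = mp.get_context(start_method)
    shared_pid = ctx.Value("i", 0)  # shared integer, visible to both processes
    samples = ctx.Value("i", 0)
    done = ctx.Event()

    # Pass the function objects themselves, not the result of calling them.
    p_worker = ctx.Process(target=worker, args=(shared_pid, done))
    p_monitor = ctx.Process(target=monitor, args=(shared_pid, done, samples))

    # Start both before joining either, so they actually overlap.
    p_worker.start()
    p_monitor.start()
    p_worker.join()
    p_monitor.join()
    return shared_pid.value, p_worker.pid, samples.value
```

Calling `run_demo()` from under an `if __name__ == '__main__':` guard returns the PID the worker published, the PID the parent recorded for it, and the number of samples the monitor took.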
Finally, I am not sure why you have decided to try to use multiprocessing in the first place. The functions that measure memory usage and execution time are almost certainly very fast, and I would expect them to be much faster than any method of synchronizing one Process to another. If you're worried about errors due to the time used by the diagnostic functions themselves, I doubt that you can make things better by multiprocessing. Why not just start with a simple program and see what results you get?
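A simple single-process starting point might look like the sketch below. The `encrypt` and `decrypt` functions are hypothetical placeholders for the real methods, and the standard-library `tracemalloc` module is swapped in for psutil so the example has no third-party dependency; it reports Python-level allocations, not the process RSS.

```python
import random
import sys
import timeit
import tracemalloc


def encrypt(value):
    """Hypothetical placeholder for the real encryptMethod."""
    return value * 3 + 7


def decrypt(value):
    """Hypothetical placeholder for the real decryptMethod."""
    return (value - 7) / 3


def round_trip(values):
    """Encrypt and then decrypt every value."""
    return [decrypt(encrypt(v)) for v in values]


def measure(values, number=100):
    """Return (average seconds per run, peak traced bytes for one run)."""
    total_seconds = timeit.timeit(lambda: round_trip(values), number=number)

    # Memory is measured in a separate run so tracing does not skew the timing.
    tracemalloc.start()
    round_trip(values)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total_seconds / number, peak_bytes


if __name__ == "__main__":
    my_list = [
        random.randint(-sys.maxsize, sys.maxsize) + random.random()
        for _ in range(1000)
    ]
    avg_seconds, peak = measure(my_list)
    print(f"Average time per run: {avg_seconds:.6f} s")
    print(f"Peak traced memory:   {peak / 1024:.1f} KiB")
```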

Get realtime output from a long-running executable using python

It's my first time asking a question on here so bear with me.
I'm trying to make a python3 program that runs executable files for x amount of time and creates a log of all output in a text file. For some reason the code I have so far works only with some executables. I'm new to python and especially subprocess so any help is appreciated.
import time
import subprocess
def CreateLog(executable, timeout=5):
    time_start = time.time()
    process = subprocess.Popen(executable, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
    f = open("log.txt", "w")
    while process.poll() is None:
        output = process.stdout.readline()
        if output:
            f.write(output)
        if time.time() > time_start + timeout:
            process.kill()
            break
I was recently experimenting with crypto mining and came across nanominer, I tried using this python code on nanominer and the log file was empty. I am aware that nanominer already logs its output, but the point is why does the python code fail.
You are interacting through .poll() (R U dead yet?) and .readline().
It's not clear you want to do that.
There seem to be two cases for your long-lived child:
it runs "too long" silently
it runs forever, regularly producing output text at e.g. one-second intervals
The 2nd case is the easy one.
Just use for line in process.stdout:, consume the line,
peek at the clock, and maybe send a .kill() just as you're already doing.
No need for .poll(), as child exiting will produce EOF on that pipe.
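A minimal sketch of that second case, assuming the child writes complete lines regularly. The name `create_log` and its `log_path` parameter are illustrative, and the demo child is a small Python script standing in for the real executable.

```python
import subprocess
import sys
import time


def create_log(command, log_path, timeout=5.0):
    """Copy the child's stdout to log_path, killing it after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    with open(log_path, "w") as log, subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
    ) as process:
        # Iterating the pipe yields one line at a time and ends at EOF,
        # which happens when the child exits, so no poll() is needed.
        for line in process.stdout:
            log.write(line)
            if time.monotonic() > deadline:
                process.kill()
                break
    return process.returncode


if __name__ == "__main__":
    # Stand-in child that prints a line every 0.2 s; -u keeps it unbuffered.
    ticker = [
        sys.executable, "-u", "-c",
        "import time\nwhile True:\n    print('tick')\n    time.sleep(0.2)",
    ]
    print("exit status:", create_log(ticker, "log.txt", timeout=2))
```

Note that the clock is only checked when a line arrives, so a child that goes silent is never killed by this loop; that is the first case.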
For the 1st case, you will want to set an alarm.
See https://docs.python.org/3/library/signal.html#example
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)
After "too long", five seconds, your handler will run.
It can do anything you desire.
You'll want it to have access to the process handle,
which will let you send a .kill().
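A sketch of the alarm approach, assuming a POSIX system (SIGALRM does not exist on Windows) and that the call is made from the main thread, where Python allows signal handlers to be installed. The name `run_with_alarm` is made up for this example.

```python
import signal
import subprocess
import sys


def run_with_alarm(command, seconds):
    """Run command; kill it if it is still alive after `seconds` (an int)."""
    process = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )

    def handler(signum, frame):
        # The closure gives the handler access to the process handle.
        process.kill()

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        return process.wait()
    finally:
        signal.alarm(0)  # cancel a pending alarm if the child exited early
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print("exit status:", run_with_alarm(sleeper, 5))
```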

Threads will not close off after program completion

I have a script that retrieves temperature data using requests. Since I had to make multiple requests (around 13000), I decided to explore the use of multi-threading, which I am new to.
The program works by grabbing longitude/latitude data from a csv file and then making a request to retrieve the temperature data.
The problem that I am facing is that the script does not finish fully when the last temperature value is retrieved.
Here is the code. I have shortened so it is easy to see what I am doing:
num_threads = 16
q = Queue(maxsize=0)
def get_temp(q):
    while not q.empty():
        work = q.get()
        if work is None:
            break
        ## rest of my code here
        q.task_done()
At main:
def main():
    for o in range(num_threads):
        logging.debug('Starting Thread %s', o)
        worker = threading.Thread(target=get_temp, args=(q,))
        worker.setDaemon(True)
        worker.start()
    logging.info("Main Thread Waiting")
    q.join()
    logging.info("Job complete!")
I do not see any errors on the console and the temperature is successfully being written to another file. I have tried running a test csv file with only a few longitude/latitude references and the script seems to finish executing fine.
So is there a way of shedding light as to what might be happening in the background? I am using Python 3.7.3 on PyCharm 2019.1 on Linux Mint 19.1.
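q.join() does not wait for the threads themselves to finish; it blocks until every item that was put on the queue has been marked done with q.task_done(). If a None sentinel is ever taken from the queue, the break skips q.task_done(), and so would any exception raised before that call, so the count of unfinished items never reaches zero and q.join() keeps waiting.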
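One way to make q.join() return reliably is to call task_done() for every item taken from the queue, sentinels included. The sketch below does this with try/finally; `fake_request` is a hypothetical placeholder for the real temperature request, and `run` is an invented helper.

```python
import queue
import threading

NUM_THREADS = 4


def fake_request(work):
    """Hypothetical placeholder for the real temperature request."""
    return work * 2


def get_temp(q, results):
    while True:
        work = q.get()  # blocks until an item is available
        try:
            if work is None:  # sentinel: no more work for this thread
                break
            results.append(fake_request(work))
        finally:
            # Runs for normal items, for the sentinel, and after exceptions.
            q.task_done()


def run(jobs):
    q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=get_temp, args=(q, results))
        for _ in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)  # one sentinel per worker thread
    q.join()  # returns once every item has been marked done
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run(range(10))))
```

Blocking on q.get() with a sentinel also avoids the race in `while not q.empty()`, where a thread can see a non-empty queue and then find that another thread has taken the last item.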

Python pool.apply_async() doesn't call target function?

I'm writing an optimization routine to brute force search a solution space for optimal hyper parameters; and apply_async does not appear to be doing anything at all. Ubuntu Server 16.04, Python 3.5, PyCharm CE 2018. Also, I'm doing this on an Azure virtual machine. My code looks like this:
class optimizer(object):
    def __init__(self,n_proc,frame):
        # Set Class Variables
    def prep(self):
        # Get Data and prepare for optimization
    def ret_func(self,retval):
        self.results = self.results.append(retval)
        print('Something')
    def search(self):
        p = multiprocessing.Pool(processes=self.n_proc)
        for x, y in zip(repeat(self.data),self.grid):
            job = p.apply_async(self.bot.backtest,(x,y),callback=self.ret_func)
        p.close()
        p.join()
        self.results.to_csv('OptimizationResults.csv')
        print('***************************')
        print('Exiting, Optimization Complete')
if __name__ == '__main__':
    multiprocessing.freeze_support()
    opt = optimizer(n_proc=4,frame='ytd')
    opt.prep()
    print('Data Prepped, beginning search')
    opt.search()
I was running this exact setup on a Windows Server VM, and I switched over due to issues with multiprocessing not utilizing all cores. Today, I configured my machine and was able to run the optimization one time only. After that, it mysteriously stopped working with no change from me. Also, I should mention that it spits out output every 1 in 10 times I run it. Very odd behavior. I expect to see:
Something
Something
Something
.....
Which would typically be the best "to-date" results of the optimization (omitted for clarity). Instead I get:
Data Prepped, beginning search
***************************
Exiting, Optimization Complete
If I call get() on the async object, the results are printed as expected, but only one core is utilized because the results are being gathered in the for loop. Why isn't apply_async doing anything at all? I should mention that I use the "stop" button on Pycharm to terminate the process, not sure if this has something to do with it?
Let me know if you need more details about prep(), or bot.backtest()
I found the error! Basically I was converting a dict() to a list() and passing the values from the list into my function! The list parameter order was different every time I ran the function, and one of the parameters needed to be an integer, not a float.
For some reason, on Windows, the order of the dict was preserved when converting to a list; not the case with Ubuntu! Very interesting.
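An exception raised inside the worker is stored in the async result and only surfaces when get() is called, which is why apply_async can look as if it did nothing. The sketch below shows two possible safeguards: building the argument tuple from the dict by explicit key names instead of relying on its order, and passing an error_callback so failures become visible. The parameter names `window` and `threshold`, the `PARAM_ORDER` tuple and this `backtest` are hypothetical stand-ins, not the original code.

```python
import multiprocessing as mp

PARAM_ORDER = ("window", "threshold")  # hypothetical parameter names


def backtest(data, window, threshold):
    """Hypothetical stand-in for bot.backtest; insists on an integer window."""
    if not isinstance(window, int):
        raise TypeError("window must be an int")
    return sum(data[:window]) * threshold


def grid_search(data, grid, start_method=None):
    """Run backtest for every parameter dict; return (results, errors)."""
    results, errors = [], []
    ctx = mp.get_context(start_method)  # None means the platform default
    with ctx.Pool(processes=2) as pool:
        for params in grid:
            # Look parameters up by name so dict ordering never matters.
            args = (data,) + tuple(params[name] for name in PARAM_ORDER)
            pool.apply_async(
                backtest,
                args,
                callback=results.append,
                error_callback=errors.append,  # called with the exception
            )
        pool.close()
        pool.join()
    return results, errors
```

Calling `grid_search(data, grid)` from under an `if __name__ == '__main__':` guard returns the successful results together with any exceptions the workers raised, so a bad parameter shows up in `errors` instead of disappearing.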

What makes Python3's print function thread safe?

I've seen on various mailing lists and forums that people keep mentioning that the print function in Python 3 is thread safe. From my own testing, I see no reason to doubt that.
import threading
import time
import random
def worker(letter):
    print(letter * 50)
threads = [threading.Thread(target=worker, args=(let,)) for let in "ABCDEFGHIJ"]
for t in threads:
    t.start()
for t in threads:
    t.join()
When I run it with Python 3, even though some of the lines may be out of order, they are still always on their own lines. With Python 2, however, the output is fairly sporadic. Some lines are joined together or indented. This is also the case when I from __future__ import print_function
Python 2.7 builtin_print <- not thread safe
Python 3.6 builtin_print <- thread safe?
I'm just trying to understand WHY this is the case?
For Python 3.7: The print() function is a builtin, it by default sends output to sys.stdout, the documentation of which says, among other things:
When interactive, stdout and stderr streams are line-buffered.
Otherwise, they are block-buffered like regular text files. You can
override this value with the -u command-line option.
So it's really the combination of interactive mode and sys.stdout that is responsible for the behaviour of the print function as demonstrated in the example.
And we can get closer to the truth if the worker function in your example program is changed to
def worker(letter):
    print(letter*25, letter*25, sep='\n')
then we get outputs similar to the one below, which clearly shows that print in itself is not thread safe. What you can expect is that the individual strings passed to print do not get interleaved with each other.
DDDDDDDDDDDDDDDDDDDDDDDDDJJJJJJJJJJJJJJJJJJJJJJJJJ
JJJJJJJJJJJJJJJJJJJJJJJJJ
DDDDDDDDDDDDDDDDDDDDDDDDDGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHHHHHHHHHHHHH
FFFFFFFFFFFFFFFFFFFFFFFFF
IIIIIIIIIIIIIIIIIIIIIIIIICCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
IIIIIIIIIIIIIIIIIIIIIIIII
EEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFFFF
BBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBBBBB
So ultimately thread safety of print is determined by the buffering strategy used.
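If each call's output has to stay together regardless of buffering, one common approach is to serialize the calls with a lock. This is a minimal sketch of that idea, not something print does on its own; the `print_lock` name, the `run_workers` helper and the `stream` parameter are additions for illustration.

```python
import threading

print_lock = threading.Lock()


def worker(letter, stream=None):
    # Only one thread at a time may run the whole print call,
    # so both lines of a call stay together.
    with print_lock:
        print(letter * 25, letter * 25, sep="\n", file=stream)


def run_workers(letters, stream=None):
    threads = [
        threading.Thread(target=worker, args=(let, stream)) for let in letters
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    run_workers("ABCDEFGHIJ")  # stream=None means sys.stdout
```

The order of the letters can still vary from run to run, but each pair of lines now appears as a unit.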
