Python3 Multiprocessing Windows and Linux

I have a singleton class which I use, and I make an instance of this class on my main thread.
On Windows I create a few child processes, and what seems logical to me (due to how the GIL works) is that when I instantiate the same class there, it's a new instance; it cannot see the one instantiated on my main thread.
But if I do the same on Linux, when creating my child processes and instantiating my class, it returns the pointer to the instance created on my main thread and can thus see the data from that one.
This might be touching on some basic principle of how processes work on Linux and Windows that I do not understand, but I would be grateful if someone could briefly help me understand it, or point me to where I can read about it.
Regards
EDIT
On Windows:
def child_process_function():
    SingletonThread = instantiate_singleton()  # <-- this will be a new instance
    do_work()

SingletonMain = instantiate_singleton()
process = multiprocessing.Process(target=child_process_function)
process.start()
On Linux:
def child_process_function():
    SingletonThread = instantiate_singleton()  # <-- this will contain the data from SingletonMain
    do_work()

SingletonMain = instantiate_singleton()
process = multiprocessing.Process(target=child_process_function)
process.start()
Now, after reading your link, I see that the Linux SingletonThread does not point to SingletonMain as I thought at first; its data is simply copied. However, my problem is that because the data does not get copied on Windows, I send the data through a Queue (the data changes all the time, so this is the desired method). There is also an overwrite block on the class (it contains settings), so when the data is already there, as on Linux, I get a write error on the class. However, adding

import multiprocessing as mp
mp.set_start_method('spawn')

fixes my problem.
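For reference, a minimal sketch of where that call typically goes: set_start_method should be called at most once, in the main module, before any process is created (instantiate_singleton and do_work are the placeholders from the snippets above):

import multiprocessing as mp

def child_process_function():
    SingletonThread = instantiate_singleton()  # now a fresh instance on Linux too
    do_work()

if __name__ == '__main__':
    mp.set_start_method('spawn')  # call once, before creating any Process
    SingletonMain = instantiate_singleton()
    process = mp.Process(target=child_process_function)
    process.start()
    process.join()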

When spawning a new process, UNIX OSes use the fork strategy, which literally duplicates the parent process so that the child has an exact copy of the parent's memory and resources. The memory is duplicated, not shared. Therefore, changes applied to the child's singleton instance won't be seen by the parent, and vice versa.
You are misled by the fact that you see the "same pointer" because of how virtual memory is managed in modern OSes. The pointers you see are the same because they start from the same offset, but they actually belong to different memory pages. I gave a similar explanation here.
You can read more about process launching mechanisms in the multiprocessing module documentation.
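To see the copy-not-share behaviour concretely, here is a minimal sketch (the dictionary and the function are illustrative, not taken from the question):

import multiprocessing as mp

data = {"value": 1}

def child():
    data["value"] = 99  # mutates the child's private copy only
    print("child sees:", data["value"])  # 99

if __name__ == "__main__":
    mp.set_start_method("fork")  # the default on Linux; unavailable on Windows
    p = mp.Process(target=child)
    p.start()
    p.join()
    print("parent still sees:", data["value"])  # 1 -- the change never crossed over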

Related

Is there a reason for this difference between Python threads and processes?

When a list object is passed to a Python (3.9) Process and Thread, the additions to the list made in the thread are seen in the parent, but not the additions made in the process. E.g.:
from multiprocessing import Process
from threading import Thread

def job(x, out):
    out.append(f'f({x})')

out = []
pr = Process(target=job, args=('process', out))
th = Thread(target=job, args=('thread', out))
pr.start(), th.start()
pr.join(), th.join()
print(out)
This prints ['f(thread)']. I expected it to be (disregarding the order) ['f(thread)', 'f(process)'].
Could someone explain the reason for this?
There's nothing Python-specific about it; that's just how processes work.
Specifically, all threads running within a given process share the process's memory-space -- so e.g. if thread A changes the state of a variable, thread B will "see" that change.
Processes, OTOH, each get their own private memory space that is inaccessible to all other processes. That's done deliberately as a way to prevent process A from accidentally (or deliberately) reading or corrupting the memory of process B.
When you spawn a child process, the new child process gets its own memory-space that initially contains a copy of all the data in the parent's memory space, but it is a separate space, so changes made by the child will not be visible to the parent (and vice-versa).
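If the goal is for the parent to see the child's appends, the list has to live in managed or shared memory; here is a minimal sketch using multiprocessing.Manager (one option among several -- a Queue would also work):

from multiprocessing import Manager, Process

def job(x, out):
    out.append(f'f({x})')

if __name__ == '__main__':
    with Manager() as manager:
        out = manager.list()  # a proxy list living in a manager process
        pr = Process(target=job, args=('process', out))
        pr.start()
        pr.join()
        print(list(out))  # ['f(process)']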

Why do some widgets not update on Qt5?

I am trying to create a PyQt5 application, where I use certain labels to display status variables. To update them, I have implemented a custom pyqtSignal manually. However, on debugging I find that the value of the QLabel has changed, but the change is not reflected in the main window.
Some answers suggested calling QApplication().processEvents() occasionally. However, this instantly crashes or freezes the application.
Here's a sample of the code (all required libraries are imported; it's just the part creating the problem, as the actual code is huge):
from multiprocessing import Process

def sub(signal):
    i = 0
    while True:
        if i % 5 == 0:
            signal.update(i)
        i += 1

class CustomSignal(QObject):
    signal = pyqtSignal(int)

    def update(self, value):
        self.signal.emit(value)

class MainApp(QWidget):
    def __init__(self):
        super().__init__()
        self.label = QLabel("0")
        self.customSignal = CustomSignal()
        self.subp = Process(target=sub, args=(self.customSignal,))
        self.subp.start()
        self.customSignal.signal.connect(self.updateValue)

    def updateValue(self, value):
        print("old value", self.label.text())
        self.label.setText(str(value))
        print("new value", self.label.text())
The output of the print statements is as expected. However, the text in label does not change.
The update function in CustomSignal is called by some thread.
I've applied the same method to update a progress bar, which works fine.
Is there any other fix for this, other than processEvents()?
The OS is Ubuntu 16.04.
The key problem lies in the very concept behind the code.
Processes have their own address space and don't share data with other processes unless some inter-process communication mechanism is used. Perhaps the multiprocessing module was used instead of the threading module to sidestep Python's GIL and speed up the program; however, a child process cannot access the data of its parent process.
I have tested two solutions to this case, and they seem to work.
threading module: even though threading in Python is limited by the GIL, it is still sufficient to some extent for basic concurrency demands. Note the difference between concurrency and speedup.
QThread: since you are using PyQt, there isn't any issue in using QThread, which is a better option because it uses the operating system's native threads rather than going through Python in the middle.
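A minimal sketch of the QThread option, assuming the rest of MainApp from the question stays the same (Worker is a name introduced here for illustration):

from PyQt5.QtCore import QThread, pyqtSignal

class Worker(QThread):
    progress = pyqtSignal(int)

    def run(self):
        i = 0
        while not self.isInterruptionRequested():
            if i % 5 == 0:
                self.progress.emit(i)  # delivered to the GUI thread via a queued connection
            i += 1
            self.msleep(100)  # avoid busy-spinning

# in MainApp.__init__, instead of creating the Process:
#     self.worker = Worker(self)
#     self.worker.progress.connect(self.updateValue)
#     self.worker.start()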
Try adding
self.label.repaint()
immediately after updating the text, like this:
self.label.setText(str(value))
self.label.repaint()

how to use GetDlgItemText inside and outside of worker thread

I am having a problem using GetDlgItemText inside a worker thread. It works perfectly fine outside of it with
TCHAR txtbuff[50];
GetDlgItemText(IDC_SLIDER1, txtbuff, 50);
something = ::SendMessage(something, WM_SETTEXT, 0, (LPARAM)txtbuff);
but when I try to use the same code in a worker thread, I am told that it doesn't take 3 arguments, because it needs its HWND handle (from what I have gathered), which I thought was obtained using winspy++ or similar, but those handles change all the time. I thought (quite wrongly, because I am new to this) that I could simply use the same code inside my worker thread. How come the above code works fine outside of the worker thread? I have looked around everywhere; am I missing something blatantly obvious or simple?
The GetDlgItemText version that takes only 3 arguments is a CWnd member function. MFC translates it into the 4-argument API version using the m_hWnd member of the CWnd. So you cannot use the 3-argument version outside of a CWnd object.
Another problem is that both versions of GetDlgItemText are defined to work only for accessing child windows. So you cannot use them unless calling in the context of the parent window of the IDC_SLIDER1 control.
And another problem is that MFC does not support accessing a window/control from a thread that did not create the window. There are some limited cases where this works but it is poor practice. Even when it works it causes unobvious interactions between the threads, possibly leading to a deadlock.
Put all of your interactions with the GUI into the main thread. And put all your interactions with child windows (i.e. controls) in the parent window class.

IronPython: StartThread creates 'DummyThreads' that don't get cleaned up

The problem I have is that in my IronPython application threads are being created but never cleaned up, even when the method they run has exited. In my application I start threads in two ways: a) by using Python-style threads (sub-classes of threading.Thread that do something in their run() method), and b) by using the .NET 'ThreadStart' approach. The Python-style threads behave as expected, and after their 'run()' exits they get cleaned up. The .NET style threads never get cleaned up, even after they have exited. You can call del, Abort, whatever you want, and it has no effect on them.
The following IronPython script demonstrates the issue:
import System
import threading
import time
import logging

def do_beeps():
    logging.debug("starting do_beeps")
    t_start = time.clock()
    while time.clock() - t_start < 10:
        System.Console.Beep()
        System.Threading.Thread.CurrentThread.Join(1000)
    logging.debug("exiting do_beeps")

class PythonStyleThread(threading.Thread):
    def __init__(self, thread_name="PythonStyleThread"):
        super(PythonStyleThread, self).__init__(name=thread_name)

    def run(self):
        do_beeps()

class ThreadStarter():
    def start(self):
        t = System.Threading.Thread(System.Threading.ThreadStart(do_beeps))
        t.IsBackground = True
        t.Name = "ThreadStartStyleThread"
        t.Start()

if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', level=logging.DEBUG, datefmt='%H:%M:%S')

    # Start some ThreadStarter threads:
    for _ in range(5):
        ts = ThreadStarter()
        ts.start()
        System.Threading.Thread.CurrentThread.Join(200)

    # Start some Python-style threads:
    for _ in range(5):
        pt = PythonStyleThread()
        pt.start()
        System.Threading.Thread.CurrentThread.Join(200)

    # Do something on the main thread:
    for _ in range(30):
        print(".")
        System.Threading.Thread.CurrentThread.Join(1000)
When this is debugged in PyDev, what I see is that all the threads appear as expected in the 'debug' view as they are created; but whereas the Python-style ones disappear after they've finished, the .NET/ThreadStart ones stay until the main thread exits.
As can be seen in the debugger, the problematic threads appear with names 'Dummy-4', 'Dummy-5', etc., whereas the Pythonic ones appear with the name I've given them ('PythonStyleThread'). Looking in the threading.py file in my IronPython installation, I see there is a class called _DummyThread, a subclass of Thread, that sets its name as name=_newname("Dummy-%d"), so it looks like by using ThreadStart I'm ending up with _DummyThreads. The comment for the class also says:
# Dummy thread class to represent threads not started here.
# These aren't garbage collected when they die, nor can they be waited for.
which would explain why I can't get rid of them.
But I don't want 'DummyThread's. I just want normal ones, that behave nicely and get garbage-collected when they've finished doing their thing.
Now, a slightly odd thing about all of this is that unless I set up the logger I don't see the DummyThread entries in the debugger at all (although they still run). This may be a quirk of the PyDev debugger, or it may be relevant. Is there any sane reason why logging should have any bearing on this? Can I solve my problem just by not logging in my thread?
Here, it says:
"There is the possibility that "dummy thread objects" are created. These are thread objects corresponding to "alien threads", which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be joined. They are never deleted, since it is impossible to detect the termination of alien threads."
Which makes me wonder why I've had the misfortune of ending up with them?
While I have a workaround, in that I can use Python-style threading.Thread subclasses everywhere I currently use .NET ThreadStart threads, I am not keen to do this: the reason I was using .NET-style threads in certain places was that they give me an Abort method (whereas the Python ones don't). I know aborting threads is a Bad Thing, but the application is a unit-testing framework, and I a) need to run unit tests in a thread, and b) have no control over their contents (they are written by third parties), so I have no means of periodically checking for a 'please shut me down nicely' flag on these threads, and in extremis may need to kill them rudely.
So a) why am I getting DummyThreads, b) has this got anything to do with logging and c) what can I do about it?
Thanks.
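One direction worth sketching for (c), under the assumption that a Python-style thread's run() executes on an ordinary .NET thread (which IronPython's threading module wraps): capture System.Threading.Thread.CurrentThread inside run(), so the thread is tracked and cleaned up like any other Python thread, yet can still be aborted. AbortablePythonThread is a hypothetical name, not from the post above:

import System
import threading

class AbortablePythonThread(threading.Thread):
    # A Python-style thread that also exposes the .NET Abort() method.
    def __init__(self, work, thread_name="AbortablePythonThread"):
        super(AbortablePythonThread, self).__init__(name=thread_name)
        self.work = work
        self.net_thread = None

    def run(self):
        # Inside run() we are on the .NET thread that backs this Python thread.
        self.net_thread = System.Threading.Thread.CurrentThread
        self.work()

    def abort(self):
        # Rude, as noted above: raises ThreadAbortException in the worker.
        if self.net_thread is not None:
            self.net_thread.Abort()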

What multithreading package for Lua "just works" as shipped?

Coding in Lua, I have a triply nested loop that goes through 6000 iterations. All 6000 iterations are independent and can easily be parallelized. What threads package for Lua compiles out of the box and gets decent parallel speedups on four or more cores?
Here's what I know so far:
luaproc comes from the core Lua team, but the software bundle on luaforge is old, and the mailing list has reports of it segfaulting. Also, it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread.
Lua Lanes makes interesting claims but seems to be a heavyweight, complex solution. Many messages on the mailing list report trouble getting Lua Lanes to build or work for them. I myself have had trouble getting the underlying "Lua rocks" distribution mechanism to work for me.
LuaThread requires explicit locking and requires that communication between threads be mediated by global variables that are protected by locks. I could imagine worse, but I'd be happier with a higher level of abstraction.
Concurrent Lua provides an attractive message-passing model similar to Erlang, but it says that processes do not share memory. It is not clear whether spawn actually works with any Lua function or whether there are restrictions.
Russ Cox proposed an occasional threading model that works only for C threads. Not useful for me.
I will upvote all answers that report on actual experience with these or any other multithreading package, or any answer that provides new information.
For reference, here is the loop I would like to parallelize:
for tid, tests in pairs(tests) do
    local results = { }
    matrix[tid] = results
    for i, test in pairs(tests) do
        if test.valid then
            results[i] = { }
            local results = results[i]
            for sid, bin in pairs(binaries) do
                local outcome, witness = run_test(test, bin)
                results[sid] = { outcome = outcome, witness = witness }
            end
        end
    end
end
The run_test function is passed in as an argument, so a package can be useful to me only if it can run arbitrary functions in parallel. My goal is enough parallelism to get 100% CPU utilization on 6 to 8 cores.
Norman wrote concerning luaproc:
"it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread"
I had the same problem with a use case I was dealing with. I liked luaproc due to its simple and light implementation, but my use case had C code that was calling Lua, which was triggering a coroutine that needed to send/receive messages to interact with other luaproc threads.
To achieve my desired functionality I had to add features to luaproc to allow sending and receiving messages from the parent thread, or from any other thread not running under the luaproc scheduler. Additionally, my changes allow using luaproc send/receive from coroutines created in Lua states that were themselves created by luaproc.newproc().
I added an additional luaproc.addproc() function to the api which is to be called from any lua state running from a context not controlled by the luaproc scheduler in order to set itself up with luaproc for sending/receiving messages.
I am considering posting the source as a new github project or contacting the developers and seeing if they would like to pull my additions. Suggestions as to how I should make it available to others are welcome.
Check out the threads library in the Torch family. It implements a thread-pool model: a few true threads (pthreads on Linux, Windows threads on Win32) are created first. Each thread has its own lua_State object and a blocking job queue that admits jobs added from the main thread.
Lua objects are copied over from the main thread to the job thread. However, C objects such as Torch tensors or tds data structures can be passed to job threads via pointers -- this is how limited shared memory is achieved.
This is a perfect example of MapReduce.
You can use LuaRings to accomplish your parallelization needs.
Concurrent Lua might seem like the way to go, but as I note in my updates below, it doesn't run things in parallel. The approach I tried was to spawn several processes that execute pickled closures received through the message queue.
Update
Concurrent Lua seems to handle first-class functions and closures without a hitch. See the following example program.
require 'concurrent'

local NUM_WORKERS = 4      -- number of worker threads to use
local NUM_WORKITEMS = 100  -- number of work items for processing

-- calls the received function in the local thread context
function worker(pid)
    while true do
        -- request new work
        concurrent.send(pid, { pid = concurrent.self() })
        local msg = concurrent.receive()
        -- exit when instructed
        if msg.exit then return end
        -- otherwise, run the provided function
        msg.work()
    end
end

-- creates workers, produces all the work and performs shutdown
function tasker()
    local pid = concurrent.self()
    -- create the worker threads
    for i = 1, NUM_WORKERS do concurrent.spawn(worker, pid) end
    -- provide work to threads as requests are received
    for i = 1, NUM_WORKITEMS do
        local msg = concurrent.receive()
        -- send the work as a closure
        concurrent.send(msg.pid, { work = function() print(i) end, pid = pid })
    end
    -- shutdown the threads as they complete
    for i = 1, NUM_WORKERS do
        local msg = concurrent.receive()
        concurrent.send(msg.pid, { exit = true })
    end
end

-- create the task process
local pid = concurrent.spawn(tasker)

-- run the event loop until all threads terminate
concurrent.loop()
Update 2
Scratch all of that stuff above. Something didn't look right when I was testing this. It turns out that Concurrent Lua isn't concurrent at all. The "processes" are implemented with coroutines and all run cooperatively in the same thread context. That's what we get for not reading carefully!
So, at least I eliminated one of the options I guess. :(
I realize that this is not a works-out-of-the-box solution, but, maybe go old-school and play with forks? (Assuming you're on a POSIX system.)
What I would have done:
Right before your loop, put all tests in a queue that is accessible between processes (a file, a Redis LIST, or anything else you like).
Also before the loop, spawn several forks with lua-posix (as many as the number of cores, or even more, depending on the nature of the tests). In the parent, wait until all the children have quit.
In each fork, in a loop, get a test from the queue, execute it, and put the results somewhere (a file, a Redis LIST, anywhere else you like). If there are no more tests in the queue, quit.
In the parent, fetch and process all the test results as you do now.
This assumes that test parameters and results are serializable. But even if they are not, I think that it should be rather easy to cheat around that.
I've now built a parallel application using luaproc. Here are some misconceptions that kept me from adopting it sooner, and how to work around them.
Once the parallel threads are launched, as far as I can tell there is no way for them to communicate back to the parent. This property was the big block for me. Eventually I realized the way forward: when it's done forking threads, the parent stops and waits. The job that would have been done by the parent should instead be done by a child thread, which should be dedicated to that job. Not a great model, but it works.
Communication between parent and children is very limited. The parent can communicate only scalar values: strings, Booleans, and numbers. If the parent wants to communicate more complex values, like tables and functions, it must code them as strings. Such coding can take place inline in the program, or (especially) functions can be parked into the filesystem and loaded into the child using require.
The children inherit nothing of the parent's environment. In particular, they don't inherit package.path or package.cpath. I had to work around this by the way I wrote the code for the children.
The most convenient way to communicate from parent to child is to define the child as a function, and to have the child capture parental information in its free variables, known in Lua parlance as "upvalues." These free variables may not be global variables, and they must be scalars. Still, it's a decent model. Here's an example:
local function spawner(N, workers)
    return function()
        local luaproc = require 'luaproc'
        for i = 1, N do
            luaproc.send('source', i)
        end
        for i = 1, workers do
            luaproc.send('source', nil)
        end
    end
end
This code is used as, e.g.,
assert(luaproc.newproc(spawner(randoms, workers)))
This call is how values randoms and workers are communicated from parent to child.
The assertion is essential here: if you forget the rules and accidentally capture a table or a local function, luaproc.newproc will fail.
Once I understood these properties, luaproc did indeed work "out of the box", when downloaded from askyrme on github.
ETA: There is an annoying limitation: in some circumstances, calling fread() in one thread can prevent other threads from being scheduled. In particular, if I run the sequence
local file = io.popen(command, 'r')
local result = file:read '*a'
file:close()
return result
the read operation blocks all other threads. I don't know why this is---I assume it is some nonsense going on within glibc. The workaround I used was to call read(2) directly, which required a little glue code, but it works properly with io.popen and file:close().
There's one other limitation worth noting:
Unlike Tony Hoare's original conception of communicating sequential processes, and unlike most mature, serious implementations of synchronous message passing, luaproc does not allow a receiver to block on multiple channels simultaneously. This limitation is serious, and it rules out many of the design patterns that synchronous message passing is good at, but it's still fine for many simple models of parallelism, especially the "parbegin" sort that I needed to solve my original problem.
