Entity Framework + Multiple Threads + Lazy Load

Entity Framework + Multiple Threads + Lazy Load - multithreading

I'm having issues with Entity Framework and multiple threads and I am wondering if there is a solution that keeps the ability to lazy load. From my understanding the data context is not thread safe which is why when I have multiple threads using the same data context I get various data readers error. The solution to this problem is to use a separate data context for each connection to the database and then destroy the data context. Unfortunately destroying my data context then prevents me from doing lazy loading.
Is there a pattern to allow me to have a shared context across my application, but still properly handle multiple threads?

No, there is no such solution. Your choices in multithreaded application are:
Context per thread
Single context producing unproxied detached entities (no lazy loading, no change tracking) with synchronization for each access to that context.
Doing the second approach with proxied attached entities is way to disaster. It would require to detect all hidden interactions with the context and make related code also synchronized. You will probably end with single threaded process running in multiple switching threads.

Related

Why is it wrong to access GUI elements from another thread? [duplicate]

This question already has answers here:
Why are most UI frameworks single threaded?
(6 answers)
Closed 7 years ago.
In every GUI library I've used (Swing, Android, Windows Forms, WPF) there's this golden rule saying that one cannot access or modify GUI elements from another thread (other than the GUI thread). I suppose this rule applies to any GUI library. Breaking this rule will most likely cause application to crash. However, I've been wondering recently, why is it so? I couldn't find any profound explanation. So what is the low-level explanation of this rule?

No piece of software is thread-safe unless it is explicitly designed and build to be so.
A GUI is a complex and stateful beast, making it thread-safe would be 'prohibitively expensive'.

There is a very simple reason for this. Usually UI functions are not thread-safe (as making them thread-safe would pessimize performance).

Of those you listed, some may be wrappers around existing mechanisms, so you have to answer the question indirectly via the underlying GUI framework. In case of multi-platform GUI frameworks like e.g. Qt, you will also have the lowest-common denominator that determines what is possible and what isn't.
Now, why is access to the GUI not thread-safe? In the cases where I'm most familiar with (win32 and X11), accesses are often performed indirectly by sending requests and sometimes waiting for the according answer. This usually works in an atomic way, even across process boundaries, so that is not directly cause of the problem. However, if you do so from multiple threads, the worst that can happen is that data is modified in an uncoordinated way. For example, if you read, modify and write the same widget from two threads, these operations might be interleaved, so that only one thread's modifications will actually be applied.
There are other reasons for not supporting cross-thread access:
In win32, the queue with the messages is thread-local, which means that only the thread that created a window will actually find and be able to handle messages for that window. I guess this a legacy from times where processes were single-threaded and the message queue was simply a global. Making it thread-local is the same approach as the one used for making errno thread-safe.
Another reason is that support objects are created inside a process that represent some GUI element. For example, the MFC (on top of win32) use a map from the OS' widget handle to a C++ object representing that object. That map is stored in thread-local storage (which follows the thread-local message queue) and the access to the C++ objects is not guarded by a mutex. Accessing these objects from different threads is bad, not because they represent GUI objects but because they are not synchronized, simple as that.
If you think about modifying the structure of a widget tree (like e.g. the DOM tree in a browser), you either have very detailed knowledge of what other parts of the application are doing or you need to lock access to the whole tree before every operation just to be safe. Needless to say, this effectively prevents any parallel operations, so you can also take the next step and require all operations to come from one thread and thus save the whole multithreading overhead.
That said, I believe that Qt and C# (and probably others) actually do support some cross-thread operations. They will work some (more or less obscure) magic that forwards the calls to the GUI thread and forwards the results back to the calling thread again. In other words, they try to make the necessary inter-thread communication more convenient for the programmer, while retaining the efficiency and simplicity of the single-threaded GUI. This is not restricted to GUI handling though but rather a general approach, only that it is especially important for the GUI.

As far as I know, that is simply not true: Every object in Java might be accesed concurrently, as far as thread-safe techniques are correctly applied. The fact is that Java Swing objects are mostly not prepared for multithreading, so you'll have to perform external synchronization.
There are several instances in which you need several threads to interoperate in a GUI: Games, visual effects, user events...
More information about the GUI and multithreading:
https://docs.oracle.com/javase/tutorial/uiswing/concurrency/dispatch.html

Thread Locking in Large Parralel Applications

I have a slightly more general question about parallelisation and threadlocking synchronization in large applications. I am working on an application with a large number of object types with a deep architecture that also utilises parallelisation of most key tasks. At present synchronisation is done with thread locking management inside each object of the system. The problem is that the locking scope is only as large as each object, whereas the object attibutes are being passed through many different objects where the attributes are losing synchronisation protection.
What is best-practice on thread management, 'synchronization contexts' &c. in large applications? It seems the only foolproof solution is to make data synchronization application wide such that data can be consumed safely by any object at any time, but this seems to violate object oriented coding concepts.
How is this problem best managed?

One approach is to make your objects read-only; a read-only object doesn't need any synchronization because there is no chance of any thread reading it while another thread writes to it (because no thread ever writes to it). Object lifetime issues can be handled using lock-free reference counting (using atomic-counters for thread safety).
Of course the down side is that if you actually want to change an object's state you can't; you have to create a new object that is a copy of the old object except for the changed part. Depending on what your application does, that overhead may or may not be acceptable.

Which implications does multithreading have on the architecture of a desktop application?

I am writing a multithreaded desktop application.
Generally
I am unsure about the implications that multitreading has on the architecture. There is a lot of literature on architecture, but I know none that takes multithreading into account. There is a lot of literature on the low level stuff of multithreading (mutexes, semaphores, etc.) but I know none that describes how these concepts are embedded into an architecture.
Which literature do you recommend to fill this gap?
Particularly
My application consist of
Presentation that creates and manages dialogs using a GUI toolkit,
Kernel that knows all about the domain of the application,
Controller that knows the Kernel and the Presentation and moderates between these two.
To be more precise, here is how files are opened:
The Presentation signals a FileOpenCommand.
The ApplicationController recieves this signal and
uses the ApplicationKernel to create a File object,
uses the ApplicationPresentation to create a FilePresentation object,
creates a FileController object, passing the File and the FilePresentation to the constructor.
The FileController registers itself as an observer on its File and FilePresentation.
Let's say File provides a long running operation Init() that should not block the user interface. There are two approaches that came to my mind:
File::Init() returns an object that encapsulates a thread and can be used to register an observer that is notified about progress, errors, completion etc. This puts a lot of responsibility into the FileController (who would be the observer), because it is now accessed from both the main thread as well as from the working thread.
Hide the working thread completely from the Controller. File::Init() would return nothing, but the ApplicationKernel would signal creation, progress and errors of long running operations in the main thread. This would drag a lot of communication though the ApplicationKernel, turning it into something like a god object.
Which of these two is the common approach to multithreading in a desktop application (if any)? Which alternative approaches do you recommend?

I suggest that you consider using the actor model. It is a concurrency abstraction that hides a lot of the details associated with threads, locks, etc.
Edit
Some additional comments spurred by #CMR's comment...
Under an actor model, I imagine that the application would still be structured using the same components as suggested in the question: Presentation, ApplicationController, etc. The difference with the actor model is that the components (now actors) would not hold direct references to each other. Rather, they would hold channels to which they could post asynchronous, immutable, messages to one another.
The sequence of events in the "open a file" case would be essentially the same in the actor model, except that channels would be passed to the FileController in step 2.3 instead of direct object references. Similarly, observer registration in occurs through channels.
So what's the difference? The main difference is that none of the code needs to be thread-aware. Threads are invisible to the application logic, being the concern of the actor framework. If one can follow the discipline of only passing immutable objects through the channels (which some actor frameworks enforce), then virtually all the difficult logic associated with thread synchronization disappears. Of course, one has to switch mindsets from a synchronous programming model to an asynchronous one -- not necessarily a trivial task. However, it is my opinion that the cost of that switch is outweighed by the benefit of not having to think about thread-safety (at least in systems of some complexity).
In UI programming in particular, asynchronous models make it much easier to give nice user feedback. For example, a UI element may kick off a long-running task, display a "working..." message, and then go to sleep. Some time later, a message arrives delivering the results of the long running task which the UI element then displays in place of the "working..." message. In similar fashion, tree views can be built incrementally as each tree node's data arrives in an incoming message.
You can think of an actor model as a generalization of the classic UI "event pump" approach -- except that every component (actor) is running its own event pump simultaneously instead of one pump dispatching to a bunch of components. Actor frameworks provide a way to run large or even huge numbers of such "simultaneous pumps" with very low overhead. In particular, a small number of threads (say, one per cpu) service a large number of actors.

DB-connection in separate thread - what's the best way?

I am creating an app that accesses a database. On every database access, the app waits for the job to be finished.
To keep the UI responsive, I want to put all the database stuff in a separate thread.
Here is my idea:
The db-thread creates all database components it needs when it is created
Now the thread just sits there and waits for a command
If it receives a command, it performs the action and goes back to idle. During that time the main thread waits.
the db-thread lives as long as the app is running
Does this sound ok?
What's the best way to get the database results from the db-thread into the main thread?
I haven't done much with threads so far, therefore I'm wondering if the db-thread can create a query component out of which the main thread reads the results. Main thread and db thread will never access the query at the same time. Will this still cause problems?

What you are looking for is the standard data access technique, called asynchronous query execution. Some data access components implement this feature in an easy-to-use manner. At least dbGo (ADO) and AnyDAC implement that. Lets consider the dbGo.
The idea is simple - you call the convenient dataset methods, like a Open. The method launches required task in a background thread and immediately returns. When the task is completed, an appropriate event will be fired, notifying the application, that the task is finished.
The standard approach with the DB GUI applications and the Open method is the following (draft):
include eoAsyncExecute, eoAsyncFetch, eoAsyncFetchNonBlock into dataset ExecuteOptions;
disconnect TDataSource.DataSet from dataset;
set dataset OnFetchComplete to a proc P;
show "Hello ! We do the hard work to process your requests. Please wait ..." dialog;
call the dataset Open method;
when the query execution will be finished, the OnFetchComplete will be called, so the P. And the P hides the "Wait" dialog and connects TDataSource.DataSet back to the dataset.
Also your "Wait" dialog may have a Cancel button, which an user may use to cancel a too long running query.

First of all - if you haven't much experience with multi-threading, don't start with the VCL classes. Use the OmniThreadLibrary, for (among others) those reasons:
Your level of abstraction is the task, not the thread, a much better way of dealing with concurrency.
You can easily switch between executing tasks in their own thread and scheduling them with a thread pool.
All the low-level details like thread shutdown, bidirectional communication and much more are taken care of for you. You can concentrate on the database stuff.
The db-thread creates all database components it needs when it is created
This may not be the best way. I have generally created components only when needed, but not destroyed immediately. You should definitely keep the connection open in a thread pool thread, and close it only once the thread has been inactive for some time and the pool disposes of it. But it is also often a good idea to keep a cache of transaction and statement objects.
If it receives a command, it performs the action and goes back to idle. During that time the main thread waits.
The first part is being handled fine when OTL is used. However - don't have the main thread wait, this will bring little advantage over performing the database access directly in the VCL thread in the first place. You need an asynchronous design to make best use of multiple threads. Consider a standard database browser form that has controls for filtering records. I handle this by (re-)starting a timer every time one of the controls changes. Once the user finishes editing the timer event fires (say after 500 ms), and a task is started that executes the statement that fetches data according to the filter criteria. The grid contents are cleared, and it is repopulated only when the task has finished. This may take some time though, so the VCL thread doesn't wait for the task to complete. Instead the user could even change the filter criteria again, in which case the current task is cancelled and a new one started. OTL gives you an event for task completion, so the asynchronous design is easy to achieve.
What's the best way to get the database results from the db-thread into the main thread?
I generally don't use data aware components for multi-threaded db apps, but use standard controls that are views for business objects. In the database tasks I create these objects, put them in lists, and the task completion event transfers the list to the VCL thread.
Main thread and db thread will never access the query at the same time.
With all components that load data on-demand you can't be sure of that. Often only the first records are fetched from the db, and fetching continues after they have been consumed. Such components obviously must not be shared by threads.

I have implemented both strategies: Thread pool and adhoc thread creation.
I suggest to begin with the adhoc thread creation, it is simpler to implement and simpler to scale.
Only move to a thread pool if (with careful evaluation) (1) there is a lot of resources (and time) invested in the creation of the thread and (2) you have a lot of creation requests.
In both cases you must deal with passing parameters and collect results. I suggest to extend the thread class with properties that allow this data passing.
Refer to the documentation of the classes, components and functions that the thread use to make sure they are thread safe, that is, they can be use simultaneously from different threads. If not, you will need to synchronize the access. In some cases you may find slight differences regarding thread safety. As an example, see DateTimeToStr.

If you create your thread at start and reuse it later whenever you need it, you have to make sure that you disconnect the db components (grid..) from the underlying datasource (disableControls) each time you're "processing" data.
For the sake of simplicity, I would inherit TThread and implement all the business logic in my own class. The result dataset would be a member of this class and I would connect it the db aware compos in with synchronize.
Anyway, it is also very important to delegate as much work as possible to the db server and keep the UI as lightweight as possible. Firebird is my favourite db server: triggers, for select, custom UDF dlls developed in Delphi, many thread safe db components with lots of examples and good support (forum) : jvUIB...
Good Luck

What is the advantage of Executors over Threads in multithreaded application

I have seen several comments to the effect that Executors are better than Threads, but if you have a number of Threads communicating via bounded buffers (as in Flow-Based Programming) why would you use Executors when you have to use Threads anyway (with newCachedThreadPool (?)). Also, I use methods like isAlive(), interrupt() - how do I get hold of the Thread handle?
Does anyone have sample code that I can plagiarize? ;-)

Executors are basically an abstraction over Threads. They make you isolate your potentially parallel logic in Runnable/Callable instances while liberating you from the duties of manually creating and starting a thread or managing a pool. You still need to handle dependencies as part of your application logic.
If you want to interact / are comfortable with Threads for your application logic, you may skip using Executors. Regarding getting hold of the thread, you can always execute Thread.currentThread() to get hold of the current thread from any executing context.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string