Locking mechanism for synchronization between multiple threads

Locking mechanism for synchronization between multiple threads - multithreading

I have a scenario where i am spawning multiple threads (each of them working on a browser instance). But all threads share 1 common operation which i need to synchronize. I was thinking of creating locks using file created on disk, but am afraid will that really work ?
Basically i need some locking mechanism on windows platform. I am spawning multiple threads, each thread is an automated script (javascript/batch) working over browser, but i need to synchronise any operation which triggers opening of save as/upload dialog in browser.
Regards

Depends strongly on the language, platform, OS, etc you want to develop on.
For POSIX you could achieve that e.g. with mutexes:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html
If you develop with C++17:
http://en.cppreference.com/w/cpp/thread/mutex
Regarding your question: you could also do that with file locks. In that case you would have to wait for the file to be locked again, which can be achieved with select()

Related

How worker threads works in Nodejs?

Nodejs can not have a built-in thread API like java and .net
do. If threads are added, the nature of the language itself will
change. It’s not possible to add threads as a new set of available
classes or functions.
Nodejs 10.x added worker threads as an experiment and now stable since 12.x. I have gone through the few blogs but did not understand much maybe due to lack of knowledge. How are they different than the threads.

Worker threads in Javascript are somewhat analogous to WebWorkers in the browser. They do not share direct access to any variables with the main thread or with each other and the only way they communicate with the main thread is via messaging. This messaging is synchronized through the event loop. This avoids all the classic race conditions that multiple threads have trying to access the same variables because two separate threads can't access the same variables in node.js. Each thread has its own set of variables and the only way to influence another thread's variables is to send it a message and ask it to modify its own variables. Since that message is synchronized through that thread's event queue, there's no risk of classic race conditions in accessing variables.
Java threads, on the other hand, are similar to C++ or native threads in that they share access to the same variables and the threads are freely timesliced so right in the middle of functionA running in threadA, execution could be interrupted and functionB running in threadB could run. Since both can freely access the same variables, there are all sorts of race conditions possible unless one manually uses thread synchronization tools (such as mutexes) to coordinate and protect all access to shared variables. This type of programming is often the source of very hard to find and next-to-impossible to reliably reproduce concurrency bugs. While powerful and useful for some system-level things or more real-time-ish code, it's very easy for anyone but a very senior and experienced developer to make costly concurrency mistakes. And, it's very hard to devise a test that will tell you if it's really stable under all types of load or not.
node.js attempts to avoid the classic concurrency bugs by separating the threads into their own variable space and forcing all communication between them to be synchronized via the event queue. This means that threadA/functionA is never arbitrarily interrupted and some other code in your process changes some shared variables it was accessing while it wasn't looking.
node.js also has a backstop that it can run a child_process that can be written in any language and can use native threads if needed or one can actually hook native code and real system level threads right into node.js using the add-on SDK (and it communicates with node.js Javascript through the SDK interface). And, in fact, a number of node.js built-in libraries do exactly this to surface functionality that requires that level of access to the nodejs environment. For example, the implementation of file access uses a pool of native threads to carry out file operations.
So, with all that said, there are still some types of race conditions that can occur and this has to do with access to outside resources. For example if two threads or processes are both trying to do their own thing and write to the same file, they can clearly conflict with each other and create problems.
So, using Workers in node.js still has to be aware of concurrency issues when accessing outside resources. node.js protects the local variable environment for each Worker, but can't do anything about contention among outside resources. In that regard, node.js Workers have the same issues as Java threads and the programmer has to code for that (exclusive file access, file locks, separate files for each Worker, using a database to manage the concurrency for storage, etc...).

It comes under the node js architecture. whenever a req reaches the node it is passed on to "EVENT QUE" then to "Event Loop" . Here the event-loop checks whether the request is 'blocking io or non-blocking io'. (blocking io - the operations which takes time to complete eg:fetching a data from someother place ) . Then Event-loop passes the blocking io to THREAD POOL. Thread pool is a collection of WORKER THREADS. This blocking io gets attached to one of the worker-threads and it begins to perform its operation(eg: fetching data from database) after the completion it is send back to event loop and later to Execution.

How to control a process from a different process running in different core

I have two processes running in different cores. I want to know what is the fastest way to pause and continue one process from another process.

Well it really depends on the application and the type of communication you are looking for, so I will asume that what you want is indeed a blocking operation.
For that, I will go with Domain Sockets, they have blocking and non blocking operations and are very ease to use, with lot of examples. You can start here here in the oficial documentation for Linux.
The concept is the same for any other operating system, only the implementations could differ.

What are user threads?

What are user threads? Below explanation says they are managed by userspace... Please explain how?
Threads are sometimes implemented in userspace libraries, thus called user threads. The kernel is not aware of them, so they are managed and scheduled in userspace.

Every modern server or desktop OS, and all major mobile OSs, have a native thread library these days, so this question is not very relevant anymore. But basically, before this was the case, there were libraries -- most famously, the "Green threads library" -- which implemented cooperatively-multitasking threads as a user library. That "cooperatively multitasking" part is the important part: in general, such a library switches from one thread to another only when the thread calls some method that allows a switch to happen ("sleep", "yield", etc.) A user library generally can't do preemptive time-slicing; that's something that has to be done at the OS level.

Symbian OS has an Active Object framework that allows async event handling in a single thread
http://en.wikipedia.org/wiki/Active_object_%28Symbian_OS%29
Windows also has Fibres:
http://msdn.microsoft.com/en-us/library/ms682661%28v=vs.85%29.aspx

Kernel threads (also called lightweight process) are handeled by the system. They offer several interesting benefits, the main one being that two threads can be scheduled on two different processors in the hope that this will reduce the execution time of your process.
However threads are often used as a programming model. A typical example is a multi-client webserver that waits for incoming connexion and simultaneously exchange data with its connected clients. In this case the programmer may want to create a lot of threads and switch between them very quickly. System threads are not very adapted to this. The number of kernel threads is limited (to few undreads) and any basic operation (creation destruction switching locking) is costly since it must be executed in kernel space.
The user threads on the other hand, can be implemented using set_jmp() and long_jmp() inside a user library. Since they don't involve the kernel an application can create/destroy and switch between user threads very efficiently.
As Ernest said, user threads are not very common any more, however there exists a hybrid solution that can take advantages of the two worlds.
http://en.wikipedia.org/wiki/Thread_(computer_science)#N:M_.28Hybrid_threading.29

Delphi - Creating a control that runs in its own process

HI
I have a control that accesses a database using proprietary datasets. The database is an old ISAM bases database.
The control uses a background thread to query the database using the proprietary datasets.
A form will have several of these controls on it, each using their own thread to access the data as they all need to load simultaneously.
The proprietary datasets handle concurrency by displaying a VCL TForm notifying the user that the table being opened is locked by another user and that the dataset is waiting for the lock to be released.
The form has a cancel button on it which lets the user cancel the lock wait.
The problem:
When using the proprietary datasets from within a thread, the application will crash, hang or give some error if the lock wait form it displayed. I suspect this is to do with the VCL not being thread safe.
I have solved the issue by synchronizing Dataset.Open however this holds up the main thread until the dataset.open returns, which can take a considerable amount of time depending on the complexity of the query.
I have displayed a modal progress bar which lets to user know that something it happening but I don't like this idea as the user will be sitting waiting for the progress bar to complete.
The proprietary dataset code is compiled into the main application, i.e. its not stored in a separate DLL. We are not allowed to change how the locking works or whether a form is displayed or not at this stage of the development process as we are too close to release.
Ideally I would like to have Dataset.open run in the controls thread as well instead of having the use the main thread, however this doesn't seem likely to work.
Can anyone else suggest a work around? please.

Fibers won't help you one bit, because they are in the Windows API solely to help ease porting old code that was written with cooperative multitasking in mind. Fibers are basically a form of co-routines, they all execute in the same process, have their own stack space, and the switching between them is controlled by the user code, not by the OS. That means that the switching between them can be made to occur only at times that are safe, so no synchronization issues. OTOH that means that only one fiber can be running within one thread at the same time, so using fibers with blocking code has the same characteristics as calling blocking code from within one thread - the application becomes unresponsive.
You could use fibers together with multiple threads, but that can be dangerous and doesn't bring any benefit over using threads alone.
I have used fibers successfully within VCL applications, but only for specific purposes. Forget about them if you want to deal with potentially blocking code.
As for your problem - you should make a control that is used for display purposes only, and which uses the standard inter-process communication mechanisms to exchange data with another process that accesses your database.

COM objects can run in out-of-process mode. May be in delphi it will be a bit easier to use them, then another IPC mechanisms.

What Use are Threads Outside of Parallel Problems on MultiCore Systems?

Threads make the design, implementation and debugging of a program significantly more difficult.
Yet many people seem to think that every task in a program that can be threaded should be threaded, even on a single core system.
I can understand threading something like an MPEG2 decoder that's going to run on a multicore cpu ( which I've done ), but what can justify the significant development costs threading entails when you're talking about a single core system or even a multicore system if your task doesn't gain significant performance from a parallel implementation?
Or more succinctly, what kinds of non-performance related problems justify threading?
Edit
Well I just ran across one instance that's not CPU limited but threads make a big difference:
TCP, HTTP and the Multi-Threading Sweet Spot
Multiple threads are pretty useful when trying to max out your bandwidth to another peer over a high latency network connection. Non-blocking I/O would use significantly less local CPU resources, but would be much more difficult to design and implement.

Performing a CPU intensive task without blocking the user interface, for example.

Any application in which you may be waiting around for a resource (for example, blocking I/O from network sockets or disk devices) can benefit from threading.
In that case the thread blocking on the slow operation can be put to sleep while other threads continue to run (including, under some operating systems, the GUI thread which, if the OS cannot contact it for a while, will offer the use the chance to destroy it, thinking it's deadlocked somehow).
So it's not just for multi-core machines at all.

An interesting example is a webserver - you need to be able to handle multiple incoming connections that have nothing to do with each other.

what kinds of non-performance related
problems justify threading?
Web applications are the classic example. Each user request is conceptually a new thread. Nothing to do with performance, it's just a natural fit for the design.

Blocking code is usually much simpler to write and easier to read (and therefore maintain) than non-blocking code. Yet, using blocking code limits you to a single execution path and also locks out things like user interface (mentioned) and other IO ports. Threading is an elegant solution in these cases.
Another case when multithreading is to be considered is when you have several near-synchronous IO channels that should be managed: using multiple threads (and usually a local message queue) allows for much clearer code.

Here are a couple of specific and simple scenarios where I have launched threads...
A long running report request by the user. When the report is submitted, it is placed in a queue to be processed by a separate thread. The user can then go on within the application and check back later to see the status of their report, they aren't left with a "Processing..." page or icon.
A thread that iterates cache storage, removing data that has expired or no longer needed. The thread's job within the application is independent of any processing for a specific user, but part of the overall application run-time maintenance.
although, not specifically a threading scenario, logging within our web site is handed off to a parallel process, so the throughput of the web site isn't hindered by the time it takes to record log data.
I agree that threading just for threadings sake isn't a good idea and it can introduce problems within your application if isn't done properly, but it is an extremely useful tool for solving some problems.

Whenever you need to call some external component (be it a database query, a 3. party library, an operating system primitive etc.) that only provides a synchronous/blocking interface or using the asynchronous interface not worth the extra trouble and pain - and you also need some form of concurrency - e.g. serving multiple clients in a server or keep the GUI still responsive.

Well, how do you know if you're app is going to run on a multi-core system or not?
Beyond that, there are a lot of processes that take up time, but don't require the CPU. Such as writing to a disk or networking. Who wants to push a button in a GUI and then have to sit there and wait for a network connection. Even on a single core machine, having a separate IO thread greatly improves user experience. You always at least want a separate thread for the UI.

Yet many people seem to think that
every task in a program that can be
threaded should be threaded, even on a
single core system.
"Many people"... Who?
Also from my experience many many programs that should be multithreaded aren't (especially games.. I have an i7 and yet most games still use only 1 of my cores), so I'm not sure what you're talking about. Definitely programs like calc.exe are not multithread (or, if they are, 1 thread does 99% of the work).
Performing a CPU intensive task
without blocking the user interface,
for example.
Yes, this is true but this is fairly easy to implement and it's not what the OP is referring to (since, in this case, 1 thread does almost all the work and you only need very few mutexes)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string