File read access from threads - multithreading

I have a static class that contains a number of functions that read values from configuration files. The configuration files are provided with the software and the software itself NEVER writes to them.
I have a number of threads that are running in my application and I need to call a function in the static class. The function will then go to one of the configuration files, look up a value (depending on the parameter that I pass when I call the function) and then return a result.
I need the threads to be able to read the file all at the same time (or rather, without synchronising to the main thread). The threads will NEVER write to the configuration files.
My question is simply, therefore: will there be any issues in allowing multiple threads to call the same static functions to read values from the same file at the same time? I can appreciate that there would be synchronization issues if some threads were writing to the file while others were reading, but this will never happen.
Basically:
1. Are there any issues allowing multiple threads to read from the same file at the same time?
2. Are there any issues allowing multiple threads to call the same static functions (in the same static class) at the same time?

Yes, this CAN be an issue, depending on how the class actually locates and reads the files, and even more so if the class also caches the values so it does not need to read from the files every time. Concurrent reads of a file that nothing ever writes to are safe at the operating-system level; the danger is shared mutable state inside the class, such as a single shared stream object or an unsynchronized cache. Without seeing your class's actual code, there is no way to tell you whether your code is thread-safe or not.
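For illustration only, since the question does not show the class, here is a sketch in C++ of a static reader that stays safe under concurrent calls. The class name, the settings.cfg file name, and the key=value format are all assumptions. The point is that each call opens its own stream, so threads never share a file position, and the one piece of shared mutable state, the cache, is guarded by a mutex.

    #include <fstream>
    #include <mutex>
    #include <string>
    #include <unordered_map>

    // Hypothetical reader class: concurrent reads of a never-written
    // file are safe as long as each thread has its own stream; only
    // the shared cache needs the lock.
    class Config {
    public:
        static std::string lookup(const std::string &key) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                auto it = cache_.find(key);
                if (it != cache_.end())
                    return it->second;
            }
            std::ifstream in("settings.cfg");  // per-call stream, never shared
            std::string value = scan(in, key);
            std::lock_guard<std::mutex> lock(mutex_);
            cache_[key] = value;
            return value;
        }

    private:
        // Scans "key=value" lines; the file format is an assumption.
        static std::string scan(std::ifstream &in, const std::string &key) {
            std::string line;
            while (std::getline(in, line)) {
                auto eq = line.find('=');
                if (eq != std::string::npos && line.substr(0, eq) == key)
                    return line.substr(eq + 1);
            }
            return {};
        }

        static std::mutex mutex_;
        static std::unordered_map<std::string, std::string> cache_;
    };

    std::mutex Config::mutex_;
    std::unordered_map<std::string, std::string> Config::cache_;

Two threads may occasionally both miss the cache and read the file; that is harmless here, since the second one simply stores the same value again.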

Azure durable entity or static variables?

Question: Is it thread-safe to use static variables (as shared storage between orchestrations), or is it better to save/retrieve data via a durable entity?
There are a couple of Azure Functions in the same namespace: a hub trigger, a durable entity, two orchestrations (the main process and one that monitors the whole process), and an activity.
They all need some shared state. In my case I need to know the number of running main-orchestration instances (to decide whether to start a new one or hold off). This check is done in another orchestration (the monitor).
I've tried both options and ask because I see different results.
Static variables: in my case there is a generic List<SomeMyType>, where SomeMyType holds the Id of the task, its state, the number of attempts, the records it processed, and other info.
When I need to start a new orchestration I call List.Add(), and when I need to retrieve and modify a task I use a simple List.First(id_of_the_task); with First() I know for sure the needed task is there.
With static variables I sometimes see that tasks become duplicated for some reason. I retrieve the task with List.First(id_of_the_task), change something on the result variable, and that is it. It is not a lot of code.
Durable entity: the major difference is that I keep the List on a durable entity, and each time I need it I call .CallEntityAsync("getTask") and .CallEntityAsync("saveTask"), which might slow down the app.
This approach requires more code and more calls, but it looks more stable: I don't see any duplicates.
Please advise.
I can't answer why you would see duplicates with the static-variables approach without the code; it may be because List is not thread-safe and it may need a ConcurrentBag, but I'm not sure. One issue with static variables is that the function app may not be always-on, or may run on multiple instances: when the function host unloads (or crashes), the state is lost, and static variables are not shared across instances either, so during high load it won't work (if there can be many instances).
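The underlying race is not specific to .NET, so here is the same pattern sketched in C++ as a language-neutral illustration (the SomeMyType fields are guessed from the question): a shared in-memory task list is only safe if every add and lookup goes through a lock or a concurrent container. Even a correctly locked static list, though, still vanishes when the host recycles and is not shared across instances, which is exactly what a durable entity addresses.

    #include <algorithm>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <vector>

    // Hypothetical task record, mirroring the question's SomeMyType.
    struct SomeMyType {
        int id;
        std::string state;
        int attempts;
    };

    std::vector<SomeMyType> tasks; // the shared "static" list
    std::mutex tasksMutex;         // without this, adds and lookups race

    void addTask(SomeMyType t) {
        std::lock_guard<std::mutex> lock(tasksMutex);
        tasks.push_back(std::move(t));
    }

    // Returns a copy: handing out references into a vector that other
    // threads may grow would itself be a race.
    std::optional<SomeMyType> findTask(int id) {
        std::lock_guard<std::mutex> lock(tasksMutex);
        auto it = std::find_if(tasks.begin(), tasks.end(),
                               [id](const SomeMyType &t) { return t.id == id; });
        if (it == tasks.end()) return std::nullopt;
        return *it;
    }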
Durable entities seem better here. Yes, they can be shared across many concurrent function instances, and each entity executes only one operation at a time, so they are for sure a better option. The performance cost is a bit higher, but they should not be slower than orchestrators, since both perform a lot of the same bookkeeping: writing to Table Storage, checking for events, and so on.
I can't say if it's right for you, but instead of List.First(id_of_the_task) you should just be able to access the orchestration's properties through the client, which can hold custom data. Another idea, depending on the usage, is to query Table Storage directly with the CloudTable class for information about the running orchestrations.
Although not entirely related, you can look at some settings for parallelism in durable functions: Azure (Durable) Functions - Managing parallelism.
Please ask any questions if I should clarify anything or if I misunderstood your question.

Performance gain in turning local variables into object attributes

I have N threads querying a webservice and generating a file, then waiting 30 seconds, then doing it all over again.
I have another N threads opening and reading those files, inserting into a database, removing the files, waiting 100 milliseconds, then doing it all over again.
In all those objects there are a lot of methods with a lot of local variables: integers, strings, arrays, and other framework-specific objects.
Recently we have been increasing the number of threads that read those files, because the webservice is returning a lot more data.
What gains can I expect by turning all the local variables into object attributes (instance variables)?
I presume there would be far fewer instantiations, since they would happen only once, when the object itself is instantiated.
I'm using Delphi, but I believe the question applies to any programming language or framework.
I don't think there will be a noticeable performance increase if you turn the local variables into object attributes; local variables are typically stack-allocated, so creating them costs almost nothing. However, generating a file from one thread, reading it from another, and then deleting the file sounds like the real bottleneck. If there is no really good reason to use a file as temporary storage, use a single thread, instead of two, for querying the webservice and then writing the data directly to the database.

MPI I/O, matching processes to files

I have a number of files, say 100, and a number of processors, say 1000. Each proc needs to read parts of some subset of the files. For instance, proc 3 needs file04.dat, file05.dat, and file09.dat, while proc 4 needs file04.dat, file07.dat, and file08.dat, etc. Which files are needed by which procs is not known at compile time and cannot be determined from any algorithm, but is easily determined at runtime from an existing metadata file.
I am trying to determine the best way to do this, using MPI I/O. It occurs to me that I could just have all the procs cycle through the files they need, calling MPI_File_open with MPI_COMM_SELF as the communicator argument. However, I'm a beginner with MPI I/O, and I suspect this would create some problems with large numbers of procs or files. Is this the case?
I have also thought that perhaps the thing to do would be to establish a separate communicator for each file, and each processor that needs a particular file would be a member of the file's associated communicator. But here, first, would that be a good idea? And second, I'm not an expert on communicators either and I can't figure out how to set up the communicators in this manner. Any ideas?
And if anyone has a completely different idea that would work better, I would be glad to hear it.
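For what it's worth, the per-file-communicator idea can be set up with MPI_Comm_split. A sketch follows; needs_file() is a hypothetical stand-in for the runtime metadata lookup, and the file-naming scheme is assumed:

    #include <cstdio>
    #include <mpi.h>

    // Hypothetical helper: answers "does this rank need file i?"
    // from the metadata read at startup.
    bool needs_file(int i);

    void read_needed_files(int nfiles)
    {
        for (int i = 0; i < nfiles; ++i) {
            // MPI_Comm_split is collective over MPI_COMM_WORLD, so every
            // rank calls it for every file; ranks that do not need the
            // file pass MPI_UNDEFINED and get MPI_COMM_NULL back.
            int color = needs_file(i) ? 0 : MPI_UNDEFINED;
            MPI_Comm file_comm;
            MPI_Comm_split(MPI_COMM_WORLD, color, /*key=*/0, &file_comm);

            if (file_comm != MPI_COMM_NULL) {
                char name[32];
                std::snprintf(name, sizeof name, "file%02d.dat", i);
                MPI_File fh;
                MPI_File_open(file_comm, name, MPI_MODE_RDONLY,
                              MPI_INFO_NULL, &fh);
                // ... collective reads (e.g. MPI_File_read_all) on file_comm ...
                MPI_File_close(&fh);
                MPI_Comm_free(&file_comm);
            }
        }
    }

The MPI_COMM_SELF approach from the question also works, but since each rank then opens files on its own, it gives up cross-rank collective I/O optimizations such as two-phase buffering.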

Should I worry about File handles in logging from async methods?

I'm using the each() method of the async lib and experiencing some very odd (and inconsistent) errors that appear to be file handle errors when I attempt to log to file from within the parallel tasks.
The array that I'm handing to this method frequently has hundreds of items, and I'm curious whether Node is running out of available file handles as it tries to log to file from within all these simultaneous tasks. The problem goes away when I comment out my log calls, so it's definitely related to this somehow, but I'm having a tough time tracking down why.
All the logging is trying to go into a single file... I'm entirely unclear on how that works, given that each write (presumably) blocks, which makes me wonder how all these simultaneous tasks can run independently if they're all sitting around waiting for the file to become available to write to.
Assuming that this IS the source of my troubles, what's the right way to log from something like async.each(), which runs N operations at once?
I think you should put an adjustable limit on how many concurrent/outstanding write calls you make. No, none of them will block, but each outstanding write can hold a file descriptor open, and hundreds of them in flight can exhaust what the process is allowed. async.eachLimit or async.queue will give you the flexibility to set the limit low, be sure things behave, and then gradually increase it to find out what resource constraint you eventually bump up against.
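The limiting pattern itself is not tied to JavaScript. As a rough C++ analogy of what async.eachLimit does (the file names and the limit of 10 are arbitrary), a counting semaphore caps how many work items, and therefore how many open descriptors, are in flight at once:

    #include <fstream>
    #include <semaphore>   // C++20
    #include <string>
    #include <thread>
    #include <vector>

    // Bound on in-flight items; start low, then raise it to find which
    // resource (descriptors, memory, disk) you eventually bump against.
    std::counting_semaphore<256> inFlight(10);

    void processItem(int id) {
        inFlight.acquire();                    // the "limit" in eachLimit
        {
            std::ofstream out("item" + std::to_string(id) + ".log");
            out << "processed item " << id << '\n';
        }                                      // descriptor closed here
        inFlight.release();
    }

    int main() {
        // A real program would use a thread pool rather than one thread
        // per item; the semaphore is the point of this sketch.
        std::vector<std::thread> workers;
        for (int i = 0; i < 500; ++i)          // "hundreds of items"
            workers.emplace_back(processItem, i);
        for (auto &w : workers) w.join();
    }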

A non-reentrant function in an API being used in a multi-threaded program

I'm using the Qt API in C++, but I imagine answers can come effectively from people without any prior experience with Qt.
Qt has a function in its XML-handling class, called setContent(), which is specified as non-reentrant. When called, setContent() reads an XML file into memory and returns it as a data structure.
As I understand it, a non-reentrant function is one that is not safe to call simultaneously from multiple threads, even when the calls operate on different files/objects.
So based on this, my understanding is that I would not be able to have more than one thread that opens XML files using this function unless somehow both of these threads are protected against accessing the setContent() function at the same time.
Is this correct? If so, it seems like a really poor way to write an API, as this doesn't seem at all like a function that would intuitively raise multi-threading problems. In addition, no mutex is provided by the API.
So in order to use this function in my multi-threaded program, where more than one thread will be opening different XML files, what's the best way to handle access to the setContent() function? Should I create an extern mutex in a header file on its own that is included by every file that will access XML?
Looks like it's all about static QDomImplementation::InvalidDataPolicy invalidDataPolicy. It's the only static data that QDom*** classes use.
setContent and a bunch of global functions use its value when parsing, and if another thread changes it in the middle, obviously something may happen.
I suppose if your program never calls setInvalidDataPolicy(), you're safe to parse XML from different threads.
"So based on this, my understanding is that I would not be able to have more than one thread that opens XML files using this function unless somehow both of these threads are protected against accessing the setContent() function at the same time."
I think you're correct.
"So in order to use this function in my multi-threaded program, where more than one thread will be opening different XML files, what's the best way to handle access to the setContent() function? Should I create an extern mutex in a header file on its own that is included by every file that will access XML?"
Again, I tend to agree with you regarding the mutex. (By the way, Qt provides QMutex.) But I'm not sure what you mean by an extern mutex in a header file, so I would just make sure to instantiate exactly one mutex and hand a pointer to it to every thread that requires it.
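For what it's worth, a minimal sketch of the single-shared-mutex idea using Qt's own QMutex (parseXmlFile() is a hypothetical helper; QDomDocument::setContent() and QMutexLocker are standard Qt):

    #include <QDomDocument>
    #include <QFile>
    #include <QMutex>
    #include <QMutexLocker>

    // One mutex for the whole program; every thread that parses XML
    // goes through it.
    static QMutex xmlMutex;

    bool parseXmlFile(const QString &path, QDomDocument &doc)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly))
            return false;

        // Serialize all calls to the non-reentrant setContent().
        QMutexLocker locker(&xmlMutex);
        return doc.setContent(&file);
    }

Keeping the mutex and the helper together in one .cpp file and exposing only parseXmlFile() sidesteps the extern-in-a-header question entirely: there is exactly one mutex by construction.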
