I have an Node.js AWS Lambda function which processes an array. The number of entries in the array can vary from just a few to as many as several hundred million. Processing each iteration takes milliseconds and makes use of shared data (i.e., each iteration uses the exact same input data except for a key). The shared data is non-trivial in size. The results of some iterations are collected into an array of results. The results of any one iteration do not have an impact on any other iteration (i.e., each is atomic).
To improve performance, I want to make this process multi-threaded. Since there will be very many jobs and each will do the exact same task, a Workerpool seemed right. I created one with n-1 (n = # processes) workers. It works, but the processing takes longer in my pool version than in my non-pool version. Research leads me to suspect the issue is the size of the shared data passed to during each invocation.
Ideally, I'd like to initialize each worker with the same shared input and just pass the relevant key with each invocation. However, I don't see an option to pass shared data when initializing the pool (https://github.com/josdejong/workerpool).
I found this 2014 SO question (Share objects in nodejs between different instances) that has a comment indicating that it isn't possible in Node.js ("Thanks for the great stuff but i was looking for simple solution like the threads in .net and java which support sharing objects between different threads").
Does Node.js support initializing workers in a workerpool with shared data? I'm new to multi-threading and would appreciate any guidance/links.
Related
I would like to cache a large amount of data in a Flask application. Currently it runs on K8S pods with the following unicorn.ini
bind = "0.0.0.0:5000"
workers = 10
timeout = 900
preload_app = True
To avoid caching the same data in those 10 workers I would like to know if Python supports a way to multi-thread instead of multi-process. This would be very easy in Java but I am not sure if it is possible in Python. I know that you can share cache between Python instances using the file system or other methods. However it would be a lot simpler if it is all share in the same process space.
Edited:
There are couple post that suggested threads are supported in Python. This comment by Filipe Correia, or this answer in the same question.
Based on the above comment the Unicorn design document talks about workers and threads:
Since Gunicorn 19, a threads option can be used to process requests in multiple threads. Using threads assumes use of the gthread worker.
Based on how Java works, to shared some data among threads, I would need one worker and multiple threads. Based on this other link
I know it is possible. So I assume I can change my gunicorn configuration as follows:
bind = "0.0.0.0:5000"
workers = 1
threads = 10
timeout = 900
preload_app = True
This should give me 1 worker and 10 threads which should be able to process the same number of request as current configuration. However the question is: Would the cache still be instantiated once and shared among all the threads? How or where should I instantiate the cache to make sure is shared among all the threads.
would like to ... multi-thread instead of multi-process.
I'm not sure you really want that. Python is rather different from Java.
workers = 10
One way to read that is "ten cores", sure.
But another way is "wow, we get ten GILs!"
The global interpreter lock must be held
before the interpreter interprets a new bytecode instruction.
Ten interpreters offers significant parallelism,
executing ten instructions simultaneously.
Now, there are workloads dominated by async I/O, or where
the interpreter calls into a C extension to do the bulk of the work.
If a C thread can keep running, doing useful work
in the background, and the interpreter gathers the result later,
terrific. But that's not most workloads.
tl;dr: You probably want ten GILs, rather than just one.
To avoid caching the same data in those 10 workers
Right! That makes perfect sense.
Consider pushing the cache into a storage layer, or a daemon like Redis.
Or access memory-resident cache, in the context of your own process,
via mmap or shmat.
When running Flask under Gunicorn, you are certainly free
to set threads greater than 1,
though it's likely not what you want.
YMMV. Measure and see.
I have N threads querying a webservice and generating a file, then waiting 30 seconds, then doing it all over again.
I have another N threads opening and reading those files, inserting into a database, removing the files, waiting 100 milliseconds, then doing it all over again.
In all those objects there are a lot of methods with a lot of local variables: integers, strings, arrays, and other framework-specific objects.
Recently we are increasing the number of threads to read those files, because the webservice is returning a lot more data.
What gains can I expect by turning all the local variables into object attributes (instance variables)?
I presume it's not going to be so many instantiations, since that will be done once when the object itself is instantiated.
I'm using Delphi, but I believe it can be answered to any programming language or framework.
I don't think that there will be a remarkable performance increase if you turn the local variables into object attributes. However, generating a file from one thread, reading it from another one, and then deleting the file, sounds like the real bottleneck. If there is no really good reason to use a file as temporary storage, use a single thread instead of two for querying the webservice and then writing the data to the database.
I have a multi step Spring Batch job and in one of steps I create Lucene indices for the data read in reader so subsequent steps can search in that Lucene index.
Based on read data in ItemReader, I spread indices to few separate directories.
If I specify, Step Task Executor to be a SimpleAsyncTaskExecutor , I don't get any issue as long as indices are always written to different directories but sometimes I get a locking exception. I guess, two threads tried to write to same Index.
If I remove SimpleAsyncTaskExecutor, I don't get any issues but write becomes sequential and slow.
Is it possible to use multi threading for a Lucene Index writer if indices are being written to a single directory?
Do I need to make index creator code to be thread safe to use SimpleAsyncTaskExecutor?
index creator code is in step processor.
I am using Lucene 6.0.0 and as per IndexWriter API Doc,
NOTE: IndexWriter instances are completely thread safe, meaning
multiple threads can call any of its methods, concurrently. If your
application requires external synchronization, you should not
synchronize on the IndexWriter instance as this may cause deadlock;
use your own (non-Lucene) objects instead.
I was creating multiple instances of writer and that was causing problems. Single writer instance can be passed to as many threads as you like provided rest of the code around that writer is thread safe.
I used a single writer instance and parallelized chunks. Each parallel chunk wrote to same directory without any issues.
To parallelize chunks, I had to made my chunk components - reader , processor and writer to be thread safe.
I am new to Qt development, the way it handles threads (signals and slots) and databases (and SQLite at that). It has been 4 weeks that I have started working on the mentioned technologies. This is the first time I'm posting a question on SO and I feel I have done research before coming to you all. This may look a little long and possibly a duplicate, but I request you all to read it thoroughly once before dismissing it off as a duplicate or tl;dr.
Context:
I am working on a Windows application that performs a certain operation X on a database. The application is developed in Qt and uses QSQLite as database engine. It's a single threaded application, i.e., the tables are processed sequentially. However, as the DB size grows (in number of tables and records), this processing becomes slower. The result of this operation X is written in a separate results table in the same DB. The processing being done is immaterial to the problem, but in basic terms here's what it does:
Read a row from Table_X_1
Read a row from Table_X_2
Do some operations on the rows (only read)
Push the results in Table_X_Results table (this is the only write being performed on the DB)
Table_X_1 and Table_X_2 are identical in number and types of columns and number of rows, only the data may differ.
What I'm trying to do:
In order to improve the performance, I am trying to make the application multi-threaded. Initially I am spawning two threads (using QtConcurrentRun). The two tables can be categorized in two types, say A and B. Each thread will take care of the tables of two types. Processing within the threads remains same, i.e., within each thread the tables are being processed sequentially.
The function is such that it uses SELECT to fetch rows for processing and INSERT to insert result in results table. For inserting the results I am using transactions.
I am creating all the intermediate tables, result tables and indices before starting my actual operation. I am opening and closing connections everytime. For the threads, I create and open a connection before entering the loop (one for each thread).
THE PROBLEM:
Inside my processing function, I get following (nasty, infamous, stubborn) error:
QSqlError(5, "Unable to fetch row", "database is locked")
I am getting this error when I'm trying to read a row from DB (using SELECT). This is in the same function in which I'm performing my INSERTs into results table. The SELECT and the INSERT are in the same transaction (begin and commit pair). For INSERT I'm using prepared statement (SQLiteStatement).
Reasons for seemingly peculiar things that I am doing:
I am using QtConcurrentRun to create the threads because it is straightforward to do! I have tried using QThread (not subclassing QThread, but the other method). That also leads to same problem.
I am compiling with DSQLITE_THREADSAFE=0 to avoid application from crashing. If I use the default (DSQLITE_THREADSAFE=1), my application crashes at SQLiteStatement::recordSet->Reset(). Also, with the default option, internal SQLITE sync mechanism comes into play which may not be reliable. If the need be, I'll employ explicit sync.
Making the application multi-threaded to improve performance, and not doing this. I'm taking care of all the optimizations recommended there.
Using QSqlDatabase::setConnectOptions with QSQLITE_BUSY_TIMEOUT=0. A link suggested that it will prevent the DB to get locked immediately and hence may give my thread(s) appropriate amount of time to "die peacefully". This failed: the DB got locked much frequently than before.
Observations:
The database goes into lock only and as soon as when one of the threads return. This behavior is consistent.
When compiling with DSQLITE_THREADSAFE=1, the application crashes when one of the threads return. Call stack points at SQLiteStatement::recordSet->Reset() in my function, and at winMutexEnter() (called from EnterCriticalSection()) in sqlite3.c. This is consistent as well.
The threads created using QtConcurrentRun do not die immediately.
If I use QThreads, I can't get them to return. That is to say, I feel the thread never returns even though I have connected the signals and the slots correctly. What is the correct way to wait for threads and how long it takes them to die?
The thread that finishes execution never returns, it has locked the DB and hence the error.
I checked for SQLITE_BUSY and tried to make the thread sleep but could not get it to work. What is the correct way to sleep in Qt (for threads created with QtConcurrentRun or QThreads)?
When I close my connections, I get this warning:
QSqlDatabasePrivate::removeDatabase: connection 'DB_CONN_CREATE_RESULTS' is still in use, all queries will cease to work.
Is this of any significance? Some links suggested that this warning arises because of using local QSqlDatabase, and will not arise if the connection is made a class member. However, could it be the reason for my problem?
Further experiments:
I am thinking of creating another database which will only contain results table (Table_X_Results). The rationale is that while the threads will read from one DB (the one that I have currently), they will get to write to another DB. However, I may still face the same problem. Moreover, I read on the forums and wikis that it IS possible to have two threads doing read and write on same DB. So why can I not get this scenario to work?
I am currently using SQLITE version 3.6.17. Could that be the problem? Will things be better if I used version 3.8.5?
I was trying to post the web resources that I have already explored, but I get a message saying "I'd need 10 reps to post more than 2 links". Any help/suggestions would be much appreciated.
I am building a simple application to download a set of XML files and parse them into a database using the async module (https://npmjs.org/package/node-async) for flow control. The overall flow is as follows:
Download list of datasets from API (single Request call)
Download metadata for each dataset to get link to XML file (async.each)
Download XML for each dataset (async.parallel)
Parse XML for each dataset into JSON objects (async.parallel)
Save each JSON object to a database (async.each)
In effect, for each dataset there is a parent process (2) which sets of a series of asynchronous child processes (3, 4, 5). The challenge that I am facing is that, because so many parent processes fire before all of the children of a particular process are complete, child processes seem to be getting queued up in the event loop, and it takes a long time for all of the child processes for a particular parent process to resolve and allow garbage collection to clean everything up. The result of this is that even though the program doesn't appear to have any memory leaks, memory usage is still too high, ultimately crashing the program.
One solution which worked was to make some of the child processes synchronous so that they can be grouped together in the event loop. However, I have also seen an alternative solution discussed here: https://groups.google.com/forum/#!topic/nodejs/Xp4htMTfvYY, which pushes parent processes into a queue and only allows a certain number to be running at once. My question then is does anyone know of a more robust module for handling this type of queueing, or any other viable alternative for handling this kind of flow control. I have been searching but so far no luck.
Thanks.
I decided to post this as an answer:
Don't launch all of the processes at once. Let the callback of one request launch the next one. The overall work is still asynchronous, but each request gets run in series. You can then pool up a certain number of the connections to be running simultaneously to maximize I/O throughput. Look at async.eachLimit and replace each of your async.each examples with it.
Your async.parallel calls may be causing issues as well.