We recently ran into overly frequent full GCs, which left us puzzled. We observed that a large number of objects survive 15 young GCs while a request is being processed, yet can be collected during a full GC.
The question is: how can we find the objects that a full GC can reclaim but a young GC cannot? We need them as a starting point for locating the corresponding business code. I checked a lot of documentation and found no way to track these objects.
This was observed with jstat -gcold, printing once per second.
Over 2 years ago, Remy Lebeau gave me invaluable tips on threads in Delphi. His answers were very useful to me and I feel like I made great progress thanks to him. This post can be found here.
Today, I face a "conceptual problem" about threads. This is not really about code; it is about the approach one should choose for a certain problem. I know we are not supposed to ask for personal opinions. I am merely asking whether, from a technical point of view, one of these approaches must be avoided or whether both are viable.
My application has a list of unique product numbers (named SKUs) in a database. Querying an API with these SKUs, I get back a JSON file containing details about these products. This JSON file is processed, and the results are displayed on screen and saved in the database. So, at one step, a download process is involved, and it is executed in a worker thread.
I see two different approaches possible for this whole procedure :
When the user clicks on the start button, a query is fired, building a list of SKUs based on the user criteria. A TStringList is then built and, for each element of the list, a thread is launched that downloads the JSON, sends the result back to the main thread, and terminates.
This can be pictured like this :
When the user clicks on the start button, a query is fired, building a list of SKUs based on the user criteria. Instead of sending SKU numbers one after another to the worker thread, the whole list is sent, and the worker thread iterates through the list, sending back results for displaying and saving to the main thread (via a synchronize event). So we only have one worker thread working the whole list before terminating.
This can be pictured like this :
I have coded both approaches and they both work, each with its own downsides that I have experienced.
I am not a professional developer; this is a hobby. Before working my way further down one path or the other for "polishing", I would like to know whether, from a technical point of view and according to your knowledge and experience, one of the approaches I described should be avoided, and why.
Thanks for your time
Mathias
Another thing to consider in this case is latency to your API that is producing the JSON. For example, if it takes 30 msec to go back and forth to the server, and 0.01 msec to create the JSON on the server, then querying a single JSON record per request, even if each request is in a different thread, does not make much sense. In that case, it would make sense to do fewer requests to the server, returning more data on each request, and partition the results up among different threads.
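The latency argument above can be made concrete with a little arithmetic. The sketch below uses the 30 msec and 0.01 msec figures from the answer; the count of 1000 SKUs, the assumption that single requests are issued one after another, and the assumption that the API can return many records in one request are all hypothetical and only serve the illustration.

```python
# Figures taken from the answer above.
ROUND_TRIP_MS = 30.0   # network latency per request
GENERATE_MS = 0.01     # server time to build one JSON record

# Hypothetical number of SKUs, chosen only for illustration.
SKU_COUNT = 1000


def sequential_single_requests(n):
    """Total time when each SKU gets its own request, one after another."""
    return n * (ROUND_TRIP_MS + GENERATE_MS)


def one_batched_request(n):
    """Total time when a single request returns all n records."""
    return ROUND_TRIP_MS + n * GENERATE_MS


per_sku_ms = sequential_single_requests(SKU_COUNT)
batched_ms = one_batched_request(SKU_COUNT)
print(f"one request per SKU: {per_sku_ms:.0f} msec")
print(f"one batched request: {batched_ms:.0f} msec")
```

Under these assumptions the per-SKU variant spends about 30 seconds almost entirely waiting on the network, while the batched variant needs about 40 msec. Running the single requests in parallel threads would shorten the wall-clock time, but every request would still pay the full round trip.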
The other thing is that threads are not a solution to every problem. I would question why you need to give each SKU its own thread. How long does each individual thread run, and how much processing does it do? In general, creating lots of threads that each work for a fraction of a msec does not make sense. You want the threads to be alive for as long as possible, processing as much data as they can for the job. You don't want the computer to spend as much time creating and destroying threads as it spends doing useful work.
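One way to read this advice is as a middle ground between the two approaches in the question: a small, fixed set of worker threads that stay alive and share the whole SKU list. The sketch below shows that idea in Python rather than Delphi, purely as an illustration. `fetch_details` is a hypothetical stand-in for the download step, and the pool size of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_details(sku):
    """Hypothetical stand-in for the download; a real version would call the API."""
    return {"sku": sku, "name": f"product {sku}"}


def fetch_all(skus, max_workers=4):
    """Process every SKU with a small, fixed pool of long-lived worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands SKUs to the same few threads and keeps the input order.
        return list(pool.map(fetch_details, skus))


results = fetch_all(["A100", "A101", "A102"])
for item in results:
    print(item["sku"], item["name"])
```

The number of threads stays constant no matter how many SKUs the query returns, so thread creation cost is paid a handful of times instead of once per SKU.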
My application's garbage collector used to run a major collection regularly, about once a day, but it suddenly stopped doing so. Memory usage has now reached 90%, and I have had to restart the application a few times.
This is a production environment, and all I am allowed to do is read the logs and view the JVM state via the provided UI.
Another observation from the last three months: during the first two months there were no minor collections but many major ones; during the last month there have been no major collections but many minor ones.
Perhaps it never does a major collection because you are restarting the application before it gets a chance.
You should be getting many minor collections if the young space is a reasonable size.
If you were only getting major collections most likely your JVM wasn't tuned correctly. I would try to remove as many GC tuning parameters as possible and only add each one if you know it helps. Having too many tuning parameters set is a good way to get strange behaviour.
I am new to Qt development, the way it handles threads (signals and slots) and databases (and SQLite at that). I started working with these technologies four weeks ago. This is the first time I'm posting a question on SO, and I feel I have done my research before coming to you all. This may look a little long and possibly like a duplicate, but I ask you to read it through once before dismissing it as a duplicate or tl;dr.
Context:
I am working on a Windows application that performs a certain operation X on a database. The application is developed in Qt and uses QSQLite as database engine. It's a single threaded application, i.e., the tables are processed sequentially. However, as the DB size grows (in number of tables and records), this processing becomes slower. The result of this operation X is written in a separate results table in the same DB. The processing being done is immaterial to the problem, but in basic terms here's what it does:
Read a row from Table_X_1
Read a row from Table_X_2
Do some operations on the rows (only read)
Push the results in Table_X_Results table (this is the only write being performed on the DB)
Table_X_1 and Table_X_2 are identical in number and types of columns and number of rows, only the data may differ.
What I'm trying to do:
In order to improve performance, I am trying to make the application multi-threaded. Initially I am spawning two threads (using QtConcurrentRun). The tables can be categorized into two types, say A and B. Each thread takes care of the tables of one type. Processing within the threads remains the same, i.e., within each thread the tables are processed sequentially.
The function is such that it uses SELECT to fetch rows for processing and INSERT to insert result in results table. For inserting the results I am using transactions.
I am creating all the intermediate tables, result tables and indices before starting my actual operation. I am opening and closing connections every time. For the threads, I create and open a connection before entering the loop (one for each thread).
THE PROBLEM:
Inside my processing function, I get following (nasty, infamous, stubborn) error:
QSqlError(5, "Unable to fetch row", "database is locked")
I am getting this error when I'm trying to read a row from DB (using SELECT). This is in the same function in which I'm performing my INSERTs into results table. The SELECT and the INSERT are in the same transaction (begin and commit pair). For INSERT I'm using prepared statement (SQLiteStatement).
Reasons for seemingly peculiar things that I am doing:
I am using QtConcurrentRun to create the threads because it is straightforward to do! I have tried using QThread (not subclassing QThread, but the other method). That also leads to same problem.
I am compiling with -DSQLITE_THREADSAFE=0 to keep the application from crashing. If I use the default (-DSQLITE_THREADSAFE=1), my application crashes at SQLiteStatement::recordSet->Reset(). Also, with the default option, the internal SQLite sync mechanism comes into play, which may not be reliable. If need be, I'll employ explicit sync.
Making the application multi-threaded to improve performance, and not doing this. I'm taking care of all the optimizations recommended there.
Using QSqlDatabase::setConnectOptions with QSQLITE_BUSY_TIMEOUT=0. A link suggested that it will prevent the DB from getting locked immediately and hence may give my thread(s) an appropriate amount of time to "die peacefully". This failed: the DB got locked much more frequently than before.
Observations:
The database becomes locked only when, and as soon as, one of the threads returns. This behavior is consistent.
When compiling with -DSQLITE_THREADSAFE=1, the application crashes when one of the threads returns. The call stack points at SQLiteStatement::recordSet->Reset() in my function, and at winMutexEnter() (called from EnterCriticalSection()) in sqlite3.c. This is consistent as well.
The threads created using QtConcurrentRun do not die immediately.
If I use QThreads, I can't get them to return. That is to say, I feel the thread never returns even though I have connected the signals and the slots correctly. What is the correct way to wait for threads, and how long does it take them to die?
The thread that finishes execution never returns, it has locked the DB and hence the error.
I checked for SQLITE_BUSY and tried to make the thread sleep but could not get it to work. What is the correct way to sleep in Qt (for threads created with QtConcurrentRun or QThreads)?
When I close my connections, I get this warning:
QSqlDatabasePrivate::removeDatabase: connection 'DB_CONN_CREATE_RESULTS' is still in use, all queries will cease to work.
Is this of any significance? Some links suggested that this warning arises because of using local QSqlDatabase, and will not arise if the connection is made a class member. However, could it be the reason for my problem?
Further experiments:
I am thinking of creating another database which will only contain results table (Table_X_Results). The rationale is that while the threads will read from one DB (the one that I have currently), they will get to write to another DB. However, I may still face the same problem. Moreover, I read on the forums and wikis that it IS possible to have two threads doing read and write on same DB. So why can I not get this scenario to work?
I am currently using SQLITE version 3.6.17. Could that be the problem? Will things be better if I used version 3.8.5?
I was trying to post the web resources that I have already explored, but I get a message saying "I'd need 10 reps to post more than 2 links". Any help/suggestions would be much appreciated.
I made a multithreaded download application, and now I need to show the progress of each download thread, as IDM does. When data is downloaded, the progress bar is notified about it, and each thread's position in the progress bar has to begin from a specified position. My questions are:
How can I increment the progress position according to the downloaded data? It is pretty simple in a single-threaded application by using IDHTTPWORK, so can I use the same method in a multithreaded application, or is there another simple method to implement?
Do I need to synchronise the instructions that increment position?
Suppose you have N downloads, of known size M[i] bytes. Before you start downloading, sum these values to get the total number of bytes to be downloaded, M.
While the threads are working they keep track of how many bytes have been downloaded so far, m[i] say. Then, at any point in time the proportion of the task that is complete is:
Sum(m[i]) / M
You can update the progress from the main thread using a timer. Each time the timer fires, calculate the sum of the m[i] counts. There's no need for synchronisation here so long as the m[i] values are aligned. Any data races are benign.
Now, m[i] might not be stored in an array. You might have an array of download thread objects, where each object stores all the information relating to its download, including m[i].
Alternatively, you can use the same sort of synchronized updating as you do for single-threaded code. Remove the timer and update from the main thread when you get new progress information. However, with a lot of threads there is a lot of synchronization, and that can potentially lead to contention. The lock-free approach above would be my preference, even though it involves polling on the timer.
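The polling approach from this answer can be sketched as follows, in Python rather than Delphi and only as an illustration. The `Download` class, the byte sizes and the simulated download loop are hypothetical; a real worker would add the size of each chunk it receives, and a real UI would call `overall_progress` from a timer on the main thread instead of calling it directly.

```python
import threading


class Download:
    """One download: its known size M[i] and its running byte count m[i]."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes  # M[i]
        self.done_bytes = 0             # m[i], written only by its own worker

    def run(self, chunk_size=1024):
        # Simulated download loop; no network access happens here.
        while self.done_bytes < self.total_bytes:
            remaining = self.total_bytes - self.done_bytes
            self.done_bytes += min(chunk_size, remaining)


def overall_progress(downloads):
    """Return Sum(m[i]) / M as a fraction between 0 and 1."""
    total = sum(d.total_bytes for d in downloads)
    done = sum(d.done_bytes for d in downloads)
    return done / total if total else 1.0


downloads = [Download(4096), Download(10000), Download(2500)]
before = overall_progress(downloads)

workers = [threading.Thread(target=d.run) for d in downloads]
for w in workers:
    w.start()
for w in workers:
    w.join()

after = overall_progress(downloads)
print(f"before: {before:.0%}, after: {after:.0%}")
```

Each counter has exactly one writer, and the reader only sums the current values, so a slightly stale reading merely makes the bar lag by one timer tick.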
You can take a look at the subclassed MFC list controls developed in the article by Michael Dunn 15 years ago: Articles/79/Neat-Stuff-to-Do-in-List-Controls-Using-Custom-Dra on codeproject dot com.
If you implement one of them, say, CXListCtrl* pListCtrl, at thread creation time, then the progress reporting of that thread becomes as simple as making calls such as:
pListCtrl->SetProgress(mItem,0);
when it's time to start showing progress, and
pListCtrl->SetProgress(mItem,0, i);
when you're i% done.
Actually, if you just want the progress bar functionality and don't care about all that's under the hood, you could obtain and use without modification (or license issues) the class XListCtrl.cpp in the Work Queue article at Articles/3607/Work-Queue on that same site.
As I understand it, if I open a view from a database using db.getView() there's no point in doing this multiple times from different threads.
But suppose I have multiple threads searching the View using getAllDocumentsByKey(). Is it safe to do so and to iterate over the DocumentCollections in parallel?
Also, Document.recycle() messes with the DocumentCollection, will this mess with each other if two threads search for the same value and have the same results in their collection?
Note: I'm just starting to research this in depth, but thought it'd be a good thing to have documented here, and maybe I'll get lucky and someone will have the answer.
The Domino Java API doesn't really like sharing objects across threads. If you recycle() one view in one thread, it will delete the backend JNI references for all objects that referenced that view.
So you will find your other threads are then broken.
Bob Balaban did a really good series of articles on how the Java API works and recycling. Here is a link to part of it.
http://www.bobzblog.com/tuxedoguy.nsf/dx/geek-o-terica-5-taking-out-the-garbage-java?opendocument&comments
Each thread will have its own copy of a DocumentCollection object returned by the getAllDocumentsByKey() method, so there won't be any threading issues. The recycle() method will free up memory on your object, not the Document itself, so again there wouldn't be any threading issues either.
Probably the most likely issue you'll have is if you delete a document in the collection in one thread, and then later try to access the document in another. You'll get a "document has been deleted" error. You'll have to prepare for those types of errors and handle them gracefully.