Our Domino server is crashing frequently, and here is the NSD log. There is a big XPages-based application on the server that is accessed by many users at all times. All the NSD logs have lines like these in common:
Details one:
Please help.
Looks as if it is failing when reading view entries...
This is not specific to XPages; I have seen similar behaviour when there is a corrupt document in a database. So the very first thing I would suggest is running fixup and compact on the database(s) holding your application. If your application is full-text indexed, I would consider deleting and regenerating the full-text index; I have also seen crashes due to corrupt FT indexes (though years ago).
Next, there are a couple of important Fix Packs for 8.5.3 that you should consider applying - I know that may not always be easy, depending on your environment.
If none of that solves the issue I would:
Consider ANY changes made in the environment or the application shortly before the crashes started - even if they do not seem related.
Report the issue to IBM as a support incident. They have people who are very clever at digesting all the NSD information and putting their finger on the issue.
Hope you get it solved!
/John
PS: You should REALLY consider getting any servers running XPages up on the latest version of Domino (i.e. 9.0.1 FP3/FP4). There are MAJOR improvements - and you can use the OpenNTF Domino API to improve any Java coding you do in your XPages (and you really should use a lot of Java instead of SSJS). Just some free advice ;-)
If you have the message "PANIC: LookupHandle: Handle out of range", you are not recycling Domino objects correctly.
Always recycle ViewEntry and Document within loops
Always recycle Name or DateTime objects created within loops
If the view includes dates/times in its columns and you use getColumnValues(), always load the result into a Vector variable before using it, and always recycle that Vector afterwards using the recycle(Vector) method available on any Domino object. Never, ever use getColumnValues().get(0) in a chained call unless the view does not contain DateTimes and never will. Any call to getColumnValues() extracts all the columns and, for dates/times, creates a DateTime object that is a child of the Session, not of the ViewEntry - so recycling the ViewEntry has no effect on the DateTime.
John's two other suggestions will probably also help. OpenNTF Domino API always recycles, so you don't have to, so you cannot get this kind of crash. Later versions of Domino increase the number of handles available, so the chances of getting this crash are minimised.
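To make the handle arithmetic concrete, here is a small self-contained Java simulation - not the real lotus.domino API; the class and the fixed pool size are invented for illustration - of why skipping recycle() inside a view-iteration loop eventually exhausts the handle table and triggers the "Handle out of range" panic:

```java
import java.util.HashSet;
import java.util.Set;

public class HandleTable {
    private final int capacity;              // Domino's handle table is a fixed-size pool
    private final Set<Integer> open = new HashSet<>();
    private int next = 0;

    public HandleTable(int capacity) { this.capacity = capacity; }

    // Allocating a handle mimics getNextEntry() creating a new ViewEntry wrapper.
    public int allocate() {
        if (open.size() >= capacity) {
            throw new IllegalStateException("PANIC: LookupHandle: Handle out of range");
        }
        int h = next++;
        open.add(h);
        return h;
    }

    // Freeing a handle mimics calling recycle() on the Domino object.
    public void recycle(int h) { open.remove(h); }

    public int inUse() { return open.size(); }

    public static void main(String[] args) {
        HandleTable table = new HandleTable(100);
        // Iterating 10,000 "entries" while recycling each one keeps usage bounded.
        for (int i = 0; i < 10_000; i++) {
            int h = table.allocate();
            table.recycle(h);   // remove this line and allocate() throws after 100 iterations
        }
        System.out.println("handles in use after loop: " + table.inUse());
    }
}
```

In real XPages code the equivalent fix is to call entry.recycle() (and recycle any DateTime or Name objects) at the bottom of each loop iteration before moving to the next entry.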
Related
Hybris: 1905.14
I'm having performance issues on a Hybris instance hosted in CCV2. It is slowing down the storefront and the backoffice. If I go to HAC > Monitoring > Suspend, I see several Backoffice Long Operation items, and the thread dump also shows several threads related to the backoffice.
There are no cronjobs running and the Triggers have been set to active=false. After some time, the server needs to be restarted, since the backoffice no longer loads. Lastly, the server cannot be initialized, since it contains data.
There is minimal configuration in backoffice, just some XML configuration to customize the treeview of different usergroups.
I am not able to replicate the performance issue on my local. Any ideas what may be causing these Backoffice Long Operation items?
Blocked threads look like this:
priority:5 - threadId:0x2095 - nativeId:0x82f - nativeId (decimal):2095 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED
at java.base#11.0.6/java.util.Collections$SynchronizedMap.put(Collections.java:2598)
- waiting to lock java.util.Collections$SynchronizedMap#1e60f80f
at com.hybris.cockpitng.util.cache.WidgetAsyncWarmUpCache$WarmUpOperation.lambda$execute$0(WidgetAsyncWarmUpCache.java:122)
at com.hybris.cockpitng.util.cache.WidgetAsyncWarmUpCache$WarmUpOperation$$Lambda$1481/0x00000008020d4c40.accept(Unknown Source)
at java.base#11.0.6/java.util.ArrayList.forEach(ArrayList.java:1540)
at com.hybris.cockpitng.util.cache.WidgetAsyncWarmUpCache$WarmUpOperation.execute(WidgetAsyncWarmUpCache.java:122)
at com.hybris.cockpitng.engine.impl.DefaultWidgetInstanceManager.lambda$prepareLongOperation$2(DefaultWidgetInstanceManager.java:223)
at com.hybris.cockpitng.engine.impl.DefaultWidgetInstanceManager$$Lambda$1466/0x00000008020d0040.get(Unknown Source)
at com.hybris.cockpitng.engine.operations.CockpitNGBackgroundOperation.runInternal(CockpitNGBackgroundOperation.java:125)
at com.hybris.cockpitng.engine.operations.CockpitNGBackgroundOperation.run(CockpitNGBackgroundOperation.java:93)
at com.hybris.backoffice.cockpitng.util.BackofficeThreadContextCreator$RunnableWithParentThreadContext.run(BackofficeThreadContextCreator.java:100)
at java.base#11.0.6/java.lang.Thread.run(Thread.java:834)
at de.hybris.platform.core.threadregistry.RegistrableThread.internalRun(RegistrableThread.java:141)
at de.hybris.platform.core.threadregistry.RegistrableThread.run(RegistrableThread.java:131)
Locked synchronizers: count = 0
Thread count from fastthread.io (screenshot not included):
UPDATE 5/27 15:29: Hybris confirmed it to be a bug in 1905.14: ECP-5030, "WidgetAsyncWarmUpCache causing CPU saturation with certain category structure". A workaround for CCV2 is in the JIRA ticket:
Upload attached cockpitframework-19.05.14-RC4.jar to your repository under
root/CUSTOMIZE/modules/backoffice-framework/backoffice/web/webroot/WEB-INF/lib
Build
Deploy
However, we are choosing to use 1905.13 for now.
As a temporary resolution (which removed the performance issue), we have:
Downgraded Hybris 1905.14 to 1905.13
Removed/Disabled the hotfolder extensions
At this time, we can't say if the issue is due to Hybris version or due to hotfolder extension.
We have another server that is running on Hybris 1905.14 with hotfolders enabled, and it doesn't have the "Backoffice Long Operation" issue. So, at this time, we're just waiting for SAP to provide some response (or investigation of the issue).
I had a similar situation in the past (from what I remember, my thread dump was identical to yours), and the problem was caused by a Variant that had itself as its baseProduct. Because of this, when sync or indexing was triggered, a lot of StackOverflowErrors appeared, since the code kept trying to sync/index the base product, which had itself as baseProduct, to infinity and beyond.
To check whether you have a similar scenario, you can run the following FlexibleSearch query:
select {VP:code} from {VariantProduct as VP} where {VP:baseProduct}={VP:pk}
PS: I checked, and the issue still reproduces (Hybris still lets you create this cyclic dependency) on Hybris 1905.11.
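As a hypothetical in-memory illustration (plain Java, with invented product codes) of what that FlexibleSearch query checks - variants whose baseProduct points back at themselves:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class VariantCycleCheck {
    // Returns variant codes whose baseProduct reference is the variant itself,
    // i.e. the same rows the FlexibleSearch query above would return.
    static List<String> selfReferencing(Map<String, String> variantToBase) {
        return variantToBase.entrySet().stream()
                .filter(e -> e.getKey().equals(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> data = Map.of(
                "shirt-red-M", "shirt-red",
                "shirt-red",   "shirt-red",   // bad: variant is its own base product
                "mug-blue",    "mug");
        System.out.println(selfReferencing(data)); // prints [shirt-red]
    }
}
```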
I need to process AJAX in my crawler and would prefer to use the system browser, though I may have to change my mind. My crawler generally runs in the background while the user works on other applications.
Anyhow - since the WebControl leaks memory when processing JS libraries that themselves leak memory - this can cause a crawler to quickly run out of memory. (There are many SO posts about this.)
So I have created a solution that uses a separate "dummy" small executable with the webcontrol that takes input/output. This is launched as a separate process by the crawler. This part seems to work great. This child process is created/destroyed as many times as needed.
However, this child process with the embedded IE grabs focus on every page load (at least if, e.g., JS code calls focus), which means that if the user is working in e.g. Word, keyboard focus is lost.
I have already moved the embedded IE window off-screen, but I cannot make it invisible in the traditional sense, since then the embedded IE stops working.
I have tried to disable all parent controls before calling navigate - but it does not work for me.
Any ideas I have not tried? Maybe somehow catch the Windows message that focuses the WebControl and ignore it? Or something that lets me immediately refocus the control that previously had focus?
I currently use Delphi - but from my earlier investigations, this question applies equally to VB, C# .NET, etc. I will take solutions and ideas in any language.
I have inherited a website built on ExpressionEngine which is having a lot of trouble under load. Looking in the server console for the database, I am seeing a lot of database writes (300-800/second).
I am trying to track down why we get so much write activity compared to read activity, and I am seeing things like:
UPDATE `exp_snippets` SET `snippet_contents` = 'some content in here' WHERE `snippet_name` = 'member_login_form'
Why would EE be writing these to the database when no administrative changes are happening and how can I turn this behavior off?
Are there any other bottlenecks that could be avoided? The site is using an EE ad module, so I cannot easily run it through Varnish, since the ads need to change on each page load - I am looking to integrate DFP instead so they can be loaded asynchronously.
There are a lot of front-end operations that trigger INSERT and UPDATE operations (having to do with tracking users, hits, and sessions, generating hashes for forms, etc.).
The snippets one, though, seems very strange indeed; I wouldn't think that snippets would trigger an UPDATE under normal circumstances. Perhaps the previous developer did something where the member_login_form (which has a dynamic hash in it) is written to a snippet each time it is called? I'm not sure why you would do that, but there's a guess.
For general speed optimization see:
Optimizing ExpressionEngine
There are a number of configs in the "Extreme Traffic" section that will reduce the number of writes (though not the snippet one, which doesn't seem to be normal behavior).
I have read this suggestion about recycle of Domino objects:
What is the best way to recycle Domino objects in Java Beans
What is the best practice if I have a data source named document, and a function that is called several times contains this code:
var doc = document.getDocument(true)
and then does stuff to the backend document?
Before I exit the function, should I recycle doc, or would that also recycle the data source's backend document?
This is an excellent question, because this is one of the only exceptions to the "recycle everything" principle (two other notable examples are that you should never recycle the current session or database). It's a bad idea to recycle the back end document for a data source, because the JSF lifecycle gets the same handle, and you'd be recycling it out from under Domino. The data source takes care of this for us, so there's no need to recycle it manually. On the other hand, if you get a handle on specific items (e.g. doc.getFirstItem("someFieldName")) or on item values that are dates, you should recycle those objects - just not the document itself.
By far the most important scenario where it's crucial to recycle Java and SSJS objects is in view iteration, because every time you advance to the next entry or document, you're leaking a handle if you skip the recycle. In most other cases, recycling is still advisable, but closer to being optional, because it takes a long time for other operations to leak enough to cause problems. But if you're iterating a very large view, you can easily run out of handles in a single iteration if you forget to recycle.
One parting thought, however: I rarely see a situation where getting a handle on the back end document of a data source is the best approach, so I'd recommend revisiting your code to ensure that it's even necessary to obtain this handle to begin with. For instance, instead of document.getDocument(true).getItemValueString("someFieldName"), just call document.getValue("someFieldName"). The value returned should be identical, but it will run more efficiently, and you're not touching the back end document, so recycling isn't an issue. And it's less typing for every item access, which certainly adds up over time. Similarly, instead of document.getDocument(true).replaceItemValue("someFieldName", "newValue"), substitute document.setValue("someFieldName", "newValue").
My iPhone app uses core data and things are fine for most part. But here is a problem:
after a certain amount of data, it stalls at first time execution (where core data entities must be loaded).
Some experimenting showed that things are OK up to a certain amount of data loaded in Core Data at start.
If I go over a critical amount the installation starts failing. The bigger the amount of data for start, the higher the probability that it fails.
By making separate tests I made sure the data themselves are not faulty.
I also can say this problem does not appear in the simulator.
It also does not happen when I connect the debugger to the device.
It looks like too much data loaded in core data in a short amount of time creates some kind of overload.
Is that true? Any idea on a possible solution?
At this point I made up a partial solution using a UIActionSheet object to kill some time (asking the user to push a button). But this is not very satisfactory, though for the time being it works.
Any comment or advice for a better way would be appreciated.
It is not quite clear what you mean by "it fails".
However, if you are using SQLite and by "loading into Core Data" you mean creating and saving entities at startup to populate the store, then remember not to call [managedObjectContext save:...] only once at the end - especially with a large amount of data - but to create and save reasonable batches of NSManagedObjects.
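As a rough, language-neutral sketch of that batched-save advice (plain Java stands in for the Core Data calls; the batch size of 500 and the record count are assumptions to tune for your data):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedImport {
    static int saveCount = 0;

    // Stand-in for [managedObjectContext save:]; here it just flushes the buffer.
    static void save(List<String> pending) {
        saveCount++;
        pending.clear();
    }

    public static void main(String[] args) {
        final int BATCH_SIZE = 500;            // assumed batch size
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < 2_000; i++) {
            pending.add("record-" + i);        // stand-in for creating an NSManagedObject
            if (pending.size() >= BATCH_SIZE) {
                save(pending);                 // periodic save keeps memory bounded
            }
        }
        if (!pending.isEmpty()) save(pending); // flush any remainder
        System.out.println("saves: " + saveCount);
    }
}
```

The point is that memory use stays proportional to the batch size instead of the whole import, which is exactly what a single save at the end fails to do.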
Otherwise, if you mean that a large amount of data is retrieved as NSManagedObjects, probably loaded into a UITableView, consider using some kind of NSOperation for asynchronous loading.
If neither of those cases applies to you, just tell us the error you are getting, or what you mean by "fails" or "stalls".