Liferay: timeout on the regeneration of the search indexes - search

When launching the functionality of a "Rebuild all search indexes." very often the request times out, perhaps because the browser waits too long the answer.
How do I fix this? As it is I can not figure out when it ends the regeneration of the indices and if unsuccessful
Liferay 6.2

If your data set is pretty big regenerating the indexes can take a lot of time. There’s no ‘fix’ for this. You could, for example, use a different Indexer, such as Solr, to remove the burden from the machine running Liferay.

I can always tell when it is done by running it late at night during low traffic hours and monitoring the JMX (CPU) activity and (tomcat) logs. Both will give indications when the indexing is completed with various tasks and starting new ones, but I find the JMX monitoring to be clearest. In our case, there is around 500+MB of index data on each node and it takes ~ 2.5 hours give or take. I also kick off indexing on each application node, since we have found the "cluster link" software unreliable to copy the index across cluster nodes...

Related

preferred way to schedule job #Scheduled vs crontab

I have to run one utility periodically for instance say, every minute.
So, I have two option #Scheduled spring boot vs crontab of linux box that we are using to deploy the artifact.
SO, my question is which way should I use?
what are the pros and cons for each solution , any other solution if you can suggest.
Just for comparing between these two, I don't have much points, but only based on this situation which I faced now. I just built a new end point and am doing performance testing and stress testing for the same on production. I am yet to decide the cron schedule times, and those may need a slight tweaking over some more time of observation. Setting via #Scheduled needs me to deploy/restart application every time I make a change.
Application restart generally takes more time than crontab edit.
Other than this, a few points considering the aspects of availability and scalability -
Setting only via crontab on a single server would mean a single point of failure, if the server goes down.
Setting via #Scheduled also could mean the same.
If you have multiple instances of the server, this could mean endpoint getting triggered twice and you may not want to have the same. Worst case, is if the scaling up happens after a long time, and you wrote the #Scheduled endpoint long back, while it was only deployed on a single server and then you forgot. As soon as scaling up happens, the process will start getting hit twice.
So, none of these seem to be the best in terms of points of availability and scalability.
In such situations, ideally a distributed cron management system (I have heard about Rundeck) is needed, which manages which, out of the available servers is to be called to hit the desired end point and if needed to call the next server in case the first one is down.
In case of any need for investigation. logs of rundeck could be checked to find the server which was actually called.

Azure App Service: How can I determine which process is consuming high CPU?

UPDATE: I've figured it out. See the end of this question.
I have an Azure App Service running four sites. One of the sites has two deployment slots in addition to the primary one. Recently I've been seeing really high CPU utilization for the App Service plan as a whole.
The dark orange line shows the CPU percentage. This is just after restarting all my sites, which brought it down to this level.
However, when I look at the CPU use reported by each site, it's really low.
The darker blue line shows the CPU time, which is basically nothing. I did this for all of my sites, and all the graphs look the same. Basically, it seems that none of my sites are causing the issue.
A couple of the sites have web jobs, so I took a look at the logs but everything is running fine there. The jobs run for a few seconds every few hours.
So my question is: how can I determine the source of this CPU utilization? Any pointers would be greatly appreciated.
UPDATE: Thanks to the replies below, I was able to get more detail into what was happening. I ended up getting what I needed from SCM / Kudu tools. You can get here by going to your web app in Azure and choosing Advanced Tools from the side nav. From the Kudu dashboard, choose Process Explorer. The value in the Total CPU Time column is not directly useful, because it's the time in seconds that the process has run since it started, which might have been minutes or days ago.
However, if you make a record of the value at intervals, you can look at the change over time, and one process might jump out at you. In my case, it was my WebJobs process. Every 60 seconds, this one process was consuming about 10 seconds of processor time, just within one environment.
The great thing about this Kudu dashboard is, if you can catch the problem while it is actually happening, you can hit the Start Profiling button and capture a diagnostic session. You can then open this up in Visual Studio and get some nice details about where the CPU time is being spent.
Just in case anyone else is seeing similar issues, I'll provide more details about my particular case. As I mentioned, my WebJobs exe was the culprit, and I found that all the CPU time was being spent in StackExchange.Redis.SocketManager, which manages connections to Azure Redis Cache. In my main web app, I create only one connection, as recommended. But Since my web jobs only run every once in a while, I was creating a new connection to Azure Redis Cache each time one ran, which apparently can lead to issues. I changed my code to create the Redis Cache connection once when the WebJob process starts up and use the existing connection when any individual WebJob runs.
Time will tell if this really fixes the issue, but I think it will. When the problem occurred, it always fit the same pattern: After a few days of running fine, my CPU would slowly ramp up over the course of about 12 hours. My thinking is that each time a WebJob ran, it created a connection object, which at first didn't produce trouble, but gradually as WebJobs ran every hour or two, cruft was building up until finally some critical threshold was met and the CPU usage would take off.
Hope this helps someone out there. Best wishes!
May be you should go to webApp scm?
%yourAppName%.scm.azurewebsites.com;
There is a page, that can show you all process, that runned now on your web app. (something like Console > Process).
Also you can go to support page (from scm right corner).
You can find some more info about your performance there, and make memory dump (not for this problem, but it useful for performance issues).
According to your description, I assumed that you could leverage the Crash Diagnoser extension to capture dump files from your Web Apps and WebJobs when the CPUs usage percentage is higher than the specific threshold to isolate this issue. For more details, you could refer to this official blog.

Meteor Node Process CPU Usage Nears 100%

I'm having trouble with my Meteor app when it gets to its peak amount of traffic (peak for this is nothing, 1k visits, maybe 2,500 pageviews in a day). CPU usage spikes and never recovers, so I've taken to using Nodetime to monitor usage and I've been reloading the process (forever restart) to get things back to normal.
I'm fairly new to profiling, so finding the underlying cause has me at a loss for where to start. I'm fairly certain it has to do with my app's server code, but the profiling seems to point to the Fibers module as a "hotspot" which I understand aids in making my server code synchronous.
Below is a snippet from the profiling results. I hope someone can guide me in the right direction in troubleshooting this!
While I don't have a specific answer to your question, I have experience dealing with CPU issues for our production meteor app for so I can give you a list of things to investigate.
Upgrade to the latest version of meteor and the appropriate node version (see the changelog). As of this writing that's meteor 0.8.2 and node 0.10.28.
Read this and this article. The latter makes a great point that you really should always try to delay activation of subscriptions until you need them. In particular you may not need to publish anything for users who are not logged in. In my experience, meteor CPU problems have everything to do with subscriptions.
Be careful with observe and observeChanges. These are expensive and are easy to abuse. In particular:
Make sure you are calling stop() on your handles when they are no longer needed (consider using a package like publish-with-relations so this is done for you).
Fetch only the collections and fields that you absolutely need. Observe works by continually diffing objects (requires lots of CPU). The fewer and smaller objects you have, the less there is to compute.
Consider using smart-collections before it is retired. Use oplog tailing - this can make for a night and day difference in performance and CPU usage in your app.
Consider making some things not reactive (also mentioned in the articles above). For us that was a big win. We had one extremely expensive join that was used on two frequently accessed pages on the site. When it got to the point where the CPU was pegged at 100% about every 30 minutes I gave up on reactivity for that element and just did the join on the server and shipped the data to the client via a method call. I also created a server-side expiring cache for these results and stored them by user (special thanks to Matt DeBergalis for this suggestion).
Do a preventative nightly restart. I have a cron job that tells forever to restart our app once a day in the middle of the night. That brings the CPU down from ~10% to 1%. This seems like black magic, but the fact that the CPU usage changes after a reset leads me to believe this is a good idea.
Updated thoughts (1/13/14)
We migrated to oplog tailing as soon as it was available (meteor 0.7) and that made a big difference. Note that in order to get access to the oplog, you'll probably need to either host your own db or run a dedicated instance on the hosting provider of your choice. I'd also recommend adding the facts package to actually tell if its working.
There was a memory leak discovered in publish-with-relations, and as of this writing the atmosphere version (v0.1.5) hasn't been bumped to reflect these changes. If you are using it in production, I strongly recommend checking out the HEAD version and running it locally.
We stopped doing nightly restarts a couple of weeks ago. So far everything has been fine (fingers crossed).
Updated thoughts (7/2/14)
A few months ago we switched over to using an Elastic Deployment on mongohq. It's affordable, the performance has been great, and they even have a blog post which tells you how to enable oplog tailing.
I'd strongly recommend checking out kadira to help diagnose performance issues in your app. Also check out the academy articles which have a number of good tips in them.
I'm also having this problem. Actually there is an issue with 0.6.6.1, I run meteor --release 0.6.6 and the cpu is back to normal now.

Extremely uneven cloud service load-balancing with Azure

I'm utilizing Azure for hosting a cloud service, which I recently modified to be scalable across multiple instances, including a session caching worker role. My question is, why would I be seeing extreme load (upwards of 90%) on one instance, but not on other instances (15-20% across all other instances)? Should I be worried?
Before I set up load balancing and when my single instance hit upwards of 95% load, it would slow to a crawl --- becoming unusable. Is there any way to ensure that I don't have any users experiencing this because they're somehow round-robin'd onto the overloaded instance?
We found we had a similar type of situation when one load-balanced instance failed over; what we were seeing is that all the load transferred, but wouldn't balance out again. We found that turning off keep-alive for a couple of minutes let the load spread again, after which we could turn it back on.
http://technet.microsoft.com/en-us/library/cc772183(v=ws.10).aspx
Well... azure load balance is based on round robin... so the distribution should be almost equal (something like 60-40 or even 70-30 is still acceptable)... so just to be sure: Are you sure your not using IIS "redirect" (I forgot the name of the feature) that would set sticky session?
I must say that without further details about what your site actually "do and how" it's quite hard to advice... I must say that this behavior is strange, but it's not clear that it is the load balancer fault...
Edit1: I would suggest that you further exam what is the 90% guy is doing by tracing it's activities... maybe you're out-of-luck and the requests that will cause heavy load are falling into that machine and the ones that will be quickly worked are being worked by the other one... Another thing that might be happening is that something might be stucked (maybe a infinite-loop)... if you implemented a scalable architecture I would recommend that you provision another machine and kill the one that is suffering...
Edit2: A simple way to verify that the load balancer is working is: Log remotely to the service machines and replace something like a image that is displayed on the main page (something that you can easily spot just by looking to the page). On server 1 put lets say a yellow image and on server 2 a red image (ok... maybe something not this drastic but you get the point...). Then keep loading the page again and again...

Isolating a rampant process in IIS

I have a webserver that is pegged and I've been able to isolate it to a particular website instance. I'd like to dig deeper and isolate the particular page/process that is causing the issue.. Any tips?
You can take a memory dump of the process and poke around with windbg.
There are posts on this issue from Tess Ferrandez blog. Just do as she say.
Which version of IIS are you using? Some of the higher ones allow for a separation of which process gets used to handle requests such as a worker process that you could isolate a bit more that way. I'd also suggest reading through the IIS logs to see what requests were being handled, how long they took, etc.
There are many different quirks to each IIS version. The really low ones just had a start/stop functionality, but the newer ones have really given administrators much more control and power, IMO.
You should try using a profiler to identify what is using up the most resources. I've used dotTrace Profiler, although that can be expensive if you're on a tight budget.
It allows you to see exactly what processes and method calls use of the most processing time of a request really well so you can isolate the most resource intensive operations.
You should really be able to use any profiler to do this, not just dotTrace. I just happen to only have experience with this one in particular.
Change your web garden setting to 10 or greater. Then watch your CPU and memory utilization on the web server.
Continue to increase the web garden setting until either the app is completely responsive with less than 5% average utilization OR you have actually maxed your web server's memory.
UPDATE
It's not about diagnosing, it's about properly configuring the IIS server. Web Gardens are one of the top misunderstood features of IIS. By increasing the available threads to handle new requests you remove the appearance of contention at the web server level and place it squarely where it belongs. In this case at your database. Instead of masking a problem it actually highlights exactly where the problem is.
This turned out to be a SQL problem (sql 2005). The solution was found by using SQL activity monitor to identify a suspended process with a Async_network_io wait type. We then ran SQL profiler to narrow it down to two massive queries which were returning an over abundance of results.

Resources