Profiling Lucene in Nutch

I'm trying to profile Nutch using VisualVM. Lucene is the part of the Nutch core responsible for generating URL indexes and for searching those indexes in response to a query. I'm running Nutch through Apache Tomcat and would like to determine how much time Nutch spends in various function calls (including Lucene calls), but when I profile with VisualVM I get a bunch of profiling data about Tomcat and nothing about Nutch or Lucene. What am I doing wrong here?

I had the same experience trying to locate Lucene time inside Tomcat calls.
What you have to do is:
Use VisualVM 1.2.2.
Choose the relevant process and press "Profile".
Check the "Settings" checkbox. This should open a "CPU settings" tab, with fields you can fill.
Under "Start profiling From classes:" enter an entry point in your code (e.g. com.my.company.NutchUser).
Uncheck "Profile new runnables".
Choose "Profile only classes:" and under it write:
org.apache.lucene.*
org.apache.nutch.*
Press the "Profile CPU" button.
I believe if you do all that, then run your process and take occasional snapshots, you will be fine.
Alternatively, this guy suggests doing stack sampling instead of profiling. I have never done it, but it sounds interesting.

Where can I get node exporter metrics description?

I'm new to monitoring the k8s cluster with prometheus, node exporter and so on.
I want to know exactly what the metrics mean, even though the metric names are fairly self-descriptive.
I already checked the node exporter GitHub repository, but found no useful information.
Where can I get the descriptions of node exporter metrics?
Thanks

There is a short description alongside each metric. You can see them if you open node exporter in a browser or just curl http://my-node-exporter:9100/metrics. You will see all the exported metrics, and the lines starting with # HELP are the descriptions:
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.59840376e+07
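If you would rather collect the descriptions programmatically, here is a minimal Python sketch that pulls the # HELP lines out of text in the format shown above. The function name help_texts and the SAMPLE constant are made up for illustration; the sample text is just the three lines from this answer.

```python
# Sample output copied from the answer above; real output has many more metrics.
SAMPLE = """# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.59840376e+07
"""


def help_texts(exposition):
    """Return a dict mapping metric name -> description taken from '# HELP' lines."""
    descriptions = {}
    prefix = "# HELP "
    for line in exposition.splitlines():
        if not line.startswith(prefix):
            continue
        # After the prefix the line reads '<metric_name> <free-form description>'.
        parts = line[len(prefix):].split(" ", 1)
        name = parts[0]
        descriptions[name] = parts[1] if len(parts) > 1 else ""
    return descriptions


if __name__ == "__main__":
    for name, text in help_texts(SAMPLE).items():
        print(name + ": " + text)
```

To run it against a live exporter, you could fetch the page with urllib.request.urlopen("http://my-node-exporter:9100/metrics").read().decode() and pass the result to help_texts.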
Grafana can show this help message in the editor.
Prometheus (with the recent experimental editor) can show it too.
This works for all metrics, not just node exporter's. If you need more technical details about those values, I recommend searching Google and the man pages (if you're on Linux). Node exporter takes most of its metrics from /proc almost as-is, so it is not difficult to find the details. Take node_memory_KReclaimable_bytes as an example. The _bytes suffix is obviously the unit, node_memory is just a namespace prefix, and KReclaimable is the actual metric name. Running man -K KReclaimable will bring you to the proc(5) man page, where you can find that:
KReclaimable %lu (since Linux 4.20)
Kernel allocations that the kernel will attempt to
reclaim under memory pressure. Includes
SReclaimable (below), and other direct allocations
with a shrinker.
Finally, if this intention to learn more about the metrics is inspired by the desire to configure alerts for your hardware, you can skip to the last part and grab some alerts shared by the community from here: https://awesome-prometheus-alerts.grep.to/rules#host-and-hardware

"Can we read Indexed file using JCL?"

I am looking to read an indexed file using JCL. Is that possible? For example, there is a KSDS file, and we have to read it through its index and print the selected record to the console using only JCL, with no COBOL.

I believe the program you are looking to execute with your JCL is IDCAMS, and that you want to use the PRINT FROMKEY() TOKEY() command.
That hyperlink is to the IBM Documentation, a comprehensive set of documentation for z/OS and many of its components. Other IBM products such as Enterprise COBOL, CICS, DB2, and MQ have their own Documentation sites. If you're going to be using an IBM mainframe, it's a good idea to bookmark the sites for the products you use and become familiar with them.
This will not display output on the console, but it will write output to the SYSPRINT DD. I'm not sure there's a way to display this output on the console (the interface used by mainframe operators); typically that's where messages essential to system health and continued functioning are displayed. If you displayed the output you requested on the console, I suspect you'd get a request to not do that right quick.
@NicC is quite correct in saying that the JCL is not doing anything other than requesting that the IDCAMS program (in this particular case) be executed. If you're a Linux person, think of it this way:
Suppose you have a shell script...
#! /bin/bash
sort < $1
...would you say the script is doing the work, or the sort program?
JCL has no looping constructs, no way to programmatically alter variables. JCL allows you to request that programs be executed by the operating system and gives you a way to specify their inputs and outputs.

Determine OpenJDK active GC type

Does anybody know how to determine the active GC type (serial, parallel, etc.) via JMX in a running OpenJDK 8 JVM?

This is based purely on my local machine, so it may well differ from yours. Hopefully it will still help you find what you are after.
It's probably a good idea to download Java Mission Control for Java 1.8 if it is not already installed. See the following Stack Overflow question.
Where to find Java Mission Control and VisualVM on Ubuntu (OpenJDK8)
When you have Java Mission Control open, select your running JVM. It can be found in the left tab under 'JVM Browser'.
Once you have selected your running JVM, you should be able to select the option 'MBean Server'. Selecting this opens an overview of your JVM. At the bottom of this page you should see multiple tabs, including 'MBean Browser'.
This tab displays the devices, applications, and other resources managed by the JVM, including the Garbage Collector.
You should then be able to filter the MBean tree by searching for 'Garbage'.
For more information about garbage collection, you can view the 'Memory' tab at the bottom. This should display the GC tables that contain the garbage collection data and descriptions (see second screenshot).
If all is working as I expect, you will be shown the MBeans for garbage collection, as in my screenshot below.
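The names of the garbage collector MBeans (java.lang:type=GarbageCollector,name=...) are what tell you which collector is active. As a rough sketch, the lookup below maps the names that HotSpot-based JVMs such as OpenJDK 8 commonly report to the GC type. The table and the classify_gc helper are my own illustration, not part of any JDK API, and the ParNew entry assumes it is paired with CMS, its usual partner.

```python
# Collector MBean names commonly reported by HotSpot-based JVMs (e.g. OpenJDK 8).
# This table is an illustrative assumption, not an official or exhaustive list.
GC_BY_BEAN_NAME = {
    "Copy": "serial",
    "MarkSweepCompact": "serial",
    "PS Scavenge": "parallel",
    "PS MarkSweep": "parallel",
    "ParNew": "CMS",  # assumes ParNew is the young collector paired with CMS
    "ConcurrentMarkSweep": "CMS",
    "G1 Young Generation": "G1",
    "G1 Old Generation": "G1",
}


def classify_gc(bean_names):
    """Return the sorted list of distinct GC types for the given MBean names."""
    kinds = {GC_BY_BEAN_NAME.get(name, "unknown") for name in bean_names}
    return sorted(kinds)


if __name__ == "__main__":
    # Names as they would appear under java.lang > GarbageCollector in the MBean browser.
    print(classify_gc(["PS Scavenge", "PS MarkSweep"]))
```

If the result contains more than one type, or "unknown", check the JVM flags directly rather than trusting the table.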

MonoTouch Memory Use High

I have MonoDevelop 2.8 on top of MonoTouch 5 against the Xcode 4.2 SDK. I have been having memory issues with my iPhone app. I have been struggling to identify the cause, so I created a test app with a master-detail view. I made a minor modification to the root controller so that it shows 5 root items instead of the default 1. Each click on a root item pushes a new DetailViewController onto the navigation controller.
controller.NavigationController.PushViewController (DetailViewController, true);
In my detail view controller I've added logic that simply takes an input governing how many times a loop runs, plus a button that triggers the loop and makes a call to a REST-based service. Very minimal code changes from the default.
Just running the example and looking at it in Instruments, I seem to be up to 1.2 MB of live bytes. I then launch the detail view by touching items in the root view controller and I get up over 2 MB. Rotating the display or triggering the keyboard to open gets memory up near 3 MB. I navigate back in the controller and open a different view from the root view controller, and I can see the memory grow some more. Just moving in and out of views, without even triggering my custom code, I can get the memory use in Instruments over 3 MB. I've seen my app receive memory warnings at just over 3 MB before. My test detail view is very basic, with a text box, a label, and a button that all have outlets on them. I was under the impression I don't need to do anything special to have them cleaned up. However, I don't see live bytes drop in Instruments.
As an additional test, I added a Done button. When the done button is pressed I go and use RemoveFromSuperview() on each outlet, Dispose(), and then set it to null. I see the live bytes drop. But that doesn't do anything for me if the back navigation is used instead.
I'm curious whether anyone can verify my expectation of seeing memory drop. I'm not sure whether using Instruments to look at live bytes is even valid. I'd like to determine if my testing is valid and whether there are tips for reducing the memory footprint. Any links to best practices on reducing the memory footprint are appreciated, as I seem to be able to get the memory to climb and my app to start getting memory warnings just by navigating around between screens.

It's hard to comment without seeing the code for the test app. Is there any way you could submit a bug report to http://bugzilla.xamarin.com and attach your test project?
There's a developer on MonoTouch working hard to add additional smarts to the GC for MonoTouch for 5.2 that I'm sure would love to have more test cases.
I would also be very interested in looking over your test case.

Profiling partial programs in Linux

I have a program in which a significant amount of time is spent loading and saving data. I want to know how much time each function takes as a percentage of the total running time. However, I want to exclude the time taken by the loading and saving functions from the total time considered by the profiler. Is there any way to do so using gprof or any other popular profiler?

Similarly you can use
valgrind --tool=callgrind --collect-atstart=no --toggle-collect=<function>
Other options to look at:
--instr-atstart=no # to avoid runtime overhead while not profiling
To get instruction-level stats:
--collect-jumps=yes
--dump-instr=yes
Alternatively you can 'remote control' it on the fly with callgrind_control, or annotate your source code (IIRC also with branch prediction stats) with callgrind_annotate.
kcachegrind is a marvellous visualization/navigation tool. I can hardly recommend it enough.

I would consider using something more modern than gprof, such as OProfile. When generating a report using opreport you can use the --exclude-symbols option to exclude functions you are not interested in.
See the OProfile webpage for more details; however for a quick start guide see the OProfile docs page.

Zoom from RotateRight offers a system-wide time profile for Linux. If your code spends a lot of time in I/O, that time won't show up in a time profile of the CPUs. Alternatively, if you want to account for time spent in I/O, try the "thread time profile".

For a simple, basic solution, you might want to log data to a CSV file.
e.g. Format [functionKey,timeStamp\n]
... then load that up in Excel. Get the deltas, and then include or exclude functions using IF formulas. Nothing fancy. On the upside, you could get some visualisations fairly cheaply.
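If you'd rather skip Excel, the same delta-and-exclude step can be sketched in a few lines of Python. This assumes each row is written when a function starts, so the time until the next row is attributed to that function; the sample log, the function names in it, and the time_shares helper are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log in the [functionKey,timeStamp] format described above.
# The final "end" row only marks when the last function finished.
SAMPLE_LOG = """load,0.0
compute,4.0
render,5.0
save,7.0
end,10.0
"""


def time_shares(log_text, exclude=()):
    """Return {functionKey: percent of total time}, ignoring excluded keys."""
    rows = [(row[0], float(row[1]))
            for row in csv.reader(io.StringIO(log_text)) if row]
    totals = defaultdict(float)
    # Delta between consecutive timestamps is charged to the earlier row's key.
    for (key, start), (_next_key, end) in zip(rows, rows[1:]):
        totals[key] += end - start
    kept = {k: v for k, v in totals.items() if k not in exclude}
    grand_total = sum(kept.values())
    if grand_total == 0:
        return {}
    return {k: 100.0 * v / grand_total for k, v in kept.items()}


if __name__ == "__main__":
    for key, pct in time_shares(SAMPLE_LOG, exclude=("load", "save")).items():
        print("%s: %.1f%%" % (key, pct))
```

Excluding the load and save keys before computing the total gives percentages relative to the remaining work only, which is what the question asks for.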
