How to get Eclipse MAT memory leak suspect report in JSON/XML format?

I am trying to add Java memory profiling to my DevOps pipeline, and I am using the Eclipse MAT command-line script for this purpose. When I provide the hprof file as input, it generates a leak suspects report. I then need to parse this HTML report and make a decision based on it.
This is cumbersome, and the parsing is not structured. I want either an XML/JSON report, or to see how the actual report is generated by Eclipse MAT. If I can find that, I can get the code and customize it to my needs. Any suggestions?

Eclipse Memory Analyzer is open source, and all the code is here: https://git.eclipse.org/c/mat/org.eclipse.mat.git
The leak suspects code is in
https://git.eclipse.org/c/mat/org.eclipse.mat.git/tree/plugins/org.eclipse.mat.api/src/org/eclipse/mat/inspections/FindLeaksQuery.java
and
https://git.eclipse.org/c/mat/org.eclipse.mat.git/tree/plugins/org.eclipse.mat.api/src/org/eclipse/mat/inspections/LeakHunterQuery.java
The leak suspects report is clean HTML which passes HTML validation, which would aid parsing.
Your request for an XML/JSON report is a reasonable one, but such a report is not currently provided by MAT. There is provision for one to be added, using the rendering extension point http://help.eclipse.org/latest/topic/org.eclipse.mat.ui.help/doc/org_eclipse_mat_report_renderer.html . There are already HTML, CSV and TXT renderers, so a JSON renderer could be added through the same extension point. There are then design decisions to be made about how to represent trees and tables in JSON, and how to convert HTML from, say, the yellow boxes of the leak suspects report to JSON.
As I know of other people who want a JSON-formatted report, it may be worthwhile following it up more formally with the MAT community, either on the forum https://www.eclipse.org/forums/index.php?t=thread&frm_id=186 or on Bugzilla, or on the developer mailing list if you are proposing to help write code.
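Until a structured renderer exists, the HTML can be converted to JSON outside MAT. The following is a minimal Python sketch, not part of MAT. It assumes only that the report introduces each section with an ordinary heading element (h1 to h6) and groups the text that follows under that heading. The names `SectionExtractor` and `report_to_json` are made up for illustration, and the real report's markup should be checked before relying on this.

```python
import json
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Group the visible text of an HTML page under its nearest preceding heading."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sections = []       # list of {"title": str, "text": [str, ...]}
        self._in_heading = False
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append({"title": "", "text": []})

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        if self._in_heading:
            current = self.sections[-1]
            current["title"] = (current["title"] + " " + text).strip()
        elif self.sections:
            # Text that appears before the first heading is ignored.
            self.sections[-1]["text"].append(text)


def report_to_json(html):
    """Return a JSON string with one entry per heading found in the HTML."""
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.sections, indent=2)
```

A pipeline step could read the report's HTML file into a string, call `report_to_json`, and then decide based on the section titles and text. Tables and trees are flattened to plain text here, which is exactly the design question raised above for a proper JSON renderer.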

Another option is to implement your own extension: https://wiki.eclipse.org/MemoryAnalyzer/Extending_Memory_Analyzer#Calling_One_Query_from_Another and call the leak_hunter query from it. Then read through the details of the returned IResult and transform those into a JSON object.

Related

Producing PDF files in NodeJS - simpler than puppeteer/chromium but a bit less basic than low level libraries

I'd like to be able to produce PDF files in NodeJS.
Currently, we use puppeteer. We need to produce highly designed documents and so puppeteer/chromium gives me the ability to create a complex layout in HTML with the added benefit of also having the HTML version of the file.
It's great for relatively small documents where design is key.
The problem is when I try to produce long report documents. These documents do not require elaborate design. These are pretty much just a header with some information, and then a simple table with lots of records that stretch far as the eye can see, so they tend to be large. Like, really really large.
When I try using puppeteer for that, it pretty much just crashes and burns, because loading such huge layouts into the underlying browser is just too much.
Currently I do "stitching". I create the document by having puppeteer create the doc in parts, and then I connect all those "doclets" into one using PDFKit.
But then I have problems like when one "doclet" ends and a new one begins, there are blank lines. (partially empty pages for no good reason from the perspective of a customer viewing it)
What I'm looking for is a library that has basic layout functionality but that doesn't use a browser (or perhaps uses something lightweight).
Problem is that libraries like PDFkit and pdf-lib seem to be too low level.
I'm going to literally have to "draw" the documents by telling it where exactly the text should be.
If I want tables, I'm going to straight up have to draw rectangles and stuff.
Having to create all of this manually would be a nightmare.
All I want is the ability to create simple layouts (tables, titles, text wrapping, background color) without having to use a library that just launches chromium.
Please, let me know if you know of any such option.
Thanks in advance!
What I tried:
PDFkit/pdf-lib - too low level. Unless I'm getting something wrong, there doesn't seem to be a way to create word wrapped layouts with basic tables.
jsPDF - doesn't seem to be able to use the HTML functionality on the server (I think to get it to work I'd have to let it use a browser...? If so, that doesn't really help).
Puppeteer/other libraries that pilot a browser - well, it uses a browser so a no-go for large docs.
Praying to Odin - No luck so far.

What are known limitations of borb related to PDF versions?

I'm new to borb, which seems to me a very promising Python package.
Trying to load a small sample of PDF documents, just to get some hands-on experience, I've found that borb can open some of them without problems; in some cases I got messages such as "Unable to process XMP meta-data"; and in yet other cases I got assertion errors.
Thus, before posting specific issues, I'm looking for information about current limitations of borb, with reference to PDF versions, and on tools I could use first to detect files to be considered invalid PDF documents. Thanks.
I'm using borb release v2.0.20, just cloned from GitHub, and Python 3.6.5 on Windows 10.
Disclaimer: I am Joris Schellekens, author of the aforementioned library borb.
The problem is that the PDF spec (ISO-32000) leaves some room for interpretation at various points throughout. That means some PDF libraries will interpret the spec in a given way, and produce documents that may not always be compliant according to other tools.
borb tends to be very strict when it comes to PDF parsing. As soon as an error is detected, it will throw the stacktrace right back at you, whereas other PDF software (e.g. Adobe Reader) tends to be much more forgiving in terms of what it accepts as input PDF documents.
Although I certainly understand your frustration at being unable to process what you perceive to be "perfectly good PDF documents", I assure you that processing them might lead to even more issues.
I know for instance that there are cases where Adobe Reader tries to correct a bad PDF document, and as a result ends up corrupting the signatures in the document (very undesirable).
If you experience issues, and you can share the PDF, feel free to log a ticket on the GitHub repository.
Off the top of my head, the current limitations of borb are:
signatures
encrypted PDF documents
XREF not found
some images with transparent pixels

Merging PDF files in Haskell

The Preview application on the Mac allows one to merge multiple PDF files, although the functionality is rather obscure. I'm writing a utility in Haskell that needs to perform a similar task, that is, merge an arbitrary number of PDF files into one new file.
Does anyone have a suggestion as to where to start with this? Obviously if there's a library on Hackage that will do most of the work out of the box that would be ideal, but if not, then some pointers about where to start would be very much appreciated.
I'm working on a pdf library that supports parsing and generating. It is low level; higher-level tools are still on the todo list (because it is hard to design a good high-level API).
Here is an example of unpacking and decrypting a PDF file. It is easy to implement PDF merging, but you need to be familiar with PDF internals.
ADDED:
I created a basic example of merging PDF files in Haskell. It is 150 lines of code in total, but it lacks a few features (see the comments at the top of the file). They are easy to add, so let me know if you are interested.
The PDF file format isn't that complicated. Adobe has an official specification document for it somewhere. Essentially a PDF file contains a set of numbered "objects". You'd have to get all the objects from each PDF file, renumber them so they're unique, and then you need to fiddle with the page index so all the pages actually show up.
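To make the renumbering step concrete, here is a toy sketch in Python rather than Haskell. It does not read or write real PDF files: each "document" is an in-memory dictionary of numbered objects plus a list of page references, and `Ref`, `shift` and `merge` are hypothetical names invented for this illustration. A real merge would also have to rebuild the page tree, update each page's parent entry, and write a new cross-reference table.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Toy stand-in for an indirect reference to a numbered object."""
    num: int


def shift(value, offset):
    """Return a copy of value with every Ref inside it renumbered by offset."""
    if isinstance(value, Ref):
        return Ref(value.num + offset)
    if isinstance(value, dict):
        return {key: shift(item, offset) for key, item in value.items()}
    if isinstance(value, list):
        return [shift(item, offset) for item in value]
    return value


def merge(docs):
    """Merge toy documents shaped like {"objects": {num: value}, "pages": [Ref, ...]}.

    Object numbers are assumed to be positive, so adding the highest number
    used so far to each incoming number keeps all numbers unique.
    """
    merged_objects, merged_pages = {}, []
    offset = 0
    for doc in docs:
        for num, value in doc["objects"].items():
            merged_objects[num + offset] = shift(value, offset)
        merged_pages.extend(shift(doc["pages"], offset))
        offset = max(merged_objects, default=0)
    return {"objects": merged_objects, "pages": merged_pages}
```

Merging two one-page documents that each use object numbers 1 and 2 yields objects 1 to 4, with the second document's page pointing at its own renumbered content object.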
There appear to be a couple of packages on Hackage for writing PDF files, but I don't see much for reading them. You may like to look at the source code for pdfsplit for ideas. Also HPDF.

Embeddable customizable graph editor (Java, Flash, HTML+Javascript)

I have an application that uses intricate graph-like structure as a configuration. The application itself resembles a NetGraph- or netfilter firewall, thus graph nodes have types and properties (which correspond to operations) and they're interconnected with directed edges.
I'd like to have an easy-to-use configuration editor for my application that provides visualization and editing of the configuration as a graph.
In my dream scenario, application would receive this configuration as a file in one of popular graph formats (for example, TGF, DOT, GraphML, etc), parse it and use.
A few requirements (not really strict, I'm open to consider various options) - graph editor should be:
available to be embedded in web UI - i.e. implemented in Javascript/HTML, Flash or as a Java applet
able to load a TGF-style graph (i.e. without layout instructions, so nodes would have no coordinates) and lay it out in a somewhat decent way automatically
able to save this graph back
able to load/save using requests to HTTP server, not a file directly
customizable to make it work with strict set of node types (so that user can't just create arbitrary node type or arbitrary properties for a given node)
open-source
So far I've found yEd and its Flash version, Graphity - both look cool, but they aren't customizable (to strip them down to bare-bones functionality, i.e. creation of one of a few node types) and not open source, so embedding them anywhere promises to be somewhat painful.
Another option I'm considering is trashing the whole "visual editor" idea and making the user just write down bare TGF or DOT-style definitions in a plain text file, then visualize them for later checking using something like GraphViz. Is this a viable way to go?
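For a sense of how small the plain-text route can be, here is a rough Python sketch. It assumes the usual TGF layout (node lines of the form "id label", a line containing only "#", then edge lines of the form "from to") and treats each node label as the node type. The allowed types and all function names are made up for illustration.

```python
ALLOWED_TYPES = {"source", "filter", "sink"}  # hypothetical node types


def parse_tgf(text):
    """Parse TGF text into (nodes, edges); node labels are read as node types."""
    nodes, edges = {}, []
    in_edges = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":          # separator between node and edge sections
            in_edges = True
            continue
        if in_edges:
            parts = line.split(None, 2)
            if len(parts) < 2:
                raise ValueError(f"bad edge line: {line!r}")
            edges.append((parts[0], parts[1]))  # any edge label is ignored
        else:
            parts = line.split(None, 1)
            nodes[parts[0]] = parts[1] if len(parts) > 1 else ""
    return nodes, edges


def validate(nodes, edges):
    """Return a list of problems: unknown node types and dangling edges."""
    errors = []
    for node_id, node_type in nodes.items():
        if node_type not in ALLOWED_TYPES:
            errors.append(f"node {node_id}: unknown type {node_type!r}")
    for src, dst in edges:
        for end in (src, dst):
            if end not in nodes:
                errors.append(f"edge {src} -> {dst}: unknown node {end}")
    return errors


def to_dot(nodes, edges):
    """Render the graph as DOT text that GraphViz can lay out."""
    lines = ["digraph config {"]
    for node_id, node_type in nodes.items():
        lines.append(f'  "{node_id}" [label="{node_type}"];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

The strict set of node types is enforced by `validate` rather than by an editor, and the DOT output leaves layout entirely to GraphViz.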
Have you looked at InfoVis? In particular, the force-directed layout and editing may be applicable. Graph source data is analogous to DOT, albeit in json format. No layout info in the source data though.
EDIT: There's also ProtoVis which is similar.
hth.

How do you visualize logfiles in realtime?

Sometimes it might be useful, but mostly it just looks cool or impressive, to visualize log files (anything from HTTP requests to bandwidth usage to cups of coffee drunk per day).
I know about Visitorville, which I think looks a bit silly, and then there's gltail.
How do you "visualize" your log files in realtime?
There is also the logstalgia tool. Visualizes Apache logs. See http://code.google.com/p/logstalgia/ for more details and a youtube video.
You may take a look at Apache Chainsaw. This nifty tool accepts log input from nearly everywhere and has live filtering and coloring. I'm not sure whether it can read an already written log; it's been a while since I last used it (it was very useful for the prototyping phase of our JBoss server).
Google has released the Visualization API that is probably flexible enough to help you:
The Google Visualization API lets you access multiple sources of structured data that you can display, choosing from a large selection of visualizations. The Google Visualization API also provides a platform that can be used to create, share and reuse visualizations written by the developer community at large.
It requires some Javascript knowledge and includes Google Docs integration, Spreadsheet integration. Check out the Gallery for some examples.
You could take a look at this. http://www.intalisys.com. 3D realtime vis app
We use Awk and Perl scripts to parse the log files and create summary reports and "databases" (technically databases in that each row corresponds to a unique event with many columns of data about that event, but not stored in a traditional database format; we're moving in that direction). I like Awk because you can very quickly search for specific strings in the log files using regex, keep counters, gather data from the log file entries, and do all kinds of calculations with that data. Then use your favorite plotting software. We use Excel, mainly because that's what was here before I started this job. I prefer MATLAB and its open-source cousin, Octave, which can use gnuplot for plotting.
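The same pattern (match lines with a regex, keep counters, emit one row per event) can be sketched in Python. This is only an illustration of the idea, not our actual scripts, and the line format "<timestamp> <LEVEL> <message>" is a made-up example.

```python
import re
from collections import Counter

# Hypothetical line format: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def summarize(lines):
    """Return (counts per level, one row per matching event)."""
    counts = Counter()
    rows = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # skip lines that do not fit the expected format
        counts[match["level"]] += 1
        rows.append((match["ts"], match["level"], match["msg"]))
    return counts, rows
```

The rows can be written out as CSV and handed to whatever plotting software you prefer.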
I prefer Sawmill for visualizing data. You can basically throw any log file at it, and it will not only autodetect its structure, but will also decide how to analyze it. Even if you have a custom log file, you can still define what is analyzed and visualized, and how.
I mainly use R to visualize data, but I've heard of Orange, too.
Not sure if it fits the question, but I just released this:
numStepCsvLogVis - analyze logfile data in CSV format
It uses Python's matplotlib, is motivated by the need to visualize syslog data in context of debugging kernel circular buffer operation (and variables) in C; and it visualizes by using CSV file format as intermediary to the logfile data (I cannot explain it better in brief - take a look at the README for more detail).
It has a "step" player accessed in the terminal, and can handle "live" stdin input, but unfortunately I cannot get a better response than 1 FPS when the plot renders, so I wouldn't really call it "realtime" per se - but you can use it to eventually generate sonified videos of plot animations.
A simple solution is to use Logstalgia alongside the lightweight local-web-server.
First install the above. Then, from the root folder of your site visualise your logs in realtime with:
$ ws --log-format default | logstalgia -
You can also use SciTE, Notepad++ or another powerful text editor that has file-processing routines, so you can create a script that colorizes parts of the log or just deletes some unimportant lines from it.
