How can I create a Log4j 2 appender connected with a JTextPane?

I'm currently trying to make Log4j 2 log into a JTextPane. It should behave like STDERR or STDOUT in the NetBeans IDE console (including text style and color).
I know that I need to create an appender and connect it to the JTextPane; however, I don't know how to do that with Log4j 2.
Do you have any suggestions?
I appreciate your help,
marty

I have done this for Logback (with plain text only). The basic things you need to do are:
Implement your own Appender to receive the log events. Log4j 2 provides AbstractAppender, which will give you the baseline functionality.
Use an appropriate Layout to format the log event (this will depend on what type of Document you're using for your JTextPane).
Append the formatted text to the underlying Document for the JTextPane.
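To make these three steps concrete, here is a minimal sketch of such an appender written against the Log4j 2 core API (plain text only, no styling; the exact AbstractAppender constructor may differ between Log4j 2 versions, and error handling is kept to a minimum):

```java
import java.io.Serializable;

import javax.swing.JTextPane;
import javax.swing.SwingUtilities;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

import org.apache.logging.log4j.core.Filter;
import org.apache.logging.log4j.core.Layout;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.appender.AbstractAppender;
import org.apache.logging.log4j.core.layout.PatternLayout;

// Sketch of a custom appender that writes formatted log events into a JTextPane.
public class TextPaneAppender extends AbstractAppender {

    private final JTextPane textPane;

    public TextPaneAppender(String name, Filter filter,
                            Layout<? extends Serializable> layout,
                            JTextPane textPane) {
        // Fall back to a default PatternLayout if no layout is supplied.
        super(name, filter, layout != null ? layout : PatternLayout.createDefaultLayout(), true);
        this.textPane = textPane;
    }

    @Override
    public void append(LogEvent event) {
        // Step 2: format the event using the Layout.
        final String text = new String(getLayout().toByteArray(event));
        // Step 3: append to the underlying Document; Swing must be updated on the EDT.
        SwingUtilities.invokeLater(() -> {
            try {
                Document doc = textPane.getDocument();
                doc.insertString(doc.getLength(), text, null);
            } catch (BadLocationException ignored) {
                // appending at getLength() should always be a valid position
            }
        });
    }
}
```

You would still need to start() the appender and attach it to a logger, either programmatically (e.g. via the core Logger's addAppender method) or by registering the class as a Log4j 2 plugin so it can be referenced from the configuration file.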
A couple of other points:
Things will be simpler if you log plain text only, in which case you should use a JTextArea.
Presumably you will want to cap the amount of text in the Document. You can do this by checking the length on each append and cutting out the first X% using Document.remove when it exceeds the maximum length.
If you have frequent log operations, you should limit the frequency at which you append to the document and buffer the changes in between, to reduce the Swing update/repaint overhead; I typically use 3 Hz. This is also advisable when you have multiple log producer threads: although the Document.insertString method is thread-safe, it obtains a lock on the document before performing the update, which can result in quite a bit of contention. A sketch of this follows below.
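Here is one way that buffering could look, assuming plain text and a javax.swing.Timer flushing at roughly 3 Hz (the class name and the cap value are made up for illustration; it also trims the document as described in the previous point):

```java
import javax.swing.Timer;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Collects log text from any thread and flushes it to the Document at a fixed
// rate on the EDT, reducing repaint overhead and lock contention on the document.
public class BufferedDocumentAppender {

    private static final int MAX_LENGTH = 100_000; // arbitrary cap for illustration

    private final Document document;
    private final StringBuilder pending = new StringBuilder();

    public BufferedDocumentAppender(Document document) {
        this.document = document;
        // javax.swing.Timer fires on the EDT, so flush() runs on the right thread.
        new Timer(333, e -> flush()).start();
    }

    /** Called by the appender from any logging thread. */
    public synchronized void append(String text) {
        pending.append(text);
    }

    private synchronized void flush() {
        if (pending.length() == 0) {
            return;
        }
        try {
            document.insertString(document.getLength(), pending.toString(), null);
            // Cap the document length by cutting the oldest 20% when it grows too large.
            if (document.getLength() > MAX_LENGTH) {
                document.remove(0, document.getLength() / 5);
            }
        } catch (BadLocationException ignored) {
            // offsets are derived from getLength(), so this should not occur
        }
        pending.setLength(0);
    }
}
```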
I'd highly recommend referencing the documentation for this. I've never used Log4j 2, but the documentation looks quite straightforward. Similarly, the "Using Text Components" section of the Java Tutorials provides everything you need to know about the Swing side. Unfortunately I can't provide additional links here.

Related

Log4j - how to tokenize logs like Elasticsearch and group by values, filter and visualize them?

When you know the logging format/pattern, is there a way to visualize/classify/filter logs with a certain thread name/client id/... without using Elasticsearch? I want to show, for example, "this thread has these lines, ordered by timestamp, with these keywords". I don't want to search by thread names manually anymore.
I can manually grep, of course, but I want to visualize the logs in a better way. I can search them if I know the thread names, but I want to show all of the thread names without knowing them beforehand.
Also, ES marks session boundaries, but grep does not. You can see that one session/request is different from another because the session id has changed. A thread can handle several sessions over a period of time.
I have only seen this in ES so far. I think of it as a log tokenizer and classifier tool, but one that works on static log files which haven't been fed into ES.
Now I am trying to do this in Excel: adding ';' between tags, parsing them into columns, and classifying/filtering. In the screenshot, the thread name is filtered down to one. But I found this very hard to do when there are many logs, because separating by spaces doesn't work well and I need to manually add ';' or some other unique separator.
Log4j2 has a companion project to visualize logs: Chainsaw. It does not have a lot of features, but it might be enough for your needs.

How to create custom config files in OpenSMILE

I am trying to extract some features from an audio sample using OpenSMILE, but I'm realizing how difficult it is to set up a config file.
The documentation is not very helpful. The best I could do was run some of the sample config files that are provided, see what came out, and then go into the config file and try to determine where the feature was specified. Here's what I did:
I used the default feature set from the INTERSPEECH 2010 Paralinguistic Challenge (IS10_paraling.conf).
I ran it over a sample audio file.
I looked at what came out. Then I read the config file in depth, trying to find out where the feature was specified.
Here's a little markdown table showing the results of my exploration:
| Feature generated | instruction in the conf file |
|-------------------|---------------------------------------------------------|
| pcm_loudness | I see: 'loudness=1' |
| mfcc | I see a section: [mfcc:cMfcc] |
| lspFreq | no matches for the text 'lspFreq' anywhere |
| F0finEnv | I see 'F0finalEnv = 1' under [pitchSmooth:cPitchSmoother] |
What I see is 4 different features, each generated by a different instruction in the config file. Well, for one of them there was no discernible instruction in the config file that I could find. With no pattern, intuitive syntax, or apparent system, I have no idea how I can eventually figure out how to specify the features I want to generate.
There are no tutorials, no YouTube videos, no Stack Overflow questions, and no blog posts out there explaining how this can be done, which is really surprising since this is obviously a huge part of using OpenSMILE.
If anyone finds this, please, can you advise me on how to create custom config files of OpenSMILE? Thanks!
Thanks for your interest in openSMILE and your eagerness to build your own configuration files.
Most users in the scientific community actually use openSMILE for its pre-defined config files for the baseline feature sets, which in version 2.3 are even more flexible to use (more command-line options for outputting to different file formats, etc.).
I admit that the documentation provided is not as good as it could be. However, openSMILE is a very complex piece of software with a lot of functionality, of which only the most important parts are currently well documented.
The best starting point would be to read the openSMILE book and the SIG'MM tutorials, all referenced at http://opensmile.audeering.com/. The book contains a section on how to write configuration files. The next important element is the online help of the binary:
SMILExtract -L lists the available components
SMILExtract -H cComponentName lists all options which a given component supports (and thus also features it can extract) with a short description for each
SMILExtract -configDflt cComponentName gives you a template configuration section for the component with all options listed and defaults set
Due to the architecture of openSMILE, which is centered on incremental processing of all audio features, there is (at least not yet) no easy syntax to define the features you want. Rather, you define the processing chain by adding components:
data sources will read in data (from audio files, csv files, or microphone, for example),
data processors will do signal processing and feature extraction in individual steps; for each step there is a data processor (for extracting MFCC, for example: windowing, window function, FFT, magnitudes, mel-spectrum, cepstral coefficients).
data sinks will write data to output files or send results to a server etc.
You connect the components via the "reader.dmLevel" and "writer.dmLevel" options. These define a name of a data memory level that the components use to exchange data. Only one component may write to one level, i.e. writer.dmLevel=levelName defines the level and may appear only once. Multiple components can read from this level by setting reader.dmLevel=levelName.
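As an illustration of this chaining, a configuration fragment could look like the following (the instance names and level names here are made up; this is a sketch, not a complete runnable configuration):

```
[componentInstances:cComponentManager]
instance[mfcc].type = cMfcc
instance[func].type = cFunctionals

[mfcc:cMfcc]
; read from a level that some upstream component has written
reader.dmLevel = melspec
; define a new level; only this component may write to it
writer.dmLevel = mfcc

[func:cFunctionals]
; any number of components may read from the same level
reader.dmLevel = mfcc
writer.dmLevel = functionals
```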
In each component you then set the options to enable computation of features and set parameters for this. To answer your question about lspFreq: This is probably enabled by default in the cLsp component, so you don't see an explicit option for it. For future versions of openSMILE the practice of setting all options explicitly will and should be followed more tightly.
The names of the features in the output will be automatically defined by the components. Often each component adds a part to the name, so you can infer the full chain of processing from the name. The options nameAppend and copyInputName (available to most data processors) control this behaviour, although some components might internally override them or change the behaviour a bit.
To see the names (and other info) for each data memory level, including e.g. which features a component in the configuration produces, you can set the option "printLevelStats=5" in the section of componentInstances:cComponentManager.
As everything in openSMILE is built for real-time incremental processing, each data memory level has a buffer, which by default is a ring buffer to keep the memory footprint constant when the application runs for a long time.
Sometimes you might want to summarise features over a window of a given length (e.g. with the cFunctionals component). In this case you must ensure that the buffer size of the input level to this component is large enough to hold the full window. You do this via the following options:
writer.levelconf.isRb = 1/0 : sets the type of buffer to ring buffer (1) or fixed-size buffer (0)
writer.levelconf.growDyn = 1/0 : sets the buffer to grow dynamically if more data is written to it (1)
writer.levelconf.nT = sets the size of the buffer in frames. Alternatively you can use bufferSizeSec=x to set the size in seconds, which is converted to frames automatically.
In most cases the sizes will be set correctly automatically. Subsequent levels also inherit the configuration from the previous levels. Exceptions are when you set a cFunctionals component to read the full input (e.g. to produce only one feature vector at the end of the file), in which case you must use growDyn=1 on the level that the functionals component reads from, or when you use a variable framing mode (see below).
The cFunctionals component provides frameMode, frameSize, and frameStep options. frameMode can be full (one vector produced at the end of the input/file), list (specify a list of frames), var (receive messages, e.g. from a cTurnDetector component, that define frames on-the-fly), or fix (fixed-length window). Only in the case of fix does frameSize set the size of this window and frameStep the rate at which the window is shifted forward. In the case of fix the buffer size of the input level is set correctly automatically; in the other cases you have to set it manually.
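Putting the buffer options and the framing mode together, a hedged fragment for summarising over the whole input might look like this (instance and level names again made up):

```
[mfcc:cMfcc]
writer.dmLevel = mfcc
; let this level grow so the cFunctionals component below can read the full input
writer.levelconf.isRb = 0
writer.levelconf.growDyn = 1

[func:cFunctionals]
reader.dmLevel = mfcc
writer.dmLevel = functionals
; one summary vector at the end of the input/file
frameMode = full
```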
I hope this helps you to get started! With every new openSMILE release we at audEERING are trying to document things a bit better and unify things through various components.
We also welcome contributions from the community (e.g. anybody willing to write a graphical configuration file editor where you drag/drop components and connect them graphically? ;)) - although we know that more documentation would make this easier. Until then, you always have the source code to read ;)
Cheers,
Florian

Millions of rows in GUI

I would like to implement a GUI handling a huge number of rows and I need to use GTK in Linux.
I started having a look at GtkTreeView with lists, but I don't think that adding millions of lines directly to that widget will lead to a GUI that doesn't slow down the application.
Do you know whether there is a GTK widget already in place for this problem, or do I have to handle the window frame that must display those lines myself? If necessary I would draw the data directly using GtkDrawingArea (essentially writing a new widget).
Any suggestions about GTK topics or projects I could look at as a starting point for my research?
As suggested in the comments, you can use a Cell Data Func and get the displayed data under control. But I have another idea: millions of lines are much, much more than any amount of information a human user can see and understand. So maybe a better, more usable and user-friendly solution is to display the data in a way the users can more easily navigate.
Imagine opening a huge hierarchy, scrolling down, and forgetting which top-level items you opened.
An example of a possible solution: have a combo box which allows the user to choose some filter or category; this can reduce the data to a reasonable amount that the user can more easily navigate and, if necessary, build a mental model of.
NOTE: As far as I know, GtkTreeView doesn't support sorting/filtering and drag-n-drop at the same time, so if you want to use both features, I suggest you use the existing drag-n-drop functionality (otherwise very complicated to implement by hand) and implement your own sorting/filtering.

Is there a way to download all the statistics on events at once from Flurry?

We are bumping into limitations with Flurry. We use events and parameters to track some gameplay info (like the number of KOs per map), but 1/ the limit of 15 parameters per event is a problem and 2/ the visualisation is not good (for instance, KOs per map are shown by map, so we have to open each event one after another).
We are trying to build a better visualisation with excel using the CSV files provided by Flurry, but then again we need to download the 50+ CSV files and it's really not convenient.
Is there a way to get all the information in one CSV or to get the information another way?
As a side note Flurry support is not answering any of our emails. :(
thanks for your help!
Have you tried checking out Playtomic instead? It sounds like it might match your problem better.
They have an API to access your data, so you should be able to access it in real time.
You might also want to check out www.parse.com

Integrating with 500+ applications

Our customers use 500+ applications and we would like to integrate these applications with ours. What is the best way to do that? These applications are time registration applications, and most of them can export to CSV or similar; some of them are actually home-brewed Excel sheets where time is registered.
The best idea so far is to create our own Excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3, where export.csv is the CSV file exported from the time registration applications. Can you see a better way to integrate with all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they are time registration applications, they will have some similarities, but I also assume that some will register the total time someone has worked in a whole month, while others will specify it for each day. If Excel is chosen, I believe that many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary, so basic validation must be undertaken. A good approach is also to make it transparent to the customers how our application understands their input, so that they are responsible for it.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change about once a year (due, for example, to a mistake someone made). If it is a proper standard time registration application, then I do not believe they are updated more often than every fifth year or so, as it is a very stable concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can surely trigger the data transfer. The users are often dedicated to the process, so they can be trained to do it, which means that they could make up to, say, 30 mouse clicks to run the integration each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, many of them should be able to undertake the integration themselves. We will, though, be able to assist them over the telephone. We cannot undertake the integration ourselves, because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.
You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.
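For illustration, the "common format plus one translator per source" idea could look roughly like this (all type and method names are invented for the sketch):

```java
import java.time.LocalDate;
import java.util.List;

// A neutral in-house format that every source export gets translated into.
public class CommonFormat {

    /** One row of the common format: who worked, when, and for how long. */
    public record TimeEntry(String employeeId, LocalDate date, double hoursWorked) {}

    /** Implemented once per source application (CSV export, home-brewed Excel sheet, ...). */
    public interface SourceConverter {
        List<TimeEntry> convert(List<String[]> rows);
    }
}
```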
I would also look at CSV and then use an OLEDB connection against the CSV file for importing.
If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser-based site that users can use to enter this information manually. This is the fall-back: at the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use a database format into which each of these files needs to be converted, rather than a spreadsheet (e.g. something like Jet (MDB)). If you have non-Windows users then that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload, and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format. If you are that expert, then I would, on a case-by-case basis, write something that imports into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal, rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you will end up writing a converter for every Tom, Dick, and Harry format, and again whenever someone changes the source format.
With a multitude of data sources, mapping each one correctly to an intermediate format is not trivial. Regular expressions are good with a finite set of known data formats. Multiple passes can help when data is ambiguous without context (e.g. month and day fields when you have several days of data), and can also help defeat data entry errors. But since this data is connected to salaries, the transfer needs to be reliable.
An import configuration trick
Get the customer to make a set of training data in the application. It should have a "predefined unique date", and each subsequent data field should contain a number corresponding to the target data field in your application. On import, your application needs to recognise the predefined date, determine the unique translation required, display/save this "mapping key", and stop the import. E.g. if you expect "Duration hours" in field two, get the user to enter 2 in the relevant field, which might be "Attendance hours".
On subsequent runs, and with the mapping definition key, import becomes a fairly easy process of translation.
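A rough sketch of what recognising such a training row might look like (the sentinel date and the field numbering are illustrative, not a prescription):

```java
import java.util.HashMap;
import java.util.Map;

// Builds the "mapping key" from the customer's training row: the first field holds
// the predefined date, every other field holds the number of the target field in
// our application that this source column should map to.
public class MappingKeyBuilder {

    private static final String PREDEFINED_DATE = "1901-01-01"; // illustrative sentinel

    /** Returns a source-column -> target-field mapping, or null if this is an ordinary row. */
    public static Map<Integer, Integer> buildMapping(String[] row) {
        if (row.length == 0 || !PREDEFINED_DATE.equals(row[0].trim())) {
            return null;
        }
        Map<Integer, Integer> sourceToTarget = new HashMap<>();
        for (int sourceCol = 1; sourceCol < row.length; sourceCol++) {
            // e.g. the customer typed "2" into their "Attendance hours" column,
            // meaning it maps to our field number 2 ("Duration hours").
            sourceToTarget.put(sourceCol, Integer.parseInt(row[sourceCol].trim()));
        }
        return sourceToTarget;
    }
}
```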
Note on terms
"predefined date" - must be historical, say founding date of your company?, might need to be in PC clock settable range.
"mapping key" - could be string of hex digits and nybble based so tractable to workout
The entered code can be extended to signify required conversions ie customer's application has durations in days and your application expects it in hours.
Interfacing with Windows programs (in order of increasing fragility)
Ye Olde saving as CSV file
Print to an operating system printer that is set up as a text file/PDF, then scavenge the data out of that
Extract data via the application's interface control, typically ActiveX for several Windows programs (e.g. like Matlab's Spreadsheet Link)
Read the native file format (xls), e.g. like Matlab's xlsread
Add an additional intermediate spreadsheet that has extended cell references, e.g. ='[filename]rawdata'!$A$3
Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004
Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or, even better, an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else, and so can generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing Excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.
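For illustration, such a simple XML format might look like the following (the element names and the schema file are assumptions for the sketch, not an existing standard):

```xml
<!-- One file per customer per month; validated against timeReport.xsd before import. -->
<timeReport month="2009-06">
  <employee id="E-0042" name="Jane Doe">
    <entry date="2009-06-01" hours="7.5"/>
    <entry date="2009-06-02" hours="8.0"/>
  </employee>
</timeReport>
```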
If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion, assuming you have 500+ data formats. The problem isn't on your side. You need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them an Excel template to help them along.
