How to create custom config files in OpenSMILE - audio

I am trying to extract some features from an audio sample using OpenSMILE, but I'm realizing how difficult it is to set up a config file.
The documentation is not very helpful. The best I could do was run some of the sample config files that are provided, see what came out, and then go into the config file and try to determine where the feature was specified. Here's what I did:
I used the default feature set used from The INTERSPEECH 2010 Paralinguistic Challenge (IS10_paraling.conf).
I ran it over a sample audiofile.
I looked at what came out. Then I read the config file in depth, trying to find out where the feature was specified.
Here's a little markdown table showing the results of my exploration:
| Feature generated | instruction in the conf file |
|-------------------|---------------------------------------------------------|
| pcm_loudness | I see: 'loudness=1' |
| mfcc | I see a section: [mfcc:cMfcc] |
| lspFreq | no matches for the text 'lspFreq' anywhere |
| F0finEnv | I seeF0finalEnv = 1 under [pitchSmooth:cPitchSmoother] |
What I see, is 4 different features, all generated by a different instruction in the config file. Well, for one of them, there was no disconcernable instruction in the config file that I could find. With no pattern or intuitive syntax or apparent system, I have no idea how I can eventually figure out how to specify my own features I want to generate.
There are no tutorials, no YouTube videos, no StackOverflow question and no blog posts out there talking about how this could be done. Which is really surprising since this is obviously a huge part of using OpenSMILE.
If anyone finds this, please, can you advise me on how to create custom config files of OpenSMILE? Thanks!

thanks for your interest in openSMILE and your eagerness to build your own configuration files.
Most users in the scientific community actually use openSMILE for its pre-defined config files for the baseline feature sets, which in version 2.3 are even more flexible to use (more commandline options to output to different file formats etc.).
I admit that the documentation provided is not as good as it could be. However, openSMILE is a very complex piece of Software with a lot of functionality, of which only the most important parts are currently well documented.
The best starting point would be to read the openSMILE book and the SIG'MM tutorials all referenced at http://opensmile.audeering.com/ . It contains a section on how to write configuration files. The next important element is the online help of the binary:
SMILExtract -L lists the available components
SMILExtract -H cComponentName lists all options which a given component supports (and thus also features it can extract) with a short description for each
SMILExtract -configDflt cComponentName gives you a template configuration section for the component with all options listed and defaults set
Due to the architecture of openSMILE, which is centered on incremental processing of all audio features, there is (at least not yet) no easy syntax to define the features you want. Rather, you define the processing chain by adding components:
data sources will read in data (from audio files, csv files, or microphone, for example),
data processors will do signal processing and feature extraction in individual steps (windowing, window function, FFT, magnitudes, mel-spectrum, cepstral coefficients (MFCC), for example for extracting MFCC); for each step there is a data processor.
data sinks will write data to output files or send results to a server etc.
You connect the components via the "reader.dmLevel" and "writer.dmLevel" options. These define a name of a data memory level that the components use to exchange data. Only one component may write to one level, i.e. writer.dmLevel=levelName defines the level and may appear only once. Multiple components can read from this level by setting reader.dmLevel=levelName.
In each component you then set the options to enable computation of features and set parameters for this. To answer your question about lspFreq: This is probably enabled by default in the cLsp component, so you don't see an explicit option for it. For future versions of openSMILE the practice of setting all options explicitly will and should be followed more tightly.
The names of the features in the output will be automatically defined by the components. Often each component adds a part the the name, so you can infer from the name the full chain of processing. The options nameAppend and copyInputName (available to most data processors) control this behaviour, although some components might internally override them or change the behaviour a bit.
To see the names (and other info) for each data memory level, including e.g. which features a component in the configuration produces, you can set the option "printLevelStats=5" in the section of componentInstances:cComponentManager.
As everyhting in openSMILE is built for real-time incremental processing, each data memory level has a buffer, which by default is a ring buffer to keep memory footprint constant when the application runs for a longer time.
Sometimes you might want to summarise features over a window of a given length (e.g. with the cFunctionals component). In this case you must ensure that the buffer size of the input level to this component is large enough to hold the full window. You do this via the following options:
writer.levelconf.isRb = 1/0 : sets type of buffer to ringbuffer (1) or fixed size buffer
writer.levelconf.growDyn = 1/0 : sets the buffer to dynamically grow if more data is written to it (1)
writer.levelconf.nT = sets the size of the buffer in frames. Alternatively you can use bufferSizeSec=x to set the size size in seconds and convert to frames automatically.
In most cases the sizes will be set correctly automatically. Subsequent levels also inherit the configuration from the previous levels. Exceptions are when you set a cFunctionals component to read the full input (e.g. only produce one feature at the end of the file), the you must use growDyn=1 on the level that the functionals component reads from, or if you use a variable framing mode (see below).
The cFunctionals component provides frameMode, frameSize, and frameStep options. Where frameMode can be full* (one vector produced at end of input/file), **list (specify a list of frames), var (receive messages, e.g. from a cTurnDetector component, that define frames on-the-fly), or fix (fixed length window). Only in the case of fix the options frameSize set the size of this window, and frameStep the rate at which the window is shifted forward. In case of fix the buffer size of the input level is set correctly automatically, in the other cases you have to set it manually.
I hope this helps you to get started! With every new openSMILE release we at audEERING are trying to document things a bit better and unify things through various components.
We also welcome contributions from the community (e.g. anybody willing to write a graphical configuration file editor where you drag/drop components and connect them graphically? ;)) - although we know that more documentation will make this easier. Until then, you always have to source code to read ;)
Cheers,
Florian

Related

How do I analyze a bunch of profiles

I have a bunch of profiles about my application(200+ profiles per week), I want to analyze them for getting some performance information.
I don't know what information can be analyzed from these files, maybe
the slowest function or the hot-spot code in the different release versions?
I have no relevant experience in this field, has anyone had that?
Well, it depends on the kind of profiles you have, you can capture number of different kinds of profiles.
In general most people are interested in the CPU profile and heap profile since they allow you to see which functions/lines use the most amount of CPU and allocate the most memory.
There are a number of ways you can visualize the data and/or drill into them. You should take a look at help output of go tool pprof and the pprof blog article to get you going.
As for the amount of profiles. In general, people analyze them just one by one. If you want an average over multiple samples, you can merge multiple samples with pprof-merge.
You can substract one profile from another to see relative changes. By using the -diff_base and -base flags.
Yet another approach can be to extract percentages per function for each profile and see if they change over time. You can do this by parsing the output of go tool pprof -top {profile}. You can correlate this data with known events like software updates or high demand and see if one functions starts taking up more resources than others(it might not scale very well).
If you are ever that the point where you want to develop custom tools for analysis, you can use the pprof library to parse the profiles for you.

How do i work with .osm files in terms of creation of nodes and edges so i can put it on a Djikstra Algorithm?

I have a MetroManila.osm file and i parse it so it can read roads only using this command: --keep = --keep-nodes=highway --keep-ways=. Is this the right command for filtering only the roads? What i want after parsing is to create a node where there's intersections or curves, or is it possible with just using the whole nodes in MetroManila.osm? Can i create an edge using it and how do i do it? Currently, i'm really lost on what to do since i'm fairly new on android studio and in osmdroid.
This hasn't really anything to do with Android or osmdroid.
For learning how to create a routing graph from an OSM file see the related answer at help.openstreetmap.org. Quoting from it:
parse all ways; throw away those that are not roads, and for the others, remember the node IDs they consist of, by incrementing a "link counter" for each node referenced.
parse all ways a second time; a way will normally become one edge, but if any nodes apart from the first and the last have a link counter greater than one, then split the way into two edges at that point. Nodes with a link counter of one and which are neither first nor last can be thrown away unless you need to compute the length of the edge.
(if you need geometry for your graph nodes) parse the nodes section of the XML now, recording coordinates for all nodes that you have retained.
Also take a look at routing in the OSM wiki to get some general hints and existing tools and libraries.
Regarding osmdroid: If you don't rely on offline routing then take a look at osmbonuspack, it supports online routing.
If you don't want to reinvent the wheel then just use one of the existing offline routing tools mentioned in the OSM wiki.

Adding big junks of custom data to jpg image file

I wonder if there is an obvious and elegant way to add additional data to a jpeg while keeping it readable for standard image viewers. More precisely I would like to embed a picture of the backside of a (scanned) photo into it. Old photos often have personal messages written on the back, may it be the date or some notes. Sure you could use EXIF and add some text, but an actuall image of the back is more preferable.
Sure I could also save 2 files xyz.jpg and xyz_back.jpg, or arrange both images side by side, always visible in one picture, but that's not what I'm looking for.
It is possible and has been done, like on Samsung Note 2 and 3 you can add handwritten notes to the photos you've taken as a image. Or some smartphones allow to embed voice recordings to the image files while preserving the readability of those files on other devices.
There are two ways you can do this.
1) Use and Application Marker (APP0–APPF)—the preferred method
2) Use a Comment Marker (COM)
If you use an APPn marker:
1) Do not make it the first APPn in the file. Every known JPEG file format expects some kind of format specific APPn marker right after the SOI marker. Make sure that your marker is not there.
2) Place a unique application identifier (null terminated string) at the start of the data (something done by convention).
All kinds of applications store additional data this way.
One issue is that the length field is only 16-bits (Big Endian format). If you have a lot of data, you will have to split it across multiple markers.
If you use a COM marker, make sure it comes after the first APPn marker in the file. However, I would discourage using a COM marker for something like this as it might choke applications that try to display the contents.
An interesting question. There are file formats that support multiple images per file (multipage TIFF comes to mind) but JPEG doesn't support this natively.
One feature of the JPEG file format is the concept of APP segments. These are regions of the JPEG file that can contain arbitrary information (as a sequence of bytes). Exif is actually stored in one of these segments, and is identified by a preamble.
Take a look at this page: http://www.sno.phy.queensu.ca/~phil/exiftool/#JPEG
You'll see many segments there that start with APP such as APP0 (which can store JFIF data), APP1 (which can contain Exif) and so forth.
There's nothing stopping you storing data in one of these segments. Conformant JPEG readers will ignore this unrecognised data, but you could write software to store/retrieve data from within there. It's even possible to embed another JPEG file within such a segment! There's no precedent I know for doing this however.
Another option would be to include the second image as the thumbnail of the first. Normally thumbnails are very small, but you could store a second image as the thumbnail of the first. Some software might replace or remove this though.
In general I think using two files and a naming convention would be the simplest and least confusing, but you do have options.

MPEG-DASH trick modes

Does anyone know how to do trick modes (rewind/forward at different speeds) with MPEG-DASH ?
DASH-IF Interoperability Points V3.0 states that it is possible.
the general idea is laid out in the document but the details are not specified.
A DASH segmenter should add tracks with a frame rate lower than normal to a specially marked AdaptationSet. Roughly you could say (even though in theory you should look at the exact profile/level thresholds) half frame rate is double playoutrate. A quarter frame rate is quadruple playoutrate.
All this is only an offer to the DASH client to facilitate ffwd. The client can use it but doesn't have to. If the DASH client doesn't understand the AdaptationSet at all it will disregard it due the EssentialProperty that marking it as track play AdaptationSet.
I can't see that fast rewind can be supported in any spec conforming way. You'd need to implement it according to your needs but with no expectation of interoperability.
You can try an indication on ISO/IEC 23009-1:2014(E) =>
Annex A
The client may pause or stop a Media Presentation. In this case client simply stops requesting Media Segments or parts thereof. To resume, the client sends requests to Media Segments, starting with the next Subsegment after the last requested Subsegment.
If a specific Representation or SubRepresentation element includes the #maxPlayoutRate attribute, then the corresponding Representation or Sub-Representation may be used for the fast-forward trick mode. The client may play the Representation or Sub-Representation with any speed up to the regular speed times the specified #maxPlayoutRate attribute with the same decoder profile and level requirements as the normal playout rate. If a specific Representation or SubRepresentation element includes the #codingDependency attribute with value set to 'false', then the corresponding Representation or Sub-Representation may be used for both fast-forward and fast-rewind trick modes.
Sub-Representations in combination with Index Segments and Subsegment Index boxes may be used for efficient trick mode implementation. Given a Sub-Representation with the desired #maxPlayoutRate, ranges corresponding to SubRepresentation#level all level values from SubRepresentation#dependencyLevel may be extracted via byte ranges constructed from the information in Subsegment Index Box. These ranges can be used to construct more compact HTTP GET request.

How can I create Log4j 2 appender connected with jTextPane?

I'm currently trying to make Log4j 2 log into a JTextPane. It should act like a STDERR or STDOUT in Netbeans IDE console (incl. text style - color).
I know that I need to create an appender and connect it with JTextPane, however I don't know how do it using Log4j 2.
Do you have any suggestions?
I appreciate your help,
marty
I have done this for Logback (with plain text only). The basic things you need to do are:
Implement your own Appender to receive the log events. Log4j 2 provides AbstractAppender, which will give you the baseline functionality.
Use an appropriate Layout to format the log event (will depend what type of Document you're using for your JTextPane.
Append the formatted text to the underlying Document for the JTextPane.
A couple of other points:
Things will be simpler if you log plain text only, in which case you should use a JTextArea.
Presumably you will want to cap the amount of text in the Document. You can do this by checking the length on each append and cutting out the first X% using Document.remove when it exceeds the maximum length.
If you have frequent log operations, you should limit the frequency at which you append to the document, and buffer the changes in between to reduce the swing update/repaint overhead. I typically use 3 Hz. This is also advisable when you have multiple log producer threads because although the Document.insertString method is thread-safe, it obtains a lock on the document before performing the update and can result in quite a bit of contention.
I'd highly recommend referencing the documentation for this. I've never used Log4j 2, but the documentation looks quite straight forward. Similarly, the "Using Text Components" section of the Java Tutorials provides everything you need to know about the Swing side. Unfortunately I can't provide additional links here.

Resources