lv2 plugin development - how to read MIDI time and note simultaneously - audio

I'm using Moony to prototype some components for a MIDI-only LV2 plugin I'm building. I've been trying to work out how to get some sort of song position value from a noteOn event, meaning that I need to know the beat and bar that the note belongs to when it invokes the midiResponder. Even a total time or total frame count would be enough to calculate it. The way Moony works with timeResponder and midiResponder callbacks means I can know the time position or the note, but not both simultaneously. Looking at the LV2 MIDI spec, it looks like only the event type, note number and velocity are properties of a noteOn event atom, so will I face the same issue when I rewrite this in C++ and integrate the code into my LV2 plugin? Is this right? Is there a workaround?

The spec you looked at there describes the payload of an LV2 MIDI event, which is literally MIDI. The time stamp is available, but it is in the (generic) Event which contains the MIDI. In this way, all events are time stamped (in frames relative to the buffer), regardless of their payload type.
If you were to write this in C++, you would get a single buffer of events which includes time changes, MIDI events, and whatever other events the plugin may support. So, all the information is available, but managing this state so it is available where you want is up to you.
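To illustrate the state management described above, here is a minimal sketch in Python rather than C++. It is not the LV2 API: the `(frame, kind, data)` tuples, the `state` dictionary and the function names are all hypothetical stand-ins for the events a plugin would read from its input buffer. The idea is simply to remember the most recent time position and extrapolate from it using each note's frame offset.

```python
def position_at(state, frame, sample_rate):
    """Extrapolate (bar, beat_in_bar) at `frame` from the last known position."""
    elapsed_beats = (frame - state["frame"]) / sample_rate * state["bpm"] / 60.0 * state["speed"]
    beats = state["bar_beat"] + elapsed_beats
    bar = state["bar"] + int(beats // state["beats_per_bar"])
    return bar, beats % state["beats_per_bar"]


def run_block(events, n_frames, sample_rate, state):
    """Walk one buffer of time-ordered events and tag each note-on with bar and beat.

    events: list of (frame, kind, data); kind is "time" or "midi" (hypothetical tags).
    state:  dict with bar, bar_beat, beats_per_bar, bpm, speed, frame; kept across calls.
    """
    notes = []
    for frame, kind, data in events:
        if kind == "time":
            # A time change: remember the new position and where in the buffer it applies.
            state.update(data)
            state["frame"] = frame
        elif kind == "midi":
            status, note, velocity = data
            if status & 0xF0 == 0x90 and velocity > 0:  # note-on
                bar, beat = position_at(state, frame, sample_rate)
                notes.append((bar, beat, note))
    # Roll the position forward to the end of the buffer so the next call starts at frame 0.
    bar, beat = position_at(state, n_frames, sample_rate)
    state.update(bar=bar, bar_beat=beat, frame=0)
    return notes
```

The state has to outlive a single call because a host may only send a time event when the transport changes, not in every buffer.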

Related

Partial playback using playbackDuration/startTime in Google Cast Chrome API (v3)

I am trying to cast just a snippet of a file (say, only from 00:00:30 to 00:00:40) from a Chrome sender to the default receiver. Reading the API reference documentation for LoadRequest, MediaInfo, and QueueItem, it seemed like I should be able to do this with some combination of these. In particular, the first queued item (loaded with CastSession#loadMedia) would need LoadRequest#currentTime set to the offset (30 seconds in my example above) and MediaInfo#duration set to the duration (10 seconds in my example), while subsequently queued items would set QueueItem#startTime and QueueItem#playbackDuration to the offset and duration (respectively).
However, this isn't happening in practice. I can confirm that the queue on the receiver has these fields set, but no matter how I go about this, I can't get the right snippet to play. When I add the first media item as described above, the receiver just plays the track from beginning to end, respecting neither the offset nor the duration. Since the combination of LoadRequest#currentTime and MediaInfo#duration is a bit odd, I tried using only the QueueItem method (add the first media item with autoplay = false, add another queue item, remove the first, and then start playing the queue). In this case, the offset was still not respected, and the duration ended up being (very strangely) the sum of startTime and playbackDuration (in addition, any subsequently queued items would load, and then "finish" playing without starting, which I also can't figure out).
Does anyone else have experience with this part of the API? Am I reading the documentation incorrectly and what I'm doing just isn't supported, or am I just piecing things together incorrectly?
I am not sure I understand why you are attempting to use a queue with multiple items. First, the duration field is not what you think it is; it is not the duration of playback that you want, it is the total duration of the media that is being loaded, regardless of where you start or stop the playback. In fact, in most cases, you don't even need to set it; the receiver gets the total duration of the media when it loads the item, at least in the majority of cases. The currentTime should work (if it does not, please file a bug on our SDK issue tracker); alternatively, you can load the media (with autoplay off), "seek" to the time you want, and then play. To stop at a certain point, you need to monitor the playback position and pause the playback when it reaches that point.
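The seek, play, monitor and pause sequence can be sketched as follows. This is only an illustration of the control flow in Python: `FakePlayer` is a made-up stand-in, not the Cast SDK, and in a real sender the polling loop would be replaced by whatever status updates the player reports.

```python
class FakePlayer:
    """Hypothetical stand-in for a remote media player (not the Cast API)."""

    def __init__(self, media_duration):
        self.media_duration = media_duration  # total length of the media, in seconds
        self.position = 0.0
        self.playing = False

    def seek(self, t):
        self.position = min(t, self.media_duration)

    def play(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def tick(self, dt):
        # Simulates playback advancing by dt seconds.
        if self.playing:
            self.position = min(self.position + dt, self.media_duration)


def play_snippet(player, start, length, poll_interval=0.25):
    """Seek to `start`, play, and pause once `start + length` has been reached."""
    stop_at = start + length
    player.seek(start)
    player.play()
    while player.position < stop_at and player.position < player.media_duration:
        player.tick(poll_interval)  # stand-in for waiting on the next position update
    player.pause()
    return player.position
```

Because the position is only checked once per update, playback can overshoot the stop point by up to one polling interval.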

Audio synthesis in Haskell using reactive-banana

I'm trying to get started with reactive-banana and want to create a simple synthesizer. There are lots of GUI examples, but I have trouble applying them to audio. Since audio APIs have callbacks that say "give me n samples of audio" I figure I should fire an event each callback (using the snd part of what newAddHandler returns) that contains the number of samples to generate, a pointer where they should be written, and timing info to coordinate MIDI events. The IO action passed to reactimate would then write the samples to the pointer. MIDI events would be similarly fired from another callback and also contain timing info.
This is where I get stuck however. I guess the audio signal is supposed to be a behaviour, but how do I "run" a behaviour for the right amount of time to obtain the samples? The right amount of course depends on MIDI events that might occur between two audio callbacks.
Presuming the intention is to do something live, I think firing an event for each callback is going to be extremely limiting. Most audio APIs expect that these callbacks will return very quickly (e.g. typically you would never call malloc or do blocking IO in one). Firing an FRP event may work for very simple processing, but I think if you try to do anything more complex you'll get dropouts in the audio stream.
I would expect a more viable approach is to fire events yourself (by a clock, or in response to GUI events, etc) and generate a buffer of audio, and have the callback API read from that buffer. I know that some audio APIs (e.g. portaudio) have a buffered mode which handles some of this automatically. Although if all you have is a callback API, it's not too hard to add a buffer on top of that.
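A buffer on top of a callback API could look roughly like the sketch below. It is written in Python for brevity, and the class and method names are invented for illustration. A real implementation would need a lock-free or otherwise real-time-safe structure; this version only shows the contract: the producer writes whenever it likes, and the callback always gets exactly the number of samples it asked for.

```python
from collections import deque


class SampleFifo:
    """Bounded FIFO between a sample producer and an audio callback (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = deque()

    def write(self, block):
        """Producer side: append as many samples as fit; return how many were accepted."""
        room = self.capacity - len(self.samples)
        accepted = block[:room]
        self.samples.extend(accepted)
        return len(accepted)

    def read(self, n):
        """Callback side: return exactly n samples, padding with silence on underrun."""
        available = min(n, len(self.samples))
        out = [self.samples.popleft() for _ in range(available)]
        out.extend([0.0] * (n - available))
        return out
```

Padding with silence keeps the callback fast when the producer falls behind, at the cost of an audible gap.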
To approach problems like this, I find it useful to take a semantic viewpoint: What is an audio signal? What type can I use to represent it?
Essentially, an audio signal is a time-varying amplitude
Audio = Time -> Double
which suggests the representation as a behavior
type Audio = Behavior Double
Then, we can use the <@> combinator to query the amplitude at a particular moment in time, namely whenever an event happens.
However, for reasons of efficiency, audio data is generally stored in blocks of 64 samples (or 128, 256). After all, processing needs to be fast and it's important to use tight inner loops. This suggests modeling audio data as a behavior
type Audio = Behavior (Vector Double)
whose values are 64-sample blocks of audio data and which changes whenever the time period corresponding to 64 samples is over.
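The step from "amplitude as a function of time" to "a sequence of fixed-size blocks" can be shown outside of FRP. The following is a plain Python sketch, not reactive-banana code; the function names, the 44100 Hz sample rate and the block size of 64 values are assumptions chosen for illustration.

```python
import math


def sine(freq):
    """An audio signal in the Time -> Double sense: amplitude as a function of seconds."""
    return lambda t: math.sin(2 * math.pi * freq * t)


def render_block(signal, block_index, block_size=64, sample_rate=44100):
    """Sample `signal` into the block_index-th block of block_size values."""
    start = block_index * block_size
    return [signal((start + i) / sample_rate) for i in range(block_size)]
```

Each call produces the value that the block-valued behavior would hold during that block's time period.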
Connecting to other APIs is done only after the semantic model has been clarified. In this case, it seems a good idea to write the audio data from the behavior into a buffer, whose contents is then presented whenever the external API calls your callback.
By the way, I don't know whether reactive-banana-0.8 is fast enough yet to be useful for sample-level audio processing. It shouldn't be too bad, but you may have to choose a rather large block size.

Streaming output from program to an arbitrary number of programs under Linux?

How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream, but the programs reading the stream do block if there's no output from the first-mentioned program?
I've been trying to Google around for a while now, but all I find is methods where the program does block if nothing is reading the stream.
How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream
Your requirements as stated cannot possibly be satisfied without some form of buffer.
The most straightforward option is to write the output to a file and let consumers read that file.
Another option is to have a ring buffer in the form of a memory-mapped file. As the capacity of a ring buffer is normally fixed, there needs to be a policy for dealing with slow consumers. Options are: block the producer; terminate the slow consumer; let the slow consumer somehow recover when it has missed data.
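As a rough illustration of the third policy (the slow consumer recovers after missing data), here is an in-memory sketch in Python. It is not memory-mapped and does no cross-process synchronization; the class and its methods are hypothetical. The producer never blocks and simply overwrites the oldest slot, while each consumer keeps its own cursor and is told how many items it missed.

```python
class BroadcastRing:
    """Fixed-size ring with one writer and any number of independent readers (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.written = 0  # total number of items ever published

    def publish(self, item):
        """Producer side: never blocks, overwrites the oldest slot when full."""
        self.slots[self.written % self.capacity] = item
        self.written += 1

    def read(self, cursor):
        """Consumer side: return (items, new_cursor, missed) for a reader at `cursor`."""
        oldest = max(0, self.written - self.capacity)  # oldest index still in the ring
        missed = max(0, oldest - cursor)               # items overwritten before being read
        start = max(cursor, oldest)
        items = [self.slots[i % self.capacity] for i in range(start, self.written)]
        return items, self.written, missed
```

A consumer that wants to block while there is no new data would wait until `written` moves past its cursor, for example on a condition variable, which this sketch leaves out.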
Many years ago I wrote something like what you describe for an audio stream processing app (http://hewgill.com/nwr/). It's on github as splitter.cpp and has a small man page.
The splitter program currently does not support dynamically changing the set of output programs. The output programs are fixed when the command is started.
Without knowing exactly what sort of data you are talking about (how large the data is, what format it is in, and so on), it is hard to come up with a concrete answer. Let's say, for example, you want a "ticker-tape" application that sends out information about share purchases on the stock exchange. You could quite easily have a server that accepts a socket from each application, starts a thread and sends the relevant data as it appears from the recorder at the stock market. I'm not aware of any "multiplexer" that exists today (but Greg's one may be a starting point). If you use (for example) XML to package the data, you could send the second half of a packet, and the client code would detect that it's not complete and throw it away.
If, on the other hand, you are sending out high detail live update weather maps for the whole country, the data is probably large enough that you don't want to wait for a full new one to arrive, so you need some sort of lock'n'load protocol that sets the current updated map, and then sends that one out until (say) 1 minute later you have a new one. Again, it's not that complex to write some code to do this, but it's quite a different set of code to the "ticker tape" solution above, because the packet of data is larger, and getting "half a packet" is quite wasteful and completely useless.
If you are streaming live video from the 2016 Olympics in Brazil, then you probably want yet another solution, as timing is everything with video, and you need the client to buffer, pick up key-frames, throw away "stale" frames, etc, etc, and the server will have to be different.

realtime midi input and synchronisation with audio

I have built a standalone app version of a project that until now was just a VST/audiounit. I am providing audio support via rtaudio.
I would like to add MIDI support using rtmidi but it's not clear to me how to synchronise the audio and MIDI parts.
In VST/audiounit land, I am used to MIDI events that have a timestamp indicating their offset in samples from the start of the audio block.
rtmidi provides a delta time in seconds since the previous event, but I am not sure how I should grab those events and how I can work out their time in relation to the current sample in the audio thread.
How do plugin hosts do this?
I can understand how events can be sample accurate on playback, but it's not clear how they could be sample accurate when using realtime input.
rtaudio gives me a callback function. I will run at a low block size (32 samples). I guess I will pass a pointer to an rtmidi instance as the userdata part of the callback and then call midiin->getMessage( &message ); inside the audio callback, but I am not sure if this is thread-sensible.
Many thanks for any tips you can give me
In your case, you don't need to worry about it. Your program should send the MIDI events to the plugin with a timestamp of zero as soon as they arrive. I think you have perhaps misunderstood the idea behind what it means to be "sample accurate".
As @Brad noted in his comment to your question, MIDI is indeed very slow. But that's only part of the problem... when you are working in a block-based environment, incoming MIDI events cannot be processed by the plugin until the start of a block. When computers were slower and block sizes of 512 (or god forbid, >1024) were common, this introduced a non-trivial amount of latency, which resulted in the arrangement not sounding as "tight". Therefore sequencers came up with a clever way to get around this problem. Since the MIDI events are already known ahead of time, these events can be sent to the instrument one block early with an offset in sample frames. The plugin then receives these events at the start of the block, and knows not to start actually processing them until N samples have passed. This is what "sample accurate" means in sequencers.
However, if you are dealing with live input from a keyboard or some sort of other MIDI device, there is no way to "schedule" these events. In fact, by the time you receive them, the clock is already ticking! Therefore these events should just be sent to the plugin at the start of the very next block with an offset of 0. Sequencers such as Ableton Live, which allow a plugin to simultaneously receive both pre-sequenced and live events, simply send any live events with an offset of 0 frames.
Since you are using a very small block size, the worst-case scenario is a latency of about 0.7 ms, which isn't too bad at all. In the case of rtmidi, the timestamp does not represent an offset which you need to schedule around, but rather the time at which the event was captured. But since you only intend to receive live events (you aren't writing a sequencer, are you?), you can simply pass any incoming MIDI to the plugin right away.
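Both points can be made concrete with a short sketch. It is in Python rather than C++, does not use rtmidi or rtaudio, and assumes a 44100 Hz sample rate, which the question does not state. The inbox queue and function names are hypothetical; in a real audio thread you would want a lock-free queue instead of `queue.Queue`.

```python
import queue


def block_latency_ms(block_size, sample_rate):
    """Worst-case wait before a live event reaches the start of the next block."""
    return 1000.0 * block_size / sample_rate


def drain_live_events(inbox):
    """Called at the start of each audio block: take everything received so far
    and hand it on with a sample offset of 0."""
    events = []
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            return events
        events.append((0, message))  # (offset_in_frames, raw MIDI bytes)
```

With 32-sample blocks at the assumed rate, `block_latency_ms(32, 44100)` comes out at roughly 0.73 ms, in line with the figure quoted above.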

drop/rewrite/generate keyboard events under Linux

I would like to hook into, intercept, and generate keyboard (make/break) events under Linux before they get delivered to any application. More precisely, I want to detect patterns in the key event stream and be able to discard/insert events into the stream depending on the detected patterns.
I've seen some related questions on SO, but:
either they only deal with how to get at the key events (key loggers etc.), and not how to manipulate the propagation of them (they only listen, but don't intercept/generate).
or they use passive/active grabs in X (read more on that below).
A Small DSL
I explain the problem below, but to make it a bit more compact and understandable, first a small DSL definition.
A_: for make (press) key A
A^: for break (release) key A
A^->[C_,C^,U_,U^]: on A^ send a make/break combo for C and then U further down the processing chain (and finally to the application). If there is no -> then there's nothing sent (but internal state might be modified to detect subsequent events).
$X: execute an arbitrary action. This can be sending some configurable key event sequence (maybe something like C-x C-s for emacs), or execute a function. If I can only send key events, that would be enough, as I can then further process these in a window manager depending on which application is active.
Problem Description
Ok, so with this notation, here are the patterns I want to detect and what events I want to pass on down the processing chain.
1. A_, A^->[A_,A^]: expl. see above, note that the send happens on A^.
2. A_, B_, A^->[A_,A^], B^->[B_,B^]: basically the same as 1. but overlapping events don't change the processing flow.
3. A_, B_, B^->[$X], A^: if there was a complete make/break of a key (B) while another key was held (A), X is executed (see above), and the break of A is discarded.
(it's in principle a simple statemachine implemented over key events, which can generate (multiple) key events as output).
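To pin down the intended behaviour independently of how events are captured, the three patterns can be written as a small state machine over (key, is_press) pairs. This is a sketch in Python; the function name and the `("action", held_key, tapped_key)` tuple standing in for $X are made up for illustration, and it says nothing about how to hook into X or the kernel.

```python
def rewrite_key_events(events):
    """Apply the three patterns to a stream of (key, is_press) pairs.

    Output items are either (key, is_press) pairs to pass down the chain, or
    ("action", held_key, tapped_key) where $X would be executed.
    """
    held = []         # keys currently down, in press order
    consumed = set()  # held keys whose release must be discarded (pattern 3)
    out = []
    for key, is_press in events:
        if is_press:
            if key not in held:  # ignore auto-repeat makes
                held.append(key)
            continue
        if key not in held:      # stray release, nothing to do
            continue
        earlier = held[:held.index(key)]  # keys pressed before this one and still down
        held.remove(key)
        if key in consumed:
            consumed.discard(key)         # pattern 3: break of the held key is dropped
        elif earlier:
            consumed.update(earlier)      # pattern 3: full tap while another key is held
            out.append(("action", earlier[-1], key))
        else:
            out.extend([(key, True), (key, False)])  # patterns 1 and 2: send on release
    return out
```

For example, the sequence A_, B_, B^, A^ yields a single action item and no key events, while A_, B_, A^, B^ yields a make/break pair for A followed by one for B.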
Additional Notes
The solution has to work at typing speed.
Consumers of the modified key event stream run under X on Linux (consoles, browsers, editors, etc.).
Only keyboard events influence the processing (no mouse etc.)
Matching can happen on keysyms (a bit easier), or keycodes (a bit harder). With the latter, I will just have to read in the mapping to translate from code to keysym.
If possible, I'd prefer a solution that works with both USB keyboards as well as inside a virtual machine (could be a problem if working at the driver layer, other layers should be ok).
I'm pretty open about the implementation language.
Possible Solutions and Questions
So the basic question is how to implement this.
I have implemented a solution in a window manager using passive grabs (XGrabKey) and XSendEvent. Unfortunately passive grabs don't work in this case as they don't correctly capture B^ in the second pattern above. The reason is that the converted grab ends on A^ and is not continued to B^. A new grab is converted to capture B if it is still held, but only after ~1 sec. Otherwise a plain B^ is sent to the application. This can be verified with xev.
I could convert my implementation to use an active grab (XGrabKeyboard), but I'm not sure about the effect on other applications if the window manager has an active grab on the keyboard all the time. X documentation refers to active grabs as being intrusive and designed for short term use. If someone has experience with this and there are no major drawbacks with longterm active grabs, then I'd consider this a solution.
I'm willing to look at other layers of key event processing besides window managers (which operate as X clients). Keyboard drivers or mappings are a possibility as long as I can solve the above problem with them. This also implies that the solution doesn't have to be a separate application. I'm perfectly fine to have a driver or kernel module do this for me. Be aware though that I have never done any kernel or driver programming, so I would appreciate some good resources.
Thanks for any pointers!
Use XInput2 to make the keyboard device floating, then monitor KeyPress and KeyRelease events on that device and use XTest to regenerate the KeyPress and KeyRelease events you want to pass on.

Resources