I'm working on the sound part of an interactive installation that would need an event to be triggered by osc an undefined number of times, making the sound linked to it overlaps instead of being rewinded and started again.
Would it be possible to do that without needing to make an array of loadings of the same sound?
I'm actually trying to do it with processing and minim library.
Do you think it would be easier to achieve it with another programming software? I've found myself in the same difficulties trying to do it with puredata. Any tip or clue would be extremely welcome.
Thanks a lot.
You will need multiple readers ([tabread~] resp [tabplay~] in Pd; i don't know about Processing/minim, but the same principle applies) to read the table multiple times (in parallel), where each one can be started separately.
However, you only need a single instance of your data array (e.g. [table]), as the various readers can access the same array independently.
Can you use Java libraries in Processing? Processing is built on Java, yes?
If you can, I have a library you can use, supporting a class I call AudioCue available via github. This is modeled on a Java Clip but with additional capabilities. It allows multiple, concurrent playback. AudioCue also has real time controls for volume, panning and playback speed, in case you want to play around with adding some more interactivity to your installation.
I would love to know if it can be used with Processing. Please follow up with me if you try this route. I'd like to see it done, and can possibly assist.
If Processing allows you to send PCM directly out for playback, then the basic algorithm is the store the audio data in an array, and create pointers or cursors (depending on your preferred terminology) that independently iterate through that array. This is the main basis of the algorithm I use for AudioCue, with the PCM being routed out via a Java SourceDataLine.
Related
I'm looking for a program that is able to recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. In my project, it would need to be realtime as the latency would not be a desired result. I tried using dictation software that would recognize words to trigger opening a file and that's the direction where I want to go, but instead of words I want it to be sounds and it would happen in realtime. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
Lastly (for audio), I would recommend picking a standard format for comparison. Make it as good as gets you decent results, and start high. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps, let's look forward to other answers.
I'm struggling to choose between a vast number of audio programming languages and APIs. I'm very (totally) new to audio programming so please bear with me.
Software
I need to be able to:
Alter volume of different sounds before outputting them to anything (these sounds can have a variety of different origins, for example mp3s and microphone input)
phase shift sounds
superimpose sounds that I have tweaked (as per items 1 and 2)
control the output to each of 8 channels independently of one another
make this all happen on Windows7
These capabilities need be abstracted by a graphical frontend I will probably make myself. What I want to be able to do is create 'sound sources' and move them around a 3D environment along either pre-defined trajectories and/or in relation to the movement of whoever is inside the rig. The reason I want to do pitch bending is so I can mess with red-shift stuff.
I don't want to have to construct full tracks before-hand and just play them. I want the sound that is played to depend on external input from sensors as well as what I am doing on the frontend.
As far as I know this means I cant use any existing full audio making app.
The Question
I've been looking around for for the API or language I should use and I have not turned up a blank, quite the opposite actually. I'm struggling to narrow down my search. A lot of my problem stems from the fact that I have no experience in audio programming.
So, does anyone know off-hand of an API or language that meets my criteria?
Hardware stuff and goals
(I left this until last because I'm not sure how relevant it is)
My goal is to make three rings of speakers at different heights and to have enough control over them to be able to simulate any number of 'sound sources' within the array. The idea is to have someone stand in the middle of the rig and be able to make it sound like there are lots of things moving around them. To get this working I'm planning on doing a little trig and using 8 channels of audio from my PC. The maths is pretty straight forward, it just the rest that I need to worry about
What I want to do next is attach a bunch of cameras to the thing and do some simple image recognition stuff to be able to 'attach sound sources' to different objects. Eg. If someone is standing in the right place it can be made to seem as though all red balls quack like a duck, and all orange ones moan hauntingly.
This is not to detract from Richard Small's answer, but to comment on some of the other options out there:
If you are looking for something higher-level with which you can prototype and develop this faster, you want max/msp or it's open source competitor puredata. These are designed for musicians who are technically minded, but not so much for programmers. As a result, you can build this sort of thing quickly and efficiently.
You also have some lower level options: PortAudio can handle your audio I/O, you would have to do the sound generation and effects and so on on your own or with other libraries. Cinder and OpenFramewoks both provide interfaces for audio, cameras, and other stuff for "creative programming". I'm afraid I don't know if they meet your full requirements, but they are powerful and popular for this sort of thing so I encorage you to look at them.
The two major ones these days tend to be
WWise
WWise Download Link
FMOD
FMOD Download Link
These two engines may even in fact be overkill for what you need, but I can almost guarantee that they will be capable of anything you require.
I'm developing a sort of a "advanced playback" audio application to aid music transcription. The idea is to allow the user to change the audio tempo/pitch, as well as select and possibly loop parts of the audio track. I've opted to use gstreamer for the time being. I have the scaletempo plugin in the pipeline to aid with changing the tempo. I am unsure as to what's the best way to do the looping.
From reading the docs it seems that I could get it done by performing a gst_element_seek on the scaletempo element and setting the *stop_type* and stop parameters, waiting for an EOS on the message bus, and then performing yet another seek etc.
Is there a better way to do it? Ideally I'd like to get smooth looping, though it's not a dealbreaker if I don't. The gstreamer docs contain mentions of a concept of "segments", but from glancing at the docs I still don't have any idea what they are or whether they're useful in my scenario.
Pointers to code in C/Python/Haskell/whatever are very much welcome.
I am using gstreamer and it's deinterlace method, there are several options for the method, (options).
Some are marked as bad and should not be used. But which should I use?
I am processing a video file and saving to a new file (i.e. not streaming), so I'm willing to sacrific CPU processing time to get good results. I also am not concerned about the output file size, and am willing to sacrific that for better results. I also don't care if I do (or don't) need to run this more than once.
I am digitizing old family VHS tapes, and noticed this when the small kids were moving/runing/jumping around. Since this is home made video tapes, there aren't many 'scenes' or special effects or fast moving cars. Just long continuous shots of people doing things.
So given that I don't care about lots of factors (processing time etc.), and the nature of the content, which is the best method to use?
I have come up with an idea for an audio project and it looks like Go is a useful language for implementing it. However, it requires the ability to apply filters to incoming audio, and Go doesn't appear to have any sort of audio processing package. I can use cgo to call C code, but every signal processing library I find uses C++ classes which cgo cannot handle. It looks like libsox may work. Are there any others?
What libsox can provide and what I need is to take an incoming audio stream and divide it into frequency bands. If I can do this while only reading the file once, then bonus! I am not sure if libsox can do this.
If you want to use a C++ library you could try SWIG, but you'll have to get it out of Subversion. The next release (2.0.1) will be the first released version to support Go. In my experience the Go support is still a little rough, but then again the library I tried to wrap is a monster.
Alternatively, you could still create your own bindings through cgo using the same method SWIG does, but it will be painful and tedious. The basic idea is that you first create a C wrapper, then let cgo create a Go wrapper around your C wrapper.
I don't know anything about signal processing or libsox, though. Sorry.
There is a relatively new project called ZikiChombo
which contains so far some basic DSP functionality geared toward audio, see here
The dsp part of the project has filters on its roadmap, but they are not yet there. On the other hand some infrastructure for implementing filters, such as real fft and block convolution is there. Meaning that if you want FIRs, and can compute the coefficients by some other means, you can run them via convolution in zc currently with sound in real time.
Basic filtering design support (FIR,Biquad), for example using an ideal filter as a starting point will be the next step for zc. There are numerous small self-contained open source projects for basic and more advanced FIR and IIR filter design, most notably Iowa Hills which might be more accessible than a larger project to compute filter coefficients outside of Go.
More advanced filtering such as Butterworth, and filters based on polynomial solving and the bilinear transform will take more time for zc.
There is also some software defined radio Golang projects with some code related to filtering, sorry don't have the links offhand but a search for the topic may lead you to them.
Finally, there is a gonum Fourier package which also supplies fft.
So Go is growing some interesting and potentially stuff in this domain, but still has quite a ways to go compared to older projects (which are mostly in C/C++, or perhaps with a Python wrapper via numpy for example).
I am using this pure golang repo to perform Fourier Transforms with good effect
https://github.com/mjibson/go-dsp
just supply the FFT call with a
import (
"github.com/mjibson/go-dsp/fft" // https://github.com/mjibson/go-dsp
)
var audio_wave []float64
// ... now populate audio_wave with your audio PCM samples
var complex_fft []complex128
// input time domain ... output frequency domain of equally spaced freq bins
complex_fft = fft.FFTReal(audio_wave)