I need to develop a program that toggles a particular audio track on or off when it recognizes a parrot scream or screech. The software would need to recognize a particular range of sounds and allow some variation within that range (as a parrot likely won't replicate its screeches EXACTLY each time).
Example: Bird screeches, no audio. Bird stops screeching for five seconds, audio track praising the bird plays. Regular chattering needs to be ignored completely, as it is not to be discouraged.
I've heard of Java libraries that have speech recognition with dictionaries built in, but the software would need to be taught the particular sounds that my particular parrot makes - not words or any random bird sound. In addition, as I mentioned above, it would need to allow for slight variation in the sound, as the screech will likely never be 100% identical to the recorded version.
What would be the best way to go about this/what language should I look into?
Edit: Alternatively (and perhaps this would be a simpler solution), is there a way to make the audio toggle based on the volume of the input? So it wouldn't matter what kind of sound the parrot makes, just how loud it is?
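A minimal sketch of the volume-based variant, in Python. It assumes the audio arrives as frames of float samples in the range -1 to 1; the class name `PraiseGate`, the 0.3 loudness threshold, and the way frames are delivered are all assumptions for illustration, and the threshold would have to be tuned so that ordinary chatter stays below it.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples (floats in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class PraiseGate:
    """Decides, from loudness alone, whether the praise track should play."""

    def __init__(self, threshold=0.3, quiet_seconds=5.0):
        self.threshold = threshold          # assumed cutoff for a "screech"
        self.quiet_seconds = quiet_seconds  # quiet time required before praising
        self.quiet_time = 0.0
        self.playing = False

    def update(self, frame, frame_seconds):
        """Feed one frame; returns True while the praise track should play."""
        if rms(frame) >= self.threshold:
            # Loud input: stop the track and restart the quiet timer.
            self.quiet_time = 0.0
            self.playing = False
        else:
            self.quiet_time += frame_seconds
            if self.quiet_time >= self.quiet_seconds:
                self.playing = True
        return self.playing
```

Each call to `update` would be fed the most recent microphone frame together with its duration; the return value says whether the track should be audible at that moment.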
This question seems to be closely related to voice recognition. I would recommend taking a look at this post: How to convert human voice into digital format?
Here is what I would like to achieve:
I like to play around with creating "new" software / hardware instruments.
Sound processing and creation is always managed by software. But one could play the instrument via ultrasonic distance sensor for example. Another idea is to start playback when someone interrupts the light of a photoelectric barrier and so on....
So the instrument would play common sounds, but has to be used in an unusual way. For example, the ultrasonic instrument would play a sound if it detects something within a certain distance. The sound could be manipulated, in pitch for example, as the distance gets smaller.
Basically, I would like to play back a sound sample and manipulate it in real time.
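As a rough sketch of that idea in Python (not tied to any particular sensor or audio library): one function maps a measured distance to a playback rate, and another resamples a list of samples at that rate using linear interpolation. The function names, the 10 to 100 cm range, and the maximum rate of 2.0 are assumptions chosen for illustration. Playing a sample faster raises its pitch and shortens it at the same time.

```python
def distance_to_rate(distance_cm, near_cm=10.0, far_cm=100.0, max_rate=2.0):
    """Map a sensor distance to a playback rate.

    Returns 1.0 at far_cm (or beyond) and max_rate at near_cm (or closer).
    """
    d = min(max(distance_cm, near_cm), far_cm)
    closeness = (far_cm - d) / (far_cm - near_cm)
    return 1.0 + closeness * (max_rate - 1.0)


def resample(samples, rate):
    """Return samples played back at `rate` times normal speed.

    Uses linear interpolation between neighbouring samples.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += rate
    return out
```

In a real-time setting the rate would be recomputed for every audio block from the latest sensor reading, so the pitch follows the hand or object as it moves.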
I guess I have to use WAV samples for this, right? And which programming language do you think fits best for this task?
Edited after Kevin's hint: please point me in the right direction - give me a hint where to start.
Thanks in advance
Since you're using the Processing tag, you can try Processing.
It comes with a sound library like Minim, or you can install Beads, which is great. There's actually a nice book on it: Sonifying Processing
You might find SuperCollider fun as well.
The main thing is: what are you comfortable with at the moment?
If Processing syntax looks intimidating, you can try a different programming paradigm like data flow. In that case you can use Pure Data (free, open source) or Max/MSP (very similar, but commercial). The idea is that rather than typing instructions, you connect boxes with wires, which is fun, and the examples are great too.
If you're into C++ there are plenty of libraries. On the creative side, there's a nice set of libraries called openFrameworks that's easy and fun to use. If this is your cup of tea, have a peek at Maximilian.
The bottom line is: there are multiple options to achieve the same task. Choose the best tool for you (based on your background) or try each and see what you like best.
You asked "And which programming language do you think fits best for this task?" - I would also suggest using Processing. I have used Processing to work with sounds previously, and in all cases I used Minim. It has many UGens to generate sounds programmatically.
Also, you want to integrate with some sensors. I'm not sure what types of sensors you will use, but Processing works well with different Arduino modules and sensors. Check this link for more direction.
Furthermore, you can export your project as an .exe or an executable .jar file. And the JS version (p5.js) works almost the same as the Java version.
I'm looking for a program that is able to recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. In my project, it would need to work in real time, as noticeable latency is not acceptable. I tried using dictation software that would recognize words to trigger opening a file, and that's the direction I want to go in, but instead of words I want it to recognize sounds, and it would happen in real time. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
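To make the idea of a tolerance concrete, here is a much simpler stand-in than a Hidden Markov Model, sketched in Python: slide a recorded template over the most recent window of samples and score each alignment with cosine similarity. This is not what speech recognizers do; the function names and the 0.9 tolerance are assumptions for illustration, and a practical version would more likely compare spectral features than raw samples.

```python
import math


def similarity(a, b):
    """Cosine similarity of two equal-length sample windows (1.0 = same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(window, template):
    """Slide template over window; return (best similarity, offset).

    Returns (0.0, -1) when no alignment has a positive similarity.
    """
    best = (0.0, -1)
    for offset in range(len(window) - len(template) + 1):
        score = similarity(window[offset:offset + len(template)], template)
        if score > best[0]:
            best = (score, offset)
    return best


def matches(window, template, tolerance=0.9):
    """True when some alignment is at least `tolerance` similar to the template."""
    return best_match(window, template)[0] >= tolerance
```

Because cosine similarity ignores overall loudness, a quieter copy of the same waveform still scores close to 1.0; sliding the template handles the fact that samples will not line up exactly.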
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
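A bounded buffer of the most recent samples can be sketched in a few lines of Python. The class name `RecentAudio` and its methods are made up for this illustration; the point is only that old samples fall off the front automatically once the buffer reaches its fixed length.

```python
from collections import deque


class RecentAudio:
    """Keeps only the most recent `seconds` of audio in memory."""

    def __init__(self, sample_rate, seconds):
        # deque with maxlen discards the oldest items when it is full.
        self.buffer = deque(maxlen=int(sample_rate * seconds))

    def push(self, samples):
        """Append newly captured samples, dropping the oldest if needed."""
        self.buffer.extend(samples)

    def snapshot(self):
        """Return the buffered samples, oldest first, as a list."""
        return list(self.buffer)
```

Sizing the buffer to slightly more than the longest sound to be recognized keeps memory use predictable.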
Lastly (for audio), I would recommend picking a standard format for comparison. Make it as good as gets you decent results, and start high. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
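The same mapping idea, sketched in Python with a plain dictionary rather than a java.util.HashMap. The labels and file paths are hypothetical, and `play` stands in for whatever playback function the application actually uses.

```python
# Hypothetical mapping from recognized sound labels to the files they trigger.
SOUND_TO_FILE = {
    "clap": "samples/clap_response.wav",
    "whistle": "samples/whistle_response.wav",
}


def on_sound_recognized(label, play):
    """Look up the file for a recognized sound and hand it to `play`.

    Returns True if a file was triggered, False for an unmapped label.
    """
    path = SOUND_TO_FILE.get(label)
    if path is None:
        return False
    play(path)
    return True
```

Passing the playback function in as a parameter keeps the recognition code independent of the audio output code, which is the main benefit of the pattern here.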
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps, let's look forward to other answers.
I'm struggling to choose between a vast number of audio programming languages and APIs. I'm very (totally) new to audio programming so please bear with me.
Software
I need to be able to:
1. Alter the volume of different sounds before outputting them to anything (these sounds can have a variety of different origins, for example mp3s and microphone input)
2. Phase shift sounds
3. Superimpose sounds that I have tweaked (as per items 1 and 2)
4. Control the output to each of 8 channels independently of one another
5. Make this all happen on Windows 7
These capabilities need to be abstracted by a graphical frontend I will probably make myself. What I want to be able to do is create 'sound sources' and move them around a 3D environment along pre-defined trajectories and/or in relation to the movement of whoever is inside the rig. The reason I want to do pitch bending is so I can mess with red-shift stuff.
I don't want to have to construct full tracks before-hand and just play them. I want the sound that is played to depend on external input from sensors as well as what I am doing on the frontend.
As far as I know, this means I can't use any existing full audio-making app.
The Question
I've been looking around for the API or language I should use and I have not drawn a blank; quite the opposite, actually. I'm struggling to narrow down my search. A lot of my problem stems from the fact that I have no experience in audio programming.
So, does anyone know off-hand of an API or language that meets my criteria?
Hardware stuff and goals
(I left this until last because I'm not sure how relevant it is)
My goal is to make three rings of speakers at different heights and to have enough control over them to be able to simulate any number of 'sound sources' within the array. The idea is to have someone stand in the middle of the rig and be able to make it sound like there are lots of things moving around them. To get this working I'm planning on doing a little trig and using 8 channels of audio from my PC. The maths is pretty straightforward; it's just the rest that I need to worry about.
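One possible shape for that trig, sketched in Python under simplifying assumptions: a single horizontal ring of equally spaced speakers, one channel per speaker, with speaker 0 at 0 degrees. The function name and the constant-power panning law are choices made for this illustration, not part of the rig described above. A source is panned between the two speakers nearest to its angle.

```python
import math


def ring_gains(source_angle_deg, num_speakers=8):
    """Per-speaker gains for a source at the given angle around the ring.

    Pans between the two adjacent speakers with a constant-power law,
    so the squared gains always sum to 1.
    """
    spacing = 360.0 / num_speakers
    position = (source_angle_deg % 360.0) / spacing  # fractional speaker index
    left = int(position) % num_speakers
    right = (left + 1) % num_speakers
    frac = position - int(position)
    gains = [0.0] * num_speakers
    gains[left] = math.cos(frac * math.pi / 2)
    gains[right] = math.sin(frac * math.pi / 2)
    return gains
```

Multiplying a source's samples by each gain gives that speaker's contribution; summing the contributions of all sources per channel produces the 8 output streams.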
What I want to do next is attach a bunch of cameras to the thing and do some simple image recognition to be able to 'attach sound sources' to different objects. E.g. if someone is standing in the right place, it can be made to seem as though all red balls quack like a duck and all orange ones moan hauntingly.
This is not to detract from Richard Small's answer, but to comment on some of the other options out there:
If you are looking for something higher-level with which you can prototype and develop this faster, you want Max/MSP or its open source competitor Pure Data. These are designed for technically minded musicians rather than for programmers. As a result, you can build this sort of thing quickly and efficiently.
You also have some lower-level options: PortAudio can handle your audio I/O, but you would have to do the sound generation, effects, and so on yourself or with other libraries. Cinder and openFrameworks both provide interfaces for audio, cameras, and other things for "creative programming". I'm afraid I don't know if they meet your full requirements, but they are powerful and popular for this sort of thing, so I encourage you to look at them.
The two major ones these days tend to be
Wwise
Wwise Download Link
FMOD
FMOD Download Link
These two engines may even in fact be overkill for what you need, but I can almost guarantee that they will be capable of anything you require.
I want to start on a hobby project that focuses on displaying the audio files in a folder in a certain fashion, can play such an audio file, and shows basic playback controls. However, I'm struggling to find a suitable programming language for this.
The displaying part shouldn't be too hard and can probably be done in most programming languages. The audio part is what concerns me the most: it's not the main focus of the project and should only do limited things (so it shouldn't be too hard), but I do not know anything about sound support in the programming languages I currently know (Java, C and C++).
Specifically, I would like to be able to do these things:
Play a sound file
Stop/pause a playing song
Adjust volume
Show a bar that displays the current position in the song
Most files will be .mp3 files, but being able to process other formats is certainly a plus. Since this is just a small project, it's OK if it runs only on Windows. Scalability would be nice but is not required.
It would be nice to have a small overview of the audio support/audio libraries of programming languages (I'm always up for something new) that can accomplish these simple things in a not too complicated way, as well as personal experiences.
In this way I hope to get a better understanding of which programming language fits my project best. (I would very much like to avoid changing language midway through the project.)
--
Edit:
This is only for a later stage of the project, if the first part is successful: I will want to change the file names of the audio files that are displayed (to make them follow a specific format).
I haven't written many audio processing programs, but I know a lot of libraries exist for C and C++. Perhaps for Java too, but I don't know Java. I have used audio with SDL in a game, but it doesn't have that many features and I don't recommend it.
There's this question asking for a library in C, and there are a couple of similar questions that SO brings up on the side. You may want to take a look at those.
You would also need to look for a library that loads different file types. SDL, at least, only opens .wav files, which I believe most playback libraries support. For MP3, you will most likely need an additional library. I know Audacity uses LAME for MP3, so I'm guessing that should be good.
Some of the functionality you want can also be implemented yourself. For example, knowing the length of the music and the amount you have already read, you will know how far into the audio you are. Adjusting the volume is also just a multiplication (in the simplest case) that you can do on the audio data if the library doesn't provide it.
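Both calculations fit in a few lines. The sketch below uses Python and assumes signed 16-bit PCM samples held in a list; the function names are made up for illustration, and a real player would apply the same arithmetic to whatever buffer type its audio library hands over.

```python
def apply_volume(samples, volume):
    """Scale signed 16-bit PCM samples by `volume`, clipping to the valid range."""
    return [max(-32768, min(32767, int(s * volume))) for s in samples]


def progress(frames_played, total_frames):
    """Fraction of the track already played, for drawing a position bar."""
    return frames_played / total_frames if total_frames else 0.0


def position_seconds(frames_played, sample_rate):
    """Current playback position in seconds."""
    return frames_played / sample_rate
```

For example, `apply_volume([1000, -2000], 0.5)` halves each sample, and `progress(22050, 88200)` reports that a quarter of the track has been played.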
A very good choice seems to be PortAudio which is used by Audacity, and also recommended in the accepted answer of the question I mentioned above.
I've done audio apps in both Java and C++. Java development goes way faster because it's a more powerful language and has garbage collection, but JavaSound is a pretty awful solution for audio. Of course, there are wrappers for FFmpeg and other libraries, so you can get a lot of things working. Here's an example of a Java audio app: http://www.indabamusic.com/help/mantis
OTOH, C++ gives you lots of control, low latency and a wealth of libraries. (Another answer mentioned PortAudio, which is, indeed, great.) But you will definitely find it also has a much longer development cycle.
You can certainly do everything you want to do with either language.
I'm trying to develop an online application where the user writes some text and the software sings it back to the user.
I can currently generate the audio file with the words spoken by the computer using espeak, but I have no idea how to make it sound like a song, how to add rhythm to it.
I'm able to change the pitch and tempo using rubberband, but that's as far as I've gotten.
Does anyone have a clue how to make this happen?
If you want to use rubberband to change duration and pitch, then I think the hard part is going to be mapping from phonemes/syllables in the text to the corresponding audio ranges in the speech synthesis output, for which I have no simple suggestion. (Ideally you'd get inside the speech synthesiser so that it would provide you with the mapping from phonemes to audio location.)
A simpler alternative might be to try Speech Synthesis Markup Language (SSML). Its prosody element has "pitch" and "duration" attributes that can specify an absolute pitch in Hz and a duration in seconds. You can also specify volume, for controlling dynamics.
Given this, you could try to convert the text into an SSML document, and mark up words/syllables/phonemes with pitch, duration and volume attributes.
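A minimal sketch of that conversion in Python, assuming the melody is already available as a list of (syllable, pitch in Hz, duration in seconds) tuples. The function name `to_ssml` and the input format are assumptions for illustration, and how faithfully a given synthesizer honours the pitch and duration attributes varies, so the output would need to be checked against the engine actually used.

```python
from xml.sax.saxutils import escape


def to_ssml(notes, lang="en-US"):
    """Build an SSML document from (syllable, pitch_hz, duration_s) tuples."""
    parts = [
        '<speak version="1.1" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
    ]
    for text, pitch_hz, duration_s in notes:
        # One prosody element per syllable, with absolute pitch and duration.
        parts.append(
            f'<prosody pitch="{pitch_hz:g}Hz" duration="{duration_s:g}s">'
            f"{escape(text)}</prosody>"
        )
    parts.append("</speak>")
    return "".join(parts)


# Usage: three syllables sung on A4, B4 and C#5 (hypothetical melody).
melody = [("la", 440, 0.5), ("la", 493.88, 0.5), ("laa", 554.37, 1.0)]
ssml = to_ssml(melody)
```

The remaining work is splitting the user's text into syllables and assigning each one a note, which this sketch leaves out.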
I've ended up using Festival's singing mode. It sounds reasonably good, except for the fact that it only works with English voices.