WAV-MIDI matching

Let's consider a variation of the "WAV to MIDI" conversion problem. I'm aware of the complexity of such a problem, and I know that a vast literature exists on the more general subject of Music Information Retrieval (MIR).
But let's suppose here that we already have both the WAV and the MIDI representation of a music piece, so we don't actually have to discover pitches inside the WAV signal from scratch... we "just" have to match the pitches detected (using a suitable algorithm) with the NoteOn events contained in the MIDI representation. I suppose we should definitely use the information contained in the MIDI file to give hints to the pitch detection algorithm.
Such a matching tool could be very useful, for example for MIDI "humanization": we could make the MIDI representation more expressive by using the information retrieved from the WAV signal to fine-tune note onsets, durations, dynamics, and so on.
Does anybody know if such a problem has already been addressed in the literature?
Any form of contribution or assistance will be greatly appreciated.
Thanks in advance.

At the 2010 Music Hackday in London some people used the MATCH Vamp plugin to align scores to YouTube videos. It was very impressive! Maybe their source code could be of use. I don't know how well MATCH works on audio generated from MIDI files, but it could be worth a try. Here's a link: http://wiki.musichackday.org/index.php?title=Auto_Score_Tubing
This guy appears to have done something similar: http://www.musanim.com/wavalign/ His results are definitely interesting.
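If you want to experiment with this kind of alignment directly, here is a minimal sketch in Python using librosa's dynamic time warping. It assumes you have first rendered the MIDI file to audio (e.g. with a software synth); the file names are hypothetical.

```python
import librosa
import numpy as np

hop = 512
y_perf, sr = librosa.load("performance.wav")        # the human recording
y_midi, _ = librosa.load("midi_render.wav", sr=sr)  # the rendered MIDI

# Chroma features are fairly robust to the timbre difference between
# a real instrument and a synthesized rendering of the score.
c_perf = librosa.feature.chroma_cqt(y=y_perf, sr=sr, hop_length=hop)
c_midi = librosa.feature.chroma_cqt(y=y_midi, sr=sr, hop_length=hop)

# DTW yields a warping path pairing frames of the two signals.
D, wp = librosa.sequence.dtw(X=c_midi, Y=c_perf)

# Convert frame indices to seconds: each pair says when a given moment
# of the score occurs in the performance, which is exactly the hint
# needed for "humanizing" note onsets and durations.
times = np.asarray(wp)[::-1] * hop / sr
print(times[:10])  # (midi_time, performance_time) pairs
```

Once you have the warping path, shifting each NoteOn event to its matched performance time gives a first approximation of the humanized MIDI.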

This seems like an interesting idea. What are you trying to do: just match the notes' pitches, or do you have something else in mind?
One thing you could look into: if you know the note (as an integer value, I think; it has been a while) that gets passed into the noteOn method, you may be able to use that to map it to the WAV signal. It depends on what you are trying to do.
There are also some things you could play around with in (I think it is called) the MIDI controller, such as modulation, pitch, volume, and pan, or playing a couple of notes simultaneously. You could have a background thread change some of those effects while a note is being played: for example, a note could get quieter the longer it is held, or pan between the left and right speakers, and so on.
I haven't played with this code in a long time, but there are some examples of using a MIDI controller around.
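As a rough illustration of the background-effect idea (the answer doesn't name a language; this sketch assumes Python with the mido library, a python-rtmidi backend, and an available MIDI output port):

```python
import time
import mido

out = mido.open_output()  # default MIDI output port, if one exists

out.send(mido.Message('note_on', note=60, velocity=100))  # middle C
for vol in range(100, -1, -10):
    # Controller 7 is channel volume; lowering it fades the held note.
    out.send(mido.Message('control_change', control=7, value=vol))
    time.sleep(0.2)
out.send(mido.Message('note_off', note=60))
```

The same loop could just as well sweep controller 10 (pan) to move the note between the left and right speakers.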


Realtime Sound Routing...Trigger a Sound with Another Sound

I'm looking for a program that can recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. For my project it would need to be real-time, since latency is not desirable. I tried dictation software that recognizes words to trigger opening a file, and that's the direction I want to go in, but with sounds instead of words, happening in real time. I'm not sure where to start and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is to stream and multithread the hell out of it. The Java 8 Stream API comes to mind, but since you're looking for subsamples that match a specific pattern, each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
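As a hedged sketch of that idea (hardly the only way to use HMMs here): train one model per target sound on spectral features, then score incoming audio against each model and accept the best match only if it clears your tolerance. This example assumes Python with the hmmlearn and librosa packages; the sound names and training files are hypothetical.

```python
import librosa
import numpy as np
from hmmlearn import hmm

def mfcc_frames(path):
    """Load audio and return MFCC feature vectors, one row per frame."""
    y, sr = librosa.load(path)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

# Train one HMM per sound you want to recognize.
models = {}
for name in ("snare", "kick", "clap"):
    model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    model.fit(mfcc_frames(f"{name}_training.wav"))
    models[name] = model

# Classify an incoming clip by log-likelihood under each model.
incoming = mfcc_frames("mystery.wav")
scores = {name: m.score(incoming) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, scores)  # reject 'best' if its score is below your tolerance
```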
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
Lastly (for audio), I would recommend picking a standard format for comparison and converting everything to that format before you compare it. Make the quality only as high as it needs to be to get decent results, and start high.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
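That dispatch step might look something like the following sketch, which assumes Python and the simpleaudio package (any playback library would do); the labels and file names are hypothetical.

```python
import simpleaudio as sa

# Pre-load the responses so triggering them adds minimal latency.
commands = {
    "snare": sa.WaveObject.from_wave_file("snare_response.wav"),
    "kick": sa.WaveObject.from_wave_file("kick_response.wav"),
}

def on_sound_recognized(label):
    """Called by the recognizer; plays the mapped file, if any."""
    wave = commands.get(label)
    if wave is not None:
        wave.play()  # non-blocking; playback runs in the background
```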
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps; I look forward to the other answers too.

How text-to-audio software works

I want to create software that can convert readable text (non-English) to audio output.
From my searches so far, I have realized that most existing audio readers sound too robotic and lack human-speech-like qualities.
I am looking for an algorithm or paper that can give me some idea of how to proceed with or implement such a thing.
or
Does anyone know how some of the world's best text-reader software works?
My expectations are:
Less robotic, more human-like sound
High-quality output
Lightweight, yet fast processing
Please edit this question if anyone thinks some points are missing.
Some small steps might give you a basic idea of what happens:
1. Create a dictionary of words, each word with its name and its sound.
2. Create your own signal processor; this will let you add effects to the sound, such as a robotic or a female voice, or something else.
3. Parse the text you want to read into an array, splitting out each word and punctuation mark. E.g. "I want to die, this isn't a correct way to live." becomes {I:want:to:die:,:this:isn't:a:correct:way:to:live:.}
4. Use the punctuation to implement lifelike pauses: a short pause for "," and a longer one for ".".
5. Use each word to look up its audio in the dictionary from step 1.
6. Play the whole array continuously with a small pause between elements, which works like the spaces between words.
I think these are the major steps. To make it faster, you can use advanced sound-processing tools to cache small chunks of sound data and add data on the fly while modulating the signal.
Hope this helps; a minimal sketch of the pipeline follows.
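A minimal sketch of that pipeline, assuming Python, the soundfile package, and one pre-recorded mono WAV per word (all at the same sample rate); the words/ directory and its files are hypothetical:

```python
import re
import numpy as np
import soundfile as sf

PAUSES = {",": 0.2, ".": 0.5}  # seconds of silence per punctuation mark

def speak(text, out_path="speech.wav", sr=22050):
    # Split into words and punctuation:
    # "I want, to" -> ["I", "want", ",", "to"]
    tokens = re.findall(r"[\w']+|[.,]", text)
    chunks = []
    for tok in tokens:
        if tok in PAUSES:
            chunks.append(np.zeros(int(PAUSES[tok] * sr)))  # pause
        else:
            data, _ = sf.read(f"words/{tok.lower()}.wav")   # step-1 lookup
            chunks.append(data)
            chunks.append(np.zeros(int(0.05 * sr)))  # gap, like a space
    sf.write(out_path, np.concatenate(chunks), sr)

speak("Hello world, this is a test.")
```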
It would be nice if you could tell us what kind of app you'll create (mobile, web, desktop) and also what language you'll develop it in (PHP, Java, C++, etc.), because if you search on Google you'll find a lot of website plugins that convert text to audio; you can download them and study the code.
Also, it's hard to find an app that doesn't sound like a robot, and if you do find one you'll probably have to pay for it.
The "robotic" aspect of text to speech that you are concerned about is a matter of the quality of "prosody". This is an active research area. You could probably get a PhD for working on improving prosody in TTS systems. If you would like to read about current research you can try searching for "improving prosody in text to speech".
A big part of the problem is having an accurate model of speech prosody in a given language. The thesis "MeLos: Analysis and Modelling of Speech Prosody and Speaking Style" by Nicolas Obin (2012) contains a survey of the state of the art in speech prosody modelling. Or try searching for "text to speech prosody survey state of the art".

How or where can I get separate notes of an instrument for playback in my application?

I am looking to create a music creation application, and would like to allow the user to play the individual notes of an instrument. Is there a place online where I can find individual sound files that I may playback for each note, or is there a way of programmatically "generating" each pitch? I am not concerned with sound quality at this point in my development.
EDIT: I am still in the early stages of development. I want the app to be browser based, using Javascript or something similar. A Linux development environment, if that is of relevance at all. The notes will be played via an on-screen interface.
The University of Iowa's Electronic Music Studios has a very nice and complete archive of sampled instruments, with one musical note per file. You should also check out freesound, though that is a much more general-purpose sample sharing site.
There are plenty of places online to find sampled instruments. If you're not concerned about sound quality, some free soundfonts will most likely do the job.
For example, this site http://soundfonts.homemusician.net/ has pianos, basses, guitars, horn etc. (Google "free sf2" for more)
There are plenty of ways to generate (i.e. synthesise) tones as well; a sketch follows at the end of this answer.
If you don't mind MIDI files, you can get a free MIDI software piano and create your own files: C.mid, C#.mid, D.mid, etc.
Here's one with a quirky interface but there are many more:
http://download.cnet.com/MidiPiano/3000-2133_4-10542342.html
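For the synthesis route, here is a minimal sketch of generating one pitch programmatically, assuming Python (numpy plus the standard-library wave module). Each MIDI note number maps to an equal-temperament frequency, and a sine wave at that frequency is written to a WAV file:

```python
import wave
import numpy as np

def note_to_hz(midi_note):
    # Equal temperament: A4 (MIDI note 69) = 440 Hz.
    return 440.0 * 2 ** ((midi_note - 69) / 12)

def write_tone(midi_note, path, seconds=1.0, sr=44100):
    t = np.arange(int(seconds * sr)) / sr
    samples = (0.5 * np.sin(2 * np.pi * note_to_hz(midi_note) * t)
               * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(sr)
        f.writeframes(samples.tobytes())

write_tone(60, "C4.wav")  # middle C
```

Pure sine tones sound dull; adding a few harmonics and an amplitude envelope gets much closer to an instrument, but for an early prototype this is enough.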
The easiest way to do this is to simply output MIDI messages to the synth built-in to every computer. No need to create MIDI files or use extra sound fonts.
You didn't mention what language you are using, so it is hard to suggest ways to get started. In all cases though, you'll want to read up a bit on what MIDI actually is.
Basically, MIDI is nothing but control data, commonly used with synthesizers. At a basic level, there are note-on, and note-off messages. There are many other kinds of messages too, such as pitch bend, control change, etc. MIDI supports 16 "channels", which are sent all down the same line, just with a different identifier.
A good utility (on Windows) for debugging MIDI messages (and getting a better idea of the protocol in general!) can be found here: http://www.midiox.com/
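For illustration, sending note-on/note-off messages might look like this in Python with the mido library (an assumption; the answer doesn't name a language). It needs a MIDI output port, such as an OS-provided software synth, via the python-rtmidi backend:

```python
import time
import mido

out = mido.open_output()  # default MIDI output port
out.send(mido.Message('program_change', program=0, channel=0))  # piano voice

# A C major arpeggio: each note is a note-on, a wait, then a note-off.
for note in (60, 64, 67, 72):
    out.send(mido.Message('note_on', note=note, velocity=80, channel=0))
    time.sleep(0.3)
    out.send(mido.Message('note_off', note=note, channel=0))
```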

Ways to identify (musical) scores

I'm looking for ways to identify scores when someone is playing, e.g., guitar. How can I manage that?
I've heard that MIDI stores music data as musical scores. I wonder if it would be a good solution.
MIDI does store musical scores, but it doesn't (normally) extract them from recorded sounds. You can't take an MP3 file and "convert it to MIDI" in a standard or entirely reliable way.
You create a MIDI file using a recorder (or "sequencer"), which might be a desktop application where you "write the score" like a composer does, or it might be a musical device like a keyboard, which records which keys you press, how hard and for how long, and interprets that as a score.
A MIDI player takes the data/score, and reproduces it using its own voice (or "sound font" if you like). So the advantage of MIDI data is firstly that the voice is already available on the playback device (and so the data is very compact), and second that the same data ("tune") can be played using different voices ("instruments")[*].
I believe there are MIDI guitars, but I don't know how "good" they are. The tone of an electric guitar comes in part from resonances of the solid body. This could of course be imitated by the voice at playback time, but there are bound to be some things that you can do with an electric guitar but which the MIDI format cannot capture or represent (for example I'd guess feedback is impossible).
Software exists to extract MIDI data from recorded sound - this is a bit like the way OCR extracts ASCII character data from images of text. It's not a major means of recording someone's guitar-playing, but if what you want is to get a first approximation to the score/tabs, you could try it.
Here's a randomly-selected example, found by Googling "convert from wav to MIDI":
http://www.pluto.dti.ne.jp/~araki/amazingmidi/
[*] But members of the audience, you find yourselves wondering, "what is this mindless automaton which bangs out the tunes it's instructed to, without comprehension or any aesthetic sense". Ladies and gentlemen, Colin Sell at the piano.
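If you do try one of those converters, or roll your own, the last step is simple: once a pitch has been detected, mapping its frequency to a MIDI note number is just the inverse of the equal-temperament formula. A tiny Python sketch (the pitch detection itself, the hard part, is assumed to have happened already):

```python
import math

def hz_to_midi(freq):
    # A4 = 440 Hz = MIDI note 69; 12 semitones per octave.
    return round(69 + 12 * math.log2(freq / 440.0))

print(hz_to_midi(261.63))  # ~middle C -> 60
print(hz_to_midi(329.63))  # ~E4 -> 64
```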
Music recognition/retrieval is an extremely difficult and almost entirely unsolved AI problem. Try to extract the frequency from a signal file of someone playing a single unwavering note one time - it's much more difficult than just "apply Fourier transform, read off solution". Compound that with polyphony, noise, rubato, vibrato/portamento, plus the fact that (contrary to speech recognition) we don't even have a working a-priori model of what music actually is, and you begin to see the difficulty. There are absolutely fascinating research papers and even entire conferences on the topic, but in the short term, you're just plain out of luck.
Are you aware you are attempting something extremely difficult? It's a very complex topic you could spend years researching yourself or pay $$$$$$ for existing commercial solutions.
MIDI is a reasonable choice for your output format.
For the rest you will need Fast Fourier Transforms working off a high-resolution capture of the input analogue sounds plus at least seven years of musical theory.
Good luck.
If the player is playing in tune, there will be very distinct frequencies in the signal, or at least frequencies with a mathematical separation. It may be possible to characterise a signal using spectral analysis to distinguish music from noise; or at least melodic music from noise - avant-garde experimental music may not pass ;). The distinction may become more difficult with multiple instrumentalists, percussion, and non-standard or poor tuning; traditional Chinese or Indian music, for example, uses different scales from Western music.
Extracting the frequencies in the signal will require signal-processing techniques such as the Fast Fourier Transform. Categorising the signal as music/not music could be done by statistical analysis, or by AI techniques such as neural networks or fuzzy logic.
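A minimal sketch of that first step in Python (numpy plus the soundfile package; the file name is hypothetical and the clip is assumed to be mono):

```python
import numpy as np
import soundfile as sf

data, sr = sf.read("note.wav")
spectrum = np.abs(np.fft.rfft(data))
freqs = np.fft.rfftfreq(len(data), d=1.0 / sr)

peak_hz = freqs[np.argmax(spectrum)]
print(f"Dominant frequency: {peak_hz:.1f} Hz")
# A real pitch tracker would window the signal, track peaks over time,
# and handle harmonics that can be stronger than the fundamental.
```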

Is algebraic sound synthesis possible?

Let's say you have a normal song with two layers: one instrumental and another of just vocals. Now let's say you also have just the instrumental layer. Is it possible to "subtract" the instrumental and obtain the pure vocals? Is there going to be loss? How would I go about performing this specific type of subtractive synthesis?
Yes, it's possible, but yes, there can be loss as well, since sound waves can cancel each other out when added (destructive interference). For example, two sine waves that are 180 degrees out of phase would produce silence.
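That cancellation is easy to demonstrate numerically; a tiny Python check with numpy:

```python
import numpy as np

t = np.linspace(0, 1, 44100, endpoint=False)
a = np.sin(2 * np.pi * 440 * t)           # a 440 Hz tone
b = np.sin(2 * np.pi * 440 * t + np.pi)   # same tone, 180 degrees shifted
print(np.allclose(a + b, 0))              # True: the waves cancel to silence
```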
Ideally, it should be possible. The catch is, "ideal" is pretty restrictive. In order to pull this off properly, you would have to have a song file that was constructed by additive synthesis in the first place, i.e. by adding the vocal track to the exact same instrumental track you have. Now, if you do have that situation, then it's simple enough: as others have said, you just add the inverse of the instrumental track to the overall song.

Unfortunately, there are a lot of things that can get in the way of that. For example, if the additive synthesis was clipped at some points (which means that the sum of the instrumental and vocal tracks was louder than the maximum volume that can be stored), you won't be able to recover the vocals at those points. More generally, lossy audio compression tends to remove different pieces of the sound depending on what is most/least audible, and that's heavily dependent on whether you have vocals or not, so if any of these sound files has been compressed using a lossy codec like MP3, you've probably lost the information you need to reconstruct the vocal track.

The thing is, even minor changes to a signal can sometimes produce a big difference when you add it to or subtract it from another signal (because of wave interference and such things), so the results are kind of unpredictable when you don't have the exact sound to work with.
By the way, if you do have the exact signals you need to do this, you can perform the subtraction using Audacity or any other decent audio editor. There are even some mathematical programs you can use (like Matlab, which is able to read/write WAV files IIRC).
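The subtraction itself is one line once the files are loaded. A sketch assuming Python with the soundfile package, and assuming both files are sample-aligned, unclipped, and uncompressed, per the caveats above (file names hypothetical):

```python
import soundfile as sf

full, sr = sf.read("full_mix.wav")
inst, sr2 = sf.read("instrumental.wav")
assert sr == sr2, "sample rates must match"

n = min(len(full), len(inst))
vocals = full[:n] - inst[:n]  # subtracting == adding the inverted signal

sf.write("vocals.wav", vocals, sr)
# Even a one-sample misalignment or a single lossy-codec pass will leave
# substantial instrumental residue in the result.
```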
Using a technique (and software) like this Audacity Vocal Removals guide, I bet you can achieve what you need. As Daniel and Paolo said, if you can apply the inverse of a sound wave to the original sound wave, you can cancel it out (muting the sound).
Generally, the word "synthesis" is used somewhat differently, though the dictionary meaning might agree with your question. As pointed out, Audacity, VST-plugin, and Pro Tools versions of "extract vocals" exist.
"Is there going to be loss?" Of course there will be some loss. Vocal and instrumental tracks are mixed and mastered: panning, effects (echo/reverb), and additional shaping (compression, etc.) happen in these stages. Besides, most music produced these days has many instrument tracks (keyboards, guitars, bass, drums).
That is to say, even if you make your own music with one instrument track and one vocal track, merely panning your tracks will affect the subtraction logic. And if you recorded those tracks, say by playing guitar and singing, there will most probably be some "leakage" into both tracks, which makes matters worse.
Hypothetically speaking, wonderful things are possible with this idea. Practically, there are too many imperfections in the actual music production process.
