How does text-to-audio software work? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to create software that converts readable (non-English) text to audio output.
After some searching, I have realized that most existing audio readers sound too robotic and lack human-speech-like effects.
I am looking for an algorithm or paper that can give me some idea of how to implement such a thing.
Alternatively,
does anyone know how some of the world's best text-reader software works?
My expectations are:
A less robotic, more human-like sound
High-quality output
Lightweight, yet fast processing
Please edit this question if you think any points are missing.

A few small steps might give you a basic idea of what happens:
1. Create a dictionary of words, storing each word's name and its sound.
2. Create your own signal processor. This lets you add effects to the sound, for example a robotic voice or a female voice.
3. Parse the text you want to read into an array, splitting out each word and punctuation mark. For example, "I want to die, this isn't a correct way to live." becomes the array {I:want:to:die:,:this:isn't:a:correct:way:to:live:.}
4. Use the punctuation to implement lifelike parameters, such as a short pause for "," and a longer pause for ".".
5. Use the words to look up audio in the dictionary from point 1.
6. Play the whole array in sequence with a pause between elements; the pause plays the role of the spaces.
I think these are the major steps. To make it faster, you can use advanced sound-processing tools to cache small pieces of sound data and add data on the fly while you are modulating the sound signal.
I hope this helps.
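The parsing, pause, and lookup steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not a real TTS engine: the pause lengths, the `sounds` dictionary, and the clip file names are all made-up placeholders, and actually playing the clips is left to whatever audio library you choose.

```python
import re

# Hypothetical pause lengths in seconds; tune these by ear.
PAUSES = {",": 0.25, ".": 0.6, "?": 0.6, "!": 0.6}
WORD_GAP = 0.08  # short gap that stands in for the space between words


def tokenize(text):
    """Split text into words (apostrophes kept) and punctuation marks."""
    return re.findall(r"[\w']+|[.,!?]", text)


def build_plan(text, sounds):
    """Return a list of ("play", clip) and ("pause", seconds) steps."""
    plan = []
    for token in tokenize(text):
        if token in PAUSES:
            # A punctuation pause replaces the word gap that precedes it.
            if plan and plan[-1] == ("pause", WORD_GAP):
                plan.pop()
            plan.append(("pause", PAUSES[token]))
        else:
            clip = sounds.get(token.lower())
            if clip is None:
                raise KeyError(f"no recording for word: {token!r}")
            plan.append(("play", clip))
            plan.append(("pause", WORD_GAP))
    return plan


if __name__ == "__main__":
    # Placeholder dictionary: word -> recorded clip (file names are invented).
    sounds = {"good": "good.wav", "morning": "morning.wav", "friend": "friend.wav"}
    for step in build_plan("Good morning, friend.", sounds):
        print(step)
```

A plan like this keeps the text handling separate from playback, so the signal-processing step can be swapped in later without touching the parser.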

It would be nice if you could tell us what kind of app you'll create (mobile, web, desktop) and what language you'll develop it in (PHP, Java, C++, etc.), because if you search Google you'll find a lot of plugins for websites that convert text to audio; you can download them and look at the code.
Also, it's hard to find an app that doesn't sound like a robot, and if you find one you'll probably have to pay for it.

The "robotic" aspect of text to speech that you are concerned about is a matter of the quality of "prosody". This is an active research area. You could probably get a PhD for working on improving prosody in TTS systems. If you would like to read about current research you can try searching for "improving prosody in text to speech".
A big part of the problem is having an accurate model of speech prosody in a given language. The thesis "MeLos: Analysis and Modelling of Speech Prosody and Speaking Style" by Nicolas Obin (2012) contains a survey of the state of the art in speech prosody modelling. Or try searching for "text to speech prosody survey state of the art".

Related

Baby Cry Sound detection [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
I want to write code to detect the sound of a baby crying. I am using Windows as my platform. So far I can capture audio samples and plot their frequency content (using an FFT), but I'm not sure how to proceed.
What steps should I follow to detect a baby's cry, given its time-frequency plot?
I have seen methods such as a median filter followed by an HMM used in speech recognition. But for simple sound detection, do I need such a sophisticated method?
I will be very grateful if you could help me.

Hidden Markov models are widely used in speech recognition, but since you don't really need to know what your baby is saying (next project: baby translator), I don't think that is what you need.
What you should probably do is look at a lot of spectrograms of babies crying and look for patterns. Or, even better, let your algorithm do this: calculate certain metrics of your sound called MFCCs (mel-frequency cepstral coefficients).
You do this on, say, 1000 samples of crying sound, and then you have 1000 vectors of metrics.
Now, for each metric, you calculate the mean and standard deviation. This gives you a way to tell how much a random baby sound differs from the average crying sound.
This sounds very hard, but I know there are tools out there. Have a look at Sphinx; you can probably train it to do this.
But either way, start by collecting baby-crying sounds ;) (but don't steal candy)
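To make the per-metric statistics idea concrete, here is a small Python sketch. It assumes the feature vectors (for example MFCCs) have already been computed by some other tool; the function names and the toy numbers are invented for illustration, and the "average absolute z-score" is just one simple way to measure distance from the average crying sound.

```python
from statistics import mean, stdev


def fit_profile(vectors):
    """Per-feature mean and standard deviation over the training vectors."""
    columns = list(zip(*vectors))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def deviation_score(vector, means, stdevs):
    """Average absolute z-score: distance from 'average crying' in standard deviations."""
    zs = [abs(x - m) / s for x, m, s in zip(vector, means, stdevs) if s > 0]
    return sum(zs) / len(zs) if zs else 0.0


if __name__ == "__main__":
    # Toy stand-ins for feature vectors extracted from crying recordings.
    crying = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
    means, stdevs = fit_profile(crying)
    print(deviation_score([2.0, 12.0], means, stdevs))  # close to the average
    print(deviation_score([5.0, 18.0], means, stdevs))  # far from the average
```

A new sound would then be flagged as crying when its score falls under a threshold that you pick by testing on recordings that are known not to be crying.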

How or where can I get separate notes of an instrument for playback in my application? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I am looking to create a music creation application, and would like to allow the user to play the individual notes of an instrument. Is there a place online where I can find individual sound files that I may playback for each note, or is there a way of programmatically "generating" each pitch? I am not concerned with sound quality at this point in my development.
EDIT: I am still in the early stages of development. I want the app to be browser-based, using JavaScript or something similar. I develop on Linux, if that is of any relevance. The notes will be played via an on-screen interface.

The University of Iowa's Electronic Music Studios has a very nice and complete archive of sampled instruments, with one musical note per file. You should also check out freesound, though that is a much more general-purpose sample sharing site.

There are plenty of places online to find sampled instruments. If you're not concerned about sound quality, some free soundfonts will most likely do the job.
For example, this site http://soundfonts.homemusician.net/ has pianos, basses, guitars, horn etc. (Google "free sf2" for more)
There are plenty of ways to generate (aka synthesise) tones as well.
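As one example of synthesising a tone, here is a minimal Python sketch that generates a plain sine wave for a given note and writes it to a WAV file. It assumes standard equal-temperament tuning with A4 = 440 Hz (MIDI note 69); the function names and the output file name are made up, and a browser app would do the equivalent with its own audio API rather than this code. A pure sine sounds nothing like a real instrument, which should be fine if sound quality does not matter yet.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second


def note_frequency(midi_note):
    """Equal-temperament frequency in Hz, assuming A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def sine_tone(frequency, seconds, amplitude=0.5):
    """Return mono 16-bit little-endian PCM bytes for a sine wave."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(32767 * amplitude * math.sin(2 * math.pi * frequency * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def write_wav(path, pcm):
    """Write the PCM bytes as a mono 16-bit WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)


if __name__ == "__main__":
    # Middle C (MIDI note 60) for one second; the file name is arbitrary.
    write_wav("c4.wav", sine_tone(note_frequency(60), 1.0))
```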

If you don't mind MIDI files, you can get a free MIDI software piano and create your own files: C.mid, C#.mid, D.mid, etc.
Here's one with a quirky interface but there are many more:
http://download.cnet.com/MidiPiano/3000-2133_4-10542342.html

The easiest way to do this is to simply output MIDI messages to the synth built into most computers. No need to create MIDI files or use extra sound fonts.
You didn't mention what language you are using, so it is hard to suggest ways to get started. In all cases though, you'll want to read up a bit on what MIDI actually is.
Basically, MIDI is nothing but control data, commonly used with synthesizers. At a basic level, there are note-on, and note-off messages. There are many other kinds of messages too, such as pitch bend, control change, etc. MIDI supports 16 "channels", which are sent all down the same line, just with a different identifier.
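To show how small these messages are, here is a Python sketch that builds the raw bytes of note-on and note-off messages. The status values (0x90 for note-on, 0x80 for note-off, with the channel in the low four bits) come from the MIDI 1.0 specification; the helper names are made up, and actually sending the bytes to a synth depends on whichever MIDI library or OS API you use.

```python
NOTE_OFF = 0x80  # status nibble for note-off
NOTE_ON = 0x90   # status nibble for note-on


def _channel_message(status, channel, data1, data2):
    """Pack a three-byte channel message: status|channel, then two data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channels are numbered 0-15 on the wire")
    if not (0 <= data1 <= 127 and 0 <= data2 <= 127):
        raise ValueError("MIDI data bytes must be in the range 0-127")
    return bytes([status | channel, data1, data2])


def note_on(channel, note, velocity):
    """Start a note; velocity is how hard the key was struck."""
    return _channel_message(NOTE_ON, channel, note, velocity)


def note_off(channel, note, velocity=0):
    """Stop a note that was started with note_on."""
    return _channel_message(NOTE_OFF, channel, note, velocity)


if __name__ == "__main__":
    # Middle C (note 60) on the first channel.
    print(note_on(0, 60, 100).hex(" "))
    print(note_off(0, 60).hex(" "))
```

Because the channel is part of the status byte, messages for all 16 channels can share one stream, which is the "same line, different identifier" idea described above.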
A good utility (on Windows) for debugging MIDI messages (and getting a better idea of the protocol in general!) can be found here: http://www.midiox.com/

WAV-MIDI matching [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Let's consider a variation of the "WAV to MIDI" conversion problem. I'm aware of the complexity of such a problem and I know that a vast literature exists on the more general subject of Music Information Retrieval (MIR).
But let's suppose here that we already have both the WAV and the MIDI representation of a piece of music, so we don't actually have to discover the pitches inside the WAV signal from scratch. We "just" have to match the pitches detected (using a suitable algorithm) with the NoteOn events contained in the MIDI representation. I suppose we should use the information in the MIDI file to give some hints to the pitch detection algorithm.
Such a matching tool could be very useful, for example for MIDI "humanization": we could make the MIDI representation more expressive using the information retrieved from the WAV signal to "fine tune" note onsets, durations, dynamics, etc...
Does anybody know if such a problem has already been addressed in literature?
Any form of contribution or assistance will be greatly appreciated.
Thanks in advance.

At the 2010 Music Hackday in London some people used the MATCH Vamp plugin to align scores to YouTube videos. It was very impressive! Maybe their source code could be of use. I don't know how well MATCH works on audio generated from MIDI files, but that could be worth a try. Here's a link: http://wiki.musichackday.org/index.php?title=Auto_Score_Tubing
This guy appears to have done something similar: http://www.musanim.com/wavalign/ His results are definitely interesting.
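A generic technique for this kind of alignment is dynamic time warping (DTW). The Python sketch below is only an illustration of plain DTW on two pitch sequences; I have not checked that the tools linked above work exactly this way. It assumes you already have the MIDI note numbers from the file and one detected pitch per audio frame, and the function name and toy data are invented.

```python
def dtw_path(reference, observed):
    """Align two numeric sequences; return (total_cost, list of (i, j) index pairs)."""
    n, m = len(reference), len(observed)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i reference items
    # with the first j observed items.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - observed[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Walk back from the end along the cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    midi_notes = [60, 62, 64]            # NoteOn pitches from the MIDI file
    detected = [60, 60, 62, 64, 64]      # one detected pitch per audio frame
    total, path = dtw_path(midi_notes, detected)
    print(total, path)
```

Each pair in the path says which audio frame belongs to which MIDI note, so the first frame paired with a note gives an estimate of its real onset, which is what the "humanization" idea needs.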

This seems like an interesting idea. What are you trying to do: just match the note pitches, or do you have something else in mind?
One thing you could look into: if you know the note value that is passed to the noteOn method (an integer, I think; it's been a while), you may be able to use it to map the note onto the WAV signal. It depends on what you are trying to do.
There are also some things you could play around with in (I think it is called) the MIDI controller, such as modulation, pitch, volume, and pan, or playing a couple of notes simultaneously. You could have a background thread change some of those effects while the note is being played. For example, a note could get quieter the longer it is played, or pan between the left and right speakers.
I haven't really played with this code in a long time, but there are some examples of using a MIDI controller.
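As a rough illustration of the "note gets quieter the longer it is played" idea, here is a Python sketch that builds a schedule of MIDI control change messages. Controller 7 (channel volume) and controller 10 (pan) are standard MIDI controller numbers; the function names, step count, and timing are invented, and sending each message at its scheduled time (from a background thread, say) is left to your MIDI library.

```python
VOLUME = 7  # standard controller number for channel volume
PAN = 10    # standard controller number for pan


def control_change(channel, controller, value):
    """Raw bytes of a control change message (status 0xB0 plus channel)."""
    return bytes([0xB0 | channel, controller, value])


def fade_out(channel, start=127, steps=8, seconds=2.0):
    """Return (delay_in_seconds, message) pairs that lower the volume linearly to 0."""
    schedule = []
    for k in range(steps + 1):
        value = round(start * (1 - k / steps))
        schedule.append((seconds * k / steps, control_change(channel, VOLUME, value)))
    return schedule


if __name__ == "__main__":
    for delay, message in fade_out(0, steps=4):
        print(f"{delay:.2f}s  {message.hex(' ')}")
```

Swapping `VOLUME` for `PAN` and sweeping the value between 0 and 127 would give the left-to-right panning effect in the same way.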

Ways to identify (musical) scores [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm looking for ways to identify the score when someone is playing an instrument, e.g. a guitar. How can I do that?
I've heard that MIDI stores music data as musical scores. I wonder if it's a good solution.

MIDI does store musical scores, but it doesn't (normally) extract them from recorded sounds. You can't take an mp3 file and "convert it to MIDI", in a standard or entirely reliable way.
You create a MIDI file using a recorder (or "sequencer"), which might be a desktop application where you "write the score" like a composer does, or it might be a musical device like a keyboard, which records which keys you press, how hard and for how long, and interprets that as a score.
A MIDI player takes the data/score, and reproduces it using its own voice (or "sound font" if you like). So the advantage of MIDI data is firstly that the voice is already available on the playback device (and so the data is very compact), and second that the same data ("tune") can be played using different voices ("instruments")[*].
I believe there are MIDI guitars, but I don't know how "good" they are. The tone of an electric guitar comes in part from resonances of the solid body. This could of course be imitated by the voice at playback time, but there are bound to be some things that you can do with an electric guitar but which the MIDI format cannot capture or represent (for example I'd guess feedback is impossible).
Software exists to extract MIDI data from recorded sound - this is a bit like the way OCR extracts ASCII character data from images of text. It's not a major means of recording someone's guitar-playing, but if what you want is to get a first approximation to the score/tabs, you could try it.
Here's a randomly-selected example, found by Googling "convert from wav to MIDI":
http://www.pluto.dti.ne.jp/~araki/amazingmidi/
[*] But members of the audience, you find yourselves wondering, "what is this mindless automaton which bangs out the tunes it's instructed to, without comprehension or any aesthetic sense". Ladies and gentlemen, Colin Sell at the piano.

Music recognition/retrieval is an extremely difficult and almost entirely unsolved AI problem. Try to extract the frequency from a signal file of someone playing a single unwavering note one time - it's much more difficult than just "apply Fourier transform, read off solution". Compound that with polyphony, noise, rubato, vibrato/portamento, plus the fact that (contrary to speech recognition) we don't even have a working a-priori model of what music actually is, and you begin to see the difficulty. There are absolutely fascinating research papers and even entire conferences on the topic, but in the short term, you're just plain out of luck.

Are you aware you are attempting something extremely difficult? It's a very complex topic you could spend years researching yourself or pay $$$$$$ for existing commercial solutions.

MIDI is a reasonable choice for your output format.
For the rest you will need Fast Fourier Transforms working off a high-resolution capture of the input analogue sounds plus at least seven years of musical theory.
Good luck.

If the player is playing in tune, there will be very distinct frequencies in the signal, or at least frequencies with a mathematical separation. It may be possible to characterise a signal using spectral analysis to distinguish music from noise, or at least melodic music from noise; avant-garde experimental music may not pass ;). The distinction may become more difficult with multiple instrumentalists, percussion, and non-standard or poor tuning; traditional Chinese or Indian music, for example, uses different scales than Western music.
Extracting the frequencies in the signal will require signal processing techniques such as the Fast Fourier Transform. Categorising the signal as music/not music could be done by statistical analysis, or by AI techniques such as neural networks or fuzzy logic.
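For the simplest possible case, a single clean, steady tone, the frequency extraction step might look like the Python sketch below. It uses a plain discrete Fourier transform for clarity (an FFT computes the same spectrum much faster) and maps the strongest frequency to the nearest MIDI note, assuming A4 = 440 Hz. The function names are made up, and as other answers here point out, real recordings with several notes, noise, and vibrato need far more than this.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest DFT bin (DC bin skipped)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def nearest_midi_note(frequency):
    """Nearest equal-temperament MIDI note number, assuming A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(frequency / 440.0))


def sine(frequency, sample_rate, count):
    """Synthetic test signal: `count` samples of a pure sine wave."""
    return [math.sin(2 * math.pi * frequency * t / sample_rate) for t in range(count)]


if __name__ == "__main__":
    signal = sine(440.0, 4000, 400)  # 0.1 s of A4, sampled at 4 kHz
    freq = dominant_frequency(signal, 4000)
    print(freq, nearest_midi_note(freq))
```

The frequency resolution is the sample rate divided by the number of samples (10 Hz in this example), so short windows cannot tell neighbouring low notes apart.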

Where can I get freely available audio, graphics, and other resources for games? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I've done a Google search on this topic, but so far haven't found anything satisfactory.
From your experience, what's the best place to get game resources, like sprites, backgrounds, sound effects, music, etc.? To be more specific, I'm looking for more of sound effects and music, which I'm currently lacking more than graphics. However, for graphics, I've tried getting random graphics from different sites, but they just don't match. I don't want to copy one entire graphics package either.
The resources should be free and easy to obtain. The products I intend to make are free if not open source, and are unlikely to receive widespread attention or produce profit for myself, so I'd like something that I can use and distribute freely.
I don't have enough graphics and musical knowledge to attempt to create resources from scratch and don't know anyone willing to do so.
I'm working with Java. I'm sure I can read all kinds of file formats with it, or if not, I can always use software to convert resources.

In terms of graphics, Daniel Cook of Lost Garden produces some seriously high quality, reusable game art that is free for both personal and commercial work (read his license details). Here's the index of his free graphics related posts, just hit the "read more" link at the bottom of an article and you'll find links to the downloads.
edit: in terms of sound effects, Soundrangers is pretty decent if you have something specific in mind, but it can quickly get expensive. For a complex game, if you're wanting a rich user experience you'll need dozens, if not hundreds of sound effects. At a couple of bucks a pop, that adds up real quick. A lot of places (including Soundrangers) offer thematic sound packs which give you a little more bang for your buck, but it's still not free. GameDev also has a listing of audio resources.
For music, I think your options are better, depending on what kind of thing you're looking for (ambient, instrumental, vocal, etc.). I would seriously think about approaching local independent musicians and using existing tracks that they have. They're likely to let you use their music for free (properly credited, of course) or at a reasonable cost.

There's http://www.freesound.org/
Most of the stuff there has a license that is incompatible with, say, Fedora, though if you ask the copyright holder they will sometimes relicense things under a different license.
Music is harder to come by than sound effects. You could try digging around on archive.org, say here: http://www.archive.org/details/muzic
Also check out sfxr: http://www.cyd.liu.se/~tompe573/hp/project_sfxr.html
The sounds it makes are pretty old-school. If that's what you're looking for, cool; otherwise, it can still make some placeholder sound effects.

If you're looking for interesting textures, I would suggest checking out Filter Forge. You can download the filters for use in Adobe Photoshop, or you can potentially use the sample images on the site to create texture maps for various types of terrains and materials.

GarageGames.com sells a lot of that kind of thing... 3d models, textures, background music and so on.
http://garagegames.com and specifically http://www.garagegames.com/products/browse
HTH
edit: whoops, I didn't see the "free" requirement! Do a search on "creative commons" and you'll find lots of music, at least, and some graphics.

Don't know if this is the type of thing you're looking for, but Game Sprite Archives has a huge huge collection of SNES/NES/anything pre-PlayStation 1 sprite rips.

I just discovered this site the other day while looking for some sound effects:
http://www.soundrangers.com/
It looks like they're royalty-free but most of the sounds cost a buck or two. Looks like some sounds are free though.

Clipart
Open Clip Art
Textures
ImageAfter
CG Textures
OpenFootage
Texture Hound

I recommend Sound Snap; they allow 5 free downloads a month for free accounts, and more if you sign up. I have been using them for the past couple of months for the games I have developed.
