Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I would like to make an 8-bit audio clip: basically just some notes run through some modulation and such. I am new to this, so I don't know the terminology very well.
What I'm wondering is what the typical/standard format for storing the audio sequence is. I know about the common audio formats and lossy vs. lossless compression, but these seem to apply to recording audio signals from the environment rather than generating audio from within a program.
When the audio is generated, it seems that instead of storing the final output sound waves, you could store the MIDI sequence, or the notes and their intensities, or some description of the waves. That way you could build a music/sound editor, save your file, and later reopen it with your notes in place.
[Image: MIDI sequences in the UI]
I'm wondering how this is typically done.
In MIDI, the sounds are stored as notes and durations for voices (instruments). When they are played, the sound card (or a software program) generates the appropriate waveforms to be played on the speaker.
MIDI files are much smaller than their MP3 or WAV counterparts, at the expense of not being able to represent speech or arbitrary variations in sound.
Not all MIDI implementations produce the same sounds. A professional MIDI synthesizer might generate each instrument's waveform from captured sounds (samples) of the real instrument, whereas a typical PC implementation plays a generic approximation of the instrument.
A similar method was used in most 8-bit games to generate music.
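A minimal sketch of that separation in Python (the language and every name here are illustrative assumptions, not a standard format): the editable "project" is just a list of (MIDI note, duration, volume) tuples, and an 8-bit square-wave WAV is rendered from it only when needed. Real 8-bit sound chips and MIDI synthesizers do much more (envelopes, noise channels, duty-cycle changes); this only shows stored notes versus generated waveform.

```python
import wave

SAMPLE_RATE = 22050  # samples per second (an arbitrary choice for this sketch)

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render_square(notes, sample_rate=SAMPLE_RATE):
    """Render (midi_note, seconds, volume 0..1) tuples to 8-bit unsigned PCM bytes.

    A midi_note of None is treated as a rest (silence).
    """
    out = bytearray()
    for note, seconds, volume in notes:
        n_samples = int(seconds * sample_rate)
        amp = int(127 * volume)
        if note is None:
            out.extend([128] * n_samples)  # 128 is the silent midpoint for 8-bit WAV
            continue
        period = sample_rate / midi_to_hz(note)  # samples per cycle (may be fractional)
        for i in range(n_samples):
            # High for the first half of each cycle, low for the second: a 50% square wave.
            high = (i % period) < period / 2
            out.append(128 + amp if high else 128 - amp)
    return bytes(out)

def write_wav(path, pcm, sample_rate=SAMPLE_RATE):
    """Write mono 8-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)  # 1 byte per sample = 8-bit unsigned PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)

# The editable "project file" is just this data; the audio is derived from it.
melody = [(60, 0.25, 0.5), (64, 0.25, 0.5), (67, 0.25, 0.5), (None, 0.25, 0.0), (72, 0.5, 0.4)]
```

For example, `write_wav("melody.wav", render_square(melody))` would write a short arpeggio. Saving `melody` itself (for instance as JSON) is what lets an editor reopen the file with the notes in place.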
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to create software that converts readable (non-English) text to audio output.
After some searching, I have realized that most existing audio readers sound too robotic and lack the qualities of human speech.
I am looking for algorithms or papers that could give me an idea of how to implement such a thing.
or
Does anyone know how some of the world's best text-reader software works?
My expectations are:
Less robotic, more human-like sound
High-quality output
Lightweight, yet fast processing
Please edit this question if you think some points are missing.
A few small steps might give you a basic idea of what happens:
Create a dictionary of words, mapping each word to its recorded sound.
Create your own signal processor; this lets you add effects to the sound, for example a robotic voice, a female version, or something else.
Parse the text file you want to read into an array, splitting out each word and punctuation mark. E.g. "I want to die, this isn't a correct way to live." forms the array {I:want:to:die:,:this:isn't:a:correct:way:to:live:.}
Use the punctuation to add lifelike parameters to your audio reader, such as a short pause for "," and a longer pause for ".".
Use each word to look up its audio in the dictionary (database) from step 1.
Play the whole array continuously, with a pause between elements that works like the spaces in the text.
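The parsing, pause, and lookup steps above can be sketched roughly like this, assuming a hypothetical `dictionary` that maps lowercase words to recorded clips (file names here). The pause lengths are illustrative guesses, not tuned values, and actually playing the clips is left out.

```python
import re

# Pause lengths in milliseconds (illustrative values only).
PAUSE_MS = {",": 250, ".": 600}
WORD_GAP_MS = 50  # short gap after every word, playing the role of a space

def tokenize(text):
    """Split text into words (keeping apostrophes) and punctuation marks."""
    return re.findall(r"[\w']+|[.,!?;:]", text)

def build_schedule(text, dictionary):
    """Turn text into a list of ("clip", handle) / ("pause", ms) playback steps.

    `dictionary` maps a lowercase word to an audio clip handle. Unknown words
    and unmapped punctuation are skipped; consecutive pauses simply add up.
    """
    schedule = []
    for token in tokenize(text):
        if token in PAUSE_MS:
            schedule.append(("pause", PAUSE_MS[token]))
        elif token.lower() in dictionary:
            schedule.append(("clip", dictionary[token.lower()]))
            schedule.append(("pause", WORD_GAP_MS))
    return schedule
```

A real reader would need a fallback for words missing from the dictionary (for example, building them from smaller units), which is where most of the difficulty lies.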
I think these are the major steps. To make it faster, you can use advanced sound-processing tools to cache small pieces of sound data and add data on the fly while you are modulating the sound signals.
Hope this helps.
It would help if you told us what kind of app you'll create (mobile, web, desktop) and what language you'll develop it in (PHP, Java, C++, etc.). If you search on Google, you'll find many website plugins that convert text to audio; you can download them and read the code.
Also, it's hard to find an app that doesn't sound like a robot, and if you do find one, you'll probably have to pay for it.
The "robotic" aspect of text to speech that you are concerned about is a matter of the quality of "prosody" (the rhythm, stress, and intonation of speech). This is an active research area. You could probably get a PhD for working on improving prosody in TTS systems. If you would like to read about current research, you can try searching for "improving prosody in text to speech".
A big part of the problem is having an accurate model of speech prosody in a given language. The thesis "MeLos: Analysis and Modelling of Speech Prosody and Speaking Style" by Nicolas Obin (2012) contains a survey of the state of the art in speech prosody modelling. Or try searching for "text to speech prosody survey state of the art".
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Let's consider a variation of the "WAV to MIDI" conversion problem. I'm aware of the complexity of such a problem, and I know that a vast literature on the more general subject of Music Information Retrieval (MIR) exists.
But let's suppose that we already have both the WAV and the MIDI representation of a piece of music, so we don't have to discover the pitches in the WAV signal from scratch; we "just" have to match the detected pitches (found with a suitable algorithm) to the NoteOn events in the MIDI representation. I suppose we should definitely use the information in the MIDI file to give hints to the pitch detection algorithm.
Such a matching tool could be very useful, for example for MIDI "humanization": we could make the MIDI representation more expressive by using the information retrieved from the WAV signal to fine-tune note onsets, durations, dynamics, etc.
Does anybody know if such a problem has already been addressed in literature?
Any form of contribution or assistance will be greatly appreciated.
Thanks in advance.
At the 2010 Music Hackday in London, some people used the MATCH Vamp plugin to align scores to YouTube videos. It was very impressive! Maybe their source code could be of use. I don't know how well MATCH works on audio generated from MIDI files, but it could be worth a try. Here's a link: http://wiki.musichackday.org/index.php?title=Auto_Score_Tubing
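Score-to-audio aligners such as MATCH are, as far as I know, built on variants of dynamic time warping (DTW). Below is a minimal sketch of classic offline DTW (not MATCH's own algorithm), assuming you already have one pitch value per analysis frame from the WAV (from some pitch tracker) and one per frame from the MIDI. Using the absolute difference in MIDI note numbers as the distance is an assumption; real systems usually compare spectral or chroma features.

```python
def dtw_align(reference, observed):
    """Align two per-frame pitch sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching reference[i] (e.g. from the MIDI) to observed[j] (from the WAV).
    """
    n, m = len(reference), len(observed)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning reference[:i] with observed[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - observed[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For humanization, the first observed frame paired with each reference note's first frame gives a refined onset for that NoteOn event. Plain DTW is O(n·m) in time and memory, so long pieces need windowed or online variants.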
This guy appears to have done something similar: http://www.musanim.com/wavalign/ His results are definitely interesting.
This seems like an interesting idea. What are you trying to do: just match the notes' pitches, or do you have something else in mind?
One thing you could look into: if you know the note (as an integer value, I think; it's been a while) that will be passed to the noteOn method, you may be able to use it to map the note to the WAV signal. It depends on what you are trying to do.
There are also some things you could play around with in (I think it's called) the MIDI controller, such as modulation, pitch, volume, and pan, or playing a couple of notes simultaneously. What you could do is have a background thread change some of those effects while the note is playing. For example, you could make a note get quieter the longer it is held, or make a note pan between the left and right speakers.
I haven't played with this code in a long time, but there are some examples of using a MIDI controller.
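As an alternative to a background thread, the same effects can be precomputed as a timed list of MIDI Control Change messages and handed to whatever sequencer or MIDI output you use (sending and scheduling are not shown). The byte layout is standard MIDI: status byte 0xB0 plus the channel number, then the controller number (7 for channel volume, 10 for pan) and a value from 0 to 127. The function names are hypothetical.

```python
VOLUME = 7  # CC 7: channel volume
PAN = 10    # CC 10: pan (0 = hard left, 64 = centre, 127 = hard right)

def control_change(channel, controller, value):
    """Build a raw 3-byte MIDI Control Change message."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("channel must be 0-15; controller and value must be 0-127")
    return bytes([0xB0 | channel, controller, value])

def fade_out(channel, start_value, duration_s, steps):
    """(time_s, message) pairs ramping channel volume from start_value down to 0."""
    events = []
    for k in range(steps + 1):
        t = duration_s * k / steps
        value = round(start_value * (1 - k / steps))
        events.append((t, control_change(channel, VOLUME, value)))
    return events

def pan_sweep(channel, duration_s, steps):
    """(time_s, message) pairs moving the sound from hard left to hard right."""
    return [(duration_s * k / steps, control_change(channel, PAN, round(127 * k / steps)))
            for k in range(steps + 1)]
```

Note that channel volume affects every note on that channel, so a per-note fade like this works best with one sustained note per channel.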
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm looking for ways to identify the score when someone is playing, e.g., a guitar. How can I do that?
I've heard that MIDI stores music data as musical scores. I wonder if it's a good solution.
MIDI does store musical scores, but it doesn't (normally) extract them from recorded sounds. You can't take an mp3 file and "convert it to MIDI", in a standard or entirely reliable way.
You create a MIDI file using a recorder (or "sequencer"), which might be a desktop application where you "write the score" like a composer does, or it might be a musical device like a keyboard, which records which keys you press, how hard and for how long, and interprets that as a score.
A MIDI player takes the data/score, and reproduces it using its own voice (or "sound font" if you like). So the advantage of MIDI data is firstly that the voice is already available on the playback device (and so the data is very compact), and second that the same data ("tune") can be played using different voices ("instruments")[*].
I believe there are MIDI guitars, but I don't know how "good" they are. The tone of an electric guitar comes in part from resonances of the solid body. This could of course be imitated by the voice at playback time, but there are bound to be some things that you can do with an electric guitar but which the MIDI format cannot capture or represent (for example I'd guess feedback is impossible).
Software exists to extract MIDI data from recorded sound - this is a bit like the way OCR extracts ASCII character data from images of text. It's not a major means of recording someone's guitar-playing, but if what you want is to get a first approximation to the score/tabs, you could try it.
Here's a randomly-selected example, found by Googling "convert from wav to MIDI":
http://www.pluto.dti.ne.jp/~araki/amazingmidi/
[*] But members of the audience, you find yourselves wondering, "what is this mindless automaton which bangs out the tunes it's instructed to, without comprehension or any aesthetic sense". Ladies and gentlemen, Colin Sell at the piano.
Music recognition/retrieval is an extremely difficult and almost entirely unsolved AI problem. Try to extract the frequency from a signal file of someone playing a single unwavering note one time - it's much more difficult than just "apply Fourier transform, read off solution". Compound that with polyphony, noise, rubato, vibrato/portamento, plus the fact that (contrary to speech recognition) we don't even have a working a-priori model of what music actually is, and you begin to see the difficulty. There are absolutely fascinating research papers and even entire conferences on the topic, but in the short term, you're just plain out of luck.
Are you aware you are attempting something extremely difficult? It's a very complex topic you could spend years researching yourself or pay $$$$$$ for existing commercial solutions.
MIDI is a reasonable choice for your output format.
For the rest you will need Fast Fourier Transforms working off a high-resolution capture of the input analogue sounds plus at least seven years of musical theory.
Good luck.
If the player is playing in tune, there will be very distinct frequencies in the signal, or at least frequencies with a mathematical separation. It may be possible to characterise a signal using spectral analysis to distinguish music from noise, or at least melodic music from noise; avant-garde experimental music may not pass ;). The distinction may become more difficult with multiple instrumentalists, percussion, and non-standard or poor tuning; traditional Chinese or Indian music, for example, uses different scales than Western music.
Extracting the frequencies in the signal will require signal processing techniques such as the Fast Fourier Transform. Categorising the signal as music/not music could be done by statistical analysis, or by AI techniques such as neural networks or fuzzy logic.
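As a rough illustration of the FFT step, and of why it is only a first step, here is a sketch using NumPy that finds the strongest spectral peak of a single, steady note and maps it to the nearest MIDI note number (A4 = 440 Hz = note 69). It assumes a clean monophonic recording; with a real guitar the strongest peak may be a harmonic rather than the fundamental, and chords, noise, and vibrato break it entirely, as described above.

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest spectral peak in a mono signal."""
    windowed = samples * np.hanning(len(samples))  # taper the edges to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC component
    return float(freqs[np.argmax(spectrum)])

def frequency_to_midi(freq_hz):
    """Nearest equal-tempered MIDI note number, with A4 = 440 Hz = note 69."""
    return int(round(69 + 12 * np.log2(freq_hz / 440.0)))
```

Frequency resolution is `sample_rate / len(samples)` Hz per bin, so short analysis windows cannot separate low notes that are only a few Hz apart; that trade-off between time and frequency resolution is one of the reasons real transcription is hard.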
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
I've looked for references on the Audible format, and it appears that people are only interested in cracking/converting out of it. I've got a collection of MP3 files that I want to convert into Audible format for use on my Kindle and iPod.
Does anybody have a good reference on the Audible or protected AAC formats and how section markers and metadata are expressed? Better yet, a utility or code sample?
The Audible format is a DRM protected proprietary format only available from Audible. They protect their format diligently. From what I understood, they used to charge handsomely for the right to use it. I am not aware of any publicly available encoders that will create audible formatted files (for free or not). Since Amazon bought Audible I am not sure if they would even sell the right to anyone else anymore.
I do not own a Kindle, but from what I can tell, it only supports Audible-formatted files with full audiobook functionality. MP3s can be played, but they are not treated the same way (no support for chapters, etc.).
Creating audiobooks with chapter support for the iPod requires creating an M4B file with a text track and making sure the audio track has the proper track reference to the text track. M4B is a version of MP4 that uses AAC audio. There are several programs to do this on the Mac, and recently a few have surfaced for the PC.
I have created a freeware software package with a graphical user interface (GUI) for the PC, which is in beta test now. It is called "Chapter and Verse" and it will be available on lodensoftware.com shortly. Two other options with GUIs are "Chapter Master" from Rightword Enterprises ($15) and "iPod Audio Book Converter" (freeware) from sjhaley.com, which is in beta test as well.
Several command-line utilities exist as well. One is called Slide Show Assembler (SSA), which can be used to create podcasts as well as audiobooks. SSA is available from jrlearnsmedia.com. Another is a command-line utility for manipulating MP4 files called mp4creator, which can add chapters to an MP4 file.
I struggled with the problem of converting MP3s to M4B audiobooks on OS X for a while and came up with a workflow, now summed up in a Ruby script that does it all automatically. Here's the workflow:
1. (ruby) Sort the files and identify chapter names (regexp magic!).
2. (lame) Convert all the MP3s to 64 kbit/s mono CBR (this is done to even out the input files, to remove tags that could cause problems when merging, and most of all because Core Audio's AAC conversion from stereo to mono sucks).
3. (mp3info Ruby gem) Compute the file lengths and generate a chapter XML file (lengths are computed on the mono CBR versions to avoid problems).
4. (cat) Merge them.
5. (afconvert) Convert to AAC.
6. (ChapterTool) Apply the chapter markers and rename to .m4b.
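Steps 1 and 3 of this workflow (ordering the files and computing chapter start times) might look roughly like this in Python; the original script is Ruby. The chapter-naming rule is a hypothetical example, durations are assumed to have been measured already (e.g. with mp3info, as in step 3), and the output is a plain list of (start time, title) pairs rather than ChapterTool's XML, whose schema isn't reproduced here.

```python
import re

def natural_key(name):
    """Sort key so that '2 - x.mp3' sorts before '10 - y.mp3'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

def chapter_title(filename):
    """Hypothetical naming rule: drop the extension and a leading track number."""
    stem = re.sub(r"\.[^.]+$", "", filename)
    return re.sub(r"^\d+[\s._-]*", "", stem) or stem

def format_timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def build_chapters(files_with_durations):
    """Given (filename, duration_s) pairs, return (start_timestamp, title) per chapter.

    Each chapter starts where the previous files end, in natural filename order.
    """
    chapters = []
    start = 0.0
    for name, duration in sorted(files_with_durations, key=lambda fd: natural_key(fd[0])):
        chapters.append((format_timestamp(start), chapter_title(name)))
        start += duration
    return chapters
```

Natural sorting matters here because plain string sorting would put "10 - ..." before "2 - ...", shifting every chapter marker after it.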
FYI
Chapter and Verse has been released for some time.
The latest version (v1.3) is now available with full capability to create chapterized audiobooks with chapter images for the iPod, iTunes and quicktime.
The program is freeware and is available at lodensoftware.com
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
What is a good free library for editing MP3s/FLACs?
By editing I mean:
Cutting an audio file into multiple parts
Joining multiple audio files together
Increasing the playback speed of a file without affecting the pitch (e.g. podcasts up to 1.3x)
Re-encoding an audio file from FLAC to MP3 or vice versa
I don't mean software, I mean a library that I can use within another application. Programming language agnostic.
Just about every language has bindings to C, so you'll probably want to get the applicable C libraries for encoding/decoding MP3 and FLAC files. This list might include:
libFLAC http://flac.sourceforge.net/api/index.html FLAC encoding/decoding
LAME http://lame.sourceforge.net/index.php MP3 encoding
MAD http://www.underbit.com/products/mad/ MP3 decoding
The rest of your signal processing needs could be gathered around a single popular API such as LADSPA http://www.ladspa.org/.
Here's a stretching / pitch shifting library: http://www.breakfastquay.com/rubberband/
Most audio processing programs have a certain internal format they use. That keeps things simple. Everything coming in gets converted to the same format. Once you've standardized the internal format, cutting and splicing audio data is about as difficult as cutting and splicing strings. You don't really need a library for that.
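To illustrate with Python's standard `wave` module: once everything is decoded to uncompressed PCM WAV (the assumed internal format here), cutting and joining really are just byte slicing and concatenation, as long as you slice on whole frames. The function names are illustrative; decoding MP3/FLAC into this format (e.g. with MAD or libFLAC) and re-encoding are not shown.

```python
import io
import wave

def decode_wav(data):
    """Return (params, raw_frames) for the bytes of a PCM WAV file."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getparams(), w.readframes(w.getnframes())

def encode_wav(params, frames):
    """Encode raw PCM frames as WAV bytes using the given format parameters."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(params.sampwidth)
        w.setframerate(params.framerate)
        w.writeframes(frames)
    return buf.getvalue()

def cut(data, start_s, end_s):
    """Keep only the audio between start_s and end_s, like slicing a string."""
    params, frames = decode_wav(data)
    frame_size = params.nchannels * params.sampwidth  # bytes per frame
    a = int(start_s * params.framerate) * frame_size
    b = int(end_s * params.framerate) * frame_size
    return encode_wav(params, frames[a:b])

def join(*clips):
    """Concatenate WAV clips that share channel count, sample width and rate."""
    params, _ = decode_wav(clips[0])
    fmt = (params.nchannels, params.sampwidth, params.framerate)
    parts = []
    for clip in clips:
        p, frames = decode_wav(clip)
        if (p.nchannels, p.sampwidth, p.framerate) != fmt:
            raise ValueError("convert clips to a common internal format first")
        parts.append(frames)
    return encode_wav(params, b"".join(parts))
```

Rounding the cut points to whole frames (rather than arbitrary byte offsets) is what keeps multi-byte and multi-channel samples from being split in half.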
I use Audacity for all my editing needs
Audacity is a free, easy-to-use audio editor and recorder for Windows, Mac OS X, GNU/Linux and other operating systems. You can use Audacity to:
* Record live audio.
* Convert tapes and records into digital recordings or CDs.
* Edit Ogg Vorbis, MP3, WAV or AIFF sound files.
* Cut, copy, splice or mix sounds together.
* Change the speed or pitch of a recording.
Audacity uses the LAME library; however, not only is this not language agnostic, there are also some questions over licensing. Nevertheless, it might be a start.