This question already has answers here: Trying to get the frequencies of a .wav file in Python (4 answers). Closed last year.
I have a fun artistic project in mind :) Basically, I would like to load a music file (probably as a .wav), and take it up multiple octaves so that you can represent it on the color spectrum. My question is, how do I go about breaking down an audio file so that I can perform this transformation? I'm curious to see if colors can enhance the experience of listening to music if both are "in harmony" :)
All the best!
Anthony :)
Hey, so I found a thread that pretty much answers my question: Trying to get the frequencies of a .wav file in Python. Basically, I can multiply the calculated frequency by 2^x to represent it with light :) Thanks for all the help!
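For anyone curious, here's a rough sketch of that 2^x arithmetic in Python. The choice of 40 octaves is my own assumption; it happens to land the audible range (~20 Hz - 20 kHz) near the visible band (roughly 430-770 THz):

```python
C = 299_792_458  # speed of light, m/s

def audio_to_light(freq_hz, octaves=40):
    # Shift the audio frequency up by 2^x octaves, as described above
    light_hz = freq_hz * 2 ** octaves
    # Convert the light frequency to a wavelength in nanometres
    wavelength_nm = C / light_hz * 1e9
    return light_hz, wavelength_nm

hz, nm = audio_to_light(440.0)          # concert A
print(f"{hz:.3e} Hz -> {nm:.0f} nm")    # ~4.838e+14 Hz -> ~620 nm (orange-red)
```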
Edit: While I was finishing up my program, I stumbled upon this: https://github.com/rho-bit/Psynesthesia. It looks like someone did exactly what I was trying to do, back in 2003!! This was my first post and I totally love this website. Looking forward to using it more! :)
I'm looking to create this project in Processing; however, I'm finding the terminology a bit hard. I'm not sure what to call the effect where the line stays on screen permanently throughout the song to 'draw' the music data.
I would appreciate any guidance on which tutorials to look at, or an answer from someone.
My aim is to create something as close to this as possible:
https://www.youtube.com/watch?v=Bb5PTitqtlc&t=58s
Stack Overflow isn't really designed for general "how do I do this" type questions. It's for specific "I tried X, expected Y, but got Z instead" type questions. But I'll try to help in a general sense:
You need to break your problem down into smaller pieces and then take those pieces on one at a time. Write down exactly what you want to happen, in English, and that will be an algorithm that you can think about implementing with code.
Get something simple working. Can you write a simple sketch that plays a song? Then work your way forward in small steps. Can you write a simple sketch that prints out some numeric values based on the song that's playing? Separately from that, can you create a very simple visualization using hard-coded numbers? Get all of that working separately before you think about combining them into a sketch that shows a visualization based on a song that's playing.
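As an illustration of that "visualization using hard-coded numbers" step, here is a minimal sketch in Python with matplotlib as a stand-in (the same idea ports to a Processing draw() loop; the amplitude values are made up):

```python
import matplotlib.pyplot as plt

# Hard-coded stand-in for whatever numeric values the song will produce later
amplitudes = [0.1, 0.4, 0.8, 0.3, 0.6, 0.9, 0.2, 0.5]

# Draw one continuous line that stays on screen, like the persistent
# trace in the linked video
plt.plot(range(len(amplitudes)), amplitudes)
plt.xlabel("time step")
plt.ylabel("amplitude")
plt.show()
```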
Then if you get stuck, you can post a more specific question along with an MCVE (minimal, complete, verifiable example). Good luck.
I have a text script that is used to create podcasts, so the words in the podcast audio are exactly the same as in my text. Now what I want is the following:
Word in text | Pronunciation started at
Hello        | 0:0:0.000
my           | 0:0:1.125
friends      | 0:0:2.750
Is that possible to do at all?
Thanks in advance!
One of the keywords you could start with to approach the complexity of the problem is "forced alignment". This site also covers questions on this topic, e.g. here, which leads you to questions and answers concerning HTK (the Hidden Markov Model Toolkit) via the related threads.
You can find a more hands-on style description of how to use forced alignment in automated audio segmentation here.
So the answer is: yes, it is possible, but it is algorithmically very complex, and even the best implementations are not error-free.
PS.: I found you a really simple tool
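Not necessarily the tool meant above, but as one concrete, hedged starting point, here is a minimal sketch using the aeneas forced-alignment library (all paths are placeholders). With one word per line in the text file, the output sync map gives a start time per word:

```python
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# Each fragment of the plain text file (here: one word per line) gets
# a start/end time in the output sync map
config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config)
task.audio_file_path_absolute = u"/path/to/podcast.mp3"      # placeholder
task.text_file_path_absolute = u"/path/to/words.txt"         # placeholder
task.sync_map_file_path_absolute = u"/path/to/syncmap.json"  # placeholder

ExecuteTask(task).execute()   # run the alignment
task.output_sync_map_file()   # write the word -> start time mapping as JSON
```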
Closed 8 years ago. This question needs to be more focused and is not currently accepting answers.
I want to create software that can convert readable text (non-English) into audio output.
After some searching, I have realized that most of the existing audio readers sound too robotic and lack human-like speech qualities.
I am looking for some algorithm or paper that can give me an idea of how to proceed with implementing such a thing,
or
Does anyone know how some of the world's best text-reader software works?
My expectations are:
Less robotic, more human-like sound
High-quality output
Lightweight, yet fast processing
Please edit this question if anyone thinks some points are missing.
Some small steps might help give you a basic idea of what happens:
1. Create a dictionary of words, each entry pairing a word with its recorded sound.
2. Create your own signal processor; this will let you add effects to the sound, e.g. a robotic voice, a female voice, or something else.
3. Parse the text file you want to read into an array, separating each word and punctuation mark, e.g. "I want to die, this isn't a correct way to live." becomes the array {I:want:to:die:,:this:isn't:a:correct:way:to:live:.} (see the tokenizer sketch after this list).
4. Use the punctuation to implement life-like parameters, e.g. a short pause for "," and a longer pause for "." in your audio reader.
5. Use the words to look up the audio in your database (the dictionary from point 1).
6. Play the whole array continuously, with a pause between each array element; this works like the spaces between words.
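Here is the tokenizer sketch referenced in point 3, as a minimal Python version (the regular expression is just one way to split words and punctuation):

```python
import re

def tokenize(text):
    # Words (keeping internal apostrophes, e.g. "isn't") and single
    # punctuation marks become separate array elements
    return re.findall(r"\w+(?:'\w+)?|[.,!?;]", text)

print(tokenize("I want to die, this isn't a correct way to live."))
# ['I', 'want', 'to', 'die', ',', 'this', "isn't", 'a', 'correct',
#  'way', 'to', 'live', '.']
```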
I think these are the major steps. To make it faster, you can use advanced sound-processing tools to cache small chunks of sound data and add data on the fly while modulating the sound signals.
I hope this helps.
It would be nice if you could tell us what kind of app you'll create (mobile, web, desktop) and in what language you'll develop it (PHP, Java, C++, etc.), because if you search on Google you'll find a lot of website plugins that convert text to audio; you can download them and look at the code.
Also, it's hard to find an app that doesn't sound like a robot, and if you do find one, you'll probably have to pay for it.
The "robotic" aspect of text to speech that you are concerned about is a matter of the quality of "prosody". This is an active research area. You could probably get a PhD for working on improving prosody in TTS systems. If you would like to read about current research you can try searching for "improving prosody in text to speech".
A big part of the problem is having an accurate model of speech prosody in a given language. The thesis "MeLos: Analysis and Modelling of Speech Prosody and Speaking Style" by Nicolas Obin (2012) contains a survey of the state of the art in speech prosody modelling. Or try searching for "text to speech prosody survey state of the art".
I'm trying to extract snippets (3-5 sec.) from a large collection of audio files.
I would like to do this with a shell script. I found basically nothing on the internet, so I'm asking here.
I'm also familiar with Perl, PHP and Java - I don't care which language does it, I just want the job done :)
Scenario: I've got a large archive of audio files in very high resolution. I need to extract a very short snippet in low resolution for a preview (3 to 5 sec. is extremely short, but that's what we need). Being a huge fan of shell scripting, I was hoping to automate a process that extracts the snippet at a RANDOM onset time... is that really too much to ask? :)
Thank you for your ideas!
You can use Flash to control audio playback lengths; see the ActionScript 3 Sound class reference:
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/media/Sound.html
Hope it helps.
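If Flash isn't a requirement, here is a hedged alternative sketch for the scenario above, driving ffmpeg and ffprobe from Python (assuming both are installed; file paths and preview settings are placeholders):

```python
import random
import subprocess

def extract_preview(src, dst, snippet_len=4.0):
    # Ask ffprobe for the total duration of the source file, in seconds
    duration = float(subprocess.check_output(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", src],
        text=True))
    # Pick a random onset that still leaves room for the snippet
    start = random.uniform(0, max(0.0, duration - snippet_len))
    subprocess.run(
        ["ffmpeg", "-y",
         "-ss", f"{start:.3f}",        # random onset time
         "-t", str(snippet_len),       # snippet length in seconds
         "-i", src,
         "-ar", "22050", "-ac", "1",   # low-resolution mono preview
         dst],
        check=True)

extract_preview("archive/track001.wav", "previews/track001.mp3")
```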
Closed 4 years ago. This question needs to be more focused and is not currently accepting answers.
Let's consider a variation of the "WAV to MIDI" conversion problem. I'm aware of the complexity of such a problem, and I know that a vast literature exists on the more general Music Information Retrieval (MIR) subject.
But let's suppose here that we already have both the WAV and the MIDI representation of a music piece, so we actually don't have to discover pitches inside the WAV signal from scratch... we "just" have to match the pitches detected (using a suitable algorithm) with the NoteOn events contained in the MIDI representation. I imagine we should use the information contained in the MIDI file to give some hints to the pitch-detection algorithm.
Such a matching tool could be very useful, for example for MIDI "humanization": we could make the MIDI representation more expressive by using the information retrieved from the WAV signal to "fine-tune" note onsets, durations, dynamics, etc.
Does anybody know if such a problem has already been addressed in the literature?
Any form of contribution or assistance will be greatly appreciated.
Thanks in advance.
At the 2010 Music Hackday in London, some people used the MATCH Vamp plugin to align a score to YouTube videos. It was very impressive! Maybe their source code could be of use. I don't know how well MATCH works on audio generated from MIDI files, but it could be worth a try. Here's a link: http://wiki.musichackday.org/index.php?title=Auto_Score_Tubing
This guy appears to have done something similar: http://www.musanim.com/wavalign/ His results are definitely interesting.
This seems like an interesting idea. What are you trying to do: just match the notes' pitches? Or do you have something else in mind?
One possible thing you could look into: if you know the note (as an integer value, I think - it's been a while) that gets passed into the noteOn method, you may be able to use that to map it to the WAV signal. It depends on what you are trying to do.
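For reference, that integer is a MIDI note number, and the standard mapping from note number to frequency is a small formula; a quick sketch:

```python
def midi_note_to_hz(note):
    # MIDI note 69 is A4 = 440 Hz; each semitone is a factor of 2**(1/12)
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

print(midi_note_to_hz(69))  # 440.0 (A4)
print(midi_note_to_hz(60))  # ~261.63 (middle C)
```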
Also, there are some things you could play around with in (I think it is called) the MIDI controller, such as modulation, pitch, volume, pan, or playing a couple of notes simultaneously. What you could do with this is have a background thread that changes some of those effects while a note is being played. For example, you could have a note get quieter the longer it is played, or have a note pan between the left and right speakers, etc.
I haven't really played with this code in a long time, but there are some examples of using a MIDI controller.
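As an illustration of that background-thread idea, here is a hedged sketch using Python's mido library as a stand-in for a Java MIDI controller (my assumption; it needs a MIDI output port available on the system):

```python
import threading
import time

import mido

def fade_out(port, channel=0, seconds=2.0):
    # Gradually lower CC 7 (channel volume) so the playing note gets quieter
    for value in range(127, -1, -8):
        port.send(mido.Message("control_change", channel=channel,
                               control=7, value=value))
        time.sleep(seconds / 16)

with mido.open_output() as port:  # default system MIDI output
    port.send(mido.Message("note_on", note=60, velocity=100))
    # Run the fade in the background while the note keeps sounding
    threading.Thread(target=fade_out, args=(port,)).start()
    time.sleep(3)
    port.send(mido.Message("note_off", note=60))
```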