I have a sound sample of a single spoken word, for example "Apple". I also have a longer audio file, roughly 30 minutes, and I want to find the points in that longer file where I say "apple". So far I have two ideas: first, use speech recognition and search the resulting text (but the Google/Azure speech services have limits on free use); second, use a Fourier transform to find similarities, splitting the longer audio into smaller samples and comparing each against the word sample. Do you have any idea how I could do this?
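If both recordings are from the same speaker under similar conditions, a crude but workable variant of the second idea is plain cross-correlation of the waveforms. This is only a minimal sketch; the file names are placeholders and both files are assumed to be mono WAVs at the same sample rate:

```python
# Sketch: locate a short template ("apple.wav") inside a long recording
# by cross-correlating the raw waveforms and taking the peak offset.
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

sr_t, template = wavfile.read("apple.wav")           # placeholder file names
sr_l, long_sig = wavfile.read("long_recording.wav")
assert sr_t == sr_l, "resample first if the sample rates differ"

# Convert to float and normalise so loudness differences matter less
template = template.astype(np.float64)
long_sig = long_sig.astype(np.float64)
template = (template - template.mean()) / (template.std() + 1e-9)
long_sig = (long_sig - long_sig.mean()) / (long_sig.std() + 1e-9)

# Cross-correlate; the peak marks the best-aligned start of the template
corr = correlate(long_sig, template, mode="valid", method="fft")
best_offset = int(np.argmax(corr))
print(f"best match starts around {best_offset / sr_l:.2f} s")
```

Raw-waveform correlation is sensitive to speaking rate and background noise; correlating spectrogram or MFCC frames, or using a keyword-spotting tool, is usually more robust.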
I have a fun artistic project in mind :) Basically, I would like to load a music file (probably as a .wav), and take it up multiple octaves so that you can represent it on the color spectrum. My question is, how do I go about breaking down an audio file so that I can perform this transformation? I'm curious to see if colors can enhance the experience of listening to music if both are "in harmony" :)
All the best!
Anthony :)
Hey, so I found a thread that pretty much answers my question: Trying to get the frequencies of a .wav file in Python. Basically, I can multiply the calculated frequency by 2^x to represent it with light :) Thanks for all the help!
Edit: While I was finishing up my program, I stumbled upon this: https://github.com/rho-bit/Psynesthesia. Looks like someone did exactly what I was trying to do back in 2003!! This was my first post and I totally love this website. Looking forward to using it more! :)
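For what it's worth, here is a minimal sketch of the frequency-to-light idea described above: estimate the dominant frequency of a short audio frame with an FFT, shift it up by whole octaves (multiply by 2^x) until it lands in the visible-light band (roughly 400-790 THz), and convert to a wavelength. The file name and frame length are placeholders:

```python
# Sketch: map the dominant audio frequency of one WAV frame to a light
# wavelength by raising it whole octaves into the visible band.
import numpy as np
from scipy.io import wavfile

VISIBLE_LOW_HZ = 400e12    # ~750 nm (red edge)
VISIBLE_HIGH_HZ = 790e12   # ~380 nm (violet edge)
C = 299_792_458            # speed of light, m/s

rate, data = wavfile.read("song.wav")          # placeholder file name
if data.ndim > 1:                              # mix stereo down to mono
    data = data.mean(axis=1)

frame = data[:4096].astype(np.float64)         # one short analysis frame
spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

# Raise by whole octaves (2^x) until the frequency is "light"
x = int(np.ceil(np.log2(VISIBLE_LOW_HZ / dominant)))
light_hz = min(dominant * 2 ** x, VISIBLE_HIGH_HZ)

wavelength_nm = C / light_hz * 1e9
print(f"{dominant:.1f} Hz -> {light_hz / 1e12:.0f} THz (~{wavelength_nm:.0f} nm)")
```

For example, 440 Hz (A4) shifted up 40 octaves lands around 484 THz, roughly 620 nm, i.e. orange-red.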
TL;DR: what approach would you take to match an existing text file to an mp3, generating an SRT file?
The situation is this:
I have people reading a text aloud, word by word
They upload the mp3 files into our system
Each mp3 should then get an SRT file so we can highlight the text as the voice plays over it
I know there is speech recognition software out there (still with mediocre results), but this is a bit different: we only need to align the audio with text we already have.
How would you approach this problem? Any idea?
Cheers from Sai Gon, Vietnam
Till
P.S. I started using Stackoverflow a couple of years ago but just had to create a new account linked to my Google account.
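Since the text is already known, this is a forced-alignment problem rather than full speech recognition: an aligner (for example CMU Sphinx's long-audio-aligner, which comes up in a later question here, or a similar tool) takes the mp3 plus the text and returns a start/end time for each word. Turning those timestamps into SRT is then straightforward. A minimal sketch, assuming you already have a list of (word, start_seconds, end_seconds) tuples from whatever aligner you use; the example data and output name are made up:

```python
# Sketch: build an SRT file from per-word timestamps produced by a
# forced aligner. The `aligned_words` list is dummy example data.
def to_srt_time(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(aligned_words, words_per_cue=7):
    """Group aligned (word, start, end) tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(aligned_words), words_per_cue):
        chunk = aligned_words[i:i + words_per_cue]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(cues)

aligned_words = [("hello", 0.32, 0.71), ("world", 0.75, 1.20)]  # dummy data
with open("reading.srt", "w", encoding="utf-8") as f:           # placeholder name
    f.write(words_to_srt(aligned_words))
```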
I have a large batch of videos that all start with a loading screen from the program I used to make them, and I was wondering if there is a program that would cut that part out and create a new file starting from where the video's sound begins.
I know I can do it manually, but that would take forever as I have over 50 videos to edit.
Take a look at this answer:
FFMPEG used to trim a video
I think it answers your question (unless the question also requires some sort of automated way to detect the start of sound activity).
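If the start of sound does need to be detected automatically, one possible approach (a sketch, not a tested pipeline) is to extract each video's audio, measure the leading silence with pydub, and then let ffmpeg cut from that point. The folder, file names, and dB threshold here are all assumptions you would tune for your own videos:

```python
# Sketch: batch-trim videos so each new file starts where the audio does.
# 1) extract audio with ffmpeg, 2) measure leading silence with pydub,
# 3) re-cut the video with ffmpeg from that offset.
import subprocess
from pathlib import Path
from pydub import AudioSegment
from pydub.silence import detect_leading_silence

for video in Path("videos").glob("*.mp4"):          # assumed folder/extension
    wav = video.with_suffix(".wav")
    # pull the audio track out as a temporary WAV
    subprocess.run(["ffmpeg", "-y", "-i", str(video), "-vn", str(wav)], check=True)

    # length of the initial silence, in milliseconds (threshold needs tuning)
    audio = AudioSegment.from_wav(wav)
    lead_ms = detect_leading_silence(audio, silence_threshold=-40.0)

    # cut the original video from that point into a new file
    out = video.with_name(video.stem + "_trimmed.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-ss", f"{lead_ms / 1000:.3f}", "-i", str(video),
         "-c", "copy", str(out)],
        check=True,
    )
    wav.unlink()  # remove the temporary WAV
```

Note that `-c copy` cuts on keyframes, so the trim point may be slightly off; re-encoding instead gives frame-accurate cuts at the cost of speed.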
On TED.com they have transcripts, and clicking a part of the transcript jumps to the appropriate section of the video.
I want to do this for 80 hours of audio and transcripts I have, on Linux with open-source software.
This is the approach I'm thinking of:
Start small with a 30-minute sample
Split the audio into 2-minute WAV chunks, even if that breaks up words
Run the phrase spotter from CMU Sphinx's long-audio-aligner on each chunk, with the transcript
Take the time indexes of the words/phrases identified in each chunk and calculate the estimated time of those n-grams in the original audio file
Does this seem like an efficient approach? Has anyone actually done this?
Are there alternative approaches worth trying, like simple word counting, that might be accurate enough?
You can just feed all your audio and text into a long audio aligner and it will give you the timestamps of the words. Using these timestamps you can jump to the specific word in the file.
I'm not sure why you would want to split your audio or do anything else.
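To illustrate the "jump to a word" part: once the aligner has produced per-word timestamps, the TED-style behaviour is just two lookups, click-a-word-to-seek and highlight-the-word-at-the-current-playback-time. A sketch with made-up alignment data, assuming (word, start_seconds, end_seconds) tuples sorted by start time:

```python
# Sketch: transcript navigation from aligner output. `alignment` is dummy data.
import bisect

alignment = [
    ("ideas", 12.40, 12.85),
    ("worth", 13.01, 13.30),
    ("spreading", 13.34, 14.02),
]
starts = [start for _, start, _ in alignment]

def seek_time(word_index):
    """Clicked word -> playback position (seconds) to seek to."""
    return alignment[word_index][1]

def word_at(playback_seconds):
    """Playback position -> index of the word to highlight (or None)."""
    i = bisect.bisect_right(starts, playback_seconds) - 1
    if i >= 0 and playback_seconds <= alignment[i][2]:
        return i
    return None

print(seek_time(1))   # -> 13.01
print(word_at(13.5))  # -> 2 ("spreading")
```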
Let's consider a variation of the "WAV to MIDI" conversion problem. I'm aware of the complexity of such a problem, and I know that a vast literature exists on the more general subject of Music Information Retrieval (MIR).
But let's suppose here that we already have both the WAV and the MIDI representation of a music piece, so we actually don't have to discover pitches inside the WAV signal from scratch... we "just" have to match the pitches detected (using a suitable algorithm) with the NoteOn events contained in the MIDI representation. I imagine we should use the information contained in the MIDI file to give some hints to the pitch detection algorithm.
Such a matching tool could be very useful, for example for MIDI "humanization": we could make the MIDI representation more expressive by using the information retrieved from the WAV signal to fine-tune note onsets, durations, dynamics, etc.
Does anybody know if such a problem has already been addressed in literature?
Any form of contribution or assistance will be greatly appreciated.
Thanks in advance.
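Not an answer about the literature, but as a starting point for experiments: the matching described above can be prototyped by reading NoteOn times from the MIDI file and detecting onsets in the WAV, then pairing each MIDI onset with the nearest detected onset within a tolerance. A rough sketch using mido and librosa; the file names and tolerance are placeholders, it assumes the WAV and MIDI start at roughly the same time and tempo, and real scores usually need a DTW-style alignment rather than nearest-neighbour matching:

```python
# Sketch: pair MIDI NoteOn events with onsets detected in the WAV, so the
# MIDI timing could be "humanized" from the real performance.
import numpy as np
import mido
import librosa

# NoteOn times (seconds) from the MIDI file; iterating a MidiFile yields
# messages with tempo-adjusted delta times in seconds.
midi_onsets, t = [], 0.0
for msg in mido.MidiFile("piece.mid"):          # placeholder file name
    t += msg.time
    if msg.type == "note_on" and msg.velocity > 0:
        midi_onsets.append((t, msg.note))

# Onset times (seconds) detected in the recorded performance
y, sr = librosa.load("piece.wav", sr=None)      # placeholder file name
audio_onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")

# Pair each MIDI onset with the nearest detected onset within a tolerance
TOLERANCE = 0.15  # seconds; an assumption, tune per piece
for midi_t, note in midi_onsets:
    i = int(np.argmin(np.abs(audio_onsets - midi_t)))
    if abs(audio_onsets[i] - midi_t) <= TOLERANCE:
        print(f"note {note}: score {midi_t:.3f}s -> performance {audio_onsets[i]:.3f}s")
```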
At the 2010 Music Hackday in London some people used the MATCH Vamp plugin to align scores to YouTube videos. It was very impressive! Maybe their source code could be of use. I don't know how well MATCH works on audio generated from MIDI files, but it could be worth a try. Here's a link: http://wiki.musichackday.org/index.php?title=Auto_Score_Tubing
This guy appears to have done something similar: http://www.musanim.com/wavalign/ His results are definitely interesting.
This seems like an interesting idea. What are you trying to do, is it just matching the notes' pitches? Or do you have something else in mind?
One thing you could look into: if you know the note (as an integer value, I think; it's been a while) that gets passed into the noteOn method, you may be able to use that to map it to the WAV signal. It depends on what you are trying to do.
Also, there are some things you could play around with in (I think it is called) the MIDI controller, such as modulation, pitch, volume, pan, or playing a couple of notes simultaneously. What you could do with this is have a background thread that changes some of those effects as the note is being played. For example, you could have a note get quieter the longer it is held, or have a note pan between the left and right speakers, etc.
I haven't really played with this code in a long time, but there are some examples of using a MIDI controller.
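The examples referred to above are presumably for Java's javax.sound.midi; purely as an illustration of the same "change a controller while the note plays" idea, here is a sketch with Python's mido (the default output port and timing values are assumptions):

```python
# Sketch: play a note and fade its channel volume (controller 7) while it
# sounds, the same idea as changing effects from a background thread.
import time
import mido

port = mido.open_output()   # default MIDI output, an assumption

port.send(mido.Message("note_on", note=60, velocity=96))   # middle C
for value in range(96, -1, -8):                             # gradually lower channel volume
    port.send(mido.Message("control_change", control=7, value=value))
    time.sleep(0.1)
port.send(mido.Message("note_off", note=60))

port.close()
```

The same pattern with controller 10 would pan the note between the left and right speakers instead of fading it.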