I hoped that potentially there is something that will help me do it with just one step, however there might be other steps for it.
The problem is, that there is a game I follow, however all the major informations (devblogs and streams) is passed over mostly in french. Today there is one stream on Twitch that I would love to understand, however French has never crossed my path outside of the game. I was hoping there would be a way for me to launch the stream, capture the text spoken during it and translate it to English.
So far, the best idea that came to my mind would be to open up google translate, turn the volume up and let it speak to itself, however I hoped there would be something that listens to an application/window inside the system without the use of the actual microphone and speaker.
Enjoy your day, all!
Related
So, yesterday, i was calling with my friend and i wanted to play him an sound from a game, but i had no idea how, because he couldn't hear mine sound. So, is there a way to direct all the system audio to record on the microphone? And also keep the normal output so i can hear it. Thanks for help!
And also, please make your answer noob-friendly, because i am pretty bad with this, thanks!
Other stuff:
I want to direct the output into the microphone, not just the calling program (it's Skype by the way).
Please don't answer that i should put the output to the speakers and record it like it, because i don't even have speakers... just headphones.
I'm looking for a program that is able to recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. In my project, it would need to be realtime as the latency would not be a desired result. I tried using dictation software that would recognize words to trigger opening a file and that's the direction where I want to go, but instead of words I want it to be sounds and it would happen in realtime. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
Lastly (for audio), I would recommend picking a standard format for comparison. Make it as good as gets you decent results, and start high. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps, let's look forward to other answers.
For one of our projects, we got a new requirement on our hands but I don't have any idea about how to do it.
We need to process audio captured from the environment and do something in the app (show a message, picture, etc) when a specific pattern is recognized.
The first thing that came to my mind when I heard this requirement was Shazam. I did a little research and found Echo Print library (http://echoprint.me). I think that works on the whole of the songs, what I need to do is constantly listen to environment and act when the patterns are recognized. I don't know anything about audio processing (at least for now) but this sounds more like steganography. Correct me if I'm wrong.
Any help will be greatly appreciated!
EDIT: I think I need to correct some points in my question. Yes, the application will listen to the environment but it will recognize the patterns in a pre-configured audio. A specific song on a radio, a dog bark, etc. So the patterns will be artificially defined in the audio.
Thanks.
I'm trying to develop an online application where the user writes some text and the software sings it back to the user.
I can currently generate the audio file with the words spoken by the computer using espeak, but I have no idea how to make it sound like a song, how to add rhythm to it.
I'm able to change the pitch and tempo using rubberband, but that's as far as I've gotten.
Does anyone have a clue how to make this happen?
If you want to use rubberband to change duration and pitch, then I think the hard part is going to be mapping from phonemes/syllables in the text to corresponding audio ranges in the speech systhesis output, for which I have no simple suggestion. (Ideally you'd get inside the speech synthesiser so that it would provide you with the mapping from phonemes to audio location.)
A simpler alternative might be to try Speech Synthesizer Markup Language - SSML. It has a "pitch" and "duration" elements that can absolutely specify pitch in Hz and duration in seconds. You can also specify volume, for controlling dynamics.
Given this, you could try to convert the text into a SSML document, and mark up words/syllables/phonemees with pitch/duration and volume attributes.
I've ended up using Festival's singing mode. It sounds reasonably well, except for the fact it only works with English voices.
I want to make a program that takes recorded speech and transforms it so it sounds like it's coming from a Texas TI-99. Do you have any good ideas and resources for how to go about that?
Most of those old speech synthesizers were build directly in-chip. Perhaps you could find a synthesizer that sounds like the chip, but if you really want the original sound, you would either have to simulate the chip (I don't know if it's a simple matter, perhaps the chip internals aren't published).
I only know because I burnt out a number of the Radio Shack speech synthesizer ICs before I managed to get a SP0256-AL2 working.
If you're more of a do-it yourself type guy, you need to find out which IC actually drove the speech synthesis in a TI-99, and then build the chip up on a bread board. That's what I was trying to do back then, and I managed to get the chip to speak, but lost patience after I fried my third chip due to a mis-wiring issue when I attempted to attach it to my PC's parallel port. I think this was the book I was using back then, but there's no cover art featured so it's hard to know for sure.
If you are familiar with how to use ROM images, there seems to be a gentleman that has managed to refeverse engineer the ROM image out of a SP0256-AL2. Look here for the image and the incredible granted permission to do the work and distribute the results.
You could start with open source that does something similar: Adding Robotic/Vocoder effect to your song using Audacity