Ok, so I go this VoIP service and need a simple test of the sound quality transmitted. 2 VMs will "talk", and the tests will be done by a third computer. We have the record of the sound spoken and another recording of the sound received(.wav). The testing computer receives both files (pre and pos transmission, pos-transmission should have a little noise or errors) and need to compare the sound quality between then. The only relevant info, would be an output saying how good the quality is at he receiver end. (something like 0.0 - 1.0 score) I'm having a lot of trouble comparing the 2 sounds recorded, any insight and help would be great. Oh yeah, this must be automatized, so there is no one to listen both records and say how bad one of then is. The computer should be able to determine the final quality.
Sorry for any mistake, English is not my first language, and thanks again for any possible help.
The question is very interesting, but unfortunately you have thin chances of getting a perfect solution since it is an open problem. You might be interested in PESQ (Perceptual Evaluation of Speech Quality) (implementation at: http://www.itu.int/rec/T-REC-P.862-200102-I/en)
Related
Lets say I have the audio file for Happy Birthday. I want to convert that audio file into an audio file that sounds like this : happy birthday.
First, I'd like to know if I have the ability to program this? Can a highschooler who's almost finished with APCS program this?
If I can:
How would I change the bpm of the song? I've searched through a bunch of websites, but they weren't very helpful.
I know that audio files can be represented in waveforms. How would I scan for each individual wave in an audio file (I need this to isolate the notes)?
This is a very ambitious project, actually. One reason is that it involves using digital signal processing tools like FFT (Fast fourier transforms) to analyze the sound to pick out the pitches. You might be able to find a library that can do this, but as far as coding such a tool, that would involve a steep learning curve.
If you would like to look further into this, there is a good online resource called "The Scientists and Engineers Guide to Digital Signal Processing". I was able to work through and understand the discrete fourier transform with only high school math (lots of trig) and a bit of calculus. It was a lift, though.
Trying to analyze rhythm is also no easy task. Even with advanced tools provided in professional notation system such as Finale, people have trouble playing rhythms in time well enough for the best transcription tools. Algorithms that "quantize" the beats help but also limit the amount of detail that can be included in the playback.
My guess is that as interesting and worthwhile as this project would be, to bring it to completion before the semester ends would require putting together prebuilt pieces. A lot of programming is done that way, these days.
If you scale the project back to something like just getting your code to analyze a short sample of a single note and give its pitch, that would be both impressive and doable with a lot of work. It could be done with a DFT algorithm instead of requiring FFT, reducing the amount of info you'd have to acquire first. That way, you'd only have to work your way up to understanding and implementing the material on this link which is about calculating the DFT. Notice that there is example code in BASIC. The code examples throughout this book are a big help.
This is an odd one. I'm not sure it's the right place to ask, but maybe there are some sound experts who can chime in? (no pun intended)
We use two sounds on our website to indicate success and failure on a quiz. Those are very simple and short sounds.
Somehow one of our customers reported that her dog was whimpering and really upset with both of those sounds. He's normally fine with lots of other sounds that dogs are typically unhappy with including loud sounds, hoovers etc. She even said it happens when she uses headphones!
Other than muting, or replacing those sounds (and upsetting other dogs?), is there anything we can do to clean the sound or detect what specifically makes them upset this or other dogs?
Downvoters: I think this question crosses over between biology/physiology/physics and signal and audio processing. The answers I'm getting now actually demonstrate this. It requires this cross-domain knowledge. In any case, I'm happy to delete it if this seems to not jive well with this community. I think my intentions were positive and I added a bounty to try to solve this real problem. It saddens me to even see downvotes for the answers although they made an effort to help.
EDIT: I'm unable to delete this question it seems. I get an error message.
EDIT2: In case it's more helpful, here's a spectrum analysis of both sounds using Audacity. There are lots of different options, but this is using the default options for Analyze->Plot spectrum
This question is not at the right place (and down-voted) but for your information you may take a look on Frequency Range of Dog Hearing where you can read that:
Humans can hear sounds approximately within the frequencies of 20 Hz and 20,000 Hz
[...]
The frequency range of dog hearing is approximately 40 Hz to 60,000 Hz
Note also that:
The shape of a dog's ear also helps it hear more proficiently.
A similar example is :
A vacuum cleaner, which merely sounds loud to us, can produce a high frequency sound which may scare dogs away
I expect that very low frequency can also scare dog (like thunderstorm sound)
You may use a spectrum analyser software (open source audio-software like Audacity allow you to do the job) to double check if low/high frequency are present in the sound.
In my opinion you may use a Band-pass filter, to cut all frequency lower than 50KHz & higher than 15KHz to avoid the "the vacuum cleaner" and "thunderstorm sound" effect (which may scare dogs.)
You may finally take a look on the Audacity low pass filter manual to know how to apply this filter on your sound.
Despite the fact that the question is offtopic here, I'll move here my initial answer from comments - I guess this will allow the question author to close question bounty properly.
The answer:
1. I guess your question is downvoted because here's a place for computer software and hardware functioning related questions, and your question is rather from physics or biology domains. So ask them on appropriate SO sections: physics.stackexchange.com and/or biology.stackexchange.com
Regarding your question, I would recommend you to check your sounds frequency range: dogs do not like sounds with loud high frequencies. Perhaps your sounds contains high frequencies, you can check it yourself with sound spectrum diagram in some audio redactor, for example Audacity
And yes, as #astefani pointed out those frequencies can be removed with band-pass filter, for example with low-pass filter in Audacity
I'm looking for a program that is able to recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. In my project, it would need to be realtime as the latency would not be a desired result. I tried using dictation software that would recognize words to trigger opening a file and that's the direction where I want to go, but instead of words I want it to be sounds and it would happen in realtime. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
Lastly (for audio), I would recommend picking a standard format for comparison. Make it as good as gets you decent results, and start high. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps, let's look forward to other answers.
I want to make a program that takes recorded speech and transforms it so it sounds like it's coming from a Texas TI-99. Do you have any good ideas and resources for how to go about that?
Most of those old speech synthesizers were build directly in-chip. Perhaps you could find a synthesizer that sounds like the chip, but if you really want the original sound, you would either have to simulate the chip (I don't know if it's a simple matter, perhaps the chip internals aren't published).
I only know because I burnt out a number of the Radio Shack speech synthesizer ICs before I managed to get a SP0256-AL2 working.
If you're more of a do-it yourself type guy, you need to find out which IC actually drove the speech synthesis in a TI-99, and then build the chip up on a bread board. That's what I was trying to do back then, and I managed to get the chip to speak, but lost patience after I fried my third chip due to a mis-wiring issue when I attempted to attach it to my PC's parallel port. I think this was the book I was using back then, but there's no cover art featured so it's hard to know for sure.
If you are familiar with how to use ROM images, there seems to be a gentleman that has managed to refeverse engineer the ROM image out of a SP0256-AL2. Look here for the image and the incredible granted permission to do the work and distribute the results.
You could start with open source that does something similar: Adding Robotic/Vocoder effect to your song using Audacity
Despite all the advances in 3D graphic engines, it strikes me as odd that the same level of attention hasn't been given to audio. Modern games do real-time rendering of 3D scenes, yet we still get more-or-less pre-canned audio accompanying those scenes.
Imagine - if you will - a 3D engine that models not just the physical appearance of items, but also their audio properties. And from these models it can dynamically generate audio based on the materials that come into contact, their velocity, distance from your virtual ears, etcetera. Now, when you're crouching behind the sandbags with bullets flying over your head, each one will yield a unique and realistic sound.
The obvious application of such a technology would be gaming, but I'm sure there are many other possibilities.
Is such a technology being actively developed? Does anyone know of any projects that attempt to achieve this?
Thanks,
Kent
I once did some research toward improving OpenAL, and the problem with simulating 3D audio is that so many of the cues that your mind uses — the slightly different attenuation at various angles, the frequency difference between sounds in front of you and those behind you — are quite specific to your own head and are not quite the same for anyone else!
If you want, say, a pair of headphones to really make it sound like a creature is in the leaves ahead and in front of the character in a game, then you actually have to take that player into a studio, measure how their own particular ears and head change the amplitude and phase of the sound at different distances (amplitude and phase are different, and are both quite important to the way your brain processes sound direction), and then teach the game to attenuate and phase-shift the sounds for that particular player.
There do exist "standard heads" that have been mocked up with plastic and used to get generic frequency-response curves for the various directions around the head, but an average or standard will never sound quite right to most players.
Thus the current technology is basically to sell the player five cheap speakers, have them place them around their desk, and then the sounds — while not particularly well reproduced — actually do sound like they're coming from behind or beside the player because, well, they are coming from the speaker behind the player. :-)
But some games do bother to be careful to compute how sound would be muffled and attenuated through walls and doors (which can get difficult to simulate, because the ear receives the same sound at a few milliseconds different delay through various materials and reflective surfaces in the environment, all of which would have to be included if things were to sound realistic). They tend to keep their libraries under wraps, however, so public reference implementations like OpenAL tend to be pretty primitive.
Edit: here is a link to an online data set that I found at the time, that could be used as a starting point for creating a more realistic OpenAL sound field, from MIT:
http://sound.media.mit.edu/resources/KEMAR.html
Enjoy! :-)
Aureal did this back in 1998. I still have one of their cards, although I'd need Windows 98 to run it.
Imagine ray-tracing, but with audio. A game using the Aureal API would provide geometric environment information (e.g. a 3D map) and the audio card would ray-trace sound. It was exactly like hearing real things in the world around you. You could focus your eyes on the sound sources and attend to given sources in a noisy environment.
As I understand it, Creative destroyed Aureal by means of legal expenses in a series of patent infringement claims (which were all rejected).
In the public domain world, OpenAL exists - an audio version of OpenGL. I think development stopped a long time ago. They had a very simple 3D audio approach, no geometry - no better than EAX in software.
EAX 4.0 (and I think there is a later version?) finally - after a decade - I think have incoporated some of the geometric information ray-tracing approach Aureal used (Creative bought up their IP after they folded).
The Source (Half-Life 2) engine on the SoundBlaster X-Fi already does this.
It really is something to hear. You can definitely hear the difference between an echo against concrete vs wood vs glass, etc...
A little known side area is voip. While games are having actively developed software, you are likely to spent time talking to others while you are gaming as well.
Mumble ( http://mumble.sourceforge.net/ ) is software that uses plugins to determine who is ingame with you. It will then position its audio in a 360 degree area around you, so the left is to the left, behind you sounds like as such. This made a creepily realistic addition, and while trying it out it led to funny games of "marko, polo".
Audio took a massive back turn in vista, where hardware was not allowed to be used to accelerate it anymore. This killed EAX as it was in the XP days. Software wrappers are gradually getting built now.
Very interesting field indeed. So interesting, that I'm going to do my master's degree thesis on this subject. In particular, it's use in first person shooters.
My literature research so far has made it clear that this particular field has little theoretical background. Not a lot of research has been done in this field, and most theory is based on movie-audio theory.
As for practical applications, I haven't found any so far. Of course, there are plenty titles and packages which support real-time audio-effect processing and apply them depending on the general surroundings of the auditor. e.g.: auditor enters a hall, so a echo/reverb effect is applied on the sound samples. This is rather crude. An analogy for visuals would be to subtract 20% of the RGB-value of the entire image when someone turns off (or shoots ;) ) one of five lightbulbs in the room. It's a start, but not very realisic at all.
The best work I found was a (2007) PhD thesis by Mark Nicholas Grimshaw, University of Waikato , called The Accoustic Ecology of the First-Person Shooter
This huge pager proposes a theoretical setup for such an engine, as well as formulating a wealth of taxonomies and terms for analysing game-audio. Also he argues that the importance of audio for first person shooters is greatly overlooked, as audio is a powerful force for emergence into the game world.
Just think about it. Imagine playing a game on a monitor with no sound but picture perfect graphics. Next, imagine hearing game realisic (game) sounds all around you, while closing your eyes. The latter will give you a much greater sense of 'being there'.
So why haven't game developers dove into this full-hearted already? I think the answer to that is clear: it's much harder to sell. Improved images is easy to sell: you just give a picture or movie and it's easy to see how much prettier it is. It's even easily quantifyable (e.g. more pixels=better picture). For sound it's not so easy. Realism in sound is much more sub-conscious, and therefor harder to market.
The effects the real world has on sounds are subconsciously percieved. Most people never even notice most of them. Some of these effects cannot even conciously be heard. Still, they all play a part in the percieved realism of the sound. There is an easy experiment you can do yourself which illustrates this. Next time you're walking on the sidewalk, listen carefully to the background sounds of the enviroment: wind blowing through leaves, all the cars on distant roads, etc.. Then, listen to how this sound changes when you walk nearer or further from a wall, or when you walk under an overhanging balcony, or when you pass an open door even. Do it, listen carefully, and you'll notice a big difference in sound. Probably much bigger than you ever remembered.
In a game world, these type of changes aren't reflected. And even though you don't (yet) consciously miss them, your subconsciously do, and this will have a negative effect on your level of emergence.
So, how good does audio have to be in comparison to the image? More practical: which physical effects in the real world contribute the most to the percieved realism. Does this percieved realism depend on the sound and/or the situation? These are the questions I wish to answer with my research. After that, my idea is to design a practical framework for an audio engine which could variably apply some effects to some or all game audio, depending (dynamically) on the amount of available computing power. Yup, I'm setting the bar pretty high :)
I'll be starting per September 2009. If anyone's interested, I'm thinking about setting up a blog to share my progress and findings.
Janne Louw
(BSc Computer Sciences Universiteit Leiden, The Netherlands)