How to play a wav file on high frequency? - audio

Do you have any idea on converting my wav audio data to high frequency playback.
I am creating a module that plays a melody wav file in 16-20khz frequency.
any idea?
-tong

There are several different ways of doing this and it will depend whether you have any temporal information that you want to preserve, or whether you can just effectively increase the sample rate so that the frequency is scaled up (and the duration scaled down).
You didn't mention anything about OS, programming language, sound API, etc, so it's impossible to give specific solutions, but if your sound API allows you to change the playback rate than you may just be able to scale this up by a suitable factor. If not then you will need to resample the sound to achieve the same effect. There are plenty of third party libraries out there for resampling (upsampling/downsampling).
If you need to preserve temporal information then things get a little trickier and you'll need to look at pitch shifting algorithms, e.g. phase vocoder.

Related

Fieldwork audio recording for acoustic analysis: stereo or mono? appropriate gain?

I work in the field of phonetics and often need to record human speech for acoustic analysis. I have two questions that I couldn't find answers:
If I record in stereo channels, I need to convert to mono later on to proceed with annotation. So in principle mono signal is good enough. Are there reasons that stereo sound should be used (e.g. the signal would be better?)
Also, we were warned that the gain level should be kept small so that the recording level shouldn't exceed the maximum, which leads to signal cuttoff. However, I was also criticised when the recording file shows too low an amplitude (it's still very clear though), for that leads to a low SNR. How do people choose an appropriate gain level?
As the act of recording is involved, the Sound Design forum might be your best bet.
I can't think anything that might be gained, in terms of frequency analysis, by having a stereo signal. Stereo is more about locating the source of a sound in 3D space. Does the source of sound emit different frequency profiles in different directions? Does the environment filter the sound differently over the course of the two paths to the stereo inputs? If the the answer is "not significantly" then mono should be fine.
Choosing an appropriate gain level is mostly a matter of knowing your equipment. Ideally, your recording setup will provide feedback (usually a visual meter of some sort) that shows the signal strength. The "best" would be (theoretically) the loudest level that does not distort. So you have to know at what level distortion happens on all the elements of the recording chain.
There can be some fudging on this, given that the loudest peak on a recorded segment may be an outlier.

How to decrease pitch of audio file in nodejs server side?

I have a .MP3 file stored on my server, and I'd like to modify it to be a bit lower in pitch. I know this can be achieved by increasing the length of the audio, however, I don't know of any libraries in node that can do this.
I've tried using the node web audio api, and soundbank-pitch-shift, but the former doesn't seem to have the capabilities of pitch shifting (AFAIK), and the latter seems designed toward client
I need the solution within the realm of node ONLY- that means no external programs, etc., and it needs to be automated as well, so I can't manually pitch shift.
An ideal solution would be a function that takes a file/filepath as an input, and then creates (or overwrites) another MP3 file but with the pitch shifted by x amount, but really, any solution that produces something with a lower pitch than the original, works.
I'm totally lost here. Please help.
An audio file is basically a list of numbers. Those numbers are read one at a time at a particular speed called the 'sample rate'. The sample rate is otherwise defined as the number of audio samples read every second e.g. if an audio files sample rate is 44100, then there are 44100 samples (or numbers) read every second.
If you are with me so far, the simplest way to lower the pitch of an audio file is to play the file back at a lower sample rate (which is normally fixed in place). In most cases you wont be able to do this, so you need to achieve the same effect by resampling the file i.e adding new samples to the file in between the old samples to make it literally longer. For this you would need to understand interpolation.
The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed.
You may also have problems doing this using MP3 files. In this case you may have to uncompress the data in the MP3 file before you can operate on it in such a way that changes the pitch of the file. WAV files are more ideal in audio processing. In any case, you essentially need to turn the file into a list of floating point numbers, and change those numbers to be effectively read back at a slower rate.
Other methods of pitch shifting would probably need to involve the use of ffts, and would be a more complicated affair to say the least.
I am not familiar with nodejs I'm afraid.
I managed to get it working with help from Ollie M's answer and node-lame.
I hadn't known previously that sample rate could affect the speed, but thanks to Ollie, suddenly this problem became a lot more simple.
Using node-lame, all I did was take one of the examples (mp32wav.js), and make it so that I change the parameter sampleRate of the format object, so that it is lower than the base sample rate, which in my application was always a static 24,000. I could also make it dynamic since node-lame can grab the parameters of the input file in the format object.
Ollie, however perfectly describes the drawback with this method
The drawback to this technique in either case is that the sound will
also play back at a slower speed, as well as at a lower pitch. If it
is a problem that the sound has slowed down as well as lowered in
pitch as a result of your processing, then you will also have to use a
timestretching algorithm to fix the playback speed.
I don't have a particular need to implement a time stretching algorithm at the moment (thankfully, because that's a whole other can of worms), since I have the ability to change the initial speed of the file, but others may in the future.
See https://www.npmjs.com/package/audio-decode, https://github.com/audiojs/audio-buffer, and related linked at bottom of audio-buffer readme.

How can I look for certain sounds in a live sound input?

I've combed StackOverflow and the web for many questions on whistle detection, etc, and many people did explain as much as they could as to how they can go about detecting their stuff.
capturing sound for analysis and visualizing frequences in android
analyzing whistle sound for pitch note
But what I don't get is how does FFT help you to detect certain sounds in a given sample audio data?
Here's what I understand so far from some stuff I found here and there.
-The sine wave is more or less the building block of ALL signals, musical or not
-Three parameters - FREQUENCY, AMPLITUDE, and INITIAL PHASE, characterize every steady sine wave completely.
-They make each and any kind of wave unique.
-Fourier transform can be used to inspect what kinds of sine waves there are in a signal
SOURCE -- [Audio signal processing basics][3]
Audio data that the computer generates as received from the mic or other input source, for live processing, is an array of amplitudes processed (or stored or taken) at a particular sample rate.
So how does one go from that to detecting whistles and claps?
And complex things such as say, a short period of whistling to a particular song?
My theory of detecting is that we test our whistles in a spectogram, and record the particular frequency and amplitude characteristics. And then if those particular characteristics are repeated again in the input, we've detected a whistle.
Am I right or wrong?
This sound processing stuff is a little complicated.
Forgot to mention this - I'm using Python. Java is also okay, since most of the examplar code I found was for Android which is in Java. And I can work in Java too. Any mention of any libraries or APIs would be helpful too.

Can FFT be used to find drum solos/breaks in audio files?

Is it possible with FFT to find a drum solo, or a drum break, in an audio file? Is this something FFT is able to do and are there any resources online that could aid me with learning?
In general, a FFT is not a good choice for detecting the onset of percussion sounds:
An FFT is always calculated over a window of samples (in effect a period of time) and yields the magnitude of signal within the bin and its phase offset. You can therefore determine that there is signal at that particular bin, but not its onset time. The best time resolution available is the window period. Of course, you can make the period shorter at the expense of frequency resolution.
Percussion sounds tend to look like noise and spread across the spectrum. This would be OK if you only had percussions sounds, but is not great in real-life polyphonic content.
However, you might be able to find some inference from the different characteristics of the spectra of a drum solo vs instrumental sections of a track.
The problem of finding the time at which percussion sounds start in music is described in academic journals as onset dectection and is one of the many techniques used for feature extraction; the wider field is known as Music Information Retrieval. Your problem sounds like one of identifying sections in audio files and this might be described as partitioning
A good place to start is Sonic Visualiser which is a tool written specifically for MIR applications. Plug-ins exist for various types of feature extraction. From these you will be able to easily find the large body of academic work in this area. There is an added bonus that the existing plug-ins are all open source too.
I'd look here, there was a bit of discussion with great pointers on the Gamedev SE: https://gamedev.stackexchange.com/questions/9761/beat-detection-and-fft :-)

Frequency differences from MP3 to mic

I'm trying to compare sound clips based on microphone recording. Simply put I play an MP3 file while recording from the speakers, then attempt to match the two files. I have the algorithms in place that works, but I'm seeing a slight difference I'd like to sort out to get better accuracy.
The microphone seem to favor some frequencies (add amplitude), and be slightly off on others (peaks are wider on the mic).
I'm wondering what the cause of this difference is, and how to compensate for it.
Background:
Because of speed issues in how I'm doing comparison I select certain frequencies with certain characteristics. The problem is that a high percentage of these (depending on how many I choose) don't match between MP3 and mic.
It's called the response characteristic of the microphone. Unfortunately, you can't easily get around it without buying a different, presumably more expensive, microphone.
If you can measure the actual microphone frequency response by some method (which generally requires having some etalon acoustic system and an anechoic chamber), you can compensate for it by applying an equaliser tuned to exactly inverse characteristic, like discussed here. But in practice, as Kilian says, it's much simpler to get a more precise microphone. I'd recommend a condenser or an electrostatic one.

Resources