So it would appear that the only values that actually work are 0.0, 0.5, 1.0, and 2.0.
I tried setting it to 0.25, since I want it to play at 1/4 of the natural speed, but it played at 1/2 of the natural speed instead. Can anyone confirm this?
The play rate restriction appears to be due to pitch correction, which is now configurable in iOS 7 or later.
// This prevents the play rate from going below 1/2.
playerItem.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmLowQualityZeroLatency;
That seems to be the default setting:
Low quality and very low computationally intensive pitch algorithm. Suitable for brief fast-forward and rewind effects as well as low quality voice. The rate is snapped to {0.5, 0.666667, 0.8, 1.0, 1.25, 1.5, 2.0}.
The other three algorithm settings let you vary the play rate down to 1/32. For example, AVAudioTimePitchAlgorithmVarispeed turns off pitch correction.
// Enable play rates down to 1/32.
playerItem.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmVarispeed;
Confirmed. I actually had a ticket open with Apple DTS for this issue and a bug filed. The only supported values are 0.50, 0.67, 0.80, 1.0, 1.25, 1.50, and 2.0. All other values are rounded to the nearest supported value.
I found that smaller values are indeed supported, but all tracks in the AVPlayerItem have to support the speed. However, Apple does not provide a property on individual tracks that would indicate which rates are supported; there is only the canPlaySlowForward property on AVPlayerItem.
What I found is that AVPlayerItems with an audio track cannot play at rates slower than 0.5. However, if there is only a video track, the rate can take an arbitrarily small value like 0.01. I will try to write a category that checks on the fly which values are supported and disables unsupported tracks if needed.
br denis
UPDATE
I wrote a function which you can call whenever you want to set the rate for video below 0.5. It enables/disables all audio tracks.
- (void)enableAudioTracks:(BOOL)enable inPlayerItem:(AVPlayerItem *)playerItem
{
    for (AVPlayerItemTrack *track in playerItem.tracks)
    {
        if ([track.assetTrack.mediaType isEqual:AVMediaTypeAudio])
        {
            track.enabled = enable;
        }
    }
}
(Xcode 11.6, iOS 13.6, Swift 5)
It doesn't work with
player.rate = 2.0
It works with
player.playImmediately(atRate: 3.0)
I agree with #otto; his answer solved my problem.
/*
 AVAudioProcessingSettings.h

 @abstract   Values for time pitch algorithm

 @constant   AVAudioTimePitchAlgorithmLowQualityZeroLatency
             Low quality, very inexpensive. Suitable for brief fast-forward/rewind effects, low quality voice.
             Rate snapped to {0.5, 0.666667, 0.8, 1.0, 1.25, 1.5, 2.0}.

 @constant   AVAudioTimePitchAlgorithmTimeDomain
             Modest quality, less expensive. Suitable for voice.
             Variable rate from 1/32 to 32.

 @constant   AVAudioTimePitchAlgorithmSpectral
             Highest quality, most computationally expensive. Suitable for music.
             Variable rate from 1/32 to 32.

 @constant   AVAudioTimePitchAlgorithmVarispeed
             High quality, no pitch correction. Pitch varies with rate.
             Variable rate from 1/32 to 32.
*/
AVF_EXPORT NSString *const AVAudioTimePitchAlgorithmLowQualityZeroLatency NS_AVAILABLE_IOS(7_0);
AVF_EXPORT NSString *const AVAudioTimePitchAlgorithmTimeDomain NS_AVAILABLE(10_9, 7_0);
AVF_EXPORT NSString *const AVAudioTimePitchAlgorithmSpectral NS_AVAILABLE(10_9, 7_0);
AVF_EXPORT NSString *const AVAudioTimePitchAlgorithmVarispeed NS_AVAILABLE(10_9, 7_0);
No, it's working fine for me (Xcode 4.2) on an iPad 2 with iOS 5. I used the AVPlayerDemo from the dev resources and modified the rate property with a slider, and it's very smooth, definitely no jumps, though the behavior below 0.2 is odd. Maybe the rate is not linear near the extreme values, but it is definitely smooth from 0.2 all the way up to 2. I am using videos I captured with the device, which could make a difference.
I need to measure signal frequency while the musicians play music, and it happens to be a bit too fast for FFT (Fast Fourier Transform).
Musicians play music at 90-140 bpm. This means that there are 90-140 groups of notes each minute, up to 8 (more frequently, up to 4) notes in each group (60/140/8 = 0.0536 sec, 60/90/4 = 0.167 sec), that is, notes may change at the rate of 6-19 notes per second.
The music uses a logarithmic scale: the range between, say, 440Hz and 880Hz is divided into 12 notes, only 7 of which are used for melody. (Basically, they use only the white keys on the piano; when they want to shift the starting frequency, they use some of the black keys and don't use some white keys.)
That is, the frequency of each next note is multiplied by 2^(1/12) = 1.05946.
To make things more complicated, the A (La) frequency may vary from 438 to 446 Hz. The string instruments in theory can be tuned, while the wind instruments depend on the air temperature and humidity, so the frequency happens to be re-negotiated by the musicians during the sound check.
Sometimes musicians and vocalists make errors in frequency; they call it being "out of tune". They want a device that would inform them of such out-of-tune errors. They have tuners, but the tuners require playing the same sound for about 1 second before they start showing anything. This works for tuning, but does not work while the music is being played.
Most likely, the tuner is doing an FFT and, due to the formula
df = 1/T
waits for 1 second to get 1 Hz resolution.
For A = 440 Hz, the difference in frequency between two adjacent notes is 440*0.05946 = 26.16 Hz. To get that frequency resolution, one has to use an acquisition time of 0.038 sec; that is, at tempo = 196 bpm the FFT is just able to distinguish two notes, and at 98 bpm it can detect a 50% out-of-tune error, provided that it starts acquisition at the very moment the pitch changes. If we allow the pitch to change in the course of an acquisition period, we get 49 bpm, which is just too slow. In addition, it is very desirable to be more precise about the frequency, say, to detect a 25% out-of-tune error.
Is there a way to measure frequency better than FFT, that is, with better resolution in less acquisition time? (At least 2 times better, ideally, 8 times better.)
In exchange, I do not need to distinguish between notes of different octaves, e.g. both 440 and 880 may be recognized as A. (Probably, more trade-offs are possible, just nothing else comes to my mind right now.)
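To keep the numbers in one place, here is a small C++ sketch reproducing the arithmetic above (the 8-notes-per-beat grouping is the same assumption used earlier):

#include <cmath>
#include <cstdio>

int main() {
    // Semitone spacing near A4: one equal-tempered step above 440 Hz.
    const double a4 = 440.0;
    const double df = a4 * (std::pow(2.0, 1.0 / 12.0) - 1.0);   // ~26.16 Hz

    // A plain FFT needs T = 1/df seconds of data to resolve that spacing.
    const double t = 1.0 / df;                                   // ~0.038 s

    // Fastest note rate that still gives one full window per note,
    // assuming groups of 8 notes per beat as in the question.
    const double maxNotesPerSecond = 1.0 / t;                    // ~26 notes/s
    const double maxBpm = maxNotesPerSecond * 60.0 / 8.0;        // ~196 bpm

    std::printf("df = %.2f Hz, T = %.3f s, max tempo ~ %.0f bpm\n",
                df, t, maxBpm);
    return 0;
}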
UPD
Here's a really good drawing:
UPD2
I have found a PhD thesis and open source software (TARTINI -- the real-time music analysis tool) at:
http://miracle.otago.ac.nz/tartini/
(The pages are also available via the web archive service: http://web.archive.org = http://archive.org = http://waybackmachine.org )
Regarding the FFT: assuming the narrow-band spectral content is sparse, well separated, and in low enough background noise, frequency peaks can be interpolated or phase-vocoded to much higher resolution than the FFT bin spacing (the bin spacing being related to the inverse of the length of the segment of actual time-domain data). Parabolic interpolation is common, but there are more accurate interpolation kernels. Phase-vocoder frequency estimation methods require stationarity across 2 overlapped frames; however, the total span of those 2 frames can be relatively short.
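As an illustration of the parabolic-interpolation idea (not a complete pitch detector), a sketch along these lines refines a magnitude peak to sub-bin resolution; the function name and interface are my own:

#include <cstddef>
#include <vector>

// Parabolic (quadratic) interpolation of an FFT magnitude peak.
// 'mag' holds per-bin magnitudes (log magnitudes work even better),
// 'k' is the index of an interior local maximum, 'fs' the sample rate,
// 'n' the FFT size. Returns a frequency estimate with sub-bin resolution.
double interpolatePeakHz(const std::vector<double>& mag,
                         std::size_t k, double fs, std::size_t n)
{
    const double alpha = mag[k - 1];
    const double beta  = mag[k];
    const double gamma = mag[k + 1];

    // Vertex of the parabola through the three points, in bins (-0.5..0.5).
    const double denom = alpha - 2.0 * beta + gamma;
    const double p = (denom != 0.0) ? 0.5 * (alpha - gamma) / denom : 0.0;

    return (static_cast<double>(k) + p) * fs / static_cast<double>(n);
}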
But the peak spectral frequency reported by an FFT is not the same as a pitch frequency as perceived by a human (as voices and many musical instruments can radiate more acoustic spectral energy in an overtone series than at pitch frequency, sometimes slightly inharmonically.) There are algorithms more suited for pitch estimation than FFTs (alone). A partial list is in this answer: FFT on iPhone to ignore background noise and find lower pitches
Many academic papers on pitch estimation methods for music can be found on the music-ir/MIREX site: http://www.music-ir.org/mirex/wiki/MIREX_HOME
In the video The Sound of Hydrogen (original here), the sound is created by taking data from the NIST Atomic Spectra Database and importing this edited data into Mathematica to modulate a sine wave. I was wondering how he turned the data from the website into the values shown in the video (3:47, top of the page), because it is nothing like what is initially seen on the website.
Short answer: It's different because in the tutorial the sampling rate is 8 kHz while it's probably higher in the original video.
Long answer:
I wish you'd asked this on http://physics.stackexchange.com or http://math.stackexchange.com instead so I could use formulae... Use the bookmarklet
javascript:(function(){function%20a(a){var%20b=a.createElement('script'),c;b.src='https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML.js',b.type='text/javascript',c='MathJax.Hub.Config({tex2jax:{inlineMath:[[\'$\',\'$\']],displayMath:[[\'\\\\[\',\'\\\\]\']],processEscapes:true}});MathJax.Hub.Startup.onload();',window.opera?b.innerHTML=c:b.text=c,a.getElementsByTagName('head')[0].appendChild(b)}function%20b(b){b.MathJax===undefined?a(b.document):b.MathJax.Hub.Queue(new%20b.Array('Typeset',b.MathJax.Hub))}var%20c=document.getElementsByTagName('iframe'),d,e;b(window);for(d=0;d<c.length;d++)e=c[d].contentWindow||c[d].contentDocument,e.document||(e=e.parentNode),b(e)})()
to render the formulae with MathJax:
First of all, note how the Rydberg formula provides the resonance frequencies of hydrogen as $\nu_{nm} = c R \left(\frac1{n^2}-\frac1{m^2}\right)$ where $c$ is the speed of light and $R$ the Rydberg constant. The highest frequency is $\nu_{1\infty}\approx 3000$ THz while for $n,m\to\infty$ there is basically no lower limit, though if you restrict yourself to the Lyman series ($n=1$) and the Balmer series ($n=2$), the lower limit is $\nu_{23}\approx 400$ THz. These are electromagnetic frequencies corresponding to light (not entirely in the visual spectrum (ranging from 430–790 THz), there's some IR and lots of UV in there which you cannot see). "minutephysics" now simply considers these frequencies as sound frequencies that are remapped to the human hearing range (ca 20-20000 Hz).
But as the video stated, not all these frequencies resonate with the same strength, and the data at http://nist.gov/pml/data/asd.cfm also includes the amplitudes. For the frequency $\nu_{nm}$ let's call the intensity $I_{nm}$ (intensity is amplitude squared, I wonder if the video treated that correctly). Then your signal is simply
$f(t) = \sum\limits_{n=1}^N \sum\limits_{m=n+1}^M I_{nm}\sin(\alpha(\nu_{nm})t+\phi_{nm})$
where $\alpha$ denotes the frequency rescaling (probably something linear like $\alpha(\nu) = (20 + (\nu-400\cdot10^{12})\cdot\frac{20000-20}{(3000-400)\cdot 10^{12}})$ Hz) and the optional phase $\phi_{nm}$ is probably equal to zero.
Why does it sound slightly different? Probably the actual video did use a higher sampling rate than the 8 kHz used in the tutorial video.
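For illustration only, here is a C++ sketch of that sum of remapped sines, using equal amplitudes (i.e. without the NIST intensities) and the 8 kHz sampling rate mentioned above; hydrogenTone and its parameters are my own names:

#include <cmath>
#include <cstddef>
#include <vector>

// Hydrogen line frequencies from the Rydberg formula, linearly remapped from
// ~400..3000 THz into the audible range, then summed as sines. All amplitudes
// are 1 here; the video weights them with NIST line intensities, which this
// sketch omits.
std::vector<float> hydrogenTone(double seconds, double sampleRate = 8000.0)
{
    const double pi = 3.14159265358979323846;
    const double rydbergHz = 3.2898e15;                 // c * R
    const double loTHz = 400e12, hiTHz = 3000e12;

    // Lyman (n = 1) and Balmer (n = 2) lines up to m = 20.
    std::vector<double> audible;
    for (int n = 1; n <= 2; ++n) {
        for (int m = n + 1; m <= 20; ++m) {
            const double nu = rydbergHz * (1.0 / (n * n) - 1.0 / (m * m));
            const double f  = 20.0 + (nu - loTHz) * (20000.0 - 20.0) / (hiTHz - loTHz);
            if (f > 20.0 && f < sampleRate / 2.0)       // drop anything that would alias
                audible.push_back(f);
        }
    }

    std::vector<float> out(static_cast<std::size_t>(seconds * sampleRate));
    for (std::size_t i = 0; i < out.size(); ++i) {
        const double t = i / sampleRate;
        double s = 0.0;
        for (double f : audible)
            s += std::sin(2.0 * pi * f * t);
        out[i] = static_cast<float>(s / audible.size()); // crude normalisation
    }
    return out;
}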
Can anyone tell me why the volume becomes lower when I make the pitch higher in OpenAL? The higher the pitch, the lower the volume...
alSourcef(source, AL_PITCH, 1.2f);
alSourcef(source, AL_GAIN, 1.0f);
With this setting, the volume is still very, very low. Is there a way to cheat it to make the gain go above 1? Maybe this has something to do with distance?
FYI, the source is a voice recorded from AVrecorder, so I can't set the source volume any higher.
As far as I know, it is not normal for the amplitude to change as a function of a pitch change. When the pitch is set higher than the original, OpenAL speeds up the sample by the multiplier (AFAIK), using some sort of interpolation when the multiplier is not a whole number.
There might be some rare cases where the amplitude changes, but probably not for longer samples with lots of frequency content (as most natural sounds tend to have).
How loud we perceive that amplitude to be depends on the pitch; see the equal-loudness contour.
Maybe that effect explains your question?
As workaround you could lower the gain for normal pitched sounds and use higher gain for higher pitched sounds.
Or multiply the source data by a multiplier before attaching/passing to a buffer.
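A minimal sketch of the "multiply the source data" idea for 16-bit PCM before it is handed to alBufferData (my own helper; gains above 1.0 will clip if the recording is already near full scale):

#include <algorithm>
#include <cstdint>
#include <vector>

// Boost the PCM data itself before buffering, since AL_GAIN values above 1.0
// are often clamped. Simple per-sample multiply with clipping.
void boostPcm16(std::vector<int16_t>& samples, float gain)
{
    for (int16_t& s : samples) {
        const float boosted = s * gain;
        s = static_cast<int16_t>(std::max(-32768.0f, std::min(32767.0f, boosted)));
    }
}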
I want to calculate room noise level with the computer's microphone. I record noise as an audio file, but how can I calculate the noise dB level?
I don't know how to start!
All the previous answers are correct if you want a technically accurate or scientifically valuable answer. But if you just want a general estimation of comparative loudness, like if you want to check whether the dog is barking or whether a baby is crying and you want to specify the threshold in dB, then it's a relatively simple calculation.
Many wave-file editors have a vertical scale in decibels. There are no calibration or reference measurements involved, just a simple calculation:
dB = 20 * log10(amplitude)
The amplitude in this case is expressed as a number between 0 and 1, where 1 represents the maximum amplitude in the sound file. For example, if you have a 16 bit sound file, the amplitude can go as high as 32767. So you just divide the sample by 32767. (We work with absolute values, positive numbers only.) So if you have a wave that peaks at 14731, then:
amplitude = 14731 / 32767
= 0.44
dB = 20 * log10(0.44)
= -7.13
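For concreteness, here is a hedged C++ sketch of the same calculation over a block of 16-bit samples (relative level only, per the caveats below; the function name is my own):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Peak level of a block of 16-bit samples, relative to full scale,
// using the same 20*log10(amplitude) formula as above.
double peakDb(const std::vector<int16_t>& samples)
{
    int peak = 0;
    for (int16_t s : samples)
        peak = std::max(peak, std::abs(static_cast<int>(s)));

    if (peak == 0)                                          // digital silence
        return -std::numeric_limits<double>::infinity();

    const double amplitude = peak / 32767.0;                // 0..1
    return 20.0 * std::log10(amplitude);                    // 14731 -> about -7 dB
}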
But there are very important things to consider, specifically the answers given by the others.
1) As Jörg W Mittag says, dB is a relative measurement. Since we don't have calibrations and references, this measurement is only relative to itself. And by that I mean that you will be able to see that the sound in the sound file at this point is 3 dB louder than at that point, or that this spike is 5 decibels louder than the background. But you cannot know how loud it is in real life, not without the calibrations that the others are referring to.
2) This was also mentioned by PaulR and user545125: Because you're evaluating according to a recorded sound, you are only measuring the sound at the specific location where the microphone is, biased to the direction the microphone is pointing, and filtered by the frequency response of your hardware. A few feet away, a human listening with human ears will get a totally different sound level and different frequencies.
3) Without calibrated hardware, you cannot say that the sound is 60 dB or 89 dB or whatever. All that this calculation can give you is how the peaks in the sound file compare to other peaks in the same sound file.
If this is all you want, then it's fine, but if you want to do something serious, like determine whether the noise level in a factory is safe for workers, then listen to Paul, user545125 and Jörg.
You do need reference hardware (i.e., a reference mic) to calculate noise level (dB SPL, or sound pressure level). One thing Radio Shack sells is a $50 dB SPL meter. If you're doing scientific calculations, I wouldn't use it. But if the goal is to get a general idea of a weighted measurement (dBA or dBC) of the sound pressure in a given environment, then it might be useful. As a sound engineer, I use mine all the time to see how much sound volume I'm generating while I mix. It's usually accurate to within 2 dB.
That's my answer. The rest is FYI stuff.
Jorg is correct that dB SPL is a relative measurement. All decibel measurements are. But you've implied a reference of 0 dB SPL, or 20 micropascals, scientifically agreed to be the most quiet sound a human ear can detect (though, understandably, what a person can actually hear is very difficult to determine). This, according to Wikipedia, is about the sound of a flying mosquito from about 10 feet away (http://en.wikipedia.org/wiki/Decibel).
By assuming you don't understand decibels, I think Jorg is just trying to out-geek you. He clearly didn't give you a practical answer. :-)
Unweighted measurements (dB, instead of dBA or dBC) are rarely used, because most sound pressure is not detected by the human ear. In a given office environment, there is usually 80-100 dB SPL (sound pressure level). To give you an idea of exactly how much is not heard, in the U.S., occupational regulations limit noise exposure to 80 dBA for a given 8-hour work shift (80 dBA is about the background noise level of your average downtown street - difficult, but not impossible to talk over). 85 dBA is oppressive, and at 90, most people are trying to get away. So the difference between 80 dB and 80 dBA is very significant -- 80 dBA is difficult to talk over, and 80 dB is quite peaceful. :-)
So what is 'A' weighting? 'A' weighting compensates for the fact that we don't perceive lower frequency sounds as well as high frequency sounds (we hear 20 Hz to 20,000 Hz). There's a lot of low-end rumble that our ears/brains pretty much ignore. In addition, we're more sensitive to a certain midrange (1000 Hz to 4000 Hz). Most agree that this frequency range contains the sounds of consonants of speech (vowels happen at a much lower frequency). Imagine talking with just vowels. You can't understand anything. Thus, the ability of a human to be able to communicate (conventionally) rests in the 1kHz-5kHz bump in hearing sensitivity. Interestingly, this is why most telephone systems only transmit 300 Hz to 3000 Hz. It was determined that this was the minimal response needed to understand the voice on the other end.
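As a reference point for that 'A' weighting, the standard IEC 61672 curve can be evaluated directly; a minimal sketch (relative response in dB, about 0 dB at 1 kHz):

#include <cmath>

// IEC 61672 A-weighting curve: relative response in dB at frequency f (Hz).
// Strongly attenuates low frequencies, peaks slightly in the 1-4 kHz range.
double aWeightingDb(double f)
{
    const double f2 = f * f;
    const double ra = (12194.0 * 12194.0 * f2 * f2) /
                      ((f2 + 20.6 * 20.6) *
                       std::sqrt((f2 + 107.7 * 107.7) * (f2 + 737.9 * 737.9)) *
                       (f2 + 12194.0 * 12194.0));
    return 20.0 * std::log10(ra) + 2.00;   // +2.00 dB normalises to 0 dB at 1 kHz
}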
But I think that's more than you wanted to know. Hope it helps. :-)
You can't easily measure absolute dB SPL, since your microphone and analogue hardware are not calibrated. You may be able to do an approximate calibration for a particular hardware set up but you would need to repeat this for every different microphone and hardware set up that you plan to support.
If you do have some kind of SPL reference source that you can use, then it gets easier:
use your reference source to generate a tone at a known dB SPL - measure this
measure the ambient noise
calculate noise level = 20 * log10 (V_noise / V_ref) + dB_ref
Of course this assumes that the frequency response of your microphone and audio hardware is reasonably flat and that you just want a flat (unweighted) noise figure. If you want a weighted (e.g. A-weight) noise figure then you'll have to do rather more processing.
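A minimal sketch of that calculation (my own helper name; it assumes V_noise and V_ref are RMS levels measured the same way, so their units cancel):

#include <cmath>

// Noise level from the two measurements described above:
// 'vRef' is the RMS level the recording chain reports for the reference tone,
// 'vNoise' the RMS level of the ambient noise, and 'dbRef' the known SPL of
// the reference source.
double noiseSplDb(double vNoise, double vRef, double dbRef)
{
    return 20.0 * std::log10(vNoise / vRef) + dbRef;
}

// Example: reference tone at 94 dB SPL measures an RMS of 0.30 full scale,
// ambient noise measures 0.006 full scale -> about 60 dB SPL.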
According to Merchant et al. (section 3.2 in the appendix: "Measuring acoustic habitats", Methods in Ecology and Evolution, 2015), you can actually calculate absolute, calibrated SPL values using manufacturer specifications by subtracting a correction term S from your relative (scaled-to-maximum) SPL values:
S = M + G + 20*log10(1/Vadc) + 20*log10(2^Nbit-1)
where M is the sensitivity of the transducer (microphone) re 1 V/Pa, G is the gain applied by the user, Vadc is the zero-to-peak voltage (given by multiplying the RMS ADC voltage by a conversion factor of sqrt(2)), and Nbit is the bit sampling depth.
The last term is necessary if your system scales the amplitude by its maximum.
The correction will be more accurate using end-to-end calibration with sound calibrators.
Note that the formula above is dependent on frequency, but you could apply it over a wider frequency range if your microphone has a flat frequency response.
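A hedged sketch of the correction term (parameter names follow the formula above; note that "2^Nbit-1" is ambiguous as written, so check the paper for whether it means (2^Nbit) - 1 or 2^(Nbit-1) before relying on the result):

#include <cmath>

// Correction term S from Merchant et al. (2015), to be subtracted from
// relative (full-scale-referenced) SPL values.
double correctionTermDb(double micSensitivityDb,   // M: dB re 1 V/Pa
                        double gainDb,             // G: user-applied gain, dB
                        double vAdcZeroToPeak,     // Vadc: zero-to-peak ADC voltage
                        int bitDepth)              // Nbit
{
    // Implemented literally as written above: 2^Nbit - 1.
    return micSensitivityDb + gainDb
         + 20.0 * std::log10(1.0 / vAdcZeroToPeak)
         + 20.0 * std::log10(std::pow(2.0, bitDepth) - 1.0);
}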
You can't. dB is a relative unit; in other words, it is a unit for comparing two measurements against each other. You can only say that measurement A is x dB louder than measurement B, but in your case you only have one measurement. Therefore, it simply isn't possible to calculate the dB level.
The short answer is: you cannot do sound level measurements with your laptop, cellphone, etc., for all the reasons outlined previously, plus the fact that your cellphone, laptop, etc. use compression algorithms to ensure that everything recorded is within the hardware's capability. So if, for example, you measure a sound and then run it through signal processing software such as Head Artemis or LMS Test.Lab, the indicated sound pressure level will always be in the neighborhood of 80 dB(A), regardless of the true level. I can say this from having used cellphone or laptop audio to get an idea of a noise frequency spectrum while taking level measurements with a calibrated sound level meter. Interestingly, Radio Shack used to sell a microphone intended for speech input while videoconferencing that had a very flat frequency response over a broad range and only cost about $15.
I use a sound level calibrator. It produces 94 dB or 114 dB at 1 kHz, which is a frequency where the weighting filters share the same level.
With the calibrator at 114 dB, I adjust the mic gain to reach almost full-scale input, simply by watching a sound-card-based virtual oscilloscope. Now I know Vref at 114 dB. I developed a simple software-based SPL meter that can be provided if needed; you can use REW too.
You have to know that PC hardware hardly reaches 60 dB of dynamic range, so when calibrated at 114 dB it won't read less than 54 dB, which is pretty high if you consider that sleeping is comfortable below 35 dB(A). In that case you can calibrate at 94 dB and then measure down to 34 dB, but again you will hit PC and mic self-noise, which may prevent you from reaching such low levels. Anyway, once calibrated, measurements around 114 dB and 94 dB should read fine.
Note: the lab-standard pistonphone calibrator operates at 250 Hz.
Well, I used RobertT's method, but it always gave me an overflow exception, so I used int dB = -36 - (value * -1) instead. The exception is gone, but I don't know whether it actually reports dB values. If you know, based on the code below, please comment on whether it's a dB value or not.
VB.NET:
Dim dB As Integer = -36 - (9 * -1)
C#:
int dB = -36 - (9 * -1);
I am writing a software synthesizer and need to generate bandlimited, alias-free waveforms in real time at a 44.1 kHz sample rate. A sawtooth waveform would do for now, since I can generate a pulse wave by mixing two sawtooths together, one inverted and phase-shifted.
So far I've tried the following approaches:
Precomputing one-cycle perfectly bandlimited waveform samples at different bandlimit frequencies at startup, then playing back the two closest ones mixed together. Works okay I guess, but does not feel very elegant. A lot of samples are needed or the "gaps" between them will be heard. Interpolating and mixing is also quite CPU intensive.
Integrating a train of DC compensated sinc pulses to get a sawtooth wave. Sounds great except that the wave drifts away from zero if you don't get the DC compensation exactly right (which I found to be really tricky). The DC problem can be reduced by adding a bit of leakage to the integrator, but then you lose the low frequencies.
So, my question is: What is the usual way this is done? Any suggested solution must be efficient in terms of CPU, since it must be done in real time, for many voices at once.
One fast way to generate band-limited waveforms is by using band-limited steps (BLEPs). You generate the band-limited step itself:
and store that in a wavetable, then replace each transition with a band-limited step, to create waveforms that look like this:
See the walk-through at Band-Limited Sound Synthesis.
Since this BLEP is non-causal (meaning it extends into the future), for generating real-time waveforms, it's better to use the minimum-phase band-limited step, called a MinBLEP, which has the same frequency spectrum, but only extends into the past:
MinBLEPs take the idea further and take a windowed sinc, perform a minimum-phase reconstruction, and then integrate the result and store it in a table. Now to make an oscillator you just insert a MinBLEP at each discontinuity in the waveform. So for a square wave you insert a MinBLEP where the waveform inverts; for a saw wave you insert a MinBLEP where the value inverts, but you generate the ramp as normal.
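This isn't the table-based MinBLEP described above, but a sketch of its simplest polynomial cousin (often called polyBLEP), which applies a two-sample correction around each discontinuity of a naive sawtooth; the function names are my own:

// phase and phaseInc are in cycles (0..1), as in the getSaw() code further down.
static double polyBlep(double t, double dt)
{
    if (t < dt) {                 // just after the discontinuity
        t /= dt;
        return t + t - t * t - 1.0;
    }
    if (t > 1.0 - dt) {           // just before the discontinuity
        t = (t - 1.0) / dt;
        return t * t + t + t + 1.0;
    }
    return 0.0;
}

double polyBlepSaw(double phase, double phaseInc)
{
    const double naive = 2.0 * phase - 1.0;          // naive sawtooth, -1..1
    return naive - polyBlep(phase, phaseInc);        // smooth the downward step
}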
There are a lot of ways to approach the bandlimited waveform generation. You will end up trading computational cost against quality as usual.
I suggest that you take a look at this site here:
http://www.musicdsp.org/
Check out the archive! It's full of good material. I just did a search on the keyword "bandlimited". The material that pops up should keep you busy for at least a week.
Btw, I don't know if that's what you're looking for, but I did alias-reduced (i.e. not truly band-limited) waveform generation a couple of years ago. I just calculated the integral between the last and the current sample position. For traditional synth waveforms you can do that rather easily if you split your integration interval at the singularities (e.g. when the sawtooth gets its reset). The CPU load was low and the quality acceptable for my needs.
I had the same drift problems, but applying a high-pass filter with a very low cutoff frequency to the integral got rid of that effect. Real analog synths don't go down into the sub-hertz region anyway, so you won't miss much.
This is what I came up with, inspired by Nils' ideas. Pasting it here in case it is useful for someone else. I simply box filter a sawtooth wave analytically using the change in phase from the last sample as a kernel size (or cutoff). It works fairly well, there is some audible aliasing at the very highest notes, but for normal usage it sounds great.
To reduce aliasing even more the kernel size can be increased a bit, making it 2*phaseChange for example sounds good as well, though you lose a bit of the highest frequencies.
Also, here is another good DSP resource I found when browsing SP for similar topics: The Synthesis ToolKit in C++ (STK). It's a class library that has lots of useful DSP tools. It even has ready-to-use bandlimited waveform generators. The method they use is to integrate sinc, as I described in my first post (though I guess they do it better than I did...).
#include <cmath>

float getBoxFilteredSaw(float phase, float kernelSize);

float getSaw(float phaseChange)
{
    static float phase = 0.0f;
    phase = std::fmod(phase + phaseChange, 1.0f);
    return getBoxFilteredSaw(phase, phaseChange);
}

float getPulse(float phaseChange, float pulseWidth)
{
    static float phase = 0.0f;
    phase = std::fmod(phase + phaseChange, 1.0f);
    return getBoxFilteredSaw(phase, phaseChange) - getBoxFilteredSaw(std::fmod(phase + pulseWidth, 1.0f), phaseChange);
}

float getBoxFilteredSaw(float phase, float kernelSize)
{
    float a, b;

    // Check if kernel is longer than one cycle
    if (kernelSize >= 1.0f) {
        return 0.0f;
    }

    // Remap phase and kernelSize from [0.0, 1.0] to [-1.0, 1.0]
    kernelSize *= 2.0f;
    phase = phase * 2.0f - 1.0f;

    if (phase + kernelSize > 1.0f)
    {
        // Kernel wraps around edge of [-1.0, 1.0]
        a = phase;
        b = phase + kernelSize - 2.0f;
    }
    else
    {
        // Kernel fits nicely in [-1.0, 1.0]
        a = phase;
        b = phase + kernelSize;
    }

    // Integrate and divide with kernelSize
    return (b * b - a * a) / (2.0f * kernelSize);
}
The DC offset from a BLIT can be reduced with a simple high-pass filter, much like a real analogue circuit, where a DC blocking cap is used.
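A minimal sketch of such a DC blocker (one pole, one zero; the coefficient below is an assumption that puts the cutoff at a few hertz at 44.1 kHz):

// y[n] = x[n] - x[n-1] + R * y[n-1], with R just below 1.
struct DcBlocker {
    float r  = 0.9995f;   // cutoff of roughly a few Hz at 44.1 kHz
    float x1 = 0.0f;
    float y1 = 0.0f;

    float process(float x) {
        const float y = x - x1 + r * y1;
        x1 = x;
        y1 = y;
        return y;
    }
};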