Resample to 12bit PCM sound? - audio

I'm having trouble finding a program which will resample 16-bit 44.1 kHz PCM into 12-bit 25 kHz. There's not much info on this to be found...
Anyone have a clue? I tried Audacity, ffmpeg, ... but to no avail. I was thinking about reducing the amplitude of a 16-bit sample by 75% in a normal editor and throwing away the highest nibble, but something tells me it might not be that simple...

You can do that using Audacity, but it's not so intuitive. There is no direct resampling option, so you need to do it in two steps.
In the Audio Track menu you can use Set Rate to change the sampling rate to 25000 Hz.
That only changes the playback speed, so you also need to resample. That is done with Change Speed in the Effect menu. The speed change factor is 44100/25000 = 1.764, i.e. 76.4% faster.
Now you can export the track to the 12-bit format that you want.

I found another way in the meantime: sox (the command-line version) - a very good and complete converter: http://sox.sourceforge.net/ I am converting the sample rate with sox and processing the raw file through PHP.
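The sox command and the raw-file step aren't shown above, so here is a minimal sketch of the same idea, with the bit reduction done in Python instead of PHP. The file names, the exact sox options, and the choice to keep the 12-bit values in 16-bit words are all assumptions:

# Assumed sox invocation: resample to 25 kHz and write raw signed 16-bit PCM
#   sox input.wav -e signed-integer -b 16 resampled.raw rate 25000
import numpy as np

samples = np.fromfile("resampled.raw", dtype=np.int16)       # raw 16-bit samples
samples_12bit = samples // 16                                 # drop the lowest 4 bits (range -2048..2047)
samples_12bit.astype(np.int16).tofile("output_12bit.raw")     # still one 16-bit word per sample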

Related

How do I cap / clip audio volume via code?

I am dealing with an audio file that shows sudden spikes.
When I try to normalize the audio file, the audio program sees this spike, notices that it is at 0 dB and won't normalize the audio any more.
To solve this issue, I have applied a limiter that limits to -3 dB. I have used both Steinberg WaveLab and Audacity's Hard Limiter. Instead of removing just the spike, they both diminish the entire audio volume, and neither eliminates the spike.
Edit: I have found out that "Hard Clip" in Audacity does what I need, but now I still want to finish my own approach.
So I was thinking that they perhaps do not work correctly.
Then I tried to write my own limiter in VB6 in order to have full control over what's happening.
To do that, I'm loading the audio data of a wav file like this:
(I have stripped the process down very much)
Dim nSamples() As Integer
ReDim nSamples(0 To (lCountSamples - 1))
Get #iFile, , nSamples
I have the audio data in "nSamples" now.
Dim i&
For i = 1 To UBound(nSamples)
    Dim dblAmplitude As Double
    dblAmplitude = nSamples(i) / 32767
    Dim db As Double
    db = 20 * Log10(dblAmplitude)
    If db > -0.3 Then
        nSamples(i) = 0 'how would I lower the volume here instead of just setting it to 0???
    End If
Next
I'm not sure how I could calculate the dB from the sample data in order to clip it and how to clip it to -3dB.
For now, I have simply tried it by setting the clipping value to "0".
However, something strange happens here: the lower part of the audio is gone.
I expected that setting a value to 0 would mute the audio, but in my case, the lower half of the waveform has disappeared.
What is wrong about my approach?
Thank you!
Calculating dB from audio samples isn't that simple: dB refers to the magnitude of a sample, so you need Abs() before taking the logarithm (your loop feeds negative samples straight into Log10).
It sounds like what you want to do is find an appropriate threshold and then clip the audio with something like:
If Abs(nSamples(i)) > threshold Then
    nSamples(i) = threshold * Sgn(nSamples(i))
End If
You could set the threshold to some fixed value if it's the same for all of your audio clips. Otherwise, it might be more accurate to sort the samples and search for a large gap between points near the top and bottom to find your threshold.
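If the target really is a fixed -3 dB ceiling, the threshold follows directly from the definition dB = 20 * log10(amplitude / full scale): the sample value for -3 dB is 32767 * 10^(-3/20), roughly 23197. A minimal sketch of the same clipping idea, in Python rather than VB6, assuming 16-bit signed samples in a list:

FULL_SCALE = 32767                                        # peak value of signed 16-bit PCM
target_db = -3.0
threshold = int(FULL_SCALE * 10 ** (target_db / 20.0))    # about 23197

def hard_clip(samples, threshold):
    # clamp every sample to +/- threshold, keeping its sign (same idea as the VB6 snippet above)
    return [max(-threshold, min(threshold, s)) for s in samples]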

Is there a way to optimize silence periods in Opus better than DTX does?

I'm doing a little bit of research about the DTX option in Opus:
Discontinuous Transmission (DTX) reduces the bitrate during silence
or background noise. When DTX is enabled, only one frame is encoded
every 400 milliseconds.
I wonder if there's an easy way to make Opus encode exactly one frame for the whole duration of a silence period rather than encoding useless silence frames every 400 milliseconds?
I want to produce "absolute" silence during silent or non-speech activity and minimize the overhead of headers, so basically a quiet recording will produce an almost empty file.
If there are other codecs that can accomplish that, I'd be happy to hear about them.
I did not test this, but I am very confident it can be done. However, you would be breaking the standard, which states:
2.1.9. Discontinuous Transmission (DTX)
Discontinuous Transmission (DTX) reduces the bitrate during silence
or background noise. When DTX is enabled, only one frame is encoded
every 400 milliseconds.
Download the source code, open the file ./silk/define.h, and change line 57 [source for Linux] from
#define MAX_CONSECUTIVE_DTX 20 /* eq 400 ms */
to something like
#define MAX_CONSECUTIVE_DTX 40 /* eq 800 ms */
or whatever you feel is adequate (the counter is in 20 ms frames, so 20 corresponds to 400 ms and 40 to 800 ms). Without changing the source code I don't think it's possible, because, as stated here:
Even though Opus is now standardized by the IETF, this Opus
implementation will keep improving in the future. Of course, all
future versions will still be fully compliant with the Opus IETF
specification.

What is the best practice of saving an audio file with louder volume?

I would like to save a quiet audio file with more volume. Can you suggest a program or method to do it?
The device I would like to use the audio file on is not very intelligent, and its maximum volume setting is too quiet for me. Other (louder) audio files play fine on the device. So I thought I would open the file on a PC, modify it to be louder, and save it; then the device will play this fine too. I am aware of distortion and such; that is not the point now.
I have used VLC player, and I can find a setting where the audio file is loud enough with little distortion, but I cannot find an option to save the file with these settings. It is an MP3 file.
Thanks for the help,
Sziro
Increasing the volume of an audio file so that the peaks are at (or near) the maximum level is called normalisation. You can use an audio editor like Audacity, or there are dedicated solutions. Normally, if saving to MP3, you normalise to slightly less than full volume (say -0.5 dB).
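As a rough illustration of what normalisation does, here is a sketch assuming 16-bit PCM samples in a NumPy array (real tools like Audacity handle the file format and decoding for you):

import numpy as np

def normalise(samples, target_db=-0.5):
    # scale so the loudest peak lands target_db below full scale (0 dBFS)
    peak = np.max(np.abs(samples.astype(np.int32)))   # cast first so abs(-32768) doesn't overflow
    if peak == 0:
        return samples
    gain = (10 ** (target_db / 20.0)) * 32767 / peak
    return np.clip(samples * gain, -32768, 32767).astype(np.int16)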
You might also want to consider compressing the audio. This is useful if the peaks in your audio are much louder than the quiet passages, and the quiet passages are hard to hear as a result. Again, you can do this in Audacity.
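And a very crude static compressor, just to show the idea (no attack/release smoothing; again this assumes 16-bit samples in a NumPy array):

import numpy as np

def compress(samples, threshold_db=-20.0, ratio=4.0):
    # anything above the threshold is scaled down by the ratio; quieter parts are left untouched
    threshold = 32767 * 10 ** (threshold_db / 20.0)
    out = samples.astype(np.float64)
    over = np.abs(out) > threshold
    out[over] = np.sign(out[over]) * (threshold + (np.abs(out[over]) - threshold) / ratio)
    return out.astype(np.int16)

After compressing you would normally normalise again, since compression lowers the loud parts rather than raising the quiet ones.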

Does ADPCM have a sample rate?

ADPCM is adaptive, so it has a variable sample rate. But does it have some average rate or something? Does it have frames of fixed time duration?
You misunderstood it here :-). "Adaptive" doesn't mean that the sample rate is adjusted according to the signal it contains.
"Adaptive" means that the limited available delta steps (4 bits = only 16 possibilities to encode a sample) are adapted to the signal by prediction. The codec attempts to predict, from a given sample, which value the next sample may have, and adapts the delta steps to that.
If the signal changes little from sample to sample, the steps are chosen closer together than if the signal changes a lot. It is very unlikely that the signal goes from strongly oscillating to quiet from one sample to the next.
You notice that behavior if you encode a 100 Hz square wave with such an algorithm and re-open it in an audio editor that makes the waveform visible. When the waveform changes from one polarity to the other, the signal "speeds up" (the steps get further and further apart) until it reaches the other end, and then it slows down again (the steps get closer and closer together).
It still has a fixed sample rate: the one you give it. In RIFF WAVE, the sample rate is stored in the header.
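To make the adaptive-step idea concrete, here is a deliberately simplified toy decoder. It is not the real IMA or Microsoft ADPCM algorithm (those use fixed lookup tables), just the principle that large deltas grow the step and small deltas shrink it:

def toy_adpcm_decode(nibbles):
    # each input value is a 4-bit code: bit 3 = sign, bits 0-2 = delta magnitude
    predicted, step = 0, 16
    out = []
    for code in nibbles:
        magnitude = code & 0x7
        sign = -1 if code & 0x8 else 1
        predicted += sign * magnitude * step
        predicted = max(-32768, min(32767, predicted))                # clamp to 16-bit range
        step = max(1, int(step * (2.0 if magnitude >= 6 else 0.8)))   # adapt the step size
        out.append(predicted)
    return out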

How do you get an accurate duration of a given h264 stream (Silverlight)?

Background:
I have encoded a raw h264 file using ffmpeg. I'm trying to create my own container, similar to how Smooth Streaming works with fragmented MP4 containers. I'm not happy with the security of Smooth Streaming, though, since anyone with appropriate authentication can completely rip a file from IIS.
Problem
Anyway, I have my raw h264 stream playback "kinda" working using MediaStreamSource within Silverlight with SSL enabled, but I can't get my timestamps right for the chunks that I'm sending from the server side to the MediaStreamSource in the Silverlight client. There is a delay between h264 data chunks, which I have parsed by SPS NALs. I saw this question for getting duration. I'm wondering if there is an easy way to count frames in an h264 stream and get a duration so that I can relay an accurate timestamp to the MediaSampleSource. If someone can point me in the direction of an open-source frame counter, give me some pseudo code for parsing out frames (maybe some hex parsing tips), or share experience with this exact issue, that would be great. Any help would be greatly appreciated. Thanks in advance.
I dug through the ISO 14496-10 Documentation and found some hex strings for finding frames in a raw h264 stream:
0x00000141,
0x00000101,
0x00000165
If you go through your stream and count these hex strings, and you're encoding with ffmpeg and libx264, this should give you a pretty solid frame count. (Please someone correct me if I'm wrong.) So if you have the total duration of the h264 video and you have the FPS, which you should be able to easily get from ffmpeg, then you can use the number of frames counted in any given chunk of data that is passed to the MediaStreamSource to get a very accurate timestamp for your MediaSampleSource. Hope this helps someone, because it was really frustrating me a couple of nights ago when my playback was all choppy.
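A quick sketch of that counting approach in Python; the file name and FPS value are placeholders, and as noted above it only holds for streams where each picture is a single slice:

# the three byte patterns listed above: a 3-byte start code followed by the NAL header byte
PATTERNS = [b"\x00\x00\x01\x41", b"\x00\x00\x01\x01", b"\x00\x00\x01\x65"]

with open("stream.h264", "rb") as f:
    data = f.read()

frame_count = sum(data.count(p) for p in PATTERNS)
fps = 25.0                                   # take the real value from ffmpeg/ffprobe
duration_seconds = frame_count / fps
print(frame_count, duration_seconds)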
Edit
Having tested my playback feature in DirectShow, I have noticed that this is not perfect and only works for simplistically encoded h264 streams from ffmpeg. h264 has variable frame rates and bitrates. Although the video runs pretty smoothly, a discerning eye can see that in more complex sequences the timing is a bit awkward. I think for a crude video streaming player this is a fine method, especially if keyframes are frequent. I thought it would be fair to add this before I clicked "answered".
This is actually a bit of a rabbit hole. Start with ISO 14496 part 10 and go to section 7.3 for syntax.
The first approximation is to use the field rate in the vui_parameters ( num_units_in_tick/time_scale ) and the number of slice_header()s.
This breaks down if you're dealing with HD content and your encoder is using multiple slice_header()s per picture (then you have to check first_mb_in_slice ==0).
You'll have to pay attention to frame_mbs_only_flag and field_pic_flag.
The other hairball is Table D-1 which interprets the pic_struct field of the pic_timing SEI message. This covers things like field repetition (TBT or BTB), frame doubling, and frame tripling.
If you have a transport stream, you can make an end run around this by checking the DTS values on the PES headers (ISO 13818 part 1) for the first and last access unit.
