How do I cap / clip audio volume via code? - audio

I am dealing with an audio file that shows sudden spikes:
When I try to normalize the audio file, the audio program sees this spike, notices that it is at 0 dB and won't normalize the audio any more.
To solve this issue, I applied a limiter set to -3 dB. I tried both the limiter in Steinberg WaveLab and Audacity's Hard Limiter, but both lower the volume of the entire audio instead.
Neither eliminates the spike.
Edit: I have found out that "Hard Clip" in Audacity does what I need, but now I still want to finish my own approach.
So I was thinking that they perhaps do not work correctly.
Then I tried to write my own limiter in VB6 in order to have full control over what's happening.
To do that, I'm loading the audio data of a wav file like this:
(I have stripped the process down very much)
Dim nSamples() As Integer
ReDim nSamples(0 To (lCountSamples - 1))
Get #iFile, , nSamples
I have the audio data in "nSamples" now.
Dim i&
For i = 1 To UBound(nSamples)
Dim dblAmplitude As Double
dblAmplitude = nSamples(i) / 32767
Dim db As Double
db = 20 * Log10(dblAmplitude)
If db > -0.3 Then
nSamples(i)=0 'how would I lower the volume here instead of just setting it to 0???
End If
Next
I'm not sure how to calculate the dB value from the sample data in order to clip it, or how to clip it to -3 dB.
For now I have simply tried setting the clipped value to 0.
However, something strange happens here: The lower part of the audio is gone:
I expected that setting a value to 0 would mute the audio, but in my case, the lower wav form is gone.
What is wrong about my approach?
Thank you!

Calculating dB from audio samples isn't that simple.
It sounds like what you want to do is find an appropriate threshold and then clip the audio with something like:
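If Abs(CLng(nSamples(i))) > threshold Then 'CLng avoids an overflow on Abs(-32768)
    'Clamp to the threshold while keeping the sign of the sample
    nSamples(i) = threshold * Sgn(nSamples(i))
End If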
You could set the threshold to some fixed value if it's the same for all of your audio clips. Otherwise, it might be more accurate to sort the samples and search for a large gap between points near the top and bottom to find your threshold.
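To pick a fixed threshold from a dB figure such as -3 dB, the level can be converted to a linear sample value first. The following is a minimal Python sketch of that idea, not the answerer's code: it assumes 16-bit signed samples with 0 dB at full scale (32767), and the function names are made up for illustration. Note that the dB value is computed from the magnitude of the sample, because the logarithm of a negative number is undefined.

```python
import math

FULL_SCALE = 32767  # assumed: 16-bit signed PCM, 0 dB = full scale


def sample_to_db(sample):
    """Level of one sample in dB relative to full scale (uses the magnitude)."""
    magnitude = abs(sample) / FULL_SCALE
    if magnitude == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(magnitude)


def db_to_threshold(db):
    """Linear sample value that corresponds to the given dB level."""
    return int(round(FULL_SCALE * 10 ** (db / 20)))


def hard_clip(samples, db=-3.0):
    """Clamp every sample to +/- threshold, keeping its sign."""
    limit = db_to_threshold(db)
    return [max(-limit, min(limit, s)) for s in samples]


clipped = hard_clip([0, 30000, -30000, 100])
```

Samples below the threshold pass through unchanged, so the overall volume is not reduced; only the peaks are flattened, on both the positive and the negative side.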

Related

Decoding incomplete audio file

I was given an uncompressed .wav audio file (360 MB) which seems to be broken. The file was recorded using a small USB recorder (I don't have more information about the recorder at this moment). It was unreadable by any player, and I've tried GSpot (https://www.headbands.com/gspot/) to detect whether it was perhaps in a different format than WAV, but to no avail. The file is big, which hints at it being in some uncompressed format. It lacks the RIFF-WAVE characters at the start of the file though, which can be an indication that this is some other format or perhaps (more likely in this case) that the header is missing.
I've tried converting the bytes of the file directly to audio, and this creates a VERY noisy audio file, though voices could be made out and I was able to determine that the sample rate was probably 22050 Hz (given a sample size of 8 bits), with a file length of about 4 hours and 45 minutes. Running it through some filters in Audition resulted in a file that was understandable in some places, but still way too noisy in others.
Next I tried running the data through some java code that produces an image out of the bytes, and it showed me lots of noise, but also 3 byte separations every 1024 bytes. First a byte close to either 0 or 255 (but not 100%), then a byte representing a number distributed somewhere around 25 (but with some variation), and then a 00000000 (always, 100%). The first 'chunk header' (as I suppose these are) is located at 513 bytes into the file, again close to a 2-power, like the chunk size. Seems a bit too perfect for coincidence, so I'm mentioning it as it could be important. https://imgur.com/a/sgZ0JFS, the first image shows a 1024x1024 image showing the first 1mb of the file (row-wise) and the second image shows the distribution of the 3 'chunk header' bytes.
Next to these headers, the file also has areas that clearly show structure, almost wave-like structures. I suppose this is the actual audio I'm after, but it's riddled with noise: https://imgur.com/a/sgZ0JFS, third image, showing a region of the file with audio structures.
I also created a histogram for the entire file (ignoring the 3-byte 'chunk headers'): https://imgur.com/a/sgZ0JFS, fourth image. I've flipped the lower half of the range as I think audio data should be centered around some mean value, but correct me if I'm wrong. Maybe the non-symmetric nature of the histogram has something to do with signed/unsigned data or two's-complement. Perhaps the data representation is in 8-bit floats or something similar, I don't know.
I've run into a wall now. I have no idea what else I can try. Is there anyone out there who sees something I missed? Perhaps someone can give me some pointers on what else to try. I would really like to extract the audio data from this file, as it contains some important information.
Sorry for the bother. I've been able to track down the owner of the voice recorder and had him record a minute of audio with it and send me that file. I was able to determine that the audio was IMA 4-bit ADPCM encoded, 16-bit audio at 48000 Hz. Looking at the structure of the file, I realized that simply placing the header of the good file in front of the data of the bad file should be possible, and lo and behold, I had a working file again :)
I'm still very much interested in how ADPCM works and whether I can write my own decoder, but that's for another day when I'm strolling through Wikipedia again. Have a great day everyone!
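The header transplant described above could be sketched roughly as follows. This is an illustrative Python version, not the code actually used: it assumes the good file is a regular RIFF/WAVE file with a `data` chunk, that the broken file consists of nothing but sample data, and `graft_wav_header` is a hypothetical helper name. Only the two size fields are patched; any other field copied from the good file is left as it is.

```python
import struct


def graft_wav_header(good, raw):
    """Prepend the header of a known-good WAV file (bytes) to headerless data."""
    if good[:4] != b"RIFF" or good[8:12] != b"WAVE":
        raise ValueError("reference file is not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after 'RIFF' <size> 'WAVE'
    while pos + 8 <= len(good):
        chunk_id = good[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", good, pos + 4)
        if chunk_id == b"data":
            # Keep everything up to and including the data chunk's size field.
            header = bytearray(good[:pos + 8])
            struct.pack_into("<I", header, pos + 4, len(raw))
            # The RIFF size counts every byte after the first 8.
            struct.pack_into("<I", header, 4, len(header) - 8 + len(raw))
            return bytes(header) + raw
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are word-aligned
    raise ValueError("no data chunk found in reference file")
```

Typical use would be to read both files in binary mode, call the function, and write the result to a new .wav file.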

Arduino AnalogRead returns 0 every a couple samples

I am using a Teensy 3.1 to record audio at a 50 kHz sample rate. I use the function analogRead to sample the analog pin. The readings should fall in the range 0 to 1023 (at the default 10-bit resolution).
However, after recording the data, I found that there is a small reading (e.g. 0.019) every 100 samples. What might be the reason for that? Am I sampling too fast?
Any feedback is very much appreciated.
I figured out that problem, which raises another problem to be solved.
The earlier problem was caused by the buffer I use while reading the serial port in Java via RXTX. I set the buffer size to 1024, so the data stream was cut at the end of each buffer and the start of the next one. For example, 449.00 was broken into three lines: 4, 49 and .00.
One easy solution is to increase the buffer size and throw away the data at the end and start of each buffer. Is there a better way to solve this?
Thanks.
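One common alternative to discarding data is to carry the unfinished tail of each buffer over to the next read and only emit complete lines. The sketch below shows that idea in Python rather than Java/RXTX; it assumes the device sends one ASCII value per line terminated by a newline, and `LineAssembler` is a hypothetical helper, not part of any serial library.

```python
from typing import List


class LineAssembler:
    """Reassembles newline-terminated lines from arbitrarily split buffers."""

    def __init__(self):
        self._pending = b""  # bytes of a line that has not been terminated yet

    def feed(self, chunk: bytes) -> List[str]:
        data = self._pending + chunk
        # Everything before the last newline is complete; the rest is kept.
        *complete, self._pending = data.split(b"\n")
        return [line.decode("ascii").strip() for line in complete if line.strip()]


assembler = LineAssembler()
first = assembler.feed(b"12.00\n4")      # "4" is held back
second = assembler.feed(b"49")           # still no newline
third = assembler.feed(b".00\n7.5\n")    # completes "449.00"
```

With this approach the buffer size no longer matters, because a value split across two or more reads is joined again before it is parsed.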

Resample to 12bit PCM sound?

I have trouble finding a program which will resample 16-bit 44.1 kHz PCM into 12-bit 25 kHz. There's not much info on this to be found...
Anyone have a clue? I tried Audacity, ffmpeg, ... but to no avail. I was thinking about reducing the amplitude on a 16-bit sample by 75% in a normal editor and throwing away the highest nibble, but something tells me it might not be that simple...
You can do that using Audacity, but it's not so intuitive. There is no direct resampling option, so you need to do it in two steps.
In the Audio Track menu you can use Set Rate to change the sampling rate to 25000 Hz.
That only changes the replay speed, so you need to resample it also. That is done with Change Speed in the Effect menu. The speed change is 44100/25000 = 1.764, which is 76.4% faster.
Now you can export the track to the 12-bit format that you want.
I found another way in the meantime: sox (the command line version), which does a very good and complete conversion: http://sox.sourceforge.net/ I am converting the sample rate with sox and processing the raw file through PHP.
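For the bit-depth part alone, a plain requantization from 16 to 12 bits amounts to dropping the four least significant bits of each sample. The Python sketch below illustrates only that step, under the assumption of signed 16-bit input; it does not resample, dither, or pack the result into any particular 12-bit file layout, and `to_12bit` is a made-up name.

```python
def to_12bit(samples):
    """Map signed 16-bit samples (-32768..32767) to signed 12-bit (-2048..2047)."""
    # An arithmetic right shift by 4 divides by 16 and rounds toward -infinity.
    return [s >> 4 for s in samples]


reduced = to_12bit([32767, -32768, 0, 16, -1])
```

The sample rate conversion still has to happen separately, for example with one of the tools mentioned above.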

MP4 Atom Parsing - where to configure time...?

I've written an MP4 parser that can read atoms in an MP4 just fine, and stitch them back together - the result is a technically valid MP4 file that Quicktime can open and such, but it can't play any audio as I believe the timing/sampling information is all off. I should probably mention I'm only interested in audio.
What I'm doing is trying to take the moov atoms/etc from an existing MP4, and then take only a subset of the mdat atom in the file to create a new, smaller MP4. In doing so I've altered the duration in the mvhd atom, as well as the duration in the mdia header. There are no tkhd atoms in this file that have edits, so I believe I don't need to alter the durations there - what am I missing?
In creating the new MP4 I'm properly sectioning the mdat block with a wide box, and keeping the 'mdat' header/size in their right places - I make sure to update the size with the new content.
Now it's entirely 110% possible I'm missing something crucial about the format, but if this is possible I'd love to get the final piece. Anybody got any input/ideas?
Code can be found at the following link:
https://gist.github.com/ryanmcgrath/958c602cff133bd7fa0b
I'm going to take a stab in the dark here and say that you're not updating your stbl offsets properly. At least I didn't (at first glance) see your python doing that anywhere.
STSC
Let's start with the location of data. Packets are written into the file in chunks, and the header tells the decoder where each run of chunks begins. The stsc table says how many samples each chunk holds: every entry names the first chunk it applies to and the number of samples per chunk from that chunk onward, until the next entry takes over. It's a little confusing, so take an example with two entries: (first chunk 1, 100 samples per chunk) and (first chunk 8, 98 samples per chunk). This says that you have 100 samples per chunk up to the 8th chunk, and that the 8th chunk has 98 samples.
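To make the run-length nature of stsc concrete, here is a small Python sketch that expands such entries into one sample count per chunk. It is illustrative only: entries are given as (first_chunk, samples_per_chunk) pairs with 1-based chunk numbers, the sample description index that real stsc entries also carry is left out, and the total chunk count is assumed to be known from the stco table.

```python
def expand_stsc(entries, total_chunks):
    """Turn run-length stsc entries into a list of samples-per-chunk values."""
    counts = []
    for i, (first_chunk, per_chunk) in enumerate(entries):
        # A run lasts until the next entry starts, or until the last chunk.
        if i + 1 < len(entries):
            next_first = entries[i + 1][0]
        else:
            next_first = total_chunks + 1
        counts.extend([per_chunk] * (next_first - first_chunk))
    return counts


per_chunk_counts = expand_stsc([(1, 100), (8, 98)], total_chunks=8)
```

For the example with 100 samples per chunk up to the 8th chunk and 98 samples in the 8th, this yields seven chunks of 100 samples followed by one chunk of 98.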
STCO
That said, you also have to track where the offsets of these chunks are. That's the job of the stco table. So, where in the file is chunk offset 1, or chunk offset 2, etc.
If you modify any data in mdat you have to maintain these tables. You can't just chop mdat data out, and expect the decoder to know what to do.
As if this wasn't enough, you also have to maintain the sample time table (stts), the sample size table (stsz) and, if this were video, the sync sample table (stss).
STTS
stts says how long a sample should play for in units of the timescale. If you're doing audio, the timescale is probably 44100 or 48000 units per second (matching the sample rate in Hz).
If you've lopped off some data, now everything could potentially be out of sync. If all the values here have the exact same duration though you'd be OK.
STSZ
stsz says what size each sample is in bytes. This is important for the decoder to be able to start at a chunk, and then go through each sample by its size.
Again, if all the sample sizes are exactly the same you'd be OK. Audio tends to be pretty much the same, but video stuff varies a lot (with keyframes and whatnot)
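Once the sample sizes and the number of samples in each chunk are known, new chunk offsets for a trimmed file can be recomputed. The following Python sketch shows one way to do that, assuming a single audio track whose chunks sit back to back in mdat (an assumption that does not hold for interleaved files); the function and parameter names are hypothetical.

```python
def chunk_offsets(first_offset, samples_per_chunk, sample_sizes):
    """Compute stco-style file offsets for chunks stored contiguously.

    first_offset      -- file position of the first sample byte inside mdat
    samples_per_chunk -- number of samples in each chunk, in order
    sample_sizes      -- size in bytes of every sample (as listed in stsz)
    """
    offsets = []
    position = first_offset
    index = 0
    for count in samples_per_chunk:
        offsets.append(position)
        position += sum(sample_sizes[index:index + count])
        index += count
    if index != len(sample_sizes):
        raise ValueError("chunk layout does not cover all samples")
    return offsets


offsets = chunk_offsets(48, [2, 2, 1], [10, 20, 30, 40, 50])
```

The consistency check at the end catches the case where samples were removed from mdat without updating both tables.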
STSS
And last but not least we have the stss table, which says which frames are keyframes. I only have experience with AAC, but every audio frame is considered a keyframe. In that case you can have one entry that describes all the packets.
In relation to your original question, the time display isn't always honored the same way in each player. The most accurate way is to sum up the durations of all the frames in the header and use that as the total time. Other players use the metadata in the track headers. I've found it best to just keep all the values the same and then players are happy.
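Summing the frame durations can be sketched as below. This assumes the stts table has already been parsed into (sample_count, sample_delta) pairs and that the timescale comes from the media header; the names are illustrative, not taken from the linked script.

```python
def track_duration_seconds(stts_entries, timescale):
    """Total play time from stts entries given as (sample_count, sample_delta)."""
    total_units = sum(count * delta for count, delta in stts_entries)
    return total_units / timescale


# Example: 100 frames of 1024 units each at a timescale of 44100
duration = track_duration_seconds([(100, 1024)], 44100)
```

Writing the same total (in timescale units) into the duration fields of the headers keeps the two ways of measuring the length in agreement.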
If you're doing all that and I missed it in the script then can you post a sample mp4 and a standalone app and I can try to help you out.

How to estimate size of a Jpeg file before saving it with a certain quality factor?

I've got a 24-bit bitmap. I am writing an application in C++ with MFC,
and I am using libjpeg to encode the bitmap into a 24-bit JPEG file.
The bitmap's width is M and its height is N.
How can I estimate the JPEG file size before saving it with a certain quality factor Q (0-100)?
Is it possible to do this?
For example:
I want to implement a slider that represents the quality factor Q used to save the current bitmap.
A label beside it shows the approximate file size when the bitmap is encoded with this quality factor.
When the user moves the slider, they get an approximate preview of the size of the JPEG file to be saved.
In libjpeg, you can write a custom destination manager that doesn't actually call fwrite, but just counts the number of bytes written.
Start with the stdio destination manager in jdatadst.c, and have a look at the documentation in libjpeg.doc.
Your init_destination and term_destination methods will be very minimal (just alloc/dealloc), and your empty_output_buffer method will do the actual counting. Once you have completed the JPEG writing, you'll have to read the count value out of your custom structure. Make sure you do this before term_destination is called.
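The idea of a destination that counts instead of writing is not specific to libjpeg. As a language-neutral illustration, here is the same pattern in Python rather than libjpeg's C API: a hypothetical file-like object that discards whatever an encoder writes to it and only keeps a running byte count.

```python
class CountingWriter:
    """File-like sink that discards data and records how many bytes it saw."""

    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)  # nothing is stored, only the size is tracked
        return len(data)


sink = CountingWriter()
sink.write(b"\xff\xd8")       # stand-in for encoder output
sink.write(b"\x00" * 100)
estimated_size = sink.count
```

Since the encoder still has to run in full, this gives the exact size for the chosen quality setting at the cost of one complete encode per slider position.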
It also depends on the compression you are using and, more specifically, on how many bits per color pixel you are using.
The quality factor won't help you here, as a quality factor of 100 can range (in most cases) from 6 bits per color pixel to ~10 bits per color pixel, maybe even more (not sure).
So once you know that, it's really straightforward from there.
If you know the subsampling factor, this can be estimated. That information comes from the start of frame marker.
The bit depth is in the same marker, right before the width and height.
If you let
int subSampleFactorH = 2, subSampleFactorV = 1;
Then
int totalImageBytes = (Image.Width / subSampleFactorH) * (Image.Height / subSampleFactorV);
Then you can also optionally add more bytes to account for container data also.
int totalBytes = totalImageBytes + someConstantOverhead;
