Retrieve audio duration from kbps and size

I have this data:
Bit rate: 276 kbps
File size: 6.17 MB
Channels: 2
Layer: 3
Frequency: 44100 Hz
How can I retrieve the audio duration in seconds or milliseconds?

You can't. To get the duration you need the sampling rate in samples per second, but also the number of channels (mono, stereo, etc.) and the sample length in bytes (usually 1 to 3). And unless it is raw audio, there is also additional data (headers and the like) that takes some space. 276 kbps does not help here. If it is an MP3, the file is compressed, so you simply can't tell just by looking at the file size.
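For contrast, with raw PCM those parameters really are all you need; a minimal sketch in Python, using assumed example values (not the MP3 figures from the question):

    # Duration of raw (headerless) PCM audio -- a sketch with assumed values.
    file_size_bytes = 6_470_000      # hypothetical raw PCM file
    sample_rate = 44100              # samples per second, per channel
    channels = 2                     # stereo
    bytes_per_sample = 2             # 16-bit samples

    bytes_per_second = sample_rate * channels * bytes_per_sample  # 176400
    duration_seconds = file_size_bytes / bytes_per_second
    print(duration_seconds)          # ~36.7 seconds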

Related

How to find sampleCount knowing the length of an audio file and its sampleRate?

I have been looking for a long time for how to find sampleCount, but there is no answer. Is there an algorithm or formula for this calculation? What is known: the duration is 850 ms, the file weighs 37 KB, it is a wav file, and the sampleRate is 48000. As a check, you should get a sampleCount equal to 40681, as I have in the file. I need this so that I can calculate sampleCount for other audio files. I am waiting for your help.
I found it: I get 40800. I multiplied the sample rate by the time in seconds.
Yes, the sample count is equal to the sample rate, multiplied by the duration.
So for an audio file that is exactly 850 milliseconds, at 48 kHz sample rate:
0.850 s * 48000 Hz = 40800 samples
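A minimal sketch of the same calculation in Python (the 850 ms and 48 kHz values are from the question):

    # sample count = sample rate * duration in seconds
    sample_rate = 48000              # Hz
    duration_ms = 850                # milliseconds
    sample_count = sample_rate * duration_ms / 1000
    print(sample_count)              # 40800.0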
Now, with MP3s you have to be careful. There is some padding at the beginning of the file for cleanly initializing the decoder, and the amount of padding can vary based on the encoder and its configuration. (You can read all about the troubles this has caused on the Wikipedia page for "gapless playback".) Additionally, your MP3 duration will be determined at MP3 frame boundaries, not at arbitrary PCM boundaries... assuming your decoder/player does not support gapless playback.

Converting m4a to highest quality wav and 320kbps mp3

I have an 805 kb .m4a audio file which I want to convert to wav. My purpose is to (1) get the highest quality .wav audio file and (2) get a 320 kbps .mp3 audio file.
I used an online service (https://www.online-convert.com) to convert it. When I converted directly, without optional settings, the file size increased to 11.6 MB, and when I converted the same audio with optional settings (changing bit resolution to 32 bit and the sampling rate to 96000 Hz), the file size jumped to 50.7 MB.
There are only two optional settings in the web service:
Bit resolution - no change, 8 bit, 16 bit, 24 bit, 32 bit
Sampling rate - no change, 1000 Hz, 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000, 96000 Hz
And one radio button for Normalize audio that can be checked or unchecked.
Can someone explain why the file size increases, and what settings I must keep to get the highest quality from the original 805 kb audio?
Thanks
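For what it's worth, uncompressed WAV size is simply data rate times duration, which is why the size balloons. A sketch in Python, assuming the clip is roughly 66 seconds long (a duration consistent with both reported sizes):

    # WAV size = sample_rate * bytes_per_sample * channels * duration
    duration_s = 66                  # assumed clip length, inferred from the sizes above
    channels = 2

    size_cd = 44100 * (16 // 8) * channels * duration_s   # 44.1 kHz, 16 bit
    size_hi = 96000 * (32 // 8) * channels * duration_s   # 96 kHz, 32 bit
    print(size_cd, size_hi)          # 11642400 50688000 -> ~11.6 MB and ~50.7 MB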

About definition for terms of audio codec

When I was studying the Cocoa Audio Queue documentation, I came across several audio codec terms. They are defined in a structure named AudioStreamBasicDescription.
Here are the terms:
1. Sample rate
2. Packet
3. Frame
4. Channel
I know about sample rate and channel, but I was confused by the other two. What do the other two terms mean?
You can also answer this question with an example. For example, I have a dual-channel PCM-16 source with a sample rate of 44.1 kHz, which means there are 2 * 44100 = 88200 samples of PCM data per second (176,400 bytes, at 2 bytes per sample). But what about packet and frame?
Thank you in advance!
You are already familiar with the definition of sample rate:
The sampling frequency or sampling rate, fs, is defined as the number of samples obtained in one second (samples per second), thus fs = 1/T.
So for a sampling rate of 44100 Hz, you have 44100 samples per second (per audio channel).
The number of frames per second in video is a concept similar to the number of samples per second in audio: frames for our eyes, samples for our ears.
If you have 16-bit stereo PCM, you have 16 * 44100 * 2 = 1,411,200 bits per second => ~172 kB per second => around 10 MB per minute.
Now to the definitions, reworded from Apple:
Sample: a single number representing the value of one audio channel at one point in time.
Frame: a group of one or more samples, with one sample for each channel, representing the audio on all channels at a single point in time.
Packet: a group of one or more frames, representing the audio format's smallest encoding unit, and the audio for all channels across a short amount of time.
As you can see, there is a subtle difference between the audio and video frame notions. In one second of stereo audio at 44.1 kHz, you have 88200 samples and thus 44100 frames.
Compressed formats like MP3 and AAC pack multiple frames into packets (these packets can then be written into an MP4 file, for example, where they can be efficiently interleaved with video content). Working with larger packets helps the encoder find bit patterns, for better coding efficiency.
MP3, for example, uses packets of 1152 frames, which are the basic atomic unit of an MP3 stream. PCM audio is just a series of samples, so it can be divided down to the individual frame, and it really has no packet size at all.
For AAC you can have 1024 (or 960) frames per packet. This is described in the Apple document you pointed at:
The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.
In MPEG-based file formats, a packet is referred to as a data frame (not to be confused with the audio frame notion above). See Brad's comment for more information on the subject.
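To make the sample/frame/packet distinction concrete, here is a small sketch with the numbers used above (44.1 kHz stereo):

    # One second of stereo audio at 44.1 kHz.
    sample_rate = 44100
    channels = 2

    frames_per_second = sample_rate                 # 44100: one frame = one sample per channel
    samples_per_second = sample_rate * channels     # 88200 individual samples

    # Packet sizes are codec-defined: 1152 frames for MP3, 1024 for AAC.
    mp3_packet_ms = 1152 / sample_rate * 1000       # ~26.1 ms of audio per MP3 packet
    aac_packet_ms = 1024 / sample_rate * 1000       # ~23.2 ms of audio per AAC packet
    print(frames_per_second, samples_per_second, mp3_packet_ms, aac_packet_ms)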

setting timestamps for audio samples in directshow graph

I am developing a DirectShow audio decoder filter to decode AC3 audio.
The filter is used in a live graph, decoding a TS multicast.
The demuxer (MainConcept) provides me with the demuxed audio data, but does not provide timestamps for the samples.
How can I get/compute the correct timestamp of the audio?
I found this forum post:
http://www.ureader.com/msg/14712447.aspx
In it, a member gives the following formula for calculating audio timestamps, given the audio format (sample rate, number of channels, bits per sample):
With PCM audio, duration_in_secs = 8 * buffer_size / wBitsPerSample / nChannels / nSamplesPerSec, or duration_in_secs = buffer_size / nAvgBytesPerSec (since, for PCM audio, nAvgBytesPerSec = wBitsPerSample * nChannels * nSamplesPerSec / 8).
The only thing you need to add is a tracking variable that tells you what sample number in the stream you are at, so you can use it to offset the start and end times by the duration (duration_in_secs) when doing linear streaming. For seek operations you would, of course, need to know or calculate the sample number into the stream.
Don't forget that the units for timestamps in DirectShow are typed as REFERENCE_TIME, a 64-bit integer (Int64). Each unit is equal to 100 nanoseconds. That is why, in video filters, you see the value 10,000,000 being divided by the relevant frames-per-second (FPS) value to calculate timestamps for each frame: 10,000,000 equals 1 second in a REFERENCE_TIME variable.
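Putting those two pieces together, here is a sketch of the bookkeeping (plain Python for the arithmetic; in an actual filter this would be C++ feeding IMediaSample::SetTime, and the sample rate here is an assumed example):

    # Sketch: running sample counter -> DirectShow REFERENCE_TIME (100 ns units).
    UNITS_PER_SECOND = 10_000_000        # REFERENCE_TIME ticks in one second

    samples_per_sec = 48000              # assumed stream sample rate
    samples_so_far = 0                   # the tracking variable described above

    def buffer_timestamps(frames_in_buffer):
        """Return (start, end) REFERENCE_TIME values for the next buffer."""
        global samples_so_far
        start = samples_so_far * UNITS_PER_SECOND // samples_per_sec
        samples_so_far += frames_in_buffer
        end = samples_so_far * UNITS_PER_SECOND // samples_per_sec
        return start, end

    # One AC-3 frame decodes to 6 * 256 = 1536 samples:
    print(buffer_timestamps(1536))       # (0, 320000) -- the first 32 ms frame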
Each AC-3 frame embeds data for 6 * 256 samples. The sampling rate can be 32 kHz, 44.1 kHz, or 48 kHz (as defined by the AC-3 specification, Digital Audio Compression Standard (AC-3, E-AC-3)). The frames themselves do not carry timestamps, so you need to assume a continuous stream and increment the timestamps accordingly. Since the source is live, as you mentioned, you might need to re-adjust the timestamps on data starvation.
Each AC-3 frame is of fixed length (which you can identify from the bitstream header), so you might also check whether the demultiplexer is giving you a single AC-3 frame or several in a batch.
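So at each of the three permitted sampling rates, every AC-3 frame spans a fixed, known amount of time; a quick check:

    # Each AC-3 frame carries 6 * 256 = 1536 samples.
    for rate in (32000, 44100, 48000):
        print(rate, "Hz ->", 1536 / rate * 1000, "ms per frame")
    # 32000 Hz -> 48.0 ms, 44100 Hz -> ~34.8 ms, 48000 Hz -> 32.0 ms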

Audio samples per second?

I am wondering about the relationship between a block of samples and its time equivalent. My rough idea so far:
Number of samples played per second = total file size / duration.
So say I have a 1.02 MB file and a duration of 12 sec (avg); I will have about 89,300 samples played per second. Is this right?
Are there other ways to compute this? For example, how can I know how much time a byte[1024] array is equivalent to?
Generally speaking, for PCM samples you can divide the total length (in bytes) by the duration (in seconds) to get the number of bytes per second (for WAV files there will be some inaccuracy to account for the header). How these translate into samples depends on:
1) the sample rate
2) the bits used per sample, e.g. the commonly used 16 bits = 2 bytes
3) the number of channels, e.g. 2 for stereo
If you know 2) and 3), you can determine 1).
In your example, 89,300 bytes/second, assuming stereo and 16 bits per sample, would give 89300 / 4 ~= 22 kHz sample rate.
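A sketch of that estimate, plus the reverse question from the post (how much time a byte[1024] buffer covers):

    # Assumed format: stereo, 16-bit PCM (4 bytes per frame).
    channels = 2
    bytes_per_sample = 2
    bytes_per_second = 89300             # total size / duration, from the question

    sample_rate = bytes_per_second / (channels * bytes_per_sample)
    print(sample_rate)                   # ~22325 Hz, i.e. roughly 22 kHz

    # Time covered by a byte[1024] buffer at this data rate:
    print(1024 / bytes_per_second * 1000)   # ~11.5 ms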
In addition to @BrokenGlass's very good answer, I'll just add that for uncompressed audio with a fixed sample rate, number of channels, and bits per sample, the arithmetic is fairly straightforward. E.g. for "CD quality" audio we have a 44.1 kHz sample rate, 16 bits per sample, and 2 channels (stereo), therefore the data rate is:
44100 * 16 * 2
= 1,411,200 bits / sec
= 176,400 bytes / sec
= 10 MB / minute (approx)
