How many bytes can be stored per minute of audio using any method of steganography, disregarding detectability or any other factor, e.g. whether the original audio begins to sound different?
I have some long audio files. I want to split each audio file into multiple short audio files using Python. For example: the audio is more than 1 hour long, and I want to split it into multiple 5 s files and then extract features from each 5 s segment of the whole file.
There are two issues in your question.
Splitting the audio
Extracting features.
and both of them hinge on the same underlying key piece of information: the sampling frequency.
The duration of an audio signal in seconds, together with the sampling frequency used for the audio file, defines the number of samples that the audio file has. An audio sample is (in simplified terms) one value of the audio signal stored on your hard disk or in computer memory.
The number of audio samples in a typical WAV file is calculated by the formula sr * dur, where sr is the sampling frequency in Hz (e.g. 44100 for a CD-quality signal) and dur is the duration of the audio file in seconds. For example, a CD-quality audio file of 2 seconds has 44100 * 2 = 88200 samples.
So:
To split an audio file in Python, you first have to read it into a variable. There are plenty of libraries and functions out there, for example (in random order):
scipy.io.wavfile.read
wave module
and others. You can check this SO post for more info on reading a wav file.
Then, you just have to take N samples at a time, e.g. my_audio_1 = whole_audio_file[0:5*sr].
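For instance, here is a minimal sketch of that idea using scipy.io.wavfile (the file names are placeholders):

from scipy.io import wavfile

# Read the whole file; sr is the sampling frequency in Hz.
sr, whole_audio_file = wavfile.read("long_recording.wav")

chunk_len = 5 * sr  # number of samples per 5-second chunk
for i, start in enumerate(range(0, len(whole_audio_file), chunk_len)):
    chunk = whole_audio_file[start:start + chunk_len]
    wavfile.write(f"chunk_{i:04d}.wav", sr, chunk)

Note that the last chunk may be shorter than 5 s if the duration is not a multiple of 5.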
BUT!!!
If you just want to extract features for every X seconds, there is no need to split the audio manually. Most audio feature-extraction libraries do that for you.
For example, in librosa you can control the number of FFT points, which roughly corresponds to the length of the audio that you want to extract features from. You can check, for example, here: https://librosa.org/doc/latest/feature.html
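As a hedged sketch (the file name, the choice of MFCCs, and the 5 s window are just illustrative assumptions), you can let librosa frame the signal and then pool its per-frame features into one vector per 5 seconds:

import librosa
import numpy as np

y, sr = librosa.load("long_recording.wav", sr=None)  # keep the native rate

hop_length = 512
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop_length)

# librosa has already framed the signal; average the per-frame features
# over each 5-second window to get one feature vector per segment.
frames_per_window = int(5 * sr / hop_length)
n_windows = mfcc.shape[1] // frames_per_window
trimmed = mfcc[:, :n_windows * frames_per_window]
per_5s = trimmed.reshape(13, n_windows, frames_per_window).mean(axis=2)
print(per_5s.shape)  # (13, number of 5-second windows)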
As the title says: in some compressed formats, such as E-AC-3 and AC-3, each frame starts with a sync word.
So what is the sync word for PCM (raw audio)? How do I identify the beginning of a PCM frame?
I ran into a problem where the audio is a concatenation of several audio segments, each with a different frame size, and I need to identify the start position of each.
Thanks in advance.
There is no such concept as a frame in PCM. The purpose of a frame is to indicate points of random access; in PCM, every single sample is a point of random access, hence start indicators are not required and there is no standard frame size. It is all up to you.
A PCM frame is different from the frames you're describing, in that a frame is just a single sample across all channels. That is, if I'm recording 16-bit stereo PCM audio, each frame is 4 bytes (32 bits) long.
There is no sync word, nor frame header in raw PCM. It's just a stream of data. You need to know the bit depth, channel count, and current offset if you want to sync to it. (Or, you need to do some simple heuristics. For example, apply several different formats and offsets to a small chunk of data and see which one has the least variance/randomness from sample to sample.)
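A rough sketch of that heuristic (this is not a standard algorithm, just one plausible scoring approach): interpret a raw byte buffer under several candidate formats and offsets, and prefer the interpretation whose sample-to-sample differences are smallest, since real PCM usually varies far more smoothly than a misaligned reading of it.

import numpy as np

def smoothness(raw: bytes, dtype, channels: int, offset: int) -> float:
    # Decode the buffer under one candidate interpretation.
    itemsize = np.dtype(dtype).itemsize
    count = (len(raw) - offset) // itemsize
    data = np.frombuffer(raw, dtype=dtype, offset=offset, count=count)
    data = data[: len(data) - len(data) % channels]
    samples = data.reshape(-1, channels).astype(np.float64)
    # Mean absolute difference between consecutive frames, per channel.
    return float(np.mean(np.abs(np.diff(samples, axis=0))))

def guess_format(raw: bytes):
    candidates = [
        (np.dtype(dt), ch, off)
        for dt in ("<i2", "<i4")   # 16-bit / 32-bit little-endian
        for ch in (1, 2)           # mono / stereo
        for off in (0, 1, 2, 3)    # possible byte misalignment
    ]
    # The smoothest interpretation is the most plausible one.
    return min(candidates, key=lambda c: smoothness(raw, *c))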
I want the relation between time and bytes in an Ogg file. Say I have a 5-second Ogg file whose length is 68*1024 bytes. If I cut a chunk out of that Ogg file and save it, can I know the chunk's size in advance? For example, I want to cut the chunk from 2.4 s to 3.2 s.
Is there a mathematical calculation that gives an accurate answer in bytes? Can anyone tell me if this is possible?
Bit rate: 128 kbps, 16-bit, sample rate: 44.1 kHz, stereo
I tried a direct mapping between file size and play time, but I can't get an accurate answer.
Any such direct mapping between file size and play time will work, but not if the codec uses variable bit rate (VBR) encoding. A compression algorithm is VBR if its success in compressing depends on the informational density of the source media: repetitive audio compresses more efficiently than, say, random noise. VBR algorithms are typically more efficient, because a constant bit rate (CBR) algorithm must pad its buffer with filler data just so its throughput stays at a constant number of bytes per second.
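For the CBR case, the arithmetic is simple. A sketch, ignoring container overhead (real Ogg files carry page headers, so treat the result as approximate at best):

def byte_range(start_s: float, end_s: float, bitrate_bps: int = 128_000):
    # CBR: bytes accumulate at a fixed rate, so time maps linearly to offset.
    bytes_per_second = bitrate_bps // 8
    return int(start_s * bytes_per_second), int(end_s * bytes_per_second)

print(byte_range(2.4, 3.2))  # (38400, 51200) at 128 kbps

With VBR there is no such formula; you would have to walk the Ogg pages and use their granule positions to map time to bytes.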
While studying the Cocoa Audio Queue documentation, I came across several terms used for audio codecs. They are defined in a structure named AudioStreamBasicDescription.
Here are the terms:
1. Sample rate
2. Packet
3. Frame
4. Channel
I know about sample rate and channel, but I am confused by the other two. What do they mean?
You can also answer this question by example. Say I have a dual-channel PCM-16 source with a sample rate of 44.1 kHz, which means there are 2 * 44100 = 88200 samples of PCM data per second (at 2 bytes per sample, 176400 bytes per second). But what about packet and frame?
Thank you in advance!
You are already familiar with the definition of the sample rate.
The sampling frequency or sampling rate, fs, is defined as the number of samples obtained in one second (samples per second), thus fs = 1/T.
So for a sampling rate of 44100 Hz, you have 44100 samples per second (per audio channel).
The number of frames per second in video is a similar concept to the number of samples per second in audio: frames for our eyes, samples for our ears. Additional info here.
If you have 16-bit stereo PCM, it means you have 16 * 44100 * 2 = 1411200 bits per second => ~172 kB per second => around 10 MB per minute.
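A quick worked check of those numbers:

# 16-bit stereo PCM at 44.1 kHz
sr, bit_depth, channels = 44100, 16, 2

bits_per_second = sr * bit_depth * channels   # 1,411,200 bits/s
bytes_per_second = bits_per_second // 8       # 176,400 B/s (~172 kB/s)
bytes_per_minute = bytes_per_second * 60      # 10,584,000 B (~10 MB)
print(bits_per_second, bytes_per_second, bytes_per_minute)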
Here are the definitions, reworded from Apple:
Sample: a single number representing the value of one audio channel at one point in time.
Frame: a group of one or more samples, with one sample for each channel, representing the audio on all channels at a single point in time.
Packet: a group of one or more frames, representing the audio format's smallest encoding unit, and the audio for all channels across a short amount of time.
As you can see, there is a subtle difference between the audio and video notions of a frame. For stereo audio at 44.1 kHz, one second contains 88200 samples and thus 44100 frames.
Compressed formats like MP3 and AAC pack multiple frames into packets (these packets can then be written into an MP4 file, for example, where they can be efficiently interleaved with video content). You can see that working with larger packets helps the encoder identify bit patterns for better coding efficiency.
MP3, for example, uses packets of 1152 frames; such a packet is the basic atomic unit of an MP3 stream. PCM audio is just a series of samples, so it can be divided down to the individual frame, and it really has no packet size at all.
For AAC you can have 1024 (or 960) frames per packet. This is described in the Apple document you pointed at:
The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.
In MPEG-based file formats, a packet is referred to as a data frame (not to be confused with the audio frame notion above). See Brad's comment for more information on the subject.
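To make the packet sizes above concrete, here is the illustrative arithmetic for a 44.1 kHz stream:

sr = 44100
print(1152 / sr * 1000)  # MP3: ~26.1 ms of audio per 1152-frame packet
print(1024 / sr * 1000)  # AAC: ~23.2 ms of audio per 1024-frame packet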
I use ffmpeg to convert videos from one format to another.
Is bitrate the only parameter which decides the output size of a video/audio file?
Yes, bitrate is essentially what will control the file size (for a given playback duration). It is the number of bits used to represent each second of material.
However, there are some subtleties, e.g. :
a video file encoded at a certain video bitrate probably contains a separate audio stream, with a separately-specified bitrate
most file formats will contain some metadata that won't be counted towards the basic video stream bitrate
sometimes the encoder will not actually aim for the specified bitrate at all - for example, when using CRF (constant rate factor) mode. http://trac.ffmpeg.org/wiki/x264EncodingGuide explains why two-pass encoding is preferred when targeting a specific file size.
So you may want to do a little experimenting with a particular set of options for a particular file format.
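As a sketch of the two-pass, target-size approach from the guide linked above (the file names, duration, and sizes are made-up placeholders):

import subprocess

duration_s = 600      # clip length in seconds (e.g. from ffprobe)
target_mib = 50       # desired output size
audio_kbps = 128

total_kbps = target_mib * 8192 / duration_s   # MiB -> kilobits, spread over the duration
video_kbps = int(total_kbps - audio_kbps)     # leave room for the audio stream

common = ["ffmpeg", "-y", "-i", "input.mp4",
          "-c:v", "libx264", "-b:v", f"{video_kbps}k"]
subprocess.run(common + ["-pass", "1", "-an", "-f", "null", "/dev/null"],
               check=True)
subprocess.run(common + ["-pass", "2", "-c:a", "aac",
                         "-b:a", f"{audio_kbps}k", "output.mp4"],
               check=True)

Even then, expect the result to land near the target rather than exactly on it, because of container overhead.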
Bitrate is a rough measure of the quality of an audio or video file.
For example, an MP3 audio file compressed at 192 kbps will preserve more detail and may sound slightly clearer than the same audio compressed at 128 kbps, because more bits are used to represent the audio data for each second of playback.
Similarly, a video file compressed at 3000 kbps will generally look better than the same file compressed at 1000 kbps. Just as the quality of an image is tied to its resolution, the quality of an audio or video file is largely determined by its bitrate.