I am working on a project that requires an online service that mixes user-submitted 30-second audio clips together into one stereo file. It would do daily preview bounces where new audio is mixed into the last stereo version.
But for the final full-quality export with some adjustable options, I might need to mix a huge number of files anew (potentially 1,000,000+).
What is the best approach, and what mistakes should be avoided?
Are there any differences in terms of efficiency and audio quality between incremental mixing (wav 1 + wav 2 => new wav + wav 3 => new wav + wav 4 => new wav, etc.) and mixing all at once (wav 1 + wav 2 + ... + wav n => new wav)?
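For reference, here is a minimal sketch of the all-at-once approach, assuming every clip is a 16-bit stereo WAV at the same sample rate (the file names and the helper name are placeholders, not part of the question). The practical difference is less about speed, since both approaches perform roughly the same number of additions, and more about quality: an incremental chain re-quantizes the running mix to 16-bit at every bounce, while a single pass can accumulate in floating point and quantize back to 16-bit exactly once.

```python
import numpy as np
from scipy.io import wavfile

def mix_all_at_once(paths, out_path):
    # Accumulate every clip in float64 and quantize back to 16-bit only once.
    mix = None
    rate = None
    for p in paths:
        sr, data = wavfile.read(p)                 # int16 array, shape (n_samples, 2)
        rate = rate or sr
        clip = data.astype(np.float64) / 32768.0   # convert to float once, at read time
        if mix is None:
            mix = np.zeros_like(clip)
        n = min(len(mix), len(clip))
        mix[:n] += clip[:n]                        # plain summation, order does not matter
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix = mix / peak                           # avoid clipping from summing many clips
    wavfile.write(out_path, rate, np.round(mix * 32767).astype(np.int16))

# Usage (file names are placeholders):
# mix_all_at_once(["clip_0001.wav", "clip_0002.wav"], "bounce.wav")
```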
I have a recording as a collection of files in mpegts format, like
audio: a-1.ts, a-2.ts, a-3.ts, a-4.ts
video: v-1.ts, v-2.ts, v-3.ts
I need to make a single video clip in mp4 or mkv format.
However, there are two problems:
The audio and video segments each have different durations, and the number of audio segments differs from the number of video segments. The total duration of audio and video matches. Hence I cannot concat audio/video segments pairwise using ffmpeg and merge them afterwards; I get sync issues that increase progressively.
A few segments are corrupt or missing. So if I concat the audio and video streams separately using ffmpeg, I get streams of different lengths. When I merge these streams using ffmpeg, A/V synchronization is correct until the point where the first missing packet is encountered.
It's OK if video freezes for a while or there is silence for a while as long as most of the video is in sync with audio.
I've checked with tsduck, and PCR seems to be present in all audio and video segments, yet I could not find a way to merge the streams using the MPEG-TS PCR as the sync reference. Please advise how I can achieve this.
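For reference, a minimal sketch of the separate concat-then-mux workflow described above (segment names taken from the question; ffmpeg assumed to be on PATH). As written it reproduces the behaviour in the question, i.e. sync is only correct until the first missing segment, so the gaps would still need to be padded or re-timed separately.

```python
import subprocess

def concat_ts(segments, out_path):
    # ffmpeg's "concat:" protocol joins MPEG-TS segments without re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "concat:" + "|".join(segments), "-c", "copy", out_path],
        check=True)

concat_ts(["a-1.ts", "a-2.ts", "a-3.ts", "a-4.ts"], "audio.ts")
concat_ts(["v-1.ts", "v-2.ts", "v-3.ts"], "video.ts")

# Mux the two concatenated streams into one file, copying both codecs.
subprocess.run(
    ["ffmpeg", "-y", "-i", "video.ts", "-i", "audio.ts",
     "-map", "0:v", "-map", "1:a", "-c", "copy", "out.mkv"],
    check=True)
```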
How many bytes can be stored per minute of audio using any method of steganography, disregarding detectability or any other factor (e.g. whether the original audio begins to sound different)?
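As a back-of-the-envelope illustration only: assuming CD-quality cover audio (16-bit PCM, 44.1 kHz, stereo) and plain single-LSB embedding, both of which are assumptions rather than part of the question, the capacity works out as below; embedding more low bits per sample raises the figure proportionally once audibility is disregarded.

```python
sample_rate = 44100                 # samples per second, per channel
channels = 2
bits_embedded_per_sample = 1        # classic single-LSB embedding
samples_per_minute = sample_rate * channels * 60
bytes_per_minute = samples_per_minute * bits_embedded_per_sample // 8
print(bytes_per_minute)             # 661500 bytes, roughly 646 KiB per minute
```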
I have some long audio files. I want to split each audio file into multiple short audio files using Python. For example: the audio is more than 1 hour long, and I want to split it into multiple 5-second files. I want to extract features for the whole audio file, 5 seconds at a time.
There are two issues in your question:
Splitting the audio
Extracting features
Both of them depend on the same underlying key piece of information: the sampling frequency.
The duration of an audio signal in seconds and the sampling frequency used for the audio file define the number of samples that the audio file has. An audio sample is (in simplified terms) one value of the audio signal stored on your hard disk or in computer memory.
The number of audio samples in a typical wav file is calculated with the formula sr * dur, where sr is the sampling frequency in Hz (e.g. 44100 for a CD-quality signal) and dur is the duration of the audio file in seconds. For example, a CD-quality audio file of 2 seconds has 44100 * 2 = 88200 samples per channel.
So:
To split an audio file in Python, you first have to read it into a variable. There are plenty of libraries and functions out there, for example (in no particular order):
scipy.io.wavfile.read
wave module
and others. You can check this SO post for more info on reading a wav file.
Then, you just have to get N samples, e.g. my_audio_1 = whole_audio_file[0:5*sr].
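A minimal sketch of that, assuming a 16-bit WAV file (the name long_audio.wav is a placeholder) and scipy.io.wavfile as the reader:

```python
from scipy.io import wavfile

sr, whole_audio_file = wavfile.read("long_audio.wav")   # sr in Hz, samples as a numpy array
chunk_len = 5 * sr                                       # number of samples in 5 seconds
for start in range(0, len(whole_audio_file), chunk_len):
    chunk = whole_audio_file[start:start + chunk_len]
    wavfile.write(f"chunk_{start // chunk_len}.wav", sr, chunk)
```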
BUT!!!
If you just want to extract features for every X seconds, then there is no need to split the audio manually. Most audio feature extraction libraries do that for you.
For example, in librosa you can control the number of FFT points, which roughly corresponds to the length of the audio that you want to extract features from. You can check, for example, here: https://librosa.org/doc/latest/feature.html
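A minimal sketch with librosa (the file name is a placeholder): n_fft and hop_length control how much audio each feature frame covers, so you get one feature vector per frame without splitting the file yourself.

```python
import librosa

y, sr = librosa.load("long_audio.wav", sr=None)               # keep the native sampling rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_fft=2048, hop_length=512)
print(mfcc.shape)                                              # (n_mfcc, number_of_frames)
```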
I have a video which has 3 audio streams in the file. The first one is English and the other ones are in different languages. How can I get rid of the other audio streams without losing the quality of the video and the English stream?
I think ffmpeg should be used, but I don't know how to do it.
Video
Bit rate mode: Variable
Overall bit rate: 38.6 Mb/s
Chroma subsampling: 4:2:0
Audio
Format: DTS-HD
Compression mode: Lossless
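A minimal sketch (ffmpeg assumed on PATH; the input and output names are placeholders): -map 0:v:0 keeps the video stream, -map 0:a:0 keeps only the first (English) audio stream, and -c copy stream-copies both, so neither the video nor the DTS-HD track is re-encoded.

```python
import subprocess

subprocess.run(
    ["ffmpeg", "-i", "input.mkv",
     "-map", "0:v:0",          # keep the (first) video stream
     "-map", "0:a:0",          # keep only the first audio stream (English)
     "-c", "copy",             # stream copy: no re-encoding, no quality loss
     "output.mkv"],
    check=True)
```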
I'm having issues with playing back some QuickTime files using ActionScript 3.0 (the NetStream class).
I have no control over how the QuickTime files are produced, but so far it seems that files with uncompressed audio do not play any audio in Flash Player.
I'm trying to compile a list of audio formats supported for video playback (mov/flv/etc.) in Flash Player, but I'm confused by the resources.
I've looked through the FLV Format Specs (pdf link) on devnet, and the media types listed there are:
MP3: A media type of .mp3 (0x2E6D7033) indicates that the track contains MP3 audio data. The dot character, hex 0x2E, is included to make a complete four-character code.
AAC: A media type of mp4a (0x6D703461) indicates that the track is encoded with AAC audio. Flash Player supports the following AAC profiles, denoted by their object types:
- 1 = main profile
- 2 = low complexity, a.k.a. LC
- 5 = high efficiency / spectral band replication, a.k.a. HE/SBR
When the audio codec is AAC, an esds box occurs inside the stsd box of a sample table. This box contains initialization data that an AAC decoder requires to decode the stream. See ISO/IEC 14496-3 for more information about the structure of this box.
On the Wikipedia entry, there is a mention of uncompressed audio:
FLV files also support uncompressed audio or ADPCM format audio.
but there is no reference for that statement.
Is there a page that lists all the supported audio formats for playing back video in Flash Player?
Be careful not to confuse the F4V and FLV container formats.
The official specification you mentioned describes both of these formats.
Your quote specifically refers to the F4V format, which only supports MP3 and AAC in Flash Player.
The list of audio codecs supported by the FLV container is shown on page 70 in the same file:
SoundFormat (UB[4]): Format of SoundData. (See the notes following the table for special encodings.) The following values are defined:
0 = Linear PCM, platform endian
1 = ADPCM
2 = MP3
3 = Linear PCM, little endian
4 = Nellymoser 16 kHz mono
5 = Nellymoser 8 kHz mono
6 = Nellymoser
7 = G.711 A-law logarithmic PCM
8 = G.711 mu-law logarithmic PCM
9 = reserved
10 = AAC
11 = Speex
14 = MP3 8 kHz
15 = Device-specific sound
Formats 7, 8, 14, and 15 are reserved.
AAC is supported in Flash Player 9,0,115,0 and higher.
Speex is supported in Flash Player 10 and higher.
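To make the table concrete, here is a minimal sketch (not from the spec itself) that maps the upper four bits of an FLV AUDIODATA tag's first byte, which carry the SoundFormat field quoted above, to these names:

```python
# Values 7, 8, 9, 14 and 15 are kept here exactly as listed in the spec table.
SOUND_FORMATS = {
    0: "Linear PCM, platform endian", 1: "ADPCM", 2: "MP3",
    3: "Linear PCM, little endian", 4: "Nellymoser 16 kHz mono",
    5: "Nellymoser 8 kHz mono", 6: "Nellymoser",
    7: "G.711 A-law logarithmic PCM", 8: "G.711 mu-law logarithmic PCM",
    9: "reserved", 10: "AAC", 11: "Speex",
    14: "MP3 8 kHz", 15: "Device-specific sound",
}

def sound_format(first_audio_tag_byte: int) -> str:
    # SoundFormat is the UB[4] field in the top four bits of the byte.
    return SOUND_FORMATS.get(first_audio_tag_byte >> 4, "unknown")

# Example: 0xAF (binary 1010 1111) is an AAC audio tag header byte.
print(sound_format(0xAF))   # -> "AAC"
```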