The problem: as input there are two MP3 files.
The first is a 24-hour recording of today's broadcast from a radio station.
The second is a one-minute recording of the same radio station, made at some point during that day.
Put abstractly, the second file is a kind of "subsequence" of the first one.
Is there any way to automatically determine at which part of the big file the little one is located?
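One way to approach this is plain cross-correlation: decode both files to low-rate mono PCM and slide the short clip over the long one. A minimal sketch of that idea, assuming both MP3s have first been decoded to 8 kHz mono WAV with ffmpeg; the file names are placeholders, and for a genuinely 24-hour file you would likely want an even lower rate, or to correlate hour-long chunks one at a time, to keep memory in check:

import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

# Assumes both MP3s were decoded beforehand, e.g.:
#   ffmpeg -i day.mp3    -ac 1 -ar 8000 day.wav
#   ffmpeg -i minute.mp3 -ac 1 -ar 8000 minute.wav
rate, day = wavfile.read("day.wav")       # the 24-hour broadcast, mono
_, clip = wavfile.read("minute.wav")      # the one-minute excerpt, mono

day = day.astype(np.float32)
clip = clip.astype(np.float32)

# Slide the clip across the whole day; the correlation peak marks the
# sample offset where the clip lines up best with the broadcast.
corr = correlate(day, clip, mode="valid", method="fft")
offset = int(np.argmax(corr))

print("clip starts at about %.1f seconds into the broadcast" % (offset / rate))

If the clip was re-encoded or its level changed, normalizing both signals (or correlating spectral features rather than raw samples) tends to make the peak more reliable.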
Related
I have 2 MP3 files: one is 10 minutes long and the other track is 1 second long. I would like to merge these tracks into a new file that plays the 1-second track at random intervals within the longer one.
processA ... split the longer file into several segments at your random intervals; for details see https://unix.stackexchange.com/a/1675/10949
processB ... then, for each segment from the above split, append your shorter file ... repeat until every processA segment has the shorter file appended; for details see https://superuser.com/a/1164761/81282
Then stitch together all of the above processB files, as sketched below.
I have not tried this; however, it might be easier if you first converted both original source MP3 files into WAV files before doing anything ... then, once it is done and working as WAV, convert the final WAV back to MP3.
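I have not tried this either, but roughly scripted the whole flow could look like the sketch below: cut the long file at random points with ffmpeg's -ss/-t, follow each piece with the short file, and stitch everything back with the concat demuxer. The file names, total length and interval bounds are placeholders:

import random
import subprocess

LONG, SHORT, OUT = "long.mp3", "short.mp3", "merged.mp3"   # placeholder names
TOTAL = 600.0                                              # long file length in seconds

# processA: pick random cut points across the long file
cuts, t = [], 0.0
while t < TOTAL:
    t += random.uniform(30, 120)
    cuts.append(min(t, TOTAL))

# cut each segment out and follow it with the short file (processB)
pieces, start = [], 0.0
for i, end in enumerate(cuts):
    seg = "seg%03d.mp3" % i
    subprocess.run(["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
                    "-i", LONG, "-c", "copy", seg], check=True)
    pieces += [seg, SHORT]
    start = end

# stitch everything together with the concat demuxer
with open("list.txt", "w") as f:
    f.writelines("file '%s'\n" % p for p in pieces)
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "list.txt", "-c", "copy", OUT], check=True)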
I have multiple videos of the same resolution, each of a different length, and I also have a fixed length for the output file, say 4 minutes. Let's assume there are 4 input files of 30 seconds each, though in general each input file could have a different length. I want the first 30 secs of the output file to be blank, the next 30 secs to be the 1st input file, the next 10 secs blank, the next 30 secs the 2nd input file, and so on. Basically I have a predetermined start point for each input file, and in the gaps between them there should be a black screen. How can I achieve this? ffmpeg commands are fine, but I'm going to have to automate this in nodejs, so if you can give me any tips on that it'd be great!
There doesn't seem to be a single ffmpeg command to do this, so I had to split the problem into smaller problems.
First I generated a list of the video segments that are going to be part of the final output video. Some of these segments already exist, and some are to be black video.
So I used an ffmpeg command to generate a black video with silent audio of the desired length. Now I have all the segments I need, and it's just a matter of combining them one after another.
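For reference, the black segments can be generated with ffmpeg's lavfi color and anullsrc sources; here is a rough sketch wrapped in Python purely for illustration (the same command can just as well be driven from nodejs). The resolution, frame rate and codecs are assumptions:

import subprocess

def black_segment(path, seconds, size="1280x720", fps=25):
    """Generate `seconds` of black video with silent stereo audio."""
    subprocess.run([
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", "color=c=black:s=%s:r=%d:d=%s" % (size, fps, seconds),
        "-f", "lavfi", "-i", "anullsrc=channel_layout=stereo:sample_rate=44100",
        "-t", str(seconds),
        "-c:v", "libx264", "-c:a", "aac",
        path,
    ], check=True)

black_segment("gap1.mp4", 30)   # the 30-second blank lead-in

The real and black segments can then be listed in a text file and joined with the concat demuxer (ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4), provided they all share the same codecs and parameters.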
I have two short 2-3 minute .wav files that were recorded within 1 minute of each other. They could be anywhere from 0-60 seconds off, and I'd like to sync them together. There is a sync tone that is played and is present in both audio files. There is very little audio in them besides the loud sync tone, so it is very obvious when viewed in Audacity.
I've tried every solution listed here: Automatically sync two audio recordings in Python.
None of them work; they all share the same problem when they get to this method:
def find_freq_pairs(freqs_dict_orig, freqs_dict_sample):
    time_pairs = []
    for key in freqs_dict_sample.keys():  # iterate through freqs in sample
        if freqs_dict_orig.has_key(key):  # if same sample occurs in base
            for i in range(len(freqs_dict_sample[key])):  # determine time offset
                for j in range(len(freqs_dict_orig[key])):
                    time_pairs.append((freqs_dict_sample[key][i], freqs_dict_orig[key][j]))
    return time_pairs
Each time, the inner for loops end up doing (500k ^ 2) iterations for each of the 512 keys in the freqs_dict dictionary, which would take many months to run. And that is with two 3-4 second audio files; with 1-2 minute audio files it was (5M+ * 5M+) iterations. I think perhaps the library broke with Python 3, since everyone on that thread seemed happy with it...
Does anyone know a better way to sync two audio files with python?
Thank you
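Since the sync tone is loud and obvious, one simpler alternative to the fingerprinting approach is to find the first point in each file where the amplitude jumps above a threshold, and shift one file by the difference. A rough sketch of that idea, assuming both recordings are WAVs at the same sample rate; the 0.5 threshold and the file names are placeholders you'd tune after eyeballing the waveforms in Audacity:

import numpy as np
from scipy.io import wavfile

def tone_time(path, threshold=0.5):
    """Time (in seconds) of the first sample louder than `threshold`
    times the file's peak level, i.e. roughly where the sync tone hits."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:                       # fold stereo down to mono
        data = data.mean(axis=1)
    data = np.abs(data.astype(np.float32))
    first = int(np.argmax(data > threshold * data.max()))
    return first / rate

offset = tone_time("track1.wav") - tone_time("track2.wav")
print("shift track2 by %.3f seconds to line the tones up" % offset)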
I'm trying to create a MOV file with two audio tracks and one video track, and I'm trying to do so without AVAssetExportSession or AVComposition, as I want to have the resultant file ready almost immediately after the AVCaptureSession ends. An export after the capture session may only take a few seconds, but not in the case of a 5 minute capture session. This looks like it should be possible, but I feel like I'm just a step away:
There's source #1 - video and audio recorded via AVCaptureSession (handled via AVCaptureVideoDataOutput and AVCaptureAudioDataOutput).
There's source #2 - an audio file read in with an AVAssetReader. Here I use an AVAssetWriterInput and requestMediaDataWhenReadyOnQueue. I set the timeRange on its AVAssetReader, from kCMTimeZero to the duration of the asset, and this shows correctly as 27 seconds when logged out.
I have each of the three inputs working on a queue of its own, and all three are concurrent. Logging shows that they're all handling sample buffers - none appear to be lagging behind or stuck in a queue that isn't processing.
The important point is that the audio file works on its own, using all the same AVAssetWriter code. If I set my AVAssetWriter to output a WAVE file and refrain from adding the writer inputs from #1 (the capture session), I finish my writer session when the audio-from-file samples are depleted. The audio file reports as being of a certain size, and it plays back correctly.
With all three writer inputs added, and the file type set to AVFileTypeQuickTimeMovie, the requestMediaDataWhenReadyOnQueue process for the audio-from-file still appears to read the same data. The resultant MOV file shows three tracks (two audio, one video); the durations of the captured audio and video are not identical in length, but they've obviously worked, and the video plays back with both intact. The third track (the second audio track), however, shows a duration of zero.
Does anyone know if this whole solution is possible, and why the duration of the from-file audio track is zero when it's in a MOV file? If there was a clear way for me to mix the two audio tracks I would, but for one, AVAssetReaderAudioMixOutput takes two AVAssetTracks, and I essentially want to mix an AVAssetTrack with captured audio, and they aren't managed or read in the same way.
I'd also considered that the QuickTime Movie won't accept certain audio formats, but I'm making a point of passing the same output settings dictionary to both audio AVAssetWriterInputs, and the captured audio does play and report its duration (and the from-file audio plays when in a WAV file with those same output settings), so I don't think this is an issue.
Thanks.
I discovered the reason for this:
I correctly use the presentation time stamp of the incoming capture session data (the PTS of the video data, at the moment) to begin a writer session (startSessionAtSourceTime:), and that meant that the audio data read from file had the wrong timestamps - outside the time range dictated to the AVAssetWriter session. So I had to further process the data from the audio file, changing its timing information with CMSampleBufferCreateCopyWithNewTiming:
// Duration of the sample buffer read from the audio file
CMTime bufferDuration = CMSampleBufferGetOutputDuration(nextBuffer);

// Copy the buffer, re-stamping it to the time used to start the writer session
CMSampleBufferRef timeAdjustedBuffer = NULL;
CMSampleTimingInfo timingInfo;
timingInfo.duration = bufferDuration;
timingInfo.presentationTimeStamp = _presentationTimeUsedToStartSession;
timingInfo.decodeTimeStamp = kCMTimeInvalid;
CMSampleBufferCreateCopyWithNewTiming(kCFAllocatorDefault, nextBuffer, 1, &timingInfo, &timeAdjustedBuffer);
I've written an MP4 parser that can read atoms in an MP4 just fine and stitch them back together - the result is a technically valid MP4 file that QuickTime can open and such, but it can't play any audio, as I believe the timing/sampling information is all off. I should probably mention I'm only interested in audio.
What I'm doing is trying to take the moov atoms/etc from an existing MP4, and then take only a subset of the mdat atom in the file to create a new, smaller MP4. In doing so I've altered the duration in the mvhd atom, as well as the duration in the mdia header. There are no tkhd atoms in this file that have edits, so I believe I don't need to alter the durations there - what am I missing?
In creating the new MP4 I'm properly sectioning the mdat block with a wide box, and keeping the 'mdat' header/size in their right places - I make sure to update the size with the new content.
Now it's entirely 110% possible I'm missing something crucial about the format, but if this is possible I'd love to get the final piece. Anybody got any input/ideas?
Code can be found at the following link:
https://gist.github.com/ryanmcgrath/958c602cff133bd7fa0b
I'm going to take a stab in the dark here and say that you're not updating your stbl offsets properly. At least I didn't (at first glance) see your python doing that anywhere.
STSC
Let's start with the location of data. Packets are written into the file in chunks, and the header tells the decoder where each of these chunks lives. The stsc table says how many samples each chunk holds: each entry gives a first-chunk number and a samples-per-chunk count, and that entry applies from that chunk onward until the next entry takes over. It's a little confusing, but take a two-entry example: 100 samples per chunk starting at chunk 1, then 98 per chunk starting at chunk 8. That means chunks 1 through 7 hold 100 samples each, and from the 8th chunk on each chunk holds 98.
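As a sketch of what that table looks like on disk (assuming you already have the raw stsc payload, i.e. the bytes after the 8-byte box header):

import struct

def parse_stsc(payload):
    """`payload` = stsc box contents after the box header:
    version/flags (4 bytes), entry count (4 bytes), then 12-byte entries."""
    entry_count, = struct.unpack(">I", payload[4:8])
    entries = []
    for i in range(entry_count):
        first_chunk, samples_per_chunk, sample_desc_index = struct.unpack(
            ">III", payload[8 + 12 * i: 20 + 12 * i])
        entries.append((first_chunk, samples_per_chunk, sample_desc_index))
    return entries

# The example above would come back as [(1, 100, 1), (8, 98, 1)]:
# chunks 1-7 hold 100 samples each, chunk 8 onward holds 98.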
STCO
That said, you also have to track where these chunks actually sit in the file. That's the job of the stco table: where in the file is chunk 1, where is chunk 2, and so on.
If you modify any data in mdat you have to maintain these tables. You can't just chop mdat data out, and expect the decoder to know what to do.
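For example, if the trim moves where mdat starts, every stco entry needs to shift by the same number of bytes. A minimal sketch for the 32-bit stco case (a co64 box stores 64-bit offsets and would need the same treatment with a different format string):

import struct

def shift_stco(payload, delta):
    """Shift every 32-bit chunk offset in an stco payload by `delta` bytes."""
    entry_count, = struct.unpack(">I", payload[4:8])
    offsets = struct.unpack(">%dI" % entry_count, payload[8:8 + 4 * entry_count])
    shifted = [o + delta for o in offsets]
    return payload[:8] + struct.pack(">%dI" % entry_count, *shifted)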
As if this wasn't enough, you now also have to maintain the sample time table (stts), the sample size table (stsz) and, if this were video, the sync sample table (stss).
STTS
stts says how long each sample should play for, in units of the timescale. If you're doing audio, the timescale is probably 44100 or 48000 (Hz).
If you've lopped off some data, everything after that point could be out of sync. If all the entries here have exactly the same duration, though, you'd be OK.
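A sketch of what maintaining stts could look like if you keep only the first `keep` samples of the track (payload again meaning the box contents after the header):

import struct

def trim_stts(payload, keep):
    """Rebuild an stts payload so it only covers the first `keep` samples."""
    entry_count, = struct.unpack(">I", payload[4:8])
    new_entries, remaining = [], keep
    for i in range(entry_count):
        if remaining <= 0:
            break
        count, delta = struct.unpack(">II", payload[8 + 8 * i: 16 + 8 * i])
        new_entries.append((min(count, remaining), delta))
        remaining -= count
    body = b"".join(struct.pack(">II", c, d) for c, d in new_entries)
    return payload[:4] + struct.pack(">I", len(new_entries)) + body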
STSZ
stsz says what size each sample is in bytes. This is important for the decoder to be able to start at a chunk, and then go through each sample by its size.
Again, if all the sample sizes are exactly the same you'd be OK. Audio tends to be pretty uniform, but video varies a lot (with keyframes and whatnot).
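Same idea for stsz; this sketch handles both the constant-size shortcut and the per-sample table:

import struct

def trim_stsz(payload, keep):
    """Rebuild an stsz payload to describe only the first `keep` samples."""
    sample_size, sample_count = struct.unpack(">II", payload[4:12])
    if sample_size != 0:
        # Constant-size samples: no per-sample table follows, just fix the count.
        return payload[:4] + struct.pack(">II", sample_size, keep)
    sizes = payload[12:12 + 4 * keep]        # keep the first `keep` 32-bit sizes
    return payload[:4] + struct.pack(">II", 0, keep) + sizes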
STSS
And last but not least we have the stss table, which says which frames are keyframes. I only have experience with AAC, but there every audio frame is treated as a keyframe; in practice audio tracks usually omit the stss box entirely, and when it is absent every sample counts as a sync sample.
In relation to your original question, the displayed time isn't always computed the same way by every player. The most accurate way is to sum up the durations of all the frames in the header and use that as the total time; other players use the metadata in the track headers. I've found it best to just keep all the values consistent, and then players are happy.
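That "sum up the durations of all the frames" rule is just the stts totals divided by the track's timescale (from the mdhd box); roughly:

import struct

def track_duration_seconds(stts_payload, timescale):
    """Sum of (sample_count * sample_delta) over all stts entries,
    divided by the track's timescale from the mdhd box."""
    entry_count, = struct.unpack(">I", stts_payload[4:8])
    total = 0
    for i in range(entry_count):
        count, delta = struct.unpack(">II", stts_payload[8 + 8 * i: 16 + 8 * i])
        total += count * delta
    return total / timescale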
If you're doing all that and I missed it in the script then can you post a sample mp4 and a standalone app and I can try to help you out.