I want to get the duration of the recorded audio from microphone. Currently, I'm using the method GetSampleDuration of the Microphone class.
totalRecordTime = microphone.GetSampleDuration(stream.ToArray().Length);
This works great but I think there are two problems:
The stream object contains the WAV header stuff. I don't how it affects the duration of the file? Or I can substract the WAV header length from the stream length before.
The method GetSampleDuration(int sizeInBytes). Is it possible that the sizeInBytes parameter exceed the maximum supported Integer length?
Related
How many bytes can be stored per minute of audio using any method of steganography with a disregard to detectability or any other factor e.g if the original audio begins to sound different
I have some long audio files.I want to split this audio file into multiple short length audio file using python.Ex:The audio long length is more than 1 hour and want to split into multiple short length 5s files. i want to extract features for the whole audio file in each 5s.
There are two issues in your question.
Splitting the audio
Extracting features.
and both of them have the same, underlying, key information: sampling frequency.
The duration of an audio signal, in seconds, and the sampling frequency used for the audio file, define the amount of samples that an audio file has. An audio sample is (in simplified terms) one value of the audio signal in your hard-disk or computer memory.
The amount of audio samples, for a typical wav file, are calculated based on the formula sr * dur, here sr is the sampling frequency in Hz (e.g. 44100 for a CD quality signal) and dur is the duration of the audio file in seconds. For example, a CD audio file of 2 seconds has 44100 * 2 = 88200 samples.
So:
To split an audio file in Python, you first have to read it in a variable. There are plenty libraries and functions out there, for example (in a random order):
scipy.io.wavfile.read
wave module
and others. You can check this SO post for more info on reading a wav file.
Then, you just have to get N samples, e.g. my_audio_1 = whole_audio_file[0:5*sr].
BUT!!!
If you just want to extract features for every X seconds, then it is no need to split the audio manually. Most audio feature extraction libraries, do that for you.
For example, in librosa you can control the amount of the FFT points, which roughly are equivalent to the length of the audio that you want to extract features from. You can check, for example, here: https://librosa.org/doc/latest/feature.html
I'm trying to create a MOV file with two audio tracks and one video track, and I'm trying to do so without AVAssetExportSession or AVComposition, as I want to have the resultant file ready almost immediately after the AVCaptureSession ends. An export after the capture session may only take a few seconds, but not in the case of a 5 minute capture session. This looks like it should be possible, but I feel like I'm just a step away:
There's source #1 - video and audio recorded via AVCaptureSession (handled via AVCaptureVideoDataOutput and AVCaptureAudioDataOutput).
There's source #2 - an audio file read in with an AVAssetReader. Here I use an AVAssetWriterInput and requestMediaDataWhenReadyOnQueue. I call setTimeRange on its AVAssetReader, from CMTimeZero to the duration of the asset, and this shows correctly as 27 seconds when logged out.
I have each of the three inputs working on a queue of its own, and all three are concurrent. Logging shows that they're all handling sample buffers - none appear to be lagging behind or stuck in a queue that isn't processing.
The important point is that the audio file works on its own, using all the same AVAssetWriter code. If I set my AVAssetWriter to output a WAVE file and refrain from adding the writer inputs from #1 (the capture session), I finish my writer session when the audio-from-file samples are depleted. The audio file reports as being of a certain size, and it plays back correctly.
With all three writer inputs added, and the file type set to AVFileTypeQuickTimeMovie, the requestMediaDataOnQueue process for the audio-from-file still appears to read the same data. The resultant mov file shows three tracks, two audio, one video, and the duration of the captured audio and video are not identical in length but they've obviously worked, and the video plays back with both intact. The third track (the second audio track), however, shows a duration of zero.
Does anyone know if this whole solution is possible, and why the duration of the from-file audio track is zero when it's in a MOV file? If there was a clear way for me to mix the two audio tracks I would, but for one, AVAssetReaderAudioMixOutput takes two AVAssetTracks, and I essentially want to mix an AVAssetTrack with captured audio, and they aren't managed or read in the same way.
I'd also considered that the QuickTime Movie won't accept certain audio formats, but I'm making a point of passing the same output settings dictionary to both audio AVAssetWriterInputs, and the captured audio does play and report its duration (and the from-file audio plays when in a WAV file with those same output settings), so I don't think this is an issue.
Thanks.
I discovered that the reason for this is:
I correctly use the Presentation Time Stamp of the incoming capture session data (I use the PTS of the video data at the moment) to begin a writer session (startSessionAtSourceTime), and that meant that the timestamp of the audio data read from file had the wrong timestamp - outwith the time range that was dictated to the AVAssetWriter session. So I had to further process the data from the audio file, changing its timing information by using CMSampleBufferCreateCopyWithNewTiming.
CMTime bufferDuration = CMSampleBufferGetOutputDuration(nextBuffer);
CMSampleBufferRef timeAdjustedBuffer;
CMSampleTimingInfo timingInfo;
timingInfo.duration = bufferDuration;
timingInfo.presentationTimeStamp = _presentationTimeUsedToStartSession;
timingInfo.decodeTimeStamp = kCMTimeInvalid;
CMSampleBufferCreateCopyWithNewTiming(kCFAllocatorDefault, nextBuffer, 1, &timingInfo, &timeAdjustedBuffer);
I am writing a code to extract AAC audio data from mpeg ts stream. I want to get stream properties like sampling frequency, number of channels, Audio type, Audio profile type etc. from Transport stream, without decoding the actual data. How much of the information will be available from stream?
Also I want to know is there any way to find the total duration of the stream without actually finding the last PTS value in the file
Thanks
AAC frames packed in TS use ADTS headers. Its 7 (or 9) bytes, and very easy to parse. ADTS header format is documented well online.
During capturing from some audio and video sources and encoding at AVI container for synchronizing audio & video I set audio as a master stream and this gave best result for synchronizing.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd312034(v=vs.85).aspx
But this method gives a higher FPS value as a result. About 40 or 50 instead of 30 FPS.
If this media file just playback - all OK, but if try to recode at different software to another video format appears out of sync.
How can I programmatically set dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing?
How can I programmatically set dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing?
MSDN:
This method works by adjusting the dwScale and dwRate values in the AVISTREAMHEADER structure.
You requested that multiplexer manages the scale/rate values, so you cannot adjust them. You should be seeing more odd things in your file, not just higher FPS. The file itself is perhaps out of sync and as soon as you process it with other applciations that don't do playback fine tuning, you start seeing issues. You might be having video media type showing one frame rate and effectively the rate is different.