Due to the richness and complexity of my app's audio content, I am using AVAudioEngine to manage all audio across the app. I am converting every audio source to be represented as a node in my AVAudioEngine graph.
For example, instead of using AVAudioPlayer objects to play mp3 files in my app, I create AVAudioPlayerNode objects and schedule buffers of those audio files on them.
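Roughly like this (a simplified sketch of my setup; the file name is just a placeholder):

    import AVFoundation

    let engine = AVAudioEngine()
    let playerNode = AVAudioPlayerNode()

    do {
        // Read the mp3 into a PCM buffer ("track.mp3" is a placeholder)
        let url = Bundle.main.url(forResource: "track", withExtension: "mp3")!
        let file = try AVAudioFile(forReading: url)
        let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                      frameCapacity: AVAudioFrameCount(file.length))!
        try file.read(into: buffer)

        // The mp3 becomes just another node in the engine's graph
        engine.attach(playerNode)
        engine.connect(playerNode, to: engine.mainMixerNode, format: buffer.format)

        try engine.start()
        playerNode.scheduleBuffer(buffer, at: nil, options: [], completionHandler: nil)
        playerNode.play()
    } catch {
        print("Audio setup failed: \(error)")
    }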
However, I do have a video player in my app that plays video files with audio, using AVPlayer (I know of nothing else in iOS that can play video files). Unfortunately, there seems to be no way to obtain its audio output as a node in my AVAudioEngine graph.
Any pointers?
If you have a local video file, you can extract its audio track, mute the AVPlayer by setting its volume to 0 (since you didn't remove the audio data from the video itself), and play the extracted audio through an AVAudioPlayerNode.
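A minimal sketch of that local-file case, assuming the engine and player node from the question (the file paths are placeholders, and exporting with AVAssetExportSession is just one way to pull the audio out):

    import AVFoundation

    let engine = AVAudioEngine()
    let playerNode = AVAudioPlayerNode()
    engine.attach(playerNode)
    engine.connect(playerNode, to: engine.mainMixerNode, format: nil)
    try? engine.start()

    // Mute the video player; its audio will come from the engine instead
    let videoURL = URL(fileURLWithPath: "/path/to/video.mp4")          // placeholder
    let player = AVPlayer(url: videoURL)
    player.volume = 0

    // Export the asset's audio track to a temporary m4a, then schedule it on the node
    let asset = AVURLAsset(url: videoURL)
    let export = AVAssetExportSession(asset: asset, presetName: AVAssetExportPresetAppleM4A)!
    let audioURL = FileManager.default.temporaryDirectory.appendingPathComponent("extracted.m4a")
    export.outputURL = audioURL
    export.outputFileType = .m4a

    export.exportAsynchronously {
        guard export.status == .completed,
              let file = try? AVAudioFile(forReading: audioURL) else { return }
        playerNode.scheduleFile(file, at: nil, completionHandler: nil)
        playerNode.play()
        player.play()   // the video plays silently; keeping the two in sync is up to you
    }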
If you receive the video data over the network, you will have to parse the incoming packets yourself and split out the audio.
But be warned: AV sync is a very tough thing.
Related
I am unable to figure out how Speechify turns its text chunks into audio and then plays them on my phone as if they were one large mp3 file. I am able to play each audio chunk separately and have it play while my iOS app is in the background. But somehow Speechify stitches these audio bits together and offers lock screen controls with an estimated time duration. Any ideas on how they are doing this? Are they streaming from the device to a local URL?
Just for some background, Speechify takes text and turns it into mp3 audio. It does this by sending individual sentences as the reader progresses and getting back the base64 encoded mp3 audio chunk. It preloads about 2-3 sentences ahead.
I am using React Native for my frontend and Node + Express for the backend. I am using Amazon Polly to generate the individual audio chunks for sentences. I am trying to stitch these audio chunks together so that the lock screen shows one long file playing, rather than skipping to the next audio track/chunk each time.
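For what it's worth, on iOS the lock-screen title, duration and transport controls don't come from the audio files themselves; they are whatever the app publishes through MPNowPlayingInfoCenter and MPRemoteCommandCenter (whether hand-rolled in a native module or via a library). A rough Swift sketch of what that might look like, with all values illustrative:

    import MediaPlayer

    // Advertise one continuous "track" to the lock screen, even though playback
    // is actually a series of short synthesized chunks.
    func updateNowPlaying(title: String, elapsed: TimeInterval, estimatedTotal: TimeInterval) {
        var info: [String: Any] = [:]
        info[MPMediaItemPropertyTitle] = title
        info[MPMediaItemPropertyPlaybackDuration] = estimatedTotal   // estimated total length
        info[MPNowPlayingInfoPropertyElapsedPlaybackTime] = elapsed
        info[MPNowPlayingInfoPropertyPlaybackRate] = 1.0
        MPNowPlayingInfoCenter.default().nowPlayingInfo = info
    }

    // Lock-screen play/pause controls
    func configureRemoteCommands(play: @escaping () -> Void, pause: @escaping () -> Void) {
        let center = MPRemoteCommandCenter.shared()
        center.playCommand.addTarget { _ in play(); return .success }
        center.pauseCommand.addTarget { _ in pause(); return .success }
    }

Presumably the chunk-to-chunk continuity is then handled by queueing the decoded chunks on a single player (for example an AVQueuePlayer, or one AVAudioPlayerNode fed buffer after buffer), so the system never sees a "track change".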
I want to do live audio translation via a microphone: take a live video/audio stream from Facebook, plug the mic into a laptop, and do live translation by mixing the existing audio stream with the one coming from the mic (the translation). This part is OK; I got it working using the "amix" audio filter to mix the two audio streams into one. Now I want to refine it. Is it possible (probably it is) to automatically fade the original audio stream down by about 20% when voice is detected on the mic, so the translation (mic audio) is heard more loudly, and then fade the original stream back up to its normal volume once the mic has been silent for, say, 3-5 seconds? Is this too much, or can I do it with sox or something similar?
youtube-dl can be used to see what formats are used to store YouTube content:
youtube-dl -F https://youtu.be/??????
The above command suggests that the audio and video are mostly stored separately. Is that right? Does YouTube streaming combine audio and video in real time?
Formats for a sample YouTube content
Most large streaming services will use ABR streaming (see: https://stackoverflow.com/a/42365034/334402).
The two most common ABR streaming formats are HLS and MPEG-DASH and both provide a manifest or index file which the player downloads first and which will contain links to the media streams, typically audio, video, subtitle tracks etc.
For encrypted content the audio and video, and even different bit rate video tracks, may all have separate encryption keys.
The player will download the audio and video tracks and synchronise them for playback.
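For example, an HLS master playlist references the audio, video and subtitle renditions separately, and it is the player that stitches them together during playback (this listing is purely illustrative; the URIs, codecs and bitrates are made up):

    #EXTM3U
    #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",DEFAULT=YES,URI="audio/en/prog.m3u8"
    #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",URI="subs/en/prog.m3u8"
    #EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2",AUDIO="aud",SUBTITLES="subs"
    video/720p/prog.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=5600000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2",AUDIO="aud",SUBTITLES="subs"
    video/1080p/prog.m3u8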
In general, streaming video and audio are sent in separate channels; the same goes for multi-track audio such as 5.1. During transport these channels are wrapped in a media container like MP4.
The motive is partly the distinct compression algorithms: some algorithms are best suited to audio and others to video, and baked into the video algorithms is the spreading and sharing of data over time across frames (see B-frames for details). These channels are not limited to video and audio; if you own both the sending and receiving sides, you can send arbitrary data in many distinct channels by making up your own data protocol. As an aside, modern codecs like H.265 allow data to be sent from the receiver back to the sender while you think you are simply viewing a movie (read the RFC).
YouTube stores each of its various flavours of video and audio in separate files on its end, then combines them based on the desired streaming-quality choice on a per-download basis.
I am trying to capture video and audio using AVCaptureSession. I am done with the video capture: I convert the frames into pixel buffers and play the captured output on the server side using ffmpeg and an RTMP server. But how can I convert the captured audio into data and play it on the server side where that data is received? I also want to know what format the captured audio is in.
Thanks all,
MONISH
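A rough sketch of the audio side of such a capture pipeline, assuming an AVCaptureAudioDataOutput added to the existing session (session setup is omitted here). The delegate receives uncompressed Linear PCM sample buffers whose exact layout you can read from the format description; you would still need to encode the PCM (e.g. to AAC) before muxing it into an RTMP stream:

    import AVFoundation

    final class AudioCaptureDelegate: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {

        func captureOutput(_ output: AVCaptureOutput,
                           didOutput sampleBuffer: CMSampleBuffer,
                           from connection: AVCaptureConnection) {
            // Inspect the captured audio format (usually kAudioFormatLinearPCM)
            if let desc = CMSampleBufferGetFormatDescription(sampleBuffer),
               let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(desc)?.pointee {
                print("sampleRate:", asbd.mSampleRate,
                      "channels:", asbd.mChannelsPerFrame,
                      "formatID:", asbd.mFormatID)
            }

            // Copy the raw audio bytes out of the sample buffer so they can be sent on
            if let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
                let length = CMBlockBufferGetDataLength(blockBuffer)
                var data = Data(count: length)
                data.withUnsafeMutableBytes { raw in
                    _ = CMBlockBufferCopyDataBytes(blockBuffer, atOffset: 0,
                                                   dataLength: length,
                                                   destination: raw.baseAddress!)
                }
                // `data` now holds the PCM samples; encode/packetize before streaming
            }
        }
    }

    // Wiring it into an existing AVCaptureSession (named `session` here)
    let audioDelegate = AudioCaptureDelegate()
    let audioOutput = AVCaptureAudioDataOutput()
    audioOutput.setSampleBufferDelegate(audioDelegate, queue: DispatchQueue(label: "audio.capture"))
    // session.addOutput(audioOutput)   // add alongside the existing video output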
I've tried to combine Audio playback with the URLStream and FileStream classes. My idea was to stream the file to disk to save memory and use the sampleData event of the Audio class to play the audio. Is it possible to somehow access the streamed file while it is still streaming, in order to feed the Audio class?
This is interesting because there are large podcasts out there that take a lot of memory. The current solution is to destroy the Audio instance when the user changes track; it works fine, but I want to make it even better.