Combine Audio and Images in Stream - python-3.x

I would like to be able to create images and audio on the fly and combine them into an rtmp stream (for Twitch or YouTube). The goal is to accomplish this in Python 3, as that is the language my bot is written in. Bonus points for not having to save to disk.
So far, I have figured out how to stream to rtmp servers using ffmpeg by loading a PNG image and playing it on a loop, as well as loading an mp3, and combining them together in the stream. The problem is that I have to load at least one of them from a file.
I know I can use Moviepy to create videos, but I cannot figure out whether or not I can stream the video from Moviepy to ffmpeg or directly to rtmp. I think that I have to generate a lot of really short clips and send them, but I want to know if there's an existing solution.
There's also OpenCV which I hear can stream to rtmp, but cannot handle audio.
A redacted version of an ffmpeg command I have successfully tested with is
ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i "Song-Stereo.mp3" -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY
or
cat Song-Stereo.mp3 | ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i - -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY
I know these commands are not set up properly for smooth streaming; the result manages to break both Twitch's and YouTube's players, and I will have to figure out how to fix that.
The problem with this is I don't think I can stream both the image and the audio at once when creating them on the spot. I have to load one of them from the hard drive. This becomes a problem when trying to react to a command or user chat or anything else that requires live reactions. I also do not want to destroy my hard drive by constantly saving to it.
As for the Python code, what I have tried so far in order to create a video is the following. It still saves to the hard drive and is not responsive in real time, so it is not very useful to me. The video itself is okay, with one exception: as time passes, the time shown in the QR code and the video's own clock drift farther and farther apart toward the end of the video. I can work around that limitation if it shows up while live streaming.
import numpy  # needed for the frame arrays; missing from the import list further down

def make_frame(t):
    # Render a QR code whose payload encodes the frame's timestamp t.
    img = qrcode.make("Hello! The second is %s!" % t)
    return numpy.array(img.convert("RGB"))

clip = mpy.VideoClip(make_frame, duration=120)
clip.write_gif("test.gif", fps=15)
gifclip = mpy.VideoFileClip("test.gif")
gifclip.set_duration(120).write_videofile("test.mp4", fps=15)
My goal is to be able to produce something along the lines of the following pseudo-code:
original_video = qrcode_generator("I don't know, a clock, pyotp, today's news sources, just anything that can be generated on the fly!")
original_video.overlay_text(0,0,"This is some sample text, the left two are coordinates, the right three are font, size, and color", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds
# NOTICE - I did not add any time measurements to the actual video itself. The whole point is that this is a live stream and not a video clip, so the time frame would be now. The 2 seconds listed above is for our pseudo sine wave generator to know how long the audio clip should be, not for the actual streaming library.
stream.send_to_rtmp_server(original_video) # Doesn't matter if ffmpeg or some native library
The above example is what I am looking for in terms of video creation in Python and then streaming. I am not trying to create a clip and then stream it later; I am trying to have the program respond to outside events and then update its stream to do whatever it wants. It is sort of like a chat bot, but with video instead of text.
def track_movement(...):
...
return ...
original_video = user_submitted_clip(chat.lastVideoMessage)
original_video.overlay_text(0,0,"The robot watches the user's movements and puts a blue square around it.", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds
# It would be awesome if I could also figure out how to perform advanced actions such as tracking movements or pulling a face out of a clip and then applying effects to it on the fly. I know OpenCV can track movements and I hear that it can work with streams, but I cannot figure out how that works. Any help would be appreciated! Thanks!
Because I forgot to add the imports, here are some useful imports I have in my file!
import pyotp
import qrcode
from io import BytesIO
from moviepy import editor as mpy
The library pyotp is for generating one-time password authenticator codes, qrcode is for the QR codes, BytesIO is used for virtual files, and moviepy is what I used to generate the GIF and MP4. I believe BytesIO might be useful for piping data to the streaming service, but how that happens depends entirely on how data is sent to the service, whether that is ffmpeg on the command line (from subprocess import Popen, PIPE) or a native library.
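To make the piping idea concrete, the rough, untested shape of what I imagine with subprocess is sketched below; the resolution and frame rate are arbitrary, the song file and rtmp URL are the placeholders from the commands above, and a real version would also have to pace the frames to the target frame rate.

# Rough, untested sketch: pipe generated frames to ffmpeg as raw video while the
# audio still comes from a second input on disk.
from subprocess import Popen, PIPE
import numpy
import qrcode

WIDTH, HEIGHT, FPS = 330, 330, 15

proc = Popen([
    "ffmpeg",
    "-f", "rawvideo", "-pixel_format", "rgb24",
    "-video_size", "%dx%d" % (WIDTH, HEIGHT), "-framerate", str(FPS),
    "-i", "-",                                  # video frames arrive on stdin
    "-i", "Song-Stereo.mp3",                    # audio still loaded from disk for now
    "-c:v", "libx264", "-preset", "fast", "-pix_fmt", "yuv420p",
    "-f", "flv", "rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY",
], stdin=PIPE)

frame = 0
while True:
    img = qrcode.make("Frame %d" % frame).convert("RGB").resize((WIDTH, HEIGHT))
    proc.stdin.write(numpy.array(img).tobytes())   # one raw RGB24 frame
    frame += 1                                     # TODO: sleep here to hold FPS in a real live stream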

Are you using ffmpeg.exe and running a command through CMD? If so, you can use either the concat demuxer or a pipe. When you use the concat demuxer, ffmpeg can take image input from a text file. The text file should contain the image paths, and ffmpeg can find those images in different folders. The following command shows how you can use the concat demuxer; the image locations are saved to an input.txt file:
ffmpeg -f concat -i input.txt -vsync vfr -pix_fmt yuv420p output.mp4
But the most suitable solution would be to use a data pipe to feed images to ffmpeg:
cat *.png | ffmpeg -f image2pipe -i - output.mkv
You can check this link for more information about ffmpeg data pipes.
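If the images are generated from Python rather than sitting on disk, the same pipe can be fed from code. A rough, untested sketch (the frame count, frame rate, and output name are arbitrary):

# Sketch only: write PNG-encoded frames straight to ffmpeg's stdin via image2pipe,
# so nothing has to be saved to disk first.
from subprocess import Popen, PIPE
from io import BytesIO
import qrcode

proc = Popen(
    ["ffmpeg", "-y", "-f", "image2pipe", "-framerate", "15", "-c:v", "png", "-i", "-", "output.mkv"],
    stdin=PIPE,
)
for i in range(150):                                   # 10 seconds of frames at 15 fps
    buf = BytesIO()
    qrcode.make("Frame %d" % i).convert("RGB").save(buf, format="PNG")   # encode in memory
    proc.stdin.write(buf.getvalue())
proc.stdin.close()
proc.wait()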
Generating multiple videos and streaming them in real time is not a very stable solution; you can run into several problems.

I have settled on using Gstreamer to create my streams on the fly. It allows me to take separate video and audio streams and combine them. I do not have a working example just yet, but I hope to either get an answer or figure it out on my own soon, over at my follow-up question, "Gstreamer in Python exits instantly, but is fine on command line".
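In the meantime, the rough shape I am aiming for is sketched below (untested; the test sources would eventually be replaced by appsrc elements fed from Python, and the AAC encoder element name depends on which plugins are installed):

# Untested sketch of a combined audio+video rtmp pipeline in GStreamer.
# videotestsrc/audiotestsrc are stand-ins for my generated content, and the AAC
# encoder element (voaacenc here) may need to be swapped for whatever is
# installed (e.g. avenc_aac). The stream URL is the usual placeholder.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.parse_launch(
    "videotestsrc is-live=true ! videoconvert ! x264enc tune=zerolatency "
    "! flvmux name=mux streamable=true "
    "! rtmpsink location=rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY "
    "audiotestsrc is-live=true ! audioconvert ! voaacenc ! mux."
)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()   # without a running main loop the script just exits instantly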

Related

Is there a way to ensure mp3 duration accuracy with variable bit rate using FFMPEG?

In our application, we are processing audio files using ffmpeg. Specifically, we use the NodeJS library fluent-ffmpeg (npm link).
Our audio files are generated by various text-to-speech providers. We recently noticed that when we converted audio that uses SSML to add pauses, the duration on the file is no longer correct. Upon further investigation, we noticed that the standard audio files were also incorrect, just more accurate overall due to their more consistent data. When we put a pause at the beginning of the audio, the estimate was worst, overshooting by a very large margin (e.g., a 25 s audio clip would read as 3 minutes long, but skip to the end when playing past the 25 s mark).
I did some searching and research into the structure of MP3 files, and to me it seems like the issue is that the duration gets estimated by various audio players. Windows Media Player is an example, but Firefox's web player seems to do this as well. I tried changing the ffmpeg command from .audioQuality(0), which sets ffmpeg to use VBR, to .audioBitrate(320), which tells ffmpeg to use a constant bitrate.
For reference, we are using libmp3lame, and the full commands that get run are the following, for the VBR and CBR cases respectively:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage URL where the source file is located.
This fixes our duration problem, and it makes sense to me why it would if the problem was that the duration is being estimated by some of these players / audio consumers. But this came at the cost of a significantly larger file size, which also makes sense to me. While testing, we found that compared to the same file in WAV, the VBR mp3 was about 10% of the WAV file size, while the CBR mp3 was still 50% of the WAV file size. This practically defeats the purpose of supporting the mp3 format for our use case, which is to be a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the mp3 file, providing information that lets the consumer of the audio know the duration before it has processed the whole file. But I also found that there doesn't seem to be a standard tag for duration; ID3 mostly covers things like song title, album, artist, etc.
My question is, is there a way to get the proper duration onto an mp3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header by default with duration info. However, that value is only known after the entire stream data has been received, so ffmpeg has to seek to the head to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
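A minimal sketch of that workaround, shown with Python's subprocess rather than fluent-ffmpeg just to illustrate the shape (the input URL is a placeholder, and the temp-file handling is only one way to get a seekable destination):

# Sketch: encode the VBR mp3 into a seekable temp file so ffmpeg can seek back and
# write the real duration into the Xing header, then upload/serve the finished file.
import os
import subprocess
import tempfile

fd, tmp_path = tempfile.mkstemp(suffix=".mp3")
os.close(fd)
subprocess.run(
    ["ffmpeg", "-y", "-i", "INPUT_URL", "-acodec", "libmp3lame", "-aq", "0", tmp_path],
    check=True,
)
with open(tmp_path, "rb") as f:
    mp3_bytes = f.read()   # duration is now correct; hand these bytes to the client
os.remove(tmp_path)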

mkv file out of sync with linear drift

I have a bunch of mkv files, with FLAC as the audio codec and FFV1 as the video one.
The files were created using an EasyCap acquisition dongle from a VCR analog source. Specifically, I used VLC's "open acquisition device" prompt and selected PAL. Then, I converted the files (audio PCM, video raw YUV) to (FLAC, FFV1) using
ffmpeg.exe -i input.avi -acodec flac -vcodec ffv1 -level 3 -threads 4 -coder 1 -context 1 -g 1 -slices 24 -slicecrc 1 output.mkv
Now, the files are progressively out of sync. It may be because, while the video (maybe) has a constant frame rate, the FLAC audio track effectively has a variable one. So, is there a way to sync the video track to the audio, or something along those lines? Can FFmpeg do this? Thanks
EDIT
Following Mulvya's hint, I plotted the sync difference at various times; the first column shows the seconds elapsed, the second shows the difference in seconds. The plot seems to behave linearly, with a constant slope of 0.0078. NOTE: measurements taken by hand, with a stopwatch.
EDIT 2
Playing around with VirtualDub, I found that changing the frame rate to 25 fps from the original 24.889 (Video -> Frame rate... -> Change frame rate to) and using the track converted to wav definitely does work. Two problems, though: VirtualDub crashes when importing the original FFV1-FLAC mkv file, so I had to convert the video to H264 to try it out; moreover, I find it difficult to use an external encoder to save VirtualDub output.
So, could I avoid using VirtualDub, and simply use ffmpeg for it? Here's the exported vdscript:
VirtualDub.audio.SetSource("E:\\4_track2.wav", "");
VirtualDub.audio.SetMode(0);
VirtualDub.audio.SetInterleave(1,500,1,0,0);
VirtualDub.audio.SetClipMode(1,1);
VirtualDub.audio.SetEditMode(1);
VirtualDub.audio.SetConversion(0,0,0,0,0);
VirtualDub.audio.SetVolume();
VirtualDub.audio.SetCompression();
VirtualDub.audio.EnableFilterGraph(0);
VirtualDub.video.SetInputFormat(0);
VirtualDub.video.SetOutputFormat(7);
VirtualDub.video.SetMode(3);
VirtualDub.video.SetSmartRendering(0);
VirtualDub.video.SetPreserveEmptyFrames(0);
VirtualDub.video.SetFrameRate2(25,1,1);
VirtualDub.video.SetIVTC(0, 0, 0, 0);
VirtualDub.video.SetCompression();
VirtualDub.video.filters.Clear();
VirtualDub.audio.filters.Clear();
The first line imports the wav-converted audio track.
Can I set up an equivalent pipeline in ffmpeg (possibly using FLAC rather than wav)? SetFrameRate2 is maybe the key here.
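The closest ffmpeg equivalent I can think of (untested, and assuming that re-timing the video from 24.889 to 25 fps while leaving the audio untouched really is all that is needed) would be something like the sketch below; the output name is arbitrary.

# Untested sketch: play the same frames at 25 fps instead of 24.889 fps, which
# shortens the video by about 0.44% (mirroring the VirtualDub fix), and copy the
# FLAC track unchanged. The filter forces a video re-encode, so FFV1 is re-applied.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "output.mkv",
    "-vf", "setpts=PTS*24.889/25",      # retime the frames to the faster rate
    "-r", "25",
    "-c:v", "ffv1", "-level", "3",
    "-c:a", "copy",
    "resynced.mkv",
], check=True)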

How can low frame rate video be made to look more smooth?

I am trying to clean up a video that was recorded in 2003 in low-light conditions on what was possibly a cameraphone. The video has been cleaned up somewhat (cropped, logos removed and stabilized), but it remains quite jerky, due in large part to its low frame rate. What are some tricks that might clean up the video in this regard? I feel that I am asking for something a bit like tweening in flash animations, but for pixels, whereby additional frames are generated using nearby frames of the video. Does such a trick exist? Is there another way to approach this problem?
To reproduce the video processing so far, take the following steps:
# get video
wget http://www.anwarweb.net/saddamdown.wmv
# crop
ffmpeg -i saddamdown.wmv -filter:v "crop=292:221:14:10" -c:a copy saddamdown_crop.wmv
# remove logo 1
ffmpeg -i saddamdown_crop.wmv -vf delogo=x=17:y=77:w=8:h=54 -c:a copy saddamdown_crop_delogo_1.wmv
# remove logo 2
ffmpeg -i saddamdown_crop_delogo_1.wmv -vf delogo=x=190:y=174:w=54:h=8 -c:a copy saddamdown_crop_delogo_1_delogo_2.wmv
# stabilize
ffmpeg -i saddamdown_crop_delogo_1_delogo_2.wmv -vf deshake saddamdown_crop_delogo_1_delogo_2_deshake.wmv
Note: The video is of the Saddam Hussein execution.
You could try with slowmoVideo: https://github.com/slowmoVideo/slowmoVideo
It's open-source software for creating smooth slow-motion effects based on pixel motion analysis (Windows, Linux, and OS X with Wine or CrossOver; it reads and writes via ffmpeg).
First calculate the slow-down ratio: for example, if the original video is 18 fps and the desired output is 24 fps, set the speed in slowmoVideo to 75% (18/24 = 0.75).
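As a trivial sketch, the speed value is just the ratio of the two frame rates:

# slowmoVideo speed percentage when going from a source frame rate to a target one.
def slowmo_speed_percent(source_fps, target_fps):
    return 100.0 * source_fps / target_fps

print(slowmo_speed_percent(18, 24))   # 75.0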
The result depends a lot on the video content; obviously, the more static the shots, the better.
Anyway, you can tweak what they call "Optical Flow", which is the analysis part of the process.
Good luck ;)

Merging two video streams and saving as one file

I'm writing a chat application with video calls using WebRTC. I have two MediaStreams, remote and local, and want to merge and save them as one file, so that when opening the file I see a large video frame (the remote stream) and a little video frame at the top right (the local stream). Right now I can record these two streams separately using RecordRTC. How can I merge them with Node.js? (No code, because I don't know how it's done.)
You can use FFmpeg with -filter_complex; here is a working and tested example using FFmpeg version N-62162-gec8789a:
ffmpeg -i main_video.mp4 -i in_picture.mp4 -filter_complex "[0:v:0]scale=640x480[main_video]; [1:v:0]scale=240x180[in_picture];[main_video][in_picture]overlay=390:10" output.mp4
So, this command tells FFmpeg to read from two input files, main_video.mp4 and in_picture.mp4, and then it sends some information to the -filter_complex flag...
The -filter_complex flag takes [0:v:0] (first input, first video track), scales it to 640x480 px, and labels it [main_video]; then it takes [1:v:0] (second input, first video track), resizes it to 240x180 px, and labels it [in_picture]; finally it merges both videos by overlaying the second one at x=390, y=10.
Then it saves the output to output.mp4
Is that what you want?
UPDATE: I forgot to add that all you need in Node is a module to run FFmpeg; there are plenty of those:
https://nodejsmodules.org/tags/ffmpeg

FFMPEG produce sharper image from audio to video

I am using ffmpeg to create a video file from an image by using the loop_input option, creating an flv file as a wrapper and then exporting it as mp4. For some reason, when the video plays the image is clear at first, but as time goes on it starts to get really pixelated. Is there something I need to do to retain the quality during the looping, or anything else that would help?
These are my video output parameters, not the whole command line:
ffmpeg -loop_input -f image2 -i myimage.jpg -r 20 -g 20 -vcodec flv
Ah, thanks to this post I see what my problem was.
Image sequence to video quality
I forgot to add a bit rate. I'm used to setting bit rates only when dealing with video, but since I was converting the image to a video, of course it still needs one; otherwise ffmpeg will guess, and the guess will be very low. I used -b 1000k and the image looks great.
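For reference, a rough sketch of where the bit rate goes, assuming a newer ffmpeg where -loop 1 replaces the old -loop_input flag (the audio input and output name are placeholders):

# Sketch only: loop a still image over an audio track with an explicit video bit
# rate, so ffmpeg does not fall back to a very low guessed default.
import subprocess

subprocess.run([
    "ffmpeg", "-loop", "1", "-f", "image2", "-i", "myimage.jpg",
    "-i", "audio.mp3",
    "-r", "20", "-g", "20",
    "-vcodec", "flv", "-b:v", "1000k",
    "-shortest", "output.flv",
], check=True)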
