Let's say I have an input .mp4 file that contains 4 audio tracks.
How can I change their volumes independently and convert the file to a new one in which all 4 audio tracks are mixed together and stored in a single first audio track? For example, I want the first, second, and third audio tracks from the input file to be double their original volume and the fourth to be half its original volume, all saved in the output file's first audio track. What would that command look like?
Here you can find many good answers: How to overlay/downmix two audio files using ffmpeg
where the most comprehensive one links to https://trac.ffmpeg.org/wiki/AudioChannelManipulation
I recently had a similar use case: freely mixing 6 mono tracks of a multi-track recording to stereo output with different volumes on either or both output channels, which can be achieved like this:
ffmpeg -i 0.flac -i 1.flac -i 2.flac -i 3.flac -i 4.flac -i 5.flac \
-filter_complex "[0:a][1:a][2:a][3:a][4:a][5:a]amerge=inputs=6,pan=stereo|c0=c0+1.2*c1+1.2*c2+1.3*c3+c4|c1=c0+1.3*c3+c4+0.8*c5[a]" \
-map "[a]" output.flac
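Note that the filter graph is quoted so the shell does not interpret the | characters. Applying the same technique to the original question (one .mp4 with four audio tracks, the first three doubled and the fourth halved, mixed into a single output track), a sketch could look like the following; the aac codec and the choice to copy the video are assumptions on my part:
ffmpeg -i input.mp4 \
-filter_complex "[0:a:0]volume=2[a0];[0:a:1]volume=2[a1];[0:a:2]volume=2[a2];[0:a:3]volume=0.5[a3];[a0][a1][a2][a3]amix=inputs=4[a]" \
-map 0:v -c:v copy -map "[a]" -c:a aac output.mp4
Keep in mind that amix rescales its inputs to avoid clipping, so if you need the exact 2x/2x/2x/0.5x gains you may prefer amerge followed by a pan expression, as in the example above.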
I would like to be able to create images and audio on the fly and combine them together into an RTMP stream (for Twitch or YouTube). The goal is to accomplish this in Python 3, as that is the language my bot is written in. Bonus points for not having to save to disk.
So far, I have figured out how to stream to RTMP servers using ffmpeg by loading a PNG image and playing it on loop, as well as loading an mp3, and then combining them together in the stream. The problem is I have to load at least one of them from file.
I know I can use Moviepy to create videos, but I cannot figure out whether or not I can stream the video from Moviepy to ffmpeg or directly to rtmp. I think that I have to generate a lot of really short clips and send them, but I want to know if there's an existing solution.
There's also OpenCV which I hear can stream to rtmp, but cannot handle audio.
A redacted version of an ffmpeg command I have successfully tested with is
ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i "Song-Stereo.mp3" -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY
or
cat Song-Stereo.mp3 | ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i - -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY
I know these commands are not set up properly for smooth streaming; the result manages to break both Twitch's and YouTube's players, and I will have to figure out how to fix that.
The problem with this is that I don't think I can stream both the image and the audio at once when creating them on the spot; I have to load one of them from the hard drive. This becomes a problem when trying to react to a command, user chat, or anything else that requires live reactions. I also do not want to destroy my hard drive by constantly saving to it.
As for the Python code, what I have tried so far in order to create a video is the following. It still saves to the HD and is not responsive in real time, so it is not very useful to me. The video itself is okay, with the one exception that the time shown in the QR code and the video's own clock drift farther and farther apart as the video approaches the end. I can work around that limitation if it shows up while live streaming.
def make_frame(t):
    # render a QR code that encodes the current frame time
    img = qrcode.make("Hello! The second is %s!" % t)
    return numpy.array(img.convert("RGB"))

clip = mpy.VideoClip(make_frame, duration=120)
clip.write_gif("test.gif", fps=15)
gifclip = mpy.VideoFileClip("test.gif")
gifclip.set_duration(120).write_videofile("test.mp4", fps=15)
My goal is to be able to produce something along the lines of the following pseudo-code:
original_video = qrcode_generator("I don't know, a clock, pyotp, today's news sources, just anything that can be generated on the fly!")
original_video.overlay_text(0,0,"This is some sample text, the left two are coordinates, the right three are font, size, and color", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds
# NOTICE - I did not add any time measurements to the actual video itself. The whole point is that this is a live stream and not a video clip, so the time frame would be now. The 2 seconds listed above is for our pseudo sine wave generator to know how long the audio clip should be, not for the actual streaming library.
stream.send_to_rtmp_server(original_video) # Doesn't matter if ffmpeg or some native library
The above example is what I am looking for in terms of video creation in Python and then streaming. I am not trying to create a clip and then stream it later; I am trying to have the program respond to outside events and then update its stream to do whatever it wants. It is sort of like a chat bot, but with video instead of text.
def track_movement(...):
...
return ...
original_video = user_submitted_clip(chat.lastVideoMessage)
original_video.overlay_text(0,0,"The robot watches the user's movements and puts a blue square around it.", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds
# It would be awesome if I could also figure out how to perform advanced actions such as tracking movements or pulling a face out of a clip and then applying effects to it on the fly. I know OpenCV can track movements and I hear that it can work with streams, but I cannot figure out how that works. Any help would be appreciated! Thanks!
Because I forgot to add the imports, here are some useful imports I have in my file!
import pyotp
import qrcode
import numpy
from io import BytesIO
from moviepy import editor as mpy
The library pyotp is for generating one-time password authenticator codes, qrcode is for the QR codes, BytesIO is used for virtual files, numpy converts the QR image into a frame array, and moviepy is what I used to generate the GIF and MP4. I believe BytesIO might be useful for piping data to the streaming service, but how that happens depends entirely on how data is sent to the service, whether it be ffmpeg over the command line (from subprocess import Popen, PIPE) or a native library.
Are you using ffmpeg.exe and running a command through CMD? If so, you can use either the concat demuxer or a pipe. When you use the concat demuxer, ffmpeg can take image input from a text file. The text file should contain the image paths, and ffmpeg can find those images in different folders. The following command shows how you can use the concat demuxer; the image locations are saved to the input.txt file.
ffmpeg -f concat -i input.txt -vsync vfr -pix_fmt yuv420p output.mp4
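For reference, input.txt for the concat demuxer would look something like this (the paths and durations are placeholders; the last image is conventionally repeated so its duration is honoured):
file 'frames/frame1.png'
duration 2
file 'frames/frame2.png'
duration 2
file 'frames/frame2.png'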
But the most suitable solution would be to use a data pipe to feed images to ffmpeg.
cat *.png | ffmpeg -f image2pipe -i - output.mkv
You can check this link for more information about the ffmpeg data pipe.
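To connect this to the Python side of the question, here is a minimal sketch of the data-pipe approach, assuming the qrcode, Pillow, and numpy packages and placeholder values for the RTMP URL, frame size, and frame rate. It generates QR-code frames in memory and pipes them to ffmpeg as raw RGB, while ffmpeg synthesizes a silent audio track with anullsrc; nothing is written to disk, but this is untested against Twitch or YouTube:
import subprocess
import time

import numpy
import qrcode

WIDTH, HEIGHT, FPS = 640, 480, 15  # assumed output geometry

# ffmpeg reads raw RGB frames from stdin and generates silent audio itself
ffmpeg = subprocess.Popen(
    [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", "%dx%d" % (WIDTH, HEIGHT), "-r", str(FPS), "-i", "pipe:0",
        "-f", "lavfi", "-i", "anullsrc=channel_layout=stereo:sample_rate=44100",
        "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-ar", "44100",
        "-f", "flv", "rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY",
    ],
    stdin=subprocess.PIPE,
)

frame_index = 0
while True:  # in a real bot this loop would react to chat commands instead
    img = qrcode.make("frame %d" % frame_index).convert("RGB").resize((WIDTH, HEIGHT))
    ffmpeg.stdin.write(numpy.asarray(img, dtype=numpy.uint8).tobytes())
    frame_index += 1
    time.sleep(1.0 / FPS)  # crude pacing so frames arrive at roughly FPS
The audio here is just silence; pushing generated PCM as well would need a second file descriptor or a named pipe, since stdin is already taken by the video.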
Generating multiple videos and streaming them in real time is not a very stable solution. You can run into several problems.
I have settled on using GStreamer to create my streams on the fly. It allows me to take separate video and audio streams and combine them together. I do not have a working example right now, but I hopefully will either get an answer or figure it out on my own soon; see Gstreamer in Python exits instantly, but is fine on command line.
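For anyone following the same route, here is a minimal sketch of what that can look like with PyGObject. It is an assumption on my part, not a verified setup: it relies on the videotestsrc, audiotestsrc, x264enc, avenc_aac, flvmux, and rtmpsink elements being installed, uses a placeholder RTMP URL, and the GLib main loop at the end is what keeps the pipeline from exiting instantly (the problem linked above):
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Test sources stand in for dynamically generated video and audio;
# appsrc elements could replace them to push frames/samples from Python.
pipeline = Gst.parse_launch(
    "videotestsrc is-live=true "
    "! video/x-raw,width=640,height=360,framerate=15/1 "
    "! x264enc tune=zerolatency bitrate=1500 "
    "! flvmux name=mux streamable=true "
    "! rtmpsink location=rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY "
    "audiotestsrc is-live=true ! audioconvert ! avenc_aac ! mux."
)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()  # without a running main loop the script exits immediately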
Using ffmpeg, I was able to remove duplicate frames from a video using ffmpeg -i in.mp4 -vf mpdecimate,setpts=N/FRAME_RATE/TB out.mp4. However, the audio went on for longer than the video, obviously because the command only removed the video portion. How would I remove the segments of audio which accompany the removed frames?
I've got a bunch of stereo files recorded for a documentary with a Zoom in 4-channel mode. Basically it's sets of pairs of stereo files: file A would be a stereo file with a lav or boom mic recording, and file B of identical length would be a proper stereo track recorded by the Zoom itself.
Now I'm trying to convert all this into something I can correctly ingest into an editing suite. Files A are a mess, but I came up with an ffmpeg script which downconverts them to mono and then reconverts them back to stereo (to get rid of inconsistencies). Now how do I merge two stereo files into a single WAV or AIFF file containing two separate stereo tracks? I browsed around for any workflows and/or standards on that but can't really find anything useful.
Any ideas on how to do that with ffmpeg (or anything else, really) would be appreciated!
I don't know if FCP-X reads multi-track WAVs, but you can output to a multi-track MOV.
ffmpeg -i file1.wav -i file2.wav -c copy -map 0 -map 1 file.mov
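If the editing suite would rather see one multichannel track than two separate stereo streams, a variation worth trying (an assumption on my part, not verified against any particular NLE) is to merge both stereo pairs into a single 4-channel stream:
ffmpeg -i file1.wav -i file2.wav -filter_complex "[0:a][1:a]amerge=inputs=2[a]" -map "[a]" output.wav
Channels 1-2 of the output then carry file1 and channels 3-4 carry file2.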
What I want is to be able to create a livestream from an Ubuntu 14.04 server to an RTMP server (like Twitch) and to be able to use NodeJS to control visual aspects (adding layers, text, images) and add different sources (video files, other livestreams, etc.). Like having OBS running on a server.
What I've done/researched so far:
FFmpeg
With ffmpeg I can stream video files like this:
ffmpeg -re -i video.mp4 -c:v libx264 -preset fast -c:a aac -ab 128k -ar 44100 -f flv rtmp://example.com
Also, using filter_complex I can create something close to a layer, as this tutorial explains:
https://trac.ffmpeg.org/wiki/Create%20a%20mosaic%20out%20of%20several%20input%20videos
But I found the following problems:
The streams that I create with ffmpeg only last until the video file is over; if I wanted to stream multiple video files (a dynamic playlist), the stream would be interrupted between each file;
The manipulation is very limited as far as I can tell; I can't edit the filter_complex once ffmpeg is executing;
I can't display text or create animated overlays, like sliding text.
I tried to search for any CLI/NodeJS package that is able to create a continuous video stream and manipulate it to use as an input source for ffmpeg, which would then stream to the RTMP server.
Can someone give me more information about what I am trying to do?
I'm playing with github.com/fluent-ffmpeg/node-fluent-ffmpeg to see if I have a different outcome.