How can I prepare video for Azure Media Services myself (on-premises) in variable bitrate?

I usually use Media Encoder Standard to encode 4K videos in the H264 Multiple Bitrate format. But it's becoming expensive (for me) because of the source 4K file size, so encoding in Azure takes up to 20 hours.
So I wonder: is there a way to prepare it myself in this format (https://learn.microsoft.com/en-us/azure/media-services/media-services-mes-preset-h264-multiple-bitrate-4k)? I do video editing and color grading anyway.

OK, so the answer to this, as can be seen in the comment thread above, is to make several changes to your workflow to reduce the time and the costs:
Change your source content to be 4K 30p instead of 60p. There really is no need to have 60p for the type of content that you are filming. It's not really high-action content.
This should cut your upload source data size in half...
Download the JSON for the 4K preset that you are using ("H264 Multiple Bitrate 4K") and customize it. Don't trust that we have given you the right settings for your cost demands or scenario. :-)
Change the frame rates in the preset and drop some of the bitrate layers, as Anil suggested above (a rough sketch of that kind of trimming is shown below). This should seriously reduce the encoding time and your overall output costs. Just cut it down to the bare minimum and give it another shot.
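As a loose illustration of that trimming step (not an official tool; the file names and the 6,000 kbps cutoff are made up, and the field names Codecs / H264Layers / Bitrate / FrameRate should be checked against the preset JSON you actually download), a small Node.js script could look something like this:

// trim-preset.ts - rough sketch: load the downloaded preset, drop the heaviest
// layers, pin the frame rate, and save the result as a custom preset.
import { readFileSync, writeFileSync } from "fs";

const preset = JSON.parse(readFileSync("H264_Multiple_Bitrate_4K.json", "utf8"));

for (const codec of preset.Codecs ?? []) {
  if (codec.Type === "H264Video" && Array.isArray(codec.H264Layers)) {
    // Keep only the layers at or below an example cutoff of 6,000 kbps.
    codec.H264Layers = codec.H264Layers.filter((layer: any) => layer.Bitrate <= 6000);
    // "0/1" follows the source frame rate (now 30p after exporting the source at 30p).
    for (const layer of codec.H264Layers) {
      layer.FrameRate = "0/1";
    }
  }
}

writeFileSync("H264_Custom_Trimmed.json", JSON.stringify(preset, null, 2));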
If that does not work out for you, ping us again at amshelp#microsoft.com and we can help figure out other ways to assist.
Thanks for using Azure Media Services! And also thanks for contributing to the community.
John D.

Related

Node.js: Is it possible to extract sub-clips from a broader video file based on start/stop time stamps?

I have a flow where iOS app users will record a large video file and upload it to our server. After the fact, the user might want to extract certain portions of that larger video based on specific time stamps and generate a highlight reel that can be viewed and shared locally back on the iOS device.
As an FE developer I don't really have much experience with where to even start here. Our BE will be built in NodeJS. It seems to me that this should be a relatively straightforward problem to solve, but I don't know.
Are there APIs that make movie manipulation easy? Can I easily extract a clip based on a start and stop time and save that as a separate file? Are those costly tasks? Or not too bad?
I'm guessing that the response to this call would be a list of the file names generated for these clips, which the iOS app could then pull down and load.
It's not quite as straightforward as it might seem as video files are quite structured with header information and indexing into the individual video and audio tracks and frames. Any splitting up or cropping needs to allow for this and also create new files with the correct headers and indexing etc.
Fortunately, there are indeed libraries that you can use to do this type of thing, one of the most powerful being ffmpeg.
There are projects which allow the ffmpeg command line tool to be used programmatically - the advantage of this approach is that you get to leverage the vast community knowledge base for the ffmpeg command line.
One of the popular ones for nodejs is:
https://github.com/damianociarla/node-ffmpeg
You can then look at the ffmpeg documentation or community answers to find the particular functionality you need - for example, to extract a clip between a start and end time as you asked:
https://stackoverflow.com/a/42827058/334402
https://superuser.com/a/704118
The general idea is quite simple and will be of the format:
ffmpeg -i yourInputVideo.mp4 -ss 01:30:00 -to 02:30:00 -c copy yourNewOutputVideo.mp4
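If you would rather call the ffmpeg binary directly from your Node backend instead of going through a wrapper library, a minimal sketch (assuming ffmpeg is installed and on the PATH; file names and timestamps are placeholders) is:

// extract-clip.ts - rough sketch: spawn ffmpeg to copy out a sub-clip without re-encoding.
import { spawn } from "child_process";

function extractClip(input: string, start: string, end: string, output: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // Mirrors the command above: -ss/-to as output options, stream copy for speed.
    const ff = spawn("ffmpeg", ["-i", input, "-ss", start, "-to", end, "-c", "copy", output]);
    ff.on("error", reject);
    ff.on("close", (code) => (code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`))));
  });
}

// e.g. extractClip("yourInputVideo.mp4", "01:30:00", "02:30:00", "highlight1.mp4");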
It's worth taking a look at the seeking info in the ffmpeg online documentation (https://ffmpeg.org/ffmpeg.html) to help understand the examples, especially the second one above:
-ss position (input/output)
When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.
When used as an output option (before an output url), decodes but discards input until the timestamps reach position.
position must be a time duration specification; see the Time duration section in the ffmpeg-utils(1) manual.

Azure Media Services <Quality> in thumbnails

I am using Azure Media Services for video content management. For this I am also generating thumbnails using the XML preset file as provided here.
But I am not able to understand the significance of Quality here, and I am not able to find much documentation for it. I understand the common definition of the word, but will it impact the time taken for encoding, make any cost difference, etc.?
It only impacts the JPEG compression: 0-100%, similar to the 1-10 scale in Photoshop when compressing a JPEG. It just crunches your JPEG down with more compression and more artifacts. A typical setting of 70-80% is usually good enough quality for a JPEG. If you are using PNG, you don't need to set it.
It does not affect the performance, cost, etc. of the Thumbnail job.

Understanding Azure Media Services encoding permutations that increase file size

How can the system improve video quality automatically? For example, a dark line on my face in the video can't be removed by the system automatically... that makes sense. Here I'm trying to understand Azure Media Services encoding permutations.
When I uploaded a 55.5 MB MP4 file and encoded it with the "H264AdaptiveBitrateMP4Set720p" encoder preset, I received the following output files:
Now look at the video file highlighted with the green rectangle; its size looks reasonable given the input file size. But if you look at the video files highlighted with the red rectangles, these are the "improved" files for adaptive streaming, which look useless if you compare them with my example of 'a dark line on my face'. Here are my questions, and I would love to read your input on this:
What are the exact reasons the encoder increases the file size?
Why should I pay more for bandwidth and storage for these large files, and how do I convince clients?
Is there any way I can specify that such files should not be created when scheduling encoding?
Any input is highly appreciated.
1) The dark lines appearing on your face have nothing to do with encoding. Encoding simply means re-arranging the bits that make up the video using a different compression algorithm than the one used in the source video.
2) As you can see from the filenames of the generated files, they all have a different bitrate, denoted in kbps. This is the amount of data, i.e. the number of bits, that the decoder has to process to get one second's worth of video footage. The higher the bitrate, the better the quality of the video, because more detail, such as light and color information, is preserved in every frame.
As a corollary, a higher-bitrate video is better suited to faster internet connections.
So Azure has converted your source video into these 4 videos at different bitrates, all with the same video (H.264) and audio (AAC) encoding.
3) As for how to tell Azure not to make so many files, I do not know the answer to that. I am pretty sure it is some configuration somewhere, but I honestly have no idea. I am confident, though, that it is only a matter of configuration to tell it to stop producing the other bitrate renditions.
In summary:
a) To clear off the dark thingy on your face in the video, you have to edit the source video in a video editor; that has nothing to do with video encoding.
b) The file sizes are different due to different bitrates, meaning differences in how much light and color information, e.g. shadow detail, is preserved in every frame of the video footage.
To those users who have a faster internet connection, you could offer the option of downloading the higher-bitrate file. The higher-bitrate file will show slightly better quality even at the same video resolution, i.e. 720p in your case.
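To make the size difference concrete, a rendition's size is roughly bitrate × duration. A back-of-the-envelope sketch (the 10-minute duration and the ladder bitrates here are made-up examples, not the asker's actual renditions):

// approx-size.ts - rough sketch: approximate rendition size from bitrate and duration.
function approxSizeMB(bitrateKbps: number, durationSeconds: number): number {
  const bytesPerSecond = (bitrateKbps * 1000) / 8; // kilobits/s -> bytes/s
  return (bytesPerSecond * durationSeconds) / (1024 * 1024);
}

const durationSeconds = 10 * 60; // hypothetical 10-minute video
for (const kbps of [650, 1500, 3400]) { // illustrative ladder bitrates
  console.log(`${kbps} kbps for 10 min ≈ ${approxSizeMB(kbps, durationSeconds).toFixed(0)} MB`);
}
// roughly 46 MB, 107 MB and 243 MB respectively (video track only; audio adds a little more)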

Secure streaming video with dynamic watermark

What are some scalable and secure ways to provide a streaming video to a recipient with their name overlayed as a watermark?
Some of the comments here are very good. Using libavfilter is probably a good place to start. Watermarking every frame is going to be very expensive because it requires decoding and re-encoding the entire video for each viewer.
One idea I'd like to expand upon is watermarking only portions of the video. I will assume you're working with H.264 video, which requires far more CPU cycles to decode and encode than older codecs. I think per CPU core you could mark 1 or 2 streams in real time. If you can reduce your requirement to marking 10 seconds out of every 100, then you're talking about 10-20 streams per core, so about 100 per server. It's probably still not the performance you're looking for.
I think some companies sell watermarking hardware for TV operators, but I doubt it's any cheaper than a rack of servers and far less flexible.
I think you want to use ffmpeg's libavfilter library. Basically it allows you to overlay an image on top of a video. There is an example showing how to insert a transparent PNG logo in the bottom left corner of the input. You can interface with the library from C++ or drive it from the command line.
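For the specific ask of overlaying the recipient's name, ffmpeg's drawtext filter (rather than the PNG overlay in that example) is one way to burn the text in. A rough sketch driven from Node, assuming an ffmpeg build with drawtext (libfreetype) enabled; the paths, font and names are placeholders:

// watermark.ts - rough sketch: burn a viewer's name into the video with the drawtext filter.
import { spawn } from "child_process";

function watermark(input: string, viewerName: string, output: string) {
  // Bottom-left corner, semi-transparent white text; font path is a placeholder.
  const filter =
    `drawtext=text='${viewerName}':fontfile=/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf:` +
    `x=20:y=h-th-20:fontsize=36:fontcolor=white@0.6`;
  return spawn("ffmpeg", ["-i", input, "-vf", filter, "-c:a", "copy", output]);
}

// e.g. watermark("master.mp4", "Jane Doe", "master_jane.mp4");
// Note: this re-encodes the video for every viewer, which is exactly the cost discussed above.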
In older versions of ffmpeg you needed to use an extension library called watermark.so, often located at /usr/lib/vhook/watermark.so.
Depending on what your content is, you may want to consider using invisible digital watermarking as well. It embeds a digital sequence into your video which is not visually detectable. Even if someone were to remove the visible watermark, the invisible watermark would still remain. If a user were to redistribute your video, invisible watermarking would indicate the source of the redistribution.
Of course, there are also companies which provide video content management, but I get the sense you want to do this yourself. Doing the watermarking in real time is going to be very resource intensive, especially as you scale up. I would look to do some type of predictive watermarking.

Estimating the time position in an audio stream using the data?

I am wondering how to estimate where I currently am in an audio stream, in terms of time, by using the data.
For example, I read the data in byte[8192] blocks. How can I know how much time one byte[8192] block is equivalent to?
If this is some sort of raw-ish encoding, like PCM, this is simple. The length in time is a function of the sample rate, bit depth, and number of channels. 30 seconds of 16-bit audio at 44.1 kHz in mono is about 2.5 MB. However, you also need to factor in headers and container format crapola. WAV files, for example, can have a lot of other stuff in them.
Compressed formats are much trickier. You can never be sure exactly where you are without decoding through the file up to that point. Of course you can always guesstimate based on the percentage of the file length, if that is good enough for your case.
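For the raw PCM case, the arithmetic for a byte[8192] block is simple; a small sketch (reusing the 44.1 kHz / 16-bit / mono figures from above):

// block-duration.ts - rough sketch: seconds of audio contained in one raw PCM block.
function blockDurationSeconds(blockBytes: number, sampleRateHz: number, bitsPerSample: number, channels: number): number {
  const bytesPerSecond = sampleRateHz * (bitsPerSample / 8) * channels;
  return blockBytes / bytesPerSecond;
}

// 8192 bytes of 16-bit mono at 44.1 kHz: 8192 / 88200 ≈ 0.093 seconds per block
console.log(blockDurationSeconds(8192, 44100, 16, 1));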
I think this is not what he was asking.
First you have to tell us what kind of data you are using. WAV? MP3? Usually, without knowing where that block came from - so that you know whether you have some kind of frame information and where to find it - you are not able to determine that block's position.
If you have the full stream as well as this block, then you can search the stream for the block to work out its position.
