Secure streaming video with dynamic watermark - security

What are some scalable and secure ways to provide a streaming video to a recipient with their name overlaid as a watermark?

Some of the comments here are very good. Using libavfilter is probably a good place to start. Watermarking every frame is going to be very expensive because it requires decoding and re-encoding the entire video for each viewer.
One idea I'd like to expand upon is watermarking only portions of the video. I will assume you're working with H.264 video, which requires far more CPU cycles to decode and encode than older codecs. I think you could mark 1 or 2 streams in real time per CPU core. If you can reduce your requirements to 10 seconds marked out of every 100, then you're talking about 10-20 streams per core, so about 100 per server. It's probably not the performance you're looking for.
I think some companies sell watermarking hardware for TV operators, but I doubt it's any cheaper than a rack of servers and far less flexible.
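To make the back-of-envelope estimate above easier to rerun with your own numbers, here is a minimal sketch. The figures of 1-2 real-time streams per core and 10 seconds marked out of 100 come from the estimate itself; the function name and the eight cores per server are assumptions for illustration only.

```python
def streams_per_server(realtime_streams_per_core, marked_fraction, cores_per_server):
    """Estimate how many concurrent viewers one server can watermark.

    If only `marked_fraction` of each video's running time is re-encoded,
    each core can serve 1 / marked_fraction times as many viewers.
    """
    if not 0 < marked_fraction <= 1:
        raise ValueError("marked_fraction must be in (0, 1]")
    per_core = realtime_streams_per_core / marked_fraction
    return per_core * cores_per_server


# 1-2 real-time streams per core, 10 seconds marked out of every 100.
# cores_per_server=8 is an assumed value, not a measured one.
low = streams_per_server(1, 10 / 100, cores_per_server=8)
high = streams_per_server(2, 10 / 100, cores_per_server=8)
print(f"{low:.0f}-{high:.0f} viewers per server")
```

With eight cores this lands in the same range as the "about 100 per server" figure; the real number depends on resolution, encoder settings, and hardware.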

I think you want to use the ffmpeg libavfilter library. Basically it allows you to overlay an image on top of a video. There is an example showing how to insert a transparent PNG logo in the bottom left corner of the input. You can interface with the library from C++ or from a shell on a command line basis.
In older versions of ffmpeg you will need to use an extension library called watermark.so, often located in /usr/lib/vhook/watermark.so.
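For current ffmpeg builds, the logo overlay described above can be sketched as a command line assembled from Python. This is only a sketch: the file names are hypothetical, and it assumes you have already rendered each viewer's name into a transparent PNG. The overlay expression places the image in the bottom-left corner with a small margin.

```python
import shlex


def overlay_command(video_in, watermark_png, video_out, margin=10):
    """Build an ffmpeg command that overlays a PNG in the bottom-left corner."""
    # main_h and overlay_h are the heights of the video and of the watermark.
    position = f"overlay={margin}:main_h-overlay_h-{margin}"
    return [
        "ffmpeg",
        "-i", video_in,
        "-i", watermark_png,
        "-filter_complex", position,
        "-c:a", "copy",  # audio is passed through; only the video is re-encoded
        video_out,
    ]


# Hypothetical file names, for illustration only.
cmd = overlay_command("input.mp4", "viewer_name.png", "marked.mp4")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed. Building it as a list rather than a string avoids shell quoting problems with unusual file names.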
Depending on what your content is, you may want to consider using invisible digital watermarking as well. It embeds a digital sequence into your video which is not visually detectable. Even if someone were to remove the visible watermark, the invisible watermark would still remain. If a user were to redistribute your video, invisible watermarking would indicate the source of the redistribution.
Of course there are also companies which provide video content management, but I get the sense you want to do this yourself. Doing the watermarking in real time is going to be very resource intensive, especially as you scale up. I would look to do some type of predictive watermarking.

Related

Node.js: Is it possible to extract sub-clips from a broader video file based on start/stop time stamps?

I have a flow where iOS app users will record a large video file and upload it to our server. After the fact, the user might want to extract certain portions of that larger video based on specific time stamps and generate a highlight reel that can be viewed and shared locally back on the iOS device.
As a FE developer I don't really have much experience with where to even start here. Our BE will be built in NodeJS. It seems to me that this should be a relatively straightforward problem to solve, but I don't know.
Are there APIs that make movie manipulation easy? Can I easily extract a clip based on a start and stop time and save that as a separate file? Are those costly tasks? Or not too bad?
I'm guessing that the response to this call would be a list of the file names generated for these clips, which the iOS app could then pull down and load.
It's not quite as straightforward as it might seem as video files are quite structured with header information and indexing into the individual video and audio tracks and frames. Any splitting up or cropping needs to allow for this and also create new files with the correct headers and indexing etc.
Fortunately, there are indeed libraries that you can use to do this type of thing, one of the most powerful being ffmpeg.
There are projects which allow the ffmpeg command line tool to be used programmatically - the advantage of this approach is that you get to leverage the vast community knowledge base for the ffmpeg command line.
One of the popular ones for nodejs is:
https://github.com/damianociarla/node-ffmpeg
You can then look at the ffmpeg documentation or community answers to find the particular functionality you need - for example to trim a video to a start and end time as you asked:
https://stackoverflow.com/a/42827058/334402
https://superuser.com/a/704118
The general idea is quite simple and will be of the format:
ffmpeg -i yourInputVideo.mp4 -ss 01:30:00 -to 02:30:00 -c copy yourNewOutputVideo.mp4
It's worth taking a look at the seeking info in the ffmpeg online documentation (https://ffmpeg.org/ffmpeg.html) to help understand the examples, especially the second one above:
-ss position (input/output)
When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.
When used as an output option (before an output url), decodes but discards input until the timestamps reach position.
position must be a time duration specification; see the Time duration section in the ffmpeg-utils(1) manual.
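To tie this back to the highlight-reel use case, here is a rough sketch of turning a list of start/stop timestamps into one ffmpeg command per clip. It is written in Python for brevity; the same argument lists could be passed to a child process from Node. The function name and the clip_001.mp4 naming scheme are made up for illustration.

```python
def highlight_commands(source, ranges, reencode=False):
    """Build one ffmpeg command per (start, end) pair.

    With reencode=False the streams are copied, which is fast but means the
    cut points may not be frame-exact. With reencode=True ffmpeg transcodes
    with its default codecs for the output container.
    """
    commands = []
    for index, (start, end) in enumerate(ranges, start=1):
        output = f"clip_{index:03d}.mp4"  # hypothetical naming scheme
        cmd = ["ffmpeg", "-i", source, "-ss", start, "-to", end]
        if not reencode:
            cmd += ["-c", "copy"]
        cmd.append(output)
        commands.append(cmd)
    return commands


commands = highlight_commands(
    "game.mp4",
    [("00:01:30", "00:02:00"), ("00:10:00", "00:10:45")],
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be run with subprocess.run(cmd, check=True), and the output file names are what the server would hand back to the app.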

Detecting ads in audio streams?

I have never tried, but just curious if there is any possibility to detect ads in audio streams? I mean except machine learning or something. Some specifics about byte stream during adverts. Maybe kind of different loud value?
From a purely audio standpoint, this isn't possible. There is nothing distinguishable between an advertisement and other audio content. Sure, you could argue that a station playing music will have different spectral characteristics than when talking comes on for an advertisement, but what about ads that also play music? How do you distinguish between an announcer and someone reading an ad? What if the ad is embedded in normal content?
Now, some stations do provide metadata which occasionally contains ad information. If you look at the length of a particular content item, your ads are usually going to be under a minute or 30 seconds. How you get this metadata and deal with it depends on the kind of stream you're working with.
There are techniques emerging to do this and they tend to leverage databases of known adverts to get around the theoretical problems that Brad correctly highlights in his answer.
One of the references below, however, uses a technique based on detecting slight differences in the audio when an ad starts as the initial detection trigger.
Some techniques also use both audio and visual streams to aid detection - for example the Google paper below uses first audio matching and then the video to validate/verify.
Some sources that might be worth looking at for anyone interested in this area (I realise it is an old question but it is still topical):
http://www.xavieranguera.com/papers/cimca_2008.pdf
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/55.pdf
https://www.audiblemagic.com/wp-content/uploads/2014/02/ad_detection_datasheet_150406.pdf

Recommendations for real-time pixel-level analysis of television (TV) video

[Note: This is a rewrite of an earlier question that was considered inappropriate and closed.]
I need to do some pixel-level analysis of television (TV) video. The exact nature of this analysis is not pertinent, but it basically involves looking at every pixel of every frame of TV video, starting from an MPEG-2 transport stream. The host platform will be server-class, multiprocessor 64-bit Linux machines.
I need a library that can handle the decoding of the transport stream and present me with the image data in real-time. OpenCV and ffmpeg are two libraries that I am considering for this work. OpenCV is appealing because I have heard it has easy to use APIs and rich image analysis support, but I have no experience using it. I have used ffmpeg in the past for extracting video frame data from files for analysis, but it lacks image analysis support (though Intel's IPP can supplement).
In addition to general recommendations for approaches to this problem (excluding the actual image analysis), I have some more specific questions that would help me get started:
1. Are ffmpeg or OpenCV commonly used in industry as a foundation for real-time video analysis, or is there something else I should be looking at?
2. Can OpenCV decode video frames in real time, and still leave enough CPU left over to do nontrivial image analysis, also in real time?
3. Is it sufficient to use ffmpeg for MPEG-2 transport stream decoding, or is it preferable to just use an MPEG-2 decoding library directly (and if so, which one)?
4. Are there particular pixel formats for the output frames that ffmpeg or OpenCV is particularly efficient at producing (like RGB, YUV, or YUV422, etc)?
1.
I would definitely recommend OpenCV for "real-time" image analysis. I assume by real-time you are referring to the ability to keep up with TV frame rates (e.g., NTSC (29.97 fps) or PAL (25 fps)). Of course, as mentioned in the comments, it certainly depends on the hardware you have available as well as the image size: SD (480p) vs. HD (720p or 1080p). FFmpeg certainly has its quirks, but you would be hard pressed to find a better free alternative. Its power and flexibility are quite impressive; I'm sure that is one of the reasons that the OpenCV developers decided to use it as the back-end for video decoding/encoding with OpenCV.
2.
I have not seen issues with high-latency while using OpenCV for decoding. How much latency can your system have? If you need to increase performance, consider using separate threads for capture/decoding and image analysis. Since you mentioned having multi-processor systems, this should take greater advantage of your processing capabilities. I would definitely recommend using the latest Intel Core-i7 (or possibly the Xeon equivalent) architecture as this will give you the best performance available today.
I have used OpenCV on several embedded systems, so I'm quite familiar with your desire for peak performance. I have found many times that it was unnecessary to process a full frame image (especially when trying to determine masks). I would highly recommend down-sampling the images if you are having difficulty processing your acquired video streams. This can sometimes instantly give you a 4-8X speedup (depending on your down-sample factor). Also on the performance front, I would definitely recommend using Intel's IPP. Since OpenCV was originally an Intel project, IPP and OpenCV blend very well together.
Finally, because image processing is one of those "embarrassingly parallel" problem fields, don't forget about the possibility of using GPUs as a hardware accelerator for your problems if needed. OpenCV has been doing a lot of work in this area as of late, so you should have those tools available to you if needed.
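The two suggestions above (a separate thread for capture/decoding, and down-sampling before analysis) can be sketched roughly as follows. This is a toy that uses only the Python standard library: the frames are synthetic 8x8 grayscale grids, the "analysis" is a mean brightness, and all function names are made up. In a real pipeline the producer would pull frames from the decoder and the resize would be done by the imaging library.

```python
import queue
import threading


def downsample(frame, factor):
    """Keep every `factor`-th pixel in both directions (nearest neighbour)."""
    return [row[::factor] for row in frame[::factor]]


def capture(frames, out_queue):
    """Producer: stands in for the decode loop; pushes frames, then a sentinel."""
    for frame in frames:
        out_queue.put(frame)
    out_queue.put(None)  # tells the consumer there are no more frames


def analyse(in_queue, results, factor):
    """Consumer: down-samples each frame and records (pixel count, mean)."""
    while True:
        frame = in_queue.get()
        if frame is None:
            break
        small = downsample(frame, factor)
        pixels = [p for row in small for p in row]
        results.append((len(pixels), sum(pixels) / len(pixels)))


# Synthetic 8x8 grayscale frames stand in for decoded video.
frames = [[[value] * 8 for _ in range(8)] for value in (10, 20, 30)]
frame_queue = queue.Queue(maxsize=4)  # bounded so capture cannot run far ahead
results = []

worker = threading.Thread(target=analyse, args=(frame_queue, results, 2))
worker.start()
capture(frames, frame_queue)
worker.join()

print(results)  # each 64-pixel frame is analysed as 16 pixels
```

A down-sample factor of 2 in each direction leaves a quarter of the pixels, which is where the 4X end of the speedup range comes from. Note that pure-Python work does not run in parallel under CPython's GIL; the pattern pays off when decoding and analysis happen in native code.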
3.
I think FFmpeg would be a good starting point; most of the alternatives I can think of (Handbrake, mencoder, etc.) tend to use ffmpeg as a backend, but it looks like you could probably roll your own with IPP's Video Coding library if you wanted to.
4.
OpenCV's internal representation of colors is BGR unless you use something like cvtColor to convert it. If you would like to see a list of the pixel formats that are supported by FFmpeg, you can run
ffmpeg -pix_fmts
to see what it can input and output.
For the 4th question only:
Video streams are encoded in a YUV-family format (YUV422, YCbCr, etc.). Converting them to BGR and back (for re-encoding) eats up lots of time. So if you can write your algorithms to run on YUV you'll get an instant performance boost.
Note 1. While OpenCV natively supports BGR images, you can make it process YUV, with some care and knowledge about its internals.
For example, if you want to detect some people in the video, just take the upper half of the decoded video buffer (it contains the grayscale representation of the image) and process it.
Note 2. If you want to access the YUV image in OpenCV, you must use the ffmpeg API directly in your app. OpenCV forces the conversion from YUV to BGR in its VideoCapture API.
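To illustrate the "upper half of the buffer" remark, here is a small sketch of pulling the luma (grayscale) plane out of a planar YUV frame. It assumes a planar 4:2:2 layout with no row padding, so the Y plane is exactly the first width * height bytes, i.e. the first half of the buffer. Real decoder output may pad each row, and other layouts such as 4:2:0 have a different Y-to-chroma ratio. The function name is made up.

```python
def luma_plane(buffer, width, height):
    """Return the Y (grayscale) plane of a planar YUV frame as rows of bytes.

    In planar layouts the Y plane comes first and holds width * height
    bytes; the chroma planes that follow are ignored here.
    Assumes rows are tightly packed (no padding at the end of each row).
    """
    y_size = width * height
    if len(buffer) < y_size:
        raise ValueError("buffer is smaller than one luma plane")
    y = buffer[:y_size]
    return [y[r * width:(r + 1) * width] for r in range(height)]


# Synthetic 4x2 frame in planar 4:2:2: 8 Y bytes, then 4 U and 4 V bytes.
width, height = 4, 2
frame = bytes(range(8)) + bytes([128] * 4) + bytes([128] * 4)
rows = luma_plane(frame, width, height)
print([list(row) for row in rows])
```

Working on this plane directly skips the YUV to BGR conversion entirely, which is the saving described above.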

Is it possible to rip game resources from a .smc file?

Is it possible to rip game resources from a .smc file? Specifically art, music, sprites, etc. How does an emulator copy the system it emulates?
It's possible, in the sense that the information is all there in some manner. But an smc file is basically a compiled program with embedded resources, and there isn't even a standard compiler or standard format for storing the resources that you can start from.
And as far as image data goes, there is a good chance it will be in the palettized and tiled format used by the PPU, although it's also not unlikely that it will be compressed in some manner or another. But the palette will probably be almost impossible to find by static analysis, and the tile maps are probably generated from the level data rather than being explicitly stored anywhere. You may have better luck running it in an emulator and extracting the data from VRAM.
For music, the situation is even more discouraging. SNES audio is most akin to a MOD file: instruments are sampled, and then the individual samples are pitch-adjusted and mixed to generate the output sound. The SNES provides hardware to decode the instrument samples, manipulate the pitch, and mix them together, but no high-level program (i.e. no equivalent of a mod file "tracker") to play back actual songs. So you may be able to find the BRR-encoded instrument samples in the same manner you may be able to find the image tile data, but the song data can and will be formatted completely differently in different games. Again, your best luck may come from extracting the state of the APU as an SPC file and working with that.
As for your other question, see How do emulators work and how are they written? for a previous answer on that very topic.

How to detect silence in audio files?

I'm working on a tool to edit .srt (subtitle) files within the browser (the tool is to be used for linguistic annotation). In desktop tools that are used for similar purposes, the user has access to the waveform, and can "see" where silences are in the signal, and thus select a particular phrase for transcription.
Such a tool might be buildable in-browser down the road (using Web Workers and Canvas, say), but for now, it's not feasible to do the sort of signal processing it would take to find those silences.
So, I'm thinking about the next-best approach: what free tool could I use to produce a list of timestamps of where silences (below some given threshold) start and stop? If I produce such a list offline and upload it with the audio file, then I can at least make it possible to navigate through the "phrases" (defined as periods of non-silence). I think that would still be a win in productivity for doing the transcription.
Audacity can sort of do this, but AFAICT, only if you install Nyquist, which seems to have some patent issues.
Are there any alternatives?
It would be nice if the tool could handle as many as possible of ogg, mp3, and wav files.
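To make the "list of timestamps where silences start and stop" idea concrete, here is a minimal sketch of the thresholding step on already-decoded samples. It is not a recommendation of a particular tool: the function name, the threshold, and the minimum duration are all assumptions, and it expects the audio to have been decoded to a plain sequence of integer samples first (ogg and mp3 would need converting before this step).

```python
def find_silences(samples, sample_rate, threshold, min_duration=0.3):
    """Return (start, end) times in seconds where |sample| stays below threshold.

    Runs shorter than `min_duration` seconds are ignored, so the brief
    near-zero samples inside normal speech are not reported as silence.
    """
    min_len = int(min_duration * sample_rate)
    silences = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a quiet run that lasts until the end of the audio.
    if run_start is not None and len(samples) - run_start >= min_len:
        silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences


# Synthetic signal at 10 samples per second: 1 s loud, 1 s quiet, 1 s loud.
signal = [1000] * 10 + [0] * 10 + [1000] * 10
print(find_silences(signal, sample_rate=10, threshold=50, min_duration=0.5))
```

The resulting pairs could be written out as JSON and uploaded alongside the audio file. A more robust version would compare the average level over short windows rather than individual samples.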
