Motion detection in compressed domain (JPEG/Mpeg4/H264) - jpeg

everyone!
I process video from IP cameras and have wrote a motion detection algorithm based on decompressed video analysis. But i really something more fast. I've found several papers about compressed domain analysis but have failed to find any implementations.
Can anyone recommend me some code?
found materials:
http://www.ist-live.org/intranet/school-of-informatics-university-of-bradford001-7/41410206.pdf/view
http://doc.rero.ch/lm.php?url=1000,43,4,20061128120121-NA/Bracamonte_Javier_-_A_Low_Complexity_Change_Detection_Algorithm_20061128.pdf

I had to detect motion in H.264-video, and for me the frame size was a really good indicator.
I used ffprobe (from the ffmpeg project) to export frame sizes like this:
./ffprobe -show_frames -pretty video.mp4 | grep 'size' | grep -o '[0-9]*' > sizes.txt
In my case no movement meant larger I-frames (for me, every 30th frame was an I-frame) and smaller sizes for some of the frames in between.
I'm new to video encoding so I guess these things might be very dependent on encoding and type of video signal, but it's worth a look since it's very fast to try out. Export framesizes and have a look in e.g. Matlab.
Edit:
In the end I re-encoded the video so that every second frame was an I-frame, as this gave better time resolution. One idea I did not test was to reverse the video and do the same thing, this should give more accurate estimations of when the motion started/ended, akin to removing the phase delay by forward-backward filtering.

https://github.com/Breakthrough/DVR-Scan
DVR-Scan is a cross-platform command-line (CLI) application that
automatically detects motion events in video files (e.g. security
camera footage). In addition to locating both the time and duration of
each motion event, DVR-Scan will save the footage of each motion event
to a new, separate video clip. Not only is DVR-Scan free and
open-source software (FOSS), written in Python, and based on Numpy and
OpenCV, it was built to be extendable and hackable.
I can confirm that it works perfectly with MPEG4 (H264) AVI files.
Scanning speed is about 30 fps at my laptop with i5 4300U CPU for 1200x900 video.
You can check the sources for the algorithm used.
And here are some explaning tutorial links from the same author:
https://github.com/Breakthrough/python-scene-detection-tutorial
See also Python scene change detection.

Related

Blender VSE Audio out-of-sync when animation (video) is rendered

Ok, so I found out that Blender has this really cool video-editing interface and I was beginning to love it. Until, I created this awesome project composition and when I exported the animation as a video file, the audio was out of sync :(.
Actual Problem
Audio is in-sync with video when the animation is played in Blender but is out-of-sync in the rendered video.
Solutions I tried out and failed
I used the 'Audio-Sync' option in the sequencer but that made no difference.
Then I thought that my scene audio frequency might have been an issue since it was initially 48kHz and my videos were at 24kHz, so I changed the scene audio frequency to 24kHz, this still failed to solve the issue.
Initially, I was combining videos with different frame rates and thought that might have been an issue (although animation played as expected in Blender), so I recreated the source videos to ensure all videos I was using in my project had the same frame rate, but this also did not work.
Someone online suggested exporting the video and audio separately and then combining them using a command-line tool like FFMPEG, this also failed.
What's really frustrating
This lag (audio is a few frames ahead of the video) is noticeable only in longer videos (>12 mins, my video is 1 hr long) suggesting a very small rendered rate difference between the video and the audio.
Also, note that the animation plays absolutely fine in Blender, so all I could figure out was that this was a rendering issue.
So if anyone figured this out please let me know. I am a noob in video/audio codecs so please forgive me if I used some incorrect nomenclature above.
I encountered this issue on OBS capture (a 13 minute clip) with Blender 2.93.3. OBS capture is constant framerate at 60 fps, I did try Handbrake conversion to 60 fps constant framerate also with no help. Workaround to solve the issue is to set Blender rendering fps to 59.94, sequencer shows audio track extending over video track but after render everything matches perfectly. Unfortunately you cannot edit the video in 59.94 fps mode, so you need to switch back to 60 fps for editing.
In case your video is 24 fps then use 23.98 fps preset and for 30 fps you can use the 29.97 fps preset.
May 2021. Blender v2.92.0 - I experienced the same as described out-of-sync problem with rendered videos that were over five minutes long. Source was as-is (3.6GB, 10mins) file from Canon EOS 5DMKII, which is an old camera, so pretty much any software can handle the encoding.
In Blender's preview mode everything looks in-sync. Audio and Video tracks are of the same length. I didn't even cut or merged any segments of the source video. I tried running rendering after a clean boot, gave Blender highest resource priority in Win10, allocated more memory to caching, etc. Source and output was on SSD. Rendered result still didn't match what GUI showed. Very frustrating, and a lot of wasted time.
What worked better for me is the following:
Change Video Codec to "FFmpeg video codec #1". This produces a lossless file that is about 27 times bigger (13.8GB for 10mins) than H.264 codec file (0.5GB). However, the audio remains in sync all the way through.
Use HandBrake open source video transcoder to convert FFmpeg file into H.264 (or H.265). End result produces a smallish-size file with A/V that is in sync.
This workaround is relatively painless and produces good-quality results because there's only a single lossy compression step. The time required to get to final file more than triples though. I believe the issue continues to be with the way H.264 rendering in Blender is implemented. I also experienced similar out-of-sync issues in ShotCut a year ago while working with cheap action cam H.265 files. I also found ShotCut to be less stable than Blender.
So after a lot of online searching, I did find an answer to fix this problem, but not in Blender. If you are like me and would like to use Blender for video editing and still get around the issue, then I found a workaround, but you need Shotcut for this. Shotcut is another great free and open-source video editor
Export the entire long video from Blender (the rendered video has desync issues as expected).
Open the video in Shotcut and detach the audio from it.
Use the audio properties to make very fine adjustments to the audio playback speed to suit your requirements (make fine adjustments until video and audio are in sync).
Follow the GIF attached.
(I am using a shorter video in the GIF but you get the idea)
Explanation
Blender has issues while rendering long videos and I noticed that the video is exported at 1.0x speed but the audio is sometimes faster (1.00400x or something like that) and hence the rendered video has audio not in sync with the video.
Another bad thing is that Blender does not really allow very fine playback speed adjustment just to the audio.
One trick is to adjust the pitch of the audio in Blender which in turn changes the playback speed but this is only allowed up to 2 decimal places (not what we want for long videos) and it makes the audio sound funny (since it actually changes the pitch).
Shotcut is a great tool that allows fine playback adjustment, and it also has a pitch compensation feature so that your pitch is kind of unaffected (since we don't want the characters to be sounding funny in our edited video).
Shotcut allows playback speed adjustment up to 6 decimal places.
I landed at this thread because of the same issue happening in a video that I have just finished. The "View animation CTRL F11" command starts an internal player that has sync issues with long videos. Opening the same video file on "Videos" in Fedora, it plays perfectly synchronized.

Speed up playback of a video with a video editor

Recently, I discover that my tutorial videos could be seen at 1.5x playback speed without losses in quality (they are actually better to see, as I normally speak slowly). My problem is that if I change the speed of the video when using a video editor, like Kdenlive, the audio becomes distorted and turns into a mess (higher pitch, I believe).
How could I obtain the same quality as VLC "playback fast" and Youtube "playback speed 1.5" for the audio track? I'm a layman in audio/video editing, so I'm also satisfied with partial answers, like the identification of which terms I should search for in this case.
It might be better to take your audio track and use something like Sound Forge to automatically remove silence. Just be sure to add a pad to that (built into sound forge) otherwise the speech will sound way to chopped and fast.
Aside from that, you could also use Vegas to (then) chop the video to keep pace with your new speech rate. Vegas is a video editing program that is best for this kind of down and dirty editing.

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I have a video capture of HD content via HDMI with audio from a sound board that goes through a impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively I can capture the audio via USB from the soundboard which is probably the best plan, but carries with it the same issue.
The point is that the line in or usb capture will be much higher quality than the one on HDMI because the line out -> impedance change -> mic in path generates inferior quality in that simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera captured sound and load each into
audacity and pretty quickly use the timeshift toot to perfectly fit
the good audio to the questionable audio from the HDMI capture and
cut the good audio to the exact size of the video. Then I can use
ffmpeg or other video editing software to replace the questionable
audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect but have yet to confirm that the system timestamp of the start time may be recorded in both audio captured with something like Audacity, or the USB capture tool from the sound board as well as the HDMI mpeg-2 video. I tried ffprobe on a couple audacity captured .wav files but didn't see anything in the results about such a time code, but perhaps other audio formats or other probing tools may include this info. Can anyone advise if this is common with any particular capture tools or file formats?
if so, I think I could get best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably directly from the two sources in one ffmpeg call. This is all theoretical for me right now-- I've never tried either of these filters yet-- just trying to optimize against blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherant delays. Possibly these delays will be found to be nearly constant and the approach can work with a built-in constant anticipation delay but sounds messy and less reliable than idea 1. Still, I'd take it, if it turns out reasonably reliable
Are there any ffmpeg or general digital audio experts out there that know of particular filters that can be employed on the actual data to look for similarities like normalizing the peak amplitudes or normalizing the amplification of the two to some RMS value and then stepping through a short 10 second snippet of audio, moving one time stream .01s left against the other repeatedly and subtracting the two and looking for a minimum? Sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. But I have only rudimentary knowledge of audio streams and perhaps what I suggest is just not plausible-- but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore if I can make it work.
any hardware level suggestions to take a line level output down to a mic level input and not have the problems I am seeing using a simple in-line impedance drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestinons!

Detect if video file contains movement

I have a bunch of video clips from a webcam (duration is 5, 10, 60 seconds), and I'm looking for a way to detect "does this video clip have movement", to decide whether the file should be saved or discarded in a future processing phase.
I've looked into motion and OpenCV, but motion seems to only want to work on the raw video stream, and OpenCV seems to be way too advanced for my use.
My ideal solution would be a linux command-line tool that I can feed video files into, and get a simple "does/doesn't contain movement" answer back, so I can discard the irrelevant files. False positives (in a reasonable quantity) are perfectly acceptable for my use.
Does such a tool exist? Or any simple examples of doing this with other tools?
You can check dvr-scan which is simple cross-platform command line tool based on OpenCV.
To just list motion events in csv format (scan only):
dvr-scan -i some_video.mp4 -so
To extract motion in single video:
dvr-scan -i some_video.mp4 -o some_video_motion_only.avi
For more examples and various other parameters see:
https://dvr-scan.readthedocs.io/en/latest/guide/examples/
I had the same problem and wrote the solution: https://github.com/jooray/motion-detection
Should be fairly easy to use from command-line.
If you would like to post-process already-captured video then motion can be useful.
VLC allow you to stream or convert your media for use locally, on your private network, or on the Internet. So an already-captured video can be streamed over HTTP, RTSP, etc. and motion can handle it as a network camera.
Furthermore:
How to Stream using VLC Media Player
If OpenCv is to advanced for you, maybe you should consider something easier which is... SimpleCV (wrapper for OpenCV) "This is computer vision made easy". There is even an example of motion detection using SimpleCV - https://github.com/sightmachine/simplecv-examples/blob/master/code/motion-detection.py Unfortunetely i can't test it(because my OpenCv version isn't compatible with SimpleCV), but generally it looks fine (and isn't complicated) - it just substract previous frame from current and calculate mean of the result. If this value is bigger than some threshold (which most likely you will have to adjust) than we can assume that there were some motion between those 2 frames. Note that setting threshold to 0 is really a bad idea, because always there is some difference between 2 consecuitve frames (changes of lighting, noises, etc).

Recommendations for real-time pixel-level analysis of television (TV) video

[Note: This is a rewrite of an earlier question that was considered inappropriate and closed.]
I need to do some pixel-level analysis of television (TV) video. The exact nature of this analysis is not pertinent, but it basically involves looking at every pixel of every frame of TV video, starting from an MPEG-2 transport stream. The host platform will be server-class, multiprocessor 64-bit Linux machines.
I need a library that can handle the decoding of the transport stream and present me with the image data in real-time. OpenCV and ffmpeg are two libraries that I am considering for this work. OpenCV is appealing because I have heard it has easy to use APIs and rich image analysis support, but I have no experience using it. I have used ffmpeg in the past for extracting video frame data from files for analysis, but it lacks image analysis support (though Intel's IPP can supplement).
In addition to general recommendations for approaches to this problem (excluding the actual image analysis), I have some more specific questions that would help me get started:
Are ffmpeg or OpenCV commonly used in industry as a foundation for real-time
video analysis, or is there something else I should be looking at?
Can OpenCV decode video frames in real time, and still leave enough
CPU left over to do nontrivial image analysis, also in real-time?
Is sufficient to use ffpmeg for MPEG-2 transport stream decoding, or
is it preferable to just use an MPEG-2 decoding library directly (and if so, which one)?
Are there particular pixel formats for the output frames that ffmpeg
or OpenCV is particularly efficient at producing (like RGB, YUV, or YUV422, etc)?
1.
I would definitely recommend OpenCV for "real-time" image analysis. I assume by real-time you are referring to the ability to keep up with TV frame rates (e.g., NTSC (29.97 fps) or PAL (25 fps)). Of course, as mentioned in the comments, it certainly depends on the hardware you have available as well as the image size SD (480p) vs. HD (720p or 1080p). FFmpeg certainly has its quirks, but you would be hard pressed to find a better free alternative. Its power and flexibility quite impressive; I'm sure that is one of the reasons that the OpenCV developers decided to use it as the back-end for video decoding/encoding with OpenCV.
2.
I have not seen issues with high-latency while using OpenCV for decoding. How much latency can your system have? If you need to increase performance, consider using separate threads for capture/decoding and image analysis. Since you mentioned having multi-processor systems, this should take greater advantage of your processing capabilities. I would definitely recommend using the latest Intel Core-i7 (or possibly the Xeon equivalent) architecture as this will give you the best performance available today.
I have used OpenCV on several embedded systems, so I'm quite familiar with your desire for peak performance. I have found many times that it was unnecessary to process a full frame image (especially when trying to determine masks). I would highly recommend down-sampling the images if you are having difficultly processing your acquired video streams. This can sometimes instantly give you a 4-8X speedup (depending on your down-sample factor). Also on the performance front, I would definitely recommend using Intel's IPP. Since OpenCV was originally an Intel project, IPP and OpenCV blend very well together.
Finally, because image-processing is one of those "embarrassingly parallel" problem fields don't forget about the possibility of using GPUs as a hardware accelerator for your problems if needed. OpenCV has been doing a lot of work on this area as of late, so you should have those tools available to you if needed.
3.
I think FFmpeg would be a good starting point; most of the alternatives I can think of (Handbrake, mencoder, etc.) tend to use ffmpeg as a backend, but it looks like you could probably roll your own with IPP's Video Coding library if you wanted to.
4.
OpenCV's internal representation of colors is BGR unless you use something like cvtColor to convert it. If you would like to see a list of the pixel formats that are supported by FFmpeg, you can run
ffmpeg -pix_fmts
to see what it can input and output.
For the 4th question only:
video streams are encoded in a 422 format: YUV, YUV422, YCbCr, etc. Converting them to BGR and back (for re-encoding) eats up lots of time. So if you can write your algorithms to run on YUV you'll get an instant performance boost.
Note 1. While OpenCV natively supports BGR images, you can make it process YUV, with some care and knowledge about its internals.
By example, if you want to detect some people in the video, just take the upper half of the decoded video buffer (it contains the grayscale representation of the image) and process it.
Note 2. If you want to access the YUV image in opencv, you must use ffmpeg API directly in your app. OpenCV force the conversion from YUV to BGR in its VideoCapture API.

Resources