How to detect silence in audio files? - audio

I'm working on a tool to edit .srt (subtitle) files within the browser (the tool is to be used for linguistic annotation). In desktop tools that are used for similar purposes, the user has access to the waveform, and can "see" where silences are in the signal, and thus select a particular phrase for transcription.
Such a tool might be buildable in-browser down the road (using Web Workers and Canvas, say), but for now, it's not feasible to do the sort of signal processing it would take to find those silences.
So, I'm thinking about the next-best approach: what free tool could I use to produce a list of timestamps of where silences (below some given threshold) start and stop? If I produce such a list offline and upload it with the audio file, then I can at least make it possible to navigate through the "phrases" (defined as periods of non-silence). I think that would still be a productivity win for the transcription work.
Audacity can sort of do this, but AFAICT, only if you install Nyquist, which seems to have some patent issues.
Are there any alternatives?
It would be nice if the tool could handle as many as possible of ogg, mp3, and wav files.
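To make the "list of timestamps" idea concrete, here is a minimal Python sketch of threshold-based silence detection. It is only an illustration under stated assumptions: the function names, the 0.3-second minimum duration, and the restriction to 16-bit mono WAV are all choices made for this example, not part of any existing tool. ogg and mp3 files would first have to be decoded to WAV by some other program.

```python
import struct
import wave


def find_silences(samples, sample_rate, threshold, min_duration=0.3):
    """Return (start_sec, end_sec) pairs where |sample| stays below threshold."""
    silences = []
    run_start = None  # index where the current quiet run began
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A loud sample ends the quiet run; keep it only if long enough.
            if (i - run_start) / sample_rate >= min_duration:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Close a quiet run that lasts until the end of the audio.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_duration:
        silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences


def read_mono_16bit_wav(path):
    """Read a 16-bit mono PCM WAV file into (list of ints, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        raw = wf.readframes(wf.getnframes())
        samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
        return samples, wf.getframerate()
```

The "phrases" are then the gaps between consecutive silences. A per-sample threshold is crude; a windowed loudness measure would probably be more robust. ffmpeg also ships a silencedetect audio filter (with noise and d options) that reports silence start and end times, which may cover ogg, mp3, and wav without custom code.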

Related

Node.js: Is it possible to extract sub-clips from a broader video file based on start/stop time stamps?

I have a flow where iOS app users will record a large video file and upload it to our server. After the fact, the user might want to extract certain portions of that larger video based on specific time stamps and generate a highlight reel that can be viewed and shared locally back on the iOS device.
As a FE developer I don't really have much experience with where to even start here. Our BE will be built in NodeJS. It seems to me that this should be a relatively straightforward problem to solve, but I don't know.
Are there APIs that make movie manipulation easy? Can I easily extract a clip based on a start and stop time and save that as a separate file? Are those costly tasks? Or not too bad?
I'm guessing that the response to this call would be a list of file names for the generated clips, which the iOS app could then pull down and load.
It's not quite as straightforward as it might seem as video files are quite structured with header information and indexing into the individual video and audio tracks and frames. Any splitting up or cropping needs to allow for this and also create new files with the correct headers and indexing etc.
Fortunately, there are indeed libraries that you can use to do this type of thing, one of the most powerful being ffmpeg.
There are projects which allow the ffmpeg command line tool to be used programmatically - the advantage of this approach is that you get to leverage the vast community knowledge base for the ffmpeg command line.
One of the popular ones for nodejs is:
https://github.com/damianociarla/node-ffmpeg
You can then look at the ffmpeg documentation or community answers to find the particular functionality you need - for example, to cut a video at a start and end time as you asked:
https://stackoverflow.com/a/42827058/334402
https://superuser.com/a/704118
The general idea is quite simple and will be of the format:
ffmpeg -i yourInputVideo.mp4 -ss 01:30:00 -to 02:30:00 -c copy yourNewOutputVideo.mp4
It's worth taking a look at the seeking info in the ffmpeg online documentation (https://ffmpeg.org/ffmpeg.html) to help understand the examples, especially the second one above:
-ss position (input/output)
When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.
When used as an output option (before an output url), decodes but discards input until the timestamps reach position.
position must be a time duration specification, see the Time duration section in the ffmpeg-utils(1) manual.
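As a rough sketch of how a server could drive the command above for several time ranges, the Python below builds the ffmpeg argument list and runs it once per clip. The function names and the clip_NN.mp4 naming scheme are hypothetical, and the same argument list could just as well be passed to a child process from Node. It assumes ffmpeg is installed and on the PATH.

```python
import subprocess


def build_clip_command(src, start, end, dst):
    """Build an ffmpeg argument list that copies [start, end] of src into dst."""
    return [
        "ffmpeg",
        "-i", src,
        "-ss", start,   # output option: seek position
        "-to", end,     # stop position
        "-c", "copy",   # stream copy, no re-encoding
        dst,
    ]


def extract_clips(src, ranges):
    """Run ffmpeg once per (start, end) range and return the output file names."""
    outputs = []
    for n, (start, end) in enumerate(ranges, 1):
        dst = "clip_%02d.mp4" % n  # hypothetical naming scheme
        subprocess.run(build_clip_command(src, start, end, dst), check=True)
        outputs.append(dst)
    return outputs
```

The returned list of file names is the kind of response the question anticipates. Because of stream copy, cut points may land on nearby seek points rather than on the exact timestamps, as the quoted documentation explains.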

Detect if video file contains movement

I have a bunch of video clips from a webcam (duration is 5, 10, 60 seconds), and I'm looking for a way to detect "does this video clip have movement", to decide whether the file should be saved or discarded in a future processing phase.
I've looked into motion and OpenCV, but motion seems to only want to work on the raw video stream, and OpenCV seems to be way too advanced for my use.
My ideal solution would be a linux command-line tool that I can feed video files into, and get a simple "does/doesn't contain movement" answer back, so I can discard the irrelevant files. False positives (in a reasonable quantity) are perfectly acceptable for my use.
Does such a tool exist? Or any simple examples of doing this with other tools?
You can check dvr-scan, which is a simple cross-platform command line tool based on OpenCV.
To just list motion events in csv format (scan only):
dvr-scan -i some_video.mp4 -so
To extract the motion into a single video:
dvr-scan -i some_video.mp4 -o some_video_motion_only.avi
For more examples and various other parameters see:
https://dvr-scan.readthedocs.io/en/latest/guide/examples/
I had the same problem and wrote the solution: https://github.com/jooray/motion-detection
Should be fairly easy to use from command-line.
If you would like to post-process already-captured video then motion can be useful.
VLC allows you to stream or convert your media for use locally, on your private network, or on the Internet. So an already-captured video can be streamed over HTTP, RTSP, etc. and motion can handle it as a network camera.
Furthermore:
How to Stream using VLC Media Player
If OpenCV is too advanced for you, maybe you should consider something easier: SimpleCV (a wrapper for OpenCV), "This is computer vision made easy". There is even an example of motion detection using SimpleCV - https://github.com/sightmachine/simplecv-examples/blob/master/code/motion-detection.py Unfortunately I can't test it (because my OpenCV version isn't compatible with SimpleCV), but generally it looks fine (and isn't complicated): it just subtracts the previous frame from the current one and calculates the mean of the result. If this value is bigger than some threshold (which you will most likely have to adjust), then we can assume that there was some motion between those 2 frames. Note that setting the threshold to 0 is really a bad idea, because there is always some difference between 2 consecutive frames (changes of lighting, noise, etc.).
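The frame-differencing idea described above can be sketched in plain Python. This is not the SimpleCV example itself, just a minimal illustration; it assumes each frame has already been decoded into a flat list of grayscale pixel values, and the function names and the default threshold of 5.0 are made up for this sketch.

```python
def mean_abs_diff(prev_frame, cur_frame):
    """Mean absolute per-pixel difference between two equally sized grayscale frames."""
    if len(prev_frame) != len(cur_frame) or not prev_frame:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev_frame, cur_frame)) / len(prev_frame)


def has_motion(frames, threshold=5.0):
    """Return True if any pair of consecutive frames differs by more than threshold."""
    return any(
        mean_abs_diff(prev, cur) > threshold
        for prev, cur in zip(frames, frames[1:])
    )
```

For example, two frames that differ only by sensor noise of a pixel value or two stay well under the threshold, while an object entering the frame pushes the mean difference far above it. The threshold would need tuning per camera and lighting conditions.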

Converting Audio From Unknown Format

I would like to create a utility in either PHP or Perl to convert an audio file created by Nortel's Callpilot voice mail system into a wave file. The problem is that the format, which has the .vbk file extension, is unknown to virtually any audio player. To date, I have not found one that will play a .vbk file. I've looked at audio file conversion libraries in CPAN and tried many of them, but they don't recognize the file. I was not successful with PHP's audio format manipulation either. Nortel does provide a converter; however, it does not suit my needs. I would like to have this run via cron on a CentOS system. I don't know how to reverse engineer this format. There seem to be just scraps of info on this format on the web. This page indicates that it is "based on the H.232 format":
https://www.odesk.com/o/jobs/job/Reverse-Engineer-Nortel-VBK-Audio-Format_~~f501f11679f3f6bb/
I know this is a very old thread, but I've recently been looking into converting Nortel's vbk format as well. Importing the vbk files into Audacity with the raw data option worked for me with these settings: Encoding: U-Law, Byte order: little-endian, Channels: 1 Channel (Mono), Sample rate: 8000 Hz. Not sure if they have multiple formats for their vbk files, but mine were from a BCM50 phone system.
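If the Audacity settings above do apply to your files, the same conversion could be scripted for a cron job. The Python sketch below decodes G.711 mu-law bytes to 16-bit PCM and writes a mono 8000 Hz WAV. It assumes the whole .vbk file is headerless mu-law data, which is only an inference from the raw-import settings; any header bytes would come out as a short burst of noise. The function names are hypothetical.

```python
import wave


def ulaw_to_linear(byte):
    """Decode one G.711 mu-law byte to a signed 16-bit PCM value."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_bytes_to_pcm(data):
    """Decode raw mu-law bytes to little-endian 16-bit PCM bytes."""
    return b"".join(
        ulaw_to_linear(b).to_bytes(2, "little", signed=True) for b in data
    )


def convert_vbk_to_wav(src_path, dst_path, sample_rate=8000):
    """Treat src_path as headerless mu-law audio and write it as a mono WAV."""
    with open(src_path, "rb") as f:
        pcm = ulaw_bytes_to_pcm(f.read())
    with wave.open(dst_path, "wb") as wf:
        wf.setnchannels(1)       # mono, per the Audacity settings
        wf.setsampwidth(2)       # 16-bit output samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
```

A call such as convert_vbk_to_wav("message.vbk", "message.wav") would then produce a file that ordinary players can open, provided the format assumption holds.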
Well, this is the joy of closed proprietary systems. But there is a chance they could play nice. Try to contact Callpilot and see if they'll give you the format specs. It's worth a shot.
As for reverse engineering, you need to be able to generate known content. Like a constant tone at 60Hz for exactly 1 second. Then at 50Hz. Then at 10 seconds. Compare them. Isolate the data from the metadata. There is going to be compression involved, so try a handful of common compression schemes; researching Nortel's practices will probably tell you more. If you can feed that into a player and get a tone back out, you're on your way.
There are probably more informed and structured ways to go about reverse engineering, but from my experience it's a lot of trial and error.

Secure streaming video with dynamic watermark

What are some scalable and secure ways to provide a streaming video to a recipient with their name overlaid as a watermark?
Some of the comments here are very good. Using libavfilter is probably a good place to start. Watermarking every frame is going to be very expensive because it requires decoding and re-encoding the entire video for each viewer.
One idea I'd like to expand upon is watermarking only portions of the video. I will assume you're working with h.264 video, which requires far more CPU cycles to decode and encode than older codecs. I think per CPU core you could mark 1 or 2 streams in real time. If you can reduce your requirements to 10 seconds marked out of 100, then you're talking about 10-20 streams per core, so about 100 per server. It's probably not the performance you're looking for.
I think some companies sell watermarking hardware for TV operators, but I doubt it's any cheaper than a rack of servers and far less flexible.
I think you want to use the ffmpeg libavfilter library. Basically it allows you to overlay an image on top of a video. There is an example showing how to insert a transparent PNG logo in the bottom left corner of the input. You can interface with the library from C++ or from a shell on a command line basis.
In older versions of ffmpeg you will need to use an extension library called watermark.so, often located in /usr/lib/vhook/watermark.so
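For the command-line route with a recent ffmpeg, the overlay filter can place a transparent PNG in the bottom-left corner. The Python sketch below only builds the argument list; the function name, the 10-pixel margin, and the idea of generating one PNG per viewer containing their name are assumptions for illustration.

```python
def build_overlay_command(video, logo, dst, margin=10):
    """Build an ffmpeg argument list that overlays logo in the bottom-left corner."""
    # main_h and overlay_h are the heights of the video and of the logo image.
    position = "overlay=%d:main_h-overlay_h-%d" % (margin, margin)
    return [
        "ffmpeg",
        "-i", video,
        "-i", logo,
        "-filter_complex", position,
        "-c:a", "copy",  # keep the audio as-is; the video is re-encoded
        dst,
    ]
```

Passing the result to subprocess.run(..., check=True) would run the job. Since the video stream is re-encoded, this carries the full per-viewer cost discussed above.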
Depending on what your content is, you may want to consider using invisible digital watermarking as well. It embeds a digital sequence into your video which is not visually detectable. Even if someone were to remove the visible watermark, the invisible watermark would still remain. If a user were to redistribute your video, invisible watermarking would indicate the source of the redistribution.
Of course there are also companies which provide video content management, but I get the sense you want to do this yourself. Doing the watermarking in real time is going to be very resource intensive, especially as you scale up. I would look to do some type of predictive watermarking.

how to sync a midi file with an audio file

I want to take a classical music piece as an .mp3 file (or another audio format if necessary) and the same piece as a *.midi file. Then I want to synchronize them so that only the midi file changes: the timing of its beats would be aligned with the .mp3. So, let's say, if I played them both at the same time, they would play the same notes in sync.
How can I do so?
(I have cubase if the answer might be there...)
It's a tough task because general beat-tracking (following tempo changes) hasn't yet been figured out.
There's at least one tool that does work for matching an audio file to a midi file, assuming the audio file is almost identical to the midi file in terms of the score. But I can't remember its name, and I have never used it. The place to ask is the Music Information Retrieval community of scientists:
http://listes.ircam.fr/wws/info/music-ir
For manual matching, you can use modern DAWs like Logic, Pro Tools, etc. They provide reasonably nice tools to build a detailed tempo map of the audio file, and the MIDI file would then line right up with it, but it's a tedious task. You'll likely need tempo changes more often than every measure to get a nice alignment - it will be style-dependent.
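The arithmetic behind such a tempo map is simple, and a small Python sketch may help. It assumes you have marked a few anchor points by hand as (beat number, time in seconds in the audio), and that one beat equals one quarter note; the function names are hypothetical. MIDI files store tempo as microseconds per quarter note, so 120 BPM corresponds to 500000.

```python
def segment_bpm(beat_a, time_a, beat_b, time_b):
    """Tempo in BPM between two (beat number, audio time in seconds) anchors."""
    if time_b <= time_a:
        raise ValueError("anchors must be in increasing time order")
    return 60.0 * (beat_b - beat_a) / (time_b - time_a)


def bpm_to_midi_tempo(bpm):
    """MIDI set-tempo value: microseconds per quarter note."""
    return round(60_000_000 / bpm)


def tempo_map(anchors):
    """Turn [(beat, seconds), ...] into [(start_beat, bpm, midi_tempo), ...]."""
    result = []
    for (b0, t0), (b1, t1) in zip(anchors, anchors[1:]):
        bpm = segment_bpm(b0, t0, b1, t1)
        result.append((b0, bpm, bpm_to_midi_tempo(bpm)))
    return result
```

Each entry gives the beat at which a tempo change would be inserted into the MIDI file. The more anchors you mark, the closer the alignment, which is why this gets tedious for expressive classical performances.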
You could use tools that already exist. For example, if you know the tempo of the mp3, then you could use this page to change the tempo on the midi file.
