codec encoding video efficiency test in webrtc - browser

I would like to build automatic tool that will test encoding efficiency on enter website. I'm using webrtc for video streaming and some users with weaker hardware experiencing problems with encoding efficiency. So ideally I would like to measure how the device behaves respectively with vp9, vp8 and h.264 encoding and choose that which will ensure smmoth stream and highest quality. I build simple tool that measures frames per second and resolution in each codec, but it seems that it is not the best measurement - webrtc manages resolution on it's own, sometimes keeps lowers resolution and more stable fps, but it is not always the case, it's quite unpredictable on older computers. I have another idea to make some benchmark, but I'm not sure if it's possible to make meaningful measurement in a few seconds (I don't want user wait too long). And it's problematic to measure cpu usage from website. What's the best approach for this? I would be grateful for some advices, I'm not very advanced in webrtc and maybe I'm missing something.

Related

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I have a video capture of HD content via HDMI with audio from a sound board that goes through a impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively I can capture the audio via USB from the soundboard which is probably the best plan, but carries with it the same issue.
The point is that the line in or usb capture will be much higher quality than the one on HDMI because the line out -> impedance change -> mic in path generates inferior quality in that simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera captured sound and load each into
audacity and pretty quickly use the timeshift toot to perfectly fit
the good audio to the questionable audio from the HDMI capture and
cut the good audio to the exact size of the video. Then I can use
ffmpeg or other video editing software to replace the questionable
audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect but have yet to confirm that the system timestamp of the start time may be recorded in both audio captured with something like Audacity, or the USB capture tool from the sound board as well as the HDMI mpeg-2 video. I tried ffprobe on a couple audacity captured .wav files but didn't see anything in the results about such a time code, but perhaps other audio formats or other probing tools may include this info. Can anyone advise if this is common with any particular capture tools or file formats?
if so, I think I could get best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably directly from the two sources in one ffmpeg call. This is all theoretical for me right now-- I've never tried either of these filters yet-- just trying to optimize against blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherant delays. Possibly these delays will be found to be nearly constant and the approach can work with a built-in constant anticipation delay but sounds messy and less reliable than idea 1. Still, I'd take it, if it turns out reasonably reliable
Are there any ffmpeg or general digital audio experts out there that know of particular filters that can be employed on the actual data to look for similarities like normalizing the peak amplitudes or normalizing the amplification of the two to some RMS value and then stepping through a short 10 second snippet of audio, moving one time stream .01s left against the other repeatedly and subtracting the two and looking for a minimum? Sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. But I have only rudimentary knowledge of audio streams and perhaps what I suggest is just not plausible-- but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore if I can make it work.
any hardware level suggestions to take a line level output down to a mic level input and not have the problems I am seeing using a simple in-line impedance drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestinons!

Determining the quality of mp3 audio streams

I have built a source client using Portaudio and LAME which streams the microphone input to an Icecast server to be listened to online via the HTML5 tag. I have managed to (supposedly) get the quality of the stream to MP3 320kbps at 44.1kHz and am looking for a way to confirm this using tests and or benchmarks.
I have an indication that these stats are somewhat correct from looking at stream inspectors in software such as iTunes and VLC, but I am looking to get a more in-depth data set.
What I basically want is to be able to test how much of the original file is being lost over the stream and if or how much the quality changes depending on environmental conditions of the broadcaster or streamer.
Does anyone know of any tools, frameworks to get some hard numbers or representations of this data?
If VLC tells you the stream is 320kbit CBR, then it is.
It sounds like what you're looking for is a comparison of the actual audio content. This is highly subjective. MP3 is built to use features of how our hearing works to save bandwidth. For example, quiet sounds are masked by loud sounds. High frequencies are harder to hear and are simply rolled off.
You can compare the spectral analysis between the original PCM-sampled waveform and the MP3 decoded waveform, but this doesn't tell you how humans interpret that sound. For that, you would have to survey humans.

How to play a wav file on high frequency?

Do you have any idea on converting my wav audio data to high frequency playback.
I am creating a module that plays a melody wav file in 16-20khz frequency.
any idea?
-tong
There are several different ways of doing this and it will depend whether you have any temporal information that you want to preserve, or whether you can just effectively increase the sample rate so that the frequency is scaled up (and the duration scaled down).
You didn't mention anything about OS, programming language, sound API, etc, so it's impossible to give specific solutions, but if your sound API allows you to change the playback rate than you may just be able to scale this up by a suitable factor. If not then you will need to resample the sound to achieve the same effect. There are plenty of third party libraries out there for resampling (upsampling/downsampling).
If you need to preserve temporal information then things get a little trickier and you'll need to look at pitch shifting algorithms, e.g. phase vocoder.

Secure streaming video with dynamic watermark

What are some scalable and secure ways to provide a streaming video to a recipient with their name overlayed as a watermark?
Some of the comments here are very good. Using libavfilter is probably a good place to start. Watermarking every frame is going to be very expensive because it requires decoding and re-encoding the entire video for each viewer.
One idea I'd like to expand upon is watermarking only portions of the video. I will assume you're working with h.264 video, which requires far more CPU cycles to decode and encode than older codecs. I think per cpu core you could mark 1 or 2 stream in real time. If you can reduce your requirements to 10 seconds marked out of 100, then you're talking about 10-20 per core, so about 100 per server. It's probably not the performance you're looking for.
I think some companies sell watermarking hardware for TV operators, but I doubt it's any cheaper than a rack of servers and far less flexible.
I think you want to use the ffmpeg libavfilter library. Basically it allows you to overlay an image on top of a video. There is an example showing how to insert a transparent PNG logo in the bottom left corner of the input. You can interface with the library from C++ or from a shell on a command line basis.
In older versions of ffmpeg you will need to use a extension library called watermark.so, often located in /usr/lib/vhook/watermark.so
Depending on what your content is, you may want to consider using invisible digital watermarking as well. It embeds a digital sequence into your video which is not visually detectable. Even if someone were to remove the visible watermark, the invisible watermark would still remain. If a user were to redistribute your video, invisible watermarking would indicate the source of the redistribution.
Of course there are also companies which provide video content management, but I get the sense you want to do this yourself. Doing the watermarking real time is going to be very resource intensive, especialy as you scale up. I would look to do some type of predicitive watermarking.

Best Voice Compression Algorithms/Formats

We have some raw voice audio that we need to distribute over the internet. We need decent quality, but it doesn't need to be of musical quality. Our main concern is usability by the consumer (i.e. what and where they can play it) and size of the download. My experience has shown that mp3s do not produce the best compression numbers for voice audio, but I am at a loss for what the best alternatives are. Ultimately we would like to automate the conversion process to allow the consumer to choose the quality vs. size level that they would like.
You should give Opus a try. Example compression command line:
ffmpeg -i x.wav -b:a 32k x.opus
Start here.
As you rightly point out, voice compression is different from general audio compression. You'll find many codecs dedicated to telephony applications, ranging from PCM and ADPCM through later packet based encodings such as CELP used on GSM cellular networks.
Still, VOIP voice encoding is slightly different from that due to the medium used. you can find a good, free (unencumbered and open source (BSD)) library for speech encoding/decoding in the Speex software library.
Again, which you choose depends on the speech you're encoding and the medium it's being transmitted over. Also note that many libraries have several algorithms they can use depending on the circumstances, and some will even switch on the fly based on conditions of the sound and network.
To get more help, narrow your question down.
-Adam
The most frequently used compression formats used in live voice audio (like VoIP telephony) are μ-Law (mu-Law/u-Law is used in the US) and a-Law (used in Europe, etc.) which, unlike Uncompressed PCM, don't support as wide of a frequency range (a smaller range of possible values ignores sounds outside of the necessary spectrum and requires less space to store).
For usability sake it is easiest to use mpeg compressions (mp2/3/4) for streaming to standard media players as the algorithms are readily available and typically quite fast and almost all media players should support it, but for voice you might try to specify a lower bitrate or do your conversion from a lower quality file in the first place (WAV can be at several sampling rates and voice requires a much lower sampling rate than music or effects, it's basically like frame-per-second on video). Alternatively you can use Real Media, WMA or other proprietary formats, but this would limit usability since the users would require specific third party software for playback, though WMA has an excellent compression ratio as well as compression options specific to voice audio.
Assuming your users will be running Windows, there is a WMA speech compression codec that you can use with the Windows Media Encoder SDK. Failing that, you can use ACM to use something like G723/G728, ADPCM, mu-law or a-law, some of which are installed as standard on Windows XP & above. These can be packaged inside WAV files. You'll need to experiment a little to find the right bitrate/quality (probably don't bother with mu-law or a-law). With voice data you can get away with quite low sample rates - e.g. 16000 or 8000, as there isn't much above 4Khz in the human spoken voice.
I think AMR is one of the best speech codecs. I was using it about a year ago and I remember that quality was very good and size levels were rather small.
One drawback, especially in your case is that, as far as I know, it isn't supported by wide range of media players. QuickTime and RealPlayer are two which I know to play .amr files.
Try speex ... unencumbered by patents, good performance both sizewise and CPU-wise. I've been having good luck using it on iPhone.

Resources