gstreamer convert audio/mpeg to audio/x-raw

What gstreamer element(s) will convert an input of audio/mpeg into audio/x-raw?
Background
I am trying to grok how to assemble gstreamer pipelines to extract some audio from an MPEG/TS stream and save it in a wav file.
I can save the audio from a transport stream in MPEG audio format using:
gst-launch-1.0 udpsrc port=1235 caps="application/x-rtp" ! rtpjitterbuffer \
! rtpmp2tdepay ! tsdemux program-number=4352 \
! mpegaudioparse ! queue ! filesink location=audio.mp2
>mediainfo audio.mpg
General
Complete name : audio.mpg
Format : MPEG Audio
File size : 169 KiB
Duration : 5 s 400 ms
Overall bit rate mode : Constant
Overall bit rate : 256 kb/s
FileExtension_Invalid : m1a mpa1 mp1 m2a mpa2 mp2 mp3
Audio
Format : MPEG Audio
Format version : Version 1
Format profile : Layer 2
Duration : 5 s 400 ms
Bit rate mode : Constant
Bit rate : 256 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Frame rate : 41.667 FPS (1152 SPF)
Compression mode : Lossy
Stream size : 169 KiB (100%)
But I can't quite figure out how to convert the mpeg audio into x-raw/PCM/wav for further manipulation either as part of the original pipeline or via a new one.
To my mind it should be something like:
gst-launch-1.0 filesrc location=audio.mp2 ! audio/mpeg ! audioconvert ! wavenc ! filesink location=audio.wav
But audioconvert expects audio/x-raw so this fails with:
WARNING: erroneous pipeline: could not link filesrc0 to audioconvert0, audioconvert0 can't handle caps audio/mpeg
It's not clear to me which elements can accept audio/mpeg, or how to find them. gst-inspect tells you what a given plugin does, but I need a way to list the plugins that have a given src or sink type.
gst-inspect shows that wavparse can produce audio/mpeg and that mad can convert it to mp3, neither of which is helpful.
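One brute-force workaround is a shell loop over everything gst-inspect-1.0 lists. This is just a sketch, and it is slow and imprecise: the grep matches audio/mpeg anywhere in an element's caps, src as well as sink, so the results still need manual checking:
gst-inspect-1.0 | awk -F': +' 'NF>=3 {print $2}' | sort -u | while read -r e; do
  # print the element name if audio/mpeg appears anywhere in its caps
  gst-inspect-1.0 "$e" 2>/dev/null | grep -q 'audio/mpeg' && echo "$e"
done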
I am also working under the assumption that a good way to design a gstreamer pipeline is to use gst-launch to quickly create a command line that does the right thing and then translate that into C++. However, most of the documentation and questions here seem to start directly from the C++ instead. Am I missing a trick somewhere?

A plugin for MPEG audio that works is mpg123audiodec, which uses libmpg123.
See https://gstreamer.freedesktop.org/documentation/plugins.html
>gst-inspect-1.0 mpg123audiodec
[snip]
Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      audio/mpeg
         mpegversion: { 1 }
         layer: [ 1, 3 ]
         rate: { 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000 }
         channels: [ 1, 2 ]
         parsed: true
[snip]
This decoder was recently moved from the set of 'ugly' plugins to the 'good' plugins:
gst-plugins-ugly-1.14.0/NEWS
Plugin and library moves
MPEG-1 audio (mp1, mp2, mp3) decoders and encoders moved to -good
Following the expiration of the last remaining mp3 patents in most
jurisdictions, and the termination of the mp3 licensing program, as well
as the decision by certain distros to officially start shipping full mp3
decoding and encoding support, these plugins should now no longer be
problematic for most distributors and have therefore been moved from
-ugly and -bad to gst-plugins-good. Distributors can still disable these
plugins if desired.
In particular these are:
- mpg123audiodec: an mp1/mp2/mp3 audio decoder using libmpg123
- lamemp3enc: an mp3 encoder using LAME
- twolamemp2enc: an mp2 encoder using TwoLAME
A command line for using this is:
gst-launch-1.0 filesrc location=audio.mp2 ! mpegaudioparse ! mpg123audiodec ! wavenc ! filesink location=audio.wav
(mpg123audiodec requires parsed input, per the parsed: true sink caps above, so mpegaudioparse goes in front of it rather than a bare audio/mpeg capsfilter.)
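If you want to skip the intermediate file, the same decode step should also slot into the original capture pipeline. An untested sketch combining the commands above:
gst-launch-1.0 udpsrc port=1235 caps="application/x-rtp" ! rtpjitterbuffer \
! rtpmp2tdepay ! tsdemux program-number=4352 \
! mpegaudioparse ! mpg123audiodec ! audioconvert ! queue ! wavenc ! filesink location=audio.wav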

Related

Audio data format being rejected in Speech Studio

I'm uploading a zip file of audio data to a Custom Speech project in Speech Studio. However, the files are being rejected after upload.
I've tried sox and ffmpeg to do the file conversion. The output of sox matches the requirements on the doc pages. I don't understand why the files are being rejected.
sox.exe" --i audio1.wav
Input File : 'audio1.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:02.27 = 36320 samples ~ 170.25 CDDA sectors
File Size : 72.7k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
I zip up the file and upload it. I believe this matches with the requirements below.
File format: RIFF (WAV)
Sample rate: 8,000 Hz or 16,000 Hz
Channels: 1 (mono)
Maximum length per audio: 2 hours
Sample format: PCM, 16-bit
Archive format: .zip
Maximum archive size: 2 GB
The UI displays "Failed to upload data. Please check your data format and try to upload again."
I can only believe that there's an issue with the service.
I have little experience with sox, but with ffmpeg you can use something like:
ffmpeg.exe -i input.wav -ac 1 -ar 16000 output.wav
You can find ffmpeg here: https://www.ffmpeg.org/
It is free.
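If you would rather stay with sox, something like this should produce the documented format in one step (untested sketch; the filenames are placeholders):
sox input.wav -r 16000 -c 1 -b 16 -e signed-integer output.wav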
Hope this helps.

How to fix image problems when streaming h.264 via gstreamer udpsink

Using gstreamer I want to stream images from several Logitech C920 webcams to a Janus media server in RTP/h.264 format. The webcams produce h.264 encoded video streams, so I can send the streams to a UDP sink without re-encoding data, only payloading it.
I'm using the gst-interpipe plugin to switch between the different webcams, so that the video stream received by Janus stays the same, but with images coming from whatever webcam I choose.
It works but I'm experiencing some problems with broken frames where the colors are gray and details are blurred away, mainly the first 5 - 10 seconds after I switch between webcam source streams. After that the images correct themselves.
(Screenshots: the broken first frames after a switch, and the corrected image after 5 - 10 seconds or more.)
First I thought it was a gst-interpipe specific problem, but I can reproduce it by simply setting up two pipelines - one sending a video stream to a UDP sink and one reading from a UDP source:
gst-launch-1.0 -v -e v4l2src device=/dev/video0 ! queue \
! video/x-h264,width=1280,height=720,framerate=30/1 \
! rtph264pay config-interval=1 ! udpsink host=127.0.0.1 port=8004
gst-launch-1.0 -v udpsrc port=8004 caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=H264, payload=96" \
! rtph264depay ! decodebin ! videoconvert ! xvimagesink
NB: I'm not experiencing this problem if I send the video stream directly to an xvimagesink, i.e. when not using UDP streaming.
Am I missing some important parameters in my pipelines? Is this a buffering issue? I really have no idea how to correct this.
Any help is greatly appreciated.
Due to the temporal dependencies in video streams, you cannot just tune into a stream and expect it to be decodable immediately. Correct decoding can only start at a Random Access Point frame (e.g. an I- or IDR-frame). Before that, the decoder receives image data that relies on video frames you haven't received, so the output looks broken.
Some decoders offer control over what to do in these cases. avdec_h264 (from gst-libav), for example, has an output-corrupt option (though I don't know how it behaves for frames that are themselves intact but are missing their reference frames). Decoders may also have options to skip everything until a RAP frame occurs; this depends on your specific decoder implementation. Note, however, that with any of these options the initial delay before you see any image will increase.
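A sketch of the receiving pipeline with that option (untested; it assumes gst-libav's avdec_h264 is installed and exposes output-corrupt):
gst-launch-1.0 -v udpsrc port=8004 caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=H264, payload=96" \
! rtph264depay ! h264parse ! avdec_h264 output-corrupt=false ! videoconvert ! xvimagesink
Shortening the encoder's keyframe interval, if the camera allows it, would also shorten the broken period after each switch.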

Audio format where silence would not affect file size

I'm looking for an audio format where a silence of a couple of hours at the beginning does not affect the overall file size. Does anyone have an idea which one to use, and what settings I need? I have tried m4a, ogg and mp3 so far with no luck. An audio sample with 4 hours of silence at the beginning leads to a 400 MB file in some formats.
Of course, dealing with it programmatically would be the more sensible and SO way, something like SoX and its silence/pad effects. After all, any bit of silence is identical to any other bit of silence; trying to compress it is a waste of effort.
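For example, you could store only the audible part and generate the leading silence on demand with SoX's pad effect (a sketch; the filenames are placeholders, and 4 hours = 14400 seconds):
sox voice.wav padded.wav pad 14400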
Having said that, I was a little curious about this myself so I had a go at comparing how well the different codecs fared at compressing pure digital silence.
I created two test files. The first was a 44.1 kHz, 16-bit, 30-minute stereo WAVE file containing uncorrelated brown noise at -10.66 dBFS RMS. The second file was the same, except padded with 210 minutes of silence, making the total duration 240 minutes (4 hours). Next, I encoded the files with various lossy and lossless codecs and looked at the size difference between the padded and unpadded files to gauge how efficiently the silence was encoded.
codec     noise (MB)  noise+silence (MB)  diff (MB)  ratio
wav            317.5             2540.0     2222.5    8.0
he-aac          14.6              116.5      101.9    8.0
vorbis          36.4              237.1      200.7    6.5
mp3             38.2              217.2      179.0    5.7
opus            27.0               81.6       54.6    3.0
tta            213.8              544.1      330.3    2.5
aac             54.0              131.7       77.7    2.4
wv             211.3              444.1      232.8    2.1
alac           212.5              393.7      181.2    1.9
flac           211.5              404.8      193.3    1.9
als            209.7              384.2      174.5    1.8
ofr            209.3              356.9      147.6    1.7
Codecs used:
Lossless
wav: WAVE
tta: True Audio v3.4.1
wv: WavPack v4.80.0 (wavpack -x)
alac: Apple Lossless
ofr: OptimFROG v5.100 (ofr --preset 2)
als: MPEG-4 Audio Lossless Coding v23 (mp4alsRM23 -a -b -o50)
flac: Free Lossless Audio Codec v1.3.1 (flac -8)
Lossy vbr
mp3: LAME MP3 v3.99.5 (lame -h -V2)
opus: Opus v1.1.2 (opusenc --bitrate 128 --framesize 40)
aac: Advanced Audio Codec v2.0 (afconvert -f 'm4af' -d aac -q 127 -s 3 -u vbrq 100)
vorbis: Vorbis aoTuV b5.5 (oggenc -q 5)
Lossy cbr
he-aac: High-Efficiency AAC v1 (afconvert -f 'm4af' -d aach -q 127 -s 0 -b 64000)
If you encode your audio file in .wav format: according to the "Multimedia Programming Interface and Data Specifications 1.0" (pages 56-60), you can encode, instead of the usual single "data" chunk, a "LIST" chunk of type 'wavl' that alternates "data" and "slnt" chunks. For an interpretation of the obscure (and buggy) specification, refer to the Wikipedia page on the WAV format.
I'm not sure whether this helps, but if the size causes problems in storage or transfer, you can simply ZIP the wav file and voilà! All the empty bytes disappear.
For usage you have to unpack it again, though.
You might consider hacking the encoder to "pause" when it encounters more than a second or so of silence. Any of the codecs out there can be hacked to do this, though you will need to understand how they work before starting on changes like that...
Another option is to pipe the output of an MP3 encoder through a program that strips out "extra" silent frames. That might be less overall work (though you're still going to have to understand how MP3 framing & the Layer III bit reservoir work).

Syncing audio and video when mp4muxing in gst-launch-1.0

I have a Logitech C920 webcam that provides properly formatted h264 video, and a mic hooked up to an ASUS Xonar external USB sound card. I can read both and mux their data into a single file like this:
gst-launch-1.0 -e \
mp4mux name=muxy ! filesink location=/tmp/out.mp4 \
alsasrc device='hw:Device,0' do-timestamp=true ! audio/x-raw,rate=48000 ! audioconvert ! queue ! lamemp3enc ! muxy.audio_0 \
v4l2src do-timestamp=true ! video/x-h264,framerate=30/1,height=720 ! h264parse ! queue ! muxy.video_0
...but then I get poorly synchronized audio/video. The audio flow consistently starts with 250ms of garbage noise, and the resulting mp4 video is 250ms (7 or 8 frames at 30fps) out of sync.
Seems like the sources start simultaneously, but the sound card inserts 250ms of initialization junk every time. Or perhaps, the camera takes 250ms longer to start up but reports an incorrect start of stream flag. Or, maybe the clocks in my devices are out of sync for some reason. I don't know how to figure out the difference between these (and other) potential root causes.
Whatever the cause, I'd like to patch over the symptoms at least. I've been trying to do any of the following in the gstreamer pipeline, any of which would satisfy my requirements:
Cut out the first 250ms of audio
Delay the video by 250ms or 7 frames
Synchronize the audio and video timestamps properly with properties like alsasrc slave-method or v4l2src io-mode
And I'm apparently doing it wrong. Nothing works. No matter what, I always end up with the video running 250ms/7 frames ahead of the audio. Adding the queue elements reportedly fixed the sync issue, and mediainfo now reports Duration values for audio and video within 20ms of each other, which would be acceptable. But that's not how the resulting videos actually behave: I clap my hands, and the noise arrives late.
This can be fixed in post-processing, but why not avoid the hassle and get it right straight from the gst pipeline? I'm all out of tricks and just about ready to fall back to fixing every single video's sync by hand. Any ideas out there?
Thanks for any help, tips, ideas.
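For reference, the post-processing route can be a one-liner with ffmpeg's -itsoffset, which shifts the timestamps of the next input. A sketch (0.25 s is just the estimate above, and the sign may need flipping): the file is read twice, the video stream is taken from the delayed copy, and both streams are remuxed without re-encoding:
ffmpeg -itsoffset 0.25 -i /tmp/out.mp4 -i /tmp/out.mp4 -map 0:v -map 1:a -c copy synced.mp4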

Difficult gstreamer pipeline - Decode/demux h264 file to jpeg on Windows using DirectShow

I've been trying for days to get gstreamer's gst-launch-1.0 to output an h264 stream as individual jpegs, but I want only one per second, using the DirectShow hardware acceleration. I've tried numerous iterations of commands, and this is the closest I've gotten:
gst-launch-1.0 filesrc location=test.h264 ! decodebin ! videorate ! video/x-raw,framerate=1/30 ! jpegenc ! multifilesink location=img%03d.jpg
This gives me 300 jpegs from my 10 second h264 stream, and it doesn't use the DirectShow hardware interface.
I've used gst-inspect to try what I thought was the DirectShow decoder for h264 (video/x-h264), but that gives me errors. I've also tried changing the framerate from 1/30 to 30/1 and 1/1, but I always get the same 30-jpegs-per-second output.
I thought decodebin was supposed to automatically select the best decoder based on the input stream, but it appears to be using a CPU intensive one (instead of GPU hardware-accelerated) judging by how the CPU on my test machine pegs at 100% for the duration of the gstreamer process.
Ideally, I'd also like the jpegs to be output at a different resolution than the resolution of the video, but everything I've tried (width=640,height=480) either causes errors or doesn't result in a resized jpg.
I'm not sure there's an easy way to do this as a pre-baked pipeline. However, you can write a probe that counts frames and drops, say, 29 out of every 30 (e.g. increments a counter modulo 30). I'm guessing you want to generate thumbnails/previews from a video here?
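Alternatively, before writing a probe, it may be worth re-trying videorate with videoconvert in front of it, plus videoscale for the resizing. A sketch (untested, and it does not address the DirectShow decoder selection):
gst-launch-1.0 filesrc location=test.h264 ! decodebin ! videoconvert ! videorate \
! videoscale ! video/x-raw,framerate=1/1,width=640,height=480 \
! jpegenc ! multifilesink location=img%03d.jpg
Note the caps fraction is frames per second: framerate=1/1 requests one frame per second, while framerate=1/30 would be one frame every 30 seconds.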
