How to combine JPEG frames with uncompressed mono audio into an H.264 stream or any other format processed by web browsers out of the box?

So I have an ESP32 which captures images and sound. The esp32-camera library already returns a JPEG-encoded buffer. The audio, however, is uncompressed and is just a digital representation of signal strength at a high sample rate.
I use the ESP32 to host a webpage which contains an <img> element and a JavaScript snippet that constantly sends GET requests to a separate URL for image data and updates the element. This approach is not very good, especially now that I've added audio capabilities to the circuit.
I'm curious if it would be possible to combine JPEG-encoded frames and some audio data into a chunk of H.264 and then send it directly as a response to a GET request, making it a stream?
This would not only simplify the whole business of serving multiple endpoints, but also remove the issues of syncing audio and video when they are sent separately.
In particular, I'm also curious how easy it would be to do on the ESP32, since it doesn't have a whole lot of RAM or computational power. It would also be challenging to find or port large libraries that could help, so I guess I would have to code it myself.
I'm also not sure if H.264 is the best option. I know it's supported by most browsers out of the box and uses JPEG compression behind the scenes for the frames, but perhaps a simpler format exists which is also widely supported.
So to sum it up: is H.264 the best bet in the provided context? Is combining JPEG and uncompressed mono audio into H.264 possible in the provided context? If the answer to either of the previous questions is no, what alternatives do I have, if any?

I'm curious if it would be possible to combine JPEG-encoded frames and some audio data into a chunk of H.264 and then send it directly as a response to a GET request, making it a stream?
H.264 is a video codec. It doesn't have anything to do with audio.
I know it's supported by most browsers out of the box and uses JPEG compression behind the scenes for the frames
No, this isn't true. H.264 is its own thing. It's far more powerful than JPEG and is specifically designed for motion, whereas JPEG was not.
You need a few things:
A video codec, to efficiently handle your frames. Most of these embedded camera libraries can give you an MJPEG stream. I'd use that if possible. I don't think your ESP32 has other video encoding capability, does it? H.264 is a good choice, but only if you can actually encode it.
A container format, to aid in streaming your audio and video streams together. ISOBMFF/MP4 is common, as is WebM/Matroska.
If you're only streaming to a single client (which seems likely given the limited horsepower of the board), and if you have enough capability to do the audio/video encoding, you can generate a WebM stream on the fly that is directly playable in a <video> element. This seems to be exactly what you are asking for.
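If H.264 encoding turns out to be out of reach, the MJPEG option above is the usual fallback on this board. As a rough illustration, here is a minimal sketch (assuming the ESP-IDF HTTP server and the esp32-camera API) of pushing the JPEG frames you already have as a multipart/x-mixed-replace response; note that this carries video only, so the audio would still need to be muxed into a container such as WebM if you want a single synced stream:

```c
#include <stdio.h>
#include <string.h>
#include "esp_camera.h"
#include "esp_http_server.h"

/* Sketch of the MJPEG fallback: push the JPEG frames the esp32-camera
 * library already produces as a multipart/x-mixed-replace response, which
 * browsers render as a live picture in a plain <img> tag. Video only --
 * audio is not carried here. Error handling is kept minimal. */
static esp_err_t mjpeg_handler(httpd_req_t *req)
{
    static const char *BOUNDARY = "\r\n--frame\r\n";

    httpd_resp_set_type(req, "multipart/x-mixed-replace; boundary=frame");

    while (1) {
        camera_fb_t *fb = esp_camera_fb_get();       /* grab the next JPEG frame */
        if (!fb)
            return ESP_FAIL;

        char part_hdr[64];
        int  hdr_len = snprintf(part_hdr, sizeof part_hdr,
                                "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n",
                                (unsigned)fb->len);

        /* Send boundary, part header, then the JPEG buffer itself. */
        esp_err_t res = httpd_resp_send_chunk(req, BOUNDARY, strlen(BOUNDARY));
        if (res == ESP_OK) res = httpd_resp_send_chunk(req, part_hdr, hdr_len);
        if (res == ESP_OK) res = httpd_resp_send_chunk(req, (const char *)fb->buf, fb->len);

        esp_camera_fb_return(fb);                    /* hand the frame buffer back */
        if (res != ESP_OK)
            break;                                   /* client disconnected */
    }
    return ESP_OK;
}
```

Register the handler on its own URI and point the <img> at it; the browser keeps the connection open and swaps in each new frame as it arrives.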

Related

How to play audio from an RTMP stream?

I'm working with RTMP. I have captured RTMP packets in Wireshark. I know how to assemble and play the video data, but I don't know how to play the audio. Wireshark tells me the data is AAC, but I don't get how I can play it. Do I need to wrap it in a container?
AAC can be played without a container, but every frame must have an ADTS header (Google can explain that part to you). To convert from raw frames to ADTS, you must get the sequence header (the AudioSpecificConfig) from the start of the stream and use it to build an ADTS header for each frame.
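For illustration, here is a minimal sketch of building the 7-byte ADTS header that gets prepended to each raw AAC frame. The profile, sample-rate index and channel configuration would normally be derived from that sequence header; the example values named below (AAC-LC, 44.1 kHz, stereo) are assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal sketch: build a 7-byte ADTS header (no CRC) for one raw AAC frame.
 * profile, sf_index and channel_cfg normally come from the AudioSpecificConfig
 * (the "sequence header" at the start of the stream). Example values:
 * profile = 1 (AAC-LC, object type - 1), sf_index = 4 (44100 Hz),
 * channel_cfg = 2 (stereo). */
static void write_adts_header(uint8_t hdr[7], size_t aac_frame_len,
                              int profile, int sf_index, int channel_cfg)
{
    size_t frame_len = aac_frame_len + 7;           /* length includes the header */

    hdr[0] = 0xFF;                                  /* syncword 0xFFF...          */
    hdr[1] = 0xF1;                                  /* ...MPEG-4, layer 0, no CRC */
    hdr[2] = (uint8_t)((profile << 6) | (sf_index << 2) | (channel_cfg >> 2));
    hdr[3] = (uint8_t)(((channel_cfg & 0x3) << 6) | ((frame_len >> 11) & 0x3));
    hdr[4] = (uint8_t)((frame_len >> 3) & 0xFF);
    hdr[5] = (uint8_t)(((frame_len & 0x7) << 5) | 0x1F);  /* buffer fullness 0x7FF */
    hdr[6] = 0xFC;                                  /* fullness cont., 1 raw block */
}
```

Prepend those 7 bytes to every raw AAC frame and the result is a playable ADTS stream.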

Determining the quality of mp3 audio streams

I have built a source client using PortAudio and LAME which streams the microphone input to an Icecast server to be listened to online via the HTML5 <audio> tag. I have managed to (supposedly) get the quality of the stream to MP3 320 kbps at 44.1 kHz and am looking for a way to confirm this using tests and/or benchmarks.
I have an indication that these stats are somewhat correct from looking at stream inspectors in software such as iTunes and VLC, but I am looking to get a more in-depth data set.
What I basically want is to be able to test how much of the original file is being lost over the stream, and whether (or how much) the quality changes depending on the environmental conditions of the broadcaster or streamer.
Does anyone know of any tools, frameworks to get some hard numbers or representations of this data?
If VLC tells you the stream is 320kbit CBR, then it is.
It sounds like what you're looking for is a comparison of the actual audio content. This is highly subjective. MP3 is built to use features of how our hearing works to save bandwidth. For example, quiet sounds are masked by loud sounds. High frequencies are harder to hear and are simply rolled off.
You can compare the spectral analysis between the original PCM-sampled waveform and the MP3 decoded waveform, but this doesn't tell you how humans interpret that sound. For that, you would have to survey humans.
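If you still want a hard number, a cruder alternative to a spectral comparison is a plain sample-level error metric. A minimal sketch, assuming both the original capture and the decoded MP3 stream have been converted to time-aligned 16-bit little-endian mono PCM files (the file names are placeholders, and lossy codecs introduce a decoder delay that has to be trimmed before the samples line up):

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Crude objective comparison: RMS difference between two time-aligned
 * 16-bit mono PCM files (e.g. the original capture and the MP3 stream
 * decoded back to PCM). Says nothing about perceived quality. */
int main(void)
{
    FILE *a = fopen("original.pcm", "rb");
    FILE *b = fopen("decoded.pcm", "rb");
    if (!a || !b) { perror("fopen"); return 1; }

    double sum_sq = 0.0;
    long   n      = 0;
    int16_t sa, sb;

    while (fread(&sa, sizeof sa, 1, a) == 1 && fread(&sb, sizeof sb, 1, b) == 1) {
        double d = (double)sa - (double)sb;
        sum_sq += d * d;
        n++;
    }
    if (n > 0)
        printf("RMS difference over %ld samples: %.2f (16-bit scale)\n",
               n, sqrt(sum_sq / n));
    fclose(a); fclose(b);
    return 0;
}
```

As noted above, this measures signal difference, not how a listener would rate the stream.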

RTMP audio message (0x08) format (MP3)

I'm trying to write a little client for RTMP (audio only). So far I've got the communication working (Red5 server), but now I'm stuck with the audio data.
The server is sending MP3 at 44 kHz, 16-bit, stereo.
I get my audio message, which consists of the byte identifying the codec (0x2F) and the audio data, which looks, for example, like this:
ff:fb:92:64:eb:80:03:98:58:d2:e9:26:1b:7e:5d:e7:4a:1a:19:26:5c:8b:89:07:47:44:98:6b:91:2d:9c:28:b4:33:15:70:82:c9:29:87:8d:e4:8f:31:83:84:7b:e5:82:b5:57:62:00:02:e5:bb:f1:86:15:7a:8f:da:9e:ca:4f:83:9d:0a:c4:56:7b:b3:3d:56:43:ba:2b:28:b8:9d:0c:e1:82:0c:08:36:24:f3:39:67:54:b7:41:d9:8e:ef:36:96:56:22:d2:b9:9f:ae:40:43:8e:ea:39:52:0c:a4:48:25:02:54:91:c7:35:37:2d:be:f2:37:23:61:65:35:d9:0f:aa:18:b4:37:d9:d4:c8:68:21:3c:bd:ea:c1:d0:98:df:eb:96:59:99:88:09:37:36:c3:8b:47:80:64:84:41:ba:35:ea:a6:0a:d6:74:9e:09:f6:a5:d7:3f:1f:53:d8:fb:8d:d9:d3:f8:ee:c7:c1:68:25:25:8e:ae:6a:1c:08:52:9d:58:cf:cf:87:c1:ba:a4:f0:63:76:b0:b4:65:79:1b:3b:21:5f:2f:b5:7a:18:43:af:f7:fd:15:0c:87:c9:73:54:95:22:94:cc:cb:e3:da:4d:e0:f3:8a:95:69:69:eb:32:71:57:08:49:76:e0:f3:84:8c:4b:4c:84:6b:5d:7a:c8:c9:d7:df:d5:e2:68:bb:5f:6c:9f:ba:f4:0a:6c:6e:51:8a:b3:59:9a:07:0c:e4:2a:9d:ec:d1:99:53:48:f2:8b:22:b2:d3:bf:e1:5b:9f:ee:49:9f:2c:ee:63:1f:6f:da:90:e7:65:00:55:99:97:77:b9:e8:97:43:81:fd:32:e4:81:20:d0:78:f5:4f:59:47:39:f2:57:5d:f4:d5:91:48:c9:45:10:52:49:4d:04:87:6b:0e:a5:72:ed:34:74:08:93:5b:8a:54:3a:d9:7e:53:8f:c7:5e:b1:99:f3:55:63:72:49:99:55:3a:b8:0d:73:3b:2a:ea:9a:b5:32:d2:3b:61:c2:4e:e9:56:78:99:14:4a:a7:46:f4:ee:ae:6f:ff:c8:85:2d:07:68:ad:e2:84:dd:0a:bd:2e:93:12:43
I can't find anything about the data format. As the first byte is always 0xFF, I assume every chunk of audio data has a little header describing its contents.
The RTMP spec from Adobe doesn't lose a single word about the format of the audio message payload (just two lines saying it's an audio message... wow).
Does anyone know the format of the audio messages, or at least a source where I can find something?
The Adobe spec doesn't document the elementary stream formats because they are covered in their own documents, which are usually quite large. MP3 is covered by ISO/IEC 11172-3.
There is a good rundown available here:
http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm
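As a sketch of what that rundown describes, this is how the 4-byte header at the start of each MP3 frame could be decoded. The tables below only cover MPEG-1 Layer III, which is what your dump contains; a complete parser needs the other bitrate/sample-rate tables as well:

```c
#include <stdint.h>
#include <stdio.h>

/* Decode the 4-byte MPEG audio frame header that starts each MP3 frame
 * in the payload (the bytes after the 0x2F codec byte). Tables are for
 * MPEG-1 Layer III only. Returns the frame length in bytes, or -1. */
static const int kBitrateKbps[16] = {0, 32, 40, 48, 56, 64, 80, 96,
                                     112, 128, 160, 192, 224, 256, 320, 0};
static const int kSampleRateHz[4] = {44100, 48000, 32000, 0};

static int parse_mp3_header(const uint8_t h[4])
{
    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0)
        return -1;                               /* no 11-bit syncword */

    int version  = (h[1] >> 3) & 0x3;            /* 3 = MPEG-1 */
    int layer    = (h[1] >> 1) & 0x3;            /* 1 = Layer III */
    int bitrate  = kBitrateKbps[(h[2] >> 4) & 0xF];
    int srate    = kSampleRateHz[(h[2] >> 2) & 0x3];
    int padding  = (h[2] >> 1) & 0x1;
    int channels = (((h[3] >> 6) & 0x3) == 3) ? 1 : 2;

    if (version != 3 || layer != 1 || bitrate == 0 || srate == 0)
        return -1;                               /* outside the tables above */

    /* Layer III frame length: 144 * bitrate / samplerate + padding */
    int frame_len = 144 * bitrate * 1000 / srate + padding;
    printf("MPEG-1 Layer III, %d kbps, %d Hz, %d ch, frame %d bytes\n",
           bitrate, srate, channels, frame_len);
    return frame_len;
}
```

For the dump above, which starts with ff:fb:92:64, this reports MPEG-1 Layer III at 128 kbps, 44100 Hz, 2 channels, matching the 44 kHz stereo MP3 the server claims to send.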

ffmpeg - Can I draw an audio channel as an image?

I'm wondering if it's possible to draw an audio channel of a video or audio file as an image using ffmpeg, or if there's another tool that would do it on Win2k8 x64. I'm doing this as part of an encoding process after a user uploads a video or audio file.
I'm using ColdFusion 10 to handle the upload and calling cfexecute to run ffmpeg.
I need the image to look something like this (without the horizontal lines):
You can do this programmatically very easily.
Study the basics of FFmpeg. I suggest you compile this sample. It explains how to open a video/audio file, identify the streams, and loop over the packets.
Once you have a data packet (in this case you are interested only in the audio packets), you decode it (line 87 of that document) and obtain the raw audio data. That is the waveform itself (the analogue of a "bitmap" for audio).
You could also study this sample. This second example shows how to write a video/audio file. You don't want to write any video, but with this sample you can easily understand how the raw audio data packets work; see the functions get_audio_frame() and write_audio_frame().
You need to have some knowledge about creating a bitmap. Any platform has an easy way to do that.
So, the answer for you: YES, IT IS POSSIBLE TO DO THIS WITH FFMPEG! But you have to code a little bit in order to get what you want...
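As a rough sketch of that last step, here is how decoded samples could be turned into a simple waveform bitmap. It assumes the audio has already been decoded to 16-bit mono PCM in a file (the file name and image size are placeholders); wiring it into the FFmpeg decode loop from the samples above is left out:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Render raw s16le mono PCM as a waveform in a PGM image: each image
 * column gets the min/max of its slice of samples, drawn as a vertical
 * line, which gives the classic waveform look. */
#define W 640
#define H 120

int main(void)
{
    FILE *in = fopen("audio.pcm", "rb");         /* raw s16le mono samples */
    if (!in) { perror("fopen"); return 1; }

    int16_t *samples = NULL;
    size_t   n = 0, cap = 0;
    int16_t  s;
    while (fread(&s, sizeof s, 1, in) == 1) {    /* read everything (fine for a sketch) */
        if (n == cap) {
            cap = cap ? cap * 2 : 4096;
            int16_t *tmp = realloc(samples, cap * sizeof *samples);
            if (!tmp) { free(samples); return 1; }
            samples = tmp;
        }
        samples[n++] = s;
    }
    fclose(in);
    if (n == 0) return 1;

    static uint8_t img[H][W];                    /* zero-initialized = black */

    for (int x = 0; x < W; x++) {
        size_t from = n * (size_t)x / W;
        size_t to   = n * (size_t)(x + 1) / W;
        int lo = 32767, hi = -32768;
        for (size_t i = from; i < to; i++) {
            if (samples[i] < lo) lo = samples[i];
            if (samples[i] > hi) hi = samples[i];
        }
        int y0 = (32767 - hi) * (H - 1) / 65535;
        int y1 = (32767 - lo) * (H - 1) / 65535;
        for (int y = y0; y <= y1; y++)
            img[y][x] = 255;                     /* white waveform pixel */
    }

    FILE *out = fopen("waveform.pgm", "wb");
    fprintf(out, "P5\n%d %d\n255\n", W, H);      /* PGM header */
    fwrite(img, 1, sizeof img, out);
    fclose(out);
    free(samples);
    return 0;
}
```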
UPDATE:
Sorry, there are ALSO built-in features for this:
showspectrum, showwaves, avectorscope
You could use those filters.
Here are some examples of how to use them: FFmpeg Filters - 12.22 showwaves.

Saving V4L2 video camera output

What video format would be the easiest for saving the output of a camera using V4L2 if I capture it in bitmap format? Getting MPEG directly would, of course, be nice, but unfortunately I can't count on that.
I have managed to capture the frames; now I need to somehow view the video. Can I simply convert those frames using some Linux tool, or could I easily save the video straight from my app?
To keep things simple (as in a Proof-of-Concept demo), you can go ahead and directly store the YUV frames captured from the device into a file.
There are a bunch of viewers that support playback of single/multiple frame(s) of YUV data from a file.
One such YUV viewer is freecode.com/projects/yay
You could use practically any format/codec if you used mencoder or ffmpeg.
Btw, this question really should be on superuser.com
If you are capturing frames already, you could save them as PPM images and then convert them to JPEG. I did this using V4L2 and ImageMagick. Maybe you could push the JPEGs into a Motion JPEG stream. It might not be as high-tech as MPEG, but you might get it working quickly. PPM files were a cinch to create. If I remember correctly, the v4l2 example code shows you how to do that part.
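For reference, writing the PPM really is only a few lines. A minimal sketch, assuming the captured frame has already been converted to packed 24-bit RGB (many V4L2 devices deliver YUYV, which needs converting first); ImageMagick's convert can then turn the .ppm into a .jpg:

```c
#include <stdio.h>
#include <stdint.h>

/* Dump one frame of packed 24-bit RGB pixels to a binary PPM (P6) file.
 * The path and dimensions come from the caller; conversion from the
 * camera's native pixel format (e.g. YUYV) is assumed to have happened
 * already. */
static int write_ppm(const char *path, const uint8_t *rgb, int width, int height)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;

    fprintf(f, "P6\n%d %d\n255\n", width, height);   /* PPM header */
    fwrite(rgb, 3, (size_t)width * height, f);       /* packed RGB pixels */
    fclose(f);
    return 0;
}
```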
