Decode G711(PCM u-law) - audio

Please bear with me as my understanding of audio codec is limited.
I have this audio source from a IPCAM (through a htto//... CGI interface).
I am trying to write several client programs to play this audio source on Windows, MAC, as well as Android phone. The audio is encoded in G711 (PCM ulaw).
Do I need to decode the PCM audio data to a raw audio data before I could pass it to the audio engine to play? If so, is there some sample code on how to decode it?
I am confused as somehow I believe PCM is already RAW. Could I just feed it directly to the audio engine on Android for example?
thanks much in advance

It depends on what API you are using to play sound, but most require linear PCM and you have µ-law PCM, so unless your API supports µ-law playback you will need to convert the µ-law sample values to linear.
With G.711 the compressed µ-law samples are 8 bits and these will be converted to 14 bit linear values which you will store in a buffer as 2 bytes per sample. There is a brief description of the µ-law encoding on the G.711 Wikipedia page.

You may find this useful:
u-Law companding algorithm in C

Related

How to combine jpeg frames with uncompressed mono audio into an h264 stream or any other format processed by web browsers out of the box?

So I have an esp32 which captures images and sound. The esp32-camera library already returns the jpeg encoded buffer. The audio however is uncompressed and is just a digital representation of signal strength at high sample rate.
I use esp32 to host a webpage which contains <image> element and a JavaScript snippet, which constantly sends GET requests to a branching url for image data and updates the element. This approach is not very good, especially that now I've added audio capabilities to the circuit.
I'm curious if it would be possible to combine jpeg encoded frames and some audio data into a chunk of h264 and then send it directly as a response to a GET request making it a stream?
This not only would simplify the whole serving multiple webpages thing, but also remove the issues of syncing the audio and video if they are sent separately.
In particular I'm also curious how easy would it be to do on esp32 since it doesn't have a whole bunch of ram and computational power. It would be challenging to find or port large libraries which could help as well, so i guess I would have to code it myself.
I also am not sure if h264 is the best option. I know its supported on most browser out of the box and is using jpeg compression behind the scenes for the frames, but perhaps a simpler format exists which is also widely supported.
So to sum it up: Is h264 a best bet in the provided context? Is combining jpeg and uncompressed mono audio into h264 possible in the provided context? If an answer to either of previous questions is a no, what alternatives do i have if any?
I'm curious if it would be possible to combine jpeg encoded frames and some audio data into a chunk of h264 and then send it directly as a response to a GET request making it a stream?
H.264 is a video codec. It doesn't have anything to do with audio.
I know its supported on most browser out of the box and is using jpeg compression behind the scenes for the frames
No, this isn't true. H.264 is its own thing. It's far more powerful than JPEG and is specifically designed for motion, whereas JPEG was not.
You need a few things:
A video codec, to efficiently handle your frames. Most of these embedded camera libraries can give you an MJPEG stream. I'd use that if possible. I don't think your ESP32 has other video encoding capability, does it? H.264 is a good choice, but only if you can actually encode it.
A container format, to aid in streaming your audio and video streams together. ISOBMFF/MP4 is common, as is WebM/Matroska.
If you're only streaming to a single client (which seems likely given the limited horsepower of the board), and if you have enough capability to do the audio/video encoding, you can generate a WebM stream on the fly that is directly playable in a <video> element. This seems exactly what you are asking for.

Which is best sound format for IBM Speech to Text?

IBM advises using Opus sound format for audio submitted to its Watson Speech to Text service. The idea being that Opus is designed specifically for speech.
Otherwise, it says you will get better quality transcription when submitting audio in flac format than in mp3 format. The latter has the obvious advantage of its small size. There is after all a 100Mb limit for file submissions. So you weigh the balance of your needs. That all makes sense so far.
But looking at conversions done on a source WAV file, the Opus file is size is comparable with mp3.
Downsampling a 366Mb wav file to 8k sample rate (one of two sample rates advised for using the service), created a wav file of 66.4Mb. Converting that to flac, wav and opus produced flac: 43.6Mb; mp3: 6.2Mb; opus: 9.8Mb.
So is opus really the best choice for getting the most accurate transcription? And how can that be when it is so small compared to flac?
Opus is designed to efficiently encode speech. The details are explained in the linked wiki article, but just to give you a gist, consider that human vocalisation range is rather limited, roughly from 80 to 260 Hz. On the other hand, or hearing range is far greater, up to 20000 Hz. Whereas music encoders (like mp3) have to work roughly within our hearing range, voice-specialised encoders (like Opus) can focus on what matters to efficiently encode human voice, with no interest what lies significantly above our vocalisation range. That I hope provides some intuition why Opus is so efficient.
Is it the best? It's somewhat opinionated, but yes, I think it's among the best choices out there. To cite after Wikipedia, Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate.

File information of .raw audio files using terminal in linux

How to get file information like sampling rate, bit rate etc of .raw audio files using terminal in linux? Soxi works for .wav files but it isn't working for .raw.
If your life depended on discovering an answer you could make some assumption to tease apart the unknowns ... however there is no automated way since the missing header would give you the easy answers ...
The audio analysis tool called audacity allows you to open up a RAW file, make some guesses and play the track
http://www.audacityteam.org
In audacity goto File -> Import -> Raw Data...
Above settings are typical for audio ripped from a CD ... toy with trying stereo vs mono for starters.
Those picklist widgets give you wiggle room to discover the format of your PCM audio given that the source audio is something when properly rendered is recognizable ... would be harder if the actual audio was noise
However if you need a programmatic method then rolling your own solution to ask those same questions which appear in above window is possible ... is that what you need or will audacity work for you ? We can go down the road of writing code to play off the unknowns mentioned in #Frank Lauterwald's comment
To kick start discovering this information programmatically, if the binary raw audio is 16 bit then each audio sample (point on the audio curve) will consume two bytes of your PCM file. For mono audio then the following two bytes would be your next sample, however if its stereo then these two following bytes would be the sample from the other channel. If more than two channels then just repeat. Typical audio is little endian. Sampling rate is important when rendering the audio, not when programmatically parsing raw bytes. One approach would be to create an output file with a WAV header followed by your source PCM data. Populate the header with answers from your guesswork. This way you could listen to this output file to help confirm your guesses.
Here is a sample 500k mono PCM audio file signed 16 bit which can be imported into audacity or used as input to rolling your own identification code
The_Constructus_Corporation_Long_Street-ycexQvMy03k_excerpt_mono.pcm

Convert video from Hi8-Lp format

Some video, was recorded by camera (Hi8-Lp format). Then it was decoded to mpeg2video codec. I have this decoded video. But decoded video have not correct video and audio speed (like fast playback) and have longitudinal lines on video (you can see sample).
sample video
How to convert video with correct speed?
Thx for help.
You can use this application "DVDSanta" ..
http://www.topvideopro.com/burn-dvd/8mm-to-dvd.htm
I hope this answer help you...
Use WinDV if You are on windows, then convert it with ffmpeg (or StaxRip, MeGUI, Handbrake) to Your preferable format.
On Mac You could use iMovie
On Linux You could use xawtv (didn't tried this one)
Do not encode video when transferring from camera, do it afterwards

rtmp audio message(0x08) format (mp3)

im trying to write a little client for rtmp(audio only). so far i got the communication working (red5 server) but now im stuck with the audio data.
the server is sending in MP3 44KHz 16bit stereo.
i get my Audiomessage which consists of the byte identifying the codec (0x2f) and the audio data which looks for example like this
ff:fb:92:64:eb:80:03:98:58:d2:e9:26:1b:7e:5d:e7:4a:1a:19:26:5c:8b:89:07:47:44:98:6b:91:2d:9c:28:b4:33:15:70:82:c9:29:87:8d:e4:8f:31:83:84:7b:e5:82:b5:57:62:00:02:e5:bb:f1:86:15:7a:8f:da:9e:ca:4f:83:9d:0a:c4:56:7b:b3:3d:56:43:ba:2b:28:b8:9d:0c:e1:82:0c:08:36:24:f3:39:67:54:b7:41:d9:8e:ef:36:96:56:22:d2:b9:9f:ae:40:43:8e:ea:39:52:0c:a4:48:25:02:54:91:c7:35:37:2d:be:f2:37:23:61:65:35:d9:0f:aa:18:b4:37:d9:d4:c8:68:21:3c:bd:ea:c1:d0:98:df:eb:96:59:99:88:09:37:36:c3:8b:47:80:64:84:41:ba:35:ea:a6:0a:d6:74:9e:09:f6:a5:d7:3f:1f:53:d8:fb:8d:d9:d3:f8:ee:c7:c1:68:25:25:8e:ae:6a:1c:08:52:9d:58:cf:cf:87:c1:ba:a4:f0:63:76:b0:b4:65:79:1b:3b:21:5f:2f:b5:7a:18:43:af:f7:fd:15:0c:87:c9:73:54:95:22:94:cc:cb:e3:da:4d:e0:f3:8a:95:69:69:eb:32:71:57:08:49:76:e0:f3:84:8c:4b:4c:84:6b:5d:7a:c8:c9:d7:df:d5:e2:68:bb:5f:6c:9f:ba:f4:0a:6c:6e:51:8a:b3:59:9a:07:0c:e4:2a:9d:ec:d1:99:53:48:f2:8b:22:b2:d3:bf:e1:5b:9f:ee:49:9f:2c:ee:63:1f:6f:da:90:e7:65:00:55:99:97:77:b9:e8:97:43:81:fd:32:e4:81:20:d0:78:f5:4f:59:47:39:f2:57:5d:f4:d5:91:48:c9:45:10:52:49:4d:04:87:6b:0e:a5:72:ed:34:74:08:93:5b:8a:54:3a:d9:7e:53:8f:c7:5e:b1:99:f3:55:63:72:49:99:55:3a:b8:0d:73:3b:2a:ea:9a:b5:32:d2:3b:61:c2:4e:e9:56:78:99:14:4a:a7:46:f4:ee:ae:6f:ff:c8:85:2d:07:68:ad:e2:84:dd:0a:bd:2e:93:12:43
i dont find a little thing about the data format. as the first byte is always 0xff i assume every chunk of audio data has a little header describing its contents.
the rtmp spec from adobe doesnt loose a single word about the format of the audio message package (just two lines saying its an audio message... wow).
does anyone know the format for the audio messages or at least a source where i find something?
The Adobe spec doesn't document the elementary stream formats because they are covered in their own documents, and usually quite large. MP3 is covered by ISO/IEC 11172-3.
There is a good rundown available here:
http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm

Resources