Has anyone tried to change the audio frame size in WebRTC? It uses 10ms frames. I need a different size, and the code doesn't look promising...
Fundamentally there's no reason you can't use non-10ms frames, but much of the code is written with that assumption. It would indeed likely be a serious undertaking to change it.
On the device you can use other audio frame (e.g. 20,40ms). For the codec,you can use other audio frame because codec has a audio buffer. I used silk codec and ios device.
Related
I am using moviepy (Python) to read video and audio frames of a video and after making some changes I am writing them back to a videofile, say new.avi, to preserve the changes, or to avoid compression, I am using codec= 'rawvideo' in write_videofile function. But when I read the video and audio frames back, the number of video and audio frames are different than when they were when written, they are usually increased.
Can anybody tell me the reason,? is it because of the ffmpeg used or some other reason? Does it happen always or there is some problem in my machine? Thank you :-)
I have tried to send 12-bit audio to be listened to in real time through the HC05 SPP bluetooth module hooked up to an arduino and DAC over serial with a python RFCOMM socket. I have since learned that Serial Port Protocol is not very great at all for this purpose due to its low bandwidth. I figured I could definitely send the data and then play it out through a DAC, but I doubt an arduino would hold an array the size of a WAV file and maybe not even an mp3 file, but that would defeat the purpose of controlling the audio (play,pause,rewind,etc) from my computer. Would it be more realistic and worthwhile to use an A2DP enabled bluetooth module? Or is it still possible to listen to acceptable quality 12-16 bit audio in real time with SPP? I have tried to use lower bit songs, adjusted baud rates for the arduino and HC-05 serial ports, and tried to adjust the magnitude of the values outputted by the DAC to the audio port and I still seem to get crackly audio. It seems the problem comes down to the low bitrate transfer speed of SPP, or am I wrong?
Is it realistic to stream 12-16 bit audio through SPP bluetooth in realtime?
Sure, at some awfully slow sample rate <= 8 kHz. You'd be better off sending 8-bit audio at a higher sample rate.
Would it be more realistic and worthwhile to use an A2DP enabled bluetooth module?
Yes, absolutely, without question. That's what it's designed for, as I mentioned in your other question.
Or is it still possible to listen to acceptable quality 12-16 bit audio in real time with SPP?
Acceptable is subjective. If it's just voice, you can get away with it. If you want reasonable audio quality for music, almost universally, no, it's not acceptable.
It seems the problem comes down to the low bitrate transfer speed of SPP, or am I wrong?
Without any code to inspect and debug, it's impossible to say what the specific problem is that you're referring to. Undoubtedly, the low bandwidth will not enable good quality audio anyway.
If you must continue to use SPP and simple codecs like PCM, at least use differential PCM to save a bit more bandwidth.
Short story:
If I myself intend to receive and then send a Shoutcast compatible audio stream processed by my application, then how to do it properly using an mp3 (de/en)coder library? Pseudo code, or better - lame mp3 specific code would be highly appreciated.
Long story:
More specific questions which bother me were caused by an article about mp3, which says:
Generally, frames are independent items. Each frame has its own header
and audio informations. There is no file header. Therefore, you can
cut any part of MPEG file and play it correctly (this should be done
on frame boundaries but most applications will handle incorrect
headers). For Layer III, this is not 100% correct. Due to internal
data organization in MPEG version 1 Layer III files, frames are often
dependent of each other and they cannot be cut off just like that.
This made me wonder, how Shoutcast servers and clients deal with frame headers and frame dependencies.
Do I have to encode to constant bitrate (CBR) only, if I want to achieve maximum compatibility with the most of Shoutcast players out there?
Is the mp3 frame header used at all or the stream format is deduced from a Shoutcast protocol specific HTTP header?
Does Shoutcast protocol guarantee (or is it common good practice) to start serving mp3 stream on frame boundaries and continue to respond with chunks that are cut at frame boundaries? But what is the minimum or recommended size of a mp3 frame for streaming live audio?
How does Shoutcast deal with frame dependencies - does it do something special with mp3 encoding to ensure that the served stream does not have frames which depend on previous frames (if this is even possible)? Or maybe it ignores these dependencies on server side/client side, thus getting audio quality reduction or even artifacts?
SHOUTcast servers do not know or care about the data being passed through them. They send it as-is. You can actually send arbitrary data through a SHOUTcast server, and receive it. SHOUTcast will segment the media data wherever the buffer size falls.
It's up to the client to re-sync to the data. It does this by locating the frame header, then being decoding. Once the codec has enough frames to reliably play back audio, it will begin outputting raw PCM. It's up to the codec when to decide it's safe to start playback. Since the codec knows what it's doing in terms of decoding the media, it knows when it has sufficient data (including bit reservoirs) to begin without artifacts. It's also worth noting that the bit reservoir cannot be carried on too far, so it doesn't take but a few frames at worst to handle it.
This is one of the reasons it's important to have a sizable buffer server-side, to flush to the clients as fast as possible on connect. If playback is to start quickly, the codec needs more data than the current frame to begin.
I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the pipeline of audio processing even through I've read much about WASAPI, XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing
raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the mp3 will be decoded on the CPU by your managed code, you can interfere / process however you like, then feed the PCM into the media stream source. Main downside - battery life: the hardware-provided codecs should be much more power-efficient.
I have a capture card that captures SDI video with embedded audio. I have source code for a Linux driver, which I am trying to enhance to add video4linux2 support. My changes are based on the vivi example.
The problem I've come up against is that all the example I can find deal with only video or only audio. Even on the client side, everything seems to assume v4l is just video, like ffmpeg's libavdevice.
Do I need to have my driver create two separate devices, a v4l2 device and an alsa device? It seems like this makes the job of keeping audio and video in sync much more difficult.
I would prefer some way for each buffer passed between the driver and the app (through v4l2's mmap interface) contain a frame, plus some audio that matches up (with respect to time) with that frame.
Or perhaps have each buffer contain a flag indicating if it is a video frame, or a chunk of audio. Then the time stamps on the buffers could be used to sync things up.
But I don't see a way to do this with the V4L2 API spec, nor do I see any examples of v4l2-enabled apps (gstreamer, ffmpeg, transcode, etc) reading both audio and video from a single device.
Generally, the audio capture part of a device shows up as a separate device. It's usually a different physical device (posibly sharing a card), which makes sense. I'm not sure how much help that is, but it's how all of the software I'm familiar with works...
There are some spare or reserved fields in the v4l2 buffers that can be used to pass audio or other data from the driver to the calling application via pointers to mmaped buffers.
I modified the BT8x8 driver to use this approach to pass data from an A/D card synchronized to the video on Ubuntu 6.06.
It worked OK, but the effort of maintaining my modified driver caused me to abandon this approach.
If you are still interested I could dig out the details.
IF you want your driver to play with gstreamer etc. a separate audio device generally is what is expected.
Most of the cheap v4l2 capture card's audio is only an analog pass through with a volume control requiring a jumper to capture the audio via the sound card's line input.