Is 600kbps upload speed too slow for VOIP app? - voip

I've built a VoIP app for iPhone and Android (audio only), and I've been testing the call quality. I noticed that if a person has an upload speed of less than 600kbps, his partner will have trouble hearing him clearly: the partner might hear lots of crackling, dropped sentences, or nothing at all.
I am currently using the GSM codec on my Android and iPhone clients and on my Asterisk server.
So my question is whether an upload speed of less than 600kbps is generally considered too slow for VoIP calls.
If so, is there anything I can do to reduce the bandwidth a call needs? I could consider the G.722 codec, but from what I remember, it doesn't offer a significant bandwidth improvement over GSM.
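(For a rough sense of scale, here's a back-of-the-envelope sketch, not a measurement. It assumes GSM full rate frames of 33 bytes every 20 ms carried in RTP over UDP/IPv4, with no link-layer overhead counted. On those assumptions, a single call stream needs on the order of 30kbps of upload, so a 600kbps link would have plenty of raw headroom, and trouble at that speed is more likely jitter or packet loss than throughput.)

```java
// Back-of-the-envelope VoIP bandwidth estimate for one audio stream.
// Assumes GSM full rate (33-byte frames every 20 ms, ~13.2 kbit/s)
// carried in RTP over UDP/IPv4; header sizes are the standard ones.
public final class VoipBandwidth {
    public static void main(String[] args) {
        int payloadBytes = 33;               // one 20 ms GSM full-rate frame
        int rtp = 12, udp = 8, ipv4 = 20;    // per-packet header overhead
        int packetsPerSecond = 1000 / 20;    // one packet per 20 ms frame

        int bitsPerPacket = (payloadBytes + rtp + udp + ipv4) * 8;
        double kbps = bitsPerPacket * packetsPerSecond / 1000.0;
        System.out.printf("~%.1f kbit/s per direction%n", kbps); // ~29.2 kbit/s
    }
}
```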
Additional Notes
I used speedtest.net to measure my upload speed. I'm not sure if that's a reliable way to test the speeds necessary for a VoIP service.
Also, I'm using the Linphone core as my SIP library.

Related

Is Opus supported for VoLTE?

There are so many different codecs for phone calls, and many of them have very high license fees, meaning it will take a long time before everyone can use normal telephony with wideband audio.
Is Opus supported for VoLTE?
The usual codecs for VoLTE are AMR, AMR-WB and EVS (see the links below for more info - thanks, @Mikael Dúi Bolinder).
As with most mainstream voice (and video) codecs, there is IPR and licensing associated with these. However, for end users, the network providers and device manufacturers have included the licensing and the codecs in their rollouts, so a typical operator service will use these.
I'm not aware of any restrictions from 3GPP on using other codecs if the devices and the network support them, but the above are definitely the default and the most widely used.
If you want to create your own voice service, e.g. a VoIP service running over the phone's data connection, then in theory you can use whatever codec you want. It's worth being aware that for software-based codecs (which they will be unless they are tightly integrated into the device's hardware), efficiency is important: an inefficient implementation may impact performance, battery life, etc.
For Opus in particular, there are several open source projects that provide Android libraries. Opus is also supposed to be supported on devices from Android 5+ (https://developer.android.com/guide/topics/media/media-formats).
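As a quick device-side sanity check, the platform codec list can be queried for an Opus decoder. A minimal sketch using the standard MediaCodecList API (available from Android 5.0, API 21):

```java
import android.media.MediaCodecList;
import android.media.MediaFormat;

public final class OpusSupport {
    /** Returns the name of an available Opus decoder, or null if there is none. */
    public static String findOpusDecoder() {
        // 48 kHz stereo is the canonical Opus output format.
        MediaFormat format = MediaFormat.createAudioFormat(
                MediaFormat.MIMETYPE_AUDIO_OPUS, 48000, 2);
        MediaCodecList codecs = new MediaCodecList(MediaCodecList.REGULAR_CODECS);
        return codecs.findDecoderForFormat(format); // null when no decoder matches
    }
}
```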
AMR licensing (and issues) on Wikipedia: https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec#Licensing_and_patent_issues
AMR-WB licensing on Wikipedia: https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband#Licensing
MPEG LA developing a patent pool for EVS: http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/97/n-16-01-20.pdf

What's the best protocol for live audio (radio) streaming for mobile and web?

I am trying to build a website and mobile apps (iOS, Android) for an internet radio station.
Website users broadcast their music or radio shows, and mobile users just listen to the stations and chat with other listeners.
I searched for a week and made a prototype with the Wowza engine (using HLS and RTMP) and a SHOUTcast server on Amazon EC2.
HLS has a delay of about 5 seconds, but RTMP and SHOUTcast have a 2-second delay.
With this result, I think I should choose RTMP or SHOUTcast.
But I am not sure RTMP and SHOUTcast are the best protocols. :(
Which protocol should I choose?
Do I need to provide multiple protocols to cover all platforms?
This is a very broad question. Let's start with the distribution protocol.
Streaming Protocol
HLS has the advantage of allowing users to get the stream in the bitrate that is best for their connection. Clients can scale up/down seamlessly without stopping playback. This is particularly important for video, but for audio even mobile clients are capable of playing 128kbit streams in most areas. If you intend to have a variety of bitrates available and want to change quality mid-stream, then HLS is a good protocol for you.
The downside of HLS is compatibility. iOS supports it, but that's about it. Android has HLS support but it is still buggy. (Maybe in another year or two once all the Android 3.0 folks are gone, this won't be as much of an issue.) JWPlayer has some hacks to make HLS work in Flash for desktop users.
I wouldn't bother with RTMP unless you're only concerned with Flash users.
Pure progressive streaming with HTTP is the route I almost always choose to go. Everything can play it. (Even my Palm Pilot's default media player from 12 years ago.) It's simple to implement and well understood.
SHOUTcast is effectively HTTP, but a poorly implemented version that has compatibility issues, particularly on mobile devices. It has a non-standard status line in its response which breaks a lot of clients. Icecast is a good alternative, and is what I would recommend for production use today. As another option, I have created my own streaming service called AudioPump which is HTTP as well, and has been specifically built to fix compatibility with oddball mobile clients, such as native Android players on old hardware. It isn't generally available yet, but you can contact me at brad@audiopump.co if you want to try it.
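To make that status-line problem concrete, here is an illustrative sketch (the host and port are made up): it fetches a stream's response headers and patches SHOUTcast's non-standard ICY status line into an HTTP/1.0 one that a strict parser will accept.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

// Illustrative sketch: connect to a SHOUTcast server and normalize its
// non-standard status line so an HTTP client can keep parsing.
public final class IcyStatusFix {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket("stream.example.com", 8000); // placeholder host/port
        OutputStream out = socket.getOutputStream();
        out.write("GET / HTTP/1.0\r\nHost: stream.example.com\r\n\r\n".getBytes("US-ASCII"));
        out.flush();

        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "US-ASCII"));
        String status = in.readLine();
        // SHOUTcast v1 replies "ICY 200 OK" instead of "HTTP/1.0 200 OK",
        // which strict HTTP clients reject outright.
        if (status != null && status.startsWith("ICY")) {
            status = "HTTP/1.0" + status.substring("ICY".length());
        }
        System.out.println(status); // now parseable as a normal HTTP status line
        socket.close();
    }
}
```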
Latency
You mentioned a latency of 2 seconds being desirable. If you're getting 2-second latency with SHOUTcast, something is wrong. You don't want latency that low, particularly if you're streaming to mobile clients. I usually start with a 20-second buffer at a minimum, which is flushed to the client as fast as it can receive it. This enables immediate starting of the stream playback (as it fills up the client-side buffer so it can begin decoding) while providing some protection against buffer underruns due to network conditions. It's not uncommon for mobile users to walk around the corner of a building and lose their nice signal quality. You want your stream to survive that as best as possible, so if you have already sent the data to cover the drop out, the user doesn't have to know or care that their connection became mediocre for a short period of time.
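A minimal sketch of that burst-on-connect idea, with all names hypothetical: keep the last ~20 seconds of encoded audio in a rolling buffer, and hand the whole backlog to each newly connected client before switching to live data.

```java
import java.util.ArrayDeque;

// Sketch of burst-on-connect: retain the last N seconds of encoded
// audio so a new listener's player buffer fills instantly.
// A real server also needs per-client queues, backpressure handling,
// and alignment to frame boundaries.
public final class BurstBuffer {
    private final ArrayDeque<byte[]> chunks = new ArrayDeque<>();
    private int bufferedBytes;
    private final int capacityBytes;

    public BurstBuffer(int bitrateBps, int seconds) {
        this.capacityBytes = bitrateBps / 8 * seconds; // e.g. 128 kbit/s * 20 s = 320 KB
    }

    /** Called for every chunk read from the encoder. */
    public synchronized void append(byte[] chunk) {
        chunks.addLast(chunk);
        bufferedBytes += chunk.length;
        while (bufferedBytes > capacityBytes) {        // drop the oldest audio
            bufferedBytes -= chunks.removeFirst().length;
        }
    }

    /** Snapshot handed to a newly connected client before live data. */
    public synchronized byte[][] burst() {
        return chunks.toArray(new byte[0][]);
    }
}
```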
If you do require low latency, you're looking at the wrong technology entirely. For low latency, check out WebRTC.
You certainly can tweak your traditional internet radio setup to reduce latency, but rarely is that a good idea.
Codec
Codec choice is what will dictate your compatibility more than anything else. MP3 is easily the most compatible, and AAC isn't far behind. If you go with AAC, you get better quality audio for a given bitrate. Most folks use this to reduce their bandwidth bill.
There are licensing fees with MP3, and there may be with AAC depending on what you're using for a codec. Check with a lawyer. I am not one, and the licensing is extremely complicated.
Other codecs include Vorbis and Opus. If you can use Opus, do so as the licensing is wide open and you get good quality for the bandwidth. Client compatibility here though is the killer of Opus. (Maybe in a few years it will be better.) Vorbis is a mediocre codec, but is free and clear.
On the extreme end, I have some stations doing their streaming in FLAC. This is lossless audio quality, but you're paying for about 8x the bandwidth of a medium-quality MP3 station. FLAC-over-HTTP streaming compatibility is not good at the moment, but it works alright in VLC.
It is very common to support multiple codecs for your streams. Depending on your budget, if you can't do that, you're best off with MP3.
Finally on encoding, don't go from a lossy codec to another lossy codec if you can help it. Try to get the output stream as close to the input as possible. If you re-encode audio, you lose quality every time.
Recording from Browser
You mentioned users streaming from a browser. I built something like this a couple years ago with the Web Audio API where the audio is captured and then encoded and sent off to Icecast/SHOUTcast servers. Check it out here: http://demo.audiopump.co:3000/ A brief explanation of how it works is here: https://stackoverflow.com/a/20850467/362536
Anyway, I hope this helps you get started.
Streaming straight audio/mpeg (mp3 packets) has worked everywhere I've tried.
If you are developing an app, then go with AAC; if you are simply playing via a web browser, then you need an HTML5 implementation, which means MP3. Custom protocols like RTMP or SHOUTcast require additional UI to be built. There are some third-party players available in the open source world; you can either use them or stick to HTML5 MP3/OGG, as most people nowadays use Chrome or other HTML5-compliant browsers.

What libraries/APIs allow me access real time audio waveforms of a phone call?

I am looking to build an app that needs to process incoming audio on a phone call in real time.
WebRTC allows for this, but I think that only works for its browser-based P2P audio communication functionality, not for phone calls/VoIP.
Twilio and Plivo allow you to record the audio for batch/later processing.
Is there a library that will give me access to the audio streams in real time? If not, what would I need to build such a service from scratch?
Thanks
If you are open to using a media server (so that the call is no longer P2P but is mediated by the media server in a B2B model), then perhaps the Kurento Media Server may solve your problem. Kurento Media Server makes it possible to create processing capabilities that are applied in real time to the media streams. There are many examples in the documentation of computer vision and augmented reality algorithms applied in real time over video streams. I've never seen an audio-only processing module, but it should be simple to implement just by creating an additional module, which is not too complex if you have some knowledge of C/C++ and media processing concepts.
Disclaimer: I'm part of the Kurento development team.
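For orientation, a hedged sketch of driving Kurento from its Java client (kurento-client). The WebSocket URL and the SDP offer are placeholders you'd fill in from your own signaling, and a custom audio filter module would sit between the endpoint and itself instead of the plain loopback shown here:

```java
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.WebRtcEndpoint;

// Sketch: build a media pipeline and loop a caller's media back
// through the server, where a processing module could be inserted.
public final class KurentoAudioLoopback {
    public static void main(String[] args) {
        KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
        MediaPipeline pipeline = kurento.createMediaPipeline();

        WebRtcEndpoint endpoint = new WebRtcEndpoint.Builder(pipeline).build();
        endpoint.connect(endpoint); // loopback; a custom filter would go in between

        String sdpOffer = "...";    // placeholder, arrives over your signaling channel
        String sdpAnswer = endpoint.processOffer(sdpOffer);
        System.out.println(sdpAnswer);

        pipeline.release();
        kurento.destroy();
    }
}
```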

AudioTrack Latency

I'm making an application that will be capable of VoIP communication over WAN, using AudioManager, specifically AudioTrack and AudioRecord. AudioRecord works fine, but I have serious problems with the latency of playback through AudioTrack. It is really high and unacceptable. I'm receiving chunks of 160 bytes, and my audio settings are 16-bit, 8 kHz, 1 channel; with that, each 160-byte chunk carries about 10 ms of audio, so I shouldn't have significant latency.
I know that there are a lot of people with the same problem, but VoIP applications exist, so the problem is probably mine.
PS: I'm programming in Java. I have tested it between a Motorola Milestone (Droid, Android 2.2) and a Samsung phone (Android 2.3), and I have the same problems on both playback devices. Also, I have tried playing my sound streamed to my computer, and there it plays in real time; therefore, the problem is in the player (AudioTrack). The latency of the network is very low (I'm on a WAN) and I receive more than 99% of the packets (about 16 kb/s).
Is there any way to continue with a VoIP program and make it usable?
Thanks a lot. Beyond this problem, I haven't found a clear solution anywhere, and one surely exists; this is a very common and useful feature, especially in a communication device.
Do you use UDP instead of TCP? If not, try googling "Android UDP example". If yes, sorry for bothering.
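To illustrate, a minimal sketch matching the question's format (16-bit mono PCM at 8 kHz, 160-byte chunks): read packets from a UDP socket and write them straight into a streaming AudioTrack sized near the platform minimum. The port number is a placeholder, and a real client would add a small jitter buffer:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

// Receive 10 ms voice chunks over UDP and play them with a streaming
// AudioTrack. Run on a dedicated background thread.
public final class UdpVoicePlayer implements Runnable {
    @Override
    public void run() {
        try {
            int minBuf = AudioTrack.getMinBufferSize(8000,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
            AudioTrack track = new AudioTrack(AudioManager.STREAM_VOICE_CALL, 8000,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                    minBuf, AudioTrack.MODE_STREAM);
            track.play(); // playback starts once the internal buffer has data

            DatagramSocket socket = new DatagramSocket(5004); // placeholder port
            byte[] buf = new byte[160];                       // 10 ms per chunk
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            while (!Thread.currentThread().isInterrupted()) {
                socket.receive(packet);                       // blocks per chunk
                track.write(packet.getData(), 0, packet.getLength());
            }
            socket.close();
            track.release();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```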

Low level audio programming

I wonder: does audio software like Cubase and Audacity use PlaySound calls?
Where can I learn about low level audio programming? As far as I've found information on the web, MCI seems to be the lowest level audio API in Windows...
Thanks
Edit: I'm not asking only for Windows-specific information.
There are several audio APIs to choose from. The oldest and most widely supported is the waveOut API; look for functions starting with waveOut in MSDN.
A slightly newer one is DirectSound, which is geared more towards games. Its main feature over waveOut is positional 3D sound, which professional audio software doesn't use (it was also supposed to have lower latency than waveOut, but that never really materialized).
For low-latency audio, there is ASIO. Professional audio apps support this API, but not all drivers do (it's a standard feature in professional sound cards, but not in gaming or on-board hardware). ASIO can provide much lower latency than waveOut or DirectSound.
Finally, there's the kernel streaming interface, which is the lowest-level audio interface still accessible from user-mode code. This is a direct pipe into Windows' internal mixer, which combines output from all apps that are currently playing sound into the signal that gets sent to the sound card. It's scarcely documented, though. There's a driver called ASIO4ALL (just google it) that provides ASIO support on sound cards without ASIO drivers by implementing the ASIO API on top of the kernel streaming interface.
I'm a little late to the game here, but I posted a Windows API history last week that might add a little more context. The choice of API really depends on your needs. If you want to avoid 3rd party libraries, it really only comes down to MME, XAudio2, and Core Audio (WASAPI).
A Brief History of Windows Audio APIs
Hope this helps!
Actually, if you are looking for more than Windows-only output support, then the best way to start is to review Phil Burk's PortAudio, available as of this writing at http://www.portaudio.com/ .
ASIO is a good quality interface, but it's proprietary and owned by Steinberg.
There are many lower-level interfaces to audio output than MCI in modern Windows. These include, at least, DirectSound, XAudio and WASAPI.
I recommend avoiding the Windows APIs as much as possible, and learning PortAudio instead.
