Is the G.729 codec 32 kbps or 8 kbps? - voip

I'm building a VoIP app for iPhone and Android. I'm currently using the GSM codec (I chose it arbitrarily) on both versions of my app and on my Asterisk server.
Now that I'm fine-tuning my app, I'd like to try different audio codecs. I'm considering G.729. I did some research and wasn't sure why some sites say the G.729 codec uses about 32 kbps, as on this site here
http://voip.about.com/od/voipbandwidth/f/How-Much-Of-My-Mobile-Data-Plan-Does-Voip-Consume.htm
while others say it is 8 kbps, like this site here
http://www.javvin.com/protocolG7xx.html
I did some tests and it seems that 1 minute of conversation with the G.729 codec uses up 0.5 MB of data. So it seems like the first link is correct. But I've seen other sites list similar stats of 8 kbps... why the discrepancy?

If you look towards the bottom of the first link you posted, it hints at the reason - the 8 kbps is how much is used to encode the speech itself. You then need to send that encoded speech out over the network to the other end of the VoIP call, and hence need to pack it into an IP 'packet', typically using the RTP protocol.
The actual number of bits transmitted will depend on the number of samples taken per second, the number of samples packed into each IP packet, the protocol headers etc. Much of this is influenced by the codec chosen - the following link gives a good overview (see the table in the section titled 'VOIP - Per Call Bandwidth'):
http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
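As a rough, back-of-the-envelope illustration of where the larger figure comes from (assuming the common 20 ms packetization, i.e. 50 packets per second, and an Ethernet transport - other settings change the numbers):

// Rough per-call bandwidth estimate for G.729 over RTP/UDP/IP/Ethernet.
// Assumes 20 ms packetization (50 packets/s); other settings change the result.
const codecBitrate = 8000;                                // G.729 speech payload: 8 kbps
const packetsPerSecond = 50;                              // one packet every 20 ms
const payloadBytes = codecBitrate / 8 / packetsPerSecond; // 20 bytes of speech per packet
const headerBytes = 12 + 8 + 20 + 18;                     // RTP + UDP + IP + Ethernet framing
const bitsPerSecond = (payloadBytes + headerBytes) * 8 * packetsPerSecond;
console.log(bitsPerSecond / 1000);                        // ~31.2 kbps per direction

So the 8 kbps figure is the codec's own output, while roughly 31 kbps is what actually crosses the wire in each direction - which is also roughly consistent with your ~0.5 MB per minute measurement once both directions are counted.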

Related

String compression to refresh WS2811 RGB LEDs faster

I have the following problem. I am using WS2811 diodes, an Arduino Due and Node.js for my project. I want to stream video from a device connected to a Node.js server and show it on an array of diodes. Right now I am able to capture video from any device with a browser and a camera, change the resolution of the video to the one I want (15x10), and create a string containing the color information (R, G, B) for all diodes. I am sending it from the Node.js server to the Arduino through the serial port at a baud rate of 115200. Unfortunately, the sending process is too slow. I would like it to refresh the LED array at least 10 times per second. So I was wondering whether to compress the string I am sending to the Arduino, decompress it when it gets there, and then set the colors of the diodes. Maybe you guys have some experience with a similar project and can advise me on what to do.
For handling the diodes I am using the Adafruit_NeoPixel library.
If I were you I would try to convert the video to a 16-bit encoding (like RGB565), or maybe even 8-bit, on your server.
Even at that low resolution I'm not certain the ATmega328P is powerful enough to convert it back to 24-bit and send the data out to the display, but try it and see. If it doesn't work, you might want to consider switching to a BeagleBone or RPi.
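For illustration, a minimal sketch of the 24-bit to 16-bit (RGB565) packing, shown here in TypeScript on the Node.js side (the exact frame layout in your project is an assumption):

// Pack one 24-bit RGB pixel into a 16-bit RGB565 value (5 bits R, 6 bits G, 5 bits B).
function toRgb565(r: number, g: number, b: number): number {
  return ((r & 0xf8) << 8) | ((g & 0xfc) << 3) | (b >> 3);
}

// The receiving side expands it back to 8-bit channels; the low bits dropped
// during packing are simply lost (shown in the same language only to illustrate
// the bit layout - on the Arduino it would be the equivalent C++).
function fromRgb565(p: number): [number, number, number] {
  return [(p >> 8) & 0xf8, (p >> 3) & 0xfc, (p << 3) & 0xf8];
}

That alone cuts the serial payload from 3 bytes to 2 bytes per LED.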
If you have large areas of a similar colour, especially if you have dropped your bit depth to 16 or 8 bits as suggested in the previous answer, Run Length Encoding compression might be worth a try.
It's easy to implement in a few lines of code:
https://en.wikipedia.org/wiki/Run-length_encoding
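A minimal run-length encoder for the serial payload could look something like this (a sketch only; the two-byte run/value layout and the 255-byte run cap are assumptions):

// Encode a byte stream as (runLength, value) pairs, capping runs at 255
// so each pair fits in two bytes; the decoder just repeats value runLength times.
function rleEncode(data: Uint8Array): Uint8Array {
  const out: number[] = [];
  let i = 0;
  while (i < data.length) {
    const value = data[i];
    let run = 1;
    while (i + run < data.length && data[i + run] === value && run < 255) {
      run++;
    }
    out.push(run, value);
    i += run;
  }
  return Uint8Array.from(out);
}

Whether it helps depends entirely on the frames: a 15x10 image with large solid areas compresses well, while noisy frames can actually grow slightly (two bytes for every run of one).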

Streaming audio from avconv via NodeJs WebSockets into Chrome with AudioContext

We're having trouble playing streamed audio in a browser (using Chrome).
We have a process which is streaming some audio (for example an internet radio station) over UDP on some port. It's avconv (avconv -y -i SOMEURL -f alaw udp://localhost:PORT).
We have a NodeJs server which receives this audio stream and forwards it to multiple clients connected via websockets. The audio stream which NodeJs receives is wrapped in a buffer which is an array with numbers from 0 to 255. The data is sent to the browser without any issues and then we're using AudioContext to play the audio stream in the browser (our code is based on AudioStreamer - https://github.com/agektmr/AudioStreamer).
At first, all we got at this point was static. When looking into the AudioStreamer code, we realized that the audio stream data should be in the -1 to 1 range. With this knowledge we tried modifying each value in the buffer with the formula x = (x/128) - 1. We did it just to see what would happen, and surprisingly the static became a bit less awful - you could even make out melodies of songs or words if the audio was speech. But it's still very, very bad, with lots of static, so this is obviously not a solution - but it does show that we are indeed receiving the audio stream via the websockets and not just some random data.
So the question is - what are we doing wrong? Is there a codec/format we should be using? Of course all the code (the avconv, NodeJs and client side) can be modified at will. We could also use another browser if needed, though I assume that's not the problem here. The only thing we do know is that we really need this to work through websockets.
The OS running avconv and NodeJs is Ubuntu (various versions 10-13).
Any ideas? All help will be appreciated.
Thanks!
Tomas
The conversion from integer samples to floating point samples is incorrect. You must take into account:
Number of channels
Number of bits per sample
Signed/unsigned
Endianness
Let's assume you have typical WAV-style data: 16-bit stereo, signed, little-endian. You're on the right track with your formula, but for signed 16-bit samples there is no offset to subtract, so try this:
x = x / 32768
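For example, a sketch of that conversion on the Node.js side, assuming the stream really is signed 16-bit little-endian PCM (with the -f alaw output in your avconv command you would first need to decode A-law to linear PCM, or ask avconv for s16le instead):

// Convert a Buffer of signed 16-bit little-endian PCM samples into
// Float32 samples in the -1..1 range expected by the Web Audio API.
function pcm16leToFloat32(buf: Buffer): Float32Array {
  const samples = new Float32Array(buf.length / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = buf.readInt16LE(i * 2) / 32768; // signed samples: no offset to subtract
  }
  return samples;
}

For stereo you would then de-interleave the result into one Float32Array per channel before handing them to the AudioContext.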

Play audio file over VoIP

I want to implement a simple VoIP system which can achieve following;
The user uploads an MP3 or WAV file and gives a phone number.
The given phone number is dialed; when the phone is picked up, the uploaded MP3/WAV file is played. Once the whole file is played, the call is hung up.
I want to know if there is any open-source library which supports this, or open-source software I can use to achieve this?
I do similar testing as this for my job.
I have a test framework on my box in my office using FreeSWITCH, and I've created some users with passwords on the FreeSWITCH box.
Then I use a SIP testing tool/client to manage the connection through the SIP proxy to another user.
For example... say my FreeSWITCH is at IP 120.0.0.7.
I am registering on that FreeSWITCH as user 5000 and I want to call user 4000, who is also registered.
I use either SIPP (Linux) or SIPCLI (Windows).
SIPP
The benefit of SIPP is that it's truly robust and can do a myriad of performance testing and whatnot. But sending audio is a bit challenging, though doable: you're basically sending pcaps of recorded audio in some codec (G.711, G.729, etc.), so you run a command like:
sudo sipp -s [the phone number/ user] [your freeswitch] -sn uac_pcap -mi [your ip] -l 1 -m 1
The last two parameters (-l and -m) set how much load; by default SIPP will send 10 calls per second, which you probably don't want. -l says "limit the number of simultaneous calls to #" and -m says "stop after running x calls in total."
SIPCLI
The much easier method is sipcli (but it's a Windows-only tool).
In sipcli, you can basically send a WAV file, as well as text-to-speech. I love it. It has a built-in library that will dial the number, and you can pass something like -t "This is a test of the test harness for sip and v o i p testing." and it will convert that to audio on the call, on the fly. You can also build out scenarios that point to WAV files you've recorded...
SIPCLI uses a command similar to SIPP's to connect:
sipcli [user/phone number] -d [domain or proxy] -t "This is text i want converted to speech on the phone call"
You could also pass in a link to a WAV file.
sipcli can also send DTMF tones, or you could point to WAVs of DTMF tones.
The scenario editor is a bit complex at first and takes some getting used to. But once you get the hang of making scenario files, it's pretty easy.
Benefits of SIPP
SIPP can capture performance metrics (the overall time in ms between your configured start and end points)
SIPP can drive thousands of calls at your desired endpoint
SIPP can ramp up calls or ramp them down on the fly
SIPP can generate statistics and CSV files for analysis
The SIPP scenarios you write build the packets themselves, so you have more control over what your packet sends in the INVITE.
SIPP is open source
Negatives of SIPP
SIPP can NOT send a wav file
SIPP can NOT generate its own DTMF tones (it uses pcaps, which can be problematic)
SIPP can NOT generate text to speech
SIPP is somewhat complicated to get going
Benefits of SIPCLI
SIPCLI can convert text to speech on the fly
SIPCLI can use recorded wav's to send to the recipient
SIPCLI is easy to use
SIPCLI can also act as a receiver (i.e. an IVR playing a greeting and taking input)
SIPCLI has some logic to validate data received (like user pressed #3, then #4.)
Negatives of SIPCLI
SIPCLI doesn't let you have access to the SIP headers it sends (so less control over the test)
SIPCLI doesn't do load or performance metrics
SIPCLI's editor is kinda difficult at first, but it's not as hard as learning SIPP's advanced features
SIPCLI is NOT open source... its trial is 90% useful. To get the other 10% (longer phone calls) you need to purchase it for $70.
I've also tried other tools like PJSua, but these two are my bread and butter for testing the scenarios you are talking about.
Regarding the framework/softswitch/proxy... I use FreeSWITCH.
Yes, you can use Asterisk, FreeSWITCH (my personal preference), or a number of other similar platforms.
Once you have freeswitch setup, check out this link to get it going:
http://wiki.freeswitch.org/wiki/Javascript_QuickStart
Use ivrworx for simple testing; see the streamer example.

Debug NAudio MP3 reading difference?

My code using NAudio to read one particular MP3 gets different results than several other commercial apps.
Specifically: My NAudio-based code finds ~1.4 sec of silence at the beginning of this MP3 before "audible audio" (a drum pickup) starts, whereas other apps (Windows Media Player, RealPlayer, WavePad) show ~2.5 sec of silence before that same drum pickup.
The particular MP3 is "Like A Rolling Stone" downloaded from Amazon.com. Tested several other MP3s and none show any similar difference between my code and other apps. Most MP3s don't start with such a long silence so I suspect that's the source of the difference.
Debugging problems:
I can't actually find a way to even prove that the other apps are right and NAudio/me is wrong, i.e. to compare block-by-block my code's results to a "known good reference implementation"; therefore I can't even precisely define the "error" I need to debug.
Since my code reads thousands of samples during those 1.4 sec with no obvious errors, I can't think how to narrow down where/when in the input stream to look for a bug.
The heart of the NAudio code is a P/Invoke call to acmStreamConvert(), which is a Windows "black box" call which I can't think how to error-check.
Can anyone think of any tricks/techniques to debug this?
The NAudio ACM code was never originally intended for MP3s, but for decoding constant bit rate telephony codecs. One day I tried setting up the WaveFormat to specify MP3 as an experiment, and what came out sounded good enough. However, I have always felt a bit nervous about decoding MP3s (especially VBR) with ACM (e.g. what comes out if ID3 tags or album art get passed in - could that account for extra silence?), and I've never been 100% convinced that NAudio does it right - there is very little documentation on how exactly you are supposed to use the ACM codecs. Sadly there is no managed MP3 decoder with a license I can use in NAudio, so ACM remains the only option for the time being.
I'm not sure what approach other media players take to playing back MP3, but I suspect many of them have their own built-in MP3 decoders, rather than relying on the operating system.
I've found some partial answers to my own Q:
Since my problem boils down to consuming too much MP3 w/o producing enough PCM, I used conditional-on-hit-count breakpoints to find just where this was happening, then drilled into that.
This showed me that some acmStreamConvert() calls are returning success, consuming 417 src bytes, but producing 0 "dest bytes used".
Next I plan to try acmStreamSize() to ask the codec how many src bytes it "wants" to consume, rather than "telling" it to consume 417.
Edit (followup): I fixed it!
It came down to passing acmStreamConvert() enough src bytes to make it happy. Giving it its acmStreamSize() requested size fixed the problem in some places but then it popped up in others; giving it its requested size times 3 seems to cure the "0 dest bytes used" result in all MP3s I've tested.
With this fix, acmStreamConvert() then sometimes returned much larger converted chunks (almost 32 KB), so I also had to modify some other NAudio code to pass in larger destination buffers to hold the results.

How does YouTube support starting playback from any part of the video?

Basically I'm trying to replicate YouTube's ability to begin video playback from any part of hosted movie. So if you have a 60 minute video, a user could skip straight to the 30 minute mark without streaming the first 30 minutes of video. Does anyone have an idea how YouTube accomplishes this?
Well the player opens the HTTP resource like normal. When you hit the seek bar, the player requests a different portion of the file.
It passes a header like this:
Range: bytes=10001-
and the server serves the resource from that byte range. Depending on the codec, it will need to read until it gets to a sync frame to begin playback.
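As a minimal illustration of such a ranged request (the URL and byte offset are placeholders):

// Ask for the resource starting at byte 1000000; a server that supports range
// requests replies with "206 Partial Content" and just that slice of the file.
const res = await fetch("https://example.com/video.mp4", {
  headers: { Range: "bytes=1000000-" },
});
console.log(res.status, res.headers.get("Content-Range"));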
Video is a series of frames, played at a frame rate. That said, there are some rules about the order of what frames can be decoded.
Essentially, you have reference frames (called I-Frames) and you have modification frames (called P-Frames and B-Frames)... It is generally true that a properly configured decoder will be able to join a stream on any I-Frame (that is, start decoding), but not on P and B frames... So, when the user drags the slider, you're going to need to find the closest I-Frame and decode from that...
This may of course be hidden under the hood of Flash for you, but that is what it will be doing...
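As an illustration of that last step, given a (hypothetical) index of I-Frame timestamps, seeking boils down to finding the closest keyframe at or before the requested time:

// Given a sorted list of I-Frame timestamps (in seconds), return the one at or
// before the requested seek time - that's where the decoder can restart cleanly.
function nearestKeyframe(iFrameTimes: number[], seekTime: number): number {
  let lo = 0, hi = iFrameTimes.length - 1, best = iFrameTimes[0];
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (iFrameTimes[mid] <= seekTime) {
      best = iFrameTimes[mid]; // candidate keyframe at or before the target
      lo = mid + 1;            // look for a later one that still qualifies
    } else {
      hi = mid - 1;
    }
  }
  return best;
}

For example, nearestKeyframe([0, 2.5, 5, 7.5], 6.1) returns 5, so playback would resume decoding at the 5-second keyframe.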
I don't know how YouTube does it, but if you're looking to replicate the functionality, check out Annodex. It's an open standard that is based on Ogg Theora, but with an extra XML metadata stream.
Annodex allows you to have links to named sections within the video or temporal URIs to specific times in the video. Using libannodex, the server can seek to the relevant part of the video and start serving it from there.
If I were to guess, it would be some sort of selective data retrieval, like the Range header in HTTP. That might even be what they use. You can find more about it here.
