Extracting and injecting audio to an ongoing VOIP call - voip

I am very new when it comes to VOIP and integrations with VOIP systems.
Here is what I am trying to do:
1. A caller calls in and an operator answers the call.
1.1. Start streaming the caller's audio to an analysis service in the cloud.
2. Once the audio analysis is performed (generally in a few seconds), the operator will press the "Hold" button to perform an action suggested by the analysis.
2.1. Depending on the result of the analysis, play a particular audio file back to the caller to let them know that the operator is doing "x," "y," or "z" while on hold.
Given my lack of experience with VOIP systems, I am looking for any suggestions or pointers to topics, areas, articles, or technologies that can point me in the right direction.

I can offer a general point of view. I will assume SIP-based VOIP, which is pretty much omnipresent (IMS, LTE, 3GPP, etc.).
VOIP has two parts that you might have spotted while searching:
SIP (the control plane)
RTP (the data or payload plane = audio)
In general, there are two approaches. The first comes from the peer-to-peer world, where every change in media flow is communicated to the other party, with REFER performing an actual call transfer for any purpose. That is usually not the preferred way of doing things. The second approach hides whatever changes happen on the B-party (called party) side. It is also used in IMS (which is behind the modern GSM networks). The trick is that the A-party (caller) actually reaches the B-party's proxy. In SIP terms, this is a B2BUA (back-to-back user agent), which, as the name suggests, covers all the magic that happens in the called party's network.
The magic is hidden behind that B2BUA, which behaves as an entity in the middle and can therefore manipulate both SIP and RTP.
This entity can therefore fork the audio using an MGW (media gateway), sending it to the "real" B-party (a human operator) as well as to the ML/AI/expert-system analysis. The process also involves the appropriate control-plane events, such as attaching the analytics process, the actual audio forking (RTP), and triggering the SIP INVITE towards the final B-party. When the analysis concludes, an out-of-band message goes to some "rich" client at the SIP agent (a computer or tablet with a softphone) or to a CRM system attached to the call centre system. That message should inform the B-party about the result of the analysis.
All the magic is hidden either inside the B2BUA or inside a SIP application server, which is a generic name for various services such as call distribution to call centre agents, voice mail, IVR, etc.
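To make the forking idea concrete, here is a minimal Python sketch of what the media side does for each packet: parse the fixed 12-byte RTP header (layout per RFC 3550) and hand the same audio to two destinations. The names `parse_rtp` and `fork_packet` and the two list "sinks" are hypothetical. A real B2BUA or media gateway would forward over UDP sockets and also deal with header extensions, padding, RTCP, and SRTP, none of which is shown here.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Parse the fixed RTP header (RFC 3550) and skip any CSRC entries."""
    if len(datagram) < 12:
        raise ValueError("datagram shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count
    # Header extensions and padding are ignored in this sketch.
    return RtpPacket(version, b1 & 0x7F, seq, ts, ssrc, datagram[header_len:])


def fork_packet(datagram: bytes, operator_sink: list, analysis_sink: list) -> None:
    """Pass the untouched packet to the operator leg and the audio to analysis."""
    packet = parse_rtp(datagram)
    operator_sink.append(datagram)        # stand-in for sendto() on the operator leg
    analysis_sink.append(packet.payload)  # stand-in for a stream to the analysis service
```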
Voice analysis is used today at banks for caller verification, mood analysis, and many other kinds of "smart" audio processing.
In that domain, there are both open-source and proprietary SIP systems. They tend to be somewhat complex. Moreover, the logic is quite different from request-response systems (like HTTP). A call is a stateful system with a "session" (call ~ Call-ID), and everything is bound to that.
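The statefulness mentioned above can be sketched as a small table of sessions keyed by Call-ID. This is an illustration only: the state names, the transition table, and the `CallTable` class are hypothetical and far simpler than a real SIP dialog state machine.

```python
from enum import Enum


class CallState(Enum):
    RINGING = "ringing"
    ACTIVE = "active"
    ON_HOLD = "on_hold"
    ENDED = "ended"


# Allowed transitions in this simplified model (an assumption, not the SIP spec).
TRANSITIONS = {
    CallState.RINGING: {CallState.ACTIVE, CallState.ENDED},
    CallState.ACTIVE: {CallState.ON_HOLD, CallState.ENDED},
    CallState.ON_HOLD: {CallState.ACTIVE, CallState.ENDED},
    CallState.ENDED: set(),
}


class CallTable:
    """Tracks every call by its Call-ID; all events are looked up through it."""

    def __init__(self) -> None:
        self._calls = {}  # Call-ID -> CallState

    def start(self, call_id: str) -> None:
        self._calls[call_id] = CallState.RINGING

    def move(self, call_id: str, new_state: CallState) -> None:
        current = self._calls[call_id]  # KeyError means an unknown call
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"{current.value} -> {new_state.value} is not allowed")
        self._calls[call_id] = new_state

    def state(self, call_id: str) -> CallState:
        return self._calls[call_id]
```

Unlike an HTTP handler, every event here (answer, hold, hang up) only makes sense relative to the stored state of one particular call.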
Hope that this can help you.

Have you considered using an API based VOIP provider like Plivo?
The realtime streaming part of your use case might be difficult, but I bet you could find a decent workaround. I used to work there as a solutions engineer, so I'm pretty familiar with the APIs. Feel free to message me if you have any questions.

Related

How to securely distinguish traffic from my app and browser traffic

I'm designing a game that makes queries to a database on the web. The database is fronted by a web service. For example, a request could look like this:
Endpoint: "server.com/user/UID/buygold"
POST:
amount: 100
The web service would make sure that userid has enough funds to purchase 100 gold, then would return a Boolean answer based on the success of the transaction.
However, I want to limit the amount of scripting someone could possibly do to automate gameplay. For example, they could figure out their userid and have automated tasks that buy gold for them while they are at work.
On the web services side, what are some sound security measures that I can put in place to decline all but real app traffic? Is there also a way to trump reverse engineers who will take the app apart and look for keys/certs?
I hope that this is not for a production environment. The security implications alone are mind-boggling and go beyond the scope of an answer here, as a proper response would require a rather lengthy and in-depth list of requirements and recommendations that are contingent on many other factors in your web-service environment. For example, the network topology, authentication, session control and management, and various other variables all play important roles in implementing sound security countermeasures.
However, assuming that you have all that taken care of, I will answer your main questions as follows:
For question:
"what are some sound security measures that I can put in place to decline all but real app traffic"
Answer:
One of many options that would address your particular concern is to check the client's User-Agent header in the request, which may look something like this:
"Mozilla/5.0 (Macintosh; Intel Mac OS X 32.11; rv:49.0) Gecko/20122101 Firefox/42.0"
How well this works depends on how the script that automates gameplay is being run. If it runs as a browser extension, the User-Agent is a very poor countermeasure. If, on the other hand, the script sends requests directly from the client to your server (web service), you can detect it right away, and there are ways to detect whether someone spoofed a User-Agent just to bypass this countermeasure.
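As a rough illustration of the User-Agent check described above, the Python sketch below accepts only requests whose header starts with an expected app identifier. The prefix `MyGameApp/` and the function name are hypothetical. Keep the limitation in mind: a client can send whatever User-Agent it likes, so on its own this only filters out casual scripts.

```python
EXPECTED_PREFIX = "MyGameApp/"  # hypothetical identifier sent by the official client


def looks_like_app(headers: dict) -> bool:
    """Return True if the User-Agent header matches the official client's prefix."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_agent = normalised.get("user-agent", "")
    return user_agent.startswith(EXPECTED_PREFIX)
```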
Another countermeasure you can use is session management at the client level. This would require an architectural overview of how you implemented your project, but a general summary would follow a pattern like this:
The customer/game player would naturally be required to log in (authentication of some sort).
The client system (the user interface the game player is using) will have countermeasures implemented in a front-end scripting language, e.g. JavaScript or any framework that makes use of JS, such as jQuery, Dojo, etc.
Register event handlers that monitor actions, such as the type of input, with some Boolean logic along these lines:
"if input is not from keyboard or mouse, then send flag with request"
The server/web service will have logic to handle this request appropriately. This is a way to catch the game player after they commit the violation, which is useful for building evidence for legal purposes. If, on the other hand, you want to prevent the violation from happening, you could use Boolean logic along these lines: "if input is not from the keyboard or mouse (or whatever permitted input device), then do not allow the action (gameplay), and still report back to the web server".
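On the server side, the flag handling described above might look like the following Python sketch. The request field `synthetic_input`, the `audit_log` list, and the `prevent` switch are all hypothetical names. Because the flag is set by client code, a modified client can simply omit it, so it is best treated as one signal among several.

```python
def handle_action(request: dict, audit_log: list, prevent: bool = True) -> bool:
    """Decide whether to allow a gameplay action and record flagged requests.

    Returns True if the action is allowed.
    """
    flagged = bool(request.get("synthetic_input", False))
    if flagged:
        # Always keep a record, whether or not the action is blocked.
        audit_log.append({"user": request.get("user"), "action": request.get("action")})
    if flagged and prevent:
        return False
    return True
```

With `prevent=False` the function only records evidence (the "catch after the fact" mode); with `prevent=True` it also blocks the action.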
There are a dozen other ways, but this one seems to generally address your question, provided that you keep in mind that there are hundreds of other factors to think about, from the network level all the way to the application layer, down to the pattern you are using for your web service (for example, whether it is a REST/API type of environment, whether it follows an MVC pattern, and so on). There is no silver bullet when it comes to cyber security. It is a proactive and constant effort on your end to ensure that all stakeholders' assets are protected. In this case, the asset is the web service and gameplay, and the threat is gameplay tampering, which would affect the integrity of your game.
Now, regarding your second question:
" Is there also a way to trump reverse engineers who will take the app apart and look for keys/certs?"
When reverse engineers really put their minds to it, there is nothing you can really do. Whatever countermeasure you implement, they will find a workaround; that is why it is called reverse engineering. Without being too cynical about it, you have to accept that there is no such thing as a "trump" countermeasure in cyber security. You can, however, employ various mechanisms from the network layer all the way up to the application level, combined with proactive initiatives, intrusion detection, and monitoring for abnormal gameplay patterns, all of which will mitigate your risk. With all that said, your final frontier is a good legal policy in your TOS (terms of service). Depending on where you host your web service (geographically), you will be protected when users violate such terms, especially when you have verbiage that prohibits users from attempting to reverse-engineer the app, tamper with gameplay scoreboards or currency, and so on.
Another good way is to really connect with users. Users are people, and people sometimes forget that their actions affect others. Making users aware of this can help: a user who runs scripts to increase their gold may financially and emotionally affect others who have put real time and effort into making the service possible. A simple introductory welcome video upon signing up can do wonders here. Sometimes, however, users may not really know that auto-scripts are prohibited, or can at least raise that as an affirmative defense, so well-published policies can mitigate and potentially eliminate these types of risks. Despite this optimistic outlook on users, you still have to exercise good programming practices and have security countermeasures in place.
I hope that I have given you some insight and direction to assist you with this matter. As you can see, it is an involved process and can be a very complicated one.
Good luck with your initiatives.

How can a server detect an invalid client

How can a server, i.e. a remote host acting as a central service for multiple clients, detect malicious or invalid clients, akin to Blizzard's Warden? In some way, this kind of software asks a client for specific information every once in a while, which cannot be easily faked by a non-official client.
What I'm wondering is, how can such a mechanism be implemented so that it's hard or impossible to reverse engineer from the client side? Is there any such technique for open source client software (closed source server)?
Short answer: You can't. The client is fundamentally untrustable. Blizzard (and other purveyors of anti-cheat software) are engaged in a constant arms race with the cheaters. You can't just implement it once and be done with it; you have to constantly monitor your product (either heuristically or via player reports) for cheating, then figure out how to programmatically evaluate if someone is cheating.
The longer answer is that you keep your "secret sauce" detection off the client; the client instead just collects information, which it forwards to a trusted machine for analysis. This can make it harder for cheaters to avoid detection, since they only know what information is being collected, not what is being done with it. Eventually though, they'll figure out how to spoof that information, and your anti-cheat mechanism will need to then deal with that problem.
What you can do is implement heuristics in your server code to detect players who are sending inputs that should not otherwise be possible, and then flag those accounts for review or ban. This does nothing to detect malicious software on a client, but it can detect the effects of that malicious software. So while you may not be able to pinpoint what is sending those invalid inputs, you can still act on the account.
More specifically to your question, though, it's impossible to give you examples, because you have to define what constitutes "cheating" in the context of your application, and then devise methods for detecting it. This is a very domain-specific problem, and to make it more complex, you're unlikely to find open-source implementations of such systems, because they necessarily rely on obscurity to detect cheaters.
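One example of such a heuristic, sketched in Python, is flagging accounts whose actions arrive faster than a human could plausibly produce them. The threshold values and the `RateHeuristic` class are assumptions for illustration; what counts as "impossible" depends entirely on the game.

```python
from collections import defaultdict, deque


class RateHeuristic:
    """Flags an account that performs too many actions inside a sliding window."""

    def __init__(self, max_actions: int = 10, window_seconds: float = 1.0) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # account id -> timestamps of recent actions
        self.flagged = set()

    def record(self, account: str, timestamp: float) -> bool:
        """Record one action; return True if the account is now flagged."""
        times = self._recent[account]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_actions:
            self.flagged.add(account)
        return account in self.flagged
```

Flagged accounts stay in `flagged` for later review, which matches the "flag for review or ban" step rather than reacting to a single burst automatically.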

Asterisk and voip: which software, which professionals, which facilities

Consider that I don't know anything about Asterisk, so one of my questions is: who are the main actors we need to be aware of in order to start this project?
Basically, we want to create a bot (well, Asterisk) that is able to call users' phones and have a short conversation with them, where each line spoken by the system (these will be audio files) depends on the user's previous answer (speech recognition; in fact, we need to intercept the audio stream and pass it to a third-party speech recognition engine) and on some logic that can be handled by an external module. Given that the requirements are up to 200 concurrent conversations, and that the conversations will take place in the USA only, what services should we buy? One VOIP provider, one hosting solution for Asterisk. How difficult is it to write the Asterisk configuration for such a project? Thank you.
Can you help me to separate the actors in such a project: professionals, software, facilities?
1) Dedicated server. Dialling out 200 concurrent calls needs a very high-end server, though. I think you will get about 100 on a typical server, provided the software in 2) is good.
2) Dialling software/core.
3) Call management software. You will need to write this.
4) Voice recognition. If it runs on your own server, I don't think it will handle more than 20-30 channels.
5) A VOIP account for dialling out (most providers do NOT allow automated dial-out for marketing purposes).
The most problematic part is voice recognition. You are unlikely to get good recognition quality for anything beyond yes/no answers. The reason: telephony uses 8 kHz audio, which is not enough quality for recognition.
You are also unlikely to get 200 channels on one server, so you will need clustering, which means a clustering expert or a high-cost VOIP expert.
In general, if you get recognition working, everything else is doable. The development cost will be 1~100k depending on features.
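For a sense of scale, here is a back-of-the-envelope bandwidth estimate for 200 concurrent calls, written in Python. It assumes G.711 audio (8 kHz, 8 bits per sample) sent in 20 ms RTP packets over IPv4/UDP. Those codec and packetisation choices are assumptions, not something stated in the question, and link-layer overhead is ignored.

```python
SAMPLE_RATE_HZ = 8000        # telephony sampling rate
BITS_PER_SAMPLE = 8          # G.711 (assumed codec)
PACKET_INTERVAL_S = 0.020    # 20 ms of audio per RTP packet (assumed)
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers


def call_bandwidth_bps() -> int:
    """IP-level bandwidth of one call in one direction, in bits per second."""
    payload_bytes = round(SAMPLE_RATE_HZ * PACKET_INTERVAL_S) * BITS_PER_SAMPLE // 8
    packets_per_second = round(1 / PACKET_INTERVAL_S)
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second


def total_bandwidth_mbps(concurrent_calls: int) -> float:
    """Bandwidth for all calls in one direction, in megabits per second."""
    return concurrent_calls * call_bandwidth_bps() / 1_000_000
```

Under these assumptions one call is 80,000 bit/s per direction, so 200 calls come to 16 Mbit/s each way. This covers the network side only and says nothing about the CPU cost of recognition.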
I would actually suggest that you use Tropo for this (https://www.tropo.com/). The rates are reasonable, you can develop in your favorite language, they handle the massive infrastructure you'll require, and it's got a top-drawer TTS/STT engine built in.

Streaming Audio with Java

I am building an application which collects speech via microphone as wav files. These recordings need to be streamed to a server and saved (as wav files, I know they are big but they have to be wav). I also need to stream audio (these can be mp3) from the server to the web application to be played for the user. I have no idea how to implement this, but I would like to use a Java EE application because I am familiar with Java and it's easier to maintain than Flex (we are having trouble with old Flex code at work). My concerns are:
How do I buffer the transmission so that users hear the whole file without breaks? Transferring the whole file and then playing it is fine, too, but knowing how to do this would be nice.
How do I verify transmissions to the server? Can I send in packets and verify/resend per packet?
Are there existing APIs for this (please!) or do I have to write this all by hand?
As I commented on your question, it is unclear whether you have already decided upon which components exist in the topology. In particular, it is unclear whether you already have a server process in charge of storing those audio files. Therefore, I will have to make a few assumptions in my answer. Feel free to comment, and I'll try my best to adjust.
The only way to ensure that an audio file is played (by the end user) without network-induced breaks is to have the end user (or an application running on the end user's side, such as some JavaScript code) play the audio stream after it has been downloaded in its entirety. Unless you do that, you can only reduce the risk of breaks; you cannot eliminate it. Even the most sophisticated buffering algorithm cannot cope with a network outage 99.99% into buffering the entire stream. As I am not sure whether you have a client-side application involved in this, I can't advise how to force the client side to download the entire file rather than playing it "as it comes"; in the simplest case, you might be able to get by with the Content-Disposition header: http://en.wikipedia.org/wiki/MIME#Content-Disposition
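The "download everything first, then play" approach can be sketched as follows in Python. The function reads any file-like stream to the end before handing the bytes to a player; `play` is a hypothetical callback standing in for whatever audio API the client uses.

```python
import io
from typing import BinaryIO, Callable


def download_then_play(stream: BinaryIO, play: Callable[[bytes], None],
                       chunk_size: int = 64 * 1024) -> int:
    """Read the whole stream into memory, then start playback.

    Returns the number of bytes received. Playback never starts if the
    stream raises an error partway through, so the user hears either the
    complete file or nothing.
    """
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        buffer.write(chunk)
    data = buffer.getvalue()
    play(data)
    return len(data)
```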
The answer to this question, again, depends on how you architect the solution. In general, though, as long as you use standard stream APIs (such as Java IO), I wouldn't worry too much about verifying the content for errors. Error detection and retransmission are already provided lower in the networking stack (for example, by TCP in your operating system's network stack).
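If an end-to-end check is still wanted on top of what the transport provides, one common option is for the client to send a digest alongside the upload and for the server to recompute it. This is a Python sketch using SHA-256 from the standard library; the function names are hypothetical.

```python
import hashlib
import hmac


def digest_of(data: bytes) -> str:
    """Hex SHA-256 digest the client would send with its upload."""
    return hashlib.sha256(data).hexdigest()


def upload_is_intact(received: bytes, claimed_digest: str) -> bool:
    """Recompute the digest on the server and compare it with the client's."""
    return hmac.compare_digest(digest_of(received), claimed_digest.lower())
```

If the check fails, the simplest recovery is to ask the client to resend the whole file rather than tracking individual packets.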
Apache Commons' File-Upload might be useful - again, depending on your architecture: http://commons.apache.org/fileupload/

What protocol should I use for fast command/response interactions?

I need to set up a protocol for fast command/response interactions. My instinct tells me to just knock together a simple protocol with CRLF separated ascii strings like how SMTP or POP3 works, and tunnel it through SSH/SSL if I need it to be secured.
While I could just do this, I'd prefer to build on an existing technology so people could use a friendly library rather than the socket library interface the OS gives them.
I need...
Commands and responses passing structured data back and forth. (XML, S expressions, don't care.)
The ability for the server to make unscheduled notifications to the client without being polled.
Any ideas please?
If you just want request/reply, HTTP is very simple. It's already a request/response protocol. The client and server side are widely implemented in most languages. Scaling it up is well understood.
The easiest way to use it is to send commands to the server as POST requests and for the server to send back the reply in the body of the response. You could also extend HTTP with your own verbs, but that would make it more work to take advantage of caching proxies and other infrastructure that understands HTTP.
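A command sent as a POST request might be built like this with Python's standard library. The URL path, the JSON body layout, and the `build_command_request` helper are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_command_request(base_url: str, command: str, args: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one command as JSON."""
    body = json.dumps({"command": command, "args": args}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/commands",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it, and the server's reply would be read from the response body.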
If you want async notifications, then look at pub/sub protocols (Spread, XMPP, AMQP, JMS implementations or commercial pub/sub message brokers like TibcoRV, Tibco EMS or Websphere MQ). The protocol or implementation to pick depends on the reliability, latency and throughput needs of the system you're building. For example, is it OK for notifications to be dropped when the network is congested? What happens to notifications when a client is off-line -- do they get discarded or queued up for when the client reconnects?
AMQP sounds promising. Alternatively, I think XMPP supports much of what you want, though with quite a bit of overhead.
That said, depending on what you're trying to accomplish, a simple ad hoc protocol might be easier.
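For comparison, the core of a simple ad hoc line protocol is small. The Python sketch below shows CRLF framing on top of a byte stream, plus a hypothetical convention (not taken from any standard) that lines starting with `*` are unsolicited server notifications rather than replies.

```python
class LineFramer:
    """Reassembles CRLF-terminated lines from arbitrarily split byte chunks."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every complete line decoded as ASCII."""
        self._pending += chunk
        # Everything before the last CRLF is complete; the remainder waits for more data.
        *complete, self._pending = self._pending.split(b"\r\n")
        return [line.decode("ascii") for line in complete]


def classify(line: str) -> str:
    """Hypothetical convention: '*' marks a notification, anything else a reply."""
    return "notification" if line.startswith("*") else "reply"
```

The framing itself is the easy part; reconnection, timeouts, and versioning are what an existing protocol would otherwise provide.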
How about something like SNMP? I'm not sure if it fits exactly with the model your app uses, but it supports both async notify and pull (i.e., TRAP and GET).
That's a great question with a huge number of variables to consider, and the question only mentioned a few of them: packet format, asynchronous vs. synchronous messaging, and security. There are many, many others one could think about. I suggest going through a description of the 7-layer protocol stack (OSI/ISO) and asking yourself what you need at those layers, and whether you want to build that layer or get it from somewhere else. (You seem mostly interested in layers 6 and 7, but also mentioned bits of lower layers.)
Think also about whether this is a safety-critical application or part of a system with formal V&V. Really good, trustworthy communication systems are not easy to design; also, an "underpowered" protocol can put a lot of coding burden on the application to do error recovery.
Finally, I would suggest looking at how other applications similar to yours do the job (check open source, read books, etc.). Also useful is the U.S. Patent Office database; one can get great ideas just from reading the description of the communication problem they were trying to solve.

Resources