I'm currently developing a Cydia tweak for speaker recognition on the iPhone. The tweak can identify whether the current user is the phone's owner (after training). It has already been implemented on Android, and we have already compiled and tested the core library. The only difficulty we are facing is how to capture the audio data from Siri. We have tried:
Hooking "- (void)_tellSpeechDelegateRecordingWillBegin" and "- (void)_tellSpeechDelegateRecordingDidEnd" and using AVAudioRecorder to record the audio - this failed because every AVAudioSession gets interrupted while Siri is recording.
Hooking "- (void)startSpeechRequestWithSpeechFileAtURL:(id)arg1". This method seemed to be related to the audio file, but we couldn't get it hooked with the Logos tweak framework.
There are two possible approaches we are considering:
Implement a low-level audio recorder that can bypass Siri's interruption (something like a call recorder).
Implement an HTTP(S) proxy server on the iPhone and capture the requests it forwards to Siri's servers.
But we have little experience with either option. Does anyone have ideas on how to capture the audio from Siri on the phone itself, rather than through an external server?
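For the second option, here is a minimal sketch of what an intercepting proxy could look like, using mitmproxy on a host machine that the phone's Wi-Fi proxy points at. The Siri hostname and the assumption that the traffic isn't certificate-pinned are guesses on my part, so treat this as a feasibility probe rather than a solution:

```python
# Hypothetical mitmproxy addon: dump request bodies sent to what we assume
# is the Siri endpoint. Run with: mitmproxy -s dump_siri.py
# NOTE: Siri traffic may be certificate-pinned and uses a binary protocol,
# so this only shows the general shape of the proxy approach.
import time
from mitmproxy import http

SIRI_HOST = "guzzoni.apple.com"  # assumed endpoint, verify for your iOS version

class DumpSiri:
    def request(self, flow: http.HTTPFlow) -> None:
        if SIRI_HOST in flow.request.pretty_host:
            path = f"siri_request_{int(time.time() * 1000)}.bin"
            with open(path, "wb") as f:
                f.write(flow.request.raw_content or b"")

addons = [DumpSiri()]
```

Even if the TLS interception works, the captured bodies would still be Siri's binary protocol rather than a plain audio file, so the payload would need further decoding.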
Update (Feb 12 2014)
Check this.
I found a class named "AFSpeechRecorder" that was used by Siri, and I guess it must be related to the audio data. Unluckily, this class was removed in iOS 7, and I can't figure out what changed.
I'm trying to make my Windows computer a valid output for Bluetooth audio from my phone. Enabling the actual audio was easy enough using the WinRT AudioPlaybackConnection, but I'm trying to get metadata working and running into dead ends in the Windows UWP documentation.
I'm familiar with the MediaPlayer class, but I can't see how to set its source to the AudioPlaybackConnection. My next thought was to create a MediaPlayer and handle the controls/metadata myself, but I can't see how to access the metadata for the AudioPlaybackConnection either. I tried getting the BluetoothDevice matching the same phone, since the properties for the actual device list AVRCP Transport and A2DP SNK as two separate hardware "devices" making up the phone, but I have no more luck accessing metadata through the BluetoothDevice.
I know Windows 10 supports Bluetooth's AVRCP and can handle metadata/controls (source), but I'm beginning to think it's exposed under a different device in WinRT, and I don't have the WinRT know-how to track it down.
I've consulted the Bluetooth team about this. Control like this is not supported in Windows at this time. You could submit a feature request in the Feedback Hub; please select Developer Platform -> API Feedback as the category when you submit it, and the related team will review the request.
I'm trying to understand the best possible ways, technically and from a user-experience point of view, to test the user's camera, microphone, and speakers. Or does it really come down to letting the user select a device for each and testing them manually, i.e.:
I see myself in the camera, so it's working.
My mic works because there's a visual indicator that tells me it's picking up sound.
My speakers work because there's a visual indicator that moves when I talk.
Thanks!
- Jess
Assuming your app is network connected, it's often the case that people will supply some sort of 'test call' functionality.
This lets your user place a call to a server that verifies audio and video are being received, and sends both audio and video back to the user so they can confirm that they are reaching the other end correctly.
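If a test-call server isn't available, a purely local loopback check can cover part of this: play a known tone through the selected speakers while recording from the selected microphone, then look for that tone in the recording. Below is a rough sketch in Python using the sounddevice library; the library choice, thresholds, and use of the default devices are my own assumptions, not something prescribed by any particular app stack.

```python
# Rough local speaker+mic check: play a 440 Hz tone and see whether the
# microphone picked it up. Thresholds and device choice are assumptions.
import numpy as np
import sounddevice as sd

FS = 44100          # sample rate
TONE_HZ = 440.0     # test tone frequency
DURATION = 1.0      # seconds

t = np.arange(int(FS * DURATION)) / FS
tone = (0.3 * np.sin(2 * np.pi * TONE_HZ * t)).astype(np.float32)

# Play through the default output device while recording from the default input.
recorded = sd.playrec(tone, samplerate=FS, channels=1)
sd.wait()
recorded = recorded[:, 0]

# Mic check: did we capture any signal at all?
mic_ok = np.sqrt(np.mean(recorded ** 2)) > 1e-4

# Speaker check: is the dominant recorded frequency near the test tone?
spectrum = np.abs(np.fft.rfft(recorded))
spectrum[0] = 0.0  # ignore DC offset
peak_hz = np.fft.rfftfreq(len(recorded), 1 / FS)[np.argmax(spectrum)]
speaker_ok = abs(peak_hz - TONE_HZ) < 10

print(f"mic picking up sound: {mic_ok}, tone heard through speakers: {speaker_ok}")
```

The camera still needs the "I can see myself" preview, but this turns the microphone and speaker checks into something you can score automatically.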
I'm trying to create an interactive voice-tree for an art project. Think of something like a choose-your-own-adventure, but over the phone and with voice commands. I already have a fair amount of experience working with Construct 2 (game-making software) and can easily build a branching, voice-controlled interaction loadable through a modern browser with it. For reasons relevant to the overall story, I need players to connect to the interaction through a Google Voice number they will call.
I already have a GV number and have written an AutoHotKey script to auto-answer the Hangouts call, but I'm stuck trying to route the audio from the caller in Hangouts to the browser AND the audio response output of the browser back to the caller.
I know of an extremely primitive way to accomplish this, which I've illustrated with this diagram:
Unfortunately, this is rather cumbersome and I suspect I can achieve my goal through virtualization or at the VERY least some sort of attenuation cables between two physical machines (I tried running a generic AUX cable between two laptops, but couldn't get speaker audio to go into microphone audio from one to the other).
I've been experimenting on Parallels running Windows 8.1 with Virtual Audio Cable (no luck), JACK (too robust), CheVolume (too limited), and IndieVolume (too limited).
I suspect VAC would be the best bet, but I can't seem to find a way to route Firefox audio output to a microphone input which directs to Chrome and vice versa. If I try accomplishing it all through just one virtual machine I have to use two different browsers for the voice-tree webpage and Hangouts call since Hangouts pushes its audio through Chrome (even the stand-alone application).
Is there any way to route microphone input and speaker output separately between two virtual machines? If not, could I still try to accomplish this with a specific type of cable between two laptops running Windows 7/8 that have generic audio jacks?
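For what it's worth, instead of physical cables, a small software bridge that copies audio from one device to another can sometimes stand in for the attenuating cable (for example, from a virtual cable's output into a second virtual cable that feeds the other browser's "microphone"). Here is a sketch using Python's sounddevice library; the device names are placeholders to replace with whatever sd.query_devices() reports on your machine:

```python
# Minimal audio passthrough: read from one device and write to another.
# Device names below are placeholders; run sd.query_devices() to find yours
# (e.g. a virtual cable's output as input, a second cable as output).
import sounddevice as sd

IN_DEVICE = "CABLE Output (VB-Audio Virtual Cable)"   # placeholder name
OUT_DEVICE = "CABLE-A Input (VB-Audio Cable A)"       # placeholder name

def passthrough(indata, outdata, frames, time, status):
    if status:
        print(status)
    outdata[:] = indata  # copy whatever arrives straight to the output device

with sd.Stream(device=(IN_DEVICE, OUT_DEVICE),
               samplerate=48000, channels=2,
               callback=passthrough):
    print("Bridging audio, Ctrl+C to stop")
    sd.sleep(3600 * 1000)
```

Running two of these bridges, one per direction, approximates the AUX-cable setup without needing a second physical machine.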
While working on VoIP apps, I usually end up picking up one phone, talking into it, then picking up the other phone and checking whether I hear myself. This gets even trickier when I'm building apps with three-way calling.
Using a softphone doesn't help.
Ideally, I want to be able to run multiple instances of some command-line-based SIP UA from which I can dial a number. Once the UA has dialed and the other party has picked up, both agents exchange audio. But instead of my having to listen to the audio, each app would display some text that identifies the other end, possibly derived from a frequency pattern that can be converted to text.
Can something like this be done? I'm building apps against FreeSWITCH. Ideas on how to debug VoIP apps are also welcome in the comments.
Yes, absolutely. The easiest approach would be to have a separate FreeSWITCH server that is used for placing the test calls and sending/receiving your test signals.
tone_stream will generate tones at the frequencies you need: https://freeswitch.org/confluence/display/FREESWITCH/Tone_stream
tone_detect can detect the frequencies and execute actions, or even better, generate events that you can catch over an ESL socket: https://freeswitch.org/confluence/display/FREESWITCH/mod_dptools%3A+tone_detect
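To make the pieces concrete, here is a small sketch using FreeSWITCH's Python ESL bindings: it originates a call that plays a tone_stream to the far end, while the receiving leg is expected to run tone_detect (for example from its dialplan). The extension, event-socket credentials, and the exact name of the detection event are assumptions you would adapt to your setup.

```python
# Sketch: originate a call that plays a test tone, and watch the event
# socket for tone-detection events fired by the far end's tone_detect.
# Endpoint, password and event names are assumptions for your own setup.
import ESL  # FreeSWITCH's SWIG-generated Python ESL module

con = ESL.ESLconnection("127.0.0.1", "8021", "ClueCon")
if not con.connected():
    raise SystemExit("could not connect to the FreeSWITCH event socket")

con.events("plain", "ALL")

# Call extension 2000 and stream a 1 kHz tone to it for three seconds.
con.api("originate",
        "user/2000 &playback(tone_stream://%(3000,0,1000))")

# The receiving leg should run something like:
#   <action application="tone_detect" data="testtone 1000 r +3000"/>
# in its dialplan; when the tone is heard, an event shows up here.
while True:
    ev = con.recvEventTimed(1000)
    if ev is None:
        continue
    name = ev.getHeader("Event-Name")
    if name and "TONE" in name.upper():   # e.g. DETECTED_TONE; name may vary
        print("far end heard the tone:", ev.serialize())
        break
```

The same loop is where you would map each detected frequency pattern to the text that identifies the far end.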
The best way to generate such calls is to use a dialer script that communicates with FreeSWITCH via the Event Socket. Here you can see some (working) examples that I made with Perl:
https://github.com/voxserv/rring/blob/master/lib/Rring/Caller/FreeSWITCH.pm -- this is part of a test suite that I built for testing a provider's SIP infrastructure. As you can see, it connects to FreeSWITCH, starts an event listener, then originates a call and also expects an inbound call. It then sends and analyzes DTMF.
https://github.com/voxserv/freeswitch-helper-scripts/tree/master/esl -- these are special-purpose dialers; you can also use them as examples.
https://github.com/voxserv/freeswitch-perf-dialer -- this one generates a series of calls, like SIPp does.
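In the same spirit, here is a compact Python sketch of a dialer that places a series of test calls with fixed pacing; the destination, pacing, and call count are arbitrary placeholders and not taken from the linked scripts:

```python
# Sketch of a simple series dialer over ESL, loosely in the spirit of the
# Perl dialers linked above. Destination, pacing and count are placeholders.
import time
import ESL

CALLS = 10           # how many test calls to place
PACE_SECONDS = 2.0   # gap between originates

con = ESL.ESLconnection("127.0.0.1", "8021", "ClueCon")
assert con.connected(), "event socket not reachable"

for i in range(CALLS):
    # Each call plays a short tone and hangs up; adjust to your test scenario.
    res = con.api("originate",
                  "user/2000 &playback(tone_stream://%(1000,0,800))")
    print(f"call {i + 1}: {(res.getBody() or '').strip()}")
    time.sleep(PACE_SECONDS)
```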
Another technique is to play a sample audio file, record the audio received on the other end (call recording), and then compare the two. This works for setups where the systems are located in different places and you are testing end-to-end quality.
There are a lot of audio comparison tools (like PESQ) that should help you not just detect the presence of audio but also give statistics about the degradation of various parameters in the audio stream.
This can be extended to run test analysis of FreeSWITCH patches as and when they are released, and also for any other hooks or quality standards you want to enforce.
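As a concrete example of the comparison step described above, here is a small sketch that scores a reference file against the far-end call recording using the pesq Python package; the package choice and file names are my assumptions, and any ITU-T P.862 implementation would do:

```python
# Sketch: compare the played reference audio with the recording captured on
# the far end and print a PESQ score. File names and the 'pesq' package are
# assumptions; both files must share the same sample rate (8 or 16 kHz).
from scipy.io import wavfile
from pesq import pesq

fs_ref, ref = wavfile.read("reference.wav")    # the sample you played
fs_deg, deg = wavfile.read("far_end_rec.wav")  # call recording from the far end
assert fs_ref == fs_deg, "resample first so both files match"

# 'wb' = wideband (16 kHz); use 'nb' for 8 kHz narrowband recordings.
mode = "wb" if fs_ref == 16000 else "nb"
score = pesq(fs_ref, ref, deg, mode)
print(f"PESQ {mode} score: {score:.2f}")  # roughly 1 (bad) to 4.5 (excellent)
```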
I was wondering whether it is possible to capture audio data from other sources like the system output, FM radio, Bluetooth headset, etc. I'm particularly interested in capturing audio from the FM radio and have already investigated all the possibilities, including trying to sniff the raw Bluetooth communication between the phone and the radio device, with no luck. It's too bad Android only allows recording audio from the MIC.
I've looked at the Android source code and couldn't find a backdoor that would allow me to do this without rooting the device. Do you at least have any idea how to use other devices (maybe by somehow accessing /dev/audio), say via the NDK or, even better, Java (maybe reflection?), to trick the system into capturing the audio stream from, say, the FM radio? (In my case I'm trying to develop the app for the HTC Desire.)
PS: For those of you who are against using undocumented APIs, please don't post here - I'm writing an app for my personal use, and even if I ever publish it, I will warn users about possible incompatibilities.
I've spent quite some time deciphering the audio stack, and I think you could try to hijack libaudio. You'll have trouble speaking directly to the hardware (/dev/*) because many devices use proprietary audio drivers; there's no rule in this regard.
However, the audio hardware abstraction layer (HAL) provided by /system/lib/libaudio.so should expose the API described at http://source.android.com/porting/audio.html
The Android system, and especially audioflinger, uses this libaudio HAL to find available devices, deal with routing, and of course to read/write PCM data.
So you could hijack the interaction between audioflinger and libaudio by renaming the latter and providing your own libaudio that decorates the real one. Doing so, you should be able to log what happens and very possibly intercept the FM radio output, provided it is not handled directly by the hardware.
Of course, all this requires rooting. Please comment if you manage to do this; it interests me.