Recognize specific ringtone - audio

I want to be able to send a signal to my Raspberry Pi at home while I'm away, so I can, for example, wake up my PC. I always have an old phone lying around that I never really use. So I thought: I call that phone, a specific MP3 ringtone plays, and the Raspberry Pi listens, recognizes the ringtone and treats it as the signal. I can pretty much choose whatever ringtone I want (hopefully not too long a one). The problem is that the Raspberry Pi must be able to recognize it and to distinguish it from other sounds. Ideally I can play random music at home and nothing is triggered until the specific ringtone I chose plays.
I'm at the very beginning of the project and I have a lot of questions. Is this even feasible? How do I listen to the ringtone? Should I use a normal microphone, or could I, for example, trigger a GPIO pin for as long as a specific frequency is played? What kind of ringtone should I use to make it as distinguishable as possible? And how do I write the software to recognize the sound?
I know this is a lot and I don't expect a step-by-step solution. But maybe you have some hints to point me in the right direction?

If someone has a similar problem, I found a solution. First I had to choose between a mostly hardware solution and a mostly software solution. The hardware solution is to filter for specific frequencies. This seems to be pretty hard with normal band-pass filters if you want narrow bands. There are also components that can do this; I now know of the NE567. But this component only reacts to one frequency and draws quite a lot of power. Recognizing a ringtone needs several of these components, which means even more power consumption. This solution is also pretty inflexible.
So I went for the software solution. I now have an Arduino Uno that receives an amplified electret microphone signal on an analog input pin. The data is collected and simultaneously analysed with an FFT algorithm. Then I check for the dominant frequency, if there is one, and save it in an array. Every time I get a new data point I compare the array with the pattern of my ringtone and calculate a score for the match. If the score is high enough, the ringtone is "found" and I can trigger my event.
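To make the idea concrete, here is a minimal sketch of the "dominant frequency per frame, then score against a pattern" approach. It is in Python rather than Arduino code and is not the sketch that actually runs on the Uno. The sample rate, frame size, tolerance and the `PATTERN` tones are all made-up values for illustration, and a plain DFT stands in for the FFT to keep it dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for illustration
FRAME_SIZE = 64      # samples per analysis frame, assumed

def dominant_frequency(frame, sample_rate=SAMPLE_RATE, min_magnitude=1.0):
    """Return the strongest frequency in the frame (Hz), or None if too quiet."""
    n = len(frame)
    best_bin, best_mag = None, min_magnitude
    for k in range(1, n // 2):
        # Plain DFT for one bin; an FFT gives the same values, only faster.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return None if best_bin is None else best_bin * sample_rate / n

def match_score(history, pattern, tolerance_hz=150.0):
    """Fraction of pattern entries matched by the most recent history entries."""
    if len(history) < len(pattern):
        return 0.0
    recent = history[-len(pattern):]
    hits = sum(1 for heard, wanted in zip(recent, pattern)
               if heard is not None and abs(heard - wanted) <= tolerance_hz)
    return hits / len(pattern)

def tone(freq, n=FRAME_SIZE, sample_rate=SAMPLE_RATE):
    """Synthetic test frame: a pure sine at the given frequency."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

# Hypothetical ringtone: four tones, one per frame.
PATTERN = [1000.0, 2000.0, 1000.0, 3000.0]

# Simulate hearing an unrelated tone followed by the ringtone.
history = [dominant_frequency(tone(f)) for f in [500.0] + PATTERN]
score = match_score(history, PATTERN)
detected = score >= 0.75   # threshold chosen arbitrarily for the sketch
```

Scoring a fraction of matching frames instead of demanding an exact match leaves some room for a frame or two that is spoiled by noise.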
I'm actually pretty pleased with the solution, because it works quite well even with the phone a few feet away from the microphone. I thought I would need to put the microphone almost directly next to the phone to get good results, but I don't have to. It's still a little sensitive, because the volume shouldn't be too high or too low. But with the right volume settings it works over quite a large area when the phone is in the same room. It actually works better with some space between microphone and phone, because the phone's radio emissions during the call seem to disturb the circuit quite a lot. There is also the problem that other noises block the ringtone recognition. I could compensate for that in my algorithm, but I have used up almost all of the Arduino's resources, so I had to keep the algorithm simple. In my case the environment isn't noisy, so this is not a problem for me. Another plus is that my event has never been triggered by another sound, and it seems almost impossible for that to happen by accident.
So it is feasible, and I think it's actually quite an elegant solution. I also thought about vibration detection, or even tapping the vibration motor's signal directly, but I have no control over the vibration function of that old phone. I can choose the ringtone for every contact, though, so I gave the "magic" ringtone only to myself, and the event can only be triggered by me. I do have to say that writing the software was kind of hard with the Arduino's limitations. Because I need the data in real time, I have limited time for the calculation. I had to limit the incoming data, and therefore I can only listen to frequencies up to 10 kHz. But ringtone recognition is still possible and I think it was worth the effort. :)


Need advice on hardware stack for Wireless Audio solution

Good day!
Problem definition:
Current Bluetooth implementations do not let you simply have good audio quality (earphones mode) and two-way audio transmission (headset mode) at the same time.
Also, even if you manage to set this configuration up, which places huge limitations on the hardware and software you can use, there is no way to handle sound input from two different audio devices simultaneously.
So, technically, you cannot just play a game, talk on Discord, and optionally listen to some music, unless you are bound to some USB-bundled earphones, which are usually really crappy, or really expensive, or both.
Solution sketch:
So I came up with the idea that one could actually build such a device using a Raspberry Pi, an Arduino, or even a bare-components stack.
The theoretical layout of the connections would look something like this:
The idea is to create two "simple" devices:
One, not so portable, that handles several analog inputs and one analog output.
One, portable, that handles a single analog input and output and can be used with any analog earphones.
The requirements for such a system are quite simple:
The bundle has to handle data transmission over some distance, preferably 10 meters or more.
The "Inlet" device should be portable enough to keep in a pocket, on an armband, or similar.
Sound quality should be at the very least on the level of the Bluetooth headphones profile, or better if possible.
If possible, it would be nice to keep the price of the solution under 500 euros, but I'm so tired of the current state of things that I might consider raising the budget...
Don't mind the yellow buttons on the Outlet device. Those are optional, and will depend on the implementation stack :)
Can anyone advise me which component base would be a better fit for building such a tool, and why?
And maybe someone actually knows of similar systems already existing?
Personally I would prefer anything but the bare-components solution, simply because I'm really rusty in that area and it requires quite a lot of tools to do properly.
Using pre-built modules, on the other hand, would save me from buying most of the hardware tools and minimize the "hardware customization" part of this solution, leaving only the software part to handle (which is my main area of expertise).
But then again, if there are experts here who consider the other stacks non-viable, I would really appreciate seeing their reasoning.
P.S. Just to be clear: if this project proves viable, I will implement it and share the implementation details with the community. I am not the first one who needs such a system, and unfortunately it seems that hardware and software vendors are not really interested in designing similar solutions...
I happened to find a "temporary" solution.
I came across a wireless headset that supports a wireless USB bundle connection and a Bluetooth connection to different devices at the same time, and provides a nice way of controlling sound input and output on both connections.
This was almost pure luck, as this "feature" was not described anywhere in the specs...
The actual headset is the:
JBL Quantum 800
This does not close the question per se, as I still plan to implement this "Summer Project" at some point, but I believe this information might be useful to those searching for similar solutions.

How can I synchronize two audio recordings *without* timestamps?

Let's say I have two separate recordings of the same concert (created on a user's phone and then uploaded to our server). These recordings are then aligned according to their creation timestamp. However, when these recordings are played together or quickly toggled between, it is revealed that their creation timestamps must be off because there is a perceptible delay.
Since the time stamp is not a reliable way to align these recordings, what is an alternative? I would really prefer not to have to learn about audio signal processing to solve this problem, but recognize this may be the only way. So, I guess my question is:
1. Can I get away with doing some kind of clock synchronization? Is that even possible if the internal device clocks are clearly off by an unknown amount? If yes, a general outline of how this would work and key words would be appreciated.
2. If #1 is not an option, I guess I need to learn about audio signal processing? Again, a general outline of how to tackle the problem from that angle and some key words would be appreciated.
There are two separate issues you need to deal with. Issue 1 is aligning the start times of the recordings. I doubt you can expect that both users pressed record at exactly the same moment. Even if they did, they may have been at different distances from the speaker, and sound takes time to travel. Aligning the start times by hand is pretty trivial, since the human brain is good at comparing the similarity of sounds. Programmatically it's a different story. You might try using something like cross-correlation. There is no exact method, though.
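As a rough illustration of the cross-correlation idea, the sketch below slides one recording against the other and keeps the offset where the two agree best. It is a brute-force, pure-Python version using synthetic noise as the "recording"; the function name and the test signal are made up for the example. On real audio you would normally use an FFT-based correlation from a numerics library, because the brute-force loop is far too slow at full sample rates.

```python
import random

def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which b lines up best with a.

    A positive lag means the same audio appears that many samples
    later in b than in a; a negative lag means it appears earlier.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]   # large when the waveforms agree
        if score > best_score:
            best, best_score = lag, score
    return best

# Synthetic stand-in for a recording: 300 samples of noise.
rng = random.Random(42)
reference = [rng.uniform(-1.0, 1.0) for _ in range(300)]

# Second "recording" holds the same audio 7 samples later.
late_copy = [0.0] * 7 + reference

offset = best_lag(reference, late_copy, max_lag=20)   # 7 for this input
```

Divide the lag by the sample rate to get the offset in seconds. Real recordings from two phones differ in level, noise and room sound, so the peak is far less sharp than in this toy case.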
Issue 2 is that the clocks driving the A/D converters on the two devices will not be running at exactly the same rate. So even if you synchronize the start times, the two recordings will eventually drift apart. How long it takes for the drift to become noticeable depends on the difference between the two clock frequencies. If they are relatively close, you may not notice it in a short recording. To counteract this you need to time-stretch one of the recordings, which increases or decreases its duration without affecting the pitch. Plenty of audio recording apps let you time-stretch, but they don't help you figure out by how much. Start by googling "time stretching".
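One possible way to get the "by how much" figure, assuming you can measure the offset between the two recordings at two points far apart (for example near the start and near the end): the change in offset divided by the distance between the two points gives the clock mismatch. The function name and all numbers below are hypothetical.

```python
def clock_ratio(pos_1, lag_1, pos_2, lag_2):
    """Ratio of recording B's sample clock to recording A's.

    pos_1 and pos_2 are sample positions in A; lag_1 and lag_2 are the
    offsets (in samples) at which the same audio is found in B.
    """
    span_a = pos_2 - pos_1
    span_b = (pos_2 + lag_2) - (pos_1 + lag_1)
    return span_b / span_a

# Hypothetical numbers: 10 minutes at 48 kHz (28,800,000 samples);
# B lags A by 100 samples at the start and by 580 samples at the end.
ratio = clock_ratio(0, 100, 28_800_000, 580)
stretch = 1.0 / ratio              # scale B's duration by this to match A
drift_ppm = (ratio - 1.0) * 1e6    # clock mismatch in parts per million
```

With these made-up numbers the mismatch is roughly 17 parts per million, which comes to 480 samples, or 10 ms, over the ten minutes.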
I realize neither of these is a direct answer; they are suggestions.
Take a look at this document, which describes how you can align recordings using Sonic Visualiser (GPL) and a plugin.
I've not used it before, but found the document (and this question) when I was faced with a similar problem.

How can I detect the sound in a raw sound file

I am developing software that automatically records my voice and extracts every word. I used the PortAudio library for this. But I am stuck on detecting the sound: I treat a sample value of zero as silence, so any zero sample must be the start or end point of a sound. But when I ran it, the program produced far too many words. I think this is because the data I read from PortAudio is raw, so it can't be processed like that. Am I right? How can I fix it? By the way, I am coding in C++ :D
To detect the presence of a signal in a PCM stream you have to be able to tell it apart from the noise. As dprogramz said, the noise floor of your sound card is probably not perfect, so some noise will be recorded (even with no mic connected).
The solution is to use a VOX (voice-operated switch) or VAD (voice activity detection) algorithm to detect the presence of your voice. VOX can be tricky, since in most consumer-grade electronics the noise floor is just low enough to be "silence" to the human ear, relative to the signal. This means that the difference in amplitude between the noise floor and the signal may be slight. If your sound card has AGC turned on, this can make it even more difficult, since the noise floor may move. Having said that, VOX can be implemented successfully on consumer-grade equipment. It just takes more effort to establish the threshold. At its best, the threshold is recalculated periodically while the stream is active.
If I were doing this I'd implement a VAD algorithm. Since your objective is to detect your voice this should provide a reliable result regardless of the equipment you use.
I don't think it's because it is a RAW value. A RAW sound file is just a stream of samples, each one giving the signal's amplitude at one instant.
However, the value will rarely (if ever) be zero. You have to take into account that the mic produces a small amount of electrical noise. Figure out the "idle" dB level of your mic (just test the level when you aren't talking into it). You then need to set a silence threshold (below a certain dB level for a certain number of samples) to detect the beginning and end. Attempting to detect a zero value is going to be near impossible.
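A minimal sketch of the threshold approach, written in Python for brevity (the logic ports directly to C++ on top of PortAudio buffers). It assumes samples are floats in the range -1 to 1. The frame size, the -40 dB threshold and the "two quiet frames end a sound" rule are placeholder values; in practice the threshold would sit a little above the idle level you measured for your mic.

```python
import math

def frame_db(frame):
    """RMS level of one frame in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-10))   # clamp to avoid log(0)

def find_segments(samples, frame_size, threshold_db, min_silence_frames):
    """Return (start_frame, end_frame) pairs of detected sound, end exclusive.

    A sound starts at the first frame above the threshold and ends once
    min_silence_frames consecutive frames stay at or below it, so short
    dips inside a word do not split it.
    """
    segments = []
    start = None
    quiet_run = 0
    n_frames = len(samples) // frame_size
    for f in range(n_frames):
        frame = samples[f * frame_size:(f + 1) * frame_size]
        if frame_db(frame) > threshold_db:
            if start is None:
                start = f
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                segments.append((start, f - quiet_run + 1))
                start, quiet_run = None, 0
    if start is not None:   # stream ended while a sound was still open
        segments.append((start, n_frames - quiet_run))
    return segments

def frames(amplitude, count, frame_size=10):
    """Synthetic test signal: square wave at the given amplitude."""
    return [amplitude if i % 2 == 0 else -amplitude
            for i in range(count * frame_size)]

NOISE, VOICE = 0.001, 0.5   # about -60 dB and -6 dB, made-up levels
signal = (frames(NOISE, 3) + frames(VOICE, 4) + frames(NOISE, 1) +
          frames(VOICE, 2) + frames(NOISE, 5) + frames(VOICE, 3) +
          frames(NOISE, 1))
words = find_segments(signal, frame_size=10, threshold_db=-40.0,
                      min_silence_frames=2)
```

Note that the test signal never contains an exact zero, so the "sample equals zero" check from the question would find no boundaries at all, while the level threshold finds both sounds.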

Recording the Stereo Mix and Parasites

I'm trying to make a video tutorial, so I decided to record the speeches using an online TTS service.
I use Audacity to capture the sound, and the sound was clear!
After dinner I wanted to finish the last speeches, but the sound wasn't the same anymore. There is a disturbing background noise (parasite). I removed it with Audacity, but even so the voice isn't the same...
You can see here the difference between the soundtracks of the same speech before and after the problem appeared.
The codec used by the stereo mix peripheral is "IDT High Definition Codec".
Thank you.
Perhaps some cable or plug got loose? Do check for this!
If you are using really cheap gear (a built-in sound card and the like), it might very well also be a problem of electrical interference. Anything from ...
Switching on some device that emits an electromagnetic field (e.g. another monitor close by)
Repositioning electrical devices on your desk
Changes in CPU load on your computer (yes, I'm serious!)
... could very well cause some kind of noise with lo-fi sound hardware.
Generally, if you need help with audio that sounds wrong, make sure you provide a way to LISTEN to the files, not just a visual representation.
Also, in the waveform graphics you posted I can see that the later signal is more compressed, which may point to some kind of automatic levelling going on somewhere in the audio chain.

Touchscreen using sound input?

I don't really know if it is actually possible, but I believe it can be done. How feasible is it to write a program that recognizes the different sounds bouncing off the screen and turns them into a position, which would then be fed to the mouse?
I know it sounds kind of dumb, but lately I've been noticing that touching the screen makes a very dull, strong sound, and that the sound varies depending on where you touch. Probably the microphone "hears" it differently because the screen acts like a drum together with the casing. Anyway, what do you think? Does anyone have experience programming with sound?
First of all most domestic touch screens work by detecting pressure based on a criss-cross mesh layer underneath the display layer.
However, I have seen an example where a touch interface was integrated onto a pane of glass. It used 4 microphones at the corners; when you tapped a certain part of the screen, it measured the delay of the sound reaching each microphone, which allowed it to triangulate the touch.
This is the methodology you would use. You don't even need to set up the hardware to test it: you could throw together an interface in VB where a click in a box sends out a circular wave, and then calculate where the pointer is from the times the wave takes to reach the 4 points.
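The software-only test could look roughly like the sketch below (Python instead of VB). It simulates a tap, computes when the wave reaches four corner microphones, and then recovers the tap position from the arrival-time differences alone with a simple grid search. The pane size, the wave speed and the grid step are assumed values picked for the example, not measurements.

```python
import math

# Assumed setup: a 0.4 m x 0.3 m pane with one microphone per corner.
MICS = [(0.0, 0.0), (0.4, 0.0), (0.0, 0.3), (0.4, 0.3)]
SPEED = 3000.0   # m/s, hypothetical wave speed in the pane

def arrival_times(tap, mics=MICS, speed=SPEED):
    """Time for a wave starting at `tap` to reach each microphone."""
    return [math.dist(tap, m) / speed for m in mics]

def locate(times, mics=MICS, speed=SPEED, step=0.005, width=0.4, height=0.3):
    """Grid-search the position whose predicted delays fit best.

    Only differences relative to the first microphone are compared,
    because the moment of the tap itself is unknown.
    """
    measured = [t - times[0] for t in times]
    best, best_err = None, float("inf")
    nx, ny = round(width / step), round(height / step)
    for ix in range(nx + 1):
        for iy in range(ny + 1):
            cand = (ix * step, iy * step)
            pred = arrival_times(cand, mics, speed)
            err = sum(((p - pred[0]) - m) ** 2
                      for p, m in zip(pred, measured))
            if err < best_err:
                best, best_err = cand, err
    return best

# Simulated tap; 0.123 s stands in for the unknown time of the tap.
true_tap = (0.25, 0.10)
heard = [0.123 + t for t in arrival_times(true_tap)]
estimate = locate(heard)
```

The delays involved are tiny (tens of microseconds across a pane this size at the assumed speed), so real hardware would need a high sample rate and microphones sampled in sync for this to work.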
As nikie suggested, drag & drop, or any kind of gestures would be impossible using the microphone method, as the technique needs a wave of sound to detect the input.
I don't know if this will get you far, but you can investigate the techniques used in MIDI drums for returning various nuances of play.
