Is it possible to create your own Pitch Class Profile (PCP) from given notes? I want to create possible candidates which I will then compare against WAV audio.
I know that it is possible to match a chromagram of a chord with audio and thus estimate which chord is played, but this chromagram is given in advance.
For example, I expect the notes F4, C5, and D5, and I want to create a PCP that I can compare with the chromagram of the played sound.
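In case it helps, here is a minimal sketch of building a 12-bin PCP template directly from expected note names. It assumes a plain binary, normalized template (real systems often also weight in the harmonics of each note), and the helper names are my own:

```python
# Sketch: build a 12-bin Pitch Class Profile (PCP) template from expected notes.
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pcp_template(notes):
    """notes like ["F4", "C5", "D5"]; the octave digit is ignored,
    only the pitch class matters for a chromagram comparison."""
    pcp = np.zeros(12)
    for note in notes:
        name = note.rstrip("0123456789")   # strip the octave, e.g. "F4" -> "F"
        pcp[PITCH_CLASSES.index(name)] = 1.0
    norm = np.linalg.norm(pcp)
    return pcp / norm if norm > 0 else pcp

template = pcp_template(["F4", "C5", "D5"])

# Compare the template against a 12-bin chromagram frame with cosine similarity:
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```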
Related
So to put it simply, I want to find the frequencies of an audio file.
There are only two frequencies in it (one in the left ear, one in the right ear), as it's a binaural recording. What I want to find are these two frequencies. I have Audacity and Audition.
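Besides plotting the spectrum in Audacity, a small numpy/scipy sketch can report the strongest frequency in each channel. The file name is made up and a stereo PCM WAV is assumed:

```python
# Sketch: find the dominant frequency in each channel of a stereo WAV
# (e.g. a binaural recording with one tone per ear).
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("binaural.wav")   # data shape: (num_samples, 2) for stereo

for ch, name in enumerate(["left", "right"]):
    samples = data[:, ch].astype(float)
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    peak = freqs[np.argmax(spectrum)]
    print(f"{name} channel dominant frequency: {peak:.1f} Hz")
```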
I have FFT output from a microphone and I want to detect a specific animal's howl in it (it howls in a characteristic frequency spectrum). Is there any way to implement a pattern-recognition algorithm on an Arduino to do that?
I already have the FFT part working with 128 samples at a 2 kHz sampling rate.
Look up audio fingerprinting. Essentially, you probe the frequency-domain output of the FFT call and take a snapshot of the range of frequencies together with the magnitude of each frequency, then compare this between the known animal signal and the unknown signal and output a measurement of the differences.
Naturally this difference will approach zero when the unknown signal is your actual known signal.
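As a rough sketch of that snapshot-and-compare idea (it assumes both clips have the same length and sample rate, and is illustrative rather than a tuned fingerprint):

```python
# Sketch: compare the FFT magnitude "snapshot" of a known clip against an
# unknown clip; the difference shrinks as the unknown approaches the known.
import numpy as np

def spectrum(samples):
    mag = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    return mag / (np.linalg.norm(mag) + 1e-12)   # normalize so loudness doesn't dominate

def spectral_difference(known, unknown):
    # Euclidean distance between normalized magnitude spectra; ~0 for a match.
    return float(np.linalg.norm(spectrum(known) - spectrum(unknown)))
```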
Here is another layer: for better fidelity, instead of performing a single FFT over all of the available audio, do many FFT calls, each on a subset of the samples, and for each call slide this window of samples further into the audio clip. Let's say your audio clip is 2 seconds long but you only ever send 200 milliseconds' worth of samples into each FFT call; this gives you at least 10 FFT result sets instead of just the one you would get from gulping the entire clip. It adds the notion of time specificity, an additional dimension with which to derive a richer difference between known and unknown signals. Experiment to see if it helps to slide the window just a tad instead of lining up each window end to end.
To be explicit: you have a range of frequencies spread across the X axis, and along the Y axis you have the magnitude of each frequency at different points in time, plucked from your audio clips as you vary the sample window as per the above paragraph. So now you have a two-dimensional grid of data points.
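A minimal sketch of that sliding-window grid, using the 200 ms window mentioned above and an arbitrary hop size:

```python
# Sketch: slide a window across the clip, FFT each window, and stack the
# magnitude spectra into a 2-D grid (time x frequency), then compare grids.
import numpy as np

def sliding_fft_grid(samples, rate, window_ms=200, hop_ms=100):
    win = int(rate * window_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win] * np.hanning(win)
        frames.append(np.abs(np.fft.rfft(chunk)))
    return np.array(frames)           # shape: (num_windows, num_freq_bins)

def grid_difference(grid_a, grid_b):
    # Assumes both grids have the same shape (same clip length and settings).
    a = grid_a / (np.linalg.norm(grid_a) + 1e-12)
    b = grid_b / (np.linalg.norm(grid_b) + 1e-12)
    return float(np.linalg.norm(a - b))
```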
Again, to beef up the confidence intervals, you will want to perform all of the above across several different audio clips of your known source animal's howl against each of your unknown signals, so now you have a three-dimensional parameter landscape. As you can see, each additional dimension you can muster gives you more traction and hence more accurate results.
Start with easily distinguished known audio against very different unknown audio, say a 50 Hz sine tone as the known signal against an 8000 Hz sine wave as the unknown. Then try a single strum of a guitar as your known and, say, a trumpet as your unknown; then progress to using actual audio clips.
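Generating those synthetic test signals takes only a few lines with numpy/scipy (the sample rate, duration, and file names below are arbitrary):

```python
# Sketch: generate a 50 Hz "known" tone and an 8000 Hz "unknown" tone
# to sanity-check the comparison pipeline before using real recordings.
import numpy as np
from scipy.io import wavfile

rate, seconds = 22050, 2.0
t = np.arange(int(rate * seconds)) / rate
known = 0.5 * np.sin(2 * np.pi * 50 * t)
unknown = 0.5 * np.sin(2 * np.pi * 8000 * t)

wavfile.write("known_50hz.wav", rate, (known * 32767).astype(np.int16))
wavfile.write("unknown_8khz.wav", rate, (unknown * 32767).astype(np.int16))
```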
Audacity is an excellent free audio workhorse of the industry; it easily plots a WAV file to show its time-domain signal or FFT spectrogram. Sonic Visualiser is also a top-shelf tool.
This is not a simple silver bullet; however, each layer you add to your solution can give you better results. It is a process you are crafting, not a single-dimensional trigger to squeeze.
I need to take short sound samples every 5 seconds, and then upload these to our cloud server.
I then need to find a way to compare / check if that sample is part of a full long audio file.
The samples will be recorded from a phones microphone, so they will indeed not be exact.
I know this topic can get quite technical and complex, but I am sure there must be some libraries or online services that can assist in this complex audio matching / pairing.
One idea was to use an audio-to-text conversion service and then do the matching based on the actual dialog. However, this does not feel efficient to me, whereas matching based on actual sound frequencies or patterns would be a lot more efficient.
I know there are services out there, such as Shazam, that do this type of audio matching. However, I would imagine their services are all proprietary.
Some factors that could influence it:
Both audio samples will be timestamped, so we do not have to search through the entire sound clip.
To get traction on an answer, you need to focus on an answerable question where you have done battle and can show your code.
Off the top of my head, I would walk across the audio and pluck out a bucket of several samples, then slide the bucket along and perform another bucket-pluck operation, allowing each bucket to overlap with the samples in the previous bucket as well as the next. Fewer samples means quicker computation; more samples means greater accuracy, up to a point; YMMV.
Feed each bucket into a Fourier transform to render the time-domain input audio into its frequency-domain counterpart, and record into a database the salient attributes of each bucket's FFT, such as the X frequencies with the most energy (greatest magnitude in your FFT).
Also perhaps store the standard deviation of those top X frequencies with respect to their energy (how dispersed those frequencies are), and define additional such attributes as needed. For such a frequency-domain approach to work you need relatively few samples in each bucket, since the FFT works on periodic time-series data; if you feed it 500 milliseconds of complex audio like speech or music, you no longer have periodic audio, you have mush.
Then, once all existing audio has been sent through the above processing, do the same to your live new audio and identify which prior audio contains the sequence of buckets most similar to your current input. Use a Bayesian approach so your guesses have probabilistic weights attached, which lend themselves to real-time updates.
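A rough sketch of those per-bucket fingerprints (the bucket size, overlap, and "top X" count are all tunables, and the attribute names are my own):

```python
# Sketch: cut audio into overlapping buckets, FFT each one, and keep the
# top-X strongest frequencies (plus their spread) as that bucket's fingerprint.
import numpy as np

def bucket_fingerprints(samples, rate, bucket=2048, overlap=1024, top_x=5):
    prints = []
    step = bucket - overlap
    for start in range(0, len(samples) - bucket + 1, step):
        chunk = samples[start:start + bucket] * np.hanning(bucket)
        mag = np.abs(np.fft.rfft(chunk))
        freqs = np.fft.rfftfreq(bucket, d=1.0 / rate)
        top = np.argsort(mag)[-top_x:]            # indices of the strongest bins
        prints.append({
            "top_freqs": np.sort(freqs[top]),     # the X loudest frequencies
            "spread": float(np.std(freqs[top])),  # how dispersed they are
        })
    return prints
```

Matching then becomes a question of finding the stored recording whose sequence of such fingerprints lines up best with the fingerprints of the live sample.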
Sounds like a very cool project, good luck. Here are some audio fingerprint resources:
does audio clip A appear in audio file B
Detecting audio inside audio [Audio Recognition]
Detecting a specific pattern from a FFT in Arduino
Audio Fingerprinting using the AudioContext API
https://news.ycombinator.com/item?id=21436414
https://iq.opengenus.org/audio-fingerprinting/
Chromaprint is the core component of the AcoustID project.
It's a client-side library that implements a custom algorithm for extracting fingerprints from any audio source
https://acoustid.org/chromaprint
Audio landmark fingerprinting as a Node Stream module - nodejs converts a PCM audio signal into a series of audio fingerprints.
https://github.com/adblockradio/stream-audio-fingerprint
SO followup
How to compare / match two non-identical sound clips
Audio fingerprinting and recognition in Python
https://github.com/worldveil/dejavu
Audio Fingerprinting with Python and Numpy
http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/
MusicBrainz: an open music encyclopedia (musicbrainz.org)
https://news.ycombinator.com/item?id=14478515
https://acoustid.org/chromaprint
How does Chromaprint work?
https://oxygene.sk/2011/01/how-does-chromaprint-work/
https://acoustid.org/
MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.
https://musicbrainz.org/
Audio Matching (Audio Fingerprinting)
Is it possible to compare two similar songs given their wav files?
audio hash
https://en.wikipedia.org/wiki/Hash_function#Finding_similar_records
audio fingerprint
https://encrypted.google.com/search?hl=en&pws=0&q=python+audio+fingerprinting
ACRCloud
https://www.acrcloud.com/
How to recognize a music sample using Python and Gracenote?
I am trying to convert a .wav music file into something playable with the beep command.
I need to export the frequencies to a text format to use as input parameters to beep.
P.S.: It is not about speech transcription.
The beep command in Linux only controls the PC speaker. It allows only one frequency at a time, so it does not apply here. A WAV file is a file of samples that normally carries music, and music is made of many simultaneous frequencies.
You cannot convert a WAV file to play it on the PC speaker; you need a sound card for that.
As you say, it's not voice recognition, but even so, a simple violin note sounds different from a guitar note because it carries more than a single frequency. There are what are called harmonics, components at different frequencies (normally multiples of the fundamental) that make the sound distinct; not only the frequencies matter but also their relative intensities. That is impossible to reproduce with a tool that can only play a single frequency, with a fixed wave shape (the wave is not sinusoidal but already includes several harmonics, which is what makes it sound like a PC speaker), and with no control over intensity.
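That said, if a crude single-frequency approximation is acceptable, one could reduce the file to the dominant frequency of each short frame and emit a beep command line. This is only a sketch (illustrative file name, arbitrary 100 ms frames), and it discards nearly all of the musical detail described above:

```python
# Sketch: reduce a WAV to one dominant frequency per short frame and emit a
# beep command line (a crude approximation; harmonics and dynamics are lost).
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("music.wav")        # illustrative file name
if data.ndim > 1:
    data = data.mean(axis=1)                  # mix down to mono
frame = int(rate * 0.1)                       # 100 ms frames

parts = []
for start in range(0, len(data) - frame + 1, frame):
    chunk = data[start:start + frame].astype(float) * np.hanning(frame)
    mag = np.abs(np.fft.rfft(chunk))
    freq = np.fft.rfftfreq(frame, d=1.0 / rate)[np.argmax(mag)]
    if freq < 20:                             # skip near-silent / DC-dominated frames
        continue
    parts.append(f"-f {freq:.0f} -l 100")

# beep accepts -f (frequency in Hz), -l (length in ms), -n (start a new beep)
print("beep " + " -n ".join(parts))
```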
I am trying to do some work with basic Beat Detection (in both C and/or Java) by following the guide from GameDev.net. I understand the logic behind the implementation of the algorithms, however I am confused as to how one would get the "sound amplitude" data for the left and right channels of a song (i.e. mp3 or wav).
For example, he starts with the following assumption:
In this model we will detect sound energy variations by computing the average sound energy of the signal and comparing it to the instant sound energy. Let's say we are working in stereo mode with two lists of values, (a_n) and (b_n): (a_n) contains the list of sound amplitude values captured every T_e seconds for the left channel, and (b_n) the list of sound amplitude values captured every T_e seconds for the right channel.
He then proceeds to manipulate a_n and b_n using his algorithms. I am wondering how one would do the signal processing necessary to get a_n and b_n every T_e seconds for both channels, so that I can begin to follow his guide and mess around with some simple beat detection in songs.
An uncompressed audio file (a .wav or .aiff, for example) is for the most part a long array of samples. Each sample is the amplitude at a given point in time. When music is recorded, many of these amplitude samples are taken each second.
For stereo (2-channel) audio files, the samples in the array usually alternate channels: [sample1 left, sample1 right, sample2 left, sample2 right, etc...].
Most audio parsing libraries will already have a way of returning the samples separately for each channel.
Once you have the sample array for each channel, it is easy to find the samples for a particular second, as long as you know the sample rate, or number of samples per second. For example, if the sample rate of your file is 44100 samples per second and you want to capture the samples in the n-th second, you would use the part of your array that is between (n * 44100) and ((n + 1) * 44100).
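For example, a short scipy sketch that reads a stereo WAV, splits the channels, and slices out the samples for the n-th second (the file name is illustrative; scipy returns the channels already de-interleaved):

```python
# Sketch: read a stereo WAV, split the channels, and grab the samples that
# make up the n-th second (assumes an uncompressed PCM file).
from scipy.io import wavfile

rate, data = wavfile.read("song.wav")    # illustrative file name
left, right = data[:, 0], data[:, 1]     # one column per channel

n = 3                                    # the 4th second (0-based n)
a_n = left[n * rate:(n + 1) * rate]      # left-channel amplitudes for that second
b_n = right[n * rate:(n + 1) * rate]     # right-channel amplitudes for that second
```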