Scipy - Audio Processing - audio

I looking for good tool for Audio signal processing.
e.g Speech & music analysis, identification.
Is scipy provides function for these?
Is this a good tool for Audio Signal processing?
Can you please suggest a better tool for this?
Thank you.

Scipy provides you with the basic scientific tools, but not anything particular. A lot of the stuff in scipy is inspired by or even copyied from the basic MatLab functionality. As fas as I know there is no speech processing package but instead everything you need to build such thing, i.e. dsp lib, fft, statistics, linear algebra, even a symbol toolbox.

Related

APCS final project: Converting an audio file to a simpler MIDI file

Lets say I have the audio file for Happy Birthday. I want to convert that audio file into an audio file that sounds like this : happy birthday.
First, I'd like to know if I have the ability to program this? Can a highschooler who's almost finished with APCS program this?
If I can:
How would I change the bpm of the song? I've searched through a bunch of websites, but they weren't very helpful.
I know that audio files can be represented in waveforms. How would I scan for each individual wave in an audio file (I need this to isolate the notes)?
This is a very ambitious project, actually. One reason is that it involves using digital signal processing tools like FFT (Fast fourier transforms) to analyze the sound to pick out the pitches. You might be able to find a library that can do this, but as far as coding such a tool, that would involve a steep learning curve.
If you would like to look further into this, there is a good online resource called "The Scientists and Engineers Guide to Digital Signal Processing". I was able to work through and understand the discrete fourier transform with only high school math (lots of trig) and a bit of calculus. It was a lift, though.
Trying to analyze rhythm is also no easy task. Even with advanced tools provided in professional notation system such as Finale, people have trouble playing rhythms in time well enough for the best transcription tools. Algorithms that "quantize" the beats help but also limit the amount of detail that can be included in the playback.
My guess is that as interesting and worthwhile as this project would be, to bring it to completion before the semester ends would require putting together prebuilt pieces. A lot of programming is done that way, these days.
If you scale the project back to something like just getting your code to analyze a short sample of a single note and give its pitch, that would be both impressive and doable with a lot of work. It could be done with a DFT algorithm instead of requiring FFT, reducing the amount of info you'd have to acquire first. That way, you'd only have to work your way up to understanding and implementing the material on this link which is about calculating the DFT. Notice that there is example code in BASIC. The code examples throughout this book are a big help.

Tutorial tensorflow audio pitch analysis

I'm a beginner with tensorflow and Python and I'm trying to build an app that automatically detects, in a football (soccer) match some key moments (yellow/red cards, goals, etc).
I'm starting to understand how to do a video analysis training the program on a dataset built by me, downloading images from the web and tagging them. In order to obtain some better results for the analysis, I was wondering if someone had some suggestions on tutorials to follow in order to understand how to train my app also on audio files, to make the program able to understand when there is a pitch variation in the audio of the video and combine both video and audio analysis in order to get better results.
Thank you in advance
Since you are new to Python and to tensorflow, I recommend you focus on just audio for now, especially since its a strong indicator of events of importance in a football match (red/yellow cards, nasty fouls, goals, strong chances, good plays, etc).
Very simply, without using much ML at all, you can use the average volume of a time period to infer significance. If you want to get a little more sophisticated, you can consider speech-to-text libraries to look for keywords in commentator speech.
Using video to try to determine when something important is happening is much, much more challenging.
This page can help you get started with audio signal processing in Python.
https://bastibe.de/2012-11-02-real-time-signal-processing-in-python.html

python3 audio signal processing

A project I am working on for one of my classes is to build a simple GUI sound editor for kids using python3 (using python3 is a strict project requirement). I don't want this editor to be as complex as something like audacity but I would like to have some fun built in effects similar to the sound editor on the nintendo ds http://nintendo.wikia.com/wiki/Nintendo_DSi_Sound.
I have been researching modules that are compatible with python3 that will help with the audio signal processing since I am very inexperienced in this area but I am running into trouble finding something that will work with python3. I found this great list of music modules for python: http://wiki.python.org/moin/PythonInMusic but everything that seems to have the functionality I think I want such as pyo and snack, does not have python3 compatibility.
I think at this point my best option is to use NumPy and SciPy for the signal processing but I was wondering if anyone had any better suggestions or advice? Or is using NumPy and SciPy an ideal choice if I can become familiar with them?
NumPy/SciPy can process audio signals, but it doesn't feel "native" as you have to write lots of interface code to play the resulting data as sound or write the in some standard format (like .wav).
I'd suggest porting those modules ; it's often very easy and straightforward and a good Python exercis.

Looking for an expressive audio programming language or library

I'm looking for an audio processing language or library which will allow me to experiment with different synthesis techniques. I've looked at Processing which I think is great at what it does, but haven't found any inspiring (and simple) audio libraries.
As a baseline, I want to simply create my own sample buffers and play them back (ideally in realtime). As a plus, the ability to handle MIDI events would be great. I'm an experienced C++ programmer so I could do it natively on but had hoped there was a more DSL (domain specific language) approach.
I have access to Windows, Mac or Linux so not too bothered yet about platform. Other languages I can deal with are C#, Java & Python.
Thanks
James
Depending on how much you want to stay out of the low-level housekeeping details, you may want to look at CSound , or if you want to not actually write code, the patching-based system PureData is great to work with. As #Lou points out, ChucK is interesting (but was too buggy to use the last time I checked it out).
If you really do want to write code, look at the Synthesis Toolkit, a set of C++ classes for audio processing and synthesis.
For an app framework, I recommend JUCE, which has incredibly nice cross-platform handling of audio/midi IO and GUI elements.
Max MSP is an audio production tool that is highly expressive.
I guess you could say it's a high-level tool, and not a low-level programming language. My impression of it is that it's geared towards the technical musician or the artistic engineer, but anyway it kicks ass and you could go low-level with it if you want.
I've always been a big fan of SuperCollider. It's designed for Mac OS X but also works on Linux.
The language is mostly based on SmallTalk, and it's pretty easy to pick up if you understand the basics of functional programming. The quality of the sound output by the SC Server is very good and there is plenty of documentation both built into the app environment and available online.
One interesting point of SuperCollider is the usage on android devices, and it's intercommunication with python trough out other modules.
Here goes an example
I know you didn't say Ruby, but check out Archaeopteryx
https://github.com/gilesbowkett/archaeopteryx/wiki
or ChucK
http://chuck.cs.princeton.edu/
Have a look at NAudio, an open source .NET audio SDK for working with audio files and devices in Windows. Some features include:
http://naudio.codeplex.com/
NAudio Features:
Play back audio using a variety of APIs
Decompress audio from different Wave Formats
Record audio using WaveIn, WASAPI or ASIO
Read and Write standard .WAV files
Mix and manipulate audio streams using a 32 bit floating mixing engine
Extensive support for reading and writing MIDI files
Full MIDI event model
Basic support for Windows Mixer APIs
A collection of useful Windows Forms Controls
Some basic audio effects, including a compressor

Sound Synthesis Framework in C/C++/Objective-C?

I've searched the net but didn't found anything interesting. Maybe I'm doing something wrong.
I'm looking for sound synthesis API written on C, C++ or even Objective-C, which can synthesize different types of waves, effects are optional.
Here's a complete library/toolkit for FM (Frequency Modulation) synthesis:
link1
link2
If you have time to spare... creating simple sound synthesis from scratch is actually a fun endeavor. If you create a small buffer of 256 16 bit samples which represent either a sine. a sawtooth, block or pulse, you can copy these to a live audiobuffer (e.g. a small buffer (say 16kb)) which constantly loops. By staying ahead of the playposition, and constantly filling up the buffer with new values, you can create the soundoutput.
You can use the small buffers to combine these in interesting ways (simplest is just to add them together (additive synthesis)).
The frequency of the tone can be manipulated by using a bigger or smaller sampling step through the small buffers. Amplitude can be manipulated by scaling the samples before putting them into the output buffer.
Great fun experimenting with this!
If you have this step nailed, you can add more sophisticated effects like filters (low pass, high pass, etc) and effects (reverbs, echoes, etc)
R
Have you looked at the synthesis toolkit (STK)? It's in C++ (I don't think ObjC is the right language for audio synthesis, in fact audio units, Apple's own way of doing audio stuff, including generators/filters/effects... is in C++).
STK will run on Mac OS X, and iOS no problem (CoreAudio is supported), but will also run on Linux and Windows (Direct sound and ASIO), using RtAudio. It's really nicely done and lightweight, these guys have spent a lot of time thinking about it and it will definitely give you a big head start. It can handle loads of different audio file formats + midi (and hopefully OSC soon...).
There is also Create and CLAM which is huge, these include GUI components and many other things which you might or might not want. If you're only interested in doing sound synthesis I really recommend STK.
PortAudio is also a great C API that we used last semester in an audio programming course. It provides an audio callback...what more could you need!?
I haven't tried incorporating it with anything in Objective-C yet, but will report back when I do.
Writing audio synthesis algorithms in C/obj-C is quite difficult in my opinion. I would recommend writing your signal processing algorithms using PureData and then use ZenGarden or libpd to embed and interpret the pd patches in your app.
Another C++ library is nsound:
http://nsound.sourceforge.net
One can generate any kind of modulated signal using the Generator class or using the provided Sine class. Each time-step can have it's own instantaneous frequency and phase offset.
You can also experiment with the Python module to prototype your algorithm quickly, then implement in C++. It can produce pretty matplotlib plots from Python and even from C++!
Have you looked at CSound? It's an incredibly flexible audio generation platform, and can handle everything from simple waveform generation to FM synthesis and all kinds of filters. It also provides MIDI support, and you can extend it by writing custom opcodes. There's a full C API and several C++ APIs as well.

Resources