Sound processing reference - audio

I am primarily a .NET programmer. I have a project in mind that involves low-level sound processing, and I want to write it in a low-level language such as assembly or C. I need some reference websites or book titles where I can learn about sound file formats and how to read sound from a file or from a microphone. Thanks in advance.

I work well with examples. Here is some reference material also. If you need anything else let me know.
Audacity Open Source Audio program source code link
I also recommend this - Closest thing between C++ and .NET Visual C++ 2010 Express
Introduction to Audio Processing PDF
Book on Digital Signal Processing (An Electrical Engineering View)
Book on Audio Signal Processing
Site on Introduction to Sound Processing
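To give a feel for what "reading a sound file" involves, here is a minimal sketch in Python (not C or assembly, purely for brevity) using the standard-library wave and struct modules. It assumes the simplest case, an uncompressed mono 16-bit PCM WAV file. The helper names write_test_tone, read_wav_mono16 and roundtrip_demo are made up for this illustration. The layout it relies on (a header followed by little-endian signed 16-bit samples) is the same one you would parse by hand in C.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, freq_hz=440.0, n_samples=800, rate=8000):
    """Write a mono 16-bit PCM sine tone so there is a file to read back."""
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_samples)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(frames)


def read_wav_mono16(path):
    """Return (sample_rate, samples) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    # Each sample is a little-endian signed 16-bit integer.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, list(samples)


def roundtrip_demo():
    """Write a tone to a temporary file, read it back, return (rate, count, peak)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tone.wav")
        write_test_tone(path)
        rate, samples = read_wav_mono16(path)
    return rate, len(samples), max(abs(s) for s in samples)
```

Calling roundtrip_demo() writes a short tone and reads it back, which is a quick way to check that the sample values survive the trip. Compressed formats and microphone capture need a platform API or a library and are outside this sketch.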

Related

Audio content analysis for online audiovisual data

I want to work on a project where I have to segment and classify online audiovisual data based on its audio content, i.e. different parts of the audio visual data will be segmented and classified as silence, music, speech, speech+background music, etc based on their audio content.
I am aware that I have to obtain the audio part from the audiovisual data and extract features like zero crossing, spectral peaks, etc. and find out segment boundaries in order to segment audio data.
But I'm lost in the beginning itself.
I do not know how to start the project. The output of the software is a set of segments of audiovisual data under different categories like silence, speech, music, etc.
It will be really helpful if someone lets me know
Which programming language is convenient for this purpose?
What steps should I follow in order to develop this software?
I have no background in digital signal processing. It'll be really helpful if I get some guidance
I'd suggest looking into a multimedia framework such as GStreamer. It is cross-platform, but it is easiest to get started with on Linux, where it originated. It already comes with all kinds of plugins to receive, demux and decode audio and video. It also has a couple of analyzers (such as level and spectrum analyzers for audio, as well as voice activity detection). Those could be a good starting point for your experiments. GStreamer itself is written in C, but applications can use the language bindings for Python, Perl, C#, C++, Java, ...
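As a first experiment alongside a framework like GStreamer, the two simplest features mentioned in the question, short-time energy and zero-crossing rate, can be computed per frame in a few lines. This is only a rough sketch in plain Python. It assumes the decoded audio is already available as a list of floats in the range -1..1, and the frame length, the thresholds and the labels "silence", "tonal" and "noisy" are assumptions chosen for illustration, not a tested speech/music classifier.

```python
import math


def frame_features(samples, frame_len):
    """Return a list of (rms_energy, zero_crossing_rate), one per non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between neighbouring samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


def label_frames(samples, frame_len=400, silence_rms=0.01, zcr_split=0.1):
    """Label each frame with a crude rule; both thresholds are illustrative guesses."""
    labels = []
    for rms, zcr in frame_features(samples, frame_len):
        if rms < silence_rms:
            labels.append("silence")
        elif zcr < zcr_split:
            labels.append("tonal")   # few sign changes: low-pitched, voiced-like content
        else:
            labels.append("noisy")   # many sign changes: hiss-like or high-frequency content
    return labels


def demo_signal(rate=8000, frame_len=400):
    """Build three frames: silence, a 200 Hz tone, then a 3000 Hz tone."""
    silence = [0.0] * frame_len
    low = [0.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(frame_len)]
    high = [0.5 * math.sin(2 * math.pi * 3000 * i / rate) for i in range(frame_len)]
    return silence + low + high
```

Segment boundaries then fall wherever the label changes from one frame to the next. A real system would add more features (spectral peaks, for example) and smooth the labels over time.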

How to convert human voice into digital format?

I am working on a project where a biometric system is used to secure the system. We are planning to use the human voice for this.
The idea is to have the person say some words or sentences, and the system will store that voice in digital format. The next time the person wants to enter the system, he/she has to speak some words, which may or may not be different from the words used earlier.
We don't want to match words but want to match voice frequency.
I have read some research papers regarding this system but those papers don't have any implementation details.
So I just want to know whether there is any software/API which can convert analog voice into digital format and will also tell us the frequency of the voice.
Until now I was working on normal web based applications so I know normal APIs and platforms like Java EE, C#, etc but I don't have any experience about this kind of application.
Please enlighten !!!
http://www.loquendo.com/en/products/speaker-verification/
http://www.nuance.com/for-business/by-solution/contact-center-customer-care/cccc-solutions-services/verifier/index.htm
(two links removed due to reported virus content)
http://www.persay.com/products.asp
This is as good a starting point as any : http://marsyas.info/
It's an open source software framework for audio processing. They've listed a bunch of projects that have used the framework in various ways, so you could probably draw inspiration from them.
http://marsyas.info/about/projects. The Teligence project in particular seems the closest to your needs, as it was used to classify audio by gender: http://marsyas.info/about/projects#5Teligence
I believe there are two steps in a project like this one:
The first step would be to record the voice from an analog input into digital format (let's assume wav-pcm). For this you can use the DirectShow API in C#, or standard Wav-In as in this project: http://www.codeproject.com/KB/audio-video/cswavrec.aspx. You may consider compressing your audio files later on. There are many options for this; on Windows you may consider the Windows Media Format SDK to avoid licensing issues with other formats.
The second step is to build or use a voice recognition framework. If you want to build one, you will probably need to define a set of "features" for your sound fragments and select and implement a recognition algorithm. There are many approaches available for this; the IEEE and ACM.org websites are usually good sources. If you want to use an existing framework, you may want to consider Nuance Recognizer (commercial) or http://cmusphinx.sourceforge.net (open source).
Hope this helps.
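On the "frequency of voice" part of the question: once the recording exists as digital samples, one common way to estimate the fundamental frequency (pitch) of a short voiced frame is to look for the strongest peak in its autocorrelation. The sketch below is plain Python under stated assumptions: samples is a list of floats for one frame, the function name estimate_pitch_hz and the 50-400 Hz search range are choices made for this example, and pitch is only one possible "feature". It would not identify a speaker on its own.

```python
import math


def estimate_pitch_hz(samples, rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency by picking the autocorrelation peak.

    Only lags that correspond to frequencies between fmin and fmax are searched.
    Returns None if no lag has a positive correlation.
    """
    min_lag = int(rate / fmax)
    max_lag = min(int(rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag if best_lag else None


def sine(freq_hz, n_samples, rate):
    """Generate a test tone standing in for a voiced speech frame."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]
```

For example, estimate_pitch_hz(sine(200.0, 800, 8000), 8000) should come out close to 200 Hz. Real speech is noisier than a sine, so in practice the estimate is usually computed per frame and smoothed.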

Crossplatform Offline Audio Processing Library

I am looking for an audio dsp library for cleaning up some speech (voice) recording. I have not decided which language to use yet.
Here are the features I am looking for:
Work in Linux and Windows
Importing MP3
Working with multiple channels mixing
Noise Filter
Bandpass filter
Compressor
I would love to have these as well, but I can write my own if they are not available:
De-esser
Multi-band compressor
Expander
Envelopes
(if you can suggest an application that does these via scripting / one mouse click, I will accept your answer too)
What about something like SoX? http://sox.sourceforge.net/
Take a look at Juce from Raw Material Software.
It is free for non-commercial use, and very reasonably priced for commercial use. It also has a lot of built-in audio capabilities (mixing, file I/O, etc.) and a nice cross-platform GUI toolkit as well.
Audacity does most of those things.
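If you end up writing some of the pieces yourself, the bandpass filter from the list is small enough to sketch. The example below is plain Python using the widely used biquad band-pass formulas (the "0 dB peak gain" form from the Audio EQ Cookbook). The function names, the centre frequency of 1000 Hz and Q = 2 are assumptions for illustration, and this is not how SoX, JUCE or Audacity implement their filters.

```python
import math


def bandpass_coeffs(center_hz, q, rate):
    """Biquad band-pass coefficients (0 dB peak gain form), normalised by a0."""
    w0 = 2 * math.pi * center_hz / rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                # b0, b1, b2
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)    # a1, a2
    return b, a


def biquad(samples, b, a):
    """Apply y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def measured_gain(freq_hz, center_hz=1000.0, q=2.0, rate=16000, n_samples=4000):
    """Filter a test sine and return output RMS / input RMS over the second half."""
    tone = [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]
    b, a = bandpass_coeffs(center_hz, q, rate)
    filtered = biquad(tone, b, a)
    half = n_samples // 2   # skip the start-up transient
    return rms(filtered[half:]) / rms(tone[half:])
```

A tone at the centre frequency should pass almost unchanged, while a 100 Hz tone should come out much quieter. For speech clean-up the band would typically be much wider than in this test.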

Available options for playing a stream or a remote mp3 file on iOS 4

I am trying to make an application for listening to podcasts. Each podcast is an mp3 file, around 50MB in size. After reviewing the Using Audio chapter of the Multimedia Programming Guide, I decided to use AVPlayer, as the other options did not seem appropriate. However, the more I work with AVFoundation, the more complicated it seems and I have a feeling that simply streaming an mp3 file should be easier. Plus on the top of this document, there is a note stating:
Important: This document contains information that used to be in iOS Application Programming Guide. The information in this document has not been updated specifically for iOS 4.0.
Does that mean that I have some other options, or that AVFoundation is maybe overkill for what I need to do? I would really appreciate it if someone could clear things up a bit and let me know if I'm doing something wrong here.
Thanks in advance!
You should explore Cocos Denshion.
http://www.cocos2d-iphone.org/wiki/doku.php/cocosdenshion:cookbook
The audio engine comes with cocos2d, and it is just 5 classes you can include with your project.
It's very simple to use, as you can see from the above link. It's basically just a wrapper for some AVFoundation classes.
The only trick will be to stream your mp3, but it looks like you can simply update the Cocos Denshion CDAudioManager to hand a URL to the AVAudioPlayer, as a start. Whether or not that satisfies your streaming requirement, I don't know.
At the very least, it will give you some AVFoundation code to study.
I just found a pdf with a nice overview of some possible options from this course blog. Together with Julian's suggestion this is all I could find so far.

Looking for an expressive audio programming language or library

I'm looking for an audio processing language or library which will allow me to experiment with different synthesis techniques. I've looked at Processing which I think is great at what it does, but haven't found any inspiring (and simple) audio libraries.
As a baseline, I want to simply create my own sample buffers and play them back (ideally in realtime). As a plus, the ability to handle MIDI events would be great. I'm an experienced C++ programmer so I could do it natively, but I had hoped there was a more DSL (domain-specific language) approach.
I have access to Windows, Mac or Linux so not too bothered yet about platform. Other languages I can deal with are C#, Java & Python.
Thanks
James
Depending on how much you want to stay out of the low-level housekeeping details, you may want to look at CSound, or if you don't want to actually write code, the patching-based system PureData is great to work with. As @Lou points out, ChucK is interesting (but was too buggy to use the last time I checked it out).
If you really do want to write code, look at the Synthesis Toolkit, a set of C++ classes for audio processing and synthesis.
For an app framework, I recommend JUCE, which has incredibly nice cross-platform handling of audio/midi IO and GUI elements.
Max MSP is an audio production tool that is highly expressive.
I guess you could say it's a high-level tool, and not a low-level programming language. My impression of it is that it's geared towards the technical musician or the artistic engineer, but anyway it kicks ass and you could go low-level with it if you want.
I've always been a big fan of SuperCollider. It's designed for Mac OS X but also works on Linux.
The language is mostly based on Smalltalk, and it's pretty easy to pick up if you understand the basics of functional programming. The quality of the sound output by the SC Server is very good and there is plenty of documentation both built into the app environment and available online.
One interesting point about SuperCollider is that it can be used on Android devices and can communicate with Python through other modules.
Here goes an example
I know you didn't say Ruby, but check out Archaeopteryx
https://github.com/gilesbowkett/archaeopteryx/wiki
or ChucK
http://chuck.cs.princeton.edu/
Have a look at NAudio, an open source .NET audio SDK for working with audio files and devices in Windows:
http://naudio.codeplex.com/
NAudio Features:
Play back audio using a variety of APIs
Decompress audio from different Wave Formats
Record audio using WaveIn, WASAPI or ASIO
Read and Write standard .WAV files
Mix and manipulate audio streams using a 32-bit floating-point mixing engine
Extensive support for reading and writing MIDI files
Full MIDI event model
Basic support for Windows Mixer APIs
A collection of useful Windows Forms Controls
Some basic audio effects, including a compressor
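Independent of which of the tools above you pick, the baseline from the question ("create my own sample buffers") looks roughly like this in plain Python. This is a sketch under assumptions: synth_note and to_wav_bytes are names invented here, the envelope is a simple linear attack/release, and the buffer is rendered to in-memory WAV bytes rather than played in realtime, since realtime playback needs one of the audio I/O libraries mentioned above.

```python
import io
import math
import struct
import wave


def synth_note(freq_hz, seconds, rate=44100, attack=0.01, release=0.05):
    """Return a list of floats in -1..1: a sine with a linear attack/release envelope."""
    n = int(seconds * rate)
    attack_len = max(1, int(attack * rate))
    release_len = max(1, int(release * rate))
    buf = []
    for i in range(n):
        # Ramp up at the start, ramp down at the end, hold at 1.0 in between.
        env = min(1.0, i / attack_len, (n - 1 - i) / release_len)
        buf.append(env * math.sin(2 * math.pi * freq_hz * i / rate))
    return buf


def to_wav_bytes(buf, rate=44100):
    """Convert a float buffer to a mono 16-bit PCM WAV file held in memory."""
    pcm = struct.pack(
        "<%dh" % len(buf),
        *(int(max(-1.0, min(1.0, x)) * 32767) for x in buf)
    )
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm)
    return out.getvalue()
```

For example, to_wav_bytes(synth_note(440.0, 0.5)) gives the bytes of a half-second A4 note that can be written to a .wav file and opened in any player. Swapping the sine for another waveform or stacking several calls is an easy way to start experimenting with synthesis techniques.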

Resources