Basics of Digital Audio - Linux

I have recently started going through sound card drivers in Linux (ALSA).
Can anyone suggest a link or reference where I can get a good grounding in audio basics like sampling rate, bit size, etc.?
I want to know exactly how samples are stored in audio files on a computer, and the reverse: how samples (numbers) are played back.

The Audacity tutorial is a good place to start, and there is another introduction that covers similar ground. The Pure Data tutorial at FLOSS Manuals is also a good starting point. Wikipedia is a good source once you have the basics down.
Audio is input into a computer via an analog-to-digital converter (ADC). Digital audio is output via a digital-to-analog converter (DAC).
Sample rate is the number of times per second at which the analog signal is measured and stored digitally. You can think of the sample rate as the time resolution of an audio signal. Bit size is the number of bits used to store each sample. You can think of it as analogous to the color depth of an image pixel.
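To make both ideas concrete, here is a minimal Python sketch (the 440 Hz tone, one-second duration and 8-bit depth are arbitrary choices for illustration) that samples a sine wave at a given rate and quantizes each sample to a given bit size:

    import numpy as np

    sample_rate = 44100   # samples per second (time resolution)
    bit_depth = 8         # bits per sample (amplitude resolution)
    freq = 440.0          # an arbitrary 440 Hz test tone

    # Sample the "analog" sine wave at discrete points in time.
    t = np.arange(sample_rate) / sample_rate     # one second of time stamps
    analog = np.sin(2 * np.pi * freq * t)        # values in [-1.0, 1.0]

    # Quantize each sample to one of 2**bit_depth levels, as an 8-bit
    # unsigned WAV file would store it.
    levels = 2 ** bit_depth
    stored = np.round((analog + 1.0) / 2.0 * (levels - 1)).astype(np.uint8)

    print(stored[:10])    # the first ten sample values as stored on disk

Playback is the reverse: the stored integers are fed to the DAC at the same sample rate and mapped back to voltages.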
David Cottle's SuperCollider book also has a great introduction to digital audio.

I was in the same situation, and certainly this kind of information is out there, but you need to do some research first. This is what I have found:
Digital audio processing is a branch of DSP (digital signal processing).
"DSP is one of the most powerful technologies that will shape science and engineering in the twenty-first century. Revolutionary changes have already been made in a broad range of fields: communications, medical imaging, radar & sonar, high fidelity music reproduction, and oil prospecting, to name just a few. Each of these areas has developed a deep DSP technology, with its own algorithms, mathematics, and specialized techniques…"
This quote is taken from a very helpful guide that covers every topic in depth, called “The Scientist and Engineer's Guide to Digital Signal Processing”. Though you are not asking about DSP specifically, it has a chapter that covers all the digital-audio topics with a very good explanation.
You can find it in Chapter 22 - Audio Processing, which covers all of these topics:
Human Hearing: how sound is perceived by our ears; this is the basis for how sound is later generated artificially.
Timbre: explains the properties of sound, like loudness, pitch and timbre.
Sound Quality vs. Data Rate: once you know the previous concepts, this translates them to the electronic side.
High Fidelity Audio: gives you a picture of how sound is processed digitally.
Companding: how sound is processed and compressed for telecommunications.
Speech Synthesis and Recognition: more processes applied to sound, like filters, synthesis, etc.
Nonlinear Audio Processing: more advanced but understandable; covers sound treatment and other topics.
The book explains the basics of sound in the real world, in case you want to take a look, and then explains how sound is processed in the computer, including what you are asking about.
There are also more specific topics on Wikipedia; for instance, the “Digital audio” page explains this topic in detail and works as a reference for further research. Right at the beginning you'll find links on sample rate, sound waves, digital formats, standards, bit depth, telecommunications, etc. A few things may need more study, like the Nyquist-Shannon sampling theorem, Fourier transforms, and complex numbers, but these only appear in very specific and advanced topics that you may never need; I mention them just in case you are interested. You can find information in both the DSP guide and Wikipedia, although you will need to study some math.
I've been using Python to develop and study these subjects with code, since it has a lot of useful libraries: NumPy, sounddevice, SciPy, etc. Then you can start playing with sound. On YouTube you can find lots of videos that walk you through this. I've found synthesis, filters, voice recognition; you can even create WAV files with just code, which is great. I've also seen projects in C/C++, JavaScript, and other languages, so it can help to keep learning and coding fun things.
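As an illustration of the "WAV files with just code" part, here is a minimal sketch using NumPy and SciPy; the tone frequency and file name are arbitrary choices:

    import numpy as np
    from scipy.io import wavfile

    sample_rate = 44100
    t = np.arange(sample_rate * 2) / sample_rate   # two seconds of time stamps
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)     # a 440 Hz sine wave

    # Scale to 16-bit signed integers, the most common WAV sample format,
    # and write a file any media player can open.
    wavfile.write('tone.wav', sample_rate, (tone * 32767).astype(np.int16))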
There are other references across the internet, but you need to know what you are looking for. This book and the Wikipedia page would be my starting points, since they give you the basics and explain every topic in depth. Then, depending on the goal you want to achieve, you can start looking for more specific information.

Related

APCS final project: Converting an audio file to a simpler MIDI file

Let's say I have the audio file for Happy Birthday. I want to convert that audio file into an audio file that sounds like this: happy birthday.
First, I'd like to know whether I have the ability to program this. Can a high schooler who's almost finished with APCS program it?
If I can:
How would I change the BPM of the song? I've searched through a bunch of websites, but they weren't very helpful.
I know that audio files can be represented in waveforms. How would I scan for each individual wave in an audio file (I need this to isolate the notes)?
This is a very ambitious project, actually. One reason is that it involves using digital signal processing tools like the FFT (fast Fourier transform) to analyze the sound and pick out the pitches. You might be able to find a library that can do this, but coding such a tool yourself would involve a steep learning curve.
If you would like to look further into this, there is a good online resource called "The Scientist and Engineer's Guide to Digital Signal Processing". I was able to work through and understand the discrete Fourier transform with only high school math (lots of trig) and a bit of calculus. It was a lift, though.
Trying to analyze rhythm is also no easy task. Even with the advanced tools provided in professional notation systems such as Finale, people have trouble playing rhythms in time well enough for the best transcription tools. Algorithms that "quantize" the beats help, but they also limit the amount of detail that can be included in the playback.
My guess is that as interesting and worthwhile as this project would be, to bring it to completion before the semester ends would require putting together prebuilt pieces. A lot of programming is done that way, these days.
If you scale the project back to something like just getting your code to analyze a short sample of a single note and give its pitch, that would be both impressive and doable with a lot of work. It could be done with a DFT algorithm instead of requiring the FFT, reducing the amount of background you'd have to acquire first. That way, you'd only have to work your way up to understanding and implementing the material at this link, which is about calculating the DFT. Notice that there is example code in BASIC. The code examples throughout this book are a big help.
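To give a feel for that scaled-back version, here is a rough sketch of the correlation-form DFT in Python rather than BASIC (the synthetic 440 Hz "note" stands in for a real recording):

    import numpy as np

    def dft_magnitudes(x):
        """Correlation-form DFT: correlate the signal with cosine/sine waves."""
        N = len(x)
        n = np.arange(N)
        mags = np.zeros(N // 2)
        for k in range(N // 2):
            re = np.sum(x * np.cos(2 * np.pi * k * n / N))
            im = np.sum(x * np.sin(2 * np.pi * k * n / N))
            mags[k] = np.hypot(re, im)
        return mags

    # Stand-in "recording" of a single note: A4 (440 Hz) plus a little noise.
    rate = 8000
    t = np.arange(2048) / rate
    note = np.sin(2 * np.pi * 440.0 * t) + 0.05 * np.random.randn(len(t))

    mags = dft_magnitudes(note)
    peak_bin = np.argmax(mags)
    print('Estimated pitch: %.1f Hz' % (peak_bin * rate / len(note)))

This is the slow O(N^2) form, but it is only a few lines of trigonometry, which is what makes it approachable before tackling the FFT.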

Can FFT be used to find drum solos/breaks in audio files?

Is it possible with FFT to find a drum solo, or a drum break, in an audio file? Is this something FFT is able to do and are there any resources online that could aid me with learning?
In general, an FFT is not a good choice for detecting the onset of percussion sounds:
An FFT is always calculated over a window of samples (in effect, a period of time) and yields the magnitude of the signal within each bin and its phase offset. You can therefore determine that there is signal at a particular frequency, but not its onset time; the best time resolution available is the window period. Of course, you can make the window shorter at the expense of frequency resolution.
Percussion sounds tend to look like noise and spread across the spectrum. This would be OK if you only had percussion sounds, but it is not great with real-life polyphonic content.
However, you might be able to infer something from the different characteristics of the spectra of a drum solo versus the instrumental sections of a track.
The problem of finding the time at which percussion sounds start in music is described in academic journals as onset detection, and it is one of the many techniques used for feature extraction; the wider field is known as Music Information Retrieval (MIR). Your problem sounds like one of identifying sections in audio files, which might be described as partitioning.
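If you want to experiment, one common baseline for onset detection is spectral flux: compare the magnitude spectrum of successive short FFT frames and sum the increases. A rough Python sketch, with frame and hop sizes as arbitrary choices:

    import numpy as np

    def spectral_flux(samples, frame_size=1024, hop=512):
        """Per-frame sum of positive magnitude increases between FFT frames."""
        window = np.hanning(frame_size)
        prev = np.zeros(frame_size // 2 + 1)
        flux = []
        for start in range(0, len(samples) - frame_size, hop):
            mag = np.abs(np.fft.rfft(samples[start:start + frame_size] * window))
            flux.append(np.sum(np.maximum(mag - prev, 0.0)))  # only increases
            prev = mag
        return np.array(flux)

    # Frames where the flux spikes well above its local average are onset
    # candidates; a drum break should show a dense cluster of such spikes.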
A good place to start is Sonic Visualiser which is a tool written specifically for MIR applications. Plug-ins exist for various types of feature extraction. From these you will be able to easily find the large body of academic work in this area. There is an added bonus that the existing plug-ins are all open source too.
I'd look here: there was a bit of discussion with great pointers on the Gamedev SE: https://gamedev.stackexchange.com/questions/9761/beat-detection-and-fft :-)

Audio content analysis for online audiovisual data

I want to work on a project where I have to segment and classify online audiovisual data based on its audio content, i.e. different parts of the audiovisual data will be segmented and classified as silence, music, speech, speech + background music, etc., based on their audio content.
I am aware that I have to obtain the audio part of the audiovisual data, extract features like zero-crossing rate, spectral peaks, etc., and find segment boundaries in order to segment the audio data.
But I'm lost at the very beginning.
I do not know how to start on the project. The output of the software should be segments of audiovisual data under different categories like silence, speech, music, etc.
It would be really helpful if someone could let me know:
Which programming language is convenient for this purpose?
What steps should I follow in order to develop this software?
I have no background in digital signal processing, so any guidance would be really helpful.
I'd suggest looking into a multimedia framework such as GStreamer. It is cross-platform, but easiest to get started with on Linux, where it originates from. It already comes with all kinds of plugins to receive, demux and decode audio and video. It also has a couple of analyzers (such as level and spectrum analyzers for audio, as well as voice activity detection). Those could be a good starting point for your experiments. GStreamer itself is written in C, but applications can use the language bindings for Python, Perl, C#, C++, Java, ...
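Whatever framework handles the demuxing and decoding, the classification side usually starts from simple per-frame features. Here is a rough Python sketch of two features mentioned in the question; the frame length is an arbitrary choice, and a real system would feed features like these into a trained classifier rather than fixed thresholds:

    import numpy as np

    def frame_features(samples, rate, frame_ms=25):
        """Short-time energy and zero-crossing rate for each frame."""
        n = int(rate * frame_ms / 1000)
        feats = []
        for start in range(0, len(samples) - n, n):
            frame = samples[start:start + n].astype(np.float64)
            energy = np.mean(frame ** 2)                        # loudness proxy
            zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # sign flips
            feats.append((energy, zcr))
        return np.array(feats)

    # Very low energy suggests silence; a high, fluctuating zero-crossing
    # rate is typical of speech, while music tends to have steadier energy.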

How to amplify certain audio samples, particularly amplifying a certain frequency?

Can anyone provide sample pseudocode or share an existing link that has sample code?
For example, say I have mixed audio containing 1 kHz, 2 kHz and 8 kHz components, and I want to boost only the 1 kHz band in real time.
Reading some DSP books and resources confuses me.
You just need to design and implement a suitable digital filter. This is a large and complex subject area, though, so you won't get a simple answer here. Probably the best first step would be to read a good introductory book on DSP, e.g. Understanding DSP by Rick Lyons, which is very good for beginners as it's not too heavy on the math and has a more practical bent than most introductory DSP books.
For this particular application though what you are trying to do is similar to implementing a graphic equalizer, and there are many pointers to how to implement this kind of thing if you use e.g. "graphic equalizer" as a search term.
There's a lot of math behind digital filtering. Sorry, but I think it is important to at least understand basic filters (like those used in electronics). If you don't want to go through the basics, it's best to get an audio graphic equalizer where you can play with the (virtual) sliders. If you want to implement a very specific filter, read on.
Real time: it depends on your computing platform. If this is a small micro (like an AVR or a Microchip PIC), you'll need an efficient algorithm, most likely an IIR band-pass filter. The equivalent of a graphic equalizer consists of multiple band-pass filters, all summed together. See http://en.wikipedia.org/wiki/Infinite_impulse_response
A more computationally intensive approach uses FIR filters; in that case you can also control the phase of the filtered signal. http://en.wikipedia.org/wiki/Finite_impulse_response
Once you settle on an algorithm (e.g. IIR), you'll need to calculate the coefficients. The algorithm is simple; calculating the coefficients is not.
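To illustrate, here is a sketch of one widely used recipe, the peaking-EQ biquad from Robert Bristow-Johnson's Audio EQ Cookbook, applied with SciPy; the 6 dB gain and Q of 1 are arbitrary example values:

    import numpy as np
    from scipy.signal import lfilter

    def peaking_eq(fs, f0, gain_db, Q):
        """Biquad coefficients for a peaking EQ (RBJ Audio EQ Cookbook)."""
        A = 10.0 ** (gain_db / 40.0)          # amplitude from dB gain
        w0 = 2 * np.pi * f0 / fs              # center frequency in radians
        alpha = np.sin(w0) / (2 * Q)          # bandwidth parameter
        b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
        a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
        return b / a[0], a / a[0]             # normalize so a[0] == 1

    # Boost 1 kHz by 6 dB (Q = 1) in a 44.1 kHz test signal:
    fs = 44100
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 3000 * t)
    b, a = peaking_eq(fs, 1000.0, 6.0, 1.0)
    y = lfilter(b, a, x)                      # the 1 kHz component is now louder

One biquad like this per band, summed or cascaded, is essentially how the graphic equalizer mentioned above is built.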
I found a book matching your question: Audio digital signal processing in real time
I browsed through it; it seems to have the right answers.

I want to learn audio programming [closed]

At my high school we can take a class where we basically learn about a subject on our own for a semester. I was thinking that I want to learn about "sound programming," but I realized that I have no idea what that entails. I'm interested in learning about, for example, how a synthesizer works and how sound works in computer science. I really want to focus on the low-level code part, not so much the composition part. Is this a feasible subject? Are there any good tutorials out there for somebody completely new to this?
I know C++ and am using Windows. The first answer in this is something that interests me (although it's over my head).
"Sound programming" is a very broad field. First of all, it is definitely a feasible subject, but since you need to cram stuff into a single semester you will need to limit your scope. I can see that you're looking for a place to start, so here are some ideas to get you thinking.
Since you have mentioned both "how sound works in computer science" and "synthesizers", it's worth pointing out the difference between analogue sound, sampled sound and synthesized sound, as they are different concepts. I'll explain them briefly here.
Analogue sound is sound as we humans typically interpret it -- vibrations of air sensed by the human ear. You can think of sound as a one-dimensional signal, where the independent variable is time and the dependent variable is amplitude of vibration. Analogue sound is continuous both in the time and amplitude domain. Older sound recording methods (e.g. magnetic tape) used an analogue sound representation. Analogue sound is not frequently used with computers (computers aren't good with storing continuous-domain data), but understanding analogue signals is important nevertheless. Expect to see plenty of math (e.g. complex numbers, Fourier transforms) if you go down this path.
Sampled sound is the representation that lends itself well to processing with a computer. People are most familiar with sampled sound through CDs and other musical recordings. An analogue signal is sampled at some frequency (e.g. 44.1 kHz for CD recordings), so a sampled sound signal is discrete in the time domain. If the signal is also quantized, it is discrete in the amplitude domain as well. Formats like MP3 are sampled formats. There are lots of things to study in this field if you're interested, such as restoration (removing static, etc.) and compression (again, codecs like MP3 and Ogg Vorbis). It's a lot of fun because there's plenty to experiment with and code.
Both analogue and sampled sound dig deeply into a field called digital signal processing. Google around for that to get a feel for what it's like. It's often taught as a course at universities, so if you're really keen you can have a look at some lecture slides or even try some of the earlier, simpler projects.
Synthesized sound is a representation suited to reproducing a music track where the instruments playing it are known beforehand. Think of it as sheet music for the computer. Somebody has to write the sheet music -- you can't just record it like analogue or sampled sound. This makes synthesized sound a completely different representation from analogue and sampled sound. Also, the computer needs to know what the instruments are (e.g. piano) so that it can play (synthesize) the track. If it doesn't know the instrument, it either gives up or picks a close match (e.g. replaces the piano with an electric keyboard). I have never worked with synthesizers, so I can't comment on the learning curve for them.
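To make the synthesis idea concrete, here is a minimal Python sketch of what a synthesizer does at the lowest level: turn "sheet music" (pitch/duration pairs) into samples with an oscillator and an envelope. The sine waveform and exponential envelope are arbitrary choices:

    import numpy as np

    def synth_note(freq, duration, rate=44100):
        """Render one note: an oscillator shaped by a decaying envelope."""
        t = np.arange(int(rate * duration)) / rate
        oscillator = np.sin(2 * np.pi * freq * t)   # the raw tone
        envelope = np.exp(-3.0 * t)                 # fade out, like a pluck
        return oscillator * envelope

    # Sheet-music-style input: (frequency in Hz, duration in seconds).
    melody = [(261.63, 0.4), (329.63, 0.4), (392.00, 0.8)]   # C4, E4, G4
    samples = np.concatenate([synth_note(f, d) for f, d in melody])

Swapping in a different oscillator or envelope changes the "instrument", which is the essence of how simple synthesizers are built up.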
So, based on what I wrote -- pick a direction that interests you more, Google around and then refine your question.
EDIT
A good book to read is this. You can probably look around related titles in Amazon and find something newer, but it's been a while since I did my audio processing shopping.
And if you have half an hour to spare, then have a look at this video tutorial. It covers sound, image and video processing -- they're actually closely related fields.
Consider working through the book "Who Is Fourier?: A Mathematical Adventure". You could adapt the examples to make small programming assignments that demonstrate the basic concepts. After you're done, you should be able to use the FFT to make a spectrogram of your voice as you pronounce the vowels a, e, i, o, u, identifying the fundamental frequency and the formants of each vowel.
I recommend learning Python and the modules NumPy, SciPy, and matplotlib (there's a ton there, so beyond the basic tutorials, just learn as you go). The IPython shell has the option "-pylab -p scipy" to automatically import the most common tools into your namespace. You can record and play audio using PyAudio. There's also Pygame, which expands on SDL (Simple DirectMedia Layer), and pyglet, which uses OpenAL (the OpenGL of audio; it does 3D audio and effects).
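With those modules installed, the vowel-spectrogram exercise suggested above takes only a few lines; a minimal sketch, assuming a mono recording saved as 'voice.wav' (the file name is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile

    rate, samples = wavfile.read('voice.wav')   # placeholder file name

    # Short-time Fourier analysis: frequency content over time.
    f, t, Sxx = signal.spectrogram(samples, fs=rate, nperseg=1024)
    plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))   # plot in dB
    plt.xlabel('Time [s]')
    plt.ylabel('Frequency [Hz]')
    plt.show()

The fundamental shows up as the lowest bright horizontal band, and the formants of each vowel as broader bands above it.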
As to C/C++, there's IT++, SPUC, and FFTW for signal processing, and SDL/SDL_mixer and OpenAL/ALmixer for interfacing with hardware and audio files.
I would recommend this book: http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=8218 (part of it is available here: http://books.google.com/books?id=nZ-TetwzVcIC&printsec=frontcover&dq=computer+musical+tutorial&hl=pt-BR&ei=D-dKTaKsBMOB8gbF4KDcDg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDgQ6AEwAA#v=onepage&q=computer%20musical%20tutorial&f=false)
Another thing you could look at is Pure Data; it's an open-source graphical environment for sound programming, and it's great for beginners. (http://puredata.info/)
