Is there a way to detect the positions (start and end) where other loud sounds than speech exist in an audio file? For example, the sound of tapping something, popping sound effect, mouse click sound effect, short computer-generated music, etc.
In summary, the conditions are:
The sound is not human voice.
The sound is louder than the average volume of the human speech in that audio file.
There are a number of readily available open source Voice Activity Detectors (VADs). If the following conditions are met:
The given audio frame is NOT classified as speech, and
the audio frame energy is above an adaptive threshold calculated on the speech frames,
then classify the frame as "loud non-speech" (see the sketch right below).
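Concretely, here is a minimal sketch of that rule. It assumes you already have, per frame, a VAD probability (e.g. from RNNoise below) and the raw samples; the smoothing factor and the 0.5 speech cutoff are illustrative starting points, not tuned values:

#include <stddef.h>

static float speech_energy_avg = 0.0f;  /* running average over speech frames */

static float frame_energy(const float *x, size_t n)
{
    float e = 0.0f;
    for (size_t i = 0; i < n; i++) e += x[i] * x[i];
    return e / (float)n;                /* mean squared amplitude */
}

/* Returns 1 for "loud non-speech", 0 otherwise. */
int classify_frame(const float *x, size_t n, float vad_probability)
{
    float e = frame_energy(x, n);
    if (vad_probability > 0.5f) {       /* speech: update the adaptive threshold */
        speech_energy_avg = 0.95f * speech_energy_avg + 0.05f * e;
        return 0;
    }
    return e > speech_energy_avg;       /* louder than average speech energy */
}

Note that the average starts at zero, so you may want to skip classification until a few speech frames have been seen.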
RNNoise, a noise suppression library, has a very good VAD and the algorithm easily works in real time.
Here is a rough example of how to use the library to get the VAD:
#include <stdio.h>
#include "rnnoise.h"

#define FRAME_SIZE 480  /* RNNoise processes 480-sample (10 ms) frames at 48 kHz */

int main(int argc, char **argv) {
    int i;
    long framepos = 0;
    float vad;
    float x[FRAME_SIZE];
    short tmp[FRAME_SIZE];
    FILE *f1;
    DenoiseState *st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <noisy speech>\n", argv[0]);
        return 1;
    }
    f1 = fopen(argv[1], "rb");
    if (!f1) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }
    st = rnnoise_create(NULL);
    /* read raw 16-bit mono 48 kHz PCM, one full frame at a time */
    while (fread(tmp, sizeof(short), FRAME_SIZE, f1) == FRAME_SIZE) {
        for (i = 0; i < FRAME_SIZE; i++) x[i] = tmp[i];
        /* rnnoise_process_frame returns the VAD probability for the frame */
        vad = rnnoise_process_frame(st, x, x);
        if (vad < 0.1) printf("Non-speech frame position %ld VAD %f\n", framepos, vad);
        framepos += FRAME_SIZE;
    }
    rnnoise_destroy(st);
    fclose(f1);
    return 0;
}
I didn't compile or run it, so you might need to fix a line or two to get it working.
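For start/end positions in time: RNNoise operates on 48 kHz input in 480-sample (10 ms) frames, so a flagged frame starting at framepos begins at framepos / 48000.0 seconds and ends 10 ms later; merging runs of consecutive flagged frames gives the start and end of each loud non-speech region.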
Related
I am new to signal processing and I don't really understand the basics (and more). Sorry in advance for any mistakes in my understanding so far.
I am writing C code to detect a basic signal (an 18 Hz simple sinusoid of 2 s duration; generating it with Audacity is pretty simple) in a much bigger MP3 file. I read the MP3 file and copy it until I match the sound signal.
The signal to match is (1st channel: the 18 Hz sine signal; 2nd channel: nothing / doesn't matter).
To match the sound, I compute the frequency content of the MP3 until I find a good percentage of 18 Hz for roughly 2 s. As this frequency is not very common, I don't have to match it very precisely.
I used mpg123 to convert my file and I fill buffers with what it returns. I initialised it to convert the MP3 to mono raw audio:
init:

int ret;
const long *rates;
size_t rate_count, i;

mpg123_init();                          /* required before any other mpg123 call */
mpg123_handle *m = mpg123_new(NULL, &ret);
if (m == NULL)
{
    //err
} else {
    mpg123_format_none(m);
    mpg123_rates(&rates, &rate_count);
    for (i = 0; i < rate_count; ++i)
        mpg123_format(m, rates[i], MPG123_MONO, MPG123_ENC_SIGNED_32);
    mpg123_open_feed(m);
}
(...)
unsigned char out[8*MAX_MP3_BUF_SIZE];
ret = mpg123_decode(m, buf->data, buf->size, out, 8*MAX_MP3_BUF_SIZE, &size);
(...)
But I have no idea how to use the resulting buffer to calculate the FFT and get the frequency.
//FREQ calculation with libfftw3
int transform_size = MAX_MP3_BUF_SIZE * 2;
/* an r2c plan takes real (double) input and produces transform_size/2+1 complex bins */
double *fftin = (double*) fftw_malloc(sizeof(double) * transform_size);
fftw_complex *fftout = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (transform_size/2 + 1));
fftw_plan p = fftw_plan_dft_r2c_1d(transform_size, fftin, fftout, FFTW_ESTIMATE);
I can get good raw audio (PCM?) into a buffer; if I write it out, it can be read and converted into a WAV with sox:
sox --magic -r 44100 -e signed -b 32 -c 1 rps.raw rps.wav
Any help is appreciated. My knowledge of signal processing is poor; I am not even sure what to do with the FFT to get the frequency of the signal. The code is just FYI; it is contained in a much bigger project (for which a simple grep is not an option).
Don't use MP3 for this. There's a good chance your 18 Hz will disappear or at least become distorted: 18 Hz is well below the audible range, and MP3 and other lossy codecs use a variety of techniques to remove sounds that we're not going to hear.
Assuming PCM, since you only need one frequency band, consider using the Goertzel algorithm: it is more efficient than an FFT/DFT for your use case, and it directly answers "how much energy is at 18 Hz in this window?". A sketch follows below.
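Here is a minimal sketch of Goertzel, under the assumption that you have mono PCM already converted to float; goertzel_power is an illustrative name, not a library function. Run it over consecutive windows (say 8192 samples, about 0.19 s at 44.1 kHz) and flag windows where the 18 Hz power stands well above the window's total energy; a run of roughly ten flagged windows then corresponds to your 2 s marker:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Power at one target frequency over a window of n samples (Goertzel). */
double goertzel_power(const float *samples, int n,
                      double target_hz, double sample_rate)
{
    double coeff = 2.0 * cos(2.0 * M_PI * target_hz / sample_rate);
    double s_prev = 0.0, s_prev2 = 0.0;
    for (int i = 0; i < n; i++) {
        double s = samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    /* squared magnitude of the target bin; phase is not needed for detection */
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2;
}

If you stay with the FFT instead, the equivalent step after executing the r2c plan is: the bin closest to 18 Hz is k = round(18.0 * transform_size / sample_rate), and its magnitude is sqrt(fftout[k][0]^2 + fftout[k][1]^2).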
Please let me know if you have any source code to draw a line in a video with OpenCV 3.0.0 using C++.
Cordially
First of all you should consider that a video is basically just a series of images displayed quickly after each other. Therefore you only need to know how to draw a line onto an image in order to draw it in a video (do the same for each frame). The cv::line function is documented here: http://docs.opencv.org/3.0-beta/modules/imgproc/doc/drawing_functions.html
#include <opencv2/opencv.hpp>

using namespace cv;

int main(int argc, char** argv)
{
    // read the camera input
    VideoCapture cap(0);
    if (!cap.isOpened())
        return -1;
    Mat frame;
    /// Create Window
    namedWindow("Result", 1);
    while (true) {
        // grab and retrieve each frame of the video sequentially
        cap >> frame;
        if (frame.empty())
            break;
        // draw a line onto the frame
        line(frame, Point(0, frame.rows / 2), Point(frame.cols, frame.rows / 2), Scalar(0), 3);
        // display the result
        imshow("Result", frame);
        // wait some time for the frame to render; quit on any key
        if (waitKey(30) >= 0)
            break;
    }
    return 0;
}
This will draw a horizontal, black, 3 pixel thick line on the video-feed from your webcam.
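If the goal is a video file rather than the webcam feed, a minimal sketch (the file names are placeholders) is the same loop with cv::VideoCapture reading the file and cv::VideoWriter saving the result:

#include <opencv2/opencv.hpp>

using namespace cv;

int main()
{
    VideoCapture cap("input.mp4");   // assumed input file name
    if (!cap.isOpened())
        return -1;

    Size size((int)cap.get(CAP_PROP_FRAME_WIDTH),
              (int)cap.get(CAP_PROP_FRAME_HEIGHT));
    double fps = cap.get(CAP_PROP_FPS);
    if (fps <= 0) fps = 25;          // some containers don't report FPS

    VideoWriter out("output.avi", VideoWriter::fourcc('M','J','P','G'), fps, size);

    Mat frame;
    while (cap.read(frame)) {
        line(frame, Point(0, frame.rows / 2), Point(frame.cols, frame.rows / 2),
             Scalar(0), 3);
        out.write(frame);            // write the annotated frame to the output file
    }
    return 0;
}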
I am building a custom IP core in Vivado HLS to run within an image/video processing system that runs in embedded Linux on the Zybo board. The core takes image/video data in via an AXI stream, performs a processing task (say Sobel), then outputs this to another AXI stream. This works; however, I wish to use the on-board switches of the Zybo to determine which processing task should be run (the default is a passthrough).
I cannot find a resource or simple example that shows (in HLS, not IP Integrator or the Vivado SDK) how to create an HLS RESOURCE/INTERFACE to read the data from the GPIO switches. What I have is the code below in my top module:
#include <hls_video.h>
#include "ip_types.h"

void MultiImaging(AXI_STREAM& inputStream, AXI_STREAM& outputStream, int rows, int cols, bool sw0, bool sw1)
{
#pragma HLS INTERFACE axis port=inputStream
#pragma HLS INTERFACE axis port=outputStream
#pragma HLS RESOURCE variable=rows core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS RESOURCE variable=cols core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols

//are these two correct for the switches?
#pragma HLS INTERFACE axis port=sw0
#pragma HLS INTERFACE axis port=sw1
//are these two correct for the switches?
#pragma HLS RESOURCE variable=sw0 core=AXI_SLAVE //GPIO?
#pragma HLS RESOURCE variable=sw1 core=AXI_SLAVE //GPIO?

    RGB_IMAGE img(rows, cols);
    RGB_IMAGE oimg(rows, cols);
    RGB_IMAGE sobel_output(rows, cols);
    RGB_IMAGE imgh(rows, cols);
    RGB_IMAGE imgv(rows, cols);
    RGB_IMAGE hsobel(rows, cols);
    RGB_IMAGE vsobel(rows, cols);
    GRAY_IMAGE imgGray(rows, cols);
    GRAY_IMAGE oimgGray(rows, cols);

#pragma HLS dataflow

    hls::AXIvideo2Mat(inputStream, img);

    //Passthrough
    if (sw1 == 0 && sw0 == 0) {
        //..code here
    }
    //Sobel
    else if (sw1 == 0 && sw0 == 1) {
        //..code here
    }
    //Threshold
    else if (sw1 == 1 && sw0 == 0) {
        //..code here
    }
    //..etc
}
The above works and gives the proper output for 'C Simulation' and 'C Synthesis'. It errors out in 'RTL/C Cosimulation' with: "OpenCV Error: Sizes of input arguments do not match." This makes no sense to me, since ALL the RGB_IMAGEs are initially created with the same rows/cols.
Well, in this specific case, the size of the data is NOT determined by ROWS and COLS alone.
Try having a look in your header file; there should be something like:
// typedef video library core structures
typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;
typedef hls::Scalar<3, unsigned char> RGB_PIXEL;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
This matters because you are using AXI_STREAM: these typedefs define how many bits there are in a pixel, how many color channels a pixel has, and so on. If the sizes of the images are the same, the message "Sizes of input arguments do not match" refers to this kind of mismatch between the pixel/stream types and the arguments of the top function.
Does the WaveOut API have some internal limitation on the size of the currently played buffer? I mean, if I provide a very small buffer, does it somehow affect the sound played to the speakers? I am experiencing very strange noise when I generate and play a sine wave with a small buffer. Something like a peak, or a "BUMP".
The complete Story:
I made a program that can generate a sine sound signal in real time.
The variable parameters are frequency and volume. The project requirement was a maximum latency of 50 ms, so the program must be able to produce sine signals with a manually adjustable audio frequency in real time.
I used the Windows WaveOut API, C# and P/Invoke to access the API.
Everything works fine when the sound buffer is 1000 ms large. If I shrink the buffer to 50 ms, as per the latency requirement, then for certain frequencies I experience a noise or "BUMP" at the end of every buffer. I cannot tell whether the generated sound is malformed (I checked, and it is not) or whether something happens with the audio chip, or there is some delay in initializing and playing.
When I save the produced audio to a .wav file, everything is perfect.
This means there must be some bug in my code, or the audio subsystem has a limitation on the buffer chunks sent to it.
For those who don't know: WaveOut must be initialized first, and then for each buffer an audio header must be prepared, containing the number of bytes to be played and a pointer to the memory that holds the audio to be played.
UPDATE
The noise happens with the following combination: 44100 Hz sampling rate, 16 bits, 2 channels, a 50 ms buffer, and a generated sine signal of 201 Hz, 202 Hz, 203 Hz, 204 Hz, 205 Hz ... 219 Hz.
220 Hz and 240 Hz are OK.
Why there is this difference of 20, I do not know.
There are a few things to keep in mind when you need to output audio smoothly:
the waveOutXxxx API is a legacy/compatibility layer on top of a lower-level API, and as such it has greater overhead; it is not recommended when you need to reach minimal latency. Note that this is unlikely to be your primary problem, but it is a piece of general knowledge helpful for understanding
because Windows is not a real-time OS and its audio subsystem is not real-time either, you have no control over the random latency between the moment you queue audio data for output and the moment the data is actually played back; the key is to keep a certain level of buffer fullness, which protects you from playback underflows and delivers smooth playback
with waveOutXxxx you are not limited to a single buffer: you can allocate multiple reusable buffers and recycle them
All in all, the waveOutXxxx, DirectSound and DirectShow APIs work well with latencies of 50 ms and up. With WASAPI exclusive-mode streams you can get 5 ms latencies and even lower.
EDIT: It seems I spoke too soon about 20 ms latencies. To compensate for this, here is a simple tool, LowLatencyWaveOutPlay (Win32, x64), to estimate the latency you can achieve. With sufficient buffering playback is smooth; otherwise you hear stuttering.
My understanding is that buffers might be returned late, and the optimal design for the smallest latency lies along the line of having more, smaller buffers, so that they are given back to you as early as possible: for example, 10 buffers of 3 ms each rather than 3 buffers of 10 ms each (as in the sample run below).
D:\>LowLatencyWaveOutPlay.exe 48000 10 3
Format: 48000 Hz, 1 channels, 16 bits per sample
Buffer Count: 10
Buffer Length: 3 ms (288 bytes)
Signal Frequency: 1000 Hz
^C
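To make the "more, smaller buffers" idea concrete, here is a minimal sketch (this is not the tool above; error handling is omitted, and the 1 kHz generator, buffer count and sizes are illustrative): N short buffers are prepared once, queued, and refilled whenever the driver marks them WHDR_DONE:

#include <windows.h>
#include <math.h>
#pragma comment(lib, "winmm.lib")

#define N_BUFFERS   10
#define SAMPLES     144       /* 3 ms of mono 16-bit audio at 48 kHz */
#define SAMPLE_RATE 48000

static double phase = 0.0;

static void fill(short *buf)  /* illustrative 1 kHz tone generator */
{
    for (int i = 0; i < SAMPLES; i++) {
        buf[i] = (short)(32767.0 * sin(phase));
        phase += 2.0 * 3.14159265358979 * 1000.0 / SAMPLE_RATE;
    }
}

int main(void)
{
    /* mono, 16-bit, 48 kHz PCM */
    WAVEFORMATEX wf = { WAVE_FORMAT_PCM, 1, SAMPLE_RATE,
                        SAMPLE_RATE * 2, 2, 16, 0 };
    HWAVEOUT h;
    static short data[N_BUFFERS][SAMPLES];
    static WAVEHDR hdr[N_BUFFERS];

    waveOutOpen(&h, WAVE_MAPPER, &wf, 0, 0, CALLBACK_NULL);

    for (int i = 0; i < N_BUFFERS; i++) {   /* prime the queue */
        hdr[i].lpData = (LPSTR)data[i];
        hdr[i].dwBufferLength = sizeof(data[i]);
        waveOutPrepareHeader(h, &hdr[i], sizeof(WAVEHDR));
        fill(data[i]);
        waveOutWrite(h, &hdr[i], sizeof(WAVEHDR));
    }
    for (;;) {                 /* recycle completed buffers; Ctrl+C to stop */
        for (int i = 0; i < N_BUFFERS; i++) {
            if (hdr[i].dwFlags & WHDR_DONE) {
                fill(data[i]);
                waveOutWrite(h, &hdr[i], sizeof(WAVEHDR));
            }
        }
        Sleep(1);
    }
}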
So I came here because I wanted to find the basic latency of waveOutWrite() as well. I got around 25-26 ms of latency before I got to a smooth sine tone.
This is for:
AMD Phenom(tm) 9850 Quad-Core Processor 2.51 GHz
4.00 GB ram
64-bit operating system, x64-based processor
Windows 10 Enterprise N
The code follows. It is a modified version of Petzold's sine wave program, refactored to run on the command line. I also changed the polling of buffers to a callback on buffer completion, with the idea that this would make the program more efficient, but it didn't make a difference.
It also has a setup for elapsed timing, which I used to probe the timing of various operations on the buffers. Using those I get:
Sine wave output program
Channels: 2
Sample rate: 44100
Bytes per second: 176400
Block align: 4
Bits per sample: 16
Time per buffer: 0.025850
Total time prepare header: 87.5000000000 usec
Total time to fill: 327.9000000000 usec
Total time for waveOutWrite: 90.8000000000 usec
Program:
/*******************************************************************************

WaveOut example program

Based on C. Petzold's sine wave example, outputs a sine wave via the waveOut
API in Win32.

*******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <limits.h>
#include <windows.h>

#define SAMPLE_RATE     44100
#define FREQ_INIT       440
#define OUT_BUFFER_SIZE (570*4)
#define PI              3.14159
#define CHANNELS        2
#define BITS            16

double fAngle;
LARGE_INTEGER perffreq;
PWAVEHDR pWaveHdr1, pWaveHdr2;
int iFreq = FREQ_INIT;
volatile int quitting = 0;

VOID FillBuffer(short* pBuffer, int iFreq)
{
    int i;
    int c;

    for (i = 0; i < OUT_BUFFER_SIZE; i += CHANNELS) {
        for (c = 0; c < CHANNELS; c++)
            pBuffer[i+c] = (short)(SHRT_MAX*sin(fAngle));
        fAngle += 2*PI*iFreq/SAMPLE_RATE;
        if (fAngle > 2*PI) fAngle -= 2*PI;
    }
}

double elapsed(LARGE_INTEGER t)
{
    LARGE_INTEGER rt;
    LONGLONG tt;

    QueryPerformanceCounter(&rt);
    tt = rt.QuadPart-t.QuadPart;
    return (tt*(1.0/(double)perffreq.QuadPart));
}

/* NB: MSDN warns against calling waveOut functions from inside the callback;
   this design is kept here because it is what was measured */
void CALLBACK waveOutProc(HWAVEOUT hwo, UINT uMsg, DWORD_PTR dwInstance, DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (quitting) return;
    if (uMsg == WOM_DONE) {
        if (pWaveHdr1->dwFlags & WHDR_DONE) {
            FillBuffer((short*)pWaveHdr1->lpData, iFreq);
            waveOutWrite(hwo, pWaveHdr1, sizeof(WAVEHDR));
        }
        if (pWaveHdr2->dwFlags & WHDR_DONE) {
            FillBuffer((short*)pWaveHdr2->lpData, iFreq);
            waveOutWrite(hwo, pWaveHdr2, sizeof(WAVEHDR));
        }
    }
}

int main()
{
    HWAVEOUT hWaveOut;
    short* pBuffer1;
    short* pBuffer2;
    WAVEFORMATEX waveformat;
    int bytes;
    LARGE_INTEGER rt;
    double timprep;
    double filtim;
    double waveouttim;

    printf("Sine wave output program\n");

    fAngle = 0; /* start sine angle */
    QueryPerformanceFrequency(&perffreq);

    pWaveHdr1 = malloc(sizeof(WAVEHDR));
    pWaveHdr2 = malloc(sizeof(WAVEHDR));
    pBuffer1 = malloc(OUT_BUFFER_SIZE*sizeof(short));
    pBuffer2 = malloc(OUT_BUFFER_SIZE*sizeof(short));
    if (!pWaveHdr1 || !pWaveHdr2 || !pBuffer1 || !pBuffer2) {
        free(pWaveHdr1); /* free(NULL) is a no-op */
        free(pWaveHdr2);
        free(pBuffer1);
        free(pBuffer2);
        fprintf(stderr, "*** Error: No memory\n");
        exit(1);
    }

    // Load prime parameters to format
    waveformat.wFormatTag = WAVE_FORMAT_PCM;
    waveformat.nChannels = CHANNELS;
    waveformat.nSamplesPerSec = SAMPLE_RATE;
    waveformat.wBitsPerSample = BITS;
    waveformat.cbSize = 0;

    // Calculate other parameters
    bytes = waveformat.wBitsPerSample/8;      /* find bytes per sample */
    if (waveformat.wBitsPerSample&7) bytes++; /* round up */
    bytes *= waveformat.nChannels;            /* find total channels size */
    waveformat.nBlockAlign = bytes;           /* set block align */
    /* find average bytes/sec */
    waveformat.nAvgBytesPerSec = bytes*waveformat.nSamplesPerSec;

    printf("Channels: %d\n", waveformat.nChannels);
    printf("Sample rate: %d\n", waveformat.nSamplesPerSec);
    printf("Bytes per second: %d\n", waveformat.nAvgBytesPerSec);
    printf("Block align: %d\n", waveformat.nBlockAlign);
    printf("Bits per sample: %d\n", waveformat.wBitsPerSample);
    printf("Time per buffer: %f\n",
           OUT_BUFFER_SIZE*sizeof(short)/(double)waveformat.nAvgBytesPerSec);

    if (waveOutOpen(&hWaveOut, WAVE_MAPPER, &waveformat, (DWORD_PTR)waveOutProc, 0, CALLBACK_FUNCTION)
        != MMSYSERR_NOERROR) {
        free(pWaveHdr1);
        free(pWaveHdr2);
        free(pBuffer1);
        free(pBuffer2);
        fprintf(stderr, "*** Error: cannot open waveform output device\n");
        exit(1);
    }

    // Set up headers and prepare them
    pWaveHdr1->lpData = (LPSTR)pBuffer1;
    pWaveHdr1->dwBufferLength = OUT_BUFFER_SIZE*sizeof(short);
    pWaveHdr1->dwBytesRecorded = 0;
    pWaveHdr1->dwUser = 0;
    pWaveHdr1->dwFlags = WHDR_DONE;
    pWaveHdr1->dwLoops = 1;
    pWaveHdr1->lpNext = NULL;
    pWaveHdr1->reserved = 0;

    QueryPerformanceCounter(&rt);
    waveOutPrepareHeader(hWaveOut, pWaveHdr1, sizeof(WAVEHDR));
    timprep = elapsed(rt);

    pWaveHdr2->lpData = (LPSTR)pBuffer2;
    pWaveHdr2->dwBufferLength = OUT_BUFFER_SIZE*sizeof(short);
    pWaveHdr2->dwBytesRecorded = 0;
    pWaveHdr2->dwUser = 0;
    pWaveHdr2->dwFlags = WHDR_DONE;
    pWaveHdr2->dwLoops = 1;
    pWaveHdr2->lpNext = NULL;
    pWaveHdr2->reserved = 0;

    waveOutPrepareHeader(hWaveOut, pWaveHdr2, sizeof(WAVEHDR));

    // Send two buffers to waveform output device
    QueryPerformanceCounter(&rt);
    FillBuffer(pBuffer1, iFreq);
    filtim = elapsed(rt);
    QueryPerformanceCounter(&rt);
    waveOutWrite(hWaveOut, pWaveHdr1, sizeof(WAVEHDR));
    waveouttim = elapsed(rt);
    FillBuffer(pBuffer2, iFreq);
    waveOutWrite(hWaveOut, pWaveHdr2, sizeof(WAVEHDR));

    // Run waveform loop; the callback keeps recycling the two buffers
    Sleep(10000);

    printf("Total time prepare header: %.10f usec\n", timprep*1000000);
    printf("Total time to fill: %.10f usec\n", filtim*1000000);
    printf("Total time for waveOutWrite: %.10f usec\n", waveouttim*1000000);

    // Stop playback, close the device and release resources
    quitting = 1;
    waveOutReset(hWaveOut);
    waveOutUnprepareHeader(hWaveOut, pWaveHdr1, sizeof(WAVEHDR));
    waveOutUnprepareHeader(hWaveOut, pWaveHdr2, sizeof(WAVEHDR));
    waveOutClose(hWaveOut);

    free(pWaveHdr1);
    free(pWaveHdr2);
    free(pBuffer1);
    free(pBuffer2);

    return 0;
}
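As a sanity check on the 25-26 ms figure: one buffer here is OUT_BUFFER_SIZE = 570*4 = 2280 16-bit samples, i.e. 1140 stereo frames, and 1140 / 44100 ≈ 25.85 ms (the "Time per buffer" value printed above), which suggests the measured latency is essentially the length of one queued buffer.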
I am trying to generate a tone on the sound card (frequency: 1950 Hz, duration: 40 ms, level: -30 dB, right channel, on stream 1). Any recommendations on how to accomplish this using C++ or C#? Are there any libraries (C++ or C#) for generating such a precise tone?
David, playing audio to the speakers was built right into .NET (I think in the .NET 2.0 Framework). Using System.Media.SoundPlayer you can play a sound from a memory stream that you build (in WAV format). Here is a function I coded that plays a simple frequency for a certain duration. Regarding the decibels and sending the tone to the sound card, I don't really understand what specifics you are referring to: my understanding is that decibels are simply a measure of how loud a sound is after it has been reproduced by the speakers, so the volume control on the speakers affects what decibel level your sounds will produce, and "sending a certain decibel to the sound card" makes no sense to me. Maybe you need something more detailed and this doesn't work for you, but maybe you can run with it and get it to work for what you need, and maybe it is almost exactly what you are asking for.
The process used in this code lets you build any audio you want and then plays it. You can create two sine waves, or many more, or triangle waves, or even speech synthesis with this method. The method computes sound samples one at a time and then plays them, so you need to code what each audio sample should be at each given moment in time. WAV supports stereo sound too, but this code sample generates mono only; if you want stereo, the code just needs to be modified to generate the bytes for the stereo WAV format instead (see the note after the code), which I expect would not be too difficult.
Happy coding!
using System;
using System.IO;

public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
{
    var mStrm = new MemoryStream();
    BinaryWriter writer = new BinaryWriter(mStrm);

    const double TAU = 2 * Math.PI;
    int formatChunkSize = 16;
    int headerSize = 8;
    short formatType = 1;
    short tracks = 1;
    int samplesPerSecond = 44100;
    short bitsPerSample = 16;
    short frameSize = (short)(tracks * ((bitsPerSample + 7) / 8));
    int bytesPerSecond = samplesPerSecond * frameSize;
    int waveSize = 4;
    int samples = (int)((decimal)samplesPerSecond * msDuration / 1000);
    int dataChunkSize = samples * frameSize;
    int fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize;

    // var encoding = new System.Text.UTF8Encoding();
    writer.Write(0x46464952);    // = encoding.GetBytes("RIFF")
    writer.Write(fileSize);
    writer.Write(0x45564157);    // = encoding.GetBytes("WAVE")
    writer.Write(0x20746D66);    // = encoding.GetBytes("fmt ")
    writer.Write(formatChunkSize);
    writer.Write(formatType);
    writer.Write(tracks);
    writer.Write(samplesPerSecond);
    writer.Write(bytesPerSecond);
    writer.Write(frameSize);
    writer.Write(bitsPerSample);
    writer.Write(0x61746164);    // = encoding.GetBytes("data")
    writer.Write(dataChunkSize);
    {
        double theta = frequency * TAU / (double)samplesPerSecond;
        // 'volume' is UInt16 with range 0 thru Uint16.MaxValue ( = 65 535)
        // we need 'amp' to have the range of 0 thru Int16.MaxValue ( = 32 767)
        double amp = volume >> 1; // so we simply set amp = volume / 2
        for (int step = 0; step < samples; step++)
        {
            short s = (short)(amp * Math.Sin(theta * (double)step));
            writer.Write(s);
        }
    }

    mStrm.Seek(0, SeekOrigin.Begin);
    new System.Media.SoundPlayer(mStrm).Play();
    writer.Close();
    mStrm.Close();
} // public static void PlayBeep(UInt16 frequency, int msDuration, UInt16 volume = 16383)
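For the stereo variant mentioned above: set tracks = 2 (which automatically doubles frameSize and bytesPerSecond in the code above) and, inside the sample loop, write one 16-bit sample for the left channel followed by one for the right channel per time step, since WAV stores stereo data interleaved left-then-right within each frame.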
NAudio provides a robust audio library for .NET.
NAudio is an open source .NET audio and MIDI library, containing dozens of useful audio related classes intended to speed development of audio related utilities in .NET. It has been in development since 2002 and has grown to include a wide variety of features. While some parts of the library are relatively new and incomplete, the more mature features have undergone extensive testing and can be quickly used to add audio capabilities to an existing .NET application. NAudio can be quickly added to your .NET application using NuGet.
Here's an article that walks step-by-step through using NAudio to create a sine wave. You can create the sine wave with any desired frequency, for any desired duration:
http://msdn.microsoft.com/en-us/magazine/ee309883.aspx