(How) Can I get a stream of all sounds recorded from the microphone that my computer did not produce? (using PulseAudio or something else) - linux

I've been playing around with some speech-to-text and text-to-speech systems, and am running into the problem that when the computer makes sounds that it can recognize, it starts taking commands from itself. To avoid this, I'd like a stream of all sounds picked up by the microphone that were not produced by the computer itself.
I see that PulseAudio has an echo cancellation module, but so far I have been unable to distinguish between its output and the raw microphone output: it still contains all the sounds picked up by the microphone that came from the computer speakers. I wonder if the default echo canceller is doing the opposite of what I want (i.e., it removes sounds heard by the microphone from being sent to the speakers).
Any idea how I can do this (preferably with pacmd)? I have thoroughly confused myself trying to specify non-default sources for the echo canceller, and have wandered into loopback modules and other things that are probably irrelevant. I know very little about PulseAudio, haven't found a good introduction to it (I've looked through much of the PulseAudio documentation but didn't see anything relevant), and might just be missing something simple. I feel frustrated that echo cancellation apparently doesn't work, I can't find documentation on it, and I can't find examples of it working from other people.
Thanks in advance for the help!
Other details that might be relevant: I'm running Ubuntu Saucy on a Lenovo Thinkpad T410. I'm using the built-in microphone and speakers (so, I'm pretty sure they're using the same sound card and I won't have clock drift issues). My actual application gets its sound through GStreamer, but GStreamer gets it from PulseAudio, and I don't think GStreamer itself has AEC capabilities. If there's a different way of doing this, I'd gladly switch to that.

Ah, I've got it! Merely loading the echo cancellation module isn't enough; you then need to actually use it. In particular, it will only cancel echoes of sounds passed through it, and if no sounds go through it, nothing will be cancelled. So, open /etc/pulse/default.pa and add the line
load-module module-echo-cancel
towards the bottom (I put it right after the line that loads module-filter-apply). Then, restart the PulseAudio daemon by running (as a non-root user) pulseaudio -k. Next, run pacmd to get a command line interface to PulseAudio, and give it the commands list-sources and list-sinks. Note the indices of the echo canceller in the responses. Edit /etc/pulse/default.pa again, and uncomment the two lines at the end about set-default, replacing the words input and output with the indices of the echo canceller's source and sink. Finally, restart PulseAudio again with pulseaudio -k (again, run as a non-root user).
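For concreteness, the tail of /etc/pulse/default.pa ends up looking roughly like this (a sketch; the indices 1 and 2 are examples from my machine, so substitute whatever pacmd reported on yours):

load-module module-filter-apply
load-module module-echo-cancel

### Make some devices default
set-default-source 2
set-default-sink 1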
Now, by default all sounds to be output get sent through the echo canceller before heading to the speakers, and all sounds to be input get pulled from the echo canceller after coming in through the microphone, and things actually work. You can verify that it's working by running pavucontrol and looking at the sound levels on the Input Devices screen (try playing some music and speaking, and note that the echo cancelled input shows normal sound levels when you speak but very low levels (verging on nothing) when you're silent but the music is playing).
This answer mostly comes from this post, which I wish I'd found weeks ago.

Related

LIRC and audio bugging each other out on Raspbian

I'm having a problem with LIRC breaking audio system-wide after firing a command.
For example, I'd do:
irsend send_once Samsung_BN59-01224C KEY_VOLUMEUP --count=5
and afterwards play an audio file; the program playing that file would seize up and not produce any sound. The same goes for a script I've written that uses the pygame library for Python.
What's worse is that LIRC also stops firing correctly after this bug occurs. I can see infrared light being shot out of the diode, but there might be something off with the timing.
This happens both ways: after playing an audio file, LIRC will stop working, but further audio playback is possible.
Very rarely, I'm able to play audio after LIRC finishes a command, but the result is a heavily pitched-down version of the original sound that cuts out after around a second.
I've tested with different remotes; the same results occur. I'm not sure whether the fix a user proposed in this thread (https://github.com/raspberrypi/linux/issues/2993) could cause this, but I'm noting that I used it, since unmodified LIRC has problems with both the receiver and transmitter turned on in /boot/config.txt. The rest of my installation is standard.
Fixed this by reverting the fix I mentioned in the last paragraph. Apparently using PWM for infrared causes issues with onboard audio on Raspbian. I commented out the lines responsible for the receiver and left the gpio-ir-tx option uncommented. Works fine with just the transmitter on.
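For reference, the relevant /boot/config.txt overlay lines ended up looking something like this (a sketch; the GPIO numbers are just examples, use the pins from your own wiring):

# receiver disabled; driving IR through PWM was breaking onboard audio
#dtoverlay=gpio-ir,gpio_pin=18
# transmitter only
dtoverlay=gpio-ir-tx,gpio_pin=17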

Beaglebone Black Custom Audio Cape DMA/IRQ trouble

I'm running a BBB straight out of the box with Debian. The kernel version is 3.8.13-bone-47.
I'm working with a cape that is very similar to the one here. The difference is that I'm using a TLV320AIC3106 instead of the AIC3104, and I have only enabled audio out; I'm not interested in recording audio in this application.
My pinout for my application is identical to the cape in the link above.
I've followed the link here to get the cape up and running. Everything that I have matches the output of the tutorial up until I try to play a sample wave file.
When I play a sample wave file, I get the following message: aplay: pcm_write:1710: write error: Input/output error
Running dmesg gives me ALSA sound/core/pcm_lib.c:1010 playback write error (DMA or IRQ trouble?)
What I don't understand is how DMA comes into play. Is this a DMA problem? Is it a symptom of something else going wrong, like my I2C? Am I missing a configuration somewhere else?
Any thoughts on how to track this down are appreciated.
I realize it has been covered in multiple places before, but it can never be stressed enough: make sure that when you're sending information over I2C, it goes to the right address. I figured out this morning that the audio codec was at address 0x1B, while the driver was addressing 0x18. A small but critical difference.
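A quick way to check where the codec actually responds is to probe the bus with i2c-tools (the bus number here is a guess, it varies by board; install the tools with sudo apt-get install i2c-tools if needed):

i2cdetect -y -r 1

The codec should show up at its real address (0x1b in my case) in the resulting grid.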
The easy fix is to edit the BB-BONE-AUDI-02-00A0.dts file.
Edit line 65 to <0x1B>. Re-compile using the line: dtc -O dtb -o BB-BONE-AUDI-02-00A0.dtbo -b 0 -@ BB-BONE-AUDI-02-00A0.dts
Move the generated file to the /lib/firmware directory.
Insert it using echo BB-BONE-AUDI-02 > /sys/devices/bone_capemgr*/slots
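To confirm the overlay actually loaded, read the slots file back; the new cape should appear in the listing:

cat /sys/devices/bone_capemgr*/slots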
After applying this simple fix, it seems to work. I can't say for certain, because I still have to get the audio amplifier circuit up and running. At least aplay will play the file without crashing out on me, which is a start.

Which version of linux has support for Dolby Advanced Audio v2?

I am using a LENOVO G500 laptop and my sound card supports Dolby Advanced Audio v2, which works nicely in Windows (i.e., Windows 7, 8, and 8.1). However, I have failed to enable the Dolby sound effect in Linux (I have tried Linux Mint 17 and Fedora 20).
Does anyone have an idea which Linux version supports this feature, or how I can enable it in a Linux OS?
I would appreciate it if you could point me in the right direction.
Thanks.
I've googled up some very good advice on forums that helped me achieve Dolby-like sound on my Kubuntu 19.04 with a Lenovo G780.
Install PulseEffects https://github.com/wwmm/pulseeffects
(repos with deb files are here: https://github.com/wwmm/pulseeffects/wiki/Package-Repositories#debian--ubuntu)
Restart the user session or reboot after this, because PulseAudio will be upgraded, and it may cause problems if you don't restart.
Run PulseEffects and close it. It'll create all its settings dirs on first launch; they're required for the next step.
Install PulseEffects-Presets from here: https://github.com/JackHack96/PulseEffects-Presets
(I used the suggested script that automatically downloads them to the PulseEffects import dirs; it requires flatpak, which can be installed from the repos with sudo apt install flatpak)
Launch PulseEffects again. Select Convolver. Enable it. Click on the wave button. You'll see a list of presets. Enable:
Dolby ATMOS ((128K MP3)) 1.Default.irs
Close the dialog and that's it. You can toggle the Convolver in PulseEffects on and off while playing music to compare results. You may play with other presets as well.
For improving sound on a notebook or a tablet, the PulseEffects help pages come with a tutorial on how to achieve this.
The app can be minimized to the tray on GTK-enabled desktops with an additional application: https://github.com/boomshop/pulseffectstray
It's better to enable autostart in the app settings (it will copy its desktop file to ~/.config/autostart with the --gapplication-service command-line option, so next time it starts without a GUI).
It is possible to get reasonably close to the Dolby Advanced Audio output on Linux.
TLDR:
Record the result of playing a -0dBFS impulse in Windows with all effects enabled. Save that as a wav file and use it as input to the PulseEffects Convolver.
Step-by-step:
Install Audacity in Windows.
Configure Audacity to use WASAPI
Select the loopback device as input
Select your laptop speakers as output, making sure all Dolby Advanced Audio effects are enabled.
Start recording
Play an impulse audio file (you might need to do this twice; Audacity often doesn't pick up the first impulse). If you don't have an impulse file handy, see the SoX sketch after these steps.
Zoom in and select the area around the recorded impulse (see screenshot below)
Export the selection as a WAV file and change the extension to irs
Import this irs file into the Convolver.
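One hedged way to generate the impulse file is with SoX (an assumption on my part; any tool that can write a single full-scale sample followed by silence will do):

# one 0 dBFS sample (the 's' suffix means samples in SoX), then a second of silence;
# pick the rate to match your Windows device settings
sox -n -r 48000 -c 2 -b 16 impulse.wav synth 1s sine 0 dcshift 1 pad 0 1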
Some notes:
Audacity isn't required, presumably any software capable of recording from the output device will be fine.
To avoid any changes introduced by sample rate conversion, set the sample rate of the output and input devices in Windows to be the same.
When recording, in Step 5, Audacity would not record unless audio was playing. This is probably due to using WASAPI. Just start recording, play the impulse and if you don't see it in the recording output as a single spike, play it again.
The screenshot is quite heavily zoomed in so that you can see the area where there is data. When selecting what to export, try to make sure the selection is roughly centered around the central peak. It doesn't have to be perfect.
As a useful check to make sure what you are recording has been processed by Dolby Advanced Audio, you can disable all effects on the output device in Windows and record the impulse a second time. This should show up as a single peak sample and not the symmetric pattern.
After a bit of research I found this explanation that seems to have satisfied my query. It generally says ...
There isn't going to be an easy fix for this, unless Dolby releases a Linux driver or publishes more information on what exactly their software is doing (which is unlikely).
(Sources: Haswell-ThinkPad-problems, linux-low-audio-quality)
Beware that PulseEffects has recently changed its name to EasyEffects, but PulseEffects-Presets hasn't updated its config files to cover this change; therefore this answer might not be applicable anymore.

low latency sounds on key presses

I am trying to write an application (I'm a GUI first-timer) for my son, who has autism. There is a video player in the top half and a text entry area in the bottom. When letters are typed, sounds are produced to mimic the words in the video.
There have been other posts on this site about playing sounds on key presses, using GStreamer via a system call. I have also tried libcanberra, but both seem to have significant delays between sounds. I can write the app in Python or C, but will likely do at least some of it in C.
I also want to mention that the video portion is being played by GStreamer. I tried to create two instances of GStreamer to avoid expensive system calls, but the audio instance seemed to kill the app when called.
If anyone has any tips on creating faster responding sounds I would really appreciate it.
You can upload a raw audio sample directly to PulseAudio, so there is no decoding and (perhaps) no extra context switching, by using the following function from libcanberra:
http://developer.gnome.org/libcanberra/unstable/libcanberra-canberra.html#ca-context-cache
The next ca_context_play() will use it.
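The same mechanism can be prototyped from a shell with PulseAudio's sample cache (a sketch; the wav path is just an example that happens to exist on many distros):

# upload the sample once, pre-decoded, into the PulseAudio daemon
pactl upload-sample /usr/share/sounds/alsa/Front_Center.wav keyclick
# each keypress then only needs to trigger the cached sample
pactl play-sample keyclick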
However, the biggest problem you'll encounter with this scenario (with simultaneous video playback) is that PulseAudio might configure the audio device with large latency (up to 1/2 s or more for normal playback). It may be reasonable to file a bug against libcanberra asking for a LOW_LATENCY flag, as it currently doesn't attempt to minimize delay for sound events, AFAIK. That would be great to have.
GStreamer's pulsesink could probably get low latency too (it has some properties for that), but I'm afraid it won't be as lightweight as libcanberra, and you won't be able to cache a sample, for instance. Ideally, GStreamer could also learn to cache samples, or pre-fill PulseAudio...
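As an illustration of those properties, something like this asks pulsesink for a much smaller ring buffer (a sketch; the pipeline and the microsecond values are illustrative, and values that are too small will cause underruns on some hardware):

gst-launch-1.0 filesrc location=click.wav ! wavparse ! audioconvert ! audioresample ! pulsesink buffer-time=20000 latency-time=10000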

How to redirect from Audio Output to Mic Input using PulseAudio?

I'm working on a mobile app for Maemo/MeeGo, and Maemo uses PulseAudio. I want to play an MP3 to the caller (muting the mic while doing it, and not listening to the caller; everything should happen in the background). To do this, I have to redirect the audio output of a certain app (or, if that's not possible, all apps), fake it as an input, and make the Phone app use it.
On my Ubuntu PC, I did it with pavucontrol. I created a NULL sink, then:
Audio Output (from Amarok) --> to NULL Output
Skype Input <-- NULL Output
Skype Output --> NULL
And it worked: Amarok played the music and it streamed to Skype, without playing to me; I didn't hear anything during the whole process. The problem is:
a) Maemo does not have pavucontrol.
b) Even if it did (or if I packaged it), it wouldn't be any good, since it's a GUI-only app and I have to do all of this in the background, without any user input (meaning: CLI or API).
I asked about this on Freenode #pulseaudio, and a helpful guy said: "It can pretty much be done via pactl or pacmd; the commands you want are move-sink-input and move-source-output, but you need to know device and stream indexes." So it looks like pavucontrol is just a GUI; pactl and pacmd are the real deal, and most importantly, they're CLI apps.
I'm really thankful to him, but I don't know anything about pactl, pacmd, move-sink-input, or device/stream indexes, so I need a very simplified manual page, the source of a similar app, a one-liner command (two? a whole page of commands? just give them to me! ^^), or someone with enough patience to explain this stuff to me.
The manuals for pactl and pacmd are readily available if you search for them. I found them here:
pactl
pacmd (pulse-cli-syntax)
I think you are interested in the following excerpt from the pactl manual:
move-sink-input ID SINK :
Move the specified playback stream (identified by its numerical index) to the specified sink (identified by its symbolic name or numerical index).
You should be able to use pactl list sink-inputs, pactl list source-outputs, pactl list sinks, and pactl list sources combined with some grep and/or sed or something of that nature to programmatically determine the correct stream and sink.
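Putting it together, a minimal sketch of the whole redirection (the sink name and the indices 42 and 7 are examples; query your own with the list commands):

# create a virtual sink to act as the fake output
pactl load-module module-null-sink sink_name=virtmic
# find the numerical index of the player's stream
pactl list sink-inputs | grep -e 'Sink Input' -e 'application.name'
# route the player into the virtual sink
pactl move-sink-input 42 virtmic
# point the recording stream at the virtual sink's monitor source
pactl move-source-output 7 virtmic.monitor

The monitor source (virtmic.monitor) is what the phone app should record from; move-source-output does for record streams what move-sink-input does for playback streams.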
