I want to manipulate audio output data, for all the different running applications, before it is sent to the speakers.
Turn the volume up or down, filter the audio, things like that.
How can I gain access to the audio output in real time?
Is there a way to not depend on the audio driver interface?
Thanks! :)
Windows Store apps allow you to use WASAPI. In WASAPI there is a concept of "audio sessions", one for every stream of audio being sent to the soundcard. You can enumerate the audio sessions, which gives you access to IAudioSessionControl. However, this doesn't let you manipulate the audio, which as far as I know WASAPI simply doesn't allow. The best you can hope for is to get hold of ISimpleAudioVolume for each session, but last time I tried that, I found that you couldn't get hold of the session GUIDs you needed to adjust the volume for other processes. You may be able to get hold of the audio endpoints and adjust the master volume for the soundcard.
In short, WASAPI is the most powerful audio API for Windows Store apps, but unfortunately I don't think it will let you do very much of what you are asking for here.
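For reference, here is a rough desktop-style sketch of the session enumeration described above, using the classic WASAPI COM interfaces (a packaged Store app would have to activate the device interface differently, and the limitation still holds: you only get volume/mute, never the samples):

    // Sketch: enumerate the audio sessions on the default render device and
    // read/adjust per-session volume through ISimpleAudioVolume.
    // Error handling trimmed for brevity; link against ole32.lib.
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audiopolicy.h>
    #include <cstdio>

    int main()
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        IMMDeviceEnumerator* devEnum = nullptr;
        CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                         __uuidof(IMMDeviceEnumerator), (void**)&devEnum);

        IMMDevice* device = nullptr;
        devEnum->GetDefaultAudioEndpoint(eRender, eConsole, &device);

        IAudioSessionManager2* mgr = nullptr;
        device->Activate(__uuidof(IAudioSessionManager2), CLSCTX_ALL, nullptr, (void**)&mgr);

        IAudioSessionEnumerator* sessions = nullptr;
        mgr->GetSessionEnumerator(&sessions);

        int count = 0;
        sessions->GetCount(&count);
        for (int i = 0; i < count; ++i)
        {
            IAudioSessionControl* ctrl = nullptr;
            sessions->GetSession(i, &ctrl);

            // Volume/mute is all you get per session; the sample data itself
            // is never exposed through these interfaces.
            ISimpleAudioVolume* vol = nullptr;
            if (SUCCEEDED(ctrl->QueryInterface(__uuidof(ISimpleAudioVolume), (void**)&vol)))
            {
                float level = 0.0f;
                vol->GetMasterVolume(&level);
                printf("session %d volume: %.2f\n", i, level);
                vol->Release();
            }
            ctrl->Release();
        }

        sessions->Release();
        mgr->Release();
        device->Release();
        devEnum->Release();
        CoUninitialize();
        return 0;
    }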
My primary intention is to set up a VoIP session between two users, A and B, where the raw audio/video media bytes fetched from A's browser are played in B's browser and vice versa.
The reason is that when users C and D are added to the call, we should not have to create a P2P mesh network, which limits performance.
I tried recording media with getUserMedia() and playing it back, but it is not real time and gives a bad user experience. (However, I haven't yet experimented with small video chunks of around 200 ms.)
Is there any approach where I can get the raw bytes of the media and play them in the other browser? Currently I have a server in between which can connect to both peers if required.
Any online examples or libraries are welcome.
I have already asked two questions in this regard, each with a 100-point bounty, but they have not been of much use:
How to use libsrtp or similar library to decrypt/encrypt the WebRTC data stream?
How to integrate part of WebRTC as a static / dynamic library with the existing C++ code?
Related: How to stream, live video playing on my browser to browser of another user?
If I understand you correctly, you're looking for a way to have more than two users in the session, without using a mesh topology, right?
That's possible, and configurable as well: only some participants may be active speakers, or everyone may be an active speaker rather than just a receiver, whatever configuration you choose. To me it seems you're asking about video conferencing.
There are a couple of tools for this; the best one I can recommend is mediasoup, an SFU (Selective Forwarding Unit).
I don't know if I understand correctly, but it is not likely that you will get raw video data and play it in the browser; it would just kill your bandwidth and performance because raw data is huge.
You need to use compressed data (a media codec, e.g. H.264) and a protocol to send and receive it. If you are looking for sub-second latency, then WebRTC is already your best choice here. If you have a server in between, distribute your media through that server instead of a mesh. Check this out for WebRTC network topologies:
https://antmedia.io/webrtc-servers/
Do any of you know a way to get the audio stream of a music platform and plug it into the Web Audio API?
I am doing a music visualizer based on the Web Audio API. It currently reads sound from my computer's mic and renders a real-time visualization. If I play music loud enough, my viz works!
But now I'd like to move on and read only the sound coming from my computer, so that the visualization responds only to the music and not to other sounds such as people chatting.
I know I can buffer an MP3 file in that API and it would work perfectly. But in 2020, streaming music is very common, via Deezer, Spotify, SoundCloud, etc.
I know they all have an API, but they often offer an SDK where you cannot really do more than "play" music. There is no easy access to the stream of audio data. Maybe I am wrong, and that is why I am asking for your help.
Thanks
The way to stream music into WebAudio is to use a MediaElementAudioSourceNode or MediaStreamAudioSourceNode. However, these nodes will output silence unless you're allowed to access the data. This means you have to set the CORS property correctly on your end, and it also requires the server to allow the access through CORS.
A Google search will help with setting up CORS, but many sites won't allow access unless you have the right permissions; in that case you are out of luck.
I find a "no-code" work around. At least on Ubuntu 18.04, I am able to tell Firefox to take my speakers as the "microphone input".
You just have to select the good "mic" in the list when your browser asks for mic permission.
That solution is very convenient since I do not need to write platform-specific binding-code to access to the audio stream
I am working on a project for large-group broadcasting in WebRTC. Since it needs to work on iOS and Android devices, I am using Kurento and the iOSWebRTC Cordova plugin to build this. I am curious whether anyone can help improve my plan, or whether there is an easier way to achieve this.
We need to have a video/audio conference with 5 people per room, but we also need to be able to show that video to large audiences. My idea is to use Kurento as a middleman and capture the streams into .webm files for live playback while the conference is going on.
Is there a better way to achieve this? And how would I play back the .webm file as it is being recorded? It needs to update and continue playing as more video arrives, basically a live-stream copy of the camera.
I am unsure whether this is the best route, but I figured it would reduce the bandwidth compared to my original idea, which was:
A 5-person conference for the broadcasters, with X viewers each downloading those streams directly. However, I realized the upload bandwidth requirement would be crazy high, which is why I settled on this idea. Additionally, the viewers do not have to see things in real time like the broadcasters: the broadcasters need to be able to see and communicate with each other at the same time, while the viewers can be a few seconds behind.
TL;DR:
Trying to make a 5-person video conference with video/audio capture, and then live-stream it to the viewers' players. This would avoid PeerConnection bandwidth limitations. Would this work, or am I forgetting something?
You'll need to look into using an SFU or MCU. An MCU is very costly, but multiplexes video streams and sends down a single video stream to all peers, and can also record that stream. An SFU is a single point of receipt of all streams, and selectively forwards them to clients. It could record off individual streams and then you could do post-processing to make a single recording out of the multiple recorded streams. A mesh network of connections really doesn't work for this use case.
I have developed pretty complex audio software for my client, with plugins for Winamp, Windows Media Player, and VST. Now the client is interested in some method of avoiding having to maintain this multitude of plugins; we have no way to support all the media players out there.
The client does not care about Unix/Mac yet, so I can look only at Windows XP and Vista/7.
Basically, what we need is a way to always reliably intercept as many audio stream protocols as possible (well, except maybe ASIO, that's another story, I guess), then pass this audio through our custom effects engine and then route it back to the default audio device, whatever it is.
Now I am thinking about what options I have (theoretically).
I could use hooks. I would need to globally hook the older waveOut API and also DirectSound.
But will this still work on Vista/7?
I could use a virtual driver, like the author of the Virtual Audio Cable did:
http://software.muzychenko.net/eng/vac.htm
Seems a pretty daunting task. Anyway, the client will contact the author of VAC to see if he agrees to sell his source code for a reasonable price.
This driver could install itself as the default audio output device, intercept the audio stream from Windows, and pass it on to the real output device. Hmm, but what about the various DirectSound audio buffers: do I have to mix them myself, or is there any way to tell the Windows mixer to mix everything for me and hand me a single mixed audio stream?
It seems this custom driver would, of course, kill all hardware audio acceleration, but we can live with that if we warn our customers about this issue.
As I understand it, the most current Windows driver standard is WDF.
But maybe it does not work for audio on Windows Vista/7?
I know Vista/7 has a different audio stack from XP.
If I can do it using WDF, what driver should I write - kernel mode or user mode?
Maybe I am missing more elegant and simple options to intercept, process and route audio on Windows?
Try the Virtual Audio Streaming SDK. It is also a virtual sound card and lets you read/process audio data in real time.
http://www.virtualaudiostreaming.net/sdk-license.html
I have written an application that receives media files from a central server and plays those files according to a playlist. All works well.
A client has contacted us and wants to use our application to play some audio files as presentations in a kiosk-style application. So far, so good; our application can handle this with no problems.
He has requested, as a potential feature, that we provide a number of headphone sockets at the front of the kiosk. Each headphone socket would play the same audio presentation in a different language.
I have come up with the idea of encoding a single audio file with the presentation in multiple languages, and each language in a different channel. We would then require a sound card that could decode each channel and output it on a different headphone socket.
Thing is, while I think the theory is sound, I have absolutely no idea whether this is feasible or what would be required to pull it off.
Any ideas?!
As a side-note: the application uses Media Player as the underlying component to handle the playback of audio and video. I'd appreciate any help as to the software we could use to generate the multi-channel audio stream and the hardware (USB sound card would be fine) that we could use to decode the stream.
Thanks!
You need to use multiple files, not channels; it's going to be way easier that way.
Instead of using Media Player, use DirectShow (on .NET you have DirectShow.NET). In DirectShow you have the notion of multiple files on the same graph.
You will be able to control which audio device plays which file, and your Play, Pause, and Stop commands will be performed on all files without you having to worry about syncing.
There are many samples on how to build a media-player-like application with DirectShow; extending them to use multiple files should be really easy.
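A minimal C++ sketch of that idea is below; the file paths are placeholders, and in a real kiosk you would additionally enumerate the audio renderer filters and connect each source to a specific output device instead of letting RenderFile pick the default one:

    // Sketch: two files rendered into one DirectShow filter graph, so a single
    // Run/Pause/Stop controls both. Error handling trimmed; link strmiids.lib.
    #include <windows.h>
    #include <dshow.h>

    int main()
    {
        CoInitialize(nullptr);

        IGraphBuilder* graph = nullptr;
        CoCreateInstance(CLSID_FilterGraph, nullptr, CLSCTX_INPROC_SERVER,
                         IID_IGraphBuilder, (void**)&graph);

        // Placeholder paths: one presentation file per language.
        graph->RenderFile(L"C:\\kiosk\\presentation_en.mp3", nullptr);
        graph->RenderFile(L"C:\\kiosk\\presentation_fr.mp3", nullptr);

        IMediaControl* control = nullptr;
        graph->QueryInterface(IID_IMediaControl, (void**)&control);

        control->Run();    // starts every file in the graph together
        Sleep(10000);      // let it play for ten seconds
        control->Stop();   // stops every file in the graph together

        control->Release();
        graph->Release();
        CoUninitialize();
        return 0;
    }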
For hardware, take a look at this (USB with 8 output channels).
I think with Shay's hardware you've got a complete solution:
Encode a 7.1 file with a different mono voice track on each channel.
Use the 8-channel output device in 7.1 mode, with a different headset in each port, and you've got it. Or, if you only have 6 languages, a 5.1 file would work. Many PCs have 5.1 outputs built in; you'd only need 3 splitters to break out the left and right channels from each jack.
You can do the encoding with Windows Media Encoder or another pro audio tool.
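The heart of that approach is just interleaving one mono track per language into a single multichannel stream. A rough sketch of that step is shown below (the track count and the equal-length assumption are illustrative; the actual 7.1 file would still be produced with Windows Media Encoder or another pro audio tool):

    // Sketch: interleave N mono language tracks into one N-channel PCM stream,
    // one language per channel. All tracks are assumed to share the same
    // sample rate and length (pad shorter ones with silence beforehand).
    #include <cstdint>
    #include <vector>

    std::vector<int16_t> interleaveLanguages(const std::vector<std::vector<int16_t>>& monoTracks)
    {
        const size_t channels = monoTracks.size();                 // e.g. 8 for a 7.1 file
        const size_t frames   = channels ? monoTracks[0].size() : 0;

        std::vector<int16_t> interleaved(channels * frames);
        for (size_t frame = 0; frame < frames; ++frame)
            for (size_t ch = 0; ch < channels; ++ch)
                interleaved[frame * channels + ch] = monoTracks[ch][frame];

        return interleaved;  // write out as an 8-channel WAV, then encode as usual
    }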