I'm converting an ESP32 project to a Raspberry Pi Zero. One of the project behaviors is to play back sound effects in response to specific events or triggers. I prefer the MP3 format so I can store information about the contents of each file in its ID3 tags, which makes the files themselves easier to manage (there are a lot of them!).
I can find examples of using any number of libraries to play MP3s in Python, and I found an example of selecting a device with 'sounddevice', but it seems to want NumPy arrays to play sound data.
I'm wondering what the easiest and quickest way is to play MP3 files (or should I switch to some other file format, with a data stub file alongside each one for my file management?).
Since these behaviors are played as responses, they need to at least start playback quickly (i.e. not wait for a format conversion to take place). And in some cases, other behaviors (such as voice-recognition triggers) are already going to add latency to the device's total response time.
EDIT: additional info
Quickest means light on the processor (a Pi Zero slows down quickly under heavy load).
These are real-time responses, so any lag from conversion defeats the purpose of the playback.
Also, the audio device from Seeed is configured as an ALSA (asound) device.
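One approach that fits these constraints is to pay the MP3 decode cost once at startup and keep the decoded audio in memory, so each trigger only hands a ready-made buffer to the sound card. Below is a minimal sketch of that idea, assuming pydub (which needs ffmpeg installed) and sounddevice are available; the device name "seeed" and the file paths are placeholders, not anything from the original setup.

import numpy as np
import sounddevice as sd
from pydub import AudioSegment

# placeholder ALSA device name; check sd.query_devices() for the real one
sd.default.device = "seeed"

def load_effect(path):
    # decode an MP3 once into a float32 NumPy array that sounddevice can play directly
    seg = AudioSegment.from_mp3(path)
    samples = np.array(seg.get_array_of_samples(), dtype=np.float32)
    samples /= float(1 << (8 * seg.sample_width - 1))   # scale to -1.0 .. 1.0
    if seg.channels > 1:
        samples = samples.reshape((-1, seg.channels))   # interleaved -> (frames, channels)
    return samples, seg.frame_rate

# decode everything up front (e.g. at boot), keyed by event name
effects = {"greeting": load_effect("sounds/greeting.mp3")}

def play(name):
    data, rate = effects[name]
    sd.play(data, rate)   # returns immediately; playback continues in the background

This trades RAM for latency, which is usually the right trade on a Pi Zero; if memory gets tight, another low-CPU option is to shell out to mpg123 pointed at the ALSA device and accept a small per-event decode cost.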
I have an idea that I have been working on, but there are some technical details that I would love to understand before I proceed.
From what I understand, Linux communicates with the underlying hardware through device files under /dev/. I was messing around with my webcam input to Zoom, and I found someone explaining that I need to create a virtual device and attach it to the output of another program called v4l2loopback.
My questions are
1- How does Zoom detect the webcams available for input? My /dev directory has two "files" named video (/dev/video0 and /dev/video1), yet Zoom only detects one webcam. Is the webcam communication done through these video files or not? If yes, why doesn't simply creating one affect Zoom's input choices? If not, how does Zoom detect the input and read the webcam feed?
2- Can I create a virtual device and write a kernel module for it that feeds the input from a local file? I have written a lot of kernel modules, and I know they have read, write, and release methods. I want to parse the video whenever a read request from Zoom is issued. How should the video be encoded? Is it MP4, a raw format, or something else? How fast should I be sending input (in terms of kilobytes per second)? I think it is a function of my webcam's recording specs: if it is 1920x1080, each pixel is 3 bytes (RGB), and it is recording at 20 fps, I can simply calculate how many bytes are generated per second (see the rough check below). But how does Zoom expect the input to be fed into it? Assuming that it is sending the stream in real time, it should be reading input every few milliseconds. How do I get access to such information?
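For a rough sense of scale: uncompressed 1920x1080 RGB at 20 fps works out to roughly 124 MB/s, which is one reason real webcams and loopback devices usually deliver a packed YUV format (about 2 bytes per pixel) or compressed MJPEG rather than raw RGB.

# rough bandwidth estimate for the uncompressed case described above
width, height, fps = 1920, 1080, 20

rgb24_frame = width * height * 3   # 6,220,800 bytes per frame
yuyv_frame = width * height * 2    # 4,147,200 bytes per frame (common packed-YUV V4L2 format)

print("RGB24: %.1f MB/s" % (rgb24_frame * fps / 1e6))   # ~124.4 MB/s
print("YUYV:  %.1f MB/s" % (yuyv_frame * fps / 1e6))    # ~82.9 MB/s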
Thank you in advance. This is a learning experiment, I am just trying to do something fun that I am motivated to do, while learning more about Linux-hardware communication. I am still a beginner, so please go easy on me.
Apparently, there are two types of /dev/video* files: one for the metadata, and the other for the actual stream from the webcam. Creating a virtual device of the same type as the stream in the /dev directory did result in Zoom recognizing it as an independent webcam, even without creating its metadata file. I finally achieved what I wanted, but I used the OBS Studio virtual camera feature that was added in update 26.0.1, and it is working perfectly so far.
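For anyone who wants the do-it-yourself version instead of OBS: a regular userspace program can push frames into a v4l2loopback device without writing a kernel module. A small sketch of the idea, assuming the v4l2loopback module is loaded and the third-party pyvirtualcam package is installed; the frame content here is just a placeholder.

import numpy as np
import pyvirtualcam

# pyvirtualcam opens the first v4l2loopback device it finds (e.g. /dev/video2);
# Zoom then lists it as just another webcam
with pyvirtualcam.Camera(width=1280, height=720, fps=20) as cam:
    print("virtual camera:", cam.device)
    frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # RGB frame buffer
    while True:
        frame[:] = np.random.randint(0, 256)           # placeholder; put decoded video frames here
        cam.send(frame)
        cam.sleep_until_next_frame()                   # paces output to the declared fps

In a real setup you would decode the source file with something like OpenCV or GStreamer and pass each decoded frame to cam.send(); the loopback device takes care of presenting the stream to Zoom as an ordinary capture device.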
I have a Windows Phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example reverb or echo, etc.
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio-processing code into the audio pipeline, even though I've read a lot about WASAPI and XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
"I need to find a point at which there is an audio buffer containing raw PCM data"
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the MP3 will be decoded on the CPU by your managed code, you can interfere / process it however you like, and then feed the PCM into the media stream source. The main downside is battery life: the hardware-provided codecs should be much more power-efficient.
I have a program written in C++ that uses RtAudio (DirectSound) to capture and play back audio at a 48 kHz sample rate.
The input capture uses a callback option. The callback writes data to a ringbuffer.
The output is a blocking write function in a separate thread that reads from the ringbuffer.
If the input and output devices are the same, the audio loops through perfectly.
Now I want to get audio from device 1 and play it back on device 2. Each device has its own sample clock set to 48 kHz, but they are not in sync. After a couple of seconds the input and output are out of sync.
Is it possible to sync two independent audio devices?
There are two challenges you face:
1. getting the two devices to start at the same time.
2. getting the two devices to stay in sync.
Both of these tasks are difficult. In the pro audio world, #2 is accomplished with special hardware to sync the word-clocks of multiple devices. It can also be done with a high-quality video signal. I believe it can also be done with FireWire devices, but I'm not sure how that works. In practice, I have used devices with no sync ("wild") and gotten very reasonable sync for up to an hour or two. Depending on what you are trying to do, the sync should not drift more than a few milliseconds over the course of a few minutes. If it does, you can consider your hardware broken (of course, cheap hardware is often broken).
As for #1, I'm not sure this is possible in any reliable sense with DirectSound. To the extent that it's possible with any audio API, it is difficult at best: both cards have streams that require some time to set up, open and start playing. In general, the solution is to use an API where this time is super low (ASIO, for example). This works reasonably well for applications like video, but I don't know if it really solves the problem in general.
If you really need to solve this problem, you could open both cards, start playing silence, and use the timing information generated by the cards to establish the delay between putting data into a card and its eventual playback (this will be different for each card, and probably different each time you run), then use that data to calculate when to start actual playback. I don't know if RtAudio supplies the necessary timing information, but PortAudio does. This document may help.
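To illustrate that timing idea in code (using the Python sounddevice bindings to PortAudio rather than RtAudio, and with made-up device names): open both outputs, read the output latency each stream reports, and pad the stream that would otherwise play sooner with silence.

import numpy as np
import sounddevice as sd

FS = 48000
out_a = sd.OutputStream(samplerate=FS, channels=2, device="Card A")   # placeholder device names
out_b = sd.OutputStream(samplerate=FS, channels=2, device="Card B")
out_a.start()
out_b.start()

# PortAudio reports an expected output latency per stream, in seconds
lat_a, lat_b = out_a.latency, out_b.latency
skew = lat_a - lat_b
print("reported latency: %.1f ms vs %.1f ms, skew %.1f ms" % (lat_a * 1e3, lat_b * 1e3, skew * 1e3))

# pad whichever stream would play sooner with silence so the first samples line up
pad = np.zeros((int(abs(skew) * FS), 2), dtype="float32")
if skew > 0:
    out_b.write(pad)   # device B reports less latency, so delay it
else:
    out_a.write(pad)

This only lines up the start of playback; the gradual drift described in the question comes from the two sample clocks running at slightly different rates, and fixing that still requires adaptive resampling of one stream to match the other (or shared hardware clocking).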
I'm using MIDI files for background sound in my game. I'm creating and playing the sound as follows:
// assumes java.io.InputStream and javax.microedition.media.{Manager, Player} are imported
InputStream is = this.getClass().getResourceAsStream("/sound/bg.mid");
Player IngameSound = Manager.createPlayer(is, "audio/midi");
IngameSound.setLoopCount(-1);   // loop indefinitely
IngameSound.start();
Using this code, gameplay is slow. If a WAV sound file is used instead, gameplay is fine. How can I make gameplay smooth while using MIDI files?
Sound performance via J2ME is often highly dependent on the device you are using, so what works well on one will often be nearly unusable on another.
However, one thing you can try to do is pre-load and/or prefetch all of your sounds prior to needing to play them (usually during the loading animation for a level), store all your players in an array and just tell them to start/stop/reset when you need to manipulate them. In the past I often found that the biggest performance hit with sound was the initial request to access a hardware resource, so anything you can do to perform all hardware requests as early as possible is usually beneficial.
I am trying to write an application (I'm a GUI first-timer) for my son, who has autism. There is a video player in the top half and a text entry area in the bottom. When letters are typed, sounds are produced to mimic the words in the video.
There have been other posts on this site about playing sounds on key presses, using GStreamer via a system call. I have also tried libcanberra, but both seem to have significant delays between sounds. I can write the app in Python or C, but will likely do at least some of it in C.
I also want to mention that the video portion is being played by GStreamer. I tried to create two instances of GStreamer to avoid expensive system calls, but the audio instance seemed to kill the app when called.
If anyone has any tips on creating faster responding sounds I would really appreciate it.
You can upload a raw audio sample directly to PulseAudio, so there will be no decoding (and perhaps fewer context switches), by using the following function from libcanberra:
http://developer.gnome.org/libcanberra/unstable/libcanberra-canberra.html#ca-context-cache
The next ca_context_play() will use it.
However, the biggest problem you'll encounter with this scenario (with simultaneous video playback) is that the audio device might be configured with large latency in PulseAudio (up to 1/2 s or more for normal playback). It may be reasonable to file a bug against libcanberra to support a LOW_LATENCY flag, as it currently doesn't attempt to minimize delay for sound events, AFAIK. That would be great to have.
GStreamer pulsesink could probably get low latency too (it has some properties for that), but I am afraid it won't be as lightweight as libcanberra, and you won't be able to cache a sample for instance. Ideally, GStreamer could also learn to cache samples, or pre-fill PulseAudio...
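As a rough illustration of the pulsesink route (using the GStreamer 1.0 Python bindings; the buffer-time/latency-time values and the file path are just starting points to tune, not known-good settings): keep a tiny effect pipeline pre-rolled, then rewind and restart it on each key press instead of spawning anything new.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# small pipeline for one sound effect; buffer-time / latency-time are in microseconds
pipeline = Gst.parse_launch(
    "filesrc location=sounds/a.wav ! wavparse ! audioconvert ! audioresample ! "
    "pulsesink buffer-time=20000 latency-time=10000"
)
pipeline.set_state(Gst.State.PAUSED)   # pre-roll once so the device is already open

def play_effect():
    # flush-seek back to the start and let it run; far cheaper than a system() call per key press
    pipeline.seek_simple(Gst.Format.TIME, Gst.SeekFlags.FLUSH, 0)
    pipeline.set_state(Gst.State.PLAYING)

Whether this beats a cached libcanberra sample depends on how PulseAudio is configured on the box, but it keeps the decoding and device setup out of the key-press path, which is where most of the perceived lag tends to come from.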