Streaming/Sending audio files with UNet - audio

I have a small app where one user can talk to the other. It's pretty much a walkie-talkie but one sided(Meaning only one can speak and the other can hear). I have integrated UNet with this app(I need to do it over the LAN and also I have other info to send through the network).
With the other messages(they are simple strings) everything works great, but when I try to send an AudioClip it will be null on the other side. I guess this happens because the AudioClip is not serializable or something similar. So I found SyncListFloat and I add all the audio data to this one(this kinda takes a lot of time which is sort of problematic, I think this happens because the message sending after each Add) but then the Player Prefab(which I use to send data) is no longer added to the scene. If I remove the SyncListFloat variable from the Player script everything works great(except for the audio sending part, obviously).
I would like to know how can I send and audio files through UNet and if SyncListFloat is the answer how can I add the whole byte array in one go.

Related

Sending a webcam input to zoom using a recorded clip

I have an idea that I have been working on, but there are some technical details that I would love to understand before I proceed.
From what I understand, Linux communicates with the underlying hardware through the /dev/. I was messing around with my video cam input to zoom and I found someone explaining that I need to create a virtual device and mount it to the output of another program called v4loop.
My questions are
1- How does Zoom detect the webcams available for input. My /dev directory has 2 "files" called video (/dev/video0 and /dev/video1), yet zoom only detects one webcam. Is the webcam communication done through this video file or not? If yes, why does simply creating one doesn't affect Zoom input choices. If not, how does zoom detect the input and read the webcam feed?
2- can I create a virtual device and write a kernel module for it that feeds the input from a local file. I have written a lot of kernel modules, and I know they have a read, write, release methods. I want to parse the video whenever a read request from zoom is issued. How should the video be encoded? Is it an mp4 or a raw format or something else? How fast should I be sending input (in terms of kilobytes). I think it is a function of my webcam recording specs. If it is 1920x1080, and each pixel is 3 bytes (RGB), and it is recording at 20 fps, I can simply calculate how many bytes are generated per second, but how does Zoom expect the input to be Fed into it. Assuming that it is sending the strean in real time, then it should be reading input every few milliseconds. How do I get access to such information?
Thank you in advance. This is a learning experiment, I am just trying to do something fun that I am motivated to do, while learning more about Linux-hardware communication. I am still a beginner, so please go easy on me.
Apparently, there are two types of /dev/video* files. One for the metadata and the other is for the actual stream from the webcam. Creating a virtual device of the same type as the stream in the /dev directory did result in Zoom recognizing it as an independent webcam, even without creating its metadata file. I did finally achieve what I wanted, but I used OBS Studio virtual camera feature that was added after update 26.0.1, and it is working perfectly so far.

Icecast: I have strange behaviour with repeats of end of tracks, as well as pitch changes from my Icecast Server

I only began using icecast a few days ago, so if I stuffed something up somewhere, please let me know.
I have a weird problem with icecast. Everytime a track is "finished" on icecast, a section of the end of the currently playing track (i think 64kbs of the track) is repeated about 2 to 3 times before the next song plays, but the next song doesn't begin playing in the start, but a few seconds of the way through. Also, I can notice that the playback speed (and hence the pitch) sometimes differs from the original as well.
I consulted this post and this post that was quoted below which taught me what the <burst-on-connect> and the <burst-size> tags are used for. It also taught me this:
What's happening here is that nothing is being added to the buffer, so clients connect, get the contents of that buffer, and then the stream ends. The client must be re-connecting repeatedly, and it keeps getting that same buffer.
Cheers to Brad for that post. A solution to this problem was provided in a comments section of that post and it said to decrease the <source-timeout> of the icecast server, so that it will close the connection quicker and stop any repeating. But this is assuming I want to close the mountpoint, and I dont, because what I am using Icecast for is actually a 24/7 radio player. If I did close my mountpoint, then what happens is VLC just turns off and doesn't repeatedly attempt to connect anymore. Unless this is wrong. I don't know.
I use VLC to hear the playback of the icecast streams and I use nodeshout which is a bunch of bindings from libshout built for node.js. I use nodeshout to send data to a bunch of mounts on my icecast server. In the future I plan to make a site that will listen to the icecast streams, meaning it will replace VLC.
icecast.xml
<limits>
<clients>100</clients>
<sources>4</sources>
<queue-size>1008576</queue-size>
<client-timeout>30</client-timeout>
<header-timeout>15</header-timeout>
<source-timeout>30</source-timeout>
<burst-on-connect>1</burst-on-connect>
<burst-size>252144</burst-size>
</limits>
This is a summary of the audio sending code on my node.js server.
nodejs
// these lines of code is a smaller part of a function, and this sets all the information. The variables name, description etc come from the arguments of the function
var nodeshout = require("nodeshout");
let shout = nodeshout.create();
shout.setHost('localhost');
shout.setPort(8000);
shout.setUser('source');
shout.setPassword(process.env.icecastPassword); //password in .env file
shout.setName(name);
shout.setDescription(description);
shout.setMount(mount);
shout.setGenre(genre);
shout.setFormat(1); // 0=ogg, 1=mp3
shout.setAudioInfo('bitrate', '128');
shout.setAudioInfo('samplerate', '44100');
shout.setAudioInfo('channels', '2');
return shout
// now meanwhile somewhere lower in the file, there is this summary of how the audio is sent to the icecast server
var nodeshout = require("nodeshout")
var {FileReadStream, ShoutStream} = require("nodeshout") //here is where the FileReadStream and ShoutStream functions come from
const filecontent = new FileReadStream(pathToSong, 65536); //if I change the 65536 to a higher value, then more bytes are being repeated at the end of the track. If I decrease this, it starts sounding buggy and off.
var streamcontent = filecontent.pipe(new ShoutStream(shoutstream))
streamcontent.on('finish', () => {
next()
console.log("Track has finished on " + stream.name + ": " + chosenTrack)
})
I also notice weirder behaviour. After the previous song had it's last chunk repeated a few times, that's when the server calls the streamcontent.on('finish') event that is located in the nodejs script, and only then does it warn me that the track is finished.
What I have tried
I tried messing around with the <source-timeout> tag, the number of bytes (or bits im not sure) that are being sent on nodejs, the burst size, I also tried turning bursting off completely but it results in super strange behavior.
I also thought creating a new stream every time per song was a bad idea as seen in new ShoutStream(shoutstream) when piping the file data, but using the same stream meant that the program would return an error because it would write the next track to the shoutstream after it had said it had closed.
If any more information is necessary to figure out what is going on, I can provide it. Thanks for your time.
Edit: I would like to add: Do you think I should manually control how many bytes are sent to icecast and then use the same stream object instead of calling a new one every time?
I found out why the stream didn't play some tracks as opposed to others.
How I got there
I could not switch to ogg/vorbis or ogg/opus for my stream, so I had to do something with my source client. I double checked everything was correct and that my audio files were in the correct bitrate. When i ran the ffprobe tool with ffprobe audio.mp3 sometimes the bitrates did not adhere to the typical rates of 120kbps, 128kbps, 192, 312, etc etc so on. It was always some strange value such as 129852 just to provide an example.
I then downloaded the checkmate mp3 checker here and checked my audio files, and they were all encoded in a variable bitrate!!! VBR damnit!
TLDR
I fixed my problem by re-encoding all my tracks to a constant bitrate of 128kbps using ffmpeg.
Quick Edit: I am pretty sure that programs such as Darkice might already support variable bit rate transfers to Icecast servers, but it would be impractical for me to use darkice, hence why I stuck with nodeshout.

How to simultaneously play two PCM audio streams in one DMA

I am working on a project on the stm32f769i-disco board, which has to run a video game made by me. The graphical part is solved, where I have problems is when I need to play two or more wav files, I do not know which is the correct way to add them in a single stream.
What is the algorithm that I must follow for more than one stream?
PD: I know manage a single stream which is played through dma with double buffer.

low latency sounds on key presses

I am trying to write an application(I'm a gui first timer) for my son, he has autism. There is a video player in the top half and a text entry area in the bottom. When letters are typed sounds are produced to mimic the words in the video.
There have been other posts on this site in regard to playing sounds on key presses, using gstreamer as a system call. I have also tried libcanberra but both seem to have significant delays between sounds. I can write the app in python or C but will likely do at least some of it in C.
I also want to mention that the video portion is being played by gstreamer. I tried to create two instances of gstreamer, to avoid expensive system calls but the audio instance seemed to kill the app when called.
If anyone has any tips on creating faster responding sounds I would really appreciate it.
You can upload a raw audio sample directly to PulseAudio so there will be no decoding and (perhaps save) extra switches by using the following function from Canberra:
http://developer.gnome.org/libcanberra/unstable/libcanberra-canberra.html#ca-context-cache
The next ca_context_play() will use it.
However, the biggest problem you'll encounter with this scenario (with simultaneous video playback) is that the audio device might be configured with large latency with PulseAudio (up to 1/2s or more for normal playback). It may be reasonable to file a bug to libcanberra to support a LOW_LATENCY flag, as it currently doesn't attempt to minimize delay for sound events afaik. That would be great to have.
GStreamer pulsesink could probably get low latency too (it has some properties for that), but I am afraid it won't be as lightweight as libcanberra, and you won't be able to cache a sample for instance. Ideally, GStreamer could also learn to cache samples, or pre-fill PulseAudio...

How does youtube support starting playback from any part of the video?

Basically I'm trying to replicate YouTube's ability to begin video playback from any part of hosted movie. So if you have a 60 minute video, a user could skip straight to the 30 minute mark without streaming the first 30 minutes of video. Does anyone have an idea how YouTube accomplishes this?
Well the player opens the HTTP resource like normal. When you hit the seek bar, the player requests a different portion of the file.
It passes a header like this:
RANGE: bytes-unit = 10001\n\n
and the server serves the resource from that byte range. Depending on the codec it will need to read until it gets to a sync frame to begin playback
Video is a series of frames, played at a frame rate. That said, there are some rules about the order of what frames can be decoded.
Essentially, you have reference frames (called I-Frames) and you have modification frames (class P-Frames and B-Frames)... It is generally true that a properly configured decoder will be able to join a stream on any I-Frame (that is, start decoding), but not on P and B frames... So, when the user drags the slider, you're going to need to find the closest I frame and decode that...
This may of course be hidden under the hood of Flash for you, but that is what it will be doing...
I don't know how YouTube does it, but if you're looking to replicate the functionality, check out Annodex. It's an open standard that is based on Ogg Theora, but with an extra XML metadata stream.
Annodex allows you to have links to named sections within the video or temporal URIs to specific times in the video. Using libannodex, the server can seek to the relevant part of the video and start serving it from there.
If I were to guess, it would be some sort of selective data retrieval, like the Range header in HTTP. that might even be what they use. You can find more about it here.

Resources