I have an external audio source that transmits audio data to my computer's sound card via S/PDIF. The sound card has an S/PDIF input, and with "arecord" or "Audacity" I can record from this input without any problems.
The audio source offers the data at different sample rates (32 kHz, 44.1 kHz, 48 kHz) which I cannot influence, and I can't tell from the source itself which sample rate it has selected.
When recording, I would very much like to keep that sample rate rather than have it converted (apparently by the sound card).
Now, finally, my question: can I somehow detect under Linux which format and which parameters the S/PDIF stream is encoded with?
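For illustration only: assuming the driver exposes the IEC 60958 channel-status bits through an ALSA control named "IEC958 Capture Default" (many cards do, but not all; the iecset tool from alsa-utils reads the same bits), the sample-rate field in status byte 3 can be decoded with the ALSA control API. A minimal C sketch:

```c
/* Sketch: read the IEC958 (S/PDIF) channel-status bytes via ALSA and
 * decode the sample-rate field.  Assumes a control named
 * "IEC958 Capture Default" on card hw:0 -- this is driver-specific and
 * may be absent or named differently on your hardware.
 * Build: gcc spdif_status.c -lasound */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_ctl_t *ctl;
    snd_ctl_elem_id_t *id;
    snd_ctl_elem_value_t *val;
    snd_aes_iec958_t iec958;

    if (snd_ctl_open(&ctl, "hw:0", 0) < 0) {
        fprintf(stderr, "cannot open hw:0\n");
        return 1;
    }

    snd_ctl_elem_id_alloca(&id);
    snd_ctl_elem_value_alloca(&val);
    snd_ctl_elem_id_set_interface(id, SND_CTL_ELEM_IFACE_PCM);
    snd_ctl_elem_id_set_name(id, "IEC958 Capture Default");
    snd_ctl_elem_value_set_id(val, id);

    if (snd_ctl_elem_read(ctl, val) < 0) {
        fprintf(stderr, "no such control on this card\n");
        snd_ctl_close(ctl);
        return 1;
    }
    snd_ctl_elem_value_get_iec958(val, &iec958);

    /* Consumer format: the low nibble of status byte 3 encodes the rate. */
    switch (iec958.status[3] & IEC958_AES3_CON_FS) {
    case IEC958_AES3_CON_FS_32000: printf("32 kHz\n");   break;
    case IEC958_AES3_CON_FS_44100: printf("44.1 kHz\n"); break;
    case IEC958_AES3_CON_FS_48000: printf("48 kHz\n");   break;
    default:                       printf("not indicated\n");
    }
    snd_ctl_close(ctl);
    return 0;
}
```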
Related
I am working on an audio algorithm that needs 256 samples of audio from a microphone; I need to process these 256 samples, and the result should be played on a speaker. I have done it using two wave files that already exist on disk; now I need to do it in real time.
I need a solution for this.
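As a rough starting point, here is a minimal capture -> process -> playback loop using ALSA, assuming 16-bit signed mono; the 16 kHz rate, the "default" devices and process_block() are placeholders to adapt:

```c
/* Sketch: read 256 samples from the microphone, process them, play the
 * result.  Assumes 16-bit signed mono on the "default" ALSA devices.
 * Build: gcc loop.c -lasound */
#include <alsa/asoundlib.h>

#define FRAMES 256

static void process_block(short *buf, int n)
{
    (void)buf; (void)n;                      /* placeholder for the algorithm */
}

int main(void)
{
    snd_pcm_t *cap, *play;
    short buf[FRAMES];

    snd_pcm_open(&cap,  "default", SND_PCM_STREAM_CAPTURE,  0);
    snd_pcm_open(&play, "default", SND_PCM_STREAM_PLAYBACK, 0);
    snd_pcm_set_params(cap,  SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                       1, 16000, 1, 50000);  /* 1 channel, 16 kHz, 50 ms latency */
    snd_pcm_set_params(play, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                       1, 16000, 1, 50000);

    for (;;) {
        snd_pcm_sframes_t n = snd_pcm_readi(cap, buf, FRAMES);
        if (n < 0) n = snd_pcm_recover(cap, (int)n, 0);  /* recover from overruns */
        if (n <= 0) continue;
        process_block(buf, (int)n);
        if (snd_pcm_writei(play, buf, n) < 0)
            snd_pcm_prepare(play);                       /* recover from underruns */
    }
    return 0;
}
```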
I have a scenario where some audio is being received over the internet. The audio itself has a lot of noise, which needs to be filtered out. The received audio is raw 16-bit PCM.
Tools like Audacity can remove noise, but they build a noise profile and then remove the noise from part of the file or from the whole file. I want to instead remove noise from the audio as it comes in and gets written to a buffer, so that once all the audio has been received and written to the buffer, noise reduction is already complete and the audio can be played out. Each packet from the network carries around 1 KB of audio, and the total audio is around 1 MB.
The audio contains a conversation between two people, so I want to keep the audio within the voice range (80-255 Hz, per the comments).
I want to ask if anyone knows of any algorithm that can achieve this.
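Not a complete noise-reduction algorithm, but a sketch of the streaming side of one simple approach: a band-pass built from two one-pole filters whose state is carried across packets, so each ~1 KB packet can be filtered as soon as it is written to the buffer. The sample rate and corner frequencies passed to band_filter_init(), and the names themselves, are assumptions to adjust:

```c
/* Sketch: filter incoming 16-bit PCM packets as they arrive, keeping the
 * filter state between packets so the stream is processed incrementally. */
#include <math.h>
#include <stdint.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

typedef struct {
    double hp_prev_in, hp_prev_out;   /* one-pole high-pass state */
    double lp_prev_out;               /* one-pole low-pass state  */
    double hp_a, lp_a;                /* filter coefficients      */
} band_filter;

void band_filter_init(band_filter *f, double fs, double lo_hz, double hi_hz)
{
    double rc_hp = 1.0 / (2.0 * M_PI * lo_hz);
    double rc_lp = 1.0 / (2.0 * M_PI * hi_hz);
    double dt = 1.0 / fs;
    f->hp_a = rc_hp / (rc_hp + dt);
    f->lp_a = dt / (rc_lp + dt);
    f->hp_prev_in = f->hp_prev_out = f->lp_prev_out = 0.0;
}

/* Call once per received packet, in arrival order. */
void filter_packet(band_filter *f, int16_t *samples, int count)
{
    for (int i = 0; i < count; i++) {
        double x = samples[i];
        /* high-pass: removes rumble below the lower corner */
        double hp = f->hp_a * (f->hp_prev_out + x - f->hp_prev_in);
        f->hp_prev_in = x;
        f->hp_prev_out = hp;
        /* low-pass: removes hiss above the upper corner */
        double lp = f->lp_prev_out + f->lp_a * (hp - f->lp_prev_out);
        f->lp_prev_out = lp;
        if (lp > 32767.0) lp = 32767.0;
        if (lp < -32768.0) lp = -32768.0;
        samples[i] = (int16_t)lp;
    }
}
```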
I use ffmpeg to convert videos from one format to another.
Is bitrate the only parameter which decides the output size of a video/audio file?
Yes, bitrate is essentially what will control the file size (for a given playback duration). It is the number of bits used to represent each second of material.
However, there are some subtleties, e.g.:
a video file encoded at a certain video bitrate probably contains a separate audio stream, with a separately-specified bitrate
most file formats will contain some metadata that won't be counted towards the basic video stream bitrate
sometimes the encoder will not actually aim to achieve a specified bitrate at all - for example, when using CRF (constant rate factor) mode. http://trac.ffmpeg.org/wiki/x264EncodingGuide explains why two-pass encoding is preferred when targeting a specific file size.
So you may want to do a little experimenting with a particular set of options for a particular file format.
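To make the basic relationship concrete (ignoring container overhead, metadata and the CRF/two-pass subtleties above), the expected size is roughly (video bitrate + audio bitrate) x duration. A tiny sketch of that back-of-the-envelope calculation, with made-up numbers:

```c
/* Sketch: estimate output file size from bitrates and duration.
 * Ignores container/metadata overhead; the numbers are arbitrary examples. */
#include <stdio.h>

int main(void)
{
    double video_kbps = 1000.0;      /* e.g. -b:v 1000k */
    double audio_kbps = 128.0;       /* e.g. -b:a 128k  */
    double duration_s = 10 * 60;     /* a 10-minute clip */

    double total_kbits = (video_kbps + audio_kbps) * duration_s;
    double size_mib = total_kbits * 1000.0 / 8.0 / (1024.0 * 1024.0);

    printf("approx. size: %.1f MiB\n", size_mib);   /* about 80.7 MiB */
    return 0;
}
```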
Bitrate is a rough indicator of the quality of an audio or video file.
For example, an MP3 audio file that is compressed at 192 Kbps will have a greater dynamic range and may sound slightly clearer than the same audio file compressed at 128 Kbps. This is because more bits are used to represent the audio data for each second of playback.
Similarly, a video file that is compressed at 3000 Kbps will look better than the same file compressed at 1000 Kbps. Just as the quality of an image is often judged by its resolution, the quality of an audio or video file is often judged by its bitrate.
I asked earlier about H264 at RTP H.264 Packet Depacketizer
My question now is about the audio packets.
I noticed in the RTP packets that audio frames such as AAC, G.711, G.726 and others all have the marker bit set.
I think the frames are independent. Am I right?
My question is: audio frames are small, but I know that I can have more than one frame per RTP packet. Regardless of how many frames there are, are they always complete? Or can a frame be fragmented across RTP packets?
The difference between audio and video is that audio is typically encoded either as individual samples or in certain [small] frames without reference to previous data. Additionally, the amount of data is small. So audio does not typically need complicated fragmentation to be transmitted over RTP. However, for any payload type you should again refer to the RFC that describes the details:
AAC - RTP Payload Format for MPEG-4 Audio/Visual Streams
G.711 - RTP Payload Format for ITU-T Recommendation G.711.1
G.726 - RTP Profile for Audio and Video Conferences with Minimal Control
Other payload types are covered by their own payload-format RFCs.
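For illustration, here is a sketch of what the receiving side can look like for a simple payload such as G.711, where each RTP packet's payload is complete and independently decodable: parse the fixed header per RFC 3550, check the marker bit, and hand the payload straight to the decoder. decode_audio() is a placeholder:

```c
/* Sketch: minimal RTP header parse (RFC 3550) to inspect the marker bit
 * and pass the payload to a decoder.  Padding and extension headers are
 * not handled. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static void decode_audio(const uint8_t *payload, size_t len)
{
    (void)payload; (void)len;            /* placeholder for the codec */
}

int parse_rtp(const uint8_t *pkt, size_t len)
{
    if (len < 12) return -1;             /* fixed RTP header is 12 bytes */

    uint8_t  version = pkt[0] >> 6;
    uint8_t  cc      = pkt[0] & 0x0F;    /* CSRC count */
    int      marker  = (pkt[1] & 0x80) != 0;
    uint8_t  pt      = pkt[1] & 0x7F;    /* payload type */
    uint16_t seq     = (uint16_t)((pkt[2] << 8) | pkt[3]);
    uint32_t ts      = ((uint32_t)pkt[4] << 24) | ((uint32_t)pkt[5] << 16) |
                       ((uint32_t)pkt[6] << 8)  |  (uint32_t)pkt[7];

    if (version != 2) return -1;

    size_t header_len = 12 + 4u * cc;    /* skip the CSRC list, if any */
    if (len < header_len) return -1;

    printf("pt=%u seq=%u ts=%u marker=%d payload=%zu bytes\n",
           pt, seq, ts, marker, len - header_len);

    /* For G.711 (payload types 0/8) the payload is raw samples: complete,
     * independently decodable, nothing to reassemble. */
    decode_audio(pkt + header_len, len - header_len);
    return 0;
}
```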
I might be asking the wrong question, but my knowledge in this area is very limited.
I'm using acmStreamConvert to convert PCM to GSM (6.10).
Audio format: 8 kHz, 16-bit, mono
For the PCM buffer size I'm using 640 bytes (320 samples). For GSM buffer I'm using 65 bytes. My understanding is that GSM "always" converts 320 samples to 65 bytes.
The reason I ask "block or stream" is that I'm wondering whether I can safely convert multiple audio streams (in real time) using the same acmStreamConvert handle. I see the function has flags ACM_STREAMCONVERTF_START, ACM_STREAMCONVERTF_END and ACM_STREAMCONVERTF_BLOCKALIGN, but am I required to use this start/end sequence for GSM? I understand that might be required for some formats that use headers/trailers, but I'm hoping this isn't required for the GSM format.
I'm working on a group VoIP client where each client sends GSM-format audio, which then needs to be converted to PCM before playing. I'm hoping I don't need one ACM handle per client.
Stream-based, or at least the ACM API's usage of it is. Trying to use the same ACM objects/handles for multiple streams will produce undesired results. I suspect this also means it doesn't handle lost packets as well as other codecs might (I haven't confirmed that part yet).
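To illustrate the "one handle per stream" point, here is a rough sketch of opening a separate GSM 6.10 -> PCM conversion stream per client with the ACM API; the per-client wrapper and its names are only illustrative, not a confirmed implementation:

```c
/* Sketch: one ACM GSM 6.10 -> PCM conversion stream per VOIP client,
 * rather than a single shared handle.  Error handling is trimmed and the
 * client_decoder wrapper is made up.  Link with msacm32.lib. */
#include <windows.h>
#include <mmreg.h>
#include <msacm.h>

typedef struct {
    HACMSTREAM has;                       /* per-client conversion handle */
} client_decoder;

static BOOL client_decoder_open(client_decoder *cd)
{
    WAVEFORMATEX pcm = {0};
    pcm.wFormatTag      = WAVE_FORMAT_PCM;
    pcm.nChannels       = 1;
    pcm.nSamplesPerSec  = 8000;
    pcm.wBitsPerSample  = 16;
    pcm.nBlockAlign     = 2;              /* 16-bit mono */
    pcm.nAvgBytesPerSec = 16000;

    GSM610WAVEFORMAT gsm = {0};
    gsm.wfx.wFormatTag      = WAVE_FORMAT_GSM610;
    gsm.wfx.nChannels       = 1;
    gsm.wfx.nSamplesPerSec  = 8000;
    gsm.wfx.nBlockAlign     = 65;         /* 320 samples -> 65 bytes */
    gsm.wfx.nAvgBytesPerSec = 1625;
    gsm.wfx.cbSize          = sizeof(WORD);
    gsm.wSamplesPerBlock    = 320;

    /* GSM -> PCM; each client gets its own stream handle. */
    return acmStreamOpen(&cd->has, NULL, &gsm.wfx, &pcm, NULL, 0, 0, 0)
           == MMSYSERR_NOERROR;
}

/* Convert one 65-byte GSM block from this client into 640 bytes of PCM. */
static BOOL client_decoder_convert(client_decoder *cd,
                                   BYTE *gsm_block, BYTE *pcm_out)
{
    ACMSTREAMHEADER hdr = {0};
    hdr.cbStruct    = sizeof(hdr);
    hdr.pbSrc       = gsm_block;
    hdr.cbSrcLength = 65;
    hdr.pbDst       = pcm_out;
    hdr.cbDstLength = 640;

    if (acmStreamPrepareHeader(cd->has, &hdr, 0) != MMSYSERR_NOERROR)
        return FALSE;
    MMRESULT mr = acmStreamConvert(cd->has, &hdr, ACM_STREAMCONVERTF_BLOCKALIGN);
    acmStreamUnprepareHeader(cd->has, &hdr, 0);
    return mr == MMSYSERR_NOERROR;
}
```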