Audio formats supported by DICOM

What audio formats are supported by DICOM, according to the DICOM Standard?

DICOM Supplement 30 added support for waveforms to the DICOM Standard, including audio waveforms.
In addition, DICOM Supplement 42 added support for MPEG2 video, which allows MP3 audio data to be encoded within the video stream.

Just for completeness, since the question is a little older:
Supplement 30 indeed added support for audio with the Basic Voice(!) Audio object. However, it only supports an 8000 Hz sampling frequency.
DICOM therefore later introduced the General Audio Waveform object, which permits up to 44,100 Hz.
Both objects are restricted to two audio channels, i.e. they permit mono or stereo recordings. You can find information about them in Part 3 of the DICOM Standard.
Furthermore, DICOM supports not only MPEG2 but also MPEG4 video. Look into DICOM Part 5 for further information.
Michael
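
To make those constraints concrete, here is a minimal sketch (not from the answers above) of building an audio waveform dataset with the pydicom library. The attribute names come from the Waveform module in PS3.3; the patient/study attributes a real object needs are omitted, and the audio content is placeholder silence.

```python
import numpy as np
from pydicom.dataset import Dataset

# Hypothetical stereo recording: 1 second of 16-bit audio at 44,100 Hz,
# within the General Audio Waveform limits (<= 44,100 Hz, <= 2 channels).
# Basic Voice Audio would require 8000 Hz instead.
samples = np.zeros((44100, 2), dtype="<i2")  # silence, for illustration

ds = Dataset()
ds.SOPClassUID = "1.2.840.10008.5.1.4.1.1.9.4.2"  # General Audio Waveform Storage
ds.Modality = "AU"

item = Dataset()
item.WaveformOriginality = "ORIGINAL"
item.NumberOfWaveformChannels = 2         # mono or stereo only
item.SamplingFrequency = 44100            # 8000 for Basic Voice Audio
item.NumberOfWaveformSamples = samples.shape[0]
item.WaveformBitsAllocated = 16
item.WaveformSampleInterpretation = "SS"  # signed 16-bit samples
item.WaveformData = samples.tobytes()     # interleaved channel samples
ds.WaveformSequence = [item]
```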

Related

does converting from mulaw to linear impact audio quality?

I want to change audio encoding from mulaw to linear in order to use a linear speech recognition model from Google.
I'm using a telephony channel, so audio is encoded in mulaw, 8bits, 8000Hz.
When I use Google's mulaw model, there are some issues recognizing short single words: basically they are not recognized at all, and the API returns None.
I was wondering if it is good practice to change the encoding to Linear or FLAC?
I already did it, but I cannot really measure the degree of improvement.
It is always best practice to use either LINEAR16 for headerless audio data or FLAC for headered audio data; both are lossless codecs. It is good practice to set the sampling rate to 16000 Hz; otherwise, set sample_rate_hertz to match the native sample rate of the audio source (instead of re-sampling). Since the Google Speech-to-Text API provides various ways to improve audio quality, you can use word-level confidence to measure the accuracy of the response.
Ideally the audio would be recorded with a lossless codec like LINEAR16 or FLAC to start with. But once you have it in a format like mulaw, transcoding it before sending it to Google Speech-to-Text is not helpful.
Consider using model=phone_call and use_enhanced=true for better telephony quality.
For quick experimentation you can use STT UI https://cloud.google.com/speech-to-text/docs/ui-overview.
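
As a concrete starting point, here is a hedged sketch of a recognize call with the settings suggested above, using the google-cloud-speech Python client; the file name and language code are placeholders:

```python
from google.cloud import speech

client = speech.SpeechClient()

# Send the telephony audio as-is: mulaw, 8 bits, 8000 Hz (no transcoding).
with open("call.mulaw", "rb") as f:  # hypothetical raw mulaw capture
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,       # native rate of the source, no re-sampling
    language_code="en-US",
    model="phone_call",           # telephony-tuned model
    use_enhanced=True,            # enhanced variant of phone_call
    enable_word_confidence=True,  # per-word confidence in the response
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    best = result.alternatives[0]
    print(best.transcript)
    for word in best.words:
        print(f"  {word.word}: {word.confidence:.2f}")
```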

Watson Speech To Text service works faster for which type of audio file?

I have tried the Watson Speech to Text API with MP3 as well as WAV files. As per my observation, the same length of audio takes less time if it's given in MP3 format compared to WAV. Ten consecutive API calls with different audios took on average 8.7 seconds for MP3 files; the same input in WAV format took 11.1 seconds on average. Does the service response time depend on the file type? Which file type is recommended to obtain results faster?
Different encoding formats have different bitrates. MP3 and Opus are lossy compression formats (although suitable for speech recognition when bitrates are not too low), so they offer the lowest bitrates. If you need to push fewer bytes over the network, that is typically better for latency, so depending on your network speed you can see shorter processing times when using an encoding with a lower bitrate.
However, regarding the actual speech recognition process (ignoring the data transfer over the network), all encodings are equally fast: before recognition starts, all the audio is decompressed, if necessary, and converted to the sampling rate of the target model (broadband or narrowband).
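
If you want to check whether the difference really is just transfer time, a simple experiment is to time the same recording in both formats; a rough sketch with the ibm-watson Python SDK (the API key, service URL, and file names are placeholders):

```python
import time
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator("YOUR_API_KEY"))
stt.set_service_url("YOUR_SERVICE_URL")

# Same recording in both containers; compare wall-clock time per call.
for path, content_type in [("sample.mp3", "audio/mp3"),
                           ("sample.wav", "audio/wav")]:
    with open(path, "rb") as audio:
        start = time.perf_counter()
        stt.recognize(audio=audio, content_type=content_type).get_result()
        print(f"{path}: {time.perf_counter() - start:.2f}s")
```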

How does an audio converter work?

I currently have the idea to code a small audio converter application (e.g. FLAC to MP3 or M4A format) in C# or Python, but my problem is that I do not know at all how audio conversion works.
After some research, I heard about analog-to-digital / digital-to-analog converters, but I guess this would be digital-to-digital or something like that, isn't it?
If someone could precisely explain how it works, it would be greatly appreciated.
Thanks.
Digital audio in its raw form is PCM (pulse-code modulation), the uncompressed format fundamental to any audio processing system ... it is just a series of integers representing the height of the audio curve at each sample of the curve (the Y axis, where time is the X axis along this curve).
This PCM audio can be compressed using some codec and then bundled inside a container, often together with video or metadata channels ... so to convert audio from A to B, you first need to understand the container spec as well as the compressed audio codec so you can decompress audio A into PCM format ... then do the reverse: compress the PCM into the codec of B and bundle it into the container of B.
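
As an illustration of that decompress-then-recompress round trip, here is a minimal sketch using the pydub library (which shells out to ffmpeg for the container and codec handling); file names and bitrate are placeholders:

```python
from pydub import AudioSegment  # requires ffmpeg on the PATH

# Step 1: parse container A and decode its codec down to raw PCM in memory.
audio = AudioSegment.from_file("input.flac", format="flac")

# Step 2: compress the PCM with codec B and bundle it into container B.
audio.export("output.mp3", format="mp3", bitrate="192k")
```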
Before venturing further into this, I suggest you master the art of WAVE audio files ... the beauty of WAVE is that it is just a 44-byte header followed by the uncompressed integers of the audio curve. Write some code to read a WAVE file and parse the header (identify bit depth, sample rate, channel count, endianness) so you can iterate across each audio sample for each channel ... prove that it is working by sending your bytes into an output WAVE file, then diff the input WAVE against the output WAVE, as they should be identical. Once mastered, you are ready to venture into your stated goal. Do not skip over grokking the notion of interleaving stereo audio, as well as spreading a single audio sample with a bit depth of 16 bits across two bytes of storage and the reverse, namely stitching multiple bytes back together into a single integer with a bit depth of 16, 24 or even 32 bits while keeping endianness squared away ... this may sound scary at first, however all the necessary details are on the net, as this is how I taught myself this level of detail.
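
Here is a rough sketch of that exercise in Python, assuming the simple canonical layout (a plain 44-byte PCM header followed immediately by the data chunk; real-world files can carry extra chunks before "data"):

```python
import array
import struct

with open("input.wav", "rb") as f:
    header = f.read(44)
    # RIFF chunk descriptor + "fmt " sub-chunk (first 36 bytes, little-endian)
    (riff, _size, wave, _fmt, _fmt_len, audio_fmt, channels,
     sample_rate, _byte_rate, _block_align, bits) = struct.unpack(
        "<4sI4s4sIHHIIHH", header[:36])
    _data_tag, data_len = struct.unpack("<4sI", header[36:44])
    assert riff == b"RIFF" and wave == b"WAVE" and audio_fmt == 1  # PCM only
    print(f"{channels} channel(s), {sample_rate} Hz, {bits} bits/sample")

    pcm = f.read(data_len)

# Stitch pairs of bytes back into 16-bit integers (interleaved L,R,L,R...).
samples = array.array("h", pcm)  # assumes bits == 16 on a little-endian host

# Round-trip: write header + samples back out, then diff against the input.
with open("output.wav", "wb") as out:
    out.write(header)
    out.write(pcm)
```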
Modern audio compression algorithms leverage knowledge of how people perceive sound to discard information which is indiscernible (lossy), as opposed to lossless algorithms, which retain all the informational load of the source ... Opus (http://opus-codec.org/) is a current favorite codec, untainted by patents and open source.

Does the FLAC audio format have multiple quality settings?

I am writing encoding software and dealing with uncompressed WAV and FLAC formats. My question is: FLAC is supposed to be a lossless format, similar to WAV but compressed. However, certain software such as JRiver's Media Center offers a 'quality' setting for encoding FLAC files. Does that mean they are offering to reduce quality pre-encoding, or am I missing something in the FLAC standard?
The quality parameter for FLAC refers to the quality of compression, not of the audio. The audio stays lossless, but you get better compression with a higher quality setting. Higher quality will take more time to compress, however.
See the docs at http://wiki.jriver.com/index.php/Encoding_Settings:
Free Lossless Audio Codec (FLAC): FLAC is a popular lossless, freely available open source encoder. [2] Quality Settings: 0 - 8. Sets the quality of compression (and not sound, which is lossless), 8 meaning most compressed/time/effort.
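
You can verify this yourself with the reference flac command-line encoder: the -0 through -8 switches change file size and encode time, but every level decodes back to identical audio. A small sketch (assumes the flac CLI is installed; file names are placeholders):

```python
import os
import subprocess

src = "input.wav"
for level in (0, 5, 8):
    out = f"level{level}.flac"
    # -N selects the compression level, -f overwrites, -o names the output.
    subprocess.run(["flac", f"-{level}", "-f", "-o", out, src], check=True)
    print(out, os.path.getsize(out), "bytes")

# Decoding any of them yields bit-identical PCM:
#   flac -d -f -o roundtrip.wav level8.flac
```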

What are the formats supported by MPMoviePlayerController?

Which formats does MPMoviePlayerController support?
According to the documentation:
Supported Formats
This class plays any movie or audio file supported in iOS. This includes both streamed content and fixed-length files. For movie files, this typically means files with the extensions .mov, .mp4, .mpv, and .3gp and using one of the following compression standards:
H.264 Baseline Profile Level 3.0 video, up to 640 x 480 at 30 fps. (The Baseline profile does not support B frames.)
MPEG-4 Part 2 video (Simple Profile)
If you use this class to play audio files, it displays a white screen with a QuickTime logo while the audio plays. For audio files, this class supports AAC-LC audio at up to 48 kHz, and MP3 (MPEG-1 Audio Layer 3) up to 48 kHz, stereo audio.