What does Google's speech-to-text configuration look like for an .opus audio file?

I am passing a .opus audio file to Google's Speech-to-Text API for transcription. I am using the following configuration:
encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
language_code = "en-US"
sample_rate_hertz = 16000
I am getting the following error:
google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
I've tried other encodings like FLAC and LINEAR16 and get None as output.
Do .opus audio files require an additional configuration field, and what should the configuration look like?

After working through the documentation provided by Google and a couple of tries, I figured out the solution to the error I was getting. The OGG_OPUS encoding requires audio_channel_count to be defined explicitly in the config. In my case the audio had 2 channels, and I needed to state that explicitly.
Also, for multi-channel audio, enable_separate_recognition_per_channel needs to be set to True.
The config that worked for me is:
encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
config = {
    "audio_channel_count": audio_channel_count,
    "enable_separate_recognition_per_channel": enable_separate_recognition_per_channel,
    "language_code": language_code,
    "sample_rate_hertz": sample_rate_hertz,
    "encoding": encoding,
}
It is very important to use the correct value for each parameter in the config.
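For reference, a minimal end-to-end call with this config looks roughly like the sketch below (assuming the pre-2.0 google-cloud-speech client, the one that still exposes enums; the file name and literal values are placeholders):
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums

client = speech_v1.SpeechClient()

# Read the OGG_OPUS file as raw bytes.
with open("audio.opus", "rb") as f:
    content = f.read()

config = {
    "encoding": enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
    "sample_rate_hertz": 16000,
    "language_code": "en-US",
    "audio_channel_count": 2,                         # must match the file
    "enable_separate_recognition_per_channel": True,  # one result per channel
}
audio = {"content": content}

response = client.recognize(config, audio)
for result in response.results:
    print(result.channel_tag, result.alternatives[0].transcript)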

Related

Convert SILK-encoded data to a wav file in Python

I was surprised I couldn't find an answer to this from a search - maybe I'm using the wrong search terms.
I have what I suspect is a SILK-compressed data stream (see below) that I would like to turn into an audio file (something like wav, ideally). I'm planning on doing this in Python, but I have no idea how: I don't get any reasonable search results for how to decompress the data with SILK. Maybe there are no SILK packages for Python?
Silk data:
uz+ACgEAEAELgD4EQgEWAKV4mxnepfmhxKCQxAnKVNaHhKRXPIsmAH5RjXmJV0u+WTmrvgyCKxcraehjo/ZeKcFjksXQZEeOju4hLNv/MAB9KA7ww14Vc0ndYPB7dDXoXTexuxcW0Jg/diMgdH5ijWhe02Ch48KX86qJZYFyZV81AH76qCgh9AXliMdyWEgWTMbRD6xMX37WJALrXlSnxymIloSq2KGwXCcMXzQiSQIrcLVNfqdNJACCluFOIRKPmugUvsLZmnD04X0xhpAuNkwJECK4t51MBOWNWJlCAIDyZlJwWI45EPTjBB6yKyGOclu96qBV2MhFAh1d2J7WDZwe6YxOVu/BGkGcur9qTP85ZRfjANoiQxQrWvpoHFBFBy0AfX6k8XvbSwrk2nUAEP3P6kcmXORKUNKeu8HDnOUflQqtA5AkkTiun77fZrqnimIfWg==
First, validate your assumption that it is SILK.
If you open an encoded binary file, it should start with the header "#!SILK_V3". Your excerpt looks like some base64 encoding (maybe of the raw binary?).
If you do get to the real SILK data, you can decode it with the original Skype SILK SDK. There is probably no Python port of it, but you can invoke external libraries from Python.
Once decoded, use ffmpeg, Audacity's raw import, or another tool to convert to WAV/mp3 and test the audio. Then, in Python, the wave library or the wavio API should work.
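Roughly, the validation step and the final WAV-wrapping step could look like the following Python sketch (the file names, sample rate, and channel count are assumptions; the SILK decoding itself still has to go through the external SDK):
import base64
import wave

# Step 1: check whether the blob really is SILK. The excerpt above looks
# base64-encoded, so decode it first and look for the "#!SILK_V3" magic.
with open("blob.txt") as f:
    raw = base64.b64decode(f.read())
print(raw.startswith(b"#!SILK_V3"))

# Step 2: the SILK SDK's decoder outputs headerless 16-bit PCM. Assuming
# 16 kHz mono output, wrapping it into a WAV container with the standard
# wave module looks like this:
with open("decoded.pcm", "rb") as f:
    pcm = f.read()

with wave.open("out.wav", "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(16000)  # decoder output sample rate
    wav.writeframes(pcm)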

AVFoundation cannot read wav file format

I'm trying to create a wav file from multiple other wav files.
I use AVAsset, AVAssetReader and AVAssetWriter.
The format setting used for the AVAssetWriterInput and AVAssetReaderAudioMixOutput is created like this:
AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 44100, channels: 2, interleaved: true)
And the AVAssetWriter is created like this: AVAssetWriter(url: outputURL, fileType: .wav)
Btw, I noticed 2 weird things:
1) When I create an AVAsset from a wav file, I don't get any metadata.
The asset creation is:
let url = URL(fileURLWithPath: mWaveFilePath)
let asset = AVAsset(url: url)
It couldn't be simpler, yet when I look at the metadata properties of this asset, I always get an empty array for wav files...
2) Most importantly, when I write a wave file, I have the feeling that AVFoundation makes some errors in the wave header. Maybe it comes from me, but I do manage to create a wave file with audio; still, having followed some tutorials, I'm having a hard time finding where the error could come from.
Here is an example of a good and a bad header:
The good header before importing the file.
We can see that the format tag is set to 1, which means PCM. That's what we want.
Now the wrong header after the creation of my audio file:
-2... It's clearly wrong.
So did I miss something about using AVFoundation to create a wav file? Should I do something special?
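(Aside: the format tag in those headers can also be checked outside of AVFoundation. The sketch below, in Python and with a placeholder file name, is only a debugging aid: it walks the RIFF chunks and prints the tag from the "fmt " chunk, where 1 is plain PCM and 0xFFFE reads as -2 when interpreted as a signed 16-bit value.)
import struct

# Return the audio format tag from the "fmt " chunk of a WAV file
# (1 = PCM, 0xFFFE = WAVE_FORMAT_EXTENSIBLE).
def wav_format_tag(path):
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave_id == b"WAVE"
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            data = f.read(chunk_size + (chunk_size & 1))  # chunks are word-aligned
            if chunk_id == b"fmt ":
                return struct.unpack("<H", data[:2])[0]

print(wav_format_tag("output.wav"))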

iOS - Convert Audio Format (opus to mp3)

Recently I started to develop an application that works with .opus files (an audio format).
I am working with an external SDK that can process mp3/wav files; unfortunately my local file is a .opus file, and I need to convert it to mp3/wav in order to process it.
I have read and researched a lot around the network to find a solution.
I found the FFmpegWrapper library, which can convert between audio formats, but when I try to convert .opus to .mp3/wav, I get this error: opus codec not supported in WAVE format
I do not know what can be done; I'd be happy for any help.
Any information about how to convert the .opus format to any other format will be appreciated.
Thanks
Have you tried using this pod: https://github.com/chrisballinger/Opus-iOS
You can use it to convert your Opus-encoded file to wav, then feed it into your SDK.

How to convert '.opus' file to flac file format

I have an audio file with '.opus' format.
http://img.wbcsrv.com/2017/03/14/4915792368684-41222-919020044692-1489468385000.opus
I need to use it with the Google Cloud Speech API, but the API only supports certain audio encodings, specified in https://cloud.google.com/speech/docs/basics#audio-encodings . How can I use the 'opus' file format with the Google Cloud Speech API?
Is there any way to convert a '.opus' file into one of the formats specified in Google's audio encoding documentation, or is there any npm package available to do this?
In Node you can use ffmpeg in several ways, using:
https://www.npmjs.com/package/ffmpeg
https://www.npmjs.com/package/ffmpeg-node
https://www.npmjs.com/package/ffmpeg-static
https://www.npmjs.com/package/ffmpeg-wrap
A few more at https://www.npmjs.com/search?q=ffmpeg
ffmpeg supports Opus according to the docs:
https://www.ffmpeg.org/ffmpeg-codecs.html#opus
https://www.ffmpeg.org/ffmpeg-codecs.html#libopus
https://www.ffmpeg.org/ffmpeg-codecs.html#libopus-1
You may need to use libopus for that:
http://opus-codec.org/downloads/
ffmpeg also supports encoding FLAC, so it can be used for that part as well:
https://www.ffmpeg.org/ffmpeg-codecs.html#flac-2
There is no straightforward way to convert Opus to FLAC in Node without external dependencies, but it should be possible using the modules and libraries above.
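Whichever wrapper you pick, the underlying conversion is a single ffmpeg invocation; purely as an illustration (spawned here from Python's subprocess, with placeholder file names; the Node packages above drive the same binary), it comes down to:
import subprocess

# Decode the Opus file and re-encode it as FLAC using the ffmpeg binary.
# Flags such as "-ac", "1", "-ar", "16000" can be added to downmix/resample.
subprocess.run(["ffmpeg", "-i", "input.opus", "output.flac"], check=True)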

FFmpeg library: Muxing audio from external file

I have successfully changed the muxing.c sample to use video frames that I generate at runtime.
I am now trying to replace the get_audio_frame function with a function that decodes an existing audio file and writes its samples instead of the synthesized audio samples in the example code.
I've tried using the "audio decoding" example to decode the audio file, but I'm not sure how/when to write the decoded samples.
I suggest checking the source of my Karaoke Lyrics Editor, which does exactly what you need based on ffmpeg. See ffmpegvideoencoder.cpp, in particular the createFile and encodeImage functions.
