Why does the docs example from the Symphonia Rust crate fail at creating the decoder?

I was trying to follow along with the example in the docs, and when I copy and paste that code (replacing the path variable with a raw string):
let path: &str = r#"F:/testing.webm"#;
it panics with thread 'main' panicked at 'unsupported codec: Unsupported("core (codec):unsupported codec")' on the let mut decoder line.
I tried multiple .webm videos; none of them worked.
I also tried MP3 files with the mp3 feature enabled on the crate, and had no luck.
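For reference, here is a minimal sketch of the docs-style setup in question. This is my reconstruction, not the exact docs code, and the comments about features and codecs are my assumptions:

use std::fs::File;

use symphonia::core::codecs::DecoderOptions;
use symphonia::core::formats::FormatOptions;
use symphonia::core::io::MediaSourceStream;
use symphonia::core::meta::MetadataOptions;
use symphonia::core::probe::Hint;

fn main() {
    let path: &str = r#"F:/testing.webm"#;
    let src = File::open(path).expect("failed to open media");
    let mss = MediaSourceStream::new(Box::new(src), Default::default());

    // Probe the container format (.webm is read by the mkv reader).
    let probed = symphonia::default::get_probe()
        .format(&Hint::new(), mss, &FormatOptions::default(), &MetadataOptions::default())
        .expect("unsupported format");

    let track = probed.format.default_track().expect("no default track");

    // This is the line that panics: the codec registry has no decoder for
    // the track's codec. Each decoder sits behind a feature flag (mp3,
    // vorbis, ...), and .webm audio is typically Opus or Vorbis; Symphonia
    // has no Opus decoder at the time of writing.
    let mut decoder = symphonia::default::get_codecs()
        .make(&track.codec_params, &DecoderOptions::default())
        .expect("unsupported codec");
}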

Related

How do I remove or replace non-UTF-8 characters in a file?

I have a CSV file that cannot simply be loaded with the csv crate, as it complains about a handful of non-UTF-8 characters. I want to either remove them or, preferably, replace them with a '£' sign (which is what they were in the source system).
I decided the simplest way might be to read the file and convert the characters before treating it with the csv crate.
I am quite new to Rust, but I have managed to read the file into a Vec<u8> and now need to iterate through it, replacing or removing the three-byte sequence 11101111 10111101 10111101 (0xEF 0xBD 0xBD) wherever it occurs in the relatively small file (5 KB).
What is the best way to do this?
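A minimal sketch of that direct byte-replacement approach (replace_bad_bytes is a name I made up, and it assumes the offending sequence is exactly those three bytes):

// Replace every 0xEF 0xBD 0xBD sequence with a UTF-8 '£' (0xC2 0xA3).
fn replace_bad_bytes(input: &[u8]) -> Vec<u8> {
    const BAD: [u8; 3] = [0xEF, 0xBD, 0xBD]; // 11101111 10111101 10111101
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i..].starts_with(&BAD) {
            out.extend_from_slice("£".as_bytes()); // 0xC2 0xA3
            i += BAD.len();
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    out
}

That said, the answer below takes a more robust route: transcode the whole file instead of patching individual byte sequences.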
To read a file containing ISO 8859-1 data into string records, you can transcode it to UTF-8 while reading, using the encoding_rs and encoding_rs_io crates:
use encoding_rs::WINDOWS_1252; // a superset of ISO 8859-1
use encoding_rs_io::DecodeReaderBytesBuilder;

// Wrap the file in a reader that transcodes Windows-1252 to UTF-8 on the fly.
let file = DecodeReaderBytesBuilder::new()
    .encoding(Some(WINDOWS_1252))
    .build(std::fs::File::open(path).unwrap());

let mut csv_reader = csv::Reader::from_reader(file);
// ... use it ...
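From there the records can be read as usual (a sketch; from_reader treats the first row as a header by default):

for result in csv_reader.records() {
    let record = result.unwrap();
    println!("{:?}", record);
}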

OpenCV Node.js: preparing an image for OCR with Tesseract.js, removing dots

I'm trying to read data with Tesseract from an image captured by a webcam. Here is an example of the image used:
I'm working on a Node.js server. I tried a lot of techniques in Jimp, including inverting/grayscaling, sharpening the image, and filtering specific colors (yellow/blue). In the end I built a separate Docker container using opencv4nodejs and applied a few techniques to extract text from the image.
I mostly need the large text (the small text is unnecessary, and is not sharp in this image anyway). So I applied this:
const cv = require('opencv4nodejs');
const src = cv.imread('./970f5b45-9f24-41d5-91f0-ef3f8b9d8914.jpeg');
let src2 = src.cvtColor(cv.COLOR_BGR2GRAY);
let dst = src2.adaptiveThreshold(255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 11, 2); // blockSize must be odd
const kernel = cv.getStructuringElement(cv.MORPH_RECT, new cv.Size(3, 3));
let dst2 = dst.morphologyEx(kernel, cv.MORPH_OPEN); // opening needs a structuring element
After that I have this result, which is almost ready for OCR; the problem is the large number of dots in the image. Is there any way to remove those dots while keeping the quality of the result (readable text), in OpenCV or with some other technique?
The result right now:
Is it possible to extract just the text from that result? If I feed this result to Tesseract OCR, it takes a really long time to extract the text, and the output contains a huge number of weird characters (probably because of the dots/shapes).
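Not a definitive answer, but a sketch of one common cleanup pass, continuing from dst2 in the code above (the kernel and filter sizes are guesses to tune):

// A median filter suppresses isolated salt-and-pepper dots while leaving
// the thicker strokes of the large text mostly intact.
let cleaned = dst2.medianBlur(3);

// Optionally follow with another opening pass to knock out remaining specks.
const openKernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, new cv.Size(3, 3));
cleaned = cleaned.morphologyEx(openKernel, cv.MORPH_OPEN);

cv.imwrite('./cleaned.jpeg', cleaned);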

What does the Google speech-to-text configuration look like for an .opus audio file?

I am passing an .opus audio file to the Google speech-to-text API for transcription. I am using the following configuration:
encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
language_code = "en-US"
sample_rate_hertz = 16000
I am getting the following error:
google.api_core.exceptions.GoogleAPICallError: None Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
I've tried other encodings like FLAC and LINEAR16 and get None as output.
Do .opus audio files require an additional configuration field, and what should the configuration look like?
After working through the documentation provided by Google and a couple of tries, I figured out the solution to the error I was getting. The OGG_OPUS encoding requires audio_channel_count to be defined explicitly in the config. In my case the audio had 2 channels, and I needed to say so explicitly.
Also, with multiple channels, enable_separate_recognition_per_channel needs to be set to True.
The config that worked for me is:
encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS

config = {
    "audio_channel_count": audio_channel_count,  # 2 for this file
    "enable_separate_recognition_per_channel": enable_separate_recognition_per_channel,  # True
    "language_code": language_code,  # "en-US"
    "sample_rate_hertz": sample_rate_hertz,  # 16000
    "encoding": encoding,
}
It is very important to use the correct value for each parameter in the config.
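For completeness, a sketch of how such a config gets passed to the API; the gs:// path is a placeholder, and this assumes the pre-2.0 google-cloud-speech client that the enums import implies:

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums

client = speech_v1.SpeechClient()

encoding = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
config = {
    "audio_channel_count": 2,
    "enable_separate_recognition_per_channel": True,
    "language_code": "en-US",
    "sample_rate_hertz": 16000,
    "encoding": encoding,
}
audio = {"uri": "gs://my-bucket/testing.opus"}  # placeholder

response = client.recognize(config, audio)
for result in response.results:
    print(result.alternatives[0].transcript)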

FFmpeg library: muxing audio from an external file

I have successfully changed the muxing.c sample to use video frames that I generate at runtime.
I am now trying to replace the get_audio_frame function with one that decodes an existing audio file and writes its samples in place of the synthesized audio samples in the example code.
I've tried using the "audio decoding" example to decode the audio file, but I'm not sure how or when to write the decoded samples.
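For what it's worth, here is a rough sketch of the usual shape of that loop; names like in_fmt_ctx, dec_ctx and audio_stream_index are assumed to be set up as in the "audio decoding" example, and the actual muxing call is left as a comment:

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

/* Decode the external audio file packet by packet; each decoded frame
 * takes the place of one synthesized frame from get_audio_frame(). */
static void decode_audio_for_muxing(AVFormatContext *in_fmt_ctx,
                                    AVCodecContext *dec_ctx,
                                    int audio_stream_index)
{
    AVPacket pkt;
    AVFrame *frame = av_frame_alloc();

    while (av_read_frame(in_fmt_ctx, &pkt) >= 0) {
        if (pkt.stream_index == audio_stream_index) {
            avcodec_send_packet(dec_ctx, &pkt);
            while (avcodec_receive_frame(dec_ctx, frame) == 0) {
                /* This is the "when": right here, where muxing.c would
                 * call get_audio_frame(). Convert the frame to the
                 * encoder's sample format/rate if they differ
                 * (swr_convert), then encode and write it. */
            }
        }
        av_packet_unref(&pkt);
    }
    av_frame_free(&frame);
}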
I suggest checking the source of my Karaoke Lyrics Editor, which does exactly what you need based on FFmpeg. See ffmpegvideoencoder.cpp, in particular the createFile and encodeImage functions.

Creating an image from an nsIBinaryInputStream

I create a binary input stream using some JS trickery; it contains compressed image data such as JPEG or GIF. I want to decode and display this data, either using imgITools::decodeImageData or some other way, but I haven't found a way yet. Where should I start?
The easiest way is to read the image data into a string, base64-encode that string, turn the result into a data: URL, and set it as the src of your image. Unfortunately Stack Overflow won't let me create a live data: link, but it would look like this:
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAYAAAByDd+UAAAAMklEQVRIx2NgGAWjYBQMFXAFDRMrR5GF/6H4CglyoxaOWjhq4aiFg7hoGwWjYBTQDgAAy8VWOfRR6fkAAAAASUVORK5CYII=
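A sketch of that approach in extension code; NetUtil.jsm supplies the stream-to-string helper, and binaryStream and the img element ID are placeholders:

Components.utils.import("resource://gre/modules/NetUtil.jsm");

// Read the raw (still compressed) image bytes out of the binary stream.
var bytes = NetUtil.readInputStreamToString(binaryStream, binaryStream.available());

// Base64-encode them and wrap the result in a data: URL.
var dataUrl = "data:image/jpeg;base64," + btoa(bytes);
document.getElementById("preview").src = dataUrl;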
