I've spent hours going around in circles on this problem and I still can't solve it. Basically, I get data from a database and use Google Text-to-Speech to turn it into an MP3, which I then upload to Google Cloud Storage. From there I want to use the Twilio API to play the MP3 file during an outbound call. I know I need a URL for the file, but I'm very inexperienced with this and I can't work out how to pass it to the VoiceResponse() I create. I am doing all of this in Python. Is it possible to play the MP3 in the outbound call?
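A minimal sketch, assuming the MP3 object in Google Cloud Storage is publicly readable (the note after the block covers private objects). Twilio's `<Play>` verb only needs an HTTPS URL that Twilio can fetch, so the TwiML can be built with the standard library and passed inline to `calls.create` through its `twiml` parameter. The bucket name, object name and phone numbers are placeholders, and `client` is assumed to be a `twilio.rest.Client`.

```python
from urllib.parse import quote
from xml.etree import ElementTree as ET


def gcs_public_url(bucket: str, object_name: str) -> str:
    """Public URL of a Cloud Storage object (works only if the object is publicly readable)."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def build_play_twiml(mp3_url: str) -> str:
    """Return a TwiML document that plays one MP3; ElementTree escapes '&' in signed URLs."""
    response = ET.Element("Response")
    ET.SubElement(response, "Play").text = mp3_url
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")


def place_call(client, to_number: str, from_number: str, mp3_url: str) -> str:
    """Start an outbound call that plays the MP3 and return the call SID.

    `client` is assumed to be a twilio.rest.Client; calls.create accepts inline TwiML via `twiml=`.
    """
    call = client.calls.create(to=to_number, from_=from_number, twiml=build_play_twiml(mp3_url))
    return call.sid


if __name__ == "__main__":
    # Placeholder bucket/object names: use the ones from your upload step.
    url = gcs_public_url("my-tts-bucket", "messages/greeting.mp3")
    print(build_play_twiml(url))
```

With the Twilio helper library, the TwiML part is `response = VoiceResponse()`, `response.play(mp3_url)`, and then `twiml=str(response)` in `calls.create`. If the object should stay private, `blob.generate_signed_url(version="v4", expiration=datetime.timedelta(hours=1))` from the `google-cloud-storage` client returns a time-limited URL that can be passed to `build_play_twiml` instead.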
Related
I am trying to solve a problem where I need to record the screen in real time and keep sending the data to the backend, which will store the video as an S3 object (or in any cloud store).
I did research it, but everywhere I look, people record the video and send it as a single file after the recording is complete. The problem is that the file may be too big to send as a single file, so I want it saved to S3 in real time.
I have also seen WebRTC, which helps with peer-to-peer communication.
Any suggestions for implementing this in Go or Node.js would be helpful.
Thanks
What you can do is use an SFU (selective forwarding unit): send the screen data to it over WebRTC and have it save the stream to a file on the server side.
You can use mediasoup for this.
Here is a working example: https://github.com/ethand91/mediasoup3-record-demo
You should check the S3 Multipart upload overview.
No matter how large the video is, you only need to upload each 5 MB of data as a part to S3 (5 MB is the minimum size for every part except the last). Although it doesn't work exactly like a stream, it's almost a stream.
For the Go SDK, please check the S3 Golang SDK.
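The multipart approach can be sketched as follows, in Python with boto3 rather than Go or Node.js, to keep a single language for the examples on this page; the Go SDK exposes the same three operations (CreateMultipartUpload, UploadPart, CompleteMultipartUpload). The sketch assumes the backend receives the recording as a series of small chunks (for example MediaRecorder blobs forwarded over a WebSocket). `rechunk` groups them into parts of at least 5 MiB, and each part is uploaded as soon as it is full. `s3` is assumed to be `boto3.client("s3")`; bucket and key are placeholders.

```python
from typing import Iterable, Iterator, List

# S3 rejects parts smaller than 5 MiB, except for the last part of an upload.
MIN_PART_SIZE = 5 * 1024 * 1024


def rechunk(chunks: Iterable[bytes], part_size: int = MIN_PART_SIZE) -> Iterator[bytes]:
    """Regroup small incoming chunks into parts of part_size bytes (the last one may be smaller)."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= part_size:
            yield bytes(buf[:part_size])
            del buf[:part_size]
    if buf:
        yield bytes(buf)


def stream_to_s3(s3, bucket: str, key: str, chunks: Iterable[bytes],
                 part_size: int = MIN_PART_SIZE) -> List[dict]:
    """Upload a live sequence of chunks as one S3 object using multipart upload.

    Each part is sent as soon as it is full, so the server only buffers about one part.
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts: List[dict] = []
    try:
        for number, body in enumerate(rechunk(chunks, part_size), start=1):
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                  PartNumber=number, Body=body)
            parts.append({"ETag": resp["ETag"], "PartNumber": number})
        if not parts:
            raise ValueError("no data received")
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Abort so S3 does not keep (and bill for) orphaned parts.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return parts
```

The object only becomes visible in the bucket after `complete_multipart_upload`, so this gives "almost a stream" in the sense of the answer above: data leaves the server continuously, but the file is assembled at the end.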
I'm writing an API server to be used by an MP3 app.
I've tried koa-send, koa-static, and simply setting the MP3 file as the response body.
But no matter which of these the API uses, playback in the app stops. Because the app did not seem to pick up the length of the MP3 file, I sent the length separately; that worked on iOS but not on Android.
If I put the same MP3 file on S3 and request that URL, it works well, so I can't understand what the problem is.
Also, if I play the music in Safari through my API, it is shown as a live broadcast (through other sites, it is shown as a normal MP3).
If the problem is that the player doesn't know how long the track is, why does the same file work from other sites but not from my API?
Other storage site:
My API:
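The thread has no answer to this question, so the following is only a hypothesis worth checking, not a confirmed fix. One difference between S3 and a hand-rolled file endpoint is HTTP Range support: S3 answers a `Range: bytes=...` request with `206 Partial Content`, `Content-Range` and `Accept-Ranges: bytes`, while a Koa handler that just sets the body typically does not unless a range middleware is added. Safari in particular requests media in byte ranges and may treat a response that ignores them like a live stream. The sketch shows only the header logic, in Python rather than Koa; the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header: Optional[str], size: int) -> Optional[Tuple[int, int]]:
    """Parse a single-range `Range` header into an inclusive (start, end) pair.

    Returns None when the header is absent or not a simple single range (serve the whole
    file with 200). Raises ValueError when the range cannot be satisfied (answer 416).
    """
    if not header:
        return None
    m = _SINGLE_RANGE.match(header.strip())
    if not m or (m.group(1) == "" and m.group(2) == ""):
        return None
    first, last = m.group(1), m.group(2)
    if first == "":
        # Suffix range "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length == 0:
            raise ValueError("unsatisfiable range")
        start, end = max(size - length, 0), size - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return None  # malformed range: ignore it and serve the whole file
        end = min(int(last), size - 1) if last else size - 1
    if start >= size or end < start:
        raise ValueError("unsatisfiable range")
    return start, end


def partial_content_headers(start: int, end: int, size: int,
                            content_type: str = "audio/mpeg") -> Dict[str, str]:
    """Headers for a 206 response carrying bytes start..end of a file of `size` bytes."""
    return {
        "Content-Type": content_type,
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }
```

In a Koa handler the same decision would be: read `ctx.get('Range')`; for a parsed range, set `ctx.status = 206`, the headers above, and `ctx.body = fs.createReadStream(path, { start, end })` (both ends inclusive, like the tuple here); on an unsatisfiable range, answer `416` with `Content-Range: bytes */size`; otherwise send the whole file with `Content-Length` and `Accept-Ranges: bytes`.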
I want to transcribe an audio file to text with the Google Cloud Speech API.
I record an audio file of type:
audio/x-flac with codecs=opus.
My audio file, encoded to base64, gives this (I only included the beginning):
GkXfo59ChoEBQveBAULygQRC84EIQoKEd2VibUKHgQRChYECGFOAZwH/////////FUmpZpkq17GDD0JATYCGQ2hyb21lV0GGQ2hyb21lFlSua7+uvdeBAXPFh6ZbkBme7LiDgQKGhkFfT1BVU2Oik09wdXNIZWFkAQEAAIC7AAAAAADhjbWERzuAAJ+BAWJkgSAfQ7Z1Af/////////ngQCjjIEAAID7A//+//7//qOMgQA7gPsD//7//v/+o4yBAHeA+wP//v/+//6jjIEAs4D7A//+//7//qOMgQDvgPsD//7//v/+o4yBASuA+wP//v/+//6jQOmBAWiA+4MCAv/+//5+Gjf43iuAMUp0LKWeBuKWw5OP5gEM9j8POpNh6x2zmKmvvjYztj563NneRFMpQuTaVo6Zmkee3gftByFW4mgqYrsH4JQBFm3muCMtGwBRKxLBqdm0Sz3VOaS5AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBGsaNXRSQmfKpNmHAEaVZ120Ho0faji/1DFqduipx5F9Kj2eUmcLGdgvb0QKsY2vitpqZ32XiZ4r4t75RKoMSlBsSKMP5ZjnbOvhJtib9B0b/FLyiGoQQIMEfCOYO2Db+AFUP6AxoUqNBqIEBo4D7g4qXYizKyF771oN3hbF/FQwqgbt5cnU1Aev4S9h0wbwTJuNCyd4PnjBBo2hvEEx29FIzFZZd7AUIhGG/f9YS5C+wIdnjl6i6EsufNN/eETxOZZqSheXcLLzn9G+TO4EeLIodZiZoxAYCF/pud8orEKG24zWYi6bNcg+moXt8/rpXbaLkqJeIoGzQRPFkd738ClvMW+yNUDmuKY1xmXdvtj4XoPwEdVtZHPxa6tABIZEWobIxh5CdCY495Ij9mqi1jFsnHLiYNP48NzvnoCfbAh/a3M13Mijct5n4+zGskka3y4WVAwYvQixFBQkhX1SyH5BcW1d8dvHv8lPxshJ6C9xCKdspDqFeLFrl9IU9y9l9lYF1p7XyZvXfqaz7bzDWV1ZsxhEVMwkVIiyUHch2E+omwR5JjY0+HmiXQAbnzAoP/+AJPZU6tSRENx5y8rRviuK76d/saDudaUZJ/aVG3t6J/kOPKlBg1g5LTr5sXnww/7PqbmH+eGfOBydu4eyfa7jbLmdE6AfYeO3kHs3hn6/nlwDKvYpStvWUHdeRdWe2zdKjQYeBAd+A+4OA
When I use Google's demo on this page:
https://cloud.google.com/speech-to-text/
and upload my audio file, I see that the content of the request it sends is not the same as mine:

Google's demo works; mine doesn't.
Is the encoding different? If so, what is the encoding?
I use base64.exe to convert my audio file to base64.
Thanks
I found the solution.
I was using the browser's MediaRecorder API to generate my FLAC audio file, but its output is not compatible with Google Cloud Speech. When you upload the file to Google's Cloud Speech demo, the demo converts it to a compatible format.
So I switched to a different library to generate the file: recorder-js.
https://www.npmjs.com/package/recorder-js
It is working now.
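The base64 in the question explains the mismatch: it starts with `GkXfo`, which decodes to the bytes `1A 45 DF A3`, the EBML header used by WebM/Matroska, and the same header declares the document type `webm`; further on there is an `OpusHead` block (mono, 48000 Hz). So, despite the `audio/x-flac` label, the browser produced WebM with Opus audio. A small sketch that checks the container from its first bytes and builds a matching JSON body for the v1 `speech:recognize` REST method; the encoding strings are values of the v1 `RecognitionConfig.AudioEncoding` enum, and the helper names are hypothetical.

```python
import base64
import json
from typing import Optional

# Container -> RecognitionConfig.AudioEncoding (Speech-to-Text v1).
_ENCODING_BY_CONTAINER = {
    "flac": "FLAC",
    "wav": "LINEAR16",  # assumes 16-bit PCM inside the WAV file
    "ogg": "OGG_OPUS",
    "webm": "WEBM_OPUS",
}


def sniff_container(data: bytes) -> str:
    """Guess the container from its magic bytes; the MIME type the browser reports may not match."""
    if data.startswith(b"fLaC"):
        return "flac"
    if data.startswith(b"\x1a\x45\xdf\xa3"):  # EBML header used by WebM/Matroska
        return "webm"
    if data.startswith(b"OggS"):
        return "ogg"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown"


def build_recognize_body(data: bytes, language_code: str = "en-US",
                         sample_rate_hertz: Optional[int] = None) -> str:
    """JSON body for POST https://speech.googleapis.com/v1/speech:recognize."""
    container = sniff_container(data)
    if container not in _ENCODING_BY_CONTAINER:
        raise ValueError("unsupported or unrecognised audio container")
    if container in ("ogg", "webm") and sample_rate_hertz is None:
        # FLAC and WAV carry the rate in their header; Opus input must state it explicitly.
        raise ValueError("Opus audio needs an explicit sample_rate_hertz")
    config = {"encoding": _ENCODING_BY_CONTAINER[container], "languageCode": language_code}
    if sample_rate_hertz is not None:
        config["sampleRateHertz"] = sample_rate_hertz
    audio = {"content": base64.b64encode(data).decode("ascii")}
    return json.dumps({"config": config, "audio": audio})
```

Checking the first bytes like this before sending catches the case where the browser's label and the real container disagree, which is exactly what happened with MediaRecorder here.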
I need to develop a Google Action that streams an audio/radio stream.
I thought about using a media response.
But the documentation says: "Audio for playback must be in a correctly formatted .mp3 file. Live streaming is not supported."
Documentation
Can someone give me a hint about what I have to do to stream an audio stream? I found a German Google Action, "baden fm", that streams its radio station, but I'm not sure how they do it.
Kind Regards
Stefan
Currently, the only ways to do this are:
1. Stream it in chunks of MP3 files, using the callback at the end of each chunk's playback to stream the next chunk
2. Get listed on TuneIn, Radio.com or iHeartRadio (from observation, Baden FM seems to be using TuneIn)
3. Use an App Action
4. Use a website link that starts streaming, via a BrowseCarousel or a Button
The last two options are not helpful if you're targeting devices without a browser.
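The chunked-MP3 approach can be sketched as follows, assuming (hypothetically) that an encoder cuts the live feed into consecutive MP3 files of fixed length, `segment-0.mp3`, `segment-1.mp3`, and so on, with segment 0 starting at a known time. When the media status callback reports that a chunk finished, the webhook answers with a media response for the next chunk; this sketch covers only choosing that chunk. The URL pattern, segment length and lag threshold are placeholders, and storing `last_index` in the conversation data is left out.

```python
from typing import Optional, Tuple

# Hypothetical setup: the live feed is written as consecutive MP3 files of
# SEGMENT_SECONDS each, segment 0 starting at `stream_epoch` (Unix time).
SEGMENT_SECONDS = 30
SEGMENT_URL = "https://radio.example.com/live/segment-{index}.mp3"
MAX_LAG_SEGMENTS = 2  # jump back to live if the listener falls further behind


def latest_complete_segment(now: float, stream_epoch: float,
                            segment_seconds: int = SEGMENT_SECONDS) -> int:
    """Index of the newest segment that has been fully written at time `now`."""
    return max(int((now - stream_epoch) // segment_seconds) - 1, 0)


def next_segment(last_index: Optional[int], now: float,
                 stream_epoch: float) -> Tuple[int, str]:
    """Choose the chunk to return when the media status callback says playback finished.

    `last_index` is the segment the user just heard (None for a new session).
    """
    live = latest_complete_segment(now, stream_epoch)
    if last_index is None or live - last_index > MAX_LAG_SEGMENTS:
        index = live
    else:
        # Normally last_index + 1 is already complete, because playing one segment takes
        # as long as recording the next; min() only guards against clock skew.
        index = min(last_index + 1, live)
    return index, SEGMENT_URL.format(index=index)
```

The listener hears the station with a delay of one to two segments, which is the price of replacing a true live stream with finished MP3 files.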
Also saw this thread which has some insight on MP3 size/duration: How can I tell Actions on Google to stream audio?
Google Actions do not currently support live audio streaming. I'm in contact with them but it seems they have no ETA to support this.
I was successful doing this with an MP3 live stream:
NPR: https://npr-ice.streamguys1.com/live.mp3?ck=1597372625378
but not with an MPEG-DASH (.mpd) stream
BBC test stream: https://rdmedia.bbc.co.uk/dash/ondemand/testcard/1/client_manifest-audio.mpd
or with the HLS stream (.m3u8) that my company uses (I can't publish the link publicly).
Note: I added the links as text/code since I'm not sure whether the companies' policies allow them to be indexed.
I have looked into the Google Cloud Speech API and got streaming from my microphone working on a Node server.
I was then wondering what the best practice would be for streaming the microphone from a web frontend. Is it to send an audio stream from getUserMedia to the Node server and pipe it to the API with the Node API client? Or is it simply to save the voice input to a file that I then send to the API?
The intent is to transcribe instructions (one or two sentences long) and send the result to another API.
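For the first option (streaming from `getUserMedia` through the server), the server needs a bridge between the socket handler that receives audio chunks and the blocking request iterator that a streaming recognize call consumes. A minimal sketch in Python, modelled on the queue-plus-generator pattern used in Google's microphone streaming sample; the class name is hypothetical, and the WebSocket and Speech client wiring is omitted.

```python
import queue
from typing import Iterator, Optional


class AudioChunkBridge:
    """Thread-safe bridge: a socket handler pushes audio chunks, a consumer reads them in order."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Optional[bytes]]" = queue.Queue()

    def push(self, chunk: bytes) -> None:
        self._queue.put(chunk)

    def close(self) -> None:
        self._queue.put(None)  # sentinel: the browser stopped sending

    def chunks(self) -> Iterator[bytes]:
        """Yield audio in order until close() is called; blocks while nothing is buffered."""
        while True:
            chunk = self._queue.get()  # wait for at least one chunk
            if chunk is None:
                return
            data = [chunk]
            finished = False
            # Drain anything else already queued so each request carries more audio.
            while True:
                try:
                    more = self._queue.get_nowait()
                except queue.Empty:
                    break
                if more is None:
                    finished = True
                    break
                data.append(more)
            yield b"".join(data)
            if finished:
                return
```

With `google-cloud-speech`, the generator would be wrapped as `(speech.StreamingRecognizeRequest(audio_content=c) for c in bridge.chunks())` and passed to `SpeechClient().streaming_recognize(streaming_config, requests)` on a worker thread, while the socket handler calls `push` for each message and `close` when the user stops speaking.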
I'm aware this question is over a year old and the OP has probably either found an answer or given up, but I spent long enough trying in vain to google this before I figured it out that I wanted to help anyone following in my footsteps: I wrote up a tutorial for basically this exact situation here.