Using Cortana for dictation of documents - azure

I'm currently researching Cortana because I'm interested in developing custom skills for it. At the moment I use Cortana to invoke Windows Speech Recognition (WSR), and then use WSR to dictate text into Word. I'm experimenting with this as a way to record meetings and generate a transcript in real time.
I've found this setup quite a hassle, and I'm curious whether I could integrate a bot with Cortana for the same purpose. I've done some reading about Azure Bot Framework, Cognitive Services, LUIS, etc.
Is it possible to develop such a solution using the services mentioned above?
Thank you in advance!

Yes, it is possible.
You can feed the audio streams to the Speech to Text API, then chunk the audio according to the Offset and Duration returned for each phrase. Send those chunks to the Speaker Recognition API to identify the speaker by name, so that you have a name for each chunk to pair with its transcribed phrase, and build a dialog out of it.
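A minimal sketch of the chunking step, assuming the per-phrase results have been collected as dicts with `Offset` and `Duration` (both in 100-nanosecond ticks, which is how the Azure Speech service reports them) plus `DisplayText`, and that the meeting audio is an uncompressed PCM WAV. The speaker-identification request is left as a hypothetical `identify_speaker` callable, since the actual Speaker Recognition call depends on your enrolled profiles and SDK version.

```python
import io
import wave

TICKS_PER_SECOND = 10_000_000  # Offset/Duration are reported in 100-ns ticks


def ticks_to_frame(ticks: int, frame_rate: int) -> int:
    """Convert a 100-ns tick count into an audio frame index."""
    return ticks * frame_rate // TICKS_PER_SECOND


def chunk_by_phrases(wav_bytes: bytes, phrases: list) -> list:
    """Return (text, wav_bytes) pairs, one WAV chunk per recognized phrase."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        for phrase in phrases:
            start = ticks_to_frame(phrase["Offset"], params.framerate)
            length = ticks_to_frame(phrase["Duration"], params.framerate)
            src.setpos(start)                  # jump to where the phrase begins
            frames = src.readframes(length)    # read only that phrase
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:  # same format as the source file
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append((phrase["DisplayText"], out.getvalue()))
    return chunks


def build_dialog(chunks: list, identify_speaker) -> list:
    """Pair each phrase with a speaker name (identify_speaker is hypothetical)."""
    return [f"{identify_speaker(audio)}: {text}" for text, audio in chunks]
```

Each chunk is a standalone WAV file, so it can be posted to whatever identification endpoint you use without further conversion.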
Since you're considering this mainly for meetings: the solution you describe was announced a while ago as a feature of Microsoft Teams and is going to be publicly available in the near future. You can also watch a demo that was presented at Build 2018 here.

Related

Is it possible to record audio and play it afterwards on Google Assistant?

I'd like to know whether it's possible to record an audio clip on Google Assistant (with Dialogflow and Firebase) and play it back later. The idea is to:
Ask the user to say his name aloud.
Record it.
Play it afterwards.
I've read these answers. The answer was no, but maybe there's been an update, since we can now listen to what we said to Google Assistant on the "My Activity" page, as seen here.
As a developer, the answer is still no. While users have access to their recordings, there's no way to access them programmatically and no way to play them through Dialogflow.
You can read this great article which explains a trick to record audio and play it back on the Google Assistant using a progressive web app.
The design consists of these two parts:
A web client for recording and uploading the audio files to Google Cloud Storage.
An Action that plays the audio file from Cloud Storage
You'll also find the URL of the open-source code on GitHub at the end of the article.
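For the second part (the Action that plays the stored file), the fulfillment response essentially boils down to an SSML `<audio>` element pointing at the uploaded object. A minimal sketch, assuming the object is publicly readable under the standard `https://storage.googleapis.com/BUCKET/OBJECT` URL form; the bucket and object names are hypothetical, and this is not the article's own code.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def public_object_url(bucket: str, object_name: str) -> str:
    """Public HTTPS URL of a Cloud Storage object (object must be publicly readable)."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def playback_ssml(audio_url: str, fallback_text: str) -> str:
    """SSML that plays the recording; the text inside <audio> is spoken if playback fails."""
    # quoteattr adds the surrounding quotes and XML-escapes characters such as '&'
    return f"<speak><audio src={quoteattr(audio_url)}>{escape(fallback_text)}</audio></speak>"
```

Usage: `playback_ssml(public_object_url("my-recordings", "names/user 1.mp3"), "Sorry, I couldn't play your recording.")` yields a string that can be returned as the SSML part of the Action's response.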

Convert live audio stream into text while conferencing using WebRTC

I am implementing a video-conferencing-like system using WebRTC and Node.js.
But I want to add an extra feature. Suppose there is one moderator and 5 audience members asking questions. While one audience member is talking with the moderator, the rest record their questions, which are converted to text and shown on the moderator's screen. Based on that, the moderator can answer questions as needed and skip unwanted ones. I hope you can picture the system.
First, is it doable?
If yes, any help would be appreciated.
You should simply try the Google Speech Recognition API, as Translator.js does. The Speech Recognition API can convert audio into text, which can then be played back as voice using either the Google Translation API or meSpeak.js.
RecordRTC.js can only be used for WAV/WebM recordings; it cannot convert voice into text.
Updated at: 11:23 am -- Saturday, 7 June 2014 (UTC)
Personally, I think the Google Translation API is the only "official", i.e. "non-free", API. The Speech Recognition API is natively supported in both Chrome and Firefox, and it is part of a W3C specification draft, though one submitted by Google developers.
Web Speech API Specification: https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
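Once each recorded question has been transcribed (by whichever speech API you choose), the moderator-side part of the system in the question is just an ordered queue that the moderator can answer from or prune. A minimal server-side sketch; the class and method names are hypothetical, and delivering the list to the moderator's screen (e.g. over your existing WebRTC data channel or a WebSocket) is left out.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Question:
    audience_id: str
    text: str  # transcript produced by the speech-to-text step


class ModeratorQueue:
    """Pending transcribed questions, shown to the moderator in arrival order."""

    def __init__(self) -> None:
        self._pending: deque[Question] = deque()
        self.answered: list[Question] = []

    def submit(self, audience_id: str, transcript: str) -> None:
        self._pending.append(Question(audience_id, transcript.strip()))

    def pending(self) -> list[Question]:
        return list(self._pending)

    def answer_next(self) -> Question | None:
        """Take the oldest pending question; None if the queue is empty."""
        if not self._pending:
            return None
        question = self._pending.popleft()
        self.answered.append(question)
        return question

    def dismiss(self, index: int) -> Question:
        """Drop an unwanted question without answering it."""
        question = self._pending[index]
        del self._pending[index]
        return question
```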

Detecting known words using the Web Speech API

I'm in the planning stages of a web app that is intended to help children learn vocabulary. We would like to make the app as interactive as possible. For example, we would show a picture of an apple and ask the child to identify the object. The child would then say "apple" and we would determine whether they are correct, etc.
The new Web Speech API seems like a promising tool for this project. However, looking through the documentation, I believe it will only produce transcripts from speech (i.e. it cannot match a spoken word to a known word and produce a confidence value – at least not out of the box).
Does anyone have experience with leveraging the Web Speech API in this way (or any other API for that matter)? I'm trying to stick to technologies that can run in the browser, if possible.
Try ispikit.com; it's far more suitable for your needs. It's specifically designed for educational tasks and runs in the browser, on the client side. The Web Speech API is not designed for detection, so you will not be able to use it.
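Whichever engine ends up doing the recognition, if it returns an n-best list of (transcript, confidence) pairs, the "did the child say the known word" check from the question can be done as post-processing. A minimal sketch, assuming that input shape; the normalization rules and the 0.5 threshold are arbitrary placeholders, not values from any particular API.

```python
import unicodedata


def normalize(word: str) -> str:
    """Lower-case a word and strip accents and surrounding punctuation."""
    decomposed = unicodedata.normalize("NFKD", word)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.strip(" .,!?\"'").lower()


def match_word(target: str, alternatives: list, threshold: float = 0.5) -> tuple:
    """Return (matched, best_confidence) for the target among n-best alternatives."""
    wanted = normalize(target)
    best = 0.0
    for transcript, confidence in alternatives:
        words = [normalize(w) for w in transcript.split()]
        if wanted in words:  # the target appears as a whole word in this hypothesis
            best = max(best, confidence)
    return best >= threshold, best
```

Returning the confidence alongside the yes/no answer lets the app distinguish "clearly said apple" from "probably said apple" when deciding whether to prompt again.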

Is it possible to access the waveform of a song from a Spotify app?

I am thinking about how to build a Spotify app that does beat detection (extracting the BPM of a song).
For that I need to access the raw audio (the waveform) and analyze it.
I am new to building Spotify apps.
I know that with libspotify you can access raw audio. Can you do the same through the Spotify Apps API? And how?
For the record, there are currently two Spotify Apps APIs:
Current
Preview
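For reference, if raw PCM samples are available (e.g. via libspotify, as the question mentions), a very basic tempo estimate takes the frame-to-frame energy increase as an onset signal and picks the autocorrelation peak within a plausible BPM range. This is a textbook simplification, not what any Spotify or Echo Nest API does internally; it is pure Python for readability (slow on full tracks) and can return half or double the true tempo on real music.

```python
def onset_envelope(samples: list, hop: int) -> list:
    """Half-wave-rectified increase in frame energy (a crude onset signal)."""
    energies = [
        sum(s * s for s in samples[i:i + hop])
        for i in range(0, len(samples) - hop + 1, hop)
    ]
    # First frame has no predecessor, so its onset strength is set to 0
    return [0.0] + [max(0.0, b - a) for a, b in zip(energies, energies[1:])]


def estimate_bpm(samples: list, sample_rate: int, hop: int = 512,
                 min_bpm: float = 60.0, max_bpm: float = 200.0) -> float:
    """Return the BPM whose beat period best matches the onset autocorrelation."""
    env = onset_envelope(samples, hop)
    frame_rate = sample_rate / hop  # onset frames per second
    min_lag = max(1, int(60.0 * frame_rate / max_bpm))
    max_lag = int(60.0 * frame_rate / min_bpm)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, min(max_lag, len(env) - 1) + 1):
        score = sum(env[i] * env[i + lag] for i in range(len(env) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag
```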
Unless you're really keen on writing that beat detection code yourself, you should look at the APIs provided by the Echo Nest, which include that (and many other awesome things).
See "Getting the tempo, key signature, and other audio attributes of a song".
In a word: no. That isn't currently available in the Apps API.
There's a new endpoint, I guess. See this example: https://medium.com/swlh/creating-waveforms-out-of-spotify-tracks-b22030dd442b?source=linkShare-962ec94337a0-1616364513
That uses the endpoint https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/
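Given an already-fetched audio-analysis response (the request itself needs an OAuth token and is omitted), a coarse waveform can be derived from the segment loudness, roughly in the spirit of the linked article. A minimal sketch, assuming the response fields `track.duration`, `track.tempo`, `segments[].start` and `segments[].loudness_max` as described in Spotify's reference; check them against the current documentation before relying on this.

```python
def tempo_and_waveform(analysis: dict, bars: int = 50) -> tuple:
    """Return (tempo, levels) where levels are `bars` loudness values in [0, 1]."""
    duration = analysis["track"]["duration"]
    tempo = analysis["track"]["tempo"]
    segments = analysis["segments"]
    # loudness_max is in dB (negative numbers); map it linearly onto [0, 1]
    lo = min(s["loudness_max"] for s in segments)
    hi = max(s["loudness_max"] for s in segments)
    span = (hi - lo) or 1.0
    levels = [0.0] * bars
    for s in segments:
        # Assign each segment to the bar containing its start time, keep the loudest
        idx = min(bars - 1, int(s["start"] / duration * bars))
        levels[idx] = max(levels[idx], (s["loudness_max"] - lo) / span)
    return tempo, levels
```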
Edit: I agree with commenter @wizbcn that this does not answer the question. Is it wrong to leave it here, given that I found this SO post while searching for information about visualizing a track's waveform as in the linked article? Maybe I should make this a comment instead?

How to implement a website like YouTube?

I'm building a language website for my university's language center, where students log in and watch videos to learn English. I have to do it like this:
a person logs in to the system, searches using a search box, and finds the details, lessons and videos relevant to the search. This functionality exactly matches the YouTube scenario.
For implementing Twitter-like functionality we can use StatusNet. Is there a similar library (a well-known implementation like StatusNet, but for YouTube), or some kind of platform or framework like CodeIgniter, that we can use to implement a YouTube-like site easily?
Please suggest some options, either open source or commercial.
Also, what is the best video format to use on such a website: FLV, MP4, or MOV?
regards,
Rangana
Your best option is to use a 'cloud' based video processing service. Most have a sample project / library for many different languages and frameworks. Here is a list of a few I've tried and liked:
http://zencoder.com/
http://transloadit.com/
http://pandastream.com/
The typical steps involve uploading the video files through the browser to a large 'cloud' static asset host (such as S3). If you are inexperienced, it is best to select a processor that provides an uploader (it will handle putting the files in the right spot). Of the three, Transloadit and Panda both have custom uploaders.
Usually the service will allow you to either pass the encoding settings (which formats and qualities to output) as parameters or configure them in your account. To support all current HTML5 browsers you just need H.264 (.mp4) and Ogg (.ogv). However, the new trend in the video world is WebM (.webm), so you might want to include it as well.
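With several encoded renditions, the page can list them all in one `<video>` element; the browser plays the first `<source>` whose type it supports. A minimal sketch that renders such markup, assuming (hypothetically) that each rendition is stored as the same base URL with a different extension.

```python
from html import escape

# One entry per encoded rendition: (file extension, MIME type)
SOURCE_TYPES = [
    ("mp4", "video/mp4"),    # H.264
    ("webm", "video/webm"),  # WebM
    ("ogv", "video/ogg"),    # Ogg
]


def video_tag(base_url: str) -> str:
    """Render a <video> element with one <source> per rendition, in preference order."""
    sources = "".join(
        f'<source src="{escape(f"{base_url}.{ext}")}" type="{mime}">'
        for ext, mime in SOURCE_TYPES
    )
    return f"<video controls>{sources}Your browser does not support HTML5 video.</video>"
```

Because browsers take the first playable source, the order of `SOURCE_TYPES` doubles as your format preference.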
Next, you will receive a unique identifier from the web service that you must store in persistent storage (a database). The web service can be configured to 'call back' (perform an HTTP POST or GET request to your service) once the video is encoded.
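The store-then-activate flow described here can be sketched with a single table: record the job identifier at upload time, and flip the row to active when the callback reports success. A minimal sketch using SQLite; the `job_id` and `state` payload fields and the `"finished"` value are hypothetical, since each encoding service defines its own callback format, and wiring `handle_callback` to an HTTP route is left to your framework.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        " job_id TEXT PRIMARY KEY,"
        " title TEXT,"
        " status TEXT NOT NULL DEFAULT 'encoding')"
    )


def register_upload(conn: sqlite3.Connection, job_id: str, title: str) -> None:
    """Persist the identifier returned by the encoding service after upload."""
    conn.execute("INSERT INTO videos (job_id, title) VALUES (?, ?)", (job_id, title))
    conn.commit()


def handle_callback(conn: sqlite3.Connection, body: bytes) -> bool:
    """Activate the video if the (hypothetical) payload reports success."""
    payload = json.loads(body)
    if payload.get("state") != "finished":
        return False
    cur = conn.execute(
        "UPDATE videos SET status = 'active' WHERE job_id = ?", (payload["job_id"],)
    )
    conn.commit()
    return cur.rowcount == 1  # False if the job ID was never registered


def active_videos(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT title FROM videos WHERE status = 'active' ORDER BY title")
    return [row[0] for row in rows]
```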
Once you receive a callback, you can activate your video and start displaying it on your pages. For displaying, if you are inexperienced I'd highly recommend you use one of the following players:
http://sublimevideo.net/
http://longtailvideo.com/
http://videojs.com/
They all do similar things for different prices. My current personal favourite is Sublime Video (it offers cool light box effects and a gorgeous player).
Why do you have to re-implement YouTube when you can just use it to host your videos for free? Many online e-learning portals (e.g. Khan Academy) do exactly that.
As far as the best video format to use -- go read about H.264/AVC. It's what Youtube currently uses.
I don't think you will find an already-built solution ;)
But it's not really that hard. You can use existing frameworks that will make your life easier while you build the account management system; the rest shouldn't be too hard either (assuming you don't really want to rebuild the whole of YouTube ;D ).
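As an example of "the rest", the keyword search from the question can, for a small lesson catalogue, be a plain SQL query in which every search term must appear in the title or description. A minimal sketch using SQLite; the `lessons` table and its columns are hypothetical, and a larger site would likely move to a proper full-text index.

```python
import sqlite3


def _like_pattern(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def search_lessons(conn: sqlite3.Connection, query: str) -> list:
    """Titles of lessons whose title or description contains every search term."""
    terms = query.split()
    if not terms:
        return []
    clause = " AND ".join(
        "(title LIKE ? ESCAPE '\\' OR description LIKE ? ESCAPE '\\')" for _ in terms
    )
    params = []
    for term in terms:
        pattern = _like_pattern(term)
        params += [pattern, pattern]
    rows = conn.execute(f"SELECT title FROM lessons WHERE {clause} ORDER BY title", params)
    return [row[0] for row in rows]
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so "verbs" also matches "Verbs".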
For playing videos, you can use JW Player. A great piece of software, you should check it out.
