I want to create a piece of software with:
- Input: an H.264 video stream (from another application)
- Output: a webcam, so my friends can watch it in Skype, Yahoo, or something like that.
I know I need to create a DirectShow filter to do that, but I don't know what type of filter I must create.
And once I have a filter, how do I import it into my application?
I need an example or a tutorial; please help me.
You need to create a virtual video source/camera filter. There have been a dozen questions like this on SO, so I will just link to some of them:
How to write an own capture filter?
Set byte stream as live source in Expression Encoder 4
"Fake" DirectShow video capture device
The Windows SDK has a PushSource sample which shows how to generate video off a filter. The VCam sample, which you can find online, shows what it takes to make a virtual device from a video source.
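To give a sense of what those samples contain, here is a minimal sketch of the heart of such a filter (class and helper names are illustrative; media-type negotiation and the registration under CLSID_VideoInputDeviceCategory, which is what makes the filter show up as a webcam, are omitted):

#include <streams.h>  // DirectShow base classes from the Windows SDK

// Output pin that pushes decoded frames into the graph.
class CVirtualCamPin : public CSourceStream
{
public:
    CVirtualCamPin(HRESULT *phr, CSource *pFilter)
        : CSourceStream(NAME("Virtual Cam Pin"), phr, pFilter, L"Out"),
          m_rtFrame(0), m_rtFrameLength(UNITS / 30)  // ~30 fps
    {
    }

    // Called by the base class each time the graph wants a frame.
    HRESULT FillBuffer(IMediaSample *pSample)
    {
        BYTE *pData = NULL;
        HRESULT hr = pSample->GetPointer(&pData);
        if (FAILED(hr)) return hr;

        // Copy the next decoded frame of your H.264 stream into pData here;
        // GetNextDecodedFrame() is a hypothetical helper for your decoder.
        // GetNextDecodedFrame(pData, pSample->GetSize());

        // Timestamp the sample so downstream renderers pace playback.
        REFERENCE_TIME rtStart = m_rtFrame;
        REFERENCE_TIME rtStop = rtStart + m_rtFrameLength;
        pSample->SetTime(&rtStart, &rtStop);
        m_rtFrame = rtStop;

        pSample->SetSyncPoint(TRUE);
        return S_OK;
    }

    // GetMediaType() and DecideBufferSize() must also be overridden;
    // see the PushSource sample for complete implementations.

private:
    REFERENCE_TIME m_rtFrame;
    REFERENCE_TIME m_rtFrameLength;
};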
See also: How to implement a "source filter" for splitting camera video based on Vivek's vcam?.
NOTE: The latest versions of Skype are picky about video devices and ignore virtual devices for no apparent reason.
You should start here: Writing DirectShow Filters, or here: Introduction to DirectShow Filter Development.
I assume you already have the Windows SDK for such development; if not, check this.
This is my first question, new and fresh, hello guys.
As the title mentions, is there any workaround or way to add audio inside a dialog-speech-template? Since it only supports wav, not mp3, I have found it hard to implement.
The audio I want to play originates from an API, so it isn't possible for me to download the mp3 file and convert it ahead of time (as the audio may change).
Is there any programmatic way to convert the mp3 audio to wav? I am pretty new to Bixby; I hope the elders here can help.
Unfortunately, Bixby SSML only supports certain wav formats. Please refer to SSML#AudioClip for details. There are also instructions there on how to convert using the ffmpeg tool.
To support the mp3 format, you can raise a Feature Request in our community. The forum is open to other Bixby developers, who can upvote it, leading to more visibility within the community and with the Product Management team.
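If you control a server in between, one workaround is to fetch the mp3 from the API, convert it on the fly, and serve the resulting wav to Bixby. A sketch of such a conversion with ffmpeg (check SSML#AudioClip for the exact sample rate and channel layout required; this assumes 16 kHz, 16-bit mono PCM):

ffmpeg -i input.mp3 -acodec pcm_s16le -ar 16000 -ac 1 output.wav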
I want to develop a Google Action (ideally using Dialogflow).
But the Google Action needs some features for which I couldn't find a solution, and I'm not sure whether it's even possible.
My use cases:
The Google Action starts an mp3. Someone stops and exits the Google Action, and when the user starts the Google Action again, I would like to resume the mp3.
But I couldn't find a solution for determining the "offset" at which the user stopped the mp3.
And even if I had this offset, I couldn't find a way to tell Google Assistant that I want to play the mp3 but start at, e.g., minute 51.
I would be really surprised if the Google Action possibilities were so extremely restricted.
Can someone confirm that these use cases are not possible, or give me a hint how to do it?
I only found this one, which is restricted to starting an mp3 at the beginning:
https://developers.google.com/actions/assistant/responses#media_responses
Kind Regards
Stefan
To start an mp3 file at a certain point, you can try the SSML <audio> tag and its clipBegin property.
clipBegin - A TimeDesignation that is the offset from the audio source's beginning to start playback from. If this value is greater than or equal to the audio source's actual duration, then no audio is inserted.
https://developers.google.com/actions/reference/ssml
To use this, your mp3 file has to be hosted using HTTPS.
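For example, a minimal sketch that starts playback at minute 51 (the URL is a placeholder, and clipBegin takes a time designation such as "3060s" or "500ms"):

<speak>
  Resuming where you left off.
  <audio clipBegin="3060s" src="https://example.com/episode.mp3">
    Sorry, the audio could not be loaded.
  </audio>
</speak>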
Hope that this helps.
You could use Conversational Actions (instead of Dialogflow), where media responses allow using a start_offset:
....
"content": {
  "media": {
    "start_offset": "2.12345s",
    ...
For more details see
https://developers.google.com/assistant/conversational/prompts-media#MediaResponseProperties
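A fuller sketch of such a media prompt (the name and URL are illustrative):

{
  "candidates": [{
    "content": {
      "media": {
        "media_type": "AUDIO",
        "start_offset": "3060s",
        "optional_media_controls": ["PAUSED", "STOPPED"],
        "media_objects": [{
          "name": "Episode 51",
          "url": "https://example.com/episode.mp3"
        }]
      }
    }
  }]
}

The same docs also describe how the playback position is reported back to your webhook (context.media.progress) when the user pauses or stops, so you can persist it and replay it as start_offset the next time the user launches the action.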
Conversational Actions also seem to be the newest technology for Google Actions, or at least the most recently released.
I am using the Vimeo Depth Player (https://github.com/vimeo/vimeo-depth-player/) for volumetric videos - only for a hobby/out of curiosity - and I'd like to know more about the parameters we use in the video description (such as in this video: https://vimeo.com/279527916) - I searched for it but I wasn't able to find a description for any of the supported parameters.
Does anyone here know where to find such a description?
Thanks!
Unfortunately, this JSON config is not publicly documented anywhere right now, except for the source code which parses it.
If you are using Depthkit to do a volumetric capture, they automatically generate this configuration for you so you don't have to worry about what it means.
https://docs.depthkit.tv/docs/the-depthkit-workflow
The point of this config is to mathematically describe how the subject was captured. e.g. How far is the subject from the camera? Without all of this, you won't be able to properly reconstruct the volumetric capture.
I've got a collaborative YouTube playlist with some friends that we use when we get together to play games. The problem is that the internet connection where we meet is quite bad, so I made a little script that lets people send songs over Bluetooth or by sending a YouTube link (youtube-dl downloads the mp3 of that video using a script that reads the currently selected link). I wanted an easier method of adding videos to the offline playlist.
I want to use the collaborative playlist to determine which songs are to be downloaded, but I only want the newest additions to the playlist (since the last check/download). Is it possible to retrieve the latest YouTube playlist items in Linux bash?
Have a look at the video selection options. In particular, --download-archive can be used for this purpose.
Simply run youtube-dl --download-archive /path/to/the/archive/file playlist_url. This will download all new songs in the playlist. If your playlist is large, you can also use --playlist-end 42 to only consider the first 42 songs.
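For example, a small script you could run on each check (the archive path and playlist URL are placeholders, and --extract-audio needs ffmpeg or avconv installed):

#!/bin/sh
# Download only the playlist entries that are not yet recorded in the
# archive file, and extract mp3 audio for the offline playlist.
youtube-dl --download-archive "$HOME/playlist.archive" \
    --ignore-errors \
    --extract-audio --audio-format mp3 \
    "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"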
I asked this question on SuperUser, but it's fallen on deaf ears. Hopefully I can get more of an audience here.
I'm looking for a low-cost (or free) solution like ScriptVox, only with a better engine. That is, something to read in a script and assign voices to characters. I've read the posts here, but even with those I'd have to concatenate wav files. It's not that I don't love Audacity, but it is time consuming. I am halfway thinking of writing my own, but I'm sure there has to be a solution out there. Any suggestions?
I would use Microsoft's Text-to-Speech engine. They have a simple example on how to do exactly what you're looking for:
http://msdn.microsoft.com/en-us/library/ms717065(v=vs.85).aspx
With that sample code, you can speak some text and have it dumped to a WAV file. From there, if you need to convert to a format such as MP3, you can use FFmpeg.
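For instance, a one-liner along these lines (file names are placeholders):

ffmpeg -i speech.wav -codec:a libmp3lame -q:a 2 speech.mp3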
Brad's answer is pretty terrific, as it contains exactly what you're looking for. However, it's missing one fundamental you'd expressed a preference for in the question's errata: an implementation in C#.
Here's a full tutorial for gaining access to the Speech API from managed code. With full credit to Blake Niemyjski and the appropriate teams at Microsoft, here are the salient bits, because the linkback to the original article is dead and this appears to be borrowed from Microsoft directly:
The following link (Giving Computers a Voice) will lead you to a Microsoft site that will show you how to create a project and get a basic text-to-speech application up and running in VB.NET or C# in no time!
SAPI
SAPI is the speech API that gives applications access to speech recognition and text-to-speech (TTS) engines. This article focuses on TTS. For TTS, SAPI takes text as input and uses the TTS engine to output that text as spoken audio. This is the same technology used by the Windows accessibility tool, Narrator. Every version of Windows since XP has shipped with SAPI and an English TTS engine.
TTS puts the user's ears to work. It allows applications to send information to the user without requiring the user's eyes or hands. This is a very powerful output option that isn't often utilized on PCs.
Three steps are needed to use TTS in a managed application:
Create an interop DLL
Since SAPI is a COM component, an interop DLL is needed to use it from a managed app. To create this, open the project in Visual Studio. Select the Project menu and click Add Reference. Select the COM tab, select "Microsoft Speech Object Library" in the list, and click OK. These steps add this reference to your project and create an Interop.SpeechLib.dll in the same folder as your executable. This interop DLL must always be in the same folder as your .exe to work correctly.
Reference the interop namespace
Include this namespace in your application. In C#, add "using SpeechLib;"; in VB, add "Imports SpeechLib".
Call Speak()
Create a SpVoice object and call Speak():
Visual C#
SpVoice voice = new SpVoice();
voice.Speak("Hello World!", SpeechVoiceSpeakFlags.SVSFDefault);
Visual Basic
Dim voice As New SpVoice
voice.Speak("Hello World!", SpeechVoiceSpeakFlags.SVSFDefault)
I feel Brad's answer led me to the correct solution here (thus, he's more deserving of credit than I), but this should be the last piece you were missing. You should now be able to replicate the WAV-file writing from the C++ solution in managed code, and from there, transcode into your desired format.
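As a closing sketch, here is one way that WAV-file writing can look with the same SpeechLib interop (the output path is a placeholder, not part of the original sample):

using SpeechLib;

class TtsToWav
{
    static void Main()
    {
        // Route the voice's output into a WAV file instead of the speakers.
        SpFileStream stream = new SpFileStream();
        stream.Open(@"C:\temp\out.wav",
            SpeechStreamFileMode.SSFMCreateForWrite, false);

        SpVoice voice = new SpVoice();
        voice.AudioOutputStream = stream;
        voice.Speak("Hello World!", SpeechVoiceSpeakFlags.SVSFDefault);

        stream.Close();
    }
}

From there, FFmpeg can transcode the WAV into mp3 or whatever format you need.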
If having the program access the internet is acceptable, then you could use iSpeech.
You can use their API, but unfortunately it is limited to 200 uses/day.
Their API also allows appending format=(wav|mp3) to a query, allowing you to get your sound in either of the desired formats.
http://en.wikipedia.org/wiki/Comparison_of_speech_synthesizers
That's all I've got.
Google Translate uses eSpeak: http://support.google.com/translate/